Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread Michael Paquier
On Fri, Sep 30, 2016 at 2:05 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
>> I see no problem in setting progressAt in XLogInsertRecord.
>> But I doubt GetProgressRecPtr is harmful, especially when
>
> But I suspect that GetProgressRecPtr could be harmful.

Well, you can maximize its effects by setting NUM_XLOGINSERT_LOCKS ==
nproc and reducing checkpoint_timeout. That's what I did, but..
-- 
Michael




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-09-29 Thread Kyotaro HORIGUCHI
At Thu, 29 Sep 2016 16:16:00 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160929.161600.224338668.horiguchi.kyot...@lab.ntt.co.jp>
> That looks better. I'll change the API as follows.
> 
> COMPLETE_WITH_QUERY(query);
> COMPLETE_WITH_QUERY_KW(query, kwlist);
> COMPLETE_WITH_SCHEMA_QUERY(squery);
> COMPLETE_WITH_SCHEMA_QUERY_KW(squery, kwlist);

Done in my environment.

> > >> 4. Introduce word shift and removal feature to psql-completion
> > >>
> > >>   This is the second core feature for the flexibility of the
> > >>   completion code. The word shift feature is the ability to omit
> > >>   the first several words in *MatchesN macros. For example, this
> > >>   allows completing CREATE SCHEMA's schema elements with natural
> > >>   code. (Currently the syntaxes that can appear as schema elements
> > >>   use TailMatches instead of Matches, and as a result HeadMatches
> > >>   is not available there.) The word removal feature is the ability
> > >>   to (destructively) clip multiple successive words from the
> > >>   previous_words list. This lets succeeding completion code ignore
> > >>   the removed words, such as UNIQUE, CONCURRENTLY, VERBOSE and so
> > >>   on.
> > >>
> > >
> > I am thinking the commit's description should go inside a README
> 
> Currently psql's tab-complete.c/psql_completion() has no such
> document. If this should be written as a README, perhaps I should
> write about completion in general. On the other hand, per-macro
> explanations are written in tab-complete-macros.h but the usage is
> not. I'll try to write a README.

Before proposing a new patch including this: since I'm not at all
confident in my ability to write documentation, I'd like to ask
everyone here what shape we should aim for.

The following is the first part of the document I have written up
to now. Please help me by giving me any corrections, suggestions,
opinions, or anything else!

# The completion macro part might better be moved into the comments
# for the macros in tab-complete-macros.h.

========================================
src/bin/psql/README.completion

Word completion of interactive psql
===================================

psql supports word completion on interactive input. The core function
of the feature is psql_completion_internal() in tab-complete.c. A bunch
of macros are provided to make the completion code easier to read and
maintain. The console input being considered is stored in char **
previous_words in reverse order, but maintainers of
psql_completion_internal() don't need to be aware of that detail; most
operations can be described using the provided macros.

Basic structure of the completion code
---------------------------------------

The main part of the function is just a series of completion
definitions, where the first match wins. Each definition basically
has the following shape:

    if (*matching expression*)
        *corresponding completion, then return*

The matching expression checks against all input words before the word
currently being entered. Completion then offers the words prefixed by
the letters already entered. For example, for "CREATE " the word list
to be matched is ["CREATE"] and the completion prefix is empty. For
"CREATE INDEX id", the list is ["CREATE", "INDEX"] and the prefix is
"id".
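
For example, a definition might look like this (a hypothetical rule;
the macro names are those used in tab-complete.c):

    /* After "CREATE INDEX <name>", suggest ON */
    else if (Matches3("CREATE", "INDEX", MatchAny))
        COMPLETE_WITH_CONST("ON");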


Matching expression macros
--------------------------
There are four types of matching expression macros.

- MatchesN(word1, word2 .. , wordN)

 true iff the word list is exactly the same as the parameters.

- HeadMatchesN(word1, word2 .., wordN)

 true iff the first N words in the word list match the parameters.

- TailMatchesN(word1, word2 .., wordN)

 true iff the last N words in the word list match the parameters.

- MidMatchesN(pos, word1, word2 .., wordN)

 true iff the N successive words starting at position pos in the word
 list match the parameters. The position is 1-based.
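
 For example, given the input "ALTER TABLE foo SET ", the word list
 to match is ["ALTER", "TABLE", "foo", "SET"], and each of the
 following hypothetical checks is true:

    Matches4("ALTER", "TABLE", MatchAny, "SET")
    HeadMatches2("ALTER", "TABLE")
    TailMatches2(MatchAny, "SET")
    MidMatches2(2, "TABLE", MatchAny)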


Completion macros
-----------------
There are several types of completion macros.

- COMPLETE_WITH_QUERY(query), COMPLETE_WITH_QUERY_KW(query, addon)

  Suggest completion words acquired using the given query. The details
  of the query are described in the comment for _complete_from_query().
  Word matching is case-sensitive.

  The latter takes an additional parameter, which should be a query
  fragment starting with " UNION " followed by a query that supplies
  some additional words. This addon can be an ADDLISTN() macro for
  case-sensitive suggestions.

- COMPLETE_WITH_SCHEMA_QUERY(squery),
  COMPLETE_WITH_SCHEMA_QUERY_KW(squery, addon)

  Suggest based on a "schema query", which is a struct containing
  parameters. The details are in the comment for
  _complete_from_query(). Word matching is case-sensitive.

  Just as with COMPLETE_WITH_QUERY_KW, the latter form takes a query
  fragment of the same shape.

- COMPLETE_WITH_LIST_CS(list)

  Suggest completion words given as an array of strings. Word matching
  is case-sensitive.
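
  For example (a hypothetical word list):

    static const char *const my_options[] =
    {"ANALYZE", "VERBOSE", NULL};

    COMPLETE_WITH_LIST_CS(my_options);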

- 

Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Masahiko Sawada
On Fri, Sep 30, 2016 at 1:26 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread Kyotaro HORIGUCHI
Sorry, I wrote the wrong thing.

At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
> Sorry, I might have broken this thread somehow..
> 
> At Thu, 29 Sep 2016 11:26:29 -0400, David Steele  wrote 
> in <30095aea-3910-dbb7-1790-a579fb93f...@pgmasters.net>
> > On 9/28/16 10:32 PM, Michael Paquier wrote:
> > > On Thu, Sep 29, 2016 at 7:45 AM, David Steele 
> > > wrote:
> > >>
> > >> In general I agree with the other comments that this could end up
> > >> being
> > >> a problem.  On the other hand, since the additional locks are only
> > >> taken
> > >> at checkpoint or archive_timeout it may not be that big a deal.
> > >
> > > Yes, I did some tests a couple of months back on my laptop, which
> > > has 4 cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to
> > > increase contention and performing a bunch of INSERTs using 4
> > > clients on 4 different relations, I could not detect a difference.
> > > Autovacuum was disabled to eliminate any noise. I tried
> > > checkpoint_timeout at 30s to see its effects, as well as larger
> > > values to see the impact on the standby snapshot taken by the
> > > bgwriter. Other thoughts are welcome.
> > 
> > I don't have any better ideas than that.
> 
> I see no problem in setting progressAt in XLogInsertRecord.
> But I doubt GetProgressRecPtr is harmful, especially when

But I suspect that GetProgressRecPtr could be harmful.

> NUM_XLOGINSERT_LOCKS is *large*. So reducing the number would
> rather alleviate the impact. But it actually doesn't seem that
> harmful up to 8. (Even though I don't like the locking in
> GetProgressRecPtr..)
> 
> Currently the possibly harmful call of GetProgressRecPtr is the one
> in BackgroundWriterMain. It is called at the interval BgWriterDelay,
> and any time the bgwriter receives SIGUSR1. But I don't see who
> issues SIGUSR1 to the bgwriter..
> 
> 
> > >> +++ b/src/backend/postmaster/checkpointer.c
> > >> +   /* OK, it's time to switch */
> > >> +   elog(LOG, "Request XLog Switch");
> > >>
> > >> LOG level seems a bit much here, perhaps DEBUG1?
> > >
> > > That's from Horiguchi-san's patch, and those would be definitely
> > > better as DEBUG1 by looking at it. Now and in order to keep things
> > > simple I think that we had better discard this patch for now. I was
> > > planning to come back to this thing anyway once we are done with the
> > > first problem.
> > 
> > I still see this:
> > 
> > +++ b/src/backend/postmaster/checkpointer.c
> > /* OK, it's time to switch */
> > +   elog(LOG, "Request XLog Switch");
> > 
> > > Well for now attached are two patches, that could just be squashed
> > > into one.
> 
> Mmm, sorry, this was a private instant-debugging aid that spilled
> out. But I don't mind leaving it in at DEBUG2 if it seems useful.
> 
> > Yes, I think that makes sense.
> > 
> > More importantly, there is a regression.  With your new patch the
> > xlogs are switching on archive_timeout again even with no changes.
> > The v12 worked fine.
> 
> As Michael mentioned in this or another thread, it is another
> issue that he wants to solve separately. I personally doubt that
> this patch (v11 and v13) can be evaluated alone without it, but
> we can review this with the excessive switching problem, perhaps?
> 
> > The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> > as I can see the patch does not work correctly without these changes.
> > Am I missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread Kyotaro HORIGUCHI
Sorry, I might have broken this thread somehow..

At Thu, 29 Sep 2016 11:26:29 -0400, David Steele  wrote in 
<30095aea-3910-dbb7-1790-a579fb93f...@pgmasters.net>
> On 9/28/16 10:32 PM, Michael Paquier wrote:
> > On Thu, Sep 29, 2016 at 7:45 AM, David Steele 
> > wrote:
> >>
> >> In general I agree with the other comments that this could end up
> >> being
> >> a problem.  On the other hand, since the additional locks are only
> >> taken
> >> at checkpoint or archive_timeout it may not be that big a deal.
> >
> > Yes, I did some tests a couple of months back on my laptop, which
> > has 4 cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to
> > increase contention and performing a bunch of INSERTs using 4
> > clients on 4 different relations, I could not detect a difference.
> > Autovacuum was disabled to eliminate any noise. I tried
> > checkpoint_timeout at 30s to see its effects, as well as larger
> > values to see the impact on the standby snapshot taken by the
> > bgwriter. Other thoughts are welcome.
> 
> I don't have any better ideas than that.

I see no problem in setting progressAt in XLogInsertRecord.
But I doubt GetProgressRecPtr is harmful, especially when
NUM_XLOGINSERT_LOCKS is *large*. So reducing the number would
rather alleviate the impact. But it actually doesn't seem that
harmful up to 8. (Even though I don't like the locking in
GetProgressRecPtr..)

Currently the possibly harmful call of GetProgressRecPtr is the one
in BackgroundWriterMain. It is called at the interval BgWriterDelay,
and any time the bgwriter receives SIGUSR1. But I don't see who
issues SIGUSR1 to the bgwriter..


> >> +++ b/src/backend/postmaster/checkpointer.c
> >> +   /* OK, it's time to switch */
> >> +   elog(LOG, "Request XLog Switch");
> >>
> >> LOG level seems a bit much here, perhaps DEBUG1?
> >
> > That's from Horiguchi-san's patch, and those would be definitely
> > better as DEBUG1 by looking at it. Now and in order to keep things
> > simple I think that we had better discard this patch for now. I was
> > planning to come back to this thing anyway once we are done with the
> > first problem.
> 
> I still see this:
> 
> +++ b/src/backend/postmaster/checkpointer.c
>   /* OK, it's time to switch */
> + elog(LOG, "Request XLog Switch");
> 
> > Well for now attached are two patches, that could just be squashed
> > into one.

Mmm, sorry, this was a private instant-debugging aid that spilled
out. But I don't mind leaving it in at DEBUG2 if it seems useful.

> Yes, I think that makes sense.
> 
> More importantly, there is a regression.  With your new patch the
> xlogs are switching on archive_timeout again even with no changes.
> The v12 worked fine.

As Michael mentioned in this or another thread, it is another
issue that he wants to solve separately. I personally doubt that
this patch (v11 and v13) can be evaluated alone without it, but
we can review this with the excessive switching problem, perhaps?

> The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> as I can see the patch does not work correctly without these changes.
> Am I missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Tracking wait event for latches

2016-09-29 Thread Michael Paquier
On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas  wrote:
> It seems to me that you haven't tested this patch very carefully,
> because as far as I can see it breaks wait event reporting for both
> heavyweight locks and buffer pins - or in other words two out of the
> three existing cases covered by the mechanism you are patching.

Oops. The WaitLatch calls overwrite the other reports if another event has already been set.

> The heavyweight lock portion is broken because WaitOnLock() reports a
> Lock wait before calling ProcSleep(), which now clobbers it. Instead
> of reporting that we're waiting for Lock/relation, a LOCK TABLE
> statement that hangs now reports IPC/ProcSleep.  Similarly, a conflict
> over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
> any other kind of lock, which were all reported separately before.
> Obviously, that's no good.  I think it would be just fine to push down
> setting the wait event into ProcSleep(), doing it when we actually
> WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
> report that we're waiting for the heavyweight lock, not ProcSleep().

There is a somewhat similar problem with
ResolveRecoveryConflictWithBufferPin(); by this reasoning, what we
should report waiting for there is a buffer pin, not IPC/WaitForSignal.

> As for buffer pins, note that LockBufferForCleanup() calls
> ProcWaitForSignal(), which now overwrites the wait event set in by its
> caller.  I think we could just make ProcWaitForSignal() take a wait
> event as an argument; then LockBufferForCleanup() could pass an
> appropriate constant and other callers of ProcWaitForSignal() could
> pass their own wait events.

Agreed. So I changed the patch this way.
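
For reference, a minimal sketch of that change (the signature and the
constant names follow this thread's patch, not an established API):

    /* proc.h: callers now advertise what they are waiting for */
    extern void ProcWaitForSignal(uint32 wait_event);

    /* e.g. in LockBufferForCleanup() */
    ProcWaitForSignal(pgstat_make_wait_event(WAIT_BUFFER_PIN,
                                             WAIT_EVENT_BUFFER_PIN));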

> The way that we're constructing the wait event ID that ultimately gets
> advertised in pg_stat_activity is a bit silly in this patch.  We start
> with an event ID (which is an integer) and we then do an array lookup
> (via GetWaitEventClass) to get a second integer which is then XOR'd
> back into the first integer (via pgstat_report_wait_start) before
> storing it in the PGPROC.  New idea: let's change
> pgstat_report_wait_start() to take a single integer argument which it
> stores without interpretation.  Let's separate the construction of the
> wait event into a separate macro, say make_wait_event().  Then
> existing callers of pgstat_report_wait_start(a, b) will instead do
> pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
> to construct the wait event and push it through a few levels of the
> call stack before advertising it only need to pass one integer.  If
> done correctly, this should be cleaner and faster than what you've got
> right now.

Hm, OK. So ProcSleep() and ProcWaitForSignal() need an additional
argument in the shape of a uint32 wait_event, and WaitLatch needs the
same extra uint32 argument. As a result, all callers of WaitLatch are
forced to use the new routine pgstat_make_wait_event() (renamed), so
pgstat.h needs to be included wherever WaitLatch() is called.

A consequence is that adding classId to WaitEventEntries becomes
completely useless: including it had the advantage of reducing the
places where pgstat.h is included, but make_wait_event() is needed to
build the wait_event value anyway...
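
To be concrete, the packing routine is a one-liner along these lines
(the 8/24-bit split is an assumption for illustration):

    /* Pack an event class and an event ID into a single uint32 that
     * can travel down the call stack and be stored uninterpreted. */
    #define pgstat_make_wait_event(classId, eventId) \
        (((uint32) (classId) << 24) | ((uint32) (eventId) & 0x00FFFFFF))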

> I am not a huge fan of the "WE_" prefix you've used.  It's not
> terrible, but "we" doesn't obviously stand for "wait event" and it's
> also a word itself.  I think a little more verbosity would add
> clarity.  Maybe we could go with WAIT_EVENT_ instead.

OK. Switched that back. Hopefully you find the result cleaner.
-- 
Michael
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8ca1c1c..0841fd7 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "access/xact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
@@ -491,12 +492,18 @@ pgfdw_get_result(PGconn *conn, const char *query)
 		while (PQisBusy(conn))
 		{
 			int			wc;
+			uint32		wait_event;
+
+			/* Define event to wait for */
+			wait_event = pgstat_make_wait_event(WAIT_EXTENSION,
+												WAIT_EVENT_EXTENSION);
 
 			/* Sleep until there's something to do */
 			wc = WaitLatchOrSocket(MyLatch,
 								   WL_LATCH_SET | WL_SOCKET_READABLE,
 								   PQsocket(conn),
-								   -1L);
+								   -1L,
+								   wait_event);
 			ResetLatch(MyLatch);
 
 			CHECK_FOR_INTERRUPTS();
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f400785..bb975c1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -679,6 +679,42 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   buffer in question.
  
 
+
+ 
+  Activity: The server 

Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-09-29 Thread Thomas Munro
On Thu, Sep 29, 2016 at 9:17 PM, Michael Paquier
 wrote:
> On Tue, Aug 30, 2016 at 11:04 AM, Michael Paquier
>  wrote:
>> On Mon, Aug 29, 2016 at 9:39 PM, Craig Ringer
>>  wrote:
>>> Cool. I'll mark as waiting on author pending that.
>>>
>>> It'll be good to get this footgun put away.
>>
>> OK, so done. I have put the renaming of pg_xlog to pg_wal at the top
>> of the patch stack, as that seems to be the one drawing no objection: a
>> lot of people like pg_wal. I have also added handling of the renaming
>> in pg_basebackup by using PQserverVersion. One thing to note is that a
>> connection needs to be made to the target server *before* creating the
>> soft link of pg_xlog/pg_wal, because we need to know the version of the
>> target server. pg_upgrade is handled as well. And that's all in 0001.
>>
>> Patch 0002 does pg_clog -> pg_trans, "trans" standing for
>> "transaction". Switching to pg_trans_status or pg_xact_status is not
>> that complicated to change anyway...
>
> Any input to offer on those patches? If nothing is happening, I guess
> the best move is just to move them to the next CF. At least I can see
> that the direction is to get those renamings done.

+1 for pg_xlog -> pg_wal.

Of the existing suggestions, I would like to add my vote for the
following renames, matching pg_multixact:

pg_clog -> pg_xact
pg_subtrans -> pg_subxact

If longer names are on the table, I would consider expanding all three of those:

pg_clog -> pg_transaction
pg_subtrans -> pg_subtransaction
pg_multixact -> pg_multitransaction

They sound eminently non-deletable.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Dilip Kumar
On Thu, Sep 29, 2016 at 8:05 PM, Robert Haas  wrote:
> OK, another theory: Dilip is, I believe, reinitializing for each run,
> and you are not.

Yes, I am reinitializing for each run.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Masahiko Sawada
On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> I wouldn't even put a lot of faith in the errno being meaningful,
>>> considering that it does close() calls before capturing the errno.
>
>> So we do close() in a bunch of places while closing shop, which calls
>> _close() on Windows; this function sets errno.
>
> But only on failure, no?  The close()s usually shouldn't fail, and
> therefore shouldn't change errno, it's just that you can't trust that
> 100%.
>
> I think likely what's happening is that we're seeing a leftover value from
> some previous syscall that set GetLastError's result (and, presumably,
> wasn't fatal so far as pg_upgrade was concerned).
>

Does that mean the read() syscall could not read BLCKSZ bytes, probably
because the _vm file is corrupted?

> if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)

Could it happen that the read() syscall stopped reading where it hit
bytes representing '\n' characters (0x5c6e) because we don't open the
file with the O_BINARY flag?
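
If so, a sketch of the fix would be to add PG_BINARY (PostgreSQL's
portability macro: O_BINARY on Windows, 0 elsewhere) to the open flags:

    /* Open in binary mode so Windows text-mode translation cannot
     * mangle bytes or produce short reads. */
    if ((src_fd = open(fromfile, O_RDONLY | PG_BINARY, 0)) < 0)
        return getErrorText();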

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I appreciate your support.
I've tested the v6 patch again and it looks good to me; the code in v6
does not differ much from the v4 patch. Ready for committer review. Thanks!

Regards,
Amul Sul

The new status of this patch is: Ready for Committer



Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-29 Thread Thomas Munro
On Thu, Sep 29, 2016 at 6:19 PM, David Fetter  wrote:
> On Thu, Sep 29, 2016 at 11:12:11AM +1300, Thomas Munro wrote:
>> On Mon, Sep 26, 2016 at 5:11 PM, Thomas Munro
>>  wrote:
>> > On Mon, Sep 26, 2016 at 1:18 PM, Thomas Munro
>> >  wrote:
>> >>
>> >> On Mon, Sep 19, 2016 at 4:02 PM, David Fetter  wrote:
>> >> >
>> >> > [training_wheels_004.patch]
>> >>
>> >> [review]
>>
>> Ping.
>
> Please find attached the next revision.

I'm not sold on ERRCODE_SYNTAX_ERROR.  There's nothing wrong with the
syntax, since parsing succeeded.  It would be cool if we could use
ERRCODE_E_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED, though I'm not sure
what error class 38 really means.  Does require_where's hook function
count as an 'external routine' and on that basis is it it allowed to
raise this error?  Thoughts, anyone?

I am still seeing the underscore problem when building the docs.
Please find attached fix-docs.patch which applies on top of
training_wheels_005.patch.  It fixes that problem for me.

The regression test fails.  The expected error messages show the old
wording, so I think you forgot to add a file.  Also, should
contrib/require_where/Makefile define REGRESS = require_where?  That
would allow 'make check' from inside that directory, which is
convenient and matches other extensions.  Please find attached
fix-regression-test.patch which also applies on top of
training_wheels_005.patch and fixes those things for me, and also adds
a couple of extra test cases.

Robert Haas mentioned upthread that the authors section was too short,
and your follow-up v3 patch addressed that.  Somehow two authors got
lost on their way to the v5 patch.  Please find attached
fix-authors.patch which also applies on top of
training_wheels_005.patch to restore them.

It would be really nice to be able to set this to 'Ready for
Committer' in this CF.  Do you want to post a v6 patch or are you
happy for me to ask a committer to look at v5 + these three
corrections?

-- 
Thomas Munro
http://www.enterprisedb.com


fix-regression-test.patch
Description: Binary data


fix-docs.patch
Description: Binary data


fix-authors.patch
Description: Binary data



Re: [HACKERS] Notice lock waits

2016-09-29 Thread Haribabu Kommi
On Fri, Sep 30, 2016 at 3:00 AM, Jeff Janes  wrote:

> On Wed, Sep 28, 2016 at 11:57 PM, Haribabu Kommi wrote:
>>
>>
>> Providing the details of a lock wait to the client is good. I feel
>> this message is useful for cases where a user/administrator is trying
>> to perform some SQL operations.
>>
>> I also feel that adding a new GUC variable to show these logs to the
>> user may not be good. Changing the existing GUC may be better.
>>
>
> I don't think it would be a good idea to refactor the existing GUC
> (log_lock_waits) to accomplish this.
>
> There would have to be four states, log only, notice only, both log and
> notice, and neither.  But non-superusers can't be allowed to  change the
> log flag, only the notice flag.  It is probably possible to implement that,
> but it seems complicated both to implement, and to explain/document.  I
> think that adding another GUC is better than greatly complicating an
> existing one.
>

Yes, I understood. Changing the existing GUC will make it complex.

> What do you think of Jim Nasby's idea of making a settable level,
> rather than just on or off?

I don't clearly understand how the settable level would work here.
Would the behavior differ based on log_min_messages or something
similar?

The notification messages are good. If we are going to add this
facility only for lock waits, then a simple GUC is enough. If we are
going to extend the same to other messages, then I would prefer
something like the log_statement GUC, taking input from the user to
decide which messages are sent to the user.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Hash Indexes

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 8:53 PM, Peter Geoghegan  wrote:
> On Fri, Sep 30, 2016 at 1:14 AM, Robert Haas  wrote:
>>> I, for one, agree with this position.
>>
>> Well, I, for one, find it frustrating.  It seems pretty unhelpful to
>> bring this up only after the code has already been written.  The first
>> post on this thread was on May 10th.  The first version of the patch
>> was posted on June 16th.  This position was first articulated on
>> September 15th.
>
> Really, what do we have to lose at this point? It's not very difficult
> to do what Andres proposes.

Well, first of all, I can't, because I don't really understand what
tests he has in mind.  Maybe somebody else does, in which case perhaps
they could do the work and post the results.  If the tests really are
simple, that shouldn't be much of a burden.

But, second, suppose we do the tests and find out that the
hash-over-btree idea completely trounces hash indexes.  What then?  I
don't think that would really prove anything because, as I said in my
email to Andres, the current hash index code is severely
under-optimized, so it's not really an apples-to-apples comparison.
But even if it did prove something, is the idea then that Amit (with
help from Mithun and Ashutosh Sharma) should throw away the ~8 months
of development work that's been done on hash indexes in favor of
starting all over with a new and probably harder project to build a
whole new AM, and just leave hash indexes broken?  That doesn't seem
like a very reasonable thing to ask.  Leaving hash indexes broken
fixes no problem that we have.

On the other hand, applying those patches (after they've been suitably
reviewed and fixed up) does fix several things.  For one thing, we can
stop shipping a totally broken feature in release after release.  For
another thing, those hash indexes do in fact outperform btree on some
workloads, and with more work they can probably beat btree on more
workloads.  And if somebody later wants to write hash-over-btree and
that turns out to be better still, great!  I'm not blocking anyone
from doing that.

The only argument that's been advanced for not fixing hash indexes is
that we'd then have to give people accurate guidance on whether to
choose hash or btree, but that would also be true of a hypothetical
hash-over-btree AM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Learning to hack Postgres - Keeping track of ctids

2016-09-29 Thread Craig Ringer
On 30 September 2016 at 04:15, Emrul  wrote:
> Hi,
>
> I'm working on an idea to implement a graph database in Postgres.  At the
> moment its just a learning exercise.
>
> What I'd like to do is store a reference to all the links from one record
> using an array type that stores links to all related tables.
>
> At first, I've succeeded in doing this using primary key Ids and this works
> fine.  However, I'd like to be able to bypass the index lookup altogether by
> storing the ctids in my array instead of the primary key ids.
>
> Trouble of course is that ctids can get changed (like for instance
> vacuuming).  So my question is: how can I keep my ctid references up to date
> - is there any way to detect when a ctid is changed?

I expect that you'd have to teach the heapam code, vacuum, etc to do
the required housekeeping.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 8:29 PM, Andres Freund  wrote:
> On September 29, 2016 5:28:00 PM PDT, Robert Haas  
> wrote:
>>On Thu, Sep 29, 2016 at 8:16 PM, Andres Freund 
>>wrote:
 Well, I, for one, find it frustrating.  It seems pretty unhelpful to
 bring this up only after the code has already been written.
>>>
>>> I brought this up in person at pgcon too.
>>
>>To whom?  In what context?
>
> Amit, over dinner.

OK, well, I can't really comment on that, then, except to say that if
you waited three months to follow up on the mailing list, you probably
can't blame Amit if he thought that it was more of a casual suggestion
than a serious objection.  Maybe it was?  I don't know.

For  my part, I don't really understand how you think that we could
find anything out via relatively simple tests.  The hash index code is
horribly under-maintained, which is why Amit is able to get large
performance improvements out of improving it.  If you compare it to
btree in some way, it's probably going to lose.  But I don't think
that answers the question of whether a hash AM that somebody's put
some work into will win or lose against a hypothetical hash-over-btree
AM that nobody's written.  Even if it wins, is that really a reason to
leave the hash index code itself in a state of disrepair?  We probably
would have removed it already except that the infrastructure is used
for hash joins and hash aggregation, so we really can't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Peter Geoghegan
On Fri, Sep 30, 2016 at 1:14 AM, Robert Haas  wrote:
>> I, for one, agree with this position.
>
> Well, I, for one, find it frustrating.  It seems pretty unhelpful to
> bring this up only after the code has already been written.  The first
> post on this thread was on May 10th.  The first version of the patch
> was posted on June 16th.  This position was first articulated on
> September 15th.

Really, what do we have to lose at this point? It's not very difficult
to do what Andres proposes.

-- 
Peter Geoghegan




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Peter Geoghegan
On Fri, Sep 30, 2016 at 1:29 AM, Andres Freund  wrote:
>>To whom?  In what context?
>
> Amit, over dinner.

In case it matters, I also talked to Amit about this privately.


-- 
Peter Geoghegan




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Andres Freund


On September 29, 2016 5:28:00 PM PDT, Robert Haas  wrote:
>On Thu, Sep 29, 2016 at 8:16 PM, Andres Freund 
>wrote:
>>> Well, I, for one, find it frustrating.  It seems pretty unhelpful to
>>> bring this up only after the code has already been written.
>>
>> I brought this up in person at pgcon too.
>
>To whom?  In what context?

Amit, over dinner.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 8:16 PM, Andres Freund  wrote:
>> Well, I, for one, find it frustrating.  It seems pretty unhelpful to
>> bring this up only after the code has already been written.
>
> I brought this up in person at pgcon too.

To whom?  In what context?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Andres Freund
On 2016-09-29 20:14:40 -0400, Robert Haas wrote:
> On Thu, Sep 29, 2016 at 8:07 PM, Peter Geoghegan  wrote:
> > On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund  wrote:
> >> On 2016-09-28 15:04:30 -0400, Robert Haas wrote:
> >>> Andres already
> >>> stated that he thinks working on btree-over-hash would be more
> >>> beneficial than fixing hash, but at this point it seems like he's the
> >>> only one who takes that position.
> >>
> >> Note that I did *NOT* take that position. I was saying that I think we
> >> should evaluate whether that's not a better approach, doing some simple
> >> performance comparisons.
> >
> > I, for one, agree with this position.
>
> Well, I, for one, find it frustrating.  It seems pretty unhelpful to
> bring this up only after the code has already been written.

I brought this up in person at pgcon too.




Re: [HACKERS] Hash Indexes

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 8:07 PM, Peter Geoghegan  wrote:
> On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund  wrote:
>> On 2016-09-28 15:04:30 -0400, Robert Haas wrote:
>>> Andres already
>>> stated that he thinks working on btree-over-hash would be more
>>> beneficial than fixing hash, but at this point it seems like he's the
>>> only one who takes that position.
>>
>> Note that I did *NOT* take that position. I was saying that I think we
>> should evaluate whether that's not a better approach, doing some simple
>> performance comparisons.
>
> I, for one, agree with this position.

Well, I, for one, find it frustrating.  It seems pretty unhelpful to
bring this up only after the code has already been written.  The first
post on this thread was on May 10th.  The first version of the patch
was posted on June 16th.  This position was first articulated on
September 15th.

But, by all means, please feel free to do the performance comparison
and post the results.  I'd be curious to see them myself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 8:09 AM, Amit Langote
 wrote:
> I removed DEPENDENCY_IGNORE.  Does the following look good or am I still
> missing something?

You missed your commit message, but otherwise looks fine.

>> Also, I think this should be rephrased a bit to be more clear about
>> how the partitioning key works, like this: The optional
>> PARTITION BY clause specifies a method of
>> partitioning the table.  The table thus created is called a
>> partitioned table.  The parenthesized list of
>> expressions forms the partitioning key for the
>> table.  When using range partitioning, the partioning key can include
>> multiple columns or expressions, but for list partitioning, the
>> partitioning key must consist of a single column or expression.  If no
>> btree operator class is specified when creating a partitioned table,
>> the default btree operator class for the datatype will be used.  If
>> there is none, an error will be reported.
>
> Revised text along these lines.

It seems you copied my typo: my text has "partioning" for
"partitioning" and your patch now has that, too.

> Things were that way initially, that is, the parent relations had no
> relfilenode.  I abandoned that project however.  The storage-less parent
> thing seemed pretty daunting to me to handle right away.  For example,
> optimizer and the executor code needed to be taught about the parent rel
> appendrel member that used to be included as part of the result of
> scanning the inheritance set but was no longer.

Even if we leave the empty relfilenode around for now -- in the long
run I think it should die -- I think we should prohibit the creation
of subsidiary objects on the parent that are only sensible if it has
rows - e.g. indexes.  It makes no sense to disallow non-inheritable
constraints while allowing indexes, and it could box us into a corner
later.

>> +/*
>> + * Run the expressions through eval_const_expressions. This is
>> + * not just an optimization, but is necessary, because eventually
>> + * the planner will be comparing them to similarly-processed qual
>> + * clauses, and may fail to detect valid matches without this.
>> + * We don't bother with canonicalize_qual, however.
>> + */
>>
>> I'm a bit confused by this, because I would think this processing
>> ought to have been done before storing anything in the system
>> catalogs.  I don't see why it should be necessary to do it again after
>> pulling data back out of the system catalogs.
>
> The pattern matches what's done for other expressions that optimizer deals
> with, such as CHECK, index key, and index predicate expressions.

That's kind of a non-answer answer, but OK.  Still, I think you
shouldn't just copy somebody else's comment blindly into a new place.
Reference the original comment, or write your own.

>> How can it be valid to have no partitioning expressions?
>
> Keys that are simply column names are resolved to attnums and stored
> likewise.  If some key is an expression, then corresponding attnum is 0
> and the expression itself is added to the list that gets stored into
> partexprbin.  It is doing the same thing as index expressions.

Oh, right.  Oops.

>> +if (classform->relkind != relkind &&
>> +(relkind == RELKIND_RELATION &&
>> +classform->relkind != RELKIND_PARTITIONED_TABLE))
>>
>> That's broken.  Note that all of the conditions are joined using &&,
>> so if any one of them fails then we won't throw an error.  In
>> particular, it's no longer possible to throw an error when relkind is
>> not RELKIND_RELATION.
>
> You are right.  I guess it would have to be the following:
>
> +if ((classform->relkind != relkind &&
> + classform->relkind != RELKIND_PARTITIONED_TABLE) ||
> +(classform->relkind == RELKIND_PARTITIONED_TABLE &&
> + relkind != RELKIND_RELATION))
>
> Such hackishness could not be helped because we have a separate DROP
> command for every distinct relkind, except we overload DROP TABLE for both
> regular and partitioned tables.

Maybe this would be better:

if (relkind == RELKIND_PARTITIONED_TABLE)
    syntax_relkind = RELKIND_RELATION;
else
    syntax_relkind = relkind;
if (classform->relkind != syntax_relkind)
    DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);

>> Why isn't this logic being invoked from transformCreateStmt()?  Then
>> we could use the actual parseState for the query instead of a fake
>> one.
>
> Because we need an open relation for it to work, which in this case there
> won't be until after we have performed heap_create_with_catalog() in
> DefineRelation().  Mainly because we need to perform transformExpr() on
> expressions.  That's similar to how cookConstraint() on the new CHECK
> constraints cannot be performed earlier.  Am I missing something?

Hmm, yeah, I guess that's the same thing.  I guess I got on this line
of thinking because the function 

Re: [HACKERS] Hash Indexes

2016-09-29 Thread Peter Geoghegan
On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund  wrote:
> On 2016-09-28 15:04:30 -0400, Robert Haas wrote:
>> Andres already
>> stated that he thinks working on btree-over-hash would be more
>> beneficial than fixing hash, but at this point it seems like he's the
>> only one who takes that position.
>
> Note that I did *NOT* take that position. I was saying that I think we
> should evaluate whether that's not a better approach, doing some simple
> performance comparisons.

I, for one, agree with this position.

-- 
Peter Geoghegan




Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I wouldn't even put a lot of faith in the errno being meaningful,
>> considering that it does close() calls before capturing the errno.

> So we do close() in a bunch of places while closing shop, which calls
> _close() on Windows; this function sets errno.

But only on failure, no?  The close()s usually shouldn't fail, and
therefore shouldn't change errno, it's just that you can't trust that
100%.

I think likely what's happening is that we're seeing a leftover value from
some previous syscall that set GetLastError's result (and, presumably,
wasn't fatal so far as pg_upgrade was concerned).

regards, tom lane




Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Tom Lane
Michael Paquier  writes:
> Oops. Are you using gcc to see that? I compiled with clang and on
> Windows without noticing it.

Peter already noted that you'd only see it on platforms that define
PG_FLUSH_DATA_WORKS.

regards, tom lane




Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Tom Lane
Alvaro Herrera  writes:
> Moreover I think getErrorText() as a whole is misconceived and should be
> removed altogether (why pstrdup the string?).

Indeed.  I think bouncing the error back to the caller is misguided
to start with, seeing that the caller is just going to do pg_fatal
anyway.  We should rewrite these functions to just error out internally,
which will make it much easier to provide decent error reporting
indicating which call failed.
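
A minimal sketch of that shape, using pg_upgrade's existing pg_fatal()
helper (the message wording is illustrative):

    if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ)
    {
        if (bytesRead < 0)
            pg_fatal("could not read file \"%s\": %s\n",
                     fromfile, strerror(errno));
        else
            pg_fatal("could not read file \"%s\": read %d of %d bytes\n",
                     fromfile, (int) bytesRead, BLCKSZ);
    }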

regards, tom lane




Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Tom Lane
I wrote:
> But what gets my attention in this connection is that it doesn't
> seem to be taking the trouble to open the files in binary mode.
> Could that lead to the reported failure?  Not sure, but it seems
> like at the least it could result in corrupted VM files.

On further thought, it definitely would lead to some sort of
hard-to-diagnose error.  Assuming there's some bytes that look
like \n, Windows read() would expand them to \r\n, which not
only produces garbage vismap data but throws off the page boundaries
in the file.  rewriteVisibilityMap() would not notice that, since
it contains exactly no sanity checking on the page headers,
but what it would notice is getting a short read on what it'll
think is a partial last page.  Then it does this:

    if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ)
    {
        close(dst_fd);
        close(src_fd);
        return getErrorText();
    }

The read() won't have set errno, since by its lights there's nothing
wrong.  So while we know that getErrorText saw errno == EINVAL, there's
no way to guess what call that might've been left over from.
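
At a minimum, a sketch that avoids the clobbering would capture errno
before the close() calls and not pretend a short read set it:

    if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ)
    {
        int     save_errno = (bytesRead < 0) ? errno : 0;

        close(dst_fd);
        close(src_fd);
        errno = save_errno;     /* 0 means a short read, not a syscall error */
        return getErrorText();
    }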

BTW, just to add to the already long list of mortal sins in this
bit of code, I believe it does 100% the wrong thing on big-endian
hardware.

    uint16      new_vmbits = 0;
    ...
    /* Copy new visibility map bit to new format page */
    memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK);

That's going to drop the two new bytes down in exactly the wrong
order if big-endian.
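
A byte-order-independent sketch stores the two bytes explicitly
(assuming new_vmbits packs the low-order bits first):

    new_cur[0] = (char) (new_vmbits & 0xFF);
    new_cur[1] = (char) (new_vmbits >> 8);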

Oh, and for even more fun, this bit will dump core on alignment-picky
machines, if the vmbuf local array starts on an odd boundary:

    ((PageHeader) new_vmbuf)->pd_checksum =
        pg_checksum_page(new_vmbuf, new_blkno);

We might've caught these things earlier if the buildfarm testing
included cross-version upgrades, but of course that requires a
lot of test infrastructure that's not there ...

regards, tom lane




Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Michael Paquier
On Fri, Sep 30, 2016 at 1:31 AM, Jeff Janes  wrote:
> On Thu, Sep 29, 2016 at 8:33 AM, Peter Eisentraut
>  wrote:
>>
>> On 9/26/16 10:34 PM, Michael Paquier wrote:
>> > I thought that as long as the error string is shown to the user, it
>> > does not matter much if errno is still saved or not. All the callers
>> > of durable_rename() on frontends don't check for strerrno(errno)
>> > afterwards. Do you think it matters? Changing that back is trivial.
>> > Sorry for not mentioning directly in the thread that I modified that
>> > when dropping the last patch set.
>>
>> Actually, I think the equivalent backend code only does this to save the
>> errno across the close call because the elog call comes after the close.
>>  So it's OK to not do that in the frontend.
>>
>> With that in mind, I have committed the v3 series now.

Thanks!

> I'm getting compiler warnings:
>
> file_utils.c: In function 'fsync_pgdata':
> file_utils.c:86: warning: passing argument 2 of 'walkdir' from incompatible
> pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char *)'
> but argument is of type 'void (*)(const char *, bool,  const char *)'
> file_utils.c:88: warning: passing argument 2 of 'walkdir' from incompatible
> pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char *)'
> but argument is of type 'void (*)(const char *, bool,  const char *)'
> file_utils.c:89: warning: passing argument 2 of 'walkdir' from incompatible
> pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char *)'
> but argument is of type 'void (*)(const char *, bool,  const char *)'

Oops. Are you using gcc to see that? I compiled with clang and on
Windows without noticing it.
-- 
Michael




Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Alvaro Herrera
Tom Lane wrote:
> Thomas Kellerer  writes:
> > for some reason pg_upgrade failed on Windows 10 for me, with an error 
> > message that one specifc _vm file couldn't be copied.
> 
> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
> code for 9.6 and hasn't really gotten that much testing.  Its error
> reporting is shamefully bad --- you can't tell which step failed, and
> I wouldn't even put a lot of faith in the errno being meaningful,
> considering that it does close() calls before capturing the errno.

So we do close() in a bunch of places while closing shop, which calls
_close() on Windows; this function sets errno.  Then we call
getErrorText(), which calls _dosmaperr() on the result of
GetLastError().  But the last-error value is not set by _close; I
suppose GetLastError() returns 0 in that case, which prompts
_dosmaperr() to set errno to 0.
http://stackoverflow.com/questions/20056851/getlasterror-errno-formatmessagea-and-strerror-s
So this wouldn't quite have the effect you say; I think it'd say
"Failure while copying ...: Success" instead.

However surely we should have errno save/restore.

Other than that, I think the _dosmaperr() call should go entirely.
Moreover I think getErrorText() as a whole is misconceived and should be
removed altogether (why pstrdup the string?).  There are very few places
in pg_upgrade that require _dosmaperr; I can see only copyFile and
linkFile.  All the others should just be doing strerror() only, at least
according to the manual.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

2016-09-29 Thread Tom Lane
Thomas Kellerer  writes:
> for some reason pg_upgrade failed on Windows 10 for me, with an error message 
> that one specifc _vm file couldn't be copied.

Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
code for 9.6 and hasn't really gotten that much testing.  Its error
reporting is shamefully bad --- you can't tell which step failed, and
I wouldn't even put a lot of faith in the errno being meaningful,
considering that it does close() calls before capturing the errno.

But what gets my attention in this connection is that it doesn't
seem to be taking the trouble to open the files in binary mode.
Could that lead to the reported failure?  Not sure, but it seems
like at the least it could result in corrupted VM files.

Has anyone tested vismap upgrades on Windows, and made an effort
to validate that the output wasn't garbage?

regards, tom lane




[HACKERS] Congrats on the on-time release!

2016-09-29 Thread Josh Berkus
Hackers,

I wanted to congratulate everyone involved (and it's a long list of
people) in having our first on-schedule major release since 9.3.
Especially the RMT, which did a lot to make that happen.

Getting the release train to run on time is a major accomplishment, and
will help both development and adoption.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Order of operations in SubPostmasterMain()

2016-09-29 Thread Andres Freund
On 2016-09-29 15:46:00 -0400, Tom Lane wrote:
> I noticed that buildfarm member culicidae, which is running an
> EXEC_BACKEND build on Linux, occasionally falls over like this:
> 
> FATAL:  could not reattach to shared memory (key=6280001, 
> addr=0x7fa9df845000): Invalid argument
> 
> That's probably because Andres failed to disable ASLR on that machine,

Intentionally so, FWIW.  I want to enable PIE as well soon-ish.

Newer versions of windows / msvc want to enable ASLR by default, and I
think that'll break parallel query bgworkers, because they pass the
entry point around as a pointer. So I don't think we'll be able to duck
this issue for much longer.


> but the exact cause isn't too important right now.  What I'm finding
> distressing is that that message doesn't show up on the client side,
> making it look like a postmaster crash:

Indeed.


> The reason we don't see anything is that backend libpq hasn't been
> initialized yet, so it won't send the ereport message to the client.
> 
> The original design of SubPostmasterMain() intended that such cases
> would be reported, because it runs BackendInitialize (which calls
> pq_init()) before CreateSharedMemoryAndSemaphores (which is where
> the reattach used to happen --- the comments at postmaster.c 4730ff
> and 4747ff reflect this expectation).  We broke it when we added
> the earlier reattach call at lines 4669ff.
> 
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach().  It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers.  I'm not quite
> sure if it's worth it; thoughts?  In any case the mentioned comments
> are obsolete and need to be moved/rewritten.

Hm.  It doesn't really seem necessary to directly allocate that
much. Seems pretty harmless to essentially let it start at 0 bytes (so
we can repalloc, but don't use a lot of memory)?


> Another thing that I'm finding distressing while I look at this is
> the placement of ClosePostmasterPorts().  That needs to happen *early*,
> because as long as the child process is holding open the postmaster side
> of the postmaster death pipe, we are risking unhappiness.  Not only is
> it not particularly early, it's after process_shared_preload_libraries()
> which could invoke arbitrarily stupid stuff.  Whether or not we do
> anything about moving pq_init(), I'm thinking we'd better move the
> ClosePostmasterPorts() call so that it's done as soon as possible,
> probably right after read_backend_variables() which loads the data
> it needs.

That seems like a good idea. Besides the benefit you mention, it looks
like that'll also reduce duplication.

Greetings,

Andres Freund




Re: [HACKERS] Order of operations in SubPostmasterMain()

2016-09-29 Thread Tom Lane
Greg Stark  writes:
> On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane  wrote:
>> We could probably refactor things enough so that we do pq_init()
>> before PGSharedMemoryReAttach().  It would be a little bit ugly,
>> and it would fractionally increase the chance of a reattach failure
>> because pq_init() palloc's a few KB worth of buffers.  I'm not quite
>> sure if it's worth it; thoughts?  In any case the mentioned comments
>> are obsolete and need to be moved/rewritten.

> Alternately we could call pq_init in the error path if it hasn't been
> called yet. I'm sure there are problems with doing that in general but
> for the specific errors that can happen before pq_init it might be
> feasible since they obviously can't have very much shared state yet to
> have corrupted.

Thanks for the suggestion, but I don't think it helps much.  Most of
the refactoring pain comes from the fact that we'd have to set up
MyProcPort earlier (since pqcomm.c needs that to be valid), and
it's not very clear just where to do that.  Having pq_init be called
behind the scenes does nothing to fix that issue. 

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Learning to hack Postgres - Keeping track of ctids

2016-09-29 Thread Emrul
Hi,

I'm working on an idea to implement a graph database in Postgres.  At the
moment it's just a learning exercise.

What I'd like to do is store all the links from one record in an array
column that holds references into the related tables.

So far I've succeeded in doing this using primary key ids, and it works
fine.  However, I'd like to be able to bypass the index lookup altogether by
storing ctids in my array instead of the primary key ids.

Trouble of course is that ctids can get changed (for instance by
vacuuming).  So my question is: how can I keep my ctid references up to date
- is there any way to detect when a ctid is changed?



--
View this message in context: 
http://postgresql.nabble.com/Learning-to-hack-Postgres-Keeping-track-of-ctids-tp5923649.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability

2016-09-29 Thread Peter Eisentraut
On 9/29/16 12:31 PM, Jeff Janes wrote:
> With that in mind, I have committed the v3 series now.
> 
> 
> I'm getting compiler warnings:

Fixed.

> 
> file_utils.c: In function 'fsync_pgdata':
> file_utils.c:86: warning: passing argument 2 of 'walkdir' from
> incompatible pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
> *)' but argument is of type 'void (*)(const char *, bool,  const char *)'
> file_utils.c:88: warning: passing argument 2 of 'walkdir' from
> incompatible pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
> *)' but argument is of type 'void (*)(const char *, bool,  const char *)'
> file_utils.c:89: warning: passing argument 2 of 'walkdir' from
> incompatible pointer type
> file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
> *)' but argument is of type 'void (*)(const char *, bool,  const char *)'
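
The warnings boil down to a callback-signature mismatch: walkdir() wants an
action that returns int, while the fsync helpers were declared void.  A
minimal sketch of how the two sides line up (illustrative only; the actual
fix may have changed either side):

typedef int (*walkdir_action) (const char *fname, bool isdir,
                               const char *progname);

/* a matching helper returns int (0 on success) instead of void: */
static int
fsync_fname_action(const char *fname, bool isdir, const char *progname)
{
    /* ... open and fsync 'fname', reporting errors via 'progname' ... */
    return 0;
}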

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Order of operations in SubPostmasterMain()

2016-09-29 Thread Greg Stark
On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane  wrote:
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach().  It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers.  I'm not quite
> sure if it's worth it; thoughts?  In any case the mentioned comments
> are obsolete and need to be moved/rewritten.


Just speaking off the cuff without reviewing the code in detail...

Alternately we could call pq_init in the error path if it hasn't been
called yet. I'm sure there are problems with doing that in general but
for the specific errors that can happen before pq_init it might be
feasible since they obviously can't have very much shared state yet to
have corrupted.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-09-29 Thread Peter Eisentraut
On 9/29/16 4:00 PM, Peter Eisentraut wrote:
> Since the commit fest is drawing to a close, I'll set this patch as
> returned with feedback.

Actually, the CF app informs me that moving to the next CF is more
appropriate, so I have done that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-09-29 Thread Peter Eisentraut
I think we should look into handling the different page types better.
The hash_page_stats function was copied from btree, which only has one
type.  It's not clear whether all the values apply to each page type.
At least they should be null if they don't apply.  BRIN has a separate
function for each page type, which might make more sense.  I suggest
taking the test suite that I posted and expanding the tests so that we
see output for each different page type.

Besides that, I would still like better data types for some of the
output columns, as previously discussed.  In addition to what I already
pointed out, the ctid column can be of type tid instead of text.
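
For instance, the items function could declare the heap pointer with the
proper type (a sketch only; the column list is abbreviated):

CREATE FUNCTION hash_page_items(IN page bytea,
    OUT itemoffset smallint,
    OUT ctid tid,        -- rather than text
    OUT data int8)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'hash_page_items'
LANGUAGE C STRICT;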

Since the commit fest is drawing to a close, I'll set this patch as
returned with feedback.  Please continue working on it, since there is
clearly renewed interest in hash indexes, and we'll need this type of
functionality for that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Order of operations in SubPostmasterMain()

2016-09-29 Thread Tom Lane
I noticed that buildfarm member culicidae, which is running an
EXEC_BACKEND build on Linux, occasionally falls over like this:

FATAL:  could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): 
Invalid argument

That's probably because Andres failed to disable ASLR on that machine,
but the exact cause isn't too important right now.  What I'm finding
distressing is that that message doesn't show up on the client side,
making it look like a postmaster crash:

*** 
/home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/contrib/intarray/expected/_int.out
 2016-09-15 22:02:39.512951557 +
--- 
/home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/contrib/intarray/results/_int.out
   2016-09-29 17:52:56.921948612 +
***
*** 1,568 
!  lots ...
--- 1,3 
! psql: server closed the connection unexpectedly
!   This probably means the server terminated abnormally
!   before or while processing the request.

The reason we don't see anything is that backend libpq hasn't been
initialized yet, so it won't send the ereport message to the client.

The original design of SubPostmasterMain() intended that such cases
would be reported, because it runs BackendInitialize (which calls
pq_init()) before CreateSharedMemoryAndSemaphores (which is where
the reattach used to happen --- the comments at postmaster.c 4730ff
and 4747ff reflect this expectation).  We broke it when we added
the earlier reattach call at lines 4669ff.

We could probably refactor things enough so that we do pq_init()
before PGSharedMemoryReAttach().  It would be a little bit ugly,
and it would fractionally increase the chance of a reattach failure
because pq_init() palloc's a few KB worth of buffers.  I'm not quite
sure if it's worth it; thoughts?  In any case the mentioned comments
are obsolete and need to be moved/rewritten.

Another thing that I'm finding distressing while I look at this is
the placement of ClosePostmasterPorts().  That needs to happen *early*,
because as long as the child process is holding open the postmaster side
of the postmaster death pipe, we are risking unhappiness.  Not only is
it not particularly early, it's after process_shared_preload_libraries()
which could invoke arbitrarily stupid stuff.  Whether or not we do
anything about moving pq_init(), I'm thinking we'd better move the
ClosePostmasterPorts() call so that it's done as soon as possible,
probably right after read_backend_variables() which loads the data
it needs.

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-09-29 Thread Peter Eisentraut
I wrote some tests for pageinspect, attached here (hash stuff in a
separate patch).  I think all the output numbers ought to be
deterministic (I excluded everything that might contain xids).  Please test.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Add-tests-for-pageinspect.patch
Description: invalid/octet-stream


0002-Add-tests-for-pageinspect-hash.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python adding support for multi-dimensional arrays

2016-09-29 Thread Heikki Linnakangas

On 09/23/2016 10:27 PM, Jim Nasby wrote:

On 9/23/16 2:42 AM, Heikki Linnakangas wrote:

How do we handle single-dimensional arrays of composite types at the
moment? At a quick glance, it seems that the composite types are just
treated like strings, when they're in an array. That's probably OK, but
it means that there's nothing special about composite types in
multi-dimensional arrays. In any case, we should mention that in the docs.


That is how they're handled, but I'd really like to change that. I've
held off because I don't know how to handle the backwards
incompatibility that would introduce. (I've been wondering if we might
add a facility to allow specifying default TRANSFORMs that should be
used for specific data types in specific languages.)

The converse case (a composite with arrays) suffers the same problem
(array is just treated as a string).


I take that back, I don't know what I was talking about. Without this 
patch, an array of composite types can be returned, using any of the 
three representations for the composite type explained in the docs: a 
string, a sequence, or a dictionary. So, all these work, and return the 
same value:


create table foo (a int4, b int4);

CREATE FUNCTION comp_array_string() RETURNS foo[] AS $$
return ["(1, 2)"]
$$ LANGUAGE plpythonu;

CREATE FUNCTION comp_array_sequence() RETURNS foo[] AS $$
return [[1, 2]]
$$ LANGUAGE plpythonu;

CREATE FUNCTION comp_array_dict() RETURNS foo[] AS $$
return [{"a": 1, "b": 2}]
$$ LANGUAGE plpythonu;

Jim, I was confused, but you agreed with me. Were you also confused, or 
am I missing something?


Now, back to multi-dimensional arrays. I can see that the Sequence 
representation is problematic, with arrays, because if you have a python 
list of lists, like [[1, 2]], it's not immediately clear if that's a 
one-dimensional array of tuples, or two-dimensional array of integers. 
Then again, we do have the type definitions available. So is it really 
ambiguous?
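
To spell the ambiguity out (a sketch reusing the table foo from above), the
same Python value means two different things depending only on the declared
return type:

CREATE FUNCTION comp_array_seq() RETURNS foo[] AS $$
# a one-dimensional array of one (a, b) tuple?
return [[1, 2]]
$$ LANGUAGE plpythonu;

CREATE FUNCTION int_array_seq() RETURNS int4[] AS $$
# or a two-dimensional array of integers?
return [[1, 2]]
$$ LANGUAGE plpythonu;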


The string and dict representations don't have that ambiguity at all, so 
I don't see why we wouldn't support those, at least.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-29 Thread Mithun Cy
This patch does not apply on the latest code. It fails as follows:
libpq-failover-9.patch:176: trailing whitespace.
thread.o pgsleep.o
libpq-failover-9.patch:184: trailing whitespace.
check:
libpq-failover-9.patch:185: trailing whitespace.
$(prove_check)
libpq-failover-9.patch:186: trailing whitespace.

libpq-failover-9.patch:194: trailing whitespace.
rm -rf tmp_check
error: patch failed: doc/src/sgml/libpq.sgml:792
error: doc/src/sgml/libpq.sgml: patch does not apply
error: patch failed: src/interfaces/libpq/Makefile:36
error: src/interfaces/libpq/Makefile: patch does not apply
error: patch failed: src/interfaces/libpq/fe-connect.c:299
error: src/interfaces/libpq/fe-connect.c: patch does not apply
error: patch failed: src/interfaces/libpq/libpq-fe.h:62
error: src/interfaces/libpq/libpq-fe.h: patch does not apply
error: patch failed: src/interfaces/libpq/libpq-int.h:334
error: src/interfaces/libpq/libpq-int.h: patch does not apply
error: patch failed: src/interfaces/libpq/test/expected.out:1
error: src/interfaces/libpq/test/expected.out: patch does not apply
error: patch failed: src/test/perl/PostgresNode.pm:398
error: src/test/perl/PostgresNode.pm: patch does not apply


On Tue, Sep 27, 2016 at 2:49 PM, Victor Wagner  wrote:
1).
>* if there is more than one host in the connect string and
>* target_server_type is not specified explicitly, set
>* target_server_type to "master", because default mode of
>* operation is failover, and so, we need to connect to RW
>* host, and keep other nodes of the cluster in the connect
>* string just in case master would fail and one of standbys
>* would be promoted to master.

I am slightly confused. As per this, target_server_type=master means the
user is expecting failover. What if the user passes target_server_type=any
and requests sequential connections - isn't this also a case for failover?
I think by default it should be "any" for any number of hosts, and the
parameters "sequential" and "random" will decide whether we want plain
failover or failover with load balancing.
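
For illustration, this is the kind of connect string under discussion
(option names as in the patch; the exact multi-host syntax may differ):

psql "host=node1,node2 port=5432 dbname=app target_server_type=any"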

2).

>For some reason DNS resolving was concentrated in one place before my
>changes. So, I've tried to not change this decision.

My intention was not to have a replacement function for
"pg_getaddrinfo_all"; I just suggested having a local function that calls
pg_getaddrinfo_all for every host/port pair read earlier. That way we need
not maintain the nodes struct, which also reduces the complexity of
connectDBStart.

FUNC(host, port, addrs)
{
    CALL pg_getaddrinfo_all(host, port, newaddrs);
    addrs->ai_next = newaddrs;
}

3).

I think you should run a spell check once. Also, there are some formatting
issues with respect to comments and the curly braces of controlled blocks
that need to be fixed.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-09-29 Thread Tom Lane
Corey Huinker  writes:
> [ file_fdw_program_v3.diff ]

Pushed with cosmetic adjustments, mostly more work on the comments and
documentation.

I did not push the proposed test case; it's unportable.  The upthread
suggestion to add a TAP test would have been all right, because
--enable-tap-tests requires Perl, but the basic regression tests must
not.  I'm a bit dubious that it'd be worth the work to create such a
test anyway, when COPY FROM PROGRAM itself hasn't got one.

What *would* be worth some effort is allowing ANALYZE on a file_fdw
table reading from a program.  I concur that that probably can't be
the default behavior, but always falling back to the 10-block default
with no pg_stats stats is a really horrid prospect.

One idea is to invent another table-level FDW option "analyze".
If we could make that default to true for files and false for programs,
it'd preserve the desired default behavior, but it would add a feature
for plain files too: if they're too unstable to be worth analyzing,
you could turn it off.
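
Under that scheme, usage might look like this (the analyze option is
hypothetical; program is the just-committed syntax):

CREATE FOREIGN TABLE input_log (line text)
    SERVER file_server
    OPTIONS (program 'gzip -dc /var/log/input.log.gz', analyze 'false');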

Another thought is that maybe manual ANALYZE should go through in any
case, and the FDW option would only be needed to control auto-analyze.
Although I'm not sure what to think about scripted cases like
vacuumdb --analyze.  Maybe we'd need two flags, one permitting explicit
ANALYZE and one for autoanalyze, which could have different defaults.

Another thing that felt a little unfinished was the cost estimation
behavior.  Again, it's not clear how to do any better by default,
but is it worth the trouble to provide an FDW option to let the user
set the cost estimate for reading the table?  I'm not sure honestly.
Since there's only one path for the FDW itself, the cost estimate
doesn't matter in simple cases, and I'm not sure how much it matters
even in more complicated ones.  It definitely sucks that we don't
have a rows estimate that has anything to do with reality, but allowing
ANALYZE would be enough to handle that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notice lock waits

2016-09-29 Thread Jeff Janes
On Wed, Sep 28, 2016 at 11:57 PM, Haribabu Kommi 
wrote:

>
>
> On Sat, Aug 6, 2016 at 3:00 AM, Jeff Janes  wrote:
>
>> One time too many, I ran some minor change using psql on a production
>> server and was wondering why it was taking so much longer than it did
>> on the test server.  Only to discover, after messing around with
>> opening new windows and running queries against pg_stat_activity and
>> pg_locks and so on, that it was waiting for a lock.
>>
>> So I created a new guc, notice_lock_waits, which acts like
>> log_lock_waits but sends the message as NOTICE so it will show up on
>> interactive connections like psql.
>>
>> I turn it on in my .psqlrc, as it doesn't make much sense for me to
>> turn it on in non-interactive sessions.
>>
>> A general facility for promoting selected LOG messages to NOTICE would
>> be nice, but I don't know how to design or implement that.  This is
>> much easier, and I find it quite useful.
>>
>> I have it PGC_SUSET because it does send some tiny amount of
>> information about the blocking process (the PID) to the blocked
>> process.  That is probably too paranoid, because the PID can be seen
>> by anyone in the pg_locks table anyway.
>>
>> Do you think this is useful and generally implemented in the correct
>> way?  If so, I'll try to write some sgml documentation for it.
>>
>
>
> Providing the details of a lock wait to the client is good. I feel this
> message is useful for the cases where a user/administrator is trying to
> perform some SQL operations.
>
> I also feel that adding a GUC variable for these logs to show them to the
> user may not be good. Changing the existing GUC may be better.
>

I don't think it would be a good idea to refactor the existing GUC
(log_lock_waits) to accomplish this.

There would have to be four states: log only, notice only, both log and
notice, and neither.  But non-superusers can't be allowed to change the
log flag, only the notice flag.  It is probably possible to implement that,
but it seems complicated both to implement, and to explain/document.  I
think that adding another GUC is better than greatly complicating an
existing one.
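
For context, the usage described upthread would look something like this
(the NOTICE text is hypothetical, patterned on the log_lock_waits message
format):

-- in ~/.psqlrc, with the patch applied:
SET notice_lock_waits = on;
-- a blocked statement then reports something like:
-- NOTICE:  process 12345 still waiting for AccessExclusiveLock on
--          relation 16384 after 1000.123 ms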

What do you think of Jim Nasby's idea of making a settable level, rather
just on or off?

Thanks,

Jeff


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 11:38 AM, Peter Geoghegan  wrote:
> On Thu, Sep 29, 2016 at 2:59 PM, Robert Haas  wrote:
>>> Maybe that was the wrong choice of words. What I mean is that it seems
>>> somewhat unprincipled to give over an equal share of memory to each
>>> active-at-least-once tape, ...
>>
>> I don't get it.  If the memory is being used for prereading, then the
>> point is just to avoid doing many small I/Os instead of one big I/O,
>> and that goal will be accomplished whether the memory is equally
>> distributed or not; indeed, it's likely to be accomplished BETTER if
>> the memory is equally distributed than if it isn't.
>
> I think it could hurt performance if preloading loads runs on a tape
> that won't be needed until some subsequent merge pass, in preference
> to using that memory proportionately, giving more to larger input runs
> for *each* merge pass (giving memory proportionate to the size of each
> run to be merged from each tape).  For tapes with a dummy run, the
> appropriate amount of memory for their next merge pass is zero.

OK, true.  But I still suspect that unless the amount of data you need
to read from a tape is zero or very small, the size of the buffer
doesn't matter.  For example, if you have a 1GB tape and a 10GB tape,
I doubt there's any benefit in making the buffer for the 10GB tape 10x
larger.  They can probably be the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking wait event for latches

2016-09-29 Thread Robert Haas
On Wed, Sep 28, 2016 at 9:40 PM, Michael Paquier
 wrote:
> On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas  wrote:
>> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier
>>  wrote:
>>> So should I change back the patch to have only one argument for the
>>> eventId, and guess classId from it?
>>
>> Why would you need to guess?
>
> Incorrect wording from me perhaps? I just meant that processing needs
> to know what is the classId coming for a specific eventId.
>
>> But, yes, I think one argument is much preferable.
>
> OK. Here is the wanted patch. I have reduced the routines of WaitLatch
> & friends to use only one argument, and added what is the classId
> associated with a given eventId in an array of multiple fields, giving
> something like that:
> + const struct wait_event_entry WaitEventEntries[] = {
> +   /* Activity */
> +   {WAIT_ACTIVITY, "ArchiverMain"},
> [...]
>
> I have cleaned up as well the inclusions of pgstat.h that I added
> previously. Patch is attached.

It seems to me that you haven't tested this patch very carefully,
because as far as I can see it breaks wait event reporting for both
heavyweight locks and buffer pins - or in other words two out of the
three existing cases covered by the mechanism you are patching.

The heavyweight lock portion is broken because WaitOnLock() reports a
Lock wait before calling ProcSleep(), which now clobbers it. Instead
of reporting that we're waiting for Lock/relation, a LOCK TABLE
statement that hangs now reports IPC/ProcSleep.  Similarly, a conflict
over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
any other kind of lock, which were all reported separately before.
Obviously, that's no good.  I think it would be just fine to push down
setting the wait event into ProcSleep(), doing it when we actually
WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
report that we're waiting for the heavyweight lock, not ProcSleep().

As for buffer pins, note that LockBufferForCleanup() calls
ProcWaitForSignal(), which now overwrites the wait event set in by its
caller.  I think we could just make ProcWaitForSignal() take a wait
event as an argument; then LockBufferForCleanup() could pass an
appropriate constant and other callers of ProcWaitForSignal() could
pass their own wait events.
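
In other words, something along these lines (a sketch; the constant name is
invented here):

/* proposed: the caller supplies the wait event to advertise */
extern void ProcWaitForSignal(uint32 wait_event_info);

/* LockBufferForCleanup() would then pass its own constant, e.g.: */
ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);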

The way that we're constructing the wait event ID that ultimately gets
advertised in pg_stat_activity is a bit silly in this patch.  We start
with an event ID (which is an integer) and we then do an array lookup
(via GetWaitEventClass) to get a second integer which is then XOR'd
back into the first integer (via pgstat_report_wait_start) before
storing it in the PGPROC.  New idea: let's change
pgstat_report_wait_start() to take a single integer argument which it
stores without interpretation.  Let's separate the construction of the
wait event into a separate macro, say make_wait_event().  Then
existing callers of pgstat_report_wait_start(a, b) will instead do
pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
to construct the wait event and push it through a few levels of the
call stack before advertising it only need to pass one integer.  If
done correctly, this should be cleaner and faster than what you've got
right now.
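
Concretely, that suggestion amounts to something like this (a sketch only;
the packing scheme and the PGPROC field name are assumptions, not code from
the patch):

#define make_wait_event(classId, eventId) \
	(((uint32) (classId) << 24) | ((uint32) (eventId) & 0x00FFFFFF))

static inline void
pgstat_report_wait_start(uint32 wait_event_info)
{
	MyProc->wait_event_info = wait_event_info;	/* stored as-is */
}

Callers that have both halves in hand write
pgstat_report_wait_start(make_wait_event(classId, eventId)); callers that
need to pass the value down the stack just hand around the single integer.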

I am not a huge fan of the "WE_" prefix you've used.  It's not
terrible, but "we" doesn't obviously stand for "wait event" and it's
also a word itself.  I think a little more verbosity would add
clarity.  Maybe we could go with WAIT_EVENT_ instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Jeff Janes
On Thu, Sep 29, 2016 at 8:33 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/26/16 10:34 PM, Michael Paquier wrote:
> > I thought that as long as the error string is shown to the user, it
> > does not matter much if errno is still saved or not. All the callers
> > of durable_rename() on frontends don't check for strerrno(errno)
> > afterwards. Do you think it matters? Changing that back is trivial.
> > Sorry for not mentioning directly in the thread that I modified that
> > when dropping the last patch set.
>
> Actually, I think the equivalent backend code only does this to save the
> errno across the close call because the elog call comes after the close.
>  So it's OK to not do that in the frontend.
>
> With that in mind, I have committed the v3 series now.
>

I'm getting compiler warnings:

file_utils.c: In function 'fsync_pgdata':
file_utils.c:86: warning: passing argument 2 of 'walkdir' from incompatible
pointer type
file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
*)' but argument is of type 'void (*)(const char *, bool,  const char *)'
file_utils.c:88: warning: passing argument 2 of 'walkdir' from incompatible
pointer type
file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
*)' but argument is of type 'void (*)(const char *, bool,  const char *)'
file_utils.c:89: warning: passing argument 2 of 'walkdir' from incompatible
pointer type
file_utils.c:36: note: expected 'int (*)(const char *, bool,  const char
*)' but argument is of type 'void (*)(const char *, bool,  const char *)'

Cheers,

Jeff


Re: [HACKERS] pageinspect: Hash index support

2016-09-29 Thread Jesper Pedersen

On 09/29/2016 11:58 AM, Peter Eisentraut wrote:

On 9/27/16 10:10 AM, Jesper Pedersen wrote:

contrib/pageinspect/pageinspect--1.5--1.6.sql |  59 
 contrib/pageinspect/pageinspect--1.5.sql  | 279 --
 contrib/pageinspect/pageinspect--1.6.sql  | 346 ++


I think there is still a misunderstanding about these extension files.
You only need to supply pageinspect--1.5--1.6.sql and leave all the
other ones alone.



Ok.

Left the 'DATA' section in the Makefile, and the control file as is.

Best regards,
 Jesper

From b48623beda6aad43116d46ff44de55bdc7547325 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 5 Aug 2016 10:16:32 -0400
Subject: [PATCH] pageinspect: Hash index support

---
 contrib/pageinspect/Makefile  |  10 +-
 contrib/pageinspect/hashfuncs.c   | 408 ++
 contrib/pageinspect/pageinspect--1.5--1.6.sql |  59 
 contrib/pageinspect/pageinspect.control   |   2 +-
 doc/src/sgml/pageinspect.sgml | 146 -
 5 files changed, 609 insertions(+), 16 deletions(-)
 create mode 100644 contrib/pageinspect/hashfuncs.c
 create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index a98237e..c73d728 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -2,13 +2,13 @@
 
 MODULE_big	= pageinspect
 OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
-		  brinfuncs.o ginfuncs.o $(WIN32RES)
+		  brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \
-	pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \
-	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \
-	pageinspect--unpackaged--1.0.sql
+DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \
+	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
+	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
+	pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 ifdef USE_PGXS
diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
new file mode 100644
index 000..a76b683
--- /dev/null
+++ b/contrib/pageinspect/hashfuncs.c
@@ -0,0 +1,408 @@
+/*
+ * hashfuncs.c
+ *		Functions to investigate the content of HASH indexes
+ *
+ * Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pageinspect/hashfuncs.c
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/htup_details.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+
+PG_FUNCTION_INFO_V1(hash_metapage_info);
+PG_FUNCTION_INFO_V1(hash_page_items);
+PG_FUNCTION_INFO_V1(hash_page_stats);
+
+/* ------------------------------------------------
+ * structure for single hash page statistics
+ * ------------------------------------------------
+ */
+typedef struct HashPageStat
+{
+	uint32		live_items;
+	uint32		dead_items;
+	uint32		page_size;
+	uint32		free_size;
+	char	   *type;
+
+	/* opaque data */
+	BlockNumber hasho_prevblkno;
+	BlockNumber hasho_nextblkno;
+	Bucket		hasho_bucket;
+	uint16		hasho_flag;
+	uint16		hasho_page_id;
+} HashPageStat;
+
+
+/*
+ * Verify that the given bytea contains a HASH page, or die in the attempt.
+ * A pointer to the page is returned.
+ */
+static Page
+verify_hash_page(bytea *raw_page, bool metap)
+{
+	Page		page;
+	int			raw_page_size;
+	HashPageOpaque pageopaque;
+
+	raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+	if (raw_page_size != BLCKSZ)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small"),
+ errdetail("Expected size %d, got %d",
+		   BLCKSZ, raw_page_size)));
+
+	page = VARDATA(raw_page);
+	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("page is not a HASH page"),
+ errdetail("Expected %08x, got %08x.",
+		   HASHO_PAGE_ID, pageopaque->hasho_page_id)));
+
+	if (metap)
+	{
+		if (pageopaque->hasho_flag != LH_META_PAGE)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("page is not a metadata page"),
+	 errdetail("Expected %08x, got %08x.",
+			   LH_META_PAGE, pageopaque->hasho_flag)));
+	}
+	else
+	{
+		if (pageopaque->hasho_flag == LH_META_PAGE)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("page is a metadata page"),
+	 errdetail("Flags %08x.",
+			   pageopaque->hasho_flag)));
+	}
+
+	return page;
+}
+
+/* -------------------------------------------------
+ * GetHashPageStatistics()
+ *
+ * Collect statistics of single hash page
+ * -------------------------------------------------
+ */
+static void
+GetHashPageStatistics(Page page, HashPageStat 

Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-09-29 Thread Ashutosh Bapat
>
>> The attached patch adds the
>> dependencies from create_foreignscan_plan() which is called for any
>> foreign access. I have also added testcases to test the functionality.
>> Let me know your comments on the patch.
>
>
> Hmm.  I'm not sure that that's a good idea.
>
> I was thinking the changes to setrefs.c proposed by Amit to collect that
> dependencies would be probably OK, but I wasn't sure that it's a good idea
> that he used PlanCacheFuncCallback as the syscache inval callback function
> for the foreign object caches because it invalidates not only generic plans
> but query trees, as you mentioned downthread.  So, I was thinking to modify
> his patch so that we add a new syscache inval callback function for the
> caches that is much like PlanCacheFuncCallback but only invalidates generic
> plans.

PlanCacheFuncCallback() invalidates the query tree only when
invalItems are added to the plan source. The patch adds the
dependencies in root->glob->invalItems, which standard_planner()
copies into PlannedStmt::invalItems. This is then copied into the
gplan->stmt_list. Thus PlanCacheFuncCallback never invalidates the
query tree. I have verified this under the debugger. Am I missing
something?
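
For readers following the thread, adding such a dependency during planning
looks roughly like this (a sketch; the patch's exact cache ids and call
site may differ):

PlanInvalItem *item = makeNode(PlanInvalItem);

item->cacheId = FOREIGNSERVEROID;
item->hashValue = GetSysCacheHashValue1(FOREIGNSERVEROID,
										ObjectIdGetDatum(serverid));
root->glob->invalItems = lappend(root->glob->invalItems, item);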

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-09-29 Thread Peter Eisentraut
On 9/27/16 10:10 AM, Jesper Pedersen wrote:
> contrib/pageinspect/pageinspect--1.5--1.6.sql |  59 
>  contrib/pageinspect/pageinspect--1.5.sql  | 279 --
>  contrib/pageinspect/pageinspect--1.6.sql  | 346 ++

I think there is still a misunderstanding about these extension files.
You only need to supply pageinspect--1.5--1.6.sql and leave all the
other ones alone.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-29 Thread Alvaro Herrera
Tom Lane wrote:
> Robert Haas  writes:
> > As long as we get %t and %p in there we're going to be way ahead, really.
> 
> Could we get consensus on just changing the default to
> 
>   log_line_prefix = '%t [%p] '
> 
> and leaving the rest out of it?

+1 from me.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2016 at 2:59 PM, Robert Haas  wrote:
>> Maybe that was the wrong choice of words. What I mean is that it seems
>> somewhat unprincipled to give over an equal share of memory to each
>> active-at-least-once tape, ...
>
> I don't get it.  If the memory is being used for prereading, then the
> point is just to avoid doing many small I/Os instead of one big I/O,
> and that goal will be accomplished whether the memory is equally
> distributed or not; indeed, it's likely to be accomplished BETTER if
> the memory is equally distributed than if it isn't.

I think it could hurt performance if preloading loads runs on a tape
that won't be needed until some subsequent merge pass, in preference
to using that memory proportionately, giving more to larger input runs
for *each* merge pass (giving memory proportionate to the size of each
run to be merged from each tape).  For tapes with a dummy run, the
appropriate amount of memory for their next merge pass is zero.

I'm not arguing that it would be worth it to do that, but I do think
that that's the sensible way of framing the idea of using a uniform
amount of memory to every maybe-active tape up front. I'm not too
concerned about this because I'm never too concerned about multiple
merge pass cases, which are relatively rare and relatively
unimportant. Let's just get our story straight.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-09-29 Thread Peter Eisentraut
On 9/26/16 10:34 PM, Michael Paquier wrote:
> I thought that as long as the error string is shown to the user, it
> does not matter much if errno is still saved or not. All the callers
> of durable_rename() on frontends don't check for strerrno(errno)
> afterwards. Do you think it matters? Changing that back is trivial.
> Sorry for not mentioning directly in the thread that I modified that
> when dropping the last patch set.

Actually, I think the equivalent backend code only does this to save the
errno across the close call because the elog call comes after the close.
 So it's OK to not do that in the frontend.

With that in mind, I have committed the v3 series now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall

2016-09-29 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 29, 2016 at 4:04 AM, Haribabu Kommi
>  wrote:
>> I am also not sure whether pg_dumpall -g and then individual pg_dump
>> is the more widely used approach or not?

> That's the approach I normally recommend.

The fundamental thing we have to do in order to move forward on this is
to rethink what's the division of labor between pg_dump and pg_dumpall.
I find the patch as presented quite unacceptable because it's made no
effort to do that (or even to touch the documentation).

What do people think of this sketch:

1. pg_dump without --create continues to do what it does today, ie it just
dumps objects within the database, assuming that database-level properties
will already be set correctly for the target database.

2. pg_dump with --create creates the target database and also sets all
database-level properties (ownership, ACLs, ALTER DATABASE SET, etc etc).

3. pg_dumpall loses all code relating to individual-database creation
and property setting and instead relies on pg_dump --create to do that.
This would leave only the code relating to "pg_dumpall -g" (ie, dump roles
and tablespaces) within pg_dumpall itself.

One thing that would still be messy is that presumably "pg_dumpall -g"
would issue ALTER ROLE SET commands, but it's unclear what to do with
ALTER ROLE IN DATABASE SET commands.  Should those become part of
"pg_dump --create"'s charter?  It seems like not, but I'm not certain.

Another thing that requires some thought is that pg_dumpall is currently
willing to dump ACLs and other properties for template1/template0, though
it does not invoke pg_dump on them.  If we wanted to preserve that
behavior while still moving the code that does those things to pg_dump,
pg_dump would have to grow an option that would let it do that.  But
I'm not sure how much of that behavior is actually sensible.

This would probably take a pg_dump archive version bump, since I think
we don't currently record enough information for --create to do this
(and we can't just cram the extra commands into the DATABASE entry,
since we don't know whether --create will be specified to pg_restore).
But we've done those before.

Thoughts?  Is there a better way to look at this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-29 Thread David Steele

On 9/28/16 10:32 PM, Michael Paquier wrote:

On Thu, Sep 29, 2016 at 7:45 AM, David Steele  wrote:


In general I agree with the other comments that this could end up being
a problem.  On the other hand, since the additional locks are only taken
at checkpoint or archive_timeout it may not be that big a deal.


Yes, I did some tests on my laptop a couple of months back, that has 4
cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
contention and performing a bunch of INSERT using 4 clients on 4
different relations I could not catch a difference.. Autovacuum was
disabled to eliminate any noise. I tried checkpoint_timeout at 30s to
see its effects, as well as larger values to see the impact with the
standby snapshot taken by the bgwriter. Other thoughts are welcome.


I don't have any better ideas than that.


+++ b/src/backend/postmaster/checkpointer.c
+   /* OK, it's time to switch */
+   elog(LOG, "Request XLog Switch");

LOG level seems a bit much here, perhaps DEBUG1?


That's from Horiguchi-san's patch, and those would be definitely
better as DEBUG1 by looking at it. Now and in order to keep things
simple I think that we had better discard this patch for now. I was
planning to come back to this thing anyway once we are done with the
first problem.


I still see this:

+++ b/src/backend/postmaster/checkpointer.c
/* OK, it's time to switch */
+   elog(LOG, "Request XLog Switch");


Well for now attached are two patches, that could just be squashed into one.


Yes, I think that makes sense.

More importantly, there is a regression.  With your new patch the xlogs 
are switching on archive_timeout again even with no changes.  The v12 
worked fine.


The differences are all in 0002-hs-checkpoints-v12-2.patch and as far as 
I can see the patch does not work correctly without these changes.  Am I 
missing something?


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Heikki Linnakangas

On 09/29/2016 05:41 PM, Heikki Linnakangas wrote:

Here's a new patch version, addressing the points you made. Please have
a look!


Bah, I fumbled the initSlabAllocator() function, attached is a fixed 
version.


- Heikki

From bd74cb9c32b3073637d6932f3b4552598fcdc92a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 14 Sep 2016 17:29:11 +0300
Subject: [PATCH 1/1] Change the way pre-reading in external sort's merge phase
 works.

Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.

Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE. In other words, it's used whenever we do
an external sort.

Reviewed by Peter Geoghegan and Claudio Freire.
---
 src/backend/utils/sort/logtape.c   |  153 -
 src/backend/utils/sort/tuplesort.c | 1208 +---
 src/include/utils/logtape.h|1 +
 3 files changed, 565 insertions(+), 797 deletions(-)

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 7745207..4152da1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -52,12 +52,17 @@
  * not clear this helps much, but it can't hurt.  (XXX perhaps a LIFO
  * policy for free blocks would be better?)
  *
+ * To further make the I/Os more sequential, we can use a larger buffer
+ * when reading, and read multiple blocks from the same tape in one go,
+ * whenever the buffer becomes empty. LogicalTapeAssignReadBufferSize()
+ * can be used to set the size of the read buffer.
+ *
  * To support the above policy of writing to the lowest free block,
  * ltsGetFreeBlock sorts the list of free block numbers into decreasing
  * order each time it is asked for a block and the list isn't currently
  * sorted.  This is an efficient way to handle it because we expect cycles
  * of releasing many blocks followed by re-using many blocks, due to
- * tuplesort.c's "preread" behavior.
+ * the larger read buffer.
  *
  * Since all the bookkeeping and buffer memory is allocated with palloc(),
  * and the underlying file(s) are made with OpenTemporaryFile, all resources
@@ -79,6 +84,7 @@
 
 #include "storage/buffile.h"
 #include "utils/logtape.h"
+#include "utils/memutils.h"
 
 /*
  * Block indexes are "long"s, so we can fit this many per indirect block.
@@ -131,9 +137,18 @@ typedef struct LogicalTape
 	 * reading.
 	 */
 	char	   *buffer;			/* physical buffer (separately palloc'd) */
+	int			buffer_size;	/* allocated size of the buffer */
 	long		curBlockNumber; /* this block's logical blk# within tape */
 	int			pos;			/* next read/write position in buffer */
 	int			nbytes;			/* total # of valid bytes in buffer */
+
+	/*
+	 * Desired buffer size to use when reading.  To keep things simple, we
+	 * use a single-block buffer when writing, or when reading a frozen
+	 * tape.  But when we are reading and will only read forwards, we
+	 * allocate a larger buffer, determined by read_buffer_size.
+	 */
+	int			read_buffer_size;
 } LogicalTape;
 
 /*
@@ -228,6 +243,53 @@ ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 }
 
 /*
+ * Read as many blocks as we can into the per-tape buffer.
+ *
+ * The caller can specify the next physical block number to read, in
+ * datablocknum, or -1 to fetch the next block number from the internal block.
+ * If datablocknum == -1, the caller must've already set curBlockNumber.
+ *
+ * Returns true if anything was read, 'false' on EOF.
+ */
+static bool
+ltsReadFillBuffer(LogicalTapeSet *lts, LogicalTape *lt, long datablocknum)
+{
+	lt->pos = 0;
+	lt->nbytes = 0;
+
+	do
+	{
+		/* Fetch next block number (unless provided by caller) */
+		if (datablocknum == -1)
+		{
+			datablocknum = ltsRecallNextBlockNum(lts, lt->indirect, lt->frozen);
+			if (datablocknum == -1L)
+break;			/* EOF */
+			lt->curBlockNumber++;
+		}
+
+		/* Read the block */
+		ltsReadBlock(lts, datablocknum, (void *) (lt->buffer + lt->nbytes));
+		if (!lt->frozen)
+			ltsReleaseBlock(lts, datablocknum);
+
+		if (lt->curBlockNumber < lt->numFullBlocks)
+			lt->nbytes += BLCKSZ;
+		else
+		{
+			/* EOF */
+			lt->nbytes += lt->lastBlockBytes;
+			break;
+		}
+
+		/* Advance to next block, if we have buffer space left */
+		datablocknum = -1;
+	} 

Re: [HACKERS] PL/Python adding support for multi-dimensional arrays

2016-09-29 Thread Dave Cramer
On 27 September 2016 at 14:58, Heikki Linnakangas  wrote:

> On 09/27/2016 02:04 PM, Dave Cramer wrote:
>
>> On 26 September 2016 at 14:52, Dave Cramer  wrote:
>>
>>> This crashes with arrays with non-default lower bounds:

 postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
 INFO:  ([1, 2, ], )
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.

 Attached patch fixes this bug, and adds a test for it.

>>>
> I spent some more time massaging this:
>
> * Changed the loops from iterative to recursive style. I think this indeed
> is slightly easier to understand.
>
> * Fixed another segfault, with too deeply nested lists:
>
> CREATE or replace FUNCTION test_type_conversion_mdarray_toodeep() RETURNS
> int[] AS $$
> return [[1]]
> $$ LANGUAGE plpythonu;
>
> * Also, in PLySequence_ToArray(), we must check that the 'len' of the
> array doesn't overflow.
>
> * Fixed reference leak in the loop in PLySequence_ToArray() to count the
> number of dimensions.
>
> I'd like to see some updates to the docs for this. The manual doesn't
> currently say anything about multi-dimensional arrays in pl/python, but it
> should've mentioned that they're not supported. Now that it is supported,
> it should mention that, and explain briefly that a multi-dimensional array
> is mapped to a python list of lists.
>
> If the code passes I'll fix the docs
>
> Please do, thanks!
>
>
see attached



Dave Cramer

da...@postgresintl.com
www.postgresintl.com


0002-WIP-Multi-dimensional-arrays-in-PL-python.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Indexes

2016-09-29 Thread Amit Kapila
On Thu, Sep 29, 2016 at 6:04 AM, Robert Haas  wrote:
> On Wed, Sep 28, 2016 at 3:04 PM, Robert Haas  wrote:
>> I'll write another email with my thoughts about the rest of the patch.
>
> I think that the README changes for this patch need a fairly large
> amount of additional work.  Here are a few things I notice:
>
> - The confusion between buckets and pages hasn't been completely
> cleared up.  If you read the beginning of the README, the terminology
> is clearly set forth.  It says:
>
>>> A hash index consists of two or more "buckets", into which tuples are 
>>> placed whenever their hash key maps to the bucket number.  Each bucket in 
>>> the hash index comprises one or more index pages.  The bucket's first page 
>>> is permanently assigned to it when the bucket is created. Additional pages, 
>>> called "overflow pages", are added if the bucket receives too many tuples 
>>> to fit in the primary bucket page."
>
> But later on, you say:
>
>>> Scan will take a lock in shared mode on the primary bucket or on one of the 
>>> overflow page.
>
> So the correct terminology here would be "primary bucket page" not
> "primary bucket".
>
> - In addition, notice that there are two English errors in this
> sentence: the word "the" needs to be added to the beginning of the
> sentence, and the last word needs to be "pages" rather than "page".
> There are a considerable number of similar minor errors; if you can't
> fix them, I'll make a pass over it and clean it up.
>
> - The whole "lock definitions" section seems to me to be pretty loose
> and imprecise about what is happening.  For example, it uses the term
> "split-in-progress" without first defining it.  The sentence quoted
> above says that scans take a lock in shared mode either on the primary
> page or on one of the overflow pages, but it's not OK to document code by
> saying that it will do either A or B without explaining which one!  In
> fact, I think that a scan will take a content lock first on the
> primary bucket page and then on each overflow page in sequence,
> retaining a pin on the primary buffer page throughout the scan.  So it
> is not one or the other but both in a particular sequence, and that
> can and should be explained.
>
> Another problem with this section is that even when it's precise about
> what is going on, it's probably duplicating what is or should be in
> the following sections where the algorithms for each operation are
> explained.  In the original wording, this section explains what each
> lock protects, and then the following sections explain the algorithms
> in the context of those definitions.  Now, this section contains a
> sketch of the algorithm, and then the following sections lay it out
> again in more detail.  The question of what each lock protects has
> been lost.  Here's an attempt at some text to replace what you have
> here:
>
> ===
> Concurrency control for hash indexes is provided using buffer content
> locks, buffer pins, and cleanup locks.   Here as elsewhere in
> PostgreSQL, cleanup lock means that we hold an exclusive lock on the
> buffer and have observed at some point after acquiring the lock that
> we hold the only pin on that buffer.  For hash indexes, a cleanup lock
> on a primary bucket page represents the right to perform an arbitrary
> reorganization of the entire bucket, while a cleanup lock on an
> overflow page represents the right to perform a reorganization of just
> that page.  Therefore, scans retain a pin on both the primary bucket
> page and the overflow page they are currently scanning, if any.
>

I don't think we take a cleanup lock on overflow pages, so I will edit that part.

> Splitting a bucket requires a cleanup lock on both the old and new
> primary bucket pages.  VACUUM therefore takes a cleanup lock on every
> bucket page in turn in order to remove tuples.  It can also remove tuples
> copied to a new bucket by any previous split operation, because the
> cleanup lock taken on the primary bucket page guarantees that no scans
> which started prior to the most recent split can still be in progress.
> After cleaning each page individually, it attempts to take a cleanup
> lock on the primary bucket page in order to "squeeze" the bucket down
> to the minimum possible number of pages.
> ===
>
> As I was looking at the old text regarding deadlock risk, I realized
> what may be a serious problem.  Suppose process A is performing a scan
> of some hash index.  While the scan is suspended, it attempts to take
> a lock and is blocked by process B.  Process B, meanwhile, is running
> VACUUM on that hash index.  Eventually, it will do
> LockBufferForCleanup() on the hash bucket on which process A holds a
> buffer pin, resulting in an undetected deadlock. In the current
> coding, A would hold a heavyweight lock and B would attempt to acquire
> a conflicting heavyweight lock, and the deadlock detector would kill
> one of them.  This patch probably breaks that.  I 

Re: [HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 4:04 AM, Haribabu Kommi
 wrote:
> I am also not sure whether pg_dumpall -g and then individual pg_dump
> is the more widely used approach or not?

That's the approach I normally recommend.
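
For reference, that workflow is roughly (database name illustrative):

pg_dumpall -g > globals.sql    # roles and tablespaces only
pg_dump -C mydb > mydb.sql     # repeat for each database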

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Heikki Linnakangas

On 09/29/2016 01:52 PM, Peter Geoghegan wrote:

* Variables like maxTapes have a meaning that is directly traceable
back to Knuth's description of polyphase merge. I don't think that you
should do anything to them, on general principle.


Ok. I still think that changing maxTapes would make sense, when there 
are fewer runs than tapes, but that is actually orthogonal to the rest 
of the patch, so let's discuss that separately. I've changed the patch 
to not do that.



* Everything or almost everything that you've added to mergeruns()
should probably be in its own dedicated function. This function can
have a comment where you acknowledge that it's not perfectly okay that
you claim memory per-tape, but it's simpler and faster overall.


Ok.


* I think you should be looking at the number of active tapes, and not
state->Level or state->currentRun in this new function. Actually,
maybe this wouldn't be the exact definition of an active tape that we
establish at the beginning of beginmerge() (this considers tapes with
dummy runs to be inactive for that merge), but it would look similar.
The general concern I have here is that you shouldn't rely on
state->Level or state->currentRun from a distance for the purposes of
determining which tapes need some batch memory. This is also where you
might say something like: "we don't bother with shifting memory around
tapes for each merge step, even though that would be fairer. That's
why we don't use the beginmerge() definition of activeTapes --
instead, we use our own broader definition of the number of active
tapes that doesn't exclude dummy-run-tapes with real runs, making the
allocation reasonably sensible for every merge pass".


I'm not sure I understood what your concern was, but please have a look 
at this new version and see whether the comments in initTapeBuffers() 
explain it well enough.



* The "batchUsed" terminology really isn't working here, AFAICT. For
one thing, you have two separate areas where caller tuples might
reside: The small per-tape buffers (sized MERGETUPLEBUFFER_SIZE per
tape), and the logtape.c controlled preread buffers (sized up to
MaxAllocSize per tape). Which of these two things is batch memory? I
think it might just be the first one, but KiBs of memory do not
suggest "batch" to me. Isn't that really more like what you might call
double buffer memory, used not to save overhead from palloc (having
many palloc headers in memory), but rather to recycle memory
efficiently? So, these two things should have two new names of their
own, I think, and neither should be called "batch memory" IMV. I see
several assertions remain here and there that were written with my
original definition of batch memory in mind -- assertions like:


Ok. I replaced "batch" terminology with "slab allocator" and "slab 
slots", I hope this is less confusing. This isn't exactly like e.g. the 
slab allocator in the Linux kernel, as it has a fixed number of slots, 
so perhaps an "object pool" might be more accurate. But I like "slab" 
because it's not used elsewhere in the system. I also didn't use the 
term "cache" for the "slots", as might be typical for slab allocators, 
because "cache" is such an overloaded term.



case TSS_SORTEDONTAPE:
Assert(forward || state->randomAccess);
Assert(!state->batchUsed);

(Isn't state->batchUsed always true now?)


Good catch. It wasn't, because mergeruns() set batchUsed only after 
checking for the TSS_SORTEDONTAPE case, even though it set up the batch 
memory arena before it. So if replacement selection was able to produce 
a single run, batchUsed was false. Fixed; the slab allocator (as it's now 
called) is now always used in the TSS_SORTEDONTAPE case.



And:

case TSS_FINALMERGE:
Assert(forward);
Assert(state->batchUsed || !state->tuples);

(Isn't state->tuples only really of interest to datum-tuple-case
routines, now that you've made everything use large logtape.c preread
buffers?)


Yeah, fixed.


* Is it really necessary for !state->tuples cases to do that
MERGETUPLEBUFFER_SIZE-based allocation? In other words, what need is
there for pass-by-value datum cases to have this scratch/double buffer
memory at all, since the value is returned to caller by-value, not
by-reference? This is related to the problem of it not being entirely
clear what batch memory now is, I think.


True, fixed. I still set slabAllocatorUsed (was batchUsed), but it's 
initialized as a dummy 0-slot arena when !state->tuples.


Here's a new patch version, addressing the points you made. Please have 
a look!


- Heikki

From a958cea32550825aa0ea487f58ac87c2c3620cda Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 14 Sep 2016 17:29:11 +0300
Subject: [PATCH 1/1] Change the way pre-reading in external sort's merge phase
 works.

Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number

Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 29, 2016 at 4:54 AM, amul sul  wrote:
>> Commitfest status left untouched; I am not sure how to deal with a
>> "Returned With Feedback" patch. Is there any chance that this can be
>> considered again for committer review?

> You may be able to set the status back to "Ready for Committer", or
> else you can move the entry to the next CommitFest and do it there.

I already switched it back to "Needs Review".  AFAIK none of the statuses
are immovable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-29 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane 2016-09-29 <16946.1475157...@sss.pgh.pa.us>
>> Personally I'm also on board with using this for regression testing:
>> log_line_prefix = '%t [%p] %q%a '
>> but I doubt it can be sold as a general-purpose default.

> I don't think it makes much sense to log only %a unconditionally,

Right; this is helpful for the regression tests, now that Peter has set
up pg_regress to set the application name, but I can't see trying to
push it on the rest of the world.

> Possibly the longer version could be added as an example in the
> documentation.

I suspect that simply having a nonempty default in the first place
is going to do more to raise peoples' awareness than anything we
could do in the documentation.  But perhaps an example along these
lines would be useful for showing proper use of %q.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 10:14 AM, Tomas Vondra
 wrote:
>> It's not impossible that the longer runs could matter - performance
>> isn't necessarily stable across time during a pgbench test, and the
>> longer the run the more CLOG pages it will fill.
>
> Sure, but I'm not doing just a single pgbench run. I do a sequence of
> pgbench runs, with different client counts, with ~6h of total runtime.
> There's a checkpoint in between the runs, but as those benchmarks are on
> unlogged tables, that flushes only very few buffers.
>
> Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M
> transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64
> clients or more, which means we fill the 128 pages in less than 80 seconds.
>
> So half-way through the run only 50% of clog pages fit into the SLRU, and
> we have a data set with 30M tuples, with uniform random access - so it seems
> rather unlikely we'll get a transaction that's still in the SLRU.
>
> But sure, I can do a run with larger data set to verify this.

OK, another theory: Dilip is, I believe, reinitializing for each run,
and you are not.  Maybe somehow the effect Dilip is seeing only
happens with a newly-initialized set of pgbench tables.  For example,
maybe the patches cause a huge improvement when all rows have the same
XID, but the effect fades rapidly once the XIDs spread out...

I'm not saying any of what I'm throwing out here is worth the
electrons upon which it is printed, just that there has to be some
explanation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 4:54 AM, amul sul  wrote:
> On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane  wrote:
>> Artur Zakirov  writes:
>>> - now DCH_cache_getnew() is called after parse_format(). Because now
>>> parse_format() can raise an error and in the next attempt
>>> DCH_cache_search() could return a broken cache entry.
>>
>> I started looking at your 0001-to-timestamp-format-checking-v4.patch
>> and this point immediately jumped out at me.  Currently the code relies
>> ... without any documentation ... on no elog being thrown out of
>> parse_format().  That's at the very least trouble waiting to happen.
>> There's a hack to deal with errors from within the NUMDesc_prepare
>> subroutine, but it's a pretty ugly and underdocumented hack.  And what
>> you had here was randomly different from that solution, too.
>>
>> After a bit of thought it seemed to me that a much cleaner fix is to add
>> a "valid" flag to the cache entries, which we can leave clear until we
>> have finished parsing the new format string.  That avoids adding extra
>> data copying as you suggested, removes the need for PG_TRY, and just
>> generally seems cleaner and more bulletproof.
>>
>> I've pushed a patch that does it that way.  The 0001 patch will need
>> to be rebased over that (might just require removal of some hunks,
>> not sure).
>>
>> I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
>> (it'd broken acceptance of BC dates, among other things, but I think
>> I fixed everything).
>>
>> Since you told us earlier that you'd be on vacation through the end of
>> September, I'm assuming that nothing more will happen on this patch during
>> this commitfest, so I will mark the CF entry Returned With Feedback.
>
> On behalf of Artur I've rebased the patch and removed the hunk dealing with
> broken cache entries by copying them, which is no longer required after
> commit 83bed06.
>
> Commitfest status left untouched; I am not sure how to deal with a
> "Returned With Feedback" patch. Is there any chance that this can be
> considered again for committer review?

You may be able to set the status back to "Ready for Committer", or
else you can move the entry to the next CommitFest and do it there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-29 Thread Christoph Berg
Re: Tom Lane 2016-09-29 <16946.1475157...@sss.pgh.pa.us>
> Robert Haas  writes:
> > As long as we get %t and %p in there we're going to be way ahead, really.
> 
> Could we get consensus on just changing the default to
> 
>   log_line_prefix = '%t [%p] '
> 
> and leaving the rest out of it?  I think pretty much everybody agrees
> that those fields are useful, whereas the rest of it is a lot more
> context-dependent.  (For example, there are a whole lot of real
> installations where neither %u nor %d would be worth the log space.)

Nod. In many installations %u and %d will be effectively constants.

> Personally I'm also on board with using this for regression testing:
> 
>   log_line_prefix = '%t [%p] %q%a '
> 
> but I doubt it can be sold as a general-purpose default.

I don't think it makes much sense to log only %a unconditionally,
except maybe for making more applications actually set it.

If we were to add more fields I'd go for

log_line_prefix = '%t [%p] %q%u@%d/%a '

but the above-proposed '%t [%p] ' is already fixing 95% of the problem
(and the remaining 5% are unclear).
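
For illustration, sample lines under the two variants (timestamp and
PID made up):

'%t [%p] ':
2016-09-29 12:34:56 CEST [12345] LOG:  checkpoint starting: time

'%t [%p] %q%u@%d/%a ':
2016-09-29 12:34:56 CEST [12345] postgres@regression/psql LOG:  statement: SELECT 1;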

Possibly the longer version could be added as an example in the
documentation.

So, in short, +1.

Christoph


signature.asc
Description: PGP signature


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Tomas Vondra

On 09/29/2016 03:47 PM, Robert Haas wrote:

On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra
 wrote:

I feel like we must be missing something here.  If Dilip is seeing
huge speedups and you're seeing nothing, something is different, and
we don't know what it is.  Even if the test case is artificial, it
ought to be the same when one of you runs it as when the other runs
it.  Right?


Yes, definitely - we're missing something important, I think. One difference
is that Dilip is using longer runs, but I don't think that's a problem (as I
demonstrated how stable the results are).


It's not impossible that the longer runs could matter - performance
isn't necessarily stable across time during a pgbench test, and the
longer the run the more CLOG pages it will fill.



Sure, but I'm not doing just a single pgbench run. I do a sequence of 
pgbench runs, with different client counts, with ~6h of total runtime. 
There's a checkpoint in between the runs, but as those benchmarks are on 
unlogged tables, that flushes only very few buffers.


Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M 
transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64 
clients or more, which means we fill the 128 pages in less than 80 seconds.
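
(To spell out the arithmetic, assuming the standard 8kB page and 2 
status bits per transaction: 8192 bytes/page * 4 xacts/byte = 32768 
xacts/page; times 128 pages that is 4194304 xacts tracked in 1MB of 
clog. At 50k tps, 4194304 / 50000 =~ 84 seconds, and less at the higher 
rates above.)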


So half-way through the run only 50% of clog pages fit into the SLRU, 
and we have a data set with 30M tuples, with uniform random access - so 
it seems rather unlikely we'll get a transaction that's still in the SLRU.


But sure, I can do a run with larger data set to verify this.


I wonder what CPU model Dilip is using - I know it's x86, but not which
generation. I'm using an E5-4620 v1 Xeon; perhaps Dilip is using a newer
model and it makes a difference (although that seems unlikely).


The fact that he's using an 8-socket machine seems more likely to
matter than the CPU generation, which isn't much different.  Maybe
Dilip should try this on a 2-socket machine and see if he sees the
same kinds of results.



Maybe. I wouldn't expect a major difference between 4 and 8 sockets, but 
I may be wrong.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-29 Thread Christoph Berg
Re: Peter Eisentraut 2016-09-29 
<21d2719f-36ff-06d2-5856-25ed48b96...@2ndquadrant.com>
> > Christoph/Debian:
> > log_line_prefix = '%t [%p-%l] %q%u@%d '
> > Peter:
> > log_line_prefix = '%t [%p]: [%l] %qapp=%a '
> 
> I'm aware of two existing guidelines on log line formats: syslog and
> pgbadger.  Syslog output looks like this:
> 
> Sep 28 00:58:56 hostname syslogd[46]: some text here
> 
> pgbadger by default asks for this:
> 
> log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
> 
> I don't know why it wants that "-1" there, and I'm actually not sure
> what the point of %l is in practice.  Those are separate issues that are
> having their own lively discussions at times.  I could drop the [%l]
> from my proposal if that causes concerns.

[%l-1] is originally from pgfouine, and I vaguely remember that it
used to be something like [%l-%c] where %c was some extra line
numbering removed in later (7.something?) PG versions. In any case,
the -1 isn't useful.

I'm happy to remove %l as well. Log lines won't be out of order
anyway, and one needs to look at %p to correlate them; %l
doesn't help there.

Christoph


signature.asc
Description: PGP signature


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-29 Thread Tom Lane
Robert Haas  writes:
> As long as we get %t and %p in there we're going to be way ahead, really.

Could we get consensus on just changing the default to

log_line_prefix = '%t [%p] '

and leaving the rest out of it?  I think pretty much everybody agrees
that those fields are useful, whereas the rest of it is a lot more
context-dependent.  (For example, there are a whole lot of real
installations where neither %u nor %d would be worth the log space.)

Personally I'm also on board with using this for regression testing:

log_line_prefix = '%t [%p] %q%a '

but I doubt it can be sold as a general-purpose default.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 6:52 AM, Peter Geoghegan  wrote:
>> How is it awkward?
>
> Maybe that was the wrong choice of words. What I mean is that it seems
> somewhat unprincipled to give over an equal share of memory to each
> active-at-least-once tape, ...

I don't get it.  If the memory is being used for prereading, then the
point is just to avoid doing many small I/Os instead of one big I/O,
and that goal will be accomplished whether the memory is equally
distributed or not; indeed, it's likely to be accomplished BETTER if
the memory is equally distributed than if it isn't.

I can imagine that there might be a situation in which it makes sense
to give a bigger tape more resources than a smaller one; for example,
if one were going to divide N tapes across K worker processess and
make individual workers or groups of workers responsible for sorting
particular tapes, one would of course want to divide up the data
across the available processes relatively evenly, rather than having
some workers responsible for only a small amount of data and others
for a very large amount of data.  But such considerations are
irrelevant here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-29 Thread Robert Haas
On Wed, Sep 28, 2016 at 10:30 PM, Peter Eisentraut
 wrote:
> On 9/28/16 6:13 PM, Robert Haas wrote:
>> Christoph/Debian:
>> log_line_prefix = '%t [%p-%l] %q%u@%d '
>> Peter:
>> log_line_prefix = '%t [%p]: [%l] %qapp=%a '
>
> I'm aware of two existing guidelines on log line formats: syslog and
> pgbadger.  Syslog output looks like this:
>
> Sep 28 00:58:56 hostname syslogd[46]: some text here
>
> pgbadger by default asks for this:
>
> log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
>
> I don't know why it wants that "-1" there, and I'm actually not sure
> what the point of %l is in practice.  Those are separate issues that are
> having their own lively discussions at times.  I could drop the [%l]
> from my proposal if that causes concerns.
>
> On balance, I think my proposal is more in line with existing
> wide-spread conventions.

As long as we get %t and %p in there we're going to be way ahead, really.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Robert Haas
On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra
 wrote:
>> I feel like we must be missing something here.  If Dilip is seeing
>> huge speedups and you're seeing nothing, something is different, and
>> we don't know what it is.  Even if the test case is artificial, it
>> ought to be the same when one of you runs it as when the other runs
>> it.  Right?
>>
> Yes, definitely - we're missing something important, I think. One difference
> is that Dilip is using longer runs, but I don't think that's a problem (as I
> demonstrated how stable the results are).

It's not impossible that the longer runs could matter - performance
isn't necessarily stable across time during a pgbench test, and the
longer the run the more CLOG pages it will fill.

> I wonder what CPU model Dilip is using - I know it's x86, but not which
> generation. I'm using an E5-4620 v1 Xeon; perhaps Dilip is using a newer
> model and it makes a difference (although that seems unlikely).

The fact that he's using an 8-socket machine seems more likely to
matter than the CPU generation, which isn't much different.  Maybe
Dilip should try this on a 2-socket machine and see if he sees the
same kinds of results.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql infinite loop

2016-09-29 Thread Tom Lane
郭 勇  writes:
> I have a PostgreSQL infinite loop problem.
> When inserting data, a PostgreSQL process enters an infinite loop and its CPU
> usage reaches 100%. perf top shows that LWLockAcquire is the most costly
> function. The call stack is below.

Given that stack trace, I'd have to guess that this is caused by a
circular chain of right-links in a btree index.  How it got that
way is hard to say, but reindexing the index ought to fix it.
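
Something along these lines, with the real index name substituted (the
name below is made up based on your stack trace):

    REINDEX INDEX token_pkey;
    -- or, to rebuild every index on the table:
    REINDEX TABLE token;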

If you can show a recipe for producing an index that's broken
this way from a standing start, we'd be very interested.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgresql infinite loop

2016-09-29 Thread 郭 勇

I have a PostgreSQL infinite loop problem.
When inserting data, a PostgreSQL process enters an infinite loop and its CPU
usage reaches 100%. perf top shows that LWLockAcquire is the most costly
function. The call stack is below.
Other processes are normal and can handle insert operations,
but after some minutes another process enters the infinite loop.

I can reproduce the problem easily. Within about an hour, one of the
processes goes into this abnormal state.


The PostgreSQL version is 9.2.15.
PostgreSQL is running in a container whose OS is CentOS 7.2.1511; the OS of
the physical machine is CentOS 7.1.1503.


#0  UnpinBuffer (buf=0x7fb47d11bdf8, fixOwner=<optimized out>) at bufmgr.c:1201
#1  0x0063947d in ReleaseAndReadBuffer (buffer=buffer@entry=2463, 
relation=relation@entry=0x7fb47cc58f98, blockNum=423) at bufmgr.c:1089
#2  0x00482b6f in _bt_relandgetbuf (rel=rel@entry=0x7fb47cc58f98, 
obuf=obuf@entry=2463, blkno=<optimized out>, access=access@entry=1) at 
nbtpage.c:639
#3  0x004863d8 in _bt_moveright (rel=rel@entry=0x7fb47cc58f98, 
buf=2463, keysz=keysz@entry=1, scankey=scankey@entry=0x178a350, 
nextkey=nextkey@entry=0 '\000', access=access@entry=1)
at nbtsearch.c:194
#4  0x0048669f in _bt_search (rel=rel@entry=0x7fb47cc58f98, 
keysz=keysz@entry=1, scankey=scankey@entry=0x178a350, nextkey=nextkey@entry=0 
'\000', bufP=bufP@entry=0x7fff0e5f8064,
access=access@entry=2) at nbtsearch.c:86
#5  0x0048249f in _bt_doinsert (rel=rel@entry=0x7fb47cc58f98, 
itup=itup@entry=0x1788200, checkUnique=checkUnique@entry=UNIQUE_CHECK_YES, 
heapRel=heapRel@entry=0x7fb47cc581d8)
at nbtinsert.c:118
#6  0x0048520d in btinsert (fcinfo=<optimized out>) at nbtree.c:257
#7  0x00728cfa in FunctionCall6Coll (flinfo=flinfo@entry=0x1758040, 
collation=collation@entry=0, arg1=arg1@entry=140413164162968, 
arg2=arg2@entry=140733434529104,
arg3=arg3@entry=140733434529360, arg4=arg4@entry=24658596, 
arg5=arg5@entry=140413164159448, arg6=1) at fmgr.c:1440
#8  0x0047f109 in index_insert 
(indexRelation=indexRelation@entry=0x7fb47cc58f98, 
values=values@entry=0x7fff0e5f8550, isnull=isnull@entry=0x7fff0e5f8650 "",
heap_t_ctid=heap_t_ctid@entry=0x17842a4, 
heapRelation=heapRelation@entry=0x7fb47cc581d8, 
checkUnique=checkUnique@entry=UNIQUE_CHECK_YES) at indexam.c:216
#9  0x0058db1d in ExecInsertIndexTuples (slot=slot@entry=0x1751b30, 
tupleid=tupleid@entry=0x17842a4, estate=estate@entry=0x1750fc0) at 
execUtils.c:1089
#10 0x0059adc7 in ExecInsert (canSetTag=1 '\001', estate=0x1750fc0, 
planSlot=0x1751b30, slot=0x1751b30) at nodeModifyTable.c:248
#11 ExecModifyTable (node=node@entry=0x17511e0) at nodeModifyTable.c:854
#12 0x00584478 in ExecProcNode (node=node@entry=0x17511e0) at 
execProcnode.c:376
#13 0x00581c40 in ExecutePlan (dest=0x176f238, direction=<optimized out>, numberTuples=0, sendTuples=0 '\000', operation=CMD_INSERT, 
planstate=0x17511e0, estate=0x1750fc0) at execMain.c:1411
#14 standard_ExecutorRun (queryDesc=0x1758850, direction=<optimized out>, 
count=0) at execMain.c:315
#15 0x0065e3ae in ProcessQuery (plan=<optimized out>,
sourceText=0x177ba80 "INSERT INTO token (id, expires, extra, valid, 
user_id, trust_id) VALUES ('d810c3f542414cfa86778249be70ab93', 
'2016-09-26T06:32:46.097976'::timestamp, '{\"token_data\": {\"token\": 
{\"methods\": [\"password\"]"..., params=0x0, dest=0x176f238, 
completionTag=0x7fff0e5f8c80 "") at pquery.c:185
#16 0x0065e5dd in PortalRunMulti (portal=portal@entry=0x1638380, 
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x176f238, 
altdest=altdest@entry=0x176f238,
completionTag=completionTag@entry=0x7fff0e5f8c80 "") at pquery.c:1275
#17 0x0065f1a8 in PortalRun (portal=0x1638380, 
count=9223372036854775807, isTopLevel=<optimized out>, dest=0x176f238, 
altdest=0x176f238, completionTag=0x7fff0e5f8c80 "") at pquery.c:812
#18 0x0065b100 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1073
#19 0x006182c6 in BackendRun (port=0x163e9a0) at postmaster.c:3813
#20 BackendStartup (port=0x163e9a0) at postmaster.c:3484
#21 ServerLoop () at postmaster.c:1497
#22 0x00619077 in PostmasterMain (argc=argc@entry=1, 
argv=argv@entry=0x1616690) at postmaster.c:1226
#23 0x0045bb45 in main (argc=1, argv=0x1616690) at main.c:230




Re: [HACKERS] WAL logging problem in 9.4.3?

2016-09-29 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 29 Sep 2016 16:59:55 +0900, Michael Paquier  
wrote in 
> On Mon, Sep 26, 2016 at 5:03 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello, I return to this before my things:)
> >
> > Though I haven't played with the patch yet..
> 
> Be sure to run the test cases in the patch or base your tests on them then!

All items of 006_truncate_opt fail on ed0b228 and they are fixed
with the patch.

> > Though I don't know how it actually impacts the perfomance, it
> > seems to me that we can live with truncated_to and sync_above in
> > RelationData and BufferNeedsWAL(rel, buf) instead of
> > HeapNeedsWAL(rel, buf).  Anyway up to one entry for one relation
> > seems to exist at once in the hash.
> 
> TBH, I still think that the design of this patch as proposed is pretty
> cool and easy to follow.

It is clean from a certain viewpoint, but the additional hash,
especially the hash search on every HeapNeedsWAL call, seems
unacceptable to me. Do you see it as acceptable?


The attached patch is a quick-and-dirty hack of Michael's patch,
just as a PoC of my proposal quoted above. This also passes the
006 test.  The major changes are the following.

- Moved sync_above and truncated_to into RelationData.

- Cleaning up is done in AtEOXact_cleanup instead of explicit
  calling to smgrDoPendingSyncs().

* BufferNeedsWAL (a replacement for HeapNeedsWAL) no longer requires a
  hash search. It just refers to the additional members in the given
  Relation. (A rough sketch of the idea appears after this list.)

X I feel that I have dropped one of the features of the original
  patch during the hack, but I don't recall it clearly now :(

X I haven't considered relfilenode replacement, which didn't matter
  for the original patch (but there are a few places to consider).
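
As a rough sketch of the BufferNeedsWAL idea above (not the exact code
in the attached patch; the InvalidBlockNumber sentinel is my
assumption):

bool
BufferNeedsWAL(Relation rel, Buffer buf)
{
	if (!RelationNeedsWAL(rel))
		return false;

	/* no pending sync registered for this relation: log as usual */
	if (rel->sync_above == InvalidBlockNumber)
		return true;

	/* blocks from sync_above onward are fsync'd at COMMIT instead */
	return BufferGetBlockNumber(buf) < rel->sync_above;
}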

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 38bba16..02e33cc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -34,6 +34,28 @@
  *	  the POSTGRES heap access method used for all POSTGRES
  *	  relations.
  *
+ * WAL CONSIDERATIONS
+ *	  All heap operations are normally WAL-logged, but there are a few
+ *	  exceptions. Temporary and unlogged relations never need to be
+ *	  WAL-logged, but we can also skip WAL-logging for a table that was
+ *	  created in the same transaction, if we don't need WAL for PITR or
+ *	  WAL archival purposes (i.e. if wal_level=minimal), and we fsync()
+ *	  the file to disk at COMMIT instead.
+ *
+ *	  The same-relation optimization is not employed automatically on all
+ *	  updates to a table that was created in the same transaction, because
+ *	  for a small number of changes, it's cheaper to just create the WAL
+ *	  records than fsyncing() the whole relation at COMMIT. It is only
+ *	  worthwhile for (presumably) large operations like COPY, CLUSTER,
+ *	  or VACUUM FULL. Use heap_register_sync() to initiate such an
+ *	  operation; it will cause any subsequent updates to the table to skip
+ *	  WAL-logging, if possible, and cause the heap to be synced to disk at
+ *	  COMMIT.
+ *
+ *	  To make that work, all modifications to heap must use
+ *	  HeapNeedsWAL() to check if WAL-logging is needed in this transaction
+ *	  for the given block.
+ *
  *-
  */
 #include "postgres.h"
@@ -55,6 +77,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/storage.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -2331,12 +2354,6 @@ FreeBulkInsertState(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2440,7 +2457,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
+	if (BufferNeedsWAL(relation, buffer))
 	{
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
@@ -2639,12 +2656,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	int			ndone;
 	char	   *scratch = NULL;
 	Page		page;
-	bool		needwal;
 	Size		saveFreeSpace;
 	bool		need_tuple_data = RelationIsLogicallyLogged(relation);
 	bool		need_cids = RelationIsAccessibleInLogicalDecoding(relation);
 
-	needwal 

Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-09-29 Thread Etsuro Fujita

On 2016/09/29 20:03, Ashutosh Bapat wrote:

On Tue, Apr 5, 2016 at 10:54 AM, Amit Langote
 wrote:

How about the attached that teaches
extract_query_dependencies() to add a foreign table and associated foreign
data wrapper and foreign server to invalItems.  Also, it adds plan cache
callbacks for respective caches.



Although this is a better solution than the previous one, it
invalidates the query tree along with the generic plan. Invalidating
the query tree and the generic plan requires parsing the query again
and regenerating the plan. Instead, I think we should invalidate only
the generic plan and not the query tree.


Agreed.


The attached patch adds the
dependencies from create_foreignscan_plan() which is called for any
foreign access. I have also added testcases to test the functionality.
Let me know your comments on the patch.


Hmm.  I'm not sure that that's a good idea.

I was thinking the changes to setrefs.c proposed by Amit to collect those 
dependencies would probably be OK, but I wasn't sure it was a good 
idea that he used PlanCacheFuncCallback as the syscache inval callback 
function for the foreign object caches, because it invalidates not only 
generic plans but also query trees, as you mentioned downthread.  So I was 
thinking to modify his patch so that we add a new syscache inval 
callback function for those caches that is much like PlanCacheFuncCallback 
but only invalidates generic plans.
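
Very roughly, something like this (the list and field names are from
memory; treat it as a sketch, not the proposed code):

static void
PlanCacheForeignCallback(Datum arg, int cacheid, uint32 hashvalue)
{
	CachedPlanSource *plansource;

	for (plansource = first_saved_plan; plansource != NULL;
		 plansource = plansource->next_saved)
	{
		/* nothing to do unless a valid generic plan has been built */
		if (plansource->gplan == NULL || !plansource->gplan->is_valid)
			continue;

		/* ... on a match against the plan's recorded PlanInvalItems: */
		plansource->gplan->is_valid = false;	/* NOT plansource->is_valid */
	}
}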


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-09-29 Thread Etsuro Fujita

On 2016/09/29 20:06, Ashutosh Bapat wrote:

On Wed, Aug 24, 2016 at 1:55 PM, Etsuro Fujita
 wrote:

On 2016/04/04 23:24, Tom Lane wrote:

A related issue, now that I've seen this example, is that altering
FDW-level or server-level options won't cause a replan either.  I'm
not sure there's any very good fix for that.  Surely we don't want
to try to identify all tables belonging to the FDW or server and
issue relcache invals on all of them.



While improving join pushdown in postgres_fdw, I happened to notice that the
fetch_size option in 9.6 has the same issue.  A forced replan discussed
above would fix that issue, but I'm not sure that that's a good idea,
because the fetch_size option, which postgres_fdw gets at GetForeignRelSize,
is not used for anything in creating a plan.  So, as far as the fetch_size
option is concerned, a better solution is (1) to consult that option at
execution time, probably at BeginForeignScan, not at planning time such as
GetForeignRelSize (and (2) to not cause that replan when altering that
option.)  Maybe I'm missing something, though.



As explained in my latest patch, an FDW knows which of the options,
when changed, render a plan invalid. If we have to track the changes
for each option, an FDW will need to be consulted to check whether that
option affects the plan or not. That may be overkill, and there is a
high chance that FDW authors wouldn't bother implementing that
hook.


I thought it would be better to track those dependencies to avoid useless 
replans, but I changed my mind; I agree with Amit's approach.  In 
addition to what you mentioned, ISTM that users don't change such 
options frequently, so I think Amit's approach is more practical.



Let me know your views on my latest patch on this thread.


OK, will do.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-29 Thread Etsuro Fujita

On 2016/09/28 18:35, Etsuro Fujita wrote:

Attached is an updated version of the patch.


I found a minor bug in that patch; the relation_index added to 
PgFdwRelationInfo was defined as Index, but I used the modifier %d to 
print it.  So I changed the data type to int.  Also, I added a few 
more comments.  Please find attached an updated version of the patch.


Best regards,
Etsuro Fujita


postgres-fdw-more-full-join-pushdown-v5.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-09-29 Thread Ashutosh Bapat
On Wed, Aug 24, 2016 at 1:55 PM, Etsuro Fujita
 wrote:
> On 2016/04/04 23:24, Tom Lane wrote:
>>
>> Amit Langote  writes:
>>>
>>> On 2016/04/04 15:17, Rajkumar Raghuwanshi wrote:

 Observation: The prepared statement's execution plan is not getting changed
 even after altering the foreign table to point to a new schema.
>
>
>>> I wonder if performing relcache invalidation upon ATExecGenericOptions()
>>> is the correct solution for this problem.  The attached patch fixes the
>>> issue for me.
>
>
>> A forced relcache inval will certainly fix it, but I'm not sure if that's
>> the best (or only) place to put it.
>
>
>> A related issue, now that I've seen this example, is that altering
>> FDW-level or server-level options won't cause a replan either.  I'm
>> not sure there's any very good fix for that.  Surely we don't want
>> to try to identify all tables belonging to the FDW or server and
>> issue relcache invals on all of them.
>
>
> While improving join pushdown in postgres_fdw, I happened to notice that the
> fetch_size option in 9.6 has the same issue.  A forced replan discussed
> above would fix that issue, but I'm not sure that that's a good idea,
> because the fetch_size option, which postgres_fdw gets at GetForeignRelSize,
> is not used for anything in creating a plan.  So, as far as the fetch_size
> option is concerned, a better solution is (1) to consult that option at
> execution time, probably at BeginForeignScan, not at planning time such as
> GetForeignRelSize (and (2) to not cause that replan when altering that
> option.)  Maybe I'm missing something, though.

As explained in my latest patch, an FDW knows which of the options,
when changed, render a plan invalid. If we have to track the changes
for each option, an FDW will need to be consulted to check whether that
option affects the plan or not. That may be overkill, and there is a
high chance that FDW authors wouldn't bother implementing that
hook. Let me know your views on my latest patch on this thread.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-09-29 Thread Ashutosh Bapat
On Tue, Apr 5, 2016 at 10:54 AM, Amit Langote
 wrote:
> On 2016/04/05 0:23, Tom Lane wrote:
>> Amit Langote  writes:
>>> On Mon, Apr 4, 2016 at 11:24 PM, Tom Lane  wrote:
 A related issue, now that I've seen this example, is that altering
 FDW-level or server-level options won't cause a replan either.  I'm
 not sure there's any very good fix for that.  Surely we don't want
 to try to identify all tables belonging to the FDW or server and
 issue relcache invals on all of them.
>>
>>> Hm, some kind of PlanInvalItem-based solution could work maybe?
>>
>> Hm, so we'd expect that whenever an FDW consulted the options while
>> making a plan, it'd have to record a plan dependency on them?  That
>> would be a clean fix maybe, but I'm worried that third-party FDWs
>> would fail to get the word about needing to do this.
>
> I would imagine that that level of granularity may be a little too much; I
> mean tracking dependencies at the level of individual FDW/foreign
> table/foreign server options.  I think it should really be possible to do
> the entire thing in core instead of requiring this to be made a concern of
> FDW authors.  How about the attached that teaches
> extract_query_dependencies() to add a foreign table and associated foreign
> data wrapper and foreign server to invalItems.  Also, it adds plan cache
> callbacks for respective caches.

Although this is a better solution than the previous one, it
invalidates the query tree along with the generic plan. Invalidating
the query tree and the generic plan requires parsing the query again
and regenerating the plan. Instead, I think we should invalidate only
the generic plan and not the query tree. The attached patch adds the
dependencies from create_foreignscan_plan(), which is called for any
foreign access. I have also added test cases to test the functionality.
Let me know your comments on the patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d97e694..7ca1102 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2480,26 +2480,114 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st5('foo', 1);
Filter: (t1.c8 = $1)
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer))
 (4 rows)
 
 EXECUTE st5('foo', 1);
  c1 | c2 |  c3   |  c4  |c5| c6 | c7 | c8  
 ++---+--+--+++-
   1 |  1 | 1 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1  | foo
 (1 row)
 
+-- changing foreign table option should change the plan
+PREPARE st6 AS SELECT * FROM ft4;
+PREPARE st7 AS UPDATE ft4 SET c1 = c1;
+PREPARE st8 AS UPDATE ft4 SET c1 = random();
+EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6;
+QUERY PLAN
+--
+ Foreign Scan on public.ft4
+   Output: c1, c2, c3
+   Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
+ QUERY PLAN 
+
+ Update on public.ft4
+   ->  Foreign Update on public.ft4
+ Remote SQL: UPDATE "S 1"."T 3" SET c1 = c1
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st8;
+ QUERY PLAN  
+-
+ Update on public.ft4
+   Remote SQL: UPDATE "S 1"."T 3" SET c1 = $2 WHERE ctid = $1
+   ->  Foreign Scan on public.ft4
+ Output: random(), c2, c3, ctid
+ Remote SQL: SELECT c2, c3, ctid FROM "S 1"."T 3" FOR UPDATE
+(5 rows)
+
+ALTER FOREIGN TABLE ft4 OPTIONS (SET table_name 'T 4');
+EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6;
+QUERY PLAN
+--
+ Foreign Scan on public.ft4
+   Output: c1, c2, c3
+   Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
+ QUERY PLAN 
+
+ Update on public.ft4
+   ->  Foreign Update on public.ft4
+ Remote SQL: UPDATE "S 1"."T 4" SET c1 = c1
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st8;
+ QUERY PLAN  
+-
+ Update on public.ft4
+   Remote SQL: UPDATE "S 1"."T 4" SET c1 = $2 WHERE ctid = $1
+   ->  Foreign Scan on public.ft4
+ Output: random(), c2, c3, ctid
+ Remote SQL: SELECT c2, c3, ctid FROM "S 

Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-29 Thread David Steele
On 9/28/16 9:55 PM, Peter Eisentraut wrote:
> On 9/28/16 2:45 AM, Michael Paquier wrote:
>> After all that fixed, I have moved the patch to "Ready for Committer".
>> Please use the updated patch though.
> 
> Committed after some cosmetic changes.

Thank you, Peter!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2016 at 10:49 AM, Heikki Linnakangas  wrote:
>> Do I have that right? If so, this seems rather awkward. Hmm.
>
>
> How is it awkward?

Maybe that was the wrong choice of words. What I mean is that it seems
somewhat unprincipled to give over an equal share of memory to each
active-at-least-once tape, regardless of how much that matters in
practice. One tape could have several runs in memory at once, while
another only has a fraction of a single much larger run. Maybe this is
just the first time that I find myself on the *other* side of a
discussion about an algorithm that seems brute-force compared to what
it might replace, but is actually better overall.   :-)

Now, that could be something that I just need to get over. In any
case, I still think:

* Variables like maxTapes have a meaning that is directly traceable
back to Knuth's description of polyphase merge. I don't think that you
should do anything to them, on general principle.

* Everything or almost everything that you've added to mergeruns()
should probably be in its own dedicated function. This function can
have a comment where you acknowledge that it's not perfectly okay that
you claim memory per-tape, but it's simpler and faster overall.

* I think you should be looking at the number of active tapes, and not
state->Level or state->currentRun in this new function. Actually,
maybe this wouldn't be the exact definition of an active tape that we
establish at the beginning of beginmerge() (this considers tapes with
dummy runs to be inactive for that merge), but it would look similar.
The general concern I have here is that you shouldn't rely on
state->Level or state->currentRun from a distance for the purposes of
determining which tapes need some batch memory. This is also where you
might say something like: "we don't bother with shifting memory around
tapes for each merge step, even though that would be fairer. That's
why we don't use the beginmerge() definition of activeTapes --
instead, we use our own broader definition of the number of active
tapes that doesn't exclude dummy-run-tapes with real runs, making the
allocation reasonably sensible for every merge pass".

* The "batchUsed" terminology really isn't working here, AFAICT. For
one thing, you have two separate areas where caller tuples might
reside: The small per-tape buffers (sized MERGETUPLEBUFFER_SIZE per
tape), and the logtape.c controlled preread buffers (sized up to
MaxAllocSize per tape). Which of these two things is batch memory? I
think it might just be the first one, but KiBs of memory do not
suggest "batch" to me. Isn't that really more like what you might call
double buffer memory, used not to save overhead from palloc (having
many palloc headers in memory), but rather to recycle memory
efficiently? So, these two things should have two new names of their
own, I think, and neither should be called "batch memory" IMV. I see
several assertions remain here and there that were written with my
original definition of batch memory in mind -- assertions like:

case TSS_SORTEDONTAPE:
Assert(forward || state->randomAccess);
Assert(!state->batchUsed);

(Isn't state->batchUsed always true now?)

And:

case TSS_FINALMERGE:
Assert(forward);
Assert(state->batchUsed || !state->tuples);

(Isn't state->tuples only really of interest to datum-tuple-case
routines, now that you've made everything use large logtape.c preread
buffers?)

* Is it really necessary for !state->tuples cases to do that
MERGETUPLEBUFFER_SIZE-based allocation? In other words, what need is
there for pass-by-value datum cases to have this scratch/double buffer
memory at all, since the value is returned to caller by-value, not
by-reference? This is related to the problem of it not being entirely
clear what batch memory now is, I think.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup stream xlog to tar

2016-09-29 Thread Magnus Hagander
On Mon, Sep 5, 2016 at 10:01 AM, Michael Paquier 
wrote:

> On Sat, Sep 3, 2016 at 10:35 PM, Magnus Hagander 
> wrote:
> > Ugh. That would be nice to have, but I think that's outside the scope of
> > this patch.
>
> A test for this patch that could have value would be to use
> pg_basebackup -X stream -Ft, then untar pg_xlog.tar and look at the
> size of the segments. If you have an idea for untarring something
> without the in-core Perl support (we need the TAP stuff to run on at
> least Perl 5.8.8), I'd welcome it. Honestly I still have none, and
> that's why the recovery tests do not use tarballs when using
> pg_basebackup. In short, let's not add anything more to this patch.
>
> > PFA is an updated version of this patch that:
> > * documents a magic value passed to zlib (which is in their
> documentation as
> > being a magic value, but has no define)
> > * fixes the padding of tar files
> > * adds a most basic test that the -X stream -Ft does produce a tarfile
>
> So I had a more serious look at this patch, and it basically makes
> more generic the operations done for the plain mode by adding a set of
> routines that can be used by both tar and plain mode to work on the
> WAL files streamed. Elegant.
>
> +   
> +The transaction log files will be written to
> + the base.tar file.
> +   
> Nit: number of spaces here.
>

Fixed.



> -mark_file_as_archived(const char *basedir, const char *fname)
> +mark_file_as_archived(StreamCtl *stream, const char *fname)
> Just passing WalMethod as argument would be enough, but... My patch
> adding the fsync calls to pg_basebackup could just make use of
> StreamCtl, so let's keep it as you suggest.
>

Yeah, I think it's cleaner to pass the whole structure around really. If
not now, we'd need it eventually. That makes all callers more consistent.



>  static bool
>  existsTimeLineHistoryFile(StreamCtl *stream)
>  {
> [...]
> +   return stream->walmethod->existsfile(histfname);
>  }
> existsfile always returns false for the tar method. This does not
> matter much because pg_basebackup exits immediately in case of a
> failure, but I think that this deserves a comment in ReceiveXlogStream
> where existsTimeLineHistoryFile is called.
>

OK, added. As you say, the behaviour is expected, but it makes sense to
mention it clearly there.



> I find the use of existsfile() in open_walfile() rather confusing
> because this relies on the fact that existsfile() always returns
> false for the tar mode. We could add an additional field in WalMethod
> to store the method type and use that more, but that may make the code
> more confusing than what you propose. What do you think?
>

Yeah, I'm not sure that helps. The point is that the abstraction is
supposed to take care of that. But if it's confusing, then clearly a
comment is warranted there, so I've added that. Do you think that makes it
clear enough?



>
> +   int (*unlink) (const char *pathname);
> The unlink method is used nowhere. This could just be removed.
>

That's clearly a missed cleanup. Removed, thanks.



> -static void
> +void
>  print_tar_number(char *s, int len, uint64 val)
> This could be an independent patch. Or not.
>

Could be, but we don't really have any other uses for it.



> I think that I found another bug regarding the contents of the
> segments. I did pg_basebackup -F t -X stream, then untarred pg_xlog.tar,
> which contained segment 1/0/2, then:
> $ pg_xlogdump 00010002
> pg_xlogdump: FATAL:  could not find a valid record after 0/200
> I'd expect this segment to have records, up to a XLOG_SWITCH record.
>

Ugh. That's definitely broken, yes. It seeked back and overwrote the tar
header with the data, instead of starting where the file part was supposed
to be. It worked fine on compressed files, and it was when implementing
that support that it broke.

So what's our basic rule for these perl tests - are we allowed to use
pg_xlogdump from within a pg_basebackup test? If so that could actually be
a useful test - do the backup, extract the xlog and verify that it contains
the SWITCH record.


I also noticed that using -Z5 created a .tar.gz while -z created a plain
.tar (which was nevertheless compressed), because -z sets compresslevel
to -1, meaning the zlib default.
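
Presumably because the file-naming test looks something like this
(illustrative only, not the actual code):

	/* -z leaves compresslevel at -1 (zlib default), so this test misses it */
	if (compresslevel > 0)
		snprintf(fn, sizeof(fn), "%s/pg_xlog.tar.gz", basedir);
	else
		snprintf(fn, sizeof(fn), "%s/pg_xlog.tar", basedir);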


Again, apologies for getting late back into the game here.

//Magnus
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9f1eae1..a4236a5 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -180,7 +180,8 @@ PostgreSQL documentation
 target directory, the tar contents will be written to
 standard output, suitable for piping to for example
 gzip. This is only possible if
-the cluster has no additional tablespaces.
+the cluster has no additional tablespaces and transaction
+log 

Re: [HACKERS] Add support for restrictive RLS policies

2016-09-29 Thread Jeevan Chalke
Hi Stephen,


> 4. It will be good if we have an example for this in section
> > "5.7. Row Security Policies"
>
> I haven't added one yet, but will plan to do so.
>
I think you are going to add this in this patch itself, right?

I have reviewed your latest patch and it fixes almost all of my review
comments. I also agree with your responses to a couple of the comments,
such as those on ALTER POLICY and tab completion. No issues with that.

However, in the documentation, PERMISSIVE and RESTRICTIVE are actually
literals and not parameters as such.  Also, can we combine these two
options into one entry, like below (similar to how we document CASCADE
and RESTRICT for DROP POLICY)?

   
<varlistentry>
 <term><literal>PERMISSIVE</literal></term>
 <term><literal>RESTRICTIVE</literal></term>
 <listitem>
  <para>
   ... explain PERMISSIVE ...
  </para>
  <para>
   ... explain RESTRICTIVE ...
  </para>
 </listitem>
</varlistentry>

Apart from this, the changes look excellent to me.
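
For reference, usage of the two keywords under the proposed syntax
would look something like this (table and predicates made up):

    CREATE POLICY p_own ON accounts
        AS PERMISSIVE                  -- the default
        USING (owner = current_user);

    CREATE POLICY p_lock ON accounts
        AS RESTRICTIVE                 -- ANDed with all permissive policies
        USING (NOT frozen);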

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] less expensive pg_buffercache on big shmem

2016-09-29 Thread Heikki Linnakangas

On 09/29/2016 02:45 AM, Ivan Kartyshov wrote:

 > Secondly, I see this bit added to the loop over buffers:
 >
 > if (bufHdr->tag.forkNum == -1)
 > {
 > fctx->record[i].blocknum = InvalidBlockNumber;
 > continue;
 > }
 >
 > and I have no idea why this is needed (when it was not needed before).

This helps to skip unused buffer pages. It is valuable on big and not
yet warmed-up shared memory.


That seems like an unrelated change. Checking for forkNum, instead of 
e.g. BM_VALID seems a bit surprising, too.
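
If the goal is to skip never-used buffers, a test along these lines
(a sketch) would seem more natural:

		uint32		buf_state = LockBufHdr(bufHdr);

		if (!(buf_state & BM_VALID))
		{
			/* buffer holds no valid page: record it as empty and move on */
			fctx->record[i].blocknum = InvalidBlockNumber;
			UnlockBufHdr(bufHdr, buf_state);
			continue;
		}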



On 09/02/2016 12:01 PM, Robert Haas wrote:

I think we certainly want to lock the buffer header, because otherwise
we might get a torn read of the buffer tag, which doesn't seem good.
But it's not obvious to me that there's any point in taking the lock
on the buffer mapping partition; I'm thinking that doesn't really do
anything unless we lock them all, and we all seem to agree that's
going too far.


Replaced the consistent method with a semiconsistent one (locking buffer
headers without the partition lock). Made some additional fixes thanks to
the reviewers.


This patch also does:

+++ b/contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql
@@ -0,0 +1,6 @@
+/* contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_buffercache(text) UPDATE TO '1.3'" to 
load this file. \quit

+
+ALTER FUNCTION pg_buffercache_pages() PARALLEL SAFE;

Why? That seems unrelated to what's been discussed in this thread.


I have committed the straightforward part of this, removing the 
partition-locking. I think we're done here for this commitfest, but feel 
free to post new patches for that PARALLEL SAFE and the quick-check for 
unused buffers, if you think it's worthwhile.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread Artur Zakirov
2016-09-29 13:54 GMT+05:00 amul sul :
>
> On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane  wrote:
> >
> > I started looking at your 0001-to-timestamp-format-checking-v4.patch
> > and this point immediately jumped out at me.  Currently the code relies
> > ... without any documentation ... on no elog being thrown out of
> > parse_format().  That's at the very least trouble waiting to happen.
> > There's a hack to deal with errors from within the NUMDesc_prepare
> > subroutine, but it's a pretty ugly and underdocumented hack.  And what
> > you had here was randomly different from that solution, too.
> >
> > After a bit of thought it seemed to me that a much cleaner fix is to add
> > a "valid" flag to the cache entries, which we can leave clear until we
> > have finished parsing the new format string.  That avoids adding extra
> > data copying as you suggested, removes the need for PG_TRY, and just
> > generally seems cleaner and more bulletproof.
> >
> > I've pushed a patch that does it that way.  The 0001 patch will need
> > to be rebased over that (might just require removal of some hunks,
> > not sure).
> >
> > I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
> > (it'd broken acceptance of BC dates, among other things, but I think
> > I fixed everything).

Thank you for committing the 0002 part of the patch! I wanted to fix
the cache functions too, but wasn't sure about that.
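
For the archives, the committed pattern is essentially this (paraphrased
from memory, not the literal code):

	ent = DCH_cache_getnew(str);	/* ent->valid is still false here */

	/* may ereport(); a half-parsed entry is then simply never marked valid */
	parse_format(ent->format, str, DCH_keywords,
				 DCH_suff, DCH_index, DCH_TYPE, NULL);

	ent->valid = true;	/* only now may DCH_cache_search() return this entry */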

> >
> > Since you told us earlier that you'd be on vacation through the end of
> > September, I'm assuming that nothing more will happen on this patch during
> > this commitfest, so I will mark the CF entry Returned With Feedback.
>
> On behalf of Artur I've rebased the patch and removed the hunk dealing with
> broken cache entries by copying them, which is no longer required after
> commit 83bed06.
>
> Commitfest status left untouched; I am not sure how to deal with a
> "Returned With Feedback" patch. Is there any chance that this can be
> considered again for committer review?

Thank you for fixing the patch!
Today I have access to the internet and am able to fix and test the
patch. I've looked at your 0001-to-timestamp-format-checking-v5.patch.
It seems nice to me. I suppose it is necessary to fix the
is_char_separator() declaration.

from:
static bool is_char_separator(char *str);

to:
static bool is_char_separator(const char *str);

Because the *str argument in parse_format() is now const.
I attached a new version of the patch, which fixes the
is_char_separator() declaration too.

Sorry for confusing!

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


0001-to-timestamp-format-checking-v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Heikki Linnakangas

On 09/28/2016 07:20 PM, Peter Geoghegan wrote:

On Wed, Sep 28, 2016 at 5:11 PM, Peter Geoghegan  wrote:

This is why I never pursued batch memory for non-final merges. Isn't
that what you're doing here? You're pretty much always setting
"state->batchUsed = true".


Wait. I guess you feel you have to, since it wouldn't be okay to use
almost no memory per tape on non-final merges.

You're able to throw out so much code here in large part because you
give almost all memory over to logtape.c (e.g., you don't manage each
tape's share of "slots" anymore -- better to give everything to
logtape.c). So, with your patch, you would actually only have one
caller tuple in memory at once per tape for a merge that doesn't use
batch memory (if you made it so that a merge *could* avoid the use of
batch memory, as I suggest).


Correct.


In summary, under your scheme, the "batchUsed" variable contains a
tautological value, since you cannot sensibly not use batch memory.
(This is even true with !state->tuples callers).


I suppose.


Do I have that right? If so, this seems rather awkward. Hmm.


How is it awkward?

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An extra error for client disconnection on Windows

2016-09-29 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 13 Sep 2016 10:00:32 -0300, Alvaro Herrera  
wrote in <20160913130032.GA391646@alvherre.pgsql>
> Michael Paquier wrote:
> > On Tue, Sep 13, 2016 at 10:42 AM, Kyotaro HORIGUCHI
> >  wrote:
> > > If we take a policy of trying to imitate the behavior of some
> > > reference platform (specifically Linux) on other platforms, this
> > > disguising is required. Another potential policy on this problem
> > > is "following the platform's behavior". From this viewpoint, this
> > > message should be shown to users because Windows says
> > > so. Especially for socket operations, the simulation layer
> > > intends the former for non-error behaviors, but I'm not sure
> > > about the behaviors on errors.
> > 
> > The more you hack windows, the more you'll notice that it is full of
> > caveats, behavior exceptions, and that it runs in its way as nothing
> > else in this world... This patch looks like a tempest in a teapot at
> > the end. Why is it actually a problem to show this message? Just
> > useless noise? If that's the only reason let's drop the patch and move
> > on.
> 
> Yeah, I looked into this a few days ago and that was my conclusion also:
> let's drop this.

OK, I agree. Thanks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread amul sul
On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane  wrote:
> Artur Zakirov  writes:
>> - now DCH_cache_getnew() is called after parse_format(), because now
>> parse_format() can raise an error, and in the next attempt
>> DCH_cache_search() could return a broken cache entry.
>
> I started looking at your 0001-to-timestamp-format-checking-v4.patch
> and this point immediately jumped out at me.  Currently the code relies
> ... without any documentation ... on no elog being thrown out of
> parse_format().  That's at the very least trouble waiting to happen.
> There's a hack to deal with errors from within the NUMDesc_prepare
> subroutine, but it's a pretty ugly and underdocumented hack.  And what
> you had here was randomly different from that solution, too.
>
> After a bit of thought it seemed to me that a much cleaner fix is to add
> a "valid" flag to the cache entries, which we can leave clear until we
> have finished parsing the new format string.  That avoids adding extra
> data copying as you suggested, removes the need for PG_TRY, and just
> generally seems cleaner and more bulletproof.
>
> I've pushed a patch that does it that way.  The 0001 patch will need
> to be rebased over that (might just require removal of some hunks,
> not sure).
>
> I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
> (it'd broken acceptance of BC dates, among other things, but I think
> I fixed everything).
>
> Since you told us earlier that you'd be on vacation through the end of
> September, I'm assuming that nothing more will happen on this patch during
> this commitfest, so I will mark the CF entry Returned With Feedback.
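
To illustrate the "valid" flag approach described above, here is a
rough sketch (the struct and helper names are hypothetical, not the
actual formatting.c code):

typedef struct
{
    char        key[FORMAT_KEY_LEN];    /* format string (lookup key) */
    FormatNode  format[FORMAT_LEN];     /* parsed representation */
    bool        valid;                  /* set only after a full parse */
} FormatCacheEntry;

static FormatCacheEntry *
format_cache_fetch(const char *fmt)
{
    FormatCacheEntry *ent = cache_search(fmt);

    if (ent != NULL && ent->valid)
        return ent;                 /* fully parsed before; safe to reuse */

    ent = cache_getnew(fmt);        /* may recycle an invalid entry */
    ent->valid = false;
    parse_format(ent->format, fmt); /* may ereport(ERROR)... */
    ent->valid = true;              /* ...so we reach here only on success */
    return ent;
}

Since the flag is set only after parse_format() returns, an error
leaves the recycled entry marked invalid, and no later lookup will
trust a half-built entry.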

On behalf of Artur, I've rebased the patch and removed the hunk that
dealt with broken cache entries by copying them, which is no longer
required after commit 83bed06.

I left the commitfest status untouched, as I am not sure how to deal
with a "Returned With Feedback" patch. Is there any chance that this
can be considered again for committer review?

Thanks !

Regards,
Amul


0001-to-timestamp-format-checking-v5.patch
Description: Binary data



Re: [HACKERS] asynchronous and vectorized execution

2016-09-29 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

At Fri, 23 Sep 2016 18:15:40 +0530, Amit Khandekar  
wrote in 

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-29 Thread Emre Hasegeli
> Well, those two results are not contradictory -- notice that you
> switched the order of the values in the comparison.  I don't think
> you've really found the explanation yet.

I am sorry I copy-pasted the wrong example:

"select_views" test runs:

> SELECT name, #thepath FROM iexit ORDER BY 1, 2

and expects to get rows in this order:

>  I- 580Ramp |8
>  I- 580/I-680  Ramp |2

with the collation on my laptop, this is actually true:

> regression=# select 'I- 580Ramp' < 'I- 580/I-680  
> Ramp';
>  ?column?
> --
>  t
> (1 row)

on the Linux server I am testing, it is not:

> regression=# select 'I- 580Ramp' < 'I- 580/I-680  
> Ramp';
>  ?column?
> --
>  f
> (1 row)

The latter should be the case in your environment, as the test was also
failing for you.  This is not consistent with the expected test
result.  Do you know how this test can still pass on the master?
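
For what it's worth, the collation dependence is easy to see by forcing
a collation explicitly (illustrative queries with abbreviated strings,
not the actual regression test):

-- "C" compares byte by byte: ' ' (0x20) sorts before '/' (0x2F)
SELECT 'I- 580 Ramp' < 'I- 580/I-680 Ramp' COLLATE "C";

-- a linguistic collation (e.g. en_US, where available) may weight
-- whitespace and punctuation differently, which can flip the result
SELECT 'I- 580 Ramp' < 'I- 580/I-680 Ramp' COLLATE "en_US";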




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-09-29 Thread Michael Paquier
On Tue, Aug 30, 2016 at 11:04 AM, Michael Paquier
 wrote:
> On Mon, Aug 29, 2016 at 9:39 PM, Craig Ringer
>  wrote:
>> Cool. I'll mark as waiting on author pending that.
>>
>> It'll be good to get this footgun put away.
>
> OK, so done. I have put the renaming of pg_xlog to pg_wal at the top
> of the patch stack, as that's the one drawing no objections, it seems:
> a lot of people like pg_wal. I have also added handling for the
> renaming in pg_basebackup by using PQserverVersion. One thing to note
> is that a connection needs to be made to the target server *before*
> creating the soft link of pg_xlog/pg_wal, because we need to know the
> version of the target server. pg_upgrade is handled as well. And
> that's all in 0001.
>
> Patch 0002 does pg_clog -> pg_trans, "trans" standing for
> "transaction". Switching to pg_trans_status or pg_xact_status instead
> would not be that complicated anyway...
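
For illustration, the pg_basebackup version check described above
presumably boils down to something like this (a sketch only; the
100000 cutoff assumes the rename ships in a release numbered 10):

#include "libpq-fe.h"

/* choose the WAL directory name based on the remote server version */
static const char *
wal_dir_name(PGconn *conn)
{
    return (PQserverVersion(conn) >= 100000) ? "pg_wal" : "pg_xlog";
}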

Any input to offer on those patches? If nothing happens, I guess the
best move is just to push them to the next CF. At least I can see that
the goal would be to get those renamings done.
-- 
Michael




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-09-29 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 20 Sep 2016 16:50:29 +0900, Michael Paquier  
wrote in 
> On Mon, Sep 19, 2016 at 6:11 PM, Pavel Stehule  
> wrote:
> > I am thinking so commit's description should be inside README
> 
> Horiguchi-san, your patch has some whitespace issues, you may want to
> get a run with git diff --check. Here are some things I have spotted:
> src/bin/psql/tab-complete.c:1074: trailing whitespace.
> +"MATERIALIZED VIEW",
> src/bin/psql/tab-complete.c:2621: trailing whitespace.
> +   COMPLETE_WITH_QUERY(Query_for_list_of_roles,

Thank you very much for pointing that out. I have put a pre-commit hook
in place to check for this, so as not to make such a mistake again.

http://stackoverflow.com/questions/591923/make-git-automatically-remove-trailing-whitespace-before-committing/22704385#22704385
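
For the record, the hook is essentially git's stock sample pre-commit
hook, something like:

#!/bin/sh
# abort the commit if the staged changes introduce whitespace errors
exec git diff-index --check --cached HEAD --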

> This set of patches is making psql tab completion move into a better
> shape, particularly with 0001 that removes the legendary huge if-elif
> and just lets the routine return immediately in case of a keyword match.
> Things could be a little bit more shortened by for example not doing
> the refactoring of the tab macros because they are just needed in
> tab-complete.c. The other patches introduce further improvements for
> the existing infrastructure, but that's a lot of things just for
> adding IF [NOT] EXISTS to be honest.

That was the motivation for this, but even setting it aside, some
syntaxes with optional keywords can be simplified or enriched with the
new macros. CREATE SCHEMA's schema elements, CREATE INDEX, and some
other syntaxes are simplified using this feature.

> Testing a bit, I have noticed that, for example, after typing
> "create table if", attempting tab completion does not offer
> "not exists". I suspect that the other commands are failing at
> that as well.

I suppose it is "create table if ", with a space at the tail. It
is a general issue on combined keywords(?) suggestion in the
whole tab-completion mechanism (or readline's limitation). Some
sytaxes have explicit complition for such cases. For examle,
"create foreign " gets a suggestion of "DATA WRAPPER" since it
has an explcit suggestion step.

> /* ALTER FOREIGN */
> if (Matches2("ALTER", "FOREIGN"))
> COMPLETE_WITH_LIST2("DATA WRAPPER", "TABLE");

It is apparently solvable, but it needs additional code to suggest the
remaining words at every step. That should be treated as a separate
issue.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall

2016-09-29 Thread Haribabu Kommi
On Mon, Sep 26, 2016 at 2:29 PM, Rafia Sabih 
wrote:

> On Tue, Jul 5, 2016 at 06:39 AM, Haribabu Kommi
> kommi(dot)haribabu(at)gmail(dot)com wrote:
>
> Still I feel the GRANT statements should be present, as the CREATE
> DATABASE statement is generated only with the -C option. So the
> attached patch produces the GRANT statements based on the -x option.
>
>
> The attached patch does the job fine. However, I am a little skeptical
> about this addition, since it is clearly mentioned in the documentation
> of pg_dump that it does not restore global objects, so why expect this?
> This addition makes pg_dump -C somewhat special, as it now restores
> these GRANT statements. It would only be helpful to have this patch if
> we consider the popular dump-restore method mentioned in the thread
> (https://www.postgresql.org/message-id/E1VYMqi-0001P4-P4%40wrigleys.postgresql.org),
> i.e. pg_dumpall -g followed by individual pg_dump runs.
>

Thanks for your comments.

I am also not sure whether pg_dumpall -g followed by individual
pg_dump runs is the more widely used approach. If it is, it would be
better to fix the GRANT statements with the -C option; otherwise I
agree that this patch is not required.
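
For reference, the dump/restore pattern in question is roughly this
(illustrative commands):

pg_dumpall --globals-only > globals.sql   # roles and tablespaces only
pg_dump --create mydb > mydb.sql          # per-database dump (-C)

Database-level GRANTs currently fall between the two, which is what
the patch tries to address.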

Any opinions from other members?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] WAL logging problem in 9.4.3?

2016-09-29 Thread Michael Paquier
On Mon, Sep 26, 2016 at 5:03 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, I return to this before my other tasks :)
>
> Though I haven't played with the patch yet..

Be sure to run the test cases in the patch or base your tests on them then!

> Though I don't know how it actually impacts the performance, it
> seems to me that we can live with truncated_to and sync_above in
> RelationData and BufferNeedsWAL(rel, buf) instead of
> HeapNeedsWAL(rel, buf).  Anyway, at most one entry per relation
> seems to exist in the hash at any time.
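
As a rough sketch of the shape being discussed (the rd_sync_above
field is hypothetical, not the actual patch):

static bool
BufferNeedsWAL(Relation rel, Buffer buf)
{
    if (!RelationNeedsWAL(rel))
        return false;

    /* rd_sync_above is InvalidBlockNumber when no sync is pending */
    if (rel->rd_sync_above != InvalidBlockNumber &&
        BufferGetBlockNumber(buf) >= rel->rd_sync_above)
        return false;   /* block will be fsync'd at commit; skip WAL */

    return true;
}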

TBH, I still think that the design of this patch as proposed is pretty
cool and easy to follow.
-- 
Michael




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Amit Kapila
On Thu, Sep 29, 2016 at 12:56 PM, Dilip Kumar  wrote:
> On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra
>  wrote:
>> Yes, definitely - we're missing something important, I think. One difference
>> is that Dilip is using longer runs, but I don't think that's a problem (as I
>> demonstrated how stable the results are).
>>
>> I wonder what CPU model is Dilip using - I know it's x86, but not which
>> generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer
>> model and it makes a difference (although that seems unlikely).
>
> I am using "Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz "
>

Another difference is that the machine on which Dilip is running his
tests has 8 sockets.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Dilip Kumar
On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra
 wrote:
> Yes, definitely - we're missing something important, I think. One difference
> is that Dilip is using longer runs, but I don't think that's a problem (as I
> demonstrated how stable the results are).
>
> I wonder what CPU model is Dilip using - I know it's x86, but not which
> generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer
> model and it makes a difference (although that seems unlikely).

I am using "Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz "


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-09-29 Thread Kyotaro HORIGUCHI
Thank you for reviewing!

At Mon, 19 Sep 2016 11:11:03 +0200, Pavel Stehule  
wrote in 

Re: [HACKERS] Handling dropped attributes in pglogical_proto

2016-09-29 Thread Craig Ringer
On 29 September 2016 at 12:53, Petr Jelinek  wrote:
> On 29/09/16 05:33, Michael Paquier wrote:
>> On Wed, Sep 28, 2016 at 11:25 PM, Konstantin Knizhnik
>>  wrote:
>>> But if the table was just altered and some attribute was removed from
>>> it, then rel->natts can be greater than natts.
>>
>> This is part of pglogical, so you may want to reply on the dedicated
>> thread or send directly a patch to them. By the way, this code path
>> may need to care as well about attisdropped. It is never good not to
>> check for it.
>>
>
> Agreed, this does not belong on -hackers and should be reported on github.
>
> But just as a note, the rel variable is not a Relation; it's a
> PGLogicalRelation, which gets populated by a relation message from the
> upstream. So if you observe the behavior mentioned in the original
> email, you are probably doing something unexpected there (Konstantin is
> not using vanilla pglogical). Also, the attmap should never contain
> attisdropped attributes.
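
As a side note, the usual defensive pattern when walking a tuple
descriptor looks like this (an illustrative sketch, not pglogical's
actual code):

static void
process_live_attributes(TupleDesc desc)
{
    int i;

    for (i = 0; i < desc->natts; i++)
    {
        /* attrs[] still holds slots for dropped columns */
        Form_pg_attribute attr = desc->attrs[i];

        if (attr->attisdropped)
            continue;       /* dropped column: nothing to map or send */

        /* ... handle the live attribute ... */
    }
}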

Yeah, and if you're relying on attno equivalence you're also going to
have to deal with it in dump/restore unless you only create new nodes
using physical copy. (See bdr's bdr_dump).

BDR relies on attno equivalence; pglogical avoids doing so because of
the pain it caused in BDR.

On a side note, do NOT report bugs on ANY software when you're talking
about a modified version unless you clearly disclose that it's
modified. You'll waste everyone's time. We already spent a while
looking into a logical decoding issue you reported that turned out to
be caused by your team's patches to add 2PC decoding support, which
you didn't mention in the report. If you want anyone to listen to your
reports you really need to be very explicit about whether it's a
modified version or not. If it's modified, make sure you can reproduce
the problem in the unmodified version or explain the situation more.

If you want to ask things about BDR and pglogical, now that it's clear
that in-core logical replication is going in another direction so
-hackers isn't the place for it, please use bdr-l...@2ndquadrant.com .

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Notice lock waits

2016-09-29 Thread Haribabu Kommi
On Sat, Aug 6, 2016 at 3:00 AM, Jeff Janes  wrote:

> One time too many, I ran some minor change using psql on a production
> server and was wondering why it was taking so much longer than it did
> on the test server.  Only to discover, after messing around with
> opening new windows and running queries against pg_stat_activity and
> pg_locks and so on, that it was waiting for a lock.
>
> So I created a new guc, notice_lock_waits, which acts like
> log_lock_waits but sends the message as NOTICE so it will show up on
> interactive connections like psql.
>
> I turn it on in my .psqlrc, as it doesn't make much sense for me to
> turn it on in non-interactive sessions.
>
> A general facility for promoting selected LOG messages to NOTICE would
> be nice, but I don't know how to design or implement that.  This is
> much easier, and I find it quite useful.
>
> I have it PGC_SUSET because it does send some tiny amount of
> information about the blocking process (the PID) to the blocked
> process.  That is probably too paranoid, because the PID can be seen
> by anyone in the pg_locks table anyway.
>
> Do you think this is useful and generally implemented in the correct
> way?  If so, I'll try to write some sgml documentation for it.
>
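
For context, enabling the proposed GUC from ~/.psqlrc would presumably
look like the following (assuming a role with sufficient privileges,
since the patch marks it PGC_SUSET):

-- in ~/.psqlrc; notice_lock_waits is the GUC added by the patch
SET notice_lock_waits = on;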


Providing the details of a lock wait to the client is good. I feel
this message is useful in cases where a user or administrator is
trying to perform some SQL operations.

I also feel that adding a new GUC variable just to show these logs to
the user may not be good; changing the existing GUC may be better.

I am not sure whether a generic framework for promoting all LOG
messages to NOTICE would really be beneficial; it may be unnecessary
overhead for some users, but I am not 100% sure.

Regards,
Hari Babu
Fujitsu Australia

