Re: [HACKERS] WAL consistency check facility

2016-11-12 Thread Michael Paquier
On Sun, Nov 13, 2016 at 12:06 PM, Peter Eisentraut
 wrote:
> Could we name this something like wal_consistency_checking?
>
> Otherwise it sounds like you use this to select the amount of
> consistency in the WAL (similar to, say, wal_level).

Or wal_check? Or wal_consistency_check?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-11-12 Thread Peter Eisentraut
On 11/12/16 2:18 PM, Andres Freund wrote:
>>> I also wonder if we want an easier to
>>> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
>>> pub ... without changing the schema).
>>
>> So like, text array that's then parsed everywhere (I am not doing
>> bitmask/int definitely)?
>
> Yes, that sounds good to me. Then convert it to individual booleans or a
> bitmask when loading the publications into the in-memory form (which you
> already do).

I'm not sure why that would be better.  Adding catalog columns in future
versions is not a problem.  We're not planning on adding hundreds of
publication attributes.  Denormalizing catalog columns creates all kinds
of inconveniences in backend code, in frontend code, and for users.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-12 Thread Peter Eisentraut
On 11/11/16 11:10 AM, Tom Lane wrote:
> boolin: OID=1242 proname=boolin proargtypes="cstring" prorettype=bool
> boolin: prosrc=boolin provolatile=i proparallel=s

Then we're not very far away from just using CREATE FUNCTION SQL commands.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-11-12 Thread Dilip Kumar
I have done a performance analysis for the TPCH queries, and I saw visible
gains in 5 queries (10-25%).

Performance Data:

Benchmark : TPCH (S.F. 10)
shared_buffer : 20GB
work_mem  : 50MB
Machine   : POWER

Results are the median of three runs (explain analyze results for both
head/patch are attached in TPCH_out.tar).

Query execution time (ms):

  Query   Head     Patch    Improvement
  Q3      18475    16558    10%
  Q4       7526     5856    22%
  Q7      19386    17425    10%
  Q10     16994    15019    11%
  Q12     13852    10117    26%

Currently we have two major problems with this patch.

Problem 1: As Andres has mentioned, HeapKeyTest uses heap_getattr,
whereas ExecQual uses slot_getattr(). So we can hit worst-case
performance when very few tuples are filtered out and the table has
many columns with quals on most of them.

Problem 2: In HeapKeyTest we are under the per_query_ctx, whereas in
ExecQual we are under the per_tuple_ctx, so in the former we cannot
afford any palloc.

In this patch I have addressed both concerns by exposing executor
information to the heap (I exposed the per_tuple_ctx and slot to
HeapDesc), which is not a very good design.

I have other ideas in mind for solving these concerns; please share
your thoughts.

For problem 1:
I think it's better to give the task of key push-down to the optimizer,
where we can take the decision based mostly on two parameters:
   1. Selectivity.
   2. The column number on which the qual is given.

For problem 2:
I think to solve this we need to limit the datatypes we push down to
the heap (i.e. we can push down only datatypes that don't need palloc
in the qual test):
   1. Don't push down variable-length datatypes.
   2. Some fixed-length datatypes such as 'name' can also palloc
(e.g. nameregexeq), so we need to block them as well.
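As a rough illustration of the optimizer decision proposed for problem 1 (all names and thresholds here are hypothetical, not from the patch):

```python
def should_push_down_qual(selectivity, qual_attnum, total_attnums,
                          max_selectivity=0.5):
    """Decide whether to push a scan key down to the heap layer.

    Hypothetical heuristic: pushing down pays off when the qual
    discards many tuples (low selectivity) and sits on an early
    column, so heap_getattr does not have to walk past many
    preceding attributes.  The thresholds are illustrative only.
    """
    if selectivity > max_selectivity:
        return False  # few tuples filtered out: not worth it
    # prefer quals on leading columns; fetching a late column with
    # heap_getattr repeats work that slot_getattr would cache
    return qual_attnum <= max(1, total_attnums // 4)
```
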


*Note: To see exactly which key is pushed down in the attached explain
analyze output, refer to this example:

without patch:
->  Parallel Seq Scan on orders  (cost=0.00..369972.00 rows=225038
width=20) (actual time=0.025..3216.157 rows=187204 loops=3)
   Filter: ((o_orderdate >= '1995-01-01'::date) AND
(o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))
   Rows Removed by Filter: 4812796

with patch:
->  Parallel Seq Scan on orders  (cost=0.00..369972.00 rows=225038
width=20) (actual time=0.015..1884.993 rows=187204 loops=3)
   Filter: ((o_orderdate >= '1995-01-01'::date) AND
(o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))

1. On head, the plan shows how many rows are discarded by the filter
(Rows Removed by Filter: 4812796). If we push down all the keys, this
value is no longer shown.

2. We can also see how much the actual time of the Seq Scan node is
reduced (in the example above it was 3216.157 ms on head versus
1884.993 ms with the patch).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


On Thu, Nov 3, 2016 at 7:29 PM, Robert Haas  wrote:
> On Tue, Nov 1, 2016 at 8:31 PM, Kouhei Kaigai  wrote:
>> By the way, I'm a bit skeptical whether this enhancement is really more
>> beneficial than the work it requires, because we can now easily increase
>> the number of processor cores to run a seq-scan with a qualifier,
>> especially when it has high selectivity.
>> What are your thoughts?
>
> Are you saying we don't need to bother making sequential scans faster
> because we could just use parallel sequential scan instead?  That
> doesn't sound right to me.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


heap_scankey_pushdown_expr_ctx.patch
Description: Binary data


TPCH_out.tar
Description: Unix tar archive



Re: [HACKERS] Logical Replication WIP

2016-11-12 Thread Petr Jelinek
On 12/11/16 20:19, Andres Freund wrote:
> On 2016-11-10 23:31:27 +0100, Petr Jelinek wrote:
>> On 04/11/16 13:15, Andres Freund wrote:
>>>
>>>  /* Prototypes for private functions */
>>> -static bool libpq_select(int timeout_ms);
>>> +static bool libpq_select(PGconn *streamConn,
>>> +int timeout_ms);
>>>
>>> If we're starting to use this more widely, we really should just use a latch
>>> instead of the plain select(). In fact, I think it's more or less a bug
>>> that we don't (select is only interruptible by signals on a subset of
>>> our platforms).  That shouldn't bother this patch, but...
>>>
>>>
>>
>> Agree that this is a problem, especially for the subscription creation
>> later. We should be doing WaitLatchOrSocket, but the question is which
>> latch. We can't use the MyProc one as that's not the latch that WalReceiver
>> uses, so I guess we would have to pass the latch as a parameter to any
>> caller of this, which is not very pretty from an API perspective, but I
>> don't have a better idea here.
> 
> I think we should simply make walsender use the standard proc
> latch. Afaics that should be fairly trivial?

Walreceiver, you mean. Yeah, that should be simple; looking at the code I
am not quite sure why it uses a separate latch in the first place.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL consistency check facility

2016-11-12 Thread Peter Eisentraut
On 11/9/16 11:55 PM, Michael Paquier wrote:
> + 
> +  wal_consistency (string)
> +  
> +   wal_consistency configuration parameter
> +  
> +  
> +  
> +   
> +This parameter is used to check the consistency of WAL records, i.e,
> +whether the WAL records are inserted and applied correctly. When
> +wal_consistency is enabled for a WAL record, it
> +stores a full-page image along with the record. When a full-page image
> +arrives during redo, it compares against the current page to check whether
> +both are consistent.
> +   

Could we name this something like wal_consistency_checking?

Otherwise it sounds like you use this to select the amount of
consistency in the WAL (similar to, say, wal_level).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-12 Thread Peter Eisentraut
On 11/8/16 3:48 PM, Robert Haas wrote:
> First question: Do we even want this?  Generally, when a program
> writes to a file, we rely on the operating system to decide when that
> data should be written back to disk.  We have to override that
> distinction for things internal to PostgreSQL because we need certain
> bits of data to reach the disk in a certain order, but it's unclear to
> me how far outside the core database system we want to extend that.

I had voiced a similar concern previously:
https://www.postgresql.org/message-id/f8dff810-f5f4-77c3-933b-127df4ed9...@2ndquadrant.com

At the time, there were no other comments, so we went ahead with it,
which presumably gave encouragement to pursue the current patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-12 Thread Michael Paquier
On Sun, Nov 13, 2016 at 12:17 AM, Dilip Kumar  wrote:
> On Sat, Nov 12, 2016 at 10:16 AM, Michael Paquier
>  wrote:
>>> - For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
>>>   pg_dump (because -N is already taken there).
>>>   I'd opt for either using the same short option for both or (IMO better)
>>>   only offering a long option for both.
>>
>> Okay. For consistency's sake let's do that. I was a bit hesitant
>> regarding that to be honest.
>
> Seems like you have missed removing -N in some places, as mentioned below.
>
> 1.
> --- a/doc/src/sgml/ref/pg_dumpall.sgml
> +++ b/doc/src/sgml/ref/pg_dumpall.sgml
> @@ -365,6 +365,21 @@ PostgreSQL documentation
>   
>
>   
> +  -N
>
> @@ -543,6 +557,7 @@ help(void)
>
>
> 2.
>   printf(_("\nGeneral options:\n"));
>   printf(_("  -f, --file=FILENAME  output file name\n"));
> + printf(_("  -N, --no-sync            do not wait for changes to be written safely to disk\n"));

v4 fixed those two places.

> 3.
> - while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx",
> long_options, )) != -1)
> + while ((c = getopt_long(argc, argv, "acd:f:gh:l:NoOp:rsS:tU:vwWx",
> long_options, )) != -1)

But not this one...
-- 
Michael
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 371a614..dcad095 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -815,6 +815,20 @@ PostgreSQL documentation
  
 
  
+  --no-sync
+  
+   
+By default, pg_dump will wait for all files
+to be written safely to disk.  This option causes
+pg_dump to return without waiting, which is
+faster, but means that a subsequent operating system crash can leave
+the dump corrupt.  Generally, this option is useful for testing
+but should not be used when dumping data from production installation.
+   
+  
+ 
+
+ 
   --quote-all-identifiers
   

diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 97168a0..70c5332 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -342,6 +342,20 @@ PostgreSQL documentation
  
 
  
+  --no-sync
+  
+   
+By default, pg_dumpall will wait for all files
+to be written safely to disk.  This option causes
+pg_dumpall to return without waiting, which is
+faster, but means that a subsequent operating system crash can leave
+the dump corrupt.  Generally, this option is useful for testing
+but should not be used when dumping data from production installation.
+   
+  
+ 
+
+ 
   --no-tablespaces
   

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0a28124..becbc56 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -268,7 +268,7 @@ extern Archive *OpenArchive(const char *FileSpec, const ArchiveFormat fmt);
 
 /* Create a new archive */
 extern Archive *CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
-			  const int compression, ArchiveMode mode,
+			  const int compression, bool dosync, ArchiveMode mode,
 			  SetupWorkerPtr setupDumpWorker);
 
 /* The --list option */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 0e20985..5b60da6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -56,7 +56,8 @@ static const char *modulename = gettext_noop("archiver");
 
 
 static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
-	 const int compression, ArchiveMode mode, SetupWorkerPtr setupWorkerPtr);
+	 const int compression, bool dosync, ArchiveMode mode,
+	 SetupWorkerPtr setupWorkerPtr);
 static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
 	  ArchiveHandle *AH);
 static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData, bool acl_pass);
@@ -202,10 +203,12 @@ setupRestoreWorker(Archive *AHX)
 /* Public */
 Archive *
 CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
-	 const int compression, ArchiveMode mode, SetupWorkerPtr setupDumpWorker)
+			  const int compression, bool dosync, ArchiveMode mode,
+			  SetupWorkerPtr setupDumpWorker)
 
 {
-	ArchiveHandle *AH = _allocAH(FileSpec, fmt, compression, mode, setupDumpWorker);
+	ArchiveHandle *AH = _allocAH(FileSpec, fmt, compression, dosync,
+ mode, setupDumpWorker);
 
 	return (Archive *) AH;
 }
@@ -215,7 +218,7 @@ CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
 Archive *
 OpenArchive(const char *FileSpec, const ArchiveFormat fmt)
 {
-	ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, archModeRead, setupRestoreWorker);
+	ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, setupRestoreWorker);
 
 	return (Archive *) AH;
 }
@@ -2223,7 +2226,8 @@ 

Re: [HACKERS] More snapshot-management fun

2016-11-12 Thread Tom Lane
I wrote:
> * ProcArrayInstallImportedXmin and ProcArrayInstallRestoredXmin will
> forcibly overwrite MyPgXact->xmin with some random xmin or other.
> While they take pains to ensure that doesn't cause the global xmin
> horizon to go backwards,

and while we're on the subject, I doubt that those pains are sufficient.
Assume that xact A is importing an xmin from xact B, and meanwhile xact
C is attempting to determine the global xmin.  Since the above-mentioned
functions take the ProcArrayLock in shared mode, these things can happen
concurrently.  Assume that the xmin in question is the oldest one in the
system, and consider the following sequence of events:

1. C's scan of the procarray reaches xact A.  Its xmin isn't set yet,
so nothing happens.

2. A imports xmin from B.  Now A's xmin is set, but too late for C
to notice.

3. B resets its MyPgXact->xmin to zero.  Even though it's not allowed to
exit its transaction because C still has the shared ProcArrayLock, that
doesn't stop B from running SnapshotResetXmin intra-transaction.

4. C's scan of the procarray reaches xact B.  B's xmin is now zero,
so that doesn't inform C either.

Upshot: A now has an xmin that is before C's opinion of the global
xmin, allowing C to prune away data that A needs.
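This interleaving can be reproduced in a toy model (a deliberately simplified sketch in Python, not PostgreSQL code; compute_global_xmin stands in for the C procarray scan):

```python
def compute_global_xmin(procs, events_at):
    """Scan procs (dicts with 'name' and 'xmin'; xmin == 0 means unset),
    mimicking a GetOldestXmin-style scan.  events_at maps a proc index
    to a callback that runs just after that proc is examined, modeling
    concurrent activity that is legal under a shared ProcArrayLock."""
    seen = []
    for i, proc in enumerate(procs):
        if proc["xmin"]:
            seen.append(proc["xmin"])
        if i in events_at:
            events_at[i]()
    return min(seen) if seen else None

procs = [{"name": "A", "xmin": 0},    # step 1: A's xmin not yet set
         {"name": "B", "xmin": 100}]

def import_then_reset():
    procs[0]["xmin"] = 100            # step 2: A imports B's xmin, too late for C
    procs[1]["xmin"] = 0              # step 3: B resets its MyPgXact->xmin

# C's scan: sees A unset, then the import/reset happens, then sees B as zero
global_xmin = compute_global_xmin(procs, {0: import_then_reset})
# C finds no xmin at all, yet A is still holding xmin 100
```
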

In practice this may not be a live bug, because I suspect that we don't
export snapshots that wouldn't be kept till end-of-transaction, so that
B would not actually advance/clear its xmin while C holds the lock.
But there's certainly nothing protecting against it in procarray.c,
and no documentation of the hazard anyplace.

regards, tom lane




Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-12 Thread Andres Freund
Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...


+ * IDENTIFICATION
+ *   src/backend/utils/mmgr/slab.c
+ *
+ *
+ * The constant allocation size allows significant simplification and various
+ * optimizations that are not possible in AllocSet. Firstly, we can get rid
+ * of the doubling and carve the blocks into chunks of exactly the right size
+ * (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea, they'll quite
possibly get out of date. The argument can be quite easily be made
without referring to a concrete reference to behaviour elsewhere.

+ *
+ * At the context level, we use a 'freelist' array to track blocks grouped by
+ * number of free chunks. For example freelist[0] is a list of completely full
+ * blocks, freelist[1] is a list of blocks with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?
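For reference, the freelist[] bookkeeping described in the quoted comment can be modeled in miniature (a toy sketch, not the patch's C code; all names are illustrative):

```python
class SlabModel:
    """Toy model of the block-level freelists: freelist[k] holds
    blocks that have exactly k free chunks, so freelist[0] is the
    completely full blocks."""

    def __init__(self, chunks_per_block):
        self.chunks_per_block = chunks_per_block
        self.freelist = [[] for _ in range(chunks_per_block + 1)]
        self.nblocks = 0

    def _new_block(self):
        self.nblocks += 1
        return {"nfree": self.chunks_per_block}

    def alloc(self):
        # prefer the fullest block that still has a free chunk, so that
        # sparsely used blocks can drain and eventually be freed
        for k in range(1, self.chunks_per_block + 1):
            if self.freelist[k]:
                block = self.freelist[k].pop()
                break
        else:
            block = self._new_block()
        block["nfree"] -= 1
        self.freelist[block["nfree"]].append(block)
        return block

    def free_chunk(self, block):
        # returning a chunk moves the block one bucket up
        self.freelist[block["nfree"]].remove(block)
        block["nfree"] += 1
        self.freelist[block["nfree"]].append(block)
```

Andres's alternative (a single linked list ordered by fullness) would avoid the per-context array entirely; this sketch shows only the array form described in the patch comment.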


+ * About MEMORY_CONTEXT_CHECKING:
+ *
+ * Since we usually round request sizes up to the next power of 2, there
+ * is often some unused space immediately after a requested data area.

I assume the "round up" stuff is copy-paste?


+ * Thus, if someone makes the common error of writing past what they've
+ * requested, the problem is likely to go unnoticed ... until the day when
+ * there *isn't* any wasted space, perhaps because of different memory
+ * alignment on a new platform, or some other effect.  To catch this sort
+ * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
+ * the requested space whenever the request is less than the actual chunk
+ * size, and verifies that the byte is undamaged when the chunk is freed.
+ *
+ *
+ * About USE_VALGRIND and Valgrind client requests:
+ *
+ * Valgrind provides "client request" macros that exchange information with
+ * the host Valgrind (if any).  Under !USE_VALGRIND, memdebug.h stubs out
+ * currently-used macros.
+ *
+ * When running under Valgrind, we want a NOACCESS memory region both before
+ * and after the allocation.  The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to be able to examine the standard chunk header
+ * fields.  Therefore, we use, when available, the requested_size field and
+ * any subsequent padding.  requested_size is made NOACCESS before returning
+ * a chunk pointer to a caller.  However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.
+ *
+ * The rounded-up capacity of the chunk usually acts as a post-allocation
+ * NOACCESS region.  If the request consumes precisely the entire chunk,
+ * there is no such region; another chunk header may immediately follow.  In
+ * that case, Valgrind will not detect access beyond the end of the chunk.
+ *
+ * See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.
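The sentinel-byte scheme quoted above is small enough to sketch (Python standing in for the C implementation; 0x7E is the value named in the comment, everything else is illustrative):

```python
SENTINEL = 0x7E

def slab_alloc(chunk, requested_size):
    """chunk is a bytearray of the full (rounded-up) chunk size.
    When the request is smaller than the chunk, plant a sentinel
    byte just past the requested area, as MEMORY_CONTEXT_CHECKING
    does."""
    if requested_size < len(chunk):
        chunk[requested_size] = SENTINEL
    return chunk

def slab_free_check(chunk, requested_size):
    """Return True if the sentinel survived, i.e. nobody wrote
    past what they asked for."""
    if requested_size < len(chunk):
        return chunk[requested_size] == SENTINEL
    return True  # request filled the chunk exactly: nothing to check
```
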


+typedef struct SlabBlockData *SlabBlock;   /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+   MemoryContextData header;   /* Standard memory-context fields */
+   /* Allocation parameters for this context: */
+   Size        chunkSize;      /* chunk size */
+   Size        fullChunkSize;  /* chunk size including header and alignment */
+   Size        blockSize;      /* block size */
+   int         chunksPerBlock; /* number of chunks per block */
+   int         minFreeChunks;  /* min number of free chunks in any block */
+   int         nblocks;        /* number of blocks allocated */
+   /* Info about storage allocated in this context: */
+   SlabBlock   freelist[1];    /* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.

+/*
+ * SlabBlockData
+ * Structure 

Re: [HACKERS] Logical Replication WIP

2016-11-12 Thread Andres Freund
Hi,

On 2016-11-11 12:04:27 +0100, Petr Jelinek wrote:
> On 04/11/16 14:00, Andres Freund wrote:
> > Hi,
> > 
> > + 
> > +  pg_publication_rel
> > +
> > +  
> > +   pg_publication_rel
> > +  
> > +
> > +  
> > +   The pg_publication_rel catalog contains
> > +   mapping between tables and publications in the database. This is many to
> > +   many mapping.
> > +  
> > 
> > I wonder if we shouldn't abstract this a bit away from relations to
> > allow other objects to be exported to. Could structure it a bit more
> > like pg_depend.
> > 
> 
Honestly, let's not overdesign this. A change like that can be made in
the future if we need it, and I am quite unconvinced we do, given that
anything we might want to replicate will be a relation. I understand
that it might be useful to know what's on the downstream in terms of
objects at some point for some future functionality, but I don't have
an idea of what that functionality will look like, so it's premature to
guess what catalog structure it will need.

I slightly prefer to make it more generic right now, but I don't think
that's a blocker.

> > I still would very much like to move this outside of gram.y and just use
> > IDENTs here. Like how COPY options are handled.
> > 
> 
> Well, I looked into it and it means some loss of info in the error
> messages - mainly the error position in the query, because utility
> statements don't get a ParseState (unlike COPY). It might be worth the
> flexibility though.

Pretty sure that that's the case.


> >  I also wonder if we want an easier to
> > extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> > pub ... without changing the schema).
> > 
> 
> So like, a text array that's then parsed everywhere (I am definitely not
> doing bitmask/int)?

Yes, that sounds good to me. Then convert it to individual booleans or a
bitmask when loading the publications into the in-memory form (which you
already do).
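For illustration, the agreed text-array scheme could be converted to the in-memory boolean form roughly like this (a hedged sketch; the real catalog column name and parsing code may differ):

```python
# extensible later without a schema change: ddl, truncate, ...
KNOWN_ACTIONS = {"insert", "update", "delete"}

def parse_publication_actions(pubactions_text):
    """Turn a catalog text array such as '{insert,update}' into the
    per-action boolean form described above."""
    inner = pubactions_text.strip().strip("{}")
    actions = {a.strip() for a in inner.split(",") if a.strip()}
    unknown = actions - KNOWN_ACTIONS
    if unknown:
        raise ValueError("unrecognized publication action(s): %s"
                         % ", ".join(sorted(unknown)))
    return {a: (a in actions) for a in sorted(KNOWN_ACTIONS)}
```
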

Greetings,

Andres Freund




Re: [HACKERS] Logical Replication WIP

2016-11-12 Thread Andres Freund
On 2016-11-10 23:31:27 +0100, Petr Jelinek wrote:
> On 04/11/16 13:15, Andres Freund wrote:
> > 
> >  /* Prototypes for private functions */
> > -static bool libpq_select(int timeout_ms);
> > +static bool libpq_select(PGconn *streamConn,
> > +int timeout_ms);
> > 
> > If we're starting to use this more widely, we really should just use a latch
> > instead of the plain select(). In fact, I think it's more or less a bug
> > that we don't (select is only interruptible by signals on a subset of
> > our platforms).  That shouldn't bother this patch, but...
> > 
> > 
> 
> Agree that this is a problem, especially for the subscription creation
> later. We should be doing WaitLatchOrSocket, but the question is which
> latch. We can't use the MyProc one as that's not the latch that WalReceiver
> uses, so I guess we would have to pass the latch as a parameter to any
> caller of this, which is not very pretty from an API perspective, but I
> don't have a better idea here.

I think we should simply make walsender use the standard proc
latch. Afaics that should be fairly trivial?

Greetings,

Andres Freund




Re: [HACKERS] proposal: psql \setfileref

2016-11-12 Thread Pavel Stehule
2016-11-10 18:56 GMT+01:00 Tom Lane :

> Pavel Stehule  writes:
> > 2016-11-09 22:47 GMT+01:00 Tom Lane :
> >> * I really dislike the notion of keying the behavior to a special type of
> >> psql variable.
>
> > Still I am thinking some differentiation can be nice, because we can use
> > the psql file path tab autocompletion.
> > Maybe \setfileref can stay - it will set any variable, but the
> > autocompletion will be based on file paths.
>
> Personally, I'd rather have filename tab completion after ":<", and forget
> \setfileref --- I do not think that setting a variable first is going to
> be the primary use-case for this.  In fact, I could happily dispense with
> the variable case entirely, except that sometimes people will want to
> reference file names that don't syntactically fit into the colon syntax
> (eg, names with spaces in them).  Even that seems surmountable, if people
> are okay with requiring the '<' trailer --- I don't think we need to worry
> too much about supporting file names with '<' in them.  (Likewise for
> '>', if you feel like : is a less ugly syntax.)
>

I hit an early roadblock - it is not easy to implement any complex syntax
in the lexer without complex backup rules.
Now I am coming with a slightly different idea and a different syntax.

: means insert some content based on a psql variable

We can introduce psql content commands that produce some content. The
syntax can be:

:{cmd parameters}

Some commands:

* r - read
* rq - read and quote; it can have the alias "<"
* rbq - read binary and quote

Other commands can be introduced in future

Usage:

INSERT INTO foo(image) VALUES(:{rbq ~/avatar.jpg})
INSERT INTO foo(xmdoc) VALUES(:{rq "~/current doc.xml"})
INSERT INTO foo(longtext) VALUES(:{< ~/book.txt})

It is general, with a possibly simple implementation - it doesn't have a
big impact on psql lexer complexity.

What do you think about it?

Regards

Pavel

p.s. the colon is not necessary - we can use {} as a special case of ``.

INSERT INTO foo(image) VALUES({rbq ~/avatar.jpg})
INSERT INTO foo(xmdoc) VALUES({rq "~/current doc.xml"})
INSERT INTO foo(longtext) VALUES({< ~/book.txt})

Then we can support psql variables there

\set avatar ~/avatar.jpg
INSERT INTO foo(image) VALUES({rbq :avatar})
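A rough prototype of the proposed :{cmd ...} expansion (pure illustration; quoting, tilde expansion, and error handling are simplified far beyond what psql's lexer would need):

```python
import re

def expand_content_commands(query, read_file, quote_literal):
    """Expand :{r path}, :{rq path} and the '<' alias in a query
    string.  read_file and quote_literal are injected so the sketch
    stays testable; rbq (read binary and quote) is omitted for
    brevity."""
    def repl(match):
        cmd, path = match.group(1), match.group(2).strip().strip('"')
        data = read_file(path)
        if cmd in ("rq", "<"):
            return quote_literal(data)   # read and quote
        if cmd == "r":
            return data                  # raw read
        raise ValueError("unknown content command: " + cmd)
    # 'rq' must precede 'r' in the alternation so it wins the match
    return re.sub(r":\{(rq|r|<)\s+([^}]+)\}", repl, query)
```
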


>
> > What do you think about the following example?
>
> > INSERT INTO tab VALUES(1, : > escaping
> > INSERT INTO tab VALUES(1, :<#varname); -- insert bytea value  -- used
> bytea
> > escaping
>
> Seems a bit arbitrary, and not readily extensible to more datatypes.
>
> I guess an advantage of the parameterized-query approach is that we
> wouldn't really have to distinguish different datatypes in the syntax,
> because we could use the result of Describe Statement to figure out what
> to do.  Maybe that outweighs the concern about magic behavioral changes.
>
> On the other hand, it's also arguable that trying to cater for automatic
> handling of raw files as bytea literals is overcomplicating the feature
> in support of a very infrequent use-case, and giving up some other
> infrequent use-cases to get that.  An example is that if we insist on
> doing this through parameterized queries, it will not work for inserting
> literals into utility commands.  I don't know which of these things is
> more important to have.
>
> regards, tom lane
>


Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-12 Thread Jeff Janes
On Mon, Nov 7, 2016 at 7:03 PM, Amit Kapila  wrote:

> On Tue, Nov 8, 2016 at 5:12 AM, Jeff Janes  wrote:
> > On Mon, Nov 7, 2016 at 5:55 AM, Amit Kapila 
> wrote:
> >>
> >>
> >> Isn't it somewhat strange that writes are showing big win whereas
> >> reads doesn't show much win?
> >
> >
> > I don't find that unusual, and have seen the same thing on Linux.
> >
> > With small shared_buffers, you are constantly throwing dirty buffers at
> the
> > kernel in no particular order, and the kernel does a poor job of
> predicting
> > when the same buffer will be dirtied repeatedly and only needs the final
> > version of the data actually written to disk.
> >
>
> Okay, and I think it might partially be because we don't have the
> writeback optimization (done in 9.6) for Windows.


Is the writeback optimization the introduction of checkpoint_flush_after,
or is it something else?

If it is checkpoint_flush_after, then I don't think that that is related.
In fact, they operate in opposite directions.  The writeback optimization
forces the kernel to be more eager about writing out dirty data, so it
doesn't have a giant pile of it when the fsyncs comes at the end of the
checkpoint.  Using a large shared_buffers forces it to be less eager, by
not turning the dirty buffers over to the kernel as often.


> However, still the broader
> question stands that whether above data is sufficient to say that we
> can recommend the settings of shared_buffers on Windows similar to
> Linux?
>

We do have evidence that the old advice is out of date.  So I think we
should just remove it.  We have no evidence that the optimal Windows
setting on modern Windows is the same as it is for Linux, but also no
evidence that it is not the same.

Cheers,

Jeff


Re: [HACKERS] xlogreader.c fails with FATAL on a cluster with 4kB block size

2016-11-12 Thread Tomas Vondra
Meh, ignore this report - I've just realized I've been running the 
pg_xlogdump binary built for 8kB pages, so the failures are kinda 
expected. Sorry about the confusion.


regards

On 11/12/2016 07:52 PM, Tomas Vondra wrote:

Hi,

I'm running some tests on a cluster with 4kB blocks, and it seems
there's a bug in xlogreader.c, causing FATAL errors for example when
running pg_xlogdump:

pg_xlogdump: FATAL:  error in WAL record at 48/63970258:
BKPIMAGE_HAS_HOLE not set, but hole offset 0 length 4096 at 48/63970258

This particular failure comes from "pg_xlogdump --stats", but my guess
is this means recovery is broken with 4kB blocks (and possibly with some
other non-standard block sizes).

The standard 8kB blocks seem to be unaffected, as I've done the same
test on 8kB blocks many times and it never triggered this error.

FWIW the tests were done on bfcd07b4, so fairly recent code.

regards




--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] More snapshot-management fun

2016-11-12 Thread Tom Lane
While pursuing the unprotected-use-of-CatalogSnapshot problem I noted
earlier, I came across a couple of related issues:

* ProcArrayInstallImportedXmin and ProcArrayInstallRestoredXmin will
forcibly overwrite MyPgXact->xmin with some random xmin or other.
While they take pains to ensure that doesn't cause the global xmin
horizon to go backwards, there's no obvious interlock ensuring that
it doesn't cause MyPgXact->xmin to go *forwards* enough to render
some locally-stored snapshot unprotected.  I think we need to add a
call that will check for that and throw an error.  (In the case of
CatalogSnapshot, it could just invalidate CatalogSnapshot, but there
isn't any way for snapmgr.c to unilaterally invalidate a registered
or stacked-active snapshot.)  I think this may be a live bug with
respect to CatalogSnapshot, even if callers are defending against
there being any other active snapshots.
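
The proposed interlock could look roughly like the following standalone sketch (hypothetical names; the real code would live in procarray.c, use TransactionIdFollows() instead of plain `>`, and throw an error rather than return false — xid wraparound is ignored here for brevity):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Refuse to install an imported xmin that would advance our advertised
 * xmin past the oldest snapshot still held locally, which would leave
 * that snapshot unprotected against pruning. */
static bool
install_imported_xmin(TransactionId *advertised_xmin,
                      TransactionId imported_xmin,
                      TransactionId oldest_local_xmin)
{
    if (oldest_local_xmin != InvalidTransactionId &&
        imported_xmin > oldest_local_xmin)
        return false;           /* caller should elog(ERROR, ...) */
    *advertised_xmin = imported_xmin;
    return true;
}
```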

* I tried adding
Assert(MyPgXact->xmin != InvalidTransactionId);
to SnapshotResetXmin() just after it's determined that RegisteredSnapshots
is nonempty, reasoning that the advertised xmin had better be nonzero if
we have any registered snapshots.  It promptly blew up on me.  The reason
is that we clear MyPgXact->xmin during ProcArrayEndTransaction, long
before AtEOXact_Snapshot.  AtEOXact_Snapshot itself didn't have a problem,
but in some cases we may call InvalidateCatalogSnapshot during the cleanup
sequence, and the modified version of that that I'm testing invokes
SnapshotResetXmin which detected the inconsistency.  I find this very
scary: it means that all through transaction cleanup, we have unprotected
snapshots sitting around.  Yeah, they should probably not get used for
anything, but there's no very strong mechanism guaranteeing that.

An easy attempt at a fix for the second problem would be to not clear
MyPgXact->xmin during ProcArrayEndTransaction, but leave it to be done
when we flush our snapshots.  But I seem to recall that it's intentional
and necessary that we clear that synchronously with clearing MyPgXact->xid.

An idea I'm playing with is to add logic so that just before
ProcArrayEndTransaction, we run through all registered and stacked
snapshots and replace their "satisfies" function pointers with pointers
to a function that just throws an error, thus catching any attempt to
use the snapshots for anything during transaction cleanup.  This is sort
of an annoying use of cycles, though I suppose we could do it only in
assert-enabled builds.
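
The pointer-swapping idea can be illustrated with a standalone sketch (hypothetical names and a flag standing in for elog(ERROR, ...); the real Snapshot struct and callback signature differ):

```c
#include <stdbool.h>
#include <stddef.h>

/* A minimal snapshot with a visibility callback.  Poisoning replaces
 * the callback so any visibility test during transaction cleanup is
 * caught immediately instead of silently using a stale snapshot. */
typedef struct Snapshot Snapshot;
struct Snapshot
{
    bool        (*satisfies) (Snapshot *snap, int xid);
};

static bool poison_tripped = false;

static bool
SnapshotPoisonedSatisfies(Snapshot *snap, int xid)
{
    (void) snap;
    (void) xid;
    poison_tripped = true;      /* stand-in for elog(ERROR, ...) */
    return false;
}

static void
PoisonSnapshot(Snapshot *snap)
{
    snap->satisfies = SnapshotPoisonedSatisfies;
}
```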

Thoughts?

regards, tom lane




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-12 Thread Karl O. Pinc
On Mon, 7 Nov 2016 23:29:28 +0100
Gilles Darold  wrote:


> Here is the v13 of the patch,

>   - I've reverted the patch removing the call to logfile_writename in
> open_csvlogfile function, it is required to write log filename at
> startup when log_destination is set to csvlog. I could not find a
> better place right now, but will try to see if we can avoid the call
> to logfile_writename().

Why is calling logfile_writename() in open_csvlogfile() a problem?

I've not thought too deeply but this is what's in my mind.

The only issue that leaps to mind is overall code simplicity.  But
because csv files are opened "on-demand", only when there's something
in the queue to get written to a csv file, it's not clear how to avoid
calling logfile_writename() "on-demand" as well.  To do otherwise might
risk putting the name of a csv logfile into current_logfiles when that
csv log file does not yet exist.  Or you could wait, and have a csv
log file in existence but not have it reflected in the content of
current_logfiles for some time period.  But why?

If there is a problem in code clarity it might be that the csv log files
are not created until there's a csv log entry in the queue.  Change this,
and the confusion, if there really is any, goes away.

>   - Do not write current_logfiles when log_collector is activated but
> log_destination doesn't contain stderr or csvlog. This was creating
> an empty file that could confuse the user.

Good catch.


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




[HACKERS] xlogreader.c fails with FATAL on a cluster with 4kB block size

2016-11-12 Thread Tomas Vondra

Hi,

I'm running some tests on a cluster with 4kB blocks, and it seems 
there's a bug in xlogreader.c, causing FATAL errors for example when 
running pg_xlogdump:


pg_xlogdump: FATAL:  error in WAL record at 48/63970258: 
BKPIMAGE_HAS_HOLE not set, but hole offset 0 length 4096 at 48/63970258


This particular failure comes from "pg_xlogdump --stats", but my guess 
is this means recovery is broken with 4kB blocks (and possibly with some 
other non-standard block sizes).


The standard 8kB blocks seem to be unaffected, as I've done the same 
test on 8kB blocks many times and it never triggered this error.


FWIW the tests were done on bfcd07b4, so fairly recent code.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-12 Thread Magnus Hagander
On Sat, Nov 12, 2016 at 4:03 AM, Amit Kapila 
wrote:

> On Fri, Nov 11, 2016 at 3:01 PM, Magnus Hagander 
> wrote:
> >> > Based on this optimization we might want to keep the text that says
> >> > large
> >> > shared buffers on Windows aren't as effective perhaps,
>
> Sounds sensible, or we may add a line to say why it isn't as effective
> as on Linux.
>

Do we actually know *why*?



> > and just remove
> >> > the
> >> > sentence that explicitly says don't go over 512MB?
> >>
>
> Have we done any windows specific optimization since it was originally
> mentioned as 512MB, which indicates that we can remove it?  Are you
> saying that based on results in this thread?  If so, I think it is
> better to do a few more tests before changing it.
>

Well, that advice goes *way* back. Many things have changed since then -
just look at things like the updating of the stats target. For one
thing, the majority of users are now on 64-bit builds, not 32-bit. We
reserve the shared memory on startup, which was done to improve the odds
of mapping at a consistent address, but may have had side effects. And
*Windows itself* has changed in those 10 or so years.

I've heard it from other people as well, but this thread is definitely one
of them. I think the old truth about "never go above that" is invalid at
this point, but it may never have been valid. But I also don't think we
know what to put there instead, as a recommendation.



> >> Just removing the reference to the size would make users ask a question
> >> "What size is the effective upper limit?"
> >
> >
> > True, but that's a question for other platforms as well, isn't it?
> >
>
> Right, but for other platforms, the recommendation seems to be 25% of
> RAM, can we safely say that for Windows as well?  As per test results
> in this thread, it seems the read-write performance degrades when
> shared buffers are increased from 12.5% to 25%.  I think as the test
> was done only for a short duration, that degradation could be just
> run-to-run variation; that's why I suggested doing a few more tests.


We talk about 25%, but only up to a certain size, and it's suggested as a
starting point. The 25% value is probably good as a starting point, but
not as a "recommended setting". I'm fine with doing something similar for
Windows -- say "10-15% as a starting point, but you have to check with
your workload" kind of statement.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Indirect indexes

2016-11-12 Thread Andres Freund
Hi,

On 2016-11-01 01:43:31 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > I propose we introduce the concept of "indirect indexes".
> 
> This is a WIP non-functional patch for indirect indexes.  I've been
> distracted from working on it for some time already and will be off-line
> for half this month yet, but since this was discussed and seems to be
> considered a welcome idea, I am posting it for those who want to have a
> look at what I'm doing.

I see that this patch has a CF entry, but I'm unclear what a reviewer
ought to do with it in its current state?  There's a lot of stuff closer
to being committable in this fest...


Regards,

Andres




Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.

2016-11-12 Thread Tom Lane
Andres Freund  writes:
> On 2016-11-12 11:42:12 -0500, Tom Lane wrote:
>> Hm.  What about inventing an additional macro NO_INSTALLCHECK that
>> prevents pgxs.mk from generating an installcheck rule?

> That'd work. I'd also be ok with living with the warning.  I have to say
> I find it hard to be concerned about the cycles here. It's not like
> anybody complained about make check unconditionally creating a test
> installation, even if there's tests in a contrib module...

[ shrug... ]  The reason why the buildfarm doesn't do "make check" here is
precisely the extra cycles it would take.  Those add up over hundreds of
runs, particularly on slow machines.  So I'm not excited about running
completely useless operations.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.

2016-11-12 Thread Andres Freund
On 2016-11-12 11:42:12 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-11-12 11:30:42 -0500, Tom Lane wrote:
> >> which is a rather blatant waste of cycles. I would suggest an explicit
> >> do-nothing installcheck rule rather than the hack you came up with here.
> 
> > I had that at first, but that generates a warning about overwriting the
> > makefile target - which afaics cannot be fixed.
> 
> Hm.  What about inventing an additional macro NO_INSTALLCHECK that
> prevents pgxs.mk from generating an installcheck rule?

That'd work. I'd also be ok with living with the warning.  I have to say
I find it hard to be concerned about the cycles here. It's not like
anybody complained about make check unconditionally creating a test
installation, even if there's tests in a contrib module...

Andres




Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.

2016-11-12 Thread Tom Lane
Andres Freund  writes:
> On 2016-11-12 11:30:42 -0500, Tom Lane wrote:
>> which is a rather blatant waste of cycles. I would suggest an explicit
>> do-nothing installcheck rule rather than the hack you came up with here.

> I had that at first, but that generates a warning about overwriting the
> makefile target - which afaics cannot be fixed.

Hm.  What about inventing an additional macro NO_INSTALLCHECK that
prevents pgxs.mk from generating an installcheck rule?  It's not
like we don't have similar issues elsewhere, eg contrib/sepgsql.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.

2016-11-12 Thread Andres Freund
On 2016-11-12 11:30:42 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Committed after simplifying the Makefile.
> 
> While I have no particular objection to adding these tests, the
> commit log's claim that this will improve buildfarm testing is
> quite wrong.  The buildfarm runs "make installcheck" not
> "make check" in contrib.

Gah.  It was more aimed at the coverage stuff, but that might work the
same. Alvaro?

> which is a rather blatant waste of cycles. I would suggest an explicit
> do-nothing installcheck rule rather than the hack you came up with here.

I had that at first, but that generates a warning about overwriting the
makefile target - which afaics cannot be fixed.

Greetings,

Andres Freund




Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.

2016-11-12 Thread Tom Lane
Andres Freund  writes:
> Committed after simplifying the Makefile.

While I have no particular objection to adding these tests, the
commit log's claim that this will improve buildfarm testing is
quite wrong.  The buildfarm runs "make installcheck" not
"make check" in contrib.  What I'm actually seeing in the
buildfarm reports is

make -C pg_stat_statements installcheck
make -C ../../src/test/regress pg_regress
make -C ../../../src/port all
make -C ../backend submake-errcodes
make[4]: Nothing to be done for `submake-errcodes'.
make -C ../../../src/common all
make -C ../backend submake-errcodes
make[4]: Nothing to be done for `submake-errcodes'.
../../src/test/regress/pg_regress --inputdir=. 
--bindir='/Users/buildfarm/bf-data/HEAD/inst/bin'   --port=5678 --temp-config 
../../contrib/pg_stat_statements/pg_stat_statements.conf 
--dbname=contrib_regression_pg_stat_statements 
(using postmaster on Unix socket, port 5678)
== dropping database "contrib_regression_pg_stat_statements" 
==
NOTICE:  database "contrib_regression_pg_stat_statements" does not exist, 
skipping
DROP DATABASE
== creating database "contrib_regression_pg_stat_statements" 
==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==

=
 All 0 tests passed. 
=

which is a rather blatant waste of cycles.  I would suggest an explicit
do-nothing installcheck rule rather than the hack you came up with here.

regards, tom lane




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-11-12 Thread Andres Freund
Hi,

On 2016-11-01 05:14:58 +0530, Abhijit Menon-Sen wrote:
> Subject: Convert recovery.conf settings to GUCs

I really want to get this in, we've been back and forth about this for
far too long.

> 
>- Create a recovery command file recovery.conf in the cluster
>- data directory (see ). You might
>+ Set up recovery parameters in postgresql.conf (see
>+  and
>+ ).
>+
>+   
>+   
>+
>+ Create a recovery trigger file recovery.trigger
>+ in the cluster data directory. You might
>  also want to temporarily modify pg_hba.conf to prevent
>  ordinary users from connecting until you are sure the recovery was 
> successful.
> 

I think this means we need to rephrase a number of docs and code
references to trigger files, because they'll currently often refer to
the promotion trigger, no?


> diff --git a/src/backend/access/transam/recovery.conf.sample 
> b/src/backend/access/transam/recovery.conf.sample
> index 7a16751..15ce784 100644
> --- a/src/backend/access/transam/recovery.conf.sample
> +++ b/src/backend/access/transam/recovery.conf.sample
> @@ -2,23 +2,14 @@
>  # PostgreSQL recovery config file
>  # ---
>  #
> -# Edit this file to provide the parameters that PostgreSQL needs to
> -# perform an archive recovery of a database, or to act as a replication
> -# standby.
> +# PostgreSQL 10.0 or later, recovery.conf is no longer used. Instead,
> +# specify all recovery parameters in postgresql.conf and create
> +# recovery.trigger to enter archive recovery or standby mode.
>  #
> -# If "recovery.conf" is present in the PostgreSQL data directory, it is
> -# read on postmaster startup.  After successful recovery, it is renamed
> -# to "recovery.done" to ensure that we do not accidentally re-enter
> -# archive recovery or standby mode.
> +# If you must use recovery.conf, specify "include directives" in
> +# postgresql.conf like this:
>  #
> -# This file consists of lines of the form:
> -#
> -#   name = value
> -#
> -# Comments are introduced with '#'.
> -#
> -# The complete list of option names and allowed values can be found
> -# in the PostgreSQL documentation.
> +#   include 'recovery.conf'

This should probably warn about the differences in "trigger" behaviour
of recovery.conf being present.

>  #---
>  # ARCHIVE RECOVERY PARAMETERS
> @@ -131,14 +122,6 @@
>  #
>  #primary_slot_name = ''

I don't think it makes much sense to still list all this?


> +bool standby_mode = false;

I'm very tempted to rename this during the move to GUCs.


> +char       *primary_conninfo = NULL;
> +char       *primary_slot_name = NULL;

Slightly less so, but still tempted to also rename these. They're not
actually necessarily pointing towards a primary, and namespace-wise
they're not grouped with recovery_*, which has become more important now
that recovery.conf isn't a separate namespace anymore.

> -static int   recovery_min_apply_delay = 0;
>  static TimestampTz recoveryDelayUntilTime;
>
> +static TimestampTz recoveryDelayUntilTime;
> +

Isn't this a repeated definition?

>  /*
>   * During normal operation, the only timeline we care about is 
> ThisTimeLineID.
>   * During recovery, however, things are more complicated.  To simplify life
> @@ -577,10 +578,10 @@ typedef struct XLogCtlData
>   TimeLineID  PrevTimeLineID;
>
>   /*
> -  * archiveCleanupCommand is read from recovery.conf but needs to be in
> -  * shared memory so that the checkpointer process can access it.
> +  * archive_cleanup_command is read from recovery.conf but needs to
> +  * be in shared memory so that the checkpointer process can access it.
>*/

References recovery.conf. It'd be a good idea to grep for all remaining
references to recovery.conf.


> +static bool
> +check_recovery_target(char **newval, void **extra, GucSource source)
> +{
> + RecoveryTargetType *myextra;
> + RecoveryTargetType rt = RECOVERY_TARGET_UNSET;
> +
> + if (strcmp(*newval, "xid") == 0)
> + rt = RECOVERY_TARGET_XID;
> + else if (strcmp(*newval, "time") == 0)
> + rt = RECOVERY_TARGET_TIME;
> + else if (strcmp(*newval, "name") == 0)
> + rt = RECOVERY_TARGET_NAME;
> + else if (strcmp(*newval, "lsn") == 0)
> + rt = RECOVERY_TARGET_LSN;
> + else if (strcmp(*newval, "immediate") == 0)
> + rt = RECOVERY_TARGET_IMMEDIATE;
> + else if (strcmp(*newval, "") != 0)
> + {
> + GUC_check_errdetail("recovery_target is not valid: \"%s\"", 
> *newval);
> + return false;
> + }
> +
> + myextra = (RecoveryTargetType *) guc_malloc(ERROR, 
> sizeof(RecoveryTargetType));
> + *myextra = rt;
> + *extra = (void *) myextra;
> +
> + return true;
> +}

Shouldn't this instead be an enum type GUC?
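
For comparison, an enum GUC would collapse the hand-written strcmp chain and check hook into an options table validated by guc.c's enum machinery. A standalone sketch (the table shape mirrors config_enum_entry; `lookup_recovery_target` is illustrative, not the actual API):

```c
#include <string.h>

typedef enum RecoveryTargetType
{
    RECOVERY_TARGET_UNSET,
    RECOVERY_TARGET_XID,
    RECOVERY_TARGET_TIME,
    RECOVERY_TARGET_NAME,
    RECOVERY_TARGET_LSN,
    RECOVERY_TARGET_IMMEDIATE
} RecoveryTargetType;

/* Entries taken from the patch's check hook; NULL terminates. */
typedef struct
{
    const char *name;
    int         val;
} enum_entry;

static const enum_entry recovery_target_options[] = {
    {"", RECOVERY_TARGET_UNSET},
    {"xid", RECOVERY_TARGET_XID},
    {"time", RECOVERY_TARGET_TIME},
    {"name", RECOVERY_TARGET_NAME},
    {"lsn", RECOVERY_TARGET_LSN},
    {"immediate", RECOVERY_TARGET_IMMEDIATE},
    {NULL, 0}
};

/* Walk the table; -1 means an invalid setting (guc.c would report
 * the valid options automatically). */
static int
lookup_recovery_target(const char *value)
{
    const enum_entry *e;

    for (e = recovery_target_options; e->name != NULL; e++)
        if (strcmp(e->name, value) == 0)
            return e->val;
    return -1;
}
```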

> +static bool
> +check_recovery_target_time(char **newval, void **extra, GucSource source)
> +{
> + 

Re: [HACKERS] move collation import to backend

2016-11-12 Thread Andres Freund
Hi,

On 2016-10-27 21:56:53 -0400, Peter Eisentraut wrote:
> Currently, initdb parses locale -a output to populate pg_collation.  If
> additional collations are installed in the operating system, it is not
> possible to repeat this process, only by doing each step manually.  So I
> propose to move this to a backend function that can be called
> separately, and have initdb call that.  Running this logic in the
> backend instead of initdb also makes the code simpler.  If we add other
> collation providers such as ICU, initdb doesn't need to know about that
> at all, because all the logic would be contained in the backend.

That generally sounds like a good idea.  There are some questions, imo:
e.g., what if previously present collations are now unavailable?

> I thought about making this a top-level command (IMPORT COLLATIONS ...
> ?) but decided against it for now, to keep it simple.

Seems ok to me.

>  
>   /*
>* Also forbid matching an any-encoding entry.  This test of course is 
> not
>* backed up by the unique index, but it's not a problem since we don't
>* support adding any-encoding entries after initdb.
>*/

Note that this isn't true anymore...

> +
> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> +
> +Datum
> +pg_import_system_collations(PG_FUNCTION_ARGS)
> +{

Uh?

> +	bool		if_not_exists = PG_GETARG_BOOL(0);
> +	Oid			nspid = PG_GETARG_OID(1);
> +
> +	FILE	   *locale_a_handle;
> +	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
> +	int			count = 0;
> +
> + locale_a_handle = OpenPipeStream("locale -a", "r");
> + if (locale_a_handle == NULL)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not execute command \"%s\": %m",
> + "locale -a")));

This function needs to have !superuser permissions revoked, which it
afaics currently hasn't.


Greetings,

Andres Freund




Re: [HACKERS] Pinning a buffer in TupleTableSlot is unnecessary

2016-11-12 Thread Andres Freund
On 2016-08-30 07:38:10 -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > While profiling some queries and looking at executor overhead, I
> > realized that we're not making much use of TupleTableSlot's ability to
> > hold a buffer pin. In a SeqScan, the buffer is held pinned by the
> > underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.
>
> I think this is probably wrong, or at least very dangerous to remove.
> The reason for the feature is that the slot may continue to point at
> the tuple after the scan has moved on.

FWIW, that's not safe to assume in upper layers *anyway*. If you want to
do that, the slot has to be materialized, and that'd make a local
copy. If you don't materialize, tts_values/isnull can point into random
old memory (common e.g. for projections and virtual tuples in general).

Andres




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-12 Thread Dilip Kumar
On Sat, Nov 12, 2016 at 10:16 AM, Michael Paquier
 wrote:
>> - For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
>>   pg_dump (because -N is already taken there).
>>   I'd opt for either using the same short option for both or (IMO better)
>>   only offering a long option for both.
>
> Okay. For consistency's sake let's do that. I was a bit hesitant
> regarding that to be honest.

It seems you have missed removing -N in some places, as mentioned below.

1.
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -365,6 +365,21 @@ PostgreSQL documentation
  

  
+  -N

@@ -543,6 +557,7 @@ help(void)


2.
  printf(_("\nGeneral options:\n"));
  printf(_("  -f, --file=FILENAME  output file name\n"));
+ printf(_("  -N, --no-sync            do not wait for changes to
be written safely to disk\n"));

3.
- while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx",
long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "acd:f:gh:l:NoOp:rsS:tU:vwWx",
long_options, &optindex)) != -1)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Logical decoding timeline following take II

2016-11-12 Thread Andres Freund
On 2016-10-24 17:49:13 +0200, Petr Jelinek wrote:
> + * Determine which timeline to read an xlog page from and set the
> + * XLogReaderState's state->currTLI to that timeline ID.

"XLogReaderState's state->currTLI" - the state's a bit redundant.

> + * The caller must also make sure it doesn't read past the current redo 
> pointer
> + * so it doesn't fail to notice that the current timeline became historical.
> + */

Not sure what that means? The redo pointer usually means the "logical
start" of the last checkpoint, but I don't see how that could be meant
here?

> +static void
> +XLogReadDetermineTimeline(XLogReaderState *state, XLogRecPtr wantPage, 
> uint32 wantLength)
> +{
> + const XLogRecPtr lastReadPage = state->readSegNo * XLogSegSize + 
> state->readOff;
> +
> + elog(DEBUG4, "Determining timeline for read at %X/%X+%X",
> + (uint32)(wantPage>>32), (uint32)wantPage, wantLength);

Doing this on every single read from a page seems quite verbose.   It's
also (like most or all the following debug elogs) violating the casing
prescribed by the message style guidelines.


> + /*
> +  * If the desired page is currently read in and valid, we have nothing 
> to do.
> +  *
> +  * The caller should've ensured that it didn't previously advance 
> readOff
> +  * past the valid limit of this timeline, so it doesn't matter if the 
> current
> +  * TLI has since become historical.
> +  */
> + if (lastReadPage == wantPage &&
> + state->readLen != 0 &&
> + lastReadPage + state->readLen >= wantPage + 
> Min(wantLength,XLOG_BLCKSZ-1))
> + {
> + elog(DEBUG4, "Wanted data already valid"); //XXX
> + return;
> + }

With that kind of comment/XXX present, this surely can't be ready for
committer?
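
For what it's worth, the condition being tested reduces to an interval-containment check — the wanted bytes must lie inside the bytes already read. A standalone restatement with simplified types (ignoring the Min(wantLength, XLOG_BLCKSZ-1) clamp):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [want_page, want_page + want_length) falls within the
 * bytes already read, [last_read_page, last_read_page + read_len),
 * and the cached read starts at the wanted page. */
static bool
page_already_read(uint64_t last_read_page, uint32_t read_len,
                  uint64_t want_page, uint32_t want_length)
{
    return last_read_page == want_page &&
        read_len != 0 &&
        last_read_page + read_len >= want_page + want_length;
}
```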


> + /*
> +  * If we're reading from the current timeline, it hasn't become 
> historical
> +  * and the page we're reading is after the last page read, we can again
> +  * just carry on. (Seeking backwards requires a check to make sure the 
> older
> +  * page isn't on a prior timeline).
> +  */

How can ThisTimeLineID become historical?

> + if (state->currTLI == ThisTimeLineID && wantPage >= lastReadPage)
> + {
> + Assert(state->currTLIValidUntil == InvalidXLogRecPtr);
> + elog(DEBUG4, "On current timeline");
> + return;
> + }

Also, is it actually ok to rely on ThisTimeLineID here? That's IIRC not
maintained outside of the startup process, until recovery ended (cf it
being set in InitXLOGAccess() called via RecoveryInProgress()).


>   /*
> -  * TODO: we're going to have to do something more intelligent 
> about
> -  * timelines on standbys. Use readTimeLineHistory() and
> -  * tliOfPointInHistory() to get the proper LSN? For now we'll 
> catch
> -  * that case earlier, but the code and TODO is left in here for 
> when
> -  * that changes.
> +  * Check which timeline to get the record from.
> +  *
> +  * We have to do it each time through the loop because if we're 
> in
> +  * recovery as a cascading standby, the current timeline 
> might've
> +  * become historical.
>*/

I guess you mean cascading as "replicating logically from a physical
standby"? I don't think it's good to use cascading here, because that's
usually thought to mean doing physical SR from a standby...



> diff --git a/src/backend/replication/logical/logicalfuncs.c 
> b/src/backend/replication/logical/logicalfuncs.c
> index 318726e..4315fb3 100644
> --- a/src/backend/replication/logical/logicalfuncs.c
> +++ b/src/backend/replication/logical/logicalfuncs.c
> @@ -234,12 +234,6 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo 
> fcinfo, bool confirm, bool bin
>   rsinfo->setResult = p->tupstore;
>   rsinfo->setDesc = p->tupdesc;
>  
> - /* compute the current end-of-wal */
> - if (!RecoveryInProgress())
> - end_of_wal = GetFlushRecPtr();
> - else
> - end_of_wal = GetXLogReplayRecPtr(NULL);
> -
>   ReplicationSlotAcquire(NameStr(*name));
>  
>   PG_TRY();
> @@ -279,6 +273,12 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo 
> fcinfo, bool confirm, bool bin
>   /* invalidate non-timetravel entries */
>   InvalidateSystemCaches();
>  
> + if (!RecoveryInProgress())
> + end_of_wal = GetFlushRecPtr();
> + else
> + end_of_wal = GetXLogReplayRecPtr(NULL);
> +
> + /* Decode until we run out of records */
>   while ((startptr != InvalidXLogRecPtr && startptr < end_of_wal) 
> ||
>  (ctx->reader->EndRecPtr != InvalidXLogRecPtr && 
> ctx->reader->EndRecPtr < end_of_wal))
>   {

That seems like a pretty random change?


> diff --git 

Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-12 Thread Gilles Darold
Le 09/11/2016 à 02:51, Robert Haas a écrit :
> On Thu, Nov 3, 2016 at 7:34 PM, Karl O. Pinc  wrote:
>> To me, the CURRENT_LOG_FILENAME symbol should contain the name
>> of the current log file.  It does not.  The CURRENT_LOG_FILENAME symbol
>> holds the name of the file which itself contains the name of
>> the log file(s) being written, plus the log file
>> structure of each log file.
> I agree that this is confusing.
>
>> IMO, the name of the log files being written, as well as
>> the type of data structure written into each log file,
>> are meta-information about the logging data.  So maybe
>> the right name is LOG_METAINFO_DATAFILE.
> +1 for something like this.
>

Ok, it will be changed it in next patch.

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.

2016-11-12 Thread Andres Freund
Hi,

On 2016-11-08 10:25:11 +0530, Amit Kapila wrote:
>  ifdef USE_PGXS
>  PG_CONFIG = pg_config
>  PGXS := $(shell $(PG_CONFIG) --pgxs)
> @@ -21,3 +25,29 @@ top_builddir = ../..
>  include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
> +
> +# Disabled because these tests require 
> "shared_preload_libraries=pg_stat_statements",
> +# which typical installcheck users do not have (e.g. buildfarm clients).

> +installcheck:;
> +
> +check: regresscheck
> +
> +submake-regress:
> + $(MAKE) -C $(top_builddir)/src/test/regress all
> +
> +submake-pg_stat_statements:
> + $(MAKE) -C $(top_builddir)/contrib/pg_stat_statements

Why is this a submake? That seems to make little sense?   But stepping
back one step further: I don't think we need all the remade rules in the
first place.

REGRESS_OPTS = --temp-config 
$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
and then
installcheck: REGRESS=
to prevent installcheck from doing anything ought to be enough these days.

Committed after simplifying the Makefile.

We can't simplify test_decoding's makefile to that extent, because it
uses isolationtester, which we don't provide for contrib yet...

Andres




Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-12 Thread Amit Kapila
On Thu, Nov 10, 2016 at 10:43 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
>> OK.  I agree that's a problem.  However, your patch adds zero new comment
>> text while removing some existing comments, so I can't easily tell how it
>> solves that problem or whether it does so correctly.  Even if I were smart
>> enough to figure it out, I wouldn't want to rely on the next person also
>> being that smart.  This is obviously a subtle problem in tricky code, so
>> a clear explanation of the fix seems like a very good idea.
>
> The comment describes what the code is trying to achieve.  Actually, I just 
> imitated the code and comment of later major releases.  The only difference 
> between later releases and my patch (for 9.2) is whether the state is stored 
> in XLogReaderStruct or as global variables.  Below is the comment from 9.6, 
> where the second paragraph describes what the two nested if conditions mean.  
> The removed comment lines are what became irrelevant, which is also not 
> present in later major releases.
>
> /*
>  * Since child timelines are always assigned a TLI greater than their
>  * immediate parent's TLI, we should never see TLI go backwards across
>  * successive pages of a consistent WAL sequence.
>  *
>  * Sometimes we re-read a segment that's already been (partially) read.
>  * So we only verify TLIs for pages that are later than the last
>  * remembered LSN.
>  */
>

I think the changes you are referring to were done as part of
commit 7fcbf6a405ffc12a4546a25b98592ee6733783fc.  There is no mention
of such a bug fix in that commit; however, it is quite possible that
such a change has fixed the problem you have reported.  It is not
clear if we can directly copy that change, and it seems to me the
change copied is also not complete.  It looks like the code in 9.3 and
later versions uses recptr as the target segment location
(targetSegmentPtr), whereas 9.2 uses recptr as the beginning of the
segment (readOff = 0;).  If the above understanding is right, then it
will set different values for latestPagePtr in 9.2 and in 9.3 onwards.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-12 Thread Andres Freund
Hi,

On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:

> + * This takes also
> + * advantage to avoid 8-byte torn reads on some platforms by using the
> + * fact that each insert lock is located on the same cache line.

Something residing on the same cache line doesn't provide that guarantee
on all platforms.


> + * XXX: There is still room for more improvements here, particularly
> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
> + * update the progress LSN as those relations are reset during crash
> + * recovery so enforcing buffers of such relations to be flushed for
> + * example in the case of a load only on unlogged relations is a waste
> + * of disk write.

Commit records still have to be written; everything else doesn't write
WAL. So I'm doubtful this matters much?

> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
>   inserted = true;
>   }
>  
> + /*
> +  * Update the LSN progress positions. At least one WAL insertion lock
> +  * is already taken appropriately before doing that, and it is simpler
> +  * to do that here when the WAL record data and type are at hand.

But we don't use the "WAL record data and type"?


> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
> + * other words any activity not referring to standby logging or segment
> + * switches. Finding the last activity position is done by scanning each
> + * WAL insertion lock by taking directly the light-weight lock associated
> + * to it.
> + */

I'd prefer not to list the specific records here - that's just
guaranteed to get out of date. Why not say something like "any activity
not requiring a checkpoint to be triggered" or such?


> +  * If this isn't a shutdown or forced checkpoint, and if there has been no
> +  * WAL activity, skip the checkpoint.  The idea here is to avoid inserting
> +  * duplicate checkpoints when the system is idle. That wastes log space,
> +  * and more importantly it exposes us to possible loss of both current and
> +  * previous checkpoint records if the machine crashes just as we're writing
> +  * the update.

Shouldn't this mention archiving, and also that we ignore some forms
of WAL activity?

> -/* Should the in-progress insertion log the origin? */
> -static bool include_origin = false;
> +/* status flags of the in-progress insertion */
> +static uint8 status_flags = 0;

That seems a bit too generic a name. 'curinsert_flags'?

>  /*

> @@ -317,19 +317,23 @@ BackgroundWriterMain(void)
>   {
>   TimestampTz timeout = 0;
>   TimestampTz now = GetCurrentTimestamp();
> + XLogRecPtr  current_progress_lsn = GetProgressRecPtr();
>  
>   timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
>   LOG_SNAPSHOT_INTERVAL_MS);
>  
>   /*
> -  * only log if enough time has passed and some xlog record has
> -  * been inserted.
> +  * Only log if enough time has passed, that some WAL activity
> +  * has happened since last checkpoint, and that some new WAL
> +  * records have been inserted since the last time we came here.

I think that sentence needs some polish.


>*/
>   if (now >= timeout &&
> - last_snapshot_lsn != GetXLogInsertRecPtr())
> + GetLastCheckpointRecPtr() < current_progress_lsn &&
> + last_progress_lsn < current_progress_lsn)
>   {

Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
Don't we need to do the comparisons here (and when doing the checkpoint
itself) with the REDO pointer of the last checkpoint?


> diff --git a/src/backend/postmaster/checkpointer.c 
> b/src/backend/postmaster/checkpointer.c
> index 397267c..7ecc00e 100644
> --- a/src/backend/postmaster/checkpointer.c
> +++ b/src/backend/postmaster/checkpointer.c
> @@ -164,6 +164,7 @@ static double ckpt_cached_elapsed;
>  
>  static pg_time_t last_checkpoint_time;
>  static pg_time_t last_xlog_switch_time;
> +static XLogRecPtr last_xlog_switch_lsn = InvalidXLogRecPtr;

Hm. Is it a good idea to use a static for this? Did you consider
checkpointer restarts?


Greetings,

Andres Freund




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-12 Thread Michael Paquier
On Sat, Nov 12, 2016 at 1:46 PM, Michael Paquier  wrote:
> On Fri, Nov 11, 2016 at 11:03 PM, Albe Laurenz  wrote:
>>   This would avoid confusion, and we expect that few people will want to use
>>   this option anyway, right?
>
> Definitely a good point.

Meh. I forgot docs and --help output updates.
-- 
Michael
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 371a614..dcad095 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -815,6 +815,20 @@ PostgreSQL documentation
   </varlistentry>
 
   <varlistentry>
+   <term><option>--no-sync</option></term>
+   <listitem>
+    <para>
+     By default, <application>pg_dump</application> will wait for all files
+     to be written safely to disk.  This option causes
+     <application>pg_dump</application> to return without waiting, which is
+     faster, but means that a subsequent operating system crash can leave
+     the dump corrupt.  Generally, this option is useful for testing
+     but should not be used when dumping data from production installation.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
    <term><option>--quote-all-identifiers</option></term>
    <listitem>

diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 97168a0..70c5332 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -342,6 +342,20 @@ PostgreSQL documentation
   </varlistentry>
 
   <varlistentry>
+   <term><option>--no-sync</option></term>
+   <listitem>
+    <para>
+     By default, <application>pg_dumpall</application> will wait for all files
+     to be written safely to disk.  This option causes
+     <application>pg_dumpall</application> to return without waiting, which is
+     faster, but means that a subsequent operating system crash can leave
+     the dump corrupt.  Generally, this option is useful for testing
+     but should not be used when dumping data from production installation.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
    <term><option>--no-tablespaces</option></term>
    <listitem>

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0a28124..becbc56 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -268,7 +268,7 @@ extern Archive *OpenArchive(const char *FileSpec, const ArchiveFormat fmt);
 
 /* Create a new archive */
 extern Archive *CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
-			  const int compression, ArchiveMode mode,
+			  const int compression, bool dosync, ArchiveMode mode,
 			  SetupWorkerPtr setupDumpWorker);
 
 /* The --list option */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 0e20985..5b60da6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -56,7 +56,8 @@ static const char *modulename = gettext_noop("archiver");
 
 
 static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
-	 const int compression, ArchiveMode mode, SetupWorkerPtr setupWorkerPtr);
+	 const int compression, bool dosync, ArchiveMode mode,
+	 SetupWorkerPtr setupWorkerPtr);
 static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
 	  ArchiveHandle *AH);
 static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData, bool acl_pass);
@@ -202,10 +203,12 @@ setupRestoreWorker(Archive *AHX)
 /* Public */
 Archive *
 CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
-	 const int compression, ArchiveMode mode, SetupWorkerPtr setupDumpWorker)
+			  const int compression, bool dosync, ArchiveMode mode,
+			  SetupWorkerPtr setupDumpWorker)
 
 {
-	ArchiveHandle *AH = _allocAH(FileSpec, fmt, compression, mode, setupDumpWorker);
+	ArchiveHandle *AH = _allocAH(FileSpec, fmt, compression, dosync,
+ mode, setupDumpWorker);
 
 	return (Archive *) AH;
 }
@@ -215,7 +218,7 @@ CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
 Archive *
 OpenArchive(const char *FileSpec, const ArchiveFormat fmt)
 {
-	ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, archModeRead, setupRestoreWorker);
+	ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, setupRestoreWorker);
 
 	return (Archive *) AH;
 }
@@ -2223,7 +2226,8 @@ _discoverArchiveFormat(ArchiveHandle *AH)
  */
 static ArchiveHandle *
 _allocAH(const char *FileSpec, const ArchiveFormat fmt,
-	  const int compression, ArchiveMode mode, SetupWorkerPtr setupWorkerPtr)
+		 const int compression, bool dosync, ArchiveMode mode,
+		 SetupWorkerPtr setupWorkerPtr)
 {
 	ArchiveHandle *AH;
 
@@ -2277,6 +2281,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 
 	AH->mode = mode;
 	AH->compression = compression;
+	AH->dosync = dosync;
 
 	memset(&(AH->sqlparse), 0, sizeof(AH->sqlparse));
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 50cf452..4492542 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -307,6 +307,7 @@ struct _archiveHandle
  * values for compression: -1
  * Z_DEFAULT_COMPRESSION 0	COMPRESSION_NONE
  * 1-9 levels for gzip compression */
+	bool		dosync;			/* 

Re: [HACKERS] Shared memory estimation for postgres

2016-11-12 Thread Andres Freund
On 2016-11-11 09:56:21 +0900, Michael Paquier wrote:
> On Fri, Nov 11, 2016 at 8:40 AM, leoaaryan  wrote:
> > The easiest way to find the value for the shared memory computation is to
> > change the logging level to DEBUG3 and start postgres DB engine and it will
> > give the calculated value in the log file.
> >
> > I believe postgres as a DB needs to be running for any extension to run (I
> > may be wrong here) and I dont want to start the database for this analysis.
> >
> > Please correct me if I'm wrong in my concepts or if I've not understood
> > anything.
> 
> Some time ago a patch has been proposed to project to a system view
> the shared memory allocations currently happening on an instance:
> https://www.postgresql.org/message-id/20140504114417.gm12...@awork2.anarazel.de
> This could be plugged into its own extension rather easily. Note
> though that DSM segments (dynamic shared memory) do not have names
> associated to them which would be helpful if you'd like to track that
> as well. But perhaps you don't care much.

FWIW, there's a number of purposes where it'd be more useful to have the
total amount of shared memory output without starting postgres. E.g. to
compute the amount of hugepages to configure as root... I wondered if we
could have a readonly guc that's output by something like -C
total_shared_memory or such (which doesn't start a server...).

Andres

