Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-02-27 Thread Masahiko Sawada
On Wed, Feb 15, 2017 at 3:11 PM, Masahiko Sawada  wrote:
> On Mon, Feb 6, 2017 at 10:48 PM, Masahiko Sawada  
> wrote:
>> On Wed, Feb 1, 2017 at 8:25 PM, Robert Haas  wrote:
>>> On Mon, Jan 30, 2017 at 2:30 AM, Masahiko Sawada  
>>> wrote:
 "txn" can be used for abbreviation of "Transaction", so for example
 pg_fdw_txn_resolver?
 I'm also fine to change the module and function name.
>>>
>>> If we're judging the relative clarity of various ways of abbreviating
>>> the word "transaction", "txn" surely beats "x".
>>>
>>> To repeat my usual refrain, is there any merit to abbreviating at all?
>>>  Could we call it, say, "fdw_transaction_resolver" or
>>> "fdw_transaction_manager"?
>>>
>>
>> Almost all modules in contrib are named with the "pg_" prefix, but I
>> prefer "fdw_transaction_resolver" if we don't need the "pg_" prefix.
>>
>
> Since the previous patches conflicted with current HEAD, I've attached
> the latest version of the patches.
> Please review them.
>

I've created a wiki page[1] describing the design and functionality of
this feature. It also has some example use cases, so the page should be
helpful for testing as well. Please refer to it if you're interested in
testing this feature.

[1] 2PC on FDW


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

*preliminary* patch attached. This needs a good bit of polishing
(primarily comment work, verifying that valgrind works), but I'm too
tired now.

I'm not quite sure how to deal with mmgr/README - it's written as kind
of a historical document, and the "Mechanisms to Allow Multiple Types of
Contexts" is already quite out of date.  I think it'd be good to rip out
all the historical references and just describe the current state, but
I'm not really enthusiastic about tackling that :/

On 2017-02-28 00:29:44 -0500, Tom Lane wrote:
> Not sure either, but suggest we add a StaticAssert asserting there's no
> padding; something along the lines of
>   offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) == 
> MAXALIGN(sizeof(AllocSetChunkHeader))

Included.

With Craig's help I've verified that the patch as attached works on a
platform that's currently failing. Thanks!

- Andres
From a59c3200dd127feb0cb09a055250ff6401aee1aa Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 27 Feb 2017 23:32:22 -0800
Subject: [PATCH] Remove StandardChunkHeader for Slab's benefit.

---
 src/backend/utils/mmgr/aset.c|  51 -
 src/backend/utils/mmgr/mcxt.c| 121 +++
 src/backend/utils/mmgr/slab.c|  81 --
 src/include/utils/memutils.h |  56 +++---
 src/tools/pgindent/typedefs.list |   1 -
 5 files changed, 92 insertions(+), 218 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 8056c00ae4..31228ade19 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -97,20 +97,7 @@
  */
 
 #define ALLOC_BLOCKHDRSZ	MAXALIGN(sizeof(AllocBlockData))
-#define ALLOC_CHUNKHDRSZ	MAXALIGN(sizeof(AllocChunkData))
-
-/* Portion of ALLOC_CHUNKHDRSZ examined outside aset.c. */
-#define ALLOC_CHUNK_PUBLIC	\
-	(offsetof(AllocChunkData, size) + sizeof(Size))
-
-/* Portion of ALLOC_CHUNKHDRSZ excluding trailing padding. */
-#ifdef MEMORY_CONTEXT_CHECKING
-#define ALLOC_CHUNK_USED	\
-	(offsetof(AllocChunkData, requested_size) + sizeof(Size))
-#else
-#define ALLOC_CHUNK_USED	\
-	(offsetof(AllocChunkData, size) + sizeof(Size))
-#endif
+#define ALLOC_CHUNKHDRSZ	sizeof(struct AllocChunkData)
 
 typedef struct AllocBlockData *AllocBlock;		/* forward reference */
 typedef struct AllocChunkData *AllocChunk;
@@ -169,20 +156,27 @@ typedef struct AllocBlockData
 /*
  * AllocChunk
  *		The prefix of each piece of memory in an AllocBlock
- *
- * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
  */
 typedef struct AllocChunkData
 {
-	/* aset is the owning aset if allocated, or the freelist link if free */
-	void	   *aset;
 	/* size is always the size of the usable space in the chunk */
 	Size		size;
+
 #ifdef MEMORY_CONTEXT_CHECKING
 	/* when debugging memory usage, also store actual requested size */
 	/* this is zero in a free chunk */
 	Size		requested_size;
+
+#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
+	Size		padding;
 #endif
+
+#endif /* MEMORY_CONTEXT_CHECKING */
+
+	/* aset is the owning aset if allocated, or the freelist link if free */
+	void	   *aset;
+
+	/* there must not be any padding to reach a MAXALIGN boundary here! */
 }	AllocChunkData;
 
 /*
@@ -334,6 +328,10 @@ AllocSetContextCreate(MemoryContext parent,
 {
 	AllocSet	set;
 
+	StaticAssertStmt(offsetof(AllocChunkData, aset) + sizeof(MemoryContext) ==
+	 MAXALIGN(sizeof(AllocChunkData)),
+	 "padding calculation in AllocChunkData is wrong");
+
 	/*
 	 * First, validate allocation parameters.  (If we're going to throw an
 	 * error, we should do so before the context is created, not after.)  We
@@ -616,13 +614,14 @@ AllocSetAlloc(MemoryContext context, Size size)
 		AllocAllocInfo(set, chunk);
 
 		/*
-		 * Chunk header public fields remain DEFINED.  The requested
-		 * allocation itself can be NOACCESS or UNDEFINED; our caller will
-		 * soon make it UNDEFINED.  Make extra space at the end of the chunk,
-		 * if any, NOACCESS.
+		 * Chunk's context fields remain DEFINED.  The requested allocation
+		 * itself can be NOACCESS or UNDEFINED; our caller will soon make it
+		 * UNDEFINED.  Make extra space at the end of the chunk, if any,
+		 * NOACCESS.
 		 */
-		VALGRIND_MAKE_MEM_NOACCESS((char *) chunk + ALLOC_CHUNK_PUBLIC,
-		 chunk_size + ALLOC_CHUNKHDRSZ - ALLOC_CHUNK_PUBLIC);
+		VALGRIND_MAKE_MEM_NOACCESS((char *) chunk,
+   chunk_size + ALLOC_CHUNKHDRSZ);
+		VALGRIND_MAKE_MEM_DEFINED(chunk->aset, sizeof(MemoryContext));
 
 		return AllocChunkGetPointer(chunk);
 	}
@@ -709,7 +708,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 chunk = (AllocChunk) (block->freeptr);
 
 /* Prepare to initialize the chunk header. */
-VALGRIND_MAKE_MEM_UNDEFINED(chunk, ALLOC_CHUNK_USED);
+VALGRIND_MAKE_MEM_UNDEFINED(chunk, sizeof(AllocChunkData));
 
 block->freeptr += (availchunk + ALLOC_CHUNKHDRSZ);
 availspace -= (availchunk + ALLOC_CHUNKHDRSZ);
@@ -799,7 

Re: [HACKERS] Report the number of skipped frozen pages by manual VACUUM

2017-02-27 Thread Masahiko Sawada
On Fri, Feb 24, 2017 at 1:30 AM, Masahiko Sawada  wrote:
> Hi all,
>
> Autovacuum reports the number of skipped frozen pages in the VACUUM
> output, but this information does not appear for a manual VACUUM. It is
> useful for the user to check the efficiency of VACUUM.
>
> The attached patch adds this information to the VACUUM output.
>
> * Example
> =# VACUUM VERBOSE test
> INFO:  vacuuming "public.test"
> INFO:  "test": found 0 removable, 56 nonremovable row versions in 1
> out of 45 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> Skipped 0 pages due to buffer pins, 44 frozen pages.
> 0 pages are entirely empty.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> VACUUM
>
> I'll register it to next CF.
>

Added this patch to CF 2017-03.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Statement-level rollback

2017-02-27 Thread Tsunakawa, Takayuki
Hello,

As I stated here and at the PGConf.ASIA developer meeting last year, I'd like 
to propose a statement-level rollback feature.  To repeat myself, this is 
requested by users migrating from other DBMSs to PostgreSQL.  They expect 
that a failure of one SQL statement should not abort the entire transaction, 
and that their apps (client programs and stored procedures) can continue the 
transaction with a different SQL statement.


SPECIFICATION
==

START TRANSACTION ROLLBACK SCOPE { TRANSACTION | STATEMENT };

This syntax controls the behavior of the transaction when an SQL statement 
fails.  TRANSACTION (default) is the traditional behavior (i.e. rolls back the 
entire transaction or subtransaction).  STATEMENT rolls back the failed SQL 
statement.

Just like the isolation level and access mode, a 
default_transaction_rollback_scope GUC variable is also available.


DESIGN
==

Nothing much to talk about... it merely creates a savepoint before each 
statement executes and destroys it after the statement finishes.  This is done 
in postgres.c for top-level SQL statements.

Stored functions haven't been handled yet; I'll submit a revised patch soon.
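
To illustrate the idea, here is a hedged sketch (not the patch itself) of the 
per-statement savepoint, expressed with the internal subtransaction API that 
PL/pgSQL's exception blocks use.  BeginInternalSubTransaction(), 
ReleaseCurrentSubTransaction() and RollbackAndReleaseCurrentSubTransaction() 
are real xact.c entry points; the wrapper and exec_one_statement() are 
hypothetical names:

/* Hedged sketch: wrap one top-level statement in an implicit savepoint. */
static void
exec_statement_with_rollback_scope(const char *query_string)
{
    MemoryContext oldcontext = CurrentMemoryContext;

    BeginInternalSubTransaction(NULL);      /* the per-statement savepoint */
    PG_TRY();
    {
        exec_one_statement(query_string);   /* hypothetical executor call */
        ReleaseCurrentSubTransaction();     /* success: discard the savepoint */
    }
    PG_CATCH();
    {
        /* undo only the failed statement; the transaction stays open */
        MemoryContextSwitchTo(oldcontext);
        RollbackAndReleaseCurrentSubTransaction();

        /*
         * A real implementation must report the error to the client without
         * letting the outer handler abort the whole transaction; re-throwing
         * is just the simplest way to end this sketch.
         */
        PG_RE_THROW();
    }
    PG_END_TRY();
}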


CONSIDERATIONS AND REQUESTS
==

The code for stored functions is not written yet, but I'd like your feedback 
on the specification and design based on the current patch.  I'll add this 
patch to CommitFest 2017-3.

The patch creates and destroys a savepoint for each message of the extended 
query protocol (Parse, Bind, Execute and Describe).  I'm afraid this will add 
significant overhead, but I don't see a better way, because those messages 
can be sent in an arbitrary order for different statements, e.g. Parse stmt1, 
Parse stmt2, Bind stmt1, Execute stmt1, Bind stmt2, Execute stmt2.


Regards
Takayuki Tsunakawa



stmt_rollback.patch
Description: stmt_rollback.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keep ECPG comment for log_min_duration_statement

2017-02-27 Thread Okano, Naoki
Michael Meskes wrote:
> > Michael Meskes wrote:
> > > The other option I can see, albeit without looking into details, is 
> > > allowing all comments and then testing it for the right syntax after 
> > > parsing. This could potentially also solve the above mentioned 
> > > option problem.
> >
> > This idea sounds great. But I am not sure that I can understand it 
> > correctly.
> >
> > I understood the concept of this idea as following. Is it right?
> > 1. The parser ignores comments/*...*/ as usual. That is, parser does 
> > not
> >   identify comments as a token.
>
> I guess it'd be easier to accept each comment as a token and then parse the 
> token 
> text afterwards.
>
> > 2. After parsing, we parse again only to extract comments.
>
> Not sure if we can do that without creating a lot of overhead.
I see. Based on your advice, I will try to make a patch. 
I will attach the patch when I finish it.

Regards,
Okano Naoki
Fujitsu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-27 Thread David Fetter
On Mon, Feb 27, 2017 at 11:53:17PM -0500, Stephen Frost wrote:
> * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> > I suppose it is for suggesting what kind of word should come
> > there, or for avoiding silence on a tab. Or for symmetry with other
> > types of manipulation, like DROP. Another possibility is creating
> > multiple objects with similar names, say CREATE TABLE employee_x1,
> > CREATE TABLE employee_x2. Just trying to complete an existing
> > *schema* is one more possible objective.
> 
> I don't buy any of these arguments either.  I *really* don't want us
> going down some road where we try to make sure that hitting 'tab'
> never fails...

Wouldn't that just be a correct, grammar-aware implementation of tab
completion?  Why wouldn't you want that?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-27 Thread Erik Rijkers

On 2017-02-27 15:08, Petr Jelinek wrote:


The performance was why in the original patch I wanted the apply process to
default to synchronous_commit = off, as without it the apply performance
(due to applying transactions individually and in sequence) is quite
lackluster.

It can be worked around by using a user that has synchronous_commit = off
set via ALTER ROLE as the owner of the subscription.



Wow, that's a huge difference in speed.

I set
   ALTER ROLE aardvark SET synchronous_commit = off;

during the first iteration of a 10x pgbench-test (so the first was still 
done with it 'on'):

Here are the pertinent grep | uniq -c lines:

-- out_20170228_0004.txt
 10 -- pgbench -c 16 -j 8 -T 900 -P 180 -n   --  scale 25
 10 -- All is well.
  1 -- 1325 seconds total.
  9 -- 5 seconds total.

And that 5 seconds is a hardcoded wait, so it's probably even quicker.

This is a slowish machine but that's a really spectacular difference.  
It's the difference between keeping up and getting lost.


Would you remind me why synchronous_commit = on was deemed the better 
default?  This thread isn't very clear about it (nor is the 'logical 
replication WIP' thread).



thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] timeouts in PostgresNode::psql

2017-02-27 Thread Michael Paquier
On Mon, Feb 27, 2017 at 11:28 AM, Craig Ringer  wrote:
> Amended patch attached after a few Perl-related comments I got on
> private mail.

Out of curiosity, what are they?

> Instead of
>
> $exc_save !~ /^$timeout_exception.*/
>
> I've updated to:
>
> $exc_save !~ /^\Q$timeout_exception\E/
>
> i.e. don't do an unnecessary wildcard match at the end, and disable
> metachar interpretation in the substituted range.
>
> Still needs applying to pg9.6 and pg10.

I did not understand at first what you meant, but after looking at the
commit message of the patch things are clear:
Newer Perl or IPC::Run versions default to appending the filename to string
exceptions, e.g. the exception
    psql timed out
is thrown as
    psql timed out at /usr/share/perl5/vendor_perl/IPC/Run.pm line 2961.

That's good to know. I can see as well that the patch is on the CF app:
https://commitfest.postgresql.org/13/
It has also been marked as ready for committer.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Radix tree for character conversion

2017-02-27 Thread Michael Paquier
On Mon, Feb 27, 2017 at 5:37 PM, Kyotaro HORIGUCHI
 wrote:
> At Wed, 22 Feb 2017 16:06:14 +0900, Michael Paquier 
>  wrote in 
> 
>> In order to conduct sanity checks on the shape of the radix tree maps
>> compared to the existing maps, having map_checker surely makes sense.
>> Now in the final result I don't think we need it. The existing map
>> files ought to be replaced by their radix versions at the end, and
>> map_checker should be removed. This leads to a couple of
>> simplifications, like Makefile, and reduces the maintenance to one
>> mechanism.
>
> Hmm.. Though I don't remember clearly what the radix map of the
> first version looked like, the current radix map seems
> human-readable to me. That might be from practice, or thanks to the
> additional comments in the map files. Anyway, I removed all of the
> stuff so as not to generate the plain maps. But I didn't change the
> names of the _radix.map files and just commented out the line that
> outputs the plain maps in UCS_to_*.pl.  Combined maps are still in the
> plain format, so print_tables was changed to take character tables
> separately for regular (non-combined) characters and combined characters.

Do others have thoughts to offer on the matter? I would think that the
new radix maps should just replace the old plain ones, and that the
only way to build the maps going forward is to use the new methods.
The radix tree is the only thing used in the backend code as well
(conv.c). We could keep the way to build the old maps, with the
map_checker, in a module out of the core code. FWIW, I am fine with
adding the old APIs to my plugin repository on github and having the
sanity checks in that as well, and of course also publishing on this
thread a module to do that.

>> Or if you want to go down the road of non-simple things, you
>> could have two arguments: an origin and a target. If one is
>> UTF8, the other is the mapping name.
>
> Mmmm. It seems (even) to me to do more harm than good.  I can
> think of two alternatives for this.
>
> - Split the property {direction} into two boolean properties
>   {to_unicode} and {from_unicode}.
>
> - Make the {direction} property an integer and compare it with
>   defined constants $BOTH, $TO_UNICODE and $FROM_UNICODE using
>   the '=' operator.
>
> I chose the former in this patch.

Fine for me.

>> +   $charmap{ ucs2utf($src) } = $dst;
>> +   }
>> +
>> +   }
>> Unnecessary newline here.
>
> removed in convutils.pm.
>
> Since the Makefile ignores old .map files, the steps to generate a
> patch for the map files were changed a bit.
>
> $ rm *.map
> $ make distclean maintainer-clean all
> $ make distclean
> $ git add .
> $ git commit

+# ignore generated files
+/map_checker
+/map_checker.h
[...]
+map_checker.h: make_mapchecker.pl $(MAPS) $(RADIXMAPS)
+   $(PERL) $<
+
+map_checker.o: map_checker.c map_checker.h ../char_converter.c
+
+map_checker: map_checker.o
With map_checker out of the game, those things are not needed.

+++ b/src/backend/utils/mb/char_converter.c
@@ -0,0 +1,116 @@
+/*-
+ *
+ *   Character converter function using radix tree
In the simplified version of the patch, pg_mb_radix_conv() is only
needed in conv.c, so I think it could just be a static local
routine.

-#include "../../Unicode/utf8_to_koi8r.map"
-#include "../../Unicode/koi8r_to_utf8.map"
-#include "../../Unicode/utf8_to_koi8u.map"
-#include "../../Unicode/koi8u_to_utf8.map"
+#include "../../Unicode/utf8_to_koi8r_radix.map"
+#include "../../Unicode/koi8r_to_utf8_radix.map"
+#include "../../Unicode/utf8_to_koi8u_radix.map"
+#include "../../Unicode/koi8u_to_utf8_radix.map"
FWIW, I am fine to use those new names as include points.

-distclean: clean
+distclean:
rm -f $(TEXTS)
-maintainer-clean: distclean
+# maintainer-clean intentionally leaves $(TEXTS)
+maintainer-clean:
Why is that? There is also a useless diff further down in that code block.

+conv.o: conv.c char_converter.c
This also can go away.

-print_tables("EUC_JIS_2004", \@all, 1);
+# print_tables("EUC_JIS_2004", \@regular, undef, 1);
+print_radix_trees("EUC_JIS_2004", \@regular);
+print_tables("EUC_JIS_2004", undef, \@combined, 1);
[...]
 sub print_tables
 {
-   my ($charset, $table, $verbose) = @_;
+   my ($charset, $regular, $combined, $verbose) = @_;
print_tables is only used for combined maps, you could remove $regular
from it and just keep $combined around, perhaps renaming print_tables
to print_combined_maps?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Amit Langote
Thanks for the review.  I was just about to send a new version of the patches.

On 2017/02/28 12:20, Robert Haas wrote:
> On Thu, Feb 23, 2017 at 11:19 AM, Amit Langote
>  wrote:
>> Thanks for the review.
> 
> In 0001, the documentation which you are patching has a section for
> limitations that apply to both partitioning and constraint
> exclusion, and another for limitations that apply only to constraint
> exclusion.  Surely the patch should be moving a limitation that will
> no longer apply to partitioning from the first section to the second
> section, rather than leaving it in the section for limitations that
> apply to both systems and just adding a note that says "this doesn't
> apply to partitioning any more".

So we have the following headings under 5.11.6 Caveats:

"The following caveats apply to partitioned tables implemented using
either method (unless noted otherwise)"

and,

"The following caveats apply to constraint exclusion"

The latter lists only the caveats about the planner capability (constraint
exclusion) for partitioning - that it cannot be performed against volatile
WHERE conditions, only works with comparisons using btree operators, does
not scale beyond a hundred partitions, etc.  These limitations are common to
both systems (inheritance and declarative partitioned tables), because
both rely on constraint exclusion.

How about separating the limitations listed under the first heading into:

"Limitations of using inheritance for partitioning"

and,

"Limitations of using partitioned tables"

This particular patch gets rid of the restriction that vacuum/analyze do
not recurse to child tables (partitions), but only for the second of the above.

How about the documentation changes in the attached updated 0001?  I know
that this page needs a much larger rewrite as we are discussing in the
other thread.

> In acquire_inherited_sample_rows(), instead of inserting a whole
> stanza of logic just above the existing dispatch on relkind, I think
> we can get by with a very slight update to what's already there.
> 
> You can't use the result of a & b as a bool.  You need to write (a &
> b) != 0, because the bool should always use 1 for true and 0 for
> false; it should not set some higher-numbered bit.

Oops, thanks for fixing that.  I suppose you are referring to this hunk in
the original patch:

-relations = get_rel_oids(relid, relation);
+relations = get_rel_oids(relid, relation, options & VACOPT_VACUUM);

And we need to do it this way in *this* case, because we're passing it as
a bool argument.  I see that it's OK to do this:

stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";

Or this:

if (options & VACOPT_VACUUM)
{
PreventTransactionChain(isTopLevel, stmttype);
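
To make the hazard concrete, here is a small standalone illustration
(hedged: the real VACOPT_* flags are low bits; the flag below is
deliberately placed above the low byte so that the truncation with the
historical 1-byte bool becomes visible):

#include <stdio.h>

typedef char pgbool;            /* PostgreSQL's historical 1-byte bool */

#define SOME_FLAG (1 << 8)      /* assumed flag bit above the low byte */

int
main(void)
{
    int     options = SOME_FLAG;
    pgbool  bad  = options & SOME_FLAG;         /* 256 truncates to 0 */
    pgbool  good = (options & SOME_FLAG) != 0;  /* always 0 or 1 */

    printf("bad=%d good=%d\n", bad, good);      /* prints bad=0 good=1 */
    return 0;
}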

> The changes to autovacuum.c look useless, because
> relation_needs_vacanalyze() will presumably never fire for a table
> with no tuples of its own.

Hmm, I guess that makes sense.  Inheritance statistics are never collected
by autovacuum (at least in the partitioning setups, and especially with the
partitioned tables now).  I recall that you mentioned this [1] as one of
the current limitations of the system.

> Updated patch with those changes and a few cosmetic tweaks attached.

I have incorporated the changes in your patch in the attached updated
0001, with documentation tweaks as mentioned above.

Other than that, in acquire_inherited_sample_rows() I've used a method
suggested by Ashutosh Bapat to track whether we encountered a child table,
which is much less ugly than the following:

-if (nrels < 2)
+if ((onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE && nrels <
2) ||
+nrels < 1)


I've kept the updated 0002 and 0003 here, but changed their order.  The
don't-scan-partitioned-tables patch is now 0002 and the don't-allocate-a-file
patch for partitioned tables is 0003.  Sorry about the confusion.

Thanks,
Amit

[1]
https://postgr.es/m/CA%2BTgmobTxn2%2B0x96h5Le%2BGOK5kw3J37SRveNfzEdx9s5-Yd8vA%40mail.gmail.com
From 5e3a88765b87d872083700f3d6dc1ddd86b64ae6 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 28 Feb 2017 13:43:59 +0900
Subject: [PATCH 1/3] Avoid useless partitioned table ops

Also, recursively perform VACUUM and ANALYZE on partitions when the
command is applied to a partitioned table.
---
 doc/src/sgml/ddl.sgml| 47 ++--
 src/backend/commands/analyze.c   | 37 ++
 src/backend/commands/tablecmds.c | 15 +++--
 src/backend/commands/vacuum.c| 66 
 4 files changed, 120 insertions(+), 45 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index ef0f7cf727..09b5b3ff70 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3792,8 +3792,7 @@ UNION ALL SELECT * FROM measurement_y2008m01;
Caveats
 

-The following caveats apply to partitioned tables implemented using either
-method (unless noted otherwise):
+The following caveats apply to using in

[HACKERS] WAL Consistency checking for hash indexes

2017-02-27 Thread Kuntal Ghosh
Hello everyone,

I've attached a patch which implements WAL consistency checking for
hash indexes. This feature is going to be useful for developing and
testing WAL logging for hash indexes.

Note that this patch should be applied on top of the following WAL
logging patch for hash indexes:
https://www.postgresql.org/message-id/CAA4eK1%2B%2BP%2BjVZC7MC3xzw5uLCpva9%2BgsBpd3semuWffWAftr5Q%40mail.gmail.com

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-wal_consistency_checking-for-hash-index.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> Hm, that should be doable with something like
> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
> which'd probably be better documentation than a macro that hides this
> (arguing internally over whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better).

Not sure either, but suggest we add a StaticAssert asserting there's no
padding; something along the lines of
offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) == 
MAXALIGN(sizeof(AllocSetChunkHeader))

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

2017-02-27 Thread Alvaro Herrera
Here's another small patch, this time from Simon Riggs.  Maybe he already
posted it for this commitfest, but I didn't find it in a quick look so
here it is.

This patch reduces the amount of bloat you get from running CREATE INDEX
CONCURRENTLY by destroying the snapshot taken in the first phase, before
entering the second phase.  This allows the global xmin to advance,
letting concurrent vacuum keep bloat in other tables in check.
Currently this implements the change for btree indexes only, but doing
it for other indexes should be a one-liner.

-- 
Álvaro Herrera
diff --git a/src/backend/access/nbtree/nbtree.c 
b/src/backend/access/nbtree/nbtree.c
index 469e7ab..853d8e0 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -168,7 +168,17 @@ btbuild(Relation heap, Relation index, IndexInfo 
*indexInfo)
reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
   
btbuildCallback, (void *) &buildstate);
 
-   /* okay, all heap tuples are indexed */
+   /*
+* By this point, all heap tuples are indexed so we don't need the
+* snapshot to complete the index build.
+*
+* Ideally, we'd like to drop our snapshot as soon as possible, to
+* avoid holding xmin back and causing bloat. That is only possible
+* if we are running a concurrent index build because that command
+* is sufficiently restricted to allow this to happen safely.
+*/
+   if (indexInfo->ii_Concurrent)
+   PopActiveSnapshot();
if (buildstate.spool2 && !buildstate.haveDead)
{
/* spool2 turns out to be unnecessary */
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ed6136c..469d455 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -770,11 +770,27 @@ DefineIndex(Oid relationId,
/* Now build the index */
index_build(rel, indexRelation, indexInfo, stmt->primary, false);
 
+   /*
+* At this point it is possible that the indexAM has popped its
+* snapshot and will be unable to run other commands, but we
+* accept that restriction because it means the snapshot is held
+* open for much less time.
+*/
+   PopActiveSnapshotIfAny();
+
/* Close both the relations, but keep the locks */
heap_close(rel, NoLock);
index_close(indexRelation, NoLock);
 
/*
+* Now we need a snapshot so we can update our indexes. The snapshot
+* here is different from that used to build the index, but the only
+* thing we will do with it is update the pg_index row for the index
+* we have locked, so we're not in danger of mismatch.
+*/
+   PushActiveSnapshot(GetTransactionSnapshot());
+
+   /*
 * Update the pg_index row to mark the index as ready for inserts. Once 
we
 * commit this transaction, any new transactions that open the table 
must
 * insert new entries into the index for insertions and non-HOT updates.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 92afc32..4d949f6 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -806,11 +806,19 @@ UpdateActiveSnapshotCommandId(void)
 void
 PopActiveSnapshot(void)
 {
+   Assert(ActiveSnapshot->as_snap->active_count > 0);
+   PopActiveSnapshotIfAny();
+}
+
+void
+PopActiveSnapshotIfAny(void)
+{
ActiveSnapshotElt *newstack;
 
-   newstack = ActiveSnapshot->as_next;
+   if (ActiveSnapshot == NULL)
+   return;
 
-   Assert(ActiveSnapshot->as_snap->active_count > 0);
+   newstack = ActiveSnapshot->as_next;
 
ActiveSnapshot->as_snap->active_count--;
 
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 2618cc4..66368d2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -75,6 +75,7 @@ extern void PushActiveSnapshot(Snapshot snapshot);
 extern void PushCopiedSnapshot(Snapshot snapshot);
 extern void UpdateActiveSnapshotCommandId(void);
 extern void PopActiveSnapshot(void);
+extern void PopActiveSnapshotIfAny(void);
 extern Snapshot GetActiveSnapshot(void);
 extern bool ActiveSnapshotSet(void);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

On 2017-02-27 22:57:24 -0500, Tom Lane wrote:
> If the slab allocator would be happier with just a MemoryContext pointer
> as chunk header, I think we should push in this direction rather than
> invent some short-term hack.

It would - it really doesn't need the size, because the size is the same
for the whole context, and storing it is thereby a waste of space.  Still
wondering if we should band-aid this until that's done.


> One could imagine redefining aset.c's chunk header along the lines of
> 
> typedef struct AllocSetChunkHeader
> {
> Sizesize; /* size of data space allocated in chunk */
> #ifdef MEMORY_CONTEXT_CHECKING
> Sizerequested_size;   /* original request size */
> #if 32-bit-but-maxalign-is-8
> Sizepadding;  /* needed to avoid padding below */
> #endif
> #endif
> MemoryContext context;/* owning context */
> /* there must not be any padding to reach a MAXALIGN boundary here! */
> } AllocSetChunkHeader;
> 
> where we'd possibly need some help from configure to implement that inner
> #if condition, but it seems doable enough.

Hm, that should be doable with something like
#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4

which'd probably be better documentation than a macro that hides this
(arguing internally over whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better).

Working on a patch now, will post but not push tonight.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-02-27 Thread Beena Emerson
On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby  wrote:

> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>
>> * You're considering any WAL file whose size is a power of 2 as valid.
>> Suppose the correct WAL seg size is 64MB. For some reason, the server
>> generated a 16MB invalid WAL file (maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file, which I think
>> is incorrect. Do you agree with that?
>>
>
> Detecting correct WAL size based on the size of a random WAL file seems
> like a really bad idea to me.


> I also don't see the reason for #2... or is that how initdb writes out the
> correct control file?


The initdb module reads the size from the option provided and sets an
environment variable. This variable is read
in src/backend/access/transam/xlog.c and the ControlFile is written.
Unlike pg_resetwal and pg_rewind, pg_basebackup cannot access the control
file; it only accesses the WAL directory.  So we get the XLogSegSize from
the SHOW command using a replication connection.
As Kuntal pointed out, I might need to set it from pg_receivewal.c as
well.
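
For illustration, here is a hedged libpq sketch of that approach (it
assumes the walsender accepts a SHOW command for this setting; the
connection string and the GUC name are assumptions):

#include <stdio.h>
#include "libpq-fe.h"

/* Hedged sketch: fetch the WAL segment size over a replication connection. */
static void
show_wal_segment_size(void)
{
    PGconn     *conn;
    PGresult   *res;

    conn = PQconnectdb("replication=true");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return;
    }

    res = PQexec(conn, "SHOW wal_segment_size");    /* assumed GUC name */
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        printf("wal_segment_size = %s\n", PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
}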

Thank you,

Beena Emerson

EnterpriseDB: https://www.enterprisedb.com/
The Enterprise PostgreSQL Company


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Amit Langote
On 2017/02/28 13:52, Ashutosh Bapat wrote:
> Amit's original patch had an assertion failure in
> extract_autovac_opts(). His patch adds partitioned tables in the list
> of tables to be auto-analyzed. But there's an assertion in
> extract_autovac_opts(), which did not consider partitioned tables.
> When a partitioned table is created, the assertion trips in autovacuum
> worker and resets the cluster, restarting all the backends. I have
> updated Robert's patch with the assertion fixed.

Since Robert's patch backed out all of the autovacuum.c changes, the
partitioned tables case would no longer be able to reach that Assert.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN de-summarize ranges

2017-02-27 Thread Alvaro Herrera
Here's a small patch to make a BRIN page range unsummarized.  This is
useful if data has been deleted, and the heap pages are now used for
completely different data.

-- 
Álvaro Herrera
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b22563b..fd7c323 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -846,6 +846,69 @@ brin_summarize_new_values(PG_FUNCTION_ARGS)
 }
 
 /*
+ * SQL-callable interface to mark a range as no longer summarized
+ */
+Datum
+brin_desummarize_range(PG_FUNCTION_ARGS)
+{
+   Oid indexoid = PG_GETARG_OID(0);
+   BlockNumber heapBlk = PG_GETARG_INT64(1);
+   Oid heapoid;
+   Relation heapRel;
+   Relation indexRel;
+   booldone;
+
+   /*
+* We must lock table before index to avoid deadlocks.  However, if the
+* passed indexoid isn't an index then IndexGetRelation() will fail.
+* Rather than emitting a not-very-helpful error message, postpone
+* complaining, expecting that the is-it-an-index test below will fail.
+*/
+   heapoid = IndexGetRelation(indexoid, true);
+   if (OidIsValid(heapoid))
+   heapRel = heap_open(heapoid, ShareUpdateExclusiveLock);
+   else
+   heapRel = NULL;
+
+   indexRel = index_open(indexoid, ShareUpdateExclusiveLock);
+
+   /* Must be a BRIN index */
+   if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
+   indexRel->rd_rel->relam != BRIN_AM_OID)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("\"%s\" is not a BRIN index",
+   
RelationGetRelationName(indexRel;
+
+   /* User must own the index (comparable to privileges needed for VACUUM) 
*/
+   if (!pg_class_ownercheck(indexoid, GetUserId()))
+   aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
+  RelationGetRelationName(indexRel));
+
+   /*
+* Since we did the IndexGetRelation call above without any lock, it's
+* barely possible that a race against an index drop/recreation could 
have
+* netted us the wrong table.  Recheck.
+*/
+   if (heapRel == NULL || heapoid != IndexGetRelation(indexoid, false))
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_TABLE),
+errmsg("could not open parent table of index 
%s",
+   
RelationGetRelationName(indexRel;
+
+   /* the revmap does the hard work */
+   do {
+   done = brinRevmapDesummarizeRange(indexRel, heapBlk);
+   }
+   while (!done);
+
+   relation_close(indexRel, ShareUpdateExclusiveLock);
+   relation_close(heapRel, ShareUpdateExclusiveLock);
+
+   PG_RETURN_VOID();
+}
+
+/*
  * Build a BrinDesc used to create or scan a BRIN index
  */
 BrinDesc *
diff --git a/src/backend/access/brin/brin_revmap.c 
b/src/backend/access/brin/brin_revmap.c
index 0de6999..00744df 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -168,9 +168,12 @@ brinSetHeapBlockItemptr(Buffer buf, BlockNumber 
pagesPerRange,
iptr = (ItemPointerData *) contents->rm_tids;
iptr += HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk);
 
-   ItemPointerSet(iptr,
-  ItemPointerGetBlockNumber(&tid),
-  ItemPointerGetOffsetNumber(&tid));
+   if (ItemPointerIsValid(&tid))
+   ItemPointerSet(iptr,
+  ItemPointerGetBlockNumber(&tid),
+  ItemPointerGetOffsetNumber(&tid));
+   else
+   ItemPointerSetInvalid(iptr);
 }
 
 /*
@@ -301,6 +304,138 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber 
heapBlk,
 }
 
 /*
+ * Delete an index tuple, marking a page range as unsummarized.
+ *
+ * Index must be locked in ShareUpdateExclusiveLock mode.
+ *
+ * Return FALSE if caller should retry.
+ */
+bool
+brinRevmapDesummarizeRange(Relation idxrel, BlockNumber heapBlk)
+{
+   BrinRevmap *revmap;
+   BlockNumber pagesPerRange;
+   RevmapContents *contents;
+   ItemPointerData *iptr;
+   ItemPointerData invalidIptr;
+   BlockNumber revmapBlk;
+   Buffer  revmapBuf;
+   Buffer  regBuf;
+   PagerevmapPg;
+   PageregPg;
+   OffsetNumber revmapOffset;
+   OffsetNumber regOffset;
+   ItemId  lp;
+   BrinTuple  *tup;
+
+   revmap = brinRevmapInitialize(idxrel, &pagesPerRange, NULL);
+
+   revmapBlk = revmap_get_blkno(revmap, heapBlk);
+   if (!BlockNumberIsValid(revmapBlk))
+   {
+   /* revmap page doesn't exist: range not summarized, we're 

[HACKERS] Re: new high availability feature for the system with both asynchronous and synchronous replication

2017-02-27 Thread Higuchi, Daisuke
Hi all, 

I have created a POC patch for my proposal of a new high-availability 
feature. I want to discuss this feature, though it might be for PG11 
because there has not been enough discussion yet. 

This patch makes the walsender for an asynchronous standby wait until the 
walsender for the synchronous standby confirms that WAL is flushed to disk. 
This feature is activated when the GUC parameter "async_walsender_delay" 
is set to on. 

I will describe the case in which this feature is useful (the same as I wrote 
before): 
1. The primary and a synchronous standby are in the same center, called the 
   main center. 
2. An asynchronous standby is in another center, called the backup center. 
   (The backup center is located far away from the main center. If the 
   replication mode were synchronous, performance would deteriorate, so this 
   replication must be asynchronous.)
3. Asynchronous replication is performed within the backup center too. 
4. When the primary in the main center stops abnormally, the standby in the 
   main center is promoted, and the standby in the backup center connects to 
   the new primary.

[Main center]
+-------------------------------------------+
|  +---------+  synchronous   +----------+  |
|  | primary | <------------> | standby1 |  |
|  +---------+  replication   +----------+  |
+-------------------------------------------+
      ||
      ||  asynchronous
      ||  replication
      ||
[Backup center]
+-------------------------------------------+
|  +----------+ asynchronous  +----------+  |
|  | standby2 | <-----------> | standby3 |  |
|  +----------+ replication   +----------+  |
+-------------------------------------------+

When the load in the main center becomes high, WAL may reach the standby in 
the backup center but, for various reasons, not yet reach the synchronous 
standby in the main center. In other words, the standby in the backup center 
may advance beyond the synchronous standby in the main center.
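
In code terms, the idea would look roughly like the following hedged sketch 
(GetFlushRecPtr() is the real local flush point in xlog.c; the GUC flag and 
the function returning the synchronous standby's flush LSN are assumptions):

    /* In the walsender send loop: cap what an async standby may receive. */
    XLogRecPtr  sendUpto = GetFlushRecPtr();    /* real: local flush point */

    if (async_walsender_delay && !am_sync_walsender)    /* assumed flags */
    {
        XLogRecPtr  syncFlush = GetSyncStandbyFlushRecPtr();  /* hypothetical */

        if (sendUpto > syncFlush)
            sendUpto = syncFlush;   /* hold back the asynchronous stream */
    }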

When the primary stops abnormally and the standby in the main center is 
promoted, the two standbys in the backup center must be recovered by 
pg_rewind. However, it is necessary to stop the new primary for pg_rewind, 
and if pg_basebackup is used instead, recovery of the backup center takes 
some time. This is not high availability.

Regards, 
Daisuke Higuchi 



async-waits-sync-walsender-v1.patch
Description: async-waits-sync-walsender-v1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help to identify the reason that extension's C function returns array get segmentation fault

2017-02-27 Thread Tom Lane
钱新林  writes:
> I have written an extension to manage openstreetmap data. There is a C
> function that performs a spatial top-k query on several tables and returns
> an array of int8 type as the result. The code skeleton of this function is
> as follows:

There are a remarkable number of bugs in this code fragment.  Many of them
would not bite you as long as you are running on 64-bit Intel hardware,
but that doesn't make them not bugs.

> Datum vquery(PG_FUNCTION_ARGS) {

> int array_len = PG_GETARG_INT32(0);
> long * node_ids;

> SPI_connect();

> //some code to retrieve data from various tables
> // node_ids are allocated and filled up

> ArrayType * retarr;
> Datum * vals ;

> vals = palloc0(array_len * sizeof(long));

Datum is not necessarily the same as "long".

> // fill the vals up
> for (i = 0 ; i < array_len ; i++)
>   vals[i] = Int64GetDatum((node_ids[i]));

int64 is not necessarily the same as "long", either.

> retarr = construct_array(vals, retcnt, INT8OID, sizeof(long), true, 'i');

Again, INT8 is not the same size as "long", and it's not necessarily
pass-by-val, and it's *certainly* not integer alignment.

> SPI_finish();

> PG_RETURN_ARRAYTYPE_P(retarr);

But I think what's really biting you, probably, is that construct_array()
made the array in CurrentMemoryContext which at that point was the SPI
execution context; which would be deleted by SPI_finish.  So you're
returning a dangling pointer.  You need to do something to either copy
the array value out to the caller's context, or build it there in the
first place.
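
A hedged sketch of a corrected version, restating those points in code (the
query itself is elided and the names are illustrative): allocate the result
storage before SPI_connect() so it lives in the caller's context, use
int64/Datum instead of "long", and describe the int8 elements to
construct_array() correctly:

#include "postgres.h"
#include "fmgr.h"
#include "executor/spi.h"
#include "utils/array.h"
#include "catalog/pg_type.h"

PG_FUNCTION_INFO_V1(vquery);

Datum
vquery(PG_FUNCTION_ARGS)
{
    int32       nitems = PG_GETARG_INT32(0);
    int64      *node_ids;
    Datum      *vals;
    ArrayType  *retarr;
    int         i;

    /* Allocated in the caller's context, so it survives SPI_finish(). */
    node_ids = (int64 *) palloc0(nitems * sizeof(int64));

    SPI_connect();
    /* ... run the top-k query and copy the ids into node_ids[] ... */
    SPI_finish();

    /* Back in the caller's context: build the array here, not under SPI. */
    vals = (Datum *) palloc(nitems * sizeof(Datum));
    for (i = 0; i < nitems; i++)
        vals[i] = Int64GetDatum(node_ids[i]);

    retarr = construct_array(vals, nitems, INT8OID,
                             sizeof(int64), FLOAT8PASSBYVAL, 'd');

    PG_RETURN_ARRAYTYPE_P(retarr);
}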

BTW, this failure would be a lot less intermittent if you were testing
in a CLOBBER_FREED_MEMORY build.  I would go so far as to say you should
*never* develop or test C code for the Postgres backend without using
the --enable-cassert configure option for your build.  You're simply
tossing away a whole lot of debug support if you don't.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-27 Thread Stephen Frost
* Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> I suppose it is for suggesting what kind of word should come
> there, or for avoiding silence on a tab. Or for symmetry with other
> types of manipulation, like DROP. Another possibility is creating
> multiple objects with similar names, say CREATE TABLE
> employee_x1, CREATE TABLE employee_x2. Just trying to complete
> an existing *schema* is one more possible objective.

I don't buy any of these arguments either.  I *really* don't want us
going down some road where we try to make sure that hitting 'tab' never
fails...

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Ashutosh Bapat
Amit's original patch had an assertion failure in
extract_autovac_opts(). His patch adds partitioned tables to the list
of tables to be auto-analyzed, but there's an assertion in
extract_autovac_opts() which did not consider partitioned tables.
When a partitioned table is created, the assertion trips in the autovacuum
worker and resets the cluster, restarting all the backends. I have
updated Robert's patch with the assertion fixed.

On Tue, Feb 28, 2017 at 8:50 AM, Robert Haas  wrote:
> On Thu, Feb 23, 2017 at 11:19 AM, Amit Langote
>  wrote:
>> Thanks for the review.
>
> In 0001, the documentation which you are patching has a section for
> limitations that apply to both partitioning and constraint
> exclusion, and another for limitations that apply only to constraint
> exclusion.  Surely the patch should be moving a limitation that will
> no longer apply to partitioning from the first section to the second
> section, rather than leaving it in the section for limitations that
> apply to both systems and just adding a note that says "this doesn't
> apply to partitioning any more".
>
> In acquire_inherited_sample_rows(), instead of inserting a whole
> stanza of logic just above the existing dispatch on relkind, I think
> we can get by with a very slight update to what's already there.
>
> You can't use the result of a & b as a bool.  You need to write (a &
> b) != 0, because the bool should always use 1 for true and 0 for
> false; it should not set some higher-numbered bit.
>
> The changes to autovacuum.c look useless, because
> relation_needs_vacanalyze() will presumably never fire for a table
> with no tuples of its own.
>
> Updated patch with those changes and a few cosmetic tweaks attached.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


avoid-useless-partition-ops_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-02-27 Thread Robert Haas
On Mon, Feb 6, 2017 at 3:34 PM, Ashutosh Bapat
 wrote:
> PartitionScheme is shared across multiple relations, join or base,
> partitioned similarly. Obviously it can't, and does not need to, point to
> the partition bound information (which should all be the same) of all
> those base relations. On the face of it, it looks weird that it points to
> only one of them, mostly the one it encounters first. But, since
> it's going to be the same partition bound information, it doesn't
> matter which one. So, I think, we can point to any one of those. Do
> you agree?

Yes.

>> The fact that set_append_rel_size needs to reopen the relation to
>> extract a few more bits of information is not desirable.  You need to
>> fish this information through in some other way; for example, you
>> could have get_relation_info() stash the needed bits in the
>> RelOptInfo.
>
> I considered this option and discarded it, since not all partitioned
> relations will have OIDs for partitions; e.g. partitioned joins will
> not have OIDs for their partitions. But now that I think of it, we
> should probably store those OIDs just for the base relation and leave
> them unused for non-base relations, just like other base relation
> specific fields in RelOptInfo.

Right.

>> FRACTION_PARTS_TO_PLAN seems like it should be a GUC.
>
> +1. Will take care of this. Does "representative_partitions_fraction"
> or "sample_partition_fraction" look like a good GUC name? Any other
> suggestions?

I like the second one.
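
For reference, a hedged sketch of how such a GUC might be registered in
guc.c's ConfigureNamesReal table (only the name and the planner context come
from this thread; the description wording, default, bounds and group are
assumptions):

    {
        {"sample_partition_fraction", PGC_USERSET, QUERY_TUNING_OTHER,
            gettext_noop("Sets the fraction of partitions planned in detail "
                         "for a partition-wise join."),
            NULL
        },
        &sample_partition_fraction,
        0.1, 0.0, 1.0,
        NULL, NULL, NULL
    },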

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-02-27 Thread Kuntal Ghosh
On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby  wrote:
> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>>
>> * You're considering any WAL file whose size is a power of 2 as valid.
>> Suppose the correct WAL seg size is 64MB. For some reason, the server
>> generated a 16MB invalid WAL file (maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file, which I think
>> is incorrect. Do you agree with that?
>
>
> Detecting correct WAL size based on the size of a random WAL file seems like
> a really bad idea to me.
+1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 1:16 PM, Kyotaro HORIGUCHI
 wrote:
> It is doable without a plugin and currently we are planning to do
> in the way (Maybe such plugin would be unacceptable..). Killing
> walsender (which one?), removing the slot and if failed..

The PID and restart_lsn associated with each slot offer enough
information for monitoring.

> These are the 'rather complex steps', and they are fragile.

The handling of the slot drop is not complex. Ensuring that WAL
segments get recycled on time and a full bloat is avoided is, though.

>> That's as well more flexible than having a parameter
>> that basically is just a synonym of max_wal_size.
>
> I first thought the same thing: max_wal_size_hard, which limits
> the WAL size including the extra segments (other than those for the
> two checkpoint cycles).

It would make more sense to just switch max_wal_size from a soft to a
hard limit. The current behavior is not cool with activity spikes.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] help to identify the reason that extension's C function returns array get segmentation fault

2017-02-27 Thread 钱新林
I have written an extension to manage openstreetmap data. There is a C
function that performs a spatial top-k query on several tables and returns an
array of int8 type as the result. The code skeleton of this function is as
follows:

Datum vquery(PG_FUNCTION_ARGS) {

int array_len = PG_GETARG_INT32(0);
long * node_ids;

SPI_connect();

//some code to retrieve data from various tables
// node_ids are allocated and filled up

ArrayType * retarr;
Datum * vals ;

vals = palloc0(array_len * sizeof(long));

// fill the vals up
for (i = 0 ; i < array_len ; i++)
  vals[i] = Int64GetDatum((node_ids[i]));

retarr = construct_array(vals, retcnt, INT8OID, sizeof(long), true, 'i');

SPI_finish();

PG_RETURN_ARRAYTYPE_P(retarr);
}

The function runs smoothly when called with a relatively small parameter,
such as select(unnest(vquery(1000))); but when called with a large
parameter, such as select(unnest(vquery(5))), it sometimes runs
normally and sometimes runs into a "Segmentation Fault" error. The larger
the parameter is, the more likely it is to run into a segmentation fault.

Backtrace of the process is as follows:

Program received signal SIGSEGV, Segmentation fault.
pg_detoast_datum (datum=0x55d4e7e43bc0) at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:2241
2241 
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:
No such file or directory.
(gdb) backtrace full
#0  pg_detoast_datum (datum=0x55d4e7e43bc0) at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:2241
No locals.
#1  0x55d4e485a29f in array_out (fcinfo=0x7ffd0fdb9f30) at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/adt/arrayfuncs.c:958
v = 
element_type = 
typlen = 
typbyval = 
typalign = 
typdelim = 
p = 
tmp = 
retval = 
values = 
dims_str =
"\346\263\346U\000\000\000\000f\021\352\312\342\177\000\000
%q\003\000\000\000\000\270\347\026\313\342\177\000\000\000\000\002\000\000\000\000\000\211O\343\312\342\177\000\000(?\a\000\000\000\000\000\371\336\342\312\342\177\000\000\001\000\000\000\000\000\000\000f\021\352\312\342\177\000\000\200\204\004\001\000\000\000\000\270\347\026\313\342\177\000\000\000\000\002\000\000\000\000\000\211O\343\312\342\177\000\000`1\354\352\324U\000\000\371\336\342\312\342\177\000\000\001\000\000\000\000\000\000\000\000\200\355\347\324U\000\000\300;\344\347\324U\000\000\200\373\350\346\324U\000\000\200\204\004\001\000\000\000\000`\347\026\313\342\177\000\000\200;\344\347\324U\000\000\200D\t\000\000\000\000\000\001\000\000\000\000\000\000"
bitmap = 
bitmask = 
needquotes = 
needdims = 
nitems = 
overall_length = 
i = 
j = 
k = 
indx = {125, 0, 1638826752, 1007657037, 266051136, 32765}
ndim = 
dims = 
lb = 
my_extra = 
#2  0x55d4e491bf77 in FunctionCall1Coll
(flinfo=flinfo@entry=0x55d4e6281608,
collation=collation@entry=0, arg1=arg1@entry=94372911922112)
at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:1301
fcinfo = {flinfo = 0x55d4e6281608, context = 0x0, resultinfo = 0x0,
fncollation = 0, isnull = 0 '\000', nargs = 1, arg = {94372911922112, 16,
94372911922112,
94372881863664, 81000, 140612075225152, 140724869504928,
94372856290490, 20, 94372911922112, 140724869504960, 94372855395365, 59362,
59362,
140724869505552, 140611821448277, 140724869505184,
140724869505104, 140724869505056, 25309696688, 4630392398271114606,
463877674369024,
4628742354541509959, 4638236535588651008, 140612075225152,
140724869505184, 140724869505264, 140724869505344, 8192, 1340029796386,
94372903805648,
94372905445328, 4412211000755930201, 4295079941117417898,
4212081119735560672, 94372856202527, 2087976960, 94372882833728,
94372882836912, 1,
140724869505296, 4327854021138088704, 3599182594146, 1,
140724869505248, 94372856077425, 1016, 94372882392848, 94372860652912,
140612076116712,
140724869505680, 94372856082219, 94372882407728,
140724869505360, 140724869505343, 3886087214688, 140612076116712,
140724869505344, 140728326873992,
94372882407736, 94372882817344, 94372882817312,
1125891316908032, 0, 94372855675584, 281483566645432, 2, 0, 94372881959504,
0, 1016, 0, 0, 8192,
18446603348840046049, 513, 128, 176, 140724869505568, 16,
459561500672, 2, 0, 511101108336, 0, 140724869505567, 0, 0, 124, 0, 0, 0,
0, 0, 0,
140612046612320, 8192, 1024, 1024, 1072},
  argnull = "\000
\000\000\000\000\000\000\300&\343\312\342\177\000\000\220\017(\346\324U\000\000x\201'\346\324U\000\000\320\242\333\017\375\177\000\000\260R\223\344\324U\000\000\060\243\333\017\375\177\000\000\002\000\000\000\000\000\000\000\340\242\333\017\375\177\000\000?:x\344\324U\000\000\0

Re: [HACKERS] Restricting maximum keep segments by repslots

2017-02-27 Thread Petr Jelinek
On 28/02/17 04:27, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> Although a replication slot is helpful to avoid unwanted WAL
> deletion, on the other hand it can cause a disastrous situation
> by keeping WAL segments without limit. Removing the causal
> repslot will resolve the situation, but that is not doable if the
> standby is active. We have to take rather complex and forcible
> steps to relieve the situation, especially in an automatic
> manner. (As for me, specifically in an HA cluster.)
> 

I agree that it should be possible to limit how much WAL a slot keeps.

> This patch adds a GUC to put a limit on the number of segments
> that replication slots can keep. Hitting the limit during a
> checkpoint shows a warning, and the segments older than the limit
> are removed.
> 
>> WARNING:  restart LSN of replication slots is ignored by checkpoint
>> DETAIL:  Some replication slots lose required WAL segments to continue.
> 

However, this is dangerous, as a logical replication slot does not consider
it an error when a too-old LSN is requested, so we'd continue replication,
hiding data loss.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Radix tree for character conversion

2017-02-27 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 28 Feb 2017 08:00:22 +0530, Robert Haas  wrote 
in 
> On Mon, Feb 27, 2017 at 2:07 PM, Kyotaro HORIGUCHI
>  wrote:
> >> +# make_charmap - convert charset table to charmap hash
> >> +# with checking duplicate source code
> >> Maybe this should be "with checking of duplicated source codes".
> >
> > Even though I'm not a good English writer, 'duplicated codes' reads
> > as multiple copies of the original 'code' (to me, of
> > course). And the 'checking' is a (pure) verbal noun (not a
> > deverbal noun), so 'of' is not required. But, of course, I'm not
> > sure which sounds more natural in English.
> 
> The problem is that, because "checking" is a noun in this sentence, it
> can't be followed by a direct object so you need "of" to connect
> "checking" with the thing that is being checked.  However, what I
> would do is rearrange this sentence slightly as to use "checking" as a
> verb, like this:
> 
> convert charset table to charmap hash, checking for duplicate source
> codes along the way
> 
> While I don't think Michael's suggestion is wrong, I find the above a
> little more natural.

Thank you for the suggestion and explanation. I think I've learned
something. I'll send a version with the revised comment.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wrong variable type in KeepLogSeg

2017-02-27 Thread Kyotaro HORIGUCHI
At Tue, 28 Feb 2017 12:17:07 +0900, Michael Paquier  
wrote in 
> On Tue, Feb 28, 2017 at 11:17 AM, Kyotaro HORIGUCHI
>  wrote:
> > slotSegNo should be an XLogSegNo. Both types share the same
> > intrinsic type, so it doesn't harm anything.
> 
> Nice catch!

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-02-27 Thread Kyotaro HORIGUCHI
Thank you for the opinion.

At Tue, 28 Feb 2017 12:42:32 +0900, Michael Paquier  
wrote in 
> Please no. Replication slots are designed the current way because we
> don't want to have to use something like wal_keep_segments as it is a
> wart, and this applies as well for replication slots in my opinion. If
> a slot is bloating WAL and you care about your Postgres instance, I
> would recommend instead that you use a background worker that does
> monitoring of the situation based on max_wal_size for example, killing
> the WAL sender associated with the slot if there is something connected
> but it is frozen or it cannot keep up the pace of WAL generation, and
> then dropping the slot.

It is doable without a plugin, and that is currently how we plan to do
it (maybe such a plugin would be unacceptable anyway). Killing the
walsender (but which one?), removing the slot, retrying on failure...
these are the 'rather complex steps', and they are fragile.

> You may want to issue a checkpoint in this
> case as well to ensure that segments get recycled. But anyway, if you
> reach this point of WAL bloat, perhaps that's for the best as users
> would know about it because backups would get in danger.

Yes, but in the end that is still better than the server simply stopping
with a PANIC.

> For some applications, that is acceptable, but you could always
> rely on monitoring slots and kill them on sight if
> needed.

Another solution would be to have removing a slot kill the
corresponding walsender. What do you think about this?

pg_drop_replication_slot(name, *force*)

With force = true, the walsender running on the slot is killed.
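
For illustration, usage would look something like this (a hypothetical
two-argument form, not an existing function):

-- forcibly drop the slot, terminating any walsender still using it
SELECT pg_drop_replication_slot('standby_slot', true);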

> That's as well more flexible than having a parameter
> that basically is just a synonym of max_wal_size.

My first thought was the same: a max_wal_size_hard that limits the
WAL size, including any extra segments beyond those kept for the two
checkpoint cycles.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-02-27 Thread Jim Nasby

On 2/24/17 6:30 AM, Kuntal Ghosh wrote:

* You're considering any WAL file with a power-of-2 size as valid. Suppose
the correct WAL seg size is 64MB. For some reason, the server
generated a 16MB invalid WAL file (maybe it crashed while creating the
WAL file). Your code seems to treat this as a valid file, which I think
is incorrect. Do you agree with that?


Detecting the correct WAL size based on the size of a random WAL file seems
like a really bad idea to me.


I also don't see the reason for #2... or is that how initdb writes out 
the correct control file?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Disallowing multiple queries per PQexec()

2017-02-27 Thread Surafel Temesgen
This item is on the todo list and has the benefit of providing an
additional defense against SQL-injection attacks. Previous mailing list
discussion is here, and I attach a small patch that fixes the issue by
checking whether the query string contains multiple SQL commands outside
a transaction block, emitting an appropriate error message when it does.


I tested this patch using psql's -c option:

i.e., if the query string is not a transaction block and contains multiple
commands, it emits an appropriate error message.


psql -c 'DECLARE myportal CURSOR FOR select * from pg_database;FETCH ALL in
myportal;CLOSE myportal' postgres

ERROR:  cannot execute multiple commands unless it is a transaction block


In the case of a transaction block or a single-command query string, it
continues with normal execution:


psql -c 'BEGIN;DECLARE myportal CURSOR FOR select * from pg_database;FETCH
ALL in myportal;CLOSE myportal;END' postgres

COMMIT



psql -c 'CREATE TABLE foo();' postgres

CREATE TABLE



Comments?


Regards

Surafel


disallow-multiple-queries-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Amit Langote
On 2017/02/28 12:29, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 12:23 PM, Bruce Momjian  wrote:
>> I am a little confused by the above.  Is a partitioned table the parent
>> or the children?  Reading the code it seems it is the parent, which
>> explains why it works.  Can I clarify that?
> 
> As I understand things, a partitioned table is a parent relation that
> exists only to link to its child portions, while also holding the
> definitions that tie each partition to it.

Yes.  As I mentioned in my previous email, data inserted into the
partitioned table is routed to its partitions.  A partition may be itself
a partitioned table, so the routing continues until we find a leaf
partition.  All the partitioned tables in this chain leading to the leaf
partition are RELKIND_PARTITIONED_TABLE ('P') relations and only the leaf
tables are RELKIND_RELATION tables.
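
For illustration, a minimal sketch of that arrangement (table names are
made up; relkind 'P' per the current development tree):

CREATE TABLE measurement (logdate date, value int)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2017 PARTITION OF measurement
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

SELECT relname, relkind FROM pg_class WHERE relname LIKE 'measurement%';
--      relname       | relkind
--  measurement       | P
--  measurement_y2017 | r

Inserts into measurement are routed to (and stored in) measurement_y2017.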

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> Independently of this, we really should redefine StandardChunkHeader to
> be only the MemoryContext.  There's no need to have size/requested_size
> part of StandardChunkHeader, given there's
> MemoryContextMethods->get_chunk_space().

Yeah, perhaps.  The trick would be to get things laid out so that the 
MemoryContext pointer is always immediately adjacent to the chunk data
(no padding between).

One could imagine redefining aset.c's chunk header along the lines of

typedef struct AllocSetChunkHeader
{
Sizesize; /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
Sizerequested_size;   /* original request size */
#if 32-bit-but-maxalign-is-8
Sizepadding;  /* needed to avoid padding below */
#endif
#endif
MemoryContext context;/* owning context */
/* there must not be any padding to reach a MAXALIGN boundary here! */
} AllocSetChunkHeader;

where we'd possibly need some help from configure to implement that inner
#if condition, but it seems doable enough.
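
For what it's worth, the inner #if can perhaps be written with macros that
pg_config.h already defines (a sketch on my part, assuming SIZEOF_SIZE_T
and MAXIMUM_ALIGNOF, both of which configure exports today):

#ifdef MEMORY_CONTEXT_CHECKING
    Size        requested_size;   /* original request size */
#if SIZEOF_SIZE_T == 4 && MAXIMUM_ALIGNOF == 8
    Size        padding;          /* pad three 4-byte fields out to 16 so
                                   * that 'context' stays flush with the
                                   * chunk data */
#endif
#endif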

If the slab allocator would be happier with just a MemoryContext pointer
as chunk header, I think we should push in this direction rather than
invent some short-term hack.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-02-27 Thread Petr Jelinek
On 28/02/17 04:10, Stephen Frost wrote:
> Peter,
> 
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> On 2/18/17 18:06, Stephen Frost wrote:
>>> I'm not convinced that it really makes sense to have PUBLICATION of a
>>> table be independent from the rights an owner of a table has.  We don't
>>> allow other ALTER commands on objects based on GRANT'able rights, in
>>> general, so I'm not really sure that it makes sense to do so here.
>>
>> The REFERENCES and TRIGGER privileges are very similar in principle.
> 
> TRIGGER is a great example of an, ultimately, poorly chosen privilege to
> GRANT away, with a good history of discussion about it:
> 
> https://www.postgresql.org/message-id/21827.1166115978%40sss.pgh.pa.us
> https://www.postgresql.org/message-id/7389.1405554356%40sss.pgh.pa.us
> 
> Further, how would RLS be handled with publication?  I've been assuming
> that it's simply ignored, but that's clearly wrong if a non-owner can
> publish a table that they just have SELECT,PUBLISH on, isn't it?
> 

I don't see why PUBLISH would care about RLS.

>>> The downside of adding these privileges is that we're burning through
>>> the last few bits in the ACLMASK for a privilege that doesn't really
>>> seem like it's something that would be GRANT'd in general usage.
>>
>> I don't see any reason why we couldn't increase the size of AclMode if
>> it becomes necessary.
> 
> I've fought exactly that argument before and there is a good deal of
> entirely reasonable push-back on increasing our catalogs by so much.
> 

Hmm, to be honest I don't see what the issue is there.

>>> I'm certainly all for removing the need for users to be the superuser
>>> for such commands, just not sure that they should be GRANT'able
>>> privileges instead of privileges which the owner of the relation or
>>> database has.
>>
>> Then you couldn't set up a replication structure involving tables owned
>> by different users without resorting to blunt instruments like having
>> everything owned by the same user or using superusers.
> 
> That's not correct- you would simply need a user who is considered an
> owner for all of the objects which you want to replicate, that doesn't
> have to be a *direct* owner or a superuser.
> 
> The tables can have individual roles, with some admin role who is a
> member of those other roles, or another mechanism (as Simon has proposed
> not too long ago...) to have a given role be considered equivalent to an
> owner for all of the relations in a particular database.
> 

I do agree with this though. And I also agree with the overall sentiment
that we don't need a special PUBLICATION privilege on tables.

I do like the rest of the patch.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-28 01:44:42 +0100, Tomas Vondra wrote:
> On 02/27/2017 06:42 PM, Andres Freund wrote:
> > Yea, I hadn't yet realized when writing that, that termite actually,
> > despite running on ppc64, compiles a 32bit postgres. Will thus
> > duplicate StandardChunkHeader's contents into slab.c :( - I don't
> > see an easy way around that...
>
> I've tried this - essentially copying the StandardChunkHeader's contents
> into SlabChunk, but that does not seem to do the trick, sadly. Per pahole,
> the structures then (at least on armv7l) look like this:
>
> struct SlabChunk {
> void * block;/*  0   4 */
> MemoryContext  context;  /*  4   4 */
> Size   size; /*  8   4 */
> Size   requested_size;   /* 12   4 */
>
> /* size: 16, cachelines: 1, members: 4 */
> /* last cacheline: 16 bytes */
> };
>
> struct StandardChunkHeader {
> MemoryContext  context;  /*  0   4 */
> Size   size; /*  4   4 */
> Size   requested_size;   /*  8   4 */
>
> /* size: 12, cachelines: 1, members: 3 */
> /* last cacheline: 12 bytes */
> };

> So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so
> pfree() grabs the block pointer but thinks it's the context :-(

Hm. The only way I can think of to achieve the right thing here would
be something like:

typedef struct StandardChunkHeader
{
MemoryContext context;  /* owning context */
Sizesize;   /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
/* when debugging memory usage, also store actual requested size */
Sizerequested_size;
#endif
union
{
char *data;
/* ensure MAXALIGNed */
int64 alignof_int64;
double alignof_double;
} d;
} StandardChunkHeader;

typedef struct SlabChunk
{
void *block;
StandardChunkHeader header;
} SlabChunk;

That's not overly pretty, but also not absolutely disgusting.  Unifying
the padding calculations between allocators would be a nice side-effect.
Note we at least previously had such union/double tricks in the tree, via
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e1a11d93111ff3fba7a91f3f2ac0b0aca16909a8


It might be a good idea to have configure define maxaligned_type instead
of including both int64/double (although it'll IIRC practically always
be double that's maxaligned).


Independently of this, we really should redefine StandardChunkHeader to
be only the MemoryContext.  There's no need to have size/requested_size
part of StandardChunkHeader, given there's
MemoryContextMethods->get_chunk_space().


> Not sure what to do about this - the only thing I can think of is
> splitting SlabChunk into two separate structures, and aligning them
> independently.
>
> The attached patch does that - it probably needs a bit more work on the
> comments to make it commit-ready, but it fixes the test_decoding tests on
> the rpi3 board I'm using for testing.

That'd work as well, although at the very least I'd want to add a
comment explaining the actual memory layout somewhere - this is a bit
too finicky to expect to get right the next time round.
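
To sketch what such a comment might say (layout as I understand it from
this thread, not from any committed patch):

/*
 * SlabChunk layout on 32-bit platforms where MAXIMUM_ALIGNOF == 8:
 *
 *   | void *block | pad | StandardChunkHeader | pad | chunk data ... |
 *                       ^ MAXALIGN boundary         ^ MAXALIGN boundary
 *
 * The block pointer and the standard header are padded out to MAXALIGN
 * independently, so that mcxt.c's back-offset of
 * MAXALIGN(sizeof(StandardChunkHeader)) from the chunk data lands on the
 * header (and its context field) rather than on the block pointer.
 */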


Any preferences?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 12:27 PM, Kyotaro HORIGUCHI
 wrote:
> Although replication slots are helpful for avoiding unwanted WAL
> deletion, they can also cause a disastrous situation by keeping
> WAL segments without limit. Removing the offending repslot
> salvages the situation, but that is not doable while the standby
> is active. We would have to take rather complex and forcible
> steps to relieve the situation, especially in an automated
> manner. (For me, specifically in an HA cluster.)
>
> This patch adds a GUC to put a limit on the number of segments
> that replication slots can keep. Hitting the limit during a
> checkpoint emits a warning, and the segments older than the
> limit are removed.
>
>> WARNING:  restart LSN of replication slots is ignored by checkpoint
>> DETAIL:  Some replication slots lose required WAL segments to continue.
>
> Another measure would be automatic deletion or inactivation of
> the culprit slot, but that seems too complex for the problem.
>
>
> As we have already postponed some patches in the triage for the
> last commitfest, this should perhaps be postponed to PG11.

Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments as it is a
wart, and this applies as well for replication slots in my opinion. If
a slot is bloating WAL and you care about your Postgres instance, I
would recommend instead that you use a background worker that does
monitoring of the situation based on max_wal_size for example, killing
the WAL sender associated with the slot if there is something connected
but it is frozen or it cannot keep up the pace of WAL generation, and
then dropping the slot. You may want to issue a checkpoint in this
case as well to ensure that segments get recycled. But anyway, if you
reach this point of WAL bloat, perhaps that's for the best as users
would know about it because backups would get in danger. For some
applications, that is acceptable, but you could always rely on
monitoring slots and kill them on sight if needed. That's as well more
flexible than having a parameter that basically is just a synonym of
max_wal_size.
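
For what it's worth, such monitoring could be driven by a query along
these lines (a sketch using existing functions and views):

-- bytes of WAL each slot is holding back from recycling
SELECT slot_name, active,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             restart_lsn) AS retained_bytes
  FROM pg_replication_slots;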
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 12:23 PM, Bruce Momjian  wrote:
> I am a little confused by the above.  Is a partitioned table the parent
> or the children?  Reading the code it seems it is the parent, which
> explains why it works.  Can someone confirm that?

As I understand things, a partitioned table is a parent relation that
exists only to link to its child portions, while also holding the
definitions that tie each partition to it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Restricting maximum keep segments by repslots

2017-02-27 Thread Kyotaro HORIGUCHI
Hello.

Although replication slots are helpful for avoiding unwanted WAL
deletion, they can also cause a disastrous situation by keeping
WAL segments without limit. Removing the offending repslot
salvages the situation, but that is not doable while the standby
is active. We would have to take rather complex and forcible
steps to relieve the situation, especially in an automated
manner. (For me, specifically in an HA cluster.)

This patch adds a GUC to put a limit on the number of segments
that replication slots can keep. Hitting the limit during a
checkpoint emits a warning, and the segments older than the
limit are removed.

> WARNING:  restart LSN of replication slots is ignored by checkpoint
> DETAIL:  Some replication slots lose required WAL segments to continue.

Another measure would be automatic deletion or inactivation of
the culprit slot, but that seems too complex for the problem.


As we have already postponed some patches in the triage for the
last commitfest, this should perhaps be postponed to PG11.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 367205d51ef471defd0f9df76840dee1c4cd4036 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 28 Feb 2017 11:39:48 +0900
Subject: [PATCH] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c | 12 
 src/backend/utils/misc/guc.c  | 10 ++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 4 files changed, 24 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5016273..6c57e99 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -104,6 +104,7 @@ int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_segments = 0;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -9267,6 +9268,17 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 
 		XLByteToSeg(keep, slotSegNo);
 
+		/* emergency vent */
+		if (max_slot_wal_keep_segments > 0 &&
+			slotSegNo < segno - max_slot_wal_keep_segments)
+		{
+			ereport(WARNING,
+	(errmsg ("restart LSN of replication slots is ignored by checkpoint"),
+	 errdetail("Some replication slots lose required WAL segments to continue.")));
+			/* slotSegNo cannot be negative here */
+			slotSegNo = segno - max_slot_wal_keep_segments;
+		}
+
 		if (slotSegNo <= 0)
 			segno = 1;
 		else if (slotSegNo < segno)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0707f66..4ff1c2a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2335,6 +2335,16 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"max_slot_wal_keep_segments", PGC_SIGHUP, REPLICATION_SENDING,
+			gettext_noop("Sets the maximum keep segments by replication slots."),
+			NULL
+		},
+		&max_slot_wal_keep_segments,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 157d775..7424a63 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -233,6 +233,7 @@
 #max_wal_senders = 10		# max number of walsender processes
 # (change requires restart)
 #wal_keep_segments = 0		# in logfile segments, 16MB each; 0 disables
+#max_slot_wal_keep_segments = 0	# in logfile segments, 16MB each; 0 disables
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables
 
 #max_replication_slots = 10	# max number of replication slots
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9f036c7..fae1b87 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -97,6 +97,7 @@ extern bool reachedConsistency;
 extern int	min_wal_size;
 extern int	max_wal_size;
 extern int	wal_keep_segments;
+extern int	max_slot_wal_keep_segments;
 extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
 extern int	wal_retrieve_retry_interval;
-- 
2.9.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Bruce Momjian
On Tue, Feb 28, 2017 at 11:53:16AM +0900, Amit Langote wrote:
> > I don't think we are doing this, but if the parent table doesn't have a
> > physical file pg_upgrade will need to be taught that.  We have that case
> > now for unlogged tables on standby servers that we need to address.
> 
> Partitioned tables do have physical files as of now.  This thread is to
> discuss the proposal to get rid of the physical file for partitioned tables.
> 
> By the way, I just tried pg_upgrade on top of this patch, and it seems
> partitioned tables without the physical file migrate just fine to the new
> cluster.  To test I did: created a partitioned table and a few partitions,
> inserted some data into it, pg_upgraded the cluster to find the
> partitioned table intact with its data in the new cluster (to be clear,
> the data is contained in partitions).  Is there something that wouldn't
> work that I should instead be testing?
> 
> Also, it seems that the partitioned tables (without physical files) won't
> have the same issue on the standby servers as unlogged tables.  It's just
> that we route inserts into a partitioned table to its partitions, and hence
> the partitioned table itself does not contain any data, because all the
> incoming data is routed.  Am I missing something?

I see why it works now.  pg_upgrade only upgrades relations and
materialized views:

"  WHERE relkind IN ('r', 'm') AND "

but partitions are defined as 'P':

#define   RELKIND_PARTITIONED_TABLE 'P' /* partitioned table */

I am a little confused by the above.  Is a partitioned table the parent
or the children?  Reading the code it seems it is the parent, which
explains why it works.  Can someone confirm that?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Robert Haas
On Thu, Feb 23, 2017 at 11:19 AM, Amit Langote
 wrote:
> Thanks for the review.

In 0001, the documentation which you are patching has a section for
limitations that apply only to both partitioning and constraint
exclusion, and another for limitations that apply only to constraint
exclusion.  Surely the patch should be moving a limitation that will
no longer apply to partitioning from the first section to the second
section, rather than leaving it in the section for limitations that
apply to both systems and just adding a note that says "this doesn't
apply to partitioning any more".

In acquire_inherited_sample_rows(), instead of inserting a whole
stanza of logic just above the existing dispatch on relkind, I think
we can get by with a very slight update to what's already there.

You can't use the result of a & b as a bool.  You need to write (a &
b) != 0, because the bool should always use 1 for true and 0 for
false; it should not set some higher-numbered bit.

The changes to autovacuum.c look useless, because
relation_needs_vacanalyze() will presumably never fire for a table
with no tuples of its own.

Updated patch with those changes and a few cosmetic tweaks attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


avoid-useless-partition-ops.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wrong variable type in KeepLogSeg

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 11:17 AM, Kyotaro HORIGUCHI
 wrote:
> slotSegNo should be an XLogSegNo. Both types share the same
> intrinsic type, so it doesn't harm anything.

Nice catch!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-02-27 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2/18/17 18:06, Stephen Frost wrote:
> > I'm not convinced that it really makes sense to have PUBLICATION of a
> > table be independent from the rights an owner of a table has.  We don't
> > allow other ALTER commands on objects based on GRANT'able rights, in
> > general, so I'm not really sure that it makes sense to do so here.
> 
> The REFERENCES and TRIGGER privileges are very similar in principle.

TRIGGER is a great example of an, ultimately, poorly chosen privilege to
GRANT away, with a good history of discussion about it:

https://www.postgresql.org/message-id/21827.1166115978%40sss.pgh.pa.us
https://www.postgresql.org/message-id/7389.1405554356%40sss.pgh.pa.us

Further, how would RLS be handled with publication?  I've been assuming
that it's simply ignored, but that's clearly wrong if a non-owner can
publish a table that they just have SELECT,PUBLISH on, isn't it?

> > The downside of adding these privileges is that we're burning through
> > the last few bits in the ACLMASK for a privilege that doesn't really
> > seem like it's something that would be GRANT'd in general usage.
> 
> I don't see any reason why we couldn't increase the size of AclMode if
> it becomes necessary.

I've fought exactly that argument before and there is a good deal of
entirely reasonable push-back on increasing our catalogs by so much.

> > I'm certainly all for removing the need for users to be the superuser
> > for such commands, just not sure that they should be GRANT'able
> > privileges instead of privileges which the owner of the relation or
> > database has.
> 
> Then you couldn't set up a replication structure involving tables owned
> by different users without resorting to blunt instruments like having
> everything owned by the same user or using superusers.

That's not correct- you would simply need a user who is considered an
owner for all of the objects which you want to replicate, that doesn't
have to be a *direct* owner or a superuser.

The tables can have individual roles, with some admin role who is a
member of those other roles, or another mechanism (as Simon has proposed
not too long ago...) to have a given role be considered equivalent to an
owner for all of the relations in a particular database.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] rename pg_log directory?

2017-02-27 Thread Bruce Momjian
On Mon, Feb 27, 2017 at 09:51:26AM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > How about changing the default for log_directory from 'pg_log' to, say,
> > 'log'?
> 
> > We have been emphasizing that the prefix "pg_" is for things reserved to
> > PostgreSQL, whereas the pg_log directory is entirely an arbitrary
> > user-space name.  Also, with a different name, the directory would stand
> > out more between all the other pg_* directories in the data directory.
> 
> No objection to the basic point, but "log" seems perhaps a little too
> generic to me.  Would something like "server_log" be better?

"activity_log"?  I like the idea of a rename.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Amit Langote
On 2017/02/28 3:54, Bruce Momjian wrote:
> On Fri, Feb 10, 2017 at 03:19:47PM +0900, Amit Langote wrote:
>> The new partitioned tables do not contain any data by themselves.  Any
>> data inserted into a partitioned table is routed to and stored in one of
>> its partitions.  In fact, it is impossible to insert *any* data before a
>> partition (to be precise, a leaf partition) is created.  It seems wasteful
>> then to allocate physical storage (files) for partitioned tables.  If we
>> do not allocate the storage, then we must make sure that the right thing
>> happens when a command that is intended to manipulate a table's storage
>> encounters a partitioned table, the "right thing" here being that the
>> command's code either throws an error or warning (in some cases) if the
>> specified table is a partitioned table or ignores any partitioned tables
>> when it reads the list of relations to process from pg_class.  Commands
>> that need to be taught about this are vacuum, analyze, truncate, and alter
>> table.  Specifically:
>>
>> - In case of vacuum, specifying a partitioned table causes a warning
>>
>> - In case of analyze, we do not throw an error or warning but simply
>>   avoid calling do_analyze_rel() *non-recursively*.  Further in
>>   acquire_inherited_sample_rows(), any partitioned tables in the list
>>   returned by find_all_inheritors() are skipped.
>>
>> - In case of truncate, only the part which manipulates table's physical
>>   storage is skipped for partitioned tables.
>>
>> - ATRewriteTables() skips on the AlteredTableInfo entries for partitioned
>>   tables, because there is nothing to be done.
>>
>> - Since we cannot create indexes on partitioned tables anyway, there is
>>   no need to handle cluster and reindex (they throw a meaningful error
>>   already due to the lack of indexes.)
> 
> I don't think we are doing this, but if the parent table doesn't have a
> physical file pg_upgrade will need to be taught that.  We have that case
> now for unlogged tables on standby servers that we need to address.

Partitioned tables do have physical files as of now.  This thread is to
discuss the proposal to get rid of the physical file for partitioned tables.

By the way, I just tried pg_upgrade on top of this patch, and it seems
partitioned tables without the physical file migrate just fine to the new
cluster.  To test I did: created a partitioned table and a few partitions,
inserted some data into it, pg_upgraded the cluster to find the
partitioned table intact with its data in the new cluster (to be clear,
the data is contained in partitions).  Is there something that wouldn't
work that I should instead be testing?

Also, it seems that the partitioned tables (without physical files) won't
have the same issue on the standby servers as unlogged tables.  It's just
that we route inserts into a partitioned table to its partitions, and hence
the partitioned table itself does not contain any data, because all the
incoming data is routed.  Am I missing something?

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bytea_output output of base64

2017-02-27 Thread Bruce Momjian
On Mon, Feb 27, 2017 at 09:28:10PM +0530, Robert Haas wrote:
> I don't think Bruce was seriously proposing a change in this area
> anyway.  I think he was just asking a question.

That is correct.  I was asking if we made an obvious mistake, and most
people are saying no.  Also, base64 is harder to compress because
input bytes span base64 characters, so "dog" might encode differently
depending on its byte offset in the stream, while hex always encodes
"dog" the same way.

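A quick illustration of the point:

SELECT encode('dog', 'hex');      -- 646f67, wherever "dog" appears
SELECT encode('dog', 'base64');   -- ZG9n
SELECT encode('xdog', 'base64');  -- eGRvZw== ("dog"'s bytes no longer
                                  --  line up with whole base64 characters)
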
-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unhelpful typesetting of callouts in example queries in the docs

2017-02-27 Thread Thomas Munro
Hi

A novice user asked me about the last example here, which kept producing errors:

https://www.postgresql.org/docs/9.6/static/tutorial-agg.html

It turned out that the 'callout' was causing confusion because it
sticks "(1)" into the middle of the query in approximately the same
typeface:

SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%'(1)
GROUP BY city
HAVING max(temp_lo) < 40;

Maybe we should move it over a bit (?) and make it a comment, in case
it gets copied-and-pasted or otherwise misinterpreted?

SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%'   -- (1)
GROUP BY city
HAVING max(temp_lo) < 40;

See attached, which does that there and in a couple of other places.

-- 
Thomas Munro
http://www.enterprisedb.com


put-callouts-in-sql-comments.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Final commitfest for 10.0 almost underway

2017-02-27 Thread David Steele
Submissions for patches to be included in PostgreSQL 10.0 will be
accepted until 2017-03-01 00:00 AoE (UTC-12).

Get 'em in while you can!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Radix tree for character conversion

2017-02-27 Thread Robert Haas
On Mon, Feb 27, 2017 at 2:07 PM, Kyotaro HORIGUCHI
 wrote:
>> +# make_charmap - convert charset table to charmap hash
>> +# with checking duplicate source code
>> Maybe this should be "with checking of duplicated source codes".
>
> Even though I'm not a good English writer, 'duplicated codes' reads
> (to me, at least) like multiple copies of the original 'code'.
> And the 'checking' is a (pure) verbal noun (not a deverbal
> noun), so 'of' is not required. But, of course, I'm not
> sure which sounds more natural in English.

The problem is that, because "checking" is a noun in this sentence, it
can't be followed by a direct object so you need "of" to connect
"checking" with the thing that is being checked.  However, what I
would do is rearrange this sentence slightly as to use "checking" as a
verb, like this:

convert charset table to charmap hash, checking for duplicate source
codes along the way

While I don't think Michael's suggestion is wrong, I find the above a
little more natural.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Wrong variable type in KeepLogSeg

2017-02-27 Thread Kyotaro HORIGUCHI
Hello, I found a variable definition with a wrong type
specification in KeepLogSeg, though it doesn't harm anything.

> static void
> KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
> {
> ...
> /* then check whether slots limit removal further */
> if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
> {
> XLogRecPtrslotSegNo;
> 
> XLByteToSeg(keep, slotSegNo);


slotSegNo should be an XLogSegNo. Both types share the same
intrinsic type, so it doesn't harm anything.
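
For reference, both are plain 64-bit integers (from
src/include/access/xlogdefs.h):

typedef uint64 XLogRecPtr;
typedef uint64 XLogSegNo;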

This is back-patchable up to 9.4.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5016273..8973583 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9263,7 +9263,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	/* then check whether slots limit removal further */
 	if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
 	{
-		XLogRecPtr	slotSegNo;
+		XLogSegNo	slotSegNo;
 
 		XLByteToSeg(keep, slotSegNo);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] utility commands benefiting from parallel plan

2017-02-27 Thread Haribabu Kommi
On Sat, Feb 25, 2017 at 3:21 AM, Robert Haas  wrote:

> On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
>  wrote:
> > Here I attached an implementation patch that allows
> > utility statements that have queries underneath such as
> > CREATE TABLE AS, CREATE MATERIALIZED VIEW
> > and REFRESH commands to benefit from parallel plan.
> >
> > These write operations are not performed concurrently by the
> > parallel workers, but the underlying query used by
> > these operations is eligible for parallel plans.
> >
> > Currently the write operations are implemented for the
> > tuple dest types DestIntoRel and DestTransientRel.
> >
> > Currently I am evaluating other write operations that can
> > benefit with parallelism without side effects in enabling them.
> >
> > comments?
>
> I think a lot more work than this will be needed.  See:
>
> https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_1vU6H8oVyD=
> zyuLvRnJqTN==fv...@mail.gmail.com
>
> ...and the discussion which followed.
>


Thanks for the link.
Yes, it needs more work to support parallelism even for
queries that are involved in write operations like INSERT,
DELETE and UPDATE commands.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_upgrade loses security lables and COMMENTs on blobs

2017-02-27 Thread Bruce Momjian
On Thu, Feb 23, 2017 at 10:36:37AM -0500, Stephen Frost wrote:
> All,
> 
> * Stephen Frost (sfr...@snowman.net) wrote:
> > Just wanted to get a note out to -hackers about the issue, I'll see
> > about getting a fix written up for it soon.
> 
> Attached is a patch which addresses this issue.  I'm not terribly
> pleased with it, but I also haven't got any great ideas of what else to
> do.  Suggestions welcome, of course.
> 
> Otherwise, I'll plan to start working on the back-branch changes for
> this soon.

Yeah, this is probably the best you can do.  Your analysis of how we
used to treat large objects is correct, and that treatment was never
adjusted for the changes you outlined.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] utility commands benefiting from parallel plan

2017-02-27 Thread Haribabu Kommi
On Sat, Feb 25, 2017 at 2:45 AM, Dilip Kumar  wrote:

> On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
>  wrote:
> > Here I attached an implementation patch that allows
> > utility statements that have queries underneath such as
> > CREATE TABLE AS, CREATE MATERIALIZED VIEW
> > and REFRESH commands to benefit from parallel plan.
> >
> > These write operations are not performed concurrently by the
> > parallel workers, but the underlying query used by
> > these operations is eligible for parallel plans.
> >
> > Currently the write operations are implemented for the
> > tuple dest types DestIntoRel and DestTransientRel.
> >
> > Currently I am evaluating other write operations that can
> > benefit with parallelism without side effects in enabling them.
>
> The idea looks good to me.
>
> Since we are already modifying heap_prepare_insert, I am thinking that
> we can as well enable queries like "insert into .. select from .."
> with minor modification?
>

Thanks for the review.

I am finding it not so easy to support write operations like INSERT,
DELETE and UPDATE in a way that lets the queries underneath them
benefit from parallelism.

Currently, parallelism is enabled only for tables that have no
triggers and no expression indexes. This limitation can be
removed after thorough testing.

To support this, I removed the errors from the heap functions and from
the functions that get a new transaction and update the command id in
the current snapshot (required for the cases where a single command
validates the input).

Attached a WIP patch for the support for DML write operations.
There is no functional change in base utility write support patch.

Regards,
Hari Babu
Fujitsu Australia


0002_dml_write_using_parallel_1.patch
Description: Binary data


0001_utility_write_using_parallel_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-27 Thread Andres Freund
On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote:
> On 02/22/2017 02:45 PM, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
> > > > (To be concrete, I'm suggesting dropping --disable-integer-datetimes
> > > > in HEAD, and just agreeing that in the back branches, use of replication
> > > > protocol with float-timestamp servers is not supported and we're not
> > > > going to bother looking for related bugs there.  Given the lack of field
> > > > complaints, I do not believe anyone cares.)
> > 
> > What I *am* willing to spend time on is removing float-timestamp code
> > in HEAD.  I've not yet heard anybody speak against doing that (or at
> > least, nothing I interpreted as a vote against it).  If I've not heard
> > any complaints by tomorrow, I'll get started on that.
> 
> Rip it out!

Already happened: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-27 Thread Joshua D. Drake

On 02/22/2017 02:45 PM, Tom Lane wrote:

Andres Freund  writes:

On 2017-02-22 08:43:28 -0500, Tom Lane wrote:

(To be concrete, I'm suggesting dropping --disable-integer-datetimes
in HEAD, and just agreeing that in the back branches, use of replication
protocol with float-timestamp servers is not supported and we're not
going to bother looking for related bugs there.  Given the lack of field
complaints, I do not believe anyone cares.)


What I *am* willing to spend time on is removing float-timestamp code
in HEAD.  I've not yet heard anybody speak against doing that (or at
least, nothing I interpreted as a vote against it).  If I've not heard
any complaints by tomorrow, I'll get started on that.


Rip it out!

Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-27 Thread Bruce Momjian
On Mon, Feb 20, 2017 at 09:07:33AM -0500, Tom Lane wrote:
> The question to be asked is whether there is still anybody out there
> using float timestamps.  I'm starting to get dubious about it myself.
> Certainly, no packager that I'm aware of has shipped a float-timestamp
> build since we switched the default in 8.4.  Maybe there is somebody
> who's faithfully built a float-timestamp custom build every year so they
> can pg_upgrade in place from their 8.3 installation, but there have got
> to be darn few such people.
> 
> As for "proper deprecation period", the documentation has called the
> option deprecated since 8.4:
> 
> --disable-integer-datetimes
>   Disable support for 64-bit integer storage for timestamps and
>   intervals, and store datetime values as floating-point numbers
>   instead. Floating-point datetime storage was the default in
>   PostgreSQL releases prior to 8.4, but it is now deprecated,
>   because it does not support microsecond precision for the full
>   range of timestamp values.
> 
> I think the strongest reason why we didn't move to kill it sooner was
> that we were not then assuming that every platform had 64-bit ints;
> but we've required that since 9.0.

I agree with removing support for float timestamps in PG 10.  It would
be tempting to think we can reuse the bits we stopped using in 9.0 for
VACUUM FULL, e.g.:

#define HEAP_MOVED_OFF  0x4000  /* moved to another place by pre-9.0
                                 * VACUUM FULL; kept for binary
                                 * upgrade support */
#define HEAP_MOVED_IN   0x8000  /* moved from another place by pre-9.0
                                 * VACUUM FULL; kept for binary
                                 * upgrade support */

However, if users built Postgres 8.4 with integer timestamps and then
pg_upgraded, those bits could still be present in their data files.  Does
anyone see a fault in my logic?  I hope not.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-02-27 Thread David Steele
On 2/27/17 7:50 PM, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 9:42 AM, David Steele  wrote:
>> On 2/27/17 7:38 PM, Michael Paquier wrote:
>>> On Tue, Feb 28, 2017 at 9:25 AM, David Steele  wrote:
 I also marked the pg_stop_* functions as parallel restricted, the same
 as pg_start_backup().  Previously they were parallel safe which I don't
 believe is accurate for the non-exclusive version at the very least,
 since it is tied to a particular backend.
>>>
>>> Yeah, those should really be parallel restricted. For the exclusive
>>> version, having the function run in parallel would also lead to errors
>>> per the presence/lack of backup_label file.
>>
>> I'm not sure that's the case.  It seems like it should lock just as
>> multiple backends would now.  One process would succeed and the others
>> would error.  Maybe I'm missing something?
> 
> Hm, any errors happening in the workers would be reported to the
> leader, meaning that even if one worker succeeded in running
> pg_start_backup() it would be reported as an error at the end to the
> client, no? By marking the exclusive function restricted we make sure
> that it is just the leader that fails or succeeds.

Good point, and it strengthens the argument beyond, "it just seems right."

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 9:42 AM, David Steele  wrote:
> On 2/27/17 7:38 PM, Michael Paquier wrote:
>> On Tue, Feb 28, 2017 at 9:25 AM, David Steele  wrote:
>>> I also marked the pg_stop_* functions as parallel restricted, the same
>>> as pg_start_backup().  Previously they were parallel safe which I don't
>>> believe is accurate for the non-exclusive version at the very least,
>>> since it is tied to a particular backend.
>>
>> Yeah, those should really be parallel restricted. For the exclusive
>> version, having the function run in parallel would also lead to errors
>> per the presence/lack of backup_label file.
>
> I'm not sure that's the case.  It seems like it should lock just as
> multiple backends would now.  One process would succeed and the others
> would error.  Maybe I'm missing something?

Hm, any errors happening in the workers would be reported to the
leader, meaning that even if one worker succeeded in running
pg_start_backup() it would be reported as an error at the end to the
client, no? By marking the exclusive function restricted we make sure
that it is just the leader that fails or succeeds.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 06:42 PM, Andres Freund wrote:

On 2017-02-27 12:27:48 -0500, Tom Lane wrote:

Andres Freund  writes:

The best theory I have so far is that slab.c's idea of
StandardChunkHeader's size doesn't match what mcxt.c thinks it is
(because slab.c simply embeds StandardChunkHeader, but mcxt uses
MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
quite see how that'd cause the issue, since StandardChunkHeader's size
should always be properly sized.


Uh, wrong.  On a 32-bit machine with debug enabled, StandardChunkHeader
will contain 3 4-byte fields.  However, there are some such machines on
which MAXALIGN is 8.  For example, looking at termite's configure
output:

checking size of void *... 4
checking size of size_t... 4
checking size of long... 4
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 4
checking alignment of long long int... 8
checking alignment of double... 8

axolotl's output looks similar.  I expect my old HPPA dinosaur
will show the failure as well.


Yea, I hadn't yet realized when writing that, that termite actually,
despite running on ppc64, compiles a 32bit postgres. Will thus
duplicate StandardChunkHeader's contents into slab.c :( - I don't
see an easy way around that...



I've tried this - essentially copying the StandardChunkHeader's contents 
into SlabChunk, but that does not seem to do the trick, sadly. Per 
pahole, the structures then (at least on armv7l) look like this:


struct SlabChunk {
void * block;/*  0   4 */
MemoryContext  context;  /*  4   4 */
Size   size; /*  8   4 */
Size   requested_size;   /* 12   4 */

/* size: 16, cachelines: 1, members: 4 */
/* last cacheline: 16 bytes */
};

struct StandardChunkHeader {
MemoryContext  context;  /*  0   4 */
Size   size; /*  4   4 */
Size   requested_size;   /*  8   4 */

/* size: 12, cachelines: 1, members: 3 */
/* last cacheline: 12 bytes */
};

So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so 
pfree() grabs the block pointer but thinks it's the context :-(


Not sure what to do about this - the only thing I can think of is
splitting SlabChunk into two separate structures, and aligning them
independently.


The attached patch does that - it probably needs a bit more work on the 
comments to make it commit-ready, but it fixes the test_decoding tests
on the rpi3 board I'm using for testing.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-fix.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-02-27 Thread David Steele
On 2/27/17 7:38 PM, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 9:25 AM, David Steele  wrote:
>> I also marked the pg_stop_* functions as parallel restricted, the same
>> as pg_start_backup().  Previously they were parallel safe which I don't
>> believe is accurate for the non-exclusive version at the very least,
>> since it is tied to a particular backend.
> 
> Yeah, those should really be parallel restricted. For the exclusive
> version, having the function run in parallel would also lead to errors
> per the presence/lack of backup_label file.

I'm not sure that's the case.  It seems like it should lock just as
multiple backends would now.  One process would succeed and the others
would error.  Maybe I'm missing something?

Either way, I don't think the behavior makes any sense.  Parallel safe
seems more sensible.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 9:25 AM, David Steele  wrote:
> I also marked the pg_stop_* functions as parallel restricted, the same
> as pg_start_backup().  Previously they were parallel safe which I don't
> believe is accurate for the non-exclusive version at the very least,
> since it is tied to a particular backend.

Yeah, those should really be parallel restricted. For the exclusive
version, having the function run in parallel would also lead to errors
per the presence/lack of backup_label file.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread David Steele
On 2/27/17 7:27 PM, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 9:21 AM, David Steele  wrote:
>> Looks like it's me, then.  I don't seem to have admin privileges to
>> close the commitfest or I don't know where the option is located.
>>
>> Can somebody grant privileges or close the CF at 3/1 12AM AoE (UTC-12)
>> for me?
> 
> I have no way to grant administration rights, but since I have them I
> am fine with marking the CF as in progress around this time.

That works for me.  I have all the privileges I need to run the CF
otherwise.

Thanks!
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 9:21 AM, David Steele  wrote:
> Looks like it's me, then.  I don't seem to have admin privileges to
> close the commitfest or I don't know where the option is located.
>
> Can somebody grant privileges or close the CF at 3/1 12AM AoE (UTC-12)
> for me?

I have no way to grant administration rights, but since I have them I
am fine with marking the CF as in progress around this time.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-02-27 Thread Seki, Eiji
on 2017-02-24 04:41:09 Simon Riggs wrote:
> ...you didn't comment at all on the accuracy and usefulness of the
> gathered statistics, when the sample is biased towards non-updated
> data.

In my understanding, the sample for statistics is not biased, at least in our
use case, because the conversion process from WOS to ROS never modifies (it
only reads) the indexed table and modifies only an index relation. And I
think a process that does modify some tables should not ignore analyze
processes.

--
Regards,
Eiji Seki
Fujitsu






[HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-02-27 Thread David Steele
Currently pg_stop_backup() will wait for all WAL to be archived before
returning.  If there is a problem with the archive command or a large
backlog it may not return for a very long time (if ever).  Backup
software is faced with the choice of cancelling the query and hoping the
stop backup record was written or waiting indefinitely.

The purpose of this patch is to make waiting for archive optional, with
the default being the current behavior, i.e., to wait for all WAL to be
archived.  This functionality is already used internally by
pg_basebackup, so the only real change is to expose it through the
pg_stop_backup() function.
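
Going by the function signature in the patch below, usage would look roughly
like this (a sketch, not part of the patch; the label is arbitrary):

SELECT pg_start_backup('nightly', false, false); -- non-exclusive backup
-- ... copy the data directory with an external tool ...
-- return as soon as the backup is stopped, without waiting for the WAL
-- to be archived; the backup software must then track archiving itself
SELECT lsn, labelfile, spcmapfile
  FROM pg_stop_backup(exclusive => false, wait_for_archive => false);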

I wasn't sure where, if anywhere, to put tests (the underlying
functionality is tested in the pg_basebackup TAP tests).  test/recovery
and bin/pg_basebackup did not seem like appropriate places and the
pg_regress tests are not capable enough.  It seems like a new
test/backup suite would be a good idea but I wanted to get some feedback
before proceeding in that direction.

I also marked the pg_stop_* functions as parallel restricted, the same
as pg_start_backup().  Previously they were parallel safe, which I don't
believe is accurate for the non-exclusive version at the very least,
since it is tied to a particular backend.

The patch applies cleanly on 30df93f but will be broken (and easily
fixed) by any catversion bump in later commits.

-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9c53e42..680a0dc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18044,7 +18044,7 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

-pg_stop_backup(exclusive 
boolean)
+pg_stop_backup(exclusive 
boolean , wait_for_archive boolean 
)
 
setof record
Finish performing exclusive or non-exclusive on-line backup 
(restricted to superusers by default, but other users can be granted EXECUTE to 
run the function)
@@ -18128,7 +18128,13 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory).  There is an optional second
+parameter of type boolean.  If false, pg_stop_backup
+will return immediately after the backup is completed, without waiting for
+WAL to be archived.  This behavior is only useful for backup
+software which independently monitors WAL archiving; otherwise, WAL
+required to make the backup consistent might be missing, making the backup
+useless.

 

diff --git a/src/backend/access/transam/xlogfuncs.c 
b/src/backend/access/transam/xlogfuncs.c
index 27c0c56..d8fdacf 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -190,6 +190,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
boolnulls[3];
 
boolexclusive = PG_GETARG_BOOL(0);
+   boolwaitforarchive = PG_GETARG_BOOL(1);
XLogRecPtr  stoppoint;
 
/* check to see if caller supports us returning a tuplestore */
@@ -232,7 +233,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the exclusive backup, and since we're in an exclusive 
backup
 * return NULL for both backup_label and tablespace_map.
 */
-   stoppoint = do_pg_stop_backup(NULL, true, NULL);
+   stoppoint = do_pg_stop_backup(NULL, waitforarchive, NULL);
exclusive_backup_running = false;
 
nulls[1] = true;
@@ -250,7 +251,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the non-exclusive backup. Return a copy of the backup 
label
 * and tablespace map so they can be written to disk by the 
caller.
 */
-   stoppoint = do_pg_stop_backup(label_file->data, true, NULL);
+   stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, 
NULL);
nonexclusive_backup_running = false;
cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, 
(Datum) 0);
 
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 38be9cf..c2ca2b8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -984,6 +984,12 @@ CREATE OR REPLACE FUNCTION
   RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_stop_backup (
+exclusive boolean, wait_for_archive boolean DEFAULT true,
+OUT lsn pg_lsn, OUT labelfile text, OUT spcmapfile text)
+  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE O

Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread David Steele
On 2/27/17 6:40 PM, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 2:43 AM, Andres Freund  wrote:
>> On 2017-02-27 23:09:40 +0530, Robert Haas wrote:
>>> On Mon, Feb 27, 2017 at 9:39 PM, David Steele  wrote:
 I'm happy to be CFM.  Somehow I doubt there will be a lot of objections!
>>>
>>> No.  Aren't you the guy who did a good job with it last year and just
>>> about perished in the process?
>>
>> Yea, I was wondering if that Davis S. guy from last year would object.
>>
>> None here either.
> 
> Fine for me too. Congratulations (?)

Looks like it's me, then.  I don't seem to have admin privileges to
close the commitfest or I don't know where the option is located.

Can somebody grant privileges or close the CF at 3/1 12AM AoE (UTC-12)
for me?

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 2:43 AM, Andres Freund  wrote:
> On 2017-02-27 23:09:40 +0530, Robert Haas wrote:
>> On Mon, Feb 27, 2017 at 9:39 PM, David Steele  wrote:
>> > I'm happy to be CFM.  Somehow I doubt there will be a lot of objections!
>>
>> No.  Aren't you the guy who did a good job with it last year and just
>> about perished in the process?
>
> Yea, I was wondering if that Davis S. guy from last year would object.
>
> None here either.

Fine for me too. Congratulations (?)
-- 
Michael




Re: [HACKERS] GSoC 2017

2017-02-27 Thread Thomas Munro
On Tue, Feb 28, 2017 at 11:42 AM, Alexander Korotkov
 wrote:
> Hi all!
>
> It seems that PostgreSQL has passed to GSoC mentoring organizations this
> year!
> https://summerofcode.withgoogle.com/organizations/4558465230962688/
> Congratulations!

Very cool!

By the way, that page claims that PostgreSQL runs on Irix and Tru64,
which hasn't been true for a few years.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Polyphase merge is obsolete

2017-02-27 Thread Peter Geoghegan
On Mon, Jan 16, 2017 at 5:56 PM, Peter Geoghegan  wrote:
> On Wed, Oct 12, 2016 at 10:16 AM, Heikki Linnakangas  wrote:
>> The number of *input* tapes we can use in each merge pass is still limited,
>> by the memory needed for the tape buffers and the merge heap, but only one
>> output tape is active at any time. The inactive output tapes consume very
>> few resources. So the whole idea of trying to efficiently reuse input tapes
>> as output tapes is pointless
>
> I picked this up again. The patch won't apply cleanly. Can you rebase?

Since we have an awful lot of stuff in the last CF, and this patch
doesn't seem particularly strategic, I've marked it "Returned with
Feedback".

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] GSoC 2017

2017-02-27 Thread Alexander Korotkov
Hi all!

It seems that PostgreSQL has passed to GSoC mentoring organizations this
year!
https://summerofcode.withgoogle.com/organizations/4558465230962688/
Congratulations!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-27 Thread Bruce Momjian
On Mon, Feb 27, 2017 at 05:31:21PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > I don't object to the addition of this patch in the next CF, as it
> > presents no new concept. Still, because it is a complicated patch, I
> > was wondering if people are fine with including it in this last CF.
> 
> The March CF is already looking pretty daunting.  We can try to include
> this but I won't be too surprised if it gets punted to a future CF.

Yeah, that was my reaction too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-27 Thread Tom Lane
Michael Paquier  writes:
> I don't object to the addition of this patch in the next CF, as it
> presents no new concept. Still, because it is a complicated patch, I
> was wondering if people are fine with including it in this last CF.

The March CF is already looking pretty daunting.  We can try to include
this but I won't be too surprised if it gets punted to a future CF.

regards, tom lane




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-27 Thread Michael Paquier
On Tue, Feb 28, 2017 at 5:29 AM, Bruce Momjian  wrote:
> On Fri, Feb 17, 2017 at 11:05:31PM +0900, Michael Paquier wrote:
>> On Fri, Feb 17, 2017 at 10:43 PM, Andreas Karlsson  wrote:
>> > Thinking about this makes me wonder about why you decided to use a
>> > transaction per index in many of the steps rather than a transaction per
>> > step. Most steps should be quick. The only steps where I think it makes
>> > sense to have a transaction per table are:
>>
>> I don't recall all the details to be honest :)
>>
>> > 1) When building indexes to avoid long running transactions.
>> > 2) When validating the new indexes, also to avoid long running 
>> > transactions.
>> >
>> > But when swapping the indexes or when dropping the old indexes I do not see
>> > any reason to not just use one transaction per step since we do not even
>> > have to wait for any locks (other than WaitForLockers which we just want to
>> > call once anyway since all indexes relate to the same table).
>>
>> Perhaps, this really needs a careful lookup.
>>
>> By the way, as this patch is showing up for the first time in this
>> development cycle, would it be allowed in the last commit fest? That's
>> not a patch in the easy category, far from that, but it does not
>> present a new concept.
>
> FYI, the thread started on 2013-11-15.

I don't object to the addition of this patch in the next CF, as it
presents no new concept. Still, because it is a complicated patch, I
was wondering if people are fine with including it in this last CF.
-- 
Michael




Re: [HACKERS] removing tsearch2

2017-02-27 Thread Mike Blackwell
There is also a mechanism for the results of the Perl module's "make test"
to be reported to a site which aggregates and reports them by Perl version
and OS - a sort of distributed build farm.  See for example
http://matrix.cpantesters.org/?dist=DBD-Pg+3.5.3

__
Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com

On Mon, Feb 27, 2017 at 4:02 PM, David E. Wheeler 
wrote:

> On Feb 27, 2017, at 1:53 PM, Bruce Momjian  wrote:
>
> > Oh, does CPAN distribute compiled modules or require users to compile
> > them?
>
> Like PGXN, it formally does not care, but its implementation expects
> source code distributions that will be built and installed by users. Note
> that the vast majority of those modules -- even pure Perl modules -- are
> built with make.
>
> So users typically get their Perl modules in one of these ways:
>
> 1. As binaries from their distribution’s package manager. These tend to be
> updated manually by volunteers and not integrated into CPAN, though there
> are solutions such as [rpmcpan](https://github.com/iovation/rpmcpan) and
> [PPM](http://www.activestate.com/activeperl/ppm-perl-modules) which do
> regular distro package builds.
>
> 2. As source code from CPAN, from which they are compiled (when
> necessary), built, and installed by the user or a build system such as
> [Homebrew](https://brew.sh).
>
> Best,
>
> David
>
>


Re: [HACKERS] removing tsearch2

2017-02-27 Thread David E. Wheeler
On Feb 27, 2017, at 1:53 PM, Bruce Momjian  wrote:

> Oh, does CPAN distribute compiled modules or require users to compile
> them?

Like PGXN, it formally does not care, but its implementation expects source
code distributions that will be built and installed by users. Note that the
vast majority of those modules -- even pure Perl modules -- are built with
make.

So users typically get their Perl modules in one of these ways:

1. As binaries from their distribution’s package manager. These tend to be 
updated manually by volunteers and not integrated into CPAN, though there are 
solutions such as [rpmcpan](https://github.com/iovation/rpmcpan) and 
[PPM](http://www.activestate.com/activeperl/ppm-perl-modules) which do regular 
distro package builds.

2. As source code from CPAN, from which they are compiled (when necessary), 
built, and installed by the user or a build system such as 
[Homebrew](https://brew.sh).

Best,

David





Re: [HACKERS] removing tsearch2

2017-02-27 Thread Bruce Momjian
On Mon, Feb 27, 2017 at 01:00:20PM -0800, David E. Wheeler wrote:
> On Feb 27, 2017, at 12:04 PM, Bruce Momjian  wrote:
> 
> > Just stating the obvious, but one of the reasons CPAN works so well is
> > that most of the modules are written in Perl and hence don't need
> > per-platform compilation.
> 
> There are a *lot* of C-based modules on CPAN; and my guess is that, more
> often than not, Perl modules depend on other C-based modules.
> 
> I daresay a lot of PostgreSQL extensions can be written in pure SQL or 
> PL/pgSQL.

Oh, does CPAN distribute compiled modules or require users to compile
them?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-27 Thread Tom Lane
Andrew Dunstan  writes:
> OK, here's the whole series of patches.

I've not tested it at all, but this looks generally sane in a quick
once-over.

A minor quibble is that in 0003, you weren't terribly consistent about
argument order --- in some places you have the FmgrInfo argument added
before the collation argument, and in some places after.  I'd suggest
trying to make the argument orders consistent with the fmgr.c support
functions.  (I'm generally -1 on blindly adding stuff at the end.)

regards, tom lane




Re: [HACKERS] Documentation improvements for partitioning

2017-02-27 Thread Petr Jelinek
On 27/02/17 07:59, Amit Langote wrote:
> On 2017/02/27 3:18, Petr Jelinek wrote:
>> On 24/02/17 07:15, Robert Haas wrote:
>>> On Fri, Feb 24, 2017 at 9:53 AM, Simon Riggs  wrote:
 The good news is that logical replication DOES work with partitioning,
 but only for a Publication with PUBLISH INSERT, pushing from a normal
 table to a partitioned one. Which is useful enough for first release.

 The work on having UPDATE work between partitions can be used to make
 updates and deletes work in logical replication. That might yet be
 made to work in this release, and I think we should look into it, but
 I think it can be left til next release if we try.
>>>
>>> Are you speaking of the case where you want to replicate from an
>>> unpartitioned table to a partitioned table?  I would imagine that if
>>> you are replicating between two identical partitioning hierarchies, it
>>> would just work.  (If not, that's probably a bug, though I don't know
>>> whose bug.)
>>>
>>
>> Yes, same hierarchies will work, but mainly because one currently has to
>> add the partitions themselves to the publication. I guess that's a
>> limitation we might have to live with in 10, as adding the whole
>> partitioned table should probably work for different hierarchies when we
>> enable it, and I am not quite sure that's doable before the start of the
>> CF at this point.
> 
> If and when we add support to add partitioned tables to publications, I
> think it will work by recursing to include the partitions to the same
> publication (I see that OpenTableList() in publicationcmds.c calls
> find_all_inheritors if recursion is requested by not specifying ONLY).
> When a subscription replicates from this publication, it's going to match
> changes for individual leaf partitions, not the root parent table.  IOW,
> none of the changes applied to a partitioned table are replicated as
> changes to the table itself.  So, it seems that adding a partitioned table
> to a publication or subscription would simply be a *shorthand* for adding
> all the (leaf) partitions that will actually emit and receive changes.

This was my first thought as well. However, we also need to take into
account the use case of different partitioning topologies on the publisher
and subscriber (at least in the sense that the initial design should not
paint us into a corner).

Now I see two ways of achieving this.
a) Either going with more or less what you wrote above and in the future
adding some smarts so we can specify that a table should replicate under a
different name. We'll eventually want to be able to replicate a table to a
differently named table anyway, so it's not a stretch by any means to do
that.
b) Or just replicating changes to leaf partitions as changes to the
partitioned table. That's also not that hard to do; we just need a fast
lookup of the partitioned table from a leaf table.

I guess a) looks simpler given that we eventually need the renaming
anyway, but I'd like the opinions of other people as well.
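
To make the current state concrete, a sketch (table names are invented) of
what has to be done today, which a) would build on: each leaf partition is
added to the publication explicitly:

-- the partitioned parent itself cannot be published yet, so the
-- leaf partitions are listed individually
CREATE PUBLICATION pub_measurement
    FOR TABLE measurement_y2017m01, measurement_y2017m02;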

> I'm not sure but should adding/removing partitions after the fact cause
> their addition/removal from the publication (& subscription)? Maybe we'll
> discuss these issues later.

That's something I've also been wondering about, as there are many corner
cases here. For example, if a table is in some publication and is then
added as a partition to a partitioned table, what should happen? What
should happen when the partitioned table is then removed from the same
publication to which the partition was added explicitly? Should we allow
different publication configurations for different partitions within the
same partitioned table, etc.?

This brings up a somewhat bigger question about where we want to go with
partitioning in general. That is, should a partition be a separate entity
on the same level as a table, or should it be part of the partitioned
table without its own "identity"?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-27 Thread Andrew Dunstan


On 02/26/2017 04:09 PM, Andrew Dunstan wrote:
>
> On 02/26/2017 03:26 PM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> This works for the btree_gin case. However, there's a difficulty for
>>> btree_gist - in the picksplit routine the cmp function is passed to
>>> qsort() so there is no chance to pass it an flinfo to set up the call to
>>> the real comparison routine. Implementing a custom sort routine to work
>>> around the problem seems a bridge too far. I can't think of an
>>> alternative offhand.
>> We already have qsort_arg ... can't you change it to use that?
>>
>>  
>
> Yes, wasn't aware of that, that looks like exactly what I need. thanks.
>
>


OK, here's the whole series of patches.


Patch 1 adds the CallerFInfoFunctionCall{1,2} functions.

Patch 2 adds btree_gist support for their use for non-varlena types

Patch 3 does the same for varlena types (Not required for patch 4, but
better to be consistent, I think.)

Patch 4 adds enum support to btree_gist

Patch 5 adds enum support to btree_gin
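
As a quick illustration of what patch 4 enables, a sketch (names invented)
of an exclusion constraint over an enum column, which currently fails for
lack of a gist opclass on enums:

CREATE EXTENSION btree_gist;
CREATE TYPE room AS ENUM ('red', 'green', 'blue');
CREATE TABLE booking (
    which_room room,
    during     tsrange,
    EXCLUDE USING gist (which_room WITH =, during WITH &&)
);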

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 74ebe1490de916409458e3d1adf47dfa20b1fefb Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Mon, 27 Feb 2017 15:41:03 -0500
Subject: [PATCH 4/4] Add btree_gist support for enum types.

This will allow enums to be used in exclusion constraints.

The code uses the new CallerFInfoFunctionCall infrastructure in fmgr,
and the support for it added to btree_gist in a nearby patch.
---
 contrib/btree_gist/Makefile |   8 +-
 contrib/btree_gist/btree_enum.c | 187 +
 contrib/btree_gist/btree_gist--1.3--1.4.sql |  69 
 contrib/btree_gist/btree_gist.control   |   2 +-
 contrib/btree_gist/btree_gist.h |   3 +-
 contrib/btree_gist/btree_utils_num.c|   2 +
 contrib/btree_gist/data/enum.data   | 595 
 contrib/btree_gist/expected/enum.out|  91 +
 contrib/btree_gist/sql/enum.sql |  38 ++
 doc/src/sgml/btree-gist.sgml|   2 +-
 10 files changed, 991 insertions(+), 6 deletions(-)
 create mode 100644 contrib/btree_gist/btree_enum.c
 create mode 100644 contrib/btree_gist/btree_gist--1.3--1.4.sql
 create mode 100644 contrib/btree_gist/data/enum.data
 create mode 100644 contrib/btree_gist/expected/enum.out
 create mode 100644 contrib/btree_gist/sql/enum.sql

diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index d36f517..beb2ed7 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -6,16 +6,18 @@ OBJS =  btree_gist.o btree_utils_num.o btree_utils_var.o btree_int2.o \
 btree_int4.o btree_int8.o btree_float4.o btree_float8.o btree_cash.o \
 btree_oid.o btree_ts.o btree_time.o btree_date.o btree_interval.o \
 btree_macaddr.o btree_inet.o btree_text.o btree_bytea.o btree_bit.o \
-btree_numeric.o btree_uuid.o $(WIN32RES)
+btree_numeric.o btree_uuid.o btree_enum.o $(WIN32RES)
 
 EXTENSION = btree_gist
 DATA = btree_gist--unpackaged--1.0.sql btree_gist--1.0--1.1.sql \
-   btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql
+   btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
+	   btree_gist--1.3--1.4.sql
+
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
 time timetz date interval macaddr inet cidr text varchar char bytea \
-bit varbit numeric uuid not_equal
+bit varbit numeric uuid enum not_equal
 
 SHLIB_LINK += $(filter -lm, $(LIBS))
 
diff --git a/contrib/btree_gist/btree_enum.c b/contrib/btree_gist/btree_enum.c
new file mode 100644
index 000..5e46e78
--- /dev/null
+++ b/contrib/btree_gist/btree_enum.c
@@ -0,0 +1,187 @@
+/*
+ * contrib/btree_gist/btree_enum.c
+ */
+#include "postgres.h"
+#include "fmgr.h"
+#include "utils/builtins.h"
+
+#include "btree_gist.h"
+#include "btree_utils_num.h"
+
+/* enums are really Oids, so we just use the same structure */
+
+typedef struct
+{
+	Oid			lower;
+	Oid			upper;
+} oidKEY;
+
+/*
+** enum ops
+*/
+PG_FUNCTION_INFO_V1(gbt_enum_compress);
+PG_FUNCTION_INFO_V1(gbt_enum_fetch);
+PG_FUNCTION_INFO_V1(gbt_enum_union);
+PG_FUNCTION_INFO_V1(gbt_enum_picksplit);
+PG_FUNCTION_INFO_V1(gbt_enum_consistent);
+PG_FUNCTION_INFO_V1(gbt_enum_penalty);
+PG_FUNCTION_INFO_V1(gbt_enum_same);
+
+
+static bool
+gbt_enumgt(const void *a, const void *b, FmgrInfo *flinfo)
+{
+	return DatumGetBool(
+		CallerFInfoFunctionCall2(enum_gt, flinfo, InvalidOid, ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((const Oid *) b)))
+		);
+}
+static bool
+gbt_enumge(const void *a, const void *b, FmgrInfo *flinfo)
+{
+	return DatumGetBool(
+		CallerFInfoFunctionCall2(enum_ge, flinfo, InvalidOid, ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((

Re: [HACKERS] removing tsearch2

2017-02-27 Thread David E. Wheeler
On Feb 27, 2017, at 12:04 PM, Bruce Momjian  wrote:

> Just stating the obvious, but one of the reasons CPAN works so well is
> that most of the modules are written in Perl and hence don't need
> per-platform compilation.

There are a *lot* of C-based modules on CPAN; and my guess is that, more often
than not, Perl modules depend on other C-based modules.

I daresay a lot of PostgreSQL extensions can be written in pure SQL or PL/pgSQL.

Best,

David





Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-02-27 Thread Thomas Munro
On Thu, Feb 23, 2017 at 3:08 AM, Thom Brown  wrote:
> On 23 January 2017 at 11:56, Ivan Kartyshov  
> wrote:
>>
>> Thank you for reading, will be glad to get your feedback.
>
> Could you please rebase your patch as it no longer applies cleanly.

+1

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure

2017-02-27 Thread Tom Lane
Andrew Dunstan  writes:
> On 02/26/2017 02:54 PM, Tom Lane wrote:
>> * I'm not terribly comfortable about what the permissions levels of the
>> GUCs ought to be.

> plperl's on_plperl_init and on_plperlu_init settings are both SUSET.
> In practice with PLv8 this is usually set in the config file in my
> experience.

Ah, I'd forgotten about that precedent.  Being consistent with that seems
like a good thing --- and I agree with your point that this would likely
usually be set in postgresql.conf anyway, making the issue rather moot.

I noticed also that the precedent of plperl is that if the init code
fails, we give up on use of that interpreter, and try again to run
the init code if plperl is used again.  This is different from what
I had in my draft spec but it probably is a better definition; without
it, people might find themselves running in Tcl interpreters that do
not behave as intended.

In sum, then, PFA a patch that adds these GUCs.  They're still function
names but otherwise the details are closer to what plperl does.
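
To illustrate the intended usage, a sketch (the function name is invented)
following the semantics described above and in the docs below:

-- runs once per new Tcl interpreter, e.g. to load shared Tcl code
CREATE FUNCTION tcl_session_init() RETURNS void AS $$
    set ::session_ready 1
$$ LANGUAGE pltcl;

-- superuser-only; typically set in postgresql.conf instead
SET pltcl.start_proc = 'tcl_session_init';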

regards, tom lane

diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index 0a69380..ad216dd 100644
*** a/doc/src/sgml/pltcl.sgml
--- b/doc/src/sgml/pltcl.sgml
*** if {[catch { spi_exec $sql_command }]} {
*** 902,907 
--- 902,981 
  
 
  
+
+ PL/Tcl Configuration
+ 
+ 
+  This section lists configuration parameters that
+  affect PL/Tcl.
+ 
+ 
+ 
+ 
+  
+   
+pltcl.start_proc (string)
+
+ pltcl.start_proc configuration parameter
+
+   
+   
+
+ This parameter, if set to a nonempty string, specifies the name
+ (possibly schema-qualified) of a parameterless PL/Tcl function that
+ is to be executed whenever a new Tcl interpreter is created for
+ PL/Tcl.  Such a function can perform per-session initialization, such
+ as loading additional Tcl code.  A new Tcl interpreter is created
+ when a PL/Tcl function is first executed in a database session, or
+ when an additional interpreter has to be created because a PL/Tcl
+ function is called by a new SQL role.
+
+ 
+
+ The referenced function must be written in the pltcl
+ language, and must not be marked SECURITY DEFINER.
+ (These restrictions ensure that it runs in the interpreter it's
+ supposed to initialize.)  The current user must have permission to
+ call it, too.
+
+ 
+
+ If the function fails with an error it will abort the function call
+ that caused the new interpreter to be created and propagate out to
+ the calling query, causing the current transaction or subtransaction
+ to be aborted.  Any actions already done within Tcl won't be undone;
+ however, that interpreter won't be used again.  If the language is
+ used again the initialization will be attempted again within a fresh
+ Tcl interpreter.
+
+ 
+
+ Only superusers can change this setting.  Although this setting
+ can be changed within a session, such changes will not affect Tcl
+ interpreters that have already been created.
+
+   
+  
+ 
+  
+   
+pltclu.start_proc (string)
+
+ pltclu.start_proc configuration parameter
+
+   
+   
+
+ This parameter is exactly like pltcl.start_proc,
+ except that it applies to PL/TclU.  The referenced function must
+ be written in the pltclu language.
+
+   
+  
+ 
+ 
+
+ 
 
  Tcl Procedure Names
  
diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index 453e7ad..1096c4f 100644
*** a/src/pl/tcl/Makefile
--- b/src/pl/tcl/Makefile
*** DATA = pltcl.control pltcl--1.0.sql pltc
*** 28,34 
 pltclu.control pltclu--1.0.sql pltclu--unpackaged--1.0.sql
  
  REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=pltcl
! REGRESS = pltcl_setup pltcl_queries pltcl_unicode
  
  # Tcl on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
--- 28,34 
 pltclu.control pltclu--1.0.sql pltclu--unpackaged--1.0.sql
  
  REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=pltcl
! REGRESS = pltcl_setup pltcl_queries pltcl_start_proc pltcl_unicode
  
  # Tcl on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
diff --git a/src/pl/tcl/expected/pltcl_start_proc.out b/src/pl/tcl/expected/pltcl_start_proc.out
index ...0b8afb7 .
*** a/src/pl/tcl/expected/pltcl_start_proc.out
--- b/src/pl/tcl/expected/pltcl_start_proc.out
***
*** 0 
--- 1,28 
+ --
+ -- Test start_proc execution
+ --
+ SET pltcl.start_proc = 'no_such_function';
+ select tc

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-27 Thread Bruce Momjian
On Fri, Feb 17, 2017 at 11:05:31PM +0900, Michael Paquier wrote:
> On Fri, Feb 17, 2017 at 10:43 PM, Andreas Karlsson  wrote:
> > Thinking about this makes me wonder about why you decided to use a
> > transaction per index in many of the steps rather than a transaction per
> > step. Most steps should be quick. The only steps where I think it makes
> > sense to have a transaction per table are:
> 
> I don't recall all the details to be honest :)
> 
> > 1) When building indexes to avoid long running transactions.
> > 2) When validating the new indexes, also to avoid long running transactions.
> >
> > But when swapping the indexes or when dropping the old indexes I do not see
> > any reason to not just use one transaction per step since we do not even
> > have to wait for any locks (other than WaitForLockers which we just want to
> > call once anyway since all indexes relate to the same table).
> 
> Perhaps, this really needs a careful lookup.
> 
> By the way, as this patch is showing up for the first time in this
> development cycle, would it be allowed in the last commit fest? That's
> not a patch in the easy category, far from that, but it does not
> present a new concept.

FYI, the thread started on 2013-11-15.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] removing tsearch2

2017-02-27 Thread Bruce Momjian
On Fri, Feb 17, 2017 at 09:54:37AM +0100, Magnus Hagander wrote:
> If we could somehow integrate PGXN with both the RPM build process, the DEB
> build process and a Windows build process (whether driven by PGXN or just "fed
> enough data" by PGXN is a different question), I think that would go a long 
> way
> towards the goal. Also being able to use this somehow to drive continuous
> builds and tests (kind of like a buildfarm-lite for a subset of platforms)
> would be useful for reaching a point where extensions outside of core can come
> at least close to what we deliver in core.

Just stating the obvious, but one of the reasons CPAN works so well is
that most of the modules are written in Perl and hence don't need
per-platform compilation.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Partitioned tables and relfilenode

2017-02-27 Thread Bruce Momjian
On Fri, Feb 10, 2017 at 03:19:47PM +0900, Amit Langote wrote:
> The new partitioned tables do not contain any data by themselves.  Any
> data inserted into a partitioned table is routed to and stored in one of
> its partitions.  In fact, it is impossible to insert *any* data before a
> partition (to be precise, a leaf partition) is created.  It seems wasteful
> then to allocate physical storage (files) for partitioned tables.  If we
> do not allocate the storage, then we must make sure that the right thing
> happens when a command that is intended to manipulate a table's storage
> encounters a partitioned table, the "right thing" here being that the
> command's code either throws an error or warning (in some cases) if the
> specified table is a partitioned table or ignores any partitioned tables
> when it reads the list of relations to process from pg_class.  Commands
> that need to be taught about this are vacuum, analyze, truncate, and alter
> table.  Specifically:
> 
> - In case of vacuum, specifying a partitioned table causes a warning
> 
> - In case of analyze, we do not throw an error or warning but simply
>   avoid calling do_analyze_rel() *non-recursively*.  Further in
>   acquire_inherited_sample_rows(), any partitioned tables in the list
>   returned by find_all_inheritors() are skipped.
> 
> - In case of truncate, only the part which manipulates table's physical
>   storage is skipped for partitioned tables.
> 
> - ATRewriteTables() skips on the AlteredTableInfo entries for partitioned
>   tables, because there is nothing to be done.
> 
> - Since we cannot create indexes on partitioned tables anyway, there is
>   no need to handle cluster and reindex (they throw a meaningful error
>   already due to the lack of indexes.)

I don't think we are doing this, but if the parent table doesn't have a
physical file, pg_upgrade will need to be taught that.  We have that case
now for unlogged tables on standby servers that we need to address.
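
For concreteness, a sketch (hypothetical table) of the proposed vacuum
behavior described above:

CREATE TABLE measurement (logdate date NOT NULL)
    PARTITION BY RANGE (logdate);
-- with the patch this would emit a warning rather than do any work,
-- since the partitioned parent has no storage of its own
VACUUM measurement;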

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [JDBC] [HACKERS] PGSERVICEFILE as a connection string parameter

2017-02-27 Thread Dave Cramer
+Vladimir

On 27 February 2017 at 11:36, Andres Freund  wrote:

> On 2017-02-27 16:23:46 +0100, Magnus Hagander wrote:
> > On Mon, Feb 27, 2017 at 7:03 AM, Andres Freund 
> wrote:
> > > On 2017-02-27 14:43:49 +0900, Michael Paquier wrote:
> > > > I bumped into a case where it would have been rather useful to
> specify
> > > > a service file path in a connection string with a service name. In my
> > > > case, I have finished by setting up PGSERVICEFILE, but now like
> > > > PGPASSFILE I think that being able to define the service file
> > > > available as well as a connection parameter would be useful as well.
> > > >
> > > > I am not planning to work on that immediately (there is one day left
> > > > for the last CF of PG10!), but I was wondering if people would be
> > > > interested in something like that.
> > >
> > > Hm - I'm not sure that's a good idea. service files are a libpq
> feature,
> > > but connection strings are a bit more universal than just libpq...
> > >
> >
> > That same argument applies to PGPASSFILE, does it not?
>
> It does.  I'm not really convinced it's a good idea to have that as a
> full blown parameter, but as you say:
>
> > Properly implementing PGSERVICEFILE is more complicated though -- as it
> > requires LDAP support to go there the whole way for example.
>
>
>
> > But it might not hurt to encourage other drivers (such as jdbc) to
> > support at least the basic format of pgpass.
>
> Probably makes sense to bring in some of the external driver authors
> (jdbc, npgsql CCed).
>
Currently PGPASS is in the user's home directory. Many JDBC applications
live inside much larger apps such as Tomcat, et al., so
this concept is a bit foreign to JDBC. That being said, I don't think it's
difficult to implement, just somewhat harder to specify
for us. psql is rather limited, being a command line app which is *usually*
invoked directly from the command line.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-27 Thread Jeff Janes
On Sun, Feb 26, 2017 at 12:32 PM, Magnus Hagander 
wrote:

>
> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck 
> wrote:
>
>> Hi,
>>
>> Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
>> > On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> >  wrote:
>> > > I'd rather have a --quiet mode instead.  If you're running it by hand,
>> > > you're likely to omit the switch, whereas when writing the cron job
>> > > you're going to notice lack of switch even before you let the job run
>> > > once.
>> >
>> > Well, that might've been a better way to design it, but changing it
>> > now would break backward compatibility and I'm not really sure that's
>> > a good idea.  Even if it is, it's a separate concern from whether or
>> > not in the less-quiet mode we should point out that we're waiting for
>> > a checkpoint on the server side.
>>
>> ISTM the consensus is that there should be no output in regular mode,
>> but a message should be displayed in verbose and progress mode.
>>
>> So I went forth and also added a message in progress mode (unless
>> verbose messages are requested anyway).
>>
>> Regarding the documentation, I tried to clarify the difference between
>> the checkpoint types a bit more, but I think any further action is
>> probably a larger rewrite of the documentation on this topic.
>>
>> So attached are two patches, I've split it up in the documentation and
>> the code output part. I'll add it as one commitfest entry in the
>> "Clients" section though, as it's not really a big patch, unless
>> somebody thinks it should have a second entry in "Documentation"?
>
>
> Agreed, and applied as one patch. Except I noticed you also fixed a couple
> of entries which were missing the progname in the messages -- I broke those
> out to a separate patch instead.
>
> Made a small change to "using as much I/O as available" rather than "as
> possible", which I think is a better wording, along with the change of the
> idle wording I suggested before. (but feel free to point it out to me if
> that's wrong).
>

Should the below fprintf end in a \r rather than a \n, so that the
progress message gets overwritten once the checkpoint is done and we have
moved on?

if (showprogress && !verbose)
fprintf(stderr, "waiting for checkpoint\n");

That would seem more in keeping with how the other progress messages
operate.

Cheers,

Jeff


Re: [HACKERS] Poor memory context performance in large hash joins

2017-02-27 Thread Tomas Vondra

On 02/27/2017 12:55 PM, Andres Freund wrote:

On 2017-02-24 15:18:04 -0800, Andres Freund wrote:

On 2017-02-24 15:12:37 -0800, Andres Freund wrote:

On 2017-02-24 18:04:18 -0500, Tom Lane wrote:

Concretely, something like the attached.  This passes regression tests
but I've not pushed on it any harder than that.


Heh, I'd just gotten something that didn't immediately crash anymore ;)

Running your patch against Jeff's test-case, verified before that I
could easily reproduce the O(N^2) cost.


Oh, that didn't take as long as I was afraid (optimized/non-assert build):

postgres[26268][1]=# SET work_mem = '13GB';
SET
Time: 2.591 ms
postgres[26268][1]=# select count(*) from foobar2 where not exists (select 1 
from foobar t where t.titleid=foobar2.titleid);
Time: 268043.710 ms (04:28.044)


As another datapoint, I measured this patch against the problem from
https://www.postgresql.org/message-id/20170227111732.vrx5v72ighehw...@alap3.anarazel.de
(see top post in thread), and it indeed fixes the runtime issue -
there's still considerably higher memory usage and some runtime
overhead, but the quadratic behaviour is gone.

I think we should go forward with something like this patch in all
branches, and only use Tomas' patch in master, because they're
considerably larger.



So you've tried to switch hashjoin to the slab allocators? Or what have 
you compared?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Poor memory context performance in large hash joins

2017-02-27 Thread Andres Freund
On 2017-02-27 19:20:56 +0100, Tomas Vondra wrote:
> On 02/27/2017 12:55 PM, Andres Freund wrote:
> > On 2017-02-24 15:18:04 -0800, Andres Freund wrote:
> > > On 2017-02-24 15:12:37 -0800, Andres Freund wrote:
> > > > On 2017-02-24 18:04:18 -0500, Tom Lane wrote:
> > > > > Concretely, something like the attached.  This passes regression tests
> > > > > but I've not pushed on it any harder than that.
> > > >
> > > > Heh, I'd just gotten something that didn't immediately crash anymore ;)
> > > >
> > > > Running your patch against Jeff's test-case, verified before that I
> > > > could easily reproduce the O(N^2) cost.
> > >
> > > Oh, that didn't take as long as I was afraid (optimized/non-assert build):
> > >
> > > postgres[26268][1]=# SET work_mem = '13GB';
> > > SET
> > > Time: 2.591 ms
> > > postgres[26268][1]=# select count(*) from foobar2 where not exists 
> > > (select 1 from foobar t where t.titleid=foobar2.titleid);
> > > Time: 268043.710 ms (04:28.044)
> >
> > As another datapoint, I measured this patch against the problem from
> > https://www.postgresql.org/message-id/20170227111732.vrx5v72ighehw...@alap3.anarazel.de
> > (see top post in thread), and it indeed fixes the runtime issue -
> > there's still considerably higher memory usage and some runtime
> > overhead, but the quadratic behaviour is gone.
> >
> > I think we should go forward with something like this patch in all
> > branches, and only use Tomas' patch in master, because they're
> > considerably larger.
> >
>
> So you've tried to switch hashjoin to the slab allocators? Or what have you
> compared?

No, sorry for not being more explicit about this.  Meant that Tom's
patch addresses the performance issue in reorderbuffer.c to a good
degree (i.e. gets rid of the quadratic cost, even though constants are
higher than w/ your patches).  As the patch here is a lot smaller, it
seems like a better choice for the back-branches than backporting
slab.c/generation.c.

Andres




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 04:07 PM, Andres Freund wrote:



On February 27, 2017 6:14:20 AM PST, Tomas Vondra
 wrote:

On 02/27/2017 01:02 PM, Andres Freund wrote:

Hi,

On 2017-02-27 03:17:32 -0800, Andres Freund wrote:

I'll work on getting slab committed first, and then review /
edit / commit generation.c later.  One first note there is that
I'm

wondering

if generation.c is a too generic filename.


And pushed slab and its usage.  Will have a look at generation.c
tomorrow.

- Andres



Gah. I don't want to annoy the person who just committed my patch, but
can you give more time when asking for feedback? I mean, sending a
modified patch on Friday midnight and committing on Monday noon does not
really give much time to look at it.


Hm. The changes IMO weren't controversial (or surprising - most of
them I had announced previously); I announced when posting the review
that I'd push the patch later that weekend. If I hadn't been tired
after doing the review/editing I'd have just pushed right then and
there.  It's hard to find time and attention, so not introducing a
week of feedback time is quite worthwhile.  I listed the changes I
made primarily for posterity's sake.  Most if not all committers make
editorializing changes around commit, so that's not just me.

If you specifically want, I can try to give you more time to look at
an edited patch, but that'll mean things move slower.  I won't
promise not to make minor changes just before commit either way; I
always do another round of review just before pushing.



I also agree the changes are not particularly controversial, but then
why ask for comments at all? I'm OK with a committer making final
tweaks and pushing it without asking, but if you ask for comments, let's 
give people time to actually respond.


I agree introducing weeks of delays would be silly, but I'm not asking 
for that. I'm perfectly fine with two days for feedback, as long as it's 
not a weekend + half of Monday.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-27 Thread Simon Riggs
On 26 February 2017 at 20:55, Magnus Hagander  wrote:

> What do others think?

Changing the output behaviour of a command isn't something we usually
do as a backpatch.

This change doesn't affect the default behaviour so probably wouldn't
make a difference to the outcome of the situation that generated this
thread.

Having said that, if it helps others to avoid mistakes in the future
then it's worth doing, so +1 to backpatch.

I've looked into changing the actual underlying behaviour and I don't
think its feasible, so making this change will at least allow some
responsiveness from us. Thanks Michael, Magnus.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-27 Thread Peter Geoghegan
On Sat, Feb 25, 2017 at 10:51 PM, Robert Haas  wrote:
> The thing that strikes me based on what you wrote is that our page
> recycling seems to admit of long delays even as things stand.  There's
> no bound on how much time could pass between one index vacuum and the
> next, and RecentGlobalXmin could and probably usually will advance
> past the point that would allow recycling long before the next index
> vacuum cycle.

Agreed.

> I don't know whether that strengthens or weakens
> Simon's argument.

I think it weakens it, but I hesitate to take a firm position on it just yet.

> On the one hand, you could argue that if we're
> already doing this on a long delay, making it even longer probably
> won't hurt much.  On the other hand, you could argue that if the
> current situation is bad, we should at least avoid making it worse.  I
> don't know which of those arguments is correct, if either.

Unsure. I will point out:

* There probably is a big problem with B-Tree index bloat for certain
workloads ("sparse deletion patterns"), that interacts badly with less
frequent VACUUMing.

* Whatever the bloat this patch makes worse is not *that* bloat, at
least with the proposed default for vacuum_cleanup_index_scale_factor;
it's not the bloat we usually think of when we talk about index bloat.
A full index scan will not need to visit any of the dead pages, even
just to immediately skip over them. We just won't be able to recycle
them, which is another problem.

* The problem of not recycling as soon as we'd prefer can only happen
when everything else is, roughly speaking, going right. Which is still
pretty good.  (Again, remarks only apply when the default
vacuum_cleanup_index_scale_factor is used.)

* Roughly speaking, the recycling of disk blocks, and efficient use of
disk space more generally is not a priority for the implementation.
Nor should it be.

I tend to think of this recycling as being more about the worst case
for space utilization than about the average case. Kind of like the
fast root vs. true root thing prevents our needing to descend "skinny"
B-Trees from the true root, which can only really happen following
vast changes to the key space, which are always going to be painful.
These cases are a bit pathological.

For more on this recycling stuff, see section 2.5 of the Lanin and
Shasha paper, "Freeing empty nodes" [1]. It's describing what is
essentially the RecentGlobalXmin interlock, and I think you're right
to point out that that could stand to be a lot more aggressive, which
is maybe the real problem, if any. (The paper's remarks suggest we
could stand to be more eager about it.)

> Do you have an idea about that, or any ideas for experiments we could try?

Nothing occurs to me right now, unfortunately. However, my general
sense is that it would probably be just fine when
vacuum_cleanup_index_scale_factor was 0.0, but there might be
non-linear increases in "the serious type of index bloat" as the
proposed new setting was scaled up. I'd be much more worried about
that.
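
For reference, the knob under discussion would be used roughly like this
(a sketch; the name and semantics follow the proposal in this thread and
are not final):

-- 0.0 keeps the current behavior of always doing the cleanup scan;
-- larger values let vacuum skip index cleanup when few enough pages
-- have changed since the last cycle
SET vacuum_cleanup_index_scale_factor = 0.0;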

[1] https://archive.org/stream/symmetricconcurr00lani#page/6/mode/2up
-- 
Peter Geoghegan




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 06:40 PM, Andres Freund wrote:

On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote:

On 27/02/17 18:00, Andres Freund wrote:



FWIW I think the ppc64 machines are failing because of an unrelated issue
(changes to integer timestamps). We should probably look at 32bit machines
first.


Don't think so - termite is ppc64 afaics, and the failure doesn't look
integer timestamp related (assert failure is clearly about this, and set
of changed commits *only* include slab related commits).



termite is ppc64 but with 4 byte pointer size according to configure so
it might be related to that perhaps?


Uh, ok. I checked the --configure options, but not the actual configure
output (blame -ENOCOFEE and jetlag).  The output makes it fairly likely
that my StandardChunkHeader theory is valid, so I'll work on a patch to
clean that up.



Thanks. I set up an rpi3 machine (armv7l) that fails with the same issue,
so if you need to test the patch, let me know.


While building, I've also noticed a bunch of warnings about string
formatting; attached is a patch that fixes those.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index a5e140e..c673bc3 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -207,7 +207,7 @@ SlabContextCreate(MemoryContext parent,
 
 	/* Make sure the block can store at least one chunk. */
 	if (blockSize - sizeof(SlabBlock) < fullChunkSize)
-		elog(ERROR, "block size %ld for slab is too small for %ld chunks",
+		elog(ERROR, "block size %zu for slab is too small for %zu chunks",
 			 blockSize, chunkSize);
 
 	/* Compute maximum number of chunks per block */
@@ -333,7 +333,7 @@ SlabAlloc(MemoryContext context, Size size)
 
 	/* make sure we only allow correct request size */
 	if (size != slab->chunkSize)
-		elog(ERROR, "unexpected alloc chunk size %ld (expected %ld)",
+		elog(ERROR, "unexpected alloc chunk size %zu (expected %zu)",
 			 size, slab->chunkSize);
 
 	/*



Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread David Steele
On 2/27/17 12:40 PM, Robert Haas wrote:
> On Mon, Feb 27, 2017 at 11:09 PM, Robert Haas  wrote:
>> On Mon, Feb 27, 2017 at 9:39 PM, David Steele  wrote:
>>> I'm happy to be CFM.  Somehow I doubt there will be a lot of objections!
>>
>> No.  Aren't you the guy who did a good job with it last year and just
>> about perished in the process?

I won't deny that it's a lot of work, but I'm hoping my experience from
last year will make it a little easier.  We'll see.

> (Just to be clear, the "no" meant "no objections", not any other
> possible meaning.)

I didn't take it any other way.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread Andres Freund
On 2017-02-27 23:09:40 +0530, Robert Haas wrote:
> On Mon, Feb 27, 2017 at 9:39 PM, David Steele  wrote:
> > I'm happy to be CFM.  Somehow I doubt there will be a lot of objections!
> 
> No.  Aren't you the guy who did a good job with it last year and just
> about perished in the process?

Yea, I was wondering if that Davis S. guy from last year would object.

None here either.




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-27 12:27:48 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > The best theory I have so far that I have is that slab.c's idea of
> > StandardChunkHeader's size doesn't match what mcxt.c think it is
> > (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> > MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
> > quite see how that'd cause the issue, since StandardChunkHeader's size
> > should always be properly sized.
> 
> Uh, wrong.  On a 32-bit machine with debug enabled, StandardChunkHeader
> will contain 3 4-byte fields.  However, there are some such machines on
> which MAXALIGN is 8.  For example, looking at termite's configure
> output:
> 
> checking size of void *... 4
> checking size of size_t... 4
> checking size of long... 4
> checking alignment of short... 2
> checking alignment of int... 4
> checking alignment of long... 4
> checking alignment of long long int... 8
> checking alignment of double... 8
> 
> axolotl's output looks similar.  I expect my old HPPA dinosaur
> will show the failure as well.

Yea, I hadn't yet realized when writing that that termite actually,
despite running on ppc64, compiles a 32bit postgres.  Will thus
duplicate StandardChunkHeader's contents into slab.c :( - I don't see
an easy way around that...




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> The best theory I have so far is that slab.c's idea of
> StandardChunkHeader's size doesn't match what mcxt.c thinks it is
> (because slab.c simply embeds StandardChunkHeader, but mcxt.c uses
> MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
> quite see how that'd cause the issue, since StandardChunkHeader
> should always be properly sized.

Uh, wrong.  On a 32-bit machine with debug enabled, StandardChunkHeader
will contain 3 4-byte fields.  However, there are some such machines on
which MAXALIGN is 8.  For example, looking at termite's configure
output:

checking size of void *... 4
checking size of size_t... 4
checking size of long... 4
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 4
checking alignment of long long int... 8
checking alignment of double... 8

axolotl's output looks similar.  I expect my old HPPA dinosaur
will show the failure as well.

regards, tom lane
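
To make the arithmetic concrete, a minimal standalone sketch of the
mismatch (the macro and struct below are hand-written approximations,
not code lifted from the tree): with 32-bit pointers and a MAXALIGN of
8, sizeof the header is 12 while MAXALIGN(sizeof) is 16, so code that
embeds the struct directly and code that steps back
MAXALIGN(sizeof(StandardChunkHeader)) bytes from a chunk pointer
disagree about where the header lives.

#include <stdio.h>
#include <stddef.h>

#define MAXIMUM_ALIGNOF 8		/* assumed, per termite's configure output */
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((LEN) + ((ALIGNVAL) - 1)) & ~((ALIGNVAL) - 1))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/* approximates the debug-build header: three 4-byte fields when
 * pointers are 32 bits wide */
typedef struct ChunkHeaderish
{
	void	   *context;
	unsigned int size;
	unsigned int requested_size;
} ChunkHeaderish;

int
main(void)
{
	/* prints "12 vs 16" on such a machine; equal values elsewhere */
	printf("%zu vs %zu\n",
		   sizeof(ChunkHeaderish),
		   (size_t) MAXALIGN(sizeof(ChunkHeaderish)));
	return 0;
}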




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote:
> On 27/02/17 18:00, Andres Freund wrote:
> > 
> >> FWIW I think the ppc64 machines are failing because of an unrelated
> >> issue (changes to integer timestamps). We should probably look at the
> >> 32-bit machines first.
> > 
> > Don't think so - termite is ppc64 afaics, and the failure doesn't look
> > integer-timestamp related (the assert failure is clearly about this, and
> > the set of changed commits *only* includes slab-related commits).
> > 
> 
> termite is ppc64 but with a 4-byte pointer size according to configure,
> so perhaps it's related to that?

Uh, ok. I checked the configure options, but not the actual configure
output (blame -ENOCOFFEE and jetlag).  The output makes it fairly likely
that my StandardChunkHeader theory is valid, so I'll work on a patch to
clean that up.

Greetings,

Andres Freund




Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread Robert Haas
On Mon, Feb 27, 2017 at 11:09 PM, Robert Haas  wrote:
> On Mon, Feb 27, 2017 at 9:39 PM, David Steele  wrote:
>> I'm happy to be CFM.  Somehow I doubt there will be a lot of objections!
>
> No.  Aren't you the guy who did a good job with it last year and just
> about perished in the process?

(Just to be clear, the "no" meant "no objections", not any other
possible meaning.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-27 Thread Robert Haas
On Mon, Feb 27, 2017 at 9:39 PM, David Steele  wrote:
> I'm happy to be CFM.  Somehow I doubt there will be a lot of objections!

No.  Aren't you the guy who did a good job with it last year and just
about perished in the process?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-02-27 Thread Robert Haas
On Thu, Feb 16, 2017 at 8:16 PM, Amit Kapila  wrote:
> Attached are refactoring patches.  WAL patch needs some changes based
> on above comments, so will post it later.

After some study, I have committed 0001, and also committed 0002 and
0003 as a single commit, with only minor tweaks.

I haven't made a full study of 0004 yet, but I'm a bit concerned about
a couple of the hunks, specifically this one:

@@ -985,9 +962,9 @@ _hash_splitbucket_guts(Relation rel,
 
             if (PageGetFreeSpace(npage) < itemsz)
             {
-                /* write out nbuf and drop lock, but keep pin */
-                MarkBufferDirty(nbuf);
+                /* drop lock, but keep pin */
                 LockBuffer(nbuf, BUFFER_LOCK_UNLOCK);
+
                 /* chain to a new overflow page */
                 nbuf = _hash_addovflpage(rel, metabuf, nbuf,
                                          (nbuf == bucket_nbuf) ? true : false);
                 npage = BufferGetPage(nbuf);

And also this one:

@@ -1041,17 +1024,6 @@ _hash_splitbucket_guts(Relation rel,
      * To avoid deadlocks due to locking order of buckets, first lock the old
      * bucket and then the new bucket.
      */
-    if (nbuf == bucket_nbuf)
-    {
-        MarkBufferDirty(bucket_nbuf);
-        LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK);
-    }
-    else
-    {
-        MarkBufferDirty(nbuf);
-        _hash_relbuf(rel, nbuf);
-    }
-
     LockBuffer(bucket_obuf, BUFFER_LOCK_EXCLUSIVE);
     opage = BufferGetPage(bucket_obuf);
     oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);

I haven't quite grasped what's going on here, but it looks like those
MarkBufferDirty() calls didn't move someplace else, but rather just
vanished.  That would seem to be not good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
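
For context on why the vanished calls matter, here is a sketch of the
usual bufmgr protocol (a hypothetical backend-side helper, not code
from the patch): any page change made under the exclusive content lock
must be marked dirty before the lock is released, or the buffer can
later be evicted without the change ever reaching disk.

#include "postgres.h"
#include "storage/bufmgr.h"

static void
modify_page_sketch(Buffer buf)
{
	Page		page;

	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
	page = BufferGetPage(buf);

	/* ... modify the page contents here ... */
	(void) page;

	MarkBufferDirty(buf);		/* must happen while the lock is held */

	/* (for WAL-logged changes, XLogInsert() and PageSetLSN() go here) */

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);	/* keeps the pin */
}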



