Re: [HACKERS] Documentation improvements for partitioning

2017-03-02 Thread Amit Langote
On 2017/02/28 17:25, Simon Riggs wrote:
> On 28 February 2017 at 08:14, Amit Langote
>  wrote:
> 
>> OK.  So, I will start writing the patch with the above general skeleton and
>> hopefully post it within this week, and you can improve it as you see fit.
> 
> Will do, thanks.

Attached patch 0001 is what I managed so far.  Please take a look and let
me know if there is more I can do.  I guess you might want to expand the
parts related to UNION ALL views and BRIN indexes.

Also for consideration,

0002: some cosmetic fixes to create_table.sgml

0003: add clarification about NOT NULL constraint on partition columns in
  alter_table.sgml

Thoughts?

Thanks,
Amit
>From 97c104054310c9361623c182497d708ed65bf65f Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 3 Mar 2017 16:39:24 +0900
Subject: [PATCH 1/3] Rewrite sections in ddl.sgml related to partitioning

Merge sections Partitioned Tables and Partitioning into one section
called Table Partitioning and Related Solutions.
---
 doc/src/sgml/ddl.sgml | 1359 +
 1 file changed, 707 insertions(+), 652 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 09b5b3ff70..a2dd39df54 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2772,14 +2772,181 @@ VALUES ('Albany', NULL, NULL, 'NY');

   
 
-  
-   Partitioned Tables
+  
+   Table Partitioning and Related Solutions
+
+   
+partitioning
+   
+
+   
+table
+partitioning
+   
 

 partitioned table

 

+PostgreSQL supports basic table
+partitioning. This section describes why and how to implement
+partitioning as part of your database design.
+   
+
+   
+ Overview
+
+
+ Partitioning refers to splitting what is logically one large table into
+ smaller physical pieces.  Partitioning can provide several benefits:
+
+ 
+  
+   Query performance can be improved dramatically in certain situations,
+   particularly when most of the heavily accessed rows of the table are in a
+   single partition or a small number of partitions.  The partitioning
+   substitutes for leading columns of indexes, reducing index size and
+   making it more likely that the heavily-used parts of the indexes
+   fit in memory.
+  
+ 
+
+ 
+  
+   When queries or updates access a large percentage of a single
+   partition, performance can be improved by taking advantage
+   of sequential scan of that partition instead of using an
+   index and random access reads scattered across the whole table.
+  
+ 
+
+ 
+  
+   Bulk loads and deletes can be accomplished by adding or removing
+   partitions, if that requirement is planned into the partitioning design.
+   Doing ALTER TABLE DETACH PARTITION followed by
+   DROP TABLE is far faster than a bulk operation.  These
+   commands also entirely avoid the VACUUM overhead
+   caused by a bulk DELETE.
+  
+ 
+
+ 
+  
+   Seldom-used data can be migrated to cheaper and slower storage media.
+  
+ 
+
+
+ The benefits will normally be worthwhile only when a table would
+ otherwise be very large. The exact point at which a table will
+ benefit from partitioning depends on the application, although a
+ rule of thumb is that the size of the table should exceed the physical
+ memory of the database server.
+
+
+
+ The following forms of partitioning can be implemented in
+ PostgreSQL:
+
+ 
+  
+   Range Partitioning
+
+   
+
+ The table is partitioned into ranges defined
+ by a key column or set of columns, with no overlap between
+ the ranges of values assigned to different partitions.  For
+ example one might partition by date ranges, or by ranges of
+ identifiers for particular business objects.
+
+   
+  
+
+  
+   List Partitioning
+
+   
+
+ The table is partitioned by explicitly listing which key values
+ appear in each partition.
+
+   
+  
+ 
+
+
+
+ The following partitioning methods are currently supported:
+
+ 
+  
+   Declarative Partitioning
+
+   
+
+ One creates a partitioned table by specifying
+ the partitioning method and a set of columns as the partition key.
+ Partitions, which contain the actual data inserted
+ into the table, are created by specifying the subset of the data that
+ each partition accepts.
+
+   
+  
+
+  
+   Using inheritance
+
+   
+
+ Each partition must be created as a child table of a single parent
+ table.  The parent table itself is normally empty; it exists just to
+ represent the entire data set.
+
+   
+  
+
+  
+   Using UNION ALL 

Re: [HACKERS] Statement-level rollback

2017-03-02 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> 1. The argument for this is mostly, if not entirely, "application
> compatibility".  But it won't succeed at providing that if every BEGIN has
> to be spelled differently than it would be on other DBMSes.
> Therefore there is going to be enormous pressure to allow enabling the
> feature through a GUC, or some other environment-level way, and as soon
> as we do that we've lost.

I thought so, too.  I believe people who want to migrate from other DBMSs would 
set the GUC in postgresql.conf, or with ALTER DATABASE/USER just for 
applications which are difficult to modify.

> 2. The proposed feature would affect the internal operation of PL functions,
> so that those would need to become bulletproof against being invoked in
> either operating environment.  Likewise, all sorts of intermediate tools
> like connection poolers would no doubt be broken if they don't know about
> this and support both modes.  (We would have to start by fixing postgres_fdw
> and dblink, for instance.)

Yes, I'm going to modify the PLs' behavior.  I'll also check dblink and 
postgres_fdw.  In addition, I'll have a quick look at the code of 
pgpool-II and pgBouncer to see how they depend on the transaction state.  I'll 
run the regression tests of the contrib modules, pgpool-II and pgBouncer with 
default_transaction_rollback_scope set to 'statement'.

But I don't see how statement-level rollback would badly affect features 
other than the PLs.  I think the only thing relevant to those client-side programs 
is whether the transaction is still running, which is returned with 
ReadyForQuery.  Both statement-level rollback and the traditional behavior 
leave the transaction running when an SQL statement fails.  Server-side 
autocommit differs in that respect.

Regards
Takayuki Tsunakawa








Re: [HACKERS] Statement-level rollback

2017-03-02 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut
> On 2/28/17 02:39, Tsunakawa, Takayuki wrote:
> > I'd like to propose statement-level rollback feature.  To repeat myself,
> this is requested for users to migrate from other DBMSs to PostgreSQL.  They
> expect that a failure of one SQL statement should not abort the entire
> transaction and their apps (client programs and stored procedures) can
> continue the transaction with a different SQL statement.
> 
> Can you provide some references on how other systems provide this feature?

Oracle doesn't.

SQL Server provides it like this:

SET XACT_ABORT
https://msdn.microsoft.com/en-us/library/ms188792.aspx

MySQL doesn't.  BTW, MySQL enables changing the autocommit mode with a SET statement:

16.5.2.2 autocommit, Commit, and Rollback
https://dev.mysql.com/doc/refman/8.0/en/innodb-autocommit-commit-rollback.html



And above all, I've found that EnterpriseDB supports statement-level rollback with 
a GUC!  So PostgreSQL should be able to do so as well.

https://www.edbpostgres.com/docs/en/9.6/asguide/EDB_Postgres_Advanced_Server_Guide.1.17.html#pID0E0QUD0HA


If edb_stmt_level_tx is set to TRUE, then an exception will not automatically roll 
back prior uncommitted database updates. If edb_stmt_level_tx is set to FALSE, 
then an exception will roll back uncommitted database updates.

Note: Use edb_stmt_level_tx set to TRUE only when absolutely necessary, as this 
may cause a negative performance impact.



Regards
Takayuki Tsunakawa





Re: [HACKERS] allow referring to functions without arguments when unique

2017-03-02 Thread Michael Paquier
On Wed, Mar 1, 2017 at 11:50 AM, Peter Eisentraut
 wrote:
> This is the "grand finale" that goes on top of the "DROP FUNCTION of
> multiple functions" patch set.  The purpose is to allow referring to
> functions without having to spell out the argument list, when the
> function name is unique.  This is especially useful when having to
> operate on "business logic" functions with many many arguments.  It's an
> SQL standard feature, and it applies everywhere a function is referred
> to in the grammar.  We already have the lookup logic for the regproc
> type, and thanks to the grand refactoring of the parser representation
> of functions, this is quite a small patch.  There is a bit of
> reduce/reduce parser mystery, to keep the reviewer entertained.

... Which would be nice.

> (The
> equivalent could be implemented for aggregates and operators, but I
> haven't done that here yet.)

OK. After a lookup, I am just seeing opfamily and opclass missing, so
this patch is doing it as you describe.

I have read through the code once, still I am waiting for the DROP
FUNCTION patches to be committed before doing a real hands-on.

@@ -7198,6 +7198,33 @@ function_with_argtypes:
n->objargs = extractArgTypes($2);
$$ = n;
}
This may not have arguments listed, so is function_with_argtypes really appropriate here?

+   /*
+* If no arguments were specified, the name must yield a unique candidate.
+*/
+   if (nargs == -1 && clist)
+   {
+   if (clist->next)
+   ereport(ERROR,
I would have used list_length here for clarity.

--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -1914,6 +1914,21 @@ LookupFuncName(List *funcname, int nargs, const
Oid *argtypes, bool noError)
The comment at the top of LookupFuncName() needs a refresh. The caller
can now also just use a function name without arguments.
-- 
Michael




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-02 Thread Corey Huinker
On Fri, Mar 3, 2017 at 1:25 AM, Fabien COELHO  wrote:

>
>
> For endif, I really exaggerated, "switch { default: " is too much, please
>> accept my apology. Maybe just do the pop & error reporting?
>>
>
It seemed like overkill, but I decided to roll with it.


>
> Or maybe be more explicit:
>
>   switch (current state)
>   case NONE:
>  error no matching if;
>   case ELSE_FALSE:
>   case ELSE_TRUE:
>   case ...:
>  pop;
>  Assert(success);


the pop() function tests for an empty stack, so this switch is
double-testing, but it's also no big deal, so here you go...
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae58708..9d245a9 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2035,6 +2035,83 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+-- set ON_ERROR_STOP in case the variables are not valid boolean expressions
+\set ON_ERROR_STOP on
+-- check for the existence of two separate records in the database and store
+-- the results in separate psql variables
+SELECT
+EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,
+EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee
+\gset
+\if :is_customer
+SELECT * FROM customer WHERE customer_id = 123;
+\elif :is_employee
+\echo 'is not a customer but is an employee'
+SELECT * FROM employee WHERE employee_id = 456;
+\else
+\if yes
+\echo 'not a customer or employee'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with an \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached in the main file or an
+\include-ed file before all local
+\if-\endif pairs are matched, then
+psql will raise an error.
+
+
+A conditional block can have any number of 
+\elif clauses, which may optionally be followed by a
+single \else clause.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to a single unquoted string,
+which is evaluated like other on/off options, so the valid values
+are any unambiguous case-insensitive matches for one of:
+true, false, 
1,
+0, on, off,
+yes, no.  So 
+t, T, and tR
+will all match true.
+
+Expressions that do not properly evaluate to true or false will
+generate an error and cause the \if or
+\elif command to fail.  Because that behavior may
+change branching context in undesirable ways (executing code which
+was intended to be skipped, causing \elif,
+\else, and \endif commands to
+pair with the wrong \if, etc), it is
+recommended that scripts which use conditionals also set
+ON_ERROR_STOP.
+
+
+Lines within false branches are parsed normally for query/command
+completion, but any queries are not sent to the server, and any
+commands other than conditionals (\if,
+\elif, \else,
+\endif) are ignored. Conditional commands are
+still checked for valid nesting.
+
+
+  
 
   
 \ir or \include_relative 
filename
@@ -3703,8 +3780,9 @@ testdb= INSERT INTO my_table VALUES 
(:'content');
 
 
 In prompt 1 normally =,
-but ^ if in single-line mode,
-or ! if the session is disconnected from the
+but @ if the session is in a false conditional
+block, or ^ if in single-line mode, or
+! if the session is disconnected from the
 database (which can happen if \connect fails).
 In prompt 2 %R is replaced by a character that
 depends on why psql expects more input:
diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c53733f..1492b66 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -21,7 +21,7 @@ REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
 override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq
 
-OBJS=  command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
+OBJS=  command.o common.o conditional.o help.o input.o stringutils.o 
mainloop.o copy.o \
startup.o prompt.o variables.o large_obj.o describe.o \
crosstabview.o tab-complete.o \
sql_help.o psqlscanslash.o \
@@ -61,8 +61,16 @@ uninstall:
 
 clean distclean:
rm -f psql$(X) $(OBJS) lex.backup
+   rm -rf tmp_check
 
 # files removed here are supposed to be in the distribution tarball,
 # so do not 

Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-03-02 Thread Michael Paquier
On Wed, Feb 22, 2017 at 9:23 PM, Bernd Helmle  wrote:
> The comment in the code says explicitly to add the base directory to
> the end of the list, not sure if that is out of a certain reason.
>
> I'd say this is an oversight in the implementation. I'm currently
> working on a tool using the streaming protocol directly, and I've
> understood it exactly that way: the default tablespace is the first
> one in the stream.
>
> So +1 for the patch.

Commit 507069de has switched the main directory from the beginning to
the end of the list, and the thread about this commit is here:
https://www.postgresql.org/message-id/AANLkTikgmZRkBuQ%2B_hcwPBv7Cd7xW48Ev%3DUBHA-k4v0W%40mail.gmail.com

+   /* Add a node for the base directory at the beginning.  This way, the
+* backup_label file is always the first file to be sent. */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = opt->progress ? sendDir(".", 1, true, tablespaces,
true) : -1;
-   tablespaces = lappend(tablespaces, ti);
+   tablespaces = lcons(ti, tablespaces);
So, the main directory is located at the end on purpose. When using
--wal-method=fetch the WAL segments are part of the main tarball, so
if you send the main tarball first you would need to generate a second
tarball with the WAL segments generated between the moment the main
tarball finished and the end of the last tablespace, if you want a
consistent backup. Your patch would work with the stream mode though.
-- 
Michael




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-02 Thread Fabien COELHO



For endif, I really exaggerated, "switch { default: " is too much, please 
accept my apology. Maybe just do the pop & error reporting?


Or maybe be more explicit:

  switch (current state)
  case NONE:
 error no matching if;
  case ELSE_FALSE:
  case ELSE_TRUE:
  case ...:
 pop;
 Assert(success);

--
Fabien.




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-03-02 Thread Michael Paquier
On Fri, Mar 3, 2017 at 3:18 PM, Robert Haas  wrote:
> Hopefully I haven't broken anything; please let me know if you
> encounter any issues.

Reading what has just been committed...

+   /*
+* No space found, file content is corrupted.  Return NULL to the
+* caller and inform him on the situation.
+*/
+   elog(ERROR,
+"missing space character in \"%s\"", LOG_METAINFO_DATAFILE);
+   break;
There is no need to issue a break after an elog(ERROR).

+* No newlinei found, file content is corrupted.  Return NULL to
s/newlinei/newline/
-- 
Michael




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-03-02 Thread Robert Haas
On Mon, Feb 20, 2017 at 4:37 AM, Gilles Darold  wrote:
>> I think it's sufficient to just remove the file once on postmaster
>> startup before trying to launch the syslogger for the first time.
>> logging_collector is PGC_POSTMASTER, so if it's not turned on when the
>> cluster first starts, it can't be activated later.  If it dies, that
>> doesn't seem like a reason to remove the file.  We're going to restart
>> the syslogger, and when we do, it can update the file.
>>
>
> I've attached a new full v30 patch that include last patch from Karl.
> Now the current_logfile file is removed only at postmaster startup, just
> before launching the syslogger for the first time.

Committed with some changes.

- Added missing REVOKE for pg_current_logfile(text).
- Added error checks for unlink() and logfile_open().
- Changed error messages to use standard wording (among other reasons,
this helps minimize translation work).
- Removed update_metainfo_datafile() call that ran in the postmaster
and added a similar call that runs in the syslogger.
- doc: Removed an index entry that had no existing parallel.

Hopefully I haven't broken anything; please let me know if you
encounter any issues.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-02 Thread Fabien COELHO


Hello Corey,


v20: attempt at implementing the switch-on-all-states style.


For the elif I think it is both simpler and better like that. Whether the 
committer will agree is an unknown, as always.


For endif, I really exaggerated, "switch { default: " is too much, please 
accept my apology. Maybe just do the pop & error reporting?


For if, the evaluation & error could be moved before the switch, which may 
contain only the new state setting decision, and the push after the 
switch? Also, I would suggest using default only to detect an unexpected 
state error, and listing all other states explicitly.


--
Fabien.




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-03-02 Thread Kuntal Ghosh
On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas  wrote:
> On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund  wrote:
>> the resulting hash-values aren't actually meaningfully influenced by the
>> IV. Because we just xor with the IV, most hash-values that without the IV
>> would have fallen into a single hash-bucket fall into a single
>> hash-bucket afterwards as well; just somewhere else in the hash-range.
>
> Wow, OK.  I had kind of assumed (without looking) that setting the
> hash IV did something a little more useful than that.  Maybe we should
> do something like struct blah { int iv; int hv; }; newhv =
> hash_any(, sizeof(blah)).
>
Sounds good. I've seen a post from Thomas Munro suggesting an
alternative approach for combining hash values in execGrouping.c [1].

>> In addition to that it seems quite worthwhile to provide an iterator
>> that's not vulnerable to this.  An approach that I am, seemingly
>> successfully, testing is to iterate the hashtable in multiple (in my
>> case 23, because why not) passes, accessing only every nth element. That
>> allows the data to be inserted in a lot less "dense" fashion.  But
>> that's more an optimization, so I'll just push something like the patch
>> mentioned in the thread already.
>>
>> Makes some sense?
>
> Yep.
>
Yes, it makes sense. Quadratic probing is another approach, but it
would require an extra shift op every time we want to find the next or
prev location during a collision.

[1] 
https://www.postgresql.org/message-id/caeepm%3d3rdgjfxw4ckvj0oemya2-34b0qhng1xv0vk7tgpjg...@mail.gmail.com
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] SCRAM authentication, take three

2017-03-02 Thread Michael Paquier
On Thu, Mar 2, 2017 at 9:13 PM, Kyotaro HORIGUCHI
 wrote:
> I'm studying the normalization of Unicode, so I apologize for possible
> stupidity in advance.
>
> I looked into this and have some comments. Sorry for the random
> order.

Thanks, this needs a lot of background and I am glad that somebody is
taking the time to look at what I am doing here.

> 
> Composition version should be written some where.

Sure.

> 
> Perhaps one of the most important things is that the code exactly
> reflects the TR. pg_utf8_check_string returns true for ASCII
> strings but the TR says that
>
> | Text containing only ASCII characters (U+ to U+007F) is left
> | unaffected by all of the normalization forms. This is
> | particularly important for programming languages
>
> And running SASLprepare for an apparently-ASCII string (which is the
> most common case) is a waste of CPU cycles.

Yeah, that's true. We could, for example, just check in
pg_utf8_check_string() if the length gathered matches strlen(source),
as only ASCII characters are 1 byte long.
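
As a rough illustration of that idea (the function name and placement here
are an assumption, not the actual patch), such a fast path could look like:

/*
 * Hedged sketch: a valid UTF-8 string in which every byte is below 0x80 is
 * pure ASCII, and ASCII text is unaffected by all Unicode normalization
 * forms, so SASLprep could be skipped for it.
 */
static bool
string_is_ascii_only(const char *source)
{
	const unsigned char *p = (const unsigned char *) source;

	while (*p != '\0')
	{
		if (*p >= 0x80)			/* high bit set => multibyte UTF-8 sequence */
			return false;
		p++;
	}
	return true;
}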

> 
> From the point of view of reflectivity (please, someone teach me an
> appropriate wording for this..), basically the code had better be
> a copy of the reference code as long as no performance
> degradation occurs. The Hangul part of get_decomposed_size (and
> decompose_code, recompose_code) uses different naming from the
> TR. hindex should be sindex and t should be tindex. Magic numbers
> should have the names used in the TR.
>
> * A bit later, I noticed that these are copies of charlint. If so,
>   I want a description about that.

Yeah, their stuff works quite nicely.

> 
> The following comment is equivalent to "reordering in canonical
> order". But the definition of "decomposition" includes this
> step. (I'm not sure if it needs rewriting, since I actually
> could understand that.)
>
>> /*
>>  * Now that the decomposition is done, apply the combining class
>>  * for each multibyte character.
>>  */

I have reworked this one a bit:
/*
-* Now that the decomposition is done, apply the combining class
-* for each multibyte character.
+* Now end the decomposition by applying the combining class for
+* each multibyte character.
 */

> 
>>* make the allocation of the recomposed string based on that assumption.
>>*/
>>   recomp_chars = (pg_wchar *) malloc(decomp_size * sizeof(int));
>>   lastClass = -1; /* this eliminates a special check */
>
> utf_sasl_prepare uses variable names with two naming
> conventions. Is there any reason for the two?

OK, did some adjustments here.

> 
>> starterCh = recomp_chars[0] = decomp_chars[0];
>
> starterCh reads as "starter channel"; why not "starter_char"?

Starter character of the current set, which is a character with a
combining class of 0.

> 
>>   else if (start >= SBASE && start < (SBASE + SCOUNT) &&
>>((start - SBASE) % TCOUNT) == 0 &&
>>code >= TBASE && code <= (TBASE + TCOUNT))
>
> "code <= (TBASE + TCOUNT)" somewhat smells. Then I found the
> original code for the current implementation in charlint and it
> seems correct to me. Some description about the difference
> between them is needed.

Right. I have updated all those things to use constants instead of
hardcoded values.

> 
> In use_sasl_prepare, the recompose part is a copy of charlint,
> but it seems a bit inefficient. It calls recompose_code
> unconditionally, but it is required only for the case of
> "lastClass < chClass".
>
> Something like this. (This still calls recompose_code for the
> case where ch is at the same position as starterChar, so there is
> still room for more improvement.)

Agreed.

> 
> If I read the TR correctly, "6 Composition Exclusion Table" says
> that there are some characters not to be composed. But I don't find
> the corresponding code (or comments).

Ah, right! There are 100 characters that fall into this category, and
all of them have a combining class of 0, so it is as simple as
removing them from the generated tables.

I am attaching 0009 and 0010 that address those problems (pushed on
github as well) that can be applied on top of the latest set.
-- 
Michael


0009-Set-of-fixes-for-SASLprep.patch
Description: Binary data


0010-Consider-characters-excluded-from-composition-in-con.patch
Description: Binary data



Re: [HACKERS] Print correct startup cost for the group aggregate.

2017-03-02 Thread Ashutosh Bapat
On Thu, Mar 2, 2017 at 6:48 PM, Ashutosh Bapat
 wrote:
> On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia  
> wrote:
>> Hi,
>>
>> While reading through cost_agg() I found that the startup cost for the
>> group aggregate is not correctly assigned. Due to this, the explain plan is
>> not printing the correct startup cost.
>>
>> Without patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>  QUERY PLAN
>> -
>>  GroupAggregate  (cost=81634.33..85102.04 rows=198155 width=12)
>>Group Key: aid
>>->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>  Sort Key: aid
>>  ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
>> width=8)
>>Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>> With patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>  QUERY PLAN
>> -
>>  GroupAggregate  (cost=82129.72..85102.04 rows=198155 width=12)
>>Group Key: aid
>>->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>  Sort Key: aid
>>  ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
>> width=8)
>>Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>
> The reason why startup_cost = input_startup_cost and not
> input_total_cost for aggregation by sorting is that we don't need the whole
> input before the Group/Agg plan can produce the first row. But I think
> setting startup_cost = input_startup_cost is also not exactly correct.
> Before the plan can produce one row, it has to transit through all the
> rows belonging to the group to which the first row belongs. On an
> average it has to scan (total number of rows)/(number of groups) rows
> before producing the first aggregated row. startup_cost will be
> input_startup_cost + cost to scan (total number of rows)/(number of
> groups) rows + cost of transiting over those many rows. Total cost =
> startup_cost + cost of scanning and transiting through the remaining
> number of input rows.
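
For reference, a rough sketch of that startup-cost idea (illustrative only;
the names are assumptions and this is not the existing cost_agg() code, and
it assumes input_tuples and num_groups are both greater than zero):

/*
 * Hedged sketch: charge for reading and transitioning over one group's
 * worth of input rows before the first aggregated row can be emitted.
 */
static void
sketch_sorted_agg_cost(double input_tuples, double num_groups,
					   Cost input_startup_cost, Cost input_total_cost,
					   Cost trans_cost_per_tuple,
					   Cost *startup_cost, Cost *total_cost)
{
	double		rows_per_group = input_tuples / num_groups;
	Cost		run_cost_per_tuple =
		(input_total_cost - input_startup_cost) / input_tuples;

	/* scan and transit over the first group's rows */
	*startup_cost = input_startup_cost +
		rows_per_group * (run_cost_per_tuple + trans_cost_per_tuple);

	/* then scan and transit through the remaining input rows */
	*total_cost = *startup_cost +
		(input_tuples - rows_per_group) *
		(run_cost_per_tuple + trans_cost_per_tuple);
}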

The comment below from cost_agg() may be the reason why we don't
bother including the cost of aggregating the first group in the startup cost.
 * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly the
 * same total CPU cost, but AGG_SORTED has lower startup cost.  If the
 * input path is already sorted appropriately, AGG_SORTED should be
 * preferred (since it has no risk of memory overflow).  This will happen
 * as long as the computed total costs are indeed exactly equal --- but if
 * there's roundoff error we might do the wrong thing.  So be sure that
 * the computations below form the same intermediate values in the same
 * order.

Particularly the last line implies that if we compute intermediate
values differently for the hash and sort aggregates, we have a risk of
floating-point rounding errors. Including the cost of computing the first
group in the startup cost does exactly that. Probably having a consistent
total cost is more important than having a correct startup cost, as
long as it's less than the hash aggregate's when the input is sorted per
the grouping key.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-02 Thread Amit Kapila
On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh
 wrote:
> Hello everyone,
>
> I've attached a patch which implements WAL consistency checking for
> hash indexes. This feature is going to be useful for developing and
> testing of WAL logging for hash index.
>

I think it is better if you base your patch on (Microvacuum support
for hash index - https://commitfest.postgresql.org/13/835/).

1.
There are some hints used in that patch which we might want to mask.
For example, I think you need to take care of the Dead marking at the page
level.  Refer to the code below from the patch "Microvacuum support for hash index".
+ if (killedsomething)
+ {
+ opaque->hasho_flag |= LH_PAGE_HAS_DEAD_TUPLES;


2.
+ else if ((opaque->hasho_flag & LH_BUCKET_PAGE) ||
+ (opaque->hasho_flag & LH_OVERFLOW_PAGE))
+ {
+ /*
+ * In btree bucket and overflow pages, it is possible to modify the
+ * LP_FLAGS without emitting any WAL record. Hence, mask the line
+ * pointer flags.
+ * See hashgettuple() for details.
+ */
+ mask_lp_flags(page);
+ }

Again, this mechanism is also modified by the patch "Microvacuum support
for hash index", so the above changes need to be adjusted accordingly.
The comment referring to btree is wrong; it needs to refer to hash.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-02 Thread Robert Haas
On Mon, Feb 20, 2017 at 9:43 PM, Tomas Vondra
 wrote:
> BTW I've noticed the pageinspect version is 1.6, but we only have
> pageinspect--1.5.sql (and upgrade script to 1.6). Not sure that's entirely
> intentional?

Actually, that's the New Way.  See 40b449ae84dcf71177d7749a7b0c582b64dc15f0.

+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_items_bytea(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);

Not needed.  PG_FUNCTION_INFO_V1 now does it.
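
For illustration (a minimal sketch, not part of the patch under review),
PG_FUNCTION_INFO_V1 nowadays emits the extern Datum declaration itself, so a
function only needs:

/* The macro already declares "extern Datum bt_page_items_bytea(PG_FUNCTION_ARGS);",
 * making a separate extern declaration redundant. */
PG_FUNCTION_INFO_V1(bt_page_items_bytea);

Datum
bt_page_items_bytea(PG_FUNCTION_ARGS)
{
	/* ... body elided ... */
	PG_RETURN_NULL();
}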

-values[j++] = psprintf("%d", stat.blkno);
-values[j++] = psprintf("%c", stat.type);
-values[j++] = psprintf("%d", stat.live_items);
-values[j++] = psprintf("%d", stat.dead_items);
-values[j++] = psprintf("%d", stat.avg_item_size);
-values[j++] = psprintf("%d", stat.page_size);
-values[j++] = psprintf("%d", stat.free_size);
-values[j++] = psprintf("%d", stat.btpo_prev);
-values[j++] = psprintf("%d", stat.btpo_next);
-values[j++] = psprintf("%d", (stat.type == 'd') ? stat.btpo.xact : stat.btp
o.level);
-values[j++] = psprintf("%d", stat.btpo_flags);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.blkno);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%c", stat.type);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.live_items);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.dead_items);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.avg_item_size);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.page_size);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.free_size);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.btpo_prev);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.btpo_next);
+values[j] = palloc(32);
+if (stat.type == 'd')
+snprintf(values[j++], 32, "%d", stat.btpo.xact);
+else
+snprintf(values[j++], 32, "%d", stat.btpo.level);
+values[j] = palloc(32);
+snprintf(values[j++], 32, "%d", stat.btpo_flags);

This does not seem like a good idea in any way, and the patch has
several instances of it.

I don't object to the concept, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Instability in select_parallel regression test

2017-03-02 Thread Robert Haas
On Mon, Feb 27, 2017 at 8:07 AM, Amit Kapila  wrote:
>> My guess is that if we apply the fix I suggested above, it'll be good
>> enough.  If that turns out not to be true, then I guess we'll have to
>> deal with that, but why not do the easy thing first?
>
> Okay, that is also a sensible approach.  Your patch looks good to me,
> though I have not tested it.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] error detail when partition not found

2017-03-02 Thread Amit Langote
On 2017/03/03 12:43, Robert Haas wrote:
> On Mon, Feb 27, 2017 at 9:54 AM, Amit Langote
>  wrote:
>> Updated patch is attached.
> 
> Committed with one grammatical change to the comments.

Thanks.  I've marked this as fixed on the open item wiki page.

Thanks,
Amit






Re: [HACKERS] Radix tree for character conversion

2017-03-02 Thread Michael Paquier
On Thu, Mar 2, 2017 at 2:20 PM, Kyotaro HORIGUCHI
 wrote:
> 5) Just remove plain map files and all related code. In addition to
>that, the Makefile stores a hash digest of the authority files in
>Unicode/authority_hashes.txt or something that is managed by
>git.

That may be an idea to check for differences across upstream versions.
But that sounds like a separate discussion to me.

> This digest may differ among platforms (typically from cr/nl
> difference) but we can assume *nix for the usage.
>
> I will send the next version after this discussion is settled.

Sure. There is not much point in moving on without Heikki's opinion at
least, or that of anybody else like Ishii-san or Tom who is familiar with
this code. I would think that Heikki would be the committer to pick up
this change though.
-- 
Michael




Re: [HACKERS] error detail when partition not found

2017-03-02 Thread Robert Haas
On Mon, Feb 27, 2017 at 9:54 AM, Amit Langote
 wrote:
> Updated patch is attached.

Committed with one grammatical change to the comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-03-02 Thread Robert Haas
On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund  wrote:
> the resulting hash-values aren't actually meaningfully influenced by the
> IV. Because we just xor with the IV, most hash-values that without the IV
> would have fallen into a single hash-bucket fall into a single
> hash-bucket afterwards as well; just somewhere else in the hash-range.

Wow, OK.  I had kind of assumed (without looking) that setting the
hash IV did something a little more useful than that.  Maybe we should
do something like struct blah { int iv; int hv; }; newhv =
hash_any(, sizeof(blah)).
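
A minimal sketch of that suggestion (the names and the use of hash_any()
here are assumptions for illustration, not an actual patch) might be:

/*
 * Instead of xor-ing the IV into an already-computed hash value, feed the
 * IV and the value through the hash function together, so the IV really
 * perturbs which bucket a value lands in.
 */
typedef struct blah
{
	uint32		iv;				/* per-batch/iteration initialization value */
	uint32		hv;				/* original hash value of the key */
} blah;

static uint32
combine_hash_with_iv(uint32 hashvalue, uint32 iv)
{
	blah		b;

	b.iv = iv;
	b.hv = hashvalue;
	return DatumGetUInt32(hash_any((const unsigned char *) &b, sizeof(b)));
}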

> In addition to that it seems quite worthwhile to provide an iterator
> that's not vulnerable to this.  An approach that I am, seemingly
> successfully, testing is to iterate the hashtable in multiple (in my
> case 23, because why not) passes, accessing only every nth element. That
> allows the data to be inserted in a lot less "dense" fashion.  But
> that's more an optimization, so I'll just push something like the patch
> mentioned in the thread already.
>
> Makes some sense?

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-02 Thread Andreas Karlsson

On 03/02/2017 02:25 AM, Jim Nasby wrote:

On 2/28/17 11:21 AM, Andreas Karlsson wrote:

The only downside I can see to this approach is that we will no longer be
able to reindex catalog tables concurrently, but in return it should be
easier to confirm that this approach can be made to work.


Another downside is any stored regclass fields will become invalid.
Admittedly that's a pretty unusual use case, but it'd be nice if there
was at least a way to let users fix things during the rename phase
(perhaps via an event trigger).


Good point, but I agree with Andres here. Having REINDEX CONCURRENTLY 
issue event triggers seems strange to me. While it does create and drop 
indexes as part of its implementation, it is actually just an index 
maintenance job.


Andreas




Re: [HACKERS] Adding support for Default partition in partitioning

2017-03-02 Thread Robert Haas
On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed  wrote:
> 3. Handling adding a new partition to a partitioned table
>with default partition.
>This will require moving tuples from existing default partition to
>   newly created partition if they satisfy its partition bound.

Considering that this patch was submitted at the last minute and isn't
even complete, I can't see this getting into v10.  But that doesn't
mean we can't talk about it.  I'm curious to hear other opinions on
whether we should have this feature.  On the point mentioned above, I
don't think adding a partition should move tuples, necessarily; seems
like it would be good enough - maybe better - for it to fail if there
are any that would need to be moved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-02 Thread Neha Khatri
Hi Brandur,

Couple of typo corrections required in patch:

s/converstion/conversion
s/go the heap/go to the heap

The performance results you shared are for the Index Creation operation.
Are there similar results for sorting using ORDER BY queries too? Just
curious.

Regards,
Neha


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-02 Thread Amit Langote
On 2017/03/02 21:48, Robert Haas wrote:
> On Thu, Mar 2, 2017 at 3:52 PM, Amit Langote
>  wrote:
>>> think we should omit this logic (and change the documentation to
>>> match).  That is, a database-wide ANALYZE should update the statistics
>>> for each child as well as for the parent.  Otherwise direct queries
>>> against the children (and partitionwise joins, once we have that) are
>>> going to go haywire.
>>
>> OK, done.  I updated both analyze.sgml and vacuum.sgml to be more up to
>> date.  Both pages previously omitted materialized views.
>>
>> Attached updated patches.
> 
> Thanks, committed 0001 with a slight change to the wording.

Thanks.  I noticed that 'and' is duplicated in a line added by the commit
to analyze.sgml.  Attached 0001 fixes that.  0002 and 0003 are the same as
the last version.

Thanks,
Amit
>From 8a7bc891a9f54b72e111ccfe98fb96c75eb4e546 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 3 Mar 2017 09:57:11 +0900
Subject: [PATCH 1/3] Fix duplicated 'and' in analyze.sgml

---
 doc/src/sgml/ref/analyze.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index 49727e22df..45dee101df 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -64,7 +64,7 @@ ANALYZE [ VERBOSE ] [ table_name [
  
   The name (possibly schema-qualified) of a specific table to
   analyze.  If omitted, all regular tables, partitioned tables, and
-  and materialized views in the current database are analyzed (but not
+  materialized views in the current database are analyzed (but not
   foreign tables).  If the specified table is a partitioned table, both the
   inheritance statistics of the partitioned table as a whole and
   statistics of the individual partitions are updated.
-- 
2.11.0

>From 9776733add8ae509eb6ef6a226a17e81b057677b Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 6 Feb 2017 17:26:48 +0900
Subject: [PATCH 2/3] Avoid creating scan nodes for partitioned tables

* Currently, we create scan nodes for inheritance parents in their role
  as an inheritance set member.  Partitioned tables do not contain any
  data, so it's useless to create scan nodes for them.  So we need not
  create AppendRelInfo's for them in the planner prep phase.

* The planner prep phase turns off inheritance on the parent RTE if
  there isn't at least one child member (other than the parent itself
  which in normal cases is also a child member), which means the latter
  phases will not consider creating an Append plan and instead create a
  scan node. Avoid this if the RTE is a partitioned table, by noticing
  such cases in set_rel_size().  Per suggestion from Ashutosh Bapat.

* Since we do not add the RTE corresponding to the root partitioned
  table as the 1st child member of the inheritance set,
  inheritance_planner() must not assume the same when assigning
  nominalRelation to a ModifyTable node.

* In ExecInitModifyTable(), use nominalRelation to initialize tuple
  routing information, instead of the first resultRelInfo.  We set
  ModifyTable.nominalRelation to the root parent RTE in the
  partitioned table case (implicitly in the INSERT case, whereas
  explicitly in the UPDATE/DELETE cases).
---
 src/backend/executor/nodeModifyTable.c|  18 +++--
 src/backend/optimizer/path/allpaths.c |   8 ++
 src/backend/optimizer/plan/planner.c  |  14 +++-
 src/backend/optimizer/prep/prepunion.c|  32 ++--
 src/test/regress/expected/inherit.out | 124 ++
 src/test/regress/expected/tablesample.out |   4 +-
 src/test/regress/sql/inherit.sql  |   8 ++
 7 files changed, 108 insertions(+), 100 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 95e158970c..188ce8f8d4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -45,6 +45,7 @@
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/parsetree.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
@@ -1634,7 +1635,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	Plan	   *subplan;
 	ListCell   *l;
 	int			i;
-	Relation	rel;
+	Relation		nominalRel;
+	RangeTblEntry  *nominalRTE;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -1726,9 +1728,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	estate->es_result_relation_info = saved_resultRelInfo;
 
 	/* Build state for INSERT tuple routing */
-	rel = mtstate->resultRelInfo->ri_RelationDesc;
+	nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table);
+	nominalRel = heap_open(nominalRTE->relid, NoLock);
 	if (operation == CMD_INSERT &&
-		rel->rd_rel->relkind == 

Re: [HACKERS] snapbuild woes

2017-03-02 Thread Petr Jelinek
On 03/03/17 01:53, Erik Rijkers wrote:
> On 2017-03-03 01:30, Petr Jelinek wrote:
> 
> With these patches:
> 
> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
> 0002-Fix-after-trigger-execution-in-logical-replication.patch
> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
> snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
> snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
> 
> snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
> snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
> snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 
> I get:
> 
> subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows
> non-static declaration
>  static int oid_cmp(const void *p1, const void *p2);
> ^~~
> In file included from subscriptioncmds.c:42:0:
> ../../../src/include/utils/builtins.h:70:12: note: previous declaration
> of ‘oid_cmp’ was here
>  extern int oid_cmp(const void *p1, const void *p2);
> ^~~
> make[3]: *** [subscriptioncmds.o] Error 1
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** [commands-recursive] Error 2
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [all-backend-recurse] Error 2
> make: *** [all-src-recurse] Error 2
> 

Yes, the copy patch needs a rebase as well, but these ones are fine.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] snapbuild woes

2017-03-02 Thread Erik Rijkers

On 2017-03-03 01:30, Petr Jelinek wrote:

With these patches:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

I get:

subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows 
non-static declaration

 static int oid_cmp(const void *p1, const void *p2);
^~~
In file included from subscriptioncmds.c:42:0:
../../../src/include/utils/builtins.h:70:12: note: previous declaration 
of ‘oid_cmp’ was here

 extern int oid_cmp(const void *p1, const void *p2);
^~~
make[3]: *** [subscriptioncmds.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2






Re: [HACKERS] snapbuild woes

2017-03-02 Thread Petr Jelinek
On 26/02/17 01:43, Petr Jelinek wrote:
> On 24/02/17 22:56, Petr Jelinek wrote:
>> On 22/02/17 03:05, Petr Jelinek wrote:
>>> On 13/12/16 00:38, Petr Jelinek wrote:
 On 12/12/16 23:33, Andres Freund wrote:
> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>> On 12/12/16 22:42, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
 Hi,
 The first one is an outright bug, which has to do with how we track running
 transactions. What snapbuild basically does while doing initial 
 snapshot
 is read the xl_running_xacts record, store the list of running txes and
 then wait until they all finish. The problem with this is that
 xl_running_xacts does not ensure that it only logs transactions that 
 are
 actually still running (to avoid locking PGPROC) so there might be xids
 in xl_running_xacts that already committed before it was logged.
>>>
>>> I don't think that's actually true?  Notice how LogStandbySnapshot()
>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
>>> observed must actually be a bit more complex :(
>>>
>>
>> Hmm, interesting, I did see the transaction commit in the WAL before the
>> xl_running_xacts that contained the xid as running. I only saw it on
>> a production system though, didn't really manage to easily reproduce it
>> locally.
>
> I suspect the reason for that is that RecordTransactionCommit() doesn't
> conflict with ProcArrayLock in the first place - only
> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
> sense, just not the crash-recovery sense...
>

 That looks like a reasonable explanation. BTW I realized my patch needs a
 bit more work; currently it will break the actual snapshot as it behaves the
 same as if the xl_running_xacts was empty, which is not correct AFAICS.

>>>
>>> Hi,
>>>
>>> I got to work on this again. Unfortunately I haven't found a solution that
>>> I would be very happy with. What I did is, in case we read an
>>> xl_running_xacts which has all the transactions we track finished, we start
>>> tracking from that new xl_running_xacts again, with the difference that
>>> we clean up the running transactions based on previously seen committed
>>> ones. That means that on a busy server we may wait for multiple
>>> xl_running_xacts rather than just one, but at least we have a chance to
>>> finish, unlike with the current coding which basically waits for an empty
>>> xl_running_xacts. I also removed the additional locking for logical
>>> wal_level in LogStandbySnapshot() since it does not work.
>>
>> Not hearing any opposition to this idea so I decided to polish this and
>> also optimize it a bit.
>>
>> That being said, thanks to testing from Erik Rijkers I've identified one
>> more bug in how we do the initial snapshot. Apparently we don't reserve
>> the global xmin when we start building the initial exported snapshot for
>> a slot (we only reserve catalog_xmin, which is fine for logical decoding
>> but not for the exported snapshot) so the VACUUM and heap pruning will
>> happily delete old versions of rows that are still needed by anybody
>> trying to use that exported snapshot.
>>
> 
> Aaand I found one more bug in snapbuild. Apparently we don't protect the
> snapshot builder xmin from going backwards, which can yet again result in
> a corrupted exported snapshot.
> 
> Summary of attached patches:
> 0001 - Fixes the above mentioned global xmin tracking issues. Needs to
> be backported all the way to 9.4
> 
> 0002 - Removes use of the logical decoding saved snapshots for initial
> exported snapshot since those snapshots only work for catalogs and not
> user data. Also needs to be backported all the way to 9.4.

I've been a bit overzealous about this one (removed the use of the saved
snapshots completely). So here is a fix for that (and a rebase on top of
current HEAD).

> 
> 0003 - Makes sure snapshot builder xmin is not moved backwards by
> xl_running_xacts (which can otherwise happen during initial snapshot
> building). Also should be backported to 9.4.
> 
> 0004 - Changes handling of the xl_running_xacts in initial snapshot
> build to what I wrote above and removes the extra locking from
> LogStandbySnapshot introduced by logical decoding.
> 
> 0005 - Improves performance of initial snapshot building by skipping
> catalog snapshot build for transactions that don't do catalog changes.
> 


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Feb 2017 21:39:03 +0100
Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export

Otherwise the VACUUM or pruning might remove 

Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-02 Thread Thomas Munro
On Wed, Mar 1, 2017 at 6:06 PM, Thomas Munro
 wrote:
> On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera  
> wrote:
>> I think one of the most serious issues with BRIN indexes is how they
>> don't get updated automatically as the table is filled.  This patch
>> attempts to improve on that.  During brininsert() time, we check whether
>> we're inserting the first item on the first page in a range.  If we are,
>> request autovacuum to do a summarization run on that table.  This is
>> dependent on a new reloption for BRIN called "autosummarize", default
>> off.
>
> Nice.

Another thought about this design:  Why autovacuum?

Obviously we don't want to get to the point where you start up
PostgreSQL and see 25 lines like BRIN SummarizationLauncher started,
Foo Launcher started, Bar Launcher started, ... but perhaps there is a
middle ground between piling all the background work into the
autovacuum framework, and making new dedicated launchers and workers
for each thing.

Is there some way we could turn this kind of maintenance work into a
'task' (insert better word) that can be scheduled to run
asynchronously by magic workers, so that you don't have to supply a
whole worker and main loop and possibly launcher OR jam new
non-vacuum-related work into the vacuum machinery, for each thing like
this that someone decides to invent?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-02 Thread Thomas Munro
On Wed, Mar 1, 2017 at 6:06 PM, Thomas Munro
 wrote:
> On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera  
> wrote:
>> I think one of the most serious issues with BRIN indexes is how they
>> don't get updated automatically as the table is filled.  This patch
>> attempts to improve on that.  During brininsert() time, we check whether
>> we're inserting the first item on the first page in a range.  If we are,
>> request autovacuum to do a summarization run on that table.  This is
>> dependent on a new reloption for BRIN called "autosummarize", default
>> off.
>
> Nice.
>
>> The way the request works is that autovacuum maintains a DSA which can
>> be filled by backends with "work items".  Currently, work items can
>> specify a BRIN summarization of some specific index; in the future we
>> could use this framework to request other kinds of things that do not
>> fit in the "dead tuples / recently inserted tuples" logic that autovac
>> currently uses to decide to vacuum/analyze tables.
>>
>> However, it seems I have not quite gotten the hang of DSA just yet,
>> because after a couple of iterations, crashes occur.  I think the reason
>> has to do with either a resource owner clearing the DSA at an unwelcome
>> time, or perhaps there's a mistake in my handling of DSA "relative
>> pointers" stuff.
>
> Ok, I'll take a look.  It's set up for ease of use in short lifespan
> situations like parallel query, and there are a few extra hoops to
> jump through for longer lived DSA areas.

I haven't tested this, but here is some initial feedback after reading
it through once:

 /*
  * Attach to an area given a handle generated (possibly in another process) by
- * dsa_get_area_handle.  The area must have been created with dsa_create (not
+ * dsa_get_handle.  The area must have been created with dsa_create (not
  * dsa_create_in_place).
  */

This is an independent slam-dunk typo fix.

+   /*
+    * Set up our DSA so that backends can install work-item requests.  It may
+    * already exist as created by a previous launcher.
+    */
+   if (!AutoVacuumShmem->av_dsa_handle)
+   {
+       LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+       AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
+       AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
+       /* delay array allocation until first request */
+       AutoVacuumShmem->av_workitems = InvalidDsaPointer;
+       LWLockRelease(AutovacuumLock);
+   }
+   else
+       AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);

I haven't looked into the autovacuum launcher lifecycle, but if it can
be restarted as implied by the above then I see no reason to believe
that the DSA area still exists at the point where you call
dsa_attach() here.  DSA areas are reference counted, so if there is ever
a scenario where no backend is currently attached, then it will be
destroyed and this call will fail.  If you want to create a DSA area
that lasts until cluster shutdown even while all backends are
detached, you need to call dsa_pin() after creating it.
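
For illustration, a minimal sketch of the creation path with dsa_pin() added
(this reuses the patch's names and is not its actual code):

    LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
    if (!AutoVacuumShmem->av_dsa_handle)
    {
        AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
        /* keep the area alive even while no backend is attached */
        dsa_pin(AutoVacuumDSA);
        AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
        AutoVacuumShmem->av_workitems = InvalidDsaPointer;
    }
    else
        AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
    LWLockRelease(AutovacuumLock);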

In AutoVacuumRequestWork:

+   AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+
+   if (!AutoVacuumDSA)
+   {
+       /* cannot attach?  disregard request */
+       LWLockRelease(AutovacuumLock);
+       return;
+   }

dsa_attach either succeeds or throws, so that conditional code is unreachable.

+   /* XXX surely there is a simpler way to do this */
+   wi_ptr = AutoVacuumShmem->av_workitems + sizeof(AutovacWorkItems) +
+       sizeof(AutoVacuumWorkItem) * i;
+   workitem = (AutoVacuumWorkItem *)
+       dsa_get_address(AutoVacuumDSA, wi_ptr);

It'd probably be simpler to keep hold of the backend-local address of
the base of the workitems array and then use regular C language
facilities like array notation to work with it: workitem =
&workitems[i], and then:

+   /* ... and put it on autovacuum's to-do list */
+   workitems->avs_usedItems = wi_ptr;

Considering that i is really an index into the contiguous workitems
array, maybe you should really just be storing the index from i here,
instead of dealing in dsa_pointers.  The idea with dsa_pointers is
that they're useful for complex data structures that might point into
different DSM segments, like a hash table or binary tree that has
internal pointers that could point to arbitrary other objects in a data
structure because it's allocated in incremental pieces.  Here, you are
dealing with objects in a contiguous memory space of fixed size.  This
leads to a bigger question about this design, which I'll ask at the
end.
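
In other words, something along these lines (the field names here are
assumptions, not the patch's):

    AutoVacuumWorkItem *workitems = (AutoVacuumWorkItem *)
        dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
    AutoVacuumWorkItem *workitem = &workitems[i];

    /* ... fill in *workitem ... */

    /* remember the slot as a plain array index, not a dsa_pointer */
    AutoVacuumShmem->av_usedItem = i;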

Then at the bottom of AutoVacuumRequestWork:

+   LWLockRelease(AutovacuumLock);
+   dsa_detach(AutoVacuumDSA);
+   AutoVacuumDSA = NULL;
+}

I'm guessing that you intended to remain attached, rather than
detaching at the end like this?  Otherwise every backend that is
inserting lots of new data attaches and detaches 

Re: [HACKERS] Cleanup: avoid direct use of ip_posid/ip_blkid

2017-03-02 Thread Peter Geoghegan
On Thu, Mar 2, 2017 at 8:25 AM, Peter Eisentraut
 wrote:
> I wonder why we allow that.  Shouldn't the tid type reject input that
> has ip_posid == 0?

InvalidOffsetNumber (that is, 0) is something that I wouldn't like to
bet doesn't make it out to disk at some point. I know that we use 1 as
a meaningless placeholder value for internal B-Tree pages. P_HIKEY is
where I get 1 from, which might as well be any other value in
practice, I think -- we only need an item pointer to point to a block
from an internal page. SpecTokenOffsetNumber can certainly make it to
disk, and that is invalid in some sense, even if it isn't actually set
to InvalidOffsetNumber. So, seems pretty risky to me.


-- 
Peter Geoghegan




[HACKERS] [GSoC] Personal presentation and request for clarification

2017-03-02 Thread João Miguel Afonso
Dear community member(s),

I am João Afonso, a Portuguese MSc student and I'm writing to ask some 
information about the GSoC projects.

For the reasons explained below, PostgreSQL is the organisation that I identify
with most, so I am trying to introduce myself to the community. Since I really
want to participate, I will describe my most relevant experience and knowledge
in the field.
Please feel free to skip over the less relevant topics.

The project that most caught my eye was "Implementing push-based query
executor".
Although it completely fits my capabilities and current research, I have some
concerns about "The ability to understand and modify PostgreSQL executor code",
as I have not had enough time to understand the scope of the changes involved.

My second choice would be the "Sorting algorithms benchmark and
implementation", which, although not directly related to my current work, I am
more familiar with and which looks easier to accomplish.
As I said, I have not had enough time to explore the whole project
documentation or source code, but I read the code of the sorting algorithm and
realised that it is sequential. Would a parallel implementation bring any
benefit here?

I will keep working on reading all the documentation and some of the code, but
I would appreciate it if someone more familiar with the project could point me
to the project that best suits my abilities.



My motivations:

A group formed by me and four other MSc students is currently working on a
solution for a linear algebra approach to OLAP. We are at the same time
translating the SQL language to linear algebra operations, developing methods
to automate the process, optimising the previously implemented sparse matrix
operations, and benchmarking the resulting work on different Intel x86 and
Nvidia architectures (multi-core, many-core, GPU). Future work may even include
query/machine level cost prediction functionality.

It would be really interesting for me to do a continued analysis of how the
replacement of relational algebra would influence the performance and
implementation complexity of each independent module and of the entire system,
so I could at least do benchmarking tasks even if I am not accepted for GSoC.

I'm also working on a personal project: a general benchmark script capable of
testing all the combinations of N parameters, in serial, shared-memory and
distributed-memory settings. Its main purpose is to reduce the time spent on my
assigned MSc tasks. [GIT HERE]


Apart from the anxiety of joining such an important project (which I think is
normal), my major concern for both projects is my limited experience with the
micro-optimizations referred to.

For that reason, I feel it is important to include my previous work on this
topic. It was on the simple dot product algorithm, and the main cases studied
were cache issues and CPU/memory bounds. The work is described
[HERE].


Please feel free to contact me for any question.

Best regards,

João Afonso


Re: [HACKERS] delta relations in AFTER triggers

2017-03-02 Thread Kevin Grittner
On Thu, Mar 2, 2017 at 9:04 AM, David Steele  wrote:
> On 2/20/17 10:43 PM, Thomas Munro wrote:

>> Based on a suggestion from Robert off-list I tried inserting into a
>> delta relation from a trigger function and discovered that it
>> segfaults:
>>
>>   * frame #0: 0x0001057705a6
>> postgres`addRangeTableEntryForRelation(pstate=0x7fa58186a4d0,
>> rel=0x, alias=0x, inh='\0',
>> inFromCl='\0') + 70 at parse_relation.c:1280 [opt]
>> frame #1: 0x00010575bbda
>> postgres`setTargetTable(pstate=0x7fa58186a4d0,
>> relation=0x7fa58186a098, inh=, alsoSource='\0',
>> requiredPerms=1) + 90 at parse_clause.c:199 [opt]
>> frame #2: 0x000105738530 postgres`transformStmt [inlined]
>> transformInsertStmt(pstate=) + 69 at analyze.c:540 [opt]
>> frame #3: 0x0001057384eb
>> postgres`transformStmt(pstate=, parseTree=)
>> + 2411 at analyze.c:279 [opt]
>> frame #4: 0x000105737a42
>> postgres`transformTopLevelStmt(pstate=,
>> parseTree=0x7fa58186a438) + 18 at analyze.c:192 [opt]
>> frame #5: 0x0001059408d0
>> postgres`pg_analyze_and_rewrite_params(parsetree=0x7fa58186a438,
>> query_string="insert into d values (100, 100, 'x')",
>> parserSetup=(plpgsql.so`plpgsql_parser_setup at pl_comp.c:1017),
>> parserSetupArg=0x7fa58185c2a0, queryEnv=0x7fa581857798) + 128
>> at postgres.c:706 [opt]
>
> Do you know when you will have a new patch ready?
>
> This looks like an interesting and important feature but I think it's
> going to have to come together quickly if it's going to make it into v10.

I hope to post a new version addressing review comments by Monday (6 March).

-- 
Kevin Grittner




Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-02 Thread Julien Rouhaud
On Tue, Feb 28, 2017 at 08:58:24AM -0800, Brandur Leach wrote:
> Hi Julien,
> 

Hello Brandur,

> Thanks for the expedient reply, even after I'd dropped the
> ball for so long :)

:)

> Cool! I just re-read my own comment a few days later and I
> think that it still mostly makes sense, but definitely open
> to other edits if anyone else has one.
> 

Ok.

> > One last thing, I noticed that you added:
> >
> > +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2);
> >
> > but the existing function is declared as
> >
> > static int32
> > macaddr_internal_cmp(macaddr *a1, macaddr *a2)
> >
> > I'd be in favor to declare both as int.
> 
> Great catch! I have no idea how I missed that. I've done as
> you suggested and made them both "int", which seems
> consistent with SortSupport implementations elsewhere.
> 

Great.

> > After this, I think this patch will be ready for committer.
> 
> Excellent! I've attached a new (and hopefully final)
> version of the patch.
> 
> Two final questions about the process if you'd be so kind:
> 
> * Should I change the status on the Commitfest link [1] or
>   do I leave that to you (or someone else like a committer)?
> 

This is done by the reviewer, after checking the last patch version. I just
changed the status to ready for committer!

> * I've been generating a new OID value with the
>   `unused_oids` script, but pretty much every time I rebase
>   I collide with someone else's addition and need to find a
>   new one. Is it better for me to pick an OID in an exotic
>   range for my final patch, or that a committer just finds
>   a new one (if necessary) as they're putting it into
>   master?

I think picking the value with unused_oids as you did is the right thing to do.
As Robert said, if this oid is used in another patch in the meantime, updating
it at commit time is not a big deal.  Moreover, this patch will require a
catversion bump, which is meant to be done by the committer.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-02 Thread Tomas Vondra

On 03/01/2017 05:29 AM, Andres Freund wrote:
> On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
>> - Andres, hoping the buildfarm turns greener
>
> Oh well, that didn't work. Investigating.



Attached is the last part of the patch series, rebased to current master
and adopting the new chunk header approach. Unlike Slab, this context 
needs the whole AllocSet header (size, requested_size), and also the 
block pointer, so no padding seems to be needed.


I've tested this on x86-64 and armv7l, and the test_decoding test suite
passes on both.


FWIW, I'm still not entirely happy with the name "Generation". I agree 
with Andres that it's perhaps a bit too generic, but more importantly 
the name might actually be a bit obsolete. There used to be generations 
of chunks, but that's gone. Now it simply does not reuse the chunks at 
all, and frees the blocks when they get empty.
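
To make that concrete, here is a minimal sketch of the bookkeeping as described
(illustrative only, not the patch's code): each block counts its live chunks,
chunk space is never reused, and the block is released once the count reaches
zero.

#include <stdlib.h>

typedef struct GenBlock
{
    int         nchunks;        /* live chunks remaining on this block */
    char       *freeptr;        /* next allocation position */
    char       *endptr;         /* end of the block's storage */
} GenBlock;

typedef struct GenChunk
{
    GenBlock   *block;          /* back-pointer used when the chunk is freed */
} GenChunk;

static void
GenerationFree(GenChunk *chunk)
{
    GenBlock   *block = chunk->block;

    /* chunk space is never reused; drop the whole block once it is empty */
    if (--block->nchunks == 0)
        free(block);
}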


It's not entirely FIFO though, because the transactions interleave, so 
later blocks may be released first. But the "allocated close, freed 
close" is still there. So perhaps something like "TemporalSet" or 
something like that would be a better name?


Man, naming things is hard ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


gen-context.patch
Description: binary/octet-stream



Re: [HACKERS] patch: function xmltable

2017-03-02 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-03-02 19:32 GMT+01:00 Alvaro Herrera :
> 
> > So in the old (non-executor-node) implementation, you could attach WITH
> > ORDINALITY to the xmltable expression and it would count the output
> > rows, regardless of which XML document it comes from.  With the new
> > implementation, the grammar no longer accepts it.  To count output rows,
> > you still need to use row_number().  Maybe this is okay.  This is the
> > example from the docs, and I add another XML document with two more rows
> > for xmltable.  Look at the three numbering columns ...
> >
> 
> It is expected - now tablefunc are not special case of SRF, so it lost all
> SRF functionality. It is not critical lost - it supports internally FOR
> ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
> to support WITH ORDINALITY in future, but I have not any use case for it.

Fine.

After looking at the new executor code a bit, I noticed that we don't
need the resultSlot anymore; we can use the ss_ScanTupleSlot instead.
Because resultSlot was being used in the xml.c code (which already
appeared a bit dubious to me), I changed the interface so that instead
the things that it read from it are passed as parameters -- namely, in
InitBuilder we pass natts, and in GetValue we pass typid and typmod.

Secondly, I noticed we have the FetchRow routine produce a minimal
tuple, put it in a slot; then its caller takes the slot and put the
tuple in the tuplestore.  This is pointless; we can just have FetchRow
put the tuple in the tuplestore directly and not bother with any slot
manipulations there.  This simplifies the code a bit.
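
In other words, the fetch routine can end up doing roughly this (a sketch with
assumed variable names, not the committed code), with no intermediate
TupleTableSlot:

    Datum      *values = palloc(natts * sizeof(Datum));
    bool       *nulls = palloc(natts * sizeof(bool));

    /* ... evaluate the row's columns into values[] / nulls[] ... */

    tuplestore_putvalues(tupstore, tupdesc, values, nulls);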

Here's v47 with those changes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


xmltable-47.patch.gz
Description: application/gunzip



Re: [HACKERS] Performance degradation in TPC-H Q18

2017-03-02 Thread Andres Freund
On 2017-03-01 11:05:33 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 10:53 AM, Andres Freund  wrote:
> > On 2017-03-01 10:47:45 +0530, Kuntal Ghosh wrote:
> >> if (insertdist > curdist)
> >> {
> >> swap the entry to be inserted with the current entry.
> >> Try to insert the current entry in the same logic.
> >> }
> >>
> >> So, the second approach will not cause all the followers to be shifted by 
> >> 1.
> >
> > How not? You'll have to do that game until you found a free slot.
> You can skip increasing the curdist of some elements for which it is
> already pretty high. Suppose, we've the following hash table and an
> element e,
> _,_,_,i,_,_,j,j+1,j+2,_,_
> We want to insert e at i but there is a collision and we start doing
> probing. At j, the condition if (insertdist > curdist) satisfies and
> we'll swap e with the element at j. Now, we try to insert e( = element
> at j) and so on. In this process, if the element at j+1,j+2 has
> already high distance from their optimal location, you can skip those
> element and go ahead. But, in the existing approach, we increase their
> distance further. Thoughts?

All the following elements already are at their "optimal" position given
the previous occupation of the hash-table.  As we only start to move
once the insert-distance of the existing element is lower than the one
of the to-be-inserted one, the following elements will all be moved.
Doing the swap on an element-by-element basis would be possible, but just
achieves the same result at a higher cost.
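
For reference, a minimal sketch of the displacement loop being discussed
(illustrative only; it assumes PostgreSQL's uint32/bool typedefs, and
simplehash.h's real code differs in the details):

typedef struct Entry
{
    uint32      hash;
    bool        in_use;
    /* ... payload ... */
} Entry;

static void
rh_insert(Entry *table, uint32 mask, Entry newent)
{
    uint32      bucket = newent.hash & mask;
    uint32      insertdist = 0;

    newent.in_use = true;

    for (;;)
    {
        Entry      *cur = &table[bucket];
        uint32      curdist;

        if (!cur->in_use)
        {
            *cur = newent;      /* free slot found, done */
            return;
        }

        /* distance of the resident entry from its optimal bucket */
        curdist = (bucket - (cur->hash & mask)) & mask;

        if (insertdist > curdist)
        {
            /* swap; the displaced entry keeps probing from here */
            Entry       tmp = *cur;

            *cur = newent;
            newent = tmp;
            insertdist = curdist;
        }

        bucket = (bucket + 1) & mask;
        insertdist++;
    }
}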

Regards,

Andres




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-03-02 Thread Andres Freund
Hi,

On 2017-03-01 10:39:11 +0530, Kuntal Ghosh wrote:
> On Wed, Mar 1, 2017 at 9:33 AM, Kuntal Ghosh  
> wrote:
> > On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund  wrote:
> >> That's without the patch in
> >> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
> >> ? With that patch it should complete without that change, right?
> >>
> > No, it's with the patch in the above link which is
> > 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But, I've
> > kept the SH_FILLFACTOR to 0.8 and SH_MAX_FILLFACTOR to 0.95. I'll do
> > another round of testing with the constants provided by you.
> >
> I've tested with the patch
> 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch and the
> results are same, i.e., hash table grows even when it is only 10-12%
> filled. I've attached the logfile for reference.

So, I observed similar things, where the *leader backend* reports a lot
of "long runs".

WARNING:  clumping, growing this thing
LOG:  size: 1048576, members: 158071, filled: 0.150748, total chain: 95622, max 
chain: 35, avg chain: 0.604931, total_collisions: 35909, max_collisions: 5, 
avg_collisions: 0.227170
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING:  clumping, growing this thing
LOG:  size: 2097152, members: 238992, filled: 0.113960, total chain: 1850141, 
max chain: 422, avg chain: 7.741435, total_collisions: 55363, max_collisions: 
5, avg_collisions: 0.231652
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING:  clumping, growing this thing
LOG:  size: 4194304, members: 881089, filled: 0.210068, total chain: 567907, 
max chain: 29, avg chain: 0.644551, total_collisions: 174681, max_collisions: 
6, avg_collisions: 0.198256
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;
WARNING:  clumping, growing this thing
LOG:  size: 8388608, members: 5397649, filled: 0.643450, total chain: 5752690, 
max chain: 22, avg chain: 1.065777, total_collisions: 1437957, max_collisions: 
8, avg_collisions: 0.266404
STATEMENT:  EXPLAIN ANALYZE select l_orderkey from lineitem group by l_orderkey;

The pertinent part is that there's "forced" resizes due to long runs at
0.15%, 0.11%, 0.21%, 0.64%.  Note that there's a good number of
"ordinary" resizes, and at the end there's 750 tuples in the
hashtable.  I.e. there's a number of resizes after the last one due to
clumping, and all of those are entirely guided through fillfactor; the
final hash-table is entirely reasonably sized.

Trying to do the math, that just didn't sit well with me. There's just
no way there could be that many and such long runs, without something
broken.  And indeed:

Thinking about this even more I realized the reason this happens is that
b81b5a96f4 didn't actually address the problem of inserting in
hash-table order.   To understand, let's first look at the query plan:

Finalize HashAggregate  (cost=899227.75..903176.55 rows=394880 width=4) (actual 
time=6255.957..7415.152 rows=750 loops=1)
  Group Key: l_orderkey
  ->  Gather  (cost=652427.75..893304.55 rows=2369280 width=4) (actual 
time=1741.706..3699.248 rows=7940881 loops=1)
Workers Planned: 6
Workers Launched: 6
->  Partial HashAggregate  (cost=651427.75..655376.55 rows=394880 
width=4) (actual time=1731.468..1956.693 rows=11344
  Group Key: l_orderkey
  ->  Parallel Seq Scan on lineitem  (cost=0.00..638927.80 
rows=480 width=4) (actual time=0.025..711.771 rows
Planning time: 0.192 ms
Execution time: 7700.027 ms

The important part is the estimated rows=394880 and actual
rows=750. That means that the hash-table on the leader backend will
first be created suitably for 394880 rows.  But we obviously get a lot
more.

The problem then is that even after introducing the hash_iv stuff
(b81b5a96f), we essentially still iterate over the tuples in
hash-order.  TupleHashTableHash() computes the hash for a single-column
group condition as:

    uint32      hashkey = hashtable->hash_iv;

    hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
    attr = slot_getattr(slot, att, &isNull);
    if (!isNull)            /* treat nulls as having hash key 0 */
    {
        uint32      hkey;

        hkey = DatumGetUInt32(FunctionCall1(&hashfunctions[i], attr));
        hashkey ^= hkey;
    }

the resulting hash-values aren't actually meaningfully influenced by the
IV. Because we just xor with the IV, most hash-values that without the IV
would have fallen into a single hash-bucket fall into a single
hash-bucket afterwards as well; just somewhere else in the hash-range.
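
For illustration, a mixing step along the following lines would actually spread
a clustered set of hash values over the whole range (a sketch using the murmur3
finalizer; whether this is the right fix is a separate question):

static inline uint32
mix_hash(uint32 hashkey, uint32 hash_iv)
{
    uint32      h = hashkey ^ hash_iv;

    /* murmur3 finalizer: every input bit affects every output bit */
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;

    return h;
}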


Which then leads to the issue that we insert hundreds of rows into the
leader's hash-table that fall into the same hash-bucket (as the leader's
hashtable is quite small at this point, the hash-buckets on the worker
tables have *far* fewer entries per bucket / are far smaller on the

Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-02 Thread Corey Huinker
>
>
> For me, it is only slightly better: I think that for helping understanding
> and maintenance, the automaton state transitions should be all clear and
> loud in just one place, so I would really like to see a single common
> structure:
>
>   if (is "if") switch on all states;
>   else if (is "elif") switch on all states;
>   else if (is "else") switch on all states;
>   else if (is "endif") switch on all states;
>
> And minimal necessary error handling around that.
>
>
v20: attempt at implementing the switch-on-all-states style.
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae58708..9d245a9 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2035,6 +2035,83 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+-- set ON_ERROR_STOP in case the variables are not valid boolean expressions
+\set ON_ERROR_STOP on
+-- check for the existence of two separate records in the database and store
+-- the results in separate psql variables
+SELECT
+EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,
+EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee
+\gset
+\if :is_customer
+SELECT * FROM customer WHERE customer_id = 123;
+\elif :is_employee
+\echo 'is not a customer but is an employee'
+SELECT * FROM employee WHERE employee_id = 456;
+\else
+\if yes
+\echo 'not a customer or employee'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with a \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all local
+\if-\endif are matched, then
+psql will raise an error.
+
+
+A conditional block can have any number of 
+\elif clauses, which may optionally be followed by a
+single \else clause.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to a single unquoted string
+which are evaluated like other on/off options, so the valid values
+are any unambiguous case insensitive matches for one of:
+true, false, 
1,
+0, on, off,
+yes, no.  So 
+t, T, and tR
+will all match true.
+
+Expressions that do not properly evaluate to true or false will
+generate an error and cause the \if or
+\elif command to fail.  Because that behavior may
+change branching context in undesirable ways (executing code which
+was intended to be skipped, causing \elif,
+\else, and \endif commands to
+pair with the wrong \if, etc), it is
+recommended that scripts which use conditionals also set
+ON_ERROR_STOP.
+
+
+Lines within false branches are parsed normally for query/command
+completion, but any queries are not sent to the server, and any
+commands other than conditionals (\if,
+\elif, \else,
+\endif) are ignored. Conditional commands are
+still checked for valid nesting.
+
+
+  
 
   
 \ir or \include_relative 
filename
@@ -3703,8 +3780,9 @@ testdb= INSERT INTO my_table VALUES 
(:'content');
 
 
 In prompt 1 normally =,
-but ^ if in single-line mode,
-or ! if the session is disconnected from the
+but @ if the session is in a false conditional
+block, or ^ if in single-line mode, or
+! if the session is disconnected from the
 database (which can happen if \connect fails).
 In prompt 2 %R is replaced by a character that
 depends on why psql expects more input:
diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c53733f..1492b66 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -21,7 +21,7 @@ REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
 override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq
 
-OBJS=  command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
+OBJS=  command.o common.o conditional.o help.o input.o stringutils.o 
mainloop.o copy.o \
startup.o prompt.o variables.o large_obj.o describe.o \
crosstabview.o tab-complete.o \
sql_help.o psqlscanslash.o \
@@ -61,8 +61,16 @@ uninstall:
 
 clean distclean:
rm -f psql$(X) $(OBJS) lex.backup
+   rm -rf tmp_check
 
 # files removed here are supposed to be in the distribution tarball,
 # so do not clean them in the clean/distclean rules
 maintainer-clean: 

Re: [HACKERS] patch: function xmltable

2017-03-02 Thread Pavel Stehule
2017-03-02 19:32 GMT+01:00 Alvaro Herrera :

> So in the old (non-executor-node) implementation, you could attach WITH
> ORDINALITY to the xmltable expression and it would count the output
> rows, regardless of which XML document it comes from.  With the new
> implementation, the grammar no longer accepts it.  To count output rows,
> you still need to use row_number().  Maybe this is okay.  This is the
> example from the docs, and I add another XML document with two more rows
> for xmltable.  Look at the three numbering columns ...
>

It is expected - now tablefunc are not special case of SRF, so it lost all
SRF functionality. It is not critical lost - it supports internally FOR
ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
to support WITH ORDINALITY in future, but I have not any use case for it.

Regards

Pavel



>
> CREATE TABLE xmldata AS SELECT
> xml $$
> <ROWS>
>   <ROW id="1">
>     <COUNTRY_ID>AU</COUNTRY_ID>
>     <COUNTRY_NAME>Australia</COUNTRY_NAME>
>   </ROW>
>   <ROW id="5">
>     <COUNTRY_ID>JP</COUNTRY_ID>
>     <COUNTRY_NAME>Japan</COUNTRY_NAME>
>     <PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
>     <SIZE unit="sq_mi">145935</SIZE>
>   </ROW>
>   <ROW id="6">
>     <COUNTRY_ID>SG</COUNTRY_ID>
>     <COUNTRY_NAME>Singapore</COUNTRY_NAME>
>     <SIZE unit="sq_km">697</SIZE>
>   </ROW>
> </ROWS>
> $$ AS data;
>
>  insert into xmldata values ($$
>  <ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW>
>  <ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$);
>
> SELECT ROW_NUMBER() OVER (), xmltable.*
>   FROM xmldata,
>        XMLTABLE('//ROWS/ROW'
>                 PASSING data
>                 COLUMNS id int PATH '@id',
>                         ordinality FOR ORDINALITY,
>                         "COUNTRY_NAME" text,
>                         country_id text PATH 'COUNTRY_ID',
>                         size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
>                         size_other text PATH
>                             'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
>                         premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified')
> ;
>
>  row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │  size_other  │ premier_name
> ────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼───────────────
>           1 │  1 │          1 │ Australia    │ AU         │            │              │ not specified
>           2 │  5 │          2 │ Japan        │ JP         │            │ 145935 sq_mi │ Shinzo Abe
>           3 │  6 │          3 │ Singapore    │ SG         │        697 │              │ not specified
>           4 │  2 │          1 │ Chile        │ CL         │            │              │ not specified
>           5 │  3 │          2 │ Argentina    │ AR         │            │              │ not specified
>
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] patch: function xmltable

2017-03-02 Thread Alvaro Herrera
So in the old (non-executor-node) implementation, you could attach WITH
ORDINALITY to the xmltable expression and it would count the output
rows, regardless of which XML document it comes from.  With the new
implementation, the grammar no longer accepts it.  To count output rows,
you still need to use row_number().  Maybe this is okay.  This is the
example from the docs, and I add another XML document with two more rows
for xmltable.  Look at the three numbering columns ...

CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>
  <ROW id="1">
    <COUNTRY_ID>AU</COUNTRY_ID>
    <COUNTRY_NAME>Australia</COUNTRY_NAME>
  </ROW>
  <ROW id="5">
    <COUNTRY_ID>JP</COUNTRY_ID>
    <COUNTRY_NAME>Japan</COUNTRY_NAME>
    <PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
    <SIZE unit="sq_mi">145935</SIZE>
  </ROW>
  <ROW id="6">
    <COUNTRY_ID>SG</COUNTRY_ID>
    <COUNTRY_NAME>Singapore</COUNTRY_NAME>
    <SIZE unit="sq_km">697</SIZE>
  </ROW>
</ROWS>
$$ AS data;

 insert into xmldata values ($$
 <ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW>
 <ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$);

SELECT ROW_NUMBER() OVER (), xmltable.*
  FROM xmldata,
       XMLTABLE('//ROWS/ROW'
                PASSING data
                COLUMNS id int PATH '@id',
                        ordinality FOR ORDINALITY,
                        "COUNTRY_NAME" text,
                        country_id text PATH 'COUNTRY_ID',
                        size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
                        size_other text PATH
                            'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
                        premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified')
;

 row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │  size_other  │ premier_name
────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼───────────────
          1 │  1 │          1 │ Australia    │ AU         │            │              │ not specified
          2 │  5 │          2 │ Japan        │ JP         │            │ 145935 sq_mi │ Shinzo Abe
          3 │  6 │          3 │ Singapore    │ SG         │        697 │              │ not specified
          4 │  2 │          1 │ Chile        │ CL         │            │              │ not specified
          5 │  3 │          2 │ Argentina    │ AR         │            │              │ not specified


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure

2017-03-02 Thread Tom Lane
Jim Nasby  writes:
> On 2/27/17 2:42 PM, Tom Lane wrote:
>> + SET pltcl.start_proc = 'no_such_function';
>> + select tcl_int4add(1, 2);
>> + ERROR:  function no_such_function() does not exist

> Can the error message be more explicit somehow? Otherwise people will be 
> quite confused as to where no_such_function() is coming from.

After thinking about that for awhile, it seemed like the most useful thing
to do is to set up an errcontext callback that will be active throughout
execution of the start_proc.  That will cover both setup failures like
the above, and errors occurring within the start_proc, which could be
equally confusing if you think they apply to the function you initially
tried to call.  v2 patch attached that does it like that.
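
The shape of that is roughly the usual error-context-callback pattern (a
sketch with hypothetical names; the v2 patch's details may differ):

static void
pltcl_start_proc_errcontext(void *arg)
{
    errcontext("while running PL/Tcl initialization function \"%s\"",
               (const char *) arg);
}

    ...
    ErrorContextCallback errcallback;

    errcallback.callback = pltcl_start_proc_errcontext;
    errcallback.arg = (void *) start_proc_name;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* look up and call the start_proc here */

    error_context_stack = errcallback.previous;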

> 
> BTW, I'd think this functionality would be valuable for every PL.

Maybe for some.  I see no value in putting anything about it in
pg_language though.  I don't see that we could share any useful amount of
mechanism, and it won't necessarily look the same in every language ---
plperl for instance prefers code fragments over procedures.

In any case, I'm not going there in this patch.

regards, tom lane

diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index 0a693803dd3432246ba5f9af6625b90eaa871ab1..ad216dd5b752160e47b9732ef603dbf3e0c56ebb 100644
*** a/doc/src/sgml/pltcl.sgml
--- b/doc/src/sgml/pltcl.sgml
*** if {[catch { spi_exec $sql_command }]} {
*** 902,907 
--- 902,981 
  
 
  
+
+ PL/Tcl Configuration
+ 
+ 
+  This section lists configuration parameters that
+  affect PL/Tcl.
+ 
+ 
+ 
+ 
+  
+   
+pltcl.start_proc (string)
+
+ pltcl.start_proc configuration parameter
+
+   
+   
+
+ This parameter, if set to a nonempty string, specifies the name
+ (possibly schema-qualified) of a parameterless PL/Tcl function that
+ is to be executed whenever a new Tcl interpreter is created for
+ PL/Tcl.  Such a function can perform per-session initialization, such
+ as loading additional Tcl code.  A new Tcl interpreter is created
+ when a PL/Tcl function is first executed in a database session, or
+ when an additional interpreter has to be created because a PL/Tcl
+ function is called by a new SQL role.
+
+ 
+
+ The referenced function must be written in the pltcl
+ language, and must not be marked SECURITY DEFINER.
+ (These restrictions ensure that it runs in the interpreter it's
+ supposed to initialize.)  The current user must have permission to
+ call it, too.
+
+ 
+
+ If the function fails with an error it will abort the function call
+ that caused the new interpreter to be created and propagate out to
+ the calling query, causing the current transaction or subtransaction
+ to be aborted.  Any actions already done within Tcl won't be undone;
+ however, that interpreter won't be used again.  If the language is
+ used again the initialization will be attempted again within a fresh
+ Tcl interpreter.
+
+ 
+
+ Only superusers can change this setting.  Although this setting
+ can be changed within a session, such changes will not affect Tcl
+ interpreters that have already been created.
+
+   
+  
+ 
+  
+   
+pltclu.start_proc (string)
+
+ pltclu.start_proc configuration parameter
+
+   
+   
+
+ This parameter is exactly like pltcl.start_proc,
+ except that it applies to PL/TclU.  The referenced function must
+ be written in the pltclu language.
+
+   
+  
+ 
+ 
+
+ 
 
  Tcl Procedure Names
  
diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index 453e7ad2ecb34ccb68e672c5c1637d332f05f1aa..1096c4faf04e3b29d00182414e20fd225ba64ffc 100644
*** a/src/pl/tcl/Makefile
--- b/src/pl/tcl/Makefile
*** DATA = pltcl.control pltcl--1.0.sql pltc
*** 28,34 
 pltclu.control pltclu--1.0.sql pltclu--unpackaged--1.0.sql
  
  REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=pltcl
! REGRESS = pltcl_setup pltcl_queries pltcl_unicode
  
  # Tcl on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
--- 28,34 
 pltclu.control pltclu--1.0.sql pltclu--unpackaged--1.0.sql
  
  REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=pltcl
! REGRESS = pltcl_setup pltcl_queries pltcl_start_proc pltcl_unicode
  
  # Tcl on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
diff --git a/src/pl/tcl/expected/pltcl_start_proc.out 

Re: [HACKERS] patch: function xmltable

2017-03-02 Thread Pavel Stehule
On 2 March 2017 at 18:14, "Alvaro Herrera" <
alvhe...@2ndquadrant.com> wrote:

Pavel Stehule wrote:

> It is documented already
>
> "If the PATH matches an empty tag the result is an empty
string"

Hmm, okay.  But what we have here is not an empty tag, but a tag that is
completely missing.  I don't think those two cases should be treated in
the same way ...


this information is not propagated from libxml2.


> Attached new patch
>
> cleaned documentation
> regress tests is more robust
> appended comment in src related to generating empty string for empty tag

Thanks, I incorporated those changes.  Here's v46.  I rewrote the
documentation, and fixed a couple of incorrectly copied comments
in the new executor code; I think that one looks good.  In the future we
could rewrite it to avoid the need for a tuplestore, but I think the
current approach is good enough for a pg10 implementation.

Barring serious problems, I intend to commit this later today.



thank you very much

regards

Pavel


--
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Two phase commit in ECPG

2017-03-02 Thread Masahiko Sawada
Hi all,

I found that ecpg has not been supporting COMMIT PREPARED and ROLLBACK
PREPARED since version 9.0.2.
For example, if the test code does,
  EXEC SQL BEGIN;
  EXEC SQL INSERT INTO t1 VALUES(1);
  EXEC SQL PREPARE TRANSACTION 'gxid';
  EXEC SQL COMMIT PREPARED 'gxid';
I got error "COMMIT PREPARED cannot run inside a transaction block".

The cause is that a "begin transaction" is issued automatically
before executing COMMIT PREPARED if we're not in autocommit mode. Commit
816b008eaf1a1ff1069f3bafff363a9a8bf04a21 fixed a bug of incorrect
status calculation, but at the same time it became the cause of this
behavior.
Is this a bug?

Attached 001 patch fixes this issue and add regression test for two
phase commit in ecpg.

Also, even though ecpg identifies these two commands as transaction
commands, similar to other transaction commands like COMMIT and ROLLBACK,
the documentation doesn't mention them at all. The attached 002
patch adds a description of them to the documentation.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


002_ecpg_commit_rollback_prepared_doc.patch
Description: Binary data


001_ecpg_commit_rollback_prepared.patch
Description: Binary data



Re: [HACKERS] patch: function xmltable

2017-03-02 Thread Alvaro Herrera
Pavel Stehule wrote:

> It is documented already
> 
> "If the PATH matches an empty tag the result is an empty string"

Hmm, okay.  But what we have here is not an empty tag, but a tag that is
completely missing.  I don't think those two cases should be treated in
the same way ...

> Attached new patch
> 
> cleaned documentation
> regress tests is more robust
> appended comment in src related to generating empty string for empty tag

Thanks, I incorporated those changes.  Here's v46.  I rewrote the
documentation, and fixed a couple of incorrectly copied comments
in the new executor code; I think that one looks good.  In the future we
could rewrite it to avoid the need for a tuplestore, but I think the
current approach is good enough for a pg10 implementation.

Barring serious problems, I intend to commit this later today.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


xmltable-46.patch.gz
Description: application/gunzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-02 Thread Peter Geoghegan
On Thu, Mar 2, 2017 at 5:50 AM, Robert Haas  wrote:
> On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan  wrote:
>> * This scales based on output size (projected index size), not input
>> size (heap scan input). Apparently, that's what we always do right
>> now.
>
> Actually, I'm not aware of any precedent for that. I'd just pass the
> heap size to compute_parallel_workers(), leaving the index size as 0,
> and call it good.  What you're doing now seems exactly backwards from
> parallel query generally.

Sorry, that's what I meant.

>> So, the main factor that
>> discourages parallel sequential scans doesn't really exist for
>> parallel CREATE INDEX.
>
> Agreed.

I'm glad. This justifies the lack of much of any "veto" on the
logarithmic scaling. The only thing that can do that is
max_parallel_workers_maintenance, the storage parameter
parallel_workers (maybe this isn't a storage parameter in V9), and
insufficient maintenance_work_mem per worker (as judged by
min_parallel_relation_size being greater than workMem per worker).

I guess that the workMem scaling threshold thing could be
min_parallel_index_scan_size, rather than min_parallel_relation_size
(which we now call min_parallel_table_scan_size)?
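
A sketch of the kind of per-worker workMem veto being described (the names and
the exact threshold are assumptions, not the patch's code):

    /* maintenance_work_mem is in kB; min_parallel_index_scan_size in blocks */
    long        min_worker_kb = (long) min_parallel_index_scan_size * (BLCKSZ / 1024);

    while (parallel_workers > 0 &&
           maintenance_work_mem / (parallel_workers + 1) < min_worker_kb)
        parallel_workers--;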

In general, I would expect this to leave most CREATE INDEX statements
with a parallel plan in the real world, using exactly the number of
workers indicated by the logarithmic scaling. (pg_restore would also
not use parallelism, because it's specially disabled -- you have to
have set the storage param at some point.)

>> We could always defer the cost model to another release, and only
>> support the storage parameter for now, though that has disadvantages,
>> some less obvious [4].
>
> I think it's totally counter-intuitive that any hypothetical index
> storage parameter would affect the degree of parallelism involved in
> creating the index and also the degree of parallelism involved in
> scanning it.  Whether or not other systems do such crazy things seems
> to me to beside the point.  I think if CREATE INDEX allows an explicit
> specification of the degree of parallelism (a decision I would favor)
> it should have a syntactically separate place for unsaved build
> options vs. persistent storage parameters.

I can see both sides of it.

On the one hand, it's weird that you might have query performance
adversely affected by what you thought was a storage parameter that
only affected the index build. On the other hand, it's useful that you
retain that as a parameter, because you may want to periodically
REINDEX, or have a way of ensuring that pg_restore does go on to use
parallelism, since it generally won't otherwise. (As mentioned
already, pg_restore does not trust the cost model due to issues with
the availability of statistics).

There are reports on Google of users of these other systems being
confused by all this, and I don't think that it's any different there
(those other systems don't treat parallel_workers style storage
parameter much different for the purposes of index scans, or anything
like that). I agree that that isn't very user friendly.

In theory, having two index storage parameters solves our problem. I
don't like that either, though, since it creates a whole new problem.

To be clear, I don't have any strong feelings on all this. I just
think it's worth pointing out that there are reasons to not do what
you suggest, that you might want to consider if you haven't already.

-- 
Peter Geoghegan




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-02 Thread Petr Jelinek
On 02/03/17 13:23, Craig Ringer wrote:
> On 2 March 2017 at 16:20, Stas Kelvich  wrote:
>>
>>> On 2 Mar 2017, at 11:00, Craig Ringer  wrote:
>>>
>>> We already have it, because we just decoded the PREPARE TRANSACTION.
>>> I'm preparing a patch revision to demonstrate this.
>>
>> Yes, we already have it, but if server reboots between commit prepared (all
>> prepared state is gone) and decoding of this commit prepared then we loose
>> that mapping, isn’t it?
> 
> I was about to explain how restart_lsn works again, and how that would
> mean we'd always re-decode the PREPARE TRANSACTION before any COMMIT
> PREPARED or ROLLBACK PREPARED on crash. But...
> 
> Actually, the way you've implemented it, that won't be the case. You
> treat PREPARE TRANSACTION as a special-case of COMMIT, and the client
> will presumably send replay confirmation after it has applied the
> PREPARE TRANSACTION. In fact, it has to if we want 2PC to work with
> synchronous replication. This will allow restart_lsn to advance to
> after the PREPARE TRANSACTION record if there's no other older xact
> and we see a suitable xl_running_xacts record. So we wouldn't decode
> the PREPARE TRANSACTION again after restart.
> 

Unless we just don't let restart_lsn go forward if there is a 2pc that
wasn't decoded yet (twopcs store the prepare lsn), but that's probably
too much of a kludge.

> 
> It's also worth noting that with your current approach, 2PC xacts will
> produce two calls to the output plugin's commit() callback, once for
> the PREPARE TRANSACTION and another for the COMMIT PREPARED or
> ROLLBACK PREPARED, the latter two with a faked-up state. I'm not a
> huge fan of that. It's not entirely backward compatible since it
> violates the previously safe assumption that there's a 1:1
> relationship between begin and commit callbacks with no interleaving,
> for one thing, and I think it's also a bit misleading to send a
> PREPARE TRANSACTION to a callback that could previously only receive a
> true commit.
> 
> I particularly dislike calling a commit callback for an abort. So I'd
> like to look further into the interface side of things. I'm inclined
> to suggest adding new callbacks for 2pc prepare, commit and rollback,
> and if the output plugin doesn't set them fall back to the existing
> behaviour. Plugins that aren't interested in 2PC (think ETL) should
> probably not have to deal with it, we might as well just send them
> only the actually committed xacts, when they commit.
> 

I think this is a good approach to handle it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Faster methods for getting SPI results

2017-03-02 Thread Jon Nelson
On Thu, Mar 2, 2017 at 10:03 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/20/16 23:14, Jim Nasby wrote:
> > I've been looking at the performance of SPI calls within plpython.
> > There's a roughly 1.5x difference from equivalent python code just in
> > pulling data out of the SPI tuplestore. Some of that is due to an
> > inefficiency in how plpython is creating result dictionaries, but fixing
> > that is ultimately a dead-end: if you're dealing with a lot of results
> > in python, you want a tuple of arrays, not an array of tuples.
>
> There is nothing that requires us to materialize the results into an
> actual list of actual rows.  We could wrap the SPI_tuptable into a
> Python object and implement __getitem__ or __iter__ to emulate sequence
> or mapping access.
>

Python objects have a small (but non-zero) overhead in terms of both memory
and speed. A built-in dictionary is probably one of the least-expensive
(memory/cpu) choices, although how the dictionary is constructed also
impacts performance.  Another choice is a tuple.

Avoiding Py_BuildValue(...) in exchange for a bit more complexity (via
PyTuple_New(..) and PyTuple_SetItem(...)) is also a nice performance win in
my experience.
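
For example (a sketch of the pattern with placeholder names, not plpython's
actual code):

    PyObject   *row = PyTuple_New(natts);
    int         i;

    for (i = 0; i < natts; i++)
    {
        /* per-column conversion; a float8 column is assumed here */
        PyObject   *val = PyFloat_FromDouble(float_values[i]);

        PyTuple_SetItem(row, i, val);   /* steals the reference to val */
    }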

-- 
Jon


Re: [HACKERS] Cleanup: avoid direct use of ip_posid/ip_blkid

2017-03-02 Thread Peter Eisentraut
On 2/22/17 08:38, Pavan Deolasee wrote:
> One reason why these macros are not always used is because they
> typically do assert-validation to ensure ip_posid has a valid value.
> There are a few places in code, especially in GIN and also when we are
> dealing with user-supplied TIDs when we might get a TID with invalid
> ip_posid. I've handled that by defining and using separate macros which
> skip the validation. This doesn't seem any worse than what we are
> already doing.

I wonder why we allow that.  Shouldn't the tid type reject input that
has ip_posid == 0?
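
Something like the following check in tidin() would do it (a sketch of the
suggestion, not existing code):

    if (offsetNumber == InvalidOffsetNumber)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid tid offset number: 0")));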

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-03-02 Thread Masahiko Sawada
On Thu, Mar 2, 2017 at 11:56 AM, vinayak
 wrote:
>
> On 2017/02/28 16:54, Masahiko Sawada wrote:
>
> I've created a wiki page[1] describing about the design and
> functionality of this feature. Also it has some examples of use case,
> so this page would be helpful for even testing. Please refer it if
> you're interested in testing this feature.
>
> [1] 2PC on FDW
> 
>
> Thank you for creating the wiki page.

Thank you for looking at this patch.

> In the "src/test/regress/pg_regress.c" file
> -* xacts.  (Note: to reduce the probability of unexpected
> shmmax
> -* failures, don't set max_prepared_transactions any higher
> than
> -* actually needed by the prepared_xacts regression test.)
> +* xacts. We also set max_fdw_transctions to enable testing
> of atomic
> +* foreign transactions. (Note: to reduce the probability of
> unexpected
> +* shmmax failures, don't set max_prepared_transactions or
> +* max_prepared_foreign_transactions any higher than
> actually needed by the
> +* corresponding regression tests.).
>
> I think we are not setting the "max_fdw_transctions" anywhere.
> Is this correct?

This comment is out of date. Will fix.

>
> In the "src/bin/pg_waldump/rmgrdesc.c" file following header file used two
> times.
> + #include "access/fdw_xact.h"
> I think we need to remove one line.
>

Not necessary. Will get rid of it.

Since these are not feature bugs I will incorporate these when making
update version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Backport of pg_statistics typos fix

2017-03-02 Thread Peter Eisentraut
On 2/8/17 14:54, Robert Haas wrote:
> BTW, looking at that commit, this change looks to have adjusted this
> from being wrong to still being wrong:
> 
> -Allow pg_statistics to be reset by calling
> pg_stat_reset() (Christopher)
> +Allow pg_statistic to be reset by calling
> pg_stat_reset() (Christopher)
> 
> It's true that pg_stat_reset() doesn't reset the nonexistent
> pg_statistics table.  But it doesn't reset pg_statistic either.  IIUC,
> it resets the data gathered by the statistics collector, which is
> something else altogether.

Fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Faster methods for getting SPI results

2017-03-02 Thread Peter Eisentraut
On 12/20/16 23:14, Jim Nasby wrote:
> I've been looking at the performance of SPI calls within plpython. 
> There's a roughly 1.5x difference from equivalent python code just in 
> pulling data out of the SPI tuplestore. Some of that is due to an 
> inefficiency in how plpython is creating result dictionaries, but fixing 
> that is ultimately a dead-end: if you're dealing with a lot of results 
> in python, you want a tuple of arrays, not an array of tuples.

There is nothing that requires us to materialize the results into an
actual list of actual rows.  We could wrap the SPI_tuptable into a
Python object and implement __getitem__ or __iter__ to emulate sequence
or mapping access.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Enabling replication connections by default in pg_hba.conf

2017-03-02 Thread Peter Eisentraut
On 2/3/17 17:47, Michael Paquier wrote:
> On Fri, Feb 3, 2017 at 4:59 AM, Simon Riggs  wrote:
>>> It's weirdly inconsistent now.  You need a "replication" line in
>>> pg_hba.conf to connect for logical decoding, but you can't restrict that
>>> to a specific database because the database column in pg_hba.conf is
>>> occupied by the "replication" key word.
>> Agreed. Change needed.
> That sounds really apealling indeed after thinking about its
> implications. So we would simply authorize a WAL sender sending
> "replication" to connect if the user name matches. That's in short
> check_db() in hba.c.

In

patch 0006 it is proposed to no longer use the "replication" keyword in
pg_hba.conf for logical
replication and use the normal database entries instead.

However, I don't think we can reasonably get rid of the replication
keyword for physical replication.  Say if you have a pg_hba.conf like

host  db1  someusers  ...
host  db2  someusers  ...
host  db3  someusers  ...

how would you decide access for physical replication?  Since physical
replication is not to a database, you need a way to call it out
separately if your pg_hba.conf style is to enumerate databases.

What we could do to make things simpler is to include "replication" in
the "all" keyword.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speedup twophase transactions

2017-03-02 Thread David Steele
Nikhil,

On 2/27/17 12:19 AM, Nikhil Sontakke wrote:
> Hi Michael,
> 
> Thanks for taking a look at the patch.
> 
> twophase.c: In function ‘PrepareRedoAdd’:
> twophase.c:2539:20: warning: variable ‘gxact’ set but not used
> [-Wunused-but-set-variable]
>   GlobalTransaction gxact;
> There is a warning at compilation.
> 
> Will fix. 

<...>

Do you know when you will have a new patch ready?

It would be great to get this thread closed out after 14 months and many
commits.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] Proposal for changes to recovery.conf API

2017-03-02 Thread David Steele
Hi Simon,

On 2/25/17 2:43 PM, Simon Riggs wrote:
> On 25 February 2017 at 13:58, Michael Paquier  
> wrote:
> 
>> - trigger_file is removed.
>> FWIW, my only complain is about the removal of trigger_file, this is
>> useful to detect a trigger file on a different partition that PGDATA!
>> Keeping it costs also nothing..
> 
> Sorry, that is just an error of implementation, not intention. I had
> it on my list to keep, at your request.
> 
> New version coming up.

Do you have an idea when the new version will be available?

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] delta relations in AFTER triggers

2017-03-02 Thread David Steele
Hi Kevin,

On 2/20/17 10:43 PM, Thomas Munro wrote:
> On Tue, Feb 21, 2017 at 7:14 AM, Thomas Munro
>  wrote:
>> On Fri, Jan 20, 2017 at 2:49 AM, Kevin Grittner  wrote:
>>> Attached is v9 which fixes bitrot from v8.  No other changes.
>>>
>>> Still needs review.
> 
> Based on a suggestion from Robert off-list I tried inserting into a
> delta relation from a trigger function and discovered that it
> segfaults:
> 
>   * frame #0: 0x0001057705a6
> postgres`addRangeTableEntryForRelation(pstate=0x7fa58186a4d0,
> rel=0x, alias=0x, inh='\0',
> inFromCl='\0') + 70 at parse_relation.c:1280 [opt]
> frame #1: 0x00010575bbda
> postgres`setTargetTable(pstate=0x7fa58186a4d0,
> relation=0x7fa58186a098, inh=, alsoSource='\0',
> requiredPerms=1) + 90 at parse_clause.c:199 [opt]
> frame #2: 0x000105738530 postgres`transformStmt [inlined]
> transformInsertStmt(pstate=) + 69 at analyze.c:540 [opt]
> frame #3: 0x0001057384eb
> postgres`transformStmt(pstate=, parseTree=)
> + 2411 at analyze.c:279 [opt]
> frame #4: 0x000105737a42
> postgres`transformTopLevelStmt(pstate=,
> parseTree=0x7fa58186a438) + 18 at analyze.c:192 [opt]
> frame #5: 0x0001059408d0
> postgres`pg_analyze_and_rewrite_params(parsetree=0x7fa58186a438,
> query_string="insert into d values (100, 100, 'x')",
> parserSetup=(plpgsql.so`plpgsql_parser_setup at pl_comp.c:1017),
> parserSetupArg=0x7fa58185c2a0, queryEnv=0x7fa581857798) + 128
> at postgres.c:706 [opt]

Do you know when you will have a new patch ready?

This looks like an interesting and important feature but I think it's
going to have to come together quickly if it's going to make it into v10.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] [PATCH] kNN for btree

2017-03-02 Thread David Steele
Hi Alexander,

On 2/16/17 11:20 AM, Robert Haas wrote:
> On Thu, Feb 16, 2017 at 10:59 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Thu, Feb 16, 2017 at 8:05 AM, Alexander Korotkov
>>>  wrote:
 My idea is that we need more general redesign of specifying ordering which
 index can produce.  Ideally, we should replace amcanorder, amcanbackward 
 and
 amcanorderbyop with single callback.  Such callback should take a list of
 pathkeys and return number of leading pathkeys index could satisfy (with
 corresponding information for index scan).  I'm not sure that other hackers
 would agree with such design, but I'm very convinced that we need something
 of this level of extendability.  Otherwise we would have to hack our 
 planner
 <-> index_access_method interface each time we decide to cover another 
 index
 produced ordering.
>>
>>> Yeah.  I'm not sure if that's exactly the right idea.  But it seems
>>> like we need something.
>>
>> That's definitely not exactly the right idea, because using it would
>> require the core planner to play twenty-questions trying to guess which
>> pathkeys the index can satisfy.  ("Can you satisfy some prefix of this
>> pathkey list?  How about that one?")  It could be sensible to have a
>> callback that's called once per index and hands back a list of pathkey
>> lists that represent interesting orders the index could produce, which
>> could be informed by looking aside at the PlannerInfo contents to see
>> what is likely to be relevant to the query.
>>
>> But even so, I'm not convinced that that is a better design or more
>> maintainable than the current approach.  I fear that it will lead to
>> duplicating substantial amounts of code and knowledge into each index AM,
>> which is not an improvement; and if anything, that increases the risk of
>> breaking every index AM anytime you want to introduce some fundamentally
>> new capability in the area.  Now that it's actually practical to have
>> out-of-core index AMs, that's a bigger concern than it might once have
>> been.
> 
> Yeah, that's all true.  But I think Alexander is right that just
> adding amcandoblah flags ad infinitum doesn't feel good either.  The
> interface isn't really arm's-length if every new thing somebody wants
> to do requires another flag.
> 
>> Also see the discussion that led up to commit ed0097e4f.  Users objected
>> the last time we tried to make index capabilities opaque at the SQL level,
>> so they're not going to like a design that tries to hide that information
>> even from the core C code.
> 
> Discoverability is definitely important, but first we have to figure
> out how we're going to make it work, and then we can work out how to
> let users see how it works.

Reading through this thread I'm concerned that this appears to be a big
change making its first appearance in the last CF.  There is also the
need for a new patch and a general consensus on how to proceed.

I recommend moving this patch to 2017-07 or marking it RWF.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] multivariate statistics (v25)

2017-03-02 Thread Tomas Vondra

On 03/02/2017 07:42 AM, Kyotaro HORIGUCHI wrote:

Hello,

At Thu, 2 Mar 2017 04:05:34 +0100, Tomas Vondra  wrote 
in 

OK,

attached is v24 of the patch series, addressing most of the reported
issues and comments (at least I believe so). The main changes are:


Unfortunately, 0002 conflicts with the current master
(4461a9b). Could you rebase them or tell us the commit where this
patches stand on?



Attached is a rebased patch series, otherwise it's the same as v24.

FWIW it was based on 016c990834 from Feb 28, but apparently some recent 
patch caused a minor conflict.



I only saw the patch files but have some comments.


1) I've mostly abandoned the "multivariate" name in favor of
"extended", particularly in places referring to stats stored in the
pg_statistic_ext in general. "Multivariate" is now used only in places
talking about particular types (e.g. multivariate histograms).

The "extended" name is more widely used for this type of statistics,
and the assumption is that we'll also add other (non-multivariate)
types of statistics - e.g. statistics on custom expressions, or some
form of join statistics.


In 0005, and

@@ -184,14 +208,43 @@ clauselist_selectivity(PlannerInfo *root,
 * If there are no such stats or not enough attributes, don't waste time
 * simply skip to estimation using the plain per-column stats.
 */
+   if (has_stats(stats, STATS_TYPE_MCV) &&
...
+   /* compute the multivariate stats */
+   s1 *= clauselist_ext_selectivity(root, mvclauses, stat);

@@ -1080,10 +1136,71 @@ clauselist_ext_selectivity_deps(PlannerInfo *root, 
Index relid,
 }

 /*
+ * estimate selectivity of clauses using multivariate statistic

Are these comments left unchanged on purpose? 0007 adds very
similar text.



Hmm, those comments should probably be changed to "extended".


2) Catalog pg_mv_statistic was renamed to pg_statistic_ext (and
pg_mv_stats view renamed to pg_stats_ext).


FWIW, "extended statistic" would be abbreviated as
"ext_statistic" or "extended_stats". Why have you exchanged the
words?



Because this way it's clear it's a version of pg_statistic, and it will 
be sorted right next to it.



3) The structure of pg_statistic_ext was changed as proposed by
Alvaro, i.e. the boolean flags were removed and instead we have just a
single "char[]" column with list of enabled statistics.

4) I also got rid of the "mv" part in most variable/function/constant
names, replacing it by "ext" or something similar. Also mvstats.h got
renamed to stats.h.

5) Moved the files from src/backend/utils/mvstats to
backend/statistics.

6) Fixed the n_choose_k() overflow issues by using the algorithm
proposed by Dean. Also, use the simple formula for num_combinations().

7) I've tweaked data types for a few struct members (in stats.h). I've
kept most of the uint32 fields at the top level though, because int16
might not be large enough for large statistics and the overhead is
minimal (compared to the space needed e.g. for histogram buckets).


Some formulated proof or boundary-value test cases might be
needed (to prevent future trouble), or some defined behavior on
overflow might be enough. I believe all (or most) of the
overflow-able data has such behavior.



That is probably a good idea and I plan to do that.


The renames/changes were quite widespread, but I've done my best to
fix all the comments and various other places.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-03-02 Thread David Steele
Peter,

On 2/1/17 12:59 AM, Michael Paquier wrote:
> On Thu, Jan 26, 2017 at 8:16 AM, Tom Lane  wrote:
>> [ in the service of closing out this thread... ]
>>
>> Peter Geoghegan  writes:
>>> Finally, 0003-* is a Valgrind suppression borrowed from my parallel
>>> CREATE INDEX patch. It's self-explanatory.
>>
>> Um, I didn't find it all that self-explanatory.  Why wouldn't we want
>> to avoid writing undefined data?  I think the comment at least needs
>> to explain exactly what part of the written data might be uninitialized.
>> And I'd put the comment into valgrind.supp, too, not in the commit msg.
>>
>> Also, the suppression seems far too broad.  It would for instance
>> block any complaint about a write() invoked via an elog call from
>> any function invoked from any LogicalTape* function, no matter
>> how far removed.

It looks like we are waiting on a new patch.  Do you know when you will
have that ready?

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] [Doc fix] Wrong explanation about tsquery_phrase

2017-03-02 Thread Tom Lane
"Seki, Eiji"  writes:
> I found a wrong explanation about tsquery_phrase. I was slightly confused 
> when I tried to use it.

Yes, this was definitely an oversight in 028350f61; thanks for catching
it!  I think though that it would read better if it simply said
"distance", so I changed it that way.

regards, tom lane




[HACKERS] Questions about MergeAppend

2017-03-02 Thread Ronan Dunklau
Hello,

Looking into the MergeAppendPath generation, I'm a bit surprised by several 
things:

  - When generating the MergeAppendPath, we use a dummy path to take the sort
cost into account for non-sorted subpaths. This is called with the base
relation's tuple count instead of the subpath's estimated number of rows. This
tends to overestimate the sorting cost drastically, since the base relation
can be filtered, thus reducing the number of input tuples for the sorting
routine. Please find attached a trivial patch fixing this.
  - Why does it only generate a "fake" SortPath for sorting purposes, not
adding it to the subpath, and postpone the creation of the Sort plan node
until later? This also adds a bit of complexity when fixing up the sort cost
node later for EXPLAIN output.
  - Why do we only consider generating a MergeAppendPath for PathKeys for
which a sorted Path exists in any of the child relations? It seems to me there
could be an advantage in using a MergeAppend of explicitly sorted relations
over sorting an Append, in particular if every existing subpath can be sorted
in work_mem.


-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 324829690d..a99a3aeceb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1313,7 +1313,7 @@ create_merge_append_path(PlannerInfo *root,
 	  root,
 	  pathkeys,
 	  subpath->total_cost,
-	  subpath->parent->tuples,
+	  subpath->rows,
 	  subpath->pathtarget->width,
 	  0.0,
 	  work_mem,



Re: [HACKERS] Indirect indexes

2017-03-02 Thread David Steele
Hi Álvaro,

On 1/31/17 11:54 PM, Michael Paquier wrote:
> On Sat, Dec 31, 2016 at 7:35 AM, Alvaro Herrera
>  wrote:
>> Attached is v4, which fixes a couple of relatively minor bugs.  There
>> are still things to tackle before this is committable, but coding review
>> of the new executor node would be welcome.
> 
> Moved to CF 2017-03 because of a lack of reviews. The patch fails to
> apply and needs a rebase, so it is marked as "waiting on author".

It looks like we need a rebase if it's going to attract any reviewers.

Might I suggest this patch is not complete enough to be in the last CF
for a release?  I know it's been around for a while, but it is a major
feature and the consensus and completeness do not seem to be there.

I would encourage you to move this to 2017-07.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-02 Thread Robert Haas
On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan  wrote:
> * This scales based on output size (projected index size), not input
> size (heap scan input). Apparently, that's what we always do right
> now.

Actually, I'm not aware of any precedent for that. I'd just pass the
heap size to compute_parallel_workers(), leaving the index size as 0,
and call it good.  What you're doing now seems exactly backwards from
parallel query generally.

> So, the main factor that
> discourages parallel sequential scans doesn't really exist for
> parallel CREATE INDEX.

Agreed.

> We could always defer the cost model to another release, and only
> support the storage parameter for now, though that has disadvantages,
> some less obvious [4].

I think it's totally counter-intuitive that any hypothetical index
storage parameter would affect the degree of parallelism involved in
creating the index and also the degree of parallelism involved in
scanning it.  Whether or not other systems do such crazy things seems
to me to be beside the point.  I think if CREATE INDEX allows an explicit
specification of the degree of parallelism (a decision I would favor)
it should have a syntactically separate place for unsaved build
options vs. persistent storage parameters.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-03-02 Thread Amit Kapila
On Sun, Feb 26, 2017 at 10:16 PM, Robert Haas  wrote:
> On Wed, Feb 22, 2017 at 11:49 PM, Mithun Cy  
> wrote:
>> Hi all thanks,
>> I have tried to fix all of the comments given above with some more
>> code cleanups.
>
> While reading this patch tonight, I realized a serious problem with
> the entire approach, which is that this patch is supposing that we can
> read relation blocks for every database from a single worker that's
> not connected to any database.  I realize that I suggested that
> approach, but now I think it's broken, because the patch isn't taking
> any locks on the relations whose pages it is reading, and that is
> definitely going to break things.  While autoprewarm is busy sucking
> blocks into the shared buffer cache, somebody could be, for example,
> dropping one of those relations.  DropRelFileNodesAllBuffers and
> friends expect that nobody is going to be concurrently reading blocks
> back into the buffer cache because they hold AccessExclusiveLock, and
> they assume that anybody else who is touching it will hold at least
> AccessShareLock.  But this violates that assumption, and probably some
> others.
>
> This is not easy to fix.  The lock has to be taken based on the
> relation OID, not the relfilenode, but we don't have the relation OID
> in the dump file, and recording it there won't help, because the
> relfilenode can change under us if the relation is rewritten with
> CLUSTER or VACUUM FULL or relevant forms of ALTER TABLE.  I don't see
> a solution other than launching a separate worker for each database,
> which seems like it could be extremely expensive if there are many
> databases.  Also, I am pretty sure it's no good to take locks before
> recovery reaches a consistent state.

So we should move this loading of blocks to after recovery reaches a
consistent state, so that we can connect to a database.  To allow the
worker to take a lock, we need to dump the relation OID as well.  Is that
what you are envisioning to fix this problem?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS

2017-03-02 Thread David Steele
On 1/18/17 7:18 PM, Petr Jelinek wrote:
> On 10/01/17 17:33, Matheus de Oliveira wrote:
>>
>> On Mon, Jan 9, 2017 at 10:58 AM, Ashutosh Sharma > > wrote:
>>
>> > Also, should I add translations for that error message in other 
>> languages (I
>> > can do that without help of tools for pt_BR) or is that a latter 
>> process in
>> > the releasing?
>> >
>>
>> I think you should add it but i am not sure when it is done.
>>
>>
>> I'll ask one of the guys who work with pt_BR translations (I know him in
>> person).
> 
> Translations are not handled by patch author but by translation project
> so no need.
> 
>>
>> Attached a rebased version and with the docs update pointed by Ashutosh
>> Sharma.
>>
> 
> The patch looks good, the only thing I am missing is tab completion
> support for psql.

It looks like this patch is still waiting on an update for tab
completion in psql.

Do you know when you will have that patch ready?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Parallel bitmap heap scan

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 9:18 PM, Dilip Kumar  wrote:
> 0001- same as previous with some changes for freeing the shared memory stuff.
> 0002- nodeBitmapHeapScan refactoring, this applies independently
> 0003- actual parallel bitmap stuff applies on top of 0001 and 0002

0002 wasn't quite careful enough about the placement of #ifdef
USE_PREFETCH, but otherwise looks OK.  Committed after changing that
and getting rid of the local variable prefetch_iterator, which seemed
to be adding rather than removing complexity after this refactoring.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Print correct startup cost for the group aggregate.

2017-03-02 Thread Ashutosh Bapat
On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia  wrote:
> Hi,
>
> While reading through the cost_agg() I found that startup cost for the
> group aggregate is not correctly assigned. Due to this explain plan is
> not printing the correct startup cost.
>
> Without patch:
>
> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
> filler like '%foo%' group by aid;
>  QUERY PLAN
> -
>  GroupAggregate  (cost=81634.33..85102.04 rows=198155 width=12)
>Group Key: aid
>->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>  Sort Key: aid
>  ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
> width=8)
>Filter: (filler ~~ '%foo%'::text)
> (6 rows)
>
> With patch:
>
> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
> filler like '%foo%' group by aid;
>  QUERY PLAN
> -
>  GroupAggregate  (cost=82129.72..85102.04 rows=198155 width=12)
>Group Key: aid
>->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>  Sort Key: aid
>  ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
> width=8)
>Filter: (filler ~~ '%foo%'::text)
> (6 rows)
>

The reason why startup_cost = input_startup_cost and not
input_total_cost for aggregation by sorting is that we don't need the whole
input before the Group/Agg plan can produce its first row. But I think
setting startup_cost = input_startup_cost is also not exactly correct.
Before the plan can produce one row, it has to transit through all the
rows belonging to the group to which the first row belongs. On
average it has to scan (total number of rows)/(number of groups) rows
before producing the first aggregated row. startup_cost will be
input_startup_cost + the cost to scan (total number of rows)/(number of
groups) rows + the cost of transiting over those many rows. Total cost =
startup_cost + the cost of scanning and transiting through the remaining
number of input rows.
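
To put that in pseudo-code form (just a sketch of the rule described above,
not the current cost_agg() code; per_input_row_cost is a made-up placeholder
for whatever the planner charges per input row for scanning the sorted input
and running the transition functions):

static void
group_agg_cost_sketch(double input_startup_cost, double input_rows,
                      double num_groups, double per_input_row_cost,
                      double *startup_cost, double *total_cost)
{
    /* rows consumed, on average, before the first aggregated row is out */
    double      rows_per_group = input_rows / num_groups;

    *startup_cost = input_startup_cost + rows_per_group * per_input_row_cost;

    /* the remaining input is consumed by the time the last row is emitted */
    *total_cost = *startup_cost +
        (input_rows - rows_per_group) * per_input_row_cost;
}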

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-03-02 Thread amul sul
On Wed, Mar 1, 2017 at 3:50 PM, Yugo Nagata  wrote:

> []
>
> > I agree that a fixed number of partitions is unavoidable with modulo
> > hashing, but we can do it with another hashing technique.  Have you
> > thought about Linear hashing[1] or Consistent hashing[2]?  This will
> > allow us to add/drop partitions with minimal row movement.
>
> Thank you for the information on these hashing techniques. I'll look at
> them and try to allow the number of partitions to be changed.
>

Thanks for showing interest. I was also talking about this with Robert Haas
and hacking on it; here is what we came up with.

If we want to introduce hash partitioning without syntax contortions and with
minimal row movement while changing hash partitions (ADD-DROP/ATTACH-DETACH
operations), at first I thought we could pick linear hashing, because with
both of those hashing techniques we would only need to move approximately
tot_num_of_tuple/tot_num_of_partitions tuples when adding a new partition,
and no row movement is required when dropping a partition.

With further thinking and talking through the idea of using linear hashing
with my team, we realized that it has some problems, especially during pg_dump
and pg_upgrade. Both a regular pg_dump and the binary-upgrade version of
pg_dump which is used by pg_restore need to maintain the identity of the
partitions. We can't rely on things like OID order, which may be unstable, or
name order, which might not match the order in which partitions were added.
So somehow the partition position would need to be specified explicitly.

So later we came up with some syntax like this (just fyi, this doesn't add
any new keywords):

create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 4, remainder 0);
create table foo2 partition of foo with (modulus 8, remainder 1);  --
legal, modulus doesn't need to match
create table foo3 partition of foo with (modulus 8, remainder 4);  --
illegal, overlaps foo1

Here we need to enforce a rule that every modulus must be a factor of the
next larger modulus. So, for example, if you have a bunch of partitions that
all have modulus 5, you can add a new partition with modulus 10 or a new
partition with modulus 15, but you cannot add both a partition with modulus
10 and a partition with modulus 15, because 10 is not a factor of 15.
However, you could simultaneously use modulus 4, modulus 8, modulus 16, and
modulus 32 if you wished, because each modulus is a factor of the next larger
one. You could also use modulus 10, modulus 20, and modulus 60. But you could
not use modulus 10, modulus 15, and modulus 60, because while both of the
smaller moduli are factors of 60, it is not true that each is a factor of
the next.

Other advantages of this rule are:

1. Dropping (or detaching) and adding (or attaching) a partition can never
cause the rule to be violated.

2. We can easily build a tuple-routing data structure based on the largest
modulus.

For example: If the user has
partition 1 with (modulus 2, remainder 1),
partition 2 with (modulus 4, remainder 2),
partition 3 with (modulus 8, remainder 0) and
partition 4 with (modulus 8, remainder 4),

then we can build the following tuple routing array in the relcache:

== lookup table for hashvalue % 8 ==
0 => p3
1 => p1
2 => p2
3 => p1
4 => p4
5 => p1
6 => p2
7 => p1

3. It's also quite easy to test whether a proposed new partition overlaps with
any existing partition. Just build the mapping array and see if you ever end
up trying to assign a partition to a slot that's already been assigned to some
other partition.
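
To make points 2 and 3 concrete, here is a small standalone sketch (plain C,
written just for this mail and not taken from the attached patch; the names
HashBound and build_routing_array are invented).  It fills the lookup array
slot by slot and reports an overlap as soon as two partitions claim the same
slot, assuming every modulus divides the greatest modulus:

#include <stdio.h>

typedef struct
{
    int         modulus;
    int         remainder;
} HashBound;

/*
 * Fill slot_to_part[0 .. greatest_modulus - 1]; a tuple with hash value h is
 * routed to slot h % greatest_modulus.  Returns -1 if any two partitions
 * claim the same slot (i.e. they overlap), 0 otherwise.
 */
static int
build_routing_array(const HashBound *parts, int nparts,
                    int *slot_to_part, int greatest_modulus)
{
    int         s,
                i;

    for (s = 0; s < greatest_modulus; s++)
        slot_to_part[s] = -1;

    for (i = 0; i < nparts; i++)
    {
        for (s = parts[i].remainder; s < greatest_modulus; s += parts[i].modulus)
        {
            if (slot_to_part[s] >= 0)
                return -1;      /* overlap detected */
            slot_to_part[s] = i;
        }
    }
    return 0;
}

int
main(void)
{
    /* the four partitions from the example above */
    HashBound   parts[] = {{2, 1}, {4, 2}, {8, 0}, {8, 4}};
    int         slots[8];
    int         s;

    if (build_routing_array(parts, 4, slots, 8) == 0)
        for (s = 0; s < 8; s++)
            printf("%d => p%d\n", s, slots[s] + 1);
    return 0;
}

Running it prints the same mapping as the lookup table above (0 => p3,
1 => p1, and so on).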

We can still work on the proposed syntax - and I am open to suggestions.  One
more thought is to use FOR VALUES HAVING like:
CREATE TABLE foo1 PARTITION OF foo FOR VALUES HAVING (modulus 2, remainder
1);

But still more thoughts/inputs welcome here.

Attached patch implements former syntax, here is quick demonstration:

1.CREATE :
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 2, remainder 1);
create table foo2 partition of foo with (modulus 4, remainder 2);
create table foo3 partition of foo with (modulus 8, remainder 0);
create table foo4 partition of foo with (modulus 8, remainder 4);

2. Display parent table info:
postgres=# \d+ foo
                                       Table "public.foo"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | integer |           |          |         | plain    |              | 
 b      | text    |           |          |         | extended |              | 
Partition key: HASH (a)
Partitions: foo1 WITH (modulus 2, remainder 1),
            foo2 WITH (modulus 4, remainder 2),
            foo3 WITH (modulus 8, remainder 0),
            foo4 WITH (modulus 8, remainder 4)

3. Display child table info:
postgres=# \d+ foo1
Table "public.foo1"
 Column |  Type 

Re: [HACKERS] Partitioned tables and relfilenode

2017-03-02 Thread Robert Haas
On Thu, Mar 2, 2017 at 3:52 PM, Amit Langote
 wrote:
>> think we should omit this logic (and change the documentation to
>> match).  That is, a database-wide ANALYZE should update the statistics
>> for each child as well as for the parent.  Otherwise direct queries
>> against the children (and partitionwise joins, once we have that) are
>> going to go haywire.
>
> OK, done.  I updated both analyze.sgml and vacuum.sgml to be more up to
> date.  Both pages previously omitted materialized views.
>
> Attached updated patches.

Thanks, committed 0001 with a slight change to the wording.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Print correct startup cost for the group aggregate.

2017-03-02 Thread Rushabh Lathia
Hi,

While reading through the cost_agg() I found that startup cost for the
group aggregate is not correctly assigned. Due to this explain plan is
not printing the correct startup cost.

Without patch:

postgres=# explain select aid, sum(abalance) from pgbench_accounts where
filler like '%foo%' group by aid;
                              QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=81634.33..85102.04 rows=198155 width=12)
   Group Key: aid
   ->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
 Sort Key: aid
 ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
width=8)
   Filter: (filler ~~ '%foo%'::text)
(6 rows)

With patch:

postgres=# explain select aid, sum(abalance) from pgbench_accounts where
filler like '%foo%' group by aid;
                              QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=82129.72..85102.04 rows=198155 width=12)
   Group Key: aid
   ->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
 Sort Key: aid
 ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
width=8)
   Filter: (filler ~~ '%foo%'::text)
(6 rows)

PFA patch to correct the same.

Regards,
Rushabh Lathia
www.EnterpriseDB.com


fix_startup_cost_cost_agg.patch
Description: application/download



Re: [HACKERS] logical decoding of two-phase transactions

2017-03-02 Thread Craig Ringer
On 2 March 2017 at 16:20, Stas Kelvich  wrote:
>
>> On 2 Mar 2017, at 11:00, Craig Ringer  wrote:
>>
>> We already have it, because we just decoded the PREPARE TRANSACTION.
>> I'm preparing a patch revision to demonstrate this.
>
> Yes, we already have it, but if server reboots between commit prepared (all
> prepared state is gone) and decoding of this commit prepared then we lose
> that mapping, isn’t it?

I was about to explain how restart_lsn works again, and how that would
mean we'd always re-decode the PREPARE TRANSACTION before any COMMIT
PREPARED or ROLLBACK PREPARED on crash. But...

Actually, the way you've implemented it, that won't be the case. You
treat PREPARE TRANSACTION as a special-case of COMMIT, and the client
will presumably send replay confirmation after it has applied the
PREPARE TRANSACTION. In fact, it has to if we want 2PC to work with
synchronous replication. This will allow restart_lsn to advance to
after the PREPARE TRANSACTION record if there's no other older xact
and we see a suitable xl_running_xacts record. So we wouldn't decode
the PREPARE TRANSACTION again after restart.

Hm.

That's actually a pretty good reason to xlog the gid for 2pc rollback
and commit if we're at wal_level >= logical . Being able to advance
restart_lsn and avoid the re-decoding work is a big win.

Come to think of it, we have to advance the client replication
identifier as part of PREPARE TRANSACTION anyway, otherwise we'd try
to repeat and re-prepare the same xact on crash recovery.

Given that, I withdraw my objection to adding the gid to commit and
rollback xlog records, though it should only be done if they're 2pc
commit/abort, and only if XLogLogicalInfoActive().


>> BTW, I've been reviewing the patch in more detail. Other than a bunch
>> of copy-and-paste that I'm cleaning up, the main issue I've found is
>> that in DecodePrepare, you call:
>>
>>SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>>   parsed->nsubxacts, parsed->subxacts);
>>
>> but I am not convinced it is correct to call it at PREPARE TRANSACTION
>> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
>> xact's state when decoding it, but there might be later commits that
>> cannot yet see that state and shouldn't have it visible in their
>> snapshots.
>
> Agree, that is problem. That allows to decode this PREPARE, but after that
> it is better to mark this transaction as running in snapshot or perform 
> prepare
> decoding with some kind of copied-end-edited snapshot. I’ll have a look at 
> this.

Thanks.

It's also worth noting that with your current approach, 2PC xacts will
produce two calls to the output plugin's commit() callback, once for
the PREPARE TRANSACTION and another for the COMMIT PREPARED or
ROLLBACK PREPARED, the latter two with a faked-up state. I'm not a
huge fan of that. It's not entirely backward compatible since it
violates the previously safe assumption that there's a 1:1
relationship between begin and commit callbacks with no interleaving,
for one thing, and I think it's also a bit misleading to send a
PREPARE TRANSACTION to a callback that could previously only receive a
true commit.

I particularly dislike calling a commit callback for an abort. So I'd
like to look further into the interface side of things. I'm inclined
to suggest adding new callbacks for 2pc prepare, commit and rollback,
and if the output plugin doesn't set them fall back to the existing
behaviour. Plugins that aren't interested in 2PC (think ETL) should
probably not have to deal with it; we might as well just send them
only the actually committed xacts, when they commit.
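
Just to sketch the direction (purely illustrative; these callback names are
invented here and are not part of the current output plugin API, and the
declarations assume the usual logical decoding headers):

typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
                                        ReorderBufferTXN *txn,
                                        XLogRecPtr prepare_lsn,
                                        const char *gid);

typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
                                               ReorderBufferTXN *txn,
                                               XLogRecPtr commit_lsn,
                                               const char *gid);

typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
                                              ReorderBufferTXN *txn,
                                              XLogRecPtr abort_lsn,
                                              const char *gid);

A plugin that leaves these unset would keep today's behaviour and see 2PC
transactions only as plain commits.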

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] SCRAM authentication, take three

2017-03-02 Thread Kyotaro HORIGUCHI
I'm studying Unicode normalization, so I apologize in advance for any
possible stupidity.

At Thu, 2 Mar 2017 15:50:34 +0900, Michael Paquier  
wrote in 
> On Tue, Feb 21, 2017 at 9:53 AM, Michael Paquier
>  wrote:
> > On Mon, Feb 20, 2017 at 9:41 PM, Aleksander Alekseev
> >  wrote:
> >>> Speaking about flaws, it looks like there is a memory leak in
> >>> array_to_utf procedure - result is allocated twice.
> >
> > Pushed a fix for this one on my branch.
> >
> >> And a few more things I've noticed after a closer look:
> >>
> >> * build_client_first_message does not free `state->client_nonce` if
> >>   second malloc (for `buf`) fails
> >> * same for `state->client_first_message_bare`
> >> * ... and most other procedures declared in fe-auth-scram.c file
> >>   (see malloc and strdup calls)
> >
> > You are visibly missing pg_fe_scram_free().
> >
> >> * scram_Normalize doesn't check malloc return value
> >
> > Yes, I am aware of this one. This makes the interface utterly ugly
> > though because an error log message needs to be handled across many
> > API layers. We could just assume anything returning NULL is equivalent
> > to an OOM and nothing else though.
> 
> Attached is a new patch set. I have combined SASLprep with the rest
> and fixed some conflicts. At the same time when going through NFKC
> this morning I have noticed that the implementation was doing the
> canonical decomposition and reordered the characters using the
> combining classes, but the string recomposition was still missing.
> This is addressed in this patch set, and well as on my dev tree:
> https://github.com/michaelpq/postgres/tree/scram

I looked into this and have some comments. Sorry for the random
order.


The composition version should be written somewhere.


Perhaps one of the most important things is that the code exactly
reflects the TR. pg_utf8_check_string returns true for ASCII
strings but the TR says that

| Text containing only ASCII characters (U+ to U+007F) is left
| unaffected by all of the normalization forms. This is
| particularly important for programming languages


And running SASLprep on an apparently all-ASCII string (which is the
most common case) is a waste of CPU cycles.
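
Something like this would do for a fast path (only a sketch, not code from
the patch; the function name is made up):

#include <stdbool.h>

/*
 * A string containing only ASCII bytes is left unchanged by all of the
 * normalization forms, so SASLprep can be skipped for it entirely.
 */
static bool
string_is_ascii_only(const char *s)
{
    for (; *s != '\0'; s++)
    {
        if ((unsigned char) *s > 0x7F)
            return false;
    }
    return true;
}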


From the point of view of reflectivity (please someone teach me a more
appropriate word for this...), the code should basically be a copy of the
reference code as long as no performance degradation occurs. The Hangul part
of get_decomposed_size (and decompose_code, recompose_code) uses different
naming from the TR: hindex should be sindex and t should be tindex. Magic
numbers should have names from the TR.

* A bit later, I noticed that these are copies of charlint. If so,
  I would like a description of that.


The following comment is equivalent to "reordering in canonical
order", but the definition of "decomposition" already includes this
step. (I'm not sure it needs rewriting, since I actually could
understand it.)

> /*
>  * Now that the decomposition is done, apply the combining class
>  * for each multibyte character.
>  */


utf_sasl_prepare does canonical ordering in a slightly different way
than the TR. Overall it should produce a sequence of characters that
starts with combining class = 0 and is ordered by combining class.
The code does a stable bubble sort within each run of combining
characters and it seems to work the same way. (In short, no
problem found here.)


>* make the allocation of the recomposed string based on that assumption.
>*/
>   recomp_chars = (pg_wchar *) malloc(decomp_size * sizeof(int));
>   lastClass = -1; /* this eliminates a special check */

utf_sasl_prepare uses variable names with two naming
conventions. Is there any reason for the two?


> starterCh = recomp_chars[0] = decomp_chars[0];

starterCh reads as "starter channel"; why not "starter_char"?


Other than the magic numbers, I don't think that the following is
a good expression.

>   else if (start >= 0xAC00 && start < 0xD7A4 &&
>!((start - 0xAC00) % 28) &&
>code >= 0x11A8 && code < 0x11C3)

"!((start - 0xAC00) % 28)" is a similar of !strcmp(a, b) and it
is confusing. It would be better to be "((start - 0xAC00) % 28) == 0".

The last sub-condition "code >= 0x11A8 && code < 0x11C3"
corresponds to "(0 <= TIndex && TIndex <= TCount)". TIndex here
is (code - 0x11a7) and TCount = 28, so the two are not identical.

Totally the condition should be like the following.

>   else if (start >= 0xAC00 && start < 0xD7A4 &&
>((start - 0xAC00) % 28) == 0 &&
>code >= 0x11A7 && code <= 0x11C3)

The more preferable form is 

>   else if (start >= SBASE && start < (SBASE + SCOUNT) &&
>((start - SBASE) % TCOUNT) == 0 &&
>code >= TBASE && code <= (TBASE + TCOUNT))
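
(For reference, these are the constants of the TR's Hangul composition
algorithm, restated here with the TR's names; this is standard data, not
code taken from the patch:)

#define SBASE   0xAC00              /* first precomposed Hangul syllable */
#define LBASE   0x1100              /* first leading consonant (choseong) */
#define VBASE   0x1161              /* first vowel (jungseong) */
#define TBASE   0x11A7              /* one less than first trailing consonant */
#define LCOUNT  19
#define VCOUNT  21
#define TCOUNT  28
#define NCOUNT  (VCOUNT * TCOUNT)   /* 588 */
#define SCOUNT  (LCOUNT * NCOUNT)   /* 11172; SBASE + SCOUNT == 0xD7A4 */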

"code <= (TBASE + TCOUNT)" somewhat smells. Then I found the
original code 

Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-02 Thread Robert Haas
On Thu, Mar 2, 2017 at 3:54 PM, Amit Kapila  wrote:
> On Thu, Mar 2, 2017 at 3:50 PM, Robert Haas  wrote:
>> On Tue, Feb 28, 2017 at 5:25 PM, Amit Kapila  wrote:
>>> When such a function (that contains statements which have parallel
>>> plans) is being executed as part of another parallel plan, it can
>>> allow spawning workers unboundedly.   Assume a query like  select *
>>> from t1 where c1 < func1(),  this can use parallel scan for t1 and
>>> then in master backend, during partial scan of t1, it can again spawn
>>> new set of workers for queries inside func1(), this can happen
>>> multiple times if parallel query inside func1() again calls some other
>>> function func2() which has parallel query.  Now, this might be okay,
>>> but today such a situation doesn't exist that Gather execution can
>>> invoke another Gather node, so it is worth to consider if we want to
>>> allow it.
>>
>> If we want to prohibit that, the check in standard_planner can be
>> changed from !IsParallelWorker() to !IsInParallelMode(), but I'm not
>> 100% sure whether that's an improvement or not.
>
> I am not sure how you can achieve that by just changing
> standard_planner() code, because the plans of statements inside pl can
> be cached in which case it will not try to regenerate the plan.

Oh, good point.

>>  I would be inclined
>> to leave it alone unless we get several votes to change it.
>
> Okay, not a problem.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] update comments about CatalogUpdateIndexes

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 10:45 PM, Tomas Vondra
 wrote:
> commit 2f5c9d9c9ce removed CatalogUpdateIndexes and replaced it with
> CatalogTupleInsert/CatalogTupleUpdate, which do both the operation and index
> update.
>
> But there remained three places in comments still referencing the removed
> CatalogUpdateIndexes. Attached is a patch fixing those places. It also
> removes the comment attribution to "bjm" from syscache.c, because after
> modifying it's no longer the original comment.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-03-02 Thread Fabien COELHO


After asking around, and testing with a colleague, we got the same strange 
size behavior on firefox mac as well.


There are 2 possible solutions:

(1) do not nest in the first place (i.e. apply the patch I sent).

(2) do use absolute sizes in the CSS, not relative ones like "1.3em"
   which accumulate multiplications when code appears in code,
   and count on the browser's ctrl-+/- for users to adjust size
   consistently to their needs.


(3) a CSS work-around, including the "!important" marker:

code code {
  font-size: 100% !important;
}

This is probably the minimum fuss solution.

--
Fabien.




Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-03-02 Thread Maksim Milyutin

On 02.03.2017 11:41, Robert Haas wrote:

Sounds generally good.  One thing to keep in mind is that - in this
system - a UNIQUE index on the parent doesn't actually guarantee
uniqueness across the whole partitioning hierarchy unless it so
happens that the index columns or expressions are the same as the
partitioning columns or expressions.  That's a little a
counterintuitive, and people have already been complaining that a
partitioned table + partitions doesn't look enough like a plain table.
However, I'm not sure there's a better alternative, because somebody
might want partition-wise unique indexes even though that doesn't
guarantee global uniqueness.  So I think if someday we have global
indexes, then we can plan to use some other syntax for that, like
CREATE GLOBAL [ UNIQUE ] INDEX.


Yes, I absolutely agree with your message that cross-partition 
uniqueness is guaranteed through a global index on the partitioned table, 
apart from the case when the index key is the same as the partitioning key 
(or the index comprises the partitioning key in general).


Thanks for your comment. I'll try to propose the first patches as soon 
as possible.


--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 10:28 PM, Brandur Leach  wrote:
> * I've been generating a new OID value with the
>   `unused_oids` script, but pretty much every time I rebase
>   I collide with someone else's addition and need to find a
>   new one. Is it better for me to pick an OID in an exotic
>   range for my final patch, or that a committer just finds
>   a new one (if necessary) as they're putting it into
>   master?

It's nice if you can do it before the committer gets to it, but if it
ends up conflicting I personally don't find fixing it to be a big
deal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-03-02 Thread Fabien COELHO



I'm not sure whether it is a stylesheet issue: it is the stylesheet as 
interpreted by chrome... all is fine with firefox. Whether the bug is in 
chrome or the stylesheet or elsewhere is well beyond my HTML/CSS skills, but 
I can ask around.


After asking around, and testing with a colleague, we got the same 
strange size behavior on firefox mac as well.


There are 2 possible solutions:

(1) do not nest in the first place (i.e. apply the patch I sent).

(2) do use absolute sizes in the CSS, not relative ones like "1.3em"
which accumulate multiplications when code appears in code,
and count on the browser's ctrl-+/- for users to adjust size
consistently to their needs.

--
Fabien.




Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-02 Thread Amit Kapila
On Thu, Mar 2, 2017 at 3:50 PM, Robert Haas  wrote:
> On Tue, Feb 28, 2017 at 5:25 PM, Amit Kapila  wrote:
>> When such a function (that contains statements which have parallel
>> plans) is being executed as part of another parallel plan, it can
>> allow spawning workers unboundedly.   Assume a query like  select *
>> from t1 where c1 < func1(),  this can use parallel scan for t1 and
>> then in master backend, during partial scan of t1, it can again spawn
>> new set of workers for queries inside func1(), this can happen
>> multiple times if parallel query inside func1() again calls some other
>> function func2() which has parallel query.  Now, this might be okay,
>> but today such a situation doesn't exist that Gather execution can
>> invoke another Gather node, so it is worth to consider if we want to
>> allow it.
>
> If we want to prohibit that, the check in standard_planner can be
> changed from !IsParallelWorker() to !IsInParallelMode(), but I'm not
> 100% sure whether that's an improvement or not.
>

I am not sure how you can achieve that by just changing
standard_planner() code, because the plans of statements inside pl can
be cached in which case it will not try to regenerate the plan.

>  I would be inclined
> to leave it alone unless we get several votes to change it.
>

Okay, not a problem.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partitioned tables and relfilenode

2017-03-02 Thread Amit Langote
On 2017/03/02 18:36, Robert Haas wrote:
> On Tue, Feb 28, 2017 at 11:35 AM, Amit Langote wrote:
>>> In acquire_inherited_sample_rows(), instead of inserting a whole
>>> stanza of logic just above the existing dispatch on relkind, I think
>>> we can get by with a very slightly update to what's already there.
>>>
>>> You can't use the result of a & b as a bool.  You need to write (a &
>>> b) != 0, because the bool should always use 1 for true and 0 for
>>> false; it should not set some higher-numbered bit.
>>
>> Oops, thanks for fixing that.  I suppose you are referring to this hunk in
>> the original patch:
>>
>> -relations = get_rel_oids(relid, relation);
>> +relations = get_rel_oids(relid, relation, options & VACOPT_VACUUM);
>>
>> And we need to do it this way in *this* case, because we're passing it as
>> a bool argument.  I see that it's OK to do this:
>>
>> stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
>>
>> Or this:
>>
>> if (options & VACOPT_VACUUM)
>> {
>> PreventTransactionChain(isTopLevel, stmttype);
> 
> In those cases it's still clearer, IMHO, to use != 0, but it's not
> necessary.  However, when you're explicitly creating a value of type
> "bool", then it's necessary.

Agreed.

> Actually, looking at this again, I now think this part is wrong:
> 
> +/*
> + * If only ANALYZE is to be performed, there is no need to 
> include
> + * partitions in the list.  In a database-wide ANALYZE, we only
> + * update the inheritance statistics of partitioned tables, not
> + * the statistics of individual partitions.
> + */
> +if (!is_vacuum && classForm->relispartition)
>  continue;
> 
> I was thinking earlier that an ANALYZE on the parent would also update
> the statistics for each child, but now I see that's not so.  So now I

Yep, the patch enables ANALYZE to be propagated to partitions when the
parent table is specified in the command.  The above logic in the patch
made the database-wide ANALYZE ignore partitions, in which case only
the inheritance statistics would be updated.  I can also see why that'd be
undesirable.

> think we should omit this logic (and change the documentation to
> match).  That is, a database-wide ANALYZE should update the statistics
> for each child as well as for the parent.  Otherwise direct queries
> against the children (and partitionwise joins, once we have that) are
> going to go haywire.

OK, done.  I updated both analyze.sgml and vacuum.sgml to be more up to
date.  Both pages previously omitted materialized views.

Attached updated patches.

Thanks,
Amit
>From 32447566c1c3bf5201e36141590f4654d144f777 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 28 Feb 2017 13:43:59 +0900
Subject: [PATCH 1/3] Avoid useless partitioned table ops

Also, recursively perform VACUUM and ANALYZE on partitions when the
command is applied to a partitioned table.
---
 doc/src/sgml/ddl.sgml| 47 ++--
 doc/src/sgml/ref/analyze.sgml|  7 --
 doc/src/sgml/ref/vacuum.sgml |  4 +++-
 src/backend/commands/analyze.c   | 37 +++--
 src/backend/commands/tablecmds.c | 15 +---
 src/backend/commands/vacuum.c| 51 +---
 6 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index ef0f7cf727..09b5b3ff70 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3792,8 +3792,7 @@ UNION ALL SELECT * FROM measurement_y2008m01;
Caveats
 

-The following caveats apply to partitioned tables implemented using either
-method (unless noted otherwise):
+The following caveats apply to using inheritance to implement partitioning:

 
  
@@ -3803,13 +3802,6 @@ UNION ALL SELECT * FROM measurement_y2008m01;
   partitions and creates and/or modifies associated objects than
   to write each by hand.
  
-
- 
-  This is not a problem with partitioned tables though, as trying to
-  create a partition that overlaps with one of the existing partitions
-  results in an error, so it is impossible to end up with partitions
-  that overlap one another.
- 
 
 
 
@@ -3822,14 +3814,6 @@ UNION ALL SELECT * FROM measurement_y2008m01;
   on the partition tables, but it makes management of the structure
   much more complicated.
  
-
- 
-  This problem exists even for partitioned tables.  An UPDATE
-  that causes a row to move from one partition to another fails, because
-  the new value of the row fails to satisfy the implicit partition
-  constraint of the original partition.  This might change in future
-  releases.
- 
 
 
 
@@ -3840,8 +3824,7 @@ UNION ALL SELECT * FROM measurement_y2008m01;
 
 ANALYZE measurement;
 
-  will only process the master table.  This 

Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 5:25 PM, Amit Kapila  wrote:
> When such a function (that contains statements which have parallel
> plans) is being executed as part of another parallel plan, it can
> allow spawning workers unboundedly.   Assume a query like  select *
> from t1 where c1 < func1(),  this can use parallel scan for t1 and
> then in master backend, during partial scan of t1, it can again spawn
> new set of workers for queries inside func1(), this can happen
> multiple times if parallel query inside func1() again calls some other
> function func2() which has parallel query.  Now, this might be okay,
> but today such a situation doesn't exist that Gather execution can
> invoke another Gather node, so it is worth to consider if we want to
> allow it.

If we want to prohibit that, the check in standard_planner can be
changed from !IsParallelWorker() to !IsInParallelMode(), but I'm not
100% sure whether that's an improvement or not.  I would be inclined
to leave it alone unless we get several votes to change it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-02 Thread Fabien COELHO


Hello Corey,


Tom was pretty adamant that invalid commands are not executed. So in a case
like this, with ON_ERROR_STOP off:

\if false
\echo 'a'
\elif true
\echo 'b'
\elif invalid
\echo 'c'
\endif

Both 'b' and 'c' should print, because "\elif invalid" should not execute.
The code I had before was simpler, but it missed that.


Hmmm. You can still have it with one switch, by repeating the evaluation 
under true and ignore, even if the value is not used:


  switch(state)
  {
case NONE: error;
case ELSE_TRUE: error;
case ELSE_FALSE: error;
case IF_TRUE:
if (eval())
  ...
else error;
break;
case IF_FALSE:
if (eval())
  ...
else error;
break;
case IGNORE:
if (eval())
  ...
else error;
break;
}


Ok, so here's one idea I tossed around, maybe this will strike the right
balance for you.  If I create a function like this: [...]

Does that handle your objections?


For me, it is only slightly better: I think that for helping understanding 
and maintenance, the automaton state transitions should be all clear and 
loud in just one place, so I would really like to see a single common 
structure:


  if (is "if") switch on all states;
  else if (is "elif") switch on all states;
  else if (is "else") switch on all states;
  else if (is "endif") switch on all states;

And minimal necessary error handling around that.

Your suggestion does not achieve this, although I agree that the code 
structure would be cleaner thanks to the function.



p.s.  Do we try to avoid constructs like if (success = my_function(var1,
var2)) ?


I think it is allowed because I found some of them with grep (libpq, ecpg, 
postmaster, pg_dump, pg_upgrade...). They require added parentheses around 
the assignment:


  if ((success = eval())) ...

--
Fabien.




Re: [HACKERS] [Doc fix] Wrong explanation about tsquery_phrase

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 3:24 PM, Seki, Eiji  wrote:
> I found a wrong explanation about tsquery_phrase. I was slightly confused 
> when I tried to use it.
>
> The current document explains tsquery_phrase as the followings[1].
>
> - Function: tsquery_phrase(query1 tsquery, query2 tsquery, distance integer)
> - Return Type: tsquery
> - Description: make query that searches for query1 followed by query2 at 
> maximum distance distance
>
> However, 'exact distance' seems to be right instead of 'maximum distance'.
>
> This was probably overlooked in the following commit.
>
> 028350f619f7688e0453fcd2c4b25abe9ba30fa7

I think you are correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Documentation improvements for partitioning

2017-03-02 Thread Robert Haas
On Mon, Feb 27, 2017 at 5:14 PM, Simon Riggs  wrote:
>> I like the idea of merging what are now two chapters into one and call it
>> Partitioned Tables, retaining the text that describes concepts
>
> +1
>
> ...but how?
>
> 5.10 Partitioned Tables and Related Solutions
> 5.10.1 Declarative Partitioning (this new feature)
> 5.10.2 Managing Partitions using Inheritance
> 5.10.3 Managing Partitions using Union All Views
> 5.10.4 Accessing tables using BRIN indexes
>
> So first and foremost we highlight the new feature and explain all its
> strengths with examples.
>
> We then explain the other possible ways of implementing something
> similar. This allows us to explain how to handle cases such as when
> partitions have different set of columns etc..
>
> I'm happy to put my name down to write the sections on Union All
> Views, which is useful but only mentioned in passing, and  the section
> on BRIN indexes, all of which would have their own independent sets of
> caveats.

I like the proposed 5.10.1 and 5.10.2 organization.  I am not sure
whether 5.10.3 and 5.10.4 make sense because I can't quite imagine
what content would go there.  We don't document UNION ALL views as a
method today, and I'm not sure we really need to start.  Also I don't
see what BRIN indexes have to do with partitioning.  But I may be
missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: new high availability feature for the system with both asynchronous and synchronous replication

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 10:26 AM, Higuchi, Daisuke
 wrote:
> I create POC patch for my proposal of new feature for high availability.
> I want to discuss about this feature. But this feature might be PG11
> because discussion is not enough.
>
> This patch enables the walsender for async to wait until the walsender for
> sync confirms that WAL is flushed to disk. This feature is activated when
> the GUC parameter "async_walsender_delay" is set on.

So this new option makes asynchronous replication synchronous?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partitioned tables and relfilenode

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 11:35 AM, Amit Langote
 wrote:
> How about the documentation changes in the attached updated 0001?  I know
> that this page needs a much larger rewrite as we are discussing in the
> other thread.

Looks good.

>> In acquire_inherited_sample_rows(), instead of inserting a whole
>> stanza of logic just above the existing dispatch on relkind, I think
>> we can get by with a very slightly update to what's already there.
>>
>> You can't use the result of a & b as a bool.  You need to write (a &
>> b) != 0, because the bool should always use 1 for true and 0 for
>> false; it should not set some higher-numbered bit.
>
> Oops, thanks for fixing that.  I suppose you are referring to this hunk in
> the original patch:
>
> -relations = get_rel_oids(relid, relation);
> +relations = get_rel_oids(relid, relation, options & VACOPT_VACUUM);
>
> And we need to do it this way in *this* case, because we're passing it as
> a bool argument.  I see that it's OK to do this:
>
> stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
>
> Or this:
>
> if (options & VACOPT_VACUUM)
> {
> PreventTransactionChain(isTopLevel, stmttype);

In those cases it's still clearer, IMHO, to use != 0, but it's not
necessary.  However, when you're explicitly creating a value of type
"bool", then it's necessary.

Actually, looking at this again, I now think this part is wrong:

+/*
+ * If only ANALYZE is to be performed, there is no need to include
+ * partitions in the list.  In a database-wide ANALYZE, we only
+ * update the inheritance statistics of partitioned tables, not
+ * the statistics of individual partitions.
+ */
+if (!is_vacuum && classForm->relispartition)
 continue;

I was thinking earlier that an ANALYZE on the parent would also update
the statistics for each child, but now I see that's not so.  So now I
think we should omit this logic (and change the documentation to
match).  That is, a database-wide ANALYZE should update the statistics
for each child as well as for the parent.  Otherwise direct queries
against the children (and partitionwise joins, once we have that) are
going to go haywire.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Disallowing multiple queries per PQexec()

2017-03-02 Thread Surafel Temesgen
As far as I understand, the issue at that time was the inability to create a
database and then connect to it within one query string, and that can be
solved either by fixing the transaction restriction checks for CREATE
DATABASE or by disallowing multiple queries in PQexec.

If that issue is solved, and allowing multiple queries in PQexec does not
enable SQL injection attacks serious enough to be worth a
backwards-compatibility breakage by itself, then the item can be dropped, or
moved to the v4 Protocol section if that section already contains items that
break backwards compatibility.


regards

surafel

On Thu, Mar 2, 2017 at 1:02 AM, Jim Nasby  wrote:

> On 2/28/17 2:45 PM, Andres Freund wrote:
>
>> So if you don't want to allow multiple statements, use PQexecParams et
>> al.
>>
>
> That does leave most application authors out in the cold though, since
> they're using a higher level connection manager.
>
> If the maintenance burden isn't terribly high it would be nice to allow
> disabling multiple statements via a GUC.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)
>
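For readers less familiar with the libpq calls being discussed, here is a
minimal standalone sketch (not from any patch; the connection string is a
placeholder) contrasting PQexec, whose simple-protocol query string may
contain several semicolon-separated statements, with PQexecParams, which
uses the extended protocol and accepts exactly one statement:

/* Minimal libpq sketch; not from any patch. */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* PQexec sends the whole string as one simple-protocol query, so both
     * statements run; only the result of the last one is returned. */
    res = PQexec(conn, "SELECT 1; SELECT 2;");
    printf("PQexec: %s\n", PQresStatus(PQresultStatus(res)));
    PQclear(res);

    /* PQexecParams uses the extended protocol, which allows only a single
     * statement, so the same string fails with an error. */
    res = PQexecParams(conn, "SELECT 1; SELECT 2;",
                       0, NULL, NULL, NULL, NULL, 0);
    printf("PQexecParams: %s\n", PQresStatus(PQresultStatus(res)));
    PQclear(res);

    PQfinish(conn);
    return 0;
}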


Re: [HACKERS] [POC] hash partitioning

2017-03-02 Thread Yugo Nagata
Hi Aleksander,

Thank you for reviewing the patch.

On Wed, 1 Mar 2017 17:08:49 +0300
Aleksander Alekseev  wrote:

> Hi, Yugo.
> 
> Today I've had an opportunity to take a closer look at this patch. Here are
> a few things that bother me.
> 
> 1a) There are missing comments here:
> 
> ```
> --- a/src/include/catalog/pg_partitioned_table.h
> +++ b/src/include/catalog/pg_partitioned_table.h
> @@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
> charpartstrat;  /* partitioning strategy */
> int16   partnatts;  /* number of partition key columns */
> 
> +   int16   partnparts;
> +   Oid parthashfunc;
> +
> ```
> 
> 1b) ... and here:
> 
> ```
> --- a/src/include/nodes/parsenodes.h
> +++ b/src/include/nodes/parsenodes.h
> @@ -730,11 +730,14 @@ typedef struct PartitionSpec
> NodeTag type;
> char   *strategy;   /* partitioning strategy ('list' or 'range') 
> */
> List   *partParams; /* List of PartitionElems */
> +   int partnparts;
> +   List   *hashfunc;
> int location;   /* token location, or -1 if unknown */
>  } PartitionSpec;
> ```

OK, I'll add comments for these members.
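As a rough illustration only (the field meanings are taken from this thread;
the actual wording is of course up to the patch author), the comments might
look something like this:

/* Illustrative only; not the actual patch text. */
int16       partnparts;     /* number of hash partitions (used only when
                             * partstrat is the hash strategy) */
Oid         parthashfunc;   /* OID of the hash function used to map rows
                             * to partitions (hash strategy only) */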

> 
> 2) I believe new empty lines in patches are generally not welcomed by the
> community:
> 
> ```
> @@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
> pg_node_tree partexprs; /* list of expressions in the partition key;
>  * one item for each zero entry in 
> partattrs[] */
>  #endif
> +
> +
>  } FormData_pg_partitioned_table;
> ```

I'll remove it from the patch.

> 
> 3) One test fails on my laptop (Arch Linux, x64) [1]:
> 
> ```
> ***
> *** 344,350 
>   CREATE TABLE partitioned (
>   a int
>   ) PARTITION BY HASH (a);
> ! ERROR:  unrecognized partitioning strategy "hash"
>   -- specified column must be present in the table
>   CREATE TABLE partitioned (
>   a int
> --- 344,350 
>   CREATE TABLE partitioned (
>   a int
>   ) PARTITION BY HASH (a);
> ! ERROR:  number of partitions must be specified for hash partition
>   -- specified column must be present in the table
>   CREATE TABLE partitioned (
>   a int
> ```

This is the expected behavior in the current patch. However, there
is ongoing discussion about the CREATE TABLE specification, so
it may change.

> 
> The exact script I'm using for building and testing PostgreSQL can be
> found here [2].
> 
> 4) As I already mentioned - missing documentation.

I think writing the documentation should wait until the specification
reaches a consensus.

> 
> In general the patch looks quite good to me. I personally believe it has all
> the changes needed to be accepted in the current commitfest. Naturally, that
> is provided the community comes to a consensus regarding keywords, whether
> all partitions should be created automatically, etc. :)
> 
> [1] http://afiskon.ru/s/dd/20cbe21934_regression.diffs.txt
> [2] http://afiskon.ru/s/76/a4fb71739c_full-build.sh.txt
> 
> On Wed, Mar 01, 2017 at 06:10:10PM +0900, Yugo Nagata wrote:
> > Hi Aleksander,
> > 
> > On Tue, 28 Feb 2017 18:05:36 +0300
> > Aleksander Alekseev  wrote:
> > 
> > > Hi, Yugo.
> > > 
> > > Looks like a great feature! I'm going to take a closer look at your code
> > > and write up feedback shortly. For now I can only say that you forgot
> > > to include any documentation in the patch.
> > 
> > Thank you for looking into it. I'm looking forward to your feedback.
> > This is a proof-of-concept patch, so additional documentation
> > is not included. I'll add it after we reach a consensus
> > on the specification of the feature.
> > 
> > > 
> > > I've added a corresponding entry to the current commitfest [1]. Hope you
> > > don't mind. If it's not too much trouble, could you please register on the
> > > commitfest site and add yourself to this entry as an author? I'm pretty
> > > sure someone uses this information when writing the release notes or
> > > something like that.
> > 
> > Thank you for registering it in the commitfest. I have added myself as an
> > author.
> > 
> > > 
> > > [1] https://commitfest.postgresql.org/13/1059/
> > > 
> > > On Tue, Feb 28, 2017 at 11:33:13PM +0900, Yugo Nagata wrote:
> > > > Hi all,
> > > > 
> > > > Now we have declarative partitioning, but hash partitioning is not
> > > > implemented yet. Attached is a POC patch to add a hash partitioning
> > > > feature. I know we will need more discussion about the syntax and other
> > > > specifics before going ahead with the project, but I think this runnable
> > > > code might help the discussion of what to implement and how.
> > > > 
> > > > * Description
> > > > 
> > > > In this patch, the hash partitioning implementation is basically based
> > > > on the list partitioning mechanism. However, partition bounds cannot be
> > > > specified explicitly; instead, the bound is used internally as the hash
> > > > partition index, which is calculated 

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-03-02 Thread Robert Haas
On Wed, Mar 1, 2017 at 4:23 PM, Maksim Milyutin
 wrote:
> As I understand from thread [1], the main issue with creating local indexes
> for partitions is supporting REINDEX and DROP INDEX operations on the parent
> partitioned tables. Furthermore, Robert Haas mentioned the problem of
> creating an index on a key that is represented in each partition by a single
> value (or a primitive interval) [1], i.e. under list partitioning or range
> partitioning with a unit interval.
>
> I would like to propose the following solution:
>
> 1. Create indexes for the hierarchy of partitioned tables and partitions
> recursively. Don't create a relfilenode for indexes on parents, only entries
> in the catalog (much like the partitioned table's storage elimination in [2]).
> The abstract index on a partitioned table exists only to reference the
> indexes of child tables so that REINDEX and DROP INDEX can be performed.
>
> 2. Record the created indexes in the pg_depend table so that indexes of child
> tables depend on the corresponding indexes of parent tables with dependency
> type DEPENDENCY_NORMAL, so that an index can be removed separately for a
> partition and recursively/separately for a partitioned table.
>
> 3. REINDEX on the index of a partitioned table would perform the operation on
> the existing indexes of the corresponding partitions. In this case it is
> necessary to consider operations such as REINDEX SCHEMA | DATABASE | SYSTEM
> so that partitions' indexes aren't reindexed multiple times in a row.
>
> Any thoughts?

Sounds generally good.  One thing to keep in mind is that - in this
system - a UNIQUE index on the parent doesn't actually guarantee
uniqueness across the whole partitioning hierarchy unless it so
happens that the index columns or expressions are the same as the
partitioning columns or expressions.  That's a little
counterintuitive, and people have already been complaining that a
partitioned table + partitions doesn't look enough like a plain table.
However, I'm not sure there's a better alternative, because somebody
might want partition-wise unique indexes even though that doesn't
guarantee global uniqueness.  So I think if someday we have global
indexes, then we can plan to use some other syntax for that, like
CREATE GLOBAL [ UNIQUE ] INDEX.

But, of course, that's just my opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-02 Thread Stas Kelvich

> On 2 Mar 2017, at 11:00, Craig Ringer  wrote:
> 
> We already have it, because we just decoded the PREPARE TRANSACTION.
> I'm preparing a patch revision to demonstrate this.

Yes, we already have it, but if the server reboots between COMMIT PREPARED
(at which point all prepared state is gone) and the decoding of that COMMIT
PREPARED, then we lose that mapping, don't we?

> BTW, I've been reviewing the patch in more detail. Other than a bunch
> of copy-and-paste that I'm cleaning up, the main issue I've found is
> that in DecodePrepare, you call:
> 
>SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>   parsed->nsubxacts, parsed->subxacts);
> 
> but I am not convinced it is correct to call it at PREPARE TRANSACTION
> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
> xact's state when decoding it, but there might be later commits that
> cannot yet see that state and shouldn't have it visible in their
> snapshots. 

Agreed, that is a problem. It allows us to decode this PREPARE, but after that
it is better to mark this transaction as running in the snapshot, or to perform
the PREPARE decoding with some kind of copied-and-edited snapshot. I'll have a look at this.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] patch: function xmltable

2017-03-02 Thread Pavel Stehule
2017-03-02 8:04 GMT+01:00 Pavel Stehule :

> Hi
>
> 2017-03-02 1:12 GMT+01:00 Alvaro Herrera :
>
>>
>> I've been giving this a look.  I started by tweaking the docs once
>> again, and while verifying that the example works as expected, I
>> replayed what I have in sgml:
>>
>> ... begin SGML paste ...
>> 
>>  For example, given the following XML document:
>>   
>>
>>  the following query produces the result shown below:
>>
>> 

Re: [HACKERS] logical decoding of two-phase transactions

2017-03-02 Thread Craig Ringer
On 2 March 2017 at 16:00, Craig Ringer  wrote:

> What about if we ROLLBACK PREPARED after
> we made the snapshot visible?

Yeah, I'm pretty sure that's going to be a problem actually.

You're telling the snapshot builder that an xact committed at PREPARE
TRANSACTION time.

If we then ROLLBACK PREPARED, we're in a mess. It looks like it'll
cause issues with catalogs, user-catalog tables, etc.

I suspect we need to construct a temporary snapshot to decode PREPARE
TRANSACTION then discard it. If we later COMMIT PREPARED we should
perform the current steps to merge the snapshot state in.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-02 Thread Craig Ringer
On 2 March 2017 at 15:27, Stas Kelvich  wrote:
>
>> On 2 Mar 2017, at 01:20, Petr Jelinek  wrote:
>>
>> The info gets removed on COMMIT PREPARED but at that point
>> there is no real difference between replicating it as 2pc or 1pc since
>> the 2pc behavior is for all intents and purposes lost at that point.
>>
>
> If we are doing 2PC and COMMIT PREPARED happens, then we should replicate
> that to the receiving servers without the transaction body, since the
> transaction is already prepared on them with some GID. So we need a way to
> construct that GID.

We already have it, because we just decoded the PREPARE TRANSACTION.
I'm preparing a patch revision to demonstrate this.

BTW, I've been reviewing the patch in more detail. Other than a bunch
of copy-and-paste that I'm cleaning up, the main issue I've found is
that in DecodePrepare, you call:

SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
   parsed->nsubxacts, parsed->subxacts);

but I am not convinced it is correct to call it at PREPARE TRANSACTION
time, only at COMMIT PREPARED time. We want to see the 2pc prepared
xact's state when decoding it, but there might be later commits that
cannot yet see that state and shouldn't have it visible in their
snapshots. Imagine, say

BEGIN;
ALTER TABLE t ADD COLUMN ...
INSERT INTO t ...
PREPARE TRANSACTION 'x';

BEGIN;
INSERT INTO t ...;
COMMIT;

COMMIT PREPARED 'x';

We want to see the new column when decoding the prepared xact, but
_not_ when decoding the subsequent xact between the prepare and
commit. This particular case cannot occur because the lock held by
ALTER TABLE blocks the INSERT in the other xact, but how sure are you
that there are no other snapshot issues that could arise if we promote
a snapshot to visible early? What about if we ROLLBACK PREPARED after
we made the snapshot visible?

The tests don't appear to cover logical decoding 2PC sessions that do
DDL at all. I emphasised that that would be one of the main problem
areas when we originally discussed this. I'll look at adding some,
since I think this is one of the areas that's most likely to find
issues.


> It seems that for the last ~10 messages I have been failing to explain some
> points about this topic. Or maybe I'm failing to understand some points. Can
> we maybe set up a Skype call to discuss this and post a summary here? Craig? Peter?

Let me prep an updated patch. Time zones make it rather hard to do
voice; I'm in +0800 Western Australia, Petr is in +0200...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

