Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread David G. Johnston
On Sun, Apr 3, 2016 at 9:49 PM, Igal @ Lucee.org  wrote:

> On 4/3/2016 4:34 PM, Dave Cramer wrote:
>
>
> On 4/3/2016 8:21 AM, Dave Cramer wrote:
>
>>
>> I'd like to turn this question around. Are there good reasons to use -ng
>> over pgjdbc ?
>>
>> As to your question, you may be interested to know that pgjdbc is more
>> performant than ng.
>>
>> That's good to know, but unfortunately pgjdbc is unusable for us until
>> https://github.com/pgjdbc/pgjdbc/issues/488 is fixed.
>>
>> Also, as I mentioned in the ticket, I can't imagine RETURNING * being
>> performant if, for example, I INSERT a large chunk of data like an image
>> data or an uploaded file.
>>
>>
> Thanks for the reminder!
>
> So I"m guessing the reason to use ng is to avoid returning * ?
>
>
> I'm not sure if you're serious or if you're just trying to be "cute".
> This ticket should still be fixed.  It really doesn't make any sense to me
> that the driver will just blindly append "RETURNING *" to the query.
>
> If I want to return all of the columns from an UPDATE or an INSERT -- then
> I will add "RETURNING *" myself.  And if I don't add it, then I probably
> don't want the driver to second guess me, or to think that it knows better
> than I do what I want.  If I wanted software that thinks that it knows what
> I want better than I do -- then I would stick with SQL Server rather than
> switch to Postgres.
>
> The driver used to work until someone decided to append "RETURNING *" to
> the SQL code and make it unusable in many cases.
>
> Was there any discussion on this before it was added?
>

​Except the main problem you describe is one where you WANT the driver to
be smart and understand that even though you've asked it to return
generated keys, the statement you've provided is one that is incapable
of doing so.  Thus you do want it to interpret what you've told it and
to do what you mean and not what you say.

Obviously the problem is solvable - you yourself have said others have
solved it.  That is one piece of good news - the other piece is that
PostgreSQL, and the JDBC driver in question, are open source software.

Somehow the driver needs to determine, reliably and ideally inexpensively,
how to implement this requirement from the JDBC spec:

"This parameter is ignored if the SQL statement is not an INSERT statement,
or an SQL statement able to return auto-generated keys (the list of such
statements is vendor-specific)."

https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int)
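
Concretely, the behavior being complained about looks something like this
(table and column names are made up for illustration):

    -- what the application prepares with RETURN_GENERATED_KEYS:
    INSERT INTO uploads (name, data) VALUES ($1, $2)
    -- what the driver currently sends:
    INSERT INTO uploads (name, data) VALUES ($1, $2) RETURNING *
    -- what would avoid shipping a large "data" column back:
    INSERT INTO uploads (name, data) VALUES ($1, $2) RETURNING id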

Discussions and patches exploring how to go about that are welcomed.

I do think that issue 488 needs to separate out and fix the non-conformance
to the API that is present - namely, not ignoring the "int" argument when
the supplied statement is not capable (i.e., not an INSERT statement). I
posted that (and a bit more) on the issue itself.

​David J.
​


Re: [HACKERS] pgbench more operators & functions

2016-04-03 Thread Simon Riggs
On 4 April 2016 at 01:14, Michael Paquier  wrote:


> I'd say why not.
>

I'd say "why not wait?". Minor, non-urgent patches will definitely go
nowhere for a long time, so it gains nobody to submit now.

Submitting patches during freeze has been discouraged for many years, so
asking a long term contributor to avoid sending multiple minor patches is
in line with that.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Igal @ Lucee.org

On 4/3/2016 4:34 PM, Dave Cramer wrote:


On 4/3/2016 8:21 AM, Dave Cramer wrote:



I'd like to turn this question around. Are there good reasons to
use -ng over pgjdbc ?

As to your question, you may be interested to know that pgjdbc is
more performant than ng.

That's good to know, but unfortunately pgjdbc is unusable for us
until
https://github.com/pgjdbc/pgjdbc/issues/488 is fixed.

Also, as I mentioned in the ticket, I can't imagine RETURNING *
being performant if, for example, I INSERT a large chunk of data
like an image data or an uploaded file.


Thanks for the reminder!

So I"m guessing the reason to use ng is to avoid returning * ?


I'm not sure if you're serious or if you're just trying to be "cute".  
This ticket should still be fixed.  It really doesn't make any sense to 
me that the driver will just blindly append "RETURNING *" to the query.


If I want to return all of the columns from an UPDATE or an INSERT -- 
then I will add "RETURNING *" myself.  And if I don't add it, then I 
probably don't want the driver to second guess me, or to think that it 
knows better than I do what I want.  If I wanted software that thinks 
that it knows what I want better than I do -- then I would stick with 
SQL Server rather than switch to Postgres.


The driver used to work until someone decided to append "RETURNING *" to 
the SQL code and make it unusable in many cases.


Was there any discussion on this before it was added?

Igal Sapir
Lucee Core Developer
Lucee.org 




Re: [HACKERS] Proposal: Generic WAL logical messages

2016-04-03 Thread Petr Jelinek

Hi,

I rebased this patch on top of current master, as the generic WAL commit 
added some conflicting changes. Also fixed a couple of typos in comments 
and added a non-ASCII message to the test.
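
For context, the SQL-level interface the patch adds is exercised in the
regression test below; a minimal call looks like:

    -- emit a transactional logical message (the first argument selects
    -- transactional vs. non-transactional emission):
    SELECT pg_logical_emit_message(true, 'test', 'tx logical msg');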


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From b74e6dc5956446d514130c575263f4ba6ad71db3 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 24 Feb 2016 17:02:36 +0100
Subject: [PATCH] Logical Decoding Messages

---
 contrib/test_decoding/Makefile  |   2 +-
 contrib/test_decoding/expected/ddl.out  |  21 ++--
 contrib/test_decoding/expected/messages.out |  79 
 contrib/test_decoding/sql/ddl.sql   |   3 +-
 contrib/test_decoding/sql/messages.sql  |  25 +
 contrib/test_decoding/test_decoding.c   |  18 
 doc/src/sgml/func.sgml  |  45 +
 doc/src/sgml/logicaldecoding.sgml   |  38 
 src/backend/access/rmgrdesc/Makefile|   6 +-
 src/backend/access/rmgrdesc/logicalmsgdesc.c|  41 
 src/backend/access/transam/rmgr.c   |   1 +
 src/backend/replication/logical/Makefile|   2 +-
 src/backend/replication/logical/decode.c|  46 +
 src/backend/replication/logical/logical.c   |  38 
 src/backend/replication/logical/logicalfuncs.c  |  27 ++
 src/backend/replication/logical/message.c   |  87 +
 src/backend/replication/logical/reorderbuffer.c | 121 
 src/backend/replication/logical/snapbuild.c |  19 
 src/bin/pg_xlogdump/.gitignore  |  21 +---
 src/bin/pg_xlogdump/rmgrdesc.c  |   1 +
 src/include/access/rmgrlist.h   |   1 +
 src/include/catalog/pg_proc.h   |   4 +
 src/include/replication/logicalfuncs.h  |   2 +
 src/include/replication/message.h   |  41 
 src/include/replication/output_plugin.h |  13 +++
 src/include/replication/reorderbuffer.h |  22 +
 src/include/replication/snapbuild.h |   2 +
 27 files changed, 693 insertions(+), 33 deletions(-)
 create mode 100644 contrib/test_decoding/expected/messages.out
 create mode 100644 contrib/test_decoding/sql/messages.sql
 create mode 100644 src/backend/access/rmgrdesc/logicalmsgdesc.c
 create mode 100644 src/backend/replication/logical/message.c
 create mode 100644 src/include/replication/message.h

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 06c9546..309cb0b 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -38,7 +38,7 @@ submake-test_decoding:
 	$(MAKE) -C $(top_builddir)/contrib/test_decoding
 
 REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
-	decoding_into_rel binary prepared replorigin time
+	decoding_into_rel binary prepared replorigin time messages
 
 regresscheck: | submake-regress submake-test_decoding temp-install
 	$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 77719e8..32cd24d 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -220,11 +220,17 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 (7 rows)
 
 /*
- * check that disk spooling works
+ * check that disk spooling works (also for logical messages)
  */
 BEGIN;
 CREATE TABLE tr_etoomuch (id serial primary key, data int);
 INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+SELECT 'tx logical msg' FROM pg_logical_emit_message(true, 'test', 'tx logical msg');
+    ?column?    
+----------------
+ tx logical msg
+(1 row)
+
 DELETE FROM tr_etoomuch WHERE id < 5000;
 UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
 COMMIT;
@@ -233,12 +239,13 @@ SELECT count(*), min(data), max(data)
 FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
 GROUP BY substring(data, 1, 24)
 ORDER BY 1,2;
- count |   min   |  max   
+-+
- 1 | BEGIN   | BEGIN
- 1 | COMMIT  | COMMIT
- 20467 | table public.tr_etoomuch: DELETE: id[integer]:1 | table public.tr_etoomuch: UPDATE: id[integer]: data[integer]:-
-(3 rows)
+ count |  min  |  max   

Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Corey Huinker
On Sun, Apr 3, 2016 at 8:42 PM, Corey Huinker  wrote:

> On Sun, Apr 3, 2016 at 7:43 PM, Tom Lane  wrote:
>
>> Corey Huinker  writes:
>> >>> + The secondary queries are executed in top-to-bottom,
>> >>> left-to-right order, so the command
>>
>> >> I took that as meaning what I said above.
>>
>> > Would using the term https://en.wikipedia.org/wiki/Row-major_order be
>> > more clear?
>>
>> Meh, I suspect a lot of people don't know that term.  Perhaps something
>> like "The generated queries are executed in the order in which the rows
>> are returned, and left-to-right within each row if there is more than one
>> column."
>>
>> regards, tom lane
>>
>
>
> I like it. Change forthcoming.
>


Changes since last submission:

Patch attached. Changes are thus:
- rebased
- pset.gexec_flag unconditionally set to false at end of SendQuery
- wording of documentation describing execution order of results
- rebasing allowed for undoing the re-wrap of enumerated slash commands.

Still not changed:
- exuberant braces, can remove if someone wants me to
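
For reference, the kind of metaquery this enables (the catalog query here
is just illustrative):

    -- generate and immediately run VACUUM ANALYZE for every table in a schema:
    SELECT format('VACUUM ANALYZE %I.%I', schemaname, tablename)
    FROM pg_tables
    WHERE schemaname = 'public'
    \gexec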
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index e8afc24..1fb4b55 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1767,6 +1767,92 @@ Tue Oct 26 21:40:57 CEST 1999
   
 
   
+\gexec
+
+
+
+ Sends the current query input buffer to the server and treats
+ every column of every row of query output (if any) as a separate
+ SQL statement to be immediately executed. For example:
+
+=&gt; SELECT 'select 1 as ones', 'select x.y, x.y*2 as double from generate_series(1,4) as x(y)'
+-&gt; UNION ALL
+-&gt; SELECT 'select true as is_true', 'select ''2000-01-01''::date as party_over'
+-&gt; \gexec
+ones
+------
+   1
+(1 row)
+
+y double
+- --
+1  2
+2  4
+3  6
+4  8
+(4 rows)
+
+is_true
+---
+t
+(1 row)
+
+party_over
+--
+01-01-2000
+(1 row)
+
+
+
+The generated queries are executed in the order in which the rows are returned, and
+left-to-right within each row if there is more than one column. So, the command
+above is the equivalent of:
+
+=&gt; select 1 as ones;
+=&gt; select x.y, x.y*2 as double from generate_series(1,4) as x(y);
+=&gt; select true as is_true;
+=&gt; select '2000-01-01'::date as party_over;
+
+
+
+If the query returns no rows, no error is raised, but no secondary query
+is executed, either.
+
+=&gt; SELECT 'select 1 as expect_zero_rows ' where false
+-&gt; \gexec
+
+
+
+
+Results that are not valid SQL will of course fail, and the execution of further
+secondary statements is subject to the current \ON_ERROR_STOP setting.
+
+=&gt; SELECT 'a', 'select 1', 'b'
+-&gt; \gexec
+ERROR:  syntax error at or near "a"
+LINE 1: a
+^
+?column?
+----------
+   1
+(1 row)
+ERROR:  syntax error at or near "b"
+LINE 1: b
+^
+=&gt; \set ON_ERROR_STOP 1
+=&gt; SELECT 'a', 'select 1', 'b'
+-&gt; \gexec
+ERROR:  syntax error at or near "a"
+LINE 1: a
+^
+
+
+The results of the main query are sent directly to the server, without
+evaluation by psql. Therefore, they cannot contain psql vars or \ commands.
+
+
+  
+  
 \gset [ prefix ]
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 3401b51..1baff8e 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -871,6 +871,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/* \gset [prefix] -- send query and store result into variables */
else if (strcmp(cmd, "gset") == 0)
{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a2a07fb..0db5de2 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -796,6 +796,46 @@ StoreQueryTuple(const PGresult *result)
return success;
 }
 
+/*
+ * ExecQueryTuples: assuming query result is OK, execute every query
+ * result as its own statement
+ *
+ * Returns true if successful, false otherwise.
+ */
+static bool
+ExecQueryTuples(const PGresult *result)
+{
+   bool    success = true;
+   int nrows = PQntuples(result);
+   int ncolumns = PQnfields(result);
+   int r, c;
+
+   for (r = 0; r < nrows; r++)
+   {
+   for (c = 0; c < ncolumns; c++)
+   {
+   if (! PQgetisnull(result, r, c))
+   {
+   if ( ! SendQuery(PQgetvalue(result, r, c)) )
+   {
+   

Re: [HACKERS] Default Roles (was: Additional role attributes)

2016-04-03 Thread Stephen Frost
Fujii,

* Fujii Masao (masao.fu...@gmail.com) wrote:
> On Tue, Jul 14, 2015 at 3:46 AM, Stephen Frost  wrote:
> > Possibly, but I'd need to look at them more closely than I have time to
> > right now.  Can you provide a use-case?  That would certainly help.
> 
> I have seen the monitoring system which periodically executes
> the statement like
> 
> SELECT * FROM pgstattuple('hoge');
> 
> to monitor the relation's physical length, the number of dead
> tuples, etc. Then, for example, if the number of dead tuples is
> increased unexpectedly and the relation becomes bloated, DBA tries
> to find out the cause and execute the maintenance commands
> if necessary to alleviate the situation. The monitoring system calls
> pgstattuple() at off-peak times because pgstattuple() needs to
> scan all the pages in the relation and its performance penalty
> might be big.
[...]
> Currently only superusers can call pgstattuple().

I started looking into this.

If we were starting from a green field, the pg_dump dump catalog ACLs
patch would work just fine for this case.  Simply remove the superuser
checks and REVOKE EXECUTE from public in the script and we're done.
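
In a green field, the 1.4 extension script change would be roughly this
(function signature and grantee role are illustrative):

    REVOKE EXECUTE ON FUNCTION pgstattuple(regclass) FROM PUBLIC;
    -- a superuser could then selectively re-grant, e.g. to a monitoring role:
    GRANT EXECUTE ON FUNCTION pgstattuple(regclass) TO monitoring;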

Unfortunately, we aren't, and that's where things get complicated.  The
usual pg_upgrade case will, quite correctly, dump out the objects
exactly as they exist from the 9.5-or-earlier system and restore them
into the 9.6 system, however, the new .so will be installed and that .so
won't have the superuser checks in it.

The only approach to addressing this which I can think of offhand would
be to have the new .so library check the version of the extension and,
for the 1.3 (pre-9.6) and previous versions, keep the superuser check,
but skip it for 1.4 (9.6) and later versions.

I'm certainly open to other suggestions, of course.  Would be great to
remove those superuser() checks and allow non-superusers to be GRANT'd
the right to run those functions, as discussed.

Thanks!

Stephen




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Dave Cramer
On 3 April 2016 at 22:20, Stephen Frost  wrote:

> * Craig Ringer (cr...@2ndquadrant.com) wrote:
> > On 4 April 2016 at 10:13, Dave Cramer  wrote:
> > > Async notification is the easier part, I wasn't aware that the ssl
> library
> > > had this problem though
> >
> > AFAIK the issue is that even if there are bytes available on the
> underlying
> > socket, the SSL lib doesn't know if that means there are bytes readable
> > from the wrapped SSL socket. The traffic on the underlying socket could
> be
> > renegotiation messages or whatever.
> >
> > We really need non-blocking reads.
>
> That would certainly be a good way to address this, but I'm guessing
> it's non-trivial to implement.
>
>
AFAICT, the ng driver still has to generate traffic as well.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com



> Thanks!
>
> Stephen
>


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 4 April 2016 at 10:13, Dave Cramer  wrote:
> > Async notification is the easier part, I wasn't aware that the ssl library
> > had this problem though
>
> AFAIK the issue is that even if there are bytes available on the underlying
> socket, the SSL lib doesn't know if that means there are bytes readable
> from the wrapped SSL socket. The traffic on the underlying socket could be
> renegotiation messages or whatever.
> 
> We really need non-blocking reads.

That would certainly be a good way to address this, but I'm guessing
it's non-trivial to implement.

Thanks!

Stephen




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Stephen Frost
Dave,

* Dave Cramer (p...@fastcrypt.com) wrote:
> On 3 April 2016 at 21:56, Stephen Frost  wrote:
> > * Dave Cramer (p...@fastcrypt.com) wrote:
> > > On 3 April 2016 at 15:35, Stephen Frost  wrote:
> > > > Not generally much of a JDBC user myself, but the inability to avoid
> > > > polling for LISTEN notifications is a pretty big annoyance, which I
> > just
> > > > ran into with a client.  I understand that -ng has a way to avoid that,
> > > > even for SSL connections.
> > >
> > > Yes, it is a custom api. Easy enough to add. Is this something of
> > interest ?
> >
> > I'd say that there is definite interest in this and there's a lot of
> > conversation about it on the interwebs (stackoverflow, etc).
> >
> > My understanding is that the problem is actually with the SSL library
> > that the JDBC driver uses and that it basically lies about whether there
> > are bytes available for reading (claiming that there never are by always
> > returning zero).  The -ng driver, as I understand it, uses a newer SSL
> > library which better supports asking whether there are bytes available
> > to read.
>
> Hmmm. that complicates things...
> 
> Async notification is the easier part, I wasn't aware that the ssl library
> had this problem though

Right.  It's not sufficient to simply poll the JDBC driver to see if
there are notifications currently, you have to actually generate traffic
between the client and the server, to force the driver to read from the
SSL library and discover any notifications which have arrived from the
server.  That can be done by issuing an all-whitespace command, which
the server will respond to with an EmptyQueryMessage (iirc), but you
can't simply have the Java side sit in a select() loop or similar
waiting for notifications to arrive.
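
On the application side, the resulting pattern looks roughly like this
(channel name made up):

    LISTEN my_channel;
    -- ...then, on a timer, issue any cheap statement (the driver can even
    -- send an all-whitespace query) so that pending notifications get read
    -- off the socket and surfaced:
    SELECT 1;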

Thanks!

Stephen




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Craig Ringer
On 4 April 2016 at 10:13, Dave Cramer  wrote:


> Async notification is the easier part, I wasn't aware that the ssl library
> had this problem though
>
>
AFAIK the issue is that even if there are bytes available on the underlying
socket, the SSL lib doesn't know if that means there are bytes readable
from the wrapped SSL socket. The traffic on the underlying socket could be
renegotiation messages or whatever.

We really need non-blocking reads.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Dave Cramer
On 3 April 2016 at 21:56, Stephen Frost  wrote:

> Dave,
>
> * Dave Cramer (p...@fastcrypt.com) wrote:
> > On 3 April 2016 at 15:35, Stephen Frost  wrote:
> > > Not generally much of a JDBC user myself, but the inability to avoid
> > > polling for LISTEN notifications is a pretty big annoyance, which I
> just
> > > ran into with a client.  I understand that -ng has a way to avoid that,
> > > even for SSL connections.
> >
> > Yes, it is a custom api. Easy enough to add. Is this something of
> interest ?
>
> I'd say that there is definite interest in this and there's a lot of
> conversation about it on the interwebs (stackoverflow, etc).
>
> My understanding is that the problem is actually with the SSL library
> that the JDBC driver uses and that it basically lies about whether there
> are bytes available for reading (claiming that there never are by always
> returning zero).  The -ng driver, as I understand it, uses a newer SSL
> library which better supports asking whether there are bytes available
> to read.
>
>
Hmmm. that complicates things...

Async notification is the easier part, I wasn't aware that the ssl library
had this problem though



Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Stephen Frost
Dave,

* Dave Cramer (p...@fastcrypt.com) wrote:
> On 3 April 2016 at 15:35, Stephen Frost  wrote:
> > Not generally much of a JDBC user myself, but the inability to avoid
> > polling for LISTEN notifications is a pretty big annoyance, which I just
> > ran into with a client.  I understand that -ng has a way to avoid that,
> > even for SSL connections.
>
> Yes, it is a custom api. Easy enough to add. Is this something of interest ?

I'd say that there is definite interest in this and there's a lot of
conversation about it on the interwebs (stackoverflow, etc).

My understanding is that the problem is actually with the SSL library
that the JDBC driver uses and that it basically lies about whether there
are bytes available for reading (claiming that there never are by always
returning zero).  The -ng driver, as I understand it, uses a newer SSL
library which better supports asking whether there are bytes available
to read.

Thanks!

Stephen




Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-04-03 Thread Michael Paquier
On Fri, Apr 1, 2016 at 10:02 AM, Michael Paquier
 wrote:
> On Fri, Apr 1, 2016 at 4:14 AM, Alvaro Herrera  wrote:
>> Noah Misch wrote:
>>
>>> The above-described topic is currently a PostgreSQL 9.6 open item.  Alvaro,
>>> since you committed the patch believed to have created it, you own this open
>>> item.
>>
>> That's correct.  Since we already had a patch available, I pushed it.
>> I'll wait for a few days before marking the open item as closed in the
>> wiki, to make sure that hamster reports success a few times.
>
> Thanks. I just did a couple of additional manual tests on hamster, and
> the sporadic failure does not show up anymore, so the daily runs
> should be in good shape now for this test.

The failing test has passed 7 days in a row, so I am marking this
issue as fixed.
-- 
Michael




Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Peter Geoghegan
On Sun, Apr 3, 2016 at 4:08 PM, Greg Stark  wrote:
>> SELECT * FROM (SELECT * FROM (SELECT a FROM int_test UNION SELECT a
>> FROM int_test_padding) gg OFFSET 1e10) ff;
>
> ISTM OFFSET binds more loosely than UNION so these should be equivalent.

Not exactly:

postgres=# explain analyze select i from fff union select i from ggg
offset 1e10;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Limit  (cost=357771.51..357771.51 rows=1 width=4) (actual time=2989.378..2989.378 rows=0 loops=1)
   ->  Unique  (cost=345771.50..357771.51 rows=242 width=4) (actual time=2031.044..2930.903 rows=151 loops=1)
         ->  Sort  (cost=345771.50..351771.51 rows=242 width=4) (actual time=2031.042..2543.167 rows=242 loops=1)
               Sort Key: fff.i
               Sort Method: external merge  Disk: 32840kB
               ->  Append  (cost=0.00..58620.04 rows=242 width=4) (actual time=0.048..435.408 rows=242 loops=1)
                     ->  Seq Scan on fff  (cost=0.00..14425.01 rows=101 width=4) (actual time=0.048..100.435 rows=101 loops=1)
                     ->  Seq Scan on ggg  (cost=0.00..20195.01 rows=141 width=4) (actual time=0.042..138.991 rows=141 loops=1)
 Planning time: 0.123 ms
 Execution time: 2999.564 ms
(10 rows)

postgres=# explain analyze select * from (select i from fff union
select i from ggg) fg offset 1e10;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Limit  (cost=381771.53..381771.53 rows=1 width=4) (actual time=2982.519..2982.519 rows=0 loops=1)
   ->  Unique  (cost=345771.50..357771.51 rows=242 width=4) (actual time=2009.176..2922.874 rows=151 loops=1)
         ->  Sort  (cost=345771.50..351771.51 rows=242 width=4) (actual time=2009.174..2522.761 rows=242 loops=1)
               Sort Key: fff.i
               Sort Method: external merge  Disk: 32840kB
               ->  Append  (cost=0.00..58620.04 rows=242 width=4) (actual time=0.056..428.934 rows=242 loops=1)
                     ->  Seq Scan on fff  (cost=0.00..14425.01 rows=101 width=4) (actual time=0.055..100.806 rows=101 loops=1)
                     ->  Seq Scan on ggg  (cost=0.00..20195.01 rows=141 width=4) (actual time=0.042..139.994 rows=141 loops=1)
 Planning time: 0.127 ms
 Execution time: 2993.294 ms
(10 rows)

The startup and total costs are greater in the latter case, but the
costs match at and below the Unique node. Whether or not this was
relevant is probably unimportant, though. My habit is to do the offset
outside of the subquery.

My theory is that the master branch happened to get a HashAggregate
for the 128MB case that caused us both confusion, because it looked
cheaper than an external sort + unique when the sort required many
passes on the master branch only (where my cost_sort() changes that
lower the costing of external sorts were not included).  This wasn't a
low cardinality case, so the HashAggregate may have only won by a
small amount. I suppose that this could happen when the HashAggregate
was not predicted to use memory > work_mem, but a sort was. Then, as
the sort requires fewer merge passes with more work_mem, the master
branch starts to agree with the patch on the cheapest plan once again.
The trend of the patch being faster continues, after this one hiccup.

This is down to the cost_sort() changes, not the tuplesort.c changes.
But this was just a quirk, and the trend still seems clear. This
theory seems very likely based on this strange query's numbers for i5
master as work_mem increases:

Master: 16.711, 9.94, 4.891, 8.32, 4.88

Patch: 17.23, 9.77, 9.78, 4.95, 4.94

ISTM that master's last and third-from-last cases *both* use a
HashAggregate, whereas the patch behaves more consistently. After all,
the patch smooths the cost function of sorting, an independently
useful goal beyond simply making sorting faster. We don't have to be
afraid of crossing an arbitrary, fuzzy threshold.

-- 
Peter Geoghegan




Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Corey Huinker
On Sun, Apr 3, 2016 at 7:43 PM, Tom Lane  wrote:

> Corey Huinker  writes:
> >>> + The secondary queries are executed in top-to-bottom,
> >>> left-to-right order, so the command
>
> >> I took that as meaning what I said above.
>
> > Would using the term https://en.wikipedia.org/wiki/Row-major_order be
> > more clear?
>
> Meh, I suspect a lot of people don't know that term.  Perhaps something
> like "The generated queries are executed in the order in which the rows
> are returned, and left-to-right within each row if there is more than one
> column."
>
> regards, tom lane
>


I like it. Change forthcoming.


Re: [HACKERS] pgbench more operators & functions

2016-04-03 Thread Michael Paquier
On Mon, Apr 4, 2016 at 1:15 AM, Fabien COELHO  wrote:
>>> Here is a simple patch...
>>
>> The patch deadline has passed and we are in the last CF of 9.6, as I'm
>> sure you know.
>
> Yes I know, I'm ok with that, I was just putting stuff in the queue for
> later, I was not asking for the patch to be considered right now.

There is nothing bad about sending a patch now, though it is true that
at this stage of 9.6 development attention should be focused on the
remaining patches in the CF.

>> Given that, please save up all your desired changes to pgbench and submit
>> in one go nearer the next CF. Thanks.
>
> Ok. Sorry, I did not realise that submitting stuff and recording it in a CF
> should not be done now.

Personally I have no problem if someone wants to register a patch;
however, reviewing such a patch would be unfair to the other existing
ones. Perhaps you got an idea, wanted to code it, and thought it would
be better to send it now instead of three months later.
I'd say why not.

> Maybe you should consider not opening the September CF if this is the
> intent?
> Also, what period "nearer to the next CF" is appropriate for sending patches
> for this CF, which starts in five months?

The CF can remain open as far as I am concerned, to allow people to
add patches whenever they want; I see little point in closing it and
preventing people from registering patches if they want to. New
features are just not going to be considered for review and integration
until the next CF begins; note that some of the patches registered
there are aimed at being bug fixes.
-- 
Michael




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Mon, Apr 4, 2016 at 1:06 AM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane  wrote:
> >> The reason for checking toowide_cnt is that if it's greater than zero,
> >> then in fact the track list does NOT include all values seen, and it's
> >> flat-out wrong to claim that it is an exhaustive set of values.
>
> > But do we really state that with the short path?
>
> Well, yes: the point of the short path is that we're hypothesizing that
> the track list contains all values in the table, and that they should all
> be considered MCVs.  Another way to think about it is that if we didn't
> have the width-limit implementation restriction, those values would appear
> in the track list, almost certainly with count 1, and so we would not
> have taken the short path anyway.
>
> Now you can argue that the long path would have accepted all the real
> track-list entries as MCVs, and have rejected all these hypothetical
> count-1 entries for too-wide values, and so the end result would be the
> same.  But that gets back to the fact that that's not necessarily how
> the long path behaves, either today or with the proposed patch.
>

 Agreed.

> The design intention was that the short path would handle columns
> with a finite, small set of values (think booleans or enums) where the
> ideal thing is that the table population is completely represented by
> the MCV list.  As soon as we have good reason to think that the MCV
> list won't represent the table contents fully, we should switch over
> to a different approach where we're trying to identify which sample
> values are common enough to justify putting in the MCV list.


This is a precious detail that I unfortunately couldn't find in any of the
sources of information available to me online. :-)

I don't have a habit of hanging around on IRC channels, but now I wonder
how likely it is that I could have learned this by just asking around on
#postgresql (or by mailing you directly as the committer of this early
implementation--is that OK at all?)

Again, having this type of design decision documented in the code might
save some time and confusion for the sociopath^W introvert-type folks
like myself. ;-)

> In that
> situation there are good reasons to not blindly fill the MCV list all
> the way to the stats-target limit, but to try to cut it off at the
> point of diminishing returns, so that the planner isn't saddled with
> a huge MCV list that doesn't really contain a lot of useful information.
>

This came to be my understanding also at some point.

> So that's the logic behind there being two code paths with discontinuous
> behavior.  I'm not sure whether we need to try to narrow the discontinuity
> or whether it's OK to act that way and we just need to refine the decision
> rule about which path to take.  But anyway, comparisons of frequencies
> of candidate MCVs seem to me to make sense in a large-ndistinct scenario
> (where we have to be selective) but not a small-ndistinct scenario
> (where we should just take 'em all).
>

Yeah, this seems to be an open question.  And a totally new one to me in
the light of recent revelations.

>> The point of the original logic was to try to decide whether the
> >> values in the sample are significantly more common than typical values
> >> in the whole table population.  I think we may have broken that with
> >> 3d3bf62f3: you can't make any such determination if you consider only
> >> what's in the sample without trying to estimate what is not in the
> >> sample.
>
> > Speaking of rabbit holes...
> > I'm out of ideas, unfortunately.  We badly need more eyes/brainpower on
> > this, which is why I have submitted a talk proposal on this topic to
> > PGDay.ru this summer in St. Petersburg, fearing that it might be too late
> > to commit a satisfactory version during the current dev cycle for 9.6,
> and
> > in hope to draw at least some attention to it.
>
> If you're thinking it's too late to get more done for 9.6,


Not necessarily, but given the time constraints and some personal issues
that keep popping up, I'm not as optimistic as I was just 24h ago.


> I'm inclined to
> revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
> number of distinct values in the sample) rather than stadistinct (the
> extrapolated estimate for the table).  On reflection I think that that's
> inconsistent with the theory behind the old MCV-cutoff rule.  It wouldn't
> matter if we were going to replace the cutoff rule with something else,
> but it's beginning to sound like that won't happen for 9.6.
>

Please feel free to do what you think is in the best interest of the people
preparing 9.6 for the freeze.  I'm not all that familiar with the process,
but I guess reverting this early might save some head-scratching if
interesting interactions between this change and others show up.

Cheers!
--
Alex


Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Tom Lane
Corey Huinker  writes:
>>> + The secondary queries are executed in top-to-bottom,
>>> left-to-right order, so the command

>> I took that as meaning what I said above.

> Would using the term https://en.wikipedia.org/wiki/Row-major_order be more
> clear?

Meh, I suspect a lot of people don't know that term.  Perhaps something
like "The generated queries are executed in the order in which the rows
are returned, and left-to-right within each row if there is more than one
column."

regards, tom lane




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Dave Cramer
On 3 April 2016 at 12:18, Igal @ Lucee.org  wrote:

> On 4/3/2016 8:21 AM, Dave Cramer wrote:
>
>
> On 9 March 2016 at 20:49, Craig Ringer  wrote:
>
>> On 3/8/2016 5:12 PM, Craig Ringer wrote:
>>
>>>
>>> Are there good reasons to use pgjdbc over pgjdbc-ng then?
>>>
>>>
>> Maturity, support for older versions (-ng just punts on support for
>> anything except new releases) and older JDBC specs, completeness of support
>> for some extensions. TBH I haven't done a ton with -ng yet.
>>
>>
> I'd like to turn this question around. Are there good reasons to use -ng
> over pgjdbc ?
>
> As to your question, you may be interested to know that pgjdbc is more
> performant than ng.
>
> That's good to know, but unfortunately pgjdbc is unusable for us until
> https://github.com/pgjdbc/pgjdbc/issues/488 is fixed.
>
> Also, as I mentioned in the ticket, I can't imagine RETURNING * being
> performant if, for example, I INSERT a large chunk of data like an image
> data or an uploaded file.
>
>
>
Thanks for the reminder!

So I"m guessing the reason to use ng is to avoid returning * ?

Dave Cramer

da...@postgresintl.com
www.postgresintl.com


> Igal
>
>


Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Corey Huinker
>
> + The secondary queries are executed in top-to-bottom,
> left-to-right order, so the command
>
> I took that as meaning what I said above.
>
>
Would using the term https://en.wikipedia.org/wiki/Row-major_order be more
clear?

 The secondary queries are executed in row-major order, so the
command...

If so, it will probably aid in translation as well.



> >> It should clear that in all the same
> >> places where gfname or gset_prefix get cleared.
>
> > I'm only seeing one place where those two vars are deallocated and
> nulled,
> > and that's at the tail end of SendQuery. Were you expecting more than
> just
> > that?
>
> That may be the only place; I've not looked around.
>

Yeah, seems that there might have been multiple ones in the past, but all
paths now funnel through the sendquery_cleanup: goto.


Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Dave Cramer
On 3 April 2016 at 15:35, Stephen Frost  wrote:

> * Dave Cramer (p...@fastcrypt.com) wrote:
> > On 9 March 2016 at 20:49, Craig Ringer  wrote:
> >
> > > On 10 March 2016 at 00:41, Igal @ Lucee.org  wrote:
> > >
> > >> On 3/8/2016 5:12 PM, Craig Ringer wrote:
> > >>
> > >>> One of the worst problems (IMO) is in the driver architecture itself.
> > >>> It attempts to prevent blocking by guestimating the server's send
> buffer
> > >>> state and its recv buffer state, trying to stop them filling and
> causing
> > >>> the server to block on writes. It should just avoid blocking on its
> own
> > >>> send buffer, which it can control with confidence. Or use some of
> Java's
> > >>> rather good concurrency/threading features to simultaneously consume
> data
> > >>> from the receive buffer and write to the send buffer when needed,
> like
> > >>> pgjdbc-ng does.
> > >>>
> > >>
> > >> Are there good reasons to use pgjdbc over pgjdbc-ng then?
> > >>
> > >>
> > > Maturity, support for older versions (-ng just punts on support for
> > > anything except new releases) and older JDBC specs, completeness of
> support
> > > for some extensions. TBH I haven't done a ton with -ng yet.
> >
> > I'd like to turn this question around. Are there good reasons to use -ng
> > over pgjdbc ?
>
> Not generally much of a JDBC user myself, but the inability to avoid
> polling for LISTEN notifications is a pretty big annoyance, which I just
> ran into with a client.  I understand that -ng has a way to avoid that,
> even for SSL connections.
>
>
Yes, it is a custom api. Easy enough to add. Is this something of interest ?



Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Tom Lane
Corey Huinker  writes:
> On Sun, Apr 3, 2016 at 4:26 PM, Tom Lane  wrote:
>> I'm not Jim, but I have a question: what's the motivation for the
>> Fortran-order traversal of the result (down rows before across columns)?

> If I am understanding you correctly, it does work the way you find
> intuitive: all results from the first row are executed before any in the
> second row, so

Oh, I hadn't checked the code closely enough to realize that, but I see
you're right.  The patch's documentation seems very confusing on the
point, though:

+ The secondary queries are executed in top-to-bottom, left-to-right order, so the command

I took that as meaning what I said above.

>> It should clear that in all the same
>> places where gfname or gset_prefix get cleared.

> I'm only seeing one place where those two vars are deallocated and nulled,
> and that's at the tail end of SendQuery. Were you expecting more than just
> that?

That may be the only place; I've not looked around.

regards, tom lane




Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Greg Stark
On Sun, Apr 3, 2016 at 12:50 AM, Peter Geoghegan  wrote:
> 1459308434.753 2016-03-30 05:27:14 CEST STATEMENT:  SELECT * FROM
> (SELECT a FROM int_test UNION SELECT a FROM int_test_padding OFFSET
> 1e10) ff;
>
> I think that this is invalid, because the query was intended as this:
>
> SELECT * FROM (SELECT * FROM (SELECT a FROM int_test UNION SELECT a
> FROM int_test_padding) gg OFFSET 1e10) ff;

ISTM OFFSET binds more loosely than UNION so these should be equivalent.


-- 
greg




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Tom Lane
Alex Shulgin  writes:
> On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane  wrote:
>> The reason for checking toowide_cnt is that if it's greater than zero,
>> then in fact the track list does NOT include all values seen, and it's
>> flat-out wrong to claim that it is an exhaustive set of values.

> But do we really state that with the short path?

Well, yes: the point of the short path is that we're hypothesizing that
the track list contains all values in the table, and that they should all
be considered MCVs.  Another way to think about it is that if we didn't
have the width-limit implementation restriction, those values would appear
in the track list, almost certainly with count 1, and so we would not
have taken the short path anyway.

Now you can argue that the long path would have accepted all the real
track-list entries as MCVs, and have rejected all these hypothetical
count-1 entries for too-wide values, and so the end result would be the
same.  But that gets back to the fact that that's not necessarily how
the long path behaves, either today or with the proposed patch.

The design intention was that the short path would handle columns
with a finite, small set of values (think booleans or enums) where the
ideal thing is that the table population is completely represented by
the MCV list.  As soon as we have good reason to think that the MCV
list won't represent the table contents fully, we should switch over
to a different approach where we're trying to identify which sample
values are common enough to justify putting in the MCV list.  In that
situation there are good reasons to not blindly fill the MCV list all
the way to the stats-target limit, but to try to cut it off at the
point of diminishing returns, so that the planner isn't saddled with
a huge MCV list that doesn't really contain a lot of useful information.

So that's the logic behind there being two code paths with discontinuous
behavior.  I'm not sure whether we need to try to narrow the discontinuity
or whether it's OK to act that way and we just need to refine the decision
rule about which path to take.  But anyway, comparisons of frequencies
of candidate MCVs seem to me to make sense in a large-ndistinct scenario
(where we have to be selective) but not a small-ndistinct scenario
(where we should just take 'em all).
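
To make this concrete, the list under discussion is what ends up visible
in pg_stats after ANALYZE (table and column names made up):

    SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'some_table' AND attname = 'some_column';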

>> The point of the original logic was to try to decide whether the
>> values in the sample are significantly more common than typical values
>> in the whole table population.  I think we may have broken that with
>> 3d3bf62f3: you can't make any such determination if you consider only
>> what's in the sample without trying to estimate what is not in the
>> sample.

> Speaking of rabbit holes...
> I'm out of ideas, unfortunately.  We badly need more eyes/brainpower on
> this, which is why I have submitted a talk proposal on this topic to
> PGDay.ru this summer in St. Petersburg, fearing that it might be too late
> to commit a satisfactory version during the current dev cycle for 9.6, and
> in hope to draw at least some attention to it.

If you're thinking it's too late to get more done for 9.6, I'm inclined to
revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
number of distinct values in the sample) rather than stadistinct (the
extrapolated estimate for the table).  On reflection I think that that's
inconsistent with the theory behind the old MCV-cutoff rule.  It wouldn't
matter if we were going to replace the cutoff rule with something else,
but it's beginning to sound like that won't happen for 9.6.

regards, tom lane




Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Corey Huinker
On Sun, Apr 3, 2016 at 4:26 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > Jim, can you re-review this?
>
> I'm not Jim, but I have a question: what's the motivation for the
> Fortran-order traversal of the result (down rows before across columns)?
> It seems less than intuitive to do it that way.  Perhaps there's a good
> reason, but I do not see any defense of this choice in the thread.
>

If I am understanding you correctly, it does work the way you find
intuitive: all results from the first row are executed before any in the
second row, so

SELECT a, b UNION ALL SELECT c, d

will execute the queries in order: a, b, c, d as is shown in the changes to
the sgml and the test cases.

Did you get the impression of Fortran-ordering from the phrase
"top-to-bottom, left-to-right order" in the sgml patch? If so, would
calling it "rows first" or something else be more clear?

Or am I misunderstanding you and you find the order a, c, b, d more
intuitive?

> I also note that the patch seems to be missing resetting gexec_flag
> in some error exit paths, possibly allowing the \gexec to be applied
> to the next query unexpectedly.  It should clear that in all the same
> places where gfname or gset_prefix get cleared.
>

Will do!

I'm only seeing one place where those two vars are deallocated and nulled,
and that's at the tail end of SendQuery. Were you expecting more than just
that?


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > This recalled observation can now also explain to me why in the
> regression
> > you've seen, the short path was not followed: my bet is that stadistinct
> > appeared negative.
>
> Yes, I think that's right.  The table under consideration had just a few
> live rows (I think 3), so that even though there was only one value in
> the sample, the "if (stats->stadistinct > 0.1 * totalrows)" condition
> succeeded.
>

Yeah, this part of the logic can be really surprising at times.

> Given that we change the logic in the complex path substantially, the
> > assumptions that lead to the "Take all MVCs" condition above might no
> > longer hold, and I see it as a pretty compelling argument to remove the
> > extra checks, thus keeping the only one: track_cnt == ndistinct.  This
> > should also bring the patch's effect more close to the thread's topic,
> > which is "More stable query plans".
>
> The reason for checking toowide_cnt is that if it's greater than zero,
> then in fact the track list does NOT include all values seen, and it's
> flat-out wrong to claim that it is an exhaustive set of values.
>

But do we really state that with the short path?

If there would be only one too wide value, it might be the only thing left
for the histogram in the end and will be discarded anyway, so from the end
result perspective there is no difference.

If there are multiple too wide values, they will be effectively discarded
by the histogram calculation part also, so again no difference from the
perspective of the end result.

> The reason for the track_cnt <= num_mcv condition is that if that's not
> true, the track list has to be trimmed to meet the statistics target.
> Again, that's not optional.
>

Yes, but we only need this check in compute_distinct_stats(), and we are
talking about compute_scalar_stats() now, where track_cnt is always less
than or equal to num_mcv (again, please see the bottom of the
thread-starting email) -- or is my analysis broken on this part?

> I think the reasoning for having the stats->stadistinct > 0 test in there
> was that if we'd set it negative, then we think that the set of distinct
> values will grow --- which again implies that the set of values actually
> seen should not be considered exhaustive.


This is actually very neat.  So the idea here, as I understand it, is that
if we have enough distinct values to suspect that more unique ones will be
added later as the table grows (which is a natural tendency of most tables
anyway), then by the time the statistics we produce are actually used by
the planner, it is likely that the MCV list no longer covers all the
distinct values, right?

I would *love* to see that documented in code comments at the least.

> Of course, with a table as
> small as that regression-test example, we have little evidence to support
> either that conclusion or its opposite.
>

I think it might be possible to record historical ndistinct values between
ANALYZE runs and use those as better evidence that the number of distinct
values is actually growing, rather than basing that decision on the
hard-coded 10% limit rule.  What do you think?

We do not support migration of the pg_statistic system table during major
version upgrades (yet), so if we somehow achieve what I've just described,
it might not be a compatibility-breaking change anyway.

> It's possible that what we should do to eliminate the sudden change
> of behaviors is to drop the "track list includes all values seen, and all
> will fit" code path entirely, and always go through the track list
> one-at-a-time.
>

That could also be an option, that I have considered initially.  Now that I
read your explanation of each check, I'm not that sure anymore.

> If we do, though, the currently-proposed filter rules aren't going to
> be too satisfactory: if we have a relatively small group of roughly
> equally common MCVs, this logic would reject all of them, which is
> surely not what we want.
>

Indeed. :-(


> The point of the original logic was to try to decide whether the
> values in the sample are significantly more common than typical values
> in the whole table population.  I think we may have broken that with
> 3d3bf62f3: you can't make any such determination if you consider only
> what's in the sample without trying to estimate what is not in the
> sample.
>

Speaking of rabbit holes...

I'm out of ideas, unfortunately.  We badly need more eyes/brainpower on
this, which is why I have submitted a talk proposal on this topic to
PGDay.ru this summer in St. Petersburg, fearing that it might be too late
to commit a satisfactory version during the current dev cycle for 9.6, and
in hope to draw at least some attention to it.

Regards,
--
Alex


Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-04-03 Thread Simon Riggs
On 3 April 2016 at 22:09, Tomas Vondra  wrote:

> On 04/03/2016 10:06 PM, Simon Riggs wrote:
>
>> On 14 March 2016 at 19:42, Tomas Vondra > > wrote:
>>
>> ...
>
>>
>>
>> I'd like to split this into 2 patches
>> 1) Add FK info to relcache
>> 2) use FK info in planner
>>
>> Would the credit for this be 1) Tomas, 2) Tomas + David ?
>>
>
> I could split the patch like that, but I don't quite see the point. The
> two parts will not overlap at all, so the split would not make reviews any
> simpler. So the only reason for such a split seems to be different credits,
> but would we actually commit the pieces independently? Not sure about that.


Oh sorry. The reason for the split was that adding the FK info to relcache
is a very solid addition, whereas we might imagine some churn around the
planner aspects.

>> I'd be inclined to see a little more explanatory docs on this.
>>
>
> That's probably a good idea. Do you have any particular place for the docs
> in mind?


Detailed comments in the planner part of the patch. The discussion around
this patch isn't reflected enough in the patch.

>> Have we done any tests on planning overhead for cases where multiple
>> FKs exist?
>>
>>
> I have done some benchmarks initially, and haven't measured any noticeable
> impact. But the code changed since then, so I'll redo that and post some
> results.


Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-04-03 Thread Simon Riggs
On 3 April 2016 at 21:32, Simon Riggs  wrote:

> On 14 March 2016 at 17:46, David Steele  wrote:
>
>> On 2/24/16 12:40 AM, Michael Paquier wrote:
>>
>> This has the merit to be clear, thanks for the input. Whatever the
>>> approach taken at the end we have two candidates:
>>> - Extend XLogInsert() with an extra argument for flags (Andres)
>>> - Introduce XLogInsertExtended with this extra argument and let
>>> XLogInsert() in peace (Robert and I).
>>> Actually, I lied, there was still something I could do for this
>>> thread: attached are two patches implementing both approaches as
>>> respectively a-1 and a-2. Patch b is the set of logs I used for the
>>> tests to show with a low checkpoint_timeout that checkpoints are
>>> getting correctly skipped on an idle system.
>>>
>>
>> Unfortunately neither A nor B apply anymore.
>>
>> However, since the patches can still be read through I wonder if Robert
>> or Andres would care to opine on whether A or B is better now that they can
>> see the full implementation?
>>
>> For my 2c I'm happy with XLogInsertExtended() since it seems to be a rare
>> use case where flags are required.  This can always be refactored in the
>> future if/when the use of flags spreads.
>>
>
> XLogInsertExtended() is the one I would commit, if.
>

I'm not very excited about this patch. Too much code for so little benefit
and fragile too.

I'm not even sure what definition of "meaningful progress" is useful. If we
commit this, a similar bug could be filed for many similar WAL record types.

Since we zero out WAL files specifically to allow them to be compressed in
the archive, this patch saves a few bytes in the archive. Seems like we
could fake up some WAL files if needed for restore with an external tool,
if we really care.

Since we agree it wouldn't be backpatched, it's pretty much saying we don't
care.


A promising approach would be to just skip logging the RUNNING_XACTS record
if there are no xacts running, which is the case we care about here (no
progress => no xacts). HS startup can only happen at a checkpoint, so if
there is no WAL file there is no checkpoint, so we don't need the
RUNNING_XACTS record to be written. That's a short patch.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] snapshot too old, configured by time

2016-04-03 Thread Jeff Janes
On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner  wrote:
> On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes  wrote:
>
>> I'm not sure if this is operating as expected.
>>
>> I set the value to 1min.
>>
>> I set up a test like this:
>>
>> pgbench -i
>>
>> pgbench -c4 -j4 -T 3600 &
>>
>> ### watch the size of branches table
>> while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &
>>
>> ### set up a long lived snapshot.
>> psql -c 'begin; set transaction isolation level repeatable read;
>> select sum(bbalance) from pgbench_branches; select pg_sleep(300);
>> select sum(bbalance) from pgbench_branches;'
>>
>> As this runs, I can see the size of the pgbench_branches bloating once
>> the snapshot is taken, and continues bloating at a linear rate for the
>> full 300 seconds.
>>
>> Once the 300 second pg_sleep is up, the long-lived snapshot holder
>> receives an error once it tries to access the table again, and then
>> the bloat stops increasing.  But shouldn't the bloat have stopped
>> increasing as soon as the snapshot became doomed, which would be after
>> a minute or so?
>
> This is actually operating as intended, not a bug.  Try running a
> manual VACUUM command about two minutes after the snapshot is taken
> and you should get a handle on what's going on.  The old tuples
> become eligible for vacuuming after one minute, but that doesn't
> necessarily mean that autovacuum jumps in and that the space starts
> getting reused.

I can verify that a manual vacuum does stop the bloat from continuing
to increase.  But I don't see why autovacuum is not already stopping
the bloat.  It is running often enough that it really ought to do so
(as verified by setting log_autovacuum_min_duration = 0 and looking in
the log files to see that it is vacuuming the table once per nap-time,
although it is not accomplishing much by doing so, as no tuples can be
removed).

Also, HOT-cleanup should stop the bloat increase once the snapshot
crosses the old_snapshot_threshold without even needing to wait until
the next autovac runs.
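
To spell out the behavior I'd expect, here is a standalone sketch (names
invented; I assume the real check lives somewhere in snapmgr.c):

#include <stdbool.h>

typedef long TimestampMs;

/*
 * Model of the intended semantics: once a snapshot is older than
 * old_snapshot_threshold, (1) vacuum and HOT pruning may reclaim tuples
 * the snapshot could still see, and (2) any later page read through that
 * snapshot must raise "snapshot too old".
 */
static bool
snapshot_past_threshold(TimestampMs taken, TimestampMs now,
						TimestampMs threshold_ms)
{
	/* a negative threshold means the feature is disabled */
	return threshold_ms >= 0 && (now - taken) > threshold_ms;
}

With my settings, snapshot_past_threshold(0, 120000, 60000) is true, so
manual VACUUM, autovacuum, and HOT pruning should all have been equally
free to reclaim after the first minute.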

Does the code intentionally only work for manual vacuums?  If so, that
seems quite surprising.  Or perhaps I am missing something else here.

Thanks,

Jeff




Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-04-03 Thread Tomas Vondra

On 04/03/2016 10:06 PM, Simon Riggs wrote:

On 14 March 2016 at 19:42, Tomas Vondra wrote:


...



I'd like to split this into 2 patches
1) Add FK info to relcache
2) use FK info in planner

Would the credit for this be 1) Tomas, 2) Tomas + David ?


I could split the patch like that, but I don't quite see the point. The
two parts will not overlap at all, so the split would not make reviews any
simpler. The only reason for such a split seems to be the different
credits, but would we actually commit the pieces independently? I'm not
sure about that.




I'd be inclined to see a little more explanatory docs on this.


That's probably a good idea. Do you have any particular place for the 
docs in mind?




Have we done any tests on planning overhead for cases where multiple
FKs exist?



I have done some benchmarks initially, and haven't measured any
noticeable impact. But the code has changed since then, so I'll redo that
and post some results.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Peter Geoghegan
I just mean that, as you say, trace_sort is described in the documentation.

I don't think we'll end up with any kind of cost model here, so where that
would need to happen is only an academic matter. The create index parameter
would only be an option for the DBA. That's about the only case I can see
working for replacement selection: where indexes can be created with very
little memory quickly, by optimistically starting to write out the start of
the final index representation almost immediately, before most of the
underlying table has even been read in.

--
Peter Geoghegan


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Tom Lane
Alex Shulgin  writes:
> This recalled observation can now also explain to me why in the regression
> you've seen, the short path was not followed: my bet is that stadistinct
> appeared negative.

Yes, I think that's right.  The table under consideration had just a few
live rows (I think 3), so that even though there was only one value in
the sample, the "if (stats->stadistinct > 0.1 * totalrows)" condition
succeeded.

> Given that we change the logic in the complex path substantially, the
> assumptions that lead to the "Take all MVCs" condition above might no
> longer hold, and I see it as a pretty compelling argument to remove the
> extra checks, thus keeping the only one: track_cnt == ndistinct.  This
> should also bring the patch's effect more close to the thread's topic,
> which is "More stable query plans".

The reason for checking toowide_cnt is that if it's greater than zero,
then in fact the track list does NOT include all values seen, and it's
flat-out wrong to claim that it is an exhaustive set of values.

The reason for the track_cnt <= num_mcv condition is that if that's not
true, the track list has to be trimmed to meet the statistics target.
Again, that's not optional.

I think the reasoning for having the stats->stadistinct > 0 test in there
was that if we'd set it negative, then we think that the set of distinct
values will grow --- which again implies that the set of values actually
seen should not be considered exhaustive.  Of course, with a table as
small as that regression-test example, we have little evidence to support
either that conclusion or its opposite.

It's possible that what we should do to eliminate the sudden change
of behaviors is to drop the "track list includes all values seen, and all
will fit" code path entirely, and always go through the track list
one-at-a-time.

If we do, though, the currently-proposed filter rules aren't going to
be too satisfactory: if we have a relatively small group of roughly
equally common MCVs, this logic would reject all of them, which is
surely not what we want.

The point of the original logic was to try to decide whether the
values in the sample are significantly more common than typical values
in the whole table population.  I think we may have broken that with
3d3bf62f3: you can't make any such determination if you consider only
what's in the sample without trying to estimate what is not in the
sample.
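
To illustrate the shape of the test I mean (invented names, and the 1.25
margin is a placeholder rather than a proposal):

#include <stdbool.h>

/*
 * A candidate is "significantly more common than typical" only relative
 * to the values it's competing with: estimate the average frequency of
 * the non-candidate values, which requires the ndistinct estimate (that
 * is, information about what is NOT in the sample), and require the
 * candidate to clearly beat it.
 */
static bool
candidate_clearly_common(double candidate_freq,	/* candidate's fraction of sample */
						 double mcv_freq_total,	/* summed fraction of candidates so far */
						 double est_ndistinct,	/* estimated distinct values in table */
						 int ncandidates)
{
	double		others = est_ndistinct - ncandidates;
	double		avg_other_freq = (others > 0.0) ?
		(1.0 - mcv_freq_total) / others : 0.0;

	return candidate_freq > 1.25 * avg_other_freq;
}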

regards, tom lane




Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Tomas Vondra

On 04/03/2016 09:41 PM, Peter Geoghegan wrote:

Hi Tomas,

...

3) replacement_sort_mem GUC

I'm not quite sure what's the plan with this GUC. It was useful for
development, but it seems to me it's pretty difficult to tune it in practice
(especially if you don't know the internals, which users generally don't).


I agree.


So I think we should either remove the GUC entirely, or move it to the
developer section next to trace_sort (and removing it from the conf).


I'll let Robert decide what's best here, but I see your point.

Side note: trace_sort actually is documented. It's a bit weird that we
have those TRACE_SORT macros at all IMV. I think we should rip those
out, and assume every build enables TRACE_SORT, because that's
probably true anyway.


What do you mean by documented? I thought this might be a good place:

http://www.postgresql.org/docs/devel/static/runtime-config-developer.html

which is where trace_sort is documented.



I do think that replacement selection could be put to good use for
CREATE INDEX if the CREATE INDEX utility command had a "presorted"
parameter. Specifically, an implementation of the "presorted" idea
that I recently sketched [1] could do better than any presorted
replacement selection case we've seen so far because it allows the
implementation to optimistically create the index on-the-fly (if that
isn't possible, throw an error), without a second pass over tuples
sorted on tape. Nothing needs to be stored on a tape/temp file *at
all*; the only thing that is stored externally is the index itself.
But this patch doesn't add that feature, which can be worked on
without the user needing to know about replacement_sort_mem in 9.6.

So, I'm not in favor of ripping out the replacement selection code,
but think it could make sense to effectively disable it entirely for
the time being (with some developer feature to turn it back on for
testing). In general, I share your misgivings about the new GUC,
though.


OK.




I'm wondering whether 16MB default is not a bit too much, actually. As
explained before, that's not the amount of cache we should expect per
process, so maybe ~2-4MB would be a better default value?


The obvious presorted case is where we have a SERIAL column, but as I
mentioned even that isn't helped by RS. Moreover, it will be
significantly hurt with a default maintenance_work_mem of 64MB. Your
int4 CREATE INDEX cases clearly show this.


BTW couldn't we tune the value automatically for each sort, using the
pg_stats.correlation for the sort keys, when available (increasing the
replacement_sort_mem when correlation is close to 1.0)? Wouldn't that
improve at least some of the regressions?


Maybe, but that seems hard. That information isn't conveniently
available to the executor/tuplesort, and as we've seen with CREATE
INDEX int4 cases, it's far from clear that we'll win even when there
definitely is presorted input. Replacement selection needs more than a
simple correlation to win, so you'll end up building a cost model with
many new problems if this is to work.


Sure, that's non-trivial and definitely not 9.6 material. I'm also
wondering whether we need to choose replacement_sort_mem at planning
time, or whether it could be done in the executor based on actually
observed data ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-04-03 Thread Simon Riggs
On 14 March 2016 at 17:46, David Steele  wrote:

> On 2/24/16 12:40 AM, Michael Paquier wrote:
>
> This has the merit to be clear, thanks for the input. Whatever the
>> approach taken at the end we have two candidates:
>> - Extend XLogInsert() with an extra argument for flags (Andres)
>> - Introduce XLogInsertExtended with this extra argument and let
>> XLogInsert() in peace (Robert and I).
>> Actually, I lied, there was still something I could do for this
>> thread: attached are two patches implementing both approaches as
>> respectively a-1 and a-2. Patch b is the set of logs I used for the
>> tests to show with a low checkpoint_timeout that checkpoints are
>> getting correctly skipped on an idle system.
>>
>
> Unfortunately neither A nor B apply anymore.
>
> However, since the patches can still be read through I wonder if Robert or
> Andres would care to opine on whether A or B is better now that they can
> see the full implementation?
>
> For my 2c I'm happy with XLogInsertExtended() since it seems to be a rare
> use case where flags are required.  This can always be refactored in the
> future if/when the use of flags spreads.
>

XLogInsertExtended() is the one I would commit, if.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] psql metaqueries with \gexec

2016-04-03 Thread Tom Lane
Robert Haas  writes:
> Jim, can you re-review this?

I'm not Jim, but I have a question: what's the motivation for the
Fortran-order traversal of the result (down rows before across columns)?
It seems less than intuitive to do it that way.  Perhaps there's a good
reason, but I do not see any defense of this choice in the thread.

I also note that the patch seems to be missing resetting gexec_flag
in some error exit paths, possibly allowing the \gexec to be applied
to the next query unexpectedly.  It should clear that in all the same
places where gfname or gset_prefix get cleared.

regards, tom lane




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 8:24 AM, Alex Shulgin  wrote:
>
> On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane  wrote:
>>
>> Alex Shulgin  writes:
>> > On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:
>> >> Well, we have to do *something* with the last (possibly only) value.
>> >> Neither "include always" nor "omit always" seem sane to me.  What other
>> >> decision rule do you want there?
>>
>> > Well, what implies that the last value is somehow special?  I would think
>> > we should just do with it whatever we do with the rest of the candidate
>> > MCVs.
>>
>> Sure, but both of the proposed decision rules break down when there are no
>> values after the one under consideration.  We need to do something sane
>> there.
>
>
> Hm... There is indeed a case where it would be beneficial to have at
> least 2 values in the histogram (to have at least the low/high bounds for
> inequality comparison selectivity) instead of taking both to the MCV list
> or taking one to the MCVs and having to discard the other.

I was thinking about this in the background...

The popularity of the last sample value (when it is not the only one) can be:

a) As high as 50%, in case we have an even division between the only two
values in the sample.  Quite obviously, we should take this one into the
MCV list (well, unless the user has specified statistics_target of 1 for
some bizarre reason, but that should not be our problem).

b) As low as 2/(statistics_target*300), which, with the target set to the
maximum allowed value of 10,000, amounts to 2/(10,000*300) = 1 in
1,500,000.  This seems like a really tiny number, but if your table has
some tens of billions of rows, for example, seeing such a value at least
twice means that it might correspond to some thousands of rows in the
table (see the arithmetic below), whereas seeing a value only once might
mean just that: it's a unique value.

In this case, putting such a duplicate value in the MCV list will allow a
much better selectivity estimate for equality comparison, as I've mentioned
earlier.  It also allows a better estimate for inequality comparison,
since MCVs are also consulted in that case.  I see no good reason to
discard such a value.

c) Or anything in between the above figures.

In my opinion that amounts to "include always" being the sane option.  Do
you see anything else as a problem here?
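
Just to make the arithmetic in (b) concrete, assuming for the sake of the
example a table of N = 2*10^10 rows:

	\hat{f} = \frac{2}{10{,}000 \times 300} = \frac{1}{1{,}500{,}000},
	\qquad
	N\,\hat{f} = \frac{2\times10^{10}}{1.5\times10^{6}} \approx 13{,}000
	\text{ rows}

so a value seen just twice in the sample may well stand for on the order
of ten thousand rows.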

> Obviously, we need a fresh idea on how to handle this.

On reflection, the case where we have a duplicate value in the track list
which is not followed by any other sample should be covered by the short
path where we put all the tracked values in the MCV list, so there should
be no need to even consider all of the above!

But the exact short path condition is formulated like this:

	if (track_cnt == ndistinct && toowide_cnt == 0 &&
		stats->stadistinct > 0 &&
		track_cnt <= num_mcv)
	{
		/* Track list includes all values seen, and all will fit */

So the execution path here additionally depends on two factors: whether
we've seen at least one too-wide sample, and whether the distinct estimate
produced a number higher than 10% of the estimated total table size (which
is yet another arbitrary limit, but that's not in the scope of this
patch).

I've been puzzled by these conditions a lot, as I mentioned in the last
section of this thread's starting email, and I could not find anything
that would hint at why they exist there, in the documentation, code
comments, or emails on -hackers leading up to the introduction of
analyze.c in the form we know it today.  Probably we will never know,
unless Tom still has some notes on this topic from 15 years ago. ;-)

This recalled observation can now also explain to me why in the regression
you've seen, the short path was not followed: my bet is that stadistinct
appeared negative.

Given that we change the logic in the complex path substantially, the
assumptions that led to the "take all MCVs" condition above might no
longer hold, and I see that as a pretty compelling argument to remove the
extra checks, thus keeping only one: track_cnt == ndistinct.  This should
also bring the patch's effect closer to the thread's topic, which is
"more stable query plans".

Regards,
--
Alex


Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-04-03 Thread Simon Riggs
On 14 March 2016 at 19:42, Tomas Vondra wrote:

> Hi,
>
> On 03/14/2016 02:12 PM, David Steele wrote:
>
>> Hi Thomas,
>>
> ...
>
>> I don't think it would be clear to any reviewer which patch to apply
>> even if they were working.  I'm marking this "waiting for author".
>>
>
> Yeah. Rebasing the patches to current master was simple enough (there was
> just a simple #include conflict), but figuring out which of the patches is
> review-worthy was definitely difficult.
>
> I do believe David's last patch is the best step forward, so I've rebased
> it, and made some basic aesthetic fixes (adding or rewording comments on a
> few places, etc.)
>

I'd like to split this into 2 patches
1) Add FK info to relcache
2) use FK info in planner

Would the credit for this be 1) Tomas, 2) Tomas + David ?

I'd be inclined to see a little more explanatory docs on this.

Have we done any tests on planning overhead for cases where multiple FKs
exist?

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Peter Geoghegan
Hi Tomas,

Overall, I agree with your summary.

On Sun, Apr 3, 2016 at 5:24 AM, Tomas Vondra wrote:
> So, let me sum this up, the way I understand the current status.
>
>
> 1) overall, the patch seems to be a clear performance improvement

I think that's clear. There are even cases that are over 5x faster,
which are representative of some real workloads (e.g., "CREATE INDEX x
ON numeric_test (a)" when low_cardinality_almost_asc +
maintenance_work_mem=512MB). A lot of the aggregate (datum sort)
cases, and heap tuple cases are 3x - 4x faster.

> 2) it's unlikely we can improve the performance further

I think it's very unlikely that these remaining regressions can be fixed, yes.

> Secondly, master is faster only if there's enough on-CPU cache for the
> replacement sort (for the memtuples heap), but the benchmark is not
> realistic in this respect as it only ran 1 query at a time, so it used the
> whole cache (6MB for i5, 12MB for Xeon).
>
> In reality there will be multiple processes running at the same time (e.g
> backends when running parallel query), significantly reducing the amount of
> cache per process, making the replacement sort inefficient and thus
> eliminating the regressions (by making the master slower).

Agreed. And even though the 8MB work_mem cases always have more than
enough CPU cache to fit the replacement selection heap, it's still no
worse than a mixed picture. The replacement_work_mem=64KB + patch +
8MB (maintenance_)work_mem cases (i.e. replacement selection entirely
disabled) don't always do worse; they are often a draw, and sometimes
do much better. We *still* win in many cases, sometimes by quite a bit
(e.g. "SELECT COUNT(DISTINCT a) FROM int_test" typically loses about
50% of its runtime when patched and RS is disabled at work_mem=8MB).
The cases where we lose at work_mem=8MB involve padding and a
correlation. The really important case of CREATE INDEX on int4 almost
always wins, *even with sorted input* (the
almost-but-not-quite-asc-sorted case loses ~1%). We can shave 20% -
30% off the CREATE INDEX int4 cases with just maintenance_work_mem =
8MB.

Even in these cases with so much CPU cache relative to work_mem, you
need to search for regressed cases to find them, and they are less
representative cases. So, while the picture for the work_mem=8MB
column alone seems kind of bad, if you consider where the regressions
actually occur, you could argue that even that's a draw.

> 3) replacement_sort_mem GUC
>
> I'm not quite sure what's the plan with this GUC. It was useful for
> development, but it seems to me it's pretty difficult to tune it in practice
> (especially if you don't know the internals, which users generally don't).

I agree.

> So I think we should either remove the GUC entirely, or move it to the
> developer section next to trace_sort (and removing it from the conf).

I'll let Robert decide what's best here, but I see your point.

Side note: trace_sort actually is documented. It's a bit weird that we
have those TRACE_SORT macros at all IMV. I think we should rip those
out, and assume every build enables TRACE_SORT, because that's
probably true anyway.

I do think that replacement selection could be put to good use for
CREATE INDEX if the CREATE INDEX utility command had a "presorted"
parameter. Specifically, an implementation of the "presorted" idea
that I recently sketched [1] could do better than any presorted
replacement selection case we've seen so far because it allows the
implementation to optimistically create the index on-the-fly (if that
isn't possible, throw an error), without a second pass over tuples
sorted on tape. Nothing needs to be stored on a tape/temp file *at
all*; the only thing that is stored externally is the index itself.
But this patch doesn't add that feature, which can be worked on
without the user needing to know about replacement_sort_mem in 9.6.

So, I'm not in favor of ripping out the replacement selection code,
but think it could make sense to effectively disable it entirely for
the time being (with some developer feature to turn it back on for
testing). In general, I share your misgivings about the new GUC,
though.

> I'm wondering whether 16MB default is not a bit too much, actually. As
> explained before, that's not the amount of cache we should expect per
> process, so maybe ~2-4MB would be a better default value?

The obvious presorted case is where we have a SERIAL column, but as I
mentioned even that isn't helped by RS. Moreover, it will be
significantly hurt with a default maintenance_work_mem of 64MB. Your
int4 CREATE INDEX cases clearly show this.

> BTW couldn't we tune the value automatically for each sort, using the
> pg_stats.correlation for the sort keys, when available (increasing the
> replacement_sort_mem when correlation is close to 1.0)? Wouldn't that
> improve at least some of the regressions?

Maybe, but that seems hard. That information isn't conveniently
available to the executor/tuplesort, and as we've seen with CREATE
INDEX int4 cases, it's far from clear that we'll win even when there
definitely is presorted input. Replacement selection needs more than a
simple correlation to win, so you'll end up building a cost model with
many new problems if this is to work.

--
Peter Geoghegan

Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Stephen Frost
* Dave Cramer (p...@fastcrypt.com) wrote:
> On 9 March 2016 at 20:49, Craig Ringer  wrote:
> 
> > On 10 March 2016 at 00:41, Igal @ Lucee.org  wrote:
> >
> >> On 3/8/2016 5:12 PM, Craig Ringer wrote:
> >>
> >>> One of the worst problems (IMO) is in the driver architecture its self.
> >>> It attempts to prevent blocking by guestimating the server's send buffer
> >>> state and its recv buffer state, trying to stop them filling and causing
> >>> the server to block on writes. It should just avoid blocking on its own
> >>> send buffer, which it can control with confidence. Or use some of Java's
> >>> rather good concurrency/threading features to simultaneously consume data
> >>> from the receive buffer and write to the send buffer when needed, like
> >>> pgjdbc-ng does.
> >>>
> >>
> >> Are there good reasons to use pgjdbc over pgjdbc-ng then?
> >>
> >>
> > Maturity, support for older versions (-ng just punts on support for
> > anything except new releases) and older JDBC specs, completeness of support
> > for some extensions. TBH I haven't done a ton with -ng yet.
>
> I'd like to turn this question around. Are there good reasons to use -ng
> over pgjdbc ?

Not generally much of a JDBC user myself, but the inability to avoid
polling for LISTEN notifications is a pretty big annoyance, which I just
ran into with a client.  I understand that -ng has a way to avoid that,
even for SSL connections.

> As to your question, you may be interested to know that pgjdbc is more
> performant than ng.

Interesting, good to know.

Thanks!

Stephen




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016, 18:40 Tom Lane  wrote:

> Alex Shulgin  writes:
>
> > Well, if it's the only value it will be accepted simply because we are
> > checking that special case already and don't even bother to loop through
> > the track list.
>
> That was demonstrably not the case in the failing regression test.
> I forget what aspect of the test case allowed it to get past the short
> circuit, but it definitely got into the scan-the-track-list code.
>

Hm, I'll have to see that for myself, probably there was something more to
it.

--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016, 18:32 Tom Lane  wrote:

> Peter Geoghegan  writes:
> > On Thu, Feb 11, 2016 at 9:50 AM, Robert Haas 
> wrote:
> >> Wow, that's a fabulous idea.  I see Oleksandr has tried to implement
> >> it, although I haven't looked at the patch.  But I think this would be
> >> REALLY helpful.
>
> > +1
>
> Pushed.
>

Yay!


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Tom Lane
Alex Shulgin  writes:
> On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane  wrote:
>> If there is only one value, it will have 100% of the samples, so it would
>> get included under just about any decision rule (other than "more than
>> 100% of this value plus following values").  I don't think making sure
>> this case works is sufficient to get us to a reasonable rule --- it's
>> a necessary case, but not a sufficient case.

> Well, if it's the only value it will be accepted simply because we are
> checking that special case already and don't even bother to loop through
> the track list.

That was demonstrably not the case in the failing regression test.
I forget what aspect of the test case allowed it to get past the short
circuit, but it definitely got into the scan-the-track-list code.

regards, tom lane




Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-03 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Feb 11, 2016 at 9:50 AM, Robert Haas  wrote:
>> Wow, that's a fabulous idea.  I see Oleksandr has tried to implement
>> it, although I haven't looked at the patch.  But I think this would be
>> REALLY helpful.

> +1

Pushed.

regards, tom lane




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Igal @ Lucee.org

On 4/3/2016 8:21 AM, Dave Cramer wrote:


On 9 March 2016 at 20:49, Craig Ringer wrote:


On 3/8/2016 5:12 PM, Craig Ringer wrote:


Are there good reasons to use pgjdbc over pgjdbc-ng then?


Maturity, support for older versions (-ng just punts on support
for anything except new releases) and older JDBC specs,
completeness of support for some extensions. TBH I haven't done a
ton with -ng yet.


I'd like to turn this question around. Are there good reasons to use 
-ng over pgjdbc ?


As to your question, you may be interested to know that pgjdbc is more 
performant than ng.

That's good to know, but unfortunately pgjdbc is unusable for us until
https://github.com/pgjdbc/pgjdbc/issues/488 is fixed.

Also, as I mentioned in the ticket, I can't imagine RETURNING * being
performant if, for example, I INSERT a large chunk of data like image
data or an uploaded file.



Igal



Re: [HACKERS] pgbench more operators & functions

2016-04-03 Thread Fabien COELHO


Hello Simon,


Here is a simple patch...


The patch deadline has passed and we are in the last CF of 9.6, as I'm 
sure you know.


Yes, I know, and I'm OK with that; I was just putting stuff in the queue
for later, not asking for the patch to be considered right now.



Another minor patch on pgbench probably isn't going to help stabilise this
release, so these changes won't be available in core until late 2017 now.


Sure.


Given that, please save up all your desired changes to pgbench and submit
in one go nearer the next CF. Thanks.


Ok. Sorry, I did not realise that submitting stuff and recording it in a 
CF should not be done now.


Maybe you should consider not opening the September CF if this is the 
intent?


Also, what period counts as "nearer to the next CF" for sending patches
for that CF, which starts in five months?


--
Fabien.




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-04-03 Thread Dave Cramer
On 9 March 2016 at 20:49, Craig Ringer  wrote:

> On 10 March 2016 at 00:41, Igal @ Lucee.org  wrote:
>
>> On 3/8/2016 5:12 PM, Craig Ringer wrote:
>>
>>> One of the worst problems (IMO) is in the driver architecture its self.
>>> It attempts to prevent blocking by guestimating the server's send buffer
>>> state and its recv buffer state, trying to stop them filling and causing
>>> the server to block on writes. It should just avoid blocking on its own
>>> send buffer, which it can control with confidence. Or use some of Java's
>>> rather good concurrency/threading features to simultaneously consume data
>>> from the receive buffer and write to the send buffer when needed, like
>>> pgjdbc-ng does.
>>>
>>
>> Are there good reasons to use pgjdbc over pgjdbc-ng then?
>>
>>
> Maturity, support for older versions (-ng just punts on support for
> anything except new releases) and older JDBC specs, completeness of support
> for some extensions. TBH I haven't done a ton with -ng yet.
>
>
I'd like to turn this question around. Are there good reasons to use -ng
over pgjdbc ?

As to your question, you may be interested to know that pgjdbc is more
performant than ng.



Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] remove wal_level archive

2016-04-03 Thread Michael Paquier
On Mon, Mar 14, 2016 at 11:46 PM, Michael Paquier wrote:
> On Mon, Mar 14, 2016 at 12:50 PM, David Steele  wrote:
>> On 3/11/16 1:29 PM, David Steele wrote:
>>
>>> Unless anyone has objections I would like to mark this 'ready for
>>> committer'.
>>
>>
>> This patch is now ready for committer.
>
> Yes, thanks, I am cool with this version as well. I guess I should
> have done what you just did at my last review to be honest.

This patch has been committed as b555ed8, and maps wal_level =
"archive" to "hot_standby". As mentioned here, the condition to skip
checkpoints when a system is idle has already been broken for a couple of
releases when wal_level = "hot_standby":
http://www.postgresql.org/message-id/CAB7nPqT5XdZYo0rub8hyBC9CiWxB6=tsg7ffm_qbr+q4l8z...@mail.gmail.com
So now it is broken for "archive" as well.

This has been already discussed on this thread:
http://www.postgresql.org/message-id/20151016203031.3019.72...@wrigleys.postgresql.org
And there is a patch as well:
https://commitfest.postgresql.org/9/398/

As the bug discussed previously is now also a regression specific to
9.6, are there objections if I add an open item?
-- 
Michael




Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Tomas Vondra

Hi,

So, let me sum this up, the way I understand the current status.


1) overall, the patch seems to be a clear performance improvement

There are far more "green" cells than "red" ones in the spreadsheets, and
the patch often shaves off 30-75% of the sort duration. Improvements show
up pretty much across the board, for all data sets (low/high/unique
cardinality, initial ordering) and data types.



2) it's unlikely we can improve the performance further

The regressions are limited to low work_mem settings, which we believe 
are not representative (or at least not as much as the higher work_mem 
values), for two main reasons.


Firstly, if you need to sort a lot of data (e.g. 10M rows, as benchmarked),
it's quite reasonable to use larger work_mem values. It'd be a bit
backwards to reject a patch that gets you a 2-4x speedup with enough
memory on the grounds that it may have a negative impact with
unreasonably small work_mem values.


Secondly, master is faster only if there's enough on-CPU cache for the 
replacement sort (for the memtuples heap), but the benchmark is not 
realistic in this respect as it only ran 1 query at a time, so it used 
the whole cache (6MB for i5, 12MB for Xeon).


In reality there will be multiple processes running at the same time
(e.g. backends when running a parallel query), significantly reducing the
amount of cache per process, making the replacement sort inefficient and
thus eliminating the regressions (by making master slower).
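
Some back-of-the-envelope arithmetic with the cache sizes above, assuming
(purely as an example) 4 concurrent sorts:

	12 MB shared L3 (Xeon) / 4 processes = ~3 MB of cache per sort
	 6 MB shared L3 (i5)   / 4 processes = ~1.5 MB of cache per sort

Both are far below the 16MB replacement_sort_mem default, so the
replacement-selection heap would not be cache-resident in either case.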



3) replacement_sort_mem GUC

I'm not quite sure what's the plan with this GUC. It was useful for 
development, but it seems to me it's pretty difficult to tune it in 
practice (especially if you don't know the internals, which users 
generally don't).


The current patch includes the new GUC right next to work_mem, which
seems rather unfortunate - I expect users to simply mess with it,
assuming "more is better", which seems to be a rather poor idea.


So I think we should either remove the GUC entirely, or move it to the 
developer section next to trace_sort (and removing it from the conf).


I'm wondering whether 16MB default is not a bit too much, actually. As 
explained before, that's not the amount of cache we should expect per 
process, so maybe ~2-4MB would be a better default value?


Also, now that I'm re-reading the docs for the GUC, I realize it also
depends on how the input data is correlated - which seems like a rather
useless criterion for tuning, though, because it varies per sort node,
so using it for a GUC value set in postgresql.conf does not seem very
wise. Actually, even on a per-query basis that's rather dubious, as it
depends on how the sort node gets its data (some nodes preserve ordering,
some don't).


BTW couldn't we tune the value automatically for each sort, using the 
pg_stats.correlation for the sort keys, when available (increasing the 
replacement_sort_mem when correlation is close to 1.0)? Wouldn't that 
improve at least some of the regressions?
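
For instance, even something as dumb as this hypothetical heuristic might
help (all names and the cutoff here are made up, this is not a patch):

/*
 * Give replacement selection a memory budget only when ANALYZE says the
 * sort key is almost perfectly (ascending) correlated; otherwise keep
 * the heap at the base (tiny or zero) size.
 */
static int
choose_replacement_sort_mem_kb(double correlation,	/* pg_stats.correlation */
							   int base_kb,
							   int boosted_kb)
{
	if (correlation >= 0.99)
		return boosted_kb;	/* nearly sorted input may produce long runs */

	return base_kb;
}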



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-03 Thread Dilip Kumar
On Sun, Apr 3, 2016 at 2:28 PM, Amit Kapila  wrote:

>
> What is the conclusion of this test?  As far as I see, with the patch
> (0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect), the performance
> degradation is not fixed, but with pin-unpin patch, the performance seems
> to be better in most of the runs, however still you see less performance in
> some of the runs.  Is that right?
>

Summary of the runs:
-
1. Within a single run, observing TPS every 30 seconds, the number is
stable throughout that run.
2. With Head, 64-client runs vary between ~250,000 and ~450,000 TPS; you
can see the results below.

run1: 434860 (5min)
run2: 275815 (5min)
run3: 437872 (5min)
run4: 237033 (5min)
run5: 347611 (10min)
run6: 435933 (20min)

3. With Head + 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect,
with 64 clients I always saw ~450,000 TPS:

run1: 429520   (5min)
run2: 446249   (5min)
run3: 431066   (5min)
run4: 441280   (10min)
run5: 429844   (20 mins)

4. With Head + 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect,
with 128 clients performance is sometimes as low as ~150,000, which is
never observed with Head (with Head it is constantly ~350,000 TPS).

run1: 372958  (5min)
run2: 167189  (5min)
run3: 381592  (5min)
run4: 441280  (10min)
run5: 362742  (20 min)

5. With Head + pinunpin-cas-8: ~550,000 TPS with 64 clients and ~650,000
TPS with 128 clients.

6. With Head + pinunpin-cas-8 +
0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect, performance is
almost the same as with Head + pinunpin-cas-8; only sometimes is
performance at 128 clients low (~250,000 instead of ~650,000).

Head + pinunpin-cas-8 seems to give the best performance, and without
much fluctuation.



> Can you answer some of the questions asked by Andres upthread[1]?
>
>
> [1] -
> http://www.postgresql.org/message-id/20160401083518.ge9...@awork2.anarazel.de
>


Non-default parameters:

shared_buffers 8GB
max_connections 150

./pgbench -c $threads -j $threads -T 1200 -M prepared -S -P 30 postgres

BufferDesc size:

Head: 80 bytes
Head + 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect: 72 bytes
Head + 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect +
pinunpin-cas-8: 64 bytes

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Choosing parallel_degree

2016-04-03 Thread Julien Rouhaud
On 22/03/2016 07:58, Julien Rouhaud wrote:
> On 21/03/2016 20:38, Julien Rouhaud wrote:
>> On 21/03/2016 05:18, James Sewell wrote:
>>> OK cool, thanks.
>>>
>>> Can we remove the minimum size limit when the per table degree setting
>>> is applied?
>>>
>>> This would help for tables with 2  - 1000 pages combined with a high CPU
>>> cost aggregate.
>>>
>>
>> Attached v4 implements that. It also makes sure that the chosen
>> parallel_degree won't be more than the relation's estimated number of pages.
>>
> 
> And I just realized that it'd prevent forcing parallelism on a
> partitioned table; v5 attached removes the check on the estimated
> number of pages.
> 
> 

The feature freeze is now very close.  If this GUC is still wanted,
should I add this patch to the next commitfest?


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-03 Thread Amit Kapila
On Sun, Apr 3, 2016 at 9:55 AM, Dilip Kumar  wrote:

>
> On Fri, Apr 1, 2016 at 2:09 PM, Andres Freund  wrote:
>
>> One interesting thing to do would be to use -P1 during the test and see
>> how much the performance varies over time.
>>
>
> I have run with the -P option: I ran for 1200 seconds and set -P to 30
> seconds, and what I observed is that when it's low, it's low throughout
> the run, and when it's high, it's high for the complete run.
>
>
What is the conclusion of this test?  As far as I can see, with the patch
(0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect), the performance
degradation is not fixed, but with the pin-unpin patch, the performance
seems to be better in most of the runs; however, you still see less
performance in some of the runs.  Is that right?   Can you answer some of
the questions asked by Andres upthread[1]?
asked by Andres upthread[1]?


[1] -
http://www.postgresql.org/message-id/20160401083518.ge9...@awork2.anarazel.de


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pgbench more operators & functions

2016-04-03 Thread Simon Riggs
On 3 April 2016 at 06:54, Fabien COELHO  wrote:


> Here is a simple patch...


The patch deadline has passed and we are in the last CF of 9.6, as I'm sure
you know.

Another minor patch on pgbench probably isn't going to help stabilise this
release, so these changes won't be available in core until late 2017 now.

Given that, please save up all your desired changes to pgbench and submit
in one go nearer the next CF. Thanks.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Small fix: avoid passing null pointers to memcpy()

2016-04-03 Thread Piotr Stefaniak

Hello,

from running the regression test suite (including TAP tests) and also 
sqlsmith, I've got a couple of places where UBSan reported calls to 
memcpy() with null pointer passed as either source or destination.
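
As a minimal illustration of the pattern being fixed (my own toy example,
not code from the tree): passing a null pointer to memcpy() is undefined
behavior even when the length is zero, since C99 7.21.1p2 requires the
pointer arguments of the string functions to be valid, hence the guards:

#include <string.h>

static void
copy_xids(unsigned *dst, const unsigned *src, int n)
{
	/* src may legitimately be NULL when n == 0; skip memcpy entirely */
	if (n > 0)
		memcpy(dst, src, n * sizeof(unsigned));
}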


Patch attached.
diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c
index cfb3b50..600824a 100644
--- a/contrib/pgcrypto/px.c
+++ b/contrib/pgcrypto/px.c
@@ -180,7 +180,7 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen,
 		memset(ivbuf, 0, ivs);
 		if (ivlen > ivs)
 			memcpy(ivbuf, iv, ivs);
-		else
+		else if (ivlen > 0)
 			memcpy(ivbuf, iv, ivlen);
 	}
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34ba385..67c7586 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -305,7 +305,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	/*
 	 * copy the scan key, if appropriate
 	 */
-	if (key != NULL)
+	if (key != NULL && scan->rs_nkeys > 0)
 		memcpy(scan->rs_key, key, scan->rs_nkeys * sizeof(ScanKeyData));
 
 	/*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7e37331..e7599be 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -4872,8 +4872,9 @@ SerializeTransactionState(Size maxsize, char *start_address)
 	{
 		if (TransactionIdIsValid(s->transactionId))
 			workspace[i++] = s->transactionId;
-		memcpy(&workspace[i], s->childXids,
-			   s->nChildXids * sizeof(TransactionId));
+		if (s->nChildXids > 0)
+			memcpy(&workspace[i], s->childXids,
+				   s->nChildXids * sizeof(TransactionId));
 		i += s->nChildXids;
 	}
 	Assert(i == nxids);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index b88e012..dc7a323 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -402,12 +402,14 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 	CurrentSnapshot->xmax = sourcesnap->xmax;
 	CurrentSnapshot->xcnt = sourcesnap->xcnt;
 	Assert(sourcesnap->xcnt <= GetMaxSnapshotXidCount());
-	memcpy(CurrentSnapshot->xip, sourcesnap->xip,
-		   sourcesnap->xcnt * sizeof(TransactionId));
+	if (sourcesnap->xcnt > 0)
+		memcpy(CurrentSnapshot->xip, sourcesnap->xip,
+			   sourcesnap->xcnt * sizeof(TransactionId));
 	CurrentSnapshot->subxcnt = sourcesnap->subxcnt;
 	Assert(sourcesnap->subxcnt <= GetMaxSnapshotSubxidCount());
-	memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
-		   sourcesnap->subxcnt * sizeof(TransactionId));
+	if (sourcesnap->subxcnt > 0)
+		memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
+			   sourcesnap->subxcnt * sizeof(TransactionId));
 	CurrentSnapshot->suboverflowed = sourcesnap->suboverflowed;
 	CurrentSnapshot->takenDuringRecovery = sourcesnap->takenDuringRecovery;
 	/* NB: curcid should NOT be copied, it's a local matter */



Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:
> >> Well, we have to do *something* with the last (possibly only) value.
> >> Neither "include always" nor "omit always" seem sane to me.  What other
> >> decision rule do you want there?
>
> > Well, what implies that the last value is somehow special?  I would think
> > we should just do with it whatever we do with the rest of the candidate
> > MCVs.
>
> Sure, but both of the proposed decision rules break down when there are no
> values after the one under consideration.  We need to do something sane
> there.
>

Hm... There is indeed a case where it would be beneficial to have at least
2 values in the histogram (to have at least the low/high bounds for
inequality comparison selectivity) instead of taking both to the MCV list
or taking one to the MCVs and having to discard the other.

Obviously, we need a fresh idea on how to handle this.

> For "the only value" case: we cannot build a histogram out of a single
> > value, so omitting it from MCVs is not a good strategy, ISTM.
> > From my point of view that amounts to "include always".
>
> If there is only one value, it will have 100% of the samples, so it would
> get included under just about any decision rule (other than "more than
> 100% of this value plus following values").  I don't think making sure
> this case works is sufficient to get us to a reasonable rule --- it's
> a necessary case, but not a sufficient case.
>

Well, if it's the only value it will be accepted simply because we are
checking that special case already and don't even bother to loop through
the track list.

--
Alex