Re: [HACKERS] propose: detail binding error log

2016-03-14 Thread Craig Ringer
On 15 March 2016 at 10:52, Ioseph Kim  wrote:

> Hi, hackers.
>
> I had a error message while using PostgreSQL.
>
> "ERROR:  42804: column "a" is of type boolean but expression is of type
> integer at character 25
> LOCATION:  transformAssignedExpr, parse_target.c:529"
>
> This error is a java jdbc binding error.
> column type is boolean but bind variable is integer.
>
> I want see that value of bind variable at a server log.
>

log_statement = 'all' will log bind var values, but only when the statement
actually gets executed.

This is an error in parsing or parameter binding, before we execute the
statement. It's a type error and not related to the actual value of the
bind variable - you could put anything in the variable and you would get
the same error.

PostgreSQL is complaining that you bound an integer variable and tried to
insert it into a boolean column. There is no implicit cast from integer to
boolean, so that's an error. It doesn't care if the integer is 1, 42, or
null, since this is a type error. There's no need to log the value since
it's irrelevant.

Observe:

postgres=# create table demo(col boolean);
CREATE TABLE

postgres=# prepare my_insert(boolean) AS insert into demo(col) values ($1);
PREPARE

postgres=# prepare my_insertint(integer) AS insert into demo(col) values
($1);
ERROR:  column "col" is of type boolean but expression is of type integer
LINE 1: ... my_insertint(integer) AS insert into demo(col) values ($1);
   ^
HINT:  You will need to rewrite or cast the expression.


As you see, the error is at PREPARE time, when we parse and validate the
statement, before we bind parameters to it. You can get the same effect
without prepared statements by specifying the type of a literal explicitly:

postgres=# insert into demo(col) values ('1'::integer);
ERROR:  column "col" is of type boolean but expression is of type integer
LINE 1: insert into demo(col) values ('1'::integer);
  ^
HINT:  You will need to rewrite or cast the expression.


At the time PostgreSQL parses the statement it doesn't know the parameter
values yet, because PgJDBC hasn't sent them to it. It  cannot log them even
if they mattered, which they don't.
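
If the client genuinely needs to send an integer there, the fix belongs on the
client side - bind a boolean (pstmt.setBoolean(1, true) in JDBC) or cast
explicitly in the SQL. A quick sketch, untested here but it should work since
integer-to-boolean is a valid explicit cast, using the demo table from above:

postgres=# prepare my_insert_cast(integer) AS insert into demo(col) values ($1::boolean);
PREPARE

postgres=# execute my_insert_cast(1);
INSERT 0 1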

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-14 Thread Amit Kapila
On Tue, Mar 15, 2016 at 5:22 AM, Robert Haas  wrote:
>
> On Sat, Mar 12, 2016 at 1:58 AM, Amit Kapila 
wrote:
> > Yeah, that makes the addition of test for this functionality difficult.
> > Robert, do you have any idea what kind of test would have caught this
issue?
>
> Yep.  Committed with that test:
>

Nice.  Thanks!

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] propose: detail binding error log

2016-03-14 Thread Ioseph Kim
Thanks for the reply.

The value of log_statement is already 'all'.
I also set log_min_messages = debug5, log_error_verbosity = verbose and
debug_print_parse = on,
but I still could not see the client's value in the server log.

This case occurs only with a JDBC prepared statement and a wrong-type
binding.

Regards, Ioseph.

 
2016-03-14 (월), 23:06 -0400, Tom Lane:
> Ioseph Kim  writes:
> > I want see that value of bind variable at a server log.
> 
> That's available if you turn on log_statements, IIRC.
> 
>   regards, tom lane
> 
> 






Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2016-03-14 Thread Craig Ringer
On 11 March 2016 at 22:24, Petr Jelinek  wrote:

> On 02/03/16 08:05, Craig Ringer wrote:
>
>> On 1 March 2016 at 05:30, Petr Jelinek > > wrote:
>>
>

>
>>
>> I wonder if it would be acceptable to create new info flag for
>> RM_SEQ_ID that would behave just like XLOG_SEQ_LOG but would be used
>> only for the nontransactional updates (nextval) so that decoding
>> could easily differentiate between transactional and
>> non-transactional update of sequence and then just either call the
>> callback immediately or add the change to reorder buffer based on
>> that. The redo code could just have simple OR expression to behave
>> same with both of the info flags.
>>
>>
>> That's much cleaner than trying to keep track of sequence creations and
>> really pretty harmless. I'll give that a go and see how it looks.
>>
>> Seems like simpler solution than building all the tracking code on
>> the decoding side to me.
>>
>>
>> +1
>>
>>
> Except this won't work for sequences that have been created in same
> transaction as the nextval()/setval() was called because in those cases we
> don't want to decode the advancement of sequence until the end of
> transaction and we can't map the relfilenode to sequence without going
> through reorder buffer in those cases either
>


I'll explain this a bit (for when I forget all about it and come back to it
confused, or if someone else picks this up):

The issue is transactions like

BEGIN;
CREATE TABLE blah (id serial primary key, something text);
INSERT INTO blah (something) SELECT  ;
COMMIT;

Here we create the sequence, then we advance the sequence in subsequent
statements that're part of the same xact but not directly connected to the
sequence creation. There's no convenient way to tell, when we see the
Form_pg_sequence updates in WAL for the newly created sequence, that it's
for a not-yet-committed xact so we shouldn't send the advance to the client
yet.

Once the xact that created the sequence commits we have to make sure we
send its latest state, not the initial state when it was created. So with
the above proposal we'd still need to look up those new-info-flagged
entries against a map of uncommitted sequences by relfilenode and decide
whether to send it immediately or update the latest state of an uncommitted
sequence in a reorder buffer.

IOW we have to do pretty much what I described before. We can still log
sequence updates with a different info flag but we need to know how to
associate the record with the xact that created it, so we have to log the
creating xid in the record for the initial state of a newly created
sequence. At least that'd be less ugly than trying to peek at decoded
catalog updates in the reorder buffer to spot new sequence creation and can
be done only when wal_level = logical, but it'd mean that the two record
types were different in more than just the info flag.
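
To sketch what that might look like (hypothetical; only xl_seq_rec's existing
shape and XLOG_SEQ_LOG are real, the new flag and field names are made up):

#define XLOG_SEQ_LOG            0x00    /* existing record type */
#define XLOG_SEQ_LOG_NONTRANS   0x10    /* hypothetical new info flag */

typedef struct xl_seq_rec_nontrans
{
    RelFileNode     node;        /* sequence's relfilenode, as today */
    TransactionId   created_xid; /* xid that created the sequence, or
                                  * InvalidTransactionId if it was created
                                  * by an already-committed xact */
    /* SEQUENCE TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_seq_rec_nontrans;

Decoding could then route the change into the reorder buffer when created_xid
belongs to an in-progress transaction, and invoke the callback immediately
otherwise.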

The other wrinkle Petr refers to is that when decoding XLOG_SEQ_LOG we only
have a relfilenode. We don't know the oid of the sequence, which we need to
look up its name. The reorder buffer code uses RelidByRelfilenode for that,
which requires a snapshot. I'm not sure what problem that poses, since we'd
obviously need a snapshot set up to look up the name by oid anyway and we'd
be using the most recently committed historic snapshot for both.

Anyway, this is still complicated because of the mess with sequences being
both transactional and not-transactional in ways that rely on how the low
level storage and WAL works.

Unfortunately I don't expect to have time to produce a new patch for 9.6.

(BTW, I'd be interested in seeing what code breaks if we introduced a
compile option in src/include/pg_config_manual.h to force oid and
relfilenode randomization rather than starting off with them being the
same.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-14 Thread Amit Langote
On 2016/03/15 3:41, Robert Haas wrote:
> On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote  wrote:
>> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
>> which AMs should call for every block that's read (say, right before a
>> call to ReadBufferExtended) as part of a given vacuum run. The
>> callback with help of some bookkeeping state can count each block and
>> report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
>> blocks for every vacuum or if it's possible that some blocks are read
>> more than once in single vacuum, etc.  IOW, some AM's processing may
>> be non-linear and counting blocks 1..N (where N is reported total
>> index blocks) may not be possible.  However, this is the best I could
>> think of as doing what we are trying to do here. Maybe index AM
>> experts can chime in on that.
>>
>> Thoughts?
> 
> Well, I think you need to study the index AMs and figure this out.

OK.  I tried to put calls to the callback in appropriate places, but
couldn't get the resulting progress numbers to look sane.  So I ended up
concluding that any attempt to do so is futile unless I analyze each AM's
vacuum code carefully to be able to determine in advance the max bound on
the count of blocks that the callback will report.  Anyway, as you
suggest, we can improve it later.

> But I think for starters you should write a patch that reports the following:
> 
> 1. phase
> 2. number of heap blocks scanned
> 3. number of heap blocks vacuumed
> 4. number of completed index vac cycles
> 5. number of dead tuples collected since the last index vac cycle
> 6. number of dead tuples that we can store before needing to perform
> an index vac cycle
> 
> All of that should be pretty straightforward, and then we'd have
> something we can ship.  We can add the detailed index reporting later,
> when we get to it, perhaps for 9.7.

OK, I agree with this plan.  Attached updated patch implements this.

Thanks,
Amit
From 0f830273cb2afb528b8b9ed2dcbaf136d4c3e64f Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 15 Mar 2016 10:14:33 +0900
Subject: [PATCH] WIP: Implement progress reporting for VACUUM command.

This basically utilizes the pgstat_progress* API to report a handful of
parameters to indicate its progress.  lazy_vacuum_rel() has already been
taught in b6fb6471 to report command start and end so that it's listed
in pg_stat_get_progress_info('VACUUM'). This commit makes lazy_scan_heap
report the following parameters: vacuum phase, total number of heap blocks,
number of heap blocks vacuumed, dead tuples found since the last index vacuum
cycle, and dead tuple slots left to fill until the next index vacuum run.
The following phases are identified and reported whenever one changes to
another: 'scanning heap', 'vacuuming indexes', 'vacuuming heap', and
finally 'cleanup'.

A view named pg_stat_progress_vacuum has been added with the following
columns: pid (int), datid (oid), datname (name), relid (oid),
vacuum_phase (text), heap_blks_total, heap_blks_vacuumed,
index_vacuum_count, dead_tup_found, dead_tup_slots (all bigint),
percent_done (numeric).
---
 doc/src/sgml/monitoring.sgml |  114 ++
 src/backend/catalog/system_views.sql |   25 
 src/backend/commands/vacuumlazy.c|   68 
 src/test/regress/expected/rules.out  |   22 +++
 4 files changed, 229 insertions(+), 0 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ec5328e..e5ffa10 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml

[The SGML hunks are garbled and truncated in the archive.  They add
pg_stat_progress_vacuum to the statistics-views table ("One row for each
backend, including autovacuum worker processes, running VACUUM, showing
current progress in terms of heap pages it has finished processing") and
document the view's columns: pid (process ID of backend), datid and datname
(database), relid (OID of the table being vacuumed), and vacuum_phase with
possible values 'scanning heap', 'vacuuming indexes' and 'vacuuming heap';
the remaining columns and the rest of the patch are cut off here.]
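
As a usage sketch (my own illustration based on the columns listed in the
commit message above, not part of the patch), a running vacuum could then be
watched with something like:

SELECT pid, relid::regclass AS table_name, vacuum_phase,
       heap_blks_vacuumed, heap_blks_total, percent_done
FROM pg_stat_progress_vacuum;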

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-14 Thread Amit Kapila
On Tue, Mar 15, 2016 at 12:00 AM, David Steele  wrote:
>
> On 2/26/16 11:37 PM, Amit Kapila wrote:
>
>> On Sat, Feb 27, 2016 at 10:03 AM, Amit Kapila >
>> Here, we can see that there is a gain of ~15% to ~38% at higher
>> client count.
>>
>> The attached document (perf_write_clogcontrollock_data_v6.ods)
>> contains data, mainly focussing on single client performance.  The
>> data is for multiple runs on different machines, so I thought it is
>> better to present in form of document rather than dumping everything
>> in e-mail.  Do let me know if there is any confusion in
>> understanding/interpreting the data.
>>
>> Forgot to mention that all these tests have been done by
>> reverting commit-ac1d794.
>
>
> This patch no longer applies cleanly:
>
> $ git apply ../other/group_update_clog_v6.patch
> error: patch failed: src/backend/storage/lmgr/proc.c:404
> error: src/backend/storage/lmgr/proc.c: patch does not apply
> error: patch failed: src/include/storage/proc.h:152
> error: src/include/storage/proc.h: patch does not apply
>

For me it works with patch -p1, but anyhow I have
updated the patch based on a recent commit.  Can you please check the latest
patch and see if it applies cleanly for you now?

>
> It's not clear to me whether Robert has completed a review of this code
or it still needs to be reviewed more comprehensively.
>
> Other than a comment that needs to be fixed it seems that all questions
have been answered by Amit.
>

I have updated the comments and changed the name of one of the variables from
"all_trans_same_page" to "all_xact_same_page", as pointed out offlist by
Alvaro.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v7.patch
Description: Binary data



Re: [HACKERS] Prepared Statement support for Parallel query

2016-03-14 Thread Amit Kapila
On Tue, Mar 15, 2016 at 12:21 AM, Robert Haas  wrote:
>
> On Mon, Mar 14, 2016 at 9:18 AM, Amit Kapila 
wrote:
> > On Fri, Feb 26, 2016 at 4:37 PM, Robert Haas 
wrote:
> >
> >
> > The failure cases fall into that category, basically
wholePlanParallelSafe
> > will be false, but parallelModeNeeded will be true which will enable
> > parallel mode restrictions even though the plan won't contain Gather
node.
> > I think if we want to operate in above way for testing purpose, then we
need
> > to force during execution that statements for non read-only operations
> > should not enter into parallel mode similar to what we are doing for
> > non-zero tuple count case.  Attached patch fixes the problem.
>
> This seems like a really ugly fix.  It might be possible to come up
> with a fix along these lines, but I don't have much confidence in the
> specific new test you've injected into executePlan().  Broadly, any
> change of this type implicitly changes the contract between
> executePlan() and the planner infrastructure - the planner can now
> legally generate parallel plans in some cases where that would
> previously have been unacceptable.  But I'm not in a hurry to rethink
> where we've drawn the line there for 9.6.  Let's punt this issue for
> now and come back to it in a future release.
>

No issues.  I felt that it might be good to support parallel query via
prepared statements as there is no fundamental issue with doing so, but as
you say, we can do that in a future release as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] jsonb array-style subscription

2016-03-14 Thread Peter Geoghegan
On Thu, Mar 3, 2016 at 2:31 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Well, actually, I agree with that. I can try to rework the patch to achieve
> this goal.

Good idea.

I wonder, having taken a quick look at the patch, how this works?:

+select * from test_jsonb_subscript where
test_json['key_doesnt_exists'] = '"value"';
+ id | test_json
++---
+(0 rows)

Can this use an index, in principle? If not, do you have a plan to get
it to a place where it can? How can the expression be made to map on
to existing indexable operators, such as the containment operator @>,
or even B-Tree opclass operators like = or <=, for example?
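
For instance (a sketch, using the test_jsonb_subscript table from the patch's
regression tests), the roughly equivalent containment predicate is what's
indexable today with the default GIN opclass:

create index on test_jsonb_subscript using gin (test_json);
select * from test_jsonb_subscript
where test_json @> '{"key_doesnt_exists": "value"}';

The open question is whether the subscripting form could be taught to map onto
something like that.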

This kind of thing was always my main problem with jsonb array-style
subscription. I think it's really quite desirable in theory, but I
also think that these problems need to be fixed first. I think that I
made this point before.

ISTM that these expressions need to be indexable in some way, which
seems like a significantly harder project, especially because the
mapping between an expression in a predicate like this and an
indexable operator like @> is completely non-obvious. Making such a
mapping itself extensible seems even more tricky, which is what it
would take, I suspect. Indexing is always of great importance for
jsonb. It's already too complicated.

-- 
Peter Geoghegan




Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-03-14 Thread Craig Ringer
On 15 March 2016 at 04:48, Andres Freund  wrote:

> On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> > On 29 January 2016 at 18:16, Andres Freund  wrote:
> >
> > > Hi,
> > >
> > > so, I'm reviewing the output of:
> > >
> >
> > Thankyou very much for the review.
>
> Afaics you've not posted an updated version of this? Any chance you
> could?
>

I'll try to get to it soon, yeah. I have been focusing on things that
cannot exist as extensions, especially timeline following for logical slots
and failover slots.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Timeline following for logical slots

2016-03-14 Thread Craig Ringer
On 15 March 2016 at 07:10, Alvaro Herrera  wrote:

> Petr Jelinek wrote:
> > On 04/03/16 17:08, Craig Ringer wrote:
> > >I'd really appreciate some review of the logic there by people who know
> > >timelines well and preferably know the xlogreader. It's really just one
> > >function and 2/3 comments; the code is simple but the reasoning leading
> > >to it is not.
> >
> > I think this will have to be work for committer at this point. I can't
> find
> > any flaws in the logic myself so I unless somebody protests I am going to
> > mark this as ready for committer.
>
> Great, thanks.  I've studied this to the point where I'm confident that
> it makes sense, so I'm about to push it.
>

Thanks, though I'm happy enough to wait for Andres's input as he requested.


> Also, hopefully you don't have any future callers for the functions I
> marked static (I checked the failover slots series and couldn't see
> anything).  I will push this early tomorrow.
>

I don't see it being overly useful for an extension, so that sounds fine.


> One thing I'm not quite sure about is why this is said to apply to
> "logical slots" and not just to replication slots in general.  In what
> sense does it not apply to physical replication slots?
>

 It's only useful for logical slots currently because physical slot
timeline following is done client side by the walreceiver. When the
walsender hits the end of the timeline it sends CopyDone and returns to
command mode. The client is expected to request the timeline history file,
parse it, work out which timeline to request next and specify that in its
next START_REPLICATION call.
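
Roughly, the client-side dance for physical replication looks like this (a
sketch of the protocol commands involved, from memory, with made-up slot name
and positions): after the walsender sends CopyDone at the end of timeline 1
and drops back to command mode, the client issues something like

    TIMELINE_HISTORY 2;
    START_REPLICATION SLOT "some_slot" PHYSICAL 0/3000000 TIMELINE 2;

and carries on streaming on the new timeline.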

None of that happens for logical slots. Andres was quite deliberate about
keeping timelines out of the interface for logical slots and logical
replication. I tried to retain that when adding the ability to follow
physical failover, keeping things simple for the client. Given that logical
replication slots are intended to be consumed by more than just a postgres
downstream, it IMO makes sense to keep as much internal complexity hidden as possible,
especially when dealing with something as surprisingly convoluted as
timelines.

This does mean that we can't tell if we replayed past the timeline switch
point on the old master before we switched to the new master, but I don't
think sending a timeline history file is the solution there. I'd rather
expose a walsender command and SQL function to let the client say, after
connecting, "I last replayed from timeline X up to point Y; if I start
replaying from there, will I get a consistent stream?". That can be done
separately to this patch and IMO isn't crucial since clients can currently
work it out, just with more difficulty.

Maybe physical rep can use the same logic, but I'm really not convinced. It
already knows how to follow timelines client-side and handles that as part
of walreceiver and archive recovery. Because recovery handles following the
timeline switches and walreceiver just fetches the WAL as an alternative to
archive fetches I'm not sure it makes sense; we still have to have all that
logic for archive recovery from restore_command, so doing it differently
for streaming just makes things more complicated for no clear benefit.

I certainly didn't want to address it in the same patch, much like I
intentionally avoided trying to teach pg_xlogdump to be able to follow
timelines. Partly to keep it simpler and focused, partly because it'd
require timeline.c to be made frontend-clean which means no List, no elog,
etc...



>
> (I noticed that we have this new line,
> -   randAccess = true;
> +   randAccess = true;  /* allow readPageTLI to go
> backwards */
> in which now the comment is an identical copy of an identical line
> elsewhere; however, randAccess doesn't actually affect readPageTLI in
> any way, so AFAICS both comments are now wrong.)
>

Yeah, that's completely bogus. I have a clear image in my head of
randAccess being tested by ValidXLogPageHeader where it sanity
checks state->latestPageTLI, the TLI of the *prior* page, against the TLI
of the most recent page read, to make sure the TLI advances. But nope, I'm
apparently imagining things 'cos it just isn't there, it just tests to see
if we read a LSN later than the prior page instead.


> > Well for testing purposes it's quite fine I think. The TAP framework
> > enhancements needed for this are now in and it works correctly against
> > current master.
>
> Certainly the new src/test/recovery/t/006whatever.pl file is going to be
> part of the commit.  What I'm not so sure about is
> src/test/modules/decoding_failover: is there any reason we don't just
> put the new functions in contrib/test_decoding?


Because contrib/test_decoding isn't an extension. It is a sample logical
decoding output plugin.  I didn't want to mess it up by adding an extension
control file, extension script and some extension functions. Sure, you
*can* use the same shared library as an extension and 

Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-14 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 8:19 AM, Anastasia Lubennikova
 wrote:
> But I have some concerns about compatibility with my patches.
> I've tried to call bt_index_check() over my "including" patch [1] and caught
> a segfault.
>
> LOG:  server process (PID 31794) was terminated by signal 11: Segmentation
> fault
> DETAIL:  Failed process was running: select bt_index_check('idx');
>
> I do hope that my patch will be accepted in 9.6, so this conflict looks
> really bad.
> I think that error is caused by changes in pages layout. To save some space
> nonkey attributes are truncated

> [1] https://commitfest.postgresql.org/9/433/

I posted a review of your "Covering + unique indexes" patch, where I
made an educated guess about what the problem is here (I sort of
hinted at what I thought it was already, in this thread, actually). I
haven't actually tested this theory of mine myself just yet, but let
me know what you think of it on the thread for your patch.

-- 
Peter Geoghegan




Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-14 Thread Kouhei Kaigai
Petr,

The attached patch is the revised one that follows the new extensible-
node routine.

It is almost the same as the previous version except for:
- custom-apis.[ch] was renamed to custom-node.[ch]
- the check for the length of the custom-scan-method name now follows
  the manner of RegisterExtensibleNodeMethods()

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Tuesday, March 15, 2016 2:54 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Petr Jelinek; pgsql-hackers@postgresql.org
> Subject: ##freemail## Re: [HACKERS] Reworks of CustomScan
> serialization/deserialization
> 
> On Sun, Mar 13, 2016 at 9:53 PM, Kouhei Kaigai  wrote:
> > OK, I split the previous small patch into two tiny patches.
> > The one is bugfix around max length of the extnodename.
> > The other replaces Assert() by ereport() according to the upthread 
> > discussion.
> 
> Committed, except that (1) I replaced ereport() with elog(), because I
> can't see making translators care about this message; and (2) I
> reworded the error message a bit.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


pgsql-v9.6-custom-scan-serialization-reworks.2.patch
Description: pgsql-v9.6-custom-scan-serialization-reworks.2.patch



Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 8:43 PM, Peter Geoghegan  wrote:
> * I think the comparison logic may have a bug.
>
> Does this work with amcheck? Maybe it works with bt_index_check(), but
> not bt_index_parent_check()? I think that you need to make sure that
> _bt_compare() knows about this, too. That's because it isn't good
> enough to let a truncated internal IndexTuple compare equal to a
> scankey when non-truncated attributes are equal. I think you need to
> have an imaginary "minus infinity" attribute past the first
> non-truncated attribute (i.e. "minus infinity value" for the first
> *truncated* attribute). That way, the downlinks will always be lower
> bounds when the non-"included"/truncated attributes are involved. This
> seems necessary. No?

Oh, BTW: You probably need to worry about high key items as a special
case, too. Note that there is a special case when the ScanKey is equal
to the high key on a page during insertion. As the nbtree README puts
it:

"""
An insertion that sees the high key of its target page is equal to the key
to be inserted has a choice whether or not to move right, since the new
key could go on either page.  (Currently, we try to find a page where
there is room for the new key without a split.)

"""

Just something to watch out for if you add "minus infinity" attributes
as I suggested. Not exactly sure what to do about this other problem,
but it seems manageable.

-- 
Peter Geoghegan




Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 8:43 PM, Peter Geoghegan  wrote:
>
> Does this work with amcheck? Maybe it works with bt_index_check(), but
> not bt_index_parent_check()? I think that you need to make sure that
> _bt_compare() knows about this, too. That's because it isn't good
> enough to let a truncated internal IndexTuple compare equal to a
> scankey when non-truncated attributes are equal. I think you need to
> have an imaginary "minus infinity" attribute past the first
> non-truncated attribute (i.e. "minus infinity value" for the first
> *truncated* attribute). That way, the downlinks will always be lower
> bounds when the non-"included"/truncated attributes are involved. This
> seems necessary. No?

Maybe you can store information about minus infinity attributes in
"itup->t_tid.ip_posid". As you know, this is unused within
internal/non-leaf pages, whose downlink items only need a block number
(the child's block number/location on disk for that particular
downlink). That's a bit ugly, but there are plenty of bits available
from there, so use them if you need them.
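
A hedged sketch of what I mean (names made up; only the ItemPointer macros
and types are real):

/* On internal pages only the block number of t_tid matters for a downlink,
 * so the offset-number bits are free to carry e.g. the number of attributes
 * actually present in a truncated tuple. */
#define BT_HYPOTHETICAL_NATTS_MASK  0x0FFF

static inline void
bt_set_tuple_natts(IndexTuple itup, int natts)
{
    ItemPointerSetOffsetNumber(&itup->t_tid,
                               (OffsetNumber) (natts & BT_HYPOTHETICAL_NATTS_MASK));
}

static inline int
bt_get_tuple_natts(IndexTuple itup)
{
    return ItemPointerGetOffsetNumber(&itup->t_tid) & BT_HYPOTHETICAL_NATTS_MASK;
}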

-- 
Peter Geoghegan




Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2016-03-14 Thread Peter Geoghegan
On Wed, Sep 16, 2015 at 8:53 AM, Nicolas Barbier
 wrote:
> After thinking about it a bit more, it indeed seems never useful to
> have f3 in the internal nodes if it is not part of the columns that
> determine the UNIQUE property. It could as well be pushed out of the
> internal nodes and only appear in the leaf nodes.

Correct. That's a standard technique in B-Tree implementations like
our own; suffix truncation can remove unneeded information from the
end of values, possibly including entire attributes, which can be
truncated in a way that is similar to this patch.

The difference here is only that this patch does not dynamically
determine which attributes can be removed while re-finding the parent
downlink in the second phase of a page split (the usual place it
happens with standard suffix truncation). Rather, this patch knows "a
priori" that it can truncate attributes that are merely "included"
attributes. That means that this patch has as much to do with
increasing B-Tree fan-out for these indexes as it does for making
unique indexes more usable for index-only scans. Both of those goals
are important, IMV.

This patch seems pretty cool. I noticed some issues following a quick
read through the patch including_columns_6.0.patch that Anastasia
should look into:

* You truncate (remove suffix attributes -- the "included" attributes)
within _bt_insertonpg():

-   right_item = CopyIndexTuple(item);
+   indnatts = IndexRelationGetNumberOfAttributes(rel);
+   indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
+   if (indnatts != indnkeyatts)
+   {
+   right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts);
+   right_item_sz = IndexTupleDSize(*right_item);
+   right_item_sz = MAXALIGN(right_item_sz);
+   }
+   else
+   right_item = CopyIndexTuple(item);
ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff
the original target page is known to be a leaf page. That's where it
needs to happen for conventional suffix truncation, which has special
considerations when determining which attributes are safe to truncate
(or even which byte in the first distinguishing attribute it is okay
to truncate past). Conventional suffix truncation (not this patch)
could truncate, say, "C" collation text past the first distinguishing
byte, where the byte distinguishes the new downlink being inserted
into the parent page compared to both the left downlink and right
downlink in the parent page-- the minimum amount of information to
correctly guide later index scans is only stored. But it isn't correct
(again, with conventional suffix truncation) to do this passed the
leaf level. It must end there.

It isn't safe past the leaf level (by which I mean when inserting a
downlink into its parent, one level up) because applying suffix
truncation again for the next level up might guide a search to the
highest node in the left sub-tree rather than to the lowest node in
the right sub-tree, or vice versa. In general, we must be careful
about "cousin" nodes, that are beside each other but are not
"siblings" due to not sharing the same parent. It doesn't really
matter that this restriction exists, because you get almost all the
benefit at the leaf -> immediate parent level anyway. Higher levels
will reuse already truncated Index Tuples, that are typically
"truncated enough".

So, this should work in a similar way to conventional suffix
truncation (BTW, you should work on that later). And so, it should
just do it there. Besides, checking it only where it could possibly
help is clearer, since as written the code in _bt_insertonpg() will
never need to truncate following a non-leaf/internal page split.

* I think the comparison logic may have a bug.

Does this work with amcheck? Maybe it works with bt_index_check(), but
not bt_index_parent_check()? I think that you need to make sure that
_bt_compare() knows about this, too. That's because it isn't good
enough to let a truncated internal IndexTuple compare equal to a
scankey when non-truncated attributes are equal. I think you need to
have an imaginary "minus infinity" attribute past the first
non-truncated attribute (i.e. "minus infinity value" for the first
*truncated* attribute). That way, the downlinks will always be lower
bounds when the non-"included"/truncated attributes are involved. This
seems necessary. No?

It's necessary because you aren't storing any attributes, so it's not
acceptable to even attempt a comparison -- I think that will segfault
(doesn't matter that the index scan wouldn't have returned anything
anyway). I think it's also necessary because of  issues with "cousin"
nodes making index scans lose their way.

That's all I have right now. Nice work.
-- 
Peter Geoghegan




Re: [HACKERS] Identifying a message in emit_log_hook.

2016-03-14 Thread Kyotaro HORIGUCHI
Thank you for picking this up.

At Thu, 10 Mar 2016 08:14:09 -0500, David Steele  wrote in 
<56e17321.5050...@pgmasters.net>
> Hi Simon,
> 
> On 3/10/16 7:26 AM, Simon Riggs wrote:
> >
> > Can you add this to the CF? It was submitted before deadline.
> > I presume you have access to do that?
> 
> No problem - here it is:
> 
> https://commitfest.postgresql.org/9/576/

Some kind of hash code could be added later as a shortcut, but this is
usable even after that is added.

One arguable point I see now is whether ids for the message type
"message" alone would be enough, or whether they are also needed for some
other types such as "details". This is quite straightforward, so I see no
other arguable point beyond the code itself.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Choosing parallel_degree

2016-03-14 Thread David Rowley
On 15 March 2016 at 15:24, James Sewell  wrote:
>
> I did want to test with some really slow aggs, but even when I take out the 
> small table test in create_parallel_paths I can't seem to get a parallel plan 
> for a tiny table. Any idea on why this would be David?

In the test program I attached to the previous email, if I change the
parallel_threshold = 1000; to be parallel_threshold = 1; then I get
the following output:

For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
For 4 pages there will be 2 workers (rel size 0 MB, 0 GB)

So I'm getting 2 workers for only 4 pages. I've not tested in
Postgres, but if you do this and: SET parallel_setup_cost = 0; then
I'd imagine it should generate a parallel plan.
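
Something along these lines (untested sketch; tiny_table stands in for
whatever small table you're testing, and parallel_tuple_cost may or may not
need lowering too):

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN SELECT * FROM tiny_table;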

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] propose: detail binding error log

2016-03-14 Thread Tom Lane
Ioseph Kim  writes:
> I want see that value of bind variable at a server log.

That's available if you turn on log_statements, IIRC.

regards, tom lane




[HACKERS] propose: detail binding error log

2016-03-14 Thread Ioseph Kim
Hi, hackers.

I had an error message while using PostgreSQL.

"ERROR:  42804: column "a" is of type boolean but expression is of type
integer at character 25
LOCATION:  transformAssignedExpr, parse_target.c:529"

This is a Java JDBC binding error:
the column type is boolean but the bind variable is an integer.

I want to see the value of the bind variable in the server log.

java code:
pstmt = connection.prepareStatement("insert into test values (?)");
pstmt.setInt(1, 1);

I could not see that value in the server log by changing any server
configuration.

That case can only be debugged on the client.

So, I propose that the value of the bind variable be displayed in the
server log when such an error occurs.

In parse_target.c:529, the input parameter node does not contain the
value that was bound by the client.

I want a more detailed error log.

Regards, Ioseph Kim





Re: [HACKERS] Minor bug affecting ON CONFLICT lock wait log messages

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 7, 2016 at 1:46 PM, Peter Geoghegan  wrote:
> Attached patch fixes a bug reported privately by Stephen this morning.

Bump.

I would like to see this in the next point release. It shouldn't be
hard to review.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] Choosing parallel_degree

2016-03-14 Thread James Sewell
Thanks David,

Eventually it would be great to take into account the cost of the function
doing the agg (pg_proc.procost, which is expressed in multiples of cpu_operator_cost).

This would allow people to mark specific aggregations as needing more CPU
power, therefore needing more workers per page (or should it be tuple in
this case?).
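
For example (a sketch of existing knobs, not new functionality): the support
function behind an expensive aggregate can already be marked costly, it's just
that the parallel-degree choice doesn't look at it yet; the function name and
signature below are made up:

ALTER FUNCTION my_expensive_transfn(internal, geometry) COST 10000;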

In the meantime some way to manually influence this would be good. I just
did some testing (on an 8VCPU machine) with a 139MB table, which gets 3
workers currently.

For a count(*) I get a time of 131.754 ms. If I increase this to 8 workers
I get around 86.193 ms.

Obviously this doesn't mean much as YMMV - but it does show that the
ability to manually adjust the scaling would be great, especially in
data warehouse or reporting environments.

I did want to test with some really slow aggs, but even when I take out the
small table test in create_parallel_paths I can't seem to get a parallel
plan for a tiny table. Any idea on why this would be David?


Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Tue, Mar 15, 2016 at 12:25 PM, David Rowley  wrote:

> Over in [1] James mentioned about wanting more to be able to have more
> influence over the partial path's parallel_degree decision.  At risk
> of a discussion on that hijacking the parallel aggregate thread, I
> thought I'd start this for anyone who would want to discuss making
> changes to that.
>
> I've attached a simple C program which shows the parallel_degree which
> will be chosen at the moment. For now it's based on the size of the
> base relation. Perhaps that will need to be rethought later, perhaps
> based on costs. But I just don't think it's something for 9.6.
>
> Here's the output of the C program.
>
> For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
> For 3001 pages there will be 2 workers (rel size 23 MB, 0 GB)
> For 9001 pages there will be 3 workers (rel size 70 MB, 0 GB)
> For 27001 pages there will be 4 workers (rel size 210 MB, 0 GB)
> For 81001 pages there will be 5 workers (rel size 632 MB, 0 GB)
> For 243001 pages there will be 6 workers (rel size 1898 MB, 1 GB)
> For 729001 pages there will be 7 workers (rel size 5695 MB, 5 GB)
> For 2187001 pages there will be 8 workers (rel size 17085 MB, 16 GB)
> For 6561001 pages there will be 9 workers (rel size 51257 MB, 50 GB)
> For 19683001 pages there will be 10 workers (rel size 153773 MB, 150 GB)
> For 59049001 pages there will be 11 workers (rel size 461320 MB, 450 GB)
> For 177147001 pages there will be 12 workers (rel size 1383960 MB, 1351 GB)
> For 531441001 pages there will be 13 workers (rel size 4151882 MB, 4054 GB)
> For 1594323001 pages there will be 14 workers (rel size 12455648 MB, 12163
> GB)
>
> [1]
> http://www.postgresql.org/message-id/CANkGpBtUvzpdvF2=_iq64ujmvrpycs6d4i9-wepbusq1sq+...@mail.gmail.com
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-14 Thread Noah Misch
On Mon, Mar 14, 2016 at 01:33:08PM +0100, Tomas Vondra wrote:
> On 03/14/2016 07:14 AM, Noah Misch wrote:
> >On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
> >>+* XXX Maybe this should also care about the clock skew, just like the
> >>+* block a few lines down.
> >
> >Yes, it should.  (The problem is large (>~100s), backward clock resets, not
> >skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
> >will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
> >cases need the correction a few lines down.
> 
> I'll look into that. I have to admit I have a hard time reasoning about the
> code handling clock skew, so it might take some time, though.

No hurry; it would be no problem to delay this several months.

> >The other thing needed here is to look through and update comments about
> >last_statrequests.  For example, this loop is dead code due to us writing
> >files as soon as we receive one inquiry:
> >
> > /*
> >  * Find the last write request for this DB.  If it's older than the
> >  * request's cutoff time, update it; otherwise there's nothing to do.
> >  *
> >  * Note that if a request is found, we return early and skip the below
> >  * check for clock skew.  This is okay, since the only way for a DB
> >  * request to be present in the list is that we have been here since the
> >  * last write round.
> >  */
> > slist_foreach(iter, _statrequests) ...
> >
> >I'm okay keeping the dead code for future flexibility, but the comment should
> >reflect that.
> 
> Yes, that's another thing that I'd like to look into. Essentially the
> problem is that we always process the inquiries one by one, so we never
> actually see a list with more than a single element. Correct?

Correct.

> I think the best way to address that is to first check how much
> data is in the UDP queue, and then fetch all of that before actually
> doing the writes. Peeking at the number of requests first (or even some
> reasonable hard-coded limit) should prevent starving the inquirers in
> case of a steady stream of inquiries.

Now that you mention it, a hard-coded limit sounds good: write the files for
pending inquiries whenever the socket empties or every N messages processed,
whichever comes first.  I don't think the amount of pending UDP data is
portably available, and I doubt it's important.  Breaking every, say, one
thousand messages will make the collector predictably responsive to inquiries,
and that is the important property.
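
In rough pseudocode (a sketch of the intended control flow only, not the
actual pgstat.c structure; all helper names are made up):

#define WRITE_EVERY_N_MESSAGES 1000

int msgs_since_write = 0;

for (;;)
{
    if (!socket_has_data())                  /* hypothetical helper */
    {
        if (have_pending_inquiries())
            write_pending_statsfiles();      /* socket empty: flush now */
        msgs_since_write = 0;
        wait_for_socket();
        continue;
    }

    process_one_message();

    if (++msgs_since_write >= WRITE_EVERY_N_MESSAGES)
    {
        if (have_pending_inquiries())
            write_pending_statsfiles();      /* stay responsive under load */
        msgs_since_write = 0;
    }
}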

I would lean toward making this part 9.7-only; it would be a distinct patch
from the one previously under discussion.

Thanks,
nm




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 4:11 PM, Peter Geoghegan  wrote:
> Yes, with one small difference: I wouldn't be calling ERR_get_error()
> in the common case where SSL_get_error() returns SSL_ERROR_NONE, on
> the theory that skipping that case represents no risk. I'm making a
> concession to Peter E's view that calling ERR_get_error()
> more will add useless cycles.

The attached patch is what I have in mind.

I can produce a back-patchable variant of this if you and Peter E.
think this approach is okay.

-- 
Peter Geoghegan
From f7a72e36cdf2ff58857bd962e26daabdc5747fe1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 26 Jan 2016 15:11:15 -0800
Subject: [PATCH] Distrust external OpenSSL clients; clear err queue

OpenSSL has an unfortunate tendency to mix per-session state error
handling with per-thread error handling.  This can cause problems when
programs that link to libpq with OpenSSL enabled have some other use of
OpenSSL; without care, one caller of OpenSSL may cause problems for the
other caller.  Backend code might similarly be affected, for example
when a third party extension independently uses OpenSSL without taking
the appropriate precautions.

To fix, don't trust other users of OpenSSL to clear the per-thread error
queue.  Instead, clear the entire per-thread queue ahead of certain I/O
operations when it appears that there might be trouble (these I/O
operations mostly need to call SSL_get_error() to check for success,
which relies on the queue being empty).  This is slightly aggressive,
but it's pretty clear that the other callers have a very dubious claim
to ownership of the per-thread queue.  Do this in both frontend and
backend code.

Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourselves.  It's possible that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code.  Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension).  Again, do this in both frontend and backend code.

See bug #12799 and https://bugs.php.net/bug.php?id=68276

Based on patches by Dave Vitek and Peter Eisentraut.

Back-patch to all supported versions.
---
 src/backend/libpq/be-secure-openssl.c| 104 ++--
 src/interfaces/libpq/fe-secure-openssl.c | 114 ---
 2 files changed, 173 insertions(+), 45 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e3dfb6..8fd92ab 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c

[The patch hunks are garbled and truncated in the archive.  The visible changes
convert SSLerrmessage(void) into SSLerrmessage(unsigned long ecode) and replace
the existing SSLerrmessage() calls with SSLerrmessage(ERR_get_error()) in
be_tls_init() and elsewhere in be-secure-openssl.c, with matching changes in
fe-secure-openssl.c; the remainder of the patch is cut off here.]
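
To illustrate the pattern described in the commit message (a hedged sketch
using the public OpenSSL API, not the patch itself; the wrapper function is
made up):

#include <openssl/ssl.h>
#include <openssl/err.h>

static int
read_with_clean_queue(SSL *ssl, void *buf, int len)
{
    int n;

    ERR_clear_error();      /* don't trust other OpenSSL users' leftovers */
    n = SSL_read(ssl, buf, len);
    if (n <= 0)
    {
        int err = SSL_get_error(ssl, n);    /* needs a clean queue to be reliable */
        unsigned long ecode = (err == SSL_ERROR_NONE) ? 0 : ERR_get_error();

        /* build the error message from ecode, e.g. ERR_reason_error_string(ecode) */
        (void) ecode;
    }
    return n;
}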

[HACKERS] Choosing parallel_degree

2016-03-14 Thread David Rowley
Over in [1] James mentioned wanting to be able to have more
influence over the partial path's parallel_degree decision.  At risk
of a discussion on that hijacking the parallel aggregate thread, I
thought I'd start this for anyone who would want to discuss making
changes to that.

I've attached a simple C program which shows the parallel_degree which
will be chosen at the moment. For now it's based on the size of the
base relation. Perhaps that will need to be rethought later, perhaps
based on costs. But I just don't think it's something for 9.6.

Here's the output of the C program.

For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
For 3001 pages there will be 2 workers (rel size 23 MB, 0 GB)
For 9001 pages there will be 3 workers (rel size 70 MB, 0 GB)
For 27001 pages there will be 4 workers (rel size 210 MB, 0 GB)
For 81001 pages there will be 5 workers (rel size 632 MB, 0 GB)
For 243001 pages there will be 6 workers (rel size 1898 MB, 1 GB)
For 729001 pages there will be 7 workers (rel size 5695 MB, 5 GB)
For 2187001 pages there will be 8 workers (rel size 17085 MB, 16 GB)
For 6561001 pages there will be 9 workers (rel size 51257 MB, 50 GB)
For 19683001 pages there will be 10 workers (rel size 153773 MB, 150 GB)
For 59049001 pages there will be 11 workers (rel size 461320 MB, 450 GB)
For 177147001 pages there will be 12 workers (rel size 1383960 MB, 1351 GB)
For 531441001 pages there will be 13 workers (rel size 4151882 MB, 4054 GB)
For 1594323001 pages there will be 14 workers (rel size 12455648 MB, 12163 GB)

[1] 
http://www.postgresql.org/message-id/CANkGpBtUvzpdvF2=_iq64ujmvrpycs6d4i9-wepbusq1sq+...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
#include <stdio.h>
#include <limits.h>

int max_parallel_degree = 64;

#define BLOCKSZ (8*1024)

#define MULTIPLIER 3
int
choose_degree(unsigned int pages)
{
	int		parallel_threshold = 1000;
	int		parallel_degree = 1;

	/*
	 * If this relation is too small to be worth a parallel scan, just return
	 * without doing anything ... unless it's an inheritance child.  In that case,
	 * we want to generate a parallel path here anyway.  It might not be worthwhile
	 * just for this relation, but when combined with all of its inheritance siblings
	 * it may well pay off.
	 */
	if (pages < parallel_threshold)
		return parallel_degree;

	/*
	 * Limit the degree of parallelism logarithmically based on the size of the
	 * relation.  This probably needs to be a good deal more sophisticated, but we
	 * need something here for now.
	 */
	while (pages > parallel_threshold * 3 &&
		   parallel_degree < max_parallel_degree)
	{
		parallel_degree++;
		parallel_threshold *= 3;
		if (parallel_threshold >= INT_MAX / 3)
			break;
	}
	return parallel_degree;
}

int
main(void)
{
	unsigned int pages;
	int last_workers = -1;

	for (pages = 1; pages != 0; pages += 1)
	{
		int workers = choose_degree(pages);

		if (workers != last_workers)
		{
			printf("For %u pages there will be %d workers (rel size %llu MB, %llu GB)\n", pages, workers,
(unsigned long long) pages * BLOCKSZ / 1024 / 1024,
(unsigned long long) pages * BLOCKSZ / 1024 / 1024 / 1024);
			last_workers = workers;
		}
	}
	return 0;
}


Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Kouhei Kaigai

> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Tuesday, March 15, 2016 2:04 AM
> To: Petr Jelinek
> Cc: Kaigai Kouhei(海外 浩平); David Rowley; Robert Haas;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP: Upper planner pathification
> 
> Petr Jelinek  writes:
> > On 14/03/16 02:43, Kouhei Kaigai wrote:
> >> Even though I couldn't check the new planner implementation entirely,
> >> it seems to be the points below are good candidate to inject CustomPath
> >> (and potentially ForeignScan).
> >>
> >> - create_grouping_paths
> >> - create_window_paths
> >> - create_distinct_paths
> >> - create_ordered_paths
> >> - just below of the create_modifytable_path
> >> (may be valuable for foreign-update pushdown)
> 
> > To me that seems too low inside the planning tree, perhaps adding it
> > just to the subquery_planner before SS_identify_outer_params would be
> > better, that's the place where you see the path for the whole (sub)query
> > so you can search and modify what you need from there.
> 
> I don't like either of those too much.  The main thing I've noticed over
> the past few days is that you can't readily generate custom upper-level
> Paths unless you know what PathTarget grouping_planner is expecting each
> level to produce.  So what I was toying with doing is (1) having
> grouping_planner put all those targets into the PlannerInfo, perhaps
> in an array indexed by UpperRelationKind; and (2) adding a hook call
> immediately after those targets are computed, say right before
> the create_grouping_paths() call (approximately planner.c:1738
> in HEAD).  It should be sufficient to have one hook there since
> you can inject Paths into any of the upper relations at that point;
> moreover, that's late enough that you shouldn't have to recompute
> anything you figured out during scan/join planning.
>
Regarding (2), I doubt whether the location is reasonable,
because the pathlist of each upper_rels[] is still empty at that point, isn't
it? It will make it hard for an extension to construct its own CustomPath
that takes underlying built-in path nodes.

For example, if an extension implements its own sort logic but is not
interested in group-by/window functions, it will want to add its
CustomPath to UPPERREL_ORDERED; however, it does not know which relation is
the input_rel, and no built-in paths have been added yet at the point
of create_upper_paths_hook().

On the other hand, there is another problem if we put a hook after
all the upper paths are done. In that case, the built-in create_*_paths()
functions cannot take a CustomPath added later into account when
they pick the cheapest path.

So, even though we don't need to define multiple hook declarations,
I think the hook needs to be invoked just after each create_*_paths()
call. It will need to inform the extension of the context of the hook
invocation, so the argument list will take an UpperRelationKind.

In addition, an extension cannot reference some local variables through
the root structure, like:
 - rollup_lists
 - rollup_groupclauses
 - wflists
 - activeWindows
 - have_postponed_srfs
As we do for set_join_pathlist_hook, it would be a good idea to define an
UpperPathExtraData structure to pack this miscellaneous information.

So, how about to re-define the hook as follows?

typedef void (*create_upper_paths_hook_type) (UpperRelationKind upper_kind,
  PlannerInfo *root,
  RelOptInfo *scan_join_rel,
  UpperPathExtraData *extra);

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-14 Thread Kouhei Kaigai
> On Sun, Mar 13, 2016 at 9:53 PM, Kouhei Kaigai  wrote:
> > OK, I split the previous small patch into two tiny patches.
> > The one is bugfix around max length of the extnodename.
> > The other replaces Assert() by ereport() according to the upthread 
> > discussion.
> 
> Committed, except that (1) I replaced ereport() with elog(), because I
> can't see making translators care about this message; and (2) I
> reworded the error message a bit.
>
Thanks, and I got the point why ereport() is suggested for the error
message that may be visible to users, instead of elog().

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread David Rowley
On 15 March 2016 at 11:24, James Sewell  wrote:
> On Tuesday, 15 March 2016, Robert Haas  wrote:
>>
>> > Does the cost of the aggregate function come into this calculation at
>> > all? In PostGIS land, much smaller numbers of rows can generate loads
>> > that would be effective to parallelize (worker time much >> than
>> > startup cost).
>>
>> Unfortunately, no - only the table size.  This is a problem, and needs
>> to be fixed.  However, it's probably not going to get fixed for 9.6.
>> :-(
>
>
> Any chance of getting a GUC (say min_parallel_degree) added to allow setting
> the initial value of parallel_degree, then changing the small relation check
> to also pass if parallel_degree > 1?
>
> That way you could set min_parallel_degree on a query by query basis if you
> are running aggregates which you know will take a lot of CPU.
>
> I suppose it wouldn't make much sense at all to set globally though, so it
> could just confuse matters.

I agree that it would be nice to have more influence on this decision,
but let's start a new thread for that. I don't want this one getting
bloated with debates on that. It's not code I'm planning on going
anywhere near for this patch.

I'll start a thread...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread David Rowley
On 15 March 2016 at 11:39, Tomas Vondra  wrote:
> I've looked at this patch today. The patch seems quite solid, but I do have
> a few minor comments (or perhaps questions, given that this is the first
> time I looked at the patch).
>
> 1) exprCollation contains this bit:
> ---
>
> case T_PartialAggref:
> coll = InvalidOid; /* XXX is this correct? */
> break;
>
> I doubt this is the right thing to do. Can we actually get to this piece of
> code? I haven't tried too hard, but regression tests don't seem to trigger
> this piece of code.

Thanks for looking at this.

Yeah, it's there because it is being called during setrefs.c in
makeVarFromTargetEntry() via fix_combine_agg_expr_mutator(), so it's
required when building the new target list for Aggref->args to point
to the underlying Aggref's Var.

As for the collation, I'm still not convinced whether it's right or wrong.
I know offlist you mentioned string_agg() and sorting, but
there's no code that'll sort the agg's state. The only possible thing
that gets sorted there is the group by key.

> Moreover, if we're handling PartialAggref in exprCollation(), maybe we
> should handle it also in exprInputCollation and exprSetCollation?

Hmm, maybe; that I'm not sure about. I don't see where we'd call
exprSetCollation() for this, but I think I need to look at
exprInputCollation().

> And if we really need the collation there, why not to fetch the actual
> collation from the nested Aggref? Why should it be InvalidOid?

It seems quite random to me to do that. If the trans type is bytea,
why would it be useful to inherit the collation from the aggregate?
I'm not confident I'm right with InvalidOid... I just don't think we
can pretend the collation is the same as the Aggref's.


> 2) partial_aggregate_walker
> ---
>
> I think this should follow the naming convention that clearly identifies the
> purpose of the walker, not what kind of nodes it is supposed to walk. So it
> should be:
>
> aggregates_allow_partial_walker

Agreed and changed.

> 3) create_parallelagg_path
> --
>
> I do agree with the logic that under-estimates are more dangerous than
> over-estimates, so the current estimate is safer. But I think this would be
> a good place to apply the formula I proposed a few days ago (or rather the
> one Dean Rasheed proposed in response).
>
> That is, we do know that there are numGroups in total, and if each parallel
> worker sees subpath->rows rows, then it's expected to see
>
> sel = (subpath->rows / rel->tuples);
> perGroup = (rel->tuples / numGroups);
> workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
> numPartialGroups = numWorkers * workerGroups;
>
> It's probably better to see Dean's message from 13/3.

I think what I have works well when there's a small number of groups,
as there's a good chance that each worker will see at least one tuple
from each group. However, I understand that will become increasingly
unlikely with a larger number of groups, which is why I capped it at
the total input rows; even in cases before that cap is reached, I
think it will still overestimate. I'd need to analyze the code above
to understand it better, but my initial reaction is that you're
probably right; I just don't think I want to inherit the fight over
this. Perhaps it's better to wait until the GROUP BY estimate improvement
patch gets in and change this then, or if this gets in first, you can
include the change in your patch. I'm not trying to brush off the
work; I'd just rather it didn't delay parallel aggregate.

> 4) Is clauses.h the right place for PartialAggType?

I'm not sure that it is, to be honest. I just put it there because the
patch never persists the value of a PartialAggType in any struct
field and then checks it later in some other file. In all places
where we use PartialAggType we're also calling
aggregates_allow_partial(), which does require clauses.h. So that's
why it ended up there... I think I'll leave it there until someone
gives me a good reason to move it.

An updated patch will follow soon.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 5:21 PM, Andres Freund  wrote:
> So? You're not the only one. I don't see why we shouldn't move this to
> 'returned with feedback' until there's a new version.

I don't see any point in that; I intend to get a revision in to the
ongoing CF. But fine.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-03-14 Thread Vitaly Burovoy
On 3/14/16, Anastasia Lubennikova  wrote:
> 14.03.2016 16:23, David Steele:
>> On 2/25/16 4:44 PM, Vitaly Burovoy wrote:
>>
>>> Added to the commitfest 2016-03.
>>>
>>> [CF] https://commitfest.postgresql.org/9/540/
>>
>> This looks like a fairly straight-forward bug fix (the size of the
>> patch is deceptive because there a lot of new tests included).  It
>> applies cleanly.
>>
>> Anastasia, I see you have signed up to review.  Do you have an idea
>> when you will get the chance to do that?
>>
>> Thanks,
>
> I've read the patch thoroughly and haven't found any problems. I think
> that the patch is in a very good shape.
> It fixes a bug and has an excellent set of tests.
>
> There is an issue, mentioned in the thread above:
>
>>postgres=# select
>>postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
>>postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
>>postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>>  to_char  |  to_char  |  to_char
>> ----------+-----------+-----------
>>  monday   | monday    | thursday
>> (1 row)
>
>>since 4714-12-28 BC and further into the past, the detection of when a week
>>starts is broken (because it is the boundary of isoyears -4713 and -4712).
>>Is it worth breaking this undocumented range, or should we leave it as is?
>
> But I suppose that behavior of undocumented dates is not essential.

I'm sorry... What should I do with the "Waiting on Author" state if you
don't have any complaints?

> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-03-14 Thread Andres Freund
On 2016-03-14 17:17:02 -0700, Peter Geoghegan wrote:
> On Mon, Mar 14, 2016 at 5:04 PM, Robert Haas  wrote:
> > There hasn't been a new version of this patch in 9 months, you're
> > clearly not in a hurry to produce one, and nobody else seems to feel
> > strongly that this is something that needs to be done at all.  I think
> > we could just let this go and be just fine, but instead of doing that
> > and moving onto patches that people do feel strongly about, we're
> > arguing about this.  Bummer.
> 
> I'm busy working on fixing an OpenSSL bug affecting all released
> versions right at the moment, but have a number of complex 9.6 patches
> to review, most of which are in need of support. I'm very busy.

So? You're not the only one. I don't see why we shouldn't move this to
'returned with feedback' until there's a new version.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 5:04 PM, Robert Haas  wrote:
> There hasn't been a new version of this patch in 9 months, you're
> clearly not in a hurry to produce one, and nobody else seems to feel
> strongly that this is something that needs to be done at all.  I think
> we could just let this go and be just fine, but instead of doing that
> and moving onto patches that people do feel strongly about, we're
> arguing about this.  Bummer.

I'm busy working on fixing an OpenSSL bug affecting all released
versions right at the moment, but have a number of complex 9.6 patches
to review, most of which are in need of support. I'm very busy.

I said that I'd get to this patch soon. I might be kicking the can
down the road a little with this patch; if so, I'm sorry. I suggest we
leave it at that, until the CF is almost over or until I produce a
revision.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-03-14 Thread Vitaly Burovoy
On 3/14/16, Mark Dilger  wrote:
> The first thing I notice about this patch is that
> src/include/datatype/timestamp.h
> has some #defines that are brittle.  The #defines have comments explaining
> their logic, but I'd rather embed that in the #define directly:
>
> This:
>
> +#ifdef HAVE_INT64_TIMESTAMP
> +#define MIN_TIMESTAMP  INT64CONST(-211813488000000000)
> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
> +#define MAX_TIMESTAMP  INT64CONST(9223371331200000000)
> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
> +#else
> +#define MIN_TIMESTAMP  -211813488000.0
> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */
> +#define MAX_TIMESTAMP  9223371331200.0
> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */
> +#endif
>
> Could be written as:
>
> #ifdef HAVE_INT64_TIMESTAMP
> #define MIN_TIMESTAMP \
> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
> #define MAX_TIMESTAMP \
> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
> #else
> #define MIN_TIMESTAMP \
> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
> #define MAX_TIMESTAMP \
> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
> #endif
>
> I assume modern compilers would convert these to the same constants at
> compile-time,

Firstly, Postgres is not compiled only by modern compilers.

> rather than impose a runtime penalty.

Secondly, it is hard to write it correctly while obeying Postgres' coding
standard (e.g. the 80-column limit) and keeping it clear: the difference
between USECS_PER_DAY and SECS_PER_DAY in the expressions above is not very
visible (but it is obvious in the comments in the file). Also, questions
arise such as "Why is INT64CONST(0) necessary in the `#else' block?"
(whereas `float' is needed there) or "Why is INT64CONST used for
MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_
int64).

The file is full of constants; JULIAN_MAX and SECS_PER_DAY are just two of
them.

I would leave it as is.

> The #defines would be less brittle in
> the event, for example, that the postgres epoch were ever changed.

I don't think that is realistic, and even in such a case all of the constants
are collected together in the file and would be found and changed at once.

> Mark Dilger

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 5:53 PM, Peter Geoghegan  wrote:
> On Sat, Mar 12, 2016 at 2:43 PM, Michael Paquier
>  wrote:
>> Only one version of this patch has been sent at the beginning of this
>> thread, and Heikki has clearly expressed his disagreement about at
>> least a portion of it at the beginning of this thread, so I find it
>> hard to define it as an "uncontroversial" thing and something that is
>> clear to have as things stand. Seeing a new version soon would be a
>> good next step I guess.
>
> What is the point in saying this, Michael? What purpose does it serve?

Gee, I think Michael is right on target.  What purpose does writing
him an email that sounds annoyed serve?

There hasn't been a new version of this patch in 9 months, you're
clearly not in a hurry to produce one, and nobody else seems to feel
strongly that this is something that needs to be done at all.  I think
we could just let this go and be just fine, but instead of doing that
and moving onto patches that people do feel strongly about, we're
arguing about this.  Bummer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread David Rowley
On 15 March 2016 at 08:53, Robert Haas  wrote:
> I haven't fully studied every line of this yet, but here are a few comments:
>
> +   case T_PartialAggref:
> +   coll = InvalidOid; /* XXX is this correct? */
> +   break;
>
> I doubt it.

Thanks for looking at this.

Yeah, I wasn't so sure of the collation thing either, so I stuck a
reminder on there. The way I'm seeing it at the moment is that, since
the partial aggregate is never displayed to the user and we never
perform equality comparisons on it (since HAVING is applied in the
final aggregate stage), the collation should not matter. Over on the
combine aggregate states thread I'm doing work to make the standard
serialize functions use bytea, and bytea doesn't allow a collation:

# create table c (b bytea collate "en_NZ");
ERROR:  collations are not supported by type bytea
LINE 1: create table c (b bytea collate "en_NZ");

I previously did think of reusing the Aggref's collation, but I ended
up leaning towards the more "does not matter" side of the argument. Of
course, I may have completely failed to think of some important
reason, which is why I left that comment, so it might provoke some
thought from someone else with more collation knowledge.

> More generally, why are we inventing PartialAggref
> instead of reusing Aggref?  The code comments seem to contain no
> indication as to why we shouldn't need all the same details for
> PartialAggref that we do for Aggref, instead of only a small subset of
> them.  Hmm... actually, it looks like PartialAggref is intended to
> wrap Aggref, but that seems like a weird design.  Why not make Aggref
> itself DTRT?   There's not enough explanation in the patch of what is
> going on here and why.

A comment does explain this, but perhaps it's not good enough, so I've
rewritten it to become:

/*
 * PartialAggref
 *
 * When partial aggregation is required in a plan, the nodes from the partial
 * aggregate node up until the finalize aggregate node must pass the partially
 * aggregated states up the plan tree. With regard to target list construction
 * in setrefs.c, this requires that exprType() returns the state's type rather
 * than the final aggregate value's type, and since exprType() for Aggref is
 * coded to return the aggtype, this is not correct for us. We can't fix this
 * by going around modifying the Aggref to change its return type, as setrefs.c
 * requires searching for that Aggref using equal(), which compares all fields
 * in Aggref, and changing the aggtype would cause such a comparison to fail.
 * To get around this problem we wrap the Aggref up in a PartialAggref; this
 * allows exprType() to return the correct type and we can handle a
 * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
 * the underlying Aggref. The PartialAggref lives until executor start-up,
 * where it's removed and replaced with its underlying Aggref.
 */
typedef struct PartialAggref

does that help explain it better?


> }
> +   if (can_parallel)
> +   {
>
> Seems like a blank line would be in order.

Fixed.

> I don't see where this applies has_parallel_hazard or anything
> comparable to the aggregate functions.  I think it needs to do that.

Not sure what you mean here.

>
> +   /* XXX +1 ? do we expect the main process to actually do real work? */
> +   numPartialGroups = Min(numGroups, subpath->rows) *
> +   (subpath->parallel_degree + 1);
>
> I'd leave out the + 1, but I don't think it matters much.

Actually, I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, which is why I added the + 1. During my tests,
using a query which produces 10 groups with 4 workers, I noticed
that Gather was getting 50 groups back rather than 40. I assumed this
was because the main process was helping too, but from my reading of the
parallel query threads I believe it's because the Gather, instead
of sitting around idle, tries to do a bit of work itself if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn that, though.

In the meantime I've removed the + 1, as it's not correct to do
subpath->rows * (subpath->parallel_degree + 1); since the row count was
divided by subpath->parallel_degree in the first place, we'd end up with an
extra worker's worth of rows for queries which estimate a larger
number of groups than partial path rows.


> +   aggstate->finalizeAggs == true)
>
> We usually just say if (a) not if (a == true) when it's a boolean.
> Similarly !a rather than a == false.

Hmm, thanks. It appears that I've not been all that consistent in that
area. I didn't know that was the convention. I see that some of my ways
have crept into explain.c 

Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 1:58 AM, Amit Kapila  wrote:
> Yeah, that makes the addition of test for this functionality difficult.
> Robert, do you have any idea what kind of test would have caught this issue?

Yep.  Committed with that test:

DO $$
BEGIN
   EXECUTE 'EXPLAIN ANALYZE SELECT * INTO TABLE easi FROM int8_tbl';
END$$;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread James Sewell
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas  wrote:

>
> I kind of doubt this would work well, but somebody could write a patch
> for it and try it out.


OK I'll give this a go today and report back.

Would the eventual plan be to use pg_proc.procost for the functions from
each aggregate concerned? If so I might have a peek at that too, although I
imagine I won't get far.

Cheers,

-- 




Re: [HACKERS] [PATCH v6] GSSAPI encryption support

2016-03-14 Thread David Steele
On 3/14/16 7:20 PM, Robbie Harwood wrote:

> David Steele  writes:
>
>>
>> Strange timing since I was just testing this.  Here's what I got:
>>
>> $ pg/bin/psql -h localhost -U vagr...@pgmasters.net postgres
>> conn->inStart = 179, conn->inEnd = 179, conn->inCursor = 179
>> psql (9.6devel)
>> Type "help" for help.
>>
>> postgres=>
> 
> Thanks, that certainly is interesting!  I did finally manage to
> reproduce the issue on my end, but the rate of incidence is much lower
> than what you and Michael were seeing: I have to run connections in a
> loop for about 10-20 minutes before it makes itself apparent (and no,
> it's not due to entropy).  Apparently I just wasn't patient enough.

I'm running client and server on the same VM - perhaps that led to less
latency than your setup?

> All that is to say: thank you very much for investigating that!

My pleasure!

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Timeline following for logical slots

2016-03-14 Thread Andres Freund
On 2016-03-14 20:10:58 -0300, Alvaro Herrera wrote:
> Great, thanks.  I've studied this to the point where I'm confident that
> it makes sense, so I'm about to push it.  I didn't change any logic,
> only updated comments here and there, both in the patch and in some
> preexisting code.  I also changed the "List *timelineHistory" to be
> #ifndef FRONTEND, which seems cleaner than having it and insisting that it
> couldn't be used.

Could you perhaps wait till tomorrow? I'd like to do a pass over it.

Thanks

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v6] GSSAPI encryption support

2016-03-14 Thread Robbie Harwood
David Steele  writes:

> On 3/14/16 4:10 PM, Robbie Harwood wrote:
>
>> David Steele  writes:
>>
>>> On 3/8/16 5:44 PM, Robbie Harwood wrote:
>>>
 Here's yet another version of GSSAPI encryption support.  It's also
 available for viewing on my github:
>>>
>>> psql simply hangs and never returns.  I have attached a pcap of the
>>> psql/postgres session generated with:
>> 
>> Please disregard my other email.  I think I've found the issue; will
>> post a new version in a moment.
>
> Strange timing since I was just testing this.  Here's what I got:
>
> $ pg/bin/psql -h localhost -U vagr...@pgmasters.net postgres
> conn->inStart = 179, conn->inEnd = 179, conn->inCursor = 179
> psql (9.6devel)
> Type "help" for help.
>
> postgres=>

Thanks, that certainly is interesting!  I did finally manage to
reproduce the issue on my end, but the rate of incidence is much lower
than what you and Michael were seeing: I have to run connections in a
loop for about 10-20 minutes before it makes itself apparent (and no,
it's not due to entropy).  Apparently I just wasn't patient enough.

> This was after commenting out:
>
> // appendBinaryPQExpBuffer(&conn->gwritebuf,
> //                         conn->inBuffer + conn->inStart,
> //                         conn->inEnd - conn->inStart);
> // conn->inEnd = conn->inStart;
>
> The good news I can log on every time now!

Since conn->inStart == conn->inEnd in the case you were testing, the
lines you commented out would have been a no-op anyway (that's the
normal case of operation, as far as I can tell).  That said, the chances
of hitting the race for me seemed very dependent on how much code wants
to run in that conditional: I got it up to 30-40 minutes when I added a
lot of printf()s (can't just run in gdb because it's nondeterministic
and rr has flushing bugs at the moment).

All that is to say: thank you very much for investigating that!


signature.asc
Description: PGP signature


[HACKERS] [PATCH v7] GSSAPI encryption support

2016-03-14 Thread Robbie Harwood
Hello friends,

New week, new version.  GitHub link:
https://github.com/frozencemetery/postgres/tree/feature/gssencrypt7

Changes in this version:

- Removed extra whitespace in auth code movement.

- Fixed connection desync issue.  A diff of this and v6 will reveal
  three issues:

  - First, that pg_GSS_read() didn't properly handle having a full
buffer when called because pqsecure_raw_read() doesn't handle reads of
size 0.  I've elected to change my own code here only, but it may be
desirable to change pqsecure_raw_read() as well depending on whether
other people are likely to hit that.

  - Second, that I was shunting data into the wrong buffer (don't know
how this was overlooked; it has "write" right there in the name).

  - Third, that I'm now immediately decrypting that data into
conn->inBuffer rather than deferring that step until later.  This
removes the hang because now the connection will not erroneously get
stuck polling while data is buffered.

Thanks!
From 3b62e99de16f2c4600d0bb02f3626e5157ecdc6c Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Fri, 26 Feb 2016 16:07:05 -0500
Subject: [PATCH 1/3] Move common GSSAPI code into its own files

On both the frontend and backend, prepare for GSSAPI encryption support
by moving common code for error handling into a common file.  Other than
build-system changes, no code changes occur in this patch.
---
 configure   |  2 +
 configure.in|  1 +
 src/Makefile.global.in  |  1 +
 src/backend/libpq/Makefile  |  4 ++
 src/backend/libpq/auth.c| 63 +---
 src/backend/libpq/be-gssapi-common.c| 74 +
 src/include/libpq/be-gssapi-common.h| 26 
 src/interfaces/libpq/Makefile   |  4 ++
 src/interfaces/libpq/fe-auth.c  | 48 +
 src/interfaces/libpq/fe-gssapi-common.c | 63 
 src/interfaces/libpq/fe-gssapi-common.h | 21 ++
 11 files changed, 198 insertions(+), 109 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi-common.c
 create mode 100644 src/include/libpq/be-gssapi-common.h
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.c
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.h

diff --git a/configure b/configure
index b3f3abe..a5bd629 100755
--- a/configure
+++ b/configure
@@ -713,6 +713,7 @@ with_systemd
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 0bd90d7..4fd8f05 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e94d6a5..3dbc5c2 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -183,6 +183,7 @@ with_perl	= @with_perl@
 with_python	= @with_python@
 with_tcl	= @with_tcl@
 with_openssl	= @with_openssl@
+with_gssapi	= @with_gssapi@
 with_selinux	= @with_selinux@
 with_systemd	= @with_systemd@
 with_libxml	= @with_libxml@
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 09410c4..a8cc9a0 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
 endif
 
+ifeq ($(with_gssapi),yes)
+OBJS += be-gssapi-common.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 57c2f48..73d493e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -136,11 +136,7 @@ bool		pg_krb_caseins_users;
  *
  */
 #ifdef ENABLE_GSS
-#if defined(HAVE_GSSAPI_H)
-#include 
-#else
-#include 
-#endif
+#include "libpq/be-gssapi-common.h"
 
 static int	pg_GSS_recvauth(Port *port);
 #endif   /* ENABLE_GSS */
@@ -715,63 +711,6 @@ recv_and_check_password_packet(Port *port, char **logdetail)
  */
 #ifdef ENABLE_GSS
 
-#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
-/*
- * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
- * that contain the OIDs required. Redefine here, values copied
- * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
- */
-static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
-{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
-static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
-#endif
-
-
-static void
-pg_GSS_error(int severity, char *errmsg, OM_uint32 maj_stat, OM_uint32 min_stat)
-{
-	gss_buffer_desc gmsg;
-	OM_uint32	lmin_s,
-msg_ctx;
-	char		

Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 4:05 PM, Tom Lane  wrote:
> So your proposal is basically to do #2 in all branches?  I won't fight it,
> if it doesn't bloat the code much.  The overhead should surely be trivial
> compared to network communication costs, and I'm afraid you might be right
> about the risk of latent bugs.

Yes, with one small difference: I wouldn't be calling ERR_get_error()
in the common case where SSL_get_error() returns SSL_ERROR_NONE, on
the theory that skipping that case represents no risk. I'm making a
concession to Peter E's view that calling ERR_get_error() more often
will add useless cycles.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-03-14 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 04/03/16 17:08, Craig Ringer wrote:
> >I'd really appreciate some review of the logic there by people who know
> >timelines well and preferably know the xlogreader. It's really just one
> >function and 2/3 comments; the code is simple but the reasoning leading
> >to it is not.
> 
> I think this will have to be work for committer at this point. I can't find
> any flaws in the logic myself so I unless somebody protests I am going to
> mark this as ready for committer.

Great, thanks.  I've studied this to the point where I'm confident that
it makes sense, so I'm about to push it.  I didn't change any logic,
only updated comments here and there, both in the patch and in some
preexisting code.  I also changed the "List *timelineHistory" to be
#ifndef FRONTEND, which seems cleaner than having it and insisting that it
couldn't be used.

Also, hopefully you don't have any future callers for the functions I
marked static (I checked the failover slots series and couldn't see
anything).  I will push this early tomorrow.

One thing I'm not quite sure about is why this is said to apply to
"logical slots" and not just to replication slots in general.  In what
sense does it not apply to physical replication slots?

(I noticed that we have this new line,
-   randAccess = true;
+   randAccess = true;  /* allow readPageTLI to go backwards */ 
in which the comment is now an identical copy of the comment on a similar
line elsewhere; however, randAccess doesn't actually affect readPageTLI in
any way, so AFAICS both comments are now wrong.)

> Well for testing purposes it's quite fine I think. The TAP framework
> enhancements needed for this are now in and it works correctly against
> current master.

Certainly the new src/test/recovery/t/006whatever.pl file is going to be
part of the commit.  What I'm not so sure about is
src/test/modules/decoding_failover: is there any reason we don't just
put the new functions in contrib/test_decoding?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index fcb0872..7b60b8f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -10,9 +10,11 @@
  *
  * NOTES
  *		See xlogreader.h for more notes on this facility.
+ *
+ *		This file is compiled as both front-end and backend code, so it
+ *		may not use ereport, server-defined static variables, etc.
  *-
  */
-
 #include "postgres.h"
 
 #include "access/transam.h"
@@ -116,6 +118,11 @@ XLogReaderAllocate(XLogPageReadCB pagereadfunc, void *private_data)
 		return NULL;
 	}
 
+#ifndef FRONTEND
+	/* Will be loaded on first read */
+	state->timelineHistory = NIL;
+#endif
+
 	return state;
 }
 
@@ -135,6 +142,10 @@ XLogReaderFree(XLogReaderState *state)
 	pfree(state->errormsg_buf);
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
+#ifndef FRONTEND
+	if (state->timelineHistory)
+		list_free_deep(state->timelineHistory);
+#endif
 	pfree(state->readBuf);
 	pfree(state);
 }
@@ -208,10 +219,11 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 
 	if (RecPtr == InvalidXLogRecPtr)
 	{
+		/* No explicit start point; read the record after the one we just read */
 		RecPtr = state->EndRecPtr;
 
 		if (state->ReadRecPtr == InvalidXLogRecPtr)
-			randAccess = true;
+			randAccess = true;	/* allow readPageTLI to go backwards */
 
 		/*
 		 * RecPtr is pointing to end+1 of the previous WAL record.  If we're
@@ -223,6 +235,8 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	else
 	{
 		/*
+		 * Caller supplied a position to start at.
+		 *
 		 * In this case, the passed-in record pointer should already be
 		 * pointing to a valid record starting position.
 		 */
@@ -309,8 +323,10 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		/* XXX: more validation should be done here */
 		if (total_len < SizeOfXLogRecord)
 		{
-			report_invalid_record(state, "invalid record length at %X/%X",
-  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+			report_invalid_record(state,
+		"invalid record length at %X/%X: wanted %lu, got %u",
+  (uint32) (RecPtr >> 32), (uint32) RecPtr,
+  SizeOfXLogRecord, total_len);
 			goto err;
 		}
 		gotheader = false;
@@ -466,9 +482,7 @@ err:
 	 * Invalidate the xlog page we've cached. We might read from a different
 	 * source after failure.
 	 */
-	state->readSegNo = 0;
-	state->readOff = 0;
-	state->readLen = 0;
+	XLogReaderInvalCache(state);
 
 	if (state->errormsg_buf[0] != '\0')
 		*errormsg = state->errormsg_buf;
@@ -580,10 +594,19 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	return readLen;
 
 err:
+	XLogReaderInvalCache(state);
+	return -1;
+}
+
+/*
+ * 

Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Mar 14, 2016 at 3:06 PM, Tom Lane  wrote:
>> Agreed, we need to deal with this one way or the other.  My proposal
>> is:
>> 
>> 1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.
>> 
>> 2. In back branches, clear error queue before *and* after calls.  This
>> will waste a few nanoseconds but will avoid any risk of breaking
>> existing third-party code.

> I am concerned that users will never be able to get this right, since
> I think it requires every Ruby or PHP app using some thin OpenSSL
> wrapper to clear the per-thread error queue. It's a big mess, but it's our
> mess to some degree.

So your proposal is basically to do #2 in all branches?  I won't fight it,
if it doesn't bloat the code much.  The overhead should surely be trivial
compared to network communication costs, and I'm afraid you might be right
about the risk of latent bugs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 3:06 PM, Tom Lane  wrote:
> Agreed, we need to deal with this one way or the other.  My proposal
> is:
>
> 1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.
>
> 2. In back branches, clear error queue before *and* after calls.  This
> will waste a few nanoseconds but will avoid any risk of breaking
> existing third-party code.
>
> I think it's reasonable to expect extensions to update to the newer
> behavior in a major release, but we're taking risks if we expect
> that to happen in minor releases.

I am concerned that users will never be able to get this right, since
I think it requires every Ruby or PHP app using some thin OpenSSL
wrapper to clear the per-thread error queue. It's a big mess, but it's our
mess to some degree.

I wonder if it would be just as good if we ensured that
ERR_get_error() was definitely called in any circumstance where it
looked like we might have an error: ERR_get_error() would be
*reliably* called, as in my patch, but unlike my patch only when
SSL_get_error() indicated a problem (not all the time).

Heikki believed that clearing the error queue by calling
ERR_clear_error() before calling an I/O function like SSL_read() was
necessary, as we all do; no controversy there. But Heikki also
believed, even prior to hearing about this bug, that it was important
and necessary for ERR_get_error() to be reached when there might be an
error added to the queue following a Postgres/libpq call to an I/O
function like SSL_read() followed by SSL_get_error() indicating
trouble. He thought, as I do, that it would be a good idea to not rely
on that happening from a distance (i.e. not relying on reaching
SSLerrmessage()). Peter E. seems to believe that there is absolutely
no reason to rely on ERR_get_error() getting called at all, and that
the existing SSLerrmessage() only exists for the purposes of producing
a human-readable error message.

Anyway, thinking about it some more, perhaps the best solution is to
do the ERR_get_error() call iff SSL_get_error() seems unhappy, perhaps
even placing the two into a utility function. That's probably almost
the same as the existing behavior, as far as clearing up the queue
after we may have added to it goes. I don't know if that's any less
safe than my patch's pessimistic approach. It seems like it might be
just as safe. Under this compromise, I think we'd probably clear the
error queue after SSL_get_error() returned a value that is not
SSL_ERROR_NONE, though (including SSL_ERROR_WANT_READ, etc). What do
you think about that?
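
To make that concrete, here is a minimal sketch of the call pattern I have in
mind -- clear the queue before the I/O call, and drain it with ERR_get_error()
only when SSL_get_error() reports something other than SSL_ERROR_NONE. This is
only an illustration; pq_ssl_read_checked() is a made-up name, and the real
patch would obviously integrate with the existing libpq/backend code instead:

static int
pq_ssl_read_checked(SSL *ssl, void *buf, int len, unsigned long *first_err)
{
    int     n;
    int     err;

    ERR_clear_error();          /* don't let stale entries confuse us */
    n = SSL_read(ssl, buf, len);
    err = SSL_get_error(ssl, n);

    *first_err = 0;
    if (err != SSL_ERROR_NONE)
    {
        /*
         * Remember the first queued error (if any) for the error message,
         * then drain the rest so nothing is left behind for other users of
         * this thread's queue.
         */
        *first_err = ERR_get_error();
        while (ERR_get_error() != 0)
            ;
    }

    return n;
}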

> In any case, we need a patch that touches the backend-side code as well.

I agree that the backend-side code should be covered. I quickly
changed my mind about that, and am happy to produce a revision along
those lines.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Tomas Vondra

Hi,

On 03/13/2016 10:44 PM, David Rowley wrote:

On 12 March 2016 at 16:31, David Rowley  wrote:

I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.


The attached patch fixes a harmless compiler warning about a
possible uninitialised variable.


I've looked at this patch today. The patch seems quite solid, but I do 
have a few minor comments (or perhaps questions, given that this is the 
first time I looked at the patch).


1) exprCollation contains this bit:
---

case T_PartialAggref:
coll = InvalidOid; /* XXX is this correct? */
break;

I doubt this is the right thing to do. Can we actually get to this piece 
of code? I haven't tried too hard, but regression tests don't seem to 
trigger this piece of code.


Moreover, if we're handling PartialAggref in exprCollation(), maybe we 
should handle it also in exprInputCollation and exprSetCollation?


And if we really need the collation there, why not to fetch the actual 
collation from the nested Aggref? Why should it be InvalidOid?



2) partial_aggregate_walker
---

I think this should follow the naming convention that clearly identifies 
the purpose of the walker, not what kind of nodes it is supposed to 
walk. So it should be:


aggregates_allow_partial_walker


3) create_parallelagg_path
--

I do agree with the logic that under-estimates are more dangerous than 
over-estimates, so the current estimate is safer. But I think this would 
be a good place to apply the formula I proposed a few days ago (or 
rather the one Dean Rasheed proposed in response).


That is, we do know that there are numGroups in total, and if each parallel
worker sees subpath->rows rows, then it's expected to see


sel = (subpath->rows / rel->tuples);
perGroup = (rel->tuples / numGroups);
workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
numPartialGroups = numWorkers * workerGroups;

It's probably better to see Dean's message from 13/3.

4) Is clauses.h the right place for PartialAggType?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 6:24 PM, James Sewell  wrote:
> Any chance of getting a GUC (say min_parallel_degree) added to allow setting
> the initial value of parallel_degree, then changing the small relation check
> to also pass if parallel_degree > 1?
>
> That way you could set min_parallel_degree on a query by query basis if you
> are running aggregates which you know will take a lot of CPU.
>
> I suppose it wouldn't make much sense at all to set globally though, so it
> could just confuse matters.

I kind of doubt this would work well, but somebody could write a patch
for it and try it out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread James Sewell
On Tuesday, 15 March 2016, Robert Haas  wrote:
>
> > Does the cost of the aggregate function come into this calculation at
> > all? In PostGIS land, much smaller numbers of rows can generate loads
> > that would be effective to parallelize (worker time much >> than
> > startup cost).
>
> Unfortunately, no - only the table size.  This is a problem, and needs
> to be fixed.  However, it's probably not going to get fixed for 9.6.
> :-(
>

Any chance of getting a GUC (say min_parallel_degree) added to allow
setting the initial value of parallel_degree, then changing the small
relation check to also pass if parallel_degree > 1?

That way you could set min_parallel_degree on a query by query basis if you
are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it
could just confuse matters.

Cheers,

-- 




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Mar 10, 2016 at 8:16 PM, Peter Geoghegan  wrote:
>> On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut  wrote:
>>> Arguably, if everyone followed "my" approach, this should be very easy
>>> to fix everywhere.

>> I don't think that there is any clear indication that the OpenSSL
>> people would share that view. Or my view. Or anything that's sensible
>> or practical or actionable.

> Ideally, we'd be able to get this into the upcoming minor release.

Agreed, we need to deal with this one way or the other.  My proposal
is:

1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.

2. In back branches, clear error queue before *and* after calls.  This
will waste a few nanoseconds but will avoid any risk of breaking
existing third-party code.

I think it's reasonable to expect extensions to update to the newer
behavior in a major release, but we're taking risks if we expect
that to happen in minor releases.

In any case, we need a patch that touches the backend-side code as well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Background Processes and reporting

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund  wrote:
> On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
>> > I have already shown [0, 1] the overhead of measuring timings in linux on
>> > representative workload. AFAIK, these tests were the only one that showed
>> > any numbers. All other statements about terrible performance have been and
>> > remain unconfirmed.
>>
>> Of course, those numbers are substantial regressions which would
>> likely make it impractical to turn this on on a heavily-loaded
>> production system.
>
> A lot of people operating production systems are fine with trading a <=
> 10% impact for more insight into the system; especially if that
> configuration can be changed without a restart.  I know a lot of systems
> that use pg_stat_statements, track_io_timing = on, etc; just to get
> that. In fact there's people running perf more or less continuously in
> production environments; just to get more insight.
>
> I think it's important to get as much information out there without
> performance overhead, so it can be enabled by default. But I don't think
> it makes sense to not allow features in that cannot be enabled by
> default, *if* we tried to make them cheap enough beforehand.

Hmm, OK.  I would have expected you to be on the other side of this
question, so maybe I'm all wet.  One point I am concerned about is
that, right now, we have only a handful of types of wait events.  I'm
very interested in seeing us add more, like I/O and client wait.  So
any overhead we pay here is likely to eventually be paid in a lot of
places - thus it had better be extremely small.

>> I tend to think that there's not much point in allowing
>> pg_stat_get_progress_info('checkpointer') because we can just have a
>> dedicated view for that sort of thing, cf. pg_stat_bgwriter, which
>> seems better.
>
> But that infrastructure isn't really suitable for exposing quickly
> changing counters imo. And given that we now have a relatively generic
> framework, it seems like a pain to add a custom implementation just for
> the checkpointer. Also, using custom infrastructure means it's not
> extensible to custom bgworker, which doesn't seem like a good
> idea. E.g. it'd be very neat to show the progress of a logical
> replication catchup process that way, no?

It isn't really that hard to make a purpose-built shared memory area
for each permanent background process, and I think that would be a
better design.  Then you can have whatever types you need, whatever
column labels make sense, etc.  You can't really do that for command
progress reporting because there are so many commands, but a
single-purpose backend doesn't have that issue.  I do agree that
having background workers report into this facility could make sense.
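
As a rough illustration of what I mean (all of these names are invented, and
the changecount protocol is just borrowed from what PgBackendStatus already
does), a purpose-built area for the checkpointer could simply be:

typedef struct CheckpointerProgressShmem
{
    int         changecount;        /* even = consistent, odd = update in progress */
    XLogRecPtr  ckpt_start_lsn;     /* redo pointer of the running checkpoint */
    int64       buffers_total;      /* buffers to write for this checkpoint */
    int64       buffers_written;    /* buffers written so far */
    TimestampTz ckpt_started_at;    /* when the checkpoint began */
} CheckpointerProgressShmem;

/* writer side: bump changecount around every update */
static void
report_buffers_written(CheckpointerProgressShmem *prog, int64 written)
{
    prog->changecount++;
    pg_write_barrier();
    prog->buffers_written = written;
    pg_write_barrier();
    prog->changecount++;
}

A dedicated view, in the style of pg_stat_bgwriter, could then read it with
the usual retry-until-changecount-is-stable loop, and the struct can use
whatever types and column labels make sense for that one process.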

>> Exposing the wait events from background processes
>> might be worth doing, but I don't think we want to add a bunch of
>> dummy lines to pg_stat_activity.
>
> Why are those dummy lines? It's activity in the cluster? We already show
> autovacuum workers in there. And walsenders, if you query the underlying
> function, instead of pg_stat_activity (due to a join to pg_database).

Hmm.  Well, OK, maybe.  I didn't realize walsenders were showing up
there ... sorta.  I guess my concern was that people would complain
about breaking compatibility, but since we're doing that already maybe
we ought to double down on it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-14 Thread Tom Lane
Pavel Stehule  writes:
> Where are you expecting the implementation? In PL/pgSQL only, or generally
> in DDL, or at both levels?

I'd envision this as something the main parser does and plpgsql piggybacks
on.  One of the many half-baked things about %TYPE is that the main parser
has an implementation, and plpgsql has its own not-entirely-compatible
one.  (And one thing I especially don't like about the submitted patch is
that it makes those two diverge even further.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-14 Thread Pavel Stehule
2016-03-14 20:38 GMT+01:00 Tom Lane :

> Robert Haas  writes:
> > On Mon, Mar 14, 2016 at 12:04 PM, Tom Lane  wrote:
> >> Or in short: maybe it's time to blow up %TYPE and start fresh.
>
> > That's not a dumb idea.  I think %TYPE is an Oracle-ism, and it
> > doesn't seem to have been their best-ever design decision.
>
>
Using %TYPE makes sense in PostgreSQL too. It is protection against an
explosion of domains - I don't need to declare domains like varchar_10,
varchar_100, etc. It does more work in Oracle due to the living relation
between table columns and PL/SQL variables - but this difference is the same
for domain types in PL/pgSQL. What is redundant in Postgres is using %TYPE and
%ROWTYPE. Because the native PL languages have a strong relation to the system
catalog, I see this feature as well designed and helpful. What can differ is
our implementation.


>
> It is, and it wasn't.  What concerns me about the present patch is
> that it's trying to shoehorn more functionality into something that
> was badly designed to start with.  I think we'd be better off leaving
> %TYPE as a deprecated backwards-compatibility feature and inventing
> something new and more extensible.
>
> I'm not wedded to any part of the syntax I just wrote, but I do say
> that we need something that allows composability of type selectors.
>

The last version of this patch doesn't modify the %TYPE part - and the
implemented syntax is close to what you proposed (not all cases) and close to
the Ada syntax. But, probably, you are unhappy with it.

Can you describe your expectations for this feature a little bit more,
please?

We can leave the original discussion about %TYPE. It was my mistake - I
pushed two different ingredients into one soup.

There are two independent features here: referenced types (the original %TYPE
and %ROWTYPE features) and the possibility of deriving a type from system
catalog information.

The last patch implements linear ELEMENT OF ..., ARRAY OF ... . Could the
solution be a recursive enhancement of the last patch? We could implement
other type modifiers like RANGE OF (any others?).

Then we could write something like

ARRAY OF RANGE OF int;  or ELEMENT OF ELEMENT OF array_of_ranges

or if we use functional syntax

ARRAY_OF(RANGE_OF(int))

I prefer the non-functional syntax - it looks more natural in the DECLARE
part, but that is a detail at this moment. I can live with the functional
syntax too. The functional syntax is easily parseable, but SQL is not a
functional language - so it looks foreign to me there.

Where are you expecting the implementation? In PL/pgSQL only, or generally
in DDL, or at both levels?

Regards

Pavel



>
> regards, tom lane
>


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 3:56 PM, Paul Ramsey  wrote:
> On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
>  wrote:
>> On 14 March 2016 at 14:52, James Sewell  wrote:
>>> One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>
> Does the cost of the aggregate function come into this calculation at
> all? In PostGIS land, much smaller numbers of rows can generate loads
> that would be effective to parallelize (worker time much >> than
> startup cost).

Unfortunately, no - only the table size.  This is a problem, and needs
to be fixed.  However, it's probably not going to get fixed for 9.6.
:-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-03-14 Thread Andres Freund
On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> On 29 January 2016 at 18:16, Andres Freund  wrote:
> 
> > Hi,
> >
> > so, I'm reviewing the output of:
> >
> 
> Thankyou very much for the review.

Afaics you've not posted an updated version of this? Any chance you
could?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Background Processes and reporting

2016-03-14 Thread Andres Freund
Hi,

On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
> > I have already shown [0, 1] the overhead of measuring timings in linux on
> > representative workload. AFAIK, these tests were the only one that showed
> > any numbers. All other statements about terrible performance have been and
> > remain unconfirmed.
>
> Of course, those numbers are substantial regressions which would
> likely make it impractical to turn this on on a heavily-loaded
> production system.

A lot of people operating production systems are fine with trading a <=
10% impact for more insight into the system; especially if that
configuration can be changed without a restart.  I know a lot of systems
that use pg_stat_statements, track_io_timing = on, etc; just to get
that. In fact there's people running perf more or less continuously in
production environments; just to get more insight.

I think it's important to get as much information out there without
performance overhead, so it can be enabled by default. But I don't think
it makes sense to not allow features in that cannot be enabled by
default, *if* we tried to make them cheap enough beforehand.


> > Ok, doing it in short steps seems to be a good plan. Any objections against
> > giving people an ability to turn some feature (i.e. notorious measuring
> > timings) even if it makes some performance degradation? Of course, it should
> > be turned off by default.
>
> I am not totally opposed to that, but I think a feature that causes a
> 10% performance hit when you turn it on will be mostly useless.  The
> people who need it won't be able to risk turning it on.

That's not my experience.


> > If anything, I’m not from PostgresPro and I’m not «accusing you». But to be
> > honest current committed implementation has been tested exactly on one
> > machine with two workloads. And I think, it is somehow unfair to demand more
> > from others. Although it doesn’t mean that testing on exactly one machine
> > with only one OS is enough, of course. I suppose, you should ask the authors
> > to test it on some representative hardware and workload but if authors don’t
> > have them, it would be nice to help them with that.
>
> I'm not necessarily opposed to that, but this thread has a lot more
> heat than light

Indeed.


>, and some of the other threads on this topic have had
> the same problem. There seems to be tremendous resistance to the idea
> that recording timestamps is going to be expensive even though there
> are many previous threads on pgsql-hackers about many different
> features showing that this is true.  Somehow, I've got to justify a
> position which has been taken by many people many times before on this
> very same mailing list.  That strikes me as 100% backwards.

Agreed; I find that pretty baffling. Especially that pointing out
problems like timestamp overhead generates a remarkable amount of
hostility is weird.


> > Also it would be really interesting to hear your opinion about the initial
> > Andres’s question. Any thoughts about changing current committed
> > implementation?
>
> I'm a little vague on specifically what Andres has in mind.

That makes two of us.


> I tend to think that there's not much point in allowing
> pg_stat_get_progress_info('checkpointer') because we can just have a
> dedicated view for that sort of thing, cf. pg_stat_bgwriter, which
> seems better.

But that infrastructure isn't really suitable for exposing quickly
changing counters imo. And given that we now have a relatively generic
framework, it seems like a pain to add a custom implementation just for
the checkpointer. Also, using custom infrastructure means it's not
extensible to custom bgworker, which doesn't seem like a good
idea. E.g. it'd be very neat to show the progress of a logical
replication catchup process that way, no?


> Exposing the wait events from background processes
> might be worth doing, but I don't think we want to add a bunch of
> dummy lines to pg_stat_activity.

Why are those dummy lines? It's activity in the cluster? We already show
autovacuum workers in there. And walsenders, if you query the underlying
function, instead of pg_stat_activity (due to a join to pg_database).

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Background Processes and reporting

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 3:54 PM, Vladimir Borodin  wrote:
> 5. Show extra information about wait event (i.e. exclusive or shared mode
> for LWLocks, relation/forknum/blknum for I/O operations, etc.).

I doubt that this is a good idea.  Everybody will pay the cost of it,
and who will get a benefit?  We haven't had any wait monitoring at all
in PostgreSQL for years and years and years and it's only just now
getting to the top of our list of things to fix.  So I have a hard
time believing that now we suddenly need this level of detail.  The
very good thing about the committed implementation is that it requires
*no* synchronization, and anything more than a 4-byte integer will
require some synchronization (probably an st_changecount type protocol).  I continue to believe
that a feature that is on for everyone and dirt cheap is going to be
more valuable than anything that is expensive enough to require an
"off" switch.

> I have already shown [0, 1] the overhead of measuring timings in linux on
> representative workload. AFAIK, these tests were the only one that showed
> any numbers. All other statements about terrible performance have been and
> remain unconfirmed.

Of course, those numbers are substantial regressions which would
likely make it impractical to turn this on on a heavily-loaded
production system.  On the other hand, the patch actually committed is
turned on by default and Amit posted numbers showing no performance
change at all.

> As for the size of such information it of course should be configurable.
> I.e. in Oracle there is a GUC for the size of ring buffer to store history
> of sampling with extra information about each wait event.

That's a reasonable idea, although not one I'm very excited about.

> Ok, doing it in short steps seems to be a good plan. Any objections against
> giving people an ability to turn some feature (i.e. notorious measuring
> timings) even if it makes some performance degradation? Of course, it should
> be turned off by default.

I am not totally opposed to that, but I think a feature that causes a
10% performance hit when you turn it on will be mostly useless.  The
people who need it won't be able to risk turning it on.

> If anything, I’m not from PostgresPro and I’m not «accusing you». But to be
> honest current committed implementation has been tested exactly on one
> machine with two workloads. And I think, it is somehow unfair to demand more
> from others. Although it doesn’t mean that testing on exactly one machine
> with only one OS is enough, of course. I suppose, you should ask the authors
> to test it on some representative hardware and workload but if authors don’t
> have them, it would be nice to help them with that.

I'm not necessarily opposed to that, but this thread has a lot more
heat than light, and some of the other threads on this topic have had
the same problem. There seems to be tremendous resistance to the idea
that recording timestamps is going to be expensive even though there
are many previous threads on pgsql-hackers about many different
features showing that this is true.  Somehow, I've got to justify a
position which has been taken by many people many times before on this
very same mailing list.  That strikes me as 100% backwards.

Similarly, the position that a wait-reporting interface that does not
require synchronization will be a lot cheaper than one that does
require synchronization has been questioned repeatedly.  I'm not very
interested in spending a lot of time defending that proposition or
producing benchmarking results to support it, and I don't think I
should have to.  We wouldn't have so many patches floating around that
aimed to reduce locking if synchronization overhead didn't cost, and
what is being proposed is to stick those into low-level code paths
that are sometimes highly trafficked.

> Also it would be really interesting to hear your opinion about the initial
> Andres’s question. Any thoughts about changing current committed
> implementation?

I'm a little vague on specifically what Andres has in mind.  I tend to
think that there's not much point in allowing
pg_stat_get_progress_info('checkpointer') because we can just have a
dedicated view for that sort of thing, cf. pg_stat_bgwriter, which
seems better.  Exposing the wait events from background processes
might be worth doing, but I don't think we want to add a bunch of
dummy lines to pg_stat_activity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v6] GSSAPI encryption support

2016-03-14 Thread David Steele
On 3/14/16 4:10 PM, Robbie Harwood wrote:
> David Steele  writes:
> 
>> Hi Robbie,
>>
>> On 3/8/16 5:44 PM, Robbie Harwood wrote:
>>> Hello friends,
>>>
>>> Here's yet another version of GSSAPI encryption support.  It's also
>>> available for viewing on my github:
>>
>> The build went fine but when testing I was unable to logon at all.  I'm
>> using the same methodology as in
>> http://www.postgresql.org/message-id/56be0ff9.70...@pgmasters.net except
>> that I'm running against 51c0f63 and using the v6 patch set.
>>
>> psql simply hangs and never returns.  I have attached a pcap of the
>> psql/postgres session generated with:
>>
>> tcpdump -i lo -nnvvXSs 1514 port 5432 -w gssapi.pcap
>>
>> If you would like me to capture more information please let me know
>> specifically how you would like me to capture it.
> 
> Please disregard my other email.  I think I've found the issue; will
> post a new version in a moment.

Strange timing since I was just testing this.  Here's what I got:

$ pg/bin/psql -h localhost -U vagr...@pgmasters.net postgres
conn->inStart = 179, conn->inEnd = 179, conn->inCursor = 179
psql (9.6devel)
Type "help" for help.

postgres=>

This was after commenting out:

// appendBinaryPQExpBuffer(&conn->gwritebuf,
//  conn->inBuffer + conn->inStart,
//  conn->inEnd - conn->inStart);
// conn->inEnd = conn->inStart;

The good news I can log on every time now!

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] [PATCH v6] GSSAPI encryption support

2016-03-14 Thread Robbie Harwood
David Steele  writes:

> Hi Robbie,
>
> On 3/8/16 5:44 PM, Robbie Harwood wrote:
>> Hello friends,
>> 
>> Here's yet another version of GSSAPI encryption support.  It's also
>> available for viewing on my github:
>
> The build went fine but when testing I was unable to logon at all.  I'm
> using the same methodology as in
> http://www.postgresql.org/message-id/56be0ff9.70...@pgmasters.net except
> that I'm running against 51c0f63 and using the v6 patch set.
>
> psql simply hangs and never returns.  I have attached a pcap of the
> psql/postgres session generated with:
>
> tcpdump -i lo -nnvvXSs 1514 port 5432 -w gssapi.pcap
>
> If you would like me to capture more information please let me know
> specifically how you would like me to capture it.

Please disregard my other email.  I think I've found the issue; will
post a new version in a moment.


signature.asc
Description: PGP signature


Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 14, 2016 at 1:37 PM, Tom Lane  wrote:
>> Yeah.  An alternative definition that would support that would be to
>> call the upper-path-providing callback for each FDW that's responsible
>> for any base relation of the query.  But I think that that would often
>> lead to a lot of redundant/wasted computation, and it's not clear to
>> me that we can support such cases without other changes as well.

> Sure, that's fine with me.  Are you going to go make these changes now?

So I started looking at this, and almost immediately came to two
conclusions:

1. We need to add a "PathTarget *" parameter to create_foreignscan_path.
The hard-wired assignment that's in there now,

pathnode->path.pathtarget = &(rel->reltarget);

is sufficient for scan and join paths, but it doesn't work at all for
upper relations because we don't put anything in their reltarget fields.
We could do so, but it's still problematic because there will be
situations where non-topmost Paths associated with an upperrel have other
targets than what topmost Paths do.  This is true today for set-operation
Paths and WindowAgg Paths, and I think possibly other cases as well (such
as inserted projection nodes).  We could possibly insist on creating a
separate upperrel for every distinct target that's used by intermediate
levels of Path in these trees, but that seems kinda pointless to me, at
least for now.  Really the best answer is to let FDWs have control of it.
And it's not like we've never whacked this function's API around before.

2. I was being way too cycle-miserly when I decided to make
RelOptInfo.reltarget be an embedded struct rather than defining PathTarget
as a regular, separate node type.  I'm gonna change that before it's too
late.  One extra palloc per RelOptInfo is not a noticeable price, and
there are too many places where this choice is resulting in notational
weirdness.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Background Processes and reporting

2016-03-14 Thread Andres Freund
On 2016-03-12 16:29:11 +0530, Amit Kapila wrote:
> On Sat, Mar 12, 2016 at 3:10 AM, Andres Freund  wrote:
> >
> >
> > > Similarly for the wait event stuff - checkpointer, wal writer,
> > > background writer are in many cases processes that very often are
> > > blocked on locks, IO and such.  Thus restricting the facility to
> > > database connected processes seems like a loss.
> >
> > I think one way to address this would be to not only report
> > PgBackendStatus type processes in pg_stat_activity. While that'd
> > obviously be a compatibility break, I think it'd be an improvement.
> >
> 
> I think here another point which needs more thoughts is that many of the
> pg_stat_activity fields are not relevant for background processes, ofcourse
> one can say that we can keep those fields as NULL, but still I think that
> indicates it is not the most suitable way to expose such information.

But neither are all of them relevant for autovacuum workers, and we
already show them.  pg_stat_activity as a name imo doesn't really imply
that it's about plain queries.  ISTM we should add a 'backend_type'
column that is one of backend|checkpointer|autovacuum|autovacuum-worker|wal writer|bgwriter|bgworker
(or similar). That makes querying easier.  And then display all shmem
connected processes.
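
Purely as an illustration of such a classification, using only the
categories named above (the column and all of these names are hypothetical):

/* Hypothetical classification for a backend_type column (illustration only). */
typedef enum BackendType
{
    B_BACKEND,                  /* regular client backend    */
    B_CHECKPOINTER,
    B_AUTOVACUUM,               /* autovacuum launcher       */
    B_AUTOVACUUM_WORKER,
    B_WAL_WRITER,
    B_BG_WRITER,
    B_BG_WORKER                 /* dynamic background worker */
} BackendType;

static const char *
backend_type_name(BackendType type)
{
    switch (type)
    {
        case B_BACKEND:           return "backend";
        case B_CHECKPOINTER:      return "checkpointer";
        case B_AUTOVACUUM:        return "autovacuum";
        case B_AUTOVACUUM_WORKER: return "autovacuum-worker";
        case B_WAL_WRITER:        return "wal writer";
        case B_BG_WRITER:         return "bgwriter";
        case B_BG_WORKER:         return "bgworker";
    }
    return "unknown";
}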

> Another way could be to have new view like pg_stat_background_activity with
> only relevant fields or try expose via individual views like
> pg_stat_bgwriter.

I think the second is a pretty bad alternative; it'll force us to add
new views with very similar information; and it'll be hard to get
information about the whole system.   I mean if you want to know which
locks are causing problems, you don't primarily care whether it's a
background process or a backend that has contention issues.


> Do you intend to get this done for 9.6 considering an add-on patch for wait
> event information displayed in pg_stat_activity?

I think we should fix this for 9.6; it's a weakness in a new
interface. Let's not yank people around more than we need to.

I'm willing to do some work on that, if we can agree upon a course.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Paul Ramsey
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
 wrote:
> On 14 March 2016 at 14:52, James Sewell  wrote:
>> One question - how is the upper limit of workers chosen?
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.

Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).

P


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Background Processes and reporting

2016-03-14 Thread Vladimir Borodin

> On 14 March 2016, at 22:21, Robert Haas  wrote:
> 
> On Sat, Mar 12, 2016 at 6:05 AM, Oleg Bartunov  wrote:
>>> So?
>> 
>> So, Robert already has experience with the subject, probably,  he has bad
>> experience with edb implementation and he'd like to see something better in
>> community version. That's fair and I accept his position.
> 
> Bingo - though maybe "bad" experience is not quite as accurate as
> "could be better".
> 
>> Wait monitoring is one of the popular requirement of russian companies, who
>> migrated from Oracle. Overwhelming majority of them use Linux, so I suggest
>> to have configure flag for including wait monitoring at compile time
>> (default is no wait monitoring), or have GUC variable, which is also off by
>> default, so we have zero to minimal overhead of monitoring. That way we'll
>> satisfy many enterprises and help them to choose postgres, will get feedback
>> from production use and have time for feature improving.
> 
> So, right now we can only display the wait information in
> pg_stat_activity.  There are a couple of other things that somebody
> might want to do:
> 
> 1. Sample the wait state information across all backends in the
> system.  On a large, busy system, this figures to be quite cheap, and
> the sampling interval could be configurable.
> 
> 2. Count every instance of every wait event in every backend, and roll
> that up either via shared memory or additional stats messges.
> 
> 3. Like #2, but with timing information.
> 
> 4. Like #2, but on a per-query basis, somehow integrated with
> pg_stat_statements.

5. Show extra information about wait event (i.e. exclusive or shared mode for 
LWLocks, relation/forknum/blknum for I/O operations, etc.).

> 
> The challenge with any of these except #1 is that they are going to
> produce a huge volume of data, and, whether you believe it or not, #3
> is going to sometimes be crushingly slow.  Really.  I tend to think
> that #1 might be better than #2 or #3, but I'm not unwilling to listen
> to contrary arguments, especially if backed up by careful benchmarking
> showing that the performance hit is negligible.

I have already shown [0, 1] the overhead of measuring timings in linux on 
representative workload. AFAIK, these tests were the only one that showed any 
numbers. All other statements about terrible performance have been and remain 
unconfirmed.

As for the size of such information it of course should be configurable. I.e. 
in Oracle there is a GUC for the size of ring buffer to store history of 
sampling with extra information about each wait event.

[0] 
http://www.postgresql.org/message-id/eee78e40-0e48-411a-9f90-cf9339da9...@simply.name
[1] 
http://www.postgresql.org/message-id/5f3dd73a-2a85-44bf-9f47-54049a81c...@simply.name
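
For illustration, the kind of fixed-size sample history being described
might look roughly like the following; every name and field here is
invented for the sketch:

#include <stdint.h>
#include <time.h>

/* One sampled observation of a backend's wait state (invented layout). */
typedef struct WaitSample
{
    int32_t     pid;            /* backend the sample was taken from */
    uint32_t    wait_event;     /* 4-byte wait event code            */
    time_t      sample_time;    /* when the sample was taken         */
} WaitSample;

/* Ring buffer whose capacity would be controlled by a GUC. */
typedef struct WaitHistory
{
    WaitSample *samples;
    int         size;           /* capacity, from the hypothetical GUC */
    int         next;           /* next slot to overwrite              */
} WaitHistory;

static void
wait_history_add(WaitHistory *h, int32_t pid, uint32_t wait_event)
{
    WaitSample *s = &h->samples[h->next];

    s->pid = pid;
    s->wait_event = wait_event;
    s->sample_time = time(NULL);
    h->next = (h->next + 1) % h->size;  /* oldest entries get overwritten */
}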

>  My reason for wanting
> to get the stuff we already had committed first is because I have
> found that it is best to proceed with these kinds of problems
> incrementally, not trying to solve too much in a single commit.  Now
> that we have the basics, we can build on it, adding more wait events
> and possibly more recordkeeping for the ones we have already - but
> anything that regresses performance for people not using the feature
> is a dead end in my book, as is anything that introduces overall
> stability risks.

Ok, doing it in short steps seems to be a good plan. Any objections against 
giving people an ability to turn some feature (i.e. notorious measuring 
timings) even if it makes some performance degradation? Of course, it should be 
turned off by default.

> 
> I think the way forward from here is that Postgres Pro should (a)
> rework their implementation to work with what has already been
> committed, (b) consider carefully whether they've done everything
> possible to contain the performance loss, (c) benchmark it on several
> different machines and workloads to see how much performance loss
> there is, and (d) stop accusing me of acting in bad faith.

If anything, I’m not from PostgresPro and I’m not «accusing you». But to be 
honest current committed implementation has been tested exactly on one machine 
with two workloads. And I think, it is somehow unfair to demand more from 
others. Although it doesn’t mean that testing on exactly one machine with only 
one OS is enough, of course. I suppose, you should ask the authors to test it 
on some representative hardware and workload but if authors don’t have them, it 
would be nice to help them with that.

Also it would be really interesting to hear your opinion about the initial 
Andres’s question. Any thoughts about changing current committed implementation?

> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--
May the force be with you…

Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread Robert Haas
On Sun, Mar 13, 2016 at 5:44 PM, David Rowley
 wrote:
> On 12 March 2016 at 16:31, David Rowley  wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

I haven't fully studied every line of this yet, but here are a few comments:

+   case T_PartialAggref:
+   coll = InvalidOid; /* XXX is this correct? */
+   break;

I doubt it.  More generally, why are we inventing PartialAggref
instead of reusing Aggref?  The code comments seem to contain no
indication as to why we shouldn't need all the same details for
PartialAggref that we do for Aggref, instead of only a small subset of
them.  Hmm... actually, it looks like PartialAggref is intended to
wrap Aggref, but that seems like a weird design.  Why not make Aggref
itself DTRT?   There's not enough explanation in the patch of what is
going on here and why.

}
+   if (can_parallel)
+   {

Seems like a blank line would be in order.

I don't see where this applies has_parallel_hazard or anything
comparable to the aggregate functions.  I think it needs to do that.

+   /* XXX +1 ? do we expect the main process to actually do real work? */
+   numPartialGroups = Min(numGroups, subpath->rows) *
+   (subpath->parallel_degree + 1);

I'd leave out the + 1, but I don't think it matters much.

+   aggstate->finalizeAggs == true)

We usually just say if (a) not if (a == true) when it's a boolean.
Similarly !a rather than a == false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-03-14 Thread Tomas Vondra

Hi,

On 03/14/2016 02:12 PM, David Steele wrote:

Hi Thomas,

...

I don't think it would be clear to any reviewer which patch to apply
even if they were working.  I'm marking this "waiting for author".


Yeah. Rebasing the patches to current master was simple enough (there 
was just a simple #include conflict), but figuring out which of the 
patches is review-worthy was definitely difficult.


I do believe David's last patch is the best step forward, so I've 
rebased it, and made some basic aesthetic fixes (adding or rewording 
comments on a few places, etc.)


The one important code change is that I've removed the piece of code 
from find_best_foreign_key_quals that tried to be a bit too smart about 
equivalence classes.


My understanding is that it tried to handle cases like this example:

CREATE TABLE f (id1 INT, id2 INT, PRIMARY KEY (id1, id2));

CREATE TABLE d1 (id1 INT, id2 INT, FOREIGN KEY (id1, id2)
   REFERENCES f(id1, id2));

CREATE TABLE d2 (id1 INT, id2 INT, FOREIGN KEY (id1, id2)
   REFERENCES f(id1, id2));

SELECT * FROM f JOIN d1 ON (f.id1 = d1.id1 AND f.id2 = d1.id2)
JOIN d2 ON (f.id1 = d2.id1 AND f.id2 = d2.id2);

But it did so by also deriving foreign keys between d1 and d2, which I 
believe is wrong as there really is no foreign key, and thus no 
guarantee of existence of a matching row.


FWIW as I explained in a message from 24/2/2015, while this is 
definitely an issue worth fixing, I believe it needs to be done in some 
other way, not by foreign keys.


Attached is v3 of the patch, and also three SQL scripts demonstrating 
the impact of the patch on simple examples.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..3d38384 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2153,6 +2153,16 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 }
 
 static void
+_outForeignKeyOptInfo(StringInfo str, const ForeignKeyOptInfo *node)
+{
+	WRITE_NODE_TYPE("FOREIGNKEYOPTINFO");
+
+	WRITE_OID_FIELD(conrelid);
+	WRITE_OID_FIELD(confrelid);
+	WRITE_INT_FIELD(nkeys);
+}
+
+static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
 	/*
@@ -3603,6 +3613,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_IndexOptInfo:
 				_outIndexOptInfo(str, obj);
 				break;
+			case T_ForeignKeyOptInfo:
+				_outForeignKeyOptInfo(str, obj);
+				break;
 			case T_EquivalenceClass:
 				_outEquivalenceClass(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..4399a9f 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3870,6 +3870,347 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * quals_match_foreign_key
+ *		Determines if the foreign key is matched by joinquals.
+ *
+ * Checks that there are conditions on all columns of the foreign key, matching
+ * the operator used by the foreign key etc. If such complete match is found,
+ * the function returns bitmap identifying the matching quals (0-based).
+ *
+ * Otherwise (no match at all or incomplete match), NULL is returned.
+ */
+static Bitmapset *
+quals_match_foreign_key(PlannerInfo *root, ForeignKeyOptInfo *fkinfo,
+		RelOptInfo *fkrel, RelOptInfo *foreignrel,
+		List *joinquals)
+{
+	int i;
+	int nkeys = fkinfo->nkeys;
+	Bitmapset *qualmatches = NULL;
+	Bitmapset *fkmatches = NULL;
+
+	/*
+	 * Loop over each column of the foreign key and build a bitmap index
+	 * of each joinqual which matches. Note that we don't stop when we find
+	 * the first match, as the expression could be duplicated in the
+	 * joinquals, and we want to generate a bitmap index which has bits set for
+	 * every matching join qual.
+	 */
+	for (i = 0; i < nkeys; i++)
+	{
+		ListCell *lc;
+		int quallstidx = -1;
+
+		foreach(lc, joinquals)
+		{
+			RestrictInfo   *rinfo;
+			OpExpr		   *clause;
+			Var			   *leftvar;
+			Var			   *rightvar;
+
+			quallstidx++;
+
+			/*
+			 * Technically we don't need to, but here we skip this qual if
+			 * we've matched it to part of the foreign key already. This
+			 * should prove to be a useful optimization when the quals appear
+			 * in the same order as the foreign key's keys. We need only bother
+			 * doing this when the foreign key is made up of more than 1 set
+			 * of columns, and we're not testing the first column.
+			 */
+			if (i > 0 && bms_is_member(quallstidx, qualmatches))
+				continue;
+
+			/*
+			 * Here since 'usefulquals' only contains bitmap indexes for quals
+			 * of type "var op var" we can safely skip checking this.
+			 */
+			rinfo = (RestrictInfo *) lfirst(lc);
+			clause = (OpExpr *) rinfo->clause;
+

Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-14 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 14, 2016 at 12:04 PM, Tom Lane  wrote:
>> Or in short: maybe it's time to blow up %TYPE and start fresh.

> That's not a dumb idea.  I think %TYPE is an Oracle-ism, and it
> doesn't seem to have been their best-ever design decision.

It is, and it wasn't.  What concerns me about the present patch is
that it's trying to shoehorn more functionality into something that
was badly designed to start with.  I think we'd be better off leaving
%TYPE as a deprecated backwards-compatibility feature and inventing
something new and more extensible.

I'm not wedded to any part of the syntax I just wrote, but I do say
that we need something that allows composability of type selectors.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2016-03-14 Thread David Steele
Hi Fabien,

On 3/14/16 3:27 PM, Fabien COELHO wrote:

>> Any takers to review this updated patch?
> 
> I intend to have a look at it, I had a look at a previous instance, but
> I'm ok if someone wants to proceed.

There's not exactly a long line of reviewers at the moment so if you
could do a followup review that would be great.

Thanks,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 10:39 AM, Robert Haas  wrote:
> I don't particularly like that interface.  I also suggest that it
> would be better to leave throttling to a future commit, and focus on
> getting the basic feature in first.

Works for me. I don't think throttling is an especially compelling feature.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2016-03-14 Thread Fabien COELHO


Hello David,


Any takers to review this updated patch?


I intend to have a look at it, I had a look at a previous instance, but 
I'm ok if someone wants to proceed.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 12:04 PM, Tom Lane  wrote:
> I wrote:
>> However ... one thing I was intending to mention on this thread is that
>> "get the array type over this type" isn't the only extension one might
>> wish for.  Another likely desire is "get the type of field 'foo' of this
>> composite type".  I don't suggest that this patch needs to implement
>> that right now; but it would be a good thing if we could see how the
>> chosen syntax could be extended in such a direction.  Otherwise we might
>> be painting ourselves into a corner.
>
> To enlarge a little bit: it seems to me that what we're really wishing for
> here is a type name syntax that goes beyond simple names.  If we were
> starting in a green field, we might choose a recursively-composable syntax
> like the following.
>
> type_name can be:
>
> * A simple type name, such as int8 or varchar[42].
>
> * TYPE_OF(expression), meaning that the SQL expression is parsed but never
> executed, we just take this construct as naming its result type.
>
> * ARRAY_OF(type_name), meaning the array type having type_name as its
> element type.
>
> * ELEMENT_OF(type_name), meaning the element type of the array type
> named by type_name.
>
> * ROW_OF(type_name [, type_name ...]), meaning the composite type with
> those types as columns.
>
> * FIELD_OF(type_name, foo), meaning the type of column "foo" of the
> composite type named by type_name.  I'm not sure if there would be
> use-cases for selecting a column other than by a simple literal name,
> but there could be variants of this function if so.
>
> It's possible to think of other cases, for example what about range
> types?  You could allow ELEMENT_OF() to apply to range types, certainly.
> I'm not sure about the other direction, because multiple range types
> could have the same element type; but it's possible to hope that some
> type-naming function along the lines of RANGE_OF(type_name, other args)
> could disambiguate.  The main reason I'm thinking of a function-like
> syntax here is that it can easily handle additional arguments when
> needed.
>
> Comparing this flight of fancy to where we are today, we have
> %TYPE as a remarkably ugly and limited implementation of TYPE_OF(),
> and we have the precedent that foo[] means ARRAY_OF(foo).  I'm not
> sure how we get any extensibility out of either of those things.
>
> Or in short: maybe it's time to blow up %TYPE and start fresh.

That's not a dumb idea.  I think %TYPE is an Oracle-ism, and it
doesn't seem to have been their best-ever design decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 8:16 PM, Peter Geoghegan  wrote:
> On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut  wrote:
>> Arguably, if everyone followed "my" approach, this should be very easy
>> to fix everywhere.
>
> I don't think that there is any clear indication that the OpenSSL
> people would share that view. Or my view. Or anything that's sensible
> or practical or actionable.

Ideally, we'd be able to get this into the upcoming minor release.
This bug has caused Heroku some serious problems.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Background Processes and reporting

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 6:05 AM, Oleg Bartunov  wrote:
>> So?
>
> So, Robert already has experience with the subject, probably,  he has bad
> experience with edb implementation and he'd like to see something better in
> community version. That's fair and I accept his position.

Bingo - though maybe "bad" experience is not quite as accurate as
"could be better".

> Wait monitoring is one of the popular requirement of russian companies, who
> migrated from Oracle. Overwhelming majority of them use Linux, so I suggest
> to have configure flag for including wait monitoring at compile time
> (default is no wait monitoring), or have GUC variable, which is also off by
> default, so we have zero to minimal overhead of monitoring. That way we'll
> satisfy many enterprises and help them to choose postgres, will get feedback
> from production use and have time for feature improving.

So, right now we can only display the wait information in
pg_stat_activity.  There are a couple of other things that somebody
might want to do:

1. Sample the wait state information across all backends in the
system.  On a large, busy system, this figures to be quite cheap, and
the sampling interval could be configurable.

2. Count every instance of every wait event in every backend, and roll
that up either via shared memory or additional stats messges.

3. Like #2, but with timing information.

4. Like #2, but on a per-query basis, somehow integrated with
pg_stat_statements.

The challenge with any of these except #1 is that they are going to
produce a huge volume of data, and, whether you believe it or not, #3
is going to sometimes be crushingly slow.  Really.  I tend to think
that #1 might be better than #2 or #3, but I'm not unwilling to listen
to contrary arguments, especially if backed up by careful benchmarking
showing that the performance hit is negligible.  My reason for wanting
to get the stuff we already had committed first is because I have
found that it is best to proceed with these kinds of problems
incrementally, not trying to solve too much in a single commit.  Now
that we have the basics, we can build on it, adding more wait events
and possibly more recordkeeping for the ones we have already - but
anything that regresses performance for people not using the feature
is a dead end in my book, as is anything that introduces overall
stability risks.
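
To make option #1 concrete, here is a standalone sketch of a sampler that
periodically reads one 4-byte wait code per backend and aggregates counts,
so the cost stays out of the hot paths; all names are invented for
illustration:

#include <stdint.h>
#include <unistd.h>

#define MAX_BACKENDS    1024
#define MAX_WAIT_EVENTS  256

/* Stand-in for a shared-memory array in which each backend publishes its
 * current 4-byte wait code (0 = not waiting).  Purely illustrative. */
static volatile uint32_t current_wait_event[MAX_BACKENDS];

static uint64_t wait_counts[MAX_WAIT_EVENTS];

/* Periodic sampler: the cost lives here, not in the backends' hot paths. */
static void
sample_forever(unsigned interval_usec)
{
    for (;;)
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
        {
            uint32_t ev = current_wait_event[i];

            if (ev != 0 && ev < MAX_WAIT_EVENTS)
                wait_counts[ev]++;
        }
        usleep(interval_usec);          /* configurable sampling interval */
    }
}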

I think the way forward from here is that Postgres Pro should (a)
rework their implementation to work with what has already been
committed, (b) consider carefully whether they've done everything
possible to contain the performance loss, (c) benchmark it on several
different machines and workloads to see how much performance loss
there is, and (d) stop accusing me of acting in bad faith.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Upcoming back-branch releases

2016-03-14 Thread Tom Lane
In view of some rather important recent fixes such as commits
bf7ced5e2dc8622f and 301cc3549c29aaa5, the release team has decided
that it'd be a good thing to schedule a set of minor releases in
the near future.  The nearest convenient time seems to be week after
next, that is, wrap Monday Mar 28 for public announcement on Thursday
Mar 31st.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 10:34 AM, Daniel Verite  wrote:
>> But worse than either of  those things, there is no real
>> agreement on what the overall design of this feature
>> should be.
>
> The part in the design that raised concerns upthread is
> essentially how headers sorting is exposed to the user and
> implemented.
>
> As suggested in [1], I've made some drastic changes in the
> attached patch to take the comments (from Dean R., Tom L.)
> into account.
> [ ... lengthy explanation ... ]
> - also NULLs are no longer excluded from headers, per Peter E.
>   comment in [2].

Dean, Tom, Peter, what do you think of the new version?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-03-14 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> What I dislike about this POC is all the disruption in libpq, to be
> honest.

Yeah, I don't much like that either.  But I don't think we can avoid
some refactoring there; as designed, conversion of an error message into
user-visible form is too tightly tied to receipt of the message.

> It would be much neater if we could form the verbose message every
> time and let the client decide where to cut it.  Maybe a bit "too clever"
> would be to put a \0 char between short message and it's verbose
> continuation.  The client could then reach the verbose part like this
> (assuming that libpq did put a verbose part there): msg + strlen(msg) + 1.

Blech :-(

Thinking about it, though, it seems to me that we could get away with
always performing both styles of conversion and sticking both strings
into the PGresult.  That would eat a little more space but not much.
Then we just need to add API to let the application get at both forms.
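
A sketch of what the client-facing side of that could look like; the
verbose accessor below is an invented name, shown only to illustrate
"store both, let the application choose":

#include <stdio.h>
#include <libpq-fe.h>

/* Invented accessor standing in for "give me the verbose form"; it does not
 * exist in libpq today and is declared here only to sketch the idea. */
extern char *PQresultErrorMessageVerbose(const PGresult *res);

static void
report_error(const PGresult *res, int want_details)
{
    if (want_details)
        fprintf(stderr, "%s", PQresultErrorMessageVerbose(res));   /* invented */
    else
        fprintf(stderr, "%s", PQresultErrorMessage(res));          /* existing */
}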

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared Statement support for Parallel query

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 9:18 AM, Amit Kapila  wrote:
> On Fri, Feb 26, 2016 at 4:37 PM, Robert Haas  wrote:
>> And, I'm going to revert this part.  If you'd run the regression tests
>> under force_parallel_mode=regress, max_parallel_degree>0, you would
>> have noticed that this part breaks it, because of CREATE TABLE ... AS
>> EXECUTE.
>>
>
> I have looked into this issue and found that the reason for the failure is
> that in force_parallel_mode=regress, we enable parallel mode restrictions
> even if the parallel plan is not choosen as part of below code in
> standard_planner()
>
> if (force_parallel_mode == FORCE_PARALLEL_OFF || !glob->parallelModeOK)
> {
>     glob->parallelModeNeeded = false;
>     glob->wholePlanParallelSafe = false; /* either false or don't care */
> }
> else
> {
>     glob->parallelModeNeeded = true;
>     glob->wholePlanParallelSafe =
>         !has_parallel_hazard((Node *) parse, false);
> }
>
>
> The failure cases fall into that category, basically wholePlanParallelSafe
> will be false, but parallelModeNeeded will be true which will enable
> parallel mode restrictions even though the plan won't contain Gather node.
> I think if we want to operate in above way for testing purpose, then we need
> to force during execution that statements for non read-only operations
> should not enter into parallel mode similar to what we are doing for
> non-zero tuple count case.  Attached patch fixes the problem.

This seems like a really ugly fix.  It might be possible to come up
with a fix along these lines, but I don't have much confidence in the
specific new test you've injected into executePlan().  Broadly, any
change of this type implicitly changes the contract between
executePlan() and the planner infrastructure - the planner can now
legally generate parallel plans in some cases where that would
previously have been unacceptable.  But I'm not in a hurry to rethink
where we've drawn the line there for 9.6.  Let's punt this issue for
now and come back to it in a future release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Obsolete wording in PL/Perl comment

2016-03-14 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> The comment in hv_store_string() says that negative key length to
> hv_store() for UTF-8 is not documented, and mentions that 5.6 doesn't
> track UTF-8-ness of keys.  However, the negative length convention has
> been documented since 5.16, and 5.6 is no longer supported.  The
> attached patch updates the comment to reflect this.

Pushed, thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote  wrote:
> On Fri, Mar 11, 2016 at 2:31 PM, Amit Langote
>  wrote:
>> On 2016/03/11 13:16, Robert Haas wrote:
>>> On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote
>>>  wrote:
 So, from what I understand here, we should not put total count of index
 pages into st_progress_param; rather, have the client (reading
 pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
 when necessary.  However, only server is able to tell the current position
 within an index vacuuming round (or how many pages into a given index
 vacuuming round), so report that using some not-yet-existent mechanism.
>>>
>>> Isn't that mechanism what you are trying to create in 0003?
>>
>> Right, 0003 should hopefully become that mechanism.
>
> About 0003:
>
> Earlier, it was trying to report vacuumed index block count using
> lazy_tid_reaped() callback for which I had added a index_blkno
> argument to IndexBulkDeleteCallback. Turns out it's not such a good
> place to do what we are trying to do.  This callback is called for
> every heap pointer in an index. Not all index pages contain heap
> pointers, which means the existing callback does not allow to count
> all the index blocks that AM would read to finish a given index vacuum
> run.
>
> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
> which AMs should call for every block that's read (say, right before a
> call to ReadBufferExtended) as part of a given vacuum run. The
> callback with help of some bookkeeping state can count each block and
> report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
> blocks for every vacuum or if it's possible that some blocks are read
> more than once in single vacuum, etc.  IOW, some AM's processing may
> be non-linear and counting blocks 1..N (where N is reported total
> index blocks) may not be possible.  However, this is the best I could
> think of as doing what we are trying to do here. Maybe index AM
> experts can chime in on that.
>
> Thoughts?

Well, I think you need to study the index AMs and figure this out.

But I think for starters you should write a patch that reports the following:

1. phase
2. number of heap blocks scanned
3. number of heap blocks vacuumed
4. number of completed index vac cycles
5. number of dead tuples collected since the last index vac cycle
6. number of dead tuples that we can store before needing to perform
an index vac cycle

All of that should be pretty straightforward, and then we'd have
something we can ship.  We can add the detailed index reporting later,
when we get to it, perhaps for 9.7.
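
For what it's worth, reporting those six counters would boil down to calls
like the ones below. This is a standalone sketch: the slot numbers and the
stub are invented, and only the general shape of
pgstat_progress_update_param() comes from the already-committed progress
infrastructure:

#include <stdint.h>
#include <stdio.h>

/* Stub standing in for the server-side call committed with the progress
 * infrastructure; the real one takes (int index, int64 value). */
static void
pgstat_progress_update_param(int index, int64_t val)
{
    printf("progress[%d] = %lld\n", index, (long long) val);
}

/* Hypothetical slot numbers for the six counters listed above; the real
 * names and positions would be defined by the patch itself. */
enum
{
    PROG_VACUUM_PHASE = 0,
    PROG_VACUUM_HEAP_BLKS_SCANNED,
    PROG_VACUUM_HEAP_BLKS_VACUUMED,
    PROG_VACUUM_INDEX_VACUUM_COUNT,
    PROG_VACUUM_NUM_DEAD_TUPLES,
    PROG_VACUUM_MAX_DEAD_TUPLES
};

int
main(void)
{
    /* e.g. after scanning one more heap block and collecting dead tuples */
    pgstat_progress_update_param(PROG_VACUUM_HEAP_BLKS_SCANNED, 1);
    pgstat_progress_update_param(PROG_VACUUM_NUM_DEAD_TUPLES, 10);
    return 0;
}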

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-14 Thread Thomas Munro
On Tue, Mar 15, 2016 at 6:58 AM, Robert Haas  wrote:
> On Sun, Mar 13, 2016 at 11:50 PM, Thomas Munro
>  wrote:
>> The last patches I posted don't apply today due to changes in master,
>> so here's a freshly merged patch series.
>
> +from the current synchronous stanbyindicates it has received the
>
> Uh, no.

Oops, thanks, fixed.  I'll wait for some more feedback or a conflict
with master before sending a new version.

> -SyncRepWaitForLSN(gxact->prepare_end_lsn);
> +{
> +/*
> + * Don't wait for the prepare to be applied remotely in remote_apply
> + * mode, just wait for it to be flushed to the WAL.  We will wait for
> + * apply when the transaction eventuallly commits or aborts.
> + */
> +if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
> +assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_FLUSH, NULL);
> +
> +SyncRepWaitForLSN(gxact->prepare_end_lsn);
> +
> +if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
> +assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_APPLY, NULL);
> +}
>
> What's with the extra block?

Yeah, that's silly, thanks.  Tidied up for the next version.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Obsolete comment in postgres_fdw.c

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 4:31 AM, Etsuro Fujita
 wrote:
>> It was left there intentionally to document all the conditions in one
>> place (some from the core and some from the FDW itself), for a ready
>> reference. In case tomorrow core thinks that matching user mapping is
>> not required, postgres_fdw would still require it to be incorporated.
>
> Thank you for the explanation!  I understand the reason, but that seems
> confusing to me.

Agreed.

>> In addition, I'd like to update some related comments in
>> src/include/nodes/relation.h and
>> src/backend/optimizer/path/joinpath.c.
>>
>>
>> Those look fine. Sorry for missing those in the commit and thanks for
>> providing a patch for the same.
>
> No problem.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-14 Thread David Steele

On 2/26/16 11:37 PM, Amit Kapila wrote:


On Sat, Feb 27, 2016 at 10:03 AM, Amit Kapila 

Re: [HACKERS] [PATCH] Use correct types and limits for PL/Perl SPI query results

2016-03-14 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> Tom Lane  writes:
>> I thought about using UV where feasible, but it was not clear to me
>> whether unsigned numbers behave semantically differently from signed ones
>> in Perl.  If they do, the change you suggest would create a small semantic
>> change in the behavior of code using spi_exec_query.  I'm not sure it's
>> worth taking any risk of that, considering we already discourage people
>> from using spi_exec_query for large results.

> IVs and UVs are semantically equivalent, and Perl will automatically
> convert between them (and NVs) as necessary, i.e. when crossing
> IV_MAX/UV_MAX/IV_MIN.

OK, thanks for the authoritative statement.

>> I don't much like having such hard-wired knowledge about Perl versions
>> in our code.  Is there a way to identify this change other than
>> #if PERL_BCDVERSION >= 0x5019004 ?

> There isn't, unfortunately.

Sigh ... it was worth asking anyway.

>>> 3) To be able to actually return such result sets from sp_exec_query(),
>>> I had to change the repalloc() in spi_printtup() to repalloc_huge().

>> Hmm, good point.  I wonder if there are any hazards to doing that.

> One hazard would be that people not heeding the warning in
> spi_exec_query will get a visit from the OOM killer (or death by swap)
> instead of a friendly "invalid memory alloc request size".

Yeah.  But there are plenty of other ways to drive a backend into swap
hell, and the whole point of this change is to eliminate arbitrary
boundaries on query result size.  So let's do it.

Thanks for the patch and discussion!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 14, 2016 at 1:37 PM, Tom Lane  wrote:
>> Yeah.  An alternative definition that would support that would be to
>> call the upper-path-providing callback for each FDW that's responsible
>> for any base relation of the query.  But I think that that would often
>> lead to a lot of redundant/wasted computation, and it's not clear to
>> me that we can support such cases without other changes as well.

> Sure, that's fine with me.  Are you going to go make these changes now?

Yeah, in a bit.

> Eventually, we might just support a configurable flag on FDWs where
> FDWs that want to do this sort of thing can request callbacks on every
> join and every upper rel in the query.  But that can wait.

That'd be a possibility, too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-03-14 Thread David Steele

Hi Abhijit,

On 3/1/16 8:36 AM, Jim Nasby wrote:

On 2/29/16 10:33 PM, Abhijit Menon-Sen wrote:

>Given the audience for this, I think it'd probably be OK to just
>provide a function that does this, instead of DDL.

That seems like a promising idea. Can you suggest some possible usage?


pg_extension_dependency( regextension, any )

where "any" would be all the other reg* types. That should be a lot less
work to code up than messing with the grammar.


So where are we on this now?  Were you going to implement this as a 
function the way Jim suggested?


Alexander, you are signed up to review.  Any opinion on which course is 
best?


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving replay of XLOG_BTREE_VACUUM records

2016-03-14 Thread Vladimir Borodin

> On 10 March 2016, at 14:38, Simon Riggs  wrote:
> 
> On 10 March 2016 at 09:22, Michael Paquier  wrote:
> On Thu, Mar 10, 2016 at 10:00 AM, Vladimir Borodin  wrote:
> > Let’s do immediately after you will send a new version of your patch? Or
> > even better after testing your patch? Don’t get me wrong, but rejecting my
> > patch without tangible work on your patch may lead to forgiving about the
> > problem before 9.6 freeze.
> 
> This makes sense. Let's not reject this patch yet if the alternative
> approach is not committed.
> 
> I attach 2 patches.
> 
> avoid_pin_scan_always.v1.patch 
> Takes the approach that we generate the same WAL records as in 9.5, we just 
> choose not to do anything with that information. This is possible because we 
> don't care anymore whether it is toast or other relations. So it effectively 
> reverts parts of the earlier patch.
> This could be easily back-patched more easily.
> 
> toast_recheck.v1.patch
> Adds recheck code for toast access. I'm not certain this is necessary, but 
> here it is. No problems found with it.

JFYI, I’m preparing the stand to reproduce the initial problem and I hope to 
finish testing this week.

> 
> -- 
> Simon Riggs   http://www.2ndQuadrant.com/
> 
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 


--
May the force be with you…
https://simply.name



Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-14 Thread David Steele

On 2/15/16 10:29 AM, Teodor Sigaev wrote:


It's a great pity, but the author is not able to continue work on this
patch, so I would like to raise this flag.

I'd like to add some comments about the patches:

The traversalValue patch adds an arbitrary value associated with a branch
in the SP-GiST tree walk. Unlike reconstructedValue it can be just a
pointer; it doesn't have to be a regular pgsql type. Also, it can be used
independently of reconstructedValue. This patch is used by both of the
following attached patches.

The range patch just switches the range opclasses from reconstructedValue
to traversalValue. reconstructedValue was used only because it was an
acceptable workaround in the case of range types. The range opclass stores
the full value in leaf tuples and doesn't need reconstructedValue to
return tuples in an index-only scan.
See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us

The q4d patch implements an index over boxes using SP-GiST. The basic idea
was the observation that the range opclass treats one-dimensional ranges
as 2D points. Following this idea, we can think of a 2D box (which is 2
points, or 4 numbers) as representing a 4D point. We hoped that this
approach would be much more effective than a traditional R-tree when a box
collection contains many overlaps.
Performance results are coming shortly.
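
A tiny standalone illustration of the mapping described above (the corner
coordinates of a 2D box read as one point in 4D space):

/* The corner coordinates of a 2D box, read as one point in 4D space. */
typedef struct { double xlo, ylo, xhi, yhi; } Box2D;
typedef struct { double coord[4]; } Point4D;

static Point4D
box_as_4d_point(Box2D b)
{
    Point4D p = {{ b.xlo, b.ylo, b.xhi, b.yhi }};

    return p;
}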


It appears that the issues raised in this thread have been addressed but 
the patch still has not gone through a real review.


Anybody out there willing to take a crack at a review?  All three 
patches apply (with whitespace issues).


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-14 Thread Robert Haas
On Sun, Mar 13, 2016 at 11:50 PM, Thomas Munro
 wrote:
> The last patches I posted don't apply today due to changes in master,
> so here's a freshly merged patch series.

+from the current synchronous stanbyindicates it has received the

Uh, no.

-SyncRepWaitForLSN(gxact->prepare_end_lsn);
+{
+/*
+ * Don't wait for the prepare to be applied remotely in remote_apply
+ * mode, just wait for it to be flushed to the WAL.  We will wait for
+ * apply when the transaction eventuallly commits or aborts.
+ */
+if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
+assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_FLUSH, NULL);
+
+SyncRepWaitForLSN(gxact->prepare_end_lsn);
+
+if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
+assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_APPLY, NULL);
+}

What's with the extra block?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-14 Thread Robert Haas
On Sun, Mar 13, 2016 at 9:53 PM, Kouhei Kaigai  wrote:
> OK, I split the previous small patch into two tiny patches.
> The one is bugfix around max length of the extnodename.
> The other replaces Assert() by ereport() according to the upthread discussion.

Committed, except that (1) I replaced ereport() with elog(), because I
can't see making translators care about this message; and (2) I
reworded the error message a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-03-14 Thread David Steele

On 2/24/16 12:40 AM, Michael Paquier wrote:


This has the merit of being clear, thanks for the input. Whatever the
approach taken in the end, we have two candidates:
- Extend XLogInsert() with an extra argument for flags (Andres)
- Introduce XLogInsertExtended() with this extra argument and leave
XLogInsert() in peace (Robert and I).
Actually, I lied, there was still something I could do for this
thread: attached are two patches implementing the two approaches as
a-1 and a-2 respectively. Patch b is the set of logs I used for the
tests, to show with a low checkpoint_timeout that checkpoints are
correctly skipped on an idle system.


Unfortunately neither A nor B apply anymore.

However, since the patches can still be read through, I wonder if Robert
or Andres would care to opine on whether A or B is better now that they
can see the full implementation.


For my 2c I'm happy with XLogInsertExtended() since it seems to be a 
rare use case where flags are required.  This can always be refactored 
in the future if/when the use of flags spreads.


I think it would be good to make a decision on this before asking 
Michael to rebase.


Thanks,
--
-David
da...@pgmasters.net




Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 1:37 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 14, 2016 at 1:04 PM, Tom Lane  wrote:
>>> It would be better if we invent an FDW callback that's meant to be
>>> invoked at this stage, but only call it for FDW(s) actively involved
>>> in the query.  I'm not sure exactly what that ought to look like though.
>>> Maybe only call the FDW identified as possible owner of the topmost
>>> scan/join relation?
>
>> I think in the short term that's as well as we're going to do, so +1.
>> In the long run, I'm interested in making FDWs be able to optimize
>> queries like foreigntab JOIN localtab ON foreigntab.x = localtab.x
>> (e.g. by copying localtab to the remote side when it's small); but
>> that will require revisiting some old decisions, too.
>
> Yeah.  An alternative definition that would support that would be to
> call the upper-path-providing callback for each FDW that's responsible
> for any base relation of the query.  But I think that that would often
> lead to a lot of redundant/wasted computation, and it's not clear to
> me that we can support such cases without other changes as well.

Sure, that's fine with me.  Are you going to go make these changes now?

Eventually, we might just support a configurable flag on FDWs where
FDWs that want to do this sort of thing can request callbacks on every
join and every upper rel in the query.  But that can wait.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-14 Thread Robert Haas
On Fri, Mar 11, 2016 at 7:45 PM, Peter Geoghegan  wrote:
> On Fri, Mar 11, 2016 at 4:30 PM, Jim Nasby  wrote:
>> Right, but you still have the option to enable them if you don't want to
>> swamp your IO system. That's why CIC obeys it too. If I was running a
>> consistency check on a production system I'd certainly want the option to
>> throttle it. Without that option, I don't see running this on production
>> systems as being an option. If that's not a goal then fine, but if it is a
>> goal I think it needs to be there.
>>
>> Isn't it just a few extra lines of code to support it?
>
> I see your point.
>
> I'll add that if people like the interface you propose. (Overloading
> the VACUUM cost delay functions to cause a delay for amcheck
> functions, too). Note that the functions already use an appropriate
> buffer access strategy (it avoids blowing shared_buffers, much like
> VACUUM itself).

I don't particularly like that interface.  I also suggest that it
would be better to leave throttling to a future commit, and focus on
getting the basic feature in first.
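
For reference, the interface Jim is describing would simply reuse the
existing cost-based delay settings, roughly like this (bt_index_check() is
the function name from the amcheck proposal, the index name is made up, and
whether amcheck would honor these settings is exactly the open question):

set vacuum_cost_delay = 20;     -- milliseconds
set vacuum_cost_limit = 200;
select bt_index_check('pgbench_accounts_pkey'::regclass);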

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 14, 2016 at 1:04 PM, Tom Lane  wrote:
>> It would be better if we invent an FDW callback that's meant to be
>> invoked at this stage, but only call it for FDW(s) actively involved
>> in the query.  I'm not sure exactly what that ought to look like though.
>> Maybe only call the FDW identified as possible owner of the topmost
>> scan/join relation?

> I think in the short term that's as well as we're going to do, so +1.
> In the long run, I'm interested in making FDWs be able to optimize
> queries like foreigntab JOIN localtab ON foreigntab.x = localtab.x
> (e.g. by copying localtab to the remote side when it's small); but
> that will require revisiting some old decisions, too.

Yeah.  An alternative definition that would support that would be to
call the upper-path-providing callback for each FDW that's responsible
for any base relation of the query.  But I think that that would often
lead to a lot of redundant/wasted computation, and it's not clear to
me that we can support such cases without other changes as well.

regards, tom lane




Re: [HACKERS] [PATCH] Use correct types and limits for PL/Perl SPI query results

2016-03-14 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> 1) Perl's integers are at least pointer-sized and either signed or
>>unsigned, so can potentially hold up to 2⁶⁴-1. Floating point numbers
>>can also be larger than double (up to 128bit), allowing for exact
>>representation of integers beyond 2⁵³.  Hence, adjust the setting of
>>the "processed" hash item to use UV_MAX for the limit and (NV) or
>>(UV) for the casts.
>
> I thought about using UV where feasible, but it was not clear to me
> whether unsigned numbers behave semantically differently from signed ones
> in Perl.  If they do, the change you suggest would create a small semantic
> change in the behavior of code using spi_exec_query.  I'm not sure it's
> worth taking any risk of that, considering we already discourage people
> from using spi_exec_query for large results.

IVs and UVs are semantically equivalent, and Perl will automatically
convert between them (and NVs) as necessary, i.e. when crossing
IV_MAX/UV_MAX/IV_MIN.

>> 2) Perl 5.20 and later use SSize_t for array indices, so can cope with
>>up to SSize_t_max items in an array (if you have the memory).
>
> I don't much like having such hard-wired knowledge about Perl versions
> in our code.  Is there a way to identify this change other than
> #if PERL_BCDVERSION >= 0x5019004 ?

There isn't, unfortunately.  I could add e.g. AVidx_t and _MAX defines,
but that wouldn't be available until 5.26 (May 2017) at the earliest,
since full code freeze for May's 5.24 is next week.

>> 3) To be able to actually return such result sets from sp_exec_query(),
>>I had to change the repalloc() in spi_printtup() to repalloc_huge().
>
> Hmm, good point.  I wonder if there are any hazards to doing that.

One hazard would be that people not heeding the warning in
spi_exec_query will get a visit from the OOM killer (or death by swap)
instead of a friendly "invalid memory alloc request size".
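
As a reminder of the interface under discussion, a minimal PL/Perl function
using spi_exec_query and the "processed" item might look like this (purely
illustrative):

create extension if not exists plperl;
create or replace function spi_demo() returns bigint as $$
    my $rv = spi_exec_query('select g from generate_series(1, 1000) as g');
    # $rv->{processed} is the row count whose type and limit are at issue here
    return $rv->{processed};
$$ language plperl;

select spi_demo();   -- 1000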

>   regards, tom lane

ilmari

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl




Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 1:04 PM, Tom Lane  wrote:
> Petr Jelinek  writes:
>> On 14/03/16 02:43, Kouhei Kaigai wrote:
>>> Even though I couldn't check the new planner implementation entirely,
>>> it seems the points below are good candidates for injecting CustomPath
>>> (and potentially ForeignScan).
>>>
>>> - create_grouping_paths
>>> - create_window_paths
>>> - create_distinct_paths
>>> - create_ordered_paths
>>> - just below create_modifytable_path
>>> (may be valuable for foreign-update pushdown)
>
>> To me that seems too low inside the planning tree, perhaps adding it
>> just to the subquery_planner before SS_identify_outer_params would be
>> better; that's the place where you see the path for the whole (sub)query,
>> so you can search and modify what you need from there.
>
> I don't like either of those too much.  The main thing I've noticed over
> the past few days is that you can't readily generate custom upper-level
> Paths unless you know what PathTarget grouping_planner is expecting each
> level to produce.  So what I was toying with doing is (1) having
> grouping_planner put all those targets into the PlannerInfo, perhaps
> in an array indexed by UpperRelationKind; and (2) adding a hook call
> immediately after those targets are computed, say right before
> the create_grouping_paths() call (approximately planner.c:1738
> in HEAD).  It should be sufficient to have one hook there since
> you can inject Paths into any of the upper relations at that point;
> moreover, that's late enough that you shouldn't have to recompute
> anything you figured out during scan/join planning.

That is a nice set of characteristics for a hook.

> Now, a simple hook function is probably about the best we can offer
> CustomScan providers, since we don't really know what they're going
> to do.  But I'm pretty unenthused about telling FDW authors to use
> such a hook, because generally speaking they're simply going to waste
> cycles finding out that they aren't involved in a given query.
> It would be better if we invent an FDW callback that's meant to be
> invoked at this stage, but only call it for FDW(s) actively involved
> in the query.  I'm not sure exactly what that ought to look like though.
> Maybe only call the FDW identified as possible owner of the topmost
> scan/join relation?

I think in the short term that's as well as we're going to do, so +1.
In the long run, I'm interested in making FDWs be able to optimize
queries like foreigntab JOIN localtab ON foreigntab.x = localtab.x
(e.g. by copying localtab to the remote side when it's small); but
that will require revisiting some old decisions, too.  What you're
proposing here sounds consistent with what we've done so far.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-14 Thread Robert Haas
On Mon, Mar 14, 2016 at 1:05 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Mar 13, 2016 at 10:51 PM, Tom Lane  wrote:
>>> I'm not really sold on enforcing that people create meaningless user
>>> mappings.  For one thing, they're likely to be sloppy about it, which
>>> could lead to latent security problems if the FDW later acquires a
>>> concept that user mappings mean something.
>
>> I think we should just fix build_simple_rel() so that it doesn't fail
>> if there is no user mapping.  It can just set rel->umid to InvalidOid
>> in that case, and if necessary we can adjust the code elsewhere to
>> tolerate that.  This wasn't an intentional behavior change, and I
>> think we should just put things back to the way they were.
>
> So, allow rel->umid to be InvalidOid if there's no user mapping, and
> when dealing with a join, allow combining when both sides have InvalidOid?

Exactly.  And we should make sure (possibly with a regression test)
that postgres_fdw handles that case correctly - i.e. with the right
error.
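
For what it's worth, the case in question looks like this in SQL (server and
table names are made up; the point is the absence of a user mapping):

create extension postgres_fdw;
create server loopback foreign data wrapper postgres_fdw
    options (host 'localhost', dbname 'postgres');
create foreign table ft_demo (id int) server loopback
    options (table_name 'demo');
-- note: no create user mapping here.  With the previous behavior restored,
-- planning this query should not fail in build_simple_rel(); postgres_fdw
-- should instead complain about the missing user mapping when it connects.
select * from ft_demo;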

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-03-14 Thread Daniel Verite
Jim Nasby wrote:

> Ultimately I'd really like some way to remove/reduce the restriction of
> result set definitions needing to be determined at plan time. That would
> open the door for server-side crosstab/pivot as well as a host of other
> things (such as dynamically turning a hstore/json/xml field into a
> recordset).


That would go against a basic expectation of prepared statements, the
fact that queries can be parsed/prepared without any part of them
being executed.

For a dynamic pivot, but probably also for the other examples you
have in mind, the SQL engine wouldn't be able to determine the output
columns without executing at least a subselect to look inside some
table(s).

I suspect that the implications of this would be so far reaching and
problematic that it will just not happen.

It seems to me that a dynamic pivot will always consist of
two SQL queries that can never be combined into one,
unless using a workaround à la Oracle, which encapsulates the
entire dynamic resultset into an XML blob as output.
The problem there is that the client-side tools
that people routinely use are not equipped to process it anyway;
at least that's what I find from anecdotal evidence, for instance in:
https://community.oracle.com/thread/2133154?tstart=0
 or
http://stackoverflow.com/questions/19298424
 or
https://community.oracle.com/thread/2388982?tstart=0
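
To illustrate the two-step nature concretely with the psql feature proposed
in this thread (the sales table is made up; the meta-command behaves as in
the patch under review):

select extract(year from sale_date)::int    as year,
       extract(quarter from sale_date)::int as quarter,
       sum(amount)                          as total
from sales
group by 1, 2
\crosstabview

The server returns an ordinary three-column result; psql alone decides,
after seeing the data, that the distinct quarter values become output
columns.  A single server-side statement cannot promise that column list at
prepare time, which is the point above.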


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-14 Thread Tom Lane
Robert Haas  writes:
> On Sun, Mar 13, 2016 at 10:51 PM, Tom Lane  wrote:
>> I'm not really sold on enforcing that people create meaningless user
>> mappings.  For one thing, they're likely to be sloppy about it, which
>> could lead to latent security problems if the FDW later acquires a
>> concept that user mappings mean something.

> I think we should just fix build_simple_rel() so that it doesn't fail
> if there is no user mapping.  It can just set rel->umid to InvalidOid
> in that case, and if necessary we can adjust the code elsewhere to
> tolerate that.  This wasn't an intentional behavior change, and I
> think we should just put things back to the way they were.

So, allow rel->umid to be InvalidOid if there's no user mapping, and
when dealing with a join, allow combining when both sides have InvalidOid?

regards, tom lane




Re: [HACKERS] WIP: Upper planner pathification

2016-03-14 Thread Tom Lane
Petr Jelinek  writes:
> On 14/03/16 02:43, Kouhei Kaigai wrote:
>> Even though I couldn't check the new planner implementation entirely,
>> it seems the points below are good candidates for injecting CustomPath
>> (and potentially ForeignScan).
>> 
>> - create_grouping_paths
>> - create_window_paths
>> - create_distinct_paths
>> - create_ordered_paths
>> - just below create_modifytable_path
>> (may be valuable for foreign-update pushdown)

> To me that seems too low inside the planning tree, perhaps adding it
> just to the subquery_planner before SS_identify_outer_params would be
> better; that's the place where you see the path for the whole (sub)query,
> so you can search and modify what you need from there.

I don't like either of those too much.  The main thing I've noticed over
the past few days is that you can't readily generate custom upper-level
Paths unless you know what PathTarget grouping_planner is expecting each
level to produce.  So what I was toying with doing is (1) having
grouping_planner put all those targets into the PlannerInfo, perhaps
in an array indexed by UpperRelationKind; and (2) adding a hook call
immediately after those targets are computed, say right before
the create_grouping_paths() call (approximately planner.c:1738
in HEAD).  It should be sufficient to have one hook there since
you can inject Paths into any of the upper relations at that point;
moreover, that's late enough that you shouldn't have to recompute
anything you figured out during scan/join planning.

Now, a simple hook function is probably about the best we can offer
CustomScan providers, since we don't really know what they're going
to do.  But I'm pretty unenthused about telling FDW authors to use
such a hook, because generally speaking they're simply going to waste
cycles finding out that they aren't involved in a given query.
It would be better if we invent an FDW callback that's meant to be
invoked at this stage, but only call it for FDW(s) actively involved
in the query.  I'm not sure exactly what that ought to look like though.
Maybe only call the FDW identified as possible owner of the topmost
scan/join relation?
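
The sort of query that motivates such a callback, for concreteness (the
foreign table is made up and assumed to be managed by postgres_fdw):

select customer_id, sum(amount) as total
from orders_remote
group by customer_id
order by total desc
limit 10;
-- Today the aggregate, sort, and limit all run locally on top of a full
-- remote scan; a callback at the upper-planning stage would let the FDW
-- offer a path that ships the whole computation to the remote server.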

regards, tom lane




Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-14 Thread Kevin Grittner
On Sat, Mar 12, 2016 at 11:21 AM, Robert Haas  wrote:

> I'd also be interested in hearing Kevin Grittner's thoughts about
> serializability in a distributed environment, since he's obviously
> thought about the topic of serializability quite a bit.

I haven't done a thorough search of the academic literature on
this, and I wouldn't be comfortable taking a really solid position
without that; but in thinking about it it seems like there are at
least three distinct problems which may have distinct solutions.

*Physical replication* may be best handled by leveraging the "safe
snapshot" idea already implemented in READ ONLY DEFERRABLE
transactions, and passing through information in the WAL stream to
allow the receiver to identify points where a snapshot can be taken
which cannot see an anomaly.  There should probably be an option to
use the last known safe snapshot or wait for a point in the stream
where one next appears.  This might take as little as a bit or two
per WAL commit record.  It's not clear what the processing overhead
would be -- it wouldn't surprise me if it was "in the noise", nor
would it surprise me if it wasn't.  We would need some careful
benchmarking, and, if performance was an issue, a GUC to control
whether the information was passed along (and, thus, whether
SERIALIZABLE transactions were allowed on the replica).
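
For reference, the existing single-node form of that mechanism (the table
name is illustrative):

begin transaction isolation level serializable read only deferrable;
-- When the first query takes its snapshot, the transaction waits, if
-- necessary, until it can get one guaranteed not to see an anomaly; after
-- that it runs without taking SIRead locks.
select count(*) from accounts;
commit;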

*Logical replication* (considered for the moment in a
unidirectional context) might best be handled by some reordering of
application of the commits on the replica into "apparent order of
execution" -- which is pretty well defined on the primary based on
commit order adjusted by read-write dependencies.  Basically, the
"simple" implementation would be that WAL is applied normally
unless you receive a commit record which is flagged in some way to
indicate that it is for a serializable transaction which wrote data
and at the time of commit was concurrent with at least one other
serializable transaction which had not completed and was not READ
ONLY.  Such a commit would await information in the WAL stream to
tell it when all such concurrent transactions completed, and would
indicate when such a transaction had a read-write dependency *in*
to the transaction with the suspended commit; commits for any such
transactions must be moved ahead of the suspended commit.  This
would allow logical replication, with all the filtering and such,
to avoid ever showing a state on the replica which contained
serialization anomalies.

*Logical replication with cycles* (where there is some path for
cluster A to replicate to cluster B, and some other path for
cluster B to replicate the same or related data to cluster A) has a
few options.  You could opt for "eventual consistency" --
essentially giving up on the I in ACID and managing the anomalies.
In practice this seems to lead to some form of S2PL at the
application coding level, which is very bad for performance and
concurrency, so I tend to think it should not be the only option.
Built-in S2PL would probably perform better than having it pasted
on at the application layer through some locking API, but for most
workloads is still inferior to SSI in both concurrency and
performance.  Unless a search of the literature turns up some new
alternative, I'm inclined to think that if you want to distribute a
"logical" database over multiple clusters and still manage race
conditions through use of SERIALIZABLE transactions, a distributed
SSI implementation may be the best bet.  That requires the
transaction manager (or something like it) to track non-blocking
predicate "locks" (what the implementation calls a SIReadLock)
across the whole environment, as well as tracking rw-conflicts (our
short name for read-write dependencies) across the whole
environment.  Since SSI also looks at the MVCC state, handling
checks of that without falling victim to race conditions would also
need to be handled somehow.

If I remember correctly, the patches to add the SSI implementation
of SERIALIZABLE transactions were about ten times the size of the
patches to remove S2PL and initially replace it with MVCC.  I don't
have even a gut feel as to how much bigger the distributed form is
likely to be.  On the one hand the *fundamental logic* is all there
and should not need to change; on the other hand the *mechanism*
for acquiring the data to be *used* in that logic would be
different and potentially complex.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_get_progress_info(NULL) blows up

2016-03-14 Thread Tom Lane
Thomas Munro  writes:
> I guess pg_stat_get_progress_info should either be strict (see
> attached) or check for NULL.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-03-14 Thread Alvaro Herrera
Jim Nasby wrote:
> On 3/13/16 12:48 AM, Pavel Stehule wrote:
> >crosstabview is really a visualization tool. **But right now, there is no
> >other tool available from the terminal.** So this can be a significant help
> >to all the people who would like to use this functionality.
> 
> Not just the terminal either. Offhand I'm not aware of *any* fairly simple
> tool that provides crosstab. There's a bunch of complicated/expensive BI
> tools that do, but unless you've gone through the trouble of getting one of
> those set up, you're currently pretty stuck.

I'm definitely +1 for this feature in psql also.

Some years ago we had a discussion about splitting psql in two parts, a
bare-bones one which would help script-writing and another one with
fancy features; we decided to keep one tool to rule them all and made
the implicit decision that we would grow exotic, sophisticated features
into psql.  ISTM that this patch is going in that direction.

> Ultimately I'd really like some way to remove/reduce the restriction of
> result set definitions needing to be determined at plan time. That would
> open the door for server-side crosstab/pivot as well as a host of other
> things (such as dynamically turning a hstore/json/xml field into a
> recordset).

That seems so far down the road that I don't think it should block the
psql feature being proposed in this thread, but yes I would like that
one too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



