[HACKERS] [sqlsmith] Planner crash on foreign table join

2017-04-07 Thread Andreas Seltenreich
Hi,

testing master at f0e44021df with a loopback postgres_fdw installed, I
see lots of crashes on queries joining foreign tables with various
expressions.  Below is a reduced recipe for the regression database and
a backtrace.

regards,
Andreas

--8<---cut here---start->8---
create extension postgres_fdw;
create server myself foreign data wrapper postgres_fdw;
create schema fdw_postgres;
create user fdw login;
grant all on schema public to fdw;
grant all on all tables in schema public to fdw;
create user mapping for public server myself options (user 'fdw');
import foreign schema public from server myself into fdw_postgres;

explain select from
  fdw_postgres.hslot
left join fdw_postgres.num_exp_div
on ((exists (values (1))) and (values (1)) is null);
--8<---cut here---end--->8---

Program terminated with signal SIGSEGV, Segmentation fault.
#0  bms_get_singleton_member (a=0x10, member=member@entry=0x7fffb577cafc) at 
bitmapset.c:577
#1  0x56425107b531 in find_relation_from_clauses (clauses=0x564251a68570, 
root=0x564251a273d8) at clausesel.c:445
#2  clauselist_selectivity (root=root@entry=0x564251a273d8, 
clauses=0x564251a68570, varRelid=varRelid@entry=0, jointype=JOIN_LEFT, 
sjinfo=0x564251a661c0) at clausesel.c:128
#3  0x7f61d3d9f22f in postgresGetForeignJoinPaths (root=<optimized out>, 
joinrel=0x564251a66ba8, outerrel=<optimized out>, innerrel=<optimized out>, 
jointype=<optimized out>, extra=0x7fffb577cc50) at postgres_fdw.c:4466
#4  0x56425108a238 in add_paths_to_joinrel (root=root@entry=0x564251a273d8, 
joinrel=joinrel@entry=0x564251a66ba8, outerrel=outerrel@entry=0x564251a65378, 
innerrel=innerrel@entry=0x564251a65f30, jointype=jointype@entry=JOIN_LEFT, 
sjinfo=sjinfo@entry=0x564251a661c0, restrictlist=0x564251a681c8) at 
joinpath.c:278
#5  0x56425108bff2 in populate_joinrel_with_paths (restrictlist=<optimized out>, sjinfo=0x564251a661c0, joinrel=0x564251a66ba8, rel2=0x564251a65f30, 
rel1=0x564251a65378, root=0x564251a273d8) at joinrels.c:795
#6  make_join_rel (root=root@entry=0x564251a273d8, 
rel1=rel1@entry=0x564251a65378, rel2=rel2@entry=0x564251a65f30) at 
joinrels.c:731
#7  0x56425108c7ef in make_rels_by_clause_joins (other_rels=<optimized out>, old_rel=<optimized out>, root=<optimized out>) at joinrels.c:277
#8  join_search_one_level (root=root@entry=0x564251a273d8, level=level@entry=2) 
at joinrels.c:99
#9  0x564251079bdb in standard_join_search (root=0x564251a273d8, 
levels_needed=2, initial_rels=<optimized out>) at allpaths.c:2385
#10 0x56425107ac7b in make_one_rel (root=root@entry=0x564251a273d8, 
joinlist=joinlist@entry=0x564251a65998) at allpaths.c:184
#11 0x564251099ef4 in query_planner (root=root@entry=0x564251a273d8, 
tlist=tlist@entry=0x0, qp_callback=qp_callback@entry=0x56425109aeb0 
, qp_extra=qp_extra@entry=0x7fffb577cff0) at 
planmain.c:253
#12 0x56425109dbc2 in grouping_planner (root=root@entry=0x564251a273d8, 
inheritance_update=inheritance_update@entry=0 '\000', tuple_fraction=<optimized out>, tuple_fraction@entry=0) at planner.c:1684
#13 0x5642510a0133 in subquery_planner (glob=glob@entry=0x564251a2a6d0, 
parse=parse@entry=0x5642519aac60, parent_root=parent_root@entry=0x0, 
hasRecursion=hasRecursion@entry=0 '\000', 
tuple_fraction=tuple_fraction@entry=0) at planner.c:833
#14 0x5642510a0f71 in standard_planner (parse=0x5642519aac60, 
cursorOptions=256, boundParams=0x0) at planner.c:333
#15 0x5642511458cd in pg_plan_query 
(querytree=querytree@entry=0x5642519aac60, cursorOptions=256, 
boundParams=boundParams@entry=0x0) at postgres.c:802
#16 0x564250fa9a40 in ExplainOneQuery (query=0x5642519aac60, 
cursorOptions=<optimized out>, into=0x0, es=0x564251a513a0, 
queryString=0x564251a09590 "explain select from\n  fdw_postgres.hslot\nleft 
join fdw_postgres.num_exp_div\non ((exists (values (1))) and (values (1)) 
is null);", params=0x0, queryEnv=0x0) at explain.c:367
#17 0x564250faa005 in ExplainQuery (pstate=pstate@entry=0x564251a511f0, 
stmt=stmt@entry=0x564251a0aa58, queryString=queryString@entry=0x564251a09590 
"explain select from\n  fdw_postgres.hslot\nleft join 
fdw_postgres.num_exp_div\non ((exists (values (1))) and (values (1)) is 
null);", params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, 
dest=dest@entry=0x564251a51308) at explain.c:256
#18 0x56425114b9cb in standard_ProcessUtility (pstmt=0x564251a0b2e0, 
queryString=0x564251a09590 "explain select from\n  fdw_postgres.hslot\nleft 
join fdw_postgres.num_exp_div\non ((exists (values (1))) and (values (1)) 
is null);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x564251a51308, completionTag=0x7fffb577d390 "") at utility.c:680
#19 0x5642511487b4 in PortalRunUtility (portal=0x564251a07580, 
pstmt=0x564251a0b2e0, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=<optimized out>, completionTag=0x7fffb577d390 "") at pquery.c:1179
#20 0x564251149633 in FillPortalStore (portal=portal@entry=0x564251a07580, 
isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1039
#21 0x56425114a21d in PortalRun (portal=portal@entry=0x564251a0

Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Masahiko Sawada
On Sat, Apr 8, 2017 at 1:09 PM, Masahiko Sawada  wrote:
> On Sat, Apr 8, 2017 at 3:37 AM, Andres Freund  wrote:
>> Hi,
>>
>> When I started writing this, there were the following remaining CF
>> items, minus bugfix ones which aren't bound by the code freeze.
>>
>> I think it makes sense to go through those and see whether it's
>> realistic to commit any of them.
>>
>> Ready for Committer:
>>
>> Add GUCs for predicate lock promotion thresholds:
>> - claimed by Kevin, should be easy enough
>>
>> initdb configurable wal_segment_size
>> - parts have been committed
>> - significantly derailed by segment naming discussion
>> - possibly committable if we decide to skip the naming bit? But also a
>>   bit late given that it touches some quite sensitive code.
>>
>> Create fdw_outerpath for foreign
>> - haven't really followed discussion
>> - only marked as ready-for-committer 2017-04-04
>>
>> Vacuum: allow usage of more than 1GB of work mem
>> - hm, maybe?  Will take a look.
>>
>> Unique Joins
>> - Tom's discussing things with David, not sure.
>>
>> Push down more UPDATEs/DELETEs in postgres_fdw
>> - claimed by Robert?
>>
>> postgres_fdw: support parameterized foreign joins
>> - think that depends on fdw_outerpath?
>>
>>
>> Waiting on Author:
>>
>> SQL statements statistics counter view (pg_stat_sql)
>> - the code doesn't look quite ready
>> - don't think we quite have design agreement, e.g. I don't like where it's
>>   hooked into query execution
>>
>> Write Amplification Reduction Method (WARM)
>> - fair number of people don't think it's ready for v10.
>> - can't move to next fest because it's waiting-on-author, which doesn't
>>   allow that.  Doesn't strike me as a useful restriction.
>>
>> BRIN optimize memory allocation
>> - I think Alvaro has indicated that he wants to take care of that?
>>
>> Indexes with Included Columns (was Covering + unique indexes)
>> - Don't think concerns about #columns on truncated tuples have been
>>   addressed.  Should imo be returned-with-feedback.
>>
>>
>> Needs-Review:
>>
>> Better estimate merging for duplicate vars in clausesel.c
>> - has been submitted pretty late (2017-02-24) and discussion is ongoing
>> - I'm inclined to punt on this one to the next release, previous
>>   proposal along that line got some pushback
>>
>> new plpgsql extra_checks
>> - Winner of the "most opaque CF title" award
>> - hasn't received a whole lot of review
>> - don't think we're even close to having design agreement
>>
>> Generic type subscripting
>> - still some review back and forth
>> - probably should be punted
>>
>>
>> Any comments?
>>
>
> Hi,
>
> Could you consider the item 2PC on FDW as well? It was marked as "Move
> to Next CF" early yesterday, but I'm not sure of the reason.
>

Oops, I meant "Transactions involving multiple postgres foreign servers"[1].

[1] https://commitfest.postgresql.org/13/928/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Tom Lane
Thomas Munro  writes:
> Here is an attempt at option 2 from the menu I posted above.  Questions:

> 1.  Does anyone object to this extension of pg_blocking_pids()'s
> remit?  If so, I could make it a separate function (that was option
> 3).

It seems an entirely principle-free change in the function's definition.

I'm not actually clear on why Kevin wanted this change in
isolationtester's wait behavior anyway, so maybe some clarification
on that would be a good idea.  But if we need it, I think probably
a dedicated function would be a good thing.  We want the wait-checking
query to be as trivial as possible at the SQL level, so whatever
semantic oddities it needs to have should be pushed into C code.

regards, tom lane


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Masahiko Sawada
On Sat, Apr 8, 2017 at 3:37 AM, Andres Freund  wrote:
> Hi,
>
> When I started writing this, there were the following remaining CF
> items, minus bugfix ones which aren't bound by the code freeze.
>
> I think it makes sense to go through those and see whether it's
> realistic to commit any of them.
>
> Ready for Committer:
>
> Add GUCs for predicate lock promotion thresholds:
> - claimed by Kevin, should be easy enough
>
> initdb configurable wal_segment_size
> - parts have been committed
> - significantly derailed by segment naming discussion
> - possibly committable if we decide to skip the naming bit? But also a
>   bit late given that it touches some quite sensitive code.
>
> Create fdw_outerpath for foreign
> - haven't really followed discussion
> - only marked as ready-for-committer 2017-04-04
>
> Vacuum: allow usage of more than 1GB of work mem
> - hm, maybe?  Will take a look.
>
> Unique Joins
> - Tom's discussing things with David, not sure.
>
> Push down more UPDATEs/DELETEs in postgres_fdw
> - claimed by Robert?
>
> postgres_fdw: support parameterized foreign joins
> - think that depends on fdw_outerpath?
>
>
> Waiting on Author:
>
> SQL statements statistics counter view (pg_stat_sql)
> - the code doesn't look quite ready
> - don't think we quite have design agreement, e.g. I don't like where it's
>   hooked into query execution
>
> Write Amplification Reduction Method (WARM)
> - fair number of people don't think it's ready for v10.
> - can't move to next fest because it's waiting-on-author, which doesn't
>   allow that.  Doesn't strike me as a useful restriction.
>
> BRIN optimize memory allocation
> - I think Alvaro has indicated that he wants to take care of that?
>
> Indexes with Included Columns (was Covering + unique indexes)
> - Don't think concerns about #columns on truncated tuples have been
>   addressed.  Should imo be returned-with-feedback.
>
>
> Needs-Review:
>
> Better estimate merging for duplicate vars in clausesel.c
> - has been submitted pretty late (2017-02-24) and discussion is ongoing
> - I'm inclined to punt on this one to the next release, previous
>   proposal along that line got some pushback
>
> new plpgsql extra_checks
> - Winner of the "most opaque CF title" award
> - hasn't received a whole lot of review
> - don't think we're even close to having design agreement
>
> Generic type subscripting
> - still some review back and forth
> - probably should be punted
>
>
> Any comments?
>

Hi,

Could you consider the item 2PC on FDW as well? It was marked as "Move
to Next CF" early yesterday, but I'm not sure of the reason.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 9:24 PM, Thomas Munro
 wrote:

> 2.  Did I understand correctly that it is safe to scan the list of
> SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while
> holding only SerializableXactHashLock,

Yes.

> and that 'inLink' is the correct link to be following?

If you're starting from the blocked (read-only) transaction (which
you are), inLink is the one to follow.

Note: It would be better form to use the SxactIsDeferrableWaiting()
macro than to repeat the bit-testing code directly in your function.
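
For illustration, a sketch of the suggested style, as it might appear in
predicate.c (where SERIALIZABLEXACT and the SxactIsDeferrableWaiting()
macro are available; the helper name here is hypothetical):

static bool
WaitsForSafeSnapshot(SERIALIZABLEXACT *sxact)
{
    /* better form: reuse the existing macro ... */
    return SxactIsDeferrableWaiting(sxact);

    /*
     * ... rather than open-coding the bit test it wraps:
     *
     *     return (sxact->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0;
     */
}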

--
Kevin Grittner


Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2017-04-07 Thread Kevin Grittner
On Mon, Feb 27, 2017 at 7:35 AM, Dagfinn Ilmari Mannsåker
 wrote:
> Kevin Grittner  writes:
>
>> It occurred to me that it would make sense to allow these settings
>> to be attached to a database or table (though *not* a role).  I'll
>> look at that when I get back if you don't get to it first.
>
> Attached is a draft patch (no docs or tests) on top of the previous one,
> adding reloptions for the per-relation and per-page thresholds.  That
> made me realise that the corresponding GUCs don't need to be PGC_SIGHUP,
> but could be PGC_SUSET or PGC_USERSET.  I'll adjust the original patch
if you agree.  I have not got around to adding per-database
> versions of the setting, but could have a stab at that.

Unfortunately, I was unable to get the follow-on patch to allow
setting by relation into a shape I liked.  Let's see what we can do
for the next release.  The first patch was applied with only very
minor tweaks.  I realize that nothing would break if individual
users could set their granularity thresholds on individual
connections, but as someone who has filled the role of DBA, the
thought kinda made my skin crawl.  I left it as PGC_SIGHUP for now;
we can talk about loosening that up later, but I think we should
have one or more use-cases that outweigh the opportunities for
confusion and bad choices by individual programmers to justify that.

--
Kevin Grittner


Re: [HACKERS] valgrind errors around dsa.c

2017-04-07 Thread Thomas Munro
On Sat, Apr 8, 2017 at 8:57 AM, Thomas Munro
 wrote:
> On Sat, Apr 8, 2017 at 4:49 AM, Andres Freund  wrote:
>> Hi,
>>
>> newly added tests exercise parallel bitmap scans.  And they trigger
>> valgrind errors:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-07%2007%3A10%3A01
>>
>>
>> ==4567== VALGRINDERROR-BEGIN
>> ==4567== Conditional jump or move depends on uninitialised value(s)
>> ==4567==at 0x5FD62A: check_for_freed_segments (dsa.c:2219)
>> ==4567==by 0x5FD97E: dsa_get_address (dsa.c:934)
>> ==4567==by 0x5FDA2A: init_span (dsa.c:1339)
>> ==4567==by 0x5FE6D1: ensure_active_superblock (dsa.c:1696)
>> ==4567==by 0x5FEBBD: alloc_object (dsa.c:1452)
>> ==4567==by 0x5FEBBD: dsa_allocate_extended (dsa.c:693)
>> ==4567==by 0x3C7A83: pagetable_allocate (tidbitmap.c:1536)
>> ==4567==by 0x3C7A83: pagetable_create (simplehash.h:342)
>> ==4567==by 0x3C7A83: tbm_create_pagetable (tidbitmap.c:323)
>> ==4567==by 0x3C8DAD: tbm_get_pageentry (tidbitmap.c:1246)
>> ==4567==by 0x3C98A1: tbm_add_tuples (tidbitmap.c:432)
>> ==4567==by 0x22510C: btgetbitmap (nbtree.c:460)
>> ==4567==by 0x21A8D1: index_getbitmap (indexam.c:726)
>> ==4567==by 0x38AD48: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:91)
>> ==4567==by 0x37D353: MultiExecProcNode (execProcnode.c:621)
>> ==4567==  Uninitialised value was created by a heap allocation
>> ==4567==at 0x602FD5: palloc (mcxt.c:872)
>> ==4567==by 0x5FF73B: create_internal (dsa.c:1242)
>> ==4567==by 0x5FF8F5: dsa_create_in_place (dsa.c:473)
>> ==4567==by 0x37CA32: ExecInitParallelPlan (execParallel.c:532)
>> ==4567==by 0x38C324: ExecGather (nodeGather.c:152)
>> ==4567==by 0x37D247: ExecProcNode (execProcnode.c:551)
>> ==4567==by 0x39870F: ExecNestLoop (nodeNestloop.c:156)
>> ==4567==by 0x37D1B7: ExecProcNode (execProcnode.c:512)
>> ==4567==by 0x3849D4: fetch_input_tuple (nodeAgg.c:686)
>> ==4567==by 0x387764: agg_retrieve_direct (nodeAgg.c:2306)
>> ==4567==by 0x387A11: ExecAgg (nodeAgg.c:2117)
>> ==4567==by 0x37D217: ExecProcNode (execProcnode.c:539)
>> ==4567==
>>
>> It could be that these are spurious due to shared memory - valgrind
>> doesn't track definedness across processes - but the fact that memory
>> allocated by palloc is the source of the undefined memory makes me doubt
>> that.
>
> Thanks.  Will post a fix for this later today.

Fix attached.

Explanation:  Whenever segments are destroyed because they no longer
contain any live blocks, the shared variable
control->freed_segment_counter advances.  Each attached backend has
its own local variable area->freed_segment_counter, and if it sees
that the former differs from the latter it checks all attached
segments to see if any need to be detached.  I failed to initialise
the backend-local version, with the consequence that if you were very
unlucky your backend could fail to detach from a no-longer needed
segment until another segment was eventually freed, causing the
shared counter to move again.  More likely, it would notice that they
are different because one holds uninitialised junk, perform a spurious
scan for dead segments, and then get them in sync.
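
For readers following along, the detach-check pattern described above
looks roughly like this (a simplified sketch with assumed types and
field layouts, not the actual dsa.c code):

#include <stdint.h>

/* Simplified shapes of the structures described above. */
typedef struct dsa_area_control
{
    uint64_t freed_segment_counter; /* shared: advances when a segment
                                     * is destroyed */
} dsa_area_control;

typedef struct dsa_area
{
    dsa_area_control *control;          /* lives in shared memory */
    uint64_t freed_segment_counter;     /* backend-local copy; the bug was
                                         * leaving this uninitialised at
                                         * attach time */
} dsa_area;

/*
 * Rescan the attached segments only when the shared counter has moved
 * past our local copy.
 */
static void
check_for_freed_segments(dsa_area *area)
{
    if (area->freed_segment_counter != area->control->freed_segment_counter)
    {
        /* ... detach segments that no longer contain live blocks ... */
        area->freed_segment_counter = area->control->freed_segment_counter;
    }
}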

-- 
Thomas Munro
http://www.enterprisedb.com


initialise-freed-segment-counter.patch
Description: Binary data


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-07 Thread Joe Conway
On 04/07/2017 05:36 PM, Robert Haas wrote:
> On Fri, Apr 7, 2017 at 5:22 PM, Joe Conway  wrote:
>> 1) commit the 0002 patch now before the feature freeze and follow up
>>with the regression test patch when ready in a couple of days
>> 2) hold off on both patches until ready
>> 3) push both patches to the next commitfest/pg11
>>
>> Some argue this is an open issue against the new partitioning feature in
>> pg10 and therefore should be addressed now, and others do not. I can see
>> both sides of that argument.
>>
>> In any case, thoughts on what to do?
> 
> Speaking only for myself, I'm OK with any of those options, provided
> that that "a couple" means what my dictionary says it means.

Thanks. I'd prefer not to do #1 actually, and I think we can adhere to
the dictionary meaning of "a couple" (i.e. by EOD Sunday). Assuming we
are ready by Sunday I will push both together (#2) or else I will move
them both together to the next CF (#3).

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Thomas Munro
On Sat, Apr 8, 2017 at 9:47 AM, Kevin Grittner  wrote:
> On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro
>  wrote:
>> On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner  wrote:
>>> On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund  wrote:
>>>
>>>> I'd rather fix the issue, than remove the tests entirely.  Seems quite
>>>> possible to handle blocking on Safesnapshot in a similar manner to
>>>> pg_blocking_pids?
>>>
>>> I'll see what I can figure out.
>>
>> Ouch.  These are the other ways I thought of to achieve this:
>>
>> https://www.postgresql.org/message-id/CAEepm%3D1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN%3D0LY3pJ49Ksg%40mail.gmail.com
>>
>> I'd be happy to write one of those, but it may take a day as I have
>> some other commitments.
>
> Please give it a go.  I'm dealing with putting out fires with
> customers while trying to make sure I have tested the predicate
> locking GUCs patch sufficiently.  (I think it's ready to go, and has
> passed all tests so far, but there are a few more I want to run.)
> I'm not sure I can come up with a solution faster than you, given
> that.  Since it is an improvement to performance for a test-only
> environment on a feature that is already committed and not causing
> problems for production environments, hopefully people will tolerate
> a fix a day or two out.  If not, we'll have to revert and get it
> into the first CF for v11.

Here is an attempt at option 2 from the menu I posted above.  Questions:

1.  Does anyone object to this extension of pg_blocking_pids()'s
remit?  If so, I could make it a separate function (that was option
3).

2.  Did I understand correctly that it is safe to scan the list of
SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while
holding only SerializableXactHashLock, and that 'inLink' is the
correct link to be following?

-- 
Thomas Munro
http://www.enterprisedb.com


safe-snapshot-blocker-introspection.patch
Description: Binary data


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-07 Thread Tom Lane
David Rowley  writes:
[ unique_joins_2017-04-07b.patch ]

It turned out that this patch wasn't as close to committable as I'd
thought, but after a full day of whacking at it, I got to a place
where I thought it was OK.  So, pushed.

[ and that's a wrap for v10 feature freeze, I think ]

regards, tom lane


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 10:06 PM, Claudio Freire  wrote:
>>> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
>>> >> + {
>>> >> + /*
>>> >> +  * The segment is overflowing, so we must allocate 
>>> >> a new segment.
>>> >> +  * We could have a preallocated segment descriptor 
>>> >> already, in
>>> >> +  * which case we just reinitialize it, or we may 
>>> >> need to repalloc
>>> >> +  * the vacrelstats->dead_tuples array. In that 
>>> >> case, seg will no
>>> >> +  * longer be valid, so we must be careful about 
>>> >> that. In any case,
>>> >> +  * we must update the last_dead_tuple copy in the 
>>> >> overflowing
>>> >> +  * segment descriptor.
>>> >> +  */
>>> >> + Assert(seg->num_dead_tuples == 
>>> >> seg->max_dead_tuples);
>>> >> + seg->last_dead_tuple = 
>>> >> seg->dt_tids[seg->num_dead_tuples - 1];
>>> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= 
>>> >> vacrelstats->dead_tuples.num_segs)
>>> >> + {
>>> >> + int new_num_segs = 
>>> >> vacrelstats->dead_tuples.num_segs * 2;
>>> >> +
>>> >> + vacrelstats->dead_tuples.dt_segments = 
>>> >> (DeadTuplesSegment *) repalloc(
>>> >> +(void *) 
>>> >> vacrelstats->dead_tuples.dt_segments,
>>> >> +
>>> >> new_num_segs * sizeof(DeadTuplesSegment));
>>> >
>>> > Might be worth breaking this into some sub-statements, it's quite hard
>>> > to read.
>>>
>>> Breaking what precisely? The comment?
>>
>> No, the three-line statement computing the new value of
>> dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
>> variable, to cut the length of the statement down.
>
> Ah, alright. Will try to do that.

Attached is an updated patch set with the requested changes.

Segment allocation still follows the exponential strategy, and segment
lookup is still linear.

I rebased the early free patch (patch 3) to apply on top of the v9
patch 2 (it needed some changes). I recognize the early free patch
didn't get nearly as much scrutiny, so I'm fine with committing only 2
if that one's ready to go but 3 isn't.

If it's decided to go for fixed 128M segments and a binary search of
segments, I don't think I can get that ready and tested before the
commitfest ends.
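
For context, a rough sketch of the lookup scheme under discussion
(simplified types and hypothetical helper names; the actual patch
follows below): a linear scan over the segment list, plus a binary
search within the chosen segment.

#include "postgres.h"
#include "storage/itemptr.h"

/* Simplified segment descriptor, loosely following the patch. */
typedef struct DeadTuplesSegment
{
    ItemPointerData *dt_tids;           /* sorted dead TIDs */
    int         num_dead_tuples;
    ItemPointerData last_dead_tuple;    /* upper bound for this segment */
} DeadTuplesSegment;

static int
tid_cmp(const void *a, const void *b)
{
    return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

/*
 * Linear scan to the first segment whose upper bound covers the TID,
 * then binary search within it.  With exponentially growing segments
 * the overall cost stays O(log N).
 */
static bool
tid_is_dead(DeadTuplesSegment *segs, int nsegs, ItemPointer tid)
{
    int         i;

    for (i = 0; i < nsegs; i++)
    {
        if (ItemPointerCompare(tid, &segs[i].last_dead_tuple) <= 0)
            return bsearch(tid, segs[i].dt_tids, segs[i].num_dead_tuples,
                           sizeof(ItemPointerData), tid_cmp) != NULL;
    }
    return false;
}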
From 9b8f90f19d558a7e6a32cb253d89819f7c300598 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH 1/2] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.
---
 src/backend/commands/vacuumlazy.c| 346 ---
 src/test/regress/expected/vacuum.out |   8 +
 src/test/regress/sql/vacuum.sql  |  10 +
 3 files changed, 299 insertions(+), 65 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 005440e..4f0cf1b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -12,11 +12,25 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
+ * uselessly for vacuuming small tables). Additional arrays of increasingly
+ * large sizes are allocated as they become necessary.
+ *
+ * The TID array is thus represented as a list of multiple segments of
+ * varying size, beginning with the initial size of up to 128MB, and growing
+ * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
+ * is used up.
+ *
+ * Lookup in that structure proceeds sequentially in the list of segments,
+ * and with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity (2 log N to be
+ * precise).
+ *
+ * If the multiarray's total size threatens to grow beyond maintenance_work_mem,
  * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * compaction, then resume the heap scan with an array of logically empty but
+ * already preallocated TID segments to be refilled with more dead tuple T

Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-04-07 Thread Bruce Momjian
On Fri, Mar 24, 2017 at 07:01:46AM +0100, Fabien COELHO wrote:
> 
> Hello Peter,
> 
> >I think the fix belongs into the web site CSS, so there is nothing to
> >commit into PostgreSQL here.
> 
> Indeed, the changes were only for the "remove nesting" solution.
> 
> >I will close the commit fest entry, but I have added a section to the open
> >items list so we keep track of it. 
> >(https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain)
> 
> I put forward that the quick workaround a colleague of mine suggested (aka
> something like code code { font-size: 100%; important! }) could also be
> applied to the web site CSS while waiting for a more definite answer which
> might take some pretty unknown time close to never?

Sorry I am just getting back to this.  Below I am going to cover only
the problem with the font size of nested <code> tags, and I am going to
confirm what most people already figured out.

The basic problem was already posted by Fabien, with an image example. 
The cause of the fonts being too large on Chrome is an interaction of
Chrome's default font size for different blocks, the JavaScript that is
meant to fix such mismatches, and the new nested code blocks in the PG
10 docs.

First, the JavaScript:

https://github.com/postgres/pgweb/blob/master/media/js/monospacefix.js

There is no git history for this file except for its initial checkin in
2011, but I am pretty sure I wrote it.  What it does is to create <p>
and <tt> blocks, find the font point size, and compute a ratio.  If the
ratio is not 1, <tt>, <pre>, and <code> blocks are adjusted in size to
match <p>.  The complex part is that the JavaScript conditionally
injects CSS into the web-page to accomplish this.

The reason the PG 10 docs look fine on Linux Firefox is because the font
point sizes match, so no CSS is injected.  They don't match on Chrome,
so the CSS is injected.  When the CSS hits double-embedded code blocks,
<code><code>, it makes the font too large because it double-adjusts.

The fix, as Fabien identified, is to conditionally inject additional CSS
to be _more_ specific than the first CSS and set the font-size to a
simple '1em' so the first CSS is not called twice.  I don't think
'important!' is necessary but it would be good to test this.

Attached is a patch that can be applied to pgweb which should fix all of
this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/media/js/monospacefix.js b/media/js/monospacefix.js
new file mode 100644
index 8523932..42ce913
*** a/media/js/monospacefix.js
--- b/media/js/monospacefix.js
*** if (newMonoSize != 1)
*** 19,24 
  {
  document.write(''
  	+ '#docContainer tt, #docContainer pre, #docContainer code'
! 	+ '{font-size: ' + newMonoSize.toFixed(1) + 'em;}\n');
  }
  
--- 19,27 
  {
  document.write(''
  	+ '#docContainer tt, #docContainer pre, #docContainer code'
! 	+ '{font-size: ' + newMonoSize.toFixed(1) + 'em;}\n'
! 	/* prevent embedded code tags from changing font size */
! 	+ '#docContainer code code'
! 	+ '{font-size: 1em;}\n');
  }
  


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 10:12 PM, Andres Freund  wrote:
> On 2017-04-07 22:06:13 -0300, Claudio Freire wrote:
>> On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> >
>> > On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
>> >> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
>> >> > Hi,
>> >> >
>> >> > I've *not* read the history of this thread.  So I really might be
>> >> > missing some context.
>> >> >
>> >> >
>> >> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> >> >> From: Claudio Freire 
>> >> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> >> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>> >> >>
>> >> >> Turn the dead_tuples array into a structure composed of several
>> >> >> exponentially bigger arrays, to enable usage of more than 1GB
>> >> >> of work mem during vacuum and thus reduce the number of full
>> >> >> index scans necessary to remove all dead tids when the memory is
>> >> >> available.
>> >> >
>> >> >>   * We are willing to use at most maintenance_work_mem (or perhaps
>> >> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>> >> >> - * initially allocate an array of TIDs of that size, with an upper 
>> >> >> limit that
>> >> >> + * initially allocate an array of TIDs of 128MB, or an upper limit 
>> >> >> that
>> >> >>   * depends on table size (this limit ensures we don't allocate a huge 
>> >> >> area
>> >> >> - * uselessly for vacuuming small tables).  If the array threatens to 
>> >> >> overflow,
>> >> >> - * we suspend the heap scan phase and perform a pass of index cleanup 
>> >> >> and page
>> >> >> - * compaction, then resume the heap scan with an empty TID array.
>> >> >> + * uselessly for vacuuming small tables). Additional arrays of 
>> >> >> increasingly
>> >> >> + * large sizes are allocated as they become necessary.
>> >> >> + *
>> >> >> + * The TID array is thus represented as a list of multiple segments of
>> >> >> + * varying size, beginning with the initial size of up to 128MB, and 
>> >> >> growing
>> >> >> + * exponentially until the whole budget of 
>> >> >> (autovacuum_)maintenance_work_mem
>> >> >> + * is used up.
>> >> >
>> >> > When the chunk size is 128MB, I'm a bit unconvinced that using
>> >> > exponential growth is worth it. The allocator overhead can't be
>> >> > meaningful in comparison to collecting 128MB dead tuples, the potential
>> >> > waste is pretty big, and it increases memory fragmentation.
>> >>
>> >> The exponential strategy is mainly to improve lookup time (ie: to
>> >> avoid large segment lists).
>> >
>> > Well, if we were to do binary search on the segment list, that'd not be
>> > necessary.
>>
>> True, but the initial lookup might be slower in the end, since the
>> array would be bigger and cache locality worse.
>>
>> Why do you say exponential growth fragments memory? AFAIK, all those
>> allocations are well beyond the point where malloc starts mmaping
>> memory, so each of those segments should be a mmap segment,
>> independently freeable.
>
> Not all platforms have that, and even on platforms with it, frequent,
> unevenly sized, very large allocations can lead to enough fragmentation
> that further allocations are harder and fragment / enlarge the
> pagetable.

I wouldn't call this frequent. You can get at most slightly more than
a dozen such allocations given the current limits.
And allocation sizes are quite regular - you get 128M or multiples of
128M, so each free block can be reused for N smaller allocations if
needed. I don't think it has much potential to fragment memory.

This isn't significantly different from tuplesort or any other code
that can do big allocations, and the differences favor less
fragmentation than those, so I don't see why this would need special
treatment.

My point being that it hasn't been simple to get to a point where this
beats the original single binary search even in CPU time.
If we're to scrap this implementation and go for a double binary
search, I'd like to have a clear measurable benefit to chase from
doing so. Fragmentation is hard to measure, and I cannot get CPU-bound
vacuums on the test hardware I have to test lookup performance at big
scales.

>> >> Yes, the benchmarks are upthread. The earlier runs were run on my
>> >> laptop and made little sense, so I'd ignore them as inaccurate. The
>> >> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
>> >> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
>> >> improvements as well, roughly 30%, though it's not measuring CPU time
>> >> but rather elapsed time.
>> >
>> > I'd be more concerned about cases that'd already fit into memory, not ones
>> > where we avoid doing another scan - and I think you mostly measured that?
>> >
>> > - Andres
>>
>> Well, scale 400 is pretty much as big as you can get with the old 1GB
>> limit, and also suffered no significant regression. Although, true, it
>> didn't significantly improve either.
>

Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-04-07 Thread Robert Haas
On Wed, Mar 22, 2017 at 6:20 AM, Etsuro Fujita
 wrote:
> On 2017/02/22 19:57, Rushabh Lathia wrote:
>> Marked this as Ready for Committer.
>
> I noticed that this item in the CF app was incorrectly marked as Committed.
> This patch isn't committed, so I returned it to the previous status.  I also
> rebased the patch.  Attached is a new version of the patch.

Sorry, I marked the wrong patch as committed.  Apologies for that.

This doesn't apply any more because of recent changes.

git diff --check complains:
contrib/postgres_fdw/postgres_fdw.c:3653: space before tab in indent.

+/* Shouldn't contain the target relation. */
+Assert(target_rel == 0);

This comment should give a reason.

 void
 deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
+   RelOptInfo *foreignrel,
List *targetlist,
List *targetAttrs,
List *remote_conds,

Could you add a comment explaining the meaning of these various
arguments?  It takes rtindex, rel, and foreignrel, which apparently
are all different things, but the meaning is not explained.

 /*
+ * Add a RETURNING clause, if needed, to an UPDATE/DELETE on a join.
+ */
+static void
+deparseExplicitReturningList(List *rlist,
+ List **retrieved_attrs,
+ deparse_expr_cxt *context)
+{
+deparseExplicitTargetList(rlist, true, retrieved_attrs, context);
+}

Do we really want to add a function for one line of code?

+/*
+ * Look for conditions mentioning the target relation in the given join tree,
+ * which will be pulled up into the WHERE clause.  Note that this is safe due
+ * to the same reason stated in comments in deparseFromExprForRel.
+ */

The comments for deparseFromExprForRel do not seem to address the
topic of why this is safe.  Also, the answer to the question "safe
from what?" is not clear.

-deparseReturningList(buf, root, rtindex, rel, false,
- returningList, retrieved_attrs);
+if (foreignrel->reloptkind == RELOPT_JOINREL)
+deparseExplicitReturningList(returningList, retrieved_attrs, &context);
+else
+deparseReturningList(buf, root, rtindex, rel, false,
+ returningList, retrieved_attrs);

Why do these cases need to be handled differently?  Maybe add a brief comment?

+if ((outerrel->reloptkind == RELOPT_BASEREL &&
+ outerrel->relid == target_rel) ||
+(innerrel->reloptkind == RELOPT_BASEREL &&
+ innerrel->relid == target_rel))

1. Surely it's redundant to check the RelOptKind if the RTI matches?

2. Generally, the tests in this patch against various RelOptKind
values should be adapted to use the new macros introduced in
7a39b5e4d11229ece930a51fd7cb29e535db4494.

The regression tests remove every remaining case where an update or
delete fails to get pushed to the remote side.  I think we should
still test that path, because we've still got that code.  Maybe use a
non-pushable function in the join clause, or something.

The new test cases could use some brief comments explaining their purpose.

 if (plan->returningLists)
+{
 returningList = (List *) list_nth(plan->returningLists, subplan_index);

+/*
+ * If UPDATE/DELETE on a join, create a RETURNING list used in the
+ * remote query.
+ */
+if (fscan->scan.scanrelid == 0)
+returningList = make_explicit_returning_list(resultRelation,
+ rel,
+ returningList);
+}

Again, the comment doesn't really explain why we're doing this.  And
initializing returningList twice in a row seems strange, too.

I am unfortunately too tired to finish properly reviewing this tonight.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Keith Fiske
On Fri, Apr 7, 2017 at 8:41 PM, Tom Lane  wrote:

> Keith Fiske  writes:
> > On Fri, Apr 7, 2017 at 2:46 PM, Tom Lane  wrote:
> >> Joe Conway  writes:
> >>> Apparently INSERT and SELECT on the parent partitioned table skip
> normal
> >>> acl checks on the partitions. Is that intended behavior?
>
> >> Yes, this matches normal inheritance behavior.
>
> > Should that really be normal partitioning behavior though?
>
> Yes, it should.  Consider the alternatives:
>
> 1. Owner must remember to run around and grant permissions on all child
> tables along with the parent.
>

I'm not following. That's what Joe is saying is happening now. The child
tables are not getting the parent privileges so this is what the owner must
remember to do every time they add a new child if they want the role to be
able to interact directly with the children. They can select, insert, etc
with the parent, but any direct interaction with the child is denied. I
know you're all trying to make the planner work so queries work efficiently
from the parent, but they'll never be as good as being able to hit the
child tables directly if they know where the data they want is. Why even
leave the child tables visible at all if they can't be interacted with the
same as the parent? I thought that was supposed to be one of the advantages
to doing partitioning this way vs how Oracle & MySQL do it.


> 2. The system silently(?) doesn't show you some rows that are supposed
> to be visible when scanning the parent table.
>

> If you want RLS, use RLS; this is not that, and is not a good substitute.
>

Agreed. It appears the rows are visible if the role has select privileges
on the parent. But they cannot select directly from children. Not sure what
this has to do with RLS.


>
> (We've been around on this topic before, btw.  See the archives.)
>
> regards, tom lane
>


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Andres Freund
On 2017-04-07 22:06:13 -0300, Claudio Freire wrote:
> On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund  wrote:
> > Hi,
> >
> >
> > On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
> >> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
> >> > Hi,
> >> >
> >> > I've *not* read the history of this thread.  So I really might be
> >> > missing some context.
> >> >
> >> >
> >> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> >> >> From: Claudio Freire 
> >> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
> >> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> >> >>
> >> >> Turn the dead_tuples array into a structure composed of several
> >> >> exponentially bigger arrays, to enable usage of more than 1GB
> >> >> of work mem during vacuum and thus reduce the number of full
> >> >> index scans necessary to remove all dead tids when the memory is
> >> >> available.
> >> >
> >> >>   * We are willing to use at most maintenance_work_mem (or perhaps
> >> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
> >> >> - * initially allocate an array of TIDs of that size, with an upper 
> >> >> limit that
> >> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
> >> >>   * depends on table size (this limit ensures we don't allocate a huge 
> >> >> area
> >> >> - * uselessly for vacuuming small tables).  If the array threatens to 
> >> >> overflow,
> >> >> - * we suspend the heap scan phase and perform a pass of index cleanup 
> >> >> and page
> >> >> - * compaction, then resume the heap scan with an empty TID array.
> >> >> + * uselessly for vacuuming small tables). Additional arrays of 
> >> >> increasingly
> >> >> + * large sizes are allocated as they become necessary.
> >> >> + *
> >> >> + * The TID array is thus represented as a list of multiple segments of
> >> >> + * varying size, beginning with the initial size of up to 128MB, and 
> >> >> growing
> >> >> + * exponentially until the whole budget of 
> >> >> (autovacuum_)maintenance_work_mem
> >> >> + * is used up.
> >> >
> >> > When the chunk size is 128MB, I'm a bit unconvinced that using
> >> > exponential growth is worth it. The allocator overhead can't be
> >> > meaningful in comparison to collecting 128MB dead tuples, the potential
> >> > waste is pretty big, and it increases memory fragmentation.
> >>
> >> The exponential strategy is mainly to improve lookup time (ie: to
> >> avoid large segment lists).
> >
> > Well, if we were to do binary search on the segment list, that'd not be
> > necessary.
> 
> True, but the initial lookup might be slower in the end, since the
> array would be bigger and cache locality worse.
> 
> Why do you say exponential growth fragments memory? AFAIK, all those
> allocations are well beyond the point where malloc starts mmaping
> memory, so each of those segments should be a mmap segment,
> independently freeable.

Not all platforms have that, and even on platforms with it, frequent,
unevenly sized, very large allocations can lead to enough fragmentation
that further allocations are harder and fragment / enlarge the
pagetable.


> >> Yes, the benchmarks are upthread. The earlier runs were run on my
> >> laptop and made little sense, so I'd ignore them as inaccurate. The
> >> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
> >> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
> >> improvements as well, roughly 30%, though it's not measuring CPU time
> >> but rather elapsed time.
> >
> > I'd be more concerned about cases that'd already fit into memory, not ones
> > where we avoid doing another scan - and I think you mostly measured that?
> >
> > - Andres
> 
> Well, scale 400 is pretty much as big as you can get with the old 1GB
> limit, and also suffered no significant regression. Although, true, it
> didn't significantly improve either.

Aren't more interesting cases those where not that many dead tuples are
found, but the indexes are pretty large?  IIRC the index vacuum scans
still visit every leaf index tuple, no?

Greetings,

Andres Freund


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund  wrote:
> Hi,
>
>
> On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
>> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > I've *not* read the history of this thread.  So I really might be
>> > missing some context.
>> >
>> >
>> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> >> From: Claudio Freire 
>> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>> >>
>> >> Turn the dead_tuples array into a structure composed of several
>> >> exponentially bigger arrays, to enable usage of more than 1GB
>> >> of work mem during vacuum and thus reduce the number of full
>> >> index scans necessary to remove all dead tids when the memory is
>> >> available.
>> >
>> >>   * We are willing to use at most maintenance_work_mem (or perhaps
>> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>> >> - * initially allocate an array of TIDs of that size, with an upper limit 
>> >> that
>> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>> >>   * depends on table size (this limit ensures we don't allocate a huge 
>> >> area
>> >> - * uselessly for vacuuming small tables).  If the array threatens to 
>> >> overflow,
>> >> - * we suspend the heap scan phase and perform a pass of index cleanup 
>> >> and page
>> >> - * compaction, then resume the heap scan with an empty TID array.
>> >> + * uselessly for vacuuming small tables). Additional arrays of 
>> >> increasingly
>> >> + * large sizes are allocated as they become necessary.
>> >> + *
>> >> + * The TID array is thus represented as a list of multiple segments of
>> >> + * varying size, beginning with the initial size of up to 128MB, and 
>> >> growing
>> >> + * exponentially until the whole budget of 
>> >> (autovacuum_)maintenance_work_mem
>> >> + * is used up.
>> >
>> > When the chunk size is 128MB, I'm a bit unconvinced that using
>> > exponential growth is worth it. The allocator overhead can't be
>> > meaningful in comparison to collecting 128MB dead tuples, the potential
>> > waste is pretty big, and it increases memory fragmentation.
>>
>> The exponential strategy is mainly to improve lookup time (ie: to
>> avoid large segment lists).
>
> Well, if we were to do binary search on the segment list, that'd not be
> necessary.

True, but the initial lookup might be slower in the end, since the
array would be bigger and cache locality worse.

Why do you say exponential growth fragments memory? AFAIK, all those
allocations are well beyond the point where malloc starts mmaping
memory, so each of those segments should be a mmap segment,
independently freeable.

>> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
>> >> + {
>> >> + /*
>> >> +  * The segment is overflowing, so we must allocate 
>> >> a new segment.
>> >> +  * We could have a preallocated segment descriptor 
>> >> already, in
>> >> +  * which case we just reinitialize it, or we may 
>> >> need to repalloc
>> >> +  * the vacrelstats->dead_tuples array. In that 
>> >> case, seg will no
>> >> +  * longer be valid, so we must be careful about 
>> >> that. In any case,
>> >> +  * we must update the last_dead_tuple copy in the 
>> >> overflowing
>> >> +  * segment descriptor.
>> >> +  */
>> >> + Assert(seg->num_dead_tuples == 
>> >> seg->max_dead_tuples);
>> >> + seg->last_dead_tuple = 
>> >> seg->dt_tids[seg->num_dead_tuples - 1];
>> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= 
>> >> vacrelstats->dead_tuples.num_segs)
>> >> + {
>> >> + int new_num_segs = 
>> >> vacrelstats->dead_tuples.num_segs * 2;
>> >> +
>> >> + vacrelstats->dead_tuples.dt_segments = 
>> >> (DeadTuplesSegment *) repalloc(
>> >> +(void *) 
>> >> vacrelstats->dead_tuples.dt_segments,
>> >> +
>> >> new_num_segs * sizeof(DeadTuplesSegment));
>> >
>> > Might be worth breaking this into some sub-statements, it's quite hard
>> > to read.
>>
>> Breaking what precisely? The comment?
>
> No, the three-line statement computing the new value of
> dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
> variable, to cut the length of the statement down.

Ah, alright. Will try to do that.

>> >> +/*
>> >>   *   lazy_tid_reaped() -- is a particular tid deletable?
>> >>   *
>> >>   *   This has the right signature to be an 
>> >> IndexBulkDeleteCallback.
>> >>   *
>> >> - *   Assumes dead_tuples array is in sorted order.
>> >> + *   Assumes t

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Andres Freund
Hi,


On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
> > Hi,
> >
> > I've *not* read the history of this thread.  So I really might be
> > missing some context.
> >
> >
> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> >> From: Claudio Freire 
> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> >>
> >> Turn the dead_tuples array into a structure composed of several
> >> exponentially bigger arrays, to enable usage of more than 1GB
> >> of work mem during vacuum and thus reduce the number of full
> >> index scans necessary to remove all dead tids when the memory is
> >> available.
> >
> >>   * We are willing to use at most maintenance_work_mem (or perhaps
> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
> >> - * initially allocate an array of TIDs of that size, with an upper limit 
> >> that
> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
> >>   * depends on table size (this limit ensures we don't allocate a huge area
> >> - * uselessly for vacuuming small tables).  If the array threatens to 
> >> overflow,
> >> - * we suspend the heap scan phase and perform a pass of index cleanup and 
> >> page
> >> - * compaction, then resume the heap scan with an empty TID array.
> >> + * uselessly for vacuuming small tables). Additional arrays of 
> >> increasingly
> >> + * large sizes are allocated as they become necessary.
> >> + *
> >> + * The TID array is thus represented as a list of multiple segments of
> >> + * varying size, beginning with the initial size of up to 128MB, and 
> >> growing
> >> + * exponentially until the whole budget of 
> >> (autovacuum_)maintenance_work_mem
> >> + * is used up.
> >
> > When the chunk size is 128MB, I'm a bit unconvinced that using
> > exponential growth is worth it. The allocator overhead can't be
> > meaningful in comparison to collecting 128MB dead tuples, the potential
> > waste is pretty big, and it increases memory fragmentation.
> 
> The exponential strategy is mainly to improve lookup time (ie: to
> avoid large segment lists).

Well, if we were to do binary search on the segment list, that'd not be
necessary.

> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
> >> + {
> >> + /*
> >> +  * The segment is overflowing, so we must allocate a 
> >> new segment.
> >> +  * We could have a preallocated segment descriptor 
> >> already, in
> >> +  * which case we just reinitialize it, or we may 
> >> need to repalloc
> >> +  * the vacrelstats->dead_tuples array. In that case, 
> >> seg will no
> >> +  * longer be valid, so we must be careful about 
> >> that. In any case,
> >> +  * we must update the last_dead_tuple copy in the 
> >> overflowing
> >> +  * segment descriptor.
> >> +  */
> >> + Assert(seg->num_dead_tuples == seg->max_dead_tuples);
> >> + seg->last_dead_tuple = 
> >> seg->dt_tids[seg->num_dead_tuples - 1];
> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= 
> >> vacrelstats->dead_tuples.num_segs)
> >> + {
> >> + int new_num_segs = 
> >> vacrelstats->dead_tuples.num_segs * 2;
> >> +
> >> + vacrelstats->dead_tuples.dt_segments = 
> >> (DeadTuplesSegment *) repalloc(
> >> +(void *) 
> >> vacrelstats->dead_tuples.dt_segments,
> >> +
> >> new_num_segs * sizeof(DeadTuplesSegment));
> >
> > Might be worth breaking this into some sub-statements, it's quite hard
> > to read.
> 
> Breaking what precisely? The comment?

No, the three-line statement computing the new value of
dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
variable, to cut the length of the statement down.


> >> + while (vacrelstats->dead_tuples.num_segs < 
> >> new_num_segs)
> >> + {
> >> + /* Initialize as "unallocated" */
> >> + DeadTuplesSegment *nseg = 
> >> &(vacrelstats->dead_tuples.dt_segments[
> >> +  
> >> vacrelstats->dead_tuples.num_segs]);
> >
> > dito.
> 
> I don't really get what you're asking here.

Trying to simplify/shorten the statement.


> >> +/*
> >>   *   lazy_tid_reaped() -- is a particular tid deletable?
> >>   *
> >>   *   This has the right signature to be an 
> >> IndexBulkDeleteCallback.
> >>   *
> >> - *   Assumes dead_tuples array is in sorted order.
> >> + *

Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Tom Lane
Keith Fiske  writes:
> On Fri, Apr 7, 2017 at 2:46 PM, Tom Lane  wrote:
>> Joe Conway  writes:
>>> Apparently INSERT and SELECT on the parent partitioned table skip normal
>>> acl checks on the partitions. Is that intended behavior?

>> Yes, this matches normal inheritance behavior.

> Should that really be normal partitioning behavior though?

Yes, it should.  Consider the alternatives:

1. Owner must remember to run around and grant permissions on all child
tables along with the parent.

2. The system silently(?) doesn't show you some rows that are supposed
to be visible when scanning the parent table.

If you want RLS, use RLS; this is not that, and is not a good substitute.

(We've been around on this topic before, btw.  See the archives.)

regards, tom lane


Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Tom Lane
Alvaro Herrera  writes:
> Interesting.  I wonder if it's possible that a relcache invalidation
> would cause these values to get lost for some reason, because that would
> be dangerous.

> I suppose the rationale is that this shouldn't happen because any
> operation that does things this way must hold an exclusive lock on the
> relation.  But that doesn't guarantee that the relcache entry is
> completely stable,

It ABSOLUTELY is not safe.  Relcache flushes can happen regardless of
how strong a lock you hold.

regards, tom lane


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 5:22 PM, Joe Conway  wrote:
> On 04/07/2017 11:37 AM, Mike Palmiotto wrote:
>>>> I found some missing bits in the 0002 patch -- new version attached.
>>>> Will wait on new regression tests before committing, but I expect we'll
>>>> have those by end of today and be able to commit the rest tomorrow.
>>>
>>> Attached are the regression test updates for partitioned tables.
>>
>> Actually attached this time.
>
> Based on my review and testing of the 0002 patch I believe it is
> correct. However Mike and I just went through the regression test patch
> line by line and there are issues he needs to address -- there is no way
> that is happening by tonight as the output is very verbose and we need
> to be sure we are both testing the correct things and getting the
> correct behaviors.
>
> Based on that I can:
>
> 1) commit the 0002 patch now before the feature freeze and follow up
>with the regression test patch when ready in a couple of days
> 2) hold off on both patches until ready
> 3) push both patches to the next commitfest/pg11
>
> Some argue this is an open issue against the new partitioning feature in
> pg10 and therefore should be addressed now, and others do not. I can see
> both sides of that argument.
>
> In any case, thoughts on what to do?

Speaking only for myself, I'm OK with any of those options, provided
that "a couple" means what my dictionary says it means.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-04-07 Thread Peter Eisentraut
On 4/6/17 14:32, Pavel Stehule wrote:
> I like to see any proposals about syntax or implementation.
> 
> Using PRAGMA is one variant - introduced by PLpgSQL's origin, the Ada
> language. The PRAGMA syntax can be used for PRAGMA autonomous with
> well-known syntax. It scales well - it supports function, block or
> command level.

I had pragmas implemented in the original autonomous transactions patch
(https://www.postgresql.org/message-id/659a2fce-b6ee-06de-05c0-c8ed6a019...@2ndquadrant.com).
However, the difference there is that the behavior is lexical, specific
to plpgsql, whereas here you are really just selecting run time
behavior.  So a GUC, and also something that could apply to other
places, should be considered.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 6:32 PM, Michael Paquier wrote:
> On Sat, Apr 8, 2017 at 1:59 AM, Robert Haas  wrote:
>> On Fri, Apr 7, 2017 at 3:59 AM, Heikki Linnakangas  wrote:
>>> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.
>>
>> I agree.  The point here isn't that we're using a better hashing
>> method, even if a lot of people *think* that's the point.  The point
>> is we're using a modern algorithm that has nice properties like "you
>> can't impersonate the client by stealing the verifier, or even by
>> snooping the exchange".
>>
>> But "sasl" might be even better.
>
> FWIW, my opinion has not changed much on the matter, I would still
> favor "sasl" as the keyword used in pg_hba.conf. What has changed in
> my mind though is that defining no mechanisms with an additional
> option means that all possible choices are sent to the client. But if
> you define a list of mechanisms, then we'll just send back to the
> client the specified list as a possible choice of exchange mechanism:
> host all all blah.com sasl mechanism=scram-sha-256-plus
> Here for example the user would not be allowed to use SCRAM-SHA-256,
> just SCRAM with channel binding.
>
> Such an option makes sense once we add support for one more mechanism
> in SASL, like channel binding, but that's by far a generic approach
> that can serve us for years to come, and by admitting that nothing
> listed means all possible options we don't need any immediate action.

Yes, that all seems quite sensible.  What exactly is the counter-argument?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Improve 64bit atomics support.

2017-04-07 Thread Andres Freund
On 2017-04-07 16:36:09 -0700, Andres Freund wrote:
> On 2017-04-07 19:55:21 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > Improve 64bit atomics support.
> > > 
> > > When adding atomics back in b64d92f1a, I added 64bit support as
> > > optional; there wasn't yet a direct user in sight.  That turned out to
> > > be a bit short-sighted, it'd already have been useful a number of times.
> > 
> > Seems like this killed an arapaima:
> >   
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=arapaima&dt=2017-04-07%2022%3A06%3A59
> > 
> > Program terminated with signal 6, Aborted.
> > #0  0x00c6a402 in __kernel_vsyscall ()
> > #0  0x00c6a402 in __kernel_vsyscall ()
> > #1  0x00284b10 in raise () from /lib/libc.so.6
> > #2  0x00286421 in abort () from /lib/libc.so.6
> > #3  0x084d967e in ExceptionalCondition (
> > conditionName=0xe19dac "(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
> > ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", 
> > errorType=0xe19831 "UnalignedPointer", 
> > fileName=0xe19d88 "../../../src/include/port/atomics.h", lineNumber=428)
> > at assert.c:54
> > #4  0x00e189b0 in pg_atomic_init_u64 ()
> > at ../../../src/include/port/atomics.h:428
> 
> Gah, that's fairly annoying :(.  We can't trivially force alignment in
> the generic fallback case, because not all compilers support that.  We
> don't really need it in the fallback case, because things are protected by
> a lock - but that means we'll have to make a bunch of locks conditional
> :/

Pushed an attempt at fixing this along those lines, let's hope that
works.

- Andres




Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Keith Fiske
On Fri, Apr 7, 2017 at 2:46 PM, Tom Lane  wrote:

> Joe Conway  writes:
> > Apparently INSERT and SELECT on the parent partitioned table skip normal
> > acl checks on the partitions. Is that intended behavior?
>
> Yes, this matches normal inheritance behavior.
>
> regards, tom lane
>

Should that really be normal partitioning behavior though? Pretty sure
people would expect child tables to have consistent permissions in a
partition set, and I'd think the permissions set on the parent should be
what they expect the children to have.

Keith


Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I suppose the rationale is that this shouldn't happen because any
> operation that does things this way must hold an exclusive lock on the
> relation.  But that doesn't guarantee that the relcache entry is
> completely stable, does it?  If we can get proof of that, then this
> technique should be safe, I think.

It occurs to me that in order to test this we could run the recovery
tests (including Michael's new 006 file, which you didn't include in
your patch) under -D CLOBBER_CACHE_ALWAYS.  I think that'd be sufficient
proof that it is solid.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Alvaro Herrera
I have claimed this patch as committer FWIW.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Improve 64bit atomics support.

2017-04-07 Thread Andres Freund
On 2017-04-07 19:55:21 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Improve 64bit atomics support.
> > 
> > When adding atomics back in b64d92f1a, I added 64bit support as
> > optional; there wasn't yet a direct user in sight.  That turned out to
> > be a bit short-sighted, it'd already have been useful a number of times.
> 
> Seems like this killed an arapaima:
>   
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=arapaima&dt=2017-04-07%2022%3A06%3A59
> 
> Program terminated with signal 6, Aborted.
> #0  0x00c6a402 in __kernel_vsyscall ()
> #0  0x00c6a402 in __kernel_vsyscall ()
> #1  0x00284b10 in raise () from /lib/libc.so.6
> #2  0x00286421 in abort () from /lib/libc.so.6
> #3  0x084d967e in ExceptionalCondition (
> conditionName=0xe19dac "(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", 
> errorType=0xe19831 "UnalignedPointer", 
> fileName=0xe19d88 "../../../src/include/port/atomics.h", lineNumber=428)
> at assert.c:54
> #4  0x00e189b0 in pg_atomic_init_u64 ()
> at ../../../src/include/port/atomics.h:428

Gah, that's fairly annoying :(.  We can't trivially force alignment in
the generic fallback case, because not all compilers support that.  We
don't really need it in the fallback case, because things are protected by
a lock - but that means we'll have to make a bunch of locks conditional
:/
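
The fix pushed later in this thread presumably looks something like this
sketch (simplified, not the verbatim commit; the real code lives in
src/include/port/atomics.h): assert alignment only when native 64bit
atomics are in use, since the simulated path is serialized by a spinlock
and doesn't care about alignment.

	/* Sketch: under PG_HAVE_ATOMIC_U64_SIMULATION the value is guarded by
	 * an embedded spinlock, so 8-byte alignment of the uint64 is
	 * unnecessary and the assertion can be skipped. */
	static inline void
	pg_atomic_init_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
	{
	#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
		AssertPointerAlignment(ptr, 8);
	#endif
		pg_atomic_init_u64_impl(ptr, val);
	}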

Greetings,

Andres Freund




Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:

> The attached patch is a quiiiccck-and-dirty hack of Michael's patch,
> just as a PoC of my proposal quoted above. This also passes the
> 006 test.  The major changes are the following.
> 
> - Moved sync_above and truncated_to into RelationData.

Interesting.  I wonder if it's possible that a relcache invalidation
would cause these values to get lost for some reason, because that would
be dangerous.

I suppose the rationale is that this shouldn't happen because any
operation that does things this way must hold an exclusive lock on the
relation.  But that doesn't guarantee that the relcache entry is
completely stable, does it?  If we can get proof of that, then this
technique should be safe, I think.

In your version of the patch, which I spent some time skimming, I am
missing comments on various functions.  I added some as I went along,
including one XXX indicating it must be filled.

RecordPendingSync() should really live in relcache.c (and probably get a
different name).

> X I feel that I have dropped one of the features of the original
>   patch during the hack, but I don't recall it clearly now :(

Hah :-)

> X I haven't considered relfilenode replacement, which didn't matter
>   for the original patch (but there are a few places to consider).

Hmm ...  Please provide.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0c3e2b0..aa1b97d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -34,6 +34,28 @@
  *   the POSTGRES heap access method used for all POSTGRES
  *   relations.
  *
+ * WAL CONSIDERATIONS
+ *   All heap operations are normally WAL-logged, but there are a few
+ *   exceptions. Temporary and unlogged relations never need to be
+ *   WAL-logged, but we can also skip WAL-logging for a table that was
+ *   created in the same transaction, if we don't need WAL for PITR or
+ *   WAL archival purposes (i.e. if wal_level=minimal), and we fsync()
+ *   the file to disk at COMMIT instead.
+ *
+ *   The same-relation optimization is not employed automatically on all
+ *   updates to a table that was created in the same transaction, because
+ *   for a small number of changes, it's cheaper to just create the WAL
+ *   records than to fsync() the whole relation at COMMIT. It is only
+ *   worthwhile for (presumably) large operations like COPY, CLUSTER,
+ *   or VACUUM FULL. Use heap_register_sync() to initiate such an
+ *   operation; it will cause any subsequent updates to the table to skip
+ *   WAL-logging, if possible, and cause the heap to be synced to disk at
+ *   COMMIT.
+ *
+ *   To make that work, all modifications to heap must use
+ *   HeapNeedsWAL() to check if WAL-logging is needed in this transaction
+ *   for the given block.
+ *
  *-
  */
 #include "postgres.h"
@@ -56,6 +78,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/storage.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -2356,12 +2379,6 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2465,7 +2482,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
+	if (BufferNeedsWAL(relation, buffer))
 	{
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
@@ -2664,12 +2681,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	int			ndone;
 	char	   *scratch = NULL;
 	Page		page;
-	bool		needwal;
 	Size		saveFreeSpace;
 	bool		need_tuple_data = RelationIsLogicallyLogged(relation);
 	bool		need_cids = RelationIsAccessibleInLogicalDecoding(relation);
 
-	needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
 	saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
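
For context, a rough sketch of the calling pattern the new WAL
CONSIDERATIONS comment describes (heap_register_sync() and
BufferNeedsWAL() are the patch's names; the surrounding condition and
helper are simplified and partly hypothetical, not the exact patch code):

	/* A bulk-loading path (e.g. COPY) opting in to fsync-at-commit. */
	if (wal_level == WAL_LEVEL_MINIMAL &&
		relation_created_in_current_xact(rel))	/* hypothetical helper */
		heap_register_sync(rel);	/* skip WAL; fsync the heap at COMMIT */

	/* ... later, inside each heap modification ... */
	if (BufferNeedsWAL(rel, buffer))	/* replaces RelationNeedsWAL() checks */
	{
		/* emit the usual XLOG record */
	}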

 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-07 Thread Peter Eisentraut
On 4/7/17 01:10, Masahiko Sawada wrote:
> It's not critical but it could be problem. So I thought we should fix
> it before the PostgreSQL 10 release. If it's not appropriate as an
> open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Improve 64bit atomics support.

2017-04-07 Thread Alvaro Herrera
Andres Freund wrote:
> Improve 64bit atomics support.
> 
> When adding atomics back in b64d92f1a, I added 64bit support as
> optional; there wasn't yet a direct user in sight.  That turned out to
> be a bit short-sighted, it'd already have been useful a number of times.

Seems like this killed an arapaima:
  
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=arapaima&dt=2017-04-07%2022%3A06%3A59

Program terminated with signal 6, Aborted.
#0  0x00c6a402 in __kernel_vsyscall ()
#0  0x00c6a402 in __kernel_vsyscall ()
#1  0x00284b10 in raise () from /lib/libc.so.6
#2  0x00286421 in abort () from /lib/libc.so.6
#3  0x084d967e in ExceptionalCondition (
conditionName=0xe19dac "(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", 
errorType=0xe19831 "UnalignedPointer", 
fileName=0xe19d88 "../../../src/include/port/atomics.h", lineNumber=428)
at assert.c:54
#4  0x00e189b0 in pg_atomic_init_u64 ()
at ../../../src/include/port/atomics.h:428
#5  test_atomic_uint64 () at regress.c:1007
#6  0x00e1905d in test_atomic_ops (fcinfo=0x9362584) at regress.c:1097
#7  0x08273ab2 in ExecInterpExpr (state=0x9362510, econtext=0x93622e0, 
isnull=0xbfbc1a4b "") at execExprInterp.c:650
#8  0x082990a7 in ExecEvalExprSwitchContext (node=0x9362294)
at ../../../src/include/executor/executor.h:289
#9  ExecProject (node=0x9362294)
at ../../../src/include/executor/executor.h:323
#10 ExecResult (node=0x9362294) at nodeResult.c:132
#11 0x0827c6d5 in ExecProcNode (node=0x9362294) at execProcnode.c:416
#12 0x0827a08d in ExecutePlan (queryDesc=0x92daf38, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001')
at execMain.c:1651
#13 standard_ExecutorRun (queryDesc=0x92daf38, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001')
at execMain.c:360
#14 0x083cd8bb in PortalRunSelect (portal=0x92d78a8, 
forward=, count=0, dest=0x9337728) at pquery.c:933
#15 0x083cef1e in PortalRun (portal=0x92d78a8, count=2147483647, 
isTopLevel=1 '\001', run_once=1 '\001', dest=0x9337728, 
altdest=0x9337728, completionTag=0xbfbc1c6a "") at pquery.c:774
#16 0x083cb38f in exec_simple_query (
query_string=0x93360b0 "SELECT test_atomic_ops();") at postgres.c:1105
#17 0x083cc640 in PostgresMain (argc=1, argv=0x92e4160, 
dbname=0x92e3fe0 "regression", username=0x92e3fc4 "postgres")
at postgres.c:4075
#18 0x08349eaf in BackendStartup () at postmaster.c:4317
#19 ServerLoop () at postmaster.c:1729
#20 0x0834db50 in PostmasterMain (argc=8, argv=0x92b9968) at postmaster.c:1337
#21 0x082c01e2 in main (argc=8, argv=Cannot access memory at address 0x5aa5
) at main.c:228


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
> Hi,
>
> I've *not* read the history of this thread.  So I really might be
> missing some context.
>
>
>> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> From: Claudio Freire 
>> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>>
>> Turn the dead_tuples array into a structure composed of several
>> exponentially bigger arrays, to enable usage of more than 1GB
>> of work mem during vacuum and thus reduce the number of full
>> index scans necessary to remove all dead tids when the memory is
>> available.
>
>>   * We are willing to use at most maintenance_work_mem (or perhaps
>>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>> - * initially allocate an array of TIDs of that size, with an upper limit that
>> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>>   * depends on table size (this limit ensures we don't allocate a huge area
>> - * uselessly for vacuuming small tables).  If the array threatens to overflow,
>> - * we suspend the heap scan phase and perform a pass of index cleanup and page
>> - * compaction, then resume the heap scan with an empty TID array.
>> + * uselessly for vacuuming small tables). Additional arrays of increasingly
>> + * large sizes are allocated as they become necessary.
>> + *
>> + * The TID array is thus represented as a list of multiple segments of
>> + * varying size, beginning with the initial size of up to 128MB, and growing
>> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
>> + * is used up.
>
> When the chunk size is 128MB, I'm a bit unconvinced that using
> exponential growth is worth it. The allocator overhead can't be
> meaningful in comparison to collecting 128MB dead tuples, the potential
> waste is pretty big, and it increases memory fragmentation.

The exponential strategy is mainly to improve lookup time (ie: to
avoid large segment lists).

>> + * Lookup in that structure proceeds sequentially in the list of segments,
>> + * and with a binary search within each segment. Since segment's size grows
>> + * exponentially, this retains O(N log N) lookup complexity.
>
> N log N is a horrible lookup complexity.  That's the complexity of
> *sorting* an entire array.  I think you might be trying to argue that
> it's log(N) * log(N)? Once log(n) for the exponentially growing size of
> segments, one for the binary search?
>
> Afaics you could quite easily make it O(2 log(N)) by simply also doing
> binary search over the segments.  Might not be worth it due to the small
> constant involved normally.

It's a typo, yes, I meant O(log N) (which is equivalent to O(2 log N))

>> + * If the array threatens to overflow, we suspend the heap scan phase and
>> + * perform a pass of index cleanup and page compaction, then resume the heap
>> + * scan with an array of logically empty but already preallocated TID segments
>> + * to be refilled with more dead tuple TIDs.
>
> Hm, it's not really the array that overflows, it's m_w_m that'd be
> exceeded, right?

Yes, will rephrase. Although that's how the original comment expressed
the same concept.

>>  /*
>> + * Minimum (starting) size of the dead_tuples array segments. Will allocate
>> + * space for 128MB worth of tid pointers in the first segment, further segments
>> + * will grow in size exponentially. Don't make it too small or the segment list
>> + * will grow bigger than the sweetspot for search efficiency on big vacuums.
>> + */
>> +#define LAZY_MIN_TUPLES  Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
>
> That's not really the minimum, no? s/MIN/INIT/?

Ok

>> +typedef struct DeadTuplesSegment
>> +{
>> +	int			num_dead_tuples;	/* # of entries in the segment */
>> +	int			max_dead_tuples;	/* # of entries allocated in the segment */
>> +	ItemPointerData last_dead_tuple;	/* Copy of the last dead tuple (unset
>> +										 * until the segment is fully
>> +										 * populated) */
>> +	unsigned short padding;
>> +	ItemPointer dt_tids;	/* Array of dead tuples */
>> +}	DeadTuplesSegment;
>
> Whenever padding is needed, it should have an explanatory comment.  It's
> certainly not obvious to me why it's needed here.

Ok

>> @@ -1598,6 +1657,11 @@ lazy_vacuum_index(Relation indrel,
>>   ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
>>   ivinfo.strategy = vac_strategy;
>>
>> +	/* Finalize the current segment by setting its upper bound dead tuple */
>> +	seg = DeadTuplesCurrentSegment(vacrelstats);
>> +	if (seg->num_dead_tuples > 0)
>> +		seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];
>

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 7:43 PM, Claudio Freire  wrote:
>>> + * Lookup in that structure proceeds sequentially in the list of segments,
>>> + * and with a binary search within each segment. Since segment's size grows
>>> + * exponentially, this retains O(N log N) lookup complexity.
>>
>> N log N is a horrible lookup complexity.  That's the complexity of
>> *sorting* an entire array.  I think you might be trying to argue that
>> it's log(N) * log(N)? Once log(n) for the exponentially growing size of
>> segments, one for the binary search?
>>
>> Afaics you could quite easily make it O(2 log(N)) by simply also doing
>> binary search over the segments.  Might not be worth it due to the small
>> constant involved normally.
>
> It's a typo, yes, I meant O(log N) (which is equivalent to O(2 log N))


To clarify, lookup over the segments is linear, so it's O(M) with M
the number of segments, then the binary search is O(log N) with N the
number of dead tuples.

So lookup is O(M + log N), but M < log N because of the segments'
exponential growth; therefore the lookup is O(2 log N).
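
Spelled out (under the illustrative assumption of a first segment of S
entries and a doubling growth factor):

    N = \sum_{i=0}^{M-1} 2^i S = S\,(2^M - 1)
        \quad\Rightarrow\quad
        M = \log_2\!\left(\frac{N}{S} + 1\right) = O(\log N)

    T_{\mathrm{lookup}} = O(M) + O(\log N) = O(2 \log N) = O(\log N)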




Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Peter Eisentraut
On 4/7/17 16:50, Andres Freund wrote:
> On 2017-04-07 22:47:55 +0200, Erik Rijkers wrote:
>> monitoring.sgml has one </row> tag missing
> 
> Is that actually an issue? SGML allows skipping certain close tags, and
> IIRC row is one of them.

The issue is a weird one.  For some reason, older tool chains complain
about this, but newer ones only complain about it when you use certain
warning options.  The mistake here was basically that the osx calls in
the makefile didn't enable those options, so users of newer tool chains
didn't see any complaints.  I have fixed that now.

For clarification, SGML allows applications of SGML to declare whether
they want to allow omitting tags.  HTML (<5) does so.  DocBook does not.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Michael Paquier
On Sat, Apr 8, 2017 at 1:59 AM, Robert Haas  wrote:
> On Fri, Apr 7, 2017 at 3:59 AM, Heikki Linnakangas  wrote:
>> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.
>
> I agree.  The point here isn't that we're using a better hashing
> method, even if a lot of people *think* that's the point.  The point
> is we're using a modern algorithm that has nice properties like "you
>> can't impersonate the client by stealing the verifier, or even by
> snooping the exchange".
>
> But "sasl" might be even better.

FWIW, my opinion has not changed much on the matter, I would still
favor "sasl" as the keyword used in pg_hba.conf. What has changed in
my mind though is that defining no mechanisms with an additional
> option means that all possible choices are sent to the client. But if
you define a list of mechanisms, then we'll just send back to the
client the specified list as a possible choice of exchange mechanism:
host all all blah.com sasl mechanism=scram-sha-256-plus
Here for example the user would not be allowed to use SCRAM-SHA-256,
just SCRAM with channel binding.

Such an option makes sense once we add support for one more mechanism
in SASL, like channel binding, but that's by far a generic approach
that can serve us for years to come, and by admitting that nothing
listed means all possible options we don't need any immediate action.
-- 
Michael




Re: [HACKERS] mvstats triggers 32bit warnings

2017-04-07 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> compiling on linux 32 bit I get a lot of warnings due to printf format.
> I suspect most of them should just be cured by using %zd or %zu instead
> of %ld.

You're right, they are.  Confirmed, and pushed fix using %zd.  I suppose
%zu would be "more correct", but this doesn't raise warnings anymore.
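
For reference, the shape of the fix (illustrative values only, not the
committed code): Size is unsigned, so %zu is the exactly matching
conversion, while %zd merely has the correct width.

	/* illustrative only -- hypothetical sizes */
	Size		actual = 16;
	Size		minimum = 32;

	elog(ERROR, "invalid MVNDistinct size %zu (expected at least %zu)",
		 actual, minimum);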

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Peter Eisentraut
On 4/7/17 16:47, Erik Rijkers wrote:
> monitoring.sgml has one  tag missing

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Optimize memory allocation in function 'bringetbitmap'

2017-04-07 Thread Alvaro Herrera
Tomas Vondra wrote:

> 1) bringetbitmap(PG_FUNCTION_ARGS)
> 
> + /*
> +  * Estimate the approximate size of btup and allocate memory for btup.
> +  */
> + btupInitSize = bdesc->bd_tupdesc->natts * 16;
> + btup = palloc(btupInitSize);
> 
> What is the reasoning behind this estimate? I mean, this only accounts for
> the values, not the NULL bitmap or BrinTuple fields. Maybe it does not
> really matter much, but I'd like to see some brief description in the docs,
> please.
> 
> This also interacts with our memory allocator in a rather annoying way,
> because palloc() always allocates memory in 2^k chunks, but sadly the
> estimate ignores that. So for natts=3 we get btupInitSize=48, but internally
> allocate 64B (and ignore the top 16B).

I fixed this point (and the following one, which is essentially
complaining about the same thing) by changing the API of
brin_copy_tuple, similarly to the changes we made to brin_deform_tuple:
pass the previously allocated memory (along with its size) as arguments,
so that it can be repalloc'ed if necessary, or just re-used as-is.
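
For illustration, that calling convention looks roughly like this (a
sketch only; the committed signature may differ in detail):

	/*
	 * Reuse-a-buffer convention: the caller passes its current buffer and
	 * its allocated size; the buffer is grown with repalloc() only when
	 * the incoming tuple does not fit.
	 */
	BrinTuple *
	brin_copy_tuple(BrinTuple *tuple, Size len, BrinTuple *dest, Size *destsz)
	{
		if (dest == NULL || *destsz < len)
		{
			dest = (dest == NULL) ? palloc(len) : repalloc(dest, len);
			*destsz = len;
		}
		memcpy(dest, tuple, len);
		return dest;
	}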

That seemed the most effective way to solve the points you raised.

Your review was extremely useful, many thanks.

In a extremely simpleminded test to measure the benefit, I did this:

create table t (a int, b bigint, c bigint) ;
insert into t select g, g, g from generate_series(1, 1 * 1000) g;
-- generates about 800 MB of data; fits in shared_buffers
create index ti on t using brin (a, b, c) with (pages_per_range = 1);
-- index contains about 84 revmap pages

and then measured this query:
   explain analyze select * from t where b between 245782 and 1256782;
with and without this patch.  It goes from 40ms without patch to 25ms
with, which seems a decent enough improvement.  (Assertions were
enabled, but I disabled memory clobbering).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro wrote:
> On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner  wrote:
>> On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund  wrote:
>>
>>> I'd rather fix the issue, than remove the tests entirely.  Seems quite
>>> possible to handle blocking on Safesnapshot in a similar manner as 
>>> pg_blocking_pids?
>>
>> I'll see what I can figure out.
>
> Ouch.  These are the other ways I thought of to achieve this:
>
> https://www.postgresql.org/message-id/CAEepm%3D1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN%3D0LY3pJ49Ksg%40mail.gmail.com
>
> I'd be happy to write one of those, but it may take a day as I have
> some other commitments.

Please give it a go.  I'm dealing with putting out fires with
customers while trying to make sure I have tested the predicate
locking GUCs patch sufficiently.  (I think it's ready to go, and has
passed all tests so far, but there are a few more I want to run.)
I'm not sure I can come up with a solution faster than you, given
that.  Since it is an improvement to performance for a test-only
environment on a feature that is already committed and not causing
problems for production environments, hopefully people will tolerate
a fix a day or two out.  If not, we'll have to revert and get it
into the first CF for v11.

--
Kevin Grittner




[HACKERS] mvstats triggers 32bit warnings

2017-04-07 Thread Andres Freund
Hi,

compiling on linux 32 bit I get a lot of warnings due to printf format.
I suspect most of them should just be cured by using %zd or %zu instead
of %ld.

/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c: In function 
‘statext_ndistinct_deserialize’:
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:241:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVNDistinct size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:241:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVNDistinct size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:277:13: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has 
type ‘unsigned int’ [-Wformat=]
  errmsg("invalid MVNDistinct size %ld (expected at least %ld)",
 ^
/home/andres/src/postgresql/src/include/utils/elog.h:107:14: note: in 
definition of macro ‘ereport_domain’
errfinish rest; \
  ^~~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:275:3: note: in 
expansion of macro ‘ereport’
   ereport(ERROR,
   ^~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:277:13: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘Size {aka unsigned int}’ [-Wformat=]
  errmsg("invalid MVNDistinct size %ld (expected at least %ld)",
 ^
/home/andres/src/postgresql/src/include/utils/elog.h:107:14: note: in 
definition of macro ‘ereport_domain’
errfinish rest; \
  ^~~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:275:3: note: in 
expansion of macro ‘ereport’
   ereport(ERROR,
   ^~~
In file included from /home/andres/src/postgresql/src/include/postgres.h:48:0,
 from 
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:14:
/home/andres/src/postgresql/src/backend/statistics/dependencies.c: In function 
‘statext_dependencies_deserialize’:
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:514:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:514:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:550:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid dependencies size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:550:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has 
type ‘Size {aka unsigned int}’ [-Wformat=]
   elog(ERROR, "invalid dependencies size %ld (expected at least %ld)",
   ^
- Andres




Re: [HACKERS] Compilation warning with MSVC in pg_depend.c

2017-04-07 Thread Peter Eisentraut
On 4/6/17 23:03, Michael Paquier wrote:
> Hi all,
> 
> I am getting the following warning with MSVC 2010 for what is a result
> of 3217327 (identity columns):
>   c:\users\michael\git\postgres\src\backend\catalog\pg_depend.c(615):
> warning C4715: 'getOwnedSequence' : not all control paths return a
> value [C:\Users\michael\git\postgres\postgres.vcxproj]
> 
> I agree that this compiler is dumb after looking at the code, but
> could it be possible to silence this warning with something like the
> attached?

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Making clausesel.c Smarter

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 2:28 AM, David Rowley wrote:
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound == DEFAULT_INEQ_SEL)
>> + {
>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL implies we have no stats */
>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +   notnullsel = 1.0 - nullsel;
>> +
>> +   /* Adjust for double-exclusion of NULLs */
>> +   s2 += nullsel;
>> +   if (s2 <= 0.0) {
>> +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +  {
>> +   /* Most likely contradictory clauses found */
>> +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +  }
>> +  else
>> +  {
>> +   /* Could be a rounding error */
>> +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +  }
>> +   }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>
> I notice this does change the estimates for contradictory clauses such as:
>
> create table a (value int);
> insert into a select x/100 from generate_Series(1,10000) x;
> analyze a;
> explain analyze select * from a where value between 101 and -1;
>
> We now get:
>
>  QUERY PLAN
> -
>  Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
> time=1.904..1.904 rows=0 loops=1)
>Filter: ((value >= 101) AND (value <= '-1'::integer))
>Rows Removed by Filter: 10000
>  Planning time: 0.671 ms
>  Execution time: 1.950 ms
> (5 rows)
>
> where before we'd get:
>
>   QUERY PLAN
> --
>  Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
> time=0.903..0.903 rows=0 loops=1)
>Filter: ((value >= 101) AND (value <= '-1'::integer))
>Rows Removed by Filter: 10000
>  Planning time: 0.162 ms
>  Execution time: 0.925 ms
> (5 rows)
>
> Which comes from the 10000 * 0.005 selectivity estimate from tuples *
> DEFAULT_RANGE_INEQ_SEL
>
> I've attached a patch to this effect.

+/*
+ * A zero or slightly negative s2 should be converted into
+ * a small positive value; we probably are dealing with a
+ * very tight range and got a bogus result due to roundoff
+ * errors. However, if s2 is very negative, then we
+ * probably have default selectivity estimates on one or
+ * both sides of the range that we failed to recognize
+ * above for some reason.
+ */
+if (s2 <= 0.0)

That comment seems outdated

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-07 Thread Joe Conway
On 04/07/2017 11:37 AM, Mike Palmiotto wrote:
>>> I found some missing bits in the 0002 patch -- new version attached.
>>> Will wait on new regression tests before committing, but I expect we'll
>>> have those by end of today and be able to commit the rest tomorrow.
>>
>> Attached are the regression test updates for partitioned tables.
> 
> Actually attached this time.

Based on my review and testing of the 0002 patch I believe it is
correct. However Mike and I just went through the regression test patch
line by line and there are issues he needs to address -- there is no way
that is happening by tonight as the output is very verbose and we need
to be sure we are both testing the correct things and getting the
correct behaviors.

Based on that I can:

1) commit the 0002 patch now before the feature freeze and follow up
   with the regression test patch when ready in a couple of days
2) hold off on both patches until ready
3) push both patches to the next commitfest/pg11

Some argue this is an open issue against the new partitioning feature in
pg10 and therefore should be addressed now, and others do not. I can see
both sides of that argument.

In any case, thoughts on what to do?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Andres Freund
Hi,

On 2017-04-07 23:00:01 +0200, Erik Rijkers wrote:
> On 2017-04-07 22:50, Andres Freund wrote:
> > On 2017-04-07 22:47:55 +0200, Erik Rijkers wrote:
> > > monitoring.sgml has one </row> tag missing
> > 
> > Is that actually an issue? SGML allows skipping certain close tags, and
> > IIRC row is one of them.  We'll probably move to xml at some point not too
> > far away, but I don't think it makes much sense to fix these one-by-one.
> 
> 
> Well, I have only used  make oldhtml  before now so maybe I am doing
> something wrong.
> 
> I try to run  make html.
> 
> First, I got this (just showing first few of a 75x repeat):
> 
> $ time ( cd /home/aardvark/pg_stuff/pg_sandbox/pgsql.HEAD/doc/src/sgml;
> make html; )
> osx -D . -D . -x lower postgres.sgml >postgres.xml.tmp
> osx:monitoring.sgml:1278:12:E: document type does not allow element "ROW"
> here

> xmllint --noout --valid postgres.xml
> xsltproc --stringparam pg.version '10devel'  stylesheet.xsl postgres.xml
> runtime error: file stylesheet-html-common.xsl line 41 element call-template
> The called template 'id.attribute' was not found.
> runtime error: file stylesheet-html-common.xsl line 41 element call-template


> Any hints welcome...
> 
> $ cat /etc/redhat-release
> CentOS release 6.6 (Final)

I think this means that your tools are too old.  See e.g.
http://archives.postgresql.org/message-id/21063.1491231996%40sss.pgh.pa.us

Greetings,

Andres Freund




Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Erik Rijkers

On 2017-04-07 22:50, Andres Freund wrote:

On 2017-04-07 22:47:55 +0200, Erik Rijkers wrote:

monitoring.sgml has one </row> tag missing


Is that actually an issue? SGML allows skipping certain close tags, and
IIRC row is one of them.  We'll probably move to xml at some point not too
far away, but I don't think it makes much sense to fix these 
one-by-one.



Well, I have only used  make oldhtml  before now so maybe I am doing 
something wrong.


I try to run  make html.

First, I got this (just showing first few of a 75x repeat):

$ time ( cd /home/aardvark/pg_stuff/pg_sandbox/pgsql.HEAD/doc/src/sgml;  
make html; )

osx -D . -D . -x lower postgres.sgml >postgres.xml.tmp
osx:monitoring.sgml:1278:12:E: document type does not allow element "ROW" here
osx:monitoring.sgml:1282:12:E: document type does not allow element "ROW" here
osx:monitoring.sgml:1286:12:E: document type does not allow element "ROW" here

...
osx:monitoring.sgml:1560:12:E: document type does not allow element "ROW" here
osx:monitoring.sgml:1564:13:E: end tag for "ROW" omitted, but OMITTAG NO was specified

osx:monitoring.sgml:1275:8: start tag was here
make: *** [postgres.xml] Error 1


After closing that tag with </row>,  make html  still fails:



$ time ( cd /home/aardvark/pg_stuff/pg_sandbox/pgsql.HEAD/doc/src/sgml;  
make html; )

osx -D . -D . -x lower postgres.sgml >postgres.xml.tmp
'/opt/perl-5.24/bin/perl' -p -e 's/\[(aacute|acirc|aelig|agrave|amp|aring|atilde|auml|bull|copy|eacute|egrave|gt|iacute|lt|mdash|nbsp|ntilde|oacute|ocirc|oslash|ouml|pi|quot|scaron|uuml) *\]/\&\1;/gi;' -e '$_ .= qq{<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">\n} if $. == 1;' postgres.xml.tmp >postgres.xml

rm postgres.xml.tmp
xmllint --noout --valid postgres.xml
xsltproc --stringparam pg.version '10devel'  stylesheet.xsl postgres.xml
runtime error: file stylesheet-html-common.xsl line 41 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element call-template
The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element call-template
The called template 'id.attribute' was not found.
no result for postgres.xml
make: *** [html-stamp] Error 9

real4m23.641s
user4m22.304s
sys 0m0.914s


Any hints welcome...

thanks


$ cat /etc/redhat-release
CentOS release 6.6 (Final)





Re: [HACKERS] valgrind errors around dsa.c

2017-04-07 Thread Thomas Munro
On Sat, Apr 8, 2017 at 4:49 AM, Andres Freund  wrote:
> Hi,
>
> newly added tests exercise parallel bitmap scans.  And they trigger
> valgrind errors:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-07%2007%3A10%3A01
>
>
> ==4567== VALGRINDERROR-BEGIN
> ==4567== Conditional jump or move depends on uninitialised value(s)
> ==4567==at 0x5FD62A: check_for_freed_segments (dsa.c:2219)
> ==4567==by 0x5FD97E: dsa_get_address (dsa.c:934)
> ==4567==by 0x5FDA2A: init_span (dsa.c:1339)
> ==4567==by 0x5FE6D1: ensure_active_superblock (dsa.c:1696)
> ==4567==by 0x5FEBBD: alloc_object (dsa.c:1452)
> ==4567==by 0x5FEBBD: dsa_allocate_extended (dsa.c:693)
> ==4567==by 0x3C7A83: pagetable_allocate (tidbitmap.c:1536)
> ==4567==by 0x3C7A83: pagetable_create (simplehash.h:342)
> ==4567==by 0x3C7A83: tbm_create_pagetable (tidbitmap.c:323)
> ==4567==by 0x3C8DAD: tbm_get_pageentry (tidbitmap.c:1246)
> ==4567==by 0x3C98A1: tbm_add_tuples (tidbitmap.c:432)
> ==4567==by 0x22510C: btgetbitmap (nbtree.c:460)
> ==4567==by 0x21A8D1: index_getbitmap (indexam.c:726)
> ==4567==by 0x38AD48: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:91)
> ==4567==by 0x37D353: MultiExecProcNode (execProcnode.c:621)
> ==4567==  Uninitialised value was created by a heap allocation
> ==4567==at 0x602FD5: palloc (mcxt.c:872)
> ==4567==by 0x5FF73B: create_internal (dsa.c:1242)
> ==4567==by 0x5FF8F5: dsa_create_in_place (dsa.c:473)
> ==4567==by 0x37CA32: ExecInitParallelPlan (execParallel.c:532)
> ==4567==by 0x38C324: ExecGather (nodeGather.c:152)
> ==4567==by 0x37D247: ExecProcNode (execProcnode.c:551)
> ==4567==by 0x39870F: ExecNestLoop (nodeNestloop.c:156)
> ==4567==by 0x37D1B7: ExecProcNode (execProcnode.c:512)
> ==4567==by 0x3849D4: fetch_input_tuple (nodeAgg.c:686)
> ==4567==by 0x387764: agg_retrieve_direct (nodeAgg.c:2306)
> ==4567==by 0x387A11: ExecAgg (nodeAgg.c:2117)
> ==4567==by 0x37D217: ExecProcNode (execProcnode.c:539)
> ==4567==
>
> It could be that these are spurious due to shared memory - valgrind
> doesn't track definedness across processes - but the fact that memory
> allocated by palloc is the source of the undefined memory makes me doubt
> that.

Thanks.  Will post a fix for this later today.
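
Presumably something along these lines -- a sketch under the assumption
(not confirmed here) that the uninitialized value is the backend-local
freed-segment counter that check_for_freed_segments() compares against
the shared one:

	/* sketch (assumption, not the actual commit): create_internal()
	 * palloc()s the dsa_area, so the counter later read by
	 * check_for_freed_segments() must be set explicitly before use */
	area->freed_segment_counter = 0;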

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Andres Freund
On 2017-04-07 22:47:55 +0200, Erik Rijkers wrote:
> monitoring.sgml has one </row> tag missing

Is that actually an issue? SGML allows skipping certain close tags, and
IIRC row is one of them.  We'll probably move to xml at some point not too
far away, but I don't think it makes much sense to fix these one-by-one.

Greetings,

Andres Freund




[HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Erik Rijkers

monitoring.sgml has one </row> tag missing

--- doc/src/sgml/monitoring.sgml.orig	2017-04-07 22:37:55.388708334 +0200
+++ doc/src/sgml/monitoring.sgml	2017-04-07 22:38:16.582047695 +0200
@@ -1275,6 +1275,7 @@
      <row>
       <entry><literal>ProcArrayGroupUpdate</literal></entry>
       <entry>Waiting for group leader to clear transaction id at transaction end.</entry>
+     </row>
      <row>
       <entry><literal>SafeSnapshot</literal></entry>
       <entry>Waiting for a snapshot for a READ ONLY DEFERRABLE transaction.</entry>



Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Thomas Munro
On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner  wrote:
> On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund  wrote:
>
>> I'd rather fix the issue, than remove the tests entirely.  Seems quite
>> possible to handle blocking on Safesnapshot in a similar manner as 
>> pg_blocking_pids?
>
> I'll see what I can figure out.

Ouch.  These are the other ways I thought of to achieve this:

https://www.postgresql.org/message-id/CAEepm%3D1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN%3D0LY3pJ49Ksg%40mail.gmail.com

I'd be happy to write one of those, but it may take a day as I have
some other commitments.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] BRIN desummarization writes junk WAL records

2017-04-07 Thread Alvaro Herrera
Tom Lane wrote:

> The proximate cause of the exception seems to be that
> brinSetHeapBlockItemptr is being passed pagesPerRange = 0,
> which is problematic since HEAPBLK_TO_REVMAP_INDEX tries to
> divide by that.  Looking one level down, the bogus value
> seems to be coming out of an xl_brin_desummarize WAL record:
> 
> (gdb) f 1
> #1  0x00478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
> at brin_xlog.c:274
> 274 brinSetHeapBlockItemptr(buffer, xlrec->pagesPerRange, 
> xlrec->heapBlk, iptr);
> (gdb) p *xlrec
> $1 = {pagesPerRange = 0, heapBlk = 0, regOffset = 1}
> 
> This is, perhaps, not unrelated to the fact that
> brinRevmapDesummarizeRange doesn't seem to be bothering to fill
> that field of the record.

Absolutely.

> BTW, is it actually sensible that xl_brin_desummarize's heapBlk
> is declared OffsetNumber and not BlockNumber?  If there's a reason
> why that's correct, the field name seems damn misleading.

Nah, just an oversight (against which the compiler doesn't protect.)

Fixed both problems.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Alvaro Herrera
Peter Geoghegan wrote:

> My offer to work with you on amcheck verification of WARM invariants
> remains open. If nothing else, structuring things so that verification
> is possible may clarify your design. Formalizing the preconditions,
> postconditions, and legal states for on-disk structures might just be
> a useful exercise, even if verification never actually finds a
> problem.

Agreed.  Thanks much.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Andres Freund
Hi,

I've *not* read the history of this thread.  So I really might be
missing some context.


> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> From: Claudio Freire 
> Date: Mon, 12 Sep 2016 23:36:42 -0300
> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> 
> Turn the dead_tuples array into a structure composed of several
> exponentially bigger arrays, to enable usage of more than 1GB
> of work mem during vacuum and thus reduce the number of full
> index scans necessary to remove all dead tids when the memory is
> available.

>   * We are willing to use at most maintenance_work_mem (or perhaps
>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
> - * initially allocate an array of TIDs of that size, with an upper limit that
> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>   * depends on table size (this limit ensures we don't allocate a huge area
> - * uselessly for vacuuming small tables).  If the array threatens to overflow,
> - * we suspend the heap scan phase and perform a pass of index cleanup and page
> - * compaction, then resume the heap scan with an empty TID array.
> + * uselessly for vacuuming small tables). Additional arrays of increasingly
> + * large sizes are allocated as they become necessary.
> + *
> + * The TID array is thus represented as a list of multiple segments of
> + * varying size, beginning with the initial size of up to 128MB, and growing
> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
> + * is used up.

When the chunk size is 128MB, I'm a bit unconvinced that using
exponential growth is worth it. The allocator overhead can't be
meaningful in comparison to collecting 128MB dead tuples, the potential
waste is pretty big, and it increases memory fragmentation.


> + * Lookup in that structure proceeds sequentially in the list of segments,
> + * and with a binary search within each segment. Since segment's size grows
> + * exponentially, this retains O(N log N) lookup complexity.

N log N is a horrible lookup complexity.  That's the complexity of
*sorting* an entire array.  I think you might be trying to argue that
it's log(N) * log(N)? Once log(n) for the exponentially growing size of
segments, one for the binary search?

Afaics you could quite easily make it O(2 log(N)) by simply also doing
binary search over the segments.  Might not be worth it due to the small
constant involved normally.
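
A minimal sketch of that idea (assuming each segment's last_dead_tuple
upper bound is kept valid for every segment, per the point further down
about maintaining it eagerly):

	/*
	 * Sketch only: binary-search the segment list on each segment's
	 * last_dead_tuple upper bound, then binary-search within the segment;
	 * both steps are O(log n), giving the O(2 log(N)) mentioned above.
	 */
	static DeadTuplesSegment *
	vac_find_segment(DeadTuplesSegment *segs, int num_segs, ItemPointer tid)
	{
		int			lo = 0,
					hi = num_segs - 1;

		while (lo < hi)
		{
			int			mid = (lo + hi) / 2;

			if (ItemPointerCompare(tid, &segs[mid].last_dead_tuple) > 0)
				lo = mid + 1;	/* tid sorts after everything in segs[mid] */
			else
				hi = mid;
		}
		return &segs[lo];		/* then bsearch() segs[lo].dt_tids */
	}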


> + * If the array threatens to overflow, we suspend the heap scan phase and
> + * perform a pass of index cleanup and page compaction, then resume the heap
> + * scan with an array of logically empty but already preallocated TID segments
> + * to be refilled with more dead tuple TIDs.

Hm, it's not really the array that overflows, it's m_w_m that'd be
exceeded, right?


>  /*
> + * Minimum (starting) size of the dead_tuples array segments. Will allocate
> + * space for 128MB worth of tid pointers in the first segment, further segments
> + * will grow in size exponentially. Don't make it too small or the segment list
> + * will grow bigger than the sweetspot for search efficiency on big vacuums.
> + */
> +#define LAZY_MIN_TUPLES  Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))

That's not really the minimum, no? s/MIN/INIT/?


> +typedef struct DeadTuplesSegment
> +{
> + int             num_dead_tuples;    /* # of entries in the segment */
> + int             max_dead_tuples;    /* # of entries allocated in the segment */
> + ItemPointerData last_dead_tuple;    /* Copy of the last dead tuple (unset
> +                                      * until the segment is fully
> +                                      * populated) */
> + unsigned short  padding;
> + ItemPointer     dt_tids;            /* Array of dead tuples */
> +} DeadTuplesSegment;

Whenever padding is needed, it should have an explanatory comment.  It's
certainly not obvious to me why it's needed here.


> @@ -1598,6 +1657,11 @@ lazy_vacuum_index(Relation indrel,
>   ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
>   ivinfo.strategy = vac_strategy;
>  
> + /* Finalize the current segment by setting its upper bound dead tuple */
> + seg = DeadTuplesCurrentSegment(vacrelstats);
> + if (seg->num_dead_tuples > 0)
> + seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];

Why don't we just maintain this here, for all of the segments?  Seems a
bit easier.
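
Concretely, a sketch of that alternative, presumably in the patch's
lazy_record_dead_tuple() (field names taken from the quoted struct):

    /* keep last_dead_tuple current on every append, so no finalize
     * step is needed before starting an index-cleanup pass */
    seg->dt_tids[seg->num_dead_tuples] = *itemptr;
    seg->last_dead_tuple = *itemptr;
    seg->num_dead_tuples++;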


> @@ -1973,7 +2037,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
>  static void
>  lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
>  {
> - long    maxtuples;
> + long    maxtuples,
> +         mintuples;
>   int vac_work_mem = IsAutoV

Re: [HACKERS] Undefined psql variables

2017-04-07 Thread Pavel Stehule
2017-04-07 21:04 GMT+02:00 Fabien COELHO :

>
> Hello Pavel,
>
> I wish I could have an explanation about why the :?varname (or some other
>>> variant) syntax I suggested has a "namespace" issue.
>>>
>>> The advantage that I see is that although it is obviously ugly, it is
>>> ugly in the continuity of the various :["'?]varname syntaxes already
>>> offered, and it allows us to get rid of "defined varname" which does not
>>> look like SQL. A second advantage is that with the "defined" proposal
>>>
>>
>> I don't think this argument is valid - \if doesn't look like SQL either.
>>
>
> Sure. I'm talking about the expressions after the "\if" which should be as
> close to SQL as possible, I think. At least that is what Tom required about
> the expression syntax in pgbench, and I tend to agree that psql should
> avoid mixing in another language if possible.
>
>\if defined var1 and defined var2 or defined var3 and sqlrt() >= ..
>>>
>>> Would probably never work, as it cannot be embedded in another
>>> expression, while it would work with
>>>
>>>\if :?var1 and :?var2 or :?var3 and ...
>>>
>> I don't see any reason why the first should not work and the second
>> should work
>>
>
> Because of the mix of client-side and server-side stuff which needs to be
> interpreted. Let us consider:
>
>   \if EXISTS (SELECT * FROM tbl WHERE id=3) AND defined foo
>
> The "exists" is obviously executed server-side, but "defined foo" needs to
> be interpreted client-side, which means that some client-side parser would
> have to be able to catch it in the middle of everything else. This example
> also illustrates my "does not look like SQL" point, as the first part is
> clearly SQL and the part after AND is not.
>
> With the second approach, ... "AND :?foo", the ":?foo" reference would be
> substituted directly by psql lexer and replaced on the fly by the answer,
> resulting in "AND TRUE" or "AND FALSE" depending, then the whole result
> (from EXISTS to TRUE/FALSE) could be interpreted server side to get an
> answer.
>
> Basically, catching :?varname seems easier/safer than catching "defined
> varname". I think that Tom's intent is that "defined" expressions could
> not be mixed with server-side SQL stuff, but I do not see why not: it is
> easy to imagine use cases where mixing them would make sense.
>
>> I have a different opinion - the condition expression should not
>> necessarily be SQL. This language is oriented to client-side operations.
>> What is the benefit of server-side expressions?
>>
>
> Because I think it is legitimate to be able to write things like:
>
>   \if NOT pg_extension_is_loaded('units')
> \echo 'this application requires the great units extension'
> \q
>   \endif
>
>   \if (SELECT version FROM app_version) >= 2.0
> \echo 'application already installed at 2.0'
> \q
>   \endif
>
>
Your proposal disallows client-side expressions. I agree that it is not
possible to mix server-side and client-side expressions. But I am sceptical
that the benefit of server-side expressions is higher than the loss of
client-side expressions. If we disallow server-side expressions, then your
examples are only two lines longer, but the implementation can be much
simpler.

SELECT version FROM app_version
\gset
\if :version >= 2.0
 ...

Still, I don't think server-side expressions in \if are a good idea.

Regards

Pavel




> --
> Fabien.
>


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2017 at 12:28 PM, Alvaro Herrera
 wrote:
> Peter Geoghegan wrote:
>> On Fri, Apr 7, 2017 at 11:37 AM, Andres Freund  wrote:
>> > Write Amplification Reduction Method (WARM)
>> > - fair number of people don't think it's ready for v10.
>
> Given the number of votes against putting this on pg10, I am going to
> back off from this patch now, with an eye towards putting it in pg11 as
> soon as the tree opens.  Either I or Pavan is going to post another
> version of this patch series within the next couple of weeks, so that
> others can base their testing, review and suggestions on it.

My offer to work with you on amcheck verification of WARM invariants
remains open. If nothing else, structuring things so that verification
is possible may clarify your design. Formalizing the preconditions,
postconditions, and legal states for on-disk structures might just be
a useful exercise, even if verification never actually finds a
problem.

I anticipate that amcheck verification will become my main focus for
Postgres 11, in any case.

-- 
Peter Geoghegan





Re: [HACKERS] [sqlsmith] Unpinning error in parallel worker

2017-04-07 Thread Robert Haas
On Wed, Apr 5, 2017 at 10:18 AM, Robert Haas  wrote:
>>> Ugh, OK.  I committed this, but I think this whole file needs a visit
>>> from the message style police.
>>
>> Like this?
>
> I was thinking of maybe not creating two separate (translatable)
> messages, and just using "could not attach to dynamic shared area" for
> both.

Done that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 3:30 PM, Andres Freund  wrote:
> On 2017-04-07 16:28:03 -0300, Alvaro Herrera wrote:
>> Peter Geoghegan wrote:
>> > > - can't move to next fest because it's waiting-on-author, which doesn't
>> > >   allow that.  Doesn't strike me as a useful restriction.
>> >
>> > I agree that that CF app restriction makes little sense.
>>
>> What the restriction means is that if a patch is in waiting-on-author,
>> the proper "close" action is to return-with-feedback.  There is no point
>> in moving the patch to the next commitfest if there is no further patch
>> version.
>
> That's true if the patch has been in that state for a while, but if you
> find some relatively minor issues, and then move it soon after, it seems
> to make sense to keep it open in the next CF.

In an ideal world, we wouldn't do that.  Of course, we do not live in
an ideal world...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Andres Freund
On 2017-04-07 16:28:03 -0300, Alvaro Herrera wrote:
> Peter Geoghegan wrote:
> > > - can't move to next fest because it's waiting-on-author, which doesn't
> > >   allow that.  Doesn't strike me as a useful restriction.
> > 
> > I agree that that CF app restriction makes little sense.
> 
> What the restriction means is that if a patch is in waiting-on-author,
> the proper "close" action is to return-with-feedback.  There is no point
> in moving the patch to the next commitfest if there is no further patch
> version.

That's true if the patch has been in that state for a while, but if you
find some relatively minor issues, and then move it soon after, it seems
to make sense to keep it open in the next CF.

- Andres




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Apr 7, 2017 at 11:37 AM, Andres Freund  wrote:
> > Write Amplification Reduction Method (WARM)
> > - fair number of people don't think it's ready for v10.

Given the number of votes against putting this on pg10, I am going to
back off from this patch now, with an eye towards putting it in pg11 as
soon as the tree opens.  Either I or Pavan is going to post another
version of this patch series within the next couple of weeks, so that
others can base their testing, review and suggestions on it.


> > - can't move to next fest because it's waiting-on-author, which doesn't
> >   allow that.  Doesn't strike me as a useful restriction.
> 
> I agree that that CF app restriction makes little sense.

What the restriction means is that if a patch is in waiting-on-author,
the proper "close" action is to return-with-feedback.  There is no point
in moving the patch to the next commitfest if there is no further patch
version.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 2:33 PM, Robert Haas  wrote:
> On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
>  wrote:
>> On 2017/04/01 1:32, Jeff Janes wrote:
>>> On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
>>> mailto:fujita.ets...@lab.ntt.co.jp>> wrote:
>>> Done.  Attached is a new version of the patch.
>>> Is the fix for 9.6.3 going to be just a back port of this, or will it
>>> look different?
>>
>> +1 for backporting; although that requires that GetForeignJoinPaths be
>> updated so that the FDW uses a new function to create an alternative local
>> join path (ie, CreateLocalJoinPath), that would make maintenance of the code
>> easy.
>
> Well, the problem here is that this breaks ABI compatibility.  If we
> applied this to 9.6, and somebody tried to use a previously-compiled
> FDW .so against a new server version, it would fail after the upgrade,
> because the new server wouldn't have GetExistingLocalJoinPath and also
> possibly because of the change to the structure of JoinPathExtraData.
> Maybe there's no better alternative, and maybe nothing outside of
> postgres_fdw is using this stuff anyway, but it seems like a concern.
>
> Also, the CommitFest entry for this seems to be a bit sketchy.  It
> claims that Tom Lane is a co-author of this patch which, AFAICS, is
> not the case.  It is listed under Miscellaneous rather than "Bug
> Fixes", which seems like a surprising decision.  And it uses a subject
> line which is neither very clear nor the same as the (also not
> particularly helpful) subject line of the email thread.

Looking at the code itself, I find the changes to joinpath.c rather alarming.

+/* Save hashclauses for possible use by the FDW */
+if (extra->consider_foreignjoin && hashclauses)
+extra->hashclauses = hashclauses;

A minor consideration is that this is fairly far away from where
hashclauses actually gets populated, so if someone later added an
early "return" statement to this function -- after creating some paths
-- it could subtly break join pushdown.  But I also think there's no
real need for this.  The loop that extracts hash clauses is simple
enough that we could just refactor it into a separate function, or if
necessary duplicate the logic.
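
Something along these lines, say (a rough sketch only: the helper name is
invented, and the loop body mirrors the existing checks in
hash_inner_and_outer(), including its clause_sides_match_join() test):

    static List *
    extract_hashclauses(List *restrictlist, bool isouterjoin,
                        RelOptInfo *outerrel, RelOptInfo *innerrel)
    {
        List       *hashclauses = NIL;
        ListCell   *l;

        foreach(l, restrictlist)
        {
            RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);

            /* for an outer join, only the join's own clauses may be hashed */
            if (isouterjoin && restrictinfo->is_pushed_down)
                continue;

            /* clause must be usable as a hash clause at all */
            if (!restrictinfo->can_join ||
                restrictinfo->hashjoinoperator == InvalidOid)
                continue;

            /* clause must reference each side of the join exactly once */
            if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
                continue;

            hashclauses = lappend(hashclauses, restrictinfo);
        }

        return hashclauses;
    }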

+/* Save first mergejoin data for possible use by the FDW */
+if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+{
+extra->mergeclauses = cur_mergeclauses;
+extra->outersortkeys = outerkeys;
+extra->innersortkeys = innerkeys;
+}

Similarly here.  select_outer_pathkeys_for_merge(),
find_mergeclauses_for_pathkeys(), and make_inner_pathkeys_for_merge()
are all extern, so there's nothing to keep CreateLocalJoinPath() from
just doing that work itself instead of getting help from joinpath,
which I guess seems better to me.  I think it's just better if we
don't burden joinpath.c with keeping little bits of data around that
CreateLocalJoinPath() can easily get for itself.
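
Roughly like this, that is (a sketch of what CreateLocalJoinPath() could do
for itself, following the all_pathkeys branch of sort_inner_and_outer();
extra->mergeclause_list is the field joinpath.c already fills in):

    List   *all_pathkeys;
    List   *mergeclauses;
    List   *innersortkeys;

    /* pick a canonical outer sort order covering all mergeclauses */
    all_pathkeys = select_outer_pathkeys_for_merge(root,
                                                   extra->mergeclause_list,
                                                   joinrel);
    /* select and order the mergeclauses usable with those pathkeys */
    mergeclauses = find_mergeclauses_for_pathkeys(root, all_pathkeys,
                                                  true,
                                                  extra->mergeclause_list);
    /* and build the matching inner sort order */
    innersortkeys = make_inner_pathkeys_for_merge(root, mergeclauses,
                                                  all_pathkeys);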

There appears to be no regression test covering the case where we get
a Merge Full Join with a non-empty list of mergeclauses.  Hash Full
Join is tested, as is Merge Full Join without merge clauses, but
there's no test for Merge Full Join with mergeclauses, and since that
is a separate code path it seems like it should be tested.

-/*
- * If either inner or outer path is a ForeignPath corresponding to a
- * pushed down join, replace it with the fdw_outerpath, so that we
- * maintain path for EPQ checks built entirely of local join
- * strategies.
- */
-if (IsA(joinpath->outerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->outerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->outerjoinpath = foreign_path->fdw_outerpath;
-}
-
-if (IsA(joinpath->innerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->innerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->innerjoinpath = foreign_path->fdw_outerpath;
-}

This logic is removed and not replaced with anything, but I don't see
what keeps this code...

+Path   *outer_path = outerrel->cheapest_total_path;
+Path   *inner_path = innerrel->cheapest_total_path;

...from picking a ForeignPath?

There's probably more to think about here, but those are my questions
on an initial read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Undefined psql variables

2017-04-07 Thread Fabien COELHO


Hello Pavel,


I wish I could have an explanation about why the :?varname (or some other
variant) syntax I suggested has a "namespace" issue.

The advantage that I see is that although it is obviously ugly, it is ugly
in the continuity of the various :["'?]varname syntaxes already offered and
it allows us to get rid of "defined varname" which does not look like SQL. A
second advantage is that with the "defined" proposal


I don't think this argument is valid - \if doesn't look like SQL either.


Sure. I'm talking about the expressions after the "\if" which should be as
close to SQL as possible, I think. At least that is what Tom required about
the expression syntax in pgbench, and I tend to agree that psql should
avoid mixing in another language if possible.



   \if defined var1 and defined var2 or defined var3 and sqlrt() >= ..

Would probably never work, as it cannot be embedded in another
expression, while it would work with

   \if :?var1 and :?var2 or :?var3 and ...


I don't see any reason why the first should not work and the second should work


Because of the mix of client-side and server-side stuff which needs to be 
interpreted. Let us consider:


  \if EXISTS (SELECT * FROM tbl WHERE id=3) AND defined foo

The "exists" is obviously executed server-side, but "defined foo" needs to
be interpreted client-side, which means that some client-side parser would
have to be able to catch it in the middle of everything else. This example
also illustrates my "does not look like SQL" point, as the first part is
clearly SQL and the part after AND is not.


With the second approach, ... "AND :?foo", the ":?foo" reference would be 
substituted directly by psql lexer and replaced on the fly by the answer, 
resulting in "AND TRUE" or "AND FALSE" depending, then the whole result 
(from EXISTS to TRUE/FALSE) could be interpreted server side to get an 
answer.


Basically, catching :?varname seems easier/safer than catching "defined
varname". I think that Tom's intent is that "defined" expressions could
not be mixed with server-side SQL stuff, but I do not see why not: it is
easy to imagine use cases where mixing them would make sense.



I have a different opinion - the condition expression should not
necessarily be SQL. This language is oriented to client-side operations.
What is the benefit of server-side expressions?


Because I think it is legitimate to be able to write things like:

  \if NOT pg_extension_is_loaded('units')
\echo 'this application requires the great units extension'
\q
  \endif

  \if (SELECT version FROM app_version) >= 2.0
\echo 'application already installed at 2.0'
\q
  \endif

--
Fabien.




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 2:53 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Andres Freund wrote:
>>> Write Amplification Reduction Method (WARM)
>>> - fair number of people don't think it's ready for v10.
>
>> I'm going over this one now with Pavan, with the intent of getting it in
>> committable shape.
>
> I have to agree with Andres that this is not something to push in, on the
> last day before feature freeze, when a number of people aren't comfortable
> with it.  It looks much more like a feature to push at the start of a
> development cycle.

I strongly agree.  Testing has found some noticeable regressions in
some cases as well, even if there were no outright bugs.  I'm frankly
astonished by the ongoing unwillingness to admit that the objections
(by multiple people) to this patch have any real merit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Tom Lane
Alvaro Herrera  writes:
> Andres Freund wrote:
>> Write Amplification Reduction Method (WARM)
>> - fair number of people don't think it's ready for v10.

> I'm going over this one now with Pavan, with the intent of getting it in
> committable shape.

I have to agree with Andres that this is not something to push in, on the
last day before feature freeze, when a number of people aren't comfortable
with it.  It looks much more like a feature to push at the start of a
development cycle.

regards, tom lane




Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Merlin Moncure
On Fri, Apr 7, 2017 at 12:54 PM, Tom Lane  wrote:
> Tomas Vondra  writes:
>> On 04/07/2017 06:31 PM, Merlin Moncure wrote:
>>> I think your math is off.  Looking at your attachments, planning time
>>> is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
>>> the order of your measured TPS.   How are you measuring TPS?
>
>> Not sure where you got the 0.056ms?
>
> I don't see that either, but:
>
>> What I see is this in the 9.3 explains:
>>   Total runtime: 0.246 ms
>> and this in those from 9.6:
>>   Planning time: 0.396 ms
>>   Execution time: 0.181 ms
>> That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash.
>
> 9.3's EXPLAIN did not measure planning time at all.  The "Total runtime"
> it reports corresponds to "Execution time" in the newer version.  So
> these numbers indicate that 9.6 is significantly *faster*, not slower,
> than 9.3, at least so far as execution of this one example is concerned.
>
> The OP may well be having some performance issue with 9.6, but the
> presented material completely fails to demonstrate it.

This smells like a problem with the test execution environment itself.
OP (if on linux), try:

pgbench -n -f <(echo "select * from subscriber where s_id = 100") -c 4 -T 10

...where pgbench is run from the database server (if pgbench is not in
the default path, you may have to qualify it).  This should give an
apples-to-apples comparison, or at least rule out certain
environmental considerations like the network stack.

If your client is running on Windows, one place to look is the TCP
stack.  In my experience TCP configuration issues are much more common
on Windows.  Any reasonably modern hardware can handle thousands and
thousands of transactions per second for a simple indexed select.

merlin




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Andres Freund
On 2017-04-07 15:45:33 -0300, Alvaro Herrera wrote:
> > Write Amplification Reduction Method (WARM)
> > - fair number of people don't think it's ready for v10.
> 
> I'm going over this one now with Pavan, with the intent of getting it in
> committable shape.
> 
> I may be biased, but the claimed performance gains are so large that I
> can't let it slip through without additional effort.

I strongly object to pushing it into v10.  The potential negative impact
of a patch that touches the on-disk representation is also pretty
large.  I think we'll have to discuss among a few more committers
whether we're ok with pushing this one.

Andres




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Tom Lane
Andres Freund  writes:
> I think it makes sense to go through those and see whether it's
> realistic to commit any of them.

> Unique Joins
> - Tom's discussing things with David, not sure.

Working on this one today.

> Generic type subscripting
> - still some review back and forth
> - probably should be punted

Yeah, I do not think we should hustle this in at the last minute.

regards, tom lane




Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Tom Lane
Joe Conway  writes:
> Apparently INSERT and SELECT on the parent partitioned table skip normal
> acl checks on the partitions. Is that intended behavior?

Yes, this matches normal inheritance behavior.

regards, tom lane




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2017 at 11:37 AM, Andres Freund  wrote:
> Write Amplification Reduction Method (WARM)
> - fair number of people don't think it's ready for v10.
> - can't move to next fest because it's waiting-on-author, which doesn't
>   allow that.  Doesn't strike me as a useful restriction.

I agree that that CF app restriction makes little sense.

> Indexes with Included Columns (was Covering + unique indexes)
> - Don't think concerns about #columns on truncated tuples have been
>   addressed.  Should imo be returned-with-feedback.

+1.

-- 
Peter Geoghegan




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Alvaro Herrera
Andres Freund wrote:

> Unique Joins
> - Tom's discussing things with David, not sure.

This one was already included-and-removed from 9.6; Tom had said he'd
give it priority during the current cycle, as I recall.  It seems unfair
that it's still waiting for review on the last day of pg10's last
commitfest.

> Write Amplification Reduction Method (WARM)
> - fair number of people don't think it's ready for v10.

I'm going over this one now with Pavan, with the intent of getting it in
committable shape.

I may be biased, but the claimed performance gains are so large that I
can't let it slip through without additional effort.

> BRIN optimize memory allocation
> - I think Alvaro has indicated that he wants to take care of that?

I am happy to see it move to pg11 to give priority to WARM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 1:37 PM, Andres Freund  wrote:

> I think it makes sense to go through those and see whether it's
> realistic to commit any of them.
>
> Ready for Committer:
>
> Add GUCs for predicate lock promotion thresholds:
> - claimed by Kevin, should be easy enough

I was planning on pushing this today.

-- 
Kevin Grittner




[HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Andres Freund
Hi,

When I started writing this, there were the following remaining CF
items, minus bugfix ones which aren't bound by the code freeze.

I think it makes sense to go through those and see whether it's
realistic to commit any of them.

Ready for Committer:

Add GUCs for predicate lock promotion thresholds:
- claimed by Kevin, should be easy enough

initdb configurable wal_segment_size
- parts have been committed
- significantly derailed by segment naming discussion
- possibly committable if we decide to skip the naming bit? But also a
  bit late given that it touches some quite sensitive code.

Create fdw_outerpath for foreign
- haven't really followed discussion
- only marked as ready-for-committer 2017-04-04

Vacuum: allow usage of more than 1GB of work mem
- hm, maybe?  Will take a look.

Unique Joins
- Tom's discussing things with David, not sure.

Push down more UPDATEs/DELETEs in postgres_fdw
- claimed by Robert?

postgres_fdw: support parameterized foreign joins
- think that depends on fdw_outerpath?


Waiting on Author:

SQL statements statistics counter view (pg_stat_sql)
- the code doesn't look quite ready
- don't think we quite have design agreement, e.g. I don't like where it's
  hooked into query execution

Write Amplification Reduction Method (WARM)
- fair number of people don't think it's ready for v10.
- can't move to next fest because it's waiting-on-author, which doesn't
  allow that.  Doesn't strike me as a useful restriction.

BRIN optimize memory allocation
- I think Alvaro has indicated that he wants to take care of that?

Indexes with Included Columns (was Covering + unique indexes)
- Don't think concerns about #columns on truncated tuples have been
  addressed.  Should imo be returned-with-feedback.


Needs-Review:

Better estimate merging for duplicate vars in clausesel.c
- has been submitted pretty late (2017-02-24) and discussion is ongoing
- I'm inclined to punt on this one to the next release, previous
  proposal along that line got some pushback

new plpgsql extra_checks
- Winner of the "most opaque CF title" award
- hasn't received a whole lot of review
- don't think we're even close to having design agreement

Generic type subscripting
- still some review back and forth
- probably should be punted


Any comments?

Greetings,

Andres




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-07 Thread Mike Palmiotto
On Thu, Apr 6, 2017 at 5:52 PM, Joe Conway  wrote:
> On 04/06/2017 12:35 PM, Tom Lane wrote:
>> Joe Conway  writes:
>>> Any thoughts on whether 0001a and 0001b ought to be backpatched? I'm
>>> thinking not given the lack of past complaints but it might make sense
>>> to do.
>>
>> I think 0001a absolutely needs to be, because it is fixing what is really
>> an ABI violation: sepgsql_needs_fmgr_hook is supposed to return our notion
>> of bool, but as things stand it's returning _Bool (which is why the
>> compiler is complaining).  Now we might get away with that on most
>> hardware, but on platforms where those are different widths, it's possible
>> to imagine function-return conventions that would make it fail.
>>
>> 0001b seems to only be needed for compilers that aren't smart enough
>> to see that tclass won't be referenced for RELKIND_INDEX, so it's
>> just cosmetic.
>
> Ok, committed/pushed that way.
>
> I found some missing bits in the 0002 patch -- new version attached.
> Will wait on new regression tests before committing, but I expect we'll
> have those by end of today and be able to commit the rest tomorrow.

Attached are the regression test updates for partitioned tables.

Thanks,
-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund  wrote:

> I'd rather fix the issue, than remove the tests entirely.  Seems quite
> possible to handle blocking on SafeSnapshot in a similar manner as
> pg_blocking_pids?

I'll see what I can figure out.

-- 
Kevin Grittner




Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-07 Thread Robert Haas
On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
 wrote:
> On 2017/04/01 1:32, Jeff Janes wrote:
>> On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
>> mailto:fujita.ets...@lab.ntt.co.jp>> wrote:
>> Done.  Attached is a new version of the patch.
>> Is the fix for 9.6.3 going to be just a back port of this, or will it
>> look different?
>
> +1 for backporting; although that requires that GetForeignJoinPaths be
> updated so that the FDW uses a new function to create an alternative local
> join path (ie, CreateLocalJoinPath), that would make maintenance of the code
> easy.

Well, the problem here is that this breaks ABI compatibility.  If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.

Also, the CommitFest entry for this seems to be a bit sketchy.  It
claims that Tom Lane is a co-author of this patch which, AFAICS, is
not the case.  It is listed under Miscellaneous rather than "Bug
Fixes", which seems like a surprising decision.  And it uses a subject
line which is neither very clear nor the same as the (also not
particularly helpful) subject line of the email thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Keith Fiske
On Fri, Apr 7, 2017 at 2:05 PM, Joe Conway  wrote:

> Apparently INSERT and SELECT on the parent partitioned table skip normal
> acl checks on the partitions. Is that intended behavior?
>
> 8<---
> test=# create user part_test;
> CREATE ROLE
> test=#
> test=# create table t1 (id int) partition by range ((id % 4));
> CREATE TABLE
> test=# create table t1_0 partition of t1 for values from (0) to (1);
> CREATE TABLE
> test=# create table t1_1 partition of t1 for values from (1) to (2);
> CREATE TABLE
> test=# create table t1_2 partition of t1 for values from (2) to (3);
> CREATE TABLE
> test=# create table t1_3 partition of t1 for values from (3) to (4);
> CREATE TABLE
> test=# grant all on TABLE t1 to part_test;
> GRANT
> test=# set session authorization part_test ;
> SET
> test=> select current_user;
>  current_user
> --
>  part_test
> (1 row)
>
> test=> insert into t1 values(0),(1),(2),(3);
> INSERT 0 4
> test=> insert into t1_0 values(0);
> ERROR:  permission denied for relation t1_0
> test=> insert into t1_1 values(1);
> ERROR:  permission denied for relation t1_1
> test=> insert into t1_2 values(2);
> ERROR:  permission denied for relation t1_2
> test=> insert into t1_3 values(3);
> ERROR:  permission denied for relation t1_3
> test=> select * from t1;
>  id
> 
>   0
>   1
>   2
>   3
> (4 rows)
>
> test=> select * from t1_0;
> ERROR:  permission denied for relation t1_0
> test=> select * from t1_1;
> ERROR:  permission denied for relation t1_1
> test=> select * from t1_2;
> ERROR:  permission denied for relation t1_2
> test=> select * from t1_3;
> ERROR:  permission denied for relation t1_3
> test=> reset session authorization;
> RESET
> test=# drop table if exists t1;
> DROP TABLE
> 8<---
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development
>
>
I encountered that as well when testing native partitioning in pg_partman. I
had to include the code from non-native partitioning that propagates
ownership/privileges from the parent to the children.
Another question to ask is whether changing privileges on the parent
automatically changes them for all children as well. I found this to be a
rather expensive operation to fix with plpgsql methods when the child count
grows high. That's why I have resetting all child table privileges as a
separate, manual function, and changes only apply to new partitions
automatically. Hopefully internally there's a more efficient way.

Keith


Re: [HACKERS] [PATCH] Remove unused argument in btree_xlog_split

2017-04-07 Thread Robert Haas
On Thu, Apr 6, 2017 at 8:21 AM, Aleksander Alekseev
 wrote:
> Hi Robert,
>
>> Hmm.  I don't see anything wrong with that, particularly, but it seems
>> we also don't need the distinction between XLOG_BTREE_SPLIT_L and
>> XLOG_BTREE_SPLIT_L_ROOT or likewise between XLOG_BTREE_SPLIT_R and
>> XLOG_BTREE_SPLIT_R_ROOT -- in which case I think this patch should go
>> a little further and do all of that together.
>
> Thank you for sharing your thoughts on this patch. Here is a new
> version.

Thanks.  Please add this to the next CommitFest, as there seems to be
no urgency (and some risk) in committing it right before feature
freeze.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Joe Conway
Apparently INSERT and SELECT on the parent partitioned table skip normal
acl checks on the partitions. Is that intended behavior?

8<---
test=# create user part_test;
CREATE ROLE
test=#
test=# create table t1 (id int) partition by range ((id % 4));
CREATE TABLE
test=# create table t1_0 partition of t1 for values from (0) to (1);
CREATE TABLE
test=# create table t1_1 partition of t1 for values from (1) to (2);
CREATE TABLE
test=# create table t1_2 partition of t1 for values from (2) to (3);
CREATE TABLE
test=# create table t1_3 partition of t1 for values from (3) to (4);
CREATE TABLE
test=# grant all on TABLE t1 to part_test;
GRANT
test=# set session authorization part_test ;
SET
test=> select current_user;
 current_user
--
 part_test
(1 row)

test=> insert into t1 values(0),(1),(2),(3);
INSERT 0 4
test=> insert into t1_0 values(0);
ERROR:  permission denied for relation t1_0
test=> insert into t1_1 values(1);
ERROR:  permission denied for relation t1_1
test=> insert into t1_2 values(2);
ERROR:  permission denied for relation t1_2
test=> insert into t1_3 values(3);
ERROR:  permission denied for relation t1_3
test=> select * from t1;
 id

  0
  1
  2
  3
(4 rows)

test=> select * from t1_0;
ERROR:  permission denied for relation t1_0
test=> select * from t1_1;
ERROR:  permission denied for relation t1_1
test=> select * from t1_2;
ERROR:  permission denied for relation t1_2
test=> select * from t1_3;
ERROR:  permission denied for relation t1_3
test=> reset session authorization;
RESET
test=# drop table if exists t1;
DROP TABLE
8<---

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] UPDATE of partition key

2017-04-07 Thread Andres Freund
On 2017-04-07 13:55:51 -0400, Robert Haas wrote:
> On Wed, Apr 5, 2017 at 5:54 AM, Amit Langote
>  wrote:
> > Marked as ready for committer.
> 
> Andres seems to have changed the status of this patch to "Needs
> review" and then, 30 seconds later, to "Waiting on author", but
> there's no actual email on the thread explaining what his concerns
> were.  I'm going to set this back to "Ready for Committer" and push it
> out to the next CommitFest.  I think this would be a great feature,
> but I think it's not entirely clear that we have consensus on the
> design, so let's revisit it for next release.

I was kind of looking for the appropriate status for "not entirely clear
that we have consensus on the design" - which isn't really
ready-for-committer, but not waiting-on-author either...

Greetings,

Andres Freund




Re: [HACKERS] UPDATE of partition key

2017-04-07 Thread Robert Haas
On Wed, Apr 5, 2017 at 5:54 AM, Amit Langote
 wrote:
> Marked as ready for committer.

Andres seems to have changed the status of this patch to "Needs
review" and then, 30 seconds later, to "Waiting on author", but
there's no actual email on the thread explaining what his concerns
were.  I'm going to set this back to "Ready for Committer" and push it
out to the next CommitFest.  I think this would be a great feature,
but I think it's not entirely clear that we have consensus on the
design, so let's revisit it for next release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Tom Lane
Tomas Vondra  writes:
> On 04/07/2017 06:31 PM, Merlin Moncure wrote:
>> I think your math is off.  Looking at your attachments, planning time
>> is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
>> the order of your measured TPS.   How are you measuring TPS?

> Not sure where you got the 0.056ms?

I don't see that either, but:

> What I see is this in the 9.3 explains:
>   Total runtime: 0.246 ms
> and this in those from 9.6:
>   Planning time: 0.396 ms
>   Execution time: 0.181 ms
> That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash.

9.3's EXPLAIN did not measure planning time at all.  The "Total runtime"
it reports corresponds to "Execution time" in the newer version.  So
these numbers indicate that 9.6 is significantly *faster*, not slower,
than 9.3, at least so far as execution of this one example is concerned.

The OP may well be having some performance issue with 9.6, but the
presented material completely fails to demonstrate it.

regards, tom lane




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Andres Freund
On 2017-04-07 12:49:22 -0500, Kevin Grittner wrote:
> On Fri, Apr 7, 2017 at 12:28 PM, Tom Lane  wrote:
> > Andrew Dunstan  writes:
> >> On 04/07/2017 12:57 PM, Andres Freund wrote:
> >>> I don't think any recent changes are supposed to affect deadlock
> >>> detector behaviour?
> >
> >> Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
> >> recent changes have made the isolation tests run much much longer.
> >
> > Ouch.  I see friarbird's run time for the isolation tests has gone from an
> > hour and change to over 5 hours in one fell swoop.  hyrax not much better.
> > Oddly, non-CCA animals don't seem to have changed much.
> >
> > Eyeing recent patches, it seems like the culprit must be Kevin's
> > addition to isolationtester's wait query:
> 
> Ouch.  Without this we don't have regression test coverage for the
> SERIALIZABLE READ ONLY DEFERRABLE code, but it's probably not worth
> adding 4 hours to any tests, even if it only shows up with
CLOBBER_CACHE_ALWAYS.  I assume the consensus is that I should revert
> it?

I'd rather fix the issue, than remove the tests entirely.  Seems quite
possible to handle blocking on SafeSnapshot in a similar manner as
pg_blocking_pids?
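
For instance, a hypothetical SQL-callable helper along these lines (purely
a sketch, not committed code; the function name is invented, and it reads
the wait event the backend advertises in its PGPROC, i.e. the same
information pg_stat_activity exposes, minus the expensive view underneath):

    #include "postgres.h"
    #include "fmgr.h"
    #include "pgstat.h"
    #include "storage/proc.h"
    #include "storage/procarray.h"

    Datum
    pg_is_waiting_for_safe_snapshot(PG_FUNCTION_ARGS)
    {
        int         pid = PG_GETARG_INT32(0);
        PGPROC     *proc = BackendPidGetProc(pid);
        const char *wait_event = NULL;

        /* unlocked read; good enough for a polling test harness */
        if (proc != NULL)
            wait_event = pgstat_get_wait_event(proc->wait_event_info);

        PG_RETURN_BOOL(wait_event != NULL &&
                       strcmp(wait_event, "SafeSnapshot") == 0);
    }

isolationtester could then prepare "SELECT pg_is_waiting_for_safe_snapshot($1)"
instead of joining against pg_stat_activity in the wait query.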

Greetings,

Andres Freund




Re: [HACKERS] parallel bitmapscan isn't exercised in regression tests

2017-04-07 Thread Andres Freund
On 2017-04-06 13:43:55 -0700, Andres Freund wrote:
> On 2017-04-06 10:00:32 +0530, Dilip Kumar wrote:
> > On Tue, Apr 4, 2017 at 5:51 AM, Dilip Kumar  wrote:
> > > Sure I can do that. In the attached patch, I only fixed the problem of not
> > > executing the bitmap test.  Now, I will add a few cases to cover other
> > > parts, especially the rescan and prefetching logic.
> > 
> > I have added two test cases to cover rescan, prefetch and lossy pages
> > logic for parallel bitmap.  I have removed the existing case because
> > these two new cases will be enough to cover that part as well.
> > 
> > Now, nodeBitmapHeapScan.c has 95.5% of line coverage.
> 
> Great!  Pushed.

At some point it might also be a good idea to compare parallel and
non-parallel results.  It's obviously quite possible to break semantics
with parallelism...

- Andres




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 12:28 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 04/07/2017 12:57 PM, Andres Freund wrote:
>>> I don't think any recent changes are supposed to affect deadlock
>>> detector behaviour?
>
>> Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
>> recent changes have made the isolation tests run much much longer.
>
> Ouch.  I see friarbird's run time for the isolation tests has gone from an
> hour and change to over 5 hours in one fell swoop.  hyrax not much better.
> Oddly, non-CCA animals don't seem to have changed much.
>
> Eyeing recent patches, it seems like the culprit must be Kevin's
> addition to isolationtester's wait query:

Ouch.  Without this we don't have regression test coverage for the
SERIALIZABLE READ ONLY DEFERRABLE code, but it's probably not worth
adding 4 hours to any tests, even if it only shows up with
CLOBBER_CACHE_ALWAYS.  I assume the consensus is that I should revert
it?

--
Kevin Grittner




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-04-07 Thread Robert Haas
On Thu, Mar 9, 2017 at 5:49 PM, Robert Haas  wrote:
> However, I just realized that in
> both this case and in the case of group XID clearing, we weren't
> advertising a wait event for the PGSemaphoreLock calls that are part
> of the group locking machinery.  I think we should fix that, because a
> quick test shows that can happen fairly often -- not, I think, as
> often as we would have seen LWLock waits without these patches, but
> often enough that you'll want to know.  Patch attached.

I've pushed the portion of this that relates to ProcArrayLock.  (I
know this hasn't been discussed much, but there doesn't really seem to
be any reason for anybody to object, and looking at just the
LWLock/ProcArrayLock wait events gives a highly misleading answer.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Tomas Vondra

On 04/07/2017 06:31 PM, Merlin Moncure wrote:

On Fri, Apr 7, 2017 at 5:16 AM, Prakash Itnal  wrote:

Hello,

We currently use postgres 9.3 in our products. Recently we upgraded to
postgres 9.6. But with 9.6 we have seen a drastic reduction in throughput.
After analyzing carefully I found that "planner time" in 9.6 is very high.
Below are the details:

Scenario:
1 Create a table with 10 rows.
2 Execute simple query: select * from subscriber where s_id = 100;
3 No update/delete/insert; tried vacuum, full vacuum; by default we enable
auto-vacuum

9.3: Avg of "Total runtime" : 0.24ms [actual throughput: 650 TPS]
9.6: Avg of Total time: 0.56ms (Avg of "Planning time" : 0.38ms + Avg of
"Execution time" : 0.18ms) [actual throughput: 80 TPS]


I think your math is off.  Looking at your attachments, planning time
is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
the order of your measured TPS.   How are you measuring TPS?



Not sure where you got the 0.056ms? What I see is this in the 9.3
explains:


 Total runtime: 0.246 ms

and this in those from 9.6:

 Planning time: 0.396 ms

 Execution time: 0.181 ms


That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash.

Obviously, this is "just" a 2x slowdown, so it does not match the drop from
650 to 80 tps. Also, 0.25ms would be ~4000 tps, so I guess this was just 
an example of a query that slowed down.


Prakash, are you using packages (which ones?), or have you compiled from 
sources? Can you provide pg_config output from both versions, and also 
'select * from pg_settings' (the full config)?


It might also be useful to collect profiles, i.e. (1) install debug 
symbols (2) run the query in a loop and (3) collect profiles from that 
one backend using 'perf'.


I assume you're using the same hardware / machine for the tests?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 04/07/2017 12:57 PM, Andres Freund wrote:
>> I don't think any recent changes are supposed to affect deadlock
>> detector behaviour?

> Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
> recent changes have made the isolation tests run much much longer.

Ouch.  I see friarbird's run time for the isolation tests has gone from an
hour and change to over 5 hours in one fell swoop.  hyrax not much better.
Oddly, non-CCA animals don't seem to have changed much.

Eyeing recent patches, it seems like the culprit must be Kevin's
addition to isolationtester's wait query:

@@ -231,6 +231,14 @@ main(int argc, char **argv)
appendPQExpBuffer(&wait_query, ",%s", backend_pids[i]);
appendPQExpBufferStr(&wait_query, "}'::integer[]");
 
+   /* Also detect certain wait events. */
+   appendPQExpBufferStr(&wait_query,
+" OR EXISTS ("
+"  SELECT * "
+"  FROM pg_catalog.pg_stat_activity "
+"  WHERE pid = $1 "
+"  AND wait_event IN ('SafeSnapshot'))");
+
res = PQprepare(conns[0], PREP_WAITING, wait_query.data, 0, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{

This seems a tad ill-considered.  We sweated a lot of blood not so long
ago to get the runtime of that query down, and this seems to have blown
it up again.  And done so for every isolation test case, not only the
ones where it could possibly matter.

regards, tom lane




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Andrew Dunstan


On 04/07/2017 12:57 PM, Andres Freund wrote:
> Hi,
>
> There's two machines that recently report changes in deadlock detector
> output:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018%3A58%3A04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2017-04-07%2004%3A20%3A01
>
> both just failed twice in a row:
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hyrax&br=HEAD
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=friarbird&br=HEAD
> without previous errors of the same ilk.
>
> I don't think any recent changes are supposed to affect deadlock
> detector behaviour?
>


Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
recent changes have made the isolation tests run much much longer.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 3:59 AM, Heikki Linnakangas  wrote:
> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.

I agree.  The point here isn't that we're using a better hashing
method, even if a lot of people *think* that's the point.  The point
is we're using a modern algorithm that has nice properties like "you
can't impersonate the client by stealing the verifier, or even by
snooping the exchange".

But "sasl" might be even better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] pgbench --progress-timestamp no longer works correctly

2017-04-07 Thread Jeff Janes
--progress-timestamp is supposed to make -P report a Unix Epoch time stamp,
for easy correlation with the entries in other log files (like the postgres
server log file using %n).

But that broke in this commit:

commit 1d63f7d2d180c8708bc12710254eb7b45823440f
Author: Tom Lane 
Date:   Mon Jan 2 13:41:51 2017 -0500

Use clock_gettime(), if available, in instr_time measurements.


The commit before that one changed pgbench to make it tolerate the change
in clock, but it overlooked --progress-timestamp.
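
One way to fix it, sketched below (this is an assumption about the shape of
the fix, not the committed patch): capture the offset between the instr_time
clock and the wall clock once at startup, then add it back whenever -P
prints a timestamp.

    #include <sys/time.h>
    /* pgbench's usual postgres_fe.h / portability/instr_time.h assumed */

    static int64 epoch_shift_us;    /* wall clock minus instr_time clock */

    static void
    remember_epoch_shift(void)
    {
        instr_time      now;
        struct timeval  wall;

        INSTR_TIME_SET_CURRENT(now);
        gettimeofday(&wall, NULL);
        epoch_shift_us = ((int64) wall.tv_sec * 1000000 + wall.tv_usec) -
                         (int64) INSTR_TIME_GET_MICROSEC(now);
    }

Any later INSTR_TIME_GET_MICROSEC() reading plus epoch_shift_us then yields
Unix-epoch microseconds, which is what --progress-timestamp wants to print.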

Cheers,

Jeff


[HACKERS] recent deadlock regression test failures

2017-04-07 Thread Andres Freund
Hi,

There's two machines that recently report changes in deadlock detector
output:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018%3A58%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2017-04-07%2004%3A20%3A01

both just failed twice in a row:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hyrax&br=HEAD
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=friarbird&br=HEAD
without previous errors of the same ilk.

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

- Andres




Re: [HACKERS] ExecPrepareExprList and per-query context

2017-04-07 Thread Tom Lane
Amit Langote  writes:
> Should ExecPrepareExprList also switch to estate->es_query_cxt?

Good point; I'm surprised we haven't noted any failures from that.
We surely want the entire result data structure to be in the same
memory context.  There are not very many callers right now, and
I guess they are all in the right context already (or we aren't
testing them :-().

> Or maybe
> ExecPrepareExpr could itself detect that passed-in node is a List and
> create the list of ExprState nodes by itself.

-1.  That's just breaking the API of ExecPrepareExpr.

regards, tom lane




[HACKERS] valgrind errors around dsa.c

2017-04-07 Thread Andres Freund
Hi,

newly added tests exercise parallel bitmap scans.  And they trigger
valgrind errors:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-07%2007%3A10%3A01


==4567== VALGRINDERROR-BEGIN
==4567== Conditional jump or move depends on uninitialised value(s)
==4567==at 0x5FD62A: check_for_freed_segments (dsa.c:2219)
==4567==by 0x5FD97E: dsa_get_address (dsa.c:934)
==4567==by 0x5FDA2A: init_span (dsa.c:1339)
==4567==by 0x5FE6D1: ensure_active_superblock (dsa.c:1696)
==4567==by 0x5FEBBD: alloc_object (dsa.c:1452)
==4567==by 0x5FEBBD: dsa_allocate_extended (dsa.c:693)
==4567==by 0x3C7A83: pagetable_allocate (tidbitmap.c:1536)
==4567==by 0x3C7A83: pagetable_create (simplehash.h:342)
==4567==by 0x3C7A83: tbm_create_pagetable (tidbitmap.c:323)
==4567==by 0x3C8DAD: tbm_get_pageentry (tidbitmap.c:1246)
==4567==by 0x3C98A1: tbm_add_tuples (tidbitmap.c:432)
==4567==by 0x22510C: btgetbitmap (nbtree.c:460)
==4567==by 0x21A8D1: index_getbitmap (indexam.c:726)
==4567==by 0x38AD48: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:91)
==4567==by 0x37D353: MultiExecProcNode (execProcnode.c:621)
==4567==  Uninitialised value was created by a heap allocation
==4567==at 0x602FD5: palloc (mcxt.c:872)
==4567==by 0x5FF73B: create_internal (dsa.c:1242)
==4567==by 0x5FF8F5: dsa_create_in_place (dsa.c:473)
==4567==by 0x37CA32: ExecInitParallelPlan (execParallel.c:532)
==4567==by 0x38C324: ExecGather (nodeGather.c:152)
==4567==by 0x37D247: ExecProcNode (execProcnode.c:551)
==4567==by 0x39870F: ExecNestLoop (nodeNestloop.c:156)
==4567==by 0x37D1B7: ExecProcNode (execProcnode.c:512)
==4567==by 0x3849D4: fetch_input_tuple (nodeAgg.c:686)
==4567==by 0x387764: agg_retrieve_direct (nodeAgg.c:2306)
==4567==by 0x387A11: ExecAgg (nodeAgg.c:2117)
==4567==by 0x37D217: ExecProcNode (execProcnode.c:539)
==4567==

It could be that these are spurious due to shared memory - valgrind
doesn't track definedness across processes - but the fact that memory
allocated by palloc is the source of the undefined memory makes me doubt
that.

Greetings,

Andres Freund




Re: [HACKERS] Compiler warning in costsize.c

2017-04-07 Thread Tom Lane
Michael Paquier  writes:
> Bah. This actually fixes nothing. Attached is a different patch that
> really addresses the problem, by removing the variable because we
> don't want planner_rt_fetch() to run for non-Assert builds.

I don't really like any of these fixes, because they take the code
further away from the structure used by all the other similar functions
in costsize.c, and they will be hard to undo whenever these functions
grow a reason to look at the RTE normally (outside asserts).

I'd be happier with something along the line of

RangeTblEntry *rte;
ListCell   *lc;

/* Should only be applied to base relations that are subqueries */
Assert(rel->relid > 0);
rte = planner_rt_fetch(rel->relid, root);
#ifdef USE_ASSERT_CHECKING
Assert(rte->rtekind == RTE_SUBQUERY);
#else
(void) rte;  /* silence unreferenced-variable complaints */
#endif

assuming that that actually does silence the warning on MSVC.

BTW, is it really true that only these two places produce such warnings
on MSVC?  I see about three dozen uses of PG_USED_FOR_ASSERTS_ONLY in our
tree, and I'd have expected all of those places to be causing warnings on
a compiler that doesn't have a way to understand that annotation.
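
For reference, the annotated spelling those places use looks like this
(PG_USED_FOR_ASSERTS_ONLY expands to an "unused" attribute on compilers
that have one, which is presumably what MSVC lacks):

RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;

/* Should only be applied to base relations that are subqueries */
Assert(rel->relid > 0);
rte = planner_rt_fetch(rel->relid, root);
Assert(rte->rtekind == RTE_SUBQUERY);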

regards, tom lane




Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Merlin Moncure
On Fri, Apr 7, 2017 at 5:16 AM, Prakash Itnal  wrote:
> Hello,
>
> We currently use postgres 9.3 in our products. Recently we upgraded to
> postgres 9.6. But with 9.6 we have seen a drastic reduction in throughput.
> After analyzing carefully I found that "planner time" in 9.6 is very high.
> Below are the details:
>
> Scenario:
> 1. Create a table with 10 rows.
> 2. Execute a simple query: select * from subscriber where s_id = 100;
> 3. No update/delete/insert; tried vacuum, full vacuum; by default we enable
> auto-vacuum
>
> 9.3: Avg of "Total runtime" : 0.24ms [actual throughput: 650 TPS]
> 9.6: Avg of Total time: 0.56ms (Avg of "Planning time" : 0.38ms + Avg of
> "Execution time" : 0.18ms) [actual throughput: 80 TPS]

I think your math is off.  Looking at your attachments, planning time
is 0.056ms, not 0.56ms.  That is nowhere near large enough to explain
throughput differences on the order of your measured TPS.  How are you
measuring TPS?

merlin




Re: [HACKERS] Undefined psql variables

2017-04-07 Thread Pavel Stehule
2017-04-07 9:52 GMT+02:00 Fabien COELHO :

>
> Hello Corey,
>
> \if defined varname
> \if sql boolean expression to send to server
> \if compare value operator value

>>>
>>> I'm still thinking:-)
>>>
>>> Independently of my aesthetic complaint against having a pretty
>>> unusual keyword prefix syntax, how would you envision a \set assignment
>>> variant? Would \if have a different expression syntax somehow?
>>>
>>
>> Any further thoughts?
>>
>
> My current opinion:
>
>  - I'm fine if \set stays as it is, i.e. no expression.
>
>  - I agree that some client-side expressions are needed, along the
>semantics suggested by Tom, i.e. definition and comparisons.
>
>  - I'm really against the prefix syntax suggested by Tom
>
>
> I wish I could have an explanation of why the :?varname (or some other
> variant) syntax I suggested has a "namespace" issue.
>
> The advantage that I see is that although it is obviously ugly, it is ugly
> in continuity with the various :["'?]varname syntaxes already offered, and
> it allows getting rid of "defined varname", which does not look like SQL. A
> second advantage is that, with the "defined" proposal,
>

I don't think this argument is valid - \if doesn't look like SQL either.


>
>\if defined var1 and defined var2 or defined var3 and sqlrt() >= ..
>
> would probably never work, as it cannot be embedded in another
> expression, while it would work with
>
>\if :?var1 and :?var2 or :?var3 and ...
>
>
I don't see any reason why the first should not work but the second should.


>
> Moreover, I would like the condition syntax to be basically SQL & psql
> variables, without explicit prefixes, with a transparent decision whether
> it is evaluated client side or server side.
>
> As client-side expressions are pretty simple, ISTM that some regex could
> be used for this purpose, eg for integer and boolean comparisons:
>
>  ^\s*\d+\s*(=|<>|!=|<|<=|>|>=)\s*\d+\s*$
>  ^\s*(bool...)\s*(=|<>|!=)\s*(bool...)\s*$
>  ^\s*(NOT\s*)?(bool...)\s*$
>
> So that one could just write the expression without having to say where
> it is evaluated, eg
>
>  \if :VERSION_NUM < 11
>
> Would lead to
>
>  \if 10 < 11
>
> Caught by the first regex, and evaluated with a few lines of code.
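
For illustration, the first of those checks really is only a few lines
of C on the psql side - a rough sketch using POSIX <regex.h>, with the
helper name invented here, not taken from the psql sources:

#include <regex.h>
#include <stdbool.h>

/* does expr look like "<integer> <comparison-op> <integer>"? */
static bool
is_client_side_int_comparison(const char *expr)
{
    const char *pattern =
        "^[[:space:]]*[0-9]+[[:space:]]*(=|<>|!=|<=|>=|<|>)"
        "[[:space:]]*[0-9]+[[:space:]]*$";
    regex_t     re;
    bool        matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;           /* treat a bad pattern as "no match" */
    matched = (regexec(&re, expr, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}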


I have a different opinion - the condition expression does not need to be
SQL. This language is oriented toward client-side operations. What is the
benefit of a server-side expression?

Regards

Pavel


>
>
> --
> Fabien.
>
>
>


Re: [HACKERS] Duplicate assignment in Unicode/convutils.pm

2017-04-07 Thread Heikki Linnakangas

On 04/07/2017 09:32 AM, Kyotaro HORIGUCHI wrote:

Hi, I found that convutils.pm contains a harmless duplicate
assignment.


my $out = {f => $fname, l => $.,
           code => hex($1),
           ucs => hex($2),
           comment => $4,
           direction => BOTH,
           f => $fname,
           l => $.
          };


Of course this is utterly harmless but wrong.

The attached patch fixes this, following the style of the other Perl
files nearby. No similar mistakes were found there.


Committed, thanks!

- Heikki





Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-07 Thread Jeff Janes
On Thu, Apr 6, 2017 at 7:04 PM, Michael Harris  wrote:

> Hello,
>
> Back in pg 9.2, we hacked a copy of pg_basebackup to add a command
> line option which would allow the user to specify an arbitrary
> external program (potentially including arguments) to be used to
> compress the tar backup.
>
> Our motivation was to be able to use pigz (parallel gzip
> implementation) to speed up the compression. It also allows using
> tools like bzip2, xz, etc instead of the inbuilt zlib.
>
> I never ended up submitting that upstream, but now it looks like I
> will have to repeat the exercise for 9.6, so I was wondering if such a
> feature would be welcomed.
>

I would welcome it.  I would really like to be able to use parallel pigz
and pxz.

You can stream the data into a compression tool of your choice as long as
you use tar mode and specify '-D -', but that is incompatible with
tablespaces and with xlog streaming, and so is not a very good solution.
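
For example, something along these lines works today (a sketch assuming
a tar-format backup of a cluster with no extra tablespaces, and pigz on
the PATH):

    pg_basebackup -F tar -D - | pigz -p 8 > base.tar.gz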

Cheers,

Jeff


Re: [HACKERS] [PATCH] Document the order of changing certain settings when using hot-standby servers

2017-04-07 Thread Yorick Peterse
Done! It can be found at https://commitfest.postgresql.org/14/1110/

Thanks for reviewing thus far :)

Yorick




Re: [HACKERS] [PATCH] Warn users about duplicate configuration parameters

2017-04-07 Thread David G. Johnston
On Fri, Apr 7, 2017 at 8:29 AM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> Andres, Tatsuo,
>
> Thank you for sharing your thoughts.
>
> > -1 - I frequently just override earlier parameters by adding an
> > include at the end of the file.  Also, with postgresql.auto.conf it's
> > even more common to override parameters.
>
> > -1 from me too, for the same reason Andres said.
>
> I see no problem here. After all, they're just warnings. We can even add
> a GUC that disables them, specifically for experienced users who know what
> they are doing. And/or add a special case for postgresql.auto.conf.
>
>
-1 for learning how the configuration system works via warning messages.

We've recently added pg_file_settings to provide a holistic view and the
docs cover the topic quite well.
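
For instance, a query along these lines lists every parameter that
appears more than once and shows which occurrence actually took effect:

    SELECT name, sourcefile, sourceline, setting, applied
    FROM pg_file_settings
    WHERE name IN (SELECT name FROM pg_file_settings
                   GROUP BY name HAVING count(*) > 1)
    ORDER BY name, sourceline;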

David J.


Re: [HACKERS] [PATCH] Warn users about duplicate configuration parameters

2017-04-07 Thread Euler Taveira
2017-04-07 12:14 GMT-03:00 Aleksander Alekseev :

> Recently I've discovered that if there are multiple values of the same
> parameter in postgresql.conf, PostgreSQL will silently use the last one.
> That doesn't look like the best approach to me. For instance, a user can
> find the first value in the config file and expect that it will be used, etc.
>

The Postgres configuration file concept is based on overriding parameter
values. It would be annoying if we emitted a warning for this feature. Also,
it is easy to see which file/line a parameter value came from, and you can
check for duplicates with a small script.
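
For example (a quick sketch that assumes uncommented settings start at
the beginning of a line):

    grep -oE '^[a-z_][a-z0-9_.]*' postgresql.conf | sort | uniq -d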


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training



Re: [HACKERS] [PATCH] Warn users about duplicate configuration parameters

2017-04-07 Thread David Steele
On 4/7/17 11:22 AM, Tatsuo Ishii wrote:
>>> Recently I've discovered that if there are multiple values of the same
>>> parameter in postgresql.conf, PostgreSQL will silently use the last one.
>>> That doesn't look like the best approach to me. For instance, a user can
>>> find the first value in the config file and expect that it will be used, etc.
>>>
>>> I suggest to warn users about duplicated parameters. Here is a
>>> corresponding patch.
>>>
>>> Thoughts?
>>
>> -1 - I frequently just override earlier parameters by adding an include
>> at the end of the file.  Also, with postgresql.auto.conf it's even more
>> common to override parameters.
> 
> -1 from me too, for the same reason Andres said.

-1 from me for the same reason.

-- 
-David
da...@pgmasters.net



