Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-27 Thread Ashutosh Bapat
On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
 wrote:
> On 2016/12/27 22:03, Ashutosh Bapat wrote:
>>
>> If mergejoin_allowed is true and mergeclauselist is non-NIL but
>> hashclauselist is NIL (that's rare, but there can be types that have
>> merge operators but not hash operators), we will end up returning NULL. I
>> think we want to create a merge join in that case. I think the order
>> of conditions should be: 1. check hashclause_list and create a hash join;
>> 2. else check whether merge is allowed and create a merge join. It looks
>> like that would cover all the cases: if there aren't any hash clauses and
>> also no merge clauses, we won't be able to implement a FULL join, so it
>> will get rejected during path creation itself.
>
>
> Right, maybe we can do that by doing similar things as in match_unsorted_outer
> and/or sort_inner_and_outer.  But as you mentioned, the case is rare, so the
> problem would be whether it's worth complicating the code (and if it is,
> whether we should do that in the first version of the function).

All I am requesting is changing the order of conditions. That doesn't
complicate the code.
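
To spell it out, the ordering I am suggesting boils down to something like
this (a toy standalone sketch with made-up names, not the actual function):

typedef enum { LOCAL_JOIN_NONE, LOCAL_JOIN_HASH, LOCAL_JOIN_MERGE } LocalJoinKind;

/*
 * Toy illustration of the proposed ordering; the enum and parameters are
 * stand-ins, not PostgreSQL planner types.
 */
static LocalJoinKind
choose_local_join(int n_hashclauses, int mergejoin_allowed, int n_mergeclauses)
{
	if (n_hashclauses > 0)
		return LOCAL_JOIN_HASH;		/* 1. hash clauses present: hash join */
	if (mergejoin_allowed && n_mergeclauses > 0)
		return LOCAL_JOIN_MERGE;	/* 2. else merge join, if allowed */

	/*
	 * Neither kind of clause: a FULL join cannot be implemented locally and
	 * would have been rejected during path creation already.
	 */
	return LOCAL_JOIN_NONE;
}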

>
>> I am wondering whether the easy and possibly correct solution here is to
>> not replace a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()?
>> If we don't do that, there won't be an error building the merge join plan,
>> and postgresRecheckForeignScan() would correctly route the EPQ checks to
>> the local plan available as the outer plan.
>
>
>>> That might be fine for PG9.6, but I'd like to propose replacing
>>> GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
>>> GetExistingLocalJoinPath might choose an overkill, merge or hash join
>>> path
>>> for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
>>> overhead at EPQ rechecks, and
>
>
>> The reason we chose to pick up an existing path was that the
>> discussion in thread [1] concluded the efficiency of the local plan
>> wasn't a concern for EPQ. Are we now saying something otherwise?
>
>
> No, I won't.  Usually, the overhead would be negligible, but in some cases
> where there are many concurrent updates, the overhead might not be
> negligible due to many EPQ rechecks.  So it would be better to have an
> efficient local plan.
>

All that the EPQ rechecks do is apply the join and other quals again
on the base relation rows. Will the choice of plan affect the efficiency?

>>> (2) choosing a local join path randomly from
>>> the rel's pathlist wouldn't be easy to understand.
>
>
>> Easy to understand for whom? Can you please elaborate?
>
>
> Users.  I think the ease of understanding for users is important.

I doubt users care much about whether an existing path is returned or
a new one is created, as long as they get one to stuff in fdw_outerpath.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Parallel Index-only scan

2016-12-27 Thread Amit Kapila
On Wed, Dec 28, 2016 at 12:18 PM, Rafia Sabih
 wrote:
> Rebased patch of parallel-index only scan based on the latest version of
> parallel index scan [1] is attached.
>
> [1]
> https://www.postgresql.org/message-id/CAA4eK1LiNi7_Z1%2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com
>

The link for the latest version is wrong; the correct link is:
https://www.postgresql.org/message-id/CAA4eK1KthrAvNjmB2cWuUHz%2Bp3ZTTtbD7o2KUw49PC8HK4r1Wg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Index-only scan

2016-12-27 Thread Rafia Sabih
Rebased patch of parallel-index only scan based on the latest version of
parallel index scan [1] is attached.

[1] https://www.postgresql.org/message-id/CAA4eK1LiNi7_Z1%2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com

On Sat, Dec 24, 2016 at 7:55 PM, Rafia Sabih 
wrote:

> Extremely sorry for the inconvenience caused, please find the attached
> file for the latest version of the patch.
>
> On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas 
> wrote:
>
>> On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane  wrote:
>> > Or in words of one syllable: please do not use nabble to post to the
>> > community mailing lists.
>>
>> Many of those words have two syllables, and one has four.
>>
>> Anyhow, I think three castigating emails from committers in a span of
>> 62 minutes is probably enough for the OP to get the point, unless
>> someone else REALLY feels the need to pile on.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>
> --
> Regards,
> Rafia Sabih
> EnterpriseDB: http://www.enterprisedb.com/
>



-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


parallel_index_only_v3.patch
Description: Binary data



Re: [HACKERS] ALTER TABLE parent SET WITHOUT OIDS and the oid column

2016-12-27 Thread Ashutosh Bapat
>>
>> We allow creating a user attribute named "oid", so you do not want to
>> search for the system attribute oid by name. Instead, search by attribute
>> number ObjectIdAttributeNumber.
>
> Good point.  Although, there can only be one of the two in a table at any
> given time - either the "oid" system column or user-defined column named
> "oid".  I was afraid that with the patch, the attinhcount of a
> user-defined oid column may get incremented twice, but before that could
> happen, one would get the error: "child table without OIDs cannot
> inherit from table with OIDs"
>
> create table foo (a int) with oids;
> create table bar (a int, oid int);
> alter table bar inherit foo;
> ERROR:  table "bar" without OIDs cannot inherit from table "foo" with OIDs
>
> And then:
>
> alter table bar set with oids;
> ERROR:  column "oid" of relation "bar" already exists
>

Right. But I think it's better to use the attribute number, in case the
code raising this error changes for some reason in the future. Or at least
we should add an Assert or an elog to make sure that we are updating the
system attribute OID and not a user-defined one.
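
Something along these lines is what I mean - an untested sketch just to
illustrate, not the actual patch:

	/* sketch; uses the usual backend syscache/htup/sysattr headers */
	HeapTuple	tuple;

	tuple = SearchSysCacheCopy2(ATTNUM,
								ObjectIdGetDatum(RelationGetRelid(child_rel)),
								Int16GetDatum(ObjectIdAttributeNumber));
	if (HeapTupleIsValid(tuple))
	{
		Form_pg_attribute childatt = (Form_pg_attribute) GETSTRUCT(tuple);

		/* belt and braces: must be the system column, never a user column "oid" */
		Assert(childatt->attnum == ObjectIdAttributeNumber);

		childatt->attinhcount++;
		/* ... then update pg_attribute as for ordinary inherited columns ... */
	}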

The code that updates attinhcount and then updates the catalogs is the
same for user-defined attributes and OID. Should we separate it out into
a function and use that function instead of duplicating the code? The
problem with duplicating the code is that one has to remember to update
both copies when changing it.

Your test uses table names starting with "_". I have not seen that
style in the test cases. Is it intentional?

Rest of the patch looks good to me.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] parallelize queries containing subplans

2016-12-27 Thread Amit Kapila
Currently, queries that have references to SubPlans or
AlternativeSubPlans are considered parallel-restricted.  I think we
can lift this restriction in many cases especially when SubPlans are
parallel-safe.  To make this work, we need to propagate the
parallel-safety information from path node to plan node and the same
could be easily done while creating a plan.  Another option could be
that instead of propagating parallel-safety information from path to
plan, we find out from the plan whether it is parallel-safe (doesn't
contain any parallel-aware node) by traversing the whole plan tree, but
I think that is a waste of cycles.  Once we have parallel-safety
information in the plan, we can use that for detection of
parallel-safe expressions in max_parallel_hazard_walker().  Finally,
we can pass all the subplans to workers during plan serialization in
ExecSerializePlan().  This will enable workers to execute subplans
that are referred to in the parallel part of the plan.  Now, we might
be able to optimize this so that we pass only the subplans referred to
in the parallel portion of the plan, but I am not sure it is worth the
trouble because it is a one-time cost and much smaller than other
things we do (like creating the DSM and launching workers).
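
(The serialization piece itself is tiny; conceptually it is just something
like the line below in ExecSerializePlan() - treat this as illustrative
rather than as the actual change:)

	/* sketch: hand the planner's subplan list to the workers as well */
	pstmt->subplans = estate->es_plannedstmt->subplans;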

Attached patch implements the above idea.  This will enable
parallelism for queries containing un-correlated subplans, an example
of which is as follows:

set parallel_tuple_cost=0;
set parallel_setup_cost=0;
set min_parallel_relation_size=50;

create table t1 (i int, j int, k int);
create table t2 (i int, j int, k int);

insert into t1 values (generate_series(1,10)*random(),
generate_series(5,50)*random(),
generate_series(8,80)*random());
insert into t2 values (generate_series(4,10)*random(),
generate_series(5,90)*random(),
generate_series(2,10)*random());


Plan without Patch
------------------
postgres=# explain select * from t1 where t1.i not in (select t2.i
from t2 where t2.i in (1,2,3));
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on t1  (cost=110.84..411.72 rows=8395 width=12)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..104.50 rows=2537 width=4)
           Filter: (i = ANY ('{1,2,3}'::integer[]))
(5 rows)

Plan with Patch
---------------
postgres=# explain select * from t1 where t1.i not in (select t2.i
from t2 where t2.i in (1,2,3));
                               QUERY PLAN
------------------------------------------------------------------------
 Gather  (cost=110.84..325.30 rows=8395 width=12)
   Workers Planned: 1
   ->  Parallel Seq Scan on t1  (cost=110.84..325.30 rows=4938 width=12)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Seq Scan on t2  (cost=0.00..104.50 rows=2537 width=4)
                 Filter: (i = ANY ('{1,2,3}'::integer[]))
(7 rows)

We have observed that Q-16 in TPC-H has been improved with the patch;
the analysis of the same will be shared by my colleague Rafia.

Now, we can further extend this to parallelize queries containing
correlated subplans like below:

explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..831049.09 rows=8395 width=12)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
           Filter: (i = t1.i)
(5 rows)

In the above query, the filter on t2 (i = t1.i) generates a Param node,
which is parallel-restricted, so such queries won't be able to use
parallelism even with the patch.  I think we can mark Params which
refer to the same query level as parallel-safe, and I think we have this
information (node->varlevelsup / phlevelsup / agglevelsup) available
when we replace correlation vars (SS_replace_correlation_vars).  The
reason it is not advisable to mark Params that don't refer to the same
query level as parallel-safe is that it can lead to plans like the one below:

Foo
-> Gather
  -> Bar
   SubPlan 1 (SubPlan refers to Foo)

Now, this problem is tricky because we need to pass all such Params
each time we invoke Gather.  That also is doable with much more
effort, but I am not sure it is worth it, because all of the use cases
I have seen in TPC-H (Q-2) or TPC-DS (Q-6) use SubPlans that refer to
the same query level.

Yet another useful enhancement in this area could be to consider both
parallel and non-parallel paths for subplans.  As of now, we consider
the cheapest/best path and form the subplan from it, but it is quite
possible that instead of choosing a parallel path (in case it is
cheapest) at the subplan level, a non-parallel path at the subplan
level could be beneficial when the upper plan can use parallelism.  I
think this will be a separate project in itself if we want to do it;
based on my study of TPC-H and TPC-DS queries, I am confident that it
will be helpful for certain queries at higher scale

Re: [HACKERS] Hooks

2016-12-27 Thread Michael Paquier
On Wed, Dec 28, 2016 at 2:14 PM, David Fetter  wrote:
> Here's everything that matches ^\w+_hook$ that I've found so far in
> git master.  There are very likely false positives in this list.
>
> [... long list of hooks ...]
>
> Some appear to be client-side, some server-side.  I hope that no hook
> is named other than that way, and I'll do my best to check.

This list includes some of the psql hooks used to assign pset
variables. You had better list just the backend-side ones or, if you
want, just those in src/backend/. Those are the ones that matter to
extension and plugin developers.
-- 
Michael




Re: [HACKERS] Support for pg_receivexlog --format=plain|tar

2016-12-27 Thread Michael Paquier
On Wed, Dec 28, 2016 at 3:12 AM, Magnus Hagander  wrote:
> On Tue, Dec 27, 2016 at 1:16 PM, Michael Paquier 
> wrote:
>> On Tue, Dec 27, 2016 at 6:34 PM, Magnus Hagander 
>> wrote:
>> > On Tue, Dec 27, 2016 at 2:23 AM, Michael Paquier
>> > 
>> > wrote:
>> >> Magnus, you have mentioned me as well that you had a couple of ideas
>> >> on the matter, feel free to jump in and let's mix our thoughts!
>> >
>> >
>> > Yeah, I've been wondering what the actual usecase is here :)
>>
>> There is value to compress segments finishing with trailing zeros,
>> even if they are not the species with the highest representation in
>> the WAL archive.
>
> Agreed on that part -- that's the value in compression though, and not
> necessarily the TAR format itself.
>
> Is there any value of the TAR format *without* compression in your scenario?

Hm. I cannot think of one.

>> > I can see the point of being able to compress the individual segments
>> > directly in pg_receivexlog in smaller systems though, without the need
>> > to
>> > rely on an external compression program as well. But in that case, is
>> > there
>> > any reason we need to wrap it in a tarfile, and can't just write it to
>> > .gz natively?
>>
>> You mean having a --compress=0|9 option that creates individual gz
>> files for each segment? Definitely we could just do that. It would be
>
> Yes, that's what I meant.

OK, I have bitten the bullet, and attached is a patch to add just
--compress=0|9. So there is one .gz file generated per segment, and
things are rather straightforward. Adding tests is unfortunately not
in scope, as not all builds are compiled with libz.
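
For reference, the per-segment gz handling is just the usual zlib pattern,
roughly like the sketch below (simplified, not the code from the patch):

#include <stdbool.h>
#include <stdio.h>
#include <zlib.h>

/*
 * Simplified sketch: write one finished WAL segment out as an individual
 * .gz file using gzopen/gzwrite/gzclose.  Error reporting and the
 * pg_receivexlog integration are omitted.
 */
static bool
write_compressed_segment(const char *path, const char *buf, size_t len,
						 int compresslevel)
{
	char		mode[8];
	gzFile		gzfp;

	snprintf(mode, sizeof(mode), "wb%d", compresslevel);	/* e.g. "wb9" */
	gzfp = gzopen(path, mode);
	if (gzfp == NULL)
		return false;

	if (gzwrite(gzfp, buf, (unsigned) len) != (int) len)
	{
		gzclose(gzfp);
		return false;
	}

	return gzclose(gzfp) == Z_OK;
}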

A couple of things to be aware of though:
- gzopen, gzwrite and gzclose are used to handle the gz files. That's
inconsistent with the tar method, which is based mainly on deflate() and
more low-level routines.
- I have switched the directory method to use a file pointer instead
of a file descriptor as gzwrite returns int as the number of
uncompressed bytes written.
- history and timeline files are gzip'd as well. Perhaps we don't want
to do that.

What do you think about this approach? I'll add that to the next CF.
-- 
Michael


receivexlog-gzip-v1.patch
Description: invalid/octet-stream



Re: [HACKERS] Hooks

2016-12-27 Thread David Fetter
On Tue, Dec 27, 2016 at 10:15:55PM -0600, Jim Nasby wrote:
> On 12/27/16 7:41 PM, David Fetter wrote:
> > I see it as larger in scope than mine because it changes how we do
> > things as a project.  An example of the kind of thing that this raises
> > is enforcement.  Will something (or someone) check that new hooks have
> > this?  Will somebody check for comment skew when the APIs change?
> > What happens when somebody forgets?
> 
> Can we reduce the scope of this to a manageable starting point?

That is what I'm trying to do.

> I'm guessing that all existing hooks share certain characteristics
> that it'd be pretty easy to detect. If you can detect the hook
> (which I guess means finding a static variable with hook in the
> name) then you can verify that there's an appropriate comment block.
> I'm guessing someone familiar with tools like doxygen could set that
> up without too much effort, and I'd be surprised if the community
> had a problem with it.

Sure, but that seems like an effort somewhat orthogonal to the one I
proposed, which is to get some user-facing i.e. SGML docs up for the
current hooks.

Here's everything that matches ^\w+_hook$ that I've found so far in
git master.  There are very likely false positives in this list.

ClientAuthentication_hook
ExecutorCheckPerms_hook
ExecutorEnd_hook
ExecutorFinish_hook
ExecutorRun_hook
ExecutorStart_hook
ExplainOneQuery_hook
ProcessUtility_hook
assign_hook
autocommit_hook
call_bool_check_hook
call_enum_check_hook
call_int_check_hook
call_real_check_hook
call_string_check_hook
check_hook
check_password_hook
comp_keyword_case_hook
create_upper_paths_hook
echo_hidden_hook
echo_hook
emit_log_hook
explain_get_index_name_hook
fetch_count_hook
fixed_paramref_hook
fmgr_hook
get_attavgwidth_hook
get_index_stats_hook
get_relation_info_hook
get_relation_stats_hook
histcontrol_hook
join_search_hook
needs_fmgr_hook
next_client_auth_hook
next_exec_check_perms_hook
object_access_hook
on_error_rollback_hook
on_error_stop_hook
original_client_auth_hook
original_post_parse_analyze_hook
p_coerce_param_hook
p_paramref_hook
p_post_columnref_hook
p_pre_columnref_hook
pgstat_beshutdown_hook
planner_hook
plpgsql_extra_checks_check_hook
plpgsql_extra_errors_assign_hook
plpgsql_extra_warnings_assign_hook
post_parse_analyze_hook
prev_post_parse_analyze_hook
prompt1_hook
prompt2_hook
prompt3_hook
quiet_hook
sepgsql_audit_hook
set_join_pathlist_hook
set_rel_pathlist_hook
shmem_startup_hook
show_context_hook
show_hook
singleline_hook
singlestep_hook
variable_coerce_param_hook
variable_paramref_hook
verbosity_hook

Some appear to be client-side, some server-side.  I hope that no hook
is named other than that way, and I'll do my best to check.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2016-12-27 Thread Ideriha, Takeshi
> 
> I'm looking forward to seeing your patch.
>

I created a patch.

I marked the title [WIP] because some documentation is lacking and the
formatting needs some fixes.

I'm going to post this patch to the January CF.
But this is my first time sending a patch, so please excuse me if there is
something amiss.

Regards, 
Ideriha Takeshi

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Meskes
> Sent: Tuesday, November 22, 2016 2:57 AM
> To: Ideriha, Takeshi 
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG
> 
> > I wanted to say that in order to use the connection pointed by the
> > DECLARE STATEMENT some functions like ECPGdo() would be modified or
> > new function would be added under the directory ecpglib/.
> >
> > This modification or new function will be used to get the connection
> > by statement_name.
> 
> Ah, now I understand. Thank you for your explanation.
> 
> I'm looking forward to seeing your patch.
> 
> Michael
> --
> Michael Meskes
> Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes
> at (Debian|Postgresql) dot Org
> Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 
> 49ers!
> Use Debian GNU/Linux, PostgreSQL
> 
> 


001_declareStatement_v1.patch.tar.gz
Description: 001_declareStatement_v1.patch.tar.gz



Re: [HACKERS] Faster methods for getting SPI results

2016-12-27 Thread Jim Nasby

On 12/27/16 9:10 PM, Craig Ringer wrote:

On 28 December 2016 at 09:58, Jim Nasby  wrote:


I've looked at this some more, and ITSM that the only way to do this without
some major surgery is to create a new type of Destination specifically for
SPI that allows for the execution of an arbitrary C function for each tuple
to be sent.


That sounds a lot more sensible than the prior proposals. Callback driven.


Are there other places this would be useful? I'm reluctant to write all 
of this just to discover it doesn't help performance at all, but if it's 
useful on its own I can just submit it as a stand-alone patch.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Some thoughts about multi-server sync rep configurations

2016-12-27 Thread Thomas Munro
On Wed, Dec 28, 2016 at 4:21 PM, Craig Ringer  wrote:
> On 28 December 2016 at 08:14, Thomas Munro
>  wrote:
>
>> 3.  No server must allow a transaction to be visible that hasn't been
>> flushed on N standby servers.  We already prevent that on the primary
>
> Only if the primary doesn't restart. We don't persist the xact masking
> used by sync rep at the moment.

Right.  Maybe you could fix that gap by making the primary wait until
the rule in synchronous_standby_names would be satisfied by the most
conservative possible synchronous_commit level (remote_apply) after
recovery and before allowing any queries to run?

> I suspect that solving that is probably tied to solving it on standbys.

Hmm.  I was imagining that for standbys it might involve extra
messages flowing from the primary carrying the consensus write and
flush LSN locations (since there isn't any other kind of inter-node
communication possible, today), and then somehow teaching the standby
to see only the transactions whose commit record is <= the latest
consensus commit LSN (precisely, and no more!) when acquiring a
snapshot if K is > 2 and N > 1 and you have synchronous_commit set to
a level >= remote_write on the standby.  That could be done by simply
waiting for the consensus write or flush LSN (as appropriate) to be
applied before taking a snapshot, but aside from complicated
interlocking requirements, that would slow down snapshot acquisition
unacceptably on write-heavy systems.  Another way to do it could be to
maintain separate versions of the snapshot data somehow for each
synchronous_commit level on standbys, so that you can get quickly your
hands on a snapshot that can only see xids whose commit record was <=
consensus write or flush location as appropriate.  That interacts
weirdly with synchronous_commit = remote_apply on the primary though
because (assuming we want it to be useful) it needs to wait until the
LSN is applied on the standby(s) *and* they can see it in this weird
new time-delayed snapshot thing; perhaps it would require a new level
remote_apply_consensus_flush which waits for the standby(s) to apply
and also know that the transaction has been flushed on enough
nodes to allow it to be seen...

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Hooks

2016-12-27 Thread Craig Ringer
On 28 December 2016 at 12:15, Jim Nasby  wrote:

> Can we reduce the scope of this to a manageable starting point? I'm guessing
> that all existing hooks share certain characteristics that it'd be pretty
> easy to detect. If you can detect the hook (which I guess means finding a
> static variable with hook in the name) then you can verify that there's an
> appropriate comment block. I'm guessing someone familiar with tools like
> doxygen could set that up without too much effort, and I'd be surprised if
> the community had a problem with it.

Let's just make sure the comment blocks are nice and grep-able too.

I think this is a great idea FWIW. Discovering the extension points
within Pg isn't easy.

Callbacks aren't easy to find either.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hooks

2016-12-27 Thread Jim Nasby

On 12/27/16 7:41 PM, David Fetter wrote:

I see it as larger in scope than mine because it changes how we do
things as a project.  An example of the kind of thing that this raises
is enforcement.  Will something (or someone) check that new hooks have
this?  Will somebody check for comment skew when the APIs change?
What happens when somebody forgets?


Can we reduce the scope of this to a manageable starting point? I'm 
guessing that all existing hooks share certain characteristics that it'd 
be pretty easy to detect. If you can detect the hook (which I guess 
means finding a static variable with hook in the name) then you can 
verify that there's an appropriate comment block. I'm guessing someone 
familiar with tools like doxygen could set that up without too much 
effort, and I'd be surprised if the community had a problem with it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] merging some features from plpgsql2 project

2016-12-27 Thread Jim Nasby

On 12/27/16 4:56 PM, Merlin Moncure wrote:

On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule  wrote:

First I describe my initial position. I am strongly against introducing a
"new" language - plpgsql2, a new plpgsql, or anything else. The trust of
developers in us is important, and the introduction of any incompatible or
different feature has to have a really big reason. PostgreSQL is a
conservative environment, and PLpgSQL should not be an exception. More - I do not have any


Which is why this is an external fork of plpgsql.

** The real problem is that we have no mechanism for allowing a PL's 
language/syntax/API to move forward without massive backwards 
compatibility problems. **


This is NOT unique to plpgsql. plpython (for one) definitely has some 
stupidity that will require an API break to fix.


A secondary issue is the lack of a blessed collection of extensions. If 
we had that we could maintain some of this stuff outside of the core 
release schedule, as well as provide more room for people to run 
experimental versions of extensions if they desired. If we had this then 
perhaps plpgsql_check would become a viable answer to some of this 
(though IMHO plpgsql_check is just a work-around for our lack of dealing 
with API compatibility).



information from my customers or colleagues about missing features in this
language.  If there are some gaps, they are in the outer environment - IDE,
deployment, testing,


I'm honestly surprised (even shocked) that you've never run into any of 
the problems plpgsql2 is trying to solve. I've hit all those problems 
except for OUT parameters. I'd say the order they're listed in actually 
corresponds to how often I hit the problems.



Breaking language compatibility is a really big deal.  There has to be
a lot of benefits to the effort and you have to make translation from
plpgsql1 to plpgsql2 really simple.  You have made some good points on


I think trying to move the ball forward in a meaningful way without 
breaking compatibility is a lost cause. Some of these issues could be 
addressed by adding more syntax, but even that has limits (do we really 
want another variation of STRICT that allows only 0 or 1 rows?). And 
there's no way to fix your #1 item below without breaking compatibility.


There *are* other ways this could be done, besides creating a different 
PL. One immediate possibility is custom GUCs; there may be other options.



#1 problem with plpgsql in my point of view is that the language and
grammar are not supersets of sql.  A lot of PLPGSQL keywords (EXECUTE,
BEGIN, INTO, END) have incompatible meanings with our SQL
implementation.  IMNSHO, SQL ought to give the same behavior inside or
outside of plpgsql.  It doesn't, and this is one of the reasons why
plpgsql may not be a good candidate for stored procedure
implementation.


While this doesn't bug me, it's got to be confusing as hell for newbies.


#2 problem with plpgsql is after function entry it's too late to do
things like set transaction isolation level and change certain kinds
of variables (like statement_timeout).  This is very obnoxious, I
can't wrap the database in an API 100%; the application has to manage
things that really should be controlled in SQL.


+1


#3 problem with plpgsql is complete lack of inlining.  inlining
function calls in postgres is a black art even for very trivial cases.
This makes it hard for us to write quick things and in the worst case
causes endless duplications of simple expressions.


Instead of banging our heads against the fmgr API to try and solve this, 
I suspect it would be much simpler (and easier to understand) if we had 
the equivalent to a #define for queries. The fmgr API just isn't 
amenable to trying to inline stuff. This would allow you to define 
things like views that accept arguments, so you can shove the argument 
way down in the guts of the query without getting tripped up by fences.


Here are some other plpgsql pain points (though not all of these require 
an API break):


#4: it's impossible to operate on a Datum in-place. Though, maybe the 
work Tom did with ExpandedObjects eliminates some of this problem, but 
if it does it's hidden behind the existing syntax and you have no way to 
know it (and AFAICT the only thing using that infrastructure right now 
is arrays). Aside from the performance aspects, it'd be damn nice to be 
able to do things like ++, +=, etc.


#5: handling of parameter name collisions still sucks. One way to 
improve this would be to put parameters inside the outer-most statement 
block, so you could use a block name that was different from the 
function name. Something else that might help is the ability to assign a 
namespace for query identifiers, so you don't have to alias every 
individual relation in a query.


#6: The variations of syntax between the FOR variants is annoying 
(specifically, FOREACH necessitating the ARRAY keyword).


#7: = vs := vs INTO. = can do everything the others can do except 

Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-27 Thread Etsuro Fujita

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare, but there can be types that have
merge operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be: 1. check hashclause_list and create a hash join;
2. else check whether merge is allowed and create a merge join. It looks
like that would cover all the cases: if there aren't any hash clauses and
also no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.


Right, maybe we can do that by doing similar things as in 
match_unsorted_outer and/or sort_inner_and_outer.  But as you mentioned, 
the case is rare, so the problem would be whether it's worth 
complicating the code (and if it is, whether we should do that in 
the first version of the function).



I am wondering whether the easy and possibly correct solution here is to
not replace a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()?
If we don't do that, there won't be an error building the merge join plan,
and postgresRecheckForeignScan() would correctly route the EPQ checks to
the local plan available as the outer plan.



That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join path
for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
overhead at EPQ rechecks, and



The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?


No, I won't.  Usually, the overhead would be negligible, but in some 
cases where there are many concurrent updates, the overhead might not be 
negligible due to many EPQ rechecks.  So it would be better to have an 
efficient local plan.



(2) choosing a local join path randomly from
the rel's pathlist wouldn't be easy to understand.



Easy to understand for whom? Can you please elaborate?


Users.  I think the ease of understanding for users is important.

Best regards,
Etsuro Fujita






Re: [HACKERS] proposal: session server side variables

2016-12-27 Thread Craig Ringer
Fabien, I don't really see the point of "persistent variables". What
benefit do they add over relations?

You can add a simple function to fetch a tuple if you want it not to
look like a subquery. Do it with heap access in C if you like, save
the (trivial) planning costs.


I do see value to two different things discussed here:

* Pavel's proposal for persistent-declaration, non-persistent-value
session variables with access control. These will be very beneficial
with RLS, where we need very fast lookups. Their purpose is that you
can set up expensive state with SECURITY DEFINER functions, C-level
functions, etc, then test it very cheaply later from RLS and from
other functions. Their advantage over relations is very cheap, fast
access.

  I can maybe see global temporary relations being an alternative to
these, if we optimise by using a tuplestore to back them and only
switch to a relfilenode if the relation grows. The pg_catalog entries
would be persistent so you could GRANT or REVOKE access to them, etc.
Especially if we optimised the 1-row case this could work. It'd be
less like what Oracle does, but might let us re-use more functionality
and have fewer overlapping features. Pavel?


* Fabien's earlier mention of transient session / query variables,
a-la MySQL or MS-SQL. They're obviously handy for more general purpose
programming, but our strict division between SQL and plpgsql makes
them a bit less useful than in MS-SQL with T-SQL. I think it's a very
separate topic to this and should be dealt with in a separate thread
if/when someone wants to work on them.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Add doc advice about systemd RemoveIPC

2016-12-27 Thread Peter Eisentraut
Here is a patch to add some information about the systemd RemoveIPC
issue to the documentation, sort of in the spirit of the OOM discussion
nearby.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From eaf0eda3f4c402a2e8b7f2f2395a8536f38aafbc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 27 Dec 2016 12:00:00 -0500
Subject: [PATCH] doc: Add advice about systemd RemoveIPC

---
 doc/src/sgml/runtime.sgml | 82 +++
 1 file changed, 82 insertions(+)

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 787cfce987..fee0d65d90 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1165,6 +1165,88 @@ System V IPC Parameters
 
   
 
+  
+   systemd RemoveIPC
+
+   
+systemd
+RemoveIPC
+   
+
+   
+
+ If systemd is in use, the advice in this
+ section must be followed, or your PostgreSQL server will be very
+ unreliable.
+
+   
+
+   
+If systemd is in use, special care must be
+taken that IPC resources (shared memory and semaphores) are not
+prematurely removed by the operating system.  Although the issues
+described here are most commonly known to happen if the PostgreSQL server
+is started through a systemd service unit, they
+can also happen in other setups.
+   
+
+   
+The setting RemoveIPC
+in logind.conf controls whether IPC objects are
+removed when a user fully logs out.  System users are exempt.  This
+setting defaults to on in stock systemd, but
+some operating system distributions default it to off.
+   
+
+   
+A typical observed effect when this setting is on is that the semaphore
+objects used by a PostgreSQL server are removed at apparently random
+times, leading to the server crashing with log messages like
+
+LOG: semctl(1234567890, 0, IPC_RMID, ...) failed: Invalid argument
+
+Different types of IPC objects (shared memory vs. semaphores, System V
+vs. POSIX) are treated slightly differently
+by systemd, so one might observe that some IPC
+resources are not removed in the same way as others.  But it is not
+advisable to rely on these subtle differences.
+   
+
+   
+A user logging out might happen as part of a maintenance
+job or manually when an administrator logs in as
+the postgres user or similar, so it is hard to prevent
+in general.
+   
+
+   
+What is a system user is determined
+at systemd compile time from
+the SYS_UID_MAX setting
+in /etc/login.defs.
+   
+
+   
+It is recommended to set
+
+RemoveIPC=no
+
+on all server hosts used for PostgreSQL.
+   
+
+   
+Also, packaging and deployment scripts should be careful to create
+the postgres user as a system user by
+using useradd -r, adduser --system,
+or equivalent.
+   
+
+   
+At least one of these two things have to be ensured, or the
+PostgreSQL server will be very unreliable.
+   
+  
+
   
Resource Limits
 
-- 
2.11.0




Re: [HACKERS] Some thoughts about multi-server sync rep configurations

2016-12-27 Thread Craig Ringer
On 28 December 2016 at 08:14, Thomas Munro
 wrote:

> 3.  No server must allow a transaction to be visible that hasn't been
> flushed on N standby servers.  We already prevent that on the primary

Only if the primary doesn't restart. We don't persist the xact masking
used by sync rep at the moment.

I suspect that solving that is probably tied to solving it on standbys.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

2016-12-27 Thread Craig Ringer
On 28 December 2016 at 08:33, Tom Lane  wrote:

>> I would not say that the current patch is giant & invasive, if you
>> abstract the two added fields to high-level statements.
>
> It's touching every single utility statement type, which is not only
> pretty invasive in itself, but will break any pending patches that
> add more utility statement types.  And you've not followed through on the
> implications of adding those fields in, for instance, src/backend/nodes/;
> so the patch will be even bigger once all the required tidying-up is done.

I echo Tom's complaint. Adding fields to every node is definitely not
the way to do this.

>> I understand that you suggest to add a new intermediate structure:
>
>>typedef struct {
>>  NodeTag tag;
>>  int location, length;
>>  Node *stmt;
>>} ParseNode;
>
>> So that raw_parser would return a List of ParseNode instead of a List
>> of raw statements, and the statements would be unmodified.
>
> Yeah, that's what I was thinking of.

I strongly agree. That's what I'd done in my draft patch, but I hadn't
got the details of actually collecting position tracking info sorted
out yet.

The information will need to be copied to plans later, but that's
easy, and would've been necessary anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Faster methods for getting SPI results

2016-12-27 Thread Craig Ringer
On 28 December 2016 at 09:58, Jim Nasby  wrote:

> I've looked at this some more, and ITSM that the only way to do this without
> some major surgery is to create a new type of Destination specifically for
> SPI that allows for the execution of an arbitrary C function for each tuple
> to be sent.

That sounds a lot more sensible than the prior proposals. Callback driven.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Thomas Munro
On Wed, Dec 28, 2016 at 11:57 AM, Thomas Munro
 wrote:
> But I'm starting to think that the best way might be to do BOTH of the
> things I said in my previous message: make dsa.c register on
> create/attach and also unregister before detaching iff the name was
> supplied at creation time for the benefit of extension writers, but
> make it not do anything at all about tranche name
> registration/unregistration if NULL was passed in at create time.
> Then register this particular tranche (LWTRANCHE_PARALLEL_QUERY_DSA)
> in every process in RegisterLWLockTranches.  That way, you'd get a
> useful name in pg_stat_activity for other backends that are running
> parallel queries if they are ever waiting for these locks (unlikely
> but interesting to know about if it happens).

Maybe something like the attached.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-dsa-tranche-registration.patch
Description: Binary data



Re: [HACKERS] Faster methods for getting SPI results

2016-12-27 Thread Jim Nasby

On 12/21/16 8:21 AM, Jim Nasby wrote:

On 12/20/16 10:14 PM, Jim Nasby wrote:

It would be a lot more efficient if we could just grab datums from the
executor and make a single copy into plpython (or R), letting the PL
deal with all the memory management overhead.

I briefly looked at using SPI cursors to do just that, but that looks
even worse: every fetch is executed in a subtransaction, and every fetch
creates an entire tuplestore even if it's just going to return a single
value. (But hey, we never claimed cursors were fast...)

Is there any way to avoid all of this? I'm guessing one issue might be
that we don't want to call an external interpreter while potentially
holding page pins, but even then couldn't we just copy a single tuple at
a time and save a huge amount of palloc overhead?


AFAICT that's exactly how DestRemote works: it grabs a raw slot from the
executor, makes sure it's fully expanded, and sends it on its way via
pq_send*(). So presumably the same could be done for SPI, by creating a
new CommandDest (ISTM none of the existing ones would do what we want).

I'm not sure what the API for this should look like. One possibility is
to have SPI_execute and friends accept a flag that indicates not to
build a tupletable. I don't think a query needs to be read-only to allow
for no tuplestore, so overloading read_only seems like a bad idea.

Another option is to treat this as a "lightweight cursor" that only
allows forward fetches. One nice thing about that option is it leaves
open the possibility of using a small tuplestore for each "fetch",
without all the overhead that a full blown cursor has. This assumes
there are some use cases where you want to operate on relatively small
sets of tuples at a time, but you don't need to materialize the whole
thing in one shot.


I've looked at this some more, and ITSM that the only way to do this 
without some major surgery is to create a new type of Destination 
specifically for SPI that allows for the execution of an arbitrary C 
function for each tuple to be sent. AFAICT this should be fairly safe, 
since DestRemote can potentially block while sending a tuple and also 
runs output functions (which presumably could themselves generate errors).


_SPI_execute_plan() would need to accept an arbitrary DestReceiver 
struct, and use that (if specified) instead of creating its own.
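
To illustrate the shape of it, here's a rough sketch (the DestReceiver 
layout follows tcop/dest.h, but none of this is actual patch code):

/* needs postgres.h, tcop/dest.h, executor/tuptable.h */

/*
 * Rough sketch of a callback-driven DestReceiver for SPI: the caller
 * supplies a per-tuple C function, and nothing gets accumulated in a
 * tuplestore.
 */
typedef struct CallbackDestReceiver
{
	DestReceiver pub;			/* must be first; standard receiver fields */
	void		(*per_tuple) (TupleTableSlot *slot, void *arg);
	void	   *arg;
} CallbackDestReceiver;

static bool
callback_receiveSlot(TupleTableSlot *slot, DestReceiver *self)
{
	CallbackDestReceiver *cb = (CallbackDestReceiver *) self;

	cb->per_tuple(slot, cb->arg);	/* hand each tuple straight to the caller */
	return true;					/* always want more tuples */
}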


Once that's done, my plan is to allow plpy to use this functionality 
with a receiver function that adds tuple fields to corresponding python 
lists. This should result in significantly less overhead than going 
through a tuplestore when dealing with a large number of rows.


Before I go code this up, I'd like to know if there's some fatal flaw in 
this, or if there's an easier way to hack this up just to test my 
performance theory.


Suggestions?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Hooks

2016-12-27 Thread David Fetter
On Wed, Dec 28, 2016 at 01:33:13AM +, Tsunakawa, Takayuki wrote:
> From: David Fetter [mailto:da...@fetter.org]
> > > How about putting a descriptive comment at the location where each
> > > hook variable is defined, using some convention (e.g. like
> > > Javadoc-style)?  A separate document such as README and wiki can fail
> > > to be updated.  OTOH, if someone wants to add a new hook, we can
> > > expect him to add appropriate comment by following existing hooks.
> > > Using a fixed tag, e.g. "", would facilitate finding all hooks.
> > 
> > I like this idea, but it's a much bigger one than mine because it's
> > essentially inventing (or adopting, whatever we settle on) literate
> > programming for the PostgreSQL project.
> > 
> > https://en.wikipedia.org/wiki/Literate_programming
> 
> I didn't intend to invent a new heavy rule or tool.  I just meant comments 
> just like the existing function descriptions, something like
> 
> /*
>  * Hook name: Authentication hook
>  * Description: ...
>  * Arguments: ...
>  * Return value: ...
>  * Note: ...
>  */

It's still new.  For the record, I think it's an excellent idea.

I see it as larger in scope than mine because it changes how we do
things as a project.  An example of the kind of thing that this raises
is enforcement.  Will something (or someone) check that new hooks have
this?  Will somebody check for comment skew when the APIs change?
What happens when somebody forgets?

> > At the moment, our practice is that (most--hooks being an
> > exception) user-facing features must come with with user-facing
> > docs which are written separately from the source code
> > implementing them.
> 
> OK.  Anyway, if we can see in the PostgreSQL documentation what
> hooks are available, it would be the best.  I imagine you meant
> adding a new chapter under the part "V. Server Programming".

Exactly :)

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Declarative partitioning - another take

2016-12-27 Thread Amit Langote
On 2016/12/27 19:07, Amit Langote wrote:
> Attached should fix that.

Here are the last two patches, now with additional information like the
other patches have.  I forgot to do that yesterday.

Thanks,
Amit
>From 5a82b4caa6cec7845eb48e0397fab49c74b8dd98 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 28 Dec 2016 10:10:26 +0900
Subject: [PATCH 1/2] Set ecxt_scantuple correctly for tuple-routing

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successive levels.  If we do end up changing the slot from
the original, we must update ecxt_scantuple to point to the new one for
partition key expressions to be computed correctly.

Also update the regression tests so that the code manipulating
ecxt_scantuple is covered.

Reported by: Rajkumar Raghuwanshi
Patch by: Amit Langote
Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com
---
 src/backend/catalog/partition.c  | 2 ++
 src/backend/executor/execMain.c  | 2 --
 src/test/regress/expected/insert.out | 2 +-
 src/test/regress/sql/insert.sql  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fca874752f..f9daf8052d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1647,6 +1647,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		PartitionDesc partdesc = parent->partdesc;
 		TupleTableSlot *myslot = parent->tupslot;
 		TupleConversionMap *map = parent->tupmap;
+		ExprContext *econtext = GetPerTupleExprContext(estate);
 
 		/* Quick exit */
 		if (partdesc->nparts == 0)
@@ -1667,6 +1668,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		}
 
 		/* Extract partition key from tuple */
+		econtext->ecxt_scantuple = slot;
 		FormPartitionKeyDatum(parent, slot, estate, values, isnull);
 
 		if (key->strategy == PARTITION_STRATEGY_RANGE)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bca34a509c..1d699c1dab 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3107,9 +3107,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 {
 	int		result;
 	Oid		failed_at;
-	ExprContext *econtext = GetPerTupleExprContext(estate);
 
-	econtext->ecxt_scantuple = slot;
 	result = get_partition_for_tuple(pd, slot, estate, _at);
 	if (result < 0)
 	{
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 49f667b119..ae54625034 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -302,7 +302,7 @@ drop cascades to table part_ee_ff1
 drop cascades to table part_ee_ff2
 -- more tests for certain multi-level partitioning scenarios
 create table p (a int, b int) partition by range (a, b);
-create table p1 (b int, a int not null) partition by range (b);
+create table p1 (b int not null, a int not null) partition by range ((b+0));
 create table p11 (like p1);
 alter table p11 drop a;
 alter table p11 add a int;
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 08dc068de8..9d3a34073c 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -173,7 +173,7 @@ drop table list_parted cascade;
 
 -- more tests for certain multi-level partitioning scenarios
 create table p (a int, b int) partition by range (a, b);
-create table p1 (b int, a int not null) partition by range (b);
+create table p1 (b int not null, a int not null) partition by range ((b+0));
 create table p11 (like p1);
 alter table p11 drop a;
 alter table p11 add a int;
-- 
2.11.0

>From 798c51254563735dff843c71f1bbf34b969d8162 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 28 Dec 2016 10:28:37 +0900
Subject: [PATCH 2/2] No BulkInsertState when tuple-routing is in action
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In future, we might consider alleviating this restriction by
allocating a BulkInsertState per partition.

Reported by: 高增琦 
Patch by: Amit Langote (with pointers from 高增琦)
Reports: https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com
---
 src/backend/commands/copy.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index aa25a23336..e9bf4afa44 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2290,7 +2290,7 @@ CopyFrom(CopyState cstate)
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
-	BulkInsertState bistate;
+	BulkInsertState bistate = NULL;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
 	int	

Re: [HACKERS] Hooks

2016-12-27 Thread Tsunakawa, Takayuki
From: David Fetter [mailto:da...@fetter.org]
> > How about putting a descriptive comment at the location where each
> > hook variable is defined, using some convention (e.g. like
> > Javadoc-style)?  A separate document such as README and wiki can fail
> > to be updated.  OTOH, if someone wants to add a new hook, we can
> > expect him to add appropriate comment by following existing hooks.
> > Using a fixed tag, e.g. "", would facilitate finding all hooks.
> 
> I like this idea, but it's a much bigger one than mine because it's
> essentially inventing (or adopting, whatever we settle on) literate
> programming for the PostgreSQL project.
> 
> https://en.wikipedia.org/wiki/Literate_programming

I didn't intend to invent a new heavy rule or tool.  I just meant comments just 
like the existing function descriptions, something like

/*
 * Hook name: Authentication hook
 * Description: ...
 * Arguments: ...
 * Return value: ...
 * Note: ...
 */


> 
> In the realm of generated documentation, we do have a doxygen
> https://doxygen.postgresql.org/ for the project, but I haven't really found
> it helpful thus far.

Me, too.


> At the moment, our practice is that (most--hooks being an exception)
> user-facing features must come with with user-facing docs which are written
> separately from the source code implementing them.

OK.  Anyway, if we can see in the PostgreSQL documentation what hooks are 
available, it would be the best.  I imagine you meant adding a new chapter 
under the part "V. Server Programming".

Regards
Takayuki Tsunakawa






Re: [HACKERS] Hooks

2016-12-27 Thread David Fetter
On Wed, Dec 28, 2016 at 12:47:10AM +, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Jim Nasby
> > AFAIK there's no way to get a list of hooks today, short of
> > something like `git grep hook`. I think a simple list of what
> > hooks we have, when they fire and where to find them in code would
> > be sufficient.
> 
> How about putting a descriptive comment at the location where each
> hook variable is defined, using some convention (e.g. like
> Javadoc-style)?  A separate document such as README and wiki can
> fail to be updated.  OTOH, if someone wants to add a new hook, we
> can expect him to add appropriate comment by following existing
> hooks.  Using a fixed tag, e.g. "", would facilitate finding
> all hooks.

I like this idea, but it's a much bigger one than mine because it's
essentially inventing (or adopting, whatever we settle on) literate
programming for the PostgreSQL project.

https://en.wikipedia.org/wiki/Literate_programming

In the realm of generated documentation, we do have a doxygen
https://doxygen.postgresql.org/ for the project, but I haven't really
found it helpful thus far.

Let's take up literate programming in a separate thread.

At the moment, our practice is that (most--hooks being an exception)
user-facing features must come with with user-facing docs which are
written separately from the source code implementing them.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE parent SET WITHOUT OIDS and the oid column

2016-12-27 Thread Amit Langote
On 2016/12/27 22:24, Ashutosh Bapat wrote:
> On Mon, Dec 26, 2016 at 3:36 PM, Amit Langote
>  wrote:
>> Attached patches modifies MergeAttributesIntoExisting() such that we
>> increment attinhcount not only for user attributes, but also for the oid
>> system column if one exists.
>>
>> Thoughts?
> 
> Yes, if a child inherits from a parent with OID, child gets OID column
> and its inhcount is set to 1.

[ ... ]

> So, I think your patch is on the right track.
> 
> We allow creating user attribute with name "oid" so you do not want to
> search system attribute oid by name. Instead search by attribute id
> ObjectIdAttributeNumber.

Good point.  Although there can only be one of the two in a table at any
given time - either the "oid" system column or a user-defined column named
"oid".  I was afraid that with the patch, the attinhcount of a
user-defined oid column may get incremented twice, but before that could
happen, one would get the error: "child table without OIDs cannot
inherit from table with OIDs"

create table foo (a int) with oids;
create table bar (a int, oid int);
alter table bar inherit foo;
ERROR:  table "bar" without OIDs cannot inherit from table "foo" with OIDs

And then:

alter table bar set with oids;
ERROR:  column "oid" of relation "bar" already exists
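
Just to make the attribute-number lookup concrete, the sketch below is
roughly what I have in mind for MergeAttributesIntoExisting (an illustration
only, not the patch itself; it follows the usual pg_attribute syscache
pattern, and child_rel stands for the child relation being merged):

	HeapTuple	tuple;

	tuple = SearchSysCacheCopyAttNum(RelationGetRelid(child_rel),
									 ObjectIdAttributeNumber);
	if (HeapTupleIsValid(tuple))
	{
		Form_pg_attribute attr = (Form_pg_attribute) GETSTRUCT(tuple);

		/* bump the inheritance count of the child's oid system column */
		attr->attinhcount++;
		/* ...then write the tuple back the same way the user columns are handled */
	}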

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hooks

2016-12-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Jim Nasby
> AFAIK there's no way to get a list of hooks today, short of something like
> `git grep hook`. I think a simple list of what hooks we have, when they
> fire and where to find them in code would be sufficient.

How about putting a descriptive comment at the location where each hook 
variable is defined, using some convention (e.g. like Javadoc-style)?  A 
separate document such as README and wiki can fail to be updated.  OTOH, if 
someone wants to add a new hook, we can expect him to add appropriate comment 
by following existing hooks.  Using a fixed tag, e.g. "", would 
facilitate finding all hooks.

Regards
Takayuki Tsunakawa





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

2016-12-27 Thread Tom Lane
Fabien COELHO  writes:
>>> How? The issue is that stmtmulti silently skip some ';' when empty
>>> statements are found, [...]

>> Maybe you should undo that.

> I was trying to be minimally invasive in the parser, i.e. not to change 
> any rules.

That's fairly silly when the alternative is to be maximally invasive
at the parse-nodes level, thereby affecting lots and lots of code that
isn't really related to the problem at hand.

>> I do not like this.  You are making what should be a small patch into
>> a giant, invasive one.

> I would not say that the current patch is giant & invasive, if you 
> abstract the two added fields to high-level statements.

It's touching every single utility statement type, which is not only
pretty invasive in itself, but will break any pending patches that
add more utility statement types.  And you've not followed through on the
implications of adding those fields in, for instance, src/backend/nodes/;
so the patch will be even bigger once all the required tidying-up is done.

> I understand that you suggest to add a new intermediate structure:

>typedef struct {
>  NodeTag tag;
>  int location, length;
>  Node *stmt;
>} ParseNode;

> So that raw_parser would return a List of ParseNode instead of a List of
> statements, and the statements themselves would be unmodified.

Yeah, that's what I was thinking of.  There aren't very many places that
would need to know about that, I believe; possibly just gram.y itself
and transformTopLevelStmt().  The stuff in between only knows that it's
passing around lists of statement parse trees, it doesn't know much about
what's in those trees.
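
Roughly, the analyze-side change could be as small as the sketch below
(stmt_location/stmt_len are invented field names, and I'm ignoring the
SELECT INTO special case the real function handles):

Query *
transformTopLevelStmt(ParseState *pstate, Node *parseTree)
{
	ParseNode  *pn = (ParseNode *) parseTree;	/* wrapper built in gram.y */
	Query	   *query;

	query = transformStmt(pstate, pn->stmt);	/* existing code path, unchanged */

	/* carry the statement's extent in the source text into the Query */
	query->stmt_location = pn->location;
	query->stmt_len = pn->length;

	return query;
}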

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: session server side variables

2016-12-27 Thread Fabrízio de Royes Mello
On Tue, Dec 27, 2016 at 6:55 PM, Pavel Stehule 
wrote:
>
>
> 2016-12-27 21:38 GMT+01:00 Fabrízio de Royes Mello <
fabriziome...@gmail.com>:
>>
>>
>> On Fri, Dec 23, 2016 at 4:00 PM, Joe Conway  wrote:
>> >
>> > >> In the long term, What would be the possible scopes?
>> > >>
>> > >> TRANSACTION, SESSION, PERSISTANT ?
>> > >>
>> > >> Would some scopes orthogonal (eg SHARED between sessions for a USER
in a
>> > >> DATABASE, SHARED at the cluster level?).
>> > >
>> > > I have a plan to support TRANSACTION and SESSION scope. Persistent or
>> > > shared scope needs much more complex rules, and some specialized
extensions
>> > > will be better.
>> >
>> >
>> > I can see where persistent variables would be very useful though.
>> >
>>
>> Can you talk more about your though??
>>
>> I'm thinking about PERSISTENT VARIABLES and maybe they can be the
redesign of our hard coded "reloptions", with the ability to provide users
to create their own customized. If we think more carefully we already have
some persistent variables with specialized context: reloptions (hardcoded),
security labels (local and shared catalog) and comments (local and shared
catalog). I was clear enough?
>
>
> What is the difference from a table Values(keys varchar primary key, value
> text)? Where is the benefit?
>

IMHO it's a totally different thing because with this approach we'll have:
- track dependency (aka pg_depend)
- different LockLevel for each persistent variable (reloption)
- syscache to improve performance

And it's a way to store metadata about metadata, i.e. properties of our
objects. This can be useful to:
- author extensions to store your own properties with each object
- postgres developers to have a simple way to add a new reloption just
adding a new row to bootstrap data

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] PATCH: recursive json_populate_record()

2016-12-27 Thread Nikita Glukhov

> I've noticed that this patch is on CF and needs a reviewer so I decided
> to take a look. Code looks good to me in general, it's well covered by
> tests and passes `make check-world`.

Thanks for your review.

> However it would be nice to have a little more comments. In my opinion
> every procedure have to have at least a short description - what it
> does, what arguments it receives and what it returns, even if it's a
> static procedure. Same applies for structures and their fields.

I have added some comments for functions and structures in the second 
version of this patch.


> Another thing that bothers me is a FIXME comment:
>
> ```
> tupdesc = lookup_rowtype_tupdesc(type, typmod); /* FIXME cache */
> ```
>
> I suggest remove it or implement a caching here as planned.

I have implemented tuple descriptor caching here in populate_composite() 
and also in populate_record_worker() (by using populate_composite() 
instead of populate_record()). These improvements can speed up bulk 
jsonb conversion by 15-20% in the simple test with two nested records.
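
The caching itself is conceptually simple; the sketch below illustrates the
idea (the structure and field names here are made up for the example, they
are not the ones used in the patch):

typedef struct CompositeIOData
{
	Oid			base_typid;		/* composite type the cache was built for */
	int32		base_typmod;
	TupleDesc	tupdesc;		/* private copy kept in a long-lived context */
} CompositeIOData;

static TupleDesc
get_cached_tupdesc(CompositeIOData *io, Oid typid, int32 typmod,
				   MemoryContext mcxt)
{
	/* io is assumed to be zero-initialized on first use */
	if (io->tupdesc == NULL ||
		io->base_typid != typid ||
		io->base_typmod != typmod)
	{
		TupleDesc	tupdesc = lookup_rowtype_tupdesc(typid, typmod);
		MemoryContext oldcxt = MemoryContextSwitchTo(mcxt);

		if (io->tupdesc)
			FreeTupleDesc(io->tupdesc);
		io->tupdesc = CreateTupleDescCopy(tupdesc);
		io->base_typid = typid;
		io->base_typmod = typmod;

		MemoryContextSwitchTo(oldcxt);
		ReleaseTupleDesc(tupdesc);	/* pairs with the lookup above */
	}

	return io->tupdesc;
}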


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 10e3186..55cacfb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11249,12 +11249,12 @@ table2-mapping
  whose columns match the record type defined by base
  (see note below).

-   select * from json_populate_record(null::myrowtype, '{"a":1,"b":2}')
+   select * from json_populate_record(null::myrowtype, '{"a": 1, "b": ["2", "a b"], "c": {"d": 4, "e": "a  b c"}}')

 
- a | b
+---
- 1 | 2
+ a |   b   |  c
+---+---+-
+ 1 | {2,"a b"} | (4,"a b c")
 

   
@@ -11343,12 +11343,12 @@ table2-mapping
  explicitly define the structure of the record with an AS
  clause.

-   select * from json_to_record('{"a":1,"b":[1,2,3],"c":"bar"}') as x(a int, b text, d text) 
+   select * from json_to_record('{"a":1,"b":[1,2,3],"c":[1,2,3],"e":"bar","r": {"a": 123, "b": "a b c"}}') as x(a int, b text, c int[], d text, r myrowtype) 

 
- a |b| d
+-+---
- 1 | [1,2,3] |
+ a |b|c| d |   r
+---+-+-+---+---
+ 1 | [1,2,3] | {1,2,3} |   | (123,"a b c")
 

   
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 17ee4e4..bb72b8e 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -93,7 +93,7 @@ static void elements_array_element_end(void *state, bool isnull);
 static void elements_scalar(void *state, char *token, JsonTokenType tokentype);
 
 /* turn a json object into a hash table */
-static HTAB *get_json_object_as_hash(text *json, const char *funcname);
+static HTAB *get_json_object_as_hash(char *json, int len, const char *funcname);
 
 /* common worker for populate_record and to_record */
 static Datum populate_record_worker(FunctionCallInfo fcinfo, const char *funcname,
@@ -149,6 +149,33 @@ static void setPathArray(JsonbIterator **it, Datum *path_elems,
 			 int level, Jsonb *newval, uint32 nelems, int op_type);
 static void addJsonbToParseState(JsonbParseState **jbps, Jsonb *jb);
 
+/* helper functions for populate_record[set] */
+typedef struct ColumnIOData ColumnIOData;
+typedef struct RecordIOData RecordIOData;
+
+static HeapTupleHeader
+populate_record(TupleDesc		tupdesc,
+RecordIOData  **record_info,
+HeapTupleHeader	template,
+MemoryContext	mcxt,
+Oid			jtype,
+HTAB	 	   *json_hash,
+JsonbContainer *cont);
+
+static Datum
+populate_record_field(ColumnIOData *col,
+	  Oid			typid,
+	  int32			typmod,
+	  int32			ndims,
+	  const char   *colname,
+	  MemoryContext	mcxt,
+	  Datum			defaultval,
+	  Oid			jtype,
+	  char		   *json,
+	  bool			json_is_string,
+	  JsonbValue   *jval,
+	  bool		   *isnull);
+
 /* state for json_object_keys */
 typedef struct OkeysState
 {
@@ -216,6 +243,7 @@ typedef struct JhashState
 	HTAB	   *hash;
 	char	   *saved_scalar;
 	char	   *save_json_start;
+	bool		saved_scalar_is_string;
 } JHashState;
 
 /* hashtable element */
@@ -223,26 +251,67 @@ typedef struct JsonHashEntry
 {
 	char		fname[NAMEDATALEN];		/* hash key (MUST BE FIRST) */
 	char	   *val;
-	char	   *json;
 	bool		isnull;
+	bool		isstring;
 } JsonHashEntry;
 
-/* these two are stolen from hstore / record_out, used in populate_record* */
-typedef struct ColumnIOData
+/* structure to cache type I/O metadata needed for populate_scalar() */
+typedef struct ScalarIOData
 {
-	Oid			column_type;
-	Oid			typiofunc;
 	Oid			typioparam;
-	FmgrInfo	proc;
-} ColumnIOData;
+	FmgrInfo	typiofunc;
+} ScalarIOData;
 
-typedef struct RecordIOData
+/* structure to cache metadata needed for populate_array() */
+typedef struct ArrayIOData
+{
+	Oidelement_type;	/* array element 

[HACKERS] Some thoughts about multi-server sync rep configurations

2016-12-27 Thread Thomas Munro
Hi,

Sync rep with multiple standbys allows queries run on standbys to see
transactions that haven't been flushed on the configured number of
standbys.  That means that it's susceptible to lost updates or a kind
of "dirty read" in certain cluster reconfiguration scenarios.  To
close that gap, we would need to introduce extra communication so that
standbys wait for flushes on other servers before a snapshot can be
used, in certain circumstances.  That doesn't sound like a popular
performance feature, and I don't have a concrete proposal for that,
but wanted to raise the topic for discussion to see if others have
thought about it.

I speculate that there are three things that need to be aligned for
multi-server sync rep to be able to make complete guarantees about
durability, and pass the kind of testing that someone like Aphyr of
Jepsen[1] would throw at it.  Of course you might argue that the
guarantees about reconfiguration that I'm assuming in the following
are not explicitly made anywhere, and I'd be very interested to hear
what others have to say about them.  Suppose you have K servers and
you decide that you want to be able to lose N servers without data
loss.  Then as far as I can see the three things are:

1.  While automatic replication cluster management is not part of
Postgres, you must have a manual or automatic procedure with two
important properties in order for synchronous replication to be able
to reconfigure without data loss.  At cluster reconfiguration time on
the loss of the primary, you must be able to contact at least K - N
servers and of those you must promote the server that has the highest
LSN, otherwise there is no way to know that the latest successfully
committed transaction is present in the new timeline.  Furthermore,
you must be able to contact more than K / 2 servers (a majority) to
avoid split-brain syndrome.

2.  The primary must wait for N standby servers to acknowledge
flushing before returning from commit, as we do.

3.  No server must allow a transaction to be visible that hasn't been
flushed on N standby servers.  We already prevent that on the primary,
but not on standbys.  You might see a transaction on a given standby,
then lose that standby and the primary, and then a new primary might
be elected that doesn't have that transaction.  We don't have this
problem if you only run queries on the primary, and we don't have it
on single standby configurations ie K = 2 and N = 1.  But as soon as K
> 2 and N > 1, we can have the problem on standbys.
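
To state the arithmetic in point 1 another way, the reconfiguration
procedure is effectively checking something like the function below (an
illustration of the two conditions only, not proposed code):

static bool
reconfiguration_is_safe(int k, int n, int reachable)
{
	/*
	 * The last committed transaction was flushed on at least n + 1 of the k
	 * servers, so any k - n of them must include at least one copy of it.
	 */
	bool		can_find_latest_lsn = reachable >= k - n;

	/* a strict majority, so two disjoint groups can never both reconfigure */
	bool		have_majority = reachable > k / 2;

	return can_find_latest_lsn && have_majority;
}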

Example:

You have 2 servers in London, 2 in New York and 2 in Tokyo.  You
enable synchronous replication with N = 3.  A transaction updates a
record and commits locally on host "london1", and begins waiting for 3
servers to respond.  A network fault prevents messages from London
reaching the other two data centres, because the rack is on fire.  But
"london2" receives and applies the WAL.  Now another session sees this
transaction on "london2" and reports a fact that it represents to a
customer.  Finally failover software or humans determine that it's
time to promote a new primary server in Tokyo.  The fact reported to
the customer has evaporated; that was a kind of "dirty read" that
might have consequences for your business.

[1] https://aphyr.com/tags/jepsen

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] merging some features from plpgsql2 project

2016-12-27 Thread Pavel Stehule
2016-12-27 23:56 GMT+01:00 Merlin Moncure :

> On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule 
> wrote:
> > Hi
> >
> > I reread ideas described on page https://github.com/trustly/plpgsql2
> >
> > Some points are well and can be benefit for PlpgSQL.
> >
> > First I describe my initial position. I am strongly against introduction
> > "new" language - plpgsql2 or new plpgsql, or any else. The trust of
> > developers to us is important and introduction of any not compatible or
> > different feature has to have really big reason. PostgreSQL is
> conservative
> > environment, and PLpgSQL should not be a exception. More - I have not any
> > information from my customers, colleagues about missing features in this
> > language.  If there is some gaps, then it is in outer environment - IDE,
> > deployment, testing,
>
> Breaking language compatibility is a really big deal.  There has to be
> a lot of benefits to the effort and you have to make translation from
> plpgsql1 to plpgsql2 really simple.  You have made some good points on
> the rationale but not nearly enough to justify implementation fork. So
> basically I agree.  Having said that, If you don't mind I'd like to
> run with the topic (which I'm loosely interpreting as, "Things I'd
> like to do in SQL/PLPGSQL and can't").
>
> #1 problem with plpgsql in my point of view is that the language and
> grammar are not supersets of sql.  A lot of PLPGSQL keywords (EXECUTE,
> BEGIN, INTO, END) have incompatible meanings with our SQL
> implementation.  IMNSHO, SQL ought to give the same behavior inside or
> outside of plpgsql.  It doesn't, and this is one of the reasons why
> plpgsql may not be a good candidate for stored procedure
> implementation.
>

There is a slightly cleaner language for this purpose - SQL/PSM. But it is
hard to switch the main language without a big loss of reputation. I am not
sure about the benefit.


> #2 problem with plpgsql is after function entry it's too late to do
> things like set transaction isolation level and change certain kinds
> of variables (like statement_timeout).  This is very obnoxious, I
> can't wrap the database in an API 100%; the application has to manage
> things that really should be controlled in SQL.
>

It is a long story about implementing procedures - it is not related to
PLpgSQL - the language is not an issue.


>
> #3 problem with plpgsql is complete lack of inlining.  inlining
> function calls in postgres is a black art even for very trivial cases.
> This makes it hard for us to write quick things and in the worst case
> causes endless duplications of simple expressions.


> In short I guess the issue is that we don't have stored procedures and
> I don't see an easy path to getting there with the current language.
> There are a lot of other little annoyances but most of them can be
> solved without a compatibility break.
>

I don't think the implementation of procedures will be simple, but I don't
see any issue in PLpgSQL.


> It would be pretty neat if postgres SQL implementation could directly
> incorporate limited flow control and command execution.  For example,
> CREATE my_proc(Done OUT BOOL) RETURNS BOOL AS
> $$
>   BEGIN;
>   SET transaction_isolation = 'serializable';
>   SELECT some_plpgsql_func_returning_bool();
>   COMMIT;
> $$;
> CALL my_proc() UNTIL Done;
>
> Key points here are:
> *) my_proc is in native SQL (not plpgsql), and run outside of snapshot
> *) CALL is invocation into stored procedure.  I extended it in similar
> fashion as pl/sql CALL
> (https://docs.oracle.com/cd/B19306_01/server.102/b14200/
> statements_4008.htm)
> but anything will do for syntax as long as you get arbitrary control
> of procedure lifetime external to snapshot and transaction
> *) simple addition of UNTIL gets us out of the debate for best 'stored
> procedure language'.   Keeping things to pure SQL really simplifies
> things since we already have statement parsing at tcop level.  We just
> need some special handling for CALL.
> *) In my usage of plpgsql maybe 80% of database cases are covered
> purely in language but maybe 20% of cases need support from
> application typically where threading and transaction management is
> involved.  With the above it would be more like 95% would be covered
> and if you extended CALL to something like:
>

It is similar to my older proposals of stored procedures.


>
> CALL my_proc() IN BACKGROUND UNTIL Done;
>
> ..where "IN BACKGOUND" moved execution to a background worker one
> could do just about everything in SQL in tasks that do nothing but
> read and write to the database that today need significant support
> from outside language (primarily bash for me).
>
> With respect to stuff you mentioned, like smarter handling of INTO,
> are you really sure you need to break compatibility for that?
>

I didn't propose any compatibility break.

Can we talk about the other proposals separately, please: stored procedures,
batch processing, different language 

Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Andreas Seltenreich writes:

> Thomas Munro writes:
>
>> It is safe, as long as the segment remains mapped.  Each backend that
>> attaches calls LWLockRegisterTranche giving it the address of the name
>> in its virtual address space.
>
> Hmok, I was under the impression only backends participating in the IPC
> call the attach function, not necessarily the ones that could possibly
> want to resolve the wait_event_info they found in the procArray via
> pgstat_get_wait_event().

Erm, ignore that question: They'll find a NULL in their
LWLockTrancheArray and run into the "extension" case you mentioned.

> But I really feel like I need to study the code a bit more before
> commenting further…

Following this advise now :-)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Thomas Munro writes:

> On Wed, Dec 28, 2016 at 11:38 AM, Andreas Seltenreich
>  wrote:
>> Thomas Munro writes:
>>
>>> On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro 
>>>  wrote:
 On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich  
 wrote:
> testing master as of fe591f8bf6 produced a crash reading
> pg_stat_activity (backtrace below).  Digging around with with gdb
> revealed that pgstat_get_wait_event() returned an invalid pointer for a
> classId PG_WAIT_LWLOCK.
>
> I think the culprit is dsa.c passing a pointer to memory that goes away
> on dsa_free() as a name to LWLockRegisterTranche.
>> [..]
 Maybe we should replace it with another value when the DSA area is
 detached, using a constant string.  Something like
>>
>> I'm wondering: Is it safe to pass a pointer into a DSA at all?  If I
>> understand the comments correctly, they are not necessarily mapped (at
>> the same address) in an unrelated backend looking into pg_stat_activity,
>> and in this case a dsa_free() is not actually needed to trigger a crash.
>
> It is safe, as long as the segment remains mapped.  Each backend that
> attaches calls LWLockRegisterTranche giving it the address of the name
> in its virtual address space.

Hmok, I was under the impression only backends participating in the IPC
call the attach function, not necessarily the ones that could possibly
want to resolve the wait_event_info they found in the procArray via
pgstat_get_wait_event().

>> Maybe instead of copying the name, just put the passed pointer itself
>> into the area?  Extensions using LWLockNewTrancheId need to use
>> shared_preload_libraries anyway, so static strings would be mapped in
>> all backends.
>
> Yeah that would be another way.  I had this idea that only the process
> that creates a DSA area should name it, and then processes attaching
> would see the existing tranche ID and name, so could use a narrower
> interface.  We could instead do as you say and make processes that
> attach provide a pointer to the name too, and make it the caller's
> problem to ensure that the pointers remain valid long enough; or go
> one step further and make them register/unregister it themselves.

Hmm, turning the member of the control struct
char lwlock_tranche_name[DSA_MAXLEN];
into
const char *lwlock_tranche_name;
and initializing it with the passed static const char * instead of
copying wouldn't require a change of the interface, would it?

But I really feel like I need to study the code a bit more before
commenting further…

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-27 Thread Tomas Vondra

On 12/23/2016 03:58 AM, Amit Kapila wrote:

On Thu, Dec 22, 2016 at 6:59 PM, Tomas Vondra
 wrote:

Hi,

But as discussed with Amit in Tokyo at pgconf.asia, I got access to a
Power8e machine (IBM 8247-22L to be precise). It's a much smaller machine
compared to the x86 one, though - it only has 24 cores in 2 sockets, 128GB
of RAM and less powerful storage, for example.

I've repeated a subset of x86 tests and pushed them to

https://bitbucket.org/tvondra/power8-results-2

The new results are prefixed with "power-" and I've tried to put them right
next to the "same" x86 tests.

In all cases the patches significantly reduce the contention on
CLogControlLock, just like on x86. Which is good and expected.



The results look positive.  Do you think we can conclude based on all
the tests you and Dilip have done, that we can move forward with this
patch (in particular group-update) or do you still want to do more
tests?   I am aware that in one of the tests we have observed that
reducing contention on CLOGControlLock has increased the contention on
WALWriteLock, but I feel we can leave that point as a note to
committer and let him take a final call.  From the code perspective
already Robert and Andres have taken one pass of review and I have
addressed all their comments, so surely more review of code can help,
but I think that is not a big deal considering patch size is
relatively small.



Yes, I believe that seems like a reasonable conclusion. I've done a few 
more tests on the Power machine with data placed on a tmpfs filesystem 
(to minimize all the I/O overhead), but the results are the same.


I don't think more testing is needed at this point, at least not with the 
synthetic test cases we've been using for the testing. The patch already 
received way more benchmarking than most other patches.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Thomas Munro
On Wed, Dec 28, 2016 at 11:38 AM, Andreas Seltenreich
 wrote:
> Thomas Munro writes:
>
>> On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro 
>>  wrote:
>>> On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich  
>>> wrote:
 testing master as of fe591f8bf6 produced a crash reading
 pg_stat_activity (backtrace below).  Digging around with with gdb
 revealed that pgstat_get_wait_event() returned an invalid pointer for a
 classId PG_WAIT_LWLOCK.

 I think the culprit is dsa.c passing a pointer to memory that goes away
 on dsa_free() as a name to LWLockRegisterTranche.
> [..]
>>> Maybe we should replace it with another value when the DSA area is
>>> detached, using a constant string.  Something like
>
> I'm wondering: Is it safe to pass a pointer into a DSA at all?  If I
> understand the comments correctly, they are not necessarily mapped (at
> the same address) in an unrelated backend looking into pg_stat_activity,
> and in this case a dsa_free() is not actually needed to trigger a crash.

It is safe, as long as the segment remains mapped.  Each backend that
attaches calls LWLockRegisterTranche giving it the address of the name
in its virtual address space.  The problem is that after the segment
is unmapped, that address is now garbage, which is clearly a bug.
That's why I started talking about how to 'unregister' it (or register
a replacement constant name).  We have some code that always runs
before the control segment containing the name is detached, so that'd
be the perfect place to do that.

>> That line of thinking suggests another potential solution: go and
>> register the name in RegisterLWLockTranches, and stop registering it
>> in dsa.c.  For other potential uses of DSA including extensions that
>> call LWLockNewTrancheId() we'd have to decide whether to make the name
>> an optional argument, or require those users to register it
>> themselves.
>
> Maybe instead of copying the name, just put the passed pointer itself
> into the area?  Extensions using LWLockNewTrancheId need to use
> shared_preload_libraries anyway, so static strings would be mapped in
> all backends.

Yeah that would be another way.  I had this idea that only the process
that creates a DSA area should name it, and then processes attaching
would see the existing tranche ID and name, so could use a narrower
interface.  We could instead do as you say and make processes that
attach provide a pointer to the name too, and make it the caller's
problem to ensure that the pointers remain valid long enough; or go
one step further and make them register/unregister it themselves.

But I'm starting to think that the best way might be to do BOTH of the
things I said in my previous message: make dsa.c register on
create/attach and also unregister before detaching iff the name was
supplied at creation time for the benefit of extension writers, but
make it not do anything at all about tranche name
registration/unregistration if NULL was passed in at create time.
Then register this particular tranche (LWTRANCHE_PARALLEL_QUERY_DSA)
in every process in RegisterLWLockTranches.  That way, you'd get a
useful name in pg_stat_activity for other backends that are running
parallel queries if they are ever waiting for these locks (unlikely
but interesting to know abotu if it happens).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] merging some features from plpgsql2 project

2016-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule  wrote:
> Hi
>
> I reread ideas described on page https://github.com/trustly/plpgsql2
>
> Some points are well and can be benefit for PlpgSQL.
>
> First I describe my initial position. I am strongly against introduction
> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
> developers to us is important and introduction of any not compatible or
> different feature has to have really big reason. PostgreSQL is conservative
> environment, and PLpgSQL should not be a exception. More - I have not any
> information from my customers, colleagues about missing features in this
> language.  If there is some gaps, then it is in outer environment - IDE,
> deployment, testing,

Breaking language compatibility is a really big deal.  There has to be
a lot of benefits to the effort and you have to make translation from
plpgsql1 to plpgsql2 really simple.  You have made some good points on
the rationale but not nearly enough to justify implementation fork. So
basically I agree.  Having said that, If you don't mind I'd like to
run with the topic (which I'm loosely interpreting as, "Things I'd
like to do in SQL/PLPGSQL and can't").

#1 problem with plpgsql in my point of view is that the language and
grammar are not supersets of sql.  A lot of PLPGSQL keywords (EXECUTE,
BEGIN, INTO, END) have incompatible meanings with our SQL
implementation.  IMNSHO, SQL ought to give the same behavior inside or
outside of plpgsql.  It doesn't, and this is one of the reasons why
plpgsql may not be a good candidate for stored procedure
implementation.

#2 problem with plpgsql is after function entry it's too late to do
things like set transaction isolation level and change certain kinds
of variables (like statement_timeout).  This is very obnoxious, I
can't wrap the database in an API 100%; the application has to manage
things that really should be controlled in SQL.

#3 problem with plpgsql is complete lack of inlining.  inlining
function calls in postgres is a black art even for very trivial cases.
This makes it hard for us to write quick things and in the worst case
causes endless duplications of simple expressions.

In short I guess the issue is that we don't have stored procedures and
I don't see an easy path to getting there with the current language.
There are a lot of other little annoyances but most of them can be
solved without a compatibility break.

It would be pretty neat if postgres SQL implementation could directly
incorporate limited flow control and command execution.  For example,
CREATE my_proc(Done OUT BOOL) RETURNS BOOL AS
$$
  BEGIN;
  SET transaction_isolation = 'serializable';
  SELECT some_plpgsql_func_returning_bool();
  COMMIT;
$$;
CALL my_proc() UNTIL Done;

Key points here are:
*) my_proc is in native SQL (not plpgsql), and run outside of snapshot
*) CALL is invocation into stored procedure.  I extended it in similar
fashion as pl/sql CALL
(https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4008.htm)
but anything will do for syntax as long as you get arbitrary control
of procedure lifetime external to snapshot and transaction
*) simple addition of UNTIL gets us out of the debate for best 'stored
procedure language'.   Keeping things to pure SQL really simplifies
things since we already have statement parsing at tcop level.  We just
need some special handling for CALL.
*) In my usage of plpgsql maybe 80% of database cases are covered
purely in language but maybe 20% of cases need support from
application typically where threading and transaction management is
involved.  With the above it would be more like 95% would be covered
and if you extended CALL to something like:

CALL my_proc() IN BACKGROUND UNTIL Done;

..where "IN BACKGOUND" moved execution to a background worker one
could do just about everything in SQL in tasks that do nothing but
read and write to the database that today need significant support
from outside language (primarily bash for me).

With respect to stuff you mentioned, like smarter handling of INTO,
are you really sure you need to break compatibility for that?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Thomas Munro writes:

> On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro 
>  wrote:
>> On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich  
>> wrote:
>>> testing master as of fe591f8bf6 produced a crash reading
>>> pg_stat_activity (backtrace below).  Digging around with with gdb
>>> revealed that pgstat_get_wait_event() returned an invalid pointer for a
>>> classId PG_WAIT_LWLOCK.
>>>
>>> I think the culprit is dsa.c passing a pointer to memory that goes away
>>> on dsa_free() as a name to LWLockRegisterTranche.
[..]
>> Maybe we should replace it with another value when the DSA area is
>> detached, using a constant string.  Something like

I'm wondering: Is it safe to pass a pointer into a DSA at all?  If I
understand the comments correctly, they are not necessarily mapped (at
the same address) in an unrelated backend looking into pg_stat_activity,
and in this case a dsa_free() is not actually needed to trigger a crash.

> That line of thinking suggests another potential solution: go and
> register the name in RegisterLWLockTranches, and stop registering it
> in dsa.c.  For other potential uses of DSA including extensions that
> call LWLockNewTrancheId() we'd have to decide whether to make the name
> an optional argument, or require those users to register it
> themselves.

Maybe instead of copying the name, just put the passed pointer itself
into the area?  Extensions using LWLockNewTrancheId need to use
shared_preload_libraries anyway, so static strings would be mapped in
all backends.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Thomas Munro
On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro
 wrote:
> On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich
>  wrote:
>> testing master as of fe591f8bf6 produced a crash reading
>> pg_stat_activity (backtrace below).  Digging around with with gdb
>> revealed that pgstat_get_wait_event() returned an invalid pointer for a
>> classId PG_WAIT_LWLOCK.
>>
>> I think the culprit is dsa.c passing a pointer to memory that goes away
>> on dsa_free() as a name to LWLockRegisterTranche.
>
> I see, nice detective work!
>
> dsa.c's create_internal and attach_internal call LWLockRegisterTranche
> with a pointer to a name in a DSM segment (copied from an argument to
> dsa_create), which doesn't have the right extent in this rare race
> case: the DSM segment goes away when the last backend detaches, but
> after you've detached you might still see others waiting for a lock
> that references that tranche ID in pg_stat_activity.  This could be
> fixed by copying the tranche name to somewhere with backend lifetime
> extent in each backend, but then that would need cleanup.  Maybe we
> should replace it with another value when the DSA area is detached,
> using a constant string.  Something like
> LWLockRegisterTranche(tranche_id, "detached_dsa").  That would be a
> new use for the LWLockRegisterTranche function, for
> re-registering/renaming an existing tranche ID.  Thoughts?

Or we could introduce a new function
UnregisterLWLockTranche(tranche_id) that is equivalent to
RegisterLWLockTranche(tranche_id, NULL), and call that when detaching
(ie in the hook that runs before we detach the control segment that
holds the name).  It looks like NULL would result in
GetLWLockIdentifier returning "extension".  I guess before DSA came
along only extensions could create lock tranches not known to every
backend, but perhaps now it should return "unknown"?

It does seem a little odd that we are using a well known tranche ID
defined in lwlock.h (as opposed to an ID obtained at runtime by
LWLockNewTrancheId()) but the tranche name has to be registered at
runtime in each backend and is unknown to other backends.  That line
of thinking suggests another potential solution: go and register the
name in RegisterLWLockTranches, and stop registering it in dsa.c.  For
other potential uses of DSA including extensions that call
LWLockNewTrancheId() we'd have to decide whether to make the name an
optional argument, or require those users to register it themselves.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

2016-12-27 Thread Fabien COELHO


Hello Tom,


How? The issue is that stmtmulti silently skip some ';' when empty
statements are found, [...]


Maybe you should undo that.


I was trying to be minimally invasive in the parser, i.e. not to change 
any rules.


I've generally found that trying to put optimizations into the grammar 
is a bad idea; it's better to KISS in gram.y and apply optimizations 
later on.


I can change the rules, but I'm not sure it will be better. It would allow 
removing the last ';' location in the lexer, which Kyotaro does not like, but 
it would add some new stretching to compute the length of statements and 
remove the empty statements.


I would say that it would be different, but not necessarily better.


- Make all parse nodes have the following two members at the
beginning. This unifies all parse node from the view of setting
their location. Commenting about this would be better.

| NodeTag   type;
| int   location;


I do not like this.  You are making what should be a small patch into
a giant, invasive one.


I would not say that the current patch is giant & invasive, if you 
abstract the two added fields to high-level statements.


I'd think about adding a single parse node type that corresponds to 
"stmt" in the grammar and really doesn't do anything but carry the 
location info for the overall statement.  That info could then be 
transferred into the resulting Query node.


I'm not sure I understand your suggestion. Currently I have added the 
location & length information to all high-level statements nodes, plus 
query and planned. I think that planned is necessary for utility 
statements.


I understand that you suggest to add a new intermediate structure:

  typedef struct {
NodeTag tag;
int location, length;
Node *stmt;
  } ParseNode;

So that raw_parser would return a List of ParseNode instead of a List of
statements, and the statements themselves would be unmodified.



Problem solved with (I think) very little code touched.


Hmmm...

Then raw_parser() callers have to manage a List of ParseNode instead of the 
List of statements, and I'm not sure how far the added indirection level 
would propagate: raw_parser is called 4 times (in parser, tcop, commands, 
plpgsql...). The first call site in tcop alone involves copyObject(), 
equal(), check_log_statement(), errdetail_execute()...


Probably it can be done, but I'm moderately unenthusiastic: it looks like 
starting to unwind a ball of wool, and I'm not sure where it would stop, 
as it means touching at least the 4 uses of raw_parser in 4 distinct parts 
of postgres, whereas the current approach does not change anything for 
raw_parser callers, albeit at the price of modifying all statement 
nodes.


Did I read you correctly?

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hooks

2016-12-27 Thread Jim Nasby

On 12/27/16 1:43 PM, David Fetter wrote:

So I'm a bit suspicious of this project in the first place, but it's
hard to discuss which hooks should be documented when you haven't
defined what you mean by documentation.

I haven't quite come up with that, but I'd pictured a part of the SGML
docs that goes over, at a minimum, what all the hooks are and what
they do, at least at the level of a sentence or paragraph's worth of
description.


AFAIK there's no way to get a list of hooks today, short of something 
like `git grep hook`. I think a simple list of what hooks we have, when 
they fire and where to find them in code would be sufficient.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_activity.waiting_start

2016-12-27 Thread Jim Nasby

On 12/27/16 11:17 AM, Greg Stark wrote:



On Dec 24, 2016 5:44 PM, "Tom Lane" wrote:

I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or
lwlock
paths, and that's just not available at all, let alone portably.


For spinlocks we could conceivably just bite the bullet and use a raw
rdtsc or the equivalent for other platforms. It might be pretty easy to
distinguish sane numbers from numbers that result after a process
reschedule and we could just discard data when that happens (or count
occurrences).

That may possibly work for spinlocks but it won't work for anything
heavier where process reschedules are routine.


Anything heavier and a ~50ns gettimeofday() call is probably not that 
bad anymore...


I also wonder if setting an alarm is cheap enough to happen during a 
spinlock? Set a fairly short alarm on entry, clear it on exit. If the 
alarm fires, do the gettimeofday(). Add the alarm time back in and 
you're going to be close enough to when the wait started for any 
practical use of pg_stat_activity (perhaps this is what Stephen is 
suggesting about deadlock timeout...)
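
Something like the standalone toy below is what I have in mind -- purely an 
illustration, not patch-ready, and whether arming/cancelling the timer is 
itself cheap enough is exactly the open question:

#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

#define ALARM_USEC 1000			/* only bother timing waits longer than 1ms */

static volatile sig_atomic_t wait_is_long;
static struct timespec long_wait_seen_at;

static void
wait_alarm_handler(int signo)
{
	(void) signo;
	/* clock_gettime() is async-signal-safe per POSIX.1-2008, as I read it */
	clock_gettime(CLOCK_MONOTONIC, &long_wait_seen_at);
	wait_is_long = 1;
}

static void
wait_timing_init(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wait_alarm_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);
}

static void
wait_start(void)
{
	struct itimerval t = {{0, 0}, {0, ALARM_USEC}};	/* one-shot timer */

	wait_is_long = 0;
	setitimer(ITIMER_REAL, &t, NULL);
}

static void
wait_end(void)
{
	struct itimerval off = {{0, 0}, {0, 0}};

	setitimer(ITIMER_REAL, &off, NULL);	/* cancel it if it never fired */

	/*
	 * If wait_is_long got set, the wait began roughly ALARM_USEC before
	 * long_wait_seen_at, which is close enough for pg_stat_activity.
	 */
}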

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Thomas Munro
On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich
 wrote:
> testing master as of fe591f8bf6 produced a crash reading
> pg_stat_activity (backtrace below).  Digging around with with gdb
> revealed that pgstat_get_wait_event() returned an invalid pointer for a
> classId PG_WAIT_LWLOCK.
>
> I think the culprit is dsa.c passing a pointer to memory that goes away
> on dsa_free() as a name to LWLockRegisterTranche.

I see, nice detective work!

dsa.c's create_internal and attach_internal call LWLockRegisterTranche
with a pointer to a name in a DSM segment (copied from an argument to
dsa_create), which doesn't have the right extent in this rare race
case: the DSM segment goes away when the last backend detaches, but
after you've detached you might still see others waiting for a lock
that references that tranche ID in pg_stat_activity.  This could be
fixed by copying the tranche name to somewhere with backend lifetime
extent in each backend, but then that would need cleanup.  Maybe we
should replace it with another value when the DSA area is detached,
using a constant string.  Something like
LWLockRegisterTranche(tranche_id, "detached_dsa").  That would be a
new use for the LWLockRegisterTranche function, for
re-registering/renaming an existing tranche ID.  Thoughts?
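
In code form the detach-time fix is tiny -- something like the sketch below,
run from the hook that already fires before the control segment holding the
name is unmapped (the field names are from memory, so treat them as
assumptions):

static void
dsa_rename_tranche_before_detach(dsa_area *area)
{
	/* point the tranche at a constant so later lookups never chase freed memory */
	LWLockRegisterTranche(area->control->lwlock_tranche_id, "detached_dsa");
}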

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Hi,

testing master as of fe591f8bf6 produced a crash reading
pg_stat_activity (backtrace below).  Digging around with with gdb
revealed that pgstat_get_wait_event() returned an invalid pointer for a
classId PG_WAIT_LWLOCK.

I think the culprit is dsa.c passing a pointer to memory that goes away
on dsa_free() as a name to LWLockRegisterTranche.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
(gdb) bt
#1  0x007e03c9 in cstring_to_text (s=0x7fab18d1f954 ) at varlena.c:152
#2  0x00792d7c in pg_stat_get_activity (fcinfo=) at 
pgstatfuncs.c:805
#3  0x005f0af5 in ExecMakeTableFunctionResult (funcexpr=0x5469f90, 
econtext=0x5469c80, argContext=, expectedDesc=0x387b2b0, 
randomAccess=0 '\000') at execQual.c:2216
#4  0x00608633 in FunctionNext (node=node@entry=0x5469b68) at 
nodeFunctionscan.c:94
#5  0x005f2c22 in ExecScanFetch (recheckMtd=0x608390 , 
accessMtd=0x6083a0 , node=0x5469b68) at execScan.c:95
#6  ExecScan (node=node@entry=0x5469b68, accessMtd=accessMtd@entry=0x6083a0 
, recheckMtd=recheckMtd@entry=0x608390 ) at 
execScan.c:180
#7  0x0060867f in ExecFunctionScan (node=node@entry=0x5469b68) at 
nodeFunctionscan.c:268
#8  0x005eb4c8 in ExecProcNode (node=node@entry=0x5469b68) at 
execProcnode.c:449
#9  0x00602cd0 in ExecLimit (node=node@entry=0x54697f0) at 
nodeLimit.c:91
#10 0x005eb368 in ExecProcNode (node=node@entry=0x54697f0) at 
execProcnode.c:531
[...]


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: session server side variables

2016-12-27 Thread Pavel Stehule
2016-12-27 21:38 GMT+01:00 Fabrízio de Royes Mello 
:

>
> On Fri, Dec 23, 2016 at 4:00 PM, Joe Conway  wrote:
> >
> > >> In the long term, What would be the possible scopes?
> > >>
> > >> TRANSACTION, SESSION, PERSISTANT ?
> > >>
> > >> Would some scopes orthogonal (eg SHARED between sessions for a USER
> in a
> > >> DATABASE, SHARED at the cluster level?).
> > >
> > > I have a plan to support TRANSACTION and SESSION scope. Persistent or
> > > shared scope needs much more complex rules, and some specialized
> extensions
> > > will be better.
> >
> >
> > I can see where persistent variables would be very useful though.
> >
>
> Can you talk more about your though??
>
> I'm thinking about PERSISTENT VARIABLES and maybe they can be the redesign
> of our hard coded "reloptions", with the ability to provide users to create
> their own customized. If we think more carefully we already have some
> persistent variables with specialized context: reloptions (hardcoded),
> security labels (local and shared catalog) and comments (local and shared
> catalog). I was clear enough?
>

What is the difference from a table Values(keys varchar primary key, value text)?
Where is the benefit?

Best regards

Pavel



>
> Regards,
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira: http://www.timbira.com.br
> >> Blog: http://fabriziomello.github.io
> >> Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
> >> Github: http://github.com/fabriziomello
>


Re: [HACKERS] proposal: session server side variables

2016-12-27 Thread Fabrízio de Royes Mello
On Fri, Dec 23, 2016 at 4:00 PM, Joe Conway  wrote:
>
> >> In the long term, What would be the possible scopes?
> >>
> >> TRANSACTION, SESSION, PERSISTANT ?
> >>
> >> Would some scopes orthogonal (eg SHARED between sessions for a USER in
a
> >> DATABASE, SHARED at the cluster level?).
> >
> > I have a plan to support TRANSACTION and SESSION scope. Persistent or
> > shared scope needs much more complex rules, and some specialized
extensions
> > will be better.
>
>
> I can see where persistent variables would be very useful though.
>

Can you talk more about your though??

I'm thinking about PERSISTENT VARIABLES and maybe they can be the redesign
of our hard coded "reloptions", with the ability to provide users to create
their own customized. If we think more carefully we already have some
persistent variables with specialized context: reloptions (hardcoded),
security labels (local and shared catalog) and comments (local and shared
catalog). I was clear enough?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Hooks

2016-12-27 Thread David Fetter
On Tue, Dec 27, 2016 at 01:32:46PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > One of our hidden treasures is the hook system, documented only in
> > random presentations, if you can find them, and in the source code, if
> > you know to look.
> 
> > I'd like to document the hooks that we consider public APIs.
> 
> The main reason we send people to the source code for that is that
> it's often not very clear what the extent of a hook's API is.  It would
> not be terribly useful to document, say, planner_hook just by listing
> its arguments and result type.

Indeed it wouldn't, but there are hooks where the API is, at least to
me, a little clearer, and I'll start with those.

> To do anything useful with that hook requires a pretty extensive
> understanding of what the planner does, and you're not going to get
> that without a willingness to read source.

I would assume willingness to read source for anything hook related.
Documenting the hooks will help with places to look.

> So I'm a bit suspicious of this project in the first place, but it's
> hard to discuss which hooks should be documented when you haven't
> defined what you mean by documentation.

I haven't quite come up with that, but I'd pictured a part of the SGML
docs that goes over, at a minimum, what all the hooks are and what
they do, at least at the level of a sentence or paragraph's worth of
description.

> Anyway, there aren't any hooks that weren't put in with the expectation
> of third-party code using them, so I'm not following your proposed
> distinction between public and private hooks.

That answers the question I had, which is essentially, "are there any
private hooks?" and the answer is no.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] psql tab completion for ALTER DEFAULT PRIVILEGES

2016-12-27 Thread Stephen Frost
Gilles, all,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Gilles Darold (gilles.dar...@dalibo.com) wrote:
> > Added to next commitfest. To explain more this patch, the completion of
> > SQL command:
> > 
> > ALTER DEFAULT PRIVILEGES FOR ROLE xxx [tab]
> 
> Here is a cleaned up patch for master and 9.6.  Tomorrow I'll look into
> what we can do for 9.5 and earlier, which are also wrong, but the code
> is quite a bit different.

Just a note that I've now fixed this and back-patched it, per
discussion.  I also closed it out on the commitfest app.

Thanks again!

Stephen




Re: [HACKERS] Hooks

2016-12-27 Thread Tom Lane
David Fetter  writes:
> One of our hidden treasures is the hook system, documented only in
> random presentations, if you can find them, and in the source code, if
> you know to look.

> I'd like to document the hooks that we consider public APIs.

The main reason we send people to the source code for that is that
it's often not very clear what the extent of a hook's API is.  It would
not be terribly useful to document, say, planner_hook just by listing
its arguments and result type.  To do anything useful with that hook
requires a pretty extensive understanding of what the planner does,
and you're not going to get that without a willingness to read source.

So I'm a bit suspicious of this project in the first place, but it's
hard to discuss which hooks should be documented when you haven't
defined what you mean by documentation.

Anyway, there aren't any hooks that weren't put in with the expectation
of third-party code using them, so I'm not following your proposed
distinction between public and private hooks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

2016-12-27 Thread Tom Lane
Fabien COELHO  writes:
> How? The issue is that stmtmulti silently skip some ';' when empty 
> statements are found, so I need to keep track of that to know where to 
> stop, using the beginning location of the next statement, which is 
> probably your idea, does not work.

Maybe you should undo that.  I've generally found that trying to put
optimizations into the grammar is a bad idea; it's better to KISS in
gram.y and apply optimizations later on.

>> - Make all parse nodes have the following two members at the
>> beginning. This unifies all parse node from the view of setting
>> their location. Commenting about this would be better.
>> 
>> | NodeTag   type;
>> | int   location;

I do not like this.  You are making what should be a small patch into
a giant, invasive one.  I'd think about adding a single parse node type
that corresponds to "stmt" in the grammar and really doesn't do anything
but carry the location info for the overall statement.  That info could
then be transferred into the resulting Query node.  Problem solved with
(I think) very little code touched.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hooks

2016-12-27 Thread David Fetter
Folks,

One of our hidden treasures is the hook system, documented only in
random presentations, if you can find them, and in the source code, if
you know to look.

I'd like to document the hooks that we consider public APIs.

To do this, I need to figure out whether there are hooks that we don't
consider public APIs, ideally in some principled way.  C doesn't have
affordances built in for this, but maybe we've done something else to
indicate which are implementation details and which are public APIs.

Are there any hooks I should not document?  If so, how will I tell in
the future that a new hook shouldn't be documented?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Support for pg_receivexlog --format=plain|tar

2016-12-27 Thread Magnus Hagander
On Tue, Dec 27, 2016 at 1:16 PM, Michael Paquier 
wrote:

> On Tue, Dec 27, 2016 at 6:34 PM, Magnus Hagander 
> wrote:
> > On Tue, Dec 27, 2016 at 2:23 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> Magnus, you have mentioned me as well that you had a couple of ideas
> >> on the matter, feel free to jump in and let's mix our thoughts!
> >
> >
> > Yeah, I've been wondering what the actual usecase is here :)
>
> There is value to compress segments finishing with trailing zeros,
> even if they are not the species with the highest representation in
> the WAL archive.
>


Agreed on that part -- that's the value in compression though, and not
necessarily the TAR format itself.

Is there any value of the TAR format *without* compression in your scenario?



> > Though I was considering the case where all segments are streamed into the
> > same tarfile (and then some sort of configurable limit where we'd switch
> > tarfile after  segments, which rapidly started to feel too complicated).
> >
> > What's the actual advantage of having it wrapped inside a single tarfile?
>
> I am advocating for one tar file per segment to be honest. Grouping
> them makes the failure handling more complicated when connection to
> the server is killed, or the replication stream is cut. Well, not
> really complicated actually, because I think that you would need to
> drop in the segment folder a status file with enough information to
> let pg_receivexlog know from where in the tar file it needs to
> continue writing. If a new tarball is created for each segment,
> deciding from where to stream after a connection failure is just a
> matter of doing what is done today: having a look at the completed
> segments and begin streaming from the incomplete/absent one.
>

This pretty much matches up with the conclusion I got to myself as well. We
could create a new tarfile for each restart of pg_receivexlog, but then it
becomes unpredictable.



> >> There are a couple of things that I have been considering as well for
> >> pg_receivexlog. Though they are not directly stick to this thread,
> >> here they are as I don't forget about them:
> >> - Removal of oldest WAL segments on a partition. When writing WAL
> >> segments to a dedicated partition, we could have an option that
> >> automatically removes the oldest WAL segment if the partition is full.
> >> This triggers once a segment is completed.
> >> - Compression of fully-written segments. When a segment is finished
> >> being written, pg_receivexlog could compress them further with gz for
> >> example. With --format=t this leads to segnum.tar.gz being generated.
> >> The advantage of doing those two things in pg_receivexlog is
> >> monitoring. One process to handle them all, and there is no need of
> >> cron jobs to handle any cleanup or compression.
> >
> > I was at one point thinking that would be a good idea as well, but
> recently
> > I've more been thinking that what we should do is implement a
> > "--post-segment-command", which would act similar to archive_command but
> > started by pg_receivexlog. This could handle things like compression, and
> > also integration with external backup tools like backrest or barman in a
> > cleaner way. We could also spawn this without waiting for it to finish
> > immediately, which would allow parallellization of the process. When
> doing
> > the compression inline that rapidly becomes the bottleneck. Unlike a
> > basebackup you're only dealing with the need to buffer 16Mb on disk
> before
> > compressing it, so it should be fairly cheap.
>
> I did not consider the case of barman and backrest to be honest,
> having the view of 2ndQ folks and David would be nice here. Still, the
> main idea behind those done by pg_receivexlog's process would be to
> not spawn a new process. I have a class of users that care about
> things that could hang, they play a lot with network-mounted disks...
> And VMs of course.
>

I have been talking to David about it a couple of times, and he agreed that
it'd be useful to have a post-segment command. We haven't discussed it in
much detail though. I'll add him to direct-cc here to see if he has any
further input :)

It could be that the best idea is to just notify some other process of
what's happening. But making it an external command would give that a lot
of flexibility. Of course, we need to be careful not to put ourselves back
in the position we are in with archive_command, in that it's very difficult
to write a good one.

I'm sure everybody cares about things that could hang. But everything can
hang...



> > Another thing I've been considering in the same area would be to add the
> > ability to write the segments to a pipe instead of a directory. Then you
> > could just pipe it into gzip without the need to buffer on disk. This would
> > kill the ability to know at which point we'd sync()ed to disk, but in most
> > cases so will doing direct gzip. Just means we couldn't support this in
> > sync mode.

Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

2016-12-27 Thread Fabien COELHO



An additional comment on parser(planner?) part.

 This patch make planner() copy the location and length from
 parse to result, but copying such stuffs is a job of
 standard_planner.


I put the copy in planner because standard_planner may not be called by 
planner, and in all cases I think that the fields should be copied...


Otherwise it is the responsibility of the planner hook to do the copy, which
would be ok if the planner hook calls standard_planner, but that is not
guaranteed; the comments say "the plugin would normally call
standard_planner". What if it does not?


So it seemed safer this way.


Then the following is a comment on pg_stat_statements.c

- pg_stat_statements.c:977 - isParseNode(node)
 node should be parsenode.


Could be. ParseNode is a Node, it is just a pointer cast. I can assert on 
pn instead of node.



- The name for the addional parameter query_loc is written as
 query_len in its prototype.


Indeed, a typo on "generate_normalized_query" prototype.


- In pgss_store, "else if (strlen(query) != query_len)" doesn't
 work as expected because of an off-by-one error. query_len here is
 the length of the query *excluding* the last semicolon.


It was somehow intentional: if the query is not the same as the string, 
then a copy is performed to have the right null-terminated string...

but


- qtext_store doesn't rely on terminating nul character and the
 function already takes query length as a parameter. So, there's
 no need to copy the partial debug_query_string into palloc'ed
 memory.  Just replacing the query with query_loc will be
 sufficient.


Hmmm... it seemed good so I tried it, and it did not work...

The subtle reason is that qtext_store does assume that the query string is
null-terminated... it just does not recompute its length.


However it can be changed to behave correctly in this case, so as to avoid 
the copy.
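
For reference, as long as the copy is kept it does not need more than
something like this (a sketch only, using the query_loc/query_len variables
discussed above):

/* null-terminated palloc'd copy of just this statement's text */
char	   *stmt_str = pnstrdup(debug_query_string + query_loc, query_len);

/* ... hand stmt_str / query_len to qtext_store, then pfree(stmt_str) ... */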


--
Fabien.




Re: [HACKERS] pg_stat_activity.waiting_start

2016-12-27 Thread Greg Stark
On Dec 24, 2016 5:44 PM, "Tom Lane"  wrote:

I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or lwlock
paths, and that's just not available at all, let alone portably.


For spinlocks we could conceivably just bite the bullet and use a raw rdtsc
or the equivalent for other platforms. It might be pretty easy to
distinguish sane numbers from numbers that result after a process
reschedule and we could just discard data when that happens (or count
occurrences).

That may possibly work for spinlocks but it won't work for anything heavier
where process reschedules are routine.
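
On x86 the raw read itself is trivial; something like this sketch (other
architectures would need their own counter, and the caller still has to
discard implausible deltas as described above):

#include <stdint.h>
#include <x86intrin.h>			/* __rdtsc() on GCC/clang, x86 only */

/* Raw cycle counter; compare start/end and throw away absurd deltas,
 * which likely indicate a reschedule or CPU migration. */
static inline uint64_t
read_cycle_counter(void)
{
	return __rdtsc();
}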


Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

2016-12-27 Thread Fabien COELHO


Hello Kyotaro-san,


In nonterminal stmtmulti, setQueryLocation is added and used to
calcualte and store the length of every stmt's. This needs an
extra storage in bse_yy_extra_type


The extra storage is one int.

and introduces a bit complicated stuff. But I think raw_parser can do 
that without such extra storage and labor,


How? The issue is that stmtmulti silently skips some ';' when empty
statements are found, so I need to keep track of that to know where to
stop; using the beginning location of the next statement, which is
probably your idea, does not work.



then gram.y gets far simpler.




The struct member to store the location and length is named
'qlocation', and 'qlength' in the added ParseNode. But many nodes
already have 'location' of exactly the same purpopse. I don't see
a necessity to differentiate them.


Alas, the "CreateTableSpaceStmt" struct already had a "char *location" 
that I did not want to change... If I use "location", then I have to 
change it, why not...


Another reason for the name difference is that qlocation/qlength 
convention is slightly different from location when not set: location is 
set manually to -1 when the information is not available, but this 
convention is not possible for statements without changing all their 
allocations and initializations (there are hundreds...), so the convention 
I used for qlocation/qlength is that it is not set if qlength is zero, 
which is the default value set by makeNode, so no change was needed...


Changing this point would mean a *lot* of possibly error prone changes in 
the parser code everywhere statements are allocated...



Putting the two above together, the following is my suggestion
for the parser part.

- Make all parse nodes have the following two members at the
 beginning. This unifies all parse node from the view of setting
 their location. Commenting about this would be better.

 | NodeTag   type;
 | int   location;


Currently only a few parse nodes have locations, those about variables and 
constants that can be designated by an error message. I added the 
information to about 100 statements, but for the many remaining parse 
nodes the information does not exist and is not needed, so I would prefer 
to avoid adding a field both unset and unused.



- Remove struct ParseNode


"ParseNode" is needed to access the location and length of statements, 
otherwise they are only "Node", which does not have a location.



and setQueryLocation.


The function is called twice; I wanted to avoid duplicating code.

stmtmulti sets location in the same manner to other kind of nodes, just 
doing '= @n'. base_yyparse() doesn't calculate statement length.


But...


- Make raw_parser caluclate statement length then store it into
 each stmt scanning the yyextra.parsetree.


... I cannot calculate the statement length cleanly because of empty
statements. Knowing where the last ';' is requires being at the reduction,
so the length computation must happen when the reduction occurs, hence the
function called from the stmtmulti rule.



What do you think about this?


I think that I do not know how to compute the length without knowing where 
the last ';' was, because of how empty statements are silently skipped 
around the stmtmulti/stmt rules, so I think that the length computation 
should stay where it is.


What I can certainly do is:

 - add more comments to explain why it is done like that


What I could do with some inputs from reviewers/committers is:

 - rename the "char *location" field of the create table space statement
   to "directory" or something else... but what?

 - change qlocation/qlength to location/length...

   However, then the -1 convention for location not set is not true, which
   is annoying. Using this convention means hundreds of changes because every
   statement allocation & initialization must be changed. This is
   obviously possible, but is a much larger patch, which I cannot say
   would be much better than this one, it would just be different.

What I'm reserved about:

 - adding a location field to nodes that do not need it.

--
Fabien.




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-27 Thread Claudio Freire
On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova
 wrote:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
> vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
> 1417tblk =
> ItemPointerGetBlockNumber(>dead_tuples[tupindex]);
> (gdb) bt
> #0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
> vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
> #1  0x00693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9,
> vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001')
> at vacuumlazy.c:1337
> #2  0x00691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9,
> params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290
> #3  0x0069191f in vacuum_rel (relid=1247, relation=0x0, options=9,
> params=0x7ffe0f866310) at vacuum.c:1418


Those line numbers don't match my code.

Which commit are you based on?

My tree is (currently) based on 71f996d22125eb6cfdbee6094f44370aa8dec610




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-27 Thread Claudio Freire
On Tue, Dec 27, 2016 at 10:54 AM, Alvaro Herrera
 wrote:
> Anastasia Lubennikova wrote:
>
>> I ran configure using following set of flags:
>>  ./configure --enable-tap-tests --enable-cassert --enable-debug
>> --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
>> And then ran make check. Here is the stacktrace:
>>
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
>> vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
>> 1417tblk =
>> ItemPointerGetBlockNumber(>dead_tuples[tupindex]);
>
> This doesn't make sense, since the patch removes the "tupindex"
> variable in that function.

The variable is still there. It just has a slightly different meaning
(index within the current segment, rather than global index).


On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova
 wrote:
> 23.12.2016 22:54, Claudio Freire:
>
> On Fri, Dec 23, 2016 at 1:39 PM, Anastasia Lubennikova
>  wrote:
>
> I found the reason. I configure postgres with CFLAGS="-O0" and it causes
> Segfault on initdb.
> It works fine and passes tests with default configure flags, but I'm pretty
> sure that we should fix segfault before testing the feature.
> If you need it, I'll send a core dump.
>
> I just ran it with CFLAGS="-O0" and it passes all checks too:
>
> CFLAGS='-O0' ./configure --enable-debug --enable-cassert
> make clean && make -j8 && make check-world
>
> A stacktrace and a thorough description of your build environment
> would be helpful to understand why it breaks on your system.
>
>
> I ran configure using following set of flags:
>  ./configure --enable-tap-tests --enable-cassert --enable-debug
> --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
> And then ran make check. Here is the stacktrace:

Same procedure runs fine on my end.

> core file is quite big, so I didn't attach it to the mail. You can download 
> it here: core dump file.

Can you attach your binary as well? (it needs to be identical to be
able to inspect the core dump, and quite clearly my build is
different)

I'll keep looking for ways it could crash there, but being unable to
reproduce the crash is a big hindrance, so if you can send the binary
that could help speed things up.


On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova
 wrote:
> 1. prefetchBlkno = blkno & ~0x1f;
> prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
>
> I didn't get why we need these tricks. How does it differ from:
> prefetchBlkno = (blkno > 32) ? blkno - 32 : 0;

It makes all prefetch ranges 32 blocks long and aligned to 32-block boundaries.

It helps since it's at 32 block boundaries that the truncate stops to
check for locking conflicts and abort, guaranteeing no prefetch will
be needless (if it goes into that code it *will* read the next 32
blocks).

> 2. Why do we decrease prefetchBlkno twice?
>
> Here:
> +prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
> And here:
> if (prefetchBlkno >= 32)
> +prefetchBlkno -= 32;

The first one is outside the loop, it's finding the first prefetch
range that is boundary aligned, taking care not to cause underflow.

The second one is inside the loop, it's moving the prefetch window
down as the truncate moves along. Since it's already aligned, it
doesn't need to be realigned, just clamped to zero if it happens to
reach the bottom.
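
Spelled out, the window arithmetic amounts to this (a sketch of the
arithmetic only, not the patch's actual truncate loop):

#include "postgres.h"
#include "storage/block.h"

/* Round blkno down to a 32-block boundary, then step one window back,
 * clamping at zero; each prefetch then covers exactly the 32-block
 * range the truncation scan is about to visit. */
static BlockNumber
prev_prefetch_window(BlockNumber blkno)
{
	BlockNumber aligned = blkno & ~((BlockNumber) 0x1f);

	return (aligned > 32) ? aligned - 32 : 0;
}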




Re: [HACKERS] gettimeofday is at the end of its usefulness?

2016-12-27 Thread Tom Lane
Andres Freund  writes:
> On 2016-12-27 01:35:05 +, Greg Stark wrote:
>> On Dec 26, 2016 10:35 PM, "Tom Lane"  wrote:
>>> So it seems like the configure support we'd need is to detect
>>> whether clock_gettime is available (note on Linux there's also
>>> a library requirement, -lrt), and we would also need a way to
>>> provide a platform-specific choice of clockid; we at least need
>>> enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

>> This seems like something that really should be checked at runtime.

> I'm pretty strongly against doing performance measurements at
> startup. Both the delay and the potential for differing test results
> seem like pretty bad consequences.

Yeah, that doesn't sound great to me either.  And I don't entirely
see the point, at least not with what we know now.  I am a bit concerned
that we'll find out there are popular platforms where clock_gettime
compiles but fails with ENOSYS, or some similarly unhelpful behavior.
But we won't find that out if we don't try.
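
For reference, the call being wrapped by that configure support is tiny;
something like this sketch, where the clockid would be chosen per platform
at build time (e.g. CLOCK_MONOTONIC_RAW where that is the cheap one):

#include <stdint.h>
#include <time.h>

/* Monotonic nanosecond timestamp; a zero return would signal the
 * ENOSYS-style failure worried about above, and the caller could
 * fall back to gettimeofday(). */
static inline uint64_t
monotonic_nanoseconds(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return (uint64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}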

regards, tom lane




Re: [HACKERS] Parallel Index Scans

2016-12-27 Thread Amit Kapila
On Fri, Dec 23, 2016 at 5:48 PM, Rahila Syed  wrote:
>>> 5. Comment for _bt_parallel_seize() says:
>>> "False indicates that we have reached the end of scan for
>>>  current scankeys and for that we return block number as P_NONE."
>>>
>>>  What is the reason to check (blkno == P_NONE) after checking (status ==
>>> false)
>>>  in _bt_first() (see code below)? If comment is correct
>>>  we'll never reach _bt_parallel_done()
>>>
> >> +   blkno = _bt_parallel_seize(scan, &status);
>>> +   if (status == false)
>>> +   {
>>> +   BTScanPosInvalidate(so->currPos);
>>> +   return false;
>>> +   }
>>> +   else if (blkno == P_NONE)
>>> +   {
>>> +   _bt_parallel_done(scan);
>>> +   BTScanPosInvalidate(so->currPos);
>>> +   return false;
>>> +   }
>>>
>>The first time master backend or worker hits last page (calls this
>>API), it will return P_NONE and after that any worker tries to fetch
>>next page, it will return status as false.  I think we can expand a
>>comment to explain it clearly.  Let me know, if you need more
>>clarification, I can explain it in detail.
>
> Probably this was confusing because we have not mentioned
> that P_NONE can be returned even when status = TRUE and
> not just when status is false.
>
> I think, the comment above the function can be modified as follows,
>
> + /*
> + * True indicates that the block number returned is either valid including
> P_NONE
> + * and scan is continued or block number is invalid and scan has just
> + * begun.
>

I think the modification (including P_NONE and scan is continued)
suggested by you can confuse the reader, because if the returned block
number is P_NONE, then we don't continue the scan.  I have used
slightly different words in the patch I have just posted, please check
and see if that looks fine to you.
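
Restated with comments (this is just the code quoted above, annotated; no
new logic):

blkno = _bt_parallel_seize(scan, &status);
if (status == false)
{
	/* another participant already reached the end of the scan */
	BTScanPosInvalidate(so->currPos);
	return false;
}
else if (blkno == P_NONE)
{
	/* we are the first to hit the end: mark the parallel scan done */
	_bt_parallel_done(scan);
	BTScanPosInvalidate(so->currPos);
	return false;
}
/* otherwise blkno is either a valid page to continue from, or an
 * invalid block number meaning the scan has only just begun */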

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Index Scans

2016-12-27 Thread Amit Kapila
On Fri, Dec 23, 2016 at 6:42 PM, Anastasia Lubennikova
 wrote:
> 22.12.2016 07:19, Amit Kapila:
>>
>> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
>>  wrote:
>>>
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  tested, passed
>>> Implements feature:   tested, passed
>>> Spec compliant:   tested, passed
>>> Documentation:tested, passed
>>>
>>> Hi, thank you for the patch.
>>> Results are very promising. Do you see any drawbacks of this feature or
>>> something that requires more testing?
>>>
>> I think you can focus on the handling of array scan keys for testing.
>> In general, one of my colleagues has shown interest in testing this
>> patch and I think he has tested as well but never posted his findings.
>> I will request him to share his findings and what kind of tests he has
>> done, if any.
>
>
>
> Check please code related to buffer locking and pinning once again.
> I got the warning. Here are the steps to reproduce it:
> Except "autovacuum = off" config is default.
>
> pgbench -i -s 100 test
> pgbench -c 10 -T 120 test
>
> SELECT count(aid) FROM pgbench_accounts
> WHERE aid > 1000 AND aid < 90 AND bid > 800 AND bid < 900;
> WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469,
> flags=0x9380, refcount=1 1)
>  count
>

A similar problem occurred while testing the "parallel index only
scan" patch, and Rafia has included the fix in her patch [1], which
ideally should be included in this patch, so I have copied the fix
from her patch.  Apart from that, I observed that a similar problem can
happen for backward scans, so I have fixed that as well.

>
>>> 2. Don't you mind to rename 'amestimateparallelscan' to let's say
>>> 'amparallelscan_spacerequired'
>>> or something like this?
>>
>> Sure, I am open to other names, but IMHO, lets keep "estimate" in the
>> name to keep it consistent with other parallel stuff. Refer
>> execParallel.c to see how widely this word is used.
>>
>>> As far as I understand there is nothing to estimate, we know this size
>>> for sure. I guess that you've chosen this name because of
>>> 'heap_parallelscan_estimate'.
>>> But now it looks similar to 'amestimate' which refers to indexscan cost
>>> for optimizer.
>>> That leads to the next question.
>>>
>> Do you mean 'amcostestimate'?  If you want we can rename it
>> amparallelscanestimate to be consistent with amcostestimate.
>
>
> I think that 'amparallelscanestimate' seems less ambiguous than
> amestimateparallelscan.
> But it's up to you. There are enough comments to understand the purpose of
> this field.
>

Okay, then lets leave as it is, because we have aminitparallelscan
which should also be renamed to amparallelscaninit if we rename
amestimateparallelscan.

>>
>>> And why do you call it pageStatus? What does it have to do with page?
>>>
>> During scan this tells us whether next page is available for scan.
>> Another option could be to name it as scanStatus, but not sure if that
>> is better.  Do you think if we add a comment like "indicates whether
>> next page is available for scan" for this variable then it will be
>> clear?
>
>
> Yes, I think it describes the flag better.

Changed as per above suggestion.

>>>
>>> 5. Comment for _bt_parallel_seize() says:
>>> "False indicates that we have reached the end of scan for
>>>   current scankeys and for that we return block number as P_NONE."
>>>
>>>   What is the reason to check (blkno == P_NONE) after checking (status ==
>>> false)
>>>   in _bt_first() (see code below)? If comment is correct
>>>   we'll never reach _bt_parallel_done()
>>>
> >>> +   blkno = _bt_parallel_seize(scan, &status);
>>> +   if (status == false)
>>> +   {
>>> +   BTScanPosInvalidate(so->currPos);
>>> +   return false;
>>> +   }
>>> +   else if (blkno == P_NONE)
>>> +   {
>>> +   _bt_parallel_done(scan);
>>> +   BTScanPosInvalidate(so->currPos);
>>> +   return false;
>>> +   }
>>>
>> The first time master backend or worker hits last page (calls this
>> API), it will return P_NONE and after that any worker tries to fetch
>> next page, it will return status as false.  I think we can expand a
>> comment to explain it clearly.  Let me know, if you need more
>> clarification, I can explain it in detail.
>>
> Got it,
> I think you can add this explanation to the comment for
> _bt_parallel_seize().
>

Expanded the comment as discussed.

>>> 6. To avoid code duplication, I would wrap this into the function
>>>
>>> +   /* initialize moreLeft/moreRight appropriately for scan direction
>>> */
>>> +   if (ScanDirectionIsForward(dir))
>>> +   {
>>> +   so->currPos.moreLeft = false;
>>> +   so->currPos.moreRight = true;
>>> +   }

Re: [HACKERS] Reporting planning time with EXPLAIN

2016-12-27 Thread Stephen Frost
Ashutosh,

* Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote:
> We report planning and execution time when EXPLAIN ANALYZE is issued.
> We do not have facility to report planning time as part EXPLAIN
> output. In order to get the planning time, one has to issue EXPLAIN
> ANALYZE which involves executing the plan, which is unnecessary.

+1, that seems like a useful thing to have.

> The discussion referred to seems to be [1]. Here's patch to expose the
> "summary" option as mentioned in the last paragraph of above commit
> message. Right now I have named it as "summary", but I am fine if we
> want to change it to something meaningful. "timing" already has got
> some other usage, so can't use it here.

After reading that, rather long, thread, I agree that just having it be
'summary' is fine.  We don't really want to make it based off of
'timing' or 'costs' or 'verbose', those are different things.

I've only briefly looked at the patch so far, but it seemed pretty
straight-forward.  If there aren't objections, I'll see about getting
this committed later this week.

I will point out that it'd still be nice to have something like
'explain (I WANT IT ALL)', but that's a different feature and has its
own challenges, so let's not argue about it here.

Thanks!

Stephen




Re: [HACKERS] pg_stat_activity.waiting_start

2016-12-27 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> In practice, there should never be waits on LWLocks (much less spinlocks)
> that exceed order-of-milliseconds; if there are, either we chose the wrong
> lock type or the system is pretty broken in general.  So maybe it's
> sufficient if we provide a wait start time for heavyweight locks ...
> though that still seems kind of ugly.

While I agree that it's a bit ugly, if the alternative is "don't have
anything", then I'd argue that it's worth it.  The use-case for this
feature, as I see it, is for admins to be able to go look at how long
something has been waiting and for monitoring scripts which fire only
every minute or more, and order-of-milliseconds differences aren't
significant there.

It's terribly ugly, but from a practical standpoint, we could probably
make it "waiting after deadlock timeout" and just set the time when the
deadlock timeout fires and the use-case for this would be satisfied.

Thanks!

Stephen




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-27 Thread Alvaro Herrera
Anastasia Lubennikova wrote:

> I ran configure using following set of flags:
>  ./configure --enable-tap-tests --enable-cassert --enable-debug
> --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
> And then ran make check. Here is the stacktrace:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
> vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
> 1417tblk =
> ItemPointerGetBlockNumber(>dead_tuples[tupindex]);

This doesn't make sense, since the patch removes the "tupindex"
variable in that function.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-27 Thread Anastasia Lubennikova

23.12.2016 22:54, Claudio Freire:

On Fri, Dec 23, 2016 at 1:39 PM, Anastasia Lubennikova
  wrote:

I found the reason. I configure postgres with CFLAGS="-O0" and it causes
Segfault on initdb.
It works fine and passes tests with default configure flags, but I'm pretty
sure that we should fix segfault before testing the feature.
If you need it, I'll send a core dump.

I just ran it with CFLAGS="-O0" and it passes all checks too:

CFLAGS='-O0' ./configure --enable-debug --enable-cassert
make clean && make -j8 && make check-world

A stacktrace and a thorough description of your build environment
would be helpful to understand why it breaks on your system.


I ran configure using following set of flags:
 ./configure --enable-tap-tests --enable-cassert --enable-debug 
--enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"

And then ran make check. Here is the stacktrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, 
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
1417tblk = 
ItemPointerGetBlockNumber(>dead_tuples[tupindex]);

(gdb) bt
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, 
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
#1  0x00693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9, 
vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001')

at vacuumlazy.c:1337
#2  0x00691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9, 
params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290
#3  0x0069191f in vacuum_rel (relid=1247, relation=0x0, 
options=9, params=0x7ffe0f866310) at vacuum.c:1418
#4  0x00690122 in vacuum (options=9, relation=0x0, relid=0, 
params=0x7ffe0f866310, va_cols=0x0, bstrategy=0x1f1c4a8,

isTopLevel=1 '\001') at vacuum.c:320
#5  0x0068fd0b in vacuum (options=-1652367447, relation=0x0, 
relid=3324614038, params=0x1f11bf0, va_cols=0xb59f63,

bstrategy=0x1f1c620, isTopLevel=0 '\000') at vacuum.c:150
#6  0x00852993 in standard_ProcessUtility (parsetree=0x1f07e60, 
queryString=0x1f07468 "VACUUM FREEZE;\n",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0xea5cc0 
, completionTag=0x7ffe0f866750 "") at utility.c:669
#7  0x008520da in standard_ProcessUtility 
(parsetree=0x401ef6cd8, queryString=0x18 address 0x18>,
context=PROCESS_UTILITY_TOPLEVEL, params=0x68, dest=0x9e5d62 
, completionTag=0x7ffe0f8663f0 "`~\360\001")

at utility.c:360
#8  0x00851161 in PortalRunMulti (portal=0x7ffe0f866750, 
isTopLevel=0 '\000', setHoldSnapshot=-39 '\331',
dest=0x851161 , altdest=0x7ffe0f8664f0, 
completionTag=0x1f07e60 "\341\002") at pquery.c:1219
#9  0x00851374 in PortalRunMulti (portal=0x1f0a488, isTopLevel=1 
'\001', setHoldSnapshot=0 '\000', dest=0xea5cc0 ,
altdest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at 
pquery.c:1345
#10 0x00850889 in PortalRun (portal=0x1f0a488, 
count=9223372036854775807, isTopLevel=1 '\001', dest=0xea5cc0 ,
altdest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at 
pquery.c:824
#11 0x0084a4dc in exec_simple_query (query_string=0x1f07468 
"VACUUM FREEZE;\n") at postgres.c:1113
#12 0x0084e960 in PostgresMain (argc=10, argv=0x1e60a50, 
dbname=0x1e823b0 "template1", username=0x1e672a0 "anastasia")

at postgres.c:4091
#13 0x006f967e in init_locale (categoryname=0x100 
,
category=32766, locale=0xa004692f0 address 0xa004692f0>) at main.c:310
#14 0x7f1e5f463830 in __libc_start_main (main=0x6f93e1 , 
argc=10, argv=0x7ffe0f866a78, init=,
fini=, rtld_fini=, 
stack_end=0x7ffe0f866a68) at ../csu/libc-start.c:291

#15 0x00469319 in _start ()

core file is quite big, so I didn't attach it to the mail. You can
download it here: core dump file.


Here are some notes about the first patch:

1. prefetchBlkno = blkno & ~0x1f;
prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;

I didn't get why we need these tricks. How does it differ from:
prefetchBlkno = (blkno > 32) ? blkno - 32 : 0;

2. Why do we decrease prefetchBlkno twice?

Here:
+prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
And here:
if (prefetchBlkno >= 32)
+prefetchBlkno -= 32;


I'll inspect second patch in a few days and write questions about it.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] ALTER TABLE parent SET WITHOUT OIDS and the oid column

2016-12-27 Thread Ashutosh Bapat
On Mon, Dec 26, 2016 at 3:36 PM, Amit Langote
 wrote:
> I suspect the following is a bug:
>
> create table foo (a int) with oids;
> CREATE TABLE
> create table bar (a int);
> CREATE TABLE
> alter table bar inherit foo;
> ERROR:  table "bar" without OIDs cannot inherit from table "foo" with OIDs
>
> alter table bar set with oids;
> ALTER TABLE
> alter table bar inherit foo;
> ALTER TABLE
>
> alter table foo set without oids;
> ERROR:  relation 16551 has non-inherited attribute "oid"
>
> Because:
>
> select attinhcount from pg_attribute where attrelid = 'bar'::regclass and
> attname = 'oid';
>  attinhcount
> -
>0
> (1 row)
>
> Which also means "oid" can be safely dropped from bar breaking the
> invariant that if the parent table has oid column, its child tables must too:
>
> alter table bar drop oid;  -- or, alter table bar set without oids;
> ALTER TABLE
>
> Attached patches modifies MergeAttributesIntoExisting() such that we
> increment attinhcount not only for user attributes, but also for the oid
> system column if one exists.
>
> Thoughts?

Yes, if a child inherits from a parent with OID, child gets OID column
and its inhcount is set to 1.
postgres=# create table foo (a int) with oids;
CREATE TABLE
postgres=# create table bar() inherits (foo);
CREATE TABLE
postgres=# \d+ foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
+-+---+--+-+-+--+-
 a  | integer |   |  | | plain   |  |
Child tables: bar
Has OIDs: yes

postgres=# \d+ bar
Table "public.bar"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
+-+---+--+-+-+--+-
 a  | integer |   |  | | plain   |  |
Inherits: foo
Has OIDs: yes

postgres=# select attname, attinhcount from pg_attribute where
attrelid = 'bar'::regclass;
 attname  | attinhcount
--+-
 tableoid |   0
 cmax |   0
 xmax |   0
 cmin |   0
 xmin |   0
 oid  |   1
 ctid |   0
 a|   1
(8 rows)

So, I think your patch is on the right track.

We allow creating a user attribute named "oid", so you do not want to
search for the system attribute oid by name. Instead, search by the
attribute number ObjectIdAttributeNumber.
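
i.e. something along these lines in MergeAttributesIntoExisting() (a sketch
only, not the actual patch; the variable names and surrounding checks are
assumed, and the pg_attribute update itself is elided into a comment):

if (tupleDesc->tdhasoid)
{
	HeapTuple	tuple;

	/* look the child's oid column up by attribute number, not by name */
	tuple = SearchSysCacheCopyAttNum(RelationGetRelid(child_rel),
									 ObjectIdAttributeNumber);
	if (HeapTupleIsValid(tuple))
	{
		Form_pg_attribute childatt = (Form_pg_attribute) GETSTRUCT(tuple);

		childatt->attinhcount++;
		/* ... write the tuple back to pg_attribute and update indexes ... */
		heap_freetuple(tuple);
	}
}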

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2016-12-27 Thread Pavan Deolasee
On Mon, Dec 26, 2016 at 11:49 AM, Jaime Casanova <
jaime.casan...@2ndquadrant.com> wrote:

> On 2 December 2016 at 07:36, Pavan Deolasee 
> wrote:
> >
> > I've updated the patches after fixing the issue. Multiple rounds of
> > regression passes for me without any issue. Please let me know if it
> works
> > for you.
> >
>
> Hi Pavan,
>
> Today i was playing with your patch and running some tests and found
> some problems i wanted to report before i forget them ;)
>
>
Thanks Jaime for the tests and bug reports. I'm attaching an add-on patch
which fixes these issues for me. I'm deliberately not sending a fresh
revision because the changes are still minor.


> * You need to add a prototype in src/backend/utils/adt/pgstatfuncs.c:
> extern Datum pg_stat_get_tuples_warm_updated(PG_FUNCTION_ARGS);
>

Added.


>
> * The isolation test for partial_index fails (attached the
> regression.diffs)
>

Fixed. Looks like I forgot to include attributes from predicates and
expressions in the list of index attributes (as pointed by Alvaro)


>
> * running a home-made test i have at hand i got this assertion:
> """
> TRAP: FailedAssertion("!(buf_state & (1U << 24))", File: "bufmgr.c", Line:
> 837)
> LOG:  server process (PID 18986) was terminated by signal 6: Aborted
> """
> To reproduce:
> 1) run prepare_test.sql
> 2) then run the following pgbench command (sql scripts attached):
> pgbench -c 24 -j 24 -T 600 -n -f inserts.sql@15 -f updates_1.sql@20 -f
> updates_2.sql@20 -f deletes.sql@45 db_test
>
>
Looks like the patch was failing to set the block number correctly in the
t_ctid field, leading to these strange failures. There were also a couple of
instances where the t_ctid field was being accessed directly, instead of
through the newly added macro. I think we need some better mechanism to
ensure that we don't miss out on such things, but I don't have a very good
idea about that right now.


>
> * sometimes when i have made the server crash the attempt to recovery
> fails with this assertion:
> """
> LOG:  database system was not properly shut down; automatic recovery in
> progress
> LOG:  redo starts at 0/157F970
> TRAP: FailedAssertion("!(!warm_update)", File: "heapam.c", Line: 8924)
> LOG:  startup process (PID 14031) was terminated by signal 6: Aborted
> LOG:  aborting startup due to startup process failure
> """
> still cannot reproduce this one consistently but happens often enough
>
>
This could be a case of an uninitialised variable in log_heap_update(). What
surprises me, though, is that none of the compilers I tried so far could
catch that. In the following code snippet, if the condition evaluates to
false then "warm_update" may remain uninitialised, leading to a wrong xlog
entry, which may later result in an assertion failure during redo recovery.

7845
7846 if (HeapTupleIsHeapWarmTuple(newtup))
7847 warm_update = true;
7848
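
i.e. presumably nothing more than initializing the flag (sketch):

bool		warm_update = false;	/* no WARM update unless proven below */

/* ... later, unchanged ... */
if (HeapTupleIsHeapWarmTuple(newtup))
	warm_update = true;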

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0003_warm_fixes_v6.patch



Re: [HACKERS] PATCH: recursive json_populate_record()

2016-12-27 Thread Aleksander Alekseev
Hello.

I've noticed that this patch is on CF and needs a reviewer so I decided
to take a look. Code looks good to me in general, it's well covered by
tests and passes `make check-world`.

However, it would be nice to have a few more comments. In my opinion
every procedure has to have at least a short description - what it
does, what arguments it receives and what it returns, even if it's a
static procedure. The same applies to structures and their fields.

Another thing that bothers me is a FIXME comment:

```
tupdesc = lookup_rowtype_tupdesc(type, typmod); /* FIXME cache */
```

I suggest removing it or implementing the caching here as planned.
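
If the caching is wanted, the usual pattern is to stash the descriptor in
fn_extra, roughly like this (a sketch; it assumes fcinfo is in scope and that
the function always sees the same composite type, otherwise the cached key
has to be checked as well):

TupleDesc	tupdesc = (TupleDesc) fcinfo->flinfo->fn_extra;

if (tupdesc == NULL)
{
	/* build it once in fn_mcxt so it survives across calls */
	MemoryContext oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);

	tupdesc = lookup_rowtype_tupdesc_copy(type, typmod);
	MemoryContextSwitchTo(oldcxt);
	fcinfo->flinfo->fn_extra = (void *) tupdesc;
}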

On Tue, Dec 13, 2016 at 10:07:46AM +0900, Michael Paquier wrote:
> On Tue, Dec 13, 2016 at 9:38 AM, Nikita Glukhov  
> wrote:
> > It also fixes the following errors/inconsistencies caused by lost quoting of
> > string json values:
> >
> > [master]=# select * from json_to_record('{"js": "a"}') as rec(js json);
> > ERROR:  invalid input syntax for type json
> > DETAIL:  Token "a" is invalid.
> > CONTEXT:  JSON data, line 1: a
> 
> The docs mention that this is based on a best effort now. It may be
> interesting to improve that.
> 
> > [master]=# select * from json_to_record('{"js": "true"}') as rec(js json);
> >   js
> > --
> > true
> >
> > [patched]=# select * from json_to_record('{"js": "a"}') as rec(js json);
> >   js
> > -
> >  "a"
> 
> That's indeed valid JSON.
> 
> > [patched]=# select * from json_to_record('{"js": "true"}') as rec(js json);
> >js
> > 
> >  "true"
> 
> And so is that.
> 
> > The second patch adds assignment of correct ndims to array fields of RECORD
> > function result types.
> > Without this patch, attndims in tuple descriptors of RECORD types is always
> > 0 and the corresponding assertion fails in the next test:
> >
> > [patched]=# select json_to_record('{"a": [1, 2, 3]}') as rec(a int[]);
> >
> >
> > Should I submit these patches to commitfest?
> 
> It seems to me that it would be a good idea to add them.
> -- 
> Michael

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-27 Thread Ashutosh Bapat
>
>> 3. Talking about saving some CPU cycles - if a clauseless full join can be
>> implemented only using merge join, probably that's the only path available
>> in
>> the list of paths for the given relation. Instead of building the same
>> path
>> anew, should we use the existing path like GetExistingLocalJoinPath() for
>> that
>> case?
>
>
> Hm, that might be an idea, but my concern about that is: the existing path
> wouldn't always be guaranteed to be unparameterized.

Why? We retain all the parameterizations (including no
parameterization) available in the pathlist, so if it's possible to
create an unparameterized path, there will be one.

>
>> In fact, I am wondering whether we should look for an existing nestloop
>> path for all joins except full, in which case we should look for merge or
>> hash
>> join path. We go on building a new path, if only there isn't an existing
>> one.
>> That will certainly save some CPU cycles spent in costing the path.
>
>
> Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be
> removed from the rel's pathlist by dominated merge or hash join paths, so
> searching the pathlist might cause a useless overhead.
>

Fair point.

>
>> 5. Per comment below a clauseless full join can be implemented using only
>> merge
>> join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true
>> here?
>> /*
>>  * Special corner case: for "x FULL JOIN y ON true", there will be
>> no
>>  * join clauses at all.  Note that mergejoin is our only join type
>>  * that supports FULL JOIN without any join clauses, and in that
>> case
>>  * it doesn't require the input paths to be well ordered, so
>> generate
>>  * a clauseless mergejoin path from the cheapest-total-cost paths.
>>  */
>> if (extra->mergejoin_allowed && !extra->mergeclause_list)
>
>
> See the comments for select_mergejoin_clauses:
>
>  * select_mergejoin_clauses
>  *Select mergejoin clauses that are usable for a particular join.
>  *Returns a list of RestrictInfo nodes for those clauses.
>  *
>  * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
>  * this is a right/full join and there are nonmergejoinable join clauses.
>  * The executor's mergejoin machinery cannot handle such cases, so we have
>  * to avoid generating a mergejoin plan.  (Note that this flag does NOT
>  * consider whether there are actually any mergejoinable clauses.  This is
>  * correct because in some cases we need to build a clauseless mergejoin.
>  * Simply returning NIL is therefore not enough to distinguish safe from
>  * unsafe cases.)

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare, but there can be types that have merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be: 1. check hashclause_list and create a hash join;
2. else check whether a merge join is allowed and create one. It looks
like that would cover all the cases: if there aren't any hash clauses and
also no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.
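
Stripped of the planner data structures, the ordering I have in mind is
just this (illustrative only; the real function would of course build and
cost actual paths):

typedef enum {LOCAL_JOIN_NONE, LOCAL_JOIN_HASH, LOCAL_JOIN_MERGE} LocalJoinKind;

/* Prefer a hash join whenever hash clauses exist; otherwise fall back to
 * a merge join if the executor allows one; otherwise give up (a FULL join
 * with neither kind of clause has already failed at path creation). */
static LocalJoinKind
choose_local_join_kind(bool have_hashclauses, bool mergejoin_allowed)
{
	if (have_hashclauses)
		return LOCAL_JOIN_HASH;
	if (mergejoin_allowed)
		return LOCAL_JOIN_MERGE;
	return LOCAL_JOIN_NONE;
}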

>
>> Rethinking about the problem, the error comes because the inner or outer
>> plan
>> of a merge join plan doesn't have pathkeys required by the merge join.
>> This
>> happens because the merge join path had foreign path with pathkeys as
>> inner or
>> outer path and corresponding fdw_outerpath didn't have those pathkeys. I
>> am
>> wondering whether the easy and possibly correct solution here is to not
>> replace
>> a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
>> don't do
>> that, there won't be error building merge join plan and
>> postgresRecheckForeignScan() would correctly route the EPQ checks to the
>> local
>> plan available as outer plan. Attached patch with that code removed.
>
>
> That might be fine for PG9.6, but I'd like to propose replacing
> GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
> GetExistingLocalJoinPath might choose an overkill, merge or hash join path
> for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
> overhead at EPQ rechecks, and

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

> (2) choosing a local join path randomly from
> the rel's pathlist wouldn't be easy to understand.
>

Easy to understand for whom? Can you please elaborate?

[1] 
https://www.postgresql.org/message-id/flat/565E638E.8020703%40lab.ntt.co.jp#565e638e.8020...@lab.ntt.co.jp
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-12-27 Thread Etsuro Fujita

On 2016/12/08 21:08, Etsuro Fujita wrote:

On 2016/12/07 20:23, Etsuro Fujita wrote:

My proposal here would be a bit different from what you proposed; right
before deparseSelectSql in deparseSelectStmtForRel, build the tlists for
relations present in the given jointree that will be deparsed as
subqueries, by (1) traversing the given jointree and (2) applying
build_tlist_to_deparse to each relation to be deparsed as a subquery and
saving the result in that relation's fpinfo.  I think that would be
implemented easily, and the overhead would be small.



I implemented that to address your concern:
* deparseRangeTblRef uses the tlist created as above, to build the
SELECT clause of the subquery.  (You said "Then in deparseRangeTblRef()
assert that there's a tlist in fpinfo", but the tlist may be empty, so I
didn't add any assertion to that function.)
* get_relation_column_alias_ids uses tlist_member with the tlist.

I also addressed the comments #1, #2 and #3 in [1].  For #1, I added
"LIMIT 10" to the query.  Attached is the new version of the patch.

Other changes:
* As discussed before, renamed grouped_tlist in fpinfo to "tlist" and
used it to store the tlist created as above.
* Also, renamed SS_REL_ALIAS_PREFIX to SUBQUERY_REL_ALIAS_PREFIX
(Likewise for SS_COL_ALIAS_PREFIX).
* Relocated some functions.
* Revised comments a little bit.


I updated the patch a bit further: simplified the function name 
(s/build_subquery_rel_tlists/build_subquery_tlists/), and revised 
comments a little bit.  Attached is an updated version 
(postgres-fdw-subquery-support-v14.patch).  And I rebased another patch 
for PHVs against that patch, which is also attached 
(postgres-fdw-phv-pushdown-v14.patch).


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 109,114  typedef struct deparse_expr_cxt
--- 109,116 
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno)	\
  		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SUBQUERY_REL_ALIAS_PREFIX	"s"
+ #define SUBQUERY_COL_ALIAS_PREFIX	"c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***
*** 159,165  static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
   deparse_expr_cxt *context);
  static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
  	   deparse_expr_cxt *context);
! static void deparseSelectSql(List *tlist, List **retrieved_attrs,
   deparse_expr_cxt *context);
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
--- 161,167 
   deparse_expr_cxt *context);
  static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
  	   deparse_expr_cxt *context);
! static void deparseSelectSql(List *tlist, bool is_subquery, List **retrieved_attrs,
   deparse_expr_cxt *context);
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
***
*** 167,172  static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,177 
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
  	RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
+ 			   RelOptInfo *foreignrel, List **params_list);
+ static void appendSubqueryAlias(StringInfo buf, int relno, int ncols);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***
*** 175,180  static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
--- 180,194 
  static Node *deparseSortGroupClause(Index ref, List *tlist,
  	   deparse_expr_cxt *context);
  
+ /*
+  * Helper functions
+  */
+ static void build_subquery_tlists(RelOptInfo *foreignrel);
+ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel,
+ 			int *relno, int *colno);
+ static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
+ 		  int *relno, int *colno);
+ 
  
  /*
   * Examine each qual clause in input_conds, and classify them into two groups,
***
*** 861,867  build_tlist_to_deparse(RelOptInfo *foreignrel)
  	 * checking shippability, so just return that.
  	 */
  	if (foreignrel->reloptkind == RELOPT_UPPER_REL)
! 		return fpinfo->grouped_tlist;
  
  	/*
  	 * We require columns specified in foreignrel->reltarget->exprs and those
--- 875,881 
  	 * checking shippability, so just return that.
  	 */
  	if (foreignrel->reloptkind == RELOPT_UPPER_REL)
! 		return fpinfo->tlist;
  
  	/*
  	 * We require 

Re: [HACKERS] Support for pg_receivexlog --format=plain|tar

2016-12-27 Thread Michael Paquier
On Tue, Dec 27, 2016 at 6:34 PM, Magnus Hagander  wrote:
> On Tue, Dec 27, 2016 at 2:23 AM, Michael Paquier 
> wrote:
>> Magnus, you have mentioned me as well that you had a couple of ideas
>> on the matter, feel free to jump in and let's mix our thoughts!
>
>
> Yeah, I've been wondering what the actual usecase is here :)

There is value to compress segments finishing with trailing zeros,
even if they are not the species with the highest representation in
the WAL archive.

> Though I was considering the case where all segments are streamed into the
> same tarfile (and then some sort of configurable limit where we'd switch
> tarfile after  segments, which rapidly started to feel too complicated).
>
> What's the actual advantage of having it wrapped inside a single tarfile?

I am advocating for one tar file per segment to be honest. Grouping
them makes the failure handling more complicated when connection to
the server is killed, or the replication stream is cut. Well, not
really complicated actually, because I think that you would need to
drop in the segment folder a status file with enough information to
let pg_receivexlog know from where in the tar file it needs to
continue writing. If a new tarball is created for each segment,
deciding from where to stream after a connection failure is just a
matter of doing what is done today: having a look at the completed
segments and begin streaming from the incomplete/absent one.

>> There are a couple of things that I have been considering as well for
>> pg_receivexlog. Though they are not directly stick to this thread,
>> here they are as I don't forget about them:
>> - Removal of oldest WAL segments on a partition. When writing WAL
>> segments to a dedicated partition, we could have an option that
>> automatically removes the oldest WAL segment if the partition is full.
>> This triggers once a segment is completed.
>> - Compression of fully-written segments. When a segment is finished
>> being written, pg_receivexlog could compress them further with gz for
>> example. With --format=t this leads to segnum.tar.gz being generated.
>> The advantage of doing those two things in pg_receivexlog is
>> monitoring. One process to handle them all, and there is no need of
>> cron jobs to handle any cleanup or compression.
>
> I was at one point thinking that would be a good idea as well, but recently
> I've more been thinking that what we should do is implement a
> "--post-segment-command", which would act similar to archive_command but
> started by pg_receivexlog. This could handle things like compression, and
> also integration with external backup tools like backrest or barman in a
> cleaner way. We could also spawn this without waiting for it to finish
> immediately, which would allow parallellization of the process. When doing
> the compression inline that rapidly becomes the bottleneck. Unlike a
> basebackup you're only dealing with the need to buffer 16Mb on disk before
> compressing it, so it should be fairly cheap.

I did not consider the case of barman and backrest to be honest,
having the view of 2ndQ folks and David would be nice here. Still, the
main idea behind those done by pg_receivexlog's process would be to
not spawn a new process. I have a class of users that care about
things that could hang, they play a lot with network-mounted disks...
And VMs of course.

> Another thing I've been considering in the same area would be to add the
> ability to write the segments to a pipe instead of a directory. Then you
> could just pipe it into gzip without the need to buffer on disk. This would
> kill the ability to know at which point we'd sync()ed to disk, but in most
> cases so will doing direct gzip. Just means we couldn't support this in sync
> mode.

Users piping their data don't care about reliability anyway. So that
is not a problem.

> I can see the point of being able to compress the individual segments
> directly in pg_receivexlog in smaller systems though, without the need to
> rely on an external compression program as well. But in that case, is there
> any reason we need to wrap it in a tarfile, and can't just write it to
> .gz natively?

You mean having a --compress=0|9 option that creates individual gz
files for each segment? Definitely we could just do that. It would be
a shame, though, not to use the WAL methods you have introduced in
src/bin/pg_basebackup, which already cover the whole set: tar and tar.gz. A
quick hack in pg_receivexlog has shown me that segments are saved in
a single tarball, which is not cool.
existing infrastructure, but making it pluggable for individual files
(in short I think that what is needed here is a way to tell the WAL
method to switch to a new file when a segment completes) would really
be the most simple one in terms of code lines and maintenance.
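
For what it's worth, the "compress one finished segment" piece really is
small; a sketch using zlib directly (file names invented, error handling
kept minimal):

#include <stdio.h>
#include <zlib.h>

/* Compress one completed WAL segment into a .gz file; returns 0 on success. */
static int
compress_segment(const char *segpath, const char *gzpath)
{
	char		buf[65536];
	size_t		nread;
	FILE	   *in = fopen(segpath, "rb");
	gzFile		out = gzopen(gzpath, "wb9");	/* level 9, as --compress=9 would */

	if (in == NULL || out == NULL)
	{
		if (in)
			fclose(in);
		if (out)
			gzclose(out);
		return -1;
	}
	while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
	{
		if (gzwrite(out, buf, (unsigned) nread) == 0)	/* 0 means error */
		{
			fclose(in);
			gzclose(out);
			return -1;
		}
	}
	fclose(in);
	return (gzclose(out) == Z_OK) ? 0 : -1;
}
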
-- 
Michael



Re: [HACKERS] Potential data loss of 2PC files

2016-12-27 Thread Andres Freund
On 2016-12-27 14:09:05 +0900, Michael Paquier wrote:
> On Fri, Dec 23, 2016 at 3:02 AM, Andres Freund  wrote:
> > Not quite IIRC: that doesn't deal with file size increase.  All this would
> > be easier if hardlinks didn't exist IIUC. It's basically a question of
> > whether the dentry, inode or contents need to be synced.  Yes, it sucks.
>
> I did more monitoring of the code... Looking at unlogged tables and
> empty() routines of access methods, isn't there a risk as well for
> unlogged tables? mdimmedsync() does not fsync() the parent directory
> either!

But the files aren't created there, so I don't generally see the
problem.  And the creation of the metapages etc. should be WAL logged
anyway.  So I don't think we should / need to do anything special
there.  You can argue however that we wouldn't necessarily fsync the
parent directory for the file creation, ever.  But that'd be more
smgrcreate's responsibility than anything.
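
For reference, the pattern in question looks roughly like the generic POSIX
sketch below (illustrative only, not PostgreSQL's smgr code; the paths and
error handling are made up): after creating and fsync()ing a new file, also
fsync() the parent directory so that the directory entry itself is durable.

#include <fcntl.h>
#include <unistd.h>

static int
create_durable_file(const char *dirpath, const char *filepath)
{
    int fd;
    int dirfd;

    fd = open(filepath, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)         /* flush the file's contents and metadata */
    {
        close(fd);
        return -1;
    }
    close(fd);

    dirfd = open(dirpath, O_RDONLY);
    if (dirfd < 0)
        return -1;
    if (fsync(dirfd) != 0)      /* make the new directory entry itself durable */
    {
        close(dirfd);
        return -1;
    }
    close(dirfd);
    return 0;
}

int
main(void)
{
    return create_durable_file(".", "./newfile") == 0 ? 0 : 1;
}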


> We could do that at checkpoint as well, actually, by looping through
> all the tablespaces and fsync the database directories.

That seems like a bad idea, as it'd force fsyncs on unlogged tables and
such.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-27 Thread Amit Langote
On 2016/12/27 18:48, 高增琦 wrote:
> Hi ,
> 
> I tried "COPY FROM" in the git version. It inserts rows into the wrong partition.
> 
> step to reproduce:
> create table t(a int, b int) partition by range(a);
> create table t_p1 partition of t for values from (1) to (100);
> create table t_p2 partition of t for values from (100) to (200);
> create table t_p3 partition of t for values from (200) to (300);
> insert into t values(1,1);
> insert into t values(101,101);
> insert into t values(201,201);
> copy (select * from t) to '/tmp/test2.txt';
> copy t from '/tmp/test2.txt';
> select * from t_p1;
> 
> result:
> postgres=# select * from t_p1;
>   a  |  b
> -----+-----
>    1 |   1
>    1 |   1
>  101 | 101
>  201 | 201
> (4 rows)
> 
> I think the argument "BulkInsertState" used in CopyFrom/heap_insert
> is related to this problem. Please check it.

You're quite right.  Attached should fix that.

Thanks,
Amit
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index aa25a23336..e9bf4afa44 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2290,7 +2290,7 @@ CopyFrom(CopyState cstate)
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
-	BulkInsertState bistate;
+	BulkInsertState bistate = NULL;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
 	int			nBufferedTuples = 0;
@@ -2482,7 +2482,8 @@ CopyFrom(CopyState cstate)
 	values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
 	nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
 
-	bistate = GetBulkInsertState();
+	if (useHeapMultiInsert)
+		bistate = GetBulkInsertState();
 	econtext = GetPerTupleExprContext(estate);
 
 	/* Set up callback to identify error line number */
@@ -2707,7 +2708,8 @@ CopyFrom(CopyState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	FreeBulkInsertState(bistate);
+	if (bistate != NULL)
+		FreeBulkInsertState(bistate);
 
 	MemoryContextSwitchTo(oldcontext);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-27 Thread Amit Langote
On 2016/12/27 18:30, Rajkumar Raghuwanshi wrote:
> Hi Amit,
> 
> I have pulled latest sources from git and tried to create multi-level
> partition, getting a server crash; below are steps to reproduce. Please
> check if it is reproducible on your machine also.
> 

[ ... ]

> postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM
> generate_series(0, 599, 2) i;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Thanks for the example.  Looks like there was an oversight in my patch
that got committed as 2ac3ef7a01 [1].

The attached patch should fix it.

Thanks,
Amit

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7a01df859c62d0a02333b646d65eaec5ff
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fca874752f..f9daf8052d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1647,6 +1647,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		PartitionDesc partdesc = parent->partdesc;
 		TupleTableSlot *myslot = parent->tupslot;
 		TupleConversionMap *map = parent->tupmap;
+		ExprContext *econtext = GetPerTupleExprContext(estate);
 
 		/* Quick exit */
 		if (partdesc->nparts == 0)
@@ -1667,6 +1668,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		}
 
 		/* Extract partition key from tuple */
+		econtext->ecxt_scantuple = slot;
 		FormPartitionKeyDatum(parent, slot, estate, values, isnull);
 
 		if (key->strategy == PARTITION_STRATEGY_RANGE)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bca34a509c..1d699c1dab 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3107,9 +3107,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 {
 	int		result;
 	Oid		failed_at;
-	ExprContext *econtext = GetPerTupleExprContext(estate);
 
-	econtext->ecxt_scantuple = slot;
 	result = get_partition_for_tuple(pd, slot, estate, &failed_at);
 	if (result < 0)
 	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-27 Thread 高增琦
Hi ,

I tried "COPY FROM" in the git version. It inserts rows into the wrong partition.

step to reproduce:
create table t(a int, b int) partition by range(a);
create table t_p1 partition of t for values from (1) to (100);
create table t_p2 partition of t for values from (100) to (200);
create table t_p3 partition of t for values from (200) to (300);
insert into t values(1,1);
insert into t values(101,101);
insert into t values(201,201);
copy (select * from t) to '/tmp/test2.txt';
copy t from '/tmp/test2.txt';
select * from t_p1;

result:
postgres=# select * from t_p1;
  a  |  b
-----+-----
   1 |   1
   1 |   1
 101 | 101
 201 | 201
(4 rows)

I think the argument "BulkInsertState" used in CopyFrom/heap_insert
is related to this problem. Please check it.

Thanks.





2016-12-27 17:30 GMT+08:00 Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com>:

> Hi Amit,
>
> I have pulled latest sources from git and tried to create multi-level
> partition, getting a server crash; below are steps to reproduce. Please
> check if it is reproducible on your machine also.
>
> postgres=# CREATE TABLE test_ml (a int, b int, c varchar) PARTITION BY
> RANGE(a);
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p1 PARTITION OF test_ml FOR VALUES FROM
> (0) TO (250) PARTITION BY RANGE (b);
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p1_p1 PARTITION OF test_ml_p1 FOR VALUES
> FROM (0) TO (100);
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p1_p2 PARTITION OF test_ml_p1 FOR VALUES
> FROM (100) TO (250);
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p2 PARTITION OF test_ml FOR VALUES FROM
> (250) TO (500) PARTITION BY RANGE (c);
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p2_p1 PARTITION OF test_ml_p2 FOR VALUES
> FROM ('0250') TO ('0400');
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p2_p2 PARTITION OF test_ml_p2 FOR VALUES
> FROM ('0400') TO ('0500');
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p3 PARTITION OF test_ml FOR VALUES FROM
> (500) TO (600) PARTITION BY RANGE ((b + a));
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p3_p1 PARTITION OF test_ml_p3 FOR VALUES
> FROM (1000) TO (1100);
> CREATE TABLE
> postgres=# CREATE TABLE test_ml_p3_p2 PARTITION OF test_ml_p3 FOR VALUES
> FROM (1100) TO (1200);
> CREATE TABLE
> postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM
> generate_series(0, 599, 2) i;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
>
>
>


-- 
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


Re: [HACKERS] [PATCH] Generic type subscription

2016-12-27 Thread Dmitry Dolgov
> On 27 December 2016 at 16:09, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:
> until it breaks existing extensions.

Hm... I already answered that I managed to avoid compilation problems for
this particular extension by using the `genparser` command again:

> On Thu, Nov 17, 2016 at 10:56 PM, Dmitry Dolgov
<9erthalion6(at)gmail(dot)com>
> wrote:
>
> > On 15 November 2016 at 15:03, Aleksander Alekseev <
> a(dot)alekseev(at)postgrespro(dot)ru> wrote:
> > Hello.
> >
> > I took a look at the latest -v4 patch. I would like to note that this
> > patch breaks backward compatibility. For instance, the sr_plan extension [1]
> > no longer compiles, failing with errors
>
> Thank you for the feedback.
>
> Well, if we're speaking about this particular extension, if I understood
> correctly, it fetches all parse tree nodes from Postgres and generates code
> using this information. So to avoid compilation problems I believe you need
> to run `make USE_PGXS=1 genparser` again (it worked for me, I don't see any
> mentions of `ArrayRef`).
>
> But speaking generally, I don't see how we can provide backward
> compatibility for extensions that are strongly coupled with implementation
> details of the parse tree. I mean, in terms of interface it's mostly about
> replacing `ArrayRef` with `SubscriptingRef`, but I think it's better to do
> it in the extension code.

Or is there something else that I missed?


Re: [HACKERS] comments tablecmds.c

2016-12-27 Thread Magnus Hagander
On Tue, Dec 27, 2016 at 10:36 AM, Erik Rijkers  wrote:

> On 2016-12-27 10:25, Magnus Hagander wrote:
>
>> On Sun, Dec 25, 2016 at 2:35 PM, Erik Rijkers  wrote:
>>
>> Applied, thanks.
>>
>> Seems to me that in comments it's not worth chasing them down
>> individually,
>> but if you happen to fix one alongside another one like here, then it's a
>> different story :)
>>
>>
> Well, it had been triggering my (very simple) spell-checker at each
> compile, for a month or so.  I just wanted to silence it :)
>
> Thanks for doing that :)


FWIW, that comment was in reference to the i.e. vs ie. The "the the"
problem was definitely something to fix :)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] comments tablecmds.c

2016-12-27 Thread Erik Rijkers

On 2016-12-27 10:25, Magnus Hagander wrote:

On Sun, Dec 25, 2016 at 2:35 PM, Erik Rijkers  wrote:

Applied, thanks.

Seems to me that in comments it's not worth chasing them down individually,
but if you happen to fix one alongside another one like here, then it's a
different story :)



Well, it had been triggering my (very simple) spell-checker at each 
compile, for a month or so.  I just wanted to silence it :)


Thanks for doing that :)

Erik Rijkers



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for pg_receivexlog --format=plain|tar

2016-12-27 Thread Magnus Hagander
On Tue, Dec 27, 2016 at 2:23 AM, Michael Paquier 
wrote:

> Hi all,
>
> Since 56c7d8d4, pg_basebackup supports tar format when streaming WAL
> records. This has been done by introducing a new transparent routine
> layer to control the method used to fetch WAL walmethods.c: plain or
> tar.
>
> pg_receivexlog does not make use of that yet, but I think that it
> could, to allow retention of more WAL history within the same amount of
> disk space. OK, disk space is cheap, but for some users things like
> that matter when defining a retention duration policy. Especially when
> things are automated around Postgres. I really think that
> pg_receivexlog should be able to support an option like
> --format=plain|tar. "plain" is the default, and matches the current
> behavior. This option is of course designed to match pg_basebackup's
> one.
>
> So, here is in details what would happen if --format=tar is done:
> - When streaming begins, write changes to a tar stream, named
> segnum.tar.partial as long as the segment is not completed.
> - Once the segment completes, rename it to segnum.tar.
> - each individual segment has its own tarball.
> - if pg_receivexlog fails to receive changes in the middle of a
> segment, it begins streaming back at the beginning of a segment,
> considering that the current .partial segment is corrupted. So if
> server comes back online, empty the current .partial file and begin
> writing on it again. (I have found a bug on HEAD in this area
> actually).


> Magnus, you have mentioned me as well that you had a couple of ideas
> on the matter, feel free to jump in and let's mix our thoughts!
>

Yeah, I've been wondering what the actual use case is here :)

Though I was considering the case where all segments are streamed into the
same tarfile (and then some sort of configurable limit where we'd switch
tarfile after  segments, which rapidly started to feel too complicated).

What's the actual advantage of having it wrapped inside a single tarfile?



> There are a couple of things that I have been considering as well for
> pg_receivexlog. Though they are not directly tied to this thread,
> here they are so that I don't forget about them:
> - Removal of oldest WAL segments on a partition. When writing WAL
> segments to a dedicated partition, we could have an option that
> automatically removes the oldest WAL segment if the partition is full.
> This triggers once a segment is completed.
> - Compression of fully-written segments. When a segment is finished
> being written, pg_receivexlog could compress them further with gz for
> example. With --format=t this leads to segnum.tar.gz being generated.
> The advantage of doing those two things in pg_receivexlog is
> monitoring. One process to handle them all, and there is no need of
> cron jobs to handle any cleanup or compression.
>

I was at one point thinking that would be a good idea as well, but recently
I've more been thinking that what we should do is implement a
"--post-segment-command", which would act similar to archive_command but
started by pg_receivexlog. This could handle things like compression, and
also integration with external backup tools like backrest or barman in a
cleaner way. We could also spawn this without waiting for it to finish
immediately, which would allow parallelization of the process. When doing
the compression inline, that rapidly becomes the bottleneck. Unlike a
basebackup you're only dealing with the need to buffer 16Mb on disk before
compressing it, so it should be fairly cheap.

Another thing I've been considering in the same area would be to add the
ability to write the segments to a pipe instead of a directory. Then you
could just pipe it into gzip without the need to buffer on disk. This would
kill the ability to know at which point we'd sync()ed to disk, but in most
cases so will doing direct gzip. Just means we couldn't support this in
sync mode.

I can see the point of being able to compress the individual segments
directly in pg_receivexlog in smaller systems though, without the need to
rely on an external compression program as well. But in that case, is there
any reason we need to wrap it in a tarfile, and can't just write it to
.gz natively?
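
For what it's worth, writing a finished segment out as segname.gz natively
does not take much code with zlib. Below is a minimal, self-contained sketch
(illustrative only, not pg_receivexlog code; the file name is made up and
error handling is kept to a minimum; build with -lz):

#include <stdio.h>
#include <zlib.h>

static int
compress_segment(const char *path)
{
    char    gzpath[1024];
    char    buf[65536];
    size_t  n;
    FILE   *in;
    gzFile  out;

    snprintf(gzpath, sizeof(gzpath), "%s.gz", path);

    in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    out = gzopen(gzpath, "wb9");    /* compression level 9 */
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (gzwrite(out, buf, (unsigned) n) == 0)
            break;                  /* write error */
    }

    fclose(in);
    return gzclose(out) == Z_OK ? 0 : -1;
}

int
main(void)
{
    return compress_segment("000000010000000000000001") == 0 ? 0 : 1;
}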


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Declarative partitioning - another take

2016-12-27 Thread Rajkumar Raghuwanshi
Hi Amit,

I have pulled latest sources from git and tried to create multi-level
partition, getting a server crash; below are steps to reproduce. Please
check if it is reproducible on your machine also.

postgres=# CREATE TABLE test_ml (a int, b int, c varchar) PARTITION BY
RANGE(a);
CREATE TABLE
postgres=# CREATE TABLE test_ml_p1 PARTITION OF test_ml FOR VALUES FROM (0)
TO (250) PARTITION BY RANGE (b);
CREATE TABLE
postgres=# CREATE TABLE test_ml_p1_p1 PARTITION OF test_ml_p1 FOR VALUES
FROM (0) TO (100);
CREATE TABLE
postgres=# CREATE TABLE test_ml_p1_p2 PARTITION OF test_ml_p1 FOR VALUES
FROM (100) TO (250);
CREATE TABLE
postgres=# CREATE TABLE test_ml_p2 PARTITION OF test_ml FOR VALUES FROM
(250) TO (500) PARTITION BY RANGE (c);
CREATE TABLE
postgres=# CREATE TABLE test_ml_p2_p1 PARTITION OF test_ml_p2 FOR VALUES
FROM ('0250') TO ('0400');
CREATE TABLE
postgres=# CREATE TABLE test_ml_p2_p2 PARTITION OF test_ml_p2 FOR VALUES
FROM ('0400') TO ('0500');
CREATE TABLE
postgres=# CREATE TABLE test_ml_p3 PARTITION OF test_ml FOR VALUES FROM
(500) TO (600) PARTITION BY RANGE ((b + a));
CREATE TABLE
postgres=# CREATE TABLE test_ml_p3_p1 PARTITION OF test_ml_p3 FOR VALUES
FROM (1000) TO (1100);
CREATE TABLE
postgres=# CREATE TABLE test_ml_p3_p2 PARTITION OF test_ml_p3 FOR VALUES
FROM (1100) TO (1200);
CREATE TABLE
postgres=# INSERT INTO test_ml SELECT i, i, to_char(i, 'FM') FROM
generate_series(0, 599, 2) i;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] comments tablecmds.c

2016-12-27 Thread Magnus Hagander
On Sun, Dec 25, 2016 at 2:35 PM, Erik Rijkers  wrote:

> On 2016-12-25 13:38, Erik Rijkers wrote:
>
>> 'the the' -> 'the'
>>
>> and
>>
>> 'ie' -> 'i.e.'
>>
>> Although (concerning the latter change) the present counts are 'ie'
>> 428, and 'i.e.' 428.
>> so it might be debatable (but let's not)
>>
>>
> Sorry; I meant:  'ie' 428, and 'i.e.' 305.


Applied, thanks.

Seems to me that in comments it's not worth chasing them down individually,
but if you happen to fix one alongside another one like here, then it's a
different story :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] gettimeofday is at the end of its usefulness?

2016-12-27 Thread Andres Freund
On 2016-12-27 01:35:05 +, Greg Stark wrote:
> On Dec 26, 2016 10:35 PM, "Tom Lane"  wrote:
> 
> 
> So it seems like the configure support we'd need is to detect
> whether clock_gettime is available (note on Linux there's also
> a library requirement, -lrt), and we would also need a way to
> provide a platform-specific choice of clockid; we at least need
> enough smarts to use CLOCK_MONOTONIC_RAW on macOS.
> 
> This seems like something that really should be checked at runtime. It's
> very specific to the kernel you're running on, not the build
> environment, and it can hopefully be measured in only a second or even a
> fraction of a second. The only problem would be if other things running
> on the system made the test results unpredictable, so that you had a small
> chance of getting a very suboptimal choice and of re-rolling the dice each
> time you restarted...

I'm pretty strongly against doing performance measurements at
startup. Both the delay and the potential for differing test results
seem like pretty bad consequences.
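
For concreteness, the kind of runtime probe being talked about is little
more than a tight loop over clock_gettime(). A minimal, self-contained
sketch (CLOCK_MONOTONIC is POSIX; CLOCK_MONOTONIC_RAW is platform-specific;
-lrt may be needed with older glibc):

#include <stdio.h>
#include <time.h>

int
main(void)
{
    struct timespec start, end;
    const long  iters = 1000000;
    long        i;
    double      ns;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
    {
        struct timespec t;

        clock_gettime(CLOCK_MONOTONIC, &t);     /* the call being measured */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per clock_gettime() call\n", ns / iters);
    return 0;
}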

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscription

2016-12-27 Thread Aleksander Alekseev
As I mentioned above [1], in my humble opinion this patch is not at all in
"good shape" as long as it breaks existing extensions.

[1] http://postgr.es/m/20161115080324.GA5351%40e733.localdomain

On Mon, Dec 26, 2016 at 10:49:30PM +0700, Dmitry Dolgov wrote:
> > On 5 December 2016 at 12:03, Haribabu Kommi 
>  wrote:
> 
> > > Moved to next CF with "needs review" status.
> 
> Looks like we stuck here little bit. Does anyone else have any
> suggestions/improvements, or this patch is in good enough shape?

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2016-12-27 Thread Pavan Deolasee
On Sat, Dec 24, 2016 at 1:18 AM, Alvaro Herrera 
wrote:

> Alvaro Herrera wrote:
>
> > With your WARM and my indirect indexes, plus the additions for for-key
> > locks, plus identity columns, there is no longer a real expectation that
> > we can exit early from the function.  In your patch, as well as mine,
> > there is a semblance of optimization that tries to avoid computing the
> > updated_attrs output bitmapset if the pointer is not passed in, but it's
> > effectively pointless because the only interesting use case is from
> > ExecUpdate() which always activates the feature.  Can we just agree to
> > drop that?
>
>
Yes, I agree. As you noted below, the only case that may be affected is
simple_heap_update(), which does a lot more, and hence this function will be
the least of the worries.



> I think the only case that gets worse is the path that does
> simple_heap_update, which is used for DDL.  I would be very surprised if
> a change there is noticeable, when compared to the rest of the stuff
> that goes on for DDL commands.
>
> Now, after saying that, I think that a table with a very large number of
> columns is going to be affected by this.  But we don't really need to
> compute the output bits for every single column -- we only care about
> those that are covered by some index.  So we should pass an input
> bitmapset comprising all such columns, and the output bitmapset only
> considers those columns, and ignores columns not indexed.  My patch for
> indirect indexes already does something similar (though it passes a
> bitmapset of columns indexed by indirect indexes only, so it needs a
> tweak there.)
>
>
Yes, that looks like a good compromise. This would require us to compare
only those columns that any caller of the function might be interested in.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Declarative partitioning - another take

2016-12-27 Thread Amit Langote
On 2016/12/26 19:46, Amit Langote wrote:
> (Perhaps, the following should be its own new thread)
> 
> I noticed that ExecProcessReturning() doesn't work properly after tuple
> routing (example shows how returning tableoid currently fails but I
> mention some other issues below):
> 
> create table p (a int, b int) partition by range (a);
> create table p1 partition of p for values from (1) to (10);
> insert into p values (1) returning tableoid::regclass, *;
>  tableoid | a | b
> --+---+---
>  -| 1 |
> (1 row)
> 
> INSERT 0 1
> 
> I tried to fix that in 0007 to get:
> 
> insert into p values (1) returning tableoid::regclass, *;
>  tableoid | a | b
> ----------+---+---
>  p        | 1 |
> (1 row)
> 
> INSERT 0 1
> 
> But I think it *may* be wrong to return the root table OID for tuples
> inserted into leaf partitions, because with select we get partition OIDs:
> 
> select tableoid::regclass, * from p;
>  tableoid | a | b
> ----------+---+---
>  p1       | 1 |
> (1 row)
> 
> If so, that means we should build the projection info (corresponding to
> the returning list) for each target partition somehow.  ISTM, that's going
> to have to be done within the planner by appropriate inheritance
> translation of the original returning targetlist.

Turns out getting the 2nd result may not require planner tweaks after all.
Unless I'm missing something, translation of varattnos of the RETURNING
target list can be done as late as ExecInitModifyTable() for the insert
case, unlike update/delete (which do require planner's attention).

I updated patch 0007 to implement the same, including the test. While
doing that, I realized that map_partition_varattnos introduced in 0003 is
rather restrictive in its applicability, because it assumes varno = 1 for
the expressions it accepts as input for the mapping.  Mapping the returning
(target) list required modifying map_partition_varattnos to accept
target_varno as an additional argument.  That way, we can map arbitrary
expressions from the parent's attribute numbers to the partition's attribute
numbers, not limited to partition constraints.

Patches 0001 to 0006 unchanged.

Thanks,
Amit
>From fcfe08948d31802547e93ac6551873afd554bc36 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 26 Dec 2016 11:53:19 +0900
Subject: [PATCH 1/7] Allocate partition_tuple_slot in respective nodes

...instead of making it part of EState and its tuple table.
Respective nodes means ModifyTableState and CopyState for now.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/commands/copy.c| 30 +-
 src/backend/executor/execMain.c| 12 
 src/backend/executor/nodeModifyTable.c | 17 -
 src/include/executor/executor.h|  1 +
 src/include/nodes/execnodes.h  |  6 +++---
 5 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index aa25a23336..e5a0f1bf80 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -161,11 +161,18 @@ typedef struct CopyStateData
 	ExprState **defexprs;		/* array of default att expressions */
 	bool		volatile_defexprs;		/* is any of defexprs volatile? */
 	List	   *range_table;
+
 	PartitionDispatch *partition_dispatch_info;
-	int			num_dispatch;
-	int			num_partitions;
-	ResultRelInfo *partitions;
+	/* Tuple-routing support info */
+	int			num_dispatch;		/* Number of entries in the above array */
+	int			num_partitions;		/* Number of members in the following
+	 * arrays */
+	ResultRelInfo  *partitions;		/* Per partition result relation */
 	TupleConversionMap **partition_tupconv_maps;
+	/* Per partition tuple conversion map */
+	TupleTableSlot *partition_tuple_slot;
+	/* Slot used to manipulate a tuple after
+	 * it is routed to a partition */
 
 	/*
 	 * These variables are used to reduce overhead in textual COPY FROM.
@@ -1409,6 +1416,7 @@ BeginCopy(ParseState *pstate,
 			PartitionDispatch  *partition_dispatch_info;
 			ResultRelInfo	   *partitions;
 			TupleConversionMap **partition_tupconv_maps;
+			TupleTableSlot	   *partition_tuple_slot;
 			int	num_parted,
 num_partitions;
 
@@ -1416,12 +1424,14 @@ BeginCopy(ParseState *pstate,
 		   &partition_dispatch_info,
 		   &partitions,
 		   &partition_tupconv_maps,
+		   &partition_tuple_slot,
 		   &num_parted, &num_partitions);
 			cstate->partition_dispatch_info = partition_dispatch_info;
 			cstate->num_dispatch = num_parted;
 			cstate->partitions = partitions;
 			cstate->num_partitions = num_partitions;
 			cstate->partition_tupconv_maps = partition_tupconv_maps;
+			cstate->partition_tuple_slot = partition_tuple_slot;
 		}
 	}
 	else
@@ -2436,15 +2446,6 @@ CopyFrom(CopyState cstate)
 	estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
 
 	/*
-	 * Initialize a dedicated slot to manipulate tuples of any given
-	 * partition's rowtype.
-	 */
-	if 

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-12-27 Thread Kyotaro HORIGUCHI
At Fri, 23 Dec 2016 11:02:11 +0900, Michael Paquier  
wrote in 
> On Fri, Dec 23, 2016 at 8:13 AM, Robert Haas  wrote:
> > On Thu, Dec 22, 2016 at 2:34 PM, Andres Freund  wrote:
> >> On 2016-12-22 08:32:56 -0800, Andres Freund wrote:
> >>> I plan to commit this later today.  Hope I got the reviewers roughly 
> >>> right.
> >>
> >> And pushed. Thanks for the work on this everyone.
> >
> > Cool.  Also, +1 for the important/unimportant terminology.  I like that.
> 
> Thanks for the commit.

Thanks for committing.

By the way, this issue seems to be in the ToDo list.

https://wiki.postgresql.org/wiki/Todo#Point-In-Time_Recovery_.28PITR.29

>Consider avoiding WAL switching via archive_timeout if there
>has been no database activity
> - archive_timeout behavior for no activity
> - Re: archive_timeout behavior for no activity

So I marked it as "done".

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2016-12-27 Thread Mithun Cy
On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy  wrote:
Oops, the patch number should be 08; re-attaching the same patch after renaming.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 46df589..c45b3f0 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,19 +32,13 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
 	Size		itemsz;
 	bool		do_expand;
 	uint32		hashkey;
-	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +51,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
  * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No need to lock as of now because we only access
+	 * page header element pd_pagesize_version in HashMaxItemSize(); this
+	 * element is constant and will not move while we access it. But we hold
+	 * the pin so we can use the metabuf while writing into it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,59 +75,7 @@ restart_insert:
 		itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-	  metap->hashm_maxbucket,
-	  metap->hashm_highmask,
-	  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(hashkey, rel, HASH_WRITE, &usedmetap);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
@@ -152,7 +99,9 @@ restart_insert:
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-		   maxbucket, highmask, lowmask);
+		   usedmetap->hashm_maxbucket,
+		   usedmetap->hashm_highmask,
+		   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +174,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 45e184c..c726909 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno of the bucket page to the latest maxbucket number
+		 * to indicate that the bucket has been initialized, and that the
+		 * HashMetaCache needs to be reconstructed if it is older.
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 

Re: [HACKERS] Cache Hash Index meta page.

2016-12-27 Thread Mithun Cy
On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila  wrote:
> On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas  wrote:
>> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy  
>> wrote:
>>> -- I think if it is okay, I can document same for each member of 
>>> HashMetaPageData whether to read from cached from meta page or directly 
>>> from current meta page. Below briefly I have commented for each member. If 
>>> you suggest I can go with that approach, I will produce a neat patch for 
>>> same.
>>
>> Plain text emails are preferred on this list.

Sorry, I have set the mail to plain text mode now.

> I think this will make metapage cache access somewhat
> similar to what we have in btree where we use cache to access
> rootpage.  Will something like that address your concern?

Thanks. Just like _bt_getroot, I am introducing a new function,
_hash_getbucketbuf_from_hashkey(), which will give the target bucket
buffer for the given hashkey. This will use the cached metapage
contents instead of reading the metapage buffer if the cached data is
valid. There are two places which can use this service: 1. _hash_first
and 2. _hash_doinsert.
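
In rough, self-contained C, the idea looks like the sketch below. This is
illustrative only, not the attached patch: the names and the bucket mapping
are simplified, but the shape is the same - compute the bucket from a cached
copy of the metapage, then use the stamp left on the bucket page
(hasho_prevblkno in the patch) to detect that the cache has become stale and
must be refreshed.

#include <stdio.h>
#include <stdint.h>

#define NBUCKETS 16

typedef struct BucketPage
{
    uint32_t init_maxbucket;    /* stamped at (re)initialization, playing the
                                 * role of hasho_prevblkno in the patch */
} BucketPage;

static BucketPage buckets[NBUCKETS];
static uint32_t shared_maxbucket = 7;   /* the authoritative "metapage" value */

static uint32_t cached_maxbucket;       /* per-backend cached copy */
static int      cache_valid = 0;

static uint32_t
lookup_bucket(uint32_t hashkey)
{
    for (;;)
    {
        uint32_t b;

        if (!cache_valid)
        {
            cached_maxbucket = shared_maxbucket;    /* "read the metapage" */
            cache_valid = 1;
        }

        b = hashkey & cached_maxbucket;     /* simplified hashkey-to-bucket map */

        /*
         * Cheap validity check on the target page itself: if this bucket was
         * created or split under a newer maxbucket than the one we cached,
         * the cache is stale, so refresh it and retry.
         */
        if (buckets[b].init_maxbucket <= cached_maxbucket)
            return b;

        cache_valid = 0;
    }
}

int
main(void)
{
    int i;

    for (i = 0; i < NBUCKETS; i++)
        buckets[i].init_maxbucket = 7;      /* initial state: maxbucket = 7 */

    printf("bucket = %u\n", lookup_bucket(13));     /* 13 & 7 = 5 */

    /* Simulate another backend splitting bucket 5 into new bucket 13. */
    shared_maxbucket = 15;
    buckets[5].init_maxbucket = 15;
    buckets[13].init_maxbucket = 15;

    /* The stale cache is detected on bucket 5, refreshed, and we land on 13. */
    printf("bucket = %u\n", lookup_bucket(13));
    return 0;
}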

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 46df589..c45b3f0 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,19 +32,13 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
 	Size		itemsz;
 	bool		do_expand;
 	uint32		hashkey;
-	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +51,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
  * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No need to lock as of now because we only access
+	 * page header element pd_pagesize_version in HashMaxItemSize(); this
+	 * element is constant and will not move while we access it. But we hold
+	 * the pin so we can use the metabuf while writing into it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,59 +75,7 @@ restart_insert:
 		itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-	  metap->hashm_maxbucket,
-	  metap->hashm_highmask,
-	  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(hashkey, rel, HASH_WRITE, &usedmetap);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
@@ -152,7 +99,9 @@ restart_insert:
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-		   maxbucket, highmask, lowmask);
+		   usedmetap->hashm_maxbucket,
+		   usedmetap->hashm_highmask,
+		   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +174,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);

Re: [HACKERS] merging some features from plpgsql2 project

2016-12-27 Thread Pavel Stehule
2016-12-27 8:54 GMT+01:00 Pavel Stehule :

> Hi
>
> I reread ideas described on page https://github.com/trustly/plpgsql2
>
> Some points are good and could be a benefit for PL/pgSQL.
>
> First, let me describe my initial position. I am strongly against introducing
> a "new" language - plpgsql2, a new plpgsql, or anything else. The trust of
> developers in us is important, and the introduction of any incompatible or
> different feature has to have a really big reason. PostgreSQL is a conservative
> environment, and PL/pgSQL should not be an exception. Moreover, I have no
> information from my customers or colleagues about missing features in this
> language.  If there are some gaps, they are in the outer environment - IDE,
> deployment, testing,
>
>
I forgot - the big plpgsql issue is the too-weak expressions on the left side
of assignment statements.


>
>
> Pavel
>