Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Etsuro Fujita

On 2016/02/16 16:02, Ashutosh Bapat wrote:

On Tue, Feb 16, 2016 at 12:26 PM, Etsuro Fujita wrote:

On 2016/02/16 15:22, Ashutosh Bapat wrote:

During join planning, the planner tries multiple combinations of joining
relations, so the same base or join relation can be part of multiple
combinations. Hence remote_conds or joinclauses will get linked multiple
times as they are bidirectional lists, thus breaking the linkages of
previous join combinations tried. E.g. while planning A join B join C
join D, the planner will come up with combinations like A(B(CD)) or
(AB)(CD) or ((AB)C)D etc., and remote_conds from A will first be linked
into A(B(CD)), then AB, breaking the first linkages.

Exactly, but I don't think that that needs to be considered because
we have this at the beginning of postgresGetForeignJoinPaths:

 /*
  * Skip if this join combination has been considered already.
  */
 if (joinrel->fdw_private)
 return;

There will be different joinrels for A(B(CD)) and (AB) where A's
remote_conds need to be pulled up.

Agreed.

The check you have mentioned above only protects us from adding paths
multiple times to (AB) when we encounter it for (AB)(CD) and ((AB)C)D.

Sorry, I don't understand this fully.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-02-15 Thread Michael Paquier
On Tue, Feb 16, 2016 at 5:29 AM, Joe Conway  wrote:
> I believe this takes care of all open issues with this, so I plan to
> commit it as attached in a day or two. Thanks for your reviews and comments!

Here are just a couple of cosmetic comments.

+   The view pg_config describes the
+   compile-time configuration parameters of the currently installed
+   version of PostgreSQL. It is intended, for example, to be used by
+   software packages that want to interface to PostgreSQL to facilitate
+   finding the required header files and libraries. It provides the same
+   basic information as the  PostgreSQL Client
+   Application. There is a System Information Function
Missing markup <productname> around PostgreSQL.

+   Application. There is a System Information Function
Why is using upper-case characters necessary here? This could just say
system function.

The paragraph in func.sgml is a copy-paste of the one in
catalogs.sgml. We may want to remove the duplication.
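
For reference, once this is committed a query against the view would look
something like this (assuming the name/setting columns the patch defines):

    SELECT setting FROM pg_config WHERE name = 'INCLUDEDIR';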

+   /* let the caller know we're sending back a tuplestore */
+   rsinfo->returnMode = SFRM_Materialize;
I guess one can recognize your style here for SRF functions :)
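
For readers unfamiliar with that style: the materialize-mode SRF pattern
referred to looks roughly like this sketch (error checks and the
per-function tuple-building loop omitted):

    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    TupleDesc   tupdesc;
    Tuplestorestate *tupstore;
    MemoryContext oldcontext;

    /* build a tuple descriptor for our result type */
    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
        elog(ERROR, "return type must be a row type");

    /* the tuplestore must live in per-query memory */
    oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
    tupstore = tuplestore_begin_heap(true, false, work_mem);

    /* let the caller know we're sending back a tuplestore */
    rsinfo->returnMode = SFRM_Materialize;
    rsinfo->setResult = tupstore;
    rsinfo->setDesc = tupdesc;
    MemoryContextSwitchTo(oldcontext);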

@@ -61,7 +74,7 @@ libpgcommon_srv.a: $(OBJS_SRV)
 # a hack that might fail someday if there is a *_srv.o without a
 # corresponding *.o, but it works for now.
 %_srv.o: %.c %.o
-   $(CC) $(CFLAGS) $(subst -DFRONTEND,, $(CPPFLAGS)) -c $< -o $@
+   $(CC) $(CFLAGS) $(subst -DFRONTEND ,, $(CPPFLAGS)) -c $< -o $@
Diff noise?

--- /dev/null
+++ b/src/common/config_info.c
[...]
+ * IDENTIFICATION
+ *   src/common/controldata_utils.c
This is incorrect.

+ * IDENTIFICATION
+ *   src/backend/utils/misc/pg_config.c
+ *
+ */
I am nitpicking here but this header block should have a long
"-" separator line at its bottom.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-15 Thread Masahiko Sawada
On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
 wrote:
> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
>> Surprizingly yes. The list is handled as an identifier list and
>> parsed by SplitIdentifierString thus it can accept double-quoted
>> names.
>

Attached is the latest version of the patch, which has only the feature
logic so far. I'm writing the documentation for this feature now, so this
version doesn't include the documentation and regression test patches.

> | $ postgres
> | FATAL:  syntax error: unexpected character "*"
> Mmm.. It should be tough to find what has happened..

I'm trying to implement a better error message, but that change is not
included in this version of the patch yet.

> malloc/free are used in create_name_node and other functions to
> be used in scanner, but syncgroup_gram.y is said to use
> palloc/pfree. Maybe they should use the same memory
> allocation/freeing functions.

As it is set up now, we use malloc/free when we allocate/free memory for
the SyncRepStandbys variables. OTOH, we use palloc/pfree while parsing
SyncRepStandbyString. Am I missing something?

> I suppose SyncRepGetSyncedLsnsFn, or SyncRepGetSyncedLsnsPriority
> can return InvalidXLogRecPtr as cur_*_pos even when it returns
> true. And, I suppose comparison of LSN values with
> InvalidXLogRecPtr is not well-defined. Anyway the condition goes
> wrong when cur_write_pos = InvalidXLogRecPtr (but ret = true).

In this version of the patch, InvalidXLogRecPtr can be returned only
together with got_lsns = false (formerly ret = false), so we can be sure
that we got valid LSNs whenever got_lsns = true.

> At a glance, SyncRepGetSyncedLsnsPriority and
> SyncRepGetSyncStandbysPriority does almost the same thing and both
> runs loops over group members. Couldn't they run at once?

Yeah, I've optimized that logic.

> We may want to be careful with the use of '[' in application_name.
> I am not much thrilled with forbidding the use of []() in application_name, 
> so we may
> want to recommend user to use a backslash when using s_s_names when a
> group is defined.
> s_s_names='abc, def, " abc,""def"'
>
> Result list is ["abc", "def", " abc,\"def"]
>
> Simplly supporting the same notation addresses the problem and
> accepts strings like the following.
>
> s_s_names='2["comma,name", "foo[bar,baz]"]'

I've changed the s_s_names parser so that it handles the 4 special
characters (comma, space, '[' and ']') and handles double-quoted strings
exactly the same way SplitIdentifierString does.
These 4 special characters cannot be used without double-quoting. Also, to
include a "(double-quote) character inside a double-quoted string, it must
be doubled ("").
For example,
if application_name = 'hoge " bar', then s_s_names = '"hoge "" bar"' would match.
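
To illustrate the quoting rules with made-up names:

    s_s_names = '2[node1, "node 2", "node[3]", "a,b"]'

names the four standbys node1, node 2, node[3] and a,b.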

Other given comments are fixed.

Remaining tasks are:
- Document patch.
- Regression test patch.
- Syntax error message for s_s_names improvement.

Regards,

--
Masahiko Sawada


000_multi_sync_replication_v10.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Ashutosh Bapat
On Tue, Feb 16, 2016 at 12:26 PM, Etsuro Fujita  wrote:

> On 2016/02/16 15:22, Ashutosh Bapat wrote:
>
>> During join planning, the planner tries multiple combinations of joining
>> relations, so the same base or join relation can be part of multiple
>> combinations. Hence remote_conds or joinclauses will get linked
>> multiple times as they are bidirectional lists, thus breaking the
>> linkages of previous join combinations tried. E.g. while planning A join
>> B join C join D, the planner will come up with combinations like
>> A(B(CD)) or (AB)(CD) or ((AB)C)D etc., and remote_conds from A will
>> first be linked into A(B(CD)), then AB, breaking the first linkages.
>>
>
> Exactly, but I don't think that that needs to be considered because we
> have this at the beginning of postgresGetForeignJoinPaths:
>
> /*
>  * Skip if this join combination has been considered already.
>  */
> if (joinrel->fdw_private)
> return;
>
>
There will be different joinrels for A(B(CD)) and (AB) where A's
remote_conds need to be pulled up. The check you have mentioned above only
protects us from adding paths multiple times to (AB) when we encounter it
for (AB)(CD) and ((AB)C)D.
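
To make the hazard concrete, here is a minimal sketch (the variable names
are illustrative, not the actual patch code). Since list_concat() splices
the second list's cells onto the first, reusing either input list for
another join combination corrupts the earlier result, whereas copying
first keeps each joinrel's list private:

    /* destructive: the joinrel's list now shares cells with its inputs */
    fpinfo->remote_conds = list_concat(fpinfo_outer->remote_conds,
                                       fpinfo_inner->remote_conds);

    /* safe: concatenate copies, so each joinrel owns its own cells */
    fpinfo->remote_conds = list_concat(list_copy(fpinfo_outer->remote_conds),
                                       list_copy(fpinfo_inner->remote_conds));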

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] planstats.sgml

2016-02-15 Thread Tatsuo Ishii
>> Tatsuo Ishii  writes:
>>> While reading planstats.sgml, I encountered a sentence which I don't
>>> understand.
>> 
>>> These numbers are current as of the last VACUUM or
>>> ANALYZE on the table.  The planner then fetches the
>>> actual current number of pages in the table (this is a cheap operation,
>>> not requiring a table scan).  If that is different from
>>> relpages then
>>> reltuples is scaled accordingly to
>>> arrive at a current number-of-rows estimate.  In this case the value of
>>> relpages is up-to-date so the rows estimate 
>>> is
>>> the same as reltuples.
>> 
>>> I don't understand the last sentence (In this case...). For me it
>>> seems it is talking about the case when relpages is not different from
>>> what the planner fetches from the table. If so, why "In this case"?
>> 
>> I think what it meant is "In the example above, ...".  Feel free to
>> change it if you think that is clearer.
> 
> Oh, I see. I'm going to change "In this case" to "In the example above".

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-02-15 Thread Michael Paquier
On Tue, Feb 16, 2016 at 9:18 AM, Robert Haas  wrote:
> I experimented with trying to do this and ran into a problem: where
> exactly would you store the evaluated arguments when you don't know
> how many of them there will be?  And even if you did know how many of
> them there will be, wouldn't that mean that evalFunc or evaluateExpr
> would have to palloc a buffer of the correct size for each invocation?
>  That's far more heavyweight than the current implementation, and
> minimizing CPU usage inside pgbench is a concern.  It would be
> interesting to do some pgbench runs with this patch, or the final
> patch, and see what effect it has on the TPS numbers, if any, and I
> think we should.  But the first concern is to minimize any negative
> impact, so let's talk about how to do that.

Good point. One simple idea here would be to use a custom pgbench
script that has no SQL commands and just calculates the values of some
parameters to measure the impact without depending on the backend,
with a fixed number of transactions.
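
Something along these lines, for instance (a hypothetical expr_only.sql
containing only \set arithmetic and no SQL, run for a fixed number of
transactions):

    \set a 1 + 2 * 3
    \set b :a - 4

    pgbench -n -f expr_only.sql -t 100000 dbname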
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Etsuro Fujita

On 2016/02/16 15:22, Ashutosh Bapat wrote:

During join planning, the planner tries multiple combinations of joining
relations, so the same base or join relation can be part of multiple
combinations. Hence remote_conds or joinclauses will get linked multiple
times as they are bidirectional lists, thus breaking the linkages of
previous join combinations tried. E.g. while planning A join B join C
join D, the planner will come up with combinations like A(B(CD)) or
(AB)(CD) or ((AB)C)D etc., and remote_conds from A will first be linked
into A(B(CD)), then AB, breaking the first linkages.


Exactly, but I don't think that that needs to be considered because we
have this at the beginning of postgresGetForeignJoinPaths:


/*
 * Skip if this join combination has been considered already.
 */
if (joinrel->fdw_private)
return;

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2016-02-15 Thread Michael Paquier
On Tue, Feb 16, 2016 at 5:58 AM, Alvaro Herrera
 wrote:
> Peter Geoghegan wrote:
>
>> Michael (the CF manager at the time) remembered to change the status
>> to "Ready for Committer" again; you see this entry immediately
>> afterwards:
>>
>> "New status: Ready for Committer"
>>
>> but I gather from the CF app history that Alvaro (the current CF
>> manager) did not remember to do that second step when he later moved
>> the patch to the "next" (current) CF. Or maybe he just wasn't aware of
>> this quirk of the CF app.

I recall switching this patch to "Ready for committer" based on the
status of the thread.

> That seems a bug in the CF app to me.
>
> (FWIW I'm not the "current" CF manager, because the CF I managed is
> already over.  I'm not sure that we know who the manager for the
> upcoming one is.)

When a patch with status "Ready for committer" on CF N is moved to CF
(N+1), its status is automatically changed to "Needs Review". That's
something to be aware of when cleaning up the CF app entries.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Etsuro Fujita

On 2016/02/15 21:33, Ashutosh Bapat wrote:

Here's a patch with a better way to fix it. I think that while
concatenating the lists, we need to copy the lists being appended, in all
cases. If we don't copy, a change in those lists can cause changes in the
upward linkages and thus in the lists of any higher-level joins.


Maybe I'm missing something, but I don't understand why such a change in 
those lists happens.  Could you explain that in more detail?


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Ashutosh Bapat
During join planning, the planner tries multiple combinations of joining
relations, so the same base or join relation can be part of multiple
combinations. Hence remote_conds or joinclauses will get linked multiple
times as they are bidirectional lists, thus breaking the linkages of
previous join combinations tried. E.g. while planning A join B join C
join D, the planner will come up with combinations like A(B(CD)) or
(AB)(CD) or ((AB)C)D etc., and remote_conds from A will first be linked
into A(B(CD)), then AB, breaking the first linkages.

On Tue, Feb 16, 2016 at 11:36 AM, Etsuro Fujita  wrote:

> On 2016/02/15 21:33, Ashutosh Bapat wrote:
>
>> Here's a patch with a better way to fix it. I think that while
>> concatenating the lists, we need to copy the lists being appended, in
>> all cases. If we don't copy, a change in those lists can cause changes
>> in the upward linkages and thus in the lists of any higher-level joins.
>>
>
> Maybe I'm missing something, but I don't understand why such a change in
> those lists happens.  Could you explain that in more detail?
>
> Best regards,
> Etsuro Fujita
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Incorrect formula for SysV IPC parameters

2016-02-15 Thread Fujii Masao
On Fri, Feb 12, 2016 at 11:19 PM, Fujii Masao  wrote:
> On Fri, Feb 5, 2016 at 2:17 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Thu, 4 Feb 2016 21:43:04 +0900, Fujii Masao  wrote 
>> in 
>>> On Wed, Feb 3, 2016 at 12:51 PM, Kyotaro HORIGUCHI
>>>  wrote:
>>> > Hello, I found that the formulas to calculate SEMMNI and SEMMNS
>>> > are incorrect in 9.2 and later.
>>> >
>>> > http://www.postgresql.org/docs/9.5/static/kernel-resources.html
>>> >
>>> > But actually the number of semaphores PostgreSQL needs is
>>> > calculated as following in 9.4 and later.
>> ...
>>> > So, the formula for SEMMNI should be
>>> >
>>> > ceil((max_connections + autovacuum_max_workers + max_worker_processes + 
>>> > 5) / 16)
>>> >
>>> > and SEMMNS should have the same fix.
>>> >
>>> >
>>> > In 9.3 and 9.2, the documentation says the same thing but
>> ...
>>> > ceil((max_connections + autovacuum_max_workers + 5) / 16)
>>> >
>>> > In 9.1, NUM_AUXILIARY_PROCS is 3 so the documentation is correct.
>>>
>>> Good catch!
>>
>> Thanks.
>>
>>> ISTM that you also need to change the descriptions about SEMMNI and SEMMNS
>>> under the Table 17-1.
>>
>> Oops! Thank you for pointing it out.
>>
>> The original description doesn't mention the magic-number ('5' in
>> the above formulas, which previously was '4') so I haven't added
>> anything about it.
>>
>> The processes whose number is limited by max_worker_processes are
>> called 'background processes' (not 'backend workers') in the
>> documentation, so I used that name to mention them in the additional
>> description.
>>
>> The difference in the body text for 9.2, 9.3 is only a literal
>> '4' to '5' in the formula.
>
> Thanks for updating the patches!
>
> They look good to me except that the formulas don't include the number of
> background processes requesting shared memory access, i.e.,
> GetNumShmemAttachedBgworkers(), in 9.3. Isn't it better to use the following
> formula in 9.3?
>
>   ceil((max_connections + autovacuum_max_workers + number of
> background processes + 5) / 16)
>
> Attached patch uses the above formula for 9.3. I'm thinking to push your
> patches to 9.2, 9.4, 9.5, master, also push the attached one to 9.3.
> Comments?

Pushed. Thanks for the report and patches!
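
(For a concrete illustration of these formulas: with the stock defaults
max_connections = 100, autovacuum_max_workers = 3 and
max_worker_processes = 8, the 9.4-and-later formula gives
ceil((100 + 3 + 8 + 5) / 16) = ceil(7.25) = 8 semaphore sets.)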

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planstats.sgml

2016-02-15 Thread Mark Kirkwood

On 16/02/16 12:46, David G. Johnston wrote:

On Mon, Feb 15, 2016 at 4:23 PM, Tatsuo Ishii wrote:

While reading planstats.sgml, I encountered a sentence which I don't
understand.

 These numbers are current as of the last VACUUM or
 ANALYZE on the table.  The planner then fetches the
 actual current number of pages in the table (this is a cheap
operation,
 not requiring a table scan).  If that is different from
 relpages then
 reltuples is scaled accordingly to
 arrive at a current number-of-rows estimate.  In this case the
value of
 relpages is up-to-date so the rows
estimate is
 the same as reltuples.

I don't understand the last sentence (In this case...). For me it
seems it is talking about the case when relpages is not different from
what the planner fetches from the table. If so, why "In this case"?
Doesn't "In this case" refer to the previous sentence (If that is
different from...)? Maybe "In this case" should be "Otherwise" or some
such?


The whole sentence is awkward but you are correct in your reading - and
"otherwise" would be a solid choice.



A long while ago when I wrote that, I was expressing the fact that *in 
the example* the numbers matched perfectly, but *in general* the planner 
would scale 'em. It still reads that way to me...but change away if you 
like :-)
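
(Concretely, with made-up numbers: if ANALYZE left relpages = 100 and
reltuples = 10000, and the table has since grown to 150 pages, the planner
estimates 10000 * 150/100 = 15000 rows.)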



Though it seems the whole thing could be simplified to:

These numbers are current as of the last VACUUM or ANALYZE on the
table.  To account for subsequent changes the planner obtains the actual
page count for the table and scales pg_class.reltuples by a factor of
"actual page count" over pg_class.relpages.

The "cheap operation" comment seems gratuitous though could still be
injected if desired.



Well, folks interested in performance like to keep an eye on whether
these sorts of probes cost a lot...


regards

Mark




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-02-15 Thread Pavel Stehule
Hi

2016-02-15 10:16 GMT+01:00 Dean Rasheed :

> > On 12/02/16 10:19, Dean Rasheed wrote:
> >> This seems like a reasonable first patch for me as a committer, so
> >> I'll take it unless anyone else was planning to do so.
> >
>
> So looking at this, it seems that for the most part pg_size_bytes()
> will parse any output produced by pg_size_pretty(). The exception is
> that there are 2 versions of pg_size_pretty(), one that takes bigint
> and one that takes numeric, whereas pg_size_bytes() returns bigint, so
> it can't handle all inputs. Is there any reason not to make
> pg_size_bytes() return numeric?


> It would still be compatible with the example use cases, but it would
> be a better inverse of both variants of pg_size_pretty() and would be
> more future-proof. It already works internally using numeric, so it's
> a trivial change to make now, but impossible to change in the future
> without introducing a new function with a different name, which is
> messy.
>
> Thoughts?
>

That is a fair question. I don't have a strong opinion about it. There is
no technical objection - the result will be more or less the same. But it
would force numeric into the surrounding expression evaluation.

The result will usually not be used together with pg_size_pretty, but much
more often with functions like pg_relation_size or pg_total_relation_size,
and those functions don't return numeric; they return bigint. Bigint is a
much more natural type for this purpose.

Is there any use case for numeric?
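
The typical pattern I have in mind is an integer-context comparison like
this (assuming pg_size_bytes as proposed):

    SELECT relname
      FROM pg_class
     WHERE pg_relation_size(oid) > pg_size_bytes('10 GB');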

Regards

Pavel


>
> Regards,
> Dean
>


Re: [HACKERS] xlc atomics

2016-02-15 Thread Craig Ringer
On 5 July 2015 at 06:54, Andres Freund  wrote:


> I wrote this entirely blindly, as evidenced here by the changes you
> needed. At the time somebody had promised to soon put up an aix animal,
> but that apparently didn't work out.
>

Similarly, I asked IBM for XL/C for a POWER7 Linux VM to improve test
coverage on the build farm but was told to just use gcc. That was with
regard to testing hardware decfloat support, and the folks I spoke to said
that gcc's support was derived from that in xl/c anyway.

They're not desperately keen to get their products out there for testing.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket

2016-02-15 Thread Michael Paquier
On Tue, Feb 16, 2016 at 8:47 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> Different issues in the same area:
>> 1) In vacuumdb.c, init_slot() does not check for the return value of 
>> PQsocket():
>> slot->sock = PQsocket(conn);
>> 2) In isolationtester.c, try_complete_step() should do the same.
>> 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same problem.
>> I guess those ones should be fixed as well, no?
>
> Hmm, yeah, perhaps it's worth closing these too.

Do you think that it would be better starting a new thread? The only
difficulty of this patch is to be sure that each error handling is
done correctly for each one of those frontend modules.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Amit Langote

Hi Josh,

On 2016/02/16 11:41, Josh berkus wrote:
> On 02/15/2016 04:28 PM, Amit Langote wrote:
>> Also, you won't see any optimizer and executor changes. Queries will still
>> use the same plans as existing inheritance-based partitioned tables,
>> although as I mentioned, constraint exclusion won't yet kick in. That will
>> be fixed very shortly.
> 
> We're not going to use CE for the new partitioning long-term, are we? This
> is just the first version, right?

Yes. In retrospect, my approach in previous versions of stuffing major
planner changes in with the syntax patch was not quite proper. So, I
thought I'd propose any major planner (and executor) changes later.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Josh berkus

On 02/15/2016 04:28 PM, Amit Langote wrote:

Also, you won't see any optimizer and executor changes. Queries will still
use the same plans as existing inheritance-based partitioned tables,
although as I mentioned, constraint exclusion won't yet kick in. That will
be fixed very shortly.


We're not going to use CE for the new partitioning long-term, are we? 
This is just the first version, right?


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid

2016-02-15 Thread Chapman Flack
On 02/15/16 20:03, Greg Stark wrote:
> On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack  wrote:
>> If the calling process subsequently waits for its
>> children, and the process has no unwaited for children that were
>> transformed into zombie processes, it will block until all of its
>> children terminate, and wait(), wait3(), waitid() and waitpid() will
>> fail and set errno to [ECHILD].

> And actually looking at that documentation it's not clear to me why
> it's the case. I would have expected system to immediately call
> waitpid after the fork and unless the subprocess was very quick that
> should be sufficient to get the exit code. One might even imagine
> having system intentionally have some kind of interlock to ensure that
> the parent has called waitpid before the child execs the shell.

Doesn't the wording suggest that even if the parent is fast enough
to call waitpid before the child exits, waitpid will only block until
the child terminates and then say ECHILD anyway?

I wouldn't be surprised if they specified it that way to avoid creating
a race condition where you would *sometimes* think it was doing what you
wanted.
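
For anyone wanting the defensive pattern in their own code, the historical
fix Greg quoted amounts to something like this sketch (command is a
placeholder variable):

    int rc;

    errno = 0;
    rc = system(command);

    /*
     * With SIGCHLD ignored (or SA_NOCLDWAIT set), the child is auto-reaped
     * and system()'s internal wait fails with ECHILD even though the
     * command ran; treat that case as success with unknown exit status.
     */
    if (rc == -1 && errno == ECHILD)
        rc = 0;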

I agree that the "status ... is no longer available" description that
system(3) gives for ECHILD doesn't clearly reflect that behavior.

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?

2016-02-15 Thread Merlin Moncure
On Wed, Feb 10, 2016 at 4:15 PM, Andres Freund  wrote:
> Hi,
>
> Several places in our docs have blurbs like
>> Note that on many systems, the effective resolution of sleep delays is
>> 10 milliseconds; setting wal_writer_delay to a value that
>> is not a multiple of 10 might have the same results as setting it to
>> the next higher multiple of 10.
> Afaik that's not the case on any recent operating system/hardware. So
> perhaps we should just remove all of those blurbs, or just replace them
> with something like "on some older systems the effective resolution of
> sleep delays is limited to multiples of 10 milliseconds"?

I guess we should probably explain what is actually happening, namely
that the precise sleep duration is delegated to the operating system
scheduler which may cause the process to sleep longer than requested.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Corey Huinker
>
> Also, you won't see any optimizer and executor changes. Queries will still
> use the same plans as existing inheritance-based partitioned tables,
> although as I mentioned, constraint exclusion won't yet kick in. That will
> be fixed very shortly.
>
> And of course, comments on syntax are welcome as well.
>
> Thanks,
> Amit
>
>
>
Good to know the current limitations/expectations.

Our ETL has a great number of workers that do something like this:
1. grab a file
2. based on some metadata of that file, determine the partition that
would receive ALL of the rows in that file. It's actually multiple tables,
all of which are partitioned, all of which fully expect the file data to
fit in exactly one partition.
3. \copy into a temp table
4. Transform the data and insert the relevant bits into each of the target
partitions derived in #2.

So while ATR (automatic tuple routing) is a *major* feature of true
partitioning, it's not something we'd actually need in our current
production environment, but I can certainly code it that way to benchmark
ATR vs "know the destination partition ahead of time" vs "insane layered
range_partitioning trigger + pg_partman trigger".

Currently we don't do anything like table swapping, but I've done that
enough in the past that I could probably concoct a test of that too, once
it's implemented.

As for the syntax, I'm not quite sure your patch addresses the concern I
voiced earlier: specifically, whether VALUES IN works for RANGE as well as
LIST. But I figured that would become clearer once I tried to actually use
it. Currently we have partitioning on C-collated text ranges (no, they
don't ship with postgres, I had to make a custom type) something like this:

part0: (,BIG_CLIENT)
part1: [BIG_CLIENT,BIG_CLIENT]
part2: (BIG_CLIENT,L)
part3: [L,MONSTROUSLY_BIG_CLIENT)
part4: [MONSTROUSLY_BIG_CLIENT,MONSTROUSLY_BIG_CLIENT]
part5: (MONSTROUSLY_BIG_CLIENT,RANDOM_CLIENT_LATE_IN_ALPHABET]
part6: (RANDOM_CLIENT_LATE_IN_ALPHABET,)


I can't implement that with a simple VALUES LESS THAN clause, unless I
happen to know 'x' in 'BIG_CLIENTx', where 'x' is the exact first character
in the collation sequence, which has to be something unprintable, and that
would make those who later read my code say something unprintable. So
yeah, I'm hoping there's some way to cleanly represent such ranges.


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-15 Thread Amit Langote

Hello,

On 2016/02/15 20:21, Kyotaro HORIGUCHI wrote:
> At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote wrote:
>> On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote:
>>> Please find attached updated patch.

[ ... ]

>>
>> Instead of passing the array of char *'s, why not just pass a single char
>> *, because that's what it's doing - updating a single message. So,
>> something like:
> 
> As I might have written upthread, transferring the whole string
> as a progress message is useless, at least in this scenario. Since
> they are a set of fixed messages, each of them can be represented
> by an identifier, an integer number. I don't see a reason for
> sending a whole string out of a backend.

That makes sense. Perhaps they could be macros:

#define VACUUM_PHASE_SCAN_HEAP  1
#define VACUUM_PHASE_VACUUM_INDEX_HEAP  2

> Next, the function pg_stat_get_command_progress() has a somewhat
> generic name, but it seems to return the data only for the
> backends with beentry->st_command = COMMAND_LAZY_VACUUM and has
> column names specific to vacuum-like processing. If the
> function is intended to be generic, it might be better to return
> a set of integer[] for a given type. Otherwise it should have a
> name that represents its objective.

Agreed.

> 
> CREATE FUNCTION
>  pg_stat_get_command_progress(IN cmdtype integer)
>  RETURNS SETOF integer[] as $$
> 
> SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x
>           x
> --------------------
>  {1233, 16233, 1, ...}
>  {3244, 16236, 2, ...}
> 

I am not sure what we would pass as the argument to the (SQL) function
pg_stat_get_command_progress() in the system view definition for
individual commands - what is PROGRESS_COMMAND_VACUUM exactly? Would
string literals like "vacuum", "cluster", etc. representing command names
work?

> 
> CREATE VIEW pg_stat_vacuum_progress AS
>   SELECT S.s[1] as pid,
>  S.s[2] as relid,
>  CASE S.s[3] 
>WHEN 1 THEN 'Scanning Heap'
>WHEN 2 THEN 'Vacuuming Index and Heap'
>ELSE 'Unknown phase'
>  END,
>
>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
> 
> # The name of the function could be other than *_command_progress.
> 
> Any thoughts or opinions?

How about pg_stat_get_progress_info()?

>> +         snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", phase2);
>> +         pgstat_report_progress_update_message(0, progress_message);
>>           /* Remove index entries */
>>           for (i = 0; i < nindexes; i++)
>> +         {
>>               lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
>> +             scanned_index_pages += RelationGetNumberOfBlocks(Irel[i]);
>> +             /* Update the scanned index pages and number of index scans */
>> +             pgstat_report_progress_update_counter(3, scanned_index_pages);
>> +             pgstat_report_progress_update_counter(4, vacrelstats->num_index_scans + 1);
>> +         }
>>           /* Remove tuples from heap */
>>           lazy_vacuum_heap(onerel, vacrelstats);
>>           vacrelstats->num_index_scans++;
>> +         scanned_index_pages = 0;
>>
>> I guess num_index_scans could better be reported after all the indexes are
>> done, that is, after the for loop ends.
> 
> Precise reporting would be valuable if vacuuming indexes takes a
> long time. It seems to me to be fine as it is since updating of
> stat counters wouldn't add any significant overhead.

Sorry, my comment may be a bit unclear. vacrelstats->num_index_scans
doesn't count individual indexes vacuumed but rather the number of times
"all" the indexes of a table are vacuumed, IOW, the number of times the
vacuum phase runs. Purpose of counter #4 there seems to be to report the
latter. OTOH, reporting scanned_index_pages per index as is done in the
patch is alright.

That said, there is discussion upthread about more precise reporting on
index vacuuming by utilizing lazy_tid_reaped() (the index bulk delete
callback) as a place where we can report which index block number we are
at.  I think that would mean the current IndexBulkDeleteCallback signature,
which is the following, is insufficient:

/* Typedef for callback function to determine if a tuple is bulk-deletable */
typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);

One more parameter would be necessary:

typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, BlockNumber
current_index_blkno, void *state);

That would also require changing all the AM-specific vacuum page routines
(like btvacuumpage) to pass the new argument. Needless to say, some
bookkeeping information would also need to be kept in LVRelStats (the
"state" in the above signature).

Am I missing something?


Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid

2016-02-15 Thread Greg Stark
On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack  wrote:
> If the calling process subsequently waits for its
> children, and the process has no unwaited for children that were
> transformed into zombie processes, it will block until all of its
> children terminate, and wait(), wait3(), waitid() and waitpid() will
> fail and set errno to [ECHILD].

Sure, but I don't see anything saying system() should be expected to
not handle this situation. At least there's nothing in the system(3)
man page that says it should be expected to always return an error if
SIGCHLD is ignored.

And actually, looking at that documentation, it's not clear to me why
that is the case. I would have expected system to immediately call
waitpid after the fork, and unless the subprocess was very quick that
should be sufficient to get the exit code. One might even imagine
having system intentionally have some kind of interlock to ensure that
the parent has called waitpid before the child execs the shell.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Tom Lane
Noah Misch  writes:
> On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
>> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
>> test suites.  Is there any reason for that, or is it just kinda where
>> we ended up?

> To my knowledge, it's just the undesirable place we ended up.

Yeah.  +1 for fixing that, if it's not unreasonably painful.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

2016-02-15 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane  wrote:
>> Also, after fixing that it would be good to add a comment explaining why
>> it's not fundamentally unsafe for BecomeLockGroupMember() to examine
>> leader->pgprocno without having any relevant lock.  AFAICS, that's only
>> okay because the pgprocno fields are never changed after InitProcGlobal,
>> even when a PGPROC is recycled.

> I am sort of disinclined to think that this needs a comment.

I might not either, except that the entire point of that piece of code is
to protect against the possibility that the leader PGPROC vanishes before
we can get this lock.  Since you've got extensive comments addressing that
point, it seems appropriate to explain why this one line doesn't invalidate
the whole design ... because it's right on the hairy edge of doing so.
If we did something like memset a PGPROC to all zeroes when we free it,
which in general would seem like a perfectly reasonable thing to do, this
code would be broken (and not in an easy to detect way; it would indeed
be indistinguishable from the way in which it's broken right now).

> Do we
> really have a comment every other place that pgprocno is referenced
> without a lock?

I suspect strongly that there is no other place that attempts to touch any
part of an invalid (freed) PGPROC.  If there is, more than likely it's
unsafe.

I don't have time right now to read the patch in detail, but will look
tomorrow.  In the meantime, thanks for addressing my concerns.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid

2016-02-15 Thread Chapman Flack
On 02/15/16 13:42, Greg Stark wrote:

> (it returns error with errno ECHILD upon successful completion of 
> commands).
> This fix ignores an error from system() if errno == ECHILD.
> 
> It looks like Linux now behaves similarly,

It seems to be official, in the Single Unix Specification:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigaction.html

SA_NOCLDWAIT
If set, and sig equals SIGCHLD, child processes of the calling
processes will not be transformed into zombie processes when they
terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD]. Otherwise, terminating child
processes will be transformed into zombie processes, unless SIGCHLD
is set to SIG_IGN.

> So just in case anyone else wants to use system() in Postgres or
> indeed any other Unix application that twiddles with the SIGCHILD
> handler this is something to beware of. It's not entirely clear to me
> that the mention of SA_NOCLDWAIT is the only way to get this
> behaviour, at least one stackoverflow answer implied just setting
> SIG_IGN was enough.

Yup:

• If a process sets the action for the SIGCHLD signal to SIG_IGN, the
behaviour is unspecified, except as specified below. If the action
for the SIGCHLD signal is set to SIG_IGN, child processes of the
calling processes will not be transformed into zombie processes when
they terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD].

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

2016-02-15 Thread Robert Haas
On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane  wrote:
>>> pgprocno of the specific PGPROC, or of the group leader?  If it's
>>> the former, I'm pretty suspicious that there are deadlock and/or
>>> linked-list-corruption hazards here.  If it's the latter, at least
>>> the comments around this are misleading.
>
>> Leader.  The other way would be nuts.
>
> ... and btw, either BecomeLockGroupMember() has got this backwards, or
> I'm still fundamentally confused.

Yeah, you're right.  Attached is a draft patch that tries to clean up
that and a bunch of other things that you raised.  It hasn't really
been tested yet, so it might be buggy; I wrote it during my long plane
flight.  Fixes include:

1. It removes the groupLeader flag from PGPROC.  You're right: we don't need it.

2. It fixes this problem with BecomeLockGroupMember().  You're right:
the old way was backwards.

3. It fixes TopoSort() to be less inefficient by checking whether the
EDGE is for the correct lock before doing anything else.  I realized
this while updating comments related to EDGE.

4. It adds some text to the lmgr README.

5. It reflows the existing text in the lmgr README to fit within 80 columns.

6. It adjusts some other comments.

Possibly this should be broken up into multiple patches, but I'm just
sending it along all together for now.

> Also, after fixing that it would be good to add a comment explaining why
> it's not fundamentally unsafe for BecomeLockGroupMember() to examine
> leader->pgprocno without having any relevant lock.  AFAICS, that's only
> okay because the pgprocno fields are never changed after InitProcGlobal,
> even when a PGPROC is recycled.

I am sort of disinclined to think that this needs a comment.  Do we
really have a comment every other place that pgprocno is referenced
without a lock?  Or maybe there are none, but it seems like overkill
to me to comment on that at every site of use.  Better to comment on
the pgprocno definition itself and say "never changes".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/README b/src/backend/storage/lmgr/README
index cb9c7d6..8eaa91c 100644
--- a/src/backend/storage/lmgr/README
+++ b/src/backend/storage/lmgr/README
@@ -591,26 +591,27 @@ Group Locking
 
 As if all of that weren't already complicated enough, PostgreSQL now supports
 parallelism (see src/backend/access/transam/README.parallel), which means that
-we might need to resolve deadlocks that occur between gangs of related processes
-rather than individual processes.  This doesn't change the basic deadlock
-detection algorithm very much, but it makes the bookkeeping more complicated.
+we might need to resolve deadlocks that occur between gangs of related
+processes rather than individual processes.  This doesn't change the basic
+deadlock detection algorithm very much, but it makes the bookkeeping more
+complicated.
 
 We choose to regard locks held by processes in the same parallel group as
-non-conflicting.  This means that two processes in a parallel group can hold
-a self-exclusive lock on the same relation at the same time, or one process
-can acquire an AccessShareLock while the other already holds AccessExclusiveLock.
+non-conflicting.  This means that two processes in a parallel group can hold a
+self-exclusive lock on the same relation at the same time, or one process can
+acquire an AccessShareLock while the other already holds AccessExclusiveLock.
 This might seem dangerous and could be in some cases (more on that below), but
 if we didn't do this then parallel query would be extremely prone to
 self-deadlock.  For example, a parallel query against a relation on which the
 leader had already AccessExclusiveLock would hang, because the workers would
-try to lock the same relation and be blocked by the leader; yet the leader can't
-finish until it receives completion indications from all workers.  An undetected
-deadlock results.  This is far from the only scenario where such a problem
-happens.  The same thing will occur if the leader holds only AccessShareLock,
-the worker seeks AccessShareLock, but between the time the leader attempts to
-acquire the lock and the time the worker attempts to acquire it, some other
-process queues up waiting for an AccessExclusiveLock.  In this case, too, an
-indefinite hang results.
+try to lock the same relation and be blocked by the leader; yet the leader
+can't finish until it receives completion indications from all workers.  An
+undetected deadlock results.  This is far from the only scenario where such a
+problem happens.  The same thing will occur if the leader holds only
+AccessShareLock, the worker seeks AccessShareLock, but between the time the
+leader attempts to acquire the lock and the time the worker attempts to
+acquire it, some other process 

Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Noah Misch
On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
> On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch  wrote:
> > On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
> >> force_parallel_mode=regress
> >> max_parallel_degree=2
> >>
> >> And then run this: make check-world 
> >> TEMP_CONFIG=/path/to/aforementioned/file

> > I configured a copy of animal "mandrill" that way and launched a test run.
> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> > check-world passes everywhere.
> 
> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
> test suites.  Is there any reason for that, or is it just kinda where
> we ended up?

To my knowledge, it's just the undesirable place we ended up.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Robert Haas
On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch  wrote:
> On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
>> Well, what I've done is push into the buildfarm code that will allow
>> us to do *the most exhaustive* testing that I know how to do in an
>> automated fashion. Which is to create a file that says this:
>>
>> force_parallel_mode=regress
>> max_parallel_degree=2
>>
>> And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file
>>
>> Now, that is not going to find bugs in the deadlock.c portion of the
>> group locking patch, but it's been wildly successful in finding bugs
>> in other parts of the parallelism code, and there might well be a few
>> more that we haven't found yet, which is why I'm hoping that we'll get
>> this procedure running regularly either on all buildfarm machines, or
>> on some subset of them, or on new animals that just do this.
>
> I configured a copy of animal "mandrill" that way and launched a test run.
> The postgres_fdw suite failed as attached.  A manual "make -C contrib
> installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> check-world passes everywhere.

Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
test suites.  Is there any reason for that, or is it just kinda where
we ended up?

Retrying it the way you did it, I see the same errors here, so I think
this isn't a PPC-specific problem, but just a problem in general.
I've actually seen these kinds of errors before in earlier versions of
the testing code that eventually became force_parallel_mode.  I got
fooled into believing I'd fixed the problem because of my confusion
about how TEMP_CONFIG worked.  I think this is more likely to be a bug
in force_parallel_mode than a bug in the code that checks whether a
normal parallel query is safe, but I'll have to track it down before I
can say for sure.

Thanks for testing this.  It's not delightful to discover that I
muffed this, but better to find it now than in 6 months.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Amit Langote

Hi Corey,

On 2016/02/16 5:15, Corey Huinker wrote:
>>
>> The individual patches have commit messages that describe code changes.
>> This is registered in the upcoming CF. Feedback and review is greatly
>> welcomed!
>>
> We have a current system that is currently a mix of tables, each of which
> is range partitioned into approximately 15 partitions (using the pgxn range
> partitioning extension), and those partitions are themselves date-series
> partitioned via pg_partman. The largest table ingests about 100M rows per
> day in a single ETL. I will try this patch out and see how well it compares
> in handling the workload. Do you have any areas of interest or concern that
> I should monitor?

Thanks a lot for willing to give it a spin!

I would say this patch series is more geared toward usability.  For
example, you won't have to write a trigger to route tuples to the correct
partitions. You can try your ETL load and see whether the patch's
implementation of tuple routing fares any better than the existing
trigger-based approach. Maybe you were expecting something like loading
into a stand-alone table and then using ALTER TABLE ... INHERIT to
instantly add it to the partitioned table (roll-in), but that command is
not yet implemented (the ATTACH PARTITION command will show a "not
implemented" error message).

Also, you won't see any optimizer and executor changes. Queries will still
use the same plans as existing inheritance-based partitioned tables,
although as I mentioned, constraint exclusion won't yet kick in. That will
be fixed very shortly.

And of course, comments on syntax are welcome as well.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-02-15 Thread Robert Haas
On Mon, Feb 15, 2016 at 4:58 AM, Fabien COELHO  wrote:
> Indeed!
>
>> Taking patch 1 as a completely independent thing, there is no need to
>> introduce PgBenchValueType yet. Similar remark for setIntValue and
>> coerceToInt. They are useful afterwards when introducing double types to be
>> able to handle double input parameters for the gaussian and other functions.
>
> Yes. This is exactly the pain I'm trying to avoid, creating a different
> implementation for the first patch, which is just overridden when the second
> part is applied...

Splitting a patch means breaking it into independently committable
sub-patches, not just throwing each line of the diff into a different
pile as best you can.  I'm with Michael: that part doesn't belong in
this patch.  If we want to have an infrastructure refactoring patch
that just replaces every relevant use of int64 with PgBenchValue, a
union supporting only integer values, then we can do that first and
have a later patch introduce double as a separate change.  But we
can't have things that are logically part of patch #2 just tossed in
with patch #1.

I was in the middle of ripping that out of the patch when I realized
that evalFunc() is pretty badly designed.  What you've done is made it
the job of each particular function to evaluate its arguments.  I
don't think that's a good plan.  I think that when we discover that
we've got a function, we should evaluate all of the arguments that
were passed to it using common code that is shared across all types of
functions and operators.  Then, the evaluated arguments should be
passed to the function-specific code, which can do as it likes with
them.  This way, you have less code that is specific to particular
operations and more common code, which is generally a good thing.
Every expression evaluation engine that I've ever heard of works this
way - see, for example, the PostgreSQL backend.

I experimented with trying to do this and ran into a problem: where
exactly would you store the evaluated arguments when you don't know
how many of them there will be?  And even if you did know how many of
them there will be, wouldn't that mean that evalFunc or evaluateExpr
would have to palloc a buffer of the correct size for each invocation?
 That's far more heavyweight than the current implementation, and
minimizing CPU usage inside pgbench is a concern.  It would be
interesting to do some pgbench runs with this patch, or the final
patch, and see what effect it has on the TPS numbers, if any, and I
think we should.  But the first concern is to minimize any negative
impact, so let's talk about how to do that.

I think we should limit the number of arguments to a function to, say,
16, so that an array of int64s or PgBenchValues long enough to hold an
entire argument list can be stack-allocated.  The backend's limit is
higher, but the only reason we need a value higher than 2 here is
because you've chosen to introduce variable-argument functions; but I
think 16 is enough for any practical purpose.  Then, I think we should
also change the parse tree representation so that it transforms the
linked list into an array stored within the PgBenchExpr, so that you
can access the expression for argument i as expr->u.arg[i].  Then we
can write this is a loop that evaluates each expression in an array of
expressions and stores the result in an array of values.  That seems
like it would be both cleaner and faster than what you've got here
right now.  It's also more similar to what you did with the function
name itself: the most trivial thing the parser could do is store a
pointer to the function or operator name, but that would be slow, so
instead it looks up the function and stores a constant.
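
As a sketch of that shape (MAX_FARGS, the u.fn layout, and execFunc are
illustrative names, not the actual patch):

    #define MAX_FARGS 16

    static bool
    evalFunc(CState *st, PgBenchExpr *expr, int64 *retval)
    {
        int64   args[MAX_FARGS];    /* stack-allocated, no palloc per call */
        int     nargs = expr->u.fn.nargs;
        int     i;

        if (nargs > MAX_FARGS)
            return false;           /* the parser should have rejected this */

        /* common code: evaluate every argument expression up front */
        for (i = 0; i < nargs; i++)
            if (!evaluateExpr(st, expr->u.fn.arg[i], &args[i]))
                return false;

        /* function-specific code then sees only evaluated values */
        return execFunc(st, expr->u.fn.function, nargs, args, retval);
    }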

I also went over your documentation changes.  I think you inserted the
new table smack dab in the middle of a section in a place that it
doesn't really belong.  The version attached makes it into its own
, puts it in a new section a bit further down so that it
doesn't break up the flow, and has a few other tweaks that I think are
improvements.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..f39f341 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -786,7 +786,7 @@ pgbench  options  dbname
   
 
   
-   
+   
 
  \set varname expression
 
@@ -798,8 +798,10 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   references to variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity,
+  function calls, and
+  parentheses.
  
 
  
@@ -994,6 +996,62 @@ END;
 
  
 
+ 
+  Built-In Functions
+
+   
+ The following functions are built into 

Re: [HACKERS] planstats.sgml

2016-02-15 Thread Tatsuo Ishii
> Tatsuo Ishii  writes:
>> While reading planstats.sgml, I encountered a sentence which I don't
>> understand.
> 
>> These numbers are current as of the last VACUUM or
>> ANALYZE on the table.  The planner then fetches the
>> actual current number of pages in the table (this is a cheap operation,
>> not requiring a table scan).  If that is different from
>> relpages then
>> reltuples is scaled accordingly to
>> arrive at a current number-of-rows estimate.  In this case the value of
>> relpages is up-to-date so the rows estimate is
>> the same as reltuples.
> 
>> I don't understand the last sentence (In this case...). For me it
>> seems it is talking about the case when relpages is not different from
>> what the planner fetches from the table. If so, why "In this case"?
> 
> I think what it meant is "In the example above, ...".  Feel free to
> change it if you think that is clearer.

Oh, I see. I'm going to change "In this case" to "In the example above".

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planstats.sgml

2016-02-15 Thread Tom Lane
Tatsuo Ishii  writes:
> While reading planstats.sgml, I encountered a sentence which I don't
> understand.

> These numbers are current as of the last VACUUM or
> ANALYZE on the table.  The planner then fetches the
> actual current number of pages in the table (this is a cheap operation,
> not requiring a table scan).  If that is different from
> relpages then
> reltuples is scaled accordingly to
> arrive at a current number-of-rows estimate.  In this case the value of
> relpages is up-to-date so the rows estimate is
> the same as reltuples.

> I don't understand the last sentence (In this case...). For me it
> seems it is talking about the case when relpages is not different from
> what the planner fetches from the table. If so, why "In this case"?

I think what it meant is "In the example above, ...".  Feel free to
change it if you think that is clearer.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket

2016-02-15 Thread Alvaro Herrera
I noticed that strerror(errno) wasn't the most helpful error context
ever, so I changed it to PQerrorMessage().  There may be room for
additional improvement there.

I also noticed that if one connection dies, we terminate the whole
thread, and if the thread is serving multiple connections, the other
ones are not PQfinished.  It may or may not be worthwhile improving
this; if pgbench is used to test the server when connections are
randomly dropped that would be helpful, otherwise not so much.

I ended up backpatching all the way back.

Fabien COELHO wrote:
> 
> Hello Alvaro,
> 
> >Any objections to changing it like this?  I'd probably backpatch to 9.5,
> >but no further (even though this pattern actually appears all the way
> >back.)
> 
> My 0.02€: if the pattern is repeated, maybe a function which incorporates
> the check would save lines and improve overall readability?
> 
>... = safePQsocket(...);

Hmm, yeah, but that doesn't work too well because we're not invoking
exit() in case of an error (which is what the safe pg_malloc etc do), so
we'd have to check for what to do after safePQsocket returns -- no
improvement in code clarity IMHO.  Thanks for the suggestion.

Michael Paquier wrote:

> Different issues in the same area:
> 1) In vacuumdb.c, init_slot() does not check for the return value of 
> PQsocket():
> slot->sock = PQsocket(conn);
> 2) In isolationtester.c, try_complete_step() should do the same.
> 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same problem.
> I guess those ones should be fixed as well, no?

Hmm, yeah, perhaps it's worth closing these too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planstats.sgml

2016-02-15 Thread David G. Johnston
On Mon, Feb 15, 2016 at 4:23 PM, Tatsuo Ishii  wrote:

> While reading planstats.sgml, I encountered a sentence which I don't
> understand.
>
> These numbers are current as of the last VACUUM or
> ANALYZE on the table.  The planner then fetches the
> actual current number of pages in the table (this is a cheap operation,
> not requiring a table scan).  If that is different from
> relpages then
> reltuples is scaled accordingly to
> arrive at a current number-of-rows estimate.  In this case the value of
> relpages is up-to-date so the rows estimate
> is
> the same as reltuples.
>
> I don't understand the last sentence (In this case...). For me it
> seems it is talking about the case when relpages is not different from
> what the planner fetches from the table. If so, why "In this case"?
> Isn't "In this case" referring to the previous sentence (If that is
> different from...)? Maybe "In this case" should be "Otherwise" or some
> such?
>
>
The whole sentence is awkward but you are correct in your reading - and
"otherwise" would be a solid choice.

Though it seems the whole thing could be simplified to:

These numbers are current as of the last VACUUM or ANALYZE on the table.
To account for subsequent changes, the planner obtains the actual page count
for the table and scales pg_class.reltuples by a factor of "actual page
count" over pg_class.relpages.

The "cheap operation" comment seems gratuitous though could still be
injected if desired.
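
For what it's worth, the scaling is easy to demonstrate from SQL. A
minimal sketch (tenk1 is a hypothetical table name; the relpages > 0
guard matters because a never-analyzed table has relpages = 0):

SELECT c.reltuples,
       c.relpages,
       pg_relation_size(c.oid) / current_setting('block_size')::bigint
           AS actual_pages,
       c.reltuples
           * (pg_relation_size(c.oid) / current_setting('block_size')::bigint)
           / c.relpages AS rows_estimate
FROM pg_class AS c
WHERE c.relname = 'tenk1'
  AND c.relpages > 0;

When actual_pages equals relpages, rows_estimate degenerates to plain
reltuples, which is the situation the example in the docs describes.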

David J.


Re: [HACKERS] Using quicksort for every external sort run

2016-02-15 Thread Greg Stark
On Mon, Feb 15, 2016 at 8:43 PM, Jim Nasby  wrote:
> On 2/7/16 8:57 PM, Peter Geoghegan wrote:
>>
>> It seems less than ideal that DBAs have to be so
>> conservative in sizing work_mem.
>
>
> +10


I was thinking about this over the past couple weeks. I'm starting to
think the quicksort runs give at least the beginnings of a way
forward on this front. Keeping in mind that we know how many tapes we
can buffer in memory, and that the number is likely to be relatively
large -- on the order of 100+ is typical -- what if we do something
like the following rough sketch:

Give users two knobs, a lower bound "sort in memory using quicksort"
memory size and an upper bound "absolutely never use more than this"
which they can set to a substantial fraction of physical memory. Then
when we overflow the lower bound we start generating runs, the first
one being of that length. Each run we generate we double (or increase
by 50% or something) until we hit the maximum. That means that the
first few runs may be shorter than necessary but we have enough tapes
available that that doesn't hurt much and we'll eventually get to a
large enough run size that we won't run out of tapes and can still do
a single final (on the fly) merge.

In fact what's really attractive about this idea is that it might give
us a reasonable spot to do some global system resource management.
Each time we want to increase the run size we check some shared memory
counter of how much memory is in use and refuse to increase if there's
too much in use (or if we're using too large a fraction of it or some
other heuristic). The key point is that since we don't need to decide
up front at the beginning of the sort and we don't need to track it
continuously there is neither too little nor too much contention on
this shared memory variable. Also the behaviour would be not too
chaotic if there's a user-tunable minimum and the other activity in
the system only controls how more memory it can steal from the global
pool on top of that.
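
Incidentally, run formation is already observable today, which should
make it straightforward to evaluate any such policy. A minimal sketch,
assuming a build with TRACE_SORT enabled (the default); the table and
values are invented for illustration:

SET trace_sort = on;   -- run-by-run progress goes to the server log
SET work_mem = '4MB';  -- small enough to force an external sort

CREATE TEMP TABLE big AS
    SELECT (random() * 1e9)::int AS k
    FROM generate_series(1, 5000000);

SELECT count(*) FROM (SELECT k FROM big ORDER BY k) AS s;

Comparing the logged number and size of runs before and after a change
like the above would show whether the doubling schedule behaves as
intended.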

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] planstats.sgml

2016-02-15 Thread Tatsuo Ishii
While reading planstats.sgml, I encountered a sentence which I don't
understand.

These numbers are current as of the last VACUUM or
ANALYZE on the table.  The planner then fetches the
actual current number of pages in the table (this is a cheap operation,
not requiring a table scan).  If that is different from
relpages then
reltuples is scaled accordingly to
arrive at a current number-of-rows estimate.  In this case the value of
relpages is up-to-date so the rows estimate is
the same as reltuples.

I don't understand the last sentence (In this case...). For me it
seems it is talking about the case when relpages is not different from
what the planner fetches from the table. If so, why "In this case"?
Isn't "In this case" referring to the previous sentence (If that is
different from...)? Maybe "In this case" should be "Otherwise" or some
such?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Noah Misch
On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > I configured a copy of animal "mandrill" that way and launched a test run.
> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> > check-world passes everywhere.
> 
> Hm, is this with or without the ppc-related atomics fix you just found?

Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
atomics change affects xlC only.  Also, the postgres_fdw behavior does not
appear probabilistic; it failed twenty times in a row.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-15 Thread Tom Lane
Noah Misch  writes:
> I configured a copy of animal "mandrill" that way and launched a test run.
> The postgres_fdw suite failed as attached.  A manual "make -C contrib
> installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> check-world passes everywhere.

Hm, is this with or without the ppc-related atomics fix you just found?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock with Adaptive Replacement

2016-02-15 Thread Tom Lane
Jim Nasby  writes:
> On 2/12/16 9:55 AM, Robert Haas wrote:
>> I think it's important to spend time and energy figuring out exactly
>> what the problems with our current algorithm are.  We know in general
>> terms that usage counts tend to converge to either 5 or 0 and
>> therefore sometimes evict buffers both at great cost and almost

> Has anyone done testing on the best cap to usage count? IIRC 5 was 
> pulled out of thin air.

My recollection is that there was some testing behind it ... but that
was back around 2005 so it seems safe to assume that that testing
is no longer terribly relevant.  In particular, I'm sure it was tested
with shared_buffer counts far smaller than what we now consider sane.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2016-02-15 Thread Tom Lane
Alvaro Herrera  writes:
> (FWIW I'm not the "current" CF manager, because the CF I managed is
> already over.  I'm not sure that we know who the manager for the
> upcoming one is.)

We had a vict^H^H^H^Hvolunteer a few days ago:

http://www.postgresql.org/message-id/56b91a6d.6030...@pgmasters.net

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-02-15 Thread Kevin Grittner
On Wed, Feb 3, 2016 at 5:12 PM, Thomas Munro
 wrote:

> I don't see it as a difficult choice between two reasonable
> alternatives.  It quacks suspiciously like a bug.

That seems a little strong to me; I think it would be an
unacceptable change in behavior to back-patch this, for example.
On the other hand, we have had multiple reports on these lists
asserting that the behavior is a bug (not to mention several
off-list communications to me about it), it seems like a POLA
violation, it hides the information that users of serializable
transactions consider most important in favor of relatively
insignificant (to them) details about what table and key were
involved, and it causes errors to be presented to end users that
the developers would prefer to be handled discretely in the
background.  The current behavior provides this guarantee:

"Any set of successfully committed concurrent serializable
transactions will provide a result consistent with running them one
at a time in some order."

Users of serializable transactions would, in my experience,
universally prefer to strengthen that  guarantee with:

"Should a serializable transaction fail only due to the action of a
concurrent serializable transaction, it should fail with a
serialization failure error."

People have had to resort to weird heuristics like performing a
limited number of retries on a duplicate key error in case it
happens to be due to a serialization problem, but that wastes
resources when it is not a serialization failure, and unnecessarily
complicates the retry framework.

> The theoretical problem with the current behaviour is that by
> reporting a unique constraint violation in this case, we reach an
> outcome that is neither a serialization failure nor a result that
> could occur in any serial ordering of transactions.

Well, not if you only consider successfully committed transactions.  ;-)

> The overlapping
> transactions both observed that the key they planned to insert was not
> present before inserting, and therefore they can't be untangled: there
> is no serial order of the transactions where the second transaction to
> run wouldn't see the key already inserted by the first transaction and
> (presumably) take a different course of action.  (If it *does* see the
> value already present in its snapshot, or doesn't even look first
> before inserting and it turns out to be present, then it really
> *should* get a unique constraint violation when trying to insert.)
>
> The practical problem with the current behaviour is that the user has
> to work out whether a unique constraint violation indicates:
>
> 1.  A programming error -- something is wrong that retrying probably won't fix
>
> 2.  An unfavourable outcome in the case that you speculatively
> inserted without checking whether the value was present so you were
> expecting a violation to be possible, in which case you know what
> you're doing and you can figure out what to do next, probably retry or
> give up
>
> 3.  A serialization failure that has been masked because our coding
> happens to check for unique constraint violations without considering
> SSI conflicts first -- retrying will eventually succeed.
>
> It's complicated for a human to work out how to distinguish the third
> category errors in each place where they might occur (and even to know
> that they are possible, since AFAIK the manual doesn't point it out),
> and impossible for an automatic retry-on-40001 framework to handle in
> general.  SERIALIZABLE is supposed to be super easy to use (and
> devilishly hard to implement...).

This is exactly on the mark, IMO.

FWIW, at the conference in Moscow I had people for whom this is
their #1 feature request.  (Well, actually, they also argued it
should be considered a bug fix; but after some discussion they agreed
that the current guarantee is useful and operating as designed, so they
were willing to see it treated as an enhancement.)

Another way of stating the impact of this patch is that it changes
the guarantee to:

"If you write a transaction so that it does the right thing when
run alone, it will always do the right thing as part of any mix of
serializable transactions or will fail with a serialization failure
error."

Right now we have to add:

"... or, er, maybe a duplicate key error."

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-15 Thread Tom Lane
Andres Freund  writes:
> I wonder if we shouldn't just expose a 'which pid is process X waiting
> for' API, implemented serverside. That's generally really useful, and
> looks like it's actually going to be less complicated than that
> query... And it's surely going to be faster.

Attached is a draft patch for a new function that reports the set of PIDs
directly blocking a given PID (that is, holding or awaiting conflicting
locks on the lockable object it's waiting for).

I replaced isolationtester's pg_locks query with this, and found that
it's about 9X faster in a normal build, and 3X faster with
CLOBBER_CACHE_ALWAYS turned on.  That would give us some nice headroom
for the isolation tests with CLOBBER_CACHE_ALWAYS animals.  (Note that
in view of
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2016-02-14%2007%3A38%3A37
we still need to do *something* about the speed of the new deadlock-hard
test; this patch could avoid the need to dumb down or slow down that test
even further.)

Not to be neglected also is that (I believe) this gives the right answer,
whereas isolationtester's existing query is currently completely broken by
parallel queries, and it doesn't understand non-conflicting lock modes
either.  (It did, at least partially, before commit 38f8bdcac4982215;
I am not sure that taking out the mode checks was a good idea.  But
putting them back would make the query slower yet.)

This is WIP, in part because I've written no user docs for the new
function, and in part because I think people might want to bikeshed the
API.  What is here is:

"pg_blocker_pids(integer) returns integer[]" returns a possibly-empty
array of the PIDs of backend processes that block the backend with
specified PID.  You get an empty array, not an error, if the argument
isn't a valid backend PID or that backend isn't waiting.  In parallel
query situations, the output includes PIDs that are blocking any PID
in the given process's lock group, and what is reported is always
the PID of the lock group leader for whichever process is doing the
blocking.  Also, in parallel query situations, the same PID might appear
multiple times in the output; it didn't seem worth trying to eliminate
duplicates.
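
For illustration, usage from SQL would presumably look like this (a
sketch against the signature described above; 12345 is a placeholder
PID, and the final API may of course change):

-- which backends block pid 12345, if any?
SELECT pg_blocker_pids(12345);

-- joined against pg_stat_activity for context; duplicates are
-- possible in parallel-query situations, per the above
SELECT sa.pid, sa.state, sa.query
FROM unnest(pg_blocker_pids(12345)) AS b(pid)
JOIN pg_stat_activity AS sa ON sa.pid = b.pid;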

Reasonable API change requests might include returning a rowset rather
than an array and returning more data per lock conflict.  I've not
bothered with either thing here because isolationtester doesn't care
and it would make the query somewhat slower for isolationtester's
usage (though, probably, not enough slower to really matter).

It should also be noted that I've not really tested the parallel
query aspects of this, because I'm not sure how to create a conflicting
lock request in a parallel worker.  However, if I'm not still
misunderstanding the new semantics of the lock data structures, that
aspect of it seems unlikely to be wrong.

Comments?

regards, tom lane

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 91218d0..97e8962 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
*** HaveVirtualXIDsDelayingChkpt(VirtualTran
*** 2313,2318 
--- 2313,2341 
  PGPROC *
  BackendPidGetProc(int pid)
  {
+ 	PGPROC	   *result;
+ 
+ 	if (pid == 0)			/* never match dummy PGPROCs */
+ 		return NULL;
+ 
+ 	LWLockAcquire(ProcArrayLock, LW_SHARED);
+ 
+ 	result = BackendPidGetProcWithLock(pid);
+ 
+ 	LWLockRelease(ProcArrayLock);
+ 
+ 	return result;
+ }
+ 
+ /*
+  * BackendPidGetProcWithLock -- get a backend's PGPROC given its PID
+  *
+  * Same as above, except caller must be holding ProcArrayLock.  The found
+  * entry, if any, can be assumed to be valid as long as the lock remains held.
+  */
+ PGPROC *
+ BackendPidGetProcWithLock(int pid)
+ {
  	PGPROC	   *result = NULL;
  	ProcArrayStruct *arrayP = procArray;
  	int			index;
*** BackendPidGetProc(int pid)
*** 2320,2327 
  	if (pid == 0)			/* never match dummy PGPROCs */
  		return NULL;
  
- 	LWLockAcquire(ProcArrayLock, LW_SHARED);
- 
  	for (index = 0; index < arrayP->numProcs; index++)
  	{
		PGPROC	   *proc = &allProcs[arrayP->pgprocnos[index]];
--- 2343,2348 
*** BackendPidGetProc(int pid)
*** 2333,2340 
  		}
  	}
  
- 	LWLockRelease(ProcArrayLock);
- 
  	return result;
  }
  
--- 2354,2359 
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index fef59a2..fb32769 100644
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
***
*** 21,27 
   *
   *	Interface:
   *
!  *	InitLocks(), GetLocksMethodTable(),
   *	LockAcquire(), LockRelease(), LockReleaseAll(),
   *	LockCheckConflicts(), GrantLock()
   *
--- 21,27 
   *
   *	Interface:
   *
!  *	InitLocks(), GetLocksMethodTable(), GetLockTagsMethodTable(),
   *	LockAcquire(), LockRelease(), LockReleaseAll(),
   *	LockCheckConflicts(), GrantLock()
   *
***
*** 41,46 

Re: [HACKERS] Freeze avoidance of very large table.

2016-02-15 Thread Bruce Momjian
On Wed, Feb 10, 2016 at 04:39:15PM +0900, Kyotaro HORIGUCHI wrote:
> > I still agree with this plugin approach, but I felt it's still
> > complicated a bit, and I'm concerned that patch size has been
> > increased.
> > Please give me feedbacks.
> 
> Yeah, I feel the same. What make it worse, the plugin mechanism
> will get further complex if we make it more flexible for possible
> usage as I proposed above. It is apparently too complicated for
> deciding whether to load *just one*, for now, converter
> function. And no additional converter is in sight.
> 
> I incline to pull out all the plugin stuff of pg_upgrade. We are
> so prudent to make changes of file formats so this kind of events
> will happen with several-years intervals. The plugin mechanism
> would be valuable if we are encouraged to change file formats
> more frequently and freely by providing it, but such situation
> absolutely introduces more untoward things..

I agreed on ripping out the converter plugin ability of pg_upgrade. 
Remember pg_upgrade was originally written by EnterpriseDB staff, and I
think they expected their closed-source fork of Postgres might need a
custom page converter someday, but it never needed one, and at this
point I think having the code in there is just making things more
complex.  I see _no_ reason for community Postgres to use a plugin
converter because we are going to need that code for every upgrade from
pre-9.6 to 9.6+, so why not just hard-code in the functions we need.  We
can remove it once 9.5 is end-of-life.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2016-02-15 Thread Peter Geoghegan
On Mon, Feb 15, 2016 at 12:58 PM, Alvaro Herrera
 wrote:
> (FWIW I'm not the "current" CF manager, because the CF I managed is
> already over.  I'm not sure that we know who the manager for the
> upcoming one is.)

It's pretty easy to forget that this was the first time in a long time
that the CF wasn't closed because the next CF began.  :-)


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock with Adaptive Replacement

2016-02-15 Thread Jim Nasby

On 2/12/16 9:55 AM, Robert Haas wrote:

I think it's important to spend time and energy figuring out exactly
what the problems with our current algorithm are.  We know in general
terms that usage counts tend to converge to either 5 or 0 and
therefore sometimes evict buffers both at great cost and almost


Has anyone done testing on the best cap for the usage count? IIRC 5 was
pulled out of thin air. Actually, I don't recall ever seeing a clock
sweep that supported more than a single bit, though often there are
multiple 'pools' a buffer could be in (i.e., active vs. inactive in most
Unix VMs).


If you have a reasonable amount of 1 or 0 count buffers then this 
probably doesn't matter too much, but if your working set is 
significantly higher than shared buffers then you're probably doing a 
LOT of full sweeps to try and get something decremented down to 0.
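
For what it's worth, the convergence is easy to observe with the
pg_buffercache extension; a sketch:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- distribution of usage counts across shared buffers
-- (usagecount is NULL for buffers that are not in use)
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;

On a cache-pressured system you'd expect to see the piling up at 5 and
0 described upthread.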



randomly.  But what's a lot less clear is how much that actually hurts
us given that we are relying on the OS cache anyway.  It may be that
we need to fix some other things before or after improving the buffer
eviction algorithm before we actually get a performance benefit.  I
suspect, for example, that a lot of the problems with large
shared_buffers settings have to do with the bgwriter and checkpointer
behavior rather than with the buffer eviction algorithm; and that
others have to do with cache duplication between PostgreSQL and the
operating system.  So, I would suggest (although of course it's up to


It would be nice if there were at least an option to instrument how long
an OS read request took, so that you could gauge how many requests were
being served from the OS cache vs. going to disk. (Obviously direct
knowledge from the OS is even better, but I don't think those APIs
exist.)



you) that you might want to focus on experiments that will help you
understand where the problems are before you plunge into writing code
to fix them.


+1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2016-02-15 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Michael (the CF manager at the time) remembered to change the status
> to "Ready for Committer" again; you see this entry immediately
> afterwards:
> 
> "New status: Ready for Committer"
> 
> but I gather from the CF app history that Alvaro (the current CF
> manager) did not remember to do that second step when he later moved
> the patch to the "next" (current) CF. Or maybe he just wasn't aware of
> this quirk of the CF app.

That seems a bug in the CF app to me.

(FWIW I'm not the "current" CF manager, because the CF I managed is
already over.  I'm not sure that we know who the manager for the
upcoming one is.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2016-02-15 Thread Jim Nasby

On 2/7/16 8:57 PM, Peter Geoghegan wrote:

It seems less than ideal that DBAs have to be so
conservative in sizing work_mem.


+10
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-02-15 Thread Joe Conway
On 01/20/2016 08:08 PM, Michael Paquier wrote:
> On Wed, Jan 20, 2016 at 2:32 AM, Joe Conway  wrote:
>> The only things I know of still lacking is:
>> 1) Documentation

Done and included in the attached.

>> 2) Decision on REVOKE ... FROM PUBLIC
> 
> Yep, regarding 2) I am the only one actually making noise to protect
> this information by default, against at least 2 committers :)

I plan to commit this way -- if the decision is made to remove the two
REVOKEs it can always be done later, but I see no problem with it.

> +typedef struct configdata
> +{
> +   char   *name;
> +   char   *setting;
> +} configdata;
> For a better analogy to ControlFileData, this could be renamed ConfigData?

Well I was already using ConfigData as the variable name, but after some
review it seems better your way, so I made the struct ConfigData and the
variable configdata.

> The point of the move to src/common is to remove the duplication in
> src/bin/pg_config/Makefile.

check

> All the files in src/common should begin their include declarations with that:
> #ifndef FRONTEND
> #include "postgres.h"
> #else
> #include "postgres_fe.h"
> #endif

check

> +configdata *
> +get_configdata(char *my_exec_path, size_t *configdata_len)
> +{
> It may be good to mention that the result is palloc'd and that caller
> may need to free it if necessary. It does not matter in the two code
> paths of this patch, but it may matter for other users calling that.

check

I believe this takes care of all open issues with this, so I plan to
commit it as attached in a day or two. Thanks for your reviews and comments!
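
For the archives, once committed the view can be queried like so (a
sketch; note that the REVOKE ... FROM PUBLIC in the patch means this
requires superuser or an explicit GRANT):

-- all compile-time settings
SELECT name, setting FROM pg_config;

-- or a single value, e.g. to locate header files
SELECT setting FROM pg_config WHERE name = 'INCLUDEDIR';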

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 412c845..6c50f79 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 7350,7355 
--- 7350,7360 
   
  
   
+   pg_config
+   compile-time configuration parameters
+  
+ 
+  
pg_cursors
open cursors
   
***
*** 7609,7614 
--- 7614,7667 

   
  
+  
+   pg_config
+ 
+   
+pg_config
+   
+ 
+   
+The view pg_config describes the
+compile-time configuration parameters of the currently installed
+version of PostgreSQL. It is intended, for example, to be used by
+software packages that want to interface to PostgreSQL to facilitate
+finding the required header files and libraries. It provides the same
+basic information as the  PostgreSQL Client
+Application. There is a System Information Function
+() called pg_config
+which underlies this view.
+   
+ 
+   
+pg_config Columns
+
+ 
+  
+   Name
+   Type
+   Description
+  
+ 
+ 
+ 
+  
+   name
+   text
+   The parameter name
+  
+ 
+  
+   setting
+   text
+   The parameter value
+  
+ 
+
+   
+ 
+  
+ 
   
pg_cursors
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f9eea76..29c36d2 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT * FROM pg_ls_dir('.') WITH ORDINA
*** 15003,15008 
--- 15003,15014 

  

+pg_config()
+setof record
+get list of compile-time configure variable names and their values
+   
+ 
+   
 pg_is_other_temp_schema(oid)
 boolean
 is schema another session's temporary schema?
*** SET search_path TO schema
  
 
+ pg_config
+
+ 
+
+ pg_config returns a set of records describing the
+ compile-time configuration parameters of the currently installed
+ version of PostgreSQL. It is intended, for example, to be used by
+ software packages that want to interface to PostgreSQL to facilitate
+ finding the required header files and libraries. The
+ name column contains the parameter name.
+ The setting column contains the parameter value. It
+ provides the same basic information as the
+  PostgreSQL Client Application. There
+ is also a  system view.
+
+ 
+
  version
 
  
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..abf9a70 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_timezone_abbrevs AS
*** 433,438 
--- 433,444 
  CREATE VIEW pg_timezone_names AS
  SELECT * FROM pg_timezone_names();
  
+ CREATE VIEW pg_config AS
+ SELECT * FROM pg_config();
+ 
+ REVOKE ALL on pg_config FROM PUBLIC;
+ REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;
+ 
  -- Statistics views
  
  CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 7889101..a0c82c1 100644
*** 

Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Corey Huinker
>
>
> The individual patches have commit messages that describe code changes.
> This is registered in the upcoming CF. Feedback and review is greatly
> welcomed!
>
> Thanks,
> Amit
>
>
We have a system that is currently a mix of tables, each of which
is range partitioned into approximately 15 partitions (using the pgxn range
partitioning extension), and those partitions are themselves date-series
partitioned via pg_partman. The largest table ingests about 100M rows per
day in a single ETL. I will try this patch out and see how well it compares
in handling the workload. Do you have any areas of interest or concern that
I should monitor?


Re: [HACKERS] PoC: Partial sort

2016-02-15 Thread Peter Geoghegan
On Sun, Jan 31, 2016 at 4:16 AM, Alvaro Herrera
 wrote:
> Great to have a new version -- there seems to be a lot of interest in
> this patch.  I'm moving this one to the next commitfest, thanks.

I am signed up to review this patch.

I was very surprised to see it in "Needs Review" state in the CF app
(Alexander just rebased the patch, and didn't do anything with the CF
app entry). Once again, this seems to have happened just because
Alvaro moved the patch to the next CF.

I've marked it "Waiting on Author" once more. Hopefully the CF app
will be fixed soon, so moving a patch to the next commitfest no longer
clobbers its state.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?

2016-02-15 Thread Peter Geoghegan
On Mon, Feb 15, 2016 at 11:23 AM, Peter Geoghegan  wrote:
> commit_delay doesn't have any guidance like this, where it could
> certainly matter, because optimal settings are rarely greater than 10
> milliseconds.

Actually, it does, but it's in "29.4. WAL Configuration", not next to
the documentation for commit_delay.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?

2016-02-15 Thread Peter Geoghegan
On Wed, Feb 10, 2016 at 2:15 PM, Andres Freund  wrote:
> Afaik that's not the case on any recent operating system/hardware. So
> perhaps we should just remove all of those blurbs, or just replace them
> with something like "on some older systems the effective resolution of
> sleep delays is limited to multiples of 10 milliseconds"?

Or just remove it entirely. Really, I can't see that that's likely to
matter when it does apply. Also, don't forget to do the same with
bgwriter_delay.

commit_delay doesn't have any guidance like this, where it could
certainly matter, because optimal settings are rarely greater than 10
milliseconds.
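
To put that in perspective, these are ordinary GUCs, so sub-10ms values
are easy to experiment with; a sketch (the values are purely
illustrative):

SET commit_delay = 1000;    -- in microseconds, i.e. 1ms; only pays off
                            -- if the kernel honors sub-10ms sleeps
SHOW bgwriter_delay;        -- default 200ms, minimum 10ms

As Andres notes above, recent kernels resolve sleeps much more finely
than 10ms, which is the point of dropping the blurbs.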

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Code refactoring related to -fsanitize=use-after-scope

2016-02-15 Thread Tom Lane
Andres Freund  writes:
> On 2016-02-15 14:37:28 +0100, Martin Liška wrote:
>> I've been currently working on support of -fsanitize=use-after-scope
>> in the GCC compiler and I decided to use postgresql as my test-case.
>> The sanitizer poisons every stack variable at the very beginning of a
>> function, unpoisons a variable at the beginning of scope definition
>> and finally poisons the variable again at the end of scope.

> Generally sounds like a good check.

>> Following patch fixes issues seen by the sanitizer. Hope it's acceptable?
>> With the patch applied, ASAN (with the new sanitization) works fine.

> But I'm not immediately seing why this is necessary? Is this about
> battling a false positive?

I bet a nickel that this is triggered by the goto leading into those
variables' scope ("goto process_inner_tuple" at line 2038 in HEAD).
That probably bypasses the "unpoison" step.

However, doesn't this represent a bug in the sanitizer rather than
anything we should change in Postgres?  There is no rule in C that
you can't execute such a goto, especially not if there is no
initialization of those variables.

If you can think of a reasonable refactoring that gets rid of the need
for that goto, I'd be for that, because it's certainly unsightly.
But I don't think it's wrong, and I don't think that the proposed patch
is any improvement from a structured-programming standpoint.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlc atomics

2016-02-15 Thread Tom Lane
Noah Misch  writes:
> That commit (0d32d2e) permitted things to compile and usually pass tests, but
> I missed the synchronization bug.  Since 2015-10-01, the buildfarm has seen
> sixteen duplicate-catalog-OID failures.

I'd been wondering about those ...

> These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
> around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> 5765-J08)" for ppc64le.  The problem is generic-xlc.h
> pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> __compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
> own s_lock.h, its references, and other projects' usage:

Nice catch!

> This patch's test case would have failed about half the time under today's
> generic-xlc.h.  Fast machines run it in about 1s.  A test that detects the bug
> 99% of the time would run far longer, hence this compromise.

Sounds like a reasonable compromise to me, although I wonder about the
value of it if we stick it into pgbench's TAP tests.  How many of the
slower buildfarm members are running the TAP tests?  Certainly mine are
not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid

2016-02-15 Thread Greg Stark
For reasons, I was trying to compile older versions of Postgres and
ran into a strange behaviour where system() worked normally but then
returned -1 with errno set to ECHILD. And surprisingly it looks like
we've seen this behaviour in the past but on a Solaris:

commit 07d4d36aae79cf2ac365e381ed3e7ce62dcfa783
Author: Tatsuo Ishii 
Date:   Thu May 25 06:53:43 2000 +

On solaris, createdb/dropdb fails because of strange behavior of system().
(it returns error with errno ECHILD upon successful completion of commands).
This fix ignores an error from system() if errno == ECHILD.

It looks like Linux now behaves similarly, in fact there's a Redhat
notice about this causing similar headaches in Oracle:
https://access.redhat.com/solutions/37218

So just in case anyone else wants to use system() in Postgres, or
indeed in any other Unix application that twiddles with the SIGCHLD
handler, this is something to beware of. It's not entirely clear to me
that SA_NOCLDWAIT is the only way to get this behaviour; at least one
stackoverflow answer implied that just setting SIG_IGN was enough.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2016-02-15 Thread Peter Geoghegan
On Tue, Dec 22, 2015 at 10:49 PM, Peter Geoghegan  wrote:
> I attach a rebased patch for 9.6 only.

I marked the patch -- my own patch -- "Ready for Committer". I'm the
third person to have marked the patch "Ready for Committer", even
though no committer bounced the patch back for review by the reviewer,
Andreas:

https://commitfest.postgresql.org/9/300/

First Andreas did so, then Michael, and finally myself. The problem is
that you see things like this:

"Closed in commitfest 2015-11 with status: Moved to next CF"

Michael (the CF manager at the time) remembered to change the status
to "Ready for Committer" again; you see this entry immediately
afterwards:

"New status: Ready for Committer"

but I gather from the CF app history that Alvaro (the current CF
manager) did not remember to do that second step when he later moved
the patch to the "next" (current) CF. Or maybe he just wasn't aware of
this quirk of the CF app.

I don't have a problem with having to resubmit a patch to the next CF
manually, but it's easy to miss that the status has been changed from
"Ready for Committer" to "Needs Review". I don't think anyone ever
argued for that, and this patch was accidentally in "Needs Review"
state for about 5 days. Can we fix it?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlc atomics

2016-02-15 Thread Noah Misch
On Mon, Feb 15, 2016 at 06:33:42PM +0100, Andres Freund wrote:
> On 2016-02-15 12:11:29 -0500, Noah Misch wrote:
> > These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
> > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> > 5765-J08)" for ppc64le.  The problem is generic-xlc.h
> > pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> > __compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
> > own s_lock.h, its references, and other projects' usage:
> 
> Ugh. You're right! It's about not moving code before the stwcx...
> 
> Do you want to add the new test (no objection, curious), or is that more
> for testing?

The patch's test would join PostgreSQL's test suite indefinitely; it's
not just a one-off check.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v5] GSSAPI encryption support

2016-02-15 Thread Robbie Harwood
David Steele  writes:

> Hi Robbie,
>
> On 2/10/16 4:06 PM, Robbie Harwood wrote:
>> Hello friends,
>> 
>> For your consideration, here is a new version of GSSAPI encryption
>> support.  For those who prefer, it's also available on my github:
>> https://github.com/frozencemetery/postgres/commit/c92275b6605d7929cda5551de47a4c60aab7179e
>
> I tried out this patch and ran into a few problems:
>
> 1) It didn't apply cleanly to HEAD.  It did apply cleanly on a455878
> which I figured was recent enough for testing.  I didn't bisect to find
> the exact commit that broke it.

It applied to head of master (57c932475504d63d8f8a68fc6925d7decabc378a)
for me (`patch -p1 < v4-GSSAPI-encryption-support.patch`).  I rebased it
and cut a v5 anyway, just to be sure.  It's attached, and
available on github as well:
https://github.com/frozencemetery/postgres/commit/dc10e3519f0f6c67f79abd157dc8ff1a1c293f53

> 2) While I was able to apply the patch and get it compiled it seemed
> pretty flaky - I was only able to logon about 1 in 10 times on average.
>  Here was my testing methodology:
>
> a) Build Postgres from a455878 (without your patch), install/configure
> Kerberos and get everything working.  I was able the set the auth method
> to gss in pg_hba.conf and logon successfully every time.
>
> b) On the same system rebuild Postgres from a455878 including your patch
> and attempt authentication.
>
> The problems arose after step 2b.  Sometimes I would try to logon twenty
> times without success and sometimes it only take five or six attempts.
> I was never able to logon successfully twice in a row.
>
> When not successful, the client always outputs this incomplete message
> (without a terminating LF):
>
> psql: expected authentication request from server, but received
>
> From the logs I can see the server is reporting EOF from the client,
> though the client does not core dump and prints the above message before
> exiting.
>
> I have attached files that contain server logs at DEBUG5 and tcpdump
> output for both the success and failure cases.
>
> Please let me know if there's any more information you would like me to
> provide.

What I can't tell from looking at your methodology is whether both the
client and server were running my patches or not.  There's no fallback
here (I'd like to talk about how that should work, with examples from
v1-v3, if people have ideas).  This means that both the client and the
server need to be running my patches for the moment.  Is this your
setup?

Thanks for taking it for a spin!
--Robbie

From dc10e3519f0f6c67f79abd157dc8ff1a1c293f53 Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Tue, 17 Nov 2015 18:34:14 -0500
Subject: [PATCH] Connect encryption support for GSSAPI

Existing GSSAPI authentication code is extended to support connection
encryption.  Connection begins as soon as possible - that is,
immediately after the client and server complete authentication.
---
 configure   |   2 +
 configure.in|   1 +
 doc/src/sgml/client-auth.sgml   |   2 +-
 doc/src/sgml/runtime.sgml   |  20 +-
 src/Makefile.global.in  |   1 +
 src/backend/libpq/Makefile  |   4 +
 src/backend/libpq/auth.c| 330 +---
 src/backend/libpq/be-gssapi.c   | 583 
 src/backend/libpq/be-secure.c   |  16 +
 src/backend/postmaster/postmaster.c |  12 +
 src/include/libpq/libpq-be.h|  31 ++
 src/interfaces/libpq/Makefile   |   4 +
 src/interfaces/libpq/fe-auth.c  | 182 ---
 src/interfaces/libpq/fe-auth.h  |   5 +
 src/interfaces/libpq/fe-connect.c   |  10 +
 src/interfaces/libpq/fe-gssapi.c| 474 +
 src/interfaces/libpq/fe-secure.c|  16 +-
 src/interfaces/libpq/libpq-int.h|  16 +-
 18 files changed, 1191 insertions(+), 518 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi.c
 create mode 100644 src/interfaces/libpq/fe-gssapi.c

diff --git a/configure b/configure
index b3f3abe..a5bd629 100755
--- a/configure
+++ b/configure
@@ -713,6 +713,7 @@ with_systemd
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 0bd90d7..4fd8f05 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 3b2935c..7d37223 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -915,7 +915,7 @@ omicron bryanh  guest1
 provides automatic authentication (single sign-on) for 

Re: [HACKERS] [PATCH] Code refactoring related to -fsanitize=use-after-scope

2016-02-15 Thread Andres Freund
Hi,

On 2016-02-15 14:37:28 +0100, Martin Liška wrote:
> I've been currently working on support of -fsanitize=use-after-scope
> in the GCC compiler and I decided to use postgresql as my test-case.
> The sanitizer poisons every stack variable at the very beginning of a
> function, unpoisons a variable at the beginning of scope definition
> and finally poisons the variable again at the end of scope.

Generally sounds like a good check.

> Following patch fixes issues seen by the sanitizer. Hope it's acceptable?
> With the patch applied, ASAN (with the new sanitization) works fine.


> diff --git a/src/backend/access/spgist/spgdoinsert.c 
> b/src/backend/access/spgist/spgdoinsert.c
> index f090ca5..ff986c2 100644
> --- a/src/backend/access/spgist/spgdoinsert.c
> +++ b/src/backend/access/spgist/spgdoinsert.c
> @@ -1871,6 +1871,10 @@ spgdoinsert(Relation index, SpGistState *state,
>   SPPageDesc  current,
>   parent;
>   FmgrInfo   *procinfo = NULL;
> + SpGistInnerTuple innerTuple;
> + spgChooseIn in;
> + spgChooseOut out;
> +
>  
>   /*
>* Look up FmgrInfo of the user-defined choose function once, to save
> @@ -2044,9 +2048,6 @@ spgdoinsert(Relation index, SpGistState *state,
>* Apply the opclass choose function to figure out how 
> to insert
>* the given datum into the current inner tuple.
>*/
> - SpGistInnerTuple innerTuple;
> - spgChooseIn in;
> - spgChooseOut out;

But I'm not immediately seing why this is necessary? Is this about
battling a false positive?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlc atomics

2016-02-15 Thread Andres Freund
On 2016-02-15 12:11:29 -0500, Noah Misch wrote:
> These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
> around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> 5765-J08)" for ppc64le.  The problem is generic-xlc.h
> pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> __compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
> own s_lock.h, its references, and other projects' usage:

Ugh. You're right! It's about not moving code before the stwcx...

Do you want to add the new test (no objection, curious), or is that more
for testing?

Greetings,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-15 Thread Filip Rembiałkowski
Small update. I had to add one thing in /contrib/tcn/.
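
For reference, a quick sketch of how the new forms would be exercised
once the patch below is applied (the channel name is invented):

LISTEN jobs;

BEGIN;
NOTIFY ALL jobs, 'ping';                  -- kept
NOTIFY ALL jobs, 'ping';                  -- kept; duplicate preserved
NOTIFY jobs, 'ping';                      -- DISTINCT is the default, so
                                          -- this duplicate is dropped
SELECT pg_notify('jobs', 'ping', true);   -- third argument true acts
                                          -- like ALL; kept
COMMIT;
-- three "ping" notifications are delivered after COMMIT
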
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..3a6d4f5 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -160,7 +160,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 	strcpy_quoted(payload, SPI_getvalue(trigtuple, tupdesc, colno), '\'');
 }
 
-Async_Notify(channel, payload->data);
+Async_Notify(channel, payload->data, false);
 			}
 			ReleaseSysCache(indexTuple);
 			break;
diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index 4dd5608..933c76c 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-NOTIFY channel [ , payload ]
+NOTIFY [ ALL | DISTINCT ] channel [ , payload ]
 
  
 
@@ -105,6 +105,10 @@ NOTIFY channel [ , payload ]
+   If ALL is specified (contrary to DISTINCT, the
+   default), the server will deliver all notifications, including duplicates.
+   Removal of duplicate notifications takes place within transaction block,
+   finished with COMMIT, END or SAVEPOINT.
   
 
   
@@ -190,6 +194,12 @@ NOTIFY channel [ , NOTIFY command if you need to work with
 non-constant channel names and payloads.

+   
+There is a three-argument version, pg_notify(text,
+text, boolean). The third argument acts like
+the ALL keyword when set to true, and
+DISTINCT when set to false. 
+   
   
  
 
@@ -210,6 +220,21 @@ Asynchronous notification "virtual" with payload "This is the payload" received
 LISTEN foo;
 SELECT pg_notify('fo' || 'o', 'pay' || 'load');
 Asynchronous notification "foo" with payload "payload" received from server process with PID 14728.
+
+/* Identical messages from same (sub-) transaction can be eliminated - unless you use the ALL keyword */
+LISTEN bar;
+BEGIN;
+NOTIFY bar, 'Coffee please';
+NOTIFY bar, 'Coffee please';
+NOTIFY bar, 'Milk please';
+NOTIFY ALL bar, 'Milk please';
+SAVEPOINT s;
+NOTIFY bar, 'Coffee please';
+COMMIT;
+Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517.
 
  
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index c39ac3a..38a8246 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -510,6 +510,7 @@ pg_notify(PG_FUNCTION_ARGS)
 {
 	const char *channel;
 	const char *payload;
+	bool use_all;
 
 	if (PG_ARGISNULL(0))
 		channel = "";
@@ -521,10 +522,15 @@ pg_notify(PG_FUNCTION_ARGS)
 	else
 		payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
 
+	if (PG_NARGS() > 2 && ! PG_ARGISNULL(2))
+		use_all = PG_GETARG_BOOL(2);
+	else
+		use_all = false;
+
 	/* For NOTIFY as a statement, this is checked in ProcessUtility */
 	PreventCommandDuringRecovery("NOTIFY");
 
-	Async_Notify(channel, payload);
+	Async_Notify(channel, payload, use_all);
 
 	PG_RETURN_VOID();
 }
@@ -540,7 +546,7 @@ pg_notify(PG_FUNCTION_ARGS)
  *		^^
  */
 void
-Async_Notify(const char *channel, const char *payload)
+Async_Notify(const char *channel, const char *payload, bool use_all)
 {
 	Notification *n;
 	MemoryContext oldcontext;
@@ -570,9 +576,10 @@ Async_Notify(const char *channel, const char *payload)
 	 errmsg("payload string too long")));
 	}
 
-	/* no point in making duplicate entries in the list ... */
-	if (AsyncExistsPendingNotify(channel, payload))
-		return;
+	if (!use_all)
+		/* remove duplicate entries in the list */
+		if (AsyncExistsPendingNotify(channel, payload))
+			return;
 
 	/*
 	 * The notification list needs to live until end of transaction, so store
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..7203f4a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -8528,11 +8528,12 @@ DropRuleStmt:
  *
  */
 
-NotifyStmt: NOTIFY ColId notify_payload
+NotifyStmt: NOTIFY all_or_distinct ColId notify_payload
 {
 	NotifyStmt *n = makeNode(NotifyStmt);
-	n->conditionname = $2;
-	n->payload = $3;
+	n->use_all = $2;
+	n->conditionname = $3;
+	n->payload = $4;
 	$$ = (Node *)n;
 }
 		;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..0e50561 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -599,7 +599,7 @@ standard_ProcessUtility(Node *parsetree,
 NotifyStmt *stmt = (NotifyStmt *) parsetree;
 
 PreventCommandDuringRecovery("NOTIFY");
-Async_Notify(stmt->conditionname, stmt->payload);
+Async_Notify(stmt->conditionname, stmt->payload, stmt->use_all);
 			}
 			break;
 
diff --git 

Re: [HACKERS] New pg_upgrade data directory inside old one?

2016-02-15 Thread Magnus Hagander
On Mon, Feb 15, 2016 at 6:29 PM, Bruce Momjian  wrote:

> Someone on IRC reported that if they had run the pg_upgrade-created
> delete_old_cluster.sh shell script it would have deleted their old _and_
> new data directories.  (Fortunately they didn't run it.)
>
> I was confused how this could have happened, and the user explained that
> their old cluster was in /u/pgsql/data, and that they wanted to switch to
> a per-major-version directory naming schema, so they put the new data
> directory in /u/pgsql/data/9.5.  (They could have just moved the
> directory while the server was down, but didn't.)
>
> Unfortunately, there is no check for having the new cluster data
> directory inside the old data directory, only a check for tablespace
> directories in the old cluster.  (I never anticipated someone would do
> this.)
>

Interesting - I definitely wouldn't have expected that either. And it
definitely seems like a foot-gun we should protect the users against.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] New pg_upgrade data directory inside old one?

2016-02-15 Thread Bruce Momjian
Someone on IRC reported that if they had run the pg_upgrade-created
delete_old_cluster.sh shell script it would have deleted their old _and_
new data directories.  (Fortunately they didn't run it.)

I was confused how this could have happened, and the user explained that
their old cluster was in /u/pgsql/data, and that they wanted to switch to
a per-major-version directory naming schema, so they put the new data
directory in /u/pgsql/data/9.5.  (They could have just moved the
directory while the server was down, but didn't.)

Unfortunately, there is no check for having the new cluster data
directory inside the old data directory, only a check for tablespace
directories in the old cluster.  (I never anticipated someone would do
this.)

The attached patch adds the proper check.  This should be backpatched to
all supported Postgres versions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
new file mode 100644
index 8c034bc..e92fc66
*** a/src/bin/pg_upgrade/check.c
--- b/src/bin/pg_upgrade/check.c
*** output_completion_banner(char *analyze_s
*** 195,203 
  			   deletion_script_file_name);
  	else
  		pg_log(PG_REPORT,
! 			   "Could not create a script to delete the old cluster's data\n"
! 		  "files because user-defined tablespaces exist in the old cluster\n"
! 		"directory.  The old cluster's contents must be deleted manually.\n");
  }
  
  
--- 195,204 
  			   deletion_script_file_name);
  	else
  		pg_log(PG_REPORT,
! 		  "Could not create a script to delete the old cluster's data files\n"
! 		  "because the new cluster's data directory or user-defined tablespaces\n"
! 		  "exist in the old cluster directory.  The old cluster's contents must\n"
! 		  "be deleted manually.\n");
  }
  
  
*** create_script_for_old_cluster_deletion(c
*** 496,513 
  {
  	FILE	   *script = NULL;
  	int			tblnum;
! 	char		old_cluster_pgdata[MAXPGPATH];
  
  	*deletion_script_file_name = psprintf("%sdelete_old_cluster.%s",
  		  SCRIPT_PREFIX, SCRIPT_EXT);
  
  	/*
  	 * Some users (oddly) create tablespaces inside the cluster data
  	 * directory.  We can't create a proper old cluster delete script in that
  	 * case.
  	 */
- 	strlcpy(old_cluster_pgdata, old_cluster.pgdata, MAXPGPATH);
- 	canonicalize_path(old_cluster_pgdata);
  	for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
  	{
  		char		old_tablespace_dir[MAXPGPATH];
--- 497,531 
  {
  	FILE	   *script = NULL;
  	int			tblnum;
! 	char		old_cluster_pgdata[MAXPGPATH], new_cluster_pgdata[MAXPGPATH];
  
  	*deletion_script_file_name = psprintf("%sdelete_old_cluster.%s",
  		  SCRIPT_PREFIX, SCRIPT_EXT);
  
+ 	strlcpy(old_cluster_pgdata, old_cluster.pgdata, MAXPGPATH);
+ 	canonicalize_path(old_cluster_pgdata);
+ 
+ 	strlcpy(new_cluster_pgdata, new_cluster.pgdata, MAXPGPATH);
+ 	canonicalize_path(new_cluster_pgdata);
+ 
+ 	/* Some people put the new data directory inside the old one. */
+ 	if (path_is_prefix_of_path(old_cluster_pgdata, new_cluster_pgdata))
+ 	{
+ 		pg_log(PG_WARNING,
+ 		   "\nWARNING:  new data directory should not be inside the old data directory, e.g. %s\n", old_cluster_pgdata);
+ 
+ 		/* Unlink file in case it is left over from a previous run. */
+ 		unlink(*deletion_script_file_name);
+ 		pg_free(*deletion_script_file_name);
+ 		*deletion_script_file_name = NULL;
+ 		return;
+ 	}
+ 
  	/*
  	 * Some users (oddly) create tablespaces inside the cluster data
  	 * directory.  We can't create a proper old cluster delete script in that
  	 * case.
  	 */
  	for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
  	{
  		char		old_tablespace_dir[MAXPGPATH];

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Code refactoring related to -fsanitize=use-after-scope

2016-02-15 Thread Martin Liška
Hello.

I've been working on support for -fsanitize=use-after-scope in the GCC
compiler, and I decided to use postgresql as my test case. The sanitizer
poisons every stack variable at the very beginning of a function, unpoisons a
variable at the beginning of its scope, and finally poisons the variable again
at the end of the scope.
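
To illustrate what the sanitizer catches, here is a minimal hypothetical
example (not taken from the patch):

#include <stdio.h>

int
main(void)
{
	int		   *p;

	{
		int			x = 42;	/* x is unpoisoned at the start of its scope... */

		p = &x;
	}						/* ...and poisoned again here, at its end */

	printf("%d\n", *p);		/* use-after-scope: ASAN reports an error here */
	return 0;
}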

The following patch fixes the issues seen by the sanitizer. I hope it's
acceptable. With the patch applied, ASAN (with the new sanitization) works fine.

Thanks,
Martin
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index f090ca5..ff986c2 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -1871,6 +1871,10 @@ spgdoinsert(Relation index, SpGistState *state,
 	SPPageDesc	current,
 parent;
 	FmgrInfo   *procinfo = NULL;
+	SpGistInnerTuple innerTuple;
+	spgChooseIn in;
+	spgChooseOut out;
+
 
 	/*
 	 * Look up FmgrInfo of the user-defined choose function once, to save
@@ -2044,9 +2048,6 @@ spgdoinsert(Relation index, SpGistState *state,
 			 * Apply the opclass choose function to figure out how to insert
 			 * the given datum into the current inner tuple.
 			 */
-			SpGistInnerTuple innerTuple;
-			spgChooseIn in;
-			spgChooseOut out;
 
 			/*
 			 * spgAddNode and spgSplitTuple cases will loop back to here to

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Failover Slots

2016-02-15 Thread Petr Jelinek
Hi,

here is my code level review:

0001:
This one looks ok except for broken indentation in the new notes in
xlogreader.c and .h. It's maybe slightly overdocumented but given the
complicated way the timeline reading works it's probably warranted.

0002:
+	/*
+	 * No way to wait for the process since it's not a child
+	 * of ours and there's no latch to set, so poll.
+	 *
+	 * We're checking this without any locks held, but
+	 * we'll recheck when we attempt to drop the slot.
+	 */
+	while (slot->in_use && slot->active_pid == active_pid
+		   && max_sleep_micros > 0)
+	{
+		usleep(micros_per_sleep);
+		max_sleep_micros -= micros_per_sleep;
+	}

Not sure I buy this: what about postmaster crashes, fast shutdown
requests, etc.? Also, you usleep for 10s, which is quite long. I'd
prefer the classic wait-for-latch with a timeout and a PG crash check
here. And even if we go with usleep, it should be 1s, not 10s, and
pg_usleep instead of usleep.
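
A rough sketch of the pattern I'd prefer (hedged: the timeout, flags and
exit behavior are illustrative, not a concrete proposal):

for (;;)
{
	int			rc;

	if (!slot->in_use || slot->active_pid != active_pid)
		break;

	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
				   100L);	/* ms */
	if (rc & WL_POSTMASTER_DEATH)
		proc_exit(1);
	ResetLatch(MyLatch);
}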

0003:
There is a lot of documentation improvements here that are not related
to failover slots or timeline following, it might be good idea to
split those into separate patch as they are separately useful IMHO.

Other than that it looks good to me.

About other things discussed in this thread. Yes it makes sense in
certain situations to handle this outside of WAL and that does require
notions of nodes, etc. That being said, the timeline following is
needed even if this is handled outside of WAL. And once timeline
following is in, the slots can be handled by the replication solution
itself which is good. But I think the failover slots are still a good
thing to have - it provides HA solution for anything that uses slots,
and that includes physical replication as well. If the specific
logical replication solution wants to handle it for some reason itself
outside of WAL, it can create non-failover slot so in my opinion we
ideally need both types of slots (and that's exactly what this patch
gives us).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?

2016-02-15 Thread Andres Freund
Hi,

Several places in our docs have blurbs like
> Note that on many systems, the effective resolution of sleep delays is
> 10 milliseconds; setting wal_writer_delay to a value that
> is not a multiple of 10 might have the same results as setting it to
> the next higher multiple of 10.
Afaik that's not the case on any recent operating system/hardware. So
perhaps we should just remove all of those blurbs, or just replace them
with something like "on some older systems the effective resolution of
sleep delays is limited to multiples of 10 milliseconds"?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Existence check for suitable index in advance when concurrently refreshing.

2016-02-15 Thread Fujii Masao
On Wed, Feb 10, 2016 at 10:35 AM, Michael Paquier
 wrote:
> On Wed, Feb 10, 2016 at 2:23 AM, Fujii Masao  wrote:
>> On Wed, Feb 10, 2016 at 2:21 AM, Fujii Masao  wrote:
>>> On Tue, Feb 9, 2016 at 9:11 PM, Michael Paquier
>>>  wrote:
 On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao  wrote:
> Thanks for updating the patch!
> Attached is the updated version of the patch.
> I removed unnecessary assertion check and change of source code
> that you added, and improved the source comment.
> Barring objection, I'll commit this patch.

 So, this code basically duplicates what is already in
 refresh_by_match_merge to check if there is a UNIQUE index defined. If
 we are sure that an error is detected earlier in the code as done in
 this patch, wouldn't it be better to replace the error message in
 refresh_by_match_merge() by an assertion?
>>>
>>> I'm OK with an assertion if we add source comment about why
>>> refresh_by_match_merge() can always guarantee that there is
>>> a unique index on the matview. Probably it's because the matview
>>> is locked with exclusive lock at the start of ExecRefreshMatView(),
>>> i.e., it's guaranteed that we cannot drop any indexes on the matview
>>> after the first check is passed. Also it might be better to add
>>> another comment about that the caller of refresh_by_match_merge()
>>> must always check that there is a unique index on the matview before
>>> calling that function, just in the case where it's called elsewhere
>>> in the future.
>>>
>>> OTOH, I don't think it's not so bad idea to just emit an error, instead.
>>
>> Sorry, s/it's not/it's
>
> Well, refresh_by_match_merge is called only by ExecRefreshMatView()
> and it is not exposed externally in matview.h, so it is not like we
> are going to break any extension-related code by having an assertion
> instead of an error message, and that's less code duplication to
> maintain if this extra error message is removed. But feel free to
> withdraw my comment if you think that's not worth it, I won't deadly
> insist on that either :)

Okay, I revived Sawada's original assertion code and pushed the patch.
Thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlc atomics

2016-02-15 Thread Noah Misch
On Sat, Jul 04, 2015 at 08:07:49PM -0400, Noah Misch wrote:
> On Sun, Jul 05, 2015 at 12:54:43AM +0200, Andres Freund wrote:
> > On 2015-07-04 18:40:41 -0400, Noah Misch wrote:
> > > (1) "IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)".  Getting it 
> > > working
> > > required the attached patch.
> 
> > Will you apply? Having the ability to test change seems to put you in a
> > much better spot then me.
> 
> I will.

That commit (0d32d2e) permitted things to compile and usually pass tests, but
I missed the synchronization bug.  Since 2015-10-01, the buildfarm has seen
sixteen duplicate-catalog-OID failures.  The compiler has always been xlC, and
the branch has been HEAD or 9.5.  Examples:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-15%2004%3A12%3A49
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-08%2001%3A20%3A16
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2015-12-11%2005%3A25%3A54

These suggested that OidGenLock wasn't doing its job.  I've seen similar symptoms
around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
5765-J08)" for ppc64le.  The problem is generic-xlc.h
pg_atomic_compare_exchange_u32_impl() issuing __isync() before
__compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
own s_lock.h, its references, and other projects' usage:

https://github.com/boostorg/smart_ptr/blob/boost-1.60.0/include/boost/smart_ptr/detail/sp_counted_base_vacpp_ppc.hpp#L55
http://redmine.gromacs.org/projects/gromacs/repository/entry/src/external/thread_mpi/include/thread_mpi/atomic/xlc_ppc.h?rev=v5.1.2#L112
https://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/asm_example.html
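
Sketched in code, the corrected ordering is roughly this (illustrative
only; the attached patch is authoritative):

	ret = __compare_and_swap((volatile int *) &ptr->value,
							 (int *) expected, (int) newval);
	__isync();			/* the barrier must follow the CAS, not precede it */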

This patch's test case would have failed about half the time under today's
generic-xlc.h.  Fast machines run it in about 1s.  A test that detects the bug
99% of the time would run far longer, hence this compromise.


Archaeology enthusiasts may enjoy the bug's changing presentation over time.
LWLockAcquire() exposed it after commit ab5194e redesigned lwlock.c in terms
of atomics.  Assert builds were unaffected for commits in [ab5194e, bbfd7ed);
atomics.h asserts fattened code enough to mask the bug.  However, removing the
AssertPointerAlignment() from pg_atomic_fetch_or_u32() would have sufficed to
surface it in assert builds.  All builds demonstrated the bug once bbfd7ed
introduced xlC pg_attribute_noreturn(), which elicits a simpler instruction
sequence around the ExceptionalCondition() call embedded in each Assert.

nm
diff --git a/src/bin/pgbench/.gitignore b/src/bin/pgbench/.gitignore
index aae819e..983a3cd 100644
--- a/src/bin/pgbench/.gitignore
+++ b/src/bin/pgbench/.gitignore
@@ -1,3 +1,4 @@
 /exprparse.c
 /exprscan.c
 /pgbench
+/tmp_check/
diff --git a/src/bin/pgbench/Makefile b/src/bin/pgbench/Makefile
index 18fdf58..e9b1b74 100644
--- a/src/bin/pgbench/Makefile
+++ b/src/bin/pgbench/Makefile
@@ -40,3 +40,9 @@ clean distclean:
 
 maintainer-clean: distclean
rm -f exprparse.c exprscan.c
+
+check:
+   $(prove_check)
+
+installcheck:
+   $(prove_installcheck)
diff --git a/src/bin/pgbench/t/001_pgbench.pl b/src/bin/pgbench/t/001_pgbench.pl
new file mode 100644
index 000..88e83ab
--- /dev/null
+++ b/src/bin/pgbench/t/001_pgbench.pl
@@ -0,0 +1,25 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Test concurrent insertion into table with UNIQUE oid column.  DDL expects
+# GetNewOidWithIndex() to successfully avoid violating uniqueness for indexes
+# like pg_class_oid_index and pg_proc_oid_index.  This indirectly exercises
+# LWLock and spinlock concurrency.  This test makes a 5-MiB table.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->psql('postgres',
+   'CREATE UNLOGGED TABLE oid_tbl () WITH OIDS; '
+ . 'ALTER TABLE oid_tbl ADD UNIQUE (oid);');
+my $script = $node->basedir . '/pgbench_script';
+append_to_file($script,
+   'INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);');
+$node->command_like(
+   [   qw(pgbench --no-vacuum --client=5 --protocol=prepared
+ --transactions=25 --file), $script ],
+   qr{processed: 125/125},
+   'concurrent OID generation');
diff --git a/src/include/port/atomics/generic-xlc.h b/src/include/port/atomics/generic-xlc.h
index 1e8a11e..f24e3af 100644
--- a/src/include/port/atomics/generic-xlc.h
+++ b/src/include/port/atomics/generic-xlc.h
@@ -41,18 +41,22 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 									uint32 *expected, uint32 newval)
 {
 	/*
+	 * XXX: __compare_and_swap is defined to take signed parameters, but that
+	 * shouldn't matter since we don't perform any arithmetic operations.
+	 */
+	bool		ret = __compare_and_swap((volatile int *) &ptr->value,
+

Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE

2016-02-15 Thread Tom Lane
Teodor Sigaev  writes:
>> So basically, a generic CASCADE facility sounds like a lot of work to
>> produce something that would seldom be anything but a foot-gun.

> DELETE FROM  or TRUNCATE could be a foot-gun too, but it's not a reason to
> remove them. I faced a problem when I tried to change the owner of a
> database with all objects inside. I think this feature could be useful,
> although it should be restricted to superusers only.

That's a pretty weak argument, and I do not think you have thought through
all the consequences.  It is not hard at all to imagine cases where using
this sort of thing could be a security vulnerability.  Are you familiar
with the reasons why Unix systems don't typically allow users to "give
away" ownership of files?  The same problem exists here.

To be concrete about it:

1. Alice does, say, "CREATE EXTENSION cube".

2. Bob creates a security-definer function owned by himself, using a
   "cube"-type parameter so that it's dependent on the extension.
   (It needn't actually do anything with that parameter; a hypothetical
   sketch follows this list.)

3. Alice does ALTER EXTENSION cube OWNER TO charlie CASCADE.

4. Bob now has a security-definer function owned by (and therefore
   executing as) Charlie, whose contents were determined by Bob.
   Game over for Charlie ... and for everyone else too, if Charlie is
   a superuser, which is not unlikely for an extension owner.
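
A hypothetical sketch of step 2 (the name and body are illustrative):

CREATE FUNCTION bob_fn(c cube) RETURNS void
LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
	-- arbitrary code of Bob's choosing; after step 3 it runs as Charlie
	RAISE NOTICE 'running as %', current_user;
END $$;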

The only way Alice can be sure that the ALTER EXTENSION is safe is if
she manually inspects every dependent object, in which case she might
as well not use CASCADE.

Moreover, the use case you've sketched (ie, change ownership of all
objects inside a database) doesn't actually have anything to do with
following dependencies.  It's a lot closer to REASSIGN OWNED ... in
fact, it's not clear to me why REASSIGN OWNED doesn't solve that
use-case already.

I remain of the opinion that this is a terrible idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small PATCH: check of 2 Perl modules

2016-02-15 Thread Peter Eisentraut
On 2/15/16 8:57 AM, Teodor Sigaev wrote:
>> On 2/12/16 8:20 AM, Eugene Kazakov wrote:
>>> TAP-tests need two Perl modules: Test::More and IPC::Run.
>> I think those modules are part of a standard Perl installation.
> 
> IPC::Run is not. At least for perl from ports tree in FreeBSD.

Right, that's why we added the configure option.  But Test::More is
standard.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE

2016-02-15 Thread Teodor Sigaev

TBH, this sounds like a completely terrible idea.  There are far too many
sorts of dependencies across which one would not expect ownership to
propagate; for example, use of a function in a view, or even just a
foreign key dependency between two tables.

I'm not even clear that there are *any* cases where this behavior is
wanted, other than perhaps ALTER OWNER on an extension --- and even there,
what you would want is altering the ownership of the member objects, but
not everything that depends on the member objects.

So basically, a generic CASCADE facility sounds like a lot of work to
produce something that would seldom be anything but a foot-gun.


DELETE FROM  or TRUNCATE could be a foot-gun too, but it's not a reason to
remove them. I faced a problem when I tried to change the owner of a database
with all objects inside. I think this feature could be useful, although it
should be restricted to superusers only.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE

2016-02-15 Thread Dmitry Ivanov
> Dmitry Ivanov  writes:
> > As of now there's no way to transfer the ownership of an object and all
> > its
> > dependent objects in one step. One has to manually alter the owner of each
> > object, be it a table, a schema or something else.
> 
> TBH, this sounds like a completely terrible idea.  There are far too many
> sorts of dependencies across which one would not expect ownership to
> propagate; for example, use of a function in a view, or even just a
> foreign key dependency between two tables.

Well, actually this is a statement of the fact, and I don't propose enabling 
this option for every dependency possible. This patch includes an experimental 
feature that anyone can try and discuss, nothing more. Besides, it acts a lot 
like 'drop ... cascade' (the findDependentObjects() function is used under the 
hood), so the behavior is totally predictable. It also writes down all objects 
that have been touched, so no change goes unnoticed.

> I'm not even clear that there are *any* cases where this behavior is
> wanted, other than perhaps ALTER OWNER on an extension

At very least this might be useful in order to change owner of all tables 
which inherit some table.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GetExistingLocalJoinPath() vs. the docs

2016-02-15 Thread Stephen Frost
Greetings,

While getting back into the thread Re: Optimization for updating foreign
tables in Postgres FDW, I noticed some issues with the docs and
GetExistingLocalJoinPath():

GetExistingLocalJoinPath() exists in src/backend/foreign/foreign.c, but
the docs include discussion of it under 54.2 - Foreign Data Wrapper
Callback Routines.  Shouldn't this be under 54.3. Foreign Data Wrapper
Helper Functions?  Also, the prototype is incorrect in the docs (it
doesn't return a void) and the paragraph about what it's for could do
with some wordsmithing.

A link from 54.2 to 54.3 which mentions it would be fine, of course.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-02-15 Thread Teodor Sigaev
It's a pity, but the author is not able to continue work on this patch, so I
would like to raise this flag.


I'd like to add some comments about patches:

The traversalValue patch adds an arbitrary value associated with a branch in an
SP-GiST tree walk. Unlike reconstructedValue, it can be just a pointer; it
doesn't have to be a regular pgsql type. Also, it can be used independently of
reconstructedValue. This patch is used by both of the following attached patches.


The range patch just switches the range opclasses from reconstructedValue to
traversalValue. reconstructedValue was used only because it was an acceptable
workaround in the case of the range type. The range opclass stores the full
value in leaves and doesn't need reconstructedValue to return tuples in an
index-only scan.

See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us

The q4d patch implements an index over boxes using SP-GiST. The basic idea was
an observation: the range opclass treats one-dimensional ranges as 2D points.
Following this idea, we can treat a 2D box (which is 2 points, or 4 numbers) as
a 4D point. We hoped that this approach would be much more effective than a
traditional R-tree in the case of many overlaps in a box collection.
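
To make the mapping concrete, a hedged illustration (the type and function
names are ours, not the patch's):

/* A 2D box (two corner points, i.e. four numbers) viewed as one 4D point. */
typedef struct
{
	double		coords[4];
} Point4D;

static Point4D
box_to_point4d(double low_x, double low_y, double high_x, double high_y)
{
	Point4D		p = {{low_x, low_y, high_x, high_y}};

	return p;
}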

Performance results are coming shortly.

In the near future (say, tomorrow) I hope to suggest an opclass over polygons
based on this approach.


Alexander Lebedev wrote:

Hello, Hacker.

* [PATCH] add a box index to sp-gist

   We have extended sp-gist with an index that keeps track of boxes

   We have used ideas underlying sp-gist range implementation to
   represent 2D boxes as points in 4D space. We use quad tree
   analogue, but in 4-dimensional space. We call this tree q4d. Each
   node of this tree is a box (a point in 4D space) which splits space
   in 16 hyperrectangles.

   Rationale: r-tree assumes that boxes we're trying to index don't
   overlap much. When this assumption fails, r-tree performs badly,
   while our proposal to represent a rectangle as a point in 4D space
   solves this problem.

   NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

   During implementation of box index for sp-gist we saw that we only
   keep rectangles, but to determine traversal direction we may need
   to know boundaries of a hyperrectangle. So we calculate them
   on the fly and store them in traversalValue, available
   when traversing child nodes, because otherwise we wouldn't be able to
   calculate them from inside the inner_consistent function call.

   This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

   During traversal, local context is
   insufficient to pick traversal direction. As of now, this is worked
   around with the help of reconstructValue. reconstructValue only
   stores data of the same type as a tree node, that is, a range.

   We have written a patch that calculates auxiliary values and makes
   them accessible during traversal.

   We then use traversalValue in a new box index and change
   range_spgist.c to use it in place of reconstructValue.

   NB: apply this patch after traversalValue patch.






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


traversalValue-1.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data


q4d-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small PATCH: check of 2 Perl modules

2016-02-15 Thread Tom Lane
Teodor Sigaev  writes:
>> On 2/12/16 8:20 AM, Eugene Kazakov wrote:
>>> TAP-tests need two Perl modules: Test::More and IPC::Run.

>> I think those modules are part of a standard Perl installation.

> IPC::Run is not. At least for perl from ports tree in FreeBSD.

Yeah, I remember having had to install IPC::Run from CPAN, and a couple
of other things too (but I don't remember Test::More specifically), when
setting up some of my buildfarm critters.  It's likely that a lot of
distros bundle these, but I don't think IPC::Run is in a basic
built-from-source Perl.

The real question though is do we need a configure check at all.

Given that investigation into a CMake conversion is actively going
on, I'm hesitant to move the goalposts for it by introducing a brand
new type of configure check.  Maybe we should punt this issue until
that patch is either accepted or decisively rejected.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE

2016-02-15 Thread Tom Lane
Dmitry Ivanov  writes:
> As of now there's no way to transfer the ownership of an object and all its 
> dependent objects in one step. One has to manually alter the owner of each 
> object, be it a table, a schema or something else.

TBH, this sounds like a completely terrible idea.  There are far too many
sorts of dependencies across which one would not expect ownership to
propagate; for example, use of a function in a view, or even just a
foreign key dependency between two tables.

I'm not even clear that there are *any* cases where this behavior is
wanted, other than perhaps ALTER OWNER on an extension --- and even there,
what you would want is altering the ownership of the member objects, but
not everything that depends on the member objects.

So basically, a generic CASCADE facility sounds like a lot of work to
produce something that would seldom be anything but a foot-gun.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small PATCH: check of 2 Perl modules

2016-02-15 Thread Michael Paquier
On Mon, Feb 15, 2016 at 10:57 PM, Teodor Sigaev  wrote:
>> On 2/12/16 8:20 AM, Eugene Kazakov wrote:
>>>
>>> TAP-tests need two Perl modules: Test::More and IPC::Run.
>>
>> I think those modules are part of a standard Perl installation.
>
> IPC::Run is not. At least for perl from ports tree in FreeBSD.

On OSX and on Windows as well they are now shipped AFAIK.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-15 Thread Daniel Verite
Dean Rasheed wrote:

> My biggest problem is with the sorting, for all the reasons discussed
> above. There is absolutely no reason for \crosstabview to be
> re-sorting rows -- they should just be left in the original query
> result order

We want the option to sort the vertical header in a later, additional
step, when the ORDER BY of the query is already assigned to another
purpose.

I've submitted this example on the wiki:
https://wiki.postgresql.org/wiki/Crosstabview

create view v_data as 
select * from ( values
   ('v1','h2','foo', '2015-04-01'::date),
   ('v2','h1','bar', '2015-01-02'),
   ('v1','h0','baz', '2015-07-12'),
   ('v0','h4','qux', '2015-07-15')
 ) as l(v,h,c,d);


Normal top-down display:

select v,to_char(d,'Mon') as m, c  from v_data order by d;

 v  |  m  |  c  
+-+-
 v2 | Jan | bar
 v1 | Apr | foo
 v1 | Jul | baz
 v0 | Jul | qux

Crosstabview display without any additional sort:

 \crosstabview v m c

 v  | Jan | Apr | Jul 
+-+-+-
 v2 | bar | | 
 v1 | | foo | baz
 v0 | | | qux

"d" is not present the resultset but it drives the sort
so that month names come out in the natural order.

\crosstabview does not discard the order of colH nor the order of colV,
it follows both, so that we get v2,v1,v0 in this order in the leftmost
column (vertical header) just like in the resultset.

At this point, it seems to me that it's perfectly reasonable for our user
to expect the possibility of sorting additionally by "v" , without
changing the query and without changing the order of the horizontal
header:

 \crosstabview +v m c

 v  | Jan | Apr | Jul 
+-+-+-
 v0 | | | qux
 v1 | | foo | baz
 v2 | bar | | 



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small PATCH: check of 2 Perl modules

2016-02-15 Thread Teodor Sigaev

On 2/12/16 8:20 AM, Eugene Kazakov wrote:

TAP-tests need two Perl modules: Test::More and IPC::Run.

I think those modules are part of a standard Perl installation.


IPC::Run is not. At least for perl from ports tree in FreeBSD.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of receivelog.c

2016-02-15 Thread Magnus Hagander
On Mon, Feb 15, 2016 at 7:15 AM, Craig Ringer  wrote:

> On 15 February 2016 at 04:48, Magnus Hagander  wrote:
>
>> I was working on adding the tar streaming functionality we talked about
>> at the developer meeting to pg_basebackup, and rapidly ran across the issue
>> that Andres has been complaining about for a while. The code in
>> receivelog.c just passes an insane number of parameters around. Adding or
>> changing even a small thing ends up touching a huge number of places.
>>
>
>
> Other than the lack of comments on the fields in StreamCtl to indicate
> their functions I think this looks good.
>

I copied that lack of comments from the previous implementation :P

But yeah, I agree, it's probably a good idea to add them.



> I didn't find any mistakes, but I do admit my eyes started glazing over
> after a bit.
>
> I'd rather not have StreamCtl as a typedef of an anonymous struct if it's
> exposed in a header though. It should have a name that can be used in
> forward declarations etc.
>


It's not exactly uncommon with anonymous ones either in our code, but I see
no problem adding that.

Thanks!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-15 Thread Daniel Verite
Alvaro Herrera wrote:

> So please can we have that wiki page so that the syntax can be hammered
> out a bit more.

I've added a wiki page with explanation and examples here:

https://wiki.postgresql.org/wiki/Crosstabview

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small PATCH: check of 2 Perl modules

2016-02-15 Thread Peter Eisentraut
On 2/12/16 8:20 AM, Eugene Kazakov wrote:
> TAP-tests need two Perl modules: Test::More and IPC::Run.
> 
> The patch adds checking of modules in configure.in and configure.

I think those modules are part of a standard Perl installation.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-02-15 Thread Vitaly Burovoy
On 2/15/16, Vitaly Burovoy  wrote:
> P.S.: "bytes" size unit was added just for consistency: each group
> should have a name, even with an exponent of 1.

Oops... Of course, "even with an exponent of 0".
-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-02-15 Thread Vitaly Burovoy
On 2/15/16, Dean Rasheed  wrote:
>> On 12/02/16 10:19, Dean Rasheed wrote:
>>> This seems like a reasonable first patch for me as a committer, so
>>> I'll take it unless anyone else was planning to do so.
>>
>
> So looking at this, it seems that for the most part pg_size_bytes()
> will parse any output produced by pg_size_pretty(). The exception is
> that there are 2 versions of pg_size_pretty(), one that takes bigint
> and one that takes numeric, whereas pg_size_bytes() returns bigint, so
> it can't handle all inputs. Is there any reason not to make
> pg_size_bytes() return numeric?

I guess the only reason is for comparison purposes. The example from the
first letter[1]:

SELECT * FROM pg_class
  WHERE pg_table_size(oid) > pg_human_size('2GB');

... but not for something like:
WITH sel AS (
  SELECT pg_size_pretty(pg_table_size(oid)) AS _size, * FROM pg_class WHERE ...
)
SELECT pg_size_bytes(_size), * FROM sel;


So the use case for "pg_size_bytes" is getting a value to compare
with a result from pg_table_size, ..., or even pg_database_size,
_each_ of which returns bigint (or just int), but not numeric:

postgres=# \df pg_*_size
                                 List of functions
   Schema   |          Name          | Result data type | Argument data types |  Type
------------+------------------------+------------------+---------------------+--------
 pg_catalog | pg_column_size         | integer          | "any"               | normal
 pg_catalog | pg_database_size       | bigint           | name                | normal
 pg_catalog | pg_database_size       | bigint           | oid                 | normal
 pg_catalog | pg_indexes_size        | bigint           | regclass            | normal
 pg_catalog | pg_relation_size       | bigint           | regclass            | normal
 pg_catalog | pg_relation_size       | bigint           | regclass, text      | normal
 pg_catalog | pg_table_size          | bigint           | regclass            | normal
 pg_catalog | pg_tablespace_size     | bigint           | name                | normal
 pg_catalog | pg_tablespace_size     | bigint           | oid                 | normal
 pg_catalog | pg_total_relation_size | bigint           | regclass            | normal
(10 rows)

The commit message for "pg_size_pretty(numeric)" explains[2] that it was
introduced for use with "pg_xlog_location_diff"[3] only.
Since "pg_size_bytes" supports up to 4 EiB - 1 B[4], I hope nobody will
want to send a query like "tell me whether the difference between two
transaction log locations is bigger than or equal to 4 EiB"...

But comparison between int8 and numeric is OK, so numeric input will
not break queries:

postgres=# select '10e200'::numeric > 10::int8 as check;
 check
---
 t
(1 row)


P.S.: "bytes" size unit was added just for consistency: each group
should have a name, even with an exponent of 1.

> It would still be compatible with the example use cases, but it would
> be a better inverse of both variants of pg_size_pretty() and would be
> more future-proof. It already works internally using numeric, so it's
> a trivial change to make now, but impossible to change in the future
> without introducing a new function with a different name, which is
> messy.
>
> Thoughts?
>
> Regards,
> Dean

[1]http://www.postgresql.org/message-id/cafj8prd-tgodknxdygecza4on01_urqprwf-8ldkse-6bdh...@mail.gmail.com
[2]http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4a2d7ad76f5f275ef2d6a57e1a61d5bf756349e8
[3]http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE
[4]https://en.wikipedia.org/wiki/Binary_prefix#Adoption_by_IEC.2C_NIST_and_ISO

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small PATCH: check of 2 Perl modules

2016-02-15 Thread Eugene Kazakov

13.02.2016 16:04, Michael Paquier :

On Sat, Feb 13, 2016 at 1:47 PM, Robert Haas  wrote:

On Fri, Feb 12, 2016 at 8:20 AM, Eugene Kazakov
 wrote:

TAP-tests need two Perl modules: Test::More and IPC::Run.

The patch adds checking of modules in configure.in and configure.

Why would we want that?

I was doubtful at the beginning, but it doesn't hurt to have those
sanity checks in configure actually. The idea is that when
--enable-tap-tests is used now we simply error with "Can't locate
IPC/Run.pm in @INC" when kicking the tests, this check would allow one
to know if his environment is adapted to run the tests or not before
compiling anything.

And with this patch, we would fail now with that:
configure: error: Need Perl IPC::Run module

By the way, the patch given upthread by Eugene is incorrect. To begin
with, AX_PROG_PERL_MODULES has not been compiled by autoconf and I
can't believe that it is available on all platforms, for example on
OSX 10.8 I could not see it. And it is actually here:
https://www.gnu.org/software/autoconf-archive/ax_prog_perl_modules.html

I would recommend grabbing a copy of this file, and change the error
message as follows:
Perl module IPC::Run is required to run TAP tests

See the patch attached as reference, we could simplify the macro of
this m4 file and remove the check for perl, which is here:
+# Make sure we have perl
+if test -z "$PERL"; then
+AC_CHECK_PROG(PERL,perl,perl)
+fi
Though I kept the original script as-is in the patch attached.
Regards,

Michael,

Thank you. You are right, of course. I missed the
m4_ax_prog_perl_modules. Please see the fixed version of the patch
attached. I added m4_ax_prog_perl_modules and changed the error messages.


The best regards,
Eugene Kazakov,
Postgres Professional
diff --git a/aclocal.m4 b/aclocal.m4
index 6f930b6..e6b2aaf 100644
--- a/aclocal.m4
+++ b/aclocal.m4
@@ -10,3 +10,4 @@ m4_include([config/perl.m4])
 m4_include([config/programs.m4])
 m4_include([config/python.m4])
 m4_include([config/tcl.m4])
+m4_include([config/m4_ax_prog_perl_modules.m4])
diff --git a/config/m4_ax_prog_perl_modules.m4 
b/config/m4_ax_prog_perl_modules.m4
new file mode 100644
index 000..ed5dd30
--- /dev/null
+++ b/config/m4_ax_prog_perl_modules.m4
@@ -0,0 +1,72 @@
+# ===
+#   http://www.gnu.org/software/autoconf-archive/ax_prog_perl_modules.html
+# ===
+#
+# SYNOPSIS
+#
+#   AX_PROG_PERL_MODULES([MODULES], [ACTION-IF-TRUE], [ACTION-IF-FALSE])
+#
+# DESCRIPTION
+#
+#   Checks to see if the given perl modules are available. If true the shell
+#   commands in ACTION-IF-TRUE are executed. If not the shell commands in
+#   ACTION-IF-FALSE are run. Note if $PERL is not set (for example by
+#   calling AC_CHECK_PROG, or AC_PATH_PROG), AC_CHECK_PROG(PERL, perl, perl)
+#   will be run.
+#
+#   MODULES is a space separated list of module names. To check for a
+#   minimum version of a module, append the version number to the module
+#   name, separated by an equals sign.
+#
+#   Example:
+#
+# AX_PROG_PERL_MODULES( Text::Wrap Net::LDAP=1.0.3, ,
+#   AC_MSG_WARN(Need some Perl modules)
+#
+# LICENSE
+#
+#   Copyright (c) 2009 Dean Povey 
+#
+#   Copying and distribution of this file, with or without modification, are
+#   permitted in any medium without royalty provided the copyright notice
+#   and this notice are preserved. This file is offered as-is, without any
+#   warranty.
+
+#serial 7
+
+AU_ALIAS([AC_PROG_PERL_MODULES], [AX_PROG_PERL_MODULES])
+AC_DEFUN([AX_PROG_PERL_MODULES],[dnl
+
+m4_define([ax_perl_modules])
+m4_foreach([ax_perl_module], m4_split(m4_normalize([$1])),
+ [
+  m4_append([ax_perl_modules],
+[']m4_bpatsubst(ax_perl_module,=,[ ])[' ])
+  ])
+
+if test "x$PERL" != x; then
+  ax_perl_modules_failed=0
+  for ax_perl_module in ax_perl_modules; do
+AC_MSG_CHECKING(for perl module $ax_perl_module)
+
+# Would be nice to log result here, but can't rely on autoconf internals
+$PERL -e "use $ax_perl_module; exit" > /dev/null 2>&1
+if test $? -ne 0; then
+  AC_MSG_RESULT(no);
+  ax_perl_modules_failed=1
+   else
+  AC_MSG_RESULT(ok);
+fi
+  done
+
+  # Run optional shell commands
+  if test "$ax_perl_modules_failed" = 0; then
+:
+$2
+  else
+:
+$3
+  fi
+else
+  AC_MSG_WARN(could not find perl)
+fi])dnl
diff --git a/configure b/configure
index 1903815..2d80202 100755
--- a/configure
+++ b/configure
@@ -15554,6 +15554,79 @@ done
   if test -z "$PERL"; then
 as_fn_error $? "Perl not found" "$LINENO" 5
   fi
+
+
+
+
+
+
+
+if test "x$PERL" != x; then
+  ax_perl_modules_failed=0
+  for ax_perl_module in 'Test::More' ; do
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for perl module 
$ax_perl_module" >&5

Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-02-15 Thread Artur Zakirov

On 05.02.2016 11:09, Kyotaro HORIGUCHI wrote:

Hello,

I considered how to make tab-completion robust against syntactical
noise, in other words, optional words in the syntax. Typically "IF
(NOT) EXISTS", UNIQUE and TEMPORARY are words that don't affect
further completion. However, the current delimiter-matching
mechanism is not capable enough (without becoming complexity-prone)
to live with such noise. I previously proposed using regular
expressions, or a simplified form of them, for robustness, but that
was too complex to be applied.

This is another answer to the problem: removing such words
on the fly makes further matching more robust.

Next, some CREATE xxx subsyntaxes of CREATE SCHEMA are currently
matched using TailMatching, but that makes the options-removal
operations difficult, since they need forward matching.

So I introduced two things to resolve them by this patch.



I did some tests with your patch. But I am not confident in tab-complete.c.

And I have some notes:

1 - I executed the git apply command and got the following warning:

../0001-Suggest-IF-NOT-EXISTS-for-tab-completion-of-psql.patch:302: 
trailing whitespace.

/*
warning: 1 line adds whitespace errors.

This is because of superfluous whitespace, I think.

2 - In psql I write "create table if" and press Tab. psql adds the
following:


create table IF NOT EXISTS

I think psql should continue with lower case if the user wrote the query
in lower case:


create table if not exists

3 - Same with "IF EXISTS". If I write "alter view if" and press Tab,
psql writes:


alter view IF EXISTS

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-15 Thread Ashutosh Bapat
Thanks, Fujita-san, for the bug report and the fix. Sorry for the bug.

Here's a patch with a better way to fix it. I think that, while concatenating
the lists, we need to copy the lists being appended, in all the cases. If we
don't copy, a change in those lists can cause changes in the upward
linkages and thus in the lists of any higher-level joins.
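
For the INNER JOIN case, for example, the idea is roughly this (a sketch;
the attached patch has the real change):

fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
								   list_copy(fpinfo_i->remote_conds));
fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
								   list_copy(fpinfo_o->remote_conds));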

On Mon, Feb 15, 2016 at 1:10 PM, Etsuro Fujita 
wrote:

> On 2016/02/10 4:16, Robert Haas wrote:
>
>> On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat
>>  wrote:
>>
>>> Thanks Jeevan for your review and comments. PFA the patch which fixes
>>> those.
>>>
>>
> Committed with a couple more small adjustments.
>>
>
> Thanks for working on this, Robert, Ashutosh, and everyone involved!
>
> I happened to notice that this code in foreign_join_ok():
>
> switch (jointype)
> {
> case JOIN_INNER:
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_i->remote_conds);
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_o->remote_conds);
> break;
>
> case JOIN_LEFT:
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_i->remote_conds);
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_o->remote_conds);
> break;
>
> case JOIN_RIGHT:
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_o->remote_conds);
> fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
>fpinfo_i->remote_conds);
> break;
>
> case JOIN_FULL:
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_i->remote_conds);
> fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
>   fpinfo_o->remote_conds);
> break;
>
> default:
> /* Should not happen, we have just check this above */
> elog(ERROR, "unsupported join type %d", jointype);
> }
>
> would break the list fpinfo_i->remote_conds in the case of INNER JOIN or
> FULL JOIN.  You can see the list breakage from e.g., the following queries
> on an Assert-enabled build:
>
> postgres=# create extension postgres_fdw;
> CREATE EXTENSION
> postgres=# create server myserver foreign data wrapper postgres_fdw
> options (dbname 'mydatabase');
> CREATE SERVER
> postgres=# create user mapping for current_user server myserver;
> CREATE USER MAPPING
> postgres=# create foreign table foo (a int) server myserver options
> (table_name 'foo');
> CREATE FOREIGN TABLE
> postgres=# create foreign table bar (a int) server myserver options
> (table_name 'bar');
> CREATE FOREIGN TABLE
> postgres=# create foreign table baz (a int) server myserver options
> (table_name 'baz');
> CREATE FOREIGN TABLE
> postgres=# select * from foo, bar, baz where foo.a = bar.a and bar.a =
> baz.a and foo.a < 10 and bar.a < 10 and baz.a < 10;
>
> Attached is a patch to avoid the breakage.
>
> Best regards,
> Etsuro Fujita
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


foreign_join_ok_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incorrect formula for SysV IPC parameters

2016-02-15 Thread Fujii Masao
On Mon, Feb 15, 2016 at 2:31 PM, Kyotaro HORIGUCHI
 wrote:
> Thanks for looking at this.
>
> At Fri, 12 Feb 2016 23:19:45 +0900, Fujii Masao  wrote 
> in 
>> >> ISTM that you also need to change the descriptions about SEMMNI and SEMMNS
>> >> under the Table 17-1.
>> >
>> > Oops! Thank you for pointing it out.
>> >
>> > The original description doesn't mention the magic-number ('5' in
>> > the above formulas, which previously was '4') so I haven't added
>> > anything about it.
>> >
>> > Process of which the number is limited by max_worker_processes is
>> > called 'background process' (not 'backend worker') in the
>> > documentation so I used the name to mention it in the additional
>> > description.
>> >
>> > The difference in the body text for 9.2, 9.3 is only a literal
>> > '4' to '5' in the formula.
>>
>> Thanks for updating the patches!
>>
>> They look good to me except that the formulas don't include the number of
>> background processes requesting shared memory access, i.e.,
>> GetNumShmemAttachedBgworkers(), in 9.3. Isn't it better to use the following
>> formula in 9.3?
>>
>>   ceil((max_connections + autovacuum_max_workers + number of
>> background processes + 5) / 16)
>>
>> Attached patch uses the above formula for 9.3. I'm thinking to push your
>> patches to 9.2, 9.4, 9.5, master, also push the attached one to 9.3.
>> Comments?
>
> One concern is that users don't have any simple way to know how
> many bg-workers a server runs in their current configuration.

Users need to read the documentation of the extensions they want to load
to see the number of background worker processes which will be running.

> The formula doesn't make sense without it.

IMO, documenting an "incorrect" formula can cause more trouble.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-15 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote  
wrote in <56b7ff5d.7030...@lab.ntt.co.jp>
> 
> Hi Vinayak,
> 
> Thanks for updating the patch, a couple of comments:
> 
> On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote:
> > Hello,
> > 
> > Please find attached updated patch.
> >> The point of having pgstat_report_progress_update_counter() is so that 
> >> you can efficiently update a single counter without having to update 
> >> everything, when only one counter has changed.  But here you are 
> >> calling this function a whole bunch of times in a row, which 
> >> completely misses the point - if you are updating all the counters, 
> >> it's more efficient to use an interface that does them all at once 
> >> instead of one at a time.
> > 
> > The pgstat_report_progress_update_counter() is called at appropriate places 
> > in the attached patch.
> 
> + charprogress_message[N_PROGRESS_PARAM][PROGRESS_MESSAGE_LENGTH];
> 
> [ ... ]
> 
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", phase1);
> + pgstat_report_progress_update_message(0, progress_message);
> 
> [ ... ]
> 
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, 
> "%s", phase2);
> + pgstat_report_progress_update_message(0, 
> progress_message);
> 
> Instead of passing the array of char *'s, why not just pass a single char
> *, because that's what it's doing - updating a single message. So,
> something like:
> 
> + char progress_message[PROGRESS_MESSAGE_LENGTH];
> 
> [ ... ]
> 
> + snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase1);
> + pgstat_report_progress_update_message(0, progress_message);
> 
> [ ... ]
> 
> + snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase2);
> + pgstat_report_progress_update_message(0, progress_message);
> 
> And also:
> 
> +/*---
> + * pgstat_report_progress_update_message()-
> + *
> + *Called to update phase of VACUUM progress
> + *---
> + */
> +void
> +pgstat_report_progress_update_message(int index, char *msg)
> +{
> 
> [ ... ]
> 
> + pgstat_increment_changecount_before(beentry);
> + strncpy((char *)beentry->st_progress_message[index], msg,
> PROGRESS_MESSAGE_LENGTH);
> + pgstat_increment_changecount_after(beentry);


As I might have written upthread, transferring the whole string
as a progress message is useless, at least in this scenario. Since
the phases are a set of fixed messages, each of them can be represented
by an identifier, an integer number. I don't see a reason for
sending a whole string beyond a backend.

Next, the function pg_stat_get_command_progress() has a somewhat
generic name, but it seems to return data only for the
backends with beentry->st_command = COMMAND_LAZY_VACUUM, and it has
column names specific to a vacuum-like process. If the
function is intended to be generic, it might be better to return
a set of integer[] for a given type. Otherwise it should have a
name that represents its objective.

CREATE FUNCTION
 pg_stat_get_command_progress(IN cmdtype integer)
 RETURNS SETOF integer[] as $$

SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x
         x
--------------------
{1233, 16233, 1, }
{3244, 16236, 2, }


CREATE VIEW pg_stat_vacuum_progress AS
  SELECT S.s[1] as pid,
         S.s[2] as relid,
         CASE S.s[3]
           WHEN 1 THEN 'Scanning Heap'
           WHEN 2 THEN 'Vacuuming Index and Heap'
           ELSE 'Unknown phase'
         END,
         ...
  FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;

# The name of the function could be other than *_command_progress.

Any thoughts or opinions?


> One more comment:
> 
> @@ -1120,14 +1157,23 @@ lazy_scan_heap(Relation onerel, LVRelStats
> *vacrelstats,
>   /* Log cleanup info before we touch indexes */
>   vacuum_log_cleanup_info(onerel, vacrelstats);
> 
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", 
> phase2);
> + pgstat_report_progress_update_message(0, progress_message);
>   /* Remove index entries */
>   for (i = 0; i < nindexes; i++)
> + {
>   lazy_vacuum_index(Irel[i],
> [i],
> vacrelstats);
> + scanned_index_pages += 
> RelationGetNumberOfBlocks(Irel[i]);
> + /* Update the scanned index pages and number of index 
> scan */
> + pgstat_report_progress_update_counter(3, 
> scanned_index_pages);
> + pgstat_report_progress_update_counter(4, 
> vacrelstats->num_index_scans
> + 1);
> + }
>   /* Remove tuples from heap */
>   lazy_vacuum_heap(onerel, vacrelstats);
>   vacrelstats->num_index_scans++;
> + scanned_index_pages = 0;
> 
> I guess 

Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket

2016-02-15 Thread Fabien COELHO


Hello Alvaro,


Any objections to changing it like this?  I'd probably backpatch to 9.5,
but no further (even though this pattern actually appears all the way
back.)


My 0.02€: if the pattern is repeated, maybe a function which incorporates 
the check would save lines and improve overall readability?


   ... = safePQsocket(...);
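
A sketch of what I have in mind (hedged: the name and error handling are
illustrative, and it assumes the headers pgbench already includes):

static int
safePQsocket(PGconn *con)
{
	int			sock = PQsocket(con);

	if (sock < 0)
	{
		fprintf(stderr, "invalid socket: %s", PQerrorMessage(con));
		exit(1);
	}
	return sock;
}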

--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE

2016-02-15 Thread Dmitry Ivanov
Hi hackers,

Recently I've been working on a CASCADE option for ALTER ... OWNER TO 
statement. Although it's still a working prototype, I think it's time to share 
my work.


Introduction


As of now there's no way to transfer the ownership of an object and all its 
dependent objects in one step. One has to manually alter the owner of each 
object, be it a table, a schema or something else. This patch adds the 
'CASCADE' option to every 'ALTER X OWNER TO' statement, including the 'ALTER 
DATABASE db OWNER TO user CASCADE' which turns out to be a delicate matter.
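
For example, with the patch applied one could write (hypothetical names):

ALTER TABLE parent_tbl OWNER TO alice CASCADE;	-- also transfers inheritors
ALTER SCHEMA app OWNER TO alice CASCADE;		-- and the objects inside it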


Implementation
==

There are two functions that process 'ALTER ... OWNER' statement: 
ExecAlterOwnerStmt() and ATExecCmd(). The latter function deals with the tasks 
that refer to all kinds of relations, while the first one handles the remaining 
object types. Basically, all I had to do was add a 'cascade' flag to the
corresponding parsenodes and make these functions call the dependency-tree
walker function (which would change the ownership of the dependent objects if
needed). Of course, there are various corner cases for each kind of object
that require special treatment, but the code speaks for itself.

The aforementioned 'ALTER DATABASE db ...' is handled in a special way. Since 
objects that don't belong to the 'current database' are hidden, it is 
impossible to change their owner directly, so we have to do the job in a
background worker that is connected to 'db'. I'm not sure if this is the
best solution available, but anyway.


What works
==

Actually, it seems to work in simple cases like 'a table with its inheritors' 
or 'a schema full of tables', but of course there might be things I've 
overlooked. There are some regression tests, though, and I'll continue to 
write some more.


What's dubious
==

It is unclear what kinds of objects should be transferred in case of database 
ownership change, since there's no way to get the full list of objects that 
depend on a given database. Currently the code changes ownership of all 
schemas (excluding the 'information_schema' and some others) and their 
contents, but this is a temporary limitation.

Feedback is welcome!


-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Companydiff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..54be671 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -21,6 +21,7 @@
 #include "catalog/index.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_amop.h"
+#include "catalog/pg_aggregate.h"
 #include "catalog/pg_amproc.h"
 #include "catalog/pg_attrdef.h"
 #include "catalog/pg_authid.h"
@@ -40,6 +41,7 @@
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_largeobject.h"
+#include "catalog/pg_largeobject_metadata.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
@@ -56,6 +58,7 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "catalog/pg_user_mapping.h"
+#include "commands/alter.h"
 #include "commands/comment.h"
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
@@ -66,6 +69,7 @@
 #include "commands/seclabel.h"
 #include "commands/trigger.h"
 #include "commands/typecmds.h"
+#include "commands/tablecmds.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteRemove.h"
@@ -77,17 +81,6 @@
 #include "utils/tqual.h"
 
 
-/*
- * Deletion processing requires additional state for each ObjectAddress that
- * it's planning to delete.  For simplicity and code-sharing we make the
- * ObjectAddresses code support arrays with or without this extra state.
- */
-typedef struct
-{
-	int			flags;			/* bitmask, see bit definitions below */
-	ObjectAddress dependee;		/* object whose deletion forced this one */
-} ObjectAddressExtra;
-
 /* ObjectAddressExtra flag bits */
 #define DEPFLAG_ORIGINAL	0x0001		/* an original deletion target */
 #define DEPFLAG_NORMAL		0x0002		/* reached via normal dependency */
@@ -97,17 +90,6 @@ typedef struct
 #define DEPFLAG_REVERSE		0x0020		/* reverse internal/extension link */
 
 
-/* expansible list of ObjectAddresses */
-struct ObjectAddresses
-{
-	ObjectAddress *refs;		/* => palloc'd array */
-	ObjectAddressExtra *extras; /* => palloc'd array, or NULL if not used */
-	int			numrefs;		/* current number of references */
-	int			maxrefs;		/* current size of palloc'd array(s) */
-};
-
-/* typedef ObjectAddresses appears in dependency.h */
-
 /* threaded list of ObjectAddresses, for recursion detection */
 typedef struct ObjectAddressStack
 {
@@ -164,12 +146,21 @@ static const Oid object_classes[] = {
 };
 
 
+/*
+ * We limit the number of dependencies reported to the client to
+ * MAX_REPORTED_DEPS, since client software may not deal well with
+ * enormous error strings.  The server log always gets a full report.

Re: [HACKERS] extend pgbench expressions with functions

2016-02-15 Thread Fabien COELHO


Hello Michaël,


+ * Recursive evaluation of int or double expressions
+ *
+ * Note that currently only integer variables are available, with values
+ * stored as text.
This comment is incorrect; we only care about integers in this patch.


Indeed!

Taking patch 1 as a completely independent thing, there is no need to 
introduce PgBenchValueType yet. The same remark applies to setIntValue and 
coerceToInt: they become useful later, when the double type is introduced 
to handle double input parameters for the gaussian and other random 
functions.


Yes. This is exactly the pain I'm trying to avoid: creating a different 
implementation for the first patch, which is just overridden when the 
second part is applied...


So I'm trying to compromise: a several-part patch, *but* with the 
infrastructure ready for the second patch, which adds the double type.


Note that the first patch without the second is a waste of time for 
everyone, as nearly the only useful functions are the random ones, which 
require a double argument; so it does not make sense to apply the first 
patch unless the second is also going to be applied, I think.



[...]
(INT64_MIN / -1) should error.
(INT64_MIN % -1) should result in 0.
This is missing the division handling.


Oops, indeed I totally messed up when merging the handling of / and % :-(
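
For the record, the intended semantics of the two special cases can be
pinned down in one place; here is a minimal standalone sketch (stdint-style
C, not the patch's actual code) of the behavior described above:

#include <stdbool.h>
#include <stdint.h>

static bool
checked_div_mod(int64_t lval, int64_t rval, bool is_division, int64_t *result)
{
	if (rval == 0)
		return false;			/* division by zero: caller errors out */

	if (rval == -1)
	{
		if (is_division)
		{
			/* INT64_MIN / -1 overflows int64, so it must raise an error */
			if (lval == INT64_MIN)
				return false;
			*result = -lval;	/* safe for every other lval */
		}
		else
			*result = 0;		/* x % -1 is always 0 */
		return true;
	}

	*result = is_division ? lval / rval : lval % rval;
	return true;
}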

I have found another issue in the (a) patch: the internal scripts were 
using the future random functions, which do not exist yet, as they are 
introduced in patch (b).


Here is a three-part v30, which still includes the infrastructure for 
future types in the (a) patch; see my argument above.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..9d5eb32 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -798,8 +798,11 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   references to variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity, function calls and
+  parentheses.
+   shows the available
+  functions.
  
 
  
@@ -965,6 +968,52 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))

   
 
+   
+   
+PgBench Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+   Example
+   Result
+  
+ 
+ 
+  
+   abs(a)
+   same as a
+   integer value
+   abs(-17)
+   17
+  
+  
+   debug(a)
+   same as a
+   print to stderr the given argument
+   debug(5432)
+   5432
+  
+  
+   max(i [, ... ] )
+   integer
+   maximum value
+   max(5, 4, 3, 2)
+   5
+  
+  
+   min(i [, ... ] )
+   integer
+   minimum value
+   min(5, 4, 3, 2)
+   2
+  
+ 
+ 
+   
+
   
As an example, the full definition of the built-in TPC-B-like
transaction is:
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 06ee04b..93c6173 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,10 +16,13 @@
 
 PgBenchExpr *expr_parse_result;
 
+static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
 static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_variable(char *varname);
-static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+static PgBenchExpr *make_op(const char *operator, PgBenchExpr *lexpr,
 		PgBenchExpr *rexpr);
+static int find_func(const char *fname);
+static PgBenchExpr *make_func(const int fnumber, PgBenchExprList *args);
 
 %}
 
@@ -31,13 +34,15 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 	int64		ival;
 	char	   *str;
 	PgBenchExpr *expr;
+	PgBenchExprList *elist;
 }
 
+%type  elist
 %type  expr
-%type  INTEGER
-%type  VARIABLE
+%type  INTEGER function
+%type  VARIABLE FUNCTION
 
-%token INTEGER VARIABLE
+%token INTEGER VARIABLE FUNCTION
 %token CHAR_ERROR /* never used, will raise a syntax error */
 
 /* Precedence: lowest to highest */
@@ -49,16 +54,25 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 
 result: expr{ expr_parse_result = $1; }
 
+elist:  	{ $$ = NULL; }
+	| expr 	{ $$ = make_elist($1, NULL); }
+	| elist ',' expr		{ $$ = make_elist($3, $1); }
+	;
+
 expr: '(' expr ')'			{ $$ = $2; }
 	| '+' expr %prec UMINUS	{ $$ = $2; }
-	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
-	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
-	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
-	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
-	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
-	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| '-' expr %prec UMINUS	{ $$ = make_op("-", make_integer_constant(0), $2); }
+	| 

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-15 Thread Etsuro Fujita

On 2016/02/15 15:20, Rushabh Lathia wrote:

On Fri, Feb 12, 2016 at 5:40 PM, Etsuro Fujita
> wrote:



As a result of our discussions, we reached the conclusion that the DML
pushdown APIs should be provided together with the existing APIs such as
ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC.
So, how about (1) leaving the description of the existing APIs
as-is and (2) adding a new description for the DML pushdown APIs in
parentheses, like this?:

  If the IsForeignRelUpdatable pointer is set to
  NULL, foreign tables are assumed to be insertable,
  updatable, or deletable if the FDW provides ExecForeignInsert,
  ExecForeignUpdate, or ExecForeignDelete
  respectively.
  (If the FDW attempts to optimize a foreign table update, it still
  needs to provide PlanDMLPushdown, BeginDMLPushdown,
  IterateDMLPushdown and EndDMLPushdown.)

Actually, if the FDW provides the DML pushdown APIs, (pushdown-able)
foreign table updates can be done without ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete.  So, the above docs are not
necessarily correct.  But we don't recommend doing that without the
existing APIs, so I'm not sure it's worth complicating the docs.
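
For reference, with the patch applied an FDW handler would simply fill in
both sets of callbacks. A hedged sketch follows; the my* names are
placeholders for a hypothetical FDW, and the four pushdown fields are
assumed to be the ones the patch adds to FdwRoutine:

Datum
my_fdw_handler(PG_FUNCTION_ARGS)
{
	FdwRoutine *routine = makeNode(FdwRoutine);

	/* row-by-row path, used when an update cannot be pushed down */
	routine->ExecForeignInsert = myExecForeignInsert;
	routine->ExecForeignUpdate = myExecForeignUpdate;
	routine->ExecForeignDelete = myExecForeignDelete;

	/* whole-statement path, used when the update can be pushed down */
	routine->PlanDMLPushdown = myPlanDMLPushdown;
	routine->BeginDMLPushdown = myBeginDMLPushdown;
	routine->IterateDMLPushdown = myIterateDMLPushdown;
	routine->EndDMLPushdown = myEndDMLPushdown;

	PG_RETURN_POINTER(routine);
}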



Adding a new description for the DML pushdown APIs seems like a good idea.
I would suggest adding it as a separate paragraph rather than in
parentheses.


OK, will do.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-02-15 Thread Dean Rasheed
> On 12/02/16 10:19, Dean Rasheed wrote:
>> This seems like a reasonable first patch for me as a committer, so
>> I'll take it unless anyone else was planning to do so.
>

So looking at this, it seems that for the most part pg_size_bytes()
will parse any output produced by pg_size_pretty(). The exception is
that there are two versions of pg_size_pretty(), one that takes bigint
and one that takes numeric, whereas pg_size_bytes() returns bigint, so
it can't handle all inputs. Is there any reason not to make
pg_size_bytes() return numeric?

It would still be compatible with the example use cases, but it would
be a better inverse of both variants of pg_size_pretty() and would be
more future-proof. It already works internally using numeric, so it's
a trivial change to make now, but impossible to change in the future
without introducing a new function with a different name, which is
messy.
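
Since the parsing already works internally in numeric, the C-level change
would roughly amount to swapping the final conversion. This is a sketch
only: 'num' stands for the function's internal numeric result, and the
catalog entry's declared return type would need the matching update:

	/* current: coerce the internal numeric result down to int64 */
	PG_RETURN_INT64(DatumGetInt64(DirectFunctionCall1(numeric_int8,
													  NumericGetDatum(num))));

	/* proposed: hand the numeric back as-is */
	PG_RETURN_NUMERIC(num);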

Thoughts?

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers