Re: [HACKERS] Current int & float overflow checking is slow.

2017-10-30 Thread Robert Haas
On Mon, Oct 30, 2017 at 11:24 PM, Andres Freund  wrote:
>> >   Perhaps it should rather be pg_add_s32_overflow, or a similar
>> >   naming scheme?
>>
>> Not sure what the s is supposed to be?  Signed?
>
> Yes, signed. So we could add a u32 or something complementing the
> functions already in the patch. Even though overflow checks are a heck
> of a lot easier to write for unsigned ints, the intrinsics are still
> faster.  I don't have any sort of strong feelings on the naming.

Right, I guess including the s is probably a good idea then.
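
For readers following along, here is a minimal sketch of what such a signed
32-bit helper built on the compiler intrinsic could look like; the function
name and the fallback path are assumptions for illustration, not the code
from the patch under discussion:

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: add two signed 32-bit ints, reporting overflow.  Returns true
 * on overflow (and then leaves *result unset in the fallback path), false
 * otherwise.
 */
static inline bool
pg_add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_add_overflow(a, b, result);
#else
    /* Portable fallback: do the math in 64 bits, check it fits in 32. */
    int64_t sum = (int64_t) a + (int64_t) b;

    if (sum > INT32_MAX || sum < INT32_MIN)
        return true;
    *result = (int32_t) sum;
    return false;
#endif
}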

>> I suggest that if we think we don't need -fwrapv any more, we ought to
>> remove it.  Otherwise, we won't find out if we're wrong.
>
> I agree that we should do so at some point not too far away in the
> future. Not the least because we don't specify this kind of C dialect in
> a lot of other compilers. Additionally the flag causes some slowdown
> (because e.g. for loop variables are optimized less). But I'm fairly
> certain it needs a bit more care than I've invested as of now - should
> probably at least compile with -Wstrict-overflow=some-higher-level, and
> with ubsan. I'm fairly certain there are more bogus overflow checks
> around...

Makes sense.
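
As an aside for readers unfamiliar with the hazard: the classic kind of
"bogus" overflow check alluded to above looks like the generic example below
(not taken from the PostgreSQL tree).  It is only reliable while -fwrapv
keeps signed wraparound defined; without the flag, the addition is undefined
behaviour and the compiler may legally delete the check, which is exactly
what -Wstrict-overflow and UBSan help to catch:

#include <limits.h>
#include <stdio.h>

static int
add_with_bogus_check(int a, int b)
{
    int sum = a + b;            /* may already be undefined behaviour */

    if (b > 0 && sum < a)       /* compiler is allowed to assume this is false */
    {
        fprintf(stderr, "overflow detected\n");
        return INT_MAX;
    }
    return sum;
}

int
main(void)
{
    printf("%d\n", add_with_bogus_check(INT_MAX, 1));
    return 0;
}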

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> So: I put the blame on the fact that summarize_range() thinks that
>> the tuple offset it has for the placeholder tuple is guaranteed to
>> hold good, even across possibly-long intervals where it's holding
>> no lock on the containing buffer.

> Yeah, I think this is a pretty reasonable explanation for the problem.
> I don't understand why it doesn't fail in 9.6.

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before.  I'm guessing
that somehow the previous behavior of PageIndexDeleteNoCompact managed
to mask the problem (perhaps only by not throwing an error, which doesn't
imply that the index state was good afterwards).  But I don't see quite
how it did that.

One thing I think we do know now is that the bug is triggered by two
concurrent executions of summarize_range.  So I'd look for edge cases
like whether the placeholder tuple can be deleted and then reinserted
into the same lp index.
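
To make the hazard concrete, here is a minimal sketch of the kind of
revalidation a caller would need after re-taking the buffer lock before
trusting a remembered offset.  It is illustrative only and is not the fix
that was eventually applied:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Sketch only: after re-locking the buffer, check that the remembered line
 * pointer still exists and is in use.  As noted above, even this is not
 * sufficient by itself, because the slot could have been freed and reused
 * for a different tuple; a real fix must also verify that the item is still
 * the expected placeholder tuple.
 */
static bool
offset_still_plausible(Buffer buf, OffsetNumber off)
{
    Page    page;
    bool    ok;

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    page = BufferGetPage(buf);
    ok = (off <= PageGetMaxOffsetNumber(page) &&
          ItemIdIsUsed(PageGetItemId(page, off)));
    if (!ok)
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);    /* caller must retry or give up */
    return ok;
}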

regards, tom lane




Re: [HACKERS] [POC] hash partitioning

2017-10-30 Thread amul sul
On Tue, Oct 31, 2017 at 9:54 AM, Robert Haas  wrote:
> On Mon, Oct 30, 2017 at 5:52 PM, amul sul  wrote:
>> Actually, int4[] is also an inappropriate type as we have started using a 64-bit
>> hash function.  We need something like int8[], which is not available, so I
>> have used ANYARRAYOID in the attached patch (0004).
>
> I don't know why you think int8[] is not available.
>
> rhaas=# select 'int8[]'::regtype;
>  regtype
> ----------
>  bigint[]
> (1 row)
>

I missed _int8, was searching for INT8ARRAYOID in pg_type.h, my bad.

>>> I wrote the following query
>>> to detect problems of this type, and I think we might want to just go
>>> ahead and add this to the regression test suite, verifying that it
>>> returns no rows:
>>>
>>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>>> from pg_proc where provariadic != 0
>>> and case proargtypes[array_length(proargtypes, 1)-1]
>>> when 2276 then 2276 -- any -> any
>>> when 2277 then 2283 -- anyarray -> anyelement
>>> else (select t.oid from pg_type t where t.typarray =
>>> proargtypes[array_length(proargtypes, 1)-1]) end
>>> != provariadic;
>>>
>>
>> Added in 0001 patch.
>
> Committed.
>

Thanks !

>> One advantage of the current implementation is that we can see which hash
>> functions are used for each partitioning column, and also we don't need to
>> worry about user-specified opclasses and different input types.
>>
>> Something similar I've tried in my initial patch version [1], but I had missed
>> user-specified opclass handling for each partitioning column.  Do you want me
>> to handle the opclass using a RelabelType node?  I am afraid that that would make
>> the \d+ output more horrible than the current one if a non-default opclass is used.
>
> Maybe we should just pass the OID of the partition (or both the
> partition and the parent, so we can get the lock ordering right?)
> instead.
>
Okay, will try this.


Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-30 Thread Robert Haas
On Mon, Oct 30, 2017 at 5:52 PM, amul sul  wrote:
> Actually, int4[] is also an inappropriate type as we have started using a 64-bit
> hash function.  We need something like int8[], which is not available, so I
> have used ANYARRAYOID in the attached patch (0004).

I don't know why you think int8[] is not available.

rhaas=# select 'int8[]'::regtype;
 regtype
----------
 bigint[]
(1 row)

>> I wrote the following query
>> to detect problems of this type, and I think we might want to just go
>> ahead and add this to the regression test suite, verifying that it
>> returns no rows:
>>
>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>> from pg_proc where provariadic != 0
>> and case proargtypes[array_length(proargtypes, 1)-1]
>> when 2276 then 2276 -- any -> any
>> when 2277 then 2283 -- anyarray -> anyelement
>> else (select t.oid from pg_type t where t.typarray =
>> proargtypes[array_length(proargtypes, 1)-1]) end
>> != provariadic;
>>
>
> Added in 0001 patch.

Committed.

> One advantage of the current implementation is that we can see which hash
> functions are used for each partitioning column, and also we don't need to
> worry about user-specified opclasses and different input types.
>
> Something similar I've tried in my initial patch version [1], but I had missed
> user-specified opclass handling for each partitioning column.  Do you want me
> to handle the opclass using a RelabelType node?  I am afraid that that would make
> the \d+ output more horrible than the current one if a non-default opclass is used.

Maybe we should just pass the OID of the partition (or both the
partition and the parent, so we can get the lock ordering right?)
instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Hash take II

2017-10-30 Thread Thomas Munro
On Tue, Aug 1, 2017 at 9:28 AM, Andres Freund  wrote:
> On 2017-07-26 20:12:56 +1200, Thomas Munro wrote:
>> I'll report on performance separately.
>
> Looking forward to that ;)

Here are some experimental results from a Xeon E5-2695 v3 with a ton
of RAM and spinning disks (EDB lab server "scylla").  I used TPC-H
dbgen scale 10 with the additional indexes suggested by Tomas
Vondra[1].  10GB of source data (= 23GB pgdata dir) is obviously quite
small as these things go, and I hope we'll run some of these with a
much larger scale soon (it's a big job), but it's enough to runs
queries for tens of seconds to minutes so it's definitely in parallel
query territory and shows some pretty interesting effects IMHO.

First, here's a stupid self-join as a warm-up.  The query is SELECT
COUNT(*) FROM lineitem r JOIN lineitem s USING (l_orderkey,
l_linenumber), where lineitem is ~60 million rows.

(1) With work_mem set sky-high so no batching is required, how much
speed-up does each worker contribute with this feature off (= same as
unpatched master) and on?  In this table, each cell shows the speed-up
compared to w=0 (no workers):

 parallel_hash |  w=0  |  w=1  |  w=2  |  w=3  |  w=4  |  w=5  |  w=6
---------------+-------+-------+-------+-------+-------+-------+-------
 off           | 1.00x | 1.42x | 1.66x | 1.76x | 1.66x | 1.68x | 1.69x
 on            | 1.00x | 1.76x | 2.54x | 3.21x | 3.80x | 4.30x | 4.79x

(2) With work_mem set to 128MB this query needs 32 batches.  Again,
how much speed-up does each worker contribute with the feature off and
on?

 parallel_hash |  w=0  |  w=1  |  w=2  |  w=3  |  w=4  |  w=5  |  w=6
---------------+-------+-------+-------+-------+-------+-------+-------
 off           | 1.00x | 1.37x | 1.49x | 1.32x | 1.67x | 1.63x | 1.64x
 on            | 1.00x | 1.94x | 2.72x | 2.82x | 3.02x | 4.64x | 5.98x

I haven't tried to grok the shape of that curve yet.  Interestingly
(not shown here) the 32 batch parallel hash actually starts to beat
the single-batch parallel hash somewhere around 5-6 workers, and at 15
workers it achieves 9.53x speed-up compared to w=0 and is still
gaining as you add more workers, whereas the single-batch version tops
out around 8 workers.  This may be in part due to the trade-offs
discussed in "Design and Evaluation of Main Memory Hash Join
Algorithms for Multi-core CPUs" (short version: partitioning up front
can pay off by reducing cache misses at various levels and some
research databases would consider that), but I would think we're
probably pretty far away from that frontier and there are probably
other more basic problems.  Investigation/profiling required.

Next, here are some numbers from the TPC-H queries.  I included only
queries where a Parallel Hash was selected by the planner.  I stopped
at w=6 because that's the highest number of workers the planner would
pick by default at that scale.  (If I'm getting the maths right, TPC-H
scale 300's lineitem table should inspire about 10 workers to get out
of bed; you get an extra worker each time a table triples in size.)

(3) What is the speed-up with enable_parallel_hash = on vs
enable_parallel_hash = off?  Here is the result table for various
numbers of workers, with work_mem set to 1GB.

 query |  w=0  |  w=1  |  w=2  |  w=3  |  w=4  |  w=5  |  w=6
-------+-------+-------+-------+-------+-------+-------+-------
     3 | 1.02x | 1.16x | 1.37x | 1.79x | 1.95x | 2.29x | 2.44x
     5 | 1.03x | 1.15x | 1.20x | 1.44x | 1.95x | 2.05x | 1.34x
     7 | 1.02x | 1.26x | 1.54x | 2.18x | 2.57x | 1.25x | 1.32x
     8 | 1.00x | 1.56x | 1.49x | 1.47x | 1.40x | 0.55x | 0.55x
     9 | 1.02x | 1.24x | 1.35x | 1.50x | 1.59x | 1.82x | 1.82x
    10 | 1.02x | 1.16x | 1.19x | 1.44x | 1.51x | 1.75x | 1.83x
    12 | 1.01x | 1.22x | 1.53x | 0.72x | 0.74x | 0.73x | 0.99x
    14 | 1.00x | 1.08x | 1.18x | 1.33x | 1.41x | 1.54x | 1.52x
    16 | 1.01x | 1.22x | 1.10x | 1.10x | 1.10x | 1.11x | 1.10x
    18 | 0.99x | 1.07x | 1.05x | 0.99x | 0.99x | 0.99x | 1.03x
    21 | 1.00x | 1.28x | 1.24x | 1.34x | 0.18x | 0.19x | 0.23x

Some commentary on the cases where the performance was apparently hurt
by the feature: for Q21 with w=3 workers and above with
enable_parallel_hash = off the planner switched from a hash join to a
nested loop and that turned out to be better, but with
enable_parallel_hash = on it didn't give up on hash join until there
were 6 workers.  Something similar happened with Q8 around 5 workers.
Q21 has some major cardinality estimation problems as discussed
elsewhere, and on this run I didn't think to apply the patch that
fixes (?) that.  In other words, as far as I can tell, all of those
are cases where there is possibly room for general planner improvement
outside this project: the point at which we flip from one plan type to
another moves around, not necessarily indicating a problem with
Parallel Hash as an executor node.  That isn't to say I'm not
interested in understanding the causes better and trying to 

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Tom Lane wrote:

> So: I put the blame on the fact that summarize_range() thinks that
> the tuple offset it has for the placeholder tuple is guaranteed to
> hold good, even across possibly-long intervals where it's holding
> no lock on the containing buffer.

Yeah, I think this is a pretty reasonable explanation for the problem.
I don't understand why it doesn't fail in 9.6.

> Fixing this without creating new problems is beyond my familiarity
> with the BRIN code.  It looks like it might be nontrivial :-(

Looking.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Thanks everyone for the analysis downthread.  Here's a test case that
provokes the crash faster.  Initialize with

create table brin_test (a text);
create index on brin_test using brin (a) with (pages_per_range = 1);

Then in one psql, run this:
select brin_summarize_new_values('brin_test_a_idx') \watch 0.1

and a pgbench running a script with this line (one client is enough):

insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));

It crashes in 10-20 seconds for me.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-30 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas  wrote:
> On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>  wrote:
>> Hello. I made some bugfixes and rewrote the patch.
>
> I don't think it's a good idea to deliberately leave the state of the
> standby different from the state of the  master on the theory that it
> won't matter.  I feel like that's something that's likely to come back
> to bite us.

I agree with Robert. What happens if we intentionally don't apply the
truncation WAL and then switch over? If we insert a tuple on the new
master server into a block that has been truncated on the old master,
will the WAL apply on the new standby fail? I guess there are such
corner cases causing failures of WAL replay after switch-over.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Connor Wolf
I was mostly unclear on how I'd go about attaching the extension functions
to the relevant indexing mechanism. From the looks of the vptree.tar.gz
file (which is really, *really* helpful, incidentally!), it's done via a
custom operator class, which then gets passed to the actual index creation
mechanism when you're declaring the index.

I think I had looked at that at one point, but it's been a while. In my
case, I'm using discrete-cosine-transform based perceptual hashes for
searching. They are nice and compact (64-bits per hash), while still
producing good search results. I have a dataset of ~36 million images, and
it does searches in < 50 milliseconds with a hamming distance of 4, while
touching ~0.25% of the tree (And occupying ~18 GB of ram).
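
For readers unfamiliar with the primitive involved, here is a minimal sketch
of the 64-bit hamming-distance test described above, assuming the GCC/Clang
popcount builtin; the threshold is just an example value:

#include <stdbool.h>
#include <stdint.h>

/*
 * Hamming distance between two 64-bit perceptual hashes: the number of bit
 * positions in which they differ.  A BK tree prunes its search using this
 * metric, e.g. "find all hashes within distance 4 of the query hash".
 */
static inline int
hamming_distance(uint64_t a, uint64_t b)
{
    return __builtin_popcountll(a ^ b);
}

static inline bool
is_match(uint64_t query, uint64_t candidate, int max_distance)
{
    return hamming_distance(query, candidate) <= max_distance;
}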

My BK tree is up on github,
if anyone needs something like that (BSD licensed, pretty well tested).
There's also a python wrapper for it.

I'll probably not have time to poke about until this weekend, but thanks!
Connor



On Mon, Oct 30, 2017 at 4:50 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi!
>
> On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <
> w...@imaginaryindustries.com> wrote:
>
>> I'm looking at implementing a custom indexing scheme, and I've been
>> having trouble understanding the proper approach.
>>
>> Basically, I need a BK tree, which is a tree-structure useful for
>> indexing arbitrary discrete metric-spaces (in my case, I'm interested in
>> indexing across the hamming edit-distance of perceptual hashes, for fuzzy
>> image searching). I'm *pretty* sure a SP-GiST index is the correct index
>> type, as my tree is intrinsically unbalanced.
>>
>
> Yes, SP-GiST is an appropriate index type for a BK tree.  I'm pretty sure a BK
> tree could be implemented as an SP-GiST opclass.
> The only thing worrying me is the selection of pivot values for nodes.  SP-GiST
> builds by inserting index tuples one by one.  The first pivot value for the root
> node in SP-GiST would be created once the first leaf page overflows.  Thus, you
> would have to select this pivot value based on a very small fraction at the
> beginning of the dataset.
> As far as I know, a BK tree is most efficient when the root pivot value is
> selected after looking at the whole dataset, and then hierarchically for subtrees.
>
> BTW, did you try my extension for searching similar images?  It's quite
> primitive, but works for some cases.
> https://github.com/postgrespro/imgsmlr
> https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf
>
>> I have a functional stand-alone implementation of a BK-Tree, and it works
>> very well, but the complexity of managing what is basically an external
>> index for my database has reached the point where it's significantly
>> problematic, and it seems to me it should be moved into the database.
>>
>
> Sure, moving this index to the database is the right decision.
>
> Anyways, looking at the contents of postgres/src/backend/access/spgist,
>> it looks pretty straightforward in terms of the actual C implementation,
>> but I'm stuck understanding how to "install" a custom SP-GiST
>> implementation. There are several GiST indexing implementations in the
>> contrib directory, but no examples for how I'd go about implementing a
>> loadable SP-GiST index.
>>
>> Basically, my questions are:
>>
>>- Is it possible to implement a SP-GiST indexing scheme as a loadable
>>module?
>>
> Yes, it's possible to define an SP-GiST opclass as a loadable module.
>
>>
>>- If so, how?
>>
> Pretty much the same way as a GiST opclass extension.  You have to define
> supporting functions and operators and then define an operator class over them.
>
>>
>>- And is there an example I can base my implementation off of?
>>
>>
>>
>> I'm relatively comfortable with C (much moreso with C++), but I haven't
>> spent a lot of time looking at the postgresql codebase.  I don't think I
>> could start from a empty folder and make a properly-implemented module in
>> any reasonable period of time, so if I have a working example for some sort
>> of index that uses the same interfaces that would really help a lot
>>
> I don't think there is an example in the PostgreSQL source code tree or on
> github.  But I've attached my early experiment with VP-tree (seems to be
> pretty much the same as a BK tree) using SP-GiST (see vptree.tar.gz).  Based on this
> experiment I realized that it's important to select the root pivot value based
> on the whole dataset.  However, for your metric/dataset/queries it might
> appear to be different.
>
> It also would be nice to someday improve SP-GiST to support some global
> strategies on index creation.  In particular, it would allow resolving the
> pivot-value selection problem that I mentioned above.  Right now my
> colleagues and I don't have time for that.  But I can assist you with
> advice if you decide to implement that.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
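
As a concrete starting point for the "supporting functions" mentioned above,
here is a minimal sketch of the first of the five SP-GiST support functions
(config) as an extension module might define it.  The vptree_ naming and the
prefix/label type choices are assumptions for illustration, not code from the
attached vptree.tar.gz:

#include "postgres.h"
#include "access/spgist.h"
#include "catalog/pg_type.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * Sketch of an SP-GiST "config" support function for a VP/BK-style opclass.
 * A complete opclass also needs choose, picksplit, inner_consistent and
 * leaf_consistent functions, plus a CREATE OPERATOR CLASS declaration tying
 * them together.
 */
PG_FUNCTION_INFO_V1(vptree_config);

Datum
vptree_config(PG_FUNCTION_ARGS)
{
    /* arg 0 is spgConfigIn, carrying the indexed type; unused in this sketch */
    spgConfigOut *cfg = (spgConfigOut *) PG_GETARG_POINTER(1);

    cfg->prefixType = INT8OID;      /* e.g. store the pivot hash as the prefix */
    cfg->labelType = INT4OID;       /* e.g. node label = distance bucket */
    cfg->canReturnData = true;      /* leaf tuples keep the full hash */
    cfg->longValuesOK = false;      /* fixed-size 64-bit keys */
    PG_RETURN_VOID();
}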

Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)

2017-10-30 Thread Badrul Chowdhury
Hi Robert,

Thank you for the comprehensive review! We are very much in the early stages of 
contributing to the PG community and we clearly have lots to learn, but we look 
forward to becoming proficient and active members of the pg community.

Regarding the patch, I have tested it extensively by hand and it works great.

Some comments on the future direction:

>> Some thoughts on the future:
>> 
>> - libpq should grow an option to force a specific protocol version.
>> Andres already proposed one to force 2.0, but now we probably want to
>> generalize that to also allow forcing a particular minor version.
>> That seems useful for testing, if nothing else, and might let us add TAP 
>> tests
>> that this stuff works as intended.
>> 
>> - Possibly we should commit the portion of the testing patch which ignores
>> NegotiateProtocolVersion to v11, maybe also adding a connection status
>> function that lets users inquire about whether a NegotiateProtocolVersion
>> message was received and, if so, what parameters it reported as unrecognized
>> and what minor version it
>> reported the server as speaking.   The existing PQprotocolVersion
>> interface looks inadequate, as it seems to return only the major version.

I think these changes are a good idea; I will initiate a design discussion on 
these targeting the 2018-01 commitfest on a separate thread.

>> - On further reflection, I think the reconnect functionality you proposed
>> previously is probably a good idea.  It won't be necessary with servers that
>> have been patched to send NegotiateProtocolVersion, but there may be quite
>> a few that haven't for a long time to come, and although an automated
>> reconnect is a little annoying, it's a lot less annoying than an outright
>> connection failure.  So that part of your patch should probably be 
>> resubmitted
>> when and if we bump the version to 3.1.

I will preserve the FE changes in my local branch so that we have it ready when 
a decision has been made regarding the bumping of the pgwire version.

Again, thanks very much for your feedback - I am in a much better position to 
make future contributions to the community explicitly because of it.

Regards,
Badrul
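
For readers following the protocol details quoted below, here is a minimal
sketch of how the server side might assemble the NegotiateProtocolVersion
message; the function name and the exact payload layout are assumptions for
illustration, not the committed code:

#include "postgres.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "nodes/pg_list.h"

/*
 * Sketch only: tell the client the newest minor protocol version the server
 * speaks and which _pq_.* startup options it did not recognize.  'v' is the
 * message type byte mentioned in the review below.
 */
static void
send_negotiate_protocol_version(List *unrecognized_options)
{
    StringInfoData buf;
    ListCell   *lc;

    pq_beginmessage(&buf, 'v');
    pq_sendint32(&buf, PG_PROTOCOL_LATEST);     /* newest version we support */
    pq_sendint32(&buf, list_length(unrecognized_options));
    foreach(lc, unrecognized_options)
        pq_sendstring(&buf, (const char *) lfirst(lc));
    pq_endmessage(&buf);
}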

>> -----Original Message-----
>> From: Robert Haas [mailto:robertmh...@gmail.com]
>> Sent: Friday, October 27, 2017 5:56 AM
>> To: Badrul Chowdhury 
>> Cc: Tom Lane ; Satyanarayana Narlapuram
>> ; Craig Ringer
>> ; Peter Eisentraut ; Magnus
>> Hagander ; PostgreSQL-development > hack...@postgresql.org>
>> Subject: Re: [HACKERS] Re: protocol version negotiation (Re: Libpq
>> PGRES_COPY_BOTH - version compatibility)
>> 
>> On Thu, Oct 19, 2017 at 1:35 AM, Badrul Chowdhury
>>  wrote:
>> > The new functionality is for sending 64bit ints. I think 32bits is 
>> > sufficient for
>> the information we want to pass around in the protocol negotiation phase, so
>> I left this part unchanged.
>> 
>> No, it isn't.  That commit didn't add any new functionality, but it changed 
>> the
>> preferred interfaces for assembling protocol messages.
>> Your patch was still using the old ones.
>> 
>> Attached is an updated patch.  I've made a few modifications:
>> 
>> - I wrote documentation.  For future reference, that's really the job of the
>> patch submitter.
>> 
>> - I changed the message type byte for the new message from 'Y' to 'v'.
>> 'Y' is apparently used by logical replication as a "type message", but 'v' 
>> is not
>> currently used for anything.  It's also somewhat mnemonic.
>> 
>> - I deleted the minimum protocol version from the new message.  I know there
>> were a few votes for including it, but I think it's probably useless.  The 
>> client
>> should always request the newest version it can support; if that's not new
>> enough for the server, then we're dead anyway and we might as well just
>> handle that via ERROR. Moreover, it seems questionable whether we'd ever
>> deprecate 3.0 support in the server anyway, or if we do, it'll probably be
>> because 4.0 has been stable for a decade or so.  Desupporting 3.0 while
>> continuing to support 3.x,x>0 seems like a remote outcome (and, like I say, 
>> if it
>> does happen, ERROR is a good-enough response).  If there's some use case for
>> having a client request an older protocol version than the newest one it can
>> support, then this change could be revisited, or we can just handle that by
>> retrying the whole connection attempt.
>> 
>> - I changed the test for whether to send NegotiateProtocolVersion to send it
>> only when the client requests a version too new for the server.  I think 
>> that if
>> the client requests a version older than what the server could support, the
>> server should just silently use the older version.  That's arguable.  You 
>> could
>> argue that the message should be sent anyway (at least to post-3.0 clients) 
>> as

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
I wrote:
> Hmm.  The index offnum being complained of is one past the end of the
> lp array.  I think I see what about that commit changed the behavior:
> the old code for PageIndexDeleteNoCompact never changed the length
> of the lp array, except in the corner case where the page is becoming
> completely empty.  The new code will shorten the lp array (decrease
> phdr->pd_lower) if it's told to remove the last item.  So if you make
> two successive calls specifying the same offnum, and it's the last one
> on the page, the second one will fail with the symptoms we see here.
> However, so far as I can see, a sequence like that would have failed
> before too, just with a different error message, because once the
> first call had marked the item unused, the second call would not see
> it as a candidate to match.  So I'm not quite sure how that's related
> ... but it seems like it must be.

I'm still confused about why it didn't fail before, but after adding
some additional code to log each call of PageIndexTupleDeleteNoCompact,
I think I've got a smoking gun:

2017-10-30 18:18:44.321 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel brin_test_c_idx page 2
2017-10-30 18:18:44.321 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:44.393 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel brin_test_d_idx page 2
2017-10-30 18:18:44.393 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:53.428 EDT [10932] LOG:  deleting tuple 186 (of 186) in rel brin_test_e_idx page 3
2017-10-30 18:18:53.428 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.794 EDT [10938] LOG:  deleting tuple 186 (of 186) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.794 EDT [10938] STATEMENT:  insert into brin_test select
 mod(i,10)/25,
 mod(i,10)/25,
 mod(i,10)/25,
 mod(i,10)/25,
 md5((mod(i,10)/25)::text)::uuid
from generate_series(1,10) s(i)
2017-10-30 18:19:13.795 EDT [10932] LOG:  deleting tuple 186 (of 185) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.795 EDT [10932] PANIC:  invalid index offnum: 186
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test

So what happened here is that the inserting process decided to
summarize concurrently with the VACUUM process, and the inserting
process deleted (or maybe just updated/moved?) the placeholder tuple
that VACUUM was expecting to delete, and then we get the PANIC because
the tuple we're expecting to delete is already gone.

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Fixing this without creating new problems is beyond my familiarity
with the BRIN code.  It looks like it might be nontrivial :-(

regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..04ad804 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 243,249 
  		if (extended)
  			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
  
! 		PageIndexTupleDeleteNoCompact(oldpage, oldoff);
  		newoff = PageAddItem(newpage, (Item) newtup, newsz,
  			 InvalidOffsetNumber, false, false);
  		if (newoff == InvalidOffsetNumber)
--- 243,249 
  		if (extended)
  			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
  
! 		PageIndexTupleDeleteNoCompact(idxrel, oldbuf, oldpage, oldoff);
  		newoff = PageAddItem(newpage, (Item) newtup, newsz,
  			 InvalidOffsetNumber, false, false);
  		if (newoff == InvalidOffsetNumber)
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4d5dad3 100644
*** a/src/backend/access/brin/brin_revmap.c
--- b/src/backend/access/brin/brin_revmap.c
*** brinRevmapDesummarizeRange(Relation idxr
*** 409,415 
  	ItemPointerSetInvalid();
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  			invalidIptr);
! 	PageIndexTupleDeleteNoCompact(regPg, regOffset);
  	/* XXX record free space in FSM? */
  
  	MarkBufferDirty(regBuf);
--- 409,415 
  	ItemPointerSetInvalid();
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  			invalidIptr);
! 	PageIndexTupleDeleteNoCompact(idxrel, regBuf, regPg, regOffset);
  	/* XXX record free space in FSM? */
  
  	MarkBufferDirty(regBuf);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54..c8cc9f8 100644
*** a/src/backend/access/brin/brin_xlog.c
--- b/src/backend/access/brin/brin_xlog.c
*** brin_xlog_update(XLogReaderState *record
*** 150,156 
  
  		offnum 

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Tomas Vondra  writes:
> On 10/30/2017 09:03 PM, Tom Lane wrote:
>> [ scratches head ... ]  Not sure how that could've introduced any
>> problems?  Will look.

> Not sure, but I can confirm Michael's findings - I've been unable to
> reproduce the issue on 1a4be103a5 even after 20 minutes, and on
> 24992c6db9 it failed after only 2.

Hmm.  The index offnum being complained of is one past the end of the
lp array.  I think I see what about that commit changed the behavior:
the old code for PageIndexDeleteNoCompact never changed the length
of the lp array, except in the corner case where the page is becoming
completely empty.  The new code will shorten the lp array (decrease
phdr->pd_lower) if it's told to remove the last item.  So if you make
two successive calls specifying the same offnum, and it's the last one
on the page, the second one will fail with the symptoms we see here.
However, so far as I can see, a sequence like that would have failed
before too, just with a different error message, because once the
first call had marked the item unused, the second call would not see
it as a candidate to match.  So I'm not quite sure how that's related
... but it seems like it must be.

Anyway, my opinion ATM is that PageIndexTupleDeleteNoCompact is fine,
and what we're looking at is some kind of bug in summarize_range
or brin_doupdate, causing them to pass an offset beyond the end of
the page in some corner case.

regards, tom lane




Re: [HACKERS] proposal: schema variables

2017-10-30 Thread srielau
Pavel,

I wouldn't put in the DROP option.
Or at least not in that form of syntax.

By convention CREATE persists DDL and makes object definitions visible
across sessions.
DECLARE defines session private objects which cannot collide with other
sessions.

If you want variables with a short lifetime that get dropped at the end of
the transaction, that by definition would imply a session-private object. So
it ought to be DECLARE'd.

As far as I can see PG has been following this practice so far.

Cheers
Serge Rielau
Salesforce.com







[HACKERS] Adding column_constraint description in ALTER TABLE synopsis

2017-10-30 Thread Lætitia Avrot
Hello,

As Amit Langote pointed out, the column_constraint definition is missing
whereas it is used in the ALTER TABLE synopsis. It can be easily found in the
CREATE TABLE synopsis, but it's not very user friendly.

I simply copied/pasted the column_constraint definition from the CREATE
TABLE synopsis to the ALTER TABLE synopsis. I also had to change the first
word "where" to "and" as it's not the first definition in that synopsis.

I chose to add it above table_constraint as column_constraint is used
before table_constraint.

The patch should apply to MASTER (or I messed up with git). I built and
tested it successfully on my laptop.

You will find enclosed my patch.

Regards,

Lætitia
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 41acda0..c02dc38 100644
*** a/doc/src/sgml/ref/alter_table.sgml
--- b/doc/src/sgml/ref/alter_table.sgml
***
*** 85,90  ALTER TABLE [ IF EXISTS ] name
--- 85,104 
  OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
  REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }
  
+ and column_constraint is:
+ 
+ [ CONSTRAINT constraint_name ]
+ { NOT NULL |
+   NULL |
+   CHECK ( expression ) [ NO INHERIT ] |
+   DEFAULT default_expr |
+   GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ] |
+   UNIQUE index_parameters |
+   PRIMARY KEY index_parameters |
+   REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
+ [ ON DELETE action ] [ ON UPDATE action ] }
+ [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+ 
  and table_constraint is:
  
  [ CONSTRAINT constraint_name ]



Re: [HACKERS] WIP: BRIN multi-range indexes

2017-10-30 Thread Tomas Vondra
Hi,

attached is a patch series that includes both the BRIN multi-range
minmax indexes discussed in this thread, and the BRIN bloom indexes
initially posted in [1].

It seems easier to just deal with a single patch series, although we may
end up adding just one of those proposed opclasses.

There are 4 parts:

0001 - Modifies bringetbitmap() to pass all scan keys to the consistent
function at once. This is actually needed by the multi-minmax indexes,
but not really required for the others.

I'm wondering if this is a safe change, considering it affects the BRIN
interface. I.e. custom BRIN opclasses (perhaps in extensions) will be
broken by this change. Maybe we should extend the BRIN API to support
two versions of the consistent function - one that processes scan keys
one by one, and the other one that processes all of them at once.
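
To make the interface question concrete, here is a hedged sketch of what a
consistent function receiving the whole scan-key array at once might look
like; the argument positions and the helper are assumptions for illustration,
not the actual 0001 patch:

#include "postgres.h"
#include "access/brin_internal.h"
#include "access/brin_tuple.h"
#include "access/skey.h"
#include "fmgr.h"

/* Stub for illustration: a real opclass would compare key->sk_argument
 * against the summary values stored for the range in 'column'. */
static bool
range_matches_key(BrinDesc *bdesc, BrinValues *column, ScanKey key)
{
    return true;
}

PG_FUNCTION_INFO_V1(brin_multi_consistent_sketch);

/*
 * Sketch of a consistent function under the modified API: instead of being
 * called once per scan key, it receives the whole array, so the opclass can
 * consider all conditions together.
 */
Datum
brin_multi_consistent_sketch(PG_FUNCTION_ARGS)
{
    BrinDesc   *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
    BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
    ScanKey    *keys = (ScanKey *) PG_GETARG_POINTER(2);
    int         nkeys = PG_GETARG_INT32(3);
    bool        matches = true;
    int         i;

    for (i = 0; matches && i < nkeys; i++)
        matches = range_matches_key(bdesc, column, keys[i]);

    PG_RETURN_BOOL(matches);
}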

0002 - Adds BRIN bloom indexes, along with opclasses for all built-in
data types (or at least those that also have regular BRIN opclasses).

0003 - Adds BRIN multi-minmax indexes, but only with float8 opclasses
(which also includes timestamp etc.). That should be good enough for
now, but adding support for other data types will require adding some
sort of "distance" operator which is needed for merging ranges (to pick
the two "closest" ones). For float8 it's simply a subtraction.

0004 - Moves dealing with IS [NOT] NULL search keys from opclasses to
bringetbitmap(). The code was exactly the same in all opclasses, so
moving it to bringetbitmap() seems right. It also allows some nice
optimizations where we can skip the consistent() function entirely,
although maybe that's useless. Also, maybe there are opclasses that
actually need to deal with the NULL values in the consistent() function?


regards


[1]
https://www.postgresql.org/message-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz
Description: application/gzip


0002-BRIN-bloom-indexes.patch.gz
Description: application/gzip


0003-BRIN-multi-range-minmax-indexes.patch.gz
Description: application/gzip


0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz
Description: application/gzip



Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tomas Vondra


On 10/30/2017 09:03 PM, Tom Lane wrote:
> Michael Paquier  writes:
>> b1328d78f is close enough, but per what I see that's actually not
>> true. I have been able to reproduce the problem, which shows up within
>> a window of 2-4 minutes. Still sometimes it can take way longer, and I
>> spotted one test where it took 15 minutes to show up... So, bisecting
>> with a test that looks for core files for 20 minutes, I am seeing that
>> the following commit is actually at fault:
> 
>> commit 24992c6db9fd40f342db1f22747ec9e56483796d
>> Author: Tom Lane 
>> Date:   Fri Sep 9 19:00:59 2016 -0400
> 
>> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.
> 
> [ scratches head ... ]  Not sure how that could've introduced any
> problems?  Will look.
> 

Not sure, but I can confirm Michael's findings - I've been unable to
reproduce the issue on 1a4be103a5 even after 20 minutes, and on
24992c6db9 it failed after only 2.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Remove inbound links to sql-createuser

2017-10-30 Thread David G. Johnston
Since CREATE USER is officially an alias for CREATE ROLE, other parts of the
documentation should point to CREATE ROLE, not CREATE USER.  Most do, but I
noticed when looking at CREATE DATABASE that it did not.  Further searching
turned up the usage in client-auth.sgml.  That one is questionable since we
are indeed talking about LOGIN roles there, but we are already pointing to
sql-alterrole instead of sql-alteruser, so changing the create variation
to point to sql-createrole seems warranted.

Attached, and below.

David J.

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 722f3da813..99921ba079 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -998,9 +998,9 @@ omicron bryanh  guest1
 separate from operating system user passwords. The password for
 each database user is stored in the pg_authid system
 catalog. Passwords can be managed with the SQL commands
-<xref linkend="sql-createuser"> and
+<xref linkend="sql-createrole"> and
 <xref linkend="sql-alterrole">,
-e.g., CREATE USER foo WITH PASSWORD 'secret',
+e.g., CREATE ROLE foo WITH LOGIN PASSWORD 'secret',
 or the psql
 command \password.
 If no password has been set up for a user, the stored password
diff --git a/doc/src/sgml/ref/create_database.sgml
b/doc/src/sgml/ref/create_database.sgml
index 3e35c776ea..f63f1f92ac 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -45,7 +45,7 @@ CREATE DATABASE name
   
To create a database, you must be a superuser or have the special
CREATEDB privilege.
-   See <xref linkend="sql-createuser">.
+   See <xref linkend="sql-createrole">.
   

   


cleanup-createuser-links.patch
Description: Binary data



Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Michael Paquier  writes:
> b1328d78f is close enough, but per what I see that's actually not
> true. I have been able to reproduce the problem, which shows up within
> a window of 2-4 minutes. Still sometimes it can take way longer, and I
> spotted one test where it took 15 minutes to show up... So, bisecting
> with a test that looks for core files for 20 minutes, I am seeing that
> the following commit is actually at fault:

> commit 24992c6db9fd40f342db1f22747ec9e56483796d
> Author: Tom Lane 
> Date:   Fri Sep 9 19:00:59 2016 -0400

> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

[ scratches head ... ]  Not sure how that could've introduced any
problems?  Will look.

regards, tom lane




[HACKERS] Rewriting PL/Python's typeio code

2017-10-30 Thread Tom Lane
I started to work on teaching PL/Python about domains over composite,
and soon found that it was a can of worms.  Aside from the difficulty
of shoehorning that in with a minimal patch, there were pre-existing
problems.  I found that it didn't do arrays of domains right either
(ok, that's an oversight in my recent commit c12d570fa), and there
are assorted bugs that have been there much longer.  For instance, if
you return a composite type containing a domain, it fails to enforce
domain constraints on the type's field.  Also, if a transform function
is in use, it missed enforcing domain constraints on the result.
And in many places it missed checking domain constraints on null values,
because the plpy_typeio code simply wasn't called for Py_None.

Plus the code was really messy and duplicative, e.g. domain_check was
called in three different places ... which wasn't enough.  It also did
a lot of repetitive catalog lookups.

So, I ended up rewriting/refactoring pretty heavily.  The new idea
is to solve these problems by making heavier use of recursion between
plpy_typeio's conversion functions, and in particular to treat domains
as if they were containers.  So now there's exactly one place to call
domain_check, in a conversion function that has first recursed to do
conversion of the base type.  Nulls are treated more honestly, and
the SQL-to-Python functions are more careful about not leaking memory.
Also, I solved some of the repetitive catalog lookup problems by
making the code rely as much as possible on the typcache (which I think
didn't exist when this code originated).  I added a couple of small
features to typcache to help with that.
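
To make the recursion structure concrete, here is a minimal sketch of the
pattern described above; all names are invented for illustration and do not
match the actual plpy_typeio code:

#include "postgres.h"
#include "utils/builtins.h"

/*
 * Sketch only: treat a domain as a "container".  Conversion first recurses
 * to produce a value of the base type, then enforces the domain constraints
 * in exactly one place -- including for null values, which must not bypass
 * the check.
 */
typedef struct DomainConv
{
    Oid         domain_typoid;  /* OID of the domain type */
    struct DomainConv *base;    /* converter for the base type */
    void       *domain_info;    /* cache space used by domain_check() */
    MemoryContext mcxt;         /* context in which domain_info lives */
} DomainConv;

/* Stub standing in for the generic Python-to-Datum recursion. */
static Datum
convert_base_type(DomainConv *base, void *pyobj, bool *isnull)
{
    (void) base;
    (void) pyobj;
    return (Datum) 0;           /* a real converter recurses per base type */
}

static Datum
convert_domain(DomainConv *conv, void *pyobj, bool *isnull)
{
    Datum       value = convert_base_type(conv->base, pyobj, isnull);

    domain_check(value, *isnull, conv->domain_typoid,
                 &conv->domain_info, conv->mcxt);
    return value;
}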

This is a fairly large amount of code churn, and it could stand testing
by someone who's more Python-savvy than I am.  So I'll stick it into
the upcoming commitfest as a separate item.

regards, tom lane

diff --git a/contrib/hstore_plpython/expected/hstore_plpython.out b/contrib/hstore_plpython/expected/hstore_plpython.out
index df49cd5..1ab5fee 100644
*** a/contrib/hstore_plpython/expected/hstore_plpython.out
--- b/contrib/hstore_plpython/expected/hstore_plpython.out
*** AS $$
*** 68,79 
  val = [{'a': 1, 'b': 'boo', 'c': None}, {'d': 2}]
  return val
  $$;
!  SELECT test2arr();
 test2arr   
  --
   {"\"a\"=>\"1\", \"b\"=>\"boo\", \"c\"=>NULL","\"d\"=>\"2\""}
  (1 row)
  
  -- test as part of prepare/execute
  CREATE FUNCTION test3() RETURNS void
  LANGUAGE plpythonu
--- 68,97 
  val = [{'a': 1, 'b': 'boo', 'c': None}, {'d': 2}]
  return val
  $$;
! SELECT test2arr();
 test2arr   
  --
   {"\"a\"=>\"1\", \"b\"=>\"boo\", \"c\"=>NULL","\"d\"=>\"2\""}
  (1 row)
  
+ -- test python -> domain over hstore
+ CREATE DOMAIN hstore_foo AS hstore CHECK(VALUE ? 'foo');
+ CREATE FUNCTION test2dom(fn text) RETURNS hstore_foo
+ LANGUAGE plpythonu
+ TRANSFORM FOR TYPE hstore
+ AS $$
+ return {'a': 1, fn: 'boo', 'c': None}
+ $$;
+ SELECT test2dom('foo');
+  test2dom  
+ ---
+  "a"=>"1", "c"=>NULL, "foo"=>"boo"
+ (1 row)
+ 
+ SELECT test2dom('bar');  -- fail
+ ERROR:  value for domain hstore_foo violates check constraint "hstore_foo_check"
+ CONTEXT:  while creating return value
+ PL/Python function "test2dom"
  -- test as part of prepare/execute
  CREATE FUNCTION test3() RETURNS void
  LANGUAGE plpythonu
diff --git a/contrib/hstore_plpython/sql/hstore_plpython.sql b/contrib/hstore_plpython/sql/hstore_plpython.sql
index 911bbd6..2c54ee6 100644
*** a/contrib/hstore_plpython/sql/hstore_plpython.sql
--- b/contrib/hstore_plpython/sql/hstore_plpython.sql
*** val = [{'a': 1, 'b': 'boo', 'c': None}, 
*** 60,66 
  return val
  $$;
  
!  SELECT test2arr();
  
  
  -- test as part of prepare/execute
--- 60,80 
  return val
  $$;
  
! SELECT test2arr();
! 
! 
! -- test python -> domain over hstore
! CREATE DOMAIN hstore_foo AS hstore CHECK(VALUE ? 'foo');
! 
! CREATE FUNCTION test2dom(fn text) RETURNS hstore_foo
! LANGUAGE plpythonu
! TRANSFORM FOR TYPE hstore
! AS $$
! return {'a': 1, fn: 'boo', 'c': None}
! $$;
! 
! SELECT test2dom('foo');
! SELECT test2dom('bar');  -- fail
  
  
  -- test as part of prepare/execute
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 7aadc5d..f6450c4 100644
*** a/src/backend/utils/cache/typcache.c
--- b/src/backend/utils/cache/typcache.c
*** lookup_type_cache(Oid type_id, int flags
*** 377,382 
--- 377,383 
  		typentry->typstorage = typtup->typstorage;
  		typentry->typtype = typtup->typtype;
  		typentry->typrelid = typtup->typrelid;
+ 		typentry->typelem = typtup->typelem;
  
  		/* If it's a domain, immediately thread it into the domain cache list */
  		if 

Re: [HACKERS] SQL/JSON in PostgreSQL

2017-10-30 Thread Nikita Glukhov

Hi, hackers!

I have a question about transformation of JSON constructors into executor nodes.

In the first letter in this thread we wrote:
   JSON_OBJECT(), JSON_ARRAY() constructors and IS JSON predicate are
   transformed into raw function calls.

Here is an example explaining what it means:

=# CREATE VIEW json_object_view AS
SELECT JSON_OBJECT('foo': 1, 'bar': '[1,2]' FORMAT JSON RETURNING text);
CREATE VIEW
=# \sv json_object_view
CREATE OR REPLACE VIEW public.json_object_view AS
 SELECT json_build_object_ext(false, false, 'foo', 1, 'bar', '[1,2]'::text::json)::text

As you can see, JSON_OBJECT() was transformed into a call of the new function
json_build_object_ext(), which shares code with the existing json_build_object()
but differs from it only in two additional boolean parameters representing
the {WITH|WITHOUT} UNIQUE [KEYS] and {NULL|ABSENT} ON NULL
clauses.  Information about the FORMAT and RETURNING clauses was lost, since they
were transformed into casts.

Other constructors are transformed similarly:
JSON_ARRAY() => json[b]_build_array_ext(boolean, VARIADIC any)
JSON_OBJECTAGG() => json[b]_objectagg(any, any, boolean, boolean)
JSON_ARRAYAGG()  => json[b]_agg[_strict](any)

Also there is a variant of JSON_ARRAY() with a subquery, which is transformed into a
subselect with json[b]_agg():
=# CREATE VIEW json_array_view AS SELECT JSON_ARRAY(SELECT generate_series(1,3));
CREATE VIEW
=# \sv json_array_view
CREATE OR REPLACE VIEW public.json_array_view AS
 SELECT ( SELECT json_agg_strict(q.a)
   FROM ( SELECT generate_series(1, 3) AS generate_series) q(a))



And here is my question: is it acceptable to do such transformations?
And if it is not acceptable (it seemed unacceptable to us from the beginning,
but we did not have time for a correct implementation), what should JSON
constructor nodes look like?


The simplest solution that I can propose is to save the transformed
expressions in the existing JsonObjectCtor/JsonArrayCtor nodes, which exist
now only in untransformed trees.  The whole untransformed JsonXxxCtor node
will be used for displaying, and the transformed expression for execution only.

But it will not work for aggregates, because they are transformed into an
Aggref/WindowFunc node.  Information needed for correct display should be
saved somewhere in these standard nodes.

And for the subquery variant of JSON_ARRAY I can only offer to keep the transformation
into a subselect with JSON_ARRAYAGG():
JSON_ARRAY(query) => (SELECT JSON_ARRAYAGG(bar) FROM (query) foo(bar))

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 9:42 AM, Alvaro Herrera  wrote:
> Tomas Vondra wrote:
>> FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
>> it seems to be due to something that changed in the last release.
>
> I've been trying to reproduce it for half an hour with no success (I
> think my laptop is just too slow). I bet this is related to the
> addition of PageIndexTupleOverwrite (commit b1328d78f) though I find it
> more likely that this was not *caused* by that commit but rather just
> made it easier to hit.

b1328d78f is close enough, but per what I see that's actually not
true. I have been able to reproduce the problem, which shows up within
a window of 2-4 minutes. Still sometimes it can take way longer, and I
spotted one test where it took 15 minutes to show up... So, bisecting
with a test that looks for core files for 20 minutes, I am seeing that
the following commit is actually at fault:
commit 24992c6db9fd40f342db1f22747ec9e56483796d
Author: Tom Lane 
Date:   Fri Sep 9 19:00:59 2016 -0400

Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

The full generality of deleting an arbitrary number of tuples is no longer
needed, so let's save some code and cycles by replacing the original coding
with an implementation based on PageIndexTupleDelete.
-- 
Michael




Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
> > Nothing I am proposing blocks later work.
> 
> That's not really true.  Nobody's going to be happy if MERGE has one
> behavior in one set of cases and an astonishingly different behavior
> in another set of cases.  If you adopt a behavior for certain cases
> that can't be extended to other cases, then you're blocking a
> general-purpose MERGE.
> 
> And, indeed, it seems that you're proposing an implementation that
> adds no new functionality, just syntax compatibility.  Do we really
> want or need two syntaxes  for the same thing in core?  I kinda think
> Peter might have the right idea here.  Under his proposal, we'd be
> getting something that is, in a way, new.

+1.

I don't think MERGE should be radically different from other database
systems, nor just syntax sugar over a capability we already have.  The
downthread comparison to partitioning isn't accurate either.

There's a reason that we have INSERT .. ON CONFLICT and not MERGE and
it's because they aren't the same thing, as Peter's already explained,
both now and when he and I had exactly this same discussion years ago
when he was working on implementing INSERT .. ON CONFLICT.  Time changes
many things, but I don't think anything's changed in this from the prior
discussions about it.

Thanks!

Stephen




Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Nico Williams
On Mon, Oct 30, 2017 at 10:59:43AM -0700, Peter Geoghegan wrote:
> On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggs  wrote:
> > If a general purpose solution exists, please explain what it is.
> 
> For the umpteenth time, a general purpose solution is one that more or
> less works like an UPDATE FROM, with an outer join, whose ModifyTable
> node is capable of insert, update, or delete (and accepts quals for
> MATCHED and NOT matched cases, etc). You could still get duplicate
> violations due to concurrent activity in READ COMMITTED mode, but not
> at higher isolation levels thanks to Thomas Munro's work there. In
> this world, ON CONFLICT and MERGE are fairly distinct things.

FWIW, and as an outsider, having looked at MERGE docs from other
RDBMSes, I have to agree that the PG UPSERT (ON CONFLICT .. DO) and
MERGE are rather different beasts.

In particular, I suspect all UPSERT statements can be mapped onto
equivalent MERGE statements, but not all MERGE statements can be mapped
onto UPSERTs.

The reason is that UPSERT depends on UNIQUE constraints, whereas MERGE
uses a generic join condition that need not even refer to any INDEXes,
let alone UNIQUE ones.

Perhaps PG's UPSERT can be generalized to create a temporary UNIQUE
constraint on the columns specified in the conflict_target portion of the
statement, increasing the number of MERGE statements that could be
mapped onto UPSERT.  But even then, that would still be a UNIQUE
constraint, whereas MERGE does not even imply such a thing.

Now, a subset of MERGE (those using equijoins in the ON condition) can
be mapped onto UPSERT provided either a suitable UNIQUE index exists (or
that PG notionally creates a temporary UNIQUE constraint for the purpose
of evaluating the UPSERT).  This approach would NOT preclude a more
complete subsequent implementation of MERGE.  But I wonder if that's
worthwhile given that a proper and complete implementation of MERGE is
probably very desirable.

On a tangentially related note, I've long wanted to have an RDBMS-
independent SQL parser for the purpose of implementing external query-
rewriting (and external optimizers), syntax highlighting, and so on.
Having an external / plug-in method for rewriting unsupported SQL as a
way of bridging functionality gaps (like lack of MERGE support) would be
very nice.  PG does have a way to expose its AST...  It might be a good
idea to start by implementing unsupported SQL features in such a way
that they parse and can produce an AST along with a syntax/unsupported
error -- then one might rewrite the parsed AST, generate appropriate
SQL, and execute that.

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Peter Geoghegan
On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggs  wrote:
> Please explain in detail the MERGE SQL statements that you think will
> be problematic and why.

Your proposal is totally incomplete, so I can only surmise its
behavior in certain cases, to make a guess at what the problems might
be (see my remarks on EPQ, live lock, etc). This makes it impossible
to do what you ask right now.

Besides, you haven't answered the question from my last e-mail
("What's wrong with that [set of MERGE semantics]?"), so why should I
go to further trouble? You're just not constructively engaging with me
at this point. We're going around in circles.

-- 
Peter Geoghegan




Re: [HACKERS] Current int & float overflow checking is slow.

2017-10-30 Thread Dagfinn Ilmari Mannsåker
Robert Haas  writes:

>> 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
>>   it seems like an important test for the new facilities. Without
>>   0002, tests would fail after this; with it, all tests run
>>   successfully.
>
> I suggest that if we think we don't need -fwrapv any more, we ought to
> remove it.  Otherwise, we won't find out if we're wrong.

Without -fwrapv signed overflow is undefined behaviour.  We should test
thoroughly with -ftrapv or -fsanitize=signed-integer-overflow to be
confident the code is free of such things.  We might even want to enable
-ftrapv by default in cassert-enabled builds.

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen




Re: [HACKERS] Linking libpq statically to libssl

2017-10-30 Thread Stephen Frost
Daniele,

* Daniele Varrazzo (daniele.varra...@gmail.com) wrote:
> On Fri, Oct 27, 2017 at 2:37 PM, Tom Lane  wrote:
> > Daniele Varrazzo  writes:
> >> I have a problem building binary packages for psycopg2. Binary
> >> packages ship with their own copies of libpq and libssl; however if
> >> another python package links to libssl the library will be imported
> >> twice with conflicting symbols, likely resulting in a segfault (see
> >> https://github.com/psycopg/psycopg2/issues/543). This happens e.g. if
> >> a python script both connects to postgres and opens an https resource.
> >
> > Basically, you're doing it wrong.  Shipping your own copy of libssl,
> > rather than depending on whatever packaging the platform provides,
> > is just asking for pain --- and not only of this sort.  You're also
> > now on the hook to update your package whenever libssl fixes a bug
> > or a security vulnerability, which happens depressingly often.
> >
> > The same applies to libpq, really.  You don't want to be in the
> > business of shipping bits that you are not the originator of.
> >
> > When I worked at Red Hat, there was an ironclad policy against
> > building packages that incorporated other packages statically.
> > I would imagine that other distros have similar policies for
> > similar reasons.  Just because you *can* ignore those policies
> > doesn't mean you *should*.
> 
> Distros do compile the library from source and against the system
> package, and everyone using the package directly can still do so; the
> binary packages are only installed by the Python package manager.

Which, frankly, is why everyone having their own package manager that
ignores the OS package manager is actually rather horrible.  Obviously,
it's done extensively, but it's outright horrible.

> Psycopg is more complex to install than the average Python package (it
> needs python and postgres dev files, pg_config available somewhere
> etc) and a no-dependencies package provides a much smoother
> experience. It also happens that the libpq and libssl versions used
> tend to be more up-to-date than the system one (people can already use
> the new libpq 10 features without waiting for debian packages).

Having randomly different versions of libraries installed through two
different package managers on the same system is not an improvement for
users.  Further, it's not like there aren't already properly built PG10
packages- they were there from the day PG10 was released, and users can
install them using the OS package manager and against the OS provided
version of all the various libraries.

> I am confronted with the reality of Python developers as of 201x's,
> and shipping binary packages has proven generally a positive feature,
> even factoring in the need of shipping updated binary packages when
> the need arises. The benefit of a simple-to-use library is for the
> Postgres project at large, it is not for my personal gain. So while I
> know the shortcomings of binary packages and static libraries, I would
> still be interested in knowing the best way to fix the problem in the
> subject. Feel free to write in private if you want to avoid the public
> shaming.

The way to fix the problem is to not use two different versions of the
same library in the same binary, which you aren't going to be able to
accomplish when you're forcibly pulling in a different version through
your own binary package.  At best you could try to use symbol versioning
to try and differentiate your symbols from those of the system one, but
that would depend on if the system level library is doing symbol
versioning or not and then you'd still have to figure out what to do
when the OS level package updates and possibly ends up with symbols
*and* versions conflicting.  Of course, these issues are generally (or
should be) handled already by the OS level library package managers and
it's no simple thing to do, but it's how things like multiple libdb
versions can be available and even linked into the same running binaries
and things still work there.  Basically, if you can't control both
versions you're just setting yourself (and our users) up for failure, if
not now then in the future.

In short, figure out how to have a completely different OS that's only
provided through your package manager, or get along with the packages
and versions as provided through the OS system (or provide your own
updated versions of the OS packages and get them installed in a way that
matches what your packages are built against).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Simon Riggs
On 30 October 2017 at 18:59, Peter Geoghegan  wrote:

> It is most emphatically *not* just a "small matter of programming".

Please explain in detail the MERGE SQL statements that you think will
be problematic and why.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Peter Geoghegan
On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggs  wrote:
>> That's not really true.  Nobody's going to be happy if MERGE has one
>> behavior in one set of cases and an astonishingly different behavior
>> in another set of cases.  If you adopt a behavior for certain cases
>> that can't be extended to other cases, then you're blocking a
>> general-purpose MERGE.
>
> If a general purpose solution exists, please explain what it is.

For the umpteenth time, a general purpose solution is one that more or
less works like an UPDATE FROM, with an outer join, whose ModifyTable
node is capable of insert, update, or delete (and accepts quals for
MATCHED and NOT matched cases, etc). You could still get duplicate
violations due to concurrent activity in READ COMMITTED mode, but not
at higher isolation levels thanks to Thomas Munro's work there. In
this world, ON CONFLICT and MERGE are fairly distinct things.

What's wrong with that? You haven't actually told us why you don't like that.

> The problem is that nobody has ever done so, so it's not like we are
> discussing the difference between bad solution X and good solution Y;
> we are comparing reasonable solution X with non-existent solution Y.

Nobody knows what your proposal would be like when time came to remove
the restrictions that you suggest could be removed later. You're the
one with the information here. We need to know what those semantics
would be up-front, since you're effectively committing us down that
path.

You keep making vague appeals to pragmatism, but, in all sincerity, I
don't understand where you're coming from at all. I strongly believe
that generalizing from ON CONFLICT doesn't even make the
implementation easier in the short term. ISTM that you're making this
difficult for yourself for reasons that are known only to you.

>> And, indeed, it seems that you're proposing an implementation that
>> adds no new functionality, just syntax compatibility.  Do we really
>> want or need two syntaxes  for the same thing in core?  I kinda think
>> Peter might have the right idea here.  Under his proposal, we'd be
>> getting something that is, in a way, new.
>
> Partitioning looked like "just new syntax", but it has been a useful
> new feature.

False equivalency. Nobody, including you, ever argued that that work
risked painting us into a corner. (IIRC you said something like the
progress was too small to justify putting into a single release.)

> MERGE provides new capabilities that we do not have and is much more
> powerful than INSERT/UPDATE, in a syntax that follow what other
> databases use today. Just like partitioning.

But you haven't told us *how* it is more powerful. Again, the
semantics of a MERGE that is a generalization of ON CONFLICT are not
at all obvious, and seem like they might be very surprising and risky.
There is no question that it's your job to (at a minimum) define those
semantics ahead of time, since you're going to commit us to them in
the long term if you continue down this path. It is most emphatically
*not* just a "small matter of programming".

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Simon Riggs
On 30 October 2017 at 15:22, Simon Riggs  wrote:

>> You forgot to update this formula in xlog.c:
>> distance = (2.0 + CheckPointCompletionTarget) * 
>> CheckPointDistanceEstimate;
>> /* add 10% for good measure. */
>> distance *= 1.10;
>> You can guess easily where the mistake is.
>
> Doh - fixed that before posting, so I must have sent the wrong patch.
>
> It's the 10%, right? ;-)

So, there are 2 locations that mention 2.0 in xlog.c

I had already fixed one, which is why I remembered editing it.

The other one you mention has a detailed comment above it explaining
why it should be 2.0 rather than 1.0, so I left it.

Let me know if you still think it should be removed? I can see the
argument both ways.

Or maybe we need another patch to account for manual checkpoints.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Current int & float overflow checking is slow.

2017-10-30 Thread Andres Freund
Hi,

On 2017-10-30 22:29:42 +0530, Robert Haas wrote:
> On Mon, Oct 30, 2017 at 4:57 PM, Andres Freund  wrote:
> > 0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
> >   These use compiler intrinsics on gcc/clang. If that's not
> >   available, they cast to a wider type and to overflow checks. For
> >   64bit there's a fallback for the case 128bit math is not
> >   available (here I stole from an old patch of Greg's).
> >
> >   These fallbacks are, as far as I can tell, C free of overflow
> >   related undefined behaviour.
> 
> Looks nice.

Thanks.


> >   Perhaps it should rather be pg_add_s32_overflow, or a similar
> >   naming scheme?
> 
> Not sure what the s is supposed to be?  Signed?

Yes, signed. So we could add a u32 or something complementing the
functions already in the patch. Even though overflow checks are a heck
of a lot easier to write for unsigned ints, the intrinsics are still
faster.  I don't have any sort of strong feelings on the naming.
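
For illustration, a minimal sketch of the shape of such a helper (not the
actual patch; it assumes __builtin_add_overflow, available in gcc 5+ and
recent clang, with a cast-to-int64 fallback otherwise):

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: add two int32 values, store the result in *result, and
 * return true iff the mathematical sum does not fit in an int32.
 */
static inline bool
pg_add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
#if defined(__GNUC__) || defined(__clang__)
	/* compiles to an add plus an overflow-flag test on most targets */
	return __builtin_add_overflow(a, b, result);
#else
	/* portable fallback: do the math in 64 bit, then range-check */
	int64_t		sum = (int64_t) a + (int64_t) b;

	if (sum > INT32_MAX || sum < INT32_MIN)
	{
		*result = 0;			/* caller must check the return value */
		return true;
	}
	*result = (int32_t) sum;
	return false;
#endif
}

A real version would gate the intrinsic on a configure check rather than on
__GNUC__ alone (older gcc lacks the builtin), and would provide 16 and 64 bit
variants along the same lines.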


> > 0002) Converts int.c, int8.c and a smattering of other functions to use
> >   the new facilities. This removes a fair amount of code.
> >
> >   It might make sense to split this up further, but right now that's
> >   the set of functions that either are affected performancewise by
> >   previous overflow checks, and/or relied on wraparound
> >   overflow. There's probably more places, but this is what I found
> >   by visual inspection and compiler warnings.
> 
> I lack the patience to review this tonight.

Understandable ;)


> > 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
> >   it seems like an important test for the new facilities. Without
> >   0002, tests would fail after this; after it, all tests run
> >   successfully.
> 
> I suggest that if we think we don't need -fwrapv any more, we ought to
> remove it.  Otherwise, we won't find out if we're wrong.

I agree that we should do so at some point not too far away in the
future. Not the least because we don't specify this kind of C dialect in
a lot of other compilers. Additionally the flag causes some slowdown
(because e.g. for loop variables are optimized less). But I'm fairly
certain it needs a bit more care that I've invested as of now - should
probably at least compile with -Wstrict-overflow=some-higher-level, and
with ubsan. I'm fairly certain there's more bogus overflow checks
around...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Comment typo in get_collation_name() comment

2017-10-30 Thread Dagfinn Ilmari Mannsåker
Hi Hackers,

The comment for get_collation_name() seems to have been copy-pasted from
get_constraint_name(), but missed one s/constraint/collation/.

Patch attached.

-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 48961e31aa..acbd9ac63e 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -958,7 +958,7 @@ get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
  * get_collation_name
  *		Returns the name of a given pg_collation entry.
  *
- * Returns a palloc'd copy of the string, or NULL if no such constraint.
+ * Returns a palloc'd copy of the string, or NULL if no such collation.
  *
  * NOTE: since collation name is not unique, be wary of code that uses this
  * for anything except preparing error messages.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-30 Thread Ivan Kartyshov

Ants Aasma wrote on 2017-10-26 17:29:

On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
 wrote:

Ants Aasma wrote on 2017-09-26 13:00:


Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to be
a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.



I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.


Currently transactions on the master become visible when xid is
removed from proc array. This depends on the order of acquiring
ProcArrayLock, which can happen in a different order from inserting
the commit record to wal. Whereas on the standby the transactions will
become visible in the same order that commit records appear in wal.
The difference can be quite large when transactions are using
different values for synchronous_commit. Fixing this is not easy, but
we may want to do it someday. IIRC CSN threads contained more
discussion on this topic. If we do fix it, it seems likely that what
we need to wait on is not LSN, but some other kind of value. If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

Regards,
Ants Aasma


It sounds reasonable. I can offer the following version.

WAIT  LSN  lsn_number;
WAIT  LSN  lsn_number  TIMEOUT delay;
WAIT  LSN  lsn_number  INFINITE;
WAIT  LSN  lsn_number  NOWAIT;


WAIT  [token]  wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Current int & float overflow checking is slow.

2017-10-30 Thread Robert Haas
On Mon, Oct 30, 2017 at 4:57 PM, Andres Freund  wrote:
> 0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
>   These use compiler intrinsics on gcc/clang. If that's not
>   available, they cast to a wider type and to overflow checks. For
>   64bit there's a fallback for the case 128bit math is not
>   available (here I stole from an old patch of Greg's).
>
>   These fallbacks are, as far as I can tell, C free of overflow
>   related undefined behaviour.

Looks nice.

>   Perhaps it should rather be pg_add_s32_overflow, or a similar
>   naming scheme?

Not sure what the s is supposed to be?  Signed?

> 0002) Converts int.c, int8.c and a smattering of other functions to use
>   the new facilities. This removes a fair amount of code.
>
>   It might make sense to split this up further, but right now that's
>   the set of functions that either are affected performancewise by
>   previous overflow checks, and/or relied on wraparound
>   overflow. There's probably more places, but this is what I found
>   by visual inspection and compiler warnings.

I lack the patience to review this tonight.

> 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
>   it seems like an important test for the new facilities. Without
>   0002, tests would fail after this; after it, all tests run
>   successfully.

I suggest that if we think we don't need -fwrapv any more, we ought to
remove it.  Otherwise, we won't find out if we're wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-10-30 Thread Dilip Kumar
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote
 wrote:
> On 2017/10/30 14:55, Amit Langote wrote:
>> Fixed in the attached updated patch, along with a new test in 0001 to
>> cover this case.  Also, made a few tweaks to 0003 and 0005 (moved some
>> code from the former to the latter) around the handling of 
>> ScalarArrayOpExprs.
>
> Sorry, I'd forgotten to include some changes.
>
> In the previous versions, RT index of the table needed to be passed to
> partition.c, which I realized is no longer needed, so I removed that
> requirement from the interface.  As a result, patches 0002 and 0003 have
> changed in this version.

Some Minor comments:

+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,

Function name in function header is not correct.

+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;

DatumGetBool will return true if the first byte of constvalue is
nonzero otherwise
false.  IIUC, this is not the intention here. Or I am missing something?

+ * clauses in this function ourselves, for example, having both a > 1 and
+ * a = 0 the list

a = 0 the list -> a = 0 in the list

+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;

I think for evaluating other expressions (e.g. T_Param) we will have
to pass ExprContext to this function. Or we can do something cleaner
because if we want to access the ExprContext inside
partkey_datum_from_expr then we may need to pass it to
"get_partitions_from_clauses" which is a common function for optimizer
and executor.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix duplicated "the" occurrences in codebase

2017-10-30 Thread Christoph Dreis
Hey,

please find a patch attached that fixes duplicated "the" occurrences in the
codebase.

As this is my first patch, please let me know in case I did something wrong.

Cheers,
Christoph


duplicate-the-typos.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Andres Freund
On 2017-10-30 10:10:19 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I was mostly just thinking out loud, listing another option rather
> > than advocating for it.
> 
> FWIW, I just wanted the question to be debated and resolved properly.
> 
> After rereading the thread Andres pointed to, I thought of a hazard
> that I think Andres alluded to, but didn't spell out explicitly:
> if we can't read the primary checkpoint, and then back up to a
> secondary one and replay as much of WAL as we can read, we may well
> be left with an inconsistent database.

Exactly.


> I'm content now that removing the secondary checkpoint is an OK
> decision.  (This isn't a review of Simon's patch, though.)

I wonder if we shouldn't add a pg_resetxlog option that sets the
checkpoint to start from to a certain LSN. For the few cases where
there's actual data recovery needed that's a lot more useful than
randomly using checkpoint - 1. And it's an explicit expert only thing,
without costing everyone.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Fabien COELHO


Hello Michaël,

I'm fine with having pow in pgbench.


I am adding as well Fabien in CC who worked in getting the internal
function infrastructure in the shape it is now (waaay better) with
commit 86c43f4.


It might be even better if https://commitfest.postgresql.org/15/985/, 
which has been around for over one year (!), gets through some day...


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-30 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > Before pushing, I'll give a look to the regular autovacuum path to see
> > if it needs a similar fix.
> 
> Reading that one, my conclusion is that it doesn't have the same problem
> because the strings are allocated in AutovacuumMemCxt which is not reset
> by error recovery.  This gave me the idea to use that context instead of
> TopTransactionContext to store the strings in workitem processing also.
> The patch I propose now is attached.

I ended up backpatching 335f3d04e4c8.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Jsonb transform for pl/python

2017-10-30 Thread David Fetter
On Mon, Oct 30, 2017 at 11:15:00AM +0300, Anthony Bykov wrote:
> On Sun, 29 Oct 2017 19:11:02 +0100
> David Fetter  wrote:
> 
> > Thanks for your hard work!
> > 
> > Should there also be one for PL/Python3U?
> > 
> > Best,
> > David.
> Hi.
> Actually, there is one for PL/Python3U. This patch contains following
> extensions:
> jsonb_plpythonu
> jsonb_plpython2u
> jsonb_plpython3u
> "make install" checks which python major version was your postgresql
> configured with and installs corresponding extension.

My mistake.  Sorry about the noise.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Simon Riggs
On 30 October 2017 at 15:31, Michael Paquier  wrote:
> On Mon, Oct 30, 2017 at 2:22 PM, Simon Riggs  wrote:
>> On 25 October 2017 at 00:17, Michael Paquier  
>> wrote:
>>> -* Delete old log files (those no longer needed even for previous
>>> -* checkpoint or the standbys in XLOG streaming).
>>> +* Delete old log files and recycle them
>>>  */
>>> Here that's more "Delete or recycle old log files". Recycling of a WAL
>>> segment is its renaming into a newer segment.
>>
>> Sometimes files are deleted if there are too many.
>
> Sure, but one segment is never deleted and then recycled, which is
> what your new comment implies as I understand it.

I guess it depends how you read it.

The function performs both deletion AND recycling

Yet one file is only ever deleted OR recycled


>>> -   checkPointLoc = ControlFile->prevCheckPoint;
>>> +   /*
>>> +* It appears to be a bug that we used to use
>>> prevCheckpoint here
>>> +*/
>>> +   checkPointLoc = ControlFile->checkPoint;
>>> Er, no. This is correct because we expect the prior checkpoint to
>>> still be present in the event of a failure detecting the latest
>>> checkpoint.
>>
>> I'm removing "prevCheckPoint", so not sure what you mean.
>
> I mean that there is no actual bug here. And changing the code as you
> do is correct, but the comment is not.

Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Tom Lane
Alexander Korotkov  writes:
> I think Connor struggles to implement just an operator class.  Advising him
> to implement an index access method is a good way to get him away of
> PostgreSQL hacking for a long time :)

Yeah.  To answer the question a bit more directly: there are not any
contrib modules that add SP-GiST opclasses, but there are some that add
GiST or GIN opclasses, so any one of those would serve as a model for the
basic mechanism of writing an extension.  Just replace the AM-specific
support functions for those AMs with the ones SP-GiST uses.  (You can crib
some code details from the in-core SP-GiST support functions.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 2:22 PM, Simon Riggs  wrote:
> On 25 October 2017 at 00:17, Michael Paquier  
> wrote:
>> -* Delete old log files (those no longer needed even for previous
>> -* checkpoint or the standbys in XLOG streaming).
>> +* Delete old log files and recycle them
>>  */
>> Here that's more "Delete or recycle old log files". Recycling of a WAL
>> segment is its renaming into a newer segment.
>
> Sometimes files are deleted if there are too many.

Sure, but one segment is never deleted and then recycled, which is
what your new comment implies as I understand it.

>> -   checkPointLoc = ControlFile->prevCheckPoint;
>> +   /*
>> +* It appears to be a bug that we used to use
>> prevCheckpoint here
>> +*/
>> +   checkPointLoc = ControlFile->checkPoint;
>> Er, no. This is correct because we expect the prior checkpoint to
>> still be present in the event of a failure detecting the latest
>> checkpoint.
>
> I'm removing "prevCheckPoint", so not sure what you mean.

I mean that there is no actual bug here. And changing the code as you
do is correct, but the comment is not.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Simon Riggs
On 25 October 2017 at 00:17, Michael Paquier  wrote:
> On Mon, Oct 23, 2017 at 11:20 PM, Simon Riggs  wrote:
>> Remove the code that maintained two checkpoint's WAL files and all
>> associated stuff.
>>
>> Try to avoid breaking anything else
>>
>> This
>> * reduces disk space requirements on master
>> * removes a minor bug in fast failover
>> * simplifies code
>
> In short, yeah! I had a close look at the patch and noticed a couple of 
> issues.

Thanks for the detailed review

> +else
>  /*
> - * The last valid checkpoint record required for a streaming
> - * recovery exists in neither standby nor the primary.
> + * We used to attempt to go back to a secondary checkpoint
> + * record here, but only when not in standby_mode. We now
> + * just fail if we can't read the last checkpoint because
> + * this allows us to simplify processing around checkpoints.
>   */
>  ereport(PANIC,
>  (errmsg("could not locate a valid checkpoint record")));
> -}
> Using brackets in this case is more elegant style IMO. OK, there is
> one line, but the comment is long so the block becomes
> confusingly-shaped.

OK

>  /* Version identifier for this pg_control format */
> -#define PG_CONTROL_VERSION1003
> +#define PG_CONTROL_VERSION1100
> Yes, this had to happen anyway in this release cycle.
>
> backup.sgml describes the following:
> to higher segment numbers.  It's assumed that segment files whose
> contents precede the checkpoint-before-last are no longer of
> interest and can be recycled.
> However this is not true anymore with this patch.

Thanks, will fix.

> The documentation of pg_control_checkpoint() is not updated. func.sgml
> lists the tuple fields returned for this function.

Ah, OK.

> You forgot to update pg_control_checkpoint in pg_proc.h. prior_lsn is
> still listed in the list of arguments returned. And you need to update
> as well the output list of types.
>
>   * whichChkpt identifies the checkpoint (merely for reporting purposes).
> - * 1 for "primary", 2 for "secondary", 0 for "other" (backup_label)
> + * 1 for "primary", 0 for "other" (backup_label)
> + * 2 for "secondary" is no longer supported
>   */
> I would suggest to just remove the reference to "secondary".

Yeh, that was there for review.

> -* Delete old log files (those no longer needed even for previous
> -* checkpoint or the standbys in XLOG streaming).
> +* Delete old log files and recycle them
>  */
> Here that's more "Delete or recycle old log files". Recycling of a WAL
> segment is its renaming into a newer segment.

Sometimes files are deleted if there are too many.

> You forgot to update this formula in xlog.c:
> distance = (2.0 + CheckPointCompletionTarget) * 
> CheckPointDistanceEstimate;
> /* add 10% for good measure. */
> distance *= 1.10;
> You can guess easily where the mistake is.

Doh - fixed that before posting, so I must have sent the wrong patch.

It's the 10%, right? ;-)

> -   checkPointLoc = ControlFile->prevCheckPoint;
> +   /*
> +* It appears to be a bug that we used to use
> prevCheckpoint here
> +*/
> +   checkPointLoc = ControlFile->checkPoint;
> Er, no. This is correct because we expect the prior checkpoint to
> still be present in the event of a failure detecting the latest
> checkpoint.

I'm removing "prevCheckPoint", so not sure what you mean.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Tom Lane
Robert Haas  writes:
> I was mostly just thinking out loud, listing another option rather
> than advocating for it.

FWIW, I just wanted the question to be debated and resolved properly.

After rereading the thread Andres pointed to, I thought of a hazard
that I think Andres alluded to, but didn't spell out explicitly:
if we can't read the primary checkpoint, and then back up to a
secondary one and replay as much of WAL as we can read, we may well
be left with an inconsistent database.  This would happen because
some changes that post-date the part of WAL we could read may have
been flushed to disk, if the system previously thought that WAL
up through the primary checkpoint was all safely down to disk.
Therefore, backing up to the secondary checkpoint is *not* a safe
automatic recovery choice, and it's dubious that it's even a useful
approach for manual recovery.  You're really into restore-from-
backup territory at that point.

I'm content now that removing the secondary checkpoint is an OK
decision.  (This isn't a review of Simon's patch, though.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-30 Thread Alexander Korotkov
On Sun, Oct 29, 2017 at 1:30 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> I've reviewed code of ~> operator and its KNN-GiST support.
> Unfortunately, it appears that it's broken in design...  The explanation is
> above.
>
> We have the following implementation of the ~> operator.
>
> if (coord <= DIM(cube))
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[coord - 1]);
>> else
>> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
>> cube->x[coord - 1 + DIM(cube)]));
>> }
>> else
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
>> else
>> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
>> cube->x[coord - 1 - DIM(cube)]));
>> }
>
>
> Thus, for a cube of N dimensions, coordinates [1; N] are lower bounds and
> [N + 1; 2*N] are upper bounds.  However, we might have indexed cubes of
> different dimensions.  For example, for a cube of 2 dimensions "cube ~> 3"
> selects the upper bound of the 1st dimension, while for a cube of 3
> dimensions "cube ~> 3" selects the lower bound of the 3rd dimension.
>
> Therefore, although the ~> operator was specially invented to be supported
> by KNN-GiST, it cannot serve that purpose this way.
>
> Regarding the particular case discovered by Tomas, I think the error is in
> the GiST support code.
>
> if (strategy == CubeKNNDistanceCoord)
>> {
>> int coord = PG_GETARG_INT32(1);
>> if (DIM(cube) == 0)
>> retval = 0.0;
>> else if (IS_POINT(cube))
>> retval = cube->x[(coord - 1) % DIM(cube)];
>> else
>> retval = Min(cube->x[(coord - 1) % DIM(cube)],
>> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
>> }
>
>
> g_cube_distance() always returns the lower bound of the cube.  It should
> return the upper bound for coord > DIM(cube).
>
> It would also be nice to provide descending ordering using KNN-GiST.  It
> would be especially effective for the upper bound.  Since KNN-GiST doesn't
> support explicit descending ordering, it might be useful to make the ~>
> operator return the negative of the coordinate when a negative argument is
> provided.  For instance, '(1,2), (3,4)'::cube ~> -1 would return -1.
>
> I'd like to propose the following way to resolve the design issue.  cube ~>
> (2*N - 1) should return the lower bound of the Nth coordinate of the cube,
> while cube ~> 2*N should return the upper bound of the Nth coordinate.  Then
> it would be independent of the number of coordinates in a particular cube.
> For sure, it would be a user-visible incompatible change.  But I think there
> are not so many users of this operator yet.  Also, the current behavior of
> ~> seems quite useless.
>

Thus, I heard no objection to my approach.  The attached patch changes the ~>
operator in the proposed way and fixes the KNN-GiST search accordingly.  I'm
going to register this patch in the nearest commitfest.
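
To make the proposed numbering concrete, a rough sketch (mine, not the
attached patch; the function name and out-of-range handling are placeholders)
of coordinate extraction under these semantics, reusing NDBOX, DIM and
IS_POINT from contrib/cube's cubedata.h and Min/Max/bool from c.h:

/*
 * Odd coordinate numbers (2*N - 1) select the lower bound of dimension N,
 * even numbers (2*N) the upper bound, regardless of the cube's dimension.
 */
static double
cube_coord_llur_sketch(NDBOX *cube, int coord)
{
	int			dim = (coord - 1) / 2;		/* 0-based dimension number */
	bool		upper = (coord % 2 == 0);	/* even argument => upper bound */

	if (DIM(cube) == 0 || dim >= DIM(cube))
		return 0.0;				/* out-of-range handling is a placeholder */

	if (IS_POINT(cube))
		return cube->x[dim];

	/* corners are not normalized, hence Min/Max of the two stored values */
	return upper ? Max(cube->x[dim], cube->x[dim + DIM(cube)])
				 : Min(cube->x[dim], cube->x[dim + DIM(cube)]);
}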

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


cube-knn-fix-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Robert Haas
On Mon, Oct 30, 2017 at 7:04 PM, Alvaro Herrera  wrote:
>> A sort of middle way would be to keep the secondary checkpoint around
>> but never try to replay from that point, or only if a specific flag is
>> provided.
>
> Why do you want to keep the secondary checkpoint?  If there is no way to
> automatically start a recovery from the prior checkpoint, is it really
> possible to do the same manually?  I think the only advantage of keeping
> it is that the WAL files are kept around for a little bit longer.  But
> is that useful?  Surely for any environment where you really care, you
> have a WAL archive somewhere, so it doesn't matter if files are removed
> from the primary's pg_xlog dir.

(apologies for the empty message)

I don't really want anything in particular here, other than for the
system to be reliable.  If we're confident that there's zero value in
the secondary checkpoint then, sure, ditch it.  Even if you have the
older WAL files around in an archive, it doesn't mean that you know
where the previous checkpoint start location is ... but actually, come
to think of it, if you did need to know that, you could just run
pg_waldump to find it.  That probably wasn't true when this code was
originally written, but it is today.

I was mostly just thinking out loud, listing another option rather
than advocating for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Robert Haas
On Mon, Oct 30, 2017 at 7:04 PM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund  wrote:
>> > I think it does the contrary. The current mechanism is, in my opinion,
>> > downright dangerous:
>> > https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de
>>
>> A sort of middle way would be to keep the secondary checkpoint around
>> but never try to replay from that point, or only if a specific flag is
>> provided.
>
> Why do you want to keep the secondary checkpoint?  If there is no way to
> automatically start a recovery from the prior checkpoint, is it really
> possible to do the same manually?  I think the only advantage of keeping
> it is that the WAL files are kept around for a little bit longer.  But
> is that useful?  Surely for any environment where you really care, you
> have a WAL archive somewhere, so it doesn't matter if files are removed
> from the primary's pg_xlog dir.
>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comment typo

2017-10-30 Thread Magnus Hagander
On Mon, Oct 30, 2017 at 3:49 AM, Etsuro Fujita 
wrote:

> Here is a patch to fix a typo in a comment in partition.c:
> s/specificiation/specification/.
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Raúl Marín Rodríguez
Sorry about the patch. Attaching it now so it can be considered as
submitted.

-- 

*Raúl Marín Rodríguez*
carto.com
From a6eecfe6637bdb0cbb02a9eda7b88d9c4325bb51 Mon Sep 17 00:00:00 2001
From: Raul Marin 
Date: Fri, 13 Oct 2017 17:42:23 +0200
Subject: [PATCH] Add pow() support to pgbench

---
 doc/src/sgml/ref/pgbench.sgml| 7 +++
 src/bin/pgbench/exprparse.y  | 3 +++
 src/bin/pgbench/pgbench.c| 9 +
 src/bin/pgbench/pgbench.h| 3 ++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 5 +
 5 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1f55967e40a..2e913dfcfd6 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1233,6 +1233,13 @@ pgbench  options  d
sqrt(2.0)
1.414213562
   
+  
+   pow(x, y)
+   double if x or y are doubles, else integer
+   Numeric exponentiation
+   pow(2.0, 10)
+   1024.0/
+  
  
  

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 770be981f06..290bca99d12 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -334,6 +334,9 @@ static const struct
 	{
 		"!case_end", -2, PGBENCH_CASE
 	},
+	{
+		"pow", 2, PGBENCH_POW
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index add653bf90c..b2bab9b8239 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1474,6 +1474,7 @@ evalFunc(TState *thread, CState *st,
 		case PGBENCH_NE:
 		case PGBENCH_LE:
 		case PGBENCH_LT:
+		case PGBENCH_POW:
 			{
 				PgBenchValue *lval = &vargs[0],
 						   *rval = &vargs[1];
@@ -1525,6 +1526,10 @@ evalFunc(TState *thread, CState *st,
 			setBoolValue(retval, ld < rd);
 			return true;
 
+		case PGBENCH_POW:
+			setDoubleValue(retval, pow(ld, rd));
+			return true;
+
 		default:
 			/* cannot get here */
 			Assert(0);
@@ -1602,6 +1607,10 @@ evalFunc(TState *thread, CState *st,
 
 			return true;
 
+		case PGBENCH_POW:
+			setIntValue(retval, pow(li, ri));
+			return true;
+
 		default:
 			/* cannot get here */
 			Assert(0);
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index e1277a1dde6..9f26af92bf6 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -94,7 +94,8 @@ typedef enum PgBenchFunction
 	PGBENCH_LE,
 	PGBENCH_LT,
 	PGBENCH_IS,
-	PGBENCH_CASE
+	PGBENCH_CASE,
+	PGBENCH_POW
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 8e19bbd3f45..81b7ce4dfcc 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -237,6 +237,8 @@ sub pgbench
 		qr{command=36.: int 36\b},
 		qr{command=37.: boolean true\b},
 		qr{command=38.: boolean true\b},
+		qr{command=44.: int -27\b},
+		qr{command=45.: double 1024\b},
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -299,6 +301,9 @@ sub pgbench
 \set v2 5432
 \set v3 -54.21E-2
 SELECT :v0, :v1, :v2, :v3;
+--- pow() operator
+\set poweri debug(pow(-3,3))
+\set powerd debug(pow(2.0,10))
 } });
 
 =head

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Raúl Marín Rodríguez
Hi,

both patches seem complementary. I've rebased my changes on top of that
patch
(v14) in https://git.io/vFtnT and everything seems to be working fine.

On Mon, Oct 30, 2017 at 12:36 PM, Michael Paquier  wrote:

> On Mon, Oct 30, 2017 at 11:07 AM, Alvaro Herrera
>  wrote:
> > Michael Paquier wrote:
> >
> >> Please add this patch to the upcoming commit fest if you would like to
> >> get some feedback:
> >> https://commitfest.postgresql.org/15/
> >>
> >> I am adding as well Fabien in CC who worked in getting the internal
> >> function infrastructure in the shape it is now (waaay better) with
> >> commit 86c43f4.
> >
> > I think Raúl would do well to review this patch by Fabien
> > https://www.postgresql.org/message-id/alpine.DEB.2.20.
> 1710201835390.15170@lancre
> > which adds a few functions and operators.
>
> Good idea. pow() is not added by Fabien's patch, but an operator for
> pow() could be something to add as well.
> --
> Michael
>



-- 

*Raúl Marín Rodríguez*
carto.com


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund  wrote:
> > I think it does the contrary. The current mechanism is, in my opinion,
> > downright dangerous:
> > https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de
> 
> A sort of middle way would be to keep the secondary checkpoint around
> but never try to replay from that point, or only if a specific flag is
> provided.

Why do you want to keep the secondary checkpoint?  If there is no way to
automatically start a recovery from the prior checkpoint, is it really
possible to do the same manually?  I think the only advantage of keeping
it is that the WAL files are kept around for a little bit longer.  But
is that useful?  Surely for any environment where you really care, you
have a WAL archive somewhere, so it doesn't matter if files are removed
from the primary's pg_xlog dir.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Robert Haas
On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund  wrote:
> I think it does the contrary. The current mechanism is, in my opinion,
> downright dangerous:
> https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de

A sort of middle way would be to keep the secondary checkpoint around
but never try to replay from that point, or only if a specific flag is
provided.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Simon Riggs
On 30 October 2017 at 09:44, Robert Haas  wrote:
> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
>> Nothing I am proposing blocks later work.
>
> That's not really true.  Nobody's going to be happy if MERGE has one
> behavior in one set of cases and an astonishingly different behavior
> in another set of cases.  If you adopt a behavior for certain cases
> that can't be extended to other cases, then you're blocking a
> general-purpose MERGE.

If a general purpose solution exists, please explain what it is. The
problem is that nobody has ever done so, so it's not like we are
discussing the difference between bad solution X and good solution Y;
we are comparing reasonable solution X with non-existent solution Y.

> And, indeed, it seems that you're proposing an implementation that
> adds no new functionality, just syntax compatibility.  Do we really
> want or need two syntaxes  for the same thing in core?  I kinda think
> Peter might have the right idea here.  Under his proposal, we'd be
> getting something that is, in a way, new.

Partitioning looked like "just new syntax", but it has been a useful
new feature.

MERGE provides new capabilities that we do not have and is much more
powerful than INSERT/UPDATE, in a syntax that follow what other
databases use today. Just like partitioning.

Will what I propose do everything in the first release? No, just like
partitioning.

If other developers are able to do things in phases, then I claim that
right also.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-30 Thread Robert Haas
On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
 wrote:
> Hello. I made some bugfixes and rewrite the patch.

I don't think it's a good idea to deliberately leave the state of the
standby different from the state of the  master on the theory that it
won't matter.  I feel like that's something that's likely to come back
to bite us.

Giving LockAcquireExtended() an argument that causes some
AccessExclusiveLocks, not all, to be logged also seems pretty ugly,
especially because there are no comments whatsoever explaining the
rationale.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch: restrict pg_rewind to whitelisted directories

2017-10-30 Thread Chris Travers
The attached patch is cleaned up and filed for the commit fest this next
month:

Here's the full commit message via Mercurial.  I will likely have a new
branch per version of the patch since that's the closest thing to a rebase in
this version control system.

changeset:   60492:47f87a2d2fa1

tag: mine/pg_rewind_restrict_dirs

parent:  60446:e638ba9c3c11

user:Chris Travers 

date:Mon Oct 30 12:25:18 2017 +0100

files:   doc/src/sgml/ref/pg_rewind.sgml src/bin/pg_rewind/copy_fetch.c
src/bin/pg_rewind/fetch.c src/bin/pg_rewind/fetch.h
src/bin/pg_rewind/libpq_fetch.c src/bin/pg_rewind/pg_rewind.c
src/bin/pg_rewind/t/003_extrafiles.pl

description:

Restrict pg_rewind to whitelisted directories.


This is intended to be a minimum working version and in fact builds and
passes tests.

Note that tests for extra files have been changed to reflect the new behavior,
and additional debugging information has been added to the output in case of
failure.


The patch iterates through a series of set directories and synchronizes only
them.  This improves predictability of the complete state of the system after
a rewind.
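
Purely as an illustration of that whitelisting idea (a sketch, not the patch
itself; the directory list and function name below are placeholders of my
own), the effect is roughly to consider only paths that fall under a fixed
set of directories:

#include <stdbool.h>
#include <string.h>

/* Placeholder list for illustration only -- NOT the actual set in the patch. */
static const char *const rewind_dirs[] = {
	"base", "global", "pg_wal", "pg_xact", NULL
};

/*
 * Return true if a path reported by the source cluster falls under one of
 * the known directories and should therefore be considered at all.
 */
static bool
path_is_known(const char *path)
{
	int			i;

	for (i = 0; rewind_dirs[i] != NULL; i++)
	{
		size_t		len = strlen(rewind_dirs[i]);

		if (strncmp(path, rewind_dirs[i], len) == 0 &&
			(path[len] == '\0' || path[len] == '/'))
			return true;
	}
	return false;
}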


One important outstanding question here is whether we need to ensure the
possibility of backing up other files if they exist via an --include-path
command line switch (this would not be a glob).

In the thread discussing this patch, Michael Paquier has expressed concern
about configuration files created by extensions or other components not being
copied.  I could add such a switch, but the patch is long enough, and it is
unclear to what extent this is needed at present, so I am leaving it at the
reviewer's discretion whether I should add this here or submit a second patch
later to add the ability to add additional paths to the filemap.

Either way, it is worth noting that I expect to have a subsequent patch
either incorporated here or in a further submission that takes this and
adds the ability to include additional directories or files via a command
line flag.  This will *not* be a shell glob but one directory or file per
invocation of the switch (similar to -t in pg_dump).

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


pg_rewind_restrict_dirs.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Simon Riggs
On 29 October 2017 at 21:25, Peter Geoghegan  wrote:

> The semantics that I suggest (the SQL standard's semantics) will
> require less code, and will be far simpler. Right now, I simply don't
> understand why you're insisting on using ON CONFLICT without even
> saying why. I can only surmise that you think that doing so will
> simplify the implementation, but I can guarantee you that it won't.

If you see problems in my proposal, please show the specific MERGE SQL
statements that you think will give problems and explain how and what
the failures will be.

We can then use those test cases to drive developments. If we end up
with code for multiple approaches we will be able to evaluate the
differences between proposals using the test cases produced.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Flexible configuration for full-text search

2017-10-30 Thread Aleksandr Parfenov
I'm mostly happy with the mentioned modifications, but I have a few questions
to clarify some points. I will send a new patch in a week or two.

On Thu, 26 Oct 2017 20:01:14 +0200
Emre Hasegeli  wrote:
> To put it formally:
> 
> ALTER TEXT SEARCH CONFIGURATION name
> ADD MAPPING FOR token_type [, ... ] WITH config
> 
> where config is one of:
> 
> dictionary_name
> config { UNION | INTERSECT | EXCEPT } config
> CASE config WHEN [ NO ] MATCH THEN [ KEEP ELSE ] config END

According to the formal definition, the following configurations are valid:

CASE english_hunspell WHEN MATCH THEN KEEP ELSE simple END
CASE english_noun WHEN MATCH THEN english_hunspell END

But configuration:

CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END

is not (as I understand it, ELSE can be used only with KEEP).

I think we should decide whether to allow or disallow the use of different
dictionaries for match checking (between CASE and WHEN) and for the result
(after THEN). If the answer is 'allow', maybe we should allow the
third example too, for consistency in configurations.

> > 3) Using different dictionaries for recognizing and output
> > generation. As I mentioned before, in new syntax condition and
> > command are separate and we can use it for some more complex text
> > processing. Here an example for processing only nouns:
> >
> > ALTER TEXT SEARCH CONFIGURATION nouns_only
> >   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> > word, hword, hword_part WITH CASE
> >   WHEN english_noun THEN english_hunspell
> > END  
> 
> This would also still work with the simpler syntax because
> "english_noun", still being a dictionary, would pass the tokens to the
> next one.

Based on the formal definition, it is possible to describe this example in
the following manner:
CASE english_noun WHEN MATCH THEN english_hunspell END

The question is the same as in the previous example.

> Instead of supporting old way of putting stopwords on dictionaries, we
> can make them dictionaries on their own.  This would then become
> something like:
> 
> CASE polish_stopword
> WHEN NO MATCH THEN polish_isspell
> END

Currently, stopwords increment position, for example:
SELECT to_tsvector('english','a test message');
-
 'messag':3 'test':2

The stopword 'a' has position 1, but it is not in the vector.

If we want to preserve this behavior, we should somehow pass a stopword to
the tsvector composition function (parsetext in ts_parse.c) to increment the
counter, or increment it in another way. Currently, an empty lexeme array is
passed as the result of LexizeExec.

One possible way to do so is something like:
CASE polish_stopword
WHEN MATCH THEN KEEP -- stopword counting
ELSE polish_isspell
END

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Alvaro Herrera
Michael Paquier wrote:

> Attaching patches directly to a thread is a better practice as if
> github goes away, any Postgres developers can still have an access to
> any code you publish using the public archives on postgresql.org.

Also, by posting to pgsql-hackers indicating intention to integrate to
Postgres you're implicitly making a statement about the license of your
contribution; see the second half of the last paragraph at
https://wiki.postgresql.org/wiki/Archives_Policy

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: full merge join on comparison clause

2017-10-30 Thread Alexander Kuzmenkov

Hi,

I am attaching the updated patch, rebased to 820c03.

On 09.10.2017 13:47, Ashutosh Bapat wrote:

Hi Alexander,
Commit c7a9fa399 has added another test on mergeopfamilies. I think
your patch will need to take care of that test.

On Wed, Oct 4, 2017 at 6:38 PM, Alexander Kuzmenkov
 wrote:

As discussed earlier, I changed the way we work with mergeopfamilies. I use
the "is_equality" flag to indicate whether the clause is an equality one,
and fill mergeopfamilies for both equality and inequality operators.
The updated patch is attached (rebased to 20b6552242).


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 925b4cf553..73e6a4ca74 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else	/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_op_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+parent->mj_UseEqual[iClause] = true;
+break;
+
+			case BTLessEqualStrategyNumber:
+parent->mj_UseEqual[iClause] = true;
+/* fall through */
+
+			case BTLessStrategyNumber:
+parent->mj_UseLesser[iClause] = true;
+break;
+
+			case BTGreaterEqualStrategyNumber:
+parent->mj_UseEqual[iClause] = true;
+/* fall through */
+
+			case BTGreaterStrategyNumber:
+parent->mj_UseLesser[iClause] = true;
+break;
+
+			default:
+elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple comparison result */
+typedef enum
+{
+	MJCR_NextInner = 1,
+	MJCR_NextOuter = -1,
+	MJCR_Join = 0
+} 

Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 11:56 AM, Raúl Marín Rodríguez
 wrote:
> both patches seem complementary. I've rebased my changes on top of that
> patch
> (v14) in https://git.io/vFtnT and everything seems to be working fine.

Attaching patches directly to a thread is a better practice: if
github goes away, any Postgres developer can still have access to
any code you publish using the public archives on postgresql.org.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stats about skipped vacuums

2017-10-30 Thread Kyotaro HORIGUCHI
At Thu, 26 Oct 2017 15:06:30 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171026.150630.115694437.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 20 Oct 2017 19:15:16 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Alexander Korotkov
On Mon, Oct 30, 2017 at 2:07 PM, Oleg Bartunov  wrote:

> On Mon, Oct 30, 2017 at 12:05 PM, Oleg Bartunov 
> wrote:
> > On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
> >  wrote:
> >> Hi there!
> >>
> >> I'm looking at implementing a custom indexing scheme, and I've been
> having
> >> trouble understanding the proper approach.
> >>
> >> Basically, I need a BK tree, which is a tree-structure useful for
> indexing
> >> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> >> across the hamming edit-distance of perceptual hashes, for fuzzy image
> >> searching). I'm pretty sure a SP-GiST index is the correct index type,
> as my
> >> tree is intrinsically unbalanced.
> >>
> >> I have a functional stand-alone implementation of a BK-Tree, and it
> works
> >> very well, but the complexity of managing what is basically a external
> index
> >> for my database has reached the point where it's significantly
> problematic,
> >> and it seems to be it should be moved into the database.
> >>
> >> Anyways, looking at the contents of postgres/src/backend/access/spgist,
> it
> >> looks pretty straightforward in terms of the actual C implementation,
> but
> >> I'm stuck understanding how to "install" a custom SP-GiST
> implementation.
> >> There are several GiST indexing implementations in the contrib
> directory,
> >> but no examples for how I'd go about implementing a loadable SP-GiST
> index.
> >>
> >> Basically, my questions are:
> >>
> >> Is it possible to implement a SP-GiST indexing scheme as a loadable
> module?
> >>
> >> If so, how?
> >> And is there an example I can base my implementation off of?
> >
> > Look on RUM access method ( https://github.com/postgrespro/rum ) we
> > developed using
> > api available since 9.6.
>
> or even simple, there is contrib/bloom access method, which illustrates
> developing access method as an extension.


I think Connor is struggling to implement just an operator class.  Advising him
to implement an index access method is a good way to keep him away from
PostgreSQL hacking for a long time :)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Alexander Korotkov
Hi!

On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf 
wrote:

> I'm looking at implementing a custom indexing scheme, and I've been having
> trouble understanding the proper approach.
>
> Basically, I need a BK tree, which is a tree-structure useful for indexing
> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> across the hamming edit-distance of perceptual hashes, for fuzzy image
> searching). I'm *pretty* sure a SP-GiST index is the correct index type,
> as my tree is intrinsically unbalanced.
>

Yes, SP-GiST is an appropriate index type for a BK tree.  I'm pretty sure a BK
tree could be implemented as an SP-GiST opclass.
The only thing worrying me is the selection of pivot values for nodes.  SP-GiST
builds the index by inserting index tuples one by one.  The first pivot value
for the root node in SP-GiST would be created once the first leaf page
overflows.  Thus, you would have to select this pivot value based on a very
small fraction at the beginning of the dataset.
As far as I know, a BK tree is most efficient when the root pivot value is
selected after looking at the whole dataset, and then hierarchically for the
subtrees.

BTW, did you try my extension for searching similar images?  It's quite
primitive, but it works for some cases.
https://github.com/postgrespro/imgsmlr
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf

I have a functional stand-alone implementation of a BK-Tree, and it works
> very well, but the complexity of managing what is basically a external
> index for my database has reached the point where it's significantly
> problematic, and it seems to be it should be moved into the database.
>

Sure, moving this index into the database is the right decision.

Anyways, looking at the contents of postgres/src/backend/access/spgist, it
> looks pretty straightforward in terms of the actual C implementation, but
> I'm stuck understanding how to "install" a custom SP-GiST implementation.
> There are several GiST indexing implementations in the contrib directory,
> but no examples for how I'd go about implementing a loadable SP-GiST index.
>
> Basically, my questions are:
>
>- Is it possible to implement a SP-GiST indexing scheme as a loadable
>module?
>
>  Yes, it's possible to define an SP-GiST opclass in an extension module.

>
>- If so, how?
>
> Pretty much the same way as a GiST opclass extension.  You have to define
supporting functions and operators and then define an operator class over them
(see the sketch at the end of this mail).

>
>- And is there an example I can base my implementation off of?
>
>
>
> I'm relatively comfortable with C (much moreso with C++), but I haven't
> spent a lot of time looking at the postgresql codebase.  I don't think I
> could start from a empty folder and make a properly-implemented module in
> any reasonable period of time, so if I have a working example for some sort
> of index that uses the same interfaces that would really help a lot
>
I don't think there is an example in the PostgreSQL source tree or on
github.  But I've attached my early experiment with a VP-tree (which seems to
be pretty much the same as a BK tree) using SP-GiST (see vptree.tar.gz).  Based
on this experiment I realized that it's important to select the root pivot
value based on the whole dataset.  However, for your metric/dataset/queries it
might turn out to be different.

It would also be nice to someday improve SP-GiST to support some global
strategies at index creation time.  In particular, that would allow resolving
the pivot value selection problem that I mentioned above.  Right now my
colleagues and I don't have time for that, but I can assist you with
advice if you decide to implement it.
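
To make the "how do I install it" part concrete: once the support functions
are written in C, wiring them up is plain SQL in the extension script.  A
rough sketch (every name below -- the bktree_* functions, the phash type and
the <@ operator -- is just a placeholder for whatever your extension actually
defines, not an existing object):

CREATE FUNCTION bktree_config(internal, internal) RETURNS void
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION bktree_choose(internal, internal) RETURNS void
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION bktree_picksplit(internal, internal) RETURNS void
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION bktree_inner_consistent(internal, internal) RETURNS void
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION bktree_leaf_consistent(internal, internal) RETURNS bool
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR CLASS bktree_ops
    DEFAULT FOR TYPE phash USING spgist AS
        OPERATOR 1 <@ (phash, phash),   -- your "within distance" search operator
        FUNCTION 1 bktree_config(internal, internal),
        FUNCTION 2 bktree_choose(internal, internal),
        FUNCTION 3 bktree_picksplit(internal, internal),
        FUNCTION 4 bktree_inner_consistent(internal, internal),
        FUNCTION 5 bktree_leaf_consistent(internal, internal);

After that, CREATE EXTENSION plus an ordinary CREATE INDEX ... USING spgist is
all that is needed on the SQL side.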

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


vptree.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 11:07 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> Please add this patch to the upcoming commit fest if you would like to
>> get some feedback:
>> https://commitfest.postgresql.org/15/
>>
>> I am adding as well Fabien in CC who worked in getting the internal
>> function infrastructure in the shape it is now (waaay better) with
>> commit 86c43f4.
>
> I think Raúl would do well to review this patch by Fabien
> https://www.postgresql.org/message-id/alpine.DEB.2.20.1710201835390.15170@lancre
> which adds a few functions and operators.

Good idea. pow() is not added by Fabien's patch, but an operator for
pow() could be something to add as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Current int & float overflow checking is slow.

2017-10-30 Thread Andres Freund
Hi,

On 2017-10-24 03:39:54 -0700, Andres Freund wrote:
> Largely that's due to the overflow checks.
>
> For integers we currently do:
>
> #define SAMESIGN(a,b) (((a) < 0) == ((b) < 0))
>
>   /*
>* Overflow check.  If the inputs are of different signs then their sum
>* cannot overflow.  If the inputs are of the same sign, their sum had
>* better be that sign too.
>*/
>   if (SAMESIGN(arg1, arg2) && !SAMESIGN(result, arg1))
>   ereport(ERROR,
>   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>errmsg("integer out of range")));
>
> which means that we turn a single integer instruction into ~10,
> including a bunch of branches.  All that despite the fact that most
> architectures have flag registers signalling integer overflow. It's just
> that C doesn't easily make that available.
>
> gcc exposes more efficient overflow detection via intrinsics:
> https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Integer-Overflow-Builtins.html
>
> Using that turns the non-error path from int4pl from:
>
>0x00826ec0 <+0>: mov0x20(%rdi),%rcx # arg1
>0x00826ec4 <+4>: mov0x28(%rdi),%rdx # arg2
>0x00826ec8 <+8>: mov%ecx,%esi
>0x00826eca <+10>:lea(%rdx,%rcx,1),%eax # add
># overflow check
>0x00826ecd <+13>:shr$0x1f,%edx
>0x00826ed0 <+16>:not%esi
>0x00826ed2 <+18>:shr$0x1f,%esi
>0x00826ed5 <+21>:cmp%dl,%sil
>0x00826ed8 <+24>:je 0x826f30 
>0x00826eda <+26>:mov%eax,%edx
>0x00826edc <+28>:shr$0x1f,%ecx
>0x00826edf <+31>:shr$0x1f,%edx
>0x00826ee2 <+34>:cmp%cl,%dl
>0x00826ee4 <+36>:je 0x826f30 
>/* overflow error code */
>0x00826f30 <+112>:   retq
>
> into
>
>0x00826ec0 <+0>: mov0x28(%rdi),%rax # arg2
>0x00826ec4 <+4>: add0x20(%rdi),%eax # arg1 + arg2
>0x00826ec7 <+7>: jo 0x826ecc  # jump if 
> overflowed
>0x00826ec9 <+9>: mov%eax,%eax # clear high bits
>0x00826ecb <+11>:retq
>
> which, not that surprisingly, is faster. Not to speak of easier to read
> ;)
>
> Besides the fact that the code is faster, there's also the issue that
> the current way to do overflow checks is not actually correct C, and
> requires compiler flags like -fwrapv.

Attached is a series of patches that:

0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
  These use compiler intrinsics on gcc/clang. If that's not
  available, they cast to a wider type and do overflow checks
  (a simplified sketch of the pattern is included below). For
  64bit there's a fallback for the case 128bit math is not
  available (here I stole from an old patch of Greg's).

  These fallbacks are, as far as I can tell, C code free of
  overflow-related undefined behaviour.

  Perhaps it should rather be pg_add_s32_overflow, or a similar
  naming scheme?

0002) Converts int.c, int8.c and a smattering of other functions to use
  the new facilities. This removes a fair amount of code.

  It might make sense to split this up further, but right now that's
  the set of functions that either are affected performancewise by
  previous overflow checks, and/or relied on wraparound
  overflow. There's probably more places, but this is what I found
  by visual inspection and compiler warnings.

0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
  it seems like an important test for the new facilities. Without
  0002, tests would fail after this; with it, all tests run
  successfully.
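
To make the 0001 pattern concrete, here is a simplified sketch of one of the
helpers (not the patch itself -- the real header covers 16/32/64 bit and
add/sub/mul; int32/int64 and PG_INT32_MAX/MIN are the usual c.h definitions):

static inline bool
pg_add_s32_overflow(int32 a, int32 b, int32 *result)
{
#if defined(HAVE__BUILTIN_OP_OVERFLOW)
	/* returns true on overflow, otherwise stores a + b into *result */
	return __builtin_add_overflow(a, b, result);
#else
	/* fallback: do the arithmetic in a wider type, then range-check */
	int64		res = (int64) a + (int64) b;

	if (res > PG_INT32_MAX || res < PG_INT32_MIN)
		return true;
	*result = (int32) res;
	return false;
#endif
}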

Greetings,

Andres Freund
>From 98fbe53be0a3046f8ace687f846f91a0043deee8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 29 Oct 2017 22:13:54 -0700
Subject: [PATCH 1/3] Provide overflow safe integer math inline functions.

Author: Andres Freund, with some code stolen from Greg Stark
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 config/c-compiler.m4  |  22 
 configure |  33 ++
 configure.in  |   4 +
 src/include/common/int.h  | 229 ++
 src/include/pg_config.h.in|   3 +
 src/include/pg_config.h.win32 |   3 +
 6 files changed, 294 insertions(+)
 create mode 100644 src/include/common/int.h

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 6dcc7906491..0d91e52a28f 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -296,6 +296,28 @@ fi])# PGAC_C_BUILTIN_CONSTANT_P
 
 
 
+# PGAC_C_BUILTIN_OP_OVERFLOW
+# -
+# Check if the C compiler understands __builtin_$op_overflow(),
+# and define HAVE__BUILTIN_OP_OVERFLOW if so.
+#
+# Check for the most complicated case, 64 bit 

Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Chris Travers
On Mon, Oct 30, 2017 at 11:36 AM, Michael Paquier  wrote:

> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
>  wrote:
> > This also brings up a fairly major concern more generally about control
> by
> > the way.  A lot of cases where pg_rewind is called, the user doesn't
> > necessarily have much control on how it is called.  Moreover in many of
> > these cases, the user is probably not in a position to understand the
> > internals well enough to grasp what to check after.
>
> Likely they are not.
>

So for now I am submitting the current patch to the commitfest.  I am open
to adding a new --include-path option in this patch, but I am worried about
atomicity concerns, and I am still not really sure what the impact is for you
(I haven't heard whether you expect this file to be there before the rewind,
i.e. whether it would be on both master and slave, or just on the slave).

So there is the question of under what specific circumstances this would
break for you.

>
> > Right, but there is a use case difference between "I am taking a backup
> of a
> > server" and "I need to get the server into  rejoin the replication as a
> > standby."
>
> The intersection of the first and second categories is not empty. Some
> take backups and use them to deploy standbys.
>

Sure, but it is not complete either.

>
> > A really good example of where this is a big problem is with replication
> > slots.  On a backup I would expect you want replication slots to be
> copied
> > in.
>
> I would actually expect the contrary, and please note that replication
> slots are not taken in a base backup, which is what the documentation
> says as well:
> https://www.postgresql.org/docs/10/static/protocol-replication.html
> "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
> pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
> they are symbolic links)."
>

Many of these are emptied on server restart, but pg_replslot is not.  What is
the logic for excluding it from backups, other than to avoid problems with
replication provisioning?

I mean if I have a replication slot for taking backups with barman (and I
am not actually doing replication) why would I *not* want that in my base
backup if I might have to restore to a new machine somewhere?

>
> Some code I have with 9.6's pg_bsaebackup removes manually replication
> slots as this logic is new in 10 ;)
>
> >> The pattern that base backups now use is an exclude list. In the
> >> future I would rather see base backups and pg_rewind using the same
> >> infrastructure for both things:
> >> - pg_rewind should use the replication protocol with BASE_BACKUP.
> >> Having it rely on root access now is crazy.
> >> - BASE_BACKUP should have an option where it is possible to exclude
> >> custom paths.
> >> What you are proposing here would make both diverge more, which is in
> >> my opinion not a good approach.
> >
> > How does rep mgr or other programs using pg_rewind know what to exclude?
>
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.
>

Two points that further occur to me:

Shell globs seem to me to be foot guns in this area.  I think special paths
should be one path per invocation of the option, not "--exclude=pg_w*", since
that is just asking for problems as time goes on and things get renamed.

It also seems to me that adding specific paths is far safer than removing
specific paths.

> --
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Alvaro Herrera
Michael Paquier wrote:

> Please add this patch to the upcoming commit fest if you would like to
> get some feedback:
> https://commitfest.postgresql.org/15/
> 
> I am adding as well Fabien in CC who worked in getting the internal
> function infrastructure in the shape it is now (waaay better) with
> commit 86c43f4.

I think Raúl would do well to review this patch by Fabien
https://www.postgresql.org/message-id/alpine.DEB.2.20.1710201835390.15170@lancre
which adds a few functions and operators.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Oleg Bartunov
On Mon, Oct 30, 2017 at 12:05 PM, Oleg Bartunov  wrote:
> On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
>  wrote:
>> Hi there!
>>
>> I'm looking at implementing a custom indexing scheme, and I've been having
>> trouble understanding the proper approach.
>>
>> Basically, I need a BK tree, which is a tree-structure useful for indexing
>> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
>> across the hamming edit-distance of perceptual hashes, for fuzzy image
>> searching). I'm pretty sure a SP-GiST index is the correct index type, as my
>> tree is intrinsically unbalanced.
>>
>> I have a functional stand-alone implementation of a BK-Tree, and it works
>> very well, but the complexity of managing what is basically a external index
>> for my database has reached the point where it's significantly problematic,
>> and it seems to be it should be moved into the database.
>>
>> Anyways, looking at the contents of postgres/src/backend/access/spgist, it
>> looks pretty straightforward in terms of the actual C implementation, but
>> I'm stuck understanding how to "install" a custom SP-GiST implementation.
>> There are several GiST indexing implementations in the contrib directory,
>> but no examples for how I'd go about implementing a loadable SP-GiST index.
>>
>> Basically, my questions are:
>>
>> Is it possible to implement a SP-GiST indexing scheme as a loadable module?
>>
>> If so, how?
>> And is there an example I can base my implementation off of?
>
> Look on RUM access method ( https://github.com/postgrespro/rum ) we
> developed using
> api available since 9.6.

or even simpler, there is the contrib/bloom access method, which illustrates
developing an access method as an extension.

>
>
>>
>> I'm relatively comfortable with C (much moreso with C++), but I haven't
>> spent a lot of time looking at the postgresql codebase.  I don't think I
>> could start from a empty folder and make a properly-implemented module in
>> any reasonable period of time, so if I have a working example for some sort
>> of index that uses the same interfaces that would really help a lot.
>>
>> Thanks!
>> Connor


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Oleg Bartunov
On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
 wrote:
> Hi there!
>
> I'm looking at implementing a custom indexing scheme, and I've been having
> trouble understanding the proper approach.
>
> Basically, I need a BK tree, which is a tree-structure useful for indexing
> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> across the hamming edit-distance of perceptual hashes, for fuzzy image
> searching). I'm pretty sure a SP-GiST index is the correct index type, as my
> tree is intrinsically unbalanced.
>
> I have a functional stand-alone implementation of a BK-Tree, and it works
> very well, but the complexity of managing what is basically a external index
> for my database has reached the point where it's significantly problematic,
> and it seems to be it should be moved into the database.
>
> Anyways, looking at the contents of postgres/src/backend/access/spgist, it
> looks pretty straightforward in terms of the actual C implementation, but
> I'm stuck understanding how to "install" a custom SP-GiST implementation.
> There are several GiST indexing implementations in the contrib directory,
> but no examples for how I'd go about implementing a loadable SP-GiST index.
>
> Basically, my questions are:
>
> Is it possible to implement a SP-GiST indexing scheme as a loadable module?
>
> If so, how?
> And is there an example I can base my implementation off of?

Have a look at the RUM access method ( https://github.com/postgrespro/rum ),
which we developed using the API available since 9.6.


>
> I'm relatively comfortable with C (much moreso with C++), but I haven't
> spent a lot of time looking at the postgresql codebase.  I don't think I
> could start from a empty folder and make a properly-implemented module in
> any reasonable period of time, so if I have a working example for some sort
> of index that uses the same interfaces that would really help a lot.
>
> Thanks!
> Connor


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An unlikely() experiment

2017-10-30 Thread David Rowley
On 30 October 2017 at 22:44, Andres Freund  wrote:
> On 2017-10-30 22:39:01 +1300, David Rowley wrote:
>> Today I was thinking, to get around that issue, we might be able to
>> generate another thin wrapper around elog_start() and mark that as
>> __attribute__((cold)) and fudge the macro a bit to call that function
>> instead if it can detect a compile time const level >= ERROR. I've not
>> looked at the code again to remind myself if that would be possible.
>
> Yes, that's what I was thinking too. Add a elog_fatal() wrapping
> elog_finish(), and move the if (__builtin_constant_p(elevel)) branch a
> bit earlier, and that should work.  Similar with errstart_fatal() for
> ereport().

This may have been too good to be true. I can't seem to get it to work,
and I think it's because the function call is inside the do{}while(0) and
the if (__builtin_constant_p(elevel) && (elevel) >= ERROR) branch,
which appears to mean that:

"The paths leading to call of cold functions within code are marked as
unlikely by the branch prediction mechanism" [1]

does not apply to the path the macro sits on in the calling function, as we
might have hoped.

I can get the assembly to change if I put an unlikely() around the
condition or if I go and vandalize the macro to become:

#define elog(elevel, ...) \
elog_start_error(__FILE__, __LINE__, PG_FUNCNAME_MACRO)

[1] https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-10-30 Thread Michael Paquier
On Fri, Oct 27, 2017 at 4:51 PM, Raúl Marín Rodríguez
 wrote:
> I've written a small patch to add support for pow() in pgbench.

Cool.

> The main reason behind it is that I'm currently using a shell call to do it
> which takes between 2-10 ms that can be a big percentage of the time taken
> by the whole transaction. For example (shortened):
>
> latency average = 11.718 ms
>  - statement latencies in milliseconds:
>  2.834  \setshell POWER2  awk 'BEGIN {p=2^ARGV[1]; print p }'
> :ZOOM_CURRENT
>  8.846  SELECT
> ST_AsBinary(ST_Simplify(ST_SnapToGrid("the_geom_webmercator",:SNAP),
> :SIMPLIFY)) AS geom FROM
>
> I've also updated the related docs and added some tests. Please let me know
> if I'm missing anything.

Please add this patch to the upcoming commit fest if you would like to
get some feedback:
https://commitfest.postgresql.org/15/

I am also adding Fabien in CC, who worked on getting the internal
function infrastructure into the shape it is now (waaay better) with
commit 86c43f4.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
 wrote:
> This also brings up a fairly major concern more generally about control by
> the way.  A lot of cases where pg_rewind is called, the user doesn't
> necessarily have much control on how it is called.  Moreover in many of
> these cases, the user is probably not in a position to understand the
> internals well enough to grasp what to check after.

Likely they are not.

> Right, but there is a use case difference between "I am taking a backup of a
> server" and "I need to get the server into  rejoin the replication as a
> standby."

The intersection of the first and second categories is not empty. Some
take backups and use them to deploy standbys.

> A really good example of where this is a big problem is with replication
> slots.  On a backup I would expect you want replication slots to be copied
> in.

I would actually expect the contrary, and please note that replication
slots are not taken in a base backup, which is what the documentation
says as well:
https://www.postgresql.org/docs/10/static/protocol-replication.html
"pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
they are symbolic links)."

Some code I have with 9.6's pg_basebackup manually removes replication
slots, as this logic is new in 10 ;)

>> The pattern that base backups now use is an exclude list. In the
>> future I would rather see base backups and pg_rewind using the same
>> infrastructure for both things:
>> - pg_rewind should use the replication protocol with BASE_BACKUP.
>> Having it rely on root access now is crazy.
>> - BASE_BACKUP should have an option where it is possible to exclude
>> custom paths.
>> What you are proposing here would make both diverge more, which is in
>> my opinion not a good approach.
>
> How does rep mgr or other programs using pg_rewind know what to exclude?

Good question. Answers could come from folks such as David Steele
(pgBackRest) or Marco (barman) whom I am attaching in CC.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pow support for pgbench

2017-10-30 Thread Raúl Marín Rodríguez
Hi,

I've written a small patch to add support for pow() in pgbench.

The main reason behind it is that I'm currently using a shell call to do it,
which takes between 2-10 ms; that can be a big percentage of the time taken
by the whole transaction. For example (shortened):

latency average = 11.718 ms
 - statement latencies in milliseconds:
 2.834  \setshell POWER2  awk 'BEGIN {p=2^ARGV[1]; print p }'
:ZOOM_CURRENT
 8.846  SELECT
ST_AsBinary(ST_Simplify(ST_SnapToGrid("the_geom_webmercator",:SNAP),
:SIMPLIFY)) AS geom FROM

I've also updated the related docs and added some tests. Please let me know
if I'm missing anything.

Regards,
*Raúl Marín Rodríguez*
carto.com
From 08a4d519e0c73d0f16acd9e5db9e5b547a884902 Mon Sep 17 00:00:00 2001
From: Raul Marin 
Date: Fri, 13 Oct 2017 17:42:23 +0200
Subject: [PATCH] Add pow() support to pgbench

---
 doc/src/sgml/ref/pgbench.sgml| 7 +++
 src/bin/pgbench/exprparse.y  | 3 +++
 src/bin/pgbench/pgbench.c| 9 +
 src/bin/pgbench/pgbench.h| 3 ++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 7 ++-
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e509e6c7f62..e12a149c5bb 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1022,6 +1022,13 @@ pgbench  options  d
sqrt(2.0)
1.414213562
   
+  
+   pow(x, y)
+   double if x or y are doubles, else integer
+   Numeric exponentiation
+   pow(2.0, 10)
+   1024.0/
+  
  
  

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9bfd37..4db3b215abf 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -191,6 +191,9 @@ static const struct
 	{
 		"random_exponential", 3, PGBENCH_RANDOM_EXPONENTIAL
 	},
+	{
+		"pow", 2, PGBENCH_POW
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5d8a01c72cf..a3aef108f84 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1346,6 +1346,7 @@ evalFunc(TState *thread, CState *st,
 		case PGBENCH_MUL:
 		case PGBENCH_DIV:
 		case PGBENCH_MOD:
+		case PGBENCH_POW:
 			{
 				PgBenchValue *lval = &vargs[0],
 						   *rval = &vargs[1];
@@ -1381,6 +1382,10 @@ evalFunc(TState *thread, CState *st,
 			setDoubleValue(retval, ld / rd);
 			return true;
 
+		case PGBENCH_POW:
+			setDoubleValue(retval, pow(ld, rd));
+			return true;
+
 		default:
 			/* cannot get here */
 			Assert(0);
@@ -1442,6 +1447,10 @@ evalFunc(TState *thread, CState *st,
 
 			return true;
 
+		case PGBENCH_POW:
+			setIntValue(retval, pow(li, ri));
+			return true;
+
 		default:
 			/* cannot get here */
 			Assert(0);
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index fd428af274f..e0132b5fcf6 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -75,7 +75,8 @@ typedef enum PgBenchFunction
 	PGBENCH_SQRT,
 	PGBENCH_RANDOM,
 	PGBENCH_RANDOM_GAUSSIAN,
-	PGBENCH_RANDOM_EXPONENTIAL
+	PGBENCH_RANDOM_EXPONENTIAL,
+	PGBENCH_POW
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 11bc0fecfef..601c7ee0e4d 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -218,7 +218,9 @@ sub pgbench
 		qr{command=18.: double 18\b},
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
-		qr{command=21.: int 9223372036854775807\b}, ],
+		qr{command=21.: int 9223372036854775807\b},
+		qr{command=23.: int -27\b},
+		qr{command=24.: double 1024\b}, ],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
 \set i1 debug(random(1, 100))
@@ -248,6 +250,9 @@ sub pgbench
 \set maxint debug(:minint - 1)
 -- reset a variable
 \set i1 0
+--- pow() operator
+\set poweri debug(pow(-3,3))
+\set powerd debug(pow(2.0,10))
 } });
 
 # backslash commands

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema variables

2017-10-30 Thread Hannu Krosing
but you can always do

with a (id, value) as (
  values (1, 'foo'), (2, 'bar'), (3, 'baz')
)
select set_config('custom.value',(select value from a where id = 2),true);

if you are worried about the evaluation order

On 29 October 2017 at 09:51, Chris Travers  wrote:

>
>
> On Sat, Oct 28, 2017 at 4:56 PM, Pavel Stehule 
> wrote:
>
>>
>>>
>> The creating database objects and necessary infrastructure is the most
>> simple task of this project. I'll be more happy if there are zero
>> intersection because variables and GUC are designed for different purposes.
>> But due SET keyword the intersection there is.
>>
>> When I thinking about it, I have only one, but important reason, why I
>> prefer design new type of database object -the GUC are stack based with
>> different default granularity - global, database, user, session, function.
>> This can be unwanted behave for variables - it can be source of hard to
>> detected bugs. I afraid so this behave can be too messy for usage as
>> variables.
>>
>> @1 I have not clean opinion about it - not sure if rights are good enough
>> - probably some user limits can be more practical - but can be hard to
>> choose result when some user limits and GUC will be against
>>
>
> I was mostly thinking that users can probably set things like work_mem and
> possibly this might be a problem.
>
>
>> @2 With variables typed custom GUC are not necessary
>>
>
> I don't know about that.  For example with the geoip2lookup extension it
> is nice that you could set the preferred language for translation on a per
> user basis or the mmdb path on a per-db basis.
>
>
>> @3 Why you need it? It is possible with set_config function now.
>>
>
> Yeah you could do it safely with set_config and a CTE, but suppose I have:
>
> with a (Id, value) as (values (1::Int, 'foo'), (2, 'bar'), (3, 'baz'))
> SELECT set_config('custom_val', value) from a where id = 2;
>
> What is the result out of this?  I would *expect* that this would probably
> run set_config 3 times and filter the output.
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>
>>
>>>
>>>
 regards

 Pavel



>>>
>>>
>>> --
>>> Best Regards,
>>> Chris Travers
>>> Database Administrator
>>>
>>> Tel: +49 162 9037 210 <+49%20162%209037210> | Skype: einhverfr |
>>> www.adjust.com
>>> Saarbrücker Straße 37a, 10405 Berlin
>>> 
>>>
>>>
>>
>
>
> --
> Best Regards,
> Chris Travers
> Database Administrator
>
> Tel: +49 162 9037 210 <+49%20162%209037210> | Skype: einhverfr |
> www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
> 
>
>


[HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Connor Wolf

Hi there!

I'm looking at implementing a custom indexing scheme, and I've been 
having trouble understanding the proper approach.


Basically, I need a BK tree, which is a tree-structure useful for 
indexing arbitrary discrete metric-spaces (in my case, I'm interested in 
indexing across the hamming edit-distance of perceptual hashes, for 
fuzzy image searching). I'm /pretty/ sure a SP-GiST index is the correct 
index type, as my tree is intrinsically unbalanced.
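
(For concreteness, the metric involved is just the Hamming distance between 
the hashes -- for 64-bit hashes essentially the following, though this is 
only illustrative and my actual hashes may differ in width:)

#include <stdint.h>

/* Hamming distance between two 64-bit perceptual hashes: popcount of the XOR */
static inline int
hamming_distance(uint64_t a, uint64_t b)
{
    return __builtin_popcountll(a ^ b);
}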


I have a functional stand-alone implementation of a BK-Tree, and it 
works very well, but the complexity of managing what is basically an 
external index for my database has reached the point where it's 
significantly problematic, and it seems to me it should be moved into 
the database.


Anyways, looking at the contents of postgres/src/backend/access/spgist, 
it looks pretty straightforward in terms of the actual C implementation, 
but I'm stuck understanding how to "install" a custom SP-GiST 
implementation. There are several GiST indexing implementations in the 
contrib directory, but no examples for how I'd go about implementing a 
loadable SP-GiST index.


Basically, my questions are:

 * Is it possible to implement a SP-GiST indexing scheme as a loadable
   module?
 o If so, how?
 o And is there an example I can base my implementation off of?

I'm relatively comfortable with C (much moreso with C++), but I haven't 
spent a lot of time looking at the postgresql codebase.  I don't think I 
could start from an empty folder and make a properly-implemented module 
in any reasonable period of time, so if I have a working example for 
some sort of index that uses the same interfaces that would really help 
a lot.


Thanks!
Connor



[HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Tomas Vondra wrote:
> FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
> it seems to be due to something that changed in the last release.

I've been trying to reproduce it for half an hour with no success (I
think my laptop is just too slow).  I bet this is related to the
addition of PageIndexTupleOverwrite (commit b1328d78f) though I find it
more likely that this was not *caused* by that commit but rather just
made it easier to hit.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Chris Travers
On Mon, Oct 30, 2017 at 10:57 AM, Michael Paquier  wrote:

> On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers 
> wrote:
> > Are there any cases right now where you have features added by
> extensions that write to directories which are required for a rewind?
>
> In some of the stuff I maintain, I actually have one case now of a
> configuration file included with include_if_exists by postgresql.conf
> and is expected to be generated by a component that my code doing the
> rewind has no direct access on... I can control how pg_rewind kicks
> in, but I think that you would actually break silently the current
> code, which is scary especially if I don't control the code where
> pg_rewind is called when Postgres 11 gets integrated into the product
> I am thinking about, and even more scary if there is no way to include
> something.
>

Ok, so there is an argument that there needs to be a way to include
additional paths in this patch.  One important question I would have in
these cases is whether you expect these to be set only on the master.  If
not, is this a problem, and how do you handle the fail-back problem, etc.?

This also brings up a fairly major concern more generally about control by
the way.  A lot of cases where pg_rewind is called, the user doesn't
necessarily have much control on how it is called.  Moreover in many of
these cases, the user is probably not in a position to understand the
internals well enough to grasp what to check after.

>
> > The problem with an exclude list is that we cannot safely exclude
> > configuration files or logs (because these could be named anything),
> right?
>
> You have the exact same problem with base backups now, and any
> configuration files created by extensions are a per-case problem...
>

Right, but there is a use case difference between "I am taking a backup of
a server" and "I need to get the server into a state where it can rejoin
replication as a standby."

A really good example of where this is a big problem is with replication
slots.  On a backup I would expect you want replication slots to be copied
in.  However when setting yourself up as a slave you most certainly do not
want to just copy these over from the master unless you have infinite disk
space.  I would argue that under *no* circumstance should pg_rewind *ever*
copy replication slots.  But pg_basebackup really *should* (and when
provisioning a new slave you should clear these as soon as it is set up).

The pattern that base backups now use is an exclude list. In the
> future I would rather see base backups and pg_rewind using the same
> infrastructure for both things:
> - pg_rewind should use the replication protocol with BASE_BACKUP.
> Having it rely on root access now is crazy.
> - BASE_BACKUP should have an option where it is possible to exclude
> custom paths.
> What you are proposing here would make both diverge more, which is in
> my opinion not a good approach.
>

How does rep mgr or other programs using pg_rewind know what to exclude?

Again I am not convinced setting up a replica and taking a backup for
disaster recovery are the same use case or have the same requirements.

--
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers  wrote:
> Are there any cases right now where you have features added by extensions 
> that write to directories which are required for a rewind?

In some of the stuff I maintain, I actually have one case now of a
configuration file included with include_if_exists by postgresql.conf,
which is expected to be generated by a component that my code doing the
rewind has no direct access to... I can control how pg_rewind kicks
in, but I think that you would actually silently break the current
code, which is scary, especially if I don't control the code where
pg_rewind is called once Postgres 11 gets integrated into the product
I am thinking about, and even more scary if there is no way to include
something.

> The problem with an exclude list is that we cannot safely exclude
> configuration files or logs (because these could be named anything), right?

You have the exact same problem with base backups now, and any
configuration files created by extensions are a per-case problem...
The pattern that base backups now use is an exclude list. In the
future I would rather see base backups and pg_rewind using the same
infrastructure for both things:
- pg_rewind should use the replication protocol with BASE_BACKUP.
Having it rely on root access now is crazy.
- BASE_BACKUP should have an option where it is possible to exclude
custom paths.
What you are proposing here would make both diverge more, which is in
my opinion not a good approach.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An unlikely() experiment

2017-10-30 Thread Andres Freund
On 2017-10-30 22:39:01 +1300, David Rowley wrote:
> On 30 October 2017 at 22:34, Andres Freund  wrote:
> > Hi,
> >
> > On 2015-12-20 02:49:13 +1300, David Rowley wrote:
> >> Alternatively, if there was some way to mark the path as cold from within
> >> the path, rather than from the if() condition before the path, then we
> >> could perhaps do something in the elog() macro instead. I just couldn't
> >> figure out a way to do this.
> >
> > I just was thinking about this, and it turns out that
> > __attribute__((cold)) does what we need here. We could just mark
> > elog_start() and errstart() as cold, and afaict that should do the
> > trick.
> 
> I had actually been thinking about this again today. I had previously
> not thought  __attribute__((cold)) would be a good idea since if we
> mark elog_start() with that, then code paths with elog(NOTICE) and
> elog(LOG) not to mention DEBUG would be marked as cold.

That's an issue, true. Although I'm not convinced it's particularly
fatal - if you're ok with emitting debugging output in places it's
probably not the hottest codepath in the first place. And profile guided
optimization would overwrite hot/cold.

But:

> Today I was thinking, to get around that issue, we might be able to
> generate another thin wrapper around elog_start() and mark that as
> __attribute__((cold)) and fudge the macro a bit to call that function
> instead if it can detect a compile time const level >= ERROR. I've not
> looked at the code again to remind myself if that would be possible.

Yes, that's what I was thinking too. Add an elog_fatal() wrapping
elog_finish(), and move the if (__builtin_constant_p(elevel)) branch a
bit earlier, and that should work.  Similar with errstart_fatal() for
ereport().
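
Roughly this shape, i.e. something like (completely untested sketch, just to
illustrate the idea, against the existing elog_start()/elog_finish() from
elog.h):

/* like elog_finish(), but cold so callers treat the error path as unlikely */
extern void elog_fatal(int elevel, const char *fmt,...)
			pg_attribute_printf(2, 3) __attribute__((cold, noreturn));

#define elog(elevel, ...) \
	do { \
		elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
			elog_fatal(elevel, __VA_ARGS__); \
		else \
			elog_finish(elevel, __VA_ARGS__); \
	} while(0)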

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Chris Travers
First, thanks for your thoughts on this, and I am interested in probing
them more.

On Mon, Oct 30, 2017 at 9:04 AM, Michael Paquier 
wrote:

> On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers 
> wrote:
> > The Solution:
> > The solution is a whitelist of directories specified which are the only
> ones
> > which are synchronised.  The relevant part of this patch is:
> >
> > +/* List of directories to synchronize:
> > + * base data dirs (and ablespaces)
> > + * wal/transaction data
> > + * and that is it.
> > + *
> > + * This array is null-terminated to make
> > + * it easy to expand
> > + */
> >
> > +const char *rewind_dirs[] = {
> > +"base",
> > +"global",
> > +"pg_commit_ts",
> > +"pg_logical",
> > +"pg_multixact",
> > +"pg_serial",
> > +"pg_subtrans",
> > +"pg_tblspc",
> > +"pg_twophase",
> > +"pg_wal",
> > +"pg_xact",
> > +NULL
> > +};
>
> The problem with a white list, which is one reason why I do not favour
> it, is in the case where a feature adds in the data folder a new path
> for its data, particularly in the case where this is critical for a
> base backup. If this folder is not added in this list, then a rewind
> may be silently corrupted as well. There are also a set of directories
> in this list that we do not care about, pg_serial being one.
> pg_subtrans is a second, as it gets zeroed on startup.
>

The problem with an exclude list is that we cannot safely exclude
configuration files or logs (because these could be named anything), right?

Are there any cases right now where you have features added by extensions
that write to directories which are required for a rewind?  I am asking
because I would like to see if this is the minimum working change or if
this change is fundamentally broken currently and must be extended to allow
custom paths to be sync'd as well.

>
> And if you think about it, pg_rewind is actually the *same* processing
> as a base backup taken using the replication protocol. So instead of
> this patch I would recommend using excludeFiles and excludeDirContents
> by moving this list to a common header where frontend and backend can
> refer to. Note that basebackup.h includes replnodes.h, so my thoughts
> is that you should split the current header with something like
> basebackup_internal.h which is backend-only, and have pg_rewind fetch
> the list of directories to automatically ignore as those ones. You can
> also simplify a bit the code of pg_rewind a bit so as things like
> postmaster.opts. On top of that, there is visibly no need to tweak the
> SQL query fetching the directory list (which is useful for debugging
> as shaped now actually), but just change process_source_file so as its
> content is not added in the file map.
>

I am not sure it *should* be the same, however.  In a backup we probably
want to back up postgresql.auto.conf, but on a failover, I don't think
we want to clobber configuration.  We certainly do not want to sometimes
clobber configuration but not other times (which is what happens right now
in some cases).  And under no circumstances do we want to clobber logs on a
failed server with logs on a working server.  That's asking for serious
problems in my view.

If you think about it, there's a huge difference in use case between backing up
a database cluster (including replication slots, configs in the dir, etc.)
and re-syncing the data so that replication can resume, and I think there
are some dangers that come up when assuming these should be closely tied
together.

>
> Then there is a second case where you do not want a set of folders to
> be included, but those can be useful by default, an example here being
> pg_wal where one might want to have an empty path to begin with. A
> third case is a set of folders that you do not care about, but depends
> on the deployment, being here "log" for example. For those you could
> just add an --exclude-path option which simply piles up paths into a
> linked list when called multiple times. This could happen with a
> second patch.
>

Agreed.  And one could add an "--include-path" option to allow for unusual
cases where you want extra directories, such as replication slots, or the
like.
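
Something along these lines, if both options existed (the --exclude-path and
--include-path names are just the ones proposed in this thread, not options
pg_rewind has today):

pg_rewind --target-pgdata=/var/lib/postgresql/10/main \
          --source-server='host=primary port=5432 user=postgres' \
          --exclude-path=log \
          --include-path=pg_replslot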

I think another patch could also specifically empty, and perhaps create, a
replication slot, allowing one to bring things back up via streaming
replication more safely.

>
> > From there we iterate over this array for directory-based approaches in
> > copy_fetch.c and with one query per directory in libpq_fetch.c.  This
> also
> > means shifting from the basic interface from PQexec to PQexecParams.  It
> > would be possible to move to binary formats too, but this was not done
> > currently in order to simplify code review (that could be a separate
> > independent patch at a later time).
>
> -res = PQexec(conn, sql);
> +for (p = 0; rewind_dirs[p] != NULL; ++p)
> +{
> +const char *paths[1];
> +paths[0] = 

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-30 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 5:48 PM, Ashutosh Bapat
 wrote:
> On Thu, Oct 26, 2017 at 7:41 PM, Masahiko Sawada  
> wrote:
>>
>> Because I don't want to break the current user semantics. that is,
>> currently it's guaranteed that the subsequent reads can see the
>> committed result of previous writes even if the previous transactions
>> were distributed transactions. And it's ensured by writer side. If we
>> can make the reader side ensure it, the backend process don't need to
>> wait for the resolver process.
>>
>> The waiting backend process are released by resolver process after the
>> resolver process tried to resolve foreign transactions. Even if
>> resolver process failed to either connect to foreign server or to
>> resolve foreign transaction the backend process will be released and
>> the foreign transactions are leaved as dangling transaction in that
>> case, which are processed later. Also if resolver process takes a long
>> time to resolve foreign transactions for whatever reason the user can
>> cancel it by Ctl-c anytime.
>>
>
> So, there's no guarantee that the next command issued from the
> connection *will* see the committed data, since the foreign
> transaction might not have committed because of a network glitch
> (say). If we go this route of making backends wait for resolver to
> resolve the foreign transaction, we will have add complexity to make
> sure that the waiting backends are woken up in problematic events like
> crash of the resolver process OR if the resolver process hangs in a
> connection to a foreign server etc. I am not sure that the complexity
> is worth the half-guarantee.
>

Hmm, maybe I was wrong. I now think that the waiting backends can be
woken up only in the following cases:
- The resolver process succeeded in resolving all foreign transactions.
- The user cancelled the wait (e.g. Ctrl-C).
- The resolver process failed to resolve a foreign transaction because
there is no such prepared transaction on the foreign server.

In other cases the resolver process should not release the waiters.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An unlikely() experiment

2017-10-30 Thread David Rowley
On 30 October 2017 at 22:34, Andres Freund  wrote:
> Hi,
>
> On 2015-12-20 02:49:13 +1300, David Rowley wrote:
>> Alternatively, if there was some way to mark the path as cold from within
>> the path, rather than from the if() condition before the path, then we
>> could perhaps do something in the elog() macro instead. I just couldn't
>> figure out a way to do this.
>
> I just was thinking about this, and it turns out that
> __attribute__((cold)) does what we need here. We could just mark
> elog_start() and errstart() as cold, and afaict that should do the
> trick.

I had actually been thinking about this again today. I had previously
not thought __attribute__((cold)) would be a good idea, since if we
mark elog_start() with that, then code paths with elog(NOTICE) and
elog(LOG), not to mention DEBUG, would be marked as cold. Today I was
thinking, to get around that issue, we might be able to generate
another thin wrapper around elog_start() and mark that as
__attribute__((cold)) and fudge the macro a bit to call that function
instead if it can detect a compile time const level >= ERROR. I've not
looked at the code again to remind myself if that would be possible.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An unlikely() experiment

2017-10-30 Thread Andres Freund
Hi,

On 2015-12-20 02:49:13 +1300, David Rowley wrote:
> Alternatively, if there was some way to mark the path as cold from within
> the path, rather than from the if() condition before the path, then we
> could perhaps do something in the elog() macro instead. I just couldn't
> figure out a way to do this.

I just was thinking about this, and it turns out that
__attribute__((cold)) does what we need here. We could just mark
elog_start() and errstart() as cold, and afaict that should do the
trick.

Thanks to Jakub from #gcc for pointing that out.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git down

2017-10-30 Thread Magnus Hagander
On Mon, Oct 30, 2017 at 2:35 AM, Andreas Karlsson  wrote:

> On 10/27/2017 10:51 PM, Erik Rijkers wrote:
>
>> git.postgresql.org is down/unreachable
>>
>> ( git://git.postgresql.org/git/postgresql.git )
>>
>
> Yes, I noticed this too, but https://git.postgresql.org/git/postgresql.git
> still works fine.
>
> I guess it makes sense to remove unencrypted access, but in that case
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary should not
> advertise supporting the git protocol. I have not seen any announcement
> either, but that could just be me not paying enough attention.


We definitely still support the unencrypted git protocol. I do recommend
using https, but that doesn't mean git shouldn't work. There seem to have
been some network issues, and the git daemon doesn't do a very good job of
handling hung sessions. I've cleaned up for now and it seems to be working
again, and we'll do some more digging into what the root cause actually
was.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-30 Thread Ashutosh Bapat
On Thu, Oct 26, 2017 at 7:41 PM, Masahiko Sawada  wrote:
>
> Because I don't want to break the current user semantics. that is,
> currently it's guaranteed that the subsequent reads can see the
> committed result of previous writes even if the previous transactions
> were distributed transactions. And it's ensured by writer side. If we
> can make the reader side ensure it, the backend process don't need to
> wait for the resolver process.
>
> The waiting backend process are released by resolver process after the
> resolver process tried to resolve foreign transactions. Even if
> resolver process failed to either connect to foreign server or to
> resolve foreign transaction the backend process will be released and
> the foreign transactions are leaved as dangling transaction in that
> case, which are processed later. Also if resolver process takes a long
> time to resolve foreign transactions for whatever reason the user can
> cancel it by Ctl-c anytime.
>

So, there's no guarantee that the next command issued from the
connection *will* see the committed data, since the foreign
transaction might not have committed because of a network glitch
(say). If we go this route of making backends wait for resolver to
resolve the foreign transaction, we will have to add complexity to make
sure that the waiting backends are woken up in problematic events like
crash of the resolver process OR if the resolver process hangs in a
connection to a foreign server etc. I am not sure that the complexity
is worth the half-guarantee.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Robert Haas
On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
> Nothing I am proposing blocks later work.

That's not really true.  Nobody's going to be happy if MERGE has one
behavior in one set of cases and an astonishingly different behavior
in another set of cases.  If you adopt a behavior for certain cases
that can't be extended to other cases, then you're blocking a
general-purpose MERGE.

And, indeed, it seems that you're proposing an implementation that
adds no new functionality, just syntax compatibility.  Do we really
want or need two syntaxes for the same thing in core?  I kinda think
Peter might have the right idea here.  Under his proposal, we'd be
getting something that is, in a way, new.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Jsonb transform for pl/python

2017-10-30 Thread Anthony Bykov
On Sun, 29 Oct 2017 19:11:02 +0100
David Fetter  wrote:

> Thanks for your hard work!
> 
> Should there also be one for PL/Python3U?
> 
> Best,
> David.
Hi.
Actually, there is one for PL/Python3U. This patch contains the following
extensions:
jsonb_plpythonu
jsonb_plpython2u
jsonb_plpython3u
"make install" checks which Python major version your PostgreSQL was
configured with and installs the corresponding extension.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-10-30 Thread Rajkumar Raghuwanshi
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> In the previous versions, RT index of the table needed to be passed to
> partition.c, which I realized is no longer needed, so I removed that
> requirement from the interface.  As a result, patches 0002 and 0003 have
> changed in this version.
>

Thanks for the fix.

I am getting wrong output when the default partition is further
sub-partitioned; below is a test case.

CREATE TABLE lpd(a int, b varchar, c float) PARTITION BY LIST (a);
CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST(a);
CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
INSERT INTO lpd SELECT i,i,i FROM generate_Series (1,12)i;
INSERT INTO lpd VALUES (null,null,null);

--on HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
                  QUERY PLAN
-----------------------------------------------
 Sort
   Sort Key: ((lpd_p1.tableoid)::regclass)
   ->  Result
         ->  Append
               ->  Seq Scan on lpd_p1
                     Filter: (a IS NOT NULL)
               ->  Seq Scan on lpd_p2
                     Filter: (a IS NOT NULL)
               ->  Seq Scan on lpd_d3
                     Filter: (a IS NOT NULL)
               ->  Seq Scan on lpd_d1
                     Filter: (a IS NOT NULL)
               ->  Seq Scan on lpd_d2
                     Filter: (a IS NOT NULL)
(14 rows)

postgres=#
postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
 tableoid | a  | b  | c
----------+----+----+----
 lpd_p1   |  1 | 1  |  1
 lpd_p1   |  2 | 2  |  2
 lpd_p1   |  3 | 3  |  3
 lpd_p2   |  4 | 4  |  4
 lpd_p2   |  5 | 5  |  5
 lpd_d1   |  7 | 7  |  7
 lpd_d1   |  8 | 8  |  8
 lpd_d1   |  9 | 9  |  9
 lpd_d2   | 12 | 12 | 12
 lpd_d2   | 10 | 10 | 10
 lpd_d2   | 11 | 11 | 11
 lpd_d3   |  6 | 6  |  6
(12 rows)


--on HEAD + v8 patches

postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
                  QUERY PLAN
-----------------------------------------------
 Sort
   Sort Key: ((lpd_p1.tableoid)::regclass)
   ->  Result
         ->  Append
               ->  Seq Scan on lpd_p1
                     Filter: (a IS NOT NULL)
               ->  Seq Scan on lpd_p2
                     Filter: (a IS NOT NULL)
(8 rows)

postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
 tableoid | a | b | c
----------+---+---+---
 lpd_p1   | 1 | 1 | 1
 lpd_p1   | 2 | 2 | 2
 lpd_p1   | 3 | 3 | 3
 lpd_p2   | 4 | 4 | 4
 lpd_p2   | 5 | 5 | 5
(5 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] parallelize queries containing initplans

2017-10-30 Thread tushar

On 10/30/2017 09:02 AM, Amit Kapila wrote:

Thanks a lot Tushar for testing this patch.  In the latest patch, I
have just rebased some comments, there is no code change, so I don't
expect any change in behavior, but feel free to test it once again.


Thanks Amit. Sure.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Michael Paquier
On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers  wrote:
> The Solution:
> The solution is a whitelist of directories specified which are the only ones
> which are synchronised.  The relevant part of this patch is:
>
> +/* List of directories to synchronize:
> + * base data dirs (and tablespaces)
> + * wal/transaction data
> + * and that is it.
> + *
> + * This array is null-terminated to make
> + * it easy to expand
> + */
>
> +const char *rewind_dirs[] = {
> +"base",
> +"global",
> +"pg_commit_ts",
> +"pg_logical",
> +"pg_multixact",
> +"pg_serial",
> +"pg_subtrans",
> +"pg_tblspc",
> +"pg_twophase",
> +"pg_wal",
> +"pg_xact",
> +NULL
> +};

The problem with a whitelist, and one reason why I do not favor it, is
the case where a feature adds a new path for its data inside the data
folder, particularly when that path is critical for a base backup. If
the folder is not added to this list, then a rewind may be silently
corrupted as well. There is also a set of directories in this list that
we do not care about, pg_serial being one. pg_subtrans is a second, as
it gets zeroed on startup.

And if you think about it, pg_rewind does essentially the *same*
processing as a base backup taken using the replication protocol. So
instead of this patch I would recommend using excludeFiles and
excludeDirContents, by moving those lists to a common header that both
frontend and backend can refer to. Note that basebackup.h includes
replnodes.h, so my thought is that you should split the current header
into something like basebackup_internal.h, which is backend-only, and
have pg_rewind fetch the list of directories to automatically ignore
from there. You can also simplify the pg_rewind code a bit where it
special-cases things like postmaster.opts. On top of that, there is
visibly no need to tweak the SQL query fetching the directory list
(which is useful for debugging as shaped now, actually); just change
process_source_file so that such content is not added to the file map.
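To illustrate the direction, a minimal sketch of such a shared filter is
below. The array contents and the helper name are illustrative only; a
real patch would expose the backend's existing excludeFiles and
excludeDirContents arrays through a common header rather than redefine
anything, and pg_rewind's process_source_file() would consult the filter
before adding an entry to its file map.

#include <stdbool.h>
#include <string.h>

/* Hedged sketch: a trimmed-down exclude list, not the real basebackup.c one. */
static const char *const exclude_dir_contents[] = {
    "pg_dynshmem",
    "pg_notify",
    "pg_serial",
    "pg_snapshots",
    "pg_stat_tmp",
    "pg_subtrans",
    NULL
};

/* Return true if "path" (relative to the data directory) should be skipped. */
static bool
path_is_excluded(const char *path)
{
    int         i;

    for (i = 0; exclude_dir_contents[i] != NULL; i++)
    {
        size_t      len = strlen(exclude_dir_contents[i]);

        if (strncmp(path, exclude_dir_contents[i], len) == 0 &&
            (path[len] == '\0' || path[len] == '/'))
            return true;
    }
    return false;
}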

Then there is a second case: folders that you may not want included,
even though including them is useful by default, an example being
pg_wal, where one might want to start with an empty directory. A third
case is folders that you do not care about, but that depends on the
deployment, "log" for example. For those you could just add an
--exclude-path option which simply piles up paths into a linked list
when called multiple times. This could happen in a second patch.

> From there we iterate over this array for directory-based approaches in
> copy_fetch.c and with one query per directory in libpq_fetch.c.  This also
> means shifting from the basic interface from PQexec to PQexecParams.  It
> would be possible to move to binary formats too, but this was not done
> currently in order to simplify code review (that could be a separate
> independent patch at a later time).

-res = PQexec(conn, sql);
+for (p = 0; rewind_dirs[p] != NULL; ++p)
+{
+const char *paths[1];
+paths[0] = rewind_dirs[p];
+res = PQexecParams(conn, sql,  1, NULL, paths, NULL, NULL, 0);
Calling the query to list all directories multiple times is messy IMO,
and it multiplies the cost of the processing if there are many files to
look at, say many relation files.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-30 Thread Ashutosh Bapat
On Fri, Oct 27, 2017 at 1:36 AM, Tom Lane  wrote:
> David Rowley  writes:
>> On 27 October 2017 at 01:44, Ashutosh Bapat
>>  wrote:
>>> I think Antonin has a point. The join processing may deem some base
>>> relations dummy (See populate_joinrel_with_paths()). So an Append
>>> relation which had multiple child alive at the end of
>>> set_append_rel_size() might ultimately have only one child after
>>> partition-wise join has worked on it.
>
> TBH, that means partition-wise join planning is broken and needs to be
> redesigned.  It's impossible that we're going to make sane planning
> choices if the sizes of relations change after we've already done some
> planning work.

The mark_dummy_rel() call that marks a joining relation dummy was
added by 719012e013f10f72938520c46889c52df40fa329. The joining
relations are planned before the joins they participate in are
planned. Further, some of the joins in which a relation participates
might have been planned before populate_joinrel_with_paths() processes
the joining pair that causes it to be marked dummy. So we already have
places where the size of a relation changes during planning.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel worker error

2017-10-30 Thread Amit Kapila
On Mon, Oct 30, 2017 at 10:04 AM, Robert Haas  wrote:
> On Mon, Oct 30, 2017 at 8:25 AM, Amit Kapila  wrote:
>> Thanks.  I have closed this entry in CF app, however, I am not sure
>> what is the best way to deal with the entry present in PostgreSQL 10
>> Open Items page[1].  The entry for this bug seems to be present in
>> Older Bugs section.  Shall we leave it as it is or do we want to do
>> something else with it?
>>
>> [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items
>
> How about just adding an additional bullet point for that item that
> says "fixed by commit blah blah for 10.1"?
>

Sounds good, so I have updated the Open items page accordingly.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reading timeline from pg_control on replication slave

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 2:48 AM, Craig Ringer  wrote:
> (I'd quite like ThisTimeLineID to go away as a global. It's messy and
> confusing, and I'd much rather it be fetched when needed).

Yeah, I agree. My take on the matter is that we could just use the
status data of the control file, which is in shared memory, as the only
writers to it are the checkpointer and the startup process.
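A minimal sketch of reading it that way is below. The function name is
invented, and since ControlFile is private to xlog.c this would have to
live there (or be exposed through an accessor); treat it as an
illustration rather than a concrete proposal.

/* Hedged sketch: read the last checkpoint's timeline from shared memory. */
static TimeLineID
get_controlfile_timeline(void)
{
    TimeLineID  tli;

    LWLockAcquire(ControlFileLock, LW_SHARED);
    tli = ControlFile->checkPointCopy.ThisTimeLineID;
    LWLockRelease(ControlFileLock);

    return tli;
}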
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-10-30 Thread Masahiko Sawada
On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas  wrote:
> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada  
> wrote:
>> Since the previous patch conflicts with current HEAD, I attached the
>> updated patch for next CF.
>
> I think we should back up here and ask ourselves a couple of questions:

Thank you for summarizing the purpose and discussion of this patch.

> 1. What are we trying to accomplish here?
>
> 2. Is this the best way to accomplish it?
>
> To the first question, the problem as I understand it as follows:
> Heavyweight locks don't conflict between members of a parallel group.
> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
> don't arise, because parallel operations are strictly read-only
> (except for inserts by the leader into a just-created table, when only
> one member of the group can be taking the lock anyway).  However, once
> we allow writes, they become possible, so some solution is needed.
>
> To the second question, there are a couple of ways we could fix this.
> First, we could continue to allow these locks to be taken in the
> heavyweight lock manager, but make them conflict even between members
> of the same lock group.  This is, however, complicated.  A significant
> problem (or so I think) is that the deadlock detector logic, which is
> already quite hard to test, will become even more complicated, since
> wait edges between members of a lock group need to exist at some times
> and not other times.  Moreover, to the best of my knowledge, the
> increased complexity would have no benefit, because it doesn't look to
> me like we ever take any other heavyweight lock while holding one of
> these four kinds of locks.  Therefore, no deadlock can occur: if we're
> waiting for one of these locks, the process that holds it is not
> waiting for any other heavyweight lock.  This gives rise to a second
> idea: move these locks out of the heavyweight lock manager and handle
> them with separate code that does not have deadlock detection and
> doesn't need as many lock modes.  I think that idea is basically
> sound, although it's possibly not the only sound idea.

I'm on the same page.

>
> However, that makes me wonder whether we shouldn't be a bit more
> aggressive with this patch: why JUST relation extension locks?  Why
> not all four types of locks listed above?  Actually, tuple locks are a
> bit sticky, because they have four lock modes.  The other three kinds
> are very similar -- all you can do is "take it" (implicitly, in
> exclusive mode), "try to take it" (again, implicitly, in exclusive
> mode), or "wait for it to be released" (i.e. share lock and then
> release).  Another idea is to try to handle those three types and
> leave the tuple locking problem for another day.
>
> I suggest that a good thing to do more or less immediately, regardless
> of when this patch ends up being ready, would be to insert an
> insertion that LockAcquire() is never called while holding a lock of
> one of these types.  If that assertion ever fails, then the whole
> theory that these lock types don't need deadlock detection is wrong,
> and we'd like to find out about that sooner or later.

Understood. I'll check that first. If this direction has no problem and
we change these three lock types to use the new lock mechanism, we will
no longer be able to take any other heavyweight lock while holding one
of them. Since that imposes a limitation on future development, we
should think carefully about it. We could implement a deadlock detection
mechanism for them again later, but that wouldn't make much sense.
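As a starting point for that check, something along these lines (purely
illustrative; none of these names exist in the tree today, and the new
lock code would have to maintain the counter itself) could back an
assertion at the top of LockAcquire():

/*
 * Hedged sketch only: count how many "no deadlock detection needed" locks
 * (relation extension, page, speculative token) this backend currently
 * holds, and assert in LockAcquire() that none are held.
 */
static int  held_nodeadlock_locks = 0;

static inline void
nodeadlock_lock_acquired(void)
{
    held_nodeadlock_locks++;
}

static inline void
nodeadlock_lock_released(void)
{
    Assert(held_nodeadlock_locks > 0);
    held_nodeadlock_locks--;
}

/* at the top of LockAcquire(), per the suggestion above */
Assert(held_nodeadlock_locks == 0);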

>
> On the details of the patch, it appears that RelExtLockAcquire()
> executes the wait-for-lock code with the partition lock held, and then
> continues to hold the partition lock for the entire time that the
> relation extension lock is held.  That not only makes all code that
> runs while holding the lock non-interruptible but makes a lot of the
> rest of this code pointless.  How is any of this atomics code going to
> be reached by more than one process at the same time if the entire
> bucket is exclusive-locked?  I would guess that the concurrency is not
> very good here for the same reason.  Of course, just releasing the
> bucket lock wouldn't be right either, because then ext_lock might go
> away while we've got a pointer to it, which wouldn't be good.  I think
> you could make this work if each lock had both a locker count and a
> pin count, and the object can only be removed when the pin_count is 0.
> So the lock algorithm would look like this:
>
> - Acquire the partition LWLock.
> - Find the item of interest, creating it if necessary.  If out of
> memory for more elements, sweep through the table and reclaim
> 0-pin-count entries, then retry.
> - Increment the pin count.
> - Attempt to acquire the lock atomically; if we succeed, release the
> partition lock and return.
> - If this was a