Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-23 Thread Michael Paquier
On Thu, Mar 24, 2016 at 5:18 AM, Petr Jelinek  wrote:
> Hmm I see you are right, I missed the last couple emails. Ok I'll mark it
> ready for committer - it does work fine on my vs2015 machine and I am happy
> with the code too.

Thanks, let's see what others have to say.

> Well, as happy as I can be given the locale mess.

We are responsible for that, that's the frustrating part :)
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Show dropped users' backends in pg_stat_activity

2016-03-23 Thread Kyotaro HORIGUCHI
Hi,

At Tue, 22 Mar 2016 22:47:16 -0500, Jim Nasby  wrote 
in <56f211c4.6010...@bluetreble.com>
> On 3/22/16 10:35 PM, Kyotaro HORIGUCHI wrote:
> >> Even if we maintained some interlock for a backend's login role
> >> identity, I hardly think it would be practical to e.g. lock during
> >> transient SET ROLE or security-definer-function-call operations.  So
> >> it's not like we can let the permissions system assume that a role OID
> >> being inquired about always matches a live entry in pg_authid.
> > Even if blocking DROPs is not perfect for all cases,
> > unconditionally allowing a role to be dropped still doesn't seem
> > proper behavior, especially for replication roles. And session logins
> > seem to me to have enough reason to be treated differently from
> > disguising as another role using SET ROLE or sec-definer.
> 
> There's probably a way this could be handled, since DROP ROLE is
> presumably a very uncommon operation. Perhaps something as simple as
> keeping a single OID in shared memory for the role about to be
> dropped. That would serialize role drops, but I doubt that matters.

The OID in shared memory plays the same role in this patch as a
pg_authid tuple with that OID. So it seems to need a lock or a
retry mechanism, or we would see a message something like this :p

| DROP ROLE: Another role is concurrently being dropped.
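
For illustration, the single-OID interlock Jim describes might look
something like this minimal sketch (DropRoleLock and PendingDropRoleOid
are hypothetical names here, not existing PostgreSQL symbols):

	/* Sketch: serialize role drops through one shared OID slot. */
	static void
	begin_drop_role(Oid roleid)
	{
		LWLockAcquire(DropRoleLock, LW_EXCLUSIVE);
		if (PendingDropRoleOid != InvalidOid)
		{
			LWLockRelease(DropRoleLock);
			ereport(ERROR,
					(errmsg("DROP ROLE: another role is concurrently being dropped")));
		}
		PendingDropRoleOid = roleid;	/* cleared once the drop completes */
		LWLockRelease(DropRoleLock);
	}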

> > The attached patch blocks DROP ROLE for roles that own active
> > sessions, and on the other hand prevents a session from being
> > activated if the login role is concurrently dropped.
> 
> I think this is fine for now, but... what happens if you drop a role
> that's in use on a streaming replica? Does replay stall or do we just
> ignore it?

It behaves the same as for ordinary backends. DROP ROLE fails for
the session role of any active walsender, and a new walsender
rejects login attempts by a role that is being dropped.

> There should probably be some doc changes to go with the patch too,
> no?

Yes, this is a PoC. I'll provide documentation if this is
acceptable and necessary. Would "20.4 Dropping Roles" be the
appropriate place?

http://www.postgresql.org/docs/9.5/static/role-removal.html

Treating a session as an object dependent on the role could be
cleaner, but that may be too complex and fragile...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-23 Thread Amit Kapila
On Thu, Mar 24, 2016 at 8:08 AM, Amit Kapila 
wrote:
>
> On Thu, Mar 24, 2016 at 5:40 AM, Andres Freund  wrote:
> >
> > Even after changing to scale 500, the performance benefits on this,
> > older 2 socket, machine were minor; even though contention on the
> > ClogControlLock was the second most severe (after ProcArrayLock).
> >
>

One more point I wanted to make here is that I think the benefit will show
up mainly when contention on the ClogControlLock is greater than or close
to that on ProcArrayLock; otherwise, even if the patch reduces contention
(as you can see via the LWLock stats), performance doesn't increase.  From
Mithun's data [1] on LWLocks, it seems that at 88 clients in his test the
contention on CLOGControlLock becomes greater than that on ProcArrayLock,
and that is the point where it starts showing a noticeable performance
gain.  I have explained this point some more on that thread [2].  Is it
possible for you to test once in a similar situation (e.g. with a client
count greater than the number of cores) and look at the behaviour w.r.t.
lock contention and TPS?


[1] -
http://www.postgresql.org/message-id/cad__ouh6pahj+q1mzzjdzlo4v9gjyufegtjnayxc0_lfh-4...@mail.gmail.com
[2] -
http://www.postgresql.org/message-id/caa4ek1lboq4e3ycge+fe0euzvu89cqgtugneajoienxjr0a...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-23 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Mar 14, 2016 at 6:44 PM, Peter Geoghegan  wrote:
>> I can produce a back-patchable variant of this if you and Peter E.
>> think this approach is okay.

> Where are we on this?

I'm generally okay with the approach used in
http://www.postgresql.org/message-id/CAM3SWZS0iV1HQ2fqNxvmTZm4+hPLAfH=7LeC4znmFX8az=a...@mail.gmail.com
though I have not reviewed that patch in detail.

One minor thought is that maybe it'd be better to do this:

errno = 0;
+   ERR_clear_error();

in the other order, so as not to assume that ERR_clear_error doesn't
set errno.  On the other hand, if it does, things are probably hopelessly
broken anyway; so I'm not sure there is any case where this helps.
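
In code form, that ordering would simply be (a sketch):

	/* Clear OpenSSL's per-thread error queue first, then reset errno,
	 * so we don't depend on ERR_clear_error() leaving errno alone. */
	ERR_clear_error();
	errno = 0;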

regards, tom lane




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-23 Thread Etsuro Fujita

On 2016/03/24 11:14, Michael Paquier wrote:

On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown  wrote:

I've noticed that you now can't cancel a query if there's DML pushdown
to a foreign server.  This previously worked while it was sending
individual statements as it interrupted and rolled it back.

Here's what the local server sees when trying to cancel:

# DELETE FROM remote.contacts;
^CCancel request sent
DELETE 500

This should probably be fixed.



Looking at what has been committed, execute_dml_stmt is using
PQexecParams, so we'd want to use an asynchronous call and loop on
PQgetResult with CHECK_FOR_INTERRUPTS() in it.


Will fix.

Thanks for the report, Thom!  Thanks for the advice, Michael!

Best regards,
Etsuro Fujita






Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Masahiko Sawada
On Thu, Mar 24, 2016 at 11:34 AM, Fujii Masao  wrote:
> On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
>  wrote:
>> Hello,
>>
>> At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao  
>> wrote in 

Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-03-23 Thread Etsuro Fujita

On 2016/03/23 13:44, Ashutosh Bapat wrote:

An FDW can choose not to use those functions, so I don't see a
connection between scan list having simple Vars and existence of those
functions (actually a single one). But having those function would
minimize the code that each FDW has to write, in case they want those
functions. E.g. we have to translate Var::varno to tableoid in case
that's requested by pulling RTE and then getting oid out from there. If
that functionality is available in the core, 1. the code is not
duplicated 2. every FDW will get the same tableoid. Similarly for the
other columns.


OK.  Then, I'd like to propose a function that would create integer
Lists of the indexes of tableoids, xids, and cids, plus an OID List of
those tableoids, in a given fdw_scan_tlist, on the assumption that each
expression in the fdw_scan_tlist is a simple Var.  I'd also like to
propose another function that would fill in system columns using these
Lists when creating a scan tuple.


Best regards,
Etsuro Fujita






Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Tom Lane
I wrote:
> Having said that, I also notice these dependencies:
> ...
> src/bin/initdb/encnames.c -> ../../../src/backend/utils/mb/encnames.c
> src/interfaces/libpq/encnames.c -> ../../../src/backend/utils/mb/encnames.c
> ...
> which seem like they'd be better handled by moving those files into
> src/common/.

After poking around a bit, it seems like it ought to be a win to move
both encnames.c and wchar.c into src/common/, as both of those modules
are meant to be built for both backend and frontend environments.  The
stumbling block is pg_wchar.h, which is a total fail modularity-wise,
as it does not bother to distinguish stuff which should be visible to the
general backend from stuff which should be visible to frontend programs
from stuff which is solely of interest to encoding conversion logic (and
in many cases only to *particular* conversions).  I think we should not do
this move without figuring out which parts of that file sanely belong in a
src/include/common/ header file and which don't.  That's not something
that I feel like tackling right now, as it's been in that same messy state
for awhile and recent changes haven't made it worse.  But it ought to be
revisited at some point.

regards, tom lane




Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Peter Geoghegan
On Wed, Mar 23, 2016 at 8:05 PM, Tomas Vondra
 wrote:
> FWIW, maintenance_work_mem was set to 1GB on the i5 machine and 256MB on the
> Xeon. Hmm, maybe that's why we see no difference for CREATE INDEX on the i5,
> and an improvement on the Xeon.

That would explain it.

> Not a big deal - it's easy enough to change the config and repeat the
> benchmark. Are there any particular replacement_sort_mem values that you
> think would be interesting to configure?

I would start with replacement_sort_mem=64, i.e. 64KB, which effectively disables it.

> I have to admit I'm a bit afraid we'll introduce a new GUC that only very
> few users will know how to set properly, and so most people will run with
> the default value or set it to something stupid.

I agree.

> Are you saying none of the queries triggers the memory context resets? What
> queries would trigger that (to test the theory)?

They will still do the context resetting and so on just the same, but
would use a heap for the first attempt. But replacement_sort_mem=64
would let us know that

> But if we care about 9.5 -> 9.6 regressions, then perhaps we should include
> that commit into the benchmark, because that's what the users will see? Or
> have I misunderstood the second part?

I think it's good that you didn't test the March 17 commit of the
memory batching to the master branch when testing the master branch.
You should continue to do that, because we care about regressions
against 9.5 only. The only issue insofar as what code was tested is
that replacement_sort_mem was not set to 64 (to effectively disable
any use of the heap by the patch). I would like to see if we can get
rid of replacement_sort_mem without causing any real regressions,
which I think the memory context reset stuff makes possible.

There was a new version of my quicksort patch posted after March 17,
but don't worry about it -- that's totally cosmetic. Some minor
tweaks.

> BTW which patch does the memory batching? A quick search through git log did
> not return any recent patches mentioning these terms.

Commit 0011c0091e886b874e485a46ff2c94222ffbf550. But, like I said,
avoid changing what you're testing as master; do not include that. The
patch set you were testing is fine. Nothing is missing.

> As I mentioned above, I haven't realized work_mem does not matter for CREATE
> INDEX, and maintenance_work_mem was set to a fixed value for the whole test.
> And the two machines used different values for this particular configuration
> value - Xeon used just 256MB, while i5 used 1GB. So while on i5 it was just
> a single chunk, on Xeon there were multiple batches. Hence the different
> behavior.

Makes sense. Obviously this should be avoided, though.

> No it doesn't add overhead. The script actually does
>
> COPY (query) TO '/dev/null'
>
> on the server for all queries (except for the CREATE INDEX, obviously), so
> there should be pretty much no overhead due to transferring rows to the
> client and so on.

That still adds overhead, because the output functions are still used
to create a textual representation of data. This was how Andres tested
the improvement to the timestamptz output function committed to 9.6,
for example.

>> If you really wanted to make the patch look good, a sort with 5GB of
>> work_mem is the best way, FWIW. The heap data structure used by the
>> master branch tuplesort.c will handle that very badly. You use no
>> temp_tablespaces here. I wonder if the patch would do better with
>> that. Sorting can actually be quite I/O bound with the patch
>> sometimes, where it's usually CPU/memory bound with the heap,
>> especially with lots of work_mem. More importantly, it would be more
>> informative if the temp_tablespace was not affected by I/O from
>> Postgres' heap.
>
>
> I'll consider testing that. However, I don't think there was any significant
> I/O on the machines - particularly not on the Xeon, which has 16GB of RAM.
> So the temp files should fit into that quite easily.

Right, but with a bigger sort, there might well be more I/O.
Especially for the merge. It might be that that holds back the patch
from doing even better than the master branch does.

> The i5 machine has only 8GB of RAM, but it has 6 SSD drives in raid0. So I
> doubt it was I/O bound.

These patches can sometimes be significantly I/O bound on my laptop,
where that didn't happen before. Sounds unlikely here, though.

>> I also like seeing a sample of "trace_sort = on" output. I don't
>> expect you to carefully collect that in every case, but it can tell
>> us a lot about what's really going on when benchmarking.
>
>
> Sure, I can collect that.

Just for the interesting cases. Or maybe just dump it all and let me
figure it out for myself. trace_sort output shows me how many runs
there are, how abbreviation fared, how memory was used, and even if the
sort was I/O bound at various stages (it dumps some getrusage stats to
the log, too). You can usually tell exactly what happened for 

Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Tomas Vondra

Hi,

On 03/24/2016 03:00 AM, Peter Geoghegan wrote:

On Tue, Mar 22, 2016 at 3:35 PM, Tomas Vondra
 wrote:

I've tested the patch you've sent on 2016/3/11, which I believe is the last
version. I haven't tuned replacement_sort_mem at all, because
my understanding was that it's not 9.6 material (per your
message). So my intent was to test the configuration people are
likely to use by default.


I meant that using replacement selection in a special way with
CREATE INDEX was not 9.6 material. But replacement_sort_mem is. And
so, any case with the (maintenance)_work_mem <= 16MB will have used a
heap for the first run.


FWIW, maintenance_work_mem was set to 1GB on the i5 machine and 256MB on 
the Xeon. Hmm, maybe that's why we see no difference for CREATE INDEX on 
the i5, and an improvement on the Xeon.




I'm sorry I did not make a point of telling you this. It's my fault.
The result in any case is that pre-sorted cases will be similar with
and without the patch, since replacement selection can thereby make
one long run. But on non-sorted cases, the patch helps less because
it is in use less -- with not so much data overall, possibly much
less (which I think explains why the 1M row tests seem so much less
interesting than the 10M row tests).


Not a big deal - it's easy enough to change the config and repeat the 
benchmark. Are there any particular replacement_sort_mem values that you 
think would be interesting to configure?


I have to admit I'm a bit afraid we'll introduce a new GUC that only 
very few users will know how to set properly, and so most people will 
run with the default value or set it to something stupid.




I worry that at the low end, replacement_sort_mem makes the patch
have one long run, but still some more other runs, so merging is
unbalanced. We should consider if the patch can beat the master
branch at the low end without using a replacement selection heap. It
would do better in at least some cases in low memory conditions,
possibly a convincing majority of cases. I had hoped that my recent
idea (since committed) of resetting memory contexts would help a lot
with regressions when work_mem is very low, and that particular
theory isn't really tested here.


Are you saying none of the queries triggers the memory context resets? 
What queries would trigger that (to test the theory)?





I'm not sure which commit are you referring to. The benchmark was
done on a414d96a (from 2016/3/10). However I'd expect that to
affect both sets of measurements, although it's possible that it
affects the patched version differently.


You did test the right patches. It just so happens that the master
branch now has the memory batching stuff, so it doesn't get
credited with that. I think this is good, though, because we care
about 9.5 -> 9.6 regressions.


So there's a commit in master (but not in 9.5), adding memory batching, 
but it got committed before a414d96a so the benchmark does not measure 
its impact (with respect to 9.5). Correct?


But if we care about 9.5 -> 9.6 regressions, then perhaps we should 
include that commit into the benchmark, because that's what the users 
will see? Or have I misunderstood the second part?


BTW which patch does the memory batching? A quick search through git log 
did not return any recent patches mentioning these terms.



Improvement ratio (master time/patched time) for Xeon 10 million row
case "SELECT * FROM int_test_padding ORDER BY a DESC":

For work_mem of 8MB = 0.83, 32MB = 0.62, 128MB = 0.52, 512MB = 0.47,
1024MB = 1.00

So, it gets faster than the master branch as more memory is
available, but then it goes to 1.00 -- a perfect draw. I think that
this happened simply because at that point, the sort was an internal
sort (even though similar CREATE INDEX case did not go internal at
the same point). The (internal) 1024MB case is not that much faster
than the 512MB external case, which is pretty good.


Indeed.



There are also "near draws", where the ratio is 0.98 or so. I think
that this is because abbreviation is aborted, which can be a problem
with synthetic data + text -- you get a very slow sort either way,


That is possible, yes. It's true that the worst regressions are on text, 
although there are a few on numeric too (albeit not as significant).



where most time is spent calling strcoll(), and cache
characteristics matter much less. Those cases seemingly take much
longer overall, so this theory makes sense. Unfortunately,
abbreviated keys for text that is not C locale text was basically
disabled across the board today due to a glibc problem. :-(


Yeah. Bummer :-(



Whenever I see that the patch is exactly as fast as the master
branch, I am skeptical. I am particularly skeptical of all i5
results (including 10M cases), because the patch seems to be almost
perfectly matched to the master branch for CREATE INDEX cases (which
are the best cases for the patch on your Xeon server) -- it's much
easier to 

Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 9:43 PM, Amit Kapila  wrote:
> RecordAndGetPageWithFreeSpace() tries to search from the oldPage passed to
> it, rather than from the top, so even if RecordPageWithFreeSpace() doesn't
> update up to the root, it will be able to find the newly added page.  I agree
> with what you said in another mail that we should introduce a new
> API to do a more targeted search for such cases.

OK, let's do that, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-23 Thread Amit Kapila
On Thu, Mar 24, 2016 at 5:40 AM, Andres Freund  wrote:
>
> On 2016-03-23 21:43:41 +0100, Andres Freund wrote:
> > I'm playing around with SELECT txid_current(); right now - that should
> > be about the most specific load for setting clog bits.
>
> Or so I thought.
>
> In my testing that showed just about zero performance difference between
> the patch and master. And more surprisingly, profiling showed very
> little contention on the control lock. Hacking
> TransactionIdSetPageStatus() to return without doing anything, actually
> only showed minor performance benefits.
>
> > [there's also the fact that txid_current() indirectly acquires two
> > lwlocks twice, which showed up more prominently than the control lock,
> > but that I could easily hack around by adding an xid_current().]
>
> Similar with an INSERT only workload. And a small scale pgbench.
>
>
> Looking through the thread showed that the positive results you'd posted
> all were with relatively big scale factors.
>

I have seen smaller benefits at 300 scale factor and somewhat larger
benefits at 1000 scale factor.  Also Mithun has done similar testing with
unlogged tables and the results of same [1] also looks good.

>
> Which made me think. Running
> a bigger pgbench showed that most of the interesting (i.e. long) lock waits
> were both via TransactionIdSetPageStatus *and* TransactionIdGetStatus().
>

Yes, this is same what I have observed as well.

>
> So I think what happens is that once you have a big enough table, the
> UPDATEs that standard pgbench does start to often hit *old* xids (in unhinted
> rows). Thus old pages have to be read in, potentially displacing slru
> content needed very shortly after.
>
>
> Have you, in your evaluation of the performance of this patch, done
> profiles over time? I.e. whether the performance benefits are there
> immediately, or only after a significant amount of test time? Comparing
> TPS over time, for both patched/unpatched looks relevant.
>

I have mainly done it with half-hour read-write tests.  What do you want
to observe via shorter tests?  They sometimes give inconsistent data for
read-write tests.

>
> Even after changing to scale 500, the performance benefits on this,
> older 2 socket, machine were minor; even though contention on the
> ClogControlLock was the second most severe (after ProcArrayLock).
>

I have tried this patch mainly on an 8-socket machine at scale factors 300
and 1000.  I am hoping that you have tried this test on unlogged tables;
by the way, at what client count have you seen these results?

> Afaics that squares with Jesper's result, which basically also didn't
> show a difference either way?
>

One difference is that I think Jesper did his testing with
synchronous_commit off, whereas my tests were with synchronous_commit
on.

>
> I'm afraid that this patch might be putting a bandaid on some of the
> absolutely worst cases, without actually addressing the core
> problem. Simon's patch in [1] seems to come closer to addressing that
> (which I don't believe is safe without doing every status
> manipulation atomically, as individual status bits are smaller than 4
> bytes).  Now it's possible to argue that the bandaid might slow the
> bleeding to a survivable level, but I have to admit I'm doubtful.
>
> Here's the stats for a -s 500 run btw:
>  Performance counter stats for 'system wide':
> 18,747  probe_postgres:TransactionIdSetTreeStatus
> 68,884  probe_postgres:TransactionIdGetStatus
>  9,718  probe_postgres:PGSemaphoreLock
> (the PGSemaphoreLock is over 50% ProcArrayLock, followed by ~15%
> SimpleLruReadPage_ReadOnly)
>
>
> My suspicion is that a better approach for now would be to take Simon's
> patch, but add a (per-page?) 'ClogModificationLock'; to avoid the need
> of doing something fancier in TransactionIdSetStatusBit().
>

I think we can try that as well, and if you see better results with that
approach, then we can use it instead of the current patch.


[1] -
http://www.postgresql.org/message-id/cad__oujrdwqdjdovhahqldg-6ivu6ibci9ij1qpu6atuqpl...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Peter Geoghegan
On Tue, Mar 22, 2016 at 2:27 PM, Tomas Vondra
 wrote:
> For example these two queries got almost 2x as slow for some data sets:
>
> SELECT a FROM numeric_test UNION SELECT a FROM numeric_test_padding
> SELECT a FROM text_test UNION SELECT a FROM text_test_padding
>
> I assume the slowdown is related to the batching (as it's only happening for
> low work_mem values), so perhaps there's an internal heuristic that we
> could tune?

Can you show trace_sort output for these cases? Both master, and patched?

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Fujii Masao
On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao  wrote 
> in 

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-23 Thread Michael Paquier
On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown  wrote:
> I've noticed that you now can't cancel a query if there's DML pushdown
> to a foreign server.  This previously worked while it was sending
> individual statements as it interrupted and rolled it back.
>
> Here's what the local server sees when trying to cancel:
>
> # DELETE FROM remote.contacts;
> ^CCancel request sent
> DELETE 500
>
> This should probably be fixed.

Looking at what has been committed, execute_dml_stmt is using
PQexecParams, so we'd want to use an asynchronous call and loop on
PQgetResult with CHECK_FOR_INTERRUPTS() in it.
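
A rough sketch of that shape, for illustration only (not the committed
code; conn, sql, numParams, values and res stand for whatever
execute_dml_stmt already has in scope):

	/* Send the statement asynchronously instead of using PQexecParams. */
	if (!PQsendQueryParams(conn, sql, numParams, NULL, values, NULL, NULL, 0))
		pgfdw_report_error(ERROR, NULL, conn, false, sql);

	/* Poll for the result so that a pending cancel request is honored. */
	while (PQisBusy(conn))
	{
		CHECK_FOR_INTERRUPTS();

		/*
		 * A real implementation would wait on the socket (e.g. with
		 * WaitLatchOrSocket) rather than spinning like this.
		 */
		if (!PQconsumeInput(conn))
			pgfdw_report_error(ERROR, NULL, conn, false, sql);
	}
	res = PQgetResult(conn);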
-- 
Michael




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-23 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 6:44 PM, Peter Geoghegan  wrote:
> I can produce a back-patchable variant of this if you and Peter E.
> think this approach is okay.

Where are we on this?

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Peter Geoghegan
On Tue, Mar 22, 2016 at 3:35 PM, Tomas Vondra
 wrote:
> I've tested the patch you've sent on 2016/3/11, which I believe is the last
> version. I haven't tuned replacement_sort_mem at all, because my
> understanding was that it's not 9.6 material (per your message). So my
> intent was to test the configuration people are likely to use by default.

I meant that using replacement selection in a special way with CREATE
INDEX was not 9.6 material. But replacement_sort_mem is. And so, any
case with the (maintenance)_work_mem <= 16MB will have used a heap for
the first run.

I'm sorry I did not make a point of telling you this. It's my fault.
The result in any case is that pre-sorted cases will be similar with
and without the patch, since replacement selection can thereby make
one long run. But on non-sorted cases, the patch helps less because it
is in use less -- with not so much data overall, possibly much less
(which I think explains why the 1M row tests seem so much less
interesting than the 10M row tests).

I worry that at the low end, replacement_sort_mem makes the patch have
one long run, but still some more other runs, so merging is
unbalanced. We should consider if the patch can beat the master branch
at the low end without using a replacement selection heap. It would do
better in at least some cases in low memory conditions, possibly a
convincing majority of cases. I had hoped that my recent idea (since
committed) of resetting memory contexts would help a lot with
regressions when work_mem is very low, and that particular theory
isn't really tested here.

> I'm not sure which commit are you referring to. The benchmark was done on
> a414d96a (from 2016/3/10). However I'd expect that to affect both sets of
> measurements, although it's possible that it affects the patched version
> differently.

You did test the right patches. It just so happens that the master
branch now has the memory batching stuff, so it doesn't get
credited with that. I think this is good, though, because we care
about 9.5 -> 9.6 regressions.

Improvement ratio (master time/patched time) for Xeon 10 million row
case "SELECT * FROM int_test_padding ORDER BY a DESC":

For work_mem of 8MB = 0.83, 32MB = 0.62, 128MB = 0.52, 512MB = 0.47,
1024MB = 1.00

So, it gets faster than the master branch as more memory is available,
but then it goes to 1.00 -- a perfect draw. I think that this happened
simply because at that point, the sort was an internal sort (even
though similar CREATE INDEX case did not go internal at the same
point). The (internal) 1024MB case is not that much faster than the
512MB external case, which is pretty good.

There are also "near draws", where the ratio is 0.98 or so. I think
that this is because abbreviation is aborted, which can be a problem
with synthetic data + text -- you get a very slow sort either way,
where most time is spent calling strcoll(), and cache characteristics
matter much less. Those cases seemingly take much longer overall, so
this theory makes sense. Unfortunately, abbreviated keys for text that
is not C locale text was basically disabled across the board today due
to a glibc problem. :-(

Whenever I see that the patch is exactly as fast as the master branch,
I am skeptical. I am particularly skeptical of all i5 results
(including 10M cases), because the patch seems to be almost perfectly
matched to the master branch for CREATE INDEX cases (which are the
best cases for the patch on your Xeon server) -- it's much easier to
believe that there was a problem during the test, honestly, like
maintenance_work_mem wasn't set correctly. Those two things are so
different that I have a hard time imagining that they'd ever really
draw. I mean, it's possible, but it's more likely to be a problem with
testing. And, queries like "SELECT * FROM int_test_padding ORDER BY a
DESC" return all rows, which adds noise from all the client overhead.
In fact, you often see that adding more memory helps no case here, so
it seems a bit pointless. Maybe they should be written like "SELECT *
FROM (select * from int_test_padding ORDER BY a DESC OFFSET 1e10) ff"
instead. And maybe queries like "SELECT DISTINCT a FROM int_test ORDER
BY a" would be better as "SELECT COUNT(DISTINCT a) FROM int_test", in
order to test the datum/aggregate case. Just suggestions.

If you really wanted to make the patch look good, a sort with 5GB of
work_mem is the best way, FWIW. The heap data structure used by the
master branch tuplesort.c will handle that very badly. You use no
temp_tablespaces here. I wonder if the patch would do better with
that. Sorting can actually be quite I/O bound with the patch
sometimes, where it's usually CPU/memory bound with the heap,
especially with lots of work_mem. More importantly, it would be more
informative if the temp_tablespace was not affected by I/O from
Postgres' heap.

I also like seeing a sample of "trace_sort = on" output. I don't
expect you to carefully collect that 

Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Amit Kapila
On Thu, Mar 24, 2016 at 12:09 AM, Robert Haas  wrote:
>
> On Tue, Mar 22, 2016 at 1:12 PM, Petr Jelinek 
wrote:
>
> I've read this over several times and looked at
> RecordAndGetPageWithFreeSpace() and I'm still confused.  First of all,
> if the lock was acquired by some other backend which did
> RelationAddExtraBlocks(), it *will* have updated the FSM - that's the
> whole point.
>

It doesn't update the FSM up to the root in some cases, as per the comments
on top of RecordPageWithFreeSpace and the code as well.

>
>   Second, if the other backend extended the relation in
> some other manner and did not extend the FSM, how does calling
> RecordAndGetPageWithFreeSpace help?  As far as I can see,
> GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
> searching the FSM, so if one is stymied the other will be too.  What
> am I missing?
>

RecordAndGetPageWithFreeSpace() tries to search from the oldPage passed to
it, rather than from the top, so even if RecordPageWithFreeSpace() doesn't
update up to the root, it will be able to find the newly added page.  I agree
with what you said in another mail that we should introduce a new
API to do a more targeted search for such cases.
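
For illustration, such a targeted-search entry point might look something
like this (a hypothetical signature, not an existing FSM function):

	/*
	 * Hypothetical sketch: find a page with at least spaceNeeded bytes
	 * free, searching near startBlock rather than from the FSM root, so
	 * that newly added pages are found even when the upper FSM levels
	 * have not been updated yet.
	 */
	extern BlockNumber GetNearbyPageWithFreeSpace(Relation rel,
												  BlockNumber startBlock,
												  Size spaceNeeded);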


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] WIP: Access method extendability

2016-03-23 Thread Alvaro Herrera
I don't quite see how this is supposed to work:

+ #ifdef WAL_DEBUG
+	/*
+	 * If xlog debug is enabled then check produced delta.  Result of delta
+	 * application to saved image should be the same as current page state.
+	 */
+	if (XLOG_DEBUG)
+	{
+		char	tmp[BLCKSZ];
+
+		memcpy(tmp, image, BLCKSZ);
+		applyPageRedo(tmp, pageData->data, pageData->dataLen);
+		elog(ERROR, "result of generic xlog apply doesn't match");
+	}
+ #endif

I suppose the elog(ERROR) call should be conditional ...
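
Presumably something along these lines was intended (a sketch only; "page"
stands in for the current page state, which isn't visible in the quoted
fragment):

	memcpy(tmp, image, BLCKSZ);
	applyPageRedo(tmp, pageData->data, pageData->dataLen);

	/* complain only if the replayed image differs from the current page */
	if (memcmp(tmp, page, BLCKSZ) != 0)
		elog(ERROR, "result of generic xlog apply doesn't match");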

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Postgres_fdw join pushdown - getting server crash in left outer join of three table

2016-03-23 Thread Michael Paquier
On Thu, Mar 24, 2016 at 1:53 AM, Robert Haas  wrote:
> On Wed, Mar 23, 2016 at 5:24 AM, Rajkumar Raghuwanshi
>  wrote:
>> Thanks Ashutosh for the patch. I have apply and retested it, now not getting
>> server crash.
>
> Thanks for the report and the testing.  I have committed the patch.

Cool, I have refreshed the wiki page for open items accordingly.
-- 
Michael




Re: [HACKERS] PoC: Partial sort

2016-03-23 Thread Peter Geoghegan
Hi,

On Tue, Mar 1, 2016 at 7:06 AM, Alexander Korotkov  wrote:
> I finally went over your review.

I'll respond to your points here. Note that I'm reviewing
"partial-sort-basic-7.patch", which you sent on March 13. I respond
here because this is where you answered my questions (I had no
feedback on "partial-sort-basic-6.patch", which didn't use the new
upper planner pathification stuff, unlike this latest version).

> On Wed, Nov 4, 2015 at 4:44 AM, Peter Geoghegan  wrote:
>>
>> Explain output
>> ---

>> I think it might be a good idea to also have a "Sort Groups: 2" field
>> above. That illustrates that you are in fact performing 2 small sorts
>> per group, which is the reality. As you said, it's good to have this
>> be high, because then the sort operations don't need to do too many
>> comparisons, which could be expensive.
>
>
> I agree with your notes. In the attached version of path explain output was
> revised as you proposed.

Cool.

>> Sort Method
>> 
>>
>> Even thought the explain analyze above shows "top-N heapsort" as its
>> sort method, that isn't really true. I actually ran this through a
>> debugger, which is why the above plan took so long to execute, in case
>> you wondered. I saw that in practice the first sort executed for the
>> first group uses a quicksort, while only the second sort (needed for
>> the 2 and last group in this example) used a top-N heapsort.

> With partial sort we run multiple sorts in the same node. Ideally, we need
> to provide some aggregated information over runs.
> This situation looks very similar to subplan which is called multiple times.
> I checked how it works for now.

Noticed this in nodeSort.c:

+   if (node->tuplesortstate != NULL)
+   {
+   tuplesort_reset((Tuplesortstate *) node->tuplesortstate);
+   node->groupsCount++;
+   }
+   else
+   {
+		/* Support structures for cmpSortSkipCols - already sorted columns */
+		if (skipCols)
+			prepareSkipCols(plannode, node);

+		/*
+		 * Only pass on remaining columns that are unsorted.  Skip abbreviated
+		 * keys usage for partial sort.  We unlikely will have huge groups
+		 * with partial sort.  Therefore usage of abbreviated keys would be
+		 * likely a waste of time.
+		 */
tuplesortstate = tuplesort_begin_heap(tupDesc,

You should comment on which case is which, and put common case (no
skip cols) first. Similarly, the ExecSort() for(;;) should put the
common (non-partial) case first, which it does, but then the "first
tuple in partial sort" case first, then the "second or subsequent
partial sort" case last.

More comments here, please:

+typedef struct SkipKeyData
+{
+ FunctionCallInfoData fcinfo;
+ FmgrInfo flinfo;
+ OffsetNumber attno;
+} SkipKeyData;

(What's SkipKeyData?)
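
For instance, the kind of commenting being asked for might look like this
(the field descriptions are my guesses from context, not the patch
author's wording):

	/*
	 * Per-column state used by cmpSortSkipCols() to compare the already
	 * sorted ("skip") columns of the previous and current tuples.
	 */
	typedef struct SkipKeyData
	{
		FunctionCallInfoData fcinfo;	/* reusable call info for the comparison */
		FmgrInfo	flinfo;		/* lookup info for the comparison function */
		OffsetNumber attno;		/* attribute number of the presorted column */
	} SkipKeyData;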

Also want comments for new SortState fields. SortState.prev is a
palloc()'d copy of tuple, which should be directly noted, as it is for
similar aggregate cases, etc.

Should you be more aggressive about freeing memory allocated for
SortState.prev tuples?

The new function cmpSortSkipCols() should say "Special case for
NULL-vs-NULL, else use standard comparison", or something. "Let's
pretend NULL is a value for implementation convenience" cases are
considered the exception, and are always noted as the exception.

> In the case of subplan explain analyze gives us just information about last
> subplan run. This makes me uneasy. From one side, it's probably OK that
> partial sort behaves like subplan while showing information just about last
> sort run. From the other side, we need some better solution for that in
> general case.

I see what you mean, but I wasn't so much complaining about that, as
complaining about the simple fact that we use a top-N heap sort *at
all*. This feels like the "limit" case is playing with partial sort
sub-sorts in a way that it shouldn't.

I see you have code like this to make this work:

+   /*
+* Adjust bound_Done with number of tuples we've actually sorted.
+*/
+   if (node->bounded)
+   {
+   if (node->finished)
+   node->bound_Done = node->bound;
+   else
+   node->bound_Done = Min(node->bound,
node->bound_Done + nTuples);

But, why bother? Why not simply prevent tuplesort.c from ever using
the top-N heapsort method when it is called from nodeSort.c for a
partial sort (probably in the planner)?

Why, at a high level, does it make sense to pass down a limit to *any*
sort operation that makes up a partial sort, even the last? This seems
to be adding complexity without a benefit. A big advantage of top-N
heapsorts is that much less memory could be used, but this patch
already has the memory allocated that belonged to previous performsort
calls (mostly -- certainly 

Re: [HACKERS] Bug in searching path in jsonb_set when walking through JSONB array

2016-03-23 Thread Michael Paquier
On Wed, Mar 23, 2016 at 11:01 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> That's ugly. We should actually use TextDatumGetCString because the
>> index is stored as text here via a Datum, and then it is converted
>> back to an integer. So I propose instead the simple patch attached
>> that fixes the failure for me. Could you check if that works for you?
>
> Yeah, this seems better.  But that code needs review anyway, as it's using
> elog() for user-facing error conditions, and I'm thinking the error texts
> could use a bit of attention from a native English speaker.

Thanks. The bug fix is 384dfbd, and the error message improvements are
in ea4b8bd.
-- 
Michael




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-23 Thread Andres Freund
Hi,

On 2016-03-24 01:10:55 +0100, Andres Freund wrote:
> I'm afraid that this patch might be putting a bandaid on some of the
> absolutely worst cases, without actually addressing the core
> problem. Simon's patch in [1] seems to come closer to addressing that
> (which I don't believe is safe without doing every status
> manipulation atomically, as individual status bits are smaller than 4
> bytes).  Now it's possible to argue that the bandaid might slow the
> bleeding to a survivable level, but I have to admit I'm doubtful.
> 
> Here's the stats for a -s 500 run btw:
>  Performance counter stats for 'system wide':
> 18,747  probe_postgres:TransactionIdSetTreeStatus
> 68,884  probe_postgres:TransactionIdGetStatus
>  9,718  probe_postgres:PGSemaphoreLock
> (the PGSemaphoreLock is over 50% ProcArrayLock, followed by ~15%
> SimpleLruReadPage_ReadOnly)
> 
> 
> My suspicion is that a better approach for now would be to take Simon's
> patch, but add a (per-page?) 'ClogModificationLock'; to avoid the need
> of doing something fancier in TransactionIdSetStatusBit().
> 
> Andres
> 
> [1]: 
> http://archives.postgresql.org/message-id/CANP8%2Bj%2BimQfHxkChFyfnXDyi6k-arAzRV%2BZG-V_OFxEtJjOL2Q%40mail.gmail.com

Simon, would you mind if I took your patch for a spin like roughly
suggested above?


Andres




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-23 Thread Andres Freund
On 2016-03-23 21:43:41 +0100, Andres Freund wrote:
> I'm playing around with SELECT txid_current(); right now - that should
> be about the most specific load for setting clog bits.

Or so I thought.

In my testing that showed just about zero performance difference between
the patch and master. And more surprisingly, profiling showed very
little contention on the control lock. Hacking
TransactionIdSetPageStatus() to return without doing anything, actually
only showed minor performance benefits.

[there's also the fact that txid_current() indirectly acquires two
lwlocks twice, which showed up more prominently than the control lock,
but that I could easily hack around by adding an xid_current().]

Similar with an INSERT only workload. And a small scale pgbench.


Looking through the thread showed that the positive results you'd posted
all were with relatively big scale factors. Which made me think. Running
a bigger pgbench showed that most of the interesting (i.e. long) lock waits
were both via TransactionIdSetPageStatus *and* TransactionIdGetStatus().


So I think what happens is that once you have a big enough table, the
UPDATEs that standard pgbench does start to often hit *old* xids (in unhinted
rows). Thus old pages have to be read in, potentially displacing slru
content needed very shortly after.


Have you, in your evaluation of the performance of this patch, done
profiles over time? I.e. whether the performance benefits are there
immediately, or only after a significant amount of test time? Comparing
TPS over time, for both patched/unpatched looks relevant.


Even after changing to scale 500, the performance benefits on this,
older 2 socket, machine were minor; even though contention on the
ClogControlLock was the second most severe (after ProcArrayLock).

Afaics that squares with Jesper's result, which basically also didn't
show a difference either way?


I'm afraid that this patch might be putting a bandaid on some of the
absolutely worst cases, without actually addressing the core
problem. Simon's patch in [1] seems to come closer to addressing that
(which I don't believe is safe without doing every status
manipulation atomically, as individual status bits are smaller than 4
bytes).  Now it's possible to argue that the bandaid might slow the
bleeding to a survivable level, but I have to admit I'm doubtful.

Here's the stats for a -s 500 run btw:
 Performance counter stats for 'system wide':
18,747  probe_postgres:TransactionIdSetTreeStatus
68,884  probe_postgres:TransactionIdGetStatus
 9,718  probe_postgres:PGSemaphoreLock
(the PGSemaphoreLock is over 50% ProcArrayLock, followed by ~15%
SimpleLruReadPage_ReadOnly)


My suspicion is that a better approach for now would be to take Simon's
patch, but add a (per-page?) 'ClogModificationLock'; to avoid the need
of doing something fancier in TransactionIdSetStatusBit().
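
For illustration, the idea would be roughly this (a sketch only;
"ClogModificationLock" is a hypothetical name for whatever per-page lock
such a patch would introduce):

	/* Guard just the status-bit write with a dedicated lock, rather than
	 * relying on exclusive ClogControlLock for the whole operation. */
	LWLockAcquire(ClogModificationLock, LW_EXCLUSIVE);
	TransactionIdSetStatusBit(xid, status, lsn, slotno);
	LWLockRelease(ClogModificationLock);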

Andres

[1]: 
http://archives.postgresql.org/message-id/CANP8%2Bj%2BimQfHxkChFyfnXDyi6k-arAzRV%2BZG-V_OFxEtJjOL2Q%40mail.gmail.com




Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Craig Ringer
On 24 March 2016 at 02:01, Merlin Moncure  wrote:


> > If you plan to have "prepare if not exists" at startup only, why don't
> > you just wrap it with
> > exception handler then?
>
> That's impolite to our users.  Virtually all other commands have been
> decorated with IF [NOT] exists to avoid having to guard with exception
> handler -- why not this one?  Also, if the handler is on the client
> side, it tends to be racey.
>

Yeah. Also, the log spam from that is ugly and it's really best avoided.

I find that to be a very frustrating issue with client-side upsert retry
loop approaches. Less of a concern now that 9.5 has a true upsert, but
that's not the only area where the client is expected to try it and handle
the error if it fails.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] README for src/backend/replication/logical

2016-03-23 Thread Craig Ringer
On 23 March 2016 at 23:54, Alvaro Herrera  wrote:

> Craig Ringer wrote:
> > Hi all
> >
> > As part of some internal training/discussion I wrote some notes on
> logical
> > decoding's structure and flow that seemed worth polishing up into a draft
> > README for src/backend/replication/logical .
>
> That's nice, but you attached the one in src/backend/replication
> instead.
>

Well, that was dumb. Let's try that again...


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> And where to put the corresponding header files?
> >> src/include/fe-utils?
> 
> > Sounds fair but would that be installed in PREFIX/include/server?
> > That'd be a bit odd, but I'm not -1 on that.
> 
> The only other plan I can think of is to put the .h files describing
> src/fe-utils files into src/fe-utils itself.  Which would sort of match
> the existing precedent of src/bin/ files looking into sibling directories
> for relevant .h files, but it seems a bit odd too.

I think it's better that they end up installed in
PREFIX/include/server/fe_utils rather than not installed at all, which
is what would happen, unless you were to create a tailored command in
make install.  For instance we have the ones for src/common in
PREFIX/include/server/common, which I suppose is useful for extensions
(we do install libpgcommon.a, which would be messy to use otherwise.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Regina Obe

> I'm something of a backwards compatibility zealot, but I've become one for 
> very good reasons.  Personally, I'd rather we'd define precisely the usages 
> that are deprecated (I guess SRF-tlist in the presence of
> FROM) and force them to error out with an appropriate HINT rather than give a 
> different answer than they used to.  The problem here is that LATERAL is 
> still fairly new and there is a huge body of code out there leveraging the 
> 'bad' way, as it was for years and years the only way to do a number of
> useful things.

> merlin

FWIW: I prefer Merlin's solution of erroring out rather than returning an
unexpected answer, and if it's buggy behavior it should be eradicated.

The reason is this.  For many, the (..).* ORDER BY .. form looks equivalent
to the LATERAL one.
More than a trivial amount of my time has been spent explaining to people why
their raster queries are so slow because the SRF is called multiple times and
they should switch to LATERAL usage.

So if the old solution is still going to carry the same penalties, I would
just as soon scrap it and break people's code in a clear and noticeable way
they can't ignore.

There is nothing more frustrating than code that still works but gives you an
answer different from what you are expecting.  Those kinds of bugs stay
buried for a while.

I think as long as the breaking change is noted in the release notes, it's
fine and called for if it makes the code cleaner and more manageable.

Thanks,
Regina






Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Tom Lane
I wrote:
> ... I'd love to
> toss the entire SRF-in-tlist feature overboard one of these years,
> both because of semantic issues like these and because a fair amount
> of code and overhead could be ripped out if it were disallowed.
> But I don't know how we get to that --- as Merlin says, there's a
> pretty large amount of application code depending on the feature.

BTW, after further thought I seem to recall some discussion about whether
we could mechanically transform SRF-in-tlist into a LATERAL query.
That is, we could make the rewriter or planner convert

SELECT srf1(...), srf2(...)
  FROM ...
  WHERE ...

into

SELECT u.s1, u.s2
  FROM ...,  LATERAL ROWS FROM(srf1(...), srf2(...)) AS u(s1,s2)
  WHERE ...

(The SRF invocations might be buried inside expressions, but we'd find
them and convert them anyway.  Also, we'd merge textually-identical SRF
invocations into a single ROWS FROM entry to avoid multiple evaluation,
at least for SRFs not marked volatile.)  Having done that, the executor's
support for SRFs in general expressions could be removed, a significant
savings.

One problem with this is that it only duplicates the current behavior
in cases where the SRFs all have the same period.  But you could argue
that the current behavior in cases where they don't is widely considered
a bug anyway.

A possibly larger problem is that it causes the SRFs to be evaluated
before sorting/ordering/limiting.  In view of the discussions that led
up to 9118d03a8, that could be a fatal objection :-(.  Maybe we could
get around it with a different transformation, into a two-level query?
The above sketch doesn't really consider what to do with GROUP BY,
ORDER BY, etc, but maybe we could push those down into a sub-select
that becomes the first FROM item.

Anyway, that's food for thought not something that could realistically
happen right now.

regards, tom lane




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-03-23 Thread Gilles Darold
On 22/03/2016 14:44, Michael Paquier wrote:
> On Sat, Mar 12, 2016 at 12:46 AM, Gilles Darold
>  wrote:
>> Here is the patch rewritten to use alternate file
>> $PGDATA/pg_log_filename to store the current log filename used by
>> syslogger. All examples used in the first mail of this thread work the
>> exact same way. If there's no other remarks, I will add the patch to the
>> next commit fest.
> Please be sure to register this patch to the next CF:
> https://commitfest.postgresql.org/10/
> Things are hot enough with 9.6, so this will only be considered in 9.7.

Thanks for the reminder, here is the v3 of the patch after a deeper
review and testing. It is now registered to the next commit fest under
the System Administration topic.

Fixes in this patch are:

- Output file have been renamed as PGDATA/pg_log_file

- Log level of the warning when logging collector is not active has been
changed to NOTICE

postgres=# select pg_current_logfile();
NOTICE:  current log can not be reported, log collection is not active
 pg_current_logfile
--------------------
 
(1 row)

- Log level for file access errors in function
store_current_log_filename() of file src/backend/postmaster/syslogger.c
has been set to WARNING; using the ERROR level forced the backend to
stop with a FATAL error.

- Add information about file PGDATA/pg_log_file in storage file layout
of doc/src/sgml/storage.sgml


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ae93e69..155e76b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15158,6 +15158,11 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
(0 if not called, directly or indirectly, from inside a trigger)
   
 
+  pg_current_logfile()
+   text
+   current log file used by the logging collector
+  
+
   
session_user
name
@@ -15365,6 +15370,16 @@ SET search_path TO schema , schema, ..
 pg_notification_queue_usage

 
+   
+pg_current_logfile
+   
+
+   
+pg_current_logfile returns the name of the current log
+file used by the logging collector, as a text. Log collection
+must be active.
+   
+

 pg_listening_channels returns a set of names of
 asynchronous notification channels that the current session is listening
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 9b2e09e..6284d54 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -170,6 +170,13 @@ last started with
   (this file is not present after server shutdown)
 
 
+
+ pg_log_file
+ A file recording the current log file used by the syslogger
+  when log collection is active
+
+
+
 
 
 
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index e7e488a..c3bf672 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -145,6 +145,7 @@ static char *logfile_getname(pg_time_t timestamp, const char *suffix);
 static void set_next_rotation_time(void);
 static void sigHupHandler(SIGNAL_ARGS);
 static void sigUsr1Handler(SIGNAL_ARGS);
+static void store_current_log_filename(char *filename);
 
 
 /*
@@ -571,6 +572,9 @@ SysLogger_Start(void)
 
 	syslogFile = logfile_open(filename, "a", false);
 
+	/* store information about current log filename used by log collection */
+	store_current_log_filename(filename);
+
 	pfree(filename);
 
 #ifdef EXEC_BACKEND
@@ -1209,6 +1213,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 		fclose(syslogFile);
 		syslogFile = fh;
 
+		/* store information about current log filename used by log collection */
+		store_current_log_filename(filename);
+
 		/* instead of pfree'ing filename, remember it for next time */
 		if (last_file_name != NULL)
 			pfree(last_file_name);
@@ -1253,6 +1260,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 		fclose(csvlogFile);
 		csvlogFile = fh;
 
+		/* store information about current log filename used by log collection */
+		store_current_log_filename(csvfilename);
+
 		/* instead of pfree'ing filename, remember it for next time */
 		if (last_csv_file_name != NULL)
 			pfree(last_csv_file_name);
@@ -1362,3 +1372,35 @@ sigUsr1Handler(SIGNAL_ARGS)
 
 	errno = save_errno;
 }
+
+
+/*
+ * Store the name of the file where current log messages are written when
+ * log collector is enabled. Useful to find the name of the current log file
+ * when a time-based rotation is defined. 
+ */
+static void
+store_current_log_filename(char *filename)
+{
+	FILE   *fh;
+	charlogpathfilename[MAXPGPATH];
+
+	snprintf(logpathfilename, sizeof(logpathfilename), "%s",
+		CURRENT_LOG_FILENAME);
+	if ((fh = fopen(logpathfilename, "w")) == NULL)
+	{
+	   ereport(WARNING,
+			   (errcode_for_file_access(),
+			   errmsg("could not open log file \"%s\": %m",
+	   logpathfilename)));
+	

Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Stephen Frost
* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Wed, Mar 23, 2016 at 2:11 PM, Tom Lane  wrote:
> > In the meantime I suppose there's a case to be made for preserving
> > bug compatibility as much as possible.
> >
> > So anyway the question is whether to commit the attached or not.
> 
> +1 for commit - I'll trust Tom on the quality of the patch :)

I'm not going to object to it.  All-in-all, I suppose '+0' from me.

Thanks!

Stephen




Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread David G. Johnston
On Wed, Mar 23, 2016 at 2:11 PM, Tom Lane  wrote:

>
> ​​
> In the meantime I suppose there's a case to be made for preserving
> bug compatibility as much as possible.
>
> So anyway the question is whether to commit the attached or not.


​+1 for commit - I'll trust Tom on the quality of the patch :)

David J.


Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Tom Lane
Stephen Frost  writes:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
>> I'm something of a backwards compatibility zealot, but I've become one
>> for very good reasons.  Personally, I'd rather we'd define precisely
>> the usages that are deprecated (I guess SRF-tlist in the presence of
>> FROM) and force them to error out with an appropriate HINT rather than
>> give a different answer than they used to.  The problem here is that
>> LATERAL is still fairly new and there is a huge body of code out there
>> leveraging the 'bad' way, as it was for years and years the only way
>> to do a number of useful things.

> I have to side with what I believe is Tom's position on this one.  I do
> like the notion of throwing an error in cases where someone sent us
> something that we're pretty sure is wrong, but I don't agree that we
> should continue to carry on bug-compatibility with things that are
> already one foot in the grave and really just need to be shoved all the
> way in.

FWIW, I think the case that we're looking at only comes up if you have
more than one SRF expression in the tlist (Regina's example has that
after *-expansion, though it might not've looked so to start with),
and some of them are sort/group columns while others are not.

So the question at hand is whether that falls within the scope of "huge
body of code leveraging the old way" or whether it's too much of a corner
case to want to tie ourselves down to.

I wrote a patch that fixes this for the sort-column case, thus getting us
back to the historical behavior; see attached.  If we really wanted to be
consistent, we'd have to look at pushing all SRFs clear back to the
scanjoin_target list if any SRFs appear in grouping columns.  That would
be significantly more code and it would deviate from the historical
behavior, as I illustrated upthread, so I'm not really inclined to do it.
But the historical behavior in this area is pretty unprincipled.

It's also worth noting that we've had multiple complaints about the
ExecTargetList least-common-multiple behavior over the years; it seems
sane enough in examples like these where all the SRFs have the same
period, but it gets way less so as soon as they don't.  I'd love to
toss the entire SRF-in-tlist feature overboard one of these years,
both because of semantic issues like these and because a fair amount
of code and overhead could be ripped out if it were disallowed.
But I don't know how we get to that --- as Merlin says, there's a
pretty large amount of application code depending on the feature.
In the meantime I suppose there's a case to be made for preserving
bug compatibility as much as possible.
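
A quick illustration of that least-common-multiple behavior, for anyone
who has not run into it: with two SRFs of different periods in one tlist,

SELECT generate_series(1,2), generate_series(1,3);

you get 6 rows (the LCM of 2 and 3) -- and changing the second series to
1..4 gives 4 rows, not 8, since LCM(2,4) is 4 rather than the product.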

So anyway the question is whether to commit the attached or not.

regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5229c84..db347b8 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** make_pathkeys_for_window(PlannerInfo *ro
*** 4615,4625 
   *
   * Our current policy is to postpone volatile expressions till after the sort
   * unconditionally (assuming that that's possible, ie they are in plain tlist
!  * columns and not ORDER BY/GROUP BY/DISTINCT columns).  We also postpone
!  * set-returning expressions unconditionally (if possible), because running
!  * them beforehand would bloat the sort dataset, and because it might cause
!  * unexpected output order if the sort isn't stable.  Expensive expressions
!  * are postponed if there is a LIMIT, or if root->tuple_fraction shows that
   * partial evaluation of the query is possible (if neither is true, we expect
   * to have to evaluate the expressions for every row anyway), or if there are
   * any volatile or set-returning expressions (since once we've put in a
--- 4615,4630 
   *
   * Our current policy is to postpone volatile expressions till after the sort
   * unconditionally (assuming that that's possible, ie they are in plain tlist
!  * columns and not ORDER BY/GROUP BY/DISTINCT columns).  We also prefer to
!  * postpone set-returning expressions, because running them beforehand would
!  * bloat the sort dataset, and because it might cause unexpected output order
!  * if the sort isn't stable.  However there's a constraint on that: all SRFs
!  * in the tlist should be evaluated at the same plan step, so that they can
!  * run in sync in ExecTargetList.  So if any SRFs are in sort columns, we
!  * mustn't postpone any SRFs.  (Note that in principle that policy should
!  * probably get applied to the group/window input targetlists too, but we
!  * have not done that historically.)  Lastly, expensive expressions are
!  * postponed if there is a LIMIT, or if root->tuple_fraction shows that
   * partial evaluation of the query is possible (if neither is true, we expect
   * to have to evaluate the expressions for every row anyway), or if there are
   * any volatile or set-returning 

Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> And where to put the corresponding header files?
>> src/include/fe-utils?

> Sounds fair but would that be installed in PREFIX/include/server?
> That'd be a bit odd, but I'm not -1 on that.

The only other plan I can think of is to put the .h files describing
src/fe-utils files into src/fe-utils itself.  Which would sort of match
the existing precedent of src/bin/ files looking into sibling directories
for relevant .h files, but it seems a bit odd too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-23 Thread Andres Freund
On 2016-03-23 12:33:22 +0530, Amit Kapila wrote:
> On Wed, Mar 23, 2016 at 12:26 PM, Amit Kapila 
> wrote:
> >
> > On Tue, Mar 22, 2016 at 4:22 PM, Andres Freund  wrote:
> > >
> > >
> > > I think it's worthwhile to create a benchmark that does something like
> > > BEGIN;SELECT ... FOR UPDATE; SELECT pg_sleep(random_time);
> > > INSERT;COMMIT; you'd find that if random is a bit larger (say 20-200ms,
> > > completely realistic values for network RTT + application computation),
> > > the success rate of group updates shrinks noticeably.
> > >
> >
> > Will do some tests based on above test and share results.
> >
> 
> Forgot to mention that the effect of patch is better visible with unlogged
> tables, so will do the test with those and request you to use same if you
> yourself is also planning to perform some tests.

I'm playing around with SELECT txid_current(); right now - that should
be about the most specific load for setting clog bits.
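
For anyone who wants to replicate, the kind of run I mean is simply this
(a sketch; file name and client counts are arbitrary):

$ cat txid.sql
SELECT txid_current();
$ pgbench -n -f txid.sql -c 64 -j 64 -T 60 postgres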

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Yury Zhuravlev

Michael Meskes wrote:

More than you think.

I doubt you want to propose removing features that make developing new
features harder, or do you? :)


I want to understand the situation. You may want to make the ecpg build 
optional. Personally, I want to.


--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Vladimir Sitnikov
Merlin> All I'm saying is that the use of
Merlin> server side prepared statements is extremely problematic in
Merlin> conjunction with pgbouncer

I've filed https://github.com/pgbouncer/pgbouncer/issues/126 to get
pgbouncer improved in regard to prepared statements.

Vladimir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Stephen Frost
* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Wed, Mar 23, 2016 at 12:37 PM, Tom Lane  wrote:
> > which is both SQL-standard semantics and much more efficient than
> > SRF-in-tlist.  We've more or less deprecated SRF-in-tlist since we
> > introduced LATERAL in 9.3.  How much are we willing to do to stay
> > bug-compatible with old behaviors here?
> 
> I think we should, and the fact this was caught so early on the
> release cycle underscores that.  One of the problems is that there are
> reasonable cases (note, not impacted by this bug) of this usage that
> are still commonplace, for example:
> 
> ysanalysis=# select unnest(current_schemas(true));
>unnest
> ------------
>  pg_catalog
>  public
> 
> I'm something of a backwards compatibility zealot, but I've become one
> for very good reasons.  Personally, I'd rather we'd define precisely
> the usages that are deprecated (I guess SRF-tlist in the presence of
> FROM) and force them to error out with an appropriate HINT rather than
> give a different answer than they used to.  The problem here is that
> LATERAL is still fairly new and there is a huge body of code out there
> leveraging the 'bad' way, as it was for years and years the only way
> to do a number of useful things.

I have to side with what I believe is Tom's position on this one.  I do
like the notion of throwing an error in cases where someone sent us
something that we're pretty sure is wrong, but I don't agree that we
should continue to carry on bug-compatibility with things that are
already one foot in the grave and really just need to be shoved all the
way in.

This isn't the only break in backwards compatibility we've had over the
years and is pretty far from the largest (string escaping, anyone?  or
removing implicit casts?) and I'd argue we're better off for it.

Thanks!

Stephen




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-23 Thread Petr Jelinek

On 23/03/16 13:05, Michael Paquier wrote:


OK, the please send an updated set of patches for what remains.



Here you go:
- 0001 fixes the global declarations of TIMEZONE_GLOBAL and
TZNAME_GLOBAL to be WIN32-compliant. I got bitten by that in the ECPG
compilation.
- 0002, support of VS2015 in the scripts of src/tools/msvc
- 0003, this is necessary in order to run the regression tests,
details are here:

http://www.postgresql.org/message-id/cab7npqtdixxs8cdl8moivh6qfq-1qv9mkn0ayjzybvzv6wc...@mail.gmail.com


So that's basically what Andres did? Interesting that we now actually really
need it. I was in favor of doing those changes in any case.


Yes, that's what Andres did. I am just attaching it here to allow
Andrew to test this patch set appropriately.


- 0004 is the fix for locales. This is actually Petr's work upthread,
I have updated comments in the code though and did a bit more
polishing. This still looks like the cleanest solution we have. Other
solutions are mentioned upthread: redeclaration of struct _locale_t in
backend code is one.


Thanks for polishing this.

I think this is ready to be committed, but the 0003 might be somewhat
controversial based on the original thread.


I thought that the end consensus regarding 0003 was to apply it, but
we could as well wait for the final verdict (committer push) on the
other thread. And well, if this is not applied, some of the gin tests
will complain about "right sibling of GIN page is of different type" a
couple of times.



Hmm I see you are right, I missed the last couple emails. Ok I'll mark 
it ready for committer - it does work fine on my vs2015 machine and I am 
happy with the code too. Well, as happy as I can be given the locale mess.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Stephen Frost
Merlin,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> No one is arguing that that you should send it any every time (at
> least -- I hope not).

I'm not sure I follow how you can avoid that though?

pgbouncer in transaction pooling mode may let a particular connection
die off and then, when you issue a new request, create a new one- which
won't have any prepared queries in it, even though you never lost your
connection to pgbouncer.

That's why I was saying you'd have to send it at the start of every
transaction, which does add to network load and requires parsing, etc.
Would be nice to avoid that, if possible, but I'm not quite sure how.

One thought might be to have the server somehow have a pre-canned set of
queries already set up and ready for you to use when you connect,
without any need to explicitly prepare them, etc.

Just a thought.  I do still like the general idea of INE support for
PREPARE, but perhaps there's a better option.

Thanks!

Stephen




Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 2:17 PM, Vladimir Sitnikov
 wrote:
> Merlin>proposed would allow use of server side prepared statements with JDBC.
>
> It would not. If we discuss end-to-end scenarios in detail, we would end up 
> with
> "send full query on each execution" -> lex/gram on each execution kind
> of overheads.

I think we're talking over each other here.  I'm not suggesting the
jdbc driver needs to be adjusted.  All I'm saying is that the use of
server side prepared statements is extremely problematic in
conjunction with pgbouncer (or any technology where the application db
connection and the server session are not 1:1) and trivial with the
proposed patch.

Any discussion regarding jdbc is off topic relative to that.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:

> > Actually you could just list them in OBJS_FRONTEND in src/common.  That
> > way they're not built for the server at all; no need for a new subdir.
> 
> Meh.  I think stuff in OBJS_FRONTEND has to be pretty weird special
> cases, such as frontend emulations of server-environment code.  Which
> is what fe_memutils is, so that's OK, but I kinda question whether
> restricted_token.c belongs in src/common/ at all.

OK, that makes sense.  Feel free to move restricted_token.c if you feel
like it.

> > The libpq dependency argument AFAICS only applies to the ones in
> > src/bin/psql.  Perhaps we could have feutils with those only, and move
> > the other files to src/common.
> 
> On looking closer, the only one that doesn't depend on libpq is
> keywords.c, which seems like it belongs in the same place as kwlookup.c.
> So yeah, let's move both of those to src/common.

I guess those dependencies are somewhat hidden.  No objections to that
move.

> Anybody want to bikeshed the directory name src/feutils?  Maybe fe-utils
> would be more readable.

Yes, +1 for either a - or _ in there.

> And where to put the corresponding header files?
> src/include/fe-utils?

Sounds fair but would that be installed in PREFIX/include/server?
That'd be a bit odd, but I'm not -1 on that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 20:43, Robert Haas wrote:

On Wed, Mar 23, 2016 at 3:33 PM, Petr Jelinek  wrote:

Btw thinking about it some more, ISTM that not finding the block and just
doing the extension if the FSM wasn't extended correctly previously is
probably cleaner behavior than what we do now. The reasoning for that
opinion is that if the FSM wasn't extended, we'll fix it by doing relation
extension since we know we do both in this code path and also if we could
not find page before we'll most likely not find one even on retry and if
there was page added at the end by extension that we might reuse partially
here then there is no harm in adding new one anyway as the whole point of
this patch is that it does a bigger extension than strictly necessary so
insisting on page reuse for something that seems like only theoretical
possibility that does not even exist in current code does not seem right.


I'm not sure I completely follow this.  The fact that the last
sentence is 9 lines long may be related.  :-)



I tend to do that sometimes :)


I think it's pretty clearly important to re-check the FSM after
acquiring the extension lock.  Otherwise, imagine that 25 backends
arrive at the exact same time.  The first one gets the lock and
extends the relation 500 pages; the next one, 480, and so on.  In
total, they extend the relation by 6500 pages, which is a bit rich.
Rechecking the FSM after acquiring the lock prevents that from
happening, and that's a very good thing.  We'll do one 500-page
extension and that's it.



Right, but that would only happen if all the backends did it using 
different code which does not do the FSM extension, because the current 
code does extend the FSM; the point of using 
RecordAndGetPageWithFreeSpace seems to be "just in case" somebody is 
doing the extension differently (at least I don't see another reason). So 
basically I am not saying we shouldn't do the search, but I agree 
GetPageWithFreeSpace should be enough, as the worst that can happen is 
that we overextend the relation in case some theoretical code from 
somewhere else also extended the relation without extending the FSM 
(afaics).


But maybe Dilip had some other reason for using the 
RecordAndGetPageWithFreeSpace that is not documented.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 3:33 PM, Petr Jelinek  wrote:
> Btw thinking about it some more, ISTM that not finding the block and just
> doing the extension if the FSM wasn't extended correctly previously is
> probably cleaner behavior than what we do now. The reasoning for that
> opinion is that if the FSM wasn't extended, we'll fix it by doing relation
> extension since we know we do both in this code path and also if we could
> not find page before we'll most likely not find one even on retry and if
> there was page added at the end by extension that we might reuse partially
> here then there is no harm in adding new one anyway as the whole point of
> this patch is that it does a bigger extension than strictly necessary so
> insisting on page reuse for something that seems like only theoretical
> possibility that does not even exist in current code does not seem right.

I'm not sure I completely follow this.  The fact that the last
sentence is 9 lines long may be related.  :-)

I think it's pretty clearly important to re-check the FSM after
acquiring the extension lock.  Otherwise, imagine that 25 backends
arrive at the exact same time.  The first one gets the lock and
extends the relation 500 pages; the next one, 480, and so on.  In
total, they extend the relation by 6500 pages, which is a bit rich.
Rechecking the FSM after acquiring the lock prevents that from
happening, and that's a very good thing.  We'll do one 500-page
extension and that's it.

However, I don't think using RecordAndGetPageWithFreeSpace rather than
GetPageWithFreeSpace is appropriate.  We've already recorded free
space on that page, and recording it again is a bad idea.  It's quite
possible that by the time we get the lock our old value is totally
inaccurate.  If there's some advantage to searching in the more
targeted way that RecordAndGetPageWithFreeSpace does over
GetPageWithFreeSpace then we need a new API into the freespace stuff
that does the more targeted search without updating anything.
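
To sketch what I mean -- inside freespace.c, something roughly like this;
the function name is made up and the reuse of the existing static helpers
(fsm_get_location, fsm_readbuf, fsm_search_avail, fsm_search) is an
assumption on my part, not a tested patch:

static BlockNumber
GetPageWithFreeSpaceNear(Relation rel, BlockNumber oldPage, Size spaceNeeded)
{
	uint8		min_cat = fsm_space_needed_to_cat(spaceNeeded);
	FSMAddress	addr;
	uint16		slot;
	Buffer		buf;
	int			newslot;

	/* Locate the FSM leaf slot that describes oldPage. */
	addr = fsm_get_location(oldPage, &slot);

	/* Probe just that leaf page; record nothing. */
	buf = fsm_readbuf(rel, addr, true);
	LockBuffer(buf, BUFFER_LOCK_SHARE);
	newslot = fsm_search_avail(buf, min_cat, true, false);
	UnlockReleaseBuffer(buf);

	if (newslot != -1)
		return fsm_get_heap_blk(addr, (uint16) newslot);

	/* Nothing nearby; fall back to a normal search from the root. */
	return fsm_search(rel, min_cat);
}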

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN is missing in multicolumn indexes documentation

2016-03-23 Thread Petr Jediný
>>
>> Multicolumn BRIN is like GIN.  Every column is indexed separately.
>> The order of the columns doesn't matter.
>
> Right.  You can use one index to cover all columns; the position of the
> column in the index won't matter for a query that uses one column.  The
> only reason to have multiple BRIN indexes on a single table is to have
> a different pages_per_range.

Thanks for the information, I've attached the patch updated to v2.
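
For illustration, the pattern the docs should convey is simply (a sketch):

CREATE TABLE brin_example (a int, b int, c int);
-- one BRIN index covers all three columns; column order does not matter
CREATE INDEX brin_example_idx ON brin_example USING brin (a, b, c);
-- a second BRIN index on the same table only makes sense for a
-- different pages_per_range
CREATE INDEX brin_example_idx2 ON brin_example
    USING brin (a) WITH (pages_per_range = 32);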


PJ


brin-multicolumn-doc-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Note: the reason for a new subdirectory, rather than putting this
>> stuff into src/common/, is that src/common/ is meant for code that's
>> shared between frontend and backend, which this stuff is not.  Also,
>> many of these files depend on libpq which seems like an inappropriate
>> dependency for src/common.

> Actually you could just list them in OBJS_FRONTEND in src/common.  That
> way they're not built for the server at all; no need for a new subdir.

Meh.  I think stuff in OBJS_FRONTEND has to be pretty weird special
cases, such as frontend emulations of server-environment code.  Which
is what fe_memutils is, so that's OK, but I kinda question whether
restricted_token.c belongs in src/common/ at all.

> The libpq dependency argument AFAICS only applies to the ones in
> src/bin/psql.  Perhaps we could have feutils with those only, and move
> the other files to src/common.

On looking closer, the only one that doesn't depend on libpq is
keywords.c, which seems like it belongs in the same place as kwlookup.c.
So yeah, let's move both of those to src/common.

Anybody want to bikeshed the directory name src/feutils?  Maybe fe-utils
would be more readable.  And where to put the corresponding header files?
src/include/fe-utils?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 20:19, Petr Jelinek wrote:

On 23/03/16 20:01, Robert Haas wrote:

On Wed, Mar 23, 2016 at 2:52 PM, Petr Jelinek 
wrote:

Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?



The RecordAndGetPageWithFreeSpace will extend FSM as it calls
fsm_set_and_search which in turn calls fsm_readbuf with extend = true.


So how does that help?  If I'm reading this right, the new block will
be all zeroes which means no space available on any of those pages.



I am a bit confused as to what exactly you are saying, but what will
happen is that we get back to the while cycle and try again, so eventually
we should find either a block with enough free space or add a new one (not
sure if this would actually ever happen in practice in a heavily concurrent
workload where the FSM was not correctly extended during relation
extension, though; we might just loop here forever).



Btw thinking about it some more, ISTM that not finding the block and 
just doing the extension if the FSM wasn't extended correctly previously 
is probably cleaner behavior than what we do now. The reasoning for that 
opinion is that if the FSM wasn't extended, we'll fix it by doing 
relation extension since we know we do both in this code path and also 
if we could not find page before we'll most likely not find one even on 
retry and if there was page added at the end by extension that we might 
reuse partially here then there is no harm in adding new one anyway as 
the whole point of this patch is that it does a bigger extension than 
strictly necessary so insisting on page reuse for something that seems 
like only theoretical possibility that does not even exist in current 
code does not seem right.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 20:01, Robert Haas wrote:

On Wed, Mar 23, 2016 at 2:52 PM, Petr Jelinek  wrote:

Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?



The RecordAndGetPageWithFreeSpace will extend FSM as it calls
fsm_set_and_search which in turn calls fsm_readbuf with extend = true.


So how does that help?  If I'm reading this right, the new block will
be all zeroes which means no space available on any of those pages.



I am a bit confused as to what exactly you are saying, but what will 
happen is that we get back to the while cycle and try again, so eventually 
we should find either a block with enough free space or add a new one (not 
sure if this would actually ever happen in practice in a heavily concurrent 
workload where the FSM was not correctly extended during relation 
extension, though; we might just loop here forever).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Vladimir Sitnikov
Merlin>proposed would allow use of server side prepared statements with JDBC.

It would not. If we discuss end-to-end scenarios in detail, we would end up with
"send full query on each execution" -> lex/gram on each execution kind
of overheads.

That is hardly a proper way of using prepared statements.
I'm not eager to merge half-a-solution to pgjdbc.
Just in case: I'm one of those who maintain pgjdbc.

Merlin>https://pgbouncer.github.io/faq.html#how-to-use-prepared-statements-with-session-pooling
pgbouncer's docs> the reset query must clean old prepared statements

I'm afraid, that is just "you must clean prepared statements".
Where are the issues/suggestions from pgbouncer team?
I'm just a year into pgjdbc, I've attended a couple of the largest
PostgreSQL conferences
(like 700+ attendees), however I know of exactly 0 suggestions on
improving the matter.
Everybody just keeps saying "discard all".

Merlin>With proposed, the application can simply prepare after
Merlin>opening the 'connection' and not have to worry about handling the
Merlin>error or scope.

Can you name at least a couple of applications that indeed
"prepare after opening the connection"?
Note: it is not something JDBC driver can do. That "prepare after opening"
requires cooperation from the application.

I'm afraid, I know exactly 0 such applications. At least, in java world.
Applications typically use framework generated queries, so it would be
hard to impossible to "simply prepare after opening".
The "set of used sql queries" is likely to be infinite even for a
single application.
That is sad, but true.

That is exactly the reason why I've implemented _transparent_ statement cache
for pgjdbc. As application is using generated queries, pgjdbc detects
query reuse and enables server-prepared queries behind the scenes.


If no one objects, I would just go ahead and file
"report ParameterStatus(pgbouncer.backend_id=...)" issue for pgbouncer.
I guess we can agree on the name and semantics of the new status message,
so it would not accidentally collide with the one of newer PG version.

Merlin>Although there was a very long standing issue where jdbc
Merlin>would try to prepare 'BEGIN' in such a a way that it could not be
Merlin>disabled -- that was fixed.

What bothers me is that the current pgjdbc CI has exactly 0 pgbouncer tests.
That means "BEGIN is fixed" can easily break and no one would notice that.
It is tracked under https://github.com/pgjdbc/pgjdbc/issues/509, so
if there's interest in pgbouncer vs pgjdbc, then #509 might be a good start.

Vladimir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Tom Lane
"David G. Johnston"  writes:
> On Wednesday, March 23, 2016, Tom Lane  wrote:
>> ...  We've more or less deprecated SRF-in-tlist since we
>> introduced LATERAL in 9.3.  How much are we willing to do to stay
>> bug-compatible with old behaviors here?

> My gut reaction is that this is an unnecessary regression for the sake of a
> performance optimization that is likely drowned out in the usage presented
> anyway.
> The pivot for me would be how hard would it be to maintain the old behavior
> in this "more or less deprecated" scenario.  I have no way to judge that.

Well, it'd make make_sort_input_target() more complicated and a bit
slower, mainly because it would have to check SRF-ness of sorting columns
that it currently has no need to inspect at all.  My concern here is not
really with that, it's with trying to draw a boundary around what behavior
we're going to promise bug-compatibility with.  To illustrate, you really
can get the multiple-expansions behavior with existing releases, eg in
9.4

regression=# SELECT (dumpset(f.test, 'hello world' || f.test)).*, f.test
FROM generate_series(1,4) As f(test)
GROUP BY junk2, f.test;
 id |     junk1     | junk2 | test 
----+---------------+-------+------
  1 | 3hello world3 | 31    |    3
  2 | 3hello world3 | 31    |    3
  1 | 4hello world4 | 42    |    4
  2 | 4hello world4 | 42    |    4
  1 | 4hello world4 | 41    |    4
  2 | 4hello world4 | 41    |    4
  1 | 1hello world1 | 11    |    1
  2 | 1hello world1 | 11    |    1
  1 | 3hello world3 | 32    |    3
  2 | 3hello world3 | 32    |    3
  1 | 2hello world2 | 21    |    2
  2 | 2hello world2 | 21    |    2
  1 | 2hello world2 | 22    |    2
  2 | 2hello world2 | 22    |    2
  1 | 1hello world1 | 12    |    1
  2 | 1hello world1 | 12    |    1
(16 rows)

which is kind of fun to wrap your head around, but after awhile you
realize that dumpset(...).junk2 is being evaluated before the GROUP BY
and dumpset(...).id after it, which is why the grouping behavior
is so obviously violated.

AFAICS, the only way to ensure non-crazy behavior for such examples would
be to force all SRFs in the tlist to be evaluated at the same plan step.
Which we've never done in the past, and if we were to start doing so,
it would cause a behavioral change for examples like this one.

Anyway, my concern is just that we're deciding to stay bug-compatible
with some behaviors that were not very well thought out to start with,
and we don't even have a clear understanding of where the limits of
that compatibility will be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 2:52 PM, Petr Jelinek  wrote:
>> Second, if the other backend extended the relation in
>> some other manner and did not extend the FSM, how does calling
>> RecordAndGetPageWithFreeSpace help?  As far as I can see,
>> GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
>> searching the FSM, so if one is stymied the other will be too.  What
>> am I missing?
>>
>
> The RecordAndGetPageWithFreeSpace will extend FSM as it calls
> fsm_set_and_search which in turn calls fsm_readbuf with extend = true.

So how does that help?  If I'm reading this right, the new block will
be all zeroes which means no space available on any of those pages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 19:39, Robert Haas wrote:

On Tue, Mar 22, 2016 at 1:12 PM, Petr Jelinek  wrote:

I also think the code simplicity makes this worth it.


Agreed.  I went over this patch and did a cleanup pass today.   I
discovered that the LockWaiterCount() function was broken if you try
to tried to use it on a lock that you didn't hold or a lock that you
held in any mode other than exclusive, so I tried to fix that.  I
rewrote a lot of the comments and tightened some other things up.  The
result is attached.

I'm baffled by the same code Amit asked about upthread, even though
there's now a comment:

+            /*
+             * Here we are calling RecordAndGetPageWithFreeSpace
+             * instead of GetPageWithFreeSpace, because another backend
+             * that got the lock might have added extra blocks in the
+             * FSM, and it's possible that the FSM tree has not been
+             * updated yet, so we would not find the page by searching
+             * from the top with GetPageWithFreeSpace; we need to
+             * search the leaf page directly using our last valid
+             * target block.
+             *
+             * XXX. I don't understand what is happening here. -RMH
+             */

I've read this over several times and looked at
RecordAndGetPageWithFreeSpace() and I'm still confused.  First of all,
if the lock was acquired by some other backend which did
RelationAddExtraBlocks(), it *will* have updated the FSM - that's the
whole point.


That's a good point; maybe this coding is a bit too defensive.


Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?



The RecordAndGetPageWithFreeSpace will extend FSM as it calls 
fsm_set_and_search which in turn calls fsm_readbuf with extend = true.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 12:37 PM, Tom Lane  wrote:
> which is both SQL-standard semantics and much more efficient than
> SRF-in-tlist.  We've more or less deprecated SRF-in-tlist since we
> introduced LATERAL in 9.3.  How much are we willing to do to stay
> bug-compatible with old behaviors here?

I think we should, and the fact this was caught so early on the
release cycle underscores that.  One of the problems is that there are
reasonable cases (note, not impacted by this bug) of this usage that
are still commonplace, for example:

ysanalysis=# select unnest(current_schemas(true));
   unnest
------------
 pg_catalog
 public

I'm something of a backwards compatibility zealot, but I've become one
for very good reasons.  Personally, I'd rather we'd define precisely
the usages that are deprecated (I guess SRF-tlist in the presence of
FROM) and force them to error out with an appropriate HINT rather than
give a different answer than they used to.  The problem here is that
LATERAL is still fairly new and there is a huge body of code out there
leveraging the 'bad' way, as it was for years and years the only way
to do a number of useful things.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Tue, Mar 22, 2016 at 1:12 PM, Petr Jelinek  wrote:
> I also think the code simplicity makes this worth it.

Agreed.  I went over this patch and did a cleanup pass today.   I
discovered that the LockWaiterCount() function was broken if you try
to tried to use it on a lock that you didn't hold or a lock that you
held in any mode other than exclusive, so I tried to fix that.  I
rewrote a lot of the comments and tightened some other things up.  The
result is attached.

I'm baffled by the same code Amit asked about upthread, even though
there's now a comment:

+            /*
+             * Here we are calling RecordAndGetPageWithFreeSpace
+             * instead of GetPageWithFreeSpace, because another backend
+             * that got the lock might have added extra blocks in the
+             * FSM, and it's possible that the FSM tree has not been
+             * updated yet, so we would not find the page by searching
+             * from the top with GetPageWithFreeSpace; we need to
+             * search the leaf page directly using our last valid
+             * target block.
+             *
+             * XXX. I don't understand what is happening here. -RMH
+             */

I've read this over several times and looked at
RecordAndGetPageWithFreeSpace() and I'm still confused.  First of all,
if the lock was acquired by some other backend which did
RelationAddExtraBlocks(), it *will* have updated the FSM - that's the
whole point.  Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..3ab911f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,55 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * Extend a relation by multiple blocks to avoid future contention on the
+ * relation extension lock.  Our goal is to pre-extend the relation by an
+ * amount which ramps up as the degree of contention ramps up, but limiting
+ * the result to some sane overall value.
+ */
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	Size		freespace;
+	BlockNumber blockNum;
+	int			extraBlocks = 0;
+	int			lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * We use the length of the lock wait queue to judge how much to extend.
+	 * It might seem like multiplying the number of lock waiters by as much
+	 * as 20 is too aggressive, but benchmarking revealed that smaller numbers
+	 * were insufficient.  512 is just an arbitrary cap to prevent pathological
+	 * results (and excessive wasted disk space).
+	 */
+	lockWaiters = RelationExtensionLockWaiterCount(relation);
+	extraBlocks = Min(512, lockWaiters * 20);
+
+	while (extraBlocks-- >= 0)
+	{
+		/* Ouch - an unnecessary lseek() each time through the loop! */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		/* Extend by one page. */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+		UnlockReleaseBuffer(buffer);
+
+		/*
+		 * Put the page in the freespace map so other backends can find it.
+		 * This is what will keep those other backends from also queueing up
+		 * on the relation extension lock.
+		 */
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,10 +282,11 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+lastValidBlock = InvalidBlockNumber;
 	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
@@ -308,6 +358,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -388,6 +439,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
  otherBlock, targetBlock, vmbuffer_other,
  vmbuffer);
 
+		lastValidBlock = targetBlock;
+
 		/*
 		 * Now we can 

Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 1:15 PM, Vladimir Sitnikov
 wrote:
> Merlin>No one is arguing that you should send it every time (at
> least -- I hope not).
>
> Well, what is your suggestion exactly?
>
> pgjdbc is NOT using "prepare ..." sql command.
> I'm inclined to suppose it will not use "prepare ..." even after your fix.

Maybe so (note, the fix is not mine).  I guess better stated, the
proposed would allow use of server side prepared statements with JDBC.

> Merlin>Again, not in pooling scenarios
> Merlin>The problems integrating server side
> Merlin>prepared statements with pgbouncer are well known.
>
> I'm afraid, they are not.

yes, they are.   see:
https://pgbouncer.github.io/faq.html#how-to-use-prepared-statements-with-session-pooling

There are numerous pages on the web suggesting to DISCARD ALL in
transaction mode which is incredibly wasteful in the case of prepared
statements...so much so you're better off not using them if you can
help it.  With proposed, the application can simply prepare after
opening the 'connection' and not have to worry about handling the
error or scope.
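
For reference, the recipe that page boils down to is the reset query in
pgbouncer.ini -- and DISCARD ALL throws away every prepared statement
along with the rest of the session state:

[pgbouncer]
server_reset_query = DISCARD ALL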

> Your words are "This feature should immediately be incorporated
> by the JDBC driver" yet you have not raised that subject on the pgsql-jdbc
> mailing list or as a GitHub issue. That is not very fair.

That's fair. Although there was a very long standing issue where jdbc
would try to prepare 'BEGIN' in such a way that it could not be
disabled -- that was fixed.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread David G. Johnston
On Wednesday, March 23, 2016, Tom Lane  wrote:

> "Regina Obe" > writes:
> > In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev
> > started failing.  I traced the issue down to a behavior change in 9.6
> when
> > dealing with output of set returning functions when used with (func).*
> > syntax.
>
> > CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text)
> > RETURNS TABLE(id integer, junk1 text, junk2 text)
> > ...
> > -- Get 16 rows in 9.6, Get 8 rows in 9.5
> > SELECT (dumpset(f.test, 'hello world' || f.test)).*
> > FROM generate_series(1,4) As f(test)
> > ORDER BY junk2;
>
> I think this is a side-effect of 9118d03a8 ("When appropriate, postpone
> SELECT output expressions till after ORDER BY").  Previously, although you
> got N evaluations of the SRF which is pretty horrid, they were all in the
> same targetlist and hence ran in sync and produced only the expected
> number of rows (thanks to the otherwise-indefensible ExecTargetList
> behavior by which multiple SRF tlist expressions produce a number of rows
> equal to the least common multiple of their periods, not the product).
> That commit causes the evaluation of dumpset(...).junk1 to be postponed
> till after the Sort step, but the evaluation of dumpset(...).junk2
> necessarily can't be.  So now you get dumpset(...).junk2 inflating
> the original rowcount 2X, and then dumpset(...).junk1 inflating it
> another 2X after the Sort.
>
> We could remain bug-compatible with the old behavior by adding some
> restriction to keep all the tlist SRFs getting evaluated at the same
> plan step, at least to the extent that we can.  I think you could get
> similar strange behaviors in prior releases if you used GROUP BY or
> another syntax that might result in early evaluation of the sort column,
> and we're not going to be able to fix that.  But we could prevent this
> particular optimization from introducing new strangeness.
>
> But I'm not really sure that we should.  The way that you *should*
> write this query is
>
> SELECT ds.*
> FROM generate_series(1,4) AS f(test),
>  dumpset(f.test, 'hello world' || f.test) AS ds
> ORDER BY junk2;
>
> which is both SQL-standard semantics and much more efficient than
> SRF-in-tlist.  We've more or less deprecated SRF-in-tlist since we
> introduced LATERAL in 9.3.  How much are we willing to do to stay
> bug-compatible with old behaviors here?
>
>
My gut reaction is that this is an unnecessary regression for the sake of a
performance optimization that is likely drowned out in the usage presented
anyway.

The pivot for me would be how hard would it be to maintain the old behavior
in this "more or less deprecated" scenario.  I have no way to judge that.

David J.


Re: [HACKERS] multivariate statistics v14

2016-03-23 Thread Petr Jelinek

Hi,

I'll add couple of code comments from my first cursory read through 
(this is huge):


0002:
there is some whitespace noise between the varlistentries in 
alter_statistics.sgml


+   parentobject.classId = RelationRelationId;
+   parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+   parentobject.objectSubId = 0;
+   childobject.classId = MvStatisticRelationId;
+   childobject.objectId = statoid;
+   childobject.objectSubId = 0;

I wonder if this (similar code appears in several places) would be simpler 
using ObjectAddressSet()
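
i.e. roughly (same variables assumed):

+   ObjectAddressSet(parentobject, RelationRelationId, RelationGetRelid(rel));
+   ObjectAddressSet(childobject, MvStatisticRelationId, statoid);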


The common.h in backend/utils/mvstat is slightly weird header file 
placement and naming.



0004:
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))

Huh? We have Max and Min macros defined in c.h

+   values[Anum_pg_mv_statistic_stamcv  - 1] = 
PointerGetDatum(data);

Why the double space (that's actually in several places in several of 
the patches).


I don't really understand why 0008 and 0009 are separate patches and 
aren't part of one of the other patches. But otherwise good job on 
splitting the functionality into patchset.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Vladimir Sitnikov
Merlin>No one is arguing that you should send it every time (at
least -- I hope not).

Well, what is your suggestion exactly?

pgjdbc is NOT using "prepare ..." sql command.
I'm inclined to suppose it will not use "prepare ..." even after your fix.

Merlin>Again, not in pooling scenarios
Merlin>The problems integrating server side
Merlin>prepared statements with pgbouncer are well known.

I'm afraid, they are not.

Your words are "This feature should immediately be incorporated
by the JDBC driver" yet you have not raised that subject on the pgsql-jdbc
mailing list or as a GitHub issue. That is not very fair.

Let me just name an alternative, so you can see what "a step back" could be:
What if pg-bouncer generated _fake_ ParameterStatus(backend_id=...)
messages to pgjdbc?
Then pgjdbc can learn true backend session id and it can use proper
set of prepared statements. Basically, pgjdbc could have prepared statement
cache per backend_id.
Well, it is not a 100% solution, however it is yet another approach to
"pgbouncer" problem, and it will support all the PostgreSQL versions.

It fits into current frontend-backend protocol as all clients are supposed
to handle ParameterStatus messages, etc.

Vladimir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Alvaro Herrera
Tom Lane wrote:

> Note: the reason for a new subdirectory, rather than putting this
> stuff into src/common/, is that src/common/ is meant for code that's
> shared between frontend and backend, which this stuff is not.  Also,
> many of these files depend on libpq which seems like an inappropriate
> dependency for src/common.

Actually you could just list them in OBJS_FRONTEND in src/common.  That
way they're not built for the server at all; no need for a new subdir.
The libpq dependency argument AFAICS only applies to the ones in
src/bin/psql.  Perhaps we could have feutils with those only, and move
the other files to src/common.

I'm unclear on the #ifndef PGSCRIPTS thingy in mbprint.c.  Perhaps it's
no longer needed?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Tom Lane
Vladimir Sitnikov  writes:
> 2016-03-23 16:21 GMT+03:00 Merlin Moncure :
>> On Wed, Mar 23, 2016 at 7:27 AM, Craig Ringer  wrote:
> Craig>> With PREPARE IF NOT EXISTS the client is also paying parse and network
> Craig>> overhead for every time you send that statement. Much better
> Craig>> not to send it repeatedly in the first place.

> Merlin> How did you determine that?  The client prepares the statement exactly
> Merlin> once.  The problem is there's no easy way to determine if someone else
> Merlin> prepared it first and this patch directly addresses that.

> With suggested "prepare if not exists", client would still have to send full
> query text along with "prepare if not exist" command.
> That is "network overhead".

Yes.  Also, the query would certainly go through the lex and raw-parse
steps (scan.l and gram.y) on the database side before we could get as far
as realizing that it's a PREPARE IF NOT EXISTS and we should check for
pre-existence of the statement name.  Those steps are a nontrivial
fraction of the full parse-analysis overhead, especially for simpler
statements.

So I wouldn't take it as a given that this is actually going to be
very efficient.  Some measurements would be important to make the case
that this is worth having.
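
A first-cut measurement could be as simple as a one-statement pgbench
script (a sketch; the IF NOT EXISTS spelling is the proposed syntax, not
anything committed):

$ cat ine.sql
PREPARE IF NOT EXISTS p1(int) AS SELECT $1;
EXECUTE p1(1);
$ pgbench -n -f ine.sql -c 8 -T 30 postgres

Timing that against a plain EXECUTE loop would show what the extra
lex/raw-parse round actually costs.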

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 12:46 PM, Vladimir Sitnikov
 wrote:
> 2016-03-23 16:21 GMT+03:00 Merlin Moncure :
> Merlin> A typical pattern is for the application to
> Merlin> prepare them all upon startup, but currently each PREPARE needs to be
> Merlin> wrapped with an exception handler in case someone else prepared it
> Merlin> first.
>
> If you plan to have "prepare if not exists" at startup only, why don't
> you just wrap it with
> exception handler then?

That's impolite to our users.  Virtually all other commands have been
decorated with IF [NOT] EXISTS to avoid having to guard with an exception
handler -- why not this one?  Also, if the handler is on the client
side, it tends to be racy.
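
For the record, the guard we have to write today looks roughly like this
(a sketch; it leans on the duplicate_prepared_statement condition and has
to be run once per pooled server session):

DO $$
BEGIN
    PREPARE fetch_one(int) AS SELECT $1;
EXCEPTION
    WHEN duplicate_prepared_statement THEN
        NULL;  -- someone already prepared it on this session
END
$$;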

> If you plan to always issue "prepare if not exists", then you will
> have to send full query text
> for each prepare => overhead. Those repeated query texts are
> "endless copies of the same PREPARE statements" Craig is talking about.

No one is arguing that you should send it every time (at
least -- I hope not).

> Merlin>The client prepares the statement exactly
> Merlin>once.  The problem is there's no easy way to determine if someone else
> Merlin>prepared it first
>
> Merlin, if by "client" you somehow mean JDBC (e.g. pgjdbc), then it
> does track which connections
> have which queries prepared.

Again, not in pooling scenarios.  The problems integrating server side
prepared statements with pgbouncer are well known.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-03-23 Thread Petr Jelinek

On 23/03/16 14:17, Alvaro Herrera wrote:

Petr Jelinek wrote:


+++ b/contrib/test_decoding/sql/messages.sql
@@ -0,0 +1,17 @@
+-- predictability
+SET synchronous_commit = on;
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');
+
+SELECT 'msg1' FROM pg_logical_emit_message(true, 'test', 'msg1');
+SELECT 'msg2' FROM pg_logical_emit_message(false, 'test', 'msg2');
+
+BEGIN;
+SELECT 'msg3' FROM pg_logical_emit_message(true, 'test', 'msg3');
+SELECT 'msg4' FROM pg_logical_emit_message(false, 'test', 'msg4');
+SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', 'msg5');
+COMMIT;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'force-binary', '0', 'skip-empty-xacts', '1');
+
+SELECT 'init' FROM pg_drop_replication_slot('regression_slot');


No tests for a rolled back transaction?



Good point, probably worth testing especially since we have 
non-transactional messages.
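
Something along these lines, presumably (a sketch, following the existing 
messages.sql style):

BEGIN;
SELECT 'msg-a' FROM pg_logical_emit_message(true, 'test', 'transactional, should vanish');
SELECT 'msg-b' FROM pg_logical_emit_message(false, 'test', 'non-transactional, should survive');
ROLLBACK;

SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'force-binary', '0', 'skip-empty-xacts', '1');

-- only the non-transactional message should show up in the slot output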


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 81fc28cedc19fe0f91f882d42989c14113a40f88 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 24 Feb 2016 17:02:36 +0100
Subject: [PATCH] Logical Decoding Messages

---
 contrib/test_decoding/Makefile  |   2 +-
 contrib/test_decoding/expected/ddl.out  |  21 ++--
 contrib/test_decoding/expected/messages.out |  71 ++
 contrib/test_decoding/sql/ddl.sql   |   3 +-
 contrib/test_decoding/sql/messages.sql  |  22 +
 contrib/test_decoding/test_decoding.c   |  18 
 doc/src/sgml/func.sgml  |  45 +
 doc/src/sgml/logicaldecoding.sgml   |  38 
 src/backend/access/rmgrdesc/Makefile|   4 +-
 src/backend/access/rmgrdesc/logicalmsgdesc.c|  41 
 src/backend/access/transam/rmgr.c   |   1 +
 src/backend/replication/logical/Makefile|   2 +-
 src/backend/replication/logical/decode.c|  46 +
 src/backend/replication/logical/logical.c   |  38 
 src/backend/replication/logical/logicalfuncs.c  |  27 ++
 src/backend/replication/logical/message.c   |  85 +
 src/backend/replication/logical/reorderbuffer.c | 121 
 src/backend/replication/logical/snapbuild.c |  19 
 src/bin/pg_xlogdump/.gitignore  |  20 +---
 src/bin/pg_xlogdump/rmgrdesc.c  |   1 +
 src/include/access/rmgrlist.h   |   1 +
 src/include/catalog/pg_proc.h   |   4 +
 src/include/replication/logicalfuncs.h  |   2 +
 src/include/replication/message.h   |  41 
 src/include/replication/output_plugin.h |  13 +++
 src/include/replication/reorderbuffer.h |  22 +
 src/include/replication/snapbuild.h |   2 +
 27 files changed, 679 insertions(+), 31 deletions(-)
 create mode 100644 contrib/test_decoding/expected/messages.out
 create mode 100644 contrib/test_decoding/sql/messages.sql
 create mode 100644 src/backend/access/rmgrdesc/logicalmsgdesc.c
 create mode 100644 src/backend/replication/logical/message.c
 create mode 100644 src/include/replication/message.h

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 06c9546..309cb0b 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -38,7 +38,7 @@ submake-test_decoding:
$(MAKE) -C $(top_builddir)/contrib/test_decoding
 
 REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
-   decoding_into_rel binary prepared replorigin time
+   decoding_into_rel binary prepared replorigin time messages
 
 regresscheck: | submake-regress submake-test_decoding temp-install
$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/ddl.out 
b/contrib/test_decoding/expected/ddl.out
index 77719e8..32cd24d 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -220,11 +220,17 @@ SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 (7 rows)
 
 /*
- * check that disk spooling works
+ * check that disk spooling works (also for logical messages)
  */
 BEGIN;
 CREATE TABLE tr_etoomuch (id serial primary key, data int);
 INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+SELECT 'tx logical msg' FROM pg_logical_emit_message(true, 'test', 'tx logical 
msg');
+?column?
+
+ tx logical msg
+(1 row)
+
 DELETE FROM tr_etoomuch WHERE id < 5000;
 UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
 COMMIT;
@@ -233,12 +239,13 @@ SELECT count(*), min(data), max(data)
 FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1')
 GROUP BY substring(data, 1, 24)
 ORDER BY 1,2;
- count |   min 

Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Vladimir Sitnikov
2016-03-23 16:21 GMT+03:00 Merlin Moncure :
> On Wed, Mar 23, 2016 at 7:27 AM, Craig Ringer  wrote:
Craig>> With PREPARE IF NOT EXISTS the client is also paying parse and network
Craig>> overhead for every time you send that statement. Much better
not to send it
Craig>> repeatedly in the first place.
>
Merlin> How did you determine that?  The client prepares the statement exactly
Merlin> once.  The problem is there's no easy way to determine if someone else
Merlin> prepared it first and this patch directly addresses that.

With the suggested "prepare if not exists", the client would still have to
send the full query text along with the "prepare if not exists" command.
That is the "network overhead".
The DB would still have to check whether the same query with the same
"query name" has already been registered. That check would probably be
cheaper than generating an execution plan, yet it still has non-zero overhead.

Craig>> I think we need to take a step back here and better define the problem
Craig>> before stepping in with a proposed solution. Something that
avoids the need
Craig>> to spam the server with endless copies of the same PREPARE
statements would
Craig>> be good.

+1

Merlin> A typical pattern is for the application to
Merlin> prepare them all upon startup, but currently each PREPARE needs to be
Merlin> wrapped with an exception handler in case someone else prepared it
Merlin> first.

If you plan to have "prepare if not exists" at startup only, why don't
you just wrap it with
exception handler then?

If you plan to always issue "prepare if not exists", then you will
have to send full query text
for each prepare => overhead. Those repeated query texts are
"endless copies of the same PREPARE statements" Craig is talking about.

Merlin>The client prepares the statement exactly
Merlin>once.  The problem is there's no easy way to determine if someone else
Merlin>prepared it first

Merlin, if by "client" you somehow mean JDBC (e.g. pgjdbc), then it
does track which connections have which queries prepared.
So the thing we/you need might not be backend support for "prepare if
not exists", but some kind of pgbouncer/JDBC integration, so JDBC knows
which connection it is using at each point in time.

Vladimir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-03-23 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 23, 2016 at 12:27 PM, Tom Lane  wrote:
>> I'm also a bit dubious of the assumption in RemoveOperatorById that an
>> operator can't be its own negator.  Yeah, that should not be the case,
>> but if it is the case the deletion will fail outright.

> So what?  We've never guaranteed that things are going to work if you
> start by corrupting the catalogs, and I wouldn't pick this as a place
> to start.

I would not be worried except that it breaks a case that used to work,
as your test below demonstrates.

>> We could resolve both of these issues by changing the semantics of
>> OprUpdate so that it unconditionally does a CommandCounterIncrement
>> after each update that it performs.  IMO that would be a lot simpler
>> and more bulletproof; it'd allow removal of a lot of these
>> overly-tightly-reasoned cases.

> I tried this, but it did not seem to work.

Odd.  If you post the revised patch, I'll try to chase down what's wrong.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Tom Lane
"Regina Obe"  writes:
> In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev
> started failing.  I traced the issue down to a behavior change in 9.6 when
> dealing with output of set returning functions when used with (func).*
> syntax.

> CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text)
> RETURNS TABLE(id integer, junk1 text, junk2 text)
> ...
> -- Get 16 rows in 9.6, Get 8 rows in 9.5
> SELECT (dumpset(f.test, 'hello world' || f.test)).*
> FROM generate_series(1,4) As f(test)
> ORDER BY junk2;

I think this is a side-effect of 9118d03a8 ("When appropriate, postpone
SELECT output expressions till after ORDER BY").  Previously, although you
got N evaluations of the SRF which is pretty horrid, they were all in the
same targetlist and hence ran in sync and produced only the expected
number of rows (thanks to the otherwise-indefensible ExecTargetList
behavior by which multiple SRF tlist expressions produce a number of rows
equal to the least common multiple of their periods, not the product).
That commit causes the evaluation of dumpset(...).junk1 to be postponed
till after the Sort step, but the evaluation of dumpset(...).junk2
necessarily can't be.  So now you get dumpset(...).junk2 inflating
the original rowcount 2X, and then dumpset(...).junk1 inflating it
another 2X after the Sort.
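
The lockstep behavior is easy to see in a release before this commit; a
minimal sketch with both SRFs in one targetlist yields LCM(2,3) = 6 rows
rather than the product 9:

SELECT generate_series(1,2) AS a, generate_series(1,3) AS b;
-- 6 rows: (1,1) (2,2) (1,3) (2,1) (1,2) (2,3)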

We could remain bug-compatible with the old behavior by adding some
restriction to keep all the tlist SRFs getting evaluated at the same
plan step, at least to the extent that we can.  I think you could get
similar strange behaviors in prior releases if you used GROUP BY or
another syntax that might result in early evaluation of the sort column,
and we're not going to be able to fix that.  But we could prevent this
particular optimization from introducing new strangeness.

But I'm not really sure that we should.  The way that you *should*
write this query is

SELECT ds.*
FROM generate_series(1,4) AS f(test),
 dumpset(f.test, 'hello world' || f.test) AS ds
ORDER BY junk2;

which is both SQL-standard semantics and much more efficient than
SRF-in-tlist.  We've more or less deprecated SRF-in-tlist since we
introduced LATERAL in 9.3.  How much are we willing to do to stay
bug-compatible with old behaviors here?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-23 Thread Petr Jelinek

On 23/03/16 09:14, Kouhei Kaigai wrote:

On 15/03/16 05:03, Kouhei Kaigai wrote:

Petr,

The attached patch is the revised one that follows the new extensible-
node routine.

It is almost same the previous version except for:
- custom-apis.[ch] was renamed to custom-node.[ch]
- check for the length of custom-scan-method name followed
 the manner of RegisterExtensibleNodeMethods()



Hi,

looks good, the only nitpick I have is that it probably should be
custom_node.h with an underscore, given that we use underscores everywhere
(except for libpq and, for some reason, atomic ops).


Sorry for my late response.

The attached patch just renames custom-node.[ch] to custom_node.[ch].
Other portions are unchanged from the previous revision.



Forgot to attach?


Yes Thanks,



Ok, I am happy with it, marked it as ready for committer (it was marked 
as committed although it wasn't committed).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Magnus Hagander
On Wed, Mar 23, 2016 at 5:55 PM, Tom Lane  wrote:

> I'm not terribly happy with the hack I used to get pgbench to be able
> to borrow psql's psqlscan.l lexer.  It's a mess for the MSVC build
> scripts, and I have seen it causing two concurrent builds of psqlscan.o
> in parallel builds, which is likely to cause a build failure some of
> the time.  Another unresolved issue is that we can't apply FLEX_NO_BACKUP
> testing to both of psql's lexers, because that causes each of those
> make steps to want to write lex.backup in the current directory.
>
> I have a modest proposal for improving this: let's move all the code
> that's currently shared by two or more src/bin/ subdirectories into a
> new directory, say src/feutils, and build it into a ".a" library in
> the same way that src/common/ is handled.  This would remove the
> following klugy cross-directory links:
>
> src/bin/pgbench/psqlscan.o -> ../../../src/bin/psql/psqlscan.o
> src/bin/psql/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
> src/bin/psql/keywords.c -> ../../../src/bin/pg_dump/keywords.c
> src/bin/scripts/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
> src/bin/scripts/keywords.c -> ../../../src/bin/pg_dump/keywords.c
> src/bin/scripts/mbprint.c -> ../../../src/bin/psql/mbprint.c
> src/bin/scripts/print.c -> ../../../src/bin/psql/print.c
>
> Note: the reason for a new subdirectory, rather than putting this
> stuff into src/common/, is that src/common/ is meant for code that's
> shared between frontend and backend, which this stuff is not.  Also,
> many of these files depend on libpq which seems like an inappropriate
> dependency for src/common.
>
> Having said that, I also notice these dependencies:
>
> src/bin/pg_dump/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
> src/bin/psql/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
> src/bin/scripts/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
> src/interfaces/ecpg/preproc/kwlookup.c ->
> ../../../../src/backend/parser/kwlookup.c
> src/bin/initdb/encnames.c -> ../../../src/backend/utils/mb/encnames.c
> src/interfaces/libpq/encnames.c -> ../../../src/backend/utils/mb/encnames.c
>
> which seem like they'd be better handled by moving those files into
> src/common/.
>
> Thoughts?
>
>
+1 on both. I've certainly been annoyed by that kwlookup thing many times
:)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Petr Jelinek

On 23/03/16 17:55, Tom Lane wrote:

I'm not terribly happy with the hack I used to get pgbench to be able
to borrow psql's psqlscan.l lexer.  It's a mess for the MSVC build
scripts, and I have seen it causing two concurrent builds of psqlscan.o
in parallel builds, which is likely to cause a build failure some of
the time.  Another unresolved issue is that we can't apply FLEX_NO_BACKUP
testing to both of psql's lexers, because that causes each of those
make steps to want to write lex.backup in the current directory.

I have a modest proposal for improving this: let's move all the code
that's currently shared by two or more src/bin/ subdirectories into a
new directory, say src/feutils, and build it into a ".a" library in
the same way that src/common/ is handled.  This would remove the
following klugy cross-directory links:

src/bin/pgbench/psqlscan.o -> ../../../src/bin/psql/psqlscan.o
src/bin/psql/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
src/bin/psql/keywords.c -> ../../../src/bin/pg_dump/keywords.c
src/bin/scripts/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
src/bin/scripts/keywords.c -> ../../../src/bin/pg_dump/keywords.c
src/bin/scripts/mbprint.c -> ../../../src/bin/psql/mbprint.c
src/bin/scripts/print.c -> ../../../src/bin/psql/print.c

Note: the reason for a new subdirectory, rather than putting this
stuff into src/common/, is that src/common/ is meant for code that's
shared between frontend and backend, which this stuff is not.  Also,
many of these files depend on libpq which seems like an inappropriate
dependency for src/common.

Having said that, I also notice these dependencies:

src/bin/pg_dump/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
src/bin/psql/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
src/bin/scripts/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
src/interfaces/ecpg/preproc/kwlookup.c -> 
../../../../src/backend/parser/kwlookup.c
src/bin/initdb/encnames.c -> ../../../src/backend/utils/mb/encnames.c
src/interfaces/libpq/encnames.c -> ../../../src/backend/utils/mb/encnames.c

which seem like they'd be better handled by moving those files into
src/common/.

Thoughts?



Yes please!

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 12:55 PM, Tom Lane  wrote:
> I'm not terribly happy with the hack I used to get pgbench to be able
> to borrow psql's psqlscan.l lexer.  It's a mess for the MSVC build
> scripts, and I have seen it causing two concurrent builds of psqlscan.o
> in parallel builds, which is likely to cause a build failure some of
> the time.  Another unresolved issue is that we can't apply FLEX_NO_BACKUP
> testing to both of psql's lexers, because that causes each of those
> make steps to want to write lex.backup in the current directory.
>
> I have a modest proposal for improving this: let's move all the code
> that's currently shared by two or more src/bin/ subdirectories into a
> new directory, say src/feutils, and build it into a ".a" library in
> the same way that src/common/ is handled.  This would remove the
> following klugy cross-directory links:
>
> src/bin/pgbench/psqlscan.o -> ../../../src/bin/psql/psqlscan.o
> src/bin/psql/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
> src/bin/psql/keywords.c -> ../../../src/bin/pg_dump/keywords.c
> src/bin/scripts/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
> src/bin/scripts/keywords.c -> ../../../src/bin/pg_dump/keywords.c
> src/bin/scripts/mbprint.c -> ../../../src/bin/psql/mbprint.c
> src/bin/scripts/print.c -> ../../../src/bin/psql/print.c
>
> Note: the reason for a new subdirectory, rather than putting this
> stuff into src/common/, is that src/common/ is meant for code that's
> shared between frontend and backend, which this stuff is not.  Also,
> many of these files depend on libpq which seems like an inappropriate
> dependency for src/common.
>
> Having said that, I also notice these dependencies:
>
> src/bin/pg_dump/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
> src/bin/psql/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
> src/bin/scripts/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
> src/interfaces/ecpg/preproc/kwlookup.c -> 
> ../../../../src/backend/parser/kwlookup.c
> src/bin/initdb/encnames.c -> ../../../src/backend/utils/mb/encnames.c
> src/interfaces/libpq/encnames.c -> ../../../src/backend/utils/mb/encnames.c
>
> which seem like they'd be better handled by moving those files into
> src/common/.
>
> Thoughts?

No objection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump / copy bugs with "big lines" ?

2016-03-23 Thread Daniel Verite
Alvaro Herrera wrote:

> >   tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len);
> > 
> > which fails because (HEAPTUPLESIZE + len) is again considered
> > an invalid size, the size being 1468006476 in my test.
> 
> Um, it seems reasonable to make this one be a huge-zero-alloc:
> 
>   MemoryContextAllocExtended(CurrentMemoryContext,
>  HEAPTUPLESIZE + len,
> MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO)

Good, this allows the tests to go to completion! The tests in question
are dump/reload of a row with several fields totalling 1.4GB (deflated),
with COPY TO/FROM file and psql's \copy in both directions, as well as
pg_dump followed by pg_restore|psql.

The modified patch is attached.

It provides a useful mitigation for dumping/reloading databases that have
rows in the 1GB-2GB range, but it works under these limitations:

- no single field has a text representation exceeding 1GB.
- no row as text exceeds 2GB (\copy from fails beyond that. AFAICS we
  could push this to 4GB with limited changes to libpq, by
  interpreting the Int32 field in the CopyData message as unsigned).

It's also possible to go beyond 4GB per row with this patch, but only
when not using the protocol. I've managed to produce a 5.6GB single-row
file with COPY TO file. That doesn't help with pg_dump, but it might
be useful in other situations.
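
For reference, a sketch of that server-side variant, which stays outside
the libpq CopyData path (the table and file names are made up; each
individual field must still stay under 1GB):

COPY bigrows TO '/tmp/bigrows.dat';    -- server-side, no protocol message
COPY bigrows FROM '/tmp/bigrows.dat';  -- reload without going through libpq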

In StringInfo, I've changed int64 to Size, because on 32-bit platforms
the downcast from int64 to Size is problematic, and as the rest of the
allocation routines seem to favor Size, it seems more consistent
anyway.

I couldn't test on 32 bits though, as I never seem to have enough
free contiguous memory available on a 32-bit VM to handle
that kind of data.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 6d0f3f3..ed9cabd 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -741,7 +741,9 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	 * Allocate and zero the space needed.  Note that the tuple body and
 	 * HeapTupleData management structure are allocated in one chunk.
 	 */
-	tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len);
+	tuple = MemoryContextAllocExtended(CurrentMemoryContext,
+	   HEAPTUPLESIZE + len,
+	   MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
 	tuple->t_data = td = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
 
 	/*
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3201476..ce681d7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -446,6 +446,15 @@ SendCopyEnd(CopyState cstate)
 	}
 }
 
+/*
+ * Prepare for output
+ */
+static void
+CopyStartSend(CopyState cstate)
+{
+	allowLongStringInfo(cstate->fe_msgbuf);
+}
+
 /*--
  * CopySendData sends output data to the destination (file or frontend)
  * CopySendString does the same for null-terminated strings
@@ -2015,6 +2024,8 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+	CopyStartSend(cstate);
+
 	if (cstate->binary)
 	{
 		/* Binary per-tuple header */
@@ -3203,6 +3214,7 @@ CopyReadLine(CopyState cstate)
 	bool		result;
 
 	resetStringInfo(&cstate->line_buf);
+	allowLongStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = true;
 
 	/* Mark that encoding conversion hasn't occurred yet */
@@ -3272,6 +3284,7 @@ CopyReadLine(CopyState cstate)
 		{
 			/* transfer converted data back to line_buf */
 			resetStringInfo(&cstate->line_buf);
+			allowLongStringInfo(&cstate->line_buf);
 			appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
 			pfree(cvt);
 		}
@@ -3696,7 +3709,7 @@ CopyReadAttributesText(CopyState cstate)
 	}
 
 	resetStringInfo(&cstate->attribute_buf);
-
+	allowLongStringInfo(&cstate->attribute_buf);
 	/*
 	 * The de-escaped attributes will certainly not be longer than the input
 	 * data line, so we can just force attribute_buf to be large enough and
diff --git a/src/backend/lib/stringinfo.c b/src/backend/lib/stringinfo.c
index 7382e08..6e451b2 100644
--- a/src/backend/lib/stringinfo.c
+++ b/src/backend/lib/stringinfo.c
@@ -47,12 +47,24 @@ initStringInfo(StringInfo str)
 {
 	int			size = 1024;	/* initial default buffer size */
 
-	str->data = (char *) palloc(size);
+	str->data = (char *) palloc(size);	/* no need for "huge" at this point */
 	str->maxlen = size;
+	str->allowlong = false;
 	resetStringInfo(str);
 }
 
 /*
+ * allowLongStringInfo
+ *
+ * Mark the StringInfo as a candidate for a "huge" allocation
+ */
+void
+allowLongStringInfo(StringInfo str)
+{
+	str->allowlong = true;
+}
+
+/*
  * resetStringInfo
  *
  * Reset the StringInfo: the data buffer remains valid, but its
@@ -64,6 +76,7 @@ resetStringInfo(StringInfo str)
 	str->data[0] = '\0';
 	str->len = 0;
 	str->cursor = 0;
+	str->allowlong = false;
 }
 
 /*
@@ -244,7 +257,9 @@ 

Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 12:27 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I did not find this version very clear.  It wasn't consistent about
>> using ObjectIdGetDatum() where needed, but the bigger problem was that
>> I found the logic unnecessarily convoluted.  I rewrote it - I believe
>> more straightforwardly - as attached.  How does this look?
>
> I'd suggest that we save some code by always doing separate updates for
> the commutator and negator entries.  We can handle the corner case where
> they're the same by doing a CommandCounterIncrement between the updates,
> instead of having convoluted and probably-never-yet-tested logic.

Sure, we could do that, but it isn't necessary.  If the logic never
gets hit, the question of whether it has bugs isn't that important.
And I'd rather not rejigger things more than necessary in something
that's going to be back-patched.

> I'm also a bit dubious of the assumption in RemoveOperatorById that an
> operator can't be its own negator.  Yeah, that should not be the case,
> but if it is the case the deletion will fail outright.

So what?  We've never guaranteed that things are going to work if you
start by corrupting the catalogs, and I wouldn't pick this as a place
to start.

> We could resolve both of these issues by changing the semantics of
> OprUpdate so that it unconditionally does a CommandCounterIncrement
> after each update that it performs.  IMO that would be a lot simpler
> and more bulletproof; it'd allow removal of a lot of these
> overly-tightly-reasoned cases.

I tried this, but it did not seem to work.  With the command counter
increments added and the conditional logic removed, I got:

rhaas=# CREATE OPERATOR === (PROCEDURE = int8eq, LEFTARG = bigint,
RIGHTARG = bigint);
CREATE OPERATOR
rhaas=# update pg_operator set oprnegate = oid where oprname = '===';
UPDATE 1
rhaas=# drop operator === (bigint, bigint);
ERROR:  attempted to delete invisible tuple

The same test case without those changes fails with:

ERROR:  tuple already updated by self

Interestingly, that test case passes on unpatched master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-03-23 Thread Abhijit Menon-Sen
Hi.

I implemented "ALTER FUNCTION … DEPENDS ON EXTENSION" using a new node
(AlterObjectDependsStmt), and tried to add "ALTER TRIGGER … DEPENDS ON
EXTENSION" (partly because I wanted to make sure the code could support
multiple object types, partly because it's convenient in this particular
use case). Then I ran into a problem, which I could use some help with.
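
For reference, the intended syntax looks like this (the object names are
hypothetical):

ALTER FUNCTION my_audit_fn(integer) DEPENDS ON EXTENSION my_ext;
ALTER TRIGGER my_audit_trg ON my_table DEPENDS ON EXTENSION my_ext;
-- DROP EXTENSION my_ext would then drop both dependents automatically
-- (DEPENDENCY_AUTO_EXTENSION), while pg_dump still dumps them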

The main ExecAlterObjectDependsStmt() function is very simple:

+ObjectAddress
+ExecAlterObjectDependsStmt(AlterObjectDependsStmt *stmt)
+{
+   ObjectAddress address;
+   ObjectAddress extAddr;
+
+   address = get_object_address(stmt->objectType, stmt->objname, stmt->objargs,
+NULL, AccessExclusiveLock, false);
+   extAddr = get_object_address(OBJECT_EXTENSION, stmt->extname, NULL,
+NULL, AccessExclusiveLock, false);
+
+   recordDependencyOn(&address, &extAddr, DEPENDENCY_AUTO_EXTENSION);
+
+   return address;
+}

All that's needed is to construct the node based on variants of the
ALTER command:

+AlterObjectDependsStmt:
+ALTER FUNCTION function_with_argtypes DEPENDS ON EXTENSION name
+{
+AlterObjectDependsStmt *n = 
makeNode(AlterObjectDependsStmt);
+n->objectType = OBJECT_FUNCTION;
+n->objname = $3->funcname;
+n->objargs = $3->funcargs;
+n->extname = list_make1(makeString($7));
+$$ = (Node *)n;
+}
+| ALTER TRIGGER name ON any_name DEPENDS ON EXTENSION name
+{
+AlterObjectDependsStmt *n = 
makeNode(AlterObjectDependsStmt);
+n->objectType = OBJECT_TRIGGER;
+n->objname = list_make1(lappend($5, makeString($3)));
+n->objargs = NIL;
+n->extname = list_make1(makeString($9));
+$$ = (Node *)n;
+}
+;

Now, the first part of this works fine. But with the second part, I get
a reduce/reduce conflict if I use any_name. Here's an excerpt from the
verbose bison output:

State 2920

  1181 AlterObjectDependsStmt: ALTER TRIGGER name ON any_name DEPENDS . ON 
EXTENSION name

ON  shift, and go to state 3506


State 2921

  1165 RenameStmt: ALTER TRIGGER name ON qualified_name RENAME . TO name

TO  shift, and go to state 3507


State 2922

  829 attrs: '.' . attr_name
  2006 indirection_el: '.' . attr_name
  2007   | '.' . '*'

  …

  attr_name   go to state 3508
  ColLabelgo to state 1937
  unreserved_keyword  go to state 1938
  col_name_keywordgo to state 1939
  type_func_name_keyword  go to state 1940
  reserved_keywordgo to state 1941


State 3506

  1181 AlterObjectDependsStmt: ALTER TRIGGER name ON any_name DEPENDS ON . 
EXTENSION name

EXTENSION  shift, and go to state 4000


State 3507

  1165 RenameStmt: ALTER TRIGGER name ON qualified_name RENAME TO . name

…

namego to state 4001
ColId   go to state 543
unreserved_keyword  go to state 544
col_name_keywordgo to state 545


State 3508

  829 attrs: '.' attr_name .
  2006 indirection_el: '.' attr_name .

RENAMEreduce using rule 2006 (indirection_el)
'['   reduce using rule 2006 (indirection_el)
'.'   reduce using rule 829 (attrs)
'.'   [reduce using rule 2006 (indirection_el)]
$default  reduce using rule 829 (attrs)

I can see that the problem is that it can reduce '.' in two ways, but
I'm afraid I don't remember enough about yacc to know how to fix it. If
anyone has suggestions, I would be grateful.

Meanwhile, I tried using qualified_name instead of any_name, which does
work without conflicts, but I end up with a RangeVar instead of the sort
of List* that I can feed to get_object_address.

I could change ExecAlterObjectDependsStmt() to check if stmt->relation
is set, and if so, convert the RangeVar back to a List* in the right
format before passing it to get_object_address. That would work fine,
but it doesn't seem like a good solution.

I could write some get_object_address_rv() wrapper that does essentially
the same conversion, but that's only slightly better.

Are there any other options?

(Apart from the above, the rest of the patch is really just boilerplate
for the new node type, but I've attached it anyway for reference.)

-- Abhijit
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 951f59b..189b771 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2864,6 +2864,19 @@
   
  
 
+
+
+ DEPENDENCY_AUTO_EXTENSION (x)
+ 
+  
+   The dependent object is not a member of the extension that is the
+   referenced object (and so should not be ignored by pg_dump), but
+   cannot function without it and should be dropped when the
+   extension itself is. The dependent object may be 

[HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*

2016-03-23 Thread Regina Obe
In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev
started failing.  I traced the issue down to a behavior change in 9.6 when
dealing with output of set returning functions when used with (func).*
syntax.

Here is an example not involving PostGIS.  Is this an intentional change in
behavior?

CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text)
RETURNS TABLE(id integer, junk1 text, junk2 text)
AS
$$
BEGIN
 RETURN QUERY SELECT id2 As id, $1 || $2::text As junk1, $1 || id2::text AS
junk2
FROM generate_series(1,2) As id2;
END;

$$
language 'plpgsql';

-- Get 16 rows in 9.6, Get 8 rows in 9.5
SELECT (dumpset(f.test, 'hello world' || f.test)).*
FROM generate_series(1,4) As f(test)
ORDER BY junk2;


I know that functions get called multiple times with (..).* and so it's
frowned upon, but previously the results were only returned once, and I
suspect people who are lazy and don't mind the penalty cost might just
use this syntax.
If it's intentional, I can change the tests to follow the best-practice
approach.

I think the tests started failing around March 8th, which I thought might
have to do with this commit (from around that time):
9118d03a8cca3d97327c56bf89a72e328e454e63
When appropriate, postpone SELECT output expressions till after ORDER
BY.
It is frequently useful for volatile, set-returning, or expensive
functions in a SELECT's targetlist to be postponed till after ORDER BY
and LIMIT are done.

This involved a change in output ordering.

I'm not absolutely sure this is due to that commit, because we had
another test failing (where the order of the result changed, and putting in
an ORDER BY fixed that test).



Thanks,
Regina






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-23 Thread Tom Lane
I'm not terribly happy with the hack I used to get pgbench to be able
to borrow psql's psqlscan.l lexer.  It's a mess for the MSVC build
scripts, and I have seen it causing two concurrent builds of psqlscan.o
in parallel builds, which is likely to cause a build failure some of
the time.  Another unresolved issue is that we can't apply FLEX_NO_BACKUP
testing to both of psql's lexers, because that causes each of those
make steps to want to write lex.backup in the current directory.

I have a modest proposal for improving this: let's move all the code
that's currently shared by two or more src/bin/ subdirectories into a
new directory, say src/feutils, and build it into a ".a" library in
the same way that src/common/ is handled.  This would remove the
following klugy cross-directory links:

src/bin/pgbench/psqlscan.o -> ../../../src/bin/psql/psqlscan.o
src/bin/psql/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
src/bin/psql/keywords.c -> ../../../src/bin/pg_dump/keywords.c
src/bin/scripts/dumputils.c -> ../../../src/bin/pg_dump/dumputils.c
src/bin/scripts/keywords.c -> ../../../src/bin/pg_dump/keywords.c
src/bin/scripts/mbprint.c -> ../../../src/bin/psql/mbprint.c
src/bin/scripts/print.c -> ../../../src/bin/psql/print.c

Note: the reason for a new subdirectory, rather than putting this
stuff into src/common/, is that src/common/ is meant for code that's
shared between frontend and backend, which this stuff is not.  Also,
many of these files depend on libpq which seems like an inappropriate
dependency for src/common.

Having said that, I also notice these dependencies:

src/bin/pg_dump/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
src/bin/psql/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
src/bin/scripts/kwlookup.c -> ../../../src/backend/parser/kwlookup.c
src/interfaces/ecpg/preproc/kwlookup.c -> 
../../../../src/backend/parser/kwlookup.c
src/bin/initdb/encnames.c -> ../../../src/backend/utils/mb/encnames.c
src/interfaces/libpq/encnames.c -> ../../../src/backend/utils/mb/encnames.c

which seem like they'd be better handled by moving those files into
src/common/.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres_fdw join pushdown - getting server crash in left outer join of three table

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 5:24 AM, Rajkumar Raghuwanshi
 wrote:
> Thanks Ashutosh for the patch. I have applied and retested it; I am no
> longer getting the server crash.

Thanks for the report and the testing.  I have committed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-03-23 Thread Tom Lane
Robert Haas  writes:
> I did not find this version very clear.  It wasn't consistent about
> using ObjectIdGetDatum() where needed, but the bigger problem was that
> I found the logic unnecessarily convoluted.  I rewrote it - I believe
> more straightforwardly - as attached.  How does this look?

I'd suggest that we save some code by always doing separate updates for
the commutator and negator entries.  We can handle the corner case where
they're the same by doing a CommandCounterIncrement between the updates,
instead of having convoluted and probably-never-yet-tested logic.

I'm also a bit dubious of the assumption in RemoveOperatorById that an
operator can't be its own negator.  Yeah, that should not be the case,
but if it is the case the deletion will fail outright.

We could resolve both of these issues by changing the semantics of
OprUpdate so that it unconditionally does a CommandCounterIncrement
after each update that it performs.  IMO that would be a lot simpler
and more bulletproof; it'd allow removal of a lot of these
overly-tightly-reasoned cases.
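
For context, a minimal sketch of the dangling-link scenario being fixed
in this thread (the operator names are made up):

CREATE OPERATOR <<< (PROCEDURE = int8lt, LEFTARG = bigint, RIGHTARG = bigint);
CREATE OPERATOR >>> (PROCEDURE = int8gt, LEFTARG = bigint, RIGHTARG = bigint,
                     COMMUTATOR = <<<);  -- also back-links <<<'s oprcom to >>>
DROP OPERATOR >>> (bigint, bigint);
-- without the fix, <<< keeps the dropped operator's OID in oprcom
SELECT oprname, oprcom FROM pg_operator WHERE oprname = '<<<';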

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 8:21 AM, Merlin Moncure  wrote:
> I'm not understanding the objection at all.  You have N client
> sessions connecting to the database that all utilize the same named
> prepared statement.  A typical pattern is for the application to
> prepare them all upon startup, but currently each PREPARE needs to be
> wrapped with an exception handler in case someone else prepared it
> first.  Having an IF NOT EXISTS decoration simplifies this.  This can
> happen both inside and outside of connection pooling scenarios.

I'm walking that back a bit -- this is only interesting in pooler
scenarios, especially pgbouncer where you have no way of knowing if
the statement is created or not.  Of course, you can always re-prepare
them following a discard but that's quite pessimal in many cases.
Still, I've often wanted this exact feature.
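
To spell out the pooling case (a sketch; DISCARD ALL is what pgbouncer
runs by default as its server_reset_query, and the statement name is
hypothetical):

DISCARD ALL;  -- run by the pooler between clients; drops prepared statements
-- the next client cannot tell whether its statements survived; with the
-- proposal it could simply run, race-free:
PREPARE IF NOT EXISTS get_user (int) AS SELECT * FROM users WHERE id = $1;
EXECUTE get_user(42);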

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Michael Meskes
> PS Who uses ecpg? For me it's like hell.

More than you think.

I doubt you want to propose removing features that make developing new
features harder, or do you? :)

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relax requirement for INTO with SELECT in pl/pgsql

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 10:57 AM, Jim Nasby  wrote:
> On 3/22/16 8:37 AM, Merlin Moncure wrote:
>>>
>>> I am afraid of useless and forgotten calls of functions. But the risk is
>>> the same as with PERFORM - so this argument is only half valid. The
>>> PERFORM statement holds special semantics, and it is interesting.
>>
>> I see your point here, but the cost of doing that far outweighs the
>> risks.  And I don't think the arbitrary standard of defining forcing
>> the user to identify if the query should return data is a good way of
>> identifying dead code.
>
> Not to mention that there's tons of other ways to introduce unintended
> inefficiencies. Off the top of my head, declaring variables that are never
> referenced and have no assignment is a big one.

Yes. This comes up most often with dblink for me, mainly because of
the slight wonkiness of the dblink API, but I totally agree this should
never have to be done.  Anyway, I submitted the patch to the next
open commitfest.
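
To illustrate with dblink (a sketch; the connection and table names are
hypothetical):

DO $$
BEGIN
    -- today, discarding dblink_exec's result text requires PERFORM:
    PERFORM dblink_exec('myconn', 'ANALYZE remote_table');
    -- under the proposal, the natural spelling would also be accepted:
    -- SELECT dblink_exec('myconn', 'ANALYZE remote_table');
END;
$$;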

Totally off topic aside: are you in dallas for the next PUG? I
unfortunately missed the last one and will likely miss the next one
too -- I coach the company kickball team and we play on Wednesdays --
oh well.  Maybe beers afterwards?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-23 Thread Jim Nasby

On 3/22/16 7:28 AM, Michael Paquier wrote:

On Mon, Mar 21, 2016 at 7:55 AM, Jim Nasby  wrote:

On 3/17/16 9:01 AM, Robert Haas wrote:


I think that
there are an awful lot of cases where extension authors haven't been
able to quite do what they want to do without core changes because
they couldn't get control in quite the right place; or they could do
it but they had to cut-and-paste a lot of code.


FWIW, I've certainly run into this at least once, maybe twice. The case I
can think of offhand is doing function resolution with variant. I don't
remember the details anymore, but my recollection is that to get what I
needed I would have needed to copy huge swaths of the rewrite code.


Amen, I have been doing that a couple of days ago with some elog stuff.


Any ideas on ways to address this? Adding more hooks in random places 
every time we stumble across something doesn't seem like a good method.


One thing I've wondered about is making it easier to find specific 
constructs in a parsed query so that you can make specific 
modifications. I recall looking at that once and finding a roadblock 
(maybe a bunch of functions were static?)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-03-23 Thread Aleksander Alekseev
> Thanks!  I can't think of anything else to worry about with regards to
> that version, so I have committed it.
> 

Thanks, Robert. And thanks everyone for contributing to this patch.


-- 
Best regards,
Aleksander Alekseev
http://eax.me/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-03-23 Thread Robert Haas
On Tue, Mar 22, 2016 at 9:25 PM, Tomas Vondra
 wrote:
> OK, the new code seems more comprehensible to me.

I did not find this version very clear.  It wasn't consistent about
using ObjectIdGetDatum() where needed, but the bigger problem was that
I found the logic unnecessarily convoluted.  I rewrote it - I believe
more straightforwardly - as attached.  How does this look?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 3cd1899..cd66823 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -54,8 +54,6 @@ static Oid OperatorShellMake(const char *operatorName,
   Oid leftTypeId,
   Oid rightTypeId);
 
-static void OperatorUpd(Oid baseId, Oid commId, Oid negId);
-
 static Oid get_other_operator(List *otherOp,
    Oid otherLeftTypeId, Oid otherRightTypeId,
    const char *operatorName, Oid operatorNamespace,
@@ -566,7 +564,7 @@ OperatorCreate(const char *operatorName,
 		commutatorId = operatorObjectId;
 
 	if (OidIsValid(commutatorId) || OidIsValid(negatorId))
-		OperatorUpd(operatorObjectId, commutatorId, negatorId);
+		OperatorUpd(operatorObjectId, commutatorId, negatorId, false);
 
 	return address;
 }
@@ -639,125 +637,158 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
  * OperatorUpd
  *
  *	For a given operator, look up its negator and commutator operators.
- *	If they are defined, but their negator and commutator fields
- *	(respectively) are empty, then use the new operator for neg or comm.
- *	This solves a problem for users who need to insert two new operators
- *	which are the negator or commutator of each other.
+ *	When isDelete is false, update their negator and commutator operators to
+ *	point back to the given operator; when isDelete is true, update those
+ *	operators to no longer point back to the given operator.
+ *
+ *	The !isDelete case solves a problem for users who need to insert two new
+ *  operators which are the negator or commutator of each other, while the
+ *  isDelete case is important so as not to leave dangling OID links behind
+ *  after dropping an operator.
  */
-static void
-OperatorUpd(Oid baseId, Oid commId, Oid negId)
+void
+OperatorUpd(Oid baseId, Oid commId, Oid negId, bool isDelete)
 {
-	int			i;
 	Relation	pg_operator_desc;
-	HeapTuple	tup;
-	bool		nulls[Natts_pg_operator];
-	bool		replaces[Natts_pg_operator];
-	Datum		values[Natts_pg_operator];
-
-	for (i = 0; i < Natts_pg_operator; ++i)
-	{
-		values[i] = (Datum) 0;
-		replaces[i] = false;
-		nulls[i] = false;
-	}
+	HeapTuple	tup = NULL;
 
 	/*
-	 * check and update the commutator & negator, if necessary
-	 *
-	 * We need a CommandCounterIncrement here in case of a self-commutator
-	 * operator: we'll need to update the tuple that we just inserted.
+	 * If we're making an operator into its own commutator, then we need a
+	 * command-counter increment here, since we've just inserted the tuple
+	 * we're about to update.  But when we're dropping an operator, we can
+	 * skip this.
 	 */
-	CommandCounterIncrement();
+	if (!isDelete)
+		CommandCounterIncrement();
 
+	/* Open the relation. */
 	pg_operator_desc = heap_open(OperatorRelationId, RowExclusiveLock);
 
-	tup = SearchSysCacheCopy1(OPEROID, ObjectIdGetDatum(commId));
+	/* Get a copy of the commutator's tuple. */
+	if (OidIsValid(commId))
+		tup = SearchSysCacheCopy1(OPEROID, ObjectIdGetDatum(commId));
 
-	/*
-	 * if the commutator and negator are the same operator, do one update. XXX
-	 * this is probably useless code --- I doubt it ever makes sense for
-	 * commutator and negator to be the same thing...
-	 */
-	if (commId == negId)
+	/* Update the commutator's tuple if need be. */
+	if (HeapTupleIsValid(tup))
 	{
-		if (HeapTupleIsValid(tup))
+		Form_pg_operator t = (Form_pg_operator) GETSTRUCT(tup);
+		bool		update_commutator = false;
+		bool		nulls[Natts_pg_operator];
+		bool		replaces[Natts_pg_operator];
+		Datum		values[Natts_pg_operator];
+
+		memset(values, 0, sizeof(values));
+		memset(replaces, 0, sizeof(replaces));
+		memset(nulls, 0, sizeof(nulls));
+
+		/*
+		 * Out of due caution, we only change the commutator's oprcom field
+		 * if it has the exact value we expected: InvalidOid when creating an
+		 * operator, and baseId when dropping one.
+		 */
+		if (isDelete && t->oprcom == baseId)
 		{
-			Form_pg_operator t = (Form_pg_operator) GETSTRUCT(tup);
+			values[Anum_pg_operator_oprcom - 1] = ObjectIdGetDatum(InvalidOid);
+			replaces[Anum_pg_operator_oprcom - 1] = true;
+			update_commutator = true;
+		}
+		else if (!isDelete && !OidIsValid(t->oprcom))
+		{
+			values[Anum_pg_operator_oprcom - 1] = ObjectIdGetDatum(baseId);
+			replaces[Anum_pg_operator_oprcom - 1] = true;
+			update_commutator = true;
+		}
 
-			if (!OidIsValid(t->oprcom) || !OidIsValid(t->oprnegate))
+		/*
+		 * If the commutator is 

Re: [HACKERS] Relax requirement for INTO with SELECT in pl/pgsql

2016-03-23 Thread Jim Nasby

On 3/22/16 8:37 AM, Merlin Moncure wrote:

> I am afraid of useless and forgotten calls of functions. But the risk is
> the same as with PERFORM - so this argument is only half valid. The PERFORM
> statement holds special semantics, and it is interesting.

I see your point here, but the cost of doing that far outweighs the
risks.  And I don't think the arbitrary standard of defining forcing
the user to identify if the query should return data is a good way of
identifying dead code.


Not to mention that there's tons of other ways to introduce unintended 
inefficiencies. Off the top of my head, declaring variables that are 
never referenced and have no assignment is a big one.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] README for src/backend/replication/logical

2016-03-23 Thread Alvaro Herrera
Craig Ringer wrote:
> Hi all
> 
> As part of some internal training/discussion I wrote some notes on logical
> decoding's structure and flow that seemed worth polishing up into a draft
> README for src/backend/replication/logical .

That's nice, but you attached the one in src/backend/replication
instead.


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Yury Zhuravlev

Fabrízio de Royes Mello wrote:

I got an error when building this patch.


I fixed it! All tests pass (including the ecpg tests).
The patch is attached.

Thanks. 

PS Who uses ecpg? For me it's like hell.


--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
index dbce8f2..c52879f 100644
--- a/doc/src/sgml/ref/prepare.sgml
+++ b/doc/src/sgml/ref/prepare.sgml
@@ -26,7 +26,7 @@ PostgreSQL documentation
 
  
 
-PREPARE name [ ( data_type [, ...] ) ] AS statement
+PREPARE [ IF NOT EXISTS ] name [ ( data_type [, ...] ) ] AS statement
 
  
 
@@ -86,6 +86,15 @@ PREPARE name [ ( 

+IF NOT EXISTS
+
+ 
+  Do not throw an error if a prepared statement with the same name already exists.
+ 
+
+   
+
+   
 name
 
  
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index cec37ce..019330f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -59,6 +59,7 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString)
 	int			nargs;
 	Query	   *query;
 	List	   *query_list;
+	bool		found;
 	int			i;
 
 	/*
@@ -70,6 +71,30 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString)
 (errcode(ERRCODE_INVALID_PSTATEMENT_DEFINITION),
  errmsg("invalid statement name: must not be empty")));
 
+	/* Find entry in hash table */
+	if(prepared_queries)
+	{
+		hash_search(prepared_queries,
+	stmt->name,
+	HASH_FIND,
+	);
+
+		/* Complain about, or skip, a duplicate entry per IF NOT EXISTS */
+		if (found && stmt->if_not_exists)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_PSTATEMENT),
+	 errmsg("prepared statement \"%s\" already exists, skipping",
+			stmt->name)));
+			return;
+		}
+		else if (found && !stmt->if_not_exists)
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_PSTATEMENT),
+	 errmsg("prepared statement \"%s\" already exists",
+			stmt->name)));
+	}
+
 	/*
 	 * Create the CachedPlanSource before we do parse analysis, since it needs
 	 * to see the unmodified raw parse tree.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..be8ac78 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4021,6 +4021,7 @@ _copyPrepareStmt(const PrepareStmt *from)
 	COPY_STRING_FIELD(name);
 	COPY_NODE_FIELD(argtypes);
 	COPY_NODE_FIELD(query);
+	COPY_SCALAR_FIELD(if_not_exists);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..fbd248b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2017,6 +2017,7 @@ _equalPrepareStmt(const PrepareStmt *a, const PrepareStmt *b)
 	COMPARE_STRING_FIELD(name);
 	COMPARE_NODE_FIELD(argtypes);
 	COMPARE_NODE_FIELD(query);
+	COMPARE_SCALAR_FIELD(if_not_exists);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b9aeb31..e08d95f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9443,6 +9443,16 @@ PrepareStmt: PREPARE name prep_type_clause AS PreparableStmt
 	n->name = $2;
 	n->argtypes = $3;
 	n->query = $5;
+	n->if_not_exists = false;
+	$$ = (Node *) n;
+}
+			| PREPARE IF_P NOT EXISTS name prep_type_clause AS PreparableStmt
+{
+	PrepareStmt *n = makeNode(PrepareStmt);
+	n->name = $5;
+	n->argtypes = $6;
+	n->query = $8;
+	n->if_not_exists = true;
 	$$ = (Node *) n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..f08dee4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2986,6 +2986,7 @@ typedef struct PrepareStmt
 	char	   *name;			/* Name of plan, arbitrary */
 	List	   *argtypes;		/* Types of parameters (List of TypeName) */
 	Node	   *query;			/* The query itself (as a raw parsetree) */
+	bool	   if_not_exists;	/* true if IF NOT EXISTS was specified */
 } PrepareStmt;
 
 
diff --git a/src/interfaces/ecpg/preproc/check_rules.pl b/src/interfaces/ecpg/preproc/check_rules.pl
index 4c981e0..cba9a74 100644
--- a/src/interfaces/ecpg/preproc/check_rules.pl
+++ b/src/interfaces/ecpg/preproc/check_rules.pl
@@ -43,7 +43,9 @@ my %replace_line = (
 	  => 'CREATE OptTemp TABLE create_as_target AS EXECUTE prepared_name execute_param_clause',
 
 	'PrepareStmtPREPAREnameprep_type_clauseASPreparableStmt' =>
-	  'PREPARE prepared_name prep_type_clause AS PreparableStmt');
+	  'PREPARE prepared_name prep_type_clause AS PreparableStmt',
+	'PrepareStmtPREPAREIF_PNOTEXISTSnameprep_type_clauseASPreparableStmt' =>
+	  'PREPARE IF_P NOT EXISTS prepared_name prep_type_clause AS PreparableStmt');
 
 my $block= '';
 my $yaccmode = 0;
diff --git a/src/interfaces/ecpg/preproc/ecpg.addons b/src/interfaces/ecpg/preproc/ecpg.addons
index b3b36cf..c2742cf 100644
--- a/src/interfaces/ecpg/preproc/ecpg.addons
+++ b/src/interfaces/ecpg/preproc/ecpg.addons
@@ -283,6 +283,12 @@ ECPG: 

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-23 Thread Alexandr Popov



On 18.03.2016 20:19, Anastasia Lubennikova wrote:
Please, find the new version of the patch attached. Now it has WAL 
functionality.


Detailed description of the feature you can find in README draft 
https://goo.gl/50O8Q0


This patch is pretty complicated, so I ask everyone who is interested in
this feature to help with reviewing and testing it. I will be grateful for
any feedback.
But please, don't complain about code style, it is still work in 
progress.


Next things I'm going to do:
1. More debugging and testing. I'm going to attach in next message 
couple of sql scripts for testing.

2. Fix NULLs processing
3. Add a flag into pg_index, that allows to enable/disable compression 
for each particular index.
4. Recheck locking considerations. I tried to write code as less 
invasive as possible, but we need to make sure that algorithm is still 
correct.

5. Change BTMaxItemSize
6. Bring back microvacuum functionality.




Hi, hackers.

It's my first review, so please don't be too strict with me.

I have tested this patch on the following table:
create table message
(
    id      serial,
    usr_id  integer,
    text    text
);
CREATE INDEX message_usr_id ON message (usr_id);
The table has 1000 records.

I found the following:
the fewer the unique keys, the smaller the index.

The next two tables demonstrate it.
New B-tree
Count of unique keys (usr_id); index's size; time of creation
1000;"214 MB";"00:00:34.193441"
333  ;"214 MB";"00:00:45.731173"
200  ;"129 MB";"00:00:41.445876"
100  ;"129 MB";"00:00:38.455616"
10;"86 MB"  ;"00:00:40.887626"
1  ;"79 MB"  ;"00:00:47.199774"

Old B-tree
Count of unique keys (usr_id), index“s size , time of creation
1000;"214 MB";"00:00:35.043677"
333  ;"286 MB";"00:00:40.922845"
200  ;"300 MB";"00:00:46.454846"
100  ;"278 MB";"00:00:42.323525"
10;"287 MB";"00:00:47.438132"
1  ;"280 MB";"00:01:00.307873"

I inserted data randomly and sequentially; it did not influence the
index's size.
The time to select, insert, and update random rows is unchanged. That is
great, but it certainly needs some more detailed study.
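
A sketch of how this kind of measurement can be reproduced (the exact row
count is an assumption on my part; 333 matches the second key spread in
the tables above):

INSERT INTO message (usr_id, text)
SELECT trunc(random() * 333)::int, md5(g.i::text)
FROM generate_series(1, 10000000) AS g(i);
SELECT pg_size_pretty(pg_relation_size('message_usr_id'));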


Alexander Popov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 5:49 AM, Aleksander Alekseev
 wrote:
> I still believe that optimizing 1% blindly without considering possible
> side effects this optimization can bring (other data alignment, some
> additional machine instructions - just to name a few) and having no
> way to measure these side effects is a bad idea. But I also admit a
> possibility that I can somehow be wrong on this. So I rewrote this
> patch one again :'( the way you suggested (without that alignment
> related hack I tried, it's just too ugly). I also attached original
> hashhdr-rmh.patch just to have both patches in one message so it would
> be easier to find both patches in this thread.

Thanks!  I can't think of anything else to worry about with regards to
that version, so I have committed it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Add SCTP network protocol to postgresql backend and frontend

2016-03-23 Thread Eduardo Morras
On Wed, 23 Mar 2016 10:13:42 -0300
Alvaro Herrera  wrote:

> Andreas Karlsson escribió:
> > On 03/23/2016 01:55 PM, Eduardo Morras wrote:
> > >Benefits:
> > >
> > >Dynamic multihoming, modifiable at run time, don't need aggregate
> > >links at OS level or shutdown servers/clients for a hardware or
> > >topology network change. Message oriented connection. Message
> > >reliability. Immune to SYN floods that affect TCP.
> > >Asymmetric multihoming, a client with 4 links (3x 1GbEth + wifi)
> > >can connect to a server with 1 link (10GbEth). Metadata connection
> > >messages.
> > 
> > While SCTP has some nice advantages in general (I think it is a
> > pity it is not used more) I wonder how well these benefits
> > translate into the database space. Many databases are run either in
> > a controlled server environment with no direct access from the
> > Internet, or locally on the same machine as the application. In
> > those environments you generally do not have to worry about SYN
> > floods or asymmetric links.
> 
> That might or might not be the most common cases, but replication
> across the ocean and similar long-range setups are a reality today
> and their use will only increase.
> 
> I wonder about message ordering.  Is it possible to get messages out
> of order in SCTP?  Say if you have an ordered resultset stream from
> the server, it would be disastrous to get the data messages out of
> order.

Message ordering is optional; the server decides whether clients consume messages 
out of order, as received, or strictly in the same order as sent.

 
> -- 
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

---   ---
Eduardo Morras 




Re: [HACKERS] Add something in struct PGPROC

2016-03-23 Thread Aleksander Alekseev
> Hi
> I just added something to struct PGPROC, such as "int level", but I
> can't see what I added in MyProc while debugging. That's confusing.
> 
> 
> zhangxiaojian

I think you forgot to run `make clean` after changing proc.h. When you
change header files, dependent files are not recompiled automatically.
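
A typical recovery sequence from the top of the source tree is sketched
below; configuring with --enable-depend also makes later rebuilds track
header dependencies automatically:

make clean      # discard objects compiled against the old proc.h
make            # rebuild everything, including files that include proc.h
make install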

-- 
Best regards,
Aleksander Alekseev
http://eax.me/




Re: [HACKERS] [PROPOSAL] Add SCTP network protocol to postgresql backend and frontend

2016-03-23 Thread Eduardo Morras
On Wed, 23 Mar 2016 14:03:31 +0100
Andreas Karlsson  wrote:

> On 03/23/2016 01:55 PM, Eduardo Morras wrote:
> > Benefits:
> >
> > Dynamic multihoming, modifiable at run time, don't need aggregate
> > links at OS level or shutdown servers/clients for a hardware or
> > topology network change. Message oriented connection. Message
> > reliability. Immune to SYN floods that affect TCP.
> > Asymmetric multihoming, a client with 4 links (3x 1GbEth + wifi) can
> > connect to a server with 1 link (10GbEth). Metadata connection
> > messages.
> 
> While SCTP has some nice advantages in general (I think it is a pity
> it is not used more) I wonder how well these benefits translate into
> the database space. Many databases are run either in a controlled
> server environment with no direct access from the Internet, or
> locally on the same machine as the application. In those environments
> you generally do not have to worry about SYN floods or asymmetric
> links.
> 
> Do you have any specific use case in mind?

The main use case is changing the network topology on the fly, without shutting 
down the PostgreSQL server, PostgreSQL middleware, or any of the applications 
that use it through libpq. 

Specific use case: here "backup" means an OS-level backup server or pg_dump, not 
a PostgreSQL slave (hope the diagrams don't wrap):

backup <-> postgresql <-> middleware <-> client apps <-> backup

At peak times you need all NICs connected between the PostgreSQL servers, the 
middleware, and the client apps:

backup <-> postgresql <=> middleware <=> client apps <-> backup

At night, at idle times, or during a backup, you can reassign the NICs to get 
more network bandwidth to the backup server:

backup <=> postgresql <-> middleware <-> client apps <=> backup

On a crash restore, all NICs are used from the backup to the servers:

backup <=> postgresql < > middleware < > client apps <=> backup
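
A rough sketch of the runtime NIC reassignment this scenario relies on, using 
the lksctp sctp_bindx() call on a live socket (addressing and error handling 
simplified):

#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/sctp.h>       /* lksctp-tools */

/* Add or remove one IPv4 address on an existing SCTP socket, so its
 * associations can start or stop using that NIC without a restart. */
static int
retarget_nic(int sd, const char *ip, int port, int add)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    return sctp_bindx(sd, (struct sockaddr *) &sa, 1,
                      add ? SCTP_BINDX_ADD_ADDR : SCTP_BINDX_REM_ADDR);
}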

> Andreas


---   ---
Eduardo Morras 




Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 8:43 AM, Michael Paquier
 wrote:
> On Wed, Mar 23, 2016 at 8:39 PM, Robert Haas  wrote:
>> On Wed, Mar 23, 2016 at 7:34 AM, Thomas Munro
>>  wrote:
>>> synchronous_commit    TPS
>>> ------------------   -----
>>> off                   9234
>>> local                 1223
>>> remote_write           907
>>> on                     587
>>> remote_apply           555
>>>
>>> synchronous_commit    TPS
>>> ------------------   -----
>>> off                   3937
>>> local                 1984
>>> remote_write          1701
>>> on                    1373
>>> remote_apply          1349
>>
>> Hmm, so "remote_apply" is barely more expensive than "on".  That's 
>> encouraging.
>
> Indeed, interesting. This is perhaps proving that just having the
> possibility to have remote_apply (with feedback messages managed by
> signals, which is the best thing proposed for this release) is what we
> need to ensure the consistency of reads across nodes, and what would
> satisfy most of the user's requirements. Getting a slightly lower TPS
> may be worth the cost for some users if they can ensure that reads
> across nodes are accessible after a local commit, and there is no need
> to have an error management layer at application level to take care of
> errors caused by causal read timeouts.

Well, I wouldn't go that far.  It seems pretty clear that remote_apply
by itself is useful - I can't imagine anybody seriously arguing the
contrary, whatever they think of this implementation.  My view,
though, is that by itself that's pretty limiting: you can only have
one standby, and if that standby falls over then you lose
availability.  Causal reads fixes both of those problems - admittedly
that requires some knowledge in the application or the pooler, but
it's no worse than SSI in that regard.  Still, half a loaf is better
than none, and I imagine even just getting remote_apply would make a
few people quite happy.
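
For anyone who wants to try the remote_apply numbers above with this patch
applied, the primary-side settings are roughly as follows (the standby name
is an assumption; use the standby's application_name):

# postgresql.conf on the primary
synchronous_commit = remote_apply        # wait until the standby has applied the commit
synchronous_standby_names = 'standby1'   # application_name of the standby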

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Add SCTP network protocol to postgresql backend and frontend

2016-03-23 Thread Andreas Karlsson

On 03/23/2016 02:13 PM, Alvaro Herrera wrote:

Andreas Karlsson escribió:

On 03/23/2016 01:55 PM, Eduardo Morras wrote:

Benefits:

Dynamic multihoming, modifiable at run time, don't need aggregate links at OS 
level or shutdown servers/clients for a hardware or topology network change.
Message oriented connection.
Message reliability.
Immune to SYN floods that affect TCP.
Asymmetric multihoming, a client with 4 links (3x 1GbEth + wifi) can connect to 
a server with 1 link (10GbEth).
Metadata connection messages.


While SCTP has some nice advantages in general (I think it is a pity it is
not used more) I wonder how well these benefits translate into the database
space. Many databases are run either in a controlled server environment with
no direct access from the Internet, or locally on the same machine as the
application. In those environments you generally do not have to worry about
SYN floods or asymmetric links.


That might or might not be the most common cases, but replication across
the ocean and similar long-range setups are a reality today and their use
will only increase.


Agreed. When I reread my message I realized that I implied things I did 
not mean. People run databases today in the cloud and, as you said, long 
distance replication will only get more common. What I am actually 
curious about is how the advantages of SCTP translate into the database 
space.



I wonder about message ordering.  Is it possible to get messages out of
order in SCTP?  Say if you have an ordered resultset stream from the
server, it would be disastrous to get the data messages out of order.


Message ordering is an optional feature in SCTP, so if you need message 
ordering you can get it.
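
As a sketch (lksctp API; socket and buffer names are placeholders), ordering 
is chosen per message at send time, so a protocol could keep a result-set 
stream strictly ordered while letting independent notices travel unordered:

#include <netinet/sctp.h>

/* ordered message on stream 0: delivered strictly in send order */
sctp_sendmsg(sd, row_buf, row_len, NULL, 0,
             0 /* ppid */, 0 /* flags */, 0 /* stream */, 0, 0);

/* unordered message: may be delivered as soon as it arrives */
sctp_sendmsg(sd, notice_buf, notice_len, NULL, 0,
             0 /* ppid */, SCTP_UNORDERED, 0, 0, 0);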


Andreas




[HACKERS] Add something in struct PGPROC

2016-03-23 Thread A student from china
Hi
I just added something to struct PGPROC, such as "int level", but I can't 
see what I added in MyProc while debugging. That's confusing.


zhangxiaojian

Re: [HACKERS] Bug in searching path in jsonb_set when walking through JSONB array

2016-03-23 Thread Tom Lane
Michael Paquier  writes:
> That's ugly. We should actually use TextDatumGetCString because the
> index is stored as text here via a Datum, and then it is converted
> back to an integer. So I propose instead the simple patch attached
> that fixes the failure for me. Could you check if that works for you?

Yeah, this seems better.  But that code needs review anyway, as it's using
elog() for user-facing error conditions, and I'm thinking the error texts
could use a bit of attention from a native English speaker.
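
A sketch of the shape of that fix (variable names hypothetical):

/* The path element arrives as a text Datum, so decode it with
 * TextDatumGetCString() before parsing it as an array subscript. */
char   *indextext = TextDatumGetCString(path_elems[level]);
char   *badp;
long    lindex = strtol(indextext, &badp, 10);

if (badp == indextext || *badp != '\0')
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
             errmsg("path element at position %d is not an integer: \"%s\"",
                    level + 1, indextext)));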

regards, tom lane




Re: [HACKERS] [PROPOSAL] Add SCTP network protocol to postgresql backend and frontend

2016-03-23 Thread Alvaro Herrera
Andreas Karlsson escribió:
> On 03/23/2016 01:55 PM, Eduardo Morras wrote:
> >Benefits:
> >
> >Dynamic multihoming, modifiable at run time, don't need aggregate links at 
> >OS level or shutdown servers/clients for a hardware or topology network 
> >change.
> >Message oriented connection.
> >Message reliability.
> >Immune to SYN floods that affect TCP.
> >Asymmetric multihoming, a client with 4 links (3x 1GbEth + wifi) can connect 
> >to a server with 1 link (10GbEth).
> >Metadata connection messages.
> 
> While SCTP has some nice advantages in general (I think it is a pity it is
> not used more) I wonder how well these benefits translate into the database
> space. Many databases are run either in a controlled server environment with
> no direct access from the Internet, or locally on the same machine as the
> application. In those environments you generally do not have to worry about
> SYN floods or asymmetric links.

That might or might not be the most common cases, but replication across
the ocean and similar long-range setups are a reality today and their use
will only increase.

I wonder about message ordering.  Is it possible to get messages out of
order in SCTP?  Say if you have an ordered resultset stream from the
server, it would be disastrous to get the data messages out of order.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: Generic WAL logical messages

2016-03-23 Thread Alvaro Herrera
Petr Jelinek wrote:

> +++ b/contrib/test_decoding/sql/messages.sql
> @@ -0,0 +1,17 @@
> +-- predictability
> +SET synchronous_commit = on;
> +
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
> 'test_decoding');
> +
> +SELECT 'msg1' FROM pg_logical_emit_message(true, 'test', 'msg1');
> +SELECT 'msg2' FROM pg_logical_emit_message(false, 'test', 'msg2');
> +
> +BEGIN;
> +SELECT 'msg3' FROM pg_logical_emit_message(true, 'test', 'msg3');
> +SELECT 'msg4' FROM pg_logical_emit_message(false, 'test', 'msg4');
> +SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', 'msg5');
> +COMMIT;
> +
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
> 'force-binary', '0', 'skip-empty-xacts', '1');
> +
> +SELECT 'init' FROM pg_drop_replication_slot('regression_slot');

No tests for a rolled back transaction?
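
Something along these lines would cover it; the transactional message must
not appear in the decoded stream after the rollback:

BEGIN;
SELECT 'msg6' FROM pg_logical_emit_message(true, 'test', 'msg6-rollback');
ROLLBACK;

SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
    'force-binary', '0', 'skip-empty-xacts', '1');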

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-23 Thread Tomas Vondra

On 03/23/2016 06:20 AM, Tatsuo Ishii wrote:

I am now looking into the CREATE STATISTICS doc to see if the examples
appearing in it are working. I will get back if I find any problems.


I have read the ref doc for CREATE STATISTICS.

There are nice examples of how the multivariate statistics give better
row-count estimation. So I gave them a try.

"Create table t1 with two functionally dependent columns,
 i.e. knowledge of a value in the first column is sufficient for
 determining the value in the other column." The example creates table
 "t1", then populates it using generate_series, followed by CREATE
 STATISTICS, ANALYZE, and EXPLAIN. I expected the EXPLAIN to demonstrate
 how the row-count estimation is enhanced by using the multivariate
 statistics.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
QUERY PLAN
---
 Seq Scan on t1  (cost=0.00..19425.00 rows=98 width=8) (actual 
time=76.876..76.876 rows=0 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 100
 Planning time: 0.146 ms
 Execution time: 76.896 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
QUERY PLAN
--
 Seq Scan on t1  (cost=0.00..19425.00 rows=1 width=8) (actual 
time=78.867..78.867 rows=0 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 100
 Planning time: 0.102 ms
 Execution time: 78.885 ms
(5 rows)

It seems the row-count estimation (98) using the multivariate
statistics is actually *worse* than the one (1) not using the
statistics, because the actual row count is 0.


Yes, there's a mistake in the first query, because the conditions 
actually are not compatible. I.e. (i/100)=1 and (i/500)=1 have no 
overlapping rows, clearly. It should be


EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);

instead. Will fix.
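
For reference, the doc example builds t1 roughly as below; the CREATE
STATISTICS line is an approximation of this patch version's syntax, so check
the patch's documentation for the exact form:

CREATE TABLE t1 (a int, b int);
INSERT INTO t1 SELECT i/100, i/500
  FROM generate_series(1, 1000000) s(i);   -- a functionally determines b
CREATE STATISTICS s1 ON (a, b) FROM t1;    -- approximate syntax
ANALYZE t1;
EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);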



The next example (using table "t2") is much better than the t1 case.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
   QUERY PLAN

 Seq Scan on t2  (cost=0.00..19425.00 rows=9633 width=8) (actual 
time=0.012..75.350 rows=1 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 99
 Planning time: 0.107 ms
 Execution time: 75.680 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
  QUERY PLAN
--
 Seq Scan on t2  (cost=0.00..19425.00 rows=91 width=8) (actual 
time=0.008..76.614 rows=1 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 99
 Planning time: 0.067 ms
 Execution time: 76.935 ms
(5 rows)

This time it seems the row-count estimation (9633) using the
multivariate statistics is much better than the one (91) not using the
statistics, because the actual row count is 1.

The last example (using table "t3") seems to show no effect from the multivariate statistics.


Yes. There's a typo in the example - it analyzes the wrong table (t2 
instead of t3). Once I fix that, the estimates are much better.



In summary, the only case which shows the effect of the multivariate
statistics is the "t2" case. So I don't see why other examples are
shown in the manual. Am I missing something?


No, thanks for spotting those mistakes. I'll fix them and submit a new 
version of the patch - either later today or perhaps tomorrow.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] NOT EXIST for PREPARE

2016-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2016 at 7:27 AM, Craig Ringer  wrote:
> With PREPARE IF NOT EXISTS the client is also paying parse and network
> overhead for every time you send that statement. Much better not to send it
> repeatedly in the first place.

How did you determine that?  The client prepares the statement exactly
once.  The problem is there's no easy way to determine if someone else
prepared it first and this patch directly addresses that.

> I think we need to take a step back here and better define the problem
> before stepping in with a proposed solution. Something that avoids the need
> to spam the server with endless copies of the same PREPARE statements would
> be good.

I'm not understanding the objection at all.  You have N client
sessions connecting to the database that all utilize the same named
prepared statement.  A typical pattern is for the application to
prepare them all upon startup, but currently each PREPARE needs to be
wrapped with an exception handler in case someone else prepared it
first.  Having an IF NOT EXISTS decoration simplifies this.  This can
happen both inside and outside of connection pooling scenarios.
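
For context, the exception-handler dance that IF NOT EXISTS would eliminate
looks roughly like this today (statement and table names hypothetical):

DO $do$
BEGIN
    EXECUTE 'PREPARE get_contact(int) AS
             SELECT * FROM contacts WHERE id = $1';
EXCEPTION
    WHEN duplicate_prepared_statement THEN
        NULL;   -- someone already prepared it in this session
END
$do$;

The EXCEPTION block also sets up a subtransaction on every call, which is one
more reason a built-in IF NOT EXISTS would be cheaper.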

merlin




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-23 Thread Thom Brown
On 22 March 2016 at 02:30, Etsuro Fujita  wrote:
> On 2016/03/19 3:30, Robert Haas wrote:
>>
>> On Fri, Mar 18, 2016 at 5:15 AM, Etsuro Fujita
>>  wrote:
>>>
>>> Attached is the updated version of the patch.

I've noticed that you now can't cancel a query if there's DML pushdown
to a foreign server.  This previously worked when individual statements
were being sent, as the query could be interrupted and rolled back.

Here's what the local server sees when trying to cancel:

# DELETE FROM remote.contacts;
^CCancel request sent
DELETE 500

This should probably be fixed.

Thom



