Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Gavin Flower

On 16/08/16 16:19, Andrew Dunstan wrote:



On 08/15/2016 02:23 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 08/15/2016 10:19 AM, Tom Lane wrote:

Andrew Dunstan  writes:
We should probably specify -bext='/', which would cause the backup files
to be deleted unless an error occurred.

Really?  That seems a bit magic, and it's certainly undocumented.

We must be using different versions.

Hmm ... I'm using the RHEL6 version, which claims to be v20090616,
which is what pgindent/README says to use.



Ah. I have Fedora 22's v20140711. 2009 seems a bit ancient :-) Anyway, 
what you've done seems fine.


cheers

andrew



A vintage year, but for the best taste, it has to have been aged on Oak!  :-)






Re: [HACKERS] patch proposal

2016-08-15 Thread Venkata B Nagothi
On Tue, Aug 16, 2016 at 2:50 AM, David Steele  wrote:

> On 8/15/16 2:33 AM, Venkata B Nagothi wrote:
>
> > During the recovery process, It would be nice if PostgreSQL generates an
> > error by aborting the recovery process (instead of starting-up the
> > cluster) if the intended recovery target point is not reached and give
> > an option to DBA to resume the recovery process from where it exactly
> > stopped.
>
> Thom wrote a patch [1] recently that gives warnings in this case.  You
> might want to have a look at that first.
>

That is good to know. Yes, that patch is about generating more meaningful
output messages for the recovery process, which makes sense.


> > The issue here is, if by any chance, the required WALs are not available
> > or if there is any WAL missing or corrupted at the restore_command
> > location, then PostgreSQL recovers until the end of the last available
> > WAL and starts-up the cluster.
>
> You can use pause_at_recovery_target/recovery_target_action (depending
> on your version) to prevent promotion.  That would work for your stated
> scenario but not for the scenario where replay starts (or the database
> reaches consistency) after the recovery target.
>

The above parameters can be configured to pause, shut down or prevent
promotion only after the recovery target point has been reached.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (i.e. a half-complete or incomplete recovery) and there
are lots of WALs still pending to be replayed - in this situation,
PostgreSQL just completes the archive recovery up to the end of the last
available WAL (WAL file "0001001E" in my case) and
starts up the cluster, merely logging an error message (saying
"0001001F" not found).

Note: I am testing in PostgreSQL-9.5

LOG:  restored log file "0001001E" from archive
cp: cannot stat ‘/data/pgrestore9531/0001001F’: No such
file or directory
LOG:  redo done at 0/1EFFDBB8
LOG:  last completed transaction was at log time 2016-08-15
11:04:26.795902+10
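
As a side note, the point that replay actually reached can still be checked
after such a startup with the standard recovery information functions (a
sketch using the 9.5 function names, not output from my test run):

SELECT pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp();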


I have used the following recovery* parameters in the recovery.conf file
here and have intentionally not supplied all the WAL archives needed for
the recovery process to reach the target xid.

recovery_target_xid = ,
recovery_target_inclusive = true
recovery_target_action = pause


It would be nice if PostgreSQL paused the recovery when it cannot be
completed (because of missing or corrupt WAL), shut down the cluster, and
allowed the DBA to restart the replay of the remaining WAL archive files to
continue recovery (from where it stopped previously) until the recovery
target point is reached.
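
(In other words, I am after roughly the effect of

    recovery_target_action = 'shutdown'

but triggered when the archive runs out of WAL short of the target, rather
than only after the target has been reached.)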

Regards,
Venkata B N

Fujitsu Australia


Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-15 Thread Rushabh Lathia
On Mon, Aug 15, 2016 at 6:02 PM, Robert Haas  wrote:

> On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila 
> wrote:
> > AFAICS, your patch seems to be the right fix for this issue, unless we
> > need the instrumentation information during execution (other than for
> > explain) for some purpose.
>
> Hmm, I disagree.  It should be the job of
> ExecParallelRetrieveInstrumentation to allocate its data in the
> correct context, not the responsibility of nodeGather.c to work around
> the fact that it doesn't.  The worker instrumentation should be
> allocated in the same context as the regular instrumentation
> information, which I assume is probably the per-query context.
>

I agree, this makes sense.

Here is a patch that allocates the worker instrumentation in the same context
as the regular instrumentation, which is the per-query context.

PFA patch.


--
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 380d743..5aa6f02 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -500,6 +500,7 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	int			n;
 	int			ibytes;
 	int			plan_node_id = planstate->plan->plan_node_id;
+	MemoryContext oldcontext;
 
 	/* Find the instumentation for this node. */
 	for (i = 0; i < instrumentation->num_plan_nodes; ++i)
@@ -514,10 +515,19 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	for (n = 0; n < instrumentation->num_workers; ++n)
 		InstrAggNode(planstate->instrument, &instrument[n]);
 
-	/* Also store the per-worker detail. */
+	/*
+	 * Also store the per-worker detail.
+	 *
+	 * Worker instrumentation should be allocated in the same context as
+	 * the regular instrumentation information, which is the per-query
+	 * context. Switch into per-query memory context.
+	 */
+	oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
 	ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
 	planstate->worker_instrument =
 		palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
+	MemoryContextSwitchTo(oldcontext);
+
 	planstate->worker_instrument->num_workers = instrumentation->num_workers;
 	memcpy(&planstate->worker_instrument->instrument, instrument, ibytes);
 



Re: [HACKERS] Bug in to_timestamp().

2016-08-15 Thread amul sul
Monday, 15 August 2016 9:58 PM, Robert Haas  wrote:

>On Mon, Aug 15, 2016 at 10:56 AM, amul sul  wrote:
>> On Thursday, 11 August 2016 3:18 PM, Artur Zakirov
>>  wrote:


[Skipped..]
>Well, what's the Oracle behavior in any of these cases?  I don't think
>we can decide to change any of this behavior without knowing that.  If
>a proposed behavior change is incompatible with our previous releases,
>I think it'd better at least be more compatible with Oracle.
>Otherwise, we're just changing from an established behavior that we
>invented ourselves to a new behavior we invented ourselves, which is
>only worthwhile if it's absolutely clear that the new behavior is way
>better.

The following cases work as expected on Oracle (except the 3rd one, which
prompts for an input value for &15).

>> SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');
>> SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
>> SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');
>> SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');


And the rest throw errors, as shown below:

>> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
ERROR at line 1:
ORA-01850: hour must be between 0 and 23

>> SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');
ERROR at line 1:
ORA-01839: date not valid for month specified


>(Also, note that text formatted email is generally preferred to HTML
>on this mailing list; the fact that your email is in a different font
>than the rest of the thread makes it hard to read.)

Understood. Will try to follow this, thanks.


Regards,
Amul Sul




Re: [HACKERS] Fix comment in ATExecValidateConstraint

2016-08-15 Thread Amit Langote
On 2016/07/29 23:50, Robert Haas wrote:
> On Mon, Jul 25, 2016 at 4:18 AM, Amit Langote
>  wrote:
>> The comment seems to have been copied from ATExecAddColumn, which says:
>>
>>  /*
>>   * If we are told not to recurse, there had better not be any
>> - * child tables; else the addition would put them out of step.
>>
>> For ATExecValidateConstraint, it should say something like:
>>
>> + * child tables; else validating the constraint would put them
>> + * out of step.
>>
>> Attached patch fixes it.
> 
> I agree that the current comment is wrong, but what does "out of step"
> actually mean here, anyway?  I think this isn't very clear.

Like Tom explained over at [1], I guess it means that if a constraint is
marked validated in the parent, the same constraint in all the children
should have been marked validated as well. Validating just the parent's copy
seems to break this invariant. I admit, though, that I don't know whether the
phrase "out of step" conveys that.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/13658.1470072749%40sss.pgh.pa.us






Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Andrew Dunstan



On 08/15/2016 02:23 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 08/15/2016 10:19 AM, Tom Lane wrote:

Andrew Dunstan  writes:

We should probably specify -bext='/', which would cause the backup files
to be deleted unless an error occurred.

Really?  That seems a bit magic, and it's certainly undocumented.

We must be using different versions.

Hmm ... I'm using the RHEL6 version, which claims to be v20090616,
which is what pgindent/README says to use.



Ah. I have Fedora 22's v20140711. 2009 seems a bit ancient :-) Anyway, 
what you've done seems fine.


cheers

andrew





Re: [HACKERS] anyelement -> anyrange

2016-08-15 Thread Tom Lane
Jim Nasby  writes:
> Any reason why we can create a function that accepts anyelement and 
> returns anyarray, but can't do the same with anyrange?

Because there can be more than one range type over the same element
type, so we couldn't deduce which one should be used for anyrange.

The other direction (inferring anyelement from anyrange) does work.
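
For instance (a quick sketch, with a made-up second range type just to
illustrate the ambiguity):

create type my_int4range as range (subtype = int4);
-- Given only an int4 argument, anyrange could resolve to either int4range
-- or my_int4range, so there is nothing to pick one with.

-- The reverse direction is fine, because the element type is implied by
-- whichever range type is passed in:
create function range_lower_bound(r anyrange) returns anyelement
language sql as $$ select lower(r) $$;

select range_lower_bound(int4range(1, 10));  -- returns 1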

regards, tom lane




[HACKERS] anyelement -> anyrange

2016-08-15 Thread Jim Nasby
Any reason why we can create a function that accepts anyelement and 
returns anyarray, but can't do the same with anyrange? Could we attempt 
to match each range subtype looking for a match?


create function range__create(anyelement,anyelement,text = '[]') RETURNS 
anyrange LANGUAGE plpgsql AS $body$

BEGIN
RETURN int4range($1,$2,$3);
END$body$;
ERROR:  42P13: cannot determine result data type
DETAIL:  A function returning "anyrange" must have at least one 
"anyrange" argument.


create function array__create(anyelement,anyelement) RETURNS anyarray 
LANGUAGE plpgsql AS $body$

BEGIN
RETURN array[$1,$2];
END$body$;
CREATE FUNCTION
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-08-15 Thread Robert Haas
On Mon, Aug 15, 2016 at 9:10 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sat, Aug 13, 2016 at 11:56 AM, Alvaro Herrera
>>  wrote:
>>> I too prefer to keep it turned off in 9.6 and consider enabling it by
>>> default on a future release (10 is probably good).  Interested users can
>>> carefully test the feature without endangering other unsuspecting users.
>>> I agree with the idea of keeping it enabled in master, so that it'll get
>>> a modicum of testing there by hackers, too.
>
>> Sounds like that is the consensus.  Who's going to implement it?
>
> I believe we're talking about reverting 77cd477c4 (in 9.6 only not
> master), correct?  It's a little harder than just "git revert" because
> of the subsequent max_parallel_degree -> max_parallel_workers_per_gather
> name change, but still not exactly rocket science.
>
> Since 77cd477c4 was your commit, I'd sort of expect you to do the
> honors, but if you don't want to I can.

OK, I'll do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-08-15 Thread Tom Lane
Robert Haas  writes:
> On Sat, Aug 13, 2016 at 11:56 AM, Alvaro Herrera
>  wrote:
>> I too prefer to keep it turned off in 9.6 and consider enabling it by
>> default on a future release (10 is probably good).  Interested users can
>> carefully test the feature without endangering other unsuspecting users.
>> I agree with the idea of keeping it enabled in master, so that it'll get
>> a modicum of testing there by hackers, too.

> Sounds like that is the consensus.  Who's going to implement it?

I believe we're talking about reverting 77cd477c4 (in 9.6 only not
master), correct?  It's a little harder than just "git revert" because
of the subsequent max_parallel_degree -> max_parallel_workers_per_gather
name change, but still not exactly rocket science.

Since 77cd477c4 was your commit, I'd sort of expect you to do the
honors, but if you don't want to I can.

regards, tom lane




Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-15 Thread Michael Paquier
On Mon, Aug 15, 2016 at 12:12 AM, Ryan Murphy  wrote:
> This is to fix an issue that came up for me when running initdb.
>
> At the end of a successful initdb it says:
>
> Success. You can now start the database server using:
> pg_ctl -D /some/path/to/data -l logfile start
>
> but this command did not work for me because my data directory
> contained a space.  The src/bin/initdb/initdb.c source code
> did already have a QUOTE_PATH constant, but it was the empty
> string for non-windows cases.
>
> Therefore, added quotes via existing QUOTE_PATH constant:
>
> pg_ctl -D '/some/path/to/data' -l logfile start

You may want to register this patch to the next commit fest:
https://commitfest.postgresql.org/10/
-- 
Michael




Re: [HACKERS] PSA: Systemd will kill PostgreSQL

2016-08-15 Thread Tom Lane
Josh Berkus  writes:
> On 08/15/2016 05:18 PM, Tom Lane wrote:
>> Well, yeah, it's easy to fix once you know you need to do so.  The
>> complaint is basically that out-of-the-box, it's broken, and it's
>> not very clear what was gained by breaking it.

> You're welcome to argue with Lennart about that.

Hah!  I can think of more pleasant ways of wasting my time.

regards, tom lane




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-08-15 Thread Peter Geoghegan
On Wed, Aug 3, 2016 at 2:13 PM, Peter Geoghegan  wrote:
> Since merging is a big bottleneck with this, we should probably also
> work to address that indirectly.

I attach a patch that changes how we maintain the heap invariant
during tuplesort merging. I already mentioned this over on the
"Parallel tuplesort, partitioning, merging, and the future" thread. As
noted already on that thread, this patch makes merging clustered
numeric input about 2.1x faster overall in one case, which is
particularly useful in the context of a serial final/leader merge
during a parallel CREATE INDEX. Even *random* non-C-collated text
input is made significantly faster. This work is totally orthogonal to
parallelism, though; it's just very timely, given our discussion of
the merge bottleneck on this thread.

If I benchmark a parallel build of a 100 million row index, with
presorted input, I can see a 71% reduction in *comparisons* with 8
tapes/workers, and an 80% reduction in comparisons with 16
workers/tapes in one instance (the numeric case I just mentioned).
With random input, we can still come out significantly ahead, but not
to the same degree. I was able to see a reduction in comparisons
during a leader merge, from 1,468,522,397 comparisons to 999,755,569
comparisons, which is obviously still quite significant (worker
merges, if any, benefit too). I think I need to redo my parallel
CREATE INDEX benchmark, so that you can take this into account. Also,
I think that this patch will make very large external sorts that
naturally have tens of runs to merge significantly faster, but I
didn't bother to benchmark that.

The patch is intended to be applied on top of parallel B-Tree patches
0001-* and 0002-* [1]. I happened to test it with parallelism, but
these are all independently useful, and will be entered as a separate
CF entry (perhaps better to commit the earlier two patches first, to
avoid merge conflicts). I'm optimistic that we can get those 3 patches
in the series out of the way early, without blocking on discussing
parallel sort.

The patch makes tuplesort merging shift down and displace the root
tuple with the tape's next preread tuple, rather than compacting and
then inserting into the heap anew. This approach to maintaining the
heap as tuples are returned to caller will always produce fewer
comparisons overall. The new approach is also simpler. We were already
shifting down to compact the heap within the misleadingly named [2]
function tuplesort_heap_siftup() -- why not instead just use the
caller tuple (the tuple that we currently go on to insert) when
initially shifting down (not the heap's preexisting last tuple, which
is guaranteed to go straight to the leaf level again)? That way, we
don't need to enlarge the heap at all through insertion, shifting up,
etc. We're done, and are *guaranteed* to have performed less work
(fewer comparisons and swaps) than with the existing approach (this is
the reason for my optimism about getting this stuff out of the way
early).
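
To illustrate what I mean by displacing the root (a toy sketch with plain
ints rather than SortTuples, so names and details don't match tuplesort.c):

static void
heap_replace_root(int *heap, int nelems, int newval)
{
	int		i = 0;

	heap[0] = newval;			/* next preread value from the tape that
								 * just supplied the old root */
	for (;;)
	{
		int		left = 2 * i + 1;
		int		right = left + 1;
		int		smallest = i;
		int		tmp;

		if (left < nelems && heap[left] < heap[smallest])
			smallest = left;
		if (right < nelems && heap[right] < heap[smallest])
			smallest = right;
		if (smallest == i)
			break;				/* heap property restored */

		/* swap downwards and keep sifting */
		tmp = heap[i];
		heap[i] = heap[smallest];
		heap[smallest] = tmp;
		i = smallest;
	}
}

With three tapes whose current heads are {3, 7, 5}, returning 3 to the caller
and then calling heap_replace_root(heap, 3, 9) leaves {5, 7, 9} after a single
sift-down: no separate compaction pass, and no re-insertion from the bottom.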

This new approach is more or less the *conventional* way to maintain
the heap invariant when returning elements from a heap during k-way
merging. Our existing approach is convoluted; merging was presumably
only coded that way because the generic functions
tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be
available. Perhaps the problem was masked by unrelated bottlenecks
that existed at the time, too.

I think that I could push this further (a minimum of 2 comparisons per
item returned when 3 or more tapes are active still seems like 1
comparison too many), but what I have here gets us most of the
benefit. And, it does so while not actually adding code that could be
called "overly clever", IMV. I'll probably leave clever, aggressive
optimization of merging for a later release.

[1] 
https://www.postgresql.org/message-id/CAM3SWZQKM=Pzc=cahzrixkjp2eo5q0jg1sofqqexfq647ji...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/cam3swzq+2gjmnv7chxwexqxoplfb_few2rfexhj+gsyf39f...@mail.gmail.com
-- 
Peter Geoghegan
From 71dcd1fc8acf8e54478526bd0a7feaf5e43bf2c1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 9 Aug 2016 14:53:49 -0700
Subject: [PATCH 3/6] Displace heap's root during tuplesort merge

Directly displace and shift down.  This is significantly faster than
shifting down to compact the heap, and then shifting up to insert a new
preload tuple during tuplesort merging, especially when the merge can
naturally return at least a few tuples from each tape (return them as
sorted output to caller) before some other tape needs to return tuples.
This is fairly common in the real world; tuple attribute values are
often found to be physically clustered together on input into
tuplesort.c.
---
 src/backend/utils/sort/tuplesort.c | 110 +
 1 file changed, 100 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c 

Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 8/15/16 5:11 PM, Stephen Frost wrote:
>> Eh?  Last I checked, we needed minor version bumps to ensure that
>> binaries compiled against later versions, which might use newer symbols,
>> don't try to link against older libraries (which wouldn't have those
>> symbols).

> Let's review:

> What we install is

> libpq.so.5.8 (actual file)
> libpq.so.5 -> libpq.so.5.8
> libpq.so -> libpq.so.5.8

> The second one is the one used at run-time, looked up by SONAME.

Right, and that is all exactly per distro recommendations, at least
for Red Hat, and I'm pretty sure other distros too.  This has not
been changed recently TTBOMK.  See for example

http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html

The only argument that particular document offers for including the
minor number is that it makes it easier to see which specific
version you have installed.  That's not much, but it's not
nothing either.  There might be other reasons I'm not remembering.

Also, SO_MINOR_VERSION is included in the shlib name for most
Unix-oid platforms, not just Linux.  Even if we were to conclude
this was no longer recommended practice for Linux, I doubt we
should unilaterally drop the practice everywhere.

regards, tom lane




Re: [HACKERS] Logical Replication WIP

2016-08-15 Thread Petr Jelinek

On 15/08/16 15:51, Stas Kelvich wrote:

On 11 Aug 2016, at 17:43, Petr Jelinek  wrote:



* Also, I wasn’t actually able to run replication itself =) While the
regression tests pass, the TAP tests and a manual run get stuck.
pg_subscription_rel.substate never becomes ‘r’. I’ll investigate that more
and write again.


Interesting, please keep me posted. It's possible for tables to stay in 's' 
state for some time if there is nothing happening on the server, but that 
should not mean anything is stuck.


I played around with it a bit; it seems that the apply worker waits forever
for the substate change.

(lldb) bt
* thread #1: tid = 0x183e00, 0x7fff88c7f2a2 libsystem_kernel.dylib`poll + 
10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x7fff88c7f2a2 libsystem_kernel.dylib`poll + 10
frame #1: 0x0001017ca8a3 
postgres`WaitEventSetWaitBlock(set=0x7fd2dc816b30, cur_timeout=1, 
occurred_events=0x7fff5e7f67d8, nevents=1) + 51 at latch.c:1108
frame #2: 0x0001017ca438 
postgres`WaitEventSetWait(set=0x7fd2dc816b30, timeout=1, 
occurred_events=0x7fff5e7f67d8, nevents=1) + 248 at latch.c:941
frame #3: 0x0001017c9fde 
postgres`WaitLatchOrSocket(latch=0x00010ab208a4, wakeEvents=25, sock=-1, 
timeout=1) + 254 at latch.c:347
frame #4: 0x0001017c9eda postgres`WaitLatch(latch=0x00010ab208a4, 
wakeEvents=25, timeout=1) + 42 at latch.c:302
  * frame #5: 0x000101793352 
postgres`wait_for_sync_status_change(tstate=0x000101e409b0) + 178 at 
tablesync.c:228
frame #6: 0x000101792bbe 
postgres`process_syncing_tables_apply(slotname="subbi", 
end_lsn=140734778796592) + 430 at tablesync.c:436
frame #7: 0x0001017928c1 
postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 
at tablesync.c:518
frame #8: 0x00010177b620 
postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at 
apply.c:1122
frame #9: 0x00010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at 
apply.c:1353
frame #10: 0x00010174cb5a postgres`StartBackgroundWorker + 826 at 
bgworker.c:729
frame #11: 0x000101762227 
postgres`do_start_bgworker(rw=0x7fd2db70) + 343 at postmaster.c:5553
frame #12: 0x00010175d42b postgres`maybe_start_bgworker + 427 at 
postmaster.c:5761
frame #13: 0x00010175bccf 
postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
frame #14: 0x7fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
frame #15: 0x7fff88c7e07b libsystem_kernel.dylib`__select + 11
frame #16: 0x00010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
frame #17: 0x00010175b2e0 postgres`PostmasterMain(argc=3, 
argv=0x7fd2db403840) + 5968 at postmaster.c:1309
frame #18: 0x00010169507f postgres`main(argc=3, 
argv=0x7fd2db403840) + 751 at main.c:228
frame #19: 0x7fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = ‘c’



Hmm, not sure why that is; it might be related to the reported lsn being
wrong. Could you check what the lsn is there (either in tstate or in
pg_subscription_rel), especially in comparison with what the sent_location
is?



Also, I’ve noticed that some lsn positions look wrong on the publisher:

postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
 restart_lsn | confirmed_flush_lsn
-+-
 0/1530EF8   | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, 
replay_location from pg_stat_replication;
 sent_location | write_location | flush_location | replay_location
---+++-
 0/1530F30 | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
(1 row)



That's most likely a result of the uninitialized origin_startpos warning. I
am working on a new version of the patch where that part is fixed; if you
want to check this before I send it in, the change looks like this:


diff --git a/src/backend/replication/logical/apply.c 
b/src/backend/replication/logical/apply.c

index 581299e..7a9e775 100644
--- a/src/backend/replication/logical/apply.c
+++ b/src/backend/replication/logical/apply.c
@@ -1353,6 +1353,7 @@ ApplyWorkerMain(Datum main_arg)
originid = replorigin_by_name(myslotname, false);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+   origin_startpos = replorigin_session_get_progress(false);
CommitTransactionCommand();

wrcapi->connect(wrchandle, MySubscription->conninfo, true,


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PSA: Systemd will kill PostgreSQL

2016-08-15 Thread Josh Berkus
On 08/15/2016 05:18 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> On 08/15/2016 02:43 PM, Tom Lane wrote:
>>> Last I heard, there's an exclusion for "system" accounts, so an
>>> installation that's using the Fedora-provided pgsql account isn't
>>> going to have a problem.  It's homebrew installs running under
>>> ordinary-user accounts that are at risk.
> 
>> Presumably people just need to add the system account tag to the unit
>> file, no?
> 
> Well, yeah, it's easy to fix once you know you need to do so.  The
> complaint is basically that out-of-the-box, it's broken, and it's
> not very clear what was gained by breaking it.

You're welcome to argue with Lennart about that.  I'm not personally
supporting the feature, I just don't think it's that hard to work around.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Tom Lane
Michael Paquier  writes:
> The tree does not have any .bak file, and those refer to backup copies
> normally. Perhaps it would make sense to include those in root's
> .gitignore? That would save from an unfortunate manipulation of git
> add in the future.

We've generally refrained from adding things like that to the .gitignore
files.  If you've got unexpected trash in your tree, you probably ought
to be told about it.  There was some discussion back-when about including
common editor backup extensions and suchlike, but the consensus was that
those are better handled in user-private git config files.
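
(For anyone who wants that locally, the per-user setup is something like:

    git config --global core.excludesFile ~/.gitignore_global
    echo '*.bak' >> ~/.gitignore_global

which keeps the exclusion out of the tree entirely.)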

regards, tom lane




Re: [HACKERS] PSA: Systemd will kill PostgreSQL

2016-08-15 Thread Tom Lane
Josh Berkus  writes:
> On 08/15/2016 02:43 PM, Tom Lane wrote:
>> Last I heard, there's an exclusion for "system" accounts, so an
>> installation that's using the Fedora-provided pgsql account isn't
>> going to have a problem.  It's homebrew installs running under
>> ordinary-user accounts that are at risk.

> Presumably people just need to add the system account tag to the unit
> file, no?

Well, yeah, it's easy to fix once you know you need to do so.  The
complaint is basically that out-of-the-box, it's broken, and it's
not very clear what was gained by breaking it.

regards, tom lane




Re: [HACKERS] Anyone want to update our Windows timezone map?

2016-08-15 Thread Michael Paquier
On Tue, Aug 16, 2016 at 4:19 AM, Tom Lane  wrote:
> We have certainly not been doing that on a regular basis (as best I can
> tell, no such changes have been made since 2010).  Does anybody who uses
> Windows want to deal with it?  Or at least do it once so that our Windows
> TZ info is less than 5 years out of date?

It would be a good idea to refresh that.

> If we got this done in the next couple weeks, any resulting changes
> could go out in 9.6rc1.  Given that we've not done this routinely,
> that seems like a better plan than having them first hit the field
> in minor releases of stable branches.

Agreed. I'll try to give it a shot in the next couple of days at worst.
-- 
Michael




Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Michael Paquier
On Mon, Aug 15, 2016 at 11:19 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 08/14/2016 04:38 PM, Tom Lane wrote:
>>> I did a trial run following the current pgindent README procedure, and
>>> noticed that the perltidy step left me with a pile of '.bak' files
>>> littering the entire tree.  This seems like a pretty bad idea because
>>> a naive "git add ." would have committed them.  It's evidently because
>>> src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.
>
> BTW, after experimenting with this, I did not find any way to get perltidy
> to overwrite the original files without making a backup file.
>
>> We should probably specify -bext='/', which would cause the backup files
>> to be deleted unless an error occurred.
>
> Really?  That seems a bit magic, and it's certainly undocumented.
>
>> Alternatively, we could just remove the in-place parameter and write a
>> command that moved the new .tdy files over the original when perltidy
>> was finished.
>
> I was thinking about just removing all the .bak files afterwards, ie
> automating the existing manual process.  As long as we're making an
> invocation script anyway, that's easy.

The tree does not have any .bak files, and those normally refer to backup
copies. Perhaps it would make sense to include them in the root .gitignore?
That would guard against an unfortunate "git add" in the future.
-- 
Michael




Re: [HACKERS] PSA: Systemd will kill PostgreSQL

2016-08-15 Thread Josh Berkus
On 08/15/2016 02:43 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> On 07/10/2016 10:56 AM, Joshua D. Drake wrote:
>>> tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory)
>>> when a user "fully" logs out.
> 
>> That looks like it was under discussion in April, though.  Do we have
>> confirmation it was never fixed?  I'm not seeing systemd killing
>> Postgres under Fedora24.
> 
> Last I heard, there's an exclusion for "system" accounts, so an
> installation that's using the Fedora-provided pgsql account isn't
> going to have a problem.  It's homebrew installs running under
> ordinary-user accounts that are at risk.

Presumably people just need to add the system account tag to the unit
file, no?
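
(For a homebrew install, the other obvious knob is to disable the behavior
in logind itself, something like this in /etc/systemd/logind.conf, if I'm
remembering the setting correctly:

    RemoveIPC=no

followed by restarting systemd-logind.)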


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Peter Eisentraut
On 8/15/16 5:11 PM, Stephen Frost wrote:
> Eh?  Last I checked, we needed minor version bumps to ensure that
> binaries compiled against later versions, which might use newer symbols,
> don't try to link against older libraries (which wouldn't have those
> symbols).

Let's review:

What we install is

libpq.so.5.8 (actual file)
libpq.so.5 -> libpq.so.5.8
libpq.so -> libpq.so.5.8

The second one is the one used at run-time, looked up by SONAME.

The third one is the one used at build time, because -lpq means look for
libpq.so.

If we instead installed

libpq.so.5 (actual file)
libpq.so -> libpq.so.5

nothing would effectively change.
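
(You can check which name is actually baked in for run-time lookup with
something like

    objdump -p libpq.so.5.8 | grep SONAME

which should print libpq.so.5 either way.)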

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Index Onlys Scan for expressions

2016-08-15 Thread Tomas Vondra

On 08/16/2016 12:03 AM, Ildar Musin wrote:

Hi, hackers!

There is a known issue that index only scan (IOS) can only work with
simple index keys based on single attributes and doesn't work with index
expressions. In this patch I propose a solution that adds support of IOS
for index expressions. Here's an example:

create table abc(a int, b int, c int);
create index on abc ((a * 1000 + b), c);

with t1 as (select generate_series(1, 1000) as x),
 t2 as (select generate_series(0, 999) as x)
insert into abc(a, b, c)
select t1.x, t2.x, t2.x from t1, t2;
vacuum analyze;

Explain results with the patch:

explain (analyze, buffers) select a * 1000 + b + c from abc where a *
1000 + b = 1001;
   QUERY PLAN
-

 Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
width=4) (actual time=0.032..0.033 rows=1 loops=1)
   Index Cond: ((((a * 1000) + b)) = 1001)
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning time: 0.184 ms
 Execution time: 0.077 ms
(6 rows)

Before the patch it was:

explain (analyze, buffers) select a * 1000 + b + c from abc where a *
1000 + b = 1001;
 QUERY PLAN


 Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1
width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: (((a * 1000) + b) = 1001)
   Buffers: shared hit=4
 Planning time: 0.177 ms
 Execution time: 0.088 ms
(5 rows)



Nice! I've only quickly skimmed through the diff, but it seems sane. 
Please add the patch to the 2016-09 CF, though.



This solution has limitations though: the restriction or the target
expression tree (or its part) must match exactly the index. E.g. this
expression will pass the check:

select a * 1000 + b + 100 from ...

but this will fail:

select 100 + a * 1000 + b from ...

because the parser groups it as:

(100 + a * 1000) + b

In this form it won't match any index key. Another case is when we
create index on (a+b) and then make query like 'select b+a ...' or '...
where b+a = smth' -- it won't match. This applies to regular index scan
too. Probably it worth to discuss the way to normalize index expressions
and clauses and work out more convenient way to match them.
Anyway, I will be grateful if you take a look at the patch in
attachment. Any comments and tips are welcome.


I don't think it's a major limitation - it's quite similar to the 
limitation for partial indexes, i.e. with an index defined like


CREATE INDEX ON abc (c) WHERE a + b = 1000;

the index will not be used unless the query expression matches exactly. 
So for example this won't work:


SELECT c FROM abc WHERE b + a = 1000;

because the variables are in the opposite order. Moreover, in the target 
list it might be possible to use explicit parentheses to make it work, 
no? That is, will this work?


select 100 + (a * 1000 + b) from ...

Or will it still break the IOS?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-08-15 Thread Robert Haas
On Sat, Aug 13, 2016 at 11:56 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> Being cautious pays more in the long term, so seeing the number of
>> bugs that showed up I'd rather vote for having it disabled by default
>> in 9.6 stable, and enabled on master to aim at enabling it in 10.0.
>
> I too prefer to keep it turned off in 9.6 and consider enabling it by
> default on a future release (10 is probably good).  Interested users can
> carefully test the feature without endangering other unsuspecting users.
>
> I agree with the idea of keeping it enabled in master, so that it'll get
> a modicum of testing there by hackers, too.

Sounds like that is the consensus.  Who's going to implement it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LWLocks in DSM memory

2016-08-15 Thread Robert Haas
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas  wrote:
> Therefore, I plan to commit this patch, removing the #include
>  unless someone convinces me we need it, shortly after
> development for v10 opens, unless there are objections before then.

Hearing no objections, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Index Onlys Scan for expressions

2016-08-15 Thread Ildar Musin

Hi, hackers!

There is a known issue that index only scan (IOS) can only work with 
simple index keys based on single attributes and doesn't work with index 
expressions. In this patch I propose a solution that adds support of IOS 
for index expressions. Here's an example:


create table abc(a int, b int, c int);
create index on abc ((a * 1000 + b), c);

with t1 as (select generate_series(1, 1000) as x),
 t2 as (select generate_series(0, 999) as x)
insert into abc(a, b, c)
select t1.x, t2.x, t2.x from t1, t2;
vacuum analyze;

Explain results with the patch:

explain (analyze, buffers) select a * 1000 + b + c from abc where a * 
1000 + b = 1001;

   QUERY PLAN
-
 Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1 
width=4) (actual time=0.032..0.033 rows=1 loops=1)

   Index Cond: ((((a * 1000) + b)) = 1001)
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning time: 0.184 ms
 Execution time: 0.077 ms
(6 rows)

Before the patch it was:

explain (analyze, buffers) select a * 1000 + b + c from abc where a * 
1000 + b = 1001;

 QUERY PLAN

 Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1 
width=4) (actual time=0.039..0.041 rows=1 loops=1)

   Index Cond: (((a * 1000) + b) = 1001)
   Buffers: shared hit=4
 Planning time: 0.177 ms
 Execution time: 0.088 ms
(5 rows)

This solution has limitations though: the restriction or the target 
expression tree (or its part) must match exactly the index. E.g. this 
expression will pass the check:


select a * 1000 + b + 100 from ...

but this will fail:

select 100 + a * 1000 + b from ...

because the parser groups it as:

(100 + a * 1000) + b

In this form it won't match any index key. Another case is when we 
create an index on (a+b) and then write a query like 'select b+a ...' or '... 
where b+a = smth' -- it won't match. This applies to regular index scans 
too. It is probably worth discussing a way to normalize index expressions 
and clauses and working out a more convenient way to match them.
Anyway, I would be grateful if you could take a look at the attached patch. 
Any comments and tips are welcome.


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb..9eb0e12 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
@@ -130,7 +131,13 @@ static PathClauseUsage *classify_index_clause_usage(Path *path,
 static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
 static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds);
 static int	find_list_position(Node *node, List **nodelist);
-static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only_targetlist(PlannerInfo *root,
+			RelOptInfo *rel,
+			IndexOptInfo *index,
+			Bitmapset *index_canreturn_attrs);
+static bool check_index_only_clauses(List *clauses,
+		 IndexOptInfo *index);
 static double get_loop_count(PlannerInfo *root, Index cur_relid, Relids outer_relids);
 static double adjust_rowcount_for_semijoins(PlannerInfo *root,
 			  Index cur_relid,
@@ -190,7 +197,6 @@ static List *network_prefix_quals(Node *leftop, Oid expr_op, Oid opfamily,
 static Datum string_to_datum(const char *str, Oid datatype);
 static Const *string_to_const(const char *str, Oid datatype);
 
-
 /*
  * create_index_paths()
  *	  Generate all interesting index paths for the given relation.
@@ -1020,7 +1026,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * index data retrieval anyway.
 	 */
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
-	   check_index_only(rel, index));
+	   check_index_only(root, rel, index));
 
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
@@ -1780,13 +1786,12 @@ find_list_position(Node *node, List **nodelist)
 	return i;
 }
 
-
 /*
  * check_index_only
  *		Determine whether an index-only scan is possible for this index.
  */
 static bool
-check_index_only(RelOptInfo *rel, IndexOptInfo *index)
+check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index)
 {
 	bool		result;
 	Bitmapset  *attrs_used = NULL;
@@ -1836,10 +1841,7 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	{
 		int			attno = index->indexkeys[i];
 
-		/*
-		 * For the 

Re: [HACKERS] PSA: Systemd will kill PostgreSQL

2016-08-15 Thread Tom Lane
Josh Berkus  writes:
> On 07/10/2016 10:56 AM, Joshua D. Drake wrote:
>> tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory)
>> when a user "fully" logs out.

> That looks like it was under discussion in April, though.  Do we have
> confirmation it was never fixed?  I'm not seeing systemd killing
> Postgres under Fedora24.

Last I heard, there's an exclusion for "system" accounts, so an
installation that's using the Fedora-provided pgsql account isn't
going to have a problem.  It's homebrew installs running under
ordinary-user accounts that are at risk.

But they might have changed the policy some more since then.

regards, tom lane




Re: [HACKERS] Anyone want to update our Windows timezone map?

2016-08-15 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Aug 16, 2016 at 7:19 AM, Tom Lane  wrote:
>> We have certainly not been doing that on a regular basis (as best I can
>> tell, no such changes have been made since 2010).  Does anybody who uses
>> Windows want to deal with it?  Or at least do it once so that our Windows
>> TZ info is less than 5 years out of date?

> By the way, I noticed that Unicode CLDR publishes this data set which
> seems to be the same sort of thing:
> http://www.unicode.org/cldr/charts/29/supplemental/zone_tzid.html

> Could that be a better source than dumping stuff from arbitrary
> Windows versions?

Well, the point is that we need to recognize the timezone names that
Windows actually uses.  A list of strings that's more or less the same
doesn't really cut it ...

regards, tom lane




Re: [HACKERS] PSA: Systemd will kill PostgreSQL

2016-08-15 Thread Josh Berkus
On 07/10/2016 10:56 AM, Joshua D. Drake wrote:
> Hackers,
> 
> This just came across my twitter feed:
> 
> https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html
> 
> tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory)
> when a user "fully" logs out.

That looks like it was under discussion in April, though.  Do we have
confirmation it was never fixed?  I'm not seeing systemd killing
Postgres under Fedora24.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] New version numbering practices

2016-08-15 Thread Tom Lane
I wrote:
> I've pushed this with minor additional twiddling.  We can work on the
> "UI issues" you mentioned at leisure.  One that I noted is that psql's
> version-mismatch-warning messages aren't two-part-version aware, and
> will print "10.0" where they should say "10".  Probably that fix should
> get back-patched, so that older psql branches will do the right thing
> if possible.

Attached is a WIP patch for that.  I've only fixed psql/command.c
so far, but I put the support logic into fe_utils since we're going
to need it in some other places too (pg_dump at least).  Any objections
to this general approach?

regards, tom lane

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9c0af4e..4aaf657 100644
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** exec_command(const char *cmd,
*** 635,642 
  
  		if (pset.sversion < 80400)
  		{
! 			psql_error("The server (version %d.%d) does not support editing function source.\n",
! 	   pset.sversion / 10000, (pset.sversion / 100) % 100);
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!query_buf)
--- 635,645 
  
  		if (pset.sversion < 80400)
  		{
! 			char		sverbuf[32];
! 
! 			psql_error("The server (version %s) does not support editing function source.\n",
! 	   formatPGVersionNumber(pset.sversion, false,
! 			 sverbuf, sizeof(sverbuf)));
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!query_buf)
*** exec_command(const char *cmd,
*** 731,738 
  
  		if (pset.sversion < 70400)
  		{
! 			psql_error("The server (version %d.%d) does not support editing view definitions.\n",
! 	   pset.sversion / 10000, (pset.sversion / 100) % 100);
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!query_buf)
--- 734,744 
  
  		if (pset.sversion < 70400)
  		{
! 			char		sverbuf[32];
! 
! 			psql_error("The server (version %s) does not support editing view definitions.\n",
! 	   formatPGVersionNumber(pset.sversion, false,
! 			 sverbuf, sizeof(sverbuf)));
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!query_buf)
*** exec_command(const char *cmd,
*** 1362,1369 
  	  OT_WHOLE_LINE, NULL, true);
  		if (pset.sversion < 80400)
  		{
! 			psql_error("The server (version %d.%d) does not support showing function source.\n",
! 	   pset.sversion / 10000, (pset.sversion / 100) % 100);
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!func)
--- 1368,1378 
  	  OT_WHOLE_LINE, NULL, true);
  		if (pset.sversion < 80400)
  		{
! 			char		sverbuf[32];
! 
! 			psql_error("The server (version %s) does not support showing function source.\n",
! 	   formatPGVersionNumber(pset.sversion, false,
! 			 sverbuf, sizeof(sverbuf)));
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!func)
*** exec_command(const char *cmd,
*** 1441,1448 
  	  OT_WHOLE_LINE, NULL, true);
  		if (pset.sversion < 70400)
  		{
! 			psql_error("The server (version %d.%d) does not support showing view definitions.\n",
! 	   pset.sversion / 10000, (pset.sversion / 100) % 100);
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!view)
--- 1450,1460 
  	  OT_WHOLE_LINE, NULL, true);
  		if (pset.sversion < 70400)
  		{
! 			char		sverbuf[32];
! 
! 			psql_error("The server (version %s) does not support showing view definitions.\n",
! 	   formatPGVersionNumber(pset.sversion, false,
! 			 sverbuf, sizeof(sverbuf)));
  			status = PSQL_CMD_ERROR;
  		}
  		else if (!view)
*** connection_warnings(bool in_startup)
*** 2014,2035 
  	if (!pset.quiet && !pset.notty)
  	{
  		int			client_ver = PG_VERSION_NUM;
  
  		if (pset.sversion != client_ver)
  		{
  			const char *server_version;
- 			char		server_ver_str[16];
  
  			/* Try to get full text form, might include "devel" etc */
  			server_version = PQparameterStatus(pset.db, "server_version");
  			if (!server_version)
  			{
! snprintf(server_ver_str, sizeof(server_ver_str),
! 		 "%d.%d.%d",
! 		 pset.sversion / 10000,
! 		 (pset.sversion / 100) % 100,
! 		 pset.sversion % 100);
! server_version = server_ver_str;
  			}
  
  			printf(_("%s (%s, server %s)\n"),
--- 2026,2046 
  	if (!pset.quiet && !pset.notty)
  	{
  		int			client_ver = PG_VERSION_NUM;
+ 		char		cverbuf[32];
+ 		char		sverbuf[32];
  
  		if (pset.sversion != client_ver)
  		{
  			const char *server_version;
  
  			/* Try to get full text form, might include "devel" etc */
  			server_version = PQparameterStatus(pset.db, "server_version");
+ 			/* Otherwise fall back on pset.sversion */
  			if (!server_version)
  			{
! formatPGVersionNumber(pset.sversion, true,
! 	  sverbuf, sizeof(sverbuf));
! server_version = sverbuf;
  			}
  
  			printf(_("%s (%s, server %s)\n"),
*** connection_warnings(bool in_startup)
*** 2040,2049 
  			printf("%s (%s)\n", pset.progname, PG_VERSION);
  
  		if (pset.sversion / 100 > 

Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 8/15/16 3:06 PM, Tom Lane wrote:
> > That would give us an automatic annual change in the minor version.
> > If we ever made an incompatible change in a shlib, we could advance
> > its SO_MAJOR_VERSION but keep this rule for the minor version (there's
> > no law that says we have to reset the minor version when we do that).
> 
> Let's look into getting rid of the minor versions altogether.  They
> don't serve any technical purpose in most cases.  Library packaging
> policies have evolved quite a bit over the years; maybe there is some
> guidance there to make this simpler.

Eh?  Last I checked, we needed minor version bumps to ensure that
binaries compiled against later versions, which might use newer symbols,
don't try to link against older libraries (which wouldn't have those
symbols).

Thanks!

Stephen




Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread David Steele

On 8/15/16 4:58 PM, Jim Nasby wrote:

On 8/15/16 2:39 PM, David Steele wrote:

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.


If someone wanted to scratch an itch, it would also be useful to put
things that are zeroed under a single dedicated directory, so that folks
who wanted to could mount that on a ram drive. It would also be useful
to do that for stuff that's only reset on crash (to put it on local
storage as opposed to a SAN).


I definitely have something like that in mind and thought this was a 
good place to start.


--
-David
da...@pgmasters.net




Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread Jim Nasby

On 8/15/16 2:39 PM, David Steele wrote:

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.


If someone wanted to scratch an itch, it would also be useful to put 
things that are zeroed under a single dedicated directory, so that folks 
who wanted to could mount that on a ram drive. It would also be useful 
to do that for stuff that's only reset on crash (to put it on local 
storage as opposed to a SAN).
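
As a sketch of what I'm picturing: if everything transient lived under one
directory (pg_transient, say, a hypothetical name), you could simply do

    mount -t tmpfs -o size=128M tmpfs /path/to/pgdata/pg_transient

instead of symlinking each directory individually.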

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] multivariate statistics (v19)

2016-08-15 Thread Tomas Vondra

On 08/10/2016 06:41 AM, Michael Paquier wrote:

On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra
 wrote:

1) enriching the query tree with multivariate statistics info

Right now all the stuff related to multivariate statistics estimation
happens in clausesel.c - matching condition to statistics, selection of
statistics to use (if there are multiple usable stats), etc. So pretty much
all this info is internal to clausesel.c and does not get outside.


This does not seem bad to me as first sight but...


I'm starting to think that some of the steps (matching quals to stats,
selection of stats) should happen in a "preprocess" step before the actual
estimation, storing the information (which stats to use, etc.) in a new type
of node in the query tree - something like RestrictInfo.

I believe this needs to happen sometime after deconstruct_jointree() as that
builds RestrictInfos nodes, and looking at planmain.c, right after
extract_restriction_or_clauses seems about right. Haven't tried, though.

This would move all the "statistics selection" logic from clausesel.c,
separating it from the "actual estimation" and simplifying the code.

But more importantly, I think we'll need to show some of the data in EXPLAIN
output. With per-column statistics it's fairly straightforward to determine
which statistics are used and how. But with multivariate stats things are
often more complicated - there may be multiple candidate statistics (e.g.
histograms covering different subsets of the conditions), it's possible to
apply them in different orders, etc.

But EXPLAIN can't show the info if it's ephemeral and available only within
clausesel.c (and thrown away after the estimation).


This gives a good reason not to do that in clausesel.c; it would be
really cool to be able to get some information regarding the stats
used with a simple EXPLAIN.


I've been thinking about this, and I'm afraid it's way more complicated 
in practice. It essentially means doing something like


rel->baserestrictinfo = enrichWithStatistics(rel->baserestrictinfo);

for each table (and in the future maybe also for joins etc.) But as the 
name suggests the list should only include RestrictInfo nodes, which 
seems to contradict the transformation.


For example with conditions

WHERE (a=1) AND (b=2) AND (c=3)

the list will contain 3 RestrictInfos. But if there's a statistics on 
(a,b,c), we need to note that somehow - my plan was to inject a node 
storing this information, something like (a bit simplified):


StatisticsInfo {
 Oid statisticsoid; /* OID of the statistics */
 List *mvconditions; /* estimate using the statistics */
 List *otherconditions; /* estimate the old way */
}

But that'd clearly violate the assumption that baserestrictinfo only 
contains RestrictInfo. I don't think it's feasible (or desirable) to 
rework all the places to expect both RestrictInfo and the new node.


I can think of two alternatives:

1) keep the transformed list as separate list, next to baserestrictinfo

This obviously fixes the issue, as each caller can decide which node it 
wants. But it also means we need to maintain two lists instead of one, 
and keep them synchronized.


2) embed the information into the existing tree

It might be possible to store the information in existing nodes, i.e. 
each node would track whether it's estimated the "old way" or using 
multivariate statistics (and which one). But it would require changing 
many of the existing nodes (at least those compatible with multivariate 
statistics: currently OpExpr, NullTest, ...).


And it also seems fairly difficult to reconstruct the information during 
the estimation, as it'd be necessary to look for other nodes to be 
estimated by the same statistics. Which seems to defeat the idea of 
preprocessing to some degree.


So I'm not sure what's the best solution. I'm leaning to (1), i.e. 
keeping a separate list, but I'd welcome other ideas.
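
For concreteness, option (1) might look something along these lines; the
baserestrictstats field and the loop placement are made up for illustration,
and only the enrichWithStatistics() name comes from the sketch above:

/*
 * Hypothetical sketch: build a statistics-aware list next to
 * baserestrictinfo, e.g. right after extract_restriction_or_clauses().
 * "root" is the PlannerInfo; "baserestrictstats" does not exist today.
 */
Index	rti;

for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
	RelOptInfo *rel = root->simple_rel_array[rti];

	if (rel == NULL)			/* slots can be NULL for unreferenced RTEs */
		continue;

	/* baserestrictinfo keeps plain RestrictInfos, exactly as today */
	rel->baserestrictstats = enrichWithStatistics(root, rel,
												  rel->baserestrictinfo);
}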


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Peter Eisentraut
On 8/15/16 3:06 PM, Tom Lane wrote:
> That would give us an automatic annual change in the minor version.
> If we ever made an incompatible change in a shlib, we could advance
> its SO_MAJOR_VERSION but keep this rule for the minor version (there's
> no law that says we have to reset the minor version when we do that).

Let's look into getting rid of the minor versions altogether.  They
don't serve any technical purpose in most cases.  Library packaging
policies have evolved quite a bit over the years; maybe there is some
guidance there to make this simpler.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Anyone want to update our Windows timezone map?

2016-08-15 Thread Thomas Munro
On Tue, Aug 16, 2016 at 7:19 AM, Tom Lane  wrote:
> src/timezone/README saith
>
>   When there has been a new release of Windows (probably including Service
>   Packs), the list of matching timezones need to be updated. Run the
>   script in src/tools/win32tzlist.pl on a Windows machine running this new
>   release and apply any new timezones that it detects. Never remove any
>   mappings in case they are removed in Windows, since we still need to
>   match properly on the old version.
>
> We have certainly not been doing that on a regular basis (as best I can
> tell, no such changes have been made since 2010).  Does anybody who uses
> Windows want to deal with it?  Or at least do it once so that our Windows
> TZ info is less than 5 years out of date?

By the way, I noticed that Unicode CLDR publishes this data set which
seems to be the same sort of thing:

http://www.unicode.org/cldr/charts/29/supplemental/zone_tzid.html

Could that be a better source than dumping stuff from arbitrary
Windows versions?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread Stephen Frost
David,

* David Steele (da...@pgmasters.net) wrote:
> Recently a hacker proposed a patch to add pg_dynshmem to the list of
> directories whose contents are excluded in pg_basebackup.  I wasn't able
> to find the original email despite several attempts.

That would be here:

b4e94836-786b-6020-e1b3-3d7390f95...@aiven.io

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-08-15 Thread David Steele
Recently a hacker proposed a patch to add pg_dynshmem to the list of
directories whose contents are excluded in pg_basebackup.  I wasn't able
to find the original email despite several attempts.

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.

The attached patch adds these directories to the pg_basebackup
exclusions and takes an array-based approach to excluding directories
and files during backup.

I also incorporated Ashutosh's patch to fix corruption when
pg_stat_tmp/pg_replslot are links [1].  This logic has been extended to
all excluded directories.

Perhaps these patches should be merged in the CF but first I'd like to
see if anyone can identify problems with the additional exclusions.

Thanks,
-- 
-David
da...@pgmasters.net

[1]
http://www.postgresql.org/message-id/flat/CAE9k0Pm7=x_o0w7e2b2s2cwczdcbgczgdrxttzxozgp8beb...@mail.gmail.com/
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index da9b7a6..209b2cb 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -69,9 +70,6 @@ static void throttle(size_t increment);
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
 
-/* Relative path of temporary statistics directory */
-static char *statrelpath = NULL;
-
 /*
  * Size of each block sent into the tar stream for larger files.
  */
@@ -95,6 +93,68 @@ static int64 elapsed_min_unit;
 static int64 throttled_last;
 
 /*
+ * The contents of these directories are removed or recreated during server
+ * start so they will not be included in the backup.  The directory entry
+ * will be included to preserve permissions.
+ */
+#define EXCLUDE_DIR_MAX        8
+#define EXCLUDE_DIR_STAT_TMP   0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+   /*
+* Skip temporary statistics files. The first array position will be
+* filled with the value of pgstat_stat_directory relative to PGDATA.
+* PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+* because PGSS_TEXT_FILE is always created there.
+*/
+   NULL,
+   PG_STAT_TMP_DIR,
+
+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
+
+   /* Removed on startup, see dsm_cleanup_for_mmap(). */
+   PG_DYNSHMEM_DIR,
+
+   /* Recreated/zeroed on startup, see AsyncShmemInit(). */
+   "pg_notify",
+
+   /* Recreated on startup, see OldSerXidInit(). */
+   "pg_serial",
+
+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
+
+   /* Recreated/zeroed on startup, see StartupSUBTRANS(). */
+   "pg_subtrans"
+};
+
+/*
+ * Files that should not be included in the backup.
+ */
+#define EXCLUDE_FILE_MAX   5
+
+const char *excludeFile[EXCLUDE_FILE_MAX] =
+{
+   /* Skip auto conf temporary file. */
+   PG_AUTOCONF_FILENAME ".tmp",
+
+   /*
+* If there's a backup_label or tablespace_map file, it belongs to a backup
+* started by the user with pg_start_backup(). It is *not* correct for this
+* backup, our backup_label/tablespace_map is injected into the tar
+* separately.
+*/
+   BACKUP_LABEL_FILE,
+   TABLESPACE_MAP,
+
+   /* Skip postmaster.pid and postmaster.opts. */
+   "postmaster.pid",
+   "postmaster.opts"
+};
+
+/*
  * Called when ERROR or FATAL happens in perform_base_backup() after
  * we have started the backup - make sure we end it!
  */
@@ -154,11 +214,13 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 */
if (is_absolute_path(pgstat_stat_directory) &&
strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
-   statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+   excludeDirContents[EXCLUDE_DIR_STAT_TMP] =
+   pgstat_stat_directory + datadirpathlen + 1;
else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
-   statrelpath = psprintf("./%s", pgstat_stat_directory);
+   excludeDirContents[EXCLUDE_DIR_STAT_TMP] = pgstat_stat_directory;
else
-   statrelpath = pgstat_stat_directory;
+   excludeDirContents[EXCLUDE_DIR_STAT_TMP] =
+   pgstat_stat_directory + 2;
 
/* Add a node for the base directory at the end */
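
For reference, a rough sketch of how sendDir() would presumably consult the
arrays above while walking the data directory (illustrative only; "de" is
assumed to be the dirent currently being examined, and this is not the exact
code from the attached patch):

/* Skip files listed in excludeFile[] outright. */
bool		excludeFound = false;
int			i;

for (i = 0; i < EXCLUDE_FILE_MAX; i++)
{
	if (strcmp(de->d_name, excludeFile[i]) == 0)
	{
		elog(DEBUG1, "file \"%s\" excluded from backup", de->d_name);
		excludeFound = true;
		break;
	}
}

if (excludeFound)
	continue;			/* within the readdir() loop of sendDir() */

/* excludeDirContents[] would be handled similarly, except that the
 * directory entry itself is still sent so its permissions are preserved. */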

Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 15, 2016 at 3:06 PM, Tom Lane  wrote:
>> That would give us an automatic annual change in the minor version.
>> If we ever made an incompatible change in a shlib, we could advance
>> its SO_MAJOR_VERSION but keep this rule for the minor version (there's
>> no law that says we have to reset the minor version when we do that).

> Well, part of the motivation for moving to one part version numbers
> was that it would be less confusing, but this seems like it would
> create more confusing minor version numbers for shared libraries.  I
> think it would be strange to have a library that went from version
> 1.10 to version 2.11 without passing through 2.0 - 2.10.  I wouldn't
> rate that a critical defect, but if you don't like strange version
> numbering conventions, I wouldn't pick that one.

Well, we could address that when and if we ever do a major-version
sonumber bump.  We have not done that in the last ten years and frankly
I doubt we ever would; that would imply the sort of client API break
that we don't like to make.

> I wonder if we could address this problem by setting the version
> numbers using a formula based on the major version, instead of using
> the major version directly.

Possibly, though I think arithmetic is not Makefiles' strong suit.
In any case, I don't see a need for it right now, unless you're objecting
to the ecpg version changes I outlined.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Anyone want to update our Windows timezone map?

2016-08-15 Thread Tom Lane
src/timezone/README saith

  When there has been a new release of Windows (probably including Service
  Packs), the list of matching timezones need to be updated. Run the
  script in src/tools/win32tzlist.pl on a Windows machine running this new
  release and apply any new timezones that it detects. Never remove any
  mappings in case they are removed in Windows, since we still need to
  match properly on the old version.

We have certainly not been doing that on a regular basis (as best I can
tell, no such changes have been made since 2010).  Does anybody who uses
Windows want to deal with it?  Or at least do it once so that our Windows
TZ info is less than 5 years out of date?

If we got this done in the next couple weeks, any resulting changes
could go out in 9.6rc1.  Given that we've not done this routinely,
that seems like a better plan than having them first hit the field
in minor releases of stable branches.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Robert Haas
On Mon, Aug 15, 2016 at 3:06 PM, Tom Lane  wrote:
> After doing the tedious and easily forgotten (I almost did forget)
> minor version bumps for our shared libraries,
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b9358d4406b9b45a06855d53f491cc7ce9550a9
>
> it suddenly struck me that in the brave new two-part-version-number
> world we could dispense with that annual task by hard-wiring the
> various shlibs' SO_MINOR_VERSION numbers to equal the current release
> branch's major version number, ie
>
> SO_MINOR_VERSION=$MAJORVERSION
>
> That would give us an automatic annual change in the minor version.
> If we ever made an incompatible change in a shlib, we could advance
> its SO_MAJOR_VERSION but keep this rule for the minor version (there's
> no law that says we have to reset the minor version when we do that).
>
> This would be basically no change for libpq, since its scheduled
> minor version number for this release cycle happens to be 10 anyway.
> ecpg's various shlibs are at SO_MINOR_VERSION 8 or 9, so they would
> basically skip a minor version number or two, but that seems fine.
>
> The only place where we'd have a problem is the ecpg preprocessor
> itself, which is scheduled to be at version 4.13 this year.  However,
> that version number is purely cosmetic since AFAICS the only thing
> that gets done with it is to print it in response to -v and suchlike.
> I don't really see why ecpg has its own version number anyway ---
> why don't we go over to giving it the same version number as the
> rest of PG?  So it would just print the PG_VERSION string in the places
> where it currently prints the numbers hard-wired in ecpg/preproc/Makefile.
>
> Thoughts?

Well, part of the motivation for moving to one part version numbers
was that it would be less confusing, but this seems like it would
create more confusing minor version numbers for shared libraries.  I
think it would be strange to have a library that went from version
1.10 to version 2.11 without passing through 2.0 - 2.10.  I wouldn't
rate that a critical defect, but if you don't like strange version
numbering conventions, I wouldn't pick that one.

I wonder if we could address this problem by setting the version
numbers using a formula based on the major version, instead of using
the major version directly.  e.g. if something is scheduled to be 1.8
this year, make it 1.(VERSION - 2).  Then, you'd only have to update
the formula when bumping the major version - e.g. if we go to 2.0 in
version 14, you'd change the formula at that time to 2.(VERSION - 14).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Let's get rid of the separate minor version numbers for shlibs

2016-08-15 Thread Tom Lane
After doing the tedious and easily forgotten (I almost did forget)
minor version bumps for our shared libraries,
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b9358d4406b9b45a06855d53f491cc7ce9550a9

it suddenly struck me that in the brave new two-part-version-number
world we could dispense with that annual task by hard-wiring the
various shlibs' SO_MINOR_VERSION numbers to equal the current release
branch's major version number, ie

SO_MINOR_VERSION=$MAJORVERSION

That would give us an automatic annual change in the minor version.
If we ever made an incompatible change in a shlib, we could advance
its SO_MAJOR_VERSION but keep this rule for the minor version (there's
no law that says we have to reset the minor version when we do that).

This would be basically no change for libpq, since its scheduled
minor version number for this release cycle happens to be 10 anyway.
ecpg's various shlibs are at SO_MINOR_VERSION 8 or 9, so they would
basically skip a minor version number or two, but that seems fine.

The only place where we'd have a problem is the ecpg preprocessor
itself, which is scheduled to be at version 4.13 this year.  However,
that version number is purely cosmetic since AFAICS the only thing
that gets done with it is to print it in response to -v and suchlike.
I don't really see why ecpg has its own version number anyway ---
why don't we go over to giving it the same version number as the
rest of PG?  So it would just print the PG_VERSION string in the places
where it currently prints the numbers hard-wired in ecpg/preproc/Makefile.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/15/2016 10:19 AM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> We should probably specify -bext='/', which would cause the backup files
>>> to be deleted unless an error occurred.

>> Really?  That seems a bit magic, and it's certainly undocumented.

> We must be using different versions.

Hmm ... I'm using the RHEL6 version, which claims to be v20090616,
which is what pgindent/README says to use.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Shay Rojansky
>
> I'm not going to respond to the part about dealing with prepared
>> statements errors, since I think we've already covered that and there's
>> nothing new being said. I don't find automatic savepointing acceptable, and
>> a significant change of the PostgreSQL protocol to support this doesn't
>> seem reasonable (but you can try proposing).
>>
>
> As mentioned before. JDBC is not the only postgres driver to do this the
> ODBC driver does this as well. This is a requested feature by users. We
> didn't just decide to do it on our own.
>
> One thing to keep in mind is that both JDBC and ODBC are not exclusively
> PostgreSQL drivers and as such we sometimes have to jump through hoops to
> provide the semantics requested by the API.
>

I don't have anything in particular against automatic savepointing when
requested directly by users (especially if it's opt-in). My problem is more
with automatic savepointing as a solution to a problem created by doing
automatic prepared statements... Way too much automatic stuff going on
there for my taste...


Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Andrew Dunstan



On 08/15/2016 10:19 AM, Tom Lane wrote:

Andrew Dunstan  writes:

On 08/14/2016 04:38 PM, Tom Lane wrote:

I did a trial run following the current pgindent README procedure, and
noticed that the perltidy step left me with a pile of '.bak' files
littering the entire tree.  This seems like a pretty bad idea because
a naive "git add ." would have committed them.  It's evidently because
src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.

BTW, after experimenting with this, I did not find any way to get perltidy
to overwrite the original files without making a backup file.


We should probably specify -bext='/', which would cause the backup files
to be deleted unless an error occurred.

Really?  That seems a bit magic, and it's certainly undocumented.



We must be using different versions.





Alternatively, we could just remove the in-place parameter and write a
command that moved the new .tdy files over the original when perltidy
was finished.

I was thinking about just removing all the .bak files afterwards, ie
automating the existing manual process.  As long as we're making an
invocation script anyway, that's easy.




WFM.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Dave Cramer
>
>
> I'm not going to respond to the part about dealing with prepared
> statements errors, since I think we've already covered that and there's
> nothing new being said. I don't find automatic savepointing acceptable, and
> a significant change of the PostgreSQL protocol to support this doesn't
> seem reasonable (but you can try proposing).
>

As mentioned before. JDBC is not the only postgres driver to do this the
ODBC driver does this as well. This is a requested feature by users. We
didn't just decide to do it on our own.

One thing to keep in mind is that both JDBC and ODBC are not exclusively
PostgreSQL drivers and as such we sometimes have to jump through hoops to
provide the semantics requested by the API.



Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] pg_bsd_indent - improvements around offsetof and sizeof

2016-08-15 Thread Piotr Stefaniak
On 2016-05-25 21:13, Tom Lane wrote:
> Robert Haas  writes:
>> On Sun, May 22, 2016 at 4:16 PM, Piotr Stefaniak
>>  wrote:
>>> I think I've managed to improve pg_bsd_indent's handling of two types of
>>> cases.
>
>> Wow, that seems pretty great.  I haven't scrutinized your changes to
>> pg_bsd_indent, but effect_on_pg.diff looks like a large improvement.

> Assuming this patch withstands more careful review, we will need to think
> about project policy for how/when to apply such fixes.

The patches have got committed upstream and work well for Postgres. You 
can take FreeBSD indent(1) as of SVN r303746, apply patches from 
https://github.com/pstef/freebsd_indent/commits/pass2 (subject to heavy 
rebasing) and use as pg_bsd_indent for pgindent.

There are more fixes I intend to do, of which the most relevant for 
Postgres are:
1) fixing "function pointer typedef formatting"
2) adding a -tsn option like in GNU indent, for setting how many columns 
a tab character will produce. I had a preliminary patch implementing 
that and I have to say that while it removes the need for entab, it also 
introduces a lot of seemingly pointless changes in formatting which will 
be arguably improvements or regressions.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2016-08-15 Thread Anastasia Lubennikova

13.08.2016 02:15, Alvaro Herrera:

Many have expressed their interest in this topic, but I haven't seen any
design of how it should work.  Here's my attempt; I've been playing with
this for some time now and I think what I propose here is a good initial
plan.  This will allow us to write permanent table storage that works
differently than heapam.c.  At this stage, I haven't thought through
whether this is going to allow extensions to define new storage modules;
I am focusing on AMs that can coexist with heapam in core.

Hello,
thank you for starting this topic.

I am working on a very similar proposal in another thread.
https://commitfest.postgresql.org/10/700/

At first glance our drafts are very similar. I hope it means that we are
moving in the right direction. It's great that your proposal is focused on
interactions with the executor, while mine is mostly about the internals of the
new StorageAm interface and its interactions with buffer and storage management.

Please read and comment https://wiki.postgresql.org/wiki/HeapamRefactoring

I'll leave more concrete comments tomorrow.
Thank you for the explicit description of issues.



The design starts with a new row type in pg_am, of type "s" (for "storage").
The handler function returns a struct of node StorageAmRoutine.  This
contains functions for 1) scans (beginscan, getnext, endscan) 2) tuples
(tuple_insert/update/delete/lock, as well as set_oid, get_xmin and the
like), and operations on tuples that are part of slots (tuple_deform,
materialize).
To support this, we introduce StorageTuple and StorageScanDesc.
StorageTuples represent a physical tuple coming from some storage AM.
It is necessary to have a pointer to a StorageAmRoutine in order to
manipulate the tuple.  For heapam.c, a StorageTuple is just a HeapTuple.
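
(For illustration only: based on the description above, the routine table
might look roughly like the sketch below. The member names and signatures are
guesses, not a settled API.)

typedef void *StorageTuple;			/* opaque; heapam would use a HeapTuple */
typedef void *StorageScanDesc;		/* opaque per-AM scan state */

typedef struct StorageAmRoutine
{
	NodeTag		type;

	/* scans */
	StorageScanDesc (*scan_begin) (Relation rel, Snapshot snapshot,
								   int nkeys, ScanKey keys);
	StorageTuple (*scan_getnext) (StorageScanDesc scan, ScanDirection dir);
	void		(*scan_end) (StorageScanDesc scan);

	/* tuple operations (details such as locking modes omitted here) */
	void		(*tuple_insert) (Relation rel, StorageTuple tuple,
								 CommandId cid, int options,
								 ItemPointer otid);
	/* ... tuple_update, tuple_delete, tuple_lock, set_oid, get_xmin, ... */

	/* slot integration */
	void		(*slot_deform) (TupleTableSlot *slot, int natts);
	void		(*slot_materialize) (TupleTableSlot *slot);
} StorageAmRoutine;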

RelationData gains ->rd_stamroutine which is a pointer to the
StorageAmRoutine for the relation in question.  Similarly,
TupleTableSlot is augmented with a link to the StorageAmRoutine to
handle the StorageTuple it contains (probably in most cases it's set at
the same time as the tupdesc).  This implies that routines such as
ExecAssignScanType need to pass down the StorageAmRoutine from the
relation to the slot.

The executor is modified so that instead of calling heap_insert etc
directly, it uses rel->rd_stamroutine to call these methods.  The
executor is still in charge of dealing with indexes, constraints, and
any other thing that's not the tuple storage itself (this is one major
point in which this differs from FDWs).  This all looks simple enough,
with one exception and a few notes:

exception a) ExecMaterializeSlot needs special consideration.  This is
used in two different ways: a1) is the stated "make tuple independent
from any underlying storage" point, which is handled by
ExecMaterializeSlot itself and calling a method from the storage AM to
do any byte copying as needed.  ExecMaterializeSlot no longer returns a
HeapTuple, because there might not be any.  The second usage pattern a2)
is to create a HeapTuple that's passed to other modules which only deal
with HT and not slots (triggers are the main case I noticed, but I think
there are others such as the executor itself wanting tuples as Datum for
some reason).  For the moment I'm handling this by having a new
ExecHeapifyTuple which creates a HeapTuple from a slot, regardless of
the original tuple format.

note b) EvalPlanQual currently maintains an array of HeapTuple in
EState->es_epqTuple.  I think it works to replace that with an array of
StorageTuples; EvalPlanQualFetch needs to call the StorageAmRoutine
methods in order to interact with it.  Other than those changes, it
seems okay.

note c) nodeSubplan has curTuple as a HeapTuple.  It seems simple
to replace this with an independent slot-based tuple.

note d) grp_firstTuple in nodeAgg / nodeSetOp.  These are less
simple than the above, but replacing the HeapTuple with a slot-based
tuple seems doable too.

note e) nodeLockRows uses lr_curtuples to feed EvalPlanQual.
TupleTableSlot also seems a good replacement.  This has fallout in other
users of EvalPlanQual, too.

note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
format.  In the long run, it may be possible to replace them with a
separate storage module that's specifically designed to handle tuples
meant for tuplestores etc.  That may simplify TupleTableSlot and
execTuples.  For the moment we keep the tts_mintuple as it is.  Whenever
a tuple is not already in heap format, we heapify it in order to put in
the store.


The current heapam.c routines need some changes.  Currently, practice is
that heap_insert, heap_multi_insert, heap_fetch, heap_update scribble on
their input tuples to set the resulting ItemPointer in tuple->t_self.
This is messy if we want StorageTuples to be abstract.  I'm changing
this so that the resulting ItemPointer is returned in a separate output
argument; the tuple itself is left alone.  This is somewhat messy in the
case of 

Re: [HACKERS] Refactoring of heapam code.

2016-08-15 Thread Anastasia Lubennikova

08.08.2016 12:43, Anastasia Lubennikova:

08.08.2016 03:51, Michael Paquier:
On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera 
 wrote:

Anastasia Lubennikova wrote:
So there are a couple of patches. They do not cover all the mentioned
problems, but I'd like to get feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.
Well, I am interested in the topic. And just had a look at the 
patches proposed.


+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and moving some routines from heapam.c to hio.c. Those two
could be separate patches.

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review, I'll fix these problems in final version.

Posting the first message I intended to raise the discussion. Patches
were attached mainly to illustrate the problem and to be more concrete.

Thank you for attention and feedback.
Since there are no objections to the idea in general, I'll write an exhaustive
README about current state of the code and then propose API design
to discuss details.

Stay tuned.



Here is the promised design draft.
https://wiki.postgresql.org/wiki/HeapamRefactoring

Looking forward to your comments.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New version numbering practices

2016-08-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 8/1/16 11:49 AM, Tom Lane wrote:
>> Somebody needs to come up with a patch implementing this changeover.

> Here is such a patch.  It does not yet implement:

I've pushed this with minor additional twiddling.  We can work on the
"UI issues" you mentioned at leisure.  One that I noted is that psql's
version-mismatch-warning messages aren't two-part-version aware, and
will print "10.0" where they should say "10".  Probably that fix should
get back-patched, so that older psql branches will do the right thing
if possible.

I tried to fix the MSVC build to work, but I've not tested it and
am waiting to see what the buildfarm says.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Shay Rojansky
On Mon, Aug 15, 2016 at 3:16 PM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Vladimir>> Yes, that is what happens.
> Vladimir>> The idea is not to mess with gucs.
>
> Shay:> Wow... That is... insane...
>
> Someone might say that "programming languages that enable side-effects
> are insane".

Lots of connection pools work by sharing the connections and it is up
> to developer
> if he can behave well (see "It is not" below)
>

The insane part is that there's really almost no reason to allow these
side-effects... It's possible to efficiently reset connection state with a
truly negligible impact on


> Shay> it's entirely reasonable for
> Shay> more than one app to use use the same pool
>
> Sharing connections between different applications might be not that good
> idea.
>

Not sure why... It seems like a bad idea mostly if your pool is leaking
state.


> However, I would not agree that "having out-of-process connection
> pool" is the only sane
> way to go.
> I do admit there might be valid cases for out of process pooling,
> however I do not agree
> it is the only way to go. Not only "inprocess" is one of the ways,
> "in-process" way is wildly used
> in at least one of the top enterprise languages.
>
> If you agree with that, you might agree that "in-process connection
> pool that serves
> exactly one application" might work in a reasonable fashion even
> without DISCARD ALL.
>

I never said out-of-process pooling is the way to go - in the .NET world
in-process pooling is very valid. But as I carefully said in my last email,
the same problems you have with multiple applications can occur with
multiple components within the same application. The process boundary isn't
very important here - in some scenarios programmers choose process
separation, in others they choose threads. The same basic problems that
occur in one model can occur in the other, including usage strategies which
make LRU caching very bad.


> Shay> Even with in-process pools it's standard practice (and a good idea)
> to
> Shay> reset state.
>
> It is not. Can you quote where did you get that "standard practice is
> to reset state" from?
>

I guess I take that back, I haven't actually made a thorough comparison
here.

Shay> If some part of a big complex
> Shay> application modified state, and then some other part happens to get
> that
> Shay> physical connection, it's extremely hard to make sense of things.
>
> Let's not go into "functional programming" vs "imperative programming"
> discussion?
> Of course you might argue that "programming in Haskell or OCaml or F#"
> makes
> "extremely easy to make sense of things", but that's completely
> another discussion.
>

I'm not sure what this is referring to... If you're talking about my
comment that "isolation/separation of layers is a good thing in
programming", then I don't think it's the function vs. imperative kind of
argument.

Shay> One note - in Npgsql's implementation of persistent prepared
> statements,
> Shay> instead of sending DISCARD ALL Npgsql sends the commands listed in
> Shay> https://www.postgresql.org/docs/current/static/sql-discard.html,
> except for
> Shay> DEALLOCATE ALL. This cleans up state changes but leaves prepared
> statements
> Shay> in place.
>
> Ok. At least you consider that "always discarding all the state" might be
> bad.
>

Yes I do. I actually implemented persistent prepared statements before this
conversation started - I think it's a great performance booster. I'm still
not sure if it should be opt-in or default, although I admit I'm leaning
towards default. But that feature has very little to do with *implicit*
preparation.

Shay> This is somewhat similar to the CPU reordering you
> Shay> keep coming back to - it's totally invisible
>
> I would disagree. CPU reordering is easily visible if you are dealing
> with multithreaded case.
> It can easily result in application bugs if application misses some
> synchronization.
>
> CPU reordering is very visible to regular programmers, and it is a
> compromise:
> 1) Developers enable compiler and CPU do certain "reorderings"
> 2) Developers agree to follow the rules like "missing synchronization
> might screw things up"
> 3) In the result, the code gets executed faster.
>

The point is that AFAIK the same bugs that can result from reordering can
also result from other basic conditions as well. If you're writing
multithreaded code then you must handle synchronization - this is not a
reordering-specific problem. Therefore if your program is multithreaded but
doesn't do proper synchronization you have a bug - regardless of whether
its manifestation is triggered by CPU reordering or not. I admit I'm not an
expert on this and may be wrong (it would be interesting to know).


> Vladimir> Just in case: PostgreSQL does not execute "discard all" on its
> own.
>
> Shay> Of course it doesn't - it doesn't know anything about connection
> pooling,
> Shay> it only knows about physical connections. When would it execute

Re: [HACKERS] condition variables

2016-08-15 Thread Robert Haas
On Mon, Aug 15, 2016 at 1:58 AM, Thomas Munro
 wrote:
> On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro
>  wrote:
>> On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas  wrote:
>>> [condition-variable-v1.patch]
>>
>> Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?
>
> I poked at this a bit... OK, a lot... and have some feedback:
>
> 1.  As above, we need to clear cvSleeping before setting the latch.

Right, OK.

> 2.  The following schedule corrupts the waitlist by trying to delete
> something from it that isn't in it:
>
> P1: ConditionVariablePrepareToSleep: push self onto waitlist
> P2:   ConditionVariableSignal: pop P1 from waitlist
> P1: 
> P3: ConditionVariablePrepareToSleep: push self onto waitlist
> P1: ConditionVariableCancelSleep: delete self from waitlist (!)
>
> Without P3 coming along you probably wouldn't notice because the
> waitlist will be happily cleared and look valid, but P3's entry gets
> lost and then it hangs forever.
>
> One solution is to teach ConditionVariableCancelSleep to check if
> we're still actually there first.  That can be done in O(1) time by
> clearing nodes' next and prev pointers when deleting, so that we can
> rely on that in a new proclist_contains() function.  See attached.

How about instead using cvSleeping to test this?  Suppose we make a
rule that cvSleeping can be changed from false to true only by the
process whose PGPROC it is, and thus no lock is needed, but changing
it from true to false always requires the spinlock.
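
To make that concrete, a rough sketch of the signalling side under such a rule
could look like this (the proclist helpers and the cv/PGPROC field names are
taken from the patch under discussion, so treat them as assumptions rather
than committed API):

void
ConditionVariableSignal(ConditionVariable *cv)
{
	PGPROC	   *proc = NULL;

	/* Pop one waiter and clear cvSleeping while still holding the spinlock. */
	SpinLockAcquire(&cv->mutex);
	if (!proclist_is_empty(&cv->wakeup))
	{
		proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
		proc->cvSleeping = false;	/* true -> false only under the spinlock */
	}
	SpinLockRelease(&cv->mutex);

	/* Wake the process after releasing the spinlock. */
	if (proc != NULL)
		SetLatch(&proc->procLatch);
}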

> 3.  The following schedule corrupts the waitlist by trying to insert
> something into it that is already in it:
>
> P1: ConditionVariablePrepareToSleep: push self onto waitlist
> P1: 
> P1: ConditionVariableSleep
> P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)
>
> Nodes before and after self's pre-existing position can be forgotten
> when self's node is pushed to the front of the list.  That can be
> fixed by making ConditionVariablePrepareToSleep also check if we're
> already in the list.

OK.

> 4.  The waitlist is handled LIFO-ly.  Would it be better for the first
> guy to start waiting to be woken up first, like we do for locks?  The
> Pthreads equivalent says that it depends on "scheduling policy".  I
> don't know if it matters much, just an observation.

I don't know whether this matters.  It's possible that FIFO is a
better policy; I don't really care.

> 5.  The new proclist function you added is the first to work in terms
> of PGPROC* rather than procno.  Maybe the whole interface should work
> with either PGPROC pointers or procnos?  No strong view.

Hmm, maybe so.  But wouldn't any caller translate to a PGPROC * straight off?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2016-08-15 Thread David Steele
On 8/15/16 2:33 AM, Venkata B Nagothi wrote:

> During the recovery process, It would be nice if PostgreSQL generates an
> error by aborting the recovery process (instead of starting-up the
> cluster) if the intended recovery target point is not reached and give
> an option to DBA to resume the recovery process from where it exactly
> stopped.

Thom wrote a patch [1] recently that gives warnings in this case.  You
might want to have a look at that first.

> The issue here is, if by any chance, the required WALs are not available
> or if there is any WAL missing or corrupted at the restore_command
> location, then PostgreSQL recovers until the end of the last available
> WAL and starts-up the cluster. 

You can use pause_at_recovery_target/recovery_target_action (depending
on your version) to prevent promotion.  That would work for your stated
scenario but not for the scenario where replay starts (or the database
reaches consistency) after the recovery target.

[1]
https://www.postgresql.org/message-id/flat/CAA-aLv4K2-9a%2BcvK75dkZkYD1etxpaH%2B9HC0vm9Ebw2up9Co2A%40mail.gmail.com#caa-alv4k2-9a+cvk75dkzkyd1etxpah+9hc0vm9ebw2up9c...@mail.gmail.com

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Barriers

2016-08-15 Thread Konstantin Knizhnik


On 15.08.2016 15:42, Thomas Munro wrote:
This implementation is using a spinlock for the arrival counter, and 
signals (via Robert's condition variables and latches) for waking up 
peer processes when the counter reaches the target. I realise that 
using signals for this sort of thing is a bit unusual outside the 
Postgres universe, but won't a semaphore-based implementation require 
just as many system calls, context switches and scheduling operations? 


Yes, you are right.
I never expected that this combination of signal + local socket + select could
provide performance comparable to pthread_cond_t.
I have implemented a simple test in which two background workers emulate a
request-response round-trip using latches and pthread primitives.
The result (average round-trip time) was 7.49 microseconds for Postgres
latches vs. 4.59 microseconds for pthread_cond_timedwait.





#define N_ROUNDTRIPS 100
#define WAIT_LATCH_TIMEOUT 6

static void PongLatch(Datum arg)
{
int i;
timestamp_t start;
int result;

BackgroundWorkerUnblockSignals();

Mtm->pong = MyProc->pgprocno;
ResetLatch(&MyProc->procLatch);
MtmSleep(100);
Assert(Mtm->ping);


for (i = 0; i <= N_ROUNDTRIPS; i++) {
result = WaitLatch(&MyProc->procLatch, WL_LATCH_SET|WL_TIMEOUT, WAIT_LATCH_TIMEOUT);

Assert(result & WL_LATCH_SET);
ResetLatch(&MyProc->procLatch);
SetLatch(&Mtm->allProcs[Mtm->ping].procLatch);
if (i == 0) {
   start = MtmGetSystemTime();
}
}
fprintf(stderr, "Average roundrip time: %f microsconds\n", 
(double)(MtmGetSystemTime() - start) /  N_ROUNDTRIPS);

}

static void PingLatch(Datum arg)
{
int i;
timestamp_t start;
int result;

BackgroundWorkerUnblockSignals();

Mtm->ping = MyProc->pgprocno;
ResetLatch(&MyProc->procLatch);
MtmSleep(100);
Assert(Mtm->pong);

for (i = 0; i <= N_ROUNDTRIPS; i++) {
SetLatch(&Mtm->allProcs[Mtm->pong].procLatch);
result = WaitLatch(&MyProc->procLatch, WL_LATCH_SET|WL_TIMEOUT, WAIT_LATCH_TIMEOUT);

Assert(result & WL_LATCH_SET);
ResetLatch(&MyProc->procLatch);
if (i == 0) {
   start = MtmGetSystemTime();
}
}
fprintf(stderr, "Average roundrip time: %f microseconds\n", 
(double)(MtmGetSystemTime() - start) /  N_ROUNDTRIPS);

}


static BackgroundWorker Pinger = {
"ping",
BGWORKER_SHMEM_ACCESS,// | BGWORKER_BACKEND_DATABASE_CONNECTION,
BgWorkerStart_ConsistentState,
BGW_NEVER_RESTART,
PingLatch
};

static BackgroundWorker Ponger = {
"pong",
BGWORKER_SHMEM_ACCESS,// | BGWORKER_BACKEND_DATABASE_CONNECTION,
BgWorkerStart_ConsistentState,
BGW_NEVER_RESTART,
PongLatch
};

static void PingPong()
{
RegisterBackgroundWorker(&Pinger);
RegisterBackgroundWorker(&Ponger);
}




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Undiagnosed bug in Bloom index

2016-08-15 Thread Jeff Janes
On Sat, Aug 13, 2016 at 6:22 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> I am getting corrupted Bloom indexes in which a tuple in the table
>> heap is not in the index.
>



> Will push a fix in a bit.

After 36 hours of successful running on two different machines (one
with crash injection turned on, one without), I am pretty confident
that your fix is working.

Thanks,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-08-15 Thread Robert Haas
On Mon, Aug 15, 2016 at 10:56 AM, amul sul  wrote:
> On Thursday, 11 August 2016 3:18 PM, Artur Zakirov
>  wrote:
>
>>Here is my patch. It is a proof of concept.
>>Date/Time Formatting
>>
>>There are changes in date/time formatting rules:
> -> now to_timestamp() and to_date() skip spaces in the input string and
>>in the formatting string unless FX option is used, as Amul Sul wrote on
>>first message of this thread. But Ex.2 gives an error now with this
>>patch (should we fix this too?).
>
> Why not, currently we are skipping whitespace exists at the start of input
> string but not if in format string.
>
> [Skipped… ]
>
>>Of course this patch can be completely wrong. But it tries to introduce
>>more formal rules for formatting.
>>I will be grateful for notes and remarks.
>
> Following are few scenarios where we break existing behaviour:
>
> SELECT TO_TIMESTAMP('2015-12-31 13:43:36', ' MM DD HH24 MI SS');
> SELECT TO_TIMESTAMP('2011$03!18 23_38_15', '-MM-DD HH24:MI:SS');
> SELECT TO_TIMESTAMP('2011*03*18 23^38&15', '-MM-DD HH24:MI:SS');
> SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', '-MM-DD$$$HH24:MI:SS');
>
> But current patch behaviour is not that much bad either at least we have
> errors, but I am not sure about community acceptance.
>
> I would like to divert communities' attention on following case:
> SELECT TO_TIMESTAMP('2013--10-01', '-MM-DD');
>
> Where the hyphen (-) is not skipped. So ultimately -10 is interpreted using
> MM as negative 10. So the date goes back by that many months (and probably
> additional days because of -31), and so the final output becomes 2012-01-30.
> But the fix is not specific to hyphen case. Ideally the fix would have been
> to handle it in from_char_parse_int(). Here, -10 is converted to int using
> strtol. May be we could have done it using strtoul(). Is there any intention
> behind not considering signed integers versus unsigned ones ?
>
> Another is, shouldn’t we have error in following cases?
> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', '-MM-DD HH24:MI:SS');
> SELECT TO_TIMESTAMP('2016-02-30 15:43:36', '-MM-DD HH24:MI:SS');

Well, what's the Oracle behavior in any of these cases?  I don't think
we can decide to change any of this behavior without knowing that.  If
a proposed behavior change is incompatible with our previous releases,
I think it'd better at least be more compatible with Oracle.
Otherwise, we're just changing from an established behavior that we
invented ourselves to a new behavior we invented ourselves, which is
only worthwhile if it's absolutely clear that the new behavior is way
better.

(Also, note that text formatted email is generally preferred to HTML
on this mailing list; the fact that your email is in a different font
than the rest of the thread makes it hard to read.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2016-08-15 Thread Robert Haas
On Fri, Aug 12, 2016 at 7:15 PM, Alvaro Herrera
 wrote:
> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.  Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.  This will allow us to write permanent table storage that works
> differently than heapam.c.  At this stage, I haven't throught through
> whether this is going to allow extensions to define new storage modules;
> I am focusing on AMs that can coexist with heapam in core.

Thanks for taking a stab at this.  I'd like to throw out a few concerns.

One, I'm worried that adding an additional layer of pointer-jumping is
going to slow things down and make Andres' work to speed up the
executor more difficult.  I don't know that there is a problem there,
and if there is a problem I don't know what to do about it, but I
think it's something we need to consider.  I am somewhat inclined to
believe that we need to restructure the executor in a bigger way so
that it passes around datums instead of tuples; I'm inclined to
believe that the current tuple-centric model is probably not optimal
even for the existing storage format.  It seems even less likely to be
right for a data format in which fetching columns is more expensive
than currently, such as a columnar store.

Two, I think that we really need to think very hard about how the
query planner will interact with new heap storage formats.  For
example, suppose cstore_fdw were rewritten as a new heap storage
format.   Because ORC contains internal indexing structures with
characteristics somewhat similar to BRIN, many scans can be executed
much more efficiently than for our current heap storage format.  If it
can be seen that an entire chunk will fail to match the quals, we can
skip the whole chunk.  Some operations may permit additional
optimizations: for example, given SELECT count(*) FROM thing WHERE
quals, we may be able to push the COUNT(*) down into the heap access
layer.  If it can be verified that EVERY tuple in a chunk will match
the quals, we can just increment the count by that number without
visiting each tuple individually.  This could be really fast.  These
kinds of query planner issues are generally why I have favored trying
to do something like this through the FDW interface, which already has
the right APIs for this kind of thing, even if we're not using them
all yet.  I don't say that's the only way to crack this problem, but I
think we're going to find that a heap storage API that doesn't include
adequate query planner integration is not a very exciting thing.

Three, with respect to this limitation:

> iii) All tuples need to be identifiable by ItemPointers.  Storages that
> have different requirements will need careful additional thought across
> the board.

I think it's a good idea for a first patch in this area to ignore (or
mostly ignore) this problem - e.g. maybe allow such storage formats
but refuse to create indexes on them.  But eventually I think we're
going to want/need to do something about it.  There are an awful lot
of interesting ideas that we can't pursue without addressing this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Vladimir Sitnikov
Vladimir>> Yes, that is what happens.
Vladimir>> The idea is not to mess with gucs.

Shay:> Wow... That is... insane...

Someone might say that "programming languages that enable side-effects
are insane".
Lots of connection pools work by sharing the connections and it is up
to developer
if he can behave well (see "It is not" below)

Shay> it's entirely reasonable for
Shay> more than one app to use use the same pool

Sharing connections between different applications might not be that good an idea.
However, I would not agree that "having out-of-process connection
pool" is the only sane
way to go.
I do admit there might be valid cases for out of process pooling,
however I do not agree
it is the only way to go. Not only "inprocess" is one of the ways,
"in-process" way is wildly used
in at least one of the top enterprise languages.

If you agree with that, you might agree that "in-process connection
pool that serves
exactly one application" might work in a reasonable fashion even
without DISCARD ALL.

Shay> Even with in-process pools it's standard practice (and a good idea) to
Shay> reset state.

It is not. Can you quote where did you get that "standard practice is
to reset state" from?

Oracle Weblogic application server does not reset connections.
JBoss WildFly application server does not reset connections.
HikariCP connection pool does not reset connections.

I can easily continue the list.
The above are heavily used java servers (Hikari is a pool to be exact).

Shay> If some part of a big complex
Shay> application modified state, and then some other part happens to get that
Shay> physical connection, it's extremely hard to make sense of things.

Let's not go into "functional programming" vs "imperative programming"
discussion?
Of course you might argue that "programming in Haskell or OCaml or F#" makes
"extremely easy to make sense of things", but that's completely
another discussion.

Shay> One note - in Npgsql's implementation of persistent prepared statements,
Shay> instead of sending DISCARD ALL Npgsql sends the commands listed in
Shay> https://www.postgresql.org/docs/current/static/sql-discard.html,
except for
Shay> DEALLOCATE ALL. This cleans up state changes but leaves prepared
statements
Shay> in place.

Ok. At least you consider that "always discarding all the state" might be bad.


Shay> This is somewhat similar to the CPU reordering you
Shay> keep coming back to - it's totally invisible

I would disagree. CPU reordering is easily visible if you are dealing
with multithreaded case.
It can easily result in application bugs if application misses some
synchronization.

CPU reordering is very visible to regular programmers, and it is a compromise:
1) Developers enable compiler and CPU do certain "reorderings"
2) Developers agree to follow the rules like "missing synchronization
might screw things up"
3) In the result, the code gets executed faster.
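
A minimal C illustration of the kind of bug that rule 2) is about: without a
barrier or atomics, the reader below may see ready == 1 and still read a stale
value of data (purely illustrative, unrelated to any driver code):

#include <stdio.h>
#include <pthread.h>

static int data = 0;
static int ready = 0;

static void *writer(void *arg)
{
	data = 42;
	ready = 1;				/* may become visible before the store to data */
	return NULL;
}

static void *reader(void *arg)
{
	while (!ready)			/* also needs a compiler barrier / atomic load */
		;
	printf("data = %d\n", data);	/* may print 0 without synchronization */
	return NULL;
}

int main(void)
{
	pthread_t	t1, t2;

	pthread_create(&t1, NULL, reader, NULL);
	pthread_create(&t2, NULL, writer, NULL);
	pthread_join(t2, NULL);
	pthread_join(t1, NULL);
	return 0;
}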


Vladimir> Just in case: PostgreSQL does not execute "discard all" on its own.

Shay> Of course it doesn't - it doesn't know anything about connection pooling,
Shay> it only knows about physical connections. When would it execute "discard
Shay> all" on its own?

That my point was for "pgpool aiming to look like a regular postgresql
connection".
The point was: "postgresql does not discard on its own, so pgpool
should not discard".


Shay> To enforce isolation, which is maybe the most important way for
programs to
Shay> be reliable - but this is a general theme which you don't seem to agree
Shay> with.

If you want to isolate something, you might better have a
per-application connection pool.
That way, if a particular application consumes all the connections, it
would not impact
other applications. If all the applications use the same
out-of-process pool, there might
be trouble of resource hogging.

Shay> Regardless, resetting state doesn't have to have a necessary effect
Shay> on response times/throughput.

Even if you do not reset prepared statements, "reset query" takes time.
For instance: there's a common problem to "validate connections before use".
That is the pooler should ensure the connection is working before handling it
to the application.
Both Weblogic server, and HikariCP have those connection validation built in
and the validation is enabled by default.

However, it turns out that "connection validation" takes too much time,
it is visible in the application response times, etc, so they both implement a
grace period. That is "if the connection was recently used, it is
assumed to be fine".
Weblogic trusts 15 seconds by default, so if you borrow connections
each 10 seconds, then
they will not be tested.
Well, there's additional background validation, but my main point is
"even select 1"
is visible on the application response times.



Shay> This is something new - maybe it's part of the misunderstanding here. To
Shay> me, the term "prepared statements" always means "server-prepared
Shay> statements"; this seems to be supported by the documentation 

Re: [HACKERS] Bug in to_timestamp().

2016-08-15 Thread amul sul
On Thursday, 11 August 2016 3:18 PM, Artur Zakirov  
wrote:
>Here is my patch. It is a proof of concept.>Date/Time 
>Formatting>>There are changes in date/time formatting 
>rules:-> now to_timestamp() and to_date() skip spaces in the input string and 
>>in the formatting string unless FX option is used, as Amul Sul wrote on 
>>first message of this thread. But Ex.2 gives an error now with this >patch 
>(should we fix this too?).
Why not? Currently we skip whitespace at the start of the input string, but
not in the format string.
[Skipped… ]
> Of course this patch can be completely wrong. But it tries to introduce more
> formal rules for formatting.
> I will be grateful for notes and remarks.
Following are a few scenarios where we break existing behaviour:

SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');
SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');
But the current patch behaviour is not that bad either; at least we get
errors. I am not sure about community acceptance, though.
I would like to draw the community's attention to the following case:

SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD');

Here the hyphen (-) is not skipped, so ultimately -10 is interpreted by MM as
negative 10. The date therefore goes back by that many months (and probably
additional days because of -31), and the final output becomes 2012-01-30.
But the fix is not specific to the hyphen case. Ideally the fix would have
been to handle it in from_char_parse_int(): there, -10 is converted to int
using strtol(). Maybe we could have done it using strtoul() instead. Is there
any intention behind not distinguishing signed integers from unsigned ones?
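
A standalone C snippet makes the strtol() point concrete; this is not the
formatting.c code, just an illustration of why the sign gets swallowed:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    const char *src = "-10-01";   /* what MM sees after the unmatched '-' */
    char       *end;
    long        val;

    /* strtol() consumes the leading sign, so the field value becomes -10. */
    val = strtol(src, &end, 10);
    printf("strtol: value = %ld, rest = \"%s\"\n", val, end);

    /* A digits-only parser (or strtoul() plus a sign check) would complain. */
    if (*src == '-' || *src == '+')
        printf("digits-only check: unexpected sign in \"%s\"\n", src);
    return 0;
}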
Another question: shouldn't we get an error in the following cases?

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

Thanks & Regards,
Amul Sul


Re: [HACKERS] WIP: Covering + unique indexes.

2016-08-15 Thread Anastasia Lubennikova

14.08.2016 20:11, Andrey Borodin:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi hackers!

I've read the patch and here is my code review.

==PURPOSE==
I've used this feature from time to time with MS SQL. From my experience,
INCLUDE is a 'sugar on top' feature.
Some MS SQL classes do not even mention INCLUDE despite it having been there
since 2005 (though classes fail to mention lots of important things, so that
is not a very valuable indicator).
But those who use it, use it whenever possible. For example, the system view
with recommended indexes rarely lists one without INCLUDE columns.
So, this feature is very important from the perspective of converting MS SQL
DBAs to PostgreSQL. This is how I see it.


Thank you for the review, I hope this feature will be useful for many 
people.



==SUGGESTIONS==
0. Index build is broken. This script
https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql
SEGFAULTs and may cause a situation where you cannot insert anything into the
table (I think dropping the index would help, but I didn't test this).


Thank you for reporting. That was a bug caused by high key truncation, which
occurs when the index has more than 3 levels.
Fixed. See attached file.


1. I think the MS SQL syntax INCLUDE instead of INCLUDING would be better
(for the purpose listed above).


I've chosen this particular name to avoid adding a new keyword. We already
have INCLUDING in Postgres in the context of inheritance, which will never
intersect with covering indexes.

I'm sure it won't be a big problem for migration from MS SQL.


2. Empty line added in ruleutils.c. Is it for a reason?


No, just a missed line.
Fixed.


3. Now we have indnatts and indnkeyatts instead of just indnatts. I think it
is worth considering renaming indnatts to something different from the old
name. Someone somewhere could still suppose it's the number of keys.


I agree that the naming became vague after this patch.
I've already suggested replacing "indkeys[]" with a more specific name, and
AFAIR there was no reaction, so I didn't do that.
But I'm not sure about your suggestion regarding indnatts. Old queries (and
old indexes) can still use it correctly. I don't see a reason to break
compatibility for all users. Those who will use this new feature should
ensure that their queries against pg_index behave as expected.


==PERFORMANCE==
Due to suggestion number 0 I could not measure the performance of index
build. The index crashes when there are more than 1.1 million rows in a
table.
The performance test script is here:
https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
The test scenario is the following:
1. Create the table, then create the index, then add data.
2. Make a query touching data in the INCLUDING columns.
This scenario is tested against a table with:
A. An index that does not contain the touched columns, just the PK.
B. An index with all columns in the index.
C. An index with the PK in the keys and all other columns INCLUDING.

Tests were executed 5 times on an Ubuntu VM under Hyper-V, i5 2500 CPU, 16 GB
of RAM, SSD disk.
Time to insert 10M rows:
A. AVG 110 seconds STD 4.8
B. AVG 121 seconds STD 2.0
C. AVG 111 seconds STD 5.7
Inserts into the INCLUDING index are almost as fast as inserts into an index
without extra columns.

Time to run the SELECT query:
A. AVG 2864 ms STD 794
B. AVG 2329 ms STD 84
C. AVG 2293 ms STD 58
Selects with INCLUDING columns are almost as fast as with the full index.

Index size (deterministic measure, STD = 0):
A. 317 MB
B. 509 MB
C. 399 MB
The index size is in the middle between the full index and the minimal index.

I think these numbers agree with the expectations for the feature.

==CONCLUSION==
This patch brings a useful and important feature. The build needs to be
repaired; my other points are only suggestions.



Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, 

Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/14/2016 04:38 PM, Tom Lane wrote:
>> I did a trial run following the current pgindent README procedure, and
>> noticed that the perltidy step left me with a pile of '.bak' files
>> littering the entire tree.  This seems like a pretty bad idea because
>> a naive "git add ." would have committed them.  It's evidently because
>> src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.

BTW, after experimenting with this, I did not find any way to get perltidy
to overwrite the original files without making a backup file.

> We should probably specify -bext='/', which would cause the backup files 
> to be deleted unless an error occurred.

Really?  That seems a bit magic, and it's certainly undocumented.

> Alternatively, we could just remove the in-place parameter and write a 
> command that moved the new .tdy files over the original when perltidy 
> was finished.

I was thinking about just removing all the .bak files afterwards, ie
automating the existing manual process.  As long as we're making an
invocation script anyway, that's easy.

regards, tom lane




Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Andrew Dunstan



On 08/14/2016 04:38 PM, Tom Lane wrote:

I did a trial run following the current pgindent README procedure, and
noticed that the perltidy step left me with a pile of '.bak' files
littering the entire tree.  This seems like a pretty bad idea because
a naive "git add ." would have committed them.  It's evidently because
src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.
Is there a good reason for that, and if so what is it?



We should probably specify -bext='/', which would cause the backup files 
to be deleted unless an error occurred.


Alternatively, we could just remove the in-place parameter and write a 
command that moved the new .tdy files over the original when perltidy 
was finished.





Also, is there a reason why the perltidy invocation command hasn't
been packaged into a shell script, rather than expecting the committer
to copy-and-paste a rather large string?



No idea. Sounds like a good thing to do.

cheers

andrew





Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Shay Rojansky
On Sat, Aug 13, 2016 at 11:20 PM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Tatsuo>Interesting. What would happen if a user changes some of GUC
> parameters? Subsequent session accidentally inherits the changed GUC
> parameter?
>
> Yes, that is what happens.
> The idea is not to mess with gucs.
>

Wow... That is... insane...


> Tatsuo>There's nothing wrong with DICARD ALL
> Tatsuo>"DISCARD ALL" is perfect for this goal.
>
> It looks like you mean: "let's reset the connection state just in case".
> I see where it might help: e.g. when providing database access to random
> people who might screw a connection in every possible way.
>

It's not about random people screwing up connections, it's about
maintaining isolation between different connections... When using an
out-of-process pool (e.g. pgbouncer/pgpool), it's entirely reasonable for
more than one app to use the same pool. It's entirely possible for one
app to change a GUC and then the other app goes boom. It's especially hard
to debug too, because it depends on who got the physical connection before
you.

Even with in-process pools it's standard practice (and a good idea) to
reset state. The idea is that pooling, much like a cache (see below!), is
supposed to speed up your application without having any other visible
effects. In other words, running your app with or without a pool should be
identical except for performance aspects. If some part of a big complex
application modified state, and then some other part happens to get that
physical connection, it's extremely hard to make sense of things.

One note - in Npgsql's implementation of persistent prepared statements,
instead of sending DISCARD ALL Npgsql sends the commands listed in
https://www.postgresql.org/docs/current/static/sql-discard.html, except for
DEALLOCATE ALL. This cleans up state changes but leaves prepared statements
in place.

Just in case: do you think one should reset CPU caches, OS filesystem
> caches, DNS caches, bounce the application, and bounce the OS in addition
> to "discard all"?
> Why reset only "connection properties" when we can reset everything to its
> pristine state?
>

Um, because all the caches you mention are, well, caches - by definition
they're not supposed to have any visible impact on any application, except
for making it faster. This is somewhat similar to the CPU reordering you
keep coming back to - it's totally invisible. GUCs are the exact opposite -
you use them to modify how PostgreSQL behaves. So it makes perfect sense to
reset them.

Just in case: PostgreSQL does not execute "discard all" on its own.
>

Of course it doesn't - it doesn't know anything about connection pooling,
it only knows about physical connections. When would it execute "discard
all" on its own?


> If you mean "pgpool is exactly like reconnect to postgresql but faster
> since connection is already established", then OK, that might work in some
> cases (e.g. when response times/throughput are not important), however why
> forcing "you must always start from scratch" execution model?
>

To enforce isolation, which is maybe the most important way for programs to
be reliable - but this is a general theme which you don't seem to agree
with. Regardless, resetting state doesn't have to have a necessary effect
on response times/throughput.


> Developers are not that dumb, and they can understand that "changing gucs
> at random is bad".
>

This has nothing to do with random, developers may have a legitimate reason
to modify a GUC. In fact, GUCs are there precisely so that developers can
change them...

Also, session state is not only about GUCs. DISCARD ALL also releases
locks, resets authorization, resets sequence state (currval/nextval), etc.
etc. All these things should not leak across connections.

> Please read again. PreparedStatement is the only way to execute statements
> in JDBC API. There's no API that allows user to specify "use
> server-prepared here".
> Well, there's non-prepared API in JDBC, however it misses "bind
> variables" support,
> so if bind variables required, developer would use PreparedStatement.

> Java's PreparedStatement does not have an option to distinguish which
> statements should be server-prepared and which should not.

This is something new - maybe it's part of the misunderstanding here. To
me, the term "prepared statements" always means "server-prepared
statements"; this seems to be supported by the documentation you quote: "If
the driver supports precompilation, the method prepareStatement will send
the statement to the database for precompilation". I don't understand the
concept of a prepared statement which isn't server-prepared - do you have
some sort of driver-only prepared statements, which

Regardless of any optimizations you may be doing, in every database driver
I've ever seen in my life, "prepared" simply means "server-prepared". And
in every driver I've ever seen, there's an explicit API for that. Therefore
server-prepare 

Re: [HACKERS] WIP: Barriers

2016-08-15 Thread Robert Haas
On Sat, Aug 13, 2016 at 7:18 PM, Thomas Munro wrote:
> I would like to propose "barriers" for Postgres processes.  A barrier
> is a very simple mechanism for coordinating parallel computation, as
> found in many threading libraries.
>
> First, you initialise a Barrier object somewhere in shared memory,
> most likely in the DSM segment used by parallel query, by calling
> BarrierInit(, nworkers).  Then workers can call
> BarrierWait() when they want to block until all workers arrive
> at the barrier.  When the final worker arrives, BarrierWait returns in
> all workers, releasing them to continue their work.  One arbitrary
> worker receives a different return value as a way of "electing" it to
> perform serial phases of computation.  For parallel phases of
> computation, the return value can be ignored.  For example, there may
> be preparation, merging, or post-processing phases which must be done
> by just one worker, interspersed with phases where all workers do
> something.
>
> My use case for this is coordinating the phases of parallel hash
> joins, but I strongly suspect there are other cases.  Parallel sort
> springs to mind, which is why I wanted to post this separately and
> earlier than my larger patch series, to get feedback from people
> working on other parallel features.
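
For reference, here is a usage sketch of the API as described in the quoted
proposal; the exact signatures (a Barrier pointer argument, a bool "elected"
return value) and the phase helpers are assumptions made only for
illustration:

#include <stdbool.h>

typedef struct Barrier Barrier;                /* opaque for this sketch */
extern void BarrierInit(Barrier *barrier, int nworkers);
extern bool BarrierWait(Barrier *barrier);     /* assumed true in the one elected worker */

extern void build_hash_table_chunk(void);      /* hypothetical parallel phase */
extern void prepare_next_batch(void);          /* hypothetical serial phase */
extern void probe_hash_table(void);            /* hypothetical parallel phase */

static void
parallel_worker_body(Barrier *barrier)
{
    /* Parallel phase: every worker builds its part of the hash table. */
    build_hash_table_chunk();

    /* Everyone blocks here; exactly one worker is "elected". */
    if (BarrierWait(barrier))
        prepare_next_batch();                  /* serial phase, one worker only */

    /* Wait again so nobody starts probing before the serial phase is done. */
    BarrierWait(barrier);

    /* Parallel phase: all workers probe. */
    probe_hash_table();
}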

I was thinking about this over the weekend and I started to wonder
whether this is really going to work well for hash joins.  For
example, suppose that 6GB of work_mem is available and the projected
size of the hash table is 8GB.  Clearly, we're going to need 2
batches, but, if our estimates are accurate and the tuples split
evenly between batches, each batch will be only 4GB!  That means that
we can build up to 2GB of the hash table for the next batch before
we've finished with the hash table for the previous batch.  It seems
like a really good idea to try to take advantage of that as much as
possible.

The simple version of this is that when a worker gets done with its
own probe phase for batch X, it can immediately start building the
hash table for phase X+1, stopping if it fills up the unused portion
of work_mem before the old hash table goes away.  Of course, there are
some tricky issues with reading tapes that were originally created by
other backends, but if I understand correctly, Peter Geoghegan has
already done some work on that problem, and it seems like something we
can eventually solve, even if not in the first version.

The more complicated version of this is that we might want to delegate
one or more workers to start building as much of the next-batch hash
table as will fit instead of assisting with the current probe phase.
Once work_mem is full, they join the probe phase and continue until
it's done.  Again, tape management is an issue.  But you can see that
if you can make this work, in this example, you can reduce the
enforced pause between batches by about 50%; half the work is already
done by the time the old hash table goes away.  I bet that has a
chance of being fairly significant, especially for hash joins that
have tons of batches.  I once saw a 64-batch hash join outperform a
nested loop with inner index scan!

Anyway, my point here is that I'm not sure whether the barrier
mechanic is going to work well for computations with overlapping
phases, and I suspect that overlapping phases is going to be an
important technique, so we should make sure not to settle into a
synchronization model that makes it hard.

> A problem that I'm still grappling with is how to deal with workers
> that fail to launch.  What I'm proposing so far is based on static
> worker sets, where you can only give the number of workers at
> initialisation time, just like pthread_barrier_init.  Some other
> libraries allow for adjustable worker sets, and I'm wondering if a
> parallel leader might need to be able to adjust the barrier when it
> hears of a worker not starting.  More on that soon.

I think tying this into the number of workers for the parallel context
in general is going to get you into trouble.  For example, suppose
that we have an Append plan and beneath that we have two children of
each of which is a Hash Join.  Right now, Append is dumb, so it will
blindly throw all of the workers at the first Hash Join and then, as
they emerge, it will throw them at the second one.  However, we've had
previous discussions which I'm too lazy to look up right now about
making a Parallel Append that would split the workers between the two
hash joins; if one of them finished, then those workers could go join
the other hash join in medias res.

Now, in this world, it's clearly very bad if each hash join waits for
"all of the workers" to finish a given phase before beginning the next
phase.  In fact, it's probably going to result in both hash joins
hanging, followed by everybody waiting for each other forever.  The
actual condition that must be verified is that there are no workers
which are going to keep 

Re: [HACKERS] Logical Replication WIP

2016-08-15 Thread Stas Kelvich
> On 11 Aug 2016, at 17:43, Petr Jelinek  wrote:
> 
>> 
>> * Also I wasn’t able actually to run replication itself =) While regression 
>> tests passes, TAP
>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. 
>> I’ll investigate
>> that more and write again.
> 
> Interesting, please keep me posted. It's possible for tables to stay in 's' 
> state for some time if there is nothing happening on the server, but that 
> should not mean anything is stuck.

I played around a bit; it seems that the apply worker waits forever for a
substate change.

(lldb) bt
* thread #1: tid = 0x183e00, 0x7fff88c7f2a2 libsystem_kernel.dylib`poll + 
10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x7fff88c7f2a2 libsystem_kernel.dylib`poll + 10
frame #1: 0x0001017ca8a3 
postgres`WaitEventSetWaitBlock(set=0x7fd2dc816b30, cur_timeout=1, 
occurred_events=0x7fff5e7f67d8, nevents=1) + 51 at latch.c:1108
frame #2: 0x0001017ca438 
postgres`WaitEventSetWait(set=0x7fd2dc816b30, timeout=1, 
occurred_events=0x7fff5e7f67d8, nevents=1) + 248 at latch.c:941
frame #3: 0x0001017c9fde 
postgres`WaitLatchOrSocket(latch=0x00010ab208a4, wakeEvents=25, sock=-1, 
timeout=1) + 254 at latch.c:347
frame #4: 0x0001017c9eda postgres`WaitLatch(latch=0x00010ab208a4, 
wakeEvents=25, timeout=1) + 42 at latch.c:302
  * frame #5: 0x000101793352 
postgres`wait_for_sync_status_change(tstate=0x000101e409b0) + 178 at 
tablesync.c:228
frame #6: 0x000101792bbe 
postgres`process_syncing_tables_apply(slotname="subbi", 
end_lsn=140734778796592) + 430 at tablesync.c:436
frame #7: 0x0001017928c1 
postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 
at tablesync.c:518
frame #8: 0x00010177b620 
postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at 
apply.c:1122
frame #9: 0x00010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at 
apply.c:1353
frame #10: 0x00010174cb5a postgres`StartBackgroundWorker + 826 at 
bgworker.c:729
frame #11: 0x000101762227 
postgres`do_start_bgworker(rw=0x7fd2db70) + 343 at postmaster.c:5553
frame #12: 0x00010175d42b postgres`maybe_start_bgworker + 427 at 
postmaster.c:5761
frame #13: 0x00010175bccf 
postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
frame #14: 0x7fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
frame #15: 0x7fff88c7e07b libsystem_kernel.dylib`__select + 11
frame #16: 0x00010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
frame #17: 0x00010175b2e0 postgres`PostmasterMain(argc=3, 
argv=0x7fd2db403840) + 5968 at postmaster.c:1309
frame #18: 0x00010169507f postgres`main(argc=3, 
argv=0x7fd2db403840) + 751 at main.c:228
frame #19: 0x7fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = ‘c’

Also I have noticed that some LSN positions look wrong on the publisher:

postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
 restart_lsn | confirmed_flush_lsn 
-+-
 0/1530EF8   | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, 
replay_location from pg_stat_replication;
 sent_location | write_location | flush_location | replay_location 
---+++-
 0/1530F30 | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
(1 row)



-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Logical Replication WIP

2016-08-15 Thread Petr Jelinek

On 13/08/16 17:34, Steve Singer wrote:

On 08/05/2016 11:00 AM, Petr Jelinek wrote:

Hi,

as promised here is WIP version of logical replication patch.



Thanks for keeping on this.  This is important work


Feedback is welcome.



+
+  Publication
+  
+A Publication object can be defined on any master node, owned by one
+user. A Publication is a set of changes generated from a group of
+tables, and might also be described as a Change Set or Replication
Set.
+Each Publication exists in only one database.

'A publication object can be defined on *any master node*'.  I found
this confusing the first time I read it because I thought it was
circular (what makes a node a 'master' node? Having a publication object
published from it?).   On reflection I realized that you mean ' any
*physical replication master*'.  I think this might be better worded as
'A publication object can be defined on any node other than a standby
node'.  I think referring to 'master' in the context of logical
replication might confuse people.


Makes sense to me.



I am raising this in the context of the larger terminology that we want
to use and potential confusion with the terminology we use for physical
replication. I like the publication / subscription terminology you've
gone with.


 
+Publications are different from table schema and do not affect
+how the table is accessed. Each table can be added to multiple
+Publications if needed.  Publications may include both tables
+and materialized views. Objects must be added explicitly, except
+when a Publication is created for "ALL TABLES". There is no
+default name for a Publication which specifies all tables.
+  
+  
+The Publication is different from table schema, it does not affect
+how the table is accessed and each table can be added to multiple

Those 2 paragraphs seem to start the same way.  I get the feeling that
there is some point you're trying to express that I'm not catching onto.
Of course a publication is different than a tables schema, or different
than a function.


Ah, that's a relic of some editorialization; will fix. The reason we think
it's important to mention the difference between a publication and a schema
is that they are the only objects that contain tables, but they affect them
in very different ways, which might confuse users.




The definition of publication you have on the CREATE PUBLICATION page
seems better and should be repeated here (A publication is essentially a
group of tables intended for managing logical replication. See Section
30.1  for details about how
publications fit into logical replication setup. )


+  
+Conflicts happen when the replicated changes is breaking any
+specified constraints (with the exception of foreign keys which are
+not checked). Currently conflicts are not resolved automatically and
+cause replication to be stopped with an error until the conflict is
+manually resolved.

What options are there for manually resolving conflicts?  Is the only
option to change the data on the subscriber to avoid the conflict?
I assume there isn't a way to flag a particular row coming from the
publisher and say ignore it.  I don't think this is something we need to
support for the first version.


Yes, you have to update the data on the subscriber or skip the replication
of the whole transaction (for which the UI is not very friendly currently,
as you either have to consume the transaction using
pg_logical_slot_get_binary_changes or move the origin on the subscriber
using pg_replication_origin_advance).


It's relatively easy to add some automatic conflict resolution as well, 
but it didn't seem absolutely necessary so I didn't do it for the 
initial version.





+  Architecture
+  
+Logical replication starts by copying a snapshot of the data on
+the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above; do you intend to update that to
'Publisher', or does provider mean something different? If we like the
'publication' terminology, then I think 'publishers' should publish them,
not providers.



Okay, I am just used to 'provider' in general (I guess londiste habit), 
but 'publisher' is fine as well.




I'm trying to test a basic subscription and I do the following

I did the following:

cluster 1:
create database test1;
create table a(id serial8 primary key,b text);
create publication testpub1;
 alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster2
create database test1;
create table a(id serial8 primary key,b text);
create subscription testsub2 publication testpub1 connection
'host=localhost port=5440 dbname=test1';
NOTICE:  created replication slot "testsub2" on provider
NOTICE:  synchronized table states
CREATE SUBSCRIPTION

 [...]

The initial sync completed okay, then I did

insert into a(b) values ('2');

but the second insert never replicated.

I had the following 

Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Vladimir Sitnikov
Shay> What? I really didn't understand your point here. All the doc is saying is
Shay> that if the driver doesn't support prepared statements, then using them

Please read again. PreparedStatement is the only way to execute statements
in the JDBC API. There's no API that allows the user to specify "use
server-prepared here".
Well, there's a non-prepared API in JDBC, however it misses "bind variables"
support, so if bind variables are required, the developer would use
PreparedStatement.

Java's PreparedStatement does not have an option to distinguish which
statements should be server-prepared and which should not.

Vladimir>> My experience shows, that people are very bad at predicting where the
Vladimir>> performance problem would be.
Vladimir>> For 80% (or even 99%) of the cases, they just do not care
thinking if a
Vladimir>> particular statement should be server-prepared or not.
Vladimir>> They have absolutely no idea how much resources it would
take and so on.
Shay> Maybe you're even right saying these things, I don't know. But
that doesn't
Shay> mean I as a driver should solve their problems for them. And I also get
Shay> that you have an additional argument here besides programmer
Shay> laziness/stupidity - the ORM argument - which makes more sense.

Suppose the backend can handle at most 20 server-prepared statements (using
more would run it out of memory).
Suppose an application has 100 statements with ".prepare()" calls.
I think it is reasonable for the DB driver to figure out which statements are
most important and server-prepare just the "20 most important ones", leaving
the other 80 as regular non-prepared statements.

Do you think the DB driver should just follow the developer's advice and
server-prepare all 100 statements, causing a backend crash?
Do you think the application developer should keep a single list of all the
statements ever used in the application and make sure there are no more than
20 queries in it?
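
For illustration only, such a driver-side policy could be sketched in C like
this; the cap of 20, the prepare threshold of 5 and the linear lookup are
made up for the example and are not pgjdbc's actual implementation:

#include <stdio.h>
#include <string.h>

#define MAX_PREPARED      20    /* how many statements may be server-prepared */
#define PREPARE_THRESHOLD  5    /* executions required before we prepare one */

typedef struct StmtEntry
{
    char    sql[256];
    int     exec_count;
    int     server_prepared;
} StmtEntry;

static StmtEntry cache[1024];
static int       ncached;
static int       nprepared;

/* Returns 1 if this execution should go through a server-prepared statement. */
static int
should_use_server_prepare(const char *sql)
{
    int i;

    for (i = 0; i < ncached; i++)
        if (strcmp(cache[i].sql, sql) == 0)
            break;
    if (i == ncached)
    {
        if (ncached == 1024)
            return 0;                            /* cache full: run it unprepared */
        strncpy(cache[i].sql, sql, sizeof(cache[i].sql) - 1);
        ncached++;
    }
    cache[i].exec_count++;
    if (!cache[i].server_prepared &&
        cache[i].exec_count >= PREPARE_THRESHOLD &&
        nprepared < MAX_PREPARED)
    {
        cache[i].server_prepared = 1;
        nprepared++;
    }
    return cache[i].server_prepared;
}

int
main(void)
{
    int i;

    /* The 5th execution of the same SQL text flips it to server-prepared. */
    for (i = 0; i < 6; i++)
        printf("run %d: server-prepared = %d\n", i + 1,
               should_use_server_prepare("SELECT * FROM t WHERE id = $1"));
    return 0;
}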

My main point is not "developers are stupid", but "people often guess wrong
when it comes to performance". There are too many moving parts, so it is
hard to predict the performance implications.

Often it is much easier to execute a series of benchmarks that validate a
certain hypothesis. For instance, as Tatsuo says, the savepoint overhead for
DML is higher than the savepoint overhead for SELECT, so I plan to have that
benchmark as well.


Shay> First, there's nothing stopping an ORM from optimizing multiple inserts
Shay> into a single multivalue insert. I do admit I'm not aware of any who do
Shay> this, but it's a good idea for an optimization - I happen to maintain the
Shay> Entity Framework Core provider for Npgsql, I might take a look at this
Shay> optimization (so again thanks for the idea).

Nothing stops them, but M frameworks times N database drivers results in M*N
effort for each feature.
As you say: the application should just use the batch API, and it's the
driver's job to convert that into a sequence of bytes suitable for the
database.

Same for Npgsql: if you implement the rewrite at the Npgsql level, it would
automatically improve all the frameworks/applications running on top of
Npgsql.


Shay> I'm going to repeat what I said
Shay> before and it would be good to get some reaction to this. Every software
Shay> component in the stack has a role, and maintaining those separations is
Shay> what keeps things simple and sane

You might be missing my comments on CPU, x86, etc.
My reaction is: almost every existing component is extremely hard to reason
about.

For instance: a CPU has a certain number of registers, certain amounts of
L1/L2/... cache, and so on.
Do you mean each and every developer should explicitly specify which program
variable should use a register and which one should go into the L2 cache?

This is a counter-example to your "sane separation". The CPU is free to
reorder the instruction stream as long as the net result complies with the
specification. In the same way, the CPU is free to use the L1/L2 caches in
whatever way it thinks is best.
Note: a typical DB driver developer does not try to maintain a set of
"optimal assembly instructions".
The driver developer relies on the compiler and the CPU to optimize the
driver's code into the best machine code.

Of course a driver might have inline assembly, but that is not how
mainstream drivers are written.

Another example: the TCP stack. When a DB driver sends some data, the kernel
is free to reorder packets; it is free to interleave them, delay them, or
even use multiple network cards to send a single TCP stream.
Windows 10 includes several performance improvements to the TCP stack, and
it is nowhere near "the kernel does exactly what the application/driver
coded".
Once again: the application/driver developer does not optimize for specific
hardware (e.g. a network card).
Developers just use the common API, and it is the kernel's job to apply the
best optimizations for the particular HW.

The same goes for the ORM-DB combo. The ORM uses the DB driver's API, and
it's the driver's job to use the optimal command sequence for the specific 

Re: [HACKERS] WIP: Barriers

2016-08-15 Thread Thomas Munro
On Sun, Aug 14, 2016 at 6:54 PM, konstantin knizhnik wrote:
> Barriers are really very simple and convenient mechanism for process 
> synchronization.
> But it is actually a special case of semaphores: having semaphore primitive 
> it is trivial to implement a barrier.
> We have semaphores in Postgres, but ... them can not be used by extensions: 
> there is fixed number of semaphores allocated based on maximal number of 
> connections and there is no mechanism for requesting additional semaphores.

Probably because they are kernel objects requiring extra resource
management.  I'm hoping for something that can be created dynamically
in DSM segments with no cleanup, and that aspect of semaphores is
problematic.

> [...]  I wonder if somebody has measured  how much times latches 
> (signal+socket) are slower then posix semaphores or conditional variables?

I'm not sure if it makes sense for us to use POSIX conditional
variables: they require using POSIX mutexes, and we're pretty heavily
invested in the use of lwlocks that are hooked into our error handling
system, and spinlocks.

I'd be curious to know if you can make a better barrier with
semaphores.  I've attached a little test module which can be used to
measure the time for N processes to synchronise at M barrier wait
points.  You can run with SELECT barrier_test(, ,
), where implementation 0 uses the barrier patch I
posted and you can add another implementation as 1.  This patch
requires lwlocks-in-dsm-v3.patch, condition-variable-v2.patch,
barrier-v1.patch, in that order.

This implementation is using a spinlock for the arrival counter, and
signals (via Robert's condition variables and latches) for waking up
peer processes when the counter reaches the target.  I realise that
using signals for this sort of thing is a bit unusual outside the
Postgres universe,  but won't a semaphore-based implementation require
just as many system calls, context switches and scheduling operations?
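
For comparison, the same "arrival counter plus wake-everyone-up" shape can be
written with POSIX primitives; this toy version only illustrates the
structure under discussion and is not the patch's code (which uses a spinlock
plus the condition-variable patch, not pthreads):

#include <pthread.h>
#include <stdbool.h>

typedef struct ToyBarrier
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             participants;
    int             arrived;
    unsigned        generation;    /* guards against spurious/late wakeups */
} ToyBarrier;

void
toy_barrier_init(ToyBarrier *b, int participants)
{
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->participants = participants;
    b->arrived = 0;
    b->generation = 0;
}

/* Returns true in exactly one "elected" participant per generation. */
bool
toy_barrier_wait(ToyBarrier *b)
{
    bool elected = false;

    pthread_mutex_lock(&b->mutex);
    if (++b->arrived == b->participants)
    {
        /* Last to arrive: reset for the next generation and wake everyone. */
        b->arrived = 0;
        b->generation++;
        elected = true;
        pthread_cond_broadcast(&b->cond);
    }
    else
    {
        unsigned gen = b->generation;

        while (gen == b->generation)
            pthread_cond_wait(&b->cond, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
    return elected;
}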

-- 
Thomas Munro
http://www.enterprisedb.com


barrier-test.patch
Description: Binary data



Re: [HACKERS] Slowness of extended protocol

2016-08-15 Thread Shay Rojansky
Apologies, I accidentally replied off-list, here's the response I sent.
Vladimir, I suggest you reply to this message with your own response...

On Sat, Aug 13, 2016 at 6:32 PM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Shay>To be honest, the mere idea of having an SQL parser inside my driver
> makes me shiver.
>
> Same for me.
> However I cannot wait for PostgreSQL 18 that does not need client-side
> parsing.
>

I'm glad we agree on something. For me the problem has nothing to do with
PostgreSQL or performance - it has to do with database APIs that impose
uniform parameter placeholder formats, and therefore force drivers to
rewrite user SQL queries. AFAIK rewriting a user's SQL for optimization is
totally out of the scope of the driver's work.

Shay>There's nothing your driver is doing that the application developer
> can't do themselves -
> Shay>so your driver isn't faster than other drivers. It's faster only when
> used by lazy programmers.
>
> I'm afraid you do not get the point.
> ORMs like Hibernate, EclipseLink, etc send regular "insert ... values" via
> batch API.
> For the developer the only way to make use of "multivalues" is to
> implement either "ORM fix" or "the driver fix" or "postgresql fix".
>
> So the feature has very little to do with laziness of the programmers.
> Application developer just does not have full control of each SQL when
> working though ORM.
> Do you suggest "stop using ORMs"? Do you suggest fixing all the ORMs so it
> uses optimal for each DB insert statement?
> Do you suggest fixing postgresql?
>
> Once again "multivalues rewrite at pgjdbc level" enables the feature
> transparently for all the users. If PostgreSQL 10/11 would improve
> bind/exec performance, we could even drop that rewrite at pgjdbc level and
> revert to the regular flow. That would again be transparent to the
> application.
>
>
I do get the point, and in fact I myself mentioned the ORM case above as an
advantage of implicit query preparation.

First, there's nothing stopping an ORM from optimizing multiple inserts
into a single multivalue insert. I do admit I'm not aware of any who do
this, but it's a good idea for an optimization - I happen to maintain the
Entity Framework Core provider for Npgsql, I might take a look at this
optimization (so again thanks for the idea).

Second, it's well-known that using an ORM almost always implies a
performance sacrifice - it's a tradeoff that's chosen when going with an
ORM. It's great that you optimize multiple inserts, but there are a myriad
of other cases where an ORM generates less efficient SQL that what would be
possible - but I don't think it makes sense for the driver to actually
contain an SQL optimizer. Slightly worse performance isn't in itself a
reason to drop ORMs: it's frequent practice to drop down to raw SQL for
performance-critical operations, etc.

But all that isn't really important - I'm going to repeat what I said
before and it would be good to get some reaction to this. Every software
component in the stack has a role, and maintaining those separations is
what keeps things simple and sane. Just for fun, we could imagine a kernel
network-stack feature which analyzes outgoing messages and optimizes them;
we could even implement your multiple insert -> multivalue insert
optimization there. This would have the advantage of working out-of-the-box
for every driver and every language out there (just like your driver does
provides it transparently for all ORMs) But nobody in their right mind
would think of doing something like this, and for good reason.

The programmer's task is to write SQL, the driver's task is to communicate
that SQL via the database-specific protocol, the kernel's networking
stack's task is to transmit that protocol via TCP, etc. etc. If an ORM is
used, the programmer effectively outsources the task of writing SQL to
another component, which is supposed to do a good job about it. Once you go
about blurring all the lines here, everything becomes more complicated,
brittle and hard to debug.

For what it's worth, I can definitely imagine your kind of optimizations
occurring at some additional layer which the user would choose to use - an
intermediate SQL optimizer between application code (or ORM) and the
driver. This "SQL optimizer" layer would keep the driver itself lean and
simple (for users who *don't* want all the magic), while allowing for
transparent optimizations for ORMs. Or if the magic is implemented at the
driver level, it should be opt-in, or at least easy to disable entirely.

>

> Shay>are you sure there aren't "hidden" costs on the PostgreSQL side for
> generating so many implicit savepoints?
>
> Technically speaking I use the same savepoint name through bind/exec
> message.
>

Out of curiosity, I whipped up a quick benchmark (voltmeter) of the impact
of adding a savepoint before every command. Each iteration creates a
transaction, sends one Bind/Execute for "SELECT 1" (which was prepared
during setup), 

Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-15 Thread Robert Haas
On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila  wrote:
> AFAICS, your patch seems to be the right fix for this issue, unless we
> need the instrumentation information during execution (other than for
> explain) for some purpose.

Hmm, I disagree.  It should be the job of
ExecParallelRetrieveInstrumentation to allocate its data in the
correct context, not the responsibility of nodeGather.c to work around
the fact that it doesn't.  The worker instrumentation should be
allocated in the same context as the regular instrumentation
information, which I assume is probably the per-query context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev
> Why are you sending this off-list?  Please let's keep the discussion
> on the mailing list.  I suggest resending this there.

Sorry about that. I accidentally removed pgsql-hackers@ from the CC list, or
maybe my email client somehow did it for me. Shortly after that I realized
my mistake and sent a copy to the mailing list.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] to_date_valid()

2016-08-15 Thread Andreas 'ads' Scherbaum

On 15.08.2016 13:44, Artur Zakirov wrote:

On 15.08.2016 14:33, Andreas 'ads' Scherbaum wrote:

Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.


This does not solve the problem at hand, and lets wrong dates/formats
slip through:

./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make
--run-install --no-clean-at-all --patch
'https://www.postgresql.org/message-id/95738e12-6ed6-daf5-9dcf-6336072e6b15%40postgrespro.ru'




postgres=# SELECT to_date('2011 12  18', 'YYYY MM   DD');
  to_date
------------
 2011-12-08
(1 row)


That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.


I suppose that your sample query is another issue, not a date validation
task. I sent the patch to the thread
https://www.postgresql.org/message-id/b2a39359-3282-b402-f4a3-057aae500...
It fixes formatting issues.

I thought that it is better to separate these issues into:
- validation of the input date/timestamp string and the input format string
- result date/timestamp validation


I was a bit confused when I saw that you modified another function in your
patch, but I tried it out nevertheless. After all, you answered one of my
emails, and I let my tool download your patch directly from the website. The
other thread you are mentioning is a different topic.


Anyway, what I'm trying to solve is validating that to_date() produces valid
and correct output. It does not, as of today.



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: [HACKERS] Pluggable storage

2016-08-15 Thread Alexander Korotkov
On Sat, Aug 13, 2016 at 2:15 AM, Alvaro Herrera wrote:

> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.


That's it.  My design presented at PGCon was very sketchy, and I didn't
deliver any prototype yet.


> Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.


Great!  It's nice to see you're working in this direction!


> This will allow us to write permanent table storage that works
> differently than heapam.c.  At this stage, I haven't throught through
> whether this is going to allow extensions to define new storage modules;
> I am focusing on AMs that can coexist with heapam in core.
>

So, as I understand it, you're proposing extensibility to replace the heap
with something different but compatible with the heap (with respect to
executor nodes, index access methods and so on).
That's good, but what could the alternative storage access methods be?
AFAICS, we can't fit in here, for instance, another MVCC implementation
(undo log), in-memory storage, or columnar storage. However, it seems that
we would be able to make a compressed heap or an alternative HOT/WARM tuple
mechanism.
Correct me if I'm wrong.

ISTM you're proposing something quite orthogonal to my view of pluggable
storage engines.  My idea, in general, was to extend the FDW mechanism to
make it something manageable inside the database (support for VACUUM,
defining indexes and so on).  But the imperative was that it should have its
own executor nodes, and it doesn't have to be compatible with existing index
access methods.

Therefore, I think my design presented at PGCon and your current proposal
are orthogonal features which could coexist.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Robert Haas
Why are you sending this off-list?  Please let's keep the discussion
on the mailing list.  I suggest resending this there.

On Mon, Aug 15, 2016 at 5:01 AM, Aleksander Alekseev wrote:
>> > >>> I think the whole idea of a fast temporary table is that there
>> > >>> are no catalog entries.  If there are no catalog entries, then
>> > >>> dependencies are not visible.  If there ARE catalog entries, to
>> > >>> what do they refer? Without a pg_class entry for the table,
>> > >>> there's no table OID upon which to depend.
>> >
>> > >> TBH, I think that the chances of such a design getting committed
>> > >> are not distinguishable from zero.  Tables have to have OIDs;
>> > >> there is just too much code that assumes that.  And I seriously
>> > >> doubt that it will work (for any large value of "work") without
>> > >> catalog entries.
>> >
>> > > That seems a bit too defeatist.
>> >
>> > Huh?  I didn't say we shouldn't work on the problem --- I just
>> > think that this particular approach isn't good.  Which you seemed
>> > to agree with.
>>
>> I took your statement to mean that they need a pg_class entry - even
>> if there were a partial solution to the pg_depend problem allowing to
>> avoid pg_attribute entries, tha't still not really be a solution. If
>> that's not what you mean, sorry - and nice that we agree ;)
>>
>>
>
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```
>
> As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
> Also I mentioned that in this case even \d and \d+ would not work.
>
> I personally find this discussion very confusing. Maybe we should
> concentrate on a concrete patch instead of some abstract ideas and
> topics that are still open.
>
> For instance, it surprises me that apparently no one objects to the "let's
> make all temporary tables fast temporary tables" idea.
> Since in this case the code would use more memory for keeping a virtual
> catalog, wouldn't it be considered a major change of behavior that could
> break someone's production environment?
>
> --
> Best regards,
> Aleksander Alekseev



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] to_date_valid()

2016-08-15 Thread Artur Zakirov

On 15.08.2016 14:33, Andreas 'ads' Scherbaum wrote:

Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.


This does not solve the problem at hand, and lets wrong dates/formats
slip through:

./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make
--run-install --no-clean-at-all --patch
'https://www.postgresql.org/message-id/95738e12-6ed6-daf5-9dcf-6336072e6b15%40postgrespro.ru'



postgres=# SELECT to_date('2011 12  18', 'YYYY MM   DD');
  to_date
------------
 2011-12-08
(1 row)


That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.


I suppose that your sample query is another issue, not a date validation
task. I sent the patch to the thread
https://www.postgresql.org/message-id/b2a39359-3282-b402-f4a3-057aae500...
It fixes formatting issues.

I thought that it is better to separate these issues into:
- validation of the input date/timestamp string and the input format string
- result date/timestamp validation

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] to_date_valid()

2016-08-15 Thread Andreas 'ads' Scherbaum

On 15.08.2016 10:24, Artur Zakirov wrote:

On 14.08.2016 01:52, Andreas 'ads' Scherbaum wrote:


Attached is a patch to "do the right thing". The verification is in
"to_date()" now, the extra function is removed. Regression tests are
updated - two or three of them returned a wrong date before, and still
passed. They fail now. Documentation is also updated.


Regards,



Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.


This does not solve the problem at hand, and lets wrong dates/formats
slip through:


./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make 
--run-install --no-clean-at-all --patch 
'https://www.postgresql.org/message-id/95738e12-6ed6-daf5-9dcf-6336072e6b15%40postgrespro.ru'



postgres=# SELECT to_date('2011 12  18', 'YYYY MM   DD');
  to_date
------------
 2011-12-08
(1 row)


That is from the regression tests, and obviously handles the date 
transformation wrong. My attempt catches this, because I compare the 
date with the input date, and do not rely on a valid date only.
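
The "format the result back and compare" check can be illustrated outside
PostgreSQL with plain C, using strptime()/strftime() in place of
to_date()/to_char(); this is only an analogy for the validation idea, not
the patch's code:

#define _XOPEN_SOURCE 700       /* for strptime() */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse, format back with the same pattern, and reject on any mismatch. */
static int
round_trip_ok(const char *input, const char *fmt)
{
    struct tm   tm;
    char        buf[64];

    memset(&tm, 0, sizeof(tm));
    if (strptime(input, fmt, &tm) == NULL)
        return 0;                           /* did not parse at all */
    if (strftime(buf, sizeof(buf), fmt, &tm) == 0)
        return 0;
    return strcmp(buf, input) == 0;
}

int
main(void)
{
    /* The extra space is accepted by the parser, but the round trip
     * produces "2011 12 18", so the comparison flags the input. */
    printf("%d\n", round_trip_ok("2011 12  18", "%Y %m %d"));   /* 0 */
    printf("%d\n", round_trip_ok("2011 12 18", "%Y %m %d"));    /* 1 */
    return 0;
}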


--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: [HACKERS] Declarative partitioning - another take

2016-08-15 Thread Amit Langote

Thanks a lot for taking a look at this.

On 2016/08/11 3:22, Robert Haas wrote:
> On Wed, Aug 10, 2016 at 7:09 AM, Amit Langote
>  wrote:
>> Attached is the latest set of patches to implement declarative
>> partitioning.
> 
> Cool.  I would encourage you to give some thought to what is the least
> committable subset of these patches, and think about whether it can be
> reorganized to make that smaller.  Based on the descriptions, it
> sounds to me like the least committable subset is probably all of 0001
> - 0005 as one giant commit, and that's a lot of code.  Maybe there's
> no real way to come up with something more compact than that, but it's
> worth some thought.

I will consider this and try to come up with a minimal patch set covering
what is now 0001 - 0005.

>> 0001-Catalog-and-DDL-for-partition-key.patch
> 
> +   linkend="catalog-pg-partitioned">pg_partitioned
> 
> pg_partitioned seems like a slightly strange choice of name.  We have
> no other catalog tables whose names end in "ed", so the use of a past
> participle here is novel.  More generally, this patch seems like it's
> suffering a bit of terminological confusion: while the catalog table
> is called pg_partitioned, the relkind is RELKIND_PARTITION_REL.  And
> get_relation_by_qualified_name() thinks that RELKIND_PARTITION_REL is
> a "table", but getRelationDescription thinks it's a "partitioned
> table".  I think we need to get all these bits and pieces on the same
> page, or have some kind of consistent way of deciding what to do in
> each case.  Maybe pg_partitioned_table, RELKIND_PARTITIONED_TABLE,
> etc. for the internal stuff, but just "table" in user-facing messages.

Name pg_partitioned does sound a little strange now that you mention it.

I agree to make the terminology more consistent and to that end, your
suggestion to keep the user-facing term "table" and using
pg_partitioned_table and RELKIND_PARTITIONED_TABLE for internals sounds good.

> Alternatively, I wonder if we should switch to calling this a
> "partition root" rather than a "partitioned table".  It's not the
> whole table, just the root.  And doesn't contain data - in fact it
> probably shouldn't even have storage - so calling it a "table" might
> be confusing.  But if we're using CREATE TABLE to create it in the
> ifrst place, then calling it something other than a table later on is
> also confusing.  Hmm.

I think it makes sense to keep calling it a table because it has all the
logical properties of a table even though it will differ from a regular
table on the basis of physical implementation details such as that it does
not own physical storage.  Am I missing something?

> 
> +  partexprs
> 
> There's a certain symmetry between this and what we do for indexes,
> but I'm wondering whether there's a use case for partitioning a table
> by an expression rather than a column value.  I suppose if you've
> already done the work, there's no harm in supporting it.

Yeah, it's not a whole lot of code to manage expressions alongside simple
column references.

> +[ PARTITION BY {RANGE | LIST} ( {  class="parameter">column_name | (  class="parameter">expression ) } [  class="parameter">opclass ] [, ...] )
> 
> The spacing isn't right here.  Should say { RANGE | LIST } rather than
> {RANGE|LIST}.  Similarly elsewhere.

Will fix.

> +  thus created is called partitioned table.  Key
> +  consists of an ordered list of column names and/or expressions when
> +  using the RANGE method, whereas only a single column or
> +  expression can be specified when using the LIST method.
> 
> Why do we support range partitioning with multiple columns, but list
> partitioning only with a single column?

This is mostly because I didn't find any other database supporting it and
hence thought maybe there is not much use for it.

On the implementation side, I think it makes sense to think in terms of
ordering of tuples (for a multi-column range key), where we compare a new
tuple's partition key columns one-by-one until we find the column whose
value != the value of the corresponding column in the range upper bound of a
partition (usually but not necessarily the last column of the key).  Once we
have determined which side of the bound the value falls on, we perform the
same process with either the lower bound of the same partition or the upper
bound of some partition in the other half, as determined by binary search
logic.

Conversely, when determining the constraint on rows a range partition
contains, we emit a (keycol = lowerval) expression for all columns i =
0..j-1 where range.lowerval[i] = range.upperval[i] and then emit (keycol >
/ >= range.lowerval[j] and keycol < / <= upperval[j]) for the first column
j where lowerval[j] < upperval[j] and stop at that column (any further
columns are irrelevant).
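
As a standalone illustration of that rule (column names and integer bounds
are made up; this is not the patch's code), the constraint implied by a
two-column range partition's bounds could be printed like this, assuming an
inclusive lower and exclusive upper bound:

#include <stdio.h>

/*
 * Leading key columns where the lower and upper bounds are equal become
 * equalities; the first differing column is bounded on both sides; any
 * later columns are ignored, as described above.
 */
static void
print_range_partition_constraint(const char *const *cols,
                                 const int *lower, const int *upper, int natts)
{
    int j;

    for (j = 0; j < natts && lower[j] == upper[j]; j++)
        printf("%s(%s = %d)", j ? " AND " : "", cols[j], lower[j]);

    if (j < natts)
        printf("%s(%s >= %d AND %s < %d)",
               j ? " AND " : "", cols[j], lower[j], cols[j], upper[j]);
    printf("\n");
}

int
main(void)
{
    const char *const cols[] = {"a", "b"};
    int lower[] = {10, 1};
    int upper[] = {10, 100};

    /* Prints: (a = 10) AND (b >= 1 AND b < 100) */
    print_range_partition_constraint(cols, lower, upper, 2);
    return 0;
}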

When considering the same for list partitioning where we consider set-
membership of values (based on equality of a new tuple's value for the
column and a 

[HACKERS] pg_hba_file_settings view patch

2016-08-15 Thread Haribabu Kommi
Hi All,

While working on the pg_hba_lookup function, which can be used to look up
the client authentication rule that matches a given set of input
parameters, Tom raised some concrete use-case issues in the following
mail [1]. In the same thread, he also pointed out the advantages of having
a view for pg_hba.conf similar to the pg_file_settings view.

Here I have attached a patch that implements the pg_hba_file_settings
view, which displays all the rows in pg_hba.conf. If any error exists in
an authentication rule, the corresponding error is displayed, similar to
pg_file_settings.

This view can be used to verify whether any problems exist in pg_hba.conf
before it is reloaded into the system. Unlike the pg_hba_lookup function,
it cannot be used to find out which rule maps to a given input connection.
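
A hypothetical usage sketch -- the column names line_number and error are my
guesses by analogy with pg_file_settings; see the patch for the actual
definition:

-- check pg_hba.conf for problems before calling pg_reload_conf()
SELECT line_number, error
  FROM pg_hba_file_settings
 WHERE error IS NOT NULL;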

comments?

[1] - https://www.postgresql.org/message-id/28434.1468246200%40sss.pgh.pa.us

Regards,
Hari Babu
Fujitsu Australia


pg_hba_file_settings_1.patch
Description: Binary data



Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Christoph Berg
Re: To Tom Lane 2016-08-15 <20160815111057.v2mqqjp4aabvw...@msg.df7cb.de>
> Re: Tom Lane 2016-07-30 <1184.1469890...@sss.pgh.pa.us>
> > In short, I think that the way to make something like this work is to
> > figure out how to have "virtual" catalog rows describing a temp table.
> > Or maybe to partition the catalogs so that vacuuming away temp-table
> > rows is easier/cheaper than today.
> 
> We should also be thinking about how the opposite idea of "global"
> temp tables

(Obviously I should catch up on the rest of the thread when postponing
a message for an hour or two. Sorry for the duplicated idea here...)

Christoph




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Christoph Berg
Re: Tom Lane 2016-07-30 <1184.1469890...@sss.pgh.pa.us>
> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.
> Or maybe to partition the catalogs so that vacuuming away temp-table
> rows is easier/cheaper than today.

We should also be thinking about how the opposite idea of "global"
temp tables (I believe that's what Oracle calls them) would work.
These have a persistent structure in the catalogs; just the data is
private to every session (or transaction), and every connection starts
with an empty temp table ready for its use.
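
For illustration only, the Oracle-style syntax looks roughly like this
(made-up table; PostgreSQL has no equivalent today):

CREATE GLOBAL TEMPORARY TABLE session_data (
    id    int,
    data  varchar(100)
) ON COMMIT PRESERVE ROWS;   -- or ON COMMIT DELETE ROWS for per-transaction data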

I'd guess that this type of global temp table would also fix the bloat
problem very efficiently. (Ad-hoc temp tables shouldn't occur that often,
so the bloat caused by them wouldn't matter that much. If they do,
their structure is likely always the same, and they could be made
"global" in the schema.)

The bit that needs to be thought out here would be how to maintain
statistics for these tables. Obviously ANALYZE shouldn't update any
globally visible data.

Christoph




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Pavel Stehule
2016-08-15 12:18 GMT+02:00 Aleksander Alekseev :

> > The global temporary tables has persistent rows in the catalogue. The
> > mapping to files can be marked as special and real mapping should be
> > only in memory.
> >
> > So the changes in catalogue related to global temporary tables are
> > pretty less frequently.
>
> I'm afraid I still don't get it. Let say I have an application that
> does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
> time. Naturally there is not only one temp table per session. Could you
> explain please in more detail how exactly do these persistent rows help?
>
>
When you use global temporary tables, you create them only once - like
regular tables.

You don't drop these tables afterwards.

Regards

Pavel


> --
> Best regards,
> Aleksander Alekseev
>


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev
> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
> 
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.

I'm afraid I still don't get it. Let's say I have an application that
does `CREATE TEMP TABLE xx; DROP TABLE xx` in every session all the
time. Naturally there is more than one temp table per session. Could you
please explain in more detail how exactly these persistent rows help?

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Pavel Stehule
2016-08-15 12:00 GMT+02:00 Aleksander Alekseev :

> > But we can change this discussion little bit different. I believe so
> > solution should be *global temporary tables*. These tables has
> > persistent catalogue entries. Data are joined with session. These
> > tables can be effective solution of problem with temporary tables,
> > can be strong benefit for developers (more comfortable, possible
> > static analyse of PLpgSQL) and it simplify life to all people who has
> > do migration from Oracle. So only benefits are there :).
>
> I don't think that global temporary tables solve "catalog bloating that
> causes auto vacuum" problem. I suggest we don't change a topic. Or maybe
> I don't know something about global temporary tables?
>

Global temporary tables have persistent rows in the catalogue. The
mapping to files can be marked as special, and the real mapping would live
only in memory.

So catalogue changes related to global temporary tables are much less
frequent.

Regards

Pavel


>
> --
> Best regards,
> Aleksander Alekseev
>


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev
> But we can change this discussion little bit different. I believe so
> solution should be *global temporary tables*. These tables has
> persistent catalogue entries. Data are joined with session. These
> tables can be effective solution of problem with temporary tables,
> can be strong benefit for developers (more comfortable, possible
> static analyse of PLpgSQL) and it simplify life to all people who has
> do migration from Oracle. So only benefits are there :).

I don't think that global temporary tables solve the "catalog bloat that
causes autovacuum" problem. I suggest we don't change the topic. Or maybe
I don't know something about global temporary tables?

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Pavel Stehule
2016-08-15 11:01 GMT+02:00 Aleksander Alekseev :

> > > >>> I think the whole idea of a fast temporary table is that there
> > > >>> are no catalog entries.  If there are no catalog entries, then
> > > >>> dependencies are not visible.  If there ARE catalog entries, to
> > > >>> what do they refer? Without a pg_class entry for the table,
> > > >>> there's no table OID upon which to depend.
> > >
> > > >> TBH, I think that the chances of such a design getting committed
> > > >> are not distinguishable from zero.  Tables have to have OIDs;
> > > >> there is just too much code that assumes that.  And I seriously
> > > >> doubt that it will work (for any large value of "work") without
> > > >> catalog entries.
> > >
> > > > That seems a bit too defeatist.
> > >
> > > Huh?  I didn't say we shouldn't work on the problem --- I just
> > > think that this particular approach isn't good.  Which you seemed
> > > to agree with.
> >
> > I took your statement to mean that they need a pg_class entry - even
> > if there were a partial solution to the pg_depend problem allowing to
> > avoid pg_attribute entries, tha't still not really be a solution. If
> > that's not what you mean, sorry - and nice that we agree ;)
> >
> >
>
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```
>
> As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
> Also I mentioned that in this case even \d and \d+ would not work.
>
> I personally find this discussion very confusing. Maybe we should
> concentrate on a concrete patch instead of some abstract ideas and
> topics that are still open.
>
> For instance it surprises me that apparently there is no one who
> objects "lets make all temporary tables fast temporary tables" idea.
> Since in this case code would use more memory for keeping a virtual
> catalog wouldn't it be considered a major change of behavior that could
> break someones production environment?
>

It is a pretty hard discussion about the cost and usability of FTT. Small
FTTs (for usage in PL/pgSQL) can be replaced by arrays, and the pg_catalog
overhead of big temporary tables is not significant. So introducing a
special proprietary table type is debatable.

Probably the amount of metadata for temporary tables should be minimal -
currently all metadata are cached in memory - and it is not a problem.

But we can take this discussion in a little different direction. I believe
the solution should be *global temporary tables*. These tables have
persistent catalogue entries; the data is tied to the session. They can be
an effective solution to the problems with temporary tables, a strong
benefit for developers (more comfortable, and PL/pgSQL can be statically
analysed), and they simplify life for everyone who has to migrate from
Oracle. So there are only benefits :).

Regards

Pavel


>
> --
> Best regards,
> Aleksander Alekseev
>


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev
> > >>> I think the whole idea of a fast temporary table is that there
> > >>> are no catalog entries.  If there are no catalog entries, then
> > >>> dependencies are not visible.  If there ARE catalog entries, to
> > >>> what do they refer? Without a pg_class entry for the table,
> > >>> there's no table OID upon which to depend.
> > 
> > >> TBH, I think that the chances of such a design getting committed
> > >> are not distinguishable from zero.  Tables have to have OIDs;
> > >> there is just too much code that assumes that.  And I seriously
> > >> doubt that it will work (for any large value of "work") without
> > >> catalog entries.
> > 
> > > That seems a bit too defeatist.
> > 
> > Huh?  I didn't say we shouldn't work on the problem --- I just
> > think that this particular approach isn't good.  Which you seemed
> > to agree with.
> 
> I took your statement to mean that they need a pg_class entry - even
> if there were a partial solution to the pg_depend problem allowing to
> avoid pg_attribute entries, tha't still not really be a solution. If
> that's not what you mean, sorry - and nice that we agree ;)
> 
>   

Just to keep things sane I would like to remind that in this concrete
patch there _are_ catalog entries:

```
[...]
This file contents imlementation of special type of temporary tables ---
fast temporary tables (FTT). From user perspective they work exactly as
regular temporary tables. However there are no records about FTTs in
pg_catalog. These records are stored in backend's memory instead and
mixed with regular records during scans of catalog tables. We refer to
corresponding tuples of catalog tables as "in-memory" or "virtual"
tuples and to all these tuples together --- as "in-memory" or "virtual"
catalog.
[...]
```

As Tom pointed out, a lot of PL/pgSQL code would stop working otherwise.
Also, I mentioned that in this case even \d and \d+ would not work.

I personally find this discussion very confusing. Maybe we should
concentrate on a concrete patch instead of abstract ideas and topics
that are still open.

For instance, it surprises me that apparently no one objects to the
"let's make all temporary tables fast temporary tables" idea. Since in
this case the code would use more memory to keep a virtual catalog,
wouldn't it be considered a major change of behavior that could break
someone's production environment?

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-15 Thread Andrew Borodin
> So on average in a large randomly filled index, pages spend more time
> nearer 50% full than 100% full.

I think we can make this number more... controllable.

Before a split we can check whether the left and right sibling pages are
both in shared buffers and, if they are seriously under a certain
fillfactor, say under 75%, move some of the data there instead of
splitting. This will slow down insertions a bit, but we can have any
fillfactor we want for random inserts. Of course, someone can construct
bad input to force a low fillfactor, as with qsort
( http://www.cs.dartmouth.edu/~doug/mdmspe.pdf ), but no real workload
will trigger this behavior.

But then we have to think about locks, WALs etc.
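
(For context, the only fillfactor knob today is the storage parameter, which
controls how full the leaf pages are packed when the index is built; it is
not maintained under sustained random inserts, which is what the idea above
would address. Hypothetical example:)

CREATE INDEX t_key_idx ON t (key) WITH (fillfactor = 75);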


Best regards, Andrey Borodin, Octonica & Ural Federal University.




Re: [HACKERS] to_date_valid()

2016-08-15 Thread Artur Zakirov

On 14.08.2016 01:52, Andreas 'ads' Scherbaum wrote:


Attached is a patch to "do the right thing". The verification is in
"to_date()" now, the extra function is removed. Regression tests are
updated - two or three of them returned a wrong date before, and still
passed. They fail now. Documentation is also updated.


Regards,



Is it really the right and "true" way to validate the date, by an extra
transformation and comparison?

Maybe we should validate the date using ValidateDate() instead. A sample
patch is attached.
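
To illustrate the behavior in question (the error text is taken from the
patch; the exact current result may vary):

-- accepted today even though February has no 30th day; the value silently overflows
SELECT to_date('2016-02-30', 'YYYY-MM-DD');

-- with the proposed ValidateDate() check this would instead fail with:
-- ERROR:  date/time field value out of range: "2016-02-30"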

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..c0048c9 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -3563,7 +3563,9 @@ do_to_timestamp(text *date_txt, text *fmt,
 {
 	FormatNode *format;
 	TmFromChar	tmfc;
-	int			fmt_len;
+	int			fmt_len,
+fmask = 0;		/* Bit mask for ValidateDate() */
+	char	   *date_str = NULL;
 
 	ZERO_tmfc();
 	ZERO_tm(tm);
@@ -3574,7 +3576,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (fmt_len)
 	{
 		char	   *fmt_str;
-		char	   *date_str;
 		bool		incache;
 
 		fmt_str = text_to_cstring(fmt);
@@ -3630,7 +3631,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 		DCH_from_char(format, date_str, &tmfc);
 
-		pfree(date_str);
 		pfree(fmt_str);
 		if (!incache)
 			pfree(format);
@@ -3706,6 +3706,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			if (tmfc.bc && tm->tm_year > 0)
 tm->tm_year = -(tm->tm_year - 1);
 		}
+		fmask |= DTK_M(YEAR);
 	}
 	else if (tmfc.cc)			/* use first year of century */
 	{
@@ -3717,10 +3718,14 @@ do_to_timestamp(text *date_txt, text *fmt,
 		else
 			/* +1 because year == 599 is 600 BC */
 			tm->tm_year = tmfc.cc * 100 + 1;
+		fmask |= DTK_M(YEAR);
 	}
 
 	if (tmfc.j)
+	{
 		j2date(tmfc.j, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+		fmask |= DTK_DATE_M;
+	}
 
 	if (tmfc.ww)
 	{
@@ -3734,6 +3739,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 				isoweekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
 			else
 				isoweek2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 			tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
@@ -3744,11 +3750,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (tmfc.d)
 		tm->tm_wday = tmfc.d - 1;		/* convert to native numbering */
 	if (tmfc.dd)
+	{
 		tm->tm_mday = tmfc.dd;
+		fmask |= DTK_M(DAY);
+	}
 	if (tmfc.ddd)
 		tm->tm_yday = tmfc.ddd;
 	if (tmfc.mm)
+	{
 		tm->tm_mon = tmfc.mm;
+		fmask |= DTK_M(MONTH);
+	}
 
 	if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
 	{
@@ -3771,6 +3783,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			j0 = isoweek2j(tm->tm_year, 1) - 1;
 
 			j2date(j0 + tmfc.ddd, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 		{
@@ -3793,9 +3806,24 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if (tm->tm_mday <= 1)
 tm->tm_mday = tmfc.ddd - y[i - 1];
+
+			fmask |= DTK_M(MONTH) | DTK_M(DAY);
 		}
 	}
 
+	/* Validate date with bit mask received above */
+	if (fmask != 0 && date_str)
+	{
+		if (ValidateDate(fmask, false, false, false, tm) != 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
+	 errmsg("date/time field value out of range: \"%s\"",
+			date_str)));
+	}
+
+	if (date_str)
+		pfree(date_str);
+
 #ifdef HAVE_INT64_TIMESTAMP
 	if (tmfc.ms)
 		*fsec += tmfc.ms * 1000;



Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-15 Thread Mark Kirkwood

On 13/08/16 05:44, Jeff Janes wrote:

On Fri, Aug 12, 2016 at 1:40 AM, Mark Kirkwood

However your index rebuild gets you from 5 to 3 GB - does that really help
performance significantly?

It can make a big difference, depending on how much RAM you have.



Yeah - I suspect this is the issue - loading up a similar type of schema
with records with a primary key of the form 'user<x>' for (uniformly)
randomly distributed x... (I was gonna use the Yahoo benchmark but it is
so slow...). Also I'm using 10,000,000 rows instead of 100,000,000 to
avoid waiting a long time (10,000,000 should be enough to show the point):


prefix=# \d prefix
   Table "public.prefix"
 Column | Type  | Modifiers
+---+---
 uid| character varying(30) | not null
 filler | character(255)|
Indexes:
"prefix_pkey" PRIMARY KEY, btree (uid)

Doing an uncached indexed read by forcing a buffer cache clear:

# echo 3 > /proc/sys/vm/drop_caches

prefix=# SELECT relfilenode, relname, reltuples, pg_relation_size(oid)/1024/1024 AS mb
         FROM pg_class WHERE relname LIKE 'prefix%';
 relfilenode |   relname   | reltuples | mb
-------------+-------------+-----------+-----
     6017817 | prefix      |     1e+07 | 422
     6017819 | prefix_pkey |     1e+07 | 391
(2 rows)

prefix=# EXPLAIN ANALYZE SELECT count(*)
 FROM prefix WHERE uid='user1';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.46..8.46 rows=1 width=0) (actual time=3.408..3.408 rows=1 loops=1)
   ->  Index Only Scan using prefix_pkey on prefix  (cost=0.43..8.45 rows=1 width=0) (actual time=3.406..3.406 rows=0 loops=1)
         Index Cond: (uid = 'user1'::text)
         Heap Fetches: 0
 Planning time: 19.362 ms
 Execution time: 3.429 ms
(6 rows)

Repeating this after REINDEX:

# echo 3 > /proc/sys/vm/drop_caches

prefix=# SELECT relfilenode, relname, reltuples, pg_relation_size(oid)/1024/1024 AS mb
         FROM pg_class WHERE relname LIKE 'prefix%';
 relfilenode |   relname   | reltuples | mb
-------------+-------------+-----------+-----
     6017817 | prefix      |     1e+07 | 422
     6017819 | prefix_pkey |     1e+07 | 300
(2 rows)

prefix=# EXPLAIN ANALYZE SELECT count(*)
 FROM prefix WHERE uid='user1';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.46..8.46 rows=1 width=0) (actual time=3.868..3.868 rows=1 loops=1)
   ->  Index Only Scan using prefix_pkey on prefix  (cost=0.43..8.45 rows=1 width=0) (actual time=3.866..3.866 rows=0 loops=1)
         Index Cond: (uid = 'user1'::text)
         Heap Fetches: 0
 Planning time: 19.366 ms
 Execution time: 3.889 ms
(6 rows)

So certainly not significantly *slower* with the physically bigger 
index. This suggests that Jeff's analysis was spot on - likely that the 
larger index didn't fit in RAM.


Cheers

Mark




[HACKERS] patch proposal

2016-08-15 Thread Venkata B Nagothi
Hi,

During the recovery process, it would be nice if PostgreSQL generated an
error and aborted the recovery (instead of starting up the cluster) when the
intended recovery target point is not reached, and gave the DBA an option to
resume the recovery process from exactly where it stopped.

The issue here is that if the required WALs are not available, or if any WAL
is missing or corrupted at the restore_command location, then PostgreSQL
recovers until the end of the last available WAL and starts up the cluster.
If the missing WAL is found at a later point in time, it is not possible to
resume the recovery process from where it previously stopped. The whole
backup restoration + recovery process must be initiated from the beginning,
which can be time consuming, especially when recovering huge clusters of
gigabytes or terabytes that require thousands of WALs to be copied over for
recovery.

Any thoughts ? comments?

I can work on this patch based on the comments/feedback.

Regards,
Venkata B N

Fujitsu Australia


Re: [HACKERS] condition variables

2016-08-15 Thread Thomas Munro
On Mon, Aug 15, 2016 at 5:58 PM, Thomas Munro
 wrote:
> Also, I have attached a v2->v3 diff ...
Ugh.  I meant a v1->v2 diff.

-- 
Thomas Munro
http://www.enterprisedb.com

