Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-07-18 Thread Michael Paquier
On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila  wrote:
> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
>  wrote:
>> If we want to tackle the case I mentioned above, one way is to just
>> update minRecoveryPoint when an exclusive or a non-exclusive backup is
>> being taken by looking at their status in shared memory. See for
>> example the patch (1) attached, but this does not save from the case
>> where a node replays WAL, does not have data flushes, and from which a
>> backup is taken, in the case where this node gets restarted later with
>> the immediate mode and has different replay targets.
>
> This looks clumsy as it updates minRecoveryPoint only for a specific
> condition, and it doesn't solve the above-mentioned inconsistency.

Yep. I am not saying the contrary. That's why (2) with its separate
fields would make more sense.

>> Another way that just popped into my mind is to add dedicated fields
>> to XLogCtl that set the stop LSN of a backup the way it should be
>> instead of using minRecoveryPoint. In short we'd update those fields
>> in CreateRestartPoint and UpdateMinRecoveryPoint under
>> XLogCtl->info_lck. The good thing is that this lock is already taken
>> there. See patch (2) accomplishing that.
>
> How is it different from/preferable to directly using
> XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
> purpose?  Do you see any problem if we go with what Kyotaro-san has
> proposed in the initial patch [1] (aka using
> XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
> backup location)?

Re-reading this thread from scratch, I actually don't see why we
wandered into making minRecoveryPoint updates more aggressive instead
of sticking with the first proposal :)

Knowing that we have no way to be sure whether pg_control has been
backed up last or not, using the last replay LSN and TLI would be the
simplest solution, so let's do this for the back-branches. It is an
annoyance not to be able to ensure that backups are taken while the
master is stopped, or when there is no activity that updates relation
pages.

The really annoying thing, by the way, is that there will always be a
gap between minRecoveryPoint and the actual moment where a backup
finishes, because there is no way to rely on the XLOG_BACKUP_END
record. On top of that we cannot be sure whether pg_control has been
backed up last or not, which is why it would be good to document that
gap. Another crazy idea would be to return pg_control as an extra
field of pg_stop_backup() and encourage users to write it back into
the backup itself. This would allow closing any hole in the current
logic for backups taken from live standbys: minRecoveryPoint would be
updated directly to the last replayed LSN/TLI in the control file.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-07-18 Thread Michael Paquier
On Sat, Jul 16, 2016 at 10:08 AM, Andres Freund  wrote:
> On 2016-07-14 20:53:07 -0700, Andres Freund wrote:
>> On 2016-07-13 23:06:07 -0700, Andres Freund wrote:
>> > won't enter the branch, because HEAP_XMAX_LOCK_ONLY won't be set.  Which
>> > will leave t_ctid and HEAP_HOT_UPDATED set differently on the master and
>> > standby / after crash recovery.   I'm failing to see any harmful
>> > consequences right now, but differences between master and standby are a
>> > bad thing.
>>
>> I think it's actually critical, because HEAP_HOT_UPDATED /
>> HEAP_XMAX_LOCK_ONLY are used to terminate ctid chasing loops (like
>> heap_hot_search_buffer()).
>
> I've pushed a quite heavily revised version of the first patch to
> 9.1-master. I manually verified using pageinspect, gdb breakpoints and a
> standby that xmax, infomask etc are set correctly (leading to finding
> a4d357bf).  As there are noticeable differences between versions,
> especially 9.2->9.3, I'd welcome somebody having a look at the commits.

Waoh, man. Thanks!

I was pinged just this weekend about a setup that likely hit this
exact problem, in the shape of "tuple concurrently updated" errors: a
node got kill -9'ed by some framework because it did not finish its
shutdown checkpoint in time during a test, which forced it to do crash
recovery. I have not been able to put my hands on the raw data to look
at the flags set within those tuples, but I have a strong feeling that
this is related. After a couple of rounds of that, it was possible to
see "tuple concurrently updated" errors on 9.4 for a relation with few
pages and a high update rate.

More seriously, I have spent some time looking at what you have pushed
to each branch, and the fixes look correct to me.
-- 
Michael




Re: [HACKERS] heap_update() VM retry could break HOT?

2016-07-18 Thread Pavan Deolasee
On Mon, Jul 18, 2016 at 12:47 PM, Andres Freund  wrote:

>
> as far as I can see that could mean that we perform hot updates when not
> permitted, because the tuple has been replaced since, including the
> pkey. Similarly, the wrong tuple lock mode could end up being used.
>
> Am I missing something?
>
>
If the to-be-updated tuple gets updated while we are retrying VM pinning,
heap_update() should return HeapTupleUpdated and the caller must wait for
the updating transaction to finish, then retry the update with the new
version (or fail, depending on the isolation level). Given that
HeapTupleSatisfiesUpdate() is called after l2, the logic seems fine to me.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Improving executor performance

2016-07-18 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 6:18 PM, Andres Freund  wrote:
> While, as in 6) above, removing linked lists from the relevant
> structures helps, it's not that much. Staring at this for a long while
> made me realize that, somewhat surprisingly to me, one of the major
> problems is that we are bottlenecked on stack usage. Constantly
> entering and leaving this many functions for trivial expression
> evaluations hurts considerably. Some of that is the additional number
> of instructions, some of that is managing return jump addresses, and
> some of that is the actual bus traffic. It's also rather expensive to
> check for stack limits at a very high rate.

You'll recall how I complained that parallel CREATE INDEX, while
generally effective, became incredibly CPU bound on the still-serial
merge in my C-collated text test case (I told you this in person, I
think). I looked into addressing this bottleneck, and made an
interesting discovery, which rather reminds me of what you say here
about function call overhead.

I hacked up varstrfastcmp_c() to assume 4 byte varlena headers. No
function call to pg_detoast_datum_packed() is made (which otherwise
happens through all that DatumGetVarStringPP() macro indirection). Of
course, that assumption is dangerous in the general case, but it could
be made to work in most cases with a little care, as in practice the
vast majority of text comparisons are of text Datums with a 4 byte
varlena header. SortSupport could reliably detect if it was safe, with
a little help from the text opclass, or something along those lines.
This might not just be useful with sorting, but it does seem to be
particularly useful with parallel sorting.

Just on my laptop, this makes some parallel CREATE INDEX gensort test
cases [1] take as much as 15% less time to execute overall. That's a
big difference. I looked at the disassembly, and the number of
instructions for varstrfastcmp_c() was reduced from 113 to 29. That's
the kind of difference that could add up to a lot.

[1] https://github.com/petergeoghegan/gensort
-- 
Peter Geoghegan




Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-18 Thread Stephen Frost
Noah,

On Monday, July 18, 2016, Noah Misch  wrote:

> On Fri, Jul 15, 2016 at 03:46:17PM -0400, Stephen Frost wrote:
> > * Noah Misch (n...@leadboat.com) wrote:
> > > On Sat, Jul 09, 2016 at 12:55:33AM -0400, Stephen Frost wrote:
> > > > * Noah Misch (n...@leadboat.com) wrote:
> > > > > This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> > > > > a status update within 24 hours, and include a date for your subsequent status
> > > > > update.  Refer to the policy on open item ownership:
> > > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> > > >
> > > > Unfortunately, not going to make any further progress on this tonight or
> > > > over the weekend as I'm going to be out of town.  I believe I've
> > > > convinced myself that adding a template1 entry to pg_init_privs will be
> > > > both sufficient and produce the correct results, along with adjusting
> > > > the query in pg_dumpall to join through it.  Will provide an update on
> > > > Monday.
> > >
> > > This PostgreSQL 9.6 open item is long past due for your status update.  Kindly
> > > send a status update within 24 hours, and include a date for your subsequent
> > > status update.  (Your Tuesday posting lacked a date.)
> >
> > Yeah, I realized that tablespaces have the same issue and have updated
> > the patch to address them as well, in the same way.
> >
> > Going through and doing testing now.  Unfortunately, it doesn't look
> > like adding in testing of tablespaces into the TAP tests would be very
> > easy (the only TAP test that deals with tablespaces that I found was
> > pg_basebackup and that looks rather grotty..), so I'm not planning to do
> > that, at least not at this time.
> >
> > As such, I'm planning to commit the patch with the fix+regression test
> > for database ACLs and the fix for tablespace ACLs either later today
> > or tomorrow.
>
> Does commit 47f5bb9 fully fix this open item?
>

Yes, it does. Apologies for not closing it.

Thanks!

Stephen


Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-18 Thread Michael Paquier
On Sat, Jul 16, 2016 at 4:46 AM, Stephen Frost  wrote:
> Going through and doing testing now.  Unfortunately, it doesn't look
> like adding in testing of tablespaces into the TAP tests would be very
> easy (the only TAP test that deals with tablespaces that I found was
> pg_basebackup and that looks rather grotty..), so I'm not planning to do
> that, at least not at this time.

The cleanest way to handle that in PostgresNode would be to have a
dedicated routine calling psql -c 'create tablespace' with tablespaces
located in a folder $basedir/tbspace. And on top of that there should
be the tablespace metadata saved in the context of the test, with a
pair of (tbspc name, location). But I think that we'd need first
stronger arguments (take more use cases) to introduce such an
extension of the test APIs.
-- 
Michael




[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-18 Thread Noah Misch
On Sat, Jul 16, 2016 at 06:48:08PM -0400, Noah Misch wrote:
> On Wed, Jul 13, 2016 at 03:57:02PM -0500, Kevin Grittner wrote:
> > On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund  wrote:
> > > On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote:
> > >> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  
> > >> wrote:
> > >>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  
> > >>> wrote:
> > >>>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund wrote:
> > >>>>
> > >>>>> I'm a bit confused, why aren't we simply adding LSN interlock
> > >>>>> checks for toast? Doesn't look that hard? Seems like a much more
> > >>>>> natural course of fixing this issue?
> > >>>>
> > >>>> I took some time trying to see what you have in mind, and I'm
> > >>>> really not "getting it".
> > >>>
> > >>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
> > >>> when old_snapshot_threshold > 0 and add a check for
> > >>> HeapTupleSatisfiesToast in TestForOldSnapshot()?
> > >>
> > >> With that approach, how will we know *not* to generate an error
> > >> when reading the chain of tuples for a value we are deleting.  Or
> > >> positioning to modify an index on toast data.  Etc., etc. etc.
> > >
> > > I'm not following. How is that different in the toast case than in the
> > > heap case?
> > 
> > A short answer is that a normal table's heap doesn't go through
> > systable_getnext_ordered().  That function is used both for cases
> > where the check should not be made, like toast_delete_datum(), and
> > cases where it should, like toast_fetch_datum().
> > 
> > Since this keeps coming up, I'll produce a patch this way.  I'm
> > skeptical, but maybe it will look better than I think it will.  I
> > should be able to post that by Friday.
> 
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2016-07-20 03:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




[HACKERS] Re: Document that vacuum can't truncate if old_snapshot_threshold >= 0

2016-07-18 Thread Noah Misch
On Fri, Jul 15, 2016 at 02:38:54AM -0400, Noah Misch wrote:
> On Wed, Jul 13, 2016 at 02:14:06PM -0700, Andres Freund wrote:
> > That appears to not be mentioned in a comment, the commit message or the
> > the docs. I think this definitely needs to be prominently documented.
> 
> [Action required within 72 hours.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 9.6 open item.  Kevin,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
> 
> [1] 
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-18 Thread Noah Misch
On Fri, Jul 15, 2016 at 03:46:17PM -0400, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Sat, Jul 09, 2016 at 12:55:33AM -0400, Stephen Frost wrote:
> > > * Noah Misch (n...@leadboat.com) wrote:
> > > > This PostgreSQL 9.6 open item is past due for your status update.  
> > > > Kindly send
> > > > a status update within 24 hours, and include a date for your subsequent 
> > > > status
> > > > update.  Refer to the policy on open item ownership:
> > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> > > 
> > > Unfortunately, not going to make any further progress on this tonight or
> > > over the weekend as I'm going to be out of town.  I believe I've
> > > convinced myself that adding a template1 entry to pg_init_privs will be
> > > both sufficient and produce the correct results, along with adjusting
> > > the query in pg_dumpall to join through it.  Will provide an update on
> > > Monday.
> > 
> > This PostgreSQL 9.6 open item is long past due for your status update.  
> > Kindly
> > send a status update within 24 hours, and include a date for your subsequent
> > status update.  (Your Tuesday posting lacked a date.)
> 
> Yeah, I realized that tablespaces have the same issue and have updated
> the patch to address them as well, in the same way.
> 
> Going through and doing testing now.  Unfortunately, it doesn't look
> like adding in testing of tablespaces into the TAP tests would be very
> easy (the only TAP test that deals with tablespaces that I found was
> pg_basebackup and that looks rather grotty..), so I'm not planning to do
> that, at least not at this time.
> 
> As such, I'm planning to commit the patch with the fix+regression test
> for database ACLs and the fix for tablespace ACLs either later today
> or tomorrow.

Does commit 47f5bb9 fully fix this open item?




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Craig Ringer
On 18 July 2016 at 02:27, Robert Haas  wrote:

> On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer 
> wrote:
> > I don't think anyone's considering moving from multi-processing to
> > multi-threading in PostgreSQL. I really, really like the protection that
> the
> > shared-nothing-by-default process model gives us, among other things.
>
> We get some very important protection by having the postmaster in a
> separate address space from the user processes, but separating the
> other backends from each other has no value.  If one of the backends
> dies, we take provisions to make sure they all die, which is little or
> no different from what would happen if we had the postmaster as one
> process and all of the other backends as threads within a second
> process.  As far as I can see, running each and every backend in a
> separate process has downsides but no upsides.  It slows down the
> system and makes it difficult to share data between processes without
> much in the way of benefits.
>

That's a good point, the random memory overwrites that Tom mentioned aside.

I think that memory leaks we currently ignore as insignificant one-time
losses that'll get cleaned up on backend exit will become relevant, so some
cleanup work would be needed there, but it's not technically difficult and
valgrind is a wonderful thing.

One minor thing to be aware of is that cPython has a horrible threading
design with a single massive lock for the whole interpreter. PL/Python will
perform appallingly in a threaded context. It looks like it has more
recently gained support for separate interpreters (each with their own GIL)
within a single process though, if I'm reading the docs correctly:

https://pl.python.org/docs/api/threads.html

so maybe it'd just require some plpython tweaks to switch to the right
interpreter for the current backend.

I don't think plpython issues are a huge cause for hand-wringing anyway,
really. TBH, if it really is practical to move Pg to a threaded model and
it's not as hard as I thought, I do see the advantages. Mainly because I'd
_love_ an efficient embedded Java or C# runtime; it's downright embarrassing
to tell people they should write procs in Perl or Python (or TCL!) if they
can't do it in plpgsql.

If we can stand the macro code pollution, it'd be interesting to do a
minimal conversion as an experiment and let some buildfarm members start
digesting it once it runs. Find issues slowly over time, make it an
experimental build option. We could do things like transparently use
threadids wherever we currently expose a PID in the UI, change the
bgworker backend to spawn threads rather than procs, etc. (One nice
consequence would be the possibility of getting rid of most of EXEC_BACKEND
since the postmaster launching the backend proc would be a one-time thing,
once it was stable enough to make the thread model the only option on
Windows).

It'd be very helpful to find a nice portable library that abstracts
platform threading specifics and has a less horrid API than pthreads,
rather than having to DIY. (See e.g.:
https://software.intel.com/en-us/blogs/2006/10/19/why-windows-threads-are-better-than-posix-threads
). Or use C++11 <thread> :p [dives for fireproof suit]

> > Where I agreed with you, and where I think Robert sounded like he was
> > agreeing, was that our current design where we have one executor per user
> > session and can't suspend/resume sessions is problematic.
>
> The problems are very closely related.  The problem with suspending
> and resuming sessions is that you need to keep all of the session's
> global variable contents (except for any caches that are safe to
> rebuild) until the session is resumed; and we have no way of
> discovering all of the global variables a process is using and no
> general mechanism that can be used to serialize and deserialize them.
>

Right. In our per-process model we'd have to provide a subsystem that
serializes/deserializes session state and has callbacks for extensions to
register their own save/restore handlers. We'd probably want it even if
the improbable happened and Pg moved to threads - which until this thread I
would've considered the same as saying "pigs fly" or "the US gets universal
health care". Since we'd want it mainly to be able to essentially page out
idle sessions, though, it'd be more of a nice-to-have than a necessity.

Individual Pg subsystems would probably register callbacks with the
save/restore subsystem, rather than trying to have the save/restore
subsystem have the knowledge to reach into all the other subsystems. Since
we'd block save/restore when there's an active query of course, it might
not actually be that bad. Especially if we started with save/restore only
on idle state, not idle-in-transaction, i.e. start with transaction pooling.

Since I got started with Pg, I've taken it as given that PostgreSQL Will
Never Use Threads, Don't Even Talk About It. As taboo as query hints or
more so. Is this actually a 

Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Andres Freund
On 2016-07-19 08:33:20 +0800, Craig Ringer wrote:
> On 19 July 2016 at 03:53, Andres Freund  wrote:
> 
> > On 2016-07-18 15:47:58 -0400, Robert Haas wrote:
> > > I think the risk profile is exactly the opposite of what you are
> > > suggesting here.  If we provide an option to compile the server with
> > > all global variables converted to thread-local variables, there's
> > > really not a whole lot that can break, AFAICS.
> >
> > Using TLS will slow down things noticeably though. So if we were to go
> > there, we'd have to make up for some constant slowdown.
> >
> >
> Does TLS really have more of a performance impact than process context
> switches?

Those aren't really alternatives, unless I'm missing something. You have
context switches between threads as well. They're a bit cheaper (due to
fewer TLB invalidations), but generally not a *whole* lot.  What TLS
requires is basically for every thread local variable to go through one
(IIRC sometimes two) additional layer of indirection.  For workloads
which are bottlenecked on single core performance (i.e. most of pg,
regardless of parallel query), that can be painful.


Andres




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Craig Ringer
On 19 July 2016 at 03:53, Andres Freund  wrote:

> On 2016-07-18 15:47:58 -0400, Robert Haas wrote:
> > I think the risk profile is exactly the opposite of what you are
> > suggesting here.  If we provide an option to compile the server with
> > all global variables converted to thread-local variables, there's
> > really not a whole lot that can break, AFAICS.
>
> Using TLS will slow down things noticeably though. So if we were to go
> there, we'd have to make up for some constant slowdown.
>
>
Does TLS really have more of a performance impact than process context
switches?

Genuine question, I'm clueless in the area.


> > We'll technically be multi-threaded but the code need not know or care
> > about the other threads; only in the event of a memory clobber can
> > they affect each other.
>
> But that'll make it pretty hard to take advantage of multi-threading to
> a meaningful degree. Except for being able to create shared memory after
> the fact - quite useful! - there'd not be much point.
>

Yeah. TLS is too simplistic. To do any useful parallel work without
serialization/deserialization, some threads need to be in groups with other
threads, sharing some data structures but having private copies of others.

That said, TLS can be used as a reasonable default state, then individual
data structures extracted and shared more formally when needed. So it's a
sane starting point.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] A Modest Upgrade Proposal

2016-07-18 Thread Joshua D. Drake

On 07/18/2016 03:17 PM, Jim Nasby wrote:

On 7/17/16 2:22 PM, Petr Jelinek wrote:



I do agree that DDL "feels better" (which I think is what JD was
alluding to).


Yes and no. It reads better and is clearer to those who are not 
developers or do not have a developer background, which is many in the 
database field. It is also easier to type. I type 120 words a minute on a 
roll, that is until I have to type things like ('[ . A simple word-based 
command structure is much more efficient.


ALTER TABLE FOO ENABLE REPLICATION ON SLAVE 0;

vs

select enable_replication_for_table('['foo']',0);

Guess which one was typed without a single error and more quickly.

Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] application_name in process name?

2016-07-18 Thread Tom Lane
Mike Blackwell  writes:
> On Sun, Jul 17, 2016 at 10:34 AM, Tom Lane  wrote:
>> It occurs to me that we could also remove the update_process_title GUC:
>> what you would do is configure a process_title pattern that doesn't
>> include the %-escape for current command tag, and the infrastructure
>> could notice that that escape isn't present and skip unnecessary updates.
>> The same kind of trick could be used for other potentially-expensive
>> items like the lock "waiting" flag.

> This seems like an interesting project for learning my way around gucs and
> logging.  ​Could you elaborate a little
> on the cost considerations?

Well, the point is to avoid unnecessary updates of the process title,
since on many platforms that involves a kernel call.  Most of the fields
that you might want in it are static anyhow, but current command tag
obviously changes often, as does the lock-waiting annotation.  The idea
would be to notice that the selected title string doesn't call for those
fields and skip updates if so.

I envision this as having assignment of the process_title GUC compute a
bitmask showing which possibly-mutable escape codes appear in the string
(this could be implemented by having the check_hook compute a bitmask and
save it via the GUC "extra" mechanism).  Then calls to set_ps_display
could be extended with a bitmask parameter showing which field types are
known to be changing in that call, and you just "AND" to discover if an
update is needed.

regards, tom lane




Re: [HACKERS] application_name in process name?

2016-07-18 Thread Mike Blackwell
On Sun, Jul 17, 2016 at 10:34 AM, Tom Lane  wrote:

> It occurs to me that we could also remove the update_process_title GUC:
> what you would do is configure a process_title pattern that doesn't
> include the %-escape for current command tag, and the infrastructure
> could notice that that escape isn't present and skip unnecessary updates.
> The same kind of trick could be used for other potentially-expensive
> items like the lock "waiting" flag.
>

This seems like an interesting project for learning my way around gucs and
logging.  ​Could you elaborate a little
on the cost considerations?


Re: [HACKERS] A Modest Upgrade Proposal

2016-07-18 Thread Jim Nasby

On 7/17/16 2:22 PM, Petr Jelinek wrote:

I generally agree, but I think the more important question is "Why?". Is
it because DDL looks more like a sentence? Is it because arrays are a
PITA? Is it too hard to call functions?


For me it's many small reasons. I want to store it in catalogs and some
things there are nicer when you manipulate using standard DDL processing
(like dependencies for example).


Fair point.


The syntax is also bit nicer. Our
documentation works better for DDLs than functions (that's something we
should fix but I am not doing it as part of this patch). Same goes for
psql tab completion. We automatically gain things like event triggers.


I'd think we'd want to be able to support all of those for functions as 
well...



The support in pg_dump is also more straightforward with DDL.


Hmm... not sure why that is. It does seem to me that support for 
extension configuration isn't as strong as it could be.



It might make sense to have functions for manipulating slots and origins
as those are just primitives which user should not have to fiddle with
but for things that are directly meant for user interaction DDL just
feels better.


I do agree that DDL "feels better" (which I think is what JD was 
alluding to).


I had a secret agenda in asking why it's better though: can we find a 
way to allow extensions to do "DDL-ish" things in a better way than how 
they're stuck doing them today. I suspect it will never be practical to 
have extensions modifying grammar willy-nilly, but maybe there's some 
other things we could do to make life easier. One thought is an 
"extension command" mode you can enter that means everything you're 
typing gets treated as a call to a function in that extension:


EXTENSION MODE citus;
master_create_distributed_table 'github_events', 'created_at', 'append';
EXTENSION MODE;

instead of SELECT master_create_distributed_table('github_events', 
'created_at', 'append');


obviously that's completely pointless for a single command, but if you 
needed to do a bunch of things it starts saving typing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] \timing interval

2016-07-18 Thread Peter Eisentraut
On 7/15/16 11:23 AM, Tom Lane wrote:
> Meh ... if we're using one-letter abbreviations for hour and second,
> using three letters for minute seems just arbitrarily inconsistent.

Well, it's the SI abbreviation.

We also need to think through localization options.

Using the 01:02:03.004 format would sidestep most of that.

Or we can still give users the option to set it to whatever they want.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Andres Freund
On 2016-07-18 15:47:58 -0400, Robert Haas wrote:
> I think the risk profile is exactly the opposite of what you are
> suggesting here.  If we provide an option to compile the server with
> all global variables converted to thread-local variables, there's
> really not a whole lot that can break, AFAICS.

Using TLS will slow down things noticeably though. So if we were to go
there, we'd have to make up for some constant slowdown.


> We'll technically be multi-threaded but the code need not know or care
> about the other threads; only in the event of a memory clobber can
> they affect each other.

But that'll make it pretty hard to take advantage of multi-threading to
a meaningful degree. Except for being able to create shared memory after
the fact - quite useful! - there'd not be much point.

- Andres




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Robert Haas
On Sun, Jul 17, 2016 at 10:00 PM, Jan Wieck  wrote:
>> I admit that it is risky, but I think there are things that could be
>> done to limit the risk.  I don't believe we can indefinitely continue
>> to ignore the potential performance benefits of making a switch like
>> this.  Breaking a thirty-year old code base irretrievably would be
>> sad, but letting it fade into irrelevance because we're not willing to
>> make the architecture changes that are needed to remain relevant would
>> be sad, too
>
> I have to agree with Robert on that one. We have been "thinking" about
> multi-threading some 16 years ago already. We were aware of the dangers
> and yet at least considered doing it some day for things like a parallel
> executor. And that would probably be our best bang for the buck still.
>
> The risks of jamming all sessions into a single, multi-threaded process are
> huge. Think of snapshot visibility together with catalog cache
> invalidations.
> I'd say no to that one as a first step.
>
> But multi-threading the executor or even certain utility commands at first
> should not be rejected purely on the notion that "we don't have
> multithreading
> today."

I think the risk profile is exactly the opposite of what you are
suggesting here.  If we provide an option to compile the server with
all global variables converted to thread-local variables, there's
really not a whole lot that can break, AFAICS.  We'll technically be
multi-threaded but the code need not know or care about the other
threads; only in the event of a memory clobber can they affect each
other.

On the other hand, if we start trying to create multiple threads per
processes just for certain purposes - be it the executor or certain
utility commands - any C code that is reachable within the secondary
threads needs to have specific provisions for thread-safety.  That
would create all kinds of problems, no doubt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Robert Haas
On Mon, Jul 18, 2016 at 10:03 AM, Greg Stark  wrote:
> On Mon, Jul 18, 2016 at 2:41 PM,   wrote:
>> And I will be really happy when there are processors with infinite
>> performance and memory with infinite size.
>> :)))
>
> Well for what it's worth there's no theoretical difference between
> multi-process and multi-threaded. They're just two different APIs to
> create shared memory and other kernel data structures and they both
> allow all the sharing you want.  In the API Postgres uses everything
> starts out non-shared and we explicitly set up the parts we want
> shared. In the other nearly everything starts shared though it's
> possible to unshare parts. Once they're set up the CPU and MMU don't
> really care what kernel API was used to set them up.

That's totally true, but allocating additional shared memory after
system startup is a pain.  It's true that we now have DSM, but the
fact that DSM segments can be mapped in different addresses in
different processes is a nuisance.  We will still get a lot of benefit
out of having DSM, but it's not right to imply that threading wouldn't
make things easier.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Updating our timezone code in the back branches

2016-07-18 Thread Tom Lane
When I updated our copy of the IANA timezone library back in March
(commit 1c1a7cbd6), I noted that we ought to consider back-patching
those changes once they'd settled out in HEAD.  Now that the code
has survived a couple of months of beta testing, I think it is time
to do that.

I went through the IANA announcement archives
http://mm.icann.org/pipermail/tz-announce/
to get a summary of what's changed since the tzcode 2010c release
that we last synced with.  Attached is a list of code changes that
seem potentially significant to us.  There are at least two ticking
time bombs in there, namely zic feature additions that IANA has not
yet started to use in the tzdata files, but presumably will start
using at some point, else why would they put them in?  (These are
the extension to allow four DST transitions per year and the addition
of %z escapes in TZ abbreviations.)  Whenever they do start using them,
we'll be unable to apply tzdata updates to unpatched back branches,
because our old copy of zic will fail to compile the tzdata files.

There are also several bug fixes that affect interpretation of dates after
2037, a year that's getting closer all the time.  And there are some
changes that affect reading of zic data files, which could affect
unpatched back branches that are built with --with-system-tzdata,
since those might be fed updated data files even if we do nothing.

So I think it behooves us to apply these changes to the back branches
while we can still do it in a leisurely fashion, rather than waiting
until our hands are forced.  I'd like to do it in the next week or so
so that we can get in a little bit of buildfarm and manual testing
before the next set of back-branch releases in August.

Thoughts, objections?

regards, tom lane


tzcode release 2013a

The zone offset at the end of version-2-format zone files is now
allowed to be 24:00, as per POSIX.1-2008.

Fix zic bug that mishandled Egypt's 2010 changes (this also affected
the data).

2013e

Allow POSIX-like TZ strings where the transition time's hour can
range from -167 through 167, instead of the POSIX-required 0
through 24.  E.g., TZ='FJT-12FJST,M10.3.1/146,M1.3.4/75' for the
new Fiji rules.  This is a more-compact way to represent
far-future time stamps for America/Godthab, America/Santiago,
Antarctica/Palmer, Asia/Gaza, Asia/Hebron, Asia/Jerusalem,
Pacific/Easter, and Pacific/Fiji.  Other zones are unaffected by
this change.

Allow POSIX-like TZ strings where daylight saving time is in
effect all year.  E.g., TZ='WART4WARST,J1/0,J365/25' for Western
Argentina Summer Time all year.  This supports a more-compact way
to represent the 2013d data for America/Argentina/San_Luis.
Because of the change for San Luis noted above this change does not
affect the current data.

Where these two TZ changes take effect, there is a minor extension
to the tz file format in that it allows new values for the
embedded TZ-format string, and the tz file format version number
has therefore been increased from 2 to 3 as a precaution.
Version-2-based client code should continue to work as before for
all time stamps before 2038.  Existing version-2-based client code
(tzcode, GNU/Linux, Solaris) has been tested on version-3-format
files, and typically works in practice even for time stamps after
2037; the only known exception is America/Godthab.

2014b

 'zic' and 'localtime' no longer reject locations needing four
 transitions per year for the foreseeable future.  (Needed for
 planned changes to make Antarctica/Troll zone more accurate)

2014c

 zic now generates transitions for minimum time values, eliminating
 guesswork when handling low-valued time stamps.

2015f

 The two characters '%z' in a zone format now stand for the UTC
 offset, e.g., '-07' for seven hours behind UTC and '+0530' for
 five hours and thirty minutes ahead.  This better supports time
 zone abbreviations conforming to POSIX.1-2001 and later.

 zdump and zic no longer warn about valid time zone abbreviations
 like '-05'.  (We were forced to apply this change already, cf
 221619ad6)

 localtime no longer mishandles America/Anchorage after 2037.


not yet applied to PG:

2016e

 zic now outputs a dummy transition at time 2**31 - 1 in zones
 whose POSIX-style TZ strings contain a '<'.  This mostly works
 around Qt bug 53071 .




Re: [HACKERS] More parallel-query fun

2016-07-18 Thread Piotr Stefaniak

On 2016-06-16 20:14, Tom Lane wrote:
> As of HEAD you can exercise quite a lot of parallel query behavior
> by running the regression tests with these settings applied:
>
> force_parallel_mode = regress
> max_parallel_workers_per_gather = 2    -- this is default at the moment
> min_parallel_relation_size = 0
> parallel_setup_cost = 0
> parallel_tuple_cost = 0
>
> This results in multiple interesting failures, including a core dump

> I saw another previously-unreported problem before getting to the crash:

> Haven't tried to trace that one down yet.


As I expected, I'm unable to reproduce any of the above - please 
correct me if I'm wrong, but it all seems to have been fixed.






Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-18 Thread Peter Geoghegan
On Mon, Jul 18, 2016 at 7:56 AM, Robert Haas  wrote:
> The test case I used previously was an external sort, which does lots
> of retail pfrees.  Now that we've mostly abandoned replacement
> selection, there will be many fewer pfrees while building runs, I
> think, but still quite a few while merging runs.

Surely you mean that in 9.6 there are just as many palloc() + pfree()
calls as before when building runs, but many fewer when merging
(provided you limit your consideration to a final on-the-fly merge,
which are the large majority of merges in practice)? Nothing changed
about how tuplesort caller tuples are originally formed in 9.6, so
work remains even there.

I think we should be using batch memory for caller tuples (e.g.,
MinimalTuples) past the first run as an optimization, but that isn't
something that I plan to do soon. Separately, I've already written a
patch to make final merges that are not on-the-fly (i.e. the final
merge of a randomAccess caller, where a single materialize output to
one tape is formed) use batch memory, mostly to suit parallel sort
workers. Parallel sort could increase the prevalence of non-on-the-fly
merges by quite a bit, so that is on my agenda for the next release.

> Now it might be the
> case that if the allocating is fast enough and we save a bunch of
> memory, spending a few additional cycles freeing things is no big
> deal.  It might also be the case that this is problematic in a few
> cases but that we can eliminate those cases.  It's likely to take some
> work, though.

Perhaps I simply lack imagination here, but I still suspect that
ad-hoc approaches will tend to work best, because most of the benefit
can be derived from specialized, precise memory management (what I've
called batch memory) for just a few modules, and what remains isn't
several broad swathes that can be delineated easily. I can see a
"palloc a lot and don't worry too much about pfrees" allocator having
some value, but I suspect that that isn't going to move the needle in
the same way.

-- 
Peter Geoghegan




Re: [HACKERS] Regression tests vs existing users in an installation

2016-07-18 Thread Tom Lane
Peter Eisentraut  writes:
> On 7/15/16 6:13 PM, Tom Lane wrote:
>> We've talked before about how the regression tests should be circumspect
>> about what role names they create/drop, so as to avoid possibly blowing
>> up an installation's existing users during "make installcheck".

> I'm not particularly sure that that is a useful goal anymore.  Setting
> up a new instance is cheap, so if users are concerned, they should not
> run the tests against their important instance.

To my mind, the main point of "make installcheck" is to verify that
your actual installation, not some other instance, is working right.
This is far from a trivial issue; for instance on a SELinux machine,
you need to be able to verify that the installed policy allows the
DB to work, and that is very likely to be path-sensitive.

So this remains an important requirement to me.  It's true that it might
be something you need to run only right after making the installation ---
but that doesn't mean the DB is empty.  Consider wanting to test a freshly
pg_upgrade'd installation, for example.

regards, tom lane




Re: [HACKERS] Typos in logical decoding

2016-07-18 Thread Magnus Hagander
On Mon, Jul 18, 2016 at 11:00 AM, Antonin Houska  wrote:

> While reading the logical decoding code I noticed a few supposedly mistyped
> comments, see the diff below.
>

Applied, thanks.



> Besides that, output_plugin_options argument of CreateInitDecodingContext
> function is unused, and the function passes NIL to the corresponding
> argument
> of StartupDecodingContext. This looks to me like a thinko.
>

I agree it looks like a thinko, and AFAICT all internal uses pass NIL there
anyway so it makes no difference on those. I'll wait for someone more
familiar with that code to comment though :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Regression tests vs existing users in an installation

2016-07-18 Thread Peter Eisentraut
On 7/15/16 6:13 PM, Tom Lane wrote:
> We've talked before about how the regression tests should be circumspect
> about what role names they create/drop, so as to avoid possibly blowing
> up an installation's existing users during "make installcheck".

I'm not particularly sure that that is a useful goal anymore.  Setting
up a new instance is cheap, so if users are concerned, they should not
run the tests against their important instance.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi

>So I think that as long as processes and threads serve different
>functions in the OS, using processes like threads will have overhead
>in practice.

But there are other negative things: I think parallelism-oriented
libraries usually do not work well across processes. So the JVM
integration is not an exception; it is the rule.





Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi


> In other words, there's no theoretical reason you couldn't have adapt
> a JVM to create a large shared memory segment using mmap or SysV
I think that even if I were a leading OS developer, I could not answer
your question definitively, so let's just discuss.
OK, I agree with you that there is no "theoretical reason".
But in practice, I think it matters that OS (and VM) developers
implement these things differently.

>there's no theoretical reason you couldn't
Why do OS developers implement threads at all?
If there were no reason, would the use of threads be pure waste?
Why do most (if not all) common web servers and load balancers use threads?
Maybe they are all theoretically misguided?
And so on.

But to be more constructive: I just don't know how to do between
processes, on most operating systems, the things that we can easily do
between threads.

I don't know how to share mutable executable code across processes, so
I just can't imagine how to implement
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/multiple-language-support.html#invokedynamic
in an optimal way.

I also don't understand why, on Windows, a mutex has overhead compared
to a critical section:
http://stackoverflow.com/questions/800383/what-is-the-difference-between-mutex-and-critical-section

And so on.

So I think that as long as processes and threads serve different
functions in the OS, using processes like threads will have overhead
in practice.





Re: [HACKERS] Regression tests vs existing users in an installation

2016-07-18 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jul 16, 2016 at 11:38 AM, Tom Lane  wrote:
>> I'm coming to the conclusion that the only thing that will make this
>> materially better in the long run is automatic enforcement of a convention
>> about what role names may be created in the regression tests.  See my
>> response to Stephen just now for a concrete proposal.

> We could also do this by loading a C module during the regression
> tests, which seems maybe less ugly than adding a GUC.

Meh, I'm not convinced.  As Michael points out, arranging for such a
module to get loaded in an installcheck context would be difficult ---
maybe not impossible, but complicated.  Also, we'd have to add hook
function calls in all the places it would need to get control; most
of those places would probably be one-off hooks with no other conceivable
use.  And we'd still need to have a GUC, because I think it's inevitable
that we'd need to be able to turn off the restrictions for specific
tests.  So that seems like a lot of work and complication just to make
a GUC be custom to some undocumented extension rather than built-in.
If we had no other debugging GUCs then there might be some point in
rejecting this one, but we have a bunch:
https://www.postgresql.org/docs/devel/static/runtime-config-developer.html

> I don't particularly like your suggestion of spooky action at a
> distance between force_parallel_mode and regression_test_mode.  That
> just seems kooky.

It's certainly a judgment call as to which way is cleaner, but I don't
understand your objection.  There are plenty of ways in which multiple
GUCs determine a given behavior already.  Also, breaking this behavior
into two variables would let us document the user-useful behavior (do
this to test parallel safety of functions) in a different place from the
developer-useful behavior (do this to make EXPLAIN lie to you, which
surely has no possible use except for regression testing).

Possibly a single "regression_test_mode" variable is a bad idea and
we should instead have distinct developer-oriented GUCs for each special
behavior we decide we need.  I'm not particularly set on that, but
to me it seems like less of a mess to have just one.

regards, tom lane




Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-18 Thread Robert Haas
On Mon, Jul 18, 2016 at 10:19 AM, Greg Stark  wrote:
> On Sun, Jul 17, 2016 at 1:55 PM, Robert Haas  wrote:
>>On Wed, Jul 13, 2016 at 4:39 PM, Tom Lane  wrote:
>>> I wonder whether we could compromise by reducing the minimum "standard
>>> chunk header" to be just a pointer to owning context, with the other
>>> fields becoming specific to particular mcxt implementations.
>>
>> I think that would be worth doing.  It's not perfect, and the extra 8
>> (or 4) bytes per chunk certainly do matter.
>
> I wonder if we could go further. If we don't imagine having a very
> large number of allocators then we could just ask each one in turn if
> this block is one of theirs and which context it came from. That would
> allow an allocator that just allocated everything in a contiguous
> block to recognize pointers and return the memory context just by the
> range the pointer lies in.
>
> There could be optimizations: if the leading pointer points to a
> structure with a decently large magic number then assume it's a valid
> header, to avoid the cost of checking with lots of allocators. But I'm
> imagining that the list of allocators in use concurrently will be
> fairly small, so it might not even be necessary.

Well, if we were going to do this, it would make more sense to have a
central registry of blocks that is shared across all memory context
types, and when you see a pointer you just look it up in the chunk map
and find the containing context that way.  I actually did something
like this when I was working on sb_alloc (see archives); it looked up
whether the chunk was of the new type and, if not, it assumed it came
from aset.c.  To do this, it used a three-level trie keyed on the
pointer address (like Google's tcmalloc) with, IIRC, some optimizations
for the case where we repeatedly free from the same chunk.  That
slowed down pfree noticeably, though. The problem is that it's pretty
well impossible to do something that is as cheap as fetching a memory
context pointer from the chunk header, chasing a pointer or two from
there, and then jumping.  That's just really cheap.

The test case I used previously was an external sort, which does lots
of retail pfrees.  Now that we've mostly abandoned replacement
selection, there will be many fewer pfrees while building runs, I
think, but still quite a few while merging runs.  Now it might be the
case that if the allocating is fast enough and we save a bunch of
memory, spending a few additional cycles freeing things is no big
deal.  It might also be the case that this is problematic in a few
cases but that we can eliminate those cases.  It's likely to take some
work, though.

Anyway, my point is really that doing what Tom proposes would be a
good first step and probably would not involve much risk.  And with
that change, any new allocator that has some extrinsic way of
determing chunk sizes can save 8 bytes per chunk, which is certainly
meaningful for any context that does lots of allocations.  If somebody
writes a new allocator that has good performance characteristics and
uses 400MB of memory on some test where aset.c would have used 535MB
of memory, and we can compute that with no chunk headers at all would
use only 346MB of memory, then we can talk about how to get there.
Doing things stepwise is good.

One random thought is that we could have a memory allocator altogether
separate from palloc/pfree.  For example, suppose we introduce
qalloc/qfree.  A chunk of memory allocated with qalloc can only be
freed with qfree, not pfree.  In this world, you're completely freed
from the requirement to care about chunk headers in any form.  The
downside, of course, is that you can't let qalloc'd tuples escape into
the executor, say, because it only knows how to pfree things, not how
to qfree things.  But there are plenty of places where you can easily
see that allocations won't escape outside the module where they are
performed; in fact, there are many cases where it would be easy to
pass the memory context *as an argument to the free function* -- which
would presumably be quite advantageous for any allocator that doesn't
store a pointer to the context in the chunk header.  I'm sure there
will be some reluctance to go down these kinds of paths because it is
undeniably convenient from a programmer's perspective to be able to
just say pfree and forget the details, but I believe there will be
some cases - especially for contexts that hold lots of allocations -
where the performance gains from these kinds of techniques are quite
measurable on macrobenchmarks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] timestamp informations to pg_stat_statements

2016-07-18 Thread Jason Kim
Hi guys,
Thank you for feedbacks.

> I think that this is the job of a tool that aggregates things from 
> pg_stat_statements. It's unfortunate that there isn't a good 
> third-party tool that does that, but there is nothing that prevents 
> it. 

Right. We can do this if we aggregate frequently enough. However,
third-party tools can do it better if they have more information, and
I think these fields are the fundamental information that strong
third-party tools would build on.

> The concern I've got about this proposal is that the results get very 
> questionable as soon as we start dropping statement entries for lack 
> of space.  last_executed would be okay, perhaps, but first_executed 
> not so much. 

If we set pg_stat_statements.max to a large enough value, there can be
long-lived entries and short-lived ones simultaneously. In that case,
per-call statistics can be accurate but per-second stats cannot. That
is the idea behind naming the fields created and last_updated (not
first_executed and last_executed): they represent the timestamps at
which an entry's stats were created and last updated, so we can
calculate per-second stats with simple math.

> Also, for what it's worth, I should point out to Jun that 
> GetCurrentTimestamp() should definitely not be called when a spinlock 
> is held like that. 

I moved it outside of spinlock.

@@ -1204,6 +1209,11 @@ pgss_store(const char *query, uint32 queryId,
 */
volatile pgssEntry *e = (volatile pgssEntry *) entry;

+   /*
+    * Read the timestamp before grabbing the spinlock
+    */
+   TimestampTz last_updated = GetCurrentTimestamp();
+
    SpinLockAcquire(&e->mutex);

/* "Unstick" entry if it was previously sticky */
@@ -1251,6 +1261,7 @@ pgss_store(const char *query, uint32 queryId,
e->counters.blk_read_time +=
INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time +=
INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+   e->counters.last_updated = last_updated;

    SpinLockRelease(&e->mutex);
  }

pg_stat_statements_with_timestamp_v2.patch

  

Regards,
Jason Kim.



--
View this message in context: 
http://postgresql.nabble.com/PROPOSAL-timestamp-informations-to-pg-stat-statements-tp5912306p5912461.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Jan Wieck
On Mon, Jul 18, 2016 at 10:05 AM, Tom Lane  wrote:

> Jan Wieck  writes:
> > In the meantime, would it be appropriate to backpatch the double linking
> > of memory context children at this time? I believe it has had plenty of
> > testing in the 9.6 cycle to be sure it didn't break anything.
>
> I'm concerned about the ABI breakage risk from changing a data structure
> as fundamental as MemoryContext.  Yeah, code outside utils/mmgr probably
> *shouldn't* be looking at that struct, but that doesn't mean it isn't.
> In the past we've generally only taken that sort of risk when there was
> no other way to fix a bug; and this patch isn't a bug fix.  While this
> does help performance in some corner cases, I don't think it's enough of
> an across-the-board win to justify the risk of back-patching.
>

I would consider mucking with the linked lists of memory context
children inside of 3rd-party code a really bad idea, but I concede.
It isn't a bug fix, and there is that small risk that somebody did
precisely that, so no backpatch.


Regards, Jan

-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info


Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-18 Thread Tom Lane
Greg Stark  writes:
> I wonder if we could go further. If we don't imagine having a very
> large number of allocators then we could just ask each one in turn if
> this block is one of theirs and which context it came from. That would
> allow an allocator that just allocated everything in a contiguous
> block to recognize pointers and return the memory context just by the
> range the pointer lies in.

The performance problem is not "large number of allocators", it's "large
number of distinct memory ranges".  Also, this will fail utterly to
recognize duplicate-pfree and bad-pointer cases.  Not that the existing
method is bulletproof, but this way has zero robustness against caller
bugs.

regards, tom lane




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Tom Lane
Merlin Moncure  writes:
> Hm, maybe, instead of trying to figure out if in a loop, set a
> 'called' flag  with each statement and only cache when touched the
> second time.  (If that's easier, dunno).

Well, then you just guarantee to lose once.  I think Jan's sketch of
marking potentially-cacheable expressions at compile time sounds
promising.  I'm envisioning a counter that starts at 1 normally or 0 in a
DO block, increment at beginning of parsing a loop construct, decrement at
end; then mark expressions/statements as cacheable if counter>0.

Of course the next question is what exactly to do differently for a
noncacheable expression.  My recollection is that that's all tied
pretty tightly to the plancache these days, so it might take a little
work.

regards, tom lane




Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-18 Thread Greg Stark
On Sun, Jul 17, 2016 at 1:55 PM, Robert Haas  wrote:
>
>On Wed, Jul 13, 2016 at 4:39 PM, Tom Lane  wrote:
>>
>> I wonder whether we could compromise by reducing the minimum "standard
>> chunk header" to be just a pointer to owning context, with the other
>> fields becoming specific to particular mcxt implementations.
>
> I think that would be worth doing.  It's not perfect, and the extra 8
> (or 4) bytes per chunk certainly do matter.

I wonder if we could go further. If we don't imagine having a very
large number of allocators then we could just ask each one in turn if
this block is one of theirs and which context it came from. That would
allow an allocator that just allocated everything in a contiguous
block to recognize pointers and return the memory context just by the
range the pointer lies in.

There could be optimizations: if the leading bytes point to a structure
with a decently large magic number, we could assume it's a valid header
and avoid the cost of checking with lots of allocators. But I'm
imagining that the list of allocators in use concurrently will be
fairly small, so that might not even be necessary.
greg




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Tom Lane
Jan Wieck  writes:
> In the meantime, would it be appropriate to backpatch the double linking
> of memory context children at this time? I believe it has had plenty of
> testing in the 9.6 cycle to be sure it didn't break anything.

I'm concerned about the ABI breakage risk from changing a data structure
as fundamental as MemoryContext.  Yeah, code outside utils/mmgr probably
*shouldn't* be looking at that struct, but that doesn't mean it isn't.
In the past we've generally only taken that sort of risk when there was
no other way to fix a bug; and this patch isn't a bug fix.  While this
does help performance in some corner cases, I don't think it's enough of
an across-the-board win to justify the risk of back-patching.

regards, tom lane




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Merlin Moncure
On Mon, Jul 18, 2016 at 8:59 AM, Jan Wieck  wrote:
>
>
> On Mon, Jul 18, 2016 at 9:43 AM, Tom Lane  wrote:
>>
>> Merlin Moncure  writes:
>> > BTW, while the fix does address the cleanup performance issue, it's
>> > still the case that anonymous code blocks burn up lots of resident
>> > memory (my 315k example I tested with ate around 8gb IIRC) when run
>> > like this.  My question is, if the pl/pgsql code block is anonymous
>> > and not in some kind of a loop, why bother caching the plan at all?
>>
>> Nobody got around to it.  Also, as you note, it's not as simple as
>> "don't cache if in a DO block".  You'd need to track whether you were
>> inside any sort of looping construct.  Depending on how difficult
>> that turned out to be, it might add overhead to regular functions
>> that we don't want.
>
> Agreed. And from the structures themselves it is not really easy to detect
> whether we are inside of a loop; the toplevel, while, for and if constructs
> all use the same statement block and call exec_stmts(), which in turn calls
> exec_stmt() for each element in that list. It is not impossible to add a
> flag, set at PL compile time, to that element and check it every time the
> statement is executed. But such a change definitely needs more testing and
> probably won't qualify for backpatching.

Right. Note, I'm not arguing for a backpatch here, just offering some
open speculation and some evidence that we still have a problem
(although not nearly as nasty a one -- the pre-patch behavior of not
responding to cancel was very dangerous, and that is solved).

Hm, maybe, instead of trying to figure out if in a loop, set a
'called' flag  with each statement and only cache when touched the
second time.  (If that's easier, dunno).

merlin




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Greg Stark
On Mon, Jul 18, 2016 at 2:41 PM,   wrote:
> And  I  will  be  really happy when there are processors with infinite
> performance and memory with infinite size.
>:)))

Well for what it's worth there's no theoretical difference between
multi-process and multi-threaded. They're just two different APIs to
create shared memory and other kernel data structures and they both
allow all the sharing you want.  In the API Postgres uses everything
starts out non-shared and we explicitly set up the parts we want
shared. In the other nearly everything starts shared though it's
possible to unshare parts. Once they're set up the CPU and MMU don't
really care what kernel API was used to set them up.

In other words, there's no theoretical reason you couldn't adapt a JVM
to create a large shared memory segment using mmap or SysV shared
memory and live entirely within that, including the Java stacks and
object allocations and so on. The net result of what's running on the
CPU can end up being exactly equivalent (though I suspect a natural
implementation would end up being slightly different), and there are
pros and cons to each API.

-- 
greg




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Jan Wieck
On Mon, Jul 18, 2016 at 9:43 AM, Tom Lane  wrote:

> Merlin Moncure  writes:
> > BTW, while the fix does address the cleanup performance issue, it's
> > still the case that anonymous code blocks burn up lots of resident
> > memory (my 315k example I tested with ate around 8gb IIRC) when run
> > like this.  My question is, if the pl/pgsql code block is anonymous
> > and not in some kind of a loop, why bother caching the plan at all?
>
> Nobody got around to it.  Also, as you note, it's not as simple as
> "don't cache if in a DO block".  You'd need to track whether you were
> inside any sort of looping construct.  Depending on how difficult
> that turned out to be, it might add overhead to regular functions
> that we don't want.


Agreed. And from the structures themselves it is not really easy to detect
whether we are inside of a loop; the toplevel, while, for and if constructs
all use the same statement block and call exec_stmts(), which in turn calls
exec_stmt() for each element in that list. It is not impossible to add a
flag, set at PL compile time, to that element and check it every time the
statement is executed. But such a change definitely needs more testing and
probably won't qualify for backpatching.

In the meantime, would it be appropriate to backpatch the double linking
of memory context children at this time? I believe it has had plenty of
testing in the 9.6 cycle to be sure it didn't break anything.


Regards, Jan

-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info


Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Tom Lane
Merlin Moncure  writes:
> BTW, while the fix does address the cleanup performance issue, it's
> still the case that anonymous code blocks burn up lots of resident
> memory (my 315k example I tested with ate around 8gb IIRC) when run
> like this.  My question is, if the pl/pgsql code block is anonymous
> and not in some kind of a loop, why bother caching the plan at all?

Nobody got around to it.  Also, as you note, it's not as simple as
"don't cache if in a DO block".  You'd need to track whether you were
inside any sort of looping construct.  Depending on how difficult
that turned out to be, it might add overhead to regular functions
that we don't want.

regards, tom lane




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi

> This https://github.com/davecramer/plj-new is a very old project
> that did work at one time which attempted to do RPC calls to the jvm to 
> address exactly this problem.

> However "cheaply" calling jvm from sql or vice-versa is not really possible.
> I do like the idea of the background worker and shared memory though.

These are not opposing concepts; it's like a two-level cache.
Some things are best served by shared memory,
while sharing at the upper layer is best served by a shared process.
And there are things that should not be shared at all.
Any deviation is always overhead.

But to be honest, I really do not like "sharing".
It is against human nature.
And I will be really happy when there are processors with infinite
performance and memory of infinite size.
:)))



Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Merlin Moncure
On Sat, Jul 16, 2016 at 2:47 PM, Jan Wieck  wrote:
> On Tue, Jul 12, 2016 at 3:29 PM, Merlin Moncure  wrote:
>>
>> I've noticed that pl/pgsql functions/do commands do not behave well
>> when the statement resolves and frees memory.   To be clear:
>>
>> FOR i in 1..100
>> LOOP
>>   INSERT INTO foo VALUES (i);
>> END LOOP;
>>
>> ...runs just fine while
>>
>> BEGIN
>>   INSERT INTO foo VALUES (1);
>>   INSERT INTO foo VALUES (2);
>>   ...
>>   INSERT INTO foo VALUES (100);
>> END;
>
>
> This sounds very much like what led to commit
> 25c539233044c235e97fd7c9dc600fb5f08fe065.
>
> It seems that patch was only applied to master and never backpatched to 9.5
> or earlier.

You're right; thanks (my bad for missing that).  For those following
along, the case that turned this up was:
DO


...;

Where the insertion step was a large number of standalone insert statements.

(temp table creation isn't necessary to turn up this bug, but it's a
common pattern when sending batch updates to a server).

The workaround I recommend is to wrap the statements in a function, like this:

do $d$
begin

create function doit() returns void as
$$
  
$$ language sql;
perform doit();
end;
$d$;

BTW, while the fix does address the cleanup performance issue, it's
still the case that anonymous code blocks burn up lots of resident
memory (my 315k example I tested with ate around 8gb IIRC) when run
like this.  My question is, if the pl/pgsql code block is anonymous
and not in some kind of a loop, why bother caching the plan at all?

merlin




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread Dave Cramer
On 18 July 2016 at 06:04,  wrote:

> Hi
>
> > There's https://github.com/jnr/jnr-ffi that enables to call C
> > functions without resorting to writing JNI wrappers.
> I have not said that you are wrong.
> It's the dark side of "like a separate process":
> they can cheaply call SQL from the JVM,
> but they can't cheaply call the JVM from SQL.
>

This https://github.com/davecramer/plj-new is a very old project that did
work at one time which attempted to do RPC calls to the jvm to address
exactly this problem.

However "cheaply" calling jvm from sql or vice-versa is not really possible.

I do like the idea of the background worker and shared memory though.

Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] pg_xlogdump follow into the future

2016-07-18 Thread Magnus Hagander
On Thu, Jul 14, 2016 at 8:27 PM, Andres Freund  wrote:

> On 2016-07-14 13:46:23 +0200, Magnus Hagander wrote:
> > Currently, if you run pg_xlogdump with -f, you have to specify an end
> > position in an existing file, or if you don't it will only follow until
> the
> > end of the current file.
>
> That's because specifying a file explicitly says that you only want to
> look at that file, specifying two files that you want the range
> inclusively between the two files.  -f works if you just use -s.
>

Hmm. It does now. I'm *sure* it didn't when I was testing it. It must've
been something else that was broken at that point :)



> > I'd appreciate a review of that by someone who's done more work on the
> xlog
> > stuff, but it seems trivial to me. Not sure I can argue it's a bugfix
> > though, since the usecase simply did not work...
>
> I'd say it's working as intended, and you want to change that
> intent. That's fair, but I'd not call it a bug, and I'd say it's not
> really 9.6 material.
>

Based on that, I agree that it's working as intended.

And definitely that it's not 9.6 material.

I'll stick it on the CF page so I don't forget about it.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi

> There's https://github.com/jnr/jnr-ffi that enables to call C
> functions without resorting to writing JNI wrappers.
I have not said that you are wrong.
It's the dark side of "like a separate process":
they can cheaply call SQL from the JVM,
but they can't cheaply call the JVM from SQL.

The JVM in Oracle appeared a long time ago,
maybe when the Java thread model had many faults (before JVM 1.6, for
example). Nowadays it seems to make sense only for compatibility.







Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi


> I admit that it is risky, but I think there are things that could be
> done to limit the risk.  I don't believe we can indefinitely continue
> to ignore the potential performance benefits of making a switch like
> this.  Breaking a thirty-year old code base irretrievably would be
> sad, but letting it fade into irrelevance because we're not willing to
> make the architecture changes that are needed to remain relevant would
> be sad, too.

I can add that nowadays parallel processing seems to be the only way
to scale. We can't count on CPU clock speeds increasing in the coming
years.

I understand that the use of threads has some difficulties.
I cannot understand why the use of threads would have disadvantages.
Actually, I think that parallelizing using threads is much easier than
parallelizing using processes.





Re: [HACKERS] Reviewing freeze map code

2016-07-18 Thread Andres Freund
On 2016-07-18 01:33:10 -0700, Andres Freund wrote:
> On 2016-07-18 10:02:52 +0530, Amit Kapila wrote:
> > On Mon, Jul 18, 2016 at 9:13 AM, Andres Freund  wrote:
> > > On 2016-07-18 09:07:19 +0530, Amit Kapila wrote:
> > >> + /*
> > >> + * Before locking the buffer, pin the visibility map page if it may be
> > >> + * necessary.
> > >> + */
> > >>
> > >> + if (PageIsAllVisible(BufferGetPage(*buffer)))
> > >> + visibilitymap_pin(relation, block, &vmbuffer);
> > >> +
> > >>   LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> > >>
> > >> I think we need to check for PageIsAllVisible and try to pin the
> > >> visibility map after taking the lock on buffer. I think it is quite
> > >> possible that in the time this routine tries to acquire lock on
> > >> buffer, the page becomes all visible.
> > >
> > > I don't see how. Without a cleanup lock it's not possible to mark a page
> > > all-visible/frozen.
> > >
> >
> > Consider the below scenario.
> >
> > Vacuum
> > a. acquires a cleanup lock for page - 10
> > b. busy in checking visibility of tuples
> > --assume, here it takes some time and in the meantime Session-1
> > performs step (a) and (b) and start waiting in step- (c)
> > c. marks the page as all-visible (PageSetAllVisible)
> > d. unlock and release the buffer
> >
> > Session-1
> > a. In heap_lock_tuple(), readbuffer for page-10
> > b. check PageIsAllVisible(), found page is not all-visible, so didn't
> > acquire the visibilitymap_pin
> > c. LockBuffer in ExclusiveMode - here it will wait for vacuum to
> > release the lock
> > d. Got the lock, but now the page is marked as all-visible, so ideally
> > need to recheck the page and acquire the visibilitymap_pin
>
> So, I've tried pretty hard to reproduce that. While the theory above is
> sound, I believe the relevant code-path is essentially dead for SQL
> callable code, because we'll always hold a buffer pin before even
> entering heap_update/heap_lock_tuple.  It's possible that you could
> concoct a dangerous scenario with follow_updates though; but I can't
> immediately see how.  Due to that, and based on the closing-in beta
> release, I'm planning to push a version of the patch that has the
> returns fixed, but not this.  It seems better to have the majority of
> the fix in.

Pushed that way. Let's try to figure out a) how to test this case and
b) how to fix it in a reasonable way. Note that there's also
http://archives.postgresql.org/message-id/20160718071729.tlj4upxhaylwv75n%40alap3.anarazel.de
which seems related.

Regards,

Andres




Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi


> The issue here is an architectural mismatch between PostgreSQL and
> the JVM, made worse by the user's very stored-proc-heavy code. Some
> other runtime that's designed to co-operate with a multiprocessing
> environment could well be fine, but the JVM isn't. At least, the 
> Sun/Oracle/OpenJDK JVM isn't.

Actually, the lack of threads makes any VM quite limited in some
aspects of scalability.
The desire to use the JVM comes from having no desire to reinvent the
wheel.





Re: [HACKERS] One process per session lack of sharing

2016-07-18 Thread AMatveev
Hi


>  Such a host delegate process could be explicitly built with
> multithread support and not 'infect' the rest of the code with its 
> requirements.
>  
>  Using granular RPC is nice for isolation but I am concerned that the 
> latencies might be high.
I agree with you.
Moreover, I think that some designs make no sense with this "thread model"
at all.
For example Embedded oracle XML DB applications:
http://docs.oracle.com/cd/B28359_01/appdev.111/b28369/xdb23jv1.htm#i1043708
"""
You can run a Java servlet.
Servlets work better as the top-level entry point into Oracle Database, and 
require using HTTP(S) as the protocol to access Oracle Database.
"""

> What I know about Oracle, PL/SQL, Java - all is executed as
> outprocess calls. I am sure, so PL doesn't share process with SQL engine there

It's more accurate to say that Java is executed like out-of-process calls.
That is done for broad library support,
but it is not a separate process.
"""
The JDBC server-side internal driver,
the Oracle JVM, the database, and the SQL engine all run within the same 
address space, and therefore, the issue of network round-trips is irrelevant
"""
http://docs.oracle.com/cd/E11882_01/java.112/e16548/overvw.htm#JJDBC28026





[HACKERS] Typos in logical decoding

2016-07-18 Thread Antonin Houska
While reading the logical decoding code I noticed a few apparently
mistyped comments; see the diff below.

Besides that, the output_plugin_options argument of the
CreateInitDecodingContext function is unused, and the function passes
NIL to the corresponding argument of StartupDecodingContext. This
looks to me like a thinko.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
new file mode 100644
index 7c8a777..ecf9a03
*** a/src/backend/replication/logical/logical.c
--- b/src/backend/replication/logical/logical.c
*** CreateInitDecodingContext(char *plugin,
*** 281,287 
  	LWLockRelease(ProcArrayLock);
  
  	/*
! 	 * tell the snapshot builder to only assemble snapshot once reaching the a
  	 * running_xact's record with the respective xmin.
  	 */
  	xmin_horizon = slot->data.catalog_xmin;
--- 281,287 
  	LWLockRelease(ProcArrayLock);
  
  	/*
! 	 * tell the snapshot builder to only assemble snapshot once reaching the
  	 * running_xact's record with the respective xmin.
  	 */
  	xmin_horizon = slot->data.catalog_xmin;
*** LogicalIncreaseRestartDecodingForSlot(XL
*** 880,886 
  }
  
  /*
!  * Handle a consumer's conformation having received all changes up to lsn.
   */
  void
  LogicalConfirmReceivedLocation(XLogRecPtr lsn)
--- 880,886 
  }
  
  /*
!  * Handle a consumer's confirmation having received all changes up to lsn.
   */
  void
  LogicalConfirmReceivedLocation(XLogRecPtr lsn)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
new file mode 100644
index 00e31a2..9e2208a
*** a/src/backend/replication/logical/reorderbuffer.c
--- b/src/backend/replication/logical/reorderbuffer.c
*** ReorderBufferGetTupleBuf(ReorderBuffer *
*** 466,473 
  	/*
  	 * Most tuples are below MaxHeapTupleSize, so we use a slab allocator for
  	 * those. Thus always allocate at least MaxHeapTupleSize. Note that tuples
! 	 * tuples generated for oldtuples can be bigger, as they don't have
! 	 * out-of-line toast columns.
  	 */
  	if (alloc_len < MaxHeapTupleSize)
  		alloc_len = MaxHeapTupleSize;
--- 466,473 
  	/*
  	 * Most tuples are below MaxHeapTupleSize, so we use a slab allocator for
  	 * those. Thus always allocate at least MaxHeapTupleSize. Note that tuples
! 	 * generated for oldtuples can be bigger, as they don't have out-of-line
! 	 * toast columns.
  	 */
  	if (alloc_len < MaxHeapTupleSize)
  		alloc_len = MaxHeapTupleSize;
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
new file mode 100644
index b4dc617..b5fa3db
*** a/src/backend/replication/logical/snapbuild.c
--- b/src/backend/replication/logical/snapbuild.c
*** SnapBuildEndTxn(SnapBuild *builder, XLog
*** 901,907 
  	/*
  	 * NB: This handles subtransactions correctly even if we started from
  	 * suboverflowed xl_running_xacts because we only keep track of toplevel
! 	 * transactions. Since the latter are always are allocated before their
  	 * subxids and since they end at the same time it's sufficient to deal
  	 * with them here.
  	 */
--- 901,907 
  	/*
  	 * NB: This handles subtransactions correctly even if we started from
  	 * suboverflowed xl_running_xacts because we only keep track of toplevel
! 	 * transactions. Since the latter are always allocated before their
  	 * subxids and since they end at the same time it's sufficient to deal
  	 * with them here.
  	 */
*** SnapBuildCommitTxn(SnapBuild *builder, X
*** 981,987 
  		 * we reached consistency.
  		 */
  		forced_timetravel = true;
! 		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running to early", xid);
  	}
  
  	for (nxact = 0; nxact < nsubxacts; nxact++)
--- 981,987 
  		 * we reached consistency.
  		 */
  		forced_timetravel = true;
! 		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
  	}
  
  	for (nxact = 0; nxact < nsubxacts; nxact++)



Re: [HACKERS] Reviewing freeze map code

2016-07-18 Thread Andres Freund
On 2016-07-18 10:02:52 +0530, Amit Kapila wrote:
> On Mon, Jul 18, 2016 at 9:13 AM, Andres Freund  wrote:
> > On 2016-07-18 09:07:19 +0530, Amit Kapila wrote:
> >> + /*
> >> + * Before locking the buffer, pin the visibility map page if it may be
> >> + * necessary.
> >> + */
> >>
> >> + if (PageIsAllVisible(BufferGetPage(*buffer)))
> >> + visibilitymap_pin(relation, block, &vmbuffer);
> >> +
> >>   LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> >>
> >> I think we need to check for PageIsAllVisible and try to pin the
> >> visibility map after taking the lock on buffer. I think it is quite
> >> possible that in the time this routine tries to acquire lock on
> >> buffer, the page becomes all visible.
> >
> > I don't see how. Without a cleanup lock it's not possible to mark a page
> > all-visible/frozen.
> >
> 
> Consider the below scenario.
> 
> Vacuum
> a. acquires a cleanup lock for page - 10
> b. busy in checking visibility of tuples
> --assume, here it takes some time and in the meantime Session-1
> performs step (a) and (b) and start waiting in step- (c)
> c. marks the page as all-visible (PageSetAllVisible)
> d. unlock and release the buffer
> 
> Session-1
> a. In heap_lock_tuple(), readbuffer for page-10
> b. check PageIsAllVisible(), found page is not all-visible, so didn't
> acquire the visibilitymap_pin
> c. LockBuffer in ExclusiveMode - here it will wait for vacuum to
> release the lock
> d. Got the lock, but now the page is marked as all-visible, so ideally
> need to recheck the page and acquire the visibilitymap_pin

So, I've tried pretty hard to reproduce that. While the theory above is
sound, I believe the relevant code-path is essentially dead for SQL
callable code, because we'll always hold a buffer pin before even
entering heap_update/heap_lock_tuple.  It's possible that you could
concoct a dangerous scenario with follow_updates though; but I can't
immediately see how.  Due to that, and based on the closing-in beta
release, I'm planning to push a version of the patch that has the
returns fixed, but not this.  It seems better to have the majority of
the fix in.

Andres




Re: [HACKERS] Changing the result set to contain the cost of the optimizer's chosen plan

2016-07-18 Thread Srinivas Karthik V
Hi Tom,

Yes, we do indeed get the cost of the plan in the first line itself;
somehow I missed that point. In fact we just implemented this
functionality and it's working. Thanks again.

Regards,
Srinivas Karthik

On Tue, Jul 12, 2016 at 6:43 AM, Craig Ringer  wrote:

> On 11 July 2016 at 23:29, Tom Lane  wrote:
>
>> Srinivas Karthik V  writes:
>> > Specifically, I have a Java program which calls
>> > ResultSet rs = statement.executeQuery("explain select * from table");
>> > I would like to change PostgreSQL such that ResultSet rs should contain
>> a
>> > field that contains also the cost of the optimizer chosen plan.
>>
>> Why do you need to change anything?  The cost is right there in the
>> first line of the result text.  It might be easier to parse out if
>> you use one of EXPLAIN's intended-to-be-machine-readable output
>> formats, though. 
>
>
> Yeah - if we were going to do this at all, it'd want to be output that
> decomposes _all_ the explain output into columns.  But since we can emit
> json, xml, etc, I don't really see the point.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


[HACKERS] heap_update() VM retry could break HOT?

2016-07-18 Thread Andres Freund
Hi,

heap_update() retries pinning the visibility map page, as explained in
the following comments:

	/*
	 * Before locking the buffer, pin the visibility map page if it appears to
	 * be necessary.  Since we haven't got the lock yet, someone else might be
	 * in the middle of changing this, so we'll need to recheck after we have
	 * the lock.
	 */
	if (PageIsAllVisible(page))
		visibilitymap_pin(relation, block, &vmbuffer);

...

	/*
	 * If we didn't pin the visibility map page and the page has become all
	 * visible while we were busy locking the buffer, or during some
	 * subsequent window during which we had it unlocked, we'll have to unlock
	 * and re-lock, to avoid holding the buffer lock across an I/O.  That's a
	 * bit unfortunate, especially since we'll now have to recheck whether the
	 * tuple has been locked or updated under us, but hopefully it won't
	 * happen very often.
	 */
	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
	{
		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
		visibilitymap_pin(relation, block, &vmbuffer);
		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
		goto l2;
	}

unfortunately the l2 target is after the following:
	/*
	 * If we're not updating any "key" column, we can grab a weaker lock type.
	 * This allows for more concurrency when we are running simultaneously
	 * with foreign key checks.
	 *
	 * Note that if a column gets detoasted while executing the update, but
	 * the value ends up being the same, this test will fail and we will use
	 * the stronger lock.  This is acceptable; the important case to optimize
	 * is updates that don't manipulate key columns, not those that
	 * serendipitiously arrive at the same key values.
	 */
	HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, id_attrs,
								 &satisfies_hot, &satisfies_key,
								 &satisfies_id, &oldtup, newtup);
	if (satisfies_key)
	{
		*lockmode = LockTupleNoKeyExclusive;
		mxact_status = MultiXactStatusNoKeyUpdate;
		key_intact = true;

		/*
		 * If this is the first possibly-multixact-able operation in the
		 * current transaction, set my per-backend OldestMemberMXactId
		 * setting. We can be certain that the transaction will never become a
		 * member of any older MultiXactIds than that.  (We have to do this
		 * even if we end up just using our own TransactionId below, since
		 * some other backend could incorporate our XID into a MultiXact
		 * immediately afterwards.)
		 */
		MultiXactIdSetOldestMember();
	}
	else
	{
		*lockmode = LockTupleExclusive;
		mxact_status = MultiXactStatusUpdate;
		key_intact = false;
	}

as far as I can see that could mean that we perform hot updates when not
permitted, because the tuple has been replaced since, including the
pkey. Similarly, the wrong tuple lock mode could end up being used.

Am I missing something?

- Andres




Re: [HACKERS] RecoveryTargetTLI dead variable in XLogCtlData

2016-07-18 Thread Michael Paquier
On Wed, Jul 13, 2016 at 12:29 PM, Michael Paquier
 wrote:
> I just bumped into $subject, a variable that is never set and never used:
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -631,8 +631,6 @@ typedef struct XLogCtlData
> TimeLineID  replayEndTLI;
> /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
> TimestampTz recoveryLastXTime;
> -   /* current effective recovery target timeline */
> -   TimeLineID  RecoveryTargetTLI;

d57a9734 has missed this cleanup.
-- 
Michael

