Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Peter Geoghegan
On Sun, Sep 1, 2013 at 4:06 AM, Andres Freund  wrote:
>> We're looking for the first duplicate. So it would probably be okay
>> for the IGNORE case to not bother retrying and re-locking if the other
>> transaction committed (which, at least very broadly speaking, is the
>> likely outcome).
>
> Hm. Could you explain this a bit more? Do you mean that we can stop
> trying to insert after the first violation (yes, sure if so)?

Yeah, that's all I mean. Nothing controversial there.

> I can't really agree with this part. First off, a heap_insert()
> certainly isn't lightweight operation. There's several things that can
> take extended time:
> * toasting of rows (writing a gigabyte worth of data...)
> * extension of the heap/toast table
> * writeout of a dirty victim buffer if the page isn't in s_b
> * reading in the target page if the page isn't in s_b
> * waiting for an exclusive lock on the target pages

These are all valid concerns. I'll need to better assess just how bad
this can get (hopefully with help from others). But some of this stuff
can surely be done separately for my case. For example, surely we
could make TOASTing occur after index insertion proper when locks are
released, without too much work.

> Secondly, you seem to be saying that such a lock would only block other
> sessions from finishing of tuple insertions - but it will block normal
> index searches as well? I have a hard time accepting the fact that one
> session using INSERT ... KEY IGNORE will block lookup of a range of values
> from the index in *other* sessions.

I'm sorry that I created the impression that I was denying an impact
on read queries as compared to regular insertion. Of course, regular
insertion also blocks read queries for very small intervals, and there
could be quite a bit of work to be done with that exclusive lock held
already. I am only proposing increasing that period for this case, and
usually by not terribly much.

> I'd expect that with the current implementation strategy if you
> e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
> actually hitting duplicates, you'd see a very, very noticeable hit in
> scalability. I wouldn't expect a single session to get much slower bug
> trying a dozen or two should show a very noticeable dip.

I do not accept the premise of your questions regarding the patch's
performance, which appears to be that it's fair to expect the same
level of performance of INSERT .,. ON DUPLICATE IGNORE as it is to
expect of regular INSERT. No scheme is going to get that, including
your scheme, which necessitates WAL logging everything related to
index tuple insertion twice for each unique index. We certainly didn't
require SSI to perform as well as the current REPEATABLE READ, and
certainly not as well as READ COMMITTED, and for good reasons.

That said, I'm not dismissive of these concerns. As I said already, by
all means, let us scrutinise the performance of the patch.

I'm not going to bother benchmarking the patch myself until I give it
another whack. There is some really obvious low-hanging performance
fruit. Obviously by this I mean doing as little as possible when
holding the exclusive lock.

> I think we could optimize away the second search with some
> cleverness. If we remember & pin the page from the first search we know
> that the promise will have to be either on that page or - after a page
> split - potentially to ones on the right. Which we should be able to find
> by repeatedly stepping right. Similar to logic of doing the
> _bt_moveright() in _bt_doinsert().

That may well be possible, but we'd still have to do a binary search
on the page. And under your proposal, that will happen twice - each
time with an exclusive buffer lock held. Since our intent would be to
update in-place, the entire binary search operation would need a full
write/exclusive lock the second time around as well. You might even
end up doing more work with an exclusive/write lock held than under my
proposal, before you even factor in the other increased costs
(granted, that's not that likely, but you're still doing much more
with exclusive locks held than the regular insertion case).

What you describe here is the kind of trick that could be very useful
to another, existing common cases -- blocking on an xid in the event
of a possible duplicate (as you know, it's okay to hold pins pretty
much indefinitely) -- and yet has remained unimplemented for many
years.

> I am pretty sure it would be better. We know that nbtree works pretty
> damn well under concurrency. And the "promise" logic wouldn't change
> anything of that.

It would have multiple extra steps. Including WAL-logging of the
initial insertion, and clean-up for the IGNORE case. And WAL-logging
of the cleanup. Instead of doing exactly nothing extra in the IGNORE
case, which, under my proposal, has an additional duration of the
exclusive lock being held of zero (instead of the ereport() call we
just immediately release the lock and report 

Re: [HACKERS] backup.sgml-cmd-v003.patch

2013-09-02 Thread Karl O. Pinc
On 09/02/2013 10:56:54 PM, Karl O. Pinc wrote:
> I have frobbed your  to adjust the indentation and
> line-wrap style.  

Oops.  Somehow left a \ out of this.  Anyhow, you get the idea.



Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further XLogInsert scaling tweaking

2013-09-02 Thread Bruce Momjian
On Mon, Sep  2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote:
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index 39c58d0..28e62ea 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -428,8 +428,14 @@ typedef struct XLogCtlInsert
>   uint64  CurrBytePos;
>   uint64  PrevBytePos;
>  
> - /* insertion slots, see above for details */
> - XLogInsertSlotPadded *insertSlots;
> + /*
> +  * Make sure the above heavily-contended spinlock and byte positions are
> +  * on their own cache line. In particular, the RedoRecPtr and full page
> +  * write variables below should be on a different cache line. They are
> +  * read on every WAL insertion, but updated rarely, and we don't want
> +  * those reads to steal the cache line containing Curr/PrevBytePos.
> +  */
> + charpad[128];

Do we adjust for cache line lengths anywhere else?  PGPROC?  Should it
be a global define?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory

2013-09-02 Thread Noah Misch
On Tue, Sep 03, 2013 at 12:52:22AM +0200, Andres Freund wrote:
> On 2013-09-01 12:07:04 -0400, Noah Misch wrote:
> > On Sun, Sep 01, 2013 at 05:08:38PM +0200, Andres Freund wrote:
> > > On 2013-09-01 09:24:00 -0400, Noah Misch wrote:
> > > > The difficulty depends on whether processes other than the segment's 
> > > > creator
> > > > will attach anytime or only as they start.  Attachment at startup is 
> > > > enough
> > > > for parallel query, but it's not enough for something like lock table
> > > > expansion.  I'll focus on the attach-anytime case since it's more 
> > > > general.
> > > 
> > > Even on startup it might get more complicated than one immediately
> > > imagines on EXEC_BACKEND type platforms because their memory layout
> > > doesn't need to be the same. The more shared memory you need, the harder
> > > that will be. Afair
> > 
> > Non-Windows EXEC_BACKEND is already facing a dead end that way.
> 
> Not sure whether you mean non-windows EXEC_BACKEND isn't going to be
> supported for much longer or that it already has problems.

It already has problems: ASLR measures sometimes prevent reattachment of the
main shared memory segment.  Multiplying the combined size of our
fixed-address mappings does not push us over some threshold where this becomes
a problem, because it is already a problem.

> > > Note that allocating a large mapping, even without using it, has
> > > noticeable cost, at least under linux. The kernel has to create & copy
> > > data to track each pages state (without copying the memory content's
> > > itself due to COW) for every fork afterwards.

> So, after reading up on the issue a bit more and reading some more
> kernel code, a large mmap(PROT_NONE, MAP_PRIVATE) won't cause much
> problems except counting in ulimit -v. It will *not* cause overcommit
> violations. mmap(PROT_NONE, MAP_SHARED) will tho, even if not yet
> faulted. Which means that to be reliable and not violate overcommit we'd
> need to munmap() a chunk of PROT_NONE, MAP_PRIVATE memory, and
> immediately (without interceding mallocs, using mmap itself) map it again.
> 
> It only gets really expensive in the sense of making fork expensive if
> you set protections on many regions in that mapping individually. Each
> mprotect() call will split the VMA into distinct pieces and they won't
> get merged even if there are neighboors with the same settings.

Thanks for researching that.

> > > > I don't foresee fundamental differences on 32-bit.  All the allocation
> > > > maximums scale down, but that's the usual story for 32-bit.
> > > 
> > > If you actually want to allocate memory after starting up, without
> > > carving a section out for that from the beginning, the memory
> > > fragmentation will make it very hard to find memory addresses of the
> > > same across processes.
> > 
> > True.  I wouldn't feel bad if total dynamic shared memory usage above, say,
> > 256 MiB were unreliable on 32-bit.  If you're still running 32-bit in 2015,
> > you probably have a low-memory platform.
> 
> Not sure. I think that will partially depend on whether x32 will have
> any success which I still find hard to judge.

I won't hold my breath for x32 becoming a common platform for high-memory
database servers, regardless of other successes it might find.  Not
impossible, but I recommend placing trivial priority on maximizing performance
for that scenario.

> > I think the take-away is that we have a lot of knobs available, not a bright
> > line between possible and impossible.  Robert opted to omit provision for
> > reliable fixed addresses, and the upsides of that decision are the absence 
> > of
> > a DBA-unfriendly space-reservation GUC, trivial overhead when the APIs are 
> > not
> > used, and a clearer portability outlook.
> 
> I guess my point is that if we want to develop stuff that requires
> reliable addresses, we should build support for that from a low level
> up. Not rely on a hack^Wlayer ontop of the actual dynamic shared memory
> API.
> That is, it should be a flag to dsm_create() that we require a fixed
> address and dsm_attach() will then automatically use that or die
> trying. Requiring implementations to take care about passing addresses
> around and fiddling with mmap/windows api to make sure those mappings
> are possible doesn't strike me to be a good idea.

I agree.

> In the end, you're going to be the primary/first user as far as I
> understand things, so you'll have to argue whether we need fixed
> addresses or not. I don't think it's a good idea to forgo this decision
> on this layer and bolt on another ontop if we decide it's neccessary.

We don't need fixed addresses.  Parallel internal sort will probably include
the equivalent of a SortTuple array in its shared memory segment, and that
implies relative pointers to the tuples also stored in shared memory.  I
expect that wart to be fairly isolated within the code, so little harm done.

I don't think we will have at all painted ourselves into a corner, should 

Re: [HACKERS] Extension Templates S03E11

2013-09-02 Thread Michael Paquier
On Tue, Sep 3, 2013 at 4:20 AM, David Fetter  wrote:
> On Mon, Sep 02, 2013 at 02:32:16AM -0400, Peter Eisentraut wrote:
>> On Thu, 2013-08-29 at 12:16 +0200, Dimitri Fontaine wrote:
>> > Here's v14 of the patch with pg_upgrade support fixed, so that the
>> > automated setup that Peter built is able to have at it!
>>
>> Fails cpluspluscheck:
>>
>> In file included from /tmp/cpluspluscheck.5g2uWw/test.cpp:3:0:
>> ./src/include/commands/template.h:47:8: error: ‘ExtensionControl’ does not 
>> name a type
>> ./src/include/commands/template.h:51:8: error: ‘ExtensionControl’ does not 
>> name a type
>>
>> I think this actually just means the header does not include all it needs by 
>> itself.
>
> Is there some standard set of checks you run on new patches, and are
> the results showing up on, say, the buildfarm or some other CI
> dashboard?
I believe that Peter does all those checks using his own Jenkins environment:
http://pgci.eisentraut.org/jenkins/
For the commit fest patches here you go:
http://pgci.eisentraut.org/jenkins/view/All/job/postgresql_commitfest_world/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-02 Thread Peter Geoghegan
On Mon, Sep 2, 2013 at 7:10 PM, Jeff Davis  wrote:
> I'd like to submit a patch to just make it into a PGC_POSTMASTER and
> remove the code to support changing it.

Makes sense to me.

I wonder, is anyone really running in production with full_page_writes
off? I talked to someone a while ago who used Postgres on ZFS with
data journaling, and was perfectly well aware of the fact that
theoretically he could safely turn the setting off, and yet chose not
to. Now, I know that Greg Smith's book describes the conditions in
which it's acceptable and the precautions that should be taken and so
on, but in my (admittedly relatively limited) experience, no one
actually does it in production. My sample size for people who at least
considered doing it (that both believed that Postgres could write
pages atomically on their hardware/FS, and also knew that
full_page_writes exists and what it means) is only 1.

At least in people's minds, it might be that the knowledge that no one
runs with full_page_writes off is enough to put them off. It's like
building Postgres with a non-standard BLCKSZ -- even if you had solid
evidence that it would help performance in your case, would you really
want to do it?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-02 Thread Jeff Davis
There is a significant amount of code supporting the changing of
full_page_writes while the server is running, including an
XLOG_FPW_CHANGE wal record. But I don't see the use-case; surely almost
nobody changes this on a running server, because either your filesystem
guarantees atomic page writes for you or not.

I'd like to submit a patch to just make it into a PGC_POSTMASTER and
remove the code to support changing it.

Then, I intend to write another patch to make the full-page writes for
checksums honor the full_page_writes setting. That will be easier to
write once it's a PGC_POSTMASTER.

Any reason to keep the setting as a PGC_SIGHUP?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory

2013-09-02 Thread Robert Haas
On Mon, Sep 2, 2013 at 6:52 PM, Andres Freund  wrote:
> Not sure whether you mean non-windows EXEC_BACKEND isn't going to be
> supported for much longer or that it already has problems.

I'm not sure what Noah was getting at, but I have used EXEC_BACKEND
twice now during development, in situations where I would have needed
a Windows development otherwise.  So it's definitely useful, at least
to me.  But on my MacBook Pro, you have to compile it with -fno-pie (I
think that's the right flag) to disable ASLR in order to get reliable
operation.  I imagine such problems will become commonplace on more
and more platforms as time wears on.

> I guess my point is that if we want to develop stuff that requires
> reliable addresses, we should build support for that from a low level
> up. Not rely on a hack^Wlayer ontop of the actual dynamic shared memory
> API.
> That is, it should be a flag to dsm_create() that we require a fixed
> address and dsm_attach() will then automatically use that or die
> trying. Requiring implementations to take care about passing addresses
> around and fiddling with mmap/windows api to make sure those mappings
> are possible doesn't strike me to be a good idea.
>
> In the end, you're going to be the primary/first user as far as I
> understand things, so you'll have to argue whether we need fixed
> addresses or not. I don't think it's a good idea to forgo this decision
> on this layer and bolt on another ontop if we decide it's neccessary.

I didn't intend to punt that decision to another layer so much as
another patch and a more detailed examination of requirements.   IME,
given a choice between something that is 99% reliable and provides
more functionality, or something that is 99.99% reliable and provides
less functionality, this community picks the latter every time.  And
that's why I've left out any capability to insist on a fixed address
from this patch.  It would be nice to have, to be sure.  But it also
would take more work and add more complexity, and I don't have a clear
sense that that work would be justified.

Now, we might get to a point where it seems clear that we're not going
to get any further with parallelism without adding a capability for
fixed-address mappings.  If that happens, I think that's the time to
come back to this layer and add that capability.  But right now it
doesn't seem essential.  Now, having said that, I didn't see any
particular reason to bury the ability to pass mmap() or shmat() a
*preferred* address.  But IJWH.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 RC1 psql encoding reporting inconsistently?

2013-09-02 Thread David Johnston
Tom Lane-2 wrote
> Michael Nolan <

> htfoot@

> > writes:
>> This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding
>> as SQL_ASCII and \set report it as UTF8?
> 
> psql sets client_encoding based on its environment (LANG or related
> variables).  That's been true for some time --- since 9.1, according
> to a quick check.
> 
>   regards, tom lane

My knowledge of encoding is minimal but to expand on the comment:

Client and server (or, more specifically, database) encodings can and often
do differ just as you are seeing here.

I'm guessing that somewhere deep inside psql and/or postgres encoding
conversion is performed if the client and server do not match.  While I
guess it is possible to try and auto-adapt the client encoding to match the
server/database the current policy is to require the user to explicitly (so
to speak) declare the encoding they are using on their client.

I guess a counter-question would be: what would you expect "\set" to report
and why?

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/9-3-RC1-psql-encoding-reporting-inconsistently-tp5769334p5769339.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.4] row level security

2013-09-02 Thread Bruce Momjian
On Fri, Aug 30, 2013 at 04:24:06PM -0400, Stephen Frost wrote:
> > The scenario I'm worried about is where somebody says "hey, Postgres has
> > RLS now, I can rely on that to hide my sooper sekrit data from other users
> > in the same database", and later they have a security breach through some
> > covert-channel attack.  Are they going to blame themselves?  No, they're
> > gonna blame Postgres.  Or consider the case where some bozo publishes
> > a method for such an attack and uses it to badmouth us as insecure.
> 
> In my experience, issues are discovered, patched, and released as
> security updates.  Does adding RLS mean we might have more of those?
> Sure, as did column level privileges and as does *any* such increase in
> the granularity of what we can provide security-wise.

Releasing a feature we think is perfect, and finding out later is isn't,
and fixing it, is not the same as releasing a feature we _know_ isn't
perfect, and having no idea how to fix it.

>From later discussions, it seems like we have to accept that we will
never be able to avoid all convert channels, e.g. timing queries, but we
can avoid the most obvious ones, e.g. EXPLAIN, and improve it as we go.

Knowing there is no end to improving it does make some people feel
uncomfortable, and I can't think of another feature we have added with
such an open-ended nature.  We do have open-ended performance features,
but here is seems the value of the feature itself, security, is not
attainable.  Perhaps reasonable security is attainable.

I am not saying we should reject this feature --- just that the calculus
of how we decide to add this feature doesn't fit our normal analysis.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-02 Thread David Johnston
Jeff Davis-8 wrote
> Is there any semantic difference between marking a constraint as
> DISABLED and simply dropping it? Or does it just make it easier to
> re-add it later?

I cannot answer the question but if there is none then the main concern I'd
have is capturing "meta-information" about WHY such a constraint has been
disabled instead of dropped.

I guess this whole feature extends from the trigger disable feature that
already exists.  Given we have the one adding this seems symmetrical...

I cannot really see using either feature on a production system (if
following best practices) but I can imagine where they could both be helpful
during development.  Note with this usage pattern the meta-information about
"why" becomes considerably less important.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/ENABLE-DISABLE-CONSTRAINT-NAME-tp5769136p5769337.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.4] row level security

2013-09-02 Thread Bruce Momjian
On Sun, Sep  1, 2013 at 11:05:58AM -0700, Josh Berkus wrote:
> > Security community also concludes it is not avoidable nature as long
> > as human can observe system behavior and estimate something, thus,
> > security evaluation criteria does not require eliminate covert-channels
> > or does not pay attention about covert-channels for the products that
> > is installed on the environment with basic robustness (that means,
> > non-military, regular enterprise usage).
> 
> To be completely blunt, the security community does not understand
> databases.  At all.  I'd think if anything had become clear through the
> course of work on SEPosgres, it would be that.

Agreed.  The security community realizes these covert channels exist,
but doesn't really have any recommendations on how to avoid them.  You
could argue that avoiding them is too tied to specific database
implementations, but there are general channels, like insert with a
unique key, that should at least have well-defined solutions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 RC1 psql encoding reporting inconsistently?

2013-09-02 Thread Tom Lane
Michael Nolan  writes:
> This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding
> as SQL_ASCII and \set report it as UTF8?

psql sets client_encoding based on its environment (LANG or related
variables).  That's been true for some time --- since 9.1, according
to a quick check.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.3 RC1 psql encoding reporting inconsistently?

2013-09-02 Thread Michael Nolan
This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding
as SQL_ASCII
and \set report it as UTF8?

psql (9.3rc1)
Type "help" for help.

postgres=# \l
List of databases
  Name Owner   Encoding  Collate Ctype   Access privileges
-  - --- - -
postgres  postgres SQL_ASCII C   C
template0 postgres SQL_ASCII C   C =c/postgres  +
   postgres=CTc/postgres
template1 postgres SQL_ASCII C   C =c/postgres  +
   postgres=CTc/postgres
(3 rows)

postgres=# \set
AUTOCOMMIT = 'on'
ECHO = 'queries'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = '>> '
VERBOSITY = 'default'
VERSION = 'PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc
(GCC) 4.1.2 20070925 (Red Hat 4.1.2-27), 32-bit'
DBNAME = 'postgres'
USER = 'postgres'
PORT = '5432'
ENCODING = 'UTF8'
postgres=#
--
Mike Nolan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory

2013-09-02 Thread Andres Freund
Hi Noah!

On 2013-09-01 12:07:04 -0400, Noah Misch wrote:
> On Sun, Sep 01, 2013 at 05:08:38PM +0200, Andres Freund wrote:
> > On 2013-09-01 09:24:00 -0400, Noah Misch wrote:
> > > The difficulty depends on whether processes other than the segment's 
> > > creator
> > > will attach anytime or only as they start.  Attachment at startup is 
> > > enough
> > > for parallel query, but it's not enough for something like lock table
> > > expansion.  I'll focus on the attach-anytime case since it's more general.
> > 
> > Even on startup it might get more complicated than one immediately
> > imagines on EXEC_BACKEND type platforms because their memory layout
> > doesn't need to be the same. The more shared memory you need, the harder
> > that will be. Afair
> 
> Non-Windows EXEC_BACKEND is already facing a dead end that way.

Not sure whether you mean non-windows EXEC_BACKEND isn't going to be
supported for much longer or that it already has problems.

> > > On a system supporting MAP_FIXED, implement this by having the postmaster
> > > reserve address space under a PROT_NONE mapping, then carving out from 
> > > that
> > > mapping for each fixed-address dynamic segment.  The size of the 
> > > reservation
> > > would be controlled by a GUC; one might set it to several times 
> > > anticipated
> > > peak usage.  (The overhead of doing that depends on the kernel.)  Windows
> > > permits the same technique with its own primitives.
> > 
> > Note that allocating a large mapping, even without using it, has
> > noticeable cost, at least under linux. The kernel has to create & copy
> > data to track each pages state (without copying the memory content's
> > itself due to COW) for every fork afterwards. If you don't believe me,
> > check the whole discussion about go's (the language) memory
> > management...
> 
> I believe you, but I'd appreciate a link to the discussion you have in mind.

Unfortunately I could only find the first half of the discussion about
the issue. Turns out it's not the greatest idea to name your fancy new
programming language "go" (yesyes, petpeeve of mine).

http://lkml.org/lkml/2011/2/8/118
https://lwn.net/Articles/428100/

So, after reading up on the issue a bit more and reading some more
kernel code, a large mmap(PROT_NONE, MAP_PRIVATE) won't cause much
problems except counting in ulimit -v. It will *not* cause overcommit
violations. mmap(PROT_NONE, MAP_SHARED) will tho, even if not yet
faulted. Which means that to be reliable and not violate overcommit we'd
need to munmap() a chunk of PROT_NONE, MAP_PRIVATE memory, and
immediately (without interceding mallocs, using mmap itself) map it again.

It only gets really expensive in the sense of making fork expensive if
you set protections on many regions in that mapping individually. Each
mprotect() call will split the VMA into distinct pieces and they won't
get merged even if there are neighboors with the same settings.

> > > I don't foresee fundamental differences on 32-bit.  All the allocation
> > > maximums scale down, but that's the usual story for 32-bit.
> > 
> > If you actually want to allocate memory after starting up, without
> > carving a section out for that from the beginning, the memory
> > fragmentation will make it very hard to find memory addresses of the
> > same across processes.
> 
> True.  I wouldn't feel bad if total dynamic shared memory usage above, say,
> 256 MiB were unreliable on 32-bit.  If you're still running 32-bit in 2015,
> you probably have a low-memory platform.

Not sure. I think that will partially depend on whether x32 will have
any success which I still find hard to judge.

> I think the take-away is that we have a lot of knobs available, not a bright
> line between possible and impossible.  Robert opted to omit provision for
> reliable fixed addresses, and the upsides of that decision are the absence of
> a DBA-unfriendly space-reservation GUC, trivial overhead when the APIs are not
> used, and a clearer portability outlook.

I guess my point is that if we want to develop stuff that requires
reliable addresses, we should build support for that from a low level
up. Not rely on a hack^Wlayer ontop of the actual dynamic shared memory
API.
That is, it should be a flag to dsm_create() that we require a fixed
address and dsm_attach() will then automatically use that or die
trying. Requiring implementations to take care about passing addresses
around and fiddling with mmap/windows api to make sure those mappings
are possible doesn't strike me to be a good idea.

In the end, you're going to be the primary/first user as far as I
understand things, so you'll have to argue whether we need fixed
addresses or not. I don't think it's a good idea to forgo this decision
on this layer and bolt on another ontop if we decide it's neccessary.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailin

Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-02 Thread Jeff Davis
On Fri, 2013-08-30 at 09:57 +0800, wangs...@highgo.com.cn wrote:
> Hi hackers,
> 
>In order to achieve enable/disable constraint name,I made ​​a few 
> modifications to the code.
> 
>First, someone used to build the constraints while building 
> table. Then inserting data must follow a certain order.
>And people usually like to insert the data but not affected by 
> foreign keys or check.

Is there any semantic difference between marking a constraint as
DISABLED and simply dropping it? Or does it just make it easier to
re-add it later?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Peter Geoghegan
On Mon, Sep 2, 2013 at 6:25 AM, Craig Ringer  wrote:
> It'll be yet another way for people to get upsert wrong, of course.
> They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects
> w/o locking the table against writes first. Documenting this pitfall
> should be enough, though.

My preferred solution is to actually provide a variant to lock the
rows implicated in the would-be unique constraint violation. Obviously
that's a harder problem to solve.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-09-02 Thread Fabrízio de Royes Mello

On 19-08-2013 16:10, Boszormenyi Zoltan wrote:


I am reviewing your patch.



Thanks...



* Is the patch in a patch format which has context? (eg: context diff
format)

Yes.

* Does it apply cleanly to the current git master?

Almost. No rejects, no fuzz, only offset for some files.

* Does it include reasonable tests, necessary doc patches, etc?

Documentation, yes. Tests, no.

* Does the patch actually implement what it's supposed to do?

Yes.

* Do we want that?

Yes.

* Do we already have it?

No.

* Does it follow SQL spec, or the community-agreed behavior?

The SQL standard doesn't have DISCARD.
Otherwise the behaviour is obvious.

* Does it include pg_dump support (if applicable)?

n/a

* Are there dangers?

It changes applications' assumptions slightly but takes the
behaviour closer to the wording of the documentation.

* Have all the bases been covered?

Yes.

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

No.

* Are there any assertion failures or crashes?

No.

* Does the patch slow down simple tests?

No.

* If it claims to improve performance, does it?

n/a

* Does it slow down other things?

No.

* Does it follow the project coding guidelines?

Yes.

Maybe a little stylistic comment:

+void
+ReleaseSequenceCaches()
+{
+   SeqTableData *ptr = seqtab;
+   SeqTableData *tmp = NULL;
+
+   while (ptr != NULL)
+   {
+   tmp = ptr;
+   ptr = ptr->next;
+   free(tmp);
+   }
+
+   seqtab = NULL;
+}

I would rename the variables to "seq" and "next" from
"ptr" and "tmp", respectively, to make them even more
obvious. This looks a little better:

+void
+ReleaseSequenceCaches()
+{
+   SeqTableData *seq = seqtab;
+   SeqTableData *next;
+
+   while (seq)
+   {
+   next = seq->next;
+   free(seq);
+   seq = next;
+   }
+
+   seqtab = NULL;
+}



Done!



* Are there portability issues?

No.

* Will it work on Windows/BSD etc?

It should. There are no extra system calls.

* Are the comments sufficient and accurate?

The feature needs very little code which is downright obvious.
There are no comments but I don't think the code needs it.

* Does it do what it says, correctly?

Yes.


I lied.

There is one little problem. There is no command tag
reported for DISCARD SEQUENCES:

zozo=# create sequence s1;
CREATE SEQUENCE
zozo=# select nextval('s1');
  nextval
-
1
(1 row)

zozo=# select currval('s1');
  currval
-
1
(1 row)

zozo=# discard all;
DISCARD ALL
zozo=# discard sequences;
???
zozo=# select currval('s1');
ERROR:  currval of sequence "s1" is not yet defined in this session



Fixed!




* Does it produce compiler warnings?

Only one:

src/backend/commands/sequence.c should have

#include 

because of this:

sequence.c:1608:1: warning: no previous prototype for
‘ReleaseSequenceCaches’ [-Wmissing-prototypes]
 ReleaseSequenceCaches()
 ^



Fixed!



* Can you make it crash?

No.

* Is everything done in a way that fits together coherently with other
features/modules?

Yes.

* Are there interdependencies that can cause problems?

I don't think so.




The attached patch fix the items reviewed by you.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/doc/src/sgml/ref/discard.sgml b/doc/src/sgml/ref/discard.sgml
index 65ebbae..abd3e28 100644
--- a/doc/src/sgml/ref/discard.sgml
+++ b/doc/src/sgml/ref/discard.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-DISCARD { ALL | PLANS | TEMPORARY | TEMP }
+DISCARD { ALL | PLANS | SEQUENCES | TEMPORARY | TEMP }
 
  
 
@@ -67,6 +67,15 @@ DISCARD { ALL | PLANS | TEMPORARY | TEMP }

 

+SEQUENCES
+
+ 
+  Releases all internally cached sequences.
+ 
+
+   
+
+   
 ALL
 
  
@@ -83,6 +92,7 @@ UNLISTEN *;
 SELECT pg_advisory_unlock_all();
 DISCARD PLANS;
 DISCARD TEMP;
+DISCARD SEQUENCES;
 
 

diff --git a/src/backend/commands/discard.c b/src/backend/commands/discard.c
index 76f3ab6..f4e7e06 100644
--- a/src/backend/commands/discard.c
+++ b/src/backend/commands/discard.c
@@ -18,13 +18,14 @@
 #include "commands/async.h"
 #include "commands/discard.h"
 #include "commands/prepare.h"
+#include "commands/sequence.h"
 #include "utils/guc.h"
 #include "utils/portal.h"
 
 static void DiscardAll(bool isTopLevel);
 
 /*
- * DISCARD { ALL | TEMP | PLANS }
+ * DISCARD { ALL | SEQUENCES | TEMP | PLANS }
  */
 void
 DiscardCommand(DiscardStmt *stmt, bool isTopLevel)
@@ -39,6 +40,10 @@ DiscardCommand(DiscardStmt *stmt, bool isTopLevel)
 			ResetPlanCache();
 			break;
 
+		case DISCARD_SEQUENCES:
+			ReleaseSequenceCaches();
+			break;
+
 		case DISCARD_TEMP:
 			ResetTempTableNamespace();
 			break;
@@ -69,4 +74,5 @@ DiscardAll(bool isTopLevel)
 	LockReleaseAll(USER_LOCKMETHOD, true);
 	ResetPlanCache();

Re: [HACKERS] Extension Templates S03E11

2013-09-02 Thread David Fetter
On Mon, Sep 02, 2013 at 02:32:16AM -0400, Peter Eisentraut wrote:
> On Thu, 2013-08-29 at 12:16 +0200, Dimitri Fontaine wrote:
> > Here's v14 of the patch with pg_upgrade support fixed, so that the
> > automated setup that Peter built is able to have at it!
> 
> Fails cpluspluscheck:
> 
> In file included from /tmp/cpluspluscheck.5g2uWw/test.cpp:3:0:
> ./src/include/commands/template.h:47:8: error: ‘ExtensionControl’ does not 
> name a type
> ./src/include/commands/template.h:51:8: error: ‘ExtensionControl’ does not 
> name a type
> 
> I think this actually just means the header does not include all it needs by 
> itself.

Is there some standard set of checks you run on new patches, and are
the results showing up on, say, the buildfarm or some other CI
dashboard?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freezing without write I/O

2013-09-02 Thread Jeff Davis
On Fri, 2013-08-30 at 20:34 +0200, Andres Freund wrote:
> I have a quick question: The reason I'd asked about the status of the
> patch was that I was thinking about the state of the "forensic freezing"
> patch. After a quick look at your proposal, we still need to freeze in
> some situations (old & new data on the same page basically), so I'd say
> it still makes sense to apply the forensic freezing patch, right?
> 
> Differing Opinions?

The Freeze Forensically patch is nice because (if I understand it
correctly) it allows us to freeze at the same time as we mark
PD_ALL_VISIBLE, which avoids the potential extra page write. But that's
not such a big advantage if we don't ordinarily have to write again for
freezing, anyway.

However, there are still some cases where we'd be able to preserve the
forensic information. If nothing else, that might help debug this patch,
if necessary. There might also be cases where we can freeze more eagerly
to avoid the case where very old (but unfrozen) and very new tuples mix
on the same page. Perhaps Robert has some thoughts here, as well.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Next CFM?

2013-09-02 Thread David Fetter
On Mon, Sep 02, 2013 at 12:00:02PM -0500, Josh Berkus wrote:
> Hackers,
> 
> We need a Commit Fest manager for the September CF.  I'm not going
> to do it; this month is a heavy travel month for me (3 conferences
> and a wedding).
> 
> For help, here's the Commitfest Checklist Mike and I assembled:
> 
> https://wiki.postgresql.org/wiki/CommitFest_Checklist
> 
> Mind you, Peter E. seems to be getting patches organized ... are you
> CFM for this one, Peter?

If Peter won't, I will.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max freeze age query in docs

2013-09-02 Thread Andrew Dunstan


On 09/02/2013 02:26 PM, Andres Freund wrote:

On 2013-09-02 14:20:57 -0400, Andrew Dunstan wrote:

On 09/02/2013 01:30 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Yes, possibly, but we can't do that now, but I would like to fix the
docs now.

If you want this in 9.3.0 it needs to be committed in the next couple of
hours.

FWIW, the idea seemed generally sane to me, but I'd suggest not depending
on reltoastrelid being zero when and only when there's no match.
Why not test whether t.oid IS NULL, instead?

Or actually, code it like this

GREATEST(age(c.relfrozenxid), age(t.relfrozenxid))

and be done, as well as not having an ugly direct use of int4larger.




OK, I'll do it that way. Working on it now.

I'd vote for c.relkind != 't' AND NOT c.relfrozenxid = 0; instead of
relkind = 'r' for the main relation, that way you'd include materialized
views and stuff.



See what was just committed - the matview case is included for 9.3+ (as 
it was in fact in the original - I must have been looking at older docs 
when saw it wasn't there.)


I'll be back in an hour or so if any final tweeks are needed.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max freeze age query in docs

2013-09-02 Thread Andres Freund
On 2013-09-02 14:20:57 -0400, Andrew Dunstan wrote:
> 
> On 09/02/2013 01:30 PM, Tom Lane wrote:
> >Andrew Dunstan  writes:
> >>Yes, possibly, but we can't do that now, but I would like to fix the
> >>docs now.
> >If you want this in 9.3.0 it needs to be committed in the next couple of
> >hours.
> >
> >FWIW, the idea seemed generally sane to me, but I'd suggest not depending
> >on reltoastrelid being zero when and only when there's no match.
> >Why not test whether t.oid IS NULL, instead?
> >
> >Or actually, code it like this
> >
> >GREATEST(age(c.relfrozenxid), age(t.relfrozenxid))
> >
> >and be done, as well as not having an ugly direct use of int4larger.
> >
> > 
> 
> 
> OK, I'll do it that way. Working on it now.

I'd vote for c.relkind != 't' AND NOT c.relfrozenxid = 0; instead of
relkind = 'r' for the main relation, that way you'd include materialized
views and stuff.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max freeze age query in docs

2013-09-02 Thread Andrew Dunstan


On 09/02/2013 01:30 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Yes, possibly, but we can't do that now, but I would like to fix the
docs now.

If you want this in 9.3.0 it needs to be committed in the next couple of
hours.

FWIW, the idea seemed generally sane to me, but I'd suggest not depending
on reltoastrelid being zero when and only when there's no match.
Why not test whether t.oid IS NULL, instead?

Or actually, code it like this

GREATEST(age(c.relfrozenxid), age(t.relfrozenxid))

and be done, as well as not having an ugly direct use of int4larger.





OK, I'll do it that way. Working on it now.

cheers

andrew

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max freeze age query in docs

2013-09-02 Thread Tom Lane
Andrew Dunstan  writes:
> Yes, possibly, but we can't do that now, but I would like to fix the 
> docs now.

If you want this in 9.3.0 it needs to be committed in the next couple of
hours.

FWIW, the idea seemed generally sane to me, but I'd suggest not depending
on reltoastrelid being zero when and only when there's no match.
Why not test whether t.oid IS NULL, instead?

Or actually, code it like this

   GREATEST(age(c.relfrozenxid), age(t.relfrozenxid))

and be done, as well as not having an ugly direct use of int4larger.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max freeze age query in docs

2013-09-02 Thread Andrew Dunstan


On 09/01/2013 10:33 PM, Josh Berkus wrote:

Maybe for bonus points we'd print out the schema (e.g. by selectting
c.oid::regclass instead of c.relname), and also include materialized
views which are omitted from the query altogether.

Given the importance of this, maybe we need to have it as part of
pg_stat_user_tables?




Yes, possibly, but we can't do that now, but I would like to fix the 
docs now.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Next CFM?

2013-09-02 Thread Josh Berkus
Hackers,

We need a Commit Fest manager for the September CF.  I'm not going to do
it; this month is a heavy travel month for me (3 conferences and a
wedding).

For help, here's the Commitfest Checklist Mike and I assembled:

https://wiki.postgresql.org/wiki/CommitFest_Checklist

Mind you, Peter E. seems to be getting patches organized ... are you CFM
for this one, Peter?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Craig Ringer
On 08/31/2013 06:40 AM, Josh Berkus wrote:
>> > 3) RETURNING is expanded - "RETURNING REJECTS *" is now possible where
>> > that makes sense.
> Oh, nifty!  OK, now I can *really* use this feature.

Absolutely; especially combined with COPY to a staging TEMPORARY or
UNLOGGED table.

It'll be yet another way for people to get upsert wrong, of course.
They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects
w/o locking the table against writes first. Documenting this pitfall
should be enough, though.

Speaking of upsert, I'm starting to think that to solve the upsert
problem without forcing full-table locking on people we need a lock type
that conflicts only with DELETE/INSERT and a way to prevent UPDATEs from
changing the key. Kind of like the table level inverse of FOR KEY UPDATE
- a way to say "you can change rows, just cannot add or remove from the
set of keys".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-09-02 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> Fails cpluspluscheck:

Turns out I'm discovering that particular check, thanks! I could
reproduce and fix the error locally after being led to the command
./src/tools/pginclude/cpluspluscheck.

So please find v15 of the patch attached to this email, that passes all
previously done checks and this one too now.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



templates.v15.patch.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql and pset without any arguments

2013-09-02 Thread Gilles Darold
Patch added to current open commitfest under the Client section with title:

   "Call \pset without any arguments displays current status of all
printing options"

Status: Need review.

Let me know if it should not be there.

Regards,

Le 29/06/2013 01:08, Gilles Darold a écrit :
> Hi,
>
> I was looking at psql 8.3 documention about \pset options and saw that
> there was the following note :
>
> "Note: It is an error to call \pset without any arguments. In the
> future this case might show the current status of all printing options."
>
> I looked backward and forward to find that this note is present in all
> versions since 7.1 up to 9.3, maybe it is time to add this little feature.
>
> I've attached a patch to add the usage of the \pset command without any
> arguments to displays current status of all printing options instead of
> the error message. Here is a sample output:
>
> (postgres@[local]:5494) [postgres] > \pset
> Output format is aligned.
> Border style is 2.
> Expanded display is used automatically.
> Null display is "NULL".
> Field separator is "|".
> Tuples only is off.
> Title is unset.
> Table attributes unset.
> Line style is unicode.
> Pager is used for long output.
> Record separator is .
> (postgres@[local]:5494) [postgres] >
>
> To avoid redundant code I've added a new method printPsetInfo() so that
> do_pset() and exec_command() will used the same output message, they are
> all in src/bin/psql/command.c. For example:
>
> (postgres@[local]:5494) [postgres] > \pset null 'NULL'
> Null display is "NULL".
> (postgres@[local]:5494) [postgres] >
>
> The patch print all variables information from struct printTableOpt when
> \pset is given without any arguments and also update documentation.
>
> Let me know if there's any additional work to do on this basic patch or
> something that I've omitted.
>
> Best regards,
>
-- 
Gilles Darold
http://dalibo.com - http://dalibo.org



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Further XLogInsert scaling tweaking

2013-09-02 Thread Heikki Linnakangas
Now that I've had a little break from the big XLogInsert scaling patch, 
I went back to do some testing and profiling of it. I saw a lot of 
contention on the first access of RedoRecPtr and force/fullPageWrites, 
which made me realize that I put those variables right next to the 
heavily-contended insertpos_lck spinlock and the variables that it 
protects. Needless to say, that's a recipe for contention.


I added some padding between those two, per attached patch, and got a 
big improvement. So we should definitely do that. I just added a 
char[128] field between them, which is enough to put them on different 
cache lines on the server I'm testing on. I don't think there is any 
platform-independent way to get the current cache line size, 
unfortunately. Since this doesn't affect correctness, I'm inclined to 
just add the 128-byte padding field there.


Attached is a graph generated by pgbench-tools. Full results are 
available here: http://hlinnaka.iki.fi/xloginsert-scaling/padding/. The 
test query used was:


insert into foo:client_id select generate_series(1, 100);

That is, each client inserts a lot of rows into a different table. The 
table has no indexes. This is pretty much the worst-case scenario for 
stressing XLogInsert.


The "master-b03d196" is unpatched git master, "xlog-padding-fb741c0" is 
with the padding. The -16 plots are the same, but with 
xloginsert_slots=16 (the default is 8). The server has 16 cores, 32 with 
hyperthreading.


It's interesting that although the peak performance is better with 8 
slots than with 16, it doesn't scale as gracefully with 16 slots. I 
believe that's because with more insertion slots, you have more backends 
fighting over the insertpos_lck. With fewer slots, the extra work is 
queued behind the insertion slots instead, and sleeping is better than 
spinning.


In either case, the performance with the padding is better than without.

- Heikki
<>diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39c58d0..28e62ea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -428,8 +428,14 @@ typedef struct XLogCtlInsert
 	uint64		CurrBytePos;
 	uint64		PrevBytePos;
 
-	/* insertion slots, see above for details */
-	XLogInsertSlotPadded *insertSlots;
+	/*
+	 * Make sure the above heavily-contended spinlock and byte positions are
+	 * on their own cache line. In particular, the RedoRecPtr and full page
+	 * write variables below should be on a different cache line. They are
+	 * read on every WAL insertion, but updated rarely, and we don't want
+	 * those reads to steal the cache line containing Curr/PrevBytePos.
+	 */
+	char		pad[128];
 
 	/*
 	 * fullPageWrites is the master copy used by all backends to determine
@@ -455,6 +461,9 @@ typedef struct XLogCtlInsert
 	bool		exclusiveBackup;
 	int			nonExclusiveBackups;
 	XLogRecPtr	lastBackupStart;
+
+	/* insertion slots, see XLogInsertSlot struct above for details */
+	XLogInsertSlotPadded *insertSlots;
 } XLogCtlInsert;
 
 /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers