date:20140421

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Peter Geoghegan

Here is a benchmark that is similar to my earlier one, but with a rate
limit of 125 tps, to help us better characterize how the prototype
patch helps performance:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/3-sec-delay-limit/

Again, these are 15 minute runs with unlogged tables at multiple
client counts, and scale 5,000.

Every test run should have managed to hit that limit, but in the case
of master 1 test did not. I have included vmstat, iostat and meminfo
OS instrumentation this time around, which is really interesting for
this particular limit based benchmark. The prototype patch tested here
is a slight refinement on my earlier prototype. Apart from reducing
the number of gettimeofday() calls, and doing them out of the critical
path, I increased the initial usage_count value to 6. I also set
BM_MAX_USAGE_COUNT to 30. I guess I couldn't resist the temptation to
tweak things, which I actually did very little of prior to publishing
my initial results. This helps, but there isn't a huge additional
benefit.

The benchmark results show that master cannot even meet the 125 tps
limit on 1 test out of 9. More interestingly, the background writer
consistently cleans about 10,000 buffers per test run when testing the
patch. At the same time, buffers are allocated at a very consistent
rate of around 262,000 for the patched test runs. Leaving aside the
first test run, with the patch there is only a tiny variance in the
number cleaned between each test, a variance of just a few hundred
buffers. In contrast, master has enormous variance. During just over
half of the tests, the background writer does not clean even a single
buffer. Then, on 2 tests out of 9, it cleans an enormous ~350,000
buffers. The second time this happens leads to master failing to even
meet the 125 tps limit (albeit with only one client).

If you drill down to individual test runs, a similar pattern is
evident. You'll now find operating system information (meminfo dirty
memory) graphed here. The majority of the time, master does not hold
more than 1,000 kB of dirty memory at a time. Once or twice it's 0 kB
for multiple minutes. However, during the test runs where we also see
huge spikes in background-writer-cleaned pages, we also see huge
spikes in the total amount of dirty memory (and correlated huge spikes
in latency). It can get to highs of ~700,000 kB at one point. In
contrast, the patched tests show very consistent amounts of dirty
memory. Per test, it almost always tops out at 4,000 kB - 6,000 kB
(there is a single 12,000 kB spike, though). There is consistently a
distinct zig-zag pattern to the dirty memory graph with the patched
tests, regardless of client count or where checkpoints occur. Master
shows mountains and valleys for those two particularly problematic
tests, correlating with a panicked background writer's aggressive
feedback loop. Master also shows less rhythmic zig-zag patterns that
only peak at about 600 kB - 1,000 kB for the entire duration of many
individual test runs.

Perhaps most notably, average and worst case latency is far improved
with the patch. On average it's less than half of master with 1
client, and less than a quarter of master with 32 clients.

I think that the rate limiting feature of pgbench is really useful for
characterizing how work like this improves performance. I see a far
smoother and more consistent pattern of I/O that superficially looks
like Postgres is cooperating with the operating system much more than
it does in the baseline. It sort of makes sense that the operating
system cache doesn't care about frequency while Postgres does. If the
OS cache did weigh frequency, it would surely not alter the outcome
very much, since OS cached data has presumably not been accessed very
frequently recently. I suspect that I've cut down on double buffering
by quite a bit. I would like to come up with a simple way of measuring
that, using something like pgfincore, but the available interfaces
don't seem well-suited to quantifying how much of a problem this is
and remains. I guess call pgfincore on the index segment files might
be interesting, since shared_buffers mostly holds index pages. This
has been verified using pg_buffercache.

It would be great to test this out with something involving
non-uniform distributions, like Gaussian and Zipfian distributions.
The LRU-K paper tests Zipfian too. The uniform distribution pgbench
uses here, while interesting, doesn't tell the full story at all and
is less representative of reality (TPB-C is formally required to have
a non-uniform distribution [1] for some things, for example). A
Gaussian distribution might show essentially the same failure to
properly credit pages with frequency of access in one additional
dimension, so to speak. Someone should probably look at the TPC-C-like
DBT-2, since in the past that was considered to be a problematic
workload for PostgreSQL [2] due to the heavy I/O. Zipfian is a lot
less sympathetic than uniform if the LRU

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund

On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
> 
> On 04/21/2014 02:54 PM, Andres Freund wrote:
> >Hi,
> >
> >I spent the last two hours poking arounds in the environment Andrew
> >provided and I was able to reproduce the issue, find a assert to
> >reproduce it much faster and find a possible root cause.
> 
> 
> What's the assert that makes it happen faster? That might help a lot in
> constructing a self-contained test.

Assertion and *preliminary*, *hacky* fix attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 40c09d8a85db137caf42c2dfe36adda63a381c75 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 22 Apr 2014 08:42:11 +0200
Subject: [PATCH 1/2] Error out when creating a multixact with more than one
 updater.

---
 src/backend/access/heap/heapam.c   |  3 ---
 src/backend/access/transam/multixact.c | 15 +++
 src/include/access/multixact.h |  3 +++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2605f20..bd26b80 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -190,9 +190,6 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
 /* Get the LockTupleMode for a given MultiXactStatus */
 #define TUPLOCK_from_mxstatus(status) \
 			(MultiXactStatusLock[(status)])
-/* Get the is_update bit for a given MultiXactStatus */
-#define ISUPDATE_from_mxstatus(status) \
-			((status) > MultiXactStatusForUpdate)
 
 /* 
  *		 heap support routines
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index d4ad678..4ff1e58 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -713,6 +713,21 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
 		return multi;
 	}
 
+	{
+		int			i;
+		TransactionId update_xact = InvalidTransactionId;
+
+		for (i = 0; i < nmembers; i++)
+		{
+			/* Ignore lockers */
+			if (!ISUPDATE_from_mxstatus(members[i].status))
+continue;
+			if (update_xact != InvalidTransactionId)
+elog(ERROR, "new multixact has more than one updating member");
+			update_xact = members[i].xid;
+		}
+	}
+
 	/*
 	 * Assign the MXID and offsets range to use, and make sure there is space
 	 * in the OFFSETs and MEMBERs files.  NB: this routine does
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 9729f27..95ffe0e 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -48,6 +48,9 @@ typedef enum
 
 #define MaxMultiXactStatus MultiXactStatusUpdate
 
+/* Get the is_update bit for a given MultiXactStatus */
+#define ISUPDATE_from_mxstatus(status) \
+			((status) > MultiXactStatusForUpdate)
 
 typedef struct MultiXactMember
 {
-- 
1.8.5.rc2.dirty

>From ebd4f834b7cdccb9a9fe336dd981811e26bfc271 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 22 Apr 2014 08:42:44 +0200
Subject: [PATCH 2/2] preliminary fix for corruption issue.

---
 src/backend/access/heap/heapam.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bd26b80..9c075c4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2722,7 +2722,7 @@ l1:
 			 * update this tuple before we get to this point.  Check for xmax
 			 * change, and start over if so.
 			 */
-			if (!(tp.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+			if (tp.t_data->t_infomask != infomask ||
 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(tp.t_data),
 	 xwait))
 goto l1;
@@ -2748,7 +2748,7 @@ l1:
 			 * other xact could update this tuple before we get to this point.
 			 * Check for xmax change, and start over if so.
 			 */
-			if ((tp.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+			if (tp.t_data->t_infomask != infomask ||
 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(tp.t_data),
 	 xwait))
 goto l1;
@@ -3275,7 +3275,7 @@ l2:
 			 * update this tuple before we get to this point.  Check for xmax
 			 * change, and start over if so.
 			 */
-			if (!(oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+			if (oldtup.t_data->t_infomask != infomask ||
 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
 	 xwait))
 goto l2;
@@ -3329,7 +3329,7 @@ l2:
  * recheck the locker; if someone else changed the tuple while
  * we weren't looking, start over.
  */
-if ((oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+if (oldtup.t_data->t_infomask != infomask ||
 	!TransactionIdEquals(
 	HeapTupleHeaderGetRawXmax(oldtup.t_data),
 		 xwait))
@@ -3350,7 +3350,7 @@ l2:
  * some other xact could update this tuple before we get to

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Mark Kirkwood


On 22/04/14 09:25, Andres Freund wrote:

On 2014-04-21 17:21:20 -0400, Bruce Momjian wrote:

On Mon, Apr 21, 2014 at 02:08:51PM -0700, Joshua Drake wrote:

If the community had more *BSD presence I think it would be great
but it isn't all that viable at this point. I do know however that
no-one in this community would turn down a team of FreeBSD advocates
helping us make PostgreSQL awesome for PostgreSQL.


I don't think we would even implement a run-time control for Linux or
Windows for this, so I don't even think it is a FreeBSD issue.


I think some of the arguments in this thread are pretty damn absurd. We
have just introduced dynamic_shared_memory_type.



+1

I was just thinking the same thing...



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Michael Paquier

On Tue, Apr 22, 2014 at 9:58 AM, Tatsuo Ishii  wrote:

> > * Tatsuo Ishii (is...@postgresql.org) wrote:
> >> I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
> >> as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
> >> pgbench is used (read only query), scale factor is 1,000 (DB size
> >> 15GB).
> >
> > Can you isolate the sysv-vs-mmap patch and see what happens with just
> > that change..?
>
> Unfortunately I don't have an access to the machine at this moment.
>
> >> Included is the graph (from PostgreSQL Enterprise Consortium's 2014
> >> report page 13: https://www.pgecons.org/downloads/43). I see up to 14%
> >> degration (at 128 concurrent users) comparing with 9.2.
> >
> > That URL returns 'Forbidden'...
>
> Sorry for this. I sent a problem report to the person in charge.  In
> the mean time, please go to:
> https://www.pgecons.org/download/works_2013/ then click the link "2013
> 年度WG1活動報告" (sorry for not English). You should be able to
> download a report (PDF).
>
> Also the report is written in Japanese. I hope you can read at leat
> the graph in page 13 and the table in page 14.
>
Is pgecons planning to do a translation of that at some point? It looks
like good material, and the audience able to understand it is rather
limited now :)
-- 
Michael

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Tatsuo Ishii

>> Can you isolate the sysv-vs-mmap patch and see what happens with just
>> that change..?
> 
> Unfortunately I don't have an access to the machine at this moment.

Where is the patch? I would like to test it on a smaller machine for
now.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Joshua D. Drake



On 04/21/2014 06:19 PM, Andrew Dunstan wrote:



If we never start we'll never get there.

I can think of several organizations that might be approached to donate
hardware.


Like .Org?

We have a hardware farm, a rack full of hardware and spindles. It isn't 
the most current but it is there.


Sincerely,

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Noah Misch

On Mon, Apr 21, 2014 at 10:57:34AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > I wonder how it would work out to instead delay this new detoast effort 
> > until
> > toast_insert_or_update().
> 
> That would require toast_insert_or_update() to know about every container
> datatype.  I doubt it could lead to an extensible or maintainable
> solution.

If that's its worst drawback, it's excellent.

> I'm actually planning to set this patch on the shelf for a bit and go
> investigate the other alternative, ie, not generating composite Datums
> containing toast pointers in the first place.  We now know that the idea
> that we aren't going to take performance hits *somewhere* is an illusion,
> and I still suspect that the other way is going to lead to a smaller and
> cleaner patch.  The main performance downside for plpgsql might be
> addressable by making sure that plpgsql record variables fall on the "heap
> tuple" rather than the "composite Datum" side of the line.  I'm also quite
> concerned about correctness: I don't have a lot of confidence that this
> patch has closed every loophole with respect to arrays, and it hasn't even
> touched ranges or any of the related one-off bugs that I believe exist.

I maintain that the potential slowdown is too great to consider adopting that
for the sake of a cleaner patch.  Your last message examined a 67% performance
regression.  The strategy you're outlining now can slow a query by 1,000,000%.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] DISCARD ALL (Again)

2014-04-21 Thread Fabrízio de Royes Mello

On Mon, Apr 21, 2014 at 12:00 AM, Stephen Frost  wrote:
>
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Fri, Apr 18, 2014 at 2:24 AM, Tom Lane  wrote:
> > > 2. While I'm no Python expert, I believe GD is just a specific
instance
> > > of a general capability for global state in Python.  Are we going to
> > > promise that any and all user-created data inside Python goes away?
> > > What about other PLs?  Will users thank us if this suddenly starts
> > > happening?
> >
> > This is not the first time that somebody's asked for a way to throw
> > away global interpreter state, and I really think we ought to oblige.
> > In a connection-pooling environment, you really need a way to get the
> > connection back to its original state rather than some not-so-near
> > facsimile thereof.  Maybe it'll end up as an optional behavior, and
> > which kind of reset to use will become part of the pooler
> > configuration, but that doesn't bother me as much as not having it for
> > those that want it.
>
> Drop the connection and reconnect would be the answer to that.  For as
> much as we may hope and wish for a connection to go back to 'the way it
> was upon first connection', throwing away the interpretor *might* (and I
> wouldn't be comfortable claiming it absolutely..) get you there when
> you've only called functions which use interpretors, but people write
> code in C too and we've seen complaints of memory leaks, etc, from C
> libraries and C extensions- and there's nothing we're going to be able
> to do to address that, so this mythical 'DISCARD EVERYTHING' is a pipe
> dream.  (Were we to actually re-exec ourselves into a new process, as if
> we went through a disconnect/reconnect, I'd be more inclined to support
> this capability, but I'm not sure what such would really buy us...)
>
> > What's a bit odd about this request is that it asks for the ability to
> > throw away only part of the state.  ISTM that if somebody wants to add
> > that kind of capability, they ought to just package a function which
> > does precisely that with the plpython extension, or create a Python
> > function that zaps that particular variable if that's possible.  I
> > think it's clearly useful to have DISCARD ALL be a request to discard
> > *everything* in one shot, but it's going to be a stretch to come up
> > with DISCARD variants for every kind of partial state removal somebody
> > wants to do.
>
> Agreed.
>

Doesn't makes sense implement a way to every extension register your own
cleanup code and them with DISCARD EXTENSIONS and/or DISCARD ALL run this
registered procedure?

If yes I can develop a patch to do that (actually I already start with
implementation).

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Peter Geoghegan

On Mon, Apr 21, 2014 at 5:59 PM, Peter Geoghegan  wrote:
> LRU-K, and 2Q have roughly the same advantages. I'm
> reasonably confident you can have the best of both worlds, or
> something closer to it.

Having said that, a big part of what I'd like to accomplish here is to
address the more general problem of "correlated references". That's
probably something that has independent value.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Peter Geoghegan

On Mon, Apr 21, 2014 at 6:12 PM, Tom Lane  wrote:
> Did you read the commit message I pointed to?

Yes.

> (See also 4e8af8d27.)

Oh, I wasn't actually aware of the fact that 2Q made it into the tree.
I thought that the first commit message you referred to just
referenced on-list discussion of 2Q. Interesting.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andrew Dunstan



On 04/21/2014 09:16 PM, Peter Geoghegan wrote:

On Mon, Apr 21, 2014 at 6:08 PM, Tatsuo Ishii  wrote:

What we would need is a way to graph the results - that's something
beyond my very rudimentary expertise in web programming. If anyone
feels like collaborating I'd be glad to hear from them (The web site
is programmed in perl + TemplateToolkit, but even that's not
immutable. I'm open to using, say, node.js plus one of its templating
engines.

gnuplot? (the graph I attached was created by gnuplt).

That's all pgbench-tools itself uses.

The problem with a performance farm is that it's relatively hard to
donate a performance farm member. It more or less requires expensive
hardware, and a large amount of rigor in testing and normalizing
various aspects of the environment that might otherwise add noise.
Then again, it might only take 2 or 3 servers to make a huge
difference. There are a number of different things that would be
immediately compelling to target with that kind of thing, so the first
step is non-obvious too.




If we never start we'll never get there.

I can think of several organizations that might be approached to donate 
hardware.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Peter Geoghegan

On Mon, Apr 21, 2014 at 6:08 PM, Tatsuo Ishii  wrote:
>> What we would need is a way to graph the results - that's something
>> beyond my very rudimentary expertise in web programming. If anyone
>> feels like collaborating I'd be glad to hear from them (The web site
>> is programmed in perl + TemplateToolkit, but even that's not
>> immutable. I'm open to using, say, node.js plus one of its templating
>> engines.
>
> gnuplot? (the graph I attached was created by gnuplt).

That's all pgbench-tools itself uses.

The problem with a performance farm is that it's relatively hard to
donate a performance farm member. It more or less requires expensive
hardware, and a large amount of rigor in testing and normalizing
various aspects of the environment that might otherwise add noise.
Then again, it might only take 2 or 3 servers to make a huge
difference. There are a number of different things that would be
immediately compelling to target with that kind of thing, so the first
step is non-obvious too.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Tom Lane

Peter Geoghegan  writes:
> On Mon, Apr 21, 2014 at 5:50 PM, Tom Lane  wrote:
>> ARC *was* the predecessor algorithm.  See commit 5d5087363.

> I believe that the main impetus for replacing ARC with clock sweep
> came from patent issues, though.

That was one issue, but performance gains were a large part of it too,
and the main reason why we picked clock sweep rather than something else.
Did you read the commit message I pointed to?

(See also 4e8af8d27.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Tatsuo Ishii

> This is exactly why we need a benchfarm.
> 
> I actually have a client working based on Greg Smith's pgbench tools.
> 
> What we would need is a way to graph the results - that's something
> beyond my very rudimentary expertise in web programming. If anyone
> feels like collaborating I'd be glad to hear from them (The web site
> is programmed in perl + TemplateToolkit, but even that's not
> immutable. I'm open to using, say, node.js plus one of its templating
> engines.

gnuplot? (the graph I attached was created by gnuplt).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andrew Dunstan



On 04/21/2014 08:49 PM, Stephen Frost wrote:

* Tatsuo Ishii (is...@postgresql.org) wrote:

I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
pgbench is used (read only query), scale factor is 1,000 (DB size
15GB).

Can you isolate the sysv-vs-mmap patch and see what happens with just
that change..?



This is exactly why we need a benchfarm.

I actually have a client working based on Greg Smith's pgbench tools.

What we would need is a way to graph the results - that's something 
beyond my very rudimentary expertise in web programming. If anyone feels 
like collaborating I'd be glad to hear from them (The web site is 
programmed in perl + TemplateToolkit, but even that's not immutable. I'm 
open to using, say, node.js plus one of its templating engines.



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus

All,

I've taken a stab at creating a reproduceable test case based on the
characterisitics of the production issues I'm seeing.  But clearly
there's an element I'm missing, because I'm not able to produce the bug
with a pgbench-based test case.

My current test has FKs, updating both FK'd tables, updating both
indexed and non-indexed columns, and doing multiple updates in the same
transaction, and lock-blocking.

Files are attached in case someone has better ideas.  queue_bench.sql is
the setup file, and then you do:

pgbench -n -T 600 -c 15 -j 5 -f queue_adder.bench -f queue_worker.bench
-f queue_worker.bench -f queue_worker.bench -f queue_worker.bench {dbname}

... or whatever levels of c and j make sense on your hardware.

FWIW, this particular test case might make a nice destruction test case
for replication testing, too.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
CREATE TABLE queue (
queue_id serial not null primary key,
ts timestamptz not null default now(),
cat int not null,
status text not null default 'new',
current_step int not null default 1
);

CREATE TABLE queue_steps (
queue_id int not null,
step int not null,
status text not null default 'pending',
done timestamptz,
payload text
);

INSERT INTO queue ( ts, cat, current_step )
SELECT now() - interval '15 seconds' + ( i * interval '1 second' ),
(random()*3)::int,
1
FROM generate_series(1,12) as gs(i);

INSERT INTO queue_steps ( queue_id, step, status, payload )
SELECT queue_id, gs.i, 'pending',
md5(random()::text)
FROM queue,
generate_series(1,3) as gs(i);

CREATE INDEX queue_ts ON queue(ts DESC);
CREATE INDEX queue_step_pk ON queue_steps(queue_id, step);
ALTER TABLE queue_steps ADD CONSTRAINT queue_fk FOREIGN KEY ( queue_id ) REFERENCES queue(queue_id);

CREATE FUNCTION add_queue_item ( ncat int )
RETURNS INT
LANGUAGE plpgsql
AS $f$
DECLARE qid INT;
BEGIN
INSERT INTO queue ( ts, cat, status, current_step )
VALUES ( now(), ncat, 'new', 1 )
RETURNING queue_id
INTO qid;

INSERT INTO queue_steps ( queue_id, step, status, payload )
SELECT qid, gs.i, 'pending',
md5(random()::text)
FROM generate_series(1,3) as gs(i);

RETURN qid;
END;
$f$;

CREATE OR REPLACE FUNCTION queue_step ( ncat int )
RETURNS INT
LANGUAGE plpgsql
AS $f$
DECLARE qid INT;
stnum INT;
BEGIN
SELECT queue_id
INTO qid
FROM queue
WHERE status = 'working'
AND cat = ncat
ORDER BY ts LIMIT 1
FOR UPDATE;

IF qid IS NULL THEN
SELECT queue_id
INTO qid
FROM queue
WHERE status = 'new'
AND cat = ncat
ORDER BY ts LIMIT 1
FOR UPDATE;
END IF;

SELECT step FROM queue_steps
INTO stnum
WHERE queue_id = qid
AND status = 'pending'
ORDER BY step LIMIT 1;

IF stnum = 1 THEN
UPDATE queue
SET status = 'working'
WHERE queue_id = qid;
END IF;

UPDATE queue_steps
SET status = 'working'
WHERE queue_id = qid
AND step = stnum;

PERFORM pg_sleep(0.002);

UPDATE queue_steps
SET status = 'done',
done = now()
WHERE queue_id = qid
AND step = stnum;

IF stnum = 3 THEN
UPDATE queue
SET status = 'done'
WHERE queue_id = qid;
ELSE
UPDATE queue
SET current_step = stnum + 1
WHERE queue_id = qid;
END IF;

RETURN qid;

END; $f$;

ANALYZE;
\setrandom rcat 0 3
\setrandom wtime 1 3
BEGIN;
SELECT add_queue_item(:rcat);
END;
\sleep :wtime ms

\setrandom rcat 0 3
\setrandom wtime 1 3
BEGIN;
SELECT queue_step(:rcat);
\sleep :wtime ms
END;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Peter Geoghegan

On Mon, Apr 21, 2014 at 5:50 PM, Tom Lane  wrote:
> ARC *was* the predecessor algorithm.  See commit 5d5087363.

I believe that the main impetus for replacing ARC with clock sweep
came from patent issues, though. It was a happy coincidence that clock
sweep happened to be better than ARC, but that doesn't mean that ARC
didn't have some clear advantages, even if it wasn't worth it on
balance. LRU-K, and 2Q have roughly the same advantages. I'm
reasonably confident you can have the best of both worlds, or
something closer to it.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Tatsuo Ishii

> * Tatsuo Ishii (is...@postgresql.org) wrote:
>> I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
>> as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
>> pgbench is used (read only query), scale factor is 1,000 (DB size
>> 15GB).
> 
> Can you isolate the sysv-vs-mmap patch and see what happens with just
> that change..?

Unfortunately I don't have an access to the machine at this moment.

>> Included is the graph (from PostgreSQL Enterprise Consortium's 2014
>> report page 13: https://www.pgecons.org/downloads/43). I see up to 14%
>> degration (at 128 concurrent users) comparing with 9.2.
> 
> That URL returns 'Forbidden'...

Sorry for this. I sent a problem report to the person in charge.  In
the mean time, please go to:
https://www.pgecons.org/download/works_2013/ then click the link "2013
年度WG1活動報告" (sorry for not English). You should be able to
download a report (PDF).

Also the report is written in Japanese. I hope you can read at leat
the graph in page 13 and the table in page 14.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Tom Lane

Peter Geoghegan  writes:
> On Mon, Apr 21, 2014 at 5:28 PM, Tom Lane  wrote:
>> We used to have one.  It was a big bottleneck --- and this was years
>> ago, when the buffer manager was much less scalable than it is today.
>> (IIRC, getting rid of a central lock was one of the main advantages
>> of the current clock sweep code over its predecessor.)

> Yes, it was. This is a major advantage of clock sweep, and anything
> that replaces it will need to maintain the same advantage. Didn't
> someone indicate that clock sweep could beat ARC around that time,
> presumably for this reason? If no one did, then my reading of a
> variety of other papers on caching indicates that this is probably the
> case.

ARC *was* the predecessor algorithm.  See commit 5d5087363.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

* Tatsuo Ishii (is...@postgresql.org) wrote:
> I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
> as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
> pgbench is used (read only query), scale factor is 1,000 (DB size
> 15GB).

Can you isolate the sysv-vs-mmap patch and see what happens with just
that change..?

> Included is the graph (from PostgreSQL Enterprise Consortium's 2014
> report page 13: https://www.pgecons.org/downloads/43). I see up to 14%
> degration (at 128 concurrent users) comparing with 9.2.

That URL returns 'Forbidden'...

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Joshua D. Drake



On 04/21/2014 03:08 PM, Jim Nasby wrote:


On 4/21/14, 4:08 PM, Joshua D. Drake wrote:

If the community had more *BSD presence I think it would be great but
it isn't all that viable at this point. I do know however that no-one
in this community would turn down a team of FreeBSD advocates helping
us make PostgreSQL awesome for PostgreSQL.


I assume you meant FreeBSD awesome for PostgreSQL? :)


Yes. Ty for the correction.



I'm also a big fan of *BSD but the reality is it's MUCH harder to get
*BSD into a corporation than linux. Now, if FreeBSD had a bunch of stuff
that made PostgreSQL run like 4x faster on *BSD than Linux that would be
a different story.


Exactly.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Peter Geoghegan

On Mon, Apr 21, 2014 at 5:28 PM, Tom Lane  wrote:
> We used to have one.  It was a big bottleneck --- and this was years
> ago, when the buffer manager was much less scalable than it is today.
> (IIRC, getting rid of a central lock was one of the main advantages
> of the current clock sweep code over its predecessor.)

Yes, it was. This is a major advantage of clock sweep, and anything
that replaces it will need to maintain the same advantage. Didn't
someone indicate that clock sweep could beat ARC around that time,
presumably for this reason? If no one did, then my reading of a
variety of other papers on caching indicates that this is probably the
case.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Peter Geoghegan

On Mon, Apr 21, 2014 at 4:10 AM, Andres Freund  wrote:
> If there are indeed such large regressions on FreeBSD we need to treat
> them as postgres regressions. It's nicer not to add config options for
> things that don't need it, but apparently that's not the case here.

+1, but I think this is something for packagers to get right, not users.

I really don't like the idea of playing chicken with the FreeBSD
people, especially since we're going to use System V shared memory
into the foreseeable future anyway. It's probably *far* easier for us
to fix it than it is for the FreeBSD people to fix it.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Tatsuo Ishii

> I see performance degradation with PostgreSQL 9.3 vs 9.2 on FreeBSD, and I'm 
> wondering who to poke to mitigate the problem. In reference to this thread 
> [1], who where the FreeBSD people that Francois mentioned? If mmap needs to 
> perform well in the kernel, I'd like to know of someone with FreeBSD kernel 
> knowledge who is interested in working with mmap perfocmance. If mmap is 
> indeed the cuplrit, I've just tested 9.2.8 vs 9.3.4, I nevere isolated the 
> mmap patch, although I believe Francois did just that with similar results.

I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
pgbench is used (read only query), scale factor is 1,000 (DB size
15GB).

Included is the graph (from PostgreSQL Enterprise Consortium's 2014
report page 13: https://www.pgecons.org/downloads/43). I see up to 14%
degration (at 128 concurrent users) comparing with 9.2.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Tom Lane

Jim Nasby  writes:
> How *certain* are we that a single freelist lock (that actually ONLY
> protects the freelist) would be that big a deal?

We used to have one.  It was a big bottleneck --- and this was years
ago, when the buffer manager was much less scalable than it is today.
(IIRC, getting rid of a central lock was one of the main advantages
of the current clock sweep code over its predecessor.)

The real issue here is that in the modern code, we hardly ever actually
have anything in the freelist: only when a relation is dropped, or
something like that, do buffers ever get put to the freelist.  So your
argument that removing a buffer from the freelist is cheaper than running
the clock sweep is largely missing the point.  We'd have to run a clock
sweep in order to find something to put in the freelist.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread David G Johnston

Stephen Frost wrote
> * Alfred Perlstein (

> alfred@

> ) wrote:
>> On 4/21/14, 12:47 PM, Stephen Frost wrote:
>> >  Asking for help to address the FreeBSD performance would have
>> >been much better received. Thanks, Stephen
>> 
>> That is exactly what I did, I asked for a version of postgresql that
>> was easy to switch at runtime between two behaviors.
>> 
>> That would make it a LOT easier to run a few scripts and make sure I
>> got the correct binary without having to munge PREFIX and a bunch of
>> PATH and other tools to get my test harness to DTRT.
> 
> I'm sure one of the hackers would be happy to provide you with a patch
> to help you with your testing.
> 
> That's quite a different thing from asking for a GUC to be provided and
> then supported over the next 5 years as part of the core release, which
> is what I believe we all thought you were asking for.

Alfred,

Are you willing and use a custom 9.3 installed from source or are you asking
for something to actually be released to the wild before you go and test it
- your comments are unclear on this point?

The technical consensus is that the more desirable approach is to have the
determination done at compile-time since - besides testing - no obvious
reason exists that a user, once they have determined the correct option for
their platform, would find reason to change it.  Yes, it adds another player
to the game (unless you install from source), but the community is already
structured to rely upon packagers to do the right thing for their platform
so that the amount of customization presented to the user can be minimized.

In short, the goal is to have GUCs limited to work-mix, not platform,
configuration; and definitely not for platform testing purposes.  If you are
going to be testing platform performance it seems to be expected that you
have the ability to compile and alter source code.  This may indeed limit
the potential number of testers but it does add efficiency to the process
because the testers can make patches.

David J.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Perfomance-degradation-9-3-vs-9-2-for-FreeBSD-tp5800835p5800993.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andrew Dunstan



On 04/21/2014 02:54 PM, Andres Freund wrote:

Hi,

I spent the last two hours poking arounds in the environment Andrew
provided and I was able to reproduce the issue, find a assert to
reproduce it much faster and find a possible root cause.



What's the assert that makes it happen faster? That might help a lot in 
constructing a self-contained test.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-21 Thread Josh Berkus

On 04/21/2014 03:41 PM, Simon Riggs wrote:
> Storage Efficiency
> * Compression
> * Column Orientation

You might look at turning this:

http://citusdata.github.io/cstore_fdw/

... into a more integrated part of Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows

2014-04-21 Thread Michael Paquier

On Mon, Apr 21, 2014 at 12:54 PM, Michael Paquier  wrote:

> When doing some work on Windows, I noticed that the mkvc specs in
> src/tools/msvc use wsock32.lib, which is as far as I understand an
> old, obsolete version of the Windows socket library. Wouldn't it make
> sense to update the specs to build only with ws2_32.lib like in the
> patch attached?
>
I created an entry in the upcoming commit fest for this patch:
https://commitfest.postgresql.org/action/patch_view?id=1440
Thanks,
-- 
Michael

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Claudio Freire

On Mon, Apr 21, 2014 at 8:07 PM, David G Johnston
 wrote:
> Jim Nasby-2 wrote
>>> I feel that if there is no memory pressure, frankly it doesnt matter much
>>> about what gets out and what not. The case I am specifically targeting is
>>> when the clocksweep gets to move about a lot i.e. high memory pressure
>>> workloads. Of course,  I may be totally wrong here.
>>
>> Well, there's either memory pressure or there isn't. If there isn't then
>> it's all moot *because we're not evicting anything*.
>
> The trade-off I'm seeing here is between measuring when there is no memory
> pressure - and thus eating at performance while not actually evicting
> buffers - and not measuring but then encountering memory pressure and not
> having a clue as to what should be evicted.


I believe that for the intended use discussed in this thread, a
compile-time switch would be more than enough control, and it would
avoid that tradeoff.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread David G Johnston

Jim Nasby-2 wrote
>> I feel that if there is no memory pressure, frankly it doesnt matter much
>> about what gets out and what not. The case I am specifically targeting is
>> when the clocksweep gets to move about a lot i.e. high memory pressure
>> workloads. Of course,  I may be totally wrong here.
> 
> Well, there's either memory pressure or there isn't. If there isn't then
> it's all moot *because we're not evicting anything*.

The trade-off I'm seeing here is between measuring when there is no memory
pressure - and thus eating at performance while not actually evicting
buffers - and not measuring but then encountering memory pressure and not
having a clue as to what should be evicted.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Clock-sweep-not-caching-enough-B-Tree-leaf-pages-tp5799947p5800988.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Jim Nasby


On 4/16/14, 10:28 AM, Robert Haas wrote:

Also, I think the scalability problems around buffer eviction are
eminently solvable, and in particular I'm hopeful that Amit is going
to succeed in solving them.  Suppose we have a background process
(whether the background writer or some other) that runs the clock
sweep, identifies good candidates for eviction, and pushes them on a
set of, say, 16 free-lists protected by spinlocks.  (The optimal
number of free-lists probably depends on the size of shared_buffers.)


How *certain* are we that a single freelist lock (that actually ONLY protects 
the freelist) would be that big a deal? I suspect it wouldn't be much of an 
issue at all:

- Right now (IIRC) it's tied into the clock as well, so immediate fail on 
scaling...
- The clock is WAY more expensive than grabbing one buffer off the free list. 
Last I looked it was so bad that even if the next buffer the clock hit was free 
it was still worse than hitting the free list.

I strongly suspect that a single freelist lock (that didn't protect anything 
else) would be fine. I think it'd be folly to start with a more complex 
multi-lock/multi-freelist implementation before we knew we needed one.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-21 Thread Simon Riggs

I've discussed 2ndQuadrant's involvement in the AXLE project a few
times publicly, but never on this mailing list. The project relates to
innovation and improvement in Business Intelligence for systems based
upon PostgreSQL in the range of 10-100TB.

Our work will span the 9.5 and 9.6 cycles. We're looking to make
measurable improvements in a number of cases; one of those is TPC-H,
since its a publicly accessible benchmark, another is a more private
benchmark on healthcare data. In brief, this means speeding up the
performance of large queries, data loading and looking at very large
systems issues.

Some of areas of R&D are definitely on the roadmap, others are more
flexible. Some of this is in progress, other stuff is not even at the
design stage - yet, just a few paragraphs along the lines of "we will
look at these topics". If we have room, its possible we may
accommodate other topics; this is not carte blanche, but the reason
for posting here is so people know we will take input, following the
normal community process. Detailed in-person discussions at PGCon are
expected and the Wiki pages will be updated for each aspect.

BI-related Indexing
* MinMax indexes
* Bitmap indexes

Large Systems
* Freeze avoidance
* Storage management issues for very large systems

Storage Efficiency
* Compression
* Column Orientation

Optimisation
* Bulk loading speed improvements
* Bulk FK evaluation
* Executor tuning for very large queries

Query tuning
* Approximate queries, sampling
* Materialized Views

...and possibly some other aspects.

2ndQuadrant is also assisting other researchers on GPU and FPGA
topics, which may also yield work of interest to PostgreSQL project.

Couple of points: The project is time limited, so if work gets pushed
back beyond that then we'll lose the opportunity to contribute. Please
support our work with timely objections, assistance in defining the
path forwards and limiting the scope to something that avoids wasting
this opportunity. Further funding is possible if we don't squander
this. We are being funded to make best efforts to contribute to open
source PostgreSQL, not pay-for-commit.

AXLE is funded by the EU under FP7 Grant Agreement 318633.  Further
details are available here http://www.axleproject.eu/

(There are also other 2ndQuadrant development projects in progress,
this is just one of the larger ones).

Best Regards

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Jim Nasby


On 4/18/14, 2:51 PM, Atri Sharma wrote:


I feel that if there is no memory pressure, frankly it doesnt matter much about 
what gets out and what not. The case I am specifically targeting is when the 
clocksweep gets to move about a lot i.e. high memory pressure workloads. Of 
course,  I may be totally wrong here.


Well, there's either memory pressure or there isn't. If there isn't then it's 
all moot *because we're not evicting anything*.


One thing that I discussed with Merlin offline and am now concerned about is 
how will the actual eviction work. We cannot traverse the entire list and then 
find all the buffers with refcount 0 and then do another traversal to find the 
oldest one.


This is why OSes use multiple page pools. If we're going to use a clock sweep 
at all I think we need to use the same.

Every time we discuss this stuff it feels like we're completely reinventing the 
wheel that was solved by OSes years ago. :(
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-21 Thread Jim Nasby


On 4/15/14, 1:15 PM, Peter Geoghegan wrote:

On Tue, Apr 15, 2014 at 9:30 AM, Merlin Moncure  wrote:

>There are many reports of improvement from lowering shared_buffers.
>The problem is that it tends to show up on complex production
>workloads and that there is no clear evidence pointing to problems
>with the clock sweep; it could be higher up in the partition locks or
>something else entirely (like the O/S).  pgbench is also not the
>greatest tool for sniffing out these cases: it's too random and for
>large database optimization is generally an exercise in de-randomizing
>i/o patterns.  We really, really need a broader testing suite that
>covers more usage patterns.

I find it quite dissatisfying that we know so little about this.


This is an area where additional stats gathering would be very valuable. We're 
running 8G buffers on 512G servers because bumping it up hurt performance, but 
we have no clue why. Was it due to buffer pins? How many times the clock had to 
sweep to find a victim? Something else entirely? No idea... :(
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Jim Nasby


On 4/21/14, 4:08 PM, Joshua D. Drake wrote:

If the community had more *BSD presence I think it would be great but it isn't 
all that viable at this point. I do know however that no-one in this community 
would turn down a team of FreeBSD advocates helping us make PostgreSQL awesome 
for PostgreSQL.


I assume you meant FreeBSD awesome for PostgreSQL? :)

I'm also a big fan of *BSD but the reality is it's MUCH harder to get *BSD into 
a corporation than linux. Now, if FreeBSD had a bunch of stuff that made 
PostgreSQL run like 4x faster on *BSD than Linux that would be a different 
story.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein



On 4/21/14, 2:23 PM, Stephen Frost wrote:

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:

On 4/21/14, 12:47 PM, Stephen Frost wrote:

  Asking for help to address the FreeBSD performance would have
been much better received. Thanks, Stephen

That is exactly what I did, I asked for a version of postgresql that
was easy to switch at runtime between two behaviors.

That would make it a LOT easier to run a few scripts and make sure I
got the correct binary without having to munge PREFIX and a bunch of
PATH and other tools to get my test harness to DTRT.

I'm sure one of the hackers would be happy to provide you with a patch
to help you with your testing.

That would be fine.

That's quite a different thing from asking for a GUC to be provided and
then supported over the next 5 years as part of the core release, which
is what I believe we all thought you were asking for.
I did not know that GUCs were not classified into 
"experimental/non-experimental".  The fact that a single GUC would need 
to be supported for 5 years is definitely something to consider.  Now I 
understand the push back a little more.


-Alfred



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

On 2014-04-21 17:21:20 -0400, Bruce Momjian wrote:
> On Mon, Apr 21, 2014 at 02:08:51PM -0700, Joshua Drake wrote:
> > If the community had more *BSD presence I think it would be great
> > but it isn't all that viable at this point. I do know however that
> > no-one in this community would turn down a team of FreeBSD advocates
> > helping us make PostgreSQL awesome for PostgreSQL.
> 
> I don't think we would even implement a run-time control for Linux or
> Windows for this, so I don't even think it is a FreeBSD issue.

I think some of the arguments in this thread are pretty damn absurd. We
have just introduced dynamic_shared_memory_type.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Bruce Momjian

On Mon, Apr 21, 2014 at 02:08:51PM -0700, Joshua Drake wrote:
> If the community had more *BSD presence I think it would be great
> but it isn't all that viable at this point. I do know however that
> no-one in this community would turn down a team of FreeBSD advocates
> helping us make PostgreSQL awesome for PostgreSQL.

I don't think we would even implement a run-time control for Linux or
Windows for this, so I don't even think it is a FreeBSD issue.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:
> On 4/21/14, 12:47 PM, Stephen Frost wrote:
> >  Asking for help to address the FreeBSD performance would have
> >been much better received. Thanks, Stephen
> 
> That is exactly what I did, I asked for a version of postgresql that
> was easy to switch at runtime between two behaviors.
> 
> That would make it a LOT easier to run a few scripts and make sure I
> got the correct binary without having to munge PREFIX and a bunch of
> PATH and other tools to get my test harness to DTRT.

I'm sure one of the hackers would be happy to provide you with a patch
to help you with your testing.

That's quite a different thing from asking for a GUC to be provided and
then supported over the next 5 years as part of the core release, which
is what I believe we all thought you were asking for.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus


> Josh, how long does it take you to reproduce the issue? 

A couple hours.

> And can you
> reproduce it outside of a production environment?

Not yet, still working on that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Joshua D. Drake



On 04/21/2014 10:39 AM, Alfred Perlstein wrote:


What I am seeing here is unfortunately a very strong departure from
FreeBSD support by the community from several of the developers.  In
fact over drinks at pgcon last year there were a TON of jokes making fun
of FreeBSD users and developers which I took in stride as professional
joking with alcohol involved.  I thought it was pretty funny.  However a
year later and I realize that there appears to be a real problem with
FreeBSD in the pgsql community.


The reality is, FreeBSD is like Saab (before it died, and no I am not 
suggesting that FreeBSD is dying). Saab was a niche, very cool 
automobile that offered a lot of unique features that others didn't. 
However, they didn't sell very well in the states but had a very devoted 
fan base (myself included).


FreeBSD is awesome. There is no question about that. It certainly has a 
better license than Linux and has offered things for years that Linux 
has never really gotten right (jails/zones).


That said, FreeBSD is niche and Linux is, not. Linux is the king of the 
jungle in this world. Whether we want it to be or not and what that 
means is: that is where the resources go.


If the community had more *BSD presence I think it would be great but it 
isn't all that viable at this point. I do know however that no-one in 
this community would turn down a team of FreeBSD advocates helping us 
make PostgreSQL awesome for PostgreSQL.




There are other Linux centric dbs to pick from.  If pgsql is just


No, there is about 1 and its derivatives thereof. If you want the type 
of features pgsql offers, then pgsql is all you have got.


Sincerely,

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Bruce Momjian

On Mon, Apr 21, 2014 at 01:52:48PM -0700, Alfred Perlstein wrote:
> 
> On 4/21/14, 12:47 PM, Stephen Frost wrote:
> >  Asking for help to address the FreeBSD performance would have
> >been much better received. Thanks, Stephen
> 
> That is exactly what I did, I asked for a version of postgresql that
> was easy to switch at runtime between two behaviors.
> 
> That would make it a LOT easier to run a few scripts and make sure I
> got the correct binary without having to munge PREFIX and a bunch of
> PATH and other tools to get my test harness to DTRT.

I think the big point is that you must realize that we are dealing with
thousands of users, so making a suggestion without considering its
impact on those thousands of people is not helpful.

We have clearly stated the need to consider those thousands of users,
and you are still saying the same thing --- this would make it easy for
"me".  This is not helpful to the discussion, and you must realize that
at some level.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein



On 4/21/14, 12:47 PM, Stephen Frost wrote:
  Asking for help to address the FreeBSD performance would have been 
much better received. Thanks, Stephen 


That is exactly what I did, I asked for a version of postgresql that was 
easy to switch at runtime between two behaviors.


That would make it a LOT easier to run a few scripts and make sure I got 
the correct binary without having to munge PREFIX and a bunch of PATH 
and other tools to get my test harness to DTRT.


-Alfred



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund

On 2014-04-21 12:31:09 -0700, Josh Berkus wrote:
> On 04/21/2014 12:26 PM, Tom Lane wrote:
> > Andres Freund  writes:
> >> I spent the last two hours poking arounds in the environment Andrew
> >> provided and I was able to reproduce the issue, find a assert to
> >> reproduce it much faster and find a possible root cause.
> > 
> > Hmm ... is this the same thing Josh is reporting?  If so, why the
> > apparent connection with use of pg_stat_statements?  Maybe that
> > just changes the timing to make the race condition more likely?
> 
> Well, in Andrew's case it only happens if auto_explain is loaded.

Josh, how long does it take you to reproduce the issue? And can you
reproduce it outside of a production environment?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-04-21 15:47:31 -0400, Stephen Frost wrote:
> > That's certainly unfortunate.  For my 2c, I'd recommend that you write a
> > minimal implementation that allows you to test just the sysv-vs-mmap
> > case (which could certainly take an option, to avoid having to
> > recompile during testing), or even ask if anyone here already has;
> 
> I don't think that's something all that easily testable in
> isolation. The behaviour here is heavily related to concurrency.

Concurrency is not terribly hard to generate in a simulated case; I
still feel that doing this independently of PG would probably be better
than involving all the rest of the PG code in testing something this
low-level.

> > I
> > wouldn't be at all surprised if both Robert and Francois did exactly
> > that already, nor would I be surprised if someone volunteered to write
> > such a small C utility for you, if it meant that this issue would be
> > fixed in FreeBSD that much sooner.
> 
> I don't know, but the patch for a guc would be < 10 lines. I think I'd
> start with that.

Certainly running a minimally patched PG wouldn't be hard for a kernel
dev either, of course.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

On 2014-04-21 15:47:31 -0400, Stephen Frost wrote:
> That's certainly unfortunate.  For my 2c, I'd recommend that you write a
> minimal implementation that allows you to test just the sysv-vs-mmap
> case (which could certainly take an option, to avoid having to
> recompile during testing), or even ask if anyone here already has;

I don't think that's something all that easily testable in
isolation. The behaviour here is heavily related to concurrency.

> I
> wouldn't be at all surprised if both Robert and Francois did exactly
> that already, nor would I be surprised if someone volunteered to write
> such a small C utility for you, if it meant that this issue would be
> fixed in FreeBSD that much sooner.

I don't know, but the patch for a guc would be < 10 lines. I think I'd
start with that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:
> Stephen, please calm down on the hyperbole, seriously, picking
> another db is not an attack.

Perhaps it wasn't intended by you, but to those of us reading it, your
comments came across clearly as "if you don't fix this, then we
won't use PG".  Email certainly isn't perfect but I hope you can
understand our confusion- why else bring up MySQL or any other database
if not to try and push us to do what you want?  It's hard for me to see
how bringing up other databases or making comments about us being
"Linux-only" are anything other than attempts to persude by threating
that we might lose a user or users.

> I was simply asking for a feature that would make my life easier as
> both an admin deploying postgresql and a kernel dev attempting to
> fix a problem.  I'm one guy, probably the only guy right now asking.
> Honestly the thought of needing to compile two versions of
> postgresql to do sysv vs mmap performance would take me more time
> than I would like to devote to the issue when my time is already
> limited.

That's certainly unfortunate.  For my 2c, I'd recommend that you write a
minimal implementation that allows you to test just the sysv-vs-mmap
case (which could certainly take an option, to avoid having to
recompile during testing), or even ask if anyone here already has; I
wouldn't be at all surprised if both Robert and Francois did exactly
that already, nor would I be surprised if someone volunteered to write
such a small C utility for you, if it meant that this issue would be
fixed in FreeBSD that much sooner.

Asking for help to address the FreeBSD performance would have been much
better received.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund

On 2014-04-21 15:26:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I spent the last two hours poking arounds in the environment Andrew
> > provided and I was able to reproduce the issue, find a assert to
> > reproduce it much faster and find a possible root cause.
> 
> Hmm ... is this the same thing Josh is reporting?  If so, why the
> apparent connection with use of pg_stat_statements?  Maybe that
> just changes the timing to make the race condition more likely?

I am not sure. I could only reproduce the problem with both
stat_statements and auto_explain loaded, but I didn't try very
long. It's pretty timing sensitive, so I could very well imagine that
some additional overhead makes it easier to reproduce.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andrew Dunstan


On 04/21/2014 03:26 PM, Tom Lane wrote:

Andres Freund  writes:

I spent the last two hours poking arounds in the environment Andrew
provided and I was able to reproduce the issue, find a assert to
reproduce it much faster and find a possible root cause.

Hmm ... is this the same thing Josh is reporting?  If so, why the
apparent connection with use of pg_stat_statements?  Maybe that
just changes the timing to make the race condition more likely?





That's my theory. Josh's case is very similar indeed to mine.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus

On 04/21/2014 12:26 PM, Tom Lane wrote:
> Andres Freund  writes:
>> I spent the last two hours poking arounds in the environment Andrew
>> provided and I was able to reproduce the issue, find a assert to
>> reproduce it much faster and find a possible root cause.
> 
> Hmm ... is this the same thing Josh is reporting?  If so, why the
> apparent connection with use of pg_stat_statements?  Maybe that
> just changes the timing to make the race condition more likely?

Well, in Andrew's case it only happens if auto_explain is loaded.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein



On 4/21/14, 11:14 AM, Stephen Frost wrote:

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:

On 4/21/14, 9:51 AM, Andres Freund wrote:

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:

Sure, to be fair, we are under the gun here for a product, it may just mean
that the end result of that conversation is "mysql".

Personally arguments in that vain are removing just about any incentive
I have to work on the problem.

I was just explaining that we have a timeline over here and while
that may disincentive you for providing what we need it would be
very unfair.

I'm pretty sure Andres was referring to the part where there's a
'threat' to move to some other platform due to a modest performance
degredation, as if it's the only factor involved in making a decision
among the various RDBMS options.  If that's really your deciding
criteria instead of the myriad of other factors, I daresay you have your
priorities mixed up.


There are other Linux centric dbs to pick from.  If pgsql is just
another Linux centric DB then that is unfortunate but something I
can deal with.

These attacks really aren't going to get you anywhere.  We're talking
about a specific performance issue that FreeBSD has and how much PG
(surely not the only application impacted by this issue) should bend
to address it, even though the FreeBSD folks were made aware of the
issue over year ago and have done nothing to address it.

Moreover, you'd like to also define the way we deal with the issue as
being to make it runtime configurable rather than as a compile-time
option, even though 90% of the users out there won't understand the
difference nor would know how to correctly set it (and, in many cases,
may end up making the wrong decision because it's the default for
other platforms, unless we add more code to address this at initdb
time).

Basically, it doesn't sound like you're terribly concerned with the
majority of our user base, even on FreeBSD, and would prefer to try
and browbeat us into doing what you've decided is the correct solution
because it'd work better for you.

I've been guiltly of the same in the past and it's not fun having to
back off from a proposal when it's pointed out that there's a better
option, particularly when it doesn't seem like the alternative is
better for me, but that's just part of working in any large project.

Stephen, please calm down on the hyperbole, seriously, picking another 
db is not an attack.


I was simply asking for a feature that would make my life easier as both 
an admin deploying postgresql and a kernel dev attempting to fix a 
problem.  I'm one guy, probably the only guy right now asking.  Honestly 
the thought of needing to compile two versions of postgresql to do sysv 
vs mmap performance would take me more time than I would like to devote 
to the issue when my time is already limited.


Again, it was an ask, you are free to do what you like, the same way you 
were free to ignore my advice at pgcon about mmap being less efficient.  
It does not make what I'm saying an "attack".  Just like when 
interviewing people choosing a different candidate for a job is not an 
attack on the other candidates!


-Alfred


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Tom Lane

Andres Freund  writes:
> I spent the last two hours poking arounds in the environment Andrew
> provided and I was able to reproduce the issue, find a assert to
> reproduce it much faster and find a possible root cause.

Hmm ... is this the same thing Josh is reporting?  If so, why the
apparent connection with use of pg_stat_statements?  Maybe that
just changes the timing to make the race condition more likely?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund

Hi,

I spent the last two hours poking arounds in the environment Andrew
provided and I was able to reproduce the issue, find a assert to
reproduce it much faster and find a possible root cause.

Since the symptom of the problem seem to be multixacts with more than
one updating xid, I added a check to MultiXactIdCreateFromMembers()
preventing that. That requires to move ISUPDATE_from_mxstatus() to a
header, but I think we should definitely add such a assert.

As it turns out the problem is in the
else if (result == HeapTupleBeingUpdated && wait)
branch in (at least) heap_update(). When the problem is hit the
to-be-updated tuple originally has HEAP_XMIN_COMMITTED |
HEAP_XMAX_LOCK_ONLY | HEAP_XMAX_KEYSHR_LOCK set. So we release the
buffer lock, acquire the tuple lock, and reacquire the buffer lock. But
inbetween the locking backend has actually updated the tuple.
The code tries to protect against that with:
/*
 * recheck the locker; if someone else changed the tuple while
 * we weren't looking, start over.
 */
if ((oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
!TransactionIdEquals(
HeapTupleHeaderGetRawXmax(oldtup.t_data),
 xwait))
goto l2;

can_continue = true;
locker_remains = true;

and similar. The problem is that in Andrew's case the infomask changes
from 0x2192 to 0x2102 (i.e. it's a normal update afterwards), while xmax
stays the same. Ooops.
A bit later there's:
result = can_continue ? HeapTupleMayBeUpdated : HeapTupleUpdated;
So, from thereon we happily continue to update the tuple, thinking
there's no previous updater. Which obviously causes problems.

I've hacked^Wfixed this by changing the infomask test above into
infomask != oldtup.t_data->t_infomask in a couple of places. That seems
to be sufficient to survive the testcase a couple of times.

I am too hungry right now to think about a proper fix for this and
whether there's further problematic areas.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:
> On 4/21/14, 9:51 AM, Andres Freund wrote:
> >On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:
> >>Sure, to be fair, we are under the gun here for a product, it may just mean
> >>that the end result of that conversation is "mysql".
> >Personally arguments in that vain are removing just about any incentive
> >I have to work on the problem.
> 
> I was just explaining that we have a timeline over here and while
> that may disincentive you for providing what we need it would be
> very unfair.

I'm pretty sure Andres was referring to the part where there's a
'threat' to move to some other platform due to a modest performance
degredation, as if it's the only factor involved in making a decision
among the various RDBMS options.  If that's really your deciding
criteria instead of the myriad of other factors, I daresay you have your
priorities mixed up.

> There are other Linux centric dbs to pick from.  If pgsql is just
> another Linux centric DB then that is unfortunate but something I
> can deal with.

These attacks really aren't going to get you anywhere.  We're talking
about a specific performance issue that FreeBSD has and how much PG
(surely not the only application impacted by this issue) should bend
to address it, even though the FreeBSD folks were made aware of the
issue over year ago and have done nothing to address it.

Moreover, you'd like to also define the way we deal with the issue as
being to make it runtime configurable rather than as a compile-time
option, even though 90% of the users out there won't understand the
difference nor would know how to correctly set it (and, in many cases,
may end up making the wrong decision because it's the default for
other platforms, unless we add more code to address this at initdb
time).

Basically, it doesn't sound like you're terribly concerned with the
majority of our user base, even on FreeBSD, and would prefer to try
and browbeat us into doing what you've decided is the correct solution
because it'd work better for you.

I've been guiltly of the same in the past and it's not fun having to
back off from a proposal when it's pointed out that there's a better
option, particularly when it doesn't seem like the alternative is
better for me, but that's just part of working in any large project.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein



On 4/21/14, 9:52 AM, Alvaro Herrera wrote:

Alfred Perlstein wrote:


I am unsure of the true overhead of making this a runtime tunable so
pardon if I'm asking for "a lot".  From the perspective of both an
OS developer and postgresql user (I am both) it really makes more
sense to have it a runtime tunable for the following reasons:

 From an OS developer making this a runtime allows us to much more
easily do the testing (instead of needing two compiled versions).
 From a sysadmin perspective it makes switching to/from a LOT easier
in case the new mmap code exposes a stability or performance bug.

In this case, AFAICS the only overhead of a runtime option (what we call
a GUC) is the added potential for user confusion, and the necessary
documentation.  If we instead go for a compile-time option, both items
become smaller.

In any case, I don't see that there's much need for a runtime option,
really; you already know that the mmap code path is slower in FreeBSD.
You only need to benchmark both options once the FreeBSD vm code itself
is fixed, right?

In fact, it might not even need to be a configure option; I would
suggest a pg_config_manual.h setting instead, and perhaps tweaks to the
src/template/freebsd file to enable it automatically on the "broken"
FreeBSD releases.  We could then, in the future, have the template
itself turn the option off for the future FreeBSD release that fixes the
problem.

That is correct, until you're in prod and suddenly one option becomes 
unstable, or you want to try a quick kernel patch without rebooting.


Look, this is an argument I've lost time and time again in open source 
software communities, the idea of a software option as opposed to 
compile time really seems to hit people the wrong way.


I think from now on it just makes sense to sit back and let whatever 
happens happen.


-Alfred


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

* Alfred Perlstein (alf...@freebsd.org) wrote:
> How high on the hierarchy of PostgreSQL's "needs" is making a single
> option a tunable versus compile time thing?  I mean seriously you
> mean to stick on this one point when one of your users are asking
> you about this?   That is pretty concerning to me.

Seriously, we do care that the system is easy to use for both admins and
end users and part of how we do that is by minimizing the number of
tunable options because they add to confusion and increase the
difficulty to manage the system- look at certain other $expensive
RDBMS's if you're unsure about that.

Far better is to work out the *correct* solution to a given problem
rather than punt'ing on it and asking the (almost uniformly
under-informed user) to tell us what to do.

And, yes, while we're interested in our user's input, we do not add new
configuration variables because one user asked us to.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein



On 4/21/14, 9:51 AM, Andres Freund wrote:

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:

Sure, to be fair, we are under the gun here for a product, it may just mean
that the end result of that conversation is "mysql".

Personally arguments in that vain are removing just about any incentive
I have to work on the problem.


I was just explaining that we have a timeline over here and while that 
may disincentive you for providing what we need it would be very unfair.


In that I mean sometimes the reality of a situation can be inconvenient 
and for that I do apologize.


What I am seeing here is unfortunately a very strong departure from 
FreeBSD support by the community from several of the developers.  In 
fact over drinks at pgcon last year there were a TON of jokes making fun 
of FreeBSD users and developers which I took in stride as professional 
joking with alcohol involved.  I thought it was pretty funny.  However a 
year later and I realize that there appears to be a real problem with 
FreeBSD in the pgsql community.


There are other Linux centric dbs to pick from.  If pgsql is just 
another Linux centric DB then that is unfortunate but something I can 
deal with.


-Alfred



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alvaro Herrera

Alfred Perlstein wrote:

> How high on the hierarchy of PostgreSQL's "needs" is making a single
> option a tunable versus compile time thing?  I mean seriously you
> mean to stick on this one point when one of your users are asking
> you about this?   That is pretty concerning to me.

I think the sticking point here is that the problem affects a single
platform, and it can easily be construed as a platform bug.  For
problems that affect PostgreSQL as a whole for everybody, we hesitate a
lot less when it comes to creating new runtime options.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein



On 4/21/14, 9:51 AM, Andrew Dunstan wrote:


On 04/21/2014 12:44 PM, Alfred Perlstein wrote:

On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be botherrd with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse 
users. If a packager can pick a default surely they can pick build 
options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who in a db stack sits both below (as a 
pgsql user 15 years) and above (as a FreeBSD kernel dev 15 years) 
your software what they "really need".






I told you what *we* (i.e. the PostgreSQL community) need, IMNSHO (and 
speaking as a Postgres developer and consultant of 10 or so years 
standing).


How high on the hierarchy of PostgreSQL's "needs" is making a single 
option a tunable versus compile time thing?  I mean seriously you mean 
to stick on this one point when one of your users are asking you about 
this?   That is pretty concerning to me.


-Alfred





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14 4:10 AM, Andres Freund wrote:

Hi,

On 2014-04-20 11:24:38 +0200, Palle Girgensohn wrote:

I see performance degradation with PostgreSQL 9.3 vs 9.2 on FreeBSD, and I'm 
wondering who to poke to mitigate the problem. In reference to this thread [1], 
who where the FreeBSD people that Francois mentioned? If mmap needs to perform 
well in the kernel, I'd like to know of someone with FreeBSD kernel knowledge 
who is interested in working with mmap perfocmance. If mmap is indeed the 
cuplrit, I've just tested 9.2.8 vs 9.3.4, I nevere isolated the mmap patch, 
although I believe Francois did just that with similar results.

If there are indeed such large regressions on FreeBSD we need to treat
them as postgres regressions. It's nicer not to add config options for
things that don't need it, but apparently that's not the case here.

Imo this means we need to add GUC to control wether anon mmap() or sysv
shmem is to be used. In 9.3.

Greetings,

Andres Freund

Andres, thank you.  Speaking as a FreeBSD developer that would be a good 
idea.


--
Alfred Perlstein



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Tom Lane

Josh Berkus  writes:
> 1) I've confirmed at the 2nd site that the issue doesn't happen if
> pg_stat_statements.so is not loaded.  So this seems to be confirmation
> that either auto_explain, pg_stat_statements, or both need to be loaded
> (but not necessarily created as extensions) in order to have the issue.

I looked a little bit for possible interactions between those two modules,
and almost immediately noticed that pg_stat_statements is not being polite
enough to call any previous occupant of the post_parse_analyze_hook.
That's not the bug we're looking for, since auto_explain does not use
post_parse_analyze_hook, but it's a bug nonetheless.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alvaro Herrera

Alfred Perlstein wrote:

> I am unsure of the true overhead of making this a runtime tunable so
> pardon if I'm asking for "a lot".  From the perspective of both an
> OS developer and postgresql user (I am both) it really makes more
> sense to have it a runtime tunable for the following reasons:
> 
> From an OS developer making this a runtime allows us to much more
> easily do the testing (instead of needing two compiled versions).
> From a sysadmin perspective it makes switching to/from a LOT easier
> in case the new mmap code exposes a stability or performance bug.

In this case, AFAICS the only overhead of a runtime option (what we call
a GUC) is the added potential for user confusion, and the necessary
documentation.  If we instead go for a compile-time option, both items
become smaller.

In any case, I don't see that there's much need for a runtime option,
really; you already know that the mmap code path is slower in FreeBSD.
You only need to benchmark both options once the FreeBSD vm code itself
is fixed, right?

In fact, it might not even need to be a configure option; I would
suggest a pg_config_manual.h setting instead, and perhaps tweaks to the
src/template/freebsd file to enable it automatically on the "broken"
FreeBSD releases.  We could then, in the future, have the template
itself turn the option off for the future FreeBSD release that fixes the
problem.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:
> Sure, to be fair, we are under the gun here for a product, it may just mean
> that the end result of that conversation is "mysql".

Personally arguments in that vain are removing just about any incentive
I have to work on the problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andrew Dunstan



On 04/21/2014 12:44 PM, Alfred Perlstein wrote:

On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be botherrd with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse 
users. If a packager can pick a default surely they can pick build 
options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who in a db stack sits both below (as a 
pgsql user 15 years) and above (as a FreeBSD kernel dev 15 years) your 
software what they "really need".






I told you what *we* (i.e. the PostgreSQL community) need, IMNSHO (and 
speaking as a Postgres developer and consultant of 10 or so years standing).


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be botherrd with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse users. 
If a packager can pick a default surely they can pick build options 
too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who in a db stack sits both below (as a 
pgsql user 15 years) and above (as a FreeBSD kernel dev 15 years) your 
software what they "really need".


-Alfred


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Alvaro Herrera

Merlin Moncure wrote:
> On Mon, Apr 21, 2014 at 10:40 AM, Andres Freund  
> wrote:

> > And too bad that infomask bits are so scarce :(. We really need to
> > reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.
> 
> The only consequence of that is losing support for in-place update for
> pre-9.0 (of which the only supported version is 8.4).  I figure it's
> also pretty reasonable to drop support for IPU for out of support
> versions for new versions going forward.  That would recover the bits
> and yield some nice cleanups in tqual.c.

There's no way to be sure that the bits have been removed from tables
that were upgraded from a 8.4 server to a 9.0 server and from there to a
newer server.  We'd need some way to catalogue which tables have been
cleaned prior to an upgrade to a release that no longer accepts those
bits; pg_upgrade would then say "table xyz needs to be vacuumed before
the upgrade" in check mode, or something like that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14 9:34 AM, Stephen Frost wrote:

* Alfred Perlstein (alf...@freebsd.org) wrote:

There is definitely hope, however changes to the FreeBSD vm are
taken as seriously as changes to core changes to Postresql's store.
In addition changes to vm is somewhat in the realm of complexity of
Postgresql store as well so it may not be coming in the next few
days/weeks, but rather a month or two.  I am not sure if an easy fix
is available in FreeBSD but we will see in short order.

This has been known for over a year.. :(

I know!  I remember warning y'all about it back at pgcon last year. :)



I need to do some research.  I work with Adrian (FreeBSD kernel dev
mentioned earlier in the thread), I'll grab him today and discuss
what the issue may be.

Hopefully that'll get things moving in the right direction, finally..
Sure, to be fair, we are under the gun here for a product, it may just 
mean that the end result of that conversation is "mysql".


I'm hoping we can use Postgresql as I've been a huge fan since 1999.  I 
based my first successful project on it and had a LOT of help from the 
pgsql community, Tom, Bruce and we even contracted Vadim for some work 
on incremental vacuums!


-Alfred


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus


> Can you get the infomask bits..?  What's does pg_controldata report wrt
> the MultiXid's?

Can't get the infomask bits.

pg_controldata attached, with some redactions.  Unfortunately, it
appears that they've continued to do tests on this system, so the XID
counter has advanced somewhat.

pg_control version number:937
Catalog version number:   {redacted}
Database system identifier:   {redacted}
Database cluster state:   in production
pg_control last modified: Mon 21 Apr 2014 04:28:30 PM UTC
Latest checkpoint location:   8/C8B05F98
Prior checkpoint location:8/C663A4B0
Latest checkpoint's REDO location:8/C6DC9A10
Latest checkpoint's REDO WAL file:0003000800C6
Latest checkpoint's TimeLineID:   3
Latest checkpoint's PrevTimeLineID:   3
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/11214375
Latest checkpoint's NextOID:  32769
Latest checkpoint's NextMultiXactId:  28939
Latest checkpoint's NextMultiOffset:  58004
Latest checkpoint's oldestXID:674
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  11214375
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:Mon 21 Apr 2014 04:21:02 PM UTC
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:hot_standby
Current max_connections setting:  100
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andrew Dunstan



On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be botherrd with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse users. 
If a packager can pick a default surely they can pick build options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a user 
and a fan of the system. :)





I confess to being mightily confused.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

* Alfred Perlstein (alf...@freebsd.org) wrote:
> There is definitely hope, however changes to the FreeBSD vm are
> taken as seriously as changes to core changes to Postresql's store.
> In addition changes to vm is somewhat in the realm of complexity of
> Postgresql store as well so it may not be coming in the next few
> days/weeks, but rather a month or two.  I am not sure if an easy fix
> is available in FreeBSD but we will see in short order.

This has been known for over a year.. :(

> I need to do some research.  I work with Adrian (FreeBSD kernel dev
> mentioned earlier in the thread), I'll grab him today and discuss
> what the issue may be.

Hopefully that'll get things moving in the right direction, finally..

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:24 AM, Andrew Dunstan wrote:

On 04/21/2014 11:59 AM, Alfred Perlstein wrote:

On 4/21/14 8:45 AM, Andrew Dunstan wrote:

On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund 
mailto:and...@2ndquadrant.com>> wrote:

On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund mailto:and...@2ndquadrant.com>> writes:
> > If there are indeed such large regressions on FreeBSD we need
to treat
> > them as postgres regressions. It's nicer not to add config
options for
> > things that don't need it, but apparently that's not the case
here.
>
> > Imo this means we need to add GUC to control wether anon
mmap() or sysv
> > shmem is to be used. In 9.3.
>
> I will resist this mightily.  One of the main reasons to switch
to mmap
> was so we would no longer have to explain about SysV shm
configuration.

It's still explained in the docs and one of the dynshm 
implementations
is based on sysv shmem. So I don't see this as a convincing 
reason.

Regressing installed OSs by 15-20% just to save a couple of 
lines of

docs and code seems rather unconvincing to me.

There's also the fact that even if it's changed in FreeBSD, that 
might be somethign that takes years to trickle out to whatever 
stable release people are actually using.

But do we really want a *guc* for it though? Isn't it enough (and 
in fact better) with a configure switch to pick the implementation 
when multiple are available, that could then be set by default for 
example by the freebsd ports build? That's a lot less "overhead" to 
keep dragging around...

That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.

I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for "a lot".  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense 
to have it a runtime tunable for the following reasons:

From an OS developer making this a runtime allows us to much more 
easily do the testing (instead of needing two compiled versions).
From a sysadmin perspective it makes switching to/from a LOT easier 
in case the new mmap code exposes a stability or performance bug.

1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be botherrd with building with a 
couple of different parameters then I'm not very impressed.

2. We should be trying to get rid of GUCs where possible, and only add 
them when we must. The more there are the more we confuse users. If a 
packager can pick a default surely they can pick build options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a user 
and a fan of the system. :)

-Alfred

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Francois Tigeot

On Mon, Apr 21, 2014 at 05:43:46PM +0200, Andres Freund wrote:
> 
> On 2014-04-21 17:39:39 +0200, Magnus Hagander wrote:
> > But do we really want a *guc* for it though? Isn't it enough (and in fact
> > better) with a configure switch to pick the implementation when multiple
> > are available, that could then be set by default for example by the freebsd
> > ports build? That's a lot less "overhead" to keep dragging around...
> 
> Well, we don't know at all it's just freebsd that's affected. In fact, I
> would be surprised if there aren't other platforms that regressed due to
> this.

The only platforms affected are the BSDs.

I ran (or tried to run) Pgbench on the four major ones during the last two
years. My experience so far:

- FreeBSD: has trouble scaling; there's some interest to improve things but
  nobody has done anything about it yet

- NetBSD: crashes under load; this could have been fixed but when I ran the
  benchmarks in 2012 none of the developers seemed to care.

- OpenBSD: couldn't run the benchmark; for some reason this operating system
  is unable to mmap() 32GB of memory on a recent PC with 128GB of RAM.

- DragonFly: scales better than everything else out there even with mmap()

Given these results I doubt reintroducing SysV shm memory use on PostgreSQL
is worthwile; most platforms where it has a performance impact have much
bigger issues to fix first.

-- 
Francois Tigeot


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andrew Dunstan

On 04/21/2014 11:59 AM, Alfred Perlstein wrote:

On 4/21/14 8:45 AM, Andrew Dunstan wrote:

On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund 
mailto:and...@2ndquadrant.com>> wrote:

On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund mailto:and...@2ndquadrant.com>> writes:
> > If there are indeed such large regressions on FreeBSD we need
to treat
> > them as postgres regressions. It's nicer not to add config
options for
> > things that don't need it, but apparently that's not the case
here.
>
> > Imo this means we need to add GUC to control wether anon
mmap() or sysv
> > shmem is to be used. In 9.3.
>
> I will resist this mightily.  One of the main reasons to switch
to mmap
> was so we would no longer have to explain about SysV shm
configuration.

It's still explained in the docs and one of the dynshm 
implementations

is based on sysv shmem. So I don't see this as a convincing reason.

Regressing installed OSs by 15-20% just to save a couple of 
lines of

docs and code seems rather unconvincing to me.

There's also the fact that even if it's changed in FreeBSD, that 
might be somethign that takes years to trickle out to whatever 
stable release people are actually using.

But do we really want a *guc* for it though? Isn't it enough (and in 
fact better) with a configure switch to pick the implementation when 
multiple are available, that could then be set by default for 
example by the freebsd ports build? That's a lot less "overhead" to 
keep dragging around...

That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.

I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for "a lot".  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense 
to have it a runtime tunable for the following reasons:

From an OS developer making this a runtime allows us to much more 
easily do the testing (instead of needing two compiled versions).
From a sysadmin perspective it makes switching to/from a LOT easier in 
case the new mmap code exposes a stability or performance bug.

1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be botherrd with building with a 
couple of different parameters then I'm not very impressed.

2. We should be trying to get rid of GUCs where possible, and only add 
them when we must. The more there are the more we confuse users. If a 
packager can pick a default surely they can pick build options too.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14 9:13 AM, Stephen Frost wrote:

* Alfred Perlstein (alf...@freebsd.org) wrote:

Can the package builder not set the default for the runtime tunable?

Yeah, I was thinking about that also, but at least in this case it seems
pretty clear that the 'right' answer is known at build time.


Honestly we're about to select a db platform for another FreeBSD
based system we are building, I strongly hoping that we can get back
to sysvshm easily otherwise we may have to select another store.

Is there no hope of this getting fixed in FreeBSD..?  PG wouldn't be the
only application impacted by this, I'm sure...
There is definitely hope, however changes to the FreeBSD vm are taken as 
seriously as changes to core changes to Postresql's store. In addition 
changes to vm is somewhat in the realm of complexity of Postgresql store 
as well so it may not be coming in the next few days/weeks, but rather a 
month or two.  I am not sure if an easy fix is available in FreeBSD but 
we will see in short order.


I need to do some research.  I work with Adrian (FreeBSD kernel dev 
mentioned earlier in the thread), I'll grab him today and discuss what 
the issue may be.


-Alfred



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Stephen Frost

* Josh Berkus (j...@agliodbs.com) wrote:
> 1) I've confirmed at the 2nd site that the issue doesn't happen if
> pg_stat_statements.so is not loaded.  So this seems to be confirmation
> that either auto_explain, pg_stat_statements, or both need to be loaded
> (but not necessarily created as extensions) in order to have the issue.

Interesting...

> 2) I've been unable to reproduce the issue using a naive test case, but
> I'll keep trying.

Thanks.

> 3) The XID information is interesting.
> 
> a) I've confirmed that this is a case of having multiple row versions
> rather than allowing a duplicate PK to be inserted.
> 
> b) the end result of this corruption is XIDs which go backwards:
> 
>   xmin   |  xmax   |   ctid| step_id | workitem_id | status
> -+-+---+-+-+-
>  3362707 | 3362707 | (6177,31) |   1 |  446469 | pending
>  3362710 | 3362707 | (6177,32) |   1 |  446469 | working
> 
>  5520953 | 5520953 | (5064,105) |   1 |  727946 | pending
>  5520954 | 5520953 | (5064,108) |   1 |  727946 | working
> 
> What's additionally problematic is that the current snapshot minxid is
> in the 900 range, so it's not clear why any of the above rows are
> visible at all.

Can you get the infomask bits..?  What's does pg_controldata report wrt
the MultiXid's?

THanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus

All,

More on this:

1) I've confirmed at the 2nd site that the issue doesn't happen if
pg_stat_statements.so is not loaded.  So this seems to be confirmation
that either auto_explain, pg_stat_statements, or both need to be loaded
(but not necessarily created as extensions) in order to have the issue.

2) I've been unable to reproduce the issue using a naive test case, but
I'll keep trying.

3) The XID information is interesting.

a) I've confirmed that this is a case of having multiple row versions
rather than allowing a duplicate PK to be inserted.

b) the end result of this corruption is XIDs which go backwards:

  xmin   |  xmax   |   ctid| step_id | workitem_id | status
-+-+---+-+-+-
 3362707 | 3362707 | (6177,31) |   1 |  446469 | pending
 3362710 | 3362707 | (6177,32) |   1 |  446469 | working

 5520953 | 5520953 | (5064,105) |   1 |  727946 | pending
 5520954 | 5520953 | (5064,108) |   1 |  727946 | working

What's additionally problematic is that the current snapshot minxid is
in the 900 range, so it's not clear why any of the above rows are
visible at all.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Merlin Moncure

On Mon, Apr 21, 2014 at 10:40 AM, Andres Freund  wrote:
> On 2014-04-21 11:30:57 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > I unfortunately haven't followed this in detail, but shouldn't it be
>> > relatively easily to make this even cheaper by checking for
>> > HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
>> > composite's columns, right?
>>
>> That's the point I made further down:
>
> Oh, sorry. I started reading this thread from the end just now.
>
>> we could do that if we were willing
>> to abandon the principle that nested fields shouldn't be compressed.
>> It's not very clear what it'd cost us to give that up.
>
> I don't think the cost of that would be all that high.

I think it's pretty reasonable too.

> And too bad that infomask bits are so scarce :(. We really need to
> reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.

The only consequence of that is losing support for in-place update for
pre-9.0 (of which the only supported version is 8.4).  I figure it's
also pretty reasonable to drop support for IPU for out of support
versions for new versions going forward.  That would recover the bits
and yield some nice cleanups in tqual.c.

8.4 is scheduled to go out of support in July, so if you agree with my
reasoning 9.4 would be a candidate for not honoring 8.4 upgrade
support.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

On 2014-04-21 11:58:10 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-04-21 11:45:49 -0400, Andrew Dunstan wrote:
> >> That seems to make more sense. I can't imagine why this would be a runtime
> >> parameter as opposed to build time.
> 
> > Because that implies that packagers and porters need to make that
> > decision. If it's a GUC people can benchmark it and decide.
> 
> As against that, the packager would be more likely to get it right
> (or even to know that there's an issue).

I sure hope that FreeBSD is going to fix this at some point (it's surely
affecting more than just postgres). But since we (and probably the
packagers) don't know which platforms it's going to affect the only
thing we could do would be to add a configure switch. To test people
would need to recompile postgres.
I don't understand what the problem with a GUC here is. It's a pretty
simple patch and that codepath is entered only once, so performance
surely can't be an argument.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Stephen Frost

* Alfred Perlstein (alf...@freebsd.org) wrote:
> Can the package builder not set the default for the runtime tunable?

Yeah, I was thinking about that also, but at least in this case it seems
pretty clear that the 'right' answer is known at build time.

> Honestly we're about to select a db platform for another FreeBSD
> based system we are building, I strongly hoping that we can get back
> to sysvshm easily otherwise we may have to select another store.

Is there no hope of this getting fixed in FreeBSD..?  PG wouldn't be the
only application impacted by this, I'm sure...

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Tom Lane

Andres Freund  writes:
> On 2014-04-21 11:45:49 -0400, Andrew Dunstan wrote:
>> That seems to make more sense. I can't imagine why this would be a runtime
>> parameter as opposed to build time.

> Because that implies that packagers and porters need to make that
> decision. If it's a GUC people can benchmark it and decide.

As against that, the packager would be more likely to get it right
(or even to know that there's an issue).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14 8:58 AM, Tom Lane wrote:

Andres Freund  writes:

On 2014-04-21 11:45:49 -0400, Andrew Dunstan wrote:

That seems to make more sense. I can't imagine why this would be a runtime
parameter as opposed to build time.

Because that implies that packagers and porters need to make that
decision. If it's a GUC people can benchmark it and decide.

As against that, the packager would be more likely to get it right
(or even to know that there's an issue).


Can the package builder not set the default for the runtime tunable?

Honestly we're about to select a db platform for another FreeBSD based 
system we are building, I strongly hoping that we can get back to 
sysvshm easily otherwise we may have to select another store.


-Alfred (who still remembers back when Tom had a login on our primary db 
to help us. :) )




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 8:45 AM, Andrew Dunstan wrote:

On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund 
mailto:and...@2ndquadrant.com>> wrote:

On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund mailto:and...@2ndquadrant.com>> writes:
> > If there are indeed such large regressions on FreeBSD we need
to treat
> > them as postgres regressions. It's nicer not to add config
options for
> > things that don't need it, but apparently that's not the case
here.
>
> > Imo this means we need to add GUC to control wether anon
mmap() or sysv
> > shmem is to be used. In 9.3.
>
> I will resist this mightily.  One of the main reasons to switch
to mmap
> was so we would no longer have to explain about SysV shm
configuration.

It's still explained in the docs and one of the dynshm 
implementations

is based on sysv shmem. So I don't see this as a convincing reason.

Regressing installed OSs by 15-20% just to save a couple of lines of
docs and code seems rather unconvincing to me.

There's also the fact that even if it's changed in FreeBSD, that 
might be somethign that takes years to trickle out to whatever stable 
release people are actually using.

But do we really want a *guc* for it though? Isn't it enough (and in 
fact better) with a configure switch to pick the implementation when 
multiple are available, that could then be set by default for example 
by the freebsd ports build? That's a lot less "overhead" to keep 
dragging around...

That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.

I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for "a lot".  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense to 
have it a runtime tunable for the following reasons:

From an OS developer making this a runtime allows us to much more 
easily do the testing (instead of needing two compiled versions).
From a sysadmin perspective it makes switching to/from a LOT easier in 
case the new mmap code exposes a stability or performance bug.

-Alfred

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

Hi,

On 2014-04-21 11:45:49 -0400, Andrew Dunstan wrote:
> That seems to make more sense. I can't imagine why this would be a runtime
> parameter as opposed to build time.

Because that implies that packagers and porters need to make that
decision. If it's a GUC people can benchmark it and decide.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andrew Dunstan

On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund > wrote:

On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund mailto:and...@2ndquadrant.com>> writes:
> > If there are indeed such large regressions on FreeBSD we need
to treat
> > them as postgres regressions. It's nicer not to add config
options for
> > things that don't need it, but apparently that's not the case
here.
>
> > Imo this means we need to add GUC to control wether anon
mmap() or sysv
> > shmem is to be used. In 9.3.
>
> I will resist this mightily.  One of the main reasons to switch
to mmap
> was so we would no longer have to explain about SysV shm
configuration.

It's still explained in the docs and one of the dynshm implementations
is based on sysv shmem. So I don't see this as a convincing reason.

Regressing installed OSs by 15-20% just to save a couple of lines of
docs and code seems rather unconvincing to me.

There's also the fact that even if it's changed in FreeBSD, that might 
be somethign that takes years to trickle out to whatever stable 
release people are actually using.

But do we really want a *guc* for it though? Isn't it enough (and in 
fact better) with a configure switch to pick the implementation when 
multiple are available, that could then be set by default for example 
by the freebsd ports build? That's a lot less "overhead" to keep 
dragging around...

That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

Hi,

On 2014-04-21 17:39:39 +0200, Magnus Hagander wrote:
> But do we really want a *guc* for it though? Isn't it enough (and in fact
> better) with a configure switch to pick the implementation when multiple
> are available, that could then be set by default for example by the freebsd
> ports build? That's a lot less "overhead" to keep dragging around...

Well, we don't know at all it's just freebsd that's affected. In fact, I
would be surprised if there aren't other platforms that regressed due to
this.
I think a configure switch actually ends up being more code than the GUC...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Andres Freund

On 2014-04-21 11:30:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I unfortunately haven't followed this in detail, but shouldn't it be
> > relatively easily to make this even cheaper by checking for
> > HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
> > composite's columns, right?
> 
> That's the point I made further down:

Oh, sorry. I started reading this thread from the end just now.

> we could do that if we were willing
> to abandon the principle that nested fields shouldn't be compressed.
> It's not very clear what it'd cost us to give that up.

I don't think the cost of that would be all that high. As you argue,
without that trick the cost of iterating over all columns will be paid
all the time, whereas double compression will take effect much less
often. And might even end up being beneficial.
The risk of significant performance regressions due to backpatching
seems significantly less likely if we pay heed to HASEXTERNAL.

> (Too bad we didn't define a HEAP_HASCOMPRESSED flag bit ...)

And too bad that infomask bits are so scarce :(. We really need to
reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Magnus Hagander

On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund wrote:

> On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > If there are indeed such large regressions on FreeBSD we need to treat
> > > them as postgres regressions. It's nicer not to add config options for
> > > things that don't need it, but apparently that's not the case here.
> >
> > > Imo this means we need to add GUC to control wether anon mmap() or sysv
> > > shmem is to be used. In 9.3.
> >
> > I will resist this mightily.  One of the main reasons to switch to mmap
> > was so we would no longer have to explain about SysV shm configuration.
>
> It's still explained in the docs and one of the dynshm implementations
> is based on sysv shmem. So I don't see this as a convincing reason.
>
> Regressing installed OSs by 15-20% just to save a couple of lines of
> docs and code seems rather unconvincing to me.
>
>
There's also the fact that even if it's changed in FreeBSD, that might be
somethign that takes years to trickle out to whatever stable release people
are actually using.

But do we really want a *guc* for it though? Isn't it enough (and in fact
better) with a configure switch to pick the implementation when multiple
are available, that could then be set by default for example by the freebsd
ports build? That's a lot less "overhead" to keep dragging around...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Tom Lane

Andres Freund  writes:
> I unfortunately haven't followed this in detail, but shouldn't it be
> relatively easily to make this even cheaper by checking for
> HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
> composite's columns, right?

That's the point I made further down: we could do that if we were willing
to abandon the principle that nested fields shouldn't be compressed.
It's not very clear what it'd cost us to give that up.  (Too bad we didn't
define a HEAP_HASCOMPRESSED flag bit ...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Andres Freund

Hi,

On 2014-04-20 15:38:23 -0400, Tom Lane wrote:
> Some poking around with oprofile says that much of the added time is
> coming from added syscache and typcache lookups, as I suspected.
> 
> I did some further work on the patch to make it possible to cache
> the element tuple descriptor across calls for these two functions.
> With the attached patch, the first test case comes down to about 335 ms
> and the second to about 1475 ms (plpgsql is still leaving some extra
> cache lookups on the table).  More could be done with some further API
> changes, but I think this is about as much as is safe to back-patch.

I unfortunately haven't followed this in detail, but shouldn't it be
relatively easily to make this even cheaper by checking for
HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
composite's columns, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Tom Lane

Noah Misch  writes:
> I wonder how it would work out to instead delay this new detoast effort until
> toast_insert_or_update().

That would require toast_insert_or_update() to know about every container
datatype.  I doubt it could lead to an extensible or maintainable
solution.

I'm actually planning to set this patch on the shelf for a bit and go
investigate the other alternative, ie, not generating composite Datums
containing toast pointers in the first place.  We now know that the idea
that we aren't going to take performance hits *somewhere* is an illusion,
and I still suspect that the other way is going to lead to a smaller and
cleaner patch.  The main performance downside for plpgsql might be
addressable by making sure that plpgsql record variables fall on the "heap
tuple" rather than the "composite Datum" side of the line.  I'm also quite
concerned about correctness: I don't have a lot of confidence that this
patch has closed every loophole with respect to arrays, and it hasn't even
touched ranges or any of the related one-off bugs that I believe exist.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > If there are indeed such large regressions on FreeBSD we need to treat
> > them as postgres regressions. It's nicer not to add config options for
> > things that don't need it, but apparently that's not the case here.
> 
> > Imo this means we need to add GUC to control wether anon mmap() or sysv
> > shmem is to be used. In 9.3.
> 
> I will resist this mightily.  One of the main reasons to switch to mmap
> was so we would no longer have to explain about SysV shm configuration.

It's still explained in the docs and one of the dynshm implementations
is based on sysv shmem. So I don't see this as a convincing reason.

Regressing installed OSs by 15-20% just to save a couple of lines of
docs and code seems rather unconvincing to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Tom Lane

Andres Freund  writes:
> If there are indeed such large regressions on FreeBSD we need to treat
> them as postgres regressions. It's nicer not to add config options for
> things that don't need it, but apparently that's not the case here.

> Imo this means we need to add GUC to control wether anon mmap() or sysv
> shmem is to be used. In 9.3.

I will resist this mightily.  One of the main reasons to switch to mmap
was so we would no longer have to explain about SysV shm configuration.
Are we going to still have to explain that, but only for FreeBSD?
No thanks.  It will certainly not be the *first* resort.  Instead,
somebody needs to hold the FreeBSD folks' feet to the fire about when
we can expect to see a fix from their side.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14 4:10 AM, Andres Freund wrote:

Hi,

On 2014-04-20 11:24:38 +0200, Palle Girgensohn wrote:

I see performance degradation with PostgreSQL 9.3 vs 9.2 on FreeBSD, and I'm 
wondering who to poke to mitigate the problem. In reference to this thread [1], 
who where the FreeBSD people that Francois mentioned? If mmap needs to perform 
well in the kernel, I'd like to know of someone with FreeBSD kernel knowledge 
who is interested in working with mmap perfocmance. If mmap is indeed the 
cuplrit, I've just tested 9.2.8 vs 9.3.4, I nevere isolated the mmap patch, 
although I believe Francois did just that with similar results.

If there are indeed such large regressions on FreeBSD we need to treat
them as postgres regressions. It's nicer not to add config options for
things that don't need it, but apparently that's not the case here.

Imo this means we need to add GUC to control wether anon mmap() or sysv
shmem is to be used. In 9.3.

Greetings,

Andres Freund

Andres, thank you.  Speaking as a FreeBSD developer that would be a good 
idea.


-Alfred


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] New functions for sslinfo extension

2014-04-21 Thread Воронин Дмитрий

Hello, I make an a patch, which adds 4 functions to sslinfo extension module:1) ssl_get_count_of_extensions() --- get count of X509v3 extensions from client certificate;2) ssl_get_extension_names() --- get short names of X509v3 extensions from client certificate;3) ssl_get_extension_value(text) --- get value of extension from certificate (argument --- short name of extension);4) ssl_is_critical_extension(text) --- returns true, if extension is critical and false, if is not (argument --- short name of extension). You can view some information of certificate's extensions via those functions.I want, that my functions will be included in PostgreSQL release. What do you think about it? -- Best regards, Dmitry Voronin

Re: [HACKERS] Re: [DOCS] Docs incorrectly claiming equivalence between show and pg_settings

2014-04-21 Thread Stefan Seifert

On Saturday 19 April 2014 13:38:16 Tom Lane wrote:

> > 3  document this restriction
> 
> As for (3), I might be wrong, but I don't think the documentation mentions
> the possibility of abusing SET this way at all.  Restrictions in
> undocumented quasi-features are likewise undocumented.

http://www.postgresql.org/docs/devel/static/runtime-config-custom.html
states: "Because custom options may need to be set in processes that have not 
loaded the relevant extension module, PostgreSQL will accept a setting for any 
two-part parameter name."
So I'd say the possibility of using SET this way is somewha documented. That's 
at least how I actually got the idea of doing that.

Maybe it can help if I describe our use case: we have some plperlu stored 
procedures and triggers that use them that replicate specific data from our 
internal management application's database to the system on our webserver. 
When running the application's test suite though we obviously don't want that 
and instead want it to replicate to another local database, so we can test the 
triggers as well. We use a custom option set by our test suite to communicate 
the required change in replication target to the stored procedure. As a nice 
side effect we can use the same mechanism to prevent a test sandbox version of 
the application from trying to replicate to the webserver.

So far this works quite well except for the SHOW command throwing an exception 
when we request an unknown configuration parameter complicating our stored 
procedure a little. It would be more comfortable if we could query pg_settings 
and just get an empty result if the configuration parameter is not set.

Regards,
Stefan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New functions for sslinfo extension

2014-04-21 Thread Dmitry Voronin

Hello, I make an a patch, which adds 4 functions to sslinfo extension module.  You can view some information of certificate's extensions with those functions. The descriptions of functions I posted in my first letter and in source code. What do you think about it? -- Best regards, Dmitry Voronin --- contrib/sslinfo/sslinfo.c	2014-03-17 23:35:47.0 +0400
+++ contrib/sslinfo/sslinfo.c	2014-04-18 11:09:49.567775647 +0400
@@ -5,6 +5,8 @@
  * This file is distributed under BSD-style license.
  *
  * contrib/sslinfo/sslinfo.c
+ * 
+ * Extension functions written by Dmitry Voronin carriingfat...@yandex.ru, CNIIEISU.
  */
 
 #include "postgres.h"
@@ -14,9 +16,11 @@
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "mb/pg_wchar.h"
+#include "funcapi.h"
 
 #include 
 #include 
+#include 
 
 
 PG_MODULE_MAGIC;
@@ -35,6 +39,11 @@
 Datum		X509_NAME_to_text(X509_NAME *name);
 Datum		ASN1_STRING_to_text(ASN1_STRING *str);
 
+X509_EXTENSION	*get_extension(X509* certificate, char *name);
+Datum 		ssl_get_extension_value(PG_FUNCTION_ARGS);
+Datum		ssl_is_critical_extension(PG_FUNCTION_ARGS);
+Datum 		ssl_get_count_of_extensions(PG_FUNCTION_ARGS);
+Datum		ssl_get_extension_names(PG_FUNCTION_ARGS);
 
 /*
  * Indicates whether current session uses SSL
@@ -371,3 +380,146 @@
 		PG_RETURN_NULL();
 	return X509_NAME_to_text(X509_get_issuer_name(MyProcPort->peer));
 }
+
+
+X509_EXTENSION *get_extension(X509* certificate, char *name) {
+	int 			extension_nid = 0;
+	int 			locate = 0;
+	
+	extension_nid = OBJ_sn2nid(name);
+	if (extension_nid == NID_undef) {
+		extension_nid = OBJ_ln2nid(name);
+		if (extension_nid == NID_undef) 
+			return NULL;
+	}
+	locate = X509_get_ext_by_NID(certificate, extension_nid,  -1);
+	return X509_get_ext(certificate, locate);
+}
+
+/* Returns value of extension. 
+ * 
+ * This function returns value of extension by short name in client certificate. 
+ * 
+ * Returns text datum. 
+ */
+
+PG_FUNCTION_INFO_V1(ssl_get_extension_value);
+Datum
+ssl_get_extension_value(PG_FUNCTION_ARGS) {	
+	X509 			*certificate = MyProcPort -> peer;
+	X509_EXTENSION 		*extension = NULL;
+	char 			*extension_name = text_to_cstring(PG_GETARG_TEXT_P(0));
+	BIO 			*bio = NULL;
+	char 			*value = NULL;
+	char 			nullterm = '\0';
+	text 			*result = NULL;
+
+	if (certificate == NULL)
+		PG_RETURN_NULL();
+
+	extension = get_extension(certificate, extension_name);
+	if (extension == NULL)
+		elog(ERROR, "Extension by name \"%s\" is not found in certificate", extension_name);
+
+	bio = BIO_new(BIO_s_mem());
+	X509V3_EXT_print(bio, extension, -1, -1);
+	BIO_write(bio, &nullterm, 1);
+	BIO_get_mem_data(bio, &value);
+
+	result = cstring_to_text(value);
+	BIO_free(bio);
+
+	PG_RETURN_TEXT_P(result);
+}
+
+/* Returns status of extension 
+ * 
+ * Returns true, if extension is critical and false, if it is not.
+ * 
+ * Returns bool datum
+ */
+PG_FUNCTION_INFO_V1(ssl_is_critical_extension);
+Datum
+ssl_is_critical_extension(PG_FUNCTION_ARGS) {
+	X509 			*certificate = MyProcPort -> peer;
+	X509_EXTENSION 		*extension = NULL;
+	char 			*extension_name = text_to_cstring(PG_GETARG_TEXT_P(0));
+	int 			critical = 0;
+	
+	if (certificate == NULL)
+		PG_RETURN_NULL();
+	
+	extension = get_extension(certificate, extension_name);
+	if (extension == NULL) 
+		elog(ERROR, "Extension name \"%s\" is not found in certificate", extension_name);
+	
+	critical = X509_EXTENSION_get_critical(extension);
+	PG_RETURN_BOOL(critical);
+}
+
+/* Returns count of extensions in client certificate
+ * 
+ * Returns int datum
+ */
+PG_FUNCTION_INFO_V1(ssl_get_count_of_extensions);
+Datum
+ssl_get_count_of_extensions(PG_FUNCTION_ARGS) {
+	X509 			*certificate = MyProcPort -> peer;
+	
+	if (certificate == NULL)
+		PG_RETURN_NULL();
+	
+	PG_RETURN_INT32(X509_get_ext_count(certificate));
+}
+
+/* Returns short names of extensions in client certificate
+ * 
+ * Returns setof text datum
+ */
+PG_FUNCTION_INFO_V1(ssl_get_extension_names);
+Datum
+ssl_get_extension_names(PG_FUNCTION_ARGS) {
+	X509*certificate = MyProcPort -> peer;
+	FuncCallContext 		*funcctx;
+	STACK_OF(X509_EXTENSION) 	*extension_stack = NULL;
+	MemoryContext 			oldcontext;
+	int call = 0;
+	int max_calls = 0;
+	X509_EXTENSION			*extension = NULL;
+	ASN1_OBJECT			*object = NULL;
+	int extension_nid = 0;
+	text*result = NULL;
+	
+	if (certificate == NULL)
+		PG_RETURN_NULL();
+	
+	extension_stack = certificate -> cert_info -> extensions;
+	if (extension_stack == NULL) 
+		PG_RETURN_NULL();
+	
+	if (SRF_IS_FIRSTCALL()) {
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx -> multi_call_memory_ctx);
+		funcctx -> max_calls = X509_get_ext_count(certificate);
+		MemoryContextSwitchTo(oldcontext);
+	}
+	funcctx = SRF_PERCALL_SETUP();
+	
+	call = funcctx -> call_cntr;
+	max_calls = funcctx -> max_calls;
+	
+	if (call < max_calls) {
+		extension = sk_X509_EXTENSION_value(extension_stack, call);
+		object = X509_EXTENSION_get_object(extension);

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Andres Freund

Hi,

On 2014-04-20 11:24:38 +0200, Palle Girgensohn wrote:
> I see performance degradation with PostgreSQL 9.3 vs 9.2 on FreeBSD, and I'm 
> wondering who to poke to mitigate the problem. In reference to this thread 
> [1], who where the FreeBSD people that Francois mentioned? If mmap needs to 
> perform well in the kernel, I'd like to know of someone with FreeBSD kernel 
> knowledge who is interested in working with mmap perfocmance. If mmap is 
> indeed the cuplrit, I've just tested 9.2.8 vs 9.3.4, I nevere isolated the 
> mmap patch, although I believe Francois did just that with similar results.

If there are indeed such large regressions on FreeBSD we need to treat
them as postgres regressions. It's nicer not to add config options for
things that don't need it, but apparently that's not the case here.

Imo this means we need to add GUC to control wether anon mmap() or sysv
shmem is to be used. In 9.3.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Palle Girgensohn



> 21 apr 2014 kl. 11:26 skrev Francois Tigeot :
> 
>> On Sun, Apr 20, 2014 at 04:07:25PM +0200, Palle Girgensohn wrote:
>> 
 If mmap needs to perform well in the kernel, I'd like to know of someone 
 with FreeBSD kernel knowledge who is interested in working with mmap 
 perfocmance. If mmap is indeed the cuplrit, I've just tested 9.2.8 vs 
 9.3.4, I nevere isolated the mmap patch, although I believe Francois did 
 just that with similar results.
>>> 
>>> I did test the 9.3 -devel branch before and after the SysV shared memory =>
>>> mmap commit. The performance degradation was visible.
>>> 
>>> I recently ran a few benchmarks of PostgreSQL 9.3 with different operating 
>>> systems
>>> including DragonFly 3.6 and FreeBSD 10. You may be interested in the 
>>> results:
>>> 
>>> http://lists.dragonflybsd.org/pipermail/users/2014-March/128216.html
>> 
>> Interesting, indeed. The fixes to dragonfly where made quite recently, in 
>> 3.2, right?
> 
> The most important fixes occured in the 3.1 development version, around
> September 2012.
> 
> There was definitely more than an isolated patch; the new scheduler was only
> part of the performance improvements.
> I'm afraid none of the commits would be applicable as-is to FreeBSD 10.x; the
> DragonFly kernel is vastly different in locking, threading and VM management.
> 
> The FreeBSD folks should know what to do though; I collected performance
> counter data during the last benchmark run and sent it to adrian@.
> It was also discussed on freebsd-performance; the thread begins here:
> http://lists.freebsd.org/pipermail/freebsd-performance/2014-March/004770.html
> 

Great, thanks for the pointers!

Palle

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Francois Tigeot

On Sun, Apr 20, 2014 at 04:07:25PM +0200, Palle Girgensohn wrote:
> 
> >> If mmap needs to perform well in the kernel, I'd like to know of someone 
> >> with FreeBSD kernel knowledge who is interested in working with mmap 
> >> perfocmance. If mmap is indeed the cuplrit, I've just tested 9.2.8 vs 
> >> 9.3.4, I nevere isolated the mmap patch, although I believe Francois did 
> >> just that with similar results.
> > 
> > I did test the 9.3 -devel branch before and after the SysV shared memory =>
> > mmap commit. The performance degradation was visible.
> > 
> > I recently ran a few benchmarks of PostgreSQL 9.3 with different operating 
> > systems
> > including DragonFly 3.6 and FreeBSD 10. You may be interested in the 
> > results:
> > 
> > http://lists.dragonflybsd.org/pipermail/users/2014-March/128216.html
> > 
> 
> Interesting, indeed. The fixes to dragonfly where made quite recently, in 
> 3.2, right?

The most important fixes occured in the 3.1 development version, around
September 2012.

There was definitely more than an isolated patch; the new scheduler was only
part of the performance improvements.
I'm afraid none of the commits would be applicable as-is to FreeBSD 10.x; the
DragonFly kernel is vastly different in locking, threading and VM management.

The FreeBSD folks should know what to do though; I collected performance
counter data during the last benchmark run and sent it to adrian@.
It was also discussed on freebsd-performance; the thread begins here:
http://lists.freebsd.org/pipermail/freebsd-performance/2014-March/004770.html

-- 
Francois Tigeot

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 102 matches

Mail list logo