Re: [HACKERS] Fast promotion failure

2013-05-21 Thread Heikki Linnakangas

On 21.05.2013 00:00, Simon Riggs wrote:

When we set the new timeline we should be updating all values that
might be used elsewhere. If we do that, then no matter when or how we
run GetXLogReplayRecPtr, it can't ever get it wrong in any backend.

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5838,8 +5838,16 @@ StartupXLOG(void)
}

/* Save the selected TimeLineID in shared memory, too */
-   XLogCtl->ThisTimeLineID = ThisTimeLineID;
-   XLogCtl->PrevTimeLineID = PrevTimeLineID;
+   {
+   /* use volatile pointer to prevent code rearrangement */
+   volatile XLogCtlData *xlogctl = XLogCtl;
+
+   SpinLockAcquire(&xlogctl->info_lck);
+   XLogCtl->ThisTimeLineID = ThisTimeLineID;
+   XLogCtl->lastReplayedTLI = ThisTimeLineID;
+   XLogCtl->PrevTimeLineID = PrevTimeLineID;
+   SpinLockRelease(&xlogctl->info_lck);
+   }


Hmm. lastReplayedTLI is supposed to be the timeline of the last record 
that was replayed, i.e. the timeline corresponding to lastReplayedEndRecPtr. 
I think it would work, but it feels like it could be an easy source of 
bugs in the future.
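
For context, here is a rough sketch (my illustration, simplified from what
GetXLogReplayRecPtr() already does in xlog.c) of how a backend would read
these fields consistently under the same spinlock; the field names follow
the patch above:

/* Illustrative sketch only: read the last-replayed position and its
 * timeline together under info_lck, GetXLogReplayRecPtr()-style. */
XLogRecPtr
GetXLogReplayRecPtr(TimeLineID *replayTLI)
{
	/* use volatile pointer to prevent code rearrangement */
	volatile XLogCtlData *xlogctl = XLogCtl;
	XLogRecPtr	recptr;
	TimeLineID	tli;

	SpinLockAcquire(&xlogctl->info_lck);
	recptr = xlogctl->lastReplayedEndRecPtr;
	tli = xlogctl->lastReplayedTLI;
	SpinLockRelease(&xlogctl->info_lck);

	if (replayTLI)
		*replayTLI = tli;
	return recptr;
}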


It might be a good idea to change walsender to not store that in 
ThisTimeLineID, though. It used to make sense for ThisTimeLineID to 
track the current recovery timeline in 9.2 and below, but now that 
walsender can be sending any older timeline, using ThisTimeLineID to 
store the latest one seems confusing.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move unused buffers to freelist

2013-05-21 Thread Amit Kapila
On Monday, May 20, 2013 6:54 PM Robert Haas wrote:
 On Thu, May 16, 2013 at 10:18 AM, Amit Kapila amit.kap...@huawei.com
 wrote:
  Further Performance Data:
 
  Below data is for average 3 runs of 20 minutes
 
  Scale Factor   - 1200
  Shared Buffers - 7G
 
 These results are good but I don't get similar results in my own
 testing.  

Thanks for running detailed tests

 I ran pgbench tests at a variety of client counts and scale
 factors, using 30-minute test runs and the following non-default
 configuration parameters.
 
 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 log_line_prefix = '%t [%p] '
 
 Here are the results.  The first field in each line is the number of
 clients. The second number is the scale factor.  The numbers after
 master and patched are the median of three runs.
 
 01 100 master 1433.297699 patched 1420.306088
 01 300 master 1371.286876 patched 1368.910732
 01 1000 master 1056.891901 patched 1067.341658
 01 3000 master 637.312651 patched 685.205011
 08 100 master 10575.017704 patched 11456.043638
 08 300 master 9262.601107 patched 9120.925071
 08 1000 master 1721.807658 patched 1800.733257
 08 3000 master 819.694049 patched 854.333830
 32 100 master 26981.677368 patched 27024.507600
 32 300 master 14554.870871 patched 14778.285400
 32 1000 master 1941.733251 patched 1990.248137
 32 3000 master 846.654654 patched 892.554222


Is the above test for tpc-b?
In the above tests, there is a performance increase of 1~8% and a decrease
of 0.2~1.5%.

 And here's the same results for 5-minute, read-only tests:
 
 01 100 master 9361.073952 patched 9049.553997
 01 300 master 8640.235680 patched 8646.590739
 01 1000 master 8339.364026 patched 8342.799468
 01 3000 master 7968.428287 patched 7882.121547
 08 100 master 71311.491773 patched 71812.899492
 08 300 master 69238.839225 patched 70063.632081
 08 1000 master 34794.778567 patched 65998.468775
 08 3000 master 60834.509571 patched 61165.998080
 32 100 master 203168.264456 patched 205258.283852
 32 300 master 199137.276025 patched 200391.633074
 32 1000 master 177996.853496 patched 176365.732087
 32 3000 master 149891.147442 patched 148683.269107
 
 Something appears to have screwed up my results for 8 clients @ scale
 factor 300 on master, 

  Do you mean the reading at scale factor 1000?
  
but overall, on both the read-only and
 read-write tests, I'm not seeing anything that resembles the big gains
 you reported.

I have not yet generated numbers for read-write tests; I will check that.
For read-only tests, the performance increase is minor and different from
what I saw.
A few points I can think of that might explain the difference in data:

1. In my tests I always observed the best numbers when the number of
clients/threads equals the number of cores, which in your case would be 16.
2. I think for scale factors 100 and 300 there should not be much
performance increase, as those workloads should mostly get buffers from the
freelist regardless of whether bgwriter adds to it or not.
3. In my tests the variable is shared buffers; the database size is always
less than RAM (scale factor 1200, approx db size 16~17GB, RAM 24GB), but
varying shared buffers can still lead to I/O.
4. Each run is 20 minutes; not sure whether that makes any difference.
 
 Tests were run on a 16-core, 64-hwthread PPC64 machine provided to the
 PostgreSQL community courtesy of IBM.  Fedora 16, Linux kernel 3.2.6.

To understand the difference between your runs and mine, could you please
tell me the following:
1. How much RAM does the machine have?
2. Is the number of threads equal to the number of clients?
3. Before starting tests I always pre-warm the buffers (using the
pg_prewarm you wrote last year); was the same done for the read-only tests
above?
4. Could you please re-run just the test where you saw the variation (8
clients @ scale factor 1000 on master)? I have also seen that the
performance difference is very good for certain
   configurations (scale factor, RAM, shared buffers).

Apart from the above, I made one more observation while investigating why
there is a small dip in some cases:
1. Many times the buffer found on the freelist is not usable, meaning its
refcount or usage count is not zero, so extra time is spent under
BufFreelistLock (see the sketch below).
   I have not done any further experiments on this finding, e.g. whether it
really adds measurable overhead.
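
To illustrate the case I mean, here is a simplified sketch of the freelist
path in StrategyGetBuffer() (my paraphrase of buffer/freelist.c, not the
exact code): a buffer popped off the freelist still has to pass the
refcount/usage_count check, and all of that happens while BufFreelistLock
is held.

/* Simplified sketch of the freelist path in StrategyGetBuffer(). */
LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);

while (StrategyControl->firstFreeBuffer >= 0)
{
	buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];

	/* Unlink the buffer from the freelist */
	StrategyControl->firstFreeBuffer = buf->freeNext;
	buf->freeNext = FREENEXT_NOT_IN_LIST;

	/*
	 * A buffer that is pinned or has a nonzero usage count cannot be
	 * used; we keep looping.  This is the "unusable freelist buffer"
	 * case, and the loop runs with BufFreelistLock held.
	 */
	LockBufHdr(buf);
	if (buf->refcount == 0 && buf->usage_count == 0)
		return buf;		/* caller releases BufFreelistLock and header lock */
	UnlockBufHdr(buf);
}

/* freelist empty or only unusable buffers: fall back to the clock sweep */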

Currently I am trying to find the reasons for the small dip in performance
and to see whether I can do something to avoid it. I will also run tests
with various configurations.

Any other suggestions?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion failure

2013-05-21 Thread Simon Riggs
On 21 May 2013 07:46, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 21.05.2013 00:00, Simon Riggs wrote:

 When we set the new timeline we should be updating all values that
 might be used elsewhere. If we do that, then no matter when or how we
 run GetXLogReplayRecPtr, it can't ever get it wrong in any backend.

 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -5838,8 +5838,16 @@ StartupXLOG(void)
 }

 /* Save the selected TimeLineID in shared memory, too */
 -   XLogCtl->ThisTimeLineID = ThisTimeLineID;
 -   XLogCtl->PrevTimeLineID = PrevTimeLineID;
 +   {
 +   /* use volatile pointer to prevent code rearrangement */
 +   volatile XLogCtlData *xlogctl = XLogCtl;
 +
 +   SpinLockAcquire(&xlogctl->info_lck);
 +   XLogCtl->ThisTimeLineID = ThisTimeLineID;
 +   XLogCtl->lastReplayedTLI = ThisTimeLineID;
 +   XLogCtl->PrevTimeLineID = PrevTimeLineID;
 +   SpinLockRelease(&xlogctl->info_lck);
 +   }


 Hmm. lastReplayedTLI is supposed to be the timeline of the last record that
 was replayed, i.e. the timeline corresponding to lastReplayedEndRecPtr. I think
 it would work, but it feels like it could be an easy source of bugs in the
 future.

I'm OK with that principle, as long as we don't touch ThisTimeLineID,
which has been the source of multiple bugs.

So we should set the TLI explicitly before installing it, as in the
attached patch.

Otherwise we'd need multiple permanent TLIs, which would be overkill.

I feel there are problems because we set the newly selected TLI from the
startup process into shared memory, and then some time later we set
SharedRecoveryInProgress = false. That timing window isn't good, but I
don't see a different way.

 It might be a good idea to change walsender to not store that in
 ThisTimeLineID, though. It used to make sense for ThisTimeLineID to track
 the current recovery timeline in 9.2 and below, but now that walsender can
 be sending any older timeline, using ThisTimeLineID to store the latest one
 seems confusing.

Agreed, but looks like too much code to touch that lightly.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


install_xlog_right.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move unused buffers to freelist

2013-05-21 Thread Amit Kapila
On Tuesday, May 21, 2013 12:36 PM Amit Kapila wrote:
 On Monday, May 20, 2013 6:54 PM Robert Haas wrote:
  On Thu, May 16, 2013 at 10:18 AM, Amit Kapila
 amit.kap...@huawei.com
  wrote:
   Further Performance Data:
  
   Below data is for average 3 runs of 20 minutes
  
   Scale Factor   - 1200
   Shared Buffers - 7G
 
  These results are good but I don't get similar results in my own
  testing.
 
 Thanks for running detailed tests
 
  I ran pgbench tests at a variety of client counts and scale
  factors, using 30-minute test runs and the following non-default
  configuration parameters.
 
  shared_buffers = 8GB
  maintenance_work_mem = 1GB
  synchronous_commit = off
  checkpoint_segments = 300
  checkpoint_timeout = 15min
  checkpoint_completion_target = 0.9
  log_line_prefix = '%t [%p] '
 
  Here are the results.  The first field in each line is the number of
  clients. The second number is the scale factor.  The numbers after
  master and patched are the median of three runs.
 
  01 100 master 1433.297699 patched 1420.306088
  01 300 master 1371.286876 patched 1368.910732
  01 1000 master 1056.891901 patched 1067.341658
  01 3000 master 637.312651 patched 685.205011
  08 100 master 10575.017704 patched 11456.043638
  08 300 master 9262.601107 patched 9120.925071
  08 1000 master 1721.807658 patched 1800.733257
  08 3000 master 819.694049 patched 854.333830
  32 100 master 26981.677368 patched 27024.507600
  32 300 master 14554.870871 patched 14778.285400
  32 1000 master 1941.733251 patched 1990.248137
  32 3000 master 846.654654 patched 892.554222
 
 
 Is the above test for tpc-b?
 In the above tests, there is performance increase from 1~8% and
 decrease
 from 0.2~1.5%
 
  And here's the same results for 5-minute, read-only tests:
 
  01 100 master 9361.073952 patched 9049.553997
  01 300 master 8640.235680 patched 8646.590739
  01 1000 master 8339.364026 patched 8342.799468
  01 3000 master 7968.428287 patched 7882.121547
  08 100 master 71311.491773 patched 71812.899492
  08 300 master 69238.839225 patched 70063.632081
  08 1000 master 34794.778567 patched 65998.468775
  08 3000 master 60834.509571 patched 61165.998080
  32 100 master 203168.264456 patched 205258.283852
  32 300 master 199137.276025 patched 200391.633074
  32 1000 master 177996.853496 patched 176365.732087
  32 3000 master 149891.147442 patched 148683.269107
 
  Something appears to have screwed up my results for 8 clients @ scale
  factor 300 on master,
 
   Do you want to say the reading of 1000 scale factor?
 
 but overall, on both the read-only and
  read-write tests, I'm not seeing anything that resembles the big
 gains
  you reported.
 
 I have not generated numbers for read-write tests, I will check that
 once.
 For read-only tests, the performance increase is minor and different
 from
 what I saw.
 Few points which I could think of for difference in data:
 
 1. In my test's I always observed best data when number of
 clients/threads
 are equal to number of cores which in your case should be at 16.
 2. I think for scale factor 100 and 300, there should not be much
 performance increase, as for them they should mostly get buffer from
 freelist inspite of even bgwriter adds to freelist or not.
 3. In my tests variance is for shared buffers, database size is always
 less
 than RAM (Scale Factor -1200, approx db size 16~17GB, RAM -24 GB), but
 due
 to variance in shared buffers, it can lead to I/O.
 4. Each run is of 20 minutes, not sure if this has any difference.
 
  Tests were run on a 16-core, 64-hwthread PPC64 machine provided to
 the
  PostgreSQL community courtesy of IBM.  Fedora 16, Linux kernel 3.2.6.
 
 To think about the difference in your and my runs, could you please
 tell me
 about below points
 1. What is RAM in machine.
 2. Are number of threads equal to number of clients.
 3. Before starting tests I have always done pre-warming of buffers
 (used
 pg_prewarm written by you last year), is it same for above read-only
 tests.
 4. Can you please once again run only the test where you saw
 variation(8
 clients @ scale factor 1000 on master), because I have also seen that
 performance difference is very good for certain
configurations(Scale Factor, RAM, Shared Buffers)

Looking more closely at the data you posted, I believe there is some
problem with that data point (8 clients @ scale factor 1000 on master): in
all other cases the data for scale factor 1000 is better than for 3000,
except this one.
So I think there is no need to run it again.

 Apart from above, I had one more observation during my investigation to
 find
 why in some cases, there is a small dip:
 1. Many times, it finds the buffer in free list is not usable, means
 it's
 refcount or usage count is not zero, due to which it had to spend more
 time
 under BufFreelistLock.
I had not any further experiments related to this finding like if it
 really adds any overhead.
 
 Currently I am trying to find reasons for small dip of performance and
 see
 if I could do 

Re: [HACKERS] pgbench vs. SERIALIZABLE

2013-05-21 Thread Josh Berkus

 Presumably we would want to repeat all of the ordinary commands in the
 file, but not any of the backslash set commands that precede any ordinary
 commands.  But what if backslash set commands are sprinkled between
 ordinary commands?

See, this is why I had no intention of retrying.  Since people can run
custom workloads, there's no good way for us to handle the different
permutations of what kinds of scripts people might write.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fast promotion and log_checkpoints

2013-05-21 Thread Fujii Masao
On Tue, May 21, 2013 at 4:44 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 20 May 2013 20:06, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 It would be possible to redesign this with a special new reason, or we
 could just use time as the reason, or we could just leave it.

 Do nothing is easy, though so are the others, so we can choose
 anything we want. What do we want it to say?


 I'm not sure. Perhaps we should print (no flags), so that it wouldn't look
 like there's something missing in the log message.

 The reason text would still be absent, so it wouldn't really help the
 user interpret the log message correctly.

 I suggest we use RequestCheckpoint(CHECKPOINT_CAUSE_TIME) instead,
 since it is literally time for a checkpoint.

Or, what about using CHECKPOINT_FORCE and just printing "force"?
Currently that checkpoint always starts because of the existence of the
end-of-recovery record, but I think we should ensure that the checkpoint
always starts by using that flag.
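
For illustration, a minimal sketch of what that suggestion amounts to (my
own sketch, not a patch from this thread): the startup process would request
the post-promotion checkpoint with the CHECKPOINT_FORCE flag, so that
log_checkpoints reports "force" as the reason.

/* Hypothetical sketch only: request the checkpoint that follows fast
 * promotion with CHECKPOINT_FORCE (flag as defined in access/xlog.h),
 * so the log_checkpoints start message includes "force". */
RequestCheckpoint(CHECKPOINT_FORCE);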

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fast promotion and log_checkpoints

2013-05-21 Thread Simon Riggs
On 21 May 2013 15:29, Fujii Masao masao.fu...@gmail.com wrote:

 Or, what about using CHECKPOINT_FORCE and just printing "force"?
 Currently that checkpoint always starts because of the existence of the
 end-of-recovery record, but I think we should ensure that the checkpoint
 always starts by using that flag.

This would mean we can't use the secondary checkpoint record, but we
already gave that up, so it should be OK.

Three people, three suggestions; I will agree to this one so that we can
get on with it.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-21 Thread Benedikt Grundmann
We are seeing these errors on a regular basis on the testing box now.  We
have even changed the backup script to shut down the hot standby, take an
LVM snapshot, restart the hot standby, and rsync the LVM snapshot.  It
still happens.

We have never seen this before we introduced the hot standby, so we will
now revert to taking the backups from LVM snapshots on the production
database.  If you have ideas of what else we should try, or what information
we can give you to debug this, let us know and we will try to do so.

Until then we will sadly operate on the assumption that the combination of
hot standby and frozen snapshot backup of it is not production ready.

Thanks,

Bene




On Thu, May 16, 2013 at 8:10 AM, David Powers dpow...@janestreet.com wrote:

 I'll try to get the primary upgraded over the weekend when we can afford a
 restart.

 In the meantime I have a single test showing that a shutdown, snapshot,
 restart produces a backup that passes the vacuum analyze test.  I'm going
 to run a full vacuum today.

 -David


 On Wed, May 15, 2013 at 3:53 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 15.05.2013 22:50, Benedikt Grundmann wrote:

 On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
 servers? There was a fix to a bug related to starting a standby server from
 a filesystem snapshot. I don't think it was quite the case you have, but
 pretty close.


 So this is delightfully embarrassing. I just went back to double-check, and:

 - primary box is 9.2.3
 - standby is 9.2.4
 - testing is 9.2.4

 I guess that alone could possibly explain it?


 Hmm, no, it should still work. There haven't been any changes in the WAL
 format. I do recommend upgrading the primary, of course, but I don't really
 see how that would explain what you're seeing.

 - Heikki





[HACKERS] pg_export_snapshot on standby side

2013-05-21 Thread Fujii Masao
Hi,

We cannot run parallel pg_dump against a standby server because
pg_export_snapshot() always fails on the standby. Is this an oversight in
parallel pg_dump or in pg_export_snapshot()?

pg_export_snapshot() fails on the standby because it always assigns a new
XID, which is not allowed during recovery. Do we really need to assign a
new XID for an exportable snapshot even on the standby?
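
For context, a sketch of my reading of why it fails (simplified from
ExportSnapshot() in snapmgr.c; the exact code and wording may differ):

/* Sketch only: ExportSnapshot() insists on a real topmost XID so that
 * importers can verify the exporting transaction is still running, and
 * XID assignment is exactly what recovery forbids. */
topXid = GetTopTransactionId();		/* on a standby this ERRORs with
									 * "cannot assign TransactionIds
									 * during recovery" */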

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion failure

2013-05-21 Thread Simon Riggs
On 21 May 2013 09:26, Simon Riggs si...@2ndquadrant.com wrote:

 I'm OK with that principle...

Well, after fighting some more with that, I've gone with the, er,
principle of slightly less ugliness.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SET work_mem = '1TB';

2013-05-21 Thread Simon Riggs
I worked up a small patch to support a terabyte unit for memory settings.
It works, but only for 1TB, not for 2TB or above.

That highlights the fact that, since we measure things in kB, we have an
inherent limit of 2047GB for our memory settings. It isn't beyond belief
that we'll want to go that high; even if not by end 2014, it will become
annoying sometime before 2020.
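
For reference, a quick back-of-the-envelope check of where the 2047GB
ceiling comes from (my own illustration, not part of the patch): memory
GUCs are stored as a 32-bit int counted in kB, so the maximum is INT_MAX
kB, which is just under 2TB.

/* Standalone check of the kB-based GUC ceiling (illustration only). */
#include <limits.h>
#include <stdio.h>

int
main(void)
{
	long long	max_kb = INT_MAX;	/* 2147483647 kB */

	printf("max = %lld kB = %lld GB\n", max_kb, max_kb / (1024 * 1024));
	return 0;	/* prints: max = 2147483647 kB = 2047 GB */
}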

The solution seems to be to support something potentially bigger than int
for GUCs, so that we can reclassify GUC_UNIT_MEMORY according to the
platform we're on.

Opinions?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


terabyte_work_mem.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SET work_mem = '1TB';

2013-05-21 Thread Gavin Flower

On 22/05/13 09:13, Simon Riggs wrote:

I worked up a small patch to support Terabyte setting for memory.
Which is OK, but it only works for 1TB, not for 2TB or above.

Which highlights that since we measure things in kB, we have an
inherent limit of 2047GB for our memory settings. It isn't beyond
belief we'll want to go that high, or at least won't be by end 2014
and will be annoying sometime before 2020.

Solution seems to be to support something potentially bigger than INT
for GUCs. So we can reclassify GUC_UNIT_MEMORY according to the
platform we're on.

Opinions?

--
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

I suspect it should be fixed before it starts being a problem, for two
reasons:


1. best to panic early while we have time
   (or more prosaically: doing it soon gives us more time to get it
   right without undue pressure)

2. not being able to cope with 2TB and above might put off companies with
   seriously massive databases from moving to Postgres

It would probably be a good idea to check what other values should be
increased as well.


Cheers,
Gavin


Re: [HACKERS] plperl segfault in plperl_trusted_init() on kfreebsd

2013-05-21 Thread Christoph Berg
Re: Andrew Dunstan 2013-05-17 51964770.6070...@dunslane.net
 I have reproduced this. It happens with both the distro perl and a
 home-built perl 5.14. AFAICT this is a Perl bug. Any reference at
 all to ERRSV at the point this occurs causes a core dump, even just
 assigning it to a local SV * variable. Maybe you should take this up
 with the Perl people. It certainly seems mysterious that this should
 only occur on this platform.

I've just tried to look into this but got lost in chasing about 5
nested layers of ERRSV #defines. :-/

The crash also happens with libperl5.18 (5.18.0-1) on unstable/kfreebsd-amd64.

./configure --with-perl; make; cd src/pl/perl; make clean

postgresql-9.3/src/pl/plperl $ make PROFILE="-g -O0"
'/usr/bin/perl' ./text2macro.pl --strip='^(\#.*|\s*)$' plc_perlboot.pl 
plc_trusted.pl > perlchunks.h
'/usr/bin/perl' plperl_opmask.pl plperl_opmask.h
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g 
-O0 -fpic -I. -I. -I../../../src/include -D_GNU_SOURCE  
-I/usr/lib/perl/5.18/CORE  -c -o plperl.o plperl.c
'/usr/bin/perl' /usr/share/perl/5.18/ExtUtils/xsubpp -typemap 
/usr/share/perl/5.18/ExtUtils/typemap SPI.xs SPI.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g 
-O0 -fpic -I. -I. -I../../../src/include -D_GNU_SOURCE  
-I/usr/lib/perl/5.18/CORE  -c -o SPI.o SPI.c
'/usr/bin/perl' /usr/share/perl/5.18/ExtUtils/xsubpp -typemap 
/usr/share/perl/5.18/ExtUtils/typemap Util.xs Util.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g 
-O0 -fpic -I. -I. -I../../../src/include -D_GNU_SOURCE  
-I/usr/lib/perl/5.18/CORE  -c -o Util.o Util.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g 
-O0 -fpic -shared -o plperl.so plperl.o SPI.o Util.o -L../../../src/port 
-L../../../src/common -Wl,--as-needed 
-Wl,-rpath,'/usr/lib/perl/5.18/CORE',--enable-new-dtags -g -O0  
-fstack-protector -L/usr/local/lib  -L/usr/lib/perl/5.18/CORE -lperl -ldl -lm 
-lpthread -lc -lcrypt 

postgresql-9.3/src/pl/plperl $ make check
[...]
../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check 
--top-builddir=../../..   --dbname=pl_regression --load-extension=plperl  
--load-extension=plperlu plperl plperl_lc plperl_trigger plperl_shared 
plperl_elog plperl_util plperl_init plperlu plperl_array plperl_plperlu
[...]
== running regression test queries==
test plperl   ... FAILED (test process exited with exit code 2)
[...]

#0  0x00080a4274ed in plperl_trusted_init () at plperl.c:937
937 if (SvTRUE(ERRSV))
#1  0x00080a42602d in select_perl_context (trusted=1 '\001') at plperl.c:600
#2  0x00080a430cb1 in compile_plperl_function (fn_oid=49152, is_trigger=0 
'\000') at plperl.c:2661
#3  0x00080a42ca4d in plperl_validator (fcinfo=0x7fffbeb0) at 
plperl.c:1900
#4  0x0071ffb6 in OidFunctionCall1Coll ()
#5  0x004ec3b7 in ProcedureCreate ()
#6  0x0054c4ee in CreateFunction ()
#7  0x0065889f in ProcessUtilitySlow.isra.2 ()
#8  0x00657b4c in standard_ProcessUtility ()
#9  0x00655467 in PortalRunUtility ()
#10 0x00656055 in PortalRunMulti ()
#11 0x00656b42 in PortalRun ()
#12 0x00653ef3 in PostgresMain ()
#13 0x00464471 in ServerLoop ()
#14 0x006144e1 in PostmasterMain ()
#15 0x00464cab in main ()

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] MVCC catalog access

2013-05-21 Thread Robert Haas
We've had a number of discussions about the evils of SnapshotNow.  As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.  I decided to do some
testing to measure the impact.  I was pleasantly surprised by the
results.

The attached patch is a quick hack to provide for MVCC catalog access.
 It adds a GUC called mvcc_catalog_access.  When this GUC is set to
true, and heap_beginscan() or index_beginscan() is called with
SnapshotNow, they call GetLatestSnapshot() and use the resulting
snapshot in lieu of SnapshotNow.  As a debugging double-check, I
modified HeapTupleSatisfiesNow to elog(FATAL) if called while
mvcc_catalog_access is true.  Of course, both of these are dirty
hacks.  If we were actually to implement MVCC catalog access, I think
we'd probably just go through and start replacing instances of
SnapshotNow with GetLatestSnapshot(), but I wanted to make it easy to
do perf testing.
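
To make the shape of the hack concrete, here is a rough sketch of the
substitution described above (my illustration of the idea, not the patch
itself; details in the actual patch may differ). This is the kind of check
that would sit at the top of heap_beginscan() and index_beginscan():

/* Sketch only: when the mvcc_catalog_access GUC is on, swap SnapshotNow
 * for a freshly taken MVCC snapshot before starting the scan. */
if (mvcc_catalog_access && snapshot == SnapshotNow)
	snapshot = GetLatestSnapshot();
/* ... the rest of the scan setup proceeds unchanged ... */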

When I first made this change, I couldn't detect any real change;
indeed, it seemed that make check was running ever-so-slightly faster
than before, although that may well have been a testing artifact.  I
wrote a test case that created a schema with 100,000 functions in it
and then dropped the schema (I believe it was Tom who previously
suggested this test case as a worst-case scenario for MVCC catalog
access).  That didn't seem to be adversely affected either, even
though it must take ~700k additional MVCC snapshots with
mvcc_catalog_access = true.

MVCC Off: Create 8743.101 ms, Drop 9655.471 ms
MVCC On: Create 7462.882 ms, Drop 9515.537 ms
MVCC Off: Create 7519.160 ms, Drop 9380.905 ms
MVCC On: Create 7517.382 ms, Drop 9394.857 ms

The first Create seems to be artificially slow because of some kind
of backend startup overhead.  Not sure exactly what.

After wracking my brain for a few minutes, I realized that the lack of
any apparent performance regression was probably due to the lack of
any concurrent connections, making the scans of the PGXACT array very
cheap.  So I wrote a little program to open a bunch of extra
connections.  My MacBook Pro grumbled when I tried to open more than
about 600, so I had to settle for that number.  That was enough to
show up the cost of all those extra snapshots:

MVCC Off: Create 9065.887 ms, Drop 9599.494 ms
MVCC On: Create 8384.065 ms, Drop 10532.909 ms
MVCC Off: Create 7632.197 ms, Drop 9499.502 ms
MVCC On: Create 8215.443 ms, Drop 10033.499 ms

Now, I don't know about you, but I'm having a hard time getting
agitated about those numbers.  Most people are not going to drop
100,000 objects with a single cascaded drop.  And most people are not
going to have 600 connections open when they do.  (The snapshot
overhead should be roughly proportional to the product of the number
of drops and the number of open connections, and the number of cases
where the product is as high as 60 million has got to be pretty
small.)  But suppose that someone is in that situation.  Well, then
they will take a... 10% performance penalty?  That sounds plenty
tolerable to me, if it means we can start moving in the direction of
allowing some concurrent DDL.

Am I missing an important test case here?  Are these results worse
than I think they are?  Did I boot this testing somehow?

[MVCC catalog access patch, test program to create lots of idle
connections, and pg_depend stress test case attached.]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


mvcc-catalog-access.patch
Description: Binary data
/*
 * pg_cxn.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>		/* for sleep() */
#include <getopt.h>
#include "libpq-fe.h"

static void pg_connect(const char *conninfo);

int
main(int argc, char **argv)
{
	int		c;
	int		n = 1;
	int		optindex;
	int		i;
	const char *conninfo;

	while ((c = getopt_long(argc, argv, "n:", NULL, &optindex)) != -1)
	{
		switch (c)
		{
			case 'n':
				n = atoi(optarg);
				break;
			default:
				fprintf(stderr, "Usage: pg_cxn [-n count] connstr\n");
				exit(1);
		}
	}
	argv += optind;
	argc -= optind;

	if (argc > 0)
		conninfo = argv[0];
	else
		conninfo = "";

	/* Open the requested number of connections, then just sit on them */
	for (i = 0; i < n; ++i)
		pg_connect(conninfo);

	while (1)
		sleep(3600);

	return 0;
}

static void
pg_connect(const char *conninfo)
{
	PGconn	   *conn;

	/* Make a connection to the database */
	conn = PQconnectdb(conninfo);

	/* Check to see that the backend connection was successfully made */
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		exit(1);
	}
}


depend
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers