Re: [HACKERS] [PATCHES] TODO Item - Add system view to show free
On Fri, 2005-10-28 at 12:50 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: There are a few issues with current FSM implementation, IMHO, discussing as usual the very highest end of performance: Do you have any evidence that the FSM is actually a source of performance issues, or is this all hypothetical? This was a side-bar issue for my current focus, as I already said, so I'll skip what sounds like a lengthy debate on this for now. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] 8.1RC1 fails opr_sanity on osx
Just the one fail on OSX 10.3.9 opr_sanity ... FAILED Is this a known problem, or something specific to my machine... I can post regression.diffs (quite long) if required ... Thanks Adam
Re: [HACKERS] LDAP Authentication?
--- Magnus Hagander wrote: It should be fairly easy to write a LDAP backend to password authentication using openldap, winldap or whatever ldap library is available. I support the idea. It would be a good gain for PostgreSQL authentication. If you want to discuss ideas, drop me a line. Euler Taveira de Oliveira euler[at]yahoo_com_br
Re: [HACKERS] 8.1RC1 fails opr_sanity on osx
Adam Witney wrote: Just the one fail on OSX 10.3.9 opr_sanity ... FAILED Is this a known problem, or something specific to my machine... I can post regression.diffs (quite long) if required ... Uh, regression.diffs is large? My guess is your backend crashed, for some unknown reason, so all the queries after the crash just failed. I can't think of another reason for that diff file to be large. Is the failure reproducible? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
Re: [HACKERS] LDAP Authentication?
I can help on this one too. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Euler Taveira de Oliveira Sent: Monday, October 31, 2005 9:44 AM To: Satoshi Nagayasu; Magnus Hagander Cc: PostgreSQL-development Subject: Re: [HACKERS] LDAP Authentication? --- Magnus Hagander wrote: It should be fairly easy to write a LDAP backend to password authentication using openldap, winldap or whatever ldap library is available. I support the idea. It would be a good gain for PostgreSQL authentication. If you want to discuss ideas, drop me a line. Euler Taveira de Oliveira euler[at]yahoo_com_br
Re: [HACKERS] 8.1RC1 fails opr_sanity on osx
On 31/10/05 1:32 pm, Bruce Momjian pgman@candle.pha.pa.us wrote: Adam Witney wrote: Just the one fail on OSX 10.3.9 opr_sanity ... FAILED Is this a known problem, or something specific to my machine... I can post regression.diffs (quite long) if required ... Uh, regression.diffs is large? MY guess is your backend crashed, for some unknown reason, so all the queries after the crash just failed. I can't think of another reason for that diff file to be large. Is the failure repoducable? Seems a bit random actually... Here are the results of 3 successive make check's, the fourth passed all tests! http://bugs.sgul.ac.uk/downloads/temp/regression1.diffs http://bugs.sgul.ac.uk/downloads/temp/regression2.diffs http://bugs.sgul.ac.uk/downloads/temp/regression3.diffs -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 8.1RC1 fails opr_sanity on osx
Adam Witney wrote: On 31/10/05 1:32 pm, Bruce Momjian pgman@candle.pha.pa.us wrote: Adam Witney wrote: Just the one fail on OSX 10.3.9 opr_sanity ... FAILED Is this a known problem, or something specific to my machine... I can post regression.diffs (quite long) if required ... Uh, regression.diffs is large? MY guess is your backend crashed, for some unknown reason, so all the queries after the crash just failed. I can't think of another reason for that diff file to be large. Is the failure repoducable? Seems a bit random actually... Here are the results of 3 successive make check's, the fourth passed all tests! http://bugs.sgul.ac.uk/downloads/temp/regression1.diffs http://bugs.sgul.ac.uk/downloads/temp/regression2.diffs http://bugs.sgul.ac.uk/downloads/temp/regression3.diffs Yea, that helps. The errors you have are really these: ! psql: could not fork new process for connection: Resource temporarily unavailable and ! psql: could not send startup packet: Broken pipe Is anything else big running on your machine? I looked at the OSX configuration section here: http://candle.pha.pa.us/main/writings/pgsql/sgml/kernel-resources.html but didn't see anything significant. My guess is that the parallel nature of the regression tests are exhausting some system resource on your machine. Does the kernel log have anything of interest? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.1 Release Candidate 1 Bundled ...
The developer.postgresql.org machine really isn't geared to handle downloads.. Any reason you can't just stick it on the standard ftp sites and have it mirrored along with everything else? This is taken from our spec: # Pre-release RPM's should not be put up on the public ftp.postgresql.org server # -- only test releases or full releases should be. So thinking that: * Beta and RC RPMs are used only by testers * We use the beta and RC steps to build the new RPM sets, so that means that actually they are not production quality looking from the RPM perspective. By way of clarification, as I am the one who wrote that portion of the spec file, an 'RPM prerelease' and a 'beta' weren't intended to be the same thing; the line in the spec referenced was for my own use to remind me that my own internal testing packages (with a release number 0.x) weren't intended for public consumption. Devrim, you can remove that section of the spec file at any time at this point, because you are using CVS for the purpose that I was using 'prerelease' RPMs. Historically, beta and release candidate RPM's were put on the main ftp site but flagged as beta quality. I certainly appreciate your diligence in following those instructions I wrote long ago, but, thanks to your smoother release process (in no small part due to the use of CVS) those instructions are obsolete. Many thanks for being that diligent! -- Lamar Owen Director of Information Technology Pisgah Astronomical Research Institute 1 PARI Drive Rosman, NC 28772 (828)862-5554 www.pari.edu
Re: [HACKERS] 8.1RC1 fails opr_sanity on osx
On 31/10/05 2:13 pm, Bruce Momjian pgman@candle.pha.pa.us wrote: Adam Witney wrote: On 31/10/05 1:32 pm, Bruce Momjian pgman@candle.pha.pa.us wrote: Adam Witney wrote: Just the one fail on OSX 10.3.9 opr_sanity ... FAILED Is this a known problem, or something specific to my machine... I can post regression.diffs (quite long) if required ... Uh, regression.diffs is large? MY guess is your backend crashed, for some unknown reason, so all the queries after the crash just failed. I can't think of another reason for that diff file to be large. Is the failure repoducable? Seems a bit random actually... Here are the results of 3 successive make check's, the fourth passed all tests! http://bugs.sgul.ac.uk/downloads/temp/regression1.diffs http://bugs.sgul.ac.uk/downloads/temp/regression2.diffs http://bugs.sgul.ac.uk/downloads/temp/regression3.diffs Yea, that helps. The errors you have are really these: ! psql: could not fork new process for connection: Resource temporarily unavailable and ! psql: could not send startup packet: Broken pipe Is anything else big running on your machine? I looked at the OSX configuration section here: http://candle.pha.pa.us/main/writings/pgsql/sgml/kernel-resources.html but didn't see anything significant. My guess is that the parallel nature of the regression tests are exhausting some system resource on your machine. Does the kernel log have anything of interest? Ah that probably explains it... It is my laptop and I have quite a few things running... So should probably run the make check when I first start it up maybe. Thanks for the help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.1RC1 fails opr_sanity on osx
Adam Witney [EMAIL PROTECTED] writes: http://bugs.sgul.ac.uk/downloads/temp/regression1.diffs http://bugs.sgul.ac.uk/downloads/temp/regression2.diffs http://bugs.sgul.ac.uk/downloads/temp/regression3.diffs If you'd looked, you would have noticed that they're all variations on psql: could not fork new process for connection: Resource temporarily unavailable In other words, you've got a system resource limit problem. See http://developer.postgresql.org/docs/postgres/kernel-resources.html#AEN17862 regards, tom lane
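"Resource temporarily unavailable" is the usual text for errno EAGAIN, which fork() can return when a process limit has been reached. As a rough way to see what limit the current shell is running under, here is a minimal sketch. It assumes a system whose sys/resource.h defines RLIMIT_NPROC (Linux, the BSDs and OS X do, but it is not required by POSIX), and the helper names read_process_limit and print_process_limit are made up for illustration. System-wide kernel limits are not covered by it.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: fetch the per-user process limit that applies
 * to the calling process.  Returns 0 on success, -1 if getrlimit() fails.
 */
static int
read_process_limit(struct rlimit *lim)
{
    return getrlimit(RLIMIT_NPROC, lim);
}

/* Print one limit value; RLIM_INFINITY means no limit is set. */
static void
print_one_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long) value);
}

/* Hypothetical helper: print the soft and hard process limits. */
static void
print_process_limit(const struct rlimit *lim)
{
    print_one_limit("soft process limit", lim->rlim_cur);
    print_one_limit("hard process limit", lim->rlim_max);
}
```

Calling read_process_limit() and then print_process_limit() from a small main() shows the numbers; a low soft limit on a busy laptop would be consistent with the parallel regression tests failing intermittently.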
[HACKERS] 8.1RC1 on Tru64
Hello, I tried RC1 on a Tru64 box (with Compaq C V6.1-011) and it succeeded:
bash-2.05b$ make MAX_CONNECTIONS=2 check
...
== shutting down postmaster ==
postmaster stopped
== All 98 tests passed. ==
Environments:
- $ uname -a
  OSF1 kiss.my.domain V5.0 910 alpha
- Compaq C V6.1-011 on Digital UNIX V5.0 (Rev. 910)
- GNU Make version 3.79.1, by Richard Stallman and Roland McGrath.
- result of pg_config
$ src/bin/pg_config/pg_config
BINDIR = /home/postgres/postgresql-8.1RC1/src/bin/pg_config
DOCDIR = /home/postgres/postgresql-8.1RC1/src/bin/doc
INCLUDEDIR = /home/postgres/postgresql-8.1RC1/src/bin/include
PKGINCLUDEDIR = /home/postgres/postgresql-8.1RC1/src/bin/include
INCLUDEDIR-SERVER = /home/postgres/postgresql-8.1RC1/src/bin/include/server
LIBDIR = /home/postgres/postgresql-8.1RC1/src/bin/lib
PKGLIBDIR = /home/postgres/postgresql-8.1RC1/src/bin/lib
LOCALEDIR =
MANDIR = /home/postgres/postgresql-8.1RC1/src/bin/man
SHAREDIR = /home/postgres/postgresql-8.1RC1/src/bin/share
SYSCONFDIR = /home/postgres/postgresql-8.1RC1/src/bin/etc
PGXS = /home/postgres/postgresql-8.1RC1/src/bin/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-includes=/usr/local/include'
CC = cc -std
CPPFLAGS = -I/usr/local/include
CFLAGS = -O -ieee
CFLAGS_SL =
LDFLAGS = -rpath /usr/local/pgsql/lib
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lresolv -lPW -lm -lbsd
VERSION = PostgreSQL 8.1RC1
regards, -- Shigehiro Honda
[HACKERS] platform test
4x AMD Opteron (tm) Processor 852
-
[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ uname -a
Linux localhost 2.6.12-gentoo-r10 #1 SMP Fri Sep 9 09:43:22 EDT 2005 x86_64 AMD Opteron (tm) Processor 852 AuthenticAMD GNU/Linux
[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ file src/backend/postgres
src/backend/postgres: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), not stripped
--
[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ gcc --version
gcc (GCC) 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)
--
[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ src/bin/pg_config/pg_config
BINDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/pg_config
DOCDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/doc/postgresql
INCLUDEDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/include
PKGINCLUDEDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/include/postgresql
INCLUDEDIR-SERVER = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/include/postgresql/server
LIBDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/lib
PKGLIBDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/lib/postgresql
LOCALEDIR =
MANDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/man
SHAREDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/share/postgresql
SYSCONFDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/etc/postgresql
PGXS = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-perl' '--with-openssl' '--enable-integer-datetimes' '--prefix=/tmp/pgtest/'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,/tmp/pgtest//lib
LDFLAGS_SL =
LIBS = -lpgport -lssl -lcrypto -lz -lreadline -lcrypt -lresolv -lnsl -ldl -lm -lbsd
VERSION = PostgreSQL 8.1RC1
--
== All 98 tests passed. ==
-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
Re: [HACKERS] 8.1RC1 on Tru64
Honda Shigehiro wrote: Hello, I tried RC1 on Tru64 box(with Compaq C V6.1-011)) and succeed : bash-2.05b$ make MAX_CONNECTIONS=2 check That seems to be a very low setting for MAX_CONNECTIONS. Any particular reason for that? (side note - we'd very much welcome a Tru64 buildfarm member - if you're interested please email me off list). cheers andrew
Re: [HACKERS] 8.1RC1 on Tru64
From: Andrew Dunstan [EMAIL PROTECTED] Subject: Re: [HACKERS] 8.1RC1 on Tru64 Date: Mon, 31 Oct 2005 11:12:09 -0500 I tried RC1 on Tru64 box(with Compaq C V6.1-011)) and succeed : bash-2.05b$ make MAX_CONNECTIONS=2 check The seems to be a very low setting for MAX_CONNECTIONS. Any particular reason for that? This is because my box has too little memory (64MB) to run the tests without it. With the default parameters, my box said Unable to obtain requested swap space... (side note - we'd very much welcome a Tru64 buildfarm member - if you're interested please email me off list). ... I have been trying to join the buildfarm since last week, but I cannot compile CVS at the moment... regards, -- Shigehiro Honda
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags & 0x01),)
On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: I'd like Jim to test this theory by seeing if it helps to reverse the order of the if-test elements at lines 294/295, ie make it look like if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS || shared->page_number[slotno] != pageno) This won't do as a permanent patch, because it isn't guaranteed to fix the problem on machines that don't strongly order writes, but it should work on Opterons, at least well enough to confirm the diagnosis. Given your proposed fix on -patches, do you still need me to test this? Also, is there any heap corruption risk associated with this patch? I'm also wondering what the effect of this is when assertions are turned off. My client had to go back to running with assertions turned off because of the performance impact. Are they now risking data corruption? Is there a way to turn on the assertion just in this code segment? This incident has made me wonder if it's worth creating two classes of assertions. The (hopefully more common) set of assertions would be for things that shouldn't happen, but that won't result in heap corruption if they go uncaught. A new set (well, existing asserts, but just re-classified) would be for things that could result in heap corruption if uncaught. My hope is that the set of critical assertions could be turned on by default, helping to identify race conditions and other bugs that conventional testing is unlikely to find. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
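For readers who want to see what "reversing the order of the if-test elements" amounts to, here is a small stand-alone sketch. It is not the slru.c code: the struct MockSlruShared, the slot count, the extra enum values and both function names are invented for illustration, and the "number first" variant is only an assumption about what the pre-change order looked like. The one thing it relies on is that C evaluates the operands of || left to right, so the order in the source decides which field is read first.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NUM_SLOTS 8            /* illustrative size, not the real one */

typedef enum
{
    SLRU_PAGE_EMPTY,                /* mock value */
    SLRU_PAGE_READ_IN_PROGRESS,     /* name taken from the message */
    SLRU_PAGE_CLEAN                 /* mock value */
} MockPageStatus;

typedef struct
{
    MockPageStatus page_status[MOCK_NUM_SLOTS];
    int            page_number[MOCK_NUM_SLOTS];
} MockSlruShared;

/* Order proposed in the message: the status field is read first. */
static bool
slot_changed_status_first(const MockSlruShared *shared, int slotno, int pageno)
{
    return shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS ||
           shared->page_number[slotno] != pageno;
}

/* Assumed earlier order: the page number is read first. */
static bool
slot_changed_number_first(const MockSlruShared *shared, int slotno, int pageno)
{
    return shared->page_number[slotno] != pageno ||
           shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS;
}
```

With no concurrent writer the two functions always agree; they can only differ when another process updates the two fields between the reads. The sketch does not model that writer, and by itself it guarantees nothing about hardware ordering, in line with the caveat in the message.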
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags & 0x01),)
Sorry, two more things... Will increasing shared_buffers make this less likely to occur? Or is this just something that's likely to happen when there are things like seqscans that are putting buffers near the front of the LRU? (The 8.0.3 buffer manager does something like that, right?) Is this something that a test case can be created for? I know someone submitted a framework for doing concurrent testing... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] FKs on temp tables: hard, or just omitted?
On Sun, Oct 30, 2005 at 05:31:07PM -0800, Josh Berkus wrote: Folks, Thanks, all! Now, if only I could remember who asked me the question ... ISTM we should add a note about this to the docs... Here's a patch for create_table.sgml, though there's probably some other places this could go... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Index: doc/src/sgml/ref/create_table.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/create_table.sgml,v
retrieving revision 1.94
diff -u -r1.94 create_table.sgml
--- doc/src/sgml/ref/create_table.sgml	13 Aug 2005 02:48:18 -0000	1.94
+++ doc/src/sgml/ref/create_table.sgml	31 Oct 2005 17:54:10 -0000
@@ -421,7 +421,10 @@
       primary key of the <replaceable
       class="parameter">reftable</replaceable> is used.  The
       referenced columns must be the columns of a unique or primary
-      key constraint in the referenced table.
+      key constraint in the referenced table.  Note that foreign key
+      constraints may not be defined between temporary tables and permanent
+      tables.  This is because doing so would eliminate most of the performance
+      gains of using a temporary table.
      </para>
 
      <para>
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags
Jim C. Nasby wrote: On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: I'd like Jim to test this theory by seeing if it helps to reverse the order of the if-test elements at lines 294/295, ie make it look like if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS || shared->page_number[slotno] != pageno) This won't do as a permanent patch, because it isn't guaranteed to fix the problem on machines that don't strongly order writes, but it should work on Opterons, at least well enough to confirm the diagnosis. Given your proposed fix on -patches, do you still need me to test this? Also, is there any heap corruption risk associated with this patch? Because it is a test, I am not sure there is any way to know what the possible impact of a bug is. If we knew there were a bug in the patch, it would have been fixed already. I'm also wondering what the effect of this is when assertions are turned off. My client had to go back to running with assertions turned off because of the performance impact. Are they now risking data corruption? Is there a way to turn on the assertion just in this code segment? This incident has made me wonder if it's worth creating two classes of assertions. The (hopefully more common) set of assertions would be for things that shouldn't happen, but that won't result in heap corruption if they go uncaught. A new set (well, existing asserts, but just re-classified) would be for things that could result in heap corruption if uncaught. My hope is that the set of critical assertions could be turned on by default, helping to identify race conditions and other bugs that conventional testing is unlikely to find. That is probably overkill. Running with test patches isn't something we expect folks to do often. -- Bruce Momjian
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags & 0x01),)
Jim C. Nasby [EMAIL PROTECTED] writes: On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: This won't do as a permanent patch, because it isn't guaranteed to fix the problem on machines that don't strongly order writes, but it should work on Opterons, at least well enough to confirm the diagnosis. Given your proposed fix on -patches, do you still need me to test this? Yes; we still need to verify that my theory actually explains your problem. Given that I'm positing that you can repeatedly hit a two-instruction window, this is by no means a sure thing. We need it tested (and with asserts on, so that we can tell if it's fixed the problem or not). Also, is there any heap corruption risk associated with this patch? Look, Jim, I'm trying to help you fix this. Are you going to help or not? If you want some kind of written guarantee, you're not going to get one. regards, tom lane
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags & 0x01),)
On Mon, Oct 31, 2005 at 01:05:06PM -0500, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: This won't do as a permanent patch, because it isn't guaranteed to fix the problem on machines that don't strongly order writes, but it should work on Opterons, at least well enough to confirm the diagnosis. Given your proposed fix on -patches, do you still need me to test this? Yes; we still need to verify that my theory actually explains your problem. Given that I'm positing that you can repeatedly hit a two-instruction window, this is by no means a sure thing. We need it tested (and with asserts on, so that we can tell if it's fixed the problem or not). Ok, I'll work on getting this tested. Just to clarify, if this fixes it then the problem wouldn't occur, or would we just see a different assert? Also, is there any heap corruption risk associated with this patch? Look, Jim, I'm trying to help you fix this. Are you going to help or not? If you want some kind of written guarantee, you're not going to get one. Of course not, and I'm not looking for one. On the other hand, I don't want to recommend something on a production system without understanding what kind of risks are involved, and unfortunately much of this is still over my head. I would really like to have a better idea of what the impact of this bug is. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags
Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: This won't do as a permanent patch, because it isn't guaranteed to fix the problem on machines that don't strongly order writes, but it should work on Opterons, at least well enough to confirm the diagnosis. Given your proposed fix on -patches, do you still need me to test this? Yes; we still need to verify that my theory actually explains your problem. Given that I'm positing that you can repeatedly hit a two-instruction window, this is by no means a sure thing. We need it tested (and with asserts on, so that we can tell if it's fixed the problem or not). Also, is there any heap corruption risk associated with this patch? Look, Jim, I'm trying to help you fix this. Are you going to help or not? If you want some kind of written guarantee, you're not going to get one. I think we can say Jim gets his money back if he finds a bug. :-) -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags
On Mon, Oct 31, 2005 at 01:01:14PM -0500, Bruce Momjian wrote: This incident has made me wonder if it's worth creating two classes of assertions. The (hopefully more common) set of assertions would be for things that shouldn't happen, but that won't result in heap corruption if they go uncaught. A new set (well, existing asserts, but just re-classified) would be for things that could result in heap corruption if uncaught. My hope is that the set of critical assertions could be turned on by default, helping to identify race conditions and other bugs that conventional testing is unlikely to find. That is probably overkill. Running with test patches isn't something we expect folks to do often. I wasn't thinking about test patches. My assumption is that the asserts that are currently in place fall into one of two categories: either they check for something that, if false, could result in data corruption in the heap, or they check for something that shouldn't happen but that can't corrupt the heap if it does. If that assumption is correct then separating them might make it easier to run with the set of critical asserts turned on. Currently, there can be a substantial performance penalty with all asserts turned on, but I suspect a lot of that penalty is from asserts in things like parsing and planning code; code that pretty much couldn't corrupt data. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
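The two-class idea floated above could look roughly like the following. This is only a sketch of the proposal, not anything that exists in the PostgreSQL source: the macro names CRITICAL_ASSERT and ROUTINE_ASSERT, the build flag ENABLE_ROUTINE_ASSERTS, the failure counter and demo_checks are all hypothetical, and the 0x01 flag test is just a stand-in for a check that guards stored data.

```c
#include <assert.h>
#include <stdio.h>

/* Counts failed checks so the sketch can be exercised without aborting. */
static int assert_failures = 0;

/* Report a failed check; a real server would more likely log and abort. */
static void
assert_failed(const char *expr, const char *file, int line)
{
    fprintf(stderr, "failed assertion \"%s\" at %s:%d\n", expr, file, line);
    assert_failures++;
}

/* Always compiled in: intended for checks that guard stored data. */
#define CRITICAL_ASSERT(cond) \
    do { if (!(cond)) assert_failed(#cond, __FILE__, __LINE__); } while (0)

/* Compiled in only when ENABLE_ROUTINE_ASSERTS is defined at build time. */
#ifdef ENABLE_ROUTINE_ASSERTS
#define ROUTINE_ASSERT(cond) CRITICAL_ASSERT(cond)
#else
#define ROUTINE_ASSERT(cond) ((void) 0)
#endif

/* Runs one check of each class; returns how many of them failed. */
static int
demo_checks(int item_flags, int parse_depth)
{
    int before = assert_failures;

    (void) parse_depth;     /* unused when routine asserts are compiled out */
    ROUTINE_ASSERT(parse_depth >= 0);
    CRITICAL_ASSERT((item_flags & 0x01) != 0);
    return assert_failures - before;
}
```

In a default build only the critical check costs anything at run time; compiling with -DENABLE_ROUTINE_ASSERTS turns the cheaper-to-lose class back on. Whether existing asserts can be sorted cleanly into the two classes is exactly what the thread goes on to dispute.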
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags
Jim C. Nasby wrote: On Mon, Oct 31, 2005 at 01:01:14PM -0500, Bruce Momjian wrote: This incident has made me wonder if it's worth creating two classes of assertions. The (hopefully more common) set of assertions would be for things that shouldn't happen, but if go un-caught won't result in heap corruption. A new set (well, existing asserts, but just re-classified) would be for things that if uncaught could result in heap corruption. My hope is that the set of critical assertions could be turned on by default, helping to identify race conditions and other bugs that conventional testing is unlikely to find. That is probably overkill. Running with test patches isn't something we expect folks to do often. I wasn't thinking about test patches. My assumption is that the asserts that are currently in place fall into one of two categories: either they check for something that if false could result in data corruption in the heap, or they check for something that shouldn't happen, but if it does it can't corrupt the heap. If that assumption is correct then seperating them might make it easier to run with the set of critical asserts turned on. Currently, there can be a substantial performance penalty with all asserts turned on, but I suspect a lot of that penalty is from asserts in things like parsing and planning code; code that pretty much couldn't corrupt data. There is no way if the system has some incorrect value whether that would later corrupt the data or not. Anything the system does that it shouldn't do is a potential corruption problem. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags
On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote: There is no way if the system has some incorrect value whether that would later corrupt the data or not. Anything the system does that it shouldn't do is a potential corruption problem. But is it safe to say that there are areas where a failed assert is far more likely to result in data corruption? And that there are also areas likely to contain bugs that are difficult or impossible to find, such as race conditions? ISTM that it would be valuable to do some additional checking in these critical areas. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
[EMAIL PROTECTED] (Tom Lane) writes: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: hmm well -HEAD(and 8.0.4 too!) is broken on AIX 5.3ML3: http://archives.postgresql.org/pgsql-hackers/2005-10/msg01053.php [ shrug... ] The reports of this problem have not given enough information to fix it, and since it's not a regression from 8.0, it's not going to hold up the 8.1 release. When and if we receive enough info to fix it, we'll gladly do so, but ... Well, we never had an AIX 5.3 system when 8.0 was released, so didn't attempt a compile. Seneca just tried out a build on 8.0.3 on AIX 5.3; it appears to be experiencing the same problem with initdb, and a slight modification of the previous fix appears to resolve the issue. Can you suggest what further we might provide that would help? (My guess is that the problem is a compiler or libc bug anyway, given that one report says that replacing a memcpy call with an equivalent loop makes the failure go away.) It seems unlikely to be a compiler bug as the same issue has been reported with both GCC and IBM XLC. I could believe it being a libc bug... It would be terribly disappointing to have to report both internally and externally that AIX 5.3 is not a usable platform for recent releases of PostgreSQL... -- cbbrowne,@,ntlug.org http://cbbrowne.com/info/linuxdistributions.html Never lend your car to anyone to whom you have given birth to. --Erma Bombeck
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Chris Browne [EMAIL PROTECTED] writes: [EMAIL PROTECTED] (Tom Lane) writes: (My guess is that the problem is a compiler or libc bug anyway, given that one report says that replacing a memcpy call with an equivalent loop makes the failure go away.) It seems unlikely to be a compiler bug as the same issue has been reported with both GCC and IBM XLC. I could believe it being a libc bug... As best I can tell after poking at it on Stefan's machine, it's a linker bug, or else there is something strange about memcpy as compared to, say, memcmp. A function pointer to memcmp works, a function pointer to memcpy contains a bogus value that points entirely outside the program's address space. This despite the assembly code that generates them looking just the same in both cases, viz LC..12: .tc memcmp[TC],memcmp[DS] LC..14: .tc memcpy[TC],memcpy[DS] Even more interesting, if you start the postmaster under gdb and examine the pointer, then set a breakpoint at main and say run, by the time control arrives at main() the bogus value has changed to a different bogus value. So something in the basic C runtime support is frobbing it --- incorrectly :-(. I think all the signs point to incorrect relocation data generated by the linker, though I have no idea why only memcpy would be affected. It would be terribly disappointing to have to report both internally and externally that AIX 5.3 is not a usable platform for recent releases of PostgreSQL... According to Stefan it broke between 5.3ML1 and 5.3ML3. I suggest filing a defect report with IBM. We're not going to stop using memcpy because one version of one platform is broken. regards, tom lane
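Anyone wanting to check whether a given toolchain shows the symptom described above could start from something like this stand-alone probe. It is a sketch under assumptions, not the code path that fails in PostgreSQL: the names copy_fn, cmp_fn, loop_copy and function_pointers_work are invented here, and the pointers are declared volatile only so that the compiler has to load them and make genuinely indirect calls. On an affected system the call through copy_fn would be expected to crash rather than return 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn_t) (void *, const void *, size_t);
typedef int (*cmp_fn_t) (const void *, const void *, size_t);

/* Function pointers filled in by the linker/loader at program start. */
static copy_fn_t volatile copy_fn = memcpy;
static cmp_fn_t volatile cmp_fn = memcmp;

/* Byte-by-byte loop, the kind of replacement the report mentions. */
static void *
loop_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Returns 1 if copies made through the pointer and the loop both match. */
static int
function_pointers_work(void)
{
    const char src[] = "relocation";
    char via_pointer[sizeof src];
    char via_loop[sizeof src];

    copy_fn(via_pointer, src, sizeof src);
    loop_copy(via_loop, src, sizeof src);
    return cmp_fn(via_pointer, src, sizeof src) == 0 &&
           cmp_fn(via_loop, src, sizeof src) == 0;
}
```

A probe this small may well pass even on a machine where the server fails, since the problem is attributed to relocation data in a particular link; it is mainly useful as a starting point for a reduced test case to attach to a vendor defect report.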
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)-lp_flags
On 10/31/05, Jim C. Nasby [EMAIL PROTECTED] wrote:
> On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote:
>> There is no way to know, if the system has some incorrect value, whether that would later corrupt the data or not. Anything the system does that it shouldn't do is a potential corruption problem.
> But is it safe to say that there are areas where a failed assert is far more likely to result in data corruption? And that there are also areas where there are likely to be difficult/impossible-to-find bugs, such as race conditions? ISTM that it would be valuable to do some additional checking in these critical areas.

There are, no doubt, also places where an assert has minimal to no performance impact. I'd wager a guess that the intersection of low impact asserts, and asserts which measure high risk activities, is small enough to be uninteresting.
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Is this issue only on AIX 5.3 ML1 thru ML 3? Does the build work fine with 5.2 (ALL MLs)?
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Mag Gam [EMAIL PROTECTED] writes:
> Is this issue only on AIX 5.3 ML1 thru ML 3? Does the build work fine with 5.2 (ALL MLs)?

There's an AIX 5.2 machine in the buildfarm, and it seems happy. I have no idea about details beyond that ...

regards, tom lane
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)-lp_flags 0x01),)
Now that I've got a little better idea of what this code does, I've noticed something interesting... this issue is happening on an 8-way machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this greatly increase the odds of buffer conflicts? Bug aside, would it be better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?

Also, something else to note is that this database can see a pretty high transaction rate... I just checked and it was doing 200 TPS, but iirc it can hit 1000+ TPS during the day.

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)-lp_flags 0x01),)
Jim C. Nasby wrote:
> Now that I've got a little better idea of what this code does, I've noticed something interesting... this issue is happening on an 8-way machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this greatly increase the odds of buffer conflicts? Bug aside, would it be better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?

We had talked about increasing NUM_SLRU_BUFFERS depending on shared_buffers, but it didn't get done. Something to consider for 8.2 though. I think you could have better performance by increasing that setting, while at the same time diminishing the possibility that the race condition appears.

I think you should also consider increasing PGPROC_MAX_CACHED_SUBXIDS (src/include/storage/proc.h), because that should decrease the chance that the subtrans area needs to be scanned. By how much, however, I wouldn't know -- it depends on the number of subtransactions you typically have; I guess you could activate the measuring code in procarray.c to get a figure.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
www.google.com: command-line interface for the web.
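For illustration only, the idea of deriving the SLRU buffer count from shared_buffers rather than hard-coding it might look something like the sketch below. Nothing like this exists in the tree as of this thread: the function name `slru_buffers_for`, the ratio of one SLRU buffer per 512 shared buffers, and the cap of 64 are all assumptions picked to make the example concrete. Only the floor of 8 comes from the current NUM_SLRU_BUFFERS definition mentioned above.

```c
#include <assert.h>

/* Floor: the value NUM_SLRU_BUFFERS is currently defined at. */
#define SLRU_BUFFERS_MIN 8
/* Assumed cap and ratio, chosen only for illustration. */
#define SLRU_BUFFERS_MAX 64
#define SHARED_BUFFERS_PER_SLRU_BUFFER 512

/* Hypothetical helper: scale the SLRU buffer count with shared_buffers
 * (given in pages), clamped to [SLRU_BUFFERS_MIN, SLRU_BUFFERS_MAX]. */
int slru_buffers_for(int shared_buffers)
{
    int n;

    assert(shared_buffers >= 0);
    n = shared_buffers / SHARED_BUFFERS_PER_SLRU_BUFFER;
    if (n < SLRU_BUFFERS_MIN)
        n = SLRU_BUFFERS_MIN;
    if (n > SLRU_BUFFERS_MAX)
        n = SLRU_BUFFERS_MAX;
    return n;
}
```

With these made-up constants a small installation keeps the current 8 buffers, while one with shared_buffers = 8192 would get 16. Whether shared_buffers is even the right input is an open question.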
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On Thu, 20 Oct 2005 23:03:47 +0100 Simon Riggs [EMAIL PROTECTED] wrote:
> On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
>>> This isn't exactly elegant coding, but it provides a useful improvement on an 8-way SMP box when run on 8.0 base. OK, let's be brutal: this looks pretty darn stupid. But it does follow the CPU optimization handbook advice and I did see a noticeable improvement in performance and a reduction in context switching. I'm not in a position to try this again now on 8.1beta, but I'd welcome a performance test result from anybody that is. I'll supply a patch against 8.1beta for anyone wanting to test this.
>> Ok, I've produced a few results on a 4 way (8 core) POWER 5 system, which I've just set up and probably needs a bit of tuning. I don't see much difference but I'm wondering if the cacheline sizes are dramatically different from Intel/AMD processors. I still need to take a closer look to make sure I haven't grossly mistuned anything, but I'll let everyone take a look:
> Well, the Power 5 architecture probably has the lowest overall memory delay you can get currently so in some ways that would negate the effects of the patch. (Cacheline is still 128 bytes, AFAICS). But it's clear the patch isn't significantly better (like it was with 8.0 when we tried this on the 8-way Itanium in Feb).
>> cvs 20051013 http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/19/ 2501 notpm
>> cvs 20051013 w/ lw.patch http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/20/ 2519 notpm
> Could you re-run with wal_buffers = 32 ? (Without patch) Thanks

Ok, sorry for the delay. I've bumped up the wal_buffers to 2048 and redid the disk layout. Here's where I'm at now:

cvs 20051013 http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/40/ 3257 notpm
cvs 20051013 w/ lw.patch http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/42/ 3285 notpm

Still not much of a difference with the patch. A quick glance over the iostat data suggests I'm still not i/o bound, but the i/o wait is rather high according to vmstat. Will try to see if there's anything else obvious to get the load up higher.

Mark
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)-lp_flags 0x01),)
On Mon, Oct 31, 2005 at 09:02:59PM -0300, Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>> Now that I've got a little better idea of what this code does, I've noticed something interesting... this issue is happening on an 8-way machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this greatly increase the odds of buffer conflicts? Bug aside, would it be better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?
> We had talked about increasing NUM_SLRU_BUFFERS depending on shared_buffers, but it didn't get done. Something to consider for 8.2 though. I think you could have better performance by increasing that setting, while at the same time diminishing the possibility that the race condition appears.

Ok, I'll look into that. This database is definitely having issues due to the sheer transaction volume, so maybe that will help.

If NUM_SLRU_BUFFERS were to be tied to something, wouldn't it make more sense to tie it to wal_buffers though? One example is a data warehouse might have a very high shared_buffers, but most likely won't have a high transaction rate. ISTM that most databases with a high transaction rate are likely to have increased wal_buffers.

> I think you should also consider increasing PGPROC_MAX_CACHED_SUBXIDS (src/include/storage/proc.h), because that should decrease the chance that the subtrans area needs to be scanned. By how much, however, I wouldn't know -- it depends on the number of subtransactions you typically have; I guess you could activate the measuring code in procarray.c to get a figure.

AFAIK they're not using subtransactions at all, but I'll check.

Is there anywhere this stuff is documented other than in code? It sounds like an advanced tuning guide would be very valuable for environments like this one...

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
[HACKERS] regression failures on Windows in machines with some non-English locales
I have become aware that regression is failing due to ordering differences on Windows machines in some non-English locales (specifically, Czech, but the potential is there for more failures). The problem seems to be that the regression suite and initdb don't do enough between them to ensure that the tests are run in C locale.

The simple solution seems to be to add --no-locale to the initdb args in pg_regress.sh. I have asked Petr Jelinek (one of our Czech users) to test this. If it works as I expect it to (buildfarm has done this for installcheck tests for some time) I'd like to add this to both the HEAD and 8.0 branches. I know it's very late in the cycle, but it seems very low risk to me, and I'd like to have regression working on as broad a set of platforms as possible.

If people prefer, I could add it just for the Windows case - Unix platforms won't see the effect I propose to remedy, as their setlocale works from the environment, unlike Windows.

Thoughts?

cheers

andrew
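To make the ordering problem concrete, here is a small sketch of why the collation locale changes expected test output. It is not taken from pg_regress or initdb, and the helper name `collate_in_locale` is made up for illustration. In the C locale strcoll() compares by byte value, so uppercase ASCII sorts before lowercase; a natural-language locale such as Czech may order the same strings differently, which is the kind of difference that ends up in regression.diffs.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare two strings under the named collation
 * locale. Returns <0, 0 or >0 like strcoll(). If the requested locale
 * is not available, setlocale() fails and the current locale stays
 * in effect. */
int collate_in_locale(const char *locale_name, const char *a, const char *b)
{
    assert(locale_name != NULL && a != NULL && b != NULL);
    setlocale(LC_COLLATE, locale_name);
    return strcoll(a, b);
}
```

For example, `collate_in_locale("C", "B", "a")` is negative because 'B' (0x42) precedes 'a' (0x61) in byte order. As far as I know, passing "" as the locale name picks the setting up from LC_ALL/LANG on Unix but uses the system default on Windows, which would be consistent with the environment having no effect there.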
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)-lp_flags 0x01),)
Jim C. Nasby [EMAIL PROTECTED] writes:
> AFAIK they're not using subtransactions at all, but I'll check.

Well, yeah, they are ... else you'd never have seen this failure.

regards, tom lane
Re: [HACKERS] regression failures on Windows in machines with some non-English locales
Andrew Dunstan [EMAIL PROTECTED] writes:
> The simple solution seems to be to add --no-locale to the initdb args in pg_regress.sh.

Er ... what exactly does that do that setting LC_ALL=C doesn't?

regards, tom lane
Re: [HACKERS] regression failures on Windows in machines with some non-English
Tom Lane wrote:
>> The simple solution seems to be to add --no-locale to the initdb args in pg_regress.sh.
> Er ... what exactly does that do that setting LC_ALL=C doesn't?

Windows ignores locale environment variables, so you can't do that.

--
Regards
Petr Jelinek (PJMODOS)
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On Mon, 2005-10-31 at 16:10 -0800, Mark Wong wrote:
> Ok, sorry for the delay. I've bumped up the wal_buffers to 2048 and redid the disk layout. Here's where I'm at now:
>
> cvs 20051013 http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/40/ 3257 notpm
> cvs 20051013 w/ lw.patch http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/42/ 3285 notpm
>
> Still not much of a difference with the patch. A quick glance over the iostat data suggests I'm still not i/o bound, but the i/o wait is rather high according to vmstat. Will try to see if there's anything else obvious to get the load up higher.

OK, that's fine. I'm glad there's some gain, but not much yet. I think we should leave out doing any more tests on lw.patch for now.

Concerned about the awful checkpointing. Can you bump wal_buffers to 8192 just to make sure? That's way too high, but just to prove it.

We need to reduce the number of blocks to be written at checkpoint.

bgwriter_all_maxpages   5 - 15
bgwriter_all_percent    0.333
bgwriter_delay          200
bgwriter_lru_maxpages   5 - 7
bgwriter_lru_percent    1

shared_buffers set lower to 10 (which should cause some amusement on-list)

Best Regards, Simon Riggs
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Mag Gam wrote:
> Is this issue only on AIX 5.3 ML1 thru ML 3? Does the build work fine with 5.2 (ALL MLs)?

5.3 ML1 works but it is affected by the System include Bug mentioned in our AIX-FAQ. ML3 is supposed to fix that specific problem but breaks in another more difficult way as it seems ...

Stefan