[HACKERS] buffer assertion tripping under repeat pgbench load
I'm testing a checkout from a few days ago, trying to complete a day-long pgbench stress test with assertions and debugging on. I want to make sure the base code works as expected before moving on to testing checksums. It's crashing before finishing though. Here's a sample:

2012-12-20 20:36:11 EST [26613]: LOG: checkpoint complete: wrote 32 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.108 s, sync=0.978 s, total=4.187 s; sync files=7, longest=0.832 s, average=0.139 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1703)
2012-12-20 20:36:44 EST [26611]: LOG: server process (PID 4834) was terminated by signal 6: Aborted
2012-12-20 20:36:44 EST [26611]: DETAIL: Failed process was running: END;

Running the same test set again gave the same crash, so this looks repeatable. It's not trivial to trigger, given the average runtime to crash I'm seeing. Second sample:

2012-12-22 21:27:17 EST [6006]: LOG: checkpoint complete: wrote 34 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.310 s, sync=2.906 s, total=6.424 s; sync files=7, longest=2.648 s, average=0.415 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1703)
2012-12-22 21:27:23 EST [26611]: LOG: server process (PID 12143) was terminated by signal 6: Aborted
2012-12-22 21:27:23 EST [26611]: DETAIL: Failed process was running: END;

The important part:

TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1703)

That's the check done by "AtEOXact_Buffers - clean up at end of transaction". It fired while executing the "END;" statement at the end of the standard pgbench test. The check looks for buffers that weren't pinned/unpinned correctly by the transaction. I'm not sure whether getting a stack trace at that point will be useful. The ref count mistake could easily have been made by an earlier statement in the transaction block, right?
This is on a server with shared_buffers=2GB and checkpoint_segments=64. Test runs were automated with pgbench-tools, using the following configuration:

MAX_WORKERS="4"
SCRIPT="tpc-b.sql"
SCALES="500 1000"
SETCLIENTS="4 8 16 32 64 96"
SETTIMES=3
RUNTIME=600

All of the crashes so far have been at scale=500, and I haven't seen those finish yet. It failed 7 hours in the first time, then 4 hours in. For code reference, the last commit in the source tree I built against was:

af275a12dfeecd621ed9899a9382f26a68310263 Thu Dec 20 14:23:31 2012 +0200

Looking at recent activity, the only buffer pin-related change I saw was this one:

commit 62656617dbe49cca140f3da588a5e311b3fc35ea
Date: Mon Dec 3 18:53:31 2012 +
Avoid holding vmbuffer pin after VACUUM.

This is my first test like this against 9.3 development though, so the cause could be an earlier commit; I'm just starting with the most recent work as the first suspect. Next I think I'll try autovacuum=off and see if the crash goes away. Other ideas are welcome.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto seeding problem when ssl=on
On 2012-12-22 14:20:56 -0500, Tom Lane wrote: > Marko Kreen writes: > > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch wrote: > >> How about instead calling RAND_cleanup() after each backend fork? > > > Not "instead" - the gettimeofday() makes sense in any case. Considering > > that it's immediate problem only for pgcrypto, this patch has higher chance > > for appearing in back-branches. > > > If the _cleanup() makes next RAND_bytes() call RAND_poll() again? > > Then yes, it certainly makes sense as core patch. > > I thought some more about this, and I think there is a pretty strong > objection to this version of the patch: it *only* fixes the > lack-of-randomness problem for pgcrypto users. Any other third-party > code depending on the OpenSSL PRNG will have the same problem. > Furthermore, it's not even obvious that this patch fixes it for all > pgcrypto functions --- it probably is all right today, since I assume > Marko remembers all the connections in that code, but it would be easy > to miss the problem in future additions. > > I believe that we'd be better off doing something in postmaster.c to > positively ensure that each session has a distinct seed value. Notice > that BackendRun() already takes measures to ensure that's the case for > the regular libc random() function; it seems like a reasonable extension > to also worry about OpenSSL's PRNG. > > I'm not thrilled with the suggestion to do RAND_cleanup() after forking > though, as that seems like it'll guarantee that each session starts out > with only a minimal amount of entropy. What seems to me like it'd be > most secure is to make the postmaster do RAND_add() with a gettimeofday > result after each successful fork, say at the bottom of > BackendStartup(). That way the randomness accumulates in the parent > process, and there's no way to predict the initial state of any session > without exact knowledge of every child process launch since postmaster > start. 
I am entirely unconvinced that adding in a gettimeofday() provides more entropy than what OpenSSL gathers internally after a RAND_cleanup(). gettimeofday() will yield the same result in close-enough forks on a significant number of systems, whereas OpenSSL should use sources like /dev/urandom to initialize its PRNG after a cleanup. I think a better solution might be to use up some entropy in the postmaster *before* doing a fork and then mix in the pid + gettimeofday() after the fork. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] CommitFest 2012-11 Progress
On Sat, Dec 08, 2012 at 05:34:08PM +0100, Andres Freund wrote: > I made a pass through most of the commitfest entries to update their > state to something sensible and where it seemed necessary inquired the > newest status. > Not sure thats really welcome, but it was the only thing I could think > of helping to make some progress. Belated thanks for doing that.
Re: [HACKERS] Feature Request: pg_replication_master()
> Forgive me because I have been up for 28 hours on a 9.0 to 9.2 migration > with Hot Standby and PgPool-II for load balancing but I was excessively > irritated that I had to go into recovery.conf to configure things. I am > one of the software authors that breaking recovery.conf will cause > problems for. Break it anyway. Speaking as another vendor who has custom software currently designed to manage recovery.conf: I will *happily* rewrite that software. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] pgcrypto seeding problem when ssl=on
On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote: > I believe that we'd be better off doing something in postmaster.c to > positively ensure that each session has a distinct seed value. Notice > that BackendRun() already takes measures to ensure that's the case for > the regular libc random() function; it seems like a reasonable extension > to also worry about OpenSSL's PRNG. > #ifdef USE_SSL > if (EnableSSL) > { > struct timeval tv; > > gettimeofday(&tv, NULL); > RAND_add(&tv, sizeof(tv), 0); > } > #endif Take the caution one step further and make it independent of EnableSSL. In a stock installation, a !EnableSSL postmaster will never seed its PRNG, and there's no vulnerability. Add a shared_preload_libraries module that uses the OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again. Other than that, looks good. > We could perhaps also make this conditional on not EXEC_BACKEND, since > the whole issue is moot if backends are launched by fork/exec. True.
[HACKERS] Pg_upgrade faster, again!
I promised to research allowing parallel execution of schema dump/restore, so I have developed the attached patch, with dramatic results:

tables     git     patch
 1000     22.29    18.30
 2000     30.75    19.67
 4000     46.33    22.31
 8000     81.09    29.27
16000    145.43    40.12
32000    309.39    64.85
64000    754.62   108.76

These performance results are best-case because the test was run with the databases all the same size, and their number equal to the number of server cores. (Test script attached.) This uses fork/processes on Unix, and threads on Windows. I need someone to check my use of waitpid() on Unix, and I need compile and run testing on Windows. It basically adds a --jobs option, like pg_restore uses, to run multiple schema dumps/restores in parallel. I patterned this after the pg_restore pg_backup_archiver.c --jobs code. However, I found the pg_restore Windows code awkward because it puts everything in one struct array that has gaps for dead children. Because WaitForMultipleObjects() requires an array of thread handles with no gaps, the pg_restore code must build a temporary array for every call to WaitForMultipleObjects(). Instead, I created an array just for thread handles (rather than putting them in the same struct), and swapped entries into dead-child slots to avoid gaps --- this allows the thread handle array to be passed directly to WaitForMultipleObjects(). Do people like this approach? Should we do the same in pg_restore? I expect us to be doing more parallelism in other areas, so I would like to have a consistent approach. The only other optimization I can think of is to do parallel file copy per tablespace (in non-link mode).

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true.
diff --git a/contrib/pg_upgrade/Makefile b/contrib/pg_upgrade/Makefile
new file mode 100644
index dec57a6..bbb14a1
*** a/contrib/pg_upgrade/Makefile
--- b/contrib/pg_upgrade/Makefile
*** PGAPPICON = win32
*** 5,11 ****
  PROGRAM = pg_upgrade
  OBJS = check.o controldata.o dump.o exec.o file.o function.o info.o \
! 	option.o page.o pg_upgrade.o relfilenode.o server.o \
  	tablespace.o util.o version.o version_old_8_3.o $(WIN32RES)
  PG_CPPFLAGS = -DFRONTEND -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir)
--- 5,11 ----
  PROGRAM = pg_upgrade
  OBJS = check.o controldata.o dump.o exec.o file.o function.o info.o \
! 	option.o page.o parallel.o pg_upgrade.o relfilenode.o server.o \
  	tablespace.o util.o version.o version_old_8_3.o $(WIN32RES)
  PG_CPPFLAGS = -DFRONTEND -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir)
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index f35852b..a4b0127
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*** generate_old_dump(void)
*** 33,50 ****
  	/* create per-db dump files */
  	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		char		file_name[MAXPGPATH];
  		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
  		pg_log(PG_STATUS, "%s", old_db->db_name);
! 		snprintf(file_name, sizeof(file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
! 		exec_prog(RESTORE_LOG_FILE, NULL, true,
  				  "\"%s/pg_dump\" %s --schema-only --binary-upgrade --format=custom %s --file=\"%s\" \"%s\"",
  				  new_cluster.bindir, cluster_conn_opts(&old_cluster),
! 				  log_opts.verbose ? "--verbose" : "", file_name, old_db->db_name);
  	}
  	end_progress_output();
  	check_ok();
  }
--- 33,55 ----
  	/* create per-db dump files */
  	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		char		sql_file_name[MAXPGPATH],
! 					log_file_name[MAXPGPATH];
  		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
  		pg_log(PG_STATUS, "%s", old_db->db_name);
! 		snprintf(sql_file_name, sizeof(sql_file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
! 		snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
! 		parallel_exec_prog(log_file_name, NULL,
  				  "\"%s/pg_dump\" %s --schema-only --binary-upgrade --format=custom %s --file=\"%s\" \"%s\"",
  				  new_cluster.bindir, cluster_conn_opts(&old_cluster),
! 				  log_opts.verbose ? "--verbose" : "", sql_file_name, old_db->db_name);
  	}
+ 	/* reap all children */
+ 	while (reap_child(true) == true)
+ 		;
+ 
  	end_progress_output();
  	check_ok();
  }
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 19053fa..88686c5
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** parseCommandLine(int argc, char *argv[])
*** 52,57 ****
--- 52,58 ----
  		{"check", no_argument, NULL, 'c'},
  		{"link", no_argument, NULL, 'k'},
  		{"retain", no_argument, NULL, 'r'},
+ 		{"jobs",
Re: [HACKERS] Review of Row Level Security
Kohei KaiGai wrote: > RLS entry of wiki has not been updated for long time, I'll try to > update the entry for high-level design in a couple of days. Thanks, I think that is essential for a productive discussion of the issue. For me, it would help tremendously if you could provide a very short statement of the over-arching goal of the current development effort. As an example, I could summarize the SSI development as: "Ensure that the result of executing any set of successfully committed serializable transactions is the same as having run those transactions one at a time, without introducing any new blocking." Proceeding from a general goal statement like that, to general principles of how it will be achieved, before getting down to implementation details helps me put the details in proper context. I apologize again for coming in so late with strong opinions, but I thought I knew what "row level security" meant, and it was just a question of how to do it; I can't reconcile what I thought the feature was about with the patch I'm seeing. Perhaps it's just a lack of the high-level context that's making it difficult. -Kevin
Re: [HACKERS] pgcrypto seeding problem when ssl=on
Marko Kreen writes: > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch wrote: >> How about instead calling RAND_cleanup() after each backend fork? > Not "instead" - the gettimeofday() makes sense in any case. Considering > that it's immediate problem only for pgcrypto, this patch has higher chance > for appearing in back-branches. > If the _cleanup() makes next RAND_bytes() call RAND_poll() again? > Then yes, it certainly makes sense as core patch. I thought some more about this, and I think there is a pretty strong objection to this version of the patch: it *only* fixes the lack-of-randomness problem for pgcrypto users. Any other third-party code depending on the OpenSSL PRNG will have the same problem. Furthermore, it's not even obvious that this patch fixes it for all pgcrypto functions --- it probably is all right today, since I assume Marko remembers all the connections in that code, but it would be easy to miss the problem in future additions. I believe that we'd be better off doing something in postmaster.c to positively ensure that each session has a distinct seed value. Notice that BackendRun() already takes measures to ensure that's the case for the regular libc random() function; it seems like a reasonable extension to also worry about OpenSSL's PRNG. I'm not thrilled with the suggestion to do RAND_cleanup() after forking though, as that seems like it'll guarantee that each session starts out with only a minimal amount of entropy. What seems to me like it'd be most secure is to make the postmaster do RAND_add() with a gettimeofday result after each successful fork, say at the bottom of BackendStartup(). That way the randomness accumulates in the parent process, and there's no way to predict the initial state of any session without exact knowledge of every child process launch since postmaster start. 
So it'd go something like

#ifdef USE_SSL
	if (EnableSSL)
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		RAND_add(&tv, sizeof(tv), 0);
	}
#endif

We could perhaps also make this conditional on not EXEC_BACKEND, since the whole issue is moot if backends are launched by fork/exec. regards, tom lane
Re: [HACKERS] WIP: store additional info in GIN index
Hi! On Thu, Dec 6, 2012 at 5:44 AM, Tomas Vondra wrote:
> Then I've run a simple benchmarking script, and the results are not as
> good as I expected, actually I'm getting much worse performance than
> with the original GIN index.
>
> The following table contains the time of loading the data (not a big
> difference), and number of queries per minute for various numbers of
> words in the query.
>
> The queries look like this
>
> SELECT id FROM messages
> WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
>
> so it's really the simplest form of FTS query possible.
>
>           without patch | with patch
>
> loading   750 sec       | 770 sec
> 1 word    1500          | 1100
> 2 words   23000         | 9800
> 3 words   24000         | 9700
> 4 words   16000         | 7200
>
> I'm not saying this is a perfect benchmark, but the differences (of
> querying) are pretty huge. Not sure where this difference comes from,
> but it seems to be quite consistent (I usually get +-10% results, which
> is negligible considering the huge difference).
>
> Is this an expected behaviour that will be fixed by another patch?

Other patches which significantly accelerate index search will be provided. This patch changes only GIN posting list/tree storage; it wasn't expected that this patch would significantly change index scan speed in either direction.

> The database contains ~680k messages from the mailing list archives,
> i.e. about 900 MB of data (in the table), and the GIN index on tsvector
> is about 900MB too. So the whole dataset nicely fits into memory (8GB
> RAM), and it seems to be completely CPU bound (no I/O activity at all).
>
> The configuration was exactly the same in both cases:
>
> shared buffers = 1GB
> work mem = 64 MB
> maintenance work mem = 256 MB
>
> I can either upload the database somewhere, or provide the benchmarking
> script if needed.

Unfortunately, I can't reproduce such a huge slowdown on my test cases. Could you share both the database and the benchmarking script? -- With best regards, Alexander Korotkov.
Re: [HACKERS] foreign key locks
On 2012-12-22 10:53:47 +0100, Erik Rijkers wrote: > On Thu, November 29, 2012 17:16, Alvaro Herrera wrote: > > Here it is. > > > > fklocks-26.patch.gz > > This applies today after removing, not only the infamous catversion.h chunk, > but also a file_fdw > chunk. (It's not really a problem.) > > I also wondered if anyone has any perl/python/bash programs handy that uses > these new options. I > can hack something together myself, but I just thought I'd ask. I have mostly done code review and some very targeted testing (pgbench, gdb + psql). So no ready scripts, sorry. There are imo some unfinished pieces (the whole EPQ integration) that will require significant retesting once Alvaro has time to work on this again.. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks
On Thu, November 29, 2012 17:16, Alvaro Herrera wrote: > Here it is. > > fklocks-26.patch.gz This applies today after removing, not only the infamous catversion.h chunk, but also a file_fdw chunk. (It's not really a problem.) I also wondered if anyone has any perl/python/bash programs handy that uses these new options. I can hack something together myself, but I just thought I'd ask. Thanks, Erik Rijkers