[HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-22 Thread Greg Smith
I'm testing a checkout from a few days ago and trying to complete a day 
long pgbench stress test, with assertions and debugging on.  I want to 
make sure the base code works as expected before moving on to testing 
checksums.  It's crashing before finishing though.  Here's a sample:


2012-12-20 20:36:11 EST [26613]: LOG:  checkpoint complete: wrote 32 
buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=3.108 s, sync=0.978 s, total=4.187 s; sync files=7, longest=0.832 
s, average=0.139 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", 
Line: 1703)
2012-12-20 20:36:44 EST [26611]: LOG:  server process (PID 4834) was 
terminated by signal 6: Aborted

2012-12-20 20:36:44 EST [26611]: DETAIL:  Failed process was running: END;

Running the same test set again gave the same crash, so this looks 
repeatable.  It's not trivial to trigger given the average runtime to 
crash I'm seeing.  Second sample:


2012-12-22 21:27:17 EST [6006]: LOG:  checkpoint complete: wrote 34 
buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=3.310 s, sync=2.906 s, total=6.424 s; sync files=7, longest=2.648 
s, average=0.415 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", 
Line: 1703)
2012-12-22 21:27:23 EST [26611]: LOG:  server process (PID 12143) was 
terminated by signal 6: Aborted

2012-12-22 21:27:23 EST [26611]: DETAIL:  Failed process was running: END;

The important part:

TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", 
Line: 1703)


That's the check done by "AtEOXact_Buffers - clean up at end of 
transaction".  It fired while executing the "END;" statement at the end 
of the standard pgbench test.  The check looks for buffers that weren't 
pinned/unpinned correctly by the transaction.  I'm not sure whether 
getting a stack trace at that point would be useful.  The refcount 
mistake could easily have been made by an earlier statement in the 
transaction block, right?


This is on a server where shared_buffers=2GB, checkpoint_segments=64. 
Test runs were automated with pgbench-tools, using the following 
configuration:


MAX_WORKERS="4"
SCRIPT="tpc-b.sql"
SCALES="500 1000"
SETCLIENTS="4 8 16 32 64 96"
SETTIMES=3
RUNTIME=600

All of the crashes so far have been at scale=500, and I haven't seen a 
run at that scale finish yet.  The first failure was 7 hours in, the 
second 4 hours in.


For code reference, the last commit in the source code tree I built 
against was:


af275a12dfeecd621ed9899a9382f26a68310263
Thu Dec 20 14:23:31 2012 +0200

Looking at recent activity, the only buffer-pin-related change I saw 
was this one:


commit 62656617dbe49cca140f3da588a5e311b3fc35ea
Date:   Mon Dec 3 18:53:31 2012 +
Avoid holding vmbuffer pin after VACUUM.

This is my first test like this against 9.3 development though, so the 
cause could be an earlier commit.  I'm just starting with the most 
recent work as the first suspect.  Next I think I'll try autovacuum=off 
and see if the crash goes away.  Other ideas are welcome.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-22 Thread Andres Freund
On 2012-12-22 14:20:56 -0500, Tom Lane wrote:
> Marko Kreen  writes:
> > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch  wrote:
> >> How about instead calling RAND_cleanup() after each backend fork?
>
> > Not "instead" - the gettimeofday() makes sense in any case.  Considering
> > that it's an immediate problem only for pgcrypto, this patch has a higher
> > chance of appearing in back-branches.
>
> > If the _cleanup() makes next RAND_bytes() call RAND_poll() again?
> > Then yes, it certainly makes sense as core patch.
>
> I thought some more about this, and I think there is a pretty strong
> objection to this version of the patch: it *only* fixes the
> lack-of-randomness problem for pgcrypto users.  Any other third-party
> code depending on the OpenSSL PRNG will have the same problem.
> Furthermore, it's not even obvious that this patch fixes it for all
> pgcrypto functions --- it probably is all right today, since I assume
> Marko remembers all the connections in that code, but it would be easy
> to miss the problem in future additions.
>
> I believe that we'd be better off doing something in postmaster.c to
> positively ensure that each session has a distinct seed value.  Notice
> that BackendRun() already takes measures to ensure that's the case for
> the regular libc random() function; it seems like a reasonable extension
> to also worry about OpenSSL's PRNG.
>
> I'm not thrilled with the suggestion to do RAND_cleanup() after forking
> though, as that seems like it'll guarantee that each session starts out
> with only a minimal amount of entropy.  What seems to me like it'd be
> most secure is to make the postmaster do RAND_add() with a gettimeofday
> result after each successful fork, say at the bottom of
> BackendStartup().  That way the randomness accumulates in the parent
> process, and there's no way to predict the initial state of any session
> without exact knowledge of every child process launch since postmaster
> start.

I am entirely unconvinced that adding in a gettimeofday() provides more
entropy than what openssl gathers internally after a
RAND_cleanup(). gettimeofday() will yield the same result in close
enough forks on a significant number of systems whereas openssl should
use stuff like /dev/urandom to initialize its PRNG after a cleanup.

I think a better solution might be to use up some entropy in the
postmaster *before* doing a fork and then mix in the pid +
gettimeofday() after the fork.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CommitFest 2012-11 Progress

2012-12-22 Thread Noah Misch
On Sat, Dec 08, 2012 at 05:34:08PM +0100, Andres Freund wrote:
> I made a pass through most of the commitfest entries to update their
> state to something sensible and where it seemed necessary inquired the
> newest status.
> Not sure that's really welcome, but it was the only thing I could think
> of to help make some progress.

Belated thanks for doing that.




Re: [HACKERS] Feature Request: pg_replication_master()

2012-12-22 Thread Josh Berkus

> Forgive me because I have been up for 28 hours on a 9.0 to 9.2 migration
> with Hot Standby and PgPool-II for load balancing but I was excessively
> irritated that I had to go into recovery.conf to configure things. I am
> one of the software authors that breaking recovery.conf will cause
> problems for. Break it anyway.

Speaking as another vendor who has custom software currently designed to
manage recovery.conf: I will *happily* rewrite that software.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-22 Thread Noah Misch
On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:
> I believe that we'd be better off doing something in postmaster.c to
> positively ensure that each session has a distinct seed value.  Notice
> that BackendRun() already takes measures to ensure that's the case for
> the regular libc random() function; it seems like a reasonable extension
> to also worry about OpenSSL's PRNG.

> #ifdef USE_SSL
>   if (EnableSSL)
>   {
>   struct timeval tv;
> 
>   gettimeofday(&tv, NULL);
>   RAND_add(&tv, sizeof(tv), 0);
>   }
> #endif

Take the caution one step further and make it independent of EnableSSL.  In a
stock installation, a !EnableSSL postmaster will never seed its PRNG, and
there's no vulnerability.  Add a shared_preload_libraries module that uses the
OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again.

Other than that, looks good.
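The hunk with that adjustment would look something like this (a sketch: 
the EnableSSL test dropped, the RAND_add() call otherwise unchanged):

```c
#ifdef USE_SSL
	{
		struct timeval tv;

		/*
		 * Seed even when !EnableSSL: a shared_preload_libraries
		 * module could still use the OpenSSL PRNG from _PG_init().
		 */
		gettimeofday(&tv, NULL);
		RAND_add(&tv, sizeof(tv), 0);
	}
#endif
```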

> We could perhaps also make this conditional on not EXEC_BACKEND, since
> the whole issue is moot if backends are launched by fork/exec.

True.




[HACKERS] Pg_upgrade faster, again!

2012-12-22 Thread Bruce Momjian
I promised to research allowing parallel execution of schema
dump/restore, so I have developed the attached patch, with dramatic
results:

tables      git     patch
  1000     22.29     18.30
  2000     30.75     19.67
  4000     46.33     22.31
  8000     81.09     29.27
 16000    145.43     40.12
 32000    309.39     64.85
 64000    754.62    108.76

These performance results are best-case because the databases were all 
the same size, and their number was equal to the number of server cores. 
(Test script attached.)

This uses fork/processes on Unix, and threads on Windows.  I need
someone to check my use of waitpid() on Unix, and I need code compile
and run testing on Windows.

It basically adds a --jobs option, like pg_restore uses, to run multiple
schema dumps/restores in parallel.   I patterned this after the
pg_restore pg_backup_archiver.c --jobs code.  However, I found the
pg_restore Windows code awkward because it puts everything in one struct
array that has gaps for dead children.  Because WaitForMultipleObjects()
requires an array of thread handles with no gaps, the pg_restore code
must make a temporary array for every call to WaitForMultipleObjects(). 

Instead, I created an array just for thread handles (rather than putting
it in the same struct), and swapped entries into dead child slots to
avoid gaps --- this allows the thread handle array to be passed directly
to WaitForMultipleObjects().  

Do people like this approach?  Should we do the same in pg_restore?  I 
expect us to be doing more parallelism in other areas, so I would like 
to have a consistent approach.

The only other optimization I can think of is to do parallel file copy
per tablespace (in non-link mode).

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/Makefile b/contrib/pg_upgrade/Makefile
new file mode 100644
index dec57a6..bbb14a1
*** a/contrib/pg_upgrade/Makefile
--- b/contrib/pg_upgrade/Makefile
*** PGAPPICON = win32
*** 5,11 
  
  PROGRAM  = pg_upgrade
  OBJS = check.o controldata.o dump.o exec.o file.o function.o info.o \
!option.o page.o pg_upgrade.o relfilenode.o server.o \
 tablespace.o util.o version.o version_old_8_3.o $(WIN32RES)
  
  PG_CPPFLAGS  = -DFRONTEND -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir)
--- 5,11 
  
  PROGRAM  = pg_upgrade
  OBJS = check.o controldata.o dump.o exec.o file.o function.o info.o \
!option.o page.o parallel.o pg_upgrade.o relfilenode.o server.o \
 tablespace.o util.o version.o version_old_8_3.o $(WIN32RES)
  
  PG_CPPFLAGS  = -DFRONTEND -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir)
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index f35852b..a4b0127
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*** generate_old_dump(void)
*** 33,50 
   	/* create per-db dump files */
  	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		char 		file_name[MAXPGPATH];
  		DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
  
  		pg_log(PG_STATUS, "%s", old_db->db_name);
! 		snprintf(file_name, sizeof(file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
  
! 		exec_prog(RESTORE_LOG_FILE, NULL, true,
    "\"%s/pg_dump\" %s --schema-only --binary-upgrade --format=custom %s --file=\"%s\" \"%s\"",
    new_cluster.bindir, cluster_conn_opts(&old_cluster),
!   log_opts.verbose ? "--verbose" : "", file_name, old_db->db_name);
  	}
  
  	end_progress_output();
  	check_ok();
  }
--- 33,55 
   	/* create per-db dump files */
  	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		char 		sql_file_name[MAXPGPATH], log_file_name[MAXPGPATH];
  		DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
  
  		pg_log(PG_STATUS, "%s", old_db->db_name);
! 		snprintf(sql_file_name, sizeof(sql_file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
! 		snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
  
! 		parallel_exec_prog(log_file_name, NULL,
    "\"%s/pg_dump\" %s --schema-only --binary-upgrade --format=custom %s --file=\"%s\" \"%s\"",
    new_cluster.bindir, cluster_conn_opts(&old_cluster),
!   log_opts.verbose ? "--verbose" : "", sql_file_name, old_db->db_name);
  	}
  
+ 	/* reap all children */
+ 	while (reap_child(true) == true)
+ 		;
+ 	
  	end_progress_output();
  	check_ok();
  }
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 19053fa..88686c5
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** parseCommandLine(int argc, char *argv[])
*** 52,57 
--- 52,58 
  		{"check", no_argument, NULL, 'c'},
  		{"link", no_argument, NULL, 'k'},
  		{"retain", no_argument, NULL, 'r'},
+ 		{"jobs", 

Re: [HACKERS] Review of Row Level Security

2012-12-22 Thread Kevin Grittner
Kohei KaiGai wrote:

> RLS entry of wiki has not been updated for long time, I'll try to
> update the entry for high-level design in a couple of days.

Thanks, I think that is essential for a productive discussion of
the issue.

For me, it would help tremendously if you could provide a very
short statement of the over-arching goal of the current development
effort. As an example, I could summarize the SSI development as:

"Ensure that the result of executing any set of successfully
committed serializable transactions is the same as having run those
transactions one at a time, without introducing any new blocking."

Proceeding from a general goal statement like that, to general
principles of how it will be achieved before getting down to
implementation details helps me put the details in proper context.

I apologize again for coming in so late with strong opinions, but I 
thought I knew what "row level security" meant, and that it was just a 
question of how to do it; yet I can't reconcile what I thought the 
feature was about with the patch I'm seeing.  Perhaps it's just a lack 
of the high-level context that's making it difficult.

-Kevin 




Re: [HACKERS] pgcrypto seeding problem when ssl=on

2012-12-22 Thread Tom Lane
Marko Kreen  writes:
> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch  wrote:
>> How about instead calling RAND_cleanup() after each backend fork?

> Not "instead" - the gettimeofday() makes sense in any case.  Considering
> that it's an immediate problem only for pgcrypto, this patch has a higher
> chance of appearing in back-branches.

> If the _cleanup() makes next RAND_bytes() call RAND_poll() again?
> Then yes, it certainly makes sense as core patch.

I thought some more about this, and I think there is a pretty strong
objection to this version of the patch: it *only* fixes the
lack-of-randomness problem for pgcrypto users.  Any other third-party
code depending on the OpenSSL PRNG will have the same problem.
Furthermore, it's not even obvious that this patch fixes it for all
pgcrypto functions --- it probably is all right today, since I assume
Marko remembers all the connections in that code, but it would be easy
to miss the problem in future additions.

I believe that we'd be better off doing something in postmaster.c to
positively ensure that each session has a distinct seed value.  Notice
that BackendRun() already takes measures to ensure that's the case for
the regular libc random() function; it seems like a reasonable extension
to also worry about OpenSSL's PRNG.

I'm not thrilled with the suggestion to do RAND_cleanup() after forking
though, as that seems like it'll guarantee that each session starts out
with only a minimal amount of entropy.  What seems to me like it'd be
most secure is to make the postmaster do RAND_add() with a gettimeofday
result after each successful fork, say at the bottom of
BackendStartup().  That way the randomness accumulates in the parent
process, and there's no way to predict the initial state of any session
without exact knowledge of every child process launch since postmaster
start.  So it'd go something like

#ifdef USE_SSL
if (EnableSSL)
{
struct timeval tv;

gettimeofday(&tv, NULL);
RAND_add(&tv, sizeof(tv), 0);
}
#endif

We could perhaps also make this conditional on not EXEC_BACKEND, since
the whole issue is moot if backends are launched by fork/exec.

regards, tom lane




Re: [HACKERS] WIP: store additional info in GIN index

2012-12-22 Thread Alexander Korotkov
Hi!

On Thu, Dec 6, 2012 at 5:44 AM, Tomas Vondra  wrote:

> Then I've run a simple benchmarking script, and the results are not as
> good as I expected, actually I'm getting much worse performance than
> with the original GIN index.
>
> The following table contains the time of loading the data (not a big
> difference), and number of queries per minute for various number of
> words in the query.
>
> The queries looks like this
>
> SELECT id FROM messages
>  WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
>
> so it's really the simplest form of FTS query possible.
>
>             without patch | with patch
> ---------------------------------------
> loading      750 sec      | 770 sec
> 1 word       1500         | 1100
> 2 words      23000        | 9800
> 3 words      24000        | 9700
> 4 words      16000        | 7200
> 
>
> I'm not saying this is a perfect benchmark, but the differences (of
> querying) are pretty huge. Not sure where this difference comes from,
> but it seems to be quite consistent (I usually get +-10% results, which
> is negligible considering the huge difference).
>
> Is this an expected behaviour that will be fixed by another patch?
>

Other patches that significantly accelerate index search will be
provided. This patch changes only the storage of GIN posting
lists/trees; it wasn't expected to change index scan speed significantly
in either direction.

> The database contains ~680k messages from the mailing list archives,
> i.e. about 900 MB of data (in the table), and the GIN index on tsvector
> is about 900MB too. So the whole dataset nicely fits into memory (8GB
> RAM), and it seems to be completely CPU bound (no I/O activity at all).
>
> The configuration was exactly the same in both cases
>
> shared buffers = 1GB
> work mem = 64 MB
> maintenance work mem = 256 MB
>
> I can either upload the database somewhere, or provide the benchmarking
> script if needed.


Unfortunately, I can't reproduce such huge slowdown on my testcases. Could
you share both database and benchmarking script?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] foreign key locks

2012-12-22 Thread Andres Freund
On 2012-12-22 10:53:47 +0100, Erik Rijkers wrote:
> On Thu, November 29, 2012 17:16, Alvaro Herrera wrote:
> > Here it is.
> >
> >   fklocks-26.patch.gz
>
> This applies today after removing not only the infamous catversion.h
> chunk but also a file_fdw chunk.  (It's not really a problem.)
>
> I also wondered if anyone has any perl/python/bash programs handy that
> use these new options.  I can hack something together myself, but I
> just thought I'd ask.

I have mostly done code review and some very targeted testing (pgbench,
gdb + psql), so no ready scripts, sorry.
There are imo some unfinished pieces (the whole EPQ integration) that
will require significant retesting once Alvaro has time to work on this
again.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] foreign key locks

2012-12-22 Thread Erik Rijkers
On Thu, November 29, 2012 17:16, Alvaro Herrera wrote:
> Here it is.
>
>   fklocks-26.patch.gz

This applies today after removing not only the infamous catversion.h 
chunk but also a file_fdw chunk.  (It's not really a problem.)

I also wondered if anyone has any perl/python/bash programs handy that 
use these new options.  I can hack something together myself, but I 
just thought I'd ask.


Thanks,


Erik Rijkers



