Re: [HACKERS] Notes on implementing URI syntax for libpq

2011-11-28 Thread Greg Smith

On 11/24/2011 05:21 AM, Alvaro Herrera wrote:

A coworker also suggested using a different designator:

postgresqli:///path/to/socket:5433/database
postgresqli://:5433/database


This is not unprecedented.  An example is how CUPS handles this problem 
when connecting printers using URIs:  
http://www.cups.org/documentation.php/network.html where you might see 
this for the usual port:


lpd://ip-address-or-hostname/queue

And this for AppSocket AKA JetDirect:

socket://ip-address-or-hostname

I am certainly not going to defend printing setup with CUPS as a model 
worth emulating, just noting the similarity here.  I think we'll save 
miles of user headaches if there's only one designator.





Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-27 Thread Greg Smith

On 11/27/2011 04:39 PM, Ants Aasma wrote:

On the AMD I saw about 3% performance drop with timing enabled. On the
Intel machine I couldn't measure any statistically significant change.


Oh no, it's party pooper time again.  Sorry I have to be the one to do 
it this round.  The real problem with this whole area is that we know 
there are systems floating around where the amount of time taken to grab 
timestamps like this is just terrible.  I've been annoyed enough by that 
problem to spend some time digging into why that is--seems to be a bunch 
of trivia around the multiple ways to collect time info on x86 
systems--and after this CommitFest is over I was already hoping to dig 
through my notes and start quantifying that more.  So you can't really 
prove the overhead of this approach is acceptable just by showing two 
examples; we need to find one of the really terrible clocks and test 
there to get a real feel for the worst-case.


I recall a patch similar to this one was submitted by Greg Stark some 
time ago.  It used the info for different reasons--to try and figure out 
whether reads were cached or not--but I believe it withered rather than 
being implemented mainly because it ran into the same fundamental 
roadblocks seen here.  My memory could be wrong; there were also 
concerns about what the data would be used for.


I've been thinking about a few ways to try and cope with this whole 
class of timing problem:


-Document the underlying problem and known workarounds, provide a way to 
test how bad the overhead is, and just throw our hands up and say 
"sorry, you just can't instrument like this" if someone has a slow system.


-Have one of the PostgreSQL background processes keep track of a time 
estimate on its own, only periodically pausing to sync against the real 
time.  Then most calls to gettimeofday() can use that value instead.  I 
was thinking of that idea for slightly longer running things though; I 
doubt that can be made accurate enough to instrument individual buffer 
I/O like this.

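To make that second idea concrete, here's a minimal sketch of the 
cached-clock approach, using an ordinary thread in place of a real 
background process and ignoring atomicity details (names made up purely 
for illustration):

/* Sketch: a coarse time estimate refreshed periodically by a helper,
 * read cheaply everywhere else instead of calling gettimeofday().
 * Build with "gcc -O2 cached_clock.c -lpthread".
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile int64_t cached_time_us;	/* the periodically synced estimate */

static void *
clock_updater(void *arg)
{
	struct timeval tv;

	for (;;)
	{
		gettimeofday(&tv, NULL);
		cached_time_us = (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
		usleep(10000);			/* re-sync against the real clock every 10ms */
	}
	return NULL;
}

int
main(void)
{
	pthread_t	updater;

	pthread_create(&updater, NULL, clock_updater, NULL);
	usleep(20000);				/* give the first update a chance to land */
	/* callers read the estimate instead of grabbing a fresh timestamp */
	printf("approximate time: %lld us\n", (long long) cached_time_us);
	return 0;
}

The error in any reading is bounded by the refresh interval, which is 
why I only trust this for longer-running things.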

And while I hate to kick off massive bike-shedding in your direction, 
I'm also afraid this area--collecting stats about how long individual 
operations take--will need a much wider-ranging approach than just 
looking at the buffer cache ones.  If you step back and ask "what do 
people expect here?", there's a pretty large number of people who really 
want something like Oracle's v$session_wait and v$system_event interface 
for finding the underlying source of slow things.  There's enough demand 
for that that EnterpriseDB has even done some work in this area too; 
what I've been told about it suggests the code isn't a great fit for 
contribution to community PostgreSQL though.  Like I said, this area is 
really messy and hard to get right.


Something more ambitious like the v$ stuff would also take care of what 
you're doing here; I'm not sure that what you've done helps build it 
though.  Please don't take that personally.  Part of one of my own 
instrumentation patches was recently rejected out of hand for the same 
reason--just not being general enough.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] logging in high performance systems.

2011-11-23 Thread Greg Smith

On 11/23/2011 09:28 PM, Theo Schlossnagle wrote:

The second thing I did was write a sample use of those hooks to
implement a completely non-blocking fifo logger. (if it would block,
it drops the log line).  The concept is that we could run this without
risk of negative performance impact due to slow log reading (choosing
to drop logs in lieu of pausing).  And a simple process could be
written to consume from the fifo.


This was one of the topics at the last developer's meeting, which you might 
not have seen go by:  
http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting#Improving_Logging  
There was a reference to a pipe-based implementation from Magnus that I 
haven't gotten a chance to track down yet.  I think this area is going 
to start hitting a lot more people in the coming couple of years, 
since I'm seeing it increasingly at two customers I consider "canary in 
a coal mine" sentinels for performance issues.


I'm now roughly considering three types of users here:

-Don't care about the overhead of logging, but are sick of parsing text 
files.  Would prefer the data be in a table instead.
-Concerned enough about overhead that statement-level logging is 
impractical no matter whether it goes to a log file or a table, but can 
cope with logging for other things.
-Logging rate can burst high enough that messages must start being 
dropped no matter where they go.

Before making a big change, log file vs. table needs to be carefully 
explored to figure out which of the two approaches has more reasonable 
behavior/performance trade-offs.


I've been trying to attack this starting at the middle, with the 
pg_stat_statements rework Peter here did for the current CommitFest.  If 
you've already worked out a way to simulate heavy logging as part of 
what you've done here, I'd be quite interested to hear how capable you 
feel it is for the class of problem you're seeing.  I've always assumed 
that pushing the most common queries into shared memory and only showing 
them on demand, rather than logging them line at a time, could be a big 
win for some places.  We're still a bit light on benchmarks proving that 
is the case so far though.


My assumption has been that eventually a lossy logger was going to be 
necessary for busier sites; I just haven't been suffering from one 
enough to hack on it yet.  If it's possible to work this out in enough 
detail to figure out where the hooks go, and to prove they work with at 
least one consumer of them, I'd consider that a really useful thing to 
try and squeeze into 9.2.  The processing parts can always be further 
improved later based on production feedback, going along with my recent 
theme of letting extensions that poke and probe existing hooks be one 
place to brew next-version features.
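
To sketch the sort of consumer I mean--and this is purely illustrative, 
with the hook wiring left out and every name here made up--the 
drop-rather-than-block behavior you describe comes down to a 
non-blocking write:

/* Sketch of a lossy log consumer: write each formatted line to a fifo
 * opened O_NONBLOCK, and drop the line rather than stall if the fifo is
 * full or has no reader.  Only the drop-on-EAGAIN idea is the point.
 */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int	log_fd = -1;
static long	dropped_lines = 0;

static void
lossy_log_open(const char *fifo_path)
{
	/* O_NONBLOCK also keeps the open itself from hanging on a fifo */
	log_fd = open(fifo_path, O_WRONLY | O_NONBLOCK);
}

static void
lossy_log_emit(const char *line)
{
	if (log_fd < 0)
		return;
	if (write(log_fd, line, strlen(line)) < 0 &&
		(errno == EAGAIN || errno == EWOULDBLOCK))
		dropped_lines++;		/* reader too slow: drop instead of pausing */
}

int
main(void)
{
	lossy_log_open("/tmp/pglog.fifo");		/* hypothetical fifo path */
	lossy_log_emit("LOG:  example line\n");
	return 0;
}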


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-21 Thread Greg Smith

On 11/21/2011 11:40 AM, Bruce Momjian wrote:

I think a question is how often people are waiting for features that
actually can be addressed in a contrib/plugin way.  My gut feeling is
that most missing features have to be added to the server core (e.g.
index-only scans) and are not possible to add in a contrib/plugin way.
   


Good question; let's talk about 9.0.  We were building/distributing 
three things for that version that poked into the server:


-Replication monitoring tools that slipped from the 9.0 schedule, 
similar to what became pg_stat_replication in 9.1

-An early version of what became hot_standby_feedback in 9.1.
-pg_streamrecv

While these weren't all packaged as extensions per se, all of them used 
the PGXS interface.  And they all gave early adopters deployment-blocking 
features before those features were available in core, in some 
cases after the issues they addressed had been encountered in production 
deployments.  As I was ranting on my blog recently, I'm seeing more 
complaints about monitoring and management features--exactly 
the sort of thing that you can improve as an extension, and that the 
extensions I've proposed provide--than I am over missing big features.


Index-only scans are a good example, as one of the most requested 
performance features that can only come from core (I'd put them at #2 
behind materialized views for the customers I talk to).  I wouldn't bet that 
they are considered more important by a typical deployment than good 
built-in query profiling though.  I get complaints about query 
monitoring from every single PostgreSQL install, while complaints about 
not having index-only scans only come from the bigger installations.  
Note how demand is high enough that we have two pg_stat_statements 
replacements submitted right now.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-19 Thread Greg Smith

On 11/18/2011 09:35 AM, Tom Lane wrote:

Subdividing/rearranging contrib makes the packager's
life more complicated, *and* makes his users' lives more complicated,
if only because things aren't where they were before.  It seems unlikely
to happen, at least in the near term.
   


Users are visibly suffering from the current packaging.  Production DBAs 
are afraid to install contrib because it's described as untrustworthy.  
They are hit by emergencies that the inspection tools would help with, 
but cannot get contrib installed easily without root permissions.  They 
have performance issues that the contrib modules I'm trying to relocate 
into the server package would help with, but company policies around 
post-deployment installation mean they cannot use them.  These modules 
have to always be installed to make this class of problem go away.


If you feel there are more users who would be negatively impacted by 
some directories moving than what I'm describing above, we have a very 
fundamental disagreement here.  The status quo for all of these 
extensions is widely understood to be unusable in common corporate 
environments.  Packagers should be trying to improve the PostgreSQL 
experience, and I'm trying to help with that.


In the face of pushback from packagers, I can roll my own packages that 
are designed without this problem; I'm being pushed into doing that 
instead of working on community PostgreSQL already.  But I'd really 
prefer to see this very common problem identified as so important it 
should get fixed everywhere instead.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-19 Thread Greg Smith

On 11/18/2011 03:36 PM, Josh Berkus wrote:

Of course, packagers may then reasonably ask why that code is not just
part of the core?
   


Let me step back from the implementation ideas for a minute and talk 
about this, and how it ties into release philosophy.  The extensions 
infrastructure for PostgreSQL is one of its strongest features.  We can 
use it as a competitive advantage over other databases, one that can 
make this database stable and continuously innovating at the same time.  
But that's not happening enough yet; I feel this change is a major step 
in that direction.  There's no better demonstration that extensions are 
edible dog food than the core database visibly eating a can of them itself.


To see why this matters so much, let's compare two stereotypes of 
PostgreSQL users at different extremes of upgrade tolerance.  First we 
have the classic enterprise DBA.  Relative to this person's 
expectations, PostgreSQL releases are way too fast.  They can't upgrade 
their database every year; that's madness.  This is the person who we 
yell at about how they should upgrade to the latest minor point release, 
because once they have a working system they touch *nothing*.  For this 
user, the long beta period of new PostgreSQL releases, and its general 
conservative development model, are key components to PostgreSQL being 
suitable for them.


At the other extreme, we have the software developer with a frantic 
development/release schedule, the one who's running the latest stable 
version of every tool they use.  This person expects some bugs in them, 
and the first reaction to running into one is to ask "is this fixed in 
the next version?"  You'll find at least one component in their stack 
that's labeled "compiled from the latest checkout" because that was the 
only way to get a working version.  And to them, the yearly release 
cycle of PostgreSQL is glacial.  When they run into a limitation that is 
impacting a user base that's doubling every few months, they can't wait 
a year for a fix; they could easily go out of business by then.


The key to satisfying both these extremes at once is a strong extensions 
infrastructure, led by the example of serious tools that are provided 
that way in the PostgreSQL core.  For this to catch on, we need the 
classic DBAs to trust core extensions enough to load them into 
production.  They don't do that now because the current contrib 
description sounds too scary, and they may not even have loaded that 
package onto the server.  And we need people who want more frequent 
database core changes to see that extensions are a viable way to build 
some pretty extensive server hacks.


I've submitted two changes to this CommitFest that enhance 
features in this "core extensions" set.  Right now I have multiple 
customers who are desperate for both of those features.  With 
extensions, I can give them changes that solve their immediate crisis 
right now, almost a full year before they could possibly appear in a 
proper release of PostgreSQL.  And then I can push those back toward 
community PostgreSQL, with any luck landing in the next major version.  
Immediate gratification for the person funding development, and peer 
reviewed code that goes through a long beta and release cycle.  That's 
the vision I have for a PostgreSQL that is simultaneously stable and 
agile.  The easiest way to get there is to lead by example--by having 
extensions that provide necessary, visible components to core, while 
still being obviously separate components.  That's the best approach for 
proving this development model works and is suitable for everyone.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-18 Thread Greg Smith

On 11/17/2011 03:18 PM, Peter Eisentraut wrote:

Who's to say that after this, the core extensions won't end up in a new
separate package postgresql-extensions (or similar) or might even stay
in postgresql-contrib, for compatibility?
   


I don't know why packagers would make an active decision that will make 
their lives more difficult, with no benefit to them and a regression 
against project recommendations for their users.  The last thing anyone 
packaging PostgreSQL wants is more packages to deal with; there are 
already too many.  Each of the current sub-packages has a legitimate 
technical or distribution standard reason to exist--guidelines like 
"break out client and server components" or "minimize the package 
dependencies for the main server".  I can't think of any good reason 
that would inspire the sort of drift you're concerned about.


There's little compatibility argument beyond consistency with the 
previous packages.  The reason why this is suddenly feasible now is that 
the under-the-hood changes are almost all hidden by CREATE EXTENSION.


And if some packagers wanted to wander this way, they'd end up having to 
maintain a doc patch to address the fact that they've broken with project 
recommendations.  This text in what I submitted will no longer be true:


"This appendix contains information regarding core extensions that are 
built and included with a standard installation of PostgreSQL."


One of the reasons I picked the name I did was to contrast with the 
existing description of contrib:


"porting tools, analysis utilities, and plug-in features that are not 
part of the core PostgreSQL system, mainly because they address a 
limited audience or are too experimental to be part of the main source 
tree."


That says it's perfectly fine to make these optional in another 
package--they're not "part of the core".  That scary wording is 
practically telling packagers to separate them, so it's easy to keep the 
experimental stuff away from the production quality components.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




[HACKERS] CommitFest 2011-11 Started

2011-11-16 Thread Greg Smith
The November CommitFest is now closed for new entries.  We have 30 
patches in the queue that are still looking for a reviewer at this 
point, out of a total of 53.  If you'd like to review a patch but are 
looking for a suggestion as to which to choose, e-mail the 
pgsql-rrreviewers list saying so.


4 patches have already received some early review and are waiting for 
updates from the author:


Allow toast tables to be moved to a different tablespace
Online base backup from the hot-standby
Separate pg_stat_activity current_query into state and query columns
Allow substitute allocator for PGresult.

And we have 5 earlier submissions that have been flagged ready for a 
committer to look at them; names here are the expected committer when 
one has been mentioned already:


Non-inheritable check constraints (Greg Stark)
pg_last_xact_insert_timestamp (Simon)
Add Support for building with Visual Studio 2010 (Magnus)
plperl verify utf8 strings
xsubpp from cpan

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Displaying accumulated autovacuum cost

2011-11-15 Thread Greg Smith

On 10/05/2011 03:02 AM, Greg Smith wrote:
Presumably you meant to ask if this makes sense to show when cost 
accounting isn't enabled, because the code doesn't do that right now.  
No cost accounting, no buffer usage/write rate data as this was 
submitted.


This is done in the attached update.  I just made the page accounting 
happen all the time, regardless of whether the costs were being 
accumulated.  Added a read rate too, which is how fast reads happened 
from the OS cache to shared_buffers.  Simple test case generates a 600MB 
pgbench_accounts database and wipes out enough to take a while to clean 
up; it needs log_autovacuum_min_duration  = 0 and then:


$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -d pgbench -c "delete from pgbench_accounts where aid<20"

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index 
scans: 1

pages: 0 removed, 16394 remain
tuples: 19 removed, 640011 remain
buffer usage: 13742 hits, 2708 misses, 1058 dirtied
avg read rate: 3.067 MiB/s, avg write rate: 1.198 MiB/s
system usage: CPU 0.05s/0.61u sec elapsed 6.89 sec
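
To spell out where those rates come from:  they're just the page counts 
converted to bytes over the elapsed time.  With the default 8K block 
size, 2708 misses * 8192 bytes is about 21.2 MiB read in 6.89 seconds, 
or roughly 3.07 MiB/s; 1058 dirtied pages * 8192 bytes is about 8.3 MiB 
written, or roughly 1.2 MiB/s.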

Now that you mention it, people who do a manual, full-speed VACUUM 
would certainly appreciate some feedback on the rate it ran at.


This is more of a pain because this whole code path is only active when 
IsAutoVacuumWorkerProcess.  I have some larger refactoring in mind to 
perhaps make that more feasible.  I didn't want to hold up this update, 
aimed at the more valuable autovac case, for that though; it can always 
be layered on later.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f42504c..6ef85dd 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 
--- 214,222 
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 38deddc..c59fceb 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 154,160 
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
--- 154,163 
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
!  	long		secs;
!  	int			usecs;
!  	double		read_rate, write_rate;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 166,173 
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
! 			starttime = GetCurrentTimestamp();
  	}
  
  	if (vacstmt->options & VACOPT_VERBOSE)
--- 169,175 
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		starttime = GetCurrentTimestamp();
  	}
  
  	if (vacstmt->options & VACOPT_VERBOSE)
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 262,274 
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  	   Log_autovacuum_min_duration))
  			ereport(LOG,
  	(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
  			"pages: %d removed, %d remain\n"
  			"tuples: %.0f removed, %.0f remain\n"
  			"system usage: %s",
  			get_database_name(MyDatabaseId),
  			get_namespace_name(RelationGetNamespace(onerel)),
--- 264,290 
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  	   Log_autovacuum_min_duration))
+ 		{
+ 			TimestampDifference(starttime, endtime, &secs, &usecs);
+ 			read_rate = 0;
+ 			write_rate = 0;
+ 			if ((secs > 0) || (usecs > 0))
+ 			{
+ read_rate = (double) BLCKSZ * VacuumPageMiss / (1024 * 1024) /
+ 	(secs + usecs / 1000000.0);
+ write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
+  	(secs + usecs / 1000000.0);
+ 			}
  			ereport(LOG,
  	(errmsg("automatic vacuum of tab

[HACKERS] includeifexists in configuration file

2011-11-15 Thread Greg Smith
By recent popular request in the ongoing discussion saga around merging 
the recovery.conf, I've added an "includeifexists" directive to the 
postgresql.conf in the attached patch.  Demo:


$ tail -n 1 $PGDATA/postgresql.conf
include 'missing.conf'
$ pg_ctl start -l $PGLOG
server starting
$ tail -n 2 $PGLOG
LOG:  could not open configuration file 
"/home/gsmith/pgwork/data/include-exists/missing.conf": No such file or 
directory
FATAL:  configuration file 
"/home/gsmith/pgwork/data/include-exists/postgresql.conf" contains errors

$ vi $PGDATA/postgresql.conf
$ tail -n 1 $PGDATA/postgresql.conf
includeifexists 'missing.conf'
$ pg_ctl start -l $PGLOG
server starting
$ tail -n 3 $PGLOG
LOG:  database system was shut down at 2011-11-16 00:17:36 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

There might be a cleaner way to write this that eliminates some of the 
cut and paste duplication between this and the regular include 
directive.  I'm short on clever but full of brute force tonight.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e628f..da45ac1 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include 'filename'
*** 91,96 
--- 91,107 
  
 
  
+  includeifexists
+  in configuration file
+ 
+ Use the same approach as the include directive, continuing
+ normally if the file does not exist.  A regular include
+ will stop with an error if the referenced file is missing, while
+ includeifexists does not.
+
+ 
+
+ 
   SIGHUP
  
  The configuration file is reread whenever the main server process receives a
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index a094c7a..6f26421 100644
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
*** ProcessConfigFile(GucContext context)
*** 129,135 
  	/* Parse the file into a list of option names and values */
  	head = tail = NULL;
  
! 	if (!ParseConfigFile(ConfigFileName, NULL, 0, elevel, &head, &tail))
  	{
  		/* Syntax error(s) detected in the file, so bail out */
  		error = true;
--- 129,135 
  	/* Parse the file into a list of option names and values */
  	head = tail = NULL;
  
! 	if (!ParseConfigFile(ConfigFileName, NULL, true, 0, elevel, &head, &tail))
  	{
  		/* Syntax error(s) detected in the file, so bail out */
  		error = true;
*** ProcessConfigFile(GucContext context)
*** 363,369 
   * and absolute-ifying the path name if necessary.
   */
  bool
! ParseConfigFile(const char *config_file, const char *calling_file,
  int depth, int elevel,
  ConfigVariable **head_p,
  ConfigVariable **tail_p)
--- 363,369 
   * and absolute-ifying the path name if necessary.
   */
  bool
! ParseConfigFile(const char *config_file, const char *calling_file, bool strict,
  int depth, int elevel,
  ConfigVariable **head_p,
  ConfigVariable **tail_p)
*** ParseConfigFile(const char *config_file,
*** 414,424 
  	fp = AllocateFile(config_file, "r");
  	if (!fp)
  	{
! 		ereport(elevel,
! (errcode_for_file_access(),
!  errmsg("could not open configuration file \"%s\": %m",
! 		config_file)));
! 		return false;
  	}
  
  	OK = ParseConfigFp(fp, config_file, depth, elevel, head_p, tail_p);
--- 414,430 
  	fp = AllocateFile(config_file, "r");
  	if (!fp)
  	{
! 		if (strict)
! 		{
! 			ereport(elevel,
! 	(errcode_for_file_access(),
! 	 errmsg("could not open configuration file \"%s\": %m",
! 			config_file)));
! 			return false;
! 		}
! 
! 		/* Silently skip missing files if not asked to be strict */
! 		return OK;
  	}
  
  	OK = ParseConfigFp(fp, config_file, depth, elevel, head_p, tail_p);
*** ParseConfigFp(FILE *fp, const char *conf
*** 512,518 
  		}
  
  		/* OK, process the option name and value */
! 		if (guc_name_compare(opt_name, "include") == 0)
  		{
  			/*
  			 * An include directive isn't a variable and should be processed
--- 518,541 
  		}
  
  		/* OK, process the option name and value */
! 		if (guc_name_compare(opt_name, "includeifexists") == 0)
! 		{
! 			/*
! 			 * An includeifexists directive isn't a variable and should be
! 			 * processed immediately.
! 			 */
! 			unsigned int save_ConfigFileLineno = ConfigFileLineno;
! 
! 			if (!ParseConfigFile(opt_value, config_file, false,
!  depth + 1, elevel,
!  head_p, tail_p))
! OK = false;
! 			yy_switch_to_buffer(lex_buffer);
! 			ConfigFileLineno = save_ConfigFileLineno;
! 			pfree(opt_name);
! 			pfree(opt_value);
! 		}
! 	

[HACKERS] Configuration include directory

2011-11-15 Thread Greg Smith
Two years ago Magnus submitted a patch to parse all the configuration 
files in a directory.  After some discussion I tried to summarize what I 
thought the most popular ideas were for moving that forward:


http://archives.postgresql.org/pgsql-hackers/2009-10/msg01452.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg01631.php

And I've now cleared the bit rot and updated that patch to do what was 
discussed.  Main feature set:


-Called by specifying "includedir <directory>".  No changes to the 
shipped postgresql.conf yet.

-Takes an input directory name
-If it's not an absolute path, considers that relative to the -D option 
(if specified) or PGDATA, the same logic used to locate the 
postgresql.conf (unless a full path to it is used)
-Considers all names in that directory that end with *.conf  [Discussion 
concluded more flexibility here would be of limited value relative to 
how it complicates the implementation]

-Loops over the files found in sorted order by name

The idea here is that it will be easier to write tools that customize 
the database configuration if they can just write a new file out, rather 
than needing to parse the whole configuration file first.  This would 
allow Apache style configuration directories.  My end goal here is to 
see all of the work initdb does pushed into a new file included by this 
scheme.  People could then expect a functionally empty postgresql.conf 
except for an includedir, and the customization would go into 00initdb.  
Getting some agreement on that is not necessary for this feature to go 
in though; one step at a time.
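
For illustration, the file selection rule amounts to something like this 
little user-space sketch--not the server code itself, just the same 
filter-and-sort idea expressed with scandir:

/* Sketch: list the *.conf files in a directory in sorted name order,
 * the same selection rule the includedir patch applies.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int
conf_filter(const struct dirent *de)
{
	size_t		len = strlen(de->d_name);

	/* keep only names ending in ".conf"; editor backups like *.conf~ are skipped */
	return len > 5 && strcmp(de->d_name + len - 5, ".conf") == 0;
}

int
main(int argc, char **argv)
{
	struct dirent **entries;
	int			n, i;

	n = scandir(argc > 1 ? argv[1] : "conf.d", &entries, conf_filter, alphasort);
	if (n < 0)
	{
		perror("scandir");
		return 1;
	}
	for (i = 0; i < n; i++)
	{
		printf("would parse: %s\n", entries[i]->d_name);
		free(entries[i]);
	}
	free(entries);
	return 0;
}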


Here's an example showing this working, including rejection of a 
spurious editor backup file in the directory:


$ cat $PGDATA/postgresql.conf | grep ^work_mem
$ tail -n 1 $PGDATA/postgresql.conf
includedir='conf.d'
$ ls $PGDATA/conf.d
00config.conf  00config.conf~
$ cat $PGDATA/conf.d/00config.conf
work_mem=4MB
$ cat $PGDATA/conf.d/00config.conf~
work_mem=2MB
$ psql -c "select name,setting,sourcefile,sourceline from pg_settings 
where name='work_mem'"
   name   | setting |                       sourcefile                       | sourceline
----------+---------+--------------------------------------------------------+------------
 work_mem | 4096    | /home/gsmith/pgwork/data/confdir/conf.d/00config.conf  |          1


No docs in here yet.  There's one ugly bit of code here I was hoping 
(but failed) to avoid.  Right now the server doesn't actually save the 
configuration directory anywhere.  Once you leave the initial read in 
SelectConfigFiles, that information is gone, and you only have the 
configfile.  I decided to make that configdir into a global value.  
Seemed easier than trying to pass it around, given how many SIGHUP paths 
could lead to this new code.


I can see some potential confusion here in one case.  Let's say someone 
specifies a full path to their postgresql.conf file.  They might assume 
that the includedir was relative to the directory that file is in.  
Let's say configfile is /etc/sysconfig/pgsql/postgresql.conf; a user 
might think that "includedir conf.d" from there would reference 
/etc/sysconfig/pgsql/conf.d instead of the $PGDATA/conf.d you actually 
get.  Wavering on how to handle that is one reason I didn't try 
documenting this yet; the decision I made here may not actually be the 
right one.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index a094c7a..25e1a07 100644
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
*** ParseConfigFp(FILE *fp, const char *conf
*** 512,518 
  		}
  
  		/* OK, process the option name and value */
! 		if (guc_name_compare(opt_name, "include") == 0)
  		{
  			/*
  			 * An include directive isn't a variable and should be processed
--- 512,535 
  		}
  
  		/* OK, process the option name and value */
! 		if (guc_name_compare(opt_name, "includedir") == 0)
! 		{
! 			/*
! 			 * An includedir directive isn't a variable and should be processed
! 			 * immediately.
! 			 */
! 			unsigned int save_ConfigFileLineno = ConfigFileLineno;
! 
! 			if (!ParseConfigDirectory(opt_value, config_file,
!  depth + 1, elevel,
!  head_p, tail_p))
! OK = false;
! 			yy_switch_to_buffer(lex_buffer);
! 			ConfigFileLineno = save_ConfigFileLineno;
! 			pfree(opt_name);
! 			pfree(opt_value);
! 		}
! 		else if (guc_name_compare(opt_name, "include") == 0)
  		{
  			/*
  			 * An include directive isn't a variable and should be processed
*** ParseConfigFp(FILE *fp, const char *conf
*** 599,604 
--- 616,727 
  	return OK;
  }
  
+ static int
+ comparestr(const void *a, const void *b)
+ {
+ 	return st

Re: [HACKERS] Core Extensions relocation

2011-11-15 Thread Greg Smith
Well, this discussion veering off into ISN has certainly validated my 
gut feel that I should touch only the minimum number of things that 
kill my pain points, rather than trying any more ambitious 
restructuring.  I hope that packaged extensions become so popular that 
some serious cutting can happen to contrib, especially the data type 
additions.  If something as big as PostGIS can live happily as an 
external project, surely most of these can too.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-15 Thread Greg Smith

On 11/15/2011 12:53 PM, Joshua Berkus wrote:

Given discussion, is there any point in reporting on the actual patch yet?


I don't expect the discussion will alter the code changes that are the 
main chunk of the patch here.  The only place the most disputed parts 
have an impact is the documentation.


I like "Management Extensions" as an alternate name for this category 
instead, even though it still has the issue that auto_explain isn't 
technically an extension.  The name does help suggest why they're thrown 
into a different directory and package.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] IDLE in transaction introspection

2011-11-15 Thread Greg Smith

On 11/15/2011 09:44 AM, Scott Mead wrote:

Fell off the map last week, here's v2 that:
 * RUNNING => active
 * all states from CAPS to lower case



This looks like what I was hoping someone would add here now.  The patch 
looks good; the only issue I noticed was a spaces-instead-of-a-tab goof 
where you set beentry->st_state at line 2419 in 
src/backend/postmaster/pgstat.c


Missing pieces:

-There is one regression test that uses pg_stat_activity that is broken now.
-The documentation should list the new field and all of the states it 
might include.  That's a serious doc update from the minimal info 
available right now.


I know this issue has been beaten up some already, but let me summarize 
and extend that thinking for a moment.  I see two equally valid schools of 
thought on how exactly to deal with introducing this change:


-Add the new state field just as you've done it, but keep updating the 
query text anyway.  Do not rename current_query.  Declare the 
overloading of current_query as both a state and the query text to be 
deprecated in 9.3.  This would keep existing tools working fine against 
9.2 and give a clean transition period.


-Forget about backward compatibility and just put all the breaking stuff 
we've been meaning to do in here.  If we're going to rename 
current_query to query--what Scott's patch does here--that will force 
all code using pg_stat_activity to be rewritten.  This seems like the 
perfect time to also change "procpid" to "pid" and finally blow away that wart.


I'll happily update all of the tools and samples I deal with to support 
this change.  Most of the ones I can think of will be simplified; 
they're already parsing the query text and extracting the implicit state 
(e.g. matching on '<IDLE> in transaction').  Just operating on an 
explicit state column instead will be simpler and more robust.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] [PATCH] Unremovable tuple monitoring

2011-11-15 Thread Greg Smith

On 11/15/2011 10:29 AM, Alvaro Herrera wrote:

They were deleted but there are transactions with older snapshots.
I think vacuum uses the term "nondeletable" or "nonremovable".  Not sure
which one is less bad.  Not being a native speaker, they all sound
horrible to me.
   


I would go more for something like "deadinuse".  Saying they are 
unremovable isn't very helpful because it doesn't lead the user to 
knowing why.  If the name gives some suggestion as to why they are 
unremovable--in this case that they are still potentially visible and 
usable by old queries--that would be a useful naming improvement to me.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-14 Thread Greg Smith

On 11/14/2011 10:09 PM, Robert Haas wrote:

I continue to think that we should be trying to sort these by subject
matter.  The term "core extensions" doesn't convey that these are
server management and debugging tools, hence Josh's confusion.
   


I'm not attached to the name, which I just pulled out of the air for the 
documentation.  Could just as easily call them built-in modules or 
extensions.  If the objection is that "extensions" isn't technically 
correct for auto-explain, you might call them core add-ons instead.  My 
thinking was that the one exception didn't make it worth the trouble to 
introduce a new term altogether here.  There are already too many terms 
used for talking about this sort of thing; the confusion from using a 
word other than "extensions" seemed larger than the confusion sown by 
auto-explain not fitting perfectly.


The distinction I care about here is primarily a packaging one.  These 
are server additions that people should be able to count on having 
available, whereas right now they may or may not be installed depending 
on if contrib was added.  Everything I'm touching requires our RPM and 
Debian packagers (at least) make a packaging change, too.  I can't 
justify why that's worth doing for any of the other extensions, which is 
one reason I don't try to tackle them.


The type of finer sorting you and Thom are suggesting seems like it's 
mainly a documentation change to me.  I'm indifferent to the idea; no 
plans to either work on it or object to it.  The docs could be made 
easier to follow here without any change to the directory tree, and 
trying to push out a larger packaging change has downsides.  Useful 
reminder reading here is 
http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting#Moving_Contrib_Around  
To quote from there, "Users hate having loads and loads of packages. We 
do need to be careful not to oversplit it."  There are some useful notes 
about dependency issues there too.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Core Extensions relocation

2011-11-14 Thread Greg Smith

On 11/14/2011 07:56 PM, Josh Berkus wrote:

So I'm a bit unclear on why most of the optional data types were
excluded from your list of Core Extensions.


I was aiming for the extensions that seemed uncontroversial for a first 
pass here.  One of the tests I applied was "do people sometimes need 
this module after going into production with their application?"  The 
very specific problem I was most concerned about eliminating was people 
discovering they needed an extension to troubleshoot performance or 
corruption issues, only to find it wasn't available--because they 
hadn't installed the postgresql-contrib package.  New package 
installation can be a giant pain to get onto a production system in some 
places, if it wasn't there during QA etc.


All of the data type extensions fail that test.  If you need one of 
those, you would have discovered that on your development server, and 
made sure the contrib package was available on production too.  There 
very well may be some types that should be rolled into the core 
extensions list, but I didn't want arguments over that to block moving 
forward with the set I did suggest.  We can always move more of them 
later, if this general approach is accepted.  It only takes about 5 
minutes per extension to move them from contrib to src/extension, once 
the new directory tree and doc section is there.  But I didn't want to 
do the work of moving another 15 of them if the whole idea was going to 
get shot down.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] why do we need two snapshots per query?

2011-11-14 Thread Greg Smith

On 11/14/2011 04:04 PM, Robert Haas wrote:

Some pgbench -S numbers (SELECT-only test) from Nate Boley's 32-core
box


It seems like Nate Boley's system should be credited in the 9.2 
release notes.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Group Commit

2011-11-14 Thread Greg Smith

On 11/14/2011 03:43 PM, Josh Berkus wrote:
   

Purpose is to provide consistent WAL writes, even when WALInsertLock
contended. Currently no "off" option, thinking is that the overhead of
doing this is relatively low and so it can be "always on" - exactly as
it is for sync rep.
 

Hmmm, have you had a chance to do any performance tests?
   


I was planning to run some later this week, but someone else is welcome 
to take a shot at it.  The inspiration for this change was the 
performance scaling tests I did for sync rep last month.  Don't recall 
if I shared those with this list yet; I've attached the fun graph.  Over 
a slow international link with 100ms ping times, I was only getting the 
expected 10 TPS doing sync rep with a single client.  But as more 
clients were added, so that a chunk of them were acknowledged in each 
commit reply, the total throughput among all of them scaled nearly 
linearly.  With 300 clients, that managed to hit a crazy 2000 TPS.
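
The rough arithmetic behind that:  a 100ms round trip caps any single 
session at about 10 commits/second, but each reply from the standby can 
acknowledge every commit that queued up behind it, so the ceiling is 
roughly (number of clients) / (round trip time)--300 clients / 0.1s 
would be 3000 TPS in the ideal case, and the 2000 TPS observed is in 
that same ballpark.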


The best scenario to show this patch working would be a laptop drive 
spinning at a slow speed (5400 or 4200 RPM) so that individual local 
commits are slow.  That won't be 100ms slow, but close to 10ms is easy 
to see.  When adding clients to a system with a slow local commit, what 
I've observed is that the scaling levels off between 750 and 1000 TPS, 
no matter how many clients are involved.  The hope is that this 
alternate implementation will give the higher scaling in the face of 
slow commits that is seen on sync rep.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

(attachment: graph of sync rep throughput scaling)


[HACKERS] CommitFest 2011-11 starting soon

2011-11-14 Thread Greg Smith
Tomorrow, November 15, patch submission will close for the 2011-11 
CommitFest after 11:59PM PST.  New patches ready for review should be 
submitted to this mailing list and added to the CommitFest application 
at https://commitfest.postgresql.org/  See 
http://wiki.postgresql.org/wiki/Development_information for guidelines 
on patch development, submission, review, and how the CommitFest work 
happens.


Even before the last-minute rush there are already 25 submissions that 
are looking for reviewers.  If you can apply a patch and you can use the 
new feature, you're qualified to start reviewing it.  And we've produced 
some guides to the patch review part--see 
http://www.pgcon.org/2011/schedule/events/368.en.html as one example.


If you're interested in helping with review, but are looking for 
suggestions on what patch to select, you should join the Round Robin 
Reviewers list to get assigned one.  More information about that at 
http://wiki.postgresql.org/wiki/RRReviewers


This is the 3rd of the 4 CommitFests for PostgreSQL 9.2.  The hope is 
that any major features aimed at 9.2 will have been submitted for 
initial review by this one, so there's still time to get feedback and 
re-submit before the final CommitFest for 9.2, 2012-01.


There are also 4 uncontroversial submissions in this CommitFest that are 
marked "Ready for Committer":


-Non-inheritable check constraints
-plperl verify utf8 strings
-Perl xsubpp from cpan
-Add Support for building with Visual Studio 2010

Committers who would like to get an early start have some options already.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] star join optimization

2011-11-14 Thread Greg Smith

On 11/14/2011 10:44 AM, Rudyar wrote:

Ok, I'm working in that project. I will send you my results
and comparision with SQL server HHJ optimization in one or two months.


Please be careful not to share here details of how features like this 
are built in any commercial databases you evaluate.  Some of those 
implementations use patented design approaches that should be avoided in 
an open source project.  Oracle, Microsoft, and DB2 are all aggressive 
about patenting the innovative parts of their database server code.


In addition to not wanting to accidentally incorporate such a design, 
it's better for the PostgreSQL project to not be aware of what patents 
in this area exist too.  We don't even want a survey of patents in this 
area published here because there are increased penalties for willful 
patent infringement.  See http://en.wikipedia.org/wiki/Treble_damages 
for example.


What this project likes best are innovative approaches from recent 
academic research that haven't been incorporated in any commercial 
products yet.  A good example is how the Serializable Snapshot Isolation 
technique developed by Cahill and others was added to PostgreSQL 9.1:  
http://wiki.postgresql.org/wiki/Serializable  There was less concern 
over accidentally duplicating a patented approach because that technique 
wasn't in any of the commercial databases yet.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Parsing output of EXPLAIN command in PostgreSQL

2011-11-10 Thread Greg Smith

On 11/10/2011 11:10 AM, Αναστάσιος Αρβανίτης wrote:

I'm developing an application that requires parsing of
execution plans (those produced as output by issuing an EXPLAIN [query]
command). Are you aware of any Java library that I could use for this
purpose? I found https://github.com/depesz/Pg--Explain but it is built in Perl.
   


There's also a little Javascript program that consumes the JSON version at:

http://www.postgresonline.com/journal/archives/171-pgexplain90formats_part1.html
http://www.postgresonline.com/journal/archives/174-pgexplain90formats_part2.html


Also another option I am considering is to use EXPLAIN [query] FORMAT XML which 
is available in PostgreSQL 9.1. However, in that case it
would better to have the XML Schema of the generated plans available.
   


That's the easiest way to solve this problem in Java, and in that case 
most of the text-based code in Pg--Explain will just be a distraction.  
I know some of the earlier versions of XML EXPLAIN included a "DTD" 
option to output that, but I don't see that in the committed code.  I'm 
not sure where that is at actually; it's a good question.


The only reference to doing this I found was Andrew's blog:  
http://people.planetpostgresql.org/andrew/index.php?/archives/32-A-couple-of-nice-tools.html 
where he talks about there being a RELAXNG specification for the XML 
output.  I can't find where that came from either.  Andrew?


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] MPI programming in postgreSQL backend source code

2011-11-09 Thread Greg Smith

On 11/09/2011 04:10 PM, Rudyar Cortés wrote:

I'm a new programmer in postgreSQL source code..
Is possible use MPI functions in postgreSQL source code?


To do this the proper way, you would need to modify the database's 
"configure" step to:


-Check if the OpenMPI libraries are available and include the necessary 
bits.  For example, this is in one of the projects I work on:


#ifdef _OPENMP
extern int omp_get_num_threads();
#endif

Some form of that test and defining the functions available would be 
needed for what you want.


-Link OpenMPI in.  At the gcc level you'll need "-fopenmp" to start.

Then you could start using OpenMPI functions in database code.  You 
might hack the build steps to do this in a simpler way, too, rather than 
fighting with configure the proper way.
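
For example, a minimal guarded use of that function looks something like 
this (just a sketch of the pattern, nothing PostgreSQL-specific):

/* Sketch: fall back to a single thread when not built with -fopenmp */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int
main(void)
{
	int			nthreads = 1;

#ifdef _OPENMP
	nthreads = omp_get_max_threads();	/* only available with OpenMP enabled */
#endif
	printf("would use %d thread(s)\n", nthreads);
	return 0;
}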


Since a lot of the MPI functions aim at specific types of thread use and 
I/O, it would be a major effort to utilize the library for too many 
things.  The existing notion of how processes are created and managed is 
scattered throughout the PostgreSQL code.  And the I/O parts of the 
database are buried through a few layers of indirection.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Greg Smith

On 11/09/2011 04:05 PM, Dimitri Fontaine wrote:

The more I think about this problem, the more I think that the reason
why we still don't have declarative partitioning is that it basically
sucks.


I think we don't have it because no one has ever put in the 
much-larger-than-expected amount of time needed to pull all the pieces 
together and smooth out the rough parts.  I don't think there's any 
design thinking leap needed over what's already been worked out.  Just a 
lot of work to get all the edge cases right on the simplest possible 
thing that is useful.


The path to reach something that could be considered for commit includes 
something like this set of things:


1) Add partitioning catalog support
2) Create new syntax for partitioning that writes to the catalog
3) Decide how to represent partition data in memory
4) Route new INSERTed data into the right place
5) Support moving UPDATEd data into a new partition
6) Handle COPY usefully

The last rev of this submitted was still working through (1) here, i.e. 
this review from Robert:  
http://archives.postgresql.org/message-id/aanlktikp-1_8b04eyik0sdf8ua5kmo64o8sorfbze...@mail.gmail.com  
And there's a whole pile of issues I don't think have been fully 
explored about even the most basic case:  how to handle ALTER on these 
structures cleanly, locking, etc.  I don't think it's possible to 
design such that you skip a large portion of these details; someone 
needs to spend some number of weeks getting them all right instead.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] 9.1.2 ?

2011-11-09 Thread Greg Smith

On 11/09/2011 03:58 PM, Daniel Farina wrote:

On Tue, Nov 8, 2011 at 9:06 PM, Magnus Hagander  wrote:
   

I definitely think they are important enough to trigger a release. But as
you say, I think we need confirmation that they actually fix the problem...
 

I have confirmed that the clog/subtrans fixes allow us to start up
while in hot standby on otherwise problematic base backups.
   


I think Daniel has run into this problem more than anyone else, so 
hearing it's fixed for him makes me feel a lot better that it's been 
resolved.  I'd characterize this problem as a medium grade data 
corruption issue.  It's not security issue bad that it needs to be 
released tomorrow, but a backbranch release of at least 9.0/9.1 that 
includes it would be a big relief for people nervous about this.  I'd 
hate to see that slip forward to where it gets sucked into the holiday 
vortex.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] 9.1.2 ?

2011-11-09 Thread Greg Smith

On 11/09/2011 01:12 PM, Greg Jaskiewicz wrote:

Would you consider it a blocker for a rollout on production system ?


I wouldn't.  Good process for checking your backups should find this 
problem if it pops up, and it's not that easy to run into.  That's why I 
was saying there are workarounds here; they're just not nice to put 
people through.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Releasing an alpha for CF2

2011-11-08 Thread Greg Smith

On 11/08/2011 09:31 PM, Robert Haas wrote:

I know various people are working on patches to do various things, but the
only outstanding thing that sticks out in my mind as an outstanding
"big feature" patch is Alvaro's work on foreign key locks.  I'm sure
there must be others...  but I'm not sure how many.
   


Along with that, there's a lot of potential for ripple from the "Command 
Triggers" patch Dimitri just submitted today.  That's already 
approaching 3000 lines of code and he's still going on it.  I've been 
sniffing around recently looking for troublesome patches, under the 
assumption I'll be running the CF next week, and those two are the ones 
I've been most concerned about.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Materialized views

2011-11-08 Thread Greg Smith

On 11/08/2011 04:23 PM, Kevin Grittner wrote:

http://wiki.postgresql.org/wiki/Materialized_Views
That page describes three components: creating MVs, updating MVs, and
having the planner automatically detect when an MV matches some
portion of a regular query and using the MV instead of the specified
tables in such cases.


So far that page is just my initial notes on this subject based on some 
research I did; don't read too much into it.  Ignoring MV substitution 
is certainly the right plan for an initial development target.  An 
implementation that didn't update automatically at all is also still a 
useful step.  It's very easy to pick too big of a chunk of this project 
to chew on at once.


When I wrote that, it wasn't completely clear to me yet that doing the 
updates would involve so many of the same tricky bits that stalled 
progress on the MERGE command.  Nowadays I think of working out the 
concurrency issues in both this and MERGE, in the wake of true 
serialization, as another step in this implementation plan.  It's not 
strictly necessary, but there's some shared issues that might be tackled 
easier in the MERGE context instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.1.2 ?

2011-11-08 Thread Greg Smith

On 11/08/2011 07:34 PM, Tom Lane wrote:

I don't recall that we've fixed anything since September that seemed to
warrant an immediate release.
   


The backup+pg_clog failure issues fixed last week have been a nasty 
problem hitting people for a while.  Backup corruption is obviously 
serious.  Only reason I think it wasn't a higher priority issue is that 
it didn't happen every time, and the people impacted were eventually 
able to work around it.  Concern about that problem is why I popped off 
a message earlier today, about whether the fixes committed have been 
confirmed outside of Simon's own testing.


I was curious how 9.0 fared last year for comparison, here's that data:

Version   Date       Days   Weeks
9.0.0     09/20/10
9.0.1     10/04/10     14     2.0
9.0.2     12/16/10     73    10.4
9.0.3     01/31/11     46     6.6
9.0.4     04/18/11     77    11.0
9.0.5     09/26/11    161    23.0

So the average for the first three point releases was around 6 weeks apart.
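(For reference, those gaps are 2.0, 10.4, and 6.6 weeks, which average out 
to about 6.3 weeks.)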

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring relation free space

2011-11-08 Thread Greg Smith

On 11/08/2011 05:07 PM, Robert Treat wrote:

It's already easy to get "good enough" numbers based on user space
tools with very little overhead, so I think it's more important that
the server side tool be accurate rather than fast.


What user space method do you consider good enough here?  I haven't 
found any approximation that I was really happy with; wouldn't have 
bothered with this otherwise.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-11-08 Thread Greg Smith
I was curious if anyone running into these problems has gotten a chance 
to test the 3 fixes committed here.  It sounded like Linas even had a 
repeatable test case?


For easier reference the commits are:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2f55c535e1f026929cf20855b3790d3632062d42

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ff8451aa14c8513e429cbef09ddc72e79da366a5

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=656bba95af3950b26e9e97d86d29787d89e2b423

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring relation free space

2011-11-08 Thread Greg Smith

On 11/06/2011 11:55 PM, Mark Kirkwood wrote:
I am guessing (at this point very much guessing) that pg_freespace map 
may return its data faster, as pageinspect is gonna have to grovel 
through all the pages for itself (whereas pg_freespacemap relies on 
using info from the ... free space map fork).


I started with pageinspect because I wasn't sure if other methods would 
be as accurate.  For example, I wasn't sure until just before submission 
that only the free space and size of the relation are needed to get a 
useful measure here; at one point I was considering some other things 
too.  I've ruled them out as unnecessary as far as I can tell, but I 
can't claim my tests are exhaustive.  Some review confirmation that this 
measure is useful for other people is one thing I was hoping for 
feedback on, as one thing to consider in addition to the actual 
implementation.


If this measurement is the only one needed, then as I said at the start 
of the thread here it might easily be implemented to run just against 
the free space map instead.  I'm thinking of what's been sent so far as 
a UI with matching output it should produce.  If it's possible to get 
the same numbers faster, exactly how to implement the function under the 
hood is easy enough to change.  Jaime already has a new version in 
development that adds a ring buffer for example.
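
To make the FSM idea concrete, here's a minimal sketch of that variant, 
assuming the pg_freespacemap extension is installed and accepting that the 
FSM only stores approximate per-page numbers:

create extension pg_freespacemap;
-- sum the free space the FSM already tracks, divide by the relation size
select sum(avail)::float8 / pg_relation_size('t') as approx_free_fraction
  from pg_freespace('t');

Whether that approximation tracks the page-scanning number closely enough 
is exactly the sort of thing the reference implementation makes easy to check.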


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

2011-11-07 Thread Greg Smith

On 11/06/2011 06:00 PM, Tom Lane wrote:

Peter Geoghegan  writes:

>  A major consideration was backwards compatibility;
This is not a consideration that the community is likely to weigh
heavily, or indeed at all.  We aren't going to back-port this feature
into prior release branches, and we aren't going to want to contort its
definition to make that easier.


Being able to ship a better pg_stat_statements that can run against 
earlier versions as an extension has significant value to the 
community.  Needing to parse log files to do statement profiling is a 
major advocacy issue for PostgreSQL.  If we can get something useful 
that's possible to test as an extension earlier than the 9.2 
release--and make it available to more people running older versions 
too--that has some real value to users, and for early production testing 
of what will go into 9.2.


The point where this crosses over to requiring server-side code to 
operate at all is obviously a deal breaker on that idea.  So far that 
line hasn't been crossed, and we're trying to stage testing against 
older versions on real-world queries.  As long as it's possible to keep 
that goal without making the code more difficult to deal with, I 
wouldn't dismiss that as a useless distinction.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

2011-11-07 Thread Greg Smith

On 11/07/2011 09:03 AM, Robert Haas wrote:

I think it's an established principle that the design for features
like this should, for best results, be discussed on -hackers before
writing a lot of code.


You can see from the commit history this idea is less than a month old.  
Do we need to get community approval before writing a proof of concept 
now?  Everyone would still be arguing about how to get started had that 
path been taken.  If feedback says this needs a full rewrite, fine.  We 
are familiar with how submitting patches works here.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

2011-11-06 Thread Greg Smith

On 11/06/2011 01:08 PM, Tom Lane wrote:

Peter Geoghegan  writes:
   

... It also does things
like intelligently distinguishing between queries with different
limit/offset constant values, as these constants are deemed to be
differentiators of queries for our purposes. A guiding principal that
I've followed is that anything that could result in a different plan
is a differentiator of queries.
 

This claim seems like bunk, unless you're hashing the plan tree,
in which case it's tautological.


Peter's patch adds a planner hook and hashes at that level.  Since this 
cat is rapidly escaping its bag and impacting other contributors, we 
might as well share the work in progress early if anyone has a burning 
desire to look at the code.  A diff against the version I've been 
internally reviewing in prep for community submission is at 
https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm  
Easier to browse than ask questions probing about it I think.  Apologies 
to Peter for sharing his work before he was completely ready; there is a 
list of known problems with it.  I also regret the thread hijack here.


The first chunk of code is a Python based regression test program, and 
an open item here is the best way to turn that into a standard 
regression test set.



There will be additional infrastructure added to the parser to support
normalisation of query strings for the patch I'll be submitting (that
obviously won't be supported in the version that builds against
existing Postgres versions that I'll make available). Essentially,
I'll be adding a length field to certain nodes,
 

This seems like a good way to get your patch rejected: adding overhead
to the core system for the benefit of a feature that not everyone cares
about is problematic.


Struggling around this area is the main reason this code hasn't been 
submitted yet.  To step back for a moment, there are basically three 
options here that any code like this can take, in regards to storing the 
processed query name used as the key:


1) Use the first instance of that query seen as the "reference" version
2) Use the last instance seen
3) Transform the text of the query in a way that's stable across all 
duplicates of that statement, and output that


Downstream tools operating on this data, things that will sample 
pg_stat_statements multiple times, need some sort of stable query text 
they can operate on.  It really needs to survive server restart too.  
Neither (1) nor (2) seem like usable solutions.  Both have been 
implemented already in Peter's patch, with (2) being what's in the 
current patch.  How best to do (3) instead is not obvious though.
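
To make the downstream-tool case concrete, the sampling side is nothing 
more exotic than something like this, run periodically against columns the 
existing pg_stat_statements view already has:

select query, calls, total_time
  from pg_stat_statements
 order by total_time desc
 limit 20;

The tool then subtracts the previous sample's counters, keyed on the query 
text.  If that text silently changes between samples, as it can under (1) 
and (2), those deltas break; that's why (3) is the case worth solving.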


Doing the query matching at the planner level seems effective 
at side-stepping the problem of needing to parse the SQL, which is where 
most implementations of this idea get stuck doing fragile 
transformations.  My own first try at this back in September and Tomas's 
patch both fall into that category.  That part of Peter's work seems to 
work as expected.  That still leaves the problem of "parsed query -> 
stable normalized string".  I think that is an easier one to solve than 
directly taking on the entirety of "query text -> stable normalized 
string" though.  Peter has an idea he's exploring for how to implement 
that, and any amount of overhead it adds to people who don't use this 
feature is an obvious concern.  There are certainly use cases that don't 
care about this problem, ones that would happily take (1) or (2).  I'd 
rather not ship yet another not-quite-right-for-everyone version of 
pg_stat_statements though; solving too limited a use case is the big 
problem with the one that's already there.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Measuring relation free space

2011-11-05 Thread Greg Smith
Attached patch adds a new function to the pageinspect extension for 
measuring total free space, in either tables or indexes.  It returns the 
free space as a percentage, so higher numbers mean more bloat.  After 
trying a couple of ways to quantify it, I've found this particular 
measure correlates well with the nastiest bloat issues I've ran into in 
production recently.  For example, an index that had swelled to over 5X 
its original size due to autovacuum issues registered at 0.86 on this 
scale.  I could easily see people putting an alert at something like 
0.40 and picking candidates to reindex based on it triggering.  That 
would be about a million times smarter than how I've been muddling 
through this class of problems so far.


Code by Jaime Casanova, based on a prototype by me.  Thanks to attendees 
and sponsors of the PgWest conference for helping to fund some deeper 
exploration of this idea.


Here's a test case showing it in action:

create extension pageinspect;
create table t (k serial,v integer);
insert into t(v) (select generate_series(1,10));
create index t_idx on t(k);
delete from t where k<5;
vacuum t;

gsmith=# select relation_free_space('t');
 relation_free_space
-
0.445466

gsmith=# select relation_free_space('t_idx');
 relation_free_space
-
0.550946

Some open questions in my mind:

-Given this is doing a full table scan, should it hook into a ring 
buffer to keep from trashing the buffer cache?  Or might it loop over 
the relation in a different way altogether?  I was thinking about 
eyeing the FSM instead at one point, didn't explore that yet.  There's 
certainly a few ways to approach this, we just aimed at the easiest way 
to get a working starter implementation, and associated results to 
compare others against.


-Should there be a non-superuser version of this?  We'd certainly need 
to get a less cache demolishing version before that would seem wise.


-There were related things in the pageinspect module, but a case could 
be made for this being a core function instead.  It's a bit more likely 
to be used in production than the rest of that extension.


-What if anything related to TOAST should this handle?

We're also planning to do a sampling version of this, using the same 
approach ANALYZE does.  Grab a number of blocks, extrapolate from 
there.  It shouldn't take many samples before the accuracy is better 
than how people are estimating this now.  That work is just waiting on 
some better thinking about how to handle the full relation version first.
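
As a very rough sketch of that sampling idea using only what pageinspect 
already exposes (heap pages only, assuming the default 8 kB block size and 
a table of at least ten pages; a real version would pick random blocks the 
way ANALYZE does rather than the first ten):

select avg((h).upper - (h).lower) / 8192.0 as approx_free_fraction
  from (select page_header(get_raw_page('t', blkno)) as h
          from generate_series(0, 9) as blkno) as sample;

The upper - lower gap is just the hole between the line pointers and the 
tuple data, so it's only a first-cut number, but it shows the shape of the 
extrapolation.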


And, yes, the explanation in the docs and code should be clear that it's 
returning a percentage, which I just realized when writing this.  At 
least I remembered to document something; still ahead of the average new 
patch...


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index dbb2158..aac9148 100644
*** a/contrib/pageinspect/btreefuncs.c
--- b/contrib/pageinspect/btreefuncs.c
***
*** 34,39 
--- 34,40 
  #include "utils/builtins.h"
  #include "utils/rel.h"
  
+ #include "btreefuncs.h"
  
  extern Datum bt_metap(PG_FUNCTION_ARGS);
  extern Datum bt_page_items(PG_FUNCTION_ARGS);
*** GetBTPageStatistics(BlockNumber blkno, B
*** 155,160 
--- 156,202 
  		stat->avg_item_size = 0;
  }
  
+ /*
+  * GetBTRelationFreeSpace
+  *
+  * Get the free space for a btree index.
+  * This is a helper function for relation_free_space()
+  *
+  */
+ float4
+ GetBTRelationFreeSpace(Relation rel)
+ {
+ 	BTPageStat stat;
+ 
+ 	Buffer		buffer;
+ 	BlockNumber blkno;
+ 	BlockNumber totalBlcksInRelation = RelationGetNumberOfBlocks(rel);
+ 	Size 		free_space = 0;
+ 	double		free_percent = 0;
+ 	
+ 	/* Skip page 0 because it is a metapage */
+ 	for (blkno = 1; blkno < totalBlcksInRelation; blkno++)
+ 	{
+ 		buffer = ReadBuffer(rel, blkno);
+ 		/* 
+ 		 * get the statistics of the indexes and use that info
+ 		 * to determine free space on the page
+ 		 */
+ 		GetBTPageStatistics(blkno, buffer, &stat);
+ 		if (stat.type == 'd')
+ 			free_space += stat.page_size;
+ 		else
+ 			free_space += stat.free_size;		
+ 
+ 		ReleaseBuffer(buffer);
+ 	}
+ 
+ 	if (totalBlcksInRelation > 1)
+ 		free_percent = (free_space * 1.0) / ((totalBlcksInRelation - 1) * BLCKSZ);
+ 	return free_percent;
+ }
+ 
+ 
  /* ---
   * bt_page()
   *
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index fa50655..c1d72ba 100644
*** a/contrib/pageinspect/heapfuncs.c
--- b/contrib/pageinspect/hea

Re: [HACKERS] IDLE in transaction introspection

2011-11-05 Thread Greg Smith

On 11/04/2011 05:01 PM, Tom Lane wrote:

Scott Mead  writes:
   

I leave the waiting flag in place for posterity.  With this in mind, is
the consensus:
RUNNING
 or
ACTIVE
 

Personally, I'd go for lower case.
   


I was thinking it would be nice if this state looked like the WAL sender 
state values in pg_stat_replication, which are all lower case.  For 
comparison those states are:


startup
backup
catchup
streaming
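
For comparison, those are the values that show up in the state column when 
you run something like:

select application_name, state from pg_stat_replication;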

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-11-03 Thread Greg Smith

On 09/24/2011 04:49 PM, Joshua Berkus wrote:
Well, we *did* actually come up with a reasonable way, but it died 
under an avalanche of bikeshedding and 
"we-must-do-everything-the-way-we-always-have-done". I refer, of 
course, to the "configuration directory" patch, which was a fine 
solution, and would indeed take care of the recovery.conf issues as 
well had we implemented it. We can *still* implement it, for 9.2.


That actually died from a lack of round-tuits; the consensus at the end 
of the bike-shedding was pretty clear.  Last night I finally got 
motivated to fix the bit rot and feature set on that patch, to match 
what seemed to be the easiest path toward community approval.  One known 
bug left to resolve and I think it's ready to submit for the next CF.


I think includeifexists is a good improvement too, on a related 
arc to the main topic here.  If I can finish off the directory one (or 
get someone else to fix my bug) I should be able to follow up with that 
one.  The patches won't be that different.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce checkpoints and WAL traffic on low activity database serv

2011-11-02 Thread Greg Smith

On 11/02/2011 05:48 PM, Simon Riggs wrote:

On Wed, Nov 2, 2011 at 6:27 PM, Robert Haas  wrote:

   

Again, it's no longer the maximum time between automatic checkpoints.
 

You're missing the point that it never was like that. I've not altered
the situation.
   


Robert's point is more that the existing docs are already broken; this 
new patch can just increase how bad the drift between reality and 
documentation can be.  Before, the only people who ran into this had 
zero activity on the server, which meant there wasn't any data to be 
lost, either.  Now it's potentially broader than that.


With some trivial checkpoints containing a small amount of data skipped 
now, aren't there some cases where less WAL data will be written than 
before?  In that case, the user visible behavior here would be 
different.  I'd be most concerned about file-based log shipping case.


I don't think there's any change needed to the "Write Ahead Log" section 
of the "Server Configuration" chapter.  In the "Reliability and the 
Write-Ahead Log" chapter, this text in "WAL Configuration" was already 
highlighted as the problem here:


The server's background writer process automatically performs a 
checkpoint every so often. A checkpoint is created every 
checkpoint_segments log segments, or every checkpoint_timeout seconds, 
whichever comes first. The default settings are 3 segments and 300 
seconds (5 minutes), respectively. It is also possible to force a 
checkpoint by using the SQL command CHECKPOINT.


I think this needs a change like this, to address the hole that was 
already in the docs and cover the new behavior too; this goes just 
before " It is also possible to force..."


In cases where there are little or no writes to the WAL, checkpoints 
will be skipped even if checkpoint_timeout has passed.  At least one new 
WAL segment must have been created before an automatic checkpoint 
occurs.  The time between checkpoints and when new WAL segments are 
created are not related in any other way.  If file-based WAL shipping is 
being used and you want to bound how often files are sent to a standby 
server to reduce potential data loss, you should adjust the 
archive_timeout parameter rather than the checkpoint ones.


This area is a confusing one, so some clarification of the relation 
between checkpoints and replication is a useful docs improvement.
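
For instance, with the default checkpoint_timeout = 5min but almost no 
write activity, a long stretch can now pass without any checkpoint at all; 
someone doing file-based shipping who wants a segment at least once a 
minute would set something like archive_timeout = 60 (an illustrative 
value, not a recommendation) rather than trying to force that behavior 
through the checkpoint settings.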


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Core Extensions relocation

2011-11-02 Thread Greg Smith

On 10/14/2011 01:48 PM, Bruce Momjian wrote:

Is this going to be done for 9.2?
   


Refreshing this patch is on my list of things to finish before the next 
CommitFest starts later this month.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-10-05 Thread Greg Smith

On 09/29/2011 10:40 AM, Alvaro Herrera wrote:

I reviewed this patch.  My question for you is: does it make sense to
enable to reporting of write rate even when vacuum cost accounting is
enabled?  In my opinion it would be useful to do so.  If you agree,
please submit an updated patch.
   


Presumably you meant to ask if this makes sense to show when cost 
accounting isn't enabled, because the code doesn't do that right now.  
No cost accounting, no buffer usage/write rate data as this was submitted.


Looks like making this work even in cases where cost accounting isn't on 
will make the patch a bit larger and more obtrusive, but it's not unreasonable.  
Now that you mention it, people who do a manual, full-speed VACUUM would 
certainly appreciate some feedback on the rate it ran at.  I'll include 
that in the next update.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-10-04 Thread Greg Smith
On 09/26/2011 05:58 AM, Shigeru Hanada wrote:
> * Local variables added by the patch (secs, usecs, write_rate and
> endtime) can be moved into narrower scope.
> * Initializing starttime to zero seems unnecessary.
>   

Setting starttime to 0 is already in the code; the change made to that
line was to add endtime, which is not initialized. You may be right that
initializing it isn't necessary, but I'm sure not going to touch that
part of the working code.

You're right about the local variables; they were placed to look
like the surrounding code rather than to be as local as possible. I'm
not sure if all the PostgreSQL code is completely consistent here; a
quick survey shows examples of both "put all the variables at the top"
and "make variables as local to blocks as possible" styles. I don't know
that it really makes any difference with modern compilers, either. I'm
sure someone else will have a stronger opinion on this subject now that
I've trolled the list for one by writing this.

> In addition, documents about how to use the statistics would be
> necessary, maybe in "23.1.5. The Autovacuum Daemon".
> I'll set the status of this patch to waiting-on-author. Would you rebase
> the patch and post it again?
>   

I didn't do that because there's not really much documentation at this
level of detail yet--how to interpret all the information in the logs.
That's an open-ended bit of work; there's a lot more that could be
written on this topic than the docs have right now. It's probably worth
pointing out that this specific info is now in the logs, though, given
that people might not notice it otherwise. I'll see what I can do about
that.

As a general FYI, rebasing is normally requested only when the existing
patch doesn't apply anymore. If "patch" or "git apply" can consume it,
complaints about code shifting around aren't by themselves enough reason to
ask for an updated patch.

-- 
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Unremovable tuple monitoring

2011-10-04 Thread Greg Smith

On 10/04/2011 03:45 PM, Royce Ausburn wrote:
I think I get this stats stuff now. Unless someone here thinks it's 
too hard for a new postgres dev's 2nd patch, I could take a stab. I 
might take a look at it tonight to get a feel for how hard, and what 
stats we could collect. I'll start a new thread for discussion.


Adding statistics and good monitoring points isn't hard to do, once you 
figure out how the statistics messaging works.  From looking at your 
patch, you seem to be over that part of the learning curve already.  The 
most time consuming part for vacuum logging patches is setting up the 
test cases and waiting for them to execute.  If you can provide a script 
that does that, it will aid in getting review off to a good start.  
Basically, whatever you build to test the patch, you should think about 
packaging into a script you can hand to a reviewer/committer.  Normal 
approach is to build a test data set with something like 
generate_series, then create the situation the patch is supposed to log.
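
A minimal sketch of the shape I mean, with made-up table and column names: 
build data with generate_series, create dead tuples, then run a vacuum you 
can watch in the logs:

create table vac_test as
  select i as id, repeat('x', 100) as filler
    from generate_series(1, 100000) as i;
delete from vac_test where id % 2 = 0;  -- leave plenty of dead tuples behind
vacuum verbose vac_test;

For the unremovable tuple case specifically, you'd also hold an old 
snapshot open in a second session--say an idle open transaction that has 
already run a query--while the vacuum runs, so some of those dead tuples 
can't be removed yet.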


Just to clarify what Robert was suggesting a little further, good 
practice here is just to say "this patch needs a catversion bump", but 
not actually do it.  Committers should figure that out even if you don't 
mention it, but sometimes a goof is made; a little reminder doesn't hurt.



I'm not sure what my next step should be.  I've added this patch to the open 
commit fest -- is that all for now until the commit fest begins review?
   


Basically, we'll get to it next month.  I have my own autovacuum logging 
stuff I'm working on that I expect to slip to that one too, so I can 
easily take on reviewing yours then.  I just fixed the entry in the CF 
app to follow convention by listing your full name.


Main feedback for now on the patch is that few people ever use 
pg_stat_all_tables.  The new counter needs to go into 
pg_stat_user_tables and pg_stat_sys_tables if it's going to be visible 
to the people who are most likely to need it.  I updated the name of the 
patch on the CommitFest to read "Unremovable tuple count on 
pg_stat_*_tables" so the spec here is more clear.  I'd suggest chewing 
on the rest of your ideas, see what else falls out of this, and just 
make sure to submit another update just before the next CF starts.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Separating bgwriter and checkpointer

2011-09-21 Thread Greg Smith

On 09/20/2011 09:35 AM, Heikki Linnakangas wrote:
Yeah, I was thinking of write()s, not fsyncs. I agree this might have 
some effect during fsync phase.


Right; that's where the most serious problems seem to pop up anyway 
now.  All the testing I did earlier this year suggested Linux at least 
is happy to do a granular fsync, and it can also use things like 
barriers when appropriate to schedule I/O.  The hope here is that the 
background writer work to clean ahead of the strategy point is helpful 
to backends, and that should keep going even during the sync 
phase--which currently doesn't pause for anything else once it's 
started.  The cleaner writes should all queue up into RAM in a lazy way 
rather than block the true I/O, which is being driven by sync calls.


There is some risk here that the cleaner writes happen faster than the 
true rate at which backends really need buffers, since it has a 
predictive component it can be wrong about.  Those could in theory 
result in the write cache filling faster than it would in the current 
environment, such that writes which would have been cached in the current 
code truly block.  If you're that close to the edge though, backends 
should really benefit from the cleaner--that same write done by a client 
would turn into a serious stall.  From that perspective, when things 
have completely filled the write cache, any writes the cleaner can get 
out of the way in advance of when a backend needs it should be the 
biggest win most of the time.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there really no interest in SQL Standard?

2011-09-19 Thread Greg Smith

On 09/19/2011 10:58 AM, Joe Abbate wrote:

Maybe it's time for an open source SQL standard, one not controlled by
the "big guys" and their IP claims.
   


Not spending as much time sitting in meetings and fighting with other 
vendors is one of the competitive advantages PostgreSQL development has 
vs. the "big guys".  There needs to be a pretty serious problem with 
your process before adding bureaucracy to it is anything but a backwards 
move.  And standardization tends to attract lots of paperwork.  Last 
thing you want to be competing with a big company on is doing that sort 
of big company work.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUDA Sorting

2011-09-19 Thread Greg Smith

On 09/19/2011 10:53 AM, Thom Brown wrote:

But couldn't that also be seen as a chicken/egg situation?



The chicken/egg problem here is a bit deeper than just "no one offers 
GPUs because no one wants them" on server systems.  One of the reasons 
there aren't more GPUs in typical database server configurations is that 
you're already filling up some number of the full size slots, and 
correspondingly the bandwidth available to cards, with disk 
controllers.  It doesn't help that many server class motherboards don't 
even have a x16 PCI-e slot on them, which is what most GPUs as delivered 
on regular consumer video cards are optimized for.



But nVidia does produce a non-graphics-oriented GPGPU line called
Tesla dedicated to such processing.
   


Tesla units start at around $1500 USD, which is a nice budget to spend 
on either more RAM (to allow higher work_mem), faster storage to store 
temporary files onto, or a faster CPU to chew through all sorts of tasks 
more quickly.  The Tesla units are easy to justify if you have a serious 
GPU-oriented application.  The good bang for the buck point with GPU 
sorting for PostgreSQL is probably going to be a $50-$100 video card 
instead.  For example, the card Vitor is seeing good results on costs 
around $60.  (That's also a system with fairly slow RAM, though; it will 
be interesting to see if the gain holds up on newer systems.)


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-19 Thread Greg Smith

On 09/05/2011 07:52 PM, Tomas Vondra wrote:

If your logging criteria for the write phase was "display a message any
time more than 30 seconds have passed since last seeing one", that would
give you only a few lines of output in a boring, normal
checkpoint--certainly less than the 9 in-progress samples you're
outputting now, at 10% intervals.  But in the pathological situations
where writes are super slow, your log data would become correspondingly
denser, which is exactly what you want in that situation.
 

I still am not sure what should be a reasonable value or how to determine
it. What happens when the checkpoint_timeout is increased, there's more
shared_buffers etc.? What about using (checkpoint_timeout/10) for the
time-based checkpoints and 30s for the other checkpoints?
   


That may work fine.  Maybe implement it like that, and see if the amount 
of logging detail is reasonable in a couple of test scenarios.



I think combining the two makes the most sense:  "log when>=30 seconds
have passed since the last message, and there's been>=10% more progress
made".  (Maybe do the progress check before the time one, to cut down on
 

Is this is a good idea? The case when the timeout expires and not much
data was written is interesting, and this would not log it. But OTOH this
would nicely solve the issue with time-based checkpoints and a fixed
threshold.
   


One thing I am trying to avoid here is needing to check the system clock 
after every buffer write.  I also consider it useful to put an upper 
bound on how many of these messages will appear even in the verbose 
mode.  This deals with both those problems.


Yes, there is a potential problem with this idea.  Let's say checkpoint 
writes degrade to where they take an hour.  In that case, you won't see 
the first progress report until 6 minutes (10%) have gone by with this 
implementation.  I don't see a good way to resolve that without 
violating one of the other priorities I listed above though.  You'll 
have to poll the system clock constantly and will end up creating a lot 
of log entries if you don't do a check against the % progress first.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUDA Sorting

2011-09-19 Thread Greg Smith

On 09/19/2011 10:12 AM, Greg Stark wrote:

With the GPU I'm curious to see how well
it handles multiple processes contending for resources, it might be a
flashy feature that gets lots of attention but might not really be
very useful in practice. But it would be very interesting to see.
   


The main problem here is that the sort of hardware commonly used for 
production database servers doesn't have any serious enough GPU to 
support CUDA/OpenCL available.  The very clear trend now is that all 
systems other than gaming ones ship with motherboard graphics chipsets 
more than powerful enough for any task but that.  I just checked the 5 
most popular configurations of server I see my customers deploy 
PostgreSQL onto (a mix of Dell and HP units), and you don't get a 
serious GPU from any of them.


Intel's next generation Ivy Bridge chipset, expected for the spring of 
2012, is going to add support for OpenCL to the built-in motherboard 
GPU.  We may eventually see that trickle into the server hardware side 
of things too.


I've never seen a PostgreSQL server capable of running CUDA, and I don't 
expect that to change.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Greg Smith

On 09/02/2011 11:10 AM, Tomas Vondra wrote:

My 'ideal' solution would be either to add another GUC (to turn this
on/off) or allow log_checkpoints to have three values

log_checkpoints = {off, normal, detailed}

where 'normal' provides the current output and 'detail' produces this much
verbose output.
   


If this is going to be acceptable, that's likely the only path it could 
happen by and still meet what you're looking for.  I will just again 
stress that the part you're working on instrumenting better right now is 
not actually where larger systems really run into the most problems 
here, based on what I've seen.  I added a series of log messages to 9.1 
at DEBUG1, aimed at tracking the sync phase.  That's where I see many 
more checkpoint issues than in the write one.  On Linux in particular, 
it's almost impossible for the write phase to be more of a problem than 
the sync one.


So the logging you're adding here I don't ever expect to turn on.  But I 
wouldn't argue against an option to handle the logging use-case you're 
concerned about.  Letting people observe for themselves and decide which 
of the phases is more interesting to their workload seems appropriate.  
Then users have options for what to log, no matter which type of problem 
they run into.


If you're expanding log_checkpoints to an enum, for that to handle what 
I think everybody might ever want (for what checkpoints do now at 
least), I'd find that more useful if it happened like this instead:


log_checkpoints = {off, on, write, sync, verbose}

I don't think you should change the semantics of off/on, which will 
avoid breaking existing postgresql.conf files and resources that suggest 
tuning advice.  "write" can toggle on what you're adding; "sync" should 
control whether the DEBUG1 messages showing the individual file names in 
the sync phase appear; and "verbose" can include both.


As far as a heuristic for making this less chatty when there's nothing 
exciting happening goes, I think something based on how much time has 
passed would be the best one.  In your use case, I would guess you don't 
really care whether a message appears every n%.  If I understand you 
correctly now, you would mainly care about getting enough log detail to 
know (a) when things are running really slow, or (b) often enough that the 
margin of error in your benchmark results from unaccounted checkpoint 
writes is acceptable.  In both of those cases, I'd think a time-based 
threshold would be appropriate, and that also deals with the time-based 
checkpoints, too.


If your logging criteria for the write phase was "display a message any 
time more than 30 seconds have passed since last seeing one", that would 
give you only a few lines of output in a boring, normal 
checkpoint--certainly less than the 9 in-progress samples you're 
outputting now, at 10% intervals.  But in the pathological situations 
where writes are super slow, your log data would become correspondingly 
denser, which is exactly what you want in that situation.


I think combining the two makes the most sense:  "log when >=30 seconds 
have passed since the last message, and there's been >=10% more progress 
made".  (Maybe do the progress check before the time one, to cut down on 
gettimeofday() calls)  That would give you 4 in-progress reports during 
a standard 2.5 minute write phase, and in cases where the checkpoints 
are taking a long time you'd get as many as 9.  That's pretty close to 
auto-tuning the amount of log output, so the amount of it is roughly 
proportional to how likely the information it's logging will be 
interesting.
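(The arithmetic there: with the default 5 minute checkpoint_timeout and 
checkpoint_completion_target=0.5, the write phase runs about 150 seconds, 
and a message at most every 30 seconds on 10% boundaries lands you at 
roughly four in-progress lines before the completion one.)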


We certainly don't want to add yet another GUC just to control the 
frequency.  I don't think it will be too hard to put two hard-coded 
thresholds in and do good enough for just about everyone though.  I 
would probably prefer setting those thresholds to 60 seconds/20% 
instead.  That might not be detailed enough for you though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-29 Thread Greg Smith

On 08/29/2011 11:03 AM, Robert Haas wrote:

Instead of doing this only when vacuum costing is active, could we
drive it off of the pgBufferUsage stuff (maybe with a few tweaks...)
and do it unconditionally?
   


Sure.  I've wondered about an even larger refactoring, to reorient 
vacuum costing around completely:  drive it all from the pgBufferUsage 
side rather than running its own totals.  I didn't even start wandering 
down that path yet because of time constraints, plus the desire to have 
something I could backport to installs having VACUUM issues on earlier 
versions.  This code I'd backport without hesitation; something that 
wanders toward a more complicated rearrangement becomes harder to deal with.



To me it seems like it would better to say "write rate xyz MB/s"
rather than "xyz MB/s write rate", but maybe I'm in the minority on
that one.
   


I was just trying to mimic the style of the logging already there as 
closely as I could.  I don't like the way the existing log message looks 
either.  I wasn't going to ignore its style over that though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-28 Thread Greg Smith
Updated patch cleans up two diff mistakes made when backing out the 
progress report feature.  The tip-off I screwed up should have been the 
absurdly high write rate shown.  The usleep was accidentally deleted, so 
it was running without the cost limits applying at all.  Here's a good one 
instead:


LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index 
scans: 1

pages: 0 removed, 163935 remain
tuples: 200 removed, 2928356 remain
buffer usage: 117393 hits, 123351 misses, 102684 dirtied, 2.168 
MiB/s write rate

system usage: CPU 2.54s/6.27u sec elapsed 369.99 sec
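
As a cross-check on the math: 102684 dirtied pages at the default 8 kB 
block size is roughly 802 MiB, and 802 MiB over the 369.99 second runtime 
works out to the 2.168 MiB/s shown.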

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..c9890b4 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 
--- 214,222 
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
  
--- 151,168 
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
  
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  	   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 	(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 			"pages: %d removed, %d remain\n"
! 			"tuples: %.0f removed, %.0f remain\n"
! 			"system usage: %s",
! 			get_database_name(MyDatabaseId),
! 			get_namespace_name(RelationGetNamespace(onerel)),
! 			RelationGetRelationName(onerel),
! 			vacrelstats->num_index_scans,
! 			vacrelstats->pages_removed,
! 			vacrelstats->rel_pages,
! 			vacrelstats->tuples_deleted,
! 			vacrelstats->new_rel_tuples,
! 			pg_rusage_show(&ru0;
  	}
  }
  
--- 228,282 
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  	   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! TimestampDifference(starttime, endtime, &secs, &usecs);
! write_rate = 0;
! if ((secs > 0) || (usecs > 0))
! 	write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 		(secs + usecs / 100.0);
! 
! ereport(LOG,
! 		(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! "pages: %d removed, %d remain\n"
! "tuples: %.0f removed, %.0f remain\n"
! "buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! "system usage: %s",
! get_database_name(MyDatabaseId),
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel),
! vacrelstats->num_index_scans,
! vacrelstats->pages_removed,
! vacrelstats->rel_pages,
! vacrelstats->tuples_deleted,
! vacrelstats->new_rel_tuples,
! VacuumPageHit,
! VacuumPageMiss,
! VacuumPageDirty,
! write_rate,
! pg_rusage_show(&ru0;
! 			}
! 			else
! ereport(LOG,

Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-28 Thread Greg Smith
Attached patch includes "math is hard" reworking, so it displays the 
average write rate in the log output automatically:


LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index 
scans: 1

pages: 0 removed, 163935 remain
tuples: 200 removed, 4625165 remain
buffer usage: 111901 hits, 123348 misses, 102351 dirtied, 23.365 
MiB/s write rate

system usage: CPU 1.84s/4.22u sec elapsed 34.22 sec

All of the updates to the process title are gone, in favor of some 
progress report mechanism TBD.  The summary is much more important than 
the progress tracking part as far as I'm concerned, so I don't mind 
splitting things apart to try and get this part in earlier.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..fa15b2e 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
***
*** 43,48 
--- 43,49 
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/memutils.h"
+ #include "utils/ps_status.h"
  #include "utils/snapmgr.h"
  #include "utils/syscache.h"
  #include "utils/tqual.h"
*** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 
--- 215,223 
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
*** vacuum_delay_point(void)
*** 1160,1167 
  		if (msec > VacuumCostDelay * 4)
  			msec = VacuumCostDelay * 4;
  
- 		pg_usleep(msec * 1000L);
- 
  		VacuumCostBalance = 0;
  
  		/* update balance values for workers */
--- 1164,1169 
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
  
--- 151,168 
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
  
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  	   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 	(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 			"pages: %d removed, %d remain\n"
! 			"tuples: %.0f removed, %.0f remain\n"
! 			"system usage: %s",
! 			get_database_name(MyDatabaseId),
! 			get_namespace_name(RelationGetNamespace(onerel)),
! 			RelationGetRelationName(onerel),
! 			vacrelstats->num_index_scans,
! 			vacrelstats->pages_removed,
! 			vacrelstats->rel_pages,
! 			vacrelstats->tuples_deleted,
! 			vacrelstats->new_rel_tuples,
! 			pg_rusage_show(&ru0;
  	}
  }
  
--- 228,282 
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  	   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! TimestampDifference(starttime, endtime, &secs, &usecs);
! write_rate = 0;
! if ((secs > 0) || (usecs > 0))
! 	write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 		(secs + usecs / 100.0);
! 
! ereport(LOG,
! 		(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! "pages: %d removed, %d remain\n"
! "tuples: %.

Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-27 Thread Greg Smith

On 08/27/2011 12:01 AM, Noah Misch wrote:

On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote:
   

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster

The problem is that when checkpoint stats are collected, there might be a
checkpoint in progress and in that case the stats are incomplete. In some
cases (especially with very small db blocks) this has significant impact
because the checkpoints are less frequent.
 

Could you remove this hazard by adding a step "2a. psql -c CHECKPOINT"?
   


That's what I do in pgbench-tools, and it helps a lot.  It makes it 
easier to identify when the checkpoint kicks in if you know it's 
approximately the same time after each test run begins, given similar 
testing parameters.  That said, it's hard to eliminate all of the edge 
conditions here.


For example, imagine that you're consuming WAL files such that you hit 
checkpoint_segments every 4 minutes.  In a 10 minute test run, a 
checkpoint will start at 4:00 and finish at around 6:00 (with 
checkpoint_completion_target=0.5).  The next will start at 8:00 and 
should finish at around 10:00--right as the test ends.  
Given the variation that sync timing and rounding issues in the write 
phase adds to things, you can expect that some test runs will include 
stats from 2 checkpoints, while others will end the test just before the 
second one finishes.  It does throw the numbers off a bit.


To avoid this when it pops up, I normally aim to push up to where there 
are >=4 checkpoints per test run, just so whether I get n or n-1 of them 
doesn't impact results as much.  But that normally takes doubling the 
length of the test to 20 minutes.  As it will often take me days of test 
time to plow through exploring just a couple of parameters, I'm 
sympathetic to Tomas trying to improve accuracy here without having to 
run for quite so long.  There are few people who have to worry about this 
problem though; it's a common issue with benchmarking but not in many 
other contexts.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Greg Smith

On 08/26/2011 03:54 AM, Magnus Hagander wrote:

In theory, this could be the "progress view" or
"progress field" talked about around Gregs previous patch - or it
could just be modifying the commandstring in pg_stat_activity.


Right.  The whole progress indicator idea is hard to do for queries in 
general.  But there's enough of these other progress indicator ideas 
around now that it may be worth putting a standard way to handle them in 
here.  It sounds like that would be sufficient to address the area Tomas 
is trying to instrument better.  I badly want a progress indicator on 
CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the 
build process it's on.  That's turned into a major headache recently.


If we run with the idea of just allowing backends to publish a progress 
text string, I think this one maps into a similar space as the 
autovacuum one.  Publishing how many seconds the operation has been 
running for may be reasonable too.  Whether the overhead of the timing 
calls necessary to compute that will be high or not depends on the 
refresh rate of the progress info.  My suggestion before was to name 
these as key=value pairs for easy parsing; here's three examples now:


autovacuum:  pgbench_accounts h=182701 m=301515 d=321345 s=62.231
(cache hits, cache misses, dirty writes, seconds)

background writer:  checkpoint b=511 t=3072 s=5.321
(buffers written, total, seconds)

create index concurrently:  pgbench_accounts p=1 b=62 t=6213 s=81.232
(phase, blocks processed, total block estimate, seconds)

I think that the idea of making this easily human readable is 
optimistic, because it will make all these strings big enough to start 
mattering.  Given that, we almost have to assume the only consumers of 
this data will be able to interpret it using the documentation.  I'd be 
happy with just the minimal data set in each case, not including any 
statistics you can easily derive from the values given (like the MB/s 
readings).  Adding that figure in particular to more of the log messages 
would be nice though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Greg Smith

On 08/25/2011 04:57 PM, Tomas Vondra wrote:

(b) sends bgwriter stats (so that the buffers_checkpoint is updated)
   


The idea behind only updating the stats in one chunk, at the end, is 
that it makes one specific thing easier to do.  Let's say you're running 
a monitoring system that is grabbing snapshots of pg_stat_bgwriter 
periodically.  If you want to figure out how much work a checkpoint did, 
you only need two points of data to compute that right now.  Whenever 
you see either of the checkpoint count numbers increase, you just 
subtract off the previous sample; now you've got a delta for how many 
buffers that checkpoint wrote out.  You can derive the information about 
the buffer counts involved that appears in the logs quite easily this 
way.  The intent was to make that possible to do, so that people can 
figure this out without needing to parse the log data.
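
A minimal sketch of that monitoring approach, assuming you're willing to 
stash the samples in a table of your own (the snapshot table name here 
is made up for the example):

-- grab a sample on each monitoring pass
CREATE TABLE bgwriter_snapshot AS
    SELECT now() AS captured, * FROM pg_stat_bgwriter;
INSERT INTO bgwriter_snapshot
    SELECT now(), * FROM pg_stat_bgwriter;

-- whenever either checkpoint counter increases, the buffer delta is
-- the work done by the checkpoint(s) that finished in between
SELECT captured,
       checkpoints_timed + checkpoints_req
         - lag(checkpoints_timed + checkpoints_req) OVER w AS checkpoints,
       buffers_checkpoint
         - lag(buffers_checkpoint) OVER w AS buffers_written
FROM bgwriter_snapshot
WINDOW w AS (ORDER BY captured);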


Spreading out the updates defeats that idea.  It also makes it possible 
to see the buffer writes more in real-time, as they happen.  You can 
make a case for both approaches having their use cases; the above is 
just summarizing the logic behind why it's done the way it is right 
now.  I don't think many people are actually doing things with this to 
the level where their tool will care.  The most popular consumer of 
pg_stat_bgwriter data I see is Munin graphing changes, and I don't think 
it will care either way.


Giving people the option of doing it the other way is a reasonable idea, 
but I'm not sure there's enough use case there to justify adding a GUC 
just for that.  My next goal here is to eliminate checkpoint_segments, 
not to add yet another tunable extremely few users would ever touch.


As for throwing more log data out, I'm not sure what new analysis you're 
thinking of that it allows.  I/O gets increasingly spiky as you zoom in 
on it; averaging over a shorter period can easily end up providing less 
insight about trends.  If anything, I spend more time summarizing the 
data that's already there, rather than wanting to break them down.  It's 
already providing way too much detail for most people.  Customers tell 
me they don't care to see checkpoint stats unless they're across a day 
or more of sampling, so even the current "once every ~5 minutes" is way 
more info than they want.  I have all this log parsing code and things 
that look at pg_stat_bgwriter to collect that data and produce higher 
level reports.  And lots of it would break if any of this patch is added 
and people turn it on.  I imagine other log/stat parsing programs might 
suffer issues too.  That's your other hurdle for change here:  the new 
analysis techniques have to be useful enough to justify that some 
downstream tool disruption is inevitable.


If you have an idea for how to use this extra data for something useful, 
let's talk about what that is and see if it's possible to build it in 
instead.  This problem is harder than it looks, mainly because the way 
the OS caches writes here makes trying to derive hard numbers from what 
the background writer is doing impossible.  When the database writes 
things out, and when they actually get written to disk, they are not the 
same event.  The actual write often happens during the sync phase, and not 
being able to track that beast is where I see the most problems.  
The write phase, the easier part to instrument in the database, is 
pretty boring.  That's why the last extra logging I added here focused 
on adding visibility to the sync activity instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-22 Thread Greg Smith

On 08/22/2011 05:54 PM, Jim Nasby wrote:
I know folks have talked about progress, but I haven't seen anything 
specific... could you add info about what table/index vacuum is 
working on, and how far along it is? I realize that's not very close 
to an actual % completion, but it's far better than what we have right 
now.


Due to complaints about the mechanism the first version used to inform 
the user of the progress, I'm yanking that from the next patch 
altogether.  The goal for now is to get a good report into the logs, and 
then maybe that gets extended later with a progress report.  (All of the 
proposed alternate mechanisms are way more complicated than anything I 
have time to do right now)



FWIW, the number I end up caring about isn't so much write traffic as read. 
Thanks to a good amount of battery-backed write cache (and possibly some iSCSI 
misconfiguration), our writes are generally much cheaper than our reads.
   


VACUUM can't really know its true read rate from what's inside the 
database.  I can add a summary of the accumulated read amounts into the 
logs, in more useful figures than what is provided so far, which is 
better than nothing.  But those will be kind of deceptive, which is one 
reason I wasn't so focused on them yet.  If the relation is largely in 
the OS cache, but not the PostgreSQL one, the summary can show a read 
rate even when that isn't actually doing any reads at all.  That was 
exactly the case in the sample data I posted.  VACUUM thought it was 
reading anywhere from 2.5 to 6MB/s.  But at the OS level, it was 
actually reading zero bytes, since the whole thing was in cache already.


What you actually want is a count of the accumulated read counters at 
the OS level.  I've recently figured out how to track those, too, but 
that code is something that lives outside the database.  If this is 
something useful to you, I think you're about to sign up to be my next 
beta tester for that program.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-18 Thread Greg Smith

On 08/18/2011 10:54 AM, Peter Eisentraut wrote:

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.
   


Adding a field here (I'd go for the simpler "progress") and updating it 
regularly would be a reasonable way to go.  This data doesn't 
really need to go into the traditional statistics infrastructure to be 
useful.  I didn't start there because I was already getting pushback on 
overloading the stats collector with constantly updated metrics last 
time I did something in this area.  I wasn't going to try and argue why 
it was worth it in this case, just like I'm not going to argue about the 
complaint over the command string overhead being too high--just going to 
not do that instead.  If the bikeshed I built doesn't look fancy enough 
to hold the bike I put in there, I'm not going to build a better one 
right now--I'll just put a cheaper bike in there instead.


I was hoping to eventually take the useful summary bits at the end, the 
totals, and save those into statistics somewhere each time a VACUUM of 
either sort finishes.  It would fit with the information shown in 
pg_stat_user_tables, but that's obviously getting too wide.  Breaking out a 
pg_stat_autovacuum view that contains all the relevant bits currently 
shown in that view, plus these 3 new fields, would be a reasonable start.
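
Purely as a hypothetical sketch of the shape I have in mind--none of the 
hit/miss/dirty columns exist anywhere yet, and their names here are made 
up--it might look something like this, reusing the vacuum-related 
columns that are already tracked per table:

CREATE VIEW pg_stat_autovacuum AS  -- hypothetical only
    SELECT relid, schemaname, relname,
           last_vacuum, last_autovacuum,
           last_analyze, last_autoanalyze,
           0::bigint AS autovacuum_page_hits,     -- would come from the new counters
           0::bigint AS autovacuum_page_misses,
           0::bigint AS autovacuum_pages_dirtied
    FROM pg_stat_user_tables;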


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-18 Thread Greg Smith

On 08/18/2011 10:12 AM, Robert Haas wrote:

Perhaps a reasonable way to break up the patch would be:
- Part 1: Gather the information and display it in the
log_autovacuum_min_duration output.
- Part 2: Add the ability to see the information incrementally (via
some mechanism yet to be agreed upon).
   


My reaction to all the suggestions for redesign is just that: pull out 
the part that does the incremental updates altogether, improve the part 
that dumps the info into the logs, and resubmit without any incremental 
progress for now.  This is much more valuable to me if the first commit 
that hits is something I can backport trivially.  I'm seeing enough 
production servers running into this problem right now on earlier 
versions to be worried about that, and the log dump at the end would be 
a huge help even if that was all they got.  I'm going to add directly 
computing the write MB/s figure from the dirty data written too, since 
that ends up being the thing that I keep deriving by hand anyway.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Displaying accumulated autovacuum cost

2011-08-17 Thread Greg Smith

On 08/17/2011 07:42 PM, Euler Taveira de Oliveira wrote:
I don't like exposing this information only on title processes. It 
would be difficult for client apps (for example, PGAdmin) to track 
this kind of information and it is restricted to local access. I'm not 
objecting to display this information in process title; I'm just 
saying that that information should be exposed in  functions (say 
pg_stat_get_vacuum_[hit|miss|dirty]) too.


I tend to build the simplest possible thing that is useful enough to 
work.  The data is getting stored and shown now, where it wasn't 
before.  If it's possible to expose that in additional ways later too, 
great.  The big step up for this information is to go from 
"unobtainable" to "obtainable".  I'd prefer not to add a quest for 
"easily obtainable" to the requirements until that big jump is made, for 
fear it will cause nothing to get delivered.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Displaying accumulated autovacuum cost

2011-08-17 Thread Greg Smith
K 
(usually) buffers.  Multiply that by 8192/(60*1024*1024), and you get 
MB/s out of there.  I collected that data for a cleanup run of the 
pgbench_accounts damage done above; a CSV file with all the statistics is 
attached.


I also collected OS level stats from Linux about the actual read/write 
rate of the process, converted into "Write Mbps" (those are actually in 
MB/s, sloppy capitalization is via OpenOffice "autocorrect").  Those 
numbers are close enough to make me confident the dirty buffer totals 
tracked here do turn into useful MB/s values.  Sample of the most 
interesting part:


Cost Delta    Dirty Mbps    Write Mbps
589,890       2.56          2.73
591,151       2.57          2.73
589,035       2.56          2.72
593,775       3.14          0.20
599,420       2.05          0.00
598,503       2.05          0.00
599,421       2.05          0.00
574,046       0.60          0.01
574,779       0.64          0.67
609,140       2.56          2.68
612,397       2.57          2.69
611,744       2.57          2.69
610,008       2.56          2.68

This shows the expected 600K/minute cost accumulation.  And using the 
dirty= numbers to compute MB/s of write speed closely matches the total 
write speed of this process.  Some of the difference might be I/O to 
other things besides the main table here, some of it is just because OS 
write caching will influence the write rate.  In the spots where the OS 
value and what's derived from the dirty rate diverge most, it appears to 
be because vacuum is filling Linux's write cache.  Actual writes 
accumulated against the process then block for a while.  It's a small 
difference most of the time.
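
To make the conversion concrete:  between the first two samples in the 
attached CSV the cumulative dirty page count moves from 27,079 to 46,728 
over one minute, and running that delta through the formula reproduces 
the 2.56 figure in the Dirty Mbps column:

SELECT round((46728 - 27079) * 8192.0 / (60 * 1024 * 1024), 2) AS dirty_mbps;
-- 2.56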


I'd be willing to accept a "Dirty MB/s" figure computed this way as 
accurate enough for most purposes.  And this patch lets you get that 
data, currently unavailable without poking into the OS statistics (if at 
all), just by doing a little log file and/or command string scraping.  
Total at the end or real-time monitoring, based on how much work you 
want to put into it.  For a busy site where one or more autovacuum 
processes are running most of the time, being able to monitor the vacuum 
portion of the I/O this way will be a huge improvement over the current 
state of things.  I already have a stack of tools built on top of this 
data I'm working on, and they're making it much easier to come up with 
an iterative tuning process for autovacuum.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

#"timestamp","page_hit","hit_bytes","page_miss","miss_bytes","page_dirty","dirty_bytes","Dirty Mbps","Total Cost","Cost Delta","Read Bytes","Write Bytes","Read Mbps","Write Mbps"
08/16/2011-18:46:56,21,"172,032",27088,"221,904,896",27079,"221,831,168",2.56,"812,481","589,680",0,"235,945,984",0.00,2.73
08/16/2011-18:47:56,21,"172,032",46737,"382,869,504",46728,"382,795,776",2.56,"1,401,951","589,470",0,"407,298,048",0.00,2.72
08/16/2011-18:48:56,21,"172,032",66359,"543,612,928",66349,"543,531,008",2.55,"1,990,591","588,640",0,"578,527,232",0.00,2.72
08/16/2011-18:49:56,21,"172,032",86092,"705,265,664",86082,"705,183,744",2.57,"2,582,581","591,990",0,"750,551,040",0.00,2.73
08/16/2011-18:50:56,21,"172,032",105713,"866,000,896",105703,"865,918,976",2.55,"3,171,211","588,630",0,"921,657,344",0.00,2.72
08/16/2011-18:51:57,21,"172,032",125369,"1,027,022,848",125359,"1,026,940,928",2.56,"3,760,891","589,680",0,"1,093,173,248",0.00,2.73
08/16/2011-18:52:57,21,"172,032",145019,"1,187,995,648",145008,"1,187,905,536",2.56,"4,350,371","589,480",0,"1,264,517,120",0.00,2.72
08/16/2011-18:53:57,21,"172,032",164717,"1,349,361,664",164706,"1,349,271,552",2.56,"4,941,311","590,940",0,"1,436,352,512",0.00,2.73
08/16/2011-18:54:57,21,"172,032",184387,"1,510,498,304",184376,"1,510,408,192",2.56,"5,531,411","590,100",0,"1,607,917,568",0.00,2.73
08/16/2011-18:55:57,22,"180,224",204078,"1,671,806,976",204067,"1,671,716,864",2.56,"6,122,142","590,731",0,"1,779,671,040",0.00,2.73
08/16/2011-18:56:57,22,"180,224",223706,"1,832,599,552",223695,"1,832,509,440",2.56,"6,710,982","588,840",0,"1,950,982,144",0.00,2.72
08/16/2011-18:57:57,22,"180,224",243369,"1,993,678,848"

Re: [HACKERS] index-only scans

2011-08-15 Thread Greg Smith
t vary too much]

That's 5.4X as fast; not too shabby!  Kind of interesting how much 
different the I/O pattern is on the index-only version.  I ran this test 
against a 3-disk RAID0 set with a 256MB BBWC, so there's some 
possibility of caching here.  But given that each query blows away a 
large chunk of the other's data, I wouldn't expect that to be a huge 
factor here:


gsmith=# select pg_size_pretty(pg_relation_size('pgbench_accounts'));
 pg_size_pretty
----------------
 640 MB

gsmith=# select pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));
 pg_size_pretty
----------------
 107 MB

gsmith=# select pg_size_pretty(pg_relation_size('sample_data'));
 pg_size_pretty
----------------
 112 MB

And with the large difference in response time, things appear to be 
working as hoped even in this situation.  If you try this on your 
laptop, where the drive cache is smaller and random I/O likely even 
slower, you might see an even larger difference.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2011-08-14 Thread Greg Smith

On 08/12/2011 10:51 PM, Greg Stark wrote:

If you execute a large batch delete or update or even just set lots of
hint bits you'll dirty a lot of buffers. The ring buffer forces the
query that is actually dirtying all these buffers to also do the i/o
to write them out. Otherwise you leave them behind to slow down other
queries. This was one of the problems with the old vacuum code which
the ring buffer replaced. It left behind lots of dirtied buffers for
other queries to do i/o on.
   


I ran into the other side of this when trying to use Linux's relatively 
new dirty_background_bytes tunable to constrain the OS write cache.  
When running with the current VACUUM ring buffer code, if there isn't 
also a large OS write cache backing that, performance suffers quite a 
bit.  I've been adding test rigging to quantify this into pgbench-tools 
recently, and I fear that one of the possible outcomes could be pushback 
pressure toward making the database's ring buffer bigger.  Just a 
theory--waiting on some numbers still.


Anyway, I think every idea thrown out here so far needs about an order 
of magnitude more types of benchmarking test cases before it can be 
evaluated at all.  The case I just mentioned is a good example of why.  
Every other test I ran showed a nice improvement with the kernel tuning 
I tried.  But VACUUM was massively detuned in the process.


I have an entire file folder filled with notes on ways to reorganize the 
buffer cache, from my background writer work for 8.3.  In my mind 
they're all sitting stuck behind the problem of getting enough 
standardized benchmark workloads to have a performance regression 
suite.  The idea of touching any of this code without a look at a large 
number of different tests is a bit optimistic.  What I expect to happen 
here is that all of the initially proposed changes will end up tuning for 
one workload at the expense of other, unmeasured ones.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench internal contention

2011-08-01 Thread Greg Smith

On 07/30/2011 09:08 AM, Robert Haas wrote:

If I'm reading the code right, it only modifies __libc_drand48_data on
first call, so as long as we called erand48() at least once before
spawning the child threads, it would probably work.  That seems pretty
fragile in the face of the fact that they explicitly state that
they're modifying the global random generator state and that you
should use erand48_r() if you want reentrant behavior.  So I think if
we're going to go the erand48() route we probably ought to force
pgbench to always use our version rather than any OS-supplied version.
   


By the way:  the landmines in this whole area are what sunk the attempt 
to switch pgbench over to using 64 bits for the accounts table back in 
February.  See the last few messages of 
http://postgresql.1045698.n5.nabble.com/Re-PERFORM-pgbench-to-the-MAXINT-td3337374.html 
to revisit.  I think you've retraced all of that now, but double 
checking your plan against things like the AIX specific weirdness I 
pointed out there may be useful.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incremental checkopints

2011-07-29 Thread Greg Smith

On 07/29/2011 11:04 AM, jord...@go-link.net wrote:

I think that current implementation of checkpoints is not good for huge
shared buffer cache and for many WAL segments. If there is more buffers
and if buffers can be written rarely more updates of buffers can be
combined so total number of writes to disk will be significantly less. I
think that incremental checkpoints can achieve this goal (maybe more) and
price is additional memory (about 1/1000 of size of buffer cache).
   


The current code optimizes for buffers that are written frequently.  
Those will sit in shared_buffers and in the hoped for case, only be 
written once at checkpoint time.


There are two issues with adopting incremental checkpoints instead, one 
fundamental, the other solvable but not started on yet:


1) Postponing writes as long as possible always improves the resulting 
throughput of those writes.  Any incremental checkpoint approach will 
detune throughput by some amount.  If you make writes go out more often, 
they will be less efficient; that's just how things work if you 
benchmark anything that allows write combining.  Any incremental 
checkpoint approach is likely to improve latency in some cases if it 
works well, while decreasing throughput in most cases.


2) The incremental checkpoint approach used by other databases, such as 
the MySQL implementation, works by tracking what transaction IDs were 
associated with a buffer update.  The current way PostgreSQL saves 
buffer sync information for the checkpoint to process things doesn't 
store enough information to do that.  As you say, the main price there 
is some additional memory.


From my perspective, the main problem with plans to tweak the 
checkpoint code is that we don't have a really good benchmark that 
tracks both throughput and latency to test proposed changes against.  
Mark Wong has been working to get his TPC-E clone DBT-5 running 
regularly for that purpose, and last I heard that was basically done at 
this point--he's running daily tests now.  There's already a small pile 
of patches that adjust checkpoint behavior around that were postponed 
from being included in 9.1 mainly because it was hard to prove they were 
useful given the benchmark used to test them, pgbench.  I have higher 
hopes for DBT-5 as being a test that gives informative data in this 
area.  I would want to go back and revisit the existing patches (sorted 
checkpoints, spread sync) before launching into this whole new area.  I 
don't think any of those has even been proven not to work, they just 
didn't help the slightly unrealistic pgbench write-heavy workload.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Greg Smith

On 07/26/2011 12:33 PM, Pavan Deolasee wrote:

I think what I am suggesting is that the default pgbench test setup
would probably not give you a good scalability as number of clients
are increased and one reason could be the contention in the small
table. So it might be a good idea to get rid of that and see if we get
much better scalability and understand other bottlenecks.
   


You can easily see this form of contention pulling down results when the 
database itself is really small and the overall transaction rate is very 
high.  With Robert using a scale=100, no more than 80 clients, and 
transaction rates peaking at <10K TPS on a 24 core box, I wouldn't 
expect this form of contention to be a large issue.  It may be dropping 
results a few percent, but I'd be surprised if it was any more than that.


It's easy enough to use "-N" to skip the updates to the smaller tellers 
and branches table to pull that out of the way.  TPS will go up, because 
it's doing less per transaction.  That's not necessarily a better test 
case though, it's just a different one.  The regular case includes a lot 
of overwriting the same blocks in the hot branches and tellers tables.  
That effectively pushes a lot more I/O toward being likely to happen at 
checkpoint time.  Those tables rarely have any significant I/O outside 
of the checkpoint in the standard "TPC-B like" scenario, because their 
usage counts stay high most of the time.
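
If you want to watch that usage count behavior directly, the 
pg_buffercache contrib module makes it visible; a rough sketch, assuming 
the standard pgbench table names:

SELECT c.relname, b.usagecount, count(*) AS buffers
FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
  AND c.relname IN ('pgbench_branches', 'pgbench_tellers')
GROUP BY c.relname, b.usagecount
ORDER BY c.relname, b.usagecount;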


Contention for small tables that are being heavily updated is still 
important to optimize too though.  Which type of test makes more sense 
depends on what aspect of performance you're trying to focus on.  I'll 
sometimes do a full pgbench-tools "sweep" (range of multiple scales and 
clients at each scale) in both regular write and "-N" write modes, just 
to collect the slightly different data each provides.  The form of 
checkpoint I/O spike you see at sync time is very different in the two 
cases, but both usage profiles are important to consider and optimize.  
That Robert has started with the regular case doesn't worry me too much 
now that I've seen the parameters he's using, he's not running it in a 
way where I'd expect branch/teller contention to be a major limiting 
factor on the results.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-25 Thread Greg Smith

On 07/25/2011 04:07 PM, Robert Haas wrote:

I did 5-minute pgbench runs with unlogged tables and with permanent
tables, restarting the database server and reinitializing the tables
between each run.


Database scale?  One or multiple pgbench worker threads?  A reminder on 
the amount of RAM in the server would be helpful for interpreting the 
results too.


The other thing I'd recommend if you're running more write-heavy tests 
is to turn off autovacuum.  Whether or not it kicks in depends on the 
test duration and the TPS rate, which adds a source of variability 
better avoided here. It also means that faster tests end up getting 
penalized by having it run near their end, which makes them no longer 
look like fast results.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench--new transaction type

2011-07-25 Thread Greg Smith

On 07/25/2011 08:12 PM, Jeff Janes wrote:

In the absence of -s and presence of -f, :scale gets set to 1, rather
than to "select count(*) from pgbench_branches".

I don't think it is nice to rely on people to correctly specify -s.  I
would like to change -f so that in the absence of -s it uses the same
scale as -S, etc., do.  But that would probably be too backwards
incompatible to be an acceptable change.
   


Auto-detecting scale only works if you have a database populated with 
the pgbench tables.  You can use "pgbench -f" to run arbitrary bits of 
SQL, using pgbench as the driver program for all sorts of benchmarking 
tasks against other data sets.  For example, at 
http://projects.2ndquadrant.it/sites/default/files/pgbench-intro.pdf I 
show how to use it for testing how fast INSERT statements of various 
sizes can execute.


The very concept of a "scale" may not make sense for other data sets 
that pgbench will happily run against when using "-f".  The only sort of 
heuristic I have considered adding here when running in that mode is:


1) Check if pgbench_branches exists.
2) If so, count the records to derive a scale, as currently done in the 
non "-f" cases

3) Should that scale not match the value of "-s", issue a warning.

You have to assume anyone sophisticated enough to be playing with "-f" 
may be doing something the program doesn't expect or understand, and let 
them do that without trying to "fix" what may be intentional behavior.  
But a check for the most common mistake made in this area wouldn't 
bother people who aren't using pgbench in its original form at all, 
while it would keep those new to the program from screwing this up.
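
The detection part of (1) and (2) is nothing more than the count query 
the built-in tests already run, wrapped in an existence check; roughly:

-- step 1:  is there a pgbench_branches table at all?
SELECT count(*) > 0 AS looks_like_pgbench
FROM pg_class WHERE relname = 'pgbench_branches' AND relkind = 'r';

-- step 2:  if so, derive the scale the same way the built-in tests do
SELECT count(*) AS scale FROM pgbench_branches;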


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --unlogged-tables

2011-07-25 Thread Greg Smith

On 07/25/2011 09:23 AM, Robert Haas wrote:

At some point, we also need to sort out the scale factor limit issues,
so you can make these things bigger.
   


I had a patch to improve that whole situation, but it hasn't seemed to nag 
at me recently.  I forget why it seemed less important, but I doubt I'll 
make it another six months without coming to some resolution there.


The two systems I have in for benchmarking right now have 128GB and 
192GB of RAM in them, so large scales should have been tested.  
Unfortunately, it looks like the real-world limiting factor on doing 
lots of tests at big scales is how long it takes to populate the data 
set.  For example, here's pgbench creation time on a big server (48 
cores, 128GB RAM) with a RAID10 array, when scale=2 (292GB):


real174m12.055s
user17m35.994s
sys 0m52.358s

And here's the same server putting the default tablespace (but not the 
WAL) on [much faster flash device I can't talk about yet]:


Creating new pgbench tables, scale=2
real169m59.541s
user18m19.527s
sys0m52.833s

I was hoping for a bigger drop here; maybe I needed to use unlogged 
tables? (ha!)  I think I need to start looking at the pgbench data 
generation stage as its own optimization problem.  Given how expensive 
systems this large are, I never get them for very long before they are 
rushed into production.  People don't like hearing that just generating 
the data set for a useful test is going to take 3 hours; that tends to 
limit how many of them I can schedule running.
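
As one illustration of the sort of thing that stage could do 
differently, generating rows server-side rather than streaming them from 
the client is an obvious candidate; a rough sketch for an empty accounts 
table, assuming the standard pgbench schema and its 100,000 accounts per 
unit of scale (scale=100 shown):

-- fill pgbench_accounts in the server rather than COPYing rows in
INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
SELECT aid, (aid - 1) / 100000 + 1, 0, ''
FROM generate_series(1, 100 * 100000) AS aid;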


And, yes, I'm going to try and sneak in some time to test fastpath 
locking on one of these before they head into production.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --unlogged-tables

2011-07-22 Thread Greg Smith

On 07/22/2011 08:15 PM, David Fetter wrote:

Do you have any theories as to how indexing on SSD speeds things up?
IIRC you found only marginal benefit in putting WALs there.  Are there
cases that SSD helps more than others when it comes to indexing?
   


Yes, I've found a variety of workloads where using a SSD turns out to be 
slower than the old-school array of drives with a battery-backed write 
cache.  Tiny commits are slower, sequential writes can easily be slower, 
and if there isn't a random I/O component to the job the SSD won't get 
any way to make up for that.


In the standard pgbench case, the heavy UPDATE traffic does a lot of 
random writes to the index blocks of the pgbench_accounts table.  Even 
in cases where the whole index fits into RAM, having the indexes backed 
by a faster store can end up speeding those up, particularly at 
checkpoint time.  And if you can't quite fit the whole index in RAM, but 
it does fit on the SSD, being able to shuffle it in/out of flash as 
needed to look up pointers to data blocks is a whole lot better than 
seeking around a regular drive.  That case is where the biggest win 
seems to be at.


I'd like to publish some hard numbers on all this, but have realized I 
need to relocate just the pgbench indexes to do a good simulation.  And 
I'm getting tired of doing that manually.  If I'm going to put time into 
testing this unlogged table variation that Robert has submitted, and I 
expect to, I'm just pointing out that I'd like to have the "index on 
alternate tablespace" one available then too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --unlogged-tables

2011-07-22 Thread Greg Smith
That looks straightforward enough.  The other thing I keep realizing 
would be useful recently is to allow specifying a different tablespace 
to switch to when creating all of the indexes.  The old "data here, 
indexes on faster storage here" trick was already popular in some 
environments.  But it's becoming a really big win for environments that 
put indexes on SSD, and being able to simulate that easily with pgbench 
would be nice.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Policy on pulling in code from other projects?

2011-07-22 Thread Greg Smith

On 07/22/2011 02:09 PM, Joshua D. Drake wrote:
Remember this library follows the RFC for URIs which is why I even 
brought it up. If it was just some random parser, I wouldn't even have 
bothered. Do we care about the RFC for URIs?


The main components of the RFC involve:

-Decoding escaped characters entered by percent-encoding
-Parsing the permissible IPv4 and IPv6 addresses
-Handling absolute vs. relative addresses.  This is a lot of the spec, 
and it's not really relevant for PostgreSQL URIs

-Splitting the URI into its five main components

I know I've seen a URL-oriented %-encoding decoder as a PostgreSQL 
function already (I think Gabriele here wrote one).  Surely useful IP 
address decoding functions are already around.  And the splitting part 
seems like a fairly straightforward bit of regular expression work.
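
For what it's worth, the generic splitting regex from RFC 3986 
(Appendix B) is short enough to try right from psql; the URI below is 
just a made-up example, not a proposal for the libpq syntax:

SELECT regexp_matches(
    'postgresql://example.com:5433/mydb?sslmode=require',
    E'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?');

The interesting captures are the scheme, authority, path, and query 
groups; most of the rest of the RFC is about validating and resolving 
what lands in them.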


I think one crossover point where it's absolutely worth using the 
external library for this purpose is if you have an app that has to 
follow all of the rules around path names.  If this project didn't 
already have libraries around for things like IP address parsing, using 
the library instead would also make more sense.  The remaining chores if 
you don't worry about all the path name trivia, and know how to 
interpret an IP address, seem feasible to do directly.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench--new transaction type

2011-06-29 Thread Greg Smith

On 06/30/2011 12:13 AM, Jeff Janes wrote:

One more thought I had, would it make sense to change this from the
creation of a PL/pgSQL permanent function to instead use the recently
added DO anonymous block syntax?  I think that would be somewhat
cleaner about leaving cruft behind in the database.  But it would
increase the overhead of each outer execution, and would also mean
that it would not be backwards compatible to run against servers
before 9.0
   


I think some measurement of the overhead difference would be needed to 
know for sure about the first part.  I suspect that given the block size 
of 512 now being targeted, that would end up not mattering very much.


pgbench's job is to generate a whole database full of cruft, so I can't 
say I'd find an argument from either side of that to be very 
compelling.  I'm not real busy anymore testing performance of PostgreSQL 
instances from before 9.0 anymore either, so whether this mode was 
compatible with them or not isn't very compelling either.  Just a mixed 
bag all around in those areas.


I would say take a look at what the performance change looks like, and 
see if it turns out to make the patch to pgbench less obtrusive.  The 
main objection against committing this code I can see is that it will 
further complicate pgbench for a purpose not many people care about.  So 
if the DO version ends up with a smaller diff and less impact on the 
codebase, that would likely be a more substantial tie-breaker in its 
favor than any of these other arguments.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 03:21 PM, Robert Haas wrote:

If I were attacking this problem, I'd be inclined to make a web
application that lists all the commits in a format roughly similar to
the git API, and then lets you tag each commit with tags from some
list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
Some of the tagging (e.g. docs-only) could probably even be done
automatically.  Then somebody could go through once a month and update
all the tags.  I'd be more more willing to volunteer to do that than I
would be to trying to get the right metadata tag in every commit...
   


I tend not to think in terms of solutions that involve web applications 
because I never build them, but this seems like a useful approach to 
consider.  Given that the list of tags is pretty static, I could see a 
table with a line for each commit, and a series of check boxes in 
columns for each tag next to it.  Then you just click on each of the 
tags that apply to that line.


Once that was done, increasing the amount of smarts that go into 
pre-populating which boxes are already filled in could then happen, with 
"docs only" being the easiest one to spot.  A really smart 
implementation here might even eventually make a good guess for "bug 
fix" too, based on whether a bunch of similar commits happened to other 
branches around the same time.  Everyone is getting better lately at 
noting the original SHA1 when fixing a mistake too, so being able to 
identify "repair" seems likely when that's observed.


This approach would pull the work from being at commit time, but it 
would still be easier to do incrementally and to distribute the work 
around than what's feasible right now.  Release note prep takes critical 
project contributors a non-trivial amount of time; this would let anyone 
who felt like tagging things for an hour help with the early stages of 
that.  And it would provide a functional source for the metadata I've 
been searching for too, to drive all this derived data about sponsors etc.


Disclaimer:  as a person who does none of this work currently, my 
suggestions are strictly aimed to inspire those who do in a direction 
that might make things easier for them.  I can get the sponsor stuff 
I've volunteered to work on finished regardless.  I just noticed what 
seems like it could be a good optimization over here while I was working 
on that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 03:28 PM, Christopher Browne wrote:

I expect that the correlation between commit and [various parties] is
something that will need to take place outside git.
   


Agreed on everything except the "Author" information that is already 
being placed into each commit.  The right data is already going into 
there, all it would take is some small amount of tagging to make it 
easier to extract programatically.



The existing CommitFest data goes quite a long ways towards capturing
interesting information (with the likely exception of sponsorship);
what it's missing, at this point, is a capture of what commit or
commits wound up drawing the proposed patch into the official code
base.


The main problem with driving this from the CommitFest app is that not 
every feature ends up in there.  Committers who commit their own work 
are one source of those.  Commits for bug fixes that end up being 
notable enough to go into the release notes are another.


I agree it would be nice if every entry marked as "Committed" in the CF 
app included a final link to the message ID of the commit closing it.  
But since I don't ever see that being the complete data set, I find it 
hard to justify enforcing that work.  And the ability to operate 
programatically on the output from "git log" is a slightly easier path 
to walk down than extracting the same from the CF app, you avoid one 
pre-processing step:  extracting the right entries in the database to 
get a list of commit IDs.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 04:52 PM, Bruce Momjian wrote:

That tagging is basically what I do on my first pass through the release
notes.  For the gory details:

http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009
   


Excellent summary of the process I was trying to suggest might be 
improved; the two most relevant bits:


3   remove insignificant items   2.7k   1 day
4   research and reword items   1k  5 days


Some sort of tagging to identify feature changes should drive down the 
time spent on filtering insignificant items.  And the person doing the 
commit already has the context you are acquiring later as "research" 
here.  Would suggesting they try to write a short description at commit 
time drive it and the "reword" phase time down significantly?  Can't say 
for sure, but I wanted to throw the idea out for 
consideration--particularly since solving it well ends up making some of 
this other derived data people would like to see a lot easier to 
generate too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 01:42 PM, Robert Haas wrote:

I am disinclined to add a "feature"
annotation.  I think it is unlikely that will end up being any more
useful than just extracting either the whole commit message or its
first line.
   


I don't see any good way to extract the list of commits relevant to the 
release notes without something like it.  Right now, you can't just mine 
every commit into the release notes without getting more noise than 
signal.  Something that tags the ones that are adding new features or 
other notable updates, as opposed to bug fixes, doc updates, etc., would 
allow that separation.



I am not inclined to try to track sponsors in the commit message at
all.


I was not suggesting that information be part of the commit.  We've 
worked out a reasonable initial process for the people working on 
sponsored features to record that information completely outside of the 
commit or release notes data.  It turns out though that process would be 
easier to drive if it were easier to derive a feature->{commit,author} 
list though--and that would spit out for free with the rest of this.  
Improving the ability to do sponsor tracking is more of a helpful 
side-effect of something that's useful for other reasons rather than a 
direct goal.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith
There's been a steady flow of messages on pgsql-advocacy since last 
month (threads "Crediting sponsors in release notes?" and "Crediting 
reviewers & bug-reporters in the release notes") talking about who/how 
should receive credit for their work on PostgreSQL.  That discussion 
seems to me to be heading in one inevitable direction:  it's not going to 
be possible to make everyone happy unless there's a way to track all of 
these things for each feature added to PostgreSQL:


-Short description for the release notes
-Feature author(s)
-Reviewers and bug reporters
-Sponsors
-Main git commit adding the feature

Now, this is clearly the job for a tool, because the idea that any 
person capable of doing this work will actually do it is 
laughable--everyone qualified is too busy.  It strikes me however that 
the current production of the release notes is itself a time consuming 
and error-prone process that could also be improved by automation.  I 
had an idea for pushing forward both these at once.


Committers here are pretty good at writing terse but clear summaries of 
new features when they are added.  These are generally distilled further 
for the release notes.  It strikes me that a little decoration of commit 
messages might go a long way toward saving time in a few areas here.  
I'll pick a simple easy example I did to demonstrate; I wrote a small 
optimization to commit_delay committed at 
http://archives.postgresql.org/message-id/e1pqp72-0001us...@gemulon.postgresql.org


This made its way into the release notes like this:

  Improve performance of commit_siblings (Greg Smith)
  This allows the use of commit_siblings with less overhead.

What if the commit message had been decorated like this?

  Feature:  Improve performance of commit_siblings

  Optimize commit_siblings in two ways to improve group commit.
  First, avoid scanning the whole ProcArray once we know there...

With that simple addition, two things become possible:

-Generating a first draft of the release notes for a new version could 
turn into a script that parses the git commit logs, which has gotta save 
somebody a whole lot of time each release.
-All of these other ways to analyze the contributors would be much 
easier to maintain.  A little "Author:" decoration to that section of 
each commit would probably be welcome too.


I'm sure someone is going to reply to this suggesting some git metadata 
is the right way to handle this, but that seems like overkill to me.  I 
think there's enough committer time gained in faster release note 
generation for this decoration to pay back its overhead, which is 
important to me--I'd want a change here to net close to zero for 
committers.  And the fact that it would also allow deriving all this 
other data makes it easier to drive the goals rising out of advocacy 
forward too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch - Debug builds without optimization

2011-06-20 Thread Greg Smith

On 06/20/2011 01:34 PM, Tom Lane wrote:

I was trying to illustrate how to have minimal turnaround time
when testing a small code change.  Rebuilding from scratch is slow
enough that you lose focus while waiting.  (Or I do, anyway.)
   


I just keep upgrading to the fastest CPU I can possibly justify to avoid 
losing focus; it goes fast with 8 cores.  I was trying to demonstrate 
that peg makes this very high level now, and I was more jousting at the 
idea that everyone should bother to write their own individual reinstall 
script.


The peg code makes it easy to assimilate whatever other neat 
optimization ideas one might come across.  I just pushed an update out 
that absorbed this one, so now if you do:


stop
peg rebuild

It uses the install-bin trick you suggested.  It even does a couple of 
sanity checks so that it will probably fall back to a regular build if 
it doesn't look like you have a good install and binary tree already.  
Maybe I'll make a "reinstall" alias that does this combination next.


I don't expect to improve your workflow.  But people who haven't already 
invested a good chunk of work in automating things already will probably 
take some time to catch up with where peg puts them on day one.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] cache estimates, cache access cost

2011-06-20 Thread Greg Smith

Kevin Grittner wrote:

But its not hard to imagine an application mix where this
feature could cause a surprising ten-fold performance drop after
someone does a table scan which could persist indefinitely.  I'm not
risking that in production without a clear mechanism to
automatically recover from that sort of cache skew


The idea that any of this will run automatically is a dream at this 
point, so saying you want to automatically recover from problems with 
the mechanism that doesn't even exist yet is a bit premature.  Some of 
the implementation ideas here might eventually lead to where real-time 
cache information is used, and that is where the really scary feedback 
loops you are right to be worried about come into play.  The idea for 
now is that you'll run this new type of ANALYZE CACHE operation 
manually, supervised and at a time where recent activity reflects the 
sort of workload you want to optimize for.  And then you should review 
its results to make sure the conclusions it drew about your cache 
population aren't really strange.


To help with that, I was thinking of writing a sanity check tool that 
showed how the cached percentages this discovers compare against the 
historical block hit percentages for the relation.  An example of how 
values changed from what they were already set to after a second ANALYZE 
CACHE is probably useful too.
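
The historical half of that comparison is already sitting in the 
statistics views; a sketch of the per-table hit percentage query the 
tool would line the ANALYZE CACHE results up against:

SELECT schemaname, relname, heap_blks_hit, heap_blks_read,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_hit + heap_blks_read DESC;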


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench--new transaction type

2011-06-20 Thread Greg Smith

Itagaki Takahiro wrote:

Anyway, I'm not sure we need to include the query mode into the pgbench's
codes. Instead, how about providing "a sample script" as a separate sql
file? pgbench can execute any script files with -f option.
  


When you execute using "-f", it doesn't correctly detect database 
scale.  Also, the really valuable thing here is seeing the higher 
selects/second number come out in the report.  I just realized neither 
Jeff nor myself ever included an example of the output in the new mode, 
which helps explain some of why the patch is built the way it is:


$ pgbench -c 12 -j 4 -T 30 -P pgbench
plgsql function created.
starting vacuum...end.
transaction type: SELECT only via plpgsql
scaling factor: 100
query mode: simple
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 9342
tps = 311.056293 (including connections establishing)
tps = 311.117886 (excluding connections establishing)
selects per second = 159260.822100 (including connections establishing)
selects per second = 159292.357672 (excluding connections establishing)

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch - Debug builds without optimization

2011-06-19 Thread Greg Smith

Greg Stark wrote:

I've always wondered what other people do to iterate quickly.


I'd have bet money you had an elisp program for this by now!

The peg utility script I use makes a reinstall as simple as:

stop
peg build

The UI for peg is still is a little rough around switching to another 
project when using git, and the PGDATA handling could be better.  Being 
able to give each patch I want to play with its own binary+data tree 
with a couple of simple commands is the time consuming part to setup I 
wanted to automate completely, and for that it works great:  
https://github.com/gregs1104/peg


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] cache estimates, cache access cost

2011-06-19 Thread Greg Smith

On 06/19/2011 06:15 PM, Kevin Grittner wrote:

I think the point is that if, on a fresh system, the first access to
a table is something which uses a tables scan -- like select count(*)
-- that all indexed access would then  tend to be suppressed for that
table.  After all, for each individual query, selfishly looking at
its own needs in isolation, it likely *would* be faster to use the
cached heap data.
   


If those accesses can compete with other activity, such that the data 
really does stay in the cache rather than being evicted, then what's 
wrong with that?  We regularly have people stop by asking for how to pin 
particular relations to the cache, to support exactly this sort of scenario.


What I would expect on any mixed workload is that the table would 
slowly get holes shot in it, as individual sections were evicted for 
more popular index data.  And eventually there'd be little enough left 
for it to win over an index scan.  But if people keep using the copy of 
the table in memory instead, enough so that it never really falls out of 
cache, well that's not necessarily even a problem--it could be 
considered a solution for some.


The possibility that people can fit their entire table into RAM and it 
never leaves there is turning downright probable in some use cases now.  
A good example is cloud instances using EC2, where people often 
architect their systems such that the data set put onto any one node 
fits into RAM.  As soon as that's not true you suffer too much from disk 
issues, so breaking the databases into RAM sized pieces turns out to be 
very good practice.  It's possible to tune fairly well for this case 
right now--just make the page costs all low.  The harder case that I see 
a lot is where all the hot data fits into cache, but there's a table or 
two of history/archives that don't.  And that would be easier to do the 
right thing with given this bit of "what's in the cache?" percentages.
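
The "tune for the fully cached case" approach mentioned there doesn't 
take much; something along these lines, with values that are purely 
illustrative:

SET seq_page_cost = 1.0;
SET random_page_cost = 1.0;         -- random reads cost the same once cached
SET effective_cache_size = '48GB';  -- sized to match the node's RAM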


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench--new transaction type

2011-06-19 Thread Greg Smith
I applied Jeff's patch but changed this to address concerns about the 
program getting stuck running for too long in the function:


#define plpgsql_loops   512

This would be better named as "plpgsql_batch_size" or something similar 
instead, the current name suggests it's how many loops to run which is 
confusing.


My main performance concern here was whether this change really mattered 
so much once a larger number of clients were involved.  Some of the 
other things you can do to optimize single-client performance aren't as 
useful with lots of them.  Here's how the improvements in this mode 
worked for me on a server with 4 Hyper-Threaded cores (i870); 
shared_buffers=256MB, scale=100:


1 client:
-S: 11533
-S -M prepared: 19498
-P: 49547

12 clients, 4 workers:
-S:  56052
-S -M prepared: 82043
-P: 159443

96 clients, 8 workers:
-S: 49940
-S -M prepared: 76099
-P: 137942

I think this is a really nice new workload to demonstrate.  One of the 
things we tell people is that code works much faster when moved 
server-side, but how much faster isn't always easy to show.  Having this 
mode available lets them see how dramatic that can be quite easily.  I 
know I'd like to be able to run performance tests for clients of new 
hardware using PostgreSQL and tell them something like this:  "With 
simple clients executing a statement at a time, this server reaches 56K 
SELECTs/second.  But using server-side functions to execute them in 
larger batches it can do 159K".
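To make that contrast concrete, here is a minimal sketch of the sort of 
server-side batching involved.  The function name and loop are 
hypothetical, only loosely modeled on what the patch does, and assume a 
standard pgbench_accounts table:

CREATE OR REPLACE FUNCTION select_only_batch(scale integer, nloops integer)
RETURNS void AS $$
DECLARE
    bal integer;
    naccounts integer := 100000 * scale;
BEGIN
    -- Run a whole batch of SELECTs against random accounts without any
    -- client/server round trips in between.
    FOR i IN 1..nloops LOOP
        SELECT abalance INTO bal
          FROM pgbench_accounts
         WHERE aid = 1 + (random() * (naccounts - 1))::integer;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- One "transaction" of 512 SELECTs at scale factor 100:
SELECT select_only_batch(100, 512);

Each call replaces hundreds of client round trips with one, which is 
where most of the throughput difference comes from.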


The value this provides as an alternate source of benchmark load 
generation, with a very different profile for how it exercises the 
server, is good too.


Things to fix in the patch before it would be a commit candidate:

-Adjust the loop size/name, per above
-Reformat some of the longer lines to try and respect the implied right 
margin in the code formatting

-Don't include the "plgsql function created." line unless in debugging mode.
-Add the docs.  Focus on how this measures how fast the database can 
execute SELECT statements using server-side code.  An explanation that 
the "transaction" block size is 512 is important to share.  It also 
needs a warning that time based runs ("-T") may have to wait for a block 
to finish and go beyond its normally expected end time.
-The word "via" in the "transaction type" output description is probably 
not the best choice.  Changing to "SELECT only using PL/pgSQL" would 
translate better, and follow the standard casing for the name of that 
language.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] cache estimates, cache access cost

2011-06-19 Thread Greg Smith

On 06/19/2011 09:38 AM, Greg Stark wrote:

There's another problem which I haven't seen mentioned. Because the
access method will affect the cache there's the possibility of
feedback loops. e.g. A freshly loaded system prefers sequential scans
for a given table because without the cache the seeks of random reads
are too expensive...


Not sure if it's been mentioned in this thread yet, but the feedback 
issue has popped up in regards to this area plenty of times.  I think 
everyone who's producing regular input into this is aware of it, even if 
it's not mentioned regularly.  I'm not too concerned about the specific 
case you warned about because I don't see how sequential scan vs. index 
costing will be any different on a fresh system than it is now.  But 
there are plenty of cases like it to be mapped out here, and many are 
not solvable--they're just something that needs to be documented as a risk.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-16 Thread Greg Smith

On 06/16/2011 05:27 PM, Bruce Momjian wrote:

Greg Smith wrote:
   

-It is still useful to set current_query to descriptive text in the
cases where the transaction is <IDLE> etc.
 

Uh, if we are going to do that, why not just add the boolean columns to
the existing view?  Clearly renaming procpid isn't worth creating
another view.
   


I'm not completely set on this either way; that's why I suggested a 
study that digs into typical monitoring system queries would be useful.  
Even the current view is pushing the limits for how much you can put 
into something that intends to be human-readable though.  Adding a new 
pile of columns to it has some downsides there.


I hadn't ever tried to write down everything I'd like to see changed 
here until this week, so there may be further column churn that 
justifies a new view too.  I think the whole idea needs to get chewed on 
a bit more.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-16 Thread Greg Smith

On 06/15/2011 12:41 PM, Robert Haas wrote:

But I will note that we had better be darn sure to make all the changes we
want to make in one go, because I dowanna have to create pg_sessions2
(or pg_tessions?) in a year or three.
   


I just added a new section to the TODO to start collecting up some of 
these related ideas into one place:  
http://wiki.postgresql.org/wiki/Todo#Monitoring so we might try to get 
as many as possible all in one go.


The other item on there related to pg_stat_activity that might impact 
this design was adding a column for tracking progress of commands like 
CREATE INDEX and VACUUM (I updated to note CLUSTER falls into that 
category too).  While query progress will always be a hard problem, 
adding a field to store some sort of progress indicator might be useful 
even if it only worked on these two initially.  Anyway, topic for 
another time.


The only other item related to this view on the TODO was "Have 
pg_stat_activity display query strings in the correct client encoding".  
That might be worthwhile to bundle into this rework, but it doesn't seem 
something that impacts the UI such that it must be considered early.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch - Debug builds without optimization

2011-06-16 Thread Greg Smith

On 06/16/2011 10:10 AM, Tom Lane wrote:

I could see providing some other nonstandard configure switch that
changed the default -O level ... but realistically, would that do
anything that you couldn't already do by setting CFLAGS, ie

./configure CFLAGS="-O0 -g"
   


I think a small discussion of the issue Radek ran into is appropriate to 
put somewhere, with this example.  The install procedure section of the 
docs already includes a CFLAGS example:


./configure CC=/opt/bin/gcc CFLAGS='-O2 -pipe'

There is also a section talking about setting options like 
--enable-cassert in the Developer's FAQ.  Looking at all the info out 
there about developer/debug builds, it's really kind of sketchy and 
distributed though.  No one place that pulls all the most common things 
people need together into one resource.


What seems like the ideal solution here is to add a new section to the 
install procedure with brief coverage of this entire area.  Here's a 
prototype of text that might go there:


= Installation for development and debugging =

When modifying the PostgreSQL source code, or when trying to find the 
source of a bug in the program, it may be helpful to build the program 
in a way that makes this process easier.  There are build-time only 
changes that enable better error checking and debugging, including:


Pass --enable-cassert to configure. This can make bugs more visible, 
because they cause operations to abort with a clear error.  That makes 
some types of debugging much easier.  This is risky on a production 
server, as described in the documentation for this parameter.


Pass --enable-debug to configure. This provides better information about 
what the server is doing when looking at it using a debugger.  It's less 
risky to a production server than enabling assertions, and it normally 
has less of a performance impact too.  See its documentation for more 
details.


Disable compiler optimization.  When using a debugger to trace into the 
source code of the server, steps may be optimized away by the normal 
build process.  In some situations --enable-debug will disable such 
optimization, but this is not always the case.  Specifically disabling 
optimization is possible with many compilers by setting the compiler 
flags when configuring the source code build, such as:


./configure CFLAGS="-O0 -g"

This example for the gcc compiler disables optimizations, and tells the 
compiler to provide extra debugging information most useful with the gdb 
debugger.
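
As a quick sanity check that a rebuilt server actually picked up 
--enable-cassert, the setting can be inspected from SQL (a minimal 
example; pg_config --configure shows the full set of configure options 
that were used for the build):

-- Reports "on" in an assertion-enabled build
SHOW debug_assertions;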


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-16 Thread Greg Smith
Since the CF is upon us and discussion is settling, let's see if I can 
wrap this bikeshedding up into a more concrete proposal that someone can 
return to later.  The ideas floating around have gelled into:


-Add a new pg_stat_sessions function that is implemented similarly to 
pg_stat_activity.  For efficiency and simplicity sake, internally this 
will use the same sort of SRF UI that pg_stat_get_activity does inside 
src/backend/utils/adt/pgstatfuncs.c  There will need to be some 
refactoring here to reduce code duplication between that and the new 
function (which will presumably named pg_stat_get_sessions)


-The process ID field here will be named "pid" to match other system 
views, rather than the current "procpid"


-State information such as whether the session is idle, idle in a 
transaction, or has a query visible to this backend will be presented as 
booleans similar to the current waiting field.  A possible additional 
state to expose is the concept of "active", which ends up being derived 
using logic like "visible && !idle && !idle_transaction && !waiting" in 
some monitoring systems.  A sketch of the sort of query these booleans 
enable appears after this list.


-A case could be made for making some of these state fields null, 
instead of true or false, in situations where the session is not visible.  
If you don't have rights to see the connection activity, setting idle, 
idle_transaction, and active all to null may be the right thing to do.  
More future bikeshedding is likely on this part, once an initial patch 
is ready for testing.  I'd want to get some specific tests against the 
common monitoring goals of tools like check_postgres and the Munin 
plug-in to see which implementation makes more sense for them as input 
on that.


-It is still useful to set current_query to descriptive text in the 
cases where the transaction is <IDLE> etc.  That text is not ambiguous 
with a real query, it is useful for a human-readable view, and it 
improves the potential for pg_stat_sessions to fully replace a 
deprecated pg_stat_activity (instead of just co-existing with it).  That 
the query text is overloaded with this information seems agreed to be a 
good thing; it's just that filtering on the state information there 
should not require parsing it.  The additional booleans will handle 
that.  If idle sessions can be filtered using "WHERE NOT idle", whether 
the current_query for them reads "<IDLE>" or is null won't matter to 
typical monitoring use.  Given no strong preference there, using 
"<IDLE>" is both familiar and more human readable.


I'll go add this as a TODO now.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-16 Thread Greg Smith

On 06/15/2011 04:13 AM, Rainer Pruy wrote:

I much prefer reading an "<IDLE> in transaction" on a quick glance
over having to search a column and recognize a "t" from an "f"
to find out whether there is a transaction pending or not.
   


This is a fair observation.  If we provide a second view here that 
reorganizes the data toward something more appropriate for monitoring 
systems to process it, you may be right that the result will be a step 
backwards for making it human-readable.  They may end up being similar, 
co-existing views aimed at different uses, rather than one clearly 
replacing the other one day.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-15 Thread Greg Smith
Here's the sort of thing every person who writes a monitoring tool 
involving pg_stat_activity goes through:


1) Hurray!  I know how to see what the database is doing now!  Let me 
try counting all the connections so I can finally figure out what to set 
[max_connections | work_mem | other] to.
2) Wait, some of these can be "<IDLE>".  That's not documented.  I'll 
have to special case them because they don't really matter for my 
computation.
3) Seriously, there's another state for idle in a transaction?  Just how 
many of these special values are there?  [There's actually one more 
surprise after this]


The whole thing is enormously frustrating, and it's an advocacy 
problem--it contributes to people just starting to become serious about 
using PostgreSQL lowering their opinion of its suitability for their 
business.  If this is what's included for activity monitoring, and it's 
this terrible, it suggests people must not have very high requirements 
for that.


And what you end up with to make it better is not just another few 
keystrokes.  Here, as a common example I re-use a lot, is a decoder 
inspired by Munin's connection count monitoring graph:


SELECT
    waiting,
    CASE WHEN current_query='<IDLE>' THEN true ELSE false END AS idle,
    CASE WHEN current_query='<IDLE> in transaction' THEN true
         ELSE false END AS idletransaction,
    CASE WHEN current_query='<insufficient privilege>' THEN false
         ELSE true END AS visible,
    CASE WHEN NOT waiting AND current_query NOT IN
              ('<IDLE>', '<IDLE> in transaction', '<insufficient privilege>')
         THEN true ELSE false END AS active,
    procpid,
    current_query
FROM pg_stat_activity WHERE procpid != pg_backend_pid();

What percentage of people do you think get this right?  Now, what does 
that number go to if these states were all obviously exposed booleans?  
As far as I'm concerned, this design is fundamentally flawed as currently 
delivered, so the concept of "breaking" it doesn't really make sense.


The fact that you can only figure all this decoding magic out through 
extensive trial and error, or reading the source code to [the database | 
another monitoring tool], is crazy.  It's a much bigger problem than the 
fact that the pid column is misnamed, and way up on my list of things 
I'm just really tired of doing.  Yes, we could just document all these 
mystery states to help, but they'd still be terrible.


This is a database; let's expose the data in a way that it's easy to 
slice yourself using a database query.  And if we're going to fix 
that--which unfortunately will be breaking it relative to those already 
using the current format--I figure why not bundle the procpid fix into 
that while we're at it.  It's even possible to argue that breaking that 
small thing will draw useful attention to the improvements in other 
parts of the view.  Having your monitoring query break after a version 
upgrade is no fun.  But if investigating why reveals new stuff you 
didn't notice in the release notes, the changes become more 
discoverable, albeit in a somewhat perverse way.


Putting on my stability hat instead of my "make it right" one, maybe 
this really makes sense to expose as a view with a whole new name.  Make 
this new one pg_activity (there's no stats here anyway), keep the old 
one around as pg_stat_activity for a few releases until everyone has 
converted to the new one.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-14 Thread Greg Smith

On 06/14/2011 06:00 PM, Tom Lane wrote:

As far as Greg's proposal is concerned, I don't see how a proposed
addition of two columns would justify renaming an existing column.
Additions should not break any sanely-implemented application, but
renamings certainly will.
   


It's not so much justification as something that makes the inevitable 
complaints easier to stomach, in terms of not leaving a really bad taste 
in the user's mouth.  My thinking is that if we're going to mess with 
pg_stat_activity in a way that breaks something, I'd like to see it 
completely refactored for better usability in the process.  If code 
breaks and the resulting investigation by the admin highlights something 
new, that offsets some of the bad user experience resulting from the 
breakage.


Also, I haven't fully worked out whether it makes sense to really change 
what current_query means if the idle/transaction component of it gets 
moved to another column.  Would it be better to set current_query to 
null if you are idle, rather than the way it's currently overloaded with 
text in that case?  I don't like the way this view works at all, but I'm 
not sure the best way to change it.  Just changing procpid wouldn't be 
the only thing on the list though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] cache estimates, cache access cost

2011-06-14 Thread Greg Smith

On 06/14/2011 07:08 PM, Tom Lane wrote:

I concur with Robert's desire to not push experimental code into the
main repository, but we need to have *some* way of working with it.
Maybe a separate repo where experimental branches could hang out would
be helpful?
   


Well, this one is sitting around in branches at both git.postgresql.org 
and github so far, both being updated periodically.  Maybe there's some 
value around an official experimental repository too, but I thought that 
was the idea of individual people having their own directories on 
git.postgresql.org.  Do we need something fancier than that?  It would be 
nice, but there seems little return on investment in improving that, relative to 
what you can do easily enough now.


The idea David Fetter has been advocating of having a "bit rot" farm to 
help detect when the experimental branches drift too far out of date 
tries to make that concept really formal.  I like that idea, too, but 
find it hard to marshal enough resources to do something about it.  The 
current status quo isn't that terrible; noticing bit rot when it's 
relevant isn't that hard to do.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] cache estimates, cache access cost

2011-06-14 Thread Greg Smith

On 06/14/2011 01:16 PM, Robert Haas wrote:

But there's no reason that code (which may or may not eventually prove
useful) has to be incorporated into the main tree.  We don't commit
code so people can go benchmark it; we ask for the benchmarking to be
done first, and then if the results are favorable, we commit the code.
   


Who said anything about this being a commit candidate?  The "WIP" in the 
subject says it's not intended to be.  The community asks people to 
submit design ideas early so that ideas around them can be explored 
publicly.  One of the things that needs to be explored, and that could 
use some community feedback, is exactly how this should be benchmarked 
in the first place.  This topic--planning based on cached 
percentage--keeps coming up, but hasn't gone very far as an abstract 
discussion.  Having a patch to test lets it turn to a concrete one.


Note that I already listed myself as the reviewer  here, so it's not 
even like this is asking explicitly for a community volunteer to help.  
Would you like us to research this privately and then dump a giant patch 
that is commit candidate quality on everyone six months from now, 
without anyone else getting input to the process, or would you like the 
work to happen here?  I recommended Cédric not ever bother soliciting 
ideas early, because I didn't want to get into this sort of debate.  I 
avoid sending anything here unless I already have a strong idea about 
the solution, because it's hard to keep criticism at bay even with 
that.  He was more optimistic about working within the community 
contribution guidelines and decided to send this over early instead.  If 
you feel this is too rough to even discuss, I'll mark it returned with 
feedback and we'll go develop this ourselves.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-14 Thread Greg Smith

On 06/14/2011 02:20 PM, Kevin Grittner wrote:
Just on our Wiki pages we have some queries available for copy/paste 
which would need multiple

versions while both column names were in supported versions of the
software:

http://wiki.postgresql.org/wiki/Lock_dependency_information
http://wiki.postgresql.org/wiki/Lock_Monitoring
http://wiki.postgresql.org/wiki/Backend_killer_function
   


...and most of these would actually be simplified if they could just 
JOIN on pid instead of needing this common idiom:


   join pg_catalog.pg_stat_activity ka
   on kl.pid = ka.procpid

Yes, there are a lot of these floating around.  I'd bet that in an hour 
of research I could find 95% of them though, and make sure they were all 
updated in advance of the release.  (I already did most of this search 
as part of stealing every good idea I could find in this area for my book)



I think that's consistent with the "save up our breaking changes to do 
them all at once" approach.
   


I don't actually buy into this whole idea at all.  We already have this 
big wall at 8.3 because changes made in that release are too big for 
people on the earlier side to upgrade past.  I'd rather see a series of 
smaller changes in each release, even if they are disruptive, so that no 
one version turns into a frustrating hurdle seen as impossible to 
clear.  This adjustment is a perfect candidate for putting into 9.2 to 
me, because I'd rather reduce max(breakage) across releases than 
intentionally aim at increasing it by bundling them into larger clumps.


For me, the litmus test is whether the change provides enough 
improvement that it outweighs the disruption when the user runs into 
it.  This is why I suggested a specific, useful, and commonly requested 
(to me at least) change to pg_stat_activity go along with this.  If 
people discover their existing pg_stat_activity tools break, presumably 
they're going to look at the view again to see what changed.  When they 
do that, I don't want the reaction to be "why was this random change 
made?"  I want it to be "look, there are useful new fields in here; let 
me see if I can use them too here".  That's how you make people tolerate 
disruption in upgrades.  If they see a clear improvement in the same 
spot when forced to fix around it, the experience is much more pleasant 
if they get something new out of it too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] procpid?

2011-06-14 Thread Greg Smith

On 06/14/2011 11:44 AM, Jim Nasby wrote:
Wouldn't it be better still to have both the new and old columns 
available for a while? That would produce the minimum amount of 
disruption to tools, etc.


Doing this presumes the existence of a large number of tools where the 
author is unlikely to be keeping up with PostgreSQL development.  I 
don't believe that theorized set of users actually exists.  There are 
people who use pg_stat_activity simply, and there are tool authors who 
are heavily involved enough that they will see a change here coming far 
enough in advance to adopt it without disruption.  If there's a large 
base of "casual" tool authors, who wrote something using 
pg_stat_activity once and will never update it again, I don't know where 
they are.


Anyway, I want a larger change to pg_stat_activity than this one, and I 
would just roll fixing this column name into that more disruptive and 
positive change.  Right now the biggest problem with this view is that 
you have to parse the text of the query to figure out what state the 
connection is in.  This is silly; there should be boolean values exposed 
for "idle" and "in transaction".  I want to be able to write things like 
this:


SELECT idle,in_trans,count(*) FROM pg_stat_activity GROUP BY idle,in_trans;
SELECT min(backend_start) FROM pg_stat_activity WHERE idle;

Right now the standard approach to this is to turn current_query into a 
derived state value using CASE statements.  It's quite unfriendly, and a 
bigger problem than this procpid mismatch.  Fix that whole mess at once, 
and now you've got something useful enough to justify breaking tools.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] cache estimates, cache access cost

2011-06-14 Thread Greg Smith

On 06/14/2011 11:04 AM, Robert Haas wrote:

Even if the data were accurate and did not cause plan stability, we
have no evidence that using it will improve real-world performance.
   


That's the dependency Cédric has provided us a way to finally make 
progress on.  Everyone says there's no evidence that this whole approach 
will improve performance.  But we can't collect such data, to prove or 
disprove it helps, without a proof of concept patch that implements 
*something*.  You may not like the particular way the data is collected 
here, but it's a working implementation that may be useful for some 
people.  I'll take "data collected at ANALYZE time" as a completely 
reasonable way to populate the new structures with realistic enough test 
data to use initially.


Surely at least one other way to populate the statistics, and possibly 
multiple other ways that the user selects, will be needed eventually.  I 
commented a while ago on this thread:  every one of these discussions 
always gets dragged into the details of how the cache statistics data 
will be collected and rejects whatever is suggested as not good enough.  
Until that stops, no progress will ever get made on the higher level 
details.  By its nature, developing toward integrating cached 
percentages is going to lurch forward on both "collecting the cache 
data" and "using the cache knowledge in queries" fronts almost 
independently.  This is not a commit candidate; it's the first useful 
proof of concept step for something we keep talking about but never 
really doing.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Greg Smith

On 06/13/2011 07:55 AM, Stefan Kaltenbrunner wrote:

all those tests are done with pgbench running on the same box - which
has a noticable impact on the results because pgbench is using ~1 core
per 8 cores of the backend tested in cpu resoures - though I don't think
it causes any changes in the results that would show the performance
behaviour in a different light.
   


Yeah, this used to make a much bigger difference, but nowadays it's not 
so important.  So long as you have enough cores that you can spare a 
chunk of them to drive the test with, and you crank "-j" up to a lot, 
there doesn't seem to be much of an advantage to moving the clients to a 
remote system now.  You end up trading off CPU time for everything going 
through the network stack, which adds yet another set of uncertainty to 
the whole thing anyway.


I'm glad to see so many people have jumped onto doing these SELECT-only 
tests now.  The performance farm idea I've been working on runs a test 
just like what's proven useful here.  I'd suggested that because it's 
been really sensitive to changes in locking and buffer management for me.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Greg Smith

On 06/13/2011 08:27 PM, Jeff Janes wrote:

pgbench sends each query (per connection) and waits for the reply
before sending another.

Do we know whether sysbench does that, or if it just stuffs the
kernel's IPC buffer full of queries without synchronously waiting for
individual replies?
   


sysbench creates a thread for each client and lets them go at things at 
whatever speed they can handle.  You have to setup pgbench with a worker 
per core to get them even close to level footing.  And even in that 
case, sysbench has a significant advantage, because it's got the 
commands it runs more or less hard-coded in the program.  pgbench is 
constantly parsing things in its internal command language and then 
turning them into SQL requests.  That's flexible and allows it to be 
used for some neat custom things, but it uses a lot more resources to 
drive the same number of clients.



I can't get sysbench to "make" for me, or I'd strace in single client
mode and see what kind of messages are going back and forth.
   


If you're using a sysbench tarball, no surprise.  It doesn't build on 
lots of platforms now.  If you grab 
http://projects.2ndquadrant.it/sites/default/files/bottom-up-benchmarking.pdf 
it has my sysbench notes starting on page 34.  I had to checkout the 
latest version from their development repo to get it to compile on any 
recent system.  The attached wrapper script may be helpful to you as 
well to help get over the learning curve for how to run the program; it 
iterates sysbench over a number of database sizes and thread counts 
running the complicated to setup OLTP test.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


#!/bin/bash

SB="$HOME/0.4/sysbench/sysbench"

DB="--pgsql-user=postgres --pgsql-password=password --pgsql-db=sysbench 
--pgsql-host=localhost --db-driver=pgsql"
#DB="--mysql-user=root --db-driver=mysql --mysql-table-engine=innodb 
--mysql-db=sysbench"
#THREADS="1 2 3 4 5 6 7 8 9 10 11 12 16 24 32 48 64 96"
THREADS="1"
TIME=10
TEST_PARAM="--oltp-read-only=off --oltp-test-mode=complex"
#TEST_PARAM="--oltp-read-only=on --oltp-test-mode=simple"
SIZES="1"
#SIZES="1 10 100 1000 5000 1 5 10"
# In complex, non read-only mode, there will be duplicate key issues, and
# threads will fail with deadlock--causing no value to be returned.
#TEST_PARAM="--oltp-test-mode=complex"

for s in $SIZES
do

SIZE_PARAM="--oltp-table-size=$s"

# Just in case!
$SB $DB --test=oltp cleanup

$SB $DB $SIZE_PARAM --test=oltp prepare 

for t in $THREADS
do
  echo -n $t threads:
  $SB $DB --test=oltp $TEST_PARAM $SIZE_PARAM --init-rng --max-requests=0 \
    --max-time=$TIME --num-threads=$t run 2>&1 # | grep "read/write requests"
done

$SB $DB --test=oltp cleanup

done

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench--new transaction type

2011-06-12 Thread Greg Smith

On 06/11/2011 03:21 PM, Jeff Janes wrote:

I wouldn't expect IPC chatter to show up in profiling, because it
costs wall time, but not CPU time.  The time spent might be attributed
to the kernel, or to pgbench, or to nothing at all.
   


Profilers aren't necessarily just accumulating raw CPU time though.  If 
the approach includes sampling "what code is active right now?" 
periodically, you might be able to separate this out even though it's 
not using CPU time in the normal fashion.  I think you might just need 
to use a better profiler.


Anyway, the sort of breakdown this helps produce is valuable 
regardless.  I highlighted the statement you made because I reflexively 
challenge theorizing about code hotspots on general principle.  The 
measurement-based breakdown you provided was more what I look for.




But there is no
repository of such "useful for developer testing" patches, other than
this mailing list.  That being the case, would it even be worthwhile
to fix it up more and submit it to commitfest?
   


The activity around profiling pgbench and trying to crack one of the 
bottlenecks has been picking up a lot of momentum lately, and if we're 
lucky that will continue throughout 9.2 development.  As such, now seems 
a good time as any to consider adding something like this.  We may end 
up reskinng lots of pgbench before this is over.  I added your patch to 
the CommitFest.



So at a loop of 512, you would have overhead of 59.0/512=0.115 out of
total time of 17.4, or 0.7% overhead.  So that should be large enough.
   


That I think is workable.  If the split was a compile time constant 
fixed at 512 unless you specifically changed it, even the worst typical 
cases shouldn't suffer much from batch overhang.  If you create a 
database so large that you only get 50 TPS, which is unusual but not 
that rare, having a 512 execution batch size would mean you might get 
your "-T" set end time lagging 10 seconds behind its original plan.  
Unlike the 10K you started with, that is reasonable; that does sound 
like the sweet spot where overhead is low but time overrun isn't too 
terrible.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Formatting curmudgeon HOWTO

2011-06-12 Thread Greg Smith
With another round of GSoC submissions approaching, I went looking 
around for some better guidance on the topic of how to follow terse 
submission guidelines like "blend in with the surrounding code" or 
"remove spurious whitespace".  And I didn't find any.  Many mature 
open-source projects say things like that, but I haven't been able to 
find a tutorial of just what that means, or how to do it.


Now we have http://wiki.postgresql.org/wiki/Creating_Clean_Patches to 
fill that role, which fits in between "Working with Git" and "Submitting 
a Patch" as a fairly detailed walkthrough.  That should be an easier URL 
to point people who submit malformed patches toward than the 
documentation we've had before.


Advocacy aside:  this page might be a good one to submit to sites that 
publish open-source news.  It's pretty generic advice, is useful but not 
widely documented information, and it reflects well on our development 
practice.  I'm trying to reverse the perception we hear about sometimes 
that submitting patches to PostgreSQL is unreasonably difficult.  Seeing 
an example of how much easier it is to read a well formatted patch 
serves well for making people understand why the project has high 
expectations for formatting work.  It's not pedantic, it's functionally 
better.  I threw it onto reddit as a first spot to popularize:  
http://www.reddit.com/r/technology/comments/hy0aq/creating_clean_patches_with_git_diff/


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

