Re: [HACKERS] Notes on implementing URI syntax for libpq
On 11/24/2011 05:21 AM, Alvaro Herrera wrote:
> A coworker also suggested using a different designator:
> postgresqli:///path/to/socket:5433/database
> postgresqli://:5433/database

This is not unprecedented. An example is how CUPS handles this problem when connecting printers using URIs:

http://www.cups.org/documentation.php/network.html

where you might see this for the usual port:

lpd://ip-address-or-hostname/queue

And this for AppSocket AKA JetDirect:

socket://ip-address-or-hostname

I am certainly not going to defend printing setup with CUPS as a model worth emulating, just noting the similarity here. I think we'll save miles of user headaches if there's only one designator.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
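To make the single-designator trade-off concrete, here is a rough sketch (Python, purely illustrative — parse_pg_uri and its return shape are my invention, not libpq's API) of how one designator could carry both forms: an empty authority plus a path starting with '/' signals a Unix socket directory, with an optional ':port' suffix as in the postgresqli:///path/to/socket:5433/database example from the thread.

```python
from urllib.parse import urlparse

def parse_pg_uri(uri):
    """Illustrative only: split a connection URI into (host, port, dbname).

    A URI with an empty authority and a path starting with '/' is read as
    a Unix socket directory, with an optional ':port' suffix before the
    database name. parse_pg_uri is a hypothetical helper, not libpq code.
    """
    parsed = urlparse(uri)
    if parsed.netloc or not parsed.path.startswith('/'):
        # Ordinary TCP form: scheme://host:port/database
        return parsed.hostname, parsed.port, parsed.path.lstrip('/')
    # Socket form: everything up to the last '/' is the socket directory
    sockpath, _, dbname = parsed.path.rpartition('/')
    if ':' in sockpath:
        sockdir, _, port = sockpath.rpartition(':')
        return sockdir, int(port), dbname
    return sockpath, None, dbname
```

The point of the sketch is just that the socket form stays unambiguous under one designator as long as socket paths must be absolute; whether that is the right user-facing rule is exactly what the thread is debating.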
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11/27/2011 04:39 PM, Ants Aasma wrote:
> On the AMD I saw about 3% performance drop with timing enabled. On the
> Intel machine I couldn't measure any statistically significant change.

Oh no, it's party pooper time again. Sorry I have to be the one to do it this round.

The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. I've been annoyed enough by that problem to spend some time digging into why that is--seems to be a bunch of trivia around the multiple ways to collect time info on x86 systems--and after this CommitFest is over I was already hoping to dig through my notes and start quantifying that more. So you can't really prove the overhead of this approach is acceptable just by showing two examples; we need to find one of the really terrible clocks and test there to get a real feel for the worst-case.

I recall a patch similar to this one was submitted by Greg Stark some time ago. It used the info for different reasons--to try and figure out whether reads were cached or not--but I believe it withered rather than being implemented mainly because it ran into the same fundamental roadblocks here. My memory could be wrong here, there were also concerns about what the data would be used for.

I've been thinking about a few ways to try and cope with this whole class of timing problem:

-Document the underlying problem and known workarounds, provide a way to test how bad the overhead is, and just throw our hands up and say "sorry, you just can't instrument like this" if someone has a slow system.

-Have one of the PostgreSQL background processes keep track of a time estimate on its own, only periodically pausing to sync against the real time. Then most calls to gettimeofday() can use that value instead.
I was thinking of that idea for slightly longer running things though; I doubt that can be made accurate enough to instrument buffer I/O timing.

And while I hate to kick off massive bike-shedding in your direction, I'm also afraid this area--collecting stats about how long individual operations take--will need a much wider ranging approach than just looking at the buffer cache ones. If you step back and ask "what do people expect here?", there's a pretty large number who really want something like Oracle's v$session_wait and v$system_event interface for finding the underlying source of slow things. There's enough demand for that that EnterpriseDB has even done some work in this area too; what I've been told about it suggests the code isn't a great fit for contribution to community PostgreSQL though. Like I said, this area is really messy and hard to get right.

Something more ambitious like the v$ stuff would also take care of what you're doing here; I'm not sure that what you've done helps build it though. Please don't take that personally. Part of one of my own instrumentation patches recently was rejected out of hand for the same reason, just not being general enough.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
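Quantifying "how bad is this clock" can start with something as simple as timing the timer itself. A rough sketch of the idea (Python used here for brevity; a serious harness would loop over gettimeofday()/clock_gettime() from C, and no numbers here are claims about any particular system):

```python
import time

def clock_read_cost(samples=100_000):
    """Estimate the average cost of one clock read by timing a tight
    loop of clock reads.  On a machine stuck with a slow timer source
    (e.g. a VM falling back from the TSC to acpi_pm), this figure can be
    orders of magnitude worse than on bare metal with a stable TSC."""
    start = time.perf_counter()
    for _ in range(samples):
        time.perf_counter()
    elapsed = time.perf_counter() - start
    return elapsed / samples  # seconds per clock read
```

The same loop around the actual timestamp call PostgreSQL uses, run on one of the "really terrible clocks", is what would separate the couldn't-measure-any-change systems from the worst case.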
Re: [HACKERS] logging in high performance systems.
On 11/23/2011 09:28 PM, Theo Schlossnagle wrote:
> The second thing I did was write a sample use of those hooks to
> implement a completely non-blocking fifo logger. (if it would block,
> it drops the log line). The concept is that we could run this without
> risk of negative performance impact due to slow log reading (choosing
> to drop logs in lieu of pausing). And a simple process could be
> written to consume from the fifo.

This was one of the topics at the last developer's meeting you might not have seen go by:

http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting#Improving_Logging

There was a reference to a pipe-based implementation from Magnus that I haven't gotten a chance to track down yet.

I think this area is going to start hitting a lot more people in the upcoming couple of years, since I'm seeing it increasingly at two customers I consider "canary in a coal mine" sentinels for performance issues. I'm now roughly considering three types of users here:

-Don't care about the overhead of logging, but are sick of parsing text files. Would prefer the data be in a table instead.

-Concerned enough about overhead that statement-level logging is impractical whether it goes to a log file or a table, but can cope with logging for other things.

-Logging rate can burst high enough that messages must start being dropped instead, no matter where they go.

Before making a big change, log file vs. table needs to be carefully explored to figure out which of the two approaches has more reasonable behavior/performance trade-offs. I've been trying to attack this starting at the middle, with the pg_stat_statements rework Peter here did for the current CommitFest. If you've already worked out a way to simulate heavy logging as part of what you've done here, I'd be quite interested to hear how capable you feel it is for the class of problem you're seeing.
I've always assumed that pushing the most common queries into shared memory and only showing them on demand, rather than logging them line at a time, could be a big win for some places. We're still a bit light on benchmarks proving that is the case so far though. My assumption has been that eventually a lossy logger was going to be necessary for busier sites, I just haven't been suffering from one enough to hack on it yet.

If it's possible to work this out in enough detail to figure out where the hooks go, and to prove they work with at least one consumer of them, I'd consider that a really useful thing to try and squeeze into 9.2. The processing parts can always be further improved later based on production feedback, going along with my recent theme of letting extensions that poke and probe existing hooks be one place to brew next-version features at.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
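The drop-rather-than-block behavior Theo describes is easy to model. This is an illustrative sketch only (the real proposal hooks elog output and writes to a fifo from C; DroppingLogger and its drop counter are my invention): put the descriptor in non-blocking mode and count a dropped line whenever the write would have stalled the backend.

```python
import os

class DroppingLogger:
    """Toy model of a non-blocking log writer: if the pipe/fifo is full
    because the consumer is slow, drop the line instead of pausing.
    A real implementation must also guard against partial writes tearing
    a line in half; that detail is skipped here."""

    def __init__(self, fd):
        os.set_blocking(fd, False)  # writes now fail fast instead of waiting
        self.fd = fd
        self.dropped = 0

    def log(self, line):
        try:
            os.write(self.fd, line.encode() + b"\n")
            return True
        except BlockingIOError:     # buffer full: slow reader, drop the line
            self.dropped += 1
            return False
```

A counter like self.dropped is also the natural place to answer the inevitable operational question with this design: how much log data did we lose, and when.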
Re: [HACKERS] Core Extensions relocation
On 11/21/2011 11:40 AM, Bruce Momjian wrote:
> I think a question is how often people are waiting for features that
> actually can be addressed in a contrib/plugin way. My gut feeling is
> that most missing features have to be added to the server core (e.g.
> index-only scans) and are not possible to add in a contrib/plugin way.

Good question; let's talk about 9.0. We were building/distributing three things for that version that poked into the server:

-Replication monitoring tools that slipped from the 9.0 schedule, similar to what became pg_stat_replication in 9.1

-An early version of what became hot_standby_feedback in 9.1

-pg_streamrecv

While these weren't all packaged as extensions per se, all of them used the PGXS interface. And they all provided deployment-blocking features to early adopters before those features were available in core, in some cases after the issues they address had been encountered in production deployments.

As I was ranting on my blog recently, I'm seeing more complaints recently about monitoring and management features--exactly the sort of thing that you can improve as an extension, and that the extensions I've proposed provide--than I am over missing big features. Index-only scans are a good example, as one of the most requested performance features you can only get in core (I'd put them at #2 behind materialized views for the customers I talk to). I wouldn't bet that they are considered more important by a typical deployment than good built-in query profiling though. I get complaints about query monitoring from every single PostgreSQL install, while complaints about not having index-only scans only come from the bigger installations. Note how demand is high enough that we have two pg_stat_statements replacements submitted right now.
--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
Re: [HACKERS] Core Extensions relocation
On 11/18/2011 09:35 AM, Tom Lane wrote:
> Subdividing/rearranging contrib makes the packager's life more
> complicated, *and* makes his users' lives more complicated, if only
> because things aren't where they were before. It seems unlikely to
> happen, at least in the near term.

Users are visibly suffering from the current packaging. Production DBAs are afraid to install contrib because it's described as untrustworthy. They are hit by emergencies that the inspection tools would help with, but cannot get contrib installed easily without root permissions. They have performance issues that the contrib modules I'm trying to relocate into the server package would help with, but company policies related to post-deployment installation mean they cannot use them. They have to always be installed to make this class of problem go away.

If you feel there are more users that would be negatively impacted by some directories moving than what I'm describing above, we have a very fundamental disagreement here. The status quo for all of these extensions is widely understood to be unusable in common corporate environments. Packagers should be trying to improve the PostgreSQL experience, and I'm trying to help with that. In the face of pushback from packagers, I can roll my own packages that are designed without this problem; I'm being pushed into doing that instead of working on community PostgreSQL already. But I'd really prefer to see this very common problem identified as so important it should get fixed everywhere instead.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
Re: [HACKERS] Core Extensions relocation
On 11/18/2011 03:36 PM, Josh Berkus wrote:
> Of course, packagers may then reasonably ask why that code is not just
> part of the core?

Let me step back from the implementation ideas for a minute and talk about this, and how it ties into release philosophy. The extensions infrastructure for PostgreSQL is one of its strongest features. We can use it as a competitive advantage over other databases, one that can make this database stable and continuously innovating at the same time. But that's not happening enough yet; I feel this change is a major step in that direction. There's no better demonstration that extensions are edible dog food than the core database visibly eating a can.

To see why this matters so much, let's compare two stereotypes of PostgreSQL users at different extremes of upgrade tolerance. First we have the classic enterprise DBA. Relative to this person's expectations, PostgreSQL releases are way too fast. They can't upgrade their database every year; that's madness. This is the person who we yell at about how they should upgrade to the latest minor point release, because once they have a working system they touch *nothing*. For this user, the long beta period of new PostgreSQL releases, and its general conservative development model, are key components to PostgreSQL being suitable for them.

At the other extreme, we have the software developer with a frantic development/release schedule, the one who's running the latest stable version of every tool they use. This person expects some bugs in them, and the first reaction to running into one is to ask "is this fixed in the next version?" You'll find at least one component in their stack that's labeled "compiled from the latest checkout" because that was the only way to get a working version. And to them, the yearly release cycle of PostgreSQL is glacial.
When they run into a limitation that is impacting a user base that's doubling every few months, they can't wait a year for a fix; they could easily go out of business by then.

The key to satisfying both these extremes at once is a strong extensions infrastructure, led by the example of serious tools that are provided that way in the PostgreSQL core. For this to catch on, we need the classic DBAs to trust core extensions enough to load them into production. They don't do that now because the current contrib description sounds too scary, and they may not even have loaded that package onto the server. And we need people who want more frequent database core changes to see that extensions are a viable way to build some pretty extensive server hacks.

I've submitted two changes to this CommitFest that are enhancing features in this "core extensions" set. Right now I have multiple customers who are desperate for both of those features. With extensions, I can give them changes that solve their immediate crisis right now, almost a full year before they could possibly appear in a proper release of PostgreSQL. And then I can push those back toward community PostgreSQL, with any luck landing in the next major version. Immediate gratification for the person funding development, and peer-reviewed code that goes through a long beta and release cycle.

That's the vision I have for a PostgreSQL that is simultaneously stable and agile. The easiest way to get there is to lead by example--by having extensions that provide necessary, visible components to core, while still being obviously separate components. That's the best approach for proving this development model works and is suitable for everyone.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
Re: [HACKERS] Core Extensions relocation
On 11/17/2011 03:18 PM, Peter Eisentraut wrote:
> Who's to say that after this, the core extensions won't end up in a
> new separate package postgresql-extensions (or similar) or might even
> stay in postgresql-contrib, for compatibility?

I don't know why packagers would make an active decision that will make their lives more difficult, with no benefit to them and a regression against project recommendations for their users. The last thing anyone packaging PostgreSQL wants is more packages to deal with; there are already too many. Each of the current sub-packages has a legitimate technical or distribution standard reason to exist--guidelines like "break out client and server components" or "minimize the package dependencies for the main server". I can't think of any good reason that would inspire the sort of drift you're concerned about. There's little compatibility argument beyond consistency with the previous packages. The reason why this is suddenly feasible now is that the under-the-hood changes are almost all hidden by CREATE EXTENSION.

And if someone wanted to wander this way, they'll end up having to maintain a doc patch to address the fact that they've broken with project recommendations. This text in what I submitted will no longer be true:

"This appendix contains information regarding core extensions that are built and included with a standard installation of PostgreSQL."

One of the reasons I picked the name I did was to contrast with the existing description of contrib: "porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree." That says it's perfectly fine to make these optional in another package--they're not "part of the core". That scary wording is practically telling packagers to separate them, so it's easy to keep the experimental stuff away from the production quality components.
--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
[HACKERS] CommitFest 2011-11 Started
The November CommitFest is now closed for new entries. We have 30 patches in the queue that are still looking for a reviewer at this point, out of a total of 53. If you'd like to review a patch but are looking for a suggestion as to which to choose, e-mail the pgsql-rrreviewers list saying so.

4 patches have already received some early review and are waiting for updates from the author:

Allow toast tables to be moved to a different tablespace
Online base backup from the hot-standby
Separate pg_stat_activity current_query into state and query columns
Allow substitute allocator for PGresult

And we have 5 earlier submissions that have been flagged ready for a committer to look at them; names here are the expected committer when one has been mentioned already:

Non-inheritable check constraints (Greg Stark)
pg_last_xact_insert_timestamp (Simon)
Add Support for building with Visual Studio 2010 (Magnus)
plperl verify utf8 strings
xsubpp from cpan

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
Re: [HACKERS] Displaying accumulated autovacuum cost
On 10/05/2011 03:02 AM, Greg Smith wrote:
> Presumably you meant to ask if this makes sense to show when cost
> accounting isn't enabled, because the code doesn't do that right now.
> No cost accounting, no buffer usage/write rate data as this was
> submitted.

This is done in the attached update. I just made the page accounting happen all the time, regardless of whether the costs were being accumulated. Added a read rate too, which is how fast reads happened from the OS cache to shared_buffers. Simple test case generates a 600MB pgbench_accounts database and wipes out enough to take a while to clean up; it needs log_autovacuum_min_duration = 0 and then:

$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -d pgbench -c "delete from pgbench_accounts where aid<20"

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
	pages: 0 removed, 16394 remain
	tuples: 19 removed, 640011 remain
	buffer usage: 13742 hits, 2708 misses, 1058 dirtied
	avg read rate: 3.067 MiB/s, avg write rate: 1.198 MiB/s
	system usage: CPU 0.05s/0.61u sec elapsed 6.89 sec

Now that you mention it, people who do a manual, full-speed VACUUM would certainly appreciate some feedback on the rate it ran at. This is more of a pain because this whole code path is only active when IsAutoVacuumWorkerProcess. I have some larger refactoring in mind to perhaps make that more feasible. I didn't want to hold this update aiming at the more valuable autovac case for that though, can always layer it on later.
--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f42504c..6ef85dd 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 214,222 ----
  	VacuumCostActive = (VacuumCostDelay > 0);
  	VacuumCostBalance = 0;
+ 	VacuumPageHit = 0;
+ 	VacuumPageMiss = 0;
+ 	VacuumPageDirty = 0;

  	/*
  	 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 38deddc..c59fceb 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 154,160 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
--- 154,163 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
! 	long		secs;
! 	int			usecs;
! 	double		read_rate, write_rate;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 166,173 ****
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
! 			starttime = GetCurrentTimestamp();
  	}

  	if (vacstmt->options & VACOPT_VERBOSE)
--- 169,175 ----
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		starttime = GetCurrentTimestamp();
  	}

  	if (vacstmt->options & VACOPT_VERBOSE)
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 262,274 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
  			ereport(LOG,
  					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
  							"pages: %d removed, %d remain\n"
  							"tuples: %.0f removed, %.0f remain\n"
  							"system usage: %s",
  							get_database_name(MyDatabaseId),
  							get_namespace_name(RelationGetNamespace(onerel)),
--- 264,290 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
+ 		{
+ 			TimestampDifference(starttime, endtime, &secs, &usecs);
+ 			read_rate = 0;
+ 			write_rate = 0;
+ 			if ((secs > 0) || (usecs > 0))
+ 			{
+ 				read_rate = (double) BLCKSZ * VacuumPageMiss / (1024 * 1024) /
+ 					(secs + usecs / 100.0);
+ 				write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
+ 					(secs + usecs / 100.0);
+ 			}
  			ereport(LOG,
  					(errmsg("automatic vacuum of tab
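The rate arithmetic in the patch is worth spelling out. A sketch in Python (BLCKSZ hard-coded to the usual 8 kB default, which is an assumption): pages touched times block size, converted to MiB, divided by elapsed seconds. One caution: since TimestampDifference returns the fractional remainder in microseconds, the divisor has to be 1,000,000.0; the 100.0 visible in the quoted diff looks garbled or wrong, so this sketch uses the microsecond divisor.

```python
BLCKSZ = 8192        # assuming the default PostgreSQL block size
MIB = 1024 * 1024

def vacuum_rates(page_miss, page_dirty, secs, usecs):
    """Mirror of the patch's avg read/write rate formulas: pages read in
    (misses) and pages dirtied, times the block size, as MiB per elapsed
    second.  usecs is a microseconds remainder, hence 1_000_000.0."""
    elapsed = secs + usecs / 1_000_000.0
    if elapsed <= 0:
        return 0.0, 0.0
    read_rate = BLCKSZ * page_miss / MIB / elapsed
    write_rate = BLCKSZ * page_dirty / MIB / elapsed
    return read_rate, write_rate
```

Plugging in the numbers from the sample log line (2708 misses, 1058 dirtied, 6.89 s elapsed) reproduces rates very close to the reported 3.067 MiB/s read and 1.198 MiB/s write.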
[HACKERS] includeifexists in configuration file
By recent popular request in the ongoing discussion saga around merging the recovery.conf, I've added an "includeifexists" directive to the postgresql.conf in the attached patch. Demo:

$ tail -n 1 $PGDATA/postgresql.conf
include 'missing.conf'
$ pg_ctl start -l $PGLOG
server starting
$ tail -n 2 $PGLOG
LOG:  could not open configuration file "/home/gsmith/pgwork/data/include-exists/missing.conf": No such file or directory
FATAL:  configuration file "/home/gsmith/pgwork/data/include-exists/postgresql.conf" contains errors

$ vi $PGDATA/postgresql.conf
$ tail -n 1 $PGDATA/postgresql.conf
includeifexists 'missing.conf'
$ pg_ctl start -l $PGLOG
server starting
$ tail -n 3 $PGLOG
LOG:  database system was shut down at 2011-11-16 00:17:36 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

There might be a cleaner way to write this that eliminates some of the cut and paste duplication between this and the regular include directive. I'm short on clever but full of brute force tonight.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e628f..da45ac1 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include 'filename'
*** 91,96 ****
--- 91,107 ----
+ 	includeifexists
+ 	in configuration file
+
+ 	Use the same approach as the include directive, continuing
+ 	normally if the file does not exist.  A regular include
+ 	will stop with an error if the referenced file is missing, while
+ 	includeifexists does not.
+
+
+
+
  	SIGHUP
  	The configuration file is reread whenever the main server process receives a
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index a094c7a..6f26421 100644
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
*** ProcessConfigFile(GucContext context)
*** 129,135 ****
  	/* Parse the file into a list of option names and values */
  	head = tail = NULL;
! 	if (!ParseConfigFile(ConfigFileName, NULL, 0, elevel, &head, &tail))
  	{
  		/* Syntax error(s) detected in the file, so bail out */
  		error = true;
--- 129,135 ----
  	/* Parse the file into a list of option names and values */
  	head = tail = NULL;
! 	if (!ParseConfigFile(ConfigFileName, NULL, true, 0, elevel, &head, &tail))
  	{
  		/* Syntax error(s) detected in the file, so bail out */
  		error = true;
*** ProcessConfigFile(GucContext context)
*** 363,369 ****
   * and absolute-ifying the path name if necessary.
   */
  bool
! ParseConfigFile(const char *config_file, const char *calling_file,
  				int depth, int elevel,
  				ConfigVariable **head_p, ConfigVariable **tail_p)
--- 363,369 ----
   * and absolute-ifying the path name if necessary.
   */
  bool
! ParseConfigFile(const char *config_file, const char *calling_file, bool strict,
  				int depth, int elevel,
  				ConfigVariable **head_p, ConfigVariable **tail_p)
*** ParseConfigFile(const char *config_file,
*** 414,424 ****
  	fp = AllocateFile(config_file, "r");
  	if (!fp)
  	{
! 		ereport(elevel,
! 				(errcode_for_file_access(),
! 				 errmsg("could not open configuration file \"%s\": %m",
! 						config_file)));
! 		return false;
  	}

  	OK = ParseConfigFp(fp, config_file, depth, elevel, head_p, tail_p);
--- 414,430 ----
  	fp = AllocateFile(config_file, "r");
  	if (!fp)
  	{
! 		if (strict)
! 		{
! 			ereport(elevel,
! 					(errcode_for_file_access(),
! 					 errmsg("could not open configuration file \"%s\": %m",
! 							config_file)));
! 			return false;
! 		}
!
! 		/* Silently skip missing files if not asked to be strict */
! 		return OK;
  	}

  	OK = ParseConfigFp(fp, config_file, depth, elevel, head_p, tail_p);
*** ParseConfigFp(FILE *fp, const char *conf
*** 512,518 ****
  	}

  	/* OK, process the option name and value */
! 	if (guc_name_compare(opt_name, "include") == 0)
  	{
  		/*
  		 * An include directive isn't a variable and should be processed
--- 518,541 ----
  	}

  	/* OK, process the option name and value */
! 	if (guc_name_compare(opt_name, "includeifexists") == 0)
! 	{
! 		/*
! 		 * An includeifexists directive isn't a variable and should be
! 		 * processed immediately.
! 		 */
! 		unsigned int save_ConfigFileLineno = ConfigFileLineno;
!
! 		if (!ParseConfigFile(opt_value, config_file, false,
! 							 depth + 1, elevel,
! 							 head_p, tail_p))
! 			OK = false;
! 		yy_switch_to_buffer(lex_buffer);
! 		ConfigFileLineno = save_ConfigFileLineno;
! 		pfree(opt_name);
! 		pfree(opt_value);
! 	}
!
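The control flow the patch adds can be modeled in a few lines. This is an illustrative sketch only (Python; the real code threads a strict flag through ParseConfigFile and absolute-ifies relative paths, which is skipped here, and parse_config is my name, not the patch's):

```python
import os

def parse_config(path, strict=True):
    """Toy model of include vs. includeifexists: a plain include of a
    missing file is an error, while includeifexists silently skips it."""
    if not os.path.exists(path):
        if strict:
            raise FileNotFoundError(
                'could not open configuration file "%s"' % path)
        return {}       # includeifexists: silently skip missing files
    settings = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith('#'):
                continue
            if line.startswith('includeifexists'):
                target = line.split(None, 1)[1].strip("'")
                settings.update(parse_config(target, strict=False))
            elif line.startswith('include'):
                target = line.split(None, 1)[1].strip("'")
                settings.update(parse_config(target, strict=True))
            else:
                name, _, value = line.partition('=')
                settings[name.strip()] = value.strip().strip("'")
    return settings
```

The only difference between the two directives is which strict value propagates into the recursive call, which is why the real patch ends up with so much copy-and-paste between the two branches.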
[HACKERS] Configuration include directory
Two years ago Magnus submitted a patch to parse all the configuration files in a directory. After some discussion I tried to summarize what I thought the most popular ideas were for moving that forward:

http://archives.postgresql.org/pgsql-hackers/2009-10/msg01452.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg01631.php

And I've now cleared the bit rot and updated that patch to do what was discussed. Main feature set:

-Called by specifying "includedir <directory>". No changes to the shipped postgresql.conf yet.

-Takes an input directory name

-If it's not an absolute path, considers that relative to the -D option (if specified) or PGDATA, the same logic used to locate the postgresql.conf (unless a full path to it is used)

-Considers all names in that directory that end with *.conf [Discussion concluded more flexibility here would be of limited value relative to how it complicates the implementation]

-Loops over the files found in sorted order by name

The idea here is that it will be easier to write tools that customize the database configuration if they can just write a new file out, rather than needing to parse the whole configuration file first. This would allow Apache style configuration directories. My end goal here is to see all of the work initdb does pushed into a new file included by this scheme. People could then expect a functionally empty postgresql.conf except for an includedir, and the customization would go into 00initdb. Getting some agreement on that is not necessary for this feature to go in though; one step at a time.
Here's an example showing this working, including rejection of a spurious editor backup file in the directory:

$ cat $PGDATA/postgresql.conf | grep ^work_mem
$ tail -n 1 $PGDATA/postgresql.conf
includedir='conf.d'
$ ls $PGDATA/conf.d
00config.conf  00config.conf~
$ cat $PGDATA/conf.d/00config.conf
work_mem=4MB
$ cat $PGDATA/conf.d/00config.conf~
work_mem=2MB
$ psql -c "select name,setting,sourcefile,sourceline from pg_settings where name='work_mem'"
   name   | setting |                      sourcefile                       | sourceline
----------+---------+-------------------------------------------------------+------------
 work_mem | 4096    | /home/gsmith/pgwork/data/confdir/conf.d/00config.conf |          1

No docs in here yet. There's one ugly bit of code here I was hoping (but failed) to avoid. Right now the server doesn't actually save the configuration directory anywhere. Once you leave the initial read in SelectConfigFiles, that information is gone, and you only have the configfile. I decided to make that configdir into a global value. Seemed easier than trying to pass it around, given how many SIGHUP paths could lead to this new code.

I can see some potential confusion here in one case. Let's say someone specifies a full path to their postgresql.conf file. They might assume that the includedir was relative to the directory that file is in. Let's say configfile is /etc/sysconfig/pgsql/postgresql.conf ; a user might think that "includedir conf.d" from there would reference /etc/sysconfig/pgsql/conf.d instead of the $PGDATA/conf.d you actually get. Wavering on how to handle that is one reason I didn't try documenting this yet, the decision I made here may not actually be the right one.
--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index a094c7a..25e1a07 100644
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
*** ParseConfigFp(FILE *fp, const char *conf
*** 512,518 ****
  	}

  	/* OK, process the option name and value */
! 	if (guc_name_compare(opt_name, "include") == 0)
  	{
  		/*
  		 * An include directive isn't a variable and should be processed
--- 512,535 ----
  	}

  	/* OK, process the option name and value */
! 	if (guc_name_compare(opt_name, "includedir") == 0)
! 	{
! 		/*
! 		 * An includedir directive isn't a variable and should be processed
! 		 * immediately.
! 		 */
! 		unsigned int save_ConfigFileLineno = ConfigFileLineno;
!
! 		if (!ParseConfigDirectory(opt_value, config_file,
! 								  depth + 1, elevel,
! 								  head_p, tail_p))
! 			OK = false;
! 		yy_switch_to_buffer(lex_buffer);
! 		ConfigFileLineno = save_ConfigFileLineno;
! 		pfree(opt_name);
! 		pfree(opt_value);
! 	}
! 	else if (guc_name_compare(opt_name, "include") == 0)
  	{
  		/*
  		 * An include directive isn't a variable and should be processed
*** ParseConfigFp(FILE *fp, const char *conf
*** 599,604 ****
--- 616,727 ----
  	return OK;
  }

+ static int
+ comparestr(const void *a, const void *b)
+ {
+ 	return st
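The file-selection rules from the feature list are simple enough to model directly. A sketch (Python; glob plus sorting stand in for the patch's readdir-and-qsort logic, and config_directory_files is my name, not the patch's):

```python
import glob
import os

def config_directory_files(directory, data_dir):
    """Model of the includedir rules described above: resolve a relative
    directory against the data directory, keep only names ending in
    .conf, and return them in sorted name order, so an editor backup
    like 00config.conf~ is never read."""
    if not os.path.isabs(directory):
        directory = os.path.join(data_dir, directory)
    return sorted(glob.glob(os.path.join(directory, "*.conf")))
```

Sorted name order is what makes the 00initdb idea workable: tools can drop in 10replication.conf or 99local.conf and know exactly where their settings land in the override sequence.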
Re: [HACKERS] Core Extensions relocation
Well, this discussion veering off into ISN has certainly validated my gut feel that I should touch only the minimum number of things that kills my pain points, rather than trying any more ambitious restructuring. I hope that packaged extensions become so popular that some serious cutting can happen to contrib, especially the data type additions. If something as big as PostGIS can live happily as an external project, surely most of these can too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
On 11/15/2011 12:53 PM, Joshua Berkus wrote: Given discussion, is there any point in reporting on the actual patch yet? I don't expect the discussion will alter the code changes that are the main chunk of the patch here. The only place the most disputed parts impact is the documentation. I like "Management Extensions" as an alternate name for this category instead, even though it still has the issue that auto_explain isn't technically an extension. The name does help suggest why they're thrown into a different directory and package. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IDLE in transaction introspection
On 11/15/2011 09:44 AM, Scott Mead wrote: Fell off the map last week, here's v2 that: * RUNNING => active * all states from CAPS to lower case This looks like what I was hoping someone would add here now. Patch looks good; the only issue I noticed was a space-instead-of-tab goof where you set beentry->st_state at line 2419 in src/backend/postmaster/pgstat.c. Missing pieces: -There is one regression test that uses pg_stat_activity that is broken now. -The documentation should list the new field and all of the states it might include. That's a serious doc update from the minimal info available right now. I know this issue has been beaten up some already, but let me summarize and extend that thinking a moment. I see two equally valid schools of thought on how exactly to deal with introducing this change: -Add the new state field just as you've done it, but keep updating the query text anyway. Do not rename current_query. Declare the overloading of current_query as both a state and the query text to be deprecated in 9.3. This would keep existing tools working fine against 9.2 and give a clean transition period. -Forget about backward compatibility and just put all the breaking stuff we've been meaning to do in here. If we're going to rename current_query to query--what Scott's patch does here--that will force all code using pg_stat_activity to be rewritten. This seems like the perfect time to also change "procpid" to "pid" and finally blow away that wart. I'll happily update all of the tools and samples I deal with to support this change. Most of the ones I can think of will be simplified; they're already parsing query_text and extracting the implicit state. Just operating on an explicit one instead will be simpler and more robust. 
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Unremovable tuple monitoring
On 11/15/2011 10:29 AM, Alvaro Herrera wrote: They were deleted but there are transactions with older snapshots. I think vacuum uses the term "nondeletable" or "nonremovable". Not sure which one is less bad. Not being a native speaker, they all sound horrible to me. I would go more for something like "deadinuse". Saying they are unremovable isn't very helpful because it doesn't lead the user to knowing why. If the name gives some suggestion as to why they are unremovable--in this case that they are still potentially visible and usable by old queries--that would be a useful naming improvement to me. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
On 11/14/2011 10:09 PM, Robert Haas wrote: I continue to think that we should be trying to sort these by subject matter. The term "core extensions" doesn't convey that these are server management and debugging tools, hence Josh's confusion. I'm not attached to the name, which I just pulled out of the air for the documentation. Could just as easily call them built-in modules or extensions. If the objection is that "extensions" isn't technically correct for auto-explain, you might call them core add-ons instead. My thinking was that the one exception didn't make it worth the trouble to introduce a new term altogether here. There are already too many terms used for talking about this sort of thing; the confusion from using a word other than "extensions" seemed larger than the confusion sown by auto-explain not fitting perfectly. The distinction I care about here is primarily a packaging one. These are server additions that people should be able to count on having available, whereas right now they may or may not be installed depending on whether contrib was added. Everything I'm touching requires that our RPM and Debian packagers (at least) make a packaging change, too. I can't justify why that's worth doing for any of the other extensions, which is one reason I don't try to tackle them. The type of finer sorting you and Thom are suggesting seems like it's mainly a documentation change to me. I'm indifferent to the idea; no plans to either work on it or object to it. The docs could be made easier to follow here without any change to the directory tree, and trying to push out a larger packaging change has downsides. Useful reminder reading here is http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting#Moving_Contrib_Around To quote from there, "Users hate having loads and loads of packages. We do need to be careful not to oversplit it." There are some useful notes about dependency issues there too. 
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
On 11/14/2011 07:56 PM, Josh Berkus wrote: So I'm a bit unclear on why most of the optional data types were excluded from your list of Core Extensions. I was aiming for the extensions that seemed uncontroversial for a first pass here. One of the tests I applied was "do people sometimes need this module after going into production with their application?" The very specific problem I was most concerned about eliminating was people discovering they needed an extension to troubleshoot performance or corruption issues, only to discover it wasn't available--because they hadn't installed the postgresql-contrib package. New package installation can be a giant pain to get onto a production system in some places, if it wasn't there during QA etc. All of the data type extensions fail that test. If you need one of those, you would have discovered that on your development server, and made sure the contrib package was available on production too. There very well may be some types that should be rolled into the core extensions list, but I didn't want arguments over that to block moving forward with the set I did suggest. We can always move more of them later, if this general approach is accepted. It only takes about 5 minutes per extension to move them from contrib to src/extension, once the new directory tree and doc section is there. But I didn't want to do the work of moving another 15 of them if the whole idea was going to get shot down. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why do we need two snapshots per query?
On 11/14/2011 04:04 PM, Robert Haas wrote: Some pgbench -S numbers (SELECT-only test) from Nate Boley's 32-core box It seems like Nate Boley's system should be credited in the 9.2 release notes. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commit
On 11/14/2011 03:43 PM, Josh Berkus wrote: Purpose is to provide consistent WAL writes, even when WALInsertLock contended. Currently no "off" option, thinking is that the overhead of doing this is relatively low and so it can be "always on" - exactly as it is for sync rep. Hmmm, have you had a chance to do any performance tests? I was planning to run some later this week, but someone else is welcome to take a shot at it. The inspiration for this change was the performance scaling tests I did for sync rep last month. Don't recall if I shared those with this list yet; I've attached the fun graph. Over a slow international link with 100ms ping times, I was only getting the expected 10 TPS doing sync rep with a single client. But as more clients were added, so that a chunk of them were acknowledged in each commit reply, the total throughput among all of them scaled near linearly. With 300 clients, that managed to hit a crazy 2000 TPS. The best scenario to show this patch working would be a laptop drive spinning at a slow speed (5400 or 4200 RPM) so that individual local commits are slow. That won't be 100ms slow, but close to 10ms is easy to see. When adding clients to a system with a slow local commit, what I've observed is that the scaling levels off between 750 and 1000 TPS, no matter how many clients are involved. The hope is that this alternate implementation will give the higher scaling in the face of slow commits that is seen on sync rep. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us <> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
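The scaling described above follows from a simple queueing argument: if each commit flush (or network round trip to the standby) takes L seconds, a single client is capped near 1/L TPS, but if one flush acknowledges all N waiting clients, aggregate throughput can approach N/L. A back-of-the-envelope sketch of that idealized model (not a benchmark, and real systems fall short of the ceiling):

```python
def ideal_group_commit_tps(flush_latency_s, clients):
    """Idealized throughput when one flush acknowledges every waiting client."""
    return clients / flush_latency_s

# 100ms round trip to the standby: a single client sees the expected ~10 TPS,
# while 300 clients sharing each acknowledgement could in theory approach
# ~3000 TPS; the ~2000 TPS actually observed sits below that ideal ceiling.
single = ideal_group_commit_tps(0.100, 1)
batched = ideal_group_commit_tps(0.100, 300)
print(single, batched)
```

The same arithmetic explains the slow-laptop-drive test scenario: a ~10ms local commit caps one client near 100 TPS, so any scaling well past that with many clients demonstrates commits being grouped.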
[HACKERS] CommitFest 2011-11 starting soon
Tomorrow November 15, patch submission will close for the 2011-11 CommitFest after 11:59PM PST. New patches ready for review should be submitted to this mailing list and added to the CommitFest application at https://commitfest.postgresql.org/ See http://wiki.postgresql.org/wiki/Development_information for guidelines on patch development, submission, review, and how the CommitFest work happens. Even before the last minute rush there are already 25 submissions that are looking for reviewers. If you can apply a patch and you can use the new feature, you're qualified to start reviewing it. And we've produced some guides to the patch part--see http://www.pgcon.org/2011/schedule/events/368.en.html as one example. If you're interested in helping with review, but are looking for suggestions on what patch to select, you should join the Round Robin Reviewers list to get assigned one. More information about that at http://wiki.postgresql.org/wiki/RRReviewers This is the 3rd of the 4 CommitFests for PostgreSQL 9.2. The hope is that any major features aimed at 9.2 will have been submitted for initial review by this one, so there's still time to get feedback and re-submit before the final CommitFest for 9.2, 2012-01. There are also 4 uncontroversial submissions in this CommitFest that are marked "Ready for Committer": -Non-inheritable check constraints -plperl verify utf8 strings -Perl xsubpp from cpan -Add Support for building with Visual Studio 2010 Committers who would like to get an early start have some options already. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] star join optimization
On 11/14/2011 10:44 AM, Rudyar wrote: Ok, I'm working in that project. I will send you my results and comparision with SQL server HHJ optimization in one or two months. Please be careful not to share here details of how features like this are built in any commercial databases you evaluate. Some of those implementations use patented design approaches that should be avoided in an open source project. Oracle, Microsoft, and DB2 are all aggressive about patenting the innovative parts of their database server code. In addition to not wanting to accidentally incorporate such a design, it's better for the PostgreSQL project to not be aware of what patents in this area exist too. We don't even want a survey of patents in this area published here because there are increased penalties for willful patent infringement. See http://en.wikipedia.org/wiki/Treble_damages for example. What this project likes best are innovative approaches from recent academic research that haven't been incorporated in any commercial products yet. A good example is how the Serializable Snapshot Isolation technique developed by Cahill and others was added to PostgreSQL 9.1: http://wiki.postgresql.org/wiki/Serializable There was less concern over accidentally duplicating a patented approach because that technique wasn't in any of the commercial databases yet. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing output of EXPLAIN command in PostgreSQL
On 11/10/2011 11:10 AM, Αναστάσιος Αρβανίτης wrote: I'm developing an application that requires parsing of execution plans (those produced as output by issuing an EXPLAIN [query] command). Are you aware of any Java library that I could use for this purpose? I found https://github.com/depesz/Pg--Explain but it is built in Perl. There's also a little Javascript program that consumes the JSON version at: http://www.postgresonline.com/journal/archives/171-pgexplain90formats_part1.html http://www.postgresonline.com/journal/archives/174-pgexplain90formats_part2.html Also another option I am considering is to use EXPLAIN [query] FORMAT XML which is available in PostgreSQL 9.1. However, in that case it would better to have the XML Schema of the generated plans available. That's the easiest way to solve this problem in Java, and in that case most of the text-based code in Pg--Explain will just be a distraction. I know some of the earlier versions of XML EXPLAIN included a "DTD" option to output that, but I don't see that in the committed code. I'm not sure where that is at actually; it's a good question. The only reference to doing this I found was Andrew's blog: http://people.planetpostgresql.org/andrew/index.php?/archives/32-A-couple-of-nice-tools.html where he talks about there being a RELAXNG specification for the XML output. I can't find where that came from either. Andrew? -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
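Lacking a published schema, the practical approach is to walk the tree directly; the JSON and XML forms share the same shape ("Node Type" plus a list of child "Plans"). Here's a sketch of consuming the JSON format — Python rather than Java for brevity, but the recursion is identical with any Java JSON library; the sample plan is abbreviated by hand:

```python
import json

def node_types(plan_node, depth=0):
    """Recursively collect (depth, Node Type) pairs from an EXPLAIN plan tree."""
    out = [(depth, plan_node["Node Type"])]
    for child in plan_node.get("Plans", []):
        out.extend(node_types(child, depth + 1))
    return out

# Abbreviated shape of: EXPLAIN (FORMAT JSON) SELECT ... JOIN ...
explain_json = """
[{"Plan": {"Node Type": "Hash Join",
           "Plans": [{"Node Type": "Seq Scan"},
                     {"Node Type": "Hash",
                      "Plans": [{"Node Type": "Seq Scan"}]}]}}]
"""
plan = json.loads(explain_json)[0]["Plan"]
for depth, ntype in node_types(plan):
    print("  " * depth + ntype)
```

Real plans carry many more per-node keys (costs, relation names, and so on), but they hang off the same nested "Plans" structure, so this traversal pattern carries over.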
Re: [HACKERS] MPI programming in postgreSQL backend source code
On 11/09/2011 04:10 PM, Rudyar Cortés wrote: I'm a new programmer in the postgreSQL source code. Is it possible to use MPI functions in the postgreSQL source code? To do this the proper way, you would need to modify the database's "configure" step to: -Check if the Open MPI libraries are available and include the necessary bits. For example, this is in one of the projects I work on (it checks for OpenMP rather than MPI, but the pattern is the same): #ifdef _OPENMP extern int omp_get_num_threads(); #endif Some form of that test and defining the functions available would be needed for what you want. -Link Open MPI in. At the gcc level that normally means building with the mpicc compiler wrapper, or linking the MPI library by hand. Then you could start using MPI functions in database code. You might hack the build steps to do this in a simpler way, too, rather than fight with configure the proper way. Since a lot of the MPI functions aim at specific types of thread use and I/O, it would be a major effort to utilize the library for too many things. The existing notion of how processes are created and managed is scattered throughout the PostgreSQL code. And the I/O parts of the database are buried through a few layers of indirection. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Syntax for partitioning
On 11/09/2011 04:05 PM, Dimitri Fontaine wrote: The more I think about this problem, the more I think that the reason why we still don't have declarative partitioning is that it basically sucks. I think that we don't have it because no one has ever dumped the much larger than might be expected amount of time into pulling all the pieces together and smoothing out the rough parts. I don't think there's any design thinking leap needed over what's already been worked out. Just a lot of work to get all the edge cases right on the simplest possible thing that is useful. The path to reach something that could be considered for commit includes something like this set of things:

1) Add partitioning catalog support
2) Create new syntax for partitioning that writes to the catalog
3) Decide how to represent partition data in memory
4) Route new INSERTed data into the right place
5) Support moving UPDATEd data into a new partition
6) Handle COPY usefully

The last rev of this submitted was still working through (1) here, i.e. this review from Robert: http://archives.postgresql.org/message-id/aanlktikp-1_8b04eyik0sdf8ua5kmo64o8sorfbze...@mail.gmail.com And there's a whole pile of issues I don't think have been fully explored about even the most basic case: how to handle ALTER to these structures cleanly, locking, etc. I don't think it's possible to design such that you skip a large portion of these details; someone needs to spend weeks or more getting them all right instead. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
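Of those steps, tuple routing (step 4) is at least conceptually compact, even though the real implementation has to worry about the in-memory representation, concurrency, and locking. A toy sketch of range-partition routing — invented names, nothing from any actual patch:

```python
import bisect

def route_row(bounds, partitions, key):
    """Pick the partition whose range contains key.

    bounds is a sorted list of upper bounds (exclusive); partitions has one
    more entry than bounds, with the last one catching everything above.
    """
    return partitions[bisect.bisect_right(bounds, key)]

# Partition a measurements table by year: (..2010), [2010..2011), [2011..)
bounds = [2010, 2011]
partitions = ["measurements_old", "measurements_2010", "measurements_2011"]
print(route_row(bounds, partitions, 2009))  # measurements_old
print(route_row(bounds, partitions, 2010))  # measurements_2010
print(route_row(bounds, partitions, 2015))  # measurements_2011
```

The hard parts listed above are everything around this lookup: keeping the bounds in a catalog, caching them per-backend, invalidating on ALTER, and doing the same routing efficiently inside COPY.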
Re: [HACKERS] 9.1.2 ?
On 11/09/2011 03:58 PM, Daniel Farina wrote: On Tue, Nov 8, 2011 at 9:06 PM, Magnus Hagander wrote: I definitely think they are important enough to trigger a release. But as you say, I think we need confirmation that they actually fix the problem... I have confirmed that the clog/subtrans fixes allow us to start up while in hot standby on otherwise problematic base backups. I think Daniel has run into this problem more than anyone else, so hearing it's fixed for him makes me feel a lot better that it's been resolved. I'd characterize this problem as a medium grade data corruption issue. It's not security issue bad that it needs to be released tomorrow, but a backbranch release of at least 9.0/9.1 that includes it would be a big relief for people nervous about this. I'd hate to see that slip forward to where it gets sucked into the holiday vortex. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1.2 ?
On 11/09/2011 01:12 PM, Greg Jaskiewicz wrote: Would you consider it a blocker for a rollout on production system ? I wouldn't. Good process for checking your backups should find this problem if it pops up, and it's not that easy to run into. That's why I was saying there are workarounds here, they're just not nice to put people through. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Releasing an alpha for CF2
On 11/08/2011 09:31 PM, Robert Haas wrote: I know various people are working on patches to do various things, but the only outstanding thing that sticks out in my mind as an outstanding "big feature" patch is Alvaro's work on foreign key locks. I'm sure there must be others... but I'm not sure how many. Along with that, there's a lot of potential for ripple from the "Command Triggers" patch Dimitri just submitted today. That's already approaching 3000 lines of code and he's still going on it. I've been sniffing around recently looking for troublesome patches, under the assumption I'll be running the CF next week, and those two are the ones I've been most concerned about. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views
On 11/08/2011 04:23 PM, Kevin Grittner wrote: http://wiki.postgresql.org/wiki/Materialized_Views That page describes three components: creating MVs, updating MVs, and having the planner automatically detect when an MV matches some portion of a regular query and using the MV instead of the specified tables in such cases. So far that page is just my initial notes on this subject based on some research I did, don't read too much into it. Ignoring MV substitution is certainly the right plan for an initial development target. An implementation that didn't update automatically at all is also still a useful step. It's very easy to pick too big of a chunk of this project to chew on at once. When I wrote that, it wasn't completely clear to me yet that doing the updates would involve so many of the same tricky bits that stalled progress on the MERGE command. Nowadays I think of working out the concurrency issues in both this and MERGE, in the wake of true serialization, as another step in this implementation plan. It's not strictly necessary, but there's some shared issues that might be tackled easier in the MERGE context instead. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1.2 ?
On 11/08/2011 07:34 PM, Tom Lane wrote: I don't recall that we've fixed anything since September that seemed to warrant an immediate release. The backup+pg_clog failure issues fixed last week have been a nasty problem hitting people for a while. Backup corruption is obviously serious. Only reason I think it wasn't a higher priority issue is that it didn't happen every time, and the people impacted were eventually able to work around it. Concern about that problem is why I popped off a message earlier today, about whether the fixes committed have been confirmed outside of Simon's own testing. I was curious how 9.0 fared last year for comparison, here's that data:

Version  Date      Days  Weeks
9.0.0    09/20/10
9.0.1    10/04/10    14    2.0
9.0.2    12/16/10    73   10.4
9.0.3    01/31/11    46    6.6
9.0.4    04/18/11    77   11.0
9.0.5    09/26/11   161   23.0

So the average for the first three point releases was around 6 weeks apart. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
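The Days and Weeks figures quoted for the 9.0 point releases are just deltas between consecutive release dates, easy to reproduce mechanically:

```python
from datetime import date

releases = [
    ("9.0.0", date(2010, 9, 20)),
    ("9.0.1", date(2010, 10, 4)),
    ("9.0.2", date(2010, 12, 16)),
    ("9.0.3", date(2011, 1, 31)),
    ("9.0.4", date(2011, 4, 18)),
    ("9.0.5", date(2011, 9, 26)),
]

def release_gaps(releases):
    """Days and weeks between each release and its predecessor."""
    gaps = []
    for (_, prev), (name, cur) in zip(releases, releases[1:]):
        days = (cur - prev).days
        gaps.append((name, days, round(days / 7, 1)))
    return gaps

for name, days, weeks in release_gaps(releases):
    print(name, days, weeks)
# 9.0.1 through 9.0.3 average out to roughly 6.3 weeks apart
```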
Re: [HACKERS] Measuring relation free space
On 11/08/2011 05:07 PM, Robert Treat wrote: It's already easy to get "good enough" numbers based on user space tools with very little overhead, so I think it's more important that the server side tool be accurate rather than fast. What user space method do you consider good enough here? I haven't found any approximation that I was really happy with; wouldn't have bothered with this otherwise. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
I was curious if anyone running into these problems has gotten a chance to test the 3 fixes committed here. It sounded like Linas even had a repeatable test case? For easier reference the commits are: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2f55c535e1f026929cf20855b3790d3632062d42 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ff8451aa14c8513e429cbef09ddc72e79da366a5 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=656bba95af3950b26e9e97d86d29787d89e2b423 -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Measuring relation free space
On 11/06/2011 11:55 PM, Mark Kirkwood wrote: I am guessing (at this point very much guessing) that pg_freespacemap may return its data faster, as pageinspect is gonna have to grovel through all the pages for itself (whereas pg_freespacemap relies on using info from the ... free space map fork). I started with pageinspect because I wasn't sure if other methods would be as accurate. For example, I wasn't sure until just before submission that only the free space and size of the relation are needed to get a useful measure here; at one point I was considering some other things too. I've ruled them out as unnecessary as far as I can tell, but I can't claim my tests are exhaustive. Confirmation from reviewers that this measure is useful for other people is one thing I was hoping for, in addition to feedback on the actual implementation. If this measurement is the only one needed, then as I said at the start of the thread it might easily be implemented to run just against the free space map instead. I'm thinking of what's been sent so far as a UI with the matching output it should produce. If it's possible to get the same numbers faster, exactly how to implement the function under the hood is easy enough to change. Jaime already has a new version in development that adds a ring buffer, for example. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
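If free space and relation size really are the only inputs needed, the whole measure reduces to a single ratio, whichever backend produces the per-page numbers. A sketch of just that arithmetic, assuming the default 8kB block size (names invented; not the patch's code):

```python
BLCKSZ = 8192  # default PostgreSQL block size

def free_space_percent(free_bytes_per_page, relpages):
    """Fraction of a relation that is reusable free space, as a percentage."""
    total = relpages * BLCKSZ
    if total == 0:
        return 0.0
    return 100.0 * sum(free_bytes_per_page) / total

# A 4-page table where two pages are half empty:
print(free_space_percent([0, 4096, 4096, 0], 4))  # 25.0
```

Whether the per-page figures come from groveling through pages with pageinspect or straight from the free space map fork only changes the accuracy and cost of the inputs, not this final computation.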
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 11/06/2011 06:00 PM, Tom Lane wrote: Peter Geoghegan writes: > A major consideration was backwards compatibility; This is not a consideration that the community is likely to weigh heavily, or indeed at all. We aren't going to back-port this feature into prior release branches, and we aren't going to want to contort its definition to make that easier. Being able to ship a better pg_stat_statements that can run against earlier versions as an extension has significant value to the community. Needing to parse log files to do statement profiling is a major advocacy issue for PostgreSQL. If we can get something useful that's possible to test as an extension earlier than the 9.2 release--and make it available to more people running older versions too--that has some real value to users, and for early production testing of what will go into 9.2. The point where this crosses over to requiring server-side code to operate at all is obviously a deal breaker on that idea. So far that line hasn't been crossed, and we're trying to stage testing against older versions on real-world queries. As long as it's possible to keep that goal without making the code more difficult to deal with, I wouldn't dismiss that as a useless distinction. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 11/07/2011 09:03 AM, Robert Haas wrote: I think it's an established principle that the design for features like this should, for best results, be discussed on -hackers before writing a lot of code. You can see from the commit history this idea is less than a month old. Do we need to get community approval before writing a proof of concept now? Everyone would still be arguing about how to get started had that path been taken. If feedback says this needs a full rewrite, fine. We are familiar with how submitting patches works here. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 11/06/2011 01:08 PM, Tom Lane wrote: Peter Geoghegan writes: ... It also does things like intelligently distinguishing between queries with different limit/offset constant values, as these constants are deemed to be differentiators of queries for our purposes. A guiding principle that I've followed is that anything that could result in a different plan is a differentiator of queries. This claim seems like bunk, unless you're hashing the plan tree, in which case it's tautological. Peter's patch adds a planner hook and hashes at that level. Since this cat is rapidly escaping its bag and impacting other contributors, we might as well share the work in progress early if anyone has a burning desire to look at the code. A diff against the version I've been internally reviewing in prep for community submission is at https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm Easier to browse than to ask probing questions about it, I think. Apologies to Peter for sharing his work before he was completely ready; there is a list of known problems with it. I also regret the thread hijack here. The first chunk of code is a Python-based regression test program, and an open item here is the best way to turn that into a standard regression test set. There will be additional infrastructure added to the parser to support normalisation of query strings for the patch I'll be submitting (that obviously won't be supported in the version that builds against existing Postgres versions that I'll make available). Essentially, I'll be adding a length field to certain nodes, This seems like a good way to get your patch rejected: adding overhead to the core system for the benefit of a feature that not everyone cares about is problematic. Struggling around this area is the main reason this code hasn't been submitted yet. 
To step back for a moment, there are basically three options here that any code like this can take, in regards to storing the processed query name used as the key: 1) Use the first instance of that query seen as the "reference" version 2) Use the last instance seen 3) Transform the text of the query in a way that's stable across all duplicates of that statement, and output that. Downstream tools operating on this data, things that will sample pg_stat_statements multiple times, need some sort of stable query text they can operate on. It really needs to survive server restart too. Neither (1) nor (2) seem like usable solutions. Both have been implemented already in Peter's patch, with (2) being what's in the current patch. How best to do (3) instead is not obvious though. Doing the query matching at the planner level seems effective at side-stepping the problem of needing to parse the SQL, which is where most implementations of this idea get stuck doing fragile transformations. My own first try at this back in September and Tomas's patch both fall into that category. That part of Peter's work seems to work as expected. That still leaves the problem of "parsed query -> stable normalized string". I think that is an easier one to solve than directly taking on the entirety of "query text -> stable normalized string" though. Peter has an idea he's exploring for how to implement that, and any amount of overhead it adds to people who don't use this feature is an obvious concern. There are certainly use cases that don't care about this problem, ones that would happily take (1) or (2). I'd rather not ship yet another not-quite-right-for-everyone version of pg_stat_statements again though; only solving too limited a use case is the big problem with the one that's already there.
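To make concrete what option (3) is supposed to produce, here is a deliberately naive Python sketch (all names here are mine, purely illustrative): mask out literals so that duplicate statements differing only in constants share one stable key. Regexes like these are exactly the fragile textual transformation that hooking the planner avoids (comments, casts, and quoted identifiers all break them), but the input/output contract is the same one a parser-level version needs to satisfy.

```python
import re

def normalize_query(sql: str) -> str:
    """Naive textual normalization: replace literals with '?' so that
    duplicate statements differing only in constants share one key.
    This regex approach is fragile, which is why hooking the planner
    and working on the parsed tree is attractive instead."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()
```

Note that with this definition the limit/offset case from the quoted text collapses too: `LIMIT 10` and `LIMIT 20` map to the same key, which is precisely the behavior Peter's patch deliberately avoids by differentiating anything that could change the plan.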
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Measuring relation free space
Attached patch adds a new function to the pageinspect extension for measuring total free space, in either tables or indexes. It returns the free space as a percentage, so higher numbers mean more bloat. After trying a couple of ways to quantify it, I've found this particular measure correlates well with the nastiest bloat issues I've run into in production recently. For example, an index that had swelled to over 5X its original size due to autovacuum issues registered at 0.86 on this scale. I could easily see people putting an alert at something like 0.40 and picking candidates to reindex based on it triggering. That would be about a million times smarter than how I've been muddling through this class of problems so far. Code by Jaime Casanova, based on a prototype by me. Thanks to attendees and sponsors of the PgWest conference for helping to fund some deeper exploration of this idea. Here's a test case showing it in action:

create extension pageinspect;
create table t (k serial,v integer);
insert into t(v) (select generate_series(1,10));
create index t_idx on t(k);
delete from t where k<5;
vacuum t;

gsmith=# select relation_free_space('t');
 relation_free_space
---------------------
            0.445466

gsmith=# select relation_free_space('t_idx');
 relation_free_space
---------------------
            0.550946

Some open questions in my mind:

-Given this is doing a full table scan, should it hook into a ring buffer to keep from trashing the buffer cache? Or might it loop over the relation in a different way altogether? I was thinking about eyeing the FSM instead at one point, didn't explore that yet. There are certainly a few ways to approach this; we just aimed at the easiest way to get a working starter implementation, and associated results to compare others against.

-Should there be a non-superuser version of this? We'd certainly need to get a less cache-demolishing version before that would seem wise.

-There were related things in the pageinspect module, but a case could be made for this being a core function instead.
It's a bit more likely to be used in production than the rest of that extension.

-What if anything related to TOAST should this handle?

We're also planning to do a sampling version of this, using the same approach ANALYZE does. Grab a number of blocks, extrapolate from there. It shouldn't take many samples before the accuracy is better than how people are estimating this now. That work is just waiting on some better thinking about how to handle the full relation version first. And, yes, the explanation in the docs and code should be clear that it's returning a percentage, which I just realized when writing this. At least I remembered to document something; still ahead of the average new patch...

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index dbb2158..aac9148 100644
*** a/contrib/pageinspect/btreefuncs.c
--- b/contrib/pageinspect/btreefuncs.c
***************
*** 34,39 ****
--- 34,40 ----
  #include "utils/builtins.h"
  #include "utils/rel.h"
+ #include "btreefuncs.h"

  extern Datum bt_metap(PG_FUNCTION_ARGS);
  extern Datum bt_page_items(PG_FUNCTION_ARGS);
*************** GetBTPageStatistics(BlockNumber blkno, B
*** 155,160 ****
--- 156,202 ----
  	stat->avg_item_size = 0;
  }

+ /*
+  * GetBTRelationFreeSpace
+  *
+  * Get the free space for a btree index.
+  * This is a helper function for relation_free_space()
+  *
+  */
+ float4
+ GetBTRelationFreeSpace(Relation rel)
+ {
+ 	BTPageStat stat;
+
+ 	Buffer		buffer;
+ 	BlockNumber blkno;
+ 	BlockNumber totalBlcksInRelation = RelationGetNumberOfBlocks(rel);
+ 	Size		free_space = 0;
+ 	double		free_percent = 0;
+
+ 	/* Skip page 0 because it is a metapage */
+ 	for (blkno = 1; blkno < totalBlcksInRelation; blkno++)
+ 	{
+ 		buffer = ReadBuffer(rel, blkno);
+
+ 		/*
+ 		 * get the statistics of the indexes and use that info
+ 		 * to determine free space on the page
+ 		 */
+ 		GetBTPageStatistics(blkno, buffer, &stat);
+ 		if (stat.type == 'd')
+ 			free_space += stat.page_size;
+ 		else
+ 			free_space += stat.free_size;
+
+ 		ReleaseBuffer(buffer);
+ 	}
+
+ 	if (totalBlcksInRelation > 1)
+ 		free_percent = (free_space * 1.0) / ((totalBlcksInRelation - 1) * BLCKSZ);
+ 	return free_percent;
+ }
+
  /*---
   * bt_page()
   *
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index fa50655..c1d72ba 100644
*** a/contrib/pageinspect/heapfuncs.c
--- b/contrib/pageinspect/hea
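For anyone who would rather see the return value defined without reading the diff, the arithmetic in GetBTRelationFreeSpace reduces to the following Python restatement (a hypothetical helper of my own, not part of the patch): sum the free space over all non-metapage blocks, counting a deleted page as entirely free, and divide by the total size of those blocks.

```python
BLCKSZ = 8192  # PostgreSQL's default block size; an assumption here

def btree_free_percent(pages):
    """Mirror of the patch's math.  `pages` holds one
    (type, page_size, free_size) tuple per block 1..N-1; block 0,
    the btree metapage, is skipped.  A deleted page ('d') counts
    as entirely free; otherwise only its free_size counts."""
    free_space = sum(page_size if ptype == 'd' else free_size
                     for ptype, page_size, free_size in pages)
    return free_space / (len(pages) * BLCKSZ) if pages else 0.0
```

So a one-block index whose only leaf page is deleted scores 1.0, and an empty or metapage-only index scores 0.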
Re: [HACKERS] IDLE in transaction introspection
On 11/04/2011 05:01 PM, Tom Lane wrote: Scott Mead writes: I leave the waiting flag in place for posterity. With this in mind, is the consensus: RUNNING or ACTIVE Personally, I'd go for lower case. I was thinking it would be nice if this state looked like the WAL sender state values in pg_stat_replication, which are all lower case. For comparison those states are: startup backup catchup streaming -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unite recovery.conf and postgresql.conf
On 09/24/2011 04:49 PM, Joshua Berkus wrote: Well, we *did* actually come up with a reasonable way, but it died under an avalanche of bikeshedding and "we-must-do-everything-the-way-we-always-have-done". I refer, of course, to the "configuration directory" patch, which was a fine solution, and would indeed take care of the recovery.conf issues as well had we implemented it. We can *still* implement it, for 9.2. That actually died from a lack of round-tuits; the consensus at the end of the bikeshedding was pretty clear. Last night I finally got motivated to fix the bit rot and feature set on that patch, to match what seemed to be the easiest path toward community approval. One known bug left to resolve and I think it's ready to submit for the next CF. I think includeifexists is a good improvement too, on a related arc to the main topic here. If I can finish off the directory one (or get someone else to fix my bug) I should be able to follow up with that one. The patches won't be that different. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce checkpoints and WAL traffic on low activity database serv
On 11/02/2011 05:48 PM, Simon Riggs wrote: On Wed, Nov 2, 2011 at 6:27 PM, Robert Haas wrote: Again, it's no longer the maximum time between automatic checkpoints. You're missing the point that it never was like that. I've not altered the situation. Robert's point is more that the existing docs are already broken; this new patch can just increase how bad the drift between reality and documentation can be. Before, the only people who ran into this had zero activity on the server, which meant there wasn't any data to be lost, either. Now it's potentially broader than that. With some trivial checkpoints containing a small amount of data skipped now, aren't there some cases where less WAL data will be written than before? In that case, the user-visible behavior here would be different. I'd be most concerned about the file-based log shipping case. I don't think there's any change needed to the "Write Ahead Log" section of the "Server Configuration" chapter. In the "Reliability and the Write-Ahead Log" chapter, this text in "WAL Configuration" was already highlighted as the problem here: The server's background writer process automatically performs a checkpoint every so often. A checkpoint is created every checkpoint_segments log segments, or every checkpoint_timeout seconds, whichever comes first. The default settings are 3 segments and 300 seconds (5 minutes), respectively. It is also possible to force a checkpoint by using the SQL command CHECKPOINT. I think this needs a change like this, to address the hole that was already in the docs and cover the new behavior too; this goes just before "It is also possible to force...": In cases where there are few or no writes to the WAL, checkpoints will be skipped even if checkpoint_timeout has passed. At least one new WAL segment must have been created before an automatic checkpoint occurs. The time between checkpoints and the times at which new WAL segments are created are not related in any other way.
If file-based WAL shipping is being used and you want to bound how often files are sent to the standby server to reduce potential data loss, you should adjust the archive_timeout parameter rather than the checkpoint ones. This area is a confusing one, so some clarification of the relation between checkpoints and replication is a useful docs improvement. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
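The skip rule the proposed doc text describes can be sketched in a few lines of Python (names are mine and purely illustrative; this shows the documented behavior, not the server's actual control flow): a timeout-driven checkpoint only fires if at least one new WAL segment has been created since the last checkpoint.

```python
def timed_checkpoint_fires(seconds_since_checkpoint: float,
                           checkpoint_timeout: float,
                           new_wal_segments: int) -> bool:
    """Illustrative sketch: even when checkpoint_timeout has elapsed,
    a checkpoint is skipped unless some WAL has actually been written
    (at least one new segment created) since the last one."""
    return (seconds_since_checkpoint >= checkpoint_timeout
            and new_wal_segments >= 1)
```

This is also why archive_timeout, not the checkpoint settings, is the right knob for bounding standby lag with file-based shipping: an idle server can go arbitrarily long without a checkpoint.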
Re: [HACKERS] Core Extensions relocation
On 10/14/2011 01:48 PM, Bruce Momjian wrote: Is this going to be done for 9.2? Refreshing this patch is on my list of things to finish before the next CommitFest starts later this month. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Displaying accumulated autovacuum cost
On 09/29/2011 10:40 AM, Alvaro Herrera wrote: I reviewed this patch. My question for you is: does it make sense to enable the reporting of write rate even when vacuum cost accounting is enabled? In my opinion it would be useful to do so. If you agree, please submit an updated patch. Presumably you meant to ask if this makes sense to show when cost accounting isn't enabled, because the code doesn't do that right now. No cost accounting, no buffer usage/write rate data as this was submitted. Looks like making this work even in cases where cost accounting isn't on will make the patch a bit larger and more obtrusive, but it's not unreasonable. Now that you mention it, people who do a manual, full-speed VACUUM would certainly appreciate some feedback on the rate it ran at. I'll include that in the next update. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Displaying accumulated autovacuum cost
On 09/26/2011 05:58 AM, Shigeru Hanada wrote:

> * Local variables added by the patch (secs, usecs, write_rate and
> endtime) can be moved into narrower scope.
> * Initializing starttime to zero seems unnecessary.

Setting starttime to 0 is already in the code; the change made to that line was to add endtime, which is not initialized. You may be right that initializing it isn't necessary, but I'm sure not going to touch that part of the working code. You're right about the local variables; they were placed to look like the surrounding code rather than to be as local as possible. I'm not sure if all the PostgreSQL code is completely consistent here; a quick survey shows examples of both "put all the variables at the top" and "make variables as local to blocks as possible" styles. I don't know that it really makes any difference with modern compilers, either. I'm sure someone else will have a stronger opinion on this subject now that I've trolled the list for one by writing this.

> In addition, documents about how to use the statistics would be
> necessary, maybe in "23.1.5. The Autovacuum Daemon".
> I'll set the status of this patch to waiting-on-author. Would you rebase
> the patch and post it again?

I didn't do that because there's not really much documentation at this level of detail yet--how to interpret all the information in the logs. That's an open-ended bit of work; there's a lot more that could be written on this topic than the docs have right now. It's probably worth pointing out that this specific info is now in the logs, though, given that people might not notice it otherwise. I'll see what I can do about that. As a general FYI, rebasing is normally requested only when the existing patch doesn't apply anymore. If "patch" or "git apply" can consume it, complaints about code shifting around aren't by themselves enough reason to ask for an updated patch.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Unremovable tuple monitoring
On 10/04/2011 03:45 PM, Royce Ausburn wrote: I think I get this stats stuff now. Unless someone here thinks it's too hard for a new postgres dev's 2nd patch, I could take a stab. I might take a look at it tonight to get a feel for how hard, and what stats we could collect. I'll start a new thread for discussion. Adding statistics and good monitoring points isn't hard to do, once you figure out how the statistics messaging works. From looking at your patch, you seem to be over that part of the learning curve already. The most time consuming part for vacuum logging patches is setting up the test cases and waiting for them to execute. If you can provide a script that does that, it will aid in getting review off to a good start. Basically, whatever you build to test the patch, you should think about packaging into a script you can hand to a reviewer/committer. Normal approach is to build a test data set with something like generate_series, then create the situation the patch is supposed to log. Just to clarify what Robert was suggesting a little further, good practice here is just to say "this patch needs a catversion bump", but not actually do it. Committers should figure that out even if you don't mention it, but sometimes a goof is made; a little reminder doesn't hurt. I'm not sure what my next step should be. I've added this patch to the open commit fest -- is that all for now until the commit fest begins review? Basically, we'll get to it next month. I have my own autovacuum logging stuff I'm working on that I expect to slip to that one too, so I can easily take on reviewing yours then. I just fixed the entry in the CF app to follow convention by listing your full name. Main feedback for now on the patch is that few people ever use pg_stat_all_tables. The new counter needs to go into pg_stat_user_tables and pg_stat_sys_tables if it's going to be visible to the people who are most likely to need it.
I updated the name of the patch on the CommitFest to read "Unremovable tuple count on pg_stat_*_tables" so the spec here is more clear. I'd suggest chewing on the rest of your ideas, see what else falls out of this, and just make sure to submit another update just before the next CF starts. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separating bgwriter and checkpointer
On 09/20/2011 09:35 AM, Heikki Linnakangas wrote: Yeah, I was thinking of write()s, not fsyncs. I agree this might have some effect during fsync phase. Right; that's where the most serious problems seem to pop up now anyway. All the testing I did earlier this year suggested Linux at least is happy to do a granular fsync, and it can also use things like barriers when appropriate to schedule I/O. The hope here is that the background writer work to clean ahead of the strategy point is helpful to backends, and that should keep going even during the sync phase--which currently doesn't pause for anything else once it's started. The cleaner writes should all queue up into RAM in a lazy way rather than block the true I/O, which is being driven by sync calls. There is some risk here that the cleaner writes happen faster than the true rate at which backends really need buffers, since it has a predictive component it can be wrong about. Those could in theory result in the write cache filling faster than it would in the current environment, such that writes truly block that would have been cached in the current code. If you're that close to the edge though, backends should really benefit from the cleaner--that same write done by a client would turn into a serious stall. From that perspective, when things have completely filled the write cache, any writes the cleaner can get out of the way in advance of when a backend needs it should be the biggest win most of the time. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there really no interest in SQL Standard?
On 09/19/2011 10:58 AM, Joe Abbate wrote: Maybe it's time for an open source SQL standard, one not controlled by the "big guys" and their IP claims. Not spending as much time sitting in meetings and fighting with other vendors is one of the competitive advantages PostgreSQL development has vs. the "big guys". There needs to be a pretty serious problem with your process before adding bureaucracy to it is anything but a backwards move. And standardization tends to attract lots of paperwork. Last thing you want to be competing with a big company on is doing that sort of big company work. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
On 09/19/2011 10:53 AM, Thom Brown wrote: But couldn't that also be seen as a chicken/egg situation? The chicken/egg problem here is a bit deeper than just "no one offers GPUs because no one wants them" on server systems. One of the reasons there aren't more GPUs in typical database server configurations is that you're already filling up some number of the full size slots, and correspondingly the bandwidth available to cards, with disk controllers. It doesn't help that many server class motherboards don't even have a x16 PCI-e slot on them, which is what most GPUs as delivered on regular consumer video cards are optimized for. But nVidia does produce a non-graphics-oriented GPGPU line called Tesla dedicated to such processing. Tesla units start at around $1500 USD, which is a nice budget to spend on either more RAM (to allow higher work_mem), faster storage to store temporary files onto, or a faster CPU to chew through all sorts of tasks more quickly. The Tesla units are easy to justify if you have a serious GPU-oriented application. The good bang for the buck point with GPU sorting for PostgreSQL is probably going to be a $50-$100 video card instead. For example, the card Vitor is seeing good results on costs around $60. (That's also a system with fairly slow RAM, though; it will be interesting to see if the gain holds up on newer systems.) -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 09/05/2011 07:52 PM, Tomas Vondra wrote: If your logging criteria for the write phase was "display a message any time more than 30 seconds have passed since last seeing one", that would give you only a few lines of output in a boring, normal checkpoint--certainly less than the 9 in-progress samples you're outputting now, at 10% intervals. But in the pathological situations where writes are super slow, your log data would become correspondingly denser, which is exactly what you want in that situation. I still am not sure what should be a reasonable value or how to determine it. What happens when the checkpoint_timeout is increased, there's more shared_buffers etc.? What about using (checkpoint_timeout/10) for the time-based checkpoints and 30s for the other checkpoints? That may work fine. Maybe implement it like that, and see if the amount of logging detail is reasonable in a couple of test scenarios. I think combining the two makes the most sense: "log when >=30 seconds have passed since the last message, and there's been >=10% more progress made". (Maybe do the progress check before the time one, to cut down on gettimeofday() calls.) Is this a good idea? The case when the timeout expires and not much data was written is interesting, and this would not log it. But OTOH this would nicely solve the issue with time-based checkpoints and a fixed threshold. One thing I am trying to avoid here is needing to check the system clock after every buffer write. I also consider it useful to put an upper bound on how many of these messages will appear even in the verbose mode. This deals with both those problems. Yes, there is a potential problem with this idea. Let's say checkpoint writes degrade to where they take an hour. In that case, you won't see the first progress report until 6 minutes (10%) have gone by with this implementation. I don't see a good way to resolve that without violating one of the other priorities I listed above though.
You'll have to poll the system clock constantly and will end up creating a lot of log entries if you don't do a check against the % progress first. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
On 09/19/2011 10:12 AM, Greg Stark wrote: With the GPU I'm curious to see how well it handles multiple processes contending for resources, it might be a flashy feature that gets lots of attention but might not really be very useful in practice. But it would be very interesting to see. The main problem here is that the sort of hardware commonly used for production database servers doesn't have any serious enough GPU to support CUDA/OpenCL available. The very clear trend now is that all systems other than gaming ones ship with motherboard graphics chipsets more than powerful enough for any task but that. I just checked the 5 most popular configurations of server I see my customers deploy PostgreSQL onto (a mix of Dell and HP units), and you don't get a serious GPU from any of them. Intel's next generation Ivy Bridge chipset, expected for the spring of 2012, is going to add support for OpenCL to the built-in motherboard GPU. We may eventually see that trickle into the server hardware side of things too. I've never seen a PostgreSQL server capable of running CUDA, and I don't expect that to change. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 09/02/2011 11:10 AM, Tomas Vondra wrote: My 'ideal' solution would be either to add another GUC (to turn this on/off) or allow log_checkpoints to have three values log_checkpoints = {off, normal, detailed} where 'normal' provides the current output and 'detail' produces this much verbose output. If this is going to be acceptable, that's likely the only path it could happen by and still meet what you're looking for. I will just again stress that the part you're working on instrumenting better right now is not actually where larger systems really run into the most problems here, based on what I've seen. I added a series of log messages to 9.1 at DEBUG1, aimed at tracking the sync phase. That's where I see many more checkpoint issues than in the write one. On Linux in particular, it's almost impossible for the write phase to be more of a problem than the sync one. So the logging you're adding here I don't ever expect to turn on. But I wouldn't argue against an option to handle the logging use-case you're concerned about. Letting people observe for themselves and decide which of the phases is more interesting to their workload seems appropriate. Then users have options for what to log, no matter which type of problem they run into. If you're expanding log_checkpoints to an enum, for that to handle what I think everybody might ever want (for what checkpoints do now at least), I'd find that more useful if it happened like this instead: log_checkpoints = {off, on, write, sync, verbose} I don't think you should change the semantics of off/on, which will avoid breaking existing postgresql.conf files and resources that suggest tuning advice. "write" can toggle on what you're adding; "sync" should control whether the DEBUG1 messages showing the individual file names in the sync phase appear; and "verbose" can include both. 
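To make the enum proposal above concrete, here is a hypothetical Python sketch of which classes of messages each value would enable (the value names come from the suggestion above; everything else, including the phase labels, is invented for illustration -- the real thing would of course be a GUC enum in C):

```python
# Hypothetical mapping from the proposed log_checkpoints values to the
# kinds of checkpoint messages that get emitted.  "summary" stands for
# today's on/off output; "write" for the new write-phase progress lines;
# "sync" for the per-file DEBUG1 sync detail promoted to LOG.
PHASES_LOGGED = {
    "off":     frozenset(),
    "on":      frozenset({"summary"}),
    "write":   frozenset({"summary", "write"}),
    "sync":    frozenset({"summary", "sync"}),
    "verbose": frozenset({"summary", "write", "sync"}),
}

def should_log(setting: str, phase: str) -> bool:
    """Return True if a message for this checkpoint phase should be emitted."""
    return phase in PHASES_LOGGED[setting]
```

The point of keeping off/on semantics unchanged is visible here: existing postgresql.conf files and tuning guides keep working, and the new values only ever add output.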
As far as a heuristic for making this less chatty when there's nothing exciting happening goes, I think something based on how much time has passed would be the best one. In your use case, I would guess you don't really care whether a message appears every n%. If I understand you correctly now, you would mainly care about getting enough log detail to know 1) when things are running really slow, or b) often enough that the margin of error in your benchmark results from unaccounted checkpoint writes is acceptable. In both of those cases, I'd think a time-based threshold would be appropriate, and that also deals with the time-based checkpoints, too. If your logging criteria for the write phase was "display a message any time more than 30 seconds have passed since last seeing one", that would give you only a few lines of output in a boring, normal checkpoint--certainly less than the 9 in-progress samples you're outputting now, at 10% intervals. But in the pathological situations where writes are super slow, your log data would become correspondingly denser, which is exactly what you want in that situation. I think combining the two makes the most sense: "log when >=30 seconds have passed since the last message, and there's been >=10% more progress made". (Maybe do the progress check before the time one, to cut down on gettimeofday() calls) That would give you 4 in-progress reports during a standard 2.5 minute write phase, and in cases where the checkpoints are taking a long time you'd get as many as 9. That's pretty close to auto-tuning the amount of log output, so the amount of it is roughly proportional to how likely the information it's logging will be interesting. We certainly don't want to add yet another GUC just to control the frequency. I don't think it will be too hard to put two hard-coded thresholds in and do good enough for just about everyone though. I would probably prefer setting those thresholds to 60 seconds/20% instead. 
That might not be detailed enough for you though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
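The combined heuristic discussed above, logging a write-phase progress line only once both thresholds are crossed, can be sketched like this (Python, invented names; the progress check runs first so the clock isn't read on every buffer write):

```python
def should_log_progress(progress, last_logged_progress,
                        now, last_logged_time,
                        min_progress=0.10, min_seconds=30.0):
    """Emit a progress line only when both at least min_progress more of
    the write phase has completed since the last line AND at least
    min_seconds have elapsed.  The cheap progress comparison happens
    first, so gettimeofday()-style clock reads only occur at most once
    per min_progress of checkpoint progress."""
    if progress - last_logged_progress < min_progress:
        return False
    return (now - last_logged_time) >= min_seconds
```

With 30s/10% this gives roughly 4 lines during a normal 2.5 minute write phase and up to 9 when writes drag on, which matches the counts in the message above; raising the thresholds to 60s/20% halves both.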
Re: [HACKERS] Displaying accumulated autovacuum cost
On 08/29/2011 11:03 AM, Robert Haas wrote: Instead of doing this only when vacuum costing is active, could we drive it off of the pgBufferUsage stuff (maybe with a few tweaks...) and do it unconditionally? Sure. I've wondered about an even larger refactoring, to completely reorient vacuum costing: drive it all from the pgBufferUsage side rather than running its own totals. I didn't even start wandering down that path yet because of time constraints, plus the desire to have something I could backport to installs having VACUUM issues on earlier versions. This code I'd backport without hesitation; something that wanders toward a more complicated rearrangement becomes harder to deal with. To me it seems like it would better to say "write rate xyz MB/s" rather than "xyz MB/s write rate", but maybe I'm in the minority on that one. I was just trying to mimic the style of the logging already there as closely as I could. I don't like the way the existing log message looks either. I wasn't going to ignore its style over that though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Displaying accumulated autovacuum cost
Updated patch cleans up two diff mistakes made when backing out the progress report feature. The tip-off that I screwed up should have been the absurdly high write rate shown. The usleep was accidentally deleted, so it was running without cost limits even applying. Here's a good one instead:

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 163935 remain
        tuples: 200 removed, 2928356 remain
        buffer usage: 117393 hits, 123351 misses, 102684 dirtied, 2.168 MiB/s write rate
        system usage: CPU 2.54s/6.27u sec elapsed 369.99 sec

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..c9890b4 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 214,222 ----
  	VacuumCostActive = (VacuumCostDelay > 0);
  	VacuumCostBalance = 0;
+ 	VacuumPageHit = 0;
+ 	VacuumPageMiss = 0;
+ 	VacuumPageDirty = 0;
  
  	/*
  	 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
--- 151,168 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
--- 228,282 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
! 
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								vacrelstats->new_rel_tuples,
! 								VacuumPageHit,
! 								VacuumPageMiss,
! 								VacuumPageDirty,
! 								write_rate,
! 								pg_rusage_show(&ru0))));
! 			}
! 			else
! 				ereport(LOG,
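The MiB/s arithmetic in the patch is easy to sanity-check in isolation. Here's a minimal standalone sketch of the same calculation; the function name is mine, not the patch's, and BLCKSZ is assumed to be the default 8K page size:

```c
#include <assert.h>
#include <math.h>

#define BLCKSZ 8192             /* default PostgreSQL page size */

/*
 * Hypothetical standalone version of the patch's write-rate math:
 * dirtied pages are converted to MiB, then divided by elapsed time.
 * TimestampDifference() reports elapsed time as whole seconds plus
 * leftover microseconds, hence the usecs / 1000000.0 term.
 */
static double
vacuum_write_rate(long dirty_pages, long secs, int usecs)
{
    double write_rate = 0;

    if (secs > 0 || usecs > 0)
        write_rate = (double) BLCKSZ * dirty_pages / (1024 * 1024) /
            (secs + usecs / 1000000.0);
    return write_rate;
}
```

Feeding in the numbers from the log sample (102684 pages dirtied over 369.99 seconds) gives 802.2 MiB / 369.99 s, which reproduces the reported 2.168 MiB/s.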
Re: [HACKERS] Displaying accumulated autovacuum cost
Attached patch includes the "math is hard" reworking, so it displays the average write rate in the log output automatically:

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 163935 remain
        tuples: 200 removed, 4625165 remain
        buffer usage: 111901 hits, 123348 misses, 102351 dirtied, 23.365 MiB/s write rate
        system usage: CPU 1.84s/4.22u sec elapsed 34.22 sec

All of the updates to the process title are gone, in favor of some progress report mechanism TBD. The summary is much more important than the progress tracking part as far as I'm concerned, so I don't mind splitting things apart to try and get this part in earlier.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..fa15b2e 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
***************
*** 43,48 ****
--- 43,49 ----
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/memutils.h"
+ #include "utils/ps_status.h"
  #include "utils/snapmgr.h"
  #include "utils/syscache.h"
  #include "utils/tqual.h"
*** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 215,223 ----
  	VacuumCostActive = (VacuumCostDelay > 0);
  	VacuumCostBalance = 0;
+ 	VacuumPageHit = 0;
+ 	VacuumPageMiss = 0;
+ 	VacuumPageDirty = 0;
  
  	/*
  	 * Loop to process each selected relation.
*** vacuum_delay_point(void)
*** 1160,1167 ****
  		if (msec > VacuumCostDelay * 4)
  			msec = VacuumCostDelay * 4;
  
- 		pg_usleep(msec * 1000L);
- 
  		VacuumCostBalance = 0;
  
  		/* update balance values for workers */
--- 1164,1169 ----
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
--- 151,168 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
--- 228,282 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
! 
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/27/2011 12:01 AM, Noah Misch wrote:

On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote:

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster

The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent.

Could you remove this hazard by adding a step "2a. psql -c CHECKPOINT"?

That's what I do in pgbench-tools, and it helps a lot. It makes it easier to identify when the checkpoint kicks in if you know it happens at approximately the same time after each test run begins, given similar testing parameters.

That said, it's hard to eliminate all of the edge conditions here. For example, imagine that you're consuming WAL files such that you hit checkpoint_segments every 4 minutes. In a 10 minute test run, a checkpoint will start at 4:00 and finish at around 6:00 (with checkpoint_completion_target=0.5). The next will start at 8:00 and should finish at around 10:00--right as the test ends. Given the variation that sync timing and rounding issues in the write phase add to things, you can expect that some test runs will include stats from 2 checkpoints, while others will end just before the second one finishes. That does throw the numbers off a bit.

To avoid this when it pops up, I normally aim for >=4 checkpoints per test run, just so whether I get n or n-1 of them doesn't impact results as much. But that normally takes doubling the length of the test to 20 minutes. As it will often take me days of test time to plow through exploring just a couple of parameters, I'm sympathetic to Tomas trying to improve accuracy here without having to run for quite so long.

There are few people who need to worry about this problem though; it's a common issue with benchmarking, but not in many other contexts.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
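The checkpoint-counting hazard described above comes down to simple interval arithmetic. A hypothetical sketch (the function name and the assumption that the first checkpoint starts one full interval into the run are mine) that counts how many checkpoints finish inside a test run:

```c
#include <assert.h>

/*
 * Hypothetical model of the timing hazard described above: given a
 * checkpoint interval and checkpoint_completion_target, count how many
 * checkpoints *finish* inside a test run of a given length.  Assumes
 * the first checkpoint starts one full interval into the run.
 */
static int
checkpoints_finished(double interval_s, double completion_target,
                     double test_length_s)
{
    int    n = 0;
    double start;

    if (interval_s <= 0)
        return 0;
    for (start = interval_s; ; start += interval_s)
    {
        /* writes are spread over completion_target of the interval */
        double finish = start + interval_s * completion_target;

        if (finish > test_length_s)
            break;
        n++;
    }
    return n;
}
```

With a checkpoint every 4 minutes and checkpoint_completion_target = 0.5, a 10-minute run sits right on the boundary of catching its second checkpoint, while a 20-minute run reliably catches 4; that is the n versus n-1 variability the message describes.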
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/26/2011 03:54 AM, Magnus Hagander wrote:

In theory, this could be the "progress view" or "progress field" talked about around Greg's previous patch - or it could just be modifying the command string in pg_stat_activity.

Right. The whole progress indicator idea is hard to do for queries in general. But there are enough of these other progress indicator ideas around now that it may be worth putting a standard way to handle them in here. It sounds like that would be sufficient to address the area Tomas is trying to instrument better.

I badly want a progress indicator on CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the build process it's on. That's turned into a major headache recently. If we run with the idea of just allowing backends to publish a progress text string, I think this one maps into a similar space as the autovacuum one. Publishing how many seconds the operation has been running for may be reasonable too. Whether the overhead of the timing calls necessary to compute that will be high or not depends on the refresh rate of the progress info.

My suggestion before was to name these as key=value pairs for easy parsing; here are three examples now:

autovacuum: pgbench_accounts h=182701 m=301515 d=321345 s=62.231 (cache hits, cache misses, dirty writes, seconds)
background writer: checkpoint b=511 t=3072 s=5.321 (buffers written, total, seconds)
create index concurrently: pgbench_accounts p=1 b=62 t=6213 s=81.232 (phase, blocks processed, total block estimate, seconds)

I think the idea of making this easily human readable is optimistic, because it will make all these strings big enough to start mattering. Given that, we almost have to assume the only consumers of this data will be able to interpret it using the documentation. I'd be happy with just the minimal data set in each case, not including any statistics you can easily derive from the values given (like the MB/s readings). Adding that figure in particular to more of the log messages would be nice though.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
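The key=value format suggested above takes one snprintf() per refresh to emit. A hypothetical sketch of the autovacuum case; the function name and the static-buffer approach are illustrative only, not any committed PostgreSQL API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter for the key=value progress strings suggested
 * above: table name, then cache hits, misses, dirty writes, seconds.
 * A static buffer is fine for a sketch since each backend is its own
 * process; nothing here is real PostgreSQL code.
 */
static const char *
format_vacuum_progress(const char *table, long hits, long misses,
                       long dirtied, double seconds)
{
    static char buf[128];

    snprintf(buf, sizeof(buf), "%s h=%ld m=%ld d=%ld s=%.3f",
             table, hits, misses, dirtied, seconds);
    return buf;
}
```

Even this minimal version of the autovacuum example runs to about 50 bytes, which is the kind of length behind the "big enough to start mattering" concern.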
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/25/2011 04:57 PM, Tomas Vondra wrote:

(b) sends bgwriter stats (so that the buffers_checkpoint is updated)

The idea behind only updating the stats in one chunk, at the end, is that it makes one specific thing easier to do. Let's say you're running a monitoring system that is grabbing snapshots of pg_stat_bgwriter periodically. If you want to figure out how much work a checkpoint did, you only need two points of data to compute that right now. Whenever you see either of the checkpoint count numbers increase, you just subtract off the previous sample; now you've got a delta for how many buffers that checkpoint wrote out. You can derive the information about the buffer counts that appears in the logs quite easily this way. The intent was to make that possible to do, so that people can figure this out without needing to parse the log data. Spreading out the updates defeats that idea. It also makes it possible to see the buffer writes more in real time, as they happen. You can make a case for both approaches having their use cases; the above just summarizes the logic behind why it's done the way it is right now.

I don't think many people are actually doing things with this to the level where their tool will care. The most popular consumer of pg_stat_bgwriter data I see is Munin graphing changes, and I don't think it will care either way. Giving people the option of doing it the other way is a reasonable idea, but I'm not sure there's enough of a use case to justify adding a GUC just for that. My next goal here is to eliminate checkpoint_segments, not to add yet another tunable extremely few users would ever touch.

As for throwing more log data out, I'm not sure what new analysis you're thinking of that it allows. I/O gets increasingly spiky as you zoom in on it; averaging over a shorter period can easily end up providing less insight about trends. If anything, I spend more time summarizing the data that's already there rather than wanting to break it down further. It's already providing way too much detail for most people. Customers tell me they don't care to see checkpoint stats unless they're across a day or more of sampling, so even the current "once every ~5 minutes" is way more info than they want.

I have all this log parsing code, plus things that look at pg_stat_bgwriter, to collect that data and produce higher level reports. And lots of it would break if any of this patch is added and people turn it on. I imagine other log/stat parsing programs might suffer issues too. That's your other hurdle for change here: the new analysis techniques have to be useful enough to justify that some downstream tool disruption is inevitable. If you have an idea for how to use this extra data for something useful, let's talk about what that is and see if it's possible to build it in instead.

This problem is harder than it looks, mainly because the way the OS caches writes makes trying to derive hard numbers from what the background writer is doing impossible. When the database writes things out and when they actually get written to disk are not the same event. The actual write often happens during the sync phase, and not being able to track that beast is where I see the most problems. The write phase, the easier part to instrument in the database, is pretty boring. That's why the last extra logging I added here focused on adding visibility into the sync activity instead.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
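The two-sample monitoring pattern described above can be sketched as follows; the struct fields mirror a subset of pg_stat_bgwriter's columns, but the struct and function themselves are hypothetical:

```c
#include <assert.h>

/*
 * Hypothetical sketch of the monitoring pattern described above: keep
 * the previous pg_stat_bgwriter sample, and when either checkpoint
 * counter advances, the difference in buffers_checkpoint is how many
 * buffers that checkpoint wrote.  Field names follow the view's
 * columns; the code itself is illustrative.
 */
struct bgwriter_sample
{
    long checkpoints_timed;     /* checkpoints started by timeout */
    long checkpoints_req;       /* checkpoints started on demand */
    long buffers_checkpoint;    /* total buffers written by checkpoints */
};

/* Buffers written by checkpoints between two samples, or -1 when no
 * checkpoint completed in the interval. */
static long
checkpoint_buffers_delta(const struct bgwriter_sample *prev,
                         const struct bgwriter_sample *cur)
{
    if (cur->checkpoints_timed + cur->checkpoints_req >
        prev->checkpoints_timed + prev->checkpoints_req)
        return cur->buffers_checkpoint - prev->buffers_checkpoint;
    return -1;
}
```

Spreading the stats updates out during the checkpoint would make the subtraction above report partial work, which is the objection being raised.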
Re: [HACKERS] Displaying accumulated autovacuum cost
On 08/22/2011 05:54 PM, Jim Nasby wrote:

I know folks have talked about progress, but I haven't seen anything specific... could you add info about what table/index vacuum is working on, and how far along it is? I realize that's not very close to an actual % completion, but it's far better than what we have right now.

Due to complaints about the mechanism the first version used to inform the user of the progress, I'm yanking that from the next patch altogether. The goal for now is to get a good report into the logs, and then maybe that gets extended later with a progress report. (All of the proposed alternate mechanisms are way more complicated than anything I have time to do right now.)

FWIW, the number I end up caring about isn't so much write traffic as read. Thanks to a good amount of battery-backed write cache (and possibly some iSCSI misconfiguration), our writes are generally much cheaper than our reads.

VACUUM can't really know its true read rate from what's inside the database. I can add a summary of the accumulated read amounts into the logs, in more useful figures than what is provided so far, which is better than nothing. But those will be kind of deceptive, which is one reason I wasn't so focused on them yet. If the relation is largely in the OS cache, but not the PostgreSQL one, the summary can show a read rate even when it isn't actually doing any reads at all. That was exactly the case in the sample data I posted. VACUUM thought it was reading anywhere from 2.5 to 6 MB/s. But at the OS level, it was actually reading zero bytes, since the whole thing was in cache already.

What you actually want is a count of the accumulated read counters at the OS level. I've recently figured out how to track those too, but that code is something that lives outside the database. If this is something useful to you, I think you're about to sign up to be my next beta tester for that program.
-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
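On Linux, the OS-level read counters mentioned above are exposed per process in /proc/&lt;pid&gt;/io, as lines like "read_bytes: 235945984". A hypothetical sketch of pulling one field out of that text; the parser is mine, and only the field names come from the kernel interface:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of extracting one counter from /proc/<pid>/io text, which
 * contains lines such as "read_bytes: N" and "write_bytes: N".
 * In real use you would read the file into io_text first; this
 * helper just does the parsing step.
 */
static long long
parse_io_field(const char *io_text, const char *field)
{
    const char *p = strstr(io_text, field);
    long long   val;

    if (p == NULL || sscanf(p + strlen(field), ": %lld", &val) != 1)
        return -1;              /* field missing or malformed */
    return val;
}
```

Sampling a vacuum worker's read_bytes this way, rather than trusting the database-side counters, is what distinguishes actual disk reads from reads satisfied by the OS cache.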
Re: [HACKERS] Displaying accumulated autovacuum cost
On 08/18/2011 10:54 AM, Peter Eisentraut wrote:

So how about adding a column to pg_stat_activity, progress_metrics or something like that, and add that information there.

Adding a field here (I'd go for the simpler "progress") and updating it regularly would be a reasonable way to go. This data doesn't really need to go into the traditional statistics infrastructure to be useful. I didn't start there because I was already getting pushback on overloading the stats collector with constantly updated metrics the last time I did something in this area. I wasn't going to try and argue why it was worth it in this case, just like I'm not going to argue about the complaint over the command string overhead being too high--just going to not do that instead. If the bikeshed I built doesn't look fancy enough to hold the bike I put in there, I'm not going to build a better one right now--I'll just put a cheaper bike in there instead.

I was hoping to eventually take the useful summary bits at the end, the totals, and save those into statistics somewhere each time a VACUUM of either sort finishes. It would fit with the information shown in pg_stat_tables, but that's obviously getting too wide. Breaking out a pg_stat_autovacuum view that contains all the relevant bits currently shown in that view, plus these 3 new fields, would be a reasonable start.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] Displaying accumulated autovacuum cost
On 08/18/2011 10:12 AM, Robert Haas wrote:

Perhaps a reasonable way to break up the patch would be:

- Part 1: Gather the information and display it in the log_autovacuum_min_duration output.
- Part 2: Add the ability to see the information incrementally (via some mechanism yet to be agreed upon).

My reaction to all the suggestions for redesign is just that: pull out the part that does the incremental updates altogether, improve the part that dumps the info into the logs, and resubmit without any incremental progress for now.

This is much more valuable to me if the first commit that hits is something I can backport trivially. I'm seeing enough production servers running into this problem right now on earlier versions to be worried about that, and the log dump at the end would be a huge help even if that was all they got. I'm going to add directly computing the write MB/s figure from the dirty data written too, since that ends up being the thing that I keep deriving by hand anyway.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] Displaying accumulated autovacuum cost
On 08/17/2011 07:42 PM, Euler Taveira de Oliveira wrote:

I don't like exposing this information only in process titles. It would be difficult for client apps (for example, PGAdmin) to track this kind of information, and it is restricted to local access. I'm not objecting to displaying this information in the process title; I'm just saying that the information should be exposed in functions (say pg_stat_get_vacuum_[hit|miss|dirty]) too.

I tend to build the simplest possible thing that is useful enough to work. The data is getting stored and shown now, where it wasn't before. If it's possible to expose that in additional ways later too, great. The big step up for this information is to go from "unobtainable" to "obtainable". I'd prefer not to add a quest for "easily obtainable" to the requirements until that big jump is made, for fear it will cause nothing to get delivered.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
[HACKERS] Displaying accumulated autovacuum cost
K (usually) buffers. Multiply that by 8192/(60*1024*1024), and you get MB/s out of there.

I collected that data for a cleanup run of the pgbench_accounts damage done above; a CSV file with all the statistics is attached. I also collected OS-level stats from Linux about the actual read/write rate of the process, converted into "Write Mbps" (those are actually in MB/s; the sloppy capitalization is via OpenOffice "autocorrect"). Those numbers are close enough to make me confident the dirty buffer totals tracked here do turn into useful MB/s values. Sample of the most interesting part:

Cost Delta    Dirty Mbps    Write Mbps
589,890       2.56          2.73
591,151       2.57          2.73
589,035       2.56          2.72
593,775       3.14          0.20
599,420       2.05          0.00
598,503       2.05          0.00
599,421       2.05          0.00
574,046       0.60          0.01
574,779       0.64          0.67
609,140       2.56          2.68
612,397       2.57          2.69
611,744       2.57          2.69
610,008       2.56          2.68

This shows the expected 600K/minute cost accumulation. And using the dirty= numbers to compute MB/s of write speed closely matches the total write speed of this process. Some of the difference might be I/O to other things besides the main table here; some of it is just because OS write caching will influence the write rate. In the spots where the OS value and what's derived from the dirty rate diverge most, it appears to be because vacuum is filling Linux's write cache. Actual writes accumulated against the process then block for a while. It's a small difference most of the time.

I'd be willing to accept a "Dirty MB/s" figure computed this way as accurate enough for most purposes. And this patch lets you get that data, currently unavailable without poking into the OS statistics (if at all), just by doing a little log file and/or command string scraping. Total at the end or real-time monitoring, based on how much work you want to put into it. For a busy site where one or more autovacuum processes are running most of the time, being able to monitor the vacuum portion of the I/O this way will be a huge improvement over the current state of things.
I already have a stack of tools built on top of this data I'm working on, and they're making it much easier to come up with an iterative tuning process for autovacuum.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

#"timestamp","page_hit","hit_bytes","page_miss","miss_bytes","page_dirty","dirty_bytes","Dirty Mbps","Total Cost","Cost Delta","Read Bytes","Write Bytes","Read Mbps","Write Mbps"
08/16/2011-18:46:56,21,"172,032",27088,"221,904,896",27079,"221,831,168",2.56,"812,481","589,680",0,"235,945,984",0.00,2.73
08/16/2011-18:47:56,21,"172,032",46737,"382,869,504",46728,"382,795,776",2.56,"1,401,951","589,470",0,"407,298,048",0.00,2.72
08/16/2011-18:48:56,21,"172,032",66359,"543,612,928",66349,"543,531,008",2.55,"1,990,591","588,640",0,"578,527,232",0.00,2.72
08/16/2011-18:49:56,21,"172,032",86092,"705,265,664",86082,"705,183,744",2.57,"2,582,581","591,990",0,"750,551,040",0.00,2.73
08/16/2011-18:50:56,21,"172,032",105713,"866,000,896",105703,"865,918,976",2.55,"3,171,211","588,630",0,"921,657,344",0.00,2.72
08/16/2011-18:51:57,21,"172,032",125369,"1,027,022,848",125359,"1,026,940,928",2.56,"3,760,891","589,680",0,"1,093,173,248",0.00,2.73
08/16/2011-18:52:57,21,"172,032",145019,"1,187,995,648",145008,"1,187,905,536",2.56,"4,350,371","589,480",0,"1,264,517,120",0.00,2.72
08/16/2011-18:53:57,21,"172,032",164717,"1,349,361,664",164706,"1,349,271,552",2.56,"4,941,311","590,940",0,"1,436,352,512",0.00,2.73
08/16/2011-18:54:57,21,"172,032",184387,"1,510,498,304",184376,"1,510,408,192",2.56,"5,531,411","590,100",0,"1,607,917,568",0.00,2.73
08/16/2011-18:55:57,22,"180,224",204078,"1,671,806,976",204067,"1,671,716,864",2.56,"6,122,142","590,731",0,"1,779,671,040",0.00,2.73
08/16/2011-18:56:57,22,"180,224",223706,"1,832,599,552",223695,"1,832,509,440",2.56,"6,710,982","588,840",0,"1,950,982,144",0.00,2.72
08/16/2011-18:57:57,22,"180,224",243369,"1,993,678,848"
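The buffer-count-to-MB/s conversion behind the "Dirty Mbps" column works out as follows; a minimal sketch (function name mine), using the one-minute sampling interval from the attached CSV:

```c
#include <assert.h>
#include <math.h>

#define BLCKSZ 8192             /* default PostgreSQL page size */

/*
 * Hypothetical sketch of the conversion described above: take the
 * difference between two samples of the dirtied-buffer counter,
 * multiply by the page size, and divide by the sampling interval
 * to get MB/s (MiB/s, strictly).
 */
static double
dirty_mbps(long dirty_prev, long dirty_cur, double interval_s)
{
    return (double) (dirty_cur - dirty_prev) * BLCKSZ /
        (1024.0 * 1024.0) / interval_s;
}
```

Feeding in the first two CSV samples (27079 then 46728 pages dirtied, one minute apart) gives roughly 2.56 MiB/s, matching the Dirty Mbps column.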
Re: [HACKERS] index-only scans
t vary too much]

That's 5.4X as fast; not too shabby! It's kind of interesting how different the I/O pattern is on the index-only version. I ran this test against a 3-disk RAID0 set with a 256MB BBWC, so there's some possibility of caching here. But given that each query blows away a large chunk of the other's data, I wouldn't expect that to be a huge factor here:

gsmith=# select pg_size_pretty(pg_relation_size('pgbench_accounts'));
 pg_size_pretty
----------------
 640 MB

gsmith=# select pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));
 pg_size_pretty
----------------
 107 MB

gsmith=# select pg_size_pretty(pg_relation_size('sample_data'));
 pg_size_pretty
----------------
 112 MB

And with the large difference in response time, things appear to be working as hoped even in this situation. If you try this on your laptop, where drive cache size and random I/O are likely to be even slower, you might see an even larger difference.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] our buffer replacement strategy is kind of lame
On 08/12/2011 10:51 PM, Greg Stark wrote:

If you execute a large batch delete or update or even just set lots of hint bits you'll dirty a lot of buffers. The ring buffer forces the query that is actually dirtying all these buffers to also do the i/o to write them out. Otherwise you leave them behind to slow down other queries. This was one of the problems with the old vacuum code which the ring buffer replaced. It left behind lots of dirtied buffers for other queries to do i/o on.

I ran into the other side of this when trying to use Linux's relatively new dirty_background_bytes tunable to constrain the OS write cache. When running with the current VACUUM ring buffer code, if there isn't also a large OS write cache backing it, performance suffers quite a bit. I've been adding test rigging to quantify this into pgbench-tools recently, and I fear that one of the possible outcomes could be pressure toward making the database's ring buffer bigger. Just a theory--waiting on some numbers still.

Anyway, I think every idea thrown out here so far needs about an order of magnitude more types of benchmarking test cases before it can be evaluated at all. The case I just mentioned is a good example of why. Every other test I ran showed a nice improvement with the kernel tuning I tried. But VACUUM was massively detuned in the process.

I have an entire file folder filled with notes on ways to reorganize the buffer cache, from my background writer work for 8.3. In my mind they're all sitting stuck behind the problem of getting enough standardized benchmark workloads to have a performance regression suite. The idea of touching any of this code without a look at a large number of different tests is a bit optimistic. What I expect to happen here is that all initially proposed changes will end up tuning for one workload at the expense of other, not measured, ones.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] pgbench internal contention
On 07/30/2011 09:08 AM, Robert Haas wrote:

If I'm reading the code right, it only modifies __libc_drand48_data on first call, so as long as we called erand48() at least once before spawning the child threads, it would probably work. That seems pretty fragile in the face of the fact that they explicitly state that they're modifying the global random generator state and that you should use erand48_r() if you want reentrant behavior. So I think if we're going to go the erand48() route we probably ought to force pgbench to always use our version rather than any OS-supplied version.

By the way: the landmines in this whole area are what sunk the attempt to switch pgbench over to using 64 bits for the accounts table back in February. See the last few messages of http://postgresql.1045698.n5.nabble.com/Re-PERFORM-pgbench-to-the-MAXINT-td3337374.html to revisit. I think you've retraced all of that now, but double-checking your plan against things like the AIX-specific weirdness I pointed out there may be useful.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
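For reference, the reentrant usage pattern under discussion: erand48() carries all of its 48-bit state in a caller-supplied array, so giving each thread its own array sidesteps the shared __libc_drand48_data entirely. A minimal sketch; the wrapper and the seed values are illustrative:

```c
/* _XOPEN_SOURCE makes erand48() visible under strict compiler modes */
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of the reentrancy property discussed above: erand48() keeps
 * its generator state entirely in the caller-supplied xsubi[3] array,
 * so per-thread state arrays never touch glibc's shared drand48 data.
 * pgbench would keep one such array per client thread.
 */
static double
thread_local_uniform(unsigned short xsubi[3])
{
    return erand48(xsubi);      /* advances only this state array */
}
```

Identical seed arrays produce identical deterministic streams, which is also what makes a bundled replacement implementation behave the same across platforms.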
Re: [HACKERS] Incremental checkopints
On 07/29/2011 11:04 AM, jord...@go-link.net wrote: I think that current implementation of checkpoints is not good for huge shared buffer cache and for many WAL segments. If there is more buffers and if buffers can be written rarely more updates of buffers can be combined so total number of writes to disk will be significantly less. I think that incremental checkpoints can achieve this goal (maybe more) and price is additional memory (about 1/1000 of size of buffer cache). The current code optimizes for buffers that are written frequently. Those will sit in shared_buffers and in the hoped for case, only be written once at checkpoint time. There are two issues with adopting increment checkpoints instead, one fundamental, the other solvable but not started on yet: 1) Postponing writes as long as possible always improves the resulting throughput of those writes. Any incremental checkpoint approach will detune throughput by some amount. If you make writes go out more often, they will be less efficient; that's just how things work if you benchmark anything that allows write combining. Any incremental checkpoint approach is likely to improve latency in some cases if it works well, while decreasing throughput in most cases. 2) The incremental checkpoint approach used by other databases, such as the MySQL implementation, works by tracking what transaction IDs were associated with a buffer update. The current way PostgreSQL saves buffer sync information for the checkpoint to process things doesn't store enough information to do that. As you say, the main price there is some additional memory. From my perspective, the main problem with plans to tweak the checkpoint code is that we don't have a really good benchmark that tracks both throughput and latency to test proposed changes against. Mark Wong has been working to get his TCP-E clone DBT-5 running regularly for that purpose, and last I heard that was basically done at this point--he's running daily tests now. 
There's already a small pile of patches floating around that adjust checkpoint behavior; they were postponed from being included in 9.1 mainly because it was hard to prove they were useful given the benchmark used to test them, pgbench. I have higher hopes for DBT-5 as being a test that gives informative data in this area. I would want to go back and revisit the existing patches (sorted checkpoints, spread sync) before launching into this whole new area. I don't think any of those has even been proven not to work, they just didn't help the slightly unrealistic pgbench write-heavy workload. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On 07/26/2011 12:33 PM, Pavan Deolasee wrote: I think what I am suggesting is that the default pgbench test setup would probably not give you a good scalability as number of clients are increased and one reason could be the contention in the small table. So it might be a good idea to get rid of that and see if we get much better scalability and understand other bottlenecks.

You can easily see this form of contention pulling down results when the database itself is really small and the overall transaction rate is very high. With Robert using a scale=100, no more than 80 clients, and transaction rates peaking at <10K TPS on a 24 core box, I wouldn't expect this form of contention to be a large issue. It may be dropping results a few percent, but I'd be surprised if it was any more than that.

It's easy enough to use "-N" to skip the updates to the smaller tellers and branches tables to pull that out of the way. TPS will go up, because it's doing less per transaction. That's not necessarily a better test case though, it's just a different one. The regular case includes a lot of overwriting the same blocks in the hot branches and tellers tables. That effectively pushes a lot more I/O toward being likely to happen at checkpoint time. Those tables rarely have any significant I/O outside of the checkpoint in the standard "TPC-B like" scenario, because their usage counts stay high most of the time. Contention for small tables that are being heavily updated is still important to optimize for, though.

Which type of test makes more sense depends on what aspect of performance you're trying to focus on. I'll sometimes do a full pgbench-tools "sweep" (range of multiple scales and clients at each scale) in both regular write and "-N" write modes, just to collect the slightly different data each provides. The form of checkpoint I/O spike you see at sync time is very different in the two cases, but both usage profiles are important to consider and optimize. 
That Robert has started with the regular case doesn't worry me too much now that I've seen the parameters he's using; he's not running it in a way where I'd expect branch/teller contention to be a major limiting factor on the results. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] write scalability
On 07/25/2011 04:07 PM, Robert Haas wrote: I did 5-minute pgbench runs with unlogged tables and with permanent tables, restarting the database server and reinitializing the tables between each run.

Database scale? One or multiple pgbench worker threads? A reminder on the amount of RAM in the server would be helpful for interpreting the results too.

The other thing I'd recommend if you're running more write-heavy tests is to turn off autovacuum. Whether or not it kicks in depends on the test duration and the TPS rate, which adds a source of variability better avoided here. It also means that faster tests end up getting penalized by having it run near their end, which makes them no longer look like fast results. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] pgbench--new transaction type
On 07/25/2011 08:12 PM, Jeff Janes wrote: In the absence of -s and presence of -f, :scale gets set to 1, rather than to "select count(*) from pgbench_branches". I don't think it is nice to rely on people to correctly specify -s. I would like to change -f so that in the absence of -s it uses the same scale as -S, etc., do. But that would probably be too backwards incompatible to be an acceptable change.

Auto-detecting scale only works if you have a database populated with the pgbench tables. You can use "pgbench -f" to run arbitrary bits of SQL, using pgbench as the driver program for all sorts of benchmarking tasks against other data sets. For example, at http://projects.2ndquadrant.it/sites/default/files/pgbench-intro.pdf I show how to use it for testing how fast INSERT statements of various sizes can execute. The very concept of a "scale" may not make sense for other data sets that pgbench will happily run against when using "-f". The only sort of heuristic I have considered adding here when running in that mode is:

1) Check if pgbench_branches exists.
2) If so, count the records to derive a scale, as currently done in the non "-f" cases.
3) Should that scale not match the value of "-s", issue a warning.

You have to assume anyone sophisticated enough to be playing with "-f" may be doing something the program doesn't expect or understand, and let them do that without trying to "fix" what may be intentional behavior. But a check for the most common mistake made in this area wouldn't bother people who aren't using pgbench in its original form at all, while it would keep those new to the program from screwing this up. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
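A sketch of that three-step check, with the database lookup factored out so only the warning logic shows (Python purely for illustration -- pgbench itself is C, and the function name and message wording here are made up):

```python
def check_scale_sanity(declared_scale, branch_count):
    """Sanity-check a -s value against the pgbench_branches row count.

    branch_count is the result of "select count(*) from pgbench_branches",
    or None if that table doesn't exist (custom -f workload: stay quiet).
    Returns a warning string, or None when there is nothing to complain about.
    """
    if branch_count is None:
        # Step 1 found no pgbench tables: assume an intentional custom schema.
        return None
    if branch_count != declared_scale:
        # Step 3: the common mistake -- forgetting -s with a custom script.
        return ("scale mismatch: -s %d given, but the database looks like "
                "it was initialized with scale %d"
                % (declared_scale, branch_count))
    # Step 2 derived a scale and it matches the declared one.
    return None
```

The point of returning quietly when the table is missing is exactly the "don't fix intentional behavior" rule from the message: only the common mistake gets flagged.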
Re: [HACKERS] pgbench --unlogged-tables
On 07/25/2011 09:23 AM, Robert Haas wrote: At some point, we also need to sort out the scale factor limit issues, so you can make these things bigger.

I had a patch to improve that whole situation, but it hasn't seemed to nag at me recently. I forget why it seemed less important, but I doubt I'll make it another six months without coming to some resolution there.

The two systems I have in for benchmarking right now have 128GB and 192GB of RAM in them, so large scales should have been tested. Unfortunately, it looks like the real-world limiting factor on doing lots of tests at big scales is how long it takes to populate the data set. For example, here's pgbench creation time on a big server (48 cores, 128GB RAM) with a RAID10 array, when scale=2 (292GB):

real    174m12.055s
user    17m35.994s
sys     0m52.358s

And here's the same server putting the default tablespace (but not the WAL) on [much faster flash device I can't talk about yet]:

Creating new pgbench tables, scale=2

real    169m59.541s
user    18m19.527s
sys     0m52.833s

I was hoping for a bigger drop here; maybe I needed to use unlogged tables? (ha!) I think I need to start looking at the pgbench data generation stage as its own optimization problem. Given how expensive systems this large are, I never get them for very long before they are rushed into production. People don't like hearing that just generating the data set for a useful test is going to take 3 hours; that tends to limit how many of them I can schedule running. And, yes, I'm going to try and sneak in some time to test fast-path locking on one of these before they head into production. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] pgbench --unlogged-tables
On 07/22/2011 08:15 PM, David Fetter wrote: Do you have any theories as to how indexing on SSD speeds things up? IIRC you found only marginal benefit in putting WALs there. Are there cases that SSD helps more than others when it comes to indexing?

Yes, I've found a variety of workloads where using a SSD turns out to be slower than the old-school array of drives with a battery-backed write cache. Tiny commits are slower, sequential writes can easily be slower, and if there isn't a random I/O component to the job the SSD won't get any way to make up for that.

In the standard pgbench case, the heavy UPDATE traffic does a lot of random writes to the index blocks of the pgbench_accounts table. Even in cases where the whole index fits into RAM, having the indexes backed by a faster store can end up speeding those up, particularly at checkpoint time. And if you can't quite fit the whole index in RAM, but it does fit on the SSD, being able to shuffle it in/out of flash as needed to look up pointers to data blocks is a whole lot better than seeking around a regular drive. That case is where the biggest win seems to be.

I'd like to publish some hard numbers on all this, but have realized I need to relocate just the pgbench indexes to do a good simulation. And I'm getting tired of doing that manually. If I'm going to put time into testing this unlogged table variation that Robert has submitted, and I expect to, I'd like to have the "index on alternate tablespace" one available then too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] pgbench --unlogged-tables
That looks straightforward enough. The other thing I keep realizing would be useful recently is to allow specifying a different tablespace to switch to when creating all of the indexes. The old "data here, indexes on faster storage here" trick was already popular in some environments. But it's becoming a really big win for environments that put indexes on SSD, and being able to simulate that easily with pgbench would be nice. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Policy on pulling in code from other projects?
On 07/22/2011 02:09 PM, Joshua D. Drake wrote: Remember this library follows the RFC for URIs which is why I even brought it up. If it was just some random parser, I wouldn't even have bothered.

Do we care about the RFC for URIs? The main components of the RFC involve:

-Decoding escaped characters entered by percent-encoding
-Parsing the permissible IPv4 and IPv6 addresses
-Handling absolute vs. relative addresses. This is a lot of the spec, and it's not really relevant for PostgreSQL URIs
-Splitting the URI into its five main components

I know I've seen a URL-oriented %-encoding decoder as a PostgreSQL function already (I think Gabriele here wrote one). Surely useful IP address decoding functions are already around. And the splitting part seems like a fairly straightforward bit of regular expression work.

I think one crossover point where it's absolutely worth using the external library for this purpose is if you have an app that has to follow all of the rules around path names. If this project didn't already have libraries around for things like IP address parsing, using the library instead would also make more sense. The remaining chores if you don't worry about all the path name trivia, and know how to interpret an IP address, seem feasible to do directly. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
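For what it's worth, the "splitting" component really is straightforward regular expression work: RFC 3986 Appendix B gives a regex that breaks any URI into its five main components without validating any of them. A quick Python illustration (the dict keys are my naming, not from the RFC):

```python
import re

# The non-validating URI-splitting regular expression from RFC 3986,
# Appendix B.  Group numbers follow the RFC's own breakdown.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    """Split a URI into scheme/authority/path/query/fragment."""
    m = URI_RE.match(uri)
    return {
        'scheme':    m.group(2),
        'authority': m.group(4),
        'path':      m.group(5),
        'query':     m.group(7),
        'fragment':  m.group(9),
    }
```

Percent-decoding and IP address interpretation would still happen on the individual pieces afterward, which is where the existing in-tree helpers come in.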
Re: [HACKERS] pgbench--new transaction type
On 06/30/2011 12:13 AM, Jeff Janes wrote: One more thought I had, would it make sense to change this from the creation of a PL/pgSQL permanent function to instead use the recently added DO anonymous block syntax? I think that would be somewhat cleaner about leaving cruft behind in the database. But it would increase the overhead of each outer execution, and would also mean that it would not be backwards compatible to run against servers before 9.0

I think some measurement of the overhead difference would be needed to know for sure about the first part. I suspect that given the block size of 512 now being targeted, that would end up not mattering very much. pgbench's job is to generate a whole database full of cruft, so I can't say I'd find an argument from either side of that to be very compelling. I'm not real busy testing performance of PostgreSQL instances from before 9.0 anymore either, so whether this mode was compatible with them or not isn't very compelling either. Just a mixed bag all around in those areas.

I would say take a look at what the performance change looks like, and see if it turns out to make the patch to pgbench less obtrusive. The main objection against committing this code I can see is that it will further complicate pgbench for a purpose not many people care about. So if the DO version ends up with a smaller diff and less impact on the codebase, that would likely be a more substantial tie-breaker in its favor than any of these other arguments. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
Re: [HACKERS] Deriving release notes from git commit messages
On 06/24/2011 03:21 PM, Robert Haas wrote: If I were attacking this problem, I'd be inclined to make a web application that lists all the commits in a format roughly similar to the git API, and then lets you tag each commit with tags from some list (feature, bug-fix, revert, repair-of-previous-commit, etc.). Some of the tagging (e.g. docs-only) could probably even be done automatically. Then somebody could go through once a month and update all the tags. I'd be more willing to volunteer to do that than I would be to try to get the right metadata tag in every commit...

I tend not to think in terms of solutions that involve web applications because I never build them, but this seems like a useful approach to consider. Given that the list of tags is pretty static, I could see a table with a line for each commit, and a series of check boxes in columns for each tag next to it. Then you just click on each of the tags that apply to that line. Once that was done, increasing the amount of smarts that go into pre-populating which boxes are already filled in could then happen, with "docs only" being the easiest one to spot. A really smart implementation here might even eventually make a good guess for "bug fix" too, based on whether a bunch of similar commits happened to other branches around the same time. Everyone is getting better lately at noting the original SHA1 when fixing a mistake too, so being able to identify "repair" seems likely when that's observed.

This approach would pull the work from being at commit time, but it would still be easier to do incrementally and to distribute the work around than what's feasible right now. Release note prep takes critical project contributors a non-trivial amount of time; this would let anyone who felt like tagging things for an hour help with the early stages of that. And it would provide a functional source for the metadata I've been searching for too, to drive all this derived data about sponsors etc. 
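The automatic pre-tagging step might start out as simply as this sketch (Python for illustration only; the tag names match the list suggested above, but the heuristics and function are hypothetical):

```python
def auto_tag(changed_paths, subject):
    """First-pass guess at tags for one commit; a human corrects it later.

    changed_paths: file paths touched by the commit.
    subject: first line of the commit message.
    """
    tags = set()
    # Easiest case from the message: every touched file lives under doc/.
    if changed_paths and all(p.startswith('doc/') for p in changed_paths):
        tags.add('docs-only')
    subj = subject.lower()
    if subj.startswith('revert'):
        tags.add('revert')
    # Crude bug-fix guess; cross-branch commit matching would do better.
    if 'fix' in subj or 'typo' in subj:
        tags.add('bug-fix')
    return tags
```

Anything this gets wrong just shows up as a pre-checked box the monthly reviewer unticks, so the heuristics only have to be decent, not perfect.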
Disclaimer: as a person who does none of this work currently, my suggestions are strictly aimed at inspiring those who do in a direction that might make things easier for them. I can get the sponsor stuff I've volunteered to work on finished regardless. I just noticed what seems like it could be a good optimization over here while I was working on that. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Deriving release notes from git commit messages
On 06/24/2011 03:28 PM, Christopher Browne wrote: I expect that the correlation between commit and [various parties] is something that will need to take place outside git.

Agreed on everything except the "Author" information that is already being placed into each commit. The right data is already going into there; all it would take is some small amount of tagging to make it easier to extract programmatically.

The existing CommitFest data goes quite a long ways towards capturing interesting information (with the likely exception of sponsorship); what it's missing, at this point, is a capture of what commit or commits wound up drawing the proposed patch into the official code base.

The main problem with driving this from the CommitFest app is that not every feature ends up in there. Committers who commit their own work are one source of those. Commits for bug fixes that end up being notable enough to go into the release notes are another. I agree it would be nice if every entry marked as "Committed" in the CF app included a final link to the message ID of the commit closing it. But since I don't ever see that being the complete data set, I find it hard to justify enforcing that work. And the ability to operate programmatically on the output from "git log" is a slightly easier path to walk down than extracting the same from the CF app; you avoid one pre-processing step: extracting the right entries in the database to get a list of commit IDs. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Deriving release notes from git commit messages
On 06/24/2011 04:52 PM, Bruce Momjian wrote: That tagging is basically what I do on my first pass through the release notes. For the gory details: http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009

Excellent summary of the process I was trying to suggest might be improved; the two most relevant bits:

3. remove insignificant items: 2.7k items, 1 day
4. research and reword items: 1k items, 5 days

Some sort of tagging to identify feature changes should drive down the time spent on filtering insignificant items. And the person doing the commit already has the context you are acquiring later as "research" here. Would suggesting they try to write a short description at commit time drive it and the "reword" phase time down significantly? Can't say for sure, but I wanted to throw the idea out for consideration--particularly since solving it well ends up making some of this other derived data people would like to see a lot easier to generate too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Deriving release notes from git commit messages
On 06/24/2011 01:42 PM, Robert Haas wrote: I am disinclined to add a "feature" annotation. I think it is unlikely that will end up being any more useful than just extracting either the whole commit message or its first line.

I don't see any good way to extract the list of commits relevant to the release notes without something like it. Right now, you can't just mine every commit into the release notes without getting more noise than signal. Something that tags the ones that are adding new features or other notable updates, as opposed to bug fixes, doc updates, etc., would allow that separation.

I am not inclined to try to track sponsors in the commit message at all.

I was not suggesting that information be part of the commit. We've worked out a reasonable initial process for the people working on sponsored features to record that information completely outside of the commit or release notes data. It turns out that process would be easier to drive if it were easier to derive a feature->{commit,author} list though--and that would spit out for free with the rest of this. Improving the ability to do sponsor tracking is more of a helpful side-effect of something that's useful for other reasons rather than a direct goal. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
[HACKERS] Deriving release notes from git commit messages
There's been a steady flow of messages on pgsql-advocacy since last month (threads "Crediting sponsors in release notes?" and "Crediting reviewers & bug-reporters in the release notes") talking about who should receive credit, and how, for their work on PostgreSQL. That discussion seems to me to be heading in one inevitable direction: it's not going to be possible to make everyone happy unless there's a way to track all of these things for each feature added to PostgreSQL:

-Short description for the release notes
-Feature author(s)
-Reviewers and bug reporters
-Sponsors
-Main git commit adding the feature

Now, this is clearly the job for a tool, because the idea that any person capable of doing this work will actually do it is laughable--everyone qualified is too busy. It strikes me however that the current production of the release notes is itself a time-consuming and error-prone process that could also be improved by automation. I had an idea for pushing forward both of these at once.

Committers here are pretty good at writing terse but clear summaries of new features when they are added. These are generally distilled further for the release notes. It strikes me that a little decoration of commit messages might go a long way toward saving time in a few areas here. I'll pick a simple easy example I did to demonstrate; I wrote a small optimization to commit_delay committed at http://archives.postgresql.org/message-id/e1pqp72-0001us...@gemulon.postgresql.org

This made its way into the release notes like this:

Improve performance of commit_siblings (Greg Smith)
This allows the use of commit_siblings with less overhead.

What if the commit message had been decorated like this?

Feature: Improve performance of commit_siblings

Optimize commit_siblings in two ways to improve group commit. First, avoid scanning the whole ProcArray once we know there... 
With that simple addition, two things become possible:

-Generating a first draft of the release notes for a new version could turn into a script that parses the git commit logs, which has gotta save somebody a whole lot of time each release.
-All of these other ways to analyze the contributors would be much easier to maintain.

A little "Author:" decoration to that section of each commit would probably be welcome too. I'm sure someone is going to reply to this suggesting some git metadata is the right way to handle this, but that seems like overkill to me. I think there's enough committer time gained in faster release note generation for this decoration to pay back its overhead, which is important to me--I'd want a change here to net close to zero for committers. And the fact that it would also allow deriving all this other data makes it easier to drive the goals rising out of advocacy forward too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
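To make the payoff concrete, the note-drafting script could be little more than this sketch (Python for illustration; the "Feature:" convention is the proposal above, and the NUL-separated input format -- e.g. from `git log --format=%B%x00` -- is an assumption):

```python
def draft_release_notes(log_text):
    """Collect 'Feature:' lines from decorated commit message bodies.

    log_text holds one commit body per NUL-separated chunk; commits
    without the decoration (bug fixes, doc updates) are simply skipped.
    Returns the feature descriptions in commit order.
    """
    items = []
    for body in log_text.split('\x00'):
        for line in body.splitlines():
            if line.startswith('Feature:'):
                items.append(line[len('Feature:'):].strip())
    return items
```

Everything not decorated falls out of the draft automatically, which is exactly the noise-versus-signal separation the annotation is meant to buy.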
Re: [HACKERS] Patch - Debug builds without optimization
On 06/20/2011 01:34 PM, Tom Lane wrote: I was trying to illustrate how to have minimal turnaround time when testing a small code change. Rebuilding from scratch is slow enough that you lose focus while waiting. (Or I do, anyway.)

I just keep upgrading to the fastest CPU I can possibly justify to avoid losing focus; it goes fast with 8 cores. I was trying to demonstrate that peg makes this very high level now, and I was more jousting at the idea that everyone should bother to write their own individual reinstall script. The peg code makes it easy to assimilate whatever other neat optimization ideas one might come across. I just pushed an update out that absorbed this one, so now if you do:

stop
peg rebuild

It uses the install-bin trick you suggested. It even does a couple of sanity checks so that it will probably fall back to a regular build if it doesn't look like you have a good install and binary tree already. Maybe I'll make a "reinstall" alias that does this combination next.

I don't expect to improve your workflow. But people who haven't already invested a good chunk of work in automating things will probably take some time to catch up with where peg puts them on day one. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] [WIP] cache estimates, cache access cost
Kevin Grittner wrote: But it's not hard to imagine an application mix where this feature could cause a surprising ten-fold performance drop after someone does a table scan which could persist indefinitely. I'm not risking that in production without a clear mechanism to automatically recover from that sort of cache skew

The idea that any of this will run automatically is a dream at this point, so saying you want to automatically recover from problems with the mechanism that doesn't even exist yet is a bit premature. Some of the implementation ideas here might eventually lead to where real-time cache information is used, and that is where the really scary feedback loops you are right to be worried about come into play.

The idea for now is that you'll run this new type of ANALYZE CACHE operation manually, supervised and at a time where recent activity reflects the sort of workload you want to optimize for. And then you should review its results to make sure the conclusions it drew about your cache population aren't really strange. To help with that, I was thinking of writing a sanity check tool that showed how the cached percentages this discovers compare against the historical block hit percentages for the relation. An example of how values changed from what they were already set to after a second ANALYZE CACHE is probably useful too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
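That sanity check tool could start as little more than a comparison loop like this (Python sketch; the input format and the mismatch threshold are both hypothetical, since ANALYZE CACHE itself doesn't exist yet):

```python
def cache_sanity_report(stats, threshold=0.25):
    """Flag relations whose measured cache fraction looks suspicious.

    stats maps relation name -> (cached_fraction, historical_hit_fraction),
    where cached_fraction is what the hypothetical ANALYZE CACHE measured
    and historical_hit_fraction comes from accumulated block hit statistics.
    Relations where the two disagree by more than threshold get flagged
    for the DBA to eyeball before the numbers influence any plans.
    """
    suspect = []
    for rel, (cached, hit_rate) in sorted(stats.items()):
        if abs(cached - hit_rate) > threshold:
            suspect.append((rel, cached, hit_rate))
    return suspect
```

A table that happens to be fully cached right after a big sequential scan, but historically shows a low hit rate, is exactly the skew case this would surface.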
Re: [HACKERS] pgbench--new transaction type
Itagaki Takahiro wrote: Anyway, I'm not sure we need to include the query mode into the pgbench's codes. Instead, how about providing "a sample script" as a separate sql file? pgbench can execute any script files with -f option.

When you execute using "-f", it doesn't correctly detect database scale. Also, the really valuable thing here is seeing the higher selects/second number come out in the report. I just realized neither Jeff nor myself ever included an example of the output in the new mode, which helps explain some of why the patch is built the way it is:

$ pgbench -c 12 -j 4 -T 30 -P pgbench
plgsql function created.
starting vacuum...end.
transaction type: SELECT only via plpgsql
scaling factor: 100
query mode: simple
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 9342
tps = 311.056293 (including connections establishing)
tps = 311.117886 (excluding connections establishing)
selects per second = 159260.822100 (including connections establishing)
selects per second = 159292.357672 (excluding connections establishing)

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
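The derived rate is just the transaction rate scaled by the batch size: each "-P" transaction is one call to a function that runs 512 SELECTs, so 311.056293 tps works out to roughly the 159260.82 selects/second the report shows. As a quick check (Python; the constant matches the patch's batch size):

```python
PLPGSQL_BATCH = 512  # SELECTs executed inside the function per transaction

def selects_per_second(tps):
    # Each reported "transaction" in -P mode is one function call doing
    # PLPGSQL_BATCH SELECTs, so the select rate is just tps scaled up.
    return tps * PLPGSQL_BATCH
```

This also explains why the raw tps number in this mode looks so low compared to plain -S runs: the real work per transaction is 512 times larger.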
Re: [HACKERS] Patch - Debug builds without optimization
Greg Stark wrote: I've always wondered what other people do to iterate quickly.

I'd have bet money you had an elisp program for this by now! The peg utility script I use makes a reinstall as simple as:

stop
peg build

The UI for peg is still a little rough around switching to another project when using git, and the PGDATA handling could be better. Being able to give each patch I want to play with its own binary+data tree with a couple of simple commands is the time-consuming part of setup I wanted to automate completely, and for that it works great: https://github.com/gregs1104/peg -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/19/2011 06:15 PM, Kevin Grittner wrote: I think the point is that if, on a fresh system, the first access to a table is something which uses a table scan -- like select count(*) -- that all indexed access would then tend to be suppressed for that table. After all, for each individual query, selfishly looking at its own needs in isolation, it likely *would* be faster to use the cached heap data.

If those accesses can compete with other activity, such that the data really does stay in the cache rather than being evicted, then what's wrong with that? We regularly have people stop by asking for how to pin particular relations to the cache, to support exactly this sort of scenario. What I would expect on any mixed workload is that the table would slowly get holes shot in it, as individual sections were evicted for more popular index data. And eventually there'd be little enough left for it to win over an index scan. But if people keep using the copy of the table in memory instead, enough so that it never really falls out of cache, well that's not necessarily even a problem--it could be considered a solution for some.

The possibility that people can fit their entire table into RAM and it never leaves there is turning downright probable in some use cases now. A good example is cloud instances using EC2, where people often architect their systems such that the data set put onto any one node fits into RAM. As soon as that's not true you suffer too much from disk issues, so breaking the databases into RAM-sized pieces turns out to be very good practice. It's possible to tune fairly well for this case right now--just make the page costs all low. The harder case that I see a lot is where all the hot data fits into cache, but there's a table or two of history/archives that don't. And that would be easier to do the right thing with given this bit of "what's in the cache?" percentages. 
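Making "the page costs all low" for the fully-cached case today looks something like this postgresql.conf fragment (the exact values are illustrative, not recommendations; the parameters themselves are the standard planner cost settings):

```ini
# Telling the planner most pages are already in RAM
seq_page_cost = 0.1             # default 1.0; cheap when nothing hits disk
random_page_cost = 0.1          # default 4.0; no seek penalty either
effective_cache_size = '96GB'   # e.g. most of the memory on a 128GB box
```

The point of the message is that a single global setting like this can't distinguish the hot, cached tables from the cold history/archive ones, which is what per-relation cache knowledge would fix.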
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] pgbench--new transaction type
I applied Jeff's patch but changed this to address concerns about the program getting stuck running for too long in the function: #define plpgsql_loops 512 This would be better named as "plpgsql_batch_size" or something similar instead; the current name suggests it's how many loops to run, which is confusing. My main performance concern here was whether this change really matters so much once a larger number of clients is involved. Some of the other things you can do to optimize single-client performance aren't as useful with lots of them. Here's how the improvements in this mode worked for me on a server with 4 Hyper-Threaded cores (i870); shared_buffers=256MB, scale=100:
1 client:
-S: 11533
-S -M prepared: 19498
-P: 49547
12 clients, 4 workers:
-S: 56052
-S -M prepared: 82043
-P: 159443
96 clients, 8 workers:
-S: 49940
-S -M prepared: 76099
-P: 137942
I think this is a really nice new workload to demonstrate. One of the things we tell people is that code works much faster when moved server-side, but how much faster isn't always easy to show. Having this mode available lets them see how dramatic that can be quite easily. I know I'd like to be able to run performance tests for clients of new hardware using PostgreSQL and tell them something like this: "With simple clients executing a statement at a time, this server reaches 56K SELECTs/second. But using server-side functions to execute them in larger batches it can do 159K". The value this provides as an alternate source of benchmark load generation, with a very different profile for how it exercises the server, is good too. Things to fix in the patch before it would be a commit candidate:
-Adjust the loop size/name, per above
-Reformat some of the longer lines to try and respect the implied right margin in the code formatting
-Don't include the "plpgsql function created." line unless in debugging mode.
-Add the docs.
Focus on how this measures how fast the database can execute SELECT statements using server-side code. An explanation that the "transaction" block size is 512 is important to share. It also needs a warning that time-based runs ("-T") may have to wait for a block to finish and go beyond their normally expected end time.
-The word "via" in the "transaction type" output description is probably not the best choice. Changing to "SELECT only using PL/pgSQL" would translate better, and follow the standard capitalization of that language's name.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/19/2011 09:38 AM, Greg Stark wrote: There's another problem which I haven't seen mentioned. Because the access method will affect the cache there's the possibility of feedback loops. e.g. A freshly loaded system prefers sequential scans for a given table because without the cache the seeks of random reads are too expensive... Not sure if it's been mentioned in this thread yet, but the feedback issue has popped up in regard to this area plenty of times. I think everyone who's producing regular input into this is aware of it, even if it's not mentioned regularly. I'm not too concerned about the specific case you warned about, because I don't see how sequential scan vs. index costing will be any different on a fresh system than it is now. But there are plenty of cases like it to be mapped out here, and many are not solvable--they're just something that needs to be documented as a risk. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
On 06/16/2011 05:27 PM, Bruce Momjian wrote: Greg Smith wrote: -It is still useful to set current_query to descriptive text in the cases where the transaction is <IDLE> etc. Uh, if we are going to do that, why not just add the boolean columns to the existing view? Clearly renaming procpid isn't worth creating another view. I'm not completely set on this either way; that's why I suggested a study that digs into typical monitoring system queries would be useful. Even the current view is pushing the limits for how much you can put into something that intends to be human-readable, though. Adding a new pile of columns to it has some downsides there. I hadn't ever tried to write down everything I'd like to see changed here until this week, so there may be further column churn that justifies a new view too. I think the whole idea needs to get chewed on a bit more. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
On 06/15/2011 12:41 PM, Robert Haas wrote: But I will note that we had better be darn sure to make all the changes we want to make in one go, because I dowanna have to create pg_sessions2 (or pg_tessions?) in a year or three. I just added a new section to the TODO to start collecting up some of these related ideas into one place: http://wiki.postgresql.org/wiki/Todo#Monitoring so we might try to get as many as possible all in one go. The other item on there related to pg_stat_activity that might impact this design was adding a column for tracking progress of commands like CREATE INDEX and VACUUM (I updated to note CLUSTER falls into that category too). While query progress will always be a hard problem, adding a field to store some sort of progress indicator might be useful even if it only worked on these two initially. Anyway, topic for another time. The only other item related to this view on the TODO was "Have pg_stat_activity display query strings in the correct client encoding". That might be worthwhile to bundle into this rework, but it doesn't seem something that impacts the UI such that it must be considered early. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Patch - Debug builds without optimization
On 06/16/2011 10:10 AM, Tom Lane wrote: I could see providing some other nonstandard configure switch that changed the default -O level ... but realistically, would that do anything that you couldn't already do by setting CFLAGS, ie ./configure CFLAGS="-O0 -g" I think a small discussion of the issue Radek ran into is appropriate to put somewhere, with this example. The install procedure section of the docs already includes a CFLAGS example: ./configure CC=/opt/bin/gcc CFLAGS='-O2 -pipe' There is also a section talking about setting options like --enable-cassert in the Developer's FAQ. Looking at all the info out there about developer/debug builds, it's really kind of sketchy and scattered though; there is no one place that pulls all the most common things people need together into one resource. What seems like the ideal solution here is to add a new section to the install procedure with brief coverage of this entire area. Here's a prototype of text that might go there:

= Installation for development and debugging =

When modifying the PostgreSQL source code, or when trying to find the source of a bug in the program, it may be helpful to build the program in a way that makes this process easier. There are build-time-only changes that enable better error checking and debugging, including:

Pass --enable-cassert to configure. This can make bugs more visible, because they cause operations to abort with a clear error. That makes some types of debugging much easier. This is risky on a production server, as described in the documentation for this parameter.

Pass --enable-debug to configure. This provides better information about what the server is doing when looking at it using a debugger. It's less risky to a production server than enabling assertions, and it normally has less of a performance impact too. See its documentation for more details.

Disable compiler optimization.
When using a debugger to trace into the source code of the server, steps may be optimized away by the normal build process. In some situations --enable-debug will disable such optimization, but this is not always the case. Specifically disabling optimization is possible with many compilers by setting the compiler flags when configuring the source code build, such as: ./configure CFLAGS="-O0 -g" This example for the gcc compiler disables optimizations, and tells the compiler to provide extra debugging information most useful with the gdb debugger. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
Since the CF is upon us and discussion is settling, let's see if I can wrap this bikeshedding up into a more concrete proposal that someone can return to later. The ideas floating around have gelled into:
-Add a new pg_stat_sessions function that is implemented similarly to pg_stat_activity. For efficiency and simplicity's sake, internally this will use the same sort of SRF interface that pg_stat_get_activity does inside src/backend/utils/adt/pgstatfuncs.c. There will need to be some refactoring here to reduce code duplication between that and the new function (which will presumably be named pg_stat_get_sessions).
-The process ID field here will be named "pid" to match other system views, rather than the current "procpid".
-State information such as whether the session is idle, idle in a transaction, or has a query visible to this backend will be presented as booleans similar to the current waiting field. A possible additional state to expose is the concept of "active", which ends up being derived using logic like "visible && !idle && !idle_transaction && !waiting" in some monitoring systems.
-A case could be made for making some of these state fields null, instead of true or false, in situations where the session is not visible. If you don't have rights to see the connection activity, setting idle, idle_transaction, and active all to null may be the right thing to do. More future bikeshedding is likely on this part, once an initial patch is ready for testing. I'd want to get some specific tests against the common monitoring goals of tools like check_postgres and the Munin plug-in to see which implementation makes more sense for them as input on that.
-It is still useful to set current_query to descriptive text in the cases where the transaction is <IDLE> etc. That text is not ambiguous with a real query, it is useful for a human-readable view, and it improves the potential for pg_stat_sessions to fully replace a deprecated pg_stat_activity (instead of just co-existing with it).
That the query text is overloaded with this information seems agreed to be a good thing; it's just that filtering on the state information there should not require parsing it. The additional booleans will handle that. If idle sessions can be filtered using "WHERE NOT idle", whether the current_query for them reads "<IDLE>" or is null won't matter to typical monitoring use. Given no strong preference there, using "<IDLE>" is both familiar and more human-readable. I'll go add this as a TODO now. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
On 06/15/2011 04:13 AM, Rainer Pruy wrote: I much prefer reading an "<IDLE> in transaction" on a quick glance over having to search a column and recognize a "t" from an "f" to find out whether there is a transaction pending or not. This is a fair observation. If we provide a second view here that reorganizes the data toward something more appropriate for monitoring systems to process, you may be right that the result will be a step backwards for making it human-readable. They may end up being similar, co-existing views aimed at different uses, rather than one clearly replacing the other one day. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
Here's the sort of thing every person who writes a monitoring tool involving pg_stat_activity goes through:
1) Hurray! I know how to see what the database is doing now! Let me try counting all the connections so I can finally figure out what to set [max_connections | work_mem | other] to.
2) Wait, some of these can be "<IDLE>". That's not documented. I'll have to special case them because they don't really matter for my computation.
3) Seriously, there's another state for "<IDLE> in transaction"? Just how many of these special values are there? [There's actually one more surprise after this]
The whole thing is enormously frustrating, and it's an advocacy problem--it leads people just starting to get serious about PostgreSQL to lower their opinion of its suitability for their business. If this is what's included for activity monitoring, and it's this terrible, it suggests people must not have very high requirements for that. And what you end up with to make it better is not just another few keystrokes. Here, as a common example I re-use a lot, is a decoder inspired by Munin's connection count monitoring graph:
SELECT waiting,
  CASE WHEN current_query='<IDLE>' THEN true ELSE false END AS idle,
  CASE WHEN current_query='<IDLE> in transaction' THEN true ELSE false END AS idletransaction,
  CASE WHEN current_query='<insufficient privilege>' THEN false ELSE true END AS visible,
  CASE WHEN NOT waiting AND current_query NOT IN ('<IDLE>', '<IDLE> in transaction', '<insufficient privilege>') THEN true ELSE false END AS active,
  procpid, current_query
FROM pg_stat_activity WHERE procpid != pg_backend_pid();
What percentage of people do you think get this right? Now, what does that number go to if these states were all obviously exposed booleans? As far as I'm concerned, this design is fundamentally flawed as currently delivered, so the concept of "breaking" it doesn't really make sense.
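To make the decoding burden concrete, here is a rough Python rendering of that same logic (an illustrative sketch only, using the "<IDLE>"-style sentinel strings pg_stat_activity reported at the time):

```python
# Sketch: the same state decoding as the SQL above, as a plain function.
# The sentinel strings are the special current_query values the view
# used before any reworking; every monitoring tool ends up reinventing
# some variant of this table of special cases.
def decode_state(current_query, waiting):
    idle = current_query == "<IDLE>"
    idle_transaction = current_query == "<IDLE> in transaction"
    visible = current_query != "<insufficient privilege>"
    active = (not waiting and
              current_query not in ("<IDLE>",
                                    "<IDLE> in transaction",
                                    "<insufficient privilege>"))
    return {"idle": idle, "idletransaction": idle_transaction,
            "visible": visible, "active": active}

# An ordinary running query is the only state that counts as active:
print(decode_state("SELECT 1", waiting=False)["active"])  # True
```

With exposed booleans, each of these derived values would collapse to a simple column reference instead.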
The fact that you can only figure all this decoding magic out through extensive trial and error, or reading the source code to [the database | another monitoring tool], is crazy. It's a much bigger problem than the fact that the pid column is misnamed, and way up on my list of things I'm just really tired of doing. Yes, we could just document all these mystery states to help, but they'd still be terrible. This is a database; let's expose the data in a way that it's easy to slice yourself using a database query. And if we're going to fix that--which unfortunately will be breaking it relative to those already using the current format--I figure why not bundle the procpid fix into that while we're at it. It's even possible to argue that breaking that small thing will draw useful attention to the improvements in other parts of the view. Having your monitoring query break after a version upgrade is no fun. But if investigating why reveals new stuff you didn't notice in the release notes, the changes become more discoverable, albeit in a somewhat perverse way. Putting on my stability hat instead of my "make it right" one, maybe this really makes sense to expose as a view with a whole new name. Make this new one pg_activity (there's no stats here anyway), keep the old one around as pg_stat_activity for a few releases until everyone has converted to the new one. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
On 06/14/2011 06:00 PM, Tom Lane wrote: As far as Greg's proposal is concerned, I don't see how a proposed addition of two columns would justify renaming an existing column. Additions should not break any sanely-implemented application, but renamings certainly will. It's not so much justification as something that makes the inevitable complaints easier to stomach, in terms of not leaving a really bad taste in the user's mouth. My thinking is that if we're going to mess with pg_stat_activity in a way that breaks something, I'd like to see it completely refactored for better usability in the process. If code breaks and the resulting investigation by the admin highlights something new, that offsets some of the bad user experience resulting from the breakage. Also, I haven't fully worked out whether it makes sense to really change what current_query means if the idle/transaction component of it gets moved to another column. Would it be better to set current_query to null if you are idle, rather than the way it's currently overloaded with text in that case? I don't like the way this view works at all, but I'm not sure of the best way to change it. Just changing procpid wouldn't be the only thing on the list though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/14/2011 07:08 PM, Tom Lane wrote: I concur with Robert's desire to not push experimental code into the main repository, but we need to have *some* way of working with it. Maybe a separate repo where experimental branches could hang out would be helpful? Well, this one is sitting around in branches at both git.postgresql.org and github so far, both being updated periodically. Maybe there's some value around an official experimental repository too, but I thought that was the idea of individual people having their own directories on git.postgresql.org. Do we need something fancier than that? It would be nice, but there seems to be little return on investment in improving that, relative to what you can do easily enough now. The idea David Fetter has been advocating of having a "bit rot" farm to help detect when the experimental branches drift too far out of date tries to make that concept really formal. I like that idea, too, but find it hard to marshal enough resources to do something about it. The current status quo isn't that terrible; noticing bit rot when it's relevant isn't that hard to do. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/14/2011 01:16 PM, Robert Haas wrote: But there's no reason that code (which may or may not eventually prove useful) has to be incorporated into the main tree. We don't commit code so people can go benchmark it; we ask for the benchmarking to be done first, and then if the results are favorable, we commit the code. Who said anything about this being a commit candidate? The "WIP" in the subject says it's not intended to be. The community asks people to submit design ideas early so that ideas around them can be explored publicly. One of the things that needs to be explored, and that could use some community feedback, is exactly how this should be benchmarked in the first place. This topic--planning based on cached percentage--keeps coming up, but hasn't gone very far as an abstract discussion. Having a patch to test lets it turn to a concrete one. Note that I already listed myself as the reviewer here, so it's not even like this is asking explicitly for a community volunteer to help. Would you like us to research this privately and then dump a giant patch that is commit candidate quality on everyone six months from now, without anyone else getting input to the process, or would you like the work to happen here? I recommended Cédric not ever bother soliciting ideas early, because I didn't want to get into this sort of debate. I avoid sending anything here unless I already have a strong idea about the solution, because it's hard to keep criticism at bay even with that. He was more optimistic about working within the community contribution guidelines and decided to send this over early instead. If you feel this is too rough to even discuss, I'll mark it returned with feedback and we'll go develop this ourselves. 
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
On 06/14/2011 02:20 PM, Kevin Grittner wrote: Just on our Wiki pages we have some queries available for copy/paste which would need multiple versions while both column names were in supported versions of the software:
http://wiki.postgresql.org/wiki/Lock_dependency_information
http://wiki.postgresql.org/wiki/Lock_Monitoring
http://wiki.postgresql.org/wiki/Backend_killer_function
...and most of these would actually be simplified if they could just JOIN on pid instead of needing this common idiom: join pg_catalog.pg_stat_activity ka on kl.pid = ka.procpid Yes, there are a lot of these floating around. I'd bet that in an hour of research I could find 95% of them though, and make sure they were all updated in advance of the release. (I already did most of this search as part of stealing every good idea I could find in this area for my book.) I think that's consistent with the "save up our breaking changes to do them all at once" approach. I don't actually buy into this whole idea at all, though. We already have this big wall at 8.3, because changes made in that release are too big for people on the earlier side to upgrade past. I'd rather see a series of smaller changes in each release, even if they are disruptive, so that no one version turns into a frustrating hurdle seen as impossible to clear. This adjustment is a perfect candidate for putting into 9.2 to me, because I'd rather reduce max(breakage) across releases than intentionally increase it by bundling changes into larger clumps. For me, the litmus test is whether the change provides enough improvement that it outweighs the disruption when the user runs into it. This is why I suggested a specific, useful, and commonly requested (to me at least) change to pg_stat_activity go along with this. If people discover their existing pg_stat_activity tools break, presumably they're going to look at the view again to see what changed. When they do that, I don't want the reaction to be "why was this random change made?"
I want it to be "look, there are useful new fields in here; let me see if I can use them here too". That's how you make people tolerate disruption in upgrades: if, when forced to fix something, they also see a clear improvement in the same spot, the experience is much more pleasant because they get something new out of it. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] procpid?
On 06/14/2011 11:44 AM, Jim Nasby wrote: Wouldn't it be better still to have both the new and old columns available for a while? That would produce the minimum amount of disruption to tools, etc. Doing this presumes the existence of a large number of tools where the author is unlikely to be keeping up with PostgreSQL development. I don't believe that theorized set of users actually exists. There are people who use pg_stat_activity simply, and there are tool authors who are heavily involved enough that they will see a change here coming far enough in advance to adopt it without disruption. If there's a large base of "casual" tool authors, who wrote something using pg_stat_activity once and will never update it again, I don't know where they are. Anyway, I want a larger change to pg_stat_activity than this one, and I would just roll fixing this column name into that more disruptive and positive change. Right now the biggest problem with this view is that you have to parse the text of the query to figure out what state the connection is in. This is silly; there should be boolean values exposed for "idle" and "in transaction". I want to be able to write things like this:
SELECT idle,in_trans,count(*) FROM pg_stat_activity GROUP BY idle,in_trans;
SELECT min(backend_start) FROM pg_stat_activity WHERE idle;
Right now the standard approach to this is to turn current_query into a derived state value using CASE statements. It's quite unfriendly, and a bigger problem than this procpid mismatch. Fix that whole mess at once, and now you've got something useful enough to justify breaking tools. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/14/2011 11:04 AM, Robert Haas wrote: Even if the data were accurate and did not cause plan instability, we have no evidence that using it will improve real-world performance. That's the dependency Cédric has provided us a way to finally make progress on. Everyone says there's no evidence that this whole approach will improve performance. But we can't collect such data, to prove or disprove that it helps, without a proof-of-concept patch that implements *something*. You may not like the particular way the data is collected here, but it's a working implementation that may be useful for some people. I'll take "data collected at ANALYZE time" as a completely reasonable way to populate the new structures with realistic enough test data to use initially. Surely at least one other way to populate the statistics, and possibly multiple other ways that the user selects, will be needed eventually. I commented a while ago on this thread: every one of these discussions always gets dragged into the details of how the cache statistics data will be collected, and rejects whatever is suggested as not good enough. Until that stops, no progress will ever get made on the higher level details. By its nature, developing toward integrating cached percentages is going to lurch forward on both "collecting the cache data" and "using the cache knowledge in queries" fronts almost independently. This is not a commit candidate; it's the first useful proof-of-concept step for something we keep talking about but never really doing. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] lazy vxid locks, v1
On 06/13/2011 07:55 AM, Stefan Kaltenbrunner wrote: all those tests are done with pgbench running on the same box - which has a noticeable impact on the results because pgbench is using ~1 core of CPU resources per 8 cores of the backend being tested - though I don't think it causes any changes in the results that would show the performance behaviour in a different light. Yeah, this used to make a much bigger difference, but nowadays it's not so important. So long as you have enough cores that you can spare a chunk of them to drive the test with, and you crank "-j" up to a lot, there doesn't seem to be much of an advantage to moving the clients to a remote system now. You end up trading off CPU time for everything going through the network stack, which adds yet another source of uncertainty to the whole thing anyway. I'm glad to see so many people have jumped onto doing these SELECT-only tests now. The performance farm idea I've been working on runs a test just like what's proven useful here. I'd suggested that because it's been really sensitive to changes in locking and buffer management for me. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)
On 06/13/2011 08:27 PM, Jeff Janes wrote: pgbench sends each query (per connection) and waits for the reply before sending another. Do we know whether sysbench does that, or if it just stuffs the kernel's IPC buffer full of queries without synchronously waiting for individual replies? sysbench creates a thread for each client and lets them go at things at whatever speed they can handle. You have to set up pgbench with a worker per core to get them even close to level footing. And even in that case, sysbench has a significant advantage, because it's got the commands it runs more or less hard-coded in the program. pgbench is constantly parsing things in its internal command language and then turning them into SQL requests. That's flexible and allows it to be used for some neat custom things, but it uses a lot more resources to drive the same number of clients. I can't get sysbench to "make" for me, or I'd strace in single client mode and see what kind of messages are going back and forth. If you're using a sysbench tarball, no surprise; it doesn't build on lots of platforms now. If you grab http://projects.2ndquadrant.it/sites/default/files/bottom-up-benchmarking.pdf it has my sysbench notes starting on page 34. I had to check out the latest version from their development repo to get it to compile on any recent system. The attached wrapper script may be helpful to you as well to help get over the learning curve for how to run the program; it iterates sysbench over a number of database sizes and thread counts running the complicated-to-set-up OLTP test.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#!/bin/bash
SB="$HOME/0.4/sysbench/sysbench"
DB="--pgsql-user=postgres --pgsql-password=password --pgsql-db=sysbench --pgsql-host=localhost --db-driver=pgsql"
#DB="--mysql-user=root --db-driver=mysql --mysql-table-engine=innodb --mysql-db=sysbench"
#THREADS="1 2 3 4 5 6 7 8 9 10 11 12 16 24 32 48 64 96"
THREADS="1"
TIME=10
TEST_PARAM="--oltp-read-only=off --oltp-test-mode=complex"
#TEST_PARAM="--oltp-read-only=on --oltp-test-mode=simple"
SIZES="1"
#SIZES="1 10 100 1000 5000 1 5 10"
# In complex, non read-only mode, there will be duplicate key issues, and
# threads will fail with deadlock--causing no value to be returned.
#TEST_PARAM="--oltp-test-mode=complex"
for s in $SIZES
do
  SIZE_PARAM="--oltp-table-size=$s"
  # Just in case!
  $SB $DB --test=oltp cleanup
  $SB $DB $SIZE_PARAM --test=oltp prepare
  for t in $THREADS
  do
    echo -n $t threads:
    $SB $DB --test=oltp $TEST_PARAM $SIZE_PARAM --init-rng --max-requests=0 --max-time=$TIME --num-threads=$t run 2>&1 # | grep "read/write requests"
  done
  $SB $DB --test=oltp cleanup
done
Re: [HACKERS] pgbench--new transaction type
On 06/11/2011 03:21 PM, Jeff Janes wrote:
> I wouldn't expect IPC chatter to show up in profiling, because it costs wall time, but not CPU time. The time spent might be attributed to the kernel, or to pgbench, or to nothing at all.

Profilers aren't necessarily just accumulating raw CPU time, though. If the approach includes periodically sampling "what code is active right now?", you may be able to separate this out even though it's not consuming CPU time in the normal fashion. I think you might just need a better profiler. Anyway, the sort of breakdown this helps produce is valuable regardless. I highlighted the statement you made because I reflexively challenge theorizing about code hotspots on general principle. The measurement-based breakdown you provided was more what I look for.

> But there is no repository of such "useful for developer testing" patches, other than this mailing list. That being the case, would it even be worthwhile to fix it up more and submit it to commitfest?

The activity around profiling pgbench and trying to crack one of the bottlenecks has been picking up a lot of momentum lately, and if we're lucky that will continue throughout 9.2 development. As such, now seems as good a time as any to consider adding something like this. We may end up reskinning lots of pgbench before this is over. I added your patch to the CommitFest.

> So at a loop of 512, you would have overhead of 59.0/512=0.115 out of total time of 17.4, or 0.7% overhead. So that should be large enough.

That I think is workable. If the split were a compile-time constant fixed at 512 unless you specifically changed it, even the worst typical cases shouldn't suffer much from batch overhang. If you create a database so large that you only get 50 TPS, which is unusual but not that rare, a 512-execution batch size would mean your "-T" set end time might lag 10 seconds behind its original plan.
Unlike the 10K you started with, that is reasonable; it does sound like the sweet spot where the overhead is low but the time overrun isn't too terrible.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
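For readers following the arithmetic in the exchange above, a quick sketch of the quoted numbers (units as given in the thread; awk is used only to get floating-point math in shell):

```shell
# 59.0 units of per-batch overhead spread across 512 transactions,
# as a fraction of the quoted 17.4-unit total per transaction, plus
# the worst-case -T overrun for a 512-transaction batch at 50 TPS.
PER_TXN=$(awk 'BEGIN { printf "%.3f", 59.0 / 512 }')
PCT=$(awk 'BEGIN { printf "%.1f", 59.0 / 512 / 17.4 * 100 }')
OVERRUN=$(awk 'BEGIN { printf "%.1f", 512 / 50 }')
echo "cost=$PER_TXN overhead=${PCT}% overrun=${OVERRUN}s"
```

This reproduces the 0.7% overhead figure and the roughly 10-second worst-case overrun past the "-T" deadline.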
[HACKERS] Formatting curmudgeon HOWTO
With another round of GSoC submissions approaching, I went looking around for better guidance on how to follow terse submission guidelines like "blend in with the surrounding code" or "remove spurious whitespace". And I didn't find any. Many mature open-source projects say things like that, but I haven't been able to find a tutorial on just what that means or how to do it.

Now we have http://wiki.postgresql.org/wiki/Creating_Clean_Patches to fill that role, which fits in between "Working with Git" and "Submitting a Patch" as a fairly detailed walkthrough. That should be an easier URL to point people who submit malformed patches toward than the documentation we've had before.

Advocacy aside: this page might be a good one to submit to sites that publish open-source news. It's fairly generic advice, useful but not widely documented, and it reflects well on our development practice. I'm trying to reverse the perception we sometimes hear about that submitting patches to PostgreSQL is unreasonably difficult. Seeing how much easier a well-formatted patch is to read goes a long way toward making people understand why the project has high expectations for formatting work. It's not pedantic; it's functionally better.

I threw it onto reddit as a first spot to popularize it: http://www.reddit.com/r/technology/comments/hy0aq/creating_clean_patches_with_git_diff/

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
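As a small illustration of the "remove spurious whitespace" advice, git's built-in `git diff --check` flags trailing whitespace before a patch ever leaves your tree. The throwaway repository below is purely for demonstration; the file name and contents are made up:

```shell
# Build a scratch repo, introduce a trailing space, and let git catch it.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
printf 'int main(void) { return 0; }\n' > demo.c
git add demo.c
git -c user.email=demo@example.com -c user.name=demo commit -qm base
printf 'int main(void) { return 0; } \n' > demo.c   # trailing space added
if git diff --check > /dev/null; then
  RESULT=clean
else
  RESULT=whitespace-error    # --check exits nonzero when it finds problems
fi
echo "$RESULT"
```

Running a check like this before generating a patch catches exactly the class of accidental diff noise the wiki page warns about.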