Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Wed, Oct 20, 2010 at 6:39 AM, Terry Laurenzo t...@laurenzo.org wrote:
> The answer may be to have both a jsontext and jsonbinary type as each will be optimized for a different case.

I want to choose one format for JSON rather than having two types. It should be more efficient than the other format in many cases, and not so bad in the remaining cases. I think the discussion started from the point that what BSON can represent is a subset of what JSON can represent. So, any binary format would be acceptable as long as it has enough representational power compared with the text format. For example, a sequence of byte-length-prefixed text chunks could reduce the CPU cycles spent on reparsing, and would hold all of the input as-is except for ignorable whitespace. It is not BSON, but it is a binary format.

Or, if we want to store numbers in binary form, I think the format should be the numeric type in Postgres. It has high precision, and we don't need any higher precision than that to compare two numbers in the end. Even if we use the BSON format, we would need to extend it to store all numeric values, whose precision goes up to 10^1000.

-- Itagaki Takahiro
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Good points. In addition, any binary format needs to support object property traversal without having to do a deep scan of all descendants. BSON handles this with explicit lengths for document types (objects and arrays) so that entire parts of the tree can be skipped during sibling traversal.

It would also be nice to make sure that we store fully parsed strings. There are lots of escape options that simply do not need to be preserved (C escapes, Unicode, octal, hex sequences) and that hinder the ability to do direct comparisons. BSON also makes a small extra effort to ensure that object property names are encoded in a way that is easily comparable, as these will be the most frequently compared items.

I'm still going to write up a proposed grammar that takes these items into account -- just ran out of time tonight.

Terry

On Wed, Oct 20, 2010 at 12:46 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote:
> [...]
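For illustration, here is a minimal C sketch of the skip-ahead property Terry describes: each node carries an explicit payload length, so a reader can hop to the next sibling without visiting descendants. Every name and the layout itself are invented for this example; it is neither BSON nor any format actually proposed in this thread.

#include <stdint.h>
#include <string.h>

typedef enum
{
	JB_NULL, JB_BOOL, JB_NUMERIC, JB_STRING, JB_ARRAY, JB_OBJECT
} JsonNodeType;

typedef struct
{
	uint8_t		type;			/* a JsonNodeType value */
	uint32_t	payload_len;	/* bytes of payload, including all children */
} JsonNodeHeader;

/*
 * Step from one node to its next sibling in constant time, without
 * scanning the node's descendants.
 */
static const char *
json_next_sibling(const char *node)
{
	JsonNodeHeader hdr;

	memcpy(&hdr, node, sizeof(hdr));	/* header may be stored unaligned */
	return node + sizeof(hdr) + hdr.payload_len;
}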
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes:
> Robert Haas robertmh...@gmail.com writes:
>> It seems good to do this in the normal case, but (1) if client_min_messages was already set higher than WARNING, we probably should not lower it and (2) we might want to have a way to lower it for troubleshooting purposes.
>
> I think the standard way of troubleshooting would be to run the extension's script by hand, no? So while I agree with (1), I'm not sure we need to sweat about (2).

Will go and fix (1), but like Tom I think (2) is handled by the extension's author running pg_execute_from_file() rather than CREATE EXTENSION for debugging purposes: then the settings are not changed.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] pg_rawdump
Tom Lane wrote:
> Stephen R. van den Berg s...@cuci.nl writes:
>> In order to simplify recovery at this point (enormously), it would have been very helpful (at almost negligible cost) to have the name of the table, the names of the columns, and the types of the columns available. Why don't we insert that data into the first page of a regular table file, in the special data area?
>
> (1) it wouldn't necessarily fit

Three viable options (pick any one, depending on other criteria):
a. If it doesn't fit at first, drop the column names and try again.
b. If necessary, extend it into the special data area of the following page (repeat until everything is stored).
c. Simply put in what fits and discard the rest.

Please note that the information in there is purely informational and of a best-effort nature. It is not required for regular operation. It should have close to no performance impact in the normal use case. It is meant to help with forensics if the catalog is damaged or lost, and you still want to attempt to recover most of the data contained in this table file.

> (2) what are you going to do to maintain it during ALTER TABLE?

Simply pick a point in time where it will be blatantly overwritten: either at reception of the command, or at commit time, whatever is more convenient to implement. It's a best-effort service; it is not a problem if we get it wrong sometimes due to wildly overlapping alter table/commit/rollback sequences.

> (3) if there are any custom types involved, you're still lost.

Yes, this is not a complete solution; it's a valid attempt at making forensics a lot easier in the common case at virtually no cost to the running system.
--
Stephen.
Life is that brief interlude between nothingness and eternity.
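As a concrete illustration of what "table name, column names, column types" in the special area might look like, here is a hypothetical C layout. Every identifier in it is invented for the example; nothing like this exists in PostgreSQL, and a real design would at least need versioning on top.

#include <stdint.h>

#ifndef NAMEDATALEN
#define NAMEDATALEN 64				/* the usual PostgreSQL default */
#endif

typedef struct ForensicColumn
{
	char	typname[NAMEDATALEN];	/* type name, e.g. "int4" or "text" */
	char	attname[NAMEDATALEN];	/* column name; dropped first (option a) */
} ForensicColumn;

typedef struct ForensicPageSpecial
{
	char		relname[NAMEDATALEN];	/* table name */
	uint16_t	ncolumns;				/* entries that actually fit (option c) */
	ForensicColumn cols[];				/* truncated to the space available */
} ForensicPageSpecial;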
[HACKERS] Installer Fix on some Windows 7 64-bit Systems
On some Windows machines the command processor is not invoked properly when running the WScript.Run method. The same PostgreSQL 9.0.1-1 one-click installer that worked perfectly fine on one Windows 7 64-bit machine did not work as expected on another machine with the same OS. Adding the command processor explicitly (%comspec% /c) to the first parameter of all objShell.Run calls in the installer WScript files below resolves this issue!

For the installruntimes.vbs file, the following modification was used:

iRet = objShell.Run("%comspec% /c " & strPackage & " /q:a /c:msiexec /i vcredist.msi /qb!", 0, True)

For initcluster.vbs, loadmodules.vbs and startupcfg.vbs, the following modification was used:

DoCmd = objShell.Run("%comspec% /c " & objTempFolder.Path & "\" & strBatchFile, 0, True)
Re: [HACKERS] pg_rawdump
Roberto Mello wrote:
> On Tue, Oct 19, 2010 at 6:13 PM, Stephen R. van den Berg s...@cuci.nl wrote:
>> Greg Stark wrote:
>>> premise this on the idea that you've lost everything in the catalog but not the data in other tables. Which seems like a narrow use case.
>>
>> It happens, more often than you'd think. My client had it, I've seen numerous Google hits which show the same.
>
> It happened to us recently when a customer had disk issues, and we

It usually happens when there are disk issues; that's exactly what it is for.

> A tool like Stephen is proposing would most likely have helped us recover at least some or most of the data, I would hope.

Well, because the customer could recreate (within reason) the original table definitions, we were able to recover all of his data (12 tables, including some toasted/compressed). It's just that matching table and file, and subsequently figuring out some missing columns which may have been added/removed later, can be rather time-consuming, and could be made a lot easier (not necessarily perfect) if that information had been present in the first page of the file.
--
Stephen.
Life is that brief interlude between nothingness and eternity.
Re: [HACKERS] ISN patch that applies cleanly with git apply
> I have committed the patch and the text proposed above.

Can I take it that there is no need for a formal review, where I answer various questions per http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

-- Regards, Peter Geoghegan
Re: [HACKERS] Simplifying replication
Josh Berkus j...@agliodbs.com writes:
> Well, one thing to be addressed is separating the PITR functionality from replication. PITR needs a lot of features -- timelines, recovery stop points, etc. -- which replication doesn't need or want.

I think that focussing on streaming replication functionality and ignoring the archive logs case is probably the best way to logically separate these two. Presumably anyone who needs archive logs as well will be a professional DBA.

So, I've been thinking some more about this. We now have two modes of operation when starting up a PostgreSQL cluster: either it's a primary/master, or it's a standby (in recovery or hot_standby). What I think would make sense here would be to add another mode of operation, archiving. A cluster that is archiving will accept replication connections and will accept WAL files streamed to it, which it will put in its archive directory, defaulting to $PGDATA/pg_xlog_archive. It should also be able to take a base backup from its primary server, maybe with a new pg_ctl command. The base backup location defaults to $PGDATA/pg_basebackup/$label.

Then, it would also accept replication connections the other way around, starting a walsender to publish its archive or its base backup, so that you could prepare a new standby against the archive server, then switch the primary_conninfo to the master, and that's it.

Now, the archive and restore commands in this setup would be the internal commands pg_walsender and pg_walreceiver. That would mean we need to extend those special backends to be able to send and receive a full WAL file, but that sounds simple enough, right? Then we could add support for compressing the WAL files before streaming; as we're talking internal commands here, that sounds easy enough (we're already linking against gzip, I think). The base backup streaming support would be there for the archiving mode, but also for a normal standby's initial start up from an empty $PGDATA.

Comments?
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] ISN patch that applies cleanly with git apply
On Wed, Oct 20, 2010 at 6:12 AM, Peter Geoghegan peter.geoghega...@gmail.com wrote:
>> I have committed the patch and the text proposed above.
>
> Can I take it that there is no need for a formal review, where I answer various questions per http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

That is correct. That page is just a list of things to think about, anyway; a review needn't be in exactly that format (in fact I'm not sure it was even intended to be used that way).

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Extensions, this time with a patch
On Wed, Oct 20, 2010 at 6:22 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> In the v6 patch, should client_min_messages or log_min_messages be lower than WARNING, they get set to WARNING for the script install context. We still dump the extension's script at each WARNING, but you can set your client_min_messages (and log_min_messages) to ERROR beforehand.

I would vote for overriding client_min_messages but not log_min_messages.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ISN patch that applies cleanly with git apply
On ons, 2010-10-20 at 11:12 +0100, Peter Geoghegan wrote:
> Can I take it that there is no need for a formal review, where I answer various questions per http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

The short answer is no. But note that there is no such thing as a formal review. The page you link to is only a guideline if you set out to review something and don't know what to do.
Re: [HACKERS] max_wal_senders must die
On Wed, Oct 20, 2010 at 1:06 AM, Greg Smith g...@2ndquadrant.com wrote:
> Josh Berkus wrote:
>> Well, now that you mention it, I also think that hot standby should be the default. Yes, I know about the overhead, but I also think that the number of our users who want easy replication *far* outnumber the users who care about an extra 10% WAL overhead.
>
> I think this whole situation is similar to the resistance to increasing default_statistics_target. There's additional overhead added by enabling both of these settings, in return for making it more likely for the average person to see useful behavior by default. If things are rejiggered so the advanced user has to turn things off in order to get optimal performance when not using these features, in return for the newbie being more likely to get things working in basic form, that's probably a net win for PostgreSQL advocacy. But much like default_statistics_target, there needs to be some more formal work done on quantifying just how bad each of these overheads really is first. We lost 3-7% on multiple simple benchmarks between 8.3 and 8.4[1] because of that retuning for ease of use on real-world workloads, and that's not something you want to repeat too often.

Exactly. It doesn't take many 3-7% slowdowns to add up to being 50% or 100% slower, and that sucks. In fact, I'm still not convinced that we were wise to boost default_statistics_target as much as we did. I argued for a smaller boost at the time.

Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table, e.g. given N rows:

10 + (N / 1000), if N < 40,000
46 + (N / 10,000), if 50,000 < N < 3,540,000
400, if N > 3,540,000

Consider a table with 2,000 rows. With default_statistics_target = 100, we can store up to 100 MCVs; and we break the remaining ~1900 values up into 100 buckets with 19 values/bucket. In most cases, that is probably overkill. Where you tend to run into problems with inadequate statistics is with the values that are not quite common enough to be an MCV, but are still significantly more common than their companions in the same bucket. However, with only 19 values in a bucket, you're probably not going to have that problem. If you scale the table down to 1,000 rows you now have 9 values in a bucket, which makes it *really* unlikely you're going to have that problem.

On the other hand, on a table with 4 million rows, it is entirely likely that there could be more than 100 values whose frequencies are worth tracking individually, and odds are good also that even if the planning time is a little longer to no purpose, it'll still be small relative to the query execution time.

It's unfortunately impractical for the size of the MCV list to track linearly with the size of the table, because there are O(n^2) algorithms in use, but I think some kind of graduated scheme might enable us to buy back some of that lost performance without damaging real workloads very much. Possibly even helping real workloads, because you may very well join large fact tables against small dimension tables, and odds are good that under the present scheme the fact tables have more statistics than they really need.

As to replication, I don't believe the contention that most people will want to use replication. Many will, and that is fine, but many also won't.
The world is full of development and test machines where replication is a non-issue, and some people won't even run it in production because the nature of their application makes the data on that box non-essential, or because they replicate with Bucardo or Slony. I completely agree that we should make it easier to get replication set up without (multiple) server restarts, but imposing a performance overhead on untuned systems is not the right way to do it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
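Transcribed into C, the graduated scheme Robert sketches above would look something like the following. This is purely illustrative: the function name is invented, and the lower breakpoint is taken as 40,000 throughout, since 10 + 40,000/1,000 = 46 + 40,000/10,000 = 50 makes the two pieces meet there (the 50,000 in the message leaves a small gap between the first two cases).

static int
scaled_statistics_target(double nrows)
{
	/* small tables: grow from a floor of 10, one step per 1,000 rows */
	if (nrows < 40000.0)
		return 10 + (int) (nrows / 1000.0);

	/* medium tables: grow more slowly, reaching 400 at ~3.54M rows */
	if (nrows < 3540000.0)
		return 46 + (int) (nrows / 10000.0);

	/* large tables: cap at 400 */
	return 400;
}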
Re: [HACKERS] PostgreSQL and HugePage
On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>> On 20/10/10 16:05, Mark Kirkwood wrote:
>>> shmget and friends are hugetlbpage aware, so it seems it should 'just work'.
>>
>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>
> I had a patch for this against 8.3 that I could update if there is any interest. I suspect it is helpful.

I think it would be a good feature. Of course, we would need appropriate documentation, and some benchmarks showing that it really works.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Extensions, this time with a patch
Robert Haas robertmh...@gmail.com writes:
> I would vote for overriding client_min_messages but not log_min_messages.

Well, it defaults to WARNING, so I see your point. Then again, we're talking about hundreds of lines (3197 lines for isn, 531 lines for hstore) of output per message, containing a script that you're not maintaining. Do we really want that amount of extra logging?

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Extensions, this time with a patch
On Wed, Oct 20, 2010 at 9:33 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I would vote for overriding client_min_messages but not log_min_messages.
>
> Well, it defaults to WARNING, so I see your point. Then again, we're talking about hundreds of lines (3197 lines for isn, 531 lines for hstore) of output per message, containing a script that you're not maintaining. Do we really want that amount of extra logging?

Well, my thought was that it makes sense to override the user's logging preferences because, after all, if they wanted the extra logging, they could run the script by hand. But what gets logged to the system log is server policy, not user preference, and I'm reluctant to think we should second-guess whatever the admin has configured.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] leaky views, yet again
Robert Haas robertmh...@gmail.com writes:
> I get the impression that you think that there's a problem not only with the approach but with any approach whatsoever to that underlying problem.

Let's just say that the approaches proposed so far have performance and/or functionality and/or code maintenance penalties that are utterly unacceptable from the standpoint of people who don't need RLS. I don't know if there is a workable solution, but I do know I've not seen one.

> With respect to selectivity estimation, do we have a live bug there now?

No, I don't believe so. Given that you'd like to get the planner to call function XYZ, you could create an operator using XYZ and attach to it one of the estimation functions that will actually call the underlying function --- but you have to have call permission on the function in order to create the operator.

regards, tom lane
Re: [HACKERS] Domains versus arrays versus typmods
Robert Haas robertmh...@gmail.com writes:
> On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> We've already accepted the cost of doing getBaseTypeAndTypmod() in a whole lot of performance-critical parsing paths, on the off chance that the target datatype might be a domain. It's not apparent to me that array subscripting is so important as to deserve an exemption from that. Especially when not doing so doesn't work.
>
> Hmm... so are there no cases where zeroing out the typelem will cost us an otherwise-unnecessary syscache lookup?

My point is that anyplace that is relying on the surface typelem, without drilling down to see what the base type is, is wrong. So yeah, those lookups are (will be) necessary.

regards, tom lane
Re: [HACKERS] PostgreSQL and HugePage
Robert Haas robertmh...@gmail.com writes:
> On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
>> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>>
>> I had a patch for this against 8.3 that I could update if there is any interest. I suspect it is helpful.
>
> I think it would be a good feature. Of course, we would need appropriate documentation, and some benchmarks showing that it really works.

I believe that for the equivalent Solaris option, we just automatically enable it when available. So there'd be no need for user documentation. However, I definitely *would* like to see some benchmarks proving that the change actually does something useful. I've always harbored the suspicion that this is just a knob to satisfy people who need knobs to frob.

regards, tom lane
Re: [HACKERS] Extensions, this time with a patch
Itagaki Takahiro itagaki.takah...@gmail.com writes:
> On Wed, Oct 20, 2010 at 12:58 PM, Alvaro Herrera alvhe...@commandprompt.com wrote:
>> Let's rename the directory.
>
> Hmmm, but we call it 'xml2' in the doc. There is no 'pgxml' at all in it. http://developer.postgresql.org/pgdocs/postgres/xml2.html However, I don't think we can change the module name, because pg_upgrade will fail if the module (.so) name is changed. So, it might be the point of compromise to keep two names until we deprecate it completely.

If the extensions manager is dependent on the assumption that a module's name matches the name of the directory it's built in, that assumption needs to be removed anyway. There are too many use-cases where that wouldn't hold, even if we try to force the standard contrib modules to follow such a rule.

regards, tom lane
Re: [HACKERS] max_wal_senders must die
Greg Smith g...@2ndquadrant.com writes:
> Josh Berkus wrote:
>> Well, now that you mention it, I also think that hot standby should be the default. Yes, I know about the overhead, but I also think that the number of our users who want easy replication *far* outnumber the users who care about an extra 10% WAL overhead.
>
> ... But much like default_statistics_target, there needs to be some more formal work done on quantifying just how bad each of these overheads really are first.

Quite. Josh, have you got any evidence showing that the penalty is only 10%? There are cases, such as COPY and ALTER TABLE, where you'd be looking at 2X or worse penalties, because of the existing optimizations that avoid writing WAL at all for operations where a single final fsync can serve the purpose. I'm not sure what the penalty for typical workloads is, partly because I'm not sure what should be considered a typical workload for this purpose.

regards, tom lane
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes:
> If the extensions manager is dependent on the assumption that a module's name matches the name of the directory it's built in

It is not. There's some magic for simple cases so that contrib mostly works with no editing, but of course, that's only mostly. The version Itagaki-san worked with had not a single change to the contrib sources; I've only begun to change things there (in v6) with the spi case, which now produces 5 extension control files out of a single Makefile, thanks to this single new line:

CONTROL = $(addsuffix .control, $(MODULES))

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] pg_rawdump
Stephen R. van den Berg s...@cuci.nl writes:
> It's just that matching table and file, and subsequently figuring out some missing columns which may have been added/removed later, can be rather time-consuming, and could be made a lot easier (not necessarily perfect) if that information had been present in the first page of a file.

So you've already moved the goalposts from what was claimed in your prior message. If the data is not maintained (with 100% reliability) during ALTER TABLE, how are you going to do something like figure out missing columns? I can see the potential usefulness of a self-documenting table storage format, but this proposal isn't that; it's just an unreliable kluge.

regards, tom lane
Re: [HACKERS] How to reliably detect if it's a promoting standby
> pg_is_in_recovery() returns a bool, are you proposing to change that?

No. I just thought about adding a further condition to when it returns true. Here is the patch. Comments are welcome!

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 5604,5610 **** GetLatestXTime(void)
  Datum
  pg_is_in_recovery(PG_FUNCTION_ARGS)
  {
! 	PG_RETURN_BOOL(RecoveryInProgress());
  }
  
  /*
--- 5604,5613 ----
  Datum
  pg_is_in_recovery(PG_FUNCTION_ARGS)
  {
! 	/* use volatile pointer to prevent code rearrangement */
! 	volatile WalRcvData *walrcv = WalRcv;
! 
! 	PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState == WALRCV_RUNNING);
  }
  
  /*
Re: [HACKERS] Extensions, this time with a patch
Robert Haas robertmh...@gmail.com writes:
> On Wed, Oct 20, 2010 at 6:22 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
>> In the v6 patch, should client_min_messages or log_min_messages be lower than WARNING, they get set to WARNING for the script install context. We still dump the extension's script at each WARNING, but you can set your client_min_messages (and log_min_messages) to ERROR beforehand.
>
> I would vote for overriding client_min_messages but not log_min_messages.

Why? The problem with unreasonably bulky messages is just as objectionable for the log.

regards, tom lane
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 10:10:00AM -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
>>> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>>>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>>>
>>> I had a patch for this against 8.3 that I could update if there is any interest. I suspect it is helpful.
>>
>> I think it would be a good feature. Of course, we would need appropriate documentation, and some benchmarks showing that it really works.
>
> I believe that for the equivalent Solaris option, we just automatically enable it when available. So there'd be no need for user documentation. However, I definitely *would* like to see some benchmarks proving that the change actually does something useful. I've always harbored the suspicion that this is just a knob to satisfy people who need knobs to frob.

Oracle apparently uses hugepages if they are available, by first trying the call with the SHM_HUGETLB option. If it fails, they reissue the command without that option. This article mentions some of the benefits of the larger page sizes with large shared memory regions: http://appcrawler.com/wordpress/?p=686

Regards,
Ken
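The try-and-fall-back approach Ken describes is straightforward to sketch in C. This is an illustration of the general technique, assuming a Linux <sys/shm.h> that defines SHM_HUGETLB; it is not daveg's patch, and the function name is invented.

#include <sys/ipc.h>
#include <sys/shm.h>

static int
create_shared_segment(key_t key, size_t size)
{
	int		shmid;

#ifdef SHM_HUGETLB
	/* first try to get a huge-page-backed segment ... */
	shmid = shmget(key, size, IPC_CREAT | SHM_HUGETLB | 0600);
	if (shmid >= 0)
		return shmid;

	/*
	 * ... and if the kernel refuses (no huge pages reserved, size not a
	 * multiple of the huge page size, etc.), fall back to regular pages.
	 */
#endif
	return shmget(key, size, IPC_CREAT | 0600);
}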
Re: [HACKERS] max_wal_senders must die
On 20.10.2010 17:19, Tom Lane wrote:
> Greg Smith g...@2ndquadrant.com writes:
>> Josh Berkus wrote:
>>> Well, now that you mention it, I also think that hot standby should be the default. ...
>
> Quite. Josh, have you got any evidence showing that the penalty is only 10%? There are cases, such as COPY and ALTER TABLE, where you'd be looking at 2X or worse penalties, because of the existing optimizations that avoid writing WAL at all for operations where a single final fsync can serve the purpose. I'm not sure what the penalty for typical workloads is, partly because I'm not sure what should be considered a typical workload for this purpose.

Going from wal_level='minimal' to 'archive' incurs the penalty of WAL-logging COPY etc. That's a big penalty. However, the difference between wal_level='archive' and wal_level='hot_standby' should be tiny. The big reason for separating those two in 9.0 was that it's all new code with new ways to fail and, yes, new bugs. It's not smart to expose people who are not interested in using hot standby to those issues. But maybe we'll feel more comfortable merging the 'archive' and 'hot_standby' levels in 9.1.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] How to reliably detect if it's a promoting standby
On 20.10.2010 17:32, Tatsuo Ishii wrote:
>> pg_is_in_recovery() returns a bool, are you proposing to change that?
>
> No. I just thought about adding a further condition to when it returns true. Here is the patch. Comments are welcome!
> ...
> Datum
> pg_is_in_recovery(PG_FUNCTION_ARGS)
> {
> 	/* use volatile pointer to prevent code rearrangement */
> 	volatile WalRcvData *walrcv = WalRcv;
>
> 	PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState == WALRCV_RUNNING);
> }

This returns 'false' if you're in hot standby mode running against an archive. That seems wrong; I don't think the walreceiver state should play any role in this.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] max_wal_senders must die
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
> Going from wal_level='minimal' to 'archive' incurs the penalty of WAL-logging COPY etc. That's a big penalty. However, the difference between wal_level='archive' and wal_level='hot_standby' should be tiny.

I'm not sure I believe that either, because of the costs associated with logging lock acquisitions. We really need some actual benchmarks in this area, rather than handwaving ...

regards, tom lane
Re: [HACKERS] pg_rawdump
Tom Lane wrote:
> Stephen R. van den Berg s...@cuci.nl writes:
>> It's just that matching table and file, and subsequently figuring out some missing columns which may have been added/removed later, can be rather time-consuming ...
>
> So you've already moved the goalposts from what was claimed in your prior message. If the data is not maintained (with 100% reliability) during ALTER TABLE, how are you going to do something like figure out missing columns?

Most alter table operations are well thought through and rarely undone (at least not on production databases). This means that most tables can be restored.

> I can see the potential usefulness of a self-documenting table storage format, but this proposal isn't that; it's just an unreliable kluge.

Restoring tables/databases from table storage only is, by definition, an unreliable kludge. I'm not opposed to making the definition storage more robust, but since the records in the table have already lost their relation to the pg_clog records, and therefore it *already* is uncertain which records were deleted and/or have the wrong number of columns, it seems to be a needless waste of time and energy to provide more reliable information about the column structure. I know for a fact that those who have lost data in such a way, and are faced with the option of having this unreliable, kludgy information available now, or waiting for a few months/years until a reliable solution is present, would (in every single case) opt for the former and get at least some (if not all) of their data back in a shorter amount of time.
--
Stephen.
Life is that brief interlude between nothingness and eternity.
Re: [HACKERS] Extensions, this time with a patch
Excerpts from Dimitri Fontaine's message of mié oct 20 07:22:53 -0300 2010:
> Itagaki Takahiro itagaki.takah...@gmail.com writes:
>> CREATE EXTENSION command
>> * Environment could be modified by the installer script.
>>
>> =# SHOW search_path;
>>  $user,public
>> =# CREATE EXTENSION dblink;
>> =# SHOW search_path;
>>  public
>>
>> because almost all of the modules have SET search_path in the scripts:
>> -- Adjust this setting to control where the objects get created.
>> SET search_path = public;
>>
>> Is this intended behavior? Personally, I want the installer to run in a sandbox. One idea is to rewrite module scripts to use BEGIN -> SET LOCAL -> COMMIT, but we cannot execute CREATE EXTENSION in a transaction if we do so.
>
> Using SPI to execute the extension's script already means that it can not contain explicit BEGIN and COMMIT commands. Now, is it possible to force a reset of all GUCs after the script's execution?

Would it work to force a new transaction internally in CREATE EXTENSION, and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Extensions, this time with a patch
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
> Tom Lane t...@sss.pgh.pa.us writes:
>> If the extensions manager is dependent on the assumption that a module's name matches the name of the directory it's built in
>
> It is not. There's some magic for simple cases so that contrib mostly works with no editing, but of course, that's only mostly. The version Itagaki-san worked with had not a single change to the contrib sources,

I don't think that no changes to the makefiles is a requirement, or even a wish-list item, for this. I think it's perfectly reasonable for the makefile to have to specify the module name; far better that than that we get the name by some magic or other.

regards, tom lane
Re: [HACKERS] Domains versus arrays versus typmods
On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> We've already accepted the cost of doing getBaseTypeAndTypmod() in a whole lot of performance-critical parsing paths, on the off chance that the target datatype might be a domain. It's not apparent to me that array subscripting is so important as to deserve an exemption from that. Especially when not doing so doesn't work.
>>
>> Hmm... so are there no cases where zeroing out the typelem will cost us an otherwise-unnecessary syscache lookup?
>
> My point is that anyplace that is relying on the surface typelem, without drilling down to see what the base type is, is wrong. So yeah, those lookups are (will be) necessary.

OK. In that case, +1 from me.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] leaky views, yet again
On Wed, Oct 20, 2010 at 10:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I get the impression that you think that there's a problem not only with the approach but with any approach whatsoever to that underlying problem.
>
> Let's just say that the approaches proposed so far have performance and/or functionality and/or code maintenance penalties that are utterly unacceptable from the standpoint of people who don't need RLS. I don't know if there is a workable solution, but I do know I've not seen one.
>
>> With respect to selectivity estimation, do we have a live bug there now?
>
> No, I don't believe so. Given that you'd like to get the planner to call function XYZ, you could create an operator using XYZ and attach to it one of the estimation functions that will actually call the underlying function --- but you have to have call permission on the function in order to create the operator.

But suppose the operator already exists, but I don't have permission to call the underlying function... can I get the planner to call the function for me by EXPLAIN-ing the right query?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] How to reliably detect if it's a promoting standby
On Wed, Oct 20, 2010 at 10:35 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> On 20.10.2010 17:32, Tatsuo Ishii wrote:
>>> pg_is_in_recovery() returns a bool, are you proposing to change that?
>>
>> No. I just thought about adding a further condition to when it returns true. Here is the patch. Comments are welcome!
>> ...
>> Datum
>> pg_is_in_recovery(PG_FUNCTION_ARGS)
>> {
>> 	/* use volatile pointer to prevent code rearrangement */
>> 	volatile WalRcvData *walrcv = WalRcv;
>>
>> 	PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState == WALRCV_RUNNING);
>> }
>
> This returns 'false' if you're in hot standby mode running against an archive. That seems wrong, I don't think the walreceiver state should play any role in this.

I think what we need is a second function, not a change to the definition of this one.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
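For concreteness, a second function along those lines could simply reuse the walreceiver test from Tatsuo's patch while leaving pg_is_in_recovery() untouched. The name below is made up for illustration; nothing like it was agreed on in this thread.

Datum
pg_is_receiving_wal(PG_FUNCTION_ARGS)
{
	/* use volatile pointer to prevent code rearrangement */
	volatile WalRcvData *walrcv = WalRcv;

	/* true only while in recovery *and* a walreceiver is streaming */
	PG_RETURN_BOOL(RecoveryInProgress() &&
				   walrcv->walRcvState == WALRCV_RUNNING);
}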
Re: [HACKERS] max_wal_senders must die
Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:
> Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table, e.g. given N rows:
>
> 10 + (N / 1000), if N < 40,000
> 46 + (N / 10,000), if 50,000 < N < 3,540,000
> 400, if N > 3,540,000
>
> Consider a table with 2,000 rows. With default_statistics_target = 100, we can store up to 100 MCVs; and we break the remaining ~1900 values up into 100 buckets with 19 values/bucket.

Maybe what should be done about this is to have separate sizes for the MCV list and the histogram, where the MCV list is automatically sized during ANALYZE.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Extensions, this time with a patch
Alvaro Herrera alvhe...@commandprompt.com writes:
> Excerpts from Dimitri Fontaine's message of mié oct 20 07:22:53 -0300 2010:
>> Using SPI to execute the extension's script already means that it can not contain explicit BEGIN and COMMIT commands. Now, is it possible to force a reset of all GUCs after the script's execution?
>
> Would it work to force a new transaction internally in CREATE EXTENSION,

No, but maybe a savepoint?

> and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

I had assumed that that was how he was doing it ...

regards, tom lane
Re: [HACKERS] max_wal_senders must die
On Wed, Oct 20, 2010 at 10:53 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:
> Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:
>> Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table ...
>
> Maybe what should be done about this is to have separate sizes for the MCV list and the histogram, where the MCV list is automatically sized during ANALYZE.

I thought about that, but I'm not sure there's any particular advantage. Automatically scaling the histogram seems just as useful as automatically scaling the MCV list - both things will tend to reduce the estimation error. For a table with 2,000,000 rows, automatically setting the statistics target from 100 to the value that would be computed by the above formula, which happens to be 246, will help the 101st-246th most common values, because they will now be MCVs. It will also help all the remaining values, both because you've pulled 146 fairly common values out of the histogram buckets and also because each bucket now contains ~8130 values rather than ~20,000.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] How to reliably detect if it's a promoting standby
> This returns 'false' if you're in hot standby mode running against an archive. That seems wrong, I don't think the walreceiver state should play any role in this.

Apart from this, I wonder why walsender/walreceiver do not transfer archive logs as well.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Serializable snapshot isolation patch
Robert Haas robertmh...@gmail.com wrote:
> On Tue, Oct 19, 2010 at 6:28 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
>> One thing that would work, but I really don't think I like it, is that a request for a snapshot for such a transaction would not only block until it could get a clean snapshot (no overlapping serializable non-read-only transactions which overlap serializable transactions which wrote data and then committed in time to be visible to the snapshot being acquired), but it would *also* block *other* serializable transactions, if they were non-read-only, on an attempt to acquire a snapshot.
>
> This seems pretty close to guaranteeing serializability by running transactions one at a time (i.e. I don't think it's likely to be acceptable from a performance standpoint).

It absolutely makes no sense except for long-running read-only transactions, and would only be used when explicitly requested; and like I said, I really don't think I like it even on that basis -- just putting it out there as the only alternative I've found so far to either tolerating possible serialization anomalies in pg_dump output (albeit only when compared to the state the database reached after the dump's snapshot) or waiting indefinitely for a clean snapshot to become available.

FWIW, from a brainstorming perspective: while waiting for problem transactions to clear so we could get a clean snapshot for the dump, I think it would work even better to block the *commit* of serializable transactions which *had done* writes than to block snapshot acquisition for serializable transactions which were not read-only. Still pretty icky, though. I am loath to compromise the no-new-blocking promise of SSI.

[thinks]

Actually, maybe we can reduce the probability of needing to retry at each iteration of the non-blocking alternative by checking the conflict information for the problem transactions after they commit. Any transaction which didn't *actually* generate a read-write conflict out to a transaction visible to the dump's candidate snapshot could not cause an anomaly. If none of the problem transactions actually generates a rw-conflict, we can use the candidate snapshot. Adding that logic to the non-blocking alternative might actually work pretty well. There might be some workloads where conflicts would be repeatedly generated, but there would be a lot where they wouldn't.

If we add a switch to pg_dump to allow users to choose, I think this algorithm works. It never affects a transaction unless it has explicitly requested SERIALIZABLE READ ONLY DEFERRABLE, and the only impact is that startup may be deferred until a snapshot can be acquired which ensures serializable behavior without worrying about SIRead locks.

-Kevin
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes:
> I don't think that no changes to the makefiles is a requirement, or even a wish-list item, for this. I think it's perfectly reasonable for the makefile to have to specify the module name; far better that than that we get the name by some magic or other.

It seemed easy to get a reasonable approach requiring very few edits in contribs, so I favoured that. Now, it's still entirely possible to hand adjust. Determining the extension name automatically from DATA_built or DATA is only done when EXTENSION has not been provided, and guessing the CONTROL file name from the EXTENSION name only occurs when CONTROL has not been provided.

Of course, if those changes (inlined hereafter) are seen as a bad idea, then I will change all contrib Makefiles to add EXTENSION, EXTVERSION (which always is MAJORVERSION here) and CONTROL (which almost always is EXTENSION.control).

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

# create extension support
ifndef CONTROL
ifndef EXTENSION
ifdef DATA_built
EXTENSION = $(basename $(notdir $(firstword $(DATA_built))))
else ifdef DATA
EXTENSION = $(basename $(notdir $(firstword $(DATA))))
endif # DATA_built
endif # EXTENSION

ifndef EXTVERSION
EXTVERSION = $(MAJORVERSION)
endif

ifdef EXTENSION
CONTROL = $(EXTENSION).control
endif # EXTENSION
endif # CONTROL

control:
	# create .control to keep track that we created the control file(s)
	@for file in $(CONTROL); do \
	  test -f `basename $$file .control`.sql -a ! -f $$file && touch .control || true ; \
	  if [ -f .control ]; then \
	    if [ -n "$(EXTENSION)" ]; then \
	      (echo "name = '$(EXTENSION)'"; echo "version = '$(EXTVERSION)'") > $$file ; \
	    else \
	      (echo "name = '`basename $$file .control`'"; echo "version = '$(EXTVERSION)'") > $$file ; \
	    fi ; \
	    if [ -n "$(EXTCOMMENT)" ]; then echo "comment = '$(EXTCOMMENT)'" >> $$file ; fi ; \
	  fi ; \
	done

install: all installdirs control
ifneq (,$(DATA)$(DATA_built)$(CONTROL))
	@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built) $(CONTROL); do \
	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'"; \
	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
	done
endif # DATA
Re: [HACKERS] How to reliably detect if it's a promoting standby
On 20.10.2010 18:06, Tatsuo Ishii wrote:
>> This returns 'false' if you're in hot standby mode running against an archive. That seems wrong, I don't think the walreceiver state should play any role in this.
>
> Apart from this, I wonder why walsender/walreceiver do not transfer archive logs as well.

What do you mean?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes:
>> and use the equivalent of SET LOCAL in the CREATE EXTENSION code?
>
> I had assumed that that was how he was doing it ...

I'm currently doing:

SetConfigOption("client_min_messages", "warning", PGC_SUSET, PGC_S_SESSION);

And then manually reverting to what was there before the command:

SetConfigOption("client_min_messages", old_cmsgs, PGC_SUSET, PGC_S_SESSION);

The thing is that CREATE EXTENSION can be part of a transaction, so even SET LOCAL isn't going to work here; we need to reset before continuing the transaction. I don't know that SET LOCAL is RESET after a savepoint, so we would still need to take care of that by hand, right?

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
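Put together, the save-then-restore dance Dimitri describes looks roughly like this. The SetConfigOption calls are the ones shown in the message; capturing the old value with GetConfigOption and pstrdup is a sketch of one plausible way to fill in the rest, not a quote from the patch.

const char *old_cmsgs;

/* remember the caller's setting before clobbering it */
old_cmsgs = pstrdup(GetConfigOption("client_min_messages"));

SetConfigOption("client_min_messages", "warning",
				PGC_SUSET, PGC_S_SESSION);

/* ... execute the extension's script via SPI ... */

/* manually put the previous value back, even mid-transaction */
SetConfigOption("client_min_messages", old_cmsgs,
				PGC_SUSET, PGC_S_SESSION);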
Re: [HACKERS] Extensions, this time with a patch
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
> Tom Lane t...@sss.pgh.pa.us writes:
>> I don't think that no changes to the makefiles is a requirement, or even a wish-list item, for this. I think it's perfectly reasonable for the makefile to have to specify the module name; far better that than that we get the name by some magic or other.
>
> It seemed easy to get a reasonable approach requiring very few edits in contribs, so I favoured that. Now, it's still entirely possible to hand adjust. Determining the extension name automatically from DATA_built or DATA is only done when EXTENSION has not been provided,

That is simply a horrid idea. Just make it specify EXTENSION.

> and guessing the CONTROL file name from the EXTENSION name only occurs when CONTROL has not been provided.

Here, on the other hand, I'm wondering why have two variables at all. Is there any sane use-case for the control file to not be named the same as the extension? It seems like that would accomplish little except to sow confusion.

regards, tom lane
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes:
> That is simply a horrid idea. Just make it specify EXTENSION.

Black magic it is; will remove in v7.

> Is there any sane use-case for the control file to not be named the same as the extension? It seems like that would accomplish little except to sow confusion.

The goal of the 3 variables EXTENSION, EXTVERSION, EXTCOMMENT is to prepare the control file with 3 lines formatted as variable = 'value'. If you specify CONTROL instead, that should be the name of the file you're providing directly. It then grew into being a directive to produce the right file set for spi, but the simplest thing would be to hand-prepare the files there. If you agree with that, that's what I'll do in v7.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Greg Stark gsst...@mit.edu writes:
> On Tue, Oct 19, 2010 at 4:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I think they're more alike than you think. If we had the infrastructure to do local temp tables this way, it'd be pretty easy to use that to instantiate per-backend copies of global temp tables. (The global entities would be templates, not actual tables.)
>
> Sure, but I think the idea was that you should be able to create temporary tables from whole cloth on the slave. Since they're local to the backend they never have to be stored on disk, so logically, from the user's point of view, it seems like it should be possible.

The way I'm envisioning this working is that you instantiate temporary child tables of all the system catalogs that are needed to describe tables. Being system catalogs, their schemas never change, so you don't have a problem there. Then you use these children to store the catalog entries describing user temp tables. Whether those temp tables are instantiations of spec-style global temp tables, or our current flavor of local temp tables, won't matter.

I think it's pointless to speculate about whether we might have divvied up the meta-information about tables differently if we'd foreseen wanting to do this. It is what it is, and there is *way* too much code depending on it, both inside the backend and in clients. Any reimplementation of temp tables will still have to expose largely the same catalog information that exists for tables now. We can probably get away with marginal changes like redefining relfilenode, but we can't avoid providing catalog entries that describe the schema and statistics of a temp table.

regards, tom lane
[HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer
Hi pgsql-hackers,

Currently contrib/pgcrypto/pgp-pubenc.c contains code like:

	uint8	algo = pk->algo;

	if (pk == NULL)
		...

However, if pk was NULL, then the if() condition would never be reached because the pk->algo dereference would segfault. This patch moves the dereference to below the check, which was the intended behavior.

Regards,
Marti

From a2500cda9e0e82883854a412ea12942e174e3dd2 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Wed, 20 Oct 2010 18:32:17 +0300
Subject: [PATCH] pgcrypto: Test for NULL before dereferencing pointer

If pk is NULL, the backend would segfault when accessing ->algo and the following NULL check was never reached.

This problem was found by Coccinelle (null_ref.cocci from coccicheck)
---
 contrib/pgcrypto/pgp-pubenc.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/contrib/pgcrypto/pgp-pubenc.c b/contrib/pgcrypto/pgp-pubenc.c
index 4b4d1bf..943d2e4 100644
--- a/contrib/pgcrypto/pgp-pubenc.c
+++ b/contrib/pgcrypto/pgp-pubenc.c
@@ -199,7 +199,7 @@ pgp_write_pubenc_sesskey(PGP_Context *ctx, PushFilter *dst)
 	PGP_PubKey *pk = ctx->pub_key;
 	uint8		ver = 3;
 	PushFilter *pkt = NULL;
-	uint8		algo = pk->algo;
+	uint8		algo;
 
 	if (pk == NULL)
 	{
@@ -207,6 +207,8 @@ pgp_write_pubenc_sesskey(PGP_Context *ctx, PushFilter *dst)
 		return PXE_BUG;
 	}
 
+	algo = pk->algo;
+
 	/*
 	 * now write packet
 	 */
-- 
1.7.3.1
Re: [HACKERS] pg_rawdump
On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen R. van den Berg s...@cuci.nl writes: It's just that matching table and file, and subsequently figuring out some missing columns which may have been added/removed later, can be rather time-consuming and could be made a lot easier (not necessarily perfect) if that information would have been present in the first page of a file. So you've already moved the goalposts from what was claimed in your prior message. If the data is not maintained (with 100% reliability) during ALTER TABLE, how are you going to do something like figure out missing columns? I can see the potential usefulness of a self-documenting table storage format, but this proposal isn't that; it's just an unreliable kluge. If we're looking to have any sort of out of catalog documentation of table storage format, what about just having a new relation fork that just appends each and every change made to the table formats, including ones rolled back, etc. Make this relation fork append only, and dump a completely new set of metadata to it each and every ALTER TABLE. This fork would never need to be read by PG, so a relation fork might even be too much. All you really need is a file you can tie to a relation, and blindly append data to on create/alter statements. Sure, it will have more information than *needed*, but I can't see it ever growing too big, and people doing forensics rarely complain about having *too much* information available. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Tue, Oct 19, 2010 at 11:16 AM, Greg Smith g...@2ndquadrant.com wrote: Josh Berkus wrote: It is critical that we make replication easier to set up, administrate and monitor than it currently is. In my conversations with people, this is more important to our users and the adoption of PostgreSQL than synchronous replication is. snip I find this launch into a new round of bike-shedding a bit distracting. If you want this to be easier to use, which it's obvious to any observer it should be because what's delivered in 9.0 is way too complicated, please work on finding development resources to assign to that problem. Because that's the bottleneck on simplifying things, not ideas about what to do. I would recommend finding or assigning a developer to work on integrating base backup in to the streaming protocol as the biggest single thing that would improve the built-in replication. All of the rest of the trivia about what knobs to set and such are tiny details that make for only a minor improvement until that's taken care of. Yeah, I'm sure we all think it should be easier, but figuring out what that means is certainly a moving target. The idea of being able to create a base backup automagically sounds good, but comparatively it's not significantly more difficult than what many other systems make you do, and actually if done incorrectly could be something rather limiting. On the whole the customers we are talking with are far more concerned about things like managing failover scenarios when you have multiple slaves, and it's the lack of capabilities around those kinds of things that hurt postgres adoption much more than it being hard to set up. Robert Treat play: http://www.xzilla.net work: http://omniti.com/is/hiring
Re: [HACKERS] pg_rawdump
Aidan Van Dyk ai...@highrise.ca writes: On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: I can see the potential usefulness of a self-documenting table storage format, but this proposal isn't that; it's just an unreliable kluge. If we're looking to have any sort of out of catalog documentation of table storage format, what about just having a new relation fork that just appends each and every change made to the table formats, including ones rolled back, etc. I thought of that too, but I'm not sure if it's going to help enough. The trouble is that the information is only tied to the table itself via file names. In a fsck-recovery scenario, you may not have the correct file names. (One of the multiple problems with Stephen's proposal is that the metadata would only be physically tied to the first segment of a large table, and unless you can trust the file names you won't be able to match it to the other segments.) [ thinks for a bit... ] Perhaps we could stick some sort of unique ID into tables, which could be correlated to the same unique ID appearing in a metadata fork. [ thinks some more... ] Of course, at this point you have to start asking why the metadata fork should be thought to be any more reliable than the system catalogs. Make this relation fork append only, and dump a completely new set of metadata to it each and every ALTER TABLE. You can bet that there'll be somebody out there who whines because their app does lots and lots of repeated ALTER TABLEs, and so the metadata fork grows forever. I think we'd have to be a bit smarter than this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
* Terry Laurenzo: Agreed. BSON was born out of implementations that either lacked arbitrary precision numbers or had a strong affinity to an int/floating point way of thinking about numbers. I believe that if BSON had an arbitrary precision number type, it would be a proper superset of JSON. But JSON has only double-precision numbers!? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On 10/20/2010 01:15 PM, Florian Weimer wrote: * Terry Laurenzo: Agreed. BSON was born out of implementations that either lacked arbitrary precision numbers or had a strong affinity to an int/floating point way of thinking about numbers. I believe that if BSON had an arbitrary precision number type, it would be a proper superset of JSON. But JSON has only double-precision numbers!? AFAICT the JSON spec says nothing at all about the precision of numbers. It just provides a syntax for them. We should not confuse what can be allowed in JSON with what can be handled by some consumers of JSON such as ECMAScript. However, since we would quite reasonably require that any JSON implementation be able to handle arbitrary precision numbers, that apparently rules out BSON as a storage engine for it, since BSON can not handle such things. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
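A note on the spec point above: the JSON RFC (RFC 4627) places no length or precision limit on numbers, so the following is syntactically valid JSON even though the value is not exactly representable as an IEEE 754 double (example ours):

    {"n": 123456789012345678901234567890.123456789012345678901234567890}

An ECMAScript consumer would round it on parse; a Postgres json type that kept the original text, or stored numbers as numeric, would not.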
Re: [HACKERS] pg_rawdump
On Wed, Oct 20, 2010 at 1:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: [ thinks for a bit... ] Perhaps we could stick some sort of unique ID into tables, which could be correlated to the same unique ID appearing in a metadata fork.

It would be awfully nice if the table name was in the first few bytes of the segments ;-)

[ thinks some more... ] Of course, at this point you have to start asking why the metadata fork should be thought to be any more reliable than the system catalogs.

It's not. But it's about data duplication. If you've lost some files, and others are found in lost+found, etc. Or maybe you lost only the main tablespace, and you have your big-honking-files from another tablespace around, etc. The more copies of the data around, the more chance you have of being able to get something usable out of it.

But we already have WAL archive as an external safe backup copy of *everything*. Maybe the cost of those extra forks/duplicates is too much. Maybe it would be cheaper to try and parse the WAL archive, and just specifically rebuild the system catalogs for the couple of times people actually need this type of data? Or maybe a query/function that ultra-paranoid types can run daily, etc., which would show the system catalog information about table storage format that they could save safely instead of munging GB of WAL logs when they want it...

You can bet that there'll be somebody out there who whines because their app does lots and lots of repeated ALTER TABLEs, and so the metadata fork grows forever. I think we'd have to be a bit smarter than this.

Well, for bad habits, we have an equally bad solution ;-)

    BEGIN;
    ALTER TABLE bad_design RENAME TO bad_design_too_old;
    CREATE TABLE bad_design
        (LIKE bad_design_too_old
         INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
    INSERT INTO bad_design SELECT * FROM bad_design_too_old;
    DROP TABLE bad_design_too_old;
    COMMIT;

It's all about how much extra stuff do we want around to make forensic/reconstruction type work easier when they can't go to the documented, tried, tested, normal restore from backup/WAL. None? Or as much as possible? And what are the tradeoffs.

-- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_upgrade performance with 150k tables
I received a private email report yesterday from someone using pg_upgrade with PG 9.0 who found it took five hours for pg_upgrade to upgrade a database with 150k tables. Yes, that is a lot of tables, but pg_upgrade should be able to do better than that. I have modified pg_upgrade in git master to cache scandir() and reduce array lookups and the time is down to 38 minutes. (He prototyped a hash implementation that was 30 minutes but it was too much code for my taste.) I don't think this is reasonable to backpatch. If anyone else sees cases for pg_upgrade improvement, please let me know. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
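For readers curious what "cache scandir()" means in practice, a minimal sketch of the idea (ours, not the committed pg_upgrade code): scan a directory once and reuse the sorted listing for all subsequent per-relation lookups.

    #include <dirent.h>
    #include <limits.h>
    #include <string.h>

    static struct dirent **cached_namelist;
    static int             cached_count = -1;
    static char            cached_dir[PATH_MAX];

    /* Like scandir(), but rescans only when the directory changes. */
    static int
    cached_scandir(const char *dirname, struct dirent ***namelist)
    {
        if (cached_count < 0 || strcmp(dirname, cached_dir) != 0)
        {
            /* freeing the previous listing is omitted for brevity */
            cached_count = scandir(dirname, &cached_namelist, NULL, alphasort);
            strncpy(cached_dir, dirname, sizeof(cached_dir) - 1);
            cached_dir[sizeof(cached_dir) - 1] = '\0';
        }
        *namelist = cached_namelist;
        return cached_count;
    }

With 150k tables the win comes from turning O(tables) directory scans into O(directories); the sorted listing can then also be binary-searched per relation.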
[HACKERS] pg_upgrade patch application process, and move to /bin?
Bruce Momjian wrote: I received a private email report yesterday from someone using pg_upgrade with PG 9.0 who found it took five hours for pg_upgrade to upgrade a database with 150k tables. Yes, that is a lot of tables, but pg_upgrade should be able to do better than that. I have modified pg_upgrade in git master to cache scandir() and reduce array lookups and the time is down to 38 minutes. (He prototyped a hash implementation that was 30 minutes but it was too much code for my taste.) I don't think this is reasonable to backpatch. If anyone else sees cases for pg_upgrade improvement, please let me know. One more question --- should I be sending pg_upgrade patches to the list for approval? The restructuring patch was large and didn't seem necessary to post, and the speedups were tested by the bug reporter, so I figured those were OK to apply. Oh, and do we want to move pg_upgrade into /bin for 9.1? There was discussion about that six months ago. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: extensible enums
On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote: On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan and...@dunslane.net wrote: Well a bit more testing shows some benefit. I've sorted out a few kinks, so this seems to work. In particular, with the above tables, the version imported from 9.0 can have an index created in about the same time as on the fresh table (identical data, but all even numbered Oids). Of course, with lots of odd numbered Oids, if a label gets added the imported version will degrade in performance much more quickly. I'm quite impressed by the amount of time and thought being put into optimizing this. I didn't realize people cared so much about enum performance; but it's good that they do. I hope to see more such efforts in other parts of the system. Which parts of the system, in particular, do you have in mind? Other people from EDB have mentioned that slimming down the on-disk representation was one such target. What other ones would you see as needing such attention? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 7:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: I believe that for the equivalent Solaris option, we just automatically enable it when available. So there'd be no need for user documentation. However, I definitely *would* like to see some benchmarks proving that the change actually does something useful. I've always harbored the suspicion that this is just a knob to satisfy people who need knobs to frob. Well saving a few megabytes of kernel space memory isn't a bad thing. But I think the major effect is on forking new processes. Having to copy that page map is a major cost when you're talking about very large memory footprints. While machine memory has gotten larger the 4k page size hasn't. I don't think it's a big cost once all the processes have been forked if you're reusing them beyond perhaps slightly more efficient cache usage. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
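For reference, the Linux knob under discussion is the SHM_HUGETLB flag to shmget(); the "enable it when available" behavior Tom describes for Solaris could look roughly like this on Linux (a sketch, ours; note the segment size must be rounded up to a multiple of the huge page size):

    #define _GNU_SOURCE
    #include <sys/ipc.h>
    #include <sys/shm.h>

    /* Try to get the segment backed by huge pages, else fall back. */
    static int
    shmget_try_huge(key_t key, size_t size)
    {
        int shmid = shmget(key, size, IPC_CREAT | SHM_HUGETLB | 0600);

        if (shmid < 0)          /* no huge pages configured or available */
            shmid = shmget(key, size, IPC_CREAT | 0600);
        return shmid;
    }

Because fork() must copy the parent's page table entries for the shared segment, 2MB pages instead of 4kB pages shrink that copy by a factor of 512, which is Greg's point about backend startup.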
Re: [HACKERS] pg_upgrade patch application process, and move to /bin?
On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian br...@momjian.us wrote: One more question --- should I be sending pg_upgrade patches to the list for approval? The restructuring patch was large and didn't seem necessary to post, and the speedups were tested by the bug reporter, so I figured those were OK to apply. I think it would be good to do that. At least give people a chance to comment, if they care. Oh, and do we want to move pg_upgrade into /bin for 9.1? There was discussion about that six months ago. I would be inclined to leave it in contrib for a few more releases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote: I don't think it's a big cost once all the processes have been forked if you're reusing them beyond perhaps slightly more efficient cache usage. Hm, this site claims to get a 13% win just from the reduced tlb misses using a preload hack with Pg 8.2. That would be pretty substantial. http://oss.linbit.com/hugetlb/ -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade patch application process, and move to /bin?
On Wed, Oct 20, 2010 at 21:28, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian br...@momjian.us wrote: One more question --- should I be sending pg_upgrade patches to the list for approval? The restructuring patch was large and didn't seem necessary to post, and the speedups were tested by the bug reporter, so I figured those were OK to apply. I think it would be good to do that. At least give people a chance to comment, if they care. +1. It's also a good way for people to get a bit more involved in the code. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade patch application process, and move to /bin?
Magnus Hagander wrote: On Wed, Oct 20, 2010 at 21:28, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian br...@momjian.us wrote: One more question --- should I be sending pg_upgrade patches to the list for approval? The restructuring patch was large and didn't seem necessary to post, and the speedups were tested by the bug reporter, so I figured those were OK to apply. I think it would be good to do that. At least give people a chance to comment, if they care. +1. It's also a good way for people to get a bit more involved in the code. OK. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
Excerpts from Tom Lane's message of lun oct 04 10:31:26 -0400 2010: In the particular case at hand here, I rather wonder why SQL functions are depending on postgres.c at all. It might be better to just duplicate a bit of code to make them independent. pg_parse_and_rewrite would then be dead code and could be deleted. This idea doesn't work, unless pushed a lot further. Attached are two separate patches, extracted from the last patch version posted by Marko (git commit --interactive helped here). The first one does what you suggest above: remove pg_parse_and_rewrite and inline it into the callers. The other patch is the remainder. Read on for the details of the first patch. As for the second patch, which is Marko's original intention anyway, I don't see that it needs to be delayed because of the first one. So while I haven't reviewed it, I think it should be considered separately. The problem with this patch (0001) is that the inlined versions of the code that used to be pg_parse_and_rewrite are still depending on functions in postgres.c. These are pg_parse_query and pg_analyze_and_rewrite. pg_parse_query is just a layer on top of raw_parser. pg_analyze_and_rewrite is a layer on top of parse_analyze plus pg_rewrite_query (also on postgres.c). Now, the only difference between those functions and the ones that underlie them is that they have the bunch of tracing macros and log_parser_stats reporting. So one solution would be to have SQL functions (pg_proc.c and executor/functions.c) call the raw parser.c and analyze.c functions directly, without invoking the tracing/logging code. The other idea would be to move the whole of those functions out of postgres.c and into their own modules, i.e. move pg_parse_query into parser.c and pg_analyze_and_rewrite and pg_rewrite_query into rewriteHandler.c. (This actually requires a bit more effort because we should also move pg_analyze_and_rewrite_params out of postgres.c, following pg_analyze_and_rewrite). Note that pg_analyze_and_rewrite and its params siblings are being called from copy.c, spi.c, optimizer/util/clauses.c, and plancache.c. So it does make certain sense to move them out of postgres.c, if we want to think of postgres.c as a module only concerned with client interaction. The only quarrel I have with this code shuffling is that pg_rewrite_query is being called from exec_parse_message. Since it's now a static function, it would have to stop being static so that it can be called from both places (postgres.c and rewriteHandler.c) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support 0001-Separate-SQL-function-processing-from-postgres.c.patch Description: Binary data 0002-The-remainder-of-Marko-s-patch.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
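To make the first option concrete, what pg_parse_and_rewrite() boils down to once the tracing/logging wrappers are dropped is roughly this (a sketch against the 9.0-era internal APIs; treat the exact signatures as approximate):

    #include "postgres.h"
    #include "parser/analyze.h"
    #include "parser/parser.h"
    #include "rewrite/rewriteHandler.h"

    /* parse, analyze and rewrite a query string, without stats/tracing */
    static List *
    parse_and_rewrite(const char *query_string,
                      Oid *paramTypes, int numParams)
    {
        List       *raw_parsetree_list = raw_parser(query_string);
        List       *querytree_list = NIL;
        ListCell   *lc;

        foreach(lc, raw_parsetree_list)
        {
            /* analyze one raw parse tree, then run the rewriter on it */
            Query *query = parse_analyze((Node *) lfirst(lc), query_string,
                                         paramTypes, numParams);

            querytree_list = list_concat(querytree_list,
                                         QueryRewrite(query));
        }
        return querytree_list;
    }

As the message notes, the only thing the postgres.c wrappers add on top of this is the tracing macros and log_parser_stats reporting.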
Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer
On 20.10.2010 18:44, Marti Raudsepp wrote: Hi pgsql-hackers, Currently contrib/pgcrypto/pgp-pubenc.c contains code like: uint8 algo = pk->algo; if (pk == NULL) ... However, if pk was NULL, then the if() condition would never be reached because the pk->algo dereference would segfault. This patch moves the dereference to below the condition which was the intended behavior. Thanks, applied. Did coccicheck find anything else interesting? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL and HugePage
Excerpts from Greg Stark's message of mié oct 20 16:28:25 -0300 2010: On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote: I don't think it's a big cost once all the processes have been forked if you're reusing them beyond perhaps slightly more efficient cache usage. Hm, this site claims to get a 13% win just from the reduced tlb misses using a preload hack with Pg 8.2. That would be pretty substantial. http://oss.linbit.com/hugetlb/ Wow, is there no other way to get the huge page size other than opening and reading /proc/meminfo? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
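Apparently the practical answer is no; the usual approach is to parse /proc/meminfo, along these lines (a minimal sketch, ours):

    #include <stdio.h>

    /* Return the huge page size in kB, or -1 if it cannot be found. */
    static long
    get_hugepage_size_kb(void)
    {
        FILE   *f = fopen("/proc/meminfo", "r");
        char    line[128];
        long    kb = -1;

        if (f == NULL)
            return -1;
        while (fgets(line, sizeof(line), f))
        {
            /* the kernel prints e.g. "Hugepagesize:    2048 kB" */
            if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
                break;
        }
        fclose(f);
        return kb;
    }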
Re: [HACKERS] max_wal_senders must die
On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote: Exactly. It doesn't take many 3-7% slowdowns to add up to being 50% or 100% slower, and that sucks. In fact, I'm still not convinced that we were wise to boost default_statistics_target as much as we did. I argued for a smaller boost at the time. Well, we don't want to let ourselves be paralyzed by FUD so it was important to identify specific concerns and then tackle those concerns. Once we identified the worst-case planning cases we profiled them and found that the inflection point of the curve was fairly clearly above 100 but that there were cases where values below 1,000 caused problems. So I'm pretty happy with the evidence-based approach. The problem with being overly conservative is that it gives free rein to the folks who were shouting that we should just set the default to 1,000. They weren't wrong that the 10 was overly conservative and in the absence of evidence 1,000 was just as reasonable. Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table, e.g. given N rows: The number of buckets needed isn't related to the population size -- it's related to how wide the ranges you'll be estimating selectivity for are. That is, with our current code, if you're selecting tuples within a range a..b and that range happens to be the same size as the bucket size then you'll get an accurate estimate with a fixed 95th percentile precision regardless of the size of the table (to an approximation). I'm not sure how our selectivity works at all for the degenerate case of selecting for specific values. I don't understand how histograms are useful for such estimates at all. I think the MCV lists are basically an attempt to overcome this problem and as you point out I'm not sure the statistics target is really the right thing to control them -- but since I don't think there's any real statistics behind them I'm not sure there's any right way to control them. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
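To spell out the bucket arithmetic behind Greg's point: with a statistics target of t, ANALYZE keeps on the order of t equal-frequency histogram buckets, so each bucket covers roughly a fraction 1/t of the (non-MCV) rows. A range predicate spanning k whole buckets is estimated at selectivity about k/t, and the error is dominated by the two partial buckets at the ends -- roughly 2/t at worst -- independent of how many rows the table has. That is why a fixed sample size can deliver fixed per-bucket precision at any table size.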
[HACKERS] pg_upgrade cleanup
I have applied the attached patch to clean up pg_upgrade cache lookup code and remove useless NULL pointer tests. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 257f5a8..67528b0 100644
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
***************
*** 74,81 ****
  		num_maps++;
  
  		/*
! 		 * so much for the mapping of this relation. Now we need a mapping for
! 		 * its corresponding toast relation if any.
  		 */
  		if (oldrel->toastrelid > 0)
  		{
--- 74,81 ----
  		num_maps++;
  
  		/*
! 		 * So much for mapping this relation; now we need a mapping
! 		 * for its corresponding toast relation, if any.
  		 */
  		if (oldrel->toastrelid > 0)
  		{
***************
*** 117,122 ****
--- 117,123 ----
  							 newrel->reloid);
  
  		/* look them up in their respective arrays */
+ 		/* we lose our cache location here */
  		old_toast = relarr_lookup_rel(&old_db->rel_arr, "pg_toast",
  									  old_name, CLUSTER_OLD);
  		new_toast = relarr_lookup_rel(&new_db->rel_arr,
***************
*** 385,391 ****
  
  	relarr->rels = relinfos;
  	relarr->nrels = num_rels;
! 	relarr->cache_name_rel = 0;
  }
  
  
--- 386,392 ----
  
  	relarr->rels = relinfos;
  	relarr->nrels = num_rels;
! 	relarr->last_relname_lookup = 0;
  }
  
  
***************
*** 399,407 ****
  {
  	int			dbnum;
  
- 	if (!db_arr || !db_name)
- 		return NULL;
- 
  	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
  	{
  		if (strcmp(db_arr->dbs[dbnum].db_name, db_name) == 0)
--- 400,405 ----
***************
*** 424,439 ****
  {
  	int			relnum;
  
- 	if (!rel_arr || !relname)
- 		return NULL;
- 
  	/* Test next lookup first, for speed */
! 	if (rel_arr->cache_name_rel + 1 < rel_arr->nrels &&
! 		strcmp(rel_arr->rels[rel_arr->cache_name_rel + 1].nspname, nspname) == 0 &&
! 		strcmp(rel_arr->rels[rel_arr->cache_name_rel + 1].relname, relname) == 0)
  	{
! 		rel_arr->cache_name_rel++;
! 		return &rel_arr->rels[rel_arr->cache_name_rel];
  	}
  
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
--- 422,434 ----
  {
  	int			relnum;
  
  	/* Test next lookup first, for speed */
! 	if (rel_arr->last_relname_lookup + 1 < rel_arr->nrels &&
! 		strcmp(rel_arr->rels[rel_arr->last_relname_lookup + 1].nspname, nspname) == 0 &&
! 		strcmp(rel_arr->rels[rel_arr->last_relname_lookup + 1].relname, relname) == 0)
  	{
! 		rel_arr->last_relname_lookup++;
! 		return &rel_arr->rels[rel_arr->last_relname_lookup];
  	}
  
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
***************
*** 441,447 ****
  		if (strcmp(rel_arr->rels[relnum].nspname, nspname) == 0 &&
  			strcmp(rel_arr->rels[relnum].relname, relname) == 0)
  		{
! 			rel_arr->cache_name_rel = relnum;
  			return &rel_arr->rels[relnum];
  		}
  	}
--- 436,442 ----
  		if (strcmp(rel_arr->rels[relnum].nspname, nspname) == 0 &&
  			strcmp(rel_arr->rels[relnum].relname, relname) == 0)
  		{
! 			rel_arr->last_relname_lookup = relnum;
  			return &rel_arr->rels[relnum];
  		}
  	}
***************
*** 464,472 ****
  {
  	int			relnum;
  
- 	if (!rel_arr || !oid)
- 		return NULL;
- 
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
  	{
  		if (rel_arr->rels[relnum].reloid == oid)
--- 459,464 ----
***************
*** 483,489 ****
  {
  	pg_free(rel_arr->rels);
  	rel_arr->nrels = 0;
! 	rel_arr->cache_name_rel = 0;
  }
  
  
--- 475,481 ----
  {
  	pg_free(rel_arr->rels);
  	rel_arr->nrels = 0;
! 	rel_arr->last_relname_lookup = 0;
  }
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
index e40f58d..439c0a7 100644
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 78,84 ****
  {
  	RelInfo    *rels;
  	int			nrels;
! 	int			cache_name_rel; /* cache of last lookup location */
  } RelInfoArr;
  
  /*
--- 78,84 ----
  {
  	RelInfo    *rels;
  	int			nrels;
! 	int			last_relname_lookup;	/* cache of last lookup location */
  } RelInfoArr;
  
  /*

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark gsst...@mit.edu wrote: On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote: Exactly. It doesn't take many 3-7% slowdowns to add up to being 50% or 100% slower, and that sucks. In fact, I'm still not convinced that we were wise to boost default_statistics_target as much as we did. I argued for a smaller boost at the time. Well we don't want to let ourselves be paralyzed by FUD so it was important to identify specific concerns and then tackle those concerns. Once we identified the worst-case planning cases we profiled them and found that the inflection point of the curve was fairly clearly above 100 but that there were cases where values below 1,000 caused problems. So I'm pretty happy with the evidence-based approach. The inflection point of the curve was certainly a good thing for us to look at but the fact remains that we took a hit on a trivial benchmark, and we can't afford to take too many of those. Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table, e.g. given N rows: The number of buckets needed isn't related to the population size -- it's related to how wide the ranges you'll be estimating selectivity for are. That is, with our current code, if you're selecting tuples within a range a..b and that range happens to be the same size as the bucket size then you'll get an accurate estimate with a fixed 95th percentile precision regardless of the size of the table (to an approximation). If you have a WHERE clause of the form WHERE x some_constant, then the effects vary depending on how that constant is chosen. If it's the median value, then as you say the statistics target doesn't matter much at all; but that's not necessarily representative of real life. For example, suppose x is a date and the constant is Monday of the current week. As the table grows, the present week's data becomes a smaller and smaller fraction of the table data. When it gets to be a tiny fraction of the very last histogram bucket, the estimates start to get progressively worse. At some point you have to give up and partition the table for other reasons anyway, but having to do it because the statistics are off is inexcusable. We've seen people hit this precise issue on -performance a few times. I'm not sure how our selectivity works at all for the degenerate case of selecting for specific values. I don't understand how histograms are useful for such estimates at all. I think the MCV lists are basically an attempt to overcome this problem and as you point out I'm not sure the statistics target is really the right thign to control them -- but since I don't think there's any real statistics behind them I'm not sure there's any right way to control them. If you have a WHERE clause of the form WHERE x = some_constant, then you get a much better estimate if some_constant is an MCV. If the constant is not an MCV, however, you still get better estimates, because the estimation code knows that no non-MCV can occur more frequently than any MCV, so increasing the number of MCVs pushes those estimates closer to reality. It is especially bad when the frequency falls off a cliff at a certain point in the distribution e.g. if there are 243 values that occur much more frequently than any others, a stats target of 250 will do much better than 225. But even if that's not an issue, it still helps. 
The bottom line here is that I can't remember any message, ever, on -performance, or any incident within my personal experience, where it was necessary to increase the statistics target beyond 50-100 on a table with 10K rows. However, there are certainly cases where we've recommended that for big tables, which means there are also people out there who have a performance problem on a big table but haven't asked for help and therefore haven't gotten that advice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
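For readers hitting the big-table cases described above, note that the target can already be raised per column instead of globally, which confines the extra ANALYZE and planning cost to the problem column (table and column names here are illustrative):

    -- raise the MCV/histogram size for one skewed column only
    ALTER TABLE orders ALTER COLUMN order_date SET STATISTICS 500;
    ANALYZE orders;

    -- inspect what ANALYZE stored
    SELECT most_common_vals, histogram_bounds
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'order_date';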
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 3:47 PM, daveg da...@sonic.net wrote: On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote: On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote: I don't think it's a big cost once all the processes have been forked if you're reusing them beyond perhaps slightly more efficient cache usage. Hm, this site claims to get a 13% win just from the reduced tlb misses using a preload hack with Pg 8.2. That would be pretty substantial. http://oss.linbit.com/hugetlb/ That was my motivation in trying a patch. TLB misses can be a substantial overhead. I'm not current on the state of play, but working at Sun's benchmark lab on a DB TPC-B benchmark for the first generation of MP systems, something like 30% of all bus traffic was TLB misses. The next iteration of the hardware had a much larger TLB. I have a client with 512GB memory systems, currently with 128GB configured as postgresql buffer cache. Which is 32M TLB entries trying to fit in the few dozen cpu TLB slots. I suspect there may be some contention. I'll benchmark of course. Do you mean 128GB shared buffers, or shared buffers + OS cache? I think that the general wisdom is that performance tails off beyond 8-10GB of shared buffers anyway, so a performance improvement on 128GB shared buffers might not mean much unless you can also show that 128GB shared buffers actually performs better than some smaller amount. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Wed, Oct 20, 2010 at 8:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's pointless to speculate about whether we might have divvied up the meta-information about tables differently if we'd foreseen wanting to do this. It is what it is, and there is *way* too much code depending on it, both inside the backend and in clients. Any reimplementation of temp tables will still have to expose largely the same catalog information that exists for tables now. We can probably get away with marginal changes like redefining relfilenode, but we can't avoid providing catalog entries that describe the schema and statistics of a temp table. I agree about the schema -- that's the whole point of the catalog tables. I felt like the statistics were pretty marginal to begin with. There may be a large number of places but there's no complex structure of relationships to other tables or complex data structures going on here. Surely they can all be coded to look up the relpages from somewhere else just as easily? But I'm not about to start working on this area so my judgement on how much work that would be isn't very important here. And your point that if we have a complete local copy of the entire catalog schema then we can create temporary tables from whole cloth on a read-only database just as easily is attractive. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: extensible enums
On Wed, Oct 20, 2010 at 3:16 PM, David Fetter da...@fetter.org wrote: On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote: On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan and...@dunslane.net wrote: Well a bit more testing shows some benefit. I've sorted out a few kinks, so this seems to work. In particular, with the above tables, the version imported from 9.0 can have an index created in about the same time as on the fresh table (identical data, but all even numbered Oids). Of course, with lots of odd numbered Oids, if a label gets added the imported version will degrade in performance much more quickly. I'm quite impressed by the amount of time and thought being put into optimizing this. I didn't realize people cared so much about enum performance; but it's good that they do. I hope to see more such efforts in other parts of the system. Which parts of the system, in particular, do you have in mind? Other people from EDB have mentioned that slimming down the on-disk representation was one such target. What other ones would you see as needing such attention? On-disk footprint. WAL volume. COPY speed. Checkpoint I/O. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer
On Wed, Oct 20, 2010 at 22:34, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Did coccicheck find anything else interesting? There's a file descriptor leak in psql/command.c function process_file() -- on errors it just returns without closing the file. But since it's quitting anyway, there's no practical impact. Should I submit a patch for this as well? Then there are a few more cases found by null_ref (same check as the original patch). But on closer inspection, these are false positives, because the variable is actually modified in between dereferencing and the NULL check. Then there's the 'badzero' check that finds a dozen cases where pointers are compared to a literal 0, not a NULL. This is a only a coding style check, as far as I can tell, so I thought it's not worth it. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Greg Stark gsst...@mit.edu writes: On Wed, Oct 20, 2010 at 8:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's pointless to speculate about whether we might have divvied up the meta-information about tables differently if we'd foreseen wanting to do this. It is what it is, and there is *way* too much code depending on it, both inside the backend and in clients. Any reimplementation of temp tables will still have to expose largely the same catalog information that exists for tables now. We can probably get away with marginal changes like redefining relfilenode, but we can't avoid providing catalog entries that describe the schema and statistics of a temp table. I agree about the schema -- that's the whole point of the catalog tables. I felt like the statistics were pretty marginal to begin with. I'm thinking more of pg_statistic than the stuff in pg_class --- I agree that we could probably kluge some other approach for relpages and reltuples, but that doesn't scale to the real statistics. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
On Wed, Oct 20, 2010 at 1:12 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark gsst...@mit.edu wrote: On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote: Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table, e.g. given N rows: The number of buckets needed isn't related to the population size -- it's related to how wide the ranges you'll be estimating selectivity for are. As the table grows, the present week's data becomes a smaller and smaller fraction of the table data. That's an interesting point. I wonder if we can expose this in some way that allows users to specify the statistics target in something more meaningful for them that doesn't change as the ranges in the table grow. Or even gather stats on the size of the ranges being queried. If you have a WHERE clause of the form WHERE x = some_constant, then you get a much better estimate if some_constant is an MCV. If the constant is not an MCV, however, you still get better estimates, because the estimation code knows that no non-MCV can occur more frequently than any MCV, so increasing the number of MCVs pushes those estimates closer to reality. It is especially bad when the frequency falls off a cliff at a certain point in the distribution e.g. if there are 243 values that occur much more frequently than any others, a stats target of 250 will do much better than 225. It sounds like what we really need here is some way to characterize the distribution of frequencies. Instead of just computing an upper bound we should have a kind of histogram showing how many values occur precisely once, how many occur twice, three times, etc. Or perhaps we only need to know the most common frequency per bucket. Or, hm... -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer
Marti Raudsepp ma...@juffo.org writes: There's a file descriptor leak in psql/command.c function process_file() -- on errors it just returns without closing the file. But since it's quitting anyway, there's no practical impact. Should I submit a patch for this as well? Might as well. It's the kind of thing that could turn into a real bug given some rearrangement of the code. Then there's the 'badzero' check that finds a dozen cases where pointers are compared to a literal 0, not a NULL. This is a only a coding style check, as far as I can tell, so I thought it's not worth it. I'd be in favor of fixing those too. I have no particular problem with either if (ptr) or if (ptr != NULL), but I think that if (ptr != 0) gives the reader entirely the wrong impression about the datatype of ptr. Just because C fails to distinguish doesn't make it good style. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
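A trivial illustration of the style point (ours):

    #include <stddef.h>

    static int
    has_value(const char *ptr)
    {
        /* if (ptr != 0)   -- legal C, but reads as an integer test  */
        /* if (ptr)        -- fine: idiomatic shorthand              */
        return ptr != NULL; /* fine: unambiguously a pointer test */
    }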
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote: On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote: I don't think it's a big cost once all the processes have been forked if you're reusing them beyond perhaps slightly more efficient cache usage. Hm, this site claims to get a 13% win just from the reduced tlb misses using a preload hack with Pg 8.2. That would be pretty substantial. http://oss.linbit.com/hugetlb/ That was my motivation in trying a patch. TLB misses can be a substantial overhead. I'm not current on the state of play, but working at Sun's benchmark lab on a DB TPC-B benchmark for the first generation of MP systems, something like 30% of all bus traffic was TLB misses. The next iteration of the hardware had a much larger TLB. I have a client with 512GB memory systems, currently with 128GB configured as postgresql buffer cache. Which is 32M TLB entries trying to fit in the few dozen cpu TLB slots. I suspect there may be some contention. I'll benchmark of course. -dg -- David Gould da...@sonic.net 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Tom Lane wrote: Aidan Van Dyk ai...@highrise.ca writes: If we're looking to have any sort of out of catalog documentation of table storage format, what about just having a new relation fork that just appends each and every change made to the table formats, including ones rolled back, etc. I presume that means that all table changes are appended to a single central file per database? That would be a bad idea, because in the typical problem scenario, losing this new catalogue basically creates the same problem. It would be preferable to keep the information tied in with the actual table(file) it concerns. [ thinks for a bit... ] Perhaps we could stick some sort of unique ID into tables, which could be correlated to the same unique ID appearing in a metadata fork. Ideal would be: put the table-oid inside the header of each page (either in the official header, or in the special area). This way even lost blocks can be correlated to the same table. I'd still vote for the latest known table definition in the first page. It's by no means perfect, but it will help 99% of all salvage attempts by an order of magnitude. [ thinks some more... ] Of course, at this point you have to start asking why the metadata fork should be thought to be any more reliable than the system catalogs. Quite. Which is why I wanted the best-effort latest version of the table description in the first page of the tablefile instead. Make this relation fork append only, and dump a completely new set of metadata to it each and every ALTER TABLE. You can bet that there'll be somebody out there who whines because their app does lots and lots of repeated ALTER TABLEs, and so the metadata fork grows forever. I think we'd have to be a bit smarter than this. Which means we come full circle and have to conclude that doing anything comprehensive is too invasive for normal operations; best-effort is all a forensics operation wants or can hope for. -- Stephen. Life is that brief interlude between nothingness and eternity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP
Hi list,

Here's the second patch from my coccicheck run. Originally it flagged the fact that the opened file in psql's process_file() wasn't being closed in the ON_ERROR_STOP path, but there seem to be two more unintended behaviors here.

(1) In the error path, the value of pset.inputfile wasn't being properly restored. The caller does free(fname) on line 786, so pset.inputfile would point to unallocated memory.

(2) The more significant issue is that stdin *was* closed in the success return path. So when you run a script with two \i - lines, the first \q would close stdin and the next one would fail with:

    psql:-:0: could not read from input file: Bad file descriptor

In fact, this means that stdin was being accessed after being fclose()d, which is undefined behavior per ANSI C, though it seems to work just fine on Linux. The new behavior requires the same number of \qs as the number of executions of '-' because stdin is never closed.

Regards, Marti

From 43b7595fdcc69cc9db0d066a53f53c5e71c965aa Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Wed, 20 Oct 2010 23:44:36 +0300
Subject: [PATCH] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP

Changes three things:
1. Don't close stdin when returning success if input was stdin
2. Don't leak file descriptor when exiting due to ON_ERROR_STOP
3. pset.inputfile wasn't being restored with ON_ERROR_STOP, yet the memory was freed by the caller

(1) changes the behavior of \q on stdin. Previously multiple inclusions of stdin would be terminated with a single quit, now a separate quit is needed for each invocation. Previous behavior also accessed stdin after it was fclose()d, which is undefined behavior per ANSI C.

(2) and (3) should have no practical impact, because the process would quit immediately afterwards anyway.
---
 src/bin/psql/command.c |   15 +++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index e6d703a..45a145c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1987,7 +1987,10 @@ process_file(char *filename, bool single_txn)
 		if ((res = PSQLexec("BEGIN", false)) == NULL)
 		{
 			if (pset.on_error_stop)
-				return EXIT_USER;
+			{
+				result = EXIT_USER;
+				goto error;
+			}
 		}
 		else
 			PQclear(res);
@@ -2000,13 +2003,19 @@ process_file(char *filename, bool single_txn)
 		if ((res = PSQLexec("COMMIT", false)) == NULL)
 		{
 			if (pset.on_error_stop)
-				return EXIT_USER;
+			{
+				result = EXIT_USER;
+				goto error;
+			}
 		}
 		else
 			PQclear(res);
 	}
 
-	fclose(fd);
+error:
+	if (fd != stdin)
+		fclose(fd);
+
 	pset.inputfile = oldfilename;
 	return result;
 }
-- 
1.7.3.1

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes: That is simply a horrid idea. Just make it specify EXTENSION.

And VERSION too, finally. So any extension now provides both, and guessing the CONTROL file name from the EXTENSION name only occurs when CONTROL has not been provided.

Here, on the other hand, I'm wondering why have two variables at all. Is there any sane use-case for the control file to not be named the same as the extension? It seems like that would accomplish little except to sow confusion.

regards, tom lane

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
Excerpts from Alvaro Herrera's message of mié oct 20 16:33:12 -0300 2010: The only quarrel I have with this code shuffling is that pg_rewrite_query is being called from exec_parse_message. Since it's now a static function, it would have to stop being static so that it can be called from both places (postgres.c and rewriteHandler.c) Actually, I just noticed that the remainder patch uses pg_plan_query, which is also in postgres.c. This function along with its sibling pg_plan_queries is also called from other internal places, like the PREPARE code, SPI and the plan cache. It strikes me that if we really want to restructure things to divide client interaction from other query-processing routines, we should create another file, say src/backend/tcop/queries.c; this would have stuff like pg_plan_query, pg_plan_queries, pg_rewrite_query, and the other things that the patch was evicting from postgres.c (plus, I imagine, a bunch of other stuff that I may be missing). In fact, if we go down this route, there would be no point in removing pg_parse_and_rewrite; we would just move it to this new module. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed Windows-specific change: Enable crash dumps (like core files)
Craig Ringer wrote: On 4/10/2010 8:06 PM, Andrew Dunstan wrote: On 10/04/2010 07:50 AM, Craig Ringer wrote: - If the crash dump handler is enabled by setting the GUC, all backends register the handler during startup or (if it proves practical) when the GUC is changed. - When the handler is triggered by the OS trapping an unhandled exception, it loads dbghelp.dll, writes the appropriate dump format to the hardcoded path, and terminates the process. What is the performance impact of doing that? Specifically, how does it affect backend startup time? Without testing I can't say for sure. My expectation based on how the handler works would be: near-zero, about as expensive as registering a signal handler, plus the cost of reading the GUC and doing one string compare to test the value. When disabled, it's just the GUC test. Is there a better mechanism to use for features that're going to be unused the great majority of the time? Perhaps something that does require a server restart, but doesn't have any cost at all when disabled? We definitely had trouble producing crash dumps caused by the 128 return code problem, so I can see great value in this, if it can be done simply. I wonder if the 128-exit would have produced a crash file. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
Maybe what should be done about this is to have separate sizes for the MCV list and the histogram, where the MCV list is automatically sized during ANALYZE. It's been suggested multiple times that we should base our sample size on a % of the table, or at least offer that as an option. I've pointed out (with math, which Simon wrote a prototype for) that doing block-based sampling instead of random-row sampling would allow us to collect, say, 2% of a very large table without more I/O than we're doing now. Nathan Boley has also shown that we could get tremendously better estimates without additional sampling if our statistics collector recognized common patterns such as normal, linear and geometric distributions. Right now our whole stats system assumes a completely random distribution. So, I think we could easily be quite a bit smarter than just increasing the MCV. Although that might be a nice start. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
Alvaro Herrera alvhe...@commandprompt.com writes: It strikes me that if we really want to restructure things to divide client interaction from other query-processing routines, we should create another file, say src/backend/tcop/queries.c; this would have stuff like pg_plan_query, pg_plan_queries, pg_rewrite_query, and the other things that the patch was evicting from postgres.c (plus, I imagine, a bunch of other stuff that I may be missing). In fact, if we go down this route, there would be no point in removing pg_parse_and_rewrite; we would just move it to this new module. Yeah, possibly that would be a good idea. To my mind, the first thing that has to be resolved before messing around in this area is whether or not we want the logging/statistics behavior of these functions to apply when they are used in contexts other than interactive queries. Personally I would vote no, mainly because I don't think that behavior is very sensible in nested execution. If that is the decision, then probably these functions should stay where they are and as they are, and we just deprecate outside use of them. I'm not sure whether there's enough left after deleting the logging/statistics behavior to justify making exported versions, as opposed to just having the callers call the next-layer-down functionality directly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Quite. Josh, have you got any evidence showing that the penalty is only 10%? There are cases, such as COPY and ALTER TABLE, where you'd be looking at 2X or worse penalties, because of the existing optimizations that avoid writing WAL at all for operations where a single final fsync can serve the purpose. I'm not sure what the penalty for typical workloads is, partly because I'm not sure what should be considered a typical workload for this purpose. If we could agree on some workloads, I could run some benchmarks. I'm not sure what those would be though, given that COPY and ALTER TABLE aren't generally included in most benchmarks. I could see how everything else is affected, though. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
On Wed, Oct 20, 2010 at 5:30 PM, Stephen R. van den Berg s...@cuci.nl wrote: Ideal would be: put the table-oid inside the header of each page (either in the official header, or in the special area). This way even lost blocks can be correlated to the same table. I'd still vote for the latest known table definition in the first page. It's by no means perfect, but it will help 99% of all salvage attempts by an order of magnitude. I don't think we should shrink the amount of usable space by 4 bytes per block and break on-disk compatibility just to make it easier to reconstruct corrupted tables. I'm pretty dubious about the proposal to stuff an otherwise-useless metapage in every heap, too. If you have many small tables, you just doubled your disk utilization - worse than that, maybe, if some of them are empty. If we needed a metapage anyway and had extra space to play with, stuffing some useful forensic information in there might be worthwhile, but I have a hard time thinking that forensics alone is a sufficient justification for such a change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
On Wed, Oct 20, 2010 at 3:15 PM, Josh Berkus j...@agliodbs.com wrote:
> Maybe what should be done about this is to have separate sizes for the MCV list and the histogram, where the MCV list is automatically sized during ANALYZE. It's been suggested multiple times that we should base our sample size on a % of the table, or at least offer that as an option.

Why? Afaict this has been suggested multiple times by people who don't justify it in any way except with a handwavy "larger samples are better". The sample size is picked based on what sampling statistics tells us we need to achieve a given 95% confidence interval for the given bucket size. Robert pointed out one reason we would want smaller buckets for larger tables, but nobody has explained why we would want smaller confidence intervals for the same size buckets. That amounts to querying larger tables for the same percentage of the table but wanting more precise estimates than you want for smaller tables.

> I've pointed out (with math, which Simon wrote a prototype for) that doing block-based sampling instead of random-row sampling would allow us to collect, say, 2% of a very large table without more I/O than we're doing now.

Can you explain when this would and wouldn't bias the sample, so that users can decide whether to use it or not?

> Nathan Boley has also shown that we could get tremendously better estimates without additional sampling if our statistics collector recognized common patterns such as normal, linear and geometric distributions. Right now our whole stats system assumes a completely random distribution.

That's interesting, I hadn't seen that.

> So, I think we could easily be quite a bit smarter than just increasing the MCV. Although that might be a nice start.

I think increasing the MCV is too simplistic since we don't really have any basis for any particular value. I think what we need are some statistics nerds to come along and say "here's this nice tool from which you can make the following predictions and understand how increasing or decreasing the data set size affects the accuracy of the predictions."

--
greg
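[Editor's note: the textbook result behind "the sample size is picked based on what sampling statistics tells us", supplied here as context rather than taken from any patch. To estimate a proportion $\hat{p}$ (say, the fraction of rows falling in one histogram bucket) to within error $e$ at 95% confidence, simple random sampling needs roughly

\[
n \;\approx\; \frac{z^2\,\hat{p}\,(1-\hat{p})}{e^2}, \qquad z \approx 1.96,
\]

under the usual normal approximation. For a table much larger than $n$, this does not depend on the table size at all, which is exactly why "sample a fixed % of the table" needs a justification beyond bucket-frequency accuracy.]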
Re: [HACKERS] WIP: extensible enums
On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan and...@dunslane.net wrote:
> Efficiency has always been one of the major reasons for using enums, so it's important that we make them extensible without badly affecting performance.

On that note, is it worthwhile backpatching recent versions to allocate enums with even-numbered oids? That way people binary-upgrading can get the benefit of the optimization they should qualify for...

merlin
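[Editor's note: for anyone who hasn't followed the WIP patch, the optimization works roughly like this - a sketch of my understanding of the design, not the committed code, and the slow-path function name is a placeholder. Values created in order get even OIDs, so comparison can trust raw OID order, falling back to a catalog-derived sort-order lookup only when an odd OID shows a value was added out of order:]

    /* Sketch: enum comparison fast path under the even-OID convention. */
    static int
    enum_cmp_sketch(Oid arg1, Oid arg2)
    {
        if ((arg1 & 1) == 0 && (arg2 & 1) == 0)
        {
            /* Both even: OID order is enum order. */
            if (arg1 == arg2)
                return 0;
            return (arg1 < arg2) ? -1 : 1;
        }
        /* At least one value was added out of order: consult the
         * sort-order cache built from pg_enum (placeholder name). */
        return compare_via_sortorder_cache(arg1, arg2);
    }

Binary-upgraded clusters whose enum values carry odd OIDs would always take the slow path, which is what motivates Merlin's backpatch question.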
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
On Wed, Oct 20, 2010 at 6:38 PM, Greg Stark gsst...@mit.edu wrote:
> On Wed, Oct 20, 2010 at 3:15 PM, Josh Berkus j...@agliodbs.com wrote:
>> Maybe what should be done about this is to have separate sizes for the MCV list and the histogram, where the MCV list is automatically sized during ANALYZE. It's been suggested multiple times that we should base our sample size on a % of the table, or at least offer that as an option.
>
> Why? Afaict this has been suggested multiple times by people who don't justify it in any way except with a handwavy "larger samples are better". The sample size is picked based on what sampling statistics tells us we need to achieve a given 95% confidence interval for the given bucket size. Robert pointed out one reason we would want smaller buckets for larger tables, but nobody has explained why we would want smaller confidence intervals for the same size buckets. That amounts to querying larger tables for the same percentage of the table but wanting more precise estimates than you want for smaller tables.

Yes, I think a percentage of the table is going to break down either at the high end or the low end. Hand-waving (but based on experience), for a 1000 row table a statistics target of 10 is probably approximately right and 100 is too much and 1 is too little. But for a 1,000,000 row table 10,000 is probably too much and even 1,000 is pushing it. So using a constant percentage of table rows doesn't feel right. I had a thought today that it might make sense to use an exponential curve, like min(2 * N^(1/3), 10). I can't really justify that mathematically, but that doesn't mean it won't work well in practice.

>> So, I think we could easily be quite a bit smarter than just increasing the MCV. Although that might be a nice start.
>
> I think increasing the MCV is too simplistic since we don't really have any basis for any particular value. I think what we need are some statistics nerds to come along and say "here's this nice tool from which you can make the following predictions and understand how increasing or decreasing the data set size affects the accuracy of the predictions."

I'm not sure that's realistic, because everything depends on what queries you're running, and you can get arbitrary answers by postulating arbitrary queries. However, this does not make me excited about doing nothing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
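[Editor's note: a throwaway sketch to make the shape of that curve concrete. Read literally, min(2 * N^(1/3), 10) caps the target at 10; for the value to grow sublinearly with N as the surrounding argument intends, the outer function would presumably be a floor rather than a cap. The sketch therefore assumes max - that reading is this sketch's assumption, not something stated in the message.]

    #include <math.h>

    /* Hypothetical auto-sized statistics target: grows with the cube
     * root of the row count, with a floor of 10 (assuming "max" was
     * meant where the prose says "min"). */
    static int
    auto_stats_target(double reltuples)
    {
        double      t = 2.0 * cbrt(reltuples);

        return (t < 10.0) ? 10 : (int) t;
    }

    /* Illustrative values: 1e3 rows -> 20, 1e6 -> 200, 1e9 -> 2000. */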
Re: [HACKERS] WIP: extensible enums
On Wed, Oct 20, 2010 at 6:54 PM, Merlin Moncure mmonc...@gmail.com wrote:
> On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan and...@dunslane.net wrote:
>> Efficiency has always been one of the major reasons for using enums, so it's important that we make them extensible without badly affecting performance.
>
> On that note, is it worthwhile backpatching recent versions to allocate enums with even-numbered oids? That way people binary-upgrading can get the benefit of the optimization they should qualify for...

Uh, -1 from me. This is not a bug fix, and it will only help people who create new enums between the time they upgrade to the relevant minor release and the time they upgrade to 9.1. We are not in the business of back-patching marginal performance enhancements. If we want to have a 9.0R2 release, or whatever, then so be it, but let's not be modifying behavior in stable branches unless there's a *bug*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas robertmh...@gmail.com wrote:
> Yes, I think a percentage of the table is going to break down either at the high end or the low end. Hand-waving (but based on experience), for a 1000 row table a statistics target of 10 is probably approximately right and 100 is too much and 1 is too little. But for a 1,000,000 row table 10,000 is probably too much and even 1,000 is pushing it. So using a constant percentage of table rows doesn't feel right. I had a thought today that it might make sense to use an exponential curve, like min(2 * N^(1/3), 10). I can't really justify that mathematically, but that doesn't mean it won't work well in practice.

Well, we can analyze it, but as you said later, it all depends on what queries you're running. If we want to aim for the same confidence interval at all times, i.e. that the estimated frequency is accurate to within +/- x%, 95% of the time, then:

If we're querying ranges a..b which represent a constant percentage of the table, we need a fixed number of buckets and a sample size that varies very little with respect to the size of the table (effectively constant).

If we're querying ranges a..b which are constant sized, and therefore represent a smaller percentage of the table as it grows, then we need a number of buckets that's proportional to the size of the table. The sample size is proportional to the number of buckets (i.e., it's a constant-sized sample per bucket).

If we're querying for a specific value which isn't one of the most common values, then I'm not clear how to characterize the accuracy or precision of our current estimates, let alone how they would vary if we changed our sample sizes.

If we need to estimate ndistinct, then we clearly need a sample whose size is proportional to the size of the table. And in practice, to get accurate results it has to be a fairly high percentage -- effectively meaning we should read the whole table.

>> I think increasing the MCV is too simplistic since we don't really have any basis for any particular value. I think what we need are some statistics nerds to come along and say "here's this nice tool from which you can make the following predictions and understand how increasing or decreasing the data set size affects the accuracy of the predictions."
>
> I'm not sure that's realistic, because everything depends on what queries you're running, and you can get arbitrary answers by postulating arbitrary queries. However, this does not make me excited about doing nothing.

Well, our planner only needs to answer specific questions. We just need stats capable of answering "how many occurrences of x are there" and "how many values are in the range x..y" for the normal estimation functions. We have the latter, but if there's a stat we're missing for calculating the former more robustly, that would be great. We also need ndistinct, but that's another story.

--
greg
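[Editor's note: summarizing the three sampling regimes above in symbols - my condensation of the argument, with N = table rows, k = histogram buckets, n = sample size:

\[
\begin{aligned}
\text{constant-fraction ranges:}\quad & k = O(1), & n &= O(1) \\
\text{constant-width ranges:}\quad & k \propto N, & n &\propto k \propto N \\
\text{ndistinct estimation:}\quad & & n &\propto N \ \ \text{(in practice, nearly a full scan)}
\end{aligned}
\]

Only the middle regime justifies growing the sample with the table at all, and it asks for linear growth rather than a fixed percentage.]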
Re: [HACKERS] Extensions, this time with a patch
On Oct 20, 2010, at 3:12 PM, Dimitri Fontaine wrote:
> So, the idea is that $(EXTENSION) is a list of extensions you're providing from the Makefile (most often, a list of one extension, but contrib/spi is an exception here). Each extension in the list must have a corresponding $EXTENSION.control file. This control file contains at minimum a single line for the name of the extension, but it's better already with a comment for users. I've been filling them for our extensions, pasting from the documentation:

Might I suggest instead a META.json file like PGXN requires? Here's a simple example:

    {
        "name": "pair",
        "abstract": "A key/value pair data type",
        "version": "0.1.0",
        "maintainer": "David E. Wheeler <da...@justatheory.com>",
        "license": "postgresql"
    }

They can have a lot more information, too. Here's the one I actually shipped with pair:

    http://github.com/theory/kv-pair/blob/master/META.json

The meta spec is here:

    http://github.com/theory/pgxn/wiki/PGXN-Meta-Spec

Anyway, the point is that it might be useful for us to sync on this format. I went with JSON for a few reasons:

* CPAN is switching to it (from YAML)
* It's extremely widespread
* It's useful for ad-hoc REST-style requests
* The format will likely be in 9.1

Thoughts?

BTW, really excited that you're finally getting EXTENSION done, Dim. This is going to be *great* for PostgreSQL developers. I'll have to work it into my talk at West.

https://www.postgresqlconference.org/content/building-and-distributing-postgresql-extensions-without-learning-c

Best,

David
Re: [HACKERS] Issues with Quorum Commit
Tom Lane wrote:
> Greg Smith g...@2ndquadrant.com writes:
>> I don't see this as needing any implementation any more complicated than the usual way such timeouts are handled. Note how long you've been trying to reach the standby. Default to -1 for forever. And if you hit the timeout, mark the standby as degraded and force them to do a proper resync when they disconnect. Once that's done, then they can re-enter sync rep mode again, via the same process a new node would have done so.
>
> Well, actually, that's *considerably* more complicated than just a timeout. How are you going to mark the standby as degraded? The standby can't keep that information, because it's not even connected when the master makes the decision. ISTM that this requires:
>
> 1. a unique identifier for each standby (not just role names that multiple standbys might share);
>
> 2. state on the master associated with each possible standby -- not just the ones currently connected.
>
> Both of those are perhaps possible, but the sense I have of the discussion is that people want to avoid them. Actually, #2 seems rather difficult even if you want it. Presumably you'd like to keep that state in reliable storage, so it survives master crashes. But how are you gonna commit a change to that state, if you just lost every standby (suppose the master's ethernet cable got unplugged)? Looks to me like it has to be reliable non-replicated storage. Leaving aside the question of how reliable it can really be if not replicated, it's still the case that we have no place to put such information given the WAL-is-across-the-whole-cluster design.

I assumed we would have a parameter called sync_rep_failure that would take a command, and the command would be called when communication to the slave was lost. If you restart, it tries again and might call the function again.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
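[Editor's note: a postgresql.conf sketch of Bruce's suggestion. The parameter sync_rep_failure is purely hypothetical - it is what is being proposed in this message, and neither the name nor the behavior exists anywhere:]

    # Hypothetical GUC per the proposal above: command to run when
    # communication with a synchronous standby is lost.  Name and
    # semantics are illustrative only.
    sync_rep_failure = '/usr/local/bin/standby_lost_alert.sh'

On a restart the master would retry the standby and, if contact is lost again, invoke the command again, as described above.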
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
> Why? Afaict this has been suggested multiple times by people who don't justify it in any way except with a handwavy "larger samples are better". The sample size is picked based on what sampling statistics tells us we need to achieve a given 95% confidence interval for the given bucket size.

I also just realized that I confused myself ... we don't really want more MCVs. What we want is more *samples* to derive a small number of MCVs. Right now the # of samples and the # of MCVs are inexorably bound, and they shouldn't be. On larger tables, you're correct that we don't necessarily want more MCVs; we just need more samples to figure out those MCVs accurately.

> Can you explain when this would and wouldn't bias the sample, so that users can decide whether to use it or not?

Sure. There's some good math in various ACM papers for this. The basics are that block-based sampling should be accompanied by an increased sample size, or you are lowering your confidence level. But since block-based sampling allows you to increase your sample size without increasing I/O or RAM usage, you *can* take a larger sample ... a *much* larger sample if you have small rows.

The algorithms for deriving stats from a block-based sample are a bit more complex, because the code needs to determine the level of physical correlation in the blocks sampled and skew the stats based on that. So there would be an increase in CPU time. As a result, we'd probably give some advice like "random sampling for small tables, block-based for large ones".

> I think increasing the MCV is too simplistic since we don't really have any basis for any particular value. I think what we need are some statistics nerds to come along and say "here's this nice tool from which you can make the following predictions and understand how increasing or decreasing the data set size affects the accuracy of the predictions."

Agreed. Nathan?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
On Wed, Oct 20, 2010 at 6:03 PM, Josh Berkus j...@agliodbs.com wrote:
> I also just realized that I confused myself ... we don't really want more MCVs. What we want is more *samples* to derive a small number of MCVs. Right now the # of samples and the # of MCVs are inexorably bound, and they shouldn't be. On larger tables, you're correct that we don't necessarily want more MCVs; we just need more samples to figure out those MCVs accurately.

I don't see why the MCVs would need a particularly large sample size to calculate accurately. Have you done any tests on the accuracy of the MCV list?

Robert explained why having more MCVs might be useful: we use the frequency of the least common MCV as an upper bound on the frequency of any value not in the MCV list. That seems logical, but it's all about the number of MCV entries, not the accuracy of them. And mostly what it tells me is that we need a robust statistical method, and the data structures it requires, for estimating the frequency of a single value.

Binding the length of the MCV list to the size of the histogram is arbitrary, but so would be any other value, and I haven't seen anyone propose any rationale for any particular value. The only rationale I can see is that we probably want it to take roughly the same amount of space as the existing stats -- and that means we probably want it to be roughly the same size.

--
greg
[HACKERS] lazy snapshots?
I had the following idea for an optimization. Feel free to tell me I'm nuts.

Would it be possible to postpone the operation of taking a snapshot until we encounter an "in-doubt" tuple - that is, a tuple whose XMIN or XMAX is committed but not all-visible? It seems to me that there are many transactions that probably never look at any recently-modified data, and the overhead (and contention) of scanning the ProcArray could be avoided for such transactions.

At the time when we currently take a snapshot, we could instead record an estimate of the oldest XID still running; I'll call this value the "threshold XID". Ideally, this would be something we could read from shared memory in O(1) time. Subsequently, when we examine XMIN or XMAX, we may find that it's aborted (in which case we don't need a snapshot to decide what to do), or that the XID we're examining precedes the threshold XID (in which case we don't need a snapshot to decide what to do), or that the XID we're examining is our own (in which case we again don't need a snapshot to decide what to do). If none of those conditions hold, we take a snapshot. (Possibly, we could try rereading the threshold XID from shared memory, because it might have advanced far enough to get us out of the woods.)

It's necessary to convince ourselves not only that this has some performance benefit but that it's actually correct. It's easy to see that, if we never take a snapshot, all the tuple visibility decisions we make will be exactly identical to the ones that we would have made with a snapshot; the choice of snapshot in that case is arbitrary. But if we do eventually take a snapshot, we'll likely make different tuple visibility decisions than we would have made had we taken the snapshot earlier. However, the decisions that we make prior to taking the snapshot will be consistent with the snapshot, and we will certainly see the effects of all transactions that committed before we started. We may also see the effects of some transactions that commit after we started, but that is OK: it is just as if our whole transaction had been started slightly later and then executed more quickly thereafter. It would be bad if we saw the effect of transaction A but not transaction B, where transaction B committed after transaction A, but the way snapshots are taken prevents that regardless of exactly when we do it.

VACUUM can't remove any tuples with committed XMINs unless their XMAX precedes our threshold XID, but I think that's not any worse under this proposal than it is anyway. If we took a full snapshot instead of just writing down a threshold XID, we'd have the same problem.

OK, that's it. Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
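[Editor's note: a compressed sketch of the proposed fast path, for concreteness. TransactionIdDidAbort, TransactionIdPrecedes, and TransactionIdIsCurrentTransactionId are real transam/xact routines; the function itself and the way the threshold XID is obtained are just my rendering of the proposal:]

    /* Return true only when deciding visibility for this XID actually
     * requires materializing a snapshot.  threshold_xid is the cheap
     * O(1) estimate of the oldest XID still running, read from shared
     * memory when the transaction would normally have taken a snapshot. */
    static bool
    xid_needs_snapshot(TransactionId xid, TransactionId threshold_xid)
    {
        if (TransactionIdDidAbort(xid))
            return false;       /* aborted: no snapshot needed */
        if (TransactionIdPrecedes(xid, threshold_xid))
            return false;       /* long since committed: visible regardless */
        if (TransactionIdIsCurrentTransactionId(xid))
            return false;       /* our own work: no snapshot needed */
        return true;            /* in-doubt: take (or re-check) the snapshot */
    }

Only when this returns true, and rereading the threshold from shared memory doesn't help, would the backend pay for the ProcArray scan.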
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
> I don't see why the MCVs would need a particularly large sample size to calculate accurately. Have you done any tests on the accuracy of the MCV list?

Yes, although I don't have them at my fingertips. In sum, though, you can't take 10,000 samples from a 1-billion-row table and expect to get a remotely accurate MCV list.

A while back I did a fair bit of reading on ndistinct and large tables in the academic literature. The consensus of many papers was that it took a sample of at least 3% (or 5% for block-based) of the table in order to estimate ndistinct to within a factor of 3 with 95% confidence. I can't imagine that MCV is easier than this.

> And mostly what it tells me is that we need a robust statistical method, and the data structures it requires, for estimating the frequency of a single value.

Agreed.

> Binding the length of the MCV list to the size of the histogram is arbitrary, but so would be any other value...

histogram size != sample size. It is in our code, but that's a bug and not a feature.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
On Wed, Oct 20, 2010 at 7:13 PM, Greg Stark gsst...@mit.edu wrote:
> On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas robertmh...@gmail.com wrote:
>> Yes, I think a percentage of the table is going to break down either at the high end or the low end. Hand-waving (but based on experience), for a 1000 row table a statistics target of 10 is probably approximately right and 100 is too much and 1 is too little. But for a 1,000,000 row table 10,000 is probably too much and even 1,000 is pushing it. So using a constant percentage of table rows doesn't feel right. I had a thought today that it might make sense to use an exponential curve, like min(2 * N^(1/3), 10). I can't really justify that mathematically, but that doesn't mean it won't work well in practice.
>
> Well, we can analyze it, but as you said later, it all depends on what queries you're running. If we want to aim for the same confidence interval at all times, i.e. that the estimated frequency is accurate to within +/- x%, 95% of the time, then:
>
> If we're querying ranges a..b which represent a constant percentage of the table, we need a fixed number of buckets and a sample size that varies very little with respect to the size of the table (effectively constant).
>
> If we're querying ranges a..b which are constant sized, and therefore represent a smaller percentage of the table as it grows, then we need a number of buckets that's proportional to the size of the table. The sample size is proportional to the number of buckets (i.e., it's a constant-sized sample per bucket).
>
> If we're querying for a specific value which isn't one of the most common values, then I'm not clear how to characterize the accuracy or precision of our current estimates, let alone how they would vary if we changed our sample sizes.

I think that sums it up pretty well. There's no one right formula. I think this problem needs an empirical approach rather than a statistical analysis. We know that it's impractical for the stats target to be linear in the table size. We also know that constant values are excessive for small tables and sometimes inadequate for large ones. Therefore, we should pick something that grows, but sublinearly. Discuss. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
> Robert explained why having more MCVs might be useful: we use the frequency of the least common MCV as an upper bound on the frequency of any value not in the MCV list.

Where is that being used? The only non-MCV frequency estimate that I recall seeing is ( nrows - n_ndistinct_rows )/ndistinct. Obviously changing the number of MCVs affects this by lowering n_ndistinct_rows, but it's always a pretty coarse estimate.

> Binding the length of the MCV list to the size of the histogram is arbitrary, but so would be any other value...

Wouldn't the best approach be to stop adding MCVs/histogram buckets when adding new ones doesn't decrease your prediction error "substantially"?

One very hacky threshold heuristic is to stop adding MCVs when a simple equality select ( SELECT col FROM table WHERE col = VALUE ) would switch the plan from an index to a sequential scan (or vice versa, although with the current code this would never happen). That is, if the non-MCV frequency estimate is 0.1% (producing an index scan), but adding the MCV gives us an estimate of 5% (probably producing a seq scan), then add that value as an MCV.

More sophisticated variations might also consider plan changes to very suboptimal joins; even more sophisticated would be to stop when max( (current plan cost - optimal plan cost) / optimal plan cost ) was below some threshold, say 20%, over a bunch of recently executed queries. A similar approach would work for histogram bins, except the queries of interest are inequality rather than equality selections.

-Nathan
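[Editor's note: a sketch of that stopping rule as ANALYZE-time pseudocode. Every name here is invented for illustration; nothing resembling this exists in analyze.c, and the plan-flip frequency would in reality fall out of the costing GUCs rather than being a single constant:]

    typedef struct McvCandidate
    {
        double      freq;       /* sampled frequency, most common first */
    } McvCandidate;

    /* Keep a candidate in the MCV list only while knowing its real
     * frequency (vs. the generic non-MCV estimate) would flip the plan
     * choice for a simple equality probe. */
    static int
    choose_num_mcvs(const McvCandidate *cand, int ncand,
                    double non_mcv_freq, double seqscan_breakeven_freq)
    {
        int         nmcv;

        for (nmcv = 0; nmcv < ncand; nmcv++)
        {
            bool    plan_without = (non_mcv_freq    >= seqscan_breakeven_freq);
            bool    plan_with    = (cand[nmcv].freq >= seqscan_breakeven_freq);

            if (plan_with == plan_without)
                break;          /* the entry no longer changes any decision */
        }
        return nmcv;
    }

As Robert notes in a follow-up below, where that breakeven point actually sits depends on a boat-load of GUCs, which is the practical obstacle to this heuristic.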
Re: [HACKERS] max_wal_senders must die
On Wed, Oct 20, 2010 at 6:17 PM, Josh Berkus j...@agliodbs.com wrote:
>> Quite. Josh, have you got any evidence showing that the penalty is only 10%? There are cases, such as COPY and ALTER TABLE, where you'd be looking at 2X or worse penalties, because of the existing optimizations that avoid writing WAL at all for operations where a single final fsync can serve the purpose.
>
> I'm not sure what the penalty for typical workloads is, partly because I'm not sure what should be considered a typical workload for this purpose. If we could agree on some workloads, I could run some benchmarks. I'm not sure what those would be, though, given that COPY and ALTER TABLE aren't generally included in most benchmarks. I could see how everything else is affected, though.

I think this whole thing is a complete non-starter. Are we seriously talking about shipping a configuration that will slow down COPY by 2X or more, just so that someone who wants replication can do it by changing one fewer parameter? I find it impossible to believe that's a good decision, and IMHO we should be focusing on how to make the parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us most of the same benefits without throwing away hard-won performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
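[Editor's note: for readers unfamiliar with GUC contexts, the change under discussion is to the context field of the parameter's entry in src/backend/utils/misc/guc.c, shown schematically below. The entry is paraphrased from memory, not a verified diff, and actually permitting reload-time changes would also require the shared-memory WAL-sender slot bookkeeping to tolerate resizing:]

    /* In ConfigureNamesInt[] in guc.c -- schematic, not a verified diff */
    {
        {"max_wal_senders", PGC_SIGHUP /* today: PGC_POSTMASTER */,
         WAL_REPLICATION,
         gettext_noop("Sets the maximum number of simultaneously running "
                      "WAL sender processes."),
         NULL
        },
        &max_wal_senders,
        0, 0, INT_MAX,
        NULL, NULL
    },

PGC_POSTMASTER parameters can only change at server start; PGC_SIGHUP parameters can be edited in postgresql.conf and picked up by a reload.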
Re: [HACKERS] Extensions, this time with a patch
On Thu, Oct 21, 2010 at 7:12 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> This control file contains at minimum a single line for the name of the extension, but it's better already with a comment for users. I've been filling them for our extensions, pasting from the documentation:
>
>      name      | version
> ---------------+----------
>  fuzzystrmatch | 9.1devel
>  hstore        | 9.1

Why does only hstore have version '9.1'? All other modules have '9.1devel'.

> If you provide a $(VERSION) variable, then a line in the control file is automatically added at make install (version = '$(VERSION)'), in order to make life easier for extension authors.

In v7, a line of version = '...' is added at make install, and removed at make clean. Also, if we run make install multiple times, version lines are added repeatedly. I don't think these are good ideas; we should not modify source files stored in the git repo when we build them.

How about having *.control.in files and replacing magic keywords in them at make time? make install won't modify files at all, and make clean just removes the generated *.control. It is the way we're using for *.sql.in and MODULE_PATHNAME.

> Some extensions are missing here because they fail to compile on my workstation where all the libs aren't installed --- ossp, xml2, etc

I found xml2/pgxml.control should have 'pgxml' for the name.

--
Itagaki Takahiro
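[Editor's note: a sketch of the suggested mechanism, modeled on the existing MODULE_PATHNAME substitution for *.sql.in files. The *.control.in rule and the EXTVERSION placeholder are hypothetical, imitating the real *.sql.in pattern from the PGXS makefiles:]

    # Real, existing pattern for *.sql.in (roughly as in the PGXS makefiles):
    #   %.sql: %.sql.in
    #           sed 's,MODULE_PATHNAME,$$libdir/$*,g' $< >$@

    # Hypothetical analogue for *.control.in:
    %.control: %.control.in
            sed 's,EXTVERSION,$(VERSION),g' $< >$@

    # "make clean" would then simply remove the generated *.control files,
    # leaving nothing in the source tree modified.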
Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die
On Wed, Oct 20, 2010 at 9:53 PM, Nathan Boley npbo...@gmail.com wrote:
>> Robert explained why having more MCVs might be useful: we use the frequency of the least common MCV as an upper bound on the frequency of any value not in the MCV list.
>
> Where is that being used?

var_eq_const.

> The only non-MCV frequency estimate that I recall seeing is ( nrows - n_ndistinct_rows )/ndistinct. Obviously changing the number of MCVs affects this by lowering n_ndistinct_rows, but it's always a pretty coarse estimate.

That one's used, too, but the other is used as an upper bound. n_distinct tends to come out too small on large tables, so that formula is prone to overestimation. Actually, both formulas are prone to overestimation.

>> Binding the length of the MCV list to the size of the histogram is arbitrary, but so would be any other value...
>
> Wouldn't the best approach be to stop adding MCVs/histogram buckets when adding new ones doesn't decrease your prediction error "substantially"?
>
> One very hacky threshold heuristic is to stop adding MCVs when a simple equality select ( SELECT col FROM table WHERE col = VALUE ) would switch the plan from an index to a sequential scan (or vice versa, although with the current code this would never happen). That is, if the non-MCV frequency estimate is 0.1% (producing an index scan),

When this happens depends on the values of a whole boat-load of GUCs...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
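[Editor's note: the var_eq_const logic being referenced, condensed. This is a paraphrase from memory of src/backend/utils/adt/selfuncs.c, faithful in shape but not quoted verbatim:]

    /* Estimating WHERE col = const when the constant is not an MCV:
     * split the non-MCV probability mass evenly across the remaining
     * distinct values... */
    selec = (1.0 - sumcommon - nullfrac) / otherdistinct;

    /* ...but never let a non-MCV value be estimated as more frequent
     * than the least common value that did make the MCV list. */
    if (nmcv > 0 && selec > mcv_freqs[nmcv - 1])
        selec = mcv_freqs[nmcv - 1];

This clamp is why a longer MCV list can tighten estimates even for values that aren't on it: it lowers the upper bound.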
Re: [HACKERS] lazy snapshots?
Robert Haas robertmh...@gmail.com writes:
> It's necessary to convince ourselves not only that this has some performance benefit but that it's actually correct. It's easy to see that, if we never take a snapshot, all the tuple visibility decisions we make will be exactly identical to the ones that we would have made with a snapshot; the choice of snapshot in that case is arbitrary. But if we do eventually take a snapshot, we'll likely make different tuple visibility decisions than we would have made had we taken the snapshot earlier. However, the decisions that we make prior to taking the snapshot will be consistent with the snapshot, and we will certainly see the effects of all transactions that committed before we started. We may also see the effects of some transactions that commit after we started, but that is OK: it is just as if our whole transaction had been started slightly later and then executed more quickly thereafter.

I don't think this is going to be acceptable at all. You're assuming that clients have no independent means of determining what order transactions execute in, which isn't the case. It would be quite possible, for example, for a query submitted to one backend to see the effects of a transaction that was submitted to another backend long after the first query started. If the two clients involved interact at all, they're not going to be happy. Even if they just compare transaction timestamps, they're not going to be happy.

I'm less than convinced by the hypothesis that most transactions would avoid taking snapshots in this regime, anyway. It would only hold up if there's little locality of reference in terms of which tuples are getting examined/modified by concurrent transactions, and that's a theory that seems much more likely to be wrong than right.

I wonder whether we could do something involving WAL properties --- the current tuple visibility logic was designed before WAL existed, so it's not exploiting that resource at all. I'm imagining that the kernel of a snapshot is just a WAL position, i.e. the end of WAL as of the time you take the snapshot (easy to get in O(1) time). Visibility tests then reduce to "did this transaction commit with a WAL record located before the specified position?". You'd need some index data structure that made it reasonably cheap to find out the commit locations of recently committed transactions, where "recent" means back to RecentGlobalXmin. That seems possibly do-able, though I don't have a concrete design in mind.

regards, tom lane
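[Editor's note: the kernel of that idea in sketch form. XLogRecPtr is the real WAL-position type (treated here as directly comparable, as in recent PostgreSQL); CommitLSNOfXact() is a placeholder for exactly the index structure Tom says would need designing:]

    /* A "snapshot" is nothing but the end-of-WAL position at the time
     * it was taken. */
    typedef struct LsnSnapshot
    {
        XLogRecPtr  snap_lsn;
    } LsnSnapshot;

    static bool
    xid_visible_in_snapshot(TransactionId xid, const LsnSnapshot *snap)
    {
        XLogRecPtr  commit_lsn;

        /* Placeholder: look up where this XID's commit record landed in
         * WAL; must be cheap for anything back to RecentGlobalXmin. */
        if (!CommitLSNOfXact(xid, &commit_lsn))
            return false;               /* never committed */

        return commit_lsn < snap->snap_lsn;
    }

The open problem lives entirely inside CommitLSNOfXact(): a data structure mapping recent XIDs to commit-record positions with O(1)-ish lookup.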