[HACKERS] pg_comparator table diff/sync
Is pg_comparator the only project out there that does what it does? I tried patching it, and it seems OK, but I'm not terribly confident in my patch. I'm hoping someone will tell me there's a great table-driven rsync out there that everyone uses and I just don't know about.

http://pgfoundry.org/tracker/index.php?func=detail&aid=1004603&group_id=137&atid=232

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] Seq scans roadmap
On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote:
> For comparison, here's the test results with vanilla CVS HEAD:
> copy-head | 00:06:21.533137
> copy-head | 00:05:54.141285

I'm slightly worried that the results for COPY aren't anywhere near as good as the SELECT and VACUUM results. It isn't clear from those numbers that the benefit really is significant. Are you thinking that having COPY avoid cache spoiling is a benefit just of itself? Or do you see a pattern of benefit from your other runs?

(BTW what was wal_buffers set to? At least twice the ring buffer size, hopefully.)

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Seq scans roadmap
Hi Simon,

On 5/12/07 12:35 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> I'm slightly worried that the results for COPY aren't anywhere near as good as the SELECT and VACUUM results. It isn't clear from those numbers that the benefit really is significant.

COPY is bottlenecked on datum formation and format translation with very low performance, so I don't think we should expect the ring buffer to make much of a dent.

- Luke
[HACKERS] Use of ActiveSnapshot
The use of ActiveSnapshot throughout the code appears rather dangerous to me. It is a global pointer, assumed not to be set yet in some places, assumed to be saved and restored by the caller in others. The actual (context) memory it points to is sometimes explicitly freed, sometimes just left in the context and thrown away by MemoryContextDelete() without resetting ActiveSnapshot to NULL.

The comment for the call of pg_plan_queries() in utils/cache/plancache.c line 469, for example, is fatally wrong. Not only should the snapshot be set by all callers at this point, but if the call actually does replan the queries, the existing ActiveSnapshot is replaced with one allocated in the current memory context. If this happens to be inside of a nested SPI call sequence, the innermost SPI stack frame will free the snapshot data without restoring ActiveSnapshot to the one from the caller.

Either calling pg_plan_queries() with needSnapshot=false or saving and restoring ActiveSnapshot will prevent the backend from dumping core in the mentioned example, but I am not entirely sure which one is the right solution.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
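[Editor's note: the hazard Jan describes -- a nested call clobbering a global pointer the caller expects to survive -- and the save/restore discipline that guards against it can be sketched in miniature. This is an illustration only, not PostgreSQL code; the names active_snapshot, plan_queries, and spi_call are invented stand-ins.]

```python
# Illustration only -- not PostgreSQL code. A global pointer like
# ActiveSnapshot, a routine that may replace it (as pg_plan_queries can
# when it replans), and a nested caller that saves and restores it.

active_snapshot = None  # stand-in for the global ActiveSnapshot pointer

def plan_queries(need_snapshot):
    """May replace the global snapshot, like a replan would."""
    global active_snapshot
    if need_snapshot:
        active_snapshot = {"xmin": 200}  # clobbers the caller's snapshot

def spi_call(fn):
    """Nested call that follows the save/restore discipline."""
    global active_snapshot
    saved = active_snapshot          # save the caller's snapshot...
    try:
        active_snapshot = {"xmin": 100}
        fn()                         # may replace the global arbitrarily
    finally:
        active_snapshot = saved      # ...and restore it, even on error

spi_call(lambda: plan_queries(need_snapshot=True))
```

After the call, active_snapshot is back at the caller's value no matter what happened inside; it is the absence of this restore in the innermost SPI frame that produces the crash Jan mentions.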
[HACKERS] Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
> > > If you know when the checkpoint ended, and you know how long each of the pieces took, you can reconstruct the other times easily.
> >
> > The way you describe this it is true--that the summary is redundant given the detail--but if you put yourself in the shoes of a log file parser the other way around is easier to work with. Piecing together log entries is a pain, splitting them is easy. If I had to only keep one line out of this, it would be the one with the summary. It would be nice to have it logged at INFO.
>
> Yeah, if we have the summary line we don't need the other lines and vice versa. I have sympathy for parsing log files; I've done that a lot in the past and I can see what you mean. Having the individual lines is nice when you're monitoring a running system; you don't get the summary line until the checkpoint is finished. I suppose we can have both the individual lines and the summary; the extra lines shouldn't hurt anyone, and you won't get them unless you turn on the new log_checkpoints parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be parsing logs for performance monitoring? Especially if that log parsing is just going to result in data being inserted into a table anyway? I know there's concern about the performance of the stats system, and maybe that needs to be addressed, but pushing users to log parsing is a lot of extra effort, non-standard, likely to be overlooked, and doesn't play well with other tools. It also conflicts with all the existing statistics framework.

-- 
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
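[Editor's note: the "splitting is easy, piecing together is a pain" point above can be made concrete. The log line format below is hypothetical -- a stand-in for a one-line checkpoint summary -- not the actual output of any log_checkpoints implementation.]

```python
import re

# Hypothetical one-line checkpoint summary; the format is invented for
# illustration, not taken from any real log_checkpoints output.
line = ("LOG: checkpoint complete: wrote 3412 buffers; "
        "write=1.2 s, sync=0.4 s, total=1.8 s")

# Splitting one summary line into fields is a single regex match; no
# correlating of separate begin/write/sync/end entries is required.
pattern = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers; "
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s")

stats = {k: float(v) for k, v in pattern.search(line).groupdict().items()}
```

By contrast, reconstructing the same numbers from individual per-phase lines means matching up entries that may be interleaved with unrelated log traffic, which is exactly the pain being described.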
Re: [HACKERS] Performance monitoring
> Not to beat a dead horse, but do we really want to force folks to be parsing logs for performance monitoring? Especially if that log parsing is just going to result in data being inserted into a table anyway? I know there's concern about the performance of the stats system, and maybe that needs to be addressed, but pushing users to log parsing is a lot of extra effort, non-standard, likely to be overlooked, and doesn't play well with other tools. It also conflicts with all the existing statistics framework.

One thing that doesn't seem to be getting looked at is the cost of logging. Logging is very expensive. I don't know if it is more expensive than the stats system, but you can cut your tps in half by having any level of verbose logging on. Yes, that can be offset by pushing the logging to another spindle and being careful about what you are logging, but still.

Either way, we are taking the hit; it is just a matter of where. IMO it would be better to have the information in the database, where it makes sense, than pushed out to a log that:

A. Will likely be forgotten
B. Is only accessible if you have shell access to the machine (not as common as all of us would like to think)

Sincerely,

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
[HACKERS] I am back, catching up on email
I returned on Wednesday from my trip to Australia and India. (I skipped London.) I returned early because my brother-in-law died on May 5 (for details see http://momjian.us/main/news.html). Anyway, I am slow catching up on email for that reason. I should be caught up by Tuesday/Wednesday.

-- 
Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] plperl vs. bytea
Added to TODO:

	o Allow data to be passed in native language formats, rather than only text
	  http://archives.postgresql.org/pgsql-hackers/2007-05/msg00289$

---

Andrew Dunstan wrote:
> Tom Lane wrote:
> > Andrew Dunstan [EMAIL PROTECTED] writes:
> > > Tino Wildenhain wrote:
> > > > Andrew Dunstan schrieb:
> > > > > This does not need to be over-engineered, IMNSHO.
> > > > Well could you explain where it would appear over-engineered?
> > > Anything that imposes extra requirements on type creators seems undesirable. I'm not sure either that the UUID example is a very good one. This whole problem arose because of performance problems handling large gobs of data, not just anything that happens to be binary.
> >
> > Well, we realize that bytea has got a performance problem, but are we so sure that nothing else does? I don't want to stick in a one-purpose wart only to find later that we need a few more warts of the same kind. An example of something else we ought to be considering is binary transmission of float values. The argument in favor of that is not so much performance (although text-and-back conversion is hardly cheap) as it is that the conversion is potentially lossy, since float8out doesn't by default generate enough digits to ensure a unique back-conversion.
> >
> > ISTM there are three reasons for considering non-text-based transmission:
> >
> > 1. Performance, as in the bytea case
> > 2. Avoidance of information loss, as for float
> > 3. Providing a natural/convenient mapping to the PL's internal data types, as we already do --- but incompletely --- for arrays and records
> >
> > It's clear that the details of #3 have to vary across PLs, but I'd like it not to vary capriciously. For instance plperl currently has special treatment for returning perl arrays as SQL arrays, but AFAICS from the manual not for going in the other direction; plpython and pltcl overlook arrays entirely, even though there are natural mappings they could and should be using.
> >
> > I don't know to what extent we should apply point #3 to situations other than arrays and records, but now is the time to think about it. An example: working with the geometric types in a PL function is probably going to be pretty painful for lack of simple access to the constituent float values (not to mention the lossiness problem). We should also be considering some non-core PLs such as PL/Ruby and PL/R; they might provide additional examples to influence our thinking.
>
> OK, we have a lot of work to do here, then.
>
> I can really only speak with any significant knowledge on the perl front. Fundamentally, it has 3 types of scalars: IV, NV and PV (integer, float, string). IV can accommodate at least the largest integer or pointer type on the platform, NV a double, and PV an arbitrary string of bytes.
>
> As for structured types, as I noted elsewhere we have some of the work done for plperl. My suggestion would be to complete it for plperl and get it fully orthogonal and then retrofit that to plpython/pltcl. I've actually been worried for some time that the conversion glue was probably imposing significant penalties on the non-native PLs, so I'm glad to see this getting some attention.
>
> cheers
>
> andrew

-- 
Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Managing the community information stream
Jim Nasby wrote:
> On May 6, 2007, at 8:18 AM, Andrew Dunstan wrote:
> > Oh, the answer to Bruce's question about when to create a feature item? You could well do it at the time when today you create a TODO item. However, we might even do better. For example, we might well add feature requests that are denied. That would help people to see if something has been proposed before.
>
> The problem with our current TODO process is that whether an item makes it onto the list is essentially determined by "did the idea catch a committer's attention, and did that committer happen to think it was a good idea". That sets the bar pretty high for getting stuff on the list (which you need for a simple list like TODO), but it also means it's very subjective. (Of course 98% of the time that committer is Bruce, but I don't think that matters here...)

Users often request items be added to the TODO list, and I usually comply.

-- 
Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Managing the community information stream
Jim Nasby wrote:
> On May 6, 2007, at 8:18 AM, Andrew Dunstan wrote:
> > Oh, the answer to Bruce's question about when to create a feature item? You could well do it at the time when today you create a TODO item. However, we might even do better. For example, we might well add feature requests that are denied. That would help people to see if something has been proposed before.

Uh, TODO has that:

	Features We Do _Not_ Want

Do we need more items on that list?

-- 
Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Managing the community information stream
To follow up on Andrew's idea of tracking things back to the TODO or bug number: we could have a universal developer number, something like PGD#23432, as a PostgreSQL Developer number. We could assign them for submissions to the bugs list, where we already assign a number. I could easily add them to TODO items that don't already have a number from the bugs list, and we could use the number for postings to the patches list that again don't already have a number. (The PGD numbers would have value ranges assigned for specific uses, like 0-10 are bugs, 11-20 are assigned as TODO items, +30 are patches, etc.)

The idea is that if you are working on a TODO item you mention that number in the email subject discussing it, and likewise for postings to the patches list. A web application could then read from the email stream and pull out information about any item. The only overhead is people mentioning the assigned number consistently.

One problem is that our development isn't linear --- often TODO items are the result of several email threads, and TODO items are split and merged regularly, meaning that a PGD number could be partially complete or be merged with another number. When this happens, the number might cause confusion, and I don't see a way to fix that easily.

---

Dave Page wrote:
> Bruce Momjian wrote:
> > The idea of the patch number in the subject line works with that streaming model because it merely marks streams so they can be grouped. The defining event that marks the stream is a post to the patches list. We already number posts to the bugs list, so in a way we could improve tracking there and somehow link it to TODO items and patch submissions, but because many TODO items are not the result of bug reports but come out of general discussions, I am not sure tracking would work as well there. And what about features? Do you start assigning numbers there, and what is your trigger event?
> >
> > In my opinion, as you start trying to place more structure on the stream, the stream itself starts to degrade in its dynamism and ease of use. To me, that is the fundamental issue, and risk.
>
> Bruce, I cannot really add to that except to say that you neatly summarized what I've completely failed to in my last few emails to Andrew. I agree completely.
>
> Regards, Dave.

-- 
Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com
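[Editor's note: Bruce's scheme amounts to grouping an email stream by a token in the subject line. A minimal sketch follows; the subjects are invented examples, and the PGD#NNNNN convention is the one proposed above, not an existing practice.]

```python
import re
from collections import defaultdict

# Invented example subjects; PGD#NNNNN is the convention proposed above.
subjects = [
    "Re: [HACKERS] vacuum truncation logic (PGD#23432)",
    "[PATCHES] PGD#23432 vacuum truncation patch v2",
    "[HACKERS] unrelated discussion",
]

# Group every posting that mentions an item number, as the proposed web
# application would when reading the email stream.
items = defaultdict(list)
for subject in subjects:
    for number in re.findall(r"PGD#(\d+)", subject):
        items[number].append(subject)
```

With this grouping, items["23432"] collects both postings about the item while the unrelated message is ignored -- which also shows the scheme's weakness: any posting made without the number is invisible to it, matching the split/merge confusion Bruce anticipates.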
[HACKERS] Re: Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
On Sat, 12 May 2007, Jim C. Nasby wrote:
> Not to beat a dead horse, but do we really want to force folks to be parsing logs for performance monitoring?

All of the really interesting DBA-level information about checkpoints is now sitting in pg_stat_bgwriter. There really is no reason I'd expect this information to be part of a normal performance monitoring setup; for me it has fit into troubleshooting and R&D. When I'm talking about parsing logs in this context, it's for answering questions like "how does fsync time correlate with the amount of data written at checkpoint time?" Since I've found it educational to sort through that, I just didn't want the patch modified so that was harder to do than it had to be. If I thought it worth the trouble to expose that level of information via the stats interface, I'd have submitted that instead of this log-based patch. This circles back to the previous discussion of whether this particular information is strictly developer-level.

-- 
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Performance monitoring
On Sat, 12 May 2007, Joshua D. Drake wrote:
> One thing that doesn't seem to be getting looked at is the cost of logging.

If any of this executed at something like the query level, sure, that would be really important. The majority of the logging I suggested here is of things that happen at checkpoint time. The presumption is that the overhead of that is considerably greater than writing a log line or even five. The least intensive event I would like to be loggable is when a new WAL segment is created and cleared, which is again a pretty small bit of log compared to the 16MB write. I wouldn't mind seeing that exposed under pg_stats instead; I just had more interesting things to statify first.

-- 
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
[HACKERS] What is happening on buildfarm member baiji?
The last two runs on baiji have failed at the installcheck stage, with symptoms that look a heck of a lot like the most recent system catalog changes haven't taken effect (e.g., it doesn't seem to know about pg_type.typarray). Given that the previous check step passed, the most likely explanation seems to be that some part of the install step failed --- I've not tried to reproduce the behavior, but it looks like it might be explained if the install target's postgres.bki file was not getting overwritten.

So we have two issues: what exactly is going wrong (some new form of Vista brain death, no doubt), and why isn't the buildfarm script noticing?

regards, tom lane
Re: [HACKERS] Performance monitoring
On Sat, 2007-05-12 at 14:26 -0700, Joshua D. Drake wrote:
> Either way, we are taking the hit; it is just a matter of where. IMO it would be better to have the information in the database, where it makes sense, than pushing out to a log

If performance monitoring information is provided as a database object, what would the right interface be? IMHO the problem with cleanly presenting monitoring information within a normal database system is that this sort of data is fundamentally dynamic and continuous: to determine how the performance of the system changes over time, you need to repeatedly rescan the table/view/SRF and recompute your analysis essentially from scratch. Trying to get even simple information like queries per second from pg_stat_activity is an example of how this can be painful.

<plug> BTW, if the system included the concept of a continuous data *stream* as a kind of database object, this problem would be much more tractable :) In fact, there is some code in a version of TelegraphCQ that exposes various information about the runtime state of the system as a set of system-defined data streams -- like any other stream, users could then use those streams in arbitrary queries. </plug>

-Neil
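[Editor's note: the "repeatedly rescan and recompute" problem can be shown with two samples of a cumulative counter; a rate only falls out of the difference between polls. The sample values below are made up, and nothing here talks to a real server.]

```python
# Sampling sketch: with snapshot-style views or cumulative counters, a
# rate like queries-per-second must be computed by polling twice and
# differencing the samples yourself. The numbers are invented.

def rate(sample0, sample1):
    """Two (cumulative_count, timestamp) samples -> events per second."""
    (c0, t0), (c1, t1) = sample0, sample1
    return (c1 - c0) / (t1 - t0)

# Poll an imagined cumulative query counter at t=10 s and t=20 s:
qps = rate((1000, 10.0), (1300, 20.0))  # 300 queries over 10 seconds
```

A continuous query over a stream of per-query events could instead maintain this rate incrementally, which is the TelegraphCQ point above.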
Re: [HACKERS] What is happening on buildfarm member baiji?
Tom Lane wrote:
> The last two runs on baiji have failed at the installcheck stage, with symptoms that look a heck of a lot like the most recent system catalog changes haven't taken effect (e.g., it doesn't seem to know about pg_type.typarray). Given that the previous check step passed, the most likely explanation seems to be that some part of the install step failed --- I've not tried to reproduce the behavior, but it looks like it might be explained if the install target's postgres.bki file was not getting overwritten. So we have two issues: what exactly is going wrong (some new form of Vista brain death, no doubt), and why isn't the buildfarm script noticing?

The script will not even run if the install directory exists:

	die "$buildroot/$branch has $pgsql or inst directories!"
	    if ((!$from_source && -d $pgsql) || -d "inst");

But the install process is different for MSVC. It could be that we are screwing up there. I no longer have an MSVC box, so I can't tell so easily ;-(

cheers

andrew