Re: [HACKERS] Getting a move on for 8.2 beta
On Sat, Sep 02, 2006 at 09:26:36PM -0700, Joshua D. Drake wrote:
> > > I just spent 1/2 hour fixing the multi-value UPDATE patch for the code drift caused by UPDATE/RETURNING. The patch is a simple grammar macro. Any coder could have taken that, reviewed it, and applied it, but no one did.
> > Perhaps that's because nobody but you wanted it to go in.
> We got tons of people who wanted that. To further this, I didn't even know it was an issue. If it was only half an hour and it needed to be done, why wasn't it put out there? Thanks go to Bruce for fixing it, but I didn't know it was an issue. I have 5 C developers; if any of them could have done it -- then some communication is needed and I would have allocated someone to it. I am sure that is the case with others as well. I am not saying it is anyone's responsibility to speak up, but I can say that had I known some help was needed (especially something so trivial) I would have gladly donated some time.

There's been talk in the past of having some kind of system that automatically attempts to build things that are in the patch queue, both as an initial sanity check and as a means to detect when something bit-rots... perhaps it's becoming worthwhile to set that up.
--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Simplifying standby mode
Tom Lane [EMAIL PROTECTED] writes:
> The project policy has always been that we don't change on-disk formats in minor releases. I'm not entirely clear why you are so keen on carving out an exception for WAL data.

I had always thought of the policy as "initdb is not required", not "no on-disk format changes". In that light you're suggesting extending the policy, which I guess I just thought should be done explicitly rather than making policy by accident.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Tom Lane wrote:
> Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?
>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt
>
> Well, indeed it seems that the hash join is just an innocent bystander: the bulk of the runtime (all but about 120 sec in fact) is spent here:
>
>   ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
>         ->  Merge Join  (cost=0.00..10248.66 rows=2 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
>               ...
>         ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
>               Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>               ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
>                     Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>
> I suppose that the profile result you showed was taken during the startup transient where it was computing the hashtables that this loop's results are joined to ... but that's not where the problem is. The problem is repeating that bitmap scan on lineitem for nearly 20000 different l_suppkeys.

possible - I actually took them over a longer period of time

> Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join. The other plan uses a different join order and doesn't try to join lineitem until it's got orders.o_orderkey, whereupon it does a mergejoin against an indexscan on lineitem:
>
>   ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
>
> The runtimes for the remainders of the plans are roughly comparable, so it's the cost of joining lineitem that is hurting here. Is lineitem sorted (or nearly sorted) by l_orderkey?
>
> Part of the problem could be overestimating the cost of this indexscan. What are the physical sizes of lineitem and its indexes, and how do those compare to your RAM? What are you using for planner settings (particularly effective_cache_size)?

ouch - you are right (as usual) here. effective_cache_size was set to 10GB (my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

as for the relation sizes:

dbt3=# select pg_relation_size('lineitem');
 pg_relation_size
------------------
      10832764928
(1 row)

dbt3=# select pg_total_relation_size('lineitem');
 pg_total_relation_size
------------------------
            22960259072
(1 row)

there are nine btree indexes on lineitem, all between 1.1GB and 1.4GB in size.

Stefan
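A quick back-of-the-envelope check of the figures quoted above (not part of the original exchange; all values are copied from the EXPLAIN ANALYZE output and the pg_relation_size results, nothing is newly measured):

```python
# Sanity-check the claim that the repeated inner bitmap scan dominates
# the nested loop's runtime, using the numbers from EXPLAIN ANALYZE.

loops = 19837                   # executions of the inner bitmap heap scan
per_loop_ms = 1254.535          # upper "actual time" per execution, ms
nested_loop_ms = 24923860.713   # total actual time of the nested loop, ms

inner_total_ms = loops * per_loop_ms
share = inner_total_ms / nested_loop_ms
print(f"inner scans account for {share:.1%} of the nested-loop runtime")

# Relation sizes reported by pg_relation_size / pg_total_relation_size,
# converted to GiB -- note the heap alone is about the size of the
# misconfigured 10GB effective_cache_size:
heap_gib = 10832764928 / 2**30
total_gib = 22960259072 / 2**30
print(f"lineitem heap: {heap_gib:.1f} GiB, with indexes: {total_gib:.1f} GiB")
# prints "lineitem heap: 10.1 GiB, with indexes: 21.4 GiB"
```

So virtually all of the nested loop's ~24900 seconds is the bitmap scan repeated ~20000 times, which is why the alternative join order wins so decisively.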
Re: [HACKERS] Getting a move on for 8.2 beta
Jim C. Nasby wrote:
> There's been talk in the past of having some kind of system that automatically attempts to build things that are in the patch queue, both as an initial sanity-check and as a means to detect when something bit-rots... perhaps it's becoming worthwhile to set that up.

After writing the enum patch, I hacked the buildfarm client code to apply a patch to the checked-out code before building. You could then run it like so:

./run_build.pl --nosend --nostatus --verbose \
    --patch=/home/tom/src/enums-v1.patch --patch-level=1

The idea was that patch authors could either run it manually or stick it in a cron job so they could get emailed when the patch no longer applied cleanly, or when the patched source failed in make, make check, etc. Obviously my motivation was to keep the enum patch up to date until we hit 8.3 and someone looks at it. To that end it might also be useful for it to die if duplicate_oids finds anything.

I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.

Note that it was intended for patch authors to run themselves rather than as any kind of central mechanism to test the patch queue. While it would obviously be nice to know what the current status of any given patch in the queue is, the thing about the patch queue is that it contains patches that we haven't had time to review yet. It would only take one patch containing a security vulnerability, or worse, a trojan, getting into the queue for that to seem unfortunate.

I had thoughts of hacking the buildfarm server to allow the posting of a patch along with results, so that authors could report results for their own patches, but ran out of time. Is there interest in doing that? Obviously it'd be a different server to the existing buildfarm.

Cheers

Tom
Re: [HACKERS] Optimizer improvements: to do or not to do?
Say42 wrote:
> Perhaps I am wrong, but I assume normalization is the usual case, and small master (parent) tables are not very rare either. Yes, my example is unusual, but it is _real_ and demonstrates PG optimizer inaccuracy. Why don't we make the PG optimizer closer to reality if we can? Is it so needless, and am I making a mountain out of a molehill?

All you have shown so far is that one particular query runs faster on your machine when sequential scans are turned off. That is certainly a problem that is worth addressing. But you haven't offered any analysis about the cause of this problem, so any speculation about normalization, usual cases, caching effects and so on is unfounded and premature.
--
Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Simplifying standby mode
On Tue, 2006-09-12 at 16:23 -0400, Tom Lane wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Simon Riggs [EMAIL PROTECTED] writes:
>>> My memory is lousy at the best of times, but when have we had a minor release that would have broken this due to a changed format?
>> Not often, which is why I mention the possibility of having interoperating minor release levels at all. If it was common, I'd just put a blanket warning on doing that. I don't know that it's happened in the past but I wouldn't be surprised. Consider that the bug being fixed in the point release may well be a bug in WAL log formatting.
> This would be the exception, not the rule, and should not be documented as if it were the rule. It's not really different from telling people to expect a forced initdb at a minor release: you are simply misrepresenting the project's policy.

OK, that's clear. I'll word it the other way around. SGML'd version will go straight to -patches.

Other Questions and Changes: please shout them in now.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Lock partitions
On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote:
> Strong, David [EMAIL PROTECTED] writes:
>> When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (a 532 second improvement) to acquire locks and 78 seconds (a 96 second improvement) to release locks. The LockMgr takes 235 seconds (a 127 second improvement) to acquire locks and 22 seconds (a 4 second improvement) to release locks.
> While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once.

Good thinking David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in "fewer == better" thoughts.

>> With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from the BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly?
> I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially.

David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue.

It seems reasonable that, having relieved the bottleneck on the BufMapping and LockMgr locks, we would then queue longer on the next bottleneck, WALInsertLock. So again, those tests seem reasonable to me so far.

These seem to be the beginnings of accurate wait time analysis, so I'm listening closely. Are you using a lightweight timer?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
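For readers following along: the idea behind partitioned lock tables is that each lock tag hashes to one of NUM_LOCK_PARTITIONS partitions, each guarded by its own LWLock, so two backends only contend when their tags land in the same partition. The sketch below is illustrative only (it is not PostgreSQL source; the hash and tag names are made up), but it shows why raising the partition count reduces same-partition collisions:

```python
# Illustrative sketch: map lock tags to partitions and count how many
# pairs of tags would share a partition (a proxy for lock contention).
import hashlib

def partition_for(tag: str, num_partitions: int) -> int:
    # num_partitions must be a power of two so masking the hash works,
    # which mirrors how PostgreSQL masks its hash values.
    h = int.from_bytes(hashlib.md5(tag.encode()).digest()[:4], "little")
    return h & (num_partitions - 1)

def same_partition_pairs(tags, num_partitions):
    # Count pairs of distinct tags mapping to the same partition.
    buckets = {}
    for t in tags:
        p = partition_for(t, num_partitions)
        buckets[p] = buckets.get(p, 0) + 1
    return sum(n * (n - 1) // 2 for n in buckets.values())

tags = [f"rel{i}" for i in range(64)]   # 64 hypothetical lock tags
for n in (4, 16, 128):
    print(n, same_partition_pairs(tags, n))
```

More partitions means fewer colliding pairs, but, as Tom notes above, each partition adds per-backend state and LockReleaseAll work, so the count can't be raised for free.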
Re: [HACKERS] dump / restore functionality
Hi, Tom,

Tom Lane wrote:
> AFAIR what was discussed was separating
> - schema stuff needed before loading data
> - table data
> - schema stuff needed after loading data
> where the last category boils down to indexes and then foreign keys. All the other stuff such as functions really needs to be in the first part ... or at least there's no visible benefit to delaying loading it.

I agree, it has to be in the first part, especially as data types and input functions needed for the table definitions and table data may be defined therein.

HTH,
Markus
--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
[HACKERS] GIN documentation
This patch adds GIN documentation and slightly improves the GiST docs. Could a native English speaker please check the text? Thank you.

Attachment: gindocs.gz (application/gzip)
Re: [HACKERS] Optimizer improvements: to do or not to do?
Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any speculation about normalization, usual cases, caching effects and so on are unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry. Below I'll try to explain what I mean when I speak about the caching effect. Let's take my previous example (I repost the query and some lines from 'explain' here for convenience):

select count(*) from conn.conn20060803 c
where exists (select code from belg_mobile tc
              where c.bnum = tc.code and c.bnum like tc.code || '%'
              order by tc.code desc limit 1)

Index Scan Backward using belg_mobile_pkey on belg_mobile tc (cost=0.00..6.42 rows=1 width=10) (actual time=0.012..0.012 rows=0 loops=494527)
Seq Scan on belg_mobile tc (cost=0.00..2.19 rows=1 width=10) (actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows, 1 heap page) and has a PK on the code column (2 index pages). indexCorrelation is equal to 0.0445 and hardly affects the cost estimation result.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap page) * 4 (random_page_cost)
    + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001 (cpu_index_tuple_cost) + 0.01 (cpu_tuple_cost))
      * 68 (record count) * 0.5 (selectivity of subquery)
    ~ 6 (page fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1 (heap page)
    + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.01 (cpu_tuple_cost)) * 68 (record count)
    = 1 (page fetch cost) + 1.19 (cpu cost) = 2.19

The estimation is ok if we touch the belg_mobile table only once. In the subquery we do it many times. After the first iteration of the subquery, all of belg_mobile's heap and index pages are in the cache, and the cost per iteration should be estimated using these formulae:

Index scan cost = (6 (page fetch cost) + 0.42 (cpu cost) * 500K (conn table row count)) / 500K ~ 0.42
Seq scan cost = (1 (page fetch cost) + 1.19 (cpu cost) * 500K (conn table row count)) / 500K ~ 1.19

The index scan is actually cheaper because less than one tenth of conn rows have matching codes in the belg_mobile table. That's what I want to say. I am not a veteran DBMS user, so I cannot gauge the importance of this cost inaccuracy on the whole. I hope you will help me to look at the problem (?) more widely than I can at the moment.
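Redoing the seq-scan side of the arithmetic above (this is the poster's simplified model with 8.x-era default cost parameters, not the planner's actual code path; the index-scan CPU figure in the message doesn't reduce to an obvious expression, so only the seq-scan numbers are reproduced here):

```python
# Reproduce the poster's seq scan estimate and its cached, amortized form.
cpu_operator_cost = 0.0025
cpu_tuple_cost = 0.01
rows = 68          # belg_mobile row count
heap_pages = 1

# One-shot estimate: page fetches plus per-tuple CPU work
# (three operator evaluations plus one tuple cost per row).
seq_cost = heap_pages + (cpu_operator_cost * 3 + cpu_tuple_cost) * rows
print(round(seq_cost, 2))   # 2.19, as quoted above

# Amortized over ~500K outer rows: the single page fetch is paid once
# (the page stays cached after the first iteration), so essentially only
# the CPU component remains per iteration.
outer_rows = 500_000
seq_amortized = (heap_pages + (seq_cost - heap_pages) * outer_rows) / outer_rows
print(round(seq_amortized, 2))   # 1.19
```

The point being argued is exactly this gap: once I/O amortizes away across repeated executions, the relative ranking of the two plans can flip, which the one-shot cost model does not see.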
Re: [HACKERS] Getting a move on for 8.2 beta
Tom Dunstan wrote:
> The idea was that patch authors could either run it manually or stick it in a cron so they could get emailed when the patch no longer cleanly applied, or when the patched source failed in make, make check etc. Obviously my motivation was to keep the enum patch up to date until we hit 8.3 and someone looks at it. To that end it might also be useful for it to die if duplicate_oids finds anything. I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.

Huh, why would you disable patching on vpath builds?

> Note that it was intended for patch authors to run themselves rather than any kind of central mechanism to test the patch queue.

Well, I'd think that one important benefit of passing patches through the buildfarm is detecting possible portability problems in it.

> While it would obviously be nice to know what the current status of any given patch in the queue is, the thing about the patch queue is that it contains patches that we haven't had time to review yet. It'll only take one patch to get into the queue containing a security vulnerability, or worse, a trojan, for it to seem unfortunate.

We could have a register of developers allowed to upload patches to the patched build queue, and have a policy that says that you should only upload a patch if it has already passed some review.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[HACKERS] Inconsistency in extended-query-protocol logging
Tom,

I'm currently resyncing my parser with the changes you made to prepared statement logging. Everything is OK apart from an inconsistency between log_duration and log_min_duration_statement.

* With log_duration, we have:
protocol: LOG: statement: execute my_query: SELECT * FROM shop WHERE $1 = $2
query:    LOG: statement: EXECUTE test1('lop1', 'lop2')

* With log_min_duration_statement, we have:
protocol: LOG: duration: 235.345 ms execute insert_query: INSERT INTO shop (name) VALUES($1)
query:    LOG: duration: 0.187 ms statement: EXECUTE test1('lop1', 'lop2')

As you can see, in the log_duration case we always have the "statement:" part, but we don't have it in the log_min_duration_statement case when using the protocol. I attached a patch to improve the consistency. It adds "statement:" in every case. Note that "statement:" was not there in the first version; Bruce added it afterwards, so I keep it in this patch. I don't really care whether we have "statement:" or not, but I'd really like consistent behaviour between both configurations.

Thanks,
--
Guillaume

Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.508
diff -c -r1.508 postgres.c
*** src/backend/tcop/postgres.c 8 Sep 2006 15:55:53 - 1.508
--- src/backend/tcop/postgres.c 13 Sep 2006 12:44:16 -
***************
*** 1256,1262 ****
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  parse %s: %s",
                    msec_str, *stmt_name ? stmt_name : "<unnamed>",
                    query_string)));
--- 1256,1262 ----
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  statement: parse %s: %s",
                    msec_str, *stmt_name ? stmt_name : "<unnamed>",
                    query_string)));
***************
*** 1610,1616 ****
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  bind %s to %s: %s",
                    msec_str, *portal_name ? portal_name : "<unnamed>",
                    *stmt_name ? stmt_name : "<unnamed>",
--- 1610,1616 ----
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  statement: bind %s to %s: %s",
                    msec_str, *portal_name ? portal_name : "<unnamed>",
                    *stmt_name ? stmt_name : "<unnamed>",
***************
*** 1819,1825 ****
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  %s %s%s%s%s%s",
                    msec_str,
                    execute_is_fetch ? _("execute fetch from") :
--- 1819,1825 ----
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  statement: %s %s%s%s%s%s",
                    msec_str,
                    execute_is_fetch ? _("execute fetch from") :
Re: [HACKERS] Getting a move on for 8.2 beta
Alvaro Herrera wrote:
>> I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.
> Huh, why would you disable patching on vpath builds?

As I understand it, vpath builds only do a checkout to a single place, and then build against that (from a different directory). Non-vpath builds take a copy of the checked-out source and build in the copy. If we patched the source in vpath builds, the patch would stick around when updating the source tree from CVS, and the next patch attempt would fail, etc. Non-vpath builds will get a clean copy to patch in subsequent runs. I suppose I could have made vpath builds take a copy when doing a patch, but it hardly seemed worth it.

> Well, I'd think that one important benefit of passing patches through the buildfarm is detecting possible portability problems in it.

Absolutely. As long as they're blessed as mentioned below...

> We could have a register of developers allowed to upload patches to the patched build queue, and have a policy that says that you should only upload a patch if it has already passed some review.

This was where I was originally heading when I first thought about this functionality. I didn't go that far with my fairly modest patch since a) it wasn't clear that there was manpower available to do the initial review to bless the patches, and b) what I did do solved my immediate problem.

If there is support for doing something like this, there are all kinds of things that could be done. For example, you could check which patches break which other ones. An even more way-out idea might be to have performance patches run pgbench or some other set of benchmarks. You have a performance-related patch? Let's stick it in the queue and see if it really improves things or not.

Note that the existing patch queue mechanism wouldn't suffice, since there's no standard way to pull patches through via http or whatever. We'd probably want to back it with a small database and webapp, which might just be a hacked buildfarm server.

Cheers

Tom
Re: [HACKERS] Getting a move on for 8.2 beta
Tom Dunstan wrote:
>>> I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.
>> Huh, why would you disable patching on vpath builds?
> As I understand it, vpath builds only do a checkout to a single place, and then build against that (from a different directory). Non-vpath builds take a copy of the checked out source and build in the copy. If we patched the source in vpath builds, the patch would stick around when updating the source tree from CVS, and the next patch attempt would fail etc. Non-vpath builds will get a clean copy to patch in subsequent runs. I suppose I could have made vpath builds take a copy when doing a patch, but it hardly seemed worth it.

Huh, but the patch can be applied with -R to revert it after the buildfarm run ... the one problem I can see is if the patch fails for some reason; for which I'd suggest running a patch --dry-run as a first step, checking that it applies cleanly, and only continuing in that case.

[nice ideas snipped (not sniped)]

> Note that the existing patch queue mechanism wouldn't suffice, since there's no standard way to pull patches through via http or whatever. We'd probably want to back it with a small database and webapp, which might just be a hacked buildfarm server.

Oh, sure. I'd say there is no existing patch queue, just a Mhonarc archive which is just useful as a reminder of what patches are outstanding.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Getting a move on for 8.2 beta
Alvaro Herrera wrote:
> Huh, but the patch can be applied with -R to revert it after the buildfarm run ... the one problem I can see is if the patch fails for some reason; for which I'd suggest running a patch --dry-run as a first step, checking that it applies cleanly, and only continue in that case.

Unfortunately, this won't fly as we rely on file timestamps to tell us which files have changed. Tom's idea of making a temp copy of the repo and patching that would work, but if you're going to do that why do a vpath build anyway?

Regarding the idea of a list of approved patch authorisers, don't we have such a group now? i.e. committers.

cheers

andrew
Re: [HACKERS] Inconsistency in extended-query-protocol logging
Guillaume Smet [EMAIL PROTECTED] writes:
> I attached a patch to improve the consistency. It adds statement: for every case.

Isn't that just bloating the log? And weren't you the one complaining about log volumes to begin with?

The reason it is the way it is is that Bruce wanted the wording to reflect why you are getting the log message: anything starting "statement:" is being logged because of log_statement, anything starting "duration:" is being logged because of one of the log_duration variables. Your proposed patch blurs that distinction and doesn't buy anything much that I can see.

regards, tom lane
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join.
> effective_cache_size was set to 10GB (my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

Interesting. It used to be that effective_cache_size wasn't all that critical... what I think this report is showing is that with the 8.2 changes to try to account for caching effects in repeated indexscans, we've turned that into a pretty significant parameter. It'd be nice not to have to depend on the DBA to give us a good number for this setting. But I don't know of any portable ways to find out how much RAM is in the box, let alone what fraction of it we should assume is available per-query.

regards, tom lane
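To illustrate the portability problem Tom raises: there are OS-specific ways to ask for physical RAM, but no universal one. A sketch (not from the thread) using the sysconf interface, which works on Linux and some other Unixes but is not guaranteed everywhere, hence the fallback:

```python
# Non-portable RAM probe: _SC_PHYS_PAGES is a common extension, not a
# POSIX-required sysconf variable, so it may simply be absent on some
# platforms -- exactly why the server can't derive effective_cache_size
# itself and has to ask the DBA.
import os

def physical_ram_bytes():
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None  # platform doesn't expose it; fall back to the DBA

ram = physical_ram_bytes()
if ram is not None:
    print(f"{ram / 2**30:.1f} GiB of physical RAM")
else:
    print("physical RAM size not available on this platform")
```

And even where the probe works, it only answers "how much RAM is there", not "what fraction is really available to this query", which is the harder half of the problem.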
Re: [HACKERS] Getting a move on for 8.2 beta
Andrew Dunstan wrote:
> Tom's idea of making a temp copy of the repo and patching that would work, but if you're going to do that why do a vpath build anyway?

In this case, the answer is to make sure that your patch works when *someone else* does a vpath build.

> Regarding the idea of a list of approved patch authorisers, don't we have such a group now? i.e. committers.

Right, and if committers or others are willing to put in the time required to verify that patches aren't nasty before going onto the blessed patch queue, the idea could quite possibly work and provide some value. Note that all we really need to test for here is that the patch isn't malicious; patches that are bad design or whatever are unlikely to open security holes or fry your box. A major point of the queue is that the appropriate committer often doesn't have the time to review the patch right now. There might be some benefit in allowing a wider set of contributors to bless patches as non-nasty for testing purposes, rather than waste the limited time of qualified committers. Maybe such an army doesn't exist, though.

Cheers

Tom
Re: [HACKERS] Optimizer improvements: to do or not to do?
Say42 [EMAIL PROTECTED] writes:
> ... Let's take my previous example (I repost the query and some lines from 'explain' here for convenience):
> select count(*) from conn.conn20060803 c where exists (select code from belg_mobile tc where c.bnum = tc.code and c.bnum like tc.code || '%' order by tc.code desc limit 1)

I'm having a hard time getting excited about improving this query when it's so badly coded in the first place. What's an ORDER BY doing in an EXISTS subquery? The LIMIT is unnecessary too. And the inner WHERE says nothing so much as "I don't know how to design a database" :-(. If we're going to look at specific examples we should at least look at examples that are representative of typical good practice.

It is true that EXISTS() subqueries are planned independently without any idea of how often they might get re-executed. This would be good to fix, but I don't see any clear way to do it --- at the time we are processing the outer WHERE, we don't have enough context to judge how many times a particular clause might be evaluated. (Yeah, in this case it's pretty obvious that it'll be executed once per conn20060803 row, but in join situations, or even just with additional outer WHERE clauses, it's not nearly so obvious.)

regards, tom lane
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Tom Lane wrote:
> Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join.
>> effective_cache_size was set to 10GB (my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
> Interesting. It used to be that effective_cache_size wasn't all that critical... what I think this report is showing is that with the 8.2 changes to try to account for caching effects in repeated indexscans, we've turned that into a pretty significant parameter.

yes, I'm a bit worried about that too - it has been a bit of conventional wisdom that setting effective_cache_size optimistically will never hurt and that it encourages PostgreSQL to sometimes get a better plan by favouring index scans.

> It'd be nice not to have to depend on the DBA to give us a good number for this setting. But I don't know of any portable ways to find out how much RAM is in the box, let alone what fraction of it we should assume is available per-query.

well, there are really a number of things the DBA had better give the database accurate information about - though in this case we might go from "too much won't hurt" to "too much will hurt" ...

Stefan
Re: [HACKERS] Inconsistency in extended-query-protocol logging
On 9/13/06, Tom Lane [EMAIL PROTECTED] wrote:
> Guillaume Smet [EMAIL PROTECTED] writes:
>> I attached a patch to improve the consistency. It adds statement: for every case.
> Isn't that just bloating the log? And weren't you the one complaining about log volumes to begin with?

As I told you, I don't care whether we remove it or add it, but having:
statement: query -> duration: duration statement: query
in one case and
statement: query -> duration: duration query
in the other case is not consistent. Either we remove "statement:" for the extended protocol or we add it, but I don't think it's a good idea to have different behaviour between log_duration and log_min_duration_statement.

As for bloating the log, that's already the case with regular queries, so it won't change much. At least in the cases we have here, it's negligible compared to the query text. IMHO, it's not an argument for choosing either solution.

IMHO, it's more logical to remove it, as the text after "statement:" is not a statement in the extended query protocol case. I chose the other solution to be consistent with the choices Bruce made before.

Regards,
--
Guillaume
Re: AIX shared libraries (was Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries)
Tom Lane wrote: Is it possible that the rules have changed across AIX versions, and that the code in there now is needful for older versions? I don't think that this behaviour has changed. I remember it from AIX 4.3.2. AIX 4.3 is the first version to support the -brtl flag. The current code is in place to mimic the behaviour of dlopen, etc, on the older platforms. I think we are at a point that we can stop maintaining AIX older than 4.3 if we want. -rocco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
On Wed, Sep 13, 2006 at 10:47:09AM -0400, Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Tom Lane wrote: Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join. effective_cache_size was set to 10GB(my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt Interesting. It used to be that effective_cache_size wasn't all that critical... what I think this report is showing is that with the 8.2 changes to try to account for caching effects in repeated indexscans, we've turned that into a pretty significant parameter. It'd be nice not to have to depend on the DBA to give us a good number for this setting. But I don't know of any portable ways to find out how much RAM is in the box, let alone what fraction of it we should assume is available per-query. That's fairly straight-forward, if a little crude. We ask the DBA and provide some tools for estimating and tuning same. :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
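David's "ask the DBA and provide some tools" suggestion could be scripted, at least per-platform. As a purely illustrative sketch (and only on Linux — which is exactly Tom's portability objection), one could derive a starting value for effective_cache_size from total RAM; the 75% fraction here is an arbitrary assumption for the example, not any project recommendation:

```shell
# Illustrative only: suggest an effective_cache_size from total RAM on
# Linux via /proc/meminfo. The 75% fraction is an assumption made for
# this sketch; a real tool would let the DBA tune it.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
suggested_mb=$(( total_kb * 3 / 4 / 1024 ))
echo "effective_cache_size = ${suggested_mb}MB"
```

The reliance on /proc/meminfo is the point: there is no portable equivalent, which is why this stays a DBA-side tool rather than something the server can do for itself.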
Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
The patch did not work for me :-( My buildfarm members failed in local testing to execute the install-check, because initdb failed to find libpq.so. Make check did succeed, so I think there is a possibility of getting it working, but it won't be as simple as adding -brtl to the template. I was also getting duplicate symbols in the link that I wasn't getting before. I don't have time right now to work through all the issues, but wanted to give feedback that the patch isn't quite this simple. Thanks, -rocco -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Albe Laurenz Sent: Tuesday, September 12, 2006 9:01 AM To: Tom Lane *EXTERN* Cc: pgsql-hackers@postgresql.org Subject: Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries Tom Lane wrote: In our case, we have libpq.a and libpq.so in the same directory, so unless you link with -brtl you will get a static link (because libpq.a is a static library). I wonder whether we ought to suppress building (or at least installing) our .a libraries at all on AIX. Adding -brtl to LDFLAGS would help within the context of our own build, but external clients that link to libpq without saying that are going to get undesirable results. I think there's a reasonable argument that by installing a .a file that isn't a shared library, we are violating the platform's conventions. The natural way in AIX would be: - Create libpq.so - Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so' - Install only libpq.a For a static build on AIX, you have to specify all the libraries and give the linker -bstatic and -bI:/lib/syscalls.exp Should -brtl be added to src/template/aix? Sounds that way, but that'll only help for psql and other stuff built within our build.
Could you try this against CVS tip: * add -brtl to LDFLAGS in the template * Remove the AIX-specific hack on $(libpq) at lines 349-354 of src/Makefile.global.in * see if it configures and builds I have done that (see the attached patch) and it works fine. I don't have the native AIX C compiler, so I could only test it with gcc. I have taken the liberty to modify the static link line in Makefile.global.in to contain the LDAP libraries, I hope that's appropriate. Yours, Laurenz Albe ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
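The "natural way in AIX" steps Laurenz lists can be sketched with plain ar(1). This is a self-contained toy: the libpq.so below is a dummy file created in a temp directory, standing in for the real shared library, purely to show the archive-wrapping convention:

```shell
# Sketch of the AIX convention described above: ship libpq.a as an
# archive whose only member is the shared object, so that linking
# against libpq.a yields a dynamic link. libpq.so is a dummy stand-in.
workdir=$(mktemp -d)
cd "$workdir"
printf 'dummy shared object contents\n' > libpq.so
rm -f libpq.a
ar -rc libpq.a libpq.so     # wrap the .so in an archive
ar -t libpq.a               # list the archive members
```

On a real AIX system only the resulting libpq.a would be installed, per the steps in the message above.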
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
The patch did not work for me :-( My buildfarm members failed in local testing to execute the install-check, because initdb failed to find libpq.so. Make check did succeed, so I think there is a possibility of getting it working, but it won't be as simple as adding -brtl to the template. I suspect that up to now the buildfarm had a static build of PostgreSQL. What is the output of 'ldd initdb' when it builds and runs correctly? Is libpq.so in a non-standard directory? If yes, one either has to export LIBPATH in the environment or link with -L/location/of/libpq for the executable to find it (similar to RPATH in Linux). Yours, Laurenz Albe ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Inconsistency in extended-query-protocol logging
Guillaume Smet [EMAIL PROTECTED] writes: IMHO, it's more logical to remove it as the text after statement: is not a statement in the extended query protocol case. Well, I was never particularly enamored of the idea that we should put statement: into everything generated by log_statement. My proposal would be to generate

    statement: querystring              Simple Query
    parse stmt: querystring             Parse
    bind stmt/portal: querystring       Bind
    execute stmt/portal: querystring    Execute

or these prefixed with duration: xxx, as appropriate. Bruce was pretty hot about having statement: in there, so the hard part might be to convince him. Also, the current code distinguishes a fetch from an already-partially-executed portal ... do you care about that? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Inconsistency in extended-query-protocol logging
On 9/13/06, Tom Lane [EMAIL PROTECTED] wrote:

    statement: querystring              Simple Query
    parse stmt: querystring             Parse
    bind stmt/portal: querystring       Bind
    execute stmt/portal: querystring    Execute

I agree with that. Hmmm, AFAICS, you changed stmt/portal to portal to statement in your last commit. Or did I misunderstand? or these prefixed with duration: xxx, as appropriate. Bruce was pretty hot about having statement: in there, so the hard part might be to convince him. Bruce, any opinion? I really think the extended query protocol is not a statement strictly speaking. Also, the current code distinguishes a fetch from an already-partially-executed portal ... do you care about that? I don't really understand what a portal is - I must admit I don't use libpq directly. I never saw a log file with fetch. Do you have an example? Should I consider an execute from fetch differently? -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] contrib uninstall scripts need some love
Joshua D. Drake [EMAIL PROTECTED] writes: I have seen some patches come across with this. Is this done, or do I still need to work on it? Teodor added an uninstall for hstore, and I think Michael has fixed all the problems in the existing scripts, but we still lack uninstall scripts for the other modules I mentioned. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] CVS commit messages and backpatching
FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Not only is this valuable for making the release notes, but it also helps people looking at the commit message. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Optimizer improvements: to do or not to do?
Simon Riggs wrote: On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote: That's what I want to do: 1. Replace not very useful indexCorrelation with indexClustering. An opinion such as not very useful isn't considered sufficient explanation or justification for a change around here. Not sufficient for some types of data would have been more fair. I speculate that a new additional stat of average # of unique values for a column within a block would go a long way to helping my worst queries. It's common here for queries to vastly overestimate the number of pages that would need to be read because postgresql guesses the correlation to be practically 0, despite the fact that the distinct values for any given column are closely packed on a few pages. Our biggest tables (180G or so) are mostly spatial data with columns like City, State, Zip, County, Street, School District, Police Beat, lat/long, etc.; and we cluster the table on zip, street. Note that practically all the rows for any single value of any of the columns will lie in the same few blocks. However, the calculated correlation comes out low because the total ordering of the other values doesn't match that of zip codes. This makes the optimizer vastly overestimate the cost of index scans because it guesses that most of the table will need to be read, even though in reality just a few pages are needed. If someone does look at the correlation calculations, I hope this type of data gets considered as well. I speculate that a new stat of average # of unique values for a column within a block could be useful here in addition to correlation. For almost all the columns in my big table, this stat would be 1 or 2, which I think would be a useful hint that despite a low correlation, the distinct values are indeed packed together in blocks. That way the optimizer can see that a smaller number of pages would need to be accessed than correlation alone would suggest. Does this make sense, or am I missing something?
---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian wrote: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Backpatches usually happen after the commit to the head branch has been made, so the information is not available at that time. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CVS commit messages and backpatching
Peter Eisentraut wrote: Bruce Momjian wrote: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Backpatches usually happen after the commit to the head branch has been made, so the information is not available at that time. Well, I have a script where I create a single commit message that I use to apply to multiple branches. Usually you know at the time of commit to HEAD that you are going to backpatch. And in such cases, it is helpful to mention it as part of the HEAD commit. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
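Bruce's habit of reusing one commit message across branches can be caricatured in a few lines. This is a hypothetical helper, not his actual script; the branch names and message wording are invented for the example:

```shell
# Hypothetical sketch of a "one commit message for all branches" helper:
# record the backpatch targets inside the message itself, so the HEAD
# commit mentions them. All names here are made up for illustration.
workdir=$(mktemp -d)
branches="REL8_1_STABLE REL8_0_STABLE"
{
  echo "Fix example overflow in example_func()."
  echo
  echo "Backpatch to: $branches"
} > "$workdir/commit-msg.txt"
cat "$workdir/commit-msg.txt"
```

The same file would then be passed to the commit command on HEAD and on each back branch, which is what makes the backpatch visible from the HEAD log alone.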
Re: [HACKERS] GIN documentation
On Wed, 2006-09-13 at 14:24 +0400, Teodor Sigaev wrote: Patch adds GIN documentation and slightly improves GiST docs. Somebody of native English speakers, pls, check the text... Thank you. I have some minor changes that reworded some parts that were confusing to me. I don't fully understand GIN, but I hope this is helpful. Regards, Jeff Davis --- gindocs.orig 2006-09-13 10:07:57.0 -0700 +++ gindocs 2006-09-13 10:43:25.0 -0700 @@ -89,7 +89,7 @@ +acronymGIN/acronym stands for Generalized Inverted Index. It is +an index structure storing a set of (key, posting list) pairs, where +'posting list' is a set of rows in which the key occurs. The -+row may contains a lot of keys. ++row may contain many keys. + /para + + para @@ -209,10 +209,10 @@ +termCreate vs insert/term +listitem + para -+ In most cases, insertion into acronymGIN/acronym index is slow enough -+ due to a lot keys should be inserted per one value. So, for bulk upload -+ data in table it will be useful to drop index and create it -+ after finishing upload. ++ In most cases, insertion into acronymGIN/acronym index is slow because ++ many GIN keys may be inserted for each table row. So, when loading data ++ in bulk it may be useful to drop index and recreate it ++ after the data is loaded in the table. + /para +/listitem + /varlistentry @@ -381,9 +381,8 @@ +para + Short-term share/exclusive page-level locks are used for + read/write access. Locks are released immediately after each -+ index row is fetched or inserted. But note, that GIN index -+ usually requires produce several inserts per one row, so, -+ GIN makes more work per one value's insertion. ++ index row is fetched or inserted. However, note that GIN index ++ usually requires several inserts per one table row. +/para + /listitem + /varlistentry ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] - Proposal for repreparing prepared statements
The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. I look forward to comments from the community. I propose creating a new system administration function to force repreparation of prepared statements in all backends. The functionality could be extended to include re-initialization of other kinds of per-backend data. This proposal addresses, to some degree, the prepare-alter-exec issue discussed in various mailing list postings, and the following wish-list item: # Invalidate prepared queries, like INSERT, when the table definition is altered However, the solution would only be partial, as it would be the responsibility of database clients to call the system administration function when needed. Alternately, additional integration work could be done to invoke this logic automatically whenever the columns of any table are altered. -- Here is what I propose: We define a new system administration function called pg_reload_per_backend_data. This function would work much like pg_reload_conf, i.e. it would require superuser privileges and would work by sending a signal to the postmaster that would then be propagated to all the child backends (but not the special ones, like the bgwriter). The signal handling logic for the backends would be modified to respond to the signal by reinitializing any data cached in the backend's memory space, such as prepared statements. Each kind of data that would be reinitialized would require special logic, as they would all be reinitialized in their own particular way. Choosing an appropriate signal to send might be difficult, as the list of available signals is somewhat restricted. 
The user-defined signals would be a natural choice, but it appears SIGUSR1 is used for sinval or catchup events, while SIGUSR2 is used for asynchronous notification. Use of the real time signals (signal numbers >= 32) might be possible, but could have portability problems. Another alternative would be to overload SIGHUP, so that it causes both configuration reloads and reloading of per-backend data. This makes some sense, since most configuration parameters are basically a special form of per-backend data. However, changing the behavior of an existing signal might have undesirable side effects. Overall, I'm very open to suggestions regarding the appropriate signal to use. To implement the repreparation logic, a new function called RepreparePreparedStatements() could be added to source files backend/commands/prepare.[ch]. This function would be called by a signal handler installed in the backends within backend/tcop/postgres.c. RepreparePreparedStatements would do the equivalent of iterating over the prepared_queries hash table and executing DropPreparedStatement() and PrepareQuery() on each. However, it is possible that some refactoring of the logic would be needed to improve performance and make the code more robust. The scope of pg_reload_per_backend_data could also be expanded to include reinitialization of other data that resides in the memory space of individual backend processes. An example of such cached entities are the reusable modules associated with a particular procedural language, e.g. the TCL modules found in the table pltcl_modules. Once such a module is used in a particular backend, it remains held in backend memory and changes to the disk version are not noticed. There is also no way to undefine any global variables associated with such modules.
I have not given much consideration to the implementation for reloading modules, but doing the equivalent of the SQL command LOAD 'libname' for all dynamically loaded libraries should have the desired effect (at least it does for the library that implements the PL/TCL language, pltcl.so). Perhaps the general response should be to reload any libraries that have been dynamically loaded by the particular backend. -- Here are a few permutations of this plan that could be considered:

1. Bundle pg_reload_per_backend_data functionality with pg_reload_conf.
   Pros: Avoids having to find an appropriate unused signal. Logical consistency with reloading config, which could be considered a special case of reloading per-backend data.
   Cons: Changes the behavior of existing functionality, which has the risk of unintended side effects. Gives less fine-grained control over when per-backend data is reloaded.

2. Break pg_reload_per_backend_data functionality into multiple functions.
   Pros: Can assign more descriptive names to the functionality, e.g. pg_reload_ddl, pg_reprepare_statements, etc. Finer-grained control over which kind of reloading is performed.
   Cons: Requires more use of the scarce list of available signals.
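The control flow the proposal describes — postmaster sends a signal, each backend's handler triggers the reload — can be caricatured in a few lines of shell. This is a toy model only: the trap below stands in for the backend signal handler that would call RepreparePreparedStatements(), and nothing here is actual PostgreSQL code:

```shell
# Toy model of the proposed mechanism: a parent process signals a
# "backend", whose trap stands in for the signal handler that would
# reprepare statements. All names and messages are illustrative.
out=$(
  (
    trap 'echo "backend: reloading per-backend data"; exit 0' USR1
    echo "backend: ready"
    while :; do sleep 0.1; done   # idle, waiting for the signal
  ) &
  backend=$!
  sleep 0.5                       # give the child time to install its trap
  kill -USR1 "$backend"           # "postmaster" propagates the signal
  wait "$backend"
)
echo "$out"
```

The real mechanism would of course not do work in the handler itself, as Tom notes later in the thread; the handler would only set a flag checked at a safe point.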
Re: [HACKERS] Lock partitions
Simon, In the 16/16 (16 buffer partitions/16 lock partitions) test, the WALInsertLock lock had 14643080 acquisition attempts and 12057678 successful acquisitions on the lock. That's 2585402 retries on the lock. That is to say that PGSemaphoreLock was invoked 2585402 times. In the 128/128 test, the WALInsertLock lock had 14991208 acquisition attempts and 12324765 successful acquisitions. That's 2666443 retries. The 128/128 test attempted 348128 more lock acquisitions than the 16/16 test and retried 81041 times more than the 16/16 test. We attribute the rise in WALInsertLock lock accesses to the reduction in time spent acquiring the BufMapping and LockMgr partition locks. Does this seem reasonable? The overhead of any monitoring is of great concern to us. We've tried both clock_gettime() and gettimeofday() calls. They both seem to have the same overhead, ~1 us/call (measured against the TSC of the CPU), and both seem to be accurate. We realize this can be a delicate point and so we would be happy to rerun any tests with a different timing mechanism. David -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:22 AM To: Tom Lane Cc: Strong, David; PostgreSQL-development Subject: Re: [HACKERS] Lock partitions On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote: Strong, David [EMAIL PROTECTED] writes: When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks.
While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. Good thinking David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in fewer == better thoughts. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue. It seems reasonable that relieving the bottleneck on BufMapping and LockMgr locks that we would then queue longer on the next bottleneck, WALInsertLock. 
So again, those tests seem reasonable to me so far. These seem to be the beginnings of accurate wait time analysis, so I'm listening closely. Are you using a lightweight timer? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian [EMAIL PROTECTED] writes: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Not only is this valuable for making the release notes, but it also helps people looking at the commit message. With the standard output from cvs2cl, this is pretty obvious anyway, no? I see entries like

    2006-08-29 09:39  teodor

        * contrib/tsearch2/: tsvector.c (REL8_1_STABLE), tsvector.c:
        Remove pos comparison in silly_cmp_tsvector(): it is not a
        semantically significant

so it seems to me that explicit mention of back-patching is mostly redundant. (Of course, this requires the committer to commit all the branches at about the same time, which I make an effort to do precisely so that the cvs log looks nice. If some time elapses between patching and back-patching then a mention in the commit message is definitely needed.) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] AIX shared libraries
[EMAIL PROTECTED] (Rocco Altier) writes: Tom Lane wrote: Is it possible that the rules have changed across AIX versions, and that the code in there now is needful for older versions? I don't think that this behaviour has changed. I remember it from AIX 4.3.2. AIX 4.3 is the first version to support the -brtl. The current code is in place to mimic the behaviour of dlopen, etc, on the older platforms. I think we are at a point that we can stop maintaining AIX older than 4.3 if we want. Version 5.1 is no longer being maintained by IBM; we were somewhat displeased when we heard that support offerings were expiring :-(. Fortunately, we already had plans in place for a migration to 5.3. I have to agree that even 4.3 is really rather old now. Looking at IBM's support lifecycle list... http://www-306.ibm.com/software/info/supportlifecycle/list/a.html

    AIX version    Available    Support Withdrawn
    -----------    ---------    -----------------
    5.1            May 2001     April 2006
    5.2            Oct 2002     Sept 2008
    5.3            Aug 2004     unannounced, presumably late 2010...

I'd guess that 4.3 fell out of support in late 2004. -- let name=cbbrowne and tld=linuxdatabases.info in String.concat @ [name;tld];; http://linuxfinances.info/info/ Of course 5 years from now that will be different, but 5 years from now everyone will be running free GNU on their 200 MIPS, 64M SPARCstation-5. -- Andrew Tanenbaum, 1992. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Stephen Marshall [EMAIL PROTECTED] writes: The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. There should be no need for users to concern themselves with this. The direction we've been intending to go in is to automatically invalidate stored plans when any related schema or statistics change occurs, forcing a re-plan on any subsequent use. See past discussions (IIRC, Neil Conway actually did some work on this idea earlier this year, but didn't get it done). The appropriate cross-backend communication mechanism already exists: it's the catcache/relcache invalidation code. No need to fool with finding a spare signal; and you can't do any meaningful work in a signal handler anyway. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Getting a move on for 8.2 beta
Jim C. Nasby wrote: There's been talk in the past of having some kind of system that automatically attempts to build things that are in the patch queue, both as an initial sanity-check and as a means to detect when something bit-rots... perhaps it's becoming worthwhile to set that up. After writing the enum patch, I hacked the buildfarm client code to apply a patch to the checked out code before building. You could then run it thusly:

    ./run_build.pl --nosend --nostatus --verbose \
        --patch=/home/tom/src/enums-v1.patch --patch-level=1

The idea was that patch authors could either run it manually or stick it in a cron job so they could get emailed when the patch no longer cleanly applied, or when the patched source failed in make, make check, etc. Obviously my motivation was to keep the enum patch up to date until we hit 8.3 and someone looks at it. To that end it might also be useful for it to die if duplicate_oids finds anything. I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so. Note that it was intended for patch authors to run themselves rather than any kind of central mechanism to test the patch queue. While it would obviously be nice to know what the current status of any given patch in the queue is, the thing about the patch queue is that it contains patches that we haven't had time to review yet. It'll only take one patch containing a security vulnerability, or worse, a trojan, getting into the queue for that to seem unfortunate. I had thoughts of hacking the buildfarm server to allow the posting of a patch along with results, so that authors could report results for their own patches, but ran out of time. Is there interest in doing that? Obviously it'd be a different server to the existing buildfarm.
Cheers Tom ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
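The "does the patch still apply?" half of the bit-rot check discussed above boils down to a dry run of patch(1). This sketch is self-contained: the tree and the patch are throwaway stand-ins created on the spot, not anything from the real queue, and a real harness would of course go on to run make and make check:

```shell
# Sketch of a bit-rot check: attempt the patch in dry-run mode and
# report. tree/ and fix.patch are dummies standing in for a checkout
# and a queued patch.
workdir=$(mktemp -d)
cd "$workdir"
mkdir tree
printf 'line one\n' > tree/file.txt
cat > fix.patch <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1 +1 @@
-line one
+line two
EOF
if (cd tree && patch --dry-run -p1 < ../fix.patch >/dev/null); then
    echo "patch still applies cleanly"
else
    echo "patch has bit-rotted"
fi
```

Run from cron per patch in the queue, the failure branch is what would trigger the email to the author that Tom describes.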
Re: [HACKERS] CVS commit messages and backpatching
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Not only is this valuable for making the release notes, but it also helps people looking at the commit message. With the standard output from cvs2cl, this is pretty obvious anyway, no? I see entries like 2006-08-29 09:39 teodor * contrib/tsearch2/: tsvector.c (REL8_1_STABLE), tsvector.c: Remove pos comparison in silly_cmp_tsvector(): it is not a semantically significant so it seems to me that explicit mention of back-patching is mostly redundant. (Of course, this requires the committer to commit all the branches at about the same time, which I make an effort to do precisely so that the cvs log looks nice. If some time elapses between patching and back-patching then a mention in the commit message is definitely needed.) I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] - Proposal for repreparing prepared statements
Tom, Thanks for the update on the roadmap. I have a couple of questions: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? This is needed by backends that use loadable modules with procedural languages (like pltcl_modules) in cases where a loadable module gets updated. Thanks, Steve -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:08 PM To: Marshall, Steve Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] - Proposal for repreparing prepared statements Stephen Marshall [EMAIL PROTECTED] writes: The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. There should be no need for users to concern themselves with this. The direction we've been intending to go in is to automatically invalidate stored plans when any related schema or statistics change occurs, forcing a re-plan on any subsequent use. See past discussions (IIRC, Neil Conway actually did some work on this idea earlier this year, but didn't get it done). The appropriate cross-backend communication mechanism already exists: it's the catcache/relcache invalidation code. No need to fool with finding a spare signal; and you can't do any meaningful work in a signal handler anyway. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CVS commit messages and backpatching
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ Well, my script produces output that is closer to what I need to create the release notes. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. No, it did not get done; feel free to work on it for 8.3. 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? I would vote urgently against that, as not all loadable libraries are necessarily designed to survive being reloaded. Nor have I heard any previous requests for it; how often do loadable libraries get updated in production? Stuff like pltcl.so would only change at a version update, for which you have to restart the postmaster anyway to bring in the new executable image for postgres itself. The LOAD command is meant for development, not as something that would be invoked in production ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Optimizer improvements: to do or not to do?
Ron Mayer [EMAIL PROTECTED] writes: It's common here for queries to vastly overestimate the number of pages that would need to be read because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. That said I'm now going to do exactly what I just said we should stop doing and brain storm about an ad-hoc metric that might help: I wonder if what we need is something like: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records. We might also need a separate average number of n-block spans per value metric to predict how sequential the i/o will be in addition to how many pages will be fetched. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
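Gregory's brainstormed metric could be prototyped outside the backend first. A minimal sketch, assuming the ANALYZE sample were available as (value, block) pairs (SampleRow and avg_distinct_blocks_per_value are invented names): sort by value, then count distinct blocks per distinct value.

```c
#include <stdlib.h>

/* Sketch of the proposed metric: sort sampled rows by value and count
 * the average number of distinct blocks each value appears on.  All
 * names are invented for illustration. */

typedef struct SampleRow
{
    int value;
    int block;
} SampleRow;

static int
cmp_rows(const void *a, const void *b)
{
    const SampleRow *x = a, *y = b;

    if (x->value != y->value)
        return (x->value > y->value) - (x->value < y->value);
    return (x->block > y->block) - (x->block < y->block);
}

/* Average number of distinct blocks per distinct value in the sample. */
static double
avg_distinct_blocks_per_value(SampleRow *rows, int n)
{
    int distinct_values = 0, block_runs = 0;

    qsort(rows, n, sizeof(SampleRow), cmp_rows);
    for (int i = 0; i < n; i++)
    {
        if (i == 0 || rows[i].value != rows[i - 1].value)
        {
            distinct_values++;
            block_runs++;           /* first block for this value */
        }
        else if (rows[i].block != rows[i - 1].block)
            block_runs++;           /* another distinct block for same value */
    }
    return distinct_values ? (double) block_runs / distinct_values : 0.0;
}
```

A value whose rows are packed on few pages scores near 1.0 even when correlation over the whole table is near 0, which is exactly the case Ron describes being mis-costed.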
Re: [HACKERS] Lock partitions
Tom, We have some results for you. We left the buffer partition locks at 128 as this did not seem to be a concern and we're still using 25 backend processes. We ran tests for 4, 8 and 16 lock partitions. For 4 lock partitions, it took 620 seconds to acquire locks and 32 seconds to release locks. The test produced 199.95 TPS. For 8 lock partitions, it took 505 seconds to acquire locks and 31 seconds to release locks. The test produced 201.16 TPS. For 16 lock partitions, it took 362 seconds to acquire locks and 22 seconds to release locks. The test produced 200.75 TPS. And, just for grins, using 128 buffer and 128 lock partitions, it took 235 seconds to acquire locks and 22 seconds to release locks. The test produced 203.24 TPS. Let me know if we can provide any additional information from these tests and if there are any other tests that we can run. David

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Strong, David Sent: Wednesday, September 13, 2006 10:52 AM To: PostgreSQL-development Subject: Re: [HACKERS] Lock partitions

Simon, In the 16/16 (16 buffer partitions/16 lock partitions) test, the WALInsertLock lock had 14643080 acquisition attempts and 12057678 successful acquisitions on the lock. That's 2585402 retries on the lock. That is to say that PGSemaphoreLock was invoked 2585402 times. In the 128/128 test, the WALInsertLock lock had 14991208 acquisition attempts and 12324765 successful acquisitions. That's 2666443 retries. The 128/128 test attempted 348128 more lock acquisitions than the 16/16 test and retried 81041 times more than the 16/16 test. We attribute the rise in WALInsertLock lock accesses to the reduction in time on acquiring the BufMapping and LockMgr partition locks. Does this seem reasonable? The overhead of any monitoring is of great concern to us. We've tried both clock_gettime() and gettimeofday() calls.
They both seem to have the same overhead ~1 us/call (measured against the TSC of the CPU) and both seem to be accurate. We realize this can be a delicate point and so we would be happy to rerun any tests with a different timing mechanism. David -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:22 AM To: Tom Lane Cc: Strong, David; PostgreSQL-development Subject: Re: [HACKERS] Lock partitions On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote: Strong, David [EMAIL PROTECTED] writes: When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks. While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. Good thinking David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in fewer == better thoughts. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. 
However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue. It seems reasonable that relieving the bottleneck on BufMapping and LockMgr locks that we
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ Well, my script produces output that is closer to what I need to create the release notes. If there are procedures, please document them. Nobody ever told me much when I was given committer status, and I just did what it looked like you guys did, and no doubt made some mistakes. cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Optimizer improvements: to do or not to do?
On Sep 13, 2006, at 14:44, Gregory Stark wrote: I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. That said I'm now going to do exactly what I just said we should stop doing and brain storm about an ad-hoc metric that might help: I wonder if what we need is something like: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records. We might also need a separate average number of n-block spans per value metric to predict how sequential the i/o will be in addition to how many pages will be fetched. Currently, statistics are only collected during an ANALYZE. Why aren't statistics collected during actual query runs such as seq scans? One could turn such a beast off in order to get repeatable, deterministic optimizer results. -M ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. No, it did not get done; feel free to work on it for 8.3. [steve's reply]: Could you clue me into the buzz words for searching the lists to find out more details on the plan, what has been done, and what is left to do? 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? I would vote urgently against that, as not all loadable libraries are necessarily designed to survive being reloaded. Nor have I heard any previous requests for it; how often do loadable libraries get updated in production? Stuff like pltcl.so would only change at a version update, for which you have to restart the postmaster anyway to bring in the new executable image for postgres itself. The LOAD command is meant for development, not as something that would be invoked in production [steve's reply]: I can understand your reservations about reloading libraries, and I'm really not committed to doing that. Reloading libraries is just a mechanism to do what I really want to do. I guess that did not come across very clearly in my first email, so let me explain again. I want to be able to tell a backend to reinitialize some of the cached data it is holding in static variables. Generally I want to do this when something cached in memory has gotten out-of-sync with what is in the database. The two examples that affect me are: 1. A database table has changed in its structure, and the prepared statements related to it either no longer work, or do the wrong thing. 2. A TCL module in pltcl_modules is updated with a new version, but running processes that have already executed a function from this module do not pick up the change. 
The former is handled by the prepared statement invalidation plan, but the latter is not. Effectively, I want something that does the equivalent of what CREATE OR REPLACE FUNCTION does for SQL functions, except for these procedural language modules. The current way pltcl_modules works is like having a CREATE OR REPLACE FUNCTION capability that only affects newly started connections, but has no effect on existing connections. The reason I advocated the library reloading is not because pltcl.so has changed (which should be rare), but that by reloading it, the static variables associated with the library would get reinitialized. Essentially reloading the library pltcl.so has the side effect of causing the TCL modules in pltcl_modules to get reread from disk. I'd love to have a lighter-weight way to do this, but unfortunately, I have not come up with one. If someone has a better way to do this, I'm all ears. Currently the only way I know of to do a complete reinitialization of static variables is to force a disconnection on all the backend processes, and let them all reconnect to new postgres backends that have clean slates in their static variables. There does not seem to be a way to do this via SQL; as far as I know, this requires access to the database server as the postgres user or superuser and sending a TERM signal to each backend you want to kill. Basically, I've come up with two basic approaches for forcing a reload of pltcl_modules: have a system admin function to do some kind of resetting in each backend that cleans out the static variables associated with a procedural language, or kill most or all of the backends and let the client applications handle the errors and reset their connections. Currently, the only resetting operation I know to work is reloading the pltcl.so library. If anyone has any better ideas for how to crack this problem, I'd love to hear them. Here are some more details on the approaches I've thought of, for those who are interested: 1. 
Add special logic to pltcl_loadmod (the utility that facilitates inserting and updating TCL modules in the database). However, this would still require some kind of communication to all the other backend processes, which would probably require something like the next suggestion. 2. A system admin command to force the reinitialization of the static data associated with a procedural language. This command would need to communicate either with all the backends, or with a backend with a particular pid (like pg_cancel_backend does); it is not sufficient for it only to affect the backend associated with its connection. This command could be specific to the procedural language (e.g. reload_pltcl_modules), or allow a particular dynamically linked library to be reloaded (reload_library_if_already_loaded(libname)). There are probably other ways to effect the reinitialization, but all of them seem to require communication to a backend process other than the one we are connected to, which made it seem to me like a candidate for
Re: [HACKERS] Fixed length data types issue
On Sep 11, 2006, at 1:57 PM, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every if in that inner loop is expensive. I'll have to spend a few hours tomorrow becoming one with that section of code. I looked at it already and was surprised at how short it was already so I can understand what you mean. I'm guessing this won't change the design, but I'll throw it out anyway. I'd love to have the ability to control toasting thresholds manually. This could result in a lot of speed improvements in cases where a varlena field isn't frequently accessed and will be fairly large, yet not large enough to normally trigger toasting. An address field would be a good example. Being able to force a field to be toasted before it normally would could drastically improve tuple density without requiring the developer to use a 'side table' to store the data. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Getting a move on for 8.2 beta
On Sep 13, 2006, at 6:56 PM, Tom Dunstan wrote: Regarding the idea of a list of approved patch authorisers, don't we have such a group now? i.e. committers. Right, and if committers or others are willing to put in the time required to verify that patches aren't nasty before going onto the blessed patch queue, the idea could quite possibly work and provide some value. Note that all we really need to test for here is that the patch isn't malicious; patches that are bad design or whatever are unlikely to open security holes or fry your box. A major point of the queue is that the appropriate committer often doesn't have the time to review the patch right now. There might be some benefit in allowing a wider set of contributors to bless patches as non- nasty for testing purposes, rather than waste the limited time of qualified committers. Maybe such an army doesn't exist, though. That's something I'd be willing to do. And for many people that aren't committers but are still trusted in the community, we could probably bypass the checking. Another possibility would be to test these patches in some kind of virtual machine that gets blown away every X days, so that even if someone did get something malicious in there it wouldn't last long. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS commit messages and backpatching
Andrew Dunstan wrote: Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ Well, my script produces output that is closer to what I need to create the release notes. If there are procedures, please document them. Nobody ever told me much when I was given committer status, and I just did what it looked like you guys did, and no doubt made some mistakes. I guess the question is whether it is possible using cvs2cl to show only HEAD, and then show if the same commit message also appears in the most recent back branch. And will that always work reliably? -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Lock partitions
Strong, David [EMAIL PROTECTED] writes: We have some results for you. We left the buffer partition locks at 128 as this did not seem to be a concern and we're still using 25 backend processes. We ran tests for 4, 8 and 16 lock partitions. For 4 lock partitions, it took 620 seconds to acquire locks and 32 seconds to release locks. The test produced 199.95 TPS. For 8 lock partitions, it took 505 seconds to acquire locks and 31 seconds to release locks. The test produced 201.16 TPS. For 16 lock partitions, it took 362 seconds to acquire locks and 22 seconds to release locks. The test produced 200.75 TPS. And, just for grins, using 128 buffer and 128 lock partitions, took 235 seconds to acquire locks and 22 seconds to release locks. The test produced 203.24 TPS. [ itch... ] I can't help thinking there's something wrong with this; the wait-time measurements seem sane, but why is there essentially no change in the TPS result? The above numbers are only for the lock-partition LWLocks, right? What are the totals --- that is, how much time is spent blocked vs. processing overall? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Lock partitions
On Sep 13, 2006, at 2:46 PM, Strong, David wrote: We have some results for you. We left the buffer partition locks at 128 as this did not seem to be a concern and we're still using 25 backend processes. We ran tests for 4, 8 and 16 lock partitions. Isn't having more lock partitions than buffer partitions pointless? -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: I want to be able to tell a backend to reinitialize some of the cached data it is holding in static variables. Generally I want to do this when something cached in memory has gotten out-of-sync with what is in the database. The two examples that affect me are: 1. A database table has changed in its structure, and the prepared statements related to it either no longer work, or do the wrong thing. This we need to fix. 2. A TCL module in pltcl_modules is updated with a new version, but running processes that have already executed a function from this module do not pick up the change. To be frank, this is not Postgres' problem, it's pltcl's. I think the pltcl_modules facility is poorly designed (precisely because it doesn't fit into any sane way of handling the schema-update problem) and needs to be thrown out and redone. If the units you were reloading were Postgres functions, or had some other way of being represented in the system catalogs, then we'd have a reasonable way to attack the problem. But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Mark Dilger wrote: Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built-in type. (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = text, ascii3 = text, etc.) This would work all right for the string-category cases, since TEXT is the only thing you really care about having them cast to anyway. It probably won't work all that well for int1/int3, because you really want them to coerce implicitly to all the wider numeric types. Otherwise, perfectly sane queries like int8 + int1 fail. Part of the issue here is that we deliberately keep the parser from searching for multi-step coercions. So for example if you only provide int1->int2 then the existence of up-casts from int2 doesn't help you use an int1 with anything except int2. I am not sure whether any problems would be created if you did provide the full spectrum of up-casts. I remember having argued that there would be problems with trying to invent uint2/uint4 types, but that was a very long time ago, before we had pg_cast and some other changes in the type resolution rules. With the current system it might work OK. regards, tom lane Thanks Tom, I will try this then. I won't be proposing to ever put this in core, as the increased code size isn't justified for people who aren't using these types (IMHO). Any further feedback on why this wouldn't work is appreciated, as it might save me some time learning on my own. But otherwise I'll post back in a few days when this is finished. I've created the int1 and int3 types, with casts to/from each other as well as to/from int2, int4, int8, float4, float8, and numeric. They also have comparison operators for themselves, though you have to use casts if you want to compare against other numeric types. int1 works perfectly, as far as I can tell. 
int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table. Does anybody know whether storing data by reference works for data smaller than 4 bytes? If not, then I seem to be out of options for creating a 3-byte datatype. This would also seem to prevent the creation of an ascii3 type as well. mark tiny.tgz Description: GNU Unix tar archive ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Mark Dilger [EMAIL PROTECTED] writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table. Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. I'm wondering though what alignment you expect it to have. You'd need some pretty ugly code to pick up an unaligned 3-byte integer portably ... but if you align it, the space savings probably goes out the window. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
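The "pretty ugly code" Tom alludes to would look roughly like the following: a self-contained sketch of portably reading and writing a signed 3-byte integer stored unaligned, one byte at a time with explicit sign extension. A little-endian byte layout is assumed here purely for illustration; the function names are invented, not from any patch.

```c
#include <stdint.h>

/* Read a signed 24-bit integer from an unaligned buffer, byte by byte,
 * so no aligned load is ever issued.  Little-endian layout assumed. */
static int32_t
read_int3(const unsigned char *p)
{
    uint32_t u = (uint32_t) p[0]
               | ((uint32_t) p[1] << 8)
               | ((uint32_t) p[2] << 16);

    if (u & 0x800000)           /* sign bit of the 24-bit value set? */
        u |= 0xFF000000u;       /* extend the sign into the top byte */
    return (int32_t) u;
}

/* Write a signed 24-bit integer to an unaligned buffer. */
static void
write_int3(unsigned char *p, int32_t v)
{
    p[0] = (unsigned char) (v & 0xFF);
    p[1] = (unsigned char) ((v >> 8) & 0xFF);
    p[2] = (unsigned char) ((v >> 16) & 0xFF);
}
```

This works regardless of alignment, which is the trade-off Tom points out: byte-at-a-time access is what buys the space savings, at the cost of uglier and slower tuple decoding than an aligned 4-byte fetch.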
[HACKERS] Patch attribution and non-ASCII characters
I see a number of non-ASCII characters in the names of patch submitters in the CVS logs. Does anyone know a good way to have all these get the same encoding in the CVS commit logs? I am thinking that is impossible because we can't attach the email encoding to the commit message. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian [EMAIL PROTECTED] writes: I guess the question is whether it is possible using cvs2cl to show only HEAD, and then show if the same commit message also appears in the most recent back branch. It's so rare that we make a back-branch patch without a corresponding HEAD patch that I'm not clear why you are concerned about showing only HEAD for this purpose. I've always found that cvs2cl's behavior shows me exactly what I want to know for CVS log searching (and that includes release note making). If the output format isn't quite what you want, maybe you could turn on its XML-output option and then munge that into HTML. I've never had occasion to play with the XML-format option myself. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Lock partitions
Jim Nasby [EMAIL PROTECTED] writes: Isn't having more lock partitions than buffer partitions pointless? AFAIK they're pretty orthogonal. It's true though that a typical transaction doesn't hold all that many locks, which is why I don't see a need for a large number of lock partitions. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
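For readers following along, the partitioning under discussion maps each lock tag's hash code onto one of NUM_LOCK_PARTITIONS LWLocks, so unrelated lock-table traffic contends on different locks; buffer partitions do the same independently for the buffer mapping table, which is why the two settings are orthogonal. A self-contained sketch of the idea (the FNV-style hash below is a stand-in for illustration, not PostgreSQL's actual tag hash):

```c
#include <stdint.h>

#define NUM_LOCK_PARTITIONS 16      /* one LWLock protects each partition */

/* Stand-in hash of a lock tag (relation OID plus a key field). */
static uint32_t
tag_hash(uint32_t relid, uint32_t key)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */

    h = (h ^ relid) * 16777619u;    /* FNV prime */
    h = (h ^ key) * 16777619u;
    return h;
}

/* Map a hash code to its partition; the same tag always lands in the
 * same partition, so acquire/release agree on which LWLock to take. */
static unsigned
lock_partition(uint32_t hashcode)
{
    return hashcode % NUM_LOCK_PARTITIONS;
}
```

More partitions means less contention on any one partition LWLock, but, as Tom notes, every backend must sweep all partitions it touched at transaction end (LockReleaseAll), which is the cost of going to 128.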
Re: [HACKERS] Getting a move on for 8.2 beta
Jim Nasby wrote: That's something I'd be willing to do. And for many people that aren't committers but are still trusted in the community, we could probably bypass the checking. That's a worthwhile point. How many patches come from the general community vs out of the blue? Patches from regulars could probably get a free pass, which might cut down the review burden substantially. Another possibility would be to test these patches in some kind of virtual machine that gets blown away every X days, so that even if someone did get something malicious in there it wouldn't last long. Yeah, nasties could be roughly separated into two categories: stuff which affects your box, or stuff which uses your box to affect someone else. A VM fixes the first, and a firewall blocking outgoing connections (with exceptions for the CVS server and the patch buildfarm server, or whatever it is) largely fixes the second. I was under the impression that most VM products are x86-centric, which wouldn't lead to huge amounts of diversity in the buildfarm results. At least, not as far as architecture goes. Cheers Tom ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Optimizer improvements: to do or not to do?
Gregory Stark wrote: Ron Mayer [EMAIL PROTECTED] writes: ...vastly overestimate the number of pages .. because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. Do we know if any such people participate/lurk on this list, or if the conversation should go elsewhere? That said I'm now going to do exactly what I just said we should stop doing and brain storm about an ad-hoc metric that might help: I wonder if what we need is something like: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records. Either of these sound like they might be an improvement over correlation itself for estimating the number of pages it'd need to read. Would it be relatively easy or hard for a programmer not too familiar with the code to experiment with these ideas? Where would be a good place to look? We might also need a separate average number of n-block spans per value metric to predict how sequential the i/o will be in addition to how many pages will be fetched. I'm wildly guessing that the # of pages itself is a bigger factor than the sequential/random nature. For example, if I do a query for data from a particular small city, I'd only need dozens of pages, not many thousands. OTOH, it'd be neat to know if this were true. Is there any good way to make something like explain analyze show both the expected and actual # of pages and # of seeks? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Optimizer improvements: to do or not to do?
Ron Mayer [EMAIL PROTECTED] writes: I'm wildly guessing that, the # of pages itself seems to be a bigger factor than the sequential/random nature. No, they're both important: fetching N pages in sequence is way cheaper than fetching the same number of pages scattered all over. This is partly because you reduce seeking at the hardware level, and partly because sequential reads cue the kernel to do read-ahead, allowing overlap of I/O and computation. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
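The planner expresses exactly this asymmetry as a cost ratio: a random page fetch is charged several times a sequential one (random_page_cost defaults to 4.0 against a sequential page cost of 1.0). A toy sketch of how page count and scatter combine — fetch_cost and fraction_sequential are invented names for illustration, not planner API:

```c
/* Toy illustration of why scatter matters as much as page count: blend
 * the familiar planner defaults (sequential page = 1.0, random page =
 * 4.0, used here as assumptions) by the fraction of pages that are
 * read in sequence. */
static double
fetch_cost(int pages, double fraction_sequential)
{
    const double seq_page_cost = 1.0;
    const double random_page_cost = 4.0;

    return pages * (fraction_sequential * seq_page_cost +
                    (1.0 - fraction_sequential) * random_page_cost);
}
```

With these assumptions, 100 fully sequential pages cost 100 units while the same 100 pages scattered at random cost 400, so two estimates that agree on page count can still disagree badly on total I/O cost.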
Re: [HACKERS] - Proposal for repreparing prepared statements
But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it.

Are you saying that pltcl.so leaks memory in general, or that forcing a reload of the pltcl.so library leaks memory? If the former, I haven't seen evidence of it in my use of pltcl, but I'd be interested to know of problems others have had with it.

Yours, Steve Marshall ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Inconsistency in extended-query-protocol logging
Guillaume Smet [EMAIL PROTECTED] writes: On 9/13/06, Tom Lane [EMAIL PROTECTED] wrote:

	statement: <querystring>		-- Simple Query
	parse <stmt>: <querystring>		-- Parse
	bind <stmt>/<portal>: <querystring>	-- Bind
	execute <stmt>/<portal>: <querystring>	-- Execute

I agree with that. OK, Bruce hasn't said anything so I went ahead and changed it, as per attached patch.

Hmmm, AFAICS, you changed stmt/portal to portal to statement in your last commit. Or did I misunderstand? Yeah, that seemed like a good idea at the time ... but on reflection it's probably better to be consistent with the way the execute message is logged.

Also, the current code distinguishes a fetch from an already-partially-executed portal ... do you care about that? I don't really understand what a portal is - I must admit I don't use libpq directly. I never saw a log file with "fetch". Do you have an example? Should I consider an "execute fetch from" differently?

Well, the point is that if you see "execute <unnamed>: select ..." followed by "execute fetch from <unnamed>: select ..." the latter is a continuation of the same command, not a newly entered command. For resource-analysis purposes you can likely treat them the same, but if you were trying to debug an application you might need to know the difference.

regards, tom lane

*** src/backend/tcop/postgres.c.orig	Fri Sep  8 11:55:53 2006
--- src/backend/tcop/postgres.c	Wed Sep 13 17:51:35 2006
***************
*** 1610,1619 ****
  			break;
  		case 2:
  			ereport(LOG,
! 					(errmsg("duration: %s ms  bind %s to %s: %s",
  							msec_str,
- 							*portal_name ? portal_name : "<unnamed>",
  							*stmt_name ? stmt_name : "<unnamed>",
  							pstmt->query_string ? pstmt->query_string : "<source not stored>"),
  					 errdetail_params(params)));
  			break;
--- 1610,1620 ----
  			break;
  		case 2:
  			ereport(LOG,
! 					(errmsg("duration: %s ms  bind %s%s%s: %s",
  							msec_str,
  							*stmt_name ? stmt_name : "<unnamed>",
+ 							*portal_name ? "/" : "",
+ 							*portal_name ? portal_name : "",
  							pstmt->query_string ? pstmt->query_string : "<source not stored>"),
  					 errdetail_params(params)));
  			break;
***************
*** 1740,1747 ****
  		ereport(LOG,
  				(errmsg("%s %s%s%s%s%s",
  						execute_is_fetch ?
! 						_("statement: execute fetch from") :
! 						_("statement: execute"),
  						prepStmtName,
  						*portal_name ? "/" : "",
  						*portal_name ? portal_name : "",
--- 1741,1748 ----
  		ereport(LOG,
  				(errmsg("%s %s%s%s%s%s",
  						execute_is_fetch ?
! 						_("execute fetch from") :
! 						_("execute"),
  						prepStmtName,
  						*portal_name ? "/" : "",
  						*portal_name ? portal_name : "",

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Getting a move on for 8.2 beta
On Wed, 13 Sep 2006, Tom Dunstan wrote: Another possibility would be to test these patches in some kind of virtual machine that gets blown away every X days, so that even if someone did get something malicious in there it wouldn't last long. Or just have a snapshot which is reverted after each run, and read-only access to files used to do the build. I know vmware supports this, probably others too... Yeah, nasties could be roughly separated into two categories: stuff which affects your box, or stuff which uses your box to affect someone else. A VM fixes the first, and a firewall blocking outgoing connections (with exceptions for the CVS server and patch buildfarm or whatever it is server) largely fixes the second. I was under the impression that most VM products are x86 centric, which wouldn't lead to huge amounts of diversity in the buildfarm results. At least, not as far as architecture goes. I have played with QEmu (www.qemu.org) which is open source and supports multiple target architectures. I'm not sure how stable all of the different targets are, I know that sparc64 is not quite done yet. -- The problem with engineers is that they tend to cheat in order to get results. The problem with mathematicians is that they tend to work on toy problems in order to get results. The problem with program verifiers is that they tend to cheat at toy problems in order to get results. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. Are you saying that pltcl.so leaks memory in general, or that forcing a reload of the pltcl.so library leaks memory? The latter. There's no mechanism to release the previous instance's data structures (Tcl interpreters, function bodies, etc). regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Optimizer improvements: to do or not to do?
Ron Mayer wrote: Gregory Stark wrote: Ron Mayer [EMAIL PROTECTED] writes: ...vastly overestimate the number of pages .. because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. Do we know if any such people participate/lurk on this list, or if the conversation should go elsewhere? I lurk... I don't know if I'm a 'statistics jock', but I may be valuable if only I had a better understanding of how the optimizer works. I have been following this thread with interest, but could really do with a good pointer to background information beyond what I have read in the main postgres manual. Does such information exist, and if so, where ? Josh Reich ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
Albe Laurenz [EMAIL PROTECTED] writes: The patch did not work for me :-( Is libpq.so in a non-standard directory? If yes, one either has to export LIBPATH in the environment or link with -L/location/of/libpq for the executable to find it (similar to RPATH in Linux). libpq will definitely be in a nonstandard directory in buildfarm usage. It sounds like we should add the local equivalent of RPATH for AIX. Now I'm pretty dubious that -L is what we want --- -L should have the semantics of this is where to find the library right now. RPATH is used to point to an expected installation directory that might not contain the library at the time of linking ... or worse, might contain an old incompatible version of it. Does AIX have a true RPATH equivalent? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Patch attribution and non-ASCII characters
Bruce Momjian wrote: I see a number of non-ASCII characters in the names of patch submitters in the CVS logs. Does anyone know a good way to have all these get the same encoding in the CVS commit logs? I am thinking that is impossible because we can't attach the email encoding to the commit message.

Is this a problem now, or are you looking to solve it for future releases?

I think the best you could do is post the non-ASCII names here, and have affected people post back their names in HTML escaping or something that suits the SGML docs. For example my name is &Aacute;lvaro Herrera Or, in Latin-1, Álvaro Herrera

Most commit messages contain the ASCII version of my name, thus you wouldn't notice the problem anyway. The COPY (select) commit message, AFAIR, also has Zoltán's name in ASCII form (Zoltan).

-- Alvaro Herrera    http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Patch attribution and non-ASCII characters
Alvaro Herrera [EMAIL PROTECTED] writes: I think the best you could do is post the non-ASCII names here, and have affected people post back their names in HTML escaping or something that suits the SGML docs. That's probably the best way to close the loop. I know that when I'm committing such patches, I tend to copy-n-paste from a mail window to a shell window, and I'd not venture to guarantee anything about what encoding the text is in anyway. Sometimes it looks reasonable in the shell window and sometimes it doesn't ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
information_schema vs temp tables (was Re: [HACKERS] [COMMITTERS] pgsql: Sequences were not being shown due to the use of lowercase `s`)
Greg Sabino Mullane [EMAIL PROTECTED] writes: ... I can't think of a use case where a user would not want to append a is_visible clause to the query above. That or start tracking which pg_temp_ schema belongs to whom. Well, I'm still having a problem with this, because it seems like a pretty klugy solution. It's inefficient (the is_visible functions are not cheap) and it's not hard to fool: set search_path = pg_temp_N, ... (This won't work for a non-superuser, because he'll not have USAGE privilege on someone else's temp schema, but you seem to be worried about hiding temp tables from superusers.) If you're really intent on making it work this way, my vote is to expose namespace.c's isOtherTempNamespace() as a SQL-callable function, and add a test on that to the info-schema views, rather than relying on is_visible or explicit knowledge of the temp-schema naming convention. Perhaps we should expose both pg_is_my_temp_schema(schema_oid) pg_is_other_temp_schema(schema_oid) Thoughts? Opinions about the function names? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Jim Nasby) wrote: I'd love to have the ability to control toasting thresholds manually. ... Being able to force a field to be toasted before it normally would could drastically improve tuple density without requiring the developer to use a 'side table' to store the data. +1 :-) -arturo ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Patch attribution and non-ASCII characters
Alvaro Herrera wrote: Bruce Momjian wrote: I see a number of non-ASCII characters in the names of patch submitters in the CVS logs. Does anyone know a good way to have all these get the same encoding in the CVS commit logs? I am thinking that is impossible because we can't attach the email encoding to the commit message.

Is this a problem now, or are you looking to solve it for future releases?

Either. ;-)

I think the best you could do is post the non-ASCII names here, and have affected people post back their names in HTML escaping or something that suits the SGML docs. For example my name is &Aacute;lvaro Herrera Or, in Latin-1, Álvaro Herrera Most commit messages contain the ASCII version of my name, thus you wouldn't notice the problem anyway. The COPY (select) commit message, AFAIR, also has Zoltán's name in ASCII form (Zoltan).

Yep. I will grab the unknown names and ask the group to research HTML versions.

-- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB    http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Getting a move on for 8.2 beta
Jeremy Drake wrote: On Wed, 13 Sep 2006, Tom Dunstan wrote: I was under the impression that most VM products are x86 centric, which wouldn't lead to huge amounts of diversity in the buildfarm results. At least, not as far as architecture goes. I have played with QEmu (www.qemu.org) which is open source and supports multiple target architectures. I'm not sure how stable all of the different targets are, I know that sparc64 is not quite done yet.

Oh, I didn't realize QEmu did non-x86 architectures. Is it considered good enough at emulating e.g. a sparc for it to be useful to us? PearPC was a PowerPC emulator that got some press a while ago, although it appears that the project has stagnated a bit (probably because people who wanted to run OSX on intel hardware have a legit way to do it now :) )

The problem with these things is if something goes wrong, was it the patch that failed or the not-quite-perfect VM product? To cut down on those sorts of problems, I suppose we could have it do a clean, non-patched run first, and then only do the patched version if the clean version passed. We'd have to be reasonably unlucky to have a patch trigger a VM bug under those circumstances, I would think.

Cheers Tom ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Optimizer improvements: to do or not to do?
Joshua Reich [EMAIL PROTECTED] writes: I lurk... I don't know if I'm a 'statistics jock', but I may be valuable if only I had a better understanding of how the optimizer works. I have been following this thread with interest, but could really do with a good pointer to background information beyond what I have read in the main postgres manual. Does such information exist, and if so, where ? Well, there's the 2-foot view here: http://developer.postgresql.org/pgdocs/postgres/planner-optimizer.html but after that you have to start reading code. The optimizer README file may be useful: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/README but it goes into a lot of details that probably aren't interesting for your purposes. Most of the planner is just mechanism associated with generating different possible plans. The policy that determines which plan is chosen is the cost-estimation equations, and those are all in costsize.c and selfuncs.c: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/costsize.c http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c The division between these two files is a bit historical, but roughly speaking selfuncs.c knows about the behavior of specific WHERE-clause operators and index access methods, while costsize.c knows about the behavior of particular plan types. I'd like to think that costsize.c is well enough commented that you can follow it even without any C knowledge, but selfuncs.c may be a bit more daunting. Still, the comments are pretty extensive, and feel free to ask questions on pg-hackers. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table.

Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. I'm wondering though what alignment you expect it to have. You'd need some pretty ugly code to pick up an unaligned 3-byte integer portably ... but if you align it, the space savings probably goes out the window.

regards, tom lane

Program received signal SIGSEGV, Segmentation fault.
0xb7e01d45 in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0xb7e01d45 in memcpy () from /lib/libc.so.6
#1  0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181
#2  0x08078b0d in heap_form_tuple (tupleDescriptor=0x83c2d78, values=0x83c2e84, isnull=0x83c2e98 "") at heaptuple.c:749
#3  0x0815d2d9 in ExecCopySlotTuple (slot=0x83c26f4) at execTuples.c:558
#4  0x0815d393 in ExecMaterializeSlot (slot=0x83c26f4) at execTuples.c:639
#5  0x081560ca in ExecutorRun (queryDesc=0x83c2834, direction=ForwardScanDirection, count=0) at execMain.c:1401
#6  0x081e78e4 in ProcessQuery (parsetree=0x83c2240, plan=0x83b837c, params=0x3, dest=0x83b8290, completionTag=0xbfedffa0 "") at pquery.c:174
#7  0x081e89f9 in PortalRun (portal=0x83c0064, count=2147483647, dest=0x83b8290, altdest=0x83b8290, completionTag=0xbfedffa0 "") at pquery.c:1076
#8  0x081e4060 in exec_simple_query (query_string=0x83b7bbc "insert into test (a) values (3::int3);") at postgres.c:1004
#9  0x081e6074 in PostgresMain (argc=4, argv=0x836fab4, username=0x836fa8c "mark") at postgres.c:3219
#10 0x081b89b3 in ServerLoop () at postmaster.c:2854
#11 0x081ba21b in PostmasterMain (argc=1, argv=0x836d9f8) at postmaster.c:941
#12 0x081764a8 in main (argc=1, argv=0x836d9f8) at main.c:265

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CSStorm occurred again by postgreSQL8.2
I wrote: ... it seems like in the case where RecordSubTransactionCommit detects that the subxact has not stored its XID anywhere, we could immediately remove the XID from the PGPROC array, just as if it had aborted. This would avoid chewing subxid slots for cases such as exception blocks in plpgsql that are not modifying the database, but just catching computational errors. (and later realized that Alvaro had had the same idea awhile back, but I don't have his message at hand). I looked into this a bit more; it seems like basically it should only take addition of else XidCacheRemoveRunningXids(xid, 0, NULL); to the bottom of RecordSubTransactionCommit(), plus suitable adjustment of the comments in both routines. However, there's a problem: if we delete a second-level subxact's XID from PGPROC, and later its upper subtransaction aborts, XidCacheRemoveRunningXids will emit scary warnings when it doesn't find the sub-subxact in PGPROC. This could doubtless be fixed with sufficient jiggery-pokery --- simply removing the debug warnings would be a brute-force answer, but I'd like to find something a bit less brute-force. Maybe drop the sub-subxact from its parent's list immediately, instead of carrying it forward? Anyway, given that there's this one nonobvious gotcha, there might be others. My recommendation is that we take this off the open-items list for 8.2 and revisit it in the 8.3 cycle when there's more time. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Not-so-open items
There are several entries on the 8.2 open-items list that I think can be removed: Fix backward array comparison - subset Done (this was redundant with the containment-operator item) Store only active XIDs in subtransaction cache Per my note just now, this probably should wait for 8.3. Double vacuum speed on tables with no indexes I changed the locking thing I was worried about. Unless Greg wants to do some real-world performance measurements to confirm or refute that change, I think this can be closed. Fix excessive page checking for new btree split code Per my note yesterday, I can't reproduce the misbehavior I saw six weeks ago, so I recommend we leave the code alone. Suppress error on bind parameters of unknown types I haven't heard one single person speak up to say yeah, that's a good idea, so I conclude it probably isn't. Recommend we not change it. BTW, pushing out an 8.1.5 is probably a good idea, but what's it doing in the 8.2 open-items list? Especially in the documentation section? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 , data=0x83c2ef4 , infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181 Hm, are you sure you provided a valid pointer (not the integer value itself) as the Datum output from int3_in? (Looks at patch ... ) Um, I think you didn't, although that coding is far too cute to be actually readable ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 , data=0x83c2ef4 , infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181 Hm, are you sure you provided a valid pointer (not the integer value itself) as the Datum output from int3_in? (Looks at patch ... ) Um, I think you didn't, although that coding is far too cute to be actually readable ... regards, tom lane I tracked this down to my implementation of Int24GetDatum. I've got that fixed now and have hit another bug, but I'm still working on it so I won't bother you about that yet. As for the patch, I will eventually submit a version without the cute code autogeneration stuff. mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Interesting tight loop
In production today (8.1.4), I ran into a backend process that wouldn't cancel right away -- minutes went by. It was in

[0] TransactionIdIsCurrentTransactionId
[1] HeapTupleSatisfiesSnapshot
...

But the interesting thing is that there were 4.6 million elements in the s->childXids list. Which is why it took so damn long. I can't quite figure out how I induced this state. It is an OLAP server with about 10-20 connections that run long queries (from 5 seconds to 24 hours).

If this is a common possible state, it seems that perhaps a hash of the childXids would be more appropriate. Does the order of the childXids chained off the current transaction state matter? Most of the places I could find that reference it seem to just attempt to find an Xid in there. O(1) sounds a lot better than O(n) :-D

Best regards, Theo

// Theo Schlossnagle // CTO -- http://www.omniti.com/~jesus/ // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Optimizer improvements: to do or not to do?
Tom Lane wrote: I'm having a hard time getting excited about improving this query when it's so badly coded in the first place. What's an ORDER BY doing in an EXISTS subquery? The LIMIT is unnecessary too. And the inner WHERE says nothing so much as "I don't know how to design a database" :-(.

It was the test query, which has the same execution plan for belg_mobile (and the same problem) as the production query below:

select (select max(code) from belg_mobile tc
        where c.bnum = tc.code and c.bnum like tc.code || '%') as code,
       c.cause, c.ilno, extract(hour from c.datetime) as hour,
       count(*) as cnt, sum(c.dur) as dur
  from conn.conn20060803 c
 where itgrp = :itgrp
 group by 1, 2, 3, 4

It's a simple OLAP query for analyzing telephone traffic distribution over time and trunk codes. max(code) is used to get the most specific matching code. For example, 84725 and 8472 are both valid codes, and number 84725123456 must match 84725 but not 8472. The 'c.bnum = tc.code' qual significantly reduces the index scan and execution time. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Draft release notes
Here is an early draft of the release notes. It needs more polish and review: http://momjian.us/cgi-bin/pgrelease I will catch up on my email tomorrow, update the open items list for 8.2, and then return to the release notes for cleanup. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings