Re: [PATCHES] [HACKERS] Patching dblink.c to avoid warning about
Bruce Momjian wrote: There was also a problem in that if two cursors were opened, the first close would close the transaction. I have fixed that code by changing the xact variable in to a counter that keeps track of the number of opened cursors and commits only when they are all closed. Both the dblink.c patch and the regression patch are at: ftp://candle.pha.pa.us/pub/postgresql/mypatches OK, I'll take a look, but I won't have time for a couple of days (I'm not at home -- visiting my dad for his 80th birthday -- and have no broadband access). Joe ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Kerberos brokenness and oops question in 8.1beta2
> The point is I'm having a hard time seeing what the actual > gain is in not changing it back. If the principal name > mismatches, we're going to get rejected anyway, so it's not > really a problem there. Even though the gain in changing it > back isn't all that big either, why should we introduce > abackwards-incompatibility if there is no real gain in a > different part of the code. Here's a patch that fixes the big problem and reverts the behaviour of appl_version to be compatible with 8.0. It's easy enough to isolate the changes that are around the appl_version - one line in backend/libpq/auth.c call to krb5_recvauth and one in interfaces/libpq/fe-auth.c call to krb5_sendauth. The call in backend/libpq/auth.c to krb5_sname_to_principal in 8.1beta2 was completely broken for a scenario where you *didn't* use virtual hosts, by setting pg_krb5_server to NULL... The call is needed there as well. //Magnus krb5fix.patch Description: krb5fix.patch ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Patching dblink.c to avoid warning about open
Joe Conway wrote: > Bruce Momjian wrote: > > There was also a problem in that if two cursors were opened, the first > > close would close the transaction. I have fixed that code by changing > > the xact variable in to a counter that keeps track of the number of > > opened cursors and commits only when they are all closed. > > > > Both the dblink.c patch and the regression patch are at: > > > > ftp://candle.pha.pa.us/pub/postgresql/mypatches > > > > OK, I'll take a look, but I won't have time for a couple of days (I'm > not at home -- visiting my dad for his 80th birthday -- and have no > broadband access). No problem -- thanks. I have slimmed down the patch by applying the cosmetic parts to CVS. Use the URL above to get the newest versions of the dblink.c and regression changes. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PATCHES] [PATCH] Using pread instead of lseek (with analysis)
Hi, Below is a patch that tries to determine (for linux) if the pread is emulated or not (using an autoconf test), and if so use it instead of lseek. It doesn't remove all uses of lseek. For example, PostgreSQL uses lseek to find the end of the file when extending. So in a bulk load, there is one seek per block written anyway. Also, the xlog seems to be completely seperate code. Short summary: Actual benefit, (on Linux anyway) not distinguishable from the noise. Longer summary: It reduces CPU usage slightly (few percent user, overall almost nothing), but the benefit is usually swamped by the actual cost of the read from disk (system cache miss). I would actually make the change because it simplifies the code a bit. Comparisons on other systems may be worthwhile. What follows is the detailed analysis of the change. My conclusion is that on Linux at least where the syscall overhead is so low, it's not worth the change for performance. It is cleaner code-wise IMHO. Initially, the table is 1.5 million rows of random integer data. Worst case scenario: ordered output run with seqscan disabled. Table is 66MB, Index is 56MB and includes all three data columns. The queries took 98 +/- 1 seconds. Over the entire query it consistantly loaded about 55MB of data from disk (about 50% system cache miss rate). Query: select count(*) from (select * from bigtable order by a,b,c) Exhibit A below is the vmstat output with 10 second averages. The interesting columns are cpu us (user) and cpu sy (system). Note, system is defined as stuff not user and not idle. Waiting for the I/O to complete is counted as system, even though it's not doing all that much. It shows a very, very slight reduction in user time, though it is measurable and repeatable (both run a number of times with same results). What this tells me is that lseek time is swamped by actual read time, so not a major benefit in most cases. Exhibit B is the same but with only 1 million rows (Table 48MB, Index 29MB), this fitting in cache memory. The original index had some slack from deleting records (I did a vacuum full though). The queries now take 60 +/- 1 seconds. As you can see even without accessing the disk, it still isn't making a big difference. Exhibit C is the output of strace -c on the second test. As you can see there is a difference, but marginal. Exhibit D is the output of a seq scan, in case anyone figured I might be buggering the queries somehow. As you can see, it shows that the index scan is extremely expensive. Only 5900 reads for the seq scan (enough to read the whole table) instead of 89. The shared buffers is the default 1000. Upping this would close the gap somewhat but that is irrelevent for this test (lseek vs pread). Exhibit E is the strace -c output, for completeness. lseek doesn't appear in the top 10. If you read this far, thanks for the attention! If you're running another system, it would be nice to test. Remember to update the autoconf code for your system! Have a nice day, Misc details: command line: time echo "select count(*) from(select * from bigtable order by a,b,c) x" | \ ./postgres -f s -D /var/lib/postgres/data test System: Linux kleptog 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux 128MB RAM Intel(R) Celeron(TM) CPU 1000MHz Exhibit (A): Index scan, dataset > memory Using ordinary llseek/read: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 0 4 1372940 10756400 53418 1070 145 18 56 0 27 1 0 4 1648992 10726000 548 3 1071 147 18 63 0 19 1 0 4 1552984 10734800 542 4 1069 144 20 69 0 11 1 0 4 1432980 10749600 550 6 1071 146 19 69 0 12 1 0 4 1648664 10760000 560 3 1071 148 19 67 0 15 1 0 4 1576632 10772800 597 3 1076 157 19 68 0 13 2 0 4 1768640 10752800 61814 1080 163 19 67 0 14 Using pread(): procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 0 4 1680600 10761600 53418 1070 145 17 56 0 28 1 0 4 1656644 10764400 543 5 1070 147 16 64 0 19 1 0 4 1344624 10797600 540 2 1068 142 19 70 0 11 1 0 4 1656632 10765200 555 6 1071 147 17 70 0 13 1 0 4 1608540 10779600 56413 1074 149 18 68 0 15 1 0 4 1392504 10806800 598 4 1076 158 17 69 0 14 0 1 4 1368460 10812000 626 3 1079 165 17 69 0 14 Exhibit (B): Index scan, dataset < memory Using ordinary llseek/read: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free
[PATCHES] test, please ignore
test email, please ignore. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Patching dblink.c to avoid warning about open
[ reposted] Tom Lane wrote: > Bruce Momjian writes: > > Well, as I said in the patch email: > > > The reported problem is that dblink_open/dblink_close() (for cursor > > reads) do a BEGIN/COMMIT regardless of the transaction state of the > > remote connection. There was code in dblink.c to track the remote > > transaction state (rconn), but it was not being maintained or used. > > You should lose the remoteXactOpen flag entirely, in favor of just > testing PQtransactionStatus() on-the-fly when necessary. Simpler, > more reliable, not notably slower. > > With that change, the separate remoteConn struct could be dropped > altogether in favor of just using the PGconn pointer. This would > make things notationally simpler, and in fact perhaps allow undoing > the bulk of the edits in your patch. As-is I think the patch is > pretty risky to apply during beta. I have developed a dblink regression patch that tests for the bug. It opens a transaction in the client code, opens/closes a cursor, then tries a DECLARE. Without my patch, this fails because the dblink_close() closes the transaction, but with my patch it succeeds. Here is the test: -- test opening cursor in a transaction SELECT dblink_exec('myconn','BEGIN'); -- an open transaction will prevent dblink_open() from opening its own SELECT dblink_open('myconn','rmt_foo_cursor','SELECT * FROM foo'); -- this should not close the cursor because the client opened it SELECT dblink_close('myconn','rmt_foo_cursor'); -- this should succeed because we have an open transaction SELECT dblink_exec('myconn','DECLARE xact_test CURSOR FOR SELECT * FROM foo'); -- commit remote transaction SELECT dblink_exec('myconn','COMMIT'); Here is the failure reported by the current code: -- this should succeed because we have an open transaction SELECT dblink_exec('myconn','DECLARE xact_test CURSOR FOR SELECT * FROM foo'); ERROR: sql error DETAIL: ERROR: DECLARE CURSOR may only be used in transaction blocks There was also a problem in that if two cursors were opened, the first close would close the transaction. I have fixed that code by changing the xact variable in to a counter that keeps track of the number of opened cursors and commits only when they are all closed. Both the dblink.c patch and the regression patch are at: ftp://candle.pha.pa.us/pub/postgresql/mypatches -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] [BUGS] BUG #1927: incorrect timestamp returned
Bruce Momjian writes: > Tom Lane wrote: >> I think the solution is that timestamp_out needs to decide how many >> fractional digits it wants to display, and then round off the input >> accordingly, *before* it breaks the input down into y/m/d/h/m/s fields. >> This "60.00" business is happening because the rounding is done only on >> the seconds-and-fractional-seconds field. > Well, the testing showed that the one with the most 9's was actually > rounded up to a whole number by timestamp_in, meaning we never have a > chance to adjust it in timestamp_out. I am assuming you will be able to > round the middle test value up to a whole number in timestamp_out so the > 60 number will disappear. After further thought I've realized that the current methodology for applying JROUND (ie, to apply it to a whole timestamp or time value) is fundamentally bogus. The macro is trying to round off to six fraction digits, but if you've got many digits on the left side of the decimal point then you cannot guarantee that six fraction digits of precision are available to work with. It is *only* sensible to JROUND a value representing a fractional-seconds field (or possibly seconds + fractional seconds). Accordingly, the majority of the JROUND calls in the existing code are indeed bogus and should go away. The only one that's not bogus is the one in dt2time(), which is properly applied to a fractional second ... but the problem with that one is that there's no provision for coping with the possibility that the fractional second rounds up to 1.0. In that situation, we need to be able to propagate the roundoff to higher fields to avoid the "23:59:60" syndrome. Since dt2time doesn't have a way to propagate the roundoff into the date part, we can't fix this right there. It needs to be handled at the caller level (timestamp2tm and related routines). Attached is a proposed patch that cleans this up. It's not quite complete because I haven't propagated the changes into ecpglib, but I thought I could put it out for comment in this form. The patch does the following: 1. Replace JROUND with TSROUND (to round to MAX_TIMESTAMP_PRECISION places) and TIMEROUND (to round to MAX_TIME_PRECISION places). 2. Remove JROUND calls in calculational routines. 3. In timestamp2tm, time2tm, timetz2tm, and interval2tm, round off the fractional-seconds field to the right number of places, check for rounding up to 1, and cope as needed. 4. Make the sprintf field width in EncodeTimeOnly match MAX_TIME_PRECISION. Because it was only printing 9 digits and not 10, there were cases where a properly rounded fractional time would still print as "60.0". (I think at some point we reduced it from 10 to 9 as a hacky way of dealing with a different bug report ... but now I feel we've gotten the right solution at last.) The patch passes the regression tests and appears to fix the problems. Comments? regards, tom lane *** contrib/btree_gist/btree_ts.c.orig Fri Jul 1 09:44:55 2005 --- contrib/btree_gist/btree_ts.c Sat Oct 8 13:12:37 2005 *** *** 122,130 *gmt -= (tz * INT64CONST(100)); #else *gmt -= tz; - *gmt = JROUND(*gmt); #endif - } return gmt; } --- 122,128 *** src/backend/utils/adt/date.c.orig Thu Sep 8 22:31:49 2005 --- src/backend/utils/adt/date.cSat Oct 8 13:12:45 2005 *** *** 944,953 --- 944,961 #else double trem; + recalc: trem = time; TMODULO(trem, tm->tm_hour, (double)SECS_PER_HOUR); TMODULO(trem, tm->tm_min, (double)SECS_PER_MINUTE); TMODULO(trem, tm->tm_sec, 1.0); + trem = TIMEROUND(trem); + /* roundoff may need to propagate to higher-order fields */ + if (trem >= 1.0) + { + time = ceil(time); + goto recalc; + } *fsec = trem; #endif *** *** 1837,1845 --- 1845,1861 #else double trem = time->time; + recalc: TMODULO(trem, tm->tm_hour, (double)SECS_PER_HOUR); TMODULO(trem, tm->tm_min, (double)SECS_PER_MINUTE); TMODULO(trem, tm->tm_sec, 1.0); + trem = TIMEROUND(trem); + /* roundoff may need to propagate to higher-order fields */ + if (trem >= 1.0) + { + trem = ceil(time->time); + goto recalc; + } *fsec = trem; #endif *** src/backend/utils/adt/datetime.c.orig Sat Jul 23 10:25:33 2005 --- src/backend/utils/adt/datetime.cSat Oct 8 13:12:46 2005 *** *** 3488,3495 sprintf(str, "%02d:%02d", tm->tm_hour, tm->tm_min); /* !* Print fractional seconds if any. The field widths here should be !* at least equal to the larger of MAX_TIME_PRECISION and * MAX_TIMESTAMP_PRECISION. */ if (fsec != 0) --- 3488,3495 sprintf(str, "%02d:%02d",
Re: [PATCHES] [HACKERS] Kerberos brokenness and oops question in 8.1beta2
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > Here's a patch that fixes the big problem and reverts the behaviour of > appl_version to be compatible with 8.0. Applied with trivial stylistic cleanups. BTW, the documentation seems a bit broken: krb_server_hostname (string) Sets the hostname part of the service principal. This, combined with krb_srvname, is used to generate the complete service principal, i.e. krb_server_hostname/[EMAIL PROTECTED] I suppose one of those last two should be "krb_srvname", but which? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Kerberos brokenness and oops question in 8.1beta2
> > Here's a patch that fixes the big problem and reverts the > behaviour of > > appl_version to be compatible with 8.0. > > Applied with trivial stylistic cleanups. > > BTW, the documentation seems a bit broken: > > krb_server_hostname (string) > > Sets the hostname part of the service principal. This, combined > with krb_srvname, is used to generate the complete service > principal, i.e. krb_server_hostname/[EMAIL PROTECTED] > > I suppose one of those last two should be "krb_srvname", but which? Yes. It should be krb_srvname/[EMAIL PROTECTED] Now that you mention it, krb_server_hostname should probably mentioned in chapter 19.2.3 somewhere. //Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Kerberos brokenness and oops question in 8.1beta2
"Magnus Hagander" <[EMAIL PROTECTED]> writes: >> I suppose one of those last two should be "krb_srvname", but which? > Yes. > It should be krb_srvname/[EMAIL PROTECTED] OK, will fix. > Now that you mention it, krb_server_hostname should probably mentioned > in chapter 19.2.3 somewhere. Yeah, it just says hostname is the fully qualified host name of the server machine. which is clearly insufficient detail. Please send some proposed text. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [PATCH] Using pread instead of lseek (with analysis)
"Martijn van Oosterhout" wrote > > Below is a patch that tries to determine (for linux) if the pread is > emulated or not (using an autoconf test), and if so use it instead of > lseek. Just FYI, this was proposed 2 years ago: http://archives.postgresql.org/pgsql-patches/2003-02/msg00197.php The basic problem I read from the above thread is that it does not give enough performance proofs on *various* platforms. Regards, Qingqing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Kerberos brokenness and oops question in 8.1beta2
BTW, it appears to me that this patch has also broken the claim in the manual that If [krb_server_hostname is] not set, the default is to allow any service principal matching an entry in the keytab. The reason that was true was that we passed a NULL "server" value to krb5_recvauth(), which with this patch we never do anymore. I'm not sure if this represents a serious loss of flexibility or not, but in any case the documentation needs an update. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [PATCH] Using pread instead of lseek (with analysis)
Martijn van Oosterhout writes: > What follows is the detailed analysis of the change. My conclusion is > that on Linux at least where the syscall overhead is so low, it's not > worth the change for performance. It is cleaner code-wise IMHO. How is it cleaner code-wise to debug and maintain two #ifdef'd code paths instead of one? (And when I say "debug", I mean "I don't believe FileSeek still works". One reason that the patch seems unmaintainable to me as-is is that it creates a subtle, critical, and undocumented difference in the semantic meaning of the seekPos variable: in one case it's tracking the kernel's own seek position, and in the other it isn't.) > What this tells me is that lseek time is swamped by actual read time, > so not a major benefit in most cases. It would be reasonable to check results in fully-cached cases, which would be the best real-world scenario for this to show any improvement in. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [PATCH] Using pread instead of lseek (with analysis)
On Sat, Oct 08, 2005 at 05:27:11PM -0400, Tom Lane wrote: > How is it cleaner code-wise to debug and maintain two #ifdef'd code > paths instead of one? (And when I say "debug", I mean "I don't believe > FileSeek still works". One reason that the patch seems unmaintainable > to me as-is is that it creates a subtle, critical, and undocumented > difference in the semantic meaning of the seekPos variable: in one case > it's tracking the kernel's own seek position, and in the other it isn't.) I mean in the situation where we would only use pread. As for the difference, no code outside of fd.c can assume anying about the kernel seek position because it can't see it (the file might be closed!). Unless something bypasses the abstraction? > It would be reasonable to check results in fully-cached cases, which > would be the best real-world scenario for this to show any improvement > in. If you look, I did that and even then it simply didn't make a difference. lseek is 10 microseconds, you'd need to do hundreds of thousands of them to make a difference. And any gain would disappear in just the rotational latency of a hard disk read. I satisfied my curiosity and showed it just isn't worth it at this point. -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgp4zc73DtXI1.pgp Description: PGP signature