Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?
> Tatsuo Ishii [EMAIL PROTECTED] writes:
>
> Oh, I see. So the question still remains: can a MULTIBYTE-aware backend ever use a sort order different from strcmp() order? (That is, not as a result of LOCALE, but just because of the non-SQL-ASCII encoding.)
>
> According to the code, no, because varstr_cmp() doesn't pay attention to the multibyte status. Presumably strcmp() and strcoll() don't either. Right.
>
> OK, so I guess this comes down to a judgment call: should we insert the check in the non-MULTIBYTE case, or not? I still think it's safest to do so, but I'm not sure what you want to do.
>
> regards, tom lane

I have discussed this issue with Japanese hackers, including Hiroshi. We have reached the conclusion that your proposal is appropriate and will make PostgreSQL more stable.

--
Tatsuo Ishii
[HACKERS] MATCH PARTIAL
MATCH PARTIAL isn't in 7.1. Is it? Mike Mascari [EMAIL PROTECTED]
Re: [HACKERS] MATCH PARTIAL
No. In parser/gram.y I see:

    | MATCH PARTIAL
        {
            elog(ERROR, "FOREIGN KEY/MATCH PARTIAL not yet implemented");
            $$ = "PARTIAL";
        }

> MATCH PARTIAL isn't in 7.1. Is it?
> Mike Mascari [EMAIL PROTECTED]

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
[HACKERS] Re: [ADMIN] Re: Kernel panic error
[EMAIL PROTECTED] writes:
> What showed up was the "Error index_formtuple: data takes 21268 bytes: too big". If anyone has any ideas on this, please share them.

It means your data is too big to fit into an index.

Good case in point. Here is a typical email: a difficult/rare problem, which is what should be appearing on the mailing lists. The easy questions are pretty much gone, as far as I can tell.

--
Bruce Momjian
[HACKERS] Quick question about 7.1 SQL92 Entry Level
Just a quick question, but how much of SQL92 Entry Level does 7.1 support, and what parts haven't we got (yet)? I need to know for a couple of internal bits in the JDBC driver... Peter -- Peter Mount [EMAIL PROTECTED] PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/ RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/
Re: [HACKERS] floating point representation
Hiroshi Inoue writes:
> Is there a way to change the precision of floating point representation from clients?

Not currently, but I imagine it couldn't be too hard to introduce a parameter that changes the format string used by float*out to something else.

The GNU C library now offers a %a (and %A) format that prints floating point numbers in a semi-internal form that is meant to be portable. (I imagine this was done because of C99, but I'm speculating.) It might be useful to offer this to preserve accurate data across dumps.

--
Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Open 7.1 items
Can someone comment on this? Seems GET DIAGNOSTICS is wrong from Peter's reading of SQL92, and mine too.

Peter Eisentraut writes:
> Bruce Momjian writes:
>> New PL/pgSQL GET DIAGNOSTICS statement for SPI value access (Jan)
>>
>> If someone can show me an example of how it operates I can write up something. I found the following, quoting a recent message by Jan Wieck [EMAIL PROTECTED]:
>>
>> : Do a
>> :
>> :     GET DIAGNOSTICS SELECT PROCESSED INTO int4_variable;
>> :
>> : directly after an INSERT, UPDATE or DELETE statement and you'll know
>> : how many rows have been hit.
>> :
>> : Also you can get the OID of an inserted row with
>> :
>> :     GET DIAGNOSTICS SELECT RESULT INTO int4_variable;
>>
>> Looking at plpgsql/src/gram.y, it only supports PROCESSED (rows returned/affected) and RESULT (OID). The grammar indicates that only SELECT is allowed in GET DIAGNOSTICS SELECT. Jan says it works for INSERT/UPDATE/DELETE too, but I guess you still use GET DIAGNOSTICS SELECT.
>
> May I suggest that this is the wrong syntax? It should be
>
>     GET DIAGNOSTICS variable = ROW_COUNT;
>
> See SQL99 part 2, clause 19.1.
>
> --
> Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/

--
Bruce Momjian
[HACKERS] Something smells in this schema...
I had postgres start blocking all its UPDATEs on a production database today, when an engineer added the following two tables, among other things. We've had to restore from backup, and the interesting thing is that when we re-add these tables, things break again.

Version: PostgreSQL 7.0.2 on i686-pc-linux-gnu, compiled by gcc 2.95.2 (we were planning on going to beta4 in another week, and have done some testing. This problem doesn't seem to occur on the engineer's machine, which is already at beta4.)

My first thought was the index on the boolean field in time_cards, which I could have sworn has caused me problems before. Anyone else see anything wrong?

-- time_tasks.schema
create table time_tasks (
    name char(2) primary key,
    title text,
    description text
);
insert into time_tasks (name, title) values ('CO', 'Communication');
insert into time_tasks (name, title) values ('DB', 'Debug');
. . .

-- time_cards.schema
create table time_cards (
    id serial,
    open bool not null default 't',
    accounted bool not null default 'f',
    uid int4 not null,
    task char(2) not null,
    project int4,
    component text,
    time_start int4,
    time_stop int4,
    total_minutes int4,
    notes text
);
create index time_cards_open_pkey on time_cards (open);
create index time_cards_uid_pkey on time_cards (uid);

--
Adam Haberlach | A cat spends her life conflicted between a
[EMAIL PROTECTED] | deep, passionate, and profound desire for
http://www.newsnipple.com | fish and an equally deep, passionate, and
'88 EX500 '00 ^ | profound desire to avoid getting wet.
Re: [HACKERS] Quick question about 7.1 SQL92 Entry Level
Peter T Mount [EMAIL PROTECTED] writes:
> Just a quick question, but how much of SQL92 Entry Level does 7.1 support, and what parts haven't we got (yet)?

I don't think anyone's made a careful list --- making one is on my personal to-do list for the near future, but not yet at the top. Schemas are one big item I know we are missing, and the privileges mechanism needs a revamp as well. Peter Eisentraut made a list a year ago (see attached) but that was as of 6.5, and I'm not sure how careful he was.

regards, tom lane

--- Forwarded Message
Date: Sat, 19 Feb 2000 15:12:24 +0100 (CET)
From: Peter Eisentraut [EMAIL PROTECTED]
To: Thomas Lockhart [EMAIL PROTECTED]
cc: PostgreSQL Development [EMAIL PROTECTED]
Subject: [HACKERS] Re: SQL compliance

On 2000-02-17, Thomas Lockhart mentioned:
> I've since seen the article in the latest issue of PCWeek. The article was not at all clear on the *specific* features which would disqualify Postgres from having SQL92 entry level compliance

I dug through the standard to come up with a list. I probably missed some things, but they would be more of a lexical nature. I think I covered all language constructs (which is what people look at anyway). Some of these things I never used, so I merely tested them by looking at the current documentation and/or entering a simple example query. Also, this list doesn't care whether an implemented feature contains bugs that would actually disqualify it from complete compliance.

* TIME and TIMESTAMP WITH TIME ZONE missing [6.1]
* Things such as SELECT MAX(ALL x) FROM y; don't work. [6.5] {This seems to be an easy grammar fix.}
* LIKE with ESCAPE clause missing [8.5] {Is on TODO.}
* SOME / ANY doesn't seem to exist [8.7]
* Grant privileges have several deficiencies [10.3, 11.36]
* Schemas [11.1, 11.2]
* CREATE VIEW name (x, y, z) doesn't work [11.19]
* There's a WITH CHECK OPTION clause for CREATE VIEW [11.19]
* no OPEN statement [13.2]
* FETCH syntax has a few issues [13.3]
* SELECT x INTO a, b, c FROM table [13.5]
* DELETE WHERE CURRENT OF [13.6]
* INSERT INTO table DEFAULT VALUES [13.8] {Looks like a grammar fix as well.}
* UPDATE WHERE CURRENT OF [13.9]
* no SQLSTATE, SQLCODE [22.1, 22.2] {Not sure about that one, since the sections don't contain leveling information.}
* default transaction isolation level is SERIALIZABLE {Why isn't ours?}
* no autocommit in SQL
* modules? [12]
* Some type conversion problems. For example a DECIMAL field should not dump out as NUMERIC, and a FLOAT(x) field should be stored as such.
[* Haven't looked at Embedded SQL.]

That's it. :)

--
Peter Eisentraut       Sernanders väg 10:115
[EMAIL PROTECTED]      75262 Uppsala
http://yi.org/peter-e/ Sweden
--- End of Forwarded Message
Re: [HACKERS] Something smells in this schema...
Adam Haberlach [EMAIL PROTECTED] writes:
> I had postgres start blocking all its UPDATEs on a production database today, when an engineer added the following two tables, among other things. We've had to restore from backup, and the interesting thing is that when we re-add these tables, things break again.

"Blocking"? Please define symptoms more precisely.

> My first thought was the index on the boolean field in time_cards, which I could have sworn has caused me problems before. Anyone else see anything wrong?

Pre-7.1 versions do have problems with large numbers of equal keys in a btree index, which is more or less the definition of an index on boolean. I'm dubious that such an index is of any value anyway ...

regards, tom lane
Re: [HACKERS] Something smells in this schema...
On Fri, Feb 16, 2001 at 01:02:24PM -0500, Tom Lane wrote:
> "Blocking"? Please define symptoms more precisely.

The postgres process stalls. According to ps, it is attempting an UPDATE. I think it times out eventually (I was in disaster-recovery mode this morning, and not always waiting around for these things. :)

> Pre-7.1 versions do have problems with large numbers of equal keys in a btree index, which is more or less the definition of an index on boolean. I'm dubious that such an index is of any value anyway ...

Ok--I'll check this. Thanks for the incredibly fast response--my favorite thing about PostgreSQL is the fact that I can post to a mailing list and get clued answers from real developers, usually within hours if not minutes.

--
Adam Haberlach
[HACKERS] undocumented parts of SPI
I'm trying to figure out the SPI and need a bit of help, because the docs do not mention (that I can find) some parts of the interface.

- SPI_exec(char *sql, int count): this function seems to execute the query. Is the `count' parameter the maximum number of tuples to return? Does count=0 imply return everything? Are the return values the SPI_* flags defined in spi.h?
- SPI_processed: number of tuples processed during execution of SPI_exec? Does this equate to N in the INSERT/DELETE/UPDATE N messages that are emitted by psql?
- Are there any restrictions on what types of queries may be executed by a trigger using SPI_exec?

Thanks for the help. Cheers, Brook
[HACKERS] 1/2 OFF PostgreSQL in cluster
Hello world,

I saw that PostgreSQL doesn't work with a MOSIX cluster because of shared memory. Well, some day MOSIX will be there; I'll be waiting for shared memory support! The question is: does anybody know another cluster software that supports shared memory?

Thanx,
--
Fábio B. de Paula | [EMAIL PROTECTED]
Linux Solutions Consultoria | www.olinux.com.br
www.linuxsolutions.com.br | ICQ: 6399331
Re: [HACKERS] Open 7.1 items
Hi there... I would like to inquire about possible support for running PostgreSQL on a Linux cluster. How would I implement and configure PostgreSQL as a distributed database, i.e. replicated on several servers? I am anxious to hear from you guys.

--
Manny C. Cabido
e-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]
[HACKERS] non blocking mode bug?
Hi,

I really ran into problems with this one. I've made an Irix nss library which connects to PostgreSQL, but somehow the backend doesn't get into active status. The blocking PQconnectdb halts until timeout (if I ignore the error message, the results return right after the timeout has expired). Using the non-blocking function PQconnectStart() with a callback function and the select() mainloop of the nss daemon, the status from PQconnectPoll() never turns into PGRES_POLLING_OK. In the callback routine I check for PGRES_POLLING_OK, but it never gets active!

I put this little piece of code in the callback function for testing, but syslog says this routine is called 6300 times before timeout. It seems libpq is sending data to the socket as if it were ready, but it never reaches the PGRES_POLLING_OK status!

    nsd_callback_remove(PQsocket(pgc));
    if (pgs != PGRES_POLLING_OK) {
        nsd_callback_new(PQsocket(pgc), (void *)(ns_psql_ccb(file, pgc, s)), NSD_READ);
        return;
    }
    nsd_timeout_remove(file);

Does anybody have any idea about this problem? It might be nss_daemon or even Irix related, but I don't see a way to check this out :(

Erik
Re: [HACKERS] floating point representation
Peter Eisentraut [EMAIL PROTECTED] writes:
> The GNU C library now offers a %a (and %A) format that prints floating point numbers in a semi-internal form that is meant to be portable. (I imagine this was done because of C99, but I'm speculating.) It might be useful to offer this to preserve accurate data across dumps.

Here's what I find in the C99 draft (WG14/N843, Committee Draft, August 3, 1998, 7.19.6.1):

    a,A  A double argument representing a (finite) floating-point number is
         converted in the style [-]0xh.hhhhp±d, where there is one
         hexadecimal digit (which is nonzero if the argument is a
         normalized floating-point number and is otherwise unspecified)
         before the decimal-point character (219) and the number of
         hexadecimal digits after it is equal to the precision; if the
         precision is missing and FLT_RADIX is a power of 2, then the
         precision is sufficient for an exact representation of the value;
         if the precision is missing and FLT_RADIX is not a power of 2,
         then the precision is sufficient to distinguish (220) values of
         type double, except that trailing zeros may be omitted; if the
         precision is zero and the # flag is not specified, no decimal-
         point character appears. The letters abcdef are used for a
         conversion and the letters ABCDEF for A conversion. The A
         conversion specifier produces a number with X and P instead of x
         and p. The exponent always contains at least one digit, and only
         as many more digits as necessary to represent the decimal
         exponent of 2. If the value is zero, the exponent is zero.

         A double argument representing an infinity or NaN is converted in
         the style of an f or F conversion specifier.

    219) Binary implementations can choose the hexadecimal digit to the
         left of the decimal-point character so that subsequent digits
         align to nibble (4-bit) boundaries.

    220) The precision p is sufficient to distinguish values of the source
         type if 16^(p-1) > b^n where b is FLT_RADIX and n is the number
         of base-b digits in the significand of the source type. A smaller
         p might suffice depending on the implementation's scheme for
         determining the digit to the left of the decimal-point character.

(± == "+/-" ... tgl)

So, it looks like C99-compliant libc implementations will have this, but I'd hesitate to rely on it for pg_dump purposes; it would certainly not be very portable for awhile yet.

Peter's idea of a SET variable to control float display format might not be a bad idea, but what if anything should pg_dump do with it? Maybe just crank the precision up a couple digits from the current defaults?

regards, tom lane
[HACKERS] Backup from within Postgres
Dear friends,

I have been searching the mailing lists for a couple of days now hoping to find a solution to my problem. Well, I hope I find a solution here. The problem is this: I have a Win32 application that uses ODBC to connect to Postgres (ver 6.5) on a RedHat Linux dbase server. I have been able to connect and perform a lot of SQL statements, etc. However, I am unable to perform a backup of the dbase. I need to back up the database from within this application. It doesn't matter where the backup file is. I know there is a shell command "pg_dump", and yes, it works fine from the shell, but I need to back up the dbase while connected to the Postgres database over an ODBC connection (I'm using the 32-bit ODBC drivers for Postgres). I have also tried making a function in the dbase and including "pg_dump" in it, but to no avail. I would be grateful for any suggestions/advice/code to help me with this task.

thanx a lot guys, lloyd
[HACKERS] Postgres Benchmark
Hello!! I need to demonstrate that PostgreSQL is a great RDBMS for my undergraduate project. Because of this, does somebody have a benchmark (or similar document) comparing Postgres and other DBs (commercial DBs, principally)? Thanks in advance!!
[HACKERS] extract vs date_part
ISTM that it is mighty confusing that extract() and date_part() don't accept the same set of "field" arguments.

=> SELECT EXTRACT(decade FROM TIMESTAMP '2001-02-16 20:38:40');
ERROR:  parser: parse error at or near "decade"
=> SELECT EXTRACT("decade" FROM TIMESTAMP '2001-02-16 20:38:40');
ERROR:  parser: parse error at or near """
=> SELECT date_part('decade', TIMESTAMP '2001-02-16 20:38:40');
 date_part
-----------
       200

This can be an easy grammar fix:

diff -c -r2.220 gram.y
*** gram.y      2001/02/09 03:26:28     2.220
--- gram.y      2001/02/16 19:42:42
***************
*** 4987,4992 ****
--- 4987,4993 ----
        ;
  extract_arg:  datetime              { $$ = $1; }
+       | IDENT                       { $$ = $1; }
        | TIMEZONE_HOUR               { $$ = "tz_hour"; }
        | TIMEZONE_MINUTE             { $$ = "tz_minute"; }
        ;

(Using ColId instead of datetime + IDENT gives reduce/reduce conflicts that I don't want to mess with now.) The date_part implementation is prepared for unknown field selectors, so this should be all safe. Comments?

--
Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Postgres Benchmark
Hi,

They're a little dated, but Great Bridge funded some benchmarks last summer putting Postgres 7.0 against Unnamed Proprietary Database 1 (version 8i) and Unnamed Proprietary Database 2 (version 7.0, available for the NT platform only). See http://www.greatbridge.com/about/press.php?content_id=4

They ran the AS3AP and TPC-C benchmarks using an off-the-shelf commercial product called Benchmark Factory, now part of Quest Software. See http://www.quest.com/benchmark_factory/

Best regards, Ned Lilly

Jreniz wrote:
> Hello!! I need to demonstrate that PostgreSQL is a great RDBMS for my undergraduate project. Because of this, does somebody have a benchmark (or similar document) comparing Postgres and other DBs (commercial DBs, principally)? Thanks in advance!!

--
Ned Lilly
e: [EMAIL PROTECTED]  w: www.greatbridge.com
Vice President, Evangelism / Hacker Relations  v: 757.233.5523
Great Bridge, LLC  f: 757.233.
[HACKERS] A bug in binary distribution for S.u.S.E. 7.0
Hello,

There is a bug in the binary distribution for S.u.S.E. 7.0, in the script "/etc/rc.d/postgres", in the "start" clause. The -D option of the postmaster daemon is used to declare where the data directory is. The script does it like this:

    postgres -D$datadir

but it must do it like this:

    postgres -D $datadir

There must be a space between "-D" and "$datadir".

David Lizano
~~
David Lizano - Technical area director
e-mail: [EMAIL PROTECTED]
I Z A N E T - Integral internet services. web: http://www.izanet.com/
Address: C/ Checa, 57-59, 3 D - 50.007 Zaragoza (Spain)
Phone: +34 976 25 80 23  Fax: +34 976 25 80 24
~~
[HACKERS] beta5 ...
Things appear to have quieted down nicely ... so I would like to put out a beta5 for testing ...

Tom, I saw/read your proposal about the JOIN syntax, but haven't seen any commit on it yet, nor any arguments against the changes ... so I'm just wondering where those stand right now?

Marc G. Fournier  ICQ#7615664  IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]  secondary: scrappy@{freebsd|postgresql}.org
Re: [HACKERS] Backup from within Postgres
Online -- Goa writes:
> I have a Win32 application that uses ODBC to connect to the Postgres (ver 6.5) on RedHat Linux dbase. I have been able to connect and perform a lot of SQL statements, etc. However I am unable to perform the backup of the dbase. I need to back up the database from within this application.

Then you need to ask the author of that application to add this functionality. If this is your own application, then you will have to duplicate a lot of pg_dump's code in it, which will probably be a rather large project.

--
Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
[HACKERS] SPI_Free() causes backend to close.
Perhaps someone can explain what's wrong with this. Excuse the mess; it was cut out of a much larger function, but it reliably creates the error on my Postgres 7.1 beta 4 machine. Compile the C function, restart postgres (for the heck of it), create a new db (I used 'stuff'), import the sql. The insert it runs at the end fails, and:

    pqReadData() -- backend closed the channel unexpectedly.

Quite confused. If I remove the SPI_finish() it works fine, but it complains every time an SPI_connect is issued after the first run of the function. As you can see, the crash point is SPI_finish: the notice before it appears, and the notice after it doesn't.

--
Rod Taylor

There are always four sides to every story: your side, their side, the truth, and what really happened.

[attachments: bad_function.sql, bad_function.c]
Re: [HACKERS] beta5 ...
The Hermit Hacker [EMAIL PROTECTED] writes:
> things appear to have quieted off nicely ... so would like to put out a Beta5 for testing ... Tom, I saw/read your proposal about the JOIN syntax, but haven't seen any commit on it yet, nor any arguments against the changes ... so just wondering where those stand right now?

You must have been looking the other way ;-) ... it's committed.

What I'm currently thinking about is the discussion from last week where Vadim reported that he could get "stuck spinlock" errors during btree index crash recovery, because the backend fixing the index might hold disk-buffer locks longer than the ~70 second timeout for spinlocks (see the "Btree runtime recovery. Stuck spins" thread on 2/8 and 2/9). Vadim says (and I agree) that we really ought to implement a new lightweight lock manager that would fall between spinlocks and regular locks in terms of overhead and functionality. But it's not reasonable to try to do that for 7.1 at this late date. So I was trying to pick a stopgap solution for 7.1. Unfortunately Vadim's off to Siberia and I can't consult with him...

I'm currently thinking of modifying the buffer manager so that disk buffer spinlocks use an alternate version of s_lock() with no timeout, and perhaps longer sleeps (no zero-delay selects anyway). This was one of the ideas we kicked around last week, and I think it's about the best we can do for now. Comments anyone?

Other than that, I have nothing to hold up a beta5. Anyone else?

regards, tom lane
[HACKERS] Re: extract vs date_part
> (Using ColId instead of datetime + IDENT gives reduce/reduce conflicts that I don't want to mess with now.) The date_part implementation is prepared for unknown field selectors, so this should be all safe. Comments?

Works for me. Since extract() required explicit reserved words, I had just implemented the ones specified in the SQL9x standard. Your extension patch is a great idea, as long as others agree it can go into the beta (afaict this is an extremely low risk fix).

- Thomas
RE: [HACKERS] floating point representation
-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]

> Peter Eisentraut [EMAIL PROTECTED] writes:
>> The GNU C library now offers a %a (and %A) format that prints floating point numbers in a semi-internal form that is meant to be portable. (I imagine this was done because of C99, but I'm speculating.) It might be useful to offer this to preserve accurate data across dumps.
>
> [snip]
>
> So, it looks like C99-compliant libc implementations will have this, but I'd hesitate to rely on it for pg_dump purposes; it would certainly not be very portable for awhile yet.

Agreed.

> Peter's idea of a SET variable to control float display format might not be a bad idea, but what if anything should pg_dump do with it? Maybe just crank the precision up a couple digits from the current defaults?

Currently the precision of the float display format is FLT_DIG (DBL_DIG). That's not sufficient to distinguish float values. As Peter already suggested, the quickest solution would be to change the XXX_DIG constants to variables and provide a routine to SET the variables. Strictly speaking, the precision needed to distinguish float values seems OS-dependent. It seems preferable to have a symbol to specify the precision.

Regards, Hiroshi Inoue
Re: [HACKERS] beta5 ...
I am GO. GET DIAGNOSTICS is my only open item left.

Tom Lane writes:
> [snip]
>
> Other than that, I have nothing to hold up a beta5. Anyone else?

--
Bruce Momjian
[HACKERS] Performance lossage in checkpoint dumping
While poking at Peter Schmidt's comments about pgbench showing worse performance than 7.0 (using -F in both cases), I noticed that, given enough buffer space, FileWrite never seemed to get called at all. A little bit of sleuthing revealed the following:

1. Under WAL, we don't write dirty buffers out of shared memory at every transaction commit. Instead, as long as a dirty buffer's slot isn't needed for something else, it just sits there until the next checkpoint or shutdown. CreateCheckpoint calls FlushBufferPool, which writes out all the dirty buffers in one go. This is a Good Thing; it lets us consolidate multiple updates of a single datafile page by successive transactions into one disk write. We need this to buy back some of the extra I/O required to write the WAL logfile.

2. However, this means that a lot of the dirty-buffer writes get done by the periodic checkpoint process, not by the backends that originally dirtied the buffers. And that means that every last one gets done by blind write, because the checkpoint process isn't going to have opened any relation cache entries --- maybe a couple of system catalog relations, but for sure it won't have any for user relations. If you look at BufferSync, any page that the current process doesn't have an already-open relcache entry for is sent to smgrblindwrt, not smgrwrite.

3. Blind write is gratuitously inefficient: it does separate open, seek, write, close kernel calls for every request. This was the right thing in 7.0.*, because backends relatively seldom did blind writes and even less often needed to blindwrite multiple pages of a single relation in succession. But the typical usage has changed a lot.

I am thinking it'd be a good idea if blind write went through fd.c and thus was able to re-use open file descriptors, just like normal writes. This should improve the efficiency of dumping dirty buffers during checkpoint by a noticeable amount. Comments?

regards, tom lane
Re: [HACKERS] Performance lossage in checkpoint dumping
> 3. Blind write is gratuitously inefficient: it does separate open, seek, write, close kernel calls for every request.
>
> [snip]
>
> I am thinking it'd be a good idea if blind write went through fd.c and thus was able to re-use open file descriptors, just like normal writes. This should improve the efficiency of dumping dirty buffers during checkpoint by a noticeable amount.

I totally agree the current code is broken. I was reading what you said and thinking, "Oh well, we lose there, but at least we only open a relation once and do them in one shot." Now I am hearing that is not true, and it is a performance problem. This is not a total surprise: we have that stuff pretty well streamlined for the old behavior. Now that things have changed, I can see the need to reevaluate. Not sure how to handle the beta issue, though.

--
Bruce Momjian
Re: [HACKERS] Performance lossage in checkpoint dumping
Bruce Momjian [EMAIL PROTECTED] writes:
>> I am thinking it'd be a good idea if blind write went through fd.c and thus was able to re-use open file descriptors, just like normal writes. This should improve the efficiency of dumping dirty buffers during checkpoint by a noticeable amount.
>
> Not sure how to handle the beta issue though.

After looking a little more, I think this is too big a change to risk making for beta. I was thinking it might be an easy change, but it's not; there's noplace to store the open-relation reference if we don't have a relcache entry. But we don't want to pay the price of opening a relcache entry just to dump some buffers.

I recall Vadim speculating about decoupling the storage manager's notion of open files from the relcache, and having a much more lightweight open-relation mechanism at the smgr level. That might be a good way to tackle this. But I'm not going to touch it for 7.1...

regards, tom lane
Re: [HACKERS] Performance lossage in checkpoint dumping
> After looking a little more, I think this is too big a change to risk making for beta. I was thinking it might be an easy change, but it's not; there's noplace to store the open-relation reference if we don't have a relcache entry. But we don't want to pay the price of opening a relcache entry just to dump some buffers.
>
> [snip]
>
> But I'm not going to touch it for 7.1...

No way to group the writes so you can keep the most recent one open?

--
Bruce Momjian
[HACKERS] Re: beta5 ...
At 04:17 PM 2/16/01 -0500, Tom Lane wrote: Vadim says (and I agree) that we really ought to implement a new lightweight lock manager that would fall between spinlocks and regular locks in terms of overhead and functionality. But it's not reasonable Will there be an arbitrary user locking feature? E.g. lock on arbitrary text string. That would be great :). BTW, is 7.1 going to be a bit slower than 7.0? Or just Beta 5? Just curious. Don't mind waiting for 7.2 for the speed-up if necessary. Cheerio, Link.
Re: [HACKERS] Performance lossage in checkpoint dumping
Bruce Momjian [EMAIL PROTECTED] writes: But I'm not going to touch it for 7.1... No way to group the writes so you can keep the most recent one open? Don't see an easy way, do you? No, but I haven't looked at it. I am now much more concerned with the delay, and am wondering if I should start thinking about trying my idea of looking for near-committers and posting the patch to the list to see if anyone likes it for 7.1 final. Vadim will not be back in enough time to write any new code in this area, I am afraid. We could look to fix this in 7.1.1. Let's see what the pgbench tester comes back with when he sets the delay to zero.
Re: [HACKERS] Re: beta5 ...
BTW, is 7.1 going to be a bit slower than 7.0? Or just Beta 5? Just curious. Don't mind waiting for 7.2 for the speed-up if necessary. We expect 7.1 to be faster than 7.0.X. We may have a small problem that we may have to address. Not sure yet.
[HACKERS] Re: beta5 ...
On Sat, 17 Feb 2001, Lincoln Yeoh wrote: At 04:17 PM 2/16/01 -0500, Tom Lane wrote: Vadim says (and I agree) that we really ought to implement a new lightweight lock manager that would fall between spinlocks and regular locks in terms of overhead and functionality. But it's not reasonable Will there be an arbitrary user locking feature? E.g. lock on arbitrary text string. That would be great :). BTW, is 7.1 going to be a bit slower than 7.0? Or just Beta 5? Just curious. Don't mind waiting for 7.2 for the speed-up if necessary. It is possible that it will be ... the question is whether the slowdown is unbearable or not, as to whether we'll let it hold things up or not ... From reading one of Tom's emails, it looks like the changes to 'fix' the slowdown are drastic/large enough that it might not be safe (or desirable) to fix it at this late a stage in beta ... Depending on what is involved, we might put out a v7.1 for March 1st, so that ppl can feel confident about using the various features, but have a v7.1.1 that follows relatively closely on its heels that addresses the performance problem ...
Re: [HACKERS] Performance lossage in checkpoint dumping
On Fri, 16 Feb 2001, Bruce Momjian wrote: Bruce Momjian [EMAIL PROTECTED] writes: But I'm not going to touch it for 7.1... No way to group the writes so you can keep the most recent one open? Don't see an easy way, do you? No, but I haven't looked at it. I am now much more concerned with the delay, and am wondering if I should start thinking about trying my idea of looking for near-committers and posting the patch to the list to see if anyone likes it for 7.1 final. Vadim will not be back in enough time to write any new code in this area, I am afraid. Near committers? *puzzled look*
Re: [HACKERS] beta5 ...
Other than that, I have nothing to hold up a beta5. Anyone else? regards, tom lane I see a small problem with the regression test. If PL/pgSQL has already been added to template1, the regression scripts will fail because createlang fails. Probably we should create the regression database using template0? -- Tatsuo Ishii
Re: [HACKERS] beta5 ...
Tatsuo Ishii [EMAIL PROTECTED] writes: Probably we should create the regression database using template0? Seems like a good idea. regards, tom lane
Re: [HACKERS] Performance lossage in checkpoint dumping
The Hermit Hacker [EMAIL PROTECTED] writes: No way to group the writes so you can keep the most recent one open? Don't see an easy way, do you? No, but I haven't looked at it. I am now much more concerned with the delay, I concur. The blind write business is not important enough to hold up the release --- for one thing, it has nothing to do with the pgbench results we're seeing, because these tests don't run long enough to include any checkpoint cycles. The commit delay, on the other hand, is a big problem. and am wondering if I should start thinking about trying my idea of looking for near-committers and posting the patch to the list to see if anyone likes it for 7.1 final. Vadim will not be back in enough time to write any new code in this area, I am afraid. Near committers? *puzzled look* Processes nearly ready to commit. I'm thinking that any mechanism for detecting that might be overkill, however, especially compared to just setting commit_delay to zero by default. I've been sitting here running pgbench under various scenarios, and so far I can't find any condition where commit_delay > 0 is materially better than commit_delay = 0, even under heavy load. It's either the same or much worse. Numbers to follow... regards, tom lane
Re: [HACKERS] Re: beta5 ...
BTW, is 7.1 going to be a bit slower than 7.0? Or just Beta 5? Just curious. Don't mind waiting for 7.2 for the speed-up if necessary. It is possible that it will be ... the question is whether the slowdown is unbearable or not, as to whether we'll let it hold things up or not ... From reading one of Tom's emails, it looks like the changes to 'fix' the slowdown are drastic/large enough that it might not be safe (or desirable) to fix it at this late a stage in beta ... Depending on what is involved, we might put out a v7.1 for March 1st, so that ppl can feel confident about using the various features, but have a v7.1.1 that follows relatively closely on its heels that addresses the performance problem ... The easy fix is to just set the delay to zero. Looks like that will fix most of the problem. The near-committers thing may indeed be overkill, and certainly is not worth holding beta.
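For reference, the knob being discussed is the commit_delay setting; a sketch of the relevant postgresql.conf line is below. The unit is microseconds, which is why select(2)'s roughly 10 ms minimum granularity makes any nonzero value so expensive.

```
# commit_delay is specified in microseconds; 0 disables the
# pre-commit sleep entirely
commit_delay = 0
```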
[HACKERS] Re: [ADMIN] v7.1b4 bad performance
"Schmidt, Peter" [EMAIL PROTECTED] writes: So, is it OK to use commit_delay=0? Certainly. In fact, I think that's about to become the default ;-) I have now experimented with several different platforms --- HPUX, FreeBSD, and two considerably different strains of Linux --- and I find that the minimum delay supported by select(2) is 10 or more milliseconds on all of them, as much as 20 msec on some popular platforms. Try it yourself (my test program is attached). Thus, our past arguments about whether a few microseconds of delay before commit are a good idea seem moot; we do not have any portable way of implementing that, and a ten millisecond delay for commit is clearly Not Good. regards, tom lane /* To use: gcc test.c, then time ./a.out N N=0 should return almost instantly, if your select(2) does not block as per spec. N=1 shows the minimum achievable delay, * 1000 --- for example, if time reports the elapsed time as 10 seconds, then select has rounded your 1-microsecond delay request up to 10 milliseconds. Some Unixen seem to throw in an extra ten millisec of delay just for good measure, eg, on FreeBSD 4.2 N=1 takes 20 sec, N=2 takes 30. */ #include stdio.h #include stdlib.h #include sys/stat.h #include sys/time.h #include sys/types.h int main(int argc, char** argv) { struct timeval delay; int i, del; del = atoi(argv[1]); for (i = 0; i 1000; i++) { delay.tv_sec = 0; delay.tv_usec = del; (void) select(0, NULL, NULL, NULL, delay); } return 0; }
Re: [ADMIN] v7.1b4 bad performance
I wrote: Thus, our past arguments about whether a few microseconds of delay before commit are a good idea seem moot; we do not have any portable way of implementing that, and a ten millisecond delay for commit is clearly Not Good. I've now finished running a spectrum of pgbench scenarios, and I find no case in which commit_delay = 0 is worse than commit_delay > 0. Now this is just one benchmark on just one platform, but it's pretty damning...

Platform: HPUX 10.20 on HPPA C180, fast wide SCSI discs, 7200rpm (I think). Minimum select(2) delay is 10 msec on this platform.

POSTMASTER OPTIONS: -i -B 1024 -N 100

$ PGOPTIONS='-c commit_delay=1' pgbench -c 1 -t 1000 bench
tps = 13.304624 (including connections establishing)
tps = 13.323967 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 1 -t 1000 bench
tps = 16.614691 (including connections establishing)
tps = 16.645832 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=1' pgbench -c 10 -t 100 bench
tps = 13.612502 (including connections establishing)
tps = 13.712996 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 10 -t 100 bench
tps = 14.674477 (including connections establishing)
tps = 14.787715 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=1' pgbench -c 30 -t 100 bench
tps = 10.875912 (including connections establishing)
tps = 10.932836 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 30 -t 100 bench
tps = 12.853009 (including connections establishing)
tps = 12.934365 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=1' pgbench -c 50 -t 100 bench
tps = 9.476856 (including connections establishing)
tps = 9.520800 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 50 -t 100 bench
tps = 9.807925 (including connections establishing)
tps = 9.854161 (excluding connections establishing)

With -F (no fsync), it's the same story:

POSTMASTER OPTIONS: -i -o -F -B 1024 -N 100

$ PGOPTIONS='-c commit_delay=1' pgbench -c 1 -t 1000 bench
tps = 40.584300 (including connections establishing)
tps = 40.708855 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 1 -t 1000 bench
tps = 51.585629 (including connections establishing)
tps = 51.797280 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=1' pgbench -c 10 -t 100 bench
tps = 35.811729 (including connections establishing)
tps = 36.448439 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 10 -t 100 bench
tps = 43.878827 (including connections establishing)
tps = 44.856029 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=1' pgbench -c 30 -t 100 bench
tps = 23.490464 (including connections establishing)
tps = 23.749558 (excluding connections establishing)
$ PGOPTIONS='-c commit_delay=0' pgbench -c 30 -t 100 bench
tps = 23.452935 (including connections establishing)
tps = 23.716181 (excluding connections establishing)

I vote for commit_delay = 0, unless someone can show cases where positive delay is significantly better than zero delay. regards, tom lane
Re: [HACKERS] Re: [ADMIN] v7.1b4 bad performance
"Schmidt, Peter" [EMAIL PROTECTED] writes: So, is it OK to use commit_delay=0? Certainly. In fact, I think that's about to become the default ;-) I agree with Tom. I did some benchmarking tests using pgbench for a computer magazine in Japan. I got a almost equal or better result for 7.1 than 7.0.3 if commit_delay=0. See included png file. -- Tatsuo Ishii performance.png
Re: [HACKERS] Re: [ADMIN] v7.1b4 bad performance
Tatsuo Ishii [EMAIL PROTECTED] writes: I agree with Tom. I did some benchmarking tests using pgbench for a computer magazine in Japan. I got an almost equal or better result for 7.1 than 7.0.3 if commit_delay=0. See included png file. Interesting curves. One thing you might like to know is that while poking around with a profiler this afternoon, I found that the vast majority of the work done for this benchmark is in the uniqueness checks driven by the unique indexes. Declare those as plain (non-unique) indexes and the TPS figures would probably go up noticeably. That doesn't make the test invalid, but it does suggest that pgbench is emphasizing one aspect of system performance to the exclusion of others ... regards, tom lane
[HACKERS] Re: [ADMIN] v7.1b4 bad performance
... See included png file. What kind of machine was this run on? - Thomas