Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
On Fri, Jul 16, 2004 at 01:32:01AM -0400, lists wrote:

> LOG:  ReadRecord: unexpected pageaddr 0/33A4000 in log file 0, segment 5, offset 3817472
> LOG:  redo is not required
> PANIC:  XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
> LOG:  startup process (pid 16068) was terminated by signal 6
> LOG:  aborting startup due to startup process failure
>
> this is 7.3.3 (PGDG rpm's) on red hat 7.2

Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"In the beginning there was UNIX, and UNIX spoke and said: Hello world\n.
It did not say Hello New Jersey\n, nor Hello USA\n."

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log
thank you for replying.

Alvaro Herrera wrote:
> > PANIC:  XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
> > this is 7.3.3 (PGDG rpm's) on red hat 7.2
>
> Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.

do you know if there are rpm's for RH7.2 available?

to get the data back, is it a matter of just upgrading and starting the db
(ie will it perform recovery)?

thanks
[HACKERS] schema dumping
Hi,

Can I change pg_dump to never use the AUTHORIZATION clause and use OWNER TO
instead?  It would make things a lot simpler, especially in the case where
the public schema being dumped has had its ownership changed.

This is what the comment says:

 * Note that ownership is shown in the AUTHORIZATION clause,
 * while the archive entry is listed with empty owner (causing
 * it to be emitted with SET SESSION AUTHORIZATION DEFAULT).
 * This seems the best way of dealing with schemas owned by
 * users without CREATE SCHEMA privilege.  Further hacking has
 * to be applied for --no-owner mode, though!

We now have a better way, but it's non-standard.  Shall I change it?

Chris
Re: [HACKERS] Converting epoch to timestamp
> > SELECT EXTRACT(EPOCH FROM TIMESTAMP '2003-01-01 11:23:44');
>
> Yeah, but I think Michael's question had to do with going the other way
> (numeric to timestamp).

Sorry:

SELECT EXTRACT(TIMESTAMP FROM EPOCH '123412341234');

Chris
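[Editor's note: the EXTRACT(TIMESTAMP FROM EPOCH ...) syntax above is a proposal, not something PostgreSQL accepts. The conversion itself can be done today with interval arithmetic, e.g. TIMESTAMP WITH TIME ZONE 'epoch' + 1041420224 * INTERVAL '1 second'. As a sanity check of that arithmetic (a sketch in Python, not PostgreSQL code), the same computation is:]

```python
from datetime import datetime, timedelta, timezone

# The SQL idiom TIMESTAMPTZ 'epoch' + n * INTERVAL '1 second' is plain
# offset arithmetic from 1970-01-01 00:00:00 UTC; the same thing in Python:
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_to_timestamp(seconds: float) -> datetime:
    """Convert Unix-epoch seconds to an absolute timestamp."""
    return EPOCH + timedelta(seconds=seconds)

def timestamp_to_epoch(ts: datetime) -> float:
    """The inverse: what EXTRACT(EPOCH FROM ...) computes."""
    return (ts - EPOCH).total_seconds()

# Round trip using the timestamp quoted earlier in the thread:
ts = epoch_to_timestamp(1041420224)   # 2003-01-01 11:23:44 UTC
```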
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log
> > PANIC:  XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
> > this is 7.3.3 (PGDG rpm's) on red hat 7.2
>
> Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.

That won't help him start up his DB though, will it?  It will just prevent
the problem in the future?

Is it the same fix as for me?  Appending zeros to a file?

Chris
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
On Fri, Jul 16, 2004 at 02:09:33PM +0800, Christopher Kings-Lynne wrote:

> > Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.
>
> That won't help him start up his DB though, will it?

AFAIU it should ...

> It will just prevent the problem in the future?  Is it the same fix as for
> me?  Appending zeros to a file?

As far as I remember, your problem was much worse ...  This is a strange bug
where the Xlog code would refuse to replay a record that starts just at the
start of the segment.  A later version deals with this situation correctly.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"This mail is guaranteed to be delivered 100% free of sarcasm."
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
On Fri, Jul 16, 2004 at 01:55:13AM -0400, lists wrote:

> thank you for replying.
>
> do you know if there are rpm's for RH7.2 available?

No idea, sorry.

> to get the data back, is it a matter of just upgrading and starting the db
> (ie will it perform recovery)?

That should do.  Depending on how much you value your data, however, it
might be advisable to make a backup of the data directory just in case ...

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"To think that the spectre we see is illusory does not strip it of dread;
it only adds the new terror of madness" (Perelandra, C.S. Lewis)
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
> -----Original Message-----
> From: Tom Lane
> Sent: Thursday, July 15, 2004 10:03 PM
> To: Dann Corbit
> Cc: Oliver Jowett; Magnus Hagander; Hackers; [EMAIL PROTECTED]
> Subject: Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
>
> Dann Corbit [EMAIL PROTECTED] writes:
> > All translations between UTC time and local time are based on the
> > following formula: UTC = local time + bias
>
> Surely not.  Or has Windows not heard of daylight-savings time?  Or
> perhaps they have, but are not aware that the DST laws have changed often
> in the past?

No problems.  They even handle time zones with arbitrary minute boundaries
(not on the hour) with aplomb.

> Over-simplistic answers are not what we need here.  The data structure you
> quote cannot even tell what DST transition dates Windows thinks are in
> effect this year, let alone what it thinks the dates were in past years.

Yes, there are other calls for that, obviously.  I sent Mr. Momjian a
complete implementation of time zone stuff that uses Windows calls.  It's
also accurate to a fraction of a nanosecond millions of years into the past
and the future.

The call that I showed returns the NAME OF THE TIME ZONE and also what it is
called when you are in Daylight savings time.  I thought the issue under
question was to find out what the time zone was.

This program:

#include <windows.h>
#include <iostream>
using namespace std;

int main(void)
{
    TIME_ZONE_INFORMATION tz;
    DWORD i = GetTimeZoneInformation(&tz);
    for (i = 0; i < 32 && tz.StandardName[i]; i++)
        cout << (TCHAR) tz.StandardName[i];
    cout << endl;
    for (i = 0; i < 32 && tz.DaylightName[i]; i++)
        cout << (TCHAR) tz.DaylightName[i];
    cout << endl;
    return 0;
}

Prints this:

Pacific Standard Time
Pacific Daylight Time

There is also a global variable called _tzname that contains the name of
the time zone.
See:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt__daylight.2c_._timezone.2c_.and__tzname.asp

On my machine:

cout << _tzname[0];  // -> PST
cout << _tzname[1];  // -> PDT

As far as doing the calculations for time values, do whatever you like (it's
not that difficult either, and the code I sent does address all that stuff,
though it is in C++).  Don't forget that things are inverted in the southern
hemisphere.

Have a good one.
Re: [HACKERS] Converting epoch to timestamp
On Jul 16, 2004, at 3:08 PM, Christopher Kings-Lynne wrote:

> SELECT EXTRACT(TIMESTAMP FROM EPOCH '123412341234');

That's a really interesting idea!  Makes for a much more consistent syntax
for our other functions.  ISTM this might require a native EPOCH datatype.
I wouldn't want to encourage people to use EPOCH as a datatype though.

Michael Glaesemann
grzm myrealbox com
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 04:49, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > On Fri, 2004-07-16 at 00:01, Alvaro Herrera wrote:
> > > My manpage for signal(2) says that you shouldn't assign SIG_IGN to
> > > SIGCHLD, according to POSIX.
> >
> > So - I should be setting this to SIG_DFL and that's good for everyone?
>
> Yeah, we learned the same lesson in the backend not too many releases
> back.  SIG_IGN'ing SIGCHLD is bad voodoo; it'll work on some platforms but
> not others.

Many thanks all,

Best Regards, Simon Riggs
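[Editor's note: the portability hazard can be demonstrated directly. Under POSIX, setting SIGCHLD to SIG_IGN asks the kernel to auto-reap children, after which wait() fails with ECHILD on platforms that implement that semantic; with SIG_DFL the normal wait semantics apply everywhere. A hedged, POSIX-only sketch; the SIG_IGN branch is exactly the platform-dependent behaviour the thread warns about:]

```python
import os
import signal

def reap_with_disposition(disposition) -> bool:
    """Fork a child that exits immediately; return True if waitpid()
    could collect it under the given SIGCHLD disposition."""
    signal.signal(signal.SIGCHLD, disposition)
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: exit at once
    try:
        os.waitpid(pid, 0)   # parent: try to reap the child
        return True
    except ChildProcessError:
        return False         # auto-reaped; nothing left to wait for
    finally:
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)

# SIG_DFL: the child stays a zombie until we wait for it -- portable.
# SIG_IGN: on Linux the child is reaped automatically and waitpid()
# fails with ECHILD; other platforms differ, hence "bad voodoo".
```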
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote:
> Andreas Pflug wrote:
> > You do something that splits the value into directory name and file name
> > and removes every letter after %:
> >     /var/log    postgresql.log.%-%-%_%%%
> > Another idea is to allow filename wildcards in the listing so it becomes:
> >     SELECT * FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir
>
> While that is nice, it doesn't match the functionality of opendir so we
> are perhaps better with one that doesn't handle wildcards and we just do
> the wildcard processing in the WHERE clause.
>
> > Uh, this looks ugly.  How about
> >     pg_logfile_list() RETURNS setof timestamp  -- to list available logfiles
> >     pg_logfile_filename(timestamp)             -- to return the filename for that logfile
>
> I don't see the need to return timestamps.  If you select any empty
> directory, you can just return the file names.  The only reason you might
> need a pattern is to distinguish pg log files from other log files.  If
> you want, create a server-side function that returns the file name with
> the strftime() patterns converted to '*'.
>
> > and a generic pg_dir(wildcard_text)
>
> Maybe pg_dir_ls().  OK, it would be nice if we could do a sed operation
> like
>     sed 's/%./*/g'
> but I don't know a way to do that without defining a function or pulling
> in a procedural language, but if we could do it we could do:
>     pg_dir(echo log_destination | sed 's/%./*/g')

Ever used sed on win32?!?

And how should the timestamp be represented in client tools?  Date/time
interpretation is always a source of problems, so *please* let the server
do that.

Rethinking all this, I'd like pg_logfile_list to return a complex type:

    CREATE TYPE pg_logfile_list AS (
        filedate   timestamp,
        filename   text,
        backendpid int,
        inuse      bool);

and

    pg_logfile_list() RETURNS SETOF pg_logfile_list

which would enable

    SELECT filename, pg_file_unlink(filename)
      FROM pg_logfile_list()
     WHERE filedate < current_timestamp - '3 months'::interval
       AND NOT inuse;

The in-use check is easy for the backend, if the syslog process publishes
the current logfile's timestamp in shared mem.
We can use a GUC variable for the log_directory (not log_destination);
anyway, I'd like the filenames to be selected by the server.

Regards, Andreas
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
> It occurs to me that with a check this thorough, we might be able to
> finesse the problem on Windows with the system returning very nonstandard
> timezone abbreviations.
>
> > It does *not* pick up my timezone.  Drat.
>
> I assume from your domain name that Europe/Stockholm would actually be
> the best choice for you?  What Windows timezone setting are you using for
> this test?

Yup, that would be the best.  Either that or CET/CEST.  I'm using the
timezone that in the GUI is called "(GMT +01:00) Amsterdam, Berlin, Bern,
Rome, Stockholm, Vienna".  strftime with %z calls it "W. Europe Daylight
Time".

> > The following possibilities are rejected per:
> > DEBUG:  Reject TZ "Europe/Stockholm": at 16844400 1970-07-15 00:00:00
> > std versus 1970-07-15 01:00:00 dst
>
> If you look in src/timezone/data/europe you will see that the zic
> database thinks Sweden was on strict GMT+1 (no daylight savings) between
> 1916 and 1980, and since 1980 they were on EU daylight-savings rules.
> Does that square with your ideas of reality?  (If it does not then we
> should just punt the problem upstream to the zic people, but I will
> assume here that their research is good.)

Actually, I was sure that was wrong, but started looking it up, and it
seems it's about right.  See
http://www.sp.se/metrology/timefreq/eng/standard_time.htm for an official
take on the summertime rules.  I *think* those are the EU rules.

> What I suspect given the above is that Windows has no clue about
> historical reality and is retroactively applying the current DST rules
> back to 1970, thus deciding that 1970-07-15 was on DST when it was really
> not.

That could definitely be.  Since before 1970 they say there is no DST at
all (remember the old issues), it wouldn't surprise me a bit if they took
this kind of shortcut.
> I thought about restricting the scope of the TZ testing to start in 1990
> or so to avoid this, but that seems certain to fall foul of the other
> problem, which is distinguishing closely-related timezones (cf Chris K-L
> discovering that he lives in Antarctica, a few days back...)  Maybe the
> whole match-on-behavior approach is wrong and we need to do something
> else, but I'm really unsure what.  Ideas?

Well, Windows-specific we could do a translation table, since there is a
finite (and not *huge*) number of timezones.  But we'd have to figure out
for each one which to match.  That won't work for unix, I think; the lookup
table would be just huge considering all possible combinations...

//Magnus
[HACKERS] analyze.c
Hi guys,

During my studies I have to do some practical work: I must extend
PostgreSQL with some sampling algorithms.  I implemented some trigger
functions for this, and they work fine.  But now I have to make some
changes to the parser, particularly analyze.c.  If a user sends a CREATE
TABLE command, PostgreSQL should create several sample tables, which will
be filled with the sample data from the original table.  For a start it
would be enough if two tables were created: the original one, with the
table name, columns etc. selected by the user, and a second one, a copy
with the same columns etc. but another name, e.g. "sample".

My problem is that the system catalogs are created at initdb, and then the
following failure occurs:

loading pg_description...
/usr/local/pqsql/initdb: line 837: 22348 Done (141)
(cat <<EOF
CREATE TEMP TABLE tmp_pg_description (objoid oid, classname name,
objsuboid int4, description text) WITHOUT OIDS;
COPY tmp_pg_description FROM STDIN;
EOF
cat $POSTGRES_DESC;
cat <<EOF
\.
INSERT INTO pg_description SELECT t.objoid, c.oid, t.objsuboid,
t.description FROM tmp_pg_description t, pg_class c
WHERE c.relname = t.classname;
EOF)
22349 Segmentation fault $PGPATH/postgres $PGSQL_OPT template1 >/dev/null

How can I tell postgres that it should only create the sample table when
the original table to be created has the name "origin" or something like
that?  Or that it should only act if some user wants to create the table,
not postgres itself?

Thank you very much!

Greetings, anja
Re: [HACKERS] Point in Time Recovery
> > I'm aiming for the minimum feature set - which means we do need to take
> > care over whether that set is insufficient and also to pull any part
> > that doesn't stand up to close scrutiny over the next few days.  As you
> > can see, we are still chewing on NT.
>
> What PITR features are missing?  I assume because we can't stop file
> system writes during backup that we will need a backup parameter file
> like I described.  Is there anything else that PITR needs?

No, we don't need to stop writes!  Not even to split a mirror; other db's
need that to be able to restore, but we don't.  We only need to tell people
to back up pg_control first.  The rest was only intended to enforce

1. that pg_control is the first file backed up
2. that the dba uses a large enough PIT (or xid) for restore

I think the idea of an extra file with the WAL start position was overly
complicated, since all you need is pg_control (+ WAL end position to
enforce 2.).  If we don't want to tell people to back up pg_control first,
imho the next best plan would be to add a WAL start input (e.g. xlog name)
parameter to recovery.conf, that fixes pg_control.

Andreas
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
> > Over-simplistic answers are not what we need here.  The data structure
> > you quote cannot even tell what DST transition dates Windows thinks are
> > in effect this year, let alone what it thinks the dates were in past
> > years.
>
> Yes, there are other calls for that, obviously.  I sent Mr. Momjian a
> complete implementation of time zone stuff that uses Windows calls.  It's
> also accurate to a fraction of a nanosecond millions of years into the
> past and the future.

You wouldn't happen to know if Windows internally has knowledge of the many
different DST rules in force in different years?  See the other mail about
how it apparently deals with Swedish DST; is that common or a single case?
If the OS doesn't have that knowledge, we can give up trying to get it to
tell us about it :-)

> The call that I showed returns the NAME OF THE TIME ZONE and also what it
> is called when you are in Daylight savings time.  I thought the issue
> under question was to find out what the time zone was.

Nope, we already had that.  The issue is that the names are not the same as
the ones used in zic/unix, so there is nothing to match on.

//Magnus
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
Magnus Hagander [EMAIL PROTECTED] writes:
> > I thought the issue under question was to find out what the time zone
> > was.
>
> Nope, we already had that.  The issue is that the names are not the same
> as the ones used in zic/unix, so there is nothing to match on.

Right.  The problem we are actually faced with is to identify which of the
zic timezones is the best match for the system's timezone setting.  One of
the issues is that it's not clear what "best" means...  At the moment I
like Oliver Jowett's idea of defining "best" as the one that matches
furthest back.

regards, tom lane
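[Editor's note: the "matches furthest back" rule can be sketched outside the backend. This is not the backend code, just an illustration using Python's zoneinfo (assumed available): score each candidate zone by how far back in time its UTC offset keeps agreeing with the reference zone's, then pick the deepest match.]

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo   # assumes Python 3.9+ with tz data installed

def match_depth(candidate, reference, start,
                step=timedelta(days=15), max_steps=1200):
    """Step backwards from `start`, counting how long the candidate's
    UTC offset agrees with the reference's.  Deeper == better match."""
    t = start
    for i in range(max_steps):
        if t.astimezone(candidate).utcoffset() != t.astimezone(reference).utcoffset():
            return i
        t -= step
    return max_steps

now = datetime(2004, 7, 16, tzinfo=timezone.utc)
stockholm = ZoneInfo("Europe/Stockholm")

# A fixed GMT+1 zone agrees with Stockholm in winter but not in July 2004
# (CEST, GMT+2), so it scores far worse than a zone that shares
# Stockholm's full DST history.
fixed_cet = timezone(timedelta(hours=1))
```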
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug [EMAIL PROTECTED] writes:
> The in-use check is easy for the backend, if the syslog process publishes
> the current logfile's timestamp in sharedmem.

You really haven't absorbed any of the objections I've raised, have you?
I don't want the log process connected to shared mem at *all*, and see no
particularly good reason why it should be.

> We can use a GUC variable for the log_directory (not log_destination);
> anyway, I'd like the filenames to be selected by the server.

The directory should definitely be a GUC variable.  The individual
filenames should probably be of the form <prefix><timestamp>, where the
server dictates the format of the timestamp (and we choose it so that the
names sort correctly).  We could let the prefix be user-selectable or make
it hard-wired; I don't have a strong feeling about that either way.

regards, tom lane
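[Editor's note: "chosen so that the names sort correctly" just means a fixed-width, most-significant-field-first timestamp, so that lexicographic order equals chronological order. A quick sketch; the prefix and exact format here are invented for illustration, not what the server uses:]

```python
from datetime import datetime, timedelta

def logfile_name(when: datetime, prefix: str = "postgresql-") -> str:
    # Year-first, zero-padded fields: sorting the names as plain strings
    # lists the logs oldest-first, with no date parsing needed.
    return prefix + when.strftime("%Y-%m-%d_%H%M%S") + ".log"

start = datetime(2004, 7, 16, 9, 30, 0)
names = [logfile_name(start + timedelta(hours=h)) for h in (0, 5, 26)]
```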
Re: [HACKERS] Point in Time Recovery
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
> We only need to tell people to back up pg_control first.  The rest was
> only intended to enforce
> 1. that pg_control is the first file backed up
> 2. that the dba uses a large enough PIT (or xid) for restore

Right, but I think Bruce's point is that it is far too easy to get those
things wrong; especially point 2, for which a straight tar dump will simply
not contain the information you need to determine what is a safe stopping
point.  I agree with Bruce that we should have some mechanism that doesn't
rely on the DBA to get this right.  Exactly what the mechanism should be is
certainly open for discussion...

regards, tom lane
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote:
> > OK, it would be nice if we could do a sed operation like
> >     sed 's/%./*/g'
> > but I don't know a way to do that without defining a function or
> > pulling in a procedural language, but if we could do it we could do:
> >     pg_dir(echo log_destination | sed 's/%./*/g')
>
> Ever used sed on win32?!?  And how should the timestamp be represented in
> client tools?  Date/time interpretation is always a source of problems,
> so *please* let the server do that.

I am thinking of these all being server-side functions.

> Rethinking all this, I'd like pg_logfile_list to return a complex type
> (filedate, filename, backendpid, inuse), and pg_logfile_list() RETURNS
> SETOF pg_logfile_list, which would enable
>     SELECT filename, pg_file_unlink(filename)
>       FROM pg_logfile_list()
>      WHERE filedate < current_timestamp - '3 months'::interval
>        AND NOT inuse;
> The in-use check is easy for the backend, if the syslog process publishes
> the current logfile's timestamp in shared mem.

This seems quite involved.  Can we get the basic functionality I described
first?  Also I am not sure how all this information is going to be passed
from the logging process to the backend requesting the information, and it
seems overly complicated.

-- 
Bruce Momjian | http://candle.pha.pa.us | [EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, Christ can be your backup.
+ 13 Roberts Road, Newtown Square, Pennsylvania 19073
Re: [HACKERS] Point in Time Recovery
Tom Lane wrote:
> Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
> > We only need to tell people to back up pg_control first.
>
> Right, but I think Bruce's point is that it is far too easy to get those
> things wrong; especially point 2, for which a straight tar dump will
> simply not contain the information you need to determine what is a safe
> stopping point.  I agree with Bruce that we should have some mechanism
> that doesn't rely on the DBA to get this right.  Exactly what the
> mechanism should be is certainly open for discussion...

Right.  I am wondering what process people would use to back up pg_control
first?  If they do:

    tar -cf $TAPE ./global/pg_control .

they will get two copies of pg_control: the early one, and one as part of
the directory scan.  On restore, they would restore the early one, but the
directory scan would overwrite it.  I suppose they could do:

    cp global/pg_control global/pg_control.backup
    tar -cf $TAPE .

then on restore, once all the files are restored, move pg_control.backup to
its original name.

That gives us the checkpoint wal/offset, but how do we get the start/stop
information?  Is that not required?  Maybe we should just have start/stop
server-side functions that create a file in the archive directory
describing the start/stop counters and time, and the admin would then have
to find those values.

Why are the start/stop wal/offset values needed anyway?  I know why we need
the checkpoint value.  Do we need a checkpoint after the archiving starts
but before the backup begins?

Also, when you are in recovery mode, how do you get out of recovery mode?
Meaning, if you have a power failure, how do you prevent the system from
doing another recovery?  Do you remove the recovery.conf file?
-- 
Bruce Momjian | http://candle.pha.pa.us
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
Magnus Hagander [EMAIL PROTECTED] writes:
> > At the moment I like Oliver Jowett's idea of defining "best" as the one
> > that matches furthest back.
>
> Sounds reasonable to me.  As long as a clear warning is put in the log
> whenever something is picked that is not a perfect match,

Define "perfect match".  I do not think we can really tell if we have an
exact match or not; the libc timezone API is just too limited to be sure.
And on many platforms we can be sure we will never have an exact match,
especially if we look at years before 1970.  If you want something in the
log, I'd be inclined to just always make a log entry when we infer a
timezone setting.

regards, tom lane
Re: [HACKERS] Point in Time Recovery
Bruce Momjian [EMAIL PROTECTED] writes:
> Also, when you are in recovery mode, how do you get out of recovery mode?
> Meaning, if you have a power failure, how do you prevent the system from
> doing another recovery?  Do you remove the recovery.conf file?

I do not care for the idea of a recovery.conf file at all, and have been
intending to look to see what we'd need to do to not have one.  I find it
hard to believe that there is anything one would put in it that is really
persistent state.  The above concern shows why it shouldn't be treated as a
persistent configuration file.

regards, tom lane
Re: [HACKERS] Point in Time Recovery
> then on restore, once all the files are restored, move pg_control.backup
> to its original name.  That gives us the checkpoint wal/offset, but how
> do we get the start/stop information?  Is that not required?

The checkpoint wal/offset is in pg_control; that is sufficient start
information.  The stop info is only necessary as a safeguard.

> Do we need a checkpoint after the archiving starts but before the backup
> begins?

No.

> Also, when you are in recovery mode, how do you get out of recovery mode?
> Meaning, if you have a power failure, how do you prevent the system from
> doing another recovery?  Do you remove the recovery.conf file?

pg_control could be updated during rollforward (only if that actually does
a checkpoint).  So if pg_control is also the recovery start info, then we
can continue from there if we have a power failure.  For the first release
it would imho also be ok to simply start over if you lose power.

I think the filename 'recovery.conf' is misleading, since it is not a
static configuration file, but a command file for one recovery.  How about
'recovery.command', then 'recovery.inprogress', and on recovery completion
it should be renamed to 'recovery.done'?

Andreas
Re: [HACKERS] Point in Time Recovery
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
> > Do we need a checkpoint after the archiving starts but before the
> > backup begins?
>
> No.

Actually yes.  You have to start at a checkpoint record when replaying the
log, so if no checkpoint occurred between starting to archive WAL and
starting the tar backup, you have a useless backup.

It would be reasonable to issue a CHECKPOINT just before starting the
backup as part of the standard operating procedure for taking PITR dumps.
We need not require this, but it would help to avoid this particular sort
of mistake; and of course it might save a little bit of replay effort if
the backup is ever used.

As far as the business about copying pg_control first goes: there is
another way to think about it, which is to copy pg_control to another place
that will be included in your backup.  For example, the standard backup
procedure could be:

1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
3. tar cf /dev/mt $PGDATA
4. do something to record the ending WAL position

If we standardized on this way, then the tar archive would automatically
contain the pre-backup checkpoint position in ./pg_control.dump, and there
is no need for any special assumptions about the order in which tar
processes things.

However, once you decide to do things like that, there is no reason why the
copied file has to be an exact image of pg_control.  I claim it would be
more useful if the copied file were plain text, so that you could just cat
it to find out the starting WAL position; that would let you determine,
without any special tools, what range of WAL archive files you are going to
need to bring back from your archives.

This is pretty much the same chain of reasoning that Bruce and I went
through yesterday to come up with the idea of putting a label file inside
the tar backups.
We concluded that it'd be worth putting both the backup starting time and
the checkpoint WAL position into the label file; the starting time isn't
needed for restore, but might be really helpful as documentation if you
needed to verify which dump file was which.

regards, tom lane
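[Editor's note: the plain-text label described above might look like this sketch. The field names and format here are invented for illustration; the real file's contents were still under discussion. The point is that it is written before running tar, and a plain cat suffices at restore time to see which WAL segments are needed.]

```python
from datetime import datetime, timezone

def write_backup_label(path, checkpoint_wal, start_time):
    """Write a human-readable label: `cat` is enough to read it back,
    no special tools needed (the whole point of using plain text)."""
    with open(path, "w") as f:
        f.write(f"START TIME: {start_time.isoformat()}\n")
        f.write(f"CHECKPOINT LOCATION: {checkpoint_wal}\n")

def read_checkpoint_location(path):
    """Recover the checkpoint WAL position from a label file."""
    with open(path) as f:
        for line in f:
            if line.startswith("CHECKPOINT LOCATION:"):
                return line.split(":", 1)[1].strip()
    return None
```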
Re: [HACKERS] Point in Time Recovery
> > > Do we need a checkpoint after the archiving starts but before the
> > > backup begins?
> >
> > No.
>
> Actually yes.

Sorry, I incorrectly did not connect 'archiving' with the backed-up
xlogs :-(  So yes, you need one checkpoint after archiving starts.  Imho
turning on xlog archiving should issue such a checkpoint just to be sure.

Andreas
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote:
> This seems quite involved.  Can we get the basic functionality I
> described first?

On the way.

> Also I am not sure how all this information is going to be passed from
> the logging process to the backend requesting the information, and it
> seems overly complicated.

There's *no* information passing from the logging process, with the single
exception of the latest logfile timestamp (if allowed).  I'd rather like to
have that information from the logger, to be safe in case the system time
was manipulated and the last logfile is not the current one.  The rest is
just a reworked version of pg_dir_ls, with internal knowledge of how the
timestamp is formatted.

Regards, Andreas
Re: [HACKERS] serverlog rotation/functions
Tom Lane wrote:
> Andreas Pflug [EMAIL PROTECTED] writes:
> > The in-use check is easy for the backend, if the syslog process
> > publishes the current logfile's timestamp in sharedmem.
>
> You really haven't absorbed any of the objections I've raised, have you?
> I don't want the log process connected to shared mem at *all*, and see no
> particularly good reason why it should be.

Why shouldn't the process announce the logfile timestamp and its pid
*write-only*, so other backends know about it?  At least the pid must be
distributed like this, just as bgwriter does.  I understand perfectly that
postmaster and logger are very critical processes, so they should depend on
as few resources as possible.  The logger works without shmem in general,
but how do we reach it if its pid is unknown?

> The directory should definitely be a GUC variable.  The individual
> filenames should probably be of the form <prefix><timestamp>, where the
> server dictates the format of the timestamp (and we choose it so that the
> names sort correctly).

Agreed.

Regards, Andreas
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a checkpoint just to be sure. By agreement, archive_mode can only be turned on at postmaster startup, which means you always have a checkpoint - either because you shut it down cleanly, or you didn't and it recovers, then writes one. There is always something to start the rollforward. So, non-issue. Best regards, Simon Riggs ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 15:27, Bruce Momjian wrote: Also, when you are in recovery mode, how do you get out of recovery mode, meaning if you have a power failure, how do you prevent the system from doing another recovery? Do you remove the recovery.conf file? That was the whole point of the recovery.conf file: it prevents you from re-entering recovery accidentally, as would occur if the parameters were set in the normal .conf file. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 16:25, Zeugswetter Andreas SB SD wrote: I think the filename 'recovery.conf' is misleading, since it is not a static configuration file, but a command file for one recovery. How about 'recovery.command' then 'recovery.inprogress', and on recovery completion it should be renamed to 'recovery.done' You understand this and your assessment is correct. recovery.conf isn't an attempt to persist information. It is a means of delivering a set of parameters to the recovery process, as well as signalling overall that archive recovery is required (because the system default remains the same, which is to recover from the logs it has locally available to it). I originally offered a design which used a command, similar to DB2/Oracle...that was overruled as too complex. The (whatever you call it) file is just a very simple way of specifying what's required. There is more to be said here...clearly some explanations are required and I will provide those later... Best regards, Simon Riggs ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
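[Editor's note: as a sketch, the parameter file Simon describes might look like the following. The parameter names follow the design under discussion here but are illustrative only, not a confirmed final syntax.]

```
# recovery.conf -- its mere presence tells the startup process to perform
# an archive recovery rather than a plain crash recovery
restore_command = 'cp /mnt/server/archivedir/%f %p'
# optional stopping point; default is to roll forward to the end of the logs
recovery_target_time = '2004-07-16 12:00:00'
```

On completion the file would be renamed out of the way (e.g. to recovery.done, per Andreas's suggestion) so a later crash cannot re-trigger archive recovery.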
Re: [HACKERS] analyze.c
Anja Klein wrote: How can I tell postgres that it should only create the sample table when the original table to be created has the name origin or something like that? Or that it should only act if some user wants to create the table, not postgres itself? You would presumably do a strcmp() on the name of the table to be created before you jump into your new code that creates the sample table. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: This seems quite involved. Can we get the basic functionality I described first? On the way. Also I am not sure how all this information is going to be passed from the logging process to the backend requesting the information, and it seems overly complicated. There's *no* information passing from the logging process, with the single exception of the latest logfile timestamp (if allowed). I'd rather like to have that information from the logger, to be safe in case the system time was manipulated and the last logfile is not the current one. The rest is just a reworked version of pg_dir_ls, with internal knowledge of how the timestamp is formatted. Oh, so you are hardcoding the logfile name so you can interpret the timestamp from that? It seems cleaner to allow the admin to specify whatever log pattern they want. However, your idea of expiring the log files based on timestamp values is pretty powerful. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: This seems quite involved. Can we get the basic functionality I described first? Current workable patch. Some questions/limitations: - How's the official way to restrict pg_* functions to superuser only Very crudely :-) static int pg_signal_backend(int pid, int sig) { if (!superuser()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("only superuser can signal other backends")))); - I've restricted pg_file_read to 50k max. What's a reasonable limit for a generic function? Uh, that seems fine. You already check to see it is within the limit. I think a bigger question is should we limit it at all? Do we limit pg_largeobject? Is that similar? - pg_file_read and pg_file_write read/write text only; should have binary versions too. I guess we could but no one is asking for that yet so I would leave it for later. Very open question: - How should a backend know the logger's pid if it's not in shmem. Write a magic string to the pipe? I think it has to and in fact the pid is being written by the postmaster, not by the logger process, so that should be OK. The issue is that the logger shouldn't _attach_ to shared memory unless it has to. As far as recording the current log timestamp, I think that will be a problem. I would much rather see us forget about doing timestamp processing with these log files and keep it simple at this point and see what needs we have for 7.6. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Point in Time Recovery
Simon Riggs wrote: On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a checkpoint just to be sure. By agreement, archive_mode can only be turned on at postmaster startup, which means you always have a checkpoint - either because you shut it down cleanly, or you didn't and it recovers, then writes one. There is always something to start the rollforward. So, non-issue. I don't think so. I can imagine many cases where you want to do a nightly tar backup without turning archiving on/off or restarting the postmaster. In those cases, a manual checkpoint would have to be issued before the backup begins. Imagine a system that is up for a month, and they don't have enough archive space to keep a month's worth of WAL files. They would probably do nightly or weekend tar backups, and then discard the WAL archives. What procedure would they use? I assume they would copy all their old WAL files to a save directory, issue a checkpoint, do a tar backup, then they can delete the saved WAL files. Is that correct? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Memory management with C aggregate
Tom Lane wrote: Ian Burrell [EMAIL PROTECTED] writes: We have user-defined aggregates written in C running inside the server. We are running into some memory management issues and wondering what is the best way to solve the problem. The state of the aggregates is a structure with a pointer to allocated memory. The structure and memory are allocated in the PortalMemoryContext and freed in the ffunc. We just discovered that the ffunc function can be called multiple times with the same state on certain queries. The double free causes a crash. AFAICS the ffunc should be called only once per aggregated group. Perhaps your code is confused about the handling of groups? If so, the double free is hardly your only worry --- you'll be computing wrong answers anyway till you fix that. The aggregate is in a subquery which might make a difference. The ffunc is only called multiple times when a nested loop is used in the query plan. With other query plans, the ffunc is only called once. The ffunc seems to be called once for every combination. I can't get the following query to use a nested loop, but our query is similar, though with a compound key and a custom aggregate. If x has N rows, y has M, with N groups, then the ffunc can be called N*M times, M times for each N group. SELECT x.key_no, s.agg FROM x INNER JOIN ( SELECT y.key_no, custom_agg(num) AS agg FROM y GROUP BY key_no ) s USING (key_no) ORDER BY key_no I'll try to come up with a test case that illustrates the problem. - Ian ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] case sensitivity in PQExecPrepared
I noticed a curious thing (not sure if by design or not). While using the PQExecPrepared interface, the statement name passed to the function (as a const char*) has to be in lower case to get it to work. I kind of understand why this is, but it is kind of weird that passing the exact same statement name to PQExecPrepared and PREPARE does not work if it contains any upper case characters. Just FYI. Merlin ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: - How's the official way to restrict pg_* functions to superuser only Very crudely :-) Got it. 'nother question: Is reading the logfile a task that may be allowed to superusers only? I don't think so, though acls might apply. Uh, that seems fine. You already check to see it is within the limit. I think a bigger question is should we limit it at all? Do we limit pg_largeobject? Is that similar? Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Very open question: - How should a backend know the logger's pid if it's not in shmem. Write a magic string to the pipe? I think it has to and in fact the pid is being written by the postmaster, not by the logger process, so that should be OK. The issue is that the logger shouldn't _attach_ to shared memory unless it has to. It doesn't. It inherits the unnamed shared mem segment from the postmaster, as all subprocesses. As far as recording the current log timestamp, I think that will be a problem. I would much rather see us forget about doing timestamp processing with these log files and keep it simple at this point and see what needs we have for 7.6. I'm a bit insisting on this point. Remember, this all started from the attempt to display the serverlog on the client side. To do this, I need a way to retrieve the current logfile properties (size, and in case of rotation timestamp too) in a low-overhead way, or at least get to know something has changed. Scanning a whole directory and interpreting the data isn't low overhead any more. There's no locking on the shmem, and the single dependence on shmem is the existence of it at the time of rotation. If the shmem is gone, postmaster is probably dead anyway. Regards, Andreas ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] case sensitivity in PQExecPrepared
Merlin Moncure wrote: While using the PQExecPrepared interface, the statement name passed to the function (as a const char*) has to be in lower case to get it to work. I kind of understand why this is, but it is kind of weird that passing the exact same statement name to PQExecPrepared and PREPARE does not work if it contains any upper case characters. PQExecPrepared is used in C programs, PREPARE is used in SQL programs. I think it's fairly obvious that those use different syntax rules. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] case sensitivity in PQExecPrepared
Peter Eisentraut wrote: Merlin Moncure wrote: While using the PQExecPrepared interface, the statement name passed to the function (as a const char*) has to be in lower case to get it to work. I kind of understand why this is, but it is kind of weird that passing the exact same statement name to PQExecPrepared and PREPARE does not work if it contains any upper case characters. PQExecPrepared is used in C programs, PREPARE is used in SQL programs. I think it's fairly obvious that those use different syntax rules. Well, yes :) Just to be absolutely clear what I mean, the following will fail (pseudocode, but you get the idea): char stmt[] = "prepare X as select 0"; PQexec(c, stmt); PQexec(c, "execute X"); -- works PQexecPrepared(c, "X", [...]); -- fails You are saying this is the correct and expected behavior? Merlin ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] case sensitivity in PQExecPrepared
Merlin Moncure wrote: Well, yes :) Just to be absolutely clear what I mean, the following will fail (pseudocode, but you get the idea): char stmt[] = "prepare X as select 0"; PQexec(c, stmt); PQexec(c, "execute X"); -- works PQexecPrepared(c, "X", [...]); -- fails You are saying this is the correct and expected behavior? Yes, because part of those syntax rules is that in SQL, unquoted identifiers are folded to lower case, but in C they are not. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: - How's the official way to restrict pg_* functions to superuser only Very crudely :-) Got it. 'nother question: Is reading the logfile a task that may be allowed to superusers only? I don't think so, though acls might apply. Yes, the log file might contain SQL queries issued by others. It is a super-user only capability. Uh, that seems fine. You already check to see it is within the limit. I think a bigger question is should we limit it at all? Do we limit pg_largeobject? Is that similar? Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Huh? Why have a default maximum? Very open question: - How should a backend know the logger's pid if it's not in shmem. Write a magic string to the pipe? I think it has to and in fact the pid is being written by the postmaster, not by the logger process, so that should be OK. The issue is that the logger shouldn't _attach_ to shared memory unless it has to. It doesn't. It inherits the unnamed shared mem segment from the postmaster, as all subprocesses. Ah, I think it needs to close that as soon as it starts. Don't other subprocesses do that? That shared memory is very fragile and we don't want an errant pointer poking in there. As far as recording the current log timestamp, I think that will be a problem. I would much rather see us forget about doing timestamp processing with these log files and keep it simple at this point and see what needs we have for 7.6. I'm a bit insisting on this point. Remember, this all started from the attempt to display the serverlog on the client side. To do this, I need a way to retrieve the current logfile properties (size, and in case of rotation timestamp too) in a low-overhead way, or at least get to know something has changed. Scanning a whole directory and interpreting the data isn't low overhead any more. 
This seems clean and fast enough to me: SELECT filename FROM pg_dir_ls('/var/log') ORDER BY 1 DESC LIMIT 1 Considering that any query from a client is going to have to go through the parser and be executed, an 'ls' in a directory just isn't a measurable performance hit. If you want, run a test that does an 'ls' and one that doesn't to see that there is no measurable performance difference. I would not worry about the clock going backward. PostgreSQL would have enough problems with timestamp columns moving backward that the file log times are the least of our problems. There's no locking on the shmem, and the single dependence on shmem is the existence of it at the time of rotation. If the shmem is gone, postmaster is probably dead anyway. You can't know that you aren't reading corrupt data if you read shared memory without a lock. What if the write is happening as you read? The only clean solution I can think of is to write an operating system file that contains the current log filename and read from that. I believe such writes are atomic. But again, this seems like overkill to me. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Nested Savepoints in Db2 Stinger
On Thu, Jul 15, 2004 at 08:41:08PM +0100, Simon Riggs wrote: FYI http://www.databasejournal.com/features/db2/article.php/3361941 Hmm. Basically it says that they implement SQL2003 semantics (or the other way around: SQL2003 was written with DB2 in mind). In the code there is a provision for savepoint levels, but it doesn't do anything (yet). -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) There are two moments in a man's life when he should not speculate: when he can afford it and when he cannot (Mark Twain) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 19:30, Bruce Momjian wrote: Simon Riggs wrote: On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a checkpoint just to be sure. By agreement, archive_mode can only be turned on at postmaster startup, which means you always have a checkpoint - either because you shut it down cleanly, or you didn't and it recovers, then writes one. There is always something to start the rollforward. So, non-issue. I was discussing the claim that there might not be a checkpoint to begin the rollforward from. There always is: if you are in archive_mode=true then you will always have a checkpoint that can be used for recovery. It may be a long way in the past, if there has been no write activity, but the rollforward will be very, very quick, since there will be no log records. I don't think so. I can imagine many cases where you want to do a nightly tar backup without turning archiving on/off or restarting the postmaster. This is a misunderstanding. I strongly agree with what you say: the whole system has been designed to avoid any benefit from turning on/off archiving and there is no requirement to restart postmaster to take backups. In those cases, a manual checkpoint would have to be issued before the backup begins. A manual checkpoint doesn't HAVE TO be issued. Presumably most systems will be running checkpoint every few minutes. Wherever the last one was is where the rollforward would start from. But you can, if that's the way you want to do things; just wait long enough for the checkpoint to have completed, otherwise your objective of reducing rollforward time will not be met. 
(please note my earlier reported rollback performance of approximately x10 rate of recovery v elapsed time - will require testing on your own systems). Imagine a system that is up for a month, and they don't have enough archive space to keep a months worth of WAL files. They would probably do nightly or weekend tar backups, and then discard the WAL archives. Yes, that would be normal practice. I would recommend keeping at least the last 3 full backups and all of the WAL logs to cover that period. What procedure would they use? I assume they would copy all their old WAL files to a save directory, issue a checkpoint, do a tar backup, then they can delete the saved WAL files. Is that correct? PITR is designed to interface with a wide range of systems, through the extensible archive/recovery program interface. We shouldn't focus on just tar backups - if you do, then the whole thing seems less feature-rich. The current design allows interfacing with tape, remote backup, internet backup providers, automated standby servers and the dozen major storage/archive vendors' solutions. Writing a procedure to backup, assign filenames, keep track of stuff isn't too difficult if you're a competent DBA with a mild knowledge of shell or perl scripting. But if data is important, people will want to invest the time and trouble to adopt one of the open source or licenced vendors that provide solutions in this area. Systems management is a discipline and procedures should be in place for everything. I fully agree with the automate everything dictum, but just don't want to constrain people too much to a particular way of doing things. -o-o- Overall, for first release, I think the complexity of this design is acceptable. PITR is similar to Oracle7 Backup/Recovery, and easily recognisable to any DBA with current experience of current SQL Server, DB2 (MVS, UDB) or Teradata systems. 
[I can't comment much on Ingres, Informix, Sybase etc] My main areas of concern are: - the formal correctness of the recovery process As a result of this concern, PITR makes ZERO alterations to the recovery code itself. The trick is to feed it the right xlog files and to stop, if required, at the right place and allow normal work to resume. - the robustness and quality of my implementation This requires quality checking of the code and full beta testing -o-o- We've raised a couple of valid points on the lists in the last few days: - its probably a desirable feature (but not essential) to implement a write suspend feature on the bgwriter, if nothing else it will be a confidence building feature...as said previously, for many people, this will not be required, but people will no doubt keep asking - there is a small window of risk around the possibility that a recovery target might be set by the user that doesn't rollforward all the way past the end of the backup. That is real, but in general, people aren't likely to be performing archive recovery within minutes of a backup being taken -
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Huh? Why have a default maximum? Just for convenience. Both start and size are optional parameters, but with start=0 and size=50k. Well, it's a very special function anyway, so we could require the user to supply all parameters. I'll remove it. Ah, I think it needs to close that as soon as it starts. Don't other subprocesses do that? That shared memory is very fragile and we don't want an errant pointer poking in there. The result of an errant pointer writing to that shared mem would be 1) wrong pid for SysLogger, so it can't be signalled to rotate from backends 2) wrong timestamp, so backends don't know the latest logfile. Nothing particularly crash prone really. This seems clean and fast enough to me: SELECT filename FROM pg_dir_ls('/var/log') ORDER BY 1 DESC LIMIT 1 For a logfile listing function, this would look like SELECT MAX(startdate) FROM pg_logfile_ls() Considering that any query from a client is going to have to go through the parser and be executed, an 'ls' in a directory just isn't a measurable performance hit. If you want, run a test that does an 'ls' and one that doesn't to see that there is no measurable performance difference. So while a simple PG_RETURN_TIMESTAMP(logfiletimestamp) compared to a lengthy setof returning function is drastically faster, this appears much less drastic with parser overhead. I would not worry about the clock going backward. PostgreSQL would have enough problems with timestamp columns moving backward that the file log times are the least of our problems. I see, so the admin is in trouble anyway (what about PITR? Data column deviations appear harmless compared to restoration based on timestamps). You can't know that you aren't reading corrupt data if you read shared memory without a lock. What if the write is happening as you read? I thought about this quite a while. 
If the shmem fields aren't written atomically (one is 32bit, one 64 bit, probably on dword boundaries so writing will happen at least processor bus wide, do we support any 16 bit processor?) the corruption consequences as above apply. In the case of the timestamp, the high word will rarely change anyway, only every 2^32 seconds... Concurrent access on the logger pid would mean to call pg_logfile_rotate() while a killed logger is being restarted, which is creating a new logfile then anyway. This would send a SIGINT into outer space, maybe to the bgwriter triggering a checkpoint, or the postmaster shutting it down (gracefully, still unwanted). BTW, the consequences of a trigger flag in shmem would be less because all that could happen was a log rotation (which appends to existing files, just in case syslogger died in the milliseconds after a rotation). The only clean solution I can think of is to write an operating system file that contains the current log filename and read from that. I believe such writes are atomic. But again, this seems like overkill to me. Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... I'll convert to this, *dropping* all shmem. Regards, Andreas ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... Oops, SendPostmasterSignal uses shmem. At least, this enables syslogger.c to be free from shmem stuff, except for PGSharedMemDetach. Regards, Andreas ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Huh? Why have a default maximum? Just for convenience. Both start and size are optional parameters, but with start=0 and size=5. Well, it's a very special function anyway, so we could require the user to supply all parameters. I'll remove it. Agreed, and maybe a zero value gets the entire file. Ah, I think it needs to close that as soon as it starts. Don't other subprocesses do that? That shared memory is very fragile and we don't want an errant pointer poking in there. The result of an errant pointer writing to that shred mem would be 1) wrong pid for SysLogger, so it can't be signalled to rotate from backends 2) wrong timestamp, so backends don't know the latest logfile. Nothing particularly crash prone really. No, I am thinking the program goes crazy and writes everywhere. This seems clean and fast enough to me: SELECT filename FROM pg_dir_ls('/var/log') ORDER BY 1 DESC LIMIT 1 For a logfile listing function, this would look SELECT MAX(startdate) FROM pg_logfile_ls() Considering that any query from a client is going to have to go through the parser and be executed, an 'ls' in a directory just isn't a measurable performance hit. If you want run a test that does an 'ls' and one that doesn't to see that there is no measurable performance difference. So while a simple PG_RETURN_TIMESTAMP(logfiletimestamp) compared to a lengthy setof returning function is drastically faster, this appears much less drastic with parser overhead. I would not worry about the clock going backward. PostgreSQL would have enough problems with timestamp columns moving backward that the file log times are the least of our problems. I see, so the admin is in trouble anyway (what about PITR? Data column deviations appear harmless compared to restoration based on timestamps). 
PITR uses WAL numbering so it would be fine, but the timestamps on the commit records would have problems. You can't know that you aren't reading corrupt data if you read shared memory without a lock. What if the write is happening as you read? I thought about this quite a while. If the shmem fields aren't written atomically (one is 32bit, one 64 bit, probably on dword boundaries so writing will happen at least processor bus wide, do we support any 16 bit processor?) the corruption consequences as above apply. In the case of the timestamp, the high word will rarely change anyway, only every 2^32 seconds... Concurrent access on the logger pid would mean to call pg_logfile_rotate() while a killed logger is being restarted, which is creating a new logfile then anyway. This would send a SIGINT into outer space, maybe to the bgwriter triggering a checkpoint, or the postmaster shutting it down (gracefully, still unwanted). BTW, the consequences of a trigger flag in shmem would be less because all that could happen was a log rotation (which appends to existing files, just in case syslogger died in the milliseconds after a rotation). The only clean solution I can think of is to write an operating system file that contains the current log filename and read from that. I believe such writes are atomic. But again, this seems like overkill to me. Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... I'll convert to this, *dropping* all shmem. Yes, that is the usual method. We signal the postmaster and it then does the signalling to the logger. I thought you had looked at other backend signalling examples so I didn't explain it. Now, one really good efficiency would be to use LISTEN/NOTIFY so clients could know when new data has appeared in the log, or the log file is rotated. Now that's an efficiency! However, let's get this infrastructure completed first. 
One wacky idea would be for the clients to LISTEN on 'pg_new_logfile' and have the logger do system('psql -c NOTIFY pg_new_logfile template1') or something like that. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Andreas Pflug wrote: Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... Oops, SendPostmasterSignal uses shmem. At least, this enables syslogger.c to be free from shmem stuff, except for PGSharedMemDetach. Right. We already have to use shared mem for the backends and postmaster. It is the logger we are worried about. Tom brought up the point that if the logger used shared memory, we would have to kill/restart it if we need to reinitialize shared memory, meaning we would lose logging info at a time we really need it --- again a good reason not to use shared memory in the logger. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 16:47, Tom Lane wrote: As far as the business about copying pg_control first goes: there is another way to think about it, which is to copy pg_control to another place that will be included in your backup. For example the standard backup procedure could be
1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
3. tar cf /dev/mt $PGDATA
4. do something to record ending WAL position
If we standardized on this way, then the tar archive would automatically contain the pre-backup checkpoint position in ./pg_control.dump, and there is no need for any special assumptions about the order in which tar processes things. Sounds good. That would be familiar to Oracle DBAs doing BACKUP CONTROLFILE. We can document that and offer it as a suggested procedure. However, once you decide to do things like that, there is no reason why the copied file has to be an exact image of pg_control. I claim it would be more useful if the copied file were plain text so that you could just cat it to find out the starting WAL position; that would let you determine without any special tools what range of WAL archive files you are going to need to bring back from your archives. I wouldn't be in favour of a manual mechanism. If you want an automated mechanism, what's wrong with using the one that's already there? You can use pg_controldata to read the control file; again, what's wrong with that? We agreed some time back that an off-line xlog file inspector would be required to allow us to inspect the logs and make a decision about where to end recovery. You'd still need that. It's scary enough having to specify the end point, let alone having to specify the starting point as well. At your request, and with Bruce's idea, I designed and built the recovery system so that you don't need to know what range of xlogs to bring back. 
You just run it, it brings back the right files from archive and does recovery with them, then cleans up - and it works without running out of disk space on long recoveries. I've built it now and it works... This is pretty much the same chain of reasoning that Bruce and I went through yesterday to come up with the idea of putting a label file inside the tar backups. We concluded that it'd be worth putting both the backup starting time and the checkpoint WAL position into the label file --- the starting time isn't needed for restore but might be really helpful as documentation, if you needed to verify which dump file was which. ...if you are doing tar backups... what will you do if you're not using that mechanism? If you are: it's common practice to make up a backup filename from elements such as system name, database name, date and time, etc. That gives you the start time, and the file's last-modified date gives you the end time. I think it's perfectly fine for everybody to do backups any way they please. There are many licensed variants of PostgreSQL and it might be appropriate in those to specify particular ways of doing things. I'll be trusting the management of backup metadata and storage media to a solution designed for the purpose (open or closed source), just as I'll be trusting my data to a database solution designed for that purpose. That for me is one of the good things about PostgreSQL - we use the filesystem rather than writing our own, we provide language interfaces rather than inventing our own proprietary server language, etc. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
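Steps 2 and 3 of Tom's suggested procedure can be sketched in a few lines; this is only an illustration against a fake data directory (step 1, the CHECKPOINT, and step 4, recording the ending WAL position, need a running server and are elided; `backup_cluster` and all paths are hypothetical):

```python
import os
import shutil
import tarfile
import tempfile

def backup_cluster(pgdata, archive_path):
    """Sketch of steps 2 and 3 of the backup procedure above."""
    # Step 2: copy pg_control to a place the tar archive will include,
    # so the archive carries the pre-backup checkpoint position.
    shutil.copy(os.path.join(pgdata, "global", "pg_control"),
                os.path.join(pgdata, "pg_control.dump"))
    # Step 3: tar up the whole data directory.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(pgdata, arcname=os.path.basename(pgdata))

# Demonstration against a throwaway stand-in for $PGDATA.
tmp = tempfile.mkdtemp()
pgdata = os.path.join(tmp, "data")
os.makedirs(os.path.join(pgdata, "global"))
with open(os.path.join(pgdata, "global", "pg_control"), "wb") as f:
    f.write(b"\x00" * 16)  # stand-in for the real control file

backup_cluster(pgdata, os.path.join(tmp, "backup.tar"))
with tarfile.open(os.path.join(tmp, "backup.tar")) as tar:
    names = tar.getnames()
print(sorted(names))
```

The point of the copy-then-tar ordering is exactly what Tom notes: the archive itself then contains the checkpoint position, with no assumptions about the order in which tar visits files.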
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: Andreas Pflug wrote: Right. We already have to use shared mem for the backends and postmaster. It is the logger we are worried about. Tom brought up the point that if the logger used shared memory, we would have to kill/restart it if we need to reinitialize shared memory, I don't know why that particular segment should ever be renewed. Anyway, it's gone. As I remember, we have one big shared memory segment. Were you creating a special one just for this timestamp? If you were, I see why your approach was safer, but as you said, it doesn't buy us much anyway. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: Andreas Pflug wrote: Right. We already have to use shared mem for the backends and postmaster. It is the logger we are worried about. Tom brought up the point that if the logger used shared memory, we would have to kill/restart it if we need to reinitialize shared memory, I don't know why that particular segment should ever be renewed. Anyway, it's gone. Regards, Andreas ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: Andreas Pflug wrote: Just for convenience. Both start and size are optional parameters, but with start=0 and size=5. Well, it's a very special function anyway, so we could require the user to supply all parameters. I'll remove it. Agreed, and maybe a zero value gets the entire file. Which is a default param back again, maybe on a 100MB file? Better not. Let's leave it to the admin to do sick stuff like pg_read_file('base/5000/5002', 0, 1) ... No, I am thinking of the case where the program goes crazy and writes everywhere. What I described was just that situation. Yes, that is the usual method. We signal the postmaster and it then does the signalling to the logger. I thought you had looked at other backend signalling examples so I didn't explain it. Well, if you knew the places where backends do signal stuff to the postmaster... Still, somebody could have yelled "use the standard way" before I reinvented the wheel. Now, one really good efficiency would be to use LISTEN/NOTIFY so clients could know when new data has appeared in the log, or the log file is rotated. Now that's an efficiency! However, let's get this infrastructure completed first. One wacky idea would be for the clients to LISTEN on 'pg_new_logfile' and have the logger do system('psql -c NOTIFY pg_new_logfile template1') or something like that. No, certainly not. This would mean that every time a log entry is written, psql is fired up. Tom wouldn't accept this as KISS, I believe. And h*ll, that would cause traffic (just imagine a single log message on client startup...) What you saw on LinuxTag was pgAdmin3 polling once a second to see if the logfile length changed, which is the fastest setting possible. Regards, Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
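The byte-range semantics being settled on above (all parameters required, no "zero means the whole file" default) are simple to pin down with a sketch; `read_file_chunk` is a hypothetical stand-in for the pg_read_file function under discussion, not its actual implementation:

```python
import os
import tempfile

def read_file_chunk(path, start, size):
    """Return `size` bytes starting at byte offset `start` of the file,
    mirroring the pg_read_file(filename, start, size) semantics
    discussed above: every parameter must be supplied explicitly."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(size)

# Demonstration against a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"LOG: redo is not required\n")
os.close(fd)
print(read_file_chunk(path, 5, 4))  # the 4 bytes at offset 5
```

Requiring an explicit size avoids the trap Andreas points out: a convenient "rest of the file" default becomes a 100MB result the moment someone forgets the argument.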
Re: [HACKERS] Point in Time Recovery
OK, I think I have some solid ideas and reasons for them. First, I think we need server-side functions to call when we start/stop the backup. The advantage of these server-side functions is that they will do the required work of recording the pg_control values and creating needed files with little chance for user error. It also allows us to change the internal operations in later releases without requiring admins to change their procedures. We are even able to adjust the internal operation in minor releases without forcing a new procedure on users. Second, I think once we start a restore, we should rename recovery.conf to recovery.in_progress, and when complete rename that to recovery.done. If the postmaster starts and sees recovery.in_progress, it will fail to start, knowing its recovery was interrupted. This allows the admin to take appropriate action. (I am not sure what that action would be. Does he bring back the backup files or just keep going?) Third, I think we need to put a file in the archive location once we complete a backup, recording the start/stop xid and wal/offsets. This gives the admin documentation on what archive logs to keep and what xids are available for recovery. Ideally the recover program would read that file and check the recover xid to make sure it is after the stop xid recorded in the file. How would the recover program know the name of that file? We need to create it in /data with start contents before the backup, then complete it with end contents and archive it. What should we name it? Ideally it would be named by the WAL name/offset of the start so it orders in the proper spot in the archive file listing, e.g.:
093a
093b
093b.032b9.start
093c
Are people going to know they need 093b for 093b.032b9.start? I hope so. Another idea is to do:
093a.xlog
093b.032b9.start
093b.xlog
093c.xlog
This would order properly. It might be a very good idea to add extensions to these log files now that we are archiving them in strange places. 
In fact, maybe we should use *.pg_xlog to document the directory they came from. --- Simon Riggs wrote: On Fri, 2004-07-16 at 16:47, Tom Lane wrote: As far as the business about copying pg_control first goes: there is another way to think about it, which is to copy pg_control to another place that will be included in your backup. For example the standard backup procedure could be
1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
3. tar cf /dev/mt $PGDATA
4. do something to record ending WAL position
If we standardized on this way, then the tar archive would automatically contain the pre-backup checkpoint position in ./pg_control.dump, and there is no need for any special assumptions about the order in which tar processes things. Sounds good. That would be familiar to Oracle DBAs doing BACKUP CONTROLFILE. We can document that and offer it as a suggested procedure. However, once you decide to do things like that, there is no reason why the copied file has to be an exact image of pg_control. I claim it would be more useful if the copied file were plain text so that you could just cat it to find out the starting WAL position; that would let you determine without any special tools what range of WAL archive files you are going to need to bring back from your archives. I wouldn't be in favour of a manual mechanism. If you want an automated mechanism, what's wrong with using the one that's already there? You can use pg_controldata to read the control file; again, what's wrong with that? We agreed some time back that an off-line xlog file inspector would be required to allow us to inspect the logs and make a decision about where to end recovery. You'd still need that. It's scary enough having to specify the end point, let alone having to specify the starting point as well. 
At your request, and with Bruce's idea, I designed and built the recovery system so that you don't need to know what range of xlogs to bring back. You just run it, it brings back the right files from archive and does recovery with them, then cleans up - and it works without running out of disk space on long recoveries. I've built it now and it works... This is pretty much the same chain of reasoning that Bruce and I went through yesterday to come up with the idea of putting a label file inside the tar backups. We concluded that it'd be worth putting both the backup starting time and the checkpoint WAL position into the label file --- the starting time isn't needed for restore but might be really helpful as documentation, if you needed to verify which
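Bruce's claim above that the extension-based scheme "would order properly" is easy to check with a plain lexicographic sort, which is what a directory listing shows; a quick sketch using the example segment names from the mail:

```python
# The two naming schemes for the backup "start" label file, using the
# hypothetical segment names from the discussion above.
scheme_one = ["093c", "093a", "093b.032b9.start", "093b"]
scheme_two = ["093c.xlog", "093a.xlog", "093b.032b9.start", "093b.xlog"]

# Plain lexicographic sort, as a directory listing would show it.
print(sorted(scheme_one))
print(sorted(scheme_two))
```

Under the first scheme the label sorts *after* the segment it refers to (093b, then 093b.032b9.start); with the .xlog extensions, "093b.032b9.start" sorts before "093b.xlog" because '0' precedes 'x', so the label sits right where the backup started.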
Re: [HACKERS] Point in Time Recovery
Let me address your concerns about PITR getting into 7.5. I think a few people spoke last week expressing concern about our release process and wanting to take drastic action. However, looking at the release status report I am about to post, you will see we are on track for an August 1 beta. PITR has been neglected only because it has been moving along so well we haven't needed to get deeply involved. Simon has been able to address concerns as we raised them and make adjustments quickly with little guidance. Now, we certainly don't want to miss adding PITR to 7.5 by not giving it our full attention. Once Tom completes the cursor issues with NT in the next day or so, I think that removes the last big NT stumbling block, and we will start to focus on PITR. Unless there is some major thing we are missing, we fully expect to get PITR in 7.5. We don't have a crystal ball to know for sure, but our intent is clear. I know Simon is going away July 26 so we want to get him feedback as soon as possible. If we wait until after July 26, we will have to make all the adjustments without Simon's guidance, which will be difficult. As far as the importance of PITR, it is a _key_ enterprise feature, even more key than NT. PITR is going to be one of the crowning jewels of the 7.5 release, and I don't want to go into beta without it unless we can't help it. So, I know the deadline is looming and everyone is getting nervous, but keep the faith. I can see the light at the end of the tunnel. I know this is a tighter schedule than we would like, but I know we can do it, and I expect we will do it. --- Simon Riggs wrote: On Thu, 2004-07-15 at 15:57, Bruce Momjian wrote: We will get there --- it just seems dark at this time. Thanks for that. My comments were heartfelt, but not useful right now. I'm badly overdrawn already on my time budget, though that is my concern alone. There is more to do than I have time for. 
Pragmatically, if we aren't going to get there then I need to stop now, so I can progress other outstanding issues. All help is appreciated. I'm aiming for the minimum feature set - which means we do need to take care over whether that set is insufficient and also to pull any part that doesn't stand up to close scrutiny over the next few days. Overall, my primary goal is increased robustness and availability for PostgreSQL...and then to have a rest! Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Release status report
We are two weeks away from the beta freeze on August 1 so I want to give a status report on where we are. Right now we have two large outstanding items, and a few smaller ones. Nested transactions (NT) is in CVS, but it is missing:
o savepoints
o cursor rollback
o performance fixes
Alvaro has completed the first item and a few more and will probably submit a mega-patch shortly that can be applied. Tom is working on cursors and should finish in a day or two. Performance fixes are needed and will have to be addressed, though we might be adjusting these during beta as we find them. PITR has been ready for a while, though Simon continues to improve it and users are continuing to suggest fixes/ideas. I am hopeful we can get the majority of PITR issues/comments addressed next week while Simon is available. The other patches are in progress and will be applied once the community has finished requesting changes to each. So, basically, I think we are on track. It would be nice to get the majority of NT/PITR work into CVS by next weekend so we can have a week of polishing before beta. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html