[HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
Howdy. I'm starting some work on our favorite LISTEN/NOTIFY subsystem in the hopes of more advanced functionality. Right now I'm using a bastardized (RAISE NOTICE + tailing error logs) + NOTIFY to get what should just be built in to the LISTEN/NOTIFY interface. Here's the syntax for the proposed functionality. I've updated the grammar to support the following: LISTEN name [BLOCK] [WITH] [TIMEOUT [=] ''::INTERVAL]; NOTIFY name a_expr; But am having something of a think-o when it comes to the semantics of the BLOCK keyword for the LISTEN command. Let me first explain my goals for this exercise: * Allow passing of a data type in a NOTIFY. Something like: NOTIFY 'relname' a_expr; Should be able to pass a text, INET, etc... shouldn't matter... not sure if this is possible though given that OIDs don't travel with data types... I may have to settle for a TEXT arg, which is acceptable. * Allow LISTEN to block until a value changes. LISTEN [BLOCK|WAIT] 'relname' * Allow LISTEN to have a timeout LISTEN name [BLOCK] [WITH] [TIMEOUT [=] ''::INTERVAL]; * Allow blocking LISTEN queries to update the status of the proc title while blocking. Basically I want to introduce formal support for turning PostgreSQL into a message bus. To start with, it won't be a scalable message bus that can scale to thousands of concurrent connections (that's something I'd like to do long term(tm), but I digress). The problem is with the BLOCK'ing semantics or implementation. There are two ways that I can do this, IMHO, but I'm out to solicit alternatives as I'm still getting a handle on what the best interfaces/APIs are internally to get something done. Option 1) Use sleep(3) for the given timeout and wake up on some interruptible a signal (USR2?). This is the simplest solution, but likely the least portable to win32. Given the new world order of 8.0 and it's portability headaches, it's something I'm aware of. Option 2) block on an exclusive lock. Check to see if relname has been registered. If it has, block on the existing exclusive lock (can I block on a lock for a given number of sec/ms?). If it hasn't, create an exclusive lock, but give the lock away (to the parent postmaster, a lockmgr proc, etc) so that a notify can remove the remove and unlock the exclusive lock so that all of the blocking children wake up. The async interface is nice, but not really useful to me as it requires polling, instead of unblocking when an event comes through, which would create a vastly more real time interface that should be easier on the database. Does anyone think there would be any conflicts with the use of the existing alarm code from storage/lmgr/proc.c for the LISTEN/NOTIFY interface? It looks like SIGALRM has a reserved purpose. I haven't found any global alarm handling interface that can be used to assign different meanings when an SIGALRM is received. Any other thoughts on the best way to proceed? -sc -- Sean Chittenden ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
Basically I want to introduce formal support for turning PostgreSQL into a message bus. To start with, it won't be a scalable message bus that can scale to thousands of concurrent connections (that's something I'd like to do long term(tm), but I digress). The problem is with the BLOCK'ing semantics or implementation. There are two ways that I can do this, IMHO, but I'm out to solicit alternatives as I'm still getting a handle on what the best interfaces/APIs are internally to get something done. Sounds great :-) Option 1) Use sleep(3) for the given timeout and wake up on some interruptible a signal (USR2?). This is the simplest solution, but likely the least portable to win32. Given the new world order of 8.0 and it's portability headaches, it's something I'm aware of. There should be no problem doing this on win32. The portability layer (in backend/port/win32) implements a sleep (pg_usleep, that is) that can be interrupted by any signal (our win32 signal implementation actually supports 256 signals, but you'd have to stick to a signal that's supported on all platforms and also not already in use. And that's going to be a lot harder unless you can piggyback on something that's already there). The async interface is nice, but not really useful to me as it requires polling, instead of unblocking when an event comes through, which would create a vastly more real time interface that should be easier on the database. As long as you're working only from the client side, you don't need to do polling. See the bottom of the page at http://developer.postgresql.org/docs/postgres/libpq-notify.html. You just need to pass the socket down to select(). I'm doing this from C and perl, don't know about other languages. That said, there are certainly a lot of cases where a blocking call at the SQL level would also be useful. Just don't kill what we have now :-) //Magnus ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Help Needed
On Mon, 27 Dec 2004 07:55:27 +0530 (IST), Ameya S. Sakhalkar [EMAIL PROTECTED] wrote: For my project (main memory DBMS), I have written a main memory filesystem. Idea is, the primary copy of data will reside in main memory. (Workable only for small size data, at most 2GB). Now, I want to plug this filesystem with Postgres, so that, instead of Unix filesystem, this main memory filesystem(MMFS) will now store the data. I believe that all you need to do is to mount your filesystem somewhere, and then tell postgres to use that directory for storing the database files. 1. Mount your filesystem to a directory, e.g /mnt/mem_drive (possibly : mount -t mmfs /dev/mmfs /mnt/mem_drive) 2. export PG_DATA=/mnt/mem_drive 3. initdb 4. pg_ctl start I hope this helps. Adrian Maier I want to know, how to plug it with Postgres. i.e. whether replacing all calls to Unix filesystem by calls to MMFS is enough, or do I need to do something else/more. I am really a newbie to Postgres. So, can someone pls guide me, abt how should I go. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
Sean Chittenden [EMAIL PROTECTED] writes: * Allow LISTEN to block until a value changes. LISTEN [BLOCK|WAIT] 'relname' * Allow LISTEN to have a timeout LISTEN name [BLOCK] [WITH] [TIMEOUT [=] ''::INTERVAL]; * Allow blocking LISTEN queries to update the status of the proc title while blocking. I don't believe in any of these things, at least not on the server side. You can get the same effect on the client side without cluttering the LISTEN semantics and implementation. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
Sean Chittenden [EMAIL PROTECTED] writes: The async interface is nice, but not really useful to me as it requires polling, instead of unblocking when an event comes through, which would create a vastly more real time interface that should be easier on the database. BTW, this is nonsense; the backend sends a message exactly when the NOTIFY occurs. It may well be that libpq needs some work to make it easier to wait for a NOTIFY without polling, but again there isn't a reason to clutter the server-side code with it. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] production server down
Tom Lane wrote: Are you using one of the scripts that does an auto initdb if it doesn't see a valid PGDATA? 11 seconds might be about right for that. One problem with this theory is how come you didn't get screwed during *that* boot cycle. It seems to require assuming that the NFS mount came online just after the initdb finished (else initdb would have overwritten the on-NFS pg_control) but before the regular postmaster started (else this same scenario would have played out then). That's not a very wide window. [followup] We've now had a chance to bring Postgres down and check under the mount point. There *is* indeed a newly initdb'd cluster under there. FWIW the control file is corrupt: # pg_controldata /home/jconway/pgsql/fds/replica/pgdata WARNING: Calculated CRC checksum does not match value stored in file. Either the file is corrupt, or it has a different layout than this program is expecting. The results below are untrustworthy. pg_control version number:72 Catalog version number: 200310211 Database cluster state: in production pg_control last modified: Sat Feb 6 22:28:16 2106 Current log file ID: 0 Next log file segment:10161036 Latest checkpoint location: 0/9AA1B4 Prior checkpoint location:0/9B0B8C Latest checkpoint's REDO location:0/0 Latest checkpoint's UNDO location:C/218 Latest checkpoint's StartUpID:17142 Latest checkpoint's NextXID: 1099443932 Latest checkpoint's NextOID: 8192 Time of latest checkpoint:Wed Apr 8 07:05:36 6325 Database block size: 1 Blocks per segment of large relation: 128 Maximum length of identifiers:67 Maximum number of function arguments: 0 Date/time type storage: floating-point numbers Maximum length of locale name:0 LC_COLLATE: LC_CTYPE: I have a tarred copy of the under-the-mount PGDATA if anyone is interested in examining it. BTW, there was another Postgres cluster on this same server which we had not used since the November 2 reboot -- it was corrupt in pretty much the same way and also had an initdb'd cluster under its mount. So it looks like using an auto initdb startup script is a very bad idea when using an NFS mounted PGDATA. We left the under-mount structure in place and did chown root:root and chmod 000 on it. And, as mentioned in an earlier post, we now rely on the dba to start postgres manually after a server restart. Joe ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
Sean Chittenden wrote: Option 1) Use sleep(3) for the given timeout and wake up on some interruptible a signal (USR2?). This is the simplest solution, but likely the least portable to win32. Given the new world order of 8.0 and it's portability headaches, it's something I'm aware of. Option 2) block on an exclusive lock. Check to see if relname has been registered. If it has, block on the existing exclusive lock (can I block on a lock for a given number of sec/ms?). If it hasn't, create an exclusive lock, but give the lock away (to the parent postmaster, a lockmgr proc, etc) so that a notify can remove the remove and unlock the exclusive lock so that all of the blocking children wake up. The async interface is nice, but not really useful to me as it requires polling, instead of unblocking when an event comes through, which would create a vastly more real time interface that should be easier on the database. Does anyone think there would be any conflicts with the use of the existing alarm code from storage/lmgr/proc.c for the LISTEN/NOTIFY interface? It looks like SIGALRM has a reserved purpose. I haven't found any global alarm handling interface that can be used to assign different meanings when an SIGALRM is received. Any other thoughts on the best way to proceed? You can make cooperative blocking locks using some slick client side code and the swiss army knife userlock module. Since user locks pierce transaction encapsulation they can be used for these types of things. select user_write_lock(some number); if return = 1 notify some message else wait a and try again, etc. end if // release lock, etc. NOTIFY 'relname' a_expr; This would be great to have...at least I think this is what you are driving at: (adding a noiseword for readability) LISTEN system_messages; NOTIFY system_messages MESSAGE logoff; NOTIFY request_unlock MESSAGE 12345; -- for use with user locks! Etc. Another cute tidbit about the listen/notify interface is that it can be abused to create server side storage that is cleaned up by the server when a backend drops. I was using this to crate a crude pessimistic locking system before I discovered the userlock module. Note that this is not a scalable or robust approach however. listen 12345; --acquired a lock on '12345' select listenerpid from pg_listener where relname = '12345' Merlin ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
NOTIFY 'relname' a_expr; This would be great to have...at least I think this is what you are driving at: (adding a noiseword for readability) LISTEN system_messages; NOTIFY system_messages MESSAGE logoff; NOTIFY request_unlock MESSAGE 12345; -- for use with user locks! Hrm... the userlock module may work for what I'm after, but the module is GPL'ed, which is something I don't want to rely on. I wanted to have a backend process essentially act as a long running client that would act on notify messages to dispatch work. I was hoping to accomplish something like, LISTEN REGISTER event_name; LISTEN * BLOCK WITH TIMEOUT = 30; which would only listen to events that you've registered to listen to, but would only deliver messages when you explicitly were going to block waiting for message delivery (ie, LISTEN BLOCK). I'll settle for getting just message passing via NOTIFY/LISTEN to work and will revisit extending things further once I have this bit done. -sc -- Sean Chittenden ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?
The async interface is nice, but not really useful to me as it requires polling, instead of unblocking when an event comes through, which would create a vastly more real time interface that should be easier on the database. BTW, this is nonsense; the backend sends a message exactly when the NOTIFY occurs. It may well be that libpq needs some work to make it easier to wait for a NOTIFY without polling, but again there isn't a reason to clutter the server-side code with it. It's asynchronous with regards to client delivery of the message. Sending of the NOTIFY message is synchronous, but delivery to the client is not. I don't see how it could be any other way in PostgreSQL. libpq(3) actually has a reasonable interface that relies on the developer to block on the fd as described here: http://developer.postgresql.org/docs/postgres/libpq-notify.html The problem is that this doesn't work in pl/*, which is the problem I was trying to address. *shrug* -sc -- Sean Chittenden ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Schema permissions inheiritance
Greetings, As was discussed on IRC a bit- In the TODO list already there's a blip about adding support for group ownership. In addition to this I think it'd be nice if you could create a schema (owned by a group) in which any tables/etc created were also owned by a group automatically. Following that, additional permissions might be nice, kind of implicit grants for tables created in a certain schema. I'd think we'd also want to have a NOTICE when that happens so users are aware of the permissions on the table. Thoughts? Can we get it added to the list? Stephen signature.asc Description: Digital signature
Re: [HACKERS] Updateable views
Greg Stark [EMAIL PROTECTED] writes: I think you want to extend the SQL syntax to allow updating views, and implement plan nodes and executor functionality to handle them. So things like this works: UPDATE (SELECT id,val FROM t) SET val=0 where id 100 Then the rules you create on the views are just like the rules for SELECT, they simply mechanically replace the view with the view definition. I think this is the right approach because: a) I think creating the general rules to transform an update into an update on the underlying table will be extremely complex, and you'll only ever be able to handle the simplest cases. By handling the view at planning time you'll be able to handle arbitrarily complex cases limited only by whether you can come up with reasonable semantics. Please provide an existence proof. I don't really see any basis for the claim that this will be simpler to implement --- the semantic problems will be the same either way. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] displaying contents
Hi, there is way to display all the values (fields) in a tree node like this? for debug purpouses. Query *query; regards, Jaime Casanova _ Do You Yahoo!? Información de Estados Unidos y América Latina, en Yahoo! Noticias. Visítanos en http://noticias.espanol.yahoo.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] displaying contents
Jaime Casanova [EMAIL PROTECTED] writes: there is way to display all the values (fields) in a tree node like this? for debug purpouses. Query*query; Look at elog_node_display(). regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Where do I get the spec for PostgreSQL
Where can I obtain a spec for postgresql, so I can start looking at the code? Benjamin [EMAIL PROTECTED] ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] displaying contents
--- Tom Lane [EMAIL PROTECTED] escribió: Jaime Casanova [EMAIL PROTECTED] writes: there is way to display all the values (fields) in a tree node like this? for debug purpouses. Query *query; Look at elog_node_display(). regards, tom lane Ok. Thanks. regards, Jaime Casanova _ Do You Yahoo!? Información de Estados Unidos y América Latina, en Yahoo! Noticias. Visítanos en http://noticias.espanol.yahoo.com ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Bgwriter behavior
Simon Riggs wrote: On Wed, 2004-12-22 at 04:43, Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: So what are we doing for 8.0? Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found. So maybe we should just bite the bullet and do something real about this. I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days? At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage. Agreed, if somewhat reluctantly. We may have the time to test, but it is clear that we do not have the time to validate those tests, then discuss and agree on the results. Time to go with what we have. I ran some tests last week and can report results similar on Tom's test: pgbench -i -s 10 bench pgbench -c 10 -t 1 bench The tests were on a machine with a single SCSI drive that doesn't lie about fsync. I found 7.4.X got around 75tps while 8.0 got 100tps, very similar to the 65/107 numbers Tom had. First, I am confused why we have such a large improvement in 8.0. Does anyone know? This is a pretty long test so a 33-50% increase is a big jump. Second, I added a little code in my local code to check if the pendingOpsTable overflows and register_dirty_segment() must have a local backend do an fsync(). I found one bgbench test had 54 local fsyncs, but the next test had none, and 54 isn't a very larger number. Should we emit a server log message when this happens so they can reduce bewriter delay? It seems having the backend do the writes is not so bad (same as 7.4.X) and our only big problem with current bgwriter is the inability to reduce checkpoint load for busy servers. Should we consider at least adjusting the meaning of bgwriter_percent? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] Ready for RC3?
What are the open issues before we can release an RC3? I don't know of any except the btree problem OSDL found. Is that fixed? Are their others? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Bgwriter behavior
I ran some tests last week and can report results similar on Tom's test: pgbench -i -s 10 bench pgbench -c 10 -t 1 bench The tests were on a machine with a single SCSI drive that doesn't lie about fsync. I found 7.4.X got around 75tps while 8.0 got 100tps, very similar to the 65/107 numbers Tom had. You do realize, that pgbench result comparisons are about as useful as a fork for eating soup? On another note, how do you know for sure, that your drive does not lie about fsync? Did you run the tests with fsync turned off vs fsync on? First, I am confused why we have such a large improvement in 8.0. Does anyone know? This is a pretty long test so a 33-50% increase is a big jump. bgwriter is responsible I imagine,... I experienced the same improvement in an early 7.5, just after the bgwriter was added. (tho my results was about 4-5 times higher in terms of tps rates, hehe) ... John ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] RC2 and open issues
Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Suppose that you run a checkpoint every 5 minutes, and with the knob you slow down the checkpoint to extend over say 3 minutes on average, rather than the normal blast-it-out-as-fast-as-possible. Then you'll be keeping an average of 8 minutes worth of WAL files instead of 5. Not exactly a killer objection. Right. I was thinking that the goal would be to spread the checkpoint out over exactly the checkpoint interval, minus some safety factor. So if it has some estimate of the total number of dirty buffers that need flushing it could just divide the checkpoint interval by that and calculate the delay needed to finish in some fraction of the checkpoint interval, 60% seems like a reasonable guess. One issue is that while we can regulate the rate at which we issue write()s, we still have to issue fsync()s at the end, and we can't control what happens in response to those. It's quite possible that all the I/O would happen in response to the fsync()s anyway, in which case the whole exercise would be a waste of time. Well you could fsync earlier as well, say just before whenever you sleep. Obviously the delay on the checkpoint process doesn't matter to performance if it's about to sleep. It could end up scheduling i/o earlier than necessary and cause redundant seeks but then I guess that's an inherent tension between trying to spread out the i/o evenly and trying to get the ideal ordering of i/o. It certainly is an interesting idea to have the checkpoint span a longer time period. We couldn't do that with sync, but now that we fsync each file it is possible. It would be easy do this if we didn't also need the fsync. The original idea was that we would write() the dirty buffers long before the checkpoint, and the kernel would write many of these dirty buffers before we got to checkpoint time. We could go with the checkpoint clock sweep idea but then we aren't writing them but actually doing write/fsync a lot more. I can't think of a way this would be a win. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings