[HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Sean Chittenden
Howdy.  I'm starting some work on our favorite LISTEN/NOTIFY subsystem 
in the hopes of more advanced functionality.  Right now I'm using a 
bastardized (RAISE NOTICE + tailing error logs) + NOTIFY to get what 
should just be built in to the LISTEN/NOTIFY interface.  Here's the 
syntax for the proposed functionality.  I've updated the grammar to 
support the following:

  LISTEN name [BLOCK] [WITH] [TIMEOUT [=] ''::INTERVAL];
  NOTIFY name a_expr;
But am having something of a think-o when it comes to the semantics of 
the BLOCK keyword for the LISTEN command.  Let me first explain my 
goals for this exercise:

* Allow passing of a data type in a NOTIFY.  Something like:
  NOTIFY 'relname' a_expr;
  Should be able to pass a text, INET, etc... shouldn't matter... not 
sure if this is possible though given that OIDs don't travel with data 
types... I may have to settle for a TEXT arg, which is acceptable.

* Allow LISTEN to block until a value changes.  LISTEN [BLOCK|WAIT] 
'relname'

* Allow LISTEN to have a timeout
  LISTEN name [BLOCK] [WITH] [TIMEOUT [=] ''::INTERVAL];
* Allow blocking LISTEN queries to update the status of the proc title
  while blocking.
Basically I want to introduce formal support for turning PostgreSQL 
into a message bus.  To start with, it won't be a scalable message bus 
that can scale to thousands of concurrent connections (that's something 
I'd like to do long term(tm), but I digress).  The problem is with the 
BLOCK'ing semantics or implementation.  There are two ways that I can 
do this, IMHO, but I'm out to solicit alternatives as I'm still getting 
a handle on what the best interfaces/APIs are internally to get 
something done.

Option 1) Use sleep(3) for the given timeout and wake up on some 
interruptible a signal (USR2?).  This is the simplest solution, but 
likely the least portable to win32.  Given the new world order of 8.0 
and it's portability headaches, it's something I'm aware of.

Option 2) block on an exclusive lock.  Check to see if relname has been 
registered.  If it has, block on the existing exclusive lock (can I 
block on a lock for a given number of sec/ms?).  If it hasn't, create 
an exclusive lock, but give the lock away (to the parent postmaster, a 
lockmgr proc, etc) so that a notify can remove the remove and unlock 
the exclusive lock so that all of the blocking children wake up.

The async interface is nice, but not really useful to me as it requires 
polling, instead of unblocking when an event comes through, which would 
create a vastly more real time interface that should be easier on the 
database.  Does anyone think there would be any conflicts with the use 
of the existing alarm code from storage/lmgr/proc.c for the 
LISTEN/NOTIFY interface?  It looks like SIGALRM has a reserved purpose. 
 I haven't found any global alarm handling interface that can be used 
to assign different meanings when an SIGALRM is received.  Any other 
thoughts on the best way to proceed?

-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Magnus Hagander

 Basically I want to introduce formal support for turning 
 PostgreSQL into a message bus.  To start with, it won't be a 
 scalable message bus that can scale to thousands of 
 concurrent connections (that's something I'd like to do long 
 term(tm), but I digress).  The problem is with the BLOCK'ing 
 semantics or implementation.  There are two ways that I can 
 do this, IMHO, but I'm out to solicit alternatives as I'm 
 still getting a handle on what the best interfaces/APIs are 
 internally to get something done.

Sounds great :-)


 Option 1) Use sleep(3) for the given timeout and wake up on 
 some interruptible a signal (USR2?).  This is the simplest 
 solution, but likely the least portable to win32.  Given the 
 new world order of 8.0 and it's portability headaches, it's 
 something I'm aware of.

There should be no problem doing this on win32. The portability layer
(in backend/port/win32) implements a sleep (pg_usleep, that is) that can
be interrupted by any signal (our win32 signal implementation actually
supports 256 signals, but you'd have to stick to a signal that's
supported on all platforms and also not already in use. And that's going
to be a lot harder unless you can piggyback on something that's already
there).


 The async interface is nice, but not really useful to me as 
 it requires polling, instead of unblocking when an event 
 comes through, which would create a vastly more real time 
 interface that should be easier on the database. 

As long as you're working only from the client side, you don't need to
do polling. See the bottom of the page at
http://developer.postgresql.org/docs/postgres/libpq-notify.html.  You
just need to pass the socket down to select(). I'm doing this from C and
perl, don't know about other languages.

That said, there are certainly a lot of cases where a blocking call at
the SQL level would also be useful. Just don't kill what we have now :-)

//Magnus

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Help Needed

2004-12-27 Thread Adrian Maier
On Mon, 27 Dec 2004 07:55:27 +0530 (IST), Ameya S. Sakhalkar
[EMAIL PROTECTED] wrote:
 
 For my project (main memory DBMS), I have written a main memory filesystem.
 Idea is, the primary copy of data will reside in main memory. (Workable
 only for small size data, at most 2GB).
 
 Now, I want to plug this filesystem with Postgres, so that, instead of
 Unix filesystem, this main memory filesystem(MMFS) will now store the
 data.

I believe that all you need to do is to mount your filesystem somewhere, 
and then tell postgres to use that directory for storing the database files.  

1. Mount your filesystem to a directory,  e.g  /mnt/mem_drive
   (possibly :   mount  -t mmfs  /dev/mmfs   /mnt/mem_drive)
2. export  PG_DATA=/mnt/mem_drive
3. initdb  
4. pg_ctl start

I hope this helps. 

Adrian Maier
 
 I want to know, how to plug it with Postgres. i.e. whether replacing all
 calls to Unix filesystem by calls to MMFS is enough, or do I need to do
 something else/more. I am really a newbie to Postgres. So, can someone pls
 guide me, abt how should I go.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes:
 * Allow LISTEN to block until a value changes.  LISTEN [BLOCK|WAIT] 
 'relname'

 * Allow LISTEN to have a timeout

LISTEN name [BLOCK] [WITH] [TIMEOUT [=] ''::INTERVAL];

 * Allow blocking LISTEN queries to update the status of the proc title
while blocking.

I don't believe in any of these things, at least not on the server side.
You can get the same effect on the client side without cluttering the
LISTEN semantics and implementation.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes:
 The async interface is nice, but not really useful to me as it requires 
 polling, instead of unblocking when an event comes through, which would 
 create a vastly more real time interface that should be easier on the 
 database.

BTW, this is nonsense; the backend sends a message exactly when the
NOTIFY occurs.  It may well be that libpq needs some work to make it
easier to wait for a NOTIFY without polling, but again there isn't a
reason to clutter the server-side code with it.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] production server down

2004-12-27 Thread Joe Conway
Tom Lane wrote:
Are you using one of the scripts that
does an auto initdb if it doesn't see a valid PGDATA?  11 seconds might
be about right for that.
One problem with this theory is how come you didn't get screwed during
*that* boot cycle.  It seems to require assuming that the NFS mount came
online just after the initdb finished (else initdb would have
overwritten the on-NFS pg_control) but before the regular postmaster
started (else this same scenario would have played out then).  That's
not a very wide window.
[followup]
We've now had a chance to bring Postgres down and check under the mount 
point. There *is* indeed a newly initdb'd cluster under there. FWIW the 
control file is corrupt:

# pg_controldata /home/jconway/pgsql/fds/replica/pgdata
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.
pg_control version number:72
Catalog version number:   200310211
Database cluster state:   in production
pg_control last modified: Sat Feb  6 22:28:16 2106
Current log file ID:  0
Next log file segment:10161036
Latest checkpoint location:   0/9AA1B4
Prior checkpoint location:0/9B0B8C
Latest checkpoint's REDO location:0/0
Latest checkpoint's UNDO location:C/218
Latest checkpoint's StartUpID:17142
Latest checkpoint's NextXID:  1099443932
Latest checkpoint's NextOID:  8192
Time of latest checkpoint:Wed Apr  8 07:05:36 6325
Database block size:  1
Blocks per segment of large relation: 128
Maximum length of identifiers:67
Maximum number of function arguments: 0
Date/time type storage:   floating-point numbers
Maximum length of locale name:0
LC_COLLATE:
LC_CTYPE:
I have a tarred copy of the under-the-mount PGDATA if anyone is 
interested in examining it.

BTW, there was another Postgres cluster on this same server which we had 
not used since the November 2 reboot -- it was corrupt in pretty much 
the same way and also had an initdb'd cluster under its mount.

So it looks like using an auto initdb startup script is a very bad idea 
when using an NFS mounted PGDATA. We left the under-mount structure in 
place and did chown root:root and chmod 000 on it. And, as mentioned 
in an earlier post, we now rely on the dba to start postgres manually 
after a server restart.

Joe

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Merlin Moncure
Sean Chittenden wrote:
 Option 1) Use sleep(3) for the given timeout and wake up on some
 interruptible a signal (USR2?).  This is the simplest solution, but
 likely the least portable to win32.  Given the new world order of 8.0
 and it's portability headaches, it's something I'm aware of.
 
 Option 2) block on an exclusive lock.  Check to see if relname has
been
 registered.  If it has, block on the existing exclusive lock (can I
 block on a lock for a given number of sec/ms?).  If it hasn't, create
 an exclusive lock, but give the lock away (to the parent postmaster, a
 lockmgr proc, etc) so that a notify can remove the remove and unlock
 the exclusive lock so that all of the blocking children wake up.
 
 The async interface is nice, but not really useful to me as it
requires
 polling, instead of unblocking when an event comes through, which
would
 create a vastly more real time interface that should be easier on the
 database.  Does anyone think there would be any conflicts with the use
 of the existing alarm code from storage/lmgr/proc.c for the
 LISTEN/NOTIFY interface?  It looks like SIGALRM has a reserved
purpose.
   I haven't found any global alarm handling interface that can be used
 to assign different meanings when an SIGALRM is received.  Any other
 thoughts on the best way to proceed?

You can make cooperative blocking locks using some slick client side
code and the swiss army knife userlock module.  Since user locks pierce
transaction encapsulation they can be used for these types of things.

select user_write_lock(some number);
if return = 1
notify some message
else
wait a and try again, etc.
end if

// release lock, etc.


 NOTIFY 'relname' a_expr;
This would be great to have...at least I think this is what you are
driving at: (adding a noiseword for readability)

LISTEN system_messages;
NOTIFY system_messages MESSAGE logoff;
NOTIFY request_unlock  MESSAGE 12345; -- for use with user locks!

Etc.

Another cute tidbit about the listen/notify interface is that it can be
abused to create server side storage that is cleaned up by the server
when a backend drops.  I was using this to crate a crude pessimistic
locking system before I discovered the userlock module.  Note that this
is not a scalable or robust approach however.

listen 12345;  --acquired a lock on '12345'
select listenerpid from pg_listener where relname = '12345'

Merlin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Sean Chittenden
NOTIFY 'relname' a_expr;
This would be great to have...at least I think this is what you are
driving at: (adding a noiseword for readability)
LISTEN system_messages;
NOTIFY system_messages MESSAGE logoff;
NOTIFY request_unlock  MESSAGE 12345; -- for use with user locks!
Hrm... the userlock module may work for what I'm after, but the module 
is GPL'ed, which is something I don't want to rely on.  I wanted to 
have a backend process essentially act as a long running client that 
would act on notify messages to dispatch work.  I was hoping to 
accomplish something like, LISTEN REGISTER event_name; LISTEN * BLOCK 
WITH TIMEOUT = 30; which would only listen to events that you've 
registered to listen to, but would only deliver messages when you 
explicitly were going to block waiting for message delivery (ie, LISTEN 
BLOCK).  I'll settle for getting just message passing via NOTIFY/LISTEN 
to work and will revisit extending things further once I have this bit 
done.  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] LISTEN/NOTIFY enhancement: Portable signal handling?

2004-12-27 Thread Sean Chittenden
The async interface is nice, but not really useful to me as it 
requires
polling, instead of unblocking when an event comes through, which 
would
create a vastly more real time interface that should be easier on the
database.
BTW, this is nonsense; the backend sends a message exactly when the
NOTIFY occurs.  It may well be that libpq needs some work to make it
easier to wait for a NOTIFY without polling, but again there isn't a
reason to clutter the server-side code with it.
It's asynchronous with regards to client delivery of the message.  
Sending of the NOTIFY message is synchronous, but delivery to the 
client is not.  I don't see how it could be any other way in 
PostgreSQL.  libpq(3) actually has a reasonable interface that relies 
on the developer to block on the fd as described here:

http://developer.postgresql.org/docs/postgres/libpq-notify.html
The problem is that this doesn't work in pl/*, which is the problem I 
was trying to address.  *shrug*  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] Schema permissions inheiritance

2004-12-27 Thread Stephen Frost
Greetings,

  As was discussed on IRC a bit-  In the TODO list already there's a
  blip about adding support for group ownership.  In addition to this I
  think it'd be nice if you could create a schema (owned by a group) in
  which any tables/etc created were also owned by a group automatically.
  Following that, additional permissions might be nice, kind of implicit
  grants for tables created in a certain schema.  I'd think we'd also
  want to have a NOTICE when that happens so users are aware of the
  permissions on the table.

  Thoughts?  Can we get it added to the list?

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Updateable views

2004-12-27 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 I think you want to extend the SQL syntax to allow updating views, and
 implement plan nodes and executor functionality to handle them. So things
 like this works:

 UPDATE (SELECT id,val FROM t) SET val=0 where id  100

 Then the rules you create on the views are just like the rules for SELECT,
 they simply mechanically replace the view with the view definition.

 I think this is the right approach because:

 a) I think creating the general rules to transform an update into an update on
the underlying table will be extremely complex, and you'll only ever be
able to handle the simplest cases. By handling the view at planning time
you'll be able to handle arbitrarily complex cases limited only by whether
you can come up with reasonable semantics.

Please provide an existence proof.  I don't really see any basis for the
claim that this will be simpler to implement --- the semantic problems
will be the same either way.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] displaying contents

2004-12-27 Thread Jaime Casanova
Hi,

there is way to display all the values (fields) in a 
tree node like this?  for debug purpouses.

Query  *query;

regards,
Jaime Casanova


_
Do You Yahoo!?
Información de Estados Unidos y América Latina, en Yahoo! Noticias.
Visítanos en http://noticias.espanol.yahoo.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] displaying contents

2004-12-27 Thread Tom Lane
Jaime Casanova [EMAIL PROTECTED] writes:
 there is way to display all the values (fields) in a 
 tree node like this?  for debug purpouses.

 Query*query;

Look at elog_node_display().

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Where do I get the spec for PostgreSQL

2004-12-27 Thread Benjamin Arai
Where can I obtain a spec for postgresql, so I can start looking at the 
code?

Benjamin
[EMAIL PROTECTED]
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] displaying contents

2004-12-27 Thread Jaime Casanova
 --- Tom Lane [EMAIL PROTECTED] escribió: 
 Jaime Casanova [EMAIL PROTECTED] writes:
  there is way to display all the values (fields) in
 a 
  tree node like this?  for debug purpouses.
 
  Query  *query;
 
 Look at elog_node_display().
 
   regards, tom lane
  

Ok. Thanks.

regards, 
Jaime Casanova

_
Do You Yahoo!?
Información de Estados Unidos y América Latina, en Yahoo! Noticias.
Visítanos en http://noticias.espanol.yahoo.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Bgwriter behavior

2004-12-27 Thread Bruce Momjian
Simon Riggs wrote:
 On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
  Bruce Momjian pgman@candle.pha.pa.us writes:
   So what are we doing for 8.0?
  
  Well, it looks like RC2 has already crashed and burned --- I can't
  imagine that Marc will let us release without an RC3 given what was
  committed today, never mind the btree bug that Mark Wong seems to have
  found.  So maybe we should just bite the bullet and do something real
  about this.
  
  I'm willing to code up a proposed patch for the two-track idea I
  suggested, and if anyone else has a favorite maybe they could write
  something too.  But do we have the resources to test such patches and
  make a decision in the next few days?
  
  At the moment my inclination is to sit on what we have.  I've not seen
  any indication that 8.0 is really worse than earlier releases; the most
  you could argue against it is that it's not as much better as we hoped.
  That's not grounds to muck around at the RC3 stage.
 
 Agreed, if somewhat reluctantly.
 
 We may have the time to test, but it is clear that we do not have the
 time to validate those tests, then discuss and agree on the results.
 
 Time to go with what we have.

I ran some tests last week and can report results similar on Tom's test:

pgbench -i -s 10 bench
pgbench -c 10 -t 1 bench

The tests were on a machine with a single SCSI drive that doesn't lie
about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
similar to the 65/107 numbers Tom had.

First, I am confused why we have such a large improvement in 8.0.  Does
anyone know?  This is a pretty long test so a 33-50% increase is a big
jump.

Second, I added a little code in my local code to check if the
pendingOpsTable overflows and register_dirty_segment() must have a local
backend do an fsync().  I found one bgbench test had 54 local fsyncs,
but the next test had none, and 54 isn't a very larger number.

Should we emit a server log message when this happens so they can
reduce bewriter delay?

It seems having the backend do the writes is not so bad (same as 7.4.X)
and our only big problem with current bgwriter is the inability to
reduce checkpoint load for busy servers.

Should we consider at least adjusting the meaning of bgwriter_percent?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Ready for RC3?

2004-12-27 Thread Bruce Momjian
What are the open issues before we can release an RC3?  I don't know of
any except the btree problem OSDL found.  Is that fixed?  Are their
others?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bgwriter behavior

2004-12-27 Thread John Hansen
 I ran some tests last week and can report results similar on Tom's test:
 
   pgbench -i -s 10 bench
   pgbench -c 10 -t 1 bench
 
 The tests were on a machine with a single SCSI drive that doesn't lie
 about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
 similar to the 65/107 numbers Tom had.

You do realize, that pgbench result comparisons are about as useful as a
fork for eating soup?

On another note, how do you know for sure, that your drive does not lie
about fsync?

Did you run the tests with fsync turned off vs fsync on?

 First, I am confused why we have such a large improvement in 8.0.  Does
 anyone know?  This is a pretty long test so a 33-50% increase is a big
 jump.

bgwriter is responsible I imagine,... I experienced the same improvement
in an early 7.5, just after the bgwriter was added.
(tho my results was about 4-5 times higher in terms of tps rates, hehe)

... John


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] RC2 and open issues

2004-12-27 Thread Bruce Momjian
Greg Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  Suppose that you run a checkpoint every 5 minutes, and with the knob
  you slow down the checkpoint to extend over say 3 minutes on average,
  rather than the normal blast-it-out-as-fast-as-possible.  Then you'll
  be keeping an average of 8 minutes worth of WAL files instead of 5.
  Not exactly a killer objection.
 
 Right. I was thinking that the goal would be to spread the checkpoint out over
 exactly the checkpoint interval, minus some safety factor. So if it has some
 estimate of the total number of dirty buffers that need flushing it could just
 divide the checkpoint interval by that and calculate the delay needed to
 finish in some fraction of the checkpoint interval, 60% seems like a
 reasonable guess.
 
  One issue is that while we can regulate the rate at which we issue
  write()s, we still have to issue fsync()s at the end, and we can't
  control what happens in response to those.  It's quite possible that
  all the I/O would happen in response to the fsync()s anyway, in which
  case the whole exercise would be a waste of time.
 
 Well you could fsync earlier as well, say just before whenever you sleep.
 Obviously the delay on the checkpoint process doesn't matter to performance if
 it's about to sleep. It could end up scheduling i/o earlier than necessary and
 cause redundant seeks but then I guess that's an inherent tension between
 trying to spread out the i/o evenly and trying to get the ideal ordering of
 i/o.

It certainly is an interesting idea to have the checkpoint span a longer
time period.  We couldn't do that with sync, but now that we fsync each
file it is possible.

It would be easy do this if we didn't also need the fsync.  The original
idea was that we would write() the dirty buffers long before the
checkpoint, and the kernel would write many of these dirty buffers
before we got to checkpoint time.

We could go with the checkpoint clock sweep idea but then we aren't
writing them but actually doing write/fsync a lot more.  I can't think
of a way this would be a win.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings