Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-16 Thread Markus Wanner

Hi,

On 09/15/2010 08:54 PM, Robert Haas wrote:

I think that the bar for committing to another in-core replication
solution right now is probably fairly high.


I'm not trying to convince you to accept the Postgres-R patch.. at least 
not now.


<showing-off>
BTW, that'd be what I call a huge patch:

bgworkers, excluding dynshmem and imessages:
 34 files changed, 2910 insertions(+), 1421 deletions(-)

from there to Postgres-R:
 98 files changed, 14856 insertions(+), 230 deletions(-)
</showing-off>


I am pretty doubtful that
our current architecture is going to get us to the full feature set
we'd eventually like to have - multi-master, partial replication, etc.


Would be hard to do, due to the (physical) format of WAL, yes. That's 
why Postgres-R uses its own (logical) wire format.



  But we're not ever going to have ten replication solutions in core,
so we need to think pretty carefully about what we accept.


That's very understandable.


That
conversation probably needs to start from the other end - is the
overall architecture correct for us? - before we get down to specific
patches.  On the other hand, I'm very interested in laying the
groundwork for parallel query


Cool. Maybe we should take another look at bgworkers, as soon as a 
parallel querying feature gets planned?



and I think there are probably a number
of bits of architecture both from this project and Postgres-XC, that
could be valuable contributions to PostgreSQL;


(...note that Postgres-R is license compatible, as opposed to the GPL'ed 
Postgres-XC project...)



however, in neither
case do I expect them to be accepted without significant modification.


Sure, that's understandable as well. I've published this part of the 
infrastructure to get some feedback as early as possible on that part of 
Postgres-R.


As you can certainly imagine, it's important for me that any 
modification to such a patch from Postgres-R would still be compatible 
with what I use it for in Postgres-R and not cripple any functionality 
there, because that'd probably create more work for me than not getting 
the patch accepted upstream at all.



I'm saying it's hard to think about committing any of them because
they aren't really independent of each other or of other parts of
Postgres-R.


As long as you don't consider imessages and dynshmem a part of 
Postgres-R, they are independent of the rest of Postgres-R in the 
technical sense.


And for any kind of parallel querying feature, imessages and dynshmem 
might be of help as well. So I currently don't see where I could 
de-couple these patches any further.


If you have a specific requirement, please don't hesitate to ask.


I feel like there is an antagonistic thread to this conversation, and
some others that we've had.  I hope I'm misreading that, because it's
not my intent to piss you off.  I'm just offering my honest feedback.
Your mileage may vary; others may feel differently; none of it is
personal.


That's absolutely fine. I'm thankful for your feedback.

Also note that I initially didn't even want to add the bgworker patches 
to the commit fest. I've de-coupled and published these separately from 
Postgres-R with a) the hope of getting feedback (more than for the 
overall Postgres-R patch) and b) to show others that such a facility 
exists and is ready to be reused.


I didn't really expect them to get accepted to Postgres core at the 
moment. But the Postgres team normally asks for sharing concepts and 
ideas as early as possible...



OK, I think I understand what you're trying to say now.  I guess I
feel like the ideal architecture for any sort of solution that needs a
pool of workers would be to keep around the workers that most recently
proved to be useful.  Upon needing a new worker, you look for one
that's available and already bound to the correct database.  If you
find one, you assign him to the new task.


That's mostly how bgworkers are designed, yes. The min/max idle 
background worker GUCs allow loose control over how many spare 
processes you want to keep hanging around doing nothing.



If not, you find the one
that's been idle longest and either (a) kill him off and start a new
one that is bound to the correct database or, even better, (b) tell
him to flush his caches and rebind to the correct database.


Hm.. sorry if I didn't express this more clearly. What I'm trying to say 
is that (b) isn't worth implementing, because it doesn't offer enough of 
an improvement over (a). The only saving would be the fork() and some 
basic process initialization.


Being able to re-use a bgworker connected to the correct database 
already gives you most of the benefit, namely not having to fork() *and* 
re-connect to the database for every job.



Back to the technical issues, let me try to summarize the feedback and 
what I'll do with it.


In general, there's not much use for bgworkers if autovacuum is the only 
background job. I agree.


Tom raised the 'lots of databases' issue. I agree that the 

Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-16 Thread Robert Haas
On Thu, Sep 16, 2010 at 4:47 AM, Markus Wanner mar...@bluegap.ch wrote:
 <showing-off>
 BTW, that'd be what I call a huge patch:

 bgworkers, excluding dynshmem and imessages:
  34 files changed, 2910 insertions(+), 1421 deletions(-)

 from there to Postgres-R:
  98 files changed, 14856 insertions(+), 230 deletions(-)
 </showing-off>

Yeah, that's huge.  :-)

 That
 conversation probably needs to start from the other end - is the
 overall architecture correct for us? - before we get down to specific
 patches.  On the other hand, I'm very interested in laying the
 groundwork for parallel query

 Cool. Maybe we should take another look at bgworkers, as soon as a parallel
 querying feature gets planned?

Well, that will obviously depend somewhat on the wishes of whoever
decides to work on parallel query, but it seems reasonable to me.  It
would be nice to get some pieces of this committed incrementally but
as I say I fear there is too much dependency on what might happen
later, at least the way things are structured now.

 and I think there are probably a number
 of bits of architecture both from this project and Postgres-XC, that
 could be valuable contributions to PostgreSQL;

 (...note that Postgres-R is license compatible, as opposed to the GPL'ed
 Postgres-XC project...)

Yeah.  +1 for license compatibility.

 As you can certainly imagine, it's important for me that any modification to
 such a patch from Postgres-R would still be compatible with what I use it for
 in Postgres-R and not cripple any functionality there, because that'd
 probably create more work for me than not getting the patch accepted
 upstream at all.

That's an understandable goal, but it may be difficult to achieve.

 As long as you don't consider imessages and dynshmem a part of Postgres-R,
 they are independent of the rest of Postgres-R in the technical sense.

 And for any kind of parallel querying feature, imessages and dynshmem might
 be of help as well. So I currently don't see where I could de-couple these
 patches any further.

I agree.  I've already said my piece on how I think that stuff would
need to be reworked to be acceptable, so we might have to agree to
disagree on those, especially if your goal is to get something
committed that doesn't involve a major rewrite on your end.

 I didn't really expect them to get accepted to Postgres core at the moment.
 But the Postgres team normally asks for sharing concepts and ideas as early
 as possible...

Absolutely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-16 Thread Markus Wanner

Morning,

On 09/16/2010 04:26 PM, Robert Haas wrote:

I agree.  I've already said my piece on how I think that stuff would
need to be reworked to be acceptable, so we might have to agree to
disagree on those, especially if your goal is to get something
committed that doesn't involve a major rewrite on your end.


Just for clarification: you are referring to imessages and dynshmem 
here, right? I agree that dynshmem needs to be reworked and rethought. 
And imessages simply depends on dynshmem.


If you are referring to the bgworker stuff, I'm not quite clear about 
what I could do to make bgworker more acceptable. (Except perhaps for 
removing the dependency on imessages).


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-16 Thread Robert Haas
On Thu, Sep 16, 2010 at 1:20 PM, Markus Wanner mar...@bluegap.ch wrote:
 On 09/16/2010 04:26 PM, Robert Haas wrote:

 I agree.  I've already said my piece on how I think that stuff would
 need to be reworked to be acceptable, so we might have to agree to
 disagree on those, especially if your goal is to get something
 committed that doesn't involve a major rewrite on your end.

 Just for clarification: you are referring to imessages and dynshmem here,
 right? I agree that dynshmem needs to be reworked and rethought. And
 imessages simply depends on dynshmem.

Yes, I was referring to imessages and dynshmem.

 If you are referring to the bgworker stuff, I'm not quite clear about what I
 could do to make bgworker more acceptable. (Except perhaps for removing the
 dependency on imessages).

I'm not sure, either.  It would be nice if there were a way to create
a general facility here that we could then build various applications
on, but I'm not sure whether that's the case.  We had some
back-and-forth about what is best for replication vs. what is best for
vacuum vs. what is best for parallel query.  If we could somehow
conceive of a system that could serve all of those needs without
introducing any more configuration complexity than what we have now,
that would of course be very interesting.  But judging by your
comments I'm not very sure such a thing is feasible, so perhaps
wait-and-see is the best approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-15 Thread Markus Wanner

Hi,

On 09/15/2010 03:44 AM, Robert Haas wrote:

Hmm.  So what happens if you have 1000 databases with a minimum of 1
worker per database and an overall limit of 10 workers?


The first 10 databases would get an idle worker. As soon as real jobs 
arrive, the idle workers on databases that don't have any pending jobs 
get terminated in favor of the databases for which there are pending 
jobs. Admittedly, that mechanism isn't too clever, yet. I.e. if there 
always are enough jobs for one database, the others could starve.


With 1000 databases and a max of only 10 workers, chances for having a 
spare worker for the database that gets the next job are pretty low, 
yes. But that's the case with the proposed 5 minute timeout as well.


Lowering that timeout wouldn't increase the chance. And while it might 
make the start of a new bgworker quicker in the above mentioned case, I 
think there's not much advantage over just setting 
max_idle_background_workers = 0.


OTOH such a timeout would be easy enough to implement. The admin would 
be faced with yet another GUC, though.



Hmm, I see.  That's probably not helpful for autovacuum, but I can see
it being useful for replication.


Glad to hear.


I still think maybe we ought to try
to crack the nut of allowing backends to rebind to a different
database.  That would simplify things here a good deal, although then
again maybe it's too complex to be worth it.


Also note that it would re-introduce some of the costs we try to avoid 
with keeping the connected bgworker around. And in case you can afford 
having at least a few spare bgworkers around per database (i.e. fewer 
than 10 or 20 databases), potential savings seem to be negligible again.


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-15 Thread Robert Haas
On Wed, Sep 15, 2010 at 2:48 AM, Markus Wanner mar...@bluegap.ch wrote:
 Hmm.  So what happens if you have 1000 databases with a minimum of 1
 worker per database and an overall limit of 10 workers?

 The first 10 databases would get an idle worker. As soon as real jobs
 arrive, the idle workers on databases that don't have any pending jobs get
 terminated in favor of the databases for which there are pending jobs.
 Admittedly, that mechanism isn't too clever, yet. I.e. if there always are
 enough jobs for one database, the others could starve.

I haven't scrutinized your code but it seems like the
minimum-per-database might be complicating things more than necessary.
 You might find that you can make the logic simpler without that.  I
might be wrong, though.

I guess the real issue here is whether it's possible to, and whether
you're interested in, extracting a committable subset of this work,
and if so what that subset should look like.  There's sort of a
chicken-and-egg problem with large patches; if you present them as one
giant monolithic patch, they're too large to review.  But if you break
them down into smaller patches, it doesn't really fix the problem
unless the pieces have independent value.  Even in the two years I've
been involved in the project, a number of different contributors have
gone through the experience of submitting a patch that only made sense
if you assumed that the follow-on patch was also going to get
accepted, and as no one was willing to assume that, the first patch
didn't get committed either.  Where people have been able to break
things down into a series of small to medium-sized incremental
improvements, things have gone more smoothly.  For example, Simon was
able to get a patch to start the bgwriter during archive recovery
committed to 8.4.  That didn't have a lot of independent value, but it
had some, and it paved the way for Hot Standby in 9.0.  Had someone
thought of a way to decompose that project into more than two truly
independent pieces, I suspect it might have even gone more smoothly
(although of course that's an arguable point and YMMV).

 I still think maybe we ought to try
 to crack the nut of allowing backends to rebind to a different
 database.  That would simplify things here a good deal, although then
 again maybe it's too complex to be worth it.

 Also note that it would re-introduce some of the costs we try to avoid with
 keeping the connected bgworker around.

How?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-15 Thread Markus Wanner

Robert,

On 09/15/2010 07:23 PM, Robert Haas wrote:

I haven't scrutinized your code but it seems like the
minimum-per-database might be complicating things more than necessary.
  You might find that you can make the logic simpler without that.  I
might be wrong, though.


I still think of that as a feature, not something I'd like to simplify 
away. Are you arguing it could raise the chance of adoption of bgworkers 
in Postgres? I'm not seeing your point here.



I guess the real issue here is whether it's possible to, and whether
you're interested in, extracting a committable subset of this work,
and if so what that subset should look like.


Well, as it doesn't currently provide any real benefit for autovacuum, 
it depends on how much hackers like to prepare for something like 
Postgres-R or parallel querying.



There's sort of a
chicken-and-egg problem with large patches; if you present them as one
giant monolithic patch, they're too large to review.  But if you break
them down into smaller patches, it doesn't really fix the problem
unless the pieces have independent value.


I don't quite get what you are trying to say here. I split the 
bgworker project from Postgres-R into 6 separate patches. Are you saying 
that's too few or too many?


You are welcome to argue about independent patches, i.e. this patch 1 of 
6 (as $SUBJECT implies) might have some value, according to Alvaro.


Admittedly, patch 2 of 6 is the biggest and most important chunk of 
functionality of the whole set.



Also note that it would re-introduce some of the costs we try to avoid with
keeping the connected bgworker around.


How?


I'm talking about the cost of connecting to a database (and 
disconnecting), most probably flushing caches, and very probably some 
kind of re-registering with the coordinator. Most of what a normal 
backend does at startup. About the only thing you'd save here is the 
fork() and very basic process setup. I really doubt that's worth the effort.


Having some more idle processes around doesn't hurt that much and solves 
the problem, I think.


Thanks for your feedback.

Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-15 Thread Robert Haas
On Wed, Sep 15, 2010 at 2:28 PM, Markus Wanner mar...@bluegap.ch wrote:
 I guess the real issue here is whether it's possible to, and whether
 you're interested in, extracting a committable subset of this work,
 and if so what that subset should look like.

 Well, as it doesn't currently provide any real benefit for autovacuum, it
 depends on how much hackers like to prepare for something like Postgres-R or
 parallel querying.

I think that the bar for committing to another in-core replication
solution right now is probably fairly high.  I am pretty doubtful that
our current architecture is going to get us to the full feature set
we'd eventually like to have - multi-master, partial replication, etc.
 But we're not ever going to have ten replication solutions in core,
so we need to think pretty carefully about what we accept.  That
conversation probably needs to start from the other end - is the
overall architecture correct for us? - before we get down to specific
patches.  On the other hand, I'm very interested in laying the
groundwork for parallel query, and I think there are probably a number
of bits of architecture both from this project and Postgres-XC, that
could be valuable contributions to PostgreSQL; however, in neither
case do I expect them to be accepted without significant modification.

 There's sort of a
 chicken-and-egg problem with large patches; if you present them as one
 giant monolithic patch, they're too large to review.  But if you break
 them down into smaller patches, it doesn't really fix the problem
 unless the pieces have independent value.

 I don't quite get what you are trying to say here. I split the bgworker
 project from Postgres-R into 6 separate patches. Are you saying that's too
 few or too many?

I'm saying it's hard to think about committing any of them because
they aren't really independent of each other or of other parts of
Postgres-R.

I feel like there is an antagonistic thread to this conversation, and
some others that we've had.  I hope I'm misreading that, because it's
not my intent to piss you off.  I'm just offering my honest feedback.
Your mileage may vary; others may feel differently; none of it is
personal.

 Also note that it would re-introduce some of the costs we try to avoid
 with keeping the connected bgworker around.

 How?

 I'm talking about the cost of connecting to a database (and disconnecting),
 most probably flushing caches, and very probably some kind of re-registering
 with the coordinator. Most of what a normal backend does at startup. About
 the only thing you'd save here is the fork() and very basic process setup. I
 really doubt that's worth the effort.

 Having some more idle processes around doesn't hurt that much and solves the
 problem, I think.

OK, I think I understand what you're trying to say now.  I guess I
feel like the ideal architecture for any sort of solution that needs a
pool of workers would be to keep around the workers that most recently
proved to be useful.  Upon needing a new worker, you look for one
that's available and already bound to the correct database.  If you
find one, you assign him to the new task.  If not, you find the one
that's been idle longest and either (a) kill him off and start a new
one that is bound to the correct database or, even better, (b) tell
him to flush his caches and rebind to the correct database.
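In rough C, that policy might look like this (a sketch only; the struct 
and every helper name here are invented for illustration, not an actual 
API):

    /* Sketch of the pool policy described above; all names invented. */
    #include <stdbool.h>
    #include <time.h>

    typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

    typedef struct BgWorker
    {
        int         pid;
        Oid         dboid;          /* database this worker is bound to */
        bool        idle;
        time_t      idle_since;
    } BgWorker;

    static BgWorker *find_idle_worker(Oid dboid);
    static BgWorker *find_longest_idle_worker(void);
    static BgWorker *start_worker(Oid dboid);
    static void terminate_worker(BgWorker *w);
    static bool pool_is_full(void);

    static BgWorker *
    assign_worker(Oid dboid)
    {
        BgWorker   *w = find_idle_worker(dboid);

        if (w != NULL)
            return w;               /* best case: reuse directly */

        if (pool_is_full())
        {
            /* option (a): kill the longest-idle worker and fork a
             * replacement; option (b) would instead have it flush its
             * caches and rebind to dboid without a new fork */
            terminate_worker(find_longest_idle_worker());
        }
        return start_worker(dboid); /* fork a worker bound to dboid */
    }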

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Robert Haas
On Mon, Aug 30, 2010 at 11:30 AM, Markus Wanner mar...@bluegap.ch wrote:
 On 08/30/2010 04:52 PM, Tom Lane wrote:
 Let me just point out that awhile back we got a *measurable* performance
 boost by eliminating a single indirect fetch from the buffer addressing
 code path.

 I'll take a look at that, thanks.

 So I don't have any faith in untested assertions
 Neither do I. Thus I'm probably going to try my approach.

As a matter of project management, I am inclined to think that until
we've hammered out this issue, there's not a whole lot useful that can
be done on any of the BG worker patches.  So I am wondering if we
should set those to Returned with Feedback or bump them to a future
CommitFest.

The good news is that, after a lot of back and forth, I think we've
identified the reason underpinning much of why Markus and I have been
disagreeing about dynshmem and imessages - namely, whether or not it's
possible to allocate shared_buffers as something other than one giant
slab without taking an unacceptable performance hit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Markus Wanner

On 09/14/2010 06:26 PM, Robert Haas wrote:

As a matter of project management, I am inclined to think that until
we've hammered out this issue, there's not a whole lot useful that can
be done on any of the BG worker patches.  So I am wondering if we
should set those to Returned with Feedback or bump them to a future
CommitFest.


I agree in general. I certainly don't want to hold back the commit fest.

What bugs me a bit is that I didn't really get much feedback regarding 
the *bgworker* portion of code. Especially as that's the part I'm most 
interested in feedback.


However, I currently don't have any time to work on these patches, so 
I'm fine with dropping them from the current commit fest.



The good news is that, after a lot of back and forth, I think we've
identified the reason underpinning much of why Markus and I have been
disagreeing about dynshmem and imessages - namely, whether or not it's
possible to allocate shared_buffers as something other than one giant
slab without taking an unacceptable performance hit.


Agreed.

Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Alvaro Herrera
Excerpts from Markus Wanner's message of mar sep 14 12:56:59 -0400 2010:

 What bugs me a bit is that I didn't really get much feedback regarding 
 the *bgworker* portion of code. Especially as that's the part I'm most 
 interested in feedback.

I think we've had enough problems with the current design of forking a
new autovac process every once in a while, that I'd like to have them as
permanent processes instead, waiting for orders from the autovac
launcher.  From that POV, bgworkers would make sense.

I cannot promise a timely review however :-(

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 I think we've had enough problems with the current design of forking a
 new autovac process every once in a while, that I'd like to have them as
 permanent processes instead, waiting for orders from the autovac
 launcher.  From that POV, bgworkers would make sense.

That seems like a fairly large can of worms to open: we have never tried
to make backends switch from one database to another, and I don't think
I'd want to start such a project with autovac.

regards, tom lane



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar sep 14 13:46:17 -0400 2010:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  I think we've had enough problems with the current design of forking a
  new autovac process every once in a while, that I'd like to have them as
  permanent processes instead, waiting for orders from the autovac
  launcher.  From that POV, bgworkers would make sense.
 
 That seems like a fairly large can of worms to open: we have never tried
 to make backends switch from one database to another, and I don't think
 I'd want to start such a project with autovac.

Yeah, what I was thinking is that each worker would still die after
completing the run, but a new one would be started immediately; it would
go to sleep until a new assignment arrived.  (What got me into this was
the whole latch thing, actually.)

This is a very raw idea however, so don't mind me much.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Markus Wanner

Hi,

On 09/14/2010 07:46 PM, Tom Lane wrote:

Alvaro Herreraalvhe...@commandprompt.com  writes:

I think we've had enough problems with the current design of forking a
new autovac process every once in a while, that I'd like to have them as
permanent processes instead, waiting for orders from the autovac
launcher.  From that POV, bgworkers would make sense.


Okay, great.


That seems like a fairly large can of worms to open: we have never tried
to make backends switch from one database to another, and I don't think
I'd want to start such a project with autovac.


They don't. Even with bgworker, every backend stays connected to the 
same database. You configure the min and max numbers of idle backends 
*per database*. Plus the overall max of background workers, IIRC.


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 On 09/14/2010 07:46 PM, Tom Lane wrote:
 That seems like a fairly large can of worms to open: we have never tried
 to make backends switch from one database to another, and I don't think
 I'd want to start such a project with autovac.

 They don't. Even with bgworker, every backend stays connected to the
 same database. You configure the min and max numbers of idle backends
 *per database*. Plus the overall max of background workers, IIRC.

So there is a minimum of one avworker per database?  That's a guaranteed
nonstarter.  There are many people with thousands of databases, but no
need for thousands of avworkers.

I'm also pretty unclear why you speak of min and max numbers of workers
when the proposal (AIUI) is to have the workers there always, rather
than have them come and go.

regards, tom lane



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Markus Wanner

Hi,

I'm glad discussion on this begins.

On 09/14/2010 07:55 PM, Tom Lane wrote:

So there is a minimum of one avworker per database?


Nope, you can set that to 0. You don't *need* to keep idle workers around.


That's a guaranteed
nonstarter.  There are many people with thousands of databases, but no
need for thousands of avworkers.


Well, yeah, bgworkers are primarily designed to be used for Postgres-R, 
where you easily get more background workers than normal backends. And 
having idle backends around waiting for the next job is certainly 
preferable to having to re-connect every time.


I've advertised the bgworker infrastructure for use for parallel 
querying as well. Again, that use case easily leads to having more 
background workers than normal backends. And you don't want to wait for 
them all to re-connect for every query they need to help with.



I'm also pretty unclear why you speak of min and max numbers of workers
when the proposal (AIUI) is to have the workers there always, rather
than have them come and go.


This step 1 of the bgworker set of patches turns the av*launcher* 
(coordinator) into a permanent process (even if autovacuum is off).


The background workers can still come and go. However, they don't 
necessarily *need* to terminate after having done their job. The 
coordinator controls them and requests new workers or commands idle ones 
to terminate *as required*.


I don't think there's that much difference from the current 
implementation. Setting both the min and max number of idle bgworkers to 
0 should in fact give you the exact same behavior as we currently have: 
namely to terminate each av/bgworker after its job is done, never having 
idle workers around. Which might or might not be the optimal 
configuration for users with lots of databases; that's hard to predict, 
and it depends a lot on the load distribution over the databases and on 
how cleverly the coordinator manages the bgworkers.


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Markus Wanner

On 09/14/2010 08:06 PM, Robert Haas wrote:

One idea I had was to have autovacuum workers stick around for a
period of time after finishing their work.  When we need to autovacuum
a database, first check whether there's an existing worker that we can
use, and if so use him.  If not, start a new one.  If that puts us
over the max number of workers, kill off the one that's been waiting 
the longest.  But workers will exit anyway if not reused after a
certain period of time.


That's pretty close to how bgworkers are implemented, now. Except for 
the need to terminate after a certain period of time. What is that 
intended to be good for?


Especially considering that the avlauncher/coordinator knows the current 
amount of work (number of jobs) per database.



The idea here would be to try to avoid all the backend startup costs:
process creation, priming the caches, etc.  But I'm not really sure
it's worth the effort.  I think we need to look for ways to further
reduce the overhead of vacuuming, but this doesn't necessarily seem
like the thing that would have the most bang for the buck.


Well, the pressure has simply been bigger for Postgres-R.

It should be possible to do benchmarks using Postgres-R and compare 
against a max_idle_background_workers = 0 configuration that leads to 
termination and re-connecting for every remote transaction to be applied. 
However, that's not going to say anything about whether or not it's 
worth it for autovacuum.


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Robert Haas
On Tue, Sep 14, 2010 at 1:56 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Tom Lane's message of mar sep 14 13:46:17 -0400 2010:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  I think we've had enough problems with the current design of forking a
  new autovac process every once in a while, that I'd like to have them as
  permanent processes instead, waiting for orders from the autovac
  launcher.  From that POV, bgworkers would make sense.

 That seems like a fairly large can of worms to open: we have never tried
 to make backends switch from one database to another, and I don't think
 I'd want to start such a project with autovac.

 Yeah, what I was thinking is that each worker would still die after
 completing the run, but a new one would be started immediately; it would
 go to sleep until a new assignment arrived.  (What got me into this was
 the whole latch thing, actually.)

 This is a very raw idea however, so don't mind me much.

What would be the advantage of that?

One idea I had was to have autovacuum workers stick around for a
period of time after finishing their work.  When we need to autovacuum
a database, first check whether there's an existing worker that we can
use, and if so use him.  If not, start a new one.  If that puts us
over the max number of workers, kill of the one that's been waiting
the longest.  But workers will exit anyway if not reused after a
certain period of time.

The idea here would be to try to avoid all the backend startup costs:
process creation, priming the caches, etc.  But I'm not really sure
it's worth the effort.  I think we need to look for ways to further
reduce the overhead of vacuuming, but this doesn't necessarily seem
like the thing that would have the most bang for the buck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Robert Haas
On Tue, Sep 14, 2010 at 2:26 PM, Markus Wanner mar...@bluegap.ch wrote:
 On 09/14/2010 08:06 PM, Robert Haas wrote:
 One idea I had was to have autovacuum workers stick around for a
 period of time after finishing their work.  When we need to autovacuum
 a database, first check whether there's an existing worker that we can
 use, and if so use him.  If not, start a new one.  If that puts us
 over the max number of workers, kill off the one that's been waiting
 the longest.  But workers will exit anyway if not reused after a
 certain period of time.

 That's pretty close to how bgworkers are implemented, now. Except for the
 need to terminate after a certain period of time. What is that intended to
 be good for?

To avoid consuming system resources forever if they're not being used.

 Especially considering that the avlauncher/coordinator knows the current
 amount of work (number of jobs) per database.

 The idea here would be to try to avoid all the backend startup costs:
 process creation, priming the caches, etc.  But I'm not really sure
 it's worth the effort.  I think we need to look for ways to further
 reduce the overhead of vacuuming, but this doesn't necessarily seem
 like the thing that would have the most bang for the buck.

 Well, the pressure has simply been bigger for Postgres-R.

 It should be possible to do benchmarks using Postgres-R and compare against
 a max_idle_background_workers = 0 configuration that leads to termination
 and re-connecting for every remote transaction to be applied.

Well, presumably that would be fairly disastrous.  I would think,
though, that you would not have a min/max number of workers PER
DATABASE, but an overall limit on the upper size of the total pool - I
can't see any reason to limit the minimum size of the pool, but I
might be missing something.

 However, that's
 not going to say anything about whether or not it's worth it for autovacuum.

Personally, my position is that if someone does something that is only
a small improvement on its own but which has the potential to help
with other things later, that's a perfectly legitimate patch and we
should try to accept it.  But if it's not a clear (even if small) win
then the bar is a lot higher, at least in my book.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Markus Wanner

On 09/14/2010 08:41 PM, Robert Haas wrote:

To avoid consuming system resources forever if they're not being used.


Well, what timeout would you choose? And how would you justify it 
compared to the amounts of system resources consumed by an idle process 
sitting there and waiting for a job?


I'm not against such a timeout, but so far I felt that unlimited would 
be the best default.



Well, presumably that would be fairly disastrous.  I would think,
though, that you would not have a min/max number of workers PER
DATABASE, but an overall limit on the upper size of the total pool


That already exists (in addition to the other parameters).


- I
can't see any reason to limit the minimum size of the pool, but I
might be missing something.


I tried to mimic what others do, for example apache pre-fork. Maybe it's 
just another way of trying to keep the overall resource consumption at a 
reasonable level.


The minimum is helpful to eliminate waits for backends starting up. Note 
here that the coordinator can only request to fork one new bgworker at a 
time. It then needs to wait until that new bgworker registers with the 
coordinator, before it can request further bgworkers from the postmaster. 
(That's due to the limitation in communication between the postmaster 
and coordinator).
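A sketch of that serialization (SendPostmasterSignal() is the existing 
primitive; the signal name and the flag are invented for illustration):

    #include "postgres.h"
    #include "storage/pmsignal.h"

    /* PMSIGNAL_START_BGWORKER is invented; sketch only. */
    static bool fork_request_pending = false;

    static void
    maybe_request_worker(void)
    {
        if (!fork_request_pending)
        {
            SendPostmasterSignal(PMSIGNAL_START_BGWORKER);
            fork_request_pending = true;
        }
        /* otherwise: a fork is already in flight; wait for it */
    }

    /* called once the new bgworker has registered with the coordinator */
    static void
    worker_registered(void)
    {
        fork_request_pending = false;   /* the next one may be requested */
    }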



Personally, my position is that if someone does something that is only
a small improvement on its own but which has the potential to help
with other things later, that's a perfectly legitimate patch and we
should try to accept it.  But if it's not a clear (even if small) win
then the bar is a lot higher, at least in my book.


I don't think it's an improvement over the current autovacuum behavior. 
Not intended to be one. But it certainly shouldn't hurt, either.


It only has the potential to help with other things, namely parallel 
querying. And of course replication (Postgres-R). Or any other kind of 
background job you come up with (where background means not requiring a 
client connection).


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-09-14 Thread Robert Haas
On Tue, Sep 14, 2010 at 2:59 PM, Markus Wanner mar...@bluegap.ch wrote:
 On 09/14/2010 08:41 PM, Robert Haas wrote:

 To avoid consuming system resources forever if they're not being used.

 Well, what timeout would you choose? And how would you justify it compared
 to the amounts of system resources consumed by an idle process sitting there
 and waiting for a job?

 I'm not against such a timeout, but so far I felt that unlimited would be
 the best default.

I don't have a specific number in mind.  5 minutes?

 Well, presumably that would be fairly disastrous.  I would think,
 though, that you would not have a min/max number of workers PER
 DATABASE, but an overall limit on the upper size of the total pool

 That already exists (in addition to the other parameters).

Hmm.  So what happens if you have 1000 databases with a minimum of 1
worker per database and an overall limit of 10 workers?

 - I
 can't see any reason to limit the minimum size of the pool, but I
 might be missing something.

 I tried to mimic what others do, for example apache pre-fork. Maybe it's
 just another way of trying to keep the overall resource consumption at a
 reasonable level.

 The minimum is helpful to eliminate waits for backends starting up. Note
 here that the coordinator can only request to fork one new bgworker at a
 time. It then needs to wait until that new bgworker registers with the
 coordinator, before it can request further bgworkers from the postmaster.
 (That's due to the limitation in communication between the postmaster and
 coordinator).

Hmm, I see.  That's probably not helpful for autovacuum, but I can see
it being useful for replication.  I still think maybe we ought to try
to crack the nut of allowing backends to rebind to a different
database.  That would simplify things here a good deal, although then
again maybe it's too complex to be worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-30 Thread Markus Wanner

Hi,

On 08/27/2010 10:46 PM, Robert Haas wrote:

What other subsystems are you imagining servicing with a dynamic
allocator?  If there were a big demand for this functionality, we
probably would have been forced to implement it already, but that's
not the case.  We've already discussed the fact that there are massive
problems with using it for something like shared_buffers, which is by
far the largest consumer of shared memory.


Understood. I certainly plan to look into that for a better 
understanding of the problems those pose for dynamically allocated memory.



I think it would be great if we could bring some more flexibility to
our memory management.  There are really two layers of problems here.


Full ACK.


One is resizing the segment itself, and one is resizing structures
within the segment.  As far as I can tell, there is no portable API
that can be used to resize the shm itself.  For so long as that
remains the case, I am of the opinion that any meaningful resizing of
the objects within the shm is basically unworkable.  So we need to
solve that problem first.


Why should resizing of the objects within the shmem be unworkable? 
Don't my patches prove the exact opposite? Being able to resize 
objects within the shm requires some kind of underlying dynamic 
allocation. And I'd rather be in control of that allocator than 
having to deal with two dozen different implementations on different 
OSes and their libraries.



There are a couple of possible solutions, which have been discussed
here in the past.


I currently don't have much interest in dynamic resizing. Being able to 
resize the overall amount of shared memory on the fly would be nice, 
sure. But the total amount of RAM in a server changes rather 
infrequently. Being able to use what's available more efficiently is 
what I'm interested in. That doesn't need any kind of additional or 
different OS level support. It's just a matter of making better use of 
what's available - within Postgres itself.



Next, we have to think about how we're going to resize data structures
within this expandable shm.


Okay, that's where I'm getting interested.


Many of these structures are not things
that we can easily move without bringing the system to a halt.  For
example, it's difficult to see how you could change the base address
of shared buffers without ceasing all system activity, at which point
there's not really much advantage over just forcing a restart.
Similarly with LWLocks or the ProcArray.


I guess that's what Bruce wanted to point out by saying our data 
structures are mostly contiguous, i.e. not dynamic lists or hash 
tables, but plain simple arrays.


Maybe that's a subjective impression, but I seem to hear complaints 
about their fixed size and inflexibility quite often. Try to imagine the 
flexibility that dynamic lists could give us.



And if you can't move them,
then how will you grow them if (as will likely be the case) there's
something immediately following them in memory.  One possible solution
is to divide up these data structures into slabs.  For example, we
might imagine allocating shared_buffers in 1GB chunks.


Why 1GB and do yet another layer of dynamic allocation within that? The 
buffers are (by default) 8K, so allocate in chunks of 8K. Or a tiny bit 
more for all of the book-keeping stuff.



To make this
work, we'd need to change the memory layout so that each chunk would
include all of the miscellaneous stuff that we need to do bookkeeping
for that chunk, such as the LWLocks and buffer descriptors.  That
doesn't seem completely impossible, but there would be some
performance penalty, because you could no longer index into shared
buffers from a single base offset.


AFAICT we currently have four fixed-size blocks to manage shared 
buffers: the buffer blocks themselves, the buffer descriptors, the 
strategy status (for the freelist) and the buffer lookup table.


It's not obvious to me how these data structures should perform better 
than a dynamically allocated layout. One could rather argue that 
combining (some of) the bookkeeping stuff with data itself would lead to 
better locality and thus perform better.



Instead, you'd need to determine
which chunk contains the buffer you want, look up the base address for
that chunk, and then index into the chunk.  Maybe that overhead
wouldn't be significant (or maybe it would); at any rate, it's not
completely free.  There's also the problem of handling the partial
chunk at the end, especially if that happens to be the only chunk.
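In code, the two-level lookup described above would amount to something 
like this (names and constants illustrative):

    /* Sketch of chunked buffer addressing: one extra fetch per access.
     * CHUNK_SIZE, MAX_CHUNKS and BLCKSZ stand in for real constants. */
    #define BUFFERS_PER_CHUNK   (CHUNK_SIZE / BLCKSZ)

    static char *chunk_base[MAX_CHUNKS];    /* base address of each chunk */

    static inline char *
    ChunkedBufferGetBlock(int buffer)       /* 0-based buffer number */
    {
        int         chunk = buffer / BUFFERS_PER_CHUNK;
        int         offset = buffer % BUFFERS_PER_CHUNK;

        return chunk_base[chunk] + (size_t) offset * BLCKSZ;
    }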


This sounds way too complicated, yes. Use 8K chunks and most of the 
problems vanish.



I think the problems for other arrays are similar, or more severe.  I
can't see, for example, how you could resize the ProcArray using this
approach.


Try not to think in terms of resizing, but dynamic allocation. I.e. 
being able to resize ProcArray (and thus being able to alter 
max_connections on the fly) would take a lot more work.


Just 

Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-30 Thread Markus Wanner
(Sorry, need to disable Ctrl-Return, which quite often sends mails 
earlier than I really want.. continuing my mail)


On 08/27/2010 10:46 PM, Robert Haas wrote:

Yeah, probably.  I think designing something that works efficiently
over a network is a somewhat different problem than designing
something that works on an individual node, and we probably shouldn't
let the designs influence each other too much.


Agreed. Thus I've left out any kind of congestion avoidance stuff from 
imessages so far.



There's no padding or sophisticated allocation needed.  You
just need a pointer to the last byte read (P1), the last byte allowed
to be read (P2), and the last byte allocated (P3).  Writers take a
spinlock, advance P3, release the spinlock, write the message, take
the spinlock, advance P2, release the spinlock, and signal the reader.


That would block parallel writers (i.e. only one process can write to the
queue at any time).


I feel like there's probably some variant of this idea that works
around that problem.  The problem is that when a worker finishes
writing a message, he needs to know whether to advance P2 only over
his own message or also over some subsequent message that has been
fully written in the meantime.  I don't know exactly how to solve that
problem off the top of my head, but it seems like it might be
possible.


I've tried pretty much that before. And failed. Because the 
allocation-order (i.e. the time the message gets created in preparation 
for writing to it) isn't necessarily the same as the sending-order (i.e. 
when the process has finished writing and decides to send the message).


To satisfy the FIFO property WRT the sending order, you need to decouple 
allocation from the ordering (i.e. queuing logic).


(And yes, it took me a while to figure out what was wrong in 
Postgres-R before I even noticed that design bug.)



Readers take the spinlock, read P1 and P2, release the spinlock, read
the data, take the spinlock, advance P1, and release the spinlock.


It would require copying data in case a process only needs to forward the
message. That's a quick pointer dequeue and enqueue exercise ATM.


If we need to do that, that's a compelling argument for having a
single messaging area rather than one per backend.


Absolutely, yes.


But I'm not sure I
see why we would need that sort of capability.  Why wouldn't you just
arrange for the sender to deliver the message directly to the final
recipient?


A process can read and even change the data of the message before 
forwarding it. Something the coordinator in Postgres-R does sometimes. 
(As it is the interface to the GCS and thus to the rest of the nodes in 
the cluster).


For parallel querying (on a single node) that's probably less important 
a feature.



So, they know in advance how large the message will be but not what
the contents will be?  What are they doing?


Filling the message until it's (mostly) full and then continuing with the 
next one. At least that's how the streaming approach on top of imessages 
works.


But yes, it's somewhat annoying to have to know the message size in 
advance. I haven't implemented realloc so far. Nor can I think of any other 
solution. Note that separation of allocation and queue ordering is 
required anyway for the above reasons.



Well, the fact that something is commonly used doesn't mean it's right
for us.  Tabula rasa, we might design the whole system differently,
but changing it now is not to be undertaken lightly.  Hopefully the
above comments shed some light on my concerns.  In short, (1) I don't
want to preallocate a big chunk of memory we might not use,


Isn't that exactly what we do now for lots of sub-systems, and what 
I'd like to improve (i.e. reduce to a single big chunk)?



(2) I fear
reducing the overall robustness of the system, and


Well, that applies to pretty much every new feature you add.


(3) I'm uncertain
what other systems would be able to leverage a dynamic allocator of the
sort you propose.


Okay, that's up to me to show evidences (or at least a PoC).

Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-30 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 AFAICT we currently have four fixed-size blocks to manage shared 
 buffers: the buffer blocks themselves, the buffer descriptors, the 
 strategy status (for the freelist) and the buffer lookup table.

 It's not obvious to me how these data structures should perform better 
 than a dynamically allocated layout.

Let me just point out that awhile back we got a *measurable* performance
boost by eliminating a single indirect fetch from the buffer addressing
code path.  We used to have an array of pointers pointing to the actual
buffers, and we removed that in favor of assuming the buffers were
laid out in a contiguous array, so that the address of buffer N could be
computed with a shift-and-add, eliminating the pointer fetch.  I forget
exactly what the numbers were, but it was significant enough to make us
change it.
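Roughly, the change amounted to this (the old array name is from memory; 
the new line matches the BufferGetBlock() computation in bufmgr.h):

    /* Before: an extra indirect fetch per buffer access. */
    blk = BufferBlockPointers[buffer - 1];

    /* After: shift-and-add from one contiguous base, no pointer fetch
     * (cf. the BufferGetBlock() macro in bufmgr.h). */
    blk = (Block) (BufferBlocks + ((Size) (buffer - 1)) * BLCKSZ);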

So I don't have any faith in untested assertions that we can convert
these data structures to use dynamic allocation with no penalty.
It's very difficult to see how you'd do that without introducing a
new layer of indirection, and our experience shows that that layer
will cost you.

regards, tom lane



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-30 Thread Markus Wanner

Hi,

On 08/30/2010 04:52 PM, Tom Lane wrote:

Let me just point out that awhile back we got a *measurable* performance
boost by eliminating a single indirect fetch from the buffer addressing
code path.


I'll take a look at that, thanks.


So I don't have any faith in untested assertions


Neither do I. Thus I'm probably going to try my approach.

Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-27 Thread Markus Wanner

Hi,

On 08/26/2010 11:57 PM, Robert Haas wrote:

It wouldn't require you to preallocate a big chunk of shared memory


Agreed, you wouldn't have to allocate it in advance. We would still want 
a configurable upper limit. So this can be seen as another approach for 
an implementation of a dynamic allocator. (Which should be separate from 
the exact imessages implementation, if only for the sake of 
modularization, IMO.)



In addition, it means that maximum_message_queue_size_per_backend (or
whatever it's called) can be changed on-the-fly; that is, it can be
PGC_SIGHUP rather than PGC_POSTMASTER.


That's certainly a point. However, as you are proposing a solution to 
just one subsystem (i.e. imessages), I don't find it half as convincing.


If you are saying it *should* be possible to resize shared memory in a 
portable way, why not do it for *all* subsystems right away? I still 
remember Tom saying it's not something that's doable in a portable way. 
Why and how should it be possible for a per-backend basis? How portable 
is mmap() really? Why don't we use it in Postgres as of now?


I certainly think that these are orthogonal issues: whether to use fixed 
boundaries or to dynamically allocate the memory available is one thing, 
dynamic resizing is another. If the latter is possible, I'm certainly not 
opposed to it. (But would still favor dynamic allocation).
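For reference, the practical difference between those two GUC contexts, 
sketched with the custom-variable API (module and variable names 
illustrative):

    #include "postgres.h"
    #include <limits.h>
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static int  max_msg_queue_size_kb;

    void
    _PG_init(void)
    {
        /* PGC_SIGHUP: changeable via reload; PGC_POSTMASTER: restart only */
        DefineCustomIntVariable("imessages.max_queue_size_per_backend",
                                "Per-backend cap on message memory (kB).",
                                NULL,
                                &max_msg_queue_size_kb,
                                8192,       /* boot value */
                                0, INT_MAX,
                                PGC_SIGHUP,
                                GUC_UNIT_KB,
                                NULL, NULL, NULL);
    }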



As to efficiency, the process is not much different once the initial
setup is completed.


I fully agree to that.

I'm more concerned about ease of use for developers. Simply being able 
to alloc() from shared memory makes things easier than having to invent 
a separate allocation method for every subsystem, again and again (much 
like the argument that people are more used to multi-threaded 
programming).



Doing the extra setup just to send one or two messages
might suck.  But maybe that just means this isn't the right mechanism
for those cases (e.g. the existing XID-wraparound logic should still
use signal multiplexing rather than this system).  I see the value of
this as being primarily for streaming big chunks of data, not so much
for sending individual, very short messages.


I agree that simple signals don't need a full imessage. But as soon as 
you want to send some data (like which database to vacuum), or require 
the delivery guarantee (i.e. no single message gets lost, as opposed to 
signals), then imessages should be cheap enough.



The current approach uses plain spinlocks, which are more efficient. Note
that both, appending as well as removing from the queue are writing
operations, from the point of view of the queue. So I don't think LWLocks
buy you anything here, either.


I agree that this might not be useful.  We don't really have all the
message types defined yet, though, so it's hard to say.


What does the type of lock used have to do with message types? IMO it 
doesn't matter what kind of message or what size you want to send. For 
appending or removing a pointer to or from a message queue, a spinlock 
seems to be just the right thing to use.
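
To illustrate, appending to such a queue is only a handful of
straight-line instructions under the lock (a rough sketch with made-up
names, using the existing primitives from storage/spin.h, not the
actual patch code):

#include "postgres.h"
#include "storage/spin.h"

typedef struct IMessageNode
{
    struct IMessageNode *next;
    /* pointer to and length of the payload would follow here */
} IMessageNode;

typedef struct IMessageQueue
{
    slock_t       lock;     /* SpinLockInit() at setup time */
    IMessageNode *head;
    IMessageNode *tail;
} IMessageQueue;

static void
imessage_queue_append(IMessageQueue *q, IMessageNode *msg)
{
    msg->next = NULL;
    SpinLockAcquire(&q->lock);
    /* just a few instructions while holding the spinlock */
    if (q->tail != NULL)
        q->tail->next = msg;
    else
        q->head = msg;
    q->tail = msg;
    SpinLockRelease(&q->lock);
}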



I understand the need to limit the amount of data in flight, but I don't
think that sending any type of message should ever block. Messages are
atomic in that regard. Either they are ready to be delivered (in entirety)
or not. Thus the sender needs to hold back the message, if the recipient is
overloaded. (Also note that currently imessages are bound to a maximum size
of around 8 KB).


That's functionally equivalent to blocking, isn't it?  I think that's
just a question of what API you want to expose.


Hm.. well, yeah, depends on what level you are arguing. The imessages 
API can be used in a completely non-blocking fashion. So a process can 
theoretically do other work while waiting for messages.


For parallel querying, the helper/worker backends would probably need to 
block if the origin backend is not ready to accept more data, yes. 
However, making them accept and process another job in the meantime seems 
hard to do. But that's not an imessages problem per se. (Though with the 
streaming layer I've mentioned above, that would not be possible, because 
it blocks.)



For replication, that might be the case, but for parallel query,
per-queue seems about right.  At any rate, no design we've discussed
will let individual queues grow without bound.


Extend parallel querying to multiple nodes and you are back at the same 
requirement.


However, it's certainly something that can be done atop imessages. I'm 
unsure if doing it as part of imessages is a good thing or not. Given 
the above requirement, I don't currently think so. Using multiple queues 
with different priorities, as you proposed, would probably make it more 
feasible.



You probably need this, but 8KB seems like a pretty small chunk size.


For node-internal messaging, I probably agree. Would need benchmarking, 
as it's a compromise between latency and overhead, IMO.


I've chosen 8KB so these messages (together with some GCS 

Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-27 Thread Robert Haas
On Fri, Aug 27, 2010 at 2:17 PM, Markus Wanner mar...@bluegap.ch wrote:
 In addition, it means that maximum_message_queue_size_per_backend (or
 whatever it's called) can be changed on-the-fly; that is, it can be
 PGC_SIGHUP rather than PGC_POSTMASTER.

 That's certainly a point. However, as you are proposing a solution to just
 one subsystem (i.e. imessages), I don't find it half as convincing.

What other subsystems are you imagining servicing with a dynamic
allocator?  If there were a big demand for this functionality, we
probably would have been forced to implement it already, but that's
not the case.  We've already discussed the fact that there are massive
problems with using it for something like shared_buffers, which is by
far the largest consumer of shared memory.

 If you are saying it *should* be possible to resize shared memory in a
 portable way, why not do it for *all* subsystems right away? I still
 remember Tom saying it's not something that's doable in a portable way.

I think it would be great if we could bring some more flexibility to
our memory management.  There are really two layers of problems here.
One is resizing the segment itself, and one is resizing structures
within the segment.  As far as I can tell, there is no portable API
that can be used to resize the shm itself.  For so long as that
remains the case, I am of the opinion that any meaningful resizing of
the objects within the shm is basically unworkable.  So we need to
solve that problem first.

There are a couple of possible solutions, which have been discussed
here in the past.  One very appealing option is to use POSIX shm
rather than sysv shm.  AFAICT, it is possible to portably resize a
POSIX shm using ftruncate(), though I am not sure to what extent this
is supported on Windows.  One significant advantage of using POSIX shm
is that the default limits for POSIX shm on many operating systems are
much higher than the corresponding limits for sysv shm; in fact, some
people have expressed the opinion that it might be worth making the
switch for that reason alone, since it is no secret that a default
value of 32MB or less for shared_buffers is not enough to get
reasonable performance on many modern systems.  I believe, however,
that Tom Lane thinks we need to get a bit more out of it than that to
make it worthwhile.  One obstacle to making the switch is that POSIX
shm does not provide a way to fetch the number of processes attached
to the shared memory segment, which is a critical part of our
infrastructure to prevent accidentally running multiple postmasters on
the same data directory at the same time.  Consequently, it seems hard
to see how we can make that switch completely.  At a minimum, we'll
probably need to maintain a small sysv shm for interlock purposes.
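
For reference, the interlock in question boils down to sysv's attach
count, which POSIX shm doesn't expose; a sketch, minus proper error
handling:

#include <stdbool.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static bool
segment_in_use(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
        return false;           /* real code must inspect errno */
    return buf.shm_nattch > 1;  /* attached by more than just us */
}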

OK, so let's suppose we use POSIX shm for most of the shared memory
segment, and keep only our fixed-size data structures in the sysv shm.
 Then what?  Well, then we can potentially resize it.  Because we are
using a process-based model, this will require some careful
gymnastics.  Let's say we're growing the shm.  The backend that is
initiating the operation will call ftruncate() and then signal all
of the other backends (using a sinval message or a multiplexed
signal or some such mechanism) to unmap and remap the shared memory
segment.  Any failure to remap the shared memory segment is at least a
FATAL for that backend, and very likely a PANIC, so this had better
not be something we plan to do routinely - for example, we wouldn't
want to do this as a way of adapting to changing load conditions.  It
would probably be acceptable to do it in a situation such as a
postgresql.conf reload, to accommodate a change in the server
parameter that can't otherwise be changed without a restart, since the
worst case scenario is, well, we have to restart anyway.  Once all
that's done, it's safe to start allocating memory from the newly added
portion of the shm.  Conversely, if we want to shrink the shm, the
considerations are similar, but we have to do everything in the
opposite order.  First, we must ensure that the portion of the shm
we're about to release is unused.  Then, we tell all the backends to
unmap and remap it.  Once we've confirmed that they have done so, we
ftruncate() it to the new size.
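
In plain POSIX terms, the grow path is roughly the following (assuming
the fd came from shm_open(); error handling and the signaling of the
other backends are elided, and a NULL return is the FATAL/PANIC case
discussed above):

#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static void *
grow_and_remap(int shm_fd, void *old_addr,
               size_t old_size, size_t new_size)
{
    void *addr;

    /* grow the underlying object first ... */
    if (ftruncate(shm_fd, (off_t) new_size) < 0)
        return NULL;
    /* ... then each process unmaps and remaps the segment */
    if (munmap(old_addr, old_size) < 0)
        return NULL;
    addr = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, shm_fd, 0);
    return (addr == MAP_FAILED) ? NULL : addr;
}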

Next, we have to think about how we're going to resize data structures
within this expandable shm.  Many of these structures are not things
that we can easily move without bringing the system to a halt.  For
example, it's difficult to see how you could change the base address
of shared buffers without ceasing all system activity, at which point
there's not really much advantage over just forcing a restart.
Similarly with LWLocks or the ProcArray.  And if you can't move them,
then how will you grow them if (as will likely be the case) there's
something immediately following them in memory?  One possible solution
is to divide up these data structures into slabs.  For example, 

Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Markus Wanner

Hi,

thanks for your feedback on this, it sort of got lost below the 
discussion about the dynamic shared memory stuff, IMO.


On 08/26/2010 04:39 AM, Robert Haas wrote:

It's not clear to me whether it's better to have a single coordinator
process that handles both autovacuum and other things, or whether it's
better to have two separate processes.


It has been proposed by Alvaro and/or Tom (IIRC) to reduce code 
duplication. Compared to the former approach, it certainly seems cleaner 
that way and it has helped reduce duplicate code a lot.


I'm envisioning such a coordinator process to handle coordination of 
other background processes as well, for example for distributed and/or 
parallel querying.


Having just one process reduces the amount of interaction required 
between coordinators (i.e. autovacuum shouldn't ever start on databases 
for which replication hasn't started yet, as the autovacuum worker would 
be unable to connect to the database at that stage). It also reduces the 
number of extra processes required, and thus, I think, general 
complexity.


What'd be the benefits of having separate coordinator processes? They'd 
be doing pretty much the same: coordinate background processes. (And 
yes, I clearly consider autovacuum to be just one kind of background 
process).



I agree with this criticism, but the other thing that strikes me as a
nonstarter is having the postmaster participate in the imessages
framework.


This is simply not the case (anymore). (And one of the reasons a 
separate coordinator process is required, instead of letting the 
postmaster do this kind of coordination).



Our general rule is that the postmaster must avoid
touching shared memory; else a backend that scribbles on shared memory
might take out the postmaster, leading to a failure of the
crash-and-restart logic.


That rule is well understood and followed by the bg worker 
infrastructure patches. If you find code for which that isn't true, 
please point at it. The crash-and-restart logic should work just as it 
did with the autovacuum launcher.


Regards

Markus



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Markus Wanner

Itagaki-san,

thanks for reviewing this.

On 08/26/2010 03:39 AM, Itagaki Takahiro wrote:

Other changes in the patch don't always seem to be needed for the purpose.
In other words, the patch is not minimal.


Hm.. yeah, maybe the separation between step1 and step2 is a bit 
arbitrary. I'll look into it.



The original purpose could be done without IMessage.


Agreed, that's the one exception. I've mentioned why that is and I don't 
currently feel like coding an unneeded variant which doesn't use imessages.



Also, min/max_spare_background_workers are not used in the patch at all.


You are right, it only starts to get used in step2, so the addition 
should probably move there, right.



(BTW, min/max_spare_background_*helpers* in postgresql.conf.sample is
maybe a typo.)


Uh, correct, thank you for pointing this out. (I originally named 
these processes *helpers*. After merging with autovacuum, it made more 
sense to name them background *workers*.)



The most questionable point for me is why you didn't add any hook functions
in the coordinator process.


Because I'm a hook-hater. ;-)

No, seriously: I don't see what problem hooks could have solved. I'm 
coding in C and extending the Postgres code. Deciding on hooks and an 
API to use them requires good knowledge of where exactly you want to 
hook and what API you want to provide. Then that API needs to remain 
stable for an extended time. I don't think any part of the bg worker 
infrastructure currently is anywhere close to that.



With the patch, you can extend the coordinator
protocols with IMessage, but it requires patches to core at handle_imessage().
If you want fully-extensible workers, we should provide a method to develop
worker code in an external plugin.


It's originally intended as internal infrastructure. Offering its 
capabilities to the outside would require stabilization, security 
control and working out an API. All of which is certainly not something 
I intend to do.



Is it possible to develop your own code in the plugin? If possible, you can
use IMessage as a private protocol freely in the plugin. Am I missing something?


Well, what problem(s) are you trying to solve with such a thing? I've no 
idea what direction you are aiming at, sorry. However, it's certainly 
different from, or an extension of, bg worker, so it would need to be a 
separate patch, IMO.


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Markus Wanner

On 08/26/2010 05:01 AM, Itagaki Takahiro wrote:

Markus, do you need B? Or are A + standard backend processes enough?
If you need B eventually, starting with B might be better.


No, I certainly don't need B.

Why not just use an ordinary backend to do user-defined background 
processing? It covers all of the API stability and the security issues 
I've raised.


Regards

Markus Wanner



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Itagaki Takahiro
On Thu, Aug 26, 2010 at 7:42 PM, Markus Wanner mar...@bluegap.ch wrote:
 Markus, do you need B? Or are A + standard backend processes enough?

 No, I certainly don't need B.

OK, I see why you proposed a coordinator hook (yeah, I call it a hook :)
rather than adding user-defined processes.

 Why not just use an ordinary backend to do user-defined background
 processing? It covers all of the API stability and the security issues I've
 raised.

However, we have autovacuum worker processes in addition to normal backend
processes. Doesn't that show that there are some jobs we cannot run in
normal backends?

 For example, normal backends cannot do anything in idle time, so a
time-based polling job is difficult in backends. It might be OK to
fork processes for each interval when the polling interval is long,
but that is not effective for short-interval cases.  I'd like to use
such a process as an additional stats collector.
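
Such a worker's main loop could be as simple as the following sketch
(collect_extra_stats() is a hypothetical placeholder; no such API
exists in core):

#include "postgres.h"
#include "miscadmin.h"

extern void collect_extra_stats(void);      /* hypothetical job */

static void
polling_worker_main(long interval_ms)
{
    for (;;)
    {
        CHECK_FOR_INTERRUPTS();
        collect_extra_stats();
        pg_usleep(interval_ms * 1000L);     /* argument in microseconds */
    }
}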

-- 
Itagaki Takahiro



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Markus Wanner

Itagaki-san,

On 08/26/2010 01:02 PM, Itagaki Takahiro wrote:

OK, I see why you proposed a coordinator hook (yeah, I call it a hook :)
rather than adding user-defined processes.


I see. If you call that a hook, I'm definitely not a hook-hater ;-)  at 
least not according to your definition.



However, we have autovacuum worker processes in addition to normal backend
processes. Doesn't that show that there are some jobs we cannot run in
normal backends?


Hm.. understood. You can use VACUUM from a cron job. And that's the 
problem autovacuum solves. So in a way, that's just a convenience 
feature. You want the same for general-purpose user-defined background 
processing, right?



  For example, normal backends cannot do anything in idle time, so a
time-based polling job is difficult in backends. It might be OK to
fork processes for each interval when the polling interval is long,
but that is not effective for short-interval cases.  I'd like to use
such a process as an additional stats collector.


Did you follow the discussion I had with Dimitri, who was trying 
something similar, IIRC? See the bg worker - overview thread. There 
might be some interesting bits pointing in that direction.


Regards

Markus



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Robert Haas
On Thu, Aug 26, 2010 at 6:07 AM, Markus Wanner mar...@bluegap.ch wrote:
 What'd be the benefits of having separate coordinator processes? They'd be
 doing pretty much the same: coordinate background processes. (And yes, I
 clearly consider autovacuum to be just one kind of background process).

I dunno.  It was just a thought.  I haven't actually looked at the
code to see how much synergy there is.  (Sorry, been really busy...)

 I agree with this criticism, but the other thing that strikes me as a
 nonstarter is having the postmaster participate in the imessages
 framework.

 This is simply not the case (anymore). (And one of the reasons a separate
 coordinator process is required, instead of letting the postmaster do this
 kind of coordination).

Oh, OK.  I see now that I misinterpreted what you wrote.

On the more general topic of imessages, I had one other thought that
might be worth considering.  Instead of using shared memory, what
about using a file that is shared between the sender and receiver?  So
for example, perhaps each receiver will read messages from a file
called pg_messages/%d, where %d is the backend ID.  And writers will
write into that file.  Perhaps both readers and writers mmap() the
file, or perhaps there's a way to make it work with just read() and
write().  If you actually mmap() the file, you could probably manage
it in a fashion pretty similar to what you had in mind for wamalloc,
or some other setup that minimizes locking.  In particular, ISTM that
if we want this to be usable for parallel query, we'll want to be able
to have one process streaming data in while another process streams
data out, with minimal interference between these two activities.  On
the other hand, for processes that only send and receive messages
occasionally, this might just be overkill (and overhead).  You'd be
just as well off wrapping the access to the file in an LWLock: the
reader takes the lock, reads the data, marks it read, and releases the
lock.  The writer takes the lock, writes data, and releases the lock.

It almost seems to me that there are two different kinds of messages
here: control messages and data messages.  Control messages are things
like "vacuum this database!" or "flush your cache!" or "execute this
query and send the results to backend %d!" or "cancel the currently
executing query!".  They are relatively small (in some cases,
fixed-size), relatively low-volume, don't need complex locking, and
can generally be processed serially but with high priority.  Data
messages are streams of tuples, either from a remote database from
which we are replicating, or between backends that are executing a
parallel query.  These messages may be very large and extremely
high-volume, are very sensitive to concurrency problems, but are not
high-priority.  We want to process them as quickly as possible, of
course, but the work may get interrupted by control messages.  Another
point is that it's reasonable, at least in the case of parallel query,
for the action of sending a data message to *block*.  If one part of
the query is too far ahead of the rest of the query, we don't want to
queue up results forever, perhaps using CPU or I/O resources that some
other backend needs to catch up, exhausting available disk space, etc.
 Instead, at some point, we just block and wait for the queue to
drain.  I suppose there's no avoiding the possibility that sending a
control message might also block, but certainly we wouldn't like a
control message to block because the relevant queue is full of data
messages.

So I kind of wonder whether we ought to have two separate systems, one
for data and one for control, with somewhat different characteristics.
I notice that one of your bg worker patches is for OOO-messages.  I
apologize again for not having read through it, but how much does that
resemble separating the control and data channels?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Markus Wanner

Robert,

On 08/26/2010 02:44 PM, Robert Haas wrote:

I dunno.  It was just a thought.  I haven't actually looked at the
code to see how much synergy there is.  (Sorry, been really busy...)


No problem, was just wondering if there's any benefit you had in mind.


On the more general topic of imessages, I had one other thought that
might be worth considering.  Instead of using shared memory, what
about using a file that is shared between the sender and receiver?


What would that buy us, at the price of more system calls and disk 
I/O? Remember that the current approach (IIRC) uses exactly one syscall 
to send a message: kill() to send the (multiplexed) signal. (Except on 
strange platforms or setups that don't have a user-space spinlock 
implementation and need to use system mutexes.)



So
for example, perhaps each receiver will read messages from a file
called pg_messages/%d, where %d is the backend ID.  And writers will
write into that file.  Perhaps both readers and writers mmap() the
file, or perhaps there's a way to make it work with just read() and
write().  If you actually mmap() the file, you could probably manage
it in a fashion pretty similar to what you had in mind for wamalloc,
or some other setup that minimizes locking.


That would still require proper locking, then. So I'm not seeing the 
benefit.



In particular, ISTM that
if we want this to be usable for parallel query, we'll want to be able
to have one process streaming data in while another process streams
data out, with minimal interference between these two activities.


That's quite possible with the current approach. About the only 
limitation is that a receiver can only consume the messages in the order 
they got into the queue. But pretty much any backend can send messages 
to any other backend concurrently.


(Well, except that I think there currently are bugs in wamalloc).


On
the other hand, for processes that only send and receive messages
occasionally, this might just be overkill (and overhead).  You'd be
just as well off wrapping the access to the file in an LWLock: the
reader takes the lock, reads the data, marks it read, and releases the
lock.  The writer takes the lock, writes data, and releases the lock.


The current approach uses plain spinlocks, which are more efficient. 
Note that both appending to and removing from the queue are writing 
operations, from the point of view of the queue. So I don't think 
LWLocks buy you anything here, either.



It almost seems to me that there are two different kinds of messages
here: control messages and data messages.  Control messages are things
like "vacuum this database!" or "flush your cache!" or "execute this
query and send the results to backend %d!" or "cancel the currently
executing query!".  They are relatively small (in some cases,
fixed-size), relatively low-volume, don't need complex locking, and
can generally be processed serially but with high priority.  Data
messages are streams of tuples, either from a remote database from
which we are replicating, or between backends that are executing a
parallel query.  These messages may be very large and extremely
high-volume, are very sensitive to concurrency problems, but are not
high-priority.  We want to process them as quickly as possible, of
course, but the work may get interrupted by control messages.  Another
point is that it's reasonable, at least in the case of parallel query,
for the action of sending a data message to *block*.  If one part of
the query is too far ahead of the rest of the query, we don't want to
queue up results forever, perhaps using CPU or I/O resources that some
other backend needs to catch up, exhausting available disk space, etc.


I agree that such a thing isn't currently covered. And it might be 
useful. However, adding two separate queues with different priorities 
would be very simple to do. (Note, however, that there already are the 
standard unix signals for very simple kinds of control signals. I.e. for 
aborting a parallel query, you could simply send SIGINT to all 
background workers involved).
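
Such a two-queue arrangement might look about like this (all names made
up; queue_pop() is an assumed helper over a spinlock-protected queue):

typedef struct MsgNode
{
    struct MsgNode *next;
} MsgNode;

typedef struct
{
    MsgNode *head;              /* protected by a spinlock, as above */
    MsgNode *tail;
} MsgQueue;

typedef struct
{
    MsgQueue control;           /* small messages, high priority */
    MsgQueue data;              /* bulk tuple streams, lower priority */
} BackendQueues;

extern MsgNode *queue_pop(MsgQueue *q);     /* assumed helper */

/* receiver policy: control messages always preempt data messages */
static MsgNode *
next_message(BackendQueues *q)
{
    MsgNode *m = queue_pop(&q->control);
    return (m != NULL) ? m : queue_pop(&q->data);
}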


I understand the need to limit the amount of data in flight, but I don't 
think that sending any type of message should ever block. Messages are 
atomic in that regard. Either they are ready to be delivered (in 
entirety) or not. Thus the sender needs to hold back the message, if the 
recipient is overloaded. (Also note that currently imessages are bound 
to a maximum size of around 8 KB).
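
Seen from the sender's side, that hold-back amounts to something like
this (imessage_try_send() is an assumed, hypothetical API; whether such
a retry loop counts as "blocking" is exactly the API question):

#include "postgres.h"
#include "miscadmin.h"

extern bool imessage_try_send(int backend_id, void *msg);   /* assumed */

static void
send_with_holdback(int backend_id, void *msg)
{
    /* returns false on out-of-queue-space instead of blocking */
    while (!imessage_try_send(backend_id, msg))
    {
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);       /* or do other useful work, then retry */
    }
}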


It might be interesting to note that I've just implemented some kind of 
streaming mechanism *atop* imessages for Postgres-R. A data stream 
gets fragmented into single messages. As you pointed out, there should 
be some kind of congestion control. However, in my case, that needs to 
cover the inter-node connection as well, not just imessages. So I think 
the solution to that problem needs to be found on a higher level. I.e. 
in the Postgres-R case, I want to limit the *overall* amount of recovery 
data 

Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 On 08/26/2010 02:44 PM, Robert Haas wrote:
 On the more general topic of imessages, I had one other thought that
 might be worth considering.  Instead of using shared memory, what
 about using a file that is shared between the sender and receiver?

 What would that buy us?

Not having to have a hard limit on the space for unconsumed messages?

 The current approach uses plain spinlocks, which are more efficient. 

Please note the coding rule that says that the code should not execute
more than a few straight-line instructions while holding a spinlock.
If you're copying long messages while holding the lock, I don't think
spinlocks are acceptable.

regards, tom lane



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Markus Wanner

On 08/26/2010 09:22 PM, Tom Lane wrote:

Not having to have a hard limit on the space for unconsumed messages?


Ah, I see. However, spilling to disk is unwanted for the current use 
cases of imessages. Instead the sender needs to be able to deal with 
out-of-(that-specific-part-of-shared)-memory conditions.



Please note the coding rule that says that the code should not execute
more than a few straight-line instructions while holding a spinlock.
If you're copying long messages while holding the lock, I don't think
spinlocks are acceptable.


Writing the payload data for imessages to shared memory doesn't need any 
kind of lock. (Because the relevant chunk of shared memory got allocated 
via wamalloc, which grants the caller exclusive control over the 
returned chunk). Only appending and removing (the pointer to the data) 
to and from the queue requires taking a spinlock. And I think that still 
qualifies.


However, your concern is valid for wamalloc, which is more critical in 
that regard.


Regards

Markus



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Robert Haas
On Thu, Aug 26, 2010 at 3:03 PM, Markus Wanner mar...@bluegap.ch wrote:
 On the more general topic of imessages, I had one other thought that
 might be worth considering.  Instead of using shared memory, what
 about using a file that is shared between the sender and receiver?

 What would that buy us, at the price of more system calls and disk I/O?
 Remember that the current approach (IIRC) uses exactly one syscall to send a
 message: kill() to send the (multiplexed) signal. (Except on strange
 platforms or setups that don't have a user-space spinlock implementation and
 need to use system mutexes).

It wouldn't require you to preallocate a big chunk of shared memory
without knowing how much of it you'll actually need.  For example,
suppose we implement parallel query.  If the message queues can be
allocated on the fly, then you can just say
maximum_message_queue_size_per_backend = 16MB and that'll probably be
good enough for most installations.  On systems where parallel query
is not used (e.g. because they only have 1 or 2 processors) then it
costs nothing.  On systems where parallel query is used extensively
(e.g. because they have 32 processors), you'll allocate enough space
for the number of backends that actually need message buffers, and not
more than that.  Furthermore, if parallel query is used at some times
(say, for nightly reporting) but not others (say, for daily OLTP
queries), the buffers can be deallocated when the helper backends exit
(or paged out if they are idle), and that memory can be reclaimed for
other use.

In addition, it means that maximum_message_queue_size_per_backend (or
whatever it's called) can be changed on-the-fly; that is, it can be
PGC_SIGHUP rather than PGC_POSTMASTER.  Being able to change GUCs
without shutting down the postmaster is a *big deal* for people
running in 24x7 operations.  Even things like wal_level that aren't
apt to be changed more than once in a blue moon are a problem (once
you go from not having a standby to having a standby, you're
unlikely to want to go backwards), and this would likely need more
tweaking.  You might find that you need more memory for better
throughput, or that you need to reclaim memory for other purposes.
Especially if it's a hard allocation for any number of backends,
rather than something that backends can allocate only as and when they
need it.
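
If the limit could indeed be made PGC_SIGHUP, the guc.c entry would
presumably look something like this excerpt (variable name, group and
defaults are all made up):

/* hypothetical entry in guc.c's ConfigureNamesInt[] */
{
    {"maximum_message_queue_size_per_backend", PGC_SIGHUP, RESOURCES_MEM,
        gettext_noop("Maximum amount of shared memory used for "
                     "a backend's message queues."),
        NULL,
        GUC_UNIT_KB
    },
    &maximum_message_queue_size_per_backend,
    16384, 0, INT_MAX / 1024,
    NULL, NULL
},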

As to efficiency, the process is not much different once the initial
setup is completed.  Just because you write to a memory-mapped file
rather than a shared memory segment doesn't mean that you're
necessarily doing disk I/O.  On systems that support it, you could
also choose to map a named POSIX shm rather than a disk file.  Either
way, there might be a little more overhead at startup but that doesn't
seem so bad; presumably the amount of work that the worker is doing is
large compared to the overhead of a few system calls, or you're
probably in trouble anyway, since our process startup overhead is
pretty substantial already.  The only time it seems like the overhead
would be annoying is if a process is going to use this system, but
only lightly.  Doing the extra setup just to send one or two messages
might suck.  But maybe that just means this isn't the right mechanism
for those cases (e.g. the existing XID-wraparound logic should still
use signal multiplexing rather than this system).  I see the value of
this as being primarily for streaming big chunks of data, not so much
for sending individual, very short messages.

 On
 the other hand, for processes that only send and receive messages
 occasionally, this might just be overkill (and overhead).  You'd be
 just as well off wrapping the access to the file in an LWLock: the
 reader takes the lock, reads the data, marks it read, and releases the
 lock.  The writer takes the lock, writes data, and releases the lock.

 The current approach uses plain spinlocks, which are more efficient. Note
 that both appending to and removing from the queue are writing
 operations, from the point of view of the queue. So I don't think LWLocks
 buy you anything here, either.

I agree that this might not be useful.  We don't really have all the
message types defined yet, though, so it's hard to say.

 I understand the need to limit the amount of data in flight, but I don't
 think that sending any type of message should ever block. Messages are
 atomic in that regard. Either they are ready to be delivered (in entirety)
 or not. Thus the sender needs to hold back the message, if the recipient is
 overloaded. (Also note that currently imessages are bound to a maximum size
 of around 8 KB).

That's functionally equivalent to blocking, isn't it?  I think that's
just a question of what API you want to expose.

 It might be interesting to note that I've just implemented some kind of
 streaming mechanism *atop* imessages for Postgres-R. A data stream gets
 fragmented into single messages. As you pointed out, there should be some
 kind of congestion 

Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-26 Thread Robert Haas
On Thu, Aug 26, 2010 at 3:40 PM, Markus Wanner mar...@bluegap.ch wrote:
 On 08/26/2010 09:22 PM, Tom Lane wrote:

 Not having to have a hard limit on the space for unconsumed messages?

 Ah, I see. However, spilling to disk is unwanted for the current use cases
 of imessages. Instead the sender needs to be able to deal with
 out-of-(that-specific-part-of-shared)-memory conditions.

Shared memory can be paged out, too, if it's not being used enough to
keep the OS from deciding to evict it.  And I/O to a mmap()'d file or
shared memory region can remain in RAM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-25 Thread Itagaki Takahiro
On Tue, Jul 13, 2010 at 11:31 PM, Markus Wanner mar...@bluegap.ch wrote:
 This patch turns the existing autovacuum launcher into an always-running
 process, partly called the coordinator. If autovacuum is disabled, the
 coordinator process still gets started and sticks around, but it doesn't
 dispatch vacuum jobs.

I think this part is a reasonable proposal, but...

 The coordinator process now uses imessages to communicate with background
 (autovacuum) workers and to trigger a vacuum job.
 It also adds two new controlling GUCs: min_spare_background_workers and
 max_spare_background_workers.

Other changes in the patch don't always seem to be needed for the purpose.
In other words, the patch is not minimal.
The original purpose could be done without IMessage.
Also, min/max_spare_background_workers are not used in the patch at all.
(BTW, min/max_spare_background_*helpers* in postgresql.conf.sample is
maybe a typo.)

The most questionable point for me is why you didn't add any hook functions
in the coordinator process. With the patch, you can extend the coordinator
protocols with IMessage, but it requires patches to core at handle_imessage().
If you want fully-extensible workers, we should provide a method to develop
worker code in an external plugin.

Is it possible to develop your own code in the plugin? If possible, you can
use IMessage as a private protocol freely in the plugin. Am I missing something?

-- 
Itagaki Takahiro



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-25 Thread Robert Haas
On Wed, Aug 25, 2010 at 9:39 PM, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
 On Tue, Jul 13, 2010 at 11:31 PM, Markus Wanner mar...@bluegap.ch wrote:
 This patch turns the existing autovacuum launcher into an always-running
 process, partly called the coordinator. If autovacuum is disabled, the
 coordinator process still gets started and sticks around, but it doesn't
 dispatch vacuum jobs.

 I think this part is a reasonable proposal, but...

It's not clear to me whether it's better to have a single coordinator
process that handles both autovacuum and other things, or whether it's
better to have two separate processes.

 The coordinator process now uses imessages to communicate with background
 (autovacuum) workers and to trigger a vacuum job.
 It also adds two new controlling GUCs: min_spare_background_workers and
 max_spare_background_workers.

 Other changes in the patch don't always seem to be needed for the purpose.
 In other words, the patch is not minimal.
 The original purpose could be done without IMessage.
 Also, min/max_spare_background_workers are not used in the patch at all.
 (BTW, min/max_spare_background_*helpers* in postgresql.conf.sample is
 maybe a typo.)

 The most questionable point for me is why you didn't add any hook functions
 in the coordinator process. With the patch, you can extend the coordinator
 protocols with IMessage, but it requires patches to core at handle_imessage().
 If you want fully-extensible workers, we should provide a method to develop
 worker code in an external plugin.

I agree with this criticism, but the other thing that strikes me as a
nonstarter is having the postmaster participate in the imessages
framework.  Our general rule is that the postmaster must avoid
touching shared memory; else a backend that scribbles on shared memory
might take out the postmaster, leading to a failure of the
crash-and-restart logic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] bg worker: patch 1 of 6 - permanent process

2010-08-25 Thread Itagaki Takahiro
On Thu, Aug 26, 2010 at 11:39 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jul 13, 2010 at 11:31 PM, Markus Wanner mar...@bluegap.ch wrote:
 This patch turns the existing autovacuum launcher into an always-running
 process, partly called the coordinator.

 It's not clear to me whether it's better to have a single coordinator
 process that handles both autovacuum and other things, or whether it's
 better to have two separate processes.

Ah, we can separate the proposal into two topics:
  A. Support for running non-vacuum jobs from the autovacuum launcher
  B. Support for user-defined background processes

A was proposed in the original 1 of 6 patch, but B might be more general.
If we have a separate coordinator, B will be required.

Markus, do you need B? Or are A + standard backend processes enough?
If you need B eventually, starting with B might be better.

-- 
Itagaki Takahiro
