Re: Cyrus process model...

2003-02-27 Thread Rob Siemborski
On Wed, 26 Feb 2003, Rob Mueller wrote:

> [ Continued from an off mailing list conversation about killing Cyrus lmtpd
> processes when they go haywire, and cyrus process accounting ]

Actually, cyrus-devel would have probably been an even better place to put
this (and I'm cross-posting there).

> Would the cyrus team think it worthwhile to consider refactoring to use the
> new Apache 2 APR modules? I know off hand that it would be a lot of work,
> but it could be a gradual re-factoring process, and the idea of actually
> reusing code between projects would be *really* nice.

I'm definitely in agreement about refactoring (indeed, your original Sieve
issue just went away in 2.2, since we gutted most of the sieve framework
to use compiled bytecode instead).

I am nervous every time someone suggests adding a dependency, however.  As
it is, Cyrus has a larger-than-average [potential] number of dependencies:

Berkeley DB
Kerberos / GSSAPI
AFS
(in 2.2) OpenLDAP
Perl
OpenSSL
Cyrus SASL
 which in turn can potentially depend on [in addition to some items
  above]:
 GDBM/NDBM
 MySQL

Probably more which escape me at the moment.

The fact is, given the number of problems people already have getting the
dependencies to play nicely together, I am hesitant to add another.
Additionally, it's hard enough to keep up with the changes between every
version of Berkeley DB (which are basically limited to 1 file in IMAPd,
and 2-3 in SASL); I can't imagine what it would be like if we had to do
that for most of (more than?) the functionality currently provided by
libcyrus.

As for your comments about the age of Cyrus's code, yes, that's true,
there are portions that show their age more than others (non-ANSI
prototypes, use of strcpy, strcat, etc).  However, we clean up the
non-ANSI stuff as we see it, and Security Appraisers and Bynari are
currently helping us clean up the string manipulation routines to be more
modern (along with other potential security issues).

As far as memory allocation, libcyrus has memory pool routines, and we use
them where there is an efficiency benefit to do so (maybe we could do
better, I don't know).  It is not entirely clear to me that we should use
them in a global way, especially on long-running connections (apache can
use them globally, since HTTP connections are typically short-lived).
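
To make the long-running-connection concern concrete, here is a toy Python
sketch. It is not libcyrus or APR code: Pool, serve and per_command_pool are
names invented for illustration, and it assumes (as is generally true of pool
allocators) that memory handed out by a pool is only given back when the
whole pool is cleared.

```python
class Pool:
    """Toy pool: blocks are released together, never one at a time."""

    def __init__(self):
        self._blocks = []

    def alloc(self, size):
        block = bytearray(size)
        self._blocks.append(block)
        return block

    def bytes_held(self):
        return sum(len(block) for block in self._blocks)

    def clear(self):
        self._blocks.clear()


def serve(command_sizes, per_command_pool):
    """Handle one connection; return the peak number of bytes held."""
    connection_pool = Pool()
    peak = 0
    for size in command_sizes:
        # Hypothetical policy switch: a fresh pool per command, or one
        # pool that lives as long as the connection does.
        pool = Pool() if per_command_pool else connection_pool
        pool.alloc(size)
        peak = max(peak, pool.bytes_held())
        if per_command_pool:
            pool.clear()
    connection_pool.clear()
    return peak
```

With fifty 100-byte commands on one connection, serve([100] * 50, False)
peaks at 5000 bytes, while serve([100] * 50, True) peaks at 100. A short
HTTP-style connection rarely lives long enough for the difference to matter;
a connection that stays open for hours might.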

In any case, we're always open to listening to new design ideas (that
doesn't mean we will automatically do whatever is suggested of course ;).

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski * Andrew Systems Group * Cyert Hall 207 * 412-268-7456
Research Systems Programmer * /usr/contributed Gatekeeper



Re: Cyrus process model...

2003-02-26 Thread Lawrence Greenfield
   From: "Rob Mueller" <[EMAIL PROTECTED]>
   Date: Thu, 27 Feb 2003 08:22:09 +1100
[...]
   In the case of cyrus, I think you can quite happily stick with the
   multi-process model, I wasn't advocating moving to a threaded model. The
   discussion started due to an issue with killing child processes. Apparently
   there are currently race conditions in 'master' that mean that a killed
   child may not be correctly recognised by the 'master' process as a dead
   child. I commented that I thought a master/forked child idiom had been used
   in unix for 30 years, and shouldn't there be cookbook solutions for most of
   these issues? Which started me looking for libraries that might have already
   done this...

Sigh. It is _not_ a race condition in master. Master is working just
fine.

The services do not always deal correctly with receiving signals.

The process accounting patch for master (which works around the
services not always doing the right thing) introduces race conditions,
which is one of the reasons why it hasn't been applied.

Being able to deal with "kill -9" processes is definitely _not_ a
design goal, since kill -9 can leave the mail spool in an arbitrary
corrupted state.
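
As a rough illustration of what "dealing correctly" with a catchable signal
can look like, here is a minimal Python sketch. It is not taken from any
Cyrus service; Service, on_signal, install and run are hypothetical names.
The handler only records the request, and the loop acts on it at a point
where no unit of work is half done. SIGKILL cannot be handled this way
because it cannot be caught at all.

```python
import signal


class Service:
    """Toy service loop that stops only between units of work."""

    def __init__(self):
        self.stop_requested = False

    def on_signal(self, signum, frame):
        # Do as little as possible here: just note the request.
        self.stop_requested = True

    def install(self):
        # SIGKILL is deliberately absent: it cannot be caught or ignored.
        signal.signal(signal.SIGTERM, self.on_signal)

    def run(self, jobs):
        """Run jobs in order; return the results of those that completed."""
        completed = []
        for job in jobs:
            if self.stop_requested:
                break  # safe point: no job is in progress here
            completed.append(job())
        return completed
```

A process would call install() once at startup and then run(); a SIGTERM
arriving in the middle of a job lets that job finish before the loop exits.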

Larry



Re: Cyrus process model...

2003-02-26 Thread David Lang
I will also add that on current *nix systems the advantage of threads
over processes is a lot less than it used to be. In my case we are running
apache2 on AIX and found no noticeable difference between the two (so we
are using processes for the stability reasons you note below).

David Lang

 On Thu, 27 Feb 2003, Rob Mueller wrote:

> >   It is always a big pain to update code that was never written to be
> > threaded, to be thread-safe.  Apache2 has problems with just about every
> > third party module supported under Apache 1.3.  I imagine that Cyrus would
> > have all sorts of thread issues.  There is no magic solution for that.
>
> I'm not convinced that it's necessary to make it thread safe. In many
> situations I think threads are a step backwards. While it always feels a bit
> odd to think of it as a positive, the multiple process model introduces an
> inherent stability, even for non-optimal (buggy) code that can crash a
> process. In that case, only one connection/instance is lost, and no-one else
> is affected. In multithreaded code, one bad crashed thread *can* take out
> the entire process and all connections. Of course, if your code has to share
> a lot of information between each 'instance', then threads are very useful.
>
> In the case of cyrus, I think you can quite happily stick with the
> multi-process model, I wasn't advocating moving to a threaded model. The
> discussion started due to an issue with killing child processes. Apparently
> there are currently race conditions in 'master' that mean that a killed
> child may not be correctly recognised by the 'master' process as a dead
> child. I commented that I thought a master/forked child idiom had been used
> in unix for 30 years, and shouldn't there be cookbook solutions for most of
> these issues? Which started me looking for libraries that might have already
> done this...
>
> >   Besides, if anyone really wants to take Cyrus to the next generation,
> > create a new NG branch in CVS (on your own CVS server if necessary), and
> > start "refactoring" away.  (Of course, "refactoring" has to be the most
> > overused term in software development at the moment, and is touted as a
> > solution for everything from bad design, to poor management).
>
> The thing is, in my experience, refactoring actually works, whether or not
> it's the buzzword of the week. Better yet, *continuous* refactoring
> seems to work the best! Hmmm, not that I find that easy to define. I guess
> it's being aware as you work on a project, which parts are clearly beginning
> to feel 'wrong' (hmmm, more subjective thoughts...), and devoting some time
> to actually fixing up those problem areas. This is generally a lot easier if
> you're good at creating interfaces and sticking to them. Of course, being
> forced to work around an interface is one of the clear signs of something
> being 'wrong'.
>
> Rob
>


Re: Cyrus process model...

2003-02-26 Thread Rob Mueller

>   It is always a big pain to update code that was never written to be
> threaded, to be thread-safe.  Apache2 has problems with just about every
> third party module supported under Apache 1.3.  I imagine that Cyrus would
> have all sorts of thread issues.  There is no magic solution for that.

I'm not convinced that it's necessary to make it thread safe. In many
situations I think threads are a step backwards. While it always feels a bit
odd to think of it as a positive, the multiple process model introduces an
inherent stability, even for non-optimal (buggy) code that can crash a
process. In that case, only one connection/instance is lost, and no-one else
is affected. In multithreaded code, one bad crashed thread *can* take out
the entire process and all connections. Of course, if your code has to share
a lot of information between each 'instance', then threads are very useful.

In the case of cyrus, I think you can quite happily stick with the
multi-process model, I wasn't advocating moving to a threaded model. The
discussion started due to an issue with killing child processes. Apparently
there are currently race conditions in 'master' that mean that a killed
child may not be correctly recognised by the 'master' process as a dead
child. I commented that I thought a master/forked child idiom had been used
in unix for 30 years, and shouldn't there be cookbook solutions for most of
these issues? Which started me looking for libraries that might have already
done this...
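
For reference, the basic master/forked-child bookkeeping being asked about
can be sketched in a few lines of Python. This is only an illustration, not
how Cyrus master is written; spawn, reap and the children table are made-up
names. The one detail worth noting is that a single SIGCHLD can stand for
several exited children, so the parent checks every tracked pid instead of
assuming one exit per signal.

```python
import os
import signal
import time


def spawn(children, work):
    """Fork a child that runs work(); record its pid in the parent's table."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            work()
            status = 0
        finally:
            # Leave immediately, skipping the parent's cleanup handlers.
            os._exit(status)
    children[pid] = "running"
    return pid


def reap(children, block=False):
    """Collect exit statuses of tracked children; return {pid: outcome}."""
    reaped = {}
    for pid in list(children):
        try:
            done, status = os.waitpid(pid, 0 if block else os.WNOHANG)
        except ChildProcessError:
            # The status was already collected somewhere else.
            children.pop(pid)
            reaped[pid] = "status lost"
            continue
        if done == 0:
            continue  # still running
        if os.WIFSIGNALED(status):
            outcome = "killed by signal %d" % os.WTERMSIG(status)
        else:
            outcome = "exited with status %d" % os.WEXITSTATUS(status)
        children.pop(pid)
        reaped[pid] = outcome
    return reaped


if __name__ == "__main__":
    table = {}
    spawn(table, lambda: None)
    victim = spawn(table, lambda: time.sleep(60))
    os.kill(victim, signal.SIGKILL)
    print(reap(table, block=True))
```

In a real master, reap(children) would typically be called from the main
loop after a SIGCHLD has been noted, with block left at False so that
children that are still running are simply skipped.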

>   Besides, if anyone really wants to take Cyrus to the next generation,
> create a new NG branch in CVS (on your own CVS server if necessary), and
> start "refactoring" away.  (Of course, "refactoring" has to be the most
> overused term in software development at the moment, and is touted as a
> solution for everything from bad design, to poor management).

The thing is, in my experience, refactoring actually works, whether or not
it's the buzzword of the week. Better yet, *continuous* refactoring
seems to work the best! Hmmm, not that I find that easy to define. I guess
it's being aware as you work on a project, which parts are clearly beginning
to feel 'wrong' (hmmm, more subjective thoughts...), and devoting some time
to actually fixing up those problem areas. This is generally a lot easier if
you're good at creating interfaces and sticking to them. Of course, being
forced to work around an interface is one of the clear signs of something
being 'wrong'.

Rob



Re: Cyrus process model...

2003-02-26 Thread Tom Samplonius

  It is always a big pain to update code that was never written to be
threaded, to be thread-safe.  Apache2 has problems with just about every
third party module supported under Apache 1.3.  I imagine that Cyrus would
have all sorts of thread issues.  There is no magic solution for that.

  Besides, if anyone really wants to take Cyrus to the next generation,
create a new NG branch in CVS (on your own CVS server if necessary), and
start "refactoring" away.  (Of course, "refactoring" has to be the most
overused term in software development at the moment, and is touted as a
solution for everything from bad design, to poor management).

Tom

On Tue, 25 Feb 2003, David Lang wrote:

> as someone attempting to get apache 2 running (reliably) in a high volume
> environment I can say the idea is interesting, but I definitely wouldn't
> rush into using it. if you have some time and want to get a start on
> something that may (or may not) be worth doing in the long run you can
> start on it, but don't stop maintaining the current version, the apache
> core code may not be the right thing in the long run.
> 
> David Lang
> 
> 
>  On Wed, 26 Feb 2003, Rob Mueller wrote:
> 
> > Date: Wed, 26 Feb 2003 16:45:00 +1100
> > From: Rob Mueller <[EMAIL PROTECTED]>
> > To: Lawrence Greenfield <[EMAIL PROTECTED]>,
> >  Rob Siemborski <[EMAIL PROTECTED]>
> > Cc: Ken Murchison <[EMAIL PROTECTED]>,
> >  info-cyrus <[EMAIL PROTECTED]>
> > Subject: Cyrus process model...
> >
> > [ Continued from an off mailing list conversation about killing cyrus lmtpd
> > processes when they go haywire, and cyrus process accounting ]
> >
> > > > Surely this is a relatively well solved problem? Just about every unix
> > > > system uses this master/forked child approach? How does apache do it?
> > > > > Net::Server::PreFork? I can't imagine that there aren't cookbook
> > > > > solutions to this issue since it's what unix has been doing for 30
> > > > > years? Or is there something I'm missing here?
> >
> > > There are many different possibilities. Most other systems limit the
> > > number of clients instead of forking new processes on demand without a
> > > set limit. Apache also doesn't have differentiated children or
> > > substantial shared state. (All children are members of the same
> > > service or you don't particularly care how many additional unused children
> > > you have...)
> >
> > I was under the impression that Apache 2 was planning on making its
> > forking/threading model much more generic, and supporting a general
> > 'services' model, including a library to abstract the underlying OS? Hmmm,
> > looking into that, it appears that it's mostly done already.
> >
> > http://apr.apache.org/
> > http://apr.apache.org/apr2_0intro/apr2_0intro.htm
> >
> > And more:
> >
> > Contains following functionality
> > -Reading and writing of files
> > -Character set conversion
> > -Network communications using sockets
> > -Time management used for Internet type conversions
> > -String management like C++ including natural order management
> > -UNIX Password management routines
> > -Table management routines
> > -UUID Internet generation
> > -Filename canonicalization
> > -Random data generation
> > -Global lock management
> > -Threads and process management
> > -Dynamic library loading routines
> > -Memory mapped and shared memory
> >
> > -
> >
> > http://www.arctic.org/~dean/apache/2.0/process-model.html
> >
> > I think the above is general enough to implement the interesting process
> > models, and to implement optimizations that are available only in some of
> > the multi-threaded models. Note that nothing above is specific to HTTP, and
> > I believe that we should strive to keep the abstraction so that the same
> > libraries can be used to implement other types of servers (i.e. FTP,
> > streaming video/audio, corba).
> >
> > -
> >
> > Would the cyrus team think it worthwhile to consider refactoring to use the
> > new Apache 2 APR modules? I know off hand that it would be a lot of work,
> > but it could be a gradual re-factoring process, and the idea of actually
> > reusing code between projects would be *really* nice.
> >
> > Joel Spolsky is a big proponent of refactoring over time to improve software
> > and you can read some of his thoughts here.
> >
> > http://www.joelonsoftware.com/articles/fog69.html
> > http://www.joelonsoftware.com/news/fog000328.html
> >
> > Ooops, I'm feeling a rant come along...
> >
> > *** RANT MODE ***
> >
> > I know this is a little off topic, but the source for cyrus is really
> > showing its age a bit. I know that happens with all software, you start
> > with certain assumptions, and the more you go on, the more the original
> > assumptions get blown away, so you hack this in here, and there, and then
> > every now and then, you go on a big cleanup spree! The problem I feel is
> > that the cleanup hasn't been big enough or often enough.
> >
> > Also, over time programming habits c

Re: Cyrus process model...

2003-02-25 Thread David Lang
as someone attempting to get apache 2 running (reliably) in a high volume
environment I can say the idea is interesting, but I definitely wouldn't
rush into using it. if you have some time and want to get a start on
something that may (or may not) be worth doing in the long run you can
start on it, but don't stop maintaining the current version, the apache
core code may not be the right thing in the long run.

David Lang


 On Wed, 26 Feb 2003, Rob Mueller wrote:

> Date: Wed, 26 Feb 2003 16:45:00 +1100
> From: Rob Mueller <[EMAIL PROTECTED]>
> To: Lawrence Greenfield <[EMAIL PROTECTED]>,
>  Rob Siemborski <[EMAIL PROTECTED]>
> Cc: Ken Murchison <[EMAIL PROTECTED]>,
>  info-cyrus <[EMAIL PROTECTED]>
> Subject: Cyrus process model...
>
> [ Continued from an off mailing list conversation about killing cyrus lmtpd
> processes when they go haywire, and cyrus process accounting ]
>
> > > Surely this is a relatively well solved problem? Just about every unix
> > > system uses this master/forked child approach? How does apache do it?
> > > Net::Server::PreFork? I can't imagine that there aren't cookbook
> > > solutions to this issue since it's what unix has been doing for 30
> > > years? Or is there something I'm missing here?
>
> > There are many different possibilities. Most other systems limit the
> > number of clients instead of forking new processes on demand without a
> > set limit. Apache also doesn't have differentiated children or
> > substantial shared state. (All children are members of the same
> > service or you don't particularly care how many additional unused children
> > you have...)
>
> I was under the impression that Apache 2 was planning on making its
> forking/threading model much more generic, and supporting a general
> 'services' model, including a library to abstract the underlying OS? Hmmm,
> looking into that, it appears that it's mostly done already.
>
> http://apr.apache.org/
> http://apr.apache.org/apr2_0intro/apr2_0intro.htm
>
> And more:
>
> Contains following functionality
> -Reading and writing of files
> -Character set conversion
> -Network communications using sockets
> -Time management used for Internet type conversions
> -String management like C++ including natural order management
> -UNIX Password management routines
> -Table management routines
> -UUID Internet generation
> -Filename canonicalization
> -Random data generation
> -Global lock management
> -Threads and process management
> -Dynamic library loading routines
> -Memory mapped and shared memory
>
> -
>
> http://www.arctic.org/~dean/apache/2.0/process-model.html
>
> I think the above is general enough to implement the interesting process
> models, and to implement optimizations that are available only in some of
> the multi-threaded models. Note that nothing above is specific to HTTP, and
> I believe that we should strive to keep the abstraction so that the same
> libraries can be used to implement other types of servers (i.e. FTP,
> streaming video/audio, corba).
>
> -
>
> Would the cyrus team think it worthwhile to consider refactoring to use the
> new Apache 2 APR modules? I know off hand that it would be a lot of work,
> but it could be a gradual re-factoring process, and the idea of actually
> reusing code between projects would be *really* nice.
>
> Joel Spolsky is a big proponent of refactoring over time to improve software
> and you can read some of his thoughts here.
>
> http://www.joelonsoftware.com/articles/fog69.html
> http://www.joelonsoftware.com/news/fog000328.html
>
> Ooops, I'm feeling a rant come along...
>
> *** RANT MODE ***
>
> I know this is a little off topic, but the source for cyrus is really
> showing its age a bit. I know that happens with all software, you start
> with certain assumptions, and the more you go on, the more the original
> assumptions get blown away, so you hack this in here, and there, and then
> every now and then, you go on a big cleanup spree! The problem I feel is
> that the cleanup hasn't been big enough or often enough.
>
> Also, over time programming habits change. Many old C idioms are pretty much
> dead. Most of the C string handling methods are now annoying, or downright
> dangerous. There are several dozen replacement libraries, including the APR
> one above, and good ones like
> http://www.annexia.org/freeware/c2lib/index.msp. This library also
> implements automatically resizing arrays and memory pools, a common way to
> avoid all subtle leaks introduced by malloc() and the like, and to avoid the
> buffer overflows of stack buffers.
>
> I'm sure I could go on and on, and I'd eventually get back to the fact that
> the biggest problem is that the original C language and library is pretty
> horrible in hindsight. But people in general still only dare to use the
> original library and idioms, and are loath to include extra dependencies in
> their products, often instead rewriting the same new set of libraries from
> scratch! *sigh* Ok