James Carlson wrote:
> Jordan Brown (Sun) writes:
>   
>> James Carlson wrote:
>>     
>>> I can't reach into other address spaces, so those guys are still ok.
>>> (Yes, there's an interesting fate-sharing issue with shared memory,
>>> and having a mutual-core-dump pact among processes attached to a
>>> shared memory segment sounds like a cool idea, but we're not talking
>>> about anything like that here.)
>>>       
>> There's also shared files, databases, output streams, and so on.  They 
>> aren't *totally* independent.  A sed dying, ignored by its parent shell, 
>> can lead to damaged data being written into a file, and so on.  I agree, 
>>     
>
> Right, but the distinction I was drawing was between the "parent is on
> the hook to figure out what to do about failures" design school (i.e.,
> traditional UNIX) and the new SMF+contracts school that (at least by
> default) bucks that trend.
>
> Obviously, if the parent fails to live up to its end of the bargain in
> the old model, bad things are entirely possible.
>
> The "new" thing here is that parents are shot for the sins of
> grandchildren, and it happens in unexpected ways.  (Which goes back to
> the whole question of understanding how fault boundaries are set.
> It's an issue I don't think we understand well, or that any UNIX
> designer would _expect_.)
>   

I think I disagree.  The SMF model is remarkably similar to the
model used by Solaris Cluster and most other HA clustering products.
When providing an HA service, we tend to want all-or-nothing wrt
resource groups because trying to limp along is usually worse than
just failing over.  Parallel services can limp, but when we talk about
parallel services in the cluster space, we aren't talking about a single
OS instance.
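
To make the all-or-nothing idea concrete, here is a minimal sketch. It is not how Solaris Cluster or SMF is actually implemented; the ResourceGroup class, the next_action function, and the resource names are all hypothetical, chosen only to illustrate the policy that one faulted resource takes the whole group with it.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Hypothetical resource group: a set of resources that fail over together."""
    name: str
    node: str
    # resource name -> True if the resource is currently healthy
    health: dict = field(default_factory=dict)


def next_action(group):
    """All-or-nothing policy: any faulted resource fails over the whole group.

    Returns ("failover", [faulted resource names]) or ("stay", []).
    """
    faulted = sorted(name for name, ok in group.health.items() if not ok)
    if faulted:
        # Do not try to limp along with the remaining healthy resources.
        return ("failover", faulted)
    return ("stay", [])
```

With a group such as ResourceGroup("web-rg", "node1", {"fs": True, "db": False}), next_action reports a failover for the whole group even though the file system itself is healthy.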

Note that we do spend a lot of design time on the relationships
between resources.  In fact, this is one of the most important tasks
for implementing an HA service.  However, there are some typical
patterns. For example, the file system containing the applications
should be mounted before you try to start the applications. SMF has
a similar dependency structure to the cluster resource management
state machines.
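
As a rough illustration of that dependency pattern, the start order can be derived from a dependency graph with a topological sort. This is only a sketch under assumptions: the resource names and the DEPENDS_ON table are made up, and neither SMF nor the cluster resource manager is claimed to work this way internally. It uses graphlib from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources; each maps to the set of resources it depends on.
DEPENDS_ON = {
    "filesystem": set(),
    "database": {"filesystem"},
    "webapp": {"filesystem", "database"},
}


def start_order(depends_on):
    """Return resources ordered so every dependency starts before its dependents."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Stop in the reverse of start order, so dependents stop first."""
    return list(reversed(start_order(depends_on)))
```

For the table above, the file system comes first in start_order and last in stop_order, which matches the "mount before you start the applications" pattern.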

So the reason I disagree is that people who do HA for a living
do understand these concepts.  If you want to get there, then you
also need to understand these concepts. If you expect SMF to
magically make anything you write HA, then you will be sadly
mistaken.
 -- richard

