On 10/19/07, Jason Clark <[EMAIL PROTECTED]> wrote:
>
>
>
> -Jason
>
>
>
>
> > -----Original Message-----
> > From: Simon Laws [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, October 18, 2007 9:42 PM
> > To: [email protected]
> > Subject: Re: Configuring and loading Composites and/or components at
> > runtime?
> >
> > On 10/19/07, Jason Clark < [EMAIL PROTECTED]> wrote:
> > >
> > > More questions :-) See below.
> > >
> > > -Jason
> > >
> > > > -----Original Message-----
> > > > From: Simon Laws [mailto: [EMAIL PROTECTED]
> > > > Sent: Thursday, October 18, 2007 9:42 AM
> > > > To: [email protected]
> > > > Subject: Re: Configuring and loading Composites and/or components at
> > > > runtime?
> > > >
> > > > Hi Jason, Welcome to Tuscany!
> > > >
> > > > Comments below...
> > > >
> > > > Would be great to have your input on how some of these features should
> > > > work. So keep asking the questions :-)
> > > >
> > > > Regards
> > > >
> > > > Simon
> > > >
> > > > On 10/18/07, Jason Clark <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Another question. Is there a way to load composites and components at
> > > > > runtime? I'm looking at issues of distributed application survivability
> > > > > and in the event that a given service is no longer functional (hardware
> > > > > crash, or other problems), I want to be able to relaunch a portion of
> > > > > the domain after the component crashed. Is that possible?
> > > >
> > > >
> > > > There is an API for loading composites. If you look at most of our
> > > > samples you will see that they use a domain API to read a composite file,
> > > > which has the effect of loading the application. We are doing some work
> > > > on the node and domain APIs now, so they have moved on a little from the
> > > > 1.0 release. The idea with the distributed domain is that you can have
> > > > multiple nodes in a domain. A node is something that will run SCA
> > > > applications, i.e. read and run an SCA contribution. The nodes register
> > > > with the domain, and when the nodes expose services they register those
> > > > with the domain also. With this information you can build SCA
> > > > applications without having to explicitly specify endpoints, as long as
> > > > you are connecting components within the domain.
> > > >
> > >
> > >
> > >
> > > Does this mean you can only restart composites and not components within
> > > a composite? If one component crashed, would I have to shut down and
> > > restart the entire composite?
> >
> >
> > Currently the level of granularity of the API is starting/stopping
> > composites, but under the covers this deals with individual components, so
> > that could be opened up relatively easily. This is actually one of the
> > areas we have been debating, i.e. what level of control should be provided.
> > Can you say a little more about what it means for you when a component
> > crashes? What is the sign that a component has crashed? Is an exception
> > reported? What are the implications for restarting a component? Are you
> > restarting some other system that the component is wrapping?
>
>
> One of my biggest concerns is application survivability. My work is in the
> realm of Disaster Management, so one of my requirements is to have an
> application that more or less self-heals. If part of it fails, it needs to
> recognize that and fix the problem. I guess if a component reports a
> problem, the entire composite can be restarted, but it might be nice to
> create a composite, part of which contains a health monitoring component
> that contains a reference to every other component. Should one fail, it
> could be restarted?
>
> To be honest, I don't know the solution to my problem and I don't know the
> best practices for dealing with this sort of situation, so I'm more or less
> grasping at straws trying to figure out how to solve my dilemma.
>
> For a component to crash would mean that the hardware failed, or that the
> component threw an exception. In the latter case, most likely that would be
> a programming error for a rare case we didn't consider. The component might
> go up and down for as long as it takes for us to patch the app. For the
> former, I'm still trying to understand if it's even possible to restart a
> component on a different machine. If I understand everything correctly,
> this would be possible to do at the composite level. Can components of a
> composite even run on separate machines? I mean, not a reference but the
> actual instance?
>
> All domains get launched with a composite, which another composite can
> reference as a component, correct? Kind of? Or am I way off?
>
> >
> > > > If a node crashes you can just restart it again and it will re-register.
> > > > There is a bit of code commented out in the invoker for the default
> > > > binding that does retries and looks up the endpoint again in the case
> > > > that the endpoint can't be reached, i.e. the target node has failed.
> > > > There was some issue with it, so that needs looking at, but if we fix
> > > > that then basic restarts should be OK.
> > > >
> > > >
> > > > > The best example I could find in the samples is the
> > > > > calculator-distributed example in the 1.0-incubating release, but the
> > > > > readme file lists a few resource directories that are not in the
> > > > > project (resource/domain, resource/management). Is the readme
> > > > > incorrect or are the files really missing?
> > > >
> > > >
> > > > Yeah, that looks like a mistake in the documentation. Can you raise a
> > > > JIRA for that and I'll fix it. The connection between the domain and the
> > > > nodes within it is actually implemented as an SCA application. That's
> > > > what the domain and management directories had in them. These have been
> > > > moved now and are currently
> > > > node-impl/src/main/resources/node.composite and
> > > > domain-impl/src/main/resources/domain.composite, which seemed to make
> > > > more sense.
> > > >
> > >
> > >
> > > I can't seem to locate these two directories. I feel like I'm looking in
> > > the wrong place though. I'm guessing it's not in the sample project?
> > > Second, do I even need to see them? Are they general composites used to
> > > construct nodes, or are they specific to the calculator sample?
> >
> >
> > You are right, you don't really need to see them. I was just giving you a
> > bit of background. There are system composites that are used by the node
> > and domain implementations to communicate with one another. They aren't
> > specific to the calculator-distributed sample, which is why they were
> > moved. They can be found in trunk here:
> >
> > http://svn.apache.org/repos/asf/incubator/tuscany/java/sca/modules/domain-impl/src/main/resources/
> > http://svn.apache.org/repos/asf/incubator/tuscany/java/sca/modules/node-impl/src/main/resources/
> >
> > You may have noticed already the way that the SCA code is laid out. All
> > of the modules that provide SCA runtime features appear under the modules
> > directory. You can only see the modules in a source distribution. Some of
> > these modules are loaded statically and some are loaded dynamically
> > (primarily optional bindings, databindings and extensions). Then you see
> > there are samples and demos that show various features being used. Also in
> > trunk, but not in the distributions, you see there is an integration test
> > (itest) directory.
> >
> >
> > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
Hi Jason, sorry this has gone unanswered. I was off last week (supposed to
be leave, but I spent the week being unwell instead :-( ). I shouldn't really
point out the irony of the loss of a developer, in this case, leading to your
mail not getting answered.

You outlined some interesting failure scenarios here. There are of course
many hardware and software systems that provide varying levels of fault
tolerance using different strategies, usually based on duplication or on
move and restart. As you know, Tuscany today provides very little here, but
I would like to take some small steps towards providing a little more. We
should try to use existing reliability solutions where possible rather than
inventing new ones. Let's look at the different parts of the distributed
Tuscany solution and try to make some simplifying assumptions.


Domain
  The domain is a single point of failure as it represents a registry of
deployed contributions/composites/components/services. We should just assume
that this is running as a highly available service, e.g. a clustered web app
server.

Node
  The node provides the runtime for components and can itself fail. The
runtime can fail because of fatal errors in the Tuscany code, extensions,
dependencies, or applications running on it that are not handled properly. A
situation that causes the node JVM to stop would usually be handled with a
restart script or by a heartbeat. The simplest is the restart approach, but
1. the domain has to be able to handle re-registrations and 2. the bindings
have to be able to handle attempts to send messages to a node that has gone
away.
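
To make points 1 and 2 a bit more concrete, here is a minimal Java sketch
under stated assumptions. None of these types are Tuscany classes:
EndpointRegistry, RetryingInvoker and the transport function are hypothetical
names made up for illustration, and the registry is just an in-memory map
standing in for the domain. The idea is only that a re-registration
overwrites the stale endpoint, and that the invoker looks the endpoint up
again on every attempt.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the domain's service registry (not a Tuscany class).
class EndpointRegistry {
    private final Map<String, String> endpoints = new ConcurrentHashMap<>();

    // Registering the same service name again overwrites the old entry,
    // so a restarted node can simply re-register.
    void register(String serviceName, String endpointUrl) {
        endpoints.put(serviceName, endpointUrl);
    }

    Optional<String> lookup(String serviceName) {
        return Optional.ofNullable(endpoints.get(serviceName));
    }
}

// Hypothetical invoker that retries and re-resolves the endpoint each time.
class RetryingInvoker {
    private final EndpointRegistry registry;
    private final int maxAttempts;

    RetryingInvoker(EndpointRegistry registry, int maxAttempts) {
        this.registry = registry;
        this.maxAttempts = maxAttempts;
    }

    // 'transport' sends a message to the given endpoint URL and returns the
    // response, or throws a RuntimeException if the endpoint is unreachable.
    String invoke(String serviceName, Function<String, String> transport) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Look up on every attempt so a re-registration is picked up.
            Optional<String> endpoint = registry.lookup(serviceName);
            if (endpoint.isPresent()) {
                try {
                    return transport.apply(endpoint.get());
                } catch (RuntimeException e) {
                    lastFailure = e; // remember the failure and try again
                }
            }
        }
        throw new IllegalStateException("Service unreachable: " + serviceName, lastFailure);
    }
}
```

A real binding would also want a delay or back-off between attempts, which
is left out here to keep the sketch short.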

Component
   I've separated components from component instances here to point out that
the components that appear in the composite files are represented in an
in-memory model. I consider any failure in the model in the same light as a
failure of the node above.

Component Instances
  Component instances are created to handle incoming messages based on the
in-memory component model and the scope specified for each component. A
failure in a stateless component may not be serious at all, as it simply
requires an appropriate response to the calling component. A new component
instance will be created for the next message. However, where a component
instance exists for longer than the duration of a call it's a bit trickier,
as we may have to recover any state that was held there and restart any
dependencies that the component instance relies on.
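
The stateless case can be sketched roughly as follows. This is an
illustration only, assuming a hypothetical StatelessDispatcher class that is
not part of Tuscany: each message gets a fresh instance from a factory, and a
failure is turned into a fault response for the caller, so there is nothing
to recover afterwards.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of per-message dispatch to a stateless component
// (not a Tuscany class).
class StatelessDispatcher<T> {
    private final Supplier<T> instanceFactory;
    private int instancesCreated = 0;

    StatelessDispatcher(Supplier<T> instanceFactory) {
        this.instanceFactory = instanceFactory;
    }

    // Handles one message: create a fresh instance, call it, and turn any
    // failure into a fault response for the calling component.
    String dispatch(Function<T, String> call) {
        T instance = instanceFactory.get();
        instancesCreated++;
        try {
            return call.apply(instance);
        } catch (RuntimeException e) {
            // Nothing to recover: the failed instance is simply dropped.
            return "FAULT: " + e.getMessage();
        }
    }

    int instancesCreated() {
        return instancesCreated;
    }
}
```

For an instance that outlives a single call, the catch block is where the
hard part would go: its state would have to be saved somewhere outside the
instance so that a replacement could be rebuilt from it.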

I'll give this a little bit more thought and try to be a bit more scientific
about it (we could do with a wiki page where we can set out the different
scenarios we expect to encounter), so let me know whether any of this is
making sense.

Regards

Simon
