On 10/19/07, Jason Clark <[EMAIL PROTECTED]> wrote:
>
> -Jason
>
> > -----Original Message-----
> > From: Simon Laws [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, October 18, 2007 9:42 PM
> > To: [email protected]
> > Subject: Re: Configuring and loading Composites and/or components at runtime?
> >
> > On 10/19/07, Jason Clark <[EMAIL PROTECTED]> wrote:
> > >
> > > More questions :-) See below.
> > >
> > > -Jason
> > >
> > > > -----Original Message-----
> > > > From: Simon Laws [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday, October 18, 2007 9:42 AM
> > > > To: [email protected]
> > > > Subject: Re: Configuring and loading Composites and/or components at runtime?
> > > >
> > > > Hi Jason, Welcome to Tuscany!
> > > >
> > > > Comments below...
> > > >
> > > > Would be great to have your input on how some of these features should work. So keep asking the questions :-)
> > > >
> > > > Regards
> > > >
> > > > Simon
> > > >
> > > > On 10/18/07, Jason Clark <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Another question. Is there a way to load composites and components at runtime? I'm looking at issues of distributed application survivability, and in the event that a given service is no longer functional (hardware crash, or other problems), I want to be able to relaunch a portion of the domain after the component crashed. Is that possible?
> > > >
> > > > There is an API for loading composites. If you look at most of our samples you will see that they use a domain API to read a composite file, which has the effect of loading the application. We are doing some work on the node and domain APIs now, so they have moved on a little from the 1.0 release. The idea with the distributed domain is that you can have multiple nodes in a domain. A node is something that will run SCA applications, i.e. read and run an SCA contribution. The nodes register with the domain, and when the nodes expose services they register those with the domain also. With this information you can build SCA applications without having to explicitly specify endpoints, as long as you are connecting components within the domain.
> > >
> > > Does this mean you can only restart composites and not components within a composite? If one component crashed, I would have to shut down and restart the entire composite?
> >
> > Currently the level of granularity of the API is starting/stopping composites, but under the covers this deals with individual components, so that could be opened up relatively easily. This is actually one of the areas we have been debating, i.e. what level of control should be provided. Can you say a little more about what it means for you when a component crashes? What is the sign that a component has crashed? Is an exception reported? What are the implications for restarting a component? Are you restarting some other system that the component is wrapping?
>
> One of my biggest concerns is application survivability. My work is in the realm of Disaster Management, so one of my requirements is to have an application that more or less self-heals. If part of it fails, it needs to recognize that and fix the problem. I guess if a component reports a problem, the entire composite can be restarted, but it might be nice to create a composite, part of which contains a health monitoring component that holds a reference to every other component. Should one fail, it could be restarted?
>
> To be honest, I don't know the solution to my problem and I don't know best practices for dealing with this sort of situation, so I'm more or less grasping at straws trying to figure out how to solve my dilemma.
>
> A component crash would mean that the hardware failed or the component threw an exception. In the latter case, most likely that would be a programming error for a rare case we didn't consider. The component might go up and down for as long as it takes for us to patch the app. For the former, I'm still trying to understand if it's even possible to restart a component on a different machine. If I understand everything correctly, this would be possible to do at the composite level. Can components of a composite even run on separate machines? I mean, not a reference but the actual instance?
>
> All domains get launched with a composite, which another composite can reference as a component, correct? Kind of? Or am I way off?
>
> > > > If a node crashes you can just restart it again and it will re-register. There is a bit of code commented out in the invoker for the default binding that does retries and looks up the endpoint again in the case that the endpoint can't be reached, i.e. the target node has failed. There was some issue with it so that needs looking at, but if we fix that then basic restarts should be OK.
> > > >
> > > > > The best example I could find in the samples is the calculator-distributed example in the 1.0-incubating release, but the readme file lists a few resource directories that are not in the project (resource/domain, resource/management). Is the readme incorrect or are the files really missing?
> > > >
> > > > Yeah, that looks like a mistake in the documentation. Can you raise a JIRA for that and I'll fix it. The connection between the domain and the nodes within it is actually implemented as an SCA application. That's what the domain and management directories had in them. These have been moved now and are currently node-impl/src/main/resources/node.composite and domain-impl/src/main/resources/domain.composite, which seemed to make more sense.
> > >
> > > I can't seem to locate these two directories. I feel like I'm looking in the wrong place though. I'm guessing it's not in the sample project? Second, do I even need to see them? Are they general composites used to construct nodes, or are they specific to the calculator sample?
> >
> > You are right, you don't really need to see them. I was just giving you a bit of background. They are system composites that are used by the node and domain implementations to communicate with one another. They aren't specific to the calculator-distributed sample, which is why they were moved. They can be found in trunk here:
> >
> > http://svn.apache.org/repos/asf/incubator/tuscany/java/sca/modules/domain-impl/src/main/resources/
> > http://svn.apache.org/repos/asf/incubator/tuscany/java/sca/modules/node-impl/src/main/resources/
> >
> > You may have noticed already the way that the SCA code is laid out. All of the modules that provide SCA runtime features appear under the modules directory. You can only see the modules in a source distribution. Some of these modules are loaded statically and some are loaded dynamically (primarily optional bindings, databindings and extensions). Then you see there are samples and demos that show various features being used. Also in trunk, but not in the distributions, there is an integration test (itest) directory.
> >
> > > Thanks.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]

Hi Jason, sorry this has gone unanswered. I was off last week (supposed to be leave but spent the week being unwell instead :-( ). I shouldn't really point out the irony in the loss of a developer in this case leading to your mail not getting answered.
You outlined some interesting scenarios here around failure. There are of course many hardware and software systems that provide varying levels of fault tolerance using different strategies, usually based on duplication or on move and restart. As you know, Tuscany today provides very little in this area, but I would like to take some small steps towards providing a little more. We should try to use existing solutions for reliability where possible rather than inventing new ones.

Let's look at the different parts of the distributed Tuscany solution and try to make some simplifying assumptions.

Domain

The domain is a single point of failure as it represents a registry of deployed contributions/composites/components/services. We should just assume that this is running as a highly available service, e.g. a clustered web app server.

Node

The node provides the runtime for components and can fail itself. The runtime can fail because of fatal errors in the Tuscany code, extensions, dependencies or the applications running on it that are not handled properly. A situation that causes the node JVM to stop would usually be handled with a restart script or by a heartbeat. The simplest is the restart approach, but

1. the domain has to be able to handle re-registrations, and
2. the bindings have to be able to handle attempts to send messages to a node that has gone away.

Component

I've separated components from component instances here to point out that the components that appear in the composite files are represented in an in-memory model. I consider any failure in the model in the same light as a failure of the node above.

Component Instances

Component instances are created to handle incoming messages based on the in-memory component model and the scope specified for each component. A failure in a stateless component may not be serious at all, as it simply requires an appropriate response to the calling component. A new component instance will be created for the next message. However, where a component instance exists for longer than the duration of a call it's a bit trickier, as we may have to recover any state that was held there and restart any dependencies that the component instance relies on.

I'll give this a little bit more thought and try to be a bit more scientific about it (we could do with a wiki page where we can set out the different scenarios we expect to encounter), so let me know whether any of this is making sense.

Regards

Simon
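
To make the two requirements of the restart approach under "Node" a little more concrete, here is a minimal Java sketch of re-registration plus a retry that looks the endpoint up again on each attempt. It is only an illustration: DomainRegistry, Transport and RetryingInvoker are hypothetical names invented for this example and are not Tuscany classes or the actual code in the default binding invoker.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the domain's registry of service endpoints. */
final class DomainRegistry {
    private final Map<String, String> endpoints = new HashMap<>();

    /** Re-registration is handled by letting a restarted node overwrite its old entry. */
    synchronized void register(String serviceName, String endpointUri) {
        endpoints.put(serviceName, endpointUri);
    }

    synchronized String lookup(String serviceName) {
        String uri = endpoints.get(serviceName);
        if (uri == null) {
            throw new IllegalStateException("No endpoint registered for " + serviceName);
        }
        return uri;
    }
}

/** Hypothetical transport; a real binding would send the message over the wire. */
interface Transport {
    String send(String endpointUri, String message) throws IOException;
}

/** Retries a call, resolving the endpoint again before every attempt. */
final class RetryingInvoker {
    private final DomainRegistry registry;
    private final Transport transport;
    private final int maxAttempts;

    RetryingInvoker(DomainRegistry registry, Transport transport, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.registry = registry;
        this.transport = transport;
        this.maxAttempts = maxAttempts;
    }

    String invoke(String serviceName, String message) {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Look the endpoint up again so a node that re-registered
            // (possibly at a new address) is picked up on the next try.
            String endpoint = registry.lookup(serviceName);
            try {
                return transport.send(endpoint, message);
            } catch (IOException e) {
                last = e; // target node unreachable; try again
            }
        }
        throw new IllegalStateException(
                "Service " + serviceName + " unreachable after " + maxAttempts + " attempts", last);
    }
}
```

The sketch retries immediately; a real binding would presumably want a delay between attempts so that a restarting node has time to re-register, and it says nothing about recovering state held in longer-lived component instances.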
