Fellow ServiceMix users and developers,

I am experiencing some rather odd behavior when using the JMS Flow in a load-balanced environment, that is, one where HTTP traffic is round-robined between two HttpSoapConnectors running in separate ServiceMix containers. I'll illustrate the course of events below.

1. Start up two separate ActiveMQ 3.2.1 instances on two separate hosts, each pointing to the other via the <networkConnector> directive in the conf/activemq.xml file.
2. Start up two separate ServiceMix 2.x instances on two separate hosts, each pointing to both of the ActiveMQ instances started in (1) via flowName="jms?jmsURL=reliable(tcp://activeMQ_A, tcp://activeMQ_B)".
3. Deploy the same service assembly (SA) to each ServiceMix installation. The SA will create an HttpSoapConnector and a SaajComponent.
4. From a SOAP+HTTP client, connect to http://serviceMix_A/HttpSoapConnector, which will take the SOAP message and send it to the SaajComponent. The SaajComponent will invoke a specified web service and return the result to the HttpSoapConnector, which passes it back to the SOAP+HTTP client. The expected response is received.
5. Execute the same flow as in (4), but point to http://serviceMix_B/HttpSoapConnector. The expected response is received.
6. Kill serviceMix_A, then re-execute (5). The expected response may be returned, OR the call may hang with no visible errors.

Things that I have noticed during steps (4) and (5):

a. The debug output indicates that service execution is not confined to the container hosting the HttpSoapConnector being posted to. That is to say, any node of the cluster might execute the SaajComponent invocation; even though the SaajComponent service is deployed locally in the same container as the invoked HttpSoapConnector, a remote instance of the SaajComponent hosted by another node of the cluster might be invoked. (Please note that the ServiceMix instances have different names in the servicemix.xml file.) Which instance of the SaajComponent is invoked seems to depend on which instance was invoked first (i.e. which ServiceMix instance first took part in an exchange).

b. When one of the two nodes is shut down and the other node's HttpSoapConnector is invoked, the SaajComponent local to that active node may never be called; the JMS Flow seems to attempt an invocation of the SaajComponent hosted by the downed node. This can be verified by calling the same service multiple times: if the first invocation results in a timeout (because the cluster is trying to use the downed node) and the downed node is then restarted, subsequent calls will succeed.

c. If you kill one of the ActiveMQ instances once you enter the hung state, the flow will pick up correctly once the active ServiceMix node has failed over to the secondary ActiveMQ node. You would then have one ActiveMQ instance and one ServiceMix instance. Once you have killed and restarted the ActiveMQ instance, the problem may be very hard to recreate. It only seems to happen when the environment is brought up from scratch.

This behavior makes no sense. I would expect that if a node of a cluster was shut down BEFORE any calls were made, all of the calls to the cluster would be honored by the active nodes. There are no exceptions being thrown by the bus, and no exceptions being thrown by ActiveMQ. As it is (with an Oracle DS in place for ActiveMQ OR with Derby), the JMS Flow results in a more fragile environment than simply running with the SEDA flow and HTTP load balancing provided by Apache. I cannot imagine that this behavior is by design, but I don't have any more information to help diagnose the error.

Moreover, I am confused by the execution of services on a remote node when the local node itself has a copy of the service. I would think that the local copy would be selected first; is there some way to enforce a "LOCAL_FIRST" policy, or is it not something that can be easily done (I know of two policies: RandomChoice and FirstChoice)? Does the bus know if a certain service is local or remote?
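
To make the "LOCAL_FIRST" idea concrete, here is a minimal sketch of the selection behavior I have in mind. It is not ServiceMix code and does not use any ServiceMix API: the Endpoint and LocalFirstChooser types, and the "alive" flag, are hypothetical names I made up for illustration, and I am assuming the bus could somehow know both which container hosts an endpoint and whether that container is up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a service endpoint; NOT a ServiceMix type.
final class Endpoint {
    final String service;    // e.g. "SaajComponent"
    final String container;  // name of the hosting container
    final boolean alive;     // assumption: the bus knows whether the host is up

    Endpoint(String service, String container, boolean alive) {
        this.service = service;
        this.container = container;
        this.alive = alive;
    }
}

// Hypothetical "LOCAL_FIRST" policy: prefer the endpoint hosted by this
// container, otherwise rotate over the live remote endpoints.
final class LocalFirstChooser {
    private final String localContainer;
    private int next = 0; // round-robin cursor for remote candidates

    LocalFirstChooser(String localContainer) {
        this.localContainer = localContainer;
    }

    // Returns the endpoint to invoke, or null if no live endpoint exists.
    Endpoint choose(List<Endpoint> candidates) {
        List<Endpoint> liveRemote = new ArrayList<>();
        for (Endpoint e : candidates) {
            if (!e.alive) {
                continue; // never route to a node known to be down
            }
            if (e.container.equals(localContainer)) {
                return e; // the local copy always wins
            }
            liveRemote.add(e);
        }
        if (liveRemote.isEmpty()) {
            return null;
        }
        Endpoint chosen = liveRemote.get(next % liveRemote.size());
        next = (next + 1) % liveRemote.size();
        return chosen;
    }
}
```

With serviceMix_A marked as down and serviceMix_B as the local container, new LocalFirstChooser("serviceMix_B").choose(...) returns the serviceMix_B endpoint rather than waiting on the dead node, which is the behavior I expected to see in step (6).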

I ask about local versus remote selection because with the current behavior (which sometimes selects remote over local), you cannot effectively load balance across multiple ServiceMix nodes when they are deployed centrally; service execution will take place on, at best, a subset of nodes (and that subset might be only one node).

In summary, I have two questions:

[1] How can you control service selection and execution in a cluster?
[2] Why is the JMS Flow fragile to the point of having to bounce various parts when a node dies? Put another way, why can't I add and remove ServiceMix nodes without having to worry about restarting one or more of the ActiveMQ nodes?

regards,
/jonathan
