Hi Jonathan,
I've moved your questions up as it's a long email - and following on
from what Bruce said:
[1] How can you control service selection and execution in a cluster?
For the JMS Flow - message exchanges are routed on JMS Queues - we
rely on the JMS implementation to do this for us.
The reason there are dependencies on ActiveMQ is that we rely on its
notification mechanisms to detect the arrival/disappearance of
ServiceMix nodes in a cluster.
ActiveMQ 3 is using round-robin load balancing, hence the behavior
you are seeing.
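As a rough illustration of why round-robin produces this, here is a
minimal sketch in plain Python. It is not ServiceMix or ActiveMQ code;
the function and the consumer names are made up for the example. The
point is only that a round-robin dispatcher hands each exchange to the
next consumer in turn and never looks at which node the exchange came
from.

```python
from itertools import cycle


def round_robin_dispatch(consumers, messages):
    """Assign each message to the next consumer in turn, ignoring locality."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    turn = cycle(consumers)
    return [(message, next(turn)) for message in messages]


# Hypothetical consumers: one SaajComponent instance per ServiceMix node.
consumers = ["SaajComponent@serviceMix_A", "SaajComponent@serviceMix_B"]
exchanges = ["exchange-1", "exchange-2", "exchange-3"]

for message, consumer in round_robin_dispatch(consumers, exchanges):
    print(message, "->", consumer)
```

Even if every exchange enters through serviceMix_A, every second one
lands on the SaajComponent in serviceMix_B.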
Since the code moved to Apache, ServiceMix in SVN now relies on
ActiveMQ 4, which offers more flexibility over queue routing.
[2] Why is the JMS Flow fragile to the point of having to bounce
various parts when a node dies? Put another way, why can't I add and
remove ServiceMix nodes without having to worry about restarting one
or more of the ActiveMQ nodes?
This is a problem in the ServiceMix JMS flow, which has been
rewritten to use ActiveMQ 4. ActiveMQ 4 is better for this sort of
thing.
The expected release date for the next version of ServiceMix is the
end of January.
cheers,
Rob
On 6 Jan 2006, at 00:34, [EMAIL PROTECTED] wrote:
Fellow ServiceMix users and developers,
I am experiencing some rather odd behavior when using the JMS Flow in
a load-balanced environment, that is, where HTTP traffic is
round-robined between two HttpSoapConnectors running in separate
ServiceMix containers. I'll illustrate the course of events below.
1. Start up two separate ActiveMQ 3.2.1 instances on two separate
hosts, each pointing to the other via the <networkConnector> directive
in the conf/activemq.xml file.
2. Start up two separate ServiceMix 2.x instances on two separate
hosts, each pointing to both of the ActiveMQ instances started in (1)
via the
flowName="jms?jmsURL=reliable(tcp://activeMQ_A, tcp://activeMQ_B)".
3. Deploy the same service assembly to each ServiceMix installation.
The SA will create an HttpSoapConnector and a SaajComponent.
4. From a SOAP+HTTP client, connect to
http://serviceMix_A/HttpSoapConnector, which will take the SOAP
message and send it to the SaajComponent. The SaajComponent will
invoke a specified web service and then return the result back to the
HttpSoapConnector, and finally back to the SOAP+HTTP client. The
expected response is received.
5. Execute the same flow as in (4), but point to
http://serviceMix_B/HttpSoapConnector. The expected response is
received.
6. Kill serviceMix_A, then re-execute (5). The expected response may
be returned, OR the call may hang with no visible errors.
Things that I have noticed during steps (4) and (5):
a. The debug output indicates that service execution is not confined
to the container hosting the HttpSoapConnector being posted to. That
is to say that any node of the cluster might execute the SaajComponent
invocation; even though the SaajComponent service is deployed locally
in the same container hosting the invoked HttpSoapConnector, a remote
instance of the SaajComponent hosted by another node of the cluster
might be invoked. (Please note that the ServiceMix instances have
different names in the servicemix.xml file.) Which instance of the
SaajComponent is invoked seems to be based upon which instance was
first invoked (i.e. which ServiceMix instance was first part of the
exchange).
b. When one of the two nodes is shut down, and the other node's
HttpSoapConnector is invoked, the SaajComponent local to that active
node may not ever be called; the JMS Flow seems to attempt an
invocation of the SaajComponent hosted by the downed node. This can be
verified by calling the same service multiple times: if the first
invocation results in a timeout (because the cluster is trying to
utilize the downed node), and the downed node is restarted,
subsequent calls will succeed.
c. If you kill one of the ActiveMQ instances once you enter the hung
state, the flow will pick up correctly once the active ServiceMix node
has failed over to the secondary ActiveMQ node. So you would then have
one ActiveMQ instance and one ServiceMix instance. Once you have
killed and restarted the ActiveMQ instance, the problem may be very
hard to recreate. It only seems to happen when the environment is
brought up from scratch.
This behavior makes no sense. I would expect that if a node of a
cluster was shut down BEFORE any calls were made, all of the calls to
the cluster would be honored by the active nodes. There are no
exceptions being thrown by the bus, and no exceptions being thrown by
ActiveMQ. As it is (with an Oracle DS in place for ActiveMQ OR with
Derby), the JMS Flow results in a more fragile environment than simply
running with SEDA flow and HTTP load balancing provided by Apache. I
cannot imagine that this behavior is by design, but I don't have any
more information to help diagnose the error.
Moreover, I am confused by the execution of services on a remote node
when the local node itself has a copy of the service. I would think
that the local copy would be selected first; is there some way to
enforce a "LOCAL_FIRST" policy, or is it not something that can be
easily done (I know of two policies: RandomChoice and FirstChoice)?
Does the bus know if a certain service is local or remote? I ask this
question because with the current behavior (which sometimes selects
remote over local), you cannot effectively load balance across
multiple ServiceMix nodes when they are deployed centrally; service
execution will take place on, at best, a subset of nodes (and that
subset might only be one node).
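To make the request concrete, here is a minimal sketch of what a
"LOCAL_FIRST" policy could look like. It is plain Python, not the
ServiceMix API: the Endpoint type, the local_first function, and the
container names are all hypothetical, and it assumes the bus can tell
which container hosts each candidate endpoint.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Endpoint:
    """A candidate service instance and the container that hosts it."""
    service: str
    container: str


def local_first(candidates: Sequence[Endpoint], local_container: str) -> Endpoint:
    """Prefer an endpoint in the local container; otherwise take the first one."""
    if not candidates:
        raise LookupError("no endpoint available for the requested service")
    for endpoint in candidates:
        if endpoint.container == local_container:
            return endpoint
    # No local copy is deployed, so fall back to a remote instance.
    return candidates[0]


candidates = [
    Endpoint("SaajComponent", "serviceMix_B"),
    Endpoint("SaajComponent", "serviceMix_A"),
]
print(local_first(candidates, "serviceMix_A").container)
```

With such a policy, a request arriving at serviceMix_A would only leave
the container when no local SaajComponent is deployed there.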
In summary, I have two questions herein:
[1] How can you control service selection and execution in a cluster?
[2] Why is the JMS Flow fragile to the point of having to bounce
various parts when a node dies? Put another way, why can't I add and
remove ServiceMix nodes without having to worry about restarting one
or more of the ActiveMQ nodes?
regards,
/jonathan