I have not been able to run the test described previously (externalizing ActiveMQ), but I believe I can now confirm that the root cause is related to ActiveMQ.

My client experienced an unexpected database outage in a multi-node QA environment last week. This outage caused the ActiveMQ master to lose the lock, but of course the slave couldn't acquire the lock either. However, as soon as the ActiveMQ master lost the database connection, the Pax JDBC initialization suddenly sprang to life on the second node. Obviously, the data source associated with the down database could not initialize, but three other data sources associated with a different database that was still available came up fine.

From these observations it seems pretty clear that the Pax JDBC initialization was tied directly to the ActiveMQ master/slave status in some way. That seems odd to me, but at this point I'm just interested in finding out if there is a reasonable solution; otherwise I will likely need to go back to a non-Pax-JDBC approach for managing the data sources.

(I should also mention that when I observed the behavior described above, I asked the database team to force another outage after everything had recovered. I observed the same behavior again: the ActiveMQ master lost its connection, and immediately the Pax JDBC initialization began on the other node. I can't pinpoint anything specifically for you, but I think this is pretty strong evidence of a direct link between ActiveMQ master/slave status and Pax JDBC initialization.)

On Thu, Mar 15, 2018 at 2:37 AM, Christian Schneider <[email protected]> wrote:

> To use ActiveMQ externally you still need to install the ActiveMQ client
> on karaf.
> Instead of the full activemq feature you only install activemq-client.
>
> You then also need to setup a JMS ConnectionFactory to connect to the
> external instance.
> This can be done using the regular karaf jms commands.
>
> Christian
>
> 2018-03-15 6:21 GMT+01:00 Matthew Zipay <[email protected]>:
>
>> I don't have a multi-node environment available to attempt this (I can't
>> disrupt the actual environments I have up and running), so I'd need to test
>> this using multiple VMs. At the moment I can't spare the time (or system
>> resources) to set that all up. I will see what I can do about getting
>> access to one of the actual multi-node environments, but it may take me a
>> while.
>>
>> I do have a single-node test environment that I can use, though. So
>> would the following simulate the conditions well enough? (I think so)...
>> (1) stop Karaf/ServiceMix
>> (2) run standalone ActiveMQ process using same JDBC config as for
>> Karaf/ServiceMix but with different ports (i.e. allow standalone ActiveMQ
>> to acquire the lock to become the master)
>> (3) start Karaf/ServiceMix
>>
>> I believe in step (3) this should cause the single Karaf/ServiceMix
>> process to effectively act as the slave, and I would expect to see my
>> problem replicated (data sources not initialized).
>>
>> If that works, then I could try removing ActiveMQ from that
>> Karaf/ServiceMix to see if it allows the data sources to start.
>>
>> (I've never removed ActiveMQ from a Karaf/ServiceMix installation - never
>> had the need. Is it as simple as feature:uninstall activemq, or is there
>> more to it?)
>>
>>
>> On Sun, Mar 11, 2018 at 12:57 PM, Christian Schneider <[email protected]> wrote:
>>
>>> Can you try to remove activemq from karaf and host it externally? Just
>>> to see if it really is the reason.
>>>
>>> Christian
>>>
>>> 2018-03-11 5:28 GMT+01:00 Matthew Zipay <[email protected]>:
>>>
>>>> Yes, the Karaf instances are on different machines, and I have removed
>>>> Cellar entirely from the system.
>>>>
>>>> If I simply start up both instances, then everything *except* the data
>>>> sources comes up fine on both instances.
>>>> Because of this, I am temporarily running this system as
>>>> active/failover (karaf.lock=true) until I can get to the bottom of
>>>> this data source issue.
>>>>
>>>> As far as what I see in the logs... on the second node (the one where
>>>> the data sources do not initialize), I do see that the cfg's are found, and
>>>> there are logging entries stating that they will be initialized, but then
>>>> nothing happens. No errors, nothing.
>>>>
>>>> If I then stop node 1, I almost immediately see the data sources
>>>> initialize on node 2. (But, of course, if I then restart node 1, same issue
>>>> - the data sources do not initialize on *that* node).
>>>>
>>>> Throughout all of the variations I have tested, there has been one
>>>> constant: ActiveMQ. It is configured using JDBC master/slave. It doesn't
>>>> seem like this should cause Pax JDBC data sources to not initialize
>>>> (ActiveMQ is not sharing the data source - entirely separate
>>>> configuration), but that's the only thing I can think of: when ActiveMQ
>>>> master releases the lock and allows the slave to become the new master,
>>>> that's when I see Pax JDBC initialize the data sources. Thoughts?
>>>>
>>>>
>>>> On Fri, Mar 9, 2018 at 2:38 AM, Christian Schneider <[email protected]> wrote:
>>>>
>>>>> Are the karaf instances on different machines?
>>>>> If they are on different machines and you do not use cellar or
>>>>> specially configured locking then both should come up completely
>>>>> independent from each other.
>>>>>
>>>>> Do you see anything in the log of the second instance when you stop
>>>>> the first one? I wonder what could trigger the data source to come up
>>>>> in this moment.
>>>>>
>>>>> Christian
>>>>>
>>>>> 2018-03-08 17:30 GMT+01:00 Matthew Zipay <[email protected]>:
>>>>>
>>>>>> Thanks for the quick replies.
>>>>>> With respect to Cellar, I should clarify that the Pax JDBC issue
>>>>>> persists whether or not Cellar is involved (Cellar was a recent
>>>>>> addition to the system, specifically to see if its explicit support
>>>>>> for active/active would resolve the Pax JDBC issue - it did not).
>>>>>>
>>>>>> I have tried two variations:
>>>>>> (1) Without Cellar, bring up both Karaf nodes without Karaf locking
>>>>>> (karaf.lock=false). (i.e. "unsupported" active/active)
>>>>>> (2) With Cellar, bring up both Karaf nodes without Karaf locking
>>>>>> (karaf.lock=false). (i.e. "supported" active/active)
>>>>>>
>>>>>> In either scenario, the data sources only initialize on the *first*
>>>>>> node that comes up.
>>>>>>
>>>>>> In either scenario, I can confirm that Pax JDBC bundles are started
>>>>>> on both nodes:
>>>>>>
>>>>>> Node 1:
>>>>>> 228 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Generic Driver Extender
>>>>>> 229 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Config
>>>>>> 230 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Pooling Support Base
>>>>>> 270 | Active | 80 | 1.0.1 | OPS4J Pax JDBC MSSQL Driver Adapter
>>>>>> 271 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Oracle Driver Adapter
>>>>>> 272 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Pooling DBCP2
>>>>>>
>>>>>> Node 2:
>>>>>> 228 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Generic Driver Extender
>>>>>> 229 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Config
>>>>>> 230 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Pooling Support Base
>>>>>> 270 | Active | 80 | 1.0.1 | OPS4J Pax JDBC MSSQL Driver Adapter
>>>>>> 271 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Oracle Driver Adapter
>>>>>> 272 | Active | 80 | 1.0.1 | OPS4J Pax JDBC Pooling DBCP2
>>>>>>
>>>>>> However, jdbc:ds-list only shows the initialized data sources on node
>>>>>> 1; on node 2 *no* data sources are initialized, and I have multiple
>>>>>> bundles in GracePeriod waiting for data sources that
>>>>>> never show up. For example, on node 2:
>>>>>>
>>>>>> admin@root>diag 331
>>>>>> SKU Resolution (331)
>>>>>> -----------------------------------------------
>>>>>> Status: GracePeriod
>>>>>> Blueprint
>>>>>> 3/8/18 11:10 AM
>>>>>> Missing dependencies:
>>>>>> (&(dataSourceName=PRODUCTDS)(objectClass=javax.sql.DataSource))
>>>>>>
>>>>>>
>>>>>> The "org.ops4j.datasource-PRODUCT.cfg" config file is present and
>>>>>> identical on both nodes. I see the data source initialization occur in
>>>>>> node 1's log, but nothing in node 2's log. As soon as I bring down
>>>>>> node 1, then I see the data sources on node 2 initialize.
>>>>>>
>>>>>> Could this have anything to do with ActiveMQ master/slave? That's the
>>>>>> only constant here. It seems unlikely, but I'm out of ideas.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thursday, March 8, 2018 at 2:32:39 AM UTC-5, Jean-Baptiste Onofré wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Cellar syncs the bundles and config on the cluster.
>>>>>>>
>>>>>>> By default, it doesn't sync "local" config (basically the etc/*.cfg
>>>>>>> files). To do so, you have to enable the local listener in
>>>>>>> etc/org.apache.karaf.cellar.groups.cfg.
>>>>>>>
>>>>>>> Else, you have to use cluster:config-property-set to create the
>>>>>>> config on the cluster.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On 03/08/2018 08:22 AM, Christian Schneider wrote:
>>>>>>>
>>>>>>> I am not experienced with cellar but generally I would expect that
>>>>>>> in an active/active setup both machines start the same services.
>>>>>>>
>>>>>>> So first thing to check is if the pax-jdbc features and bundles are
>>>>>>> active on the second node. If they are active you can check the log to
>>>>>>> see if pax-jdbc reports that something is missing.
>>>>>>>
>>>>>>> When you describe that the datasources on the second machine come up
>>>>>>> when the first machine goes down it sounds like you have set up
>>>>>>> something like a master/slave setup in cellar.
>>>>>>>
>>>>>>> Christian
>>>>>>>
>>>>>>> 2018-03-07 22:55 GMT+01:00 Matthew Zipay <[email protected]>:
>>>>>>>
>>>>>>>> I really like the approach that Pax JDBC introduced for managing
>>>>>>>> data sources, but I am running into an issue that may require me to
>>>>>>>> abandon it if I can't get it resolved.
>>>>>>>>
>>>>>>>> My setup is as follows:
>>>>>>>> ServiceMix 7.0.1 (Karaf 4.0.9) running on two nodes, clustered with
>>>>>>>> Cellar (active/active). ActiveMQ is JDBC master/slave. Using Pax JDBC
>>>>>>>> 1.0.1 (config, pool, and oracle and mssql adapters).
>>>>>>>>
>>>>>>>> I have five (5) data sources configured for the various databases
>>>>>>>> in use by this system. What I see is that the data sources are only
>>>>>>>> available on the *first* node in the cluster that comes up. When the
>>>>>>>> second node comes up, even though it also has the data source cfg's,
>>>>>>>> the data sources never get initialized, and all of my bundles that
>>>>>>>> use the data sources are stuck perpetually in GracePeriod waiting on
>>>>>>>> the data sources (confirmed with bundle:diag).
>>>>>>>>
>>>>>>>> If I bring down the first node, *then* the data sources on the
>>>>>>>> second node suddenly spring to life and all's well. But this is not
>>>>>>>> the behavior I would desire or expect, and it may be a showstopper
>>>>>>>> for me w/r/t Pax JDBC. I need those data sources available on both
>>>>>>>> nodes.
>>>>>>>>
>>>>>>>> Is this expected? If not, any ideas how I can work around it?
>>>>>>>> --
>>>>>>>> --
>>>>>>>> ------------------
>>>>>>>> OPS4J - http://www.ops4j.org - [email protected]
>>>>>>>>
>>>>>>>> ---
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "OPS4J" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Christian Schneider
>>>>>>> http://www.liquid-reality.de
>>>>>>>
>>>>>>> Computer Scientist
>>>>>>> http://www.adobe.com
>>>>>>>
>>>>> ---
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "OPS4J" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/ops4j/LHv7sqWkZTg/unsubscribe.
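
For anyone who wants to try the external-broker test suggested in this thread (install only activemq-client, then create a JMS ConnectionFactory with the karaf jms commands), the console steps might look roughly like the sketch below. It is an untested outline, not a verified procedure. The broker host "broker-host", the port 61616, and the connection factory name "external-amq" are placeholders invented for illustration, and the bundle id 331 is simply the one from the diag output quoted above. On ServiceMix the embedded broker may be pulled in by a feature with a different name than "activemq", so it is worth listing the installed features before uninstalling anything.

```
// List the installed ActiveMQ features first; the exact feature names are an assumption.
admin@root>feature:list | grep -i activemq
// Client libraries only, plus the feature that provides the jms:* commands.
admin@root>feature:install activemq-client
admin@root>feature:install jms
// "broker-host" and "external-amq" are hypothetical placeholders.
admin@root>jms:create -t activemq --url tcp://broker-host:61616 external-amq
admin@root>jms:connectionfactories
// Check whether the Pax JDBC data sources now initialize on this node.
admin@root>jdbc:ds-list
admin@root>bundle:diag 331
```

If jdbc:ds-list shows the data sources on both nodes once the embedded broker is gone, that would support the suspicion that the broker's master/slave state is what holds up the Pax JDBC initialization; if not, the cause lies elsewhere.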
