Hi,

We have a network of brokers that uses multicast discovery, and some messages on that network disappear.
Our application is a large PHP website, where we use STOMP + ActiveMQ to offload some of the work for asynchronous processing. That work includes notifications similar to Facebook's, but there are other queues as well. Those queues range from a few messages to over three million messages a day.

The network consists of 11 servers, to which we implicitly assign certain roles:

- Store and forward: each of our 9 webservers runs an ActiveMQ instance. This saves some network overhead for the non-reused connections of PHP's STOMP client (i.e. it connects to 'localhost'), and provides some redundancy and buffering in case the central node has a problem.
- Central node: we have 2 of these, of which one is 'active'. We simply connect the consumers to these nodes using the failover protocol, with the 'active' one tried first.

All nodes run ActiveMQ 5.13.3 and use multicast for transport discovery.

The flow for the messages in question is:

1. 'Something' happens on the website (e.g. a user's post is quoted).
2. The PHP code produces a STOMP message.
3. The message is sent to the ActiveMQ instance on 'localhost' of that webserver.
4. Since there is no consumer there, that ActiveMQ instance forwards it to the central node.
5. The message is consumed from the central node by a long-running PHP process that consumes and acks each message as it arrives (i.e. it does not buffer).

After users started reporting bugs about this, we added logging at many levels. After a lot of digging, it turns out the missing messages coincide with log messages like these in the central node's activemq.log:

2016-09-28 17:47:23,637 | WARN | suppressing duplicate message send [ID:panda-41468-1473163942165-83:18887261:-1:1:1] from network producer with producerSequence [1] less than last stored: 2 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.29.249.161:50026@61616
...
2016-09-28 22:14:55,310 | WARN | suppressing duplicate message send [ID:phobos-44763-1473074816679-26:18380536:-1:1:3] from network producer with producerSequence [3] less than last stored: 5 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.29.249.33:59068@61616

In total, this seems to happen about 3000 times a day.

So if these are (considered) duplicates, where are the originals? And if they are false positives, how do we prevent them while still catching true duplicates?

Below is a somewhat stripped-down version of our <broker> section (without comments and debug settings). All 11 servers have the same config (apart from the brokerName).

Best regards,
Arjen

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="nestor" dataDirectory="${activemq.data}" schedulerSupport="true">
    <destinationPolicy>
        <policyMap>
            <policyEntries>
                <policyEntry topic=">">
                    <pendingMessageLimitStrategy>
                        <constantPendingMessageLimitStrategy limit="1000"/>
                    </pendingMessageLimitStrategy>
                </policyEntry>
                <policyEntry queue=">" producerFlowControl="true" memoryLimit="128mb" enableAudit="false">
                    <networkBridgeFilterFactory>
                        <conditionalNetworkBridgeFilterFactory replayWhenNoConsumers="true"/>
                    </networkBridgeFilterFactory>
                </policyEntry>
            </policyEntries>
        </policyMap>
    </destinationPolicy>

    <networkConnectors>
        <networkConnector uri="multicast://default?group=tweakersActiveMQProduction&amp;prefetchSize=1"/>
    </networkConnectors>

    <persistenceAdapter>
        <kahaDB directory="${activemq.data}/kahadb"/>
    </persistenceAdapter>

    <transportConnectors>
        <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600" discoveryUri="multicast://default?group=tweakersActiveMQProduction" auditNetworkProducers="true"/>
        <transportConnector name="stomp" uri="stomp://0.0.0.0:61613?transport.closeAsync=false&amp;maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
    </transportConnectors>
</broker>
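To make the WARN lines easier to read, here is a simplified Python model of the check they describe (presumably the audit enabled by auditNetworkProducers="true" on the openwire connector). It is a sketch under assumptions, not ActiveMQ's code: it assumes the message ID layout `<connectionId>:<sessionId>:<producerId>:<producerSequence>`, which is consistent with the logged producerSequence values [1] and [3] matching the last field of each ID, and it uses a hypothetical in-memory `last_stored` table in place of whatever the central broker looks up in its store. In this model, a send from a network producer is suppressed when its sequence is not greater than the last one stored for the same producer, even if the suppressed message itself was never stored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MessageId:
    connection_id: str
    session_id: int
    producer_id: int
    producer_sequence: int

    @property
    def producer_key(self) -> str:
        # Everything except the sequence number identifies the producer.
        return f"{self.connection_id}:{self.session_id}:{self.producer_id}"


def parse_message_id(raw: str) -> MessageId:
    """Split a message ID from the right (assumed layout, see lead-in)."""
    connection, session, producer, sequence = raw.rsplit(":", 3)
    return MessageId(connection, int(session), int(producer), int(sequence))


class NetworkProducerAudit:
    """Simplified model: remembers the highest stored sequence per producer."""

    def __init__(self) -> None:
        # Hypothetical stand-in for the per-producer sequence the broker
        # keeps; 0 means nothing has been stored for that producer yet.
        self.last_stored: Dict[str, int] = {}

    def accept(self, raw_id: str) -> bool:
        """Return True if the send is stored, False if it is suppressed."""
        msg = parse_message_id(raw_id)
        last = self.last_stored.get(msg.producer_key, 0)
        if msg.producer_sequence <= last:
            return False  # treated as a duplicate and dropped
        self.last_stored[msg.producer_key] = msg.producer_sequence
        return True


if __name__ == "__main__":
    audit = NetworkProducerAudit()
    # Hypothetical replay of the first log line: sequence 2 of this producer
    # is stored first, then sequence 1 arrives.
    print(audit.accept("ID:panda-41468-1473163942165-83:18887261:-1:1:2"))  # True
    print(audit.accept("ID:panda-41468-1473163942165-83:18887261:-1:1:1"))  # False
```

Note that the sequence-1 message in this replay is not a copy of anything already stored; the model drops it only because a later sequence from the same producer got there first.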
