Hi Frank,
All tests are done on Solaris with an x86 processor. The JDKs are
from Oracle:
* for the SNMP4J 2.3.0 and 2.3.1 tests, JDK 7u45 is used
* for the SNMP4J 2.0.1 problem at our customer, it's JDK 7u06
We'll test with the patch that you provided.
Thanks,
Peter.
On 27/10/2014 8:35, Frank Fock wrote:
Hi Peter,
Operarting system and JDK version (including whether it is a Oracle or
3rd party JRE) would be helpful
too.
I will analyze it. Meanwhile, you could try the following fix in
Snmp.java:
protected boolean resendRequest(PendingRequest request, PDU response) {
if (request.useNextPDU()) {
request.responseReceived = false;
synchronized (pendingRequests) {
pendingRequests.remove(request.key);
PduHandle holdKeyUntilResendDone = request.key;
request.key = null;
handleInternalResponse(response, request.pdu,
request.target.getAddress());
try {
sendMessage(request.pdu, request.target, request.transport,
request);
}
catch (IOException e) {
logger.error("IOException while resending request after RFC
5343 context engine ID discovery: " +
e.getMessage(), e);
}
// now the previous retry can be released
if (logger.isDebugEnabled()) {
logger.debug("Releasing PDU handle "+holdKeyUntilResendDone);
}
holdKeyUntilResendDone = null;
}
return true;
}
return false;
}
From my analysis, I thought this improvement would not have any effect
on your issue,
but I am not 100% sure. I will have to exeactly analyze the
request/response sequences
to be sure.
Best regards,
Frank
Am 27.10.2014 08:01, schrieb Peter Verthez:
Hi Frank,
We have now tried with both SNMP4J 2.3.0 and 2.3.1 and we're facing
the same problem.
Can you investigate this? Is there any more information we could
provide?
Best regards,
Peter.
On 24/10/2014 8:17, Peter Verthez wrote:
OK, thanks, we'll try that (we are on the latest version in our
latest release, but the customer is using an older release of our
product).
Best regards,
Peter.
On 24/10/2014 2:16, Frank Fock wrote:
Hi Peter,
Please update to the latest version. There have been several fixes
about race conditions
in the Snmp class regarding retries and the PduHandleCallback. The
current implementation
seems to be OK regarding handling of the Message IDs and the MPv3
Cache WeakHashMap.
Best regards,
Frank
Am 23.10.2014 12:34, schrieb Peter Verthez:
Hi Frank,
One of our customers has reported a problem with SNMPv3. The
version of our application that they use is using SNMP4J 2.0.1.
For the particular SNMP request that has the problem, the agent is
known to respond very slowly, so the timeout model for that
particular request has been set to:
- a first timeout after 30 seconds, then a retransmission
- a second timeout after 60 seconds (so 90 seconds after the
original request), then a retransmission
- a third timeout after 120 seconds (210 seconds after the
original request)
Now, we observe the following behavior (you can also see that in
the logs that I'm pasting below):
2014-10-22 10:42:33,428 => Original request
2014-10-22 10:43:03,430 => 1st retransmission
2014-10-22 10:43:37,444 => Response to original request (rejected)
2014-10-22 10:44:03,440 => 2nd retransmission
2014-10-22 10:44:41,466 => Response to 1st retransmission (rejected)
2014-10-22 10:45:45,539 => Response to 2nd retransmission (accepted)
So the problem that we see is that the first response (the
response to the original request) is not accepted for some
reason. This looks similar to the problem SFJ-43, but that was
fixed in SNMP4J 2.0.0 according to the change log. We are in the
process of trying to reproduce it on SNMP4J 2.3.0, but I see no
relevant change in the change log, so I guess there will be no
difference.
The logs are the following (with some sensitive information removed):
------------------------------------------------
2014-10-22 10:42:33,426 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.UsmUserTable] Adding
user <removed> = UsmUser
[secName=<removed>,authProtocol=1.3.6.1.6.3.10.1.1.3,authPassphrase=<removed>,privProtocol=1.3.6.1.6.3.10.1.2.2,privPassphrase=<removed>,localizationEngineID=null]
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.mp.MPv3] Context engine ID of
scoped PDU is empty! Setting it to authoritative engine ID:
80:00:02:7d:03:00:0e:86:1e:5f:9b
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.USM] getUser
(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, securityName=<removed>)
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.USM] RFC3414 §3.1.4.a
Outgoing message needs to be encrypted
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Preparing
decrypt_params.
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Preparing iv
for encryption.
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Using padding.
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Encryption
finished.
2014-10-22 10:42:33,427 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.mp.MPv3] Adding cache entry:
StateReference[msgID=1547681362,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM@897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
2014-10-22 10:42:33,428 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.Snmp] Running pending sync
request with handle PduHandle[1912160112] and retry count left 2
2014-10-22 10:42:33,428 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.transport.DefaultUdpTransportMapping]
Sending message to 124.150.10.52/161 with length 143:
30:81:8c:02:01:03:30:11:02:04:5c:3f:be:52:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:
00:0e:86:1e:5f:9b:02:01:05:02:03:00:f6:1c:04:09:31:40:72:61:6c:65:69:67:68:04:0c:aa:82:4f:0d:79:70:ab:03:54:1d:8d:07:04:08:
00:00:00:05:1a:64:0a:89:04:38:56:b3:24:74:26:e8:bc:d0:be:5c:f4:6e:59:d6:ac:2e:35:2d:01:fd:91:3f:be:0e:75:31:aa:96:4d:9b:36:ab:27:e7:65:e6:e3:18:55:
a6:1a:fc:be:cc:22:3c:92:47:6a:06:3b:e7:30:1a:16:2c
2014-10-22 10:43:03,430 DEBUG [Timer-30]-[org.snmp4j.mp.MPv3]
Adding cache entry:
StateReference[msgID=1547681458,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM@897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
2014-10-22 10:43:03,430 DEBUG [Timer-30]-[org.snmp4j.Snmp] Running
pending sync request with handle PduHandle[1912160112] and retry
count left 1
2014-10-22 10:43:03,430 DEBUG
[Timer-30]-[org.snmp4j.transport.DefaultUdpTransportMapping]
Sending message to 124.150.10.52/161 with length 143:
30:81:8c:02:01:03:30:11:02:04:5c:3f:be:b2:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:9b:
02:01:05:02:03:00:f6:3a:04:09:31:40:72:61:6c:65:69:67:68:04:0c:01:bd:0d:b8:20:96:05:00:f8:52:26:ac:04:08:00:00:00:05:1a:64:0a:
e6:04:38:76:37:c0:dd:e9:97:38:33:11:2d:3e:76:31:a8:67:2e:bb:dd:8b:cd:cc:5a:33:28:ec:53:6c:a2:ab:da:f7:5b:88:9d:98:59:f3:e9:ae:
2b:d5:12:16:2f:dd:4d:33:5d:c7:6e:19:14:65:f4:2c:60
2014-10-22 10:43:37,444 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping]
Received message from /124.150.10.52/161 with length 152:
30:82:00:94:02:01:03:30:11:02:04:5c:3f:be:52:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:
9b:02:01:05:02:03:00:f6:5d:04:09:31:40:72:61:6c:65:69:67:68:04:0c:e0:81:9d:7f:48:f7:12:2b:72:28:33:0c:04:08:08:35:08:35:08:35:
08:35:04:40:16:f0:c4:fd:8a:3a:e6:01:89:de:17:22:d3:75:95:69:9d:81:d6:f8:44:47:97:91:bf:d9:8c:cc:f1:fc:04:5e:47:84:1b:04:ae:37:
b0:3a:58:ec:2b:6f:c9:8f:5d:08:0c:e3:33:f1:7f:5b:ca:95:cf:c0:43:72:c6:9c:79:6c
2014-10-22 10:43:37,444 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
SNMPv3 header decoded: msgId=1547681362, msgMaxSize=65535,
msgFlags=07, secModel=3
2014-10-22 10:43:37,444 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM]
getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
securityName=<removed>)
2014-10-22 10:43:37,444 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable]
CheckTime: time ok (non authoritative)
2014-10-22 10:43:37,444 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
RFC3412 §7.2.10 - Received PDU (msgID=1547681362) is a response
or internal class message, but cached information for the msgID
could not be found
2014-10-22 10:43:37,444 WARN
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.MessageDispatcherImpl]
statusInfo=noError, status=-1409
2014-10-22 10:44:03,439 DEBUG [Timer-30]-[org.snmp4j.security.USM]
getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
securityName=<removed>)
2014-10-22 10:44:03,439 DEBUG [Timer-30]-[org.snmp4j.security.USM]
RFC3414 §3.1.4.a Outgoing message needs to be encrypted
2014-10-22 10:44:03,439 DEBUG
[Timer-30]-[org.snmp4j.security.PrivDES] Preparing decrypt_params.
2014-10-22 10:44:03,439 DEBUG
[Timer-30]-[org.snmp4j.security.PrivDES] Preparing iv for encryption.
2014-10-22 10:44:03,440 DEBUG
[Timer-30]-[org.snmp4j.security.PrivDES] Using padding.
2014-10-22 10:44:03,440 DEBUG
[Timer-30]-[org.snmp4j.security.PrivDES] Encryption finished.
2014-10-22 10:44:03,440 DEBUG [Timer-30]-[org.snmp4j.mp.MPv3]
Adding cache entry:
StateReference[msgID=1547681650,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM@897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
2014-10-22 10:44:03,440 DEBUG [Timer-30]-[org.snmp4j.Snmp] Running
pending sync request with handle PduHandle[1912160112] and retry
count left 0
2014-10-22 10:44:03,440 DEBUG
[Timer-30]-[org.snmp4j.transport.DefaultUdpTransportMapping]
Sending message to 124.150.10.52/161 with length 143:
30:81:8c:02:01:03:30:11:02:04:5c:3f:bf:72:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:9b:
02:01:05:02:03:00:f6:77:04:09:31:40:72:61:6c:65:69:67:68:04:0c:a7:9a:4e:c6:14:1f:c3:59:6d:7c:50:8d:04:08:00:00:00:05:1a:64:0b:
9d:04:38:da:3a:05:13:5d:80:b0:9c:fe:41:c5:92:f7:72:9c:9b:fc:01:d0:c3:85:40:b5:93:e2:ab:9c:a0:ca:3a:0a:cf:91:1b:73:e6:89:a3:11:
f2:59:c1:00:3c:ee:a7:07:aa:7d:9f:1e:7f:2e:07:7f:04
2014-10-22 10:44:04,109 DEBUG [Timer-30]-[org.snmp4j.Snmp]
PendingRequest canceled key=null, pdu=null, target=null,
transport=null, listener=null
2014-10-22 10:44:04,129 DEBUG [Timer-30]-[org.snmp4j.Snmp]
PendingRequest canceled key=null, pdu=null, target=null,
transport=null, listener=null
2014-10-22 10:44:04,129 DEBUG [Timer-30]-[org.snmp4j.Snmp]
PendingRequest canceled key=null, pdu=null, target=null,
transport=null, listener=null
2014-10-22 10:44:41,466 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping]
Received message from /124.150.10.52/161 with length 152:
30:82:00:94:02:01:03:30:11:02:04:5c:3f:be:b2:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:
9b:02:01:05:02:03:00:f6:9d:04:09:31:40:72:61:6c:65:69:67:68:04:0c:1a:d1:a1:f1:89:56:c8:cb:60:58:05:2d:04:08:54:4a:54:4a:54:4a:
54:4a:04:40:b1:0c:a4:55:56:4c:13:c6:73:a2:f5:70:19:68:83:e8:03:ac:b0:40:0c:c8:8d:c0:fc:7c:b6:b6:0f:81:d8:97:35:ea:d6:7e:6b:47:
0f:f7:c2:7d:f7:62:1e:14:c0:81:58:9e:22:ba:1d:4a:f2:cd:74:b6:18:b9:1e:93:9f:23
2014-10-22 10:44:41,466 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
SNMPv3 header decoded: msgId=1547681458, msgMaxSize=65535,
msgFlags=07, secModel=3
2014-10-22 10:44:41,466 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM]
getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
securityName=<removed>)
2014-10-22 10:44:41,466 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable]
CheckTime: time ok (non authoritative)
2014-10-22 10:44:41,466 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
RFC3412 §7.2.10 - Received PDU (msgID=1547681458) is a response
or internal class message, but cached information for the msgID
could not be found
2014-10-22 10:44:41,467 WARN
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.MessageDispatcherImpl]
statusInfo=noError, status=-1409
2014-10-22 10:45:45,539 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping]
Received message from /124.150.10.52/161 with length 152:
30:82:00:94:02:01:03:30:11:02:04:5c:3f:bf:72:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:
9b:02:01:05:02:03:00:f6:dd:04:09:31:40:72:61:6c:65:69:67:68:04:0c:63:92:b3:32:2d:43:06:89:11:1b:bf:41:04:08:79:1a:79:1a:79:1a:
79:1a:04:40:3e:29:b5:5b:77:09:68:17:85:b1:9a:dc:22:75:cd:6d:f6:d7:3d:6f:a0:0c:6e:85:c5:9c:78:92:97:52:08:e6:83:59:5c:e9:70:44:
3c:5a:94:6e:f2:3a:f3:19:3e:fa:ba:6c:4f:3e:ed:32:2f:02:de:ce:c3:95:46:0c:79:c4
2014-10-22 10:45:45,539 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
SNMPv3 header decoded: msgId=1547681650, msgMaxSize=65535,
msgFlags=07, secModel=3
2014-10-22 10:45:45,540 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM]
getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
securityName=<removed>)
2014-10-22 10:45:45,540 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable]
CheckTime: time ok (non authoritative)
2014-10-22 10:45:45,540 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
Removed cache entry:
StateReference[msgID=1547681650,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM@897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
2014-10-22 10:45:45,540 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
RFC3412 §7.2.10 - Received PDU (msgID=1547681650) is a response
or an internal class message. PduHandle.transactionID = 1912160112
2014-10-22 10:45:45,540 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
MPv3 finished
2014-10-22 10:45:45,540 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.Snmp]
Looking up pending request with handle PduHandle[1912160112]
2014-10-22 10:45:45,540 DEBUG
[http-0.0.0.0-8080-30]-[org.snmp4j.Snmp] Removed pending request
with handle: PduHandle[1912160112]
----------------------------------
As you can see, the first response is rejected with:
2014-10-22 10:43:37,444 DEBUG
[DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
RFC3412 §7.2.10 - Received PDU (msgID=1547681362) is a response
or internal class message, but cached information for the msgID
could not be found
However, I don't see a logging line telling that the entry for
PDUHandle 1912160112 has been removed from the MPv3 cache before
that line. I would expect:
- either the logging line from Cache.popEntry ("Removed cache
entry: ...")
- or the error line from Snmp.run, where Cache.deleteEntry is
eventually called ("Failed to send SNMP message to ...")
The entry is only explicitly removed at the very end, after the
response to the second retransmission.
The only thing I can see (but I could be completely wrong about
this) is that the MPv3 Cache uses a WeakHashMap, so if there would
be no other references to those PduHandles, the GC could remove
this. The longer the timeout is, the bigger the chance that a GC
will step in and remove the entry. But I don't have a view on
the overall design: maybe there are references elsewhere.
Could you investigate this issue?
Thanks,
Peter.
--
Peter Verthez
Systems Engineer Network Mgt.
Tel: (+32) 3 240 84 50 | Alcanet:
Fax: (+32) 3 240 84 59 | (6)2605
Alcatel-Lucent Bell NV
Copernicuslaan 50, 2018 Antwerp, Belgium
Fortis 220-0002334-42
VAT BE 0404 621 642 Register of Legal Entities Antwerp
***
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited without the prior consent of its
author.
_______________________________________________
SNMP4J mailing list
[email protected]
https://oosnmp.net/mailman/listinfo/snmp4j