Hi, 

We use snmp4j in one of our applications and while tracking down an issue 
reported by our customer, I came across some code in snmp4j that I believe may 
be thread-unsafe and that MIGHT occasionally be causing a device to be flooded 
with too many GET requests.

Code snippet as follows
In org.snmp4j.Snmp.java, org.snmp4j.Snmp.ReportProcessor class

1294 public void processReport(PduHandle handle, CommandResponderEvent e) {
1295 PDU pdu = e.getPDU();
1296 logger.debug("Searching pending request with handle" + handle);
..
..
...
...
...
..
1313 boolean resend = false;
1314 if (request.requestStatus < request.maxRequestStatus) {
1315 switch (request.requestStatus) {
1316 case 0:
1317 if (SnmpConstants.usmStatsUnknownEngineIDs.equals(firstOID)) {
1318        resend = true;
1319 }
1320 else if (SnmpConstants.usmStatsNotInTimeWindows.equals(firstOID)) {
1321        request.requestStatus++;
1322        resend = true;
1323 }
1324 break;
1325 case 1:
1326 if (SnmpConstants.usmStatsNotInTimeWindows.equals(firstOID)) {
1327        resend = true;
1328 }
1329 break;
1330 }
1331 }
1332 // if legal report PDU received, then resend request
1333 if (resend) {
1334 logger.debug("Send new request after report.");
1335 request.requestStatus++;
1336 try {
1337        // We need no callback here because we already have an equivalent
1338        // handle registered.
1339 PduHandle resentHandle =
1340 sendMessage(request.pdu, request.target, e.getTransportMapping(),
1341 null);
1342 // make sure reference to handle is hold until request is finished,
1343 // because otherwise cache information may get lost (WeakHashMap)
1344 request.key = resentHandle;
1345 }
1346 catch (IOException iox) {
1347        logger.error("Failed to send message to " + request.target + ": " +
1348        iox.getMessage());
1349        return;
1350 }
1351 }
1352 else {
..
....

The value of the resend boolean is based on the value of requestStatus. But the 
use of request.requestStatus++ is not threadsafe in this method above - because 
a) the ++ operation is not atomic on a primitive integer and 
b) the value is read on one line and modified on another.
If multiple threads enter the processReport() method in order to process a 
report for the same request object, they may end up seeing the cached value of 
0 over and over again and resending the request several times until the 
request.requestStatus object value finally hits a memory barrier and the real 
value is flushed.
Also, since the value of requestStatus is read first on line 1314, and then 
incremented either on line 1335, line 1321, if multiple threads got in here for 
the same request object, you wouldn't get atomic behavior.

The customer reported a scenario (and provided a wireshark capture to prove it) 
that showed the exact same SNMP GET request (with the same requestID) going out 
to a particular snmp device approximately 180 times in ~9 seconds. We are using 
timeout=3000 and retries = 2 - so we expect to see 3 requests in 9 seconds with 
the same request ID. So in the span of approximately 20 milliseconds, we see 
about 60 identical GET requests are being issued with the same requestID. I can 
provide the pcap file showing this behavior if necessary. Also, these GETs are 
interspersed with many REPORTs coming in from the device so this situation may 
have been induced by a bug on the agent - however I'm writing to check whether 
the code should protect itself from such agent bugs. Perhaps I am wrong and 
this code is synchronized somewhere at a higher layer so that a REPORT for a 
request with a particular handle can only be processed on a single thread at a 
time.

Thanks for any assistance / clarification in advance!

Syed F. Ali


_______________________________________________
SNMP4J mailing list
[email protected]
http://lists.agentpp.org/mailman/listinfo/snmp4j

Reply via email to