Re: using net-snmp at scale with one process and one socket

2021-11-11 Thread Neil Mckee
I'm happy to share the changes I made to the library,  but they were
minimal.  Really just:

(a)  adding this hook to snmp_api.c to allow response packets to be
fed back in...

// add hook to allow this to be called from outside
void snmp_sess_receive_packet(void *sessp, u_char *packet, int length,
struct sockaddr *from, int from_len)
{
  struct session_list *slp = (struct session_list *) sessp;
  if(slp) {
netsnmp_session *sp = slp->session;
struct snmp_internal_session *isp = slp->internal;
netsnmp_transport *transport = slp->transport;
if(transport && transport->sock > 0 && sp && isp && isp->requests) {
  // The opaque is expected to be malloc'd by the transport
receive fn
  // because it is then freed by _sess_process_packet() later.
  void *from2 = malloc(from_len);
  memcpy(from2, from, from_len);
  int rc = _sess_process_packet(sessp, sp, isp, transport, from2,
from_len, packet, length);
  if(rc && sp->s_snmp_errno) SET_SNMP_ERROR(sp->s_snmp_errno);
}
  }
}

(b) In snmpusm.c,  declaring usm_build_probe_pdu() as a global
function (by removing "static").

(c) reworking snmpusm.c to use a hash-table of lists,  instead of just
one (struct usmUser *) list.  This was a bigger change,  but it was
also a _breaking_ change because there is an SNMP v3 agent table that
expects a certain ordering here.  I think the right compromise might
be to use a tree-structure so that lookups are efficient while
ordering is preserved,  or better still just use a hash table and
change the agent code to comb through the table any time it wants a
GET_NEXT.  But either way that is a big change for such an obscure
benefit,  and extra care would need to be taken - especially in the
presence of multiple threads.

Unfortunately the other code I wrote to interface between the rest of
the application and the library is too intertwined with product code
to share just as it is.  But I'd be happy to discuss choices if
someone were to look at this.  For now I can share this psuedo code
outline below, which is probably more helpful anyway.

Notes:
(1)  I used a red-black-tree for the retry-timeout collection when it
might have been more appropriate to use a binary-heap(?)
(2)  I had to maintain separate hash lookups keyed by reqid and msgid
so that processing SNMPv3 responses could work the same way as for
SNMPv2.  For v3 responses the msgid appears unencrypted, while the
reqid only emerges after decryption is successful.
(3) Since a response might come in for a request that has already been
retried the request-pdu needs to keep the list of all attempts made so
far. Those requests are kept in an ordered list by the pdu and only
deleted when the pdu life-cycle is complete.
(4) It might have worked better to make the Request object be the
callback magic pointer, rather than the more persistent "Target"
object,  which then needs yet another hash table lookup to get to the
original request that matches the pdu.  I think this was to make sure
it was OK to delete a request at any time,  but in practice the
lifecycle is clear so maybe there is one indirection too many in this
scheme.

The cast of characters here are:
snmp - one persistent object representing this wrapper
  \_target - one mostly-persistent object for each remote target IP
  \_pdu - query and response objects
 |_request - individual attempt

// the callback function given to the library.  Called after
snmp_timeout() or snmp_sess_receive_packet()
SNMP_callback(int op, netsnmp_session *ss, int reqid, netsnmp_pdu
*pdu, void *magic) {
  target = (target)magic;
  req = removeFromHashTable(target->requestsByReqid, pdu->reqid)
  // process Pdu/error and call back to client
  // now we can finally clean up
  for(req in pdu->requests) {
 // make sure req is removed from all 5 collections:
 removeFromList(pdu->requests, req)
 removeFromTree(snmp->timeoutTree, req)
 removeFromHashTable(target->requestsByReqid, pdu->reqid)
 removeFromHashTable(snmp->requestsByReqId, req->reqid)
 removeFromHashTable(snmp->requestsByMsgId, req->msgid)
 delete(req)
  }
}

// sending a request
SNMP_sendRequest(pdu, target) {
  long reqid = snmp_sess_async_send(target->sessp, pdu, SNMP_callback, target)
  req = newRequest(target, pdu, reqid)
  // req goes in up to 5 collections:
  addToList(pdu->requests,req)
  addToHashTable(target->requestsByReqId, reqid)
  addToTree(snmp->timeoutTree, req)
  addToHashTable(snmp->requestsByReqId, reqid, req);
  if(pdu->version == 3)
 addToHashTable(snmp->requestsByMsgId, pdu->msgid, req);
}

// this is called every 100mS or so
SNMP_timeout(snmp) {
 while((req = timedOutRequest(snmp)) != NULL) {
removeReqFromTimeoutTree(req);
snmp_sess_timeout(req->target->sessp);
 }
}

// this is called when a response packet appears on the socket as a
result of select() or poll()
SNMP_response(snmp, socket, msg) {
 Request req=NULL;
 long version=0,msgid=0,reqid=0;
 

Re: using net-snmp at scale with one process and one socket

2021-10-22 Thread Wes Hardaker via Net-snmp-coders
Neil Mckee  writes:

> You are very welcome. Let me know if you want to discuss any of the
> particulars.

Well, the biggest question in my mind is: are you willing to contribute
the patches back?

It's unclear if any of them break backwards compatability (which we
always strive for), but certainly many of them would not.

> Adding or importing your favorite tree implementation to the library
> would be a lot of new code

Well, there's one issue: lots of new code means bigger applications and
we do strive to be small to run on small devices.  That being said, the
sliding definition of "small" has certainly increased in size over time
with the continued increase of storage/memory availability in even the
smallest devices.


-- 
Wes Hardaker
Please mail all replies to net-snmp-coders@lists.sourceforge.net


___
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders


Re: using net-snmp at scale with one process and one socket

2021-10-18 Thread Neil Mckee
You are very welcome. Let me know if you want to discuss any of the
particulars.  For example, it seems like snmpusm.c is also trying to
please the SNMPv3 user MIB by getting everything in the right order
for GETNEXT there.  So my use of a hash-table breaks that,  but that's
OK for me because I am only using the library as a client.  I did
wonder if using something like a red-black tree with a carefully
crafted sort function would allow efficient lookup without breaking
the ordering,  but it was hard to make the logic turn out exactly the
same as it is now (e.g. I think it would require great care to make
sure an entry is removed and added again if it is ever modified in
situ).

Adding or importing your favorite tree implementation to the library
would be a lot of new code,  but could also be used to store
outstanding requests sorted by monotonic-clock "due" time,  which
helps to streamline the timeout mechanism when there are
hundreds/thousands of concurrent requests.

On balance,  however,  it doesn't seem like there is a case for
rocking the boat - especially when most client use-cases (perl,
python, snmpwalk etc.) make only one request in the life cycle of a
session. Plus it looks like you now have to consider multiple threads
too, so using more complex data-structures is hard.  I guess if you
decided to carve out a single-session, single-threaded client API then
it would be more tunable, but as long as the package is also an agent
and is also multi-threaded then it seems like your hands are tied? I
certainly won't object if nothing changes.


On Mon, Oct 18, 2021 at 10:30 AM Wes Hardaker
 wrote:
>
> Neil Mckee  writes:
>
> > This may have been covered many times before,  but just in case it
> > helps someone, here is a summary of my experience with using net-snmp
> > in a large network.
>
> Thank you for the very useful list of changes.
> --
> Wes Hardaker
> Please mail all replies to net-snmp-coders@lists.sourceforge.net


___
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders


Re: using net-snmp at scale with one process and one socket

2021-10-18 Thread Wes Hardaker via Net-snmp-coders
Neil Mckee  writes:

> This may have been covered many times before,  but just in case it
> helps someone, here is a summary of my experience with using net-snmp
> in a large network.

Thank you for the very useful list of changes.
-- 
Wes Hardaker
Please mail all replies to net-snmp-coders@lists.sourceforge.net


___
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders


using net-snmp at scale with one process and one socket

2021-09-16 Thread Neil Mckee
This may have been covered many times before,  but just in case it
helps someone, here is a summary of my experience with using net-snmp
in a large network.  Specifically, I use it to talk to thousands of
SNMPv2 and SNMPv3 agents using just one UDP socket in a process that
runs continuously.

The net-snmp library uses linear linked lists for sessions, requests
and usmuser data. That would normally be OK, but does not scale so
well for this use-case.  So I had to make a few small changes:

(1) I still create a separate netsnmp-session for each remote agent
IP,  but force them all to use the same UDP socket.
(2) I turn off the v3 engine-id probe hooks so I can send and process
those requests manually.
(3) I do my own select() using hash tables keyed by reqid and msgid to
map back to the right netsnmp-session. There is no blocking.
(4) I do my own timeout check using a time-sorted tree.
(5) I changed the library to expose this function so I can feed
response PDUs back to the library directly:

extern void snmp_sess_receive_packet(void *sessp, u_char *packet, int
length, struct sockaddr *from, int from_len);

(6) I exposed this function to help construct engine-id probes:

extern int usm_build_probe_pdu(netsnmp_pdu **pdu);

(6) I changed the userList in snmplib/snmpusm.c to be a hash-table of
userLists keyed by engine-id (but the logic within each list is
otherwise unchanged).

These were the smallest changes I could make.  There don't seem to be
any significant bottlenecks now.

Neil

P.S. I was also tempted to set SNMPV3_IGNORE_UNAUTH_REPORTS=1 because
otherwise the library delivers an unexpected extra callback after a
failed engine-id probe,  but it seemed better to harden the logic to
handle unexpected callbacks gracefully.


___
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders