Hi, I'll try to digest the rest of your message more properly, but this is just my initial feedback:
> So to cut to the chase.  Based on my testing, on short term your
> troubles should go away by increasing the number of this define in
> tcpset.h
>
> #define TCPSET_MAX 50
>
> Make this something in the order of the number of zones you are
> adding at once.  I'd stay a bit away from 1024 as to allow for
> the signerd to have some room for other file descriptors.  So
> I'd advice maybe 500 to 900.

My initial reaction is that this will make OpenDNSSEC into the local
network villain, and will create a "thundering herd" of TCP
connections on startup, possibly overwhelming the upstream auth name
server (which in our case also does other things than feed OpenDNSSEC
its zones(!)).

Instead I'd rather like to see OpenDNSSEC properly pace itself in its
interaction with its surroundings: there really is no need to start
300-500 parallel TCP sessions on startup if it is only configured to
have two worker threads!  Even on a relatively beefy machine it takes
its own sweet time "configuring" and "reading" all the zone files on
startup, so why the rush?  And I'm quite certain that in total, doing
this paced instead of with 300-500 parallel sessions isn't actually
going to be slower, possibly quite the opposite, and will additionally
create a lot less stress on its surroundings.

If the signer wants to read a zone file, and the zone file isn't
there, it should do a zone transfer if configured to do so, and wait
for it to complete before proceeding, instead of "retrying".  If
there's no connection slot available, it should take a place in the
queue, and not simply declare "input adapter failed", *not* initiate a
zone transfer, and then spin around ever more slowly trying to read a
zone file which won't be there until a zone transfer is actually
attempted.  (That's what the behaviour looks like from reading the log
files, and it seems an awfully clumsy way to go about these things...)

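
To make the "take a place in the queue" idea a bit more concrete,
here is a minimal sketch in C. It is not taken from the OpenDNSSEC
sources: every name in it (xfr_queue, xfr_request, xfr_complete,
MAX_PARALLEL_XFR, MAX_PENDING) is hypothetical, and the cap and queue
size are assumed values. It only shows the bookkeeping: a request
either gets a free slot or waits in FIFO order, and a finished
transfer hands its slot to the oldest waiting zone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; none of these names exist in OpenDNSSEC. */
#define MAX_PARALLEL_XFR 4    /* assumed cap on concurrent zone transfers */
#define MAX_PENDING      1024 /* assumed capacity of the wait queue */

typedef struct {
    int    pending[MAX_PENDING]; /* FIFO ring buffer of waiting zone ids */
    size_t head;                 /* index of the oldest waiting zone */
    size_t count;                /* number of zones waiting for a slot */
    size_t in_flight;            /* transfers currently running */
    size_t peak;                 /* highest in_flight value seen so far */
} xfr_queue;

void xfr_queue_init(xfr_queue *q)
{
    assert(q != NULL);
    q->head = q->count = q->in_flight = q->peak = 0;
}

/* Ask for a transfer of `zone`.
 * Returns 1 if a slot was free and the transfer may start now,
 *         0 if the zone was queued behind others,
 *        -1 if the wait queue itself is full. */
int xfr_request(xfr_queue *q, int zone)
{
    assert(q != NULL);
    if (q->in_flight < MAX_PARALLEL_XFR) {
        q->in_flight++;
        if (q->in_flight > q->peak)
            q->peak = q->in_flight;
        return 1;
    }
    if (q->count == MAX_PENDING)
        return -1;
    q->pending[(q->head + q->count) % MAX_PENDING] = zone;
    q->count++;
    return 0;
}

/* Report that one running transfer has finished.
 * Returns the id of the queued zone that takes over the freed slot,
 * or -1 if nothing was waiting (or nothing was running). */
int xfr_complete(xfr_queue *q)
{
    int zone;

    assert(q != NULL);
    if (q->in_flight == 0)
        return -1;
    q->in_flight--;
    if (q->count == 0)
        return -1;
    zone = q->pending[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    q->in_flight++; /* the waiting zone starts in the freed slot */
    return zone;
}
```

With 368 requests made back to back, the first 4 start immediately and
the other 364 wait; each xfr_complete() then starts exactly one
waiting zone, so in this sketch the upstream server never sees more
than MAX_PARALLEL_XFR sessions at a time.
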
Instead of the signer trying 300-500 parallel zone transfers, I think
a more reasonable behaviour would be to do no more than 4 parallel
zone transfers(!)

But ... this partly depends on what you actually mean when you say
"the number of zones you are adding at once".  I can think of a
couple of interpretations:

1) The number of zones configured where there's no cached file on
   disk, i.e. the number of configured zones.  When I get the dreaded
   "soamin not set" assertion (which unhelpfully doesn't say *which*
   zone or which file triggers the error condition; possibly I'll
   take a look at fixing that), I as an operator have no other
   recourse than to remove all the cached files.  So am I then
   "adding" the number of configured zones (in my case, at present,
   368)?

2) The number of zones added in one batch with "zone add", counted
   when you do the corresponding "update zonelist"?  In our situation
   this is usually quite modest, except for the round 2-3 weeks ago
   when I added around 300 in one go.

Or perhaps "both 1 and 2"?

I'll need to look in more detail at the suggested patches and the
other comments.

Regards,

- Håvard
