| From: Andrew Cagney <[email protected]>
| > Clearly the main thread needs the MD most of the time. But probably
| > not during "suspension" of a state. That's when the worker could have
| > ownership. I'm guessing that the only worker-access needed is for
| > encryption/authentication of the packet itself.
|
| Yes, the state transition code uses MD (as a parameter). It doesn't
| need to be stuffing it in st_suspended_md or conversely having state
| stuffed into md.st.

Historically, md has a pointer to the current struct state. I know
you've changed this, but I'm not sure why that's a win. The idea was
that md would carry all that was needed by a State Transition Function.
It is true that the rest of the msg_digest was the sliced and diced
packet and that the state object is different.

The state pointer in md was kind of cheap: it did not "own" the state,
so freeing the md did not require freeing the state. On the other hand,
if the state were freed (killed), the md needed to be deleted because
nothing else "owned" the md.

If a state transition is suspended, the md (if any) needs to be
preserved and remembered: for restarting the state transition and for
not leaking (kind of related). The md is preserved by nulling out a
pointer to it in the framework so that the framework won't free it at
the end of the (incomplete) state transition. It is remembered by being
placed in st_suspended_md. So if a state object is killed, it is easy
to find any md attached to it by suspension. The maintenance of the
md's pointer to the state falls easily out of that.
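Spelled out, the ownership dance is roughly the following -- a
from-memory sketch, not the actual pluto code; suspend_md() and
resume_md() are made-up names, and only st_suspended_md is meant to
match the source:

struct msg_digest;      /* opaque for this sketch */

struct state {
        struct msg_digest *st_suspended_md;  /* owned while suspended */
        /* ... */
};

/*
 * Suspend: the state takes ownership of the md.  The caller's pointer
 * is nulled so the framework won't free the md at the end of the
 * (incomplete) state transition.
 */
static void suspend_md(struct state *st, struct msg_digest **mdp)
{
        st->st_suspended_md = *mdp;
        *mdp = NULL;
}

/*
 * Resume: ownership goes back to the framework, which frees the md
 * when the restarted transition completes.
 */
static struct msg_digest *resume_md(struct state *st)
{
        struct msg_digest *md = st->st_suspended_md;
        st->st_suspended_md = NULL;
        return md;
}

Killing a suspended state then just has to release whatever resume_md()
hands back, since nothing else owns the md.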
| My current hunch is to get pluto's responsiveness up, we'll be trying
| to off-load more and more work. That would mean that pretty much
| every state transition will consist of:
|     state->in-progress->new-state
| while strictly speaking this should be implemented as separate states,
| adding all the extra states will get so tedious and redundant that
| we'll end up accepting the status quo.

I don't know what you are proposing. It sounds like a path of most
resistance. Most of the work that is not already off-loaded is really
modest, as far as I know.

One very important thing is to get Pluto to behave with grace under
load. I don't know if it does. That's probably more important than
handling a larger load.

I'm not aware of (haven't been paying attention to?) any reasonable
performance analysis of Pluto. We do have hunches. Hunches are useful
mostly for directing measurements. There is no point in redesigns for
performance without measurements.

Now onto my hunches.

The event loop is fundamentally single-threaded. The code depends on
variables not changing by outside forces while it is running. It is as
if a lock were excluding all other actors while an event is being
processed. That seriously simplifies the code. If the event-loop
processing is the bottleneck, and it needs to be threaded, most of the
code needs to be rewritten.

There are current exceptions. Threads or processes are used for PAM,
for crypto, and for parts of DNS lookup. I don't know whether
certificate validation is threaded, but it should be.

What are other operations that take a bunch of time? Something that
takes CPU time could be a little different from something that takes
real time. Multiple CPUs are needed to improve CPU-time consumers,
whereas multitasking can deal with real-time consumers.

Some parts of Pluto were designed for light duty. No need to
overengineer. But now, many years later, some of those mechanisms may
be bottlenecks.

You sped up serial number to state mapping. No longer a linear search.
That mechanism wasn't intended to be used much. But: are the uses that
caused performance issues really reasonable? I don't know.

Scanning all existing states is linear. There seem to be a fair number
of cases where that is used. Can that be sped up by data structures
that let us only look at the relevant states?
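To make that last question concrete, this is the sort of thing I mean.
It is purely a sketch, nothing like the real code: the struct, the
bucket count, and the choice of key (an IKE SPI here) are all just for
illustration.

#include <stdint.h>
#include <stddef.h>

#define STATE_HASH_BUCKETS 512

struct state_entry {
        uint64_t ike_spi;               /* the key we search on */
        struct state_entry *next;       /* chain within a bucket */
        /* ... the rest of the state ... */
};

static struct state_entry *state_hash[STATE_HASH_BUCKETS];

static size_t bucket_of(uint64_t spi)
{
        return (size_t)(spi % STATE_HASH_BUCKETS);
}

/* expected O(1): visits one bucket instead of every state */
static struct state_entry *find_state_by_spi(uint64_t spi)
{
        struct state_entry *e;
        for (e = state_hash[bucket_of(spi)]; e != NULL; e = e->next) {
                if (e->ike_spi == spi)
                        return e;
        }
        return NULL;
}

Whether that is worth doing depends on whether the scans that hurt
really need to visit unrelated states at all -- again, a question for
measurement.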
Performance measurement showed that sanitizing the log messages took a
non-trivial amount of time. You ditched the sanitization. It probably
wasn't necessary, but it was part of defense in depth. Alternatively,
we could have found out why it was so slow. My guess: something to do
with locales, something that should have been easy to cut out of the
processing.

If sanitizing the log messages took noticeable time, we were doing too
much logging -- something expensive in itself.

| However, one thing that I do think should go is that extra code path
| handling STF_INLINE. When there are no threads, use the event loop to
| offload the work.

I don't remember that clearly, but I think that's exactly what
STF_INLINE is actually doing.

My hunch is that marshalling and demarshalling data eats a surprising
amount of processing. That needs to be backed up by measurement.
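Even crude instrumentation would settle that kind of hunch. Something
along these lines around the suspected hot spots would do; this is just
a sketch using plain POSIX clock_gettime(), nothing pluto-specific:

#include <stdio.h>
#include <time.h>

static double now_seconds(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Wrap the suspected hot spot -- here a stand-in for marshalling an
 * outgoing packet -- and log what it actually cost.
 */
static void marshal_packet_timed(void)
{
        double t0 = now_seconds();
        /* ... marshal the packet ... */
        fprintf(stderr, "marshalling took %.6f seconds\n",
                now_seconds() - t0);
}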
I've got to go, so this message is half-baked.

_______________________________________________
Swan-dev mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan-dev