Andy Polyakov wrote:
>>>> FYI "pushfl; popfl" is used to clear a bit in eflags that says whether
>>>> or not to reload the encryption key from memory. For now I always force
>>>> the reloading. Probably it could be more optimized later.
>>>
>>> [...]
>>> Which in turn means that as long as you *know* that no execution thread
>>> interleaves several contexts, pushfl;popfl is redundant. Right?
>>
>> I'd say it's not uncommon for apps to have different keys (i.e.
>> contexts) for dataflows in different directions. E.g. one key for
>> incoming and one for outgoing data. Right?
>
> Oh come on:-) My question essentially was hypothetical. I posed it
> trying to figure out how the thing works. It doesn't help me if you ask
> another question back as I still don't know the answer:-) But never mind...

Sorry :-) The flag eflags[30] gets cleared on *any* load of eflags from
the stack, i.e. also on every context switch and task switch. It is *very*
likely that the flag is already cleared when the PadLock engine is called,
because either a syscall or a task switch has happened since its last use
(would any interrupt probably do it as well?). So the pushf/popf is
almost always redundant anyway.

>> For now I'd leave it as it is and would add the comparison to the last
>> context used later.
>
> Sure! Just one thing to remember for that later occasion. If you choose
> to track context through a global variable (as opposed to per-thread
> storage) make sure it's thread-safe, i.e. stick to atomic lock;cmpxchg
> or something similar.

Not needed because of the above. IMHO a thread switch is much like a task
switch from this point of view.

>>> Get rid of htonl. It surely mapped to inline asm on Linux, but what
>>> about other platforms? Just implement your own inline bswapl.
>>
>> If it wasn't inlined - who cares? It is only used in key generation,
>> i.e. not very often. But I can add padlock_htonl() if you want.
>
> I do, so yes, please.

But why?

> As for the name. One can actually argue if it
> htonl or ntohl. To me the latter feels more appropriate actually:-) So
> let's call it padlock_bswapl...

OK ;-)

>>> In my opinion aligning code abuses malloc. I'd recommend to declare an
>>> automatic buffer of say 1KB and use it as temporary aligned storage
>>> whenever nbytes is less than size of this automatic buffer, and fall to
>>> malloc only if it's larger. Once you have the code in place, see how
>>> this buffer size affects the benchmark.
>>
>> For small blocks it may be worth the effort, yes. I'll make it 1.5kB so
>> that an ethernet packet fits there.
>
> After extra consideration I'd even say that it's not necessarily
> appropriate to use malloc *at all*. Indeed, imagine user says "I'd like
> to encrypt 1GB," would you malloc it too?

Indeed ;-) Well, seriously - it's a bad idea, agreed.

> It's probably more appropriate
> to stick *exclusively* to automatic buffer and if user asks for more
> than this buffer size, just reuse it in loop. In which case(!) it would
> be safe *not* to pushfl;popfl, right?

Yes. One would have to store intermediate IVs, but that is necessary anyway.

> As for 1.5KB to accommodate an ethernet packet. Well, who says that
> encryption is performed on ethernet frames? At least in SSL case it's
> actually performed on larger chunks. I personally wouldn't get fixated
> on any size in particular, but simply vary this automatic buffer size
> while benchmarking and figure out which minimal size (but probably not
> larger than pagesize) doesn't harm performance too much.

OK, I'll prepare it and will run some benchmarks. I just don't know what
kind of benchmarks to run...?

>> Do you agree with using alloca()?
>
> Let's put it this way. If nobody else objects, I'd leave this choice up
> to you. I.e. whether or not you choose to
> aligned=alloca((nbytes<OPTIMAL_SIZE?nbytes:OPTIMAL_SIZE)+16) or simply
> declare aligned[OPTIMAL_SIZE+16]. How does it sound? Cheers. A.

The question now is how to find the optimal size. It will be different for
interactive SSH, for HTTPS, ... I think the other overhead is so large that
fiddling with bytes for the optimal_size wouldn't gain much. Indeed,
for PadLock the bigger the better...

Michal Ludvig
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
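
For reference, the "automatic buffer reused in a loop" idea from the thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not the engine code: OPTIMAL_SIZE is given an arbitrary value of 512 that would have to be tuned by benchmarking, process_chunked() and the aligned_cipher_fn callback are hypothetical names, and a trivial XOR routine stands in for the PadLock call so the example runs anywhere. Any IV carried between chunks would have to live in whatever ctx points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPTIMAL_SIZE 512  /* hypothetical value; keep it a multiple of 16 */
#define ALIGNMENT    16   /* alignment the hardware is assumed to require */

/* Hypothetical stand-in for the hardware call: transforms n bytes in place.
 * buf is guaranteed to be ALIGNMENT-aligned; ctx carries key/IV state. */
typedef void (*aligned_cipher_fn)(unsigned char *buf, size_t n, void *ctx);

/* Run fn over nbytes of input through one aligned automatic buffer,
 * reusing the buffer for every chunk instead of calling malloc(). */
static void process_chunked(const unsigned char *in, unsigned char *out,
                            size_t nbytes, aligned_cipher_fn fn, void *ctx)
{
    unsigned char raw[OPTIMAL_SIZE + ALIGNMENT];
    /* Round the start of raw[] up to the next ALIGNMENT boundary. */
    size_t skip = (ALIGNMENT - ((uintptr_t)raw & (ALIGNMENT - 1)))
                  & (ALIGNMENT - 1);
    unsigned char *aligned = raw + skip;

    while (nbytes > 0) {
        size_t chunk = nbytes < OPTIMAL_SIZE ? nbytes : OPTIMAL_SIZE;

        memcpy(aligned, in, chunk);   /* unaligned input -> aligned buffer */
        fn(aligned, chunk, ctx);      /* transform in place */
        memcpy(out, aligned, chunk);  /* aligned buffer -> caller's output */

        in += chunk;
        out += chunk;
        nbytes -= chunk;
    }
}

/* Stand-in "cipher" (NOT PadLock, NOT secure): XOR with a one-byte key. */
static void xor_stand_in(unsigned char *buf, size_t n, void *ctx)
{
    unsigned char key = *(unsigned char *)ctx;
    size_t i;

    assert(((uintptr_t)buf & (ALIGNMENT - 1)) == 0);  /* alignment holds */
    for (i = 0; i < n; i++)
        buf[i] ^= key;
}
```

A caller would invoke process_chunked(in, out, nbytes, xor_stand_in, &key) with any nbytes. Stack use stays fixed at OPTIMAL_SIZE+16 bytes however large the request is, which is the point of avoiding malloc for the 1GB case.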
