Andy Polyakov wrote:
>>>> FYI "pushfl; popfl" is used to clear a bit in eflags that says whether
>>>> or not to reload the encryption key from memory. For now I always force
>>>> the reloading. Probably it could be more optimized later.
>>>
>>> [...]
>>> Which in turn means that as long as you *know* that no execution thread
>>> interleaves several contexts, pushfl;popfl is redundant. Right?
>>
>>
>> I'd say it's not uncommon for apps to have different keys (i.e.
>> contexts) for dataflows in different directions. E.g. one key for
>> incoming and one for outgoing data. Right?
> 
> 
> Oh come on:-) My question essentially was hypothetical. I posed it
> trying to figure out how the thing works. It doesn't help me if you ask
> another question back as I still don't know the answer:-) But never mind...

Sorry :-)
The flag eflags[30] gets cleared on *any* load of eflags from the stack,
i.e. also on every context switch and task switch. It is *very* likely
that the flag is already cleared when the PadLock engine is called,
because either a syscall or a task switch has happened since its last
use (presumably any interrupt would do it as well?). So the pushf/popf
is almost always redundant anyway.

>> For now I'd leave it as it is and would add the comparison with the
>> last context used later.
> 
> 
> Sure! Just one thing to remember for that later occasion. If you choose
> to track context through a global variable (as opposed to per-thread
> storage) make sure it's thread-safe, i.e. stick to atomic lock;cmpxchg
> or something similar.

Not needed because of the above. IMHO a thread switch is much like a
task switch from this point of view.
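
If the comparison with the last context is added later, per-thread storage would avoid the locking question altogether. A minimal sketch, assuming C11 `_Thread_local` is available; the names `padlock_last_ctx`, `padlock_select_ctx` and `padlock_force_key_reload` are hypothetical and not part of the current patch, and the counter only stands in for the real pushfl;popfl:

```c
#include <stddef.h>

/* Hypothetical: the key context this thread used last. */
static _Thread_local const void *padlock_last_ctx = NULL;

/* Illustration only: counts how often a reload was forced. */
static _Thread_local unsigned padlock_forced_reloads = 0;

/* Placeholder: real code would execute pushfl;popfl here
 * to clear eflags[30] and make the engine reload the key. */
static void padlock_force_key_reload(void)
{
    padlock_forced_reloads++;
}

/* Force a key reload only when this thread switches contexts. */
static void padlock_select_ctx(const void *ctx)
{
    if (ctx != padlock_last_ctx) {
        padlock_force_key_reload();
        padlock_last_ctx = ctx;
    }
}
```

One caveat with this approach: comparing pointers does not notice a context that is re-keyed in place, so the key setup path would also have to reset `padlock_last_ctx`.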

>>> Get rid of htonl. It is surely mapped to inline asm on Linux, but what
>>> about other platforms? Just implement your own inline bswapl.
>>
>>
>> If it wasn't inlined - who cares? It is only used in key generation,
>> i.e. not very often. But I can add padlock_htonl() if you want.
> 
> 
> I do, so yes, please.

But why?

> As for the name. One can actually argue whether it is
> htonl or ntohl. To me the latter feels more appropriate actually:-) So
> let's call it padlock_bswapl...

OK ;-)
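
A portable version could look roughly like this. It is only a sketch in plain C rather than inline asm; the name `padlock_bswapl` comes from the suggestion above, while the signature is my assumption:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word without relying on
 * htonl()/ntohl() or on platform-specific inline assembly. */
static inline uint32_t padlock_bswapl(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}
```

Many compilers recognize this shift-and-mask pattern and emit a single bswap instruction on x86, so the plain C form should not cost much compared to hand-written asm.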

>>> In my opinion the aligning code abuses malloc. I'd recommend to declare
>>> an automatic buffer of say 1KB and use it as temporary aligned storage
>>> whenever nbytes is less than size of this automatic buffer, and fall
>>> back to malloc only if it's larger. Once you have the code in place,
>>> see how this buffer size affects the benchmark.
>>
>>
>> For small blocks it may be worth the effort, yes. I'll make it 1.5kB so
>> that an ethernet packet fits there.
> 
> 
> After extra consideration I'd even say that it's not necessarily
> appropriate to use malloc *at all*. Indeed, imagine user says "I'd like
> to encrypt 1GB," would you malloc it too? 

Indeed ;-) Well, seriously - it's a bad idea, agreed.

> It's probably more appropriate
> to stick *exclusively* to an automatic buffer and if the user asks for
> more than this buffer size, just reuse it in a loop. In which case(!) it
> would be safe *not* to pushfl;popfl, right?

Yes. One would have to store intermediate IVs but it's necessary anyway.
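
To illustrate the loop with the IV carried from one chunk to the next, here is a rough sketch. The `toy_cbc_encrypt` function is a made-up stand-in for the real PadLock call (it is not a cipher at all), and `BLOCK`, `CHUNK` and `chunked_encrypt` are hypothetical names. It assumes nbytes is a multiple of the block size:

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 16   /* AES block size in bytes */
#define CHUNK 64   /* hypothetical buffer size, a multiple of BLOCK */

/* Toy stand-in for the hardware CBC call: chains each block with the
 * previous output and leaves the last output block in iv. */
static void toy_cbc_encrypt(unsigned char *out, const unsigned char *in,
                            size_t nbytes, unsigned char iv[BLOCK],
                            unsigned char key)
{
    for (size_t off = 0; off < nbytes; off += BLOCK) {
        for (size_t i = 0; i < BLOCK; i++)
            out[off + i] = (unsigned char)((in[off + i] ^ iv[i]) + key);
        memcpy(iv, out + off, BLOCK);   /* IV for the next block */
    }
}

/* Process arbitrarily long input through one fixed automatic buffer.
 * The IV updated by each call feeds the next chunk, so the result is
 * the same as encrypting everything in one go. */
static void chunked_encrypt(unsigned char *out, const unsigned char *in,
                            size_t nbytes, unsigned char iv[BLOCK],
                            unsigned char key)
{
    unsigned char buf[CHUNK];

    while (nbytes > 0) {
        size_t n = nbytes < CHUNK ? nbytes : CHUNK;

        memcpy(buf, in, n);
        toy_cbc_encrypt(buf, buf, n, iv, key);   /* in place on buf */
        memcpy(out, buf, n);

        in += n;
        out += n;
        nbytes -= n;
    }
}
```

The real code would additionally align `buf` to 16 bytes; that part is left out here to keep the IV handling visible.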

> As for 1.5KB to accommodate an ethernet packet. Well, who says that
> encryption is performed on ethernet frames? At least in SSL case it's
> actually performed on larger chunks. I personally wouldn't get fixated
> on any size in particular, but simply vary this automatic buffer size
> while benchmarking and figure out which minimal size (but probably not
> larger than pagesize) doesn't harm performance too much.

OK, I'll prepare it and will run some benchmarks. I just don't know what
kind of benchmarks to run...?

>> Do you agree with using alloca()?
> 
> Let's put it this way. If nobody else objects, I'd leave this choice up
> to you. I.e. whether or not you choose to
> aligned=alloca((nbytes<OPTIMAL_SIZE?nbytes:OPTIMAL_SIZE)+16) or simply
> declare aligned[OPTIMAL_SIZE+16]. How does it sound? Cheers. A.
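
For the plain-array variant, the 16 extra bytes leave room to round the start address up to a 16-byte boundary. A small sketch; `align16` is a hypothetical helper name and the value of `OPTIMAL_SIZE` is only a placeholder until the benchmarks say otherwise:

```c
#include <stdint.h>

#define OPTIMAL_SIZE 1024   /* placeholder, to be tuned by benchmarking */

/* Round p up to the next 16-byte boundary (no change if already aligned).
 * The caller must reserve up to 15 spare bytes for the adjustment. */
static inline unsigned char *align16(unsigned char *p)
{
    return p + ((16 - ((uintptr_t)p & 15)) & 15);
}
```

Usage would be along the lines of `unsigned char raw[OPTIMAL_SIZE + 16]; unsigned char *aligned = align16(raw);`, after which `aligned` points at `OPTIMAL_SIZE` usable bytes.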

The question now is how to find the optimal size. It will be different
for interactive SSH, for HTTPS, ... I think the other overhead is so
large that fiddling with the exact value of OPTIMAL_SIZE wouldn't gain
much. Indeed, for PadLock the bigger the better...

Michal Ludvig