On 05/12/14 13:49, Stefan Kristiansson wrote:
> On Mon, May 12, 2014 at 2:25 PM, Jonas Bonn <[email protected]> wrote:
> i) the _only_ reason we need scratch regs is because we need to switch
> from the user-stack to the process' kernel stack.
> ii) for syscalls, we have scratch regs (the call-clobbered regs) so
> there's no issue here... I don't think I pushed my patch for this
> upstream, but it should be in my repo somewhere... check for branches
> with syscall in the name
> iii) for exceptions, there's nothing you can do that will suddenly get
> you a scratch reg... _all_ the regs are precious and need to be restored
> exactly as they were when returning to userspace.  Switching from r10 to
> r21 gets you nothing here as you still need to save r21.
> You miss the point of what I tried to achieve, I was hoping to make
> use of r10 as a "kernel reserved register" (since that's what the
> 'bug' basically could be interpreted as).
> So, with an ABI like that, yes, you'd need to save r21, but not r10.
> But, this discussion is moot anyway now ;)
So you wanted to:

i) "leak" the value of r10 userspace... though it shouldn't matter if
that page isn't user-accessible; not sure if the kernel gets that right
(presumably not)
ii) ...and hope that userspace doesn't "accidentally" modify its value
so that you can use it unmodified in the kernel?

Both of these are a bad idea.  Glad the idea died on its own... :)

v) I never liked the way Linux uses some hard-coded low memory address
for scratch space... there's almost always a cache-miss there to begin
with that slows down exception handling a fair bit.

> Yeah, and the problem we are trying to solve now is that it doesn't
> work in the multi-core case.
> So it's a multi-issue that would be nice to get around. ;)
Yes, of course... but even for single core it's crap, that's my point.
>> ii) r1 is the "stack" register and this is documented in the spec.  What
>> if there was an SPR that contained the "supervisor stack" address, and
>> on an exception the processor automatically writes that value into r1
>> and saves the value of r1 (the userspace stack pointer) into another SPR
>> ("process stack" SPR, like it saves the PC to an SPR...?  This would
>> amount to a "fast" context switch that's relevant for Linux, at least...
>> a fast swap of the process stack.  And it really only requires a single
>> SPR as you could presumably just swap the value of r1 with this SPR when
>> going to/from an exception.
>>
>> SPR_EXCEPTION_STACK: contains the address of an exception stack to
>> switch to on exception entry; upon exception entry, the values of this
>> SPR and r1 will be swapped so that r1 points to the exception stack and
>> the SPR contains the value of the r1 from the interrupted context
>>
> Hmm, interesting thought, that will need some more consideration.
> Otoh, you could quite easily manually implement this kind of behaviour
> with the more "generic" kind of SPR scratch regs we have been
> discussing below.
I think the TLB handlers want some scratch space as well, though one
could probably handle this with my SPR_EXCEPTION_STACK register as
well... the TLB handlers use scratch space (I think) because we don't
bother to set up the kernel stack for them....

Thinking about this some more:

i) scratch space really just comes down to one or two per-cpu variables;
we don't really need SPR's for this at all, but having some freely
usable SPR's allows us to stash some per-cpu variables there instead of
in memory

ii) we currently use 3 scratch values in the exception entry code

iii) I think we really only need 2 scratch values because we could
calculate the value of the kernel stack pointer given
current_thread_info instead of loading it from current_thread_info

iv) If we had direct access to the physical value of current_thread_info
(instead of the virtual address), then we wouldn't need the to_phys()
calls and could probably make do with a single scratch value; this could
easily be added to the kernel

v) note that an in-kernel exception, however, probably requires that we
do a to_phys() on r1 so we pretty much require 2 scratch values no
matter what

Anyway, considering this, any freely usable SPR that we add allows us to
keep a per-cpu variable close to the CPU instead of in memory... (is
this any different from a non-flushable cache line?)  The more such
SPR's we have, the more per-cpu values we could cache.  We'd need two to
use as scratch space for exception entry, but one of those could double
as the "per-cpu physical address of current_thread_info" (the details of
which I'll leave out for now) even allowing us to free up r10 (though
that's moot as userspace now has it pegged as fixed-function).

So I guess my conclusion given the above is that a couple of SPR_PERCPUx
would be a good idea, solving the problem of scratch space on exception
entry and being usable for other per-cpu optimizations on a
implementation-by-implementation basis (where the Linux implementatin
requires "scratch space" while others may not and use the regs for other
purposes)

/Jonas
_______________________________________________
OpenRISC mailing list
[email protected]
http://lists.openrisc.net/listinfo/openrisc

Reply via email to