On 2016-10-11 14:42, Maxime Villard wrote:
Userland is pageable, so when mmap is called with one page, the kernel
does not yet make the page officially available to the CPU. Rather, it
waits for the process to fault, and at fault time it will make the page
valid for real. It means that the code path from the interrupt to the
moment when the page is entered needs to be fast.
All this to say that in pmap_enter_ma on x86, an optimization is
possible. In this function, new_pve and new_sparepve are always
allocated, but not always needed. The reason it is done this way is
that preemption is disabled in the critical part, so obviously the
allocation needs to be performed beforehand. new_pve and new_sparepve
are to be used in pmap_enter_pv. After adding counters in here, a
'./build.sh tools' gives these numbers:
PVE: used=36441394 unused=58955001
SPAREPVE: used=1647254 unused=93749141
It means that 38088648 allocations were needed and performed, and the
rest (152704142) were performed but not used. In short, only 19% of the
allocated buffers were actually used. The real number may be even
smaller than that, since I didn't take into account the fact that there
may be no p->v tracking at all (in which case the buffers would be
unused as well).
I have a patch which introduces two inlined functions that can tell in
advance whether these buffers are needed. One problem with this patch is
that it makes the code harder to understand, even though I tried to
explain clearly what we are doing. Another problem is that when both
buffers are needed, my patch introduces a little overhead (the cost of a
few branches).
I don't know if we care enough about things like that; if someone here
has particular comments, feel free.
I would benchmark both (with and without the "overhead" introduced); a
while back, when implementing PAE, I did not expect the paddr_t
promotion from 32 to 64 bits to have that much of an impact on pmap
performance, but the first attempt induced more than 5% overhead on a
"cold" build.
Granted, you are not dealing with the same situation here, but pool
caches make the allocation used/unused dance almost free (except for the
slow path). When objects are in the pool cache but not yet obtained
through the getter, they are still allocated but basically not used. It
would be interesting to see if the hit/miss ratio is affected for the
"pvpl" pool with your optimization.