Re: [Qemu-devel] [PATCH] aarch64: use TSX for ldrex/strex

Emilio G. Cota Wed, 17 Aug 2016 11:20:20 -0700

On Wed, Aug 17, 2016 at 13:58:00 -0400, Emilio G. Cota wrote:
> due to my glaring lack of TCG competence.


A related note that might be of interest.

I benchmarked an alternative implementation that *does* instrument
stores. I wrapped every tcg_gen_qemu_st_i64 with a pre/post pair of
helpers. (Are those enough? I presume tcg_gen_st_i64 emits stores to
host memory, i.e. they are not "explicit" guest stores and therefore
do not go through the soft TLB.)

These helpers first check a bitmap indexed by a masked subset of the
access's physical address; if the bit is set, they then look up the full
physaddr in a QHT. If an entry exists, they lock/unlock the entry's
spinlock around the store, so no race is possible with an ongoing atomic
(atomics always take their corresponding lock). The overhead relative to
cmpxchg is not too bad, but most of it comes from the helpers; see these
numbers for SPEC:
(NB. the "QEMU" baseline does *not* use QHT for tb_htable and therefore
takes tb_lock around tb_find_fast; that is why it is so slow.)
  http://imgur.com/a/SoSHQ

"QHT only" means a QHT lookup is performed on every guest store. Checking
the bitmap before hitting the QHT is quite a large win. I wonder if
things could be sped up further by performing the bitmap check in TCG
code. Would that be worth exploring? If so, any help would be
appreciated (for an i386 host at least); I tried, but I'm way out of my
element.

                E.
