On 08/07/16 23:18, Paolo Bonzini wrote:
> On 08/07/2016 21:55, Sergey Fedorov wrote:
>> On 08/07/16 17:07, Paolo Bonzini wrote:
>>> On 08/07/2016 14:32, Sergey Fedorov wrote:
>>>>>>>> I think we can do even better. One option is using a separate tiny lock
>>>>>>>> to protect direct jump set/reset instead of tb_lock.
>>>>>> If you have to use a separate tiny lock, you don't gain anything compared
>>>>>> to the two critical sections, do you?
>>>> If we have a separate lock for direct jump set/reset then we can do fast
>>>> TB lookup + direct jump patching without taking tb_lock at all. How much
>>>> this would reduce lock contention largely depends on the workload we use.
>>> Yeah, it probably would be easy enough that it's hard to object to it
>>> (unlike the other idea below, which I'm not very comfortable with, at
>>> least without seeing patches).
>>>
>>> The main advantage would be that this tiny lock could be a spinlock
>>> rather than a mutex.
>> Well, the problem is more subtle than we thought: tb_find_fast() can
>> race with tb_phys_invalidate(). The first tb_find_phys() out of the lock
>> can return a TB which is being invalidated. Then a direct jump can be
>> set up to this TB. It can happen after concurrent tb_phys_invalidate()
>> resets all the direct jumps to the TB. Thus we can end up with a direct
>> jump to an invalidated TB. Even extending the tb_lock critical section
>> wouldn't help if at least one tb_find_phys() is performed out of the lock.
> Ahem, isn't this exactly why tb_find_phys was invalidating the PC in my
> patches, as the very first step?... (The smp_wmb after invalidating the
> PC paired with an atomic_rcu_read in tb_find_fast; now we could do it
> after computing the hash and before calling qht_remove).
>
> It turned out that invalidating the PC wasn't as easy as writing -1 to
> the pc, but it's possible to do one of these:
>
> 1) set cs_base to an invalid value (all-ones works for everything except
> x86---instead anything nonzero is enough except on x86 and SPARC)
>
> 2) set the flags to an invalid combination (x86 can use all ones or
> rename the useless HF_SOFTMMU_MASK to HF_INVALID_MASK).
I remember; I've just found that we discussed it in this thread:

http://thread.gmane.org/gmane.comp.emulators.qemu/401723/focus=406852

I was thinking of just doing the 'tb_jmp_cache' lookup out of the lock, not
tb_find_physical(). Now, thanks to QHT, we could do tb_find_physical() out
of the lock, too. This changes things.

Kind regards,
Sergey