On 2/14/20 6:49 AM, Alex Bennée wrote:
> The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
> which is invalidated by a tb_flush before we execute it. This doesn't
> affect the other cpu_exec modes, as a tb_flush by its nature can only
> occur on a quiescent system. The race was described as:
>
>   B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
>   B3. tcg_tb_alloc obtains a new TB
>
>   C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
>       (same TB as B2)
>
>   A3. start_exclusive critical section entered
>   A4. do_tb_flush is called, TB memory freed/re-allocated
>   A5. end_exclusive exits critical section
>
>   B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
>   B3. tcg_tb_alloc reallocates TB from B2
>
>   C4. start_exclusive critical section entered
>   C5. cpu_tb_exec executes the TB code that was freed in A4
>
> The simplest fix is to widen the exclusive period to include the TB
> lookup. As a result we can drop the complication of checking we are in
> the exclusive region before we end it.
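For reference, a minimal sketch of the reordering the patch describes, paraphrased from the description above rather than taken from the literal diff. The function names (start_exclusive, end_exclusive, tb_lookup__cpu_state, tb_gen_code, cpu_tb_exec, do_tb_flush) are the QEMU functions named in the race description; the local variables and surrounding setup are elided or assumed:

void cpu_exec_step_atomic(CPUState *cpu)
{
    TranslationBlock *tb;
    /* ... pc, cs_base, flags, cflags, cf_mask computed as usual ... */

    /*
     * Before the fix, start_exclusive() was entered only after the TB
     * lookup, leaving a window (C3..A4 above) in which do_tb_flush on
     * another thread could free or reallocate the TB we were about to
     * execute.  Widening the exclusive period closes that window.
     */
    start_exclusive();

    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
    if (tb == NULL) {
        mmap_lock();
        tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
        mmap_unlock();
    }

    /*
     * The TB cannot be flushed out from under us here: do_tb_flush
     * itself runs inside the exclusive section, which we now hold
     * across both lookup and execution.
     */
    cpu_tb_exec(cpu, tb);

    end_exclusive();
}

Since we unconditionally enter the exclusive region before any lookup, the previous check for "are we already exclusive?" before calling end_exclusive() also goes away, which is the simplification the commit message refers to.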
I'm not 100% keen on having the tb_gen_code within the exclusive region.
It implies a much larger delay on (at least) the first execution of the
atomic operation.

But I suppose until recently we had a global lock around code generation,
and this is only slightly worse. Plus, it has the advantage of being dead
simple, and without the races vs tb_ctx.tb_flush_count that exist in
Yifan's patch.

Applied to tcg-next.

r~