Re: [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation

2016-02-19 Thread Alex Bennée

alvise rigo writes:

> On Fri, Feb 19, 2016 at 12:44 PM, Alex Bennée  wrote:
>>
>> Alvise Rigo  writes:
>>
>>> This is the seventh iteration of the patch series which applies to the
>>> upstream branch of QEMU (v2.5.0-rc4).
>>>
>>> Changes versus previous versions are at the bottom of this cover letter.
>>>
>>> The code is also available at following repository:
>>> https://git.virtualopensystems.com/dev/qemu-mt.git
>>> branch:
>>> slowpath-for-atomic-v7-no-mttcg
>>
>> OK I'm done on this review pass. I think generally we are in pretty good
>> shape although I await to see what extra needs to be done for the MTTCG
>> case.
>
> Hi Alex,
>
> Thank you for this review. Regarding the extra needs and integration
> with the MTTCG code, I've made available at this address [1] a working
> branch with the two patch series merged together. The branch boots
> fine Linux on both aarch64 and arm architectures. There is still that
> known issue with virtio, that Fred should fix soon. Let me know your
> first impressions.

Does the virtio problem go away if you drop the top patch (as per my last
email)?

>
> [1] https://git.virtualopensystems.com/dev/qemu-mt.git (branch
> "merging-slowpath-v7-mttcg-v8-wip")

Thanks I'll have a look next week.

>
> Thank you,
> alvise
>
>>
>> We are coming up to soft-freeze on 1/3/16 and it would be nice to get
>> this merged by then. As it is a fairly major chunk of work it would need
>> to get the initial commit by that date.
>>
>> However before we can get to that stage we need some review from the
>> maintainers. For your next version can you please:
>>
>>   - Drop the RFC tag, I think we have had enough comment ;-)
>>   - Make sure you CC the TCG maintainers (Paolo, Peter C and Richard 
>> Henderson)
>>   - Also CC the ARM maintainers (Peter M)
>>   - Be ready for a fast turnaround
>>
>> Paolo/Richard,
>>
>> Do you have any comments on this iteration?
>>
>> --
>> Alex Bennée


--
Alex Bennée



Re: [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation

2016-02-19 Thread alvise rigo
On Fri, Feb 19, 2016 at 12:44 PM, Alex Bennée wrote:
>
> Alvise Rigo  writes:
>
>> This is the seventh iteration of the patch series which applies to the
>> upstream branch of QEMU (v2.5.0-rc4).
>>
>> Changes versus previous versions are at the bottom of this cover letter.
>>
>> The code is also available at following repository:
>> https://git.virtualopensystems.com/dev/qemu-mt.git
>> branch:
>> slowpath-for-atomic-v7-no-mttcg
>
> OK I'm done on this review pass. I think generally we are in pretty good
> shape although I await to see what extra needs to be done for the MTTCG
> case.

Hi Alex,

Thank you for this review. Regarding the extra needs and integration
with the MTTCG code, I've made available at this address [1] a working
branch with the two patch series merged together. The branch boots
Linux fine on both aarch64 and arm architectures. There is still the
known issue with virtio, which Fred should fix soon. Let me know your
first impressions.

[1] https://git.virtualopensystems.com/dev/qemu-mt.git (branch
"merging-slowpath-v7-mttcg-v8-wip")

Thank you,
alvise

>
> We are coming up to soft-freeze on 1/3/16 and it would be nice to get
> this merged by then. As it is a fairly major chunk of work it would need
> to get the initial commit by that date.
>
> However before we can get to that stage we need some review from the
> maintainers. For your next version can you please:
>
>   - Drop the RFC tag, I think we have had enough comment ;-)
>   - Make sure you CC the TCG maintainers (Paolo, Peter C and Richard 
> Henderson)
>   - Also CC the ARM maintainers (Peter M)
>   - Be ready for a fast turnaround
>
> Paolo/Richard,
>
> Do you have any comments on this iteration?
>
> --
> Alex Bennée



Re: [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation

2016-02-19 Thread Alex Bennée

Alvise Rigo writes:

> This is the seventh iteration of the patch series which applies to the
> upstream branch of QEMU (v2.5.0-rc4).
>
> Changes versus previous versions are at the bottom of this cover letter.
>
> The code is also available at following repository:
> https://git.virtualopensystems.com/dev/qemu-mt.git
> branch:
> slowpath-for-atomic-v7-no-mttcg

OK, I'm done with this review pass. I think we are generally in pretty
good shape, although I am waiting to see what extra needs to be done for
the MTTCG case.

We are coming up to soft-freeze on 1/3/16 (1 March 2016) and it would be
nice to get this merged by then. As it is a fairly major chunk of work,
it would need to get the initial commit by that date.

However before we can get to that stage we need some review from the
maintainers. For your next version can you please:

  - Drop the RFC tag, I think we have had enough comment ;-)
  - Make sure you CC the TCG maintainers (Paolo, Peter C and Richard Henderson)
  - Also CC the ARM maintainers (Peter M)
  - Be ready for a fast turnaround

Paolo/Richard,

Do you have any comments on this iteration?

--
Alex Bennée



[Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation

2016-01-29 Thread Alvise Rigo
This is the seventh iteration of the patch series which applies to the
upstream branch of QEMU (v2.5.0-rc4).

Changes versus previous versions are at the bottom of this cover letter.

The code is also available at the following repository:
https://git.virtualopensystems.com/dev/qemu-mt.git
branch:
slowpath-for-atomic-v7-no-mttcg

This patch series provides an infrastructure for atomic instruction
implementation in QEMU, thus offering a 'legacy' solution for
translating guest atomic instructions. Moreover, it can be considered as
a first step toward a multi-thread TCG.

The underlying idea is to provide new TCG helpers (sort of softmmu
helpers) that guarantee atomicity for some memory accesses or, more
generally, provide a way to define memory transactions.

More specifically, the new softmmu helpers behave as LoadLink and
StoreConditional instructions, and are called from TCG code by means of
target specific helpers. This work includes the implementation for all
the ARM atomic instructions, see target-arm/op_helper.c.

The implementation relies heavily on the software TLB together with a
new bitmap, added to the ram_list structure, which flags, on a per-CPU
basis, all the memory pages that are in the middle of a LoadLink
(LL)/StoreConditional (SC) operation.  Since all these pages can be
accessed directly through the fast-path and alter a vCPU's linked value,
the new bitmap has been coupled with a new TLB flag for the TLB virtual
address which forces the slow-path execution for all the accesses to a
page containing a linked address.
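
As a rough illustration of the paragraph above, here is a toy Python
model of how an exclusive bitmap could drive the fast-path/slow-path
choice. It is only a sketch, not QEMU code: ToyMemory, mark_exclusive,
make_tlb_entry, access_path, the 4 KiB page size and the TLB_EXCL bit
value are all hypothetical names and values chosen for this example.

```python
PAGE_BITS = 12      # assumption: 4 KiB pages
TLB_EXCL = 1 << 0   # hypothetical flag bit, not QEMU's real value


class ToyMemory:
    """Toy stand-in for the exclusive bitmap (not the ram_list code)."""

    def __init__(self, num_vcpus):
        self.num_vcpus = num_vcpus
        # page number -> set of vCPU ids that marked the page exclusive
        self.exclusive = {}

    def mark_exclusive(self, vcpu, addr):
        """Record that `vcpu` has a linked address inside this page."""
        self.exclusive.setdefault(addr >> PAGE_BITS, set()).add(vcpu)

    def page_is_exclusive(self, addr):
        """True if at least one vCPU marked the page as exclusive."""
        return bool(self.exclusive.get(addr >> PAGE_BITS))

    def make_tlb_entry(self, addr):
        """Build a toy TLB entry, setting TLB_EXCL for exclusive pages."""
        flags = TLB_EXCL if self.page_is_exclusive(addr) else 0
        return {"page": addr >> PAGE_BITS, "flags": flags}


def access_path(entry):
    """Pick the path an access through this TLB entry would take."""
    return "slow" if entry["flags"] & TLB_EXCL else "fast"
```

In this model, once any vCPU marks an address, every access to the same
page is routed to the slow path, while other pages keep using the fast
path. The TLB flush of the other vCPUs is not modelled here.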

The new slow-path is implemented such that:
- the LL behaves as a normal load slow-path, except that it clears the
  dirty flag in the bitmap.  While generating a TLB entry, the cputlb.c
  code checks whether at least one vCPU has the bit cleared in the
  exclusive bitmap; in that case the TLB entry will have the EXCL flag
  set, thus forcing the slow-path.  In order to ensure that all the
  vCPUs will follow the slow-path for that page, we flush the TLB cache
  of all the other vCPUs.

  The LL will also set the linked address and size of the access in a
  vCPU's private variable. After the corresponding SC, this address will
  be set to a reset value.

- the SC can fail, returning 1, or succeed, returning 0.  It must
  always come after an LL and must access the same address 'linked' by
  the previous LL, otherwise it will fail. If another write access to
  the linked address happens in the time window delimited by a legit
  pair of LL/SC operations, the SC will fail.
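
The LL/SC rules listed above can be sketched with a small Python toy
model. This is only an illustration under stated assumptions, not the
helpers from the series: ToyLLSC, VCpu, load_link, store,
store_conditional and atomic_add are hypothetical names, the access
size is ignored, the reset value is modelled as None, and only writes
from *other* vCPUs are assumed to break a link.

```python
RESET = None  # assumption: stands in for the "reset value" of the link


class VCpu:
    def __init__(self):
        self.linked_addr = RESET  # vCPU-private linked address


class ToyLLSC:
    """Toy model of the LL/SC semantics described above."""

    def __init__(self, num_vcpus):
        self.mem = {}
        self.vcpus = [VCpu() for _ in range(num_vcpus)]

    def load_link(self, cpu, addr):
        """LL: a normal load that also records the linked address."""
        self.vcpus[cpu].linked_addr = addr
        return self.mem.get(addr, 0)

    def store(self, cpu, addr, value):
        """Plain write; breaks the link of any other vCPU on `addr`."""
        for i, vcpu in enumerate(self.vcpus):
            if i != cpu and vcpu.linked_addr == addr:
                vcpu.linked_addr = RESET
        self.mem[addr] = value

    def store_conditional(self, cpu, addr, value):
        """SC: returns 0 on success, 1 on failure."""
        vcpu = self.vcpus[cpu]
        ok = vcpu.linked_addr is not RESET and vcpu.linked_addr == addr
        vcpu.linked_addr = RESET  # the link is consumed either way
        if not ok:
            return 1
        self.store(cpu, addr, value)
        return 0


def atomic_add(m, cpu, addr, delta):
    """Typical guest-style retry loop built on top of LL/SC."""
    while True:
        old = m.load_link(cpu, addr)
        if m.store_conditional(cpu, addr, old + delta) == 0:
            return old
```

An SC with no preceding LL, an SC to a different address, or an SC
after another vCPU has written to the linked address all return 1 in
this model; atomic_add simply retries until the SC returns 0.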

In theory, the provided implementation of TCG LoadLink/StoreConditional
can be used to properly handle atomic instructions on any architecture.

The code has been tested with bare-metal test cases and by booting Linux.

* Performance considerations
The new slow-path adds some overhead to the translation of the ARM
atomic instructions, since their emulation no longer happens only
in the guest (by means of pure TCG generated code), but requires the
execution of two helper functions. Despite this, the additional time
required to boot an ARM Linux kernel on an i7 clocked at 2.5GHz is
negligible.
By contrast, on an LL/SC bound test scenario, like
https://git.virtualopensystems.com/dev/tcg_baremetal_tests.git, this
solution requires 30% (1 million iterations) and 70% (10 million
iterations) of additional time for the test to complete.

Changes from v6:
- Included aligned variants of the exclusive helpers
- Reverted to single bit per page design in DIRTY_MEMORY_EXCLUSIVE
  bitmap. The new way we restore the pages as non-exclusive (PATCH 13)
  made the per-VCPU design unnecessary.
- arm32 now uses aligned exclusive accesses
- aarch64 exclusive instructions implemented [PATCH 15-16]
- Addressed comments from Alex

Changes from v5:
- The exclusive memory region is now set through a CPUClass hook,
  allowing any architecture to decide the memory area that will be
  protected during a LL/SC operation [PATCH 3]
- The runtime helpers dropped any target dependency and are now in a
  common file [PATCH 5]
- Improved the way we restore a guest page as non-exclusive [PATCH 9]
- Included MMIO memory as a possible target of LL/SC
  instructions. This also required some simplification of the
  helper_*_st_name helpers in softmmu_template.h [PATCH 8-14]

Changes from v4:
- Reworked the exclusive bitmap to be of fixed size (8 bits per address)
- The slow-path is now TCG backend independent, no need to touch
  tcg/* anymore as suggested by Aurelien Jarno.

Changes from v3:
- based on upstream QEMU
- addressed comments from Alex Bennée
- the slow path can be enabled by the user, only if the backend
  supports it, with:
  ./configure --enable-tcg-ldst-excl
- all the ARM ldex/stex instructions now make use of the slow path
- added aarch64 TCG backend support
- part of the code has been rewritten

Changes from v2:
- the bitmap accessors are now atomic
- a rendezvous between vCPUs and a simple callback support before executing
  a TB have been