Re: [Qemu-devel] [PATCH v2 10/17] translate-all: use per-page locking in !user-mode

2018-05-08 Thread Emilio G. Cota
Hi Richard,

On Mon, Apr 23, 2018 at 20:18:31 -0400, Emilio G. Cota wrote:
> On Fri, Apr 13, 2018 at 17:29:20 -1000, Richard Henderson wrote:
> > On 04/05/2018 04:13 PM, Emilio G. Cota wrote:
> (snip)
> > > +struct page_collection {
> > > +    GTree *tree;
> > > +    struct page_entry *max;
> > > +};
> > 
> > I don't understand the motivation for this data structure.  Substituting one
> > tree for another does not, on the face of it, seem to be a win.
> > 
> > Given that your locking order is based on the physical address, I can
> > understand that the sequential virtual addresses that these routines are given
> > is not appropriate.  But surely you should know how many pages are involved,
> > and therefore be able to allocate a flat array to hold the PageDesc.
> > 
> > > +/*
> > > + * Lock a range of pages ([@start,@end[) as well as the pages of all
> > > + * intersecting TBs.
> > > + * Locking order: acquire locks in ascending order of page index.
> > > + */
> > 
> > I don't think I understand this either.  From whence do you wind up with a
> > range of physical addresses?
> 
> For instance in tb_invalidate_phys_page_range. We need to invalidate
> all TBs associated with a range of phys addresses.
> 
> I am not sure how an array would make things easier, since we need
> to lock the pages in the given range, as well as the pages that
> overlap with the TBs in said range (since we'll invalidate the TBs).
> For example, if we have to invalidate all TBs in the range A-E, it
> is possible that a TB in page C will overlap with page K (not in
> the original range), so we'll have to lock page K as well. All of
> this needs to be done in order, that is, A-E,K.
> 
> If we had an array, we'd have to resize the array anytime we had
> an out-of-range page, and then do a binary search in the array
> to check whether we already locked that page. At this point
> we'd be reinventing a binary tree, so it seems simpler to just
> use a tree.

Do you have any comments wrt the above (and my other replies to
your comments in this series)? I'd like to do a respin this week.

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v2 10/17] translate-all: use per-page locking in !user-mode

2018-04-23 Thread Emilio G. Cota
On Thu, Apr 05, 2018 at 22:13:01 -0400, Emilio G. Cota wrote:
> +/*
> + * Lock a range of pages ([@start,@end[) as well as the pages of all
> + * intersecting TBs.
> + * Locking order: acquire locks in ascending order of page index.
> + */
> +struct page_collection *
> +page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
> +{
> +    struct page_collection *set = g_malloc(sizeof(*set));
> +    tb_page_addr_t index;
> +    PageDesc *pd;
> +
> +    start >>= TARGET_PAGE_BITS;
> +    end   >>= TARGET_PAGE_BITS;
> +    g_assert(start <= end);
> +
> +    set->tree = g_tree_new_full(tb_page_addr_cmp, NULL, NULL,
> +                                page_entry_destroy);
> +    set->max = NULL;
> +
> + retry:
> +    g_tree_foreach(set->tree, page_entry_lock, NULL);
> +
> +    for (index = start; index <= end; index++) {
> +        TranslationBlock *tb;
> +        int n;
> +
> +        pd = page_find(index);
> +        if (pd == NULL) {
> +            continue;
> +        }

Just noticed there's a bug here: we need to acquire pd's page lock,
since it protects the traversal of the list of pd's TBs.

Fixed in v3 as shown below.

> +        PAGE_FOR_EACH_TB(pd, tb, n) {
> +            if (page_trylock_add(set, tb->page_addr[0]) ||
> +                (tb->page_addr[1] != -1 &&
> +                 page_trylock_add(set, tb->page_addr[1]))) {
> +                /* drop all locks, and reacquire in order */
> +                g_tree_foreach(set->tree, page_entry_unlock, NULL);
> +                goto retry;
> +            }
> +        }
> +    }
> +    return set;
> +}
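
For completeness, a minimal usage sketch of how a caller drives this
API (the page_collection_unlock() counterpart is added in the same
series; the call site shown here is illustrative, not patch text):

    /* Lock all pages in [start, end[ plus the pages of intersecting TBs. */
    struct page_collection *pages = page_collection_lock(start, end);

    /* ... invalidate the TBs in the range; every relevant page lock is
     * held, having been acquired in ascending order of page index ... */

    page_collection_unlock(pages); /* unlock and free the collection */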

Thanks,

Emilio
---
@@ -840,6 +840,12 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
         if (pd == NULL) {
             continue;
         }
+        if (page_trylock_add(set, index << TARGET_PAGE_BITS)) {
+            g_tree_foreach(set->tree, page_entry_unlock, NULL);
+            set_page_collection_locked(false);
+            goto retry;
+        }
+        assert_page_locked(pd);
         PAGE_FOR_EACH_TB(pd, tb, n) {
             if (page_trylock_add(set, tb->page_addr[0]) ||
                 (tb->page_addr[1] != -1 &&


Re: [Qemu-devel] [PATCH v2 10/17] translate-all: use per-page locking in !user-mode

2018-04-23 Thread Emilio G. Cota
On Fri, Apr 13, 2018 at 17:29:20 -1000, Richard Henderson wrote:
> On 04/05/2018 04:13 PM, Emilio G. Cota wrote:
(snip)
> > +struct page_collection {
> > +    GTree *tree;
> > +    struct page_entry *max;
> > +};
> 
> I don't understand the motivation for this data structure.  Substituting one
> tree for another does not, on the face of it, seem to be a win.
> 
> Given that your locking order is based on the physical address, I can
> understand that the sequential virtual addresses that these routines are given
> is not appropriate.  But surely you should know how many pages are involved,
> and therefore be able to allocate a flat array to hold the PageDesc.
> 
> > +/*
> > + * Lock a range of pages ([@start,@end[) as well as the pages of all
> > + * intersecting TBs.
> > + * Locking order: acquire locks in ascending order of page index.
> > + */
> 
> I don't think I understand this either.  From whence do you wind up with a
> range of physical addresses?

For instance in tb_invalidate_phys_page_range. We need to invalidate
all TBs associated with a range of phys addresses.

I am not sure how an array would make things easier, since we need
to lock the pages in the given range, as well as the pages that
overlap with the TBs in said range (since we'll invalidate the TBs).
For example, if we have to invalidate all TBs in the range A-E, it
is possible that a TB in page C will overlap with page K (not in
the original range), so we'll have to lock page K as well. All of
this needs to be done in order, that is, A-E,K.

If we had an array, we'd have to resize the array anytime we had
an out-of-range page, and then do a binary search in the array
to check whether we already locked that page. At this point
we'd be reinventing a binary tree, so it seems simpler to just
use a tree.
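
To make that concrete, here is a standalone sketch (plain GLib, nothing
from the patch) of the collect-then-lock-in-order idea: pages are
inserted in discovery order, duplicates are caught in O(log n), and an
in-order traversal yields the ascending lock order:

    /* gcc demo.c $(pkg-config --cflags --libs glib-2.0) */
    #include <glib.h>
    #include <stdio.h>

    static gint cmp_index(gconstpointer a, gconstpointer b, gpointer d)
    {
        gulong ia = (gulong)(guintptr)a, ib = (gulong)(guintptr)b;
        return ia < ib ? -1 : ia > ib ? 1 : 0;
    }

    static gboolean lock_in_order(gpointer key, gpointer val, gpointer d)
    {
        printf("lock page %#lx\n", (gulong)(guintptr)key);
        return FALSE; /* FALSE: keep traversing */
    }

    int main(void)
    {
        GTree *set = g_tree_new_full(cmp_index, NULL, NULL, NULL);
        /* pages A-E from the range, plus "page K" (0x13) found via a
         * crossing TB, plus a duplicate discovery of page C */
        gulong pages[] = { 0xc, 0xa, 0xe, 0x13, 0xb, 0xd, 0xc };

        for (gsize i = 0; i < G_N_ELEMENTS(pages); i++) {
            gpointer key = (gpointer)(guintptr)pages[i];
            if (!g_tree_lookup(set, key)) { /* O(log n) dedup */
                g_tree_insert(set, key, key);
            }
        }
        g_tree_foreach(set, lock_in_order, NULL); /* ascending page index */
        g_tree_destroy(set);
        return 0;
    }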

> > +struct page_collection *
> > +page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
> 
> ...
> 
> > +    /*
> > +     * Add the TB to the page list.
> > +     * To avoid deadlock, acquire first the lock of the lower-addressed page.
> > +     */
> > +    p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
> > +    if (likely(phys_page2 == -1)) {
> >          tb->page_addr[1] = -1;
> > +        page_lock(p);
> > +        tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
> > +    } else {
> > +        p2 = page_find_alloc(phys_page2 >> TARGET_PAGE_BITS, 1);
> > +        if (phys_pc < phys_page2) {
> > +            page_lock(p);
> > +            page_lock(p2);
> > +        } else {
> > +            page_lock(p2);
> > +            page_lock(p);
> > +        }
> 
> Extract this as a helper for use here and page_lock_tb?

Done. Alex already suggested this when reviewing v1; I should
have done it then instead of resisting. Fixup appended.

> >  /*
> >   * Invalidate all TBs which intersect with the target physical address range
> > + * [start;end[. NOTE: start and end must refer to the *same* physical page.
> > + * 'is_cpu_write_access' should be true if called from a real cpu write
> > + * access: the virtual CPU will exit the current TB if code is modified inside
> > + * this TB.
> > + *
> > + * Called with tb_lock/mmap_lock held for user-mode emulation
> > + * Called with tb_lock held for system-mode emulation
> > + */
> > +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> > +                                   int is_cpu_write_access)
> 
> FWIW, we should probably notice and optimize end = start + 1, which appears to
> have the largest number of users for e.g. watchpoints.

This is also the case when booting Linux (~99% of cases). Once we
agree on the correctness of the whole thing we can look into
making the common case faster, if necessary.
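
A possible shape for that fast path, for the record (the helper name
tb_invalidate_phys_page_locked() is illustrative, not from this series):

    if (end == start + 1) {
        /*
         * Common case (watchpoints, Linux boot): the range lies within a
         * single guest page, so one page lock suffices and building a
         * page_collection can be skipped entirely.
         */
        PageDesc *pd = page_find(start >> TARGET_PAGE_BITS);

        if (pd) {
            page_lock(pd);
            tb_invalidate_phys_page_locked(pd, start, end); /* illustrative */
            page_unlock(pd);
        }
        return;
    }
    /* general case: lock the whole collection first */
    struct page_collection *pages = page_collection_lock(start, end);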

Thanks,

Emilio
---
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ff087..9b21c1a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -549,6 +549,9 @@ static inline PageDesc *page_find(tb_page_addr_t index)
     return page_find_alloc(index, 0);
 }
 
+static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
+                           PageDesc **ret_p2, tb_page_addr_t phys2, int alloc);
+
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
 
@@ -682,17 +685,7 @@ static inline void page_unlock(PageDesc *pd)
 /* lock the page(s) of a TB in the correct acquisition order */
 static inline void page_lock_tb(const TranslationBlock *tb)
 {
-    if (likely(tb->page_addr[1] == -1)) {
-        page_lock(page_find(tb->page_addr[0] >> TARGET_PAGE_BITS));
-        return;
-    }
-    if (tb->page_addr[0] < tb->page_addr[1]) {
-        page_lock(page_find(tb->page_addr[0] >> TARGET_PAGE_BITS));
-        page_lock(page_find(tb->page_addr[1] >> TARGET_PAGE_BITS));
-    } else {
-        page_lock(page_find(tb->page_addr[1] >> TARGET_PAGE_BITS));
-        page_lock(page_find(tb->page_addr[0] >> TARGET_PAGE_BITS));
-    }
+    page_lock_pair(NULL,
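
The archive truncates the hunk above mid-line. From the forward
declaration added earlier in this diff and the inline locking logic
quoted above, the extracted helper would look roughly as follows; a
reconstruction for readability, not the literal patch text:

    static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
                               PageDesc **ret_p2, tb_page_addr_t phys2,
                               int alloc)
    {
        PageDesc *p1, *p2;

        g_assert(phys1 != -1);
        p1 = page_find_alloc(phys1 >> TARGET_PAGE_BITS, alloc);
        if (ret_p1) {
            *ret_p1 = p1;
        }
        if (likely(phys2 == -1)) {
            page_lock(p1);
            return;
        }
        p2 = page_find_alloc(phys2 >> TARGET_PAGE_BITS, alloc);
        if (ret_p2) {
            *ret_p2 = p2;
        }
        /* acquire locks in ascending page address order */
        if (phys1 < phys2) {
            page_lock(p1);
            page_lock(p2);
        } else {
            page_lock(p2);
            page_lock(p1);
        }
    }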

Re: [Qemu-devel] [PATCH v2 10/17] translate-all: use per-page locking in !user-mode

2018-04-13 Thread Richard Henderson
On 04/05/2018 04:13 PM, Emilio G. Cota wrote:
> +/**
> + * struct page_collection - tracks a set of pages (i.e. &struct page_entry's)
> + * @tree:   Binary search tree (BST) of the pages, with key == page index
> + * @max:    Pointer to the page in @tree with the highest page index
> + *
> + * To avoid deadlock we lock pages in ascending order of page index.
> + * When operating on a set of pages, we need to keep track of them so that
> + * we can lock them in order and also unlock them later. For this we collect
> + * pages (i.e. &struct page_entry's) in a binary search @tree. Given that the
> + * @tree implementation we use does not provide an O(1) operation to obtain the
> + * highest-ranked element, we use @max to keep track of the inserted page
> + * with the highest index. This is valuable because if a page is not in
> + * the tree and its index is higher than @max's, then we can lock it
> + * without breaking the locking order rule.
> + *
> + * Note on naming: 'struct page_set' would be shorter, but we already have a few
> + * page_set_*() helpers, so page_collection is used instead to avoid confusion.
> + *
> + * See also: page_collection_lock().
> + */
> +struct page_collection {
> +    GTree *tree;
> +    struct page_entry *max;
> +};

I don't understand the motivation for this data structure.  Substituting one
tree for another does not, on the face of it, seem to be a win.

Given that your locking order is based on the physical address, I can
understand that the sequential virtual addresses that these routines are given
is not appropriate.  But surely you should know how many pages are involved,
and therefore be able to allocate a flat array to hold the PageDesc.

> +/*
> + * Lock a range of pages ([@start,@end[) as well as the pages of all
> + * intersecting TBs.
> + * Locking order: acquire locks in ascending order of page index.
> + */

I don't think I understand this either.  From whence do you wind up with a
range of physical addresses?

> +struct page_collection *
> +page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)

...

> +    /*
> +     * Add the TB to the page list.
> +     * To avoid deadlock, acquire first the lock of the lower-addressed page.
> +     */
> +    p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
> +    if (likely(phys_page2 == -1)) {
>          tb->page_addr[1] = -1;
> +        page_lock(p);
> +        tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
> +    } else {
> +        p2 = page_find_alloc(phys_page2 >> TARGET_PAGE_BITS, 1);
> +        if (phys_pc < phys_page2) {
> +            page_lock(p);
> +            page_lock(p2);
> +        } else {
> +            page_lock(p2);
> +            page_lock(p);
> +        }

Extract this as a helper for use here and page_lock_tb?

>  /*
>   * Invalidate all TBs which intersect with the target physical address range
> + * [start;end[. NOTE: start and end must refer to the *same* physical page.
> + * 'is_cpu_write_access' should be true if called from a real cpu write
> + * access: the virtual CPU will exit the current TB if code is modified inside
> + * this TB.
> + *
> + * Called with tb_lock/mmap_lock held for user-mode emulation
> + * Called with tb_lock held for system-mode emulation
> + */
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> +                                   int is_cpu_write_access)

FWIW, we should probably notice and optimize end = start + 1, which appears to
have the largest number of users for e.g. watchpoints.


r~