On 2017-05-16 10:49, Glenn Enright wrote:
On 15/05/17 21:57, Juergen Gross wrote:
On 13/05/17 06:02, Glenn Enright wrote:
On 09/05/17 21:24, Roger Pau Monné wrote:
On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
On 04/05/17 00:17, Glenn Enright wrote:
On 04/05/17 04:58, Steven Haigh wrote:
On 04/05/17 01:53, Juergen Gross wrote:
On 03/05/17 12:45, Steven Haigh wrote:
Just wanted to give this a little nudge now people seem to be back on deck...
Glenn, could you please give the attached patch a try? It should be applied on top of the other correction; the old debug patch should not be applied. I have added some debug output to make sure we see what is happening.
This patch is included in kernel-xen-4.9.26-1
It should be in the repos now.
Still seeing the same issue. Without the extra debug patch, all I see in the logs after destroy is this...
xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 0
Hmm, to me it seems as if some grant isn't being unmapped. Looking at gnttab_unmap_refs_async() I wonder how this is supposed to work:

I don't see how a grant would ever be unmapped in the case of page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it does is defer the call to the unmap operation again and again. Or am I missing something here?
No, I don't think you are missing anything, but I cannot see how this can be solved in a better way; unmapping a page that's still referenced is certainly not the best option, or else we risk triggering a page fault elsewhere.
IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at some point. Also, I'm wondering whether there's a way to keep track of who has references on a specific page, but so far I haven't been able to figure out how to get this information from Linux.
Also, I've noticed that __gnttab_unmap_refs_async uses page_count; shouldn't it use page_ref_count instead?
Roger.
In case it helps, I have continued to work on this. I noticed processes left behind (under 4.9.27). The same issue is ongoing.
# ps auxf | grep [x]vda
root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_ [1.xvda1-1]
root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_ [3.xvda1-1]
root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_ [4.xvda1-1]
root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_ [6.xvda1-1]
root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_ [7.xvda1-1]
# xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 1512 2 r----- 118.5
(null) 1 8 4 --p--d 43.8
(null) 3 8 4 --p--d 6.3
(null) 4 8 4 --p--d 73.4
(null) 6 8 4 --p--d 14.7
(null) 7 8 4 --p--d 30
Those all have...
[root 11080]# cat wchan
xen_blkif_schedule
[root 11080]# cat stack
[<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
[<ffffffff810a0555>] kthread+0xe5/0x100
[<ffffffff816f1c45>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
And I found another reference count bug. Would you like to give the attached patch (to be applied additionally to the previous ones) a try?
Juergen
This seems to have solved the issue in 4.9.28, with all three patches applied. Awesome!
On my main test machine I can no longer replicate what I was originally seeing, and in dmesg I now see this flow...
xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 1
xen-blkback: xen_blkif_free: delayed = 0
xl list is clean, xenstore looks right. No extraneous processes left over.
Thank you so much, Juergen. I really appreciate your persistence with this. Anything I can do to help push this upstream, please let me know. Feel free to add a Reported-by line with my name if you think it appropriate.
This is good news.
Juergen, can I request a full patch set posted to the list (please CC me)? I'll ensure we can build the kernel with all 3 (?) patches applied and test properly.

I'll build up a complete kernel with those patches and give a Tested-by if all goes well.
--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel