Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access
On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
> On 26.01.2016 11:58, Stefan Bader wrote:
> > Hi,
> >
> > while playing around with xen-4.6 I stumbled over an odd problem and am
> > wondering whether anybody has seen the same. A method to relatively
> > quickly reproduce this for me seems to be:
> >
> > - Start one domU (PV or HVM does not seem to matter)
> > - Repeatedly call xenstore-ls a few times
> >
> > I think I never got beyond 10 repeats when the xenstore-ls call
> > suddenly locks up and xenstored crashes with a SIGBUS error. In the
> > majority of cases (I think I saw one exception), the crash happens
> > while accessing conn->domain->interface in
> > tools/xenstore/xenstored_domain.c:domain_can_read().
> > Looking at the core file produced by xenstored, I now have at least one
> > case where the pointer still matches the previously mapped value.
> > Though I think I also had at least one run (with less debugging added)
> > where it seemed to be really wrong.
> > There is more info at [1] in case someone is interested.
> >
> > I need to repeat a few more times to see how consistent the whole thing
> > is. Does this happen for anybody else? Any advice on what I should look
> > at (in the sense of gathering better data)?
>
> Just as an update and confirmation for Ian and Bastian: Debian testing is
> fine. I have not dug into the specifics, but it's not the Xen package
> side at all. Something in our 4.3 kernel causes this, unfortunately
> without any hint in dmesg. But since we are moving to 4.4 soon and I
> cannot reproduce it with the pending 4.4 build, that seems good enough
> to me.

Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
not be subject to NUMA balancing" then.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access
On 26.01.2016 11:58, Stefan Bader wrote:
> Hi,
>
> while playing around with xen-4.6 I stumbled over an odd problem and am
> wondering whether anybody has seen the same. A method to relatively
> quickly reproduce this for me seems to be:
>
> - Start one domU (PV or HVM does not seem to matter)
> - Repeatedly call xenstore-ls a few times
>
> I think I never got beyond 10 repeats when the xenstore-ls call
> suddenly locks up and xenstored crashes with a SIGBUS error. In the
> majority of cases (I think I saw one exception), the crash happens
> while accessing conn->domain->interface in
> tools/xenstore/xenstored_domain.c:domain_can_read().
> Looking at the core file produced by xenstored, I now have at least one
> case where the pointer still matches the previously mapped value.
> Though I think I also had at least one run (with less debugging added)
> where it seemed to be really wrong.
> There is more info at [1] in case someone is interested.
>
> I need to repeat a few more times to see how consistent the whole thing
> is. Does this happen for anybody else? Any advice on what I should look
> at (in the sense of gathering better data)?

Just as an update and confirmation for Ian and Bastian: Debian testing is
fine. I have not dug into the specifics, but it's not the Xen package side
at all. Something in our 4.3 kernel causes this, unfortunately without any
hint in dmesg. But since we are moving to 4.4 soon and I cannot reproduce
it with the pending 4.4 build, that seems good enough to me.

-Stefan

>
> Thanks,
> Stefan
>
> [1] https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1538049
Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access
On 28.01.2016 10:39, Ian Campbell wrote:
> On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
>> On 26.01.2016 11:58, Stefan Bader wrote:
>>> Hi,
>>>
>>> while playing around with xen-4.6 I stumbled over an odd problem and am
>>> wondering whether anybody has seen the same. A method to relatively
>>> quickly reproduce this for me seems to be:
>>>
>>> - Start one domU (PV or HVM does not seem to matter)
>>> - Repeatedly call xenstore-ls a few times
>>>
>>> I think I never got beyond 10 repeats when the xenstore-ls call
>>> suddenly locks up and xenstored crashes with a SIGBUS error. In the
>>> majority of cases (I think I saw one exception), the crash happens
>>> while accessing conn->domain->interface in
>>> tools/xenstore/xenstored_domain.c:domain_can_read().
>>> Looking at the core file produced by xenstored, I now have at least one
>>> case where the pointer still matches the previously mapped value.
>>> Though I think I also had at least one run (with less debugging added)
>>> where it seemed to be really wrong.
>>> There is more info at [1] in case someone is interested.
>>>
>>> I need to repeat a few more times to see how consistent the whole thing
>>> is. Does this happen for anybody else? Any advice on what I should look
>>> at (in the sense of gathering better data)?
>>
>> Just as an update and confirmation for Ian and Bastian: Debian testing is
>> fine. I have not dug into the specifics, but it's not the Xen package
>> side at all. Something in our 4.3 kernel causes this, unfortunately
>> without any hint in dmesg. But since we are moving to 4.4 soon and I
>> cannot reproduce it with the pending 4.4 build, that seems good enough
>> to me.
>
> Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
> not be subject to NUMA balancing" then.

Oh right. That sounds very plausible. Maybe paired with balancing being
done even on a non-NUMA system (because I saw the same happen on a
non-NUMA host, too). And I cannot remember ever having this with 4.2, so
4.3 seems to have introduced the additional (or maybe more aggressive)
balancing. But the result pretty much matches what I saw: from one second
to the next, the grant-table page of xenstored for the running domU was
invalid, without the daemon having done any unmap. So yeah, the balancing
likely got rid of it.

-Stefan

>
> Ian.
>
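The symptom described above (a pointer that still holds the previously mapped address but suddenly faults with SIGBUS, although the daemon never unmapped anything) can be reproduced in miniature with an ordinary file mapping. The C sketch below is only an analogy, not the gntdev/NUMA-balancing mechanism itself: it maps one page of a temporary file, shrinks the file to zero bytes behind the mapping's back, and then catches the SIGBUS raised by the next read. It assumes Linux (or another POSIX system that reports access past end-of-file in a shared mapping as SIGBUS) and a writable /tmp; the function name `access_after_backing_vanishes()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* SIGBUS handler: jump back to the sigsetjmp() point instead of crashing. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Hypothetical demo (analogy only): returns true if reading through an
 * unchanged pointer raised SIGBUS after the page behind it went away. */
static bool access_after_backing_vanishes(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return false;
    unlink(path); /* file now lives on only through fd and the mapping */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return false;
    }

    volatile char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return false;
    }
    p[0] = 'x'; /* page is backed here, so this access succeeds */

    /* Drop the backing page; p keeps exactly the same value. */
    if (ftruncate(fd, 0) != 0) {
        munmap((void *)p, page);
        close(fd);
        return false;
    }

    struct sigaction sa, old;
    sa.sa_handler = on_sigbus;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    volatile bool faulted = false;
    if (sigsetjmp(bus_env, 1) == 0)
        (void)p[0];     /* nothing backs this address any more: SIGBUS */
    else
        faulted = true; /* reached via siglongjmp() from on_sigbus() */

    sigaction(SIGBUS, &old, NULL); /* restore the previous handler */
    munmap((void *)p, page);
    close(fd);
    return faulted;
}
```

On Linux, calling `access_after_backing_vanishes()` should return true. In the reported bug the page became inaccessible because of kernel-side handling of the grant mapping rather than a truncate, which is consistent with a kernel fix, not a xenstored change, being what resolves it.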
[Xen-devel] xen-4.6: xenstored crashes during domain->interface access
Hi,

while playing around with xen-4.6 I stumbled over an odd problem and am
wondering whether anybody has seen the same. A method to relatively
quickly reproduce this for me seems to be:

- Start one domU (PV or HVM does not seem to matter)
- Repeatedly call xenstore-ls a few times

I think I never got beyond 10 repeats when the xenstore-ls call
suddenly locks up and xenstored crashes with a SIGBUS error. In the
majority of cases (I think I saw one exception), the crash happens
while accessing conn->domain->interface in
tools/xenstore/xenstored_domain.c:domain_can_read().
Looking at the core file produced by xenstored, I now have at least one
case where the pointer still matches the previously mapped value.
Though I think I also had at least one run (with less debugging added)
where it seemed to be really wrong.
There is more info at [1] in case someone is interested.

I need to repeat a few more times to see how consistent the whole thing
is. Does this happen for anybody else? Any advice on what I should look
at (in the sense of gathering better data)?

Thanks,
Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1538049
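For context on the crash site named in the report: domain_can_read() asks whether the guest has queued request bytes on the shared xenstore ring, and it answers by reading the ring indices directly out of the page that xenstored has mapped from the domU. The C sketch below approximates that check; it is not the actual xenstored source. The struct layouts are trimmed-down assumptions modeled on `struct xenstore_domain_interface` in xen/include/public/io/xs_wire.h and on xenstored's `struct connection` and `struct domain`, so verify field names against the real headers. It illustrates why the fault surfaces there: the first dereference of conn->domain->interface touches the mapped page, so an invalidated mapping faults even when the pointer value is unchanged.

```c
#include <assert.h>  /* used by the usage example below */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>  /* memset, used by the usage example below */

/* Simplified, assumed layout modeled on struct xenstore_domain_interface
 * from xen/include/public/io/xs_wire.h (any later fields omitted). */
typedef uint32_t XENSTORE_RING_IDX;
#define XENSTORE_RING_SIZE 1024

struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE]; /* requests from the guest to xenstored */
    char rsp[XENSTORE_RING_SIZE]; /* replies and watch events to the guest */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

/* Hypothetical, trimmed-down stand-ins for xenstored's bookkeeping. */
struct domain {
    /* In xenstored this points at the guest's page mapped into the daemon. */
    struct xenstore_domain_interface *interface;
};

struct connection {
    struct domain *domain;
};

/* True when the guest has produced request bytes not yet consumed.
 * Both index loads go through the mapped page, so an invalidated
 * mapping faults on the first load even if the pointer is unchanged. */
static bool domain_can_read(struct connection *conn)
{
    struct xenstore_domain_interface *intf = conn->domain->interface;
    return intf->req_cons != intf->req_prod;
}
```

For example, with a zeroed interface page domain_can_read() returns false; once the guest side advances req_prod past req_cons it returns true, and it returns false again after xenstored catches up by advancing req_cons.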