Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access

2016-01-28 Thread Ian Campbell
On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
> On 26.01.2016 11:58, Stefan Bader wrote:
> > Hi,
> > 
> > while playing around with xen-4.6 I stumbled over an odd problem and am
> > wondering whether anybody has seen the same. A method to relatively
> > quickly reproduce this for me seems to be:
> > 
> > - Start one domU (PV or HVM does not seem to matter)
> > - Repeatedly call xenstore-ls a few times
> > 
> > I think I never got beyond 10 repeats before the xenstore-ls call
> > suddenly locks up and xenstored crashes with a SIGBUS error. In the
> > majority of cases (I think I saw one different), the crash happens while
> > accessing conn->domain->interface in
> > tools/xenstore/xenstored_domain.c:domain_can_read().
> > Looking at the corefile produced by xenstored I now got at least one
> > case where the pointer still matches the previously mapped value. Though
> > I think I had also at least one run (with less debugging added) where it
> > seemed to be really wrong.
> > There is more info at [1] in case someone is interested.
> > 
> > I need to repeat a few more times to see how consistent the whole thing
> > is. Does this happen for anybody else? Any advice on what I should look
> > at (in the sense of gathering better data)?
> 
> Just as an update and confirmation for Ian and Bastian: Debian testing is
> fine. I have not dug into the specifics but it's not the Xen package side
> at all. Something in our 4.3 kernel causes this. Unfortunately without any
> hint in dmesg. But since we move to 4.4 soon and I cannot reproduce it with
> the pending 4.4 build it seems good enough to me.

Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
not be subject to NUMA balancing" then.
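
If NUMA balancing is the suspect, a quick check is whether it is active on the affected dom0 kernel. A minimal sketch, assuming the kernel exposes the kernel.numa_balancing sysctl at /proc/sys/kernel/numa_balancing (this path is not mentioned in the thread, and the file is absent on kernels built without CONFIG_NUMA_BALANCING); the function name is made up for illustration:

```python
from pathlib import Path


def numa_balancing_state(path="/proc/sys/kernel/numa_balancing"):
    """Report whether automatic NUMA balancing appears to be enabled.

    The default path is an assumption about the running kernel; pass a
    different path to inspect another location.
    """
    sysctl = Path(path)
    if not sysctl.exists():
        # Kernel built without NUMA balancing support, or not Linux.
        return "unavailable"
    value = sysctl.read_text().strip()
    # "0" means off; any other value means some balancing mode is active.
    return "disabled" if value == "0" else f"enabled (mode {value})"


if __name__ == "__main__":
    print("NUMA balancing:", numa_balancing_state())
```

Writing 0 to that file as root should switch automatic balancing off, which would be one way to test the theory on a kernel that lacks the fix.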

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access

2016-01-28 Thread Stefan Bader
On 26.01.2016 11:58, Stefan Bader wrote:
> Hi,
> 
> while playing around with xen-4.6 I stumbled over an odd problem and am
> wondering whether anybody has seen the same. A method to relatively quickly
> reproduce this for me seems to be:
> 
> - Start one domU (PV or HVM does not seem to matter)
> - Repeatedly call xenstore-ls a few times
> 
> I think I never got beyond 10 repeats before the xenstore-ls call suddenly
> locks up and xenstored crashes with a SIGBUS error. In the majority of cases
> (I think I saw one different), the crash happens while accessing
> conn->domain->interface in
> tools/xenstore/xenstored_domain.c:domain_can_read().
> Looking at the corefile produced by xenstored I now got at least one case
> where the pointer still matches the previously mapped value. Though I think
> I had also at least one run (with less debugging added) where it seemed to
> be really wrong.
> There is more info at [1] in case someone is interested.
> 
> I need to repeat a few more times to see how consistent the whole thing is.
> Does this happen for anybody else? Any advice on what I should look at (in
> the sense of gathering better data)?

Just as an update and confirmation for Ian and Bastian: Debian testing is fine.
I have not dug into the specifics but it's not the Xen package side at all.
Something in our 4.3 kernel causes this. Unfortunately without any hint in
dmesg. But since we move to 4.4 soon and I cannot reproduce it with the pending
4.4 build it seems good enough to me.

-Stefan
> 
> Thanks,
> Stefan
> 
> [1] https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1538049
> 






Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access

2016-01-28 Thread Stefan Bader
On 28.01.2016 10:39, Ian Campbell wrote:
> On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
>> On 26.01.2016 11:58, Stefan Bader wrote:
>>> Hi,
>>>
>>> while playing around with xen-4.6 I stumbled over an odd problem and am
>>> wondering whether anybody has seen the same. A method to relatively
>>> quickly reproduce this for me seems to be:
>>>
>>> - Start one domU (PV or HVM does not seem to matter)
>>> - Repeatedly call xenstore-ls a few times
>>>
>>> I think I never got beyond 10 repeats before the xenstore-ls call
>>> suddenly locks up and xenstored crashes with a SIGBUS error. In the
>>> majority of cases (I think I saw one different), the crash happens while
>>> accessing conn->domain->interface in
>>> tools/xenstore/xenstored_domain.c:domain_can_read().
>>> Looking at the corefile produced by xenstored I now got at least one
>>> case where the pointer still matches the previously mapped value. Though
>>> I think I had also at least one run (with less debugging added) where it
>>> seemed to be really wrong.
>>> There is more info at [1] in case someone is interested.
>>>
>>> I need to repeat a few more times to see how consistent the whole thing
>>> is. Does this happen for anybody else? Any advice on what I should look
>>> at (in the sense of gathering better data)?
>>
>> Just as an update and confirmation for Ian and Bastian: Debian testing is
>> fine. I have not dug into the specifics but it's not the Xen package side
>> at all. Something in our 4.3 kernel causes this. Unfortunately without any
>> hint in dmesg. But since we move to 4.4 soon and I cannot reproduce it with
>> the pending 4.4 build it seems good enough to me.
> 
> Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
> not be subject to NUMA balancing" then.

Oh right. That sounds very possible. Maybe paired with balancing being done even
on a non-NUMA system (because I saw the same happen on a non-NUMA host, too). And
I cannot remember ever having this with 4.2, so 4.3 seems to have introduced
the additional (or maybe more aggressive) balancing.
But the result pretty much matches what I saw: from one second to the next, the
grant-table page of xenstored for the running domU was invalid, without the
daemon having done any unmap. So yeah, likely the balancing got rid of it.
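
The symptom (pointer unchanged, access suddenly faults) can be mimicked in userspace without Xen. This is a rough analogy only, not the gntdev/NUMA-balancing mechanism itself: map a file, shrink the file underneath the mapping, then touch the mapping again. On Linux the second access is expected to kill the process with SIGBUS. All names below are illustrative:

```python
import signal
import subprocess
import sys
import textwrap

# The faulting access runs in a child process so the caller survives it.
CHILD = textwrap.dedent("""
    import mmap, os, tempfile
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    os.ftruncate(fd, mmap.PAGESIZE)      # give the file one page of backing
    m = mmap.mmap(fd, mmap.PAGESIZE)     # map that page
    m[0]                                 # first access works
    os.ftruncate(fd, 0)                  # backing goes away, mapping stays
    m[0]                                 # same address, now expected to fault
""")


def run_demo():
    """Run the child and return its exit status (negative = killed by signal)."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


if __name__ == "__main__":
    rc = run_demo()
    if rc == -signal.SIGBUS:
        print("child was killed by SIGBUS")
    else:
        print("child exit status:", rc)
```

Nothing in the child ever calls unmap, which is the same situation the xenstored core file shows.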

-Stefan
> 
> Ian.
> 






[Xen-devel] xen-4.6: xenstored crashes during domain->interface access

2016-01-26 Thread Stefan Bader
Hi,

while playing around with xen-4.6 I stumbled over an odd problem and am
wondering whether anybody has seen the same. A method to relatively quickly
reproduce this for me seems to be:

- Start one domU (PV or HVM does not seem to matter)
- Repeatedly call xenstore-ls a few times
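
The second step can be scripted so that a hang or a failing call is caught automatically. A minimal sketch, assuming xenstore-ls is on the PATH in dom0 and that a hung call is best detected with a timeout; the function name, the repeat count, and the timeout value are arbitrary choices, not taken from the report:

```python
import subprocess


def hammer(cmd=("xenstore-ls",), repeats=20, timeout=10.0):
    """Run cmd up to `repeats` times.

    Returns None if every call succeeded, otherwise a tuple
    (iteration, reason) describing the first call that hung or failed.
    """
    for i in range(1, repeats + 1):
        try:
            proc = subprocess.run(
                list(cmd),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The call locked up; subprocess.run has already killed it.
            return i, "timeout"
        if proc.returncode != 0:
            return i, f"exit {proc.returncode}"
    return None
```

Calling hammer() with the defaults runs xenstore-ls up to 20 times and reports the first iteration that hangs or fails.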

I think I never got beyond 10 repeats before the xenstore-ls call suddenly locks
up and xenstored crashes with a SIGBUS error. In the majority of cases (I think
I saw one that was different), the crash happens while accessing
conn->domain->interface in tools/xenstore/xenstored_domain.c:domain_can_read().
Looking at the core file produced by xenstored, I now have at least one case where
the pointer still matches the previously mapped value. Though I think I also had
at least one run (with less debugging added) where it seemed to be really wrong.
There is more info at [1] in case someone is interested.

I need to repeat this a few more times to see how consistent the whole thing is.
Does this happen for anybody else? Any advice on what I should look at (in the
sense of gathering better data)?

Thanks,
Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1538049


