Re: [Xen-devel] [linux-linus test] 109469: regressions - FAIL

2017-05-19 Thread Jan Beulich
>>> On 19.05.17 at 12:56, Ian Jackson wrote:
> Jan Beulich writes ("Re: [Xen-devel] [linux-linus test] 109469: regressions - FAIL"):
>> Therefore I'm afraid the only way we could obtain a more
>> complete picture would be if this re-occurred and if at that
>> time we'd have "async-show-all" in place on the hypervisor
>> command line.
> 
> Is that a thing I could do to all the tests ?  Adding a command-line
> option is easy, if it's otherwise harmless.

It'll only affect verbosity if the watchdog triggers, an unrecognized
NMI is raised, or an MCE arrives that requires the machine to be
brought down, so yes, I think this could be enabled uniformly (some
older versions may not understand it, but they also won't choke on it).
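
As an illustration only, here is a minimal sketch of adding the option to
a hypervisor command line without duplicating it. This is not osstest's
actual code; the helper name add_hv_option and the sample command line
are hypothetical. The only assumption about Xen is that boolean options
may also appear in their negated "no-" form.

```python
def add_hv_option(cmdline: str, option: str) -> str:
    """Return cmdline with option appended, unless it is already set.

    Hypothetical helper for illustration; whitespace is normalized to
    single spaces in the returned string.
    """
    tokens = cmdline.split()
    name = option.split("=", 1)[0]
    for tok in tokens:
        key = tok.split("=", 1)[0]
        # "name", "name=value" and "no-name" all count as an explicit
        # setting that should not be overridden.
        if key in (name, "no-" + name):
            return " ".join(tokens)
    return " ".join(tokens + [option])


# Sample command line is made up for the example.
print(add_hv_option("console=com1 watchdog", "async-show-all"))
```

Leaving an existing explicit setting alone means a test that deliberately
disables the option keeps its behaviour.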

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [linux-linus test] 109469: regressions - FAIL

2017-05-19 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [linux-linus test] 109469: regressions - FAIL"):
> Therefore I'm afraid the only way we could obtain a more
> complete picture would be if this re-occurred and if at that
> time we'd have "async-show-all" in place on the hypervisor
> command line.

Is that a thing I could do to all the tests ?  Adding a command-line
option is easy, if it's otherwise harmless.

Ian.



Re: [Xen-devel] [linux-linus test] 109469: regressions - FAIL

2017-05-19 Thread Jan Beulich
>>> On 17.05.17 at 16:59, Boris Ostrovsky wrote:
> On 05/16/2017 06:43 PM, osstest service owner wrote:
>> flight 109469 linux-linus real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/109469/ 
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-i386-libvirt   6 xen-boot fail REGR. vs. 109449
> 
> http://logs.test-lab.xenproject.org/osstest/logs/109469/test-amd64-i386-libvirt/serial-rimava0.log
> 
> This looks like some sort of a deadlock with CPU2 waiting for a remote
> call to complete while CPU0 is waiting for flush_lock.

But these two don't block each other, as both run with interrupts
enabled (i.e. are available to process IPIs the other might have
sent).

> Only two CPUs are dumped though.

That's bad. CPU4 sitting at the final loop in flush_area_mask()
makes clear that it's the flush_lock holder, but we can infer that it
has IRQs on just like CPUs 0 and 2. While the place CPU2 was
caught also doesn't allow us to deduce which other CPU(s)
is/are not responding, the main candidate would appear to be
CPU1, of which we know nothing except that it also sits in
_spin_lock(). Neither flush_lock nor call_lock would ever be
acquired with IRQs off, so I'd conclude there must be a 3rd
lock involved here.

Therefore I'm afraid the only way we could obtain a more
complete picture would be if this re-occurred and if at that
time we'd have "async-show-all" in place on the hypervisor
command line.
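
To make the elimination above explicit, here is a small sketch that
records what is known about each CPU and lists those not known to have
IRQs enabled, i.e. the ones that might be failing to service IPIs. It
is a toy model and not Xen code: the CpuState type and ipi_suspects()
are hypothetical names, and the table only restates the observations
from this thread (IRQ state for CPU4 is inferred, for CPU1 unknown).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CpuState:
    where: str                # what the CPU was seen doing
    irqs_on: Optional[bool]   # None means "not known from the log"


def ipi_suspects(cpus: Dict[int, CpuState]) -> List[int]:
    """Return CPUs that are not known to be running with IRQs enabled."""
    return sorted(n for n, s in cpus.items() if s.irqs_on is not True)


# Observations as described in this thread; nothing here comes from
# additional log data.
observed = {
    0: CpuState("waiting for flush_lock", True),
    2: CpuState("waiting for remote call to complete", True),
    4: CpuState("final loop in flush_area_mask(), holds flush_lock", True),
    1: CpuState("in _spin_lock(), lock unknown", None),
}

print(ipi_suspects(observed))
```

With a dump of all CPUs, the unknown entries could be filled in and the
same check would show whether a CPU really spins with IRQs off.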

Jan




Re: [Xen-devel] [linux-linus test] 109469: regressions - FAIL

2017-05-17 Thread Boris Ostrovsky
On 05/16/2017 06:43 PM, osstest service owner wrote:
> flight 109469 linux-linus real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/109469/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-libvirt   6 xen-boot fail REGR. vs. 109449

http://logs.test-lab.xenproject.org/osstest/logs/109469/test-amd64-i386-libvirt/serial-rimava0.log

This looks like some sort of a deadlock with CPU2 waiting for a remote
call to complete while CPU0 is waiting for flush_lock.

Only two CPUs are dumped though.

-boris

