Re: Regressions

2020-03-02 Thread Kamil Rytarowski
On 01.03.2020 23:31, Andrew Doran wrote:
> On Sun, Mar 01, 2020 at 03:26:12PM +0200, Andreas Gustafsson wrote:
> 
>>   55020 dbregs_dr?_dont_inherit_lwp test cases fail on real hardware
> 
> I've not personally looked into these yet.
> 

In 55020 we get EDEADLK from the kernel on _lwp_wait(2) (called via
pthread_join(3)), which is strange unless there is a bug I am overlooking.
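For comparison, here is a minimal C sketch of when pthread_join(3) normally reports EDEADLK. It is not taken from the PR, and the function names are hypothetical. POSIX allows EDEADLK when a deadlock is detected or when the target is the calling thread, while joining a separate, still-joinable thread should succeed. The sketch does not reproduce 55020. It only shows why EDEADLK on a join of another thread is unexpected.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical trivial worker: exits immediately. */
static void *
worker(void *arg)
{
	return arg;
}

/* Joining a distinct, joinable thread: expected to return 0. */
int
join_worker_error(void)
{
	pthread_t t;
	int error;

	error = pthread_create(&t, NULL, worker, NULL);
	if (error != 0)
		return error;
	return pthread_join(t, NULL);
}

/*
 * Self-join: the case POSIX lists for EDEADLK ("may fail", so an
 * implementation is not strictly required to detect it).
 */
int
join_self_error(void)
{
	return pthread_join(pthread_self(), NULL);
}
```

Build with -pthread. Each function returns the pthread_join(3) error code, so a caller can compare it against 0 or EDEADLK.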





Re: Regressions

2020-03-01 Thread Andrew Doran
On Sun, Mar 01, 2020 at 03:26:12PM +0200, Andreas Gustafsson wrote:

> NetBSD-current is again suffering from a number of regressions.  The
> last time the ATF tests showed zero unexpected failures on real amd64
> hardware was on Dec 12, and the sparc, sparc64, pmax, and hpcmips
> tests have all been unable to run to completion for more than a month.
> 
> Here are the PRs for some of the issues:
> 
>   50350 rump/rumpkern/t_sp/stress_{long,short} fail on Core 2 Quad
>   55032 rump/rumpkern/t_vm:uvmwait test case now fails
>   55018 atf tests for pppoe sometimes leave rump_server processes around

Rump is very fragile.  As far as I am aware, none of these captures a real
bug in the system.  I have already spent a lot of time on these and related
issues (see the pthread changes, etc., which improved the picture somewhat).
I will look into them again once I have finished addressing build
performance, which should be within the month.

>   54845 sparc panics in sleepq_remove

I have spent a couple of days so far this year fixing sparc's maladies and
will come back to it when I have free time (as above).

>   54810 sparc64 pool_redzone_check errors during install
>   54923 pmax test runs fail to complete since Jan 15
>   55020 dbregs_dr?_dont_inherit_lwp test cases fail on real hardware

I've not personally looked into these yet.

Andrew


Re: Regressions

2020-03-01 Thread Jason Thorpe


> On Mar 1, 2020, at 7:56 AM, Andreas Gustafsson  wrote:
> 
> Are you saying fixing one or the other is not your responsibility,
> and if so, whose?

What I'm saying is it doesn't reflect a bug in the core functionality.  I 
acknowledge that it's an issue that needs to be addressed, but there are 
degrees of seriousness.

-- thorpej



Re: Regressions

2020-03-01 Thread Andreas Gustafsson
Jason Thorpe wrote:
> The issue seems to be that rump really wants to join threads that
> are created for work queues when the rump server exits.  But in this
> particular case, there's a global work queue that never goes away
> because in the real kernel, there's no need to do this before the
> system reboots / shuts down.  Any change to fix this will be 100%
> for the appeasement of rump.

Well, yes, just like any change to fix the current build breakage in
if_stge.c will be 100% for the appeasement of 32-bit platforms.
Are you saying fixing one or the other is not your responsibility,
and if so, whose?
-- 
Andreas Gustafsson, g...@gson.org


Re: Regressions

2020-03-01 Thread Kamil Rytarowski
On 01.03.2020 14:26, Andreas Gustafsson wrote:
> Hi all,
> 
> NetBSD-current is again suffering from a number of regressions.  The
> last time the ATF tests showed zero unexpected failures on real amd64
> hardware was on Dec 12, and the sparc, sparc64, pmax, and hpcmips
> tests have all been unable to run to completion for more than a month.
> 
> Here are the PRs for some of the issues:
> 
>   50350 rump/rumpkern/t_sp/stress_{long,short} fail on Core 2 Quad
>   54810 sparc64 pool_redzone_check errors during install
>   54845 sparc panics in sleepq_remove
>   54923 pmax test runs fail to complete since Jan 15
>   55018 atf tests for pppoe sometimes leave rump_server processes around
>   55020 dbregs_dr?_dont_inherit_lwp test cases fail on real hardware
>   55032 rump/rumpkern/t_vm:uvmwait test case now fails
> 
> What can be done?
> 

I was looking at the dbregs one, but it looks like a kernel bug to me.





Regressions

2020-03-01 Thread Andreas Gustafsson
Hi all,

NetBSD-current is again suffering from a number of regressions.  The
last time the ATF tests showed zero unexpected failures on real amd64
hardware was on Dec 12, and the sparc, sparc64, pmax, and hpcmips
tests have all been unable to run to completion for more than a month.

Here are the PRs for some of the issues:

  50350 rump/rumpkern/t_sp/stress_{long,short} fail on Core 2 Quad
  54810 sparc64 pool_redzone_check errors during install
  54845 sparc panics in sleepq_remove
  54923 pmax test runs fail to complete since Jan 15
  55018 atf tests for pppoe sometimes leave rump_server processes around
  55020 dbregs_dr?_dont_inherit_lwp test cases fail on real hardware
  55032 rump/rumpkern/t_vm:uvmwait test case now fails

What can be done?
-- 
Andreas Gustafsson, g...@gson.org