Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-15 Thread Ryan Libby
On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
>
> Rick Macklem wrote:
> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >I thought this was the culprit, since I did 6 cycles of r358097 without a 
> >hang.
> >However, I just got a hang with r358097, but it looks rather different.
> >The r358097 hang did not have any processes sleeping on btalloc. They
> >appeared to be waiting on two different locks in the buffer cache.
> >As such, I think it might be a different problem. (I'll admit I should have
> >made notes about this one before rebooting, but I was flustrated that
> >it happened and rebooted before looking at it in much detail.)
> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> got a hang.
> --> It seems that r358097 is the culprit and r358098 makes it easier
>   to reproduce.
>   --> Basically runs out of kernel memory.
>
> It is not obvious if I can revert these two commits without reverting
> other ones, since there were a bunch of vm changes after these.
>
> I'll take a look, but if you guys have any ideas on how to fix this, please
> let me know.
>
> Thanks, rick

Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
zone to see if that rescues it, on whatever base revision gets you a
reliable repro?

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes ram when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
> exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>the interrupt thread.
> - I have also seen processes waiting on the following locks:
> kern/subr_vmem.c:1343
> kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what
> hangs occur. (I've done 8 cycles of testing of r356776 without difficulties,
> but that doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "db" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (ie. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
>
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  
> >> >> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a
> >> >> > while to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang to-day running a
> r358251 kernel.
>
> I haven't had much luck so far, but see below for what I have learned.
>
> >> >> >
> >> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single
> >> >> > core, 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> >> > If I revert to r358251, I cannot reproduce this.
> As above, this is no longer true.
>
> >> >> >
> >> >> > Any ideas?
> >> >> >
> >> >> > I can easily test any change you might suggest to see if it fixes the
> >> >> > problem.
> >> >> >
> >> >> > If you want more debug info, let me know, since I can easily
> >> >> > reproduce it.
> >> >> >
> >> >> > Thanks, rick
> >> >>
> >> >> Nothing obvious to me.  I can maybe try a repro on a VM...
> >> >>
> >> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >> >>
> >> >> "btalloc" is "We're either out of address space or lost a fill race."
> From what I see, I think it is "out of address space".
> For one of the hangs, when I did "show alllocks", everything except the
> intr thread, was waiting for the
> exclusive sx lock @ vm/vm_map.c:4761
>
> >> >
> >> >Yes, I would be not surprised to be out of something on 1G i386 machine.
> >> >Please also add 'show alllocks'.
> >> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
> Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
> learned. (The time it takes to reproduce one of these varies greatly, but I
> usually get one within 3 cycles of a full kernel build over NFS. I have had it
> happen once when doing a kernel build over UFS.)
>
> >> This time, none of the processes are stuck on "btalloc".
> > I'll try and 

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-15 Thread Rick Macklem
Rick Macklem wrote:
>r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>However, I just got a hang with r358097, but it looks rather different.
>The r358097 hang did not have any processes sleeping on btalloc. They
>appeared to be waiting on two different locks in the buffer cache.
>As such, I think it might be a different problem. (I'll admit I should have
>made notes about this one before rebooting, but I was flustrated that
>it happened and rebooted before looking at it in much detail.)
Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
got a hang.
--> It seems that r358097 is the culprit and r358098 makes it easier
  to reproduce.
  --> Basically runs out of kernel memory.

It is not obvious if I can revert these two commits without reverting
other ones, since there were a bunch of vm changes after these.

I'll take a look, but if you guys have any ideas on how to fix this, please
let me know.

Thanks, rick
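
The narrowing-down described above (r358096 survives 10 cycles, r358097 hangs) amounts to a bisection over revision numbers. A minimal sketch in Python follows, assuming a hypothetical `hangs(rev)` callback that builds and exercises the given revision and reports whether it hung. The function name and the callback are illustrative only and do not come from this thread.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, hangs: Callable[[int], bool]) -> int:
    """Binary search for the first revision for which hangs(rev) is True.

    Assumptions (hypothetical, for illustration): `good` is known not to hang,
    `bad` is known to hang, and every revision after the first bad one is bad.
    """
    if good >= bad:
        raise ValueError("good must be an earlier revision than bad")
    while bad - good > 1:
        mid = (good + bad) // 2
        if hangs(mid):
            bad = mid   # the first bad revision is mid or earlier
        else:
            good = mid  # the first bad revision is after mid
    return bad


if __name__ == "__main__":
    # Stand-in predicate: pretend every revision from r358097 on hangs.
    print(first_bad_revision(356776, 358252, lambda rev: rev >= 358097))
```

Because the hang is intermittent, a single clean run from `hangs` can wrongly mark a bad revision as good, which is consistent with r358251 first looking fine and later hanging.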

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
   the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what
hangs occur. (I've done 8 cycles of testing of r356776 without difficulties, but
that doesn't guarantee it isn't broken.)
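
The remark that 8 clean cycles "doesn't guarantee it isn't broken" can be put in rough numbers. The sketch below assumes that each build cycle hangs independently with a fixed probability `p_hang`. That is a simplifying assumption, not something measured in this thread, and the values passed in are illustrative.

```python
import math


def prob_all_clean(p_hang: float, cycles: int) -> float:
    """Chance that `cycles` independent build cycles all finish without a hang."""
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    return (1.0 - p_hang) ** cycles


def cycles_needed(p_hang: float, max_false_good: float) -> int:
    """Smallest number of clean cycles that keeps the chance of wrongly calling
    a bad revision good at or below `max_false_good`."""
    if not 0.0 < p_hang < 1.0 or not 0.0 < max_false_good < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_hang))


if __name__ == "__main__":
    # If a bad kernel hung on every second cycle, 6 clean cycles would be rare.
    print(prob_all_clean(0.5, 6))
    # If it hung on only one cycle in ten, 8 clean cycles would be unremarkable.
    print(prob_all_clean(0.1, 8))
```

Under these assumptions, a revision that hangs only occasionally needs many more clean cycles before it can be called good with any confidence.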

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "db" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (ie. If you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)

rick



Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang to-day running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single
>> >> > core, 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread, was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I
usually get one within 3 cycles of a full kernel build over NFS. I have had it
happen once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by
> hand from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ 

FreeBSD CI Weekly Report 2020-06-14

2020-06-15 Thread Li-Wen Hsu
(Please send the followup to freebsd-testing@ and note Reply-To is set.)

FreeBSD CI Weekly Report 2020-06-14
===

Here is a summary of the FreeBSD Continuous Integration results for the period
from 2020-06-08 to 2020-06-14.

During this period, we have:

* 1798 builds (90.0% (+1.4) passed, 10.0% (-1.4) failed) of buildworld and
  buildkernel (GENERIC and LINT) were executed on aarch64, amd64, armv6,
  armv7, i386, mips, mips64, powerpc, powerpc64, powerpcspe, riscv64,
  sparc64 architectures for head, stable/12, stable/11 branches.
* 199 test runs (77.9% (+1.1) passed, 20.6% (-0.1) unstable, 1.5% (-1.0)
  exception) were executed on amd64, i386, riscv64 architectures for head,
  stable/12, stable/11 branches.
* 29 doc and www builds (96.6% (-3.4) passed, 3.4% (+3.4) failed)

Test case status (on 2020-06-14 23:59):
| Branch/Architecture | Total  | Pass   | Fail   | Skipped |
| --- | -- | -- | -- | --- |
| head/amd64  | 7863 (+30) | 7773 (+32) | 0 (+0) | 90 (-2) |
| head/i386   | 7861 (+30) | 7762 (+27) | 0 (+0) | 99 (+3) |
| 12-STABLE/amd64 | 7587 (+1)  | 7528 (+1)  | 0 (+0) | 59 (+0) |
| 12-STABLE/i386  | 7585 (+1)  | 7521 (+4)  | 0 (+0) | 64 (-3) |
| 11-STABLE/amd64 | 6885 (+2)  | 6835 (+5)  | 0 (+0) | 50 (-3) |
| 11-STABLE/i386  | 6883 (+2)  | 6831 (+2)  | 0 (+0) | 52 (+0) |

(The statistics from experimental jobs are omitted)
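
As a quick sanity check on the table above, each row's total should equal pass + fail + skipped. The sketch below is illustrative only: the numbers are copied from the table, while the helper names are made up for this example and are not part of any CI tooling.

```python
# Rows copied from the table above: (total, passed, failed, skipped).
TEST_CASES = {
    "head/amd64": (7863, 7773, 0, 90),
    "head/i386": (7861, 7762, 0, 99),
    "12-STABLE/amd64": (7587, 7528, 0, 59),
    "12-STABLE/i386": (7585, 7521, 0, 64),
    "11-STABLE/amd64": (6885, 6835, 0, 50),
    "11-STABLE/i386": (6883, 6831, 0, 52),
}


def is_consistent(total: int, passed: int, failed: int, skipped: int) -> bool:
    """True when the row's total equals the sum of its three outcome counts."""
    return total == passed + failed + skipped


def pass_rate(total: int, passed: int) -> float:
    """Share of test cases that passed, in percent."""
    return 100.0 * passed / total


if __name__ == "__main__":
    for name, (total, passed, failed, skipped) in TEST_CASES.items():
        status = "ok" if is_consistent(total, passed, failed, skipped) else "MISMATCH"
        print(f"{name}: {pass_rate(total, passed):.1f}% passed ({status})")
```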

If any of the issues found by CI are in your area of interest or expertise
please investigate the PRs listed below.

The latest web version of this report is available at
https://hackmd.io/@FreeBSD-CI/report-20200614 and archive is available at
https://hackmd.io/@FreeBSD-CI/ , any help is welcome.

## Failing jobs

* https://ci.freebsd.org/job/FreeBSD-head-amd64-gcc6_build/
  ```
  /usr/local/bin/x86_64-unknown-freebsd12.1-ld: /tmp/obj/workspace/src/amd64.amd64/lib/clang/liblldb/liblldb.a(IOHandlerCursesGUI.o): in function `curses::Window::Box(unsigned int, unsigned int)':
  /workspace/src/contrib/llvm-project/lldb/source/Core/IOHandlerCursesGUI.cpp:361: undefined reference to `box'
  /usr/local/bin/x86_64-unknown-freebsd12.1-ld: /workspace/src/contrib/llvm-project/lldb/source/Core/IOHandlerCursesGUI.cpp:361: undefined reference to `box'
  collect2: error: ld returned 1 exit status
  ```

## Regressions

* lib.libexecinfo.backtrace_test.backtrace_fmt_basic starts failing on amd64 after r360915
  https://bugs.freebsd.org/246537

* (head, stable/12, stable/11) 2 tests start failing after llvm10 import
  * lib.msun.ctrig_test.test_inf_inputs
    https://bugs.freebsd.org/244732
  * (DTrace) common.pid.t_dtrace_contrib.err_D_PROC_OFF_toobig_d
    https://bugs.freebsd.org/244823

* Lock-order reversals triggered by tests under sys.net.if_lagg_test.* on i386
  https://bugs.freebsd.org/244163
  Discovered by newly enabled sys.net.* tests. ([r357857](https://svnweb.freebsd.org/changeset/base/357857))
  
* sys.net.if_lagg_test.lacp_linkstate_destroy_stress panics i386 kernel
  https://bugs.freebsd.org/244168
  Discovered by newly enabled sys.net.* tests. ([r357857](https://svnweb.freebsd.org/changeset/base/357857))
  Fix in review: https://reviews.freebsd.org/D25284

## Failing and Flaky tests (from experimental jobs)

* https://ci.freebsd.org/job/FreeBSD-head-amd64-dtrace_test/
  * cddl.usr.sbin.dtrace.common.misc.t_dtrace_contrib.tst_dynopt_d
    * https://bugs.freebsd.org/237641
  * cddl.usr.sbin.dtrace.common.pid.t_dtrace_contrib.err_D_PROC_OFF_toobig_d
    * https://bugs.freebsd.org/244823

* https://ci.freebsd.org/job/FreeBSD-head-amd64-test_zfs/
  * There are ~13 failing and ~109 skipped cases, including flaky ones, see
    https://ci.freebsd.org/job/FreeBSD-head-amd64-test_zfs/lastCompletedBuild/testReport/
    for more details
  * Work on cleaning up these failing cases is in progress

* https://ci.freebsd.org/job/FreeBSD-head-amd64-test_ltp/
  * Total 3749 tests, 2277 success, 646 failures, 826 skipped

## Disabled Tests

* sys.fs.tmpfs.mount_test.large
  https://bugs.freebsd.org/212862
* sys.fs.tmpfs.link_test.kqueue
  https://bugs.freebsd.org/213662
* sys.kqueue.libkqueue.kqueue_test.main
  https://bugs.freebsd.org/233586
* sys.kern.ptrace_test.ptrace__PT_KILL_competing_stop
  https://bugs.freebsd.org/220841
* lib.libc.regex.exhaust_test.regcomp_too_big (i386 only)
  https://bugs.freebsd.org/237450
* sys.netinet.socket_afinet.socket_afinet_bind_zero
  https://bugs.freebsd.org/238781
* sys.netpfil.pf.names.names
* sys.netpfil.pf.synproxy.synproxy
  https://bugs.freebsd.org/238870
* sys.kern.ptrace_test.ptrace__follow_fork_child_detached_unrelated_debugger 
  https://bugs.freebsd.org/239292
* sys.kern.ptrace_test.ptrace__follow_fork_both_attached_unrelated_debugger 
  https://bugs.freebsd.org/239397
* sys.kern.ptrace_test.ptrace__parent_sees_exit_after_child_debugger
  https://bugs.freebsd.org/239399
* 

RE: MRSAS Panic during Install.

2020-06-15 Thread Santiago Martinez
Ok, will try a few more times as I don't need the server right now. I will keep
you posted.

Sent from my Samsung Galaxy smartphone.

Original message
From: Kashyap Desai
Date: 15/06/2020 19:29 (GMT+00:00)
To: Santiago Martinez, Don Lewis, Andriy Gapon
Cc: FreeBSD Current, "Kashyap D. Desai", "Kenneth D. Merry", Sumit Saxena,
Chandrakanth Patil
Subject: RE: MRSAS Panic during Install.

Hi,

The issue requires specific timing. We fire two write operations from the
driver on a RAID-1 volume: a write on the data arm and a write on the mirror
arm. The behavior can change depending on which IO the device/HBA completes to
the driver first. To hit this issue, the IO from the mirror arm should return
first.

Kashyap

From: Santiago Martinez [mailto:s...@codenetworks.net]
Sent: Monday, June 15, 2020 11:17 PM
To: Kashyap Desai; Don Lewis; Andriy Gapon
Cc: FreeBSD Current; Kashyap D. Desai; Kenneth D. Merry; Sumit Saxena;
Chandrakanth Patil
Subject: Re: MRSAS Panic during Install.

Hi guys, sorry for the radio silence. I did a fresh install today to apply the
patch, and by mistake I changed from UEFI to BIOS (legacy); now I'm not able to
replicate the panic any more (without applying the patch). Do you think it is
related? I mean, does it make sense to you?

I'm planning to reinstall tonight with UEFI to see if I can hit it again.

Santi

On 2020-06-12 06:30, Kashyap Desai wrote:

Screenshot of final error is good enough at least for now. Additional logs will
be good if you can provide.

Kashyap

From: Santiago Martinez [mailto:s...@codenetworks.net]
Sent: Thursday, June 11, 2020 8:49 PM
To: Kashyap Desai; Don Lewis; Andriy Gapon
Cc: FreeBSD Current; Kashyap D. Desai; Kenneth D. Merry; Sumit Saxena;
Chandrakanth Patil
Subject: Re: MRSAS Panic during Install.

Hi Everyone, I haven't forgotten about this yet :)

I performed the following tests in preparation to apply the patch:

 1. Installed 12.1 stable with one disc in raid0, no issues (did this
    so I can apply the patch and don't have to build a new bootable image).
 2. With 12.1, added a second raid, in this case RAID1, and it crashed.
    During boot it complained about a wrong/corrupt partition table.

Repeated the same test with current with exactly the same results (I also hit
another issue on mlx5en that crashed with current; will follow up after we can
sort this out).

Today I'm planning to apply the patch and see what happens.

I have some questions:

  * Do you want me to record the booting/error for each case and make
    it available?
  * Do you want access to the box?

Best regards.

Santiago

On 2020-06-09 22:43, Santiago Martinez wrote:

Hi there, apologies for the delayed response.

Regarding the lock reversal, I can try to capture the screen showing the
message.

The "Wil check go it goes" it was my brain trying to multitask, obviously not
in a successful way. What I meant to say was "I will check how it goes. without
the RAID".

Sure, I will test with the patch and let you know asap. hopefully by tomorrow
night(BST).

Cheers

Santi

On 2020-06-09 19:20, Kashyap Desai wrote:

-Original Message-
From: Santiago Martinez [mailto:s...@codenetworks.net]
Sent: Tuesday, June 9, 2020 11:27 PM
To: Kashyap Desai; Don Lewis; Andriy Gapon
Cc: FreeBSD Current; Kashyap D. Desai; Kenneth D. Merry; Sumit Saxena;
Chandrakanth Patil
Subject: Re: MRSAS Panic during Install.

Hi! so it works but i got a lock order reversal warning, but it continue.

OK. So what is a warning ?

Wil check go it goes

Could not get your point. Can you elaborate ?

Also can you try Raid - 1 VD with below patch ?

diff --git a/mrsas.c b/mrsas.c
index 3d33073..60f4b4d 100755
--- a/mrsas.c
+++ b/mrsas.c
@@ -1744,11 +1744,14 @@ mrsas_complete_cmd(struct mrsas_softc *sc, u_int32_t MSIxIndex)
 		data_length = r1_cmd->io_request->DataLength;
 		sense = r1_cmd->sense;
 	}
+
+	mtx_lock(&sc->sim_lock);
 	r1_cmd->ccb_ptr = NULL;
 	if (r1_cmd->callout_owner) {
 		callout_stop(&r1_cmd->cm_callout);
 		r1_cmd->callout_owner = false;
 	}
+	mtx_unlock(&sc->sim_lock);
 	mrsas_release_mpt_cmd(r1_cmd);
 	mrsas_map_mpt_cmd_status(cmd_mpt, cmd_mpt->ccb_ptr, status,
 	    extStatus, data_length, sense);

Santi

On 2020-06-09 11:13, Santiago Martinez wrote:

Trying right now, will let you know.

On 2020-06-09 11:07, Kashyap Desai wrote:

Hi Santi - Please try without Raid-1 VD. Most likely you will not observe
issue, but you can confirm from your end.

Kashyap

-Original Message-

Re: MRSAS Panic during Install.

2020-06-15 Thread Santiago Martinez
Hi guys, sorry for the radio silence. I did a fresh install today to
apply the patch, and by mistake I changed from UEFI to BIOS (legacy);
now I'm not able to replicate the panic any more (without applying the
patch). Do you think it is related? I mean, does it make sense to you?


I'm planning to reinstall tonight with UEFI to see if I can hit it again.

Santi



On 2020-06-12 06:30, Kashyap Desai wrote:


Screenshot of final error is good enough at least for now. Additional 
logs will be good if you can provide.


Kashyap

*From:* Santiago Martinez [mailto:s...@codenetworks.net]
*Sent:* Thursday, June 11, 2020 8:49 PM
*To:* Kashyap Desai; Don Lewis; Andriy Gapon
*Cc:* FreeBSD Current; Kashyap D. Desai; Kenneth D. Merry; Sumit Saxena;
Chandrakanth Patil [mailto:chandrakanth.pa...@broadcom.com]
*Subject:* Re: MRSAS Panic during Install.

Hi Everyone, I haven't forgotten about this yet :)

I performed the following tests in preparation to apply the patch:

 1. Installed 12.1 stable with one disc in raid0, no issues (did this
    so I can apply the patch and don't have to build a new bootable image).
 2. With 12.1, added a second raid, in this case RAID1, and it crashed.
    During boot it complained about a wrong/corrupt partition table.

Repeated the same test with current with exactly the same results (I
also hit another issue on mlx5en that crashed with current; will
follow up after we can sort this out).


Today I'm planning to apply the patch and see what happens.

I have some questions:

  * Do you want me to record the booting/error for each case and make
    it available?
  * Do you want access to the box?

Best regards.

Santiago

On 2020-06-09 22:43, Santiago Martinez wrote:

Hi there, apologies for the delayed response.

Regarding the lock reversal, I can try to capture the screen
showing the message.

The "Wil check go it goes" it was my brain trying to
multitask, obviously not in a successful way. What I meant to say
was "I will check how it goes. without the RAID".

Sure, I will test with the patch and let you know asap. hopefully
by tomorrow night(BST).

Cheers

Santi


On 2020-06-09 19:20, Kashyap Desai wrote:

-Original Message-
From: Santiago Martinez [mailto:s...@codenetworks.net]
Sent: Tuesday, June 9, 2020 11:27 PM
To: Kashyap Desai; Don Lewis; Andriy Gapon
Cc: FreeBSD Current; Kashyap D. Desai; Kenneth D. Merry; Sumit Saxena;
Chandrakanth Patil
Subject: Re: MRSAS Panic during Install.

Hi! so it works but i got a lock order reversal warning,
but it continue.

OK. So what is a warning ?


Wil check go it goes

Could not get your point. Can you elaborate ?


Also can you try Raid - 1 VD with below patch ?

diff --git a/mrsas.c b/mrsas.c
index 3d33073..60f4b4d 100755
--- a/mrsas.c
+++ b/mrsas.c
@@ -1744,11 +1744,14 @@ mrsas_complete_cmd(struct mrsas_softc *sc, u_int32_t MSIxIndex)
 		data_length = r1_cmd->io_request->DataLength;
 		sense = r1_cmd->sense;
 	}
+
+	mtx_lock(&sc->sim_lock);
 	r1_cmd->ccb_ptr = NULL;
 	if (r1_cmd->callout_owner) {
 		callout_stop(&r1_cmd->cm_callout);
 		r1_cmd->callout_owner = false;
 	}
+	mtx_unlock(&sc->sim_lock);
 	mrsas_release_mpt_cmd(r1_cmd);
 	mrsas_map_mpt_cmd_status(cmd_mpt, cmd_mpt->ccb_ptr, status,
 	    extStatus, data_length, sense);




Santi

On 2020-06-09 11:13, Santiago Martinez wrote:

Trying right now, will let you know.


On 2020-06-09 11:07, Kashyap Desai wrote:

Hi Santi - Please try without Raid-1 VD. Most
likely you will not
observe issue, but you can confirm from your end.

Kashyap


-Original Message-
From: Santiago Martinez
[mailto:s...@codenetworks.net]
Sent: Tuesday, 

Re: Panic on mlx5en.

2020-06-15 Thread Santiago Martinez

Ok , will sync now/compile and let you know. Santi


On 2020-06-15 15:05, Hans Petter Selasky wrote:

On 2020-06-15 15:49, Santiago Martinez wrote:
Hi Hans Petter,  At the moment I'm running r362037 but can reinstall, 
patch/rebuild as needed as is just a lab machine.


This revision is not good. There are two things you can try:

1) Try a kernel newer than r362139.

2) Copy sys/sys/tree.h from 12-stable and put it into the 13-current 
tree at the same location and re-build the kernel.


--HPS

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic on mlx5en.

2020-06-15 Thread Hans Petter Selasky

On 2020-06-15 15:49, Santiago Martinez wrote:
Hi Hans Petter,  At the moment I'm running r362037 but can reinstall, 
patch/rebuild as needed as is just a lab machine.


This revision is not good. There are two things you can try:

1) Try a kernel newer than r362139.

2) Copy sys/sys/tree.h from 12-stable and put it into the 13-current 
tree at the same location and re-build the kernel.


--HPS


Re: Panic on mlx5en.

2020-06-15 Thread Hans Petter Selasky

On 2020-06-15 15:49, Santiago Martinez wrote:
Hi Hans Petter,  At the moment I'm running r362037 but can reinstall, 
patch/rebuild as needed as is just a lab machine.


One more question: Did you check if the firmware is up-to-date on the 
card? Are you able to extract the mce.N.xxx. sysctl from 12.x ?


sysctl -a | grep fw

--HPS


Re: Panic on mlx5en.

2020-06-15 Thread Santiago Martinez
Hi Hans Petter,  At the moment I'm running r362037 but can reinstall, 
patch/rebuild as needed as is just a lab machine.


Santi

On 2020-06-15 14:45, Hans Petter Selasky wrote:


On 2020-06-15 15:31, Santiago Martinez wrote:

Hi Peter, yes Im using the latest one.



Can you tell me which version you are at?

--HPS



Re: Panic on mlx5en.

2020-06-15 Thread Hans Petter Selasky

On 2020-06-15 15:31, Santiago Martinez wrote:

Hi Peter, yes Im using the latest one.



Can you tell me which version you are at?

--HPS



Re: Panic on mlx5en.

2020-06-15 Thread Santiago Martinez

Hi Peter, yes Im using the latest one.

Santi

On 2020-06-15 12:26, Hans Petter Selasky wrote:

On 2020-06-15 11:12, Santiago Martinez wrote:
Hi everyone, while doing some tests for an MRSAS panic I hit another 
one on mlx5en.


The device is a Lenovo SR655 with 2x Mellanox NICs.

1 - Mellanox CX-4 Lx 10/25GbE SFP28 2-port OCP Ethernet Adapter
2 - Mellanox CX-4 Lx 10/25GbE SFP28 2-port PCIe Ethernet Adapter

This happens only with current; I tried different snapshots and saw
the same in all.

On 12.1 it works without a problem.

Trace (please note that it is OCR + manual corrections):

Tracing pid 2288 tid 182178 td @8xfeB385fe6500
kdb_enter() at kdb_enter+0x37/frame 0xfe0386030ba0
vpanic() at vpanic+0x19e/frame 0xfe0386030bf0
panic() at panic+0x43/frame 0xfe038630c50
trap_fatal() at trap_fatal+0x387/frame 0xfe0386030cb0
trap() at trap+0x80/frame 0xfe0386030dc0
calltrap() at calltrap+0x80/frame 0xfed386830dc0
--- trap 0x9, rip = 0xfff8275c060, rsp = 0xfe0386030e90, rbp = 0xfe0386030e90 ---
linux_root_RB_INSERT_COLOR() at linux_root_RB_INSERT_COLOR+0x40/frame 0xfe0386030f60
give_pages() at give_pages+0x163/frame 0xfe0386830f20
mlx5_satisfy_startup_pages() at mlx5_satisfy_startup_pages+0x76/frame 0xfe0386030f60
mlx5_load_one() at mlx5_load_one+0x6b7/frame 0xfe0386031080
init_one() at init_one+0x12d5/frame 0xfe03860310f0
linux_pci_attach_device() at linux_pci_attach_device+0x573/frame 0xfe0386031150
device_attach() at device_attach+0x3ca/frame 0xfe0386031190
device_probe_and_attach() at device_probe_and_attach+0x70/frame 0xfe03860311c0
pci_driver_added() at pci_driver_added+0xf6/frame 0xfe0386031200
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfe0386031240
devclass_add_driver() at devclass_add_driver+0x147/frame 0xfe0386031280
_linux_pci_register_driver() at _linux_pci_register_driver+0xc9/frame 0xfe03860312a0




Are you using the latest version of kernel & mlx5en as of today? There 
was a regression issue with the rbtree.h implementation which recently 
was fixed.


--HPS


Re: Panic on mlx5en.

2020-06-15 Thread Hans Petter Selasky

On 2020-06-15 11:12, Santiago Martinez wrote:
Hi everyone, while doing some tests for an MRSAS panic I hit another one 
on mlx5en.


The device is a Lenovo SR655 with 2x Mellanox NICs.

1 - Mellanox CX-4 Lx 10/25GbE SFP28 2-port OCP Ethernet Adapter
2 - Mellanox CX-4 Lx 10/25GbE SFP28 2-port PCIe Ethernet Adapter

This happens only with current; I tried different snapshots and saw
the same in all.

On 12.1 it works without a problem.

Trace (please note that it is OCR + manual corrections):

Tracing pid 2288 tid 182178 td @8xfeB385fe6500
kdb_enter() at kdb_enter+0x37/frame 0xfe0386030ba0
vpanic() at vpanic+0x19e/frame 0xfe0386030bf0
panic() at panic+0x43/frame 0xfe038630c50
trap_fatal() at trap_fatal+0x387/frame 0xfe0386030cb0
trap() at trap+0x80/frame 0xfe0386030dc0
calltrap() at calltrap+0x80/frame 0xfed386830dc0
--- trap 0x9, rip = 0xfff8275c060, rsp = 0xfe0386030e90, rbp = 0xfe0386030e90 ---
linux_root_RB_INSERT_COLOR() at linux_root_RB_INSERT_COLOR+0x40/frame 0xfe0386030f60
give_pages() at give_pages+0x163/frame 0xfe0386830f20
mlx5_satisfy_startup_pages() at mlx5_satisfy_startup_pages+0x76/frame 0xfe0386030f60
mlx5_load_one() at mlx5_load_one+0x6b7/frame 0xfe0386031080
init_one() at init_one+0x12d5/frame 0xfe03860310f0
linux_pci_attach_device() at linux_pci_attach_device+0x573/frame 0xfe0386031150
device_attach() at device_attach+0x3ca/frame 0xfe0386031190
device_probe_and_attach() at device_probe_and_attach+0x70/frame 0xfe03860311c0
pci_driver_added() at pci_driver_added+0xf6/frame 0xfe0386031200
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfe0386031240
devclass_add_driver() at devclass_add_driver+0x147/frame 0xfe0386031280
_linux_pci_register_driver() at _linux_pci_register_driver+0xc9/frame 0xfe03860312a0




Are you using the latest version of kernel & mlx5en as of today? There 
was a regression issue with the rbtree.h implementation which recently 
was fixed.


--HPS


Panic on mlx5en.

2020-06-15 Thread Santiago Martinez
Hi everyone, while doing some tests for an MRSAS panic I hit another one 
on mlx5en.


The device is a Lenovo SR655 with 2x Mellanox NICs.

1 - Mellanox CX-4 Lx 10/25GbE SFP28 2-port OCP Ethernet Adapter
2 - Mellanox CX-4 Lx 10/25GbE SFP28 2-port PCIe Ethernet Adapter

This happens only with current; I tried different snapshots and saw
the same in all.

On 12.1 it works without a problem.

Trace (please note that it is OCR + manual corrections):

Tracing pid 2288 tid 182178 td @8xfeB385fe6500
kdb_enter() at kdb_enter+0x37/frame 0xfe0386030ba0
vpanic() at vpanic+0x19e/frame 0xfe0386030bf0
panic() at panic+0x43/frame 0xfe038630c50
trap_fatal() at trap_fatal+0x387/frame 0xfe0386030cb0
trap() at trap+0x80/frame 0xfe0386030dc0
calltrap() at calltrap+0x80/frame 0xfed386830dc0
--- trap 0x9, rip = 0xfff8275c060, rsp = 0xfe0386030e90, rbp = 0xfe0386030e90 ---
linux_root_RB_INSERT_COLOR() at linux_root_RB_INSERT_COLOR+0x40/frame 0xfe0386030f60
give_pages() at give_pages+0x163/frame 0xfe0386830f20
mlx5_satisfy_startup_pages() at mlx5_satisfy_startup_pages+0x76/frame 0xfe0386030f60
mlx5_load_one() at mlx5_load_one+0x6b7/frame 0xfe0386031080
init_one() at init_one+0x12d5/frame 0xfe03860310f0
linux_pci_attach_device() at linux_pci_attach_device+0x573/frame 0xfe0386031150
device_attach() at device_attach+0x3ca/frame 0xfe0386031190
device_probe_and_attach() at device_probe_and_attach+0x70/frame 0xfe03860311c0
pci_driver_added() at pci_driver_added+0xf6/frame 0xfe0386031200
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfe0386031240
devclass_add_driver() at devclass_add_driver+0x147/frame 0xfe0386031280
_linux_pci_register_driver() at _linux_pci_register_driver+0xc9/frame 0xfe03860312a0





[2 WEEKS LEFT REMINDER] Call for 2020Q2 quarterly status reports

2020-06-15 Thread Lorenzo Salvadore
Dear FreeBSD Community,

The deadline for the next FreeBSD Quarterly Status update is
July 1st, 2020 for work done since the last round of Quarterly Reports:
April 2020 - June 2020.
I would like to remind you that reports are collected during the last
month of every quarter.

Status report submissions do not need to be very long.  They may be
about anything happening in the FreeBSD project and community, and
they provide a great way to inform FreeBSD users and developers about
work that is underway or has been completed. Report submissions are
not limited to committers; anyone doing anything interesting and
FreeBSD related can -- and should -- write one!

The preferred method is to follow the guidelines at the Quarterly
GitHub repository:

https://github.com/freebsd/freebsd-quarterly

Alternatively you can fetch the Markdown template, fill it in, and
email it to quarterly-submissi...@freebsd.org.
The template can be found at:

https://raw.githubusercontent.com/freebsd/freebsd-quarterly/master/report-sample.md

We look forward to seeing your 2020Q2 reports!

Thanks,

Lorenzo Salvadore (on behalf of quarterly@)
___
freebsd-quarterly-ca...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-quarterly-calls
To unsubscribe, send any mail to 
"freebsd-quarterly-calls-unsubscr...@freebsd.org"