Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-07-03 Thread Rick Macklem
Ryan Libby wrote:
>On Sun, Jun 28, 2020 at 9:57 PM Rick Macklem  wrote:
>>
>> Just in case you were waiting for another email, I have now run several
>> cycles of the kernel build over NFS on a recent head kernel with the
>> one line change and it has not hung.
>>
>> I don't know if this is the correct fix, but it would be nice to get something
>> into head to fix this.
>>
>> If I don't hear anything in the next few days, I'll put it in a PR so it
>> doesn't get forgotten.
>>
>> rick
>
>Thanks for the follow through on this.
>
>I think the patch is not complete.  It looks like the problem is that
>for systems that do not have UMA_MD_SMALL_ALLOC, we do
>uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
>but we haven't set an appropriate free function.  This is probably why
>UMA_ZONE_NOFREE was originally there.  When NOFREE was removed, it was
>appropriate for systems with uma_small_alloc.
>
>So by default we get page_free as our free function.  That calls
>kmem_free, which calls vmem_free ... but we do our allocs with
>vmem_xalloc.  I'm not positive, but I think the problem is that in
>effect we vmem_xalloc -> vmem_free, not vmem_xfree.
>
>Three possible fixes:
> 1: The one you tested, but this is not best for systems with
>uma_small_alloc.
> 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
> 3: Actually provide an appropriate vmem_bt_free function.
>
>I think we should just do option 2 with a comment, it's simple and it's
>what we used to do.  I'm not sure how much benefit we would see from
>option 3, but it's more work.
I set hw.physmem to 1Gbyte on my amd64 system (did not have the patch)
and ran 6 cycles of the kernel build over NFS without a hang, so I don't
think any fix is needed for systems that support UMA_MD_SMALL_ALLOC.

The trivial patch for option 2 is attached.
I didn't do a comment, since you understand this and can probably
describe it more correctly.

Thanks, rick

>Ryan

>
> 
> From: owner-freebsd-curr...@freebsd.org  
> on behalf of Rick Macklem 
> Sent: Thursday, June 18, 2020 11:42 PM
> To: Ryan Libby
> Cc: Konstantin Belousov; Jeff Roberson; freebsd-current@freebsd.org
> Subject: Re: r358252 causes intermittent hangs where processes are stuck 
> sleeping on btalloc
>
> Ryan Libby wrote:
> >On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
> >>
> >> Rick Macklem wrote:
> >> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
> >> >However, I just got a hang with r358097, but it looks rather different.
> >> >The r358097 hang did not have any processes sleeping on btalloc. They
> >> >appeared to be waiting on two different locks in the buffer cache.
> >> >As such, I think it might be a different problem. (I'll admit I should have
> >> >made notes about this one before rebooting, but I was frustrated that
> >> >it happened and rebooted before looking at it in much detail.)
> >> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> >> got a hang.
> >> --> It seems that r358097 is the culprit and r358098 makes it easier
> >>   to reproduce.
> >>   --> Basically runs out of kernel memory.
> >>
> >> It is not obvious if I can revert these two commits without reverting
> >> other ones, since there were a bunch of vm changes after these.
> >>
> >> I'll take a look, but if you guys have any ideas on how to fix this, please
> >> let me know.
> >>
> >> Thanks, rick
> >
> >Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
> >zone to see if that rescues it, on whatever base revision gets you a
> >reliable repro?
> Good catch! That seems to fix it. I've done 8 cycles of kernel build over
> NFS without a hang (normally I'd get one in the first 1-3 cycles).
>
> I don't know if the intent was to delete UMA_ZONE_VM and r358097
> had a typo in it and deleted UMA_ZONE_NOFREE or ???
>
> Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
> the hangs seem to have gone away.
>
> The small patch I did is attached, in case that isn't what you meant.
>
> I'll run a few more cycles just in case, but I think this fixes it.
>
> Thanks, rick
>
> >
> > Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> > (single core i386) with 1.25Gbytes ram when doing kernel builds using
> > head kernels from this wint

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-30 Thread Ryan Libby
On Sun, Jun 28, 2020 at 9:57 PM Rick Macklem  wrote:
>
> Just in case you were waiting for another email, I have now run several
> cycles of the kernel build over NFS on a recent head kernel with the
> one line change and it has not hung.
>
> I don't know if this is the correct fix, but it would be nice to get something
> into head to fix this.
>
> If I don't hear anything in the next few days, I'll put it in a PR so it
> doesn't get forgotten.
>
> rick

Thanks for the follow through on this.

I think the patch is not complete.  It looks like the problem is that
for systems that do not have UMA_MD_SMALL_ALLOC, we do
uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
but we haven't set an appropriate free function.  This is probably why
UMA_ZONE_NOFREE was originally there.  When NOFREE was removed, it was
appropriate for systems with uma_small_alloc.

So by default we get page_free as our free function.  That calls
kmem_free, which calls vmem_free ... but we do our allocs with
vmem_xalloc.  I'm not positive, but I think the problem is that in
effect we vmem_xalloc -> vmem_free, not vmem_xfree.

Three possible fixes:
 1: The one you tested, but this is not best for systems with
uma_small_alloc.
 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
 3: Actually provide an appropriate vmem_bt_free function.

I think we should just do option 2 with a comment, it's simple and it's
what we used to do.  I'm not sure how much benefit we would see from
option 3, but it's more work.
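
For reference, a minimal userland sketch of the flag selection that option 2
implies. This is not the actual patch. The SKETCH_* bit values and the
sketch_* function names are placeholders made up for illustration (the real
UMA_ZONE_VM and UMA_ZONE_NOFREE definitions live in the kernel's UMA headers),
and in the kernel the choice would presumably be made with an #ifdef on
UMA_MD_SMALL_ALLOC at the point where the btag zone is created.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits for illustration; NOT the kernel's real flag values. */
#define SKETCH_ZONE_VM		0x1u
#define SKETCH_ZONE_NOFREE	0x2u

/*
 * Choose creation flags for the boundary tag zone.  Without a
 * machine-dependent small allocator the zone uses a custom alloc
 * function with no matching free function, so its pages are never freed.
 */
unsigned int
sketch_bt_zone_flags(bool has_md_small_alloc)
{
	unsigned int flags = SKETCH_ZONE_VM;

	if (!has_md_small_alloc)
		flags |= SKETCH_ZONE_NOFREE;
	return (flags);
}

/* Compile-time form, the way a kernel #ifdef would pick the flags. */
unsigned int
sketch_bt_zone_flags_for_build(void)
{
#ifdef UMA_MD_SMALL_ALLOC
	return (sketch_bt_zone_flags(true));
#else
	return (sketch_bt_zone_flags(false));
#endif
}

/* Self-check covering both kinds of platform. */
void
sketch_selfcheck(void)
{
	assert(sketch_bt_zone_flags(false) ==
	    (SKETCH_ZONE_VM | SKETCH_ZONE_NOFREE));
	assert(sketch_bt_zone_flags(true) == SKETCH_ZONE_VM);
}
```

Calling sketch_selfcheck() from a small main() exercises both cases. Option 1
would correspond to setting the NOFREE bit unconditionally.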

Ryan

>
> 
> From: owner-freebsd-curr...@freebsd.org  
> on behalf of Rick Macklem 
> Sent: Thursday, June 18, 2020 11:42 PM
> To: Ryan Libby
> Cc: Konstantin Belousov; Jeff Roberson; freebsd-current@freebsd.org
> Subject: Re: r358252 causes intermittent hangs where processes are stuck 
> sleeping on btalloc
>
> Ryan Libby wrote:
> >On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
> >>
> >> Rick Macklem wrote:
> >> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
> >> >However, I just got a hang with r358097, but it looks rather different.
> >> >The r358097 hang did not have any processes sleeping on btalloc. They
> >> >appeared to be waiting on two different locks in the buffer cache.
> >> >As such, I think it might be a different problem. (I'll admit I should have
> >> >made notes about this one before rebooting, but I was frustrated that
> >> >it happened and rebooted before looking at it in much detail.)
> >> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> >> got a hang.
> >> --> It seems that r358097 is the culprit and r358098 makes it easier
> >>   to reproduce.
> >>   --> Basically runs out of kernel memory.
> >>
> >> It is not obvious if I can revert these two commits without reverting
> >> other ones, since there were a bunch of vm changes after these.
> >>
> >> I'll take a look, but if you guys have any ideas on how to fix this, please
> >> let me know.
> >>
> >> Thanks, rick
> >
> >Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
> >zone to see if that rescues it, on whatever base revision gets you a
> >reliable repro?
> Good catch! That seems to fix it. I've done 8 cycles of kernel build over
> NFS without a hang (normally I'd get one in the first 1-3 cycles).
>
> I don't know if the intent was to delete UMA_ZONE_VM and r358097
> had a typo in it and deleted UMA_ZONE_NOFREE or ???
>
> Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
> the hangs seem to have gone away.
>
> The small patch I did is attached, in case that isn't what you meant.
>
> I'll run a few more cycles just in case, but I think this fixes it.
>
> Thanks, rick
>
> >
> > Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> > (single core i386) with 1.25Gbytes ram when doing kernel builds using
> > head kernels from this winter. (I also saw one when doing a kernel build
> > on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> > After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> > and several processes holding the following lock:
> > exclusive sx lock @ vm/vm_map.c:4761
> > - I have seen hangs where that is the only lock held by any process except
> >the interrupt thread.
> > - I have also seen processes waiting on the followi

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-28 Thread Rick Macklem
Just in case you were waiting for another email, I have now run several
cycles of the kernel build over NFS on a recent head kernel with the
one line change and it has not hung.

I don't know if this is the correct fix, but it would be nice to get something
into head to fix this.

If I don't hear anything in the next few days, I'll put it in a PR so it
doesn't get forgotten.

rick


From: owner-freebsd-curr...@freebsd.org  on 
behalf of Rick Macklem 
Sent: Thursday, June 18, 2020 11:42 PM
To: Ryan Libby
Cc: Konstantin Belousov; Jeff Roberson; freebsd-current@freebsd.org
Subject: Re: r358252 causes intermittent hangs where processes are stuck 
sleeping on btalloc

Ryan Libby wrote:
>On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
>>
>> Rick Macklem wrote:
>> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>> >However, I just got a hang with r358097, but it looks rather different.
>> >The r358097 hang did not have any processes sleeping on btalloc. They
>> >appeared to be waiting on two different locks in the buffer cache.
>> >As such, I think it might be a different problem. (I'll admit I should have
>> >made notes about this one before rebooting, but I was frustrated that
>> >it happened and rebooted before looking at it in much detail.)
>> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
>> got a hang.
>> --> It seems that r358097 is the culprit and r358098 makes it easier
>>   to reproduce.
>>   --> Basically runs out of kernel memory.
>>
>> It is not obvious if I can revert these two commits without reverting
>> other ones, since there were a bunch of vm changes after these.
>>
>> I'll take a look, but if you guys have any ideas on how to fix this, please
>> let me know.
>>
>> Thanks, rick
>
>Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
>zone to see if that rescues it, on whatever base revision gets you a
>reliable repro?
Good catch! That seems to fix it. I've done 8 cycles of kernel build over
NFS without a hang (normally I'd get one in the first 1-3 cycles).

I don't know if the intent was to delete UMA_ZONE_VM and r358097
had a typo in it and deleted UMA_ZONE_NOFREE or ???

Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
the hangs seem to have gone away.

The small patch I did is attached, in case that isn't what you meant.

I'll run a few more cycles just in case, but I think this fixes it.

Thanks, rick

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes ram when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
> exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>the interrupt thread.
> - I have also seen processes waiting on the following locks:
> kern/subr_vmem.c:1343
> kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
> occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
> doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "db" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (ie. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
>
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang to-day running a
> r358251 kernel.
>
> I haven't had much luck sofar, 

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-18 Thread Rick Macklem
Ryan Libby wrote:
>On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
>>
>> Rick Macklem wrote:
>> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>> >However, I just got a hang with r358097, but it looks rather different.
>> >The r358097 hang did not have any processes sleeping on btalloc. They
>> >appeared to be waiting on two different locks in the buffer cache.
>> >As such, I think it might be a different problem. (I'll admit I should have
>> >made notes about this one before rebooting, but I was frustrated that
>> >it happened and rebooted before looking at it in much detail.)
>> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
>> got a hang.
>> --> It seems that r358097 is the culprit and r358098 makes it easier
>>   to reproduce.
>>   --> Basically runs out of kernel memory.
>>
>> It is not obvious if I can revert these two commits without reverting
>> other ones, since there were a bunch of vm changes after these.
>>
>> I'll take a look, but if you guys have any ideas on how to fix this, please
>> let me know.
>>
>> Thanks, rick
>
>Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
>zone to see if that rescues it, on whatever base revision gets you a
>reliable repro?
Good catch! That seems to fix it. I've done 8 cycles of kernel build over
NFS without a hang (normally I'd get one in the first 1-3 cycles).

I don't know if the intent was to delete UMA_ZONE_VM and r358097
had a typo in it and deleted UMA_ZONE_NOFREE or ???

Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
the hangs seem to have gone away.

The small patch I did is attached, in case that isn't what you meant.

I'll run a few more cycles just in case, but I think this fixes it.

Thanks, rick

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes ram when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
> exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>the interrupt thread.
> - I have also seen processes waiting on the following locks:
> kern/subr_vmem.c:1343
> kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
> occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
> doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "db" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (ie. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
>
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang to-day running a
> r358251 kernel.
>
> I haven't had much luck so far, but see below for what I have learned.
>
> >> >> >
> >> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> >> > If I revert to r358251, I cannot reproduce this.
> As above, this is no longer true.
>
> >> >> >
> >> >> > Any ideas?
> >> >> >
> >> >> > I can easily test any change you might suggest to see if it fixes the
> >> >> > problem.
> >> >> >
> >> >> > If you want more debug info, let me know, since I can easily
> >> >> > reproduce it.
> >> >> >
> >> >> > Thanks, rick
> >> >>
> >> >> Nothing obvious to me.  I can maybe try a repro on a VM...
> >> >>
> >> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >> >>
> >> >> "btalloc" is "We're either out of address space or lost a fill race."
> From what I see, I think it is "out of address space".
> For one of the hangs, when I did "show alllocks", everything except the
> intr thread, was waiting for the
> exclusive sx lock @ vm/vm_map.c:4761
>
> >> >
> >> >Yes, I would be 

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-15 Thread Ryan Libby
On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
>
> Rick Macklem wrote:
> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
> >However, I just got a hang with r358097, but it looks rather different.
> >The r358097 hang did not have any processes sleeping on btalloc. They
> >appeared to be waiting on two different locks in the buffer cache.
> >As such, I think it might be a different problem. (I'll admit I should have
> >made notes about this one before rebooting, but I was frustrated that
> >it happened and rebooted before looking at it in much detail.)
> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> got a hang.
> --> It seems that r358097 is the culprit and r358098 makes it easier
>   to reproduce.
>   --> Basically runs out of kernel memory.
>
> It is not obvious if I can revert these two commits without reverting
> other ones, since there were a bunch of vm changes after these.
>
> I'll take a look, but if you guys have any ideas on how to fix this, please
> let me know.
>
> Thanks, rick

Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
zone to see if that rescues it, on whatever base revision gets you a
reliable repro?

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes ram when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
> exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>the interrupt thread.
> - I have also seen processes waiting on the following locks:
> kern/subr_vmem.c:1343
> kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
> occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
> doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "db" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (ie. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
>
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang to-day running a
> r358251 kernel.
>
> I haven't had much luck so far, but see below for what I have learned.
>
> >> >> >
> >> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> >> > If I revert to r358251, I cannot reproduce this.
> As above, this is no longer true.
>
> >> >> >
> >> >> > Any ideas?
> >> >> >
> >> >> > I can easily test any change you might suggest to see if it fixes the
> >> >> > problem.
> >> >> >
> >> >> > If you want more debug info, let me know, since I can easily
> >> >> > reproduce it.
> >> >> >
> >> >> > Thanks, rick
> >> >>
> >> >> Nothing obvious to me.  I can maybe try a repro on a VM...
> >> >>
> >> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >> >>
> >> >> "btalloc" is "We're either out of address space or lost a fill race."
> From what I see, I think it is "out of address space".
> For one of the hangs, when I did "show alllocks", everything except the
> intr thread, was waiting for the
> exclusive sx lock @ vm/vm_map.c:4761
>
> >> >
> >> >Yes, I would be not surprised to be out of something on 1G i386 machine.
> >> >Please also add 'show alllocks'.
> >> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
> Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
> learned. (The time it takes to reproduce one of these varies greatly, but I usually
> get one within 3 cycles of a full kernel build over NFS. I have had it happen
> once when doing a kernel build over UFS.)
>
> >> This time, none of the processes are stuck on "btalloc".
> > I'll try and 

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-15 Thread Rick Macklem
Rick Macklem wrote:
>r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>However, I just got a hang with r358097, but it looks rather different.
>The r358097 hang did not have any processes sleeping on btalloc. They
>appeared to be waiting on two different locks in the buffer cache.
>As such, I think it might be a different problem. (I'll admit I should have
>made notes about this one before rebooting, but I was frustrated that
>it happened and rebooted before looking at it in much detail.)
Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
got a hang.
--> It seems that r358097 is the culprit and r358098 makes it easier
  to reproduce.
  --> Basically runs out of kernel memory.

It is not obvious if I can revert these two commits without reverting
other ones, since there were a bunch of vm changes after these.

I'll take a look, but if you guys have any ideas on how to fix this, please
let me know.

Thanks, rick

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
   the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
doesn't guarantee it isn't broken.)

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "db" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (ie. If you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)

rick



Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang to-day running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread, was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I usually
get one within 3 cycles of a full kernel build over NFS. I have had it happen
once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c: 3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ 

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-09 Thread Rick Macklem
Hope you don't mind the top post, but since this is now an update and somewhat
different, I don't think it makes sense to embed this in the message below.

r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
However, I just got a hang with r358097, but it looks rather different.
The r358097 hang did not have any processes sleeping on btalloc. They
appeared to be waiting on two different locks in the buffer cache.
As such, I think it might be a different problem. (I'll admit I should have
made notes about this one before rebooting, but I was frustrated that
it happened and rebooted before looking at it in much detail.)

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
   the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
doesn't guarantee it isn't broken.)

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "db" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (ie. If you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)

rick



Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang to-day running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
>From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread, was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I 
usually
get one within 3 cycles of a full kernel build over NFS. I have had it happen
once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by 
> hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c: 3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ 
> kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (108)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
>
> > ps
> - Not going to list them all, but here are the ones 

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-26 Thread Rick Macklem
Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running an
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread, was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I
usually get one within 3 cycles of a full kernel build over NFS. I have had it
happen once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (108)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
>
> > ps
> - Not going to list them all, but here are the ones that seem interesting...
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> 16 0 0 0 DL (threaded)   [bufdaemon]
> 100069  D  qsleep  [bufdaemon]
> 100074  D  -   [bufspacedaemon-0]
> 100084  D  sdflush  0x11923284 [/ worker]
> - and more of these for the other UFS file systems
> 9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
> 8 0 0 0   DL (threaded)   [pagedaemon]
> 100067  D   psleep 0x1e2e95c   [dom0]
> 100072  D   launds 0x1e2e968   [laundry: dom0]
> 100073  D   umarcl 0x12cc720   [uma]
> … a bunch of usb and cam ones
> 100025  D   -   0x1b2ee40  [doneq0]
> …
> 12 0 0 0 RL  (threaded)   [intr]
> 17  I [swi6: task queue]
> 18  Run   CPU 0   [swi6: Giant taskq]
> …
> 10  D   swapin 0x1d96dfc[swapper]
> - and a bunch more in D state.
> Does this mean the swapper was trying to swap in?
>
> > acttrace
> - just shows the keyboard
> kdb_enter() at kdb_enter+0x35/frame
> vt_kbdevent() at vt_kbdevent+0x329/frame
> kbdmux_intr() at kbdmux_intr+0x19/frame
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> taskqueue_run() at taskqueue_run+0x44/frame
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> ithread_loop() at ithread_loop+0x237/frame
> fork_exit() at fork_exit+0x6c/frame
> fork_trampoline() at 0x../frame
>
> > show all vmem
> vmem 0x.. 'transient arena'
>   quantum:   4096
>   size:      23592960
>   inuse:     0
>   free:      23592960
>   busy tags: 0
>   free tags: 2
>             inuse  size      free  size
>   16777216  0      0         1     23592960
> vmem 0x.. 'buffer arena'
>   quantum:   4096
>   size:      94683136
>   inuse:     94502912
>   free:      180224
>   busy tags: 1463
>   free tags: 3
>             inuse  size      free  size
>   16384     2      32768     1     16384
>   32768     39     1277952   1     32768
>   65536     1422

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-23 Thread Konstantin Belousov
On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >> >
> >> > Hi,
> >> >
> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> > to bisect this, but r358252 seems to be the culprit.
> >> >
> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> > If I revert to r358251, I cannot reproduce this.
> >> >
> >> > Any ideas?
> >> >
> >> > I can easily test any change you might suggest to see if it fixes the
> >> > problem.
> >> >
> >> > If you want more debug info, let me know, since I can easily
> >> > reproduce it.
> >> >
> >> > Thanks, rick
> >>
> >> Nothing obvious to me.  I can maybe try a repro on a VM...
> >>
> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >>
> >> "btalloc" is "We're either out of address space or lost a fill race."
> >
> >Yes, I would be not surprised to be out of something on 1G i386 machine.
> >Please also add 'show alllocks'.
> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (108)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
> 
> > ps
> - Not going to list them all, but here are the ones that seem interesting...
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> 16 0 0 0 DL (threaded)   [bufdaemon]
> 100069  D  qsleep  [bufdaemon]
> 100074  D  -   [bufspacedaemon-0]
> 100084  D  sdflush  0x11923284 [/ worker]
> - and more of these for the other UFS file systems
> 9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
> 8 0 0 0   DL (threaded)   [pagedaemon]
> 100067  D   psleep 0x1e2e95c   [dom0]
> 100072  D   launds 0x1e2e968   [laundry: dom0]
> 100073  D   umarcl 0x12cc720   [uma]
> … a bunch of usb and cam ones
> 100025  D   -   0x1b2ee40  [doneq0]
> …
> 12 0 0 0 RL  (threaded)   [intr]
> 17  I [swi6: task queue]
> 18  Run   CPU 0   [swi6: Giant taskq]
> …
> 10  D   swapin 0x1d96dfc[swapper]
> - and a bunch more in D state.
> Does this mean the swapper was trying to swap in?
> 
> > acttrace
> - just shows the keyboard
> kdb_enter() at kdb_enter+0x35/frame
> vt_kbdevent() at vt_kbdevent+0x329/frame
> kbdmux_intr() at kbdmux_intr+0x19/frame
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> taskqueue_run() at taskqueue_run+0x44/frame
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> ithread_loop() at ithread_loop+0x237/frame
> fork_exit() at fork_exit+0x6c/frame
> fork_trampoline() at 0x../frame
> 
> > show all vmem
> vmem 0x.. 'transient arena'
>   quantum:   4096
>   size:      23592960
>   inuse:     0
>   free:      23592960
>   busy tags: 0
>   free tags: 2
>             inuse  size      free  size
>   16777216  0      0         1     23592960
> vmem 0x.. 'buffer arena'
>   quantum:   4096
>   size:      94683136
>   inuse:     94502912
>   free:      180224
>   busy tags: 1463
>   free tags: 3
>             inuse  size      free  size
>   16384     2      32768     1     16384
>   32768     39     1277952   1     32768
>   65536     1422   93192192  0     0
>   131072    0      0         1     131072
> vmem 0x.. 'i386trampoline'
>   quantum:   1
>   size:      24576
>   inuse:     20860
>   free:      3716
>   busy tags: 9
>   free tags: 3
>             inuse  size      free  size
>   32        1      48        1     52
>   64        2      208       0     0
>   128       2      280       0     0
>   2048      1      2048      1     3664
>   4096      2      8192      0     0
>   8192      1      10084     0     0
> vmem 0x.. 'kernel rwx arena'
>  

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-22 Thread Rick Macklem
Konstantin Belousov wrote:
>On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >
>> > Hi,
>> >
>> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> > to bisect this, but r358252 seems to be the culprit.
>> >
>> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> > If I revert to r358251, I cannot reproduce this.
>> >
>> > Any ideas?
>> >
>> > I can easily test any change you might suggest to see if it fixes the
>> > problem.
>> >
>> > If you want more debug info, let me know, since I can easily
>> > reproduce it.
>> >
>> > Thanks, rick
>>
>> Nothing obvious to me.  I can maybe try a repro on a VM...
>>
>> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>>
>> "btalloc" is "We're either out of address space or lost a fill race."
>
>Yes, I would be not surprised to be out of something on 1G i386 machine.
>Please also add 'show alllocks'.
Ok, I used an up to date head kernel and it took longer to reproduce a hang.
This time, none of the processes are stuck on "btalloc".
I'll try and give you most of the above, but since I have to type it in by hand
from the screen, I might not get it all. (I'm no real typist;-)
> show alllocks
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
Process 12 (intr) thread 0x.. (108)
exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152

> ps
- Not going to list them all, but here are the ones that seem interesting...
18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
16 0 0 0 DL (threaded)   [bufdaemon]
100069  D  qsleep  [bufdaemon]
100074  D  -   [bufspacedaemon-0]
100084  D  sdflush  0x11923284 [/ worker]
- and more of these for the other UFS file systems
9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
8 0 0 0   DL (threaded)   [pagedaemon]
100067  D   psleep 0x1e2e95c   [dom0]
100072  D   launds 0x1e2e968   [laundry: dom0]
100073  D   umarcl 0x12cc720   [uma]
… a bunch of usb and cam ones
100025  D   -   0x1b2ee40  [doneq0]
…
12 0 0 0 RL  (threaded)   [intr]
17  I [swi6: task queue]
18  Run   CPU 0   [swi6: Giant taskq]
…
10  D   swapin 0x1d96dfc[swapper]
- and a bunch more in D state.
Does this mean the swapper was trying to swap in?

> acttrace
- just shows the keyboard
kdb_enter() at kdb_enter+0x35/frame
vt_kbdevent() at vt_kbdevent+0x329/frame
kbdmux_intr() at kbdmux_intr+0x19/frame
taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
taskqueue_run() at taskqueue_run+0x44/frame
taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
ithread_loop() at ithread_loop+0x237/frame
fork_exit() at fork_exit+0x6c/frame
fork_trampoline() at 0x../frame

> show all vmem
vmem 0x.. 'transient arena'
  quantum:   4096
  size:      23592960
  inuse:     0
  free:      23592960
  busy tags: 0
  free tags: 2
            inuse  size      free  size
  16777216  0      0         1     23592960
vmem 0x.. 'buffer arena'
  quantum:   4096
  size:      94683136
  inuse:     94502912
  free:      180224
  busy tags: 1463
  free tags: 3
            inuse  size      free  size
  16384     2      32768     1     16384
  32768     39     1277952   1     32768
  65536     1422   93192192  0     0
  131072    0      0         1     131072
vmem 0x.. 'i386trampoline'
  quantum:   1
  size:      24576
  inuse:     20860
  free:      3716
  busy tags: 9
  free tags: 3
            inuse  size      free  size
  32        1      48        1     52
  64        2      208       0     0
  128       2      280       0     0
  2048      1      2048      1     3664
  4096      2      8192      0     0
  8192      1      10084     0     0
vmem 0x.. 'kernel rwx arena'
  quantum:   4096
  size:      0
  inuse:     0
  free:      0
  busy tags: 0
  free tags: 0
vmem 0x.. 'kernel area dom'
  quantum:   4096
  size:      56623104
  inuse:     56582144
  free:      40960
  busy tags: 11224
  free tags: 3
            inuse  size      free  size
  4096      11091  45428736  0
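
As a rough cross-check of the "out of address space" reading, the arena totals
from the "show all vmem" output can be run through a few lines of Python. This
is only a sketch: the numbers are the hand-transcribed ones from this message,
and the ARENAS layout and the utilization() and report() helpers are made up
for illustration; they are not part of the kernel or of ddb.

```python
# Arena totals as transcribed by hand from "show all vmem" (bytes).
# The dictionary layout and the helpers below are illustrative only.
ARENAS = {
    "transient arena": {"size": 23592960, "inuse": 0, "free": 23592960},
    "buffer arena": {"size": 94683136, "inuse": 94502912, "free": 180224},
    "i386trampoline": {"size": 24576, "inuse": 20860, "free": 3716},
    "kernel area dom": {"size": 56623104, "inuse": 56582144, "free": 40960},
}


def utilization(arena):
    """Return the in-use fraction of an arena (0.0 for a zero-sized arena)."""
    if arena["size"] == 0:
        return 0.0
    return arena["inuse"] / arena["size"]


def report(arenas):
    """Build one summary line per arena, flagging transcription mismatches."""
    lines = []
    for name, a in arenas.items():
        # inuse + free should add up to size if the numbers were typed correctly.
        consistent = a["inuse"] + a["free"] == a["size"]
        lines.append(
            f"{name:18s} {utilization(a) * 100:6.2f}% in use, "
            f"{a['free']} bytes free"
            + ("" if consistent else "  (inuse + free != size)")
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(ARENAS)))
```

With the transcribed numbers, the buffer arena and 'kernel area dom' both come
out at over 99% in use (180224 and 40960 bytes free respectively), which at
least fits the "out of address space" explanation for the btalloc sleeps.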

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-21 Thread Konstantin Belousov
On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >
> > Hi,
> >
> > Since I hadn't upgraded a kernel through the winter, it took me a while
> > to bisect this, but r358252 seems to be the culprit.
> >
> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> > If I revert to r358251, I cannot reproduce this.
> >
> > Any ideas?
> >
> > I can easily test any change you might suggest to see if it fixes the
> > problem.
> >
> > If you want more debug info, let me know, since I can easily
> > reproduce it.
> >
> > Thanks, rick
> 
> Nothing obvious to me.  I can maybe try a repro on a VM...
> 
> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> 
> "btalloc" is "We're either out of address space or lost a fill race."

Yes, I would be not surprised to be out of something on 1G i386 machine.
Please also add 'show alllocks'.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-21 Thread Ryan Libby
On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>
> Hi,
>
> Since I hadn't upgraded a kernel through the winter, it took me a while
> to bisect this, but r358252 seems to be the culprit.
>
> If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> 1.25Gbytes RAM, i386), about every second attempt will hang.
> When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> If I revert to r358251, I cannot reproduce this.
>
> Any ideas?
>
> I can easily test any change you might suggest to see if it fixes the
> problem.
>
> If you want more debug info, let me know, since I can easily
> reproduce it.
>
> Thanks, rick

Nothing obvious to me.  I can maybe try a repro on a VM...

ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.

"btalloc" is "We're either out of address space or lost a fill race."


r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-20 Thread Rick Macklem
Hi,

Since I hadn't upgraded a kernel through the winter, it took me a while
to bisect this, but r358252 seems to be the culprit.

If I do a kernel build over NFS using my not so big Pentium 4 (single core,
1.25Gbytes RAM, i386), about every second attempt will hang.
When I do a "ps" in the debugger, I see processes sleeping on btalloc.
If I revert to r358251, I cannot reproduce this.

Any ideas?

I can easily test any change you might suggest to see if it fixes the
problem.

If you want more debug info, let me know, since I can easily
reproduce it.

Thanks, rick