Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Ryan Libby wrote:
>On Sun, Jun 28, 2020 at 9:57 PM Rick Macklem wrote:
>>
>> Just in case you were waiting for another email, I have now run several
>> cycles of the kernel build over NFS on a recent head kernel with the
>> one line change and it has not hung.
>>
>> I don't know if this is the correct fix, but it would be nice to get
>> something into head to fix this.
>>
>> If I don't hear anything in the next few days, I'll put it in a PR so it
>> doesn't get forgotten.
>>
>> rick
>
>Thanks for the follow through on this.
>
>I think the patch is not complete. It looks like the problem is that
>for systems that do not have UMA_MD_SMALL_ALLOC, we do
>    uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
>but we haven't set an appropriate free function. This is probably why
>UMA_ZONE_NOFREE was originally there. When NOFREE was removed, it was
>appropriate for systems with uma_small_alloc.
>
>So by default we get page_free as our free function. That calls
>kmem_free, which calls vmem_free ... but we do our allocs with
>vmem_xalloc. I'm not positive, but I think the problem is that in
>effect we vmem_xalloc -> vmem_free, not vmem_xfree.
>
>Three possible fixes:
> 1: The one you tested, but this is not best for systems with
>    uma_small_alloc.
> 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
> 3: Actually provide an appropriate vmem_bt_free function.
>
>I think we should just do option 2 with a comment, it's simple and it's
>what we used to do. I'm not sure how much benefit we would see from
>option 3, but it's more work.
I set hw.physmem to 1Gbyte on my amd64 system (did not have the patch) and
ran 6 cycles of the kernel build over NFS without a hang, so I don't think
any fix is needed for systems that support UMA_MD_SMALL_ALLOC.

The trivial patch for option 2 is attached. I didn't do a comment, since you
understand this and can probably describe it more correctly.
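The attached patch is not reproduced in this archive. Purely as an illustration of the idea behind option 2, the following is a minimal user-space sketch, not the committed change. The flag values are placeholders (the real definitions live in the kernel's UMA headers and may differ), and bt_zone_flags() and bt_zone_can_free() are hypothetical helpers invented for this example.

```c
/*
 * Placeholder flag values for illustration only; these are NOT
 * the kernel's real definitions.
 */
#define UMA_ZONE_VM     0x1
#define UMA_ZONE_NOFREE 0x2

/*
 * Hypothetical helper: choose the flags for the vmem boundary tag zone.
 * Compile with -DUMA_MD_SMALL_ALLOC to model a platform that provides
 * uma_small_alloc.
 */
static int
bt_zone_flags(void)
{
	int flags = UMA_ZONE_VM;

#ifndef UMA_MD_SMALL_ALLOC
	/*
	 * Per the analysis quoted above, these platforms install a custom
	 * alloc function (vmem_bt_alloc) without a matching free function,
	 * so the zone is marked as never returning its pages.
	 */
	flags |= UMA_ZONE_NOFREE;
#endif
	return (flags);
}

/* Hypothetical helper: nonzero if the zone may release pages. */
static int
bt_zone_can_free(void)
{
	return ((bt_zone_flags() & UMA_ZONE_NOFREE) == 0);
}
```

Built without -DUMA_MD_SMALL_ALLOC, the sketch keeps NOFREE set, which corresponds to the configuration that stopped the hangs on the i386 test machine. Built with it, NOFREE is left off, matching the observation that the amd64 system did not need the change.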
Thanks, rick
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
On Sun, Jun 28, 2020 at 9:57 PM Rick Macklem wrote:
>
> Just in case you were waiting for another email, I have now run several
> cycles of the kernel build over NFS on a recent head kernel with the
> one line change and it has not hung.
>
> I don't know if this is the correct fix, but it would be nice to get
> something into head to fix this.
>
> If I don't hear anything in the next few days, I'll put it in a PR so it
> doesn't get forgotten.
>
> rick

Thanks for the follow through on this.

I think the patch is not complete. It looks like the problem is that
for systems that do not have UMA_MD_SMALL_ALLOC, we do
    uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
but we haven't set an appropriate free function. This is probably why
UMA_ZONE_NOFREE was originally there. When NOFREE was removed, it was
appropriate for systems with uma_small_alloc.

So by default we get page_free as our free function. That calls
kmem_free, which calls vmem_free ... but we do our allocs with
vmem_xalloc. I'm not positive, but I think the problem is that in
effect we vmem_xalloc -> vmem_free, not vmem_xfree.

Three possible fixes:
 1: The one you tested, but this is not best for systems with
    uma_small_alloc.
 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
 3: Actually provide an appropriate vmem_bt_free function.

I think we should just do option 2 with a comment; it's simple and it's
what we used to do. I'm not sure how much benefit we would see from
option 3, but it's more work.

Ryan
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Just in case you were waiting for another email, I have now run several
cycles of the kernel build over NFS on a recent head kernel with the
one line change and it has not hung.

I don't know if this is the correct fix, but it would be nice to get
something into head to fix this.

If I don't hear anything in the next few days, I'll put it in a PR so it
doesn't get forgotten.

rick
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Ryan Libby wrote:
>On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem wrote:
>>
>> Rick Macklem wrote:
>> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>> >However, I just got a hang with r358097, but it looks rather different.
>> >The r358097 hang did not have any processes sleeping on btalloc. They
>> >appeared to be waiting on two different locks in the buffer cache.
>> >As such, I think it might be a different problem. (I'll admit I should have
>> >made notes about this one before rebooting, but I was frustrated that
>> >it happened and rebooted before looking at it in much detail.)
>> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
>> got a hang.
>> --> It seems that r358097 is the culprit and r358098 makes it easier
>>     to reproduce.
>> --> Basically runs out of kernel memory.
>>
>> It is not obvious if I can revert these two commits without reverting
>> other ones, since there were a bunch of vm changes after these.
>>
>> I'll take a look, but if you guys have any ideas on how to fix this, please
>> let me know.
>>
>> Thanks, rick
>
>Interesting. Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
>zone to see if that rescues it, on whatever base revision gets you a
>reliable repro?
Good catch! That seems to fix it. I've done 8 cycles of kernel build over
NFS without a hang (normally I'd get one in the first 1-3 cycles).

I don't know if the intent was to delete UMA_ZONE_VM and r358097
had a typo in it and deleted UMA_ZONE_NOFREE or ???

Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
the hangs seem to have gone away.

The small patch I did is attached, in case that isn't what you meant.

I'll run a few more cycles just in case, but I think this fixes it.
Thanks, rick
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem wrote:
>
> Rick Macklem wrote:
> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >I thought this was the culprit, since I did 6 cycles of r358097 without a
> >hang.
> >However, I just got a hang with r358097, but it looks rather different.
> >The r358097 hang did not have any processes sleeping on btalloc. They
> >appeared to be waiting on two different locks in the buffer cache.
> >As such, I think it might be a different problem. (I'll admit I should have
> >made notes about this one before rebooting, but I was frustrated that
> >it happened and rebooted before looking at it in much detail.)
> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> got a hang.
> --> It seems that r358097 is the culprit and r358098 makes it easier
>     to reproduce.
> --> Basically runs out of kernel memory.
>
> It is not obvious if I can revert these two commits without reverting
> other ones, since there were a bunch of vm changes after these.
>
> I'll take a look, but if you guys have any ideas on how to fix this, please
> let me know.
>
> Thanks, rick

Interesting. Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
zone to see if that rescues it, on whatever base revision gets you a
reliable repro?
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Rick Macklem wrote:
>r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>However, I just got a hang with r358097, but it looks rather different.
>The r358097 hang did not have any processes sleeping on btalloc. They
>appeared to be waiting on two different locks in the buffer cache.
>As such, I think it might be a different problem. (I'll admit I should have
>made notes about this one before rebooting, but I was frustrated that
>it happened and rebooted before looking at it in much detail.)
Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
got a hang.
--> It seems that r358097 is the culprit and r358098 makes it easier
    to reproduce.
--> Basically runs out of kernel memory.

It is not obvious if I can revert these two commits without reverting
other ones, since there were a bunch of vm changes after these.

I'll take a look, but if you guys have any ideas on how to fix this, please
let me know.

Thanks, rick
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Hope you don't mind the top post, but since this is now an update and
somewhat different, I don't think it makes sense to embed this in the
message below.

r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
However, I just got a hang with r358097, but it looks rather different.
The r358097 hang did not have any processes sleeping on btalloc. They
appeared to be waiting on two different locks in the buffer cache.
As such, I think it might be a different problem. (I'll admit I should have
made notes about this one before rebooting, but I was frustrated that
it happened and rebooted before looking at it in much detail.)

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
  the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what
hangs occur. (I've done 8 cycles of testing of r356776 without difficulties,
but that doesn't guarantee it isn't broken.)

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "db" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (ie. If you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)

rick

Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single
>> >> > core, 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me. I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would not be surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned.
(The time it takes to reproduce one of these varies greatly, but I usually get one within 3 cycles of a full kernel build over NFS. I have had it happen once when doing a kernel build over UFS.) >> This time, none of the processes are stuck on "btalloc". > I'll try and give you most of the above, but since I have to type it in by > hand > from the screen, I might not get it all. (I'm no real typist;-) > > show alllocks > exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c: 3259 > exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737 > exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ > kern/subr_vmem.c:1343 > exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663 > exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930 > exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474 > Process 12 (intr) thread 0x.. (108) > exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152 > > > ps > - Not going to list them all, but here are the ones t
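A rough way to judge how much a run of clean cycles proves, sketched in Python. It assumes each build cycle hangs independently with a fixed probability p_hang, which is a simplification and not something measured in this thread. The function name and the sample probabilities (0.5 and 1/3, loosely read from "in 1-3 cycles") are illustrative only.

```python
def false_good_probability(p_hang: float, clean_cycles: int) -> float:
    """Chance that a kernel which hangs with probability p_hang per build
    cycle still gets through clean_cycles consecutive cycles without a hang.

    Assumption: cycles are independent and p_hang is constant.
    """
    if not 0.0 <= p_hang <= 1.0:
        raise ValueError("p_hang must be between 0 and 1")
    if clean_cycles < 0:
        raise ValueError("clean_cycles must not be negative")
    return (1.0 - p_hang) ** clean_cycles


if __name__ == "__main__":
    # Sample hang rates are guesses, not measurements.
    for p in (0.5, 1 / 3):
        for k in (6, 8, 10):
            prob = false_good_probability(p, k)
            print(f"p_hang={p:.2f} cycles={k:2d} P(no hang on a bad kernel)={prob:.4f}")
```

Under those assumptions, a kernel that hangs on every second cycle still gets through 6 clean cycles about once in 64 tries, so a handful of clean cycles makes a revision look good without proving it.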
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running an
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me. I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I
usually get one within 3 cycles of a full kernel build over NFS. I have had it
happen once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (108)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
>
> > ps
> - Not going to list them all, but here are the ones that seem interesting...
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> 16 0 0 0 DL (threaded) [bufdaemon]
> 100069 D qsleep [bufdaemon]
> 100074 D - [bufspacedaemon-0]
> 100084 D sdflush 0x11923284 [/ worker]
> - and more of these for the other UFS file systems
> 9 0 0 0 DL psleep 0x1e2f830 [vmdaemon]
> 8 0 0 0 DL (threaded) [pagedaemon]
> 100067 D psleep 0x1e2e95c [dom0]
> 100072 D launds 0x1e2e968 [laundry: dom0]
> 100073 D umarcl 0x12cc720 [uma]
> … a bunch of usb and cam ones
> 100025 D - 0x1b2ee40 [doneq0]
> …
> 12 0 0 0 RL (threaded) [intr]
> 17 I [swi6: task queue]
> 18 Run CPU 0 [swi6: Giant taskq]
> …
> 10 D swapin 0x1d96dfc [swapper]
> - and a bunch more in D state.
> Does this mean the swapper was trying to swap in?
>
> > acttrace
> - just shows the keyboard
> kdb_enter() at kdb_enter+0x35/frame
> vt_kbdevent() at vt_kdbevent+0x329/frame
> kdbmux_intr() at kbdmux_intr+0x19/frame
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> taskqueue_run() at taskqueue_run+0x44/frame
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> ithread_loop() at ithread_loop+0x237/frame
> fork_exit() at fork_exit+0x6c/frame
> fork_trampoline() at 0x../frame
>
> > show all vmem
> vmem 0x.. 'transient arena'
>   quantum: 4096
>   size: 23592960
>   inuse: 0
>   free: 23592960
>   busy tags: 0
>   free tags: 2
>               inuse  size        free  size
>   16777216    0      0           1     23592960
> vmem 0x.. 'buffer arena'
>   quantum: 4096
>   size: 94683136
>   inuse: 94502912
>   free: 180224
>   busy tags: 1463
>   free tags: 3
>               inuse  size        free  size
>   16384       2      32768       1     16384
>   32768       39     1277952     1     32768
>   65536       1422
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrote:
> >> >
> >> > Hi,
> >> >
> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> > to bisect this, but r358252 seems to be the culprit.
> >> >
> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> > If I revert to r358251, I cannot reproduce this.
> >> >
> >> > Any ideas?
> >> >
> >> > I can easily test any change you might suggest to see if it fixes the
> >> > problem.
> >> >
> >> > If you want more debug info, let me know, since I can easily
> >> > reproduce it.
> >> >
> >> > Thanks, rick
> >>
> >> Nothing obvious to me. I can maybe try a repro on a VM...
> >>
> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >>
> >> "btalloc" is "We're either out of address space or lost a fill race."
> >
> >Yes, I would be not surprised to be out of something on 1G i386 machine.
> >Please also add 'show alllocks'.
> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (108)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
>
> > ps
> - Not going to list them all, but here are the ones that seem interesting...
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> 16 0 0 0 DL (threaded) [bufdaemon]
> 100069 D qsleep [bufdaemon]
> 100074 D - [bufspacedaemon-0]
> 100084 D sdflush 0x11923284 [/ worker]
> - and more of these for the other UFS file systems
> 9 0 0 0 DL psleep 0x1e2f830 [vmdaemon]
> 8 0 0 0 DL (threaded) [pagedaemon]
> 100067 D psleep 0x1e2e95c [dom0]
> 100072 D launds 0x1e2e968 [laundry: dom0]
> 100073 D umarcl 0x12cc720 [uma]
> … a bunch of usb and cam ones
> 100025 D - 0x1b2ee40 [doneq0]
> …
> 12 0 0 0 RL (threaded) [intr]
> 17 I [swi6: task queue]
> 18 Run CPU 0 [swi6: Giant taskq]
> …
> 10 D swapin 0x1d96dfc [swapper]
> - and a bunch more in D state.
> Does this mean the swapper was trying to swap in?
>
> > acttrace
> - just shows the keyboard
> kdb_enter() at kdb_enter+0x35/frame
> vt_kbdevent() at vt_kdbevent+0x329/frame
> kdbmux_intr() at kbdmux_intr+0x19/frame
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> taskqueue_run() at taskqueue_run+0x44/frame
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> ithread_loop() at ithread_loop+0x237/frame
> fork_exit() at fork_exit+0x6c/frame
> fork_trampoline() at 0x../frame
>
> > show all vmem
> vmem 0x.. 'transient arena'
>   quantum: 4096
>   size: 23592960
>   inuse: 0
>   free: 23592960
>   busy tags: 0
>   free tags: 2
>               inuse  size        free  size
>   16777216    0      0           1     23592960
> vmem 0x.. 'buffer arena'
>   quantum: 4096
>   size: 94683136
>   inuse: 94502912
>   free: 180224
>   busy tags: 1463
>   free tags: 3
>               inuse  size        free  size
>   16384       2      32768       1     16384
>   32768       39     1277952     1     32768
>   65536       1422   93192192    0     0
>   131072      0      0           1     131072
> vmem 0x.. 'i386trampoline'
>   quantum: 1
>   size: 24576
>   inuse: 20860
>   free: 3716
>   busy tags: 9
>   free tags: 3
>               inuse  size        free  size
>   32          1      48          1     52
>   64          2      208         0     0
>   128         2      280         0     0
>   2048        1      2048        1     3664
>   4096        2      8192        0     0
>   8192        1      10084       0     0
> vmem 0x.. 'kernel rwx arena'
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Konstantin Belousov wrote:
>On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrote:
>> >
>> > Hi,
>> >
>> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> > to bisect this, but r358252 seems to be the culprit.
>> >
>> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> > If I revert to r358251, I cannot reproduce this.
>> >
>> > Any ideas?
>> >
>> > I can easily test any change you might suggest to see if it fixes the
>> > problem.
>> >
>> > If you want more debug info, let me know, since I can easily
>> > reproduce it.
>> >
>> > Thanks, rick
>>
>> Nothing obvious to me. I can maybe try a repro on a VM...
>>
>> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>>
>> "btalloc" is "We're either out of address space or lost a fill race."
>
>Yes, I would be not surprised to be out of something on 1G i386 machine.
>Please also add 'show alllocks'.
Ok, I used an up to date head kernel and it took longer to reproduce a hang.
This time, none of the processes are stuck on "btalloc".
I'll try and give you most of the above, but since I have to type it in by hand
from the screen, I might not get it all. (I'm no real typist;-)

> show alllocks
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
Process 12 (intr) thread 0x.. (108)
exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152

> ps
- Not going to list them all, but here are the ones that seem interesting...
18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
16 0 0 0 DL (threaded) [bufdaemon]
100069 D qsleep [bufdaemon]
100074 D - [bufspacedaemon-0]
100084 D sdflush 0x11923284 [/ worker]
- and more of these for the other UFS file systems
9 0 0 0 DL psleep 0x1e2f830 [vmdaemon]
8 0 0 0 DL (threaded) [pagedaemon]
100067 D psleep 0x1e2e95c [dom0]
100072 D launds 0x1e2e968 [laundry: dom0]
100073 D umarcl 0x12cc720 [uma]
… a bunch of usb and cam ones
100025 D - 0x1b2ee40 [doneq0]
…
12 0 0 0 RL (threaded) [intr]
17 I [swi6: task queue]
18 Run CPU 0 [swi6: Giant taskq]
…
10 D swapin 0x1d96dfc [swapper]
- and a bunch more in D state.
Does this mean the swapper was trying to swap in?

> acttrace
- just shows the keyboard
kdb_enter() at kdb_enter+0x35/frame
vt_kbdevent() at vt_kdbevent+0x329/frame
kdbmux_intr() at kbdmux_intr+0x19/frame
taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
taskqueue_run() at taskqueue_run+0x44/frame
taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
ithread_loop() at ithread_loop+0x237/frame
fork_exit() at fork_exit+0x6c/frame
fork_trampoline() at 0x../frame

> show all vmem
vmem 0x.. 'transient arena'
  quantum: 4096
  size: 23592960
  inuse: 0
  free: 23592960
  busy tags: 0
  free tags: 2
              inuse  size        free  size
  16777216    0      0           1     23592960
vmem 0x.. 'buffer arena'
  quantum: 4096
  size: 94683136
  inuse: 94502912
  free: 180224
  busy tags: 1463
  free tags: 3
              inuse  size        free  size
  16384       2      32768       1     16384
  32768       39     1277952     1     32768
  65536       1422   93192192    0     0
  131072      0      0           1     131072
vmem 0x.. 'i386trampoline'
  quantum: 1
  size: 24576
  inuse: 20860
  free: 3716
  busy tags: 9
  free tags: 3
              inuse  size        free  size
  32          1      48          1     52
  64          2      208         0     0
  128         2      280         0     0
  2048        1      2048        1     3664
  4096        2      8192        0     0
  8192        1      10084       0     0
vmem 0x.. 'kernel rwx arena'
  quantum: 4096
  size: 0
  inuse: 0
  free: 0
  busy tags: 0
  free tags: 0
vmem 0x.. 'kernel area dom'
  quantum: 4096
  size: 56623104
  inuse: 56582144
  free: 40960
  busy tags: 11224
  free tags: 3
              inuse  size        free  size
  4096        11091  45428736    0
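The per-size rows in the "show all vmem" output can be cross-checked against each arena's totals, which also helps decide where the run-together columns in the hand-typed output should be split. A minimal Python sketch follows. The column split (for example, reading "655361422" as 65536 and 1422) and the dictionary layout are an assumed reading of the transcript, not ddb's own format.

```python
# Each row is (size class, inuse count, inuse bytes, free count, free bytes).
# ASSUMPTION: the split of fused columns below is a reading of the
# hand-typed transcript; the totals are copied from the same transcript.
ARENAS = {
    "buffer arena": {
        "inuse": 94502912, "free": 180224, "busy_tags": 1463,
        "rows": [
            (16384, 2, 32768, 1, 16384),
            (32768, 39, 1277952, 1, 32768),
            (65536, 1422, 93192192, 0, 0),
            (131072, 0, 0, 1, 131072),
        ],
    },
    "i386trampoline": {
        "inuse": 20860, "free": 3716, "busy_tags": 9,
        "rows": [
            (32, 1, 48, 1, 52),
            (64, 2, 208, 0, 0),
            (128, 2, 280, 0, 0),
            (2048, 1, 2048, 1, 3664),
            (4096, 2, 8192, 0, 0),
            (8192, 1, 10084, 0, 0),
        ],
    },
}


def check_arena(arena):
    """Compare the summed rows with the arena's reported totals."""
    rows = arena["rows"]
    return {
        "inuse": sum(r[2] for r in rows) == arena["inuse"],
        "free": sum(r[4] for r in rows) == arena["free"],
        "busy_tags": sum(r[1] for r in rows) == arena["busy_tags"],
    }


if __name__ == "__main__":
    for name, arena in ARENAS.items():
        print(name, check_arena(arena))
```

With this split, the rows of both arenas add up to the reported inuse, free and busy-tag totals. The totals also show how little is left: 180224 bytes free out of 94683136 in the buffer arena.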
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrote:
> >
> > Hi,
> >
> > Since I hadn't upgraded a kernel through the winter, it took me a while
> > to bisect this, but r358252 seems to be the culprit.
> >
> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> > If I revert to r358251, I cannot reproduce this.
> >
> > Any ideas?
> >
> > I can easily test any change you might suggest to see if it fixes the
> > problem.
> >
> > If you want more debug info, let me know, since I can easily
> > reproduce it.
> >
> > Thanks, rick
>
> Nothing obvious to me. I can maybe try a repro on a VM...
>
> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>
> "btalloc" is "We're either out of address space or lost a fill race."

Yes, I would be not surprised to be out of something on 1G i386 machine.
Please also add 'show alllocks'.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrote:
>
> Hi,
>
> Since I hadn't upgraded a kernel through the winter, it took me a while
> to bisect this, but r358252 seems to be the culprit.
>
> If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> 1.25Gbytes RAM, i386), about every second attempt will hang.
> When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> If I revert to r358251, I cannot reproduce this.
>
> Any ideas?
>
> I can easily test any change you might suggest to see if it fixes the
> problem.
>
> If you want more debug info, let me know, since I can easily
> reproduce it.
>
> Thanks, rick

Nothing obvious to me. I can maybe try a repro on a VM...

ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.

"btalloc" is "We're either out of address space or lost a fill race."
r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Hi,

Since I hadn't upgraded a kernel through the winter, it took me a while
to bisect this, but r358252 seems to be the culprit.

If I do a kernel build over NFS using my not so big Pentium 4 (single core,
1.25Gbytes RAM, i386), about every second attempt will hang.
When I do a "ps" in the debugger, I see processes sleeping on btalloc.
If I revert to r358251, I cannot reproduce this.

Any ideas?

I can easily test any change you might suggest to see if it fixes the
problem.

If you want more debug info, let me know, since I can easily
reproduce it.

Thanks, rick
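Because the hang is intermittent, a single clean build at a bisection step can send the search the wrong way. The Python sketch below treats a revision as good only after several clean cycles. Everything in it is hypothetical: the revision numbers, the first bad revision, and the stand-in tester that hangs on every second attempt.

```python
from collections import Counter
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int],
                       hangs: Callable[[int], bool],
                       cycles: int) -> int:
    """Binary search for the first revision that hangs.

    revisions must be sorted, with revisions[0] known good and
    revisions[-1] known bad. A revision counts as good only after
    `cycles` build cycles without a hang.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if any(hangs(revisions[mid]) for _ in range(cycles)):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Hypothetical demo data, not taken from the thread.
REVS = list(range(100, 121))
FIRST_BAD = 107
attempts = Counter()


def fake_hangs(rev: int) -> bool:
    """Stand-in tester: a bad revision hangs on every second attempt."""
    attempts[rev] += 1
    return rev >= FIRST_BAD and attempts[rev] % 2 == 0


if __name__ == "__main__":
    for cycles in (1, 2):
        attempts.clear()
        found = first_bad_revision(REVS, fake_hangs, cycles)
        print(f"cycles={cycles}: first bad revision reported as {found}")
```

In this toy setup one cycle per step is never enough and the search reports the newest revision, while two cycles per step find the hypothetical first bad revision. With a real, random hang the number of cycles needed is a judgment call.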