Re: gmirror crash writing to disk? Or is it su+j crash?
Replying to myself again: I doubled bio_transient_maxcnt once more
(original value 180, failed doubling 360, new value 720), and the machine
was able to successfully run
"for i in `jot 10`; do make -j4 buildkernel; done" ...

But doesn't this mean that we still have a resource exhaustion to worry
about? Isn't this just another race waiting for the right set of
conditions?

On Tue, Sep 3, 2013 at 11:06 AM, Zaphod Beeblebrox wrote:

> Since there weren't any more ideas here, I tried turning off
> hyper-threading. This is an old Pentium D-type CPU, that is, one core
> with HT. I'm wondering if the HT nature is contributing to this resource
> exhaustion, so I turned off HT (basically making this a single-threaded
> CPU) and it seems to have made the problem go away.
>
> That is not to say that the problem is fixed: it simply means that
> reproducing it may be tied to multiple CPUs and/or the allocation of
> resources by an HT CPU core.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
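The repeated-build test described in this message can be sketched as a small shell function. This is only an illustration: the name repeat_until_fail is made up here, and because the failure in this thread is a kernel panic rather than a failing exit status, the loop mainly shows how many clean runs were completed before the machine went down.

```shell
# Hypothetical helper (not from the thread): run a command up to COUNT
# times and stop at the first run that exits with a non-zero status.
# Usage: repeat_until_fail COUNT COMMAND [ARGS...]
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "run $i of $count failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs succeeded"
}

# On the affected machine this would be invoked from /usr/src as:
#   repeat_until_fail 10 make -j4 buildkernel
```

Ten clean runs make the race less likely to trigger but do not prove that the underlying exhaustion is gone, which is the concern raised above.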
Re: gmirror crash writing to disk? Or is it su+j crash?
Since there weren't any more ideas here, I tried turning off
hyper-threading. This is an old Pentium D-type CPU, that is, one core
with HT. I'm wondering if the HT nature is contributing to this resource
exhaustion, so I turned off HT (basically making this a single-threaded
CPU) and it seems to have made the problem go away.

That is not to say that the problem is fixed: it simply means that
reproducing it may be tied to multiple CPUs and/or the allocation of
resources by an HT CPU core.

On Mon, Sep 2, 2013 at 3:53 AM, Zaphod Beeblebrox wrote:

> The first one (kern.geom.transient_map_retries) causes the system to
> wedge.
>
> The second one (default is 180, I doubled to 360) causes the system to
> crash but not dump.
>
> So... neither fixes the problem.
Re: gmirror crash writing to disk? Or is it su+j crash?
The first one (kern.geom.transient_map_retries) causes the system to
wedge.

The second one (kern.bio_transient_maxcnt; default is 180, I doubled it
to 360) causes the system to crash but not dump.

So... neither fixes the problem.

On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała wrote:

> Let me quote Konstantin:
>
> > It is either an exhaustion of the transient map, or a deadlock.
> > For the first, setting kern.geom.transient_map_retries to 0 could help.
> > For the second, the count of the transient buffers must be increased,
> > by kern.bio_transient_maxcnt loader tunable.
>
> Could you try both and tell which one of them fixed the problem? Thanks!
Re: gmirror crash writing to disk? Or is it su+j crash?
Message written by Zaphod Beeblebrox on 31 Aug 2013, at 00:49:

> Because someone said that there would be no logging of underlying ATA
> errors without verbose, I rebooted with verbose and tried the same
> make -j4 again... and here is the relatively similar core.txt.5
>
> https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
>
> Looking at it, gmirror is dropping the same error and the underlying
> hardware is not causing the error...

Let me quote Konstantin:

> It is either an exhaustion of the transient map, or a deadlock.
> For the first, setting kern.geom.transient_map_retries to 0 could help.
> For the second, the count of the transient buffers must be increased,
> by kern.bio_transient_maxcnt loader tunable.

Could you try both and tell which one of them fixed the problem? Thanks!
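For anyone trying the two knobs Konstantin names, a rough sketch of applying them follows. The helper set_tunable is hypothetical, the value 720 is simply the one tried later in this thread, and the sketch assumes kern.geom.transient_map_retries can be changed at runtime with sysctl(8) while kern.bio_transient_maxcnt, being a loader tunable, only takes effect from /boot/loader.conf after a reboot.

```shell
# Hypothetical helper (not a FreeBSD utility): set NAME="VALUE" in a
# loader.conf-style file, replacing an existing assignment for NAME or
# appending a new one. Note that the file is rewritten via a temp file.
set_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="$file.tmp.$$"
    : > "$tmp"
    if [ -f "$file" ]; then
        # Keep every line that does not start with "NAME=".
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Assumed usage on the affected machine (run as root):
#   sysctl kern.geom.transient_map_retries=0
#   set_tunable /boot/loader.conf kern.bio_transient_maxcnt 720
```

Trying one change at a time, as requested above, keeps it clear which of the two causes (map exhaustion or deadlock) a given result points to.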
Re: gmirror crash writing to disk? Or is it su+j crash?
Because someone said that there would be no logging of underlying ATA
errors without verbose, I rebooted with verbose and tried the same
make -j4 again... and here is the relatively similar core.txt.5

https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87

Looking at it, gmirror is dropping the same error and the underlying
hardware is not causing the error...

On Fri, Aug 30, 2013 at 6:09 PM, Zaphod Beeblebrox wrote:

> My bad. New link for the core.txt.4:
>
> https://uk.eicat.ca/owncloud/public.php?service=files&t=f471e5afae483342cd20dc390e9c2dd7
Re: gmirror crash writing to disk? Or is it su+j crash?
My bad. New link for the core.txt.4:

https://uk.eicat.ca/owncloud/public.php?service=files&t=f471e5afae483342cd20dc390e9c2dd7

On Fri, Aug 30, 2013 at 4:51 PM, Ian Lepore wrote:

> One of the few places in the kernel that uses EDEADLK is in geom_io.c
> (line 642 in -current) in g_io_transient_map_bio()...
>
>     g_io_deliver(bp, EDEADLK/* XXXKIB */);
>
> -- Ian
Re: gmirror crash writing to disk? Or is it su+j crash?
On Fri, 2013-08-30 at 21:50 +0200, Edward Tomasz Napierała wrote:

> This is softupdates panic caused by write operation returning error 11,
> which, according to 'man errno', is EDEADLK.
>
> To be honest, I have no idea why gmirror might be returning this error.

One of the few places in the kernel that uses EDEADLK is in geom_io.c
(line 642 in -current) in g_io_transient_map_bio()...

    g_io_deliver(bp, EDEADLK/* XXXKIB */);

-- Ian
Re: gmirror crash writing to disk? Or is it su+j crash?
Message written by Zaphod Beeblebrox on 29 Aug 2013, at 23:35:

> So I have a system running:
>
> FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
> EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE i386
>
> and it has two 2T SATA disks. To keep this post short, the crash.txt is
> here.
>
> https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493

That link gives me a login error.

> now curiously, while running a "make -j4 buildkernel" ... almost every time
> ... it crashes with:
>
> g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error = 11
> /usr: got error 11 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error

This is a softupdates panic caused by a write operation returning error 11,
which, according to 'man errno', is EDEADLK.

To be honest, I have no idea why gmirror might be returning this error.

> ... no error report from the hard drives, simply an error report from the
> mirror.

Note that ahci(4) does not log errors unless you're running with bootverbose.

> The filesystem is ufs with su+j... but I'm not sure this matters here.

It does, kind of: without soft updates/SUJ, the error would be non-fatal. It
wouldn't panic the box, but it would (probably) cause data corruption.
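The errno lookup mentioned above can also be done against the system header instead of the man page. The function below is a sketch with a made-up name (errno_name); it assumes the header uses the usual "#define NAME NUMBER" layout. Error numbers are platform-specific: 11 is EDEADLK on FreeBSD, while on Linux 11 is EAGAIN, so the lookup should be done on the system that produced the message.

```shell
# Hypothetical helper: print the symbolic name defined as errno NUMBER
# in an errno.h-style header.
# Usage: errno_name NUMBER HEADER_FILE
errno_name() {
    # Match lines of the form "#define NAME NUMBER ..." and print NAME.
    awk -v n="$1" '$1 == "#define" && $3 == n { print $2 }' "$2"
}

# On FreeBSD the header is expected at /usr/include/sys/errno.h:
#   errno_name 11 /usr/include/sys/errno.h
```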
Re: gmirror crash writing to disk? Or is it su+j crash?
I was going to mention that I ran fsck _twice_, but I forgot. Then when
that didn't fix it, I dumped the filesystem, newfs'd it and restored it.
Then I fsck'd it for good measure. This particular crash immediately
follows that treatment.

I can do this in a loop:

boot -> make -j4 buildkernel -> crash -> single user -> fsck -> fsck again -\
^---------------------------------------------------------------------------/

On Fri, Aug 30, 2013 at 8:47 AM, Adam Vande More wrote:

> Run fsck.
Re: gmirror crash writing to disk? Or is it su+j crash?
On Thu, Aug 29, 2013 at 4:35 PM, Zaphod Beeblebrox wrote:

> now curiously, while running a "make -j4 buildkernel" ... almost every time
> ... it crashes with:
>
> g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error = 11
> /usr: got error 11 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error
>
> ... no error report from the hard drives, simply an error report from the
> mirror.
>
> The filesystem is ufs with su+j... but I'm not sure this matters here.

Run fsck.

--
Adam Vande More
gmirror crash writing to disk? Or is it su+j crash?
So I have a system running:

FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE i386

and it has two 2T SATA disks. To keep this post short, the crash.txt is
here.

https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493

now curiously, while running a "make -j4 buildkernel" ... almost every time
... it crashes with:

g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error = 11
/usr: got error 11 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error

... no error report from the hard drives, simply an error report from the
mirror.

The filesystem is ufs with su+j... but I'm not sure this matters here.
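To see where on the mirror the failing write landed, the g_vfs_done() line can be pulled apart with a few lines of shell. This is a sketch only: parse_gvfs_done is a made-up name, and the 512-byte sector size is an assumption, since the thread does not state the sector size of the disks.

```shell
# Assumed logical sector size; not stated anywhere in the thread.
SECTOR_SIZE=512

# Hypothetical helper: extract offset, length and error number from a
# g_vfs_done() console line and print the derived sector range.
parse_gvfs_done() {
    line=$1
    offset=$(printf '%s\n' "$line" | sed -n 's/.*offset=\([0-9][0-9]*\).*/\1/p')
    length=$(printf '%s\n' "$line" | sed -n 's/.*length=\([0-9][0-9]*\).*/\1/p')
    err=$(printf '%s\n' "$line" | sed -n 's/.*error = \([0-9][0-9]*\).*/\1/p')
    echo "offset=$offset length=$length error=$err"
    echo "first_sector=$((offset / SECTOR_SIZE)) sector_count=$((length / SECTOR_SIZE))"
}

parse_gvfs_done 'g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error = 11'
```

For the line from this report, that is a 64 KiB write of 128 sectors starting at sector 1009306240, failing with error 11.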