Re: gmirror crash writing to disk? Or is it su+j crash?

2013-09-05 Thread Zaphod Beeblebrox
Replying to myself again: I doubled kern.bio_transient_maxcnt once more
(original value 180, failed doubling 360, new value 720), and the machine was
able to successfully run "for i in `jot 10`; do make -j4 buildkernel; done" ...

But doesn't this mean that we still have a resource exhaustion problem to
worry about?  Isn't this just another race waiting for the right set of
conditions?
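
The repeated-build test described above can be written as a small shell function that stops at the first failing run, which makes it easier to see how many builds survive before a problem shows up. This is only a sketch: the helper name repeat_cmd is hypothetical and not part of FreeBSD, and a kernel panic would of course end the loop by itself.

```shell
# Run a command N times, stopping at the first failure.
# repeat_cmd is a hypothetical helper, not a FreeBSD utility.
repeat_cmd() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count: $*"
        # Abort the loop as soon as one run returns a non-zero status.
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
}

# Usage on the affected machine (from /usr/src):
#   repeat_cmd 10 make -j4 buildkernel
```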


On Tue, Sep 3, 2013 at 11:06 AM, Zaphod Beeblebrox wrote:

> Since there weren't any more ideas here, I tried turning off
> hyper-threading.  This is an old pentium-D type CPU --- that is: one core
> with HT.  I'm wondering if the HT nature is helping this resource
> exhaustion, so I turned off HT (basically making this a single-threaded
> CPU) and it seems to have made the problem go away.
>
> That is not to say that the problem is fixed: it simply means that
> replication may be tied to multiple CPUs and/or the allocation of resources
> by an HT CPU core.
>
>
> On Mon, Sep 2, 2013 at 3:53 AM, Zaphod Beeblebrox wrote:
>
>> The first one (kern.geom.transient_map_retries) causes the system to
>> wedge.
>>
>> The second one (default is 180, I doubled to 360) causes the system to
>> crash but not dump.
>>
>> So... neither fixes the problem.
>>
>>
>> On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała <
>> tr...@freebsd.org> wrote:
>>
>>> Message written by Zaphod Beeblebrox on 31 Aug 2013, at 00:49:
>>> > Because someone said that there would be no logging of underlying ATA
>>> > errors without verbose, I rebooted with verbose and tried the same
>>> > make -j4 again... and here is the relatively similar core.txt.5
>>> >
>>> > https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
>>> >
>>> > Looking at it, gmirror is dropping the same error and the underlying
>>> > hardware is not causing the error...
>>>
>>> Let me quote Konstantin:
>>>
>>> > It is either an exhaustion of the transient map, or a deadlock.
>>> > For the first, setting kern.geom.transient_map_retries to 0 could help.
>>> > For the second, the count of the transient buffers must be increased,
>>> > by kern.bio_transient_maxcnt loader tunable.
>>>
>>> Could you try both and tell which one of them fixed the problem?  Thanks!
>>>
>>>
>>
>
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: gmirror crash writing to disk? Or is it su+j crash?

2013-09-03 Thread Zaphod Beeblebrox
Since there weren't any more ideas here, I tried turning off
hyper-threading.  This is an old pentium-D type CPU --- that is: one core
with HT.  I'm wondering if the HT nature is helping this resource
exhaustion, so I turned off HT (basically making this a single-threaded
CPU) and it seems to have made the problem go away.

That is not to say that the problem is fixed: it simply means that
replication may be tied to multiple CPUs and/or the allocation of resources
by an HT CPU core.


On Mon, Sep 2, 2013 at 3:53 AM, Zaphod Beeblebrox  wrote:

> The first one (kern.geom.transient_map_retries) causes the system to wedge.
>
> The second one (default is 180, I doubled to 360) causes the system to
> crash but not dump.
>
> So... neither fixes the problem.
>
>
> On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała <
> tr...@freebsd.org> wrote:
>
>> Message written by Zaphod Beeblebrox on 31 Aug 2013, at 00:49:
>> > Because someone said that there would be no logging of underlying ATA
>> > errors without verbose, I rebooted with verbose and tried the same
>> > make -j4 again... and here is the relatively similar core.txt.5
>> >
>> > https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
>> >
>> > Looking at it, gmirror is dropping the same error and the underlying
>> > hardware is not causing the error...
>>
>> Let me quote Konstantin:
>>
>> > It is either an exhaustion of the transient map, or a deadlock.
>> > For the first, setting kern.geom.transient_map_retries to 0 could help.
>> > For the second, the count of the transient buffers must be increased,
>> > by kern.bio_transient_maxcnt loader tunable.
>>
>> Could you try both and tell which one of them fixed the problem?  Thanks!
>>
>>
>


Re: gmirror crash writing to disk? Or is it su+j crash?

2013-09-02 Thread Zaphod Beeblebrox
The first one (kern.geom.transient_map_retries set to 0) causes the system to
wedge.

The second one (kern.bio_transient_maxcnt; default is 180, I doubled to 360)
causes the system to crash but not dump.

So... neither fixes the problem.


On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała
wrote:

> Message written by Zaphod Beeblebrox on 31 Aug 2013, at 00:49:
> > Because someone said that there would be no logging of underlying ATA
> > errors without verbose, I rebooted with verbose and tried the same
> > make -j4 again... and here is the relatively similar core.txt.5
> >
> > https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
> >
> > Looking at it, gmirror is dropping the same error and the underlying
> > hardware is not causing the error...
>
> Let me quote Konstantin:
>
> > It is either an exhaustion of the transient map, or a deadlock.
> > For the first, setting kern.geom.transient_map_retries to 0 could help.
> > For the second, the count of the transient buffers must be increased,
> > by kern.bio_transient_maxcnt loader tunable.
>
> Could you try both and tell which one of them fixed the problem?  Thanks!
>
>


Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-31 Thread Edward Tomasz Napierała
Message written by Zaphod Beeblebrox on 31 Aug 2013, at 00:49:
> Because someone said that there would be no logging of underlying ATA errors
> without verbose, I rebooted with verbose and tried the same make -j4 again...
> and here is the relatively similar core.txt.5
> 
> https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
> 
> Looking at it, gmirror is dropping the same error and the underlying hardware 
> is not causing the error...

Let me quote Konstantin:

> It is either an exhaustion of the transient map, or a deadlock.
> For the first, setting kern.geom.transient_map_retries to 0 could help.
> For the second, the count of the transient buffers must be increased,
> by kern.bio_transient_maxcnt loader tunable.

Could you try both and tell which one of them fixed the problem?  Thanks!
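
The two experiments quoted above can be sketched as loader.conf entries. kern.bio_transient_maxcnt is named in the quote as a loader tunable, so it belongs in /boot/loader.conf and takes effect at the next boot; treating kern.geom.transient_map_retries the same way is an assumption here, as is the helper name set_loader_tunable, which is purely illustrative. The value 360 is the doubled value reported elsewhere in this thread.

```shell
# set_loader_tunable FILE NAME VALUE
# Hypothetical helper: set NAME="VALUE" in a loader.conf-style file,
# replacing any earlier line for NAME so repeated runs do not pile up.
set_loader_tunable() {
    file=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every existing line except a previous assignment of this tunable.
    if [ -f "$file" ]; then
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# First experiment (transient map exhaustion): disable the retries.
#   set_loader_tunable /boot/loader.conf kern.geom.transient_map_retries 0
# Second experiment (deadlock): raise the transient buffer count.
#   set_loader_tunable /boot/loader.conf kern.bio_transient_maxcnt 360
```

Trying one change per reboot keeps the two hypotheses separable.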



Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-30 Thread Zaphod Beeblebrox
Because someone said that there would be no logging of underlying ATA errors
without verbose, I rebooted with verbose and tried the same make -j4
again... and here is the relatively similar core.txt.5

https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87

Looking at it, gmirror is dropping the same error and the underlying
hardware is not causing the error...


On Fri, Aug 30, 2013 at 6:09 PM, Zaphod Beeblebrox wrote:

> My bad.  New link for the core.txt.4:
>
>
> https://uk.eicat.ca/owncloud/public.php?service=files&t=f471e5afae483342cd20dc390e9c2dd7
>
>
>
>
> On Fri, Aug 30, 2013 at 4:51 PM, Ian Lepore  wrote:
>
>> On Fri, 2013-08-30 at 21:50 +0200, Edward Tomasz Napierała wrote:
>> > Message written by Zaphod Beeblebrox on 29 Aug 2013, at 23:35:
>> > > So I have a system running:
>> > >
>> > > FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
>> > > EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386
>> > >
>> > > and it has two 2T SATA disks.  To keep this post short, the crash.txt is
>> > > here.
>> > >
>> > > https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493
>> >
>> > Login error.
>> >
>> > > now curiously, while running a "make -j4 buildkernel" ... almost every time
>> > > ... it crashes with:
>> > >
>> > > g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error =
>> > > 11
>> > > /usr: got error 11 while accessing filesystem
>> > > panic: softdep_deallocate_dependencies: unrecovered I/O error
>> >
>> > This is a softupdates panic caused by a write operation returning error 11,
>> > which, according to 'man errno', is EDEADLK.
>> >
>> > To be honest, I have no idea why gmirror might be returning this error.
>> >
>> > > ... no error report from the hard drives, simply an error report from the
>> > > mirror.
>> >
>> > Note that ahci(4) does not log errors unless you're running with bootverbose.
>> >
>> > > The filesystem is ufs with su+j... but I'm not sure this matters here.
>> >
>> > It does, kind of - without soft updates/SUJ, the error would be non-fatal - it
>> > wouldn't panic the box, but it would (probably) cause data corruption.
>>
>> One of the few places in the kernel that uses EDEADLK is in geom_io.c
>> (line 642 in -current) in g_io_transient_map_bio()...
>>
>> g_io_deliver(bp, EDEADLK/* XXXKIB */);
>>
>> -- Ian
>>
>>
>>
>


Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-30 Thread Zaphod Beeblebrox
My bad.  New link for the core.txt.4:

https://uk.eicat.ca/owncloud/public.php?service=files&t=f471e5afae483342cd20dc390e9c2dd7




On Fri, Aug 30, 2013 at 4:51 PM, Ian Lepore  wrote:

> On Fri, 2013-08-30 at 21:50 +0200, Edward Tomasz Napierała wrote:
> > Message written by Zaphod Beeblebrox on 29 Aug 2013, at 23:35:
> > > So I have a system running:
> > >
> > > FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
> > > EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386
> > >
> > > and it has two 2T SATA disks.  To keep this post short, the crash.txt is
> > > here.
> > >
> > > https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493
> >
> > Login error.
> >
> > > now curiously, while running a "make -j4 buildkernel" ... almost every time
> > > ... it crashes with:
> > >
> > > g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error =
> > > 11
> > > /usr: got error 11 while accessing filesystem
> > > panic: softdep_deallocate_dependencies: unrecovered I/O error
> >
> > This is a softupdates panic caused by a write operation returning error 11,
> > which, according to 'man errno', is EDEADLK.
> >
> > To be honest, I have no idea why gmirror might be returning this error.
> >
> > > ... no error report from the hard drives, simply an error report from the
> > > mirror.
> >
> > Note that ahci(4) does not log errors unless you're running with bootverbose.
> >
> > > The filesystem is ufs with su+j... but I'm not sure this matters here.
> >
> > It does, kind of - without soft updates/SUJ, the error would be non-fatal - it
> > wouldn't panic the box, but it would (probably) cause data corruption.
>
> One of the few places in the kernel that uses EDEADLK is in geom_io.c
> (line 642 in -current) in g_io_transient_map_bio()...
>
> g_io_deliver(bp, EDEADLK/* XXXKIB */);
>
> -- Ian
>
>
>


Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-30 Thread Ian Lepore
On Fri, 2013-08-30 at 21:50 +0200, Edward Tomasz Napierała wrote:
> Message written by Zaphod Beeblebrox on 29 Aug 2013, at 23:35:
> > So I have a system running:
> > 
> > FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
> > EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386
> > 
> > and it has two 2T SATA disks.  To keep this post short, the crash.txt is
> > here.
> > 
> > https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493
> 
> Login error.
> 
> > now curiously, while running a "make -j4 buildkernel" ... almost every time
> > ... it crashes with:
> > 
> > g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error =
> > 11
> > /usr: got error 11 while accessing filesystem
> > panic: softdep_deallocate_dependencies: unrecovered I/O error
> 
> This is a softupdates panic caused by a write operation returning error 11,
> which, according to 'man errno', is EDEADLK.
> 
> To be honest, I have no idea why gmirror might be returning this error.
> 
> > ... no error report from the hard drives, simply an error report from the
> > mirror.
> 
> Note that ahci(4) does not log errors unless you're running with bootverbose.
> 
> > The filesystem is ufs with su+j... but I'm not sure this matters here.
> 
> It does, kind of - without soft updates/SUJ, the error would be non-fatal - it
> wouldn't panic the box, but it would (probably) cause data corruption.

One of the few places in the kernel that uses EDEADLK is in geom_io.c
(line 642 in -current) in g_io_transient_map_bio()...

g_io_deliver(bp, EDEADLK/* XXXKIB */);

-- Ian




Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-30 Thread Edward Tomasz Napierała
Message written by Zaphod Beeblebrox on 29 Aug 2013, at 23:35:
> So I have a system running:
> 
> FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
> EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386
> 
> and it has two 2T SATA disks.  To keep this post short, the crash.txt is
> here.
> 
> https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493

Login error.

> now curiously, while running a "make -j4 buildkernel" ... almost every time
> ... it crashes with:
> 
> g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error =
> 11
> /usr: got error 11 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error

This is a softupdates panic caused by a write operation returning error 11,
which, according to 'man errno', is EDEADLK.
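
Errno numbering differs between operating systems, so the name has to be checked against FreeBSD's own list rather than another platform's (on Linux, for example, 11 is EAGAIN). A minimal sketch of that lookup, using a small hand-copied subset of FreeBSD's values; the function name freebsd_errno_name is hypothetical:

```shell
# Map a few FreeBSD errno numbers to their symbolic names.
# Only a small subset is listed; on a FreeBSD system the full list
# is in errno(2) and /usr/include/sys/errno.h.
freebsd_errno_name() {
    case "$1" in
        5)  echo EIO ;;       # Input/output error
        11) echo EDEADLK ;;   # Resource deadlock avoided
        35) echo EAGAIN ;;    # Resource temporarily unavailable
        *)  echo "unknown ($1)" ;;
    esac
}

freebsd_errno_name 11
```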

To be honest, I have no idea why gmirror might be returning this error.

> ... no error report from the hard drives, simply an error report from the
> mirror.

Note that ahci(4) does not log errors unless you're running with bootverbose.

> The filesystem is ufs with su+j... but I'm not sure this matters here.

It does, kind of - without soft updates/SUJ, the error would be non-fatal - it
wouldn't panic the box, but it would (probably) cause data corruption.



Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-30 Thread Zaphod Beeblebrox
I was going to mention that I ran fsck _twice_, but I forgot.  Then when
that didn't fix it, I dumped the filesystem, newfs'd it and restored it.
Then I fsck'd it for good measure.

This particular crash immediately follows that treatment.

I can do this in a loop:

boot -> make -j4 buildkernel -> crash -> single user -> fsck -> fsck again
-> (back to boot)


On Fri, Aug 30, 2013 at 8:47 AM, Adam Vande More wrote:

> On Thu, Aug 29, 2013 at 4:35 PM, Zaphod Beeblebrox wrote:
>
>> So I have a system running:
>>
>> FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28
>> 03:02:55
>> EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386
>>
>> and it has two 2T SATA disks.  To keep this post short, the crash.txt is
>> here.
>>
>>
>> https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493
>>
>> now curiously, while running a "make -j4 buildkernel" ... almost every
>> time
>> ... it crashes with:
>>
>> g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error =
>> 11
>> /usr: got error 11 while accessing filesystem
>> panic: softdep_deallocate_dependencies: unrecovered I/O error
>>
>> ... no error report from the hard drives, simply an error report from the
>> mirror.
>>
>> The filesystem is ufs with su+j... but I'm not sure this matters here.
>>
>>
> Run fsck.
>
>
> --
> Adam Vande More
>


Re: gmirror crash writing to disk? Or is it su+j crash?

2013-08-30 Thread Adam Vande More
On Thu, Aug 29, 2013 at 4:35 PM, Zaphod Beeblebrox wrote:

> So I have a system running:
>
> FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28
> 03:02:55
> EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386
>
> and it has two 2T SATA disks.  To keep this post short, the crash.txt is
> here.
>
>
> https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493
>
> now curiously, while running a "make -j4 buildkernel" ... almost every time
> ... it crashes with:
>
> g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error =
> 11
> /usr: got error 11 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error
>
> ... no error report from the hard drives, simply an error report from the
> mirror.
>
> The filesystem is ufs with su+j... but I'm not sure this matters here.
>
>
Run fsck.


-- 
Adam Vande More


gmirror crash writing to disk? Or is it su+j crash?

2013-08-29 Thread Zaphod Beeblebrox
So I have a system running:

FreeBSD walk.dclg.ca 9.2-RC3 FreeBSD 9.2-RC3 # r254952: Wed Aug 28 03:02:55
EDT 2013 r...@walk.dclg.ca:/usr/obj/usr/src/sys/STRIKE  i386

and it has two 2T SATA disks.  To keep this post short, the crash.txt is
here.

https://uk.eicat.ca/owncloud/public.php?service=files&t=fea9d25579fe0c4afb808859e80e1493

now curiously, while running a "make -j4 buildkernel" ... almost every time
... it crashes with:

g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error = 11
/usr: got error 11 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error

... no error report from the hard drives, simply an error report from the
mirror.

The filesystem is ufs with su+j... but I'm not sure this matters here.
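
For anyone comparing several of these panics, the offset and length in the g_vfs_done() message can be pulled out and converted to sectors to see whether the failing writes cluster in one region of the mirror. The sketch below assumes 512-byte sectors, which is not stated in this post, and the helper name parse_gvfs_write is hypothetical. The write shown is 64 KiB, roughly 481 GiB into the mirror.

```shell
# parse_gvfs_write LINE
# Hypothetical helper: extract offset and length (in bytes) from a
# g_vfs_done() WRITE message and print them as 512-byte sector counts.
# The 512-byte sector size is an assumption, not taken from the post.
parse_gvfs_write() {
    offset=$(printf '%s\n' "$1" | sed -n 's/.*offset=\([0-9][0-9]*\).*/\1/p')
    length=$(printf '%s\n' "$1" | sed -n 's/.*length=\([0-9][0-9]*\).*/\1/p')
    # Fail if either field is missing from the line.
    [ -n "$offset" ] && [ -n "$length" ] || return 1
    echo "start_sector=$((offset / 512)) sectors=$((length / 512))"
}

line='g_vfs_done():mirror/walke[WRITE(offset=516764794880, length=65536)]error = 11'
parse_gvfs_write "$line"
```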