Re: spec_getpages I/O read failure on md0

2003-01-02 Thread phk

In message [EMAIL PROTECTED], Bruce Evans writes:

> The md driver doesn't set any of the si_ size parameters so it has no chance
> of getting this stuff right when the parameters are not the defaults.

It does however set its sectorsize to 4k.  The problem was GEOM not
setting si_bsize_phys on the dev_t.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: spec_getpages I/O read failure on md0

2003-01-02 Thread Bruce Evans
On Thu, 2 Jan 2003 [EMAIL PROTECTED] wrote:

> In message [EMAIL PROTECTED], Bruce Evans writes:
>
> > The md driver doesn't set any of the si_ size parameters so it has no chance
> > of getting this stuff right when the parameters are not the defaults.
>
> It does however set its sectorsize to 4k.  The problem was GEOM not
> setting si_bsize_phys on the dev_t.

The problem must be deeper, since setting it in GEOM doesn't affect
the non-GEOM case.  GEOM can't set it, since it might be different
from the sector size.  Stefan Esser reported some ordering and/or
cloning problems.  GEOM apparently creates an extra device whose
si_bsize_phys can't be touched by the md driver.

Bruce





Re: spec_getpages I/O read failure on md0

2003-01-02 Thread phk
In message [EMAIL PROTECTED], Bruce Evans writes:
> On Thu, 2 Jan 2003 [EMAIL PROTECTED] wrote:
>
> > In message [EMAIL PROTECTED], Bruce Evans writes:
> >
> > > The md driver doesn't set any of the si_ size parameters so it has no chance
> > > of getting this stuff right when the parameters are not the defaults.
> >
> > It does however set its sectorsize to 4k.  The problem was GEOM not
> > setting si_bsize_phys on the dev_t.
>
> The problem must be deeper, since setting it in GEOM doesn't affect
> the non-GEOM case.  GEOM can't set it, since it might be different
> from the sector size.  Stefan Esser reported some ordering and/or
> cloning problems.  GEOM apparently creates an extra device whose
> si_bsize_phys can't be touched by the md driver.

GEOM does not operate with two different sizes, it operates with a
sectorsize which is defined as the smallest size of data the unit
supports.

Transferring this from md to GEOM to the dev_t should solve the problem
in the GEOM case.
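
A toy userland sketch of the hand-off described above, for illustration only. The struct and function names (toy_md, toy_provider, toy_dev, toy_propagate) are hypothetical stand-ins and not the kernel's types; only the field names sectorsize and si_bsize_phys and the 512-byte default come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three layers named in the thread. */
struct toy_md       { unsigned sectorsize; };     /* the md unit             */
struct toy_provider { unsigned sectorsize; };     /* what GEOM knows         */
struct toy_dev      { unsigned si_bsize_phys; };  /* what spec_getpages sees */

#define TOY_DEV_BSIZE 512u	/* default block size mentioned in the thread */

/* Copy the unit's sector size down the chain: md -> GEOM -> dev_t. */
void
toy_propagate(const struct toy_md *md, struct toy_provider *pp,
    struct toy_dev *dev)
{
	assert(md != NULL && pp != NULL && dev != NULL);
	assert(md->sectorsize > 0);
	pp->sectorsize = md->sectorsize;
	dev->si_bsize_phys = pp->sectorsize;
}
```

Starting from a toy_dev initialised to TOY_DEV_BSIZE and a toy_md with a 4096-byte sector, a call to toy_propagate() leaves si_bsize_phys at 4096 rather than the default.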




Re: spec_getpages I/O read failure on md0

2002-12-30 Thread Jos Backus
I was finally able to reproduce this. Here are some offset values:

Dec 30 10:55:07 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5fb488 vp 0xc41e4708
Dec 30 10:55:07 lizzy kernel: size: 6144, resid: 6144, a_count: 6124, valid: 0x0
Dec 30 10:55:07 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 2
Dec 30 10:55:07 lizzy kernel: offset: 20316160

Dec 30 13:29:07 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5fa630 vp 0xc41e4708
Dec 30 13:29:07 lizzy kernel: size: 6144, resid: 6144, a_count: 6124, valid: 0x0
Dec 30 13:29:07 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 2
Dec 30 13:29:07 lizzy kernel: offset: 28729344

Dec 30 16:04:31 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5f8fe0 vp 0xc41e4708
Dec 30 16:04:31 lizzy kernel: size: 6144, resid: 6144, a_count: 6124, valid: 0x0
Dec 30 16:04:31 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 2
Dec 30 16:04:32 lizzy kernel: offset: 20316160

Dec 30 16:05:47 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5f9ca0 vp 0xc41e4708
Dec 30 16:05:47 lizzy kernel: size: 14848, resid: 14848, a_count: 14404, valid: 0x0
Dec 30 16:05:47 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 4
Dec 30 16:05:47 lizzy kernel: offset: 17874944

Does this help?
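
For reference, a small userland check of the four reports above. It tests each offset and size against the 512-byte default block size and against 4096 bytes, which is the block size suggested elsewhere in this thread for swap-backed md and treated here as an assumption. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One logged failure: byte offset and request size. */
struct report {
	uint64_t offset;
	uint64_t size;
};

/* Values copied from the Dec 30 kernel messages above. */
const struct report reports[] = {
	{ 20316160, 6144 },
	{ 28729344, 6144 },
	{ 20316160, 6144 },
	{ 17874944, 14848 },
};
const size_t nreports = sizeof(reports) / sizeof(reports[0]);

/* True when both the offset and the length are multiples of bsize. */
bool
io_is_aligned(uint64_t offset, uint64_t length, uint64_t bsize)
{
	assert(bsize > 0);
	return (offset % bsize == 0 && length % bsize == 0);
}

/* Count the logged requests that are fully aligned to bsize. */
size_t
count_aligned(uint64_t bsize)
{
	size_t i, n = 0;

	for (i = 0; i < nreports; i++)
		if (io_is_aligned(reports[i].offset, reports[i].size, bsize))
			n++;
	return (n);
}
```

With these values, every request is aligned to 512 bytes and every offset is a multiple of 4096, but none of the sizes is, so no request is fully aligned to 4096 bytes.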

-- 
Jos Backus                       Sunnyvale, CA
jos at catnook.com               require 'std/disclaimer'




spec_getpages I/O read failure on md0

2002-12-28 Thread Jos Backus
I am using the following fstab entry:

/dev/md0 /tmp mfs rw,nosuid,nodev,-s=32m 0 0

And three times now have I seen this during an installworld, both on a UP and
an SMP system running a very recent -current:

===> usr.sbin/pcvt/ispcvt
install -s -o root -g wheel -m 555   ispcvt /usr/sbin
install -o root -g wheel -m 444 ispcvt.8.gz  /usr/share/man/man8
===> usr.sbin/pcvt/vgaio
echo:Input/output error
*** Error code 1

Stop in /disk0/usr/src/usr.sbin/pcvt/vgaio.
*** Error code 1

Stop in /disk0/usr/src/usr.sbin/pcvt.
*** Error code 1

Stop in /disk0/usr/src/usr.sbin.
*** Error code 1

Stop in /disk0/usr/src.
*** Error code 1

Stop in /disk0/usr/src.
*** Error code 1

Stop in /disk0/usr/src.
*** Error code 1

Stop in /disk0/usr/src.

Accompanied by

Dec 28 01:42:12 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5f9310 vp 0xc41e8708
Dec 28 01:42:12 lizzy kernel: size: 2048, resid: 2048, a_count: 2028, valid: 0x0
Dec 28 01:42:12 lizzy kernel: nread: 0, reqpage: 0, pindex: 1, pcount: 1
Dec 28 01:42:12 lizzy kernel: vm_fault: pager read error, pid 40673 (sh)
Dec 28 01:43:26 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5fb158 vp 0xc41e8708
Dec 28 01:43:26 lizzy kernel: size: 14848, resid: 14848, a_count: 14404, valid: 0x0
Dec 28 01:43:26 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 4

This is with

FreeBSD lizzy.catnook.com 5.0-CURRENT FreeBSD 5.0-CURRENT #37: Wed Dec 25 13:08:26 PST 2002 [EMAIL PROTECTED]:/disk0/usr/obj/disk0/usr/src/sys/LIZZY  i386

Any pointers on how to gather more info to debug this problem?




Re: spec_getpages I/O read failure on md0

2002-12-28 Thread phk
In message [EMAIL PROTECTED], Jos Backus writes:

> Accompanied by
>
> Dec 28 01:42:12 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5f9310 vp 0xc41e8708
> Dec 28 01:42:12 lizzy kernel: size: 2048, resid: 2048, a_count: 2028, valid: 0x0
> Dec 28 01:42:12 lizzy kernel: nread: 0, reqpage: 0, pindex: 1, pcount: 1
> Dec 28 01:42:12 lizzy kernel: vm_fault: pager read error, pid 40673 (sh)
> Dec 28 01:43:26 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5fb158 vp 0xc41e8708
> Dec 28 01:43:26 lizzy kernel: size: 14848, resid: 14848, a_count: 14404, valid: 0x0
> Dec 28 01:43:26 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 4

22 is EINVAL, so the likely cause is a bogus offset: either unaligned or
out of range.  Unfortunately the above messages do not contain the
offset of the I/O operation.

I suggest you amend one or more of the relevant printfs to also print out
the offset at which the I/O operation was attempted.
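
To make the two suspected failure modes concrete, here is a minimal sketch of the kind of check that would return EINVAL. It is not the md driver's code; toy_check_request and its parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical request check: returns 0 if the request is acceptable,
 * EINVAL if it is unaligned or runs past the end of the device.
 */
int
toy_check_request(uint64_t offset, uint64_t length, uint64_t sectorsize,
    uint64_t mediasize)
{
	assert(sectorsize > 0);
	if (offset % sectorsize != 0 || length % sectorsize != 0)
		return (EINVAL);	/* unaligned */
	if (offset > mediasize || length > mediasize - offset)
		return (EINVAL);	/* out of range */
	return (0);
}
```

Both cases produce the same error number, which is why the offset and size have to be logged to tell them apart.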




Re: spec_getpages I/O read failure on md0

2002-12-28 Thread Jos Backus
On Sat, Dec 28, 2002 at 08:39:32PM +0100, [EMAIL PROTECTED] wrote:
> 22 is EINVAL, so the likely cause is a bogus offset: either unaligned or
> out of range.  Unfortunately the above messages do not contain the
> offset of the I/O operation.
>
> I suggest you amend one or more of the relevant printfs to also print out
> the offset at which the I/O operation was attempted.

You mean like this?

--- spec_vnops.c.orig	Sat Dec 28 11:46:07 2002
+++ spec_vnops.c	Sat Dec 28 14:46:46 2002
@@ -958,6 +958,9 @@
 		printf(
 	    "nread: %d, reqpage: %d, pindex: %lu, pcount: %d\n",
 		    nread, ap->a_reqpage, (u_long)m->pindex, pcount);
+		printf(
+	    "offset: %llu\n",
+		    offset);
 		/*
 		 * Free the buffer header back to the swap buffer pool.
 		 */




Re: spec_getpages I/O read failure on md0

2002-12-28 Thread phk
In message [EMAIL PROTECTED], Jos Backus writes:
> On Sat, Dec 28, 2002 at 08:39:32PM +0100, [EMAIL PROTECTED] wrote:
> > 22 is EINVAL, so the likely cause is a bogus offset: either unaligned or
> > out of range.  Unfortunately the above messages do not contain the
> > offset of the I/O operation.
> >
> > I suggest you amend one or more of the relevant printfs to also print out
> > the offset at which the I/O operation was attempted.
>
> You mean like this?

For instance, yes.




Re: spec_getpages I/O read failure on md0

2002-12-28 Thread Bruce Evans
On Sat, 28 Dec 2002, Jos Backus wrote:

> I am using the following fstab entry:
>
> /dev/md0 /tmp mfs rw,nosuid,nodev,-s=32m 0 0
>
> And three times now have I seen this during an installworld, both on a UP and
> an SMP system running a very recent -current:
>
> ===> usr.sbin/pcvt/ispcvt
> install -s -o root -g wheel -m 555   ispcvt /usr/sbin
> install -o root -g wheel -m 444 ispcvt.8.gz  /usr/share/man/man8
> ===> usr.sbin/pcvt/vgaio
> echo:Input/output error
> *** Error code 1
> ...
> Accompanied by
>
> Dec 28 01:42:12 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5f9310 vp 0xc41e8708

Known bugs in spec_getpages() (or its callers or infrastructure):
(1) It does not understand the disk driver's si_iosize_max, so it fails if
    sizes larger than this are requested.  Sizes larger than this can occur
    for at least calls from exec_map_first_page() for execve() in some cases,
    since VM_INITIAL_PAGEIN is constant (but machine-dependent) and may
    exceed si_iosize_max.  VM_INITIAL_PAGEIN is 16 pages, so it works for
    most devices on i386's but not for any device with a limit of 64K on
    alphas.  This used to cause exec failures for afd (zip) devices on
    i386's because the limit was 32K.
(2) See below.
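
The size arithmetic in (1) can be checked with a small userland sketch. It assumes 4096-byte pages on i386 and 8192-byte pages on alpha. The function names and the TOY_INITIAL_PAGEIN macro are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INITIAL_PAGEIN 16	/* pages, per the message above */

/* Bytes requested by an initial page-in of TOY_INITIAL_PAGEIN pages. */
size_t
initial_pagein_bytes(size_t page_size)
{
	assert(page_size > 0);
	return (TOY_INITIAL_PAGEIN * page_size);
}

/* True when a single request of that size stays within the device limit. */
bool
pagein_fits(size_t page_size, size_t iosize_max)
{
	return (initial_pagein_bytes(page_size) <= iosize_max);
}
```

Under those assumptions, pagein_fits(4096, 64 * 1024) holds, while pagein_fits(8192, 64 * 1024) and pagein_fits(4096, 32 * 1024) do not, matching the alpha and afd cases described above.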

> Dec 28 01:42:12 lizzy kernel: size: 2048, resid: 2048, a_count: 2028, valid: 0x0
> Dec 28 01:42:12 lizzy kernel: nread: 0, reqpage: 0, pindex: 1, pcount: 1
> Dec 28 01:42:12 lizzy kernel: vm_fault: pager read error, pid 40673 (sh)
> Dec 28 01:43:26 lizzy kernel: spec_getpages:(md0) I/O read failure: (error=22) bp 0xce5fb158 vp 0xc41e8708
> Dec 28 01:43:26 lizzy kernel: size: 14848, resid: 14848, a_count: 14404, valid: 0x0
> Dec 28 01:43:26 lizzy kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 4

This seems to be a different but related problem: spec_getpages() understands
the disk driver's si_bsize_phys but si_bsize_phys is apparently not
initialized correctly.  We start with a count of 14404 = 3 * 4096 + 2116
= 28 * 512 + 68.  This is not a multiple of any reasonable block size, so
it must be rounded up.  We round it up to a multiple of 512 (29 * 512 = 14848),
apparently because we use the default block size of DEV_BSIZE = 512.  The
md driver doesn't like this, apparently because you are using swap-backed
mode which gives a block size of PAGE_SIZE = 4096.
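
The rounding can be reproduced in a few lines of userland C. This is only an illustration of the arithmetic, not the code in spec_getpages(); round_up is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Round count up to the next multiple of bsize (any positive bsize). */
uint64_t
round_up(uint64_t count, uint64_t bsize)
{
	assert(bsize > 0);
	return ((count + bsize - 1) / bsize * bsize);
}
```

round_up(14404, 512) gives 14848, the size in the log, whereas round_up(14404, 4096) gives 16384, which is what a 4096-byte block size would call for.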

The md driver doesn't set any of the si_ size parameters so it has no chance
of getting this stuff right when the parameters are not the defaults.  The
defaults are set bogusly in too many places to DFLTPHYS for si_iosize_max
and to DEV_BSIZE for si_bsize_phys.

Errors caused by these bugs can be non-deterministic because the pages may
be loaded into memory by other means.  E.g., when exec off zip drives
was broken, exec would succeed after several attempts because each attempt
loaded another 8 i386 pages so that all pages were eventually in memory.

Bruce





Re: spec_getpages I/O read failure on md0

2002-12-28 Thread Jos Backus
On Sun, Dec 29, 2002 at 03:57:49PM +1100, Bruce Evans wrote:
> Errors caused by these bugs can be non-deterministic because the pages may
> be loaded into memory by other means.  E.g., when exec off zip drives
> was broken, exec would succeed after several attempts because each attempt
> loaded another 8 i386 pages so that all pages were eventually in memory.

Fwiw, that jibes with what I am seeing. Since sticking in the extra printf,
I have so far not been able to reproduce the problem.
