I've finally pinned down the cause of this bug and created a kernel-
patch that will be posted to the kernel mailing list later.
I'll attach the patch file to this bug report in a separate comment.
The bug is caused because no bounds checking is done in the partition-
table checking routines that look for PC-style partitions (linux, bsd,
solaris, msdos, extended, etc.)
In fs/partitions/msdos.c, in function msdos_partition(), the raw
partition table sector values are read and block_device's created based
on them, but no bounds (sanity) checking is done to ensure the sector
numbers are within the limits of the disk.
As a result, the subsequent attempts by various parts of the kernel to
scan for file-systems cause the repeated errors shown in this bug
report.
This will only occur if the partition table contains sector numbers that
are invalid, or is part of a logical RAID stripe such as that managed by
the dmraid module.
Because the block_devices underlying the RAID device are still exposed
to the kernel, and because they contain 'valid' partition table
structures, they get scanned twice during boot, and again when the
Display/Window manager loads.
Each period of errors usually lasts about 10 seconds-per-disk, and is repeated
3 times during start-up. With a mirror+stripe RAID 1+0 array this is doubled.
The banging of the disk heads against the end of the disk sounds scary and
does, on occasion, *cause physical problems* that result in the disks failing
to initialise until they have been left switched off a while.
My patch creates a new function in fs/partitions/msdos.c
"check_sane_values()" and calls that function from within
msdos_partition() *before* it enters the partition-table build loop.
If insane values are found the function prints a detailed report of each
error to the kernel log (printk() to dmesg) and returns an error which
results in the partition table scan being aborted and msdos_partition()
reporting "unable to read partition table".
The new code in msdos_partition() is:
if (check_sane_values(p, bdev) == -1) {
put_dev_sector(sect); /* release to cache */
return -1; /* report invalid partition table */
}
And the new function check_sane_values() is:
/*
* Check that *all* sector offsets are valid before actually building the
partition structure.
*
* This prevents physical damage to disks and boot-time problems caused by an
apparently valid
* partition table causing attempts to read sectors beyond the end of the
physical disk.
*
* This is especially important where this is the first physical disk in a
striped RAID array
* and the partition table contains sector offsets into the larger logical disk
(beyond the end
* of this physical disk).
*
* The RAID module will correctly manage the disks.
*
* The function is re-entrant so it can call itself to check extended
partitions.
*
* @param p partition table
* @param bdev block device
* @returns -1 if insane values found; 0 otherwise
* @copy Copyright 31 January 2007
* @author TJ <[EMAIL PROTECTED]>
*/
int check_sane_values(struct partition *p, struct block_device *bdev) {
unsigned char *data;
struct partition *ext;
Sector sect;
int slot;
int insane;
int sector_size = bdev_hardsect_size(bdev) / 512;
int ret = 0; /* default is to report ok */
/* don't return early; allow all partition entries to be checked */
for (slot = 1 ; slot <= 4 ; slot++, p++) {
insane = 0; /* track sanity within each table entry */
if (START_SECT(p) > bdev->bd_disk->capacity-1) { /* invalid -
beyond end of disk */
insane |= 1; /* bit-0 flags insane start */
}
if (START_SECT(p)+NR_SECTS(p)-1 > bdev->bd_disk->capacity-1) {
/* invalid - beyond end of disk */
insane |= 2; /* bit-1 flags insane end */
}
if (!insane && is_extended_partition(p)) { /* check the
extended partition */
data = read_dev_sector(bdev, START_SECT(p)*sector_size,
§); /* fetch sector from cache */
if (data) {
if (msdos_magic_present(data + 510)) { /* check
for signature */
ext = (struct partition *) (data +
0x1be);
ret = check_sane_values(ext, bdev); /*
recursive call */
if (ret == -1) /* insanity found */
insane |= 4; /* bit-2 flags
insane extended partition contents */
}
put_dev_sector(sect); /* release sector to
cache */
}
else ret = -1; /* failed to read sector from cache */
}
if (insane) { /* insanity found; report it */
ret = -1; /* error code */
printk("\n"); /* start error report on a fresh line */
if (insane | 1)
printk(" partition %d: start (sector %d) beyond
end of disk (sector %d)\n", slot, START_SECT(p), (unsigned int)
bdev->bd_disk->capacity-1);
if (insane | 2)
printk(" partition %d: end (sector %d) beyond
end of disk (sector %d)\n", slot, START_SECT(p)+NR_SECTS(p)-1, (unsigned int)
bdev->bd_disk->capacity-1);
if (insane | 4)
printk(" partition %d: insane extended
contents\n", slot);
}
}
return ret;
}
** Also affects: linux (upstream)
Importance: Undecided
Status: Unconfirmed
--
Disk Read Errors during boot-time caused by probe of invalid partitions
https://launchpad.net/bugs/77734
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs