Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Mon, 2017-03-20 at 04:53 +, Chris Murphy wrote:
> On Tue, Mar 14, 2017, 7:41 AM James B. Byrne wrote:
>
> > On Fri, March 10, 2017 11:57, m.r...@5-cent.us wrote:
> > >
> > > Looks like only one sector's bad. Running badblocks should,
>
> You'll need to search the smartmontools site for their doc on bad sectors.
> There's a how to, to find what file is affected by the bad sector so you
> can replace it. That's the only way to fix the problem.
>
> This gets tricky going through LVM.

After booting from USB into single-user mode and dd'ing all readable blocks (in multiple passes, as I then had to use "skip=" to start with the next good blocks), I ran the manufacturer's diag/repair software and had good results. YMMV

> Chris Murphy

HTH,
Bill

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
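The multi-pass copy described above could look roughly like the sketch below. The thread does not show the actual dd command, so the function name resume_copy, its argument order, and the choice of conv=notrunc are all assumptions made for illustration. The idea is only that skip= moves the read position past the unreadable blocks while seek= keeps the output aligned with the source.

```shell
# Resume a block-by-block copy at block $3, writing to the same offset in the
# destination so the image stays aligned with the source.
#   $1 = source (a disk device in the scenario above)
#   $2 = destination image file
#   $3 = first block to copy in this pass
#   $4 = block size in bytes (optional, defaults to 512)
# resume_copy is a hypothetical helper name, not an existing tool.
resume_copy() {
    dd if="$1" of="$2" bs="${4:-512}" skip="$3" seek="$3" conv=notrunc
}
```

A later pass would then be something like `resume_copy /dev/sdX disk.img N`, where /dev/sdX and disk.img are placeholder names and N is the first block after the one that failed to read. The skipped blocks are simply left unwritten in the image.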
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Tue, Mar 14, 2017, 7:41 AM James B. Byrne wrote:

> On Fri, March 10, 2017 11:57, m.r...@5-cent.us wrote:
> >
> > Looks like only one sector's bad. Running badblocks should,
> > I think, mark that sector as bad, so the system doesn't try
> > to read or write there. I've got a user whose workstation has
> > had a bad sector running for over a year. However, if it
> > becomes two, or four, or 64 sectors, it's replacement
> > time, asap.
> >
>
> Bear with me on this. The last time I did anything like this I ended
> up having to boot into recovery mode from an install cd and do this by
> hand. This is not an option in the present circumstance as the unit
> is a headless server in a remote location.
>
> If I do this:
>
> echo '-c' > /fsckoptions
> touch /forcefsck
> shutdown -r now
>
> Will this repair the bad block and bring the system back up? If not
> then what other options should I use?
>
> The bad block is located in an LV assigned to a libvirt pool
> associated with a single vm. Can this be checked and corrected
> without having to deal with the base system? If so then how?

You'll need to search the smartmontools site for their doc on bad sectors. There's a how to, to find what file is affected by the bad sector so you can replace it. That's the only way to fix the problem.

This gets tricky going through LVM.

Chris Murphy
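The arithmetic behind that how-to can be sketched as below. The first function uses the formula from the smartmontools bad-block how-to, b = int((L - S) * 512 / B). The second is an assumption about how the LVM layer is usually unwound and is not taken from this thread. Both function names are hypothetical, and 512-byte sectors are assumed.

```shell
# Filesystem block holding a bad sector on a plain partition.
# Formula from the smartmontools bad-block how-to: b = int((L - S) * 512 / B)
#   $1 = L, LBA of the bad sector (from the kernel log)
#   $2 = S, first sector of the partition (fdisk -lu)
#   $3 = B, filesystem block size in bytes (tune2fs -l)
fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# With LVM in between, first find which physical extent the sector falls in.
#   $1 = L and $2 = S as above
#   $3 = offset of the first extent inside the PV, in sectors (pe_start)
#   $4 = extent size in sectors
# Prints "<extent index> <sector offset inside that extent>".
pv_extent() {
    off=$(( $1 - $2 - $3 ))
    echo "$(( off / $4 )) $(( off % $4 ))"
}
```

The values of S, pe_start and the extent size have to come from the host itself (for example fdisk -lu, pvs -o +pe_start and pvdisplay); none of them appear in this thread. Once the physical extent is known, lvdisplay --maps shows which logical volume, and which logical extent of it, that physical extent belongs to.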
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Fri, March 10, 2017 11:57, m.r...@5-cent.us wrote:
>
> Looks like only one sector's bad. Running badblocks should,
> I think, mark that sector as bad, so the system doesn't try
> to read or write there. I've got a user whose workstation has
> had a bad sector running for over a year. However, if it
> becomes two, or four, or 64 sectors, it's replacement
> time, asap.
>

Bear with me on this. The last time I did anything like this I ended up having to boot into recovery mode from an install cd and do this by hand. This is not an option in the present circumstance as the unit is a headless server in a remote location.

If I do this:

echo '-c' > /fsckoptions
touch /forcefsck
shutdown -r now

Will this repair the bad block and bring the system back up? If not then what other options should I use?

The bad block is located in an LV assigned to a libvirt pool associated with a single vm. Can this be checked and corrected without having to deal with the base system? If so then how?

Regards,

--
***          e-Mail is NOT a SECURE channel          ***
Do NOT transmit sensitive data via e-Mail
Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
Talk about missing the email I wanted to reply to. Disregard...

>> On Mar 10, 2017, at 9:28 AM, Valeri Galtsev wrote:
>>>
>>> On Fri, March 10, 2017 9:52 am, Warren Young wrote:
>>>> On Mar 10, 2017, at 6:32 AM, James B. Byrne wrote:
>>>>>
>>>>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>>>>
>>>>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>>>>
>>>>> If not fsck then what?
>>>>
>>>> badblocks(8).
>>>
>>> And I definitely will unmount relevant filesystem(s) before using
>>> badblocks…
>>
>> You don’t necessarily have to. The default mode of badblocks is a
>> non-invasive read-only test which is safe to run on a mounted filesystem.
>>
>> That said, a read-only badblocks pass can give a false “no errors” report
>> in cases where a non-destructive read-then-write pass (-n) will show errors.
>>
>> Alternatively, a read-only pass may show an error that a read-then-write
>> pass will silently bury by forcing the drive to relocate the bad sector.
>>
>> In extreme cases, you could potentially fix a problem with a
>> read-random-random-write pass (-n -t random -t random) because that will
>> statistically flip all the bits at least twice, which may rub the drive’s
>> nose in a bad sector, forcing a reallocation where a normal
>> read-then-write pass (-n alone) may not.
>>
>> Hard drives are weird. It is only through the grace of ECC and such that
>> they approximate deterministic behavior as well as they do.
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
I get up around 0630, u can come anytime after that. I want to hit the range that morning but if I KNEW when you are arriving, I could plan around that...

> On Mar 10, 2017, at 9:28 AM, Valeri Galtsev wrote:
>>
>> On Fri, March 10, 2017 9:52 am, Warren Young wrote:
>>> On Mar 10, 2017, at 6:32 AM, James B. Byrne wrote:
>>>>
>>>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>>>
>>>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>>>
>>>> If not fsck then what?
>>>
>>> badblocks(8).
>>
>> And I definitely will unmount relevant filesystem(s) before using
>> badblocks…
>
> You don’t necessarily have to. The default mode of badblocks is a
> non-invasive read-only test which is safe to run on a mounted filesystem.
>
> That said, a read-only badblocks pass can give a false “no errors” report
> in cases where a non-destructive read-then-write pass (-n) will show errors.
>
> Alternatively, a read-only pass may show an error that a read-then-write
> pass will silently bury by forcing the drive to relocate the bad sector.
>
> In extreme cases, you could potentially fix a problem with a
> read-random-random-write pass (-n -t random -t random) because that will
> statistically flip all the bits at least twice, which may rub the drive’s
> nose in a bad sector, forcing a reallocation where a normal
> read-then-write pass (-n alone) may not.
>
> Hard drives are weird. It is only through the grace of ECC and such that
> they approximate deterministic behavior as well as they do.
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Mar 10, 2017, at 9:28 AM, Valeri Galtsev wrote:
>
> On Fri, March 10, 2017 9:52 am, Warren Young wrote:
>> On Mar 10, 2017, at 6:32 AM, James B. Byrne wrote:
>>>
>>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>>
>>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>>
>>> If not fsck then what?
>>
>> badblocks(8).
>
> And I definitely will unmount relevant filesystem(s) before using
> badblocks…

You don’t necessarily have to. The default mode of badblocks is a non-invasive read-only test which is safe to run on a mounted filesystem.

That said, a read-only badblocks pass can give a false “no errors” report in cases where a non-destructive read-then-write pass (-n) will show errors.

Alternatively, a read-only pass may show an error that a read-then-write pass will silently bury by forcing the drive to relocate the bad sector.

In extreme cases, you could potentially fix a problem with a read-random-random-write pass (-n -t random -t random) because that will statistically flip all the bits at least twice, which may rub the drive’s nose in a bad sector, forcing a reallocation where a normal read-then-write pass (-n alone) may not.

Hard drives are weird. It is only through the grace of ECC and such that they approximate deterministic behavior as well as they do.
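The three passes described above map onto badblocks command lines roughly as follows. This is a sketch that only prints the commands rather than running them. The helper name badblocks_cmd and the pass labels are made up for illustration, and -s (progress) and -v (verbose) are optional additions that the message does not mention. Per badblocks(8), the -n passes are normally refused on a device that is mounted, so only the read-only pass applies to a mounted filesystem.

```shell
# Print (not run) the badblocks command line for each pass described above.
#   $1 = pass: ro | rw | rw-random   (labels are hypothetical)
#   $2 = block device
badblocks_cmd() {
    case "$1" in
        ro)        echo "badblocks -sv $2" ;;                          # default read-only test
        rw)        echo "badblocks -sv -n $2" ;;                       # non-destructive read-then-write
        rw-random) echo "badblocks -sv -n -t random -t random $2" ;;   # two random-pattern passes
        *)         echo "usage: badblocks_cmd ro|rw|rw-random DEVICE" >&2; return 1 ;;
    esac
}
```

For example, `badblocks_cmd rw-random /dev/sdX` prints the command line for the read-random-random-write pass, with /dev/sdX standing in for the real device.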
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Fri, March 10, 2017 9:52 am, Warren Young wrote:
> On Mar 10, 2017, at 6:32 AM, James B. Byrne wrote:
>>
>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>
>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>
>> If not fsck then what?
>
> badblocks(8).

And I definitely will unmount relevant filesystem(s) before using badblocks...

Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
James B. Byrne wrote:
>
> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>> On Thu, 9 Mar 2017, James B. Byrne wrote:
>>
>>> This indicated that a bad sector on the underlying disk system might
>>> be the source of the problem. The guests were all shutdown, a
>>> /forcefsck file was created on the host system, and the host system
>>> remotely restarted.
>>
>> fsck's not good at finding disk errors, it finds filesystem errors.
>
> If not fsck then what?
>

fsck run with -c, which forces badblocks to run. Or you can run that directly.

>>
>> If it was a real disk issue, you'd expect matching errors in the host
>> logs.
>
> Yes, there are:
>
> Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
> Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
> Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

Looks like only one sector's bad. Running badblocks should, I think, mark that sector as bad, so the system doesn't try to read or write there. I've got a user whose workstation has had a bad sector running for over a year. However, if it becomes two, or four, or 64 sectors, it's replacement time, asap.

mark
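The "only one sector" reading can be checked by pulling the sector numbers out of the end_request lines and counting the distinct values. A minimal sketch, assuming the log format quoted above; the function name bad_sectors and the choice to read the log on stdin are illustrative, not part of the thread.

```shell
# Print each distinct failing sector for one device, followed by how many
# times it was reported. Reads syslog text on stdin.
#   $1 = device name as the kernel prints it (e.g. sda)
# bad_sectors is a hypothetical helper name, not an existing tool.
bad_sectors() {
    dev="$1"
    sed -n "s/.*end_request: I\/O error, dev ${dev}, sector \([0-9][0-9]*\).*/\1/p" |
        sort -n | uniq -c | awk '{ print $2, $1 }'
}
```

On the host this would be run as `bad_sectors sda < /var/log/messages`; for the three host lines quoted above it prints a single line, `1236929063 3`. The guest reports sectors on vda, which are offsets within its virtual disk and so are not directly comparable with the host's sda sector numbers.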
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Mar 10, 2017, at 6:32 AM, James B. Byrne wrote:
>
> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>
>> fsck's not good at finding disk errors, it finds filesystem errors.
>
> If not fsck then what?

badblocks(8).
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Thu, March 9, 2017 09:46, John Hodrien wrote:
> On Thu, 9 Mar 2017, James B. Byrne wrote:
>
>> This indicated that a bad sector on the underlying disk system might
>> be the source of the problem. The guests were all shutdown, a
>> /forcefsck file was created on the host system, and the host system
>> remotely restarted.
>
> fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

>
> If it was a real disk issue, you'd expect matching errors in the host
> logs.

Yes, there are:

Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

I am running an extended SMART test on the drive at the moment. I suspect that the drive is probably at its EOL for practical purposes. So likely we will be looking at an equipment upgrade given the age of the rest of the equipment. In the meantime what steps, if any, should I take to remediate this problem?

>
>> /var/log/messages:Mar 9 08:34:48 vhost03 kernel: EXT4-fs (dm-6):
>> warning: maximal mount count reached, running e2fsck is recommended
>
> Unmount it and run fsck on it, and that message would go away. But
> I'd not worry about that one.
>
> jh
>

--
***          e-Mail is NOT a SECURE channel          ***
Do NOT transmit sensitive data via e-Mail
Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3
Re: [CentOS] CentOS-6.8 fsck report Maximal Count
On Thu, 9 Mar 2017, James B. Byrne wrote:

> This indicated that a bad sector on the underlying disk system might
> be the source of the problem. The guests were all shutdown, a
> /forcefsck file was created on the host system, and the host system
> remotely restarted.

fsck's not good at finding disk errors, it finds filesystem errors.

If it was a real disk issue, you'd expect matching errors in the host logs. Did you?

> /var/log/messages:Mar 9 08:34:48 vhost03 kernel: EXT4-fs (dm-6):
> warning: maximal mount count reached, running e2fsck is recommended

Unmount it and run fsck on it, and that message would go away. But I'd not worry about that one.

jh
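Whether a filesystem has actually hit its maximal mount count can be read from the superblock with tune2fs -l. The sketch below only parses that output; the function name mount_count_status and the "due"/"ok" labels are invented for illustration. It assumes the "Mount count:" and "Maximum mount count:" fields that tune2fs -l prints for ext2/3/4.

```shell
# Read `tune2fs -l DEVICE` output on stdin and report whether the filesystem
# has reached its maximal mount count.
# Prints "due" or "ok"; a maximum of -1 or 0 means the check is disabled.
# mount_count_status is a hypothetical helper name.
mount_count_status() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 }
        $1 == "Maximum mount count" { max = $2 }
        END {
            if (max + 0 > 0 && count + 0 >= max + 0) print "due"
            else print "ok"
        }'
}
```

On the host this would be `tune2fs -l /dev/vg_vhost03/lv_centos_repos | mount_count_status`. If it prints "due", the advice above amounts to unmounting /var/data/centos, running e2fsck -f on that logical volume, and mounting it again; a completed check resets the counter.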
[CentOS] CentOS-6.8 fsck report Maximal Count
We have a remote warm standby system running CentOS-6.8 as a KVM system with multiple guests. One of the guests began reporting an error when running aide.

Caught SIGBUS/SEGV while mmapping. File was truncated while aide was running?
Caught SIGBUS/SEGV. Exiting

The /var/log/messages file contained this:

Mar 9 09:14:13 inet12 kernel: end_request: I/O error, dev vda, sector 14539264
Mar 9 09:14:31 inet12 kernel: end_request: I/O error, dev vda, sector 14539296
Mar 9 09:14:48 inet12 kernel: end_request: I/O error, dev vda, sector 14539296

df
Filesystem                    1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_inet02-lv_root   7932336 2262672   5260064  31% /
tmpfs                            961044       0    961044   0% /dev/shm
/dev/vda1                        487652  139473    322579  31% /boot
. . .

This indicated that a bad sector on the underlying disk system might be the source of the problem. The guests were all shutdown, a /forcefsck file was created on the host system, and the host system remotely restarted.

However, this action did not remove the error. The host system log files had this to say about fsck:

/var/log/messages:Mar 9 08:34:48 vhost03 kernel: EXT4-fs (dm-6): warning: maximal mount count reached, running e2fsck is recommended

in /dev I see this:

brw-rw----. 1 root disk 253, 6 Mar 9 08:34 dm-6

But, this device has nothing whatsoever to do with the kvm guests:

ll /dev/vg_vhost03/ | grep dm-6
lrwxrwxrwx. 1 root root 7 Mar 9 08:34 lv_centos_repos -> ../dm-6

Rather this is an lv devoted to holding CentOS ISOs:

/dev/mapper/vg_vhost03-lv_centos_repos 101016992 77160124 18718848 81% /var/data/centos

So, my questions are:

1. How do I fix the problem with the guest system that Aide is stumbling over?

2. How do I get the fsck issue with dm-6 resolved?

--
***          e-Mail is NOT a SECURE channel          ***
Do NOT transmit sensitive data via e-Mail
Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3