Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-20 Thread Bill Maltby (C4B)
On Mon, 2017-03-20 at 04:53 +, Chris Murphy wrote:
> On Tue, Mar 14, 2017, 7:41 AM James B. Byrne  wrote:
> 
> > On Fri, March 10, 2017 11:57, m.r...@5-cent.us wrote:
> >
> > >
> > > Looks like only one sector's bad. Running badblocks should,
> > > 

> You'll need to search the smartmontools site for their doc on bad sectors.
> There's a how-to for finding which file is affected by the bad sector, so you
> can replace it. That's the only way to fix the problem.
> 
> This gets tricky going through LVM.

After booting from USB into single-user mode, I dd'ed all the readable
blocks in multiple passes, because I had to use "skip=" to restart at the
next good blocks. Then I ran the manufacturer's diag/repair software and
had good results.
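A sketch of one such resume pass, assuming 512-byte sectors. The helper name `dd_resume_cmd` and the image path are hypothetical, and the helper only prints the `dd` command so it can be checked before running it from the rescue environment. Using the same value for `skip=` (input) and `seek=` (output) keeps each block at its original offset in the image, and `conv=notrunc` keeps what earlier passes already wrote.

```shell
# Hypothetical helper: print (not run) the dd command for one rescue pass
# that resumes at sector $3, assuming 512-byte sectors.
#   $1  failing source disk      $2  image file on the USB stick
#   skip=$3        skip that many 512-byte blocks of the source
#   seek=$3        write at the same offset in the image
#   conv=noerror   keep going after read errors
#   conv=sync      pad unreadable input blocks with zeros
#   conv=notrunc   leave what earlier passes wrote in the image intact
dd_resume_cmd() {
    echo "dd if=$1 of=$2 bs=512 skip=$3 seek=$3 conv=noerror,sync,notrunc"
}
```

For example, `dd_resume_cmd /dev/sda /mnt/usb/sda.img 1236929064` prints a pass that starts one sector past the failing sector 1236929063 seen in the host logs in this thread. With `conv=noerror,sync`, GNU dd also continues past read errors and pads the unreadable blocks, which can cut down the number of restarts.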

YMMV

> 
> 
> Chris Murphy
> 

HTH,
Bill

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-19 Thread Chris Murphy
On Tue, Mar 14, 2017, 7:41 AM James B. Byrne  wrote:

> On Fri, March 10, 2017 11:57, m.r...@5-cent.us wrote:
>
> >
> > Looks like only one sector's bad. Running badblocks should,
> > I think, mark that sector as bad, so the system doesn't try
> > to read or write there. I've got a user whose workstation has
> > had a bad sector running for over a year. However, if it
> > becomes two, or four, or 64 sectors, it's replacement
> > time, asap.
> > 
>
>
> Bear with me on this.  The last time I did anything like this I ended
> up having to boot into recovery mode from an install CD and do this by
> hand.  This is not an option in the present circumstance as the unit
> is a headless server in a remote location.
>
> If I do this:
>
> echo '-c' > /fsckoptions
> touch /forcefsck
> shutdown -r now
>
> Will this repair the bad block and bring the system back up? If not
> then what other options should I use?
>
> The bad block is located in an LV assigned to a libvirt pool
> associated with a single VM.  Can this be checked and corrected
> without having to deal with the base system? If so then how?



You'll need to search the smartmontools site for their doc on bad sectors.
There's a how-to for finding which file is affected by the bad sector, so you
can replace it. That's the only way to fix the problem.
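The calculation that how-to walks through, as I understand it, for an ext2/3/4 filesystem sitting directly on a partition is: block = (LBA - partition start sector) * 512 / filesystem block size. A minimal sketch, assuming 512-byte logical sectors and no LVM; the function name is hypothetical.

```shell
# Hypothetical helper: convert an absolute LBA from the kernel log into an
# ext2/3/4 block number, for a filesystem directly on a partition (no LVM).
#   $1  LBA of the bad sector, in 512-byte sectors
#   $2  first sector of the partition (for example from "fdisk -lu")
#   $3  filesystem block size in bytes ("Block size:" in "tune2fs -l")
# Shell arithmetic truncates, which picks the block containing the sector.
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}
```

With a hypothetical partition start of 2048 and 4096-byte blocks, `lba_to_fs_block 1236929063 2048 4096` prints 154615876. That number can then go to `debugfs -R "icheck 154615876" /dev/sdXN` to find the inode, and `debugfs -R "ncheck INODE" /dev/sdXN` turns the inode into a path (the device name is a placeholder; debugfs opens the filesystem read-only unless given `-w`).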

This gets tricky going through LVM.
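Through LVM, the disk LBA first has to be shifted into the LV. A minimal sketch for the simplest case only, assuming a linear LV whose segment starts at logical extent 0 and contains the bad sector; the function name is hypothetical, and the inputs have to be read off the host (for example with `fdisk -lu`, `pvs -o +pe_start --units s`, and `lvdisplay --maps`).

```shell
# Hypothetical helper: translate an absolute disk LBA into a sector offset
# inside a linear LV. Assumes one segment starting at logical extent 0;
# it does not check that the LBA is before the end of that segment.
#   $1  LBA from the kernel log (512-byte sectors)
#   $2  first sector of the PV's partition
#   $3  pe_start, where the PV's data area begins, in sectors
#   $4  first physical extent (PE) of the LV's segment
#   $5  extent size in sectors (8192 for the default 4 MiB extents)
lba_to_lv_sector() {
    off=$(( $1 - $2 - $3 - $4 * $5 ))
    if [ "$off" -lt 0 ]; then
        echo "LBA $1 lies before this segment" >&2
        return 1
    fi
    echo "$off"
}
```

With hypothetical values (partition at 2048, pe_start 2048, segment starting at PE 0, 4 MiB extents), `lba_to_lv_sector 1236929063 2048 2048 0 8192` prints 1236924967. If that LV holds a raw (not qcow2) guest disk, the result would be a sector on the guest's vda, which then has to go through the guest's own partition table and LVM the same way; that nesting is what makes this tricky.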


Chris Murphy


Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-14 Thread James B. Byrne
On Fri, March 10, 2017 11:57, m.r...@5-cent.us wrote:

>
> Looks like only one sector's bad. Running badblocks should,
> I think, mark that sector as bad, so the system doesn't try
> to read or write there. I've got a user whose workstation has
> had a bad sector running for over a year. However, if it
> becomes two, or four, or 64 sectors, it's replacement
> time, asap.
> 


Bear with me on this.  The last time I did anything like this I ended
up having to boot into recovery mode from an install CD and do this by
hand.  This is not an option in the present circumstance as the unit
is a headless server in a remote location.

If I do this:

echo '-c' > /fsckoptions
touch /forcefsck
shutdown -r now

Will this repair the bad block and bring the system back up? If not
then what other options should I use?

The bad block is located in an LV assigned to a libvirt pool
associated with a single VM.  Can this be checked and corrected
without having to deal with the base system? If so then how?
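For reference, a simplified sketch of how CentOS 6's /etc/rc.d/rc.sysinit uses those two files, as I understand it: the contents of /fsckoptions become extra options for the boot-time `fsck -A`, and /forcefsck adds `-f`. This is not the real script (which also handles fastboot, progress output and more); the function name and the root-directory argument are hypothetical, added so it can be tried against a scratch directory.

```shell
# Simplified, hypothetical re-creation of how rc.sysinit builds the options
# for the boot-time "fsck -A" run. Check /etc/rc.d/rc.sysinit on the host
# for the real logic.
#   $1  directory that stands in for / (defaults to /)
boot_fsck_options() {
    root=${1:-/}
    opts=""
    if [ -f "$root/fsckoptions" ]; then
        opts=$(cat "$root/fsckoptions")
    fi
    if [ -f "$root/forcefsck" ]; then
        opts="-f $opts"
    fi
    printf '%s\n' "$opts"
}
```

With both files in place this yields `-f -c`, so each ext filesystem that `fsck -A` checks gets a forced e2fsck with a read-only badblocks scan. `fsck -A` only walks the host's /etc/fstab, though, so assuming the guest's LV is used as a raw VM disk with no fstab entry on the host, that LV would not be scanned this way.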

Regards,


-- 
***  e-Mail is NOT a SECURE channel  ***
Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread Jay Hart
Talk about missing the email I wanted to reply to. Disregard...




Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread Jay Hart
I get up around 0630, u can come anytime after that. I want to hit the range
that morning but if I KNEW when you are arriving, I could plan around that...



Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread Warren Young
On Mar 10, 2017, at 9:28 AM, Valeri Galtsev  wrote:
> 
> 
> On Fri, March 10, 2017 9:52 am, Warren Young wrote:
>> On Mar 10, 2017, at 6:32 AM, James B. Byrne  wrote:
>>> 
>>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>>
>>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>> 
>>> If not fsck then what?
>> 
>> badblocks(8).
> 
> And I definitely will unmount relevant filesystem(s) before using
> badblocks…

You don’t necessarily have to.  The default mode of badblocks is a non-invasive 
read-only test which is safe to run on a mounted filesystem.

That said, a read-only badblocks pass can give a false “no errors” report in 
cases where a non-destructive read-then-write pass (-n) will show errors.

Alternatively, a read-only pass may show an error that a read-then-write pass 
will silently bury by forcing the drive to relocate the bad sector.

In extreme cases, you could potentially fix a problem with a 
read-random-random-write pass (-n -t random -t random) because that will 
statistically flip all the bits at least twice, which may rub the drive’s nose 
in a bad sector, forcing a reallocation where a normal read-then-write pass (-n 
alone) may not.

Hard drives are weird.  It is only through the grace of ECC and such that they 
approximate deterministic behavior as well as they do.
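The three kinds of pass described above, written out as commands. A sketch only: the helper name and the device are hypothetical, `-s` shows progress and `-v` is verbose, and whether your badblocks accepts two `-t` patterns together with `-n` is worth confirming in badblocks(8) before relying on the last form.

```shell
# Hypothetical helper: print the badblocks command for each kind of pass.
#   readonly        default read-only test (safe on a mounted fs, per Warren)
#   nondestructive  -n read-then-write; unmount the filesystem first
#   random          -n with two random patterns, as suggested above
badblocks_cmd() {
    mode=$1 dev=$2
    case $mode in
        readonly)       echo "badblocks -sv $dev" ;;
        nondestructive) echo "badblocks -nsv $dev" ;;
        random)         echo "badblocks -nsv -t random -t random $dev" ;;
        *)              echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `badblocks_cmd nondestructive /dev/sdX` prints `badblocks -nsv /dev/sdX`. Valeri's caution applies to the `-n` forms: they write to every block (restoring the original data afterwards), so the filesystem on that device should be unmounted first.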


Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread Valeri Galtsev

On Fri, March 10, 2017 9:52 am, Warren Young wrote:
> On Mar 10, 2017, at 6:32 AM, James B. Byrne  wrote:
>>
>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>
>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>
>> If not fsck then what?
>
> badblocks(8).

And I definitely will unmount relevant filesystem(s) before using
badblocks...




Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread m . roth
James B. Byrne wrote:
>
> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>> On Thu, 9 Mar 2017, James B. Byrne wrote:
>>
>>> This indicated that a bad sector on the underlying disk system might
>> be the source of the problem.  The guests were all shut down, a
>>> /forcefsck file was created on the host system, and the host system
>>> remotely restarted.
>>
>> fsck's not good at finding disk errors, it finds filesystem errors.
>
> If not fsck then what?
>
fsck run with -c, which forces badblocks to run. Or you can run badblocks
directly.
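Written out, assuming an ext2/3/4 filesystem that is unmounted (or checked from rescue media); the helper name and device are hypothetical. Per e2fsck(8), one `-c` runs a read-only badblocks scan and adds any bad blocks found to the bad block inode, and giving it twice switches to a non-destructive read-write scan.

```shell
# Hypothetical helper: print the e2fsck command that runs badblocks for you.
#   -f   force a check even if the filesystem looks clean
#   -c   read-only badblocks scan; bad blocks go into the bad block inode
#   -cc  same, but with a non-destructive read-write scan
# The filesystem must be unmounted (or checked from rescue media).
e2fsck_badblocks_cmd() {
    dev=$1 passes=${2:-1}
    if [ "$passes" -ge 2 ]; then
        echo "e2fsck -f -cc $dev"
    else
        echo "e2fsck -f -c $dev"
    fi
}
```

For example, `e2fsck_badblocks_cmd /dev/sdX1 2` prints `e2fsck -f -cc /dev/sdX1`.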
>>
>> If it was a real disk issue, you'd expect matching errors in the host
>> logs.
>
> Yes, there are:
>
> Mar  9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
> Mar  9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
> Mar  9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

Looks like only one sector's bad. Running badblocks should, I think, mark
that sector as bad, so the system doesn't try to read or write there. I've
got a user whose workstation has had a bad sector running for over a year.
However, if it becomes two, or four, or 64 sectors, it's replacement time,
asap.
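A quick way to check "only one sector" against the logs, assuming the CentOS 6 `end_request: I/O error, dev DEV, sector N` message format with the sector number as the last field; the function name is hypothetical.

```shell
# Hypothetical helper: list the distinct sectors the kernel reported as
# failing for one device, reading syslog lines on stdin. Assumes the
# "end_request: I/O error, dev DEV, sector N" format, sector last.
bad_sectors() {
    awk -v dev="$1" 'index($0, "end_request: I/O error, dev " dev ", sector ") { print $NF }' |
        sort -un
}
```

`bad_sectors sda < /var/log/messages` prints one line per distinct failing sector. For the three host lines quoted above that is just 1236929063, while the guest's vda lines in the original post show two distinct sectors, 14539264 and 14539296.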

mark



Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread Warren Young
On Mar 10, 2017, at 6:32 AM, James B. Byrne  wrote:
> 
> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>> 
>> fsck's not good at finding disk errors, it finds filesystem errors.
> 
> If not fsck then what?

badblocks(8).



Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-10 Thread James B. Byrne

On Thu, March 9, 2017 09:46, John Hodrien wrote:
> On Thu, 9 Mar 2017, James B. Byrne wrote:
>
>> This indicated that a bad sector on the underlying disk system might
>> be the source of the problem.  The guests were all shut down, a
>> /forcefsck file was created on the host system, and the host system
>> remotely restarted.
>
> fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

>
> If it was a real disk issue, you'd expect matching errors in the host
> logs.


Yes, there are:

Mar  9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar  9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar  9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

I am running an extended SMART test on the drive at the moment. I
suspect that the drive is probably at its EOL for practical purposes. 
So likely we will be looking at an equipment upgrade given the age of
the rest of the equipment.
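While the extended test runs (`smartctl -t long /dev/sda`, with results later in `smartctl -l selftest /dev/sda`), the attribute table shows whether the number of remapped or pending sectors is growing. A sketch that filters `smartctl -A` output, assuming the usual ATA attribute layout with the name in field 2 and the raw value starting in field 10; attribute names vary between vendors, SAS drives report differently, and the function name is hypothetical.

```shell
# Hypothetical helper: pull the raw values of the attributes that track
# remapped and pending sectors out of "smartctl -A" output on stdin.
# Assumes the usual ATA attribute table: name in field 2, raw value
# starting in field 10.
sector_health() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

`smartctl -A /dev/sda | sector_health` prints each of those counters with its raw value; a value that keeps rising between checks is the kind of growth that points towards replacing the drive.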

In the meantime what steps, if any, should I take to remediate this
problem?

>
>> /var/log/messages:Mar  9 08:34:48 vhost03 kernel: EXT4-fs (dm-6):
>> warning: maximal mount count reached, running e2fsck is recommended
>
> Unmount it and run fsck on it, and that message would go away.  But I'd not
> worry about that one.
>
> jh
>
>




Re: [CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-09 Thread John Hodrien

On Thu, 9 Mar 2017, James B. Byrne wrote:

> This indicated that a bad sector on the underlying disk system might
> be the source of the problem.  The guests were all shut down, a
> /forcefsck file was created on the host system, and the host system
> remotely restarted.

fsck's not good at finding disk errors, it finds filesystem errors.

If it was a real disk issue, you'd expect matching errors in the host logs.
Did you?

> /var/log/messages:Mar  9 08:34:48 vhost03 kernel: EXT4-fs (dm-6):
> warning: maximal mount count reached, running e2fsck is recommended


Unmount it and run fsck on it, and that message would go away.  But I'd not
worry about that one.
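A sketch for checking where that counter stands before and after. It reads `tune2fs -l` output, assuming its `Mount count:` and `Maximum mount count:` lines, and treats a maximum of -1 or 0 as the check being disabled (as tune2fs(8) describes); the function name is hypothetical.

```shell
# Hypothetical helper: read "tune2fs -l" output on stdin and report whether
# the mount-count limit behind the "maximal mount count reached" warning
# has been hit. A maximum of -1 or 0 means the check is disabled.
mount_count_status() {
    awk -F: '
        /^Mount count:/         { cur = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            if (max <= 0) { print "disabled"; exit }
            status = (cur >= max) ? "due" : "not due"
            print status " (" cur "/" max ")"
        }'
}
```

For example `tune2fs -l /dev/mapper/vg_vhost03-lv_centos_repos | mount_count_status`. Then, as suggested above, `umount /var/data/centos` followed by `e2fsck -f /dev/mapper/vg_vhost03-lv_centos_repos` should reset the count once the check completes cleanly; `tune2fs -c` changes the limit itself.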

jh


[CentOS] CentOS-6.8 fsck report Maximal Count

2017-03-09 Thread James B. Byrne
We have a remote warm standby system running CentOS-6.8 as a KVM
system with multiple guests.  One of the guests began reporting an
error when running aide.

Caught SIGBUS/SEGV while mmapping. File was truncated while aide was
running?
Caught SIGBUS/SEGV. Exiting

The /var/log/messages file contained this:
Mar  9 09:14:13 inet12 kernel: end_request: I/O error, dev vda, sector 14539264
Mar  9 09:14:31 inet12 kernel: end_request: I/O error, dev vda, sector 14539296
Mar  9 09:14:48 inet12 kernel: end_request: I/O error, dev vda, sector 14539296

df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_inet02-lv_root
                       7932336 2262672   5260064  31% /
tmpfs                   961044       0    961044   0% /dev/shm
/dev/vda1               487652  139473    322579  31% /boot
. . .


This indicated that a bad sector on the underlying disk system might
be the source of the problem.  The guests were all shut down, a
/forcefsck file was created on the host system, and the host system
remotely restarted.

However, this action did not remove the error.  The host system log
files had this to say about fsck:

/var/log/messages:Mar  9 08:34:48 vhost03 kernel: EXT4-fs (dm-6):
warning: maximal mount count reached, running e2fsck is recommended

in /dev I see this:
brw-rw----. 1 root disk 253,   6 Mar  9 08:34 dm-6

But, this device has nothing whatsoever to do with the kvm guests:

ll /dev/vg_vhost03/ | grep dm-6
lrwxrwxrwx. 1 root root 7 Mar  9 08:34 lv_centos_repos -> ../dm-6

Rather this is an lv devoted to holding CentOS ISOs:

/dev/mapper/vg_vhost03-lv_centos_repos
 101016992 77160124  18718848  81% /var/data/centos
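One thing worth ruling out for the dm-6 warning (an assumption; the thread does not show the host's /etc/fstab): the boot-time `fsck -A` that /forcefsck influences skips any fstab entry whose sixth field, the fsck pass number, is 0 or missing. A sketch that prints that field for a mount point, reading fstab on stdin; the function name is hypothetical.

```shell
# Hypothetical helper: print the fsck pass number (sixth field) that an
# fstab read on stdin gives a mount point. Boot-time "fsck -A" skips
# entries whose pass number is 0 or missing.
fstab_passno() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { p = ($6 == "") ? 0 : $6; print p }'
}
```

If `fstab_passno /var/data/centos < /etc/fstab` prints 0, that would explain why a forced boot-time check never touched that LV.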

So, my questions are:

1. How do I fix the problem with the guest system that Aide is
stumbling over?

2. How do I get the fsck issue with dm-6 resolved?

