Re: [SOLVED, kind of] Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-30 Thread Greg Wooledge
On Sun, Sep 29, 2019 at 03:04:48AM +0200, local10 wrote:
> Sep 28, 2019, 11:24 by bouncingc...@gmail.com:
> 
> > Having read that, I don't see any admission that fsck makes any
> > changes if run without any options as it seems you did. So I
> > wonder what caused the change in the debugfs message.
> >
> 
> man wasn't available in BusyBox so I had to limit myself to the options 
> available through "fsck.ext4 --help". I think I ran it as "fsck.ext4 -cfv 
> /dev/sda", if I remember correctly.

There's always .  It'll redirect
to some variant of the man page in the current stable release, usually.



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-30 Thread Greg Wooledge
On Sat, Sep 28, 2019 at 04:01:27PM +1000, David wrote:
> > Maybe better to hide the stdout from md5sum:
> > # find junk -xdev -type f -exec md5sum '{}' >/dev/null \;
> 
> To be more precise, that hides the stdout from both md5sum and find,
> but I don't think that matters to the goal.

This is correct.  Pedantry follows:

The >/dev/null redirection applies to the "simple" shell command in
which it occurs, which happens to be find.  Its position within that
simple shell command is irrelevant.  It could be at the end, at the
beginning, or somewhere in the middle.  In this case, it happens to be
in the middle.

The command is 100% equivalent to:

find junk -xdev -type f -exec md5sum '{}' \; >/dev/null

find's stdout is redirected to /dev/null, and if find happens to find
any files which meet its matching criteria, and executes md5sum to
operate on them, md5sum will inherit find's stdout, which will still
be pointed to /dev/null.
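
A small sketch to try this out, assuming a POSIX shell and mktemp: the
three echo commands below differ only in where the redirection sits, and
each one writes the same line to its file.  The function name and the
file names are made up for the illustration.

```shell
# Hypothetical demo: the same redirection at the end, at the beginning,
# and in the middle of a simple command.
redir_demo() {
    dir=$(mktemp -d) || return 1
    echo one two >"$dir/end"
    >"$dir/begin" echo one two
    echo one >"$dir/middle" two
    # Print the three files; each contains the single line "one two".
    cat "$dir/end" "$dir/begin" "$dir/middle"
    rm -rf "$dir"
}
redir_demo
```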

The freedom to place a redirection anywhere within a simple shell command
is why examples like this one:

[ "$i" > 5 ]

give such surprising results.  This is a simple shell command (the
command name is "["), with an output redirection in it.  It is 100%
equivalent to

[ "$i" ] > 5

which opens-and-truncates a file named "5" for stdout, then performs
a string-length test on the value of "$i".  The command's exit status
will be "true" (0) as long as "$i" is not empty.

The intended command is most likely:

[ "$i" -gt 5 ]

which treats the value of "$i" as an integer, and then checks whether
that integer is greater than 5.  It returns "true" (0) if the value is
greater than 5, returns "false" (1) if it's less than or equal to 5,
and returns a value greater than 1 (still "false") and writes an error
message to stderr if "$i" cannot be converted to an integer.
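
The difference can be seen in a scratch directory.  This is only a
sketch: the function name and the value of i are made up, and it
assumes mktemp is available.

```shell
# Hypothetical demo, run in a subshell so the cd does not leak out.
bracket_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    i=3
    # Looks like a comparison, but is a redirection to a file named "5".
    if [ "$i" > 5 ]; then echo "string test: true"; fi
    [ -e 5 ] && echo "file 5 was created"
    # The numeric comparison that was most likely intended.
    if [ "$i" -gt 5 ]; then
        echo "numeric test: true"
    else
        echo "numeric test: false"
    fi
    cd / && rm -rf "$dir"
)
bracket_demo
```

With i=3, the first test reports "true" and leaves a file named "5"
behind, while the -gt test reports "false".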



Re: [SOLVED, kind of] Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread local10
Sep 28, 2019, 11:24 by bouncingc...@gmail.com:

> Having read that, I don't see any admission that fsck makes any
> changes if run without any options as it seems you did. So I
> wonder what caused the change in the debugfs message.
>

man wasn't available in BusyBox so I had to limit myself to the options 
available through "fsck.ext4 --help". I think I ran it as "fsck.ext4 -cfv 
/dev/sda", if I remember correctly.


> Personally I think if both smart and fsck are happy then I would
> trust the filesystem. I have done a few repairs like this and never
> had any subsequent problems, but it's worth keeping an eye on
> future smart tests to see if any further errors appear on the drive.
>

The system seems to work fine so far. I'm going to keep an eye on it in the 
near future, just in case.

Thanks to everyone who responded.



Re: [SOLVED, kind of] Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread Charles Curley
On Sun, 29 Sep 2019 01:24:00 +1000
David  wrote:

> Having read that, I don't see any admission that fsck makes any
> changes if run without any options as it seems you did. So I
> wonder what caused the change in the debugfs message.

I'll offer a guess. Modern disk drives have a full up controller on
board. That controller has enough smarts that when it detects an error
it can correct, it will transparently correct the error and re-write
the sector.

Sometimes it takes more than one such read and re-write to make the
correction permanent.

In GSmartControl, under the Statistics tab, keep an eye on the number
of reallocated sectors. If you see a non-zero value, your drive is in
trouble. At the prices of modern hard drives, it makes sense to replace
early rather than endure the hassle of restoring from backups, no
matter how well done they are.
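
For those without a GUI, the same counter can be read from the command
line.  A minimal sketch, assuming the usual ATA attribute table printed
by "smartctl -A", where the attribute name is the second field and the
raw value is the tenth; the function name is made up, and not every
drive reports this attribute.

```shell
# Print the raw value of Reallocated_Sector_Ct from "smartctl -A"
# output read on stdin.  Prints nothing if the attribute is absent.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}
# Typical use, as root (the device name is only an example):
#   smartctl -A /dev/sda | reallocated_sectors
```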

Also, if you haven't already done so, set up smartd, which is part of
the smartmontools package.

And think about having a spare drive on hand for the day when that
drive goes south.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: [SOLVED, kind of] Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread David
On Sat, 28 Sep 2019 at 22:31, local10  wrote:
> Sep 28, 2019, 02:06 by loca...@tutanota.com:

> > Good advice, thanks. I have a backup drive which is almost a
> > mirror copy of the failing one, so that's why I am not very worried
> > about it. I'm going to try to fix it in a couple of days, so let's see
> > how it goes.

> So I forced fsck to run at reboot, it refused to run in the auto
> mode, dropped me into BusyBox and from there I could run fsck
> manually, pressing  a couple of times telling fsck to ignore
> errors (that was the only option available to me in fsck other than
> quitting it). After that fsck reported the filesystem clean.

It's great to hear that fsck seems happy with your filesystem.

Just for curiosity, I had a read of 'man e2fsck' and found this:
"""
If  e2fsck  is  run  in  interactive mode (meaning that none of
-y, -n, or -p are specified), the program will ask the user to fix
each problem found in the filesystem.  A response of 'y' will fix the
error; 'n' will leave the error unfixed; and 'a' will fix the problem
and all subsequent problems; pressing Enter will proceed with the
default response, which is printed before the question mark.
Pressing Control-C terminates e2fsck immediately.
"""
As for the -p option, the man page does not give much detail,
but a lot more information on -p is here:
https://unix.stackexchange.com/questions/18526/what-does-fsck-p-preen-do-on-ext4

Having read that, I don't see any admission that fsck makes any
changes if run without any options as it seems you did. So I
wonder what caused the change in the debugfs message.

Perhaps the most likely guess is fsck did something to tidy up the
inodes or cause the drive to remap or somehow avoid the bad block.
Maybe someone who knows more will add to this conversation.

Personally I think if both smart and fsck are happy then I would
trust the filesystem. I have done a few repairs like this and never
had any subsequent problems, but it's worth keeping an eye on
future smart tests to see if any further errors appear on the drive.



Re: [SOLVED, kind of] Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread local10
Sep 28, 2019, 08:31 by loca...@tutanota.com:

> The end result:
>
Starting with the "The end result:" the email provider I use screwed up the 
email formatting, in the original it was a numbered list which should've looked 
something like this:

The end result:
1. fsck reports the repaired fs as clean
2. The system boots from the repaired root fs and functions normally, no disk 
related errors in the syslog
3. SMART self tests (both short and long) complete successfully, no errors
4. badblocks reports no bad blocks:
...
5. debugfs still looks a bit weird:
...

Hopefully it wasn't too unreadable.

Regards,



[SOLVED, kind of] Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread local10
Sep 28, 2019, 02:06 by loca...@tutanota.com:

> Good advice, thanks. I have a backup drive which is almost a mirror copy of 
> the failing one, so that's why I am not very worried about it. I'm going to 
> try to fix it in a couple of days, so let's see how it goes.
>

So I forced fsck to run at reboot, it refused to run in the auto mode, dropped 
me into BusyBox and from there I could run fsck manually, pressing  a 
couple of times telling fsck to ignore errors (that was the only option 
available to me in fsck other than quitting it). After that fsck reported the 
filesystem clean.

The end result:
fsck reports the repaired fs as clean
The system boots from the repaired root fs and functions normally, no disk 
related errors in the syslog
SMART self tests (both short and long) complete successfully, no errors
badblocks reports no bad blocks:

# badblocks -sv -o /root/bad.blocks /dev/sda
Checking blocks 0 to 976762583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

debugfs still looks a bit weird:

# debugfs
debugfs 1.44.5 (15-Dec-2018)
debugfs:  open /dev/sda2
debugfs:  testb 950
Block 950 marked in use
debugfs:  icheck 950
Block   Inode number
950 7
debugfs:  ncheck 7
Inode   Pathname
debugfs:  testb 1430
Block 1430 marked in use
debugfs:  icheck 1430
Block   Inode number
1430    
debugfs:  quit

So the issue appears to be resolved, the system works, my remaining concern at 
this point is the debugfs output above.

Thanks to everyone who responded.



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread local10
Sep 28, 2019, 00:50 by bouncingc...@gmail.com:

> In your situation, I would:
> 1) backup everything important ASAP.
> and (assuming your init is systemd):
> 2) read here
> https://www.linuxuprising.com/2019/05/how-to-force-fsck-filesystem.html
> 3) do something like a one-off use of
>  fsck.mode=force
> by manually adding that to your boot one time
>
> It is possible that fsck might be able to repair it,
> but I would be surprised. If you try it, let us know
> how it goes. Be sure to backup anything important
> before you try it.
>

> I had another thought: if you are curious about where the problem
> is, maybe you could run some read-only command that reads
> every part of your disk, and see if it gets stuck anywhere.
> For example:
> # find / -xdev -type f -exec md5sum '{}' \;

Good advice, thanks. I have a backup drive which is almost a mirror copy of the 
failing one, so that's why I am not very worried about it. I'm going to try to 
fix it in a couple of days, so let's see how it goes.

Thanks for your feedback.



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-28 Thread David
On Sat, 28 Sep 2019 at 15:57, David  wrote:
> On Sat, 28 Sep 2019 at 15:05, David  wrote:
>
> > I had another thought: if you are curious about where the problem
> > is, maybe you could run some read-only command that reads
> > every part of your disk, and see if it gets stuck anywhere.
> > For example:
> > # find / -xdev -type f -exec md5sum '{}' \;
>
> On second thoughts, I think my suggested command likely won't
> "get stuck" when it hits the error, so it won't be very useful.
>
> Maybe better to hide the stdout from md5sum:
> # find junk -xdev -type f -exec md5sum '{}' >/dev/null \;

To be more precise, that hides the stdout from both md5sum and find,
but I don't think that matters to the goal.



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread David
On Sat, 28 Sep 2019 at 15:05, David  wrote:

> I had another thought: if you are curious about where the problem
> is, maybe you could run some read-only command that reads
> every part of your disk, and see if it gets stuck anywhere.
> For example:
> # find / -xdev -type f -exec md5sum '{}' \;

On second thoughts, I think my suggested command likely won't
"get stuck" when it hits the error, so it won't be very useful.

Maybe better to hide the stdout from md5sum:
# find junk -xdev -type f -exec md5sum '{}' >/dev/null \;
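
One possible variant, as a sketch only: discard the checksums but keep
whatever md5sum and find write to stderr in a log file, so that read
errors can be reviewed afterwards.  The function name and the log file
argument are made up here; "-exec ... +" batches the file names instead
of starting one md5sum per file.

```shell
# Read every regular file under a directory, discard the checksums,
# and collect anything written to stderr in a log file.
#   $1 = directory to scan
#   $2 = log file for error messages (keep it outside $1)
read_all_files() {
    find "$1" -xdev -type f -exec md5sum '{}' + >/dev/null 2>"$2"
}
# Example:
#   read_all_files / /tmp/read-errors.log
```

An empty log file afterwards means that every file could be read.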



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread John Covici
Buy SpinRite from www.grc.com; that will work, but after fixing, I
would copy to a new drive.
You will need a DOS disk, but this program works well.

On Fri, 27 Sep 2019 22:38:59 -0400,
local10 wrote:
> 
> Sep 27, 2019, 20:08 by bouncingc...@gmail.com:
> 
> > Hi, I assume you are attempting to follow a procedure similar
> > to this one:
> > https://www.smartmontools.org/wiki/BadBlockHowto#Repairsinafilesystem 
> > 
> >
> 
> Yes, a different document but the same idea. The one you linked is actually a 
> better document.
> 
> 
> > It's telling you that the filesystem itself is broken/unreadable.
> > I assume you know what inodes are, if not then you should
> > read about that. 
> >
> > In such a case, I think that it is not possible to repair this filesystem.
> >
> 
> What I find weird about this is that the filesystem (it's a root filesystem)
> appears to be fully functional, it boots without issues and generally
> everything seems to work fine, the only indication of a problem I see is in the
> SMART log. Would be nice if there was a way to just repair it, without
> reinstalling everything.
> 
> Thanks
> 
> 

-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

 John Covici wb2una
 cov...@ccs.covici.com



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread David
On Sat, 28 Sep 2019 at 12:39, local10  wrote:
> Sep 27, 2019, 20:08 by bouncingc...@gmail.com:

> > It's telling you that the filesystem itself is broken/unreadable.
> > I assume you know what inodes are, if not then you should
> > read about that.

> > In such a case, I think that it is not possible to repair this filesystem.

> What I find weird about this is that the filesystem (it's a root filesystem)
> appears to be fully functional, it boots without issues and generally
> everything seems to work fine

I had another thought: if you are curious about where the problem
is, maybe you could run some read-only command that reads
every part of your disk, and see if it gets stuck anywhere.
For example:
# find / -xdev -type f -exec md5sum '{}' \;



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread David
On Sat, 28 Sep 2019 at 12:39, local10  wrote:
> Sep 27, 2019, 20:08 by bouncingc...@gmail.com:
> >
> > In such a case, I think that it is not possible to repair this filesystem.
>
> What I find weird about this is that the filesystem (it's a root filesystem)
> appears to be fully functional, it boots without issues and generally
> everything seems to work fine, the only indication of a problem I see is in the
> SMART log. Would be nice if there was a way to just repair it, without
> reinstalling everything.

I would not trust that filesystem at all.
I would trust what SMART is telling you and fix it ASAP.

Because you don't know where in your directory tree that error
is lurking. You might never touch it in years, or you might
suddenly one day lose access to hundreds of files. Which may
or may not be important to you.

I don't live in a world where I run everything off one partition,
so to me doing that seems incredibly risky, even without any
disk error messages :)

I always have several fallback bootable partitions on every
machine I use, and all my data lives on other partitions and
is synced between several machines as well. Not doing that
seems very risky, so I recommend learning some kind of strategy
so that you don't find yourself in a situation where these kinds
of issues risk losing data. To me, recreating a filesystem is
simple, because I have backups everywhere, and so I'm never
scared of failures or screw-ups.

In your situation, I would:
1) backup everything important ASAP.
and (assuming your init is systemd):
2) read here
https://www.linuxuprising.com/2019/05/how-to-force-fsck-filesystem.html
3) do something like a one-off use of
  fsck.mode=force
by manually adding that to your boot one time
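
After booting with the extra parameter, one way to confirm that it
actually reached the kernel is to look at /proc/cmdline.  A minimal
sketch; the function name is made up, and it only checks for the
parameter, not whether fsck ran.

```shell
# Succeeds if the given kernel command line contains fsck.mode=force
# as a separate word.
has_forced_fsck() {
    case " $1 " in
        *" fsck.mode=force "*) return 0 ;;
        *) return 1 ;;
    esac
}
# Example, on the running system:
#   has_forced_fsck "$(cat /proc/cmdline)" && echo "forced fsck requested"
```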

It is possible that fsck might be able to repair it,
but I would be surprised. If you try it, let us know
how it goes. Be sure to backup anything important
before you try it.



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread local10
Sep 27, 2019, 20:08 by bouncingc...@gmail.com:

> Hi, I assume you are attempting to follow a procedure similar
> to this one:
> https://www.smartmontools.org/wiki/BadBlockHowto#Repairsinafilesystem 
> 
>

Yes, a different document but the same idea. The one you linked is actually a 
better document.


> It's telling you that the filesystem itself is broken/unreadable.
> I assume you know what inodes are, if not then you should
> read about that. 
>
> In such a case, I think that it is not possible to repair this filesystem.
>

What I find weird about this is that the filesystem (it's a root filesystem)
appears to be fully functional, it boots without issues and generally
everything seems to work fine, the only indication of a problem I see is in the
SMART log. Would be nice if there was a way to just repair it, without
reinstalling everything.

Thanks



Re: ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread David
On Sat, 28 Sep 2019 at 09:47, local10  wrote:
>
> Started to get SMART self test errors and wanted to fix them before things 
> would get any worse:
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed: read failure       60%     24805         624048
>
> So I calculated the file system block number of the bad LBA according to the
> following formula:
>
> b = (int) ((L-S) * 512 / B)
>
> where:
> b = File System block number
> B = File system block size in bytes
> L = LBA of bad sector
> S = Starting sector of partition as shown by fdisk -lu
> and (int) denotes the integer part.
>
>
> The bad file system block number turned out to be 950 in my case. However, 
> trying to find out what file this block number belongs to failed (see below, 
> /dev/sda2 contains an ext4 root file system that was mounted as root when 
> debugfs was run):
>
> # debugfs
> debugfs 1.44.5 (15-Dec-2018)
> debugfs:  open /dev/sda2
> debugfs:  testb 950
> Block 950 marked in use
> debugfs:  icheck 950
> icheck: Input/output error while calling ext2fs_block_iterate
> icheck: Can't read next inode while doing inode scan
> debugfs:  quit
> #
>
> Any ideas? Thanks

Hi, I assume you are attempting to follow a procedure similar
to this one:
https://www.smartmontools.org/wiki/BadBlockHowto#Repairsinafilesystem

Most of the examples there assume that the bad block
is used for storage of file data. This is reasonable because
that's what most blocks are used for.

However, some blocks on any disk are used by the file system
itself to keep track of which blocks are used for what purpose.

And in this case, it appears that your bad block is one that is
used by the filesystem itself. I think that's what debugfs is
telling you here:
> icheck: Can't read next inode while doing inode scan

It's telling you that the filesystem itself is broken/unreadable.
I assume you know what inodes are, if not then you should
read about that.

In such a case, I think that it is not possible to repair this filesystem.
Rather, you need to rescue or backup any required data, then
the whole filesystem needs to be recreated using mkfs to destroy
the old file system and create a new one, and then restore
all the file data. If it's a root filesystem, you will need another
one to manage that process, or reinstall.



ext4: debugfs: icheck: Input/output error while calling ext2fs_block_iterate

2019-09-27 Thread local10
Hi,

Started to get SMART self test errors and wanted to fix them before things 
would get any worse:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     24805         624048

So I calculated the file system block number of the bad LBA according to the
following formula:

b = (int) ((L-S) * 512 / B)

where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
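
As a sketch, the formula can be written with integer shell arithmetic,
which drops the fractional part for the positive values involved.  The
function name is made up.  L is the LBA from the self-test log above;
the values of S and B in the example are assumptions, picked only
because they are consistent with the block number 950 reported below,
and are not taken from the actual fdisk output.

```shell
# b = (int) ((L - S) * 512 / B)
#   $1 = L, LBA of the bad sector
#   $2 = S, starting sector of the partition (from fdisk -lu)
#   $3 = B, file system block size in bytes
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}
# S=616448 and B=4096 are assumed example values; this prints 950.
lba_to_fs_block 624048 616448 4096
```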


The bad file system block number turned out to be 950 in my case. However, 
trying to find out what file this block number belongs to failed (see below, 
/dev/sda2 contains an ext4 root file system that was mounted as root when 
debugfs was run):

# debugfs
debugfs 1.44.5 (15-Dec-2018)
debugfs:  open /dev/sda2
debugfs:  testb 950
Block 950 marked in use
debugfs:  icheck 950
icheck: Input/output error while calling ext2fs_block_iterate
icheck: Can't read next inode while doing inode scan
debugfs:  quit
#

Any ideas? Thanks

# uname -a
Linux srv 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20) x86_64 
GNU/Linux