Re: slowness with kernel 6.4.10 and software raid

2023-08-20 Thread Roger Heflin
I think you are overestimating their competence.

The issues still seem to come from new formulas/processes used to increase
platter density, combined with long-term testing too poor to catch that the
media is garbage.

Usually the platter issues show up well before the warranty expires,
and they usually continue, failing most of the drives by 5 years.

Each time they work up a "new" formula/process it is a fresh crapshoot
as to how good or bad it will be.

They have been screwing up their new magnetic media platter formulas
for a long time, and it usually causes the media to delaminate/bubble off
the platters, causing bad sectors.

On Sun, Aug 20, 2023 at 9:05 AM George N. White III  wrote:
>
> On Fri, Aug 18, 2023 at 5:21 PM Samuel Sieb  wrote:
>>
>> On 8/18/23 13:15, Ranjan Maitra wrote:
>> > Thanks, so are there two drives that are bad? Sorry, I am confused. It is 
>> > likely no longer in warranty: the one with /home is new (I think) and also 
>> > the /mnt/backup (which is a rsync-based backup I do so as to actually be 
>> > able to see these files, and also as a more reliable backup that i can 
>> > actually see). Outside this, I have a / drive  that is a smaller SSD. I 
>> > also used to have that raided, but that other / drive died and I never got 
>> > to replacing it.
>> >
>> > So, my question is that is it only the raid drive /dev/sda that is bad, or 
>> > is there something else that you can see based on the report?
>>
>> The logs only indicate that sda is bad.  There are no errors for sdc.
>
>
> My experience has been that manufacturers have become good at optimizing
> drives so they start failing just after the warranty ends.  A few will
> fail before end-of-warranty.  I used to proactively replace drives at
> end-of-warranty so I could pick a time when users didn't have urgent
> demands.  I also bought a few spares to replace drives that would fail
> early to minimize impacts on users.  The cost of spares is much less than
> the cost of downtime, and there was always some non-critical need for
> temporary drive space which could be met by putting a spare drive in an
> external case.
>
> --
> George N. White III
>


Re: slowness with kernel 6.4.10 and software raid

2023-08-20 Thread Tim via users
On Sun, 2023-08-20 at 11:04 -0300, George N. White III wrote:
> My experience has been that manufacturers have become good at optimizing 
> drives so they start failing just after the warranty ends. 

Gonna jinx it, but I've had a good run with drives lasting eons, with
the exception of several failing in an iMac (the same machine).
And it's such a pain to change the drives buried within that hardware.

Also, that iMac gets much less use than any other PC here.

I have some 1980-1990s Amigas around here that still work, though there
is one that needs a slap to get the hard drive to unstick and spin up.

I did pull apart another completely stuck drive, and found that the arm
that tracks the heads over the single disk platter, a Y-shaped fork that
holds a head on either side of the platter, had moved too far into the
centre of the drive, so the centre of the Y branch of the fork had grabbed
the disk platter.  I always thought the head arm should move the other
way at shut-off and stay off the platter.

-- 
 
uname -rsvp
Linux 3.10.0-1160.92.1.el7.x86_64 #1 SMP Tue Jun 20 11:48:01 UTC 2023 x86_64
 
Boilerplate:  All unexpected mail to my mailbox is automatically deleted.
I will only get to see the messages that are posted to the mailing list.
 


Re: slowness with kernel 6.4.10 and software raid

2023-08-20 Thread Joe Zeff

On 08/20/2023 08:04 AM, George N. White III wrote:


My experience has been that manufacturers have become good at optimizing
drives so they start failing just after the warranty ends.


You have that backwards.  They've become very good at setting the 
warranty duration just long enough that the drives don't start failing 
until after it ends.  Much easier for them.



Re: slowness with kernel 6.4.10 and software raid

2023-08-20 Thread George N. White III
On Fri, Aug 18, 2023 at 5:21 PM Samuel Sieb  wrote:

> On 8/18/23 13:15, Ranjan Maitra wrote:
> > Thanks, so are there two drives that are bad? Sorry, I am confused. It
> is likely no longer in warranty: the one with /home is new (I think) and
> also the /mnt/backup (which is a rsync-based backup I do so as to actually
> be able to see these files, and also as a more reliable backup that i can
> actually see). Outside this, I have a / drive  that is a smaller SSD. I
> also used to have that raided, but that other / drive died and I never got
> to replacing it.
> >
> > So, my question is that is it only the raid drive /dev/sda that is bad,
> or is there something else that you can see based on the report?
>
> The logs only indicate that sda is bad.  There are no errors for sdc.
>

My experience has been that manufacturers have become good at optimizing
drives so they start failing just after the warranty ends.  A few will
fail before end-of-warranty.  I used to proactively replace drives at
end-of-warranty so I could pick a time when users didn't have urgent
demands.  I also bought a few spares to replace drives that would fail
early to minimize impacts on users.  The cost of spares is much less than
the cost of downtime, and there was always some non-critical need for
temporary drive space which could be met by putting a spare drive in an
external case.

-- 
George N. White III


Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Ranjan Maitra
On Fri Aug18'23 01:21:01PM, Samuel Sieb wrote:
> From: Samuel Sieb 
> Date: Fri, 18 Aug 2023 13:21:01 -0700
> To: users@lists.fedoraproject.org
> Reply-To: Community support for Fedora users 
> Subject: Re: slowness with kernel 6.4.10 and software raid
>
> On 8/18/23 13:15, Ranjan Maitra wrote:
> > Thanks, so are there two drives that are bad? Sorry, I am confused. It is 
> > likely no longer in warranty: the one with /home is new (I think) and also 
> > the /mnt/backup (which is a rsync-based backup I do so as to actually be 
> > able to see these files, and also as a more reliable backup that i can 
> > actually see). Outside this, I have a / drive  that is a smaller SSD. I 
> > also used to have that raided, but that other / drive died and I never got 
> > to replacing it.
> >
> > So, my question is that is it only the raid drive /dev/sda that is bad, or 
> > is there something else that you can see based on the report?
>
> The logs only indicate that sda is bad.  There are no errors for sdc.


Thanks very much, Sam!

Best wishes,
Ranjan



Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Samuel Sieb

On 8/18/23 13:15, Ranjan Maitra wrote:

Thanks, so are there two drives that are bad? Sorry, I am confused. It is
likely no longer in warranty: the one with /home is new (I think) and also the
/mnt/backup (which is an rsync-based backup I do so as to actually be able to
see these files, and also as a more reliable backup that I can actually see).
Outside this, I have a / drive that is a smaller SSD. I also used to have that
raided, but that other / drive died and I never got around to replacing it.

So, my question is that is it only the raid drive /dev/sda that is bad, or is 
there something else that you can see based on the report?


The logs only indicate that sda is bad.  There are no errors for sdc.
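
To be extra sure about sdc, a long SMART self-test is cheap to run in the
background (standard smartctl usage; it just takes a few hours on a 2 TB
drive):

  smartctl -t long /dev/sdc        # start a long (full-surface) self-test
  smartctl -l selftest /dev/sdc    # check the result once it finishes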


Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Ranjan Maitra
Thanks, so are there two drives that are bad? Sorry, I am confused. It is
likely no longer in warranty: the one with /home is new (I think) and also the
/mnt/backup (which is an rsync-based backup I do so as to actually be able to
see these files, and also as a more reliable backup that I can actually see).
Outside this, I have a / drive that is a smaller SSD. I also used to have that
raided, but that other / drive died and I never got around to replacing it.

So, my question is: is it only the raid drive /dev/sda that is bad, or is
there something else that you can see based on the report?

Many thanks, and best wishes,
Ranjan


On Fri Aug18'23 02:58:30PM, Roger Heflin wrote:
> From: Roger Heflin 
> Date: Fri, 18 Aug 2023 14:58:30 -0500
> To: Community support for Fedora users 
> Reply-To: Community support for Fedora users 
> Subject: Re: slowness with kernel 6.4.10 and software raid
>
> ok.  You have around 4000 sectors that are bad and are reallocated.
>
> You have around 1000 that are offline uncorrectable (reads failed).
>
> And you have a desktop drive that has a bad sector timeout of who
> knows exactly what.   I would guess at least 30 seconds, it could be
> higher, but it must be lower than the scsi timeout of the device.
>
> Given the power on hours the disk is out of warranty (I think).  If
> the disk was in warranty you could get the disk vendor to replace it.
>
> So whatever that timeout is when you hit a single bad sector the disk
> is going to keep re-reading it for that timeout and then report that
> sector cannot be read and mdraid will then read it from the other
> mirror and re-write it.
>
> This disk could eventually fail to read each sector and mdraid could
> re-write them and that may fix it.  Or it could fix some of them on
> this pass, and some on the next pass, and never fix all of them so sda
> simply sucks.
>
> Best idea would be to buy a new disk, but this time do not buy a
> desktop drive nor an SMR drive.  There is a webpage someplace
> that lists which disks are not SMR disks, and other webpages list what
> disks have a settable timeout (WD Red Plus and/or Seagate Ironwolf,
> and likely others).
>
> Likely the disks will be classified as enterprise and/or NAS disks,
> but whatever you look at make sure to check the vendors list to see if
> the disk is SMR or not.  Note WD Red is SMR, WD Red Plus is not SMR.
> And SMR sometimes does not play nice with raid.
>
> On Fri, Aug 18, 2023 at 2:05 PM Ranjan Maitra  wrote:
> >
> > On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> > > From: Roger Heflin 
> > > Date: Fri, 18 Aug 2023 13:39:08 -0500
> > > To: Community support for Fedora users 
> > > Reply-To: Community support for Fedora users 
> > > 
> > > Subject: Re: slowness with kernel 6.4.10 and software raid
> > >
> > > The above makes it very clear what is happening.   What kind of disks
> > > are these?  And did you set the scterc timeout?  You can see it via
> > > smartctl -l scterc /dev/sda   and then repeat on the other disk.
> > >
> > > Setting the timeout as low as you can will improve this situation
> > > some, but it appears that sda has a number of bad sectors on it.
> > >
> > > a full output of "smartctl --xall /dev/sda" would be useful also to
> > > see how bad it is.
> > >
> > > Short answer is you probably need a new device for sda.
> > >
> >
> > Thanks!
> >
> > I tried:
> >
> > # smartctl -l scterc /dev/sda
> >  smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local 
> > build)
> >  Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > SCT Error Recovery Control command not supported
> >
> > # smartctl --xall /dev/sda
> >
> >   smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] 
> > (local build)
> >   Copyright (C) 2002-23, Bruce Allen, Christian Franke, 
> > www.smartmontools.org
> >
> >   === START OF INFORMATION SECTION ===
> >   Model Family: Seagate Barracuda 7200.14 (AF)
> >   Device Model: ST2000DM001-1ER164
> >   Serial Number:Z4Z5F3LE
> >   LU WWN Device Id: 5 000c50 091167f04
> >   Firmware Version: CC27
> >   User Capacity:2,000,398,934,016 bytes [2.00 TB]
> >   Sector Sizes: 512 bytes logical, 4096 bytes physical
> >   Rotation Rate:7200 rpm
> >   Form Factor:  3.5 inches
> >   Device is:In smartctl database 7.3/5528
> >   ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
> >   SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> &

Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Roger Heflin
ok.  You have around 4000 sectors that are bad and are reallocated.

You have around 1000 that are offline uncorrectable (reads failed).
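
If you want to pull just those counters out of the SMART output yourself,
something like this works (a rough example using the usual smartmontools
attribute names; adjust the device name):

  # print only the reallocated / pending / uncorrectable sector counters
  smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'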

And you have a desktop drive that has a bad-sector timeout of who
knows exactly what.  I would guess at least 30 seconds; it could be
higher, but it must be lower than the scsi timeout of the device.

Given the power on hours the disk is out of warranty (I think).  If
the disk was in warranty you could get the disk vendor to replace it.

So whatever that timeout is when you hit a single bad sector the disk
is going to keep re-reading it for that timeout and then report that
sector cannot be read and mdraid will then read it from the other
mirror and re-write it.
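
A rough sketch of the two knobs involved (the scterc setting only works on
drives that support SCT ERC, which this Barracuda reports it does not; the
sysfs timeout path is standard Linux, and the numbers are only illustrative):

  # cap the drive's internal error recovery at 7.0 seconds (70 = 7.0s)
  smartctl -l scterc,70,70 /dev/sda

  # if the drive does not support scterc, raise the kernel's SCSI timeout
  # above the drive's internal retry time instead
  echo 180 > /sys/block/sda/device/timeout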

This disk could eventually fail to read each sector, mdraid could
re-write them all, and that may fix it.  Or it could fix some of them on
this pass, and some on the next pass, and never fix all of them, so sda
simply sucks.

Best idea would be to buy a new disk, but this time do not buy a
desktop drive nor an SMR drive.  There is a webpage someplace
that lists which disks are not SMR disks, and other webpages list which
disks have a settable timeout (WD Red Plus and/or Seagate Ironwolf,
and likely others).

Likely the disks will be classified as enterprise and/or NAS disks,
but whatever you look at, make sure to check the vendor's list to see if
the disk is SMR or not.  Note WD Red is SMR, WD Red Plus is not SMR.
And SMR sometimes does not play nice with raid.
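
Once a replacement disk is in hand, swapping it in is the usual mdadm dance,
sketched below for a plain md mirror (/dev/sdX is a placeholder; with the
external:imsm container in this setup the new disk is normally added at the
container level, md127, so check the mdadm man page for the IMSM specifics
before running anything):

  mdadm --manage /dev/md126 --fail /dev/sda      # mark the failing member failed
  mdadm --manage /dev/md126 --remove /dev/sda    # pull it out of the array
  mdadm --manage /dev/md126 --add /dev/sdX       # add the replacement (placeholder name)
  cat /proc/mdstat                               # watch the rebuild progress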

On Fri, Aug 18, 2023 at 2:05 PM Ranjan Maitra  wrote:
>
> On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> > From: Roger Heflin 
> > Date: Fri, 18 Aug 2023 13:39:08 -0500
> > To: Community support for Fedora users 
> > Reply-To: Community support for Fedora users 
> > Subject: Re: slowness with kernel 6.4.10 and software raid
> >
> > The above makes it very clear what is happening.   What kind of disks
> > are these?  And did you set the scterc timeout?  You can see it via
> > smartctl -l scterc /dev/sda   and then repeat on the other disk.
> >
> > Setting the timeout as low as you can will improve this situation
> > some, but it appears that sda has a number of bad sectors on it.
> >
> > a full output of "smartctl --xall /dev/sda" would be useful also to
> > see how bad it is.
> >
> > Short answer is you probably need a new device for sda.
> >
>
> Thanks!
>
> I tried:
>
> # smartctl -l scterc /dev/sda
>  smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local 
> build)
>  Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control command not supported
>
> # smartctl --xall /dev/sda
>
>   smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local 
> build)
>   Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
>
>   === START OF INFORMATION SECTION ===
>   Model Family: Seagate Barracuda 7200.14 (AF)
>   Device Model: ST2000DM001-1ER164
>   Serial Number:Z4Z5F3LE
>   LU WWN Device Id: 5 000c50 091167f04
>   Firmware Version: CC27
>   User Capacity:2,000,398,934,016 bytes [2.00 TB]
>   Sector Sizes: 512 bytes logical, 4096 bytes physical
>   Rotation Rate:7200 rpm
>   Form Factor:  3.5 inches
>   Device is:In smartctl database 7.3/5528
>   ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
>   SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>   Local Time is:Fri Aug 18 14:01:28 2023 CDT
>   SMART support is: Available - device has SMART capability.
>   SMART support is: Enabled
>   AAM feature is:   Unavailable
>   APM level is: 128 (minimum power consumption without standby)
>   Rd look-ahead is: Enabled
>   Write cache is:   Enabled
>   DSN feature is:   Unavailable
>   ATA Security is:  Disabled, NOT FROZEN [SEC1]
>   Wt Cache Reorder: Unavailable
>
>   === START OF READ SMART DATA SECTION ===
>   SMART overall-health self-assessment test result: PASSED
>
>   General SMART Values:
>   Offline data collection status:  (0x00)   Offline data collection 
> activity
> was never started.
> Auto Offline Data Collection: 
> Disabled.
>   Self-test execution status:  (   0)   The previous self-test 
> routine completed
> without error or no self-test has ever
> been run.
>   Total time to complete Offline
>   data collection:  (   80) seconds.
>   Offline data collection
>   capabilities:  (0x73) SMART execute Offline 
> immediate.
> Auto Off

Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Ranjan Maitra
On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> From: Roger Heflin 
> Date: Fri, 18 Aug 2023 13:39:08 -0500
> To: Community support for Fedora users 
> Reply-To: Community support for Fedora users 
> Subject: Re: slowness with kernel 6.4.10 and software raid
>
> The above makes it very clear what is happening.   What kind of disks
> are these?  And did you set the scterc timeout?  You can see it via
> smartctl -l scterc /dev/sda   and then repeat on the other disk.
>
> Setting the timeout as low as you can will improve this situation
> some, but it appears that sda has a number of bad sectors on it.
>
> a full output of "smartctl --xall /dev/sda" would be useful also to
> see how bad it is.
>
> Short answer is you probably need a new device for sda.
>

Thanks!

I tried:

# smartctl -l scterc /dev/sda
 smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local 
build)
 Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported

# smartctl --xall /dev/sda

  smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local 
build)
  Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

  === START OF INFORMATION SECTION ===
  Model Family: Seagate Barracuda 7200.14 (AF)
  Device Model: ST2000DM001-1ER164
  Serial Number:Z4Z5F3LE
  LU WWN Device Id: 5 000c50 091167f04
  Firmware Version: CC27
  User Capacity:2,000,398,934,016 bytes [2.00 TB]
  Sector Sizes: 512 bytes logical, 4096 bytes physical
  Rotation Rate:7200 rpm
  Form Factor:  3.5 inches
  Device is:In smartctl database 7.3/5528
  ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
  SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
  Local Time is:Fri Aug 18 14:01:28 2023 CDT
  SMART support is: Available - device has SMART capability.
  SMART support is: Enabled
  AAM feature is:   Unavailable
  APM level is: 128 (minimum power consumption without standby)
  Rd look-ahead is: Enabled
  Write cache is:   Enabled
  DSN feature is:   Unavailable
  ATA Security is:  Disabled, NOT FROZEN [SEC1]
  Wt Cache Reorder: Unavailable

  === START OF READ SMART DATA SECTION ===
  SMART overall-health self-assessment test result: PASSED

  General SMART Values:
  Offline data collection status:  (0x00)   Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
  Self-test execution status:  (   0)   The previous self-test routine 
completed
without error or no self-test has ever
been run.
  Total time to complete Offline
  data collection:  (   80) seconds.
  Offline data collection
  capabilities:  (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off 
support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
  SMART capabilities:(0x0003)   Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
  Error logging capability:(0x01)   Error logging supported.
General Purpose Logging supported.
  Short self-test routine
  recommended polling time:  (   1) minutes.
  Extended self-test routine
  recommended polling time:  ( 212) minutes.
  Conveyance self-test routine
  recommended polling time:  (   2) minutes.
  SCT capabilities:(0x1085) SCT Status supported.

  SMART Attributes Data Structure revision number: 10
  Vendor Specific SMART Attributes with Thresholds:
  ID# ATTRIBUTE_NAME  FLAGSVALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR--   116   092   006-106200704
3 Spin_Up_TimePO   096   096   000-0
4 Start_Stop_Count-O--CK   100   100   020-97
5 Reallocated_Sector_Ct   PO--CK   097   097   010-3960
7 Seek_Error_Rate POSR--   084   060   030-333268033
9 Power_On_Hours  -O--CK   062   062   000-34085
   10 Spin_Retry_CountPO--C-   100   100   097-0
   12 Power_Cycle_Count   -O--CK   100   100   020-96
  183 Runtime_Bad_Block   -O--CK   100   100   000-0
  184 End-to-End_Error-O--CK   100   100   099 

Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Roger Heflin
The above makes it very clear what is happening.   What kind of disks
are these?  And did you set the scterc timeout?  You can see it via
smartctl -l scterc /dev/sda   and then repeat on the other disk.

Setting the timeout as low as you can will improve this situation
some, but it appears that sda has a number of bad sectors on it.

a full output of "smartctl --xall /dev/sda" would be useful also to
see how bad it is.

Short answer is you probably need a new device for sda.

On Fri, Aug 18, 2023 at 1:30 PM Ranjan Maitra  wrote:
>
> Thanks, Roger!
>
>
> On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> > From: Roger Heflin 
> > Date: Fri, 18 Aug 2023 12:23:23 -0500
> > To: Community support for Fedora users 
> > Reply-To: Community support for Fedora users 
> > Subject: Re: slowness with kernel 6.4.10 and software raid
> >
> > Is it moving at all or just stopped?  If just stopped it appears that
> > md126 is using external:/md127 for something and md127 looks wrong
> > (both disks are spare) but I don't know in this external case what
> > md127 should look like.
>
> It is moving, slowly. It is a 2 TB drive, but this is weird.
>
> >
> > I would suggest checking messages with grep md12[67] /var/log/messages
> > (and older messages files if the reboot was not this week) to see what
> > is going on.
>
> Good idea! Here is the result from
>
> $ grep md126  /var/log/messages
>
>
>   Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md 
> device /dev/md126
>   Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 
> mirrors
>   Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 
> 3711741952
>   Aug 16 14:21:20 localhost kernel: md126: p1
>   Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in 
> dev-md126p1.device - /dev/md126p1 being skipped.
>   Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 
> 7345384/115998720 files, 409971205/463967488 blocks
>   Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 
> 932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: 
> none.
>   Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device 
> /dev/md126
>   Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md 
> device /dev/md126
>   Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
>   Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 
> 2735900352
>   Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 
> 2735900864
>   Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900496 on sda)
>   Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900568 on sda)
>   Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900576 on sda)
>   Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900624 on sda)
>   Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900640 on sda)
>   Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900688 on sda)
>   Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900712 on sda)
>   Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900792 on sda)
>   Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 
> 2735900352 to other mirror: sdc
>   Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900872 on sda)
>   Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900920 on sda)
>   Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735900992 on sda)
>   Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 
> 2735900864 to other mirror: sdc
>   Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 
> 2735905704
>   Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 
> 2735905960
>   Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735905768 on sda)
>   Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 
> 2735905704 to other mirror: sdc
>   Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735906120 on sda)
>   Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 
> sectors at 2735906192 on sda)
>   Aug 16 19:35:39 localh

Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Ranjan Maitra
Thanks, Roger!


On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> From: Roger Heflin 
> Date: Fri, 18 Aug 2023 12:23:23 -0500
> To: Community support for Fedora users 
> Reply-To: Community support for Fedora users 
> Subject: Re: slowness with kernel 6.4.10 and software raid
>
> Is it moving at all or just stopped?  If just stopped it appears that
> md126 is using external:/md127 for something and md127 looks wrong
> (both disks are spare) but I don't know in this external case what
> md127 should look like.

It is moving, slowly. It is a 2 TB drive, but this is weird.

>
> I would suggest checking messages with grep md12[67] /var/log/messages
> (and older messages files if the reboot was not this week) to see what
> is going on.

Good idea! Here is the result from

$ grep md126  /var/log/messages


  Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md device 
/dev/md126
  Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 
mirrors
  Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 
3711741952
  Aug 16 14:21:20 localhost kernel: md126: p1
  Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in 
dev-md126p1.device - /dev/md126p1 being skipped.
  Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 
7345384/115998720 files, 409971205/463967488 blocks
  Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 
932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: 
none.
  Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device 
/dev/md126
  Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md 
device /dev/md126
  Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
  Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 
2735900352
  Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 
2735900864
  Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900496 on sda)
  Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900568 on sda)
  Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900576 on sda)
  Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900624 on sda)
  Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900640 on sda)
  Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900688 on sda)
  Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900712 on sda)
  Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900792 on sda)
  Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 
2735900352 to other mirror: sdc
  Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900872 on sda)
  Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900920 on sda)
  Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735900992 on sda)
  Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 
2735900864 to other mirror: sdc
  Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 
2735905704
  Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 
2735905960
  Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735905768 on sda)
  Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 
2735905704 to other mirror: sdc
  Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735906120 on sda)
  Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735906192 on sda)
  Aug 16 19:35:39 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735906448 on sda)
  Aug 16 19:35:40 localhost kernel: md/raid1:md126: redirecting sector 
2735905960 to other mirror: sdc
  Aug 16 19:35:45 localhost kernel: md/raid1:md126: sda: rescheduling sector 
2735906472
  Aug 16 19:35:49 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735906504 on sda)
  Aug 16 19:35:52 localhost kernel: md/raid1:md126: redirecting sector 
2735906472 to other mirror: sdc
  Aug 16 19:36:03 localhost kernel: md/raid1:md126: sda: rescheduling sector 
2735908008
  Aug 16 19:36:08 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735908232 on sda)
  Aug 16 19:36:16 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735908344 on sda)
  Aug 16 19:36:21 localhost kernel: md/raid1:md126: read error corrected (8 
sectors at 2735908424 on sda)
  Aug 16 19:36:21 localhost kernel: md/raid1:md126: redirecting sector 
2735908008 to oth

Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Roger Heflin
Is it moving at all or just stopped?  If it's just stopped, it appears that
md126 is using external:/md127 for something, and md127 looks wrong
(both disks are spare), but I don't know what md127 should look like in
this external-metadata case.
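
mdadm itself can show how the container and the mirror relate (standard
--detail/--examine usage, nothing exotic):

  mdadm --detail /dev/md126            # the RAID1 array
  mdadm --detail /dev/md127            # the IMSM container it lives in
  mdadm --examine /dev/sda /dev/sdc    # per-disk metadata as mdadm sees it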

I would suggest checking messages with grep md12[67] /var/log/messages
(and older messages files if the reboot was not this week) to see what
is going on.

Also, if you have a prior good reboot in the messages file, include that
and see what happened differently between the two boots.
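
If the systemd journal is persistent it covers old boots too, which makes
the comparison easy (standard journalctl usage; -b -1 is the previous boot,
-b 0 the current one):

  journalctl -k -b -1 | grep -E 'md12[67]'    # kernel messages from the previous boot
  journalctl -k -b 0  | grep -E 'md12[67]'    # same for the current boot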

On Fri, Aug 18, 2023 at 7:46 AM Ranjan Maitra  wrote:
>
> On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> > From: Samuel Sieb 
> > Date: Thu, 17 Aug 2023 22:37:29 -0700
> > To: users@lists.fedoraproject.org
> > Reply-To: Community support for Fedora users 
> > Subject: Re: slowness with kernel 6.4.10 and software raid
> >
> > On 8/17/23 21:38, Ranjan Maitra wrote:
> > > $ cat /proc/mdstat
> > >   Personalities : [raid1]
> > >   md126 : active raid1 sda[1] sdc[0]
> > > 1855870976 blocks super external:/md127/0 [2/2] [UU]
> > > [=>...]  check =  8.8% (165001216/1855870976) 
> > > finish=45465.2min speed=619K/sec
> > >
> > >   md127 : inactive sda[1](S) sdc[0](S)
> > > 10402 blocks super external:imsm
> > >
> > >   unused devices: 
> > >
> > > I am not sure what it is doing, and I am a bit concerned that this will 
> > > go on at this rate for about 20 days. No knowing what will happen after 
> > > that, and also if this problem will recur with another reboot.
> >
> > After a certain amount of time, mdraid will do a verification of the data
> > where it scans the entire array.  If you reboot, it will continue from where
> > it left off.  But that is *really* slow, so you should find out what's going
> > on there.
>
> Yes, I know, just not sure what to do. Thanks very much!
>
> Any suggestion is appreciated!
>
> Best wishes,
> Ranjan


Re: slowness with kernel 6.4.10 and software raid

2023-08-18 Thread Ranjan Maitra
On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> From: Samuel Sieb 
> Date: Thu, 17 Aug 2023 22:37:29 -0700
> To: users@lists.fedoraproject.org
> Reply-To: Community support for Fedora users 
> Subject: Re: slowness with kernel 6.4.10 and software raid
>
> On 8/17/23 21:38, Ranjan Maitra wrote:
> > $ cat /proc/mdstat
> >   Personalities : [raid1]
> >   md126 : active raid1 sda[1] sdc[0]
> > 1855870976 blocks super external:/md127/0 [2/2] [UU]
> > [=>...]  check =  8.8% (165001216/1855870976) 
> > finish=45465.2min speed=619K/sec
> >
> >   md127 : inactive sda[1](S) sdc[0](S)
> > 10402 blocks super external:imsm
> >
> >   unused devices: 
> >
> > I am not sure what it is doing, and I am a bit concerned that this will go 
> > on at this rate for about 20 days. No knowing what will happen after that, 
> > and also if this problem will recur with another reboot.
>
> After a certain amount of time, mdraid will do a verification of the data
> where it scans the entire array.  If you reboot, it will continue from where
> it left off.  But that is *really* slow, so you should find out what's going
> on there.

Yes, I know, just not sure what to do. Thanks very much!

Any suggestion is appreciated!

Best wishes,
Ranjan


Re: slowness with kernel 6.4.10 and software raid

2023-08-17 Thread Samuel Sieb

On 8/17/23 21:38, Ranjan Maitra wrote:

$ cat /proc/mdstat
  Personalities : [raid1]
  md126 : active raid1 sda[1] sdc[0]
1855870976 blocks super external:/md127/0 [2/2] [UU]
[=>...]  check =  8.8% (165001216/1855870976) 
finish=45465.2min speed=619K/sec

  md127 : inactive sda[1](S) sdc[0](S)
10402 blocks super external:imsm

  unused devices: <none>

I am not sure what it is doing, and I am a bit concerned that this will go on
at this rate for about 20 days. There's no knowing what will happen after that,
or whether this problem will recur with another reboot.


After a certain amount of time, mdraid will do a verification of the 
data where it scans the entire array.  If you reboot, it will continue 
from where it left off.  But that is *really* slow, so you should find 
out what's going on there.
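
For reference, the check is driven through md's sysfs interface, so you can
see and steer it directly (paths as on a typical Fedora install; treat these
as a sketch and the values as examples):

  cat /sys/block/md126/md/sync_action                        # check / resync / idle
  sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max   # rebuild/check speed limits, KB/s
  echo idle > /sys/block/md126/md/sync_action                # pause the check; "check" restarts it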



slowness with kernel 6.4.10 and software raid

2023-08-17 Thread Ranjan Maitra
Hi,

I am at a bit of a loss. After about two months, I decided to reboot a few days
ago, and rebooted into the 6.4.10 kernel. Everything came up, but the machine is
very unresponsive, and practically unusable when I am at it. It is better when
working on it remotely, but it is still slower than before.

I was going to reboot again, but the reboot command came back with the following:

 $ reboot
 Operation inhibited by "Disk Manager" (PID 1044 "udisksd", user root), reason 
is "Unknown (mdraid-check-job)".
 Please retry operation after closing inhibitors and logging out other users.
Alternatively, ignore inhibitors and users with 'systemctl reboot -i'.

This pointed me to the possibility of the issue being raid. So, I looked at

$ cat /proc/mdstat
 Personalities : [raid1]
 md126 : active raid1 sda[1] sdc[0]
   1855870976 blocks super external:/md127/0 [2/2] [UU]
   [=>...]  check =  8.8% (165001216/1855870976) 
finish=45465.2min speed=619K/sec

 md127 : inactive sda[1](S) sdc[0](S)
   10402 blocks super external:imsm

 unused devices: <none>

I am not sure what it is doing, and I am a bit concerned that this will go on
at this rate for about 20 days. There's no knowing what will happen after that,
or whether this problem will recur with another reboot.

The machine is a 10 core, 20-thread Dell T5810. It is running a fully updated 
Fedora 38 installation.

Any help/suggestions on what to do?

Many thanks and best wishes,
Ranjan