Re: slowness with kernel 6.4.10 and software raid
I think you are overestimating their competence. The recurring issue seems to be a new formula/process to increase platter density, combined with poor long-term testing to discover that it is garbage. Usually the platter problems show up well before the warranty expires, and most of the affected drives fail by 5 years. Each time they work up a "new" formula/process, it is a new crapshoot on how good or bad it will be. They have been screwing up their new magnetic-media platter formulas for a long time, and it usually causes the coating to delaminate/bubble off the platters, causing bad sectors.

On Sun, Aug 20, 2023 at 9:05 AM George N. White III wrote:
>
> On Fri, Aug 18, 2023 at 5:21 PM Samuel Sieb wrote:
>>
>> On 8/18/23 13:15, Ranjan Maitra wrote:
>> > Thanks, so are there two drives that are bad? Sorry, I am confused. It is
>> > likely no longer in warranty: the one with /home is new (I think) and also
>> > the /mnt/backup (which is a rsync-based backup I do so as to actually be
>> > able to see these files, and also as a more reliable backup that I can
>> > actually see). Outside this, I have a / drive that is a smaller SSD. I
>> > also used to have that raided, but that other / drive died and I never got
>> > to replacing it.
>> >
>> > So, my question is: is it only the raid drive /dev/sda that is bad, or
>> > is there something else that you can see based on the report?
>>
>> The logs only indicate that sda is bad. There are no errors for sdc.
>
> My experience has been that manufacturers have become good at optimizing
> drives so they start failing just after the warranty ends. A few will fail
> before end-of-warranty. I used to proactively replace drives at
> end-of-warranty so I could pick a time when users didn't have urgent
> demands. I also bought a few spares to replace drives that would fail
> early, to minimize impacts on users. The cost of spares is much less than
> the cost of downtime, and there was always some non-critical need for
> temporary drive space which could be met by putting a spare drive in an
> external case.
>
> --
> George N. White III
Re: slowness with kernel 6.4.10 and software raid
On Sun, 2023-08-20 at 11:04 -0300, George N. White III wrote:
> My experience has been that manufacturers have become good at optimizing
> drives so they start failing just after the warranty ends.

Gonna jinx it, but I've had a good run with drives lasting eons, with the exception of several failing in an iMac (the same machine). And it's such a pain to change the drives buried within that hardware. Also, that iMac gets much less use than any other PC here.

I have some 1980s-1990s Amigas around here that still work, though there is one that needs a slap to get the hard drive to unstick and spin up. I did pull apart another completely stuck drive and found what had happened: the arm that tracks the heads over the single disk platter is a Y-shaped fork that holds a head on either side of the platter. It had moved too far into the centre of the drive, and the centre of the Y branch of the fork had grabbed the disk platter. I always thought the head arm should move the other way at shut-off and stay off the platter.

--
uname -rsvp
Linux 3.10.0-1160.92.1.el7.x86_64 #1 SMP Tue Jun 20 11:48:01 UTC 2023 x86_64

Boilerplate: All unexpected mail to my mailbox is automatically deleted.
I will only get to see the messages that are posted to the mailing list.
Re: slowness with kernel 6.4.10 and software raid
On 08/20/2023 08:04 AM, George N. White III wrote:
> My experience has been that manufacturers have become good at optimizing
> drives so they start failing just after the warranty ends.

You have that backwards. They've become very good at setting the warranty duration just long enough that the drives don't start failing until after it ends. Much easier for them.
Re: slowness with kernel 6.4.10 and software raid
On Fri, Aug 18, 2023 at 5:21 PM Samuel Sieb wrote:
> On 8/18/23 13:15, Ranjan Maitra wrote:
> > Thanks, so are there two drives that are bad? Sorry, I am confused. It is
> > likely no longer in warranty: the one with /home is new (I think) and also
> > the /mnt/backup (which is a rsync-based backup I do so as to actually be
> > able to see these files, and also as a more reliable backup that I can
> > actually see). Outside this, I have a / drive that is a smaller SSD. I
> > also used to have that raided, but that other / drive died and I never got
> > to replacing it.
> >
> > So, my question is: is it only the raid drive /dev/sda that is bad, or is
> > there something else that you can see based on the report?
>
> The logs only indicate that sda is bad. There are no errors for sdc.

My experience has been that manufacturers have become good at optimizing drives so they start failing just after the warranty ends. A few will fail before end-of-warranty. I used to proactively replace drives at end-of-warranty so I could pick a time when users didn't have urgent demands. I also bought a few spares to replace drives that would fail early, to minimize impacts on users. The cost of spares is much less than the cost of downtime, and there was always some non-critical need for temporary drive space which could be met by putting a spare drive in an external case.

--
George N. White III
Re: slowness with kernel 6.4.10 and software raid
On Fri Aug18'23 01:21:01PM, Samuel Sieb wrote:
> On 8/18/23 13:15, Ranjan Maitra wrote:
> > Thanks, so are there two drives that are bad? Sorry, I am confused. It is
> > likely no longer in warranty: the one with /home is new (I think) and also
> > the /mnt/backup (which is a rsync-based backup I do so as to actually be
> > able to see these files, and also as a more reliable backup that I can
> > actually see). Outside this, I have a / drive that is a smaller SSD. I
> > also used to have that raided, but that other / drive died and I never got
> > to replacing it.
> >
> > So, my question is: is it only the raid drive /dev/sda that is bad, or is
> > there something else that you can see based on the report?
>
> The logs only indicate that sda is bad. There are no errors for sdc.

Thanks very much, Sam!

Best wishes,
Ranjan
Re: slowness with kernel 6.4.10 and software raid
On 8/18/23 13:15, Ranjan Maitra wrote:
> Thanks, so are there two drives that are bad? Sorry, I am confused. It is
> likely no longer in warranty: the one with /home is new (I think) and also
> the /mnt/backup (which is a rsync-based backup I do so as to actually be
> able to see these files, and also as a more reliable backup that I can
> actually see). Outside this, I have a / drive that is a smaller SSD. I
> also used to have that raided, but that other / drive died and I never got
> to replacing it.
>
> So, my question is: is it only the raid drive /dev/sda that is bad, or is
> there something else that you can see based on the report?

The logs only indicate that sda is bad. There are no errors for sdc.
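If you want to double-check that yourself, one quick way (a sketch, using the device names from your mdstat; the grep pattern is only approximate) is to pull the kernel messages for both disks:

$ journalctl -k | grep -E '(sda|sdc)'

Lines like "I/O error, dev sda" or the md "read error corrected ... on sda" messages name the device they happened on, so it is easy to see which member of the mirror is complaining.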
Re: slowness with kernel 6.4.10 and software raid
Thanks, so are there two drives that are bad? Sorry, I am confused. It is likely no longer in warranty: the one with /home is new (I think) and also the /mnt/backup (which is a rsync-based backup I do so as to actually be able to see these files, and also as a more reliable backup that I can actually see). Outside this, I have a / drive that is a smaller SSD. I also used to have that raided, but that other / drive died and I never got to replacing it.

So, my question is: is it only the raid drive /dev/sda that is bad, or is there something else that you can see based on the report?

Many thanks, and best wishes,
Ranjan

On Fri Aug18'23 02:58:30PM, Roger Heflin wrote:
> ok. You have around 4000 sectors that are bad and are reallocated.
>
> You have around 1000 that are offline uncorrectable (reads failed).
>
> And you have a desktop drive that has a bad-sector timeout of who knows
> exactly what. I would guess at least 30 seconds; it could be higher, but
> it must be lower than the SCSI timeout of the device.
>
> Given the power-on hours, the disk is out of warranty (I think). If the
> disk was in warranty you could get the disk vendor to replace it.
>
> So whatever that timeout is, when you hit a single bad sector the disk is
> going to keep re-reading it for that timeout and then report that the
> sector cannot be read, and mdraid will then read it from the other mirror
> and re-write it.
>
> This disk could eventually fail to read each sector and mdraid could
> re-write them, and that may fix it. Or it could fix some of them on this
> pass, and some on the next pass, and never fix all of them, so sda simply
> sucks.
>
> Best idea would be to buy a new disk, but this time do not buy a desktop
> drive nor an SMR drive. There is a webpage someplace that lists which
> disks are not SMR disks, and other webpages list which disks have a
> settable timeout (WD Red Plus and/or Seagate Ironwolf, and likely others).
>
> Likely the disks will be classified as enterprise and/or NAS disks, but
> whatever you look at, make sure to check the vendor's list to see if the
> disk is SMR or not. Note WD Red is SMR, WD Red Plus is not SMR. And SMR
> sometimes does not play nice with raid.
>
> On Fri, Aug 18, 2023 at 2:05 PM Ranjan Maitra wrote:
> >
> > On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> > > The above makes it very clear what is happening. What kind of disks
> > > are these? And did you set the scterc timeout? You can see it via
> > > smartctl -l scterc /dev/sda and then repeat on the other disk.
> > >
> > > Setting the timeout as low as you can will improve this situation
> > > some, but it appears that sda has a number of bad sectors on it.
> > >
> > > A full output of "smartctl --xall /dev/sda" would be useful also, to
> > > see how bad it is.
> > >
> > > Short answer is you probably need a new device for sda.
> >
> > Thanks!
> >
> > I tried:
> >
> > # smartctl -l scterc /dev/sda
> > smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
> > Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > SCT Error Recovery Control command not supported
> >
> > # smartctl --xall /dev/sda
> > smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
> > Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Seagate Barracuda 7200.14 (AF)
> > Device Model:     ST2000DM001-1ER164
> > Serial Number:    Z4Z5F3LE
> > LU WWN Device Id: 5 000c50 091167f04
> > Firmware Version: CC27
> > User Capacity:    2,000,398,934,016 bytes [2.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    7200 rpm
> > Form Factor:      3.5 inches
> > Device is:        In smartctl database 7.3/5528
> > ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
> > SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Re: slowness with kernel 6.4.10 and software raid
ok. You have around 4000 sectors that are bad and are reallocated.

You have around 1000 that are offline uncorrectable (reads failed).

And you have a desktop drive that has a bad-sector timeout of who knows exactly what. I would guess at least 30 seconds; it could be higher, but it must be lower than the SCSI timeout of the device.

Given the power-on hours, the disk is out of warranty (I think). If the disk was in warranty you could get the disk vendor to replace it.

So whatever that timeout is, when you hit a single bad sector the disk is going to keep re-reading it for that timeout and then report that the sector cannot be read, and mdraid will then read it from the other mirror and re-write it.

This disk could eventually fail to read each sector and mdraid could re-write them, and that may fix it. Or it could fix some of them on this pass, and some on the next pass, and never fix all of them, so sda simply sucks.

Best idea would be to buy a new disk, but this time do not buy a desktop drive nor an SMR drive. There is a webpage someplace that lists which disks are not SMR disks, and other webpages list which disks have a settable timeout (WD Red Plus and/or Seagate Ironwolf, and likely others).

Likely the disks will be classified as enterprise and/or NAS disks, but whatever you look at, make sure to check the vendor's list to see if the disk is SMR or not. Note WD Red is SMR, WD Red Plus is not SMR. And SMR sometimes does not play nice with raid.

On Fri, Aug 18, 2023 at 2:05 PM Ranjan Maitra wrote:
>
> On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> > The above makes it very clear what is happening. What kind of disks
> > are these? And did you set the scterc timeout? You can see it via
> > smartctl -l scterc /dev/sda and then repeat on the other disk.
> >
> > Setting the timeout as low as you can will improve this situation
> > some, but it appears that sda has a number of bad sectors on it.
> >
> > A full output of "smartctl --xall /dev/sda" would be useful also, to
> > see how bad it is.
> >
> > Short answer is you probably need a new device for sda.
>
> Thanks!
>
> I tried:
>
> # smartctl -l scterc /dev/sda
> smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
> Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control command not supported
>
> # smartctl --xall /dev/sda
> smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
> Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.14 (AF)
> Device Model:     ST2000DM001-1ER164
> Serial Number:    Z4Z5F3LE
> LU WWN Device Id: 5 000c50 091167f04
> Firmware Version: CC27
> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    7200 rpm
> Form Factor:      3.5 inches
> Device is:        In smartctl database 7.3/5528
> ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Fri Aug 18 14:01:28 2023 CDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> AAM feature is:   Unavailable
> APM level is:     128 (minimum power consumption without standby)
> Rd look-ahead is: Enabled
> Write cache is:   Enabled
> DSN feature is:   Unavailable
> ATA Security is:  Disabled, NOT FROZEN [SEC1]
> Wt Cache Reorder: Unavailable
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (  80) seconds.
> Offline data collection
> capabilities:                    (0x73) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new command.
>                                         No Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        ( 212) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x1085) SCT Status supported.
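For a drive that does support SCT ERC, setting it looks like this (a sketch, not from this box; the two values are the read and write recovery limits in tenths of a second, so 70 = 7 seconds):

# smartctl -l scterc /dev/sda          (query the current setting)
# smartctl -l scterc,70,70 /dev/sda    (limit recovery to 7 seconds each way)

On most drives the setting does not survive a power cycle, so people re-apply it at boot (udev rule, rc.local, or similar). Your Barracuda reports the command as not supported, so that option is out for the current disk, but it is worth checking on whatever replacement you buy.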
Re: slowness with kernel 6.4.10 and software raid
On Fri Aug18'23 01:39:08PM, Roger Heflin wrote:
> The above makes it very clear what is happening. What kind of disks
> are these? And did you set the scterc timeout? You can see it via
> smartctl -l scterc /dev/sda and then repeat on the other disk.
>
> Setting the timeout as low as you can will improve this situation
> some, but it appears that sda has a number of bad sectors on it.
>
> A full output of "smartctl --xall /dev/sda" would be useful also, to
> see how bad it is.
>
> Short answer is you probably need a new device for sda.

Thanks!

I tried:

# smartctl -l scterc /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported

# smartctl --xall /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.10-200.fc38.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1ER164
Serial Number:    Z4Z5F3LE
LU WWN Device Id: 5 000c50 091167f04
Firmware Version: CC27
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 18 14:01:28 2023 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  80) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 212) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   116   092   006    -    106200704
  3 Spin_Up_Time            PO----   096   096   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    97
  5 Reallocated_Sector_Ct   PO--CK   097   097   010    -    3960
  7 Seek_Error_Rate         POSR--   084   060   030    -    333268033
  9 Power_On_Hours          -O--CK   062   062   000    -    34085
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    96
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099
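For reference, the handful of attributes that matter here can be pulled out on their own (standard smartctl usage; attribute names can differ slightly between vendors, so the grep pattern is approximate):

# smartctl -A /dev/sda | grep -E 'Reallocated|Pending|Uncorrect'

Raw values that keep climbing between runs (Reallocated_Sector_Ct is already at 3960 above) are the classic replace-the-drive signal.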
Re: slowness with kernel 6.4.10 and software raid
The above makes it very clear what is happening. What kind of disks are these? And did you set the scterc timeout? You can see it via smartctl -l scterc /dev/sda and then repeat on the other disk.

Setting the timeout as low as you can will improve this situation some, but it appears that sda has a number of bad sectors on it.

A full output of "smartctl --xall /dev/sda" would be useful also, to see how bad it is.

Short answer is you probably need a new device for sda.

On Fri, Aug 18, 2023 at 1:30 PM Ranjan Maitra wrote:
>
> Thanks, Roger!
>
> On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> > Is it moving at all or just stopped? If just stopped it appears that
> > md126 is using external:/md127 for something and md127 looks wrong
> > (both disks are spare) but I don't know in this external case what
> > md127 should look like.
>
> It is moving, slowly. It is a 2 TB drive, but this is weird.
>
> > I would suggest checking messages with grep md12[67] /var/log/messages
> > (and older messages files if the reboot was not this week) to see what
> > is going on.
>
> Good idea! Here is the result from
>
> $ grep md126 /var/log/messages
>
> Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md device /dev/md126
> Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 mirrors
> Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 3711741952
> Aug 16 14:21:20 localhost kernel: md126: p1
> Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in dev-md126p1.device - /dev/md126p1 being skipped.
> Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 7345384/115998720 files, 409971205/463967488 blocks
> Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: none.
> Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device /dev/md126
> Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md device /dev/md126
> Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
> Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900352
> Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900864
> Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900496 on sda)
> Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900568 on sda)
> Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900576 on sda)
> Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900624 on sda)
> Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900640 on sda)
> Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900688 on sda)
> Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900712 on sda)
> Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900792 on sda)
> Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 2735900352 to other mirror: sdc
> Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900872 on sda)
> Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900920 on sda)
> Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900992 on sda)
> Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 2735900864 to other mirror: sdc
> Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905704
> Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905960
> Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735905768 on sda)
> Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 2735905704 to other mirror: sdc
> Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906120 on sda)
> Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906192 on sda)
> Aug 16 19:35:39 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906448 on sda)
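Since this drive does not support SCT ERC, the other knob is on the kernel side: raise the SCSI command timeout so the kernel does not give up and reset the link while the drive is still retrying internally (a sketch; sda as in your mdstat, value in seconds, default 30, and it resets at reboot):

$ cat /sys/block/sda/device/timeout
# echo 180 > /sys/block/sda/device/timeout

That does nothing for the bad sectors themselves; it just keeps a long internal retry from escalating into a device reset.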
Re: slowness with kernel 6.4.10 and software raid
Thanks, Roger!

On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> Is it moving at all or just stopped? If just stopped it appears that
> md126 is using external:/md127 for something and md127 looks wrong
> (both disks are spare) but I don't know in this external case what
> md127 should look like.

It is moving, slowly. It is a 2 TB drive, but this is weird.

> I would suggest checking messages with grep md12[67] /var/log/messages
> (and older messages files if the reboot was not this week) to see what
> is going on.

Good idea! Here is the result from

$ grep md126 /var/log/messages

Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md device /dev/md126
Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 mirrors
Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 3711741952
Aug 16 14:21:20 localhost kernel: md126: p1
Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in dev-md126p1.device - /dev/md126p1 being skipped.
Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 7345384/115998720 files, 409971205/463967488 blocks
Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: none.
Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device /dev/md126
Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md device /dev/md126
Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900352
Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900864
Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900496 on sda)
Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900568 on sda)
Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900576 on sda)
Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900624 on sda)
Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900640 on sda)
Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900688 on sda)
Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900712 on sda)
Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900792 on sda)
Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 2735900352 to other mirror: sdc
Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900872 on sda)
Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900920 on sda)
Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900992 on sda)
Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 2735900864 to other mirror: sdc
Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905704
Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905960
Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735905768 on sda)
Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 2735905704 to other mirror: sdc
Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906120 on sda)
Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906192 on sda)
Aug 16 19:35:39 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906448 on sda)
Aug 16 19:35:40 localhost kernel: md/raid1:md126: redirecting sector 2735905960 to other mirror: sdc
Aug 16 19:35:45 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735906472
Aug 16 19:35:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906504 on sda)
Aug 16 19:35:52 localhost kernel: md/raid1:md126: redirecting sector 2735906472 to other mirror: sdc
Aug 16 19:36:03 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
Aug 16 19:36:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908232 on sda)
Aug 16 19:36:16 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908344 on sda)
Aug 16 19:36:21 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908424 on sda)
Aug 16 19:36:21 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sdc
Re: slowness with kernel 6.4.10 and software raid
Is it moving at all or just stopped? If just stopped, it appears that md126 is using external:/md127 for something, and md127 looks wrong (both disks are spare), but I don't know in this external case what md127 should look like.

I would suggest checking messages with grep md12[67] /var/log/messages (and older messages files if the reboot was not this week) to see what is going on.

Maybe also, if you have a prior good reboot in the messages file, include that and see what happened differently between the two.

On Fri, Aug 18, 2023 at 7:46 AM Ranjan Maitra wrote:
>
> On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> > On 8/17/23 21:38, Ranjan Maitra wrote:
> > > $ cat /proc/mdstat
> > > Personalities : [raid1]
> > > md126 : active raid1 sda[1] sdc[0]
> > >       1855870976 blocks super external:/md127/0 [2/2] [UU]
> > >       [=>...]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec
> > >
> > > md127 : inactive sda[1](S) sdc[0](S)
> > >       10402 blocks super external:imsm
> > >
> > > unused devices: <none>
> > >
> > > I am not sure what it is doing, and I am a bit concerned that this will
> > > go on at this rate for about 20 days. Not knowing what will happen after
> > > that, and also if this problem will recur with another reboot.
> >
> > After a certain amount of time, mdraid will do a verification of the data
> > where it scans the entire array. If you reboot, it will continue from
> > where it left off. But that is *really* slow, so you should find out
> > what's going on there.
>
> Yes, I know, just not sure what to do. Thanks very much!
>
> Any suggestion is appreciated!
>
> Best wishes,
> Ranjan
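To pull in the older, rotated messages files in one shot, something like this should work (a sketch; rotated file names depend on your logrotate configuration):

$ zgrep -h 'md12[67]' /var/log/messages* | less

zgrep reads both the current file and any compressed rotations, so a prior good boot will show up alongside this one.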
Re: slowness with kernel 6.4.10 and software raid
On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> On 8/17/23 21:38, Ranjan Maitra wrote:
> > $ cat /proc/mdstat
> > Personalities : [raid1]
> > md126 : active raid1 sda[1] sdc[0]
> >       1855870976 blocks super external:/md127/0 [2/2] [UU]
> >       [=>...]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec
> >
> > md127 : inactive sda[1](S) sdc[0](S)
> >       10402 blocks super external:imsm
> >
> > unused devices: <none>
> >
> > I am not sure what it is doing, and I am a bit concerned that this will
> > go on at this rate for about 20 days. Not knowing what will happen after
> > that, and also if this problem will recur with another reboot.
>
> After a certain amount of time, mdraid will do a verification of the data
> where it scans the entire array. If you reboot, it will continue from where
> it left off. But that is *really* slow, so you should find out what's going
> on there.

Yes, I know, just not sure what to do. Thanks very much!

Any suggestion is appreciated!

Best wishes,
Ranjan
Re: slowness with kernel 6.4.10 and software raid
On 8/17/23 21:38, Ranjan Maitra wrote:
> $ cat /proc/mdstat
> Personalities : [raid1]
> md126 : active raid1 sda[1] sdc[0]
>       1855870976 blocks super external:/md127/0 [2/2] [UU]
>       [=>...]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec
>
> md127 : inactive sda[1](S) sdc[0](S)
>       10402 blocks super external:imsm
>
> unused devices: <none>
>
> I am not sure what it is doing, and I am a bit concerned that this will go
> on at this rate for about 20 days. Not knowing what will happen after that,
> and also if this problem will recur with another reboot.

After a certain amount of time, mdraid will do a verification of the data where it scans the entire array. If you reboot, it will continue from where it left off. But that is *really* slow, so you should find out what's going on there.
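If the machine needs to be usable in the meantime, the check can be stopped or throttled by hand (a sketch against the md126 name from your mdstat; untested here with external/IMSM containers, where mdmon may kick the check off again):

$ cat /sys/block/md126/md/sync_action          (shows "check" while it runs)
# echo idle > /sys/block/md126/md/sync_action  (abort the current check)
$ cat /proc/sys/dev/raid/speed_limit_min
# echo 50000 > /proc/sys/dev/raid/speed_limit_min

But 619K/sec is not a tuning problem; a healthy mirror checks at tens of MB/sec, so something is making the reads crawl.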
slowness with kernel 6.4.10 and software raid
Hi,

I am at a bit of a loss. After about two months, I decided to reboot a few days ago, and rebooted into the 6.4.10 kernel. Everything came up, but the machine is very unresponsive, and practically unusable when I am at it. It is better when working on it remotely, but it is still slower. I was going to reboot again, but I got the following:

$ reboot
Operation inhibited by "Disk Manager" (PID 1044 "udisksd", user root), reason is "Unknown (mdraid-check-job)".
Please retry operation after closing inhibitors and logging out other users.
Alternatively, ignore inhibitors and users with 'systemctl reboot -i'.

This pointed me to the possibility of the issue being raid. So, I looked at:

$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda[1] sdc[0]
      1855870976 blocks super external:/md127/0 [2/2] [UU]
      [=>...]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec

md127 : inactive sda[1](S) sdc[0](S)
      10402 blocks super external:imsm

unused devices: <none>

I am not sure what it is doing, and I am a bit concerned that this will go on at this rate for about 20 days. Not knowing what will happen after that, and also if this problem will recur with another reboot.

The machine is a 10-core, 20-thread Dell T5810. It is running a fully updated Fedora 38 installation.

Any help/suggestions on what to do?

Many thanks and best wishes,
Ranjan
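PS: For anyone searching later, the inhibitor named in that reboot refusal can be listed directly (standard systemd tooling):

$ systemd-inhibit --list

which should show the same udisksd "mdraid-check-job" entry the reboot message complained about; 'systemctl reboot -i' overrides it, though letting the check finish, or stopping it cleanly first, seems kinder.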