The log above makes it very clear what is happening. What kind of disks are these? And did you set the SCT ERC (scterc) timeout? You can check it with "smartctl -l scterc /dev/sda", then repeat on the other disk. Setting the timeout as low as you can will improve this situation some, but it appears that sda has a number of bad sectors on it. A full output of "smartctl --xall /dev/sda" would also be useful to see how bad it is. Short answer: you probably need a new device for sda.
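For example, something like this (a rough sketch: the two scterc values are the read and write timeouts in tenths of a second, so 70 means 7.0 seconds; not every drive supports it, and on most drives the setting does not survive a power cycle, so it has to be reapplied at boot, e.g. from a udev rule or boot script):

$ sudo smartctl -l scterc /dev/sda               # show current ERC timeouts
$ sudo smartctl -l scterc,70,70 /dev/sda         # set 7.0s read/write ERC, if supported
$ sudo smartctl -l scterc /dev/sdc
$ sudo smartctl -l scterc,70,70 /dev/sdc
$ sudo smartctl --xall /dev/sda > sda-smart.txt  # full SMART dump to post here

And if sda does turn out to need replacing, the outline would be something like the below. This is untested on my side, and since your array uses IMSM external metadata (the md127 container) the remove/add steps are typically done against the container rather than md126 itself, so double-check against mdadm(8) before running anything; /dev/sdX is a placeholder for the replacement disk:

$ sudo mdadm --manage /dev/md126 --fail /dev/sda     # mark the bad member failed
$ sudo mdadm --manage /dev/md127 --remove /dev/sda   # remove it from the IMSM container
$ sudo mdadm --manage /dev/md127 --add /dev/sdX      # add the replacement to the container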
On Fri, Aug 18, 2023 at 1:30 PM Ranjan Maitra <mlmai...@gmx.com> wrote:
>
> Thanks, Roger!
>
> On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> > From: Roger Heflin <rogerhef...@gmail.com>
> > Date: Fri, 18 Aug 2023 12:23:23 -0500
> > To: Community support for Fedora users <users@lists.fedoraproject.org>
> > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > Subject: Re: slowness with kernel 6.4.10 and software raid
> >
> > Is it moving at all or just stopped? If just stopped it appears that
> > md126 is using external:/md127 for something and md127 looks wrong
> > (both disks are spare) but I don't know in this external case what
> > md127 should look like.
>
> It is moving, slowly. It is a 2 TB drive, but this is weird.
>
> > I would suggest checking messages with grep md12[67] /var/log/messages
> > (and older messages files if the reboot was not this week) to see what
> > is going on.
>
> Good idea! Here is the result from
>
> $ grep md126 /var/log/messages
>
> Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md device /dev/md126
> Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 mirrors
> Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 3711741952
> Aug 16 14:21:20 localhost kernel: md126: p1
> Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in dev-md126p1.device - /dev/md126p1 being skipped.
> Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 7345384/115998720 files, 409971205/463967488 blocks
> Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: none.
> Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device /dev/md126
> Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md device /dev/md126
> Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
> Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900352
> Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900864
> Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900496 on sda)
> Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900568 on sda)
> Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900576 on sda)
> Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900624 on sda)
> Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900640 on sda)
> Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900688 on sda)
> Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900712 on sda)
> Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900792 on sda)
> Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 2735900352 to other mirror: sdc
> Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900872 on sda)
> Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900920 on sda)
> Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900992 on sda)
> Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 2735900864 to other mirror: sdc
> Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905704
> Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905960
> Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735905768 on sda)
> Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 2735905704 to other mirror: sdc
> Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906120 on sda)
> Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906192 on sda)
> Aug 16 19:35:39 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906448 on sda)
> Aug 16 19:35:40 localhost kernel: md/raid1:md126: redirecting sector 2735905960 to other mirror: sdc
> Aug 16 19:35:45 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735906472
> Aug 16 19:35:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906504 on sda)
> Aug 16 19:35:52 localhost kernel: md/raid1:md126: redirecting sector 2735906472 to other mirror: sdc
> Aug 16 19:36:03 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
> Aug 16 19:36:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908232 on sda)
> Aug 16 19:36:16 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908344 on sda)
> Aug 16 19:36:21 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908424 on sda)
> Aug 16 19:36:21 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sda
> Aug 16 19:36:30 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
> Aug 16 19:36:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908296 on sda)
> Aug 16 19:36:38 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sdc
> Aug 16 19:36:42 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908776
> Aug 16 19:36:42 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735909032
> Aug 16 19:36:46 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908784 on sda)
> Aug 16 19:36:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908944 on sda)
> Aug 16 19:36:50 localhost kernel: md/raid1:md126: redirecting sector 2735908776 to other mirror: sdc
> Aug 16 19:36:55 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909312 on sda)
> Aug 16 19:37:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909360 on sda)
> Aug 16 19:37:04 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909400 on sda)
> Aug 16 19:37:11 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909520 on sda)
> Aug 16 19:37:11 localhost kernel: md/raid1:md126: redirecting sector 2735909032 to other mirror: sdc
> Aug 16 19:37:21 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735910056
> Aug 16 19:37:21 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735910568
> Aug 16 19:37:25 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910064 on sda)
> Aug 16 19:37:31 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910080 on sda)
> Aug 16 19:38:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910128 on sda)
> Aug 16 19:38:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910240 on sda)
> Aug 16 19:38:12 localhost kernel: md/raid1:md126: redirecting sector 2735910056 to other mirror: sdc
> Aug 16 19:38:15 localhost kernel: md/raid1:md126: redirecting sector 2735910568 to other mirror: sdc
> Aug 16 19:38:23 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735911080
> Aug 16 19:38:23 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735911592
> Aug 16 19:38:27 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735911520 on sda)
> Aug 16 19:38:27 localhost kernel: md/raid1:md126: redirecting sector 2735911080 to other mirror: sdc
> Aug 16 19:38:28 localhost kernel: md/raid1:md126: redirecting sector 2735911592 to other mirror: sdc
> Aug 16 19:38:33 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735912104
> Aug 16 19:38:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912184 on sda)
> Aug 16 19:38:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912240 on sda)
> Aug 16 19:38:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912248 on sda)
> Aug 16 19:38:59 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912288 on sda)
> Aug 16 19:39:05 localhost kernel: md/raid1:md126: redirecting sector 2735912104 to other mirror: sdc
> Aug 16 19:39:10 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735912872
> Aug 16 19:39:14 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735913128
> Aug 16 19:39:25 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912976 on sda)
> Aug 16 19:39:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913048 on sda)
> Aug 16 19:39:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913072 on sda)
> Aug 16 19:39:41 localhost kernel: md/raid1:md126: redirecting sector 2735912872 to other mirror: sdc
> Aug 16 19:39:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913128 on sda)
> Aug 16 19:39:55 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913176 on sda)
> Aug 16 19:40:05 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913232 on sda)
>
> And here is what I get from:
>
> $ grep md127 /var/log/messages
>
> Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Deactivated successfully.
> Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Unit process 884 (mdmon) remains running after unit stopped.
> Aug 16 14:16:38 localhost systemd[1]: Stopped mdmon@md127.service - MD Metadata Monitor on /dev/md127.
> Aug 16 14:16:38 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdmon@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Consumed 41.719s CPU time.
> Aug 16 14:21:20 localhost systemd[1]: Starting mdmon@md127.service - MD Metadata Monitor on /dev/md127...
> Aug 16 14:21:20 localhost systemd[1]: Started mdmon@md127.service - MD Metadata Monitor on /dev/md127.
> Aug 16 14:21:20 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdmon@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>
> > Maybe also if you have a prior good reboot in messages file include
> > that and see what happened differently between the 2.
>
> Yeah, I do not know where to find this. I looked into /var/log/messages,
> but it looks like it starts on August 13, which was a surprise to me, and
> the last non-responsive instance for me was last week (August 10, I think,
> when I booted into the 6.4 kernel). I did reboot in frustration on August 16.
>
> Thanks,
> Ranjan
>
> > On Fri, Aug 18, 2023 at 7:46 AM Ranjan Maitra <mlmai...@gmx.com> wrote:
> > >
> > > On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> > > > From: Samuel Sieb <sam...@sieb.net>
> > > > Date: Thu, 17 Aug 2023 22:37:29 -0700
> > > > To: users@lists.fedoraproject.org
> > > > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > > Subject: Re: slowness with kernel 6.4.10 and software raid
> > > >
> > > > On 8/17/23 21:38, Ranjan Maitra wrote:
> > > > > $ cat /proc/mdstat
> > > > > Personalities : [raid1]
> > > > > md126 : active raid1 sda[1] sdc[0]
> > > > >       1855870976 blocks super external:/md127/0 [2/2] [UU]
> > > > >       [=>...................]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec
> > > > >
> > > > > md127 : inactive sda[1](S) sdc[0](S)
> > > > >       10402 blocks super external:imsm
> > > > >
> > > > > unused devices: <none>
> > > > >
> > > > > I am not sure what it is doing, and I am a bit concerned that this
> > > > > will go on at this rate for about 20 days. No knowing what will
> > > > > happen after that, and also if this problem will recur with another
> > > > > reboot.
> > > >
> > > > After a certain amount of time, mdraid will do a verification of the
> > > > data where it scans the entire array. If you reboot, it will continue
> > > > from where it left off. But that is *really* slow, so you should find
> > > > out what's going on there.
> > >
> > > Yes, I know, just not sure what to do. Thanks very much!
> > >
> > > Any suggestion is appreciated!
> > >
> > > Best wishes,
> > > Ranjan
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue