The log above makes it very clear what is happening. What kind of disks are these? And did you set the SCT ERC (scterc) timeout? You can check it with "smartctl -l scterc /dev/sda", then repeat on the other disk. Setting the timeout as low as you can will improve this situation some, but it appears that sda has a number of bad sectors on it. A full output of "smartctl --xall /dev/sda" would also be useful to see how bad it is. Short answer: you probably need a new device for sda.
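For example, something like this (a rough sketch: the two scterc values are the read and write timeouts in tenths of a second, so 70 means 7.0 seconds; not every drive supports it, and on most drives the setting does not survive a power cycle, so it has to be reapplied at boot, e.g. from a udev rule or boot script):

$ sudo smartctl -l scterc /dev/sda               # show current ERC timeouts
$ sudo smartctl -l scterc,70,70 /dev/sda         # set 7.0s read/write ERC, if supported
$ sudo smartctl -l scterc /dev/sdc
$ sudo smartctl -l scterc,70,70 /dev/sdc
$ sudo smartctl --xall /dev/sda > sda-smart.txt  # full SMART dump to post here

And if sda does turn out to need replacing, the outline would be something like the below. This is untested on my side, and since your array uses IMSM external metadata (the md127 container) the remove/add steps are typically done against the container rather than md126 itself, so double-check against mdadm(8) before running anything; /dev/sdX is a placeholder for the replacement disk:

$ sudo mdadm --manage /dev/md126 --fail /dev/sda     # mark the bad member failed
$ sudo mdadm --manage /dev/md127 --remove /dev/sda   # remove it from the IMSM container
$ sudo mdadm --manage /dev/md127 --add /dev/sdX      # add the replacement to the container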
On Fri, Aug 18, 2023 at 1:30 PM Ranjan Maitra <mlmai...@gmx.com> wrote:
>
> Thanks, Roger!
>
> On Fri Aug18'23 12:23:23PM, Roger Heflin wrote:
> > From: Roger Heflin <rogerhef...@gmail.com>
> > Date: Fri, 18 Aug 2023 12:23:23 -0500
> > To: Community support for Fedora users <users@lists.fedoraproject.org>
> > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > Subject: Re: slowness with kernel 6.4.10 and software raid
> >
> > Is it moving at all or just stopped? If just stopped it appears that
> > md126 is using external:/md127 for something and md127 looks wrong
> > (both disks are spare) but I don't know in this external case what
> > md127 should look like.
>
> It is moving, slowly. It is a 2 TB drive, but this is weird.
>
> > I would suggest checking messages with grep md12[67] /var/log/messages
> > (and older messages files if the reboot was not this week) to see what
> > is going on.
>
> Good idea! Here is the result from
>
> $ grep md126 /var/log/messages
>
> Aug 14 15:02:30 localhost mdadm[1035]: Rebuild60 event detected on md device /dev/md126
> Aug 16 14:21:20 localhost kernel: md/raid1:md126: active with 2 out of 2 mirrors
> Aug 16 14:21:20 localhost kernel: md126: detected capacity change from 0 to 3711741952
> Aug 16 14:21:20 localhost kernel: md126: p1
> Aug 16 14:21:23 localhost systemd[1]: Condition check resulted in dev-md126p1.device - /dev/md126p1 being skipped.
> Aug 16 14:21:28 localhost systemd-fsck[942]: /dev/md126p1: clean, 7345384/115998720 files, 409971205/463967488 blocks
> Aug 16 14:21:31 localhost kernel: EXT4-fs (md126p1): mounted filesystem 932eb81c-2ab4-4e6e-b093-46e43dbd6c28 r/w with ordered data mode. Quota mode: none.
> Aug 16 14:21:31 localhost mdadm[1033]: NewArray event detected on md device /dev/md126
> Aug 16 14:21:31 localhost mdadm[1033]: RebuildStarted event detected on md device /dev/md126
> Aug 16 14:21:31 localhost kernel: md: data-check of RAID array md126
> Aug 16 19:33:18 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900352
> Aug 16 19:33:22 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735900864
> Aug 16 19:33:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900496 on sda)
> Aug 16 19:33:36 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900568 on sda)
> Aug 16 19:33:41 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900576 on sda)
> Aug 16 19:33:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900624 on sda)
> Aug 16 19:34:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900640 on sda)
> Aug 16 19:34:10 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900688 on sda)
> Aug 16 19:34:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900712 on sda)
> Aug 16 19:34:28 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900792 on sda)
> Aug 16 19:34:32 localhost kernel: md/raid1:md126: redirecting sector 2735900352 to other mirror: sdc
> Aug 16 19:34:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900872 on sda)
> Aug 16 19:34:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900920 on sda)
> Aug 16 19:34:54 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735900992 on sda)
> Aug 16 19:34:54 localhost kernel: md/raid1:md126: redirecting sector 2735900864 to other mirror: sdc
> Aug 16 19:35:07 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905704
> Aug 16 19:35:11 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735905960
> Aug 16 19:35:18 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735905768 on sda)
> Aug 16 19:35:19 localhost kernel: md/raid1:md126: redirecting sector 2735905704 to other mirror: sdc
> Aug 16 19:35:24 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906120 on sda)
> Aug 16 19:35:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906192 on sda)
> Aug 16 19:35:39 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906448 on sda)
> Aug 16 19:35:40 localhost kernel: md/raid1:md126: redirecting sector 2735905960 to other mirror: sdc
> Aug 16 19:35:45 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735906472
> Aug 16 19:35:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735906504 on sda)
> Aug 16 19:35:52 localhost kernel: md/raid1:md126: redirecting sector 2735906472 to other mirror: sdc
> Aug 16 19:36:03 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
> Aug 16 19:36:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908232 on sda)
> Aug 16 19:36:16 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908344 on sda)
> Aug 16 19:36:21 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908424 on sda)
> Aug 16 19:36:21 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sda
> Aug 16 19:36:30 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908008
> Aug 16 19:36:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908296 on sda)
> Aug 16 19:36:38 localhost kernel: md/raid1:md126: redirecting sector 2735908008 to other mirror: sdc
> Aug 16 19:36:42 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735908776
> Aug 16 19:36:42 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735909032
> Aug 16 19:36:46 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908784 on sda)
> Aug 16 19:36:50 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735908944 on sda)
> Aug 16 19:36:50 localhost kernel: md/raid1:md126: redirecting sector 2735908776 to other mirror: sdc
> Aug 16 19:36:55 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909312 on sda)
> Aug 16 19:37:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909360 on sda)
> Aug 16 19:37:04 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909400 on sda)
> Aug 16 19:37:11 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735909520 on sda)
> Aug 16 19:37:11 localhost kernel: md/raid1:md126: redirecting sector 2735909032 to other mirror: sdc
> Aug 16 19:37:21 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735910056
> Aug 16 19:37:21 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735910568
> Aug 16 19:37:25 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910064 on sda)
> Aug 16 19:37:31 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910080 on sda)
> Aug 16 19:38:00 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910128 on sda)
> Aug 16 19:38:08 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735910240 on sda)
> Aug 16 19:38:12 localhost kernel: md/raid1:md126: redirecting sector 2735910056 to other mirror: sdc
> Aug 16 19:38:15 localhost kernel: md/raid1:md126: redirecting sector 2735910568 to other mirror: sdc
> Aug 16 19:38:23 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735911080
> Aug 16 19:38:23 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735911592
> Aug 16 19:38:27 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735911520 on sda)
> Aug 16 19:38:27 localhost kernel: md/raid1:md126: redirecting sector 2735911080 to other mirror: sdc
> Aug 16 19:38:28 localhost kernel: md/raid1:md126: redirecting sector 2735911592 to other mirror: sdc
> Aug 16 19:38:33 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735912104
> Aug 16 19:38:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912184 on sda)
> Aug 16 19:38:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912240 on sda)
> Aug 16 19:38:49 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912248 on sda)
> Aug 16 19:38:59 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912288 on sda)
> Aug 16 19:39:05 localhost kernel: md/raid1:md126: redirecting sector 2735912104 to other mirror: sdc
> Aug 16 19:39:10 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735912872
> Aug 16 19:39:14 localhost kernel: md/raid1:md126: sda: rescheduling sector 2735913128
> Aug 16 19:39:25 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735912976 on sda)
> Aug 16 19:39:33 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913048 on sda)
> Aug 16 19:39:37 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913072 on sda)
> Aug 16 19:39:41 localhost kernel: md/raid1:md126: redirecting sector 2735912872 to other mirror: sdc
> Aug 16 19:39:45 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913128 on sda)
> Aug 16 19:39:55 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913176 on sda)
> Aug 16 19:40:05 localhost kernel: md/raid1:md126: read error corrected (8 sectors at 2735913232 on sda)
>
> And here is what I get from:
>
> $ grep md127 /var/log/messages
>
> Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Deactivated successfully.
> Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Unit process 884 (mdmon) remains running after unit stopped.
> Aug 16 14:16:38 localhost systemd[1]: Stopped mdmon@md127.service - MD Metadata Monitor on /dev/md127.
> Aug 16 14:16:38 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdmon@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Aug 16 14:16:38 localhost systemd[1]: mdmon@md127.service: Consumed 41.719s CPU time.
> Aug 16 14:21:20 localhost systemd[1]: Starting mdmon@md127.service - MD Metadata Monitor on /dev/md127...
> Aug 16 14:21:20 localhost systemd[1]: Started mdmon@md127.service - MD Metadata Monitor on /dev/md127.
> Aug 16 14:21:20 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdmon@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>
> > Maybe also if you have a prior good reboot in messages file include
> > that and see what happened differently between the 2.
>
> Yeah, I do not know where to find this. I looked into /var/log/messages,
> but it looks like it starts on August 13, which was a surprise to me, and
> the last non-responsive instance for me was last week (August 10, I think,
> when I booted into the 6.4 kernel). I did reboot in frustration on August 16.
>
> Thanks,
> Ranjan
>
> > On Fri, Aug 18, 2023 at 7:46 AM Ranjan Maitra <mlmai...@gmx.com> wrote:
> > >
> > > On Thu Aug17'23 10:37:29PM, Samuel Sieb wrote:
> > > > From: Samuel Sieb <sam...@sieb.net>
> > > > Date: Thu, 17 Aug 2023 22:37:29 -0700
> > > > To: users@lists.fedoraproject.org
> > > > Reply-To: Community support for Fedora users <users@lists.fedoraproject.org>
> > > > Subject: Re: slowness with kernel 6.4.10 and software raid
> > > >
> > > > On 8/17/23 21:38, Ranjan Maitra wrote:
> > > > > $ cat /proc/mdstat
> > > > > Personalities : [raid1]
> > > > > md126 : active raid1 sda[1] sdc[0]
> > > > >       1855870976 blocks super external:/md127/0 [2/2] [UU]
> > > > >       [=>...................]  check =  8.8% (165001216/1855870976) finish=45465.2min speed=619K/sec
> > > > >
> > > > > md127 : inactive sda[1](S) sdc[0](S)
> > > > >       10402 blocks super external:imsm
> > > > >
> > > > > unused devices: <none>
> > > > >
> > > > > I am not sure what it is doing, and I am a bit concerned that this
> > > > > will go on at this rate for about 20 days. No knowing what will
> > > > > happen after that, and also if this problem will recur with another
> > > > > reboot.
> > > >
> > > > After a certain amount of time, mdraid will do a verification of the
> > > > data where it scans the entire array. If you reboot, it will continue
> > > > from where it left off. But that is *really* slow, so you should find
> > > > out what's going on there.
> > >
> > > Yes, I know, just not sure what to do. Thanks very much!
> > >
> > > Any suggestion is appreciated!
> > >
> > > Best wishes,
> > > Ranjan
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue