On Sat, Nov 14, 2020 at 12:51 PM Mike Keehan via Qubes OS Community Forum <qubes...@discoursemail.com> wrote:
> Mike_Keehan <https://qubes-os.discourse.group/u/mike_keehan>
> November 14
>
> Well, the thin pool is LVM, but if the VM is offline, there should not be
> a problem. Guess you'll have to investigate all the logs you can
> find.

I finally have the answer! Thankfully, this problem had nothing to do with R4.0.4; it was a brand-new disk drive failing (MTBF <= 5.2 days, likely earlier) in a rather odd way.

What had me stumped was why the VMs seemed to run fine while the backup process hung completely when reading the exact same volumes. It appears that all the misbehaving VMs were allocated on the same physical drive, yet nothing ever reported any kind of error while they were reading from it. It was likely the per-VM metadata needed by the backup system that failed first.

Fortunately, the drive's built-in SMART log holds records of the last 4 errors, which are easy to check, and this allowed me to identify which physical drive needed to be yanked and replaced. Because this is a brand-new system, I did not yet know which logical drive mapped to which physical drive. To analyse the problem, I used a variant of the smartctl tool on another system to check the logs stored physically within the drive.

Since checking each drive this way is relatively efficient and easy, it seems to me that there must be an automated way to check these error logs and notify the user when a drive is starting to fail. My Qubes system was completely silent; it was only the odd behaviour of the backup system that forced me to investigate. If the backup process hadn't simply hung, all my future backups could have been trash, and I would not even have noticed the issue until it was too late. Why wait until the system is completely unusable?

So, my question to the Qubes community is: has anyone out there set up this kind of SMART disk check on Qubes?
What are the best tools for a quick check, say on each boot, or for one that could easily be put in cron as a periodic/daily go/no-go health check?

Thanks,
Steve

--
You received this message because you are subscribed to the Google Groups "qubes-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qubes-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qubes-users/CAJ5FDniPDdysU6isxXotQvdkPWgkZ9LFCRVLVbkRwp12%2BMacVA%40mail.gmail.com.
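A minimal sketch of the kind of boot-time or daily cron go/no-go check asked about above, in Python. It assumes smartmontools is installed in dom0 (where the physical disks are visible) and that the script runs as root; the device paths are whatever you pass on the command line. Rather than parsing smartctl's text output, it relies on smartctl's exit status, a bitmask documented in the smartctl(8) man page, and treats any set bit as no-go. The function names and the script name are illustrative only, not part of any existing tool.

```python
#!/usr/bin/env python3
"""Sketch of a go/no-go SMART health check built on smartctl's exit status."""
import subprocess
import sys
from typing import List

# Meaning of each bit in smartctl's exit status (paraphrased from smartctl(8)).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or the device did not identify itself",
    2: "a SMART/ATA command failed or SMART data had a checksum error",
    3: "SMART overall health check returned 'DISK FAILING'",
    4: "prefail attributes are at or below their thresholds",
    5: "some attributes were at or below threshold in the past",
    6: "the device error log contains records of errors",
    7: "the device self-test log contains records of errors",
}


def decode_status(status: int) -> List[str]:
    """Return a description of every bit set in a smartctl exit status."""
    return [msg for bit, msg in sorted(EXIT_BITS.items()) if status & (1 << bit)]


def check_drive(device: str) -> List[str]:
    """Run 'smartctl -a' on one device and return the problems it signals.

    '-a' reads the health status, attributes, and the error and self-test
    logs, so any of the bits above can be set. Must run as root.
    """
    result = subprocess.run(
        ["smartctl", "-a", device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode < 0:  # negative means smartctl was killed by a signal
        return ["smartctl was killed by signal {}".format(-result.returncode)]
    return decode_status(result.returncode)


def main(devices: List[str]) -> int:
    """Check each device; return 0 (go) only if every device is clean."""
    failed = False
    for dev in devices:
        problems = check_drive(dev)
        if problems:
            failed = True
            print("NO-GO {}:".format(dev), file=sys.stderr)
            for problem in problems:
                print("  - {}".format(problem), file=sys.stderr)
        else:
            print("GO    {}".format(dev))
    # A non-zero exit status lets cron or a boot-time job flag the failure.
    return 1 if failed else 0


if __name__ == "__main__":
    if len(sys.argv) < 2:
        # Without device arguments, just print usage.
        print("usage: {} DEVICE [DEVICE ...]".format(sys.argv[0]), file=sys.stderr)
    else:
        sys.exit(main(sys.argv[1:]))
```

For example, a root crontab line such as `0 6 * * * /usr/local/bin/smart_gonogo.py /dev/sda /dev/sdb > /dev/null` (hypothetical path) would run it daily; because GO lines go to stdout and NO-GO details go to stderr, cron would only mail you (if mail delivery is configured) when something is wrong. One caveat of this design: logged errors persist on the drive, so once bit 6 is set the check will keep reporting no-go; comparing the error count against a saved value would avoid that. smartmontools also ships smartd, a daemon intended for this kind of periodic monitoring and notification, which may be a better fit than a hand-rolled script.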