On Sat, Nov 14, 2020 at 12:51 PM Mike Keehan via Qubes OS Community Forum <
qubes...@discoursemail.com> wrote:

> Mike_Keehan <https://qubes-os.discourse.group/u/mike_keehan>
> November 14
> Well, the thin pool is LVM, but if the VM is offline, there should not be
> a problem. Guess you'll have to investigate all the logs you can
> find.

I finally have the answer!

Thankfully this problem has nothing to do with R4.0.4; it was a brand-new
disk drive failing (time to failure <= 5.2 days, likely earlier) in a rather
odd way. What had me stumped was why the VMs seemed to run fine but
completely hung the backup process when it read the exact same volumes. It
appears that all the misbehaving VMs were allocated on the same physical
drive, yet nothing ever reported any kind of error while they were reading
from that drive. Most likely the per-VM metadata that the backup system
needs was the first thing to fail.

Fortunately, the drive's built-in SMART log holds records of the last 4
errors, which can be checked easily, and this allowed me to identify which
physical drive needed to be yanked and replaced. Because the system was
brand new, I did not yet know which logical drive mapped to which physical
drive. To analyse the problem I used a "smartctl" tool variant on another
system to read the error logs stored on the drive itself.

Since checking each drive this way is quick and easy, it seems to me that
there must be an automated way to check these error logs and notify the
user when a drive starts to fail. My Qubes system was completely silent; it
was only the odd behaviour of the backup system that forced me to
investigate. Had the backup process not hung outright, all my future
backups could have been trash, and I would not have noticed the issue until
it was too late. Why wait until the system is completely unusable?

So, my question to the Qubes community: has anyone out there set up this
kind of SMART disk check on Qubes? What are the best tools for a quick
check, say on each boot, or one that could easily be put in cron for a
periodic/daily go/no-go health check?


