Re: vmm/vmd disk issue

2021-03-10 Thread Jan Johansson
So, to conclude:

I have run four parallel dd, cp and cmp passes on the host without
any error showing up.

Ian Darwin wrote:
> Depending on where the error is, you might get away with
> dd'ing with conv=noerror,sync, changing vm.conf to point
> to the new copy, and running fsck in the VM.

After this the VM no longer froze, but an important config
file was missing, so I would not trust the state of the machine
for anything other than maybe keeping it alive for a few days
until there is a better time to reinstall.

Dave Voutila wrote:
> Have you run fsck(8) on your host?

A complete fsck of the host in single-user mode showed no problems
at all.

> I'd say maybe make sure you have backups of anything important
> first if you're purposely going to break things. :-)

Always! :)


So for now I will just let it be and see what happens over time.

Thank you all for your input!



Re: vmm/vmd disk issue

2021-03-09 Thread Mike Larkin
On Tue, Mar 09, 2021 at 11:20:30PM +0100, Jan Johansson wrote:
> Mike Larkin wrote:
> > On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> > > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > > > If I try to cp or dd the disk image on the host it fails
> > > >
> > > > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > > > dd: disk.raw.old: Input/output error
> > > > 8858+0 records in
> > > > 8858+0 records out
> > > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> > > >
> > > > The host shows no other signs of failing hardware.
> > > >
> > > > Is this a software or a hardware error?
> > >
> > > Given that it gives an error outside the VM, it's likely hardware.
> > >
> >
> > Agreed. Sorta hard to fault vmd(8) if it's not even running.
>
> Since these are sparse files, could the vioblk(4) somehow write
> incorrect data that later will make it unreadable such as a
> pointer pointing into nothingness?
>

no

> The messages
>
> vmd[39543]: vioblk write error: Input/output error
> vmd[39543]: wr vioblk: disk write error
>
> were produced at 01:30, when all 4 guests and the host run
> the daily script (which makes backups and does other maintenance
> tasks), if that could have any impact.
>
> Should there not be anything on the host logging errors to
> dmesg/syslog such as sd(4) or ahci(4)?
>
> (If it is not obvious, my understanding of how the virtio/vioblk
> stuff hooks into the disk stack is very limited.)
>
> This drive was installed in August 2020 and, if I recall
> correctly, it was because of this issue. So I am thinking cable
> or motherboard.
>
> If I decide to replace it, would it make sense to make this a
> softraid mirror (RAID1) to avoid, or get better indication of,
> this kind of problem in the future, or would that only add more
> parts that can break?
>
> I am currently trying to provoke the drive from the host with
>
> dd if=/dev/random of=test.raw bs=1m count=17000
>
> then cp/dd and cmp to see if I can make it break for real.
>



Re: vmm/vmd disk issue

2021-03-09 Thread Dave Voutila


Jan Johansson writes:

> Mike Larkin wrote:
>> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
>> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
>> > > If I try to cp or dd the disk image on the host it fails
>> > >
>> > > dd if=disk.raw.old of=disk.raw.bak bs=1m
>> > > dd: disk.raw.old: Input/output error
>> > > 8858+0 records in
>> > > 8858+0 records out
>> > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
>> > >
>> > > The host shows no other signs of failing hardware.
>> > >
>> > > Is this a software or a hardware error?
>> >
>> > Given that it gives an error outside the VM, it's likely hardware.
>> >
>>
>> Agreed. Sorta hard to fault vmd(8) if it's not even running.
>
> Since these are sparse files, could the vioblk(4) somehow write
> incorrect data that later will make it unreadable such as a
> pointer pointing into nothingness?
>
> The messages
>
> vmd[39543]: vioblk write error: Input/output error
> vmd[39543]: wr vioblk: disk write error
>
> were produced at 01:30, when all 4 guests and the host run
> the daily script (which makes backups and does other maintenance
> tasks), if that could have any impact.
>
> Should there not be anything on the host logging errors to
> dmesg/syslog such as sd(4) or ahci(4)?
>
> (If it is not obvious, my understanding of how the virtio/vioblk
> stuff hooks into the disk stack is very limited.)
>

vmd(8) reads/writes to the disk image files (both raw and qcow2) using
pread(2)/pwrite(2) calls. The qcow2 handling is a bit more complex, but
they're still just calling pread/pwrite as far as I'm aware.
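
Since vmd only issues pread/pwrite on the image file, a suspect
region can in principle be probed from the host with no guest
running. A minimal sketch, assuming a raw image and 512-byte
sectors; the helper names offset_to_sector and probe_sector are
made up for illustration and are not part of vmd(8):

```sh
#!/bin/sh
# Hypothetical helpers for illustration only.

# Convert a byte offset (e.g. from dd's "bytes transferred" line)
# into a 512-byte sector number.
offset_to_sector() {
    echo $(( $1 / 512 ))
}

# Read COUNT sectors (default 1) starting at SECTOR from IMAGE and
# discard the data. A failing read shows up as a dd error message
# and a non-zero exit status.
probe_sector() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count="${3:-1}"
}
```

For the dd run quoted above, "offset_to_sector 9288286208" prints
18141184, so the first unreadable data lies in the 1 MB block that
starts at that sector. Something like
"probe_sector disk.raw.old 18141184 2048" would re-read just that
block.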

Have you run fsck(8) on your host?

> This drive was installed in August 2020 and, if I recall
> correctly, it was because of this issue. So I am thinking cable
> or motherboard.
>
> If I decide to replace it, would it make sense to make this a
> softraid mirror (RAID1) to avoid, or get better indication of,
> this kind of problem in the future, or would that only add more
> parts that can break?
>
> I am currently trying to provoke the drive from the host with
>
> dd if=/dev/random of=test.raw bs=1m count=17000
>
> then cp/dd and cmp to see if I can make it break for real.

I'd say maybe make sure you have backups of anything important first if
you're purposely going to break things. :-)

--
-Dave Voutila



Re: vmm/vmd disk issue

2021-03-09 Thread Dirk Coetzee
It may be possible that disk I/O is saturated (i.e. more writes than the
physical disk can handle).

-Original Message-
From: owner-m...@openbsd.org On Behalf Of Jan Johansson
Sent: Wednesday, 10 March 2021 6:21 AM
To: misc@openbsd.org
Cc: Mike Larkin; Ian Darwin
Subject: Re: vmm/vmd disk issue

Mike Larkin wrote:
> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > > If I try to cp or dd the disk image on the host it fails
> > >
> > > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > > dd: disk.raw.old: Input/output error
> > > 8858+0 records in
> > > 8858+0 records out
> > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> > >
> > > The host shows no other signs of failing hardware.
> > >
> > > Is this a software or a hardware error?
> >
> > Given that it gives an error outside the VM, it's likely hardware.
> >
>
> Agreed. Sorta hard to fault vmd(8) if it's not even running.

Since these are sparse files, could the vioblk(4) somehow write incorrect data 
that later will make it unreadable such as a pointer pointing into nothingness?

The messages

vmd[39543]: vioblk write error: Input/output error
vmd[39543]: wr vioblk: disk write error

were produced at 01:30, when all 4 guests and the host run the daily script
(which makes backups and does other maintenance tasks), if that could have any
impact.

Should there not be anything on the host logging errors to dmesg/syslog such as 
sd(4) or ahci(4)?

(If it is not obvious, my understanding of how the virtio/vioblk stuff hooks
into the disk stack is very limited.)

This drive was installed in August 2020 and, if I recall correctly, it was
because of this issue. So I am thinking cable or motherboard.

If I decide to replace it, would it make sense to make this a softraid mirror
(RAID1) to avoid, or get better indication of, this kind of problem in the
future, or would that only add more parts that can break?

I am currently trying to provoke the drive from the host with

dd if=/dev/random of=test.raw bs=1m count=17000

then cp/dd and cmp to see if I can make it break for real.




Re: vmm/vmd disk issue

2021-03-09 Thread Jan Johansson
Mike Larkin wrote:
> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > > If I try to cp or dd the disk image on the host it fails
> > >
> > > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > > dd: disk.raw.old: Input/output error
> > > 8858+0 records in
> > > 8858+0 records out
> > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> > >
> > > The host shows no other signs of failing hardware.
> > >
> > > Is this a software or a hardware error?
> >
> > Given that it gives an error outside the VM, it's likely hardware.
> >
> 
> Agreed. Sorta hard to fault vmd(8) if it's not even running.

Since these are sparse files, could the vioblk(4) somehow write
incorrect data that later will make it unreadable such as a
pointer pointing into nothingness?

The messages

vmd[39543]: vioblk write error: Input/output error
vmd[39543]: wr vioblk: disk write error

were produced at 01:30, when all 4 guests and the host run
the daily script (which makes backups and does other maintenance
tasks), if that could have any impact.

Should there not be anything on the host logging errors to
dmesg/syslog such as sd(4) or ahci(4)?

(If it is not obvious, my understanding of how the virtio/vioblk
stuff hooks into the disk stack is very limited.)

This drive was installed in August 2020 and, if I recall
correctly, it was because of this issue. So I am thinking cable
or motherboard.

If I decide to replace it, would it make sense to make this a
softraid mirror (RAID1) to avoid, or get better indication of,
this kind of problem in the future, or would that only add more
parts that can break?

I am currently trying to provoke the drive from the host with

dd if=/dev/random of=test.raw bs=1m count=17000

then cp/dd and cmp to see if I can make it break for real.
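
The write, copy and compare test described above can be wrapped in
a small function so that it is easy to repeat or run in parallel.
This is only a sketch under assumptions: the function name
stress_once and the file names are made up, /dev/urandom is used in
place of /dev/random, and the block size is given in bytes so that
the same line works with both BSD and GNU dd.

```sh
#!/bin/sh
# stress_once DIR [SIZE_MB]
# Hypothetical helper: write SIZE_MB megabytes of random data into
# DIR, copy the file, and compare the copy against the original.
# Prints "ok" on success; returns non-zero at the first failing step.
stress_once() {
    dir=$1
    mb=${2:-17000}
    src="$dir/test.raw"
    copy="$dir/test.copy"

    # Write the test file (1048576 bytes = 1 MB per block).
    dd if=/dev/urandom of="$src" bs=1048576 count="$mb" 2>/dev/null || return 1

    # Copy it back through the filesystem, then compare byte by byte.
    cp "$src" "$copy" || return 1
    cmp "$src" "$copy" || return 1

    rm -f "$src" "$copy"
    echo "ok"
}
```

Running "stress_once /some/dir" a few times, or several instances
at once in different directories, exercises both the write and the
read path. The files are left in place when a step fails, so they
can be inspected afterwards.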



Re: vmm/vmd disk issue

2021-03-09 Thread Mike Larkin
On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > If I try to cp or dd the disk image on the host it fails
> >
> > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > dd: disk.raw.old: Input/output error
> > 8858+0 records in
> > 8858+0 records out
> > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> >
> > The host shows no other signs of failing hardware.
> >
> > Is this a software or a hardware error?
>
> Given that it gives an error outside the VM, it's likely hardware.
>

Agreed. Sorta hard to fault vmd(8) if it's not even running.

> > Is there some way to recover the guest disk image without a
> > complete reinstall?
>
> Depending on where the error is, you might get away with
> dd'ing with conv=noerror,sync, changing vm.conf to point
> to the new copy, and running fsck in the VM.
>
> And buy a new hard disk or SSD. Probably cheaper than your time
> to further diagnose it?
>



Re: vmm/vmd disk issue

2021-03-09 Thread Ian Darwin
On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> If I try to cp or dd the disk image on the host it fails
> 
> dd if=disk.raw.old of=disk.raw.bak bs=1m
> dd: disk.raw.old: Input/output error
> 8858+0 records in
> 8858+0 records out
> 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> 
> The host shows no other signs of failing hardware.
> 
> Is this a software or a hardware error?

Given that it gives an error outside the VM, it's likely hardware.
 
> Is there some way to recover the guest disk image without a
> complete reinstall?

Depending on where the error is, you might get away with
dd'ing with conv=noerror,sync, changing vm.conf to point
to the new copy, and running fsck in the VM.

And buy a new hard disk or SSD. Probably cheaper than your time
to further diagnose it?
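
A minimal sketch of the dd step suggested above. The function name
rescue_copy and the default block size are assumptions made for
illustration; the conv=noerror,sync options are the ones named in
the advice.

```sh
#!/bin/sh
# rescue_copy SRC DST [BLOCKSIZE]
# Hypothetical helper: copy SRC to DST, continuing past read errors.
#   noerror  keep going after a failed read instead of stopping
#   sync     pad each short or failed input block with NUL bytes,
#            so later data stays at the same offset in the copy
# A small block size (default 512 bytes) limits how much data is
# replaced by NULs around each unreadable spot, at the cost of speed.
rescue_copy() {
    src=$1
    dst=$2
    bs=${3:-512}
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}
```

For example, "rescue_copy disk.raw.old disk.raw.new" would produce
a copy with the unreadable blocks zero-filled. The disk line in
vm.conf can then be pointed at the new file, and fsck run inside
the guest. Because sync also pads the last block, the copy can end
up slightly larger than the source if the image size is not a
multiple of the block size.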



vmm/vmd disk issue

2021-03-09 Thread Jan Johansson
Hello!

A few times I have had problems with my guests freezing (no
console, no ssh) when reading a specific file and only a complete
reinstall seems to solve the problem.

This has happened with both raw and qcow2 disks and on both 6.7
and 6.8. I'm only using OpenBSD for the host and guests.

When the freeze happens, the host logs these messages six times:

vmd[24266]: vioblk read error: Input/output error
vmd[24266]: vioblk: block read error, sector 23122496

I also found this from a few days back, which may be relevant:

vmd[39543]: vioblk write error: Input/output error
vmd[39543]: wr vioblk: disk write error

If I try to cp or dd the disk image on the host it fails

dd if=disk.raw.old of=disk.raw.bak bs=1m
dd: disk.raw.old: Input/output error
8858+0 records in
8858+0 records out
9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)

The host shows no other signs of failing hardware.

Is this a software or a hardware error?

Is there some way to recover the guest disk image without a
complete reinstall?

Thank you in advance,
Jan J