Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-04-18 Thread Alexander Motin
On 06.04.2015 23:38, Alexander Motin wrote:
>> I had some time to try it out today, but I'm still having issues:
> 
> I've just run an experiment similar to yours, making bhyve work on top
> of a GEOM device instead of the preferred "dev" mode of ZVOL. And I did
> reproduce the problem. But the problem I see is not related to the
> block size. The block size is reported to the guest correctly as 4K,
> and as far as I can see the guest treats it as such, at least in a
> FreeBSD guest.
> 
> The problem is in the way bhyve interoperates with block/GEOM
> devices. bhyve sends requests to the kernel with preadv()/pwritev()
> calls, specifying scatter/gather lists of buffer addresses provided by
> the guest. But the GEOM code cannot handle scatter/gather lists, only a
> sequential buffer, so a single request is split into several. The
> problem is that the splitting happens along scatter/gather element
> boundaries, and those elements in the general case are not multiples of
> the block size, which is fatal for GEOM and for any block device.
> 
> I am not yet sure how to fix this problem. The most straightforward way
> is to copy the data at some point, collecting the elements of the
> scatter/gather list into a sequential buffer to pass to GEOM, but that
> requires additional memory allocation, and the copying is not free.
> Some cases could perhaps be optimized to avoid the copy with some
> clever page mapping, but that seems far from trivial.

I've committed the workaround to FreeBSD head at r281700.

-- 
Alexander Motin
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-04-06 Thread Alexander Motin
Hi, Julian.

> I had some time to try it out today, but I'm still having issues:

I've just run an experiment similar to yours, making bhyve work on top
of a GEOM device instead of the preferred "dev" mode of ZVOL. And I did
reproduce the problem. But the problem I see is not related to the
block size. The block size is reported to the guest correctly as 4K,
and as far as I can see the guest treats it as such, at least in a
FreeBSD guest.

The problem is in the way bhyve interoperates with block/GEOM
devices. bhyve sends requests to the kernel with preadv()/pwritev()
calls, specifying scatter/gather lists of buffer addresses provided by
the guest. But the GEOM code cannot handle scatter/gather lists, only a
sequential buffer, so a single request is split into several. The
problem is that the splitting happens along scatter/gather element
boundaries, and those elements in the general case are not multiples of
the block size, which is fatal for GEOM and for any block device.

I am not yet sure how to fix this problem. The most straightforward way
is to copy the data at some point, collecting the elements of the
scatter/gather list into a sequential buffer to pass to GEOM, but that
requires additional memory allocation, and the copying is not free.
Some cases could perhaps be optimized to avoid the copy with some
clever page mapping, but that seems far from trivial.
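One straightforward shape of the bounce-buffer workaround described above can be sketched like this (a hypothetical illustration, not the actual committed code): gather the scatter/gather elements into one contiguous, block-aligned buffer so the block layer sees a single sequential transfer.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Coalesce a scatter/gather list into one malloc()ed sequential buffer.
 * Individual iovec elements may have arbitrary sizes, but the combined
 * request must be a multiple of the device block size -- otherwise a
 * block layer such as GEOM would reject it.  Returns NULL on a
 * misaligned total or allocation failure.
 */
static void *
sg_coalesce(const struct iovec *iov, int iovcnt, size_t blksz, size_t *lenp)
{
	size_t total = 0, off = 0;
	char *buf;
	int i;

	for (i = 0; i < iovcnt; i++)
		total += iov[i].iov_len;
	if (total == 0 || total % blksz != 0)
		return (NULL);
	if ((buf = malloc(total)) == NULL)
		return (NULL);
	for (i = 0; i < iovcnt; i++) {
		/* Copy each s/g element into its sequential position. */
		memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	*lenp = total;
	return (buf);
}
```

For a write, the coalesced buffer would then go to a single pwrite() at a block-aligned offset; for a read the flow is reversed (one pread() into the bounce buffer, then copy out to the s/g elements). The extra allocation and copy are exactly the cost noted above.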

-- 
Alexander Motin


Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-30 Thread Julian Hsiao

On 2015-03-27 09:46:50 +, Alexander Motin said:


[snip]

Also, both the virtio-blk and ahci-hd drivers now report to the guest the
logical and physical block sizes of the underlying storage, which allows
guests to properly align partitions and I/Os for best compatibility and
performance.


Hi Alexander,

In a previous reply from Peter Grehan, he said that ahci-hd should 
already report the correct block size in 10.1.  I had some time to try 
it out today, but I'm still having issues:


$ zfs create \
   -o compression=off \
   -o primarycache=metadata \
   -o secondarycache=metadata \
   -o volblocksize=4096 \
   -o refreservation=none \
   -V 10G \
   zroot/usr/bhyve/test/img
$ geli init -B none -e AES-XTS -K test.key -l 128 -P -s 4096 \
   zvol/zroot/usr/bhyve/test/img
$ geli attach -p -k test.key zvol/zroot/usr/bhyve/test/img
[set up device map, grub-bhyve, etc.]
$ bhyve -A -c 1 -H -P -m 256 \
   -s 0:0,hostbridge \
   -s 1:0,ahci-hd,img.eli \
   -s 2:0,ahci-cd,ubuntu-14.10-server-amd64.iso \
   -s 31,lpc -l com1,stdio \
   test
[boot guest to recovery console]
$ fdisk -l /dev/sda
fdisk: cannot open /dev/sda: Input/output error

And syslog shows a lot of errors accessing sda.

Note that the actual HDD has 512-byte sectors, so perhaps bhyve is 
getting the sector size from the hardware and not from geli / ZFS?


Julian Hsiao




Re: Several bhyve quirks

2015-03-27 Thread Jason Tubnor
On 28 March 2015 at 10:49, Neel Natu  wrote:
>
> This is fixed in HEAD where the RTC device model defaults to 24-hour time.
>
>> 
>> suggests that I'm on the right track, but it doesn't explain the off-by-one
>> nor the (one time) multi-day offset.
>>
>
> The one-hour offset is a bug due to my interpretation of the 12-hour format.
>
> I am going to fix this in HEAD shortly but here is a patch for 10.1 and 
> earlier:
> https://people.freebsd.org/~neel/patches/bhyve_openbsd_rtc.patch
>

Thanks for this, Neel.  I was trying to backport your original HEAD
patch to 10.1, but there were too many quirks to deal with in other
dependent libs.  I didn't have the skills to do this myself, so it is
appreciated that you did it :-)  Thanks!


Re: Several bhyve quirks

2015-03-27 Thread Neel Natu
Hi Julian,

On Wed, Mar 25, 2015 at 2:24 AM, Julian Hsiao  wrote:
> Hi,
>
> I'm running bhyve on 10.1, mostly with OpenBSD (5.7) guests, and I ran into
> a few strange issues:
>
> 1. The guest RTC is several hours off every time I start bhyve.  The host
> RTC is set to UTC, and /etc/localtime on both the host and guests are set to
> US/Pacific (currently PDT).  I thought maybe bhyve is setting the RTC to the
> local time, and indeed changing TZ environment variable affects the guest's
> RTC.  However, with TZ=UTC the guest is still off by an hour, and to get the
> correct offset I set TZ='UTC+1'; perhaps something's not handling DST
> correctly?
>
> Also, one time the offset was mysteriously tens of hours off (i.e. the guest
> RTC is a day or two ahead), and the condition persisted across multiple host
> and guest reboots.  Unfortunately, the problem went away a few hours later
> and I was unable to reproduce it since.
>

The problem is that in 10.1 (and earlier) bhyve defaulted to a 12-hour
RTC format but some guests like OpenBSD and Linux assume that it is
configured in the 24-hour format.

The 12-hour format indicates PM time by setting the most significant
bit in the 'hour' byte. Since the guest is not prepared to mask this
bit, it thinks that the time is 68 hours ahead of the actual time (but
only for PM times; everything goes back to normal during AM times).

This is fixed in HEAD where the RTC device model defaults to 24-hour time.
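The 68-hour figure follows directly from the BCD hour encoding. Here is a small sketch of the mismatch (an illustration based on the standard CMOS RTC convention, not the actual bhyve code): in 12-hour mode, 1 PM is stored as BCD 01 with the PM bit (0x80) set, i.e. 0x81, and a guest expecting 24-hour mode decodes the raw byte as plain BCD, getting "hour 81" instead of 13.

```c
#include <stdint.h>

static uint8_t bin_to_bcd(uint8_t v) { return (uint8_t)((v / 10) << 4 | (v % 10)); }
static uint8_t bcd_to_bin(uint8_t v) { return (uint8_t)((v >> 4) * 10 + (v & 0x0f)); }

/*
 * Encode an hour (0-23) the way a 12-hour-mode RTC stores it:
 * hours run 1-12 in BCD, with bit 7 flagging PM.
 */
static uint8_t
rtc_hour_12h(int hour24)
{
	int pm = hour24 >= 12;
	int h12 = hour24 % 12;

	if (h12 == 0)
		h12 = 12;	/* midnight and noon are stored as 12 */
	return (uint8_t)(bin_to_bcd((uint8_t)h12) | (pm ? 0x80 : 0x00));
}
```

A guest assuming 24-hour mode decodes the register without masking bit 7, so 13:00 (stored as 0x81) comes back as BCD 81, i.e. 68 "hours" ahead, while AM hours decode correctly; that matches the symptom of everything going back to normal during AM times.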

> 
> suggests that I'm on the right track, but it doesn't explain the off-by-one
> nor the (one time) multi-day offset.
>

The one-hour offset is a bug due to my interpretation of the 12-hour format.

I am going to fix this in HEAD shortly but here is a patch for 10.1 and earlier:
https://people.freebsd.org/~neel/patches/bhyve_openbsd_rtc.patch

> As an aside, the commit message implies that this only affects OpenBSD
> guests, when in fact this probably affects all guests (at least also Linux).
> Perhaps he meant you cannot configure OpenBSD to assume that the RTC is set
> to local time instead of UTC.
>
> 2. What's the preferred solution for minimizing guest clock drift in bhyve?
> Based on some Google searches, I run ntpd in the guests and set
> kern.timecounter.hardware=acpitimer0 instead of the default acpihpet0.
> acpitimer0 drifts by ~600 ppm while acpihpet0 drifts by ~1500 ppm; why?
>

I don't know but I am running experiments that I hope will provide some insight.

best
Neel

> 3. Even moderate guest disk I/O completely kills guest network performance.
> For example, whenever security(8) (security(7) in FreeBSD) runs, guest
> network throughput drops from 150+ Mbps to ~20 Mbps, and jitter from ping
> jumps from <0.01 ms to 100+ ms.  If I try to build something in the guest,
> then network becomes almost unusable.
>
> The network performance degradation only affects the guest that's generating
> the I/O; high I/O on guest B doesn't affect guest A, nor would high I/O on
> the host.
>
> I'm using both virtio-blk and virtio-net drivers, and the guests' disk images
> are backed by zvol+geli.  Removing geli has no effect.
>
> There are some commits in CURRENT that suggests improved virtio performance,
> but I'm not comfortable running CURRENT.  Is there a workaround I could use
> for 10.1?
>
> 4. virtio-blk always reports the virtual disk as having 512-byte sectors,
> and so I get I/O errors on OpenBSD guests when the disk image is backed by
> zvol+geli with 4K sector size.  Curiously, this only seems to affect
> zvol+geli; with just zvol it seems to work.  Also, it works either way on
> Linux guests.
>
> ATM I changed the zvol / geli sector size to 512 bytes, which probably made
> #2 worse.  I think this bug / feature is addressed by:
> ,
> but again is there a workaround to force a specific sector size for 10.1?
>
> 5. This may be better directed at OpenBSD but I'll ask here anyway: if I
> enable virtio-rnd then OpenBSD would not boot with "couldn't map interrupt"
> error.  The kernel in bsd.rd will boot, but not the installed kernel (or the
> one built from STABLE; I forgot).  Again, Linux seems unaffected, but I
> couldn't tell if it's actually working.
>
> Julian Hsiao
>
>


Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread John Nielsen
On Mar 27, 2015, at 10:47 AM, John Nielsen  wrote:

> On Mar 27, 2015, at 3:46 AM, Alexander Motin  wrote:
> 
>>> I've always assumed virtio driver > emulated driver so it didn't occur
>>> to me to try ahci-hd.
>> 
>> I've just merged to the FreeBSD stable/10 branch a set of bhyve changes
>> that should significantly improve the situation in the storage area.
>> 
>> The virtio-blk driver was fixed to work asynchronously and not block the
>> virtual CPU, which should fix many problems with performance and
>> interactivity. Both the virtio-blk and ahci-hd drivers gained the
>> ability to execute multiple (up to 8) requests at the same time, which
>> should proportionally improve parallel random I/O performance on wide
>> storage.  At this point virtio-blk is indeed faster than ahci-hd at
>> high IOPS, and both are faster than before.
>> 
>> On the other hand, the ahci-hd driver now has TRIM support to allow
>> freeing unused space on the backing ZVOL. Unfortunately there is no
>> TRIM/UNMAP support in the virtio-blk API to allow the same.
>> 
>> Also, both the virtio-blk and ahci-hd drivers now report to the guest
>> the logical and physical block sizes of the underlying storage, which
>> allows guests to properly align partitions and I/Os for best
>> compatibility and performance.
> 
> Mav, thank you very much for all this great work and for the concise summary. 
> TRIM on AHCI makes it compelling for a lot of use cases despite the probable 
> performance hit.
> 
> Does anyone have plans (or know about any) to implement virtio-scsi support 
> in bhyve? That API does support TRIM and should retain most or all of the 
> low-overhead virtio goodness.

Okay, some belated googling reminded me that this has been listed as an "open 
task" in the last couple of FreeBSD quarterly status reports and discussed at 
one or more devsummits. I'd still be interested to know if anyone's actually 
contemplated or started doing the work though. :)

JN



Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread John Nielsen
On Mar 27, 2015, at 3:46 AM, Alexander Motin  wrote:

>> I've always assumed virtio driver > emulated driver so it didn't occur
>> to me to try ahci-hd.
> 
> I've just merged to the FreeBSD stable/10 branch a set of bhyve changes
> that should significantly improve the situation in the storage area.
> 
> The virtio-blk driver was fixed to work asynchronously and not block the
> virtual CPU, which should fix many problems with performance and
> interactivity. Both the virtio-blk and ahci-hd drivers gained the
> ability to execute multiple (up to 8) requests at the same time, which
> should proportionally improve parallel random I/O performance on wide
> storage.  At this point virtio-blk is indeed faster than ahci-hd at
> high IOPS, and both are faster than before.
> 
> On the other hand, the ahci-hd driver now has TRIM support to allow
> freeing unused space on the backing ZVOL. Unfortunately there is no
> TRIM/UNMAP support in the virtio-blk API to allow the same.
> 
> Also, both the virtio-blk and ahci-hd drivers now report to the guest
> the logical and physical block sizes of the underlying storage, which
> allows guests to properly align partitions and I/Os for best
> compatibility and performance.

Mav, thank you very much for all this great work and for the concise summary. 
TRIM on AHCI makes it compelling for a lot of use cases despite the probable 
performance hit.

Does anyone have plans (or know about any) to implement virtio-scsi support in 
bhyve? That API does support TRIM and should retain most or all of the 
low-overhead virtio goodness.

JN



Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread Alexander Motin
> I've always assumed virtio driver > emulated driver so it didn't occur
> to me to try ahci-hd.

I've just merged to the FreeBSD stable/10 branch a set of bhyve changes
that should significantly improve the situation in the storage area.

The virtio-blk driver was fixed to work asynchronously and not block the
virtual CPU, which should fix many problems with performance and
interactivity. Both the virtio-blk and ahci-hd drivers gained the
ability to execute multiple (up to 8) requests at the same time, which
should proportionally improve parallel random I/O performance on wide
storage.  At this point virtio-blk is indeed faster than ahci-hd at high
IOPS, and both are faster than before.

On the other hand, the ahci-hd driver now has TRIM support to allow
freeing unused space on the backing ZVOL. Unfortunately there is no
TRIM/UNMAP support in the virtio-blk API to allow the same.

Also, both the virtio-blk and ahci-hd drivers now report to the guest
the logical and physical block sizes of the underlying storage, which
allows guests to properly align partitions and I/Os for best
compatibility and performance.

-- 
Alexander Motin


Re: Several bhyve quirks

2015-03-26 Thread Peter Grehan

Hi Julian,


Thank you for your explanation and tips, Peter.  I just tried changing
virtio-blk -> ahci-hd and preliminary results are good.  And now you've
mentioned it, I do recall seeing slightly less performance degradation
on guests with 2 vCPUs vs. ones with just one.


 Glad to hear that :)


Try using the -W option to bhyve. This will force the bhyve virtio
code to advertise (non-standard) MSI interrupt capability which OpenBSD
will then use to allocate vectors.


Unfortunately -W didn't help.  This is not critical, however, and I'll
ask around in the OpenBSD mailing list.


 I tried this out today with OpenBSD 5.7 and a CURRENT host, and it's 
actually a bug in the virtio-rnd implementation in bhyve when MSI-x 
isn't used. The early testing was done with FreeBSD and Linux, which 
both use MSI-x, so this wasn't picked up for the MSI/legacy case.


 I have a fix for CURRENT, and that should make its way into 10-stable 
shortly.


 Thanks for the report !

later,

Peter.



Re: Several bhyve quirks

2015-03-26 Thread Julian Hsiao

On 2015-03-25 15:44:35 +, Peter Grehan said:


In 10.1, virtio-blk i/o is done synchronously in the context of the
guest vCPU exit. If it's a single vCPU guest, or the virtio-net
interrupt happens to be delivered to that vCPU, performance will suffer.

A workaround is to use ahci-hd for the disk emulation and not
virtio-blk. The AHCI emulation does i/o in a dedicated thread and
doesn't block the vCPU thread.


Thank you for your explanation and tips, Peter.  I just tried changing 
virtio-blk -> ahci-hd and preliminary results are good.  And now you've 
mentioned it, I do recall seeing slightly less performance degradation 
on guests with 2 vCPUs vs. ones with just one.


I've always assumed virtio driver > emulated driver so it didn't occur 
to me to try ahci-hd.



The only workaround for 10.1 would be to use ahci-hd instead of
virtio-blk. The correct sector size will be reported there.


I haven't had a chance to test this; next time I spin up a guest from 
scratch I'll try it out.



Try using the -W option to bhyve. This will force the bhyve virtio
code to advertise (non-standard) MSI interrupt capability which OpenBSD
will then use to allocate vectors.


Unfortunately -W didn't help.  This is not critical, however, and I'll 
ask around in the OpenBSD mailing list.


Thanks again for your help.

Julian Hsiao




Re: Several bhyve quirks

2015-03-25 Thread Peter Grehan

Hi Julian,

 I'll let Neel take care of the time questions.


3. Even moderate guest disk I/O completely kills guest network
performance.  For example, whenever security(8) (security(7) in FreeBSD)
runs, guest network throughput drops from 150+ Mbps to ~20 Mbps, and
jitter from ping jumps from <0.01 ms to 100+ ms.  If I try to build
something in the guest, then network becomes almost unusable.

The network performance degradation only affects the guest that's
generating the I/O; high I/O on guest B doesn't affect guest A, nor
would high I/O on the host.

I'm using both virtio-blk and virtio-net drivers, and the guests' disk
images are backed by zvol+geli.  Removing geli has no effect.

There are some commits in CURRENT that suggests improved virtio
performance, but I'm not comfortable running CURRENT.  Is there a
workaround I could use for 10.1?


 In 10.1, virtio-blk i/o is done synchronously in the context of the 
guest vCPU exit. If it's a single vCPU guest, or the virtio-net 
interrupt happens to be delivered to that vCPU, performance will suffer.


 A workaround is to use ahci-hd for the disk emulation and not 
virtio-blk. The AHCI emulation does i/o in a dedicated thread and 
doesn't block the vCPU thread.
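The dedicated-thread pattern Peter describes can be reduced to a small sketch (a minimal illustration of the idea, not the actual AHCI emulation code): the vCPU thread only hands off the request and returns to the guest, while a separate thread performs the blocking disk I/O and signals completion.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One in-flight request; real code would keep a queue of these. */
struct ioreq {
	pthread_mutex_t mtx;
	pthread_cond_t  cv;
	bool submitted;		/* set by the vCPU thread */
	bool done;		/* set by the I/O thread */
};

/*
 * Dedicated I/O thread: waits for a request, services it (the actual
 * preadv()/pwritev() would go here), then signals completion.
 */
static void *
io_thread(void *arg)
{
	struct ioreq *req = arg;

	pthread_mutex_lock(&req->mtx);
	while (!req->submitted)
		pthread_cond_wait(&req->cv, &req->mtx);
	/* ... blocking disk I/O happens here, off the vCPU thread ... */
	req->done = true;
	pthread_cond_signal(&req->cv);
	pthread_mutex_unlock(&req->mtx);
	return (NULL);
}

/*
 * Called in vCPU context: hand off the request without blocking on the
 * disk, so the guest (and its network interrupts) keep running.
 */
static void
vcpu_submit(struct ioreq *req)
{
	pthread_mutex_lock(&req->mtx);
	req->submitted = true;
	pthread_cond_signal(&req->cv);
	pthread_mutex_unlock(&req->mtx);
}
```

In the real device model there is a queue of requests plus a completion path that asserts the guest interrupt; the point here is only that the slow operation runs off the vCPU thread, which is why ahci-hd avoids the network stalls seen with the synchronous 10.1 virtio-blk.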



4. virtio-blk always reports the virtual disk as having 512-byte
sectors, and so I get I/O errors on OpenBSD guests when the disk image
is backed by zvol+geli with 4K sector size.  Curiously, this only seems
to affect zvol+geli; with just zvol it seems to work.  Also, it works
either way on Linux guests.

ATM I changed the zvol / geli sector size to 512 bytes, which probably
made #2 worse.  I think this bug / feature is addressed by:
,
but again is there a workaround to force a specific sector size for 10.1?


 The only workaround for 10.1 would be to use ahci-hd instead of 
virtio-blk. The correct sector size will be reported there.



5. This may be better directed at OpenBSD but I'll ask here anyway: if I
enable virtio-rnd then OpenBSD would not boot with "couldn't map
interrupt" error.  The kernel in bsd.rd will boot, but not the installed
kernel (or the one built from STABLE; I forgot).  Again, Linux seems
unaffected, but I couldn't tell if it's actually working.


 Try using the -W option to bhyve. This will force the bhyve virtio 
code to advertise (non-standard) MSI interrupt capability which OpenBSD 
will then use to allocate vectors.


later,

Peter.



Several bhyve quirks

2015-03-25 Thread Julian Hsiao

Hi,

I'm running bhyve on 10.1, mostly with OpenBSD (5.7) guests, and I ran 
into a few strange issues:


1. The guest RTC is several hours off every time I start bhyve.  The 
host RTC is set to UTC, and /etc/localtime on both the host and guests 
are set to US/Pacific (currently PDT).  I thought maybe bhyve is 
setting the RTC to the local time, and indeed changing TZ environment 
variable affects the guest's RTC.  However, with TZ=UTC the guest is 
still off by an hour, and to get the correct offset I set TZ='UTC+1'; 
perhaps something's not handling DST correctly?


Also, one time the offset was mysteriously tens of hours off (i.e. the 
guest RTC is a day or two ahead), and the condition persisted across 
multiple host and guest reboots.  Unfortunately, the problem went away 
a few hours later and I was unable to reproduce it since.


 
suggests that I'm on the right track, but it doesn't explain the 
off-by-one nor the (one time) multi-day offset.


As an aside, the commit message implies that this only affects OpenBSD 
guests, when in fact this probably affects all guests (at least also 
Linux).  Perhaps he meant you cannot configure OpenBSD to assume that 
the RTC is set to local time instead of UTC.


2. What's the preferred solution for minimizing guest clock drift in 
bhyve?  Based on some Google searches, I run ntpd in the guests and set 
kern.timecounter.hardware=acpitimer0 instead of the default acpihpet0.  
acpitimer0 drifts by ~600 ppm while acpihpet0 drifts by ~1500 ppm; why?


3. Even moderate guest disk I/O completely kills guest network 
performance.  For example, whenever security(8) (security(7) in 
FreeBSD) runs, guest network throughput drops from 150+ Mbps to ~20 
Mbps, and jitter from ping jumps from <0.01 ms to 100+ ms.  If I try to 
build something in the guest, then network becomes almost unusable.


The network performance degradation only affects the guest that's 
generating the I/O; high I/O on guest B doesn't affect guest A, nor 
would high I/O on the host.


I'm using both virtio-blk and virtio-net drivers, and the guests' disk 
images are backed by zvol+geli.  Removing geli has no effect.


There are some commits in CURRENT that suggests improved virtio 
performance, but I'm not comfortable running CURRENT.  Is there a 
workaround I could use for 10.1?


4. virtio-blk always reports the virtual disk as having 512-byte 
sectors, and so I get I/O errors on OpenBSD guests when the disk image 
is backed by zvol+geli with 4K sector size.  Curiously, this only seems 
to affect zvol+geli; with just zvol it seems to work.  Also, it works 
either way on Linux guests.


ATM I changed the zvol / geli sector size to 512 bytes, which probably 
made #2 worse.  I think this bug / feature is addressed by: 
, 
but again is there a workaround to force a specific sector size for 
10.1?


5. This may be better directed at OpenBSD but I'll ask here anyway: if 
I enable virtio-rnd then OpenBSD would not boot with "couldn't map 
interrupt" error.  The kernel in bsd.rd will boot, but not the 
installed kernel (or the one built from STABLE; I forgot).  Again, 
Linux seems unaffected, but I couldn't tell if it's actually working.


Julian Hsiao

