Re: Xen virtual network (Netfront) driver

2016-01-25 Thread Jonathon Sisson
On Mon, Jan 25, 2016 at 12:04:03PM +0100, Mike Belopuhov wrote:
> On 25 January 2016 at 01:38, Jonathon Sisson  wrote:
> > Not certain if this is debug output put there intentionally or
> 
> Yes.
> 
> > if this is some error condition?
> 
> It is.  Transmission is stuck and these are watchdog timeouts.
> Did it get a lease on c4.8xlarge?  So far it looks like it happens
> on the m4.10xlarge instance only.  No idea why.
> 
Here's the console output:

user@host:~$ grep 'tx prod' dmesg/*
dmesg/c4.8xlarge_dmesg.txt:xnf0: tx prod 2 cons 2,0 evt 3,1
dmesg/c4.8xlarge_dmesg.txt:xnf0: tx prod 3 cons 3,0 evt 4,1
dmesg/c4.8xlarge_dmesg.txt:xnf0: tx prod 4 cons 4,0 evt 5,1
dmesg/c4.8xlarge_dmesg.txt:xnf0: tx prod 5 cons 5,0 evt 6,1
dmesg/c4.8xlarge_dmesg.txt:xnf0: tx prod 6 cons 6,0 evt 7,1
dmesg/c4.8xlarge_dmesg.txt:xnf0: tx prod 7 cons 7,0 evt 8,1
dmesg/d2.8xlarge_dmesg.txt:xnf0: tx prod 2 cons 2,0 evt 3,1
dmesg/d2.8xlarge_dmesg.txt:xnf0: tx prod 3 cons 3,0 evt 4,1
dmesg/d2.8xlarge_dmesg.txt:xnf0: tx prod 4 cons 4,0 evt 5,1
dmesg/d2.8xlarge_dmesg.txt:xnf0: tx prod 5 cons 5,0 evt 6,1
dmesg/d2.8xlarge_dmesg.txt:xnf0: tx prod 6 cons 6,0 evt 7,1
dmesg/g2.8xlarge_dmesg.txt:xnf0: tx prod 5 cons 5,0 evt 6,1
dmesg/g2.8xlarge_dmesg.txt:xnf0: tx prod 6 cons 6,0 evt 7,1
dmesg/i2.8xlarge_dmesg.txt:xnf0: tx prod 3 cons 3,0 evt 4,1
dmesg/i2.8xlarge_dmesg.txt:xnf0: tx prod 4 cons 4,0 evt 5,1
dmesg/i2.8xlarge_dmesg.txt:xnf0: tx prod 5 cons 5,0 evt 6,1
dmesg/i2.8xlarge_dmesg.txt:xnf0: tx prod 6 cons 6,0 evt 7,1
dmesg/i2.8xlarge_dmesg.txt:xnf0: tx prod 7 cons 7,0 evt 8,1
dmesg/m4.10xlarge_dmesg.txt:xnf0: tx prod 2 cons 2,0 evt 3,1
dmesg/m4.10xlarge_dmesg.txt:xnf0: tx prod 3 cons 3,0 evt 4,1
dmesg/m4.10xlarge_dmesg.txt:xnf0: tx prod 4 cons 4,0 evt 5,1
dmesg/m4.10xlarge_dmesg.txt:xnf0: tx prod 5 cons 5,0 evt 6,1
dmesg/m4.10xlarge_dmesg.txt:xnf0: tx prod 6 cons 6,0 evt 7,1
dmesg/r3.8xlarge_dmesg.txt:xnf0: tx prod 5 cons 5,0 evt 6,1
dmesg/r3.8xlarge_dmesg.txt:xnf0: tx prod 6 cons 6,0 evt 7,1
dmesg/r3.8xlarge_dmesg.txt:xnf0: tx prod 7 cons 7,0 evt 8,1
dmesg/r3.8xlarge_dmesg.txt:xnf0: tx prod 8 cons 8,0 evt 9,1

It happens on c4.8x, d2.8x, g2.8x, i2.8x, m4.10x, r3.8x.
Basically all of the largest instance sizes...on the newer-gen
instance types that support enhanced networking?

I can re-test these and see if it occurs frequently or
if it was just a fluke.  I'll update in a bit.




Re: Xen virtual network (Netfront) driver

2016-01-25 Thread Mike Belopuhov
On 25 January 2016 at 01:38, Jonathon Sisson  wrote:
> tech@,
>
> I've uploaded a few of the dmesgs gathered to dmesgd.nycbug.org:
>
> http://dmesgd.nycbug.org/index.cgi?action=dmesgd=index=Jonathon
>
> Currently I have m4.10xlarge, c4.8xlarge, m3.medium, and t2.nano
> uploaded for perusal.
>

Thanks!

> I noticed some new output in the m4.10xlarge console output here:
>
> starting network
> DHCPDISCOVER on xnf0 - interval 3
> DHCPDISCOVER on xnf0 - interval 5
> xnf0: tx prod 2 cons 2,0 evt 3,1
> DHCPDISCOVER on xnf0 - interval 8
> xnf0: tx prod 3 cons 3,0 evt 4,1
> DHCPDISCOVER on xnf0 - interval 10
> xnf0: tx prod 4 cons 4,0 evt 5,1
> DHCPDISCOVER on xnf0 - interval 15
> xnf0: tx prod 5 cons 5,0 evt 6,1
> DHCPDISCOVER on xnf0 - interval 20
> xnf0: tx prod 6 cons 6,0 evt 7,1
> No acceptable DHCPOFFERS received.
> No working leases in persistent database - sleeping.
>
> Not certain if this is debug output put there intentionally or

Yes.

> if this is some error condition?

It is.  Transmission is stuck and these are watchdog timeouts.
Did it get a lease on c4.8xlarge?  So far it looks like it happens
on the m4.10xlarge instance only.  No idea why.
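For reference, those lines appear to dump the shared tx ring state.  A
sketch of the printf that would produce them, inferred from the ring
layout in the posted diff (the exact format string is whatever
if_xnf.c uses):

	printf("%s: tx prod %u cons %u,%u evt %u,%u\n", ifp->if_xname,
	    txr->txr_prod, txr->txr_cons, sc->sc_tx_cons,
	    txr->txr_req_evt, txr->txr_rsp_evt);

That is: the ring's producer index, the ring's and the driver's
consumer indices, and the two event counters, which is enough to see
whether the backend ever picked the requests up.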

> At any rate, there it is =)
>
> -Jonathon



Re: Xen virtual network (Netfront) driver

2016-01-24 Thread Mike Belopuhov
Hi Jonathon,

Thanks a lot for taking the time to test this.

On 24 January 2016 at 06:49, Jonathon Sisson  wrote:
> On Sat, Jan 23, 2016 at 02:18:17PM -0800, Jonathon Sisson wrote:
>> Speaking of testing, is there any particular area non-devs could
>> assist with at this time?  Gathering dmesgs for different instance
>> types?
>>

Trying newer kernels would be the most helpful. I just enabled tcp/udp
checksum offloading in xnf on Friday and would welcome any network
tests.

> I decided to spin up one of each instance type and grab the console
> output in case it would be beneficial to the on-going work:
>
> http://update.j3z.org/dmesg/c3.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c3.4xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c3.8xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c3.large_dmesg.txt
> http://update.j3z.org/dmesg/c3.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c4.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c4.4xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c4.8xlarge_dmesg.txt
> http://update.j3z.org/dmesg/c4.large_dmesg.txt
> http://update.j3z.org/dmesg/c4.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/d2.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/d2.4xlarge_dmesg.txt
> http://update.j3z.org/dmesg/d2.8xlarge_dmesg.txt
> http://update.j3z.org/dmesg/d2.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/g2.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/g2.8xlarge_dmesg.txt
> http://update.j3z.org/dmesg/i2.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/i2.4xlarge_dmesg.txt
> http://update.j3z.org/dmesg/i2.8xlarge_dmesg.txt
> http://update.j3z.org/dmesg/i2.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/m3.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/m3.large_dmesg.txt
> http://update.j3z.org/dmesg/m3.medium_dmesg.txt
> http://update.j3z.org/dmesg/m3.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/m4.10xlarge_dmesg.txt
> http://update.j3z.org/dmesg/m4.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/m4.4xlarge_dmesg.txt
> http://update.j3z.org/dmesg/m4.large_dmesg.txt
> http://update.j3z.org/dmesg/m4.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/r3.2xlarge_dmesg.txt
> http://update.j3z.org/dmesg/r3.4xlarge_dmesg.txt
> http://update.j3z.org/dmesg/r3.8xlarge_dmesg.txt
> http://update.j3z.org/dmesg/r3.large_dmesg.txt
> http://update.j3z.org/dmesg/r3.xlarge_dmesg.txt
> http://update.j3z.org/dmesg/t2.large_dmesg.txt
> http://update.j3z.org/dmesg/t2.medium_dmesg.txt
> http://update.j3z.org/dmesg/t2.micro_dmesg.txt
> http://update.j3z.org/dmesg/t2.nano_dmesg.txt
> http://update.j3z.org/dmesg/t2.small_dmesg.txt
>
> If it is deemed helpful, I can keep them updated as
> new AMIs come out.
>

It would be very interesting to see newer code run on these.

> Thanks!
>
> -Jonathon

Cheers,
Mike



Re: Xen virtual network (Netfront) driver

2016-01-24 Thread Jonathon Sisson
On Sun, Jan 24, 2016 at 02:16:37PM +0100, Mike Belopuhov wrote:
> Hi Jonathon,
> 
> Thanks a lot for taking the time to test this.
>
No, thank you guys for all of the work you're doing to get
this working.  I'm just a user heh.
 
> 
> Trying newer kernels would be the most helpful. I just enabled tcp/udp
> checksum offloading in xnf on Friday and would welcome any network
> tests.
>
I rebuilt with a source checkout earlier today, and after
rebooting to the new kernel I can't seem to get a dhcp lease.
I'm working on building userland to determine if there is
some issue with dhclient, but I haven't finished that step
yet.  Has anyone else noted the dhcp issue? 
 



Re: Xen virtual network (Netfront) driver

2016-01-24 Thread Mike Belopuhov
On 24 January 2016 at 20:55, Jonathon Sisson  wrote:
> On Sun, Jan 24, 2016 at 02:16:37PM +0100, Mike Belopuhov wrote:
>> Hi Jonathon,
>>
>> Thanks a lot for taking the time to test this.
>>
> No, thank you guys for all of the work you're doing to get
> this working.  I'm just a user heh.
>
>>
>> Trying newer kernels would be the most helpful. I just enabled tcp/udp
>> checksum offloading in xnf on Friday and would welcome any network
>> tests.
>>
> I rebuilt with a source checkout earlier today, and after
> rebooting to the new kernel I can't seem to get a dhcp lease.
> I'm working on building userland to determine if there is
> some issue with dhclient, but I haven't finished that step
> yet.  Has anyone else noted the dhcp issue?
>

I haven't seen that on my test box (not AWS), but maybe reverting
the minimum number of rx slots back to 32 can help?

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pv/if_xnf.c.diff?r1=1.9&r2=1.10
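For anyone following along, a sketch of what the revert amounts to,
assuming the rx ring uses the usual if_rxr(9) watermarks (names taken
from the posted diff; the authoritative hunk is in the cvsweb link
above):

	#define XNF_RX_MIN	32	/* low watermark, back to 32 */

	/* in ring setup: lwm = XNF_RX_MIN, hwm = ring size */
	if_rxr_init(&sc->sc_rx_slots, XNF_RX_MIN, XNF_RX_DESC);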



Re: Xen virtual network (Netfront) driver

2016-01-24 Thread Jonathon Sisson
On Sun, Jan 24, 2016 at 09:08:32PM +0100, Mike Belopuhov wrote:
> On 24 January 2016 at 20:55, Jonathon Sisson  wrote:
> > On Sun, Jan 24, 2016 at 02:16:37PM +0100, Mike Belopuhov wrote:
> >> Hi Jonathon,
> >>
> >> Thanks a lot for taking the time to test this.
> >>
> > No, thank you guys for all of the work you're doing to get
> > this working.  I'm just a user heh.
> >
> >>
> >> Trying newer kernels would be the most helpful. I just enabled tcp/udp
> >> checksum offloading in xnf on Friday and would welcome any network
> >> tests.
> >>
> > I rebuilt with a source checkout earlier today, and after
> > rebooting to the new kernel I can't seem to get a dhcp lease.
> > I'm working on building userland to determine if there is
> > some issue with dhclient, but I haven't finished that step
> > yet.  Has anyone else noted the dhcp issue?
> >
> 
> I haven't seen that on my test box (not AWS), but maybe reverting
> the minimum number of rx slots back to 32 can help?
> 
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pv/if_xnf.c.diff?r1=1.9&r2=1.10
> 
Reverting to 32 fixed the dhcp issue.

I'll go ahead and get those dmesgs for you now =)

Thanks again!



Re: Xen virtual network (Netfront) driver

2016-01-23 Thread Reyk Floeter

> On 23.01.2016, at 12:12, Anders Berggren  wrote:
> 
>> On 06 Jan 2016, at 18:49, Reyk Floeter  wrote:
> >> - It didn't work on m4.10xlarge (see cvs:~reyk/dmesg.m4.10xlarge).
> 
> I didn’t see any mention of it in the dmesg
> https://gist.github.com/reyk/b372af303eb86bab3fee but could it be that those
> machine classes (*x*large-ish) use Intel NICs with SR-IOV (ixgbe/ixv-ish) by
> default
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html?
> Last time I tried, sriovNetSupport couldn’t be disabled after the AMI/VM was
> created, and I had to use the "aws ec2 register-image …" commands, because
> the AWS web console didn’t offer any way to create a machine without it...

No, you have to *enable* SR-IOV in the image.

Machines with the Intel NIC will not show any netfront in the device list via 
XenStore (just try Ubuntu).

Reyk


Re: Xen virtual network (Netfront) driver

2016-01-23 Thread Anders Berggren
> On 06 Jan 2016, at 18:49, Reyk Floeter  wrote:
> - It didn't work on m4.10xlarge (see cvs:~reyk/dmesg.m4.10xlarge).

I didn’t see any mention of it in the dmesg
https://gist.github.com/reyk/b372af303eb86bab3fee but could it be that those
machine classes (*x*large-ish) use Intel NICs with SR-IOV (ixgbe/ixv-ish) by
default
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html?
Last time I tried, sriovNetSupport couldn’t be disabled after the AMI/VM was
created, and I had to use the "aws ec2 register-image …" commands, because the
AWS web console didn’t offer any way to create a machine without it...


Re: Xen virtual network (Netfront) driver

2016-01-23 Thread Jonathon Sisson
On Sat, Jan 23, 2016 at 12:19:29PM +0100, Reyk Floeter wrote:
> No, you have to *enable* SR-IOV in the image.
> 
> Machines with the Intel NIC will not show any netfront in the device list via 
> XenStore (just try Ubuntu).
> 
> Reyk

That's correct, but I think what was being pointed out is that
an instance with SRIOV enabled cannot have it *disabled* (i.e.
to switch back to xnf NICs).  I was able to get xnf operational
on a c3.large (enhanced networking-capable) by creating an instance
with CentOS and swapping the root volume out.  Any AMI constructed
on Amazon Linux or Ubuntu will have enhanced networking enabled
by default, whereas CentOS doesn't appear to have it enabled (unless
you manually enable it).



Re: Xen virtual network (Netfront) driver

2016-01-23 Thread Reyk Floeter

> On 23.01.2016, at 22:27, Jonathon Sisson  wrote:
> 
> On Sat, Jan 23, 2016 at 12:19:29PM +0100, Reyk Floeter wrote:
>> No, you have to *enable* SR-IOV in the image.
>> 
>> Machines with the Intel NIC will not show any netfront in the device list 
>> via XenStore (just try Ubuntu).
>> 
>> Reyk
> 
> That's correct, but I think what was being pointed out is that
> an instance with SRIOV enabled cannot have it *disabled* (i.e.
> to switch back to xnf NICs).  I was able to get xnf operational
> on a c3.large (enhanced networking-capable) by creating an instance
> with CentOS and swapping the root volume out.  Any AMI constructed
> on Amazon Linux or Ubuntu will have enhanced networking enabled
> by default, whereas CentOS doesn't appear to have it enabled (unless
> you manually enable it).

Ah, OK.

I recommend uploading new images or using my public OpenBSD
images to bootstrap new AMIs.

The "dd from Linux" trick is just a hack if you don't want to install the
aws and ec2 cli tools - but we have ports now.

Reyk



Re: Xen virtual network (Netfront) driver

2016-01-23 Thread Jonathon Sisson
On Sat, Jan 23, 2016 at 10:57:21PM +0100, Reyk Floeter wrote:
> 
> > On 23.01.2016, at 22:27, Jonathon Sisson  wrote:
> > 
> > On Sat, Jan 23, 2016 at 12:19:29PM +0100, Reyk Floeter wrote:
> >> No, you have to *enable* SR-IOV in the image.
> >> 
> >> Machines with the Intel NIC will not show any netfront in the device list 
> >> via XenStore (just try Ubuntu).
> >> 
> >> Reyk
> > 
> > That's correct, but I think what was being pointed out is that
> > an instance with SRIOV enabled cannot have it *disabled* (i.e.
> > to switch back to xnf NICs).  I was able to get xnf operational
> > on a c3.large (enhanced networking-capable) by creating an instance
> > with CentOS and swapping the root volume out.  Any AMI constructed
> > on Amazon Linux or Ubuntu will have enhanced networking enabled
> > by default, whereas CentOS doesn't appear to have it enabled (unless
> > you manually enable it).
> 
> Ah, OK.
> 
> I recommend uploading new images or using my public OpenBSD
> images to bootstrap new AMIs.
> 
> The "dd from Linux" trick is just a hack if you don't want to install the
> aws and ec2 cli tools - but we have ports now.
> 
> Reyk
> 
Fair enough =)

I wasn't certain if the experimental images were considered ready
for testing.  I'll switch to using them for any other testing I do.

Speaking of testing, is there any particular area non-devs could
assist with at this time?  Gathering dmesgs for different instance
types?



Re: Xen virtual network (Netfront) driver

2016-01-23 Thread Jonathon Sisson
On Sat, Jan 23, 2016 at 02:18:17PM -0800, Jonathon Sisson wrote:
> Speaking of testing, is there any particular area non-devs could
> assist with at this time?  Gathering dmesgs for different instance
> types?
> 
I decided to spin up one of each instance type and grab the console
output in case it would be beneficial to the on-going work:

http://update.j3z.org/dmesg/c3.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/c3.4xlarge_dmesg.txt
http://update.j3z.org/dmesg/c3.8xlarge_dmesg.txt
http://update.j3z.org/dmesg/c3.large_dmesg.txt
http://update.j3z.org/dmesg/c3.xlarge_dmesg.txt
http://update.j3z.org/dmesg/c4.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/c4.4xlarge_dmesg.txt
http://update.j3z.org/dmesg/c4.8xlarge_dmesg.txt
http://update.j3z.org/dmesg/c4.large_dmesg.txt
http://update.j3z.org/dmesg/c4.xlarge_dmesg.txt
http://update.j3z.org/dmesg/d2.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/d2.4xlarge_dmesg.txt
http://update.j3z.org/dmesg/d2.8xlarge_dmesg.txt
http://update.j3z.org/dmesg/d2.xlarge_dmesg.txt
http://update.j3z.org/dmesg/g2.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/g2.8xlarge_dmesg.txt
http://update.j3z.org/dmesg/i2.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/i2.4xlarge_dmesg.txt
http://update.j3z.org/dmesg/i2.8xlarge_dmesg.txt
http://update.j3z.org/dmesg/i2.xlarge_dmesg.txt
http://update.j3z.org/dmesg/m3.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/m3.large_dmesg.txt
http://update.j3z.org/dmesg/m3.medium_dmesg.txt
http://update.j3z.org/dmesg/m3.xlarge_dmesg.txt
http://update.j3z.org/dmesg/m4.10xlarge_dmesg.txt
http://update.j3z.org/dmesg/m4.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/m4.4xlarge_dmesg.txt
http://update.j3z.org/dmesg/m4.large_dmesg.txt
http://update.j3z.org/dmesg/m4.xlarge_dmesg.txt
http://update.j3z.org/dmesg/r3.2xlarge_dmesg.txt
http://update.j3z.org/dmesg/r3.4xlarge_dmesg.txt
http://update.j3z.org/dmesg/r3.8xlarge_dmesg.txt
http://update.j3z.org/dmesg/r3.large_dmesg.txt
http://update.j3z.org/dmesg/r3.xlarge_dmesg.txt
http://update.j3z.org/dmesg/t2.large_dmesg.txt
http://update.j3z.org/dmesg/t2.medium_dmesg.txt
http://update.j3z.org/dmesg/t2.micro_dmesg.txt
http://update.j3z.org/dmesg/t2.nano_dmesg.txt
http://update.j3z.org/dmesg/t2.small_dmesg.txt

If it is deemed helpful, I can keep them updated as
new AMIs come out.

Thanks!

-Jonathon



Re: Xen virtual network (Netfront) driver

2016-01-12 Thread Stefan Fritsch
On Thu, 7 Jan 2016, Mike Belopuhov wrote:

> On 7 January 2016 at 13:17, Mark Kettenis  wrote:
> >> From: Mike Belopuhov 
> >> Date: Thu, 7 Jan 2016 12:02:23 +0100
> >>
> >> On 6 January 2016 at 17:58, Stefan Fritsch  wrote:
> >> > On Wed, 6 Jan 2016, Mike Belopuhov wrote:
> >> >
> >> >> There's still stuff to do, but it receives and transmits reliably
> >> >> (at least on modern Xen) so I'd like to get it in.  Man page will
> >> >> follow.
> >> >
> >> > I only had a quick glance at the code, but I have one comment about your
> >> > use of memory barriers. The membar_* macros are pure compiler barriers
> >> > when the openbsd kernel is compiled for UP. But since the host machine 
> >> > and
> >> > xen may use SMP even in this case, I suspect that you need hardware
> >> > memory barriers even if MULTIPROCESSOR is not defined. This does not seem
> >> > relevant for x86 because you don't use membar_sync, but it may become
> >> > relevant for arm, which is also supported by xen.
> >> >
> >>
> >> membar_{producer,consumer} are defined on arm to perform store and
> >> load memory barriers.  Our arm code currently does not distinguish
> >> between an MP case and non-MP case regarding the definition of these
> >> macros, so I'm not entirely certain what you are trying to say.

I didn't check arm's implementation but knew that it had non-empty
membar_{producer,consumer}. So, if it does not distinguish between an MP
and a non-MP case, then there is no problem there. But maybe you should
document somewhere which assumptions about the architecture you make, so
that they can be checked when adding a new architecture. I guess arm64
will come sooner or later and I don't know if it has exactly the same
memory model as 32-bit arm.

> >
> > Not sure ARM is a good example to look at.
> >
> 
> The only architectures that Xen dom0 is implemented for are i386,
> amd64 and arm, so there's no real need to look at anything else.
> 
> > In principle I think that the membar_xxx() interfaces could be simple
> > compiler barriers on all our architectures, at least as long as the
> > CPU will observe its own stores in the same order as they were
> > emitted.  But I think all sane CPU architectures make those
> > guarantees.  At least for "normal" memory.  However, we treat that as
> > an optimization.  And we haven't done that for all our architectures.
> >
> > The problem with virtualization is of course that even a non-MP kernel
> > is actually running in an MP environment.  If data structures are
> > shared with the hypervisor or another domain running on a different
> > CPU, proper memory barriers must be used to guarantee the other side
> > sees our stores in the right order.  The typical case would be
> > populating a descriptor with some sort of validity bit.  There you
> > want to make sure the other side doesn't see the valid bit set until
> > all the other parts of the descriptor have been filled in and are
> > visible.  In that case a simple compiler barrier may not be enough.

Yes. With Intel it's the "Reads may be reordered with older writes to
different locations but not with older writes to the same location" bit
from the memory model that is causing problems. So you have to check if
xen hits this case. virtio does (and removing the memory barriers causes
observable hangs).

> 
> That's what I was referring to in my example below.
> 
> > This is why the virtio_membar_xxx() primitives were introduced.
> >
> 
> Any idea why wasn't store and load barriers implemented separately?

No idea. virtio_membar_xxx() was modeled after the existing membar_xxx(). 
But AIUI membar_consumer() plus membar_producer() is not equivalent to 
membar_sync() (which also prevents read vs. write reordering).
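
To illustrate both points with the shape the ring notification path
takes (all names here are illustrative, not from the driver): the
problematic pair is a store followed by a load from a different
location, which neither membar_producer() nor membar_consumer()
forbids on x86:

	ring->prod = prod;	/* store: publish new requests */
	virtio_membar_sync();	/* full barrier: order store vs. load */
	if (ring->evt <= prod)	/* load from a different location */
		notify_peer();	/* illustrative event channel kick */

Without the full barrier the CPU may perform the load before the store
is globally visible; both sides then decide no notification is needed
and the ring hangs, which matches the virtio observation above.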



Re: Xen virtual network (Netfront) driver

2016-01-07 Thread Mark Kettenis
> From: Mike Belopuhov 
> Date: Thu, 7 Jan 2016 12:02:23 +0100
> 
> On 6 January 2016 at 17:58, Stefan Fritsch  wrote:
> > On Wed, 6 Jan 2016, Mike Belopuhov wrote:
> >
> >> There's still stuff to do, but it receives and transmits reliably
> >> (at least on modern Xen) so I'd like to get it in.  Man page will
> >> follow.
> >
> > I only had a quick glance at the code, but I have one comment about your
> > use of memory barriers. The membar_* macros are pure compiler barriers
> > when the openbsd kernel is compiled for UP. But since the host machine and
> > xen may use SMP even in this case, I suspect that you need hardware
> > memory barriers even if MULTIPROCESSOR is not defined. This does not seem
> > relevant for x86 because you don't use membar_sync, but it may become
> > relevant for arm, which is also supported by xen.
> >
> 
> membar_{producer,consumer} are defined on arm to perform store and
> load memory barriers.  Our arm code currently does not distinguish
> between an MP case and non-MP case regarding the definition of these
> macros, so I'm not entirely certain what you are trying to say.

Not sure ARM is a good example to look at.

In principle I think that the membar_xxx() interfaces could be simple
compiler barriers on all our architectures, at least as long as the
CPU will observe its own stores in the same order as they were
emitted.  But I think all sane CPU architectures make those
guarantees.  At least for "normal" memory.  However, we treat that as
an optimization.  And we haven't done that for all our architectures.

The problem with virtualization is of course that even a non-MP kernel
is actually running in an MP environment.  If data structures are
shared with the hypervisor or another domain running on a different
CPU, proper memory barriers must be used to guarantee the other side
sees our stores in the right order.  The typical case would be
populating a descriptor with some sort of validity bit.  There you
want to make sure the other side doesn't see the valid bit set until
all the other parts of the descriptor have been filled in and are
visible.  In that case a simple compiler barrier may not be enough.
This is why the virtio_membar_xxx() primitives were introduced.
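
Concretely, for the xnf tx descriptors that would look something like
this sketch (XNF_TXF_VALID is from the posted diff; the rest is
schematic):

	/* fill in everything the backend will look at... */
	txd->txd_req.txq_ref = addr;
	txd->txd_req.txq_size = len;
	/* ...force those stores to become visible first... */
	virtio_membar_producer();
	/* ...and only then mark the descriptor valid */
	txd->txd_req.txq_flags = XNF_TXF_VALID;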

This is actually not all that different from handling DMA to real
hardware devices.  There we must make sure that stores become visible
to the hardware device in the right order.  That matters even on
non-MP kernels too and is handled by bus_dmamap_sync(9).

Since you have embraced bus_dma(9) for the xen stuff, it would make
sense to add a xen-specific bus_dmamap_sync() implementation that
issues the appropriate memory barrier.  I think it should be
virtio_membar_consumer() for BUS_DMASYNC_PREREAD and
virtio_membar_producer() for BUS_DMASYNC_POSTWRITE.  But you'd better
double-check, because I always get confused!
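
A minimal sketch of such a hook, following that (unverified) pairing;
the MD sync, if any, would be chained where the comment says:

	void
	xen_bus_dmamap_sync(bus_dma_tag_t t, bus_dmamap_t map, bus_addr_t addr,
	    bus_size_t size, int op)
	{
		if (op & BUS_DMASYNC_PREREAD)
			virtio_membar_consumer();
		/* chain to the MD bus_dmamap_sync() here if needed */
		if (op & BUS_DMASYNC_POSTWRITE)
			virtio_membar_producer();
	}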

BTW, your xen bus_dma(9) implementation relies on the internals of the
MD bus_dma(9) implementation.  Don't expect it to work on other
architectures.  I'm not even sure I want to be held responsible if
changes in the MD code break it.

> However I'm thankful for bringing this up and I'll spend some time
> figuring out if I need actual fences in my code.  For instance, the
> cas loop in xen_grant_table_remove runs for more than one iteration
> in the normal case.  I've changed the code to perform bus_dma_unload
> after zeroing the descriptor out so that there won't technically be
> any dangling grant table references, but I didn't remeasure the cas
> loop.  Possibly due to caching and CPU migration on the host we lose
> out and perhaps can get a boost in performance by putting in an
> explicit memory barrier.

Not sure what memory fences have to do with this.  The hypervisor
should definitely issue any appropriate barriers as part of the
context switching.

> > I had the same problem in virtio and introduced the virtio_membar_* macros
> > for this purpose. Maybe they should be renamed to a more generic name and
> > you should use them, too?
> >
> 
> I'm not sure cause I don't think x86 needs any explicit membars, but
> I'll do some test and report on this.

It's a grey area.  The x86 memory model evolved over time and isn't
all that well specified.  On top of that it seems the actual hardware
is a bit more strongly ordered than the specification.  It is fairly
strongly ordered, which means that memory barriers can be omitted in
most cases that deal with "normal" memory.  But our implementation
does issue memory barriers for membar_enter() and membar_sync().  I'm
not 100% certain it is correct.



Re: Xen virtual network (Netfront) driver

2016-01-06 Thread Mike Belopuhov
On Wed, Jan 06, 2016 at 16:37 +0100, Mike Belopuhov wrote:
> There's still stuff to do, but it receives and transmits reliably
> (at least on modern Xen) so I'd like to get it in.  Man page will
> follow.
> 
> OK?
>

Just noticed that a couple of debug printfs have sneaked in.
I'm not going to commit them.

> +void
> +xnf_rx_ring_destroy(struct xnf_softc *sc)
> +{
> + int i, slots = 0;
> +
> + for (i = 0; i < XNF_RX_DESC; i++) {
> + if (sc->sc_rx_buf[i] == NULL)
> + continue;
> + bus_dmamap_unload(sc->sc_dmat, sc->sc_rx_dmap[i]);
> + m_freem(sc->sc_rx_buf[i]);
> + sc->sc_rx_buf[i] = NULL;
> + slots++;
> + }
> + printf("%s: unload done\n", __func__);
> + if_rxr_put(&sc->sc_rx_slots, slots);
> + printf("%s: rxr_put done\n", __func__);
> +
> + for (i = 0; i < XNF_RX_DESC; i++) {
> + if (sc->sc_rx_dmap[i] == NULL)
> + continue;
> + bus_dmamap_destroy(sc->sc_dmat, sc->sc_rx_dmap[i]);
> + sc->sc_rx_dmap[i] = NULL;
> + }
> + printf("%s: desc map destroy done\n", __func__);
> + if (sc->sc_rx_rmap) {
> + bus_dmamap_unload(sc->sc_dmat, sc->sc_rx_rmap);
> + bus_dmamap_destroy(sc->sc_dmat, sc->sc_rx_rmap);
> + }
> + printf("%s: ring map destroy done\n", __func__);
> + if (sc->sc_rx_ring) {
> + bus_dmamem_unmap(sc->sc_dmat, (caddr_t)sc->sc_rx_ring,
> + PAGE_SIZE);
> + bus_dmamem_free(sc->sc_dmat, &sc->sc_rx_seg, 1);
> + }
> + printf("%s: ring mem free done\n", __func__);
> + sc->sc_rx_ring = NULL;
> + sc->sc_rx_rmap = NULL;
> + sc->sc_rx_cons = 0;
> +}
> +



Re: Xen virtual network (Netfront) driver

2016-01-06 Thread Reyk Floeter
On Wed, Jan 06, 2016 at 04:37:36PM +0100, Mike Belopuhov wrote:
> There's still stuff to do, but it receives and transmits reliably
> (at least on modern Xen) so I'd like to get it in.  Man page will
> follow.
> 
> OK?
> 

I can see it works now and as mentioned in icb:
I just had the first contact with OpenBSD in an EC2 instance.
(once again, we need emoji in xterm to see the U+1F596)

Two bugs:
- It didn't work on m4.10xlarge (see cvs:~reyk/dmesg.m4.10xlarge).
- One time, xnf stopped while copying a large file to a remote machine.

I think it is good enough to go in and be tweaked in the tree.

OK reyk@

> diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC
> index fca4459..77e07cc 100644
> --- sys/arch/amd64/conf/GENERIC
> +++ sys/arch/amd64/conf/GENERIC
> @@ -67,10 +67,11 @@ mpbios0   at bios0
>  ipmi0 at mainbus? disable # IPMI
>  
>  vmt0 at pvbus?   # VMware Tools
>  
>  #xen0 at pvbus?   # Xen HVM domU
> +#xnf* at xen? # Xen Netfront
>  
>  option   PCIVERBOSE
>  option   USBVERBOSE
>  
>  pchb* at pci? # PCI-Host bridges
> diff --git sys/dev/pv/files.pv sys/dev/pv/files.pv
> index d0e3b8c..e1272b2 100644
> --- sys/dev/pv/files.pv
> +++ sys/dev/pv/files.pv
> @@ -16,5 +16,9 @@ file dev/pv/vmt.c vmt needs-flag
>  # Xen
>  device   xen {}
>  attach   xen at pvbus
>  file dev/pv/xen.c xen needs-flag
>  file dev/pv/xenstore.c   xen
> +
> +device   xnf: ether, ifnet, ifmedia
> +attach   xnf at xen
> +file dev/pv/if_xnf.c xnf
> diff --git sys/dev/pv/if_xnf.c sys/dev/pv/if_xnf.c
> new file mode 100644
> index 000..7f8b08e
> --- /dev/null
> +++ sys/dev/pv/if_xnf.c
> @@ -0,0 +1,1022 @@
> +/*
> + * Copyright (c) 2015 Mike Belopuhov
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#include "bpfilter.h"
> +#include "vlan.h"
> +#include "xen.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#ifdef INET6
> +#include 
> +#endif
> +
> +#if NBPFILTER > 0
> +#include 
> +#endif
> +
> +
> +/*
> + * Rx ring
> + */
> +
> +struct xnf_rx_req {
> + uint16_t rxq_id;
> + uint16_t rxq_pad;
> + uint32_t rxq_ref;
> +} __packed;
> +
> +struct xnf_rx_rsp {
> + uint16_t rxp_id;
> + uint16_t rxp_offset;
> + uint16_t rxp_flags;
> +#define  XNF_RXF_CSUM  0x0001
> +#define  XNF_RXF_BLANK 0x0002
> +#define  XNF_RXF_CHUNK 0x0004
> +#define  XNF_RXF_EXTRA 0x0008
> + int16_t  rxp_status;
> +} __packed;
> +
> +union xnf_rx_desc {
> + struct xnf_rx_req rxd_req;
> + struct xnf_rx_rsp rxd_rsp;
> +} __packed;
> +
> +#define XNF_RX_DESC  256
> +#define XNF_MCLEN PAGE_SIZE
> +#define XNF_RX_MIN   32
> +
> +struct xnf_rx_ring {
> + uint32_t rxr_prod;
> + uint32_t rxr_req_evt;
> + uint32_t rxr_cons;
> + uint32_t rxr_rsp_evt;
> + uint32_t rxr_reserved[12];
> + union xnf_rx_desc rxr_desc[XNF_RX_DESC];
> +} __packed;
> +
> +
> +/*
> + * Tx ring
> + */
> +
> +struct xnf_tx_req {
> + uint32_t txq_ref;
> + uint16_t txq_offset;
> + uint16_t txq_flags;
> +#define  XNF_TXF_CSUM  0x0001
> +#define  XNF_TXF_VALID 0x0002
> +#define  XNF_TXF_CHUNK 0x0004
> +#define  XNF_TXF_ETXRA 0x0008
> + uint16_t txq_id;
> + uint16_t txq_size;
> +} __packed;
> +
> +struct xnf_tx_rsp {
> + uint16_t txp_id;
> + int16_t  txp_status;
> +} __packed;
> +
> +union xnf_tx_desc {
> + struct xnf_tx_req txd_req;
> + struct xnf_tx_rsp

Re: Xen virtual network (Netfront) driver

2016-01-06 Thread Stefan Fritsch
On Wed, 6 Jan 2016, Mike Belopuhov wrote:

> There's still stuff to do, but it receives and transmits reliably
> (at least on modern Xen) so I'd like to get it in.  Man page will
> follow.

I only had a quick glance at the code, but I have one comment about your 
use of memory barriers. The membar_* macros are pure compiler barriers 
when the openbsd kernel is compiled for UP. But since the host machine and 
xen may use SMP even in this case, I suspect that you need hardware 
memory barriers even if MULTIPROCESSOR is not defined. This does not seem 
relevant for x86 because you don't use membar_sync, but it may become 
relevant for arm, which is also supported by xen.

I had the same problem in virtio and introduced the virtio_membar_* macros 
for this purpose. Maybe they should be renamed to a more generic name and 
you should use them, too?


> + if (!(ifp->if_flags & IFF_RUNNING) || ifq_is_oactive(&ifp->if_snd))
> + return;
> +
> + prod = txr->txr_prod;
> + membar_consumer();
> +
> + for (;;) {
> + m = ifq_deq_begin(&ifp->if_snd);
> + if (m == NULL)
> + break;
> +
> + error = xnf_encap(sc, m, &prod);
> + if (error == ENOENT) {
> + /* transient */
> + ifq_deq_rollback(&ifp->if_snd, m);
> + ifq_set_oactive(&ifp->if_snd);
> + break;
> + } else if (error) {
> + /* the chain is too large */
> + ifq_deq_commit(&ifp->if_snd, m);
> + m_freem(m);
> + continue;
> + }
> + ifq_deq_commit(&ifp->if_snd, m);
> +
> +#if NBPFILTER > 0
> + if (ifp->if_bpf)
> + bpf_mtap_ether(ifp->if_bpf, m, BPF_DIRECTION_OUT);
> +#endif
> + pkts++;
> + }
> + if (pkts > 0) {
> + txr->txr_prod = prod;
> + xen_intr_signal(sc->sc_xih);
> + }
> +}
> +
> +int
> +xnf_encap(struct xnf_softc *sc, struct mbuf *m, uint32_t *prod)
> +{
> + struct ifnet *ifp = &sc->sc_ac.ac_if;
> + struct xnf_tx_ring *txr = sc->sc_tx_ring;
> + union xnf_tx_desc *txd;
> + bus_dmamap_t dmap;
> + int error, i, n = 0;
> +
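> + /*
> +  * Free descriptors = cons - prod - 1 modulo the power-of-two
> +  * ring size; bail out unless a worst-case fragment chain
> +  * (XNF_TX_FRAG slots) still fits.
> +  */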
> + if (((txr->txr_cons - *prod - 1) & (XNF_TX_DESC - 1)) < XNF_TX_FRAG) {
> + error = ENOENT;
> + goto errout;
> + }
> +
> + i = *prod & (XNF_TX_DESC - 1);
> + dmap = sc->sc_tx_dmap[i];
> +
> + error = bus_dmamap_load_mbuf(sc->sc_dmat, dmap, m, BUS_DMA_WRITE |
> + BUS_DMA_NOWAIT);
> + if (error == EFBIG) {
> + if (m_defrag(m, M_DONTWAIT) ||
> + bus_dmamap_load_mbuf(sc->sc_dmat, dmap, m, BUS_DMA_WRITE |
> +  BUS_DMA_NOWAIT))
> + goto errout;
> + } else if (error)
> + goto errout;
> +
> + for (n = 0; n < dmap->dm_nsegs; n++, (*prod)++) {
> + i = *prod & (XNF_TX_DESC - 1);
> + if (sc->sc_tx_buf[i])
> + panic("%s: save vs spell: %d\n", ifp->if_xname, i);
> + txd = &txr->txr_desc[i];
> + if (n == 0) {
> + sc->sc_tx_buf[i] = m;
> + if (0 && m->m_pkthdr.csum_flags & M_IPV4_CSUM_OUT)
> + txd->txd_req.txq_flags = XNF_TXF_CSUM |
> + XNF_TXF_VALID;
> + txd->txd_req.txq_size = m->m_pkthdr.len;
> + } else
> + txd->txd_req.txq_size = dmap->dm_segs[n].ds_len;
> + if (n != dmap->dm_nsegs - 1)
> + txd->txd_req.txq_flags |= XNF_TXF_CHUNK;
> + txd->txd_req.txq_ref = dmap->dm_segs[n].ds_addr;
> + txd->txd_req.txq_offset = dmap->dm_segs[n].ds_offset;
> + }
> +
> + ifp->if_opackets++;
> + return (0);
> +
> + errout:
> + ifp->if_oerrors++;
> + return (error);
> +}
> +
> +void
> +xnf_intr(void *arg)
> +{
> + struct xnf_softc *sc = arg;
> + struct ifnet *ifp = &sc->sc_ac.ac_if;
> +
> + if (ifp->if_flags & IFF_RUNNING) {
> + xnf_rxeof(sc);
> + xnf_txeof(sc);
> + }
> +}
> +
> +int
> +xnf_txeof(struct xnf_softc *sc)
> +{
> + struct ifnet *ifp = &sc->sc_ac.ac_if;
> + struct xnf_tx_ring *txr = sc->sc_tx_ring;
> + union xnf_tx_desc *txd;
> + struct mbuf *m;
> + bus_dmamap_t dmap;
> + volatile uint32_t r;
> + uint32_t cons;
> + int i, id, pkts = 0;
> +
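> + /*
> +  * Process completed responses between the local consumer and
> +  * the ring's: keep each descriptor's id but clear the rest so
> +  * the slot can be reused.
> +  */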
> + do {
> + for (cons = sc->sc_tx_cons; cons != txr->txr_cons; cons++) {
> + membar_consumer();
> + i = cons & (XNF_TX_DESC - 1);
> + txd = &txr->txr_desc[i];
> + id = txd->txd_rsp.txp_id;
> + memset(txd, 0, sizeof(*txd));
> + txd->txd_req.txq_id = id;
> + membar_producer();
> + if (sc->sc_tx_buf[i]) {
> +