[PATCH 6.6.y] virtio_net: Do not send RSS key if it is not supported
From: Breno Leitao commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream. There is a bug when setting the RSS options in virtio_net that can break the whole machine, getting the kernel into an infinite loop. Running the following command in any QEMU virtual machine with virtionet will reproduce this problem: # ethtool -X eth0 hfunc toeplitz This is how the problem happens: 1) ethtool_set_rxfh() calls virtnet_set_rxfh() 2) virtnet_set_rxfh() calls virtnet_commit_rss_command() 3) virtnet_commit_rss_command() populates 4 entries for the rss scatter-gather 4) Since the command above does not have a key, then the last scatter-gatter entry will be zeroed, since rss_key_size == 0. sg_buf_size = vi->rss_key_size; 5) This buffer is passed to qemu, but qemu is not happy with a buffer with zero length, and do the following in virtqueue_map_desc() (QEMU function): if (!sz) { virtio_error(vdev, "virtio: zero sized buffers are not allowed"); 6) virtio_error() (also QEMU function) set the device as broken vdev->broken = true; 7) Qemu bails out, and do not repond this crazy kernel. 8) The kernel is waiting for the response to come back (function virtnet_send_command()) 9) The kernel is waiting doing the following : while (!virtqueue_get_buf(vi->cvq, ) && !virtqueue_is_broken(vi->cvq)) cpu_relax(); 10) None of the following functions above is true, thus, the kernel loops here forever. Keeping in mind that virtqueue_is_broken() does not look at the qemu `vdev->broken`, so, it never realizes that the vitio is broken at QEMU side. Fix it by not sending RSS commands if the feature is not available in the device. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Cc: sta...@vger.kernel.org Cc: qemu-devel@nongnu.org Signed-off-by: Breno Leitao Reviewed-by: Heng Qi Reviewed-by: Xuan Zhuo Signed-off-by: David S. Miller Signed-off-by: Vlad Poenaru --- drivers/net/virtio_net.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 7cb0548d17a3..56cbe00126bb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3570,19 +3570,34 @@ static int virtnet_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, u8 *hfu static int virtnet_set_rxfh(struct net_device *dev, const u32 *indir, const u8 *key, const u8 hfunc) { struct virtnet_info *vi = netdev_priv(dev); + bool update = false; int i; if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP) return -EOPNOTSUPP; if (indir) { + if (!vi->has_rss) + return -EOPNOTSUPP; + for (i = 0; i < vi->rss_indir_table_size; ++i) vi->ctrl->rss.indirection_table[i] = indir[i]; + update = true; } - if (key) + if (key) { + /* If either _F_HASH_REPORT or _F_RSS are negotiated, the +* device provides hash calculation capabilities, that is, +* hash_key is configured. +*/ + if (!vi->has_rss && !vi->has_rss_hash_report) + return -EOPNOTSUPP; + memcpy(vi->ctrl->rss.key, key, vi->rss_key_size); + update = true; + } - virtnet_commit_rss_command(vi); + if (update) + virtnet_commit_rss_command(vi); return 0; } @@ -4491,13 +4506,15 @@ static int virtnet_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT)) vi->has_rss_hash_report = true; - if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) + if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) { vi->has_rss = true; - if (vi->has_rss || vi->has_rss_hash_report) { vi->rss_indir_table_size = virtio_cread16(vdev, offsetof(struct virtio_net_config, rss_max_indirection_table_length)); + } + + if (vi->has_rss || vi->has_rss_hash_report) { vi->rss_key_size = virtio_cread8(vdev, offsetof(struct virtio_net_config, rss_max_key_size)); -- 2.43.0
Re: [agl-dev-community] Tips for debugging GPU acceleration.
On Fri, Oct 23, 2020 at 05:28:22PM +0100, Alex Bennée wrote: > Thanks - that seems to do the trick. > I can't find where in the agl repo that is set. The recipe seems to be in: > > > meta-agl-cluster-demo/recipie-graphics/wayland/weston-ini-conf/virtual-landscape.cfg > > but I can't see where the "transform=270" got added. I assume from meta-agl/meta-agl-bsp/meta-aglprofilegraphical/recipes-graphics/wayland/weston-ini-conf/virtual.cfg > > Visually it looks a lot nicer now although I could still do with > getting rid of the quite large AGL lower banner which is cropping out > the main display. Yes, like I've said, but apparently I've substituted landscape w/ portrait, applications are designed to accommodate portrait orientation. The panels, are added dynamically by homescreen. > > On Fri, 23 Oct 2020 at 13:32, Marius Vlad wrote: > > > > On Fri, Oct 23, 2020 at 12:52:17PM +0100, Alex Bennée wrote: > > > > > > Gerd Hoffmann writes: > > > > > > > Hi, > > > > > > > >> [2.930300] [drm] virgl 3d acceleration not supported by host > > > > > > > > Nope, not active. > > > > > > > >> -display gtk,show-cursor=on \ > > > > > > > > Needs -display gtk,gl=on for opengl support. > > > > > > Awesome - even on TCG the display is now nice and snappy. > > > > > > For bonus points: > > > > > > The AGL panel display is rotated 90 degrees which is putting a bit of a > > > crink in my neck. Is their anyway to rotate the view port. I tried > > > inverting xres and yres but that didn't do anything. > > Hi, > > > > The output is rotated, edit /etc/xdg/weston/weston.ini and comment out > > transform ini entry from the Virtual-1 output section. Reboot, or > > restart weston@display service. Note that the apps are optimized for > > landscape. > > > > Enabling 3D with qemu might be something worth adding in the docs. > > > > > > >> -device virtio-gpu-pci,virgl=true > > > > > > > > virgl is silently turned off in case opengl support is not available. > > > > Ideally virgl should be OnOffAuto not bool, but I fear changing that > > > > now (and start throwing errors on virgl=on if opengl is not available) > > > > will break stuff ... > > > > > > At least a warn_report maybe? > > > > > > > > > > > take care, > > > > Gerd > > > > > > > > > -- > > > Alex Bennée > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Alex Bennée > KVM/QEMU Hacker for Linaro > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#8773): > https://lists.automotivelinux.org/g/agl-dev-community/message/8773 > Mute This Topic: https://lists.automotivelinux.org/mt/77748325/4043866 > Group Owner: agl-dev-community+ow...@lists.automotivelinux.org > Unsubscribe: > https://lists.automotivelinux.org/g/agl-dev-community/leave/7327001/516878461/xyzzy > [marius.v...@collabora.com] > -=-=-=-=-=-=-=-=-=-=-=- > > signature.asc Description: PGP signature
Re: [agl-dev-community] Tips for debugging GPU acceleration.
On Fri, Oct 23, 2020 at 12:52:17PM +0100, Alex Bennée wrote: > > Gerd Hoffmann writes: > > > Hi, > > > >> [2.930300] [drm] virgl 3d acceleration not supported by host > > > > Nope, not active. > > > >> -display gtk,show-cursor=on \ > > > > Needs -display gtk,gl=on for opengl support. > > Awesome - even on TCG the display is now nice and snappy. > > For bonus points: > > The AGL panel display is rotated 90 degrees which is putting a bit of a > crink in my neck. Is their anyway to rotate the view port. I tried > inverting xres and yres but that didn't do anything. Hi, The output is rotated, edit /etc/xdg/weston/weston.ini and comment out transform ini entry from the Virtual-1 output section. Reboot, or restart weston@display service. Note that the apps are optimized for landscape. Enabling 3D with qemu might be something worth adding in the docs. > > >> -device virtio-gpu-pci,virgl=true > > > > virgl is silently turned off in case opengl support is not available. > > Ideally virgl should be OnOffAuto not bool, but I fear changing that > > now (and start throwing errors on virgl=on if opengl is not available) > > will break stuff ... > > At least a warn_report maybe? > > > > > take care, > > Gerd > > > -- > Alex Bennée > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#8771): > https://lists.automotivelinux.org/g/agl-dev-community/message/8771 > Mute This Topic: https://lists.automotivelinux.org/mt/77748325/4043866 > Group Owner: agl-dev-community+ow...@lists.automotivelinux.org > Unsubscribe: > https://lists.automotivelinux.org/g/agl-dev-community/leave/7327001/516878461/xyzzy > [marius.v...@collabora.com] > -=-=-=-=-=-=-=-=-=-=-=- > > signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v3 0/5] Fix handling of IPv4/IPv6 dual stack
On 06/01/2017 04:29 AM, Daniel P. Berrange wrote: > This is a followup to: > > v1: https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg05659.html > v2: https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04706.html > > This series aims to fix a lot of bugs related to handling of IPv4 / IPv6 > dual stack. > > - The VNC server mistakenly listened on two separate ports 5900+5901 >when the to= parameter was given > - IPv6 sockets are accepting IPv4 clients even when IPv4 is set to >be disabled > - IPv6 sockets are failing to accept IPv4 clients when IPv4 is not set >to be disabled > - The VNC server was loosing the ipv4=/ipv6= settings due to a bug >in the DNS resolver > > The behaviour of all this is really subtle and hard to get working correctly > across all the different network backends. Thus, the most important part of > this patch series is the last patch which adds a test case covering the > backends for -vnc, -chardev tcp, -net socket, and -incoming socket, with > a 120 entry matrix. > > IOW, if you think any of the first 4 patches are applying the wrong logic, > then take a look at the last patch and indicate which test matrix entries > are believed to be defining wrong behaviour :-) > > Changed in v2: > > - Clarified error message in commit message > - Replace assert with exit (Eric) > - Fix typo in test comment (Eric) > - Fix wierd copyright line (Eric) > > Daniel P. Berrange (5): > sockets: ensure we can bind to both ipv4 & ipv6 separately > sockets: don't block IPv4 clients when listening on "::" > sockets: ensure we don't accept IPv4 clients when IPv4 is disabled > io: preserve ipv4/ipv6 flags when resolving InetSocketAddress > tests: add functional test validating ipv4/ipv6 address flag handling > > io/dns-resolver.c | 6 +- > tests/.gitignore | 1 + > tests/Makefile.include | 4 + > tests/test-sockets-proto.c | 855 > + > util/qemu-sockets.c| 71 +++- > 5 files changed, 916 insertions(+), 21 deletions(-) > create mode 100644 tests/test-sockets-proto.c > Series Reviewed-by: Vlad Yasevich <vyase...@redhat.com> -vlad
Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters
On 06/01/2017 03:02 AM, Jason Wang wrote: > > > On 2017年05月31日 02:57, Dr. David Alan Gilbert wrote: >> * Vlad Yasevich (vyase...@redhat.com) wrote: >>> On 05/26/2017 12:03 AM, Jason Wang wrote: >>>> On 2017å¹´05月25æ—¥ 02:05, Vladislav Yasevich wrote: >>>>> Add parameters that control RARP/GARP announcement timeouts. >>>>> The parameters structure is added to the QAPI and a qmp command >>>>> is added to set/get the parameter data. >>>>> >>>>> Based on work by "Dr. David Alan Gilbert"<dgilb...@redhat.com> >>>>> >>>>> Signed-off-by: Vladislav Yasevich<vyase...@redhat.com> >>>> I think it's better to explain e.g under which condition do we need to >>>> tweak such >>>> parameters. >>>> >>>> Thanks >>>> >>> OK. I'll rip off what dgilbert wrote in his original series for the >>> description. >>> >>> Dave, if you have any text to add as to why migration might need to tweak >>> this, I'd >>> appreciate it. >> Pretty much what I originally said; that the existing values >> are arbitrary and fixed, and for systems with complex/sluggish >> network reconfiguration systems they can be too slow. >> >> Dave >> > > I agree, but I'm not sure how much it can help in fact unless management can > set > configuration specific parameters. And what we did here is best effort, > there's no > guarantee that G(R)ARP packet can reach the destination. > So, that's what the series allows. If management knows something new, it can set appropriate parameter values. Additionally, the management is also free to trigger additional announcements through the new commands. I am starting to think that just for the sake of migration, exposing announce_self interface to management might be sufficient. Management, when it deems migration complete, may use the interface to trigger announcements in addition to whatever best effort QEMU may attempt itself. Dave, would that be enough, or do the parameters still make sense? Thanks -vlad
Re: [Qemu-devel] [PATCH 08/12] announce_timer: Add ability to reset an existing
On 05/30/2017 03:35 PM, Dr. David Alan Gilbert wrote: > * Vladislav Yasevich (vyase...@redhat.com) wrote: >> It is now potentially possible to issue annouce-self command in a tight >> loop. Instead of doing nother, we can reset the timeout pararameters, >> especially since each instance may provide it's own values. This >> allows the user to extend or cut short currently runnig timer. >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> > > ah ok, you can ignore my comment on the previous patch then! > >> --- >> include/migration/vmstate.h | 1 + >> migration/savevm.c | 41 +++-- >> 2 files changed, 32 insertions(+), 10 deletions(-) >> >> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h >> index 689b685..6dfdac3 100644 >> --- a/include/migration/vmstate.h >> +++ b/include/migration/vmstate.h >> @@ -1057,6 +1057,7 @@ void vmstate_register_ram_global(struct MemoryRegion >> *memory); >> >> typedef struct AnnounceTimer { >> QEMUTimer *tm; >> +QemuMutex active_lock; >> struct AnnounceTimer **entry; >> AnnounceParameters params; >> QEMUClockType type; >> diff --git a/migration/savevm.c b/migration/savevm.c >> index dcba8bd..e43658f 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -220,20 +220,29 @@ static void qemu_announce_self_iter(NICState *nic, >> void *opaque) >> >> AnnounceTimer *announce_timers[QEMU_ANNOUNCE__MAX]; >> >> +static void qemu_announce_timer_destroy(AnnounceTimer *timer) >> +{ >> +timer_del(timer->tm); >> +timer_free(timer->tm); >> +qemu_mutex_destroy(>active_lock); > > Can you explain what makes this safe; we're not > holding the lock at this point are we? Are we guaranteed > that no one else will try and take it? > Either way it should be commented to say why it's safe. Looking at this again, it doesn't look safe. The problem is the lookup code. There is needs to be a lock on the on the global array that needs to be held during creation/modification. Thanks -vlad > > Dave > >> +g_free(timer); >> +} >> + >> static void qemu_announce_self_once(void *opaque) >> { >> AnnounceTimer *timer = (AnnounceTimer *)opaque; >> >> +qemu_mutex_lock(>active_lock); >> qemu_foreach_nic(qemu_announce_self_iter, NULL); >> >> -if (--timer->round) { >> +if (--timer->round ) { >> timer_mod(timer->tm, qemu_clock_get_ms(timer->type) + >>self_announce_delay(timer)); >> +qemu_mutex_unlock(>active_lock); >> } else { >> -*(timer->entry) = NULL; >> -timer_del(timer->tm); >> -timer_free(timer->tm); >> -g_free(timer); >> +*(timer->entry) = NULL; >> +qemu_mutex_unlock(>active_lock); >> +qemu_announce_timer_destroy(timer); >> } >> } >> >> @@ -242,6 +251,7 @@ AnnounceTimer >> *qemu_announce_timer_new(AnnounceParameters *params, >> { >> AnnounceTimer *timer = g_new(AnnounceTimer, 1); >> >> +qemu_mutex_init(>active_lock); >> timer->params = *params; >> timer->round = params->rounds; >> timer->type = type; >> @@ -259,6 +269,21 @@ AnnounceTimer >> *qemu_announce_timer_create(AnnounceParameters *params, >> return timer; >> } >> >> +static void qemu_announce_timer_update(AnnounceTimer *timer, >> + AnnounceParameters *params) >> +{ >> +qemu_mutex_lock(>active_lock); >> + >> +/* Update timer paramenter with any new values. >> + * Reset the number of rounds to run, and stop the current timer. >> + */ >> +timer->params = *params; >> +timer->round = params->rounds; >> +timer_del(timer->tm); >> + >> +qemu_mutex_unlock(>active_lock); >> +} >> + >> void qemu_announce_self(AnnounceParameters *params, AnnounceType type) >> { >> AnnounceTimer *timer; >> @@ -270,11 +295,7 @@ void qemu_announce_self(AnnounceParameters *params, >> AnnounceType type) >> announce_timers[type] = timer; >> timer->entry = _timers[type]; >> } else { >> -/* For now, don't do anything. If we want to reset the timer, >> - * we'll need to add locking to each announce timer to prevent >> - * races between timeout handling and a reset. >> - */ >> -return; >> +qemu_announce_timer_update(timer, params); >> } >> qemu_announce_self_once(timer); >> } >> -- >> 2.7.4 >> > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >
Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer
>> qemu_send_packet_raw(qemu_get_queue(nic), buf, len); >> } >> >> - >> static void qemu_announce_self_once(void *opaque) >> { >> -static int count = SELF_ANNOUNCE_ROUNDS; >> -QEMUTimer *timer = *(QEMUTimer **)opaque; >> +AnnounceTimer *timer = (AnnounceTimer *)opaque; >> >> qemu_foreach_nic(qemu_announce_self_iter, NULL); >> >> -if (--count) { >> -/* delay 50ms, 150ms, 250ms, ... */ >> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + >> - self_announce_delay(count)); >> +if (--timer->round) { >> +timer_mod(timer->tm, qemu_clock_get_ms(timer->type) + >> + self_announce_delay(timer)); >> } else { >> -timer_del(timer); >> -timer_free(timer); >> +timer_del(timer->tm); >> +timer_free(timer->tm); >> +g_free(timer); > > That's kind of odd in that it doesn't cause the thing the opaque > pointed to, to be zero'd, so problem if someone free'd it > in an exit path? But probably OK in this case. > In this case, the opaque is the AnnounceTimer itself. So we end up: +-> Announce_timer: +> QemuTimer: | tm ---+ opaque --+ || ++ zero-ing opaque is a really qemu-timers job, but it doesn't do so, and in this case, it's OK (sort of), since we call timer_free(). Thanks -vlad > Dave > >> } >> } >> >> @@ -252,11 +250,13 @@ AnnounceTimer >> *qemu_announce_timer_create(AnnounceParameters *params, >> return timer; >> } >> >> -void qemu_announce_self(void) >> +void qemu_announce_self(AnnounceParameters *params) >> { >> -static QEMUTimer *timer; >> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, >> ); >> -qemu_announce_self_once(); >> +AnnounceTimer *timer; >> + >> +timer = qemu_announce_timer_create(params, QEMU_CLOCK_REALTIME, >> + qemu_announce_self_once); >> +qemu_announce_self_once(timer); >> } >> >> /***/ >> @@ -1730,7 +1730,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque) >> */ >> cpu_synchronize_all_post_init(); >> >> -qemu_announce_self(); >> +qemu_announce_self(qemu_get_announce_params()); >> >> /* Make sure all file formats flush their mutable metadata. >> * If we get an error here, just don't restart the VM yet. */ >> -- >> 2.7.4 >> > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >
Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command
On 05/30/2017 10:14 AM, Daniel P. Berrange wrote: > On Wed, May 24, 2017 at 02:05:22PM -0400, Vladislav Yasevich wrote: >> Add a qmp command that can trigger guest announcements. >> >> Based on work of Germano Veit Michel <germ...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >> --- >> migration/savevm.c | 14 ++ >> qapi-schema.json | 19 +++ >> 2 files changed, 33 insertions(+) >> >> diff --git a/migration/savevm.c b/migration/savevm.c >> index a4097c9..b55ce6a 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params) >> qemu_announce_self_once(timer); >> } >> >> +void qmp_announce_self(bool has_params, AnnounceParameters *params, >> + Error **errp) >> +{ >> +AnnounceParameters announce_params; >> + >> +memcpy(_params, qemu_get_announce_params(), >> + sizeof(announce_params)); >> + >> +if (has_params) >> +qemu_set_announce_parameters(_params, params); >> + >> +qemu_announce_self(_params); >> +} > > I'm not a huge fan of the idea of setting announce parameters in a > global namespace, via a previously issued separate command. > > I realize this is already done for the 'migrate' command, but there > it has been the cause of a number of bugs in libvirt mgmt of QEMU. > At least with migrate it makes sense for some parameters which need > to be tuned 'on the fly' after migrate already started. That doesn't > apply to this command though. > > So I tend to think it'd be nicer to just make the parameters mandatory > for this command, so it is self-contanied does and not rely on previously > set (or potentially unknown/indeterminate) global state. > The reason I didn't do this is that I didn't want to require the user to know or even think about the timeout values. I did want to provide an option though for those who may want to tweak things. In most circumstances, the timeout values would never really be touched and we'll just use the qemu defaults. in rare circumstances, values might need to be changed during migration. Here, I can definitely see the need to give migration it's own values. > I'd also suggest that instad of adding an 'announce_set_parameters', > the parameters for the automatic announce at end of migration should > be set via the 'migrate_set_parameters' command. I can do that. > >> diff --git a/qapi-schema.json b/qapi-schema.json >> index 2030087..126b09d 100644 >> --- a/qapi-schema.json >> +++ b/qapi-schema.json >> @@ -654,6 +654,25 @@ >>'returns': 'AnnounceParameters' } >> >> ## >> +# @announce-self: >> +# >> +# Trigger generation of broadcast RARP frames to update network switches. >> +# This can be useful when network bonds fail-over the active slave. >> +# >> +# Arguments: None. > > "None", but it has some arguments listed right below will fix Thanks -vlad > >> +# >> +# Example: >> +# >> +# -> { "execute": "announce-self" >> +# "arguments": { "announce-rounds": 5 } } >> +# <- { "return": {} } >> +# >> +# Since: 2.10 >> +## >> +{ 'command': 'announce-self', >> + 'data' : {'*params': 'AnnounceParameters'} } >> + >> +## >> # @MigrationStats: >> # >> # Detailed migration status. >> -- >> 2.7.4 >> >> > > Regards, > Daniel >
Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command
On 05/30/2017 10:24 AM, Juan Quintela wrote: > Vlad Yasevich <vyase...@redhat.com> wrote: >> On 05/30/2017 06:11 AM, Juan Quintela wrote: >>> Vladislav Yasevich <vyase...@redhat.com> wrote: >>>> Add a qmp command that can trigger guest announcements. >>>> >>>> Based on work of Germano Veit Michel <germ...@redhat.com> >>>> >>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >>>> --- >>>> migration/savevm.c | 14 ++ >>>> qapi-schema.json | 19 +++ >>>> 2 files changed, 33 insertions(+) >>>> >>>> diff --git a/migration/savevm.c b/migration/savevm.c >>>> index a4097c9..b55ce6a 100644 >>>> --- a/migration/savevm.c >>>> +++ b/migration/savevm.c >>>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params) >>>> qemu_announce_self_once(timer); >>>> } >>>> >>>> +void qmp_announce_self(bool has_params, AnnounceParameters *params, >>>> + Error **errp) >>>> +{ >>>> +AnnounceParameters announce_params; >>>> + >>>> +memcpy(_params, qemu_get_announce_params(), >>>> + sizeof(announce_params)); >>>> + >>>> +if (has_params) >>>> +qemu_set_announce_parameters(_params, params); >>>> + >>>> +qemu_announce_self(_params); >>> >>> Are I missreading qemu_annouce_self()? >>> My reading is that it passes announce_params to a timer (i.e. async >>> function), but here announce_params is a local variable here, no? >>> >> >> The AnnounceTimer holds a copy since each timer may have it's own values. > > > >> AnnounceTimer *qemu_announce_timer_new(AnnounceParameters *params, >> QEMUClockType type) >> { >>AnnounceTimer *timer = g_new(AnnounceTimer, 1); >> >>timer->params = *params; > > I have to remomember that C has learn how to copy structures long ago. > > > > I was missing the "*" on my previous reading, sorry for the noise. that actually changed, since as Eric Pointed out, shallow copies shouldn't be used. In the v2 code, this uses qemu_set_announce_paramters(), but the copy essentially remains. -vlad > >>timer->round = params->rounds; >>timer->type = type; >> >>return timer; >> } > >
Re: [Qemu-devel] [PATCH 09/12] hmp: add announce paraters info/set
On 05/30/2017 06:18 AM, Juan Quintela wrote: > Vladislav Yasevich <vyase...@redhat.com> wrote: >> Add HMP command to control and read annoucment parameters. >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> > > Reviewed-by: Juan Quintela <quint...@redhat.com> > > >> + cleanup: >> +if (err) { >> +error_report_err(err); >> +} >> +} > > > My understanding is that this the "cool way" to spell this nowadays is: > > hmp_handle_error(mon, ); > > Just in case you want to change it. > Thanks. I must have used an older function as base. I'll check it out. -vlad
Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer
On 05/30/2017 06:10 AM, Juan Quintela wrote: > Vladislav Yasevich <vyase...@redhat.com> wrote: >> Switch qemu_announce_self and virtio annoucements to use >> the announcement timer framework. This makes sure that both >> timers use the same timeouts and number of annoucement attempts >> >> Based on work by Dr. David Alan Gilbert <dgilb...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> > >> static void qemu_announce_self_once(void *opaque) >> { >> -static int count = SELF_ANNOUNCE_ROUNDS; >> -QEMUTimer *timer = *(QEMUTimer **)opaque; >> +AnnounceTimer *timer = (AnnounceTimer *)opaque; > > Cast from void * is never needed. > OK Thanks -vlad
Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command
On 05/30/2017 06:11 AM, Juan Quintela wrote: > Vladislav Yasevich <vyase...@redhat.com> wrote: >> Add a qmp command that can trigger guest announcements. >> >> Based on work of Germano Veit Michel <germ...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >> --- >> migration/savevm.c | 14 ++ >> qapi-schema.json | 19 +++ >> 2 files changed, 33 insertions(+) >> >> diff --git a/migration/savevm.c b/migration/savevm.c >> index a4097c9..b55ce6a 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params) >> qemu_announce_self_once(timer); >> } >> >> +void qmp_announce_self(bool has_params, AnnounceParameters *params, >> + Error **errp) >> +{ >> +AnnounceParameters announce_params; >> + >> +memcpy(_params, qemu_get_announce_params(), >> + sizeof(announce_params)); >> + >> +if (has_params) >> +qemu_set_announce_parameters(_params, params); >> + >> +qemu_announce_self(_params); > > Are I missreading qemu_annouce_self()? > My reading is that it passes announce_params to a timer (i.e. async > function), but here announce_params is a local variable here, no? > The AnnounceTimer holds a copy since each timer may have it's own values. Thanks -vlad
Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters
On 05/30/2017 05:58 AM, Juan Quintela wrote: > Vladislav Yasevich <vyase...@redhat.com> wrote: >> Add parameters that control RARP/GARP announcement timeouts. >> The parameters structure is added to the QAPI and a qmp command >> is added to set/get the parameter data. >> >> Based on work by "Dr. David Alan Gilbert" <dgilb...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> > > Hi > >> diff --git a/migration/savevm.c b/migration/savevm.c >> index f5e8194..cee2837 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -78,6 +78,104 @@ static struct mig_cmd_args { >> [MIG_CMD_MAX] = { .len = -1, .name = "MAX" }, >> }; >> > > Once that you are touching this, shouldn't it be better to be in > net/? > They have nothing to do with migration really. > Yeah, I considered this, but could really find a good slot for them. I can either put them into net.c directly or pull them out into their own little file. > >> +#define QEMU_ANNOUNCE_INITIAL50 >> +#define QEMU_ANNOUNCE_MAX 550 >> +#define QEMU_ANNOUNCE_ROUNDS 5 >> +#define QEMU_ANNOUNCE_STEP 100 >> + >> +AnnounceParameters *qemu_get_announce_params(void) >> +{ >> +static AnnounceParameters announce = { >> +.initial = QEMU_ANNOUNCE_INITIAL, >> +.max = QEMU_ANNOUNCE_MAX, >> +.rounds = QEMU_ANNOUNCE_ROUNDS, >> +.step = QEMU_ANNOUNCE_STEP, >> +}; >> + >> +return >> +} >> + >> +void qemu_fill_announce_parameters(AnnounceParameters **to, >> + AnnounceParameters *from) >> +{ >> +AnnounceParameters *params; >> + >> +params = *to = g_malloc0(sizeof(*params)); >> +params->has_initial = true; >> +params->initial = from->initial; >> +params->has_max = true; >> +params->max = from->max; >> +params->has_rounds = true; >> +params->rounds = from->rounds; >> +params->has_step = true; >> +params->step = from->step; >> +} >> + >> +bool qemu_validate_announce_parameters(AnnounceParameters *params, Error >> **errp) >> +{ >> +if (params->has_initial && >> +(params->initial < 1 || params->initial > 10)) { > > This is independent of this patch, but we really need a macro: > CHECK(field, mininum, maximum) > > We repeat this left and right. > >> +void qemu_set_announce_parameters(AnnounceParameters *announce_params, >> + AnnounceParameters *params) > >> +void qmp_announce_set_parameters(AnnounceParameters *params, Error **errp) > > Really similar functions name. I assume by know that the 1st function > is used somewhere else in the series. > Yes, the first function is going to be used later. Thanks -vlad
Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command
On 05/26/2017 09:16 AM, Eric Blake wrote: > On 05/24/2017 01:05 PM, Vladislav Yasevich wrote: >> Add a qmp command that can trigger guest announcements. >> >> Based on work of Germano Veit Michel <germ...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >> --- >> migration/savevm.c | 14 ++ >> qapi-schema.json | 19 +++ >> 2 files changed, 33 insertions(+) >> > >> +void qmp_announce_self(bool has_params, AnnounceParameters *params, >> + Error **errp) >> +{ >> +AnnounceParameters announce_params; >> + >> +memcpy(_params, qemu_get_announce_params(), >> + sizeof(announce_params)); > > Shallow copies of a QAPI type happen to work when the type is all scalar > (as AnnounceParameters currently is), but it's more robust to use > QAPI_CLONE() or QAPI_CLONE_MEMBERS() so that any future non-scalar > additions to the parameters type will be correctly copied without > introducing memory leaks or double frees. > > Even this might be better: > AnnounceParameters announce_params = { 0 }; > qemu_set_announce_parameters(_params, qemu_get_announce_params()); > Ok. >> + >> +if (has_params) >> +qemu_set_announce_parameters(_params, params); >> + >> +qemu_announce_self(_params); >> +} > The QMP changes look okay. Do you have a testsuite covering the new QMP > command somewhere in the series? > There is basic test in patch 12. Thanks -vlad
Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters
On 05/26/2017 09:08 AM, Eric Blake wrote: > On 05/24/2017 01:05 PM, Vladislav Yasevich wrote: >> Add parameters that control RARP/GARP announcement timeouts. >> The parameters structure is added to the QAPI and a qmp command >> is added to set/get the parameter data. >> >> Based on work by "Dr. David Alan Gilbert" <dgilb...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >> --- > > Just an interface review for now: > >> +++ b/qapi-schema.json >> @@ -569,6 +569,90 @@ >> ## >> { 'command': 'query-events', 'returns': ['EventInfo'] } >> >> + >> +## >> +# @AnnounceParameter: >> +# >> +# @AnnounceParameter enumeration >> +# >> +# @initial: Initial delay (in ms) before sending the first GARP/RARP >> +# announcement >> +# >> +# @max: Maximum delay (in ms) to between GARP/RARP announcemnt packets > > s/announcemnt/announcement/ > >> +# >> +# @rounds: Number of self-announcement attempts >> +# >> +# @step: Delay increate (in ms) after each self-announcment attempt > > s/increate/increase/ > s/announcment/announcement/ > >> +# >> +# Since: 2.10 >> +## >> +{ 'enum' : 'AnnounceParameter', >> + 'data' : [ 'initial', 'max', 'rounds', 'step' ] } > > Why are we creating an enum here? Without reading further, it looks > like you plan on using the enum to delineate members of a union? But > that feels like it will be overly complicated. A struct should be > sufficient (each parameter being an optional member of the struct, where > you can supply as many or as few on input, but all are reported on output). > I end up using them for the HMP interface. If it's better, I can move this definition to the HMP patch. Thanks -vlad
Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters
On 05/26/2017 12:03 AM, Jason Wang wrote: > > On 2017年05月25日 02:05, Vladislav Yasevich wrote: >> Add parameters that control RARP/GARP announcement timeouts. >> The parameters structure is added to the QAPI and a qmp command >> is added to set/get the parameter data. >> >> Based on work by "Dr. David Alan Gilbert" <dgilb...@redhat.com> >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> > > I think it's better to explain e.g under which condition do we need to tweak > such parameters. > > Thanks > OK. I'll rip off what dgilbert wrote in his original series for the description. Dave, if you have any text to add as to why migration might need to tweak this, I'd appreciate it. Thanks -vlad
Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer
timer_free(n->announce_timer->tm); >> +g_free(n->announce_timer); >> g_free(n->vqs); >> qemu_del_nic(n->nic); >> virtio_cleanup(vdev); >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h >> index 1eec9a2..51dd4c3 100644 >> --- a/include/hw/virtio/virtio-net.h >> +++ b/include/hw/virtio/virtio-net.h >> @@ -94,8 +94,7 @@ typedef struct VirtIONet { >> char *netclient_name; >> char *netclient_type; >> uint64_t curr_guest_offloads; >> -QEMUTimer *announce_timer; >> -int announce_counter; >> +AnnounceTimer *announce_timer; >> bool needs_vnet_hdr_swap; >> } VirtIONet; >> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h >> index a6bf84d..f8aed9b 100644 >> --- a/include/migration/vmstate.h >> +++ b/include/migration/vmstate.h >> @@ -1022,8 +1022,6 @@ extern const VMStateInfo vmstate_info_qtailq; >> #define VMSTATE_END_OF_LIST() \ >> {} >> -#define SELF_ANNOUNCE_ROUNDS 5 >> - >> void loadvm_free_handlers(MigrationIncomingState *mis); >> int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd, >> @@ -1071,11 +1069,18 @@ AnnounceTimer >> *qemu_announce_timer_create(AnnounceParameters >> *params, >> QEMUTimerCB *cb); >> static inline >> -int64_t self_announce_delay(int round) >> +int64_t self_announce_delay(AnnounceTimer *timer) >> { >> -assert(round < SELF_ANNOUNCE_ROUNDS && round > 0); >> -/* delay 50ms, 150ms, 250ms, ... */ >> -return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; >> +int64_t ret; >> + >> +ret = timer->params.initial + >> + (timer->params.rounds - timer->round - 1) * >> + timer->params.step; >> + >> +if (ret < 0 || ret > timer->params.max) { >> +ret = timer->params.max; >> +} > > Can we move this check to qemu_validate_announce_parameters()? Not really. Consider a situation where you don't really want to wait for more the then say 5 seconds between announcements, but you want to increase the number of announcement attempts (since you are not sure how long it will take to plumb the whole network path). In this case, you will hit the max delay before hitting max rounds, and the code above makes sure that we don't exceed max delay. I can update the names to make it clearer, if it will help. Thanks -vlad > >> +return ret; >> } >> void dump_vmstate_json_to_file(FILE *out_fp); >> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h >> index 7fd49c4..2ef1687 100644 >> --- a/include/sysemu/sysemu.h >> +++ b/include/sysemu/sysemu.h >> @@ -85,7 +85,7 @@ bool qemu_validate_announce_parameters(AnnounceParameters >> *params, >> Error **errp); >> void qemu_set_announce_parameters(AnnounceParameters *announce_params, >> AnnounceParameters *params); >> -void qemu_announce_self(void); >> +void qemu_announce_self(AnnounceParameters *params); >> /* Subcommands for QEMU_VM_COMMAND */ >> enum qemu_vm_cmd { >> diff --git a/migration/migration.c b/migration/migration.c >> index 0304c01..987c1cf 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -345,7 +345,7 @@ static void process_incoming_migration_bh(void *opaque) >>* This must happen after all error conditions are dealt with and >>* we're sure the VM is going to be running on this host. >>*/ >> -qemu_announce_self(); >> +qemu_announce_self(qemu_get_announce_params()); >> /* If global state section was not received or we are in running >> state, we need to obey autostart. Any other state is set with >> diff --git a/migration/savevm.c b/migration/savevm.c >> index 607b090..555157a 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -212,21 +212,19 @@ static void qemu_announce_self_iter(NICState *nic, >> void *opaque) >> qemu_send_packet_raw(qemu_get_queue(nic), buf, len); >> } >> - >> static void qemu_announce_self_once(void *opaque) >> { >> -static int count = SELF_ANNOUNCE_ROUNDS; >> -QEMUTimer *timer = *(QEMUTimer **)opaque; >> +AnnounceTimer *timer = (AnnounceTimer *)opaque; >> qemu_foreach_nic(qemu_announce_self_iter, NULL); >
Re: [Qemu-devel] [PATCH v2] virtio_net: Bypass backends for MTU feature negotiation
On 05/23/2017 08:31 AM, Maxime Coquelin wrote: > This patch adds a new internal "x-mtu-bypass-backend" property > to bypass backends for MTU feature negotiation. > > When this property is set, the MTU feature is negotiated as soon > as supported by the guest and a MTU value is set via the host_mtu > parameter. In case the backend advertises the feature (e.g. DPDK's > vhost-user backend), the feature negotiation is propagated down to > the backend. > > When this property is not set, the backend has to support the MTU > feature for its negotiation to succeed. > > For compatibility purpose, this property is disabled for machine > types v2.9 and older. > > Cc: Aaron Conole <acon...@redhat.com> > Suggested-by: Michael S. Tsirkin <m...@redhat.com> > Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com> Reviewed-by: Vlad Yasevich <vyase...@redhat.com> -vlad > --- > > Tests performed: > - Vhost-net kernel backendi, host_mtu=2000: > * default machine type: guest MTU = 2000 > * pc-q35-2.9 machine type: Guest MTU = 1500 > > - Vhost-net user backend, host_mtu=2000: > * DPDK v17.05 (MTU feature supported on backend side) > - default machine type: guest MTU = 2000 > - pc-q35-2.9 machine type: guest MTU = 2000) > * DPDK v16.11 (MTU feature not supported on backend side) > - default machine type: guest MTU = 2000 > - pc-q35-2.9 machine type: guest MTU = 1500) > > hw/net/virtio-net.c| 17 - > include/hw/compat.h| 4 > include/hw/virtio/virtio-net.h | 1 + > include/hw/virtio/virtio.h | 1 + > 4 files changed, 22 insertions(+), 1 deletion(-) > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index 7d091c9..39c336e 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -589,7 +589,15 @@ static uint64_t virtio_net_get_features(VirtIODevice > *vdev, uint64_t features, > if (!get_vhost_net(nc->peer)) { > return features; > } > -return vhost_net_get_features(get_vhost_net(nc->peer), features); > +features = vhost_net_get_features(get_vhost_net(nc->peer), features); > +vdev->backend_features = features; > + > +if (n->mtu_bypass_backend && > +(n->host_features & 1ULL << VIRTIO_NET_F_MTU)) { > +features |= (1ULL << VIRTIO_NET_F_MTU); > +} > + > +return features; > } > > static uint64_t virtio_net_bad_features(VirtIODevice *vdev) > @@ -640,6 +648,11 @@ static void virtio_net_set_features(VirtIODevice *vdev, > uint64_t features) > VirtIONet *n = VIRTIO_NET(vdev); > int i; > > +if (n->mtu_bypass_backend && > +!virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_MTU)) { > +features &= ~(1ULL << VIRTIO_NET_F_MTU); > +} > + > virtio_net_set_multiqueue(n, >virtio_has_feature(features, VIRTIO_NET_F_MQ)); > > @@ -2090,6 +2103,8 @@ static Property virtio_net_properties[] = { > DEFINE_PROP_UINT16("rx_queue_size", VirtIONet, net_conf.rx_queue_size, > VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE), > DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0), > +DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend, > + true), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/include/hw/compat.h b/include/hw/compat.h > index 55b1765..181f450 100644 > --- a/include/hw/compat.h > +++ b/include/hw/compat.h > @@ -6,6 +6,10 @@ > .driver = "pci-bridge",\ > .property = "shpc",\ > .value= "off",\ > +},{\ > +.driver = "virtio-net-device",\ > +.property = "x-mtu-bypass-backend",\ > +.value= "off",\ > }, > > #define HW_COMPAT_2_8 \ > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h > index 1eec9a2..602b486 100644 > --- a/include/hw/virtio/virtio-net.h > +++ b/include/hw/virtio/virtio-net.h > @@ -97,6 +97,7 @@ typedef struct VirtIONet { > QEMUTimer *announce_timer; > int announce_counter; > bool needs_vnet_hdr_swap; > +bool mtu_bypass_backend; > } VirtIONet; > > void virtio_net_set_netclient_name(VirtIONet *n, const char *name, > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index 7b6edba..80c45c3 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -79,6 +79,7 @@ struct VirtIODevice > uint16_t queue_sel; > uint64_t guest_features; > uint64_t host_features; > +uint64_t backend_features; > size_t config_len; > void *config; > uint16_t config_vector; >
Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp
On 05/12/2017 03:24 PM, Dr. David Alan Gilbert wrote: > * Vlad Yasevich (vyase...@redhat.com) wrote: >> On 02/20/2017 07:16 PM, Germano Veit Michel wrote: >>> qemu_announce_self() is triggered by qemu at the end of migrations >>> to update the network regarding the path to the guest l2addr. >>> >>> however it is also useful when there is a network change such as >>> an active bond slave swap. Essentially, it's the same as a migration >>> from a network perspective - the guest moves to a different point >>> in the network topology. >>> >>> this exposes the function via qmp. >>> >>> Signed-off-by: Germano Veit Michel <germ...@redhat.com> >>> --- >>> include/migration/vmstate.h | 5 + >>> migration/savevm.c | 30 +++--- >>> qapi-schema.json| 18 ++ >>> 3 files changed, 42 insertions(+), 11 deletions(-) >>> >>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h >>> index 63e7b02..a08715c 100644 >>> --- a/include/migration/vmstate.h >>> +++ b/include/migration/vmstate.h >>> @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round) >>> return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; >>> } >>> >>> +struct AnnounceRound { >>> +QEMUTimer *timer; >>> +int count; >>> +}; >>> + >>> void dump_vmstate_json_to_file(FILE *out_fp); >>> >>> #endif >>> diff --git a/migration/savevm.c b/migration/savevm.c >>> index 5ecd264..44e196b 100644 >>> --- a/migration/savevm.c >>> +++ b/migration/savevm.c >>> @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState >>> *nic, void *opaque) >>> qemu_send_packet_raw(qemu_get_queue(nic), buf, len); >>> } >>> >>> - >>> static void qemu_announce_self_once(void *opaque) >>> { >>> -static int count = SELF_ANNOUNCE_ROUNDS; >>> -QEMUTimer *timer = *(QEMUTimer **)opaque; >>> +struct AnnounceRound *round = opaque; >>> >>> qemu_foreach_nic(qemu_announce_self_iter, NULL); >>> >>> -if (--count) { >>> +round->count--; >>> +if (round->count) { >>> /* delay 50ms, 150ms, 250ms, ... */ >>> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + >>> - self_announce_delay(count)); >>> +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + >>> + self_announce_delay(round->count)); >>> } else { >>> -timer_del(timer); >>> -timer_free(timer); >>> +timer_del(round->timer); >>> +timer_free(round->timer); >>> +g_free(round); >>> } >>> } >>> >>> void qemu_announce_self(void) >>> { >>> -static QEMUTimer *timer; >>> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, >>> ); >>> -qemu_announce_self_once(); >>> +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound)); >>> +if (!round) >>> +return; >>> +round->count = SELF_ANNOUNCE_ROUNDS; >>> +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME, >>> qemu_announce_self_once, round); >>> +qemu_announce_self_once(round); >>> +} >> >> So, I've been looking and this code and have been playing with it and with >> David's >> patches and my patches to include virtio self announcements as well. What >> I've discovered >> is what I think is a possible packet amplification issue here. >> >> This creates a new timer every time we do do a announce_self. With just >> migration, >> this is not an issue since you only migrate once at a time, so there is only >> 1 timer. >> With exposing this as an API, a user can potentially call it in a tight loop >> and now you have a ton of timers being created. Add in David's patches >> allowing timeouts >> and retries to be configurable, and you may now have a ton of long lived >> timers. >> Add in the patches I am working on to let virtio do self announcements too >> (to really fix >> bonding issues), and now you add in a possibility of a lot of packets being >> sent for >> each timeout (RARP, GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if >> MLD1 is used]). >
Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp
On 02/20/2017 07:16 PM, Germano Veit Michel wrote: > qemu_announce_self() is triggered by qemu at the end of migrations > to update the network regarding the path to the guest l2addr. > > however it is also useful when there is a network change such as > an active bond slave swap. Essentially, it's the same as a migration > from a network perspective - the guest moves to a different point > in the network topology. > > this exposes the function via qmp. > > Signed-off-by: Germano Veit Michel <germ...@redhat.com> > --- > include/migration/vmstate.h | 5 + > migration/savevm.c | 30 +++--- > qapi-schema.json| 18 ++ > 3 files changed, 42 insertions(+), 11 deletions(-) > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h > index 63e7b02..a08715c 100644 > --- a/include/migration/vmstate.h > +++ b/include/migration/vmstate.h > @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round) > return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; > } > > +struct AnnounceRound { > +QEMUTimer *timer; > +int count; > +}; > + > void dump_vmstate_json_to_file(FILE *out_fp); > > #endif > diff --git a/migration/savevm.c b/migration/savevm.c > index 5ecd264..44e196b 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState > *nic, void *opaque) > qemu_send_packet_raw(qemu_get_queue(nic), buf, len); > } > > - > static void qemu_announce_self_once(void *opaque) > { > -static int count = SELF_ANNOUNCE_ROUNDS; > -QEMUTimer *timer = *(QEMUTimer **)opaque; > +struct AnnounceRound *round = opaque; > > qemu_foreach_nic(qemu_announce_self_iter, NULL); > > -if (--count) { > +round->count--; > +if (round->count) { > /* delay 50ms, 150ms, 250ms, ... */ > -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + > - self_announce_delay(count)); > +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + > + self_announce_delay(round->count)); > } else { > -timer_del(timer); > -timer_free(timer); > +timer_del(round->timer); > +timer_free(round->timer); > +g_free(round); > } > } > > void qemu_announce_self(void) > { > -static QEMUTimer *timer; > -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, > ); > -qemu_announce_self_once(); > +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound)); > +if (!round) > +return; > +round->count = SELF_ANNOUNCE_ROUNDS; > +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME, > qemu_announce_self_once, round); > +qemu_announce_self_once(round); > +} So, I've been looking and this code and have been playing with it and with David's patches and my patches to include virtio self announcements as well. What I've discovered is what I think is a possible packet amplification issue here. This creates a new timer every time we do do a announce_self. With just migration, this is not an issue since you only migrate once at a time, so there is only 1 timer. With exposing this as an API, a user can potentially call it in a tight loop and now you have a ton of timers being created. Add in David's patches allowing timeouts and retries to be configurable, and you may now have a ton of long lived timers. Add in the patches I am working on to let virtio do self announcements too (to really fix bonding issues), and now you add in a possibility of a lot of packets being sent for each timeout (RARP, GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if MLD1 is used]). As you can see, this can get rather ugly... I think we need timer user here. Migration and QMP being two to begin with. Each one would get a single timer to play with. If a given user already has a timer running, we could return an error or just not do anything. -vlad > + > +void qmp_announce_self(Error **errp) > +{ > +qemu_announce_self(); > } > > /***/ > diff --git a/qapi-schema.json b/qapi-schema.json > index baa0d26..0d9bffd 100644 > --- a/qapi-schema.json > +++ b/qapi-schema.json > @@ -6080,3 +6080,21 @@ > # > ## > { 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] } > + > +## > +# @announce-self: > +# > +# Trigger generation of broadcast RARP frames to update network switches. > +# This can be useful when network bonds fail-over the active slave. > +# > +# Arguments: None. > +# > +# Example: > +# > +# -> { "execute": "announce-self" } > +# <- { "return": {} } > +# > +# Since: 2.9 > +## > +{ 'command': 'announce-self' } > + >
Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self
On 05/02/2017 06:41 AM, Dr. David Alan Gilbert wrote: > * Vlad Yasevich (vyase...@redhat.com) wrote: >> On 03/30/2017 11:26 AM, Dr. David Alan Gilbert wrote: >>> * Vlad Yasevich (vyase...@redhat.com) wrote: >>>> On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote: >>>>> * Vlad Yasevich (vyase...@redhat.com) wrote: >>>>>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote: >>>>>>> * Vladislav Yasevich (vyase...@redhat.com) wrote: >>>>>>>> virtio announcements seem to run on its own timer with it's own >>>>>>>> repetition loop and are invoked separately from qemu_announce_self. >>>>>>>> This patch consolidates announcements into a single location and >>>>>>>> allows other nics to provide it's own announcement capabilities, if >>>>>>>> necessary. >>>>>>>> >>>>>>>> This is also usefull in support of exposing qemu_announce_self through >>>>>>>> qmp/hmp. >>>>>>>> >>>>>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >>>>>>>> --- >>>>>>>> hw/net/virtio-net.c| 30 -- >>>>>>>> include/hw/virtio/virtio-net.h | 2 -- >>>>>>>> include/net/net.h | 2 ++ >>>>>>>> migration/savevm.c | 6 ++ >>>>>>>> 4 files changed, 16 insertions(+), 24 deletions(-) >>>>>>>> >>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>>>> index c321680..6e9ce4f 100644 >>>>>>>> --- a/hw/net/virtio-net.c >>>>>>>> +++ b/hw/net/virtio-net.c >>>>>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, >>>>>>>> uint8_t status) >>>>>>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; >>>>>>>> } >>>>>>>> >>>>>>>> -static void virtio_net_announce_timer(void *opaque) >>>>>>>> +static void virtio_net_announce(NetClientState *nc) >>>>>>>> { >>>>>>>> -VirtIONet *n = opaque; >>>>>>>> +VirtIONet *n = qemu_get_nic_opaque(nc); >>>>>>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>>>>>> >>>>>>>> -n->announce_counter--; >>>>>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE; >>>>>>>> -virtio_notify_config(vdev); >>>>>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>>>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) { >>>>>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE; >>>>>>>> +virtio_notify_config(vdev); >>>>>>>> +} >>>>>>>> } >>>>>>>> >>>>>>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) >>>>>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev) >>>>>>>> n->nobcast = 0; >>>>>>>> /* multiqueue is disabled by default */ >>>>>>>> n->curr_queues = 1; >>>>>>>> -timer_del(n->announce_timer); >>>>>>>> -n->announce_counter = 0; >>>>>>>> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>>>> >>>>>>>> /* Flush any MAC and VLAN filter table state */ >>>>>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet >>>>>>>> *n, uint8_t cmd, >>>>>>>> if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK && >>>>>>>> n->status & VIRTIO_NET_S_ANNOUNCE) { >>>>>>>> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>>>> -if (n->announce_counter) { >>>>>>>> -timer_mod(n->announce_timer, >>>>>>>> - qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + >>>>>>>> - self_announce_delay(n->announce_counter
Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self
On 03/30/2017 11:26 AM, Dr. David Alan Gilbert wrote: > * Vlad Yasevich (vyase...@redhat.com) wrote: >> On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote: >>> * Vlad Yasevich (vyase...@redhat.com) wrote: >>>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote: >>>>> * Vladislav Yasevich (vyase...@redhat.com) wrote: >>>>>> virtio announcements seem to run on its own timer with it's own >>>>>> repetition loop and are invoked separately from qemu_announce_self. >>>>>> This patch consolidates announcements into a single location and >>>>>> allows other nics to provide it's own announcement capabilities, if >>>>>> necessary. >>>>>> >>>>>> This is also usefull in support of exposing qemu_announce_self through >>>>>> qmp/hmp. >>>>>> >>>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >>>>>> --- >>>>>> hw/net/virtio-net.c| 30 -- >>>>>> include/hw/virtio/virtio-net.h | 2 -- >>>>>> include/net/net.h | 2 ++ >>>>>> migration/savevm.c | 6 ++ >>>>>> 4 files changed, 16 insertions(+), 24 deletions(-) >>>>>> >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>> index c321680..6e9ce4f 100644 >>>>>> --- a/hw/net/virtio-net.c >>>>>> +++ b/hw/net/virtio-net.c >>>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, >>>>>> uint8_t status) >>>>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; >>>>>> } >>>>>> >>>>>> -static void virtio_net_announce_timer(void *opaque) >>>>>> +static void virtio_net_announce(NetClientState *nc) >>>>>> { >>>>>> -VirtIONet *n = opaque; >>>>>> +VirtIONet *n = qemu_get_nic_opaque(nc); >>>>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>>>> >>>>>> -n->announce_counter--; >>>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE; >>>>>> -virtio_notify_config(vdev); >>>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) { >>>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE; >>>>>> +virtio_notify_config(vdev); >>>>>> +} >>>>>> } >>>>>> >>>>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) >>>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev) >>>>>> n->nobcast = 0; >>>>>> /* multiqueue is disabled by default */ >>>>>> n->curr_queues = 1; >>>>>> -timer_del(n->announce_timer); >>>>>> -n->announce_counter = 0; >>>>>> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>> >>>>>> /* Flush any MAC and VLAN filter table state */ >>>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, >>>>>> uint8_t cmd, >>>>>> if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK && >>>>>> n->status & VIRTIO_NET_S_ANNOUNCE) { >>>>>> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>> -if (n->announce_counter) { >>>>>> -timer_mod(n->announce_timer, >>>>>> - qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + >>>>>> - self_announce_delay(n->announce_counter)); >>>>>> -} >>>>>> return VIRTIO_NET_OK; >>>>>> } else { >>>>>> return VIRTIO_NET_ERR; >>>>>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void >>>>>> *opaque, int version_id) >>>>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; >>>>>> } >>>>>> >>>>>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>>>> -vir
Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self
On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote: > * Vlad Yasevich (vyase...@redhat.com) wrote: >> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote: >>> * Vladislav Yasevich (vyase...@redhat.com) wrote: >>>> virtio announcements seem to run on its own timer with it's own >>>> repetition loop and are invoked separately from qemu_announce_self. >>>> This patch consolidates announcements into a single location and >>>> allows other nics to provide it's own announcement capabilities, if >>>> necessary. >>>> >>>> This is also usefull in support of exposing qemu_announce_self through >>>> qmp/hmp. >>>> >>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >>>> --- >>>> hw/net/virtio-net.c| 30 -- >>>> include/hw/virtio/virtio-net.h | 2 -- >>>> include/net/net.h | 2 ++ >>>> migration/savevm.c | 6 ++ >>>> 4 files changed, 16 insertions(+), 24 deletions(-) >>>> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>> index c321680..6e9ce4f 100644 >>>> --- a/hw/net/virtio-net.c >>>> +++ b/hw/net/virtio-net.c >>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t >>>> status) >>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; >>>> } >>>> >>>> -static void virtio_net_announce_timer(void *opaque) >>>> +static void virtio_net_announce(NetClientState *nc) >>>> { >>>> -VirtIONet *n = opaque; >>>> +VirtIONet *n = qemu_get_nic_opaque(nc); >>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>> >>>> -n->announce_counter--; >>>> -n->status |= VIRTIO_NET_S_ANNOUNCE; >>>> -virtio_notify_config(vdev); >>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) { >>>> +n->status |= VIRTIO_NET_S_ANNOUNCE; >>>> +virtio_notify_config(vdev); >>>> +} >>>> } >>>> >>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) >>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev) >>>> n->nobcast = 0; >>>> /* multiqueue is disabled by default */ >>>> n->curr_queues = 1; >>>> -timer_del(n->announce_timer); >>>> -n->announce_counter = 0; >>>> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>> >>>> /* Flush any MAC and VLAN filter table state */ >>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, >>>> uint8_t cmd, >>>> if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK && >>>> n->status & VIRTIO_NET_S_ANNOUNCE) { >>>> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>> -if (n->announce_counter) { >>>> -timer_mod(n->announce_timer, >>>> - qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + >>>> - self_announce_delay(n->announce_counter)); >>>> -} >>>> return VIRTIO_NET_OK; >>>> } else { >>>> return VIRTIO_NET_ERR; >>>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void >>>> *opaque, int version_id) >>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; >>>> } >>>> >>>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) { >>>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS; >>>> -timer_mod(n->announce_timer, >>>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)); >>>> -} >>>> - >>>> return 0; >>>> } >>>> >>>> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = { >>>> .receive = virtio_net_receive, >>>> .link_status_changed = virtio_net_set_link_status, >>>> .query_rx_filter = virtio_net_query_rxfilter, >>>> +.announce = virtio_net_announce, >>>> }; >>>> >>>>
Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self
On 03/29/2017 04:35 PM, Michael S. Tsirkin wrote: > On Wed, Mar 29, 2017 at 04:19:50PM -0400, Vladislav Yasevich wrote: >> virtio announcements seem to run on its own timer with it's own >> repetition loop and are invoked separately from qemu_announce_self. >> This patch consolidates announcements into a single location and >> allows other nics to provide it's own > > their own > >> announcement capabilities, if >> necessary. >> >> This is also usefull > > or useful? > >> in support of exposing qemu_announce_self through >> qmp/hmp. > > I'm guessing this is the real reason for the change. > How is it helpful for that? Pls clarify in the commit log. Ok > >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >> --- >> hw/net/virtio-net.c| 30 -- >> include/hw/virtio/virtio-net.h | 2 -- >> include/net/net.h | 2 ++ >> migration/savevm.c | 6 ++ >> 4 files changed, 16 insertions(+), 24 deletions(-) >> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> index c321680..6e9ce4f 100644 >> --- a/hw/net/virtio-net.c >> +++ b/hw/net/virtio-net.c >> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t >> status) >> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; >> } >> >> -static void virtio_net_announce_timer(void *opaque) >> +static void virtio_net_announce(NetClientState *nc) >> { >> -VirtIONet *n = opaque; >> +VirtIONet *n = qemu_get_nic_opaque(nc); >> VirtIODevice *vdev = VIRTIO_DEVICE(n); >> >> -n->announce_counter--; >> -n->status |= VIRTIO_NET_S_ANNOUNCE; >> -virtio_notify_config(vdev); >> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) { >> +n->status |= VIRTIO_NET_S_ANNOUNCE; >> +virtio_notify_config(vdev); >> +} >> } >> >> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) >> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev) >> n->nobcast = 0; >> /* multiqueue is disabled by default */ >> n->curr_queues = 1; >> -timer_del(n->announce_timer); >> -n->announce_counter = 0; >> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >> >> /* Flush any MAC and VLAN filter table state */ >> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, >> uint8_t cmd, >> if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK && >> n->status & VIRTIO_NET_S_ANNOUNCE) { >> n->status &= ~VIRTIO_NET_S_ANNOUNCE; >> -if (n->announce_counter) { >> -timer_mod(n->announce_timer, >> - qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + >> - self_announce_delay(n->announce_counter)); >> -} >> return VIRTIO_NET_OK; >> } else { >> return VIRTIO_NET_ERR; > > you will notice that this runs at VM speeds with QEMU_CLOCK_VIRTUAL, > while the self announce you are tying this to runs with > QEMU_CLOCK_REALTIME. > > So this switch seems wrong to me: by the time VM is started > on destination all timers might have expired. > That explains what I am seeing now. Hmm... may be combining them this way isn't the right approach. I think I'll leave RARPs alone and export an ability to start the a nic provided announce feature. Then Germano's qmp command would be able to call this one if available and the old one if not. That way, on user-triggered announcements, we wouldn't need to do the RARPs, if a better announcement capability is present. That way current behavior is preserved for migrations. Thanks -vlad > >> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void *opaque, >> int version_id) >> qemu_get_subqueue(n->nic, i)->link_down = link_down; >> } >> >> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) && >> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) { >> -n->announce_counter = SELF_ANNOUNCE_ROUNDS; >> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)); >> -} >> - >> return 0; >> } >> >> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = { >> .receive = virtio_net_receive, >> .link_status_changed
Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self
gt; -timer_free(n->announce_timer); >> g_free(n->vqs); >> qemu_del_nic(n->nic); >> virtio_cleanup(vdev); >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h >> index 1eec9a2..0c597f4 100644 >> --- a/include/hw/virtio/virtio-net.h >> +++ b/include/hw/virtio/virtio-net.h >> @@ -94,8 +94,6 @@ typedef struct VirtIONet { >> char *netclient_name; >> char *netclient_type; >> uint64_t curr_guest_offloads; >> -QEMUTimer *announce_timer; >> -int announce_counter; >> bool needs_vnet_hdr_swap; >> } VirtIONet; >> >> diff --git a/include/net/net.h b/include/net/net.h >> index 99b28d5..598f523 100644 >> --- a/include/net/net.h >> +++ b/include/net/net.h >> @@ -64,6 +64,7 @@ typedef int (SetVnetLE)(NetClientState *, bool); >> typedef int (SetVnetBE)(NetClientState *, bool); >> typedef struct SocketReadState SocketReadState; >> typedef void (SocketReadStateFinalize)(SocketReadState *rs); >> +typedef void (NetAnnounce)(NetClientState *); >> >> typedef struct NetClientInfo { >> NetClientDriver type; >> @@ -84,6 +85,7 @@ typedef struct NetClientInfo { >> SetVnetHdrLen *set_vnet_hdr_len; >> SetVnetLE *set_vnet_le; >> SetVnetBE *set_vnet_be; >> +NetAnnounce *announce; >> } NetClientInfo; >> >> struct NetClientState { >> diff --git a/migration/savevm.c b/migration/savevm.c >> index 3b19a4a..5c1d8dc 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -113,9 +113,15 @@ static void qemu_announce_self_iter(NICState *nic, void >> *opaque) >> int len; >> >> trace_qemu_announce_self_iter(qemu_ether_ntoa(>conf->macaddr)); >> + >> len = announce_self_create(buf, nic->conf->macaddr.a); >> >> qemu_send_packet_raw(qemu_get_queue(nic), buf, len); >> + >> +/* if the NIC provides it's own announcement support, use it as well */ >> +if (nic->ncs->info->announce) { >> +nic->ncs->info->announce(nic->ncs); >> +} >> } > > Combining them doesn't necessarily seem like a bad thing; but > it does change the timing quite a bit - at the moment the QEMU RARPs > are sent at the end of migration, while the Virtio ARPs are sent > when the guest starts running which is quite a bit later. > > In many ways the 'when the guest starts' is better, given that libvirt > etc has had a chance to reconfigure the networking, but I'm not that > sure if it's safe to move the existing one - I had considered *adding* > another shot of the current mechanism after the guest is started. > Yes, the timing of things have been giving me some issues, but I wanted to post this patch to get some comments just like this one.. I've wondered why they qemu one happens before the guest starts running. > I certainly think it's wrong to do the virtio announce at the earlier > point. > I see. > See also Germano's thread of being able to retrigger the announce > at arbitrary points, and the series I posted a couple of days ago > to extend the length/timing of the announce. > That's kind of what prompted me to do try this. The issue with just exposing qemu_announce_self() is that RARPS just aren't enough in some cases (vlans, multicast). This attempts to add the callback like Jason mentioned, but then we get timer interactions between the virtio-net timers and qemu one. -vlad > Dave > >> -- >> 2.7.4 >> >> > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >
Re: [Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, unescape module strings
Resubmitted with errors fixed. Regards, Vlad On 12/21/2016 01:01 AM, no-re...@patchew.org wrote: > Hi, > > Your series seems to have some coding style problems. See output below for > more information: > > Subject: [Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, > unescape module strings > Message-id: 1482140507-23607-1-git-send-email-vlad.lu...@windriver.com > Type: series > > === TEST SCRIPT BEGIN === > #!/bin/bash > > BASE=base > n=1 > total=$(git log --oneline $BASE.. | wc -l) > failed=0 > > # Useful git options > git config --local diff.renamelimit 0 > git config --local diff.renames True > > commits="$(git log --format=%H --reverse $BASE..)" > for c in $commits; do > echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." > if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; > then > failed=1 > echo > fi > n=$((n+1)) > done > > exit $failed > === TEST SCRIPT END === > > Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 > Switched to a new branch 'test' > 951c8be multiboot: copy the cmdline verbatim, unescape module strings > > === OUTPUT BEGIN === > Checking PATCH 1/1: multiboot: copy the cmdline verbatim, unescape module > strings... > WARNING: line over 80 characters > #42: FILE: hw/i386/multiboot.c:299: > +next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), > initrd_filename); > > ERROR: do not use assignment in if condition > #50: FILE: hw/i386/multiboot.c:304: > +if ((next_space = strchr(tmpbuf, ' '))) > > ERROR: braces {} are necessary for all arms of this statement > #50: FILE: hw/i386/multiboot.c:304: > +if ((next_space = strchr(tmpbuf, ' '))) > [...] > > total: 2 errors, 1 warnings, 48 lines checked > > Your patch has style problems, please review. If any of these errors > are false positives report them to the maintainer, see > CHECKPATCH in MAINTAINERS. > > === OUTPUT END === > > Test command exited with code: 1 > > > --- > Email generated automatically by Patchew [http://patchew.org/]. > Please send your feedback to patchew-de...@freelists.org
[Qemu-devel] [PATCHv5] multiboot: copy the cmdline verbatim, unescape module strings
get_opt_value() truncates the value at the first comma Use memcpy() instead Unescape the module filename and parameters with get_opt_value() before calling mb_add_cmdline() Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..d16777e 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; -get_opt_value(b, strlen(cmdline) + 1, cmdline); +memcpy(b, cmdline, strlen(cmdline) + 1); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; } @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg, mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; if (initrd_filename) { -char *next_initrd, not_last; +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; mbs.offset_mods = mbs.mb_buf_size; @@ -296,25 +296,26 @@ int load_multiboot(FWCfgState *fw_cfg, int mb_mod_length; uint32_t offs = mbs.mb_buf_size; -next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); +next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), initrd_filename); not_last = *next_initrd; -*next_initrd = '\0'; /* if a space comes after the module filename, treat everything after that as parameters */ -hwaddr c = mb_add_cmdline(, initrd_filename); -if ((next_space = strchr(initrd_filename, ' '))) +hwaddr c = mb_add_cmdline(, tmpbuf); +next_space = strchr(tmpbuf, ' '); +if (next_space) { *next_space = '\0'; -mb_debug("multiboot loading module: %s\n", initrd_filename); -mb_mod_length = get_image_size(initrd_filename); +} +mb_debug("multiboot loading module: %s\n", tmpbuf); +mb_mod_length = get_image_size(tmpbuf); if (mb_mod_length < 0) { -fprintf(stderr, "Failed to open file '%s'\n", initrd_filename); +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf); exit(1); } mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + mbs.mb_buf_size); mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size); -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs); +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs); mb_add_mod(, mbs.mb_buf_phys + offs, mbs.mb_buf_phys + offs + mb_mod_length, c); -- 1.9.1
[Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, unescape module strings
get_opt_value() truncates the value at the first comma Use memcpy() instead Unescape the module filename and parameters with get_opt_value() before calling mb_add_cmdline() Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..efe11ae 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; -get_opt_value(b, strlen(cmdline) + 1, cmdline); +memcpy(b, cmdline, strlen(cmdline) + 1); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; } @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg, mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; if (initrd_filename) { -char *next_initrd, not_last; +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; mbs.offset_mods = mbs.mb_buf_size; @@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg, int mb_mod_length; uint32_t offs = mbs.mb_buf_size; -next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); +next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), initrd_filename); not_last = *next_initrd; -*next_initrd = '\0'; /* if a space comes after the module filename, treat everything after that as parameters */ -hwaddr c = mb_add_cmdline(, initrd_filename); -if ((next_space = strchr(initrd_filename, ' '))) +hwaddr c = mb_add_cmdline(, tmpbuf); +if ((next_space = strchr(tmpbuf, ' '))) *next_space = '\0'; -mb_debug("multiboot loading module: %s\n", initrd_filename); -mb_mod_length = get_image_size(initrd_filename); +mb_debug("multiboot loading module: %s\n", tmpbuf); +mb_mod_length = get_image_size(tmpbuf); if (mb_mod_length < 0) { -fprintf(stderr, "Failed to open file '%s'\n", initrd_filename); +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf); exit(1); } mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + mbs.mb_buf_size); mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size); -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs); +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs); mb_add_mod(, mbs.mb_buf_phys + offs, mbs.mb_buf_phys + offs + mb_mod_length, c); -- 1.9.1
Re: [Qemu-devel] [PATCHv3] multiboot: copy the cmdline verbatim, unescape module strings
On 12/18/2016 10:25 PM, Eduardo Habkost wrote: > On Thu, Dec 15, 2016 at 02:32:04PM +0200, Vlad Lungu wrote: >> get_opt_value() truncates the value at the first comma >> Use memcpy() instead >> Unescape the module filename and parameters with get_opt_value() >> before calling mb_add_cmdline() >> >> Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> >> --- >> hw/i386/multiboot.c | 19 +-- >> 1 file changed, 9 insertions(+), 10 deletions(-) >> >> diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c >> index 387caa6..6b7b5a9 100644 >> --- a/hw/i386/multiboot.c >> +++ b/hw/i386/multiboot.c >> @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const >> char *cmdline) >> hwaddr p = s->offset_cmdlines; >> char *b = (char *)s->mb_buf + p; >> >> -get_opt_value(b, strlen(cmdline) + 1, cmdline); >> +memcpy(b, cmdline, strlen(cmdline) + 1); >> s->offset_cmdlines += strlen(b) + 1; >> return s->mb_buf_phys + p; >> } >> @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg, >> mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; >> >> if (initrd_filename) { >> -char *next_initrd, not_last; >> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; >> >> mbs.offset_mods = mbs.mb_buf_size; >> >> @@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg, >> int mb_mod_length; >> uint32_t offs = mbs.mb_buf_size; >> >> -next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); >> +next_initrd = (char *)get_opt_value(tmpbuf, >> strlen(initrd_filename) + 1, initrd_filename); > I would prefer to use sizeof(initrd_filename) like Paolo > suggested. sizeof(initrd_filename) is 8 (on my machine, x86_64). Maybe sizeof(tmpbuf) would be a better idea :-) Regards, Vlad
[Qemu-devel] [PATCHv3] multiboot: copy the cmdline verbatim, unescape module strings
get_opt_value() truncates the value at the first comma Use memcpy() instead Unescape the module filename and parameters with get_opt_value() before calling mb_add_cmdline() Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..6b7b5a9 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; -get_opt_value(b, strlen(cmdline) + 1, cmdline); +memcpy(b, cmdline, strlen(cmdline) + 1); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; } @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg, mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; if (initrd_filename) { -char *next_initrd, not_last; +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; mbs.offset_mods = mbs.mb_buf_size; @@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg, int mb_mod_length; uint32_t offs = mbs.mb_buf_size; -next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); +next_initrd = (char *)get_opt_value(tmpbuf, strlen(initrd_filename) + 1, initrd_filename); not_last = *next_initrd; -*next_initrd = '\0'; /* if a space comes after the module filename, treat everything after that as parameters */ -hwaddr c = mb_add_cmdline(, initrd_filename); -if ((next_space = strchr(initrd_filename, ' '))) +hwaddr c = mb_add_cmdline(, tmpbuf); +if ((next_space = strchr(tmpbuf, ' '))) *next_space = '\0'; -mb_debug("multiboot loading module: %s\n", initrd_filename); -mb_mod_length = get_image_size(initrd_filename); +mb_debug("multiboot loading module: %s\n", tmpbuf); +mb_mod_length = get_image_size(tmpbuf); if (mb_mod_length < 0) { -fprintf(stderr, "Failed to open file '%s'\n", initrd_filename); +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf); exit(1); } mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + mbs.mb_buf_size); mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size); -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs); +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs); mb_add_mod(, mbs.mb_buf_phys + offs, mbs.mb_buf_phys + offs + mb_mod_length, c); -- 1.9.1
Re: [Qemu-devel] [PATCHv2] multiboot: copy the cmdline verbatim, unescape module strings
On 12/15/2016 02:29 PM, Paolo Bonzini wrote: > > On 15/12/2016 12:34, Vlad Lungu wrote: >> >> On 12/15/2016 12:56 PM, Paolo Bonzini wrote: >>> On 15/12/2016 11:03, Vlad Lungu wrote: >>>> >>>> if (initrd_filename) { >>>> -char *next_initrd, not_last; >>>> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; >>>> >>>> mbs.offset_mods = mbs.mb_buf_size; >>>> >>>> @@ -299,22 +299,24 @@ int load_multiboot(FWCfgState *fw_cfg, >>>> next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); >>>> not_last = *next_initrd; >>>> *next_initrd = '\0'; >>>> +/* Unescape the filename + eventual arguments */ >>>> +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, >>>> initrd_filename); >>> No need to call get_opt_value twice; you can call it once since you have >>> declared tmpbuf to hold the entire size of initrd_filename. >> So what you're saying is to go with >> >> next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), >> initrd_filename); >> >> and delete the second call to get_opt_value(). >> >> Is that right? > Yes, thanks! > I'll delete *next_initrd = '\0'; too, since we're operating on tmpbuf after get_opt_value(), and it's already NUL-terminated. Regards, Vlad
Re: [Qemu-devel] [PATCHv2] multiboot: copy the cmdline verbatim, unescape module strings
On 12/15/2016 12:56 PM, Paolo Bonzini wrote: > > On 15/12/2016 11:03, Vlad Lungu wrote: >> >> if (initrd_filename) { >> -char *next_initrd, not_last; >> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; >> >> mbs.offset_mods = mbs.mb_buf_size; >> >> @@ -299,22 +299,24 @@ int load_multiboot(FWCfgState *fw_cfg, >> next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); >> not_last = *next_initrd; >> *next_initrd = '\0'; >> +/* Unescape the filename + eventual arguments */ >> +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, >> initrd_filename); > No need to call get_opt_value twice; you can call it once since you have > declared tmpbuf to hold the entire size of initrd_filename. So what you're saying is to go with next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), initrd_filename); and delete the second call to get_opt_value(). Is that right? Regards, Vlad
[Qemu-devel] [PATCHv2] multiboot: copy the cmdline verbatim, unescape module strings
get_opt_value() truncates the value at the first comma Use memcpy() instead Unescape the module filename and parameters with get_opt_value() before calling mb_add_cmdline() Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..37e27f2 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; -get_opt_value(b, strlen(cmdline) + 1, cmdline); +memcpy(b, cmdline, strlen(cmdline) + 1); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; } @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg, mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; if (initrd_filename) { -char *next_initrd, not_last; +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; mbs.offset_mods = mbs.mb_buf_size; @@ -299,22 +299,24 @@ int load_multiboot(FWCfgState *fw_cfg, next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename); not_last = *next_initrd; *next_initrd = '\0'; +/* Unescape the filename + eventual arguments */ +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, initrd_filename); /* if a space comes after the module filename, treat everything after that as parameters */ -hwaddr c = mb_add_cmdline(, initrd_filename); -if ((next_space = strchr(initrd_filename, ' '))) +hwaddr c = mb_add_cmdline(, tmpbuf); +if ((next_space = strchr(tmpbuf, ' '))) *next_space = '\0'; -mb_debug("multiboot loading module: %s\n", initrd_filename); -mb_mod_length = get_image_size(initrd_filename); +mb_debug("multiboot loading module: %s\n", tmpbuf); +mb_mod_length = get_image_size(tmpbuf); if (mb_mod_length < 0) { -fprintf(stderr, "Failed to open file '%s'\n", initrd_filename); +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf); exit(1); } mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + mbs.mb_buf_size); mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size); -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs); +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs); mb_add_mod(, mbs.mb_buf_phys + offs, mbs.mb_buf_phys + offs + mb_mod_length, c); -- 1.9.1
Re: [Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim, unescape module strings
Sorry about replying to myself, I only had one coffee today. There is no need for mb_add_modstring(), we can simply do get_opt_value(tmpbuf,..) then mb_add_cmdline(, tmpbuf) Regards, Vlad On 12/15/2016 11:45 AM, Vlad Lungu wrote: > get_opt_value() truncates the value at the first comma > Rename mb_add_cmdline() to mb_add_modstring() > Unescape filename too > Add new mb_add_cmdline() using memcpy() > > Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> > --- > hw/i386/multiboot.c | 23 +-- > 1 file changed, 17 insertions(+), 6 deletions(-) > > diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c > index 387caa6..bfb52fc 100644 > --- a/hw/i386/multiboot.c > +++ b/hw/i386/multiboot.c > @@ -109,6 +109,16 @@ static uint32_t mb_add_cmdline(MultibootState *s, const > char *cmdline) > hwaddr p = s->offset_cmdlines; > char *b = (char *)s->mb_buf + p; > > +memcpy(b, cmdline, strlen(cmdline) + 1); > +s->offset_cmdlines += strlen(b) + 1; > +return s->mb_buf_phys + p; > +} > + > +static uint32_t mb_add_modstring(MultibootState *s, const char *cmdline) > +{ > +hwaddr p = s->offset_cmdlines; > +char *b = (char *)s->mb_buf + p; > + > get_opt_value(b, strlen(cmdline) + 1, cmdline); > s->offset_cmdlines += strlen(b) + 1; > return s->mb_buf_phys + p; > @@ -287,7 +297,7 @@ int load_multiboot(FWCfgState *fw_cfg, > mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; > > if (initrd_filename) { > -char *next_initrd, not_last; > +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; > > mbs.offset_mods = mbs.mb_buf_size; > > @@ -301,20 +311,21 @@ int load_multiboot(FWCfgState *fw_cfg, > *next_initrd = '\0'; > /* if a space comes after the module filename, treat everything > after that as parameters */ > -hwaddr c = mb_add_cmdline(, initrd_filename); > +hwaddr c = mb_add_modstring(, initrd_filename); > if ((next_space = strchr(initrd_filename, ' '))) > *next_space = '\0'; > -mb_debug("multiboot loading module: %s\n", initrd_filename); > -mb_mod_length = get_image_size(initrd_filename); > +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, > initrd_filename); > +mb_debug("multiboot loading module: %s\n", tmpbuf); > +mb_mod_length = get_image_size(tmpbuf); > if (mb_mod_length < 0) { > -fprintf(stderr, "Failed to open file '%s'\n", > initrd_filename); > +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf); > exit(1); > } > > mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + > mbs.mb_buf_size); > mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size); > > -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs); > +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs); > mb_add_mod(, mbs.mb_buf_phys + offs, > mbs.mb_buf_phys + offs + mb_mod_length, c); >
[Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim, unescape module strings
get_opt_value() truncates the value at the first comma Rename mb_add_cmdline() to mb_add_modstring() Unescape filename too Add new mb_add_cmdline() using memcpy() Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 23 +-- 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..bfb52fc 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,6 +109,16 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; +memcpy(b, cmdline, strlen(cmdline) + 1); +s->offset_cmdlines += strlen(b) + 1; +return s->mb_buf_phys + p; +} + +static uint32_t mb_add_modstring(MultibootState *s, const char *cmdline) +{ +hwaddr p = s->offset_cmdlines; +char *b = (char *)s->mb_buf + p; + get_opt_value(b, strlen(cmdline) + 1, cmdline); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; @@ -287,7 +297,7 @@ int load_multiboot(FWCfgState *fw_cfg, mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len; if (initrd_filename) { -char *next_initrd, not_last; +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1]; mbs.offset_mods = mbs.mb_buf_size; @@ -301,20 +311,21 @@ int load_multiboot(FWCfgState *fw_cfg, *next_initrd = '\0'; /* if a space comes after the module filename, treat everything after that as parameters */ -hwaddr c = mb_add_cmdline(, initrd_filename); +hwaddr c = mb_add_modstring(, initrd_filename); if ((next_space = strchr(initrd_filename, ' '))) *next_space = '\0'; -mb_debug("multiboot loading module: %s\n", initrd_filename); -mb_mod_length = get_image_size(initrd_filename); +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, initrd_filename); +mb_debug("multiboot loading module: %s\n", tmpbuf); +mb_mod_length = get_image_size(tmpbuf); if (mb_mod_length < 0) { -fprintf(stderr, "Failed to open file '%s'\n", initrd_filename); +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf); exit(1); } mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + mbs.mb_buf_size); mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size); -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs); +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs); mb_add_mod(, mbs.mb_buf_phys + offs, mbs.mb_buf_phys + offs + mb_mod_length, c); -- 1.9.1
Re: [Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim
On 12/15/2016 12:32 AM, Paolo Bonzini wrote: >>>> In other words: this fixes the mb_add_cmdline(kcmdline) case, and >>>> doesn't break comma escaping on the initrd case (because it was >>>> already broken). I don't see a problem with this patch. >>> ... there is one case of comma escaping that wasn't broken: >>> >>> $ qemu-system-x86_64 -kernel foo -initrd '/tmp/one >>> arg,,with,,commas,/tmp/another arg,,with,,commas' >>> >> Oh, I didn't notice the whitespace-based split for initrd >> arguments. So that's how it works :-) >> This is messier than I thought. Maybe the simplest solution is to >> inline mb_add_cmdline() at both callers, and change the kcmdline >> one to use memcpy(). > OK, I have a new version that does with memcpy for the cmdline, with get_opt_value() for the modules and also unescapes the filename for get_image_size()/load_image() . You still can't have spaces in filenames, so maybe a new scheme should be devised for this. Regards, Vlad
[Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim
get_opt_value() truncates the value at the first comma Use memcpy() instead Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..b4495ad 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; -get_opt_value(b, strlen(cmdline) + 1, cmdline); +memcpy(b, cmdline, strlen(cmdline) + 1); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; } -- 1.9.1
[Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim
get_opt_value() truncates the value at the first comma. Use memcpy() instead. Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com> --- hw/i386/multiboot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c index 387caa6..b4495ad 100644 --- a/hw/i386/multiboot.c +++ b/hw/i386/multiboot.c @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const char *cmdline) hwaddr p = s->offset_cmdlines; char *b = (char *)s->mb_buf + p; -get_opt_value(b, strlen(cmdline) + 1, cmdline); +memcpy(b, cmdline, strlen(cmdline) + 1); s->offset_cmdlines += strlen(b) + 1; return s->mb_buf_phys + p; } -- 1.9.1
Re: [Qemu-devel] [PATCH v3 2/2] rtl8139: correctly track full receive buffer in standard mode
On 08/31/2015 11:15 PM, Jason Wang wrote: > > > On 08/31/2015 10:24 PM, Vlad Yasevich wrote: >> On 08/31/2015 05:59 AM, Jason Wang wrote: >>> >>> On 08/28/2015 10:06 PM, Vladislav Yasevich wrote: >>>> In standard operation mode, when the receive ring buffer >>>> is full, the buffer actually appears empty to the driver since >>>> the RxBufAddr (the location we wirte new data to) and RxBufPtr >>>> (the location guest would stat reading from) are the same. >>>> As a result, the call to rtl8139_RxBufferEmpty ends up >>>> returning true indicating that the receive buffer is empty. >>>> This would result in the next packet overwriting the recevie buffer >>>> again and stalling receive operations. >>>> >>>> This patch tracks the number of unread bytes in the rxbuffer >>>> using an unused C+ register. On every read and write, the >>>> number is adjsted and the special case of a full buffer is also >>>> trapped. >>>> >>>> The C+ register trick is used to simplify migration and not require >>>> a new machine type. This register is not used in regular mode >>>> and C+ mode doesn't have the same issue. >>>> >>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >>>> --- >>>> hw/net/rtl8139.c | 45 + >>>> 1 file changed, 41 insertions(+), 4 deletions(-) >>> I'm not sure this can happen. For example, looks like the following >>> check in rtl8139_do_receive(): >>> >>> if (avail != 0 && size + 8 >= avail) >>> { >>> >>> can guarantee there's no overwriting? >>> >> The problem is the calculation of avail. When the buffer is full, >> avail will be the the size of the receive buffer. So the test >> above will be false because the driver thinks there is actually >> enough room. >> >> With his patch, 'avail' will be calculated to 0. >> >> -vlad >> > > If believe the condition size + 8 >= avail can guarantee that the buffer > won't be full (if we allow size + 8 == avail, buffer will be full)? So > avail == 0 means the buffer is empty. Or is there anything I miss? > So the issue is that the RxBufAddr is 4 byte aligned, but when we do availability check above, we don't 4 byte align the size+8 calculation. That causes the check above to fail when it should succeed and we never catch the overflow condition. I'll resubmit with a simple alignment patch that makes this work. -vlad
Re: [Qemu-devel] [PATCH v3 2/2] rtl8139: correctly track full receive buffer in standard mode
On 08/31/2015 05:59 AM, Jason Wang wrote: > > > On 08/28/2015 10:06 PM, Vladislav Yasevich wrote: >> In standard operation mode, when the receive ring buffer >> is full, the buffer actually appears empty to the driver since >> the RxBufAddr (the location we wirte new data to) and RxBufPtr >> (the location guest would stat reading from) are the same. >> As a result, the call to rtl8139_RxBufferEmpty ends up >> returning true indicating that the receive buffer is empty. >> This would result in the next packet overwriting the recevie buffer >> again and stalling receive operations. >> >> This patch tracks the number of unread bytes in the rxbuffer >> using an unused C+ register. On every read and write, the >> number is adjsted and the special case of a full buffer is also >> trapped. >> >> The C+ register trick is used to simplify migration and not require >> a new machine type. This register is not used in regular mode >> and C+ mode doesn't have the same issue. >> >> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com> >> --- >> hw/net/rtl8139.c | 45 + >> 1 file changed, 41 insertions(+), 4 deletions(-) > > I'm not sure this can happen. For example, looks like the following > check in rtl8139_do_receive(): > > if (avail != 0 && size + 8 >= avail) > { > > can guarantee there's no overwriting? > The problem is the calculation of avail. When the buffer is full, avail will be the the size of the receive buffer. So the test above will be false because the driver thinks there is actually enough room. With his patch, 'avail' will be calculated to 0. -vlad
Re: [Qemu-devel] [PATCH 2/2] rtl8139: correctly track full receive buffer in standard mode
On 08/26/2015 08:36 AM, Stefan Hajnoczi wrote: On Fri, Aug 21, 2015 at 02:59:25PM -0700, Vladislav Yasevich wrote: In standard operation mode, when the receive ring buffer is full, the buffer actually appears empty to the driver since the RxBufAddr (the location we wirte new data to) and RxBufPtr (the location guest would stat reading from) are the same. As a result, the call to rtl8139_RxBufferEmpty ends up returning true indicating that the receive buffer is empty. This would result in the next packet overwriting the recevie buffer again and stalling receive operations. This patch catches the receive buffer full condition using an unused C+ register. This is done to simplify migration and not require a new machine type. Signed-off-by: Vladislav Yasevich vyase...@redhat.com --- hw/net/rtl8139.c | 36 ++-- 1 file changed, 34 insertions(+), 2 deletions(-) The rtl8139 code duplicates the following expression in several places: MOD2(s-RxBufferSize + s-RxBufAddr - s-RxBufPtr, s-RxBufferSize); It may be cleaner to keep a rx_unread_bytes counter so that all these users can simply look at that variable. That cleanup also eliminates the rx full vs empty problem because then we'll know whether rx_unread_bytes == 0 or rx_unread_bytes == s-RxBufferSize. The same trick of stashing the value in s-currCPlusRxDesc could be used. Good idea. I'll give it a try. diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 359e001..3d572ab 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -816,6 +816,23 @@ static int rtl8139_can_receive(NetClientState *nc) } } +static void rtl8139_set_rxbuf_full(RTL8139State *s, bool full) +{ +/* In standard mode, C+ RxDesc isn't used. Reuse it + * to store the rx_buf_full status. + */ assert(!s-cplus_enabled)? +s-currCPlusRxDesc = full; +DPRINTF(received: rx buffer full\n); +} + +static bool rtl8139_rxbuf_full(RTL8139State *s) +{ +/* In standard mode, C+ RxDesc isn't used. Reuse it + * to store the rx_buf_full status. + */ assert(!s-cplus_enabled)? @@ -2601,6 +2630,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val) /* this value is off by 16 */ s-RxBufPtr = MOD2(val + 0x10, s-RxBufferSize); +/* We just read data, clear full buffer state */ +rtl8139_set_rxbuf_full(s, false); + /* more buffer space may be available so try to receive */ qemu_flush_queued_packets(qemu_get_queue(s-nic)); What if the guest writes this register while we're in C+ mode? In C+ mode the guest uses a descriptor ring instead of liner buffer so a well behaved C+ guest wouldn't write this value. Validated this with a linux guest. I guess a malicious guest could do this, but then it could do a lot of other things too. -vlad
Re: [Qemu-devel] [PATCH 1/2] rtl8139: Do not consume the packet during overflow in standard mode.
On 08/26/2015 08:18 AM, Stefan Hajnoczi wrote: On Fri, Aug 21, 2015 at 02:59:24PM -0700, Vladislav Yasevich wrote: When operation in standard mode, we currently return the size of packet during buffer overflow. This consumes the overflow packet. Return 0 instead so we can re-process the overflow packet when we have room. This fixes issues with lost/dropped fragments of large messages. Signed-off-by: Vladislav Yasevich vyase...@redhat.com --- hw/net/rtl8139.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index edbb61c..359e001 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -1157,7 +1157,7 @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t s-IntrStatus |= RxOverflow; ++s-RxMissed; rtl8139_update_irq(s); -return size_; +return 0; Every .receive() return 0 must be paired with a qemu_flush_queued_packets() call. Isn't that already there. A few drivers already return 0 to trigger a requeue. What's missing? Is rtl8139_RxBufPtr_write() guaranteed to be called when the guest refills rx buffers? It calls it on read, once it's consumed a packet, to advance to the next packet in the buffer. -vlad
Re: [Qemu-devel] [PATCH v2 0/2] rtl8139: Fix buffer overflow in standard mode
On 08/26/2015 03:51 PM, Vladislav Yasevich wrote: When rtl8139 card is running in standard mode, it is very easy to overlflow and the receive buffer and get into a siutation where all packets are dropped. Simply reproduction case is to ping the guest from the host with 6500 byte packets. There are actually 2 problems here. 1) When the rtl8129 buffer is overflow, the card emulation returns the size of the packet back to queue transmission. This signals successful reception even though the packet has been dropped. The proper solution is to return 0, so that the packet is re-queued and will be resubmitted later. 2) When packets are sized such that the fragments end up completely filling the receive buffer without overflow, the device thinks that the buffer is actually empty (instead of full). This causes next packet to over-write the existing packets. With the above ping reproducer, ever ICMP packet fills the buffer and thus keeps overwriting the previous packet and never waking up the guest. The solution here is track the number of unread bytes separately so we would know if we have anything in buffer to read or not. V2: instead of tracking buffer_full condition, changed the code, as suggested by Stefan Hajnoczi, to track the number of unread bytes instead. We initialize it to 0 at the start, adjust it on every receive from the network and read from the guest and can set the number of unread of bytes to full buffer size when the buffer full. Vladislav Yasevich (2): rtl8139: Do not consume the packet during overflow in standard mode. rtl8139: correctly track full receive buffer in standard mode Self nack. The second patch is wrong. Will resubmit when fixed. -vlad hw/net/rtl8139.c | 44 +++- 1 file changed, 39 insertions(+), 5 deletions(-)
Re: [Qemu-devel] [Qemu-discuss] GRO not happening in VM with VxLAN
On 07/01/2015 02:50 PM, Santosh R wrote: Since the vxlan UDP header checksum is 0, udp_tunnel_gro_complete (called via vxlan_gro_complete) is setting SKB_GSO_UDP_TUNNEL in skb_shinfo(skb)-gso_type. Later when bridge interface tries to forward this packet to tap interface (br_dev_queue_push_xmit - __dev_queue_xmit - validate_xmit_skb) netif_needs_gso will succeed (skb_gso_ok returns 0) resulting in packet segmentation (which was earlier aggregated) as tap interface doesn't have this feature. Is there a way to enable tx-udp_tnl-segmentation for tap interface? No. Those features are set by qemu when the guest is created and mirror any features that are supported by the guest. So, if the guest virtio doesn't have that feature, neither will the tap device. Which kernel version are you using on the host? -vlad # ethtool -K tap0 tx-udp_tnl-segmentation on Could not change any device features # ethtool -k tap0 | grep -i tnl tx-udp_tnl-segmentation: off [fixed] On Tue, Jun 30, 2015 at 12:24 PM, Santosh R skrasta...@gmail.com wrote: On Mon, Jun 29, 2015 at 9:24 PM, Santosh R skrasta...@gmail.com wrote: On Mon, Jun 29, 2015 at 9:04 PM, Vlad Yasevich vyase...@redhat.com wrote: On 06/29/2015 01:46 AM, Santosh R wrote: All, I am testing VxLAN performance in VM. For this I am using below command to bring up the VM. # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device virtio-net-pci,netdev=tap0,mac=00:11:22:33:44:55 -netdev tap,id=tap0,script=no,vhost=on -drive file=/root/vdisk_rhel65.img This is resulting in lower throughput in VM. On looking further I found that GRO is not happing on the tap interface. Using tcpdump I could see GRO happeing in physcial nic, vxlan and bridge interface. but _not_ on tap interface. I don't think this is GRO. GRO only happens at the lowest possible layer. For packets coming from external sources, that happens just above the nic when the packet enters the host for the first time. From then on, the features of other devices determine if the large aggregated packet will be segmented again or not. So, I thought to use veth pair instead of the tap interface in VM. What is the command to do this? veth pair will not help with this. you need a tap or a macvtap to have VM talk to host. Also, 1. Why is rx-checksumming off for tap interface? because tap doesn't do checksum validation. it allows lower level devices or VM to do so. 2. Why is GRO not happening with tap interface? Again. I don't think it's GRO. Try capturing traffic at tap and see if you get large packets (more the 1 mtu long) on the tap. If not, then someone else is fragmenting packets and you need to find who. [Santosh]: Thanks for the reply. With VxLAN, aggregated packets are seen on physical, VxLAN and bridge interface but _not_ on the tap interface. Without VxLAN, I do see aggregated (GRO) packets all the way upto VM (i.e. physical, bridge and tap interface). As you rightly pointed, looks like VxLAN packets are getting fragmented. Any pointers to look for? By the way, I was referring below link for VxLAN testing and ran into the GRO issue. This mentions The veth0 is suppose to be connected to the VM, while the veth1 is connected to the Linux bridge. https://community.mellanox.com/docs/DOC-1473
Re: [Qemu-devel] [Qemu-discuss] GRO not happening in VM with VxLAN
On 06/29/2015 01:46 AM, Santosh R wrote: All, I am testing VxLAN performance in VM. For this I am using below command to bring up the VM. # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device virtio-net-pci,netdev=tap0,mac=00:11:22:33:44:55 -netdev tap,id=tap0,script=no,vhost=on -drive file=/root/vdisk_rhel65.img This is resulting in lower throughput in VM. On looking further I found that GRO is not happing on the tap interface. Using tcpdump I could see GRO happeing in physcial nic, vxlan and bridge interface. but _not_ on tap interface. I don't think this is GRO. GRO only happens at the lowest possible layer. For packets coming from external sources, that happens just above the nic when the packet enters the host for the first time. From then on, the features of other devices determine if the large aggregated packet will be segmented again or not. So, I thought to use veth pair instead of the tap interface in VM. What is the command to do this? veth pair will not help with this. you need a tap or a macvtap to have VM talk to host. Also, 1. Why is rx-checksumming off for tap interface? because tap doesn't do checksum validation. it allows lower level devices or VM to do so. 2. Why is GRO not happening with tap interface? Again. I don't think it's GRO. Try capturing traffic at tap and see if you get large packets (more the 1 mtu long) on the tap. If not, then someone else is fragmenting packets and you need to find who. -vlad Pls copy me in all the replies as I am not on the list. # brctl show bridge name bridge id STP enabled interfaces br-vx 8000.323a9146b2d2 no tap0 vxlan0 # ethtool -k eth35 Features for eth35: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: off [fixed] scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp6-segmentation: on udp-fragmentation-offload: off [fixed] generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: off [fixed] rx-vlan-offload: on tx-vlan-offload: on ntuple-filters: off [fixed] receive-hashing: off [fixed] highdma: on rx-vlan-filter: off [fixed] vlan-challenged: off [fixed] tx-lockless: off [fixed] netns-local: off [fixed] tx-gso-robust: off [fixed] tx-fcoe-segmentation: off [fixed] tx-gre-segmentation: off [fixed] tx-ipip-segmentation: off [fixed] tx-sit-segmentation: off [fixed] tx-udp_tnl-segmentation: on tx-mpls-segmentation: off [fixed] fcoe-mtu: off [fixed] tx-nocache-copy: off loopback: off [fixed] rx-fcs: off [fixed] rx-all: off [fixed] tx-vlan-stag-hw-insert: off [fixed] rx-vlan-stag-hw-parse: off [fixed] rx-vlan-stag-filter: off [fixed] l2-fwd-offload: off [fixed] busy-poll: on [fixed] # ethtool -k vxlan0 Features for vxlan0: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: off [fixed] tx-checksum-ip-generic: on tx-checksum-ipv6: off [fixed] tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: off [fixed] scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp6-segmentation: on udp-fragmentation-offload: on generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: off [fixed] rx-vlan-offload: off [fixed] tx-vlan-offload: on ntuple-filters: off [fixed] receive-hashing: off [fixed] highdma: off [fixed] rx-vlan-filter: off [fixed] vlan-challenged: off [fixed] tx-lockless: on [fixed] netns-local: off [fixed] tx-gso-robust: off [fixed] tx-fcoe-segmentation: off [fixed] tx-gre-segmentation: off [fixed] tx-ipip-segmentation: off [fixed] tx-sit-segmentation: off [fixed] tx-udp_tnl-segmentation: off [fixed] tx-mpls-segmentation: off [fixed] fcoe-mtu: off [fixed] tx-nocache-copy: off loopback: off [fixed] rx-fcs: off [fixed] rx-all: off [fixed] tx-vlan-stag-hw-insert: on rx-vlan-stag-hw-parse: off [fixed] rx-vlan-stag-filter: off [fixed] l2-fwd-offload: off [fixed] busy-poll: off [fixed] # ethtool -k tap0 Features for tap0: rx-checksumming: off [fixed] tx-checksumming: on tx-checksum-ipv4: off [fixed] tx-checksum-ip-generic: on tx-checksum-ipv6: off [fixed] tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: off [fixed] scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: on tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation
Re: [Qemu-devel] [PATCH] net: Fix link state inter-dependencies between NIC and backend
On 04/02/2015 05:21 AM, Stefan Hajnoczi wrote: On Thu, Apr 02, 2015 at 08:11:15AM +0200, Michael S. Tsirkin wrote: On Wed, Apr 01, 2015 at 07:55:38PM -0400, Vladislav Yasevich wrote: Commit 02d38fcb2caa4454cf4ed728d5908c3cc9ba47be net: Update netdev peer on link change introduced a link state dependency between the backend netdev and the nic. If the admin turned off the link on the backend, the nic link state was set to down because the link had no access to the network at all. This created some inconsitet behavior for someone who wanted to play around the links states. 1) Turning off the nic link and then turning on the backend link (even if it was already on) would turn on the nic link again. 2) Turning off the backend link and then turning on the nic link would turn on the link in the VM, but would not change the backend state resulting in a broken/unreachable network. This patch attempts to correct these behaviors. The patch treats the nic-backend relationship as two ends of a wire. Each end tracks the administratively set link state in addition to actual link state. Thus is is possible to set link down at each end effectively emulating plugging/unplugging the wire at either end. The patch continues to preserve the old behavior where setting just nic side down does NOT impact the backend. Signed-off-by: Vladislav Yasevich vyase...@redhat.com I never understood the point of 02d38fcb2caa4454cf4ed728d5908c3cc9ba47be. If you want to tell guest link is down, just set the nic link down. How about we just revert it? Then one can change link state on each end independently, with no notification, and no state to track. I agree. The simple solution would be nice, unless you are aware of a use case where it causes problems. It causes problems when the user turns off link on the backend. Problems range from much longer boot up times on older VMs to Windows networks not working. The problem is the if you turn off the link on the backend, the link to the VM stays up but can't really reach the network. Now, you could say Don't do this and document it, but it's still ugly and causes issues. We could disable the link state change on backends and only allow it on NIC type devices. That would also solve this. -vlad Stefan
Re: [Qemu-devel] [Bug 1362755] [NEW] QEmu +2 does not route VLAN tagged packets, that are originated within the Hypervisor itself.
[ realized that the bug and reporter were non cc'd, updated cc list] On 08/28/2014 02:40 PM, Thiago Martins wrote: Public bug reported: Guys, Trusty QEmu 2.0 Hypervisor fails to create a consistent virtual network. It does not route tagged VLAN packets. The have a been a bunch of rather recent changes to the kernel to support guest VLANs correctly. The issues have been around TSO/GSO implementation in the kernel. Could try disabling TSO/GSO and tx checksums on the vlan devices in the guest and see if it solves your problem? If it does, could you try the kernel from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git turn the offloads back on and see if the problem is solved? Thanks -vlad That's it, it is impossible to use Trusty acting as a QEmu 2.0 Hypervisor (metapakage `ubuntu-virt-server`), to make a basic virtual tagged network within itself. QEmu 2.X guest does not route traffic when with tagged VLANs! So, Trusty QEmu 2.0 Hypervisor cannot be used to host guests acting as firewalls / routers, and it have an easy to reproduce, connectivity problem. This network problem affects Ubuntu 14.04.1 (Linux-3.13.0-35-generic) with QEmu 2.0 (it also affects 14.10, Linux 3.16 - QEmu 2.1). I have this very same setup up and running, on about ~100 physical servers (others Trusty QEmu 2.0 Hypervisors), and in only a few of them, the QEmu Hypervisors dedicated to host guest acting as routers / firewalls, like a borger gateway for example, that it does not work as expected. One interesting thing to note is that, this BUG appear only, and only at, the QEmu Hypervisors dedicated to host guests that are used as `router / firewalls` (as I said above), others QEmu Hypervisors of my network does not suffer from this problem. Another interesting point is that it fails to route tagged VLAN packets only when these packets are originated from within the Hypervisor itself, I mean, packets from both host and other guests (not the router/firewall guest itself), suffer from this connectivity problem. As a workaroung / fix, Xen-4.4 can be used, instead of QEmu 2.0, as a border hypervisor. So, this proves that there is something wrong with QEmu. I already tested it with both `openvswitch-switch` and with `bridge- utils`, same bad results. So, don't waste your time trying `bridge- utils` (optional steps while reproducing it), you can keep OVS bridges from original design. I think that I'm using the best pratices to build this environment, as follows... * Topology * QEmu 2.0 Hypervisor - (qemu-host-1.domain.com - the border hypervisor): 1- Physical machine with 3 NICs; 2- Minimal Ubuntu 14.04.1 installed and upgraded; 3- Packages installed: ubuntu-virt-server openvswitch-switch rdnssd tcpdump. - eth0 connected to the Internet - VLAN tag 10; - eth1 connected to the LAN1 - VLAN tag 100; - eth2 connected to the LAN2 - VLAN tag 200; Guest (guest-fw-1.domain.com - the border gateway itself - regular guest acting as a router with iptables/ip6tables): 1- Virtual Machine with 3 NICs (VirtIO); 2- Minimal Virtual Machine Ubuntu 14.04.1 installed and upgraded; 3- Packages installed: aiccu iptables vlan pv-grub-menu. OBS: You'll need `virt-manager` to connect at `qemu-host-1` to install `guest-fw-1`. Then, use `guest-fw-1` as a default gateway for your (virt-)lab network, including the `qemu-host-1` itself. Steps to reproduce * Preparing the `qemu-host-1` host: - Configure the /etc/network/interfaces with: --- # The loopback network interface auto lo iface lo inet loopback auto eth0 iface eth0 inet manual up ip link set $IFACE up down ip link set $IFACE down auto eth1 iface eth1 inet manual up ip link set dev $IFACE up down ip link set dev $IFACE down auto ovsbr1p1 iface ovsbr1p1 inet6 auto iface ovsbr1p1 inet static address 192.168.1.10 netmask 24 gateway 192.168.1.1 auto eth2 iface eth2 inet manual up ip link set $IFACE up down ip link set $IFACE down --- - Creating the Hypervisor OVS Bridges: ovs-vsctl add-br ovsbr0 ovs-vsctl add-br ovsbr1 ovs-vsctl add-br ovsbr2 - Attaching the bridges to the NICs: ovs-vsctl add-port ovsbr0 eth0 ovs-vsctl add-port ovsbr1 eth1 ovs-vsctl add-port ovsbr2 eth2 - Creating the OVS internal tagged interface (best practice?), so the QEmu Hypervisor itself can have its own IP (v4 and v6): ovs-vsctl add-port ovsbr1 ovsbr1p1 tag=100 -- set interface ovsbr1p1 type=internal ovs-vsctl set interface ovsbr1p1 mac=\32:ac:85:72:ab:fe\ NOTE: * I'm fixing the MAC Address of ovsbr1p1 because I like to use IPv6 with SLAAC, so, it remain fixed across host reboots. - Making Libvirt aware of OVS Bridges: Create 3 files, one for each bridge, like this (ovsbr0.xml, ovsbr1.xml and ovsbr2.xml): --- ovsbr0.xml contents: network nameovsbr0/name forward mode
Re: [Qemu-devel] [Bug 1362755] [NEW] QEmu +2 does not route VLAN tagged packets, that are originated within the Hypervisor itself.
On 08/28/2014 02:40 PM, Thiago Martins wrote: Public bug reported: Guys, Trusty QEmu 2.0 Hypervisor fails to create a consistent virtual network. It does not route tagged VLAN packets. The have a been a bunch of rather recent changes to the kernel to support guest VLANs correctly. The issues have been around TSO/GSO implementation in the kernel. Could try disabling TSO/GSO and tx checksums on the vlan devices in the guest and see if it solves your problem? If it does, could you try the kernel from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git turn the offloads back on and see if the problem is solved? Thanks -vlad That's it, it is impossible to use Trusty acting as a QEmu 2.0 Hypervisor (metapakage `ubuntu-virt-server`), to make a basic virtual tagged network within itself. QEmu 2.X guest does not route traffic when with tagged VLANs! So, Trusty QEmu 2.0 Hypervisor cannot be used to host guests acting as firewalls / routers, and it have an easy to reproduce, connectivity problem. This network problem affects Ubuntu 14.04.1 (Linux-3.13.0-35-generic) with QEmu 2.0 (it also affects 14.10, Linux 3.16 - QEmu 2.1). I have this very same setup up and running, on about ~100 physical servers (others Trusty QEmu 2.0 Hypervisors), and in only a few of them, the QEmu Hypervisors dedicated to host guest acting as routers / firewalls, like a borger gateway for example, that it does not work as expected. One interesting thing to note is that, this BUG appear only, and only at, the QEmu Hypervisors dedicated to host guests that are used as `router / firewalls` (as I said above), others QEmu Hypervisors of my network does not suffer from this problem. Another interesting point is that it fails to route tagged VLAN packets only when these packets are originated from within the Hypervisor itself, I mean, packets from both host and other guests (not the router/firewall guest itself), suffer from this connectivity problem. As a workaroung / fix, Xen-4.4 can be used, instead of QEmu 2.0, as a border hypervisor. So, this proves that there is something wrong with QEmu. I already tested it with both `openvswitch-switch` and with `bridge- utils`, same bad results. So, don't waste your time trying `bridge- utils` (optional steps while reproducing it), you can keep OVS bridges from original design. I think that I'm using the best pratices to build this environment, as follows... * Topology * QEmu 2.0 Hypervisor - (qemu-host-1.domain.com - the border hypervisor): 1- Physical machine with 3 NICs; 2- Minimal Ubuntu 14.04.1 installed and upgraded; 3- Packages installed: ubuntu-virt-server openvswitch-switch rdnssd tcpdump. - eth0 connected to the Internet - VLAN tag 10; - eth1 connected to the LAN1 - VLAN tag 100; - eth2 connected to the LAN2 - VLAN tag 200; Guest (guest-fw-1.domain.com - the border gateway itself - regular guest acting as a router with iptables/ip6tables): 1- Virtual Machine with 3 NICs (VirtIO); 2- Minimal Virtual Machine Ubuntu 14.04.1 installed and upgraded; 3- Packages installed: aiccu iptables vlan pv-grub-menu. OBS: You'll need `virt-manager` to connect at `qemu-host-1` to install `guest-fw-1`. Then, use `guest-fw-1` as a default gateway for your (virt-)lab network, including the `qemu-host-1` itself. Steps to reproduce * Preparing the `qemu-host-1` host: - Configure the /etc/network/interfaces with: --- # The loopback network interface auto lo iface lo inet loopback auto eth0 iface eth0 inet manual up ip link set $IFACE up down ip link set $IFACE down auto eth1 iface eth1 inet manual up ip link set dev $IFACE up down ip link set dev $IFACE down auto ovsbr1p1 iface ovsbr1p1 inet6 auto iface ovsbr1p1 inet static address 192.168.1.10 netmask 24 gateway 192.168.1.1 auto eth2 iface eth2 inet manual up ip link set $IFACE up down ip link set $IFACE down --- - Creating the Hypervisor OVS Bridges: ovs-vsctl add-br ovsbr0 ovs-vsctl add-br ovsbr1 ovs-vsctl add-br ovsbr2 - Attaching the bridges to the NICs: ovs-vsctl add-port ovsbr0 eth0 ovs-vsctl add-port ovsbr1 eth1 ovs-vsctl add-port ovsbr2 eth2 - Creating the OVS internal tagged interface (best practice?), so the QEmu Hypervisor itself can have its own IP (v4 and v6): ovs-vsctl add-port ovsbr1 ovsbr1p1 tag=100 -- set interface ovsbr1p1 type=internal ovs-vsctl set interface ovsbr1p1 mac=\32:ac:85:72:ab:fe\ NOTE: * I'm fixing the MAC Address of ovsbr1p1 because I like to use IPv6 with SLAAC, so, it remain fixed across host reboots. - Making Libvirt aware of OVS Bridges: Create 3 files, one for each bridge, like this (ovsbr0.xml, ovsbr1.xml and ovsbr2.xml): --- ovsbr0.xml contents: network nameovsbr0/name forward mode='bridge'/ bridge name='ovsbr0'/ virtualport type='openvswitch
Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change
On 04/02/2014 04:51 AM, Michael S. Tsirkin wrote: On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote: When a link change occurs on a backend (like tap), we currently do not propage such change to the nic. As a result, when someone turns off a link on a tap device, for instance, then a guest doesn't see that change and continues to try to send traffic or run DHCP even though the lower-layer is disconnected. This is OK when the network is set up as a HUB since the the guest may be connected to other HUB ports too, but when it's set up as a netdev, it makes thinkgs worse. The patch addresses this by setting the peers link down only when the peer is not a HUBPORT device. With this patch, in the following config -netdev tap,id=net0 -device e1000,mac=X,netdev=net0 when net0 link is turned off, the guest e1000 shows lower-layer link down. This allows guests to boot much faster in such configurations. With windows guest, it also allows the network to recover properly since windows will not configure the link-local IPv4 address, and when the link is turned on, the proper address address is configured. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- net/net.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/net.c b/net/net.c index 0a88e68..8a084bf 100644 --- a/net/net.c +++ b/net/net.c @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error **errp) nc-info-link_status_changed(nc); } -/* Notify peer. Don't update peer link status: this makes it possible to - * disconnect from host network without notifying the guest. - * FIXME: is disconnected link status change operation useful? - * - * Current behaviour is compatible with qemu vlans where there could be - * multiple clients that can still communicate with each other in - * disconnected mode. For now maintain this compatibility. */ Hmm this went in without much discussion. But before this change went in, it was possible to control link state on NIC and on link separately, without guest noticing. Why did we decide to drop this functionality? This was causing some interesting issues in the non-hub configurations. In these configs, the loss of link on the backend did not propagate to the guest. The guest would continue to send traffic and it would just queue up and get dropped. Eventually the guest could lose the DHCP lease and there wouldn't be any indication at all as to why it happened. From the guest perspective, the link is working, but it is completely disconnected. This patch essentially equates the backend to a carrier on the guest and the loss of a backend is loss of carrier. This is only done for configurations where you have a 1-to-1 relationship between a hw nic and netdev. For the old vlan/hubport, there is no change, since in those case, the guest nic can talk to other hubports. In the case of netdev, it can't. As an example, try booting a VM in which the netdev link is down. First, it'll take a while to boot if it uses DHCP. When you re-enable, the link, the VM will not issue DHCP requests and you'd have to manually restart the nic in the VM. With this patch, link detection works properly. -vlad -if (nc-peer nc-peer-info-link_status_changed) { -nc-peer-info-link_status_changed(nc-peer); +if (nc-peer) { +/* Change peer link only if the peer is NIC and then notify peer. + * If the peer is a HUBPORT or a backend, we do not change the + * link status. + * + * This behavior is compatible with qemu vlans where there could be + * multiple clients that can still communicate with each other in + * disconnected mode. For now maintain this compatibility. + */ +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) { +for (i = 0; i queues; i++) { +ncs[i]-peer-link_down = !up; +} +} +if (nc-peer-info-link_status_changed) { +nc-peer-info-link_status_changed(nc-peer); +} } } -- 1.8.4.2
Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change
On 04/02/2014 06:46 AM, Yan Vugenfirer wrote: On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote: When a link change occurs on a backend (like tap), we currently do not propage such change to the nic. As a result, when someone turns off a link on a tap device, for instance, then a guest doesn't see that change and continues to try to send traffic or run DHCP even though the lower-layer is disconnected. This is OK when the network is set up as a HUB since the the guest may be connected to other HUB ports too, but when it's set up as a netdev, it makes thinkgs worse. The patch addresses this by setting the peers link down only when the peer is not a HUBPORT device. With this patch, in the following config -netdev tap,id=net0 -device e1000,mac=X,netdev=net0 when net0 link is turned off, the guest e1000 shows lower-layer link down. This allows guests to boot much faster in such configurations. With windows guest, it also allows the network to recover properly since windows will not configure the link-local IPv4 address, and when the link is turned on, the proper address address is configured. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- net/net.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/net.c b/net/net.c index 0a88e68..8a084bf 100644 --- a/net/net.c +++ b/net/net.c @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error **errp) nc-info-link_status_changed(nc); } -/* Notify peer. Don't update peer link status: this makes it possible to - * disconnect from host network without notifying the guest. - * FIXME: is disconnected link status change operation useful? - * - * Current behaviour is compatible with qemu vlans where there could be - * multiple clients that can still communicate with each other in - * disconnected mode. For now maintain this compatibility. */ Hmm this went in without much discussion. But before this change went in, it was possible to control link state on NIC and on link separately, without guest noticing. Why did we decide to drop this functionality? Not sure there was a real need for the patch. On other hand Windows guest will not receive IP address from DHCP server if you start VM with tap link down and then set it to up without toggling link state of the guest device as well. Same for a linux guest. It a fault scenario. So we either handle it by propagating the link event, or we forbid it. Doing neither allows for a bad state in common configuration. This patch chose to propagate the event under certain configuration. -vlad Yan. -if (nc-peer nc-peer-info-link_status_changed) { -nc-peer-info-link_status_changed(nc-peer); +if (nc-peer) { +/* Change peer link only if the peer is NIC and then notify peer. + * If the peer is a HUBPORT or a backend, we do not change the + * link status. + * + * This behavior is compatible with qemu vlans where there could be + * multiple clients that can still communicate with each other in + * disconnected mode. For now maintain this compatibility. + */ +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) { +for (i = 0; i queues; i++) { +ncs[i]-peer-link_down = !up; +} +} +if (nc-peer-info-link_status_changed) { +nc-peer-info-link_status_changed(nc-peer); +} } } -- 1.8.4.2
Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change
On 04/02/2014 11:03 AM, Michael S. Tsirkin wrote: On Wed, Apr 02, 2014 at 10:57:08AM -0400, Vlad Yasevich wrote: On 04/02/2014 06:46 AM, Yan Vugenfirer wrote: On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote: When a link change occurs on a backend (like tap), we currently do not propage such change to the nic. As a result, when someone turns off a link on a tap device, for instance, then a guest doesn't see that change and continues to try to send traffic or run DHCP even though the lower-layer is disconnected. This is OK when the network is set up as a HUB since the the guest may be connected to other HUB ports too, but when it's set up as a netdev, it makes thinkgs worse. The patch addresses this by setting the peers link down only when the peer is not a HUBPORT device. With this patch, in the following config -netdev tap,id=net0 -device e1000,mac=X,netdev=net0 when net0 link is turned off, the guest e1000 shows lower-layer link down. This allows guests to boot much faster in such configurations. With windows guest, it also allows the network to recover properly since windows will not configure the link-local IPv4 address, and when the link is turned on, the proper address address is configured. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- net/net.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/net.c b/net/net.c index 0a88e68..8a084bf 100644 --- a/net/net.c +++ b/net/net.c @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error **errp) nc-info-link_status_changed(nc); } -/* Notify peer. Don't update peer link status: this makes it possible to - * disconnect from host network without notifying the guest. - * FIXME: is disconnected link status change operation useful? - * - * Current behaviour is compatible with qemu vlans where there could be - * multiple clients that can still communicate with each other in - * disconnected mode. For now maintain this compatibility. */ Hmm this went in without much discussion. But before this change went in, it was possible to control link state on NIC and on link separately, without guest noticing. Why did we decide to drop this functionality? Not sure there was a real need for the patch. On other hand Windows guest will not receive IP address from DHCP server if you start VM with tap link down and then set it to up without toggling link state of the guest device as well. Same for a linux guest. It a fault scenario. So we either handle it by propagating the link event, or we forbid it. Doing neither allows for a bad state in common configuration. It's how networking works. If I turn off my router at home I can't get dhcp either. And you device link will show that it has not carrier. We had a way to disable networking at link or at the router, then removed the ability to turn it off at the router and I'm asking why. We didn't remove the ability to turn it off at router/switch. We just now propagate carrier loss correctly. In fact, if you have a configuration where the VM is connected to a hub in qemu, turning off the link on the 'tap' connected to the same hub doesn't not change guest state. This patch chose to propagate the event under certain configuration. -vlad So don't do this then. If you want guest to know that link is down, put it down on guest side. It's that simple. Sorry, but that's not always possible. The host administrator may not always be vm administrator and without this patch vm administrator has no idea what happened to his network. -vlad Yan. -if (nc-peer nc-peer-info-link_status_changed) { -nc-peer-info-link_status_changed(nc-peer); +if (nc-peer) { +/* Change peer link only if the peer is NIC and then notify peer. + * If the peer is a HUBPORT or a backend, we do not change the + * link status. + * + * This behavior is compatible with qemu vlans where there could be + * multiple clients that can still communicate with each other in + * disconnected mode. For now maintain this compatibility. + */ +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) { +for (i = 0; i queues; i++) { +ncs[i]-peer-link_down = !up; +} +} +if (nc-peer-info-link_status_changed) { +nc-peer-info-link_status_changed(nc-peer); +} } } -- 1.8.4.2
Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change
On 04/02/2014 11:29 AM, Michael S. Tsirkin wrote: On Wed, Apr 02, 2014 at 11:25:32AM -0400, Vlad Yasevich wrote: On 04/02/2014 11:03 AM, Michael S. Tsirkin wrote: On Wed, Apr 02, 2014 at 10:57:08AM -0400, Vlad Yasevich wrote: On 04/02/2014 06:46 AM, Yan Vugenfirer wrote: On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote: When a link change occurs on a backend (like tap), we currently do not propage such change to the nic. As a result, when someone turns off a link on a tap device, for instance, then a guest doesn't see that change and continues to try to send traffic or run DHCP even though the lower-layer is disconnected. This is OK when the network is set up as a HUB since the the guest may be connected to other HUB ports too, but when it's set up as a netdev, it makes thinkgs worse. The patch addresses this by setting the peers link down only when the peer is not a HUBPORT device. With this patch, in the following config -netdev tap,id=net0 -device e1000,mac=X,netdev=net0 when net0 link is turned off, the guest e1000 shows lower-layer link down. This allows guests to boot much faster in such configurations. With windows guest, it also allows the network to recover properly since windows will not configure the link-local IPv4 address, and when the link is turned on, the proper address address is configured. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- net/net.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/net.c b/net/net.c index 0a88e68..8a084bf 100644 --- a/net/net.c +++ b/net/net.c @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error **errp) nc-info-link_status_changed(nc); } -/* Notify peer. Don't update peer link status: this makes it possible to - * disconnect from host network without notifying the guest. - * FIXME: is disconnected link status change operation useful? - * - * Current behaviour is compatible with qemu vlans where there could be - * multiple clients that can still communicate with each other in - * disconnected mode. For now maintain this compatibility. */ Hmm this went in without much discussion. But before this change went in, it was possible to control link state on NIC and on link separately, without guest noticing. Why did we decide to drop this functionality? Not sure there was a real need for the patch. On other hand Windows guest will not receive IP address from DHCP server if you start VM with tap link down and then set it to up without toggling link state of the guest device as well. Same for a linux guest. It a fault scenario. So we either handle it by propagating the link event, or we forbid it. Doing neither allows for a bad state in common configuration. It's how networking works. If I turn off my router at home I can't get dhcp either. And you device link will show that it has not carrier. It doesn't as long as the switch it's connected to is up. We had a way to disable networking at link or at the router, then removed the ability to turn it off at the router and I'm asking why. We didn't remove the ability to turn it off at router/switch. We just now propagate carrier loss correctly. In fact, if you have a configuration where the VM is connected to a hub in qemu, turning off the link on the 'tap' connected to the same hub doesn't not change guest state. So why is it a good idea to make -netdev inconsistent with this? Because -netdev has a 1-1 relationship with -device. You could think of it a physical wire that connects the nic to the network. If something happens to the wire, the nic should notice it. It doesn't seem to add anything useful. It might fix things for users who turned off link at the wrong place by mistake but I doubt it's a common scenario. I don't think link state management in general is a common scenario, but we support it. I think this makes the behavior better and improves the vm experience. Where end users getting it wrong and complaining? Yes, I can provide you with a list reported bugs if you wish. -vlad This patch chose to propagate the event under certain configuration. -vlad So don't do this then. If you want guest to know that link is down, put it down on guest side. It's that simple. Sorry, but that's not always possible. The host administrator may not always be vm administrator and without this patch vm administrator has no idea what happened to his network. -vlad Yan. -if (nc-peer nc-peer-info-link_status_changed) { -nc-peer-info-link_status_changed(nc-peer); +if (nc-peer) { +/* Change peer link only if the peer is NIC and then notify peer. + * If the peer is a HUBPORT or a backend, we do not change the + * link status
Re: [Qemu-devel] [PATCH v2 for 2.0] virtio-net: Do not filter VLANs without F_CTRL_VLAN
On 03/26/2014 06:29 AM, Amos Kong wrote: From: Stefan Fritsch s...@sfritsch.de If VIRTIO_NET_F_CTRL_VLAN is not negotiated, do not filter out all VLAN-tagged packets but send them to the guest. This fixes VLANs with OpenBSD guests (and probably NetBSD, too, because the OpenBSD driver started as a port from NetBSD). Signed-off-by: Stefan Fritsch s...@sfritsch.de Signed-off-by: Amos Kong ak...@redhat.com --- V2: it's not good to check guest features at reset time, so reset vlan table in setting features --- hw/net/virtio-net.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 43b4eda..439477b 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -530,6 +530,12 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features) } vhost_net_ack_features(tap_get_vhost_net(nc-peer), features); } + +if ((1 VIRTIO_NET_F_CTRL_VLAN) features) { +memset(n-vlans, 0, MAX_VLAN 3); +} else { +memset(n-vlans, 0xff, MAX_VLAN 3); +} } static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd, Reviewed-by: Vlad Yasevich vyase...@redhat.com -vlad
Re: [Qemu-devel] [PATCH v2] virtio-net: add a field to indicate if vlan table is used
On 02/20/2014 11:38 AM, Amos Kong wrote: Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated. This patch added a new field to @RxFilterInfo to indicate if management uses the vlan table. [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02604.html Signed-off-by: Amos Kong ak...@redhat.com --- V2: don't make vlan-table optional, add a flag to indicate if vlan table is used by management --- hw/net/virtio-net.c | 38 +- qapi-schema.json| 3 +++ qmp-commands.hx | 2 ++ 3 files changed, 30 insertions(+), 13 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 3626608..f591f4e 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -222,13 +222,33 @@ static char *mac_strdup_printf(const uint8_t *mac) mac[1], mac[2], mac[3], mac[4], mac[5]); } +static intList *get_vlan_table(VirtIONet *n) +{ +intList *list, *entry; +int i, j; + +list = NULL; +for (i = 0; i MAX_VLAN 5; i++) { +for (j = 0; n-vlans[i] j 0x1f; j++) { +if (n-vlans[i] (1U j)) { +entry = g_malloc0(sizeof(*entry)); +entry-value = (i 5) + j; +entry-next = list; +list = entry; +} +} +} + +return list; +} + static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) { VirtIONet *n = qemu_get_nic_opaque(nc); +VirtIODevice *vdev = VIRTIO_DEVICE(n); RxFilterInfo *info; strList *str_list, *entry; -intList *int_list, *int_entry; -int i, j; +int i; info = g_malloc0(sizeof(*info)); info-name = g_strdup(nc-name); @@ -273,19 +293,11 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) str_list = entry; } info-multicast_table = str_list; +info-vlan_table = get_vlan_table(n); -int_list = NULL; -for (i = 0; i MAX_VLAN 5; i++) { -for (j = 0; n-vlans[i] j 0x1f; j++) { -if (n-vlans[i] (1U j)) { -int_entry = g_malloc0(sizeof(*int_entry)); -int_entry-value = (i 5) + j; -int_entry-next = int_list; -int_list = int_entry; -} -} +if ((1 VIRTIO_NET_F_CTRL_VLAN) vdev-guest_features) { +info-vlan = true; } So, in the case that vlan filtering is not supported in the guest we get: vlan: false, vlan-table: [ 0, 1, 2, ... 4095 ] since virtio_net now initializes the table to all 1s. Seems a bit awkward. We are providing a lot of data that is simply going to be ignored. -info-vlan_table = int_list; /* enable event notification after query */ nc-rxfilter_notify_enabled = 1; diff --git a/qapi-schema.json b/qapi-schema.json index 7cfb5e5..5b54e94 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -4032,6 +4032,8 @@ # # @unicast-overflow: unicast table is overflowed or not # +# @vlan: whether management uses the vlan table +# The above description seems a bit confusing to me. The value we are returning describes whether or not qemu is performing vlan filtering. I am not sure if it has any bearing on what management may be doing. I think the idea is that management, in the future, would look at this value and make some decision about applying provided filter to the current host configuration. # @main-mac: the main macaddr string # # @vlan-table: a list of active vlan id @@ -4052,6 +4054,7 @@ 'broadcast-allowed': 'bool', 'multicast-overflow': 'bool', 'unicast-overflow': 'bool', +'vlan': 'bool', Not terribly descriptive. May be call it vlan-filter? Thanks -vlad 'main-mac': 'str', 'vlan-table': ['int'], 'unicast-table': ['str'], diff --git a/qmp-commands.hx b/qmp-commands.hx index cce6b81..b170c79 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -3307,6 +3307,7 @@ Each array entry contains the following: - broadcast-allowed: allow to receive broadcast (json-bool) - multicast-overflow: multicast table is overflowed (json-bool) - unicast-overflow: unicast table is overflowed (json-bool) +- vlan: management uses the vlan table (json-bool) - main-mac: main macaddr string (json-string) - vlan-table: a json-array of active vlan id - unicast-table: a json-array of unicast macaddr string @@ -3321,6 +3322,7 @@ Example: name: vnet0, main-mac: 52:54:00:12:34:56, unicast: normal, +vlan: true, vlan-table: [ 4, 0
Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated
On 02/18/2014 05:06 AM, Michael S. Tsirkin wrote: On Mon, Feb 17, 2014 at 11:58:11AM -0500, Vlad Yasevich wrote: On 02/17/2014 11:56 AM, Eric Blake wrote: On 02/17/2014 09:52 AM, Eric Blake wrote: On 02/16/2014 07:27 PM, Amos Kong wrote: Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated. We should also not send the vlan table to management, this patch makes the vlan-talbe optional. s/talbe/table/ @@ -4053,7 +4053,7 @@ 'multicast-overflow': 'bool', 'unicast-overflow': 'bool', 'main-mac': 'str', -'vlan-table': ['int'], +'*vlan-table': ['int'], Indentation is now off. - main-mac: main macaddr string (json-string) -- vlan-table: a json-array of active vlan id +- vlan-table: a json-array of active vlan id (optoinal) s/optoinal/optional/ Those fixes are trivial enough, so I'm okay if you correct them then add: Scratch that. I revoke my R-b, on the following grounds: On 02/17/2014 08:27 AM, Vlad Yasevich wrote: On 02/16/2014 09:27 PM, Amos Kong wrote: Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated. We should also not send the vlan table to management, this patch makes the vlan-talbe optional. ^^ table. One question I have from the API perspective is can we suddenly change something to be optional? If there are any users of this , wouldn't they have to change now to check the existence of this list? You are correct. Since the parameter is an output field, older clients may be depending on it existing. It is okay to generate an empty array, but you must not entirely omit the array unless you add further justification in your commit message that you are 100% positive that there are no clients of 1.6 that will be broken when the array no longer appears in the output. Can you rework the patch to just leave the array empty in the case where the bit does not indicate it is used? Or do we need to add a new bool field to the output for new enough management to know whether to use the array? I think a completely empty array should be sufficient. -vlad An empty array could mean either no vlans allowed or all vlans allowed. Also it's nice if users can detect an old buggy qemu. Let's just add an rx state. Fine with me. -vlad
Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated
On 02/16/2014 09:27 PM, Amos Kong wrote: Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated. We should also not send the vlan table to management, this patch makes the vlan-talbe optional. ^^ table. One question I have from the API perspective is can we suddenly change something to be optional? If there are any users of this , wouldn't they have to change now to check the existence of this list? Thanks -vlad [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02604.html Signed-off-by: Amos Kong ak...@redhat.com --- hw/net/virtio-net.c | 38 +- qapi-schema.json| 4 ++-- qmp-commands.hx | 2 +- 3 files changed, 28 insertions(+), 16 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 3626608..0b32e6a 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -222,13 +222,33 @@ static char *mac_strdup_printf(const uint8_t *mac) mac[1], mac[2], mac[3], mac[4], mac[5]); } +static intList *get_vlan_table(VirtIONet *n) +{ +intList *list, *entry; +int i, j; + +list = NULL; +for (i = 0; i MAX_VLAN 5; i++) { +for (j = 0; n-vlans[i] j 0x1f; j++) { +if (n-vlans[i] (1U j)) { +entry = g_malloc0(sizeof(*entry)); +entry-value = (i 5) + j; +entry-next = list; +list = entry; +} +} +} + +return list; +} + static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) { VirtIONet *n = qemu_get_nic_opaque(nc); +VirtIODevice *vdev = VIRTIO_DEVICE(n); RxFilterInfo *info; strList *str_list, *entry; -intList *int_list, *int_entry; -int i, j; +int i; info = g_malloc0(sizeof(*info)); info-name = g_strdup(nc-name); @@ -274,18 +294,10 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) } info-multicast_table = str_list; -int_list = NULL; -for (i = 0; i MAX_VLAN 5; i++) { -for (j = 0; n-vlans[i] j 0x1f; j++) { -if (n-vlans[i] (1U j)) { -int_entry = g_malloc0(sizeof(*int_entry)); -int_entry-value = (i 5) + j; -int_entry-next = int_list; -int_list = int_entry; -} -} +if ((1 VIRTIO_NET_F_CTRL_VLAN) vdev-guest_features) { +info-has_vlan_table = true; +info-vlan_table = get_vlan_table(n); } -info-vlan_table = int_list; /* enable event notification after query */ nc-rxfilter_notify_enabled = 1; diff --git a/qapi-schema.json b/qapi-schema.json index 7cfb5e5..5d48fa9 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -4034,7 +4034,7 @@ # # @main-mac: the main macaddr string # -# @vlan-table: a list of active vlan id +# @vlan-table: #optional a list of active vlan id # # @unicast-table: a list of unicast macaddr string # @@ -4053,7 +4053,7 @@ 'multicast-overflow': 'bool', 'unicast-overflow': 'bool', 'main-mac': 'str', -'vlan-table': ['int'], +'*vlan-table': ['int'], 'unicast-table': ['str'], 'multicast-table':['str'] }} diff --git a/qmp-commands.hx b/qmp-commands.hx index cce6b81..a1c1dfa 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -3308,7 +3308,7 @@ Each array entry contains the following: - multicast-overflow: multicast table is overflowed (json-bool) - unicast-overflow: unicast table is overflowed (json-bool) - main-mac: main macaddr string (json-string) -- vlan-table: a json-array of active vlan id +- vlan-table: a json-array of active vlan id (optoinal) - unicast-table: a json-array of unicast macaddr string - multicast-table: a json-array of multicast macaddr string
Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated
On 02/17/2014 11:56 AM, Eric Blake wrote: On 02/17/2014 09:52 AM, Eric Blake wrote: On 02/16/2014 07:27 PM, Amos Kong wrote: Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated. We should also not send the vlan table to management, this patch makes the vlan-talbe optional. s/talbe/table/ @@ -4053,7 +4053,7 @@ 'multicast-overflow': 'bool', 'unicast-overflow': 'bool', 'main-mac': 'str', -'vlan-table': ['int'], +'*vlan-table': ['int'], Indentation is now off. - main-mac: main macaddr string (json-string) -- vlan-table: a json-array of active vlan id +- vlan-table: a json-array of active vlan id (optoinal) s/optoinal/optional/ Those fixes are trivial enough, so I'm okay if you correct them then add: Scratch that. I revoke my R-b, on the following grounds: On 02/17/2014 08:27 AM, Vlad Yasevich wrote: On 02/16/2014 09:27 PM, Amos Kong wrote: Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated. We should also not send the vlan table to management, this patch makes the vlan-talbe optional. ^^ table. One question I have from the API perspective is can we suddenly change something to be optional? If there are any users of this , wouldn't they have to change now to check the existence of this list? You are correct. Since the parameter is an output field, older clients may be depending on it existing. It is okay to generate an empty array, but you must not entirely omit the array unless you add further justification in your commit message that you are 100% positive that there are no clients of 1.6 that will be broken when the array no longer appears in the output. Can you rework the patch to just leave the array empty in the case where the bit does not indicate it is used? Or do we need to add a new bool field to the output for new enough management to know whether to use the array? I think a completely empty array should be sufficient. -vlad
Re: [Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch
On 11/22/2013 04:47 AM, Jason Wang wrote: On 11/22/2013 04:04 AM, Vlad Yasevich wrote: e1000 provides a E1000_RAH_AV bit on every complete write to the Receive Address Register. We can use this bit 2 ways: 1) To trigger HMP notifications. When the bit is set the mac address is fully set and we can update the HMP. 2) We can turn off he bit on the write to low order bits of the Receive Address Register, so that we would not try to match received traffic to this address when it is not completely set. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ae63591..82978ea 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA || index == RA + 1) { +switch (index) { +case RA: +/* Mask off AV bit on the write of the low dword. The write of + * the high dword will set the bit. This way a half-written + * mac address will not be used to filter on rx. + */ +s-mac_reg[RA+1] = ~E1000_RAH_AV; If a stupid driver write high dword first, it won't receive any packets. I need to ping Intel guys again. I asked them what happens when only the low register is set, but haven't heard back. +break; +case (RA + 1): macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); Guest may invalid the mac address by clearing the AV bit through writing to high dword. So this may notify a wrong mac address. In this case, testing for the AV bit would solve this issue. Generally, we could teset the AV bit before notification, and try to do the this on both high and low dword. This obeys specs and receive_filter() above. This will not really help since the AV bit would already be set from the prior mac address. So, if a stupid driver writes just the low word, the AV bit would already be set. If we don't want half-written status, driver should clear AV bit before each writing of new mac address. But looks like linux and freebsd does not do this. But the window is really small and harmless. We can emulate this. Thanks for the idea. -vlad +break; } }
[Qemu-devel] [RFC PATCH 2/2] rtl8139: update HMP only when the address is fully written
rtl8139 hardware requires 9346 config register to be set into write mode before mac address can be changed even though it is not documented. Every driver inspected so far appears to do this along with comments that this is an undocumented requirement. We can use this to help us identify when the mac address has been completely written. Simple set a flag whenever mac has changed and at the next transition of 9346 register from Write to Normal mode, we update the HMP. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/i386/pc_piix.c| 4 hw/i386/pc_q35.c | 4 hw/net/rtl8139.c | 50 +- include/hw/i386/pc.h | 8 4 files changed, 65 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 2daa111..731ae3b 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -372,6 +372,10 @@ static QEMUMachine pc_i440fx_machine_v1_7 = { PC_I440FX_1_7_MACHINE_OPTIONS, .name = pc-i440fx-1.7, .init = pc_init_pci_1_7, +.compat_props = (GlobalProperty[]) { +PC_COMPAT_1_7, +{ /* end of list */ } +}, }; #define PC_I440FX_1_6_MACHINE_OPTIONS PC_I440FX_1_7_MACHINE_OPTIONS diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index e2b8907..7b6aedf 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -292,6 +292,10 @@ static QEMUMachine pc_q35_machine_v1_7 = { PC_Q35_1_7_MACHINE_OPTIONS, .name = pc-q35-1.7, .init = pc_q35_init_1_7, +.compat_props = (GlobalProperty[]) { +PC_COMPAT_1_7, +{ /* end of list */ } +}, }; #define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 7f2b4db..5c4caec 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -476,6 +476,7 @@ typedef struct RTL8139State { uint16_t CpCmd; uint8_t TxThresh; +bool mac_changed; NICState *nic; NICConf conf; @@ -515,6 +516,9 @@ typedef struct RTL8139State { /* Support migration to/from old versions */ int rtl8139_mmio_io_addr_dummy; +#define RTL8139_FLAG_MAC_BIT 0 +#define RTL8139_FLAG_MAC_COMPLETE (1 RTL8139_FLAG_MAC_BIT) +uint32_tcompat_flags; } RTL8139State; /* Writes tally counters to memory via DMA */ @@ -1215,6 +1219,7 @@ static void rtl8139_reset(DeviceState *d) /* restore MAC address */ memcpy(s-phys, s-conf.macaddr.a, 6); qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = false; /* reset interrupt mask */ s-IntrStatus = 0; @@ -1563,6 +1568,14 @@ static void rtl8139_Cfg9346_write(RTL8139State *s, uint32_t val) /* Reset. */ val = 0; rtl8139_reset(d); +} else if (opmode == Cfg9346_Normal s-mac_changed) { +/* Even though it is not documented, it is required to set + * opmode to Cfg9346_ConfigWrite when changing the mac address + * of the card and to set to Cfg9346_Normal when done. We + * can use this as an idication to kick off the notification event. + */ +qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = false; } s-Cfg9346 = val; @@ -2743,7 +2756,12 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) { case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; -qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +if (s-compat_flags RTL8139_FLAG_MAC_COMPLETE) { +s-mac_changed = true; +} else if (addr == MAC0+5) { +/* Emulate old style updates on the last write */ +qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +} break; case MAC0+6 ... MAC0+7: /* reserved */ @@ -3256,6 +3274,13 @@ static int rtl8139_post_load(void *opaque, int version_id) * to link status bit in BasicModeStatus */ qemu_get_queue(s-nic)-link_down = (s-BasicModeStatus 0x04) == 0; +/* Emulate old behavior if we don't support mac change completion + * tracking + */ +if (!(s-compat_flags RTL8139_FLAG_MAC_COMPLETE)) { +s-mac_changed = false; +} + return 0; } @@ -3286,6 +3311,24 @@ static void rtl8139_pre_save(void *opaque) s-rtl8139_mmio_io_addr_dummy = 0; } +static bool rtl8139_mac_state_needed(void *opaque) +{ +RTL8139State *s = opaque; + +return (s-compat_flags RTL8139_FLAG_MAC_COMPLETE) s-mac_changed; +} + +static const VMStateDescription vmstate_rtl8139_mac_state ={ +.name = rtl8139/mac_state, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_BOOL(mac_changed, RTL8139State), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_rtl8139 = { .name = rtl8139, .version_id = 4, @@ -3371,6 +3414,9 @@ static const
[Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch
e1000 provides a E1000_RAH_AV bit on every complete write to the Receive Address Register. We can use this bit 2 ways: 1) To trigger HMP notifications. When the bit is set the mac address is fully set and we can update the HMP. 2) We can turn off he bit on the write to low order bits of the Receive Address Register, so that we would not try to match received traffic to this address when it is not completely set. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ae63591..82978ea 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA || index == RA + 1) { +switch (index) { +case RA: +/* Mask off AV bit on the write of the low dword. The write of + * the high dword will set the bit. This way a half-written + * mac address will not be used to filter on rx. + */ +s-mac_reg[RA+1] = ~E1000_RAH_AV; +break; +case (RA + 1): macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); +break; } } -- 1.8.4.2
[Qemu-devel] [PATCH] net: Update netdev peer on link change
When a link change occurs on a backend (like tap), we currently do not propage such change to the nic. As a result, when someone turns off a link on a tap device, for instance, then a guest doesn't see that change and continues to try to send traffic or run DHCP even though the lower-layer is disconnected. This is OK when the network is set up as a HUB since the the guest may be connected to other HUB ports too, but when it's set up as a netdev, it makes thinkgs worse. The patch addresses this by setting the peers link down only when the peer is not a HUBPORT device. With this patch, in the following config -netdev tap,id=net0 -device e1000,mac=X,netdev=net0 when net0 link is turned off, the guest e1000 shows lower-layer link down. This allows guests to boot much faster in such configurations. With windows guest, it also allows the network to recover properly since windows will not configure the link-local IPv4 address, and when the link is turned on, the proper address address is configured. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- net/net.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/net.c b/net/net.c index 0a88e68..8a084bf 100644 --- a/net/net.c +++ b/net/net.c @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error **errp) nc-info-link_status_changed(nc); } -/* Notify peer. Don't update peer link status: this makes it possible to - * disconnect from host network without notifying the guest. - * FIXME: is disconnected link status change operation useful? - * - * Current behaviour is compatible with qemu vlans where there could be - * multiple clients that can still communicate with each other in - * disconnected mode. For now maintain this compatibility. */ -if (nc-peer nc-peer-info-link_status_changed) { -nc-peer-info-link_status_changed(nc-peer); +if (nc-peer) { +/* Change peer link only if the peer is NIC and then notify peer. + * If the peer is a HUBPORT or a backend, we do not change the + * link status. + * + * This behavior is compatible with qemu vlans where there could be + * multiple clients that can still communicate with each other in + * disconnected mode. For now maintain this compatibility. + */ +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) { +for (i = 0; i queues; i++) { +ncs[i]-peer-link_down = !up; +} +} +if (nc-peer-info-link_status_changed) { +nc-peer-info-link_status_changed(nc-peer); +} } } -- 1.8.4.2
[Qemu-devel] [RFC PATCH 0/2] Update HMP only upon mac change completion.
Recent threads regarding e1000/rtl8139 and mac address change notifications prompted some research into the respecitive hw data sheets as well as available drivers. What I found is that each hw has a mechanism that can be used by our emulation layer to determine when the mac address change has completed (when the OS finished writing the mac address), and we can use these mechanisms to trigger HMP notifications. This is only an RFC series. It's been tested and works well. I've split e1000 and rtl8139 changes as they are sufficiently different. e1000 make this very clean and easy, but rtl8139 isn't as nice. Please take a look and I'd like to hear your comments. Thanks -vlad Vlad Yasevich (2): e1000: Use Address_Available bit as HW latch rtl8139: update HMP only when the address is fully written hw/i386/pc_piix.c| 4 hw/i386/pc_q35.c | 4 hw/net/e1000.c | 11 ++- hw/net/rtl8139.c | 50 +- include/hw/i386/pc.h | 8 5 files changed, 75 insertions(+), 2 deletions(-) -- 1.8.4.2
Re: [Qemu-devel] [PATCH for-1.7] qdev-properties-system.c: Allow vlan or netdev for -device, not both
On 11/11/2013 02:50 AM, Paolo Bonzini wrote: Il 11/11/2013 06:18, Jason Wang ha scritto: On 11/08/2013 10:13 AM, Vlad Yasevich wrote: It is currently possible to specify things like: -device e1000,netdev=foo,vlan=1 With this usage, whichever argument was specified last (vlan or netdev) overwrites what was previousely set and results in a non-working configuration. Even worse, when used with multiqueue devices, it causes a segmentation fault on exit in qemu_free_net_client. That patch treates the above command line options as invalid and generates an error at start-up. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/core/qdev-properties-system.c | 9 + 1 file changed, 9 insertions(+) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index 0eada32..729efa8 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char *str, void **ptr) goto err; } +if (ncs[i]) { +ret = -EINVAL; +goto err; +} + ncs[i] = peers[i]; ncs[i]-queue_index = i; } @@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque, *ptr = NULL; return; } +if (*ptr) { +error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name); +return; +} hubport = net_hub_port_find(id); if (!hubport) { Acked-by: Jason Wang jasow...@redhat.com This patch is good for 1.7. Paolo Hi All Just wondering what's to become of this patch? It has an ACK, but has been abandoned. It does a fix a crash in qemu. Thanks -vlad
Re: [Qemu-devel] [PATCH] virtio-net: fix the memory leak in rxfilter_notify()
On 11/18/2013 08:47 AM, Amos Kong wrote: object_get_canonical_path() returns a gchar*, it should be freeed by the caller. Signed-off-by: Amos Kong ak...@redhat.com Reviewed-by: Vlad Yasevich vyase...@redhat.com -vlad --- hw/net/virtio-net.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 613f144..2b2fb57 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -198,15 +198,14 @@ static void rxfilter_notify(NetClientState *nc) { QObject *event_data; VirtIONet *n = qemu_get_nic_opaque(nc); +gchar *path = object_get_canonical_path(OBJECT(n-qdev)); if (nc-rxfilter_notify_enabled) { if (n-netclient_name) { event_data = qobject_from_jsonf({ 'name': %s, 'path': %s }, -n-netclient_name, - object_get_canonical_path(OBJECT(n-qdev))); +n-netclient_name, path); } else { -event_data = qobject_from_jsonf({ 'path': %s }, - object_get_canonical_path(OBJECT(n-qdev))); +event_data = qobject_from_jsonf({ 'path': %s }, path); } monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data); qobject_decref(event_data); @@ -214,6 +213,7 @@ static void rxfilter_notify(NetClientState *nc) /* disable event notification to avoid events flooding */ nc-rxfilter_notify_enabled = 0; } +g_free(path); } static char *mac_strdup_printf(const uint8_t *mac)
Re: [Qemu-devel] [PATCH v2] virtio-net: fix the memory leak in rxfilter_notify()
On 11/18/2013 10:32 AM, Amos Kong wrote: object_get_canonical_path() returns a gchar*, it should be freeed by the caller. Signed-off-by: Amos Kong ak...@redhat.com Reviewed-by: Vlad Yasevich vyase...@redhat.com -vlad --- v2: put gchar *path inside rxfilter_notify_enabled block --- hw/net/virtio-net.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 613f144..b75c753 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -200,16 +200,16 @@ static void rxfilter_notify(NetClientState *nc) VirtIONet *n = qemu_get_nic_opaque(nc); if (nc-rxfilter_notify_enabled) { +gchar *path = object_get_canonical_path(OBJECT(n-qdev)); if (n-netclient_name) { event_data = qobject_from_jsonf({ 'name': %s, 'path': %s }, -n-netclient_name, - object_get_canonical_path(OBJECT(n-qdev))); +n-netclient_name, path); } else { -event_data = qobject_from_jsonf({ 'path': %s }, - object_get_canonical_path(OBJECT(n-qdev))); +event_data = qobject_from_jsonf({ 'path': %s }, path); } monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data); qobject_decref(event_data); +g_free(path); /* disable event notification to avoid events flooding */ nc-rxfilter_notify_enabled = 0;
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 10:02 AM, Michael S. Tsirkin wrote: On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote: We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit is changed. It will be same as virtio-net. Signed-off-by: Amos Kong ak...@redhat.com Vlad here told me he did some research and this does not actually match hardware behaviour for either e1000 or rtl8139. Vlad, would you like to elaborate on-list? Sure. I decided to dig into the hardware data-sheets and the OS drivers that use the HW to see what's really going on and how the hw expects this data to be programmed. Here is what I've found so far: E1000: e1000 stores each mac address in 2 registers: RAL - receive register low RAH - receive register high The highest order bit of RAH (bit 31) is called the address available bit. When this bit is set the HW will attempt to use the address for packet matching. So, when the mac address is initially programmed into HW, that AV bit is not set until RAH is written. Thus drivers really should do writes in RAL+RAH order, otherwise AV bit would be set on a partially set address. There is a slight issue when the receive address registers already have a value. Since the address is written in 2 separate writes, there is a very small window of time when the RAL is set to the new value and RAH is set to the old value with AV still set. I am talking to Intel guys about this now. So from the POV of notifying libvirt/management that address is changed, it should be done when RAH is set. RTL8139: Realtek devices have a 9346CR Command Register that gates write access to certain configuration regions on the HW. It is placed into Configuration register write enabled mode before driver can write to one of a set of configuration spaces. Even though the data sheet doesn't mention this, it appears that this must also must be used to guard write access to receive address register of the card. All variants of BSD and linux drivers that I've found use this along with comment that say that this is an undocumented requirement. I am not sure what the HW does to incoming frames when the command register is to this mode. I see 2 things that we might be able to do here: 1) A low-impact change might be to only notify the management when we've detected an address change and currently exiting config write mode. 2) A more invasive change might be to disable rx_handling while in config wirte mode. This would prevent attempting to match packets to a partially written mac address. I have a patch that does (1) above. Thoughts? -vlad I think we should revert this for 1.8 and look at emulating actual hardware behaviour. --- hw/net/e1000.c | 2 +- hw/net/rtl8139.c | 5 + 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ec8ecd7..2d60639 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +if (index == RA || index == RA + 1) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 5329f44..7f2b4db 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+4: -s-phys[addr - MAC0] = val; -break; -case MAC0+5: +case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); break; -- 1.8.3.1
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 02:42 PM, Michael S. Tsirkin wrote: On Mon, Nov 18, 2013 at 12:33:20PM -0500, Vlad Yasevich wrote: On 11/18/2013 10:02 AM, Michael S. Tsirkin wrote: On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote: We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit is changed. It will be same as virtio-net. Signed-off-by: Amos Kong ak...@redhat.com Vlad here told me he did some research and this does not actually match hardware behaviour for either e1000 or rtl8139. Vlad, would you like to elaborate on-list? Sure. I decided to dig into the hardware data-sheets and the OS drivers that use the HW to see what's really going on and how the hw expects this data to be programmed. Here is what I've found so far: E1000: e1000 stores each mac address in 2 registers: RAL - receive register low RAH - receive register high The highest order bit of RAH (bit 31) is called the address available bit. When this bit is set the HW will attempt to use the address for packet matching. So, when the mac address is initially programmed into HW, that AV bit is not set until RAH is written. Thus drivers really should do writes in RAL+RAH order, otherwise AV bit would be set on a partially set address. There is a slight issue when the receive address registers already have a value. Since the address is written in 2 separate writes, there is a very small window of time when the RAL is set to the new value and RAH is set to the old value with AV still set. I am talking to Intel guys about this now. So from the POV of notifying libvirt/management that address is changed, it should be done when RAH is set. RTL8139: Realtek devices have a 9346CR Command Register that gates write access to certain configuration regions on the HW. It is placed into Configuration register write enabled mode before driver can write to one of a set of configuration spaces. Even though the data sheet doesn't mention this, it appears that this must also must be used to guard write access to receive address register of the card. All variants of BSD and linux drivers that I've found use this along with comment that say that this is an undocumented requirement. What does a windows driver do BTW? I'll boot windows and find out. -vlad I am not sure what the HW does to incoming frames when the command register is to this mode. I see 2 things that we might be able to do here: 1) A low-impact change might be to only notify the management when we've detected an address change and currently exiting config write mode. 2) A more invasive change might be to disable rx_handling while in config wirte mode. This would prevent attempting to match packets to a partially written mac address. I have a patch that does (1) above. Thoughts? -vlad Let's start by reverting cd5be5829c1ce87aa6b3a7806524fac07ac9a757. I think we should revert this for 1.8 and look at emulating actual hardware behaviour. --- hw/net/e1000.c | 2 +- hw/net/rtl8139.c | 5 + 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ec8ecd7..2d60639 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +if (index == RA || index == RA + 1) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 5329f44..7f2b4db 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+4: -s-phys[addr - MAC0] = val; -break; -case MAC0+5: +case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); break; -- 1.8.3.1
Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 02:47 PM, Michael S. Tsirkin wrote: This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757. Digging into hardware specs shows this does not actually make QEMU behave more like hardware. Let's stick to the tried heuristic for 1.7 and possibly revisit for 1.8. Reported-by: Vlad Yasevich vyase...@redhat.com Cc: Amos Kong ak...@redhat.com Cc: Alex Williamson alex.william...@redhat.com Reviewed-by: Vlad Yasevich vyase...@redhat.com. -vlad --- hw/net/e1000.c | 2 +- hw/net/rtl8139.c | 5 - 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ae63591..8387443 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA || index == RA + 1) { +if (index == RA + 1) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 7f2b4db..5329f44 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+5: +case MAC0 ... MAC0+4: +s-phys[addr - MAC0] = val; +break; +case MAC0+5: s-phys[addr - MAC0] = val; qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); break;
Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 02:58 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote: This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757. Digging into hardware specs shows this does not actually make QEMU behave more like hardware. Let's stick to the tried heuristic for 1.7 and possibly revisit for 1.8. If this is broken, then so are these: 23c37c37f0280761072c23bf67d3a4f3c0ff25aa 7c36507c2b8776266f50c5e2739bd18279953b93 These aren't really broken. They just assume that the high order writes will happen after the low order writes. In the case of e1000, this is a little more then an assumption (particularly in the case of nic initilization). In the case of RTL nic, it is just an assumption, but it hasn't been shown faulty yet. We do plan to address this a bit more thoroughly. The patch that was applied was controversial and more then 1 person expressed reservations. -vlad None of these change the behavior of hardware, they only change when the monitor gets told about mac address changes. I'd suggest either add the emulation described in each spec or revert all of them. A partial revert is just noise. Thanks, Alex Reported-by: Vlad Yasevich vyase...@redhat.com Cc: Amos Kong ak...@redhat.com Cc: Alex Williamson alex.william...@redhat.com --- hw/net/e1000.c | 2 +- hw/net/rtl8139.c | 5 - 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ae63591..8387443 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA || index == RA + 1) { +if (index == RA + 1) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 7f2b4db..5329f44 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+5: +case MAC0 ... MAC0+4: +s-phys[addr - MAC0] = val; +break; +case MAC0+5: s-phys[addr - MAC0] = val; qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); break;
Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 03:33 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote: On 11/18/2013 02:58 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote: This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757. Digging into hardware specs shows this does not actually make QEMU behave more like hardware. Let's stick to the tried heuristic for 1.7 and possibly revisit for 1.8. If this is broken, then so are these: 23c37c37f0280761072c23bf67d3a4f3c0ff25aa 7c36507c2b8776266f50c5e2739bd18279953b93 These aren't really broken. They just assume that the high order writes will happen after the low order writes. In the case of e1000, this is a little more then an assumption (particularly in the case of nic initilization). But AIUI there's also a valid bit in that high order byte on e1000, so reverting cd5be582 means we stuff a new mac into qemu less often, but it's still only accurate some of the time. Yes, there is a slight issue with validity of mac at the time of processing packets. I have an outstanding question on the Intel list about this behavior with real HW. But, with e1000, the validity bit provides a much higher guarantee that a guest that will be setting the mac address will write the high register second to guarantee that when the valid bit is written, the mac is fully valid. As a result we don't really need the e1000 part of the cd5be5829. In the case of RTL nic, it is just an assumption, but it hasn't been shown faulty yet. We do plan to address this a bit more thoroughly. So how is RTL less broken without cd5be582? AIUI the valid bit is off in a separate register on RTL, so we have no guarantee about order of updating the mac. Without cd5be582 the info in the monitor may be permanently broken if the guest uses a write order other than what we assume. This one is actually not as bad either. RTL spec requires that receive register writes happen as 32 bit word writes. This is what linux and bsd drivers do, so from driver perspective, the issue is the same. What our emulation layer does is turn these 32 bit writes into 4 8-bit writes. This is likely due to some very broken and very old drivers, but I am not sure. So, the information in the monitor will be broken if the guest does: write_hi(); write_lo(); A part of me would really like to see a guest that does this :) The current code isn't perfect either. It still has a potential to show the wrong mac address in the monitor. I doesn't make a lot of sense to me to replace one sub-optimal solution with another sub-optimal solution, especially since no-one complained. -vlad The patch that was applied was controversial and more then 1 person expressed reservations. Understood, but it also raised and addressed a shortcoming in the previous patches. If cd5be582 was controversial because the monitor was getting updated with incorrect mac addresses then this simple revert doesn't solve that problem. Thanks, Alex None of these change the behavior of hardware, they only change when the monitor gets told about mac address changes. I'd suggest either add the emulation described in each spec or revert all of them. A partial revert is just noise. Thanks, Alex Reported-by: Vlad Yasevich vyase...@redhat.com Cc: Amos Kong ak...@redhat.com Cc: Alex Williamson alex.william...@redhat.com --- hw/net/e1000.c | 2 +- hw/net/rtl8139.c | 5 - 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ae63591..8387443 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA || index == RA + 1) { +if (index == RA + 1) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 7f2b4db..5329f44 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+5: +case MAC0 ... MAC0+4: +s-phys[addr - MAC0] = val; +break; +case MAC0+5: s-phys[addr - MAC0] = val; qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); break;
Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 04:33 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote: On 11/18/2013 03:33 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote: On 11/18/2013 02:58 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote: This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757. Digging into hardware specs shows this does not actually make QEMU behave more like hardware. Let's stick to the tried heuristic for 1.7 and possibly revisit for 1.8. If this is broken, then so are these: 23c37c37f0280761072c23bf67d3a4f3c0ff25aa 7c36507c2b8776266f50c5e2739bd18279953b93 These aren't really broken. They just assume that the high order writes will happen after the low order writes. In the case of e1000, this is a little more then an assumption (particularly in the case of nic initilization). But AIUI there's also a valid bit in that high order byte on e1000, so reverting cd5be582 means we stuff a new mac into qemu less often, but it's still only accurate some of the time. Yes, there is a slight issue with validity of mac at the time of processing packets. I have an outstanding question on the Intel list about this behavior with real HW. But, with e1000, the validity bit provides a much higher guarantee that a guest that will be setting the mac address will write the high register second to guarantee that when the valid bit is written, the mac is fully valid. As a result we don't really need the e1000 part of the cd5be5829. But doesn't that valid bit mean that a mac update will start and end with a write to the high order register? So we're assuming: a) write RA + 1 (invalidate) b) write RA (write low) c) write RA + 1 (write high + valid) No. On update, only steps b and c typically happen. Thus my question to the on the intel list. Without cd5be5829 the only change is that we don't store a new mac into the monitor at b). The mac stored in the monitor is still wrong from a) until c). So it's ever so slightly less broken without cd5be5829. Since there is really no a), the mac in the monitor is only different after step b). since it's is incomplete and we expect step c), there is really no point in updating it. In the case of RTL nic, it is just an assumption, but it hasn't been shown faulty yet. We do plan to address this a bit more thoroughly. So how is RTL less broken without cd5be582? AIUI the valid bit is off in a separate register on RTL, so we have no guarantee about order of updating the mac. Without cd5be582 the info in the monitor may be permanently broken if the guest uses a write order other than what we assume. This one is actually not as bad either. RTL spec requires that receive register writes happen as 32 bit word writes. This is what linux and bsd drivers do, so from driver perspective, the issue is the same. What our emulation layer does is turn these 32 bit writes into 4 8-bit writes. This is likely due to some very broken and very old drivers, but I am not sure. So, the information in the monitor will be broken if the guest does: write_hi(); write_lo(); A part of me would really like to see a guest that does this :) So the argument for reverting here is that it seems unlikely that the dwords get written in the hi-lo order and we'd rather the monitor get stuck with the wrong mac forever For how many/which guests? I know it's not linux or BSD. I need to boot windows to see what it does, but I think it does the right thing. than it show the wrong mac address for a tiny window for everybody else? Yes, this would happen for everybody. If you are querying the output, you might see this and it will show up as 2 changes. We are talking about 2 tiny amounts here: On the one hand, we __might__ have guests that write mac and reverse order thus showing wrong address. On the other hand we have all the guests who will show the wrong address for a _short_ time. I have a hard time deciding, but have a slight preference for a small uncertainty (the # of backward writers is very small), rather then a small certainty (everyone will be effected for a small period of time). I think you say something about sub-optimal here... The current code isn't perfect either. It still has a potential to show the wrong mac address in the monitor. I doesn't make a lot of sense to me to replace one sub-optimal solution with another sub-optimal solution, especially since no-one complained. Exactly, the code isn't perfect either way and this revert is just replacing one sub-optimal solution for another. So why do it? Another part of this, for me, is that a change was made that we had a disagreement on and a different approach was being discussed. The revert is to bring us back to before this change. -vlad Thanks, Alex The patch that was applied was controversial and more then 1 person expressed reservations
Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written
On 11/18/2013 05:40 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 17:07 -0500, Vlad Yasevich wrote: On 11/18/2013 04:33 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote: On 11/18/2013 03:33 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote: On 11/18/2013 02:58 PM, Alex Williamson wrote: On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote: This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757. Digging into hardware specs shows this does not actually make QEMU behave more like hardware. Let's stick to the tried heuristic for 1.7 and possibly revisit for 1.8. If this is broken, then so are these: 23c37c37f0280761072c23bf67d3a4f3c0ff25aa 7c36507c2b8776266f50c5e2739bd18279953b93 These aren't really broken. They just assume that the high order writes will happen after the low order writes. In the case of e1000, this is a little more then an assumption (particularly in the case of nic initilization). But AIUI there's also a valid bit in that high order byte on e1000, so reverting cd5be582 means we stuff a new mac into qemu less often, but it's still only accurate some of the time. Yes, there is a slight issue with validity of mac at the time of processing packets. I have an outstanding question on the Intel list about this behavior with real HW. But, with e1000, the validity bit provides a much higher guarantee that a guest that will be setting the mac address will write the high register second to guarantee that when the valid bit is written, the mac is fully valid. As a result we don't really need the e1000 part of the cd5be5829. But doesn't that valid bit mean that a mac update will start and end with a write to the high order register? So we're assuming: a) write RA + 1 (invalidate) b) write RA (write low) c) write RA + 1 (write high + valid) No. On update, only steps b and c typically happen. Thus my question to the on the intel list. So perhaps the bit is some kind of data latch bit and the mac address fields within those registers are effectively scratch until that bit is written? Without cd5be5829 the only change is that we don't store a new mac into the monitor at b). The mac stored in the monitor is still wrong from a) until c). So it's ever so slightly less broken without cd5be5829. Since there is really no a), the mac in the monitor is only different after step b). since it's is incomplete and we expect step c), there is really no point in updating it. Great, so I have no argument against reverting, or just fixing, that chunk of cd5be5829. Let's implement the latch bit too. In the case of RTL nic, it is just an assumption, but it hasn't been shown faulty yet. We do plan to address this a bit more thoroughly. So how is RTL less broken without cd5be582? AIUI the valid bit is off in a separate register on RTL, so we have no guarantee about order of updating the mac. Without cd5be582 the info in the monitor may be permanently broken if the guest uses a write order other than what we assume. This one is actually not as bad either. RTL spec requires that receive register writes happen as 32 bit word writes. This is what linux and bsd drivers do, so from driver perspective, the issue is the same. What our emulation layer does is turn these 32 bit writes into 4 8-bit writes. This is likely due to some very broken and very old drivers, but I am not sure. So, the information in the monitor will be broken if the guest does: write_hi(); write_lo(); A part of me would really like to see a guest that does this :) So the argument for reverting here is that it seems unlikely that the dwords get written in the hi-lo order and we'd rather the monitor get stuck with the wrong mac forever For how many/which guests? I know it's not linux or BSD. I need to boot windows to see what it does, but I think it does the right thing. How many guests do you plan to test? I think the proposal was to see if anyone reports an issue :) than it show the wrong mac address for a tiny window for everybody else? Yes, this would happen for everybody. If you are querying the output, you might see this and it will show up as 2 changes. We are talking about 2 tiny amounts here: On the one hand, we __might__ have guests that write mac and reverse order thus showing wrong address. On the other hand we have all the guests who will show the wrong address for a _short_ time. I have a hard time deciding, but have a slight preference for a small uncertainty (the # of backward writers is very small), rather then a small certainty (everyone will be effected for a small period of time). Why compromise? Why not implement the register that handles whether the mac is valid on RTL? I am working on this but it's too late for 1.7. For 1.7, we have a question: impact a very small number (# - 0) of guests for a long time, or impact
Re: [Qemu-devel] [PATCH v2] pc: add 1.8 machine type
On 11/14/2013 05:37 AM, Michael S. Tsirkin wrote: -#define PC_Q35_1_7_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS +#define PC_Q35_1_8_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS + +static QEMUMachine pc_q35_machine_v1_8 = { +PC_Q35_1_8_MACHINE_OPTIONS, +.name = pc-q35-1.8, +.alias = q35, +.init = pc_q35_init, +}; + +#define PC_Q35_1_7_MACHINE_OPTIONS PC_Q35_1_8_MACHINE_OPTIONS static QEMUMachine pc_q35_machine_v1_7 = { PC_Q35_1_7_MACHINE_OPTIONS, .name = pc-q35-1.7, .alias = q35, -.init = pc_q35_init, +.init = pc_q35_init_1_7, }; Hi Michael Shouldn't the '.alias' be removed from the 1.7 machine? Thanks -vlad -#define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS +#define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS static QEMUMachine pc_q35_machine_v1_6 = { PC_Q35_1_6_MACHINE_OPTIONS, @@ -313,6 +333,7 @@ static QEMUMachine pc_q35_machine_v1_4 = { static void pc_q35_machine_init(void) { +qemu_register_machine(pc_q35_machine_v1_8); qemu_register_machine(pc_q35_machine_v1_7); qemu_register_machine(pc_q35_machine_v1_6); qemu_register_machine(pc_q35_machine_v1_5);
Re: [Qemu-devel] [PATCH] net: move rxfilter_notify() to net.c
On 11/15/2013 10:27 AM, Stefan Hajnoczi wrote: On Tue, Nov 05, 2013 at 07:00:48PM +0800, Amos Kong wrote: @@ -545,7 +523,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_ERR; } -rxfilter_notify(nc); +rxfilter_notify(nc, object_get_canonical_path(OBJECT(n-qdev))); return VIRTIO_NET_OK; } [...] diff --git a/net/net.c b/net/net.c index c330c9a..f41a457 100644 --- a/net/net.c +++ b/net/net.c @@ -40,6 +40,7 @@ #include qapi-visit.h #include qapi/opts-visitor.h #include qapi/dealloc-visitor.h +#include qapi/qmp/qjson.h /* Net bridge is currently not supported for W32. */ #if !defined(_WIN32) @@ -962,6 +963,25 @@ void print_net_client(Monitor *mon, NetClientState *nc) nc-info_str); } +void rxfilter_notify(NetClientState *nc, const char *path) +{ +QObject *event_data; + +if (nc-rxfilter_notify_enabled) { +if (nc-name) { +event_data = qobject_from_jsonf({ 'name': %s, 'path': %s }, + nc-name, path); +} else { +event_data = qobject_from_jsonf({ 'path': %s }, path); +} +monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data); +qobject_decref(event_data); + +/* disable event notification to avoid events flooding */ +nc-rxfilter_notify_enabled = 0; +} +} Please fix the memory leak: object_get_canonical_path() returns a gchar* that the caller must free. Wow, this memory leak is all over the place and not just in this patch. -vlad
[Qemu-devel] [PATCH] qom: Fix memory leak in object_property_set_link()
Save the result of the call to object_get_cannonical_path() so we can free it. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- qom/object.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/qom/object.c b/qom/object.c index b617f26..21b6f87 100644 --- a/qom/object.c +++ b/qom/object.c @@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name, void object_property_set_link(Object *obj, Object *value, const char *name, Error **errp) { -object_property_set_str(obj, object_get_canonical_path(value), -name, errp); +gchar *path = object_get_cannonical_path(value); +object_property_set_str(obj, path, name, errp); +gfree(path); } Object *object_property_get_link(Object *obj, const char *name, -- 1.8.4.2
Re: [Qemu-devel] [PATCH] qom: Fix memory leak in object_property_set_link()
On 11/15/2013 12:06 PM, Vlad Yasevich wrote: Save the result of the call to object_get_cannonical_path() so we can free it. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- qom/object.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/qom/object.c b/qom/object.c index b617f26..21b6f87 100644 --- a/qom/object.c +++ b/qom/object.c @@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name, void object_property_set_link(Object *obj, Object *value, const char *name, Error **errp) { -object_property_set_str(obj, object_get_canonical_path(value), -name, errp); +gchar *path = object_get_cannonical_path(value); +object_property_set_str(obj, path, name, errp); +gfree(path); } Object *object_property_get_link(Object *obj, const char *name, Self-nack. Forgot to commit the changes that actually make it build and work :/ -vlad
[Qemu-devel] [PATCH v2] qom: Fix memory leak in object_property_set_link()
Save the result of the call to object_get_cannonical_path() so we can free it. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- v1-v2: Builds and works :) qom/object.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/qom/object.c b/qom/object.c index b617f26..fc19cf6 100644 --- a/qom/object.c +++ b/qom/object.c @@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name, void object_property_set_link(Object *obj, Object *value, const char *name, Error **errp) { -object_property_set_str(obj, object_get_canonical_path(value), -name, errp); +gchar *path = object_get_canonical_path(value); +object_property_set_str(obj, path, name, errp); +g_free(path); } Object *object_property_get_link(Object *obj, const char *name, -- 1.8.4.2
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote: On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote: What about this approach? This only updates the monitory when all the bits have been written to. -vlad Thanks! Some comments below. -- 8 -- Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. I would rather say it seems better not to make this assumption. This does look somewhat safer than what Amos proposed. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit has been changed. It will be same as virtio-net. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 17 - hw/net/rtl8139.c | 14 +- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 8387443..a5967ed 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -149,6 +149,10 @@ typedef struct E1000State_st { #define E1000_FLAG_AUTONEG (1 E1000_FLAG_AUTONEG_BIT) #define E1000_FLAG_MIT (1 E1000_FLAG_MIT_BIT) uint32_t compat_flags; +uint32_t mac_changed; Hmm why uint32_t? uint8_t is enough here isn't? This new state has to be migrated then, and we need to fallback to old behaviour if migrating to/from an old version (see compat_flags for one way to detect this compatibility mode). Hi Michael I started looking at migrating this thing and I now starting to question the whole approach. The only reason to migrate this is if we can migrate between writes to the mac address registers. We can fairly easily solve the issue of migrating from net to old versions. The more interesting question is migrating from old to new versions. If someone is migrating from an older version (without this feature) to a newer version and doing so between writes, the bitmap state will have no idea that a partial write has already happened. The completing write will just set one of the bits and notifications that we are looking for do not happen. -vlad
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/13/2013 03:00 PM, Michael S. Tsirkin wrote: On Wed, Nov 13, 2013 at 11:21:18AM -0500, Vlad Yasevich wrote: On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote: On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote: What about this approach? This only updates the monitory when all the bits have been written to. -vlad Thanks! Some comments below. -- 8 -- Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. I would rather say it seems better not to make this assumption. This does look somewhat safer than what Amos proposed. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit has been changed. It will be same as virtio-net. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 17 - hw/net/rtl8139.c | 14 +- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 8387443..a5967ed 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -149,6 +149,10 @@ typedef struct E1000State_st { #define E1000_FLAG_AUTONEG (1 E1000_FLAG_AUTONEG_BIT) #define E1000_FLAG_MIT (1 E1000_FLAG_MIT_BIT) uint32_t compat_flags; +uint32_t mac_changed; Hmm why uint32_t? uint8_t is enough here isn't? This new state has to be migrated then, and we need to fallback to old behaviour if migrating to/from an old version (see compat_flags for one way to detect this compatibility mode). Hi Michael I started looking at migrating this thing and I now starting to question the whole approach. The only reason to migrate this is if we can migrate between writes to the mac address registers. Absolutely. For some reason below you only discuss cross version migration but it needs to be migrated for same version migration too. We can fairly easily solve the issue of migrating from net to old versions. I'm not sure it's easier, but it needs to happen anymore. The more interesting question is migrating from old to new versions. If someone is migrating from an older version (without this feature) to a newer version and doing so between writes, the bitmap state will have no idea that a partial write has already happened. The completing write will just set one of the bits and notifications that we are looking for do not happen. -vlad Yep, that's a problem too. For 1.8 just send the bitmap across. Right. That's simple enough. For cross version migration I would say we should just detect -M 1.7 and older and just emulate old behaviour, disregard the bitmap completely. Don't do special tricks just for migration. The compat flags allow for simple migration to 1.7. However, it's the migration from 1.7 to 1.8 that's the issue. There is a version number on the E1000 state and I am thinking of maybe bumping that so that we can catch older state that doesn't support mac sate change. Thoughts? -vlad
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote: On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote: What about this approach? This only updates the monitory when all the bits have been written to. -vlad Thanks! Some comments below. -- 8 -- Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. I would rather say it seems better not to make this assumption. This does look somewhat safer than what Amos proposed. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit has been changed. It will be same as virtio-net. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 17 - hw/net/rtl8139.c | 14 +- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 8387443..a5967ed 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -149,6 +149,10 @@ typedef struct E1000State_st { #define E1000_FLAG_AUTONEG (1 E1000_FLAG_AUTONEG_BIT) #define E1000_FLAG_MIT (1 E1000_FLAG_MIT_BIT) uint32_t compat_flags; +uint32_t mac_changed; Hmm why uint32_t? uint8_t is enough here isn't? Yes, that should be fine. I'll fix and find a better spot to put it. (may be a hole in the struct). This new state has to be migrated then, and we need to fallback to old behaviour if migrating to/from an old version (see compat_flags for one way to detect this compatibility mode). Good point. Didn't think of that... +#define E1000_RA0_CHANGED 0 +#define E1000_RA1_CHANGED 1 +#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED) I don't get it. So E1000_RA_ALL_CHANGED is 0 | 1 == 1. it looks like the trigger is when E1000_RA1_CHANGED so that's more or less equivalent. Goofed on the bits. That should have been (10) and (11). } E1000State; #define TYPE_E1000 e1000 @@ -402,6 +406,7 @@ static void e1000_reset(void *opaque) d-mac_reg[RA + 1] |= (i 2) ? macaddr[i + 4] (8 * i) : 0; } qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr); +d-mac_changed = 0; } static void @@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +switch (index) { +case RA: +s-mac_changed |= E1000_RA0_CHANGED; +break; +case (RA + 1): +s-mac_changed |= E1000_RA1_CHANGED; +break; +} + +if (s-mac_changed == E1000_RA_ALL_CHANGED) { Some whitespace damage here. macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); + s-mac_changed = 0; Need to use spaces for indent in qemu. } } diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c Best to split out in a separate commit. OK index 5329f44..6dac10c 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -476,6 +476,8 @@ typedef struct RTL8139State { uint16_t CpCmd; uint8_t TxThresh; +uint8_t mac_changed; This new state has to be migrated then, and we need to fallback to old behaviour if migrating to/from an old version. +#define RTL8139_MAC_CHANGED_ALL 0x3F NICState *nic; NICConf conf; @@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d) /* restore MAC address */ memcpy(s-phys, s-conf.macaddr.a, 6); qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = 0; /* reset interrupt mask */ s-IntrStatus = 0; @@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+4: -s-phys[addr - MAC0] = val; -break; -case MAC0+5: +case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; -qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed |= (1 (addr - MAC0)); Better drop the external () here otherwise it starts looking like lisp :) Heh. OK. Thanks -vlad +if (s-mac_changed == RTL8139_MAC_CHANGED_ALL) { +qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = 0; +} break; case MAC0+6 ... MAC0+7: /* reserved */ -- 1.8.4.2
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/08/2013 10:43 PM, Amos Kong wrote: On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote: What about this approach? This only updates the monitory when all the bits have been written to. Hi Vlad, Looks good to me. Using this patch, we don't need to care the writing order. If we add event notify in future, it can reduce the noise event. BTW, can we use a tmp buffer to record the new mac (changing is unfinished), and write the new mac to registers until all bits is written? I've thought about this as well, but I am not sure what it would buy us and what's the best way to do it. At first I thought that we could have a function local static that we could accumulate into. But then Michael mentioned migration and what happens if we migrate in middle of the mac change? So we put into the device, but then it isn't much different then what we have now with the possible exception that the mac change happens slightly more atomically. :) For this, it might make more sense to temporarily stop the link when the mac address change is happening. -vlad Amos -vlad -- 8 -- Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written Subject: [PATCH] e1000/rtl8139: update HMP NIC when all bits are written We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit has been changed. It will be same as virtio-net. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 17 - hw/net/rtl8139.c | 14 +- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 8387443..a5967ed 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -149,6 +149,10 @@ typedef struct E1000State_st { #define E1000_FLAG_AUTONEG (1 E1000_FLAG_AUTONEG_BIT) #define E1000_FLAG_MIT (1 E1000_FLAG_MIT_BIT) uint32_t compat_flags; +uint32_t mac_changed; +#define E1000_RA0_CHANGED 0 +#define E1000_RA1_CHANGED 1 +#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED) } E1000State; #define TYPE_E1000 e1000 @@ -402,6 +406,7 @@ static void e1000_reset(void *opaque) d-mac_reg[RA + 1] |= (i 2) ? macaddr[i + 4] (8 * i) : 0; } qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr); +d-mac_changed = 0; } static void @@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +switch (index) { +case RA: +s-mac_changed |= E1000_RA0_CHANGED; +break; +case (RA + 1): +s-mac_changed |= E1000_RA1_CHANGED; +break; +} + +if (s-mac_changed == E1000_RA_ALL_CHANGED) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); + s-mac_changed = 0; } } diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 5329f44..6dac10c 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -476,6 +476,8 @@ typedef struct RTL8139State { uint16_t CpCmd; uint8_t TxThresh; +uint8_t mac_changed; +#define RTL8139_MAC_CHANGED_ALL 0x3F NICState *nic; NICConf conf; @@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d) /* restore MAC address */ memcpy(s-phys, s-conf.macaddr.a, 6); qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = 0; /* reset interrupt mask */ s-IntrStatus = 0; @@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+4: -s-phys[addr - MAC0] = val; -break; -case MAC0+5: +case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; -qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed |= (1 (addr - MAC0)); +if (s-mac_changed == RTL8139_MAC_CHANGED_ALL) { +qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = 0; +} break; case MAC0+6 ... MAC0+7: /* reserved */ -- 1.8.4.2
Re: [Qemu-devel] [PATCH] virtio-net: don't update mac_table in error state
On 11/10/2013 10:48 PM, Amos Kong wrote: mac_table was always cleaned up first in handling VIRTIO_NET_CTRL_MAC_TABLE_SET command, and we din't recover mac_table content in error state, it's not correct. This patch makes all the changes in temporal variables, only update the real mac_table if everything is ok. We won't change mac_table in error state, so rxfilter notification isn't needed. This patch also fixed same problame in http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg01188.html (not merge) I will send patch for virtio spec to clarifying this change. Signed-off-by: Amos Kong ak...@redhat.com --- hw/net/virtio-net.c | 34 -- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 22dbd05..880fa4a 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -610,11 +610,11 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_ERR; } -n-mac_table.in_use = 0; -n-mac_table.first_multi = 0; -n-mac_table.uni_overflow = 0; -n-mac_table.multi_overflow = 0; -memset(n-mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN); +int in_use = 0; +int first_multi = 0; +uint8_t uni_overflow = 0; +uint8_t multi_overflow = 0; +uint8_t *macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN); I am not sure what the practice is for mixing declarations with code in qemu. Can't find anything in the docs. Otherwise, looks good to me. Reviewed-by: Vlad Yasevich vyase...@redhat.com -vlad s = iov_to_buf(iov, iov_cnt, 0, mac_data.entries, sizeof(mac_data.entries)); @@ -629,19 +629,19 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, } if (mac_data.entries = MAC_TABLE_ENTRIES) { -s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs, +s = iov_to_buf(iov, iov_cnt, 0, macs, mac_data.entries * ETH_ALEN); if (s != mac_data.entries * ETH_ALEN) { goto error; } -n-mac_table.in_use += mac_data.entries; +in_use += mac_data.entries; } else { -n-mac_table.uni_overflow = 1; +uni_overflow = 1; } iov_discard_front(iov, iov_cnt, mac_data.entries * ETH_ALEN); -n-mac_table.first_multi = n-mac_table.in_use; +first_multi = in_use; s = iov_to_buf(iov, iov_cnt, 0, mac_data.entries, sizeof(mac_data.entries)); @@ -656,23 +656,29 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, goto error; } -if (n-mac_table.in_use + mac_data.entries = MAC_TABLE_ENTRIES) { -s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs, +if (in_use + mac_data.entries = MAC_TABLE_ENTRIES) { +s = iov_to_buf(iov, iov_cnt, 0, macs[in_use * ETH_ALEN], mac_data.entries * ETH_ALEN); if (s != mac_data.entries * ETH_ALEN) { goto error; } -n-mac_table.in_use += mac_data.entries; +in_use += mac_data.entries; } else { -n-mac_table.multi_overflow = 1; +multi_overflow = 1; } +n-mac_table.in_use = in_use; +n-mac_table.first_multi = first_multi; +n-mac_table.uni_overflow = uni_overflow; +n-mac_table.multi_overflow = multi_overflow; +memcpy(n-mac_table.macs, macs, MAC_TABLE_ENTRIES * ETH_ALEN); +g_free(macs); rxfilter_notify(nc); return VIRTIO_NET_OK; error: -rxfilter_notify(nc); +g_free(macs); return VIRTIO_NET_ERR; }
[Qemu-devel] [PATCH] virtio-net: Correctly store multicast filter entries
Commit 921ac5d0f3a0df869db5ce4edf752f51d8b1596a virtio-net: remove layout assumptions for ctrl vq introduced a regression where the multicast address filter entries are written to the beginning of the mac table array, thus overwriting any unicast addresses that may have been programmed in the filter. The multicast addresses should be written after all the unicast addresses. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/virtio-net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 22dbd05..94e8b68 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -657,7 +657,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, } if (n-mac_table.in_use + mac_data.entries = MAC_TABLE_ENTRIES) { -s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs, +s = iov_to_buf(iov, iov_cnt, 0, + n-mac_table.macs + (n-mac_table.in_use * ETH_ALEN), mac_data.entries * ETH_ALEN); if (s != mac_data.entries * ETH_ALEN) { goto error; -- 1.8.4.2
Re: [Qemu-devel] [PATCH] virtio-net: Correctly store multicast filter entries
On 11/08/2013 11:07 AM, Vlad Yasevich wrote: Commit 921ac5d0f3a0df869db5ce4edf752f51d8b1596a virtio-net: remove layout assumptions for ctrl vq introduced a regression where the multicast address filter entries are written to the beginning of the mac table array, thus overwriting any unicast addresses that may have been programmed in the filter. The multicast addresses should be written after all the unicast addresses. Signed-off-by: Vlad Yasevich vyase...@redhat.com Please ignore. Just saw pull pull request with similar patch. -vlad --- hw/net/virtio-net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 22dbd05..94e8b68 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -657,7 +657,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, } if (n-mac_table.in_use + mac_data.entries = MAC_TABLE_ENTRIES) { -s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs, +s = iov_to_buf(iov, iov_cnt, 0, + n-mac_table.macs + (n-mac_table.in_use * ETH_ALEN), mac_data.entries * ETH_ALEN); if (s != mac_data.entries * ETH_ALEN) { goto error;
[Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
What about this approach? This only updates the monitory when all the bits have been written to. -vlad -- 8 -- Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit has been changed. It will be same as virtio-net. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/net/e1000.c | 17 - hw/net/rtl8139.c | 14 +- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 8387443..a5967ed 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -149,6 +149,10 @@ typedef struct E1000State_st { #define E1000_FLAG_AUTONEG (1 E1000_FLAG_AUTONEG_BIT) #define E1000_FLAG_MIT (1 E1000_FLAG_MIT_BIT) uint32_t compat_flags; +uint32_t mac_changed; +#define E1000_RA0_CHANGED 0 +#define E1000_RA1_CHANGED 1 +#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED) } E1000State; #define TYPE_E1000 e1000 @@ -402,6 +406,7 @@ static void e1000_reset(void *opaque) d-mac_reg[RA + 1] |= (i 2) ? macaddr[i + 4] (8 * i) : 0; } qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr); +d-mac_changed = 0; } static void @@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +switch (index) { +case RA: +s-mac_changed |= E1000_RA0_CHANGED; +break; +case (RA + 1): +s-mac_changed |= E1000_RA1_CHANGED; +break; +} + +if (s-mac_changed == E1000_RA_ALL_CHANGED) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); + s-mac_changed = 0; } } diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 5329f44..6dac10c 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -476,6 +476,8 @@ typedef struct RTL8139State { uint16_t CpCmd; uint8_t TxThresh; +uint8_t mac_changed; +#define RTL8139_MAC_CHANGED_ALL 0x3F NICState *nic; NICConf conf; @@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d) /* restore MAC address */ memcpy(s-phys, s-conf.macaddr.a, 6); qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = 0; /* reset interrupt mask */ s-IntrStatus = 0; @@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+4: -s-phys[addr - MAC0] = val; -break; -case MAC0+5: +case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; -qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed |= (1 (addr - MAC0)); +if (s-mac_changed == RTL8139_MAC_CHANGED_ALL) { +qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); +s-mac_changed = 0; +} break; case MAC0+6 ... MAC0+7: /* reserved */ -- 1.8.4.2
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/07/2013 10:27 AM, Michael S. Tsirkin wrote: On Thu, Nov 07, 2013 at 07:33:57AM -0700, Alex Williamson wrote: On Thu, 2013-11-07 at 12:26 +0200, Michael S. Tsirkin wrote: On Thu, Nov 07, 2013 at 03:32:29PM +0800, Amos Kong wrote: On Thu, Nov 07, 2013 at 08:59:22AM +0200, Michael S. Tsirkin wrote: On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote: We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit is changed. It will be same as virtio-net. Signed-off-by: Amos Kong ak...@redhat.com I'm not sure I buy this. If we actually implement e.g. mac change notifications, sending them on writes of random bytes will confuse the host. This patch just effects the monitor display of macaddr. During each writing, the macaddr is used for rx-filter is really changed. In the real hardware, it supports to just write part of bits, the rx-filtering is effected by every bit writing. Yes but again, the window can just be too small to matter on real hardware. Our emulation is not perfect, fixing this to be just like real hardware just might expose other bugs we can't fix that easily. If we were to implement mac change notification, then every partial update would send a notify and the host. Is that a problem? It seems no different than if the device had an atomic mac update procedure and the guest admin changed the mac several times. I think modern nics make address updates atomic. Problem is, we are emulating an ancient one, so to make host configuration of a modern one reasonable we need to resort to tricks. The problem with assuming that a given byte is always written last is that unless the hardware spec identifies an order, we're just basing our code on examples where we know what the guest driver does, either by code inspection or access tracing. If there are examples of guests that write the last byte first, then the host will never know the correct mac address. Maybe there are no guests that use the wrong order, but that's a pretty exhaustive search. I agree what we have is a hack. Maybe we need some better hacks. For example, maybe we should update mac at link state change (I think link is always down when mac is updated?). Needs some thought. I thought this too, but checking recent linux kernel, e1000 and rtl8139 seem to allow live mac change so link is up. So here is a stupid, untested patch for e1000 to notify only once: diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 8387443..b99eba4 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -149,6 +149,10 @@ typedef struct E1000State_st { #define E1000_FLAG_AUTONEG (1 E1000_FLAG_AUTONEG_BIT) #define E1000_FLAG_MIT (1 E1000_FLAG_MIT_BIT) uint32_t compat_flags; +uint32_t mac_change_flags; +#define E1000_RA0_CHANGED 0 +#define E1000_RA1_CHANGED 1 +#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED) } E1000State; #define TYPE_E1000 e1000 @@ -402,6 +406,7 @@ static void e1000_reset(void *opaque) d-mac_reg[RA + 1] |= (i 2) ? macaddr[i + 4] (8 * i) : 0; } qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr); +d-mac_change_flags = 0; } static void @@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +switch (index) { + case RA: + s-mac_change_flags |= E1000_RA0_CHANGED; + break; + case (RA + 1): + s-mac_change_flags |= E1000_RA1_CHANGED; + break; +} + +if (!(s-mac_change_flags ^ E1000_RA_ALL_CHANGED)) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); +s-mac_change_flags = 0; } } Any thoughts? -vlad
Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written
On 11/07/2013 09:33 AM, Alex Williamson wrote: On Thu, 2013-11-07 at 12:26 +0200, Michael S. Tsirkin wrote: On Thu, Nov 07, 2013 at 03:32:29PM +0800, Amos Kong wrote: On Thu, Nov 07, 2013 at 08:59:22AM +0200, Michael S. Tsirkin wrote: On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote: We currently just update the HMP NIC info when the last bit of macaddr is written. This assumes that guest driver will write all the macaddr from bit 0 to bit 5 when it changes the macaddr, this is the current behavior of linux driver (e1000/rtl8139cp), but we can't do this assumption. The macaddr that is used for rx-filter will be updated when every bit is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC info when every bit is changed. It will be same as virtio-net. Signed-off-by: Amos Kong ak...@redhat.com I'm not sure I buy this. If we actually implement e.g. mac change notifications, sending them on writes of random bytes will confuse the host. This patch just effects the monitor display of macaddr. During each writing, the macaddr is used for rx-filter is really changed. In the real hardware, it supports to just write part of bits, the rx-filtering is effected by every bit writing. Yes but again, the window can just be too small to matter on real hardware. Our emulation is not perfect, fixing this to be just like real hardware just might expose other bugs we can't fix that easily. If we were to implement mac change notification, then every partial update would send a notify and the host. Is that a problem? It seems no different than if the device had an atomic mac update procedure and the guest admin changed the mac several times. Yes, but the issue is exercebated in the non-atomic case. RTL8139 is the worst since it has byte oriented writes so that for ever mac change, we have the worst case potential to generate 6 notificates (assuming libvirt is so fast as to pick up ever change). Additionally, once libvirt is updated, this would cause rather serious churn on the host as for ever update, libvirt is going to push the address down to the physical host nic and the fewer of these updates we do the better. The problem with assuming that a given byte is always written last is that unless the hardware spec identifies an order, we're just basing our code on examples where we know what the guest driver does, either by code inspection or access tracing. If there are examples of guests that write the last byte first, then the host will never know the correct mac address. Maybe there are no guests that use the wrong order, but that's a pretty exhaustive search. The patch doesn't change anything about how the NIC operates, only when mac changes are updated. During an update the mac is in a transitory state and we can't go back in time to make it atomic on this hardware design to avoid a window where the wrong mac may be seen. I think the patch is the right thing to do. Thanks, Reporting half-complete state is not the right thing, IMO. Right now, it's doesn't have much impact, but if we start writing these addresses to the host nic, then this will have a much bigger impact as I said above. -vlad Alex I would say let's leave e1000/rtl8139 well alone unless we see guests that actually write mac without touching the last byte. At least, linux rtl8139cp/e1000 writes macaddr from bit 0 to bit 5. It works to just watch the last bit. Thanks, Amos Then think of ways to detect when mac change is done for these. --- hw/net/e1000.c | 2 +- hw/net/rtl8139.c | 5 + 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ec8ecd7..2d60639 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t val) s-mac_reg[index] = val; -if (index == RA + 1) { +if (index == RA || index == RA + 1) { macaddr[0] = cpu_to_le32(s-mac_reg[RA]); macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr); diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 5329f44..7f2b4db 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) switch (addr) { -case MAC0 ... MAC0+4: -s-phys[addr - MAC0] = val; -break; -case MAC0+5: +case MAC0 ... MAC0+5: s-phys[addr - MAC0] = val; qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys); break; -- 1.8.3.1
[Qemu-devel] [PATCH] qdev-properties-system.c: Allow vlan or netdev for -device, not both
It is currently possible to specify things like: -device e1000,netdev=foo,vlan=1 With this usage, whichever argument was specified last (vlan or netdev) overwrites what was previousely set and results in a non-working configuration. Even worse, when used with multiqueue devices, it causes a segmentation fault on exit in qemu_free_net_client. That patch treates the above command line options as invalid and generates an error at start-up. Signed-off-by: Vlad Yasevich vyase...@redhat.com --- hw/core/qdev-properties-system.c | 9 + 1 file changed, 9 insertions(+) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index 0eada32..729efa8 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char *str, void **ptr) goto err; } +if (ncs[i]) { +ret = -EINVAL; +goto err; +} + ncs[i] = peers[i]; ncs[i]-queue_index = i; } @@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque, *ptr = NULL; return; } +if (*ptr) { +error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name); +return; +} hubport = net_hub_port_find(id); if (!hubport) { -- 1.8.3.1
[Qemu-devel] [Bug 721793] Re: QEMU freezes on startup (100% CPU utilization)
QEMU 0.15.0 fixes the problem. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/721793 Title: QEMU freezes on startup (100% CPU utilization) Status in QEMU: New Bug description: 0.12.5 was the last version of QEMU that runs ok and boots any os image. 0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and both of them make it use 100% of CPU also. Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support. tested commands: W2000: $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice host:0e21:0750 /var/opt/vm/w2000.img W2000: $ qemu /var/opt/vm/w2000.img OpenBSD 4.8: $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img tried to use `-M pc-0.12` selector, different audio cards (I've found it caused infinite loop on startup once) -- no luck. tried to use recent seabios from git -- still no luck. attached strace log of 0.14.0. everything was tested on HP mini 311C with Intel Atom N270. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/721793/+subscriptions
[Qemu-devel] [Bug 721793] [NEW] QEMU freezes on startup (100% CPU utilization)
Public bug reported: 0.12.5 was the last version of QEMU that runs ok and boots any os image. 0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and both of them make it use 100% of CPU also. Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support. tested commands: W2000: $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice host:0e21:0750 /var/opt/vm/w2000.img W2000: $ qemu /var/opt/vm/w2000.img OpenBSD 4.8: $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img tried to use `-M pc-0.12` selector, different audio cards (I've found it caused infinite loop on startup once) -- no luck. tried to use recent seabios from git -- still no luck. attached strace log of 0.14.0. everything was tested on HP mini 311C with Intel Atom N270. ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/721793 Title: QEMU freezes on startup (100% CPU utilization) Status in QEMU: New Bug description: 0.12.5 was the last version of QEMU that runs ok and boots any os image. 0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and both of them make it use 100% of CPU also. Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support. tested commands: W2000: $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice host:0e21:0750 /var/opt/vm/w2000.img W2000: $ qemu /var/opt/vm/w2000.img OpenBSD 4.8: $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img tried to use `-M pc-0.12` selector, different audio cards (I've found it caused infinite loop on startup once) -- no luck. tried to use recent seabios from git -- still no luck. attached strace log of 0.14.0. everything was tested on HP mini 311C with Intel Atom N270.
[Qemu-devel] [Bug 721793] Re: QEMU freezes on startup (100% CPU utilization)
** Attachment added: Strace log https://bugs.launchpad.net/bugs/721793/+attachment/1859874/+files/strace-qemu.log.gz -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/721793 Title: QEMU freezes on startup (100% CPU utilization) Status in QEMU: New Bug description: 0.12.5 was the last version of QEMU that runs ok and boots any os image. 0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and both of them make it use 100% of CPU also. Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support. tested commands: W2000: $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice host:0e21:0750 /var/opt/vm/w2000.img W2000: $ qemu /var/opt/vm/w2000.img OpenBSD 4.8: $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img tried to use `-M pc-0.12` selector, different audio cards (I've found it caused infinite loop on startup once) -- no luck. tried to use recent seabios from git -- still no luck. attached strace log of 0.14.0. everything was tested on HP mini 311C with Intel Atom N270.
[Qemu-devel] [Bug 721793] Re: QEMU freezes on startup (100% CPU utilization)
** Attachment added: 2.6.35.11 kernel config https://bugs.launchpad.net/qemu/+bug/721793/+attachment/1859875/+files/config.gz -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/721793 Title: QEMU freezes on startup (100% CPU utilization) Status in QEMU: New Bug description: 0.12.5 was the last version of QEMU that runs ok and boots any os image. 0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and both of them make it use 100% of CPU also. Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support. tested commands: W2000: $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice host:0e21:0750 /var/opt/vm/w2000.img W2000: $ qemu /var/opt/vm/w2000.img OpenBSD 4.8: $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img tried to use `-M pc-0.12` selector, different audio cards (I've found it caused infinite loop on startup once) -- no luck. tried to use recent seabios from git -- still no luck. attached strace log of 0.14.0. everything was tested on HP mini 311C with Intel Atom N270.
Re: [Qemu-devel] U-Boot patch for qemu -M mips
Thiemo Seufer wrote: Vlad Lungu wrote: Thiemo Seufer wrote: Vlad Lungu wrote: [snip] +long int initdram(int board_type) +{ + /* Sdram is setup by assembler code */ + /* If memory could be changed, we should return the true value here */ + return MEM_SIZE*1024*1024; Qemu gets the amount of RAM passed via a command line switch, the qemu-mips emulation sets up a Linux kernel like command line in memory where u-boot could parse it from. Does it, or just when you pass -kernel to it? I'll check. Hm, you are right, it does that only for -kernel. Would it make sense to change that in Qemu? IDK. Maybe I can probe the RAM size in U-Boot , or if this does not work, mips_r4k implements no probable memory controller, so this won't work. I say let's stick it into a DIP-switch, so nobody can overwrite it by mistake. And you don't have to parse it. put some info somewhere (RAM, register, emulated DIP-dwitch), like RAM size, endianness of the CPU. Endianness is rather pointless. If your U-Boot binary doesn't explode immediately you got the right endianness. :-) It doesn't actually explode, it sort of almost hits the exception handler and freezes there. I don't have a mipsel gdb at hand to watch it, but it should be fun. pc=EPC 0xbfc00380 ds 0006 STATUS0x0046 CAUSE 0x0428 CONFIG0 0x8082 CONFIG1 0x9e190c8b
Re: [Qemu-devel] U-Boot patch for qemu -M mips
Vlad Lungu wrote: [snip] put some info somewhere (RAM, register, emulated DIP-dwitch), like RAM size, endianness of the CPU. Endianness is rather pointless. If your U-Boot binary doesn't explode immediately you got the right endianness. :-) It doesn't actually explode, it sort of almost hits the exception handler and freezes there. I don't have a mipsel gdb at hand to watch it, but it should be fun. pc=EPC 0xbfc00380 ds 0006 STATUS0x0046 CAUSE 0x0428 CONFIG0 0x8082 CONFIG1 0x9e190c8b Try qemu-system-mipsel ... -d all, and watch /tmp/qemu.log. :-) Oh crap. Now I see the problem. Either I misread the 0.90 info reg or there's a difference in how it's handled. It dies on the first instruction anyway. Vlad
Re: [Qemu-devel] U-Boot patch for qemu -M mips TAKE 2
Now with board config file included, so it can be built :-) Thiemo, I'll think about the memory size issue and get back to you on that. How about a git repo for U-Boot, if this thing takes off? Vlad --- diff --git a/Makefile b/Makefile index 85885b1..8f650d2 100644 --- a/Makefile +++ b/Makefile @@ -2444,6 +2444,12 @@ pb1000_config: unconfig @echo #define CONFIG_PB1000 1 $(obj)include/config.h @$(MKCONFIG) -a pb1x00 mips mips pb1x00 +qemu_mips_config : unconfig + @mkdir -p $(obj)include + @ $(obj)include/config.h + @echo #define CONFIG_QEMU_MIPS 1 $(obj)include/config.h + @$(MKCONFIG) -a qemu-mips mips mips qemu-mips + # ## MIPS64 5Kc # diff --git a/board/qemu-mips/Makefile b/board/qemu-mips/Makefile new file mode 100644 index 000..23be447 --- /dev/null +++ b/board/qemu-mips/Makefile @@ -0,0 +1,45 @@ +# +# (C) Copyright 2003-2006 +# Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED] +# +# See file CREDITS for list of people who contributed to this +# project. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation; either version 2 of +# the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, +# MA 02111-1307 USA +# + +include $(TOPDIR)/config.mk + +LIB= $(obj)lib$(BOARD).a + +COBJS = $(BOARD).o flash.o +SOBJS = lowlevel_init.o + +SRCS := $(SOBJS:.o=.S) $(COBJS:.o=.c) +OBJS := $(addprefix $(obj),$(COBJS)) +SOBJS := $(addprefix $(obj),$(SOBJS)) + +$(LIB): $(OBJS) $(SOBJS) + $(AR) $(ARFLAGS) $@ $(OBJS) $(SOBJS) + +# + +# defines $(obj).depend target +include $(SRCTREE)/rules.mk + +sinclude $(obj).depend + +# diff --git a/board/qemu-mips/README b/board/qemu-mips/README new file mode 100644 index 000..39570b1 --- /dev/null +++ b/board/qemu-mips/README @@ -0,0 +1,11 @@ +By Vlad Lungu [EMAIL PROTECTED] 2007-Oct-01 + +Qemu is a full system emulator. See + +http://fabrice.bellard.free.fr/qemu + +Limitations comments +-- +Supports the -m mips configuration of qemu: serial,NE2000,IDE. +Support is big endian only for now (or at least this is what I tested). +Derived from au1x00 with a lot of things cut out. diff --git a/board/qemu-mips/config.mk b/board/qemu-mips/config.mk new file mode 100644 index 000..61269ce --- /dev/null +++ b/board/qemu-mips/config.mk @@ -0,0 +1,11 @@ + +# +# Qemu -M mips system emulator +# See http://fabrice.bellard.free.fr/qemu +# + +# ROM version +TEXT_BASE = 0xbfc0 + +# RAM version +#TEXT_BASE = 0x80001000 diff --git a/board/qemu-mips/flash.c b/board/qemu-mips/flash.c new file mode 100644 index 000..d419f41 --- /dev/null +++ b/board/qemu-mips/flash.c @@ -0,0 +1,43 @@ +/* + * (C) Copyright 2003 + * Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED] + * + * See file CREDITS for list of people who contributed to this + * project. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, + * MA 02111-1307 USA + */ + +#include common.h +#if 0 +flash_info_t flash_info[CFG_MAX_FLASH_BANKS]; /* info for FLASH chips */ +#endif +/*--- + * flash_init() + * + * sets up flash_info and returns size of FLASH (bytes) + */ +unsigned long flash_init (void) +{ + printf (Skipping flash_init\n); + return (0); +} + +int write_buff (/*flash_info_t */ void * info, uchar * src, ulong addr, ulong
Re: [Qemu-devel] U-Boot patch for qemu -M mips TAKE 2
Wolfgang Denk wrote: In message [EMAIL PROTECTED] you wrote: Now with board config file included, so it can be built :-) Thiemo, I'll think about the memory size issue and get back to you on that. How about a git repo for U-Boot, if this thing takes off? We actually have already a lot of them - the master repo and the custodian repositories. Logically, your code should go into the MIPS repo. I think you should start posting on the U-Boot mailing list instead of sending to mydirectaddress;see http://www.denx.de/wiki/UBoot/WebHome for details. Actually, this was directed at the Qemu list. But thanks for the offer. I suppose they could just point to your repo in the source release/dev tree and store the corresponding source snapshot when doing a binary release with information included in the README, and it would be kosher Re: GPL. BTW: Your code has a few minor coding style violations (indentation by spaces instead of TAB), and the signed-off-ny line is missing. Might be. I use joe/mcedit/vi in random order to edit source files, that's the source of the spaces, and the patch was rather informal so no signoff. I'll re-branch and do a 3-part (GOT,NE2000,qemu) patchset . Vlad
Re: [Qemu-devel] U-Boot patch for qemu -M mips
Thiemo Seufer wrote: Vlad Lungu wrote: Fix for mips GOT relocation bug, NE2000 bugs, add support for qemu -M mips target. [snip] diff --git a/board/qemu-mips/config.mk b/board/qemu-mips/config.mk [snip] +# +# AMD development board AMD Alchemy DbAu1x00, MIPS32 core Incorrect comment. didn't spot that +# + +# ROM version +TEXT_BASE = 0xbfc0 + +# RAM version +#TEXT_BASE = 0x8010 This could be as low as 0x80001000 (assuming the space for exception handlers isn't reserved by other means). I think I saw that difference in Linux 2.6 too. I suppose you're right. [snip] diff --git a/board/qemu-mips/lowlevel_init.S b/board/qemu-mips/lowlevel_init.S new file mode 100644 index 000..855e8ab --- /dev/null +++ b/board/qemu-mips/lowlevel_init.S @@ -0,0 +1,47 @@ +/* Memory sub-system initialization code */ + +#include config.h +#include version.h +#include asm/regdef.h +#include asm/mipsregs.h + + .text + .set noreorder + .set mips32 + + .globl lowlevel_init +lowlevel_init: + + /* +* Step 2) Establish Status Register +* (set BEV, clear ERL, clear EXL, clear IE) +*/ + li t1, 0x0040 + mtc0t1, CP0_STATUS + + /* +* Step 3) Establish CP0 Config0 +* (set OD, set K0=3) +*/ + li t1, 0x00080003 + mtc0t1, CP0_CONFIG OD is a processor-specific flag, it does nothing on Qemu. + /* +* Step 5) Disable the performance counters +*/ + mtc0zero, CP0_PERFORMANCE + nop This field isn't required to exist (as per architecture spec), which means you can get an exception when writing it. Since perfctr interrupts are guaranteed to be disabled, the best option is not to touch the register in early startup code. (If you want to use performance counters, first check via the config registers if they are implemented. You won't have much luck on Qemu: The registers are implemented, but they don't do anything useful.) Well, I didn't get any exception . And nobody complained about OD either. The stuff that barfed on me, I deleted right away. I'll delete this too, no problem. + /* +* Step 7) Establish Cause +* (set IV bit) +*/ + li t1, 0x0080 + mtc0t1, CP0_CAUSE + + /* Establish Wired (and Random) */ + mtc0zero, CP0_WIRED + nop + + j ra + nop diff --git a/board/qemu-mips/qemu-mips.c b/board/qemu-mips/qemu-mips.c new file mode 100644 index 000..76c093c --- /dev/null +++ b/board/qemu-mips/qemu-mips.c @@ -0,0 +1,48 @@ +/* + * (C) Copyright 2007 + * [EMAIL PROTECTED] A name in addition would look nicer. :-) Names are so passe :-) + * + * See file CREDITS for list of people who contributed to this + * project. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, + * MA 02111-1307 USA + */ + +#include common.h +#include command.h +#include asm/mipsregs.h + +long int initdram(int board_type) +{ + /* Sdram is setup by assembler code */ + /* If memory could be changed, we should return the true value here */ + return MEM_SIZE*1024*1024; Qemu gets the amount of RAM passed via a command line switch, the qemu-mips emulation sets up a Linux kernel like command line in memory where u-boot could parse it from. Does it, or just when you pass -kernel to it? I'll check. +} + + +int checkboard (void) +{ + u32 proc_id; + + proc_id = read_32bit_cp0_register(CP0_PRID); + + switch (proc_id 24) { + default: + printf (Unsupported cpu %d, proc_id=0x%x\n, proc_id 24, proc_id); + } + + return 0; +} Huh? What is this code good for? Checking for the type of board, I suppose :-) I think all BSPs have it and it's called early in the boot process. I could either do only a return 0 or actually decode CP0_PRID and print something meaningful, if Qemu sets it to something sensible. Or just print a nice banner. Thanks for the input. This is a first version and it does what it's supposed to do, it booted in qemu HEAD on linux-x86 and in stock 0.90 for Windows and loaded
Re: [Qemu-devel] U-Boot patch for qemu -M mips
Thiemo Seufer wrote: Vlad Lungu wrote: [snip] +long int initdram(int board_type) +{ + /* Sdram is setup by assembler code */ + /* If memory could be changed, we should return the true value here */ + return MEM_SIZE*1024*1024; Qemu gets the amount of RAM passed via a command line switch, the qemu-mips emulation sets up a Linux kernel like command line in memory where u-boot could parse it from. Does it, or just when you pass -kernel to it? I'll check. Hm, you are right, it does that only for -kernel. Would it make sense to change that in Qemu? IDK. Maybe I can probe the RAM size in U-Boot , or if this does not work, put some info somewhere (RAM, register, emulated DIP-dwitch), like RAM size, endianness of the CPU. Since the mips_r4k machine doesn't correspond to any real hardware we have a bit of leeway with the hardware design. +} + + +int checkboard (void) +{ + u32 proc_id; + + proc_id = read_32bit_cp0_register(CP0_PRID); + + switch (proc_id 24) { + default: + printf (Unsupported cpu %d, proc_id=0x%x\n, proc_id 24, proc_id); + } + + return 0; +} Huh? What is this code good for? Checking for the type of board, I suppose :-) I think all BSPs have it and it's called early in the boot process. I could either do only a return 0 or actually decode CP0_PRID and print something meaningful, if Qemu sets it to something sensible. Or just print a nice banner. As it is, this code prints a nice Unsupported CPU banner no matter what (and proceeds anyway instead of bailing out). True, because it does not care much about this at the moment. Nor do I. But that can change. Latest CVS Qemu supports a number of different CPUs (see the -cpu switch), the default for 32 bit is a 24Kf, and for 64 bit a R4000. I'll look into this and see what use can we make of CP0_PRID. Vlad
[Qemu-devel] U-Boot patch for qemu -M mips
Fix for mips GOT relocation bug, NE2000 bugs, add support for qemu -M mips target. Patch is against U-Boot master branch. -- diff --git a/Makefile b/Makefile index 85885b1..8f650d2 100644 --- a/Makefile +++ b/Makefile @@ -2444,6 +2444,12 @@ pb1000_config : unconfig @echo "#define CONFIG_PB1000 1" $(obj)include/config.h @$(MKCONFIG) -a pb1x00 mips mips pb1x00 +qemu_mips_config : unconfig + @mkdir -p $(obj)include + @ $(obj)include/config.h + @echo "#define CONFIG_QEMU_MIPS 1" $(obj)include/config.h + @$(MKCONFIG) -a qemu-mips mips mips qemu-mips + # ## MIPS64 5Kc # diff --git a/board/qemu-mips/Makefile b/board/qemu-mips/Makefile new file mode 100644 index 000..23be447 --- /dev/null +++ b/board/qemu-mips/Makefile @@ -0,0 +1,45 @@ +# +# (C) Copyright 2003-2006 +# Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED]. +# +# See file CREDITS for list of people who contributed to this +# project. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation; either version 2 of +# the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, +# MA 02111-1307 USA +# + +include $(TOPDIR)/config.mk + +LIB = $(obj)lib$(BOARD).a + +COBJS = $(BOARD).o flash.o +SOBJS = lowlevel_init.o + +SRCS := $(SOBJS:.o=.S) $(COBJS:.o=.c) +OBJS := $(addprefix $(obj),$(COBJS)) +SOBJS := $(addprefix $(obj),$(SOBJS)) + +$(LIB): $(OBJS) $(SOBJS) + $(AR) $(ARFLAGS) $@ $(OBJS) $(SOBJS) + +# + +# defines $(obj).depend target +include $(SRCTREE)/rules.mk + +sinclude $(obj).depend + +# diff --git a/board/qemu-mips/README b/board/qemu-mips/README new file mode 100644 index 000..070c99d --- /dev/null +++ b/board/qemu-mips/README @@ -0,0 +1,11 @@ +By [EMAIL PROTECTED] 2007-Oct-01 + +Qemu is a full system emulator. See + +http://fabrice.bellard.free.fr/qemu + +Limitations comments +-- +Supports the "-m mips" configuration of qemu: serial,NE2000,IDE. +Support is big endian only for now (or at least this is what I tested). +Derived from au1x00 with a lot of things cut out. diff --git a/board/qemu-mips/config.mk b/board/qemu-mips/config.mk new file mode 100644 index 000..39eb60a --- /dev/null +++ b/board/qemu-mips/config.mk @@ -0,0 +1,32 @@ +# +# (C) Copyright 2003 +# Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED]. +# +# See file CREDITS for list of people who contributed to this +# project. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation; either version 2 of +# the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, +# MA 02111-1307 USA +# + +# +# AMD development board AMD Alchemy DbAu1x00, MIPS32 core +# + +# ROM version +TEXT_BASE = 0xbfc0 + +# RAM version +#TEXT_BASE = 0x8010 diff --git a/board/qemu-mips/flash.c b/board/qemu-mips/flash.c new file mode 100644 index 000..d419f41 --- /dev/null +++ b/board/qemu-mips/flash.c @@ -0,0 +1,43 @@ +/* + * (C) Copyright 2003 + * Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED]. + * + * See file CREDITS for list of people who contributed to this + * project. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU
[Qemu-devel] U-Boot
Vlad Lungu wrote: lol, replying to myself Blue Swirl wrote: Someone could port OpenBIOS or LinuxBIOS to MIPS. Well, just FYI: I sort-of-ported U-Boot to Qemu (-M mips). Network performance sucks for some reason (hard enough that tftp-booting a kernel is impossible) and I didn't have time to investigate this yet. But I guess that porting to (-M malta) should not be that hard, and I think there is some PCI support already in U-boot. Reports of poor network performance have been greatly exagerated. Apparently, what it happens is : - qemu doesn't respond with a next-server or server-name in DHCP, apparently - U-Boot uses broadcast in this case as the server address - things don't quite work Setting IP by hand or using tap for networking and configuring DHCP on the host correctly allows loading bt TFTP (or even nfs-booting with tap). I can boot a vmlinux with embedded ramdisk (a couple of Megs) in maybe 10 seconds. Still getting RX timeouts, but it's no big deal. I will polish the thing a little and post an update in 1-2 weeks max. Maybe we can have a bootrom for MIPS shipping with Qemu, after all. Even for malta. Vlad