from:"vlad"

[PATCH 6.6.y] virtio_net: Do not send RSS key if it is not supported

2024-04-22 Thread Vlad Poenaru

From: Breno Leitao 

commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream.

There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.

Running the following command in any QEMU virtual machine with virtionet
will reproduce this problem:

# ethtool -X eth0  hfunc toeplitz

This is how the problem happens:

1) ethtool_set_rxfh() calls virtnet_set_rxfh()

2) virtnet_set_rxfh() calls virtnet_commit_rss_command()

3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather

4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;

5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):

  if (!sz) {
  virtio_error(vdev, "virtio: zero sized buffers are not allowed");

6) virtio_error() (also QEMU function) set the device as broken

vdev->broken = true;

7) Qemu bails out, and do not repond this crazy kernel.

8) The kernel is waiting for the response to come back (function
virtnet_send_command())

9) The kernel is waiting doing the following :

  while (!virtqueue_get_buf(vi->cvq, ) &&
 !virtqueue_is_broken(vi->cvq))
  cpu_relax();

10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.

Fix it by not sending RSS commands if the feature is not available in
the device.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: sta...@vger.kernel.org
Cc: qemu-devel@nongnu.org
Signed-off-by: Breno Leitao 
Reviewed-by: Heng Qi 
Reviewed-by: Xuan Zhuo 
Signed-off-by: David S. Miller 
Signed-off-by: Vlad Poenaru 
---
 drivers/net/virtio_net.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7cb0548d17a3..56cbe00126bb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3570,19 +3570,34 @@ static int virtnet_get_rxfh(struct net_device *dev, u32 
*indir, u8 *key, u8 *hfu
 static int virtnet_set_rxfh(struct net_device *dev, const u32 *indir, const u8 
*key, const u8 hfunc)
 {
struct virtnet_info *vi = netdev_priv(dev);
+   bool update = false;
int i;
 
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
if (indir) {
+   if (!vi->has_rss)
+   return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = indir[i];
+   update = true;
}
-   if (key)
+   if (key) {
+   /* If either _F_HASH_REPORT or _F_RSS are negotiated, the
+* device provides hash calculation capabilities, that is,
+* hash_key is configured.
+*/
+   if (!vi->has_rss && !vi->has_rss_hash_report)
+   return -EOPNOTSUPP;
+
memcpy(vi->ctrl->rss.key, key, vi->rss_key_size);
+   update = true;
+   }
 
-   virtnet_commit_rss_command(vi);
+   if (update)
+   virtnet_commit_rss_command(vi);
 
return 0;
 }
@@ -4491,13 +4506,15 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
 
-   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+   }
+
+   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, 
rss_max_key_size));
 
-- 
2.43.0

Re: [agl-dev-community] Tips for debugging GPU acceleration.

2020-10-23 Thread Marius Vlad

On Fri, Oct 23, 2020 at 05:28:22PM +0100, Alex Bennée wrote:
> Thanks - that seems to do the trick.
> I can't find where in the agl repo that is set. The recipe seems to be in:
> 
>   
> meta-agl-cluster-demo/recipie-graphics/wayland/weston-ini-conf/virtual-landscape.cfg
> 
> but I can't see where the "transform=270" got added.
I assume from
meta-agl/meta-agl-bsp/meta-aglprofilegraphical/recipes-graphics/wayland/weston-ini-conf/virtual.cfg
> 
> Visually it looks a lot nicer now although I could still do with
> getting rid of the quite large AGL lower banner which is cropping out
> the main display.
Yes, like I've said, but apparently I've substituted landscape w/
portrait, applications are designed to accommodate portrait orientation.
The panels, are added dynamically by homescreen.
> 
> On Fri, 23 Oct 2020 at 13:32, Marius Vlad  wrote:
> >
> > On Fri, Oct 23, 2020 at 12:52:17PM +0100, Alex Bennée wrote:
> > >
> > > Gerd Hoffmann  writes:
> > >
> > > >   Hi,
> > > >
> > > >>   [2.930300] [drm] virgl 3d acceleration not supported by host
> > > >
> > > > Nope, not active.
> > > >
> > > >> -display gtk,show-cursor=on \
> > > >
> > > > Needs -display gtk,gl=on for opengl support.
> > >
> > > Awesome - even on TCG the display is now nice and snappy.
> > >
> > > For bonus points:
> > >
> > > The AGL panel display is rotated 90 degrees which is putting a bit of a
> > > crink in my neck. Is their anyway to rotate the view port. I tried
> > > inverting xres and yres but that didn't do anything.
> > Hi,
> >
> > The output is rotated, edit /etc/xdg/weston/weston.ini and comment out
> > transform ini entry from the Virtual-1 output section. Reboot, or
> > restart weston@display service. Note that the apps are optimized for
> > landscape.
> >
> > Enabling 3D with qemu might be something worth adding in the docs.
> > >
> > > >> -device virtio-gpu-pci,virgl=true
> > > >
> > > > virgl is silently turned off in case opengl support is not available.
> > > > Ideally virgl should be OnOffAuto not bool, but I fear changing that
> > > > now (and start throwing errors on virgl=on if opengl is not available)
> > > > will break stuff ...
> > >
> > > At least a warn_report maybe?
> > >
> > > >
> > > > take care,
> > > >   Gerd
> > >
> > >
> > > --
> > > Alex Bennée
> > >
> > >
> > >
> > >
> > >
> >
> >
> > 
> >
> >
> 
> 
> -- 
> Alex Bennée
> KVM/QEMU Hacker for Linaro
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#8773): 
> https://lists.automotivelinux.org/g/agl-dev-community/message/8773
> Mute This Topic: https://lists.automotivelinux.org/mt/77748325/4043866
> Group Owner: agl-dev-community+ow...@lists.automotivelinux.org
> Unsubscribe: 
> https://lists.automotivelinux.org/g/agl-dev-community/leave/7327001/516878461/xyzzy
>  [marius.v...@collabora.com]
> -=-=-=-=-=-=-=-=-=-=-=-
> 
> 


signature.asc
Description: PGP signature

Re: [agl-dev-community] Tips for debugging GPU acceleration.

2020-10-23 Thread Marius Vlad

On Fri, Oct 23, 2020 at 12:52:17PM +0100, Alex Bennée wrote:
> 
> Gerd Hoffmann  writes:
> 
> >   Hi,
> >
> >>   [2.930300] [drm] virgl 3d acceleration not supported by host
> >
> > Nope, not active.
> >
> >> -display gtk,show-cursor=on \
> >
> > Needs -display gtk,gl=on for opengl support.
> 
> Awesome - even on TCG the display is now nice and snappy.
> 
> For bonus points:
> 
> The AGL panel display is rotated 90 degrees which is putting a bit of a
> crink in my neck. Is their anyway to rotate the view port. I tried
> inverting xres and yres but that didn't do anything.
Hi,

The output is rotated, edit /etc/xdg/weston/weston.ini and comment out
transform ini entry from the Virtual-1 output section. Reboot, or
restart weston@display service. Note that the apps are optimized for
landscape. 

Enabling 3D with qemu might be something worth adding in the docs.
> 
> >> -device virtio-gpu-pci,virgl=true
> >
> > virgl is silently turned off in case opengl support is not available.
> > Ideally virgl should be OnOffAuto not bool, but I fear changing that
> > now (and start throwing errors on virgl=on if opengl is not available)
> > will break stuff ...
> 
> At least a warn_report maybe?
> 
> >
> > take care,
> >   Gerd
> 
> 
> -- 
> Alex Bennée
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#8771): 
> https://lists.automotivelinux.org/g/agl-dev-community/message/8771
> Mute This Topic: https://lists.automotivelinux.org/mt/77748325/4043866
> Group Owner: agl-dev-community+ow...@lists.automotivelinux.org
> Unsubscribe: 
> https://lists.automotivelinux.org/g/agl-dev-community/leave/7327001/516878461/xyzzy
>  [marius.v...@collabora.com]
> -=-=-=-=-=-=-=-=-=-=-=-
> 
> 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 0/5] Fix handling of IPv4/IPv6 dual stack

2017-06-10 Thread Vlad Yasevich

On 06/01/2017 04:29 AM, Daniel P. Berrange wrote:
> This is a followup to:
> 
>   v1: https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg05659.html
>   v2: https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04706.html
> 
> This series aims to fix a lot of bugs related to handling of IPv4 / IPv6
> dual stack.
> 
>  - The VNC server mistakenly listened on two separate ports 5900+5901
>when the to= parameter was given
>  - IPv6 sockets are accepting IPv4 clients even when IPv4 is set to
>be disabled
>  - IPv6 sockets are failing to accept IPv4 clients when IPv4 is not set
>to be disabled
>  - The VNC server was loosing the ipv4=/ipv6= settings due to a bug
>in the DNS resolver
> 
> The behaviour of all this is really subtle and hard to get working correctly
> across all the different network backends. Thus, the most important part of
> this patch series is the last patch which adds a test case covering the
> backends for -vnc, -chardev tcp, -net socket, and -incoming socket, with
> a 120 entry matrix.
> 
> IOW, if you think any of the first 4 patches are applying the wrong logic,
> then take a look at the last patch and indicate which test matrix entries
> are believed to be defining wrong behaviour :-)
> 
> Changed in v2:
> 
>  - Clarified error message in commit message
>  - Replace assert with exit (Eric)
>  - Fix typo in test comment (Eric)
>  - Fix wierd copyright line (Eric)
> 
> Daniel P. Berrange (5):
>   sockets: ensure we can bind to both ipv4 & ipv6 separately
>   sockets: don't block IPv4 clients when listening on "::"
>   sockets: ensure we don't accept IPv4 clients when IPv4 is disabled
>   io: preserve ipv4/ipv6 flags when resolving InetSocketAddress
>   tests: add functional test validating ipv4/ipv6 address flag handling
> 
>  io/dns-resolver.c  |   6 +-
>  tests/.gitignore   |   1 +
>  tests/Makefile.include |   4 +
>  tests/test-sockets-proto.c | 855 
> +
>  util/qemu-sockets.c|  71 +++-
>  5 files changed, 916 insertions(+), 21 deletions(-)
>  create mode 100644 tests/test-sockets-proto.c
> 

Series Reviewed-by: Vlad Yasevich <vyase...@redhat.com>

-vlad

Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-06-01 Thread Vlad Yasevich

On 06/01/2017 03:02 AM, Jason Wang wrote:
> 
> 
> On 2017年05月31日 02:57, Dr. David Alan Gilbert wrote:
>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>> On 05/26/2017 12:03 AM, Jason Wang wrote:
>>>> On 2017å¹´05æœˆ25æ—¥ 02:05, Vladislav Yasevich wrote:
>>>>> Add parameters that control RARP/GARP announcement timeouts.
>>>>> The parameters structure is added to the QAPI and a qmp command
>>>>> is added to set/get the parameter data.
>>>>>
>>>>> Based on work by "Dr. David Alan Gilbert"<dgilb...@redhat.com>
>>>>>
>>>>> Signed-off-by: Vladislav Yasevich<vyase...@redhat.com>
>>>> I think it's better to explain e.g under which condition do we need to 
>>>> tweak such
>>>> parameters.
>>>>
>>>> Thanks
>>>>
>>> OK.  I'll rip off what dgilbert wrote in his original series for the 
>>> description.
>>>
>>> Dave, if you have any text to add as to why migration might need to tweak 
>>> this, I'd
>>> appreciate it.
>> Pretty much what I originally said;  that the existing values
>> are arbitrary and fixed, and for systems with complex/sluggish
>> network reconfiguration systems they can be too slow.
>>
>> Dave
>>
> 
> I agree, but I'm not sure how much it can help in fact unless management can 
> set
> configuration specific parameters. And what we did here is best effort, 
> there's no
> guarantee that G(R)ARP packet can reach the destination.
> 

So, that's what the series allows.  If management knows something new, it can 
set
appropriate parameter values.  Additionally, the management is also free to 
trigger
additional announcements through the new commands.

I am starting to think that just for the sake of migration, exposing 
announce_self
interface to management might be sufficient.  Management, when it deems 
migration
complete, may use the interface to trigger announcements in addition to whatever
best effort QEMU may attempt itself.

Dave, would that be enough, or do the parameters still make sense?

Thanks
-vlad

Re: [Qemu-devel] [PATCH 08/12] announce_timer: Add ability to reset an existing

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 03:35 PM, Dr. David Alan Gilbert wrote:
> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>> It is now potentially possible to issue annouce-self command in a tight
>> loop.  Instead of doing nother, we can reset the timeout pararameters,
>> especially since each instance may provide it's own values.  This
>> allows the user to  extend or cut short currently runnig timer.
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
> 
> ah ok, you can ignore my comment on the previous patch then!
> 
>> ---
>>  include/migration/vmstate.h |  1 +
>>  migration/savevm.c  | 41 +++--
>>  2 files changed, 32 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 689b685..6dfdac3 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -1057,6 +1057,7 @@ void vmstate_register_ram_global(struct MemoryRegion 
>> *memory);
>>  
>>  typedef struct AnnounceTimer {
>>  QEMUTimer *tm;
>> +QemuMutex active_lock;
>>  struct AnnounceTimer **entry;
>>  AnnounceParameters params;
>>  QEMUClockType type;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index dcba8bd..e43658f 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -220,20 +220,29 @@ static void qemu_announce_self_iter(NICState *nic, 
>> void *opaque)
>>  
>>  AnnounceTimer *announce_timers[QEMU_ANNOUNCE__MAX];
>>  
>> +static void qemu_announce_timer_destroy(AnnounceTimer *timer)
>> +{
>> +timer_del(timer->tm);
>> +timer_free(timer->tm);
>> +qemu_mutex_destroy(>active_lock);
> 
> Can you explain what makes this safe; we're not
> holding the lock at this point are we? Are we guaranteed
> that no one else will try and take it?
> Either way it should be commented to say why it's safe.

Looking at this again, it doesn't look safe.
The problem is the lookup code.  There is needs to be a lock on the
on the global array that needs to be held during creation/modification.

Thanks
-vlad

> 
> Dave
> 
>> +g_free(timer);
>> +}
>> +
>>  static void qemu_announce_self_once(void *opaque)
>>  {
>>  AnnounceTimer *timer = (AnnounceTimer *)opaque;
>>  
>> +qemu_mutex_lock(>active_lock);
>>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
>>  
>> -if (--timer->round) {
>> +if (--timer->round ) {
>>  timer_mod(timer->tm, qemu_clock_get_ms(timer->type) +
>>self_announce_delay(timer));
>> +qemu_mutex_unlock(>active_lock);
>>  } else {
>> -*(timer->entry) = NULL;
>> -timer_del(timer->tm);
>> -timer_free(timer->tm);
>> -g_free(timer);
>> +*(timer->entry) = NULL;
>> +qemu_mutex_unlock(>active_lock);
>> +qemu_announce_timer_destroy(timer);
>>  }
>>  }
>>  
>> @@ -242,6 +251,7 @@ AnnounceTimer 
>> *qemu_announce_timer_new(AnnounceParameters *params,
>>  {
>>  AnnounceTimer *timer = g_new(AnnounceTimer, 1);
>>  
>> +qemu_mutex_init(>active_lock);
>>  timer->params = *params;
>>  timer->round = params->rounds;
>>  timer->type = type;
>> @@ -259,6 +269,21 @@ AnnounceTimer 
>> *qemu_announce_timer_create(AnnounceParameters *params,
>>  return timer;
>>  }
>>  
>> +static void qemu_announce_timer_update(AnnounceTimer *timer,
>> +   AnnounceParameters *params)
>> +{
>> +qemu_mutex_lock(>active_lock);
>> +
>> +/* Update timer paramenter with any new values.
>> + * Reset the number of rounds to run, and stop the current timer.
>> + */
>> +timer->params = *params;
>> +timer->round = params->rounds;
>> +timer_del(timer->tm);
>> +
>> +qemu_mutex_unlock(>active_lock);
>> +}
>> +
>>  void qemu_announce_self(AnnounceParameters *params, AnnounceType type)
>>  {
>>  AnnounceTimer *timer;
>> @@ -270,11 +295,7 @@ void qemu_announce_self(AnnounceParameters *params, 
>> AnnounceType type)
>>  announce_timers[type] = timer;
>>  timer->entry = _timers[type];
>>  } else {
>> -/* For now, don't do anything.  If we want to reset the timer,
>> - * we'll need to add locking to each announce timer to prevent
>> - * races between timeout handling and a reset.
>> - */
>> -return;
>> +qemu_announce_timer_update(timer, params);
>>  }
>>  qemu_announce_self_once(timer);
>>  }
>> -- 
>> 2.7.4
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-30 Thread Vlad Yasevich

>>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>>  }
>>  
>> -
>>  static void qemu_announce_self_once(void *opaque)
>>  {
>> -static int count = SELF_ANNOUNCE_ROUNDS;
>> -QEMUTimer *timer = *(QEMUTimer **)opaque;
>> +AnnounceTimer *timer = (AnnounceTimer *)opaque;
>>  
>>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
>>  
>> -if (--count) {
>> -/* delay 50ms, 150ms, 250ms, ... */
>> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>> -  self_announce_delay(count));
>> +if (--timer->round) {
>> +timer_mod(timer->tm, qemu_clock_get_ms(timer->type) +
>> +  self_announce_delay(timer));
>>  } else {
>> -timer_del(timer);
>> -timer_free(timer);
>> +timer_del(timer->tm);
>> +timer_free(timer->tm);
>> +g_free(timer);
> 
> That's kind of odd in that it doesn't cause the thing the opaque
> pointed to, to be zero'd, so problem if someone free'd it
> in an exit path?  But probably OK in this case.
> 

In this case, the opaque is the AnnounceTimer itself.  So we end
up:

  +-> Announce_timer:  +>  QemuTimer:
  | tm  ---+  opaque --+
  ||
  ++

zero-ing opaque is a really qemu-timers job, but it doesn't do so, and
in this case, it's OK (sort of), since we call timer_free().

Thanks
-vlad

> Dave
> 
>>  }
>>  }
>>  
>> @@ -252,11 +250,13 @@ AnnounceTimer 
>> *qemu_announce_timer_create(AnnounceParameters *params,
>>  return timer;
>>  }
>>  
>> -void qemu_announce_self(void)
>> +void qemu_announce_self(AnnounceParameters *params)
>>  {
>> -static QEMUTimer *timer;
>> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, 
>> );
>> -qemu_announce_self_once();
>> +AnnounceTimer *timer;
>> +
>> +timer = qemu_announce_timer_create(params, QEMU_CLOCK_REALTIME,
>> +   qemu_announce_self_once);
>> +qemu_announce_self_once(timer);
>>  }
>>  
>>  /***/
>> @@ -1730,7 +1730,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>>   */
>>  cpu_synchronize_all_post_init();
>>  
>> -qemu_announce_self();
>> +qemu_announce_self(qemu_get_announce_params());
>>  
>>  /* Make sure all file formats flush their mutable metadata.
>>   * If we get an error here, just don't restart the VM yet. */
>> -- 
>> 2.7.4
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 10:14 AM, Daniel P. Berrange wrote:
> On Wed, May 24, 2017 at 02:05:22PM -0400, Vladislav Yasevich wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel <germ...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index a4097c9..b55ce6a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params)
>>  qemu_announce_self_once(timer);
>>  }
>>  
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(_params, params);
>> +
>> +qemu_announce_self(_params);
>> +}
> 
> I'm not a huge fan of the idea of setting announce parameters in a
> global namespace, via a previously issued separate command.
> 
> I realize this is already done for the 'migrate' command, but there
> it has been the cause of a number of bugs in libvirt mgmt of QEMU.
> At least with migrate it makes sense for some parameters which need
> to be tuned 'on the fly' after migrate already started. That doesn't
> apply to this command though.
> 
> So I tend to think it'd be nicer to just make the parameters mandatory
> for this command, so it is self-contanied does and not rely on previously
> set (or potentially unknown/indeterminate) global state.
> 

The reason I didn't do this is that I didn't want to require the user
to know or even think about the timeout values. I did want to provide
an option though for those who may want to tweak things.  In most circumstances,
the timeout values would never really be touched and we'll just use the
qemu defaults.  in rare circumstances, values might need to be changed
during migration.  Here, I can definitely see the need to give migration
it's own values.


> I'd also suggest that instad of adding an 'announce_set_parameters',
> the parameters for the automatic announce at end of migration should
> be set via the 'migrate_set_parameters' command.

I can do that.

> 
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 2030087..126b09d 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -654,6 +654,25 @@
>>'returns': 'AnnounceParameters' }
>>  
>>  ##
>> +# @announce-self:
>> +#
>> +# Trigger generation of broadcast RARP frames to update network switches.
>> +# This can be useful when network bonds fail-over the active slave.
>> +#
>> +# Arguments: None.
> 
> "None", but it has some arguments listed right below

will fix


Thanks
-vlad

> 
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "announce-self"
>> +#  "arguments": { "announce-rounds": 5 } }
>> +# <- { "return": {} }
>> +#
>> +# Since: 2.10
>> +##
>> +{ 'command': 'announce-self',
>> +  'data' : {'*params': 'AnnounceParameters'} }
>> +
>> +##
>>  # @MigrationStats:
>>  #
>>  # Detailed migration status.
>> -- 
>> 2.7.4
>>
>>
> 
> Regards,
> Daniel
>

Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 10:24 AM, Juan Quintela wrote:
> Vlad Yasevich <vyase...@redhat.com> wrote:
>> On 05/30/2017 06:11 AM, Juan Quintela wrote:
>>> Vladislav Yasevich <vyase...@redhat.com> wrote:
>>>> Add a qmp command that can trigger guest announcements.
>>>>
>>>> Based on work of Germano Veit Michel <germ...@redhat.com>
>>>>
>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>>>> ---
>>>>  migration/savevm.c | 14 ++
>>>>  qapi-schema.json   | 19 +++
>>>>  2 files changed, 33 insertions(+)
>>>>
>>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>>> index a4097c9..b55ce6a 100644
>>>> --- a/migration/savevm.c
>>>> +++ b/migration/savevm.c
>>>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params)
>>>>  qemu_announce_self_once(timer);
>>>>  }
>>>>  
>>>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>>>> +   Error **errp)
>>>> +{
>>>> +AnnounceParameters announce_params;
>>>> +
>>>> +memcpy(_params, qemu_get_announce_params(),
>>>> +   sizeof(announce_params));
>>>> +
>>>> +if (has_params)
>>>> +qemu_set_announce_parameters(_params, params);
>>>> +
>>>> +qemu_announce_self(_params);
>>>
>>> Are I missreading qemu_annouce_self()?
>>> My reading is that it passes announce_params to a timer (i.e. async
>>> function), but here announce_params is a local variable here, no?
>>>
>>
>> The AnnounceTimer holds a copy since each timer may have it's own values.
> 
> 
> 
>> AnnounceTimer *qemu_announce_timer_new(AnnounceParameters *params,
>>    QEMUClockType type)
>> {
>>AnnounceTimer *timer = g_new(AnnounceTimer, 1);
>>
>>timer->params = *params;
> 
> I have to remomember that C has learn how to copy structures long ago.
> 
> 
> 
> I was missing the "*" on my previous reading, sorry for the noise.

that actually changed, since as Eric Pointed out, shallow copies shouldn't
be used.  In the v2 code, this uses qemu_set_announce_paramters(), but the copy
essentially remains.

-vlad
> 
>>timer->round = params->rounds;
>>timer->type = type;
>>
>>return timer;
>> }
> 
>

Re: [Qemu-devel] [PATCH 09/12] hmp: add announce paraters info/set

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 06:18 AM, Juan Quintela wrote:
> Vladislav Yasevich <vyase...@redhat.com> wrote:
>> Add HMP command to control and read annoucment parameters.
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
> 
> Reviewed-by: Juan Quintela <quint...@redhat.com>
> 
> 
>> + cleanup:
>> +if (err) {
>> +error_report_err(err);
>> +}
>> +}
> 
> 
> My understanding is that this the "cool way" to spell this nowadays is:
> 
> hmp_handle_error(mon, );
> 
> Just in case you want to change it.
> 

Thanks. I must have used an older function as base.  I'll check it out.

-vlad

Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 06:10 AM, Juan Quintela wrote:
> Vladislav Yasevich <vyase...@redhat.com> wrote:
>> Switch qemu_announce_self and virtio annoucements to use
>> the announcement timer framework.  This makes sure that both
>> timers use the same timeouts and number of annoucement attempts
>>
>> Based on work by Dr. David Alan Gilbert <dgilb...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
> 
>>  static void qemu_announce_self_once(void *opaque)
>>  {
>> -static int count = SELF_ANNOUNCE_ROUNDS;
>> -QEMUTimer *timer = *(QEMUTimer **)opaque;
>> +AnnounceTimer *timer = (AnnounceTimer *)opaque;
> 
> Cast from void * is never needed.
> 

OK

Thanks
-vlad

Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 06:11 AM, Juan Quintela wrote:
> Vladislav Yasevich <vyase...@redhat.com> wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel <germ...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index a4097c9..b55ce6a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params)
>>  qemu_announce_self_once(timer);
>>  }
>>  
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(_params, params);
>> +
>> +qemu_announce_self(_params);
> 
> Are I missreading qemu_annouce_self()?
> My reading is that it passes announce_params to a timer (i.e. async
> function), but here announce_params is a local variable here, no?
> 

The AnnounceTimer holds a copy since each timer may have it's own values.

Thanks
-vlad

Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-30 Thread Vlad Yasevich

On 05/30/2017 05:58 AM, Juan Quintela wrote:
> Vladislav Yasevich <vyase...@redhat.com> wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
> 
> Hi
> 
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index f5e8194..cee2837 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -78,6 +78,104 @@ static struct mig_cmd_args {
>>  [MIG_CMD_MAX]  = { .len = -1, .name = "MAX" },
>>  };
>>  
> 
> Once that you are touching this, shouldn't it be better to be in
> net/?
> They have nothing to do with migration really.
> 

Yeah, I considered this, but could really find a good slot for them.
I can either put them into net.c directly or pull them out into their
own little file.

> 
>> +#define QEMU_ANNOUNCE_INITIAL50
>> +#define QEMU_ANNOUNCE_MAX   550
>> +#define QEMU_ANNOUNCE_ROUNDS  5
>> +#define QEMU_ANNOUNCE_STEP  100
>> +
>> +AnnounceParameters *qemu_get_announce_params(void)
>> +{
>> +static AnnounceParameters announce = {
>> +.initial = QEMU_ANNOUNCE_INITIAL,
>> +.max = QEMU_ANNOUNCE_MAX,
>> +.rounds = QEMU_ANNOUNCE_ROUNDS,
>> +.step = QEMU_ANNOUNCE_STEP,
>> +};
>> +
>> +return 
>> +}
>> +
>> +void qemu_fill_announce_parameters(AnnounceParameters **to,
>> +   AnnounceParameters *from)
>> +{
>> +AnnounceParameters *params;
>> +
>> +params = *to = g_malloc0(sizeof(*params));
>> +params->has_initial = true;
>> +params->initial = from->initial;
>> +params->has_max = true;
>> +params->max = from->max;
>> +params->has_rounds = true;
>> +params->rounds = from->rounds;
>> +params->has_step = true;
>> +params->step = from->step;
>> +}
>> +
>> +bool qemu_validate_announce_parameters(AnnounceParameters *params, Error 
>> **errp)
>> +{
>> +if (params->has_initial &&
>> +(params->initial < 1 || params->initial > 10)) {
> 
> This is independent of this patch, but we really need a macro:
>   CHECK(field, mininum, maximum)
> 
> We repeat this left and right.
> 
>> +void qemu_set_announce_parameters(AnnounceParameters *announce_params,
>> +  AnnounceParameters *params)
> 
>> +void qmp_announce_set_parameters(AnnounceParameters *params, Error **errp)
> 
> Really similar functions name.  I assume by know that the 1st function
> is used somewhere else in the series.
> 

Yes, the first function is going to be used later.

Thanks
-vlad

Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-26 Thread Vlad Yasevich

On 05/26/2017 09:16 AM, Eric Blake wrote:
> On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel <germ...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
> 
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
> 
> Shallow copies of a QAPI type happen to work when the type is all scalar
> (as AnnounceParameters currently is), but it's more robust to use
> QAPI_CLONE() or QAPI_CLONE_MEMBERS() so that any future non-scalar
> additions to the parameters type will be correctly copied without
> introducing memory leaks or double frees.
> 
> Even this might be better:
> AnnounceParameters announce_params = { 0 };
> qemu_set_announce_parameters(_params, qemu_get_announce_params());
> 

Ok.

>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(_params, params);
>> +
>> +qemu_announce_self(_params);
>> +}
> The QMP changes look okay.  Do you have a testsuite covering the new QMP
> command somewhere in the series?
> 

There is basic test in patch 12.

Thanks
-vlad

Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Vlad Yasevich

On 05/26/2017 09:08 AM, Eric Blake wrote:
> On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>> ---
> 
> Just an interface review for now:
> 
>> +++ b/qapi-schema.json
>> @@ -569,6 +569,90 @@
>>  ##
>>  { 'command': 'query-events', 'returns': ['EventInfo'] }
>>  
>> +
>> +##
>> +# @AnnounceParameter:
>> +#
>> +# @AnnounceParameter enumeration
>> +#
>> +# @initial: Initial delay (in ms) before sending the first GARP/RARP
>> +#   announcement
>> +#
>> +# @max: Maximum delay (in ms) to between GARP/RARP announcemnt packets
> 
> s/announcemnt/announcement/
> 
>> +#
>> +# @rounds: Number of self-announcement attempts
>> +#
>> +# @step: Delay increate (in ms) after each self-announcment attempt
> 
> s/increate/increase/
> s/announcment/announcement/
> 
>> +#
>> +# Since: 2.10
>> +##
>> +{ 'enum' : 'AnnounceParameter',
>> +  'data' : [ 'initial', 'max', 'rounds', 'step' ] }
> 
> Why are we creating an enum here?  Without reading further, it looks
> like you plan on using the enum to delineate members of a union? But
> that feels like it will be overly complicated.  A struct should be
> sufficient (each parameter being an optional member of the struct, where
> you can supply as many or as few on input, but all are reported on output).
> 

I end up using them for the HMP interface.  If it's better, I can move this
definition to the HMP patch.

Thanks
-vlad

Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Vlad Yasevich

On 05/26/2017 12:03 AM, Jason Wang wrote:
> 
> On 2017年05月25日 02:05, Vladislav Yasevich wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
> 
> I think it's better to explain e.g under which condition do we need to tweak 
> such parameters.
> 
> Thanks
> 

OK.  I'll rip off what dgilbert wrote in his original series for the 
description.

Dave, if you have any text to add as to why migration might need to tweak this, 
I'd
appreciate it.

Thanks
-vlad

Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-26 Thread Vlad Yasevich

 timer_free(n->announce_timer->tm);
>> +g_free(n->announce_timer);
>>   g_free(n->vqs);
>>   qemu_del_nic(n->nic);
>>   virtio_cleanup(vdev);
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 1eec9a2..51dd4c3 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -94,8 +94,7 @@ typedef struct VirtIONet {
>>   char *netclient_name;
>>   char *netclient_type;
>>   uint64_t curr_guest_offloads;
>> -QEMUTimer *announce_timer;
>> -int announce_counter;
>> +AnnounceTimer *announce_timer;
>>   bool needs_vnet_hdr_swap;
>>   } VirtIONet;
>>   diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index a6bf84d..f8aed9b 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -1022,8 +1022,6 @@ extern const VMStateInfo vmstate_info_qtailq;
>>   #define VMSTATE_END_OF_LIST() \
>>   {}
>>   -#define SELF_ANNOUNCE_ROUNDS 5
>> -
>>   void loadvm_free_handlers(MigrationIncomingState *mis);
>> int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>> @@ -1071,11 +1069,18 @@ AnnounceTimer 
>> *qemu_announce_timer_create(AnnounceParameters
>> *params,
>> QEMUTimerCB *cb);
>> static inline
>> -int64_t self_announce_delay(int round)
>> +int64_t self_announce_delay(AnnounceTimer *timer)
>>   {
>> -assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>> -/* delay 50ms, 150ms, 250ms, ... */
>> -return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>> +int64_t ret;
>> +
>> +ret =  timer->params.initial +
>> +   (timer->params.rounds - timer->round - 1) *
>> +   timer->params.step;
>> +
>> +if (ret < 0 || ret > timer->params.max) {
>> +ret = timer->params.max;
>> +}
> 
> Can we move this check to qemu_validate_announce_parameters()?

Not really.  Consider a situation where you don't really want
to wait for more the then say 5 seconds between announcements, but
you want to increase the number of announcement attempts (since you
are not sure how long it will take to plumb the whole network path).
In this case, you will hit the max delay before hitting max rounds, and
the code above makes sure that we don't exceed max delay.

I can update the names to make it clearer, if it will help.

Thanks
-vlad

> 
>> +return ret;
>>   }
>> void dump_vmstate_json_to_file(FILE *out_fp);
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index 7fd49c4..2ef1687 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -85,7 +85,7 @@ bool qemu_validate_announce_parameters(AnnounceParameters 
>> *params,
>>  Error **errp);
>>   void qemu_set_announce_parameters(AnnounceParameters *announce_params,
>> AnnounceParameters *params);
>> -void qemu_announce_self(void);
>> +void qemu_announce_self(AnnounceParameters *params);
>> /* Subcommands for QEMU_VM_COMMAND */
>>   enum qemu_vm_cmd {
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 0304c01..987c1cf 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -345,7 +345,7 @@ static void process_incoming_migration_bh(void *opaque)
>>* This must happen after all error conditions are dealt with and
>>* we're sure the VM is going to be running on this host.
>>*/
>> -qemu_announce_self();
>> +qemu_announce_self(qemu_get_announce_params());
>> /* If global state section was not received or we are in running
>>  state, we need to obey autostart. Any other state is set with
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 607b090..555157a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -212,21 +212,19 @@ static void qemu_announce_self_iter(NICState *nic, 
>> void *opaque)
>>   qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>>   }
>>   -
>>   static void qemu_announce_self_once(void *opaque)
>>   {
>> -static int count = SELF_ANNOUNCE_ROUNDS;
>> -QEMUTimer *timer = *(QEMUTimer **)opaque;
>> +AnnounceTimer *timer = (AnnounceTimer *)opaque;
>> qemu_foreach_nic(qemu_announce_self_iter, NULL);
>

Re: [Qemu-devel] [PATCH v2] virtio_net: Bypass backends for MTU feature negotiation

2017-05-23 Thread Vlad Yasevich

On 05/23/2017 08:31 AM, Maxime Coquelin wrote:
> This patch adds a new internal "x-mtu-bypass-backend" property
> to bypass backends for MTU feature negotiation.
> 
> When this property is set, the MTU feature is negotiated as soon
> as supported by the guest and a MTU value is set via the host_mtu
> parameter. In case the backend advertises the feature (e.g. DPDK's
> vhost-user backend), the feature negotiation is propagated down to
> the backend.
> 
> When this property is not set, the backend has to support the MTU
> feature for its negotiation to succeed.
> 
> For compatibility purpose, this property is disabled for machine
> types v2.9 and older.
> 
> Cc: Aaron Conole <acon...@redhat.com>
> Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>

Reviewed-by: Vlad Yasevich <vyase...@redhat.com>

-vlad

> ---
> 
> Tests performed:
> - Vhost-net kernel backendi, host_mtu=2000:
>  * default machine type: guest MTU = 2000
>  * pc-q35-2.9 machine type: Guest MTU = 1500
> 
> - Vhost-net user backend, host_mtu=2000:
>  * DPDK v17.05 (MTU feature supported on backend side)
>   - default machine type: guest MTU = 2000
>   - pc-q35-2.9 machine type: guest MTU = 2000)
>  * DPDK v16.11 (MTU feature not supported on backend side)
>   - default machine type: guest MTU = 2000
>   - pc-q35-2.9 machine type: guest MTU = 1500)
> 
>  hw/net/virtio-net.c| 17 -
>  include/hw/compat.h|  4 
>  include/hw/virtio/virtio-net.h |  1 +
>  include/hw/virtio/virtio.h |  1 +
>  4 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 7d091c9..39c336e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -589,7 +589,15 @@ static uint64_t virtio_net_get_features(VirtIODevice 
> *vdev, uint64_t features,
>  if (!get_vhost_net(nc->peer)) {
>  return features;
>  }
> -return vhost_net_get_features(get_vhost_net(nc->peer), features);
> +features = vhost_net_get_features(get_vhost_net(nc->peer), features);
> +vdev->backend_features = features;
> +
> +if (n->mtu_bypass_backend &&
> +(n->host_features & 1ULL << VIRTIO_NET_F_MTU)) {
> +features |= (1ULL << VIRTIO_NET_F_MTU);
> +}
> +
> +return features;
>  }
>  
>  static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
> @@ -640,6 +648,11 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint64_t features)
>  VirtIONet *n = VIRTIO_NET(vdev);
>  int i;
>  
> +if (n->mtu_bypass_backend &&
> +!virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_MTU)) {
> +features &= ~(1ULL << VIRTIO_NET_F_MTU);
> +}
> +
>  virtio_net_set_multiqueue(n,
>virtio_has_feature(features, VIRTIO_NET_F_MQ));
>  
> @@ -2090,6 +2103,8 @@ static Property virtio_net_properties[] = {
>  DEFINE_PROP_UINT16("rx_queue_size", VirtIONet, net_conf.rx_queue_size,
> VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE),
>  DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
> +DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
> + true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 55b1765..181f450 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -6,6 +6,10 @@
>  .driver   = "pci-bridge",\
>  .property = "shpc",\
>  .value= "off",\
> +},{\
> +.driver   = "virtio-net-device",\
> +.property = "x-mtu-bypass-backend",\
> +.value= "off",\
>  },
>  
>  #define HW_COMPAT_2_8 \
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 1eec9a2..602b486 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -97,6 +97,7 @@ typedef struct VirtIONet {
>  QEMUTimer *announce_timer;
>  int announce_counter;
>  bool needs_vnet_hdr_swap;
> +bool mtu_bypass_backend;
>  } VirtIONet;
>  
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7b6edba..80c45c3 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -79,6 +79,7 @@ struct VirtIODevice
>  uint16_t queue_sel;
>  uint64_t guest_features;
>  uint64_t host_features;
> +uint64_t backend_features;
>  size_t config_len;
>  void *config;
>  uint16_t config_vector;
>

Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp

2017-05-12 Thread Vlad Yasevich

On 05/12/2017 03:24 PM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 02/20/2017 07:16 PM, Germano Veit Michel wrote:
>>> qemu_announce_self() is triggered by qemu at the end of migrations
>>> to update the network regarding the path to the guest l2addr.
>>>
>>> however it is also useful when there is a network change such as
>>> an active bond slave swap. Essentially, it's the same as a migration
>>> from a network perspective - the guest moves to a different point
>>> in the network topology.
>>>
>>> this exposes the function via qmp.
>>>
>>> Signed-off-by: Germano Veit Michel <germ...@redhat.com>
>>> ---
>>>  include/migration/vmstate.h |  5 +
>>>  migration/savevm.c  | 30 +++---
>>>  qapi-schema.json| 18 ++
>>>  3 files changed, 42 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>>> index 63e7b02..a08715c 100644
>>> --- a/include/migration/vmstate.h
>>> +++ b/include/migration/vmstate.h
>>> @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round)
>>>  return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>>  }
>>>
>>> +struct AnnounceRound {
>>> +QEMUTimer *timer;
>>> +int count;
>>> +};
>>> +
>>>  void dump_vmstate_json_to_file(FILE *out_fp);
>>>
>>>  #endif
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index 5ecd264..44e196b 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState
>>> *nic, void *opaque)
>>>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>>>  }
>>>
>>> -
>>>  static void qemu_announce_self_once(void *opaque)
>>>  {
>>> -static int count = SELF_ANNOUNCE_ROUNDS;
>>> -QEMUTimer *timer = *(QEMUTimer **)opaque;
>>> +struct AnnounceRound *round = opaque;
>>>
>>>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
>>>
>>> -if (--count) {
>>> +round->count--;
>>> +if (round->count) {
>>>  /* delay 50ms, 150ms, 250ms, ... */
>>> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> -  self_announce_delay(count));
>>> +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> +  self_announce_delay(round->count));
>>>  } else {
>>> -timer_del(timer);
>>> -timer_free(timer);
>>> +timer_del(round->timer);
>>> +timer_free(round->timer);
>>> +g_free(round);
>>>  }
>>>  }
>>>
>>>  void qemu_announce_self(void)
>>>  {
>>> -static QEMUTimer *timer;
>>> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, 
>>> );
>>> -qemu_announce_self_once();
>>> +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound));
>>> +if (!round)
>>> +return;
>>> +round->count = SELF_ANNOUNCE_ROUNDS;
>>> +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME,
>>> qemu_announce_self_once, round);
>>> +qemu_announce_self_once(round);
>>> +}
>>
>> So, I've been looking and this code and have been playing with it and with 
>> David's
>> patches and my patches to include virtio self announcements as well.  What 
>> I've discovered
>> is what I think is a possible packet amplification issue here.
>>
>> This creates a new timer every time we do do a announce_self.  With just 
>> migration,
>> this is not an issue since you only migrate once at a time, so there is only 
>> 1 timer.
>> With exposing this as an API, a user can potentially call it in a tight loop
>> and now you have a ton of timers being created.  Add in David's patches 
>> allowing timeouts
>> and retries to be configurable, and you may now have a ton of long lived 
>> timers.
>> Add in the patches I am working on to let virtio do self announcements too 
>> (to really fix
>> bonding issues), and now you add in a possibility of a lot of packets being 
>> sent for
>> each timeout (RARP, GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if 
>> MLD1 is used]).
>

Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp

2017-05-11 Thread Vlad Yasevich

On 02/20/2017 07:16 PM, Germano Veit Michel wrote:
> qemu_announce_self() is triggered by qemu at the end of migrations
> to update the network regarding the path to the guest l2addr.
> 
> however it is also useful when there is a network change such as
> an active bond slave swap. Essentially, it's the same as a migration
> from a network perspective - the guest moves to a different point
> in the network topology.
> 
> this exposes the function via qmp.
> 
> Signed-off-by: Germano Veit Michel <germ...@redhat.com>
> ---
>  include/migration/vmstate.h |  5 +
>  migration/savevm.c  | 30 +++---
>  qapi-schema.json| 18 ++
>  3 files changed, 42 insertions(+), 11 deletions(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 63e7b02..a08715c 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round)
>  return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>  }
> 
> +struct AnnounceRound {
> +QEMUTimer *timer;
> +int count;
> +};
> +
>  void dump_vmstate_json_to_file(FILE *out_fp);
> 
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 5ecd264..44e196b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState
> *nic, void *opaque)
>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>  }
> 
> -
>  static void qemu_announce_self_once(void *opaque)
>  {
> -static int count = SELF_ANNOUNCE_ROUNDS;
> -QEMUTimer *timer = *(QEMUTimer **)opaque;
> +struct AnnounceRound *round = opaque;
> 
>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
> 
> -if (--count) {
> +round->count--;
> +if (round->count) {
>  /* delay 50ms, 150ms, 250ms, ... */
> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> -  self_announce_delay(count));
> +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +  self_announce_delay(round->count));
>  } else {
> -timer_del(timer);
> -timer_free(timer);
> +timer_del(round->timer);
> +timer_free(round->timer);
> +g_free(round);
>  }
>  }
> 
>  void qemu_announce_self(void)
>  {
> -static QEMUTimer *timer;
> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, 
> );
> -qemu_announce_self_once();
> +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound));
> +if (!round)
> +return;
> +round->count = SELF_ANNOUNCE_ROUNDS;
> +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> qemu_announce_self_once, round);
> +qemu_announce_self_once(round);
> +}

So, I've been looking and this code and have been playing with it and with 
David's
patches and my patches to include virtio self announcements as well.  What I've 
discovered
is what I think is a possible packet amplification issue here.

This creates a new timer every time we do do a announce_self.  With just 
migration,
this is not an issue since you only migrate once at a time, so there is only 1 
timer.
With exposing this as an API, a user can potentially call it in a tight loop
and now you have a ton of timers being created.  Add in David's patches 
allowing timeouts
and retries to be configurable, and you may now have a ton of long lived timers.
Add in the patches I am working on to let virtio do self announcements too (to 
really fix
bonding issues), and now you add in a possibility of a lot of packets being 
sent for
each timeout (RARP, GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if 
MLD1 is used]).

As you can see, this can get rather ugly...

I think we need timer user here.  Migration and QMP being two to begin with.  
Each
one would get a single timer to play with.  If a given user already has a timer 
running,
we could return an error or just not do anything.

-vlad

> +
> +void qmp_announce_self(Error **errp)
> +{
> +qemu_announce_self();
>  }
> 
>  /***/
> diff --git a/qapi-schema.json b/qapi-schema.json
> index baa0d26..0d9bffd 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -6080,3 +6080,21 @@
>  #
>  ##
>  { 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }
> +
> +##
> +# @announce-self:
> +#
> +# Trigger generation of broadcast RARP frames to update network switches.
> +# This can be useful when network bonds fail-over the active slave.
> +#
> +# Arguments: None.
> +#
> +# Example:
> +#
> +# -> { "execute": "announce-self" }
> +# <- { "return": {} }
> +#
> +# Since: 2.9
> +##
> +{ 'command': 'announce-self' }
> +
>

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-05-05 Thread Vlad Yasevich

On 05/02/2017 06:41 AM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 03/30/2017 11:26 AM, Dr. David Alan Gilbert wrote:
>>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>>> On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote:
>>>>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>>>>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
>>>>>>> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>>>>>>>> virtio announcements seem to run on its own timer with it's own
>>>>>>>> repetition loop and are invoked separately from qemu_announce_self.
>>>>>>>> This patch consolidates announcements into a single location and
>>>>>>>> allows other nics to provide it's own announcement capabilities, if
>>>>>>>> necessary.
>>>>>>>>
>>>>>>>> This is also usefull in support of exposing qemu_announce_self through
>>>>>>>> qmp/hmp.
>>>>>>>>
>>>>>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>>>>>>>> ---
>>>>>>>>  hw/net/virtio-net.c| 30 --
>>>>>>>>  include/hw/virtio/virtio-net.h |  2 --
>>>>>>>>  include/net/net.h  |  2 ++
>>>>>>>>  migration/savevm.c |  6 ++
>>>>>>>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>>> index c321680..6e9ce4f 100644
>>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, 
>>>>>>>> uint8_t status)
>>>>>>>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static void virtio_net_announce_timer(void *opaque)
>>>>>>>> +static void virtio_net_announce(NetClientState *nc)
>>>>>>>>  {
>>>>>>>> -VirtIONet *n = opaque;
>>>>>>>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>>>>>>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>>>  
>>>>>>>> -n->announce_counter--;
>>>>>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> -virtio_notify_config(vdev);
>>>>>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>>>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> +virtio_notify_config(vdev);
>>>>>>>> +}
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>>>  n->nobcast = 0;
>>>>>>>>  /* multiqueue is disabled by default */
>>>>>>>>  n->curr_queues = 1;
>>>>>>>> -timer_del(n->announce_timer);
>>>>>>>> -n->announce_counter = 0;
>>>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>>>  
>>>>>>>>  /* Flush any MAC and VLAN filter table state */
>>>>>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet 
>>>>>>>> *n, uint8_t cmd,
>>>>>>>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>>>>>>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> -if (n->announce_counter) {
>>>>>>>> -timer_mod(n->announce_timer,
>>>>>>>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>>>>>>> -  self_announce_delay(n->announce_counter

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich

On 03/30/2017 11:26 AM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote:
>>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
>>>>> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>>>>>> virtio announcements seem to run on its own timer with it's own
>>>>>> repetition loop and are invoked separately from qemu_announce_self.
>>>>>> This patch consolidates announcements into a single location and
>>>>>> allows other nics to provide it's own announcement capabilities, if
>>>>>> necessary.
>>>>>>
>>>>>> This is also usefull in support of exposing qemu_announce_self through
>>>>>> qmp/hmp.
>>>>>>
>>>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>>>>>> ---
>>>>>>  hw/net/virtio-net.c| 30 --
>>>>>>  include/hw/virtio/virtio-net.h |  2 --
>>>>>>  include/net/net.h  |  2 ++
>>>>>>  migration/savevm.c |  6 ++
>>>>>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index c321680..6e9ce4f 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, 
>>>>>> uint8_t status)
>>>>>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>  }
>>>>>>  
>>>>>> -static void virtio_net_announce_timer(void *opaque)
>>>>>> +static void virtio_net_announce(NetClientState *nc)
>>>>>>  {
>>>>>> -VirtIONet *n = opaque;
>>>>>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>>>>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>  
>>>>>> -n->announce_counter--;
>>>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>> -virtio_notify_config(vdev);
>>>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>> +virtio_notify_config(vdev);
>>>>>> +}
>>>>>>  }
>>>>>>  
>>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>  n->nobcast = 0;
>>>>>>  /* multiqueue is disabled by default */
>>>>>>  n->curr_queues = 1;
>>>>>> -timer_del(n->announce_timer);
>>>>>> -n->announce_counter = 0;
>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>  
>>>>>>  /* Flush any MAC and VLAN filter table state */
>>>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>>>>>> uint8_t cmd,
>>>>>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>>>>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>> -if (n->announce_counter) {
>>>>>> -timer_mod(n->announce_timer,
>>>>>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>>>>> -  self_announce_delay(n->announce_counter));
>>>>>> -}
>>>>>>  return VIRTIO_NET_OK;
>>>>>>  } else {
>>>>>>  return VIRTIO_NET_ERR;
>>>>>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void 
>>>>>> *opaque, int version_id)
>>>>>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>>>  }
>>>>>>  
>>>>>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>> -vir

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich

On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
>>> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>>>> virtio announcements seem to run on its own timer with it's own
>>>> repetition loop and are invoked separately from qemu_announce_self.
>>>> This patch consolidates announcements into a single location and
>>>> allows other nics to provide it's own announcement capabilities, if
>>>> necessary.
>>>>
>>>> This is also usefull in support of exposing qemu_announce_self through
>>>> qmp/hmp.
>>>>
>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>>>> ---
>>>>  hw/net/virtio-net.c| 30 --
>>>>  include/hw/virtio/virtio-net.h |  2 --
>>>>  include/net/net.h  |  2 ++
>>>>  migration/savevm.c |  6 ++
>>>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index c321680..6e9ce4f 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t 
>>>> status)
>>>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>  }
>>>>  
>>>> -static void virtio_net_announce_timer(void *opaque)
>>>> +static void virtio_net_announce(NetClientState *nc)
>>>>  {
>>>> -VirtIONet *n = opaque;
>>>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>  
>>>> -n->announce_counter--;
>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>> -virtio_notify_config(vdev);
>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>> +virtio_notify_config(vdev);
>>>> +}
>>>>  }
>>>>  
>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>  n->nobcast = 0;
>>>>  /* multiqueue is disabled by default */
>>>>  n->curr_queues = 1;
>>>> -timer_del(n->announce_timer);
>>>> -n->announce_counter = 0;
>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>  
>>>>  /* Flush any MAC and VLAN filter table state */
>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>>>> uint8_t cmd,
>>>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>> -if (n->announce_counter) {
>>>> -timer_mod(n->announce_timer,
>>>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>>> -  self_announce_delay(n->announce_counter));
>>>> -}
>>>>  return VIRTIO_NET_OK;
>>>>  } else {
>>>>  return VIRTIO_NET_ERR;
>>>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void 
>>>> *opaque, int version_id)
>>>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>  }
>>>>  
>>>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>>>> -timer_mod(n->announce_timer, 
>>>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>>>> -}
>>>> -
>>>>  return 0;
>>>>  }
>>>>  
>>>> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = {
>>>>  .receive = virtio_net_receive,
>>>>  .link_status_changed = virtio_net_set_link_status,
>>>>  .query_rx_filter = virtio_net_query_rxfilter,
>>>> +.announce = virtio_net_announce,
>>>>  };
>>>>  
>>>>

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich

On 03/29/2017 04:35 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 29, 2017 at 04:19:50PM -0400, Vladislav Yasevich wrote:
>> virtio announcements seem to run on its own timer with it's own
>> repetition loop and are invoked separately from qemu_announce_self.
>> This patch consolidates announcements into a single location and
>> allows other nics to provide it's own
> 
> their own
> 
>> announcement capabilities, if
>> necessary.
>>
>> This is also usefull
> 
> or useful?
> 
>> in support of exposing qemu_announce_self through
>> qmp/hmp.
> 
> I'm guessing this is the real reason for the change.
> How is it helpful for that? Pls clarify in the commit log.

Ok

> 
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>> ---
>>  hw/net/virtio-net.c| 30 --
>>  include/hw/virtio/virtio-net.h |  2 --
>>  include/net/net.h  |  2 ++
>>  migration/savevm.c |  6 ++
>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index c321680..6e9ce4f 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t 
>> status)
>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>  }
>>  
>> -static void virtio_net_announce_timer(void *opaque)
>> +static void virtio_net_announce(NetClientState *nc)
>>  {
>> -VirtIONet *n = opaque;
>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>  
>> -n->announce_counter--;
>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>> -virtio_notify_config(vdev);
>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>> +virtio_notify_config(vdev);
>> +}
>>  }
>>  
>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>  n->nobcast = 0;
>>  /* multiqueue is disabled by default */
>>  n->curr_queues = 1;
>> -timer_del(n->announce_timer);
>> -n->announce_counter = 0;
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>  
>>  /* Flush any MAC and VLAN filter table state */
>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>> uint8_t cmd,
>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> -if (n->announce_counter) {
>> -timer_mod(n->announce_timer,
>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -  self_announce_delay(n->announce_counter));
>> -}
>>  return VIRTIO_NET_OK;
>>  } else {
>>  return VIRTIO_NET_ERR;
> 
> you will notice that this runs at VM speeds with QEMU_CLOCK_VIRTUAL,
> while the self announce you are tying this to runs with
> QEMU_CLOCK_REALTIME.
> 
> So this switch seems wrong to me: by the time VM is started
> on destination all timers might have expired.
> 

That explains what I am seeing now.  Hmm... may be combining them this way
isn't the right approach.  I think I'll leave RARPs alone and export an ability
to start the a nic provided announce feature.

Then Germano's qmp command would be able to call this one if available and the 
old
one if not.  That way, on user-triggered announcements, we wouldn't need to do 
the
RARPs, if a better announcement capability is present.

That way current behavior is preserved for migrations.

Thanks
-vlad

> 
>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void *opaque, 
>> int version_id)
>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>  }
>>  
>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>> -}
>> -
>>  return 0;
>>  }
>>  
>> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = {
>>  .receive = virtio_net_receive,
>>  .link_status_changed

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich

gt; -timer_free(n->announce_timer);
>>  g_free(n->vqs);
>>  qemu_del_nic(n->nic);
>>  virtio_cleanup(vdev);
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 1eec9a2..0c597f4 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -94,8 +94,6 @@ typedef struct VirtIONet {
>>  char *netclient_name;
>>  char *netclient_type;
>>  uint64_t curr_guest_offloads;
>> -QEMUTimer *announce_timer;
>> -int announce_counter;
>>  bool needs_vnet_hdr_swap;
>>  } VirtIONet;
>>  
>> diff --git a/include/net/net.h b/include/net/net.h
>> index 99b28d5..598f523 100644
>> --- a/include/net/net.h
>> +++ b/include/net/net.h
>> @@ -64,6 +64,7 @@ typedef int (SetVnetLE)(NetClientState *, bool);
>>  typedef int (SetVnetBE)(NetClientState *, bool);
>>  typedef struct SocketReadState SocketReadState;
>>  typedef void (SocketReadStateFinalize)(SocketReadState *rs);
>> +typedef void (NetAnnounce)(NetClientState *);
>>  
>>  typedef struct NetClientInfo {
>>  NetClientDriver type;
>> @@ -84,6 +85,7 @@ typedef struct NetClientInfo {
>>  SetVnetHdrLen *set_vnet_hdr_len;
>>  SetVnetLE *set_vnet_le;
>>  SetVnetBE *set_vnet_be;
>> +NetAnnounce *announce;
>>  } NetClientInfo;
>>  
>>  struct NetClientState {
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 3b19a4a..5c1d8dc 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -113,9 +113,15 @@ static void qemu_announce_self_iter(NICState *nic, void 
>> *opaque)
>>  int len;
>>  
>>  trace_qemu_announce_self_iter(qemu_ether_ntoa(>conf->macaddr));
>> +
>>  len = announce_self_create(buf, nic->conf->macaddr.a);
>>  
>>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>> +
>> +/* if the NIC provides it's own announcement support, use it as well */
>> +if (nic->ncs->info->announce) {
>> +nic->ncs->info->announce(nic->ncs);
>> +}
>>  }
> 
> Combining them doesn't necessarily seem like a bad thing; but
> it does change the timing quite a bit - at the moment the QEMU RARPs
> are sent at the end of migration, while the Virtio ARPs are sent
> when the guest starts running which is quite a bit later.
> 
> In many ways the 'when the guest starts' is better, given that libvirt
> etc has had a chance to reconfigure the networking,  but I'm not that
> sure if it's safe to move the existing one - I had considered *adding*
> another shot of the current mechanism after the guest is started.
> 

Yes, the timing of things have been giving me some issues, but I wanted to post
this patch to get some comments just like this one..

I've wondered why they qemu one happens before the guest starts running.

> I certainly think it's wrong to do the virtio announce at the earlier
> point.
>

I see.

> See also Germano's thread of being able to retrigger the announce
> at arbitrary points, and the series I posted a couple of days ago
> to extend the length/timing of the announce.
> 

That's kind of what prompted me to do try this.  The issue with just
exposing qemu_announce_self() is that RARPS just aren't enough in
some cases (vlans, multicast).  This attempts to add the callback
like Jason mentioned, but then we get timer interactions between the
virtio-net timers and qemu one.

-vlad

> Dave
> 
>> -- 
>> 2.7.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

Re: [Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-21 Thread Vlad Lungu

Resubmitted with errors fixed.

Regards,

Vlad



On 12/21/2016 01:01 AM, no-re...@patchew.org wrote:
> Hi,
>
> Your series seems to have some coding style problems. See output below for
> more information:
>
> Subject: [Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, 
> unescape module strings
> Message-id: 1482140507-23607-1-git-send-email-vlad.lu...@windriver.com
> Type: series
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
>
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
>
> # Useful git options
> git config --local diff.renamelimit 0
> git config --local diff.renames True
>
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; 
> then
> failed=1
> echo
> fi
> n=$((n+1))
> done
>
> exit $failed
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> 951c8be multiboot: copy the cmdline verbatim, unescape module strings
>
> === OUTPUT BEGIN ===
> Checking PATCH 1/1: multiboot: copy the cmdline verbatim, unescape module 
> strings...
> WARNING: line over 80 characters
> #42: FILE: hw/i386/multiboot.c:299:
> +next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), 
> initrd_filename);
>
> ERROR: do not use assignment in if condition
> #50: FILE: hw/i386/multiboot.c:304:
> +if ((next_space = strchr(tmpbuf, ' ')))
>
> ERROR: braces {} are necessary for all arms of this statement
> #50: FILE: hw/i386/multiboot.c:304:
> +if ((next_space = strchr(tmpbuf, ' ')))
> [...]
>
> total: 2 errors, 1 warnings, 48 lines checked
>
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
>
> === OUTPUT END ===
>
> Test command exited with code: 1
>
>
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCHv5] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-21 Thread Vlad Lungu

get_opt_value() truncates the value at the first comma
Use memcpy() instead
Unescape the module filename and parameters with get_opt_value()
before calling mb_add_cmdline()

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..d16777e 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
@@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
 
 if (initrd_filename) {
-char *next_initrd, not_last;
+char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
 
 mbs.offset_mods = mbs.mb_buf_size;
 
@@ -296,25 +296,26 @@ int load_multiboot(FWCfgState *fw_cfg,
 int mb_mod_length;
 uint32_t offs = mbs.mb_buf_size;
 
-next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
+next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), 
initrd_filename);
 not_last = *next_initrd;
-*next_initrd = '\0';
 /* if a space comes after the module filename, treat everything
after that as parameters */
-hwaddr c = mb_add_cmdline(, initrd_filename);
-if ((next_space = strchr(initrd_filename, ' ')))
+hwaddr c = mb_add_cmdline(, tmpbuf);
+next_space = strchr(tmpbuf, ' ');
+if (next_space) {
 *next_space = '\0';
-mb_debug("multiboot loading module: %s\n", initrd_filename);
-mb_mod_length = get_image_size(initrd_filename);
+}
+mb_debug("multiboot loading module: %s\n", tmpbuf);
+mb_mod_length = get_image_size(tmpbuf);
 if (mb_mod_length < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", initrd_filename);
+fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
 exit(1);
 }
 
 mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
mbs.mb_buf_size);
 mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
 
-load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
+load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
 mb_add_mod(, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
 
-- 
1.9.1

[Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-19 Thread Vlad Lungu

get_opt_value() truncates the value at the first comma
Use memcpy() instead
Unescape the module filename and parameters with get_opt_value()
before calling mb_add_cmdline()

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..efe11ae 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
@@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
 
 if (initrd_filename) {
-char *next_initrd, not_last;
+char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
 
 mbs.offset_mods = mbs.mb_buf_size;
 
@@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg,
 int mb_mod_length;
 uint32_t offs = mbs.mb_buf_size;
 
-next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
+next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), 
initrd_filename);
 not_last = *next_initrd;
-*next_initrd = '\0';
 /* if a space comes after the module filename, treat everything
after that as parameters */
-hwaddr c = mb_add_cmdline(, initrd_filename);
-if ((next_space = strchr(initrd_filename, ' ')))
+hwaddr c = mb_add_cmdline(, tmpbuf);
+if ((next_space = strchr(tmpbuf, ' ')))
 *next_space = '\0';
-mb_debug("multiboot loading module: %s\n", initrd_filename);
-mb_mod_length = get_image_size(initrd_filename);
+mb_debug("multiboot loading module: %s\n", tmpbuf);
+mb_mod_length = get_image_size(tmpbuf);
 if (mb_mod_length < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", initrd_filename);
+fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
 exit(1);
 }
 
 mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
mbs.mb_buf_size);
 mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
 
-load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
+load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
 mb_add_mod(, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
 
-- 
1.9.1

Re: [Qemu-devel] [PATCHv3] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-19 Thread Vlad Lungu



On 12/18/2016 10:25 PM, Eduardo Habkost wrote:
> On Thu, Dec 15, 2016 at 02:32:04PM +0200, Vlad Lungu wrote:
>> get_opt_value() truncates the value at the first comma
>> Use memcpy() instead
>> Unescape the module filename and parameters with get_opt_value()
>> before calling mb_add_cmdline()
>>
>> Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
>> ---
>>  hw/i386/multiboot.c | 19 +--
>>  1 file changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
>> index 387caa6..6b7b5a9 100644
>> --- a/hw/i386/multiboot.c
>> +++ b/hw/i386/multiboot.c
>> @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
>> char *cmdline)
>>  hwaddr p = s->offset_cmdlines;
>>  char *b = (char *)s->mb_buf + p;
>>  
>> -get_opt_value(b, strlen(cmdline) + 1, cmdline);
>> +memcpy(b, cmdline, strlen(cmdline) + 1);
>>  s->offset_cmdlines += strlen(b) + 1;
>>  return s->mb_buf_phys + p;
>>  }
>> @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
>>  mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
>>  
>>  if (initrd_filename) {
>> -char *next_initrd, not_last;
>> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
>>  
>>  mbs.offset_mods = mbs.mb_buf_size;
>>  
>> @@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg,
>>  int mb_mod_length;
>>  uint32_t offs = mbs.mb_buf_size;
>>  
>> -next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
>> +next_initrd = (char *)get_opt_value(tmpbuf, 
>> strlen(initrd_filename) + 1, initrd_filename);
> I would prefer to use sizeof(initrd_filename) like Paolo
> suggested.
sizeof(initrd_filename) is 8 (on my machine, x86_64). Maybe sizeof(tmpbuf) 
would be a better idea :-)

Regards,
Vlad

[Qemu-devel] [PATCHv3] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-15 Thread Vlad Lungu

get_opt_value() truncates the value at the first comma
Use memcpy() instead
Unescape the module filename and parameters with get_opt_value()
before calling mb_add_cmdline()

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..6b7b5a9 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
@@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
 
 if (initrd_filename) {
-char *next_initrd, not_last;
+char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
 
 mbs.offset_mods = mbs.mb_buf_size;
 
@@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg,
 int mb_mod_length;
 uint32_t offs = mbs.mb_buf_size;
 
-next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
+next_initrd = (char *)get_opt_value(tmpbuf, 
strlen(initrd_filename) + 1, initrd_filename);
 not_last = *next_initrd;
-*next_initrd = '\0';
 /* if a space comes after the module filename, treat everything
after that as parameters */
-hwaddr c = mb_add_cmdline(, initrd_filename);
-if ((next_space = strchr(initrd_filename, ' ')))
+hwaddr c = mb_add_cmdline(, tmpbuf);
+if ((next_space = strchr(tmpbuf, ' ')))
 *next_space = '\0';
-mb_debug("multiboot loading module: %s\n", initrd_filename);
-mb_mod_length = get_image_size(initrd_filename);
+mb_debug("multiboot loading module: %s\n", tmpbuf);
+mb_mod_length = get_image_size(tmpbuf);
 if (mb_mod_length < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", initrd_filename);
+fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
 exit(1);
 }
 
 mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
mbs.mb_buf_size);
 mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
 
-load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
+load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
 mb_add_mod(, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
 
-- 
1.9.1

Re: [Qemu-devel] [PATCHv2] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-15 Thread Vlad Lungu



On 12/15/2016 02:29 PM, Paolo Bonzini wrote:
>
> On 15/12/2016 12:34, Vlad Lungu wrote:
>>
>> On 12/15/2016 12:56 PM, Paolo Bonzini wrote:
>>> On 15/12/2016 11:03, Vlad Lungu wrote:
>>>>  
>>>>  if (initrd_filename) {
>>>> -char *next_initrd, not_last;
>>>> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
>>>>  
>>>>  mbs.offset_mods = mbs.mb_buf_size;
>>>>  
>>>> @@ -299,22 +299,24 @@ int load_multiboot(FWCfgState *fw_cfg,
>>>>  next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
>>>>  not_last = *next_initrd;
>>>>  *next_initrd = '\0';
>>>> +/* Unescape the filename + eventual arguments */
>>>> +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, 
>>>> initrd_filename);
>>> No need to call get_opt_value twice; you can call it once since you have
>>> declared tmpbuf to hold the entire size of initrd_filename.
>> So what you're saying is to go with
>>
>>  next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), 
>> initrd_filename);
>>
>> and delete the second call to get_opt_value().
>>
>> Is that right?
> Yes, thanks!
>
I'll delete

*next_initrd = '\0';

too, since we're operating on tmpbuf after get_opt_value(), and it's already 
NUL-terminated.

Regards,
Vlad

Re: [Qemu-devel] [PATCHv2] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-15 Thread Vlad Lungu



On 12/15/2016 12:56 PM, Paolo Bonzini wrote:
>
> On 15/12/2016 11:03, Vlad Lungu wrote:
>>  
>>  if (initrd_filename) {
>> -char *next_initrd, not_last;
>> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
>>  
>>  mbs.offset_mods = mbs.mb_buf_size;
>>  
>> @@ -299,22 +299,24 @@ int load_multiboot(FWCfgState *fw_cfg,
>>  next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
>>  not_last = *next_initrd;
>>  *next_initrd = '\0';
>> +/* Unescape the filename + eventual arguments */
>> +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, 
>> initrd_filename);
> No need to call get_opt_value twice; you can call it once since you have
> declared tmpbuf to hold the entire size of initrd_filename.
So what you're saying is to go with

 next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), 
initrd_filename);

and delete the second call to get_opt_value().

Is that right?

Regards,
Vlad

[Qemu-devel] [PATCHv2] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-15 Thread Vlad Lungu

get_opt_value() truncates the value at the first comma
Use memcpy() instead
Unescape the module filename and parameters with get_opt_value()
before calling mb_add_cmdline()

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..37e27f2 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
@@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
 
 if (initrd_filename) {
-char *next_initrd, not_last;
+char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
 
 mbs.offset_mods = mbs.mb_buf_size;
 
@@ -299,22 +299,24 @@ int load_multiboot(FWCfgState *fw_cfg,
 next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
 not_last = *next_initrd;
 *next_initrd = '\0';
+/* Unescape the filename + eventual arguments */
+get_opt_value(tmpbuf, strlen(initrd_filename) + 1, 
initrd_filename);
 /* if a space comes after the module filename, treat everything
after that as parameters */
-hwaddr c = mb_add_cmdline(, initrd_filename);
-if ((next_space = strchr(initrd_filename, ' ')))
+hwaddr c = mb_add_cmdline(, tmpbuf);
+if ((next_space = strchr(tmpbuf, ' ')))
 *next_space = '\0';
-mb_debug("multiboot loading module: %s\n", initrd_filename);
-mb_mod_length = get_image_size(initrd_filename);
+mb_debug("multiboot loading module: %s\n", tmpbuf);
+mb_mod_length = get_image_size(tmpbuf);
 if (mb_mod_length < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", initrd_filename);
+fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
 exit(1);
 }
 
 mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
mbs.mb_buf_size);
 mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
 
-load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
+load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
 mb_add_mod(, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
 
-- 
1.9.1

Re: [Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-15 Thread Vlad Lungu

Sorry about replying to myself, I only had one coffee today.

There is no need for mb_add_modstring(), we can simply  do
get_opt_value(tmpbuf,..) then mb_add_cmdline(, tmpbuf)

Regards,
Vlad

On 12/15/2016 11:45 AM, Vlad Lungu wrote:
> get_opt_value() truncates the value at the first comma
> Rename mb_add_cmdline() to mb_add_modstring()
> Unescape filename too
> Add new mb_add_cmdline() using  memcpy()
>
> Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
> ---
>  hw/i386/multiboot.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
> index 387caa6..bfb52fc 100644
> --- a/hw/i386/multiboot.c
> +++ b/hw/i386/multiboot.c
> @@ -109,6 +109,16 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
> char *cmdline)
>  hwaddr p = s->offset_cmdlines;
>  char *b = (char *)s->mb_buf + p;
>  
> +memcpy(b, cmdline, strlen(cmdline) + 1);
> +s->offset_cmdlines += strlen(b) + 1;
> +return s->mb_buf_phys + p;
> +}
> +
> +static uint32_t mb_add_modstring(MultibootState *s, const char *cmdline)
> +{
> +hwaddr p = s->offset_cmdlines;
> +char *b = (char *)s->mb_buf + p;
> +
>  get_opt_value(b, strlen(cmdline) + 1, cmdline);
>  s->offset_cmdlines += strlen(b) + 1;
>  return s->mb_buf_phys + p;
> @@ -287,7 +297,7 @@ int load_multiboot(FWCfgState *fw_cfg,
>  mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
>  
>  if (initrd_filename) {
> -char *next_initrd, not_last;
> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
>  
>  mbs.offset_mods = mbs.mb_buf_size;
>  
> @@ -301,20 +311,21 @@ int load_multiboot(FWCfgState *fw_cfg,
>  *next_initrd = '\0';
>  /* if a space comes after the module filename, treat everything
> after that as parameters */
> -hwaddr c = mb_add_cmdline(, initrd_filename);
> +hwaddr c = mb_add_modstring(, initrd_filename);
>  if ((next_space = strchr(initrd_filename, ' ')))
>  *next_space = '\0';
> -mb_debug("multiboot loading module: %s\n", initrd_filename);
> -mb_mod_length = get_image_size(initrd_filename);
> +get_opt_value(tmpbuf, strlen(initrd_filename) + 1, 
> initrd_filename);
> +mb_debug("multiboot loading module: %s\n", tmpbuf);
> +mb_mod_length = get_image_size(tmpbuf);
>  if (mb_mod_length < 0) {
> -fprintf(stderr, "Failed to open file '%s'\n", 
> initrd_filename);
> +fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
>  exit(1);
>  }
>  
>  mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
> mbs.mb_buf_size);
>  mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
>  
> -load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
> +load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
>  mb_add_mod(, mbs.mb_buf_phys + offs,
> mbs.mb_buf_phys + offs + mb_mod_length, c);
>

[Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-15 Thread Vlad Lungu

get_opt_value() truncates the value at the first comma
Rename mb_add_cmdline() to mb_add_modstring()
Unescape filename too
Add new mb_add_cmdline() using  memcpy()

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..bfb52fc 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,6 +109,16 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
+memcpy(b, cmdline, strlen(cmdline) + 1);
+s->offset_cmdlines += strlen(b) + 1;
+return s->mb_buf_phys + p;
+}
+
+static uint32_t mb_add_modstring(MultibootState *s, const char *cmdline)
+{
+hwaddr p = s->offset_cmdlines;
+char *b = (char *)s->mb_buf + p;
+
 get_opt_value(b, strlen(cmdline) + 1, cmdline);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
@@ -287,7 +297,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
 
 if (initrd_filename) {
-char *next_initrd, not_last;
+char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
 
 mbs.offset_mods = mbs.mb_buf_size;
 
@@ -301,20 +311,21 @@ int load_multiboot(FWCfgState *fw_cfg,
 *next_initrd = '\0';
 /* if a space comes after the module filename, treat everything
after that as parameters */
-hwaddr c = mb_add_cmdline(, initrd_filename);
+hwaddr c = mb_add_modstring(, initrd_filename);
 if ((next_space = strchr(initrd_filename, ' ')))
 *next_space = '\0';
-mb_debug("multiboot loading module: %s\n", initrd_filename);
-mb_mod_length = get_image_size(initrd_filename);
+get_opt_value(tmpbuf, strlen(initrd_filename) + 1, 
initrd_filename);
+mb_debug("multiboot loading module: %s\n", tmpbuf);
+mb_mod_length = get_image_size(tmpbuf);
 if (mb_mod_length < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", initrd_filename);
+fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
 exit(1);
 }
 
 mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
mbs.mb_buf_size);
 mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
 
-load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
+load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
 mb_add_mod(, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
 
-- 
1.9.1

Re: [Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim

2016-12-15 Thread Vlad Lungu



On 12/15/2016 12:32 AM, Paolo Bonzini wrote:
>>>> In other words: this fixes the mb_add_cmdline(kcmdline) case, and
>>>> doesn't break comma escaping on the initrd case (because it was
>>>> already broken). I don't see a problem with this patch.
>>> ... there is one case of comma escaping that wasn't broken:
>>>
>>> $ qemu-system-x86_64 -kernel foo -initrd '/tmp/one
>>> arg,,with,,commas,/tmp/another arg,,with,,commas'
>>>
>> Oh, I didn't notice the whitespace-based split for initrd
>> arguments.
So that's how it works :-)
>> This is messier than I thought. Maybe the simplest solution is to
>> inline mb_add_cmdline() at both callers, and change the kcmdline
>> one to use memcpy().
>

OK, I have a new version that does with memcpy for the cmdline, with
get_opt_value() for the modules and also unescapes the filename for
get_image_size()/load_image() . You still can't have spaces in filenames,
so maybe a new scheme should be devised for this.

Regards,
Vlad

[Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim

2016-12-14 Thread Vlad Lungu



get_opt_value() truncates the value at the first comma
Use memcpy() instead

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..b4495ad 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, 
const char *cmdline)

 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;

-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
--
1.9.1

[Qemu-devel] [PATCH] multiboot: copy the cmdline verbatim

2016-12-14 Thread Vlad Lungu

get_opt_value() truncates the value at the first comma.
Use memcpy() instead.

Signed-off-by: Vlad Lungu <vlad.lu...@windriver.com>
---
 hw/i386/multiboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..b4495ad 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
-- 
1.9.1

Re: [Qemu-devel] [PATCH v3 2/2] rtl8139: correctly track full receive buffer in standard mode

2015-09-01 Thread Vlad Yasevich

On 08/31/2015 11:15 PM, Jason Wang wrote:
> 
> 
> On 08/31/2015 10:24 PM, Vlad Yasevich wrote:
>> On 08/31/2015 05:59 AM, Jason Wang wrote:
>>>
>>> On 08/28/2015 10:06 PM, Vladislav Yasevich wrote:
>>>> In standard operation mode, when the receive ring buffer
>>>> is full, the buffer actually appears empty to the driver since
>>>> the RxBufAddr (the location we wirte new data to) and RxBufPtr
>>>> (the location guest would stat reading from) are the same.
>>>> As a result, the call to rtl8139_RxBufferEmpty ends up
>>>> returning true indicating that the receive buffer is empty.
>>>> This would result in the next packet overwriting the recevie buffer
>>>> again and stalling receive operations.
>>>>
>>>> This patch tracks the number of unread bytes in the rxbuffer
>>>> using an unused C+ register.  On every read and write, the
>>>> number is adjsted and the special case of a full buffer is also
>>>> trapped.
>>>>
>>>> The C+ register trick is used to simplify migration and not require
>>>> a new machine type.  This register is not used in regular mode
>>>> and C+ mode doesn't have the same issue.
>>>>
>>>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>>>> ---
>>>>  hw/net/rtl8139.c | 45 +
>>>>  1 file changed, 41 insertions(+), 4 deletions(-)
>>> I'm not sure this can happen. For example, looks like the following
>>> check in rtl8139_do_receive():
>>>
>>> if (avail != 0 && size + 8 >= avail)
>>> {
>>>
>>> can guarantee there's no overwriting?
>>>
>> The problem is the calculation of avail.  When the buffer is full,
>> avail will be the the size of the receive buffer.  So the test
>> above will be false because the driver thinks there is actually
>> enough room.
>>
>> With his patch, 'avail' will be calculated to 0.
>>
>> -vlad
>>
> 
> If believe the condition size + 8 >= avail can guarantee that the buffer
> won't be full (if we allow size + 8 == avail, buffer will be full)? So
> avail == 0 means the buffer is empty. Or is there anything I miss?
> 

So the issue is that the RxBufAddr is 4 byte aligned, but when we do
availability check above, we don't 4 byte align the size+8 calculation.
That causes the check above to fail when it should succeed and we never
catch the overflow condition.

I'll resubmit with a simple alignment patch that makes this work.

-vlad

Re: [Qemu-devel] [PATCH v3 2/2] rtl8139: correctly track full receive buffer in standard mode

2015-08-31 Thread Vlad Yasevich

On 08/31/2015 05:59 AM, Jason Wang wrote:
> 
> 
> On 08/28/2015 10:06 PM, Vladislav Yasevich wrote:
>> In standard operation mode, when the receive ring buffer
>> is full, the buffer actually appears empty to the driver since
>> the RxBufAddr (the location we wirte new data to) and RxBufPtr
>> (the location guest would stat reading from) are the same.
>> As a result, the call to rtl8139_RxBufferEmpty ends up
>> returning true indicating that the receive buffer is empty.
>> This would result in the next packet overwriting the recevie buffer
>> again and stalling receive operations.
>>
>> This patch tracks the number of unread bytes in the rxbuffer
>> using an unused C+ register.  On every read and write, the
>> number is adjsted and the special case of a full buffer is also
>> trapped.
>>
>> The C+ register trick is used to simplify migration and not require
>> a new machine type.  This register is not used in regular mode
>> and C+ mode doesn't have the same issue.
>>
>> Signed-off-by: Vladislav Yasevich <vyase...@redhat.com>
>> ---
>>  hw/net/rtl8139.c | 45 +
>>  1 file changed, 41 insertions(+), 4 deletions(-)
> 
> I'm not sure this can happen. For example, looks like the following
> check in rtl8139_do_receive():
> 
> if (avail != 0 && size + 8 >= avail)
> {
> 
> can guarantee there's no overwriting?
> 

The problem is the calculation of avail.  When the buffer is full,
avail will be the the size of the receive buffer.  So the test
above will be false because the driver thinks there is actually
enough room.

With his patch, 'avail' will be calculated to 0.

-vlad

Re: [Qemu-devel] [PATCH 2/2] rtl8139: correctly track full receive buffer in standard mode

2015-08-26 Thread Vlad Yasevich

On 08/26/2015 08:36 AM, Stefan Hajnoczi wrote:
 On Fri, Aug 21, 2015 at 02:59:25PM -0700, Vladislav Yasevich wrote:
 In standard operation mode, when the receive ring buffer
 is full, the buffer actually appears empty to the driver since
 the RxBufAddr (the location we wirte new data to) and RxBufPtr
 (the location guest would stat reading from) are the same.
 As a result, the call to rtl8139_RxBufferEmpty ends up
 returning true indicating that the receive buffer is empty.
 This would result in the next packet overwriting the recevie buffer
 again and stalling receive operations.

 This patch catches the receive buffer full condition
 using an unused C+ register.  This is done to simplify
 migration and not require a new machine type.

 Signed-off-by: Vladislav Yasevich vyase...@redhat.com
 ---
  hw/net/rtl8139.c | 36 ++--
  1 file changed, 34 insertions(+), 2 deletions(-)
 
 The rtl8139 code duplicates the following expression in several places:
 
   MOD2(s-RxBufferSize + s-RxBufAddr - s-RxBufPtr, s-RxBufferSize);
 
 It may be cleaner to keep a rx_unread_bytes counter so that all these
 users can simply look at that variable.
 
 That cleanup also eliminates the rx full vs empty problem because then
 we'll know whether rx_unread_bytes == 0 or rx_unread_bytes ==
 s-RxBufferSize.
 
 The same trick of stashing the value in s-currCPlusRxDesc could be
 used.
 

Good idea.  I'll give it a try.


 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index 359e001..3d572ab 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -816,6 +816,23 @@ static int rtl8139_can_receive(NetClientState *nc)
  }
  }
  
 +static void rtl8139_set_rxbuf_full(RTL8139State *s, bool full)
 +{
 +/* In standard mode, C+ RxDesc isn't used.  Reuse it
 + * to store the rx_buf_full status.
 + */
 
 assert(!s-cplus_enabled)?
 
 +s-currCPlusRxDesc = full;
 +DPRINTF(received: rx buffer full\n);
 +}
 +
 +static bool rtl8139_rxbuf_full(RTL8139State *s)
 +{
 +/* In standard mode, C+ RxDesc isn't used.  Reuse it
 + * to store the rx_buf_full status.
 + */
 
 assert(!s-cplus_enabled)?
 
 @@ -2601,6 +2630,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, 
 uint32_t val)
  /* this value is off by 16 */
  s-RxBufPtr = MOD2(val + 0x10, s-RxBufferSize);
  
 +/* We just read data, clear full buffer state */
 +rtl8139_set_rxbuf_full(s, false);
 +
  /* more buffer space may be available so try to receive */
  qemu_flush_queued_packets(qemu_get_queue(s-nic));
 
 What if the guest writes this register while we're in C+ mode?
 

In C+ mode the guest uses a descriptor ring instead of liner buffer so a well 
behaved
C+ guest wouldn't write this value.  Validated this with a linux guest.
I guess a malicious guest could do this, but then it could do a lot of other 
things too.

-vlad

Re: [Qemu-devel] [PATCH 1/2] rtl8139: Do not consume the packet during overflow in standard mode.

2015-08-26 Thread Vlad Yasevich

On 08/26/2015 08:18 AM, Stefan Hajnoczi wrote:
 On Fri, Aug 21, 2015 at 02:59:24PM -0700, Vladislav Yasevich wrote:
 When operation in standard mode, we currently return the size
 of packet during buffer overflow.  This consumes the overflow
 packet.  Return 0 instead so we can re-process the overflow packet
 when we have room.

 This fixes issues with lost/dropped fragments of large messages.

 Signed-off-by: Vladislav Yasevich vyase...@redhat.com
 ---
  hw/net/rtl8139.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index edbb61c..359e001 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -1157,7 +1157,7 @@ static ssize_t rtl8139_do_receive(NetClientState *nc, 
 const uint8_t *buf, size_t
  s-IntrStatus |= RxOverflow;
  ++s-RxMissed;
  rtl8139_update_irq(s);
 -return size_;
 +return 0;
 
 Every .receive() return 0 must be paired with a
 qemu_flush_queued_packets() call.

Isn't that already there.  A few drivers already return 0 to trigger a requeue. 
 What's
missing?


 
 Is rtl8139_RxBufPtr_write() guaranteed to be called when the guest
 refills rx buffers?
 

It calls it on read, once it's consumed a packet, to advance to the next packet 
in the
buffer.

-vlad

Re: [Qemu-devel] [PATCH v2 0/2] rtl8139: Fix buffer overflow in standard mode

2015-08-26 Thread Vlad Yasevich

On 08/26/2015 03:51 PM, Vladislav Yasevich wrote:
 When rtl8139 card is running in standard mode, it is very easy
 to overlflow and the receive buffer and get into a siutation
 where all packets are dropped.  Simply reproduction case is
 to ping the guest from the host with 6500 byte packets.  
 
 There are actually 2 problems here.
  1) When the rtl8129 buffer is overflow, the card emulation
 returns the size of the packet back to queue transmission.
 This signals successful reception even though the packet
 has been dropped.  The proper solution is to return 0, so
 that the packet is re-queued and will be resubmitted later.
 
  2) When packets are sized such that the fragments end up completely
 filling the receive buffer without overflow, the device thinks
 that the buffer is actually empty (instead of full).  This causes
 next packet to over-write the existing packets.  With the above
 ping reproducer, ever ICMP packet fills the buffer and thus keeps
 overwriting the previous packet and never waking up the guest.
 The solution here is track the number of unread bytes separately
 so we would know if we have anything in buffer to read or not.
 
 V2: instead of tracking buffer_full condition, changed the code, as
 suggested by Stefan Hajnoczi, to track the number of unread bytes
 instead.  We initialize it to 0 at the start, adjust it on every
 receive from the network and read from the guest and can set
 the number of unread of bytes to full buffer size when the buffer
 full.
 
 Vladislav Yasevich (2):
   rtl8139: Do not consume the packet during overflow in standard mode.
   rtl8139: correctly track full receive buffer in standard mode
 

Self nack.  The second patch is wrong.  Will resubmit when fixed.

-vlad

  hw/net/rtl8139.c | 44 +++-
  1 file changed, 39 insertions(+), 5 deletions(-)

Re: [Qemu-devel] [Qemu-discuss] GRO not happening in VM with VxLAN

2015-07-01 Thread Vlad Yasevich

On 07/01/2015 02:50 PM, Santosh R wrote:
 Since the vxlan UDP header checksum is 0, udp_tunnel_gro_complete (called
 via vxlan_gro_complete) is setting SKB_GSO_UDP_TUNNEL in
 skb_shinfo(skb)-gso_type.
 Later when bridge interface tries to forward this packet to tap interface
 (br_dev_queue_push_xmit - __dev_queue_xmit - validate_xmit_skb)
 netif_needs_gso will succeed (skb_gso_ok returns 0)
 resulting in packet segmentation (which was earlier aggregated) as tap
 interface doesn't have this feature. Is there a way to enable
 tx-udp_tnl-segmentation for tap interface?

No.  Those features are set by qemu when the guest is created and
mirror any features that are supported by the guest.  So, if the guest
virtio doesn't have that feature, neither will the tap device.

Which kernel version are you using on the host?

-vlad

 
 # ethtool -K tap0 tx-udp_tnl-segmentation on
 Could not change any device features
 
 # ethtool -k tap0 | grep -i tnl
 tx-udp_tnl-segmentation: off [fixed]
 
 On Tue, Jun 30, 2015 at 12:24 PM, Santosh R skrasta...@gmail.com wrote:
 

 On Mon, Jun 29, 2015 at 9:24 PM, Santosh R skrasta...@gmail.com wrote:


 On Mon, Jun 29, 2015 at 9:04 PM, Vlad Yasevich vyase...@redhat.com
 wrote:

 On 06/29/2015 01:46 AM, Santosh R wrote:
 All,

I am testing VxLAN performance in VM. For this I am using below
 command
 to bring up the VM.
 # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device
 virtio-net-pci,netdev=tap0,mac=00:11:22:33:44:55 -netdev
 tap,id=tap0,script=no,vhost=on  -drive file=/root/vdisk_rhel65.img 

 This is resulting in lower throughput in VM. On looking further I found
 that GRO is not happing on the tap interface.
 Using tcpdump I could see GRO happeing in physcial nic, vxlan and
 bridge
 interface. but _not_ on tap interface.

 I don't think this is GRO.  GRO  only happens at the lowest possible
 layer.
 For packets coming from external sources, that happens just above the nic
 when the packet enters the host for the first time.  From then on, the
 features
 of other devices determine if the large aggregated packet will be
 segmented
 again or not.



 So, I thought to use veth pair instead of the tap interface in VM.
 What is
 the command to do this?

 veth pair will not help with this.  you need a tap or a macvtap to have
 VM  talk to host.

 Also,
 1. Why is rx-checksumming off for tap interface?

 because tap doesn't do checksum validation.  it allows lower level
 devices or
 VM to do so.

 2. Why is GRO not happening with tap interface?


 Again.  I don't think it's GRO.  Try capturing traffic at tap and see if
 you get
 large packets (more the 1 mtu long) on the tap.  If not, then someone
 else is fragmenting
 packets and you need to find who.

 [Santosh]:  Thanks for the reply.
 With VxLAN, aggregated packets are seen on physical, VxLAN and bridge
 interface but _not_ on the tap interface.
 Without VxLAN, I do see aggregated (GRO) packets all the way upto VM
 (i.e. physical, bridge and tap interface).
 As you rightly pointed, looks like VxLAN packets are getting fragmented.
 Any pointers to look for?


 By the way, I was referring below link for VxLAN testing and ran into the
 GRO issue.
 This mentions The veth0 is suppose to be connected to the VM, while the
 veth1 is connected to the Linux bridge.
 https://community.mellanox.com/docs/DOC-1473

Re: [Qemu-devel] [Qemu-discuss] GRO not happening in VM with VxLAN

2015-06-29 Thread Vlad Yasevich

On 06/29/2015 01:46 AM, Santosh R wrote:
 All,
 
I am testing VxLAN performance in VM. For this I am using below command
 to bring up the VM.
 # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device
 virtio-net-pci,netdev=tap0,mac=00:11:22:33:44:55 -netdev
 tap,id=tap0,script=no,vhost=on  -drive file=/root/vdisk_rhel65.img 
 
 This is resulting in lower throughput in VM. On looking further I found
 that GRO is not happing on the tap interface.
 Using tcpdump I could see GRO happeing in physcial nic, vxlan and bridge
 interface. but _not_ on tap interface.

I don't think this is GRO.  GRO  only happens at the lowest possible layer.
For packets coming from external sources, that happens just above the nic
when the packet enters the host for the first time.  From then on, the features
of other devices determine if the large aggregated packet will be segmented
again or not.


 
 So, I thought to use veth pair instead of the tap interface in VM. What is
 the command to do this?

veth pair will not help with this.  you need a tap or a macvtap to have
VM  talk to host.

 Also,
 1. Why is rx-checksumming off for tap interface?

because tap doesn't do checksum validation.  it allows lower level devices or
VM to do so.

 2. Why is GRO not happening with tap interface?
 

Again.  I don't think it's GRO.  Try capturing traffic at tap and see if you get
large packets (more the 1 mtu long) on the tap.  If not, then someone else is 
fragmenting
packets and you need to find who.

-vlad

 Pls copy me in all the replies as I am not on the list.
 
 # brctl show
 bridge name bridge id   STP enabled interfaces
 br-vx   8000.323a9146b2d2   no  tap0
 vxlan0
 # ethtool -k eth35
 Features for eth35:
 rx-checksumming: on
 tx-checksumming: on
 tx-checksum-ipv4: on
 tx-checksum-ip-generic: off [fixed]
 tx-checksum-ipv6: on
 tx-checksum-fcoe-crc: off [fixed]
 tx-checksum-sctp: off [fixed]
 scatter-gather: on
 tx-scatter-gather: on
 tx-scatter-gather-fraglist: off [fixed]
 tcp-segmentation-offload: on
 tx-tcp-segmentation: on
 tx-tcp-ecn-segmentation: on
 tx-tcp6-segmentation: on
 udp-fragmentation-offload: off [fixed]
 generic-segmentation-offload: on
 generic-receive-offload: on
 large-receive-offload: off [fixed]
 rx-vlan-offload: on
 tx-vlan-offload: on
 ntuple-filters: off [fixed]
 receive-hashing: off [fixed]
 highdma: on
 rx-vlan-filter: off [fixed]
 vlan-challenged: off [fixed]
 tx-lockless: off [fixed]
 netns-local: off [fixed]
 tx-gso-robust: off [fixed]
 tx-fcoe-segmentation: off [fixed]
 tx-gre-segmentation: off [fixed]
 tx-ipip-segmentation: off [fixed]
 tx-sit-segmentation: off [fixed]
 tx-udp_tnl-segmentation: on
 tx-mpls-segmentation: off [fixed]
 fcoe-mtu: off [fixed]
 tx-nocache-copy: off
 loopback: off [fixed]
 rx-fcs: off [fixed]
 rx-all: off [fixed]
 tx-vlan-stag-hw-insert: off [fixed]
 rx-vlan-stag-hw-parse: off [fixed]
 rx-vlan-stag-filter: off [fixed]
 l2-fwd-offload: off [fixed]
 busy-poll: on [fixed]
 
 # ethtool -k vxlan0
 Features for vxlan0:
 rx-checksumming: on
 tx-checksumming: on
 tx-checksum-ipv4: off [fixed]
 tx-checksum-ip-generic: on
 tx-checksum-ipv6: off [fixed]
 tx-checksum-fcoe-crc: off [fixed]
 tx-checksum-sctp: off [fixed]
 scatter-gather: on
 tx-scatter-gather: on
 tx-scatter-gather-fraglist: off [fixed]
 tcp-segmentation-offload: on
 tx-tcp-segmentation: on
 tx-tcp-ecn-segmentation: on
 tx-tcp6-segmentation: on
 udp-fragmentation-offload: on
 generic-segmentation-offload: on
 generic-receive-offload: on
 large-receive-offload: off [fixed]
 rx-vlan-offload: off [fixed]
 tx-vlan-offload: on
 ntuple-filters: off [fixed]
 receive-hashing: off [fixed]
 highdma: off [fixed]
 rx-vlan-filter: off [fixed]
 vlan-challenged: off [fixed]
 tx-lockless: on [fixed]
 netns-local: off [fixed]
 tx-gso-robust: off [fixed]
 tx-fcoe-segmentation: off [fixed]
 tx-gre-segmentation: off [fixed]
 tx-ipip-segmentation: off [fixed]
 tx-sit-segmentation: off [fixed]
 tx-udp_tnl-segmentation: off [fixed]
 tx-mpls-segmentation: off [fixed]
 fcoe-mtu: off [fixed]
 tx-nocache-copy: off
 loopback: off [fixed]
 rx-fcs: off [fixed]
 rx-all: off [fixed]
 tx-vlan-stag-hw-insert: on
 rx-vlan-stag-hw-parse: off [fixed]
 rx-vlan-stag-filter: off [fixed]
 l2-fwd-offload: off [fixed]
 busy-poll: off [fixed]
 
 # ethtool -k tap0
 Features for tap0:
 rx-checksumming: off [fixed]
 tx-checksumming: on
 tx-checksum-ipv4: off [fixed]
 tx-checksum-ip-generic: on
 tx-checksum-ipv6: off [fixed]
 tx-checksum-fcoe-crc: off [fixed]
 tx-checksum-sctp: off [fixed]
 scatter-gather: on
 tx-scatter-gather: on
 tx-scatter-gather-fraglist: on
 tcp-segmentation-offload: on
 tx-tcp-segmentation: on
 tx-tcp-ecn-segmentation

Re: [Qemu-devel] [PATCH] net: Fix link state inter-dependencies between NIC and backend

2015-04-02 Thread Vlad Yasevich

On 04/02/2015 05:21 AM, Stefan Hajnoczi wrote:
 On Thu, Apr 02, 2015 at 08:11:15AM +0200, Michael S. Tsirkin wrote:
 On Wed, Apr 01, 2015 at 07:55:38PM -0400, Vladislav Yasevich wrote:
 Commit 02d38fcb2caa4454cf4ed728d5908c3cc9ba47be
 net: Update netdev peer on link change

 introduced a link state dependency between the backend
 netdev and the nic.  If the admin turned off the link
 on the backend, the nic link state was set to down because
 the link had no access to the network at all.  This created
 some inconsitet behavior for someone who wanted to play
 around the links states.
  1) Turning off the nic link and then turning on the backend
 link (even if it was already on) would turn on the nic link
 again.
  2) Turning off the backend link and then turning on the nic
 link would turn on the link in the VM, but would not change
 the backend state resulting in a broken/unreachable network.

 This patch attempts to correct these behaviors.  The patch treats
 the nic-backend relationship as two ends of a wire.  Each end tracks
 the administratively set link state in addition to actual link
 state.  Thus is is possible to set link down at each end effectively
 emulating plugging/unplugging the wire at either end.  The patch
 continues to preserve the old behavior where setting just
 nic side down does NOT impact the backend.

 Signed-off-by: Vladislav Yasevich vyase...@redhat.com

 I never understood the point of
 02d38fcb2caa4454cf4ed728d5908c3cc9ba47be.

 If you want to tell guest link is down,
 just set the nic link down.

 How about we just revert it?

 Then one can change link state on each end
 independently, with no notification,
 and no state to track.
 
 I agree.  The simple solution would be nice, unless you are aware of a
 use case where it causes problems.

It causes problems when the user turns off link on the backend.  Problems range 
from much
longer boot up times on older VMs to Windows networks not working.

The problem is the if you turn off the link on the backend, the link to the VM 
stays
up but can't really reach the network.

Now, you could say Don't do this and document it, but it's still ugly and 
causes
issues.

We could disable the link state change on backends and only allow it on NIC 
type devices.
That would also solve this.

-vlad
 
 Stefan

Re: [Qemu-devel] [Bug 1362755] [NEW] QEmu +2 does not route VLAN tagged packets, that are originated within the Hypervisor itself.

2014-08-29 Thread Vlad Yasevich

[ realized that the bug and reporter were non cc'd, updated cc list]

On 08/28/2014 02:40 PM, Thiago Martins wrote:
 Public bug reported:
 
 Guys,
 
 Trusty QEmu 2.0 Hypervisor fails to create a consistent virtual network.
 It does not route tagged VLAN packets.
 

The have a been a bunch of rather recent changes to the kernel to support
guest VLANs correctly.  The issues have been around TSO/GSO implementation
in the kernel.

Could try disabling TSO/GSO and tx checksums on the vlan devices in the guest
and see if it solves your problem?

If it does, could you try the kernel from
  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
turn the offloads back on and see if the problem is solved?

Thanks
-vlad

 That's it, it is impossible to use Trusty acting as a QEmu 2.0
 Hypervisor (metapakage `ubuntu-virt-server`), to make a basic virtual
 tagged network within itself. QEmu 2.X guest does not route traffic when
 with tagged VLANs!
 
 So, Trusty QEmu 2.0 Hypervisor cannot be used to host guests acting as
 firewalls / routers, and it have an easy to reproduce, connectivity
 problem.
 
 This network problem affects Ubuntu 14.04.1 (Linux-3.13.0-35-generic)
 with QEmu 2.0 (it also affects 14.10, Linux 3.16 - QEmu 2.1).
 
 I have this very same setup up and running, on about ~100 physical
 servers (others Trusty QEmu 2.0 Hypervisors), and in only a few of them,
 the QEmu Hypervisors dedicated to host guest acting as routers /
 firewalls, like a borger gateway for example, that it does not work
 as expected.
 
 One interesting thing to note is that, this BUG appear only, and only
 at, the QEmu Hypervisors dedicated to host guests that are used as
 `router / firewalls` (as I said above), others QEmu Hypervisors of my
 network does not suffer from this problem.
 
 Another interesting point is that it fails to route tagged VLAN packets
 only when these packets are originated from within the Hypervisor
 itself, I mean, packets from both host and other guests (not the
 router/firewall guest itself), suffer from this connectivity problem.
 
 As a workaroung / fix, Xen-4.4 can be used, instead of QEmu 2.0, as a
 border hypervisor. So, this proves that there is something wrong with
 QEmu.
 
 I already tested it with both `openvswitch-switch` and with `bridge-
 utils`, same bad results. So, don't waste your time trying `bridge-
 utils` (optional steps while reproducing it), you can keep OVS bridges
 from original design.
 
 I think that I'm using the best pratices to build this environment, as
 follows...
 
 
 * Topology *
 
 
 QEmu 2.0 Hypervisor - (qemu-host-1.domain.com - the border hypervisor):
 
 1- Physical machine with 3 NICs;
 2- Minimal Ubuntu 14.04.1 installed and upgraded;
 3- Packages installed: ubuntu-virt-server openvswitch-switch rdnssd tcpdump.
 
 - eth0 connected to the Internet - VLAN tag 10;
 - eth1 connected to the LAN1 - VLAN tag 100;
 - eth2 connected to the LAN2 - VLAN tag 200;
 
 
 Guest (guest-fw-1.domain.com - the border gateway itself - regular guest 
 acting as a router with iptables/ip6tables):
 
 1- Virtual Machine with 3 NICs (VirtIO);
 2- Minimal Virtual Machine Ubuntu 14.04.1 installed and upgraded;
 3- Packages installed: aiccu iptables vlan pv-grub-menu.
 
 
 OBS: You'll need `virt-manager` to connect at `qemu-host-1` to install
 `guest-fw-1`. Then, use `guest-fw-1` as a default gateway for your
 (virt-)lab network, including the `qemu-host-1` itself.
 
 
 Steps to reproduce
 
 
 * Preparing the `qemu-host-1` host:
 
 - Configure the /etc/network/interfaces with:
 
 ---
 # The loopback network interface
 auto lo
 iface lo inet loopback
 
 auto eth0
 iface eth0 inet manual
 up ip link set $IFACE up
 down ip link set $IFACE down
 
 auto eth1
 iface eth1 inet manual
   up ip link set dev $IFACE up
   down ip link set dev $IFACE down
 
 auto ovsbr1p1
 iface ovsbr1p1 inet6 auto
 
 iface ovsbr1p1 inet static
   address 192.168.1.10
   netmask 24
   gateway 192.168.1.1
 
 auto eth2
 iface eth2 inet manual
   up ip link set $IFACE up
   down ip link set $IFACE down
 ---
 
 
 - Creating the Hypervisor OVS Bridges:
 
 ovs-vsctl add-br ovsbr0
 ovs-vsctl add-br ovsbr1
 ovs-vsctl add-br ovsbr2
 
 
 - Attaching the bridges to the NICs:
 
 ovs-vsctl add-port ovsbr0 eth0
 ovs-vsctl add-port ovsbr1 eth1
 ovs-vsctl add-port ovsbr2 eth2
 
 
 - Creating the OVS internal tagged interface (best practice?), so the QEmu 
 Hypervisor itself can have its own IP (v4 and v6):
 
 ovs-vsctl add-port ovsbr1 ovsbr1p1 tag=100 -- set interface ovsbr1p1 
 type=internal
 ovs-vsctl set interface ovsbr1p1 mac=\32:ac:85:72:ab:fe\
 
 
  NOTE:
 
  * I'm fixing the MAC Address of ovsbr1p1 because I like to use IPv6
 with SLAAC, so, it remain fixed across host reboots.
 
 
 - Making Libvirt aware of OVS Bridges:
 
 Create 3 files, one for each bridge, like this (ovsbr0.xml, ovsbr1.xml
 and ovsbr2.xml):
 
 --- ovsbr0.xml contents:
 network
  nameovsbr0/name
  forward mode

Re: [Qemu-devel] [Bug 1362755] [NEW] QEmu +2 does not route VLAN tagged packets, that are originated within the Hypervisor itself.

2014-08-28 Thread Vlad Yasevich

On 08/28/2014 02:40 PM, Thiago Martins wrote:
 Public bug reported:
 
 Guys,
 
 Trusty QEmu 2.0 Hypervisor fails to create a consistent virtual network.
 It does not route tagged VLAN packets.
 

The have a been a bunch of rather recent changes to the kernel to support
guest VLANs correctly.  The issues have been around TSO/GSO implementation
in the kernel.

Could try disabling TSO/GSO and tx checksums on the vlan devices in the guest
and see if it solves your problem?

If it does, could you try the kernel from
  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
turn the offloads back on and see if the problem is solved?

Thanks
-vlad

 That's it, it is impossible to use Trusty acting as a QEmu 2.0
 Hypervisor (metapakage `ubuntu-virt-server`), to make a basic virtual
 tagged network within itself. QEmu 2.X guest does not route traffic when
 with tagged VLANs!
 
 So, Trusty QEmu 2.0 Hypervisor cannot be used to host guests acting as
 firewalls / routers, and it have an easy to reproduce, connectivity
 problem.
 
 This network problem affects Ubuntu 14.04.1 (Linux-3.13.0-35-generic)
 with QEmu 2.0 (it also affects 14.10, Linux 3.16 - QEmu 2.1).
 
 I have this very same setup up and running, on about ~100 physical
 servers (others Trusty QEmu 2.0 Hypervisors), and in only a few of them,
 the QEmu Hypervisors dedicated to host guest acting as routers /
 firewalls, like a borger gateway for example, that it does not work
 as expected.
 
 One interesting thing to note is that, this BUG appear only, and only
 at, the QEmu Hypervisors dedicated to host guests that are used as
 `router / firewalls` (as I said above), others QEmu Hypervisors of my
 network does not suffer from this problem.
 
 Another interesting point is that it fails to route tagged VLAN packets
 only when these packets are originated from within the Hypervisor
 itself, I mean, packets from both host and other guests (not the
 router/firewall guest itself), suffer from this connectivity problem.
 
 As a workaroung / fix, Xen-4.4 can be used, instead of QEmu 2.0, as a
 border hypervisor. So, this proves that there is something wrong with
 QEmu.
 
 I already tested it with both `openvswitch-switch` and with `bridge-
 utils`, same bad results. So, don't waste your time trying `bridge-
 utils` (optional steps while reproducing it), you can keep OVS bridges
 from original design.
 
 I think that I'm using the best pratices to build this environment, as
 follows...
 
 
 * Topology *
 
 
 QEmu 2.0 Hypervisor - (qemu-host-1.domain.com - the border hypervisor):
 
 1- Physical machine with 3 NICs;
 2- Minimal Ubuntu 14.04.1 installed and upgraded;
 3- Packages installed: ubuntu-virt-server openvswitch-switch rdnssd tcpdump.
 
 - eth0 connected to the Internet - VLAN tag 10;
 - eth1 connected to the LAN1 - VLAN tag 100;
 - eth2 connected to the LAN2 - VLAN tag 200;
 
 
 Guest (guest-fw-1.domain.com - the border gateway itself - regular guest 
 acting as a router with iptables/ip6tables):
 
 1- Virtual Machine with 3 NICs (VirtIO);
 2- Minimal Virtual Machine Ubuntu 14.04.1 installed and upgraded;
 3- Packages installed: aiccu iptables vlan pv-grub-menu.
 
 
 OBS: You'll need `virt-manager` to connect at `qemu-host-1` to install
 `guest-fw-1`. Then, use `guest-fw-1` as a default gateway for your
 (virt-)lab network, including the `qemu-host-1` itself.
 
 
 Steps to reproduce
 
 
 * Preparing the `qemu-host-1` host:
 
 - Configure the /etc/network/interfaces with:
 
 ---
 # The loopback network interface
 auto lo
 iface lo inet loopback
 
 auto eth0
 iface eth0 inet manual
 up ip link set $IFACE up
 down ip link set $IFACE down
 
 auto eth1
 iface eth1 inet manual
   up ip link set dev $IFACE up
   down ip link set dev $IFACE down
 
 auto ovsbr1p1
 iface ovsbr1p1 inet6 auto
 
 iface ovsbr1p1 inet static
   address 192.168.1.10
   netmask 24
   gateway 192.168.1.1
 
 auto eth2
 iface eth2 inet manual
   up ip link set $IFACE up
   down ip link set $IFACE down
 ---
 
 
 - Creating the Hypervisor OVS Bridges:
 
 ovs-vsctl add-br ovsbr0
 ovs-vsctl add-br ovsbr1
 ovs-vsctl add-br ovsbr2
 
 
 - Attaching the bridges to the NICs:
 
 ovs-vsctl add-port ovsbr0 eth0
 ovs-vsctl add-port ovsbr1 eth1
 ovs-vsctl add-port ovsbr2 eth2
 
 
 - Creating the OVS internal tagged interface (best practice?), so the QEmu 
 Hypervisor itself can have its own IP (v4 and v6):
 
 ovs-vsctl add-port ovsbr1 ovsbr1p1 tag=100 -- set interface ovsbr1p1 
 type=internal
 ovs-vsctl set interface ovsbr1p1 mac=\32:ac:85:72:ab:fe\
 
 
  NOTE:
 
  * I'm fixing the MAC Address of ovsbr1p1 because I like to use IPv6
 with SLAAC, so, it remain fixed across host reboots.
 
 
 - Making Libvirt aware of OVS Bridges:
 
 Create 3 files, one for each bridge, like this (ovsbr0.xml, ovsbr1.xml
 and ovsbr2.xml):
 
 --- ovsbr0.xml contents:
 network
  nameovsbr0/name
  forward mode='bridge'/
  bridge name='ovsbr0'/
  virtualport type='openvswitch

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich

On 04/02/2014 04:51 AM, Michael S. Tsirkin wrote:
 On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
 When a link change occurs on a backend (like tap), we currently do
 not propage such change to the nic.  As a result, when someone turns
 off a link on a tap device, for instance, then a guest doesn't see
 that change and continues to try to send traffic or run DHCP even
 though the lower-layer is disconnected.  This is OK when the network
 is set up as a HUB since the the guest may be connected to other HUB
 ports too, but when it's set up as a netdev, it makes thinkgs worse.

 The patch addresses this by setting the peers link down only when the
 peer is not a HUBPORT device.  With this patch, in the following config
   -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
 when net0 link is turned off, the guest e1000 shows lower-layer link
 down. This allows guests to boot much faster in such configurations.
 With windows guest, it also allows the network to recover properly
 since windows will not configure the link-local IPv4 address, and
 when the link is turned on, the proper address address is configured.

 Signed-off-by: Vlad Yasevich vyase...@redhat.com
 ---
  net/net.c | 26 +-
  1 file changed, 17 insertions(+), 9 deletions(-)

 diff --git a/net/net.c b/net/net.c
 index 0a88e68..8a084bf 100644
 --- a/net/net.c
 +++ b/net/net.c
 @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error 
 **errp)
  nc-info-link_status_changed(nc);
  }
  
 -/* Notify peer. Don't update peer link status: this makes it possible to
 - * disconnect from host network without notifying the guest.
 - * FIXME: is disconnected link status change operation useful?
 - *
 - * Current behaviour is compatible with qemu vlans where there could be
 - * multiple clients that can still communicate with each other in
 - * disconnected mode. For now maintain this compatibility. */
 
 Hmm this went in without much discussion.
 
 But before this change went in, it was possible to control link state on NIC
 and on link separately, without guest noticing.
 
 Why did we decide to drop this functionality?

This was causing some interesting issues in the non-hub
configurations.  In these configs, the loss of link on the backend
did not propagate to the guest.  The guest would continue to send
traffic and it would just queue up and get dropped.  Eventually the
guest could lose the DHCP lease and there wouldn't be any indication
at all as to why it happened.  From the guest perspective, the link
is working, but it is completely disconnected.
This patch essentially equates the backend to a carrier on the guest
and the loss of a backend is loss of carrier.

This is only done for configurations where you have a 1-to-1
relationship between a hw nic and netdev.  For the old vlan/hubport,
there is no change, since in those case, the guest nic can talk to
other hubports.  In the case of netdev, it can't.

As an example, try booting a VM in which the netdev link is down.
First, it'll take a while to boot if it uses DHCP.  When you
re-enable, the link, the VM will not issue DHCP requests and you'd
have to manually restart the nic in the VM.

With this patch, link detection works properly.

-vlad


 
 
 
 -if (nc-peer  nc-peer-info-link_status_changed) {
 -nc-peer-info-link_status_changed(nc-peer);
 +if (nc-peer) {
 +/* Change peer link only if the peer is NIC and then notify peer.
 + * If the peer is a HUBPORT or a backend, we do not change the
 + * link status.
 + *
 + * This behavior is compatible with qemu vlans where there could be
 + * multiple clients that can still communicate with each other in
 + * disconnected mode. For now maintain this compatibility.
 + */
 +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) {
 +for (i = 0; i  queues; i++) {
 +ncs[i]-peer-link_down = !up;
 +}
 +}
 +if (nc-peer-info-link_status_changed) {
 +nc-peer-info-link_status_changed(nc-peer);
 +}
  }
  }
  
 -- 
 1.8.4.2

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich

On 04/02/2014 06:46 AM, Yan Vugenfirer wrote:
 
 On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin m...@redhat.com wrote:
 
 On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
 When a link change occurs on a backend (like tap), we currently do
 not propage such change to the nic.  As a result, when someone turns
 off a link on a tap device, for instance, then a guest doesn't see
 that change and continues to try to send traffic or run DHCP even
 though the lower-layer is disconnected.  This is OK when the network
 is set up as a HUB since the the guest may be connected to other HUB
 ports too, but when it's set up as a netdev, it makes thinkgs worse.

 The patch addresses this by setting the peers link down only when the
 peer is not a HUBPORT device.  With this patch, in the following config
  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
 when net0 link is turned off, the guest e1000 shows lower-layer link
 down. This allows guests to boot much faster in such configurations.
 With windows guest, it also allows the network to recover properly
 since windows will not configure the link-local IPv4 address, and
 when the link is turned on, the proper address address is configured.

 Signed-off-by: Vlad Yasevich vyase...@redhat.com
 ---
 net/net.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

 diff --git a/net/net.c b/net/net.c
 index 0a88e68..8a084bf 100644
 --- a/net/net.c
 +++ b/net/net.c
 @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error 
 **errp)
 nc-info-link_status_changed(nc);
 }

 -/* Notify peer. Don't update peer link status: this makes it possible 
 to
 - * disconnect from host network without notifying the guest.
 - * FIXME: is disconnected link status change operation useful?
 - *
 - * Current behaviour is compatible with qemu vlans where there could be
 - * multiple clients that can still communicate with each other in
 - * disconnected mode. For now maintain this compatibility. */

 Hmm this went in without much discussion.

 But before this change went in, it was possible to control link state on NIC
 and on link separately, without guest noticing.

 Why did we decide to drop this functionality?
 
 Not sure there was a real need for the patch.
 
 On other hand Windows guest will not receive IP address from DHCP server if 
 you start VM with tap link down and then set it to up without toggling link 
 state of the guest device as well.

Same for a linux guest.  It a fault scenario.  So we either handle
it by propagating the link event, or we forbid it.  Doing neither
allows for a bad state in common configuration.

This patch chose to propagate the event under certain configuration.

-vlad

 
 Yan. 
 



 -if (nc-peer  nc-peer-info-link_status_changed) {
 -nc-peer-info-link_status_changed(nc-peer);
 +if (nc-peer) {
 +/* Change peer link only if the peer is NIC and then notify peer.
 + * If the peer is a HUBPORT or a backend, we do not change the
 + * link status.
 + *
 + * This behavior is compatible with qemu vlans where there could be
 + * multiple clients that can still communicate with each other in
 + * disconnected mode. For now maintain this compatibility.
 + */
 +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) {
 +for (i = 0; i  queues; i++) {
 +ncs[i]-peer-link_down = !up;
 +}
 +}
 +if (nc-peer-info-link_status_changed) {
 +nc-peer-info-link_status_changed(nc-peer);
 +}
 }
 }

 -- 
 1.8.4.2

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich

On 04/02/2014 11:03 AM, Michael S. Tsirkin wrote:
 On Wed, Apr 02, 2014 at 10:57:08AM -0400, Vlad Yasevich wrote:
 On 04/02/2014 06:46 AM, Yan Vugenfirer wrote:

 On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
 When a link change occurs on a backend (like tap), we currently do
 not propage such change to the nic.  As a result, when someone turns
 off a link on a tap device, for instance, then a guest doesn't see
 that change and continues to try to send traffic or run DHCP even
 though the lower-layer is disconnected.  This is OK when the network
 is set up as a HUB since the the guest may be connected to other HUB
 ports too, but when it's set up as a netdev, it makes thinkgs worse.

 The patch addresses this by setting the peers link down only when the
 peer is not a HUBPORT device.  With this patch, in the following config
  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
 when net0 link is turned off, the guest e1000 shows lower-layer link
 down. This allows guests to boot much faster in such configurations.
 With windows guest, it also allows the network to recover properly
 since windows will not configure the link-local IPv4 address, and
 when the link is turned on, the proper address address is configured.

 Signed-off-by: Vlad Yasevich vyase...@redhat.com
 ---
 net/net.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

 diff --git a/net/net.c b/net/net.c
 index 0a88e68..8a084bf 100644
 --- a/net/net.c
 +++ b/net/net.c
 @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, 
 Error **errp)
 nc-info-link_status_changed(nc);
 }

 -/* Notify peer. Don't update peer link status: this makes it 
 possible to
 - * disconnect from host network without notifying the guest.
 - * FIXME: is disconnected link status change operation useful?
 - *
 - * Current behaviour is compatible with qemu vlans where there could 
 be
 - * multiple clients that can still communicate with each other in
 - * disconnected mode. For now maintain this compatibility. */

 Hmm this went in without much discussion.

 But before this change went in, it was possible to control link state on 
 NIC
 and on link separately, without guest noticing.

 Why did we decide to drop this functionality?

 Not sure there was a real need for the patch.

 On other hand Windows guest will not receive IP address from DHCP server if 
 you start VM with tap link down and then set it to up without toggling link 
 state of the guest device as well.

 Same for a linux guest.  It a fault scenario.  So we either handle
 it by propagating the link event, or we forbid it.  Doing neither
 allows for a bad state in common configuration.
 
 It's how networking works.
 If I turn off my router at home I can't get dhcp either.

And you device link will show that it has not carrier.

 
 We had a way to disable networking  at link or at
 the router, then removed the ability to turn it
 off at the router and I'm asking why.

We didn't remove the ability to turn it off at router/switch.
We just now propagate carrier loss correctly.

In fact, if you have a configuration where the VM is connected
to a hub in qemu, turning off the link on the 'tap' connected
to the same hub doesn't not change guest state.

 
 This patch chose to propagate the event under certain configuration.

 -vlad
 
 So don't do this then.
 If you want guest to know that link is down,
 put it down on guest side.
 
 It's that simple.
 

Sorry, but that's not always possible.  The host administrator
may not always be vm administrator and without this patch vm
administrator has no idea what happened to his network.

-vlad
 
 
 

 Yan. 




 -if (nc-peer  nc-peer-info-link_status_changed) {
 -nc-peer-info-link_status_changed(nc-peer);
 +if (nc-peer) {
 +/* Change peer link only if the peer is NIC and then notify peer.
 + * If the peer is a HUBPORT or a backend, we do not change the
 + * link status.
 + *
 + * This behavior is compatible with qemu vlans where there could 
 be
 + * multiple clients that can still communicate with each other in
 + * disconnected mode. For now maintain this compatibility.
 + */
 +if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) {
 +for (i = 0; i  queues; i++) {
 +ncs[i]-peer-link_down = !up;
 +}
 +}
 +if (nc-peer-info-link_status_changed) {
 +nc-peer-info-link_status_changed(nc-peer);
 +}
 }
 }

 -- 
 1.8.4.2

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich

On 04/02/2014 11:29 AM, Michael S. Tsirkin wrote:
 On Wed, Apr 02, 2014 at 11:25:32AM -0400, Vlad Yasevich wrote:
 On 04/02/2014 11:03 AM, Michael S. Tsirkin wrote:
 On Wed, Apr 02, 2014 at 10:57:08AM -0400, Vlad Yasevich wrote:
 On 04/02/2014 06:46 AM, Yan Vugenfirer wrote:

 On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
 When a link change occurs on a backend (like tap), we currently do
 not propage such change to the nic.  As a result, when someone turns
 off a link on a tap device, for instance, then a guest doesn't see
 that change and continues to try to send traffic or run DHCP even
 though the lower-layer is disconnected.  This is OK when the network
 is set up as a HUB since the the guest may be connected to other HUB
 ports too, but when it's set up as a netdev, it makes thinkgs worse.

 The patch addresses this by setting the peers link down only when the
 peer is not a HUBPORT device.  With this patch, in the following config
  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
 when net0 link is turned off, the guest e1000 shows lower-layer link
 down. This allows guests to boot much faster in such configurations.
 With windows guest, it also allows the network to recover properly
 since windows will not configure the link-local IPv4 address, and
 when the link is turned on, the proper address address is configured.

 Signed-off-by: Vlad Yasevich vyase...@redhat.com
 ---
 net/net.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

 diff --git a/net/net.c b/net/net.c
 index 0a88e68..8a084bf 100644
 --- a/net/net.c
 +++ b/net/net.c
 @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, 
 Error **errp)
 nc-info-link_status_changed(nc);
 }

 -/* Notify peer. Don't update peer link status: this makes it 
 possible to
 - * disconnect from host network without notifying the guest.
 - * FIXME: is disconnected link status change operation useful?
 - *
 - * Current behaviour is compatible with qemu vlans where there 
 could be
 - * multiple clients that can still communicate with each other in
 - * disconnected mode. For now maintain this compatibility. */

 Hmm this went in without much discussion.

 But before this change went in, it was possible to control link state on 
 NIC
 and on link separately, without guest noticing.

 Why did we decide to drop this functionality?

 Not sure there was a real need for the patch.

 On other hand Windows guest will not receive IP address from DHCP server 
 if you start VM with tap link down and then set it to up without toggling 
 link state of the guest device as well.

 Same for a linux guest.  It a fault scenario.  So we either handle
 it by propagating the link event, or we forbid it.  Doing neither
 allows for a bad state in common configuration.

 It's how networking works.
 If I turn off my router at home I can't get dhcp either.

 And you device link will show that it has not carrier.
 
 It doesn't as long as the switch it's connected to is up.
 
 

 We had a way to disable networking  at link or at
 the router, then removed the ability to turn it
 off at the router and I'm asking why.

 We didn't remove the ability to turn it off at router/switch.
 We just now propagate carrier loss correctly.

 In fact, if you have a configuration where the VM is connected
 to a hub in qemu, turning off the link on the 'tap' connected
 to the same hub doesn't not change guest state.
 
 So why is it a good idea to make -netdev inconsistent
 with this?

Because -netdev has a 1-1 relationship with -device.   You
could think of it a physical wire that connects the nic
to the network.  If something happens to the wire, the
nic should notice it.

 It doesn't seem to add anything useful.
 It might fix things for users who turned off link
 at the wrong place by mistake but I doubt it's
 a common scenario.

I don't think link state management in general is a common scenario, but
we support it.  I think this makes the behavior better and improves
the vm experience.
 
 Where end users getting it wrong and complaining?

Yes, I can provide you with a list reported bugs if you wish.

-vlad

 
 

 This patch chose to propagate the event under certain configuration.

 -vlad

 So don't do this then.
 If you want guest to know that link is down,
 put it down on guest side.

 It's that simple.


 Sorry, but that's not always possible.  The host administrator
 may not always be vm administrator and without this patch vm
 administrator has no idea what happened to his network.

 -vlad




 Yan. 




 -if (nc-peer  nc-peer-info-link_status_changed) {
 -nc-peer-info-link_status_changed(nc-peer);
 +if (nc-peer) {
 +/* Change peer link only if the peer is NIC and then notify 
 peer.
 + * If the peer is a HUBPORT or a backend, we do not change the
 + * link status

Re: [Qemu-devel] [PATCH v2 for 2.0] virtio-net: Do not filter VLANs without F_CTRL_VLAN

2014-03-26 Thread Vlad Yasevich

On 03/26/2014 06:29 AM, Amos Kong wrote:
 From: Stefan Fritsch s...@sfritsch.de
 
 If VIRTIO_NET_F_CTRL_VLAN is not negotiated, do not filter out all
 VLAN-tagged packets but send them to the guest.
 
 This fixes VLANs with OpenBSD guests (and probably NetBSD, too, because
 the OpenBSD driver started as a port from NetBSD).
 
 Signed-off-by: Stefan Fritsch s...@sfritsch.de
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
 V2: it's not good to check guest features at reset time,
 so reset vlan table in setting features
 ---
  hw/net/virtio-net.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 43b4eda..439477b 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -530,6 +530,12 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
 uint32_t features)
  }
  vhost_net_ack_features(tap_get_vhost_net(nc-peer), features);
  }
 +
 +if ((1  VIRTIO_NET_F_CTRL_VLAN)  features) {
 +memset(n-vlans, 0, MAX_VLAN  3);
 +} else {
 +memset(n-vlans, 0xff, MAX_VLAN  3);
 +}
  }
  
  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
 

Reviewed-by: Vlad Yasevich vyase...@redhat.com

-vlad

Re: [Qemu-devel] [PATCH v2] virtio-net: add a field to indicate if vlan table is used

2014-02-20 Thread Vlad Yasevich

On 02/20/2014 11:38 AM, Amos Kong wrote:
 Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
 filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
 
 This patch added a new field to @RxFilterInfo to indicate if management
 uses the vlan table.
 
 [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02604.html
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
 V2: don't make vlan-table optional, add a flag to indicate
 if vlan table is used by management
 ---
  hw/net/virtio-net.c | 38 +-
  qapi-schema.json|  3 +++
  qmp-commands.hx |  2 ++
  3 files changed, 30 insertions(+), 13 deletions(-)
 
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 3626608..f591f4e 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -222,13 +222,33 @@ static char *mac_strdup_printf(const uint8_t *mac)
  mac[1], mac[2], mac[3], mac[4], mac[5]);
  }
  
 +static intList *get_vlan_table(VirtIONet *n)
 +{
 +intList *list, *entry;
 +int i, j;
 +
 +list = NULL;
 +for (i = 0; i  MAX_VLAN  5; i++) {
 +for (j = 0; n-vlans[i]  j  0x1f; j++) {
 +if (n-vlans[i]  (1U  j)) {
 +entry = g_malloc0(sizeof(*entry));
 +entry-value = (i  5) + j;
 +entry-next = list;
 +list = entry;
 +}
 +}
 +}
 +
 +return list;
 +}
 +
  static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
  {
  VirtIONet *n = qemu_get_nic_opaque(nc);
 +VirtIODevice *vdev = VIRTIO_DEVICE(n);
  RxFilterInfo *info;
  strList *str_list, *entry;
 -intList *int_list, *int_entry;
 -int i, j;
 +int i;
  
  info = g_malloc0(sizeof(*info));
  info-name = g_strdup(nc-name);
 @@ -273,19 +293,11 @@ static RxFilterInfo 
 *virtio_net_query_rxfilter(NetClientState *nc)
  str_list = entry;
  }
  info-multicast_table = str_list;
 +info-vlan_table = get_vlan_table(n);
  
 -int_list = NULL;
 -for (i = 0; i  MAX_VLAN  5; i++) {
 -for (j = 0; n-vlans[i]  j  0x1f; j++) {
 -if (n-vlans[i]  (1U  j)) {
 -int_entry = g_malloc0(sizeof(*int_entry));
 -int_entry-value = (i  5) + j;
 -int_entry-next = int_list;
 -int_list = int_entry;
 -}
 -}
 +if ((1  VIRTIO_NET_F_CTRL_VLAN)  vdev-guest_features) {
 +info-vlan = true;
  }

So, in the case that vlan filtering is not supported in the guest
we get:
 vlan: false,
 vlan-table: [
 0,
 1,
 2,
 ...
 4095
]
since virtio_net now initializes the table to all 1s.
Seems a bit awkward.  We are providing a lot of data that
is simply going to be ignored.

 -info-vlan_table = int_list;
  
  /* enable event notification after query */
  nc-rxfilter_notify_enabled = 1;
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 7cfb5e5..5b54e94 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -4032,6 +4032,8 @@
  #
  # @unicast-overflow: unicast table is overflowed or not
  #
 +# @vlan: whether management uses the vlan table
 +#

The above description seems a bit confusing to me.  The value
we are returning describes whether or not qemu is performing
vlan filtering.  I am not sure if it has any bearing on what
management may be doing.

I think the idea is that management, in the future, would look at
this value and make some decision about applying provided filter
to the current host configuration.

  # @main-mac: the main macaddr string
  #
  # @vlan-table: a list of active vlan id
 @@ -4052,6 +4054,7 @@
  'broadcast-allowed':  'bool',
  'multicast-overflow': 'bool',
  'unicast-overflow':   'bool',
 +'vlan':   'bool',

Not terribly descriptive.  May be call it vlan-filter?

Thanks
-vlad
  'main-mac':   'str',
  'vlan-table': ['int'],
  'unicast-table':  ['str'],
 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index cce6b81..b170c79 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -3307,6 +3307,7 @@ Each array entry contains the following:
  - broadcast-allowed: allow to receive broadcast (json-bool)
  - multicast-overflow: multicast table is overflowed (json-bool)
  - unicast-overflow: unicast table is overflowed (json-bool)
 +- vlan: management uses the vlan table (json-bool)
  - main-mac: main macaddr string (json-string)
  - vlan-table: a json-array of active vlan id
  - unicast-table: a json-array of unicast macaddr string
 @@ -3321,6 +3322,7 @@ Example:
  name: vnet0,
  main-mac: 52:54:00:12:34:56,
  unicast: normal,
 +vlan: true,
  vlan-table: [
  4,
  0

Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-18 Thread Vlad Yasevich

On 02/18/2014 05:06 AM, Michael S. Tsirkin wrote:
 On Mon, Feb 17, 2014 at 11:58:11AM -0500, Vlad Yasevich wrote:
 On 02/17/2014 11:56 AM, Eric Blake wrote:
 On 02/17/2014 09:52 AM, Eric Blake wrote:
 On 02/16/2014 07:27 PM, Amos Kong wrote:
 Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
 filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.

 We should also not send the vlan table to management, this patch makes
 the vlan-talbe optional.

 s/talbe/table/


 @@ -4053,7 +4053,7 @@
  'multicast-overflow': 'bool',
  'unicast-overflow':   'bool',
  'main-mac':   'str',
 -'vlan-table': ['int'],
 +'*vlan-table': ['int'],

 Indentation is now off.

  - main-mac: main macaddr string (json-string)
 -- vlan-table: a json-array of active vlan id
 +- vlan-table: a json-array of active vlan id (optoinal)

 s/optoinal/optional/

 Those fixes are trivial enough, so I'm okay if you correct them then add:

 Scratch that.  I revoke my R-b, on the following grounds:

 On 02/17/2014 08:27 AM, Vlad Yasevich wrote:
 On 02/16/2014 09:27 PM, Amos Kong wrote:
 Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
 filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't
 negotiated.

 We should also not send the vlan table to management, this patch makes
 the vlan-talbe optional.
   ^^
table.

 One question I have from the API perspective is can we suddenly change
 something to be optional?  If there are any users of this ,
 wouldn't they have to change now to check the existence of this
 list?

 You are correct.  Since the parameter is an output field, older clients
 may be depending on it existing.  It is okay to generate an empty array,
 but you must not entirely omit the array unless you add further
 justification in your commit message that you are 100% positive that
 there are no clients of 1.6 that will be broken when the array no longer
 appears in the output.

 Can you rework the patch to just leave the array empty in the case where
 the bit does not indicate it is used?  Or do we need to add a new bool
 field to the output for new enough management to know whether to use the
 array?


 I think a completely empty array should be sufficient.

 -vlad
 
 An empty array could mean either no vlans allowed or
 all vlans allowed. Also it's nice if users can detect an old
 buggy qemu.
 
 Let's just add an rx state.
 

Fine with me.

-vlad

Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-17 Thread Vlad Yasevich

On 02/16/2014 09:27 PM, Amos Kong wrote:
 Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
 filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
 
 We should also not send the vlan table to management, this patch makes
 the vlan-talbe optional.
  ^^
   table.

One question I have from the API perspective is can we suddenly change
something to be optional?  If there are any users of this ,
wouldn't they have to change now to check the existence of this
list?

Thanks
-vlad

 [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02604.html
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  hw/net/virtio-net.c | 38 +-
  qapi-schema.json|  4 ++--
  qmp-commands.hx |  2 +-
  3 files changed, 28 insertions(+), 16 deletions(-)
 
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 3626608..0b32e6a 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -222,13 +222,33 @@ static char *mac_strdup_printf(const uint8_t *mac)
  mac[1], mac[2], mac[3], mac[4], mac[5]);
  }
  
 +static intList *get_vlan_table(VirtIONet *n)
 +{
 +intList *list, *entry;
 +int i, j;
 +
 +list = NULL;
 +for (i = 0; i  MAX_VLAN  5; i++) {
 +for (j = 0; n-vlans[i]  j  0x1f; j++) {
 +if (n-vlans[i]  (1U  j)) {
 +entry = g_malloc0(sizeof(*entry));
 +entry-value = (i  5) + j;
 +entry-next = list;
 +list = entry;
 +}
 +}
 +}
 +
 +return list;
 +}
 +
  static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
  {
  VirtIONet *n = qemu_get_nic_opaque(nc);
 +VirtIODevice *vdev = VIRTIO_DEVICE(n);
  RxFilterInfo *info;
  strList *str_list, *entry;
 -intList *int_list, *int_entry;
 -int i, j;
 +int i;
  
  info = g_malloc0(sizeof(*info));
  info-name = g_strdup(nc-name);
 @@ -274,18 +294,10 @@ static RxFilterInfo 
 *virtio_net_query_rxfilter(NetClientState *nc)
  }
  info-multicast_table = str_list;
  
 -int_list = NULL;
 -for (i = 0; i  MAX_VLAN  5; i++) {
 -for (j = 0; n-vlans[i]  j  0x1f; j++) {
 -if (n-vlans[i]  (1U  j)) {
 -int_entry = g_malloc0(sizeof(*int_entry));
 -int_entry-value = (i  5) + j;
 -int_entry-next = int_list;
 -int_list = int_entry;
 -}
 -}
 +if ((1  VIRTIO_NET_F_CTRL_VLAN)  vdev-guest_features) {
 +info-has_vlan_table = true;
 +info-vlan_table = get_vlan_table(n);
  }
 -info-vlan_table = int_list;

  
  /* enable event notification after query */
  nc-rxfilter_notify_enabled = 1;
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 7cfb5e5..5d48fa9 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -4034,7 +4034,7 @@
  #
  # @main-mac: the main macaddr string
  #
 -# @vlan-table: a list of active vlan id
 +# @vlan-table: #optional a list of active vlan id
  #
  # @unicast-table: a list of unicast macaddr string
  #
 @@ -4053,7 +4053,7 @@
  'multicast-overflow': 'bool',
  'unicast-overflow':   'bool',
  'main-mac':   'str',
 -'vlan-table': ['int'],
 +'*vlan-table': ['int'],
  'unicast-table':  ['str'],
  'multicast-table':['str'] }}
  
 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index cce6b81..a1c1dfa 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -3308,7 +3308,7 @@ Each array entry contains the following:
  - multicast-overflow: multicast table is overflowed (json-bool)
  - unicast-overflow: unicast table is overflowed (json-bool)
  - main-mac: main macaddr string (json-string)
 -- vlan-table: a json-array of active vlan id
 +- vlan-table: a json-array of active vlan id (optoinal)
  - unicast-table: a json-array of unicast macaddr string
  - multicast-table: a json-array of multicast macaddr string

Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-17 Thread Vlad Yasevich

On 02/17/2014 11:56 AM, Eric Blake wrote:
 On 02/17/2014 09:52 AM, Eric Blake wrote:
 On 02/16/2014 07:27 PM, Amos Kong wrote:
 Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
 filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.

 We should also not send the vlan table to management, this patch makes
 the vlan-talbe optional.

 s/talbe/table/

 
 @@ -4053,7 +4053,7 @@
  'multicast-overflow': 'bool',
  'unicast-overflow':   'bool',
  'main-mac':   'str',
 -'vlan-table': ['int'],
 +'*vlan-table': ['int'],

 Indentation is now off.

  - main-mac: main macaddr string (json-string)
 -- vlan-table: a json-array of active vlan id
 +- vlan-table: a json-array of active vlan id (optoinal)

 s/optoinal/optional/

 Those fixes are trivial enough, so I'm okay if you correct them then add:
 
 Scratch that.  I revoke my R-b, on the following grounds:
 
 On 02/17/2014 08:27 AM, Vlad Yasevich wrote:
 On 02/16/2014 09:27 PM, Amos Kong wrote:
 Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
 filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't
 negotiated.

 We should also not send the vlan table to management, this patch makes
 the vlan-talbe optional.
   ^^
table.

 One question I have from the API perspective is can we suddenly change
 something to be optional?  If there are any users of this ,
 wouldn't they have to change now to check the existence of this
 list?
 
 You are correct.  Since the parameter is an output field, older clients
 may be depending on it existing.  It is okay to generate an empty array,
 but you must not entirely omit the array unless you add further
 justification in your commit message that you are 100% positive that
 there are no clients of 1.6 that will be broken when the array no longer
 appears in the output.
 
 Can you rework the patch to just leave the array empty in the case where
 the bit does not indicate it is used?  Or do we need to add a new bool
 field to the output for new enough management to know whether to use the
 array?
 

I think a completely empty array should be sufficient.

-vlad

Re: [Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch

2013-11-22 Thread Vlad Yasevich

On 11/22/2013 04:47 AM, Jason Wang wrote:
 On 11/22/2013 04:04 AM, Vlad Yasevich wrote:
 e1000 provides a E1000_RAH_AV bit on every complete write
 to the Receive Address Register.  We can use this bit
 2 ways:
  1) To trigger HMP notifications.  When the bit is set the
 mac address is fully set and we can update the HMP.

  2) We can turn off he bit on the write to low order bits of
 the Receive Address Register, so that we would not try
 to match received traffic to this address when it is
 not completely set.

 Signed-off-by: Vlad Yasevich vyase...@redhat.com
 ---
  hw/net/e1000.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

 diff --git a/hw/net/e1000.c b/hw/net/e1000.c
 index ae63591..82978ea 100644
 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val)
  
  s-mac_reg[index] = val;
  
 -if (index == RA || index == RA + 1) {
 +switch (index) {
 +case RA:
 +/* Mask off AV bit on the write of the low dword.  The write of
 + * the high dword will set the bit.  This way a half-written
 + * mac address will not be used to filter on rx.
 + */
 +s-mac_reg[RA+1] = ~E1000_RAH_AV;
 
 If a stupid driver write high dword first, it won't receive any packets.

I need to ping Intel guys again.  I asked them what happens when only
the low register is set, but haven't heard back.

 +break;
 +case (RA + 1):
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t 
 *)macaddr);
 
 Guest may invalid the mac address by clearing the AV bit through writing
 to high dword. So this may notify a wrong mac address.

In this case, testing for the AV bit would solve this issue.
 
 Generally, we could teset the AV bit before notification, and try to do
 the this on both high and low dword. This obeys specs and
 receive_filter() above.

This will not really help since the AV bit would already be set from the
prior mac address.  So, if a stupid driver writes just the low word,
the AV bit would already be set.

 
 If we don't want half-written status, driver should clear AV bit before
 each writing of new mac address. But looks like linux and freebsd does
 not do this.  But the window is really small and harmless.

We can emulate this.  Thanks for the idea.

-vlad

 +break;
  }
  }

[Qemu-devel] [RFC PATCH 2/2] rtl8139: update HMP only when the address is fully written

2013-11-21 Thread Vlad Yasevich

rtl8139 hardware requires 9346 config register to be set into
write mode before mac address can be changed even though it is
not documented.  Every driver inspected so far appears to do
this along with comments that this is an undocumented requirement.

We can use this to help us identify when the mac address has been
completely written.  Simple set a flag whenever mac has changed
and at the next transition of 9346 register from Write to Normal
mode, we update the HMP.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 hw/i386/pc_piix.c|  4 
 hw/i386/pc_q35.c |  4 
 hw/net/rtl8139.c | 50 +-
 include/hw/i386/pc.h |  8 
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2daa111..731ae3b 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -372,6 +372,10 @@ static QEMUMachine pc_i440fx_machine_v1_7 = {
 PC_I440FX_1_7_MACHINE_OPTIONS,
 .name = pc-i440fx-1.7,
 .init = pc_init_pci_1_7,
+.compat_props = (GlobalProperty[]) {
+PC_COMPAT_1_7,
+{ /* end of list */ }
+},
 };
 
 #define PC_I440FX_1_6_MACHINE_OPTIONS PC_I440FX_1_7_MACHINE_OPTIONS
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e2b8907..7b6aedf 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -292,6 +292,10 @@ static QEMUMachine pc_q35_machine_v1_7 = {
 PC_Q35_1_7_MACHINE_OPTIONS,
 .name = pc-q35-1.7,
 .init = pc_q35_init_1_7,
+.compat_props = (GlobalProperty[]) {
+PC_COMPAT_1_7,
+{ /* end of list */ }
+},
 };
 
 #define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 7f2b4db..5c4caec 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,7 @@ typedef struct RTL8139State {
 
 uint16_t CpCmd;
 uint8_t  TxThresh;
+bool mac_changed;
 
 NICState *nic;
 NICConf conf;
@@ -515,6 +516,9 @@ typedef struct RTL8139State {
 
 /* Support migration to/from old versions */
 int rtl8139_mmio_io_addr_dummy;
+#define RTL8139_FLAG_MAC_BIT 0
+#define RTL8139_FLAG_MAC_COMPLETE (1  RTL8139_FLAG_MAC_BIT)
+uint32_tcompat_flags;
 } RTL8139State;
 
 /* Writes tally counters to memory via DMA */
@@ -1215,6 +1219,7 @@ static void rtl8139_reset(DeviceState *d)
 /* restore MAC address */
 memcpy(s-phys, s-conf.macaddr.a, 6);
 qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = false;
 
 /* reset interrupt mask */
 s-IntrStatus = 0;
@@ -1563,6 +1568,14 @@ static void rtl8139_Cfg9346_write(RTL8139State *s, 
uint32_t val)
 /* Reset.  */
 val = 0;
 rtl8139_reset(d);
+} else if (opmode == Cfg9346_Normal  s-mac_changed) {
+/* Even though it is not documented, it is required to set
+ * opmode to Cfg9346_ConfigWrite when changing the mac address
+ * of the card and to set to Cfg9346_Normal when done.  We
+ * can use this as an idication to kick off the notification event.
+ */
+qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = false;
 }
 
 s-Cfg9346 = val;
@@ -2743,7 +2756,12 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)
 {
 case MAC0 ... MAC0+5:
 s-phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+if (s-compat_flags  RTL8139_FLAG_MAC_COMPLETE) {
+s-mac_changed = true;
+} else if (addr == MAC0+5) {
+/* Emulate old style updates on the last write */
+qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+}
 break;
 case MAC0+6 ... MAC0+7:
 /* reserved */
@@ -3256,6 +3274,13 @@ static int rtl8139_post_load(void *opaque, int 
version_id)
  * to link status bit in BasicModeStatus */
 qemu_get_queue(s-nic)-link_down = (s-BasicModeStatus  0x04) == 0;
 
+/* Emulate old behavior if we don't support mac change completion
+ * tracking
+ */
+if (!(s-compat_flags  RTL8139_FLAG_MAC_COMPLETE)) {
+s-mac_changed = false;
+}
+
 return 0;
 }
 
@@ -3286,6 +3311,24 @@ static void rtl8139_pre_save(void *opaque)
 s-rtl8139_mmio_io_addr_dummy = 0;
 }
 
+static bool rtl8139_mac_state_needed(void *opaque)
+{
+RTL8139State *s = opaque;
+
+return (s-compat_flags  RTL8139_FLAG_MAC_COMPLETE)  s-mac_changed;
+}
+
+static const VMStateDescription vmstate_rtl8139_mac_state ={
+.name = rtl8139/mac_state,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_BOOL(mac_changed, RTL8139State),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_rtl8139 = {
 .name = rtl8139,
 .version_id = 4,
@@ -3371,6 +3414,9 @@ static const

[Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch

2013-11-21 Thread Vlad Yasevich

e1000 provides a E1000_RAH_AV bit on every complete write
to the Receive Address Register.  We can use this bit
2 ways:
 1) To trigger HMP notifications.  When the bit is set the
mac address is fully set and we can update the HMP.

 2) We can turn off he bit on the write to low order bits of
the Receive Address Register, so that we would not try
to match received traffic to this address when it is
not completely set.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 hw/net/e1000.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index ae63591..82978ea 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val)
 
 s-mac_reg[index] = val;
 
-if (index == RA || index == RA + 1) {
+switch (index) {
+case RA:
+/* Mask off AV bit on the write of the low dword.  The write of
+ * the high dword will set the bit.  This way a half-written
+ * mac address will not be used to filter on rx.
+ */
+s-mac_reg[RA+1] = ~E1000_RAH_AV;
+break;
+case (RA + 1):
 macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
 macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
 qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr);
+break;
 }
 }
 
-- 
1.8.4.2

[Qemu-devel] [PATCH] net: Update netdev peer on link change

2013-11-21 Thread Vlad Yasevich

When a link change occurs on a backend (like tap), we currently do
not propage such change to the nic.  As a result, when someone turns
off a link on a tap device, for instance, then a guest doesn't see
that change and continues to try to send traffic or run DHCP even
though the lower-layer is disconnected.  This is OK when the network
is set up as a HUB since the the guest may be connected to other HUB
ports too, but when it's set up as a netdev, it makes thinkgs worse.

The patch addresses this by setting the peers link down only when the
peer is not a HUBPORT device.  With this patch, in the following config
  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
when net0 link is turned off, the guest e1000 shows lower-layer link
down. This allows guests to boot much faster in such configurations.
With windows guest, it also allows the network to recover properly
since windows will not configure the link-local IPv4 address, and
when the link is turned on, the proper address address is configured.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 net/net.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/net/net.c b/net/net.c
index 0a88e68..8a084bf 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error 
**errp)
 nc-info-link_status_changed(nc);
 }
 
-/* Notify peer. Don't update peer link status: this makes it possible to
- * disconnect from host network without notifying the guest.
- * FIXME: is disconnected link status change operation useful?
- *
- * Current behaviour is compatible with qemu vlans where there could be
- * multiple clients that can still communicate with each other in
- * disconnected mode. For now maintain this compatibility. */
-if (nc-peer  nc-peer-info-link_status_changed) {
-nc-peer-info-link_status_changed(nc-peer);
+if (nc-peer) {
+/* Change peer link only if the peer is NIC and then notify peer.
+ * If the peer is a HUBPORT or a backend, we do not change the
+ * link status.
+ *
+ * This behavior is compatible with qemu vlans where there could be
+ * multiple clients that can still communicate with each other in
+ * disconnected mode. For now maintain this compatibility.
+ */
+if (nc-peer-info-type == NET_CLIENT_OPTIONS_KIND_NIC) {
+for (i = 0; i  queues; i++) {
+ncs[i]-peer-link_down = !up;
+}
+}
+if (nc-peer-info-link_status_changed) {
+nc-peer-info-link_status_changed(nc-peer);
+}
 }
 }
 
-- 
1.8.4.2

[Qemu-devel] [RFC PATCH 0/2] Update HMP only upon mac change completion.

2013-11-21 Thread Vlad Yasevich

Recent threads regarding e1000/rtl8139 and mac address change notifications
prompted some research into the respecitive hw data sheets as well as
available drivers.  What I found is that each hw has a mechanism that
can be used by our emulation layer to determine when the mac address
change has completed (when the OS finished writing the mac address),
and we can use these mechanisms to trigger HMP notifications.

This is only an RFC series.  It's been tested and works well.
I've split e1000 and rtl8139 changes as they are sufficiently
different.  e1000 make this very clean and easy, but rtl8139
isn't as nice.

Please take a look and I'd like to hear your comments.

Thanks
-vlad

Vlad Yasevich (2):
  e1000: Use Address_Available bit as HW latch
  rtl8139: update HMP only when the address is fully written

 hw/i386/pc_piix.c|  4 
 hw/i386/pc_q35.c |  4 
 hw/net/e1000.c   | 11 ++-
 hw/net/rtl8139.c | 50 +-
 include/hw/i386/pc.h |  8 
 5 files changed, 75 insertions(+), 2 deletions(-)

-- 
1.8.4.2

Re: [Qemu-devel] [PATCH for-1.7] qdev-properties-system.c: Allow vlan or netdev for -device, not both

2013-11-21 Thread Vlad Yasevich

On 11/11/2013 02:50 AM, Paolo Bonzini wrote:
 Il 11/11/2013 06:18, Jason Wang ha scritto:
 On 11/08/2013 10:13 AM, Vlad Yasevich wrote:
 It is currently possible to specify things like:
 -device e1000,netdev=foo,vlan=1
 With this usage, whichever argument was specified last (vlan or netdev)
 overwrites what was previousely set and results in a non-working
 configuration.  Even worse, when used with multiqueue devices,
 it causes a segmentation fault on exit in qemu_free_net_client.

 That patch treates the above command line options as invalid and
 generates an error at start-up.

 Signed-off-by: Vlad Yasevich vyase...@redhat.com
 ---
  hw/core/qdev-properties-system.c | 9 +
  1 file changed, 9 insertions(+)

 diff --git a/hw/core/qdev-properties-system.c 
 b/hw/core/qdev-properties-system.c
 index 0eada32..729efa8 100644
 --- a/hw/core/qdev-properties-system.c
 +++ b/hw/core/qdev-properties-system.c
 @@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char 
 *str, void **ptr)
  goto err;
  }
  
 +if (ncs[i]) {
 +ret = -EINVAL;
 +goto err;
 +}
 +
  ncs[i] = peers[i];
  ncs[i]-queue_index = i;
  }
 @@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void 
 *opaque,
  *ptr = NULL;
  return;
  }
 +if (*ptr) {
 +error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name);
 +return;
 +}
  
  hubport = net_hub_port_find(id);
  if (!hubport) {

 Acked-by: Jason Wang jasow...@redhat.com
 
 This patch is good for 1.7.
 
 Paolo
 

Hi All

Just wondering what's to become of this patch?  It has an ACK, but has
been abandoned.  It does a fix a crash in qemu.

Thanks
-vlad

Re: [Qemu-devel] [PATCH] virtio-net: fix the memory leak in rxfilter_notify()

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 08:47 AM, Amos Kong wrote:
 object_get_canonical_path() returns a gchar*, it should be freeed by the
 caller.
 
 Signed-off-by: Amos Kong ak...@redhat.com

Reviewed-by: Vlad Yasevich vyase...@redhat.com

-vlad

 ---
  hw/net/virtio-net.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 613f144..2b2fb57 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -198,15 +198,14 @@ static void rxfilter_notify(NetClientState *nc)
  {
  QObject *event_data;
  VirtIONet *n = qemu_get_nic_opaque(nc);
 +gchar *path = object_get_canonical_path(OBJECT(n-qdev));
  
  if (nc-rxfilter_notify_enabled) {
  if (n-netclient_name) {
  event_data = qobject_from_jsonf({ 'name': %s, 'path': %s },
 -n-netclient_name,
 -
 object_get_canonical_path(OBJECT(n-qdev)));
 +n-netclient_name, path);
  } else {
 -event_data = qobject_from_jsonf({ 'path': %s },
 -
 object_get_canonical_path(OBJECT(n-qdev)));
 +event_data = qobject_from_jsonf({ 'path': %s }, path);
  }
  monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
  qobject_decref(event_data);
 @@ -214,6 +213,7 @@ static void rxfilter_notify(NetClientState *nc)
  /* disable event notification to avoid events flooding */
  nc-rxfilter_notify_enabled = 0;
  }
 +g_free(path);
  }
  
  static char *mac_strdup_printf(const uint8_t *mac)

Re: [Qemu-devel] [PATCH v2] virtio-net: fix the memory leak in rxfilter_notify()

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 10:32 AM, Amos Kong wrote:
 object_get_canonical_path() returns a gchar*, it should be freeed by the
 caller.
 
 Signed-off-by: Amos Kong ak...@redhat.com

Reviewed-by: Vlad Yasevich vyase...@redhat.com

-vlad

 ---
 v2: put gchar *path inside rxfilter_notify_enabled block
 ---
  hw/net/virtio-net.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 613f144..b75c753 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -200,16 +200,16 @@ static void rxfilter_notify(NetClientState *nc)
  VirtIONet *n = qemu_get_nic_opaque(nc);
  
  if (nc-rxfilter_notify_enabled) {
 +gchar *path = object_get_canonical_path(OBJECT(n-qdev));
  if (n-netclient_name) {
  event_data = qobject_from_jsonf({ 'name': %s, 'path': %s },
 -n-netclient_name,
 -
 object_get_canonical_path(OBJECT(n-qdev)));
 +n-netclient_name, path);
  } else {
 -event_data = qobject_from_jsonf({ 'path': %s },
 -
 object_get_canonical_path(OBJECT(n-qdev)));
 +event_data = qobject_from_jsonf({ 'path': %s }, path);
  }
  monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
  qobject_decref(event_data);
 +g_free(path);
  
  /* disable event notification to avoid events flooding */
  nc-rxfilter_notify_enabled = 0;

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 10:02 AM, Michael S. Tsirkin wrote:
 On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:
 We currently just update the HMP NIC info when the last bit of macaddr
 is written. This assumes that guest driver will write all the macaddr
 from bit 0 to bit 5 when it changes the macaddr, this is the current
 behavior of linux driver (e1000/rtl8139cp), but we can't do this
 assumption.

 The macaddr that is used for rx-filter will be updated when every bit
 is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
 info when every bit is changed. It will be same as virtio-net.

 Signed-off-by: Amos Kong ak...@redhat.com

 Vlad here told me he did some research and this
 does not actually match hardware behaviour
 for either e1000 or rtl8139.

 Vlad, would you like to elaborate on-list?

Sure.

I decided to dig into the hardware data-sheets and the OS drivers
that use the HW to see what's really going on and how the
hw expects this data to be programmed.

Here is what I've found so far:
E1000:
   e1000 stores each mac address in 2 registers:
   RAL - receive register low
   RAH - receive register high
   The highest order bit of RAH (bit 31) is called the address available
   bit.  When this bit is set the HW will attempt to use the address for
   packet matching.  So, when the mac address is initially programmed
   into HW, that AV bit is not set until RAH is written.  Thus drivers
   really should do writes in RAL+RAH order, otherwise AV bit would be
   set on a partially set address.
   There is a slight issue when the receive address registers already
   have a value.  Since the address is written in 2 separate writes,
   there is a very small window of time when the RAL is set to the new
   value and RAH is set to the old value with AV still set.  I am
   talking to Intel guys about this now.

   So from the POV of notifying libvirt/management that address is
   changed, it should be done when RAH is set.

RTL8139:
   Realtek devices have a 9346CR Command Register that gates write
   access to certain configuration regions on the HW.  It is placed
   into Configuration register write enabled mode before driver can
   write to one of a set of configuration spaces.  Even though
   the data sheet doesn't mention this, it appears that this must
   also must be used to guard write access to receive address register
   of the card.  All variants of BSD and linux drivers that I've found
   use this along with comment that say that this is an undocumented
   requirement.  I am not sure what the HW does to incoming frames when
   the command register is to this mode.
   I see 2 things that we might be able to do here:
 1) A low-impact change might be to only notify the management when
we've detected an address change and currently exiting
config write mode.
 2) A more invasive change might be to disable rx_handling while in
config wirte mode.  This would prevent attempting to match
packets to a partially written mac address.

   I have a patch that does (1) above.


Thoughts?
-vlad


 I think we should revert this for 1.8 and
 look at emulating actual hardware behaviour.

 ---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 +
  2 files changed, 2 insertions(+), 5 deletions(-)

 diff --git a/hw/net/e1000.c b/hw/net/e1000.c
 index ec8ecd7..2d60639 100644
 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t
val)

  s-mac_reg[index] = val;

 -if (index == RA + 1) {
 +if (index == RA || index == RA + 1) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t
*)macaddr);
 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index 5329f44..7f2b4db 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque,
uint8_t addr, uint32_t val)

  switch (addr)
  {
 -case MAC0 ... MAC0+4:
 -s-phys[addr - MAC0] = val;
 -break;
 -case MAC0+5:
 +case MAC0 ... MAC0+5:
  s-phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
  break;
 --
 1.8.3.1

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 02:42 PM, Michael S. Tsirkin wrote:
 On Mon, Nov 18, 2013 at 12:33:20PM -0500, Vlad Yasevich wrote:
 On 11/18/2013 10:02 AM, Michael S. Tsirkin wrote:
 On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:
 We currently just update the HMP NIC info when the last bit of macaddr
 is written. This assumes that guest driver will write all the macaddr
 from bit 0 to bit 5 when it changes the macaddr, this is the current
 behavior of linux driver (e1000/rtl8139cp), but we can't do this
 assumption.

 The macaddr that is used for rx-filter will be updated when every bit
 is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
 info when every bit is changed. It will be same as virtio-net.

 Signed-off-by: Amos Kong ak...@redhat.com

 Vlad here told me he did some research and this
 does not actually match hardware behaviour
 for either e1000 or rtl8139.

 Vlad, would you like to elaborate on-list?

 Sure.

 I decided to dig into the hardware data-sheets and the OS drivers
 that use the HW to see what's really going on and how the
 hw expects this data to be programmed.

 Here is what I've found so far:
 E1000:
e1000 stores each mac address in 2 registers:
RAL - receive register low
RAH - receive register high
The highest order bit of RAH (bit 31) is called the address available
bit.  When this bit is set the HW will attempt to use the address for
packet matching.  So, when the mac address is initially programmed
into HW, that AV bit is not set until RAH is written.  Thus drivers
really should do writes in RAL+RAH order, otherwise AV bit would be
set on a partially set address.
There is a slight issue when the receive address registers already
have a value.  Since the address is written in 2 separate writes,
there is a very small window of time when the RAL is set to the new
value and RAH is set to the old value with AV still set.  I am
talking to Intel guys about this now.

So from the POV of notifying libvirt/management that address is
changed, it should be done when RAH is set.

 RTL8139:
Realtek devices have a 9346CR Command Register that gates write
access to certain configuration regions on the HW.  It is placed
into Configuration register write enabled mode before driver can
write to one of a set of configuration spaces.  Even though
the data sheet doesn't mention this, it appears that this must
also must be used to guard write access to receive address register
of the card.  All variants of BSD and linux drivers that I've found
use this along with comment that say that this is an undocumented
requirement.
 
 What does a windows driver do BTW?

I'll boot windows and find out.

-vlad

 
I am not sure what the HW does to incoming frames when
the command register is to this mode.
I see 2 things that we might be able to do here:
  1) A low-impact change might be to only notify the management when
 we've detected an address change and currently exiting
  config write mode.
  2) A more invasive change might be to disable rx_handling while in
 config wirte mode.  This would prevent attempting to match
  packets to a partially written mac address.

I have a patch that does (1) above.


 Thoughts?
 -vlad
 
 Let's start by reverting cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
 

 I think we should revert this for 1.8 and
 look at emulating actual hardware behaviour.

 ---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 +
  2 files changed, 2 insertions(+), 5 deletions(-)

 diff --git a/hw/net/e1000.c b/hw/net/e1000.c
 index ec8ecd7..2d60639 100644
 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t
 val)

  s-mac_reg[index] = val;

 -if (index == RA + 1) {
 +if (index == RA || index == RA + 1) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t
 *)macaddr);
 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index 5329f44..7f2b4db 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque,
 uint8_t addr, uint32_t val)

  switch (addr)
  {
 -case MAC0 ... MAC0+4:
 -s-phys[addr - MAC0] = val;
 -break;
 -case MAC0+5:
 +case MAC0 ... MAC0+5:
  s-phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
  break;
 --
 1.8.3.1

Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 02:47 PM, Michael S. Tsirkin wrote:
 This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
 Digging into hardware specs shows this does not
 actually make QEMU behave more like hardware.
 Let's stick to the tried heuristic for 1.7 and
 possibly revisit for 1.8.
 
 Reported-by: Vlad Yasevich vyase...@redhat.com
 Cc: Amos Kong ak...@redhat.com
 Cc: Alex Williamson alex.william...@redhat.com

Reviewed-by: Vlad Yasevich vyase...@redhat.com.

-vlad

 ---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 -
  2 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/hw/net/e1000.c b/hw/net/e1000.c
 index ae63591..8387443 100644
 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
  
  s-mac_reg[index] = val;
  
 -if (index == RA || index == RA + 1) {
 +if (index == RA + 1) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr);
 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index 7f2b4db..5329f44 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
 addr, uint32_t val)
  
  switch (addr)
  {
 -case MAC0 ... MAC0+5:
 +case MAC0 ... MAC0+4:
 +s-phys[addr - MAC0] = val;
 +break;
 +case MAC0+5:
  s-phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
  break;

Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 02:58 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
 This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
 Digging into hardware specs shows this does not
 actually make QEMU behave more like hardware.
 Let's stick to the tried heuristic for 1.7 and
 possibly revisit for 1.8.
 
 If this is broken, then so are these:
 
 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
 7c36507c2b8776266f50c5e2739bd18279953b93

These aren't really broken.  They just assume that the high order
writes will happen after the low order writes.

In the case of e1000, this is a little more then an assumption
(particularly in the case of nic initilization).

In the case of RTL nic, it is just an  assumption, but it hasn't
been shown faulty yet.  We do plan to address this a bit more
thoroughly.

The patch that was applied was controversial and more then 1 person
expressed reservations.

-vlad

 
 None of these change the behavior of hardware, they only change when the
 monitor gets told about mac address changes.  I'd suggest either add the
 emulation described in each spec or revert all of them.  A partial
 revert is just noise.  Thanks,
 
 Alex
 

 Reported-by: Vlad Yasevich vyase...@redhat.com
 Cc: Amos Kong ak...@redhat.com
 Cc: Alex Williamson alex.william...@redhat.com
 ---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 -
  2 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/hw/net/e1000.c b/hw/net/e1000.c
 index ae63591..8387443 100644
 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
  
  s-mac_reg[index] = val;
  
 -if (index == RA || index == RA + 1) {
 +if (index == RA + 1) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t 
 *)macaddr);
 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index 7f2b4db..5329f44 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
 addr, uint32_t val)
  
  switch (addr)
  {
 -case MAC0 ... MAC0+5:
 +case MAC0 ... MAC0+4:
 +s-phys[addr - MAC0] = val;
 +break;
 +case MAC0+5:
  s-phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
  break;

Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 03:33 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote:
 On 11/18/2013 02:58 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
 This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
 Digging into hardware specs shows this does not
 actually make QEMU behave more like hardware.
 Let's stick to the tried heuristic for 1.7 and
 possibly revisit for 1.8.

 If this is broken, then so are these:

 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
 7c36507c2b8776266f50c5e2739bd18279953b93

 These aren't really broken.  They just assume that the high order
 writes will happen after the low order writes.

 In the case of e1000, this is a little more then an assumption
 (particularly in the case of nic initilization).
 
 But AIUI there's also a valid bit in that high order byte on e1000, so
 reverting cd5be582 means we stuff a new mac into qemu less often, but
 it's still only accurate some of the time.

Yes, there is a slight issue with validity of mac at the time of
processing packets.  I have an outstanding question on the Intel
list about this behavior with real HW.  But, with e1000, the validity
bit provides a much higher guarantee that a guest that will be
setting the mac address will write the high register second to
guarantee that when the valid bit is written, the mac is fully
valid.  As a result we don't really need the e1000 part of the
cd5be5829.


 
 In the case of RTL nic, it is just an  assumption, but it hasn't
 been shown faulty yet.  We do plan to address this a bit more
 thoroughly.
 
 So how is RTL less broken without cd5be582?  AIUI the valid bit is off
 in a separate register on RTL, so we have no guarantee about order of
 updating the mac.  Without cd5be582 the info in the monitor may be
 permanently broken if the guest uses a write order other than what we
 assume.
 

This one is actually not as bad either.  RTL spec requires that
receive register writes happen as 32 bit word writes.  This is
what linux and bsd drivers do, so from driver perspective, the
issue is the same.  What our emulation layer does is turn these
32 bit writes into 4 8-bit writes.  This is likely due to some
very broken and very old drivers, but I am not sure.

So, the information in the monitor will be broken if the guest
does: write_hi(); write_lo();  A part of me would really like
to see a guest that does this :)

The current code isn't perfect either.  It still has a potential
to show the wrong mac address in the monitor.  I doesn't make
a lot of sense to me to replace one sub-optimal solution
with another sub-optimal solution, especially since no-one
complained.

-vlad

 The patch that was applied was controversial and more then 1 person
 expressed reservations.
 
 Understood, but it also raised and addressed a shortcoming in the
 previous patches.  If cd5be582 was controversial because the monitor was
 getting updated with incorrect mac addresses then this simple revert
 doesn't solve that problem.  Thanks,
 
 Alex
 

 None of these change the behavior of hardware, they only change when the
 monitor gets told about mac address changes.  I'd suggest either add the
 emulation described in each spec or revert all of them.  A partial
 revert is just noise.  Thanks,

 Alex


 Reported-by: Vlad Yasevich vyase...@redhat.com
 Cc: Amos Kong ak...@redhat.com
 Cc: Alex Williamson alex.william...@redhat.com
 ---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 -
  2 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/hw/net/e1000.c b/hw/net/e1000.c
 index ae63591..8387443 100644
 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
  
  s-mac_reg[index] = val;
  
 -if (index == RA || index == RA + 1) {
 +if (index == RA + 1) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t 
 *)macaddr);
 diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
 index 7f2b4db..5329f44 100644
 --- a/hw/net/rtl8139.c
 +++ b/hw/net/rtl8139.c
 @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
 addr, uint32_t val)
  
  switch (addr)
  {
 -case MAC0 ... MAC0+5:
 +case MAC0 ... MAC0+4:
 +s-phys[addr - MAC0] = val;
 +break;
 +case MAC0+5:
  s-phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
  break;

Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 04:33 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote:
 On 11/18/2013 03:33 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote:
 On 11/18/2013 02:58 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
 This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
 Digging into hardware specs shows this does not
 actually make QEMU behave more like hardware.
 Let's stick to the tried heuristic for 1.7 and
 possibly revisit for 1.8.

 If this is broken, then so are these:

 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
 7c36507c2b8776266f50c5e2739bd18279953b93

 These aren't really broken.  They just assume that the high order
 writes will happen after the low order writes.

 In the case of e1000, this is a little more then an assumption
 (particularly in the case of nic initilization).

 But AIUI there's also a valid bit in that high order byte on e1000, so
 reverting cd5be582 means we stuff a new mac into qemu less often, but
 it's still only accurate some of the time.

 Yes, there is a slight issue with validity of mac at the time of
 processing packets.  I have an outstanding question on the Intel
 list about this behavior with real HW.  But, with e1000, the validity
 bit provides a much higher guarantee that a guest that will be
 setting the mac address will write the high register second to
 guarantee that when the valid bit is written, the mac is fully
 valid.  As a result we don't really need the e1000 part of the
 cd5be5829.
 
 But doesn't that valid bit mean that a mac update will start and end
 with a write to the high order register?  So we're assuming:
 
 a) write RA + 1 (invalidate)
 b) write RA (write low)
 c) write RA + 1 (write high + valid)
 

No. On update, only steps b and c typically happen.  Thus my question
to the on the intel list.


 Without cd5be5829 the only change is that we don't store a new mac into
 the monitor at b).  The mac stored in the monitor is still wrong from a)
 until c).  So it's ever so slightly less broken without cd5be5829.

Since there is really no a), the mac in the monitor is only different
after step b).  since it's is incomplete and we expect step c), there
is really no point in updating it.

 

 In the case of RTL nic, it is just an  assumption, but it hasn't
 been shown faulty yet.  We do plan to address this a bit more
 thoroughly.

 So how is RTL less broken without cd5be582?  AIUI the valid bit is off
 in a separate register on RTL, so we have no guarantee about order of
 updating the mac.  Without cd5be582 the info in the monitor may be
 permanently broken if the guest uses a write order other than what we
 assume.


 This one is actually not as bad either.  RTL spec requires that
 receive register writes happen as 32 bit word writes.  This is
 what linux and bsd drivers do, so from driver perspective, the
 issue is the same.  What our emulation layer does is turn these
 32 bit writes into 4 8-bit writes.  This is likely due to some
 very broken and very old drivers, but I am not sure.

 So, the information in the monitor will be broken if the guest
 does: write_hi(); write_lo();  A part of me would really like
 to see a guest that does this :)
 
 So the argument for reverting here is that it seems unlikely that the
 dwords get written in the hi-lo order and we'd rather the monitor get
 stuck with the wrong mac forever

For how many/which guests?  I know it's not linux or BSD.  I need to
boot windows to see what it does, but I think it does the right thing.

 than it show the wrong mac address for
 a tiny window for everybody else?

Yes, this would happen for everybody.  If you are querying the output,
you might see this and it will show up as 2 changes.

We are talking about 2 tiny amounts here:
On the one hand, we __might__ have guests that write mac and reverse
order thus showing wrong address.
On the other hand we have all the guests who will show the wrong address
for a _short_ time.

I have a hard time deciding, but have a slight preference for a small
uncertainty (the # of backward writers is very small), rather then a
small certainty (everyone will be effected for a small period of time).

  I think you say something about
 sub-optimal here...
 
 The current code isn't perfect either.  It still has a potential
 to show the wrong mac address in the monitor.  I doesn't make
 a lot of sense to me to replace one sub-optimal solution
 with another sub-optimal solution, especially since no-one
 complained.
 
 Exactly, the code isn't perfect either way and this revert is just
 replacing one sub-optimal solution for another.  So why do it?

Another part of this, for me, is that a change was made that we had a
disagreement on and a different approach was being discussed.  The
revert is to bring us back to before this change.

-vlad

  Thanks,

 Alex
 
 The patch that was applied was controversial and more then 1 person
 expressed reservations

Re: [Qemu-devel] [PATCH for-1.7] Revert e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich

On 11/18/2013 05:40 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 17:07 -0500, Vlad Yasevich wrote:
 On 11/18/2013 04:33 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote:
 On 11/18/2013 03:33 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote:
 On 11/18/2013 02:58 PM, Alex Williamson wrote:
 On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
 This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
 Digging into hardware specs shows this does not
 actually make QEMU behave more like hardware.
 Let's stick to the tried heuristic for 1.7 and
 possibly revisit for 1.8.

 If this is broken, then so are these:

 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
 7c36507c2b8776266f50c5e2739bd18279953b93

 These aren't really broken.  They just assume that the high order
 writes will happen after the low order writes.

 In the case of e1000, this is a little more then an assumption
 (particularly in the case of nic initilization).

 But AIUI there's also a valid bit in that high order byte on e1000, so
 reverting cd5be582 means we stuff a new mac into qemu less often, but
 it's still only accurate some of the time.

 Yes, there is a slight issue with validity of mac at the time of
 processing packets.  I have an outstanding question on the Intel
 list about this behavior with real HW.  But, with e1000, the validity
 bit provides a much higher guarantee that a guest that will be
 setting the mac address will write the high register second to
 guarantee that when the valid bit is written, the mac is fully
 valid.  As a result we don't really need the e1000 part of the
 cd5be5829.

 But doesn't that valid bit mean that a mac update will start and end
 with a write to the high order register?  So we're assuming:

 a) write RA + 1 (invalidate)
 b) write RA (write low)
 c) write RA + 1 (write high + valid)


 No. On update, only steps b and c typically happen.  Thus my question
 to the on the intel list.
 
 So perhaps the bit is some kind of data latch bit and the mac address
 fields within those registers are effectively scratch until that bit is
 written?
 
 Without cd5be5829 the only change is that we don't store a new mac into
 the monitor at b).  The mac stored in the monitor is still wrong from a)
 until c).  So it's ever so slightly less broken without cd5be5829.

 Since there is really no a), the mac in the monitor is only different
 after step b).  since it's is incomplete and we expect step c), there
 is really no point in updating it.
 
 Great, so I have no argument against reverting, or just fixing, that
 chunk of cd5be5829.  Let's implement the latch bit too.
 

 In the case of RTL nic, it is just an  assumption, but it hasn't
 been shown faulty yet.  We do plan to address this a bit more
 thoroughly.

 So how is RTL less broken without cd5be582?  AIUI the valid bit is off
 in a separate register on RTL, so we have no guarantee about order of
 updating the mac.  Without cd5be582 the info in the monitor may be
 permanently broken if the guest uses a write order other than what we
 assume.


 This one is actually not as bad either.  RTL spec requires that
 receive register writes happen as 32 bit word writes.  This is
 what linux and bsd drivers do, so from driver perspective, the
 issue is the same.  What our emulation layer does is turn these
 32 bit writes into 4 8-bit writes.  This is likely due to some
 very broken and very old drivers, but I am not sure.

 So, the information in the monitor will be broken if the guest
 does: write_hi(); write_lo();  A part of me would really like
 to see a guest that does this :)

 So the argument for reverting here is that it seems unlikely that the
 dwords get written in the hi-lo order and we'd rather the monitor get
 stuck with the wrong mac forever

 For how many/which guests?  I know it's not linux or BSD.  I need to
 boot windows to see what it does, but I think it does the right thing.
 
 How many guests do you plan to test?

I think the proposal was to see if anyone reports an issue :)

 
 than it show the wrong mac address for
 a tiny window for everybody else?

 Yes, this would happen for everybody.  If you are querying the output,
 you might see this and it will show up as 2 changes.

 We are talking about 2 tiny amounts here:
 On the one hand, we __might__ have guests that write mac and reverse
 order thus showing wrong address.
 On the other hand we have all the guests who will show the wrong address
 for a _short_ time.

 I have a hard time deciding, but have a slight preference for a small
 uncertainty (the # of backward writers is very small), rather then a
 small certainty (everyone will be effected for a small period of time).
 
 Why compromise?  Why not implement the register that handles whether the
 mac is valid on RTL?


I am working on this but it's too late for 1.7.

For 1.7, we have a question:  impact a very small number (# - 0) of
guests for a long time, or impact

Re: [Qemu-devel] [PATCH v2] pc: add 1.8 machine type

2013-11-15 Thread Vlad Yasevich


On 11/14/2013 05:37 AM, Michael S. Tsirkin wrote:



-#define PC_Q35_1_7_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS
+#define PC_Q35_1_8_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS
+
+static QEMUMachine pc_q35_machine_v1_8 = {
+PC_Q35_1_8_MACHINE_OPTIONS,
+.name = pc-q35-1.8,
+.alias = q35,
+.init = pc_q35_init,
+};
+
+#define PC_Q35_1_7_MACHINE_OPTIONS PC_Q35_1_8_MACHINE_OPTIONS

  static QEMUMachine pc_q35_machine_v1_7 = {
  PC_Q35_1_7_MACHINE_OPTIONS,
  .name = pc-q35-1.7,
  .alias = q35,
-.init = pc_q35_init,
+.init = pc_q35_init_1_7,
  };


Hi Michael

Shouldn't the '.alias' be removed from the 1.7 machine?

Thanks
-vlad


-#define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS
+#define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS

  static QEMUMachine pc_q35_machine_v1_6 = {
  PC_Q35_1_6_MACHINE_OPTIONS,
@@ -313,6 +333,7 @@ static QEMUMachine pc_q35_machine_v1_4 = {

  static void pc_q35_machine_init(void)
  {
+qemu_register_machine(pc_q35_machine_v1_8);
  qemu_register_machine(pc_q35_machine_v1_7);
  qemu_register_machine(pc_q35_machine_v1_6);
  qemu_register_machine(pc_q35_machine_v1_5);

Re: [Qemu-devel] [PATCH] net: move rxfilter_notify() to net.c

2013-11-15 Thread Vlad Yasevich


On 11/15/2013 10:27 AM, Stefan Hajnoczi wrote:

On Tue, Nov 05, 2013 at 07:00:48PM +0800, Amos Kong wrote:

@@ -545,7 +523,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t 
cmd,
  return VIRTIO_NET_ERR;
  }

-rxfilter_notify(nc);
+rxfilter_notify(nc, object_get_canonical_path(OBJECT(n-qdev)));

  return VIRTIO_NET_OK;
  }

[...]

diff --git a/net/net.c b/net/net.c
index c330c9a..f41a457 100644
--- a/net/net.c
+++ b/net/net.c
@@ -40,6 +40,7 @@
  #include qapi-visit.h
  #include qapi/opts-visitor.h
  #include qapi/dealloc-visitor.h
+#include qapi/qmp/qjson.h

  /* Net bridge is currently not supported for W32. */
  #if !defined(_WIN32)
@@ -962,6 +963,25 @@ void print_net_client(Monitor *mon, NetClientState *nc)
 nc-info_str);
  }

+void rxfilter_notify(NetClientState *nc, const char *path)
+{
+QObject *event_data;
+
+if (nc-rxfilter_notify_enabled) {
+if (nc-name) {
+event_data = qobject_from_jsonf({ 'name': %s, 'path': %s },
+   nc-name, path);
+} else {
+event_data = qobject_from_jsonf({ 'path': %s }, path);
+}
+monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
+qobject_decref(event_data);
+
+/* disable event notification to avoid events flooding */
+nc-rxfilter_notify_enabled = 0;
+}
+}


Please fix the memory leak:
object_get_canonical_path() returns a gchar* that the caller must free.



Wow, this memory leak is all over the place and not just in this patch.

-vlad

[Qemu-devel] [PATCH] qom: Fix memory leak in object_property_set_link()

2013-11-15 Thread Vlad Yasevich

Save the result of the call to object_get_cannonical_path()
so we can free it.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 qom/object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index b617f26..21b6f87 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name,
 void object_property_set_link(Object *obj, Object *value,
   const char *name, Error **errp)
 {
-object_property_set_str(obj, object_get_canonical_path(value),
-name, errp);
+gchar *path = object_get_cannonical_path(value);
+object_property_set_str(obj, path, name, errp);
+gfree(path);
 }
 
 Object *object_property_get_link(Object *obj, const char *name,
-- 
1.8.4.2

Re: [Qemu-devel] [PATCH] qom: Fix memory leak in object_property_set_link()

2013-11-15 Thread Vlad Yasevich


On 11/15/2013 12:06 PM, Vlad Yasevich wrote:

Save the result of the call to object_get_cannonical_path()
so we can free it.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
  qom/object.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index b617f26..21b6f87 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name,
  void object_property_set_link(Object *obj, Object *value,
const char *name, Error **errp)
  {
-object_property_set_str(obj, object_get_canonical_path(value),
-name, errp);
+gchar *path = object_get_cannonical_path(value);
+object_property_set_str(obj, path, name, errp);
+gfree(path);
  }

  Object *object_property_get_link(Object *obj, const char *name,



Self-nack.  Forgot to commit the changes that actually make it build and 
work :/


-vlad

[Qemu-devel] [PATCH v2] qom: Fix memory leak in object_property_set_link()

2013-11-15 Thread Vlad Yasevich

Save the result of the call to object_get_cannonical_path()
so we can free it.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
v1-v2:  Builds and works :)

 qom/object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index b617f26..fc19cf6 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name,
 void object_property_set_link(Object *obj, Object *value,
   const char *name, Error **errp)
 {
-object_property_set_str(obj, object_get_canonical_path(value),
-name, errp);
+gchar *path = object_get_canonical_path(value);
+object_property_set_str(obj, path, name, errp);
+g_free(path);
 }
 
 Object *object_property_get_link(Object *obj, const char *name,
-- 
1.8.4.2

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-13 Thread Vlad Yasevich


On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad



Thanks!
Some comments below.


-- 8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.


I would rather say it seems better not to make this assumption.
This does look somewhat safer than what Amos proposed.



The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1  E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1  E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;


Hmm why uint32_t? uint8_t is enough here isn't?

This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version (see compat_flags for one way to
detect this compatibility mode).



Hi Michael

I started looking at migrating this thing and I now starting to question
the whole approach.

The only reason to migrate this is if we can migrate between writes to
the mac address registers.  We can fairly easily solve the issue of
migrating from net to old versions.  The more interesting question
is migrating from old to new versions.

If someone is migrating from an older version (without this feature)
to a newer version and doing so between writes, the bitmap state will
have no idea that a partial write has already happened.  The completing
write will just set one of the bits and notifications that we are
looking for do not happen.

-vlad

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-13 Thread Vlad Yasevich


On 11/13/2013 03:00 PM, Michael S. Tsirkin wrote:

On Wed, Nov 13, 2013 at 11:21:18AM -0500, Vlad Yasevich wrote:

On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad



Thanks!
Some comments below.


-- 8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr

from bit 0 to bit 5 when it changes the macaddr, this is the current

behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.


I would rather say it seems better not to make this assumption.
This does look somewhat safer than what Amos proposed.



The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1  E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1  E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;


Hmm why uint32_t? uint8_t is enough here isn't?

This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version (see compat_flags for one way to
detect this compatibility mode).



Hi Michael

I started looking at migrating this thing and I now starting to question
the whole approach.

The only reason to migrate this is if we can migrate between writes to
the mac address registers.


Absolutely. For some reason below you only discuss cross version
migration but it needs to be migrated for same version migration too.


  We can fairly easily solve the issue of
migrating from net to old versions.


I'm not sure it's easier, but it needs to happen anymore.


  The more interesting question
is migrating from old to new versions.

If someone is migrating from an older version (without this feature)
to a newer version and doing so between writes, the bitmap state will
have no idea that a partial write has already happened.  The completing
write will just set one of the bits and notifications that we are
looking for do not happen.

-vlad


Yep, that's a problem too.


For 1.8 just send the bitmap across.


Right.  That's simple enough.



For cross version migration I would say we should just detect -M 1.7
and older and just emulate old behaviour, disregard the bitmap
completely.
Don't do special tricks just for migration.



The compat flags allow for simple migration to 1.7.  However, it's the
migration from 1.7 to 1.8 that's the issue.

There is a version number on the E1000 state and I am thinking of maybe
bumping that so that we can catch older state that doesn't support mac
sate change.  Thoughts?

-vlad

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-12 Thread Vlad Yasevich


On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad



Thanks!
Some comments below.


-- 8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.


I would rather say it seems better not to make this assumption.
This does look somewhat safer than what Amos proposed.



The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1  E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1  E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;


Hmm why uint32_t? uint8_t is enough here isn't?


Yes, that should be fine.  I'll fix and find a better spot to
put it. (may be a hole in the struct).



This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version (see compat_flags for one way to
detect this compatibility mode).


Good point.  Didn't think of that...




+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)


I don't get it. So E1000_RA_ALL_CHANGED is 0 | 1 == 1.
it looks like the trigger is when E1000_RA1_CHANGED
so that's more or less equivalent.


Goofed on the bits.  That should have been (10) and (11).




  } E1000State;

  #define TYPE_E1000 e1000
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
  d-mac_reg[RA + 1] |= (i  2) ? macaddr[i + 4]  (8 * i) : 0;
  }
  qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr);
+d-mac_changed = 0;
  }

  static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)

  s-mac_reg[index] = val;

-if (index == RA + 1) {
+switch (index) {
+case RA:
+s-mac_changed |= E1000_RA0_CHANGED;
+break;
+case (RA + 1):
+s-mac_changed |= E1000_RA1_CHANGED;
+break;
+}
+
+if (s-mac_changed ==  E1000_RA_ALL_CHANGED) {


Some whitespace damage here.


  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr);
+   s-mac_changed = 0;


Need to use spaces for indent in qemu.


  }
  }

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c


Best to split out in a separate commit.


OK




index 5329f44..6dac10c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,8 @@ typedef struct RTL8139State {

  uint16_t CpCmd;
  uint8_t  TxThresh;
+uint8_t  mac_changed;


This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version.


+#define RTL8139_MAC_CHANGED_ALL 0x3F

  NICState *nic;
  NICConf conf;
@@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d)
  /* restore MAC address */
  memcpy(s-phys, s-conf.macaddr.a, 6);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = 0;

  /* reset interrupt mask */
  s-IntrStatus = 0;
@@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)

  switch (addr)
  {
-case MAC0 ... MAC0+4:
-s-phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
  s-phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed |= (1  (addr - MAC0));


Better drop the external () here otherwise it starts looking like lisp :)


Heh.  OK.

Thanks
-vlad




+if (s-mac_changed == RTL8139_MAC_CHANGED_ALL) {
+qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = 0;
+}
  break;
  case MAC0+6 ... MAC0+7:
  /* reserved */
--
1.8.4.2

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-12 Thread Vlad Yasevich


On 11/08/2013 10:43 PM, Amos Kong wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.


Hi Vlad,

Looks good to me.

Using this patch, we don't need to care the writing order.
If we add event notify in future, it can reduce the noise
event.

BTW, can we use a tmp buffer to record the new mac (changing is unfinished),
and write the new mac to registers until all bits is written?


I've thought about this as well, but I am not sure what it would buy us
and what's the best way to do it.

At first I thought that we could have a function local static that we
could accumulate into.  But then Michael mentioned migration and what
happens if we migrate in middle of the mac change?

So we put into the device, but then it isn't much different then what
we have now with the possible exception that the mac change happens
slightly more atomically. :)  For this, it might make more sense to
temporarily stop the link when the mac address change is happening.

-vlad



Amos


-vlad

-- 8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written


Subject: [PATCH] e1000/rtl8139: update HMP NIC when all bits are written


We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1  E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1  E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;
+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)
  } E1000State;

  #define TYPE_E1000 e1000
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
  d-mac_reg[RA + 1] |= (i  2) ? macaddr[i + 4]  (8 * i) : 0;
  }
  qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr);
+d-mac_changed = 0;
  }

  static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)

  s-mac_reg[index] = val;

-if (index == RA + 1) {
+switch (index) {
+case RA:
+s-mac_changed |= E1000_RA0_CHANGED;
+break;
+case (RA + 1):
+s-mac_changed |= E1000_RA1_CHANGED;
+break;
+}
+
+if (s-mac_changed ==  E1000_RA_ALL_CHANGED) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr);
+   s-mac_changed = 0;
  }
  }

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5329f44..6dac10c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,8 @@ typedef struct RTL8139State {

  uint16_t CpCmd;
  uint8_t  TxThresh;
+uint8_t  mac_changed;
+#define RTL8139_MAC_CHANGED_ALL 0x3F

  NICState *nic;
  NICConf conf;
@@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d)
  /* restore MAC address */
  memcpy(s-phys, s-conf.macaddr.a, 6);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = 0;

  /* reset interrupt mask */
  s-IntrStatus = 0;
@@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)

  switch (addr)
  {
-case MAC0 ... MAC0+4:
-s-phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
  s-phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed |= (1  (addr - MAC0));
+if (s-mac_changed == RTL8139_MAC_CHANGED_ALL) {
+qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = 0;
+}
  break;
  case MAC0+6 ... MAC0+7:
  /* reserved */
--
1.8.4.2

Re: [Qemu-devel] [PATCH] virtio-net: don't update mac_table in error state

2013-11-12 Thread Vlad Yasevich


On 11/10/2013 10:48 PM, Amos Kong wrote:

mac_table was always cleaned up first in handling
VIRTIO_NET_CTRL_MAC_TABLE_SET command, and we din't recover
mac_table content in error state, it's not correct.

This patch makes all the changes in temporal variables,
only update the real mac_table if everything is ok.
We won't change mac_table in error state, so rxfilter
notification isn't needed.

This patch also fixed same problame in
  http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg01188.html
  (not merge)

I will send patch for virtio spec to clarifying this change.

Signed-off-by: Amos Kong ak...@redhat.com
---
  hw/net/virtio-net.c | 34 --
  1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 22dbd05..880fa4a 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -610,11 +610,11 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
cmd,
  return VIRTIO_NET_ERR;
  }

-n-mac_table.in_use = 0;
-n-mac_table.first_multi = 0;
-n-mac_table.uni_overflow = 0;
-n-mac_table.multi_overflow = 0;
-memset(n-mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+int in_use = 0;
+int first_multi = 0;
+uint8_t uni_overflow = 0;
+uint8_t multi_overflow = 0;
+uint8_t *macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);


I am not sure what the practice is for mixing declarations with code in 
qemu.  Can't find anything in the docs.


Otherwise, looks good to me.

Reviewed-by: Vlad Yasevich vyase...@redhat.com

-vlad



  s = iov_to_buf(iov, iov_cnt, 0, mac_data.entries,
 sizeof(mac_data.entries));
@@ -629,19 +629,19 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
cmd,
  }

  if (mac_data.entries = MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs,
+s = iov_to_buf(iov, iov_cnt, 0, macs,
 mac_data.entries * ETH_ALEN);
  if (s != mac_data.entries * ETH_ALEN) {
  goto error;
  }
-n-mac_table.in_use += mac_data.entries;
+in_use += mac_data.entries;
  } else {
-n-mac_table.uni_overflow = 1;
+uni_overflow = 1;
  }

  iov_discard_front(iov, iov_cnt, mac_data.entries * ETH_ALEN);

-n-mac_table.first_multi = n-mac_table.in_use;
+first_multi = in_use;

  s = iov_to_buf(iov, iov_cnt, 0, mac_data.entries,
 sizeof(mac_data.entries));
@@ -656,23 +656,29 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
cmd,
  goto error;
  }

-if (n-mac_table.in_use + mac_data.entries = MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs,
+if (in_use + mac_data.entries = MAC_TABLE_ENTRIES) {
+s = iov_to_buf(iov, iov_cnt, 0, macs[in_use * ETH_ALEN],
 mac_data.entries * ETH_ALEN);
  if (s != mac_data.entries * ETH_ALEN) {
  goto error;
  }
-n-mac_table.in_use += mac_data.entries;
+in_use += mac_data.entries;
  } else {
-n-mac_table.multi_overflow = 1;
+multi_overflow = 1;
  }

+n-mac_table.in_use = in_use;
+n-mac_table.first_multi = first_multi;
+n-mac_table.uni_overflow = uni_overflow;
+n-mac_table.multi_overflow = multi_overflow;
+memcpy(n-mac_table.macs, macs, MAC_TABLE_ENTRIES * ETH_ALEN);
+g_free(macs);
  rxfilter_notify(nc);

  return VIRTIO_NET_OK;

  error:
-rxfilter_notify(nc);
+g_free(macs);
  return VIRTIO_NET_ERR;
  }

[Qemu-devel] [PATCH] virtio-net: Correctly store multicast filter entries

2013-11-08 Thread Vlad Yasevich

Commit 921ac5d0f3a0df869db5ce4edf752f51d8b1596a
 virtio-net: remove layout assumptions for ctrl vq
introduced a regression where the multicast address filter
entries are written to the beginning of the mac table array,
thus overwriting any unicast addresses that may have been
programmed in the filter.
The multicast addresses should be written after all the
unicast addresses.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 hw/net/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 22dbd05..94e8b68 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -657,7 +657,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 }
 
 if (n-mac_table.in_use + mac_data.entries = MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs,
+s = iov_to_buf(iov, iov_cnt, 0,
+   n-mac_table.macs + (n-mac_table.in_use * ETH_ALEN),
mac_data.entries * ETH_ALEN);
 if (s != mac_data.entries * ETH_ALEN) {
 goto error;
-- 
1.8.4.2

Re: [Qemu-devel] [PATCH] virtio-net: Correctly store multicast filter entries

2013-11-08 Thread Vlad Yasevich


On 11/08/2013 11:07 AM, Vlad Yasevich wrote:

Commit 921ac5d0f3a0df869db5ce4edf752f51d8b1596a
 virtio-net: remove layout assumptions for ctrl vq
introduced a regression where the multicast address filter
entries are written to the beginning of the mac table array,
thus overwriting any unicast addresses that may have been
programmed in the filter.
The multicast addresses should be written after all the
unicast addresses.

Signed-off-by: Vlad Yasevich vyase...@redhat.com


Please ignore.  Just saw pull pull request with similar patch.

-vlad


---
  hw/net/virtio-net.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 22dbd05..94e8b68 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -657,7 +657,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
  }

  if (n-mac_table.in_use + mac_data.entries = MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n-mac_table.macs,
+s = iov_to_buf(iov, iov_cnt, 0,
+   n-mac_table.macs + (n-mac_table.in_use * ETH_ALEN),
 mac_data.entries * ETH_ALEN);
  if (s != mac_data.entries * ETH_ALEN) {
  goto error;

[Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-08 Thread Vlad Yasevich

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad

-- 8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 hw/net/e1000.c   | 17 -
 hw/net/rtl8139.c | 14 +-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
 #define E1000_FLAG_AUTONEG (1  E1000_FLAG_AUTONEG_BIT)
 #define E1000_FLAG_MIT (1  E1000_FLAG_MIT_BIT)
 uint32_t compat_flags;
+uint32_t mac_changed;
+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)
 } E1000State;
 
 #define TYPE_E1000 e1000
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
 d-mac_reg[RA + 1] |= (i  2) ? macaddr[i + 4]  (8 * i) : 0;
 }
 qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr);
+d-mac_changed = 0;
 }
 
 static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)
 
 s-mac_reg[index] = val;
 
-if (index == RA + 1) {
+switch (index) {
+case RA:
+s-mac_changed |= E1000_RA0_CHANGED;
+break;
+case (RA + 1):
+s-mac_changed |= E1000_RA1_CHANGED;
+break;
+}
+
+if (s-mac_changed ==  E1000_RA_ALL_CHANGED) {
 macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
 macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
 qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr);
+   s-mac_changed = 0;
 }
 }
 
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5329f44..6dac10c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,8 @@ typedef struct RTL8139State {
 
 uint16_t CpCmd;
 uint8_t  TxThresh;
+uint8_t  mac_changed;
+#define RTL8139_MAC_CHANGED_ALL 0x3F
 
 NICState *nic;
 NICConf conf;
@@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d)
 /* restore MAC address */
 memcpy(s-phys, s-conf.macaddr.a, 6);
 qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = 0;
 
 /* reset interrupt mask */
 s-IntrStatus = 0;
@@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)
 
 switch (addr)
 {
-case MAC0 ... MAC0+4:
-s-phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
 s-phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed |= (1  (addr - MAC0));
+if (s-mac_changed == RTL8139_MAC_CHANGED_ALL) {
+qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
+s-mac_changed = 0;
+}
 break;
 case MAC0+6 ... MAC0+7:
 /* reserved */
-- 
1.8.4.2

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-07 Thread Vlad Yasevich


On 11/07/2013 10:27 AM, Michael S. Tsirkin wrote:

On Thu, Nov 07, 2013 at 07:33:57AM -0700, Alex Williamson wrote:

On Thu, 2013-11-07 at 12:26 +0200, Michael S. Tsirkin wrote:

On Thu, Nov 07, 2013 at 03:32:29PM +0800, Amos Kong wrote:

On Thu, Nov 07, 2013 at 08:59:22AM +0200, Michael S. Tsirkin wrote:

On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit is changed. It will be same as virtio-net.

Signed-off-by: Amos Kong ak...@redhat.com


I'm not sure I buy this.

If we actually implement e.g. mac change notifications,
sending them on writes of random bytes will confuse
the host.


This patch just effects the monitor display of macaddr.
During each writing, the macaddr is used for rx-filter is really
changed.

In the real hardware, it supports to just write part of bits,
the rx-filtering is effected by every bit writing.


Yes but again, the window can just be too small to matter
on real hardware.

Our emulation is not perfect, fixing this to be just like real
hardware just might expose other bugs we can't fix
that easily.


If we were to implement mac change notification, then every partial
update would send a notify and the host.  Is that a problem?  It seems
no different than if the device had an atomic mac update procedure and
the guest admin changed the mac several times.


I think modern nics make address updates atomic.
Problem is, we are emulating an ancient one,
so to make host configuration of a modern one
reasonable we need to resort to tricks.


The problem with assuming that a given byte is always written last is
that unless the hardware spec identifies an order, we're just basing our
code on examples where we know what the guest driver does, either by
code inspection or access tracing.  If there are examples of guests that
write the last byte first, then the host will never know the correct mac
address.  Maybe there are no guests that use the wrong order, but that's
a pretty exhaustive search.


I agree what we have is a hack. Maybe we need some better hacks.
For example, maybe we should update mac at link state change
(I think  link is always down when mac is updated?).
Needs some thought.


I thought this too, but checking recent linux kernel, e1000 and rtl8139 
seem to allow live mac change so link is up.


So here is a stupid, untested patch for e1000 to notify only once:
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..b99eba4 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
 #define E1000_FLAG_AUTONEG (1  E1000_FLAG_AUTONEG_BIT)
 #define E1000_FLAG_MIT (1  E1000_FLAG_MIT_BIT)
 uint32_t compat_flags;
+uint32_t mac_change_flags;
+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)
 } E1000State;

 #define TYPE_E1000 e1000
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
 d-mac_reg[RA + 1] |= (i  2) ? macaddr[i + 4]  (8 * i) : 0;
 }
 qemu_format_nic_info_str(qemu_get_queue(d-nic), macaddr);
+d-mac_change_flags = 0;
 }

 static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)

 s-mac_reg[index] = val;

-if (index == RA + 1) {
+switch (index) {
+   case RA:
+   s-mac_change_flags |= E1000_RA0_CHANGED;
+   break;
+   case (RA + 1):
+   s-mac_change_flags |= E1000_RA1_CHANGED;
+   break;
+}
+
+if (!(s-mac_change_flags ^ E1000_RA_ALL_CHANGED)) {
 macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
 macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
 qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t 
*)macaddr);

+s-mac_change_flags = 0;
 }
 }

Any thoughts?

-vlad

Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-07 Thread Vlad Yasevich


On 11/07/2013 09:33 AM, Alex Williamson wrote:

On Thu, 2013-11-07 at 12:26 +0200, Michael S. Tsirkin wrote:

On Thu, Nov 07, 2013 at 03:32:29PM +0800, Amos Kong wrote:

On Thu, Nov 07, 2013 at 08:59:22AM +0200, Michael S. Tsirkin wrote:

On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit is changed. It will be same as virtio-net.

Signed-off-by: Amos Kong ak...@redhat.com


I'm not sure I buy this.

If we actually implement e.g. mac change notifications,
sending them on writes of random bytes will confuse
the host.


This patch just effects the monitor display of macaddr.
During each writing, the macaddr is used for rx-filter is really
changed.

In the real hardware, it supports to just write part of bits,
the rx-filtering is effected by every bit writing.


Yes but again, the window can just be too small to matter
on real hardware.

Our emulation is not perfect, fixing this to be just like real
hardware just might expose other bugs we can't fix
that easily.


If we were to implement mac change notification, then every partial
update would send a notify and the host.  Is that a problem?  It seems
no different than if the device had an atomic mac update procedure and
the guest admin changed the mac several times.


Yes, but the issue is exercebated in the non-atomic case.  RTL8139
is the worst since it has byte oriented writes so that for ever mac
change, we have the worst case potential to generate 6 notificates
(assuming libvirt is so fast as to pick up ever change).

Additionally, once libvirt is updated, this would cause rather serious
churn on the host as for ever update, libvirt is going to push the
address down to the physical host nic and the fewer of these updates
we do the better.



The problem with assuming that a given byte is always written last is
that unless the hardware spec identifies an order, we're just basing our
code on examples where we know what the guest driver does, either by
code inspection or access tracing.  If there are examples of guests that
write the last byte first, then the host will never know the correct mac
address.  Maybe there are no guests that use the wrong order, but that's
a pretty exhaustive search.

The patch doesn't change anything about how the NIC operates, only when
mac changes are updated.  During an update the mac is in a transitory
state and we can't go back in time to make it atomic on this hardware
design to avoid a window where the wrong mac may be seen.  I think the
patch is the right thing to do.  Thanks,



Reporting half-complete state is not the right thing, IMO.  Right now,
it's doesn't have much impact, but if we start writing these addresses
to the host nic, then this will have a much bigger impact as I said above.

-vlad


Alex


I would say let's leave e1000/rtl8139 well alone unless
we see guests that actually write mac without touching
the last byte.


At least, linux rtl8139cp/e1000 writes macaddr from bit 0 to bit 5.
It works to just watch the last bit.

Thanks, Amos


Then think of ways to detect when mac change is done
for these.


---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 +
  2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index ec8ecd7..2d60639 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)

  s-mac_reg[index] = val;

-if (index == RA + 1) {
+if (index == RA || index == RA + 1) {
  macaddr[0] = cpu_to_le32(s-mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s-mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s-nic), (uint8_t *)macaddr);
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5329f44..7f2b4db 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)

  switch (addr)
  {
-case MAC0 ... MAC0+4:
-s-phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
  s-phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s-nic), s-phys);
  break;
--
1.8.3.1

[Qemu-devel] [PATCH] qdev-properties-system.c: Allow vlan or netdev for -device, not both

2013-11-07 Thread Vlad Yasevich

It is currently possible to specify things like:
-device e1000,netdev=foo,vlan=1
With this usage, whichever argument was specified last (vlan or netdev)
overwrites what was previousely set and results in a non-working
configuration.  Even worse, when used with multiqueue devices,
it causes a segmentation fault on exit in qemu_free_net_client.

That patch treates the above command line options as invalid and
generates an error at start-up.

Signed-off-by: Vlad Yasevich vyase...@redhat.com
---
 hw/core/qdev-properties-system.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 0eada32..729efa8 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char *str, 
void **ptr)
 goto err;
 }
 
+if (ncs[i]) {
+ret = -EINVAL;
+goto err;
+}
+
 ncs[i] = peers[i];
 ncs[i]-queue_index = i;
 }
@@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque,
 *ptr = NULL;
 return;
 }
+if (*ptr) {
+error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name);
+return;
+}
 
 hubport = net_hub_port_find(id);
 if (!hubport) {
-- 
1.8.3.1

[Qemu-devel] [Bug 721793] Re: QEMU freezes on startup (100% CPU utilization)

2011-08-11 Thread Vlad Glagolev

QEMU 0.15.0 fixes the problem.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/721793

Title:
  QEMU freezes on startup (100% CPU utilization)

Status in QEMU:
  New

Bug description:
  0.12.5 was the last version of QEMU that runs ok and boots any os
  image.

  0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and 
both of them make it use 100% of CPU also.
  Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support.

  tested commands:

  W2000:
  $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice 
host:0e21:0750 /var/opt/vm/w2000.img
  W2000:
  $ qemu /var/opt/vm/w2000.img
  OpenBSD 4.8:
  $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img

  tried to use `-M pc-0.12` selector, different audio cards (I've found it 
caused infinite loop on startup once) -- no luck.
  tried to use recent seabios from git -- still no luck.

  attached strace log of 0.14.0.

  everything was tested on HP mini 311C with Intel Atom N270.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/721793/+subscriptions

[Qemu-devel] [Bug 721793] [NEW] QEMU freezes on startup (100% CPU utilization)

2011-02-19 Thread Vlad Glagolev

Public bug reported:

0.12.5 was the last version of QEMU that runs ok and boots any os image.

0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and both 
of them make it use 100% of CPU also.
Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support.

tested commands:

W2000:
$ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice 
host:0e21:0750 /var/opt/vm/w2000.img
W2000:
$ qemu /var/opt/vm/w2000.img
OpenBSD 4.8:
$ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img

tried to use `-M pc-0.12` selector, different audio cards (I've found it caused 
infinite loop on startup once) -- no luck.
tried to use recent seabios from git -- still no luck.

attached strace log of 0.14.0.

everything was tested on HP mini 311C with Intel Atom N270.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/721793

Title:
  QEMU freezes on startup (100% CPU utilization)

Status in QEMU:
  New

Bug description:
  0.12.5 was the last version of QEMU that runs ok and boots any os
  image.

  0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and 
both of them make it use 100% of CPU also.
  Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support.

  tested commands:

  W2000:
  $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice 
host:0e21:0750 /var/opt/vm/w2000.img
  W2000:
  $ qemu /var/opt/vm/w2000.img
  OpenBSD 4.8:
  $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img

  tried to use `-M pc-0.12` selector, different audio cards (I've found it 
caused infinite loop on startup once) -- no luck.
  tried to use recent seabios from git -- still no luck.

  attached strace log of 0.14.0.

  everything was tested on HP mini 311C with Intel Atom N270.

[Qemu-devel] [Bug 721793] Re: QEMU freezes on startup (100% CPU utilization)

2011-02-19 Thread Vlad Glagolev

** Attachment added: Strace log
   
https://bugs.launchpad.net/bugs/721793/+attachment/1859874/+files/strace-qemu.log.gz

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/721793

Title:
  QEMU freezes on startup (100% CPU utilization)

Status in QEMU:
  New

Bug description:
  0.12.5 was the last version of QEMU that runs ok and boots any os
  image.

  0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and 
both of them make it use 100% of CPU also.
  Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support.

  tested commands:

  W2000:
  $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice 
host:0e21:0750 /var/opt/vm/w2000.img
  W2000:
  $ qemu /var/opt/vm/w2000.img
  OpenBSD 4.8:
  $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img

  tried to use `-M pc-0.12` selector, different audio cards (I've found it 
caused infinite loop on startup once) -- no luck.
  tried to use recent seabios from git -- still no luck.

  attached strace log of 0.14.0.

  everything was tested on HP mini 311C with Intel Atom N270.

[Qemu-devel] [Bug 721793] Re: QEMU freezes on startup (100% CPU utilization)

2011-02-19 Thread Vlad Glagolev

** Attachment added: 2.6.35.11 kernel config
   
https://bugs.launchpad.net/qemu/+bug/721793/+attachment/1859875/+files/config.gz

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/721793

Title:
  QEMU freezes on startup (100% CPU utilization)

Status in QEMU:
  New

Bug description:
  0.12.5 was the last version of QEMU that runs ok and boots any os
  image.

  0.13.0-0.14.0 just freeze, and the only thing I see is a black screen and 
both of them make it use 100% of CPU also.
  Both kernels 2.6.35.11 and 2.6.37.1 with and without PAE support.

  tested commands:

  W2000:
  $ qemu -m 256 -localtime -net nic,model=rtl8139 -net tap -usbdevice 
host:0e21:0750 /var/opt/vm/w2000.img
  W2000:
  $ qemu /var/opt/vm/w2000.img
  OpenBSD 4.8:
  $ qemu -cdrom ~/cd48.iso -boot d empty-qcow2.img

  tried to use `-M pc-0.12` selector, different audio cards (I've found it 
caused infinite loop on startup once) -- no luck.
  tried to use recent seabios from git -- still no luck.

  attached strace log of 0.14.0.

  everything was tested on HP mini 311C with Intel Atom N270.

Re: [Qemu-devel] U-Boot patch for qemu -M mips

2007-10-04 Thread Vlad Lungu


Thiemo Seufer wrote:

Vlad Lungu wrote:
  

Thiemo Seufer wrote:


Vlad Lungu wrote:
[snip]
  
  

 +long int initdram(int board_type)
 +{
 +   /* Sdram is setup by assembler code */
 +   /* If memory could be changed, we should return the true value 
here */

 +   return MEM_SIZE*1024*1024;



Qemu gets the amount of RAM passed via a command line switch, the
qemu-mips emulation sets up a Linux kernel like command line in
memory where u-boot could parse it from.


  

Does it, or just when you pass -kernel to it? I'll check.



Hm, you are right, it does that only for -kernel. Would it make sense
to change that in Qemu?
  
  
IDK.  Maybe I can probe the  RAM size in U-Boot , or if this does not work, 



mips_r4k implements no probable memory controller, so this won't work.
  
I say let's stick it into a DIP-switch, so nobody can overwrite it by 
mistake. And you don't have to parse it.

put some info somewhere (RAM, register,
emulated DIP-dwitch), like RAM size, endianness of the CPU.



Endianness is rather pointless. If your U-Boot binary doesn't explode
immediately you got the right endianness. :-)

  
It doesn't actually explode, it sort of almost hits the exception 
handler and freezes there. I don't have a mipsel gdb at hand

to watch it, but it should be fun.

pc=EPC   0xbfc00380
ds 0006

STATUS0x0046
CAUSE 0x0428
CONFIG0 0x8082
CONFIG1 0x9e190c8b

Re: [Qemu-devel] U-Boot patch for qemu -M mips

2007-10-04 Thread vlad

 Vlad Lungu wrote:
 [snip]
 put some info somewhere (RAM, register,
 emulated DIP-dwitch), like RAM size, endianness of the CPU.


 Endianness is rather pointless. If your U-Boot binary doesn't explode
 immediately you got the right endianness. :-)


 It doesn't actually explode, it sort of almost hits the exception
 handler
 and freezes there. I don't have a mipsel gdb at hand
 to watch it, but it should be fun.
 pc=EPC   0xbfc00380
 ds 0006

 STATUS0x0046
 CAUSE 0x0428
 CONFIG0 0x8082
 CONFIG1 0x9e190c8b

 Try qemu-system-mipsel ... -d all, and watch /tmp/qemu.log. :-)

Oh crap. Now I see the problem. Either I misread the 0.90 info reg or
there's a difference in how it's handled. It dies on the first instruction
anyway.

Vlad

Re: [Qemu-devel] U-Boot patch for qemu -M mips TAKE 2

2007-10-04 Thread Vlad Lungu


Now with board config file included, so it can be built :-)
Thiemo, I'll think about the memory size issue and get back to you on that.
How about a git repo for U-Boot, if this thing takes off?  


Vlad


---
diff --git a/Makefile b/Makefile
index 85885b1..8f650d2 100644
--- a/Makefile
+++ b/Makefile
@@ -2444,6 +2444,12 @@ pb1000_config:   unconfig
@echo #define CONFIG_PB1000 1 $(obj)include/config.h
@$(MKCONFIG) -a pb1x00 mips mips pb1x00

+qemu_mips_config   :   unconfig
+   @mkdir -p $(obj)include
+   @ $(obj)include/config.h
+   @echo #define CONFIG_QEMU_MIPS 1 $(obj)include/config.h
+   @$(MKCONFIG) -a qemu-mips mips mips qemu-mips
+
#
## MIPS64 5Kc
#
diff --git a/board/qemu-mips/Makefile b/board/qemu-mips/Makefile
new file mode 100644
index 000..23be447
--- /dev/null
+++ b/board/qemu-mips/Makefile
@@ -0,0 +1,45 @@
+#
+# (C) Copyright 2003-2006
+# Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED]
+#
+# See file CREDITS for list of people who contributed to this
+# project.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation; either version 2 of
+# the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+# MA 02111-1307 USA
+#
+
+include $(TOPDIR)/config.mk
+
+LIB= $(obj)lib$(BOARD).a
+
+COBJS  = $(BOARD).o flash.o
+SOBJS  = lowlevel_init.o
+
+SRCS   := $(SOBJS:.o=.S) $(COBJS:.o=.c)
+OBJS   := $(addprefix $(obj),$(COBJS))
+SOBJS  := $(addprefix $(obj),$(SOBJS))
+
+$(LIB): $(OBJS) $(SOBJS)
+   $(AR) $(ARFLAGS) $@ $(OBJS) $(SOBJS)
+
+#
+
+# defines $(obj).depend target
+include $(SRCTREE)/rules.mk
+
+sinclude $(obj).depend
+
+#
diff --git a/board/qemu-mips/README b/board/qemu-mips/README
new file mode 100644
index 000..39570b1
--- /dev/null
+++ b/board/qemu-mips/README
@@ -0,0 +1,11 @@
+By Vlad Lungu [EMAIL PROTECTED] 2007-Oct-01
+
+Qemu is a full system emulator. See
+
+http://fabrice.bellard.free.fr/qemu
+
+Limitations  comments
+--
+Supports the -m mips configuration of qemu: serial,NE2000,IDE.
+Support is big endian only for now (or at least this is what I tested).
+Derived from au1x00 with a lot of things cut out.
diff --git a/board/qemu-mips/config.mk b/board/qemu-mips/config.mk
new file mode 100644
index 000..61269ce
--- /dev/null
+++ b/board/qemu-mips/config.mk
@@ -0,0 +1,11 @@
+
+#
+# Qemu -M mips system emulator
+# See http://fabrice.bellard.free.fr/qemu
+#
+
+# ROM version
+TEXT_BASE = 0xbfc0
+
+# RAM version
+#TEXT_BASE = 0x80001000
diff --git a/board/qemu-mips/flash.c b/board/qemu-mips/flash.c
new file mode 100644
index 000..d419f41
--- /dev/null
+++ b/board/qemu-mips/flash.c
@@ -0,0 +1,43 @@
+/*
+ * (C) Copyright 2003
+ * Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED]
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ * MA 02111-1307 USA
+ */
+
+#include common.h
+#if 0
+flash_info_t flash_info[CFG_MAX_FLASH_BANKS];  /* info for FLASH chips */
+#endif
+/*---
+ * flash_init()
+ *
+ * sets up flash_info and returns size of FLASH (bytes)
+ */
+unsigned long flash_init (void)
+{
+   printf (Skipping flash_init\n);
+   return (0);
+}
+
+int write_buff (/*flash_info_t */ void * info, uchar * src, ulong addr, ulong

Re: [Qemu-devel] U-Boot patch for qemu -M mips TAKE 2

2007-10-04 Thread Vlad Lungu


Wolfgang Denk wrote:

In message [EMAIL PROTECTED] you wrote:
  

Now with board config file included, so it can be built :-)
Thiemo, I'll think about the memory size issue and get back to you on that.
How about a git repo for U-Boot, if this thing takes off?  



We actually have already a lot of them -  the  master  repo  and  the
custodian  repositories. Logically, your code should go into the MIPS
repo. I think you should start posting on  the  U-Boot  mailing  list
instead of sending to mydirectaddress;see
http://www.denx.de/wiki/UBoot/WebHome for details.
  
Actually, this was directed at the Qemu list. But thanks for the offer. 
I suppose they could just point
to your repo in the source release/dev tree and  store the corresponding 
source snapshot when doing a binary
release with information included in the README, and it would be kosher 
Re: GPL.

BTW: Your code has a few minor coding style  violations  (indentation
by spaces instead of TAB), and the signed-off-ny line is missing.
  
Might be. I use joe/mcedit/vi in random order to edit source files, 
that's the source of the spaces, and the patch was rather
informal so no signoff. I'll re-branch and do a 3-part (GOT,NE2000,qemu) 
patchset .


Vlad

Re: [Qemu-devel] U-Boot patch for qemu -M mips

2007-10-03 Thread Vlad Lungu


Thiemo Seufer wrote:

Vlad Lungu wrote:
  

 Fix for mips GOT relocation bug, NE2000 bugs, add support for qemu -M mips 
target.


[snip]
  

 diff --git a/board/qemu-mips/config.mk b/board/qemu-mips/config.mk


[snip]

 +#
 +# AMD development board AMD Alchemy DbAu1x00, MIPS32 core



Incorrect comment.

  

didn't spot that

 +#
 +
 +# ROM version
 +TEXT_BASE = 0xbfc0
 +
 +# RAM version
 +#TEXT_BASE = 0x8010



This could be as low as 0x80001000 (assuming the space for exception
handlers isn't reserved by other means).
  

I think I saw that difference in Linux 2.6 too. I suppose you're right.

[snip]
  

 diff --git a/board/qemu-mips/lowlevel_init.S b/board/qemu-mips/lowlevel_init.S
 new file mode 100644
 index 000..855e8ab
 --- /dev/null
 +++ b/board/qemu-mips/lowlevel_init.S
 @@ -0,0 +1,47 @@
 +/* Memory sub-system initialization code */
 +
 +#include config.h
 +#include version.h
 +#include asm/regdef.h
 +#include asm/mipsregs.h
 +
 +   .text
 +   .set noreorder
 +   .set mips32
 +
 +   .globl  lowlevel_init
 +lowlevel_init:
 +
 +   /*
 +* Step 2) Establish Status Register
 +* (set BEV, clear ERL, clear EXL, clear IE)
 +*/
 +   li  t1, 0x0040
 +   mtc0t1, CP0_STATUS
 +
 +   /*
 +* Step 3) Establish CP0 Config0
 +* (set OD, set K0=3)
 +*/
 +   li  t1, 0x00080003
 +   mtc0t1, CP0_CONFIG



OD is a processor-specific flag, it does nothing on Qemu.

  

 +   /*
 +* Step 5) Disable the performance counters
 +*/
 +   mtc0zero, CP0_PERFORMANCE
 +   nop



This field isn't required to exist (as per architecture spec), which
means you can get an exception when writing it. Since perfctr
interrupts are guaranteed to be disabled, the best option is not to
touch the register in early startup code.

(If you want to use performance counters, first check via the config
registers if they are implemented. You won't have much luck on Qemu:
The registers are implemented, but they don't do anything useful.)
  
Well, I didn't get any exception . And nobody complained about OD 
either. The stuff that barfed on me, I deleted right away.

I'll delete this too, no problem.

 +   /*
 +* Step 7) Establish Cause
 +* (set IV bit)
 +*/
 +   li  t1, 0x0080
 +   mtc0t1, CP0_CAUSE
 +
 +   /* Establish Wired (and Random) */
 +   mtc0zero, CP0_WIRED
 +   nop
 +
 +   j   ra
 +   nop
 diff --git a/board/qemu-mips/qemu-mips.c b/board/qemu-mips/qemu-mips.c
 new file mode 100644
 index 000..76c093c
 --- /dev/null
 +++ b/board/qemu-mips/qemu-mips.c
 @@ -0,0 +1,48 @@
 +/*
 + * (C) Copyright 2007
 + * [EMAIL PROTECTED]



A name in addition would look nicer. :-)

  

Names are so passe :-)

 + *
 + * See file CREDITS for list of people who contributed to this
 + * project.
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License as
 + * published by the Free Software Foundation; either version 2 of
 + * the License, or (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
 + * MA 02111-1307 USA
 + */
 +
 +#include common.h
 +#include command.h
 +#include asm/mipsregs.h
 +
 +long int initdram(int board_type)
 +{
 +   /* Sdram is setup by assembler code */
 +   /* If memory could be changed, we should return the true value here */
 +   return MEM_SIZE*1024*1024;



Qemu gets the amount of RAM passed via a command line switch, the
qemu-mips emulation sets up a Linux kernel like command line in
memory where u-boot could parse it from.

  

Does it, or just when you pass -kernel to it? I'll check.

 +}
 +
 +
 +int checkboard (void)
 +{
 +   u32 proc_id;
 +
 +   proc_id = read_32bit_cp0_register(CP0_PRID);
 +
 +   switch (proc_id  24) {
 +   default:
 +   printf (Unsupported cpu %d, proc_id=0x%x\n, proc_id  24, 
proc_id);
 +   }
 +
 +   return 0;
 +}



Huh? What is this code good for?

  
Checking for the type of board, I suppose :-)  I think all BSPs have it 
and it's called early in the boot process. I could either do only a 
return 0 or actually decode CP0_PRID and print something meaningful, if 
Qemu sets it to something sensible. Or just print a nice banner.


Thanks for the input. This is a first version and it does what it's 
supposed to do, it booted in qemu HEAD on linux-x86 and in stock 0.90 
for Windows and loaded

Re: [Qemu-devel] U-Boot patch for qemu -M mips

2007-10-03 Thread Vlad Lungu


Thiemo Seufer wrote:

Vlad Lungu wrote:
[snip]
  

 +long int initdram(int board_type)
 +{
 +   /* Sdram is setup by assembler code */
 +   /* If memory could be changed, we should return the true value 
here */

 +   return MEM_SIZE*1024*1024;



Qemu gets the amount of RAM passed via a command line switch, the
qemu-mips emulation sets up a Linux kernel like command line in
memory where u-boot could parse it from.

  
  

Does it, or just when you pass -kernel to it? I'll check.



Hm, you are right, it does that only for -kernel. Would it make sense
to change that in Qemu?
  
IDK.  Maybe I can probe the  RAM size in U-Boot , or if this does not 
work, put some info somewhere (RAM, register,

emulated DIP-dwitch), like RAM size, endianness of the CPU.

Since the mips_r4k machine doesn't correspond to any real hardware we
have a bit of leeway with the hardware design.

  

 +}
 +
 +
 +int checkboard (void)
 +{
 +   u32 proc_id;
 +
 +   proc_id = read_32bit_cp0_register(CP0_PRID);
 +
 +   switch (proc_id  24) {
 +   default:
 +   printf (Unsupported cpu %d, proc_id=0x%x\n, proc_id  
24, proc_id);

 +   }
 +
 +   return 0;
 +}



Huh? What is this code good for?

  
  
Checking for the type of board, I suppose :-)  I think all BSPs have it and 
it's called early in the boot process. I could either do only a return 0 or 
actually decode CP0_PRID and print something meaningful, if Qemu sets it to 
something sensible. Or just print a nice banner.



As it is, this code prints a nice Unsupported CPU banner no matter what
(and proceeds anyway instead of bailing out).
  
True, because it does not care much about this at the moment. Nor do I. 
But that can change.

Latest CVS Qemu supports a number of different CPUs (see the -cpu switch),
the default for 32 bit is a 24Kf, and for 64 bit a R4000.

  

I'll look into this and see what use can we make of CP0_PRID.

Vlad

[Qemu-devel] U-Boot patch for qemu -M mips

2007-10-02 Thread Vlad Lungu





Fix for mips GOT relocation bug, NE2000 bugs, add support for qemu -M mips target.

Patch is against U-Boot master branch.
--
diff --git a/Makefile b/Makefile
index 85885b1..8f650d2 100644
--- a/Makefile
+++ b/Makefile
@@ -2444,6 +2444,12 @@ pb1000_config		:	unconfig
 	@echo "#define CONFIG_PB1000 1" $(obj)include/config.h
 	@$(MKCONFIG) -a pb1x00 mips mips pb1x00
 
+qemu_mips_config		:	unconfig
+	@mkdir -p $(obj)include
+	@ $(obj)include/config.h
+	@echo "#define CONFIG_QEMU_MIPS 1" $(obj)include/config.h
+	@$(MKCONFIG) -a qemu-mips mips mips qemu-mips
+
 #
 ## MIPS64 5Kc
 #
diff --git a/board/qemu-mips/Makefile b/board/qemu-mips/Makefile
new file mode 100644
index 000..23be447
--- /dev/null
+++ b/board/qemu-mips/Makefile
@@ -0,0 +1,45 @@
+#
+# (C) Copyright 2003-2006
+# Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED].
+#
+# See file CREDITS for list of people who contributed to this
+# project.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation; either version 2 of
+# the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+# MA 02111-1307 USA
+#
+
+include $(TOPDIR)/config.mk
+
+LIB	= $(obj)lib$(BOARD).a
+
+COBJS	= $(BOARD).o flash.o
+SOBJS	= lowlevel_init.o
+
+SRCS	:= $(SOBJS:.o=.S) $(COBJS:.o=.c)
+OBJS	:= $(addprefix $(obj),$(COBJS))
+SOBJS	:= $(addprefix $(obj),$(SOBJS))
+
+$(LIB):	 $(OBJS) $(SOBJS)
+	$(AR) $(ARFLAGS) $@ $(OBJS) $(SOBJS)
+
+#
+
+# defines $(obj).depend target
+include $(SRCTREE)/rules.mk
+
+sinclude $(obj).depend
+
+#
diff --git a/board/qemu-mips/README b/board/qemu-mips/README
new file mode 100644
index 000..070c99d
--- /dev/null
+++ b/board/qemu-mips/README
@@ -0,0 +1,11 @@
+By [EMAIL PROTECTED] 2007-Oct-01
+
+Qemu is a full system emulator. See
+
+http://fabrice.bellard.free.fr/qemu
+
+Limitations  comments
+--
+Supports the "-m mips" configuration of qemu: serial,NE2000,IDE.
+Support is big endian only for now (or at least this is what I tested).
+Derived from au1x00 with a lot of things cut out.
diff --git a/board/qemu-mips/config.mk b/board/qemu-mips/config.mk
new file mode 100644
index 000..39eb60a
--- /dev/null
+++ b/board/qemu-mips/config.mk
@@ -0,0 +1,32 @@
+#
+# (C) Copyright 2003
+# Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED].
+#
+# See file CREDITS for list of people who contributed to this
+# project.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation; either version 2 of
+# the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+# MA 02111-1307 USA
+#
+
+#
+# AMD development board AMD Alchemy DbAu1x00, MIPS32 core
+#
+
+# ROM version
+TEXT_BASE = 0xbfc0
+
+# RAM version
+#TEXT_BASE = 0x8010
diff --git a/board/qemu-mips/flash.c b/board/qemu-mips/flash.c
new file mode 100644
index 000..d419f41
--- /dev/null
+++ b/board/qemu-mips/flash.c
@@ -0,0 +1,43 @@
+/*
+ * (C) Copyright 2003
+ * Wolfgang Denk, DENX Software Engineering, [EMAIL PROTECTED].
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU

[Qemu-devel] U-Boot

2007-09-29 Thread Vlad Lungu


Vlad Lungu wrote:

lol, replying to myself


Blue Swirl wrote:



Someone could port OpenBIOS or LinuxBIOS to MIPS.



Well, just FYI:

I sort-of-ported U-Boot to Qemu (-M mips). Network performance sucks for 
some reason (hard enough that tftp-booting a kernel is impossible) and I 
didn't have time to investigate this yet. But I guess that porting to 
(-M malta) should not be that hard, and I think there is some PCI 
support already in U-boot.




Reports of poor network performance have been greatly exagerated. 
Apparently, what it happens is :


- qemu doesn't respond with a next-server or server-name in DHCP, apparently
- U-Boot uses broadcast in this case as the server address
- things don't quite work

Setting IP by hand or using tap for networking and configuring DHCP on 
the host correctly allows loading bt TFTP (or even nfs-booting with 
tap). I can boot a vmlinux with embedded ramdisk (a couple of Megs) in 
maybe 10 seconds. Still getting RX timeouts, but it's no big deal.


I will polish the thing a little and post an update in 1-2 weeks max.
Maybe we can have a bootrom for MIPS shipping with Qemu, after all. Even 
for malta.


Vlad

1 2 >

1 - 100 of 104 matches

Mail list logo