[Qemu-devel] [Bug 1629618] Re: QEMU causes host hang / reset on PPC64EL

2016-11-07 Thread Thomas Huth
No idea (apart from asking why you're still using 4k pages on the host - 
hardly anybody seems to do that anymore).
Anyway, this sounds like a kernel bug, not a QEMU problem, so you should try to 
get help via the kernel bug tracker or the KVM mailing list instead.

https://bugs.launchpad.net/bugs/1629618

Title:
  QEMU causes host hang / reset on PPC64EL

Status in QEMU:
  New

Bug description:
  QEMU causes a host hang / reset on PPC64EL when used in KVM + HV mode
  (kvm_hv module).

  After a random amount of uptime, starting new QEMU virtual machines
  will cause the host to experience a soft CPU lockup.  Depending on
  configuration and other random factors the host will either checkstop
  and reboot, or hang indefinitely.  The following stacktrace was pulled
  from an instance where the host simply hung after starting a fourth
  virtual machine.

  Command line:

  qemu-system-ppc64 --enable-kvm -M pseries -cpu host -smp
  14,cores=14,threads=1,sockets=1 -m 64G -realtime mlock=on -kernel
  vmlinux-4.7.0-1-powerpc64le -initrd initrd.img-4.7.0-1-powerpc64le

  Lockup trace:

  [  527.393933] KVM guest htab at c03ae400 (order 29), LPID 4
  [  574.637695] INFO: rcu_sched self-detected stall on CPU
  [  574.637799] 112-...: (5249 ticks this GP) idle=699/141/0 softirq=5358/5382 fqs=5072
  [  574.637877] (t=5250 jiffies g=19853 c=19852 q=64401)
  [  574.637947] Task dump for CPU 112:
  [  574.637982] qemu-system-ppc R  running task    0 12037  11828 0x00040004
  [  574.638051] Call Trace:
  [  574.638081] [c01c1cddb430] [c00f2710] sched_show_task+0xe0/0x180 (unreliable)
  [  574.638164] [c01c1cddb4a0] [c01326f4] rcu_dump_cpu_stacks+0xe4/0x150
  [  574.638246] [c01c1cddb4f0] [c0137a04] rcu_check_callbacks+0x6b4/0x9c0
  [  574.638328] [c01c1cddb610] [c013f7c4] update_process_times+0x54/0xa0
  [  574.638409] [c01c1cddb640] [c0156c28] tick_sched_handle.isra.5+0x48/0xe0
  [  574.638489] [c01c1cddb680] [c0156d24] tick_sched_timer+0x64/0xd0
  [  574.638602] [c01c1cddb6c0] [c0140274] __hrtimer_run_queues+0x124/0x420
  [  574.638683] [c01c1cddb750] [c014123c] hrtimer_interrupt+0xec/0x2c0
  [  574.638765] [c01c1cddb810] [c001fe5c] __timer_interrupt+0x8c/0x270
  [  574.638847] [c01c1cddb860] [c002053c] timer_interrupt+0x9c/0xe0
  [  574.638915] [c01c1cddb890] [c0002750] decrementer_common+0x150/0x180
  [  574.639001] --- interrupt: 901 at kvmppc_hv_get_dirty_log+0x1c4/0x570 [kvm_hv]
  [  574.639001] LR = kvmppc_hv_get_dirty_log+0x1f8/0x570 [kvm_hv]
  [  574.639114] [c01c1cddbc30] [d0001a524980] kvm_vm_ioctl_get_dirty_log_hv+0xd0/0x170 [kvm_hv]
  [  574.639209] [c01c1cddbc80] [d0001a4d4140] kvm_vm_ioctl_get_dirty_log+0x40/0x60 [kvm]
  [  574.639291] [c01c1cddbcb0] [d0001a4ca3cc] kvm_vm_ioctl+0x3fc/0x760 [kvm]
  [  574.639372] [c01c1cddbd40] [c02d9e18] do_vfs_ioctl+0xd8/0x8e0
  [  574.639442] [c01c1cddbde0] [c02da6f4] SyS_ioctl+0xd4/0xf0
  [  574.639512] [c01c1cddbe30] [c0009260] system_call+0x38/0x108
  [  580.601573] NMI watchdog: BUG: soft lockup - CPU#112 stuck for 22s! [qemu-system-ppc:12037]
  [  580.601655] Modules linked in: xt_tcpudp(E) rpcsec_gss_krb5(E) nfsv4(E) 
dns_resolver(E) ext4(E) ecb(E) crc16(E) jbd2(E) mbcache(E) tun(E) btrfs(E) 
crc32c_generic(E) raid6_pq(E) xor(E) dm_crypt(E) xts(E) gf128mul(E) 
algif_skcipher(E) af_alg(E) dm_mod(E) bonding(E) cpufreq_stats(E) 
iptable_filter(E) ip_tables(E) x_tables(E) bridge(E) stp(E) llc(E) 
ipmi_devintf(E) ipmi_msghandler(E) i2c_dev(E) fuse(E) raid1(E) md_mod(E) ses(E) 
sd_mod(E) enclosure(E) sg(E) binfmt_misc(E) radeon(E) ttm(E) drm_kms_helper(E) 
snd_hda_codec_hdmi(E) snd_hda_intel(E) drm(E) snd_hda_codec(E) snd_hda_core(E) 
snd_hwdep(E) snd_pcm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) 
fb_sys_fops(E) snd_timer(E) evdev(E) i2c_algo_bit(E) snd(E) soundcore(E) 
at24(E) ahci(E) mpt3sas(E) nvmem_core(E) libahci(E) raid_class(E) 
scsi_transport_sas(E) powernv_rng(E) rng_core(E) uinput(E) kvm_hv(E) kvm(E) 
ib_srp(E) scsi_transport_srp(E) ofpart(E) powernv_flash(E) mtd(E) nfsd(E) 
opal_prd(E) auth_rpcgss(E) parport_pc(E) lp(E) parport(E) autofs4(E) nfsv3(E) 
nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) ib_ipoib(E) ib_umad(E) 
rdma_ucm(E) ib_uverbs(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) configfs(E) 
hid_generic(E) usbhid(E) hid(E) xhci_pci(E) xhci_hcd(E) usbcore(E) tg3(E) 
usb_common(E) ptp(E) pps_core(E) libphy(E) ib_mthca(E) ib_mad(E) ib_core(E) 
ib_addr(E)
  [  580.603295] CPU: 112 PID: 12037 Comm: qemu-system-ppc Tainted: G    E   4.6.0-2-powerpc64le #1 Debian 4.6.3-1
  [  580.603386] task: c01f706f0180 ti: c01c1cdd8000 task.ti: c01c1cdd8000
  [  580.603456] NIP: 

Re: [Qemu-devel] [PATCH v1] docs/vhost-user: extend the vhost-user protocol to support the vhost-pci based inter-vm communication

2016-11-07 Thread Marc-André Lureau
Hi

I suggest you split this patch for the various "features" you propose.

On Mon, Oct 24, 2016 at 11:10 AM Wei Wang  wrote:

> Signed-off-by: Wei Wang 
> ---
>  docs/specs/vhost-user.txt | 81
> +--
>  1 file changed, 72 insertions(+), 9 deletions(-)
>
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 7890d71..173f693 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -17,28 +17,37 @@ The protocol defines 2 sides of the communication,
> master and slave. Master is
>  the application that shares its virtqueues, in our case QEMU. Slave is the
>  consumer of the virtqueues.
>
> -In the current implementation QEMU is the Master, and the Slave is
> intended to
> +In the traditional implementation QEMU is the Master, and the Slave is
> intended to
>  be a software Ethernet switch running in user space, such as Snabbswitch.
>
>
ok


>  Master and slave can be either a client (i.e. connecting) or server
> (listening)
>  in the socket communication.
>
> +The current vhost-user protocol is extended to support the vhost-pci
> based inter-VM
> +communication. In this case, Slave is a QEMU which runs a vhost-pci
> server, and
> +Master is another QEMU which runs a vhost-pci client.
> +
>


Why introduce the new terminology "server" and "client"? What does it change?
It is easily confused with the socket client/server configuration.


>  Message Specification
>  -
>
>  Note that all numbers are in the machine native byte order. A vhost-user
> message
> -consists of 3 header fields and a payload:
> +consists of 4 header fields and a payload:
>
> -
> -| request | flags | size | payload |
> -
> +--
> +| request | flags | conn_id | size | payload |
> +--
>
>   * Request: 32-bit type of the request
>   * Flags: 32-bit bit field:
> - Lower 2 bits are the version (currently 0x01)
> -   - Bit 2 is the reply flag - needs to be sent on each reply from the
> slave
> +   - Bit 2 is the reply flag - needs to be sent on each reply
> - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK
> for
>   details.
> + * Conn_id: 64-bit connection id to indentify a client socket connection.
> It is
> +introduced in version 0x02 to support the "1-server-N-client"
> model
> +and an asynchronous client read implementation. The
> connection id,
> +0x, is used by an anonymous client (e.g. a
> client who
> +has not got its connection id from the server in the initial
> talk)
>

I don't understand why you need a connection id on each message. What's
the purpose? Since the communication is unicast, a single message should be
enough.
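
For readers following the header discussion, here is a minimal sketch of the existing three-field header (request, flags, size) as the quoted spec text describes it: three 32-bit fields in native byte order, with the version in the lower 2 bits of flags, the reply flag in bit 2 and need_reply in bit 3. The helper names and the request number used in the usage note are illustrative assumptions, not QEMU code, and the proposed conn_id field is deliberately left out.

```python
import struct

# Header as described in the quoted spec: request, flags, size, each 32 bits,
# native byte order ("=" keeps native order and avoids padding).
HEADER = struct.Struct("=III")

VERSION_MASK = 0x3          # lower 2 bits of flags carry the version
REPLY_FLAG = 1 << 2         # bit 2: set on replies
NEED_REPLY_FLAG = 1 << 3    # bit 3: sender asks for an acknowledgement


def pack_message(request, payload=b"", version=1, reply=False, need_reply=False):
    """Build header + payload (hypothetical helper, for illustration only)."""
    flags = version & VERSION_MASK
    if reply:
        flags |= REPLY_FLAG
    if need_reply:
        flags |= NEED_REPLY_FLAG
    return HEADER.pack(request, flags, len(payload)) + payload


def unpack_message(data):
    """Split a byte string back into its header fields and payload."""
    request, flags, size = HEADER.unpack_from(data)
    return {
        "request": request,
        "version": flags & VERSION_MASK,
        "reply": bool(flags & REPLY_FLAG),
        "need_reply": bool(flags & NEED_REPLY_FLAG),
        "payload": data[HEADER.size:HEADER.size + size],
    }
```

For example, `unpack_message(pack_message(1, b"\x00" * 8, need_reply=True))` round-trips an 8-byte payload behind a 12-byte header; the request number 1 is just a placeholder here.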

>   * Size - 32-bit size of the payload
>
>
> @@ -97,6 +106,13 @@ Depending on the request type, payload can be:
> log offset: offset from start of supplied file descriptor
> where logging starts (i.e. where guest address 0 would be logged)
>
> +* Device info
> +   
> +   | virito id | uuid |
> +   
> +   Virtio id: 16-bit virtio id of the device
> +   UUID: 128-bit UUID to identify the QEMU instance that creates the
> device
> +
>

I wonder if UUID should be a different message.



>  In QEMU the vhost-user message is implemented with the following struct:
>
>  typedef struct VhostUserMsg {
> @@ -109,6 +125,7 @@ typedef struct VhostUserMsg {
>  struct vhost_vring_addr addr;
>  VhostUserMemory memory;
>  VhostUserLog log;
> +DeviceInfo dev_info;
>  };
>  } QEMU_PACKED VhostUserMsg;
>
> @@ -119,17 +136,25 @@ The protocol for vhost-user is based on the existing
> implementation of vhost
>  for the Linux Kernel. Most messages that can be sent via the Unix domain
> socket
>  implementing vhost-user have an equivalent ioctl to the kernel
> implementation.
>
> -The communication consists of master sending message requests and slave
> sending
> -message replies. Most of the requests don't require replies. Here is a
> list of
> -the ones that do:
> +Traditionally, the communication consists of master sending message
> requests
> +and slave sending message replies. Most of the requests don't require
> replies.
> +Here is a list of the ones that do:
>
>   * VHOST_GET_FEATURES
>   * VHOST_GET_PROTOCOL_FEATURES
>   * VHOST_GET_VRING_BASE
>   * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> + * VHOST_USER_GET_CONN_ID
> + * VHOST_USER_SET_PEER_CONNECTION
>

Let's also fix the VHOST_USER prefix of the above requests.

>  [ Also see the section on REPLY_ACK protocol extension. ]
>
> +Currently, the communication also supports the Slave (server) sending
> messages
> +to the Master (client). Here is a list of them:
> + * VHOST_USER_SET_FEATURES
>

[Qemu-devel] Concerning " [PULL 6/6] curses: Use cursesw instead of curses"

2016-11-07 Thread Sergey Smolov

Dear List!

I've encountered the same problem as was discussed in this thread: 
https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg07898.html


Has anybody succeeded in solving the problem?

On my side, the problem appears when I run the 'configure' script 
with the '--target-list=aarch64-softmmu' option. The script prints the 
following message:


ERROR: configure test passed without -Werror but failed with -Werror.
   This is probably a bug in the configure script. The failing command
   will be at the bottom of config.log.
   You can run configure with --disable-werror to bypass this check.

I've attached a config.log to this e-mail.

Thanks in advance!

--
Sincerely yours,
Sergey Smolov

# QEMU configure log Mon Nov  7 19:12:21 MSK 2016
# Configured with: './configure' '--target-list=aarch64-softmmu'
#
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -c -o config-temp/qemu-conf.o 
config-temp/qemu-conf.c
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -c -o config-temp/qemu-conf.o 
config-temp/qemu-conf.c
config-temp/qemu-conf.c:2:2: error: #error __i386__ not defined
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -c -o config-temp/qemu-conf.o 
config-temp/qemu-conf.c
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -c -o config-temp/qemu-conf.o 
config-temp/qemu-conf.c
config-temp/qemu-conf.c:2:2: error: #error __ILP32__ not defined
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -c -o 
config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -c -o 
config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -o 
config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -o 
config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -c -o 
config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -c -o 
config-temp/qemu-conf.o config-temp/qemu-conf.c
c++ -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wall 
-Wundef -Wwrite-strings -fno-strict-aliasing -fno-common -fwrapv -o 
config-temp/qemu-conf.exe config-temp/qemu-conf.cxx config-temp/qemu-conf.o 
-m64 -g
c++ -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
-D_LARGEFILE_SOURCE -Wall -Wundef -Wwrite-strings -fno-strict-aliasing 
-fno-common -fwrapv -o config-temp/qemu-conf.exe config-temp/qemu-conf.cxx 
config-temp/qemu-conf.o -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Werror 
-Wstring-plus-int -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc: error: unrecognized command line option ‘-Wstring-plus-int’
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Werror 
-Winitializer-overrides -o config-temp/qemu-conf.exe config-temp/qemu-conf.c 
-m64 -g
cc: error: unrecognized command line option ‘-Winitializer-overrides’
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Werror 
-Wendif-labels -o 

Re: [Qemu-devel] [PATCH v4 7/8] qmp: Support abstract classes on device-list-properties

2016-11-07 Thread Markus Armbruster
Eduardo Habkost  writes:

> On Mon, Nov 07, 2016 at 06:08:42PM +, Daniel P. Berrange wrote:
>> On Mon, Nov 07, 2016 at 04:03:58PM -0200, Eduardo Habkost wrote:
>> > On Mon, Nov 07, 2016 at 05:41:01PM +, Daniel P. Berrange wrote:
>> > > On Mon, Nov 07, 2016 at 03:27:31PM -0200, Eduardo Habkost wrote:
>> > > > On Mon, Nov 07, 2016 at 04:51:57PM +0100, Markus Armbruster wrote:
>> > > > > "Daniel P. Berrange"  writes:
>> > > > > 
>> > > > > > On Mon, Nov 07, 2016 at 03:48:49PM +0100, Halil Pasic wrote:
>> > > > > >> 
>> > > > > >> 
>> > > > > >> On 11/07/2016 02:05 PM, Eduardo Habkost wrote:
>> > > > > >> > If you want some subclasses to not have the property, then I
>> > > > > >> > recommend not registering it as a class property on the base
>> > > > > >> > class in the first place. I don't expect to see a mechanism to
>> > > > > >> > allow subclasses to remove or override class properties from
>> > > > > >> > parent classes.
>> > > > > >> > 
>> > > > > >> 
>> > > > > >> Thank you very much for your reply.
>> > > > > >> 
>> > > > > >> I understand, yet I see potential problems. The example with
>> > > > > >> ioeventfd and vhost in virtio-pci is a good one, also because
>> > > > > >> first there was the ioeventfd property with commit 653ced07,
>> > > > > >> and then the vhost case came along with commit 50787628ee3 (OK,
>> > > > > >> ioeventfd is not there for some non-vhost virtio-pci devices
>> > > > > >> for reasons I do not understand).
>> > > > > >> 
>> > > > > >> To rephrase this in a generic context: a specialization for which
>> > > > > >> a property does not make sense might come along after the
>> > > > > >> property was established on the base class.
>> > > > > >> 
>> > > > > >> Now AFAIU properties are external API, so having to make a
>> > > > > >> compatibility-breaking change there might not be fun. Does this
>> > > > > >> mean one should be very careful to only put class-level
>> > > > > >> properties on abstract classes where it's certain that the
>> > > > > >> property always makes sense, including its access control?
>> > > > > >
>> > > > > > This could be an argument for *NOT* allowing introspection of
>> > > > > > properties against abstract parent classes. If you only ever
>> > > > > > allow introspecting against leaf node non-abstract classes, then
>> > > > > > QEMU retains the freedom to move props from a base class down to
>> > > > > > a leaf class without risk of breaking mgmt apps.
>> > > > > 
>> > > > > That's a really good point.  To generalize it a bit, introspection of
>> > > > > actual interfaces is fine, but permitting introspection of how they
>> > > > > are made can add artificial constraints.
>> > > > > 
>> > > > > Introspecting the subtype relation is already problematic in this
>> > > > > view.
>> > > > 
>> > > > Yes, that's a very good point. But note that this means
>> > > > making things more complex for libvirt.
>> > > > 
>> > > > In the case of -cpu, if we don't expose (or allow libvirt to
>> > > > make assumptions about) subtype relations, the only way libvirt
>> > > > can conclude that "+foo can be used as -cpu option with any CPU
>> > > > model", is to query each and every CPU model type, and see if all
>> > > > of them support the "foo" property.
>> > > >
>> > > > It's a trade-off between an interface that's more complex to use
>> > > > and having less freedom to change the class hierarchy.
>> > > > Personally, I don't mind going either way, if we have a good
>> > > > reason for that.
>> > > 
>> > > Or we could do a tradeoff where we allow introspection of abstract
>> > > parent classes, but explicitly document that we reserve the right
>> > > to move properties to leaf nodes ?
>> > 
>> > Reserving the right to move properties to leaf nodes would be
>> > welcome. But it would force libvirt to query all leaf nodes if it
>> > wants to be sure the option is really unsupported by the QEMU
>> > binary, so why would libvirt query the parent class in the first
>> > place?
>> 
>> The introspection API is quite general purpose so its semantics have to
>> be suitable for all types of object, but some types of object may not need
>> the full degree of flexibility. So what I meant was that while we want
>> to be able to move props down to leaf classes for objects in general,
>> we could perhaps assume that this will never happen for CPU model objects.
>
> This would work for me. I only worry that any code that makes the
> wrong assumptions (on either QEMU or libvirt) would easily go
> unnoticed until we try to change the class hierarchy and it
> breaks something.
>
> Markus, what do you think?

I dislike complexity in interface contracts.

Guidance like "if you want to learn the properties of a type T,
introspect T" is simple.

Guidance like 

Re: [Qemu-devel] [PATCH v11 13/22] vfio: Introduce common function to add capabilities

2016-11-07 Thread Alexey Kardashevskiy
On 05/11/16 08:10, Kirti Wankhede wrote:
> Vendor driver using mediated device framework should use
> vfio_info_add_capability() to add capabilities.
> Introduced this function to reduce code duplication in vendor drivers.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I6fca329fa2291f37a2c859d0bc97574d9e2ce1a6
> ---
>  drivers/vfio/vfio.c  | 60 
> +++-
>  include/linux/vfio.h |  3 +++
>  2 files changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 4ed1a6a247c6..9a03be0942a1 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1797,8 +1797,66 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, 
> size_t offset)
>   for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next - offset)
>   tmp->next += offset;
>  }
> -EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
> +EXPORT_SYMBOL(vfio_info_cap_shift);


Why this change?


>  
> +static int sparse_mmap_cap(struct vfio_info_cap *caps, void *cap_type)
> +{
> + struct vfio_info_cap_header *header;
> + struct vfio_region_info_cap_sparse_mmap *sparse_cap, *sparse = cap_type;
> + size_t size;
> +
> + size = sizeof(*sparse) + sparse->nr_areas *  sizeof(*sparse->areas);
> + header = vfio_info_cap_add(caps, size,
> +VFIO_REGION_INFO_CAP_SPARSE_MMAP, 1);
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + sparse_cap = container_of(header,
> + struct vfio_region_info_cap_sparse_mmap, header);
> + sparse_cap->nr_areas = sparse->nr_areas;
> + memcpy(sparse_cap->areas, sparse->areas,
> +sparse->nr_areas * sizeof(*sparse->areas));
> + return 0;
> +}
> +
> +static int region_type_cap(struct vfio_info_cap *caps, void *cap_type)
> +{
> + struct vfio_info_cap_header *header;
> + struct vfio_region_info_cap_type *type_cap, *cap = cap_type;
> +
> + header = vfio_info_cap_add(caps, sizeof(*cap),
> +VFIO_REGION_INFO_CAP_TYPE, 1);
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + type_cap = container_of(header, struct vfio_region_info_cap_type,
> + header);
> + type_cap->type = cap->type;
> + type_cap->subtype = cap->subtype;
> + return 0;
> +}
> +
> +int vfio_info_add_capability(struct vfio_info_cap *caps, int cap_type_id,
> +  void *cap_type)
> +{
> + int ret = -EINVAL;
> +
> + if (!cap_type)
> + return 0;
> +
> + switch (cap_type_id) {
> + case VFIO_REGION_INFO_CAP_SPARSE_MMAP:
> + ret = sparse_mmap_cap(caps, cap_type);
> + break;
> +
> + case VFIO_REGION_INFO_CAP_TYPE:
> + ret = region_type_cap(caps, cap_type);
> + break;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_info_add_capability);
>  
>  /*
>   * Pin a set of guest PFNs and return their associated host PFNs for local
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index dcda8fccefab..cf90393a11e2 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -113,6 +113,9 @@ extern struct vfio_info_cap_header *vfio_info_cap_add(
>   struct vfio_info_cap *caps, size_t size, u16 id, u16 version);
>  extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);
>  
> +extern int vfio_info_add_capability(struct vfio_info_cap *caps,
> + int cap_type_id, void *cap_type);
> +


It would make it easier to review and bisect if 14/22 was squashed into
this one. In the resulting patch, vfio_info_cap_add() can be made static as
it will only be used in drivers/vfio/vfio.c from now on.




>  struct pci_dev;
>  #ifdef CONFIG_EEH
>  extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
> 


-- 
Alexey



Re: [Qemu-devel] [PATCH v4 7/8] qmp: Support abstract classes on device-list-properties

2016-11-07 Thread Markus Armbruster
Eduardo Habkost  writes:

> On Mon, Nov 07, 2016 at 03:40:57PM +0100, Markus Armbruster wrote:
>> Eduardo Habkost  writes:
>> 
>> > On Mon, Nov 07, 2016 at 09:09:58AM +0100, Markus Armbruster wrote:
>> >> Eduardo Habkost  writes:
>> >> 
>> >> > On Fri, Nov 04, 2016 at 04:45:17PM +0100, Markus Armbruster wrote:
>> >> >> Eduardo Habkost  writes:
>> >> >> 
>> >> >> > (CCing libvirt people, as I forgot to CC them)
>> >> >> >
>> >> >> > On Mon, Oct 31, 2016 at 03:07:23PM +0100, Igor Mammedov wrote:
>> >> >> >> On Fri, 28 Oct 2016 23:48:06 -0200
>> >> >> >> Eduardo Habkost  wrote:
>> >> >> >> 
>> >> >> >> > When an abstract class is used on device-list-properties, we can
>> >> >> >> > simply return the class properties registered for the class.
>> >> >> >> > 
>> >> >> >> > This will be useful if management software needs to query for
>> >> >> >> > supported options that apply to all devices of a given type (e.g.
>> >> >> >> > options supported by all CPU models, options supported by all PCI
>> >> >> >> > devices).
>> >> >> >> Patch looks fine to me but I'm not qmp interface guru
>> >> >> >> so I'd leave review up to maintainers.
>> >> >> >> 
>> >> >> >> One question though,
>> >> >> >> How would management software discover typename of abstract class?
>> >> >> >
>> >> >> > It depends on the use case. On some cases, management may already
>> >> >> > have bus-specific logic that will know what's the base type it
>> >> >> > needs to query (e.g. it may query "pci-device" to find out if all
>> >> >> > PCI devices support a given option). On other cases, it may be
>> >> >> > discovered using other commands.
>> >> >> 
>> >> >> The stated purpose of this feature is to let management software "query
>> >> >> for supported options that apply to all devices of a given type".  I
>> >> >> suspect that when management software has a notion of "a given type",
>> >> >> it knows its name.
>> >> >> 
>> >> >> Will management software go fishing for subtype relationships beyond
>> >> >> the types it knows?  I doubt it.  Of course, management software
>> >> >> developers are welcome to educate me :)
>> >> >> 
>> >> >> > For the CPU case, I will propose adding the base QOM CPU typename
>> >> >> > in the query-target command.
>> >> >> 
>> >> >> Does this type name vary?  If yes, can you give examples?
>> >> >
>> >> > It does. x86-specific CPU properties are on the x86_64-cpu and
>> >> > i386-cpu classes. arm-specific CPU properties are on the arm-cpu
>> >> > class.
>> >> 
>> >> I see we have concrete CPUs (such as "Westmere-x86_64-cpu"), which are
>> >> subtypes of an abstract CPU (such as "x86_64-cpu"), which is a subtype
>> >> of "cpu", which is a subtype of "device", which is a subtype of
>> >> "object".
>> >> 
>> >> The chain "cpu" - "device" - "object" is fixed and well-known.
>> >> 
>> >> The link from there to the concrete CPU varies.  Whether it could be
>> >> considered well-known or not is debatable.
>> >> 
>> >> My true question is: should we have a special purpose interface to get
>> >> the abstract supertype of concrete CPU types, or should we have general
>> >> purpose means to introspect the subtype hierarchy?
>> >> 
>> >> Note that we have the latter already, although in a rather cumbersome
>> >> form:
>> >> 
>> >> { "execute": "qom-list-types",
>> >>   "arguments": { "implements": T, "abstract": true } }
>> >> 
>> >> lists all subtypes of T.  You can filter out the concrete subtypes by
>> >> subtracting the same query with "abstract": false.  Start with the
>> >> type you're interested in, find all its abstract supertypes.  If you
>> >> need to know more, repeat for the types you found.
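
The subtraction trick described above can be sketched as follows. This is a toy model, not a QMP client: the type table is hypothetical (names taken from this thread), and `qom_list_types()` is only a stand-in that mimics the behaviour described in the message, where "abstract": true includes abstract types in the result and "abstract": false leaves them out.

```python
# Hypothetical type table: name -> (parent, is_abstract).
TYPES = {
    "object": (None, True),
    "device": ("object", True),
    "cpu": ("device", True),
    "x86_64-cpu": ("cpu", True),
    "Westmere-x86_64-cpu": ("x86_64-cpu", False),
}


def _is_subtype(name, ancestor):
    """Walk the parent links of the toy table."""
    while name is not None:
        if name == ancestor:
            return True
        name = TYPES[name][0]
    return False


def qom_list_types(implements, abstract):
    """Stand-in for the query: subtypes of `implements` (itself included in
    this toy model); abstract types appear only when `abstract` is true."""
    return {
        name
        for name, (_parent, is_abstract) in TYPES.items()
        if _is_subtype(name, implements) and (abstract or not is_abstract)
    }


def abstract_subtypes(type_name):
    """All subtypes minus the concrete ones leaves the abstract ones."""
    return qom_list_types(type_name, True) - qom_list_types(type_name, False)


def abstract_supertypes(type_name):
    """Uses only the query: test each abstract type under "object" for
    whether `type_name` shows up among its subtypes."""
    return {
        candidate
        for candidate in abstract_subtypes("object")
        if type_name in qom_list_types(candidate, True)
    }
```

Note that `abstract_supertypes()` has to start from "object" and test every abstract type, which is the kind of tree walk that makes this form cumbersome.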
>> >
>> > Looks cumbersome, because I don't see a way to find all
>> > supertypes of a given type without walking the whole tree
>> > starting from "object" (is there one?). But it could be improved
>> > a bit if we added an "implements" field to ObjectTypeInfo.
>> 
>> My point is: we can skip discussing whether we should expose the subtype
>> relation, because we already do.
>
> Correct. My only problem is that it seems to add extra
> assumptions to the code (e.g. that there's only one abstract CPU
> type). But if libvirt is careful, it doesn't need to make any
> assumptions: it can explore the type hierarchy and confirm that
> the assumptions are correct.
>
>> 
>> > But, maybe we should take a step back: my original goal was to
>> > let libvirt know which properties are supported by any CPU model
>> > when using "-cpu".
>> 
>> Why is that useful?
>
> libvirt wants to know if the QEMU binary supports a given -cpu
> option (normally CPU features that can be enabled/disabled using
> "+foo"/"-foo").

The obvious way to check whether a specific CPU supports it is to
introspect that CPU.
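
A minimal sketch of that per-CPU check, and of its "all CPUs of interest" variant, assuming the property lists have already been collected by introspecting each concrete CPU type. The table below is made-up example data (only "Westmere-x86_64-cpu" and the "foo" placeholder come from this thread), and the function names are hypothetical.

```python
# Made-up introspection results: concrete CPU type -> set of property names.
PROPS = {
    "Westmere-x86_64-cpu": {"foo", "bar"},
    "example-a-x86_64-cpu": {"foo", "bar", "baz"},
    "example-b-x86_64-cpu": {"foo"},
}


def supports(cpu_type, prop):
    """Does this one CPU type list the property?"""
    return prop in PROPS[cpu_type]


def supported_by_all(cpu_types, prop):
    """True only if every CPU type of interest lists the property."""
    return all(supports(cpu_type, prop) for cpu_type in cpu_types)


def common_properties(cpu_types):
    """Properties shared by all given CPU types (empty set if none given)."""
    sets = [PROPS[cpu_type] for cpu_type in cpu_types]
    return set.intersection(*sets) if sets else set()
```

This makes no assumption about the class hierarchy: nothing is inferred from a parent type, so moving a property from a base class to leaf classes would not change the answers.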

The obvious way to check whether all CPUs of interest support it
(assuming that is a productive question) is to introspect 

Re: [Qemu-devel] [PATCH for-2.8? 0/3] block/curl: Drop TFTP "support"

2016-11-07 Thread Markus Armbruster
Max Reitz  writes:

> On 07.11.2016 09:20, Markus Armbruster wrote:
>> Max Reitz  writes:
>> 
>>> On 03.11.2016 08:56, Markus Armbruster wrote:
>>>> Max Reitz  writes:
>>>>
>>>>> See patch 3 for the reason why we have actually never supported TFTP at
>>>>> all (except for very small files (i.e. below 256 kB or so)).
>>>>
>>>> Care to explain why it works "for very small files" in a bit more
>>>> detail?  PATCH 3 gives a "does not support byte ranges" hint, but to go
>>>> from there to "very small files", you need to know more about how the
>>>> block layer works than I can remember right now.
>>>
>>> Our curl block driver caches data and uses a readahead cache, which by
>>> default has a size of 256 kB. Therefore, if the start of the file is
>>> read first (which it usually is, if just for format probing), then the
>>> correct data will be read for that size.
>>>
>>> Yes, you can adjust the readahead size. No, I cannot guarantee that
>>> there are no users that just set readahead to the image size and thus
>>> made it work. I can't really imagine that, though, because at that point
>>> you can just copy the file to tmpfs and have the same result.
>>>
>>> Also, if I were a user, I probably wouldn't use 256 kB images, and thus
>>> I would just notice tftp to be broken. I don't think I would experiment
>>> with the readahead option to find out that it works if I set it to the
>>> image size and then just use it that way. I definitely think I would
>>> give up before that and just copy the file to the local system.
>> 
>> I'm not trying to make you explain why it's okay to drop TFTP.  I'm
>> trying to make you explain what exactly worked and what exactly didn't.
>> Such explanations generally involve a certain degree of "why".
>
> Well, I'm trying to explain both. :-)
>
>> Your first paragraph provides a few more hints, but I'm still guessing.
>> Here's my current best guess:
>> 
>> * Commonly, images smaller than 256 KiB work, and larger images don't.
>
> Yes. Unless you set the "readahead" option to something different (it
> just defaults to 256 kB), then it'll commonly work for images up to
> that size.
>
> Oh, and I just realized it's not called "readahead" for nothing: It gets
> added to the size of the read operation, so if your first read operation
> has a size of 1 GB... Well, then all of that will be correctly cached.
> So both the size and the offset of the first read operation are significant.
>
>> * "Don't work" means the block layer returns garbled data.
>
> Right. It will be data from the image, but not from the offset you want.
>
>> * "Commonly" means when the first read is for offset zero.  Begs the
>>   question when exactly that's the case.  You mentioned format probing.
>>   What if the user specified a format?  It's okay not to answer this
>>   question.  I'm not demanding exhaustive analysis, I'm fishing for a
>>   better commit message.  Such a message may leave some of its questions
>>   unanswered.
>
> Well, qcow2 will always start at offset zero anyway (because it reads
> the header first). For raw images, the offset can be anywhere, but if
> you're starting a VM from it, offset zero is obviously likely to be read
> first, too.
>
> (And as a side note, the first read operation for qcow2 images will
> always be 64 kB in size.)
>
> But, yes, for raw images the offset can be anywhere and if it is not
> zero, the answer what works and what doesn't becomes a bit more complicated:
>
> 
> Suppose the first offset read from is 64k. curl will return data from
> offset 0 anyway, so it's pretty much garbage. But if you then do another
> read operation from 0, that will return correct data.
>
> If after that you try to read data from the area that has been covered
> by both read operations... Then it depends on which buffer the curl
> driver sees first, which is most likely the first one, i.e. you'll get
> broken data again.
> 

There's a lovely addition to your commit message struggling to get out
of your reply.
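
The failure mode described in the quoted reply can be modelled in a few lines. This is a toy sketch, not the curl block driver: OffsetIgnoringBackend, CachingReader and their methods are hypothetical names. The model assumes a backend that ignores the requested offset and always serves data from offset 0, and a reader that labels each cached buffer with the offset it asked for and consults the oldest buffer first.

```python
class OffsetIgnoringBackend:
    """Hypothetical backend that ignores the requested offset."""

    def __init__(self, image):
        self.image = image

    def fetch(self, offset, length):
        # Behaviour being modelled: data always comes from offset 0.
        return self.image[0:length]


class CachingReader:
    """Toy reader that caches each fetched buffer under the requested offset."""

    def __init__(self, backend, readahead):
        self.backend = backend
        self.readahead = readahead
        self.buffers = []  # list of (start, data), oldest first

    def read(self, offset, length):
        # Serve from the first cached buffer that covers the whole range.
        for start, data in self.buffers:
            if start <= offset and offset + length <= start + len(data):
                return data[offset - start:offset - start + length]
        # Cache miss: fetch the request plus the readahead amount.
        data = self.backend.fetch(offset, length + self.readahead)
        self.buffers.append((offset, data))
        return data[:length]
```

In this model a first read at offset 0 is served correctly, and so is anything within the readahead window. A first read at a non-zero offset returns data from the start of the image, a later read at offset 0 is correct, and a read in the range covered by both buffers is served from the older, mislabelled one.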



[Qemu-devel] [RFC v2] RBD: Add support readv,writev for rbd

2016-11-07 Thread jazeltq
From: tianqing 

Rbd can do readv and writev directly, so we do not need to transform
iov to buf or vice versa any more.

Signed-off-by: tianqing 
---
 block/rbd.c | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index a57b3e3..93fe299 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,7 +73,12 @@ typedef struct RBDAIOCB {
 BlockAIOCB common;
 int64_t ret;
 QEMUIOVector *qiov;
+/* Note:
+ * The LIBRBD_SUPPORTS_IOVEC is defined in librbd.h.
+ */
+#ifndef LIBRBD_SUPPORTS_IOVEC
 char *bounce;
+#endif
 RBDAIOCmd cmd;
 int error;
 struct BDRVRBDState *s;
@@ -83,7 +88,9 @@ typedef struct RADOSCB {
 RBDAIOCB *acb;
 struct BDRVRBDState *s;
 int64_t size;
+#ifndef LIBRBD_SUPPORTS_IOVEC
 char *buf;
+#endif
 int64_t ret;
 } RADOSCB;
 
@@ -426,11 +433,21 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 }
 } else {
 if (r < 0) {
+#ifndef LIBRBD_SUPPORTS_IOVEC
 memset(rcb->buf, 0, rcb->size);
+#else
+iov_memset(acb->qiov->iov, acb->qiov->niov, 0, 0, acb->qiov->size);
+#endif
 acb->ret = r;
 acb->error = 1;
 } else if (r < rcb->size) {
+#ifndef LIBRBD_SUPPORTS_IOVEC
 memset(rcb->buf + r, 0, rcb->size - r);
+#else
+iov_memset(acb->qiov->iov, acb->qiov->niov,
+   r, 0, acb->qiov->size - r);
+#endif
+
 if (!acb->error) {
 acb->ret = rcb->size;
 }
@@ -441,10 +458,12 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 
 g_free(rcb);
 
+#ifndef LIBRBD_SUPPORTS_IOVEC
 if (acb->cmd == RBD_AIO_READ) {
 qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
 }
 qemu_vfree(acb->bounce);
+#endif
 acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
 
 qemu_aio_unref(acb);
@@ -664,6 +683,7 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 acb->cmd = cmd;
 acb->qiov = qiov;
 assert(!qiov || qiov->size == size);
+#ifndef LIBRBD_SUPPORTS_IOVEC
 if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
 acb->bounce = NULL;
 } else {
@@ -672,19 +692,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 goto failed;
 }
 }
-acb->ret = 0;
-acb->error = 0;
-acb->s = s;
-
 if (cmd == RBD_AIO_WRITE) {
 qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
 }
-
 buf = acb->bounce;
+#endif
+acb->ret = 0;
+acb->error = 0;
+acb->s = s;
 
 rcb = g_new(RADOSCB, 1);
 rcb->acb = acb;
+#ifndef LIBRBD_SUPPORTS_IOVEC
 rcb->buf = buf;
+#endif
 rcb->s = acb->s;
 rcb->size = size;
 r = rbd_aio_create_completion(rcb, (rbd_callback_t) rbd_finish_aiocb, &c);
@@ -694,10 +715,18 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 
 switch (cmd) {
 case RBD_AIO_WRITE:
+#ifndef LIBRBD_SUPPORTS_IOVEC
 r = rbd_aio_write(s->image, off, size, buf, c);
+#else
+r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
+#endif
 break;
 case RBD_AIO_READ:
+#ifndef LIBRBD_SUPPORTS_IOVEC
 r = rbd_aio_read(s->image, off, size, buf, c);
+#else
+r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
+#endif
 break;
 case RBD_AIO_DISCARD:
 r = rbd_aio_discard_wrapper(s->image, off, size, c);
@@ -719,7 +748,9 @@ failed_completion:
 rbd_aio_release(c);
 failed:
 g_free(rcb);
+#ifndef LIBRBD_SUPPORTS_IOVEC
 qemu_vfree(acb->bounce);
+#endif
 qemu_aio_unref(acb);
 return NULL;
 }
-- 
2.10.2
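
The short-read handling in the patch above zero-fills the tail of a scatter/gather list instead of a single bounce buffer. As a rough illustration of what such a fill has to do, here is a minimal sketch; iov_memset_sketch is a hypothetical helper written for this note, operating on a list of bytearrays, and it is not QEMU's iov_memset().

```python
def iov_memset_sketch(iov, offset, fill, count):
    """Fill `count` bytes with `fill`, starting `offset` bytes into the
    concatenation of the bytearrays in `iov`.  Returns the bytes written."""
    done = 0
    for buf in iov:
        if done == count:
            break
        if offset >= len(buf):
            offset -= len(buf)  # the region starts in a later segment
            continue
        n = min(len(buf) - offset, count - done)
        buf[offset:offset + n] = bytes([fill]) * n
        done += n
        offset = 0  # later segments are filled from their start
    return done
```

For a read that returned r bytes out of size, the equivalent call in this model would be iov_memset_sketch(iov, r, 0, size - r).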




[Qemu-devel] [PATCHv2] build-sys: remove libtool left-over

2016-11-07 Thread Marc-André Lureau
Libtool support was removed in commit e999ee44349, but there are a few
leftovers.

Signed-off-by: Marc-André Lureau 
---

v2:
 - remove .pc and .libs from gitignore
 - some libtool removal in make clean rule

 Makefile  | 9 +++--
 Makefile.objs | 1 -
 .gitignore| 4 
 configure | 2 --
 4 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/Makefile b/Makefile
index a84582c..56f3a94 100644
--- a/Makefile
+++ b/Makefile
@@ -236,12 +236,10 @@ ALL_SUBDIRS=$(TARGET_DIRS) $(patsubst %,pc-bios/%, 
$(ROMS))
 
 recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR_RULES)
 
-$(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc config-host.h | 
$(BUILD_DIR)/version.lo
+$(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc config-host.h
$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ 
$<,"RC","version.o")
-$(BUILD_DIR)/version.lo: $(SRC_PATH)/version.rc config-host.h
-   $(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ 
$<,"RC","version.lo")
 
-Makefile: $(version-obj-y) $(version-lobj-y)
+Makefile: $(version-obj-y)
 
 ##
 # Build libraries
@@ -363,10 +361,9 @@ clean:
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
gen-op-arm.h
rm -f qemu-options.def
rm -f *.msi
-   find . \( -name '*.l[oa]' -o -name '*.so' -o -name '*.dll' -o -name 
'*.mo' -o -name '*.[oda]' \) -type f -exec rm {} +
+   find . \( -name '*.so' -o -name '*.dll' -o -name '*.mo' -o -name 
'*.[oda]' \) -type f -exec rm {} +
rm -f $(filter-out %.tlb,$(TOOLS)) $(HELPERS-y) qemu-ga TAGS cscope.* 
*.pod *~ */*~
rm -f fsdev/*.pod
-   rm -rf .libs */.libs
rm -f qemu-img-cmds.h
rm -f ui/shader/*-vert.h ui/shader/*-frag.h
@# May not be present in GENERATED_HEADERS
diff --git a/Makefile.objs b/Makefile.objs
index 06f74b8..0d7acd4 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -97,7 +97,6 @@ common-obj-y += disas/
 ##
 # Resource file for Windows executables
 version-obj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.o
-version-lobj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.lo
 
 ##
 # tracing
diff --git a/.gitignore b/.gitignore
index 3d7848c..e43c304 100644
--- a/.gitignore
+++ b/.gitignore
@@ -82,10 +82,6 @@
 *.d
 !/scripts/qemu-guest-agent/fsfreeze-hook.d
 *.o
-*.lo
-*.la
-*.pc
-.libs
 .sdk
 *.gcda
 *.gcno
diff --git a/configure b/configure
index fd6f898..3194a98 100755
--- a/configure
+++ b/configure
@@ -28,8 +28,6 @@ TMPB="qemu-conf"
 TMPC="${TMPDIR1}/${TMPB}.c"
 TMPO="${TMPDIR1}/${TMPB}.o"
 TMPCXX="${TMPDIR1}/${TMPB}.cxx"
-TMPL="${TMPDIR1}/${TMPB}.lo"
-TMPA="${TMPDIR1}/lib${TMPB}.la"
 TMPE="${TMPDIR1}/${TMPB}.exe"
 TMPMO="${TMPDIR1}/${TMPB}.mo"
 
-- 
2.10.0




Re: [Qemu-devel] [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-07 Thread Alexey Kardashevskiy
On 05/11/16 08:10, Kirti Wankhede wrote:
> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
> Mediated device only uses IOMMU APIs, the underlying hardware can be
> managed by an IOMMU domain.
> 
> Aim of this change is:
> - To use most of the code of TYPE1 IOMMU driver for mediated devices
> - To support direct assigned device and mediated device in single module
> 
> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
> backend module. More details:
> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>   that is, of the process who mapped that iova range.
> - Added pfn_list tracking logic to address space structure. All pages
>   pinned through this interface are tracked in its address space.
> - Pinned pages list is used to verify unpinning request and to unpin
>   remaining pages while detaching the group for that device.
> - Page accounting is updated to account in its address space where the
>   pages are pinned/unpinned.
> - Accounting for mdev device is only done if there is no iommu capable
>   domain in the container. When there is a direct device assigned to the
>   container and that domain is iommu capable, all pages are already pinned
>   during DMA_MAP.
> - Page accounting is updated on hot plug and unplug of mdev devices and
>   pass-through devices.
> 
> Tested by assigning below combinations of devices to a single VM:
> - GPU pass through only

This does not require this patchset, right?

> - vGPU device only

Out of curiosity - how exactly did you test this? Which exact GPU, how did
you create the vGPU, what was the QEMU command line, and what does the guest
do with this passed device? Thanks.




-- 
Alexey



Re: [Qemu-devel] [PATCH for-2.8] migration: Fix return code of ram_save_iterate()

2016-11-07 Thread Thomas Huth
On 08.11.2016 02:14, David Gibson wrote:
> On Fri, Nov 04, 2016 at 02:10:17PM +0100, Thomas Huth wrote:
>> qemu_savevm_state_iterate() expects the iterators to return 1
>> when they are done, and 0 if there is still something left to do.
>> However, ram_save_iterate() does not obey this rule and returns
>> the number of saved pages instead. This causes a fatal hang with
>> ppc64 guests when you run QEMU like this (also works with TCG):
>>
>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>-hda /tmp/test.qcow2 -serial mon:stdio
>>
>> ... then switch to the monitor by pressing CTRL-a c and try to
>> save a snapshot with "savevm test1" for example.
>>
>> After the first iteration, ram_save_iterate() always returns 0 here,
>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>> can only "kill -9" the QEMU process.
>> Fix it by using proper return values in ram_save_iterate().
>>
>> Signed-off-by: Thomas Huth 
> 
> Hmm.  I think the change is technically correct, but I'm uneasy with
> this approach to the solution.  The whole reason this wasn't caught
> earlier is that almost nothing looks at the return value.  Without
> changing that I think it's very likely someone will mess this up
> again.
> 
> I think it would be preferable to change the return type to void to
> make it explicit that this function is not directly returning the
> "completion" status, but instead that's calculated from the other
> progress variables it updates.

Not sure what such a patch should finally look like. Could you propose a
patch?

Anyway, we're in soft freeze already ... do we still want such a major
change of the logic at this point in time? If not, we should maybe go
with fixing the return type only for 2.8, and do the major change for
2.9 instead?

 Thomas
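
The return-value contract from the quoted commit message can be illustrated with a toy loop. This is a simplified sketch under stated assumptions, not the QEMU migration code: run_iterate_loop and make_page_saver are hypothetical names, and the cap on rounds only stands in for the endless loop.

```python
def run_iterate_loop(iterators, max_rounds=1000):
    """Call every iterator until each one reports completion.

    Assumed contract: an iterator returns 1 when it is done and 0 while
    work remains.  Returns the number of rounds used, or None if
    max_rounds is exhausted (standing in for the endless loop).
    """
    for round_no in range(1, max_rounds + 1):
        results = [it() for it in iterators]
        if all(r > 0 for r in results):
            return round_no
    return None


def make_page_saver(pages, per_call, obey_contract):
    """Build a toy iterator that saves up to per_call pages per call."""
    remaining = [pages]

    def iterate():
        saved = min(per_call, remaining[0])
        remaining[0] -= saved
        if obey_contract:
            return 1 if remaining[0] == 0 else 0
        return saved  # the mismatch: a page count instead of a done flag

    return iterate
```

With nothing left to save, the iterator that returns a page count keeps returning 0, so the loop never sees a completion; the one that follows the contract returns 1 and the loop ends.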






[Qemu-devel] [PATCH v4 6/6] iotests: add transactional failure race test

2016-11-07 Thread John Snow
Add a regression test for the case found by Vladimir.

Reported-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
Reviewed-by: Kevin Wolf 
---
 tests/qemu-iotests/124 | 53 ++
 tests/qemu-iotests/124.out |  4 ++--
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index f06938e..d0d2c2b 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -395,19 +395,7 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.check_backups()
 
 
-def test_transaction_failure(self):
-'''Test: Verify backups made from a transaction that partially fails.
-
-Add a second drive with its own unique pattern, and add a bitmap to 
each
-drive. Use blkdebug to interfere with the backup on just one drive and
-attempt to create a coherent incremental backup across both drives.
-
-verify a failure in one but not both, then delete the failed stubs and
-re-run the same transaction.
-
-verify that both incrementals are created successfully.
-'''
-
+def do_transaction_failure_test(self, race=False):
 # Create a second drive, with pattern:
 drive1 = self.add_node('drive1')
 self.img_create(drive1['file'], drive1['fmt'])
@@ -451,9 +439,10 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.assertFalse(self.vm.get_qmp_events(wait=False))
 
 # Emulate some writes
-self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
-  ('0xfe', '16M', '256k'),
-  ('0x64', '32736k', '64k')))
+if not race:
+self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+  ('0xfe', '16M', '256k'),
+  ('0x64', '32736k', '64k')))
 self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
   ('0xef', '16M', '256k'),
   ('0x46', '32736k', '64k')))
@@ -463,7 +452,8 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 target1 = self.prepare_backup(dr1bm0)
 
 # Ask for a new incremental backup per-each drive,
-# expecting drive1's backup to fail:
+# expecting drive1's backup to fail. In the 'race' test,
+# we expect drive1 to attempt to cancel the empty drive0 job.
 transaction = [
 transaction_drive_backup(drive0['id'], target0, sync='incremental',
  format=drive0['fmt'], mode='existing',
@@ -488,9 +478,15 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.assert_no_active_block_jobs()
 
 # Delete drive0's successful target and eliminate our record of the
-# unsuccessful drive1 target. Then re-run the same transaction.
+# unsuccessful drive1 target.
 dr0bm0.del_target()
 dr1bm0.del_target()
+if race:
+# Don't re-run the transaction, we only wanted to test the race.
+self.vm.shutdown()
+return
+
+# Re-run the same transaction:
 target0 = self.prepare_backup(dr0bm0)
 target1 = self.prepare_backup(dr1bm0)
 
@@ -511,6 +507,27 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
 self.vm.shutdown()
 self.check_backups()
 
+def test_transaction_failure(self):
+'''Test: Verify backups made from a transaction that partially fails.
+
+Add a second drive with its own unique pattern, and add a bitmap to 
each
+drive. Use blkdebug to interfere with the backup on just one drive and
+attempt to create a coherent incremental backup across both drives.
+
+verify a failure in one but not both, then delete the failed stubs and
+re-run the same transaction.
+
+verify that both incrementals are created successfully.
+'''
+self.do_transaction_failure_test()
+
+def test_transaction_failure_race(self):
+'''Test: Verify that transactions with jobs that have no data to
+transfer do not cause race conditions in the cancellation of the entire
+transaction job group.
+'''
+self.do_transaction_failure_test(race=True)
+
 
 def test_sync_dirty_bitmap_missing(self):
 self.assert_no_active_block_jobs()
diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
index 36376be..e56cae0 100644
--- a/tests/qemu-iotests/124.out
+++ b/tests/qemu-iotests/124.out
@@ -1,5 +1,5 @@
-..
+...
 --
-Ran 10 tests
+Ran 11 tests
 
 OK
-- 
2.7.4




[Qemu-devel] [PATCH v4 4/6] blockjob: add block_job_start

2016-11-07 Thread John Snow
Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.
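
The create/start split and the cancellation rule above can be sketched with a toy object. ToyJob and its log entries are hypothetical and only model the described behaviour; this is not the BlockJob API.

```python
class ToyJob:
    """Toy stand-in for a job that is created first and started later."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.started = False
        self.log.append(("create", name))

    def start(self):
        assert not self.started, "job already started"
        self.started = True
        self.log.append(("start", self.name))

    def cancel(self):
        # Cleanup runs whether or not the job was ever started.
        self.log.append(("abort", self.name))
        self.log.append(("clean", self.name))
        # Events are only emitted for jobs that actually started.
        if self.started:
            self.log.append(("event:cancelled", self.name))
```

A caller that wants the old behaviour would simply call start() right after creating the job.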

Signed-off-by: John Snow 
---
 block/backup.c|  3 +--
 block/commit.c|  5 ++---
 block/mirror.c|  5 ++---
 block/stream.c|  5 ++---
 block/trace-events|  6 +++---
 blockjob.c| 54 ---
 include/block/blockjob.h  |  9 
 tests/test-blockjob-txn.c | 12 +--
 8 files changed, 67 insertions(+), 32 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 4ed4494..ae1b99a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -654,9 +654,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
 block_job_add_bdrv(&job->common, target);
 job->common.len = len;
-job->common.co = qemu_coroutine_create(job->common.driver->start, job);
 block_job_txn_add_job(txn, &job->common);
-qemu_coroutine_enter(job->common.co);
+block_job_start(&job->common);
 return;
 
  error:
diff --git a/block/commit.c b/block/commit.c
index 20d27e2..c284e85 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 s->backing_file_str = g_strdup(backing_file_str);
 
 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
-trace_commit_start(bs, base, top, s, s->common.co);
-qemu_coroutine_enter(s->common.co);
+trace_commit_start(bs, base, top, s);
+block_job_start(&s->common);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index 659e09c..62ac87f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1009,9 +1009,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 }
 }
 
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
-trace_mirror_start(bs, s, s->common.co, opaque);
-qemu_coroutine_enter(s->common.co);
+trace_mirror_start(bs, s, opaque);
+block_job_start(&s->common);
 }
 
 void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/stream.c b/block/stream.c
index 92309ff..1523ba7 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -255,7 +255,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->bs_flags = orig_bs_flags;
 
 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
-trace_stream_start(bs, base, s, s->common.co);
-qemu_coroutine_enter(s->common.co);
+trace_stream_start(bs, base, s);
+block_job_start(&s->common);
 }
diff --git a/block/trace-events b/block/trace-events
index 882c903..cfc05f2 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -19,14 +19,14 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned 
int bytes, int64_t c
 
 # block/stream.c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int 
is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
-stream_start(void *bs, void *base, void *s, void *co) "bs %p base %p s %p co 
%p"
+stream_start(void *bs, void *base, void *s) "bs %p base %p s %p"
 
 # block/commit.c
 commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int 
is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
-commit_start(void *bs, void *base, void *top, void *s, void *co) "bs %p base 
%p top %p s %p co %p"
+commit_start(void *bs, void *base, void *top, void *s) "bs %p base %p top %p s 
%p"
 
 # block/mirror.c
-mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p 
opaque %p"
+mirror_start(void *bs, void *s, void *opaque) "bs %p s %p opaque %p"
 mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
diff --git a/blockjob.c b/blockjob.c
index e3c458c..513620c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -174,7 +174,9 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->blk   = blk;
 job->cb= cb;
 job->opaque= opaque;
-job->busy  = true;
+job->busy  = false;
+job->paused= true;
+job->pause_count   = 1;
 job->refcnt= 1;
 bs->job = job;
 
@@ -202,6 +204,23 @@ bool block_job_is_internal(BlockJob *job)
 return (job->id == NULL);
 }
 
+static bool block_job_started(BlockJob *job)
+{
+return job->co;
+}
+
+void block_job_start(BlockJob *job)

Re: [Qemu-devel] [PATCH V3 05/10] intel_iommu: support device iotlb descriptor

2016-11-07 Thread Jason Wang



On 2016-11-08 07:35, Peter Xu wrote:

On Mon, Nov 07, 2016 at 03:09:50PM +0800, Jason Wang wrote:

[...]


+static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
+  VTDInvDesc *inv_desc)
+{
+VTDAddressSpace *vtd_dev_as;
+IOMMUTLBEntry entry;

Since "entry" is allocated on the stack...

[...]


+entry.target_as = &vtd_dev_as->as;
+entry.addr_mask = sz - 1;
+entry.iova = addr;
+memory_region_notify_iommu(entry.target_as->root, entry);

... here we need to assign entry.perm explicitly to IOMMU_NONE, right?

Also, I think it would be nice to set all the fields, even the unused ones,
to avoid passing rubbish from the stack down to the notifier handlers.

[...]


This is better. If there are no other comments on the series, I will post
a patch on top to fix this.





+static bool x86_iommu_device_iotlb_prop_get(Object *o, Error **errp)
+{
+X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+return s->dt_supported;
+}
+
+static void x86_iommu_device_iotlb_prop_set(Object *o, bool value, Error 
**errp)
+{
+X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+s->dt_supported = value;
+}
+
  static void x86_iommu_instance_init(Object *o)
  {
  X86IOMMUState *s = X86_IOMMU_DEVICE(o);
@@ -114,6 +126,11 @@ static void x86_iommu_instance_init(Object *o)
  s->intr_supported = false;
  object_property_add_bool(o, "intremap", x86_iommu_intremap_prop_get,
   x86_iommu_intremap_prop_set, NULL);
+s->dt_supported = false;
+object_property_add_bool(o, "device-iotlb",
+ x86_iommu_device_iotlb_prop_get,
+ x86_iommu_device_iotlb_prop_set,
+ NULL);

Again, a nit-pick here is to use Property for "device-iotlb":

 static Property vtd_properties[] = {
 DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
 DEFINE_PROP_END_OF_LIST(),
 };

However not worth a repost.

Thanks,

-- peterx



We may want to share this with AMD IOMMU. (Looking at the AMD IOMMU code,
its device-iotlb support is buggy.)


Thanks



[Qemu-devel] [PATCH v4 3/6] blockjob: add .start field

2016-11-07 Thread John Snow
Add an explicit start field to specify the entrypoint. We already own
the coroutine itself and manage its lifetime, so let's take control of
the creation of the coroutine, too.

This will allow us to delay creation of the actual coroutine until we
know we'll actually start a BlockJob in block_job_start. This avoids
the sticky question of how to "un-create" a Coroutine that hasn't been
started yet.

Signed-off-by: John Snow 
Reviewed-by: Kevin Wolf 
---
 block/backup.c   | 25 +
 block/commit.c   |  3 ++-
 block/mirror.c   |  4 +++-
 block/stream.c   |  3 ++-
 include/block/blockjob_int.h |  3 +++
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 734a24c..4ed4494 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -323,17 +323,6 @@ static void backup_drain(BlockJob *job)
 }
 }
 
-static const BlockJobDriver backup_job_driver = {
-.instance_size  = sizeof(BackupBlockJob),
-.job_type   = BLOCK_JOB_TYPE_BACKUP,
-.set_speed  = backup_set_speed,
-.commit = backup_commit,
-.abort  = backup_abort,
-.clean  = backup_clean,
-.attached_aio_context   = backup_attached_aio_context,
-.drain  = backup_drain,
-};
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
 bool read, int error)
 {
@@ -542,6 +531,18 @@ static void coroutine_fn backup_run(void *opaque)
 block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }
 
+static const BlockJobDriver backup_job_driver = {
+.instance_size  = sizeof(BackupBlockJob),
+.job_type   = BLOCK_JOB_TYPE_BACKUP,
+.start  = backup_run,
+.set_speed  = backup_set_speed,
+.commit = backup_commit,
+.abort  = backup_abort,
+.clean  = backup_clean,
+.attached_aio_context   = backup_attached_aio_context,
+.drain  = backup_drain,
+};
+
 void backup_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -653,7 +654,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
 block_job_add_bdrv(&job->common, target);
 job->common.len = len;
-job->common.co = qemu_coroutine_create(backup_run, job);
+job->common.co = qemu_coroutine_create(job->common.driver->start, job);
 block_job_txn_add_job(txn, &job->common);
 qemu_coroutine_enter(job->common.co);
 return;
diff --git a/block/commit.c b/block/commit.c
index e1eda89..20d27e2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -205,6 +205,7 @@ static const BlockJobDriver commit_job_driver = {
 .instance_size = sizeof(CommitBlockJob),
 .job_type  = BLOCK_JOB_TYPE_COMMIT,
 .set_speed = commit_set_speed,
+.start = commit_run,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
@@ -288,7 +289,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 s->backing_file_str = g_strdup(backing_file_str);
 
 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(commit_run, s);
+s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
 trace_commit_start(bs, base, top, s, s->common.co);
 qemu_coroutine_enter(s->common.co);
diff --git a/block/mirror.c b/block/mirror.c
index b2c1fb8..659e09c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -920,6 +920,7 @@ static const BlockJobDriver mirror_job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
 .job_type   = BLOCK_JOB_TYPE_MIRROR,
 .set_speed  = mirror_set_speed,
+.start  = mirror_run,
 .complete   = mirror_complete,
 .pause  = mirror_pause,
 .attached_aio_context   = mirror_attached_aio_context,
@@ -930,6 +931,7 @@ static const BlockJobDriver commit_active_job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
 .job_type   = BLOCK_JOB_TYPE_COMMIT,
 .set_speed  = mirror_set_speed,
+.start  = mirror_run,
 .complete   = mirror_complete,
 .pause  = mirror_pause,
 .attached_aio_context   = mirror_attached_aio_context,
@@ -1007,7 +1009,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 }
 }
 
-s->common.co = qemu_coroutine_create(mirror_run, s);
+s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 trace_mirror_start(bs, s, s->common.co, opaque);
 qemu_coroutine_enter(s->common.co);
 }
diff --git a/block/stream.c b/block/stream.c
index b05856b..92309ff 

[Qemu-devel] [PATCH v4 5/6] blockjob: refactor backup_start as backup_job_create

2016-11-07 Thread John Snow
Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept in favor of limiting the number of nearly-identical interfaces
that would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup_block_job can
instead just call block_job_start immediately after calling
backup_job_create.

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase, which helps prevent race conditions where
jobs may finish before we even finish building the transaction. This may
happen, for instance, during empty block backup jobs.
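
The race that deferring the start avoids can be shown with a toy model. ToyTransaction and build are hypothetical names and the model is deliberately simplified: an empty job is assumed to complete the moment it starts, and the sketch records how many jobs the transaction contained at that moment.

```python
class ToyTransaction:
    """Toy model of a job transaction; all names here are hypothetical."""

    def __init__(self):
        self.jobs = []
        self.completions = []  # (job name, jobs known to the transaction)

    def add_job(self, name, work_items, start_now):
        job = {"name": name, "work": work_items, "started": False}
        self.jobs.append(job)
        if start_now:
            self._start(job)
        return job

    def _start(self, job):
        job["started"] = True
        if job["work"] == 0:
            # An empty job completes as soon as it starts.
            self.completions.append((job["name"], len(self.jobs)))

    def commit(self):
        # Deferred start: the transaction is fully built by now.
        for job in self.jobs:
            if not job["started"]:
                self._start(job)


def build(start_now):
    """Build a two-job transaction where drive0 has nothing to copy."""
    txn = ToyTransaction()
    txn.add_job("drive0", 0, start_now)
    txn.add_job("drive1", 5, start_now)
    txn.commit()
    return txn.completions
```

With start_now=True the empty job finishes while the transaction holds only one of its two jobs; with start_now=False it finishes only after both have been added.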

Reported-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
---
 block/backup.c| 26 ---
 block/replication.c   | 12 ---
 blockdev.c| 81 ++-
 include/block/block_int.h | 23 +++---
 4 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ae1b99a..ea38733 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -543,7 +543,7 @@ static const BlockJobDriver backup_job_driver = {
 .drain  = backup_drain,
 };
 
-void backup_start(const char *job_id, BlockDriverState *bs,
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
   bool compress,
@@ -563,52 +563,52 @@ void backup_start(const char *job_id, BlockDriverState 
*bs,
 
 if (bs == target) {
 error_setg(errp, "Source and target cannot be the same");
-return;
+return NULL;
 }
 
 if (!bdrv_is_inserted(bs)) {
 error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(bs));
-return;
+return NULL;
 }
 
 if (!bdrv_is_inserted(target)) {
 error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(target));
-return;
+return NULL;
 }
 
 if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
 error_setg(errp, "Compression is not supported for this drive %s",
bdrv_get_device_name(target));
-return;
+return NULL;
 }
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-return;
+return NULL;
 }
 
 if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-return;
+return NULL;
 }
 
 if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
 if (!sync_bitmap) {
 error_setg(errp, "must provide a valid bitmap name for "
  "\"incremental\" sync mode");
-return;
+return NULL;
 }
 
 /* Create a new bitmap, and freeze/disable this one. */
 if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-return;
+return NULL;
 }
 } else if (sync_bitmap) {
 error_setg(errp,
"a sync_bitmap was provided to backup_run, "
"but received an incompatible sync_mode (%s)",
MirrorSyncMode_lookup[sync_mode]);
-return;
+return NULL;
 }
 
 len = bdrv_getlength(bs);
@@ -655,8 +655,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 block_job_add_bdrv(&job->common, target);
 job->common.len = len;
 block_job_txn_add_job(txn, &job->common);
-block_job_start(&job->common);
-return;
+
+return &job->common;
 
  error:
 if (sync_bitmap) {
@@ -666,4 +666,6 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 backup_clean(&job->common);
 block_job_unref(&job->common);
 }
+
+return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index d5e2b0f..729dd12 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -421,6 +421,7 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 int64_t active_length, hidden_length, disk_length;
 AioContext *aio_context;
 Error *local_err = NULL;
+BlockJob *job;
 
 aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(aio_context);
@@ -508,17 +509,18 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 bdrv_op_block_all(top_bs, s->blocker);
 bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-backup_start(NULL, s->secondary_disk->bs, s->hidden_disk->bs, 0,
- MIRROR_SYNC_MODE_NONE, NULL, false,
- BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
- BLOCK_JOB_INTERNAL, backup_job_completed, bs,
- NULL, _err);
+job = 

Re: [Qemu-devel] [PATCH 3/3] net: virtio-net discards TX data after link down

2016-11-07 Thread Jason Wang



On 2016-11-07 16:20, yuri.benditov...@daynix.com wrote:

From: Yuri Benditovich 

https://bugzilla.redhat.com/show_bug.cgi?id=1295637
Upon a set_link monitor command or upon netdev deletion,
virtio-net sends a link down indication to the guest
and stops vhost if one is used.
The guest driver can still submit data for TX until it
recognizes the link loss. If these packets are not returned by
the host, the Windows guest will never be able to finish
disable/removal/shutdown.
Now each packet sent by the guest after the NIC indicated link
down will be completed immediately.

Signed-off-by: Yuri Benditovich 
---
  hw/net/virtio-net.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 06bfe4b..6158de0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1200,6 +1200,16 @@ static ssize_t virtio_net_receive(NetClientState *nc, 
const uint8_t *buf, size_t
  return size;
  }
  
+static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)

+{
+VirtQueueElement *elem;
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
  static int32_t virtio_net_flush_tx(VirtIONetQueue *q);
  
  static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)

@@ -1345,6 +1355,11 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, 
VirtQueue *vq)
  VirtIONet *n = VIRTIO_NET(vdev);
  VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
  
+if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {

+virtio_net_drop_tx_queue_data(vdev, vq);
+return;
+}
+
  if (unlikely(q->tx_waiting)) {
  return;
  }


This doesn't work for the tx timer; you may want to make the modification
in virtio_net_flush_tx() instead.


What's more, when link_down is true, qemu_send_packet_async() will
return the size of the iov. Can we do some check there?


Thanks
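
The behaviour the patch aims for, completing guest TX buffers immediately while the link is down, can be sketched as follows. This is a toy model with hypothetical names (drop_tx_queue, handle_tx); plain deques and lists stand in for the virtqueue, and it does not cover the tx timer path raised in the review.

```python
from collections import deque


def drop_tx_queue(avail, used):
    """Complete every pending TX element at once, reporting 0 bytes used."""
    dropped = 0
    while avail:
        elem = avail.popleft()
        used.append((elem, 0))  # hand the buffer straight back to the guest
        dropped += 1
    return dropped


def handle_tx(link_up, avail, used, wire):
    """Toy TX handler: transmit when the link is up, otherwise drop."""
    if not link_up:
        return drop_tx_queue(avail, used)
    sent = 0
    while avail:
        elem = avail.popleft()
        wire.append(elem)       # "transmit" the packet
        used.append((elem, 0))
        sent += 1
    return sent
```

Either way every submitted element ends up in the used list, which is the property the guest relies on to finish disabling the device.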



[Qemu-devel] [PATCH v4 2/6] blockjob: add .clean property

2016-11-07 Thread John Snow
Cleaning up after we have deferred to the main thread but before the
transaction has converged can be dangerous and result in deadlocks
if the job cleanup invokes any BH polling loops.

A job may attempt to begin cleaning up, but may induce another job to
enter its cleanup routine. The second job, part of our same transaction,
will block waiting for the first job to finish, so neither job may now
make progress.

To rectify this, allow jobs to register a cleanup operation that will
always run, regardless of whether the job was in a transaction and of
whether the transaction job group completed successfully.

Move sensitive cleanup to this callback instead, which is guaranteed to
be run only after the transaction has converged. This removes sensitive
timing constraints from said cleanup.

Furthermore, in future patches these cleanup operations will be performed
regardless of whether or not we actually started the job. Therefore,
cleanup callbacks should essentially confine themselves to undoing create
operations, e.g. setup actions taken in what is now backup_start.

Reported-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
Reviewed-by: Kevin Wolf 
---
 block/backup.c   | 15 ++-
 blockjob.c   |  3 +++
 include/block/blockjob_int.h |  8 
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7b5d8a3..734a24c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -242,6 +242,14 @@ static void backup_abort(BlockJob *job)
 }
 }
 
+static void backup_clean(BlockJob *job)
+{
+BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+assert(s->target);
+blk_unref(s->target);
+s->target = NULL;
+}
+
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common);
@@ -321,6 +329,7 @@ static const BlockJobDriver backup_job_driver = {
 .set_speed  = backup_set_speed,
 .commit = backup_commit,
 .abort  = backup_abort,
+.clean  = backup_clean,
 .attached_aio_context   = backup_attached_aio_context,
 .drain  = backup_drain,
 };
@@ -343,12 +352,8 @@ typedef struct {
 
 static void backup_complete(BlockJob *job, void *opaque)
 {
-BackupBlockJob *s = container_of(job, BackupBlockJob, common);
 BackupCompleteData *data = opaque;
 
-blk_unref(s->target);
-s->target = NULL;
-
 block_job_completed(job, data->ret);
 g_free(data);
 }
@@ -658,7 +663,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
 }
 if (job) {
-blk_unref(job->target);
+backup_clean(&job->common);
 block_job_unref(&job->common);
 }
 }
diff --git a/blockjob.c b/blockjob.c
index 4d0ef53..e3c458c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -241,6 +241,9 @@ static void block_job_completed_single(BlockJob *job)
 job->driver->abort(job);
 }
 }
+if (job->driver->clean) {
+job->driver->clean(job);
+}
 
 if (job->cb) {
 job->cb(job->opaque, job->ret);
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 40275e4..60d91a0 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -74,6 +74,14 @@ struct BlockJobDriver {
 void (*abort)(BlockJob *job);
 
 /**
+ * If the callback is not NULL, it will be invoked after a call to either
+ * .commit() or .abort(). Regardless of which callback is invoked after
+ * completion, .clean() will always be called, even if the job does not
+ * belong to a transaction group.
+ */
+void (*clean)(BlockJob *job);
+
+/**
  * If the callback is not NULL, it will be invoked when the job transitions
  * into the paused state.  Paused jobs must not perform any asynchronous
  * I/O or event loop activity.  This callback is used to quiesce jobs.
-- 
2.7.4
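
The callback ordering described in the commit message can be sketched in isolation as follows. This is a simplified model, not the blockjob code: toy_job, toy_job_driver and toy_job_finish are hypothetical names, and the "target reference" is just a counter.

```c
#include <stddef.h>

struct toy_job;

/* Hypothetical driver table; every callback is optional. */
struct toy_job_driver {
    void (*commit)(struct toy_job *job);
    void (*abort)(struct toy_job *job);
    void (*clean)(struct toy_job *job);
};

struct toy_job {
    const struct toy_job_driver *driver;
    int ret;          /* 0 on success, negative on failure */
    int target_refs;  /* resource taken at creation time */
    int commits;
    int aborts;
};

/* Runs commit or abort depending on the result, then always runs clean. */
static void toy_job_finish(struct toy_job *job)
{
    if (job->ret == 0) {
        if (job->driver->commit) {
            job->driver->commit(job);
        }
    } else {
        if (job->driver->abort) {
            job->driver->abort(job);
        }
    }
    if (job->driver->clean) {
        job->driver->clean(job);
    }
}

static void toy_backup_commit(struct toy_job *job) { job->commits++; }
static void toy_backup_abort(struct toy_job *job)  { job->aborts++; }

/* Only undoes what creation set up, so it is safe on either path. */
static void toy_backup_clean(struct toy_job *job)  { job->target_refs = 0; }

static const struct toy_job_driver toy_backup_driver = {
    .commit = toy_backup_commit,
    .abort  = toy_backup_abort,
    .clean  = toy_backup_clean,
};
```

The point of the split is that toy_backup_clean() releases the resource exactly once regardless of which branch ran, which mirrors the guarantee the patch documents for .clean().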




[Qemu-devel] [PATCH v4 0/6] jobs: fix transactional race condition

2016-11-07 Thread John Snow
There are a few problems with transactional job completion right now.

First, if jobs finish so quickly that they complete before the remaining
jobs get a chance to join the transaction, the completion mode can leave
its well-known state, the QLIST can get corrupted, and the transactional
jobs can complete in batches or phases instead of all together.

Second, if two or more jobs defer to the main loop at roughly the same
time, it's possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, leading to a situation that
will deadlock the entire transaction.

Thanks to Vladimir for pointing out these modes of failure.

===
v4:
===

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/6:[----] [--] 'blockjob: fix dead pointer in txn list'
002/6:[----] [--] 'blockjob: add .clean property'
003/6:[----] [--] 'blockjob: add .start field'
004/6:[0021] [FC] 'blockjob: add block_job_start'
005/6:[0010] [FC] 'blockjob: refactor backup_start as backup_job_create'
006/6:[----] [--] 'iotests: add transactional failure race test'

04: Fix command tracers (Kevin)
Implement the ability to 'start' a 'paused' job (Kevin, Jeff)
05: Replace superfluous conditionals with assertions. (Kevin, Jeff)

===
v3:
===

- Rebase to origin/master, requisite patches now upstream.

===
v2:
===

- Correct Vladimir's email (Sorry!)
- Add test as a variant of an existing test [Vladimir]



For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch job-fix-race-condition
https://github.com/jnsnow/qemu/tree/job-fix-race-condition

This version is tagged job-fix-race-condition-v4:
https://github.com/jnsnow/qemu/releases/tag/job-fix-race-condition-v4

John Snow (5):
  blockjob: add .clean property
  blockjob: add .start field
  blockjob: add block_job_start
  blockjob: refactor backup_start as backup_job_create
  iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
  blockjob: fix dead pointer in txn list

 block/backup.c   | 63 +++---
 block/commit.c   |  6 ++--
 block/mirror.c   |  7 ++--
 block/replication.c  | 12 ---
 block/stream.c   |  6 ++--
 block/trace-events   |  6 ++--
 blockdev.c   | 81 
 blockjob.c   | 58 ---
 include/block/block_int.h| 23 +++--
 include/block/blockjob.h |  9 +
 include/block/blockjob_int.h | 11 ++
 tests/qemu-iotests/124   | 53 +++--
 tests/qemu-iotests/124.out   |  4 +--
 tests/test-blockjob-txn.c| 12 +++
 14 files changed, 228 insertions(+), 123 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v4 1/6] blockjob: fix dead pointer in txn list

2016-11-07 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Though it is not intended to be reached through normal circumstances,
if we do not gracefully deconstruct the transaction QLIST, we may wind
up with stale pointers in the list.

The rest of this series attempts to address the underlying issues,
but this should fix list inconsistencies.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Tested-by: John Snow 
Reviewed-by: John Snow 
[Rewrote commit message. --js]
Signed-off-by: John Snow 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 

Signed-off-by: John Snow 
---
 blockjob.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/blockjob.c b/blockjob.c
index 4aa14a4..4d0ef53 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -256,6 +256,7 @@ static void block_job_completed_single(BlockJob *job)
 }
 
 if (job->txn) {
+QLIST_REMOVE(job, txn_list);
 block_job_txn_unref(job->txn);
 }
 block_job_unref(job);
-- 
2.7.4
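
As a rough illustration of why the element has to be unlinked before its owner is released, here is a small stand-alone list in the same style as a QLIST (next pointer plus a pointer to the previous link). The toy_* names are hypothetical and this is not the QEMU queue.h implementation.

```c
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    struct toy_node **pprev;  /* address of the pointer that points at us */
};

struct toy_list {
    struct toy_node *first;
};

static void toy_list_insert_head(struct toy_list *l, struct toy_node *n)
{
    n->next = l->first;
    if (l->first) {
        l->first->pprev = &n->next;
    }
    l->first = n;
    n->pprev = &l->first;
}

/*
 * Unlink a node. Skipping this step before the node's memory is released
 * would leave the list head or a neighbour pointing at freed memory.
 */
static void toy_list_remove(struct toy_node *n)
{
    if (n->next) {
        n->next->pprev = n->pprev;
    }
    *n->pprev = n->next;
    n->next = NULL;
    n->pprev = NULL;
}

static size_t toy_list_length(const struct toy_list *l)
{
    size_t len = 0;
    const struct toy_node *n;

    for (n = l->first; n; n = n->next) {
        len++;
    }
    return len;
}
```

After toy_list_remove() no pointer in the list refers to the removed node, so dropping the last reference to it afterwards cannot leave a stale entry behind.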




[Qemu-devel] Issue with Keyboard Mapping

2016-11-07 Thread Michael Schem
Hello,

I am getting the error message “unknown keycodes `empty_aliases(qwerty)', please 
report to qemu-devel@nongnu.org”. I am wondering 
if there is a way to manually set what the keycodes should be.

Thanks,

Michael Schem
Research Engineer

Skype: msc...@cyberadapt.com
msc...@cyberadapt.com
p: 310-699-7175
14755 Preston Rd., Suite 405, Dallas TX 75254


[Qemu-devel] [Bug 1639983] [NEW] e1000 EEPROM have bad checksum

2016-11-07 Thread Paul Dufresne
Public bug reported:

I am using qemu-system-i386 to emulate FreeDOS with e1000 nic card.

I am using Intel PRODOS v.19.0 (latest version with E1000ODI.COM file).
E1000ODI.COM v.5.07 (140116)

http://pclosmag.com/html/issues/201208/page11.html
suggests that v.4.75 (120212) was/is working.
The oldest PRODOS version available now seems to be 18.5 (June 2013), which
I have not tested yet.

When running it, it detects: Slot 18, IRQ 11, Port C000.

But complains:
EEPROM checksum was incorrect.

Contact your services network supplier for a replacement.

paul@paul89473:~$ qemu-system-i386 --version
QEMU emulator version 2.6.1 (Debian 1:2.6.1+dfsg-0ubuntu5), Copyright (c) 
2003-2008 Fabrice Bellard
paul@paul89473:~$

** Affects: qemu
 Importance: Undecided
 Status: New
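
For reference when debugging this, a sketch of the checksum rule as the Linux e1000 driver applies it: the 16-bit sum of EEPROM words 0x00 through 0x3F must equal 0xBABA, with word 0x3F holding the checksum. Whether E1000ODI.COM checks exactly the same range is an assumption; the report does not say. The function names below are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_EEPROM_WORDS         64      /* words 0x00..0x3F */
#define TOY_EEPROM_CHECKSUM_WORD 0x3F
#define TOY_EEPROM_SUM           0xBABA

/* 16-bit wrapping sum of the first n words. */
static uint16_t toy_eeprom_sum(const uint16_t *words, size_t n)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum = (uint16_t)(sum + words[i]);
    }
    return sum;
}

/* Returns 1 if words 0x00..0x3F add up to 0xBABA. */
static int toy_eeprom_checksum_ok(const uint16_t words[TOY_EEPROM_WORDS])
{
    return toy_eeprom_sum(words, TOY_EEPROM_WORDS) == TOY_EEPROM_SUM;
}

/* Recomputes word 0x3F so that the total becomes 0xBABA. */
static void toy_eeprom_fix_checksum(uint16_t words[TOY_EEPROM_WORDS])
{
    uint16_t partial = toy_eeprom_sum(words, TOY_EEPROM_CHECKSUM_WORD);

    words[TOY_EEPROM_CHECKSUM_WORD] = (uint16_t)(TOY_EEPROM_SUM - partial);
}
```

Any change to words 0x00..0x3E (the MAC address lives in the first three words) invalidates the checksum until word 0x3F is recomputed.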

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1639983

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1639983/+subscriptions



Re: [Qemu-devel] [PATCH] usbredir: free vm_change_state_handler in usbredir destroy dispatch

2016-11-07 Thread Marc-André Lureau
Hi

On Tue, Nov 8, 2016 at 9:58 AM Li Qiang  wrote:

> From: Li Qiang 
>
> The usbredir destroy dispatch function doesn't free the vm change
> state handler registered in the usbredir_realize function. This
> leads to a memory leak. This patch avoids it.
>
> Signed-off-by: Li Qiang 
>


Reviewed-by: Marc-André Lureau 



> ---
>  hw/usb/redirect.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
> index 528081e..a657237 100644
> --- a/hw/usb/redirect.c
> +++ b/hw/usb/redirect.c
> @@ -132,6 +132,7 @@ struct USBRedirDevice {
>  struct usbredirfilter_rule *filter_rules;
>  int filter_rules_count;
>  int compatible_speedmask;
> +VMChangeStateEntry *vmstate;
>  };
>
>  #define TYPE_USB_REDIR "usb-redir"
> @@ -1411,7 +1412,8 @@ static void usbredir_realize(USBDevice *udev, Error
> **errp)
>   usbredir_chardev_read,
> usbredir_chardev_event,
>   dev, NULL, true);
>
> -qemu_add_vm_change_state_handler(usbredir_vm_state_change, dev);
> +dev->vmstate =
> +qemu_add_vm_change_state_handler(usbredir_vm_state_change, dev);
>  }
>
>  static void usbredir_cleanup_device_queues(USBRedirDevice *dev)
> @@ -1450,6 +1452,7 @@ static void usbredir_handle_destroy(USBDevice *udev)
>  }
>
>  free(dev->filter_rules);
> +qemu_del_vm_change_state_handler(dev->vmstate);
>  }
>
>  static int usbredir_check_filter(USBRedirDevice *dev)
> --
> 1.8.3.1
>
>
> --
Marc-André Lureau


Re: [Qemu-devel] [RFC 13/17] pseries: Move CPU compatibility property to machine

2016-11-07 Thread Alexey Kardashevskiy
On 08/11/16 16:26, David Gibson wrote:
> On Fri, Nov 04, 2016 at 06:43:52PM +1100, Alexey Kardashevskiy wrote:
>> On 30/10/16 22:12, David Gibson wrote:
>>> Server class POWER CPUs have a "compat" property, which is used to set the
>>> backwards compatibility mode for the processor.  However, this only makes
>>> sense for machine types which don't give the guest access to hypervisor
>>> privilege - otherwise the compatibility level is under the guest's control.
>>>
>>> To reflect this, this removes the CPU 'compat' property and instead
>>> creates a 'max-cpu-compat' property on the pseries machine.  Strictly
>>> speaking this breaks compatibility, but AFAIK the 'compat' option was
>>> never (directly) used with -device or device_add.
>>>
>>> The option was used with -cpu.  So, to maintain compatibility, this patch
>>> adds a hack to the cpu option parsing to strip out any compat options
>>> supplied with -cpu and set them on the machine property instead of the
>>> now-removed cpu property.
>>>
>>> Signed-off-by: David Gibson 
>>> ---
>>>  hw/ppc/spapr.c  |  6 +++-
>>>  hw/ppc/spapr_cpu_core.c | 47 +++--
>>>  hw/ppc/spapr_hcall.c|  2 +-
>>>  include/hw/ppc/spapr.h  | 10 +--
>>>  target-ppc/compat.c | 65 
>>>  target-ppc/cpu.h|  6 ++--
>>>  target-ppc/translate_init.c | 73 
>>> -
>>>  7 files changed, 127 insertions(+), 82 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 6c78889..b983faa 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1849,7 +1849,7 @@ static void ppc_spapr_init(MachineState *machine)
>>>  machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
>>>  }
>>>  
>>> -ppc_cpu_parse_features(machine->cpu_model);
>>> +spapr_cpu_parse_features(spapr);
>>>  
>>>  spapr_init_cpus(spapr);
>>>  
>>> @@ -2191,6 +2191,10 @@ static void spapr_machine_initfn(Object *obj)
>>>  " place of standard EPOW events when 
>>> possible"
>>>  " (required for memory hot-unplug 
>>> support)",
>>>  NULL);
>>> +
>>> +object_property_add(obj, "max-cpu-compat", "str",
>>> +ppc_compat_prop_get, ppc_compat_prop_set,
>>> +NULL, &spapr->max_compat_pvr, &error_fatal);
>>>  }
>>>  
>>>  static void spapr_machine_finalizefn(Object *obj)
>>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>>> index ee5cd14..0319516 100644
>>> --- a/hw/ppc/spapr_cpu_core.c
>>> +++ b/hw/ppc/spapr_cpu_core.c
>>> @@ -18,6 +18,49 @@
>>>  #include "target-ppc/mmu-hash64.h"
>>>  #include "sysemu/numa.h"
>>>  
>>> +void spapr_cpu_parse_features(sPAPRMachineState *spapr)
>>> +{
>>> +/*
>>> + * Backwards compatibility hack:
>>> +
>>> + *   CPUs had a "compat=" property which didn't make sense for
>>> + *   anything except pseries.  It was replaced by "max-cpu-compat"
>>> + *   machine option.  This supports old command lines like
>>> + *   -cpu POWER8,compat=power7
>>> + *   By stripping the compat option and applying it to the machine
>>> + *   before passing it on to the cpu level parser.
>>> + */
>>> +gchar **inpieces, **outpieces;
>>> +int n, i, j;
>>> +gchar *compat_str = NULL;
>>> +gchar *filtered_model;
>>> +
>>> +inpieces = g_strsplit(MACHINE(spapr)->cpu_model, ",", 0);
>>> +n = g_strv_length(inpieces);
>>> +outpieces = g_new0(gchar *, g_strv_length(inpieces));
>>> +
>>> +/* inpieces[0] is the actual model string */
>>> +for (i = 0, j = 0; i < n; i++) {
>>> +if (g_str_has_prefix(inpieces[i], "compat=")) {
>>> +compat_str = inpieces[i];
>>> +} else {
>>> +outpieces[j++] = g_strdup(inpieces[i]);
>>> +}
>>> +}
>>> +
>>> +if (compat_str) {
>>> +char *val = compat_str + strlen("compat=");
>>> +object_property_set_str(OBJECT(spapr), val, "max-cpu-compat",
>>> +&error_fatal);
>>
>> This part is ok.
>>
>>> +}
>>> +
>>> +filtered_model = g_strjoinv(",", outpieces);
>>> +ppc_cpu_parse_features(filtered_model);
>>
>>
>> Rather than reducing the CPU parameters string from the command line, I'd
>> keep "dc->props = powerpc_servercpu_properties" and make them noop + warn
>> to use the machine option instead. One day QEMU may start calling the CPU
>> features parser itself and somebody will have to hack this thing
>> again.
> 
> Hrm.  A deprecation message like that only works if a human is reading
> it.  Usually qemu will be invoked by libvirt and the message will
> probably disappear into some log file to scare someone unnecessarily.
> 
> Meanwhile, what will the actual behaviour be?  Pulling the CPU's
> property value into the machine instead would be 
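
For readers following the discussion, the string handling the quoted hack performs can be sketched without glib as below. This is an illustration only, not the patch's code: split_compat is a hypothetical helper, and keeping the last "compat=" option when several are given is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `model` into `filtered` without any "compat=..." option and store the
 * value of the last such option in `compat` (empty string if there is none).
 * Returns 0 on success, -1 if an output buffer is too small.
 */
static int split_compat(const char *model,
                        char *filtered, size_t filtered_size,
                        char *compat, size_t compat_size)
{
    static const char prefix[] = "compat=";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *p = model;
    size_t out = 0;

    if (filtered_size == 0 || compat_size == 0) {
        return -1;
    }
    filtered[0] = '\0';
    compat[0] = '\0';

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of this option */

        if (len >= prefix_len && strncmp(p, prefix, prefix_len) == 0) {
            size_t vlen = len - prefix_len;

            if (vlen >= compat_size) {
                return -1;
            }
            memcpy(compat, p + prefix_len, vlen);
            compat[vlen] = '\0';
        } else {
            size_t need = len + (out > 0 ? 1 : 0);

            if (out + need >= filtered_size) {
                return -1;
            }
            if (out > 0) {
                filtered[out++] = ',';
            }
            memcpy(filtered + out, p, len);
            out += len;
            filtered[out] = '\0';
        }

        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return 0;
}
```

With "POWER8,compat=power7" this yields the model string "POWER8" for the CPU-level parser and "power7" for the machine property, which is the split the commit message describes.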

Re: [Qemu-devel] [PATCH 3/3] Split ISA and sysbus versions of m48t59 device

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 08:28:25AM -0500, Eric Blake wrote:
> On 11/04/2016 05:22 AM, Markus Armbruster wrote:
> > Needs a rebase.  First error:
> > 
> >   CC  hw/timer/m48t59.o
> > In file included from /work/armbru/qemu/include/exec/cpu-common.h:7:0,
> >  from /work/armbru/qemu/include/exec/memory.h:24,
> >  from /work/armbru/qemu/include/hw/isa/isa.h:6,
> >  from /work/armbru/qemu/hw/timer/m48t59-isa.c:25:
> > /work/armbru/qemu/include/exec/hwaddr.h:11:9: error: unknown type name 
> > ‘uint64_t’
> >  typedef uint64_t hwaddr;
> >  ^
> > 
> 
> >> index 000..3a521dc
> >> --- /dev/null
> >> +++ b/hw/timer/m48t59-isa.c
> 
> >> + * THE SOFTWARE.
> >> + */
> >> +#include "hw/isa/isa.h"
> 
> Probably because you forgot osdep.h as the first include.

That, plus I stupidly re-used the same double-include prevention
symbol from an existing header file.  I've fixed that up in my branch
now.

Of course, I still have no idea who I really need to convince to
actually get this merged.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC 15/17] ppc: Check that CPU model stays consistent across migration

2016-11-07 Thread Alexey Kardashevskiy
On 08/11/16 16:29, David Gibson wrote:
> On Fri, Nov 04, 2016 at 06:54:48PM +1100, Alexey Kardashevskiy wrote:
>> On 30/10/16 22:12, David Gibson wrote:
>>> When a vmstate for the ppc cpu was first introduced (a90db15 "target-ppc:
>>> Convert ppc cpu savevm to VMStateDescription"), a VMSTATE_EQUAL was used
>>> to ensure that identical CPU models were used at source and destination
>>> as based on the PVR (Processor Version Register).
>>>
>>> However this was a problem for HV KVM, where due to hardware limitations
>>> we always need to use the real PVR of the host CPU.  So, to allow
>>> migration between hosts with "similar enough" CPUs, the PVR check was
>>> removed in 569be9f0 "target-ppc: Remove PVR check from migration".  This
>>> left the onus on user / management to only attempt migration between
>>> compatible CPUs.
>>>
>>> Now that we've reworked the handling of compatibility modes, we have the
>>> information to actually determine if we're making a compatible migration.
>>> So this patch partially restores the PVR check.  If the source was running
>>> in a compatibility mode, we just make sure that the destination cpu can
>>> also run in that compatibility mode.  However, if the source was running
>>> in "raw" mode, we verify that the destination has the same PVR value.
>>>
>>> Signed-off-by: David Gibson 
>>> ---
>>>  target-ppc/machine.c | 15 +++
>>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
>>> index 5d87ff6..62b9e94 100644
>>> --- a/target-ppc/machine.c
>>> +++ b/target-ppc/machine.c
>>> @@ -173,10 +173,12 @@ static int cpu_post_load(void *opaque, int version_id)
>>>  target_ulong msr;
>>>  
>>>  /*
>>> - * We always ignore the source PVR. The user or management
>>> - * software has to take care of running QEMU in a compatible mode.
>>> + * If we're operating in compat mode, we should be ok as long as
>>> + * the destination supports the same compatibility mode.
>>> + *
>>> + * Otherwise, however, we require that the destination has exactly
>>> + * the same CPU model as the source.
>>>   */
>>> -env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
>>>  
>>>  #if defined(TARGET_PPC64)
>>>  if (cpu->compat_pvr) {
>>> @@ -188,8 +190,13 @@ static int cpu_post_load(void *opaque, int version_id)
>>>  error_free(local_err);
>>>  return -1;
>>>  }
>>> -}
>>> +} else
>>>  #endif
>>> +{
>>> +if (env->spr[SPR_PVR] != env->spr_cb[SPR_PVR].default_value) {
>>> +return -1;
>>> +}
>>> +}
>>
>> This should break migration from host with PVR=004d0200 to host with
>> PVR=004d0201, what is the benefit of such limitation?
> 
> There probably isn't one.  But the point is it also blocks migration
> from a host with PVR=004B0201 (POWER8) to one with PVR=00201400
> (403GCX) and *that* has a clear benefit.  I don't see a way to block
> the second without the first, except by creating a huge compatibility
> matrix table, which would require inordinate amounts of time to
> research carefully.


There is pcc->pvr_match() for this purpose.
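
A sketch of how a family-level match could replace the exact comparison while still rejecting unrelated CPUs. The mask below (family in the upper 16 bits of the PVR) and the toy_* helpers are assumptions made for this example; the real pvr_match() hooks are per CPU class and are not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch only: the CPU family is the upper 16 bits. */
#define TOY_PVR_FAMILY_MASK 0xFFFF0000u

static bool toy_pvr_match(uint32_t host_pvr, uint32_t pvr)
{
    return (host_pvr & TOY_PVR_FAMILY_MASK) == (pvr & TOY_PVR_FAMILY_MASK);
}

/*
 * Decide whether an incoming CPU state is acceptable:
 *  - in a compatibility mode, the destination only has to support that mode;
 *  - in raw mode, the source PVR has to match the destination's family.
 */
static bool toy_migration_pvr_ok(uint32_t src_pvr, uint32_t dst_pvr,
                                 uint32_t compat_pvr,
                                 bool dst_supports_compat)
{
    if (compat_pvr != 0) {
        return dst_supports_compat;
    }
    return toy_pvr_match(dst_pvr, src_pvr);
}
```

Under that assumed mask, 004d0200 to 004d0201 is accepted while 004B0201 to 00201400 is refused, which are the two cases discussed above.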



-- 
Alexey





[Qemu-devel] [PATCH] usbredir: free vm_change_state_handler in usbredir destroy dispatch

2016-11-07 Thread Li Qiang
From: Li Qiang 

The usbredir destroy dispatch function doesn't free the vm change
state handler registered in the usbredir_realize function. This
leads to a memory leak. This patch avoids it.

Signed-off-by: Li Qiang 
---
 hw/usb/redirect.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 528081e..a657237 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -132,6 +132,7 @@ struct USBRedirDevice {
 struct usbredirfilter_rule *filter_rules;
 int filter_rules_count;
 int compatible_speedmask;
+VMChangeStateEntry *vmstate;
 };
 
 #define TYPE_USB_REDIR "usb-redir"
@@ -1411,7 +1412,8 @@ static void usbredir_realize(USBDevice *udev, Error 
**errp)
  usbredir_chardev_read, usbredir_chardev_event,
  dev, NULL, true);
 
-qemu_add_vm_change_state_handler(usbredir_vm_state_change, dev);
+dev->vmstate =
+qemu_add_vm_change_state_handler(usbredir_vm_state_change, dev);
 }
 
 static void usbredir_cleanup_device_queues(USBRedirDevice *dev)
@@ -1450,6 +1452,7 @@ static void usbredir_handle_destroy(USBDevice *udev)
 }
 
 free(dev->filter_rules);
+qemu_del_vm_change_state_handler(dev->vmstate);
 }
 
 static int usbredir_check_filter(USBRedirDevice *dev)
-- 
1.8.3.1




Re: [Qemu-devel] [RFC 12/17] ppc: Migrate compatibility mode

2016-11-07 Thread Alexey Kardashevskiy
On 08/11/16 16:19, David Gibson wrote:
> On Fri, Nov 04, 2016 at 04:58:47PM +1100, Alexey Kardashevskiy wrote:
>> On 30/10/16 22:12, David Gibson wrote:
>>> Server-class POWER CPUs can be put into several compatibility modes.  These
>>> can be specified on the command line, or negotiated by the guest during
>>> boot.
>>>
>>> Currently we don't migrate the compatibility mode, which means after a
>>> migration the guest will revert to running with whatever compatibility
>>> mode (or none) specified on the command line.
>>>
>>> With the limited range of CPUs currently used, this doesn't usually cause
>>> a problem, but it could.  Fix this by adding the compatibility mode (if
>>> set) to the migration stream.
>>>
>>> Signed-off-by: David Gibson 
>>> ---
>>>  target-ppc/machine.c | 34 ++
>>>  1 file changed, 34 insertions(+)
>>>
>>> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
>>> index 4820f22..5d87ff6 100644
>>> --- a/target-ppc/machine.c
>>> +++ b/target-ppc/machine.c
>>> @@ -9,6 +9,7 @@
>>>  #include "mmu-hash64.h"
>>>  #include "migration/cpu.h"
>>>  #include "exec/exec-all.h"
>>> +#include "qapi/error.h"
>>>  
>>>  static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>>>  {
>>> @@ -176,6 +177,20 @@ static int cpu_post_load(void *opaque, int version_id)
>>>   * software has to take care of running QEMU in a compatible mode.
>>>   */
>>>  env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
>>> +
>>> +#if defined(TARGET_PPC64)
>>> +if (cpu->compat_pvr) {
>>> +Error *local_err = NULL;
>>> +
>>> +ppc_set_compat(cpu, cpu->compat_pvr, &local_err);
>>> +if (local_err) {
>>> +error_report_err(local_err);
>>> +error_free(local_err);
>>> +return -1;
>>> +}
>>> +}
>>> +#endif
>>> +
>>>  env->lr = env->spr[SPR_LR];
>>>  env->ctr = env->spr[SPR_CTR];
>>>  cpu_write_xer(env, env->spr[SPR_XER]);
>>> @@ -528,6 +543,24 @@ static const VMStateDescription vmstate_tlbmas = {
>>>  }
>>>  };
>>>  
>>> +static bool compat_needed(void *opaque)
>>> +{
>>> +PowerPCCPU *cpu = opaque;
>>> +
>>> +return cpu->compat_pvr != 0;
>>
>>
>> Finally got to trying how this affects migration :)
>>
>> This breaks migration to QEMU <=2.7, and it should not at least when both
>> source and destination are running with  -cpu host,compat=power7.
> 
> IIUC, we don't generally try to maintain backwards migration, even for
> old machine types.


I thought the opposite - we generally try to maintain it, this is pretty
much why we use these subsections in cases like this; otherwise you could
just add a new field and bump the vmstate_ppc_cpu.version.
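
To make the subsection argument concrete, here is a toy serializer in which the extra data is written only when it is needed. The stream layout and the toy_* names are invented for this sketch and have nothing to do with the real migration format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_cpu {
    uint32_t pvr;
    uint32_t compat_pvr;   /* 0 means "no compatibility mode" */
};

static bool toy_compat_needed(const struct toy_cpu *cpu)
{
    return cpu->compat_pvr != 0;
}

/*
 * Hypothetical layout: 4 bytes of pvr, then, only if needed, a tag byte 'C'
 * followed by 4 bytes of compat_pvr. Returns bytes written, 0 on overflow.
 */
static size_t toy_cpu_save(const struct toy_cpu *cpu, uint8_t *buf,
                           size_t size)
{
    size_t n = 0;

    if (size < sizeof(cpu->pvr)) {
        return 0;
    }
    memcpy(buf + n, &cpu->pvr, sizeof(cpu->pvr));
    n += sizeof(cpu->pvr);

    if (toy_compat_needed(cpu)) {
        if (size - n < 1 + sizeof(cpu->compat_pvr)) {
            return 0;
        }
        buf[n++] = 'C';
        memcpy(buf + n, &cpu->compat_pvr, sizeof(cpu->compat_pvr));
        n += sizeof(cpu->compat_pvr);
    }
    return n;
}

/* An "old" reader that knows only the main section and rejects extras. */
static bool toy_old_reader_accepts(const uint8_t *buf, size_t len)
{
    (void)buf;
    return len == sizeof(uint32_t);
}
```

A stream from a CPU with compat_pvr == 0 is still readable by the old reader, while one with a compat mode set is not. That second case is the breakage reported above for -cpu host,compat=power7.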


> 
>>
>>
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_compat = {
>>> +.name = "cpu/compat",
>>> +.version_id = 1,
>>> +.minimum_version_id = 1,
>>> +.needed = compat_needed,
>>> +.fields = (VMStateField[]) {
>>> +VMSTATE_UINT32(compat_pvr, PowerPCCPU),
>>> +VMSTATE_END_OF_LIST()
>>> +}
>>> +};
>>> +
>>>  const VMStateDescription vmstate_ppc_cpu = {
>>>  .name = "cpu",
>>>  .version_id = 5,
>>> @@ -580,6 +613,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>>>  &vmstate_tlb6xx,
>>>  &vmstate_tlbemb,
>>>  &vmstate_tlbmas,
>>> +&vmstate_compat,
>>>  NULL
>>>  }
>>>  };
>>>
>>
>>
> 


-- 
Alexey





Re: [Qemu-devel] [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info

2016-11-07 Thread Li, Liang Z
> On 11/06/2016 07:37 PM, Li, Liang Z wrote:
> >> Let's say we do a 32k bitmap that can hold ~1M pages.  That's 4GB of RAM.
> >> On a 1TB system, that's 256 passes through the top-level loop.
> >> The bottom-level lists have tens of thousands of pages in them, even
> >> on my laptop.  Only 1/256 of these pages will get consumed in a given pass.
> >>
> > Your description is not exactly right.
> > A 32k bitmap is used only when there is little free memory left in the
> > system and extend_page_bitmap() has failed to allocate more
> > memory for the bitmap. Otherwise dozens of 32k split bitmaps will be used;
> > this version limits the bitmap count to 32, which means we can use at most
> > 32*32 kB for the bitmap, which can cover 128GB of RAM. We can increase
> the bitmap count limit to a larger value if 32 is not big enough.
> 
> OK, so it tries to allocate a large bitmap.  But, if it fails, it will try to 
> work with a
> smaller bitmap.  Correct?
> 
Yes.

> So, what's the _worst_ case?  It sounds like it is even worse than I was
> positing.
> 

The worst case is when only a 32KB bitmap can be allocated and there is a
huge number of low-order (<3) free pages.

> >> That's an awfully inefficient way of doing it.  This patch
> >> essentially changed the data structure without changing the algorithm to
> populate it.
> >>
> >> Please change the *algorithm* to use the new data structure efficiently.
> >>  Such a change would only do a single pass through each freelist, and
> >> would choose whether to use the extent-based (pfn -> range) or
> >> bitmap-based approach based on the contents of the free lists.
> >
> > Save the free page info to a raw bitmap first and then process the raw
> > bitmap to get the proper ' extent-based ' and  'bitmap-based' is the
> > most efficient way I can come up with to save the virtio data transmission.
> Do you have some better idea?
> 
> That's kinda my point.  This patch *does* processing to try to pack the
> bitmaps full of pages from the various pfn ranges.  It's a form of processing
> that gets *REALLY*, *REALLY* bad in some (admittedly obscure) cases.
> 
> Let's not pretend that making an essentially unlimited number of passes over
> the free lists is not processing.
> 
> 1. Allocate as large of a bitmap as you can. (what you already do)
> 2. Iterate from the largest freelist order.  Store those pages in the
>    bitmap.
> 3. If you can no longer fit pages in the bitmap, return the list that
>    you have.
> 4. Make an approximation about where the bitmap does not make sense any
>    more, and fall back to listing individual PFNs.  This would make sense,
>    for instance, in a large zone with very few free order-0 pages left.
Sounds good.  Should we ignore some of the order-0 pages in step 4 if the 
bitmap is full?
Or should we retry to get a complete list of order-0 pages?

> 
> > It seems the benefit we get for this feature is not as big as that in fast
> balloon inflating/deflating.
> >>
> >> You should not be using get_max_pfn().  Any patch set that continues
> >> to use it is not likely to be using a proper algorithm.
> >
> > Do you have any suggestion about how to avoid it?
> 
> Yes: get the pfns from the page free lists alone.  Don't derive them from the
> pfn limits of the system or zones.

get_max_pfn() can be avoided in this patch, but I think we can't avoid it 
completely.
We need it as a hint for allocating a bitmap of the proper size. No?

Thanks!
Liang



Re: [Qemu-devel] [Qemu-ppc] [RFC 03/17] pseries: Always use core objects for CPU construction

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 10:51:40AM +0100, Greg Kurz wrote:
> On Thu, 3 Nov 2016 19:11:48 +1100
> Alexey Kardashevskiy  wrote:
> 
> > On 30/10/16 22:11, David Gibson wrote:
> > > Currently the pseries machine has two paths for constructing CPUs.  On
> > > newer machine type versions, which support cpu hotplug, it constructs
> > > cpu core objects, which in turn construct CPU threads.  For older machine
> > > versions it individually constructs the CPU threads.
> > > 
> > > This division is going to make some future changes to the cpu construction
> > > harder, so this patch unifies them.  Now cpu core objects are always
> > > created.  This requires some updates to allow core objects to be created
> > > without a full complement of threads (since older versions allowed a
> > > number of cpus not a multiple of the threads-per-core).  Likewise it needs
> > > some changes to the cpu core hot/cold plug path so as not to choke on the
> > > old machine types without hotplug support.
> > > 
> > > For good measure, we move the cpu construction to its own subfunction,
> > > spapr_init_cpus().
> > > 
> > > Signed-off-by: David Gibson 
> > > ---
> > >  hw/ppc/spapr.c  | 125 
> > > +++-
> > >  hw/ppc/spapr_cpu_core.c |  30 +++-
> > >  include/hw/ppc/spapr.h  |   1 -
> > >  3 files changed, 89 insertions(+), 67 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index c8e2921..ad68a9d 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1688,11 +1688,80 @@ static void 
> > > spapr_validate_node_memory(MachineState *machine, Error **errp)
> > >  }
> > >  }
> > >  
> > > +static void spapr_init_cpus(sPAPRMachineState *spapr)
> > > +{
> > > +MachineState *machine = MACHINE(spapr);
> > > +MachineClass *mc = MACHINE_GET_CLASS(machine);
> > > +char *type = spapr_get_cpu_core_type(machine->cpu_model);
> > > +int smt = kvmppc_smt_threads();
> > > +int spapr_max_cores, spapr_cores;
> > > +int i;
> > > +
> > > +if (!type) {
> > > +error_report("Unable to find sPAPR CPU Core definition");
> > > +exit(1);
> > > +}
> > > +
> > > +if (mc->query_hotpluggable_cpus) {
> > > +if (smp_cpus % smp_threads) {
> > > +error_report("smp_cpus (%u) must be multiple of threads 
> > > (%u)",
> > > + smp_cpus, smp_threads);
> > > +exit(1);
> > > +}
> > > +if (max_cpus % smp_threads) {
> > > +error_report("max_cpus (%u) must be multiple of threads 
> > > (%u)",
> > > + max_cpus, smp_threads);
> > > +exit(1);
> > > +}
> > > +
> > > +spapr_max_cores = max_cpus / smp_threads;
> > > +spapr_cores = smp_cpus / smp_threads;
> > > +} else {
> > > +if (max_cpus != smp_cpus) {
> > > +error_report("This machine version does not support CPU 
> > > hotplug");
> > > +exit(1);
> > > +}
> > > +
> > > +spapr_max_cores = QEMU_ALIGN_UP(smp_cpus, smp_threads) / 
> > > smp_threads;
> > > +spapr_cores = spapr_max_cores;
> > > +}
> > > +
> > > +spapr->cores = g_new0(Object *, spapr_max_cores);
> > > +for (i = 0; i < spapr_max_cores; i++) {
> > > +int core_id = i * smp_threads;
> > > +
> > > +if (mc->query_hotpluggable_cpus) {
> > > +sPAPRDRConnector *drc =
> > > +spapr_dr_connector_new(OBJECT(spapr),
> > > +   SPAPR_DR_CONNECTOR_TYPE_CPU,
> > > +   (core_id / smp_threads) * smt);
> > > +
> > > +qemu_register_reset(spapr_drc_reset, drc);
> > > +}
> > > +
> > > +if (i < spapr_cores) {
> > > +Object *core  = object_new(type);
> > > +int nr_threads = smp_threads;
> > > +
> > > +/* Handle the partially filled core for older machine types 
> > > */
> > > +if ((i + 1) * smp_threads >= smp_cpus) {
> > > +nr_threads = smp_cpus - i * smp_threads;
> > > +}  
> > 
> > 
> > What is this exactly for? Older machines report "qemu-system-ppc64: threads
> > must be 8" when I do "-smp 12,threads=8 -machine pseries-2.2".
> > 
> 
> IIUC, this lowers nr_threads for the last core to end up with the requested
> number of vCPUs... but spapr_core_pre_plug() doesn't like partially filled
> cores.
> 
> if (cc->nr_threads != smp_threads) {
> error_setg(&local_err, "threads must be %d", smp_threads);
> goto out;
> }

Ah, yeah, that's a bug.  I hadn't had a chance to test on real
hardware yet, just TCG, which only supports 1 thread per core, so I
hadn't spotted this.  I'll fix it in the next spin.
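
A minimal standalone sketch of the arithmetic being discussed, for anyone following along. It is not QEMU code: align_up() is a stand-in for QEMU_ALIGN_UP, and max_cores() and threads_in_core() are hypothetical helpers that copy the expressions from the quoted hunk. With "-smp 12,threads=8" it gives two cores, the second holding only 4 threads, which is exactly the partially filled core that spapr_core_pre_plug() refuses.

```c
/* Standalone illustration only; none of these helpers exist in QEMU. */

/* Round n up to a multiple of m (stand-in for QEMU_ALIGN_UP, positive ints). */
static int align_up(int n, int m)
{
    return (n + m - 1) / m * m;
}

/* Number of cores created in the "no CPU hotplug" branch of the patch. */
static int max_cores(int smp_cpus, int smp_threads)
{
    return align_up(smp_cpus, smp_threads) / smp_threads;
}

/* Threads given to core i, using the condition from the quoted hunk. */
static int threads_in_core(int i, int smp_cpus, int smp_threads)
{
    int nr_threads = smp_threads;

    /* The last core only gets whatever vCPUs are left over. */
    if ((i + 1) * smp_threads >= smp_cpus) {
        nr_threads = smp_cpus - i * smp_threads;
    }
    return nr_threads;
}
```

When smp_cpus is a multiple of smp_threads the last core comes out full, so the mismatch with the "threads must be %d" check only shows up for command lines such as the one quoted above.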

> BTW, this error message looks weird when one has passed "-smp threads=8"...
> It would read better as:
> 
> "unsupported partially filled core (%d threads, 

Re: [Qemu-devel] [RFC 11/17] ppc: Add ppc_set_compat_all()

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 03:01:40PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:12, David Gibson wrote:
> > Once a compatibility mode is negotiated with the guest,
> > h_client_architecture_support() uses run_on_cpu() to update each CPU to
> > the new mode.  We're going to want this logic somewhere else shortly,
> > so make a helper function to do this global update.
> > 
> > We put it in target-ppc/compat.c - it makes as much sense at the CPU level
> > as it does at the machine level.  We also move the cpu_synchronize_state()
> > into ppc_set_compat(), since it doesn't really make any sense to call that
> > without synchronizing state.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr_hcall.c | 31 +--
> >  target-ppc/compat.c  | 36 
> >  target-ppc/cpu.h |  3 +++
> >  3 files changed, 44 insertions(+), 26 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index 3bd6d06..4eaf9a6 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -881,20 +881,6 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, 
> > sPAPRMachineState *spapr,
> >  return ret;
> >  }
> >  
> > -typedef struct {
> > -uint32_t compat_pvr;
> > -Error *err;
> > -} SetCompatState;
> > -
> > -static void do_set_compat(CPUState *cs, void *arg)
> > -{
> > -PowerPCCPU *cpu = POWERPC_CPU(cs);
> > -SetCompatState *s = arg;
> > -
> > -cpu_synchronize_state(cs);
> > -ppc_set_compat(cpu, s->compat_pvr, &s->err);
> > -}
> > -
> >  static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >sPAPRMachineState *spapr,
> >target_ulong opcode,
> > @@ -902,7 +888,6 @@ static target_ulong 
> > h_client_architecture_support(PowerPCCPU *cpu,
> >  {
> >  target_ulong list = ppc64_phys_to_real(args[0]);
> >  target_ulong ov_table;
> > -CPUState *cs;
> >  bool explicit_match = false; /* Matched the CPU's real PVR */
> >  uint32_t max_compat = cpu->max_compat;
> >  uint32_t best_compat = 0;
> > @@ -949,18 +934,12 @@ static target_ulong 
> > h_client_architecture_support(PowerPCCPU *cpu,
> >  
> >  /* Update CPUs */
> >  if (cpu->compat_pvr != best_compat) {
> > -CPU_FOREACH(cs) {
> > -SetCompatState s = {
> > -.compat_pvr = best_compat,
> > -.err = NULL,
> > -};
> > +Error *local_err = NULL;
> >  
> > -run_on_cpu(cs, do_set_compat, &s);
> > -
> > -if (s.err) {
> > -error_report_err(s.err);
> > -return H_HARDWARE;
> > -}
> > +ppc_set_compat_all(best_compat, &local_err);
> > +if (local_err) {
> > +error_report_err(local_err);
> > +return H_HARDWARE;
> >  }
> >  }
> >  
> > diff --git a/target-ppc/compat.c b/target-ppc/compat.c
> > index 1059555..0b12b58 100644
> > --- a/target-ppc/compat.c
> > +++ b/target-ppc/compat.c
> > @@ -124,6 +124,8 @@ void ppc_set_compat(PowerPCCPU *cpu, uint32_t 
> > compat_pvr, Error **errp)
> >  pcr = compat->pcr;
> >  }
> >  
> > +cpu_synchronize_state(CPU(cpu));
> > +
> >  cpu->compat_pvr = compat_pvr;
> >  env->spr[SPR_PCR] = pcr & pcc->pcr_mask;
> >  
> > @@ -136,6 +138,40 @@ void ppc_set_compat(PowerPCCPU *cpu, uint32_t 
> > compat_pvr, Error **errp)
> >  }
> >  }
> >  
> > +#if !defined(CONFIG_USER_ONLY)
> > +typedef struct {
> > +uint32_t compat_pvr;
> > +Error *err;
> > +} SetCompatState;
> > +
> > +static void do_set_compat(CPUState *cs, void *arg)
> > +{
> > +PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +SetCompatState *s = arg;
> > +
> > +ppc_set_compat(cpu, s->compat_pvr, &s->err);
> > +}
> > +
> > +void ppc_set_compat_all(uint32_t compat_pvr, Error **errp)
> > +{
> > +CPUState *cs;
> > +
> > +CPU_FOREACH(cs) {
> > +SetCompatState s = {
> > +.compat_pvr = compat_pvr,
> > +.err = NULL,
> > +};
> > +
> > +run_on_cpu(cs, do_set_compat, &s);
> > +
> > +if (s.err) {
> > +error_propagate(errp, s.err);
> > +return;
> > +}
> > +}
> > +}
> > +#endif
> > +
> >  int ppc_compat_max_threads(PowerPCCPU *cpu)
> >  {
> >  const CompatInfo *compat = compat_by_pvr(cpu->compat_pvr);
> > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> > index 91e8be8..201a655 100644
> > --- a/target-ppc/cpu.h
> > +++ b/target-ppc/cpu.h
> > @@ -1317,6 +1317,9 @@ static inline int cpu_mmu_index (CPUPPCState *env, 
> > bool ifetch)
> >  bool ppc_check_compat(PowerPCCPU *cpu, uint32_t compat_pvr,
> >uint32_t min_compat_pvr, uint32_t max_compat_pvr);
> >  void ppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr, Error **errp);
> > +#if !defined(CONFIG_USER_ONLY)
> > +void 

Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create

2016-11-07 Thread John Snow



On 11/03/2016 09:17 AM, Kevin Wolf wrote:
> On 02.11.2016 at 18:50, John Snow wrote:
> > Refactor backup_start as backup_job_create, which only creates the job,
> > but does not automatically start it. The old interface, 'backup_start',
> > is not kept in favor of limiting the number of nearly-identical interfaces
> > that would have to be edited to keep up with QAPI changes in the future.
> >
> > Callers that wish to synchronously start the backup_block_job can
> > instead just call block_job_start immediately after calling
> > backup_job_create.
> >
> > Transactions are updated to use the new interface, calling block_job_start
> > only during the .commit phase, which helps prevent race conditions where
> > jobs may finish before we even finish building the transaction. This may
> > happen, for instance, during empty block backup jobs.
> >
> > Reported-by: Vladimir Sementsov-Ogievskiy 
> > Signed-off-by: John Snow 
> >
> > +static void drive_backup_commit(BlkActionState *common)
> > +{
> > +DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> > +if (state->job) {
> > +block_job_start(state->job);
> > +}
> >  }
>
> How could state->job ever be NULL?

Mechanical thinking. It can't. (I definitely didn't copy paste from the
.abort routines. Definitely.)

> Same question for abort, and for blockdev_backup_commit/abort.



Abort ... we may not have created the job successfully. Abort gets 
called whether or not we made it to or through the matching .prepare.
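
A minimal sketch of why the NULL check belongs in .abort but not in .commit. This is not the blockdev.c code: Job, ActionState and the three action_*() functions are hypothetical stand-ins for the transaction callbacks, and the "fail" flag simply simulates a .prepare that errors out before the job exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the block job and the per-action state. */
typedef struct Job {
    bool started;
    bool cancelled;
} Job;

typedef struct ActionState {
    Job *job;      /* NULL until .prepare has created the job */
    Job storage;   /* backing storage so the sketch needs no allocation */
} ActionState;

/* .prepare: may fail before the job has been created. */
static bool action_prepare(ActionState *s, bool fail)
{
    if (fail) {
        return false;              /* s->job stays NULL */
    }
    s->storage = (Job){ false, false };
    s->job = &s->storage;
    return true;
}

/* .commit: only reached once every .prepare succeeded, so job is set. */
static void action_commit(ActionState *s)
{
    s->job->started = true;
}

/* .abort: reached even when .prepare failed, so it must tolerate NULL. */
static void action_abort(ActionState *s)
{
    if (s->job) {
        s->job->cancelled = true;
    }
}
```

Starting the job only in action_commit() is the point of the refactoring: nothing can finish before the whole transaction has been built.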



> Kevin



--js



Re: [Qemu-devel] [RFC 15/17] ppc: Check that CPU model stays consistent across migration

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 06:54:48PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:12, David Gibson wrote:
> > When a vmstate for the ppc cpu was first introduced (a90db15 "target-ppc:
> > Convert ppc cpu savevm to VMStateDescription"), a VMSTATE_EQUAL was used
> > to ensure that identical CPU models were used at source and destination
> > as based on the PVR (Processor Version Register).
> > 
> > However this was a problem for HV KVM, where due to hardware limitations
> > we always need to use the real PVR of the host CPU.  So, to allow
> > migration between hosts with "similar enough" CPUs, the PVR check was
> > removed in 569be9f0 "target-ppc: Remove PVR check from migration".  This
> > left the onus on user / management to only attempt migration between
> > compatible CPUs.
> > 
> > Now that we've reworked the handling of compatibility modes, we have the
> > information to actually determine if we're making a compatible migration.
> > So this patch partially restores the PVR check.  If the source was running
> > in a compatibility mode, we just make sure that the destination cpu can
> > also run in that compatibility mode.  However, if the source was running
> > in "raw" mode, we verify that the destination has the same PVR value.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  target-ppc/machine.c | 15 +++
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> > index 5d87ff6..62b9e94 100644
> > --- a/target-ppc/machine.c
> > +++ b/target-ppc/machine.c
> > @@ -173,10 +173,12 @@ static int cpu_post_load(void *opaque, int version_id)
> >  target_ulong msr;
> >  
> >  /*
> > - * We always ignore the source PVR. The user or management
> > - * software has to take care of running QEMU in a compatible mode.
> > + * If we're operating in compat mode, we should be ok as long as
> > + * the destination supports the same compatibility mode.
> > + *
> > + * Otherwise, however, we require that the destination has exactly
> > + * the same CPU model as the source.
> >   */
> > -env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
> >  
> >  #if defined(TARGET_PPC64)
> >  if (cpu->compat_pvr) {
> > @@ -188,8 +190,13 @@ static int cpu_post_load(void *opaque, int version_id)
> >  error_free(local_err);
> >  return -1;
> >  }
> > -}
> > +} else
> >  #endif
> > +{
> > +if (env->spr[SPR_PVR] != env->spr_cb[SPR_PVR].default_value) {
> > +return -1;
> > +}
> > +}
> 
> This should break migration from host with PVR=004d0200 to host with
> PVR=004d0201, what is the benefit of such limitation?

There probably isn't one.  But the point is it also blocks migration
from a host with PVR=004B0201 (POWER8) to one with PVR=00201400
(403GCX) and *that* has a clear benefit.  I don't see a way to block
the second without the first, except by creating a huge compatibility
matrix table, which would require inordinate amounts of time to
research carefully.
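
A minimal sketch of the rule as described in the commit message, condensed into one predicate. migration_cpu_ok() is a hypothetical helper, not a function in the patch, and dst_supports_compat is assumed to be the answer the destination CPU gives when asked whether it can run in the source's compatibility mode.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical condensed form of the post-load decision:
 *  - source in a compat mode: accept if the destination supports that mode
 *  - source in "raw" mode:    accept only if the PVRs are identical
 */
static bool migration_cpu_ok(uint32_t src_pvr, uint32_t src_compat_pvr,
                             uint32_t dst_pvr, bool dst_supports_compat)
{
    if (src_compat_pvr) {
        return dst_supports_compat;
    }
    return src_pvr == dst_pvr;
}
```

With this rule both cases from the exchange above are refused in raw mode (004d0200 to 004d0201, and 004B0201 to 00201400), while the first pair can still migrate if the guest runs in a compatibility mode that both hosts support.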

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC 07/17] ppc: Rewrite ppc_set_compat()

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 01:57:45PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:11, David Gibson wrote:
> > This rewrites the ppc_set_compat() function so that instead of open coding
> > the various compatibility modes, it reads the relevant data from a table.
> > This is a first step in consolidating the information on compatibility
> > modes scattered across the code into a single place.
> > 
> > It also makes one change to the logic.  The old code masked the bits to be
> > set in the PCR (Processor Compatibility Register) by which bits are valid
> > on the host CPU.  This made no sense, since it was done regardless of
> > whether our guest CPU was the same as the host CPU or not.  Futhermore,
> 
> s/Futhermore/Furthermore/
> 
> 
> > the actual PCR bits are only relevant for TCG[1] - KVM instead uses the
> > compatibility mode we tell it in kvmppc_set_compat().  When using TCG
> > host cpu information usually isn't even present.
> > 
> > While we're at it, we put the new implementation in a new file to make the
> > enormouse translate_init.c a little smaller.
> 
> s/enormouse/enormous/

Thanks, spelling corrections applied.

> 
> 
> > 
> > [1] Actually it doesn't even do anything in TCG, but it will if / when we
> > get to implementing compatibility mode logic at that level.
> > 
> > Signed-off-by: David Gibson 
> 
> Reviewed-by: Alexey Kardashevskiy 
> 
> > ---
> >  target-ppc/Makefile.objs|  1 +
> >  target-ppc/compat.c | 91 
> > +
> >  target-ppc/cpu.h|  6 ++-
> >  target-ppc/translate_init.c | 41 
> >  4 files changed, 97 insertions(+), 42 deletions(-)
> >  create mode 100644 target-ppc/compat.c
> > 
> > diff --git a/target-ppc/Makefile.objs b/target-ppc/Makefile.objs
> > index e667e69..feb5c30 100644
> > --- a/target-ppc/Makefile.objs
> > +++ b/target-ppc/Makefile.objs
> > @@ -15,3 +15,4 @@ obj-y += misc_helper.o
> >  obj-y += mem_helper.o
> >  obj-$(CONFIG_USER_ONLY) += user_only_helper.o
> >  obj-y += gdbstub.o
> > +obj-$(TARGET_PPC64) += compat.o
> > diff --git a/target-ppc/compat.c b/target-ppc/compat.c
> > new file mode 100644
> > index 000..f3fd9c6
> > --- /dev/null
> > +++ b/target-ppc/compat.c
> > @@ -0,0 +1,91 @@
> > +/*
> > + *  PowerPC CPU initialization for qemu.
> > + *
> > + *  Copyright 2016, David Gibson, Red Hat Inc. 
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see 
> > .
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "sysemu/kvm.h"
> > +#include "kvm_ppc.h"
> > +#include "sysemu/cpus.h"
> > +#include "qemu/error-report.h"
> > +#include "qapi/error.h"
> > +#include "cpu-models.h"
> > +
> > +typedef struct {
> > +uint32_t pvr;
> > +uint64_t pcr;
> > +} CompatInfo;
> > +
> > +static const CompatInfo compat_table[] = {
> > +{ /* POWER6, ISA2.05 */
> > +.pvr = CPU_POWERPC_LOGICAL_2_05,
> > +.pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05
> > +   | PCR_TM_DIS | PCR_VSX_DIS,
> > +},
> > +{ /* POWER7, ISA2.06 */
> > +.pvr = CPU_POWERPC_LOGICAL_2_06,
> > +.pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_TM_DIS,
> > +},
> > +{
> > +.pvr = CPU_POWERPC_LOGICAL_2_06_PLUS,
> > +.pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_TM_DIS,
> > +},
> > +{ /* POWER8, ISA2.07 */
> > +.pvr = CPU_POWERPC_LOGICAL_2_07,
> > +.pcr = PCR_COMPAT_2_07,
> > +},
> > +};
> > +
> > +static const CompatInfo *compat_by_pvr(uint32_t pvr)
> > +{
> > +int i;
> > +
> > +for (i = 0; i < ARRAY_SIZE(compat_table); i++) {
> > +if (compat_table[i].pvr == pvr) {
> > +return &compat_table[i];
> > +}
> > +}
> > +return NULL;
> > +}
> > +
> > +void ppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr, Error **errp)
> > +{
> > +const CompatInfo *compat = compat_by_pvr(compat_pvr);
> > +CPUPPCState *env = &cpu->env;
> > +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> > +uint64_t pcr;
> > +
> > +if (!compat_pvr) {
> > +pcr = 0;
> > +} else if (!compat) {
> > +error_setg(errp, "Unknown compatibility PVR 0x%08"PRIx32, 
> > compat_pvr);
> > +return;
> > +} else {
> > +pcr = 

Re: [Qemu-devel] [RFC 08/17] ppc: Rewrite ppc_get_compat_smt_threads()

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 02:37:18PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:11, David Gibson wrote:
> > To continue consolidation of compatibility mode information, this rewrites
> > the ppc_get_compat_smt_threads() function using the table of compatibility
> > modes in target-ppc/compat.c.
> > 
> > It's not a direct replacement, the new ppc_compat_max_threads() function
> > has simpler semantics - it just returns the number of threads the cpu
> > model has, taking into account any compatibility mode it is in.
> > 
> > This no longer takes into account kvmppc_smt_threads() as the previous
> > version did.  That check wasn't useful because we check elsewhere that
> 
> Nit: s/elsewhere/in ppc_cpu_realizefn()/

Changed.
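
A minimal sketch of the simplified semantics, assuming a reduced table with only the fields needed here. The LOGICAL_* values are placeholders rather than the real constants from cpu-models.h, and compat_max_threads() returns -1 for an unknown mode where the patch uses g_assert().

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder logical PVRs; the real values live in cpu-models.h. */
enum { LOGICAL_2_05 = 1, LOGICAL_2_06 = 2, LOGICAL_2_07 = 3 };

typedef struct {
    uint32_t pvr;
    int max_threads;
} CompatInfo;

static const CompatInfo compat_table[] = {
    { LOGICAL_2_05, 2 },   /* POWER6 */
    { LOGICAL_2_06, 4 },   /* POWER7 */
    { LOGICAL_2_07, 8 },   /* POWER8 */
};

/* Linear lookup by PVR; NULL when the mode is not in the table. */
static const CompatInfo *compat_by_pvr(uint32_t pvr)
{
    size_t i;

    for (i = 0; i < sizeof(compat_table) / sizeof(compat_table[0]); i++) {
        if (compat_table[i].pvr == pvr) {
            return &compat_table[i];
        }
    }
    return NULL;
}

/*
 * Threads the guest may see: the model's thread count, capped by the
 * compat mode if one is set.  Returns -1 for an unknown compat mode.
 */
static int compat_max_threads(int nr_threads, uint32_t compat_pvr)
{
    const CompatInfo *compat = compat_by_pvr(compat_pvr);

    if (!compat_pvr) {
        return nr_threads;      /* raw mode: no cap */
    }
    if (!compat) {
        return -1;
    }
    return nr_threads < compat->max_threads ? nr_threads
                                            : compat->max_threads;
}
```

Note that no KVM query appears anywhere in the function, which is the behavioural change the commit message calls out.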

> 
> 
> Reviewed-by: Alexey Kardashevskiy 
> 
> 
> 
> 
> > CPUs aren't instantiated with more threads than kvm allows (or if we didn't
> > things will already be broken and this won't make it any worse).
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr.c  |  8 
> >  target-ppc/compat.c | 18 ++
> >  target-ppc/cpu.h|  2 +-
> >  target-ppc/translate_init.c | 20 
> >  4 files changed, 23 insertions(+), 25 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 276cefa..6c78889 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -207,6 +207,7 @@ static int spapr_fixup_cpu_dt(void *fdt, 
> > sPAPRMachineState *spapr)
> >  PowerPCCPU *cpu = POWERPC_CPU(cs);
> >  DeviceClass *dc = DEVICE_GET_CLASS(cs);
> >  int index = ppc_get_vcpu_dt_id(cpu);
> > +int compat_smt = MIN(smp_threads, ppc_compat_max_threads(cpu));
> >  
> >  if ((index % smt) != 0) {
> >  continue;
> > @@ -241,8 +242,7 @@ static int spapr_fixup_cpu_dt(void *fdt, 
> > sPAPRMachineState *spapr)
> >  return ret;
> >  }
> >  
> > -ret = spapr_fixup_cpu_smt_dt(fdt, offset, cpu,
> > - ppc_get_compat_smt_threads(cpu));
> > +ret = spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt);
> >  if (ret < 0) {
> >  return ret;
> >  }
> > @@ -408,6 +408,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void 
> > *fdt, int offset,
> >  size_t page_sizes_prop_size;
> >  uint32_t vcpus_per_socket = smp_threads * smp_cores;
> >  uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> > +int compat_smt = MIN(smp_threads, ppc_compat_max_threads(cpu));
> >  sPAPRDRConnector *drc;
> >  sPAPRDRConnectorClass *drck;
> >  int drc_index;
> > @@ -495,8 +496,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void 
> > *fdt, int offset,
> >  
> >  _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cs));
> >  
> > -_FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu,
> > -ppc_get_compat_smt_threads(cpu)));
> > +_FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt));
> >  }
> >  
> >  static void spapr_populate_cpus_dt_node(void *fdt, sPAPRMachineState 
> > *spapr)
> > diff --git a/target-ppc/compat.c b/target-ppc/compat.c
> > index f3fd9c6..66529a6 100644
> > --- a/target-ppc/compat.c
> > +++ b/target-ppc/compat.c
> > @@ -28,6 +28,7 @@
> >  typedef struct {
> >  uint32_t pvr;
> >  uint64_t pcr;
> > +int max_threads;
> >  } CompatInfo;
> >  
> >  static const CompatInfo compat_table[] = {
> > @@ -35,18 +36,22 @@ static const CompatInfo compat_table[] = {
> >  .pvr = CPU_POWERPC_LOGICAL_2_05,
> >  .pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05
> > | PCR_TM_DIS | PCR_VSX_DIS,
> > +.max_threads = 2,
> >  },
> >  { /* POWER7, ISA2.06 */
> >  .pvr = CPU_POWERPC_LOGICAL_2_06,
> >  .pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_TM_DIS,
> > +.max_threads = 4,
> >  },
> >  {
> >  .pvr = CPU_POWERPC_LOGICAL_2_06_PLUS,
> >  .pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_TM_DIS,
> > +.max_threads = 4,
> >  },
> >  { /* POWER8, ISA2.07 */
> >  .pvr = CPU_POWERPC_LOGICAL_2_07,
> >  .pcr = PCR_COMPAT_2_07,
> > +.max_threads = 8,
> >  },
> >  };
> >  
> > @@ -89,3 +94,16 @@ void ppc_set_compat(PowerPCCPU *cpu, uint32_t 
> > compat_pvr, Error **errp)
> >  }
> >  }
> >  }
> > +
> > +int ppc_compat_max_threads(PowerPCCPU *cpu)
> > +{
> > +const CompatInfo *compat = compat_by_pvr(cpu->compat_pvr);
> > +int n_threads = CPU(cpu)->nr_threads;
> > +
> > +if (cpu->compat_pvr) {
> > +g_assert(compat);
> > +n_threads = MIN(n_threads, compat->max_threads);
> > +}
> > +
> > +return n_threads;
> > +}
> > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> > index 15d5e4b..cfda7b2 100644
> > --- a/target-ppc/cpu.h
> > +++ b/target-ppc/cpu.h
> > @@ -1241,7 +1241,6 @@ void ppc_store_sdr1 

Re: [Qemu-devel] [RFC 09/17] ppc: Validate compatibility modes when setting

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 02:45:02PM +1100, Alexey Kardashevskiy wrote:
> On 31/10/16 19:39, David Gibson wrote:
> > On Mon, Oct 31, 2016 at 04:55:42PM +1100, Alexey Kardashevskiy wrote:
> >> On 30/10/16 22:12, David Gibson wrote:
> >>> Current ppc_set_compat() will attempt to set any compatibility mode
> >>> specified, regardless of whether it's available on the CPU.  The caller is
> >>> expected to make sure it is setting a possible mode, which is awkward
> >>> because most of the information to make that decision is at the CPU level.
> >>>
> >>> This begins to clean this up by introducing a ppc_check_compat() function
> >>> which will determine if a given compatibility mode is supported on a CPU
> >>> (and also whether it lies within specified minimum and maximum compat
> >>> levels, which will be useful later).  It also contains an assertion that
> >>> the CPU has a "virtual hypervisor"[1], that is, that the guest isn't
> >>> permitted to execute hypervisor privilege code.  Without that, the guest
> >>> would own the PCR and so could override any mode set here.  Only machine
> >>> types which use a virtual hypervisor (i.e. 'pseries') should use
> >>> ppc_check_compat().
> >>>
> >>> ppc_set_compat() is modified to validate the compatibility mode it is 
> >>> given
> >>> and fail if it's not available on this CPU.
> >>>
> >>> [1] Or user-only mode, which also obviously doesn't allow access to the
> >>> hypervisor privileged PCR.  We don't use that now, but could in future.
> >>>
> >>> Signed-off-by: David Gibson 
> >>> ---
> >>>  target-ppc/compat.c | 41 +
> >>>  target-ppc/cpu.h|  2 ++
> >>>  2 files changed, 43 insertions(+)
> >>>
> >>> diff --git a/target-ppc/compat.c b/target-ppc/compat.c
> >>> index 66529a6..1059555 100644
> >>> --- a/target-ppc/compat.c
> >>> +++ b/target-ppc/compat.c
> >>> @@ -28,29 +28,37 @@
> >>>  typedef struct {
> >>>  uint32_t pvr;
> >>>  uint64_t pcr;
> >>> +uint64_t pcr_level;
> >>>  int max_threads;
> >>>  } CompatInfo;
> >>>  
> >>>  static const CompatInfo compat_table[] = {
> >>> +/*
> >>> + * Ordered from oldest to newest - the code relies on this
> 
> In the last 5+ years, I have never seen pointers compared with anything but
> "==" and "!=". A bit unusual.

Unusual, yes, but it has its uses from time to time.
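
A minimal sketch of the kind of pointer comparison being discussed. The body of ppc_check_compat() is not visible in the quoted text, so this is only an assumption about its shape: Level, levels[], level_by_pvr() and level_in_range() are hypothetical names, and the PVR values are placeholders. The comparison is well defined in C because all three pointers refer to elements of the same array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced table entry; PVR values are placeholders. */
typedef struct {
    uint32_t pvr;
} Level;

/* Ordered from oldest to newest; the range check relies on this. */
static const Level levels[] = { { 1 }, { 2 }, { 3 }, { 4 } };

static const Level *level_by_pvr(uint32_t pvr)
{
    size_t i;

    for (i = 0; i < sizeof(levels) / sizeof(levels[0]); i++) {
        if (levels[i].pvr == pvr) {
            return &levels[i];
        }
    }
    return NULL;
}

/* A min or max of 0 means "no bound", since 0 is not in the table. */
static bool level_in_range(uint32_t pvr, uint32_t min_pvr, uint32_t max_pvr)
{
    const Level *lvl = level_by_pvr(pvr);
    const Level *min = level_by_pvr(min_pvr);
    const Level *max = level_by_pvr(max_pvr);

    if (!lvl) {
        return false;           /* unknown mode */
    }
    if (min && lvl < min) {
        return false;           /* older than the minimum */
    }
    if (max && lvl > max) {
        return false;           /* newer than the maximum */
    }
    return true;
}
```

Because table position encodes age, no separate "level number" field is needed for the ordering test, at the price of the ordering comment on the table becoming load-bearing.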

> 
> 
> Reviewed-by: Alexey Kardashevskiy 
> 
> 
> 
> 
> 
> >>> + */
> >>>  { /* POWER6, ISA2.05 */
> >>>  .pvr = CPU_POWERPC_LOGICAL_2_05,
> >>>  .pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05
> >>> | PCR_TM_DIS | PCR_VSX_DIS,
> >>> +.pcr_level = PCR_COMPAT_2_05,
> >>>  .max_threads = 2,
> >>>  },
> >>>  { /* POWER7, ISA2.06 */
> >>>  .pvr = CPU_POWERPC_LOGICAL_2_06,
> >>>  .pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_TM_DIS,
> >>> +.pcr_level = PCR_COMPAT_2_06,
> >>>  .max_threads = 4,
> >>>  },
> >>>  {
> >>>  .pvr = CPU_POWERPC_LOGICAL_2_06_PLUS,
> >>>  .pcr = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_TM_DIS,
> >>> +.pcr_level = PCR_COMPAT_2_06,
> >>>  .max_threads = 4,
> >>>  },
> >>>  { /* POWER8, ISA2.07 */
> >>>  .pvr = CPU_POWERPC_LOGICAL_2_07,
> >>>  .pcr = PCR_COMPAT_2_07,
> >>> +.pcr_level = PCR_COMPAT_2_07,
> >>>  .max_threads = 8,
> >>>  },
> >>>  };
> >>> @@ -67,6 +75,35 @@ static const CompatInfo *compat_by_pvr(uint32_t pvr)
> >>>  return NULL;
> >>>  }
> >>>  
> >>> +bool ppc_check_compat(PowerPCCPU *cpu, uint32_t compat_pvr,
> >>> +  uint32_t min_compat_pvr, uint32_t max_compat_pvr)
> >>> +{
> >>> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> >>> +const CompatInfo *compat = compat_by_pvr(compat_pvr);
> >>> +const CompatInfo *min = compat_by_pvr(min_compat_pvr);
> >>> +const CompatInfo *max = compat_by_pvr(max_compat_pvr);
> >>
> >>
> >> You keep giving very generic names (as "min" and "max") to local
> >> variables ;)
> > 
> > For local variables, brevity is a virtue.
> 
> 
> 
> 
> 




-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC 12/17] ppc: Migrate compatibility mode

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 04:58:47PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:12, David Gibson wrote:
> > Server-class POWER CPUs can be put into several compatibility modes.  These
> > can be specified on the command line, or negotiated by the guest during
> > boot.
> > 
> > Currently we don't migrate the compatibility mode, which means after a
> > migration the guest will revert to running with whatever compatibility
> > mode (or none) specified on the command line.
> > 
> > With the limited range of CPUs currently used, this doesn't usually cause
> > a problem, but it could.  Fix this by adding the compatibility mode (if
> > set) to the migration stream.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  target-ppc/machine.c | 34 ++
> >  1 file changed, 34 insertions(+)
> > 
> > diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> > index 4820f22..5d87ff6 100644
> > --- a/target-ppc/machine.c
> > +++ b/target-ppc/machine.c
> > @@ -9,6 +9,7 @@
> >  #include "mmu-hash64.h"
> >  #include "migration/cpu.h"
> >  #include "exec/exec-all.h"
> > +#include "qapi/error.h"
> >  
> >  static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
> >  {
> > @@ -176,6 +177,20 @@ static int cpu_post_load(void *opaque, int version_id)
> >   * software has to take care of running QEMU in a compatible mode.
> >   */
> >  env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
> > +
> > +#if defined(TARGET_PPC64)
> > +if (cpu->compat_pvr) {
> > +Error *local_err = NULL;
> > +
> > +ppc_set_compat(cpu, cpu->compat_pvr, &local_err);
> > +if (local_err) {
> > +error_report_err(local_err);
> > +error_free(local_err);
> > +return -1;
> > +}
> > +}
> > +#endif
> > +
> >  env->lr = env->spr[SPR_LR];
> >  env->ctr = env->spr[SPR_CTR];
> >  cpu_write_xer(env, env->spr[SPR_XER]);
> > @@ -528,6 +543,24 @@ static const VMStateDescription vmstate_tlbmas = {
> >  }
> >  };
> >  
> > +static bool compat_needed(void *opaque)
> > +{
> > +PowerPCCPU *cpu = opaque;
> > +
> > +return cpu->compat_pvr != 0;
> 
> 
> Finally got to trying how this affects migration :)
> 
> This breaks migration to QEMU <=2.7, and it should not at least when both
> source and destination are running with  -cpu host,compat=power7.

IIUC, we don't generally try to maintain backwards migration, even for
old machine types.

> 
> 
> > +}
> > +
> > +static const VMStateDescription vmstate_compat = {
> > +.name = "cpu/compat",
> > +.version_id = 1,
> > +.minimum_version_id = 1,
> > +.needed = compat_needed,
> > +.fields = (VMStateField[]) {
> > +VMSTATE_UINT32(compat_pvr, PowerPCCPU),
> > +VMSTATE_END_OF_LIST()
> > +}
> > +};
> > +
> >  const VMStateDescription vmstate_ppc_cpu = {
> >  .name = "cpu",
> >  .version_id = 5,
> > @@ -580,6 +613,7 @@ const VMStateDescription vmstate_ppc_cpu = {
> >  &vmstate_tlb6xx,
> >  &vmstate_tlbemb,
> >  &vmstate_tlbmas,
> > +&vmstate_compat,
> >  NULL
> >  }
> >  };
> > 
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [RFC 13/17] pseries: Move CPU compatibility property to machine

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 06:43:52PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:12, David Gibson wrote:
> > Server class POWER CPUs have a "compat" property, which is used to set the
> > backwards compatibility mode for the processor.  However, this only makes
> > sense for machine types which don't give the guest access to hypervisor
> > privilege - otherwise the compatibility level is under the guest's control.
> > 
> > To reflect this, this removes the CPU 'compat' property and instead
> > creates a 'max-cpu-compat' property on the pseries machine.  Strictly
> > speaking this breaks compatibility, but AFAIK the 'compat' option was
> > never (directly) used with -device or device_add.
> > 
> > The option was used with -cpu.  So, to maintain compatibility, this patch
> > adds a hack to the cpu option parsing to strip out any compat options
> > supplied with -cpu and set them on the machine property instead of the
> > now-removed cpu property.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr.c  |  6 +++-
> >  hw/ppc/spapr_cpu_core.c | 47 +++--
> >  hw/ppc/spapr_hcall.c|  2 +-
> >  include/hw/ppc/spapr.h  | 10 +--
> >  target-ppc/compat.c | 65 
> >  target-ppc/cpu.h|  6 ++--
> >  target-ppc/translate_init.c | 73 
> > -
> >  7 files changed, 127 insertions(+), 82 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 6c78889..b983faa 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1849,7 +1849,7 @@ static void ppc_spapr_init(MachineState *machine)
> >  machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> >  }
> >  
> > -ppc_cpu_parse_features(machine->cpu_model);
> > +spapr_cpu_parse_features(spapr);
> >  
> >  spapr_init_cpus(spapr);
> >  
> > @@ -2191,6 +2191,10 @@ static void spapr_machine_initfn(Object *obj)
> >  " place of standard EPOW events when 
> > possible"
> >  " (required for memory hot-unplug 
> > support)",
> >  NULL);
> > +
> > +object_property_add(obj, "max-cpu-compat", "str",
> > +ppc_compat_prop_get, ppc_compat_prop_set,
> > +NULL, &spapr->max_compat_pvr, &error_fatal);
> >  }
> >  
> >  static void spapr_machine_finalizefn(Object *obj)
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index ee5cd14..0319516 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -18,6 +18,49 @@
> >  #include "target-ppc/mmu-hash64.h"
> >  #include "sysemu/numa.h"
> >  
> > +void spapr_cpu_parse_features(sPAPRMachineState *spapr)
> > +{
> > +/*
> > + * Backwards compatibility hack:
> > +
> > + *   CPUs had a "compat=" property which didn't make sense for
> > + *   anything except pseries.  It was replaced by "max-cpu-compat"
> > + *   machine option.  This supports old command lines like
> > + *   -cpu POWER8,compat=power7
> > + *   By stripping the compat option and applying it to the machine
> > + *   before passing it on to the cpu level parser.
> > + */
> > +gchar **inpieces, **outpieces;
> > +int n, i, j;
> > +gchar *compat_str = NULL;
> > +gchar *filtered_model;
> > +
> > +inpieces = g_strsplit(MACHINE(spapr)->cpu_model, ",", 0);
> > +n = g_strv_length(inpieces);
> > +outpieces = g_new0(gchar *, g_strv_length(inpieces));
> > +
> > +/* inpieces[0] is the actual model string */
> > +for (i = 0, j = 0; i < n; i++) {
> > +if (g_str_has_prefix(inpieces[i], "compat=")) {
> > +compat_str = inpieces[i];
> > +} else {
> > +outpieces[j++] = g_strdup(inpieces[i]);
> > +}
> > +}
> > +
> > +if (compat_str) {
> > +char *val = compat_str + strlen("compat=");
> > +object_property_set_str(OBJECT(spapr), val, "max-cpu-compat",
> > +&error_fatal);
> 
> This part is ok.
> 
> > +}
> > +
> > +filtered_model = g_strjoinv(",", outpieces);
> > +ppc_cpu_parse_features(filtered_model);
> 
> 
> Rather than reducing the CPU parameters string from the command line, I'd
> keep "dc->props = powerpc_servercpu_properties" and make them noop + warn
> to use the machine option instead. One day QEMU may start calling the CPU
> features parser itself and somebody will have to hack this thing
> again.

Hrm.  A deprecation message like that only works if a human is reading
it.  Usually qemu will be invoked by libvirt and the message will
probably disappear into some log file to scare someone unnecessarily.
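
A minimal sketch of the filtering step under debate, written in plain C so it stands alone. It is not the patch's glib-based code: split_compat() is a hypothetical helper, and both output buffers are assumed to be at least strlen(cpu_model) + 1 bytes.

```c
#include <string.h>

/*
 * Split a "-cpu" argument such as "POWER8,compat=power7" into the option
 * string without "compat=" (filtered) and the compat value (compat, empty
 * if the option is absent).  Hypothetical helper; both buffers must hold
 * at least strlen(cpu_model) + 1 bytes.
 */
static void split_compat(const char *cpu_model, char *filtered, char *compat)
{
    const char *p = cpu_model;
    const size_t prefix_len = strlen("compat=");

    filtered[0] = '\0';
    compat[0] = '\0';

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current piece */

        if (len >= prefix_len && strncmp(p, "compat=", prefix_len) == 0) {
            /* Remember the value; it goes to the machine property. */
            memcpy(compat, p + prefix_len, len - prefix_len);
            compat[len - prefix_len] = '\0';
        } else {
            /* Everything else is passed on to the CPU-level parser. */
            if (filtered[0]) {
                strcat(filtered, ",");
            }
            strncat(filtered, p, len);
        }

        p += len;
        if (*p == ',') {
            p++;
        }
    }
}
```

The value left in compat is what would be set as "max-cpu-compat" on the machine, and filtered is what would be handed to ppc_cpu_parse_features().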

Meanwhile, what will the actual behaviour be?  Pulling the CPU's
property value into the machine instead would be really ugly.
Ignoring it would break users with existing 

Re: [Qemu-devel] [RFC 16/17] ppc: Remove counter-productive "sanity checks" in migration

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 04:52:39PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:12, David Gibson wrote:
> > When vmstate for the ppc cpu was introduced in a90db158 "target-ppc:
> > Convert ppc cpu savevm to VMStateDescription", several "sanity check"
> > fields were included, verifying that certain cpu parameters matched between
> > source and destination.
> > 
> > This turns out not to have been a good idea.  For one thing it's redundant
> > with existing checks for a compatible cpu version at either end.  But more
> > importantly the insns_flags and insns_flags2 checks actively break things:
> > they expose what's essentially an internal TCG implementation detail in the
> > migration stream.  That means that when new instruction classes are added
> > or rearranged, migration can break.
> > 
> > This removes these ill-considered sanity checks.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  target-ppc/machine.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> > index 62b9e94..453ef0a 100644
> > --- a/target-ppc/machine.c
> > +++ b/target-ppc/machine.c
> > @@ -602,10 +602,10 @@ const VMStateDescription vmstate_ppc_cpu = {
> >  /* FIXME: access_type? */
> >  
> >  /* Sanity checking */
> > -VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> > -VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> > -VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> > -VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
> > +VMSTATE_UNUSED(sizeof(target_ulong) /* msr_mask */
> > +   + sizeof(uint64_t) /* insns_flags */
> > +   + sizeof(uint64_t) /* insns_flags2 */
> > +   + sizeof(uint32_t)), /* nb_BATs */
> 
> 
> This breaks migration to older QEMU:
> 
> 25055@1478238734.537761:vmstate_load_field_error field "env.msr_mask" load
> failed, ret = -22

Again, I don't think we generally support backwards migration.

That said, it would be nice here to do a "set this field on
outgoing migration, but ignore it on incoming migration".  Do you know a
way to do that?
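
To make the idea concrete, here is a minimal sketch of "write on the way out, skip on the way in". It is not QEMU's VMState machinery: toy_stream, toy_cpu and the put/get helpers are hypothetical stand-ins, and whether an equivalent VMState field macro exists is exactly the open question above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy byte stream standing in for the migration stream (hypothetical).
 * No bounds checking: the sketch never writes more than a few bytes. */
struct toy_stream {
    uint8_t buf[64];
    size_t pos;
};

static void put_u32(struct toy_stream *s, uint32_t v)
{
    memcpy(s->buf + s->pos, &v, sizeof(v));
    s->pos += sizeof(v);
}

static uint32_t get_u32(struct toy_stream *s)
{
    uint32_t v;
    memcpy(&v, s->buf + s->pos, sizeof(v));
    s->pos += sizeof(v);
    return v;
}

struct toy_cpu {
    uint32_t nb_bats;   /* legacy "sanity check" field */
    uint32_t pvr;       /* field that is still really migrated */
};

/* Outgoing side: keep writing the legacy value so an old reader still sees it. */
static void toy_cpu_save(const struct toy_cpu *c, struct toy_stream *s)
{
    put_u32(s, c->nb_bats);
    put_u32(s, c->pvr);
}

/* Incoming side: consume the legacy slot, but neither compare nor store it. */
static void toy_cpu_load(struct toy_cpu *c, struct toy_stream *s)
{
    (void)get_u32(s);
    c->pvr = get_u32(s);
}
```

The stream layout stays unchanged in both directions; only the receiver stops caring about the legacy value.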

> 
> 
> 
> >  VMSTATE_END_OF_LIST()
> >  },
> >  .subsections = (const VMStateDescription*[]) {
> > 
> 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [Qemu-devel] [RFC 04/17] pseries: Make cpu_update during CAS unconditional

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 11:45:43AM +0100, Thomas Huth wrote:
> On 30.10.2016 12:11, David Gibson wrote:
> > spapr_h_cas_compose_response() includes a cpu_update parameter which
> > controls whether it includes updated information on the CPUs in the device
> > tree fragment returned from the ibm,client-architecture-support (CAS) call.
> > 
> > Providing the updated information is essential when CAS has negotiated
> > compatibility options which require different cpu information to be
> > presented to the guest.  However, it should be safe to provide in other
> > cases (it will just override the existing data in the device tree with
> > identical data).  This simplifies the code by removing the parameter and
> > always providing the cpu update information.
> 
> But updating the CPU device tree again and again will also increase the
> QEMU start-up time... Considering that guest start up time is sometimes
> also an issue, do you think that this code simplification is really worth
> the effort here?

Given how much it made my brain hurt to try to work the subsequent
changes around that parameter; yes.

If we really have problems with startup time we can revisit this - and
we can probably do better in the context of cleaned up dt building.



Re: [Qemu-devel] [RFC 01/17] ppc: Remove some stub POWER6 models

2016-11-07 Thread David Gibson
On Sun, Oct 30, 2016 at 10:11:52PM +1100, David Gibson wrote:
> The CPU model table includes stub (commented out) definitions for
> CPU_POWERPC_POWER6_5 and CPU_POWERPC_POWER6A.  These are not real cpu
> models, but represent the POWER6 in some compatibility modes.  If we ever
> do implement POWER6 (unlikely), we'll implement its compatibility modes in
> a different way (similar to what we do for POWER7 and POWER8).  So these
> stub definitions can be removed.
> 
> Signed-off-by: David Gibson 

I think this one is sufficiently non-controversial that I've merged it
into ppc-for-2.8.

> ---
>  target-ppc/cpu-models.c | 4 ----
>  target-ppc/cpu-models.h | 2 --
>  2 files changed, 6 deletions(-)
> 
> diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
> index 901cf40..506dee1 100644
> --- a/target-ppc/cpu-models.c
> +++ b/target-ppc/cpu-models.c
> @@ -1130,10 +1130,6 @@
>  #if defined(TODO)
>  POWERPC_DEF("POWER6",CPU_POWERPC_POWER6, POWER6,
>  "POWER6")
> -POWERPC_DEF("POWER6_5",  CPU_POWERPC_POWER6_5,   POWER5,
> -"POWER6 running in POWER5 mode")
> -POWERPC_DEF("POWER6A",   CPU_POWERPC_POWER6A,POWER6,
> -"POWER6A")
>  #endif
>  POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23, POWER7,
>  "POWER7 v2.3")
> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
> index 7d9e6a2..aafbbd7 100644
> --- a/target-ppc/cpu-models.h
> +++ b/target-ppc/cpu-models.h
> @@ -549,8 +549,6 @@ enum {
>  CPU_POWERPC_POWER5 = 0x003A0203,
>  CPU_POWERPC_POWER5P_v21= 0x003B0201,
>  CPU_POWERPC_POWER6 = 0x003E,
> -CPU_POWERPC_POWER6_5   = 0x0F01, /* POWER6 in POWER5 mode */
> -CPU_POWERPC_POWER6A= 0x0F02,
>  CPU_POWERPC_POWER_SERVER_MASK  = 0x,
>  CPU_POWERPC_POWER7_BASE= 0x003F,
>  CPU_POWERPC_POWER7_v23 = 0x003F0203,



Re: [Qemu-devel] [RFC 06/17] ppc: Rename cpu_version to compat_pvr

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 01:26:41PM +1100, Alexey Kardashevskiy wrote:
> On 30/10/16 22:11, David Gibson wrote:
> > The 'cpu_version' field in PowerPCCPU is badly named.  It's named after the
> > 'cpu-version' device tree property where it is advertised, but that meaning
> > may not be obvious in most places it appears.
> > 
> > Worse, it doesn't even really correspond to that device tree property.  The
> > property contains either the processor's PVR, or, if the CPU is running in
> > a compatibility mode, a special "logical PVR" representing which mode.
> > 
> > Rename the cpu_version field, and a number of related variables to
> > compat_pvr to make this clearer.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr.c  |  4 ++--
> >  hw/ppc/spapr_hcall.c| 30 +++---
> >  target-ppc/cpu.h|  6 +++---
> >  target-ppc/kvm.c|  4 ++--
> >  target-ppc/kvm_ppc.h|  4 ++--
> >  target-ppc/translate_init.c | 10 +-
> >  6 files changed, 29 insertions(+), 29 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index b7762ee..276cefa 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -149,8 +149,8 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int 
> > offset, PowerPCCPU *cpu,
> >  uint32_t gservers_prop[smt_threads * 2];
> >  int index = ppc_get_vcpu_dt_id(cpu);
> >  
> > -if (cpu->cpu_version) {
> > -ret = fdt_setprop_cell(fdt, offset, "cpu-version", 
> > cpu->cpu_version);
> > +if (cpu->compat_pvr) {
> 
> 
> Nit: g_assert(cpu->compat_pvr & 0x0F00); maybe?

That change wouldn't belong in this patch, which is purely a
mechanical s/cpu_version/compat_pvr/.

In general, I have considered such an assert(), but held back, because
I hadn't spotted an actual document saying that range was explicitly
reserved for logical PVRs.  If you have such a reference, I'll look at
adding such an assert somewhere.



Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent

2016-11-07 Thread Chunguang Li



> -Original Messages-
> From: "Li, Liang Z" 
> Sent Time: Monday, November 7, 2016
> To: "Chunguang Li" 
> Cc: "Dr. David Alan Gilbert" , "Amit Shah" 
> , "pbonz...@redhat.com" , 
> "qemu-devel@nongnu.org" , "stefa...@redhat.com" 
> , "quint...@redhat.com" 
> Subject: RE: [Qemu-devel] Migration dirty bitmap: should only mark pages as 
> dirty after they have been sent
> 
> > > > > > > > > > > I think this is "very" wasteful. Assume the workload
> > > > > > > > > > > writes the pages dirty randomly within the guest
> > > > > > > > > > > address space, and the transfer speed is constant.
> > > > > > > > > > > Intuitively, I think nearly half of the dirty pages
> > > > > > > > > > > produced in Iteration 1 is not really dirty. This
> > > > > > > > > > > means the time of Iteration 2 is double of that to
> > > > > > > > > > > send only really dirty pages.
> > > > > > > > > >
> > > > > > > > > > It makes sense, can you get some perf numbers to show what
> > > > > > > > > > kinds of workloads get impacted the most?  That would also
> > > > > > > > > > help us to figure out what kinds of speed improvements we
> > > > > > > > > > can expect.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >   Amit
> > > > > > > > >
> > > > > > > > > I have picked up 6 workloads and got the following
> > > > > > > > > statistics numbers of every iteration (except the last
> > > > > > > > > stop-copy one) during precopy.
> > > > > > > > > These numbers are obtained with the basic precopy migration,
> > > > > > > > > without the capabilities like xbzrle or compression, etc.
> > > > > > > > > The network for the migration is exclusive, with a separate
> > > > > > > > > network for the workloads.
> > > > > > > > > They are both gigabit ethernet. I use qemu-2.5.1.
> > > > > > > > >
> > > > > > > > > Three (booting, idle, web server) of them converged to the
> > > > > > > > > stop-copy phase, with the given bandwidth and default
> > > > > > > > > downtime (300ms), while the other three (kernel
> > > > > > > > > compilation, zeusmp, memcached) did not.
> > > > > > > > >
> > > > > > > > > One page is "not-really-dirty", if it is written first and
> > > > > > > > > is sent later (and not written again after that) during one
> > > > > > > > > iteration. I guess this would not happen so often during the
> > > > > > > > > other iterations as during the 1st iteration. Because all
> > > > > > > > > the pages of the VM are sent to the dest node during the
> > > > > > > > > 1st iteration, while during the others, only part of the
> > > > > > > > > pages are sent.
> > > > > > > > > So I think the "not-really-dirty" pages should be produced
> > > > > > > > > mainly during the 1st iteration , and maybe very little
> > > > > > > > > during the other iterations.
> > > > > > > > >
> > > > > > > > > If we could avoid resending the "not-really-dirty" pages,
> > > > > > > > > intuitively, I think the time spent on Iteration 2 would be
> > > > > > > > > halved. This is a chain reaction, because the dirty pages
> > > > > > > > > produced during Iteration 2 is halved, which incurs that
> > > > > > > > > the time spent on Iteration 3 is halved, then Iteration 4,
> > > > > > > > > 5...
> > > > > > > >
> > > > > > > > Yes; these numbers don't show how many of them are false dirty
> > > > > > > > though.
> > > > > > > >
> > > > > > > > One problem is thinking about pages that have been redirtied,
> > > > > > > > if the page is dirtied after the sync but before the network
> > > > > > > > write then it's the false-dirty that you're describing.
> > > > > > > >
> > > > > > > > However, if the page is being written a few times, and so it
> > > > > > > > would have been written after the network write then it isn't
> > > > > > > > a false-dirty.
> > > > > > > >
> > > > > > > > You might be able to figure that out with some kernel tracing
> > > > > > > > of when the dirtying happens, but it might be easier to write
> > > > > > > > the fix!
> > > > > > > >
> > > > > > > > Dave
> > > > > > >
> > > > > > > Hi, I have made some new progress now.
> > > > > > >
> > > > > > > To tell how many false dirty pages there are exactly in each
> > > > > > > iteration, I malloc a buffer in memory as big as the size of the
> > > > > > > whole VM memory. When a page is transferred to the dest node, it
> > > > > > > is copied to the buffer; During the next iteration, if one page
> > > > > > > is transferred, it is compared to the old one in the buffer, and
> > > > > > > the old one will be replaced for next comparison if it is really
> > > > > > > dirty.
> > > > > > > Thus, we are now able to get the exact number of false dirty pages.
> > > > > > >
> > > > > > > This time, I use 15 workloads to get the statistic number. They are:
> > > > > >   1. 11 benchmarks picked up from cpu2006 benchmark suit. They
> > > > 

Re: [Qemu-devel] Sphinx for QEMU docs? (and a doc-comment format question)

2016-11-07 Thread Emilio G. Cota
On Mon, Nov 07, 2016 at 15:03:23 +, Peter Maydell wrote:
> On 5 November 2016 at 18:42, Peter Maydell  wrote:
> > With a little luck I may be able to put something up
> > on Monday as a sort of minimal-demonstration of how
> > this would look in QEMU.
> 
> Generated documentation:
>   http://people.linaro.org/~peter.maydell/sphinx/index.html
> Git branch with the patches needed to produce that:
>   https://git.linaro.org/people/peter.maydell/qemu-arm.git sphinx-docs
> Pointy-clicky interface to git branch:
>   https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=sphinx-docs
> 
> I didn't bother to write the makefile changes to tie it into
> the main build process, so to regenerate the docs locally you'll
> need to run
>  sphinx-build -b html docs my-build-dir/docs
> from the QEMU source tree root, which will put the output into
> my-build-dir/docs, which you can then point your web browser at.

I moved qht's documentation to this to see how hard it was.
Was trivial to do! The result looks very nice. 

Patches here:
- Web:  https://github.com/cota/qemu/tree/sphinx-docs
- Git:  https://github.com/cota/qemu.git sphinx-docs

> The overall organisation structure needs some thought --
> I think we should at least separate into user/ for user
> docs and dev/ for internals docs (and only install the
> user/ docs).

Agreed.

> The branch above just puts the two example
> docs directly into the index.rst for demo purposes.
> 
> Conclusions from this exercise:
> 1) conversion isn't all that difficult, and the results
>    look pretty nice
> 2) some of the doc-comment format differences are irritating:
>    . "function - short description" not "function: short description"
>    . "&struct.fieldname" not ".@fieldname"
>    . "&typename" not "#typename"
> 3) the most awkward part of kernel-doc syntax is that it bakes
>    in the kernel's style choice of always using "struct foo"
>    for types -- I don't think there's any way to document
>    'MemoryRegion' and 'AddressSpace' without the 'struct'
>    coming out in the documentation output.
> 
> We could fix (2) by loosening the kernel-doc script's
> parsing if we were happy to carry around a forked version
> of it. Fixing (3) requires more serious surgery on kernel-doc
> I suspect.

FWIW I'd prefer to strictly adhere to kerneldoc as is. Converting
the existing kerneldocs will require some supervision, anyway.
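
For reference, this is roughly what a strictly kernel-doc style comment looks like on a small, made-up API. The demo_table type and demo_table_insert() are hypothetical and exist only to show the "name - short description" and "@member:" conventions discussed above; they are not taken from qht or from the branch.

```c
#include <stddef.h>

/**
 * struct demo_table - fixed-capacity table of integers
 * @len: number of entries currently stored
 * @vals: storage for the entries
 *
 * Hypothetical type used only to illustrate the comment format.
 */
struct demo_table {
    size_t len;
    int vals[8];
};

/**
 * demo_table_insert - append a value to a table
 * @t: table to modify
 * @val: value to append
 *
 * Appends @val to &struct demo_table @t if there is room.
 *
 * Return: 0 on success, -1 if @t is full.
 */
static int demo_table_insert(struct demo_table *t, int val)
{
    if (t->len == sizeof(t->vals) / sizeof(t->vals[0])) {
        return -1;
    }
    t->vals[t->len++] = val;
    return 0;
}
```

Note how the cross-reference has to spell out "struct", which is the point raised in item 3 of the quoted list.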

E.



Re: [Qemu-devel] [PULL 15/16] spapr_pci: Add a 64-bit MMIO window

2016-11-07 Thread Alexey Kardashevskiy
On 08/11/16 12:16, David Gibson wrote:
> On Fri, Nov 04, 2016 at 04:03:31PM +1100, Alexey Kardashevskiy wrote:
>> On 17/10/16 13:43, David Gibson wrote:
>>> On real hardware, and under pHyp, the PCI host bridges on Power machines
>>> typically advertise two outbound MMIO windows from the guest's physical
>>> memory space to PCI memory space:
>>>   - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
>>>   - A 64-bit window which maps onto a large region somewhere high in PCI
>>> address space (traditionally this used an identity mapping from guest
>>> physical address to PCI address, but that's not always the case)
>>>
>>> The qemu implementation in spapr-pci-host-bridge, however, only supports a
>>> single outbound MMIO window.  At least some Linux versions expect
>>> the two windows, so we arranged this window to map onto the PCI
>>> memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
>>> windows, the "32-bit" window from 2G..4G and the "64-bit" window from
>>> 4G..~64G.
>>>
>>> This approach means, however, that the 64G window is not naturally aligned.
>>> In turn this limits the size of the largest BAR we can map (which does have
>>> to be naturally aligned) to roughly half of the total window.  With some
>>> large nVidia GPGPU cards which have huge memory BARs, this is starting to
>>> be a problem.
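
The alignment argument in the quoted paragraph can be checked with a little arithmetic. The sketch below is an illustration only: largest_aligned_bar() is a hypothetical helper, and it assumes a BAR must be a power of two in size and aligned to its own size, which is what "naturally aligned" means here.

```c
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/*
 * Hypothetical helper: size of the largest power-of-two, naturally aligned
 * region that fits inside the window [base, base + size).  Assumes that
 * base + size does not overflow.  Returns 0 if nothing fits.
 */
static uint64_t largest_aligned_bar(uint64_t base, uint64_t size)
{
    uint64_t end = base + size;

    for (int order = 63; order >= 0; order--) {
        uint64_t bar = UINT64_C(1) << order;
        /* Round base up to a multiple of bar; may wrap for huge values. */
        uint64_t start = (base + bar - 1) & ~(bar - 1);

        /* start < base means the rounding wrapped around. */
        if (start >= base && start <= end && end - start >= bar) {
            return bar;
        }
    }
    return 0;
}
```

For the legacy layout, a window covering 2 GiB..64 GiB of PCI space is 62 GiB long, yet largest_aligned_bar(2 * GiB, 62 * GiB) is only 32 GiB, roughly half the window. A 64 GiB window that starts on a 64 GiB boundary can hold a full 64 GiB BAR.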
>>>
>>> This patch adds true support for separate 32-bit and 64-bit outbound MMIO
>>> windows to the spapr-pci-host-bridge implementation, each of which can
>>> be independently configured.  The 32-bit window always maps to 2G.. in PCI
>>> space, but the PCI address of the 64-bit window can be configured (it
>>> defaults to the same as the guest physical address).
>>>
>>> So as not to break possible existing configurations, as long as a 64-bit
>>> window is not specified, a large single window can be specified.  This
>>> will appear the same way to the guest as the old approach, although it's
>>> now implemented by two contiguous memory regions rather than a single one.
>>>
>>> For now, this only adds the possibility of 64-bit windows.  The default
>>> configuration still uses the legacy mode.
>>
>>
>> This breaks migration to QEMU v2.7, the destination reports:
>>
>> 22901@1478235261.799031:vmstate_load spapr_pci, spapr_pci
>> 22901@1478235261.799040:vmstate_load_field_error field "mem_win_size" load
>> failed, ret = -22
>> qemu-hostos1: error while loading state for instance 0x0 of device 
>> 'spapr_pci'
>> 22901@1478235261.801324:migrate_set_state new state 7
>> qemu-hostos1: load of migration failed: Invalid argument
>>
>>
>> mem_win_size decreased from 0xf80000000 to 0x80000000.
>>
>> I'd think it should be allowed to migrate like this.
> 
> AIUI, we don't generally care (upstream) about migration from newer to
> older qemu, only from older to newer. 

Older (v2.7.0) to newer (current upstream with -machine pseries-2.7) does
not work either with the exact same symptom.



> Trying to maintain backwards
> migration makes it almost impossible to fix anything at all, ever.
> 
>>
>>
>> The source PHB is:
>>
>> (qemu) info qtree
>> bus: main-system-bus
>>   type System
>>   dev: spapr-pci-host-bridge, id ""
>> index = 0 (0x0)
>> buid = 576460752840294400 (0x800000020000000)
>> liobn = 2147483648 (0x80000000)
>> liobn64 = 4294967295 (0xffffffff)
>> mem_win_addr = 1102195982336 (0x100a0000000)
>> mem_win_size = 2147483648 (0x80000000)
>> mem64_win_addr = 1104343465984 (0x10120000000)
>> mem64_win_size = 64424509440 (0xf00000000)
>> mem64_win_pciaddr = 4294967296 (0x100000000)
>>
>>
>> The destination PHB is:
>>
>> (qemu) info qtree
>> bus: main-system-bus
>>   type System
>>   dev: spapr-pci-host-bridge, id ""
>> index = 0 (0x0)
>> buid = 576460752840294400 (0x800000020000000)
>> liobn = 2147483648 (0x80000000)
>> liobn64 = 4294967295 (0xffffffff)
>> mem_win_addr = 1102195982336 (0x100a0000000)
>> mem_win_size = 66571993088 (0xf80000000)
>>
>>
>>
>> The source QEMU cmdline:
>>
>> /home/aik/p/qemu/ppc64-softmmu/qemu-system-ppc64 -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
>> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
>> -kernel /home/aik/t/vml450le \
>> -initrd /home/aik/t/le.cpio -m 4G \
>> -machine pseries-2.6 -enable-kvm \
>>
>>
>> The destination (./qemu-hostos1 is v2.7.0 from
>> https://github.com/open-power-host-os/qemu/commits/hostos-stable )
>>
>> ./qemu-hostos1 -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
>> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none -m 4G \
>> -machine pseries-2.6 -enable-kvm \
>> -mon chardev=SOCKET0,mode=readline -incoming "tcp:fstn1:2"
>>
>>
>>
>>>
>>> Signed-off-by: David Gibson 
>>> Reviewed-by: Laurent Vivier 

Re: [Qemu-devel] [PATCH v4 4/4] target-ppc: Implement bcdctz. instruction

2016-11-07 Thread David Gibson
On Tue, Nov 01, 2016 at 01:24:48PM -0200, Jose Ricardo Ziviani wrote:
> bcdctz. converts from BCD to Zoned numeric format. Zoned format uses
> a byte to represent a digit where the most significant nibble is 0x3
> or 0xf, depending on the preferred sign.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 49 
> +
>  target-ppc/translate/vmx-impl.inc.c |  7 ++
>  3 files changed, 57 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 8546bb9..5412da5 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -374,6 +374,7 @@ DEF_HELPER_4(bcdsub, i32, avr, avr, avr, i32)
>  DEF_HELPER_3(bcdcfn, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
> +DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index c546a9a..5983a32 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2774,6 +2774,55 @@ uint32_t helper_bcdcfz(ppc_avr_t *r, ppc_avr_t *b, 
> uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdctz(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +    int i;
> +    int j;
> +    int cr = 0;
> +    uint8_t digit = 0;
> +    int sgnb = bcd_get_sgn(b);
> +    int zone_lead = (ps) ? 0xF0 : 0x30;
> +    int invalid = (sgnb == 0);
> +    ppc_avr_t ret = { .u64 = { 0, 0 } };
> +
> +    int eq_flag = (b->u64[HI_IDX] == 0) && ((b->u64[LO_IDX] >> 4) == 0);
> +    int ox_flag = ((b->u64[HI_IDX] >> 4) != 0);

As in 2/4 you can simplify this by using bcd_cmp_zero().

> +
> +    for (i = 0, j = 1; i < 32; i += 2, j++) {

And you can reduce this to a single loop variable.
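
A standalone sketch of the single-loop-variable form, for illustration only. It does not use QEMU's ppc_avr_t or BCD helpers: the 16-byte layout (nibble n lives in byte n / 2, low nibble first, nibble 0 holding the sign) and the names get_nibble() and bcd_to_zoned() are assumptions, and the sign encoding of the result is left out.

```c
#include <stdint.h>

/* Assumed layout: nibble n is in byte n / 2, even n in the low half. */
static uint8_t get_nibble(const uint8_t bcd[16], int n)
{
    uint8_t byte = bcd[n / 2];

    return (n & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
}

/*
 * Convert the 16 low-order digits of a packed BCD value to zoned format.
 * Digit i + 1 (nibble 0 is the sign) becomes zoned byte i, so one loop
 * variable is enough.
 */
static void bcd_to_zoned(const uint8_t bcd[16], int ps, uint8_t zoned[16])
{
    const uint8_t zone_lead = ps ? 0xF0 : 0x30;

    for (int i = 0; i < 16; i++) {
        zoned[i] = (uint8_t)(zone_lead | get_nibble(bcd, i + 1));
    }
}
```

With ps = 0 the digits 2, 1, 7 come out as 0x32, 0x31, 0x37, which are also the ASCII codes for '2', '1' and '7'.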

> +        digit = bcd_get_digit(b, j, &invalid);
> +
> +        if (unlikely(invalid)) {
> +            break;
> +        }
> +
> +        ret.u8[BCD_DIG_BYTE(i)] = zone_lead + digit;
> +    }
> +
> +    if (ps) {
> +        bcd_put_digit(&ret, (sgnb == 1) ? 0xC : 0xD, 1);
> +    } else {
> +        bcd_put_digit(&ret, (sgnb == 1) ? 0x3 : 0x7, 1);
> +    }
> +
> +    if (!eq_flag) {
> +        cr = (sgnb == 1) ? 1 << CRF_GT : 1 << CRF_LT;
> +    } else {
> +        cr = 1 << CRF_EQ;
> +    }
> +
> +    if (ox_flag) {
> +        cr |= 1 << CRF_SO;
> +    }
> +
> +    if (unlikely(invalid)) {
> +        cr = 1 << CRF_SO;
> +    }
> +
> +    *r = ret;
> +
> +    return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index 7e902a9..b05c874 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -973,10 +973,14 @@ GEN_BCD(bcdsub)
>  GEN_BCD2(bcdcfn)
>  GEN_BCD2(bcdctn)
>  GEN_BCD2(bcdcfz)
> +GEN_BCD2(bcdctz)
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 4:
> +gen_bcdctz(ctx);
> +break;
>  case 5:
>  gen_bcdctn(ctx);
>  break;
> @@ -995,6 +999,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
>  static void gen_xpnd04_2(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 4:
> +gen_bcdctz(ctx);
> +break;
>  case 6:
>  gen_bcdcfz(ctx);
>  break;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v4 3/4] target-ppc: Implement bcdcfz. instruction

2016-11-07 Thread David Gibson
On Tue, Nov 01, 2016 at 01:24:47PM -0200, Jose Ricardo Ziviani wrote:
> bcdcfz. converts from Zoned numeric format to BCD. Zoned format uses
> a byte to represent a digit where the most significant nibble is 0x3
> or 0xf, depending on the preferred sign.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 45 
> +
>  target-ppc/translate/vmx-impl.inc.c |  7 ++
>  3 files changed, 53 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 33286c6..8546bb9 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -373,6 +373,7 @@ DEF_HELPER_4(bcdadd, i32, avr, avr, avr, i32)
>  DEF_HELPER_4(bcdsub, i32, avr, avr, avr, i32)
>  DEF_HELPER_3(bcdcfn, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
> +DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 3c21173..c546a9a 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2729,6 +2729,51 @@ uint32_t helper_bcdctn(ppc_avr_t *r, ppc_avr_t *b, 
> uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdcfz(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +    int i;
> +    int j;
> +    int cr = 0;
> +    int invalid = 0;
> +    int zone_digit = 0;
> +    int zone_lead = ps ? 0xF : 0x3;
> +    int digit = 0;
> +    ppc_avr_t ret = { .u64 = { 0, 0 } };
> +    int sgnb = b->u8[BCD_DIG_BYTE(0)] >> 4;
> +
> +    if (unlikely((sgnb < 0xA) && ps)) {
> +        invalid = 1;
> +    }
> +
> +    for (i = 0, j = 1; i < 31; i += 2, j++) {

Having these two loop counters in parallel is kind of clunky.

I think it would be clearer to just have i go from 0..15, use i*2 for
the input index and (i+1) for the output index.
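
Something along these lines, as a standalone sketch rather than a drop-in replacement. The packed layout (nibble n in byte n / 2, low nibble first, nibble 0 reserved for the sign) and the helpers nibble_byte(), put_nibble() and zoned_to_bcd() are assumptions made for the example; sign handling is omitted.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: nibble n is in byte n / 2, even n in the low half. */
static int nibble_byte(int n)
{
    return n / 2;
}

static void put_nibble(uint8_t bcd[16], int n, uint8_t v)
{
    uint8_t *byte = &bcd[nibble_byte(n)];

    if (n & 1) {
        *byte = (uint8_t)((*byte & 0x0F) | ((v & 0x0F) << 4));
    } else {
        *byte = (uint8_t)((*byte & 0xF0) | (v & 0x0F));
    }
}

/*
 * Convert 16 zoned bytes to packed BCD digits 1..16 (nibble 0, the sign,
 * is left as zero).  Returns 0 on success, -1 on an invalid zone or digit.
 */
static int zoned_to_bcd(const uint8_t zoned[16], int ps, uint8_t bcd[16])
{
    const uint8_t zone_lead = ps ? 0xF : 0x3;

    memset(bcd, 0, 16);
    for (int i = 0; i < 16; i++) {
        /* Input nibble index i * 2, output digit index i + 1. */
        uint8_t in = zoned[nibble_byte(i * 2)];
        uint8_t zone = in >> 4;
        uint8_t digit = in & 0x0F;

        /* Byte 0 carries the sign in its zone, so only its digit is checked. */
        if ((i != 0 && zone != zone_lead) || digit > 9) {
            return -1;
        }
        put_nibble(bcd, i + 1, digit);
    }
    return 0;
}
```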

> +        zone_digit = (i) ? b->u8[BCD_DIG_BYTE(i)] >> 4 : zone_lead;
> +        digit = b->u8[BCD_DIG_BYTE(i)] & 0xF;
> +        if (unlikely(zone_digit != zone_lead || digit > 0x9)) {
> +            invalid = 1;
> +            break;
> +        }
> +
> +        bcd_put_digit(&ret, digit, j);
> +    }
> +
> +    if ((ps && (sgnb == 0xB || sgnb == 0xD)) ||
> +            (!ps && (sgnb & 0x4))) {
> +        bcd_put_digit(&ret, BCD_NEG_PREF, 0);
> +    } else {
> +        bcd_put_digit(&ret, BCD_PLUS_PREF_1, 0);
> +    }
> +
> +    cr = bcd_cmp_zero(&ret);
> +
> +    if (unlikely(invalid)) {
> +        cr = 1 << CRF_SO;
> +    }
> +
> +    *r = ret;
> +
> +    return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index d5953a6..7e902a9 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -972,6 +972,7 @@ GEN_BCD(bcdadd)
>  GEN_BCD(bcdsub)
>  GEN_BCD2(bcdcfn)
>  GEN_BCD2(bcdctn)
> +GEN_BCD2(bcdcfz)
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
>  {
> @@ -979,6 +980,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
>  case 5:
>  gen_bcdctn(ctx);
>  break;
> +case 6:
> +gen_bcdcfz(ctx);
> +break;
>  case 7:
>  gen_bcdcfn(ctx);
>  break;
> @@ -991,6 +995,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
>  static void gen_xpnd04_2(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 6:
> +gen_bcdcfz(ctx);
> +break;
>  case 7:
>  gen_bcdcfn(ctx);
>  break;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create

2016-11-07 Thread Jeff Cody
On Wed, Nov 02, 2016 at 01:50:55PM -0400, John Snow wrote:
> Refactor backup_start as backup_job_create, which only creates the job,
> but does not automatically start it. The old interface, 'backup_start',
> is not kept in favor of limiting the number of nearly-identical interfaces
> that would have to be edited to keep up with QAPI changes in the future.
> 
> Callers that wish to synchronously start the backup_block_job can
> instead just call block_job_start immediately after calling
> backup_job_create.
> 
> Transactions are updated to use the new interface, calling block_job_start
> only during the .commit phase, which helps prevent race conditions where
> jobs may finish before we even finish building the transaction. This may
> happen, for instance, during empty block backup jobs.
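
The create-then-start split described in the commit message can be modelled in a few lines. This is a toy under stated assumptions, not the block layer API: demo_job, demo_txn and their functions are hypothetical, and "starting" a job is reduced to setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define TXN_MAX_JOBS 8

struct demo_job {
    bool started;
};

struct demo_txn {
    struct demo_job *jobs[TXN_MAX_JOBS];
    size_t njobs;
};

/* Creation only sets the job up; nothing runs yet. */
static void demo_job_create(struct demo_job *job)
{
    job->started = false;
}

static void demo_job_start(struct demo_job *job)
{
    job->started = true;
}

/* Prepare phase: collect the job, still without starting it. */
static int demo_txn_add(struct demo_txn *txn, struct demo_job *job)
{
    if (txn->njobs == TXN_MAX_JOBS) {
        return -1;
    }
    txn->jobs[txn->njobs++] = job;
    return 0;
}

/* Commit phase: every job in the transaction exists before any one starts. */
static void demo_txn_commit(struct demo_txn *txn)
{
    for (size_t i = 0; i < txn->njobs; i++) {
        demo_job_start(txn->jobs[i]);
    }
}
```

Because nothing runs between demo_txn_add() calls, a job that would complete instantly (the empty backup case) cannot finish while the transaction is still being assembled.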
> 
> Reported-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: John Snow 
> ---
>  block/backup.c| 26 ---
>  block/replication.c   | 12 ---
>  blockdev.c| 83 
> ++-
>  include/block/block_int.h | 23 ++---
>  4 files changed, 87 insertions(+), 57 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index ae1b99a..ea38733 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -543,7 +543,7 @@ static const BlockJobDriver backup_job_driver = {
>  .drain  = backup_drain,
>  };
>  
> -void backup_start(const char *job_id, BlockDriverState *bs,
> +BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>BlockDriverState *target, int64_t speed,
>MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
>bool compress,
> @@ -563,52 +563,52 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  
>  if (bs == target) {
>  error_setg(errp, "Source and target cannot be the same");
> -return;
> +return NULL;
>  }
>  
>  if (!bdrv_is_inserted(bs)) {
>  error_setg(errp, "Device is not inserted: %s",
> bdrv_get_device_name(bs));
> -return;
> +return NULL;
>  }
>  
>  if (!bdrv_is_inserted(target)) {
>  error_setg(errp, "Device is not inserted: %s",
> bdrv_get_device_name(target));
> -return;
> +return NULL;
>  }
>  
>  if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
>  error_setg(errp, "Compression is not supported for this drive %s",
> bdrv_get_device_name(target));
> -return;
> +return NULL;
>  }
>  
>  if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
> -return;
> +return NULL;
>  }
>  
>  if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
> -return;
> +return NULL;
>  }
>  
>  if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>  if (!sync_bitmap) {
>  error_setg(errp, "must provide a valid bitmap name for "
>   "\"incremental\" sync mode");
> -return;
> +return NULL;
>  }
>  
>  /* Create a new bitmap, and freeze/disable this one. */
>  if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
> -return;
> +return NULL;
>  }
>  } else if (sync_bitmap) {
>  error_setg(errp,
> "a sync_bitmap was provided to backup_run, "
> "but received an incompatible sync_mode (%s)",
> MirrorSyncMode_lookup[sync_mode]);
> -return;
> +return NULL;
>  }
>  
>  len = bdrv_getlength(bs);
> @@ -655,8 +655,8 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  block_job_add_bdrv(&job->common, target);
>  job->common.len = len;
>  block_job_txn_add_job(txn, &job->common);
> -block_job_start(&job->common);
> -return;
> +
> +return &job->common;
>  
>   error:
>  if (sync_bitmap) {
> @@ -666,4 +666,6 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  backup_clean(>common);
>  block_job_unref(>common);
>  }
> +
> +return NULL;
>  }
> diff --git a/block/replication.c b/block/replication.c
> index d5e2b0f..729dd12 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -421,6 +421,7 @@ static void replication_start(ReplicationState *rs, 
> ReplicationMode mode,
>  int64_t active_length, hidden_length, disk_length;
>  AioContext *aio_context;
>  Error *local_err = NULL;
> +BlockJob *job;
>  
>  aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(aio_context);
> @@ -508,17 +509,18 @@ static void replication_start(ReplicationState *rs, 
> ReplicationMode mode,
>  bdrv_op_block_all(top_bs, s->blocker);
>  bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, 

Re: [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field

2016-11-07 Thread Jeff Cody
On Wed, Nov 02, 2016 at 01:50:53PM -0400, John Snow wrote:
> Add an explicit start field to specify the entrypoint. We already have
> ownership of the coroutine itself AND managing the lifetime of the
> coroutine, let's take control of creation of the coroutine, too.
> 
> This will allow us to delay creation of the actual coroutine until we
> know we'll actually start a BlockJob in block_job_start. This avoids
> the sticky question of how to "un-create" a Coroutine that hasn't been
> started yet.
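
A minimal model of the idea, assuming nothing about the real BlockJobDriver beyond what the commit message says: the driver names its entry point, and the generic start function is the only place that invokes it. The lazy_job and lazy_driver names are hypothetical, and a plain function call stands in for creating and entering a coroutine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver table: the job type declares its own entry point. */
struct lazy_driver {
    void (*start)(void *opaque);
};

struct lazy_job {
    const struct lazy_driver *driver;
    void *opaque;
    bool started;
};

/* Creating a job records what to run, but runs nothing. */
static void lazy_job_create(struct lazy_job *job,
                            const struct lazy_driver *driver, void *opaque)
{
    job->driver = driver;
    job->opaque = opaque;
    job->started = false;
}

/* Generic code decides when the entry point is invoked, and does so once. */
static void lazy_job_start(struct lazy_job *job)
{
    if (job->started) {
        return;
    }
    job->started = true;
    job->driver->start(job->opaque);
}

/* Example entry point: counts how often it was entered. */
static void count_run(void *opaque)
{
    (*(int *)opaque)++;
}

static const struct lazy_driver count_driver = {
    .start = count_run,
};
```

A job that is created and then discarded never enters its entry point at all, so there is nothing to "un-create".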
> 
> Signed-off-by: John Snow 
> ---
>  block/backup.c   | 25 +
>  block/commit.c   |  3 ++-
>  block/mirror.c   |  4 +++-
>  block/stream.c   |  3 ++-
>  include/block/blockjob_int.h |  3 +++
>  5 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 734a24c..4ed4494 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -323,17 +323,6 @@ static void backup_drain(BlockJob *job)
>  }
>  }
>  
> -static const BlockJobDriver backup_job_driver = {
> -.instance_size  = sizeof(BackupBlockJob),
> -.job_type   = BLOCK_JOB_TYPE_BACKUP,
> -.set_speed  = backup_set_speed,
> -.commit = backup_commit,
> -.abort  = backup_abort,
> -.clean  = backup_clean,
> -.attached_aio_context   = backup_attached_aio_context,
> -.drain  = backup_drain,
> -};
> -
>  static BlockErrorAction backup_error_action(BackupBlockJob *job,
>  bool read, int error)
>  {
> @@ -542,6 +531,18 @@ static void coroutine_fn backup_run(void *opaque)
>  block_job_defer_to_main_loop(>common, backup_complete, data);
>  }
>  
> +static const BlockJobDriver backup_job_driver = {
> +.instance_size  = sizeof(BackupBlockJob),
> +.job_type   = BLOCK_JOB_TYPE_BACKUP,
> +.start  = backup_run,
> +.set_speed  = backup_set_speed,
> +.commit = backup_commit,
> +.abort  = backup_abort,
> +.clean  = backup_clean,
> +.attached_aio_context   = backup_attached_aio_context,
> +.drain  = backup_drain,
> +};
> +

Some code movement here in addition to the .start addition, but to a better
place (I am guessing that is intentional).
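
For readers following along, here is a minimal standalone sketch of the idea behind the new field: the driver table names its own entry point, so generic code can launch any job without knowing its concrete type. The Job and JobDriver types and job_start() are hypothetical stand-ins, not QEMU's BlockJob API, and a plain function call stands in for coroutine creation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BlockJob / BlockJobDriver. */
typedef struct Job Job;

typedef struct JobDriver {
    const char *name;
    void (*start)(void *opaque);   /* entry point, like the new .start field */
} JobDriver;

struct Job {
    const JobDriver *driver;
    int ran;                       /* set by the entry point, for illustration */
};

/* Job-specific main loop; each job type supplies its own. */
static void backup_like_run(void *opaque)
{
    Job *job = opaque;
    job->ran = 1;
}

static const JobDriver backup_like_driver = {
    .name  = "backup-like",
    .start = backup_like_run,
};

/* Generic launcher: looks the entry point up in the driver table. */
static void job_start(Job *job)
{
    assert(job && job->driver && job->driver->start);
    job->driver->start(job);
}
```

With the entry point stored in the table, the per-job start functions no longer need to hard-code backup_run, commit_run and so on, which is what the hunks below change.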

>  void backup_start(const char *job_id, BlockDriverState *bs,
>BlockDriverState *target, int64_t speed,
>MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
> @@ -653,7 +654,7 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  
>  block_job_add_bdrv(&job->common, target);
>  job->common.len = len;
> -job->common.co = qemu_coroutine_create(backup_run, job);
> +job->common.co = qemu_coroutine_create(job->common.driver->start, job);
>  block_job_txn_add_job(txn, &job->common);
>  qemu_coroutine_enter(job->common.co);
>  return;
> diff --git a/block/commit.c b/block/commit.c
> index e1eda89..20d27e2 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -205,6 +205,7 @@ static const BlockJobDriver commit_job_driver = {
>  .instance_size = sizeof(CommitBlockJob),
>  .job_type  = BLOCK_JOB_TYPE_COMMIT,
>  .set_speed = commit_set_speed,
> +.start = commit_run,
>  };
>  
>  void commit_start(const char *job_id, BlockDriverState *bs,
> @@ -288,7 +289,7 @@ void commit_start(const char *job_id, BlockDriverState 
> *bs,
>  s->backing_file_str = g_strdup(backing_file_str);
>  
>  s->on_error = on_error;
> -s->common.co = qemu_coroutine_create(commit_run, s);
> +s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>  
>  trace_commit_start(bs, base, top, s, s->common.co);
>  qemu_coroutine_enter(s->common.co);
> diff --git a/block/mirror.c b/block/mirror.c
> index b2c1fb8..659e09c 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -920,6 +920,7 @@ static const BlockJobDriver mirror_job_driver = {
>  .instance_size  = sizeof(MirrorBlockJob),
>  .job_type   = BLOCK_JOB_TYPE_MIRROR,
>  .set_speed  = mirror_set_speed,
> +.start  = mirror_run,
>  .complete   = mirror_complete,
>  .pause  = mirror_pause,
>  .attached_aio_context   = mirror_attached_aio_context,
> @@ -930,6 +931,7 @@ static const BlockJobDriver commit_active_job_driver = {
>  .instance_size  = sizeof(MirrorBlockJob),
>  .job_type   = BLOCK_JOB_TYPE_COMMIT,
>  .set_speed  = mirror_set_speed,
> +.start  = mirror_run,
>  .complete   = mirror_complete,
>  .pause  = mirror_pause,
>  .attached_aio_context   = mirror_attached_aio_context,
> @@ -1007,7 +1009,7 @@ static void 

Re: [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property

2016-11-07 Thread Jeff Cody
On Wed, Nov 02, 2016 at 01:50:52PM -0400, John Snow wrote:
> Cleaning up after we have deferred to the main thread but before the
> transaction has converged can be dangerous and result in deadlocks
> if the job cleanup invokes any BH polling loops.
> 
> A job may attempt to begin cleaning up, but may induce another job to
> enter its cleanup routine. The second job, part of our same transaction,
> will block waiting for the first job to finish, so neither job may now
> make progress.
> 
> To rectify this, allow jobs to register a cleanup operation that will
> always run regardless of if the job was in a transaction or not, and
> if the transaction job group completed successfully or not.
> 
> Move sensitive cleanup to this callback instead which is guaranteed to
> be run only after the transaction has converged, which removes sensitive
> timing constraints from said cleanup.
> 
> Furthermore, in future patches these cleanup operations will be performed
> regardless of whether or not we actually started the job. Therefore,
> cleanup callbacks should essentially confine themselves to undoing create
> operations, e.g. setup actions taken in what is now backup_start.
> 
> Reported-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: John Snow 
> ---
>  block/backup.c   | 15 ++-
>  blockjob.c   |  3 +++
>  include/block/blockjob_int.h |  8 
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 7b5d8a3..734a24c 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -242,6 +242,14 @@ static void backup_abort(BlockJob *job)
>  }
>  }
>  
> +static void backup_clean(BlockJob *job)
> +{
> +BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> +assert(s->target);
> +blk_unref(s->target);
> +s->target = NULL;
> +}
> +
>  static void backup_attached_aio_context(BlockJob *job, AioContext 
> *aio_context)
>  {
>  BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> @@ -321,6 +329,7 @@ static const BlockJobDriver backup_job_driver = {
>  .set_speed  = backup_set_speed,
>  .commit = backup_commit,
>  .abort  = backup_abort,
> +.clean  = backup_clean,
>  .attached_aio_context   = backup_attached_aio_context,
>  .drain  = backup_drain,
>  };
> @@ -343,12 +352,8 @@ typedef struct {
>  
>  static void backup_complete(BlockJob *job, void *opaque)
>  {
> -BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>  BackupCompleteData *data = opaque;
>  
> -blk_unref(s->target);
> -s->target = NULL;
> -
>  block_job_completed(job, data->ret);
>  g_free(data);
>  }
> @@ -658,7 +663,7 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
>  }
>  if (job) {
> -blk_unref(job->target);
> +backup_clean(&job->common);
>  block_job_unref(&job->common);
>  }
>  }
> diff --git a/blockjob.c b/blockjob.c
> index 4d0ef53..e3c458c 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -241,6 +241,9 @@ static void block_job_completed_single(BlockJob *job)
>  job->driver->abort(job);
>  }
>  }
> +if (job->driver->clean) {
> +job->driver->clean(job);
> +}
>  
>  if (job->cb) {
>  job->cb(job->opaque, job->ret);
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 40275e4..60d91a0 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -74,6 +74,14 @@ struct BlockJobDriver {
>  void (*abort)(BlockJob *job);
>  
>  /**
> + * If the callback is not NULL, it will be invoked after a call to either
> + * .commit() or .abort(). Regardless of which callback is invoked after
> + * completion, .clean() will always be called, even if the job does not
> + * belong to a transaction group.
> + */
> +void (*clean)(BlockJob *job);
> +
> +/**
>   * If the callback is not NULL, it will be invoked when the job 
> transitions
>   * into the paused state.  Paused jobs must not perform any asynchronous
>   * I/O or event loop activity.  This callback is used to quiesce jobs.
> -- 
> 2.7.4
> 

Reviewed-by: Jeff Cody 
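
As an illustration of the ordering the patch describes, here is a small standalone sketch: exactly one of commit or abort runs depending on the job result, and the optional clean callback runs afterwards in both cases. The Job and JobDriver types, the demo callbacks and the resource_held flag are hypothetical; this is not the QEMU implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Job Job;

/* Hypothetical driver table with optional callbacks. */
typedef struct JobDriver {
    void (*commit)(Job *job);
    void (*abort)(Job *job);
    void (*clean)(Job *job);
} JobDriver;

struct Job {
    const JobDriver *driver;
    int ret;                      /* 0 on success, negative on failure */
    int resource_held;            /* stands in for something like s->target */
    int commits, aborts, cleans;  /* call counters, for illustration */
};

static void demo_commit(Job *job) { job->commits++; }
static void demo_abort(Job *job)  { job->aborts++; }

/* Undo what creation set up; safe to run on either outcome. */
static void demo_clean(Job *job)
{
    assert(job->resource_held);
    job->resource_held = 0;
    job->cleans++;
}

static const JobDriver demo_driver = {
    .commit = demo_commit,
    .abort  = demo_abort,
    .clean  = demo_clean,
};

/* Completion path: commit or abort first, then clean unconditionally. */
static void job_completed_single(Job *job)
{
    if (job->ret == 0) {
        if (job->driver->commit) {
            job->driver->commit(job);
        }
    } else {
        if (job->driver->abort) {
            job->driver->abort(job);
        }
    }
    if (job->driver->clean) {
        job->driver->clean(job);
    }
}
```

Keeping resource release in one callback means the success path, the failure path and the error path in the creation function can all share it.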



Re: [Qemu-devel] [PATCH v4 1/4] target-ppc: Implement bcdcfn. instruction

2016-11-07 Thread David Gibson
On Tue, Nov 01, 2016 at 01:24:45PM -0200, Jose Ricardo Ziviani wrote:
> bcdcfn. converts from National numeric format to BCD. National format
> uses a byte to represent a digit where the most significant nibble is
> always 0x3 and the least significant nibble is the digit itself.
> 
> Signed-off-by: Jose Ricardo Ziviani 

Reviewed-by: David Gibson 
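
For illustration only, a standalone sketch of the conversion described above, working on eight 16-bit characters (as the quoted helper indexes u16 elements) and a packed 32-bit result instead of a 128-bit vector register. national_to_bcd() is a hypothetical name, and the 0xC plus-sign code is an assumption taken from conventional packed decimal; 0xD matches BCD_NEG_PREF in the quoted context.

```c
#include <assert.h>
#include <stdint.h>

#define NATIONAL_PLUS 0x2B   /* '+' */
#define NATIONAL_NEG  0x2D   /* '-' */
#define BCD_PLUS      0xC    /* assumed preferred plus sign code */
#define BCD_NEG       0xD    /* matches BCD_NEG_PREF in the quoted patch */

/*
 * chars[0] is the sign character; chars[1..7] are digit characters with
 * chars[1] the least significant digit. On success, stores the packed BCD
 * value (sign in the low nibble) in *out and returns 0. Returns -1 when
 * the sign or any digit character is invalid.
 */
static int national_to_bcd(const uint16_t chars[8], uint32_t *out)
{
    uint32_t bcd;
    int i;

    if (chars[0] == NATIONAL_PLUS) {
        bcd = BCD_PLUS;
    } else if (chars[0] == NATIONAL_NEG) {
        bcd = BCD_NEG;
    } else {
        return -1;
    }

    for (i = 1; i < 8; i++) {
        /* Valid digit characters are 0x30..0x39. */
        if (chars[i] < 0x30 || chars[i] > 0x39) {
            return -1;
        }
        bcd |= (uint32_t)(chars[i] & 0xF) << (4 * i);
    }

    *out = bcd;
    return 0;
}
```

Unlike the quoted helper, the sketch rejects invalid input up front instead of reporting it through a condition register field.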

> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 56 
> +
>  target-ppc/translate/vmx-impl.inc.c | 55 
>  target-ppc/translate/vmx-ops.inc.c  |  4 +--
>  4 files changed, 114 insertions(+), 2 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 3916b2e..3b23eed 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -371,6 +371,7 @@ DEF_HELPER_4(vpermxor, void, avr, avr, avr, avr)
>  
>  DEF_HELPER_4(bcdadd, i32, avr, avr, avr, i32)
>  DEF_HELPER_4(bcdsub, i32, avr, avr, avr, i32)
> +DEF_HELPER_3(bcdcfn, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index dca4798..605cfc7 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2429,6 +2429,8 @@ void helper_vsubecuq(ppc_avr_t *r, ppc_avr_t *a, 
> ppc_avr_t *b, ppc_avr_t *c)
>  #define BCD_NEG_PREF0xD
>  #define BCD_NEG_ALT 0xB
>  #define BCD_PLUS_ALT_2  0xE
> +#define NATIONAL_PLUS   0x2B
> +#define NATIONAL_NEG0x2D
>  
>  #if defined(HOST_WORDS_BIGENDIAN)
>  #define BCD_DIG_BYTE(n) (15 - (n/2))
> @@ -2495,6 +2497,24 @@ static void bcd_put_digit(ppc_avr_t *bcd, uint8_t 
> digit, int n)
>  }
>  }
>  
> +static int bcd_cmp_zero(ppc_avr_t *bcd)
> +{
> +if (bcd->u64[HI_IDX] == 0 && (bcd->u64[LO_IDX] >> 4) == 0) {
> +return 1 << CRF_EQ;
> +} else {
> +return (bcd_get_sgn(bcd) == 1) ? 1 << CRF_GT : 1 << CRF_LT;
> +}
> +}
> +
> +static uint16_t get_national_digit(ppc_avr_t *reg, int n)
> +{
> +#if defined(HOST_WORDS_BIGENDIAN)
> +return reg->u16[8 - n];
> +#else
> +return reg->u16[n];
> +#endif
> +}
> +
>  static int bcd_cmp_mag(ppc_avr_t *a, ppc_avr_t *b)
>  {
>  int i;
> @@ -2625,6 +2645,42 @@ uint32_t helper_bcdsub(ppc_avr_t *r,  ppc_avr_t *a, 
> ppc_avr_t *b, uint32_t ps)
>  return helper_bcdadd(r, a, , ps);
>  }
>  
> +uint32_t helper_bcdcfn(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +int i;
> +int cr = 0;
> +uint16_t national = 0;
> +uint16_t sgnb = get_national_digit(b, 0);
> +ppc_avr_t ret = { .u64 = { 0, 0 } };
> +int invalid = (sgnb != NATIONAL_PLUS && sgnb != NATIONAL_NEG);
> +
> +for (i = 1; i < 8; i++) {
> +national = get_national_digit(b, i);
> +if (unlikely(national < 0x30 || national > 0x39)) {
> +invalid = 1;
> +break;
> +}
> +
> +bcd_put_digit(&ret, national & 0xf, i);
> +}
> +
> +if (sgnb == NATIONAL_PLUS) {
> +bcd_put_digit(&ret, (ps == 0) ? BCD_PLUS_PREF_1 : BCD_PLUS_PREF_2, 
> 0);
> +} else {
> +bcd_put_digit(&ret, BCD_NEG_PREF, 0);
> +}
> +
> +cr = bcd_cmp_zero(&ret);
> +
> +if (unlikely(invalid)) {
> +cr = 1 << CRF_SO;
> +}
> +
> +*r = ret;
> +
> +return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index fc612d9..50abfaf 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -945,8 +945,61 @@ static void gen_##op(DisasContext *ctx) \
>  tcg_temp_free_i32(ps);  \
>  }
>  
> +#define GEN_BCD2(op)\
> +static void gen_##op(DisasContext *ctx) \
> +{   \
> +TCGv_ptr rd, rb;\
> +TCGv_i32 ps;\
> +\
> +if (unlikely(!ctx->altivec_enabled)) {  \
> +gen_exception(ctx, POWERPC_EXCP_VPU);   \
> +return; \
> +}   \
> +\
> +rb = gen_avr_ptr(rB(ctx->opcode));  \
> +rd = gen_avr_ptr(rD(ctx->opcode));  \
> +\
> +ps = tcg_const_i32((ctx->opcode & 0x200) != 0); \
> +\
> +gen_helper_##op(cpu_crf[6], rd, rb, ps);\
> +\
> +tcg_temp_free_ptr(rb);  \
> +tcg_temp_free_ptr(rd);  \
> +tcg_temp_free_i32(ps);  \
> 

Re: [Qemu-devel] [PATCH v4 2/4] target-ppc: Implement bcdctn. instruction

2016-11-07 Thread David Gibson
On Tue, Nov 01, 2016 at 01:24:46PM -0200, Jose Ricardo Ziviani wrote:
> bcdctn. converts from BCD to National numeric format. National format
> uses a byte to represent a digit where the most significant nibble is
> always 0x3 and the least significant nibble is the digit itself.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 48 
> +
>  target-ppc/translate/vmx-impl.inc.c |  4 
>  3 files changed, 53 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 3b23eed..33286c6 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -372,6 +372,7 @@ DEF_HELPER_4(vpermxor, void, avr, avr, avr, avr)
>  DEF_HELPER_4(bcdadd, i32, avr, avr, avr, i32)
>  DEF_HELPER_4(bcdsub, i32, avr, avr, avr, i32)
>  DEF_HELPER_3(bcdcfn, i32, avr, avr, i32)
> +DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 605cfc7..3c21173 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2515,6 +2515,15 @@ static uint16_t get_national_digit(ppc_avr_t *reg, int 
> n)
>  #endif
>  }
>  
> +static void set_national_digit(ppc_avr_t *reg, uint8_t val, int n)
> +{
> +#if defined(HOST_WORDS_BIGENDIAN)
> +reg->u16[8 - n] = val;
> +#else
> +reg->u16[n] = val;
> +#endif
> +}
> +
>  static int bcd_cmp_mag(ppc_avr_t *a, ppc_avr_t *b)
>  {
>  int i;
> @@ -2681,6 +2690,45 @@ uint32_t helper_bcdcfn(ppc_avr_t *r, ppc_avr_t *b, 
> uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdctn(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +int i;
> +int cr = 0;
> +int sgnb = bcd_get_sgn(b);
> +int invalid = (sgnb == 0);
> +ppc_avr_t ret = { .u64 = { 0, 0 } };
> +
> +int eq_flag = (b->u64[HI_IDX] == 0) && ((b->u64[LO_IDX] >> 4) == 0);

You can simplify this, and several pieces below by using the
bcd_cmp_zero() function you introduced in the previous patch.

> +int ox_flag = (b->u64[HI_IDX] != 0) || ((b->u64[LO_IDX] >> 32) != 0);
> +
> +for (i = 1; i < 8; i++) {
> +set_national_digit(&ret, 0x30 + bcd_get_digit(b, i, &invalid), i);
> +
> +if (unlikely(invalid)) {
> +break;
> +}
> +}
> +set_national_digit(&ret, (sgnb == -1) ? NATIONAL_NEG : NATIONAL_PLUS, 0);
> +
> +if (!eq_flag) {
> +cr = (sgnb == -1) ? 1 << CRF_LT : 1 << CRF_GT;
> +} else {
> +cr = 1 << CRF_EQ;
> +}
> +
> +if (ox_flag) {
> +cr |= 1 << CRF_SO;
> +}
> +
> +if (unlikely(invalid)) {
> +cr = 1 << CRF_SO;
> +}
> +
> +*r = ret;
> +
> +return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index 50abfaf..d5953a6 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -971,10 +971,14 @@ static void gen_##op(DisasContext *ctx) \
>  GEN_BCD(bcdadd)
>  GEN_BCD(bcdsub)
>  GEN_BCD2(bcdcfn)
> +GEN_BCD2(bcdctn)
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 5:
> +gen_bcdctn(ctx);
> +break;
>  case 7:
>  gen_bcdcfn(ctx);
>  break;
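
To make the simplification suggested above concrete, here is a rough standalone sketch of the BCD to National direction in which a single compare-with-zero helper produces the LT/GT/EQ result, so no separate eq_flag is needed. It works on a packed 32-bit value (seven digits plus a sign nibble) rather than a vector register, so the overflow case does not arise. The CR_* values and the bcd_sign() and bcd_to_national() names are hypothetical, and bcd_cmp_zero() here is a reimplementation for the sketch, not the helper from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical condition-field bits for the sketch. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Returns 1 for a plus sign code, -1 for a minus sign code, 0 if invalid. */
static int bcd_sign(uint32_t bcd)
{
    switch (bcd & 0xF) {
    case 0xA: case 0xC: case 0xE: case 0xF:
        return 1;
    case 0xB: case 0xD:
        return -1;
    default:
        return 0;
    }
}

/* Three-way comparison of a packed BCD value against zero. */
static int bcd_cmp_zero(uint32_t bcd)
{
    if ((bcd >> 4) == 0) {
        return CR_EQ;
    }
    return bcd_sign(bcd) == 1 ? CR_GT : CR_LT;
}

/* Writes sign and digit characters to out[0..7]; returns the CR value. */
static int bcd_to_national(uint32_t bcd, uint16_t out[8])
{
    int sgn = bcd_sign(bcd);
    int i;

    if (sgn == 0) {
        return CR_SO;
    }
    for (i = 1; i < 8; i++) {
        uint32_t digit = (bcd >> (4 * i)) & 0xF;
        if (digit > 9) {
            return CR_SO;
        }
        out[i] = (uint16_t)(0x30 + digit);
    }
    out[0] = sgn < 0 ? 0x2D : 0x2B;

    return bcd_cmp_zero(bcd);
}
```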

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list

2016-11-07 Thread Jeff Cody
On Wed, Nov 02, 2016 at 01:50:51PM -0400, John Snow wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> Though it is not intended to be reached through normal circumstances,
> if we do not gracefully deconstruct the transaction QLIST, we may wind
> up with stale pointers in the list.
> 
> The rest of this series attempts to address the underlying issues,
> but this should fix list inconsistencies.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Tested-by: John Snow 
> Reviewed-by: John Snow 
> [Rewrote commit message. --js]
> Signed-off-by: John Snow 
> Reviewed-by: Eric Blake 
> Reviewed-by: Kevin Wolf 
> 
> Signed-off-by: John Snow 
> ---
>  blockjob.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 4aa14a4..4d0ef53 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -256,6 +256,7 @@ static void block_job_completed_single(BlockJob *job)
>  }
>  
>  if (job->txn) {
> +QLIST_REMOVE(job, txn_list);
>  block_job_txn_unref(job->txn);
>  }
>  block_job_unref(job);
> -- 
> 2.7.4
>

Reviewed-by: Jeff Cody 
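
A small standalone sketch of why the one-line fix matters: if a job is freed while still linked into its transaction list, the neighbours keep pointers into freed memory. The hand-rolled intrusive list below only mimics the shape of a QLIST; Job, Txn, txn_add(), txn_remove() and job_complete() are hypothetical names, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Job {
    int id;
    struct Job *next;    /* next entry in the transaction list */
    struct Job **pprev;  /* address of the pointer that points at us */
} Job;

typedef struct Txn {
    Job *head;
} Txn;

/* Insert at the head of the transaction list. */
static void txn_add(Txn *txn, Job *job)
{
    job->next = txn->head;
    if (txn->head) {
        txn->head->pprev = &job->next;
    }
    txn->head = job;
    job->pprev = &txn->head;
}

/* Unlink the job so that no list entry refers to it any more. */
static void txn_remove(Job *job)
{
    if (job->next) {
        job->next->pprev = job->pprev;
    }
    *job->pprev = job->next;
    job->next = NULL;
    job->pprev = NULL;
}

/* Completion: unlink first, then drop the memory. */
static void job_complete(Job *job)
{
    txn_remove(job);
    free(job);
}

static int txn_count(const Txn *txn)
{
    int n = 0;
    const Job *job;

    for (job = txn->head; job; job = job->next) {
        n++;
    }
    return n;
}
```

Skipping txn_remove() in job_complete() would leave the list walkable only until the freed entry is reused, which is the kind of inconsistency the patch guards against.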



Re: [Qemu-devel] [PATCH v5 11/17] ppc/xics: Add "native" XICS subclass

2016-11-07 Thread David Gibson
On Wed, Nov 02, 2016 at 11:48:51AM +0100, Cédric Le Goater wrote:
> On 10/28/2016 03:00 AM, David Gibson wrote:
> > On Thu, Oct 27, 2016 at 07:43:10PM +0200, Cédric Le Goater wrote:
> >> On 10/27/2016 05:09 AM, David Gibson wrote:
> >>> On Wed, Oct 26, 2016 at 09:13:18AM +0200, Cédric Le Goater wrote:
>  On 10/25/2016 07:08 AM, David Gibson wrote:
> > On Sat, Oct 22, 2016 at 11:46:44AM +0200, Cédric Le Goater wrote:
> >> This provides access to the MMIO based Interrupt Presentation
> >> Controllers (ICP) as found on a POWER8 system.
> >>
> >> A new XICSNative class is introduced to hold the MMIO region of the
> >> ICPs. Each thread of the system has a subregion, indexed by its PIR
> >> number, holding a XIVE (External Interrupt Vector Entry). This
> >> provides a means to make the link with the ICPState of the CPU.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>
> >>  Changes since v4:
> >>
> >>  - replaced the pir_able by memory subregions using an ICP. 
> >>  - removed the find_icp() and cpu_setup() handlers which became
> >>useless with the memory regions.
> >>  - removed the superfluous inits done in xics_native_initfn. This is
> >>covered in the parent class init.
> >>  - took ownership of the patch.
> >>
> >>  default-configs/ppc64-softmmu.mak |   3 +-
> >>  hw/intc/Makefile.objs |   1 +
> >>  hw/intc/xics_native.c | 304 
> >> ++
> >>  include/hw/ppc/pnv.h  |  19 +++
> >>  include/hw/ppc/xics.h |  24 +++
> >>  5 files changed, 350 insertions(+), 1 deletion(-)
> >>  create mode 100644 hw/intc/xics_native.c
> >>
> >> diff --git a/default-configs/ppc64-softmmu.mak 
> >> b/default-configs/ppc64-softmmu.mak
> >> index 67a9bcaa67fa..a22c93a48686 100644
> >> --- a/default-configs/ppc64-softmmu.mak
> >> +++ b/default-configs/ppc64-softmmu.mak
> >> @@ -48,8 +48,9 @@ CONFIG_PLATFORM_BUS=y
> >>  CONFIG_ETSEC=y
> >>  CONFIG_LIBDECNUMBER=y
> >>  # For pSeries
> >> -CONFIG_XICS=$(CONFIG_PSERIES)
> >> +CONFIG_XICS=$(or $(CONFIG_PSERIES),$(CONFIG_POWERNV))
> >>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >> +CONFIG_XICS_NATIVE=$(CONFIG_POWERNV)
> >>  CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> >>  # For PReP
> >>  CONFIG_MC146818RTC=y
> >> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> >> index 2f44a2da26e9..e44a29d75b32 100644
> >> --- a/hw/intc/Makefile.objs
> >> +++ b/hw/intc/Makefile.objs
> >> @@ -34,6 +34,7 @@ obj-$(CONFIG_RASPI) += bcm2835_ic.o bcm2836_control.o
> >>  obj-$(CONFIG_SH4) += sh_intc.o
> >>  obj-$(CONFIG_XICS) += xics.o
> >>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >> +obj-$(CONFIG_XICS_NATIVE) += xics_native.o
> >>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> >>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> >> diff --git a/hw/intc/xics_native.c b/hw/intc/xics_native.c
> >> new file mode 100644
> >> index 000000000000..bbdd786aeb50
> >> --- /dev/null
> >> +++ b/hw/intc/xics_native.c
> >> @@ -0,0 +1,304 @@
> >> +/*
> >> + * QEMU PowerPC PowerNV machine model
> >> + *
> >> + * Native version of ICS/ICP
> >> + *
> >> + * Copyright (c) 2016, IBM Corporation.
> >> + *
> >> + * This library is free software; you can redistribute it and/or
> >> + * modify it under the terms of the GNU Lesser General Public
> >> + * License as published by the Free Software Foundation; either
> >> + * version 2 of the License, or (at your option) any later version.
> >> + *
> >> + * This library is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >> + * Lesser General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU Lesser General Public
> >> + * License along with this library; if not, see 
> >> .
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qapi/error.h"
> >> +#include "qemu-common.h"
> >> +#include "cpu.h"
> >> +#include "hw/hw.h"
> >> +#include "qemu/log.h"
> >> +#include "qapi/error.h"
> >> +
> >> +#include "hw/ppc/fdt.h"
> >> +#include "hw/ppc/xics.h"
> >> +#include "hw/ppc/pnv.h"
> >> +
> >> +#include 
> >> +
> >> +static void xics_native_reset(void *opaque)
> >> +{
> >> +device_reset(DEVICE(opaque));
> >> +}
> >> +
> >> +static void xics_native_initfn(Object *obj)
> >> +{
> >> +qemu_register_reset(xics_native_reset, obj);
> >> +}
> >
> > I 

Re: [Qemu-devel] [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-07 Thread Jike Song
On 11/08/2016 07:16 AM, Alex Williamson wrote:
> On Sat, 5 Nov 2016 02:40:44 +0530
> Kirti Wankhede  wrote:
> 
>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>> managed by an IOMMU domain.
>>
>> Aim of this change is:
>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>> - To support direct assigned device and mediated device in single module
>>
>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>> backend module. More details:
>> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>>   that is, of the process who mapped that iova range.
>> - Added pfn_list tracking logic to address space structure. All pages
>>   pinned through this interface are trached in its address space.
>   ^ k
> --|
> 
>> - Pinned pages list is used to verify unpinning request and to unpin
>>   remaining pages while detaching the group for that device.
>> - Page accounting is updated to account in its address space where the
>>   pages are pinned/unpinned.
>> - Accounting for mdev device is only done if there is no iommu capable
>>   domain in the container. When there is a direct device assigned to the
>>   container and that domain is iommu capable, all pages are already pinned
>>   during DMA_MAP.
>> - Page accounting is updated on hot plug and unplug mdev device and pass
>>   through device.
>>
>> Tested by assigning below combinations of devices to a single VM:
>> - GPU pass through only
>> - vGPU device only
>> - One GPU pass through and one vGPU device
>> - Linux VM hot plug and unplug vGPU device while GPU pass through device
>>   exist
>> - Linux VM hot plug and unplug GPU pass through device while vGPU device
>>   exist
>>
>> Signed-off-by: Kirti Wankhede 
>> Signed-off-by: Neo Jia 
>> Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 538 
>> +---
>>  1 file changed, 500 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 8d64528dcc22..e511073446a0 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -56,6 +57,7 @@ MODULE_PARM_DESC(disable_hugepages,
>>  struct vfio_iommu {
>>  struct list_head    domain_list;
>>  struct list_head    addr_space_list;
>> +struct vfio_domain  *external_domain; /* domain for external user */
>>  struct mutex        lock;
>>  struct rb_root      dma_list;
>>  bool                v2;
>> @@ -67,6 +69,9 @@ struct vfio_addr_space {
>>  struct mm_struct    *mm;
>>  struct list_head    next;
>>  atomic_t            ref_count;
>> +/* external user pinned pfns */
>> +struct rb_root      pfn_list;       /* pinned Host pfn list */
>> +struct mutex        pfn_list_lock;  /* mutex for pfn_list */
>>  };
>>  
>>  struct vfio_domain {
>> @@ -83,6 +88,7 @@ struct vfio_dma {
>>  unsigned long       vaddr;  /* Process virtual addr */
>>  size_t              size;   /* Map size (bytes) */
>>  int                 prot;   /* IOMMU_READ/WRITE */
>> +bool                iommu_mapped;
>>  struct vfio_addr_space  *addr_space;
>>  struct task_struct  *task;
>>  bool                mlock_cap;
>> @@ -94,6 +100,19 @@ struct vfio_group {
>>  };
>>  
>>  /*
>> + * Guest RAM pinning working set or DMA target
>> + */
>> +struct vfio_pfn {
>> +struct rb_node      node;
>> +unsigned long       pfn;    /* Host pfn */
>> +int                 prot;
>> +atomic_t            ref_count;
>> +};
>> +
>> +#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
>> +(!list_empty(&iommu->domain_list))
>> +
>> +/*
>>   * This code handles mapping and unmapping of user data buffers
>>   * into DMA'ble space using the IOMMU
>>   */
>> @@ -153,6 +172,93 @@ static struct vfio_addr_space 
>> *vfio_find_addr_space(struct vfio_iommu *iommu,
>>  return NULL;
>>  }
>>  
>> +/*
>> + * Helper Functions for host pfn list
>> + */
>> +static struct vfio_pfn *vfio_find_pfn(struct vfio_addr_space *addr_space,
>> +  unsigned long pfn)
>> +{
>> +struct vfio_pfn *vpfn;
>> +struct rb_node *node = addr_space->pfn_list.rb_node;
>> +
>> +while (node) {
>> +vpfn = rb_entry(node, struct vfio_pfn, node);
>> +
>> +if (pfn < vpfn->pfn)
>> +node = 

Re: [Qemu-devel] [PULL 15/16] spapr_pci: Add a 64-bit MMIO window

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 04:03:31PM +1100, Alexey Kardashevskiy wrote:
> On 17/10/16 13:43, David Gibson wrote:
> > On real hardware, and under pHyp, the PCI host bridges on Power machines
> > typically advertise two outbound MMIO windows from the guest's physical
> > memory space to PCI memory space:
> >   - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
> >   - A 64-bit window which maps onto a large region somewhere high in PCI
> > address space (traditionally this used an identity mapping from guest
> > physical address to PCI address, but that's not always the case)
> > 
> > The qemu implementation in spapr-pci-host-bridge, however, only supports a
> > single outbound MMIO window.  At least some Linux versions expect the two
> > windows, so we arranged this window to map onto the PCI memory space from
> > 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit"
> > window from 2G..4G and the "64-bit" window from 4G..~64G.
> > 
> > This approach means, however, that the 64G window is not naturally aligned.
> > In turn this limits the size of the largest BAR we can map (which does have
> > to be naturally aligned) to roughly half of the total window.  With some
> > large nVidia GPGPU cards which have huge memory BARs, this is starting to
> > be a problem.
> > 
> > This patch adds true support for separate 32-bit and 64-bit outbound MMIO
> > windows to the spapr-pci-host-bridge implementation, each of which can
> > be independently configured.  The 32-bit window always maps to 2G.. in PCI
> > space, but the PCI address of the 64-bit window can be configured (it
> > defaults to the same as the guest physical address).
> > 
> > So as not to break possible existing configurations, as long as a 64-bit
> > window is not specified, a large single window can be specified.  This
> > will appear the same way to the guest as the old approach, although it's
> > now implemented by two contiguous memory regions rather than a single one.
> > 
> > For now, this only adds the possibility of 64-bit windows.  The default
> > configuration still uses the legacy mode.
> 
> 
> This breaks migration to QEMU v2.7, the destination reports:
> 
> 22901@1478235261.799031:vmstate_load spapr_pci, spapr_pci
> 22901@1478235261.799040:vmstate_load_field_error field "mem_win_size" load
> failed, ret = -22
> qemu-hostos1: error while loading state for instance 0x0 of device 'spapr_pci'
> 22901@1478235261.801324:migrate_set_state new state 7
> qemu-hostos1: load of migration failed: Invalid argument
> 
> 
> mem_win_size decreased from 0xf80000000 to 0x80000000.
> 
> I'd think it should be allowed to migrate like this.

AIUI, we don't generally care (upstream) about migration from newer to
older qemu, only from older to newer.  Trying to maintain backwards
migration makes it almost impossible to fix anything at all, ever.
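
As a side note on the numbers quoted in this message, here is a short sketch that checks two things: the legacy single window (66571993088 bytes) is exactly the 2 GiB 32-bit window plus the 60 GiB 64-bit window, and a window spanning PCI addresses 2G..64G can hold a naturally aligned power-of-two BAR of at most 32 GiB, roughly half the window as the commit message says. largest_aligned_bar() is a hypothetical helper written for this illustration, and treating the upper bound as exactly 64 GiB is an assumption (the commit message says ~64G).

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/*
 * Largest power-of-two size that fits, naturally aligned (base is a
 * multiple of size), inside the half-open range [start, end).
 */
static uint64_t largest_aligned_bar(uint64_t start, uint64_t end)
{
    uint64_t size;

    assert(start < end);
    for (size = UINT64_C(1) << 62; size > 0; size >>= 1) {
        /* Round start up to the next multiple of size. */
        uint64_t base = (start + size - 1) & ~(size - 1);

        /* base >= start guards against wrap-around in the rounding. */
        if (base >= start && base <= end && end - base >= size) {
            return size;
        }
    }
    return 0;
}
```

With a separate 64-bit window that starts on a boundary matching its size, the same helper would report the whole window as usable, which is the motivation given for the split.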

> 
> 
> The source PHB is:
> 
> (qemu) info qtree
> bus: main-system-bus
>   type System
>   dev: spapr-pci-host-bridge, id ""
> index = 0 (0x0)
> buid = 576460752840294400 (0x800000020000000)
> liobn = 2147483648 (0x80000000)
> liobn64 = 4294967295 (0xffffffff)
> mem_win_addr = 1102195982336 (0x100a0000000)
> mem_win_size = 2147483648 (0x80000000)
> mem64_win_addr = 1104343465984 (0x10120000000)
> mem64_win_size = 64424509440 (0xf00000000)
> mem64_win_pciaddr = 4294967296 (0x100000000)
> 
> The destination PHB is:
> 
> (qemu) info qtree
> bus: main-system-bus
>   type System
>   dev: spapr-pci-host-bridge, id ""
> index = 0 (0x0)
> buid = 576460752840294400 (0x800000020000000)
> liobn = 2147483648 (0x80000000)
> liobn64 = 4294967295 (0xffffffff)
> mem_win_addr = 1102195982336 (0x100a0000000)
> mem_win_size = 66571993088 (0xf80000000)
> 
> 
> 
> The source QEMU cmdline:
> 
> /home/aik/p/qemu/ppc64-softmmu/qemu-system-ppc64 -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
> -kernel /home/aik/t/vml450le \
> -initrd /home/aik/t/le.cpio -m 4G \
> -machine pseries-2.6 -enable-kvm \
> 
> 
> The destination (./qemu-hostos1 is v2.7.0 from
> https://github.com/open-power-host-os/qemu/commits/hostos-stable )
> 
> ./qemu-hostos1 -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none -m 4G \
> -machine pseries-2.6 -enable-kvm \
> -mon chardev=SOCKET0,mode=readline -incoming "tcp:fstn1:2"
> 
> 
> 
> > 
> > Signed-off-by: David Gibson 
> > Reviewed-by: Laurent Vivier 
> > ---
> >  hw/ppc/spapr.c  | 10 +--
> >  hw/ppc/spapr_pci.c  | 70 
> > -
> >  include/hw/pci-host/spapr.h |  8 --
> >  include/hw/ppc/spapr.h  |  

Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start

2016-11-07 Thread John Snow



On 11/07/2016 09:05 PM, Jeff Cody wrote:

On Mon, Nov 07, 2016 at 09:02:14PM -0500, John Snow wrote:



On 11/03/2016 08:17 AM, Kevin Wolf wrote:

On 02.11.2016 at 18:50, John Snow wrote:

Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.

Signed-off-by: John Snow 



diff --git a/block/commit.c b/block/commit.c
index 20d27e2..5b7c454 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
s->backing_file_str = g_strdup(backing_file_str);

s->on_error = on_error;
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);

trace_commit_start(bs, base, top, s, s->common.co);


s->common.co is now uninitialised and should probably be removed from
the tracepoint arguments. The same is true for mirror and stream.


-qemu_coroutine_enter(s->common.co);
+block_job_start(&s->common);
}



diff --git a/blockjob.c b/blockjob.c
index e3c458c..16c5159 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
job->blk   = blk;
job->cb= cb;
job->opaque= opaque;
-job->busy  = true;
+job->busy  = false;
+job->paused= true;
job->refcnt= 1;
bs->job = job;

@@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
return (job->id == NULL);
}

+static bool block_job_started(BlockJob *job)
+{
+return job->co;
+}
+
+void block_job_start(BlockJob *job)
+{
+assert(job && !block_job_started(job) && job->paused &&
+   !job->busy && job->driver->start);
+job->paused = false;
+job->busy = true;
+job->co = qemu_coroutine_create(job->driver->start, job);
+qemu_coroutine_enter(job->co);
+}


We allow the user to pause a job while it's not started yet. You
classified this as "harmless". But if we accept this, can we really
unconditionally enter the coroutine even if the job has been paused?
Can't a user expect that a job remains in paused state when they
explicitly requested a pause and the job was already internally paused,
like in this case by block_job_create()?



What will end up happening is that we'll enter the job, and then it'll pause
immediately upon entrance. Is that a problem?

If the jobs themselves are not checking their pause state fastidiously, it
could be (but block/backup does -- after it creates a write notifier.)

Do we want a stronger guarantee here?

Naively I think it's OK as-is, but I could add a stronger boolean that
lets us know if it's okay to start or not, and we could delay the actual
creation and start until the 'resume' comes in if you'd like.

I'd like to avoid the complexity if we can help it, but perhaps I'm not
thinking carefully enough about the existing edge cases.



Is there any reason we can't just use job->pause_count here?  When the job
is created, set job->paused = true, and job->pause_count = 1.  In the
block_job_start(), check the pause_count prior to qemu_coroutine_enter():

void block_job_start(BlockJob *job)
{
assert(job && !block_job_started(job) && job->paused &&
  !job->busy && job->driver->start);
job->co = qemu_coroutine_create(job->driver->start, job);
job->paused = --job->pause_count > 0;
if (!job->paused) {
job->busy = true;
qemu_coroutine_enter(job->co);
}
}



Solid point. Let's do it this way.
Thanks!




The same probably also applies to the internal job pausing during
bdrv_drain_all_begin/end, though as you know there is a larger problem
with starting jobs under drain_all anyway. For now, we just need to keep
in mind that we can neither create nor start a job in such sections.



Yeah, there are deeper problems there. As long as the existing critical
sections don't allow us to create jobs (started or not) I think we're
probably already OK.


Kevin





Re: [Qemu-devel] [PATCH for-2.8] migration: Fix return code of ram_save_iterate()

2016-11-07 Thread David Gibson
On Fri, Nov 04, 2016 at 02:10:17PM +0100, Thomas Huth wrote:
> qemu_savevm_state_iterate() expects the iterators to return 1
> when they are done, and 0 if there is still something left to do.
> However, ram_save_iterate() does not obey this rule and returns
> the number of saved pages instead. This causes a fatal hang with
> ppc64 guests when you run QEMU like this (also works with TCG):
> 
>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>-hda /tmp/test.qcow2 -serial mon:stdio
> 
> ... then switch to the monitor by pressing CTRL-a c and try to
> save a snapshot with "savevm test1" for example.
> 
> After the first iteration, ram_save_iterate() always returns 0 here,
> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> can only "kill -9" the QEMU process.
> Fix it by using proper return values in ram_save_iterate().
> 
> Signed-off-by: Thomas Huth 

Hmm.  I think the change is technically correct, but I'm uneasy with
this approach to the solution.  The whole reason this wasn't caught
earlier is that almost nothing looks at the return value.  Without
changing that I think it's very likely someone will mess this up
again.

I think it would be preferable to change the return type to void to
make it explicit that this function is not directly returning the
"completion" status, but instead that's calculated from the other
progress variables it updates.
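
To make the return-value contract concrete, here is a minimal standalone sketch. It is not QEMU code: MockRam, find_and_save(), iterate() and drive() are hypothetical names, and the per-call budget of 4 pages is an arbitrary assumption. It only models a caller that treats a nonzero return as "finished", as the patch description says qemu_savevm_state_iterate() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical iterator state: pages still waiting to be sent. */
typedef struct MockRam {
    int pages_left;
} MockRam;

/* Stand-in for ram_find_and_save_block(): returns pages sent, 0 if none. */
static int find_and_save(MockRam *r)
{
    if (r->pages_left == 0) {
        return 0;
    }
    r->pages_left--;
    return 1;
}

/*
 * One iteration with a fixed budget of 4 pages (an arbitrary assumption).
 * return_done == false models the old behaviour (return the page count),
 * return_done == true models the patched behaviour (return 1 when done).
 */
static int iterate(MockRam *r, bool return_done)
{
    int pages_sent = 0;
    int done = 0;

    for (int i = 0; i < 4; i++) {
        int pages = find_and_save(r);
        if (pages == 0) {
            done = 1;   /* nothing left to send */
            break;
        }
        pages_sent += pages;
    }
    return return_done ? done : pages_sent;
}

/*
 * Caller that treats a nonzero return as "finished". The loop is bounded
 * so the sketch terminates; it returns the number of calls made, or -1
 * if the limit was reached without the iterator reporting completion.
 */
static int drive(MockRam *r, bool return_done, int max_calls)
{
    for (int calls = 1; calls <= max_calls; calls++) {
        if (iterate(r, return_done) != 0) {
            return calls;
        }
    }
    return -1;
}
```

With no pages left, the page-count variant returns 0 on every call, so drive() runs into its limit; that is the endless loop from the report. The "done" variant finishes on the first call.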

> ---
>  migration/ram.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fb9252d..a1c8089 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  int ret;
>  int i;
>  int64_t t0;
> -int pages_sent = 0;
> +int done = 0;
>  
>  rcu_read_lock();
>  if (ram_list.version != last_version) {
> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  pages = ram_find_and_save_block(f, false, &bytes_transferred);
>  /* no more pages to sent */
>  if (pages == 0) {
> +done = 1;
>  break;
>  }
> -pages_sent += pages;
>  acct_info.iterations++;
>  
>  /* we want to check in the 1st loop, just in case it was the 1st time
> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  return ret;
>  }
>  
> -return pages_sent;
> +return done;
>  }
>  
>  /* Called with iothread lock */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start

2016-11-07 Thread Jeff Cody
On Mon, Nov 07, 2016 at 09:02:14PM -0500, John Snow wrote:
> 
> 
> On 11/03/2016 08:17 AM, Kevin Wolf wrote:
> >Am 02.11.2016 um 18:50 hat John Snow geschrieben:
> >>Instead of automatically starting jobs at creation time via backup_start
> >>et al, we'd like to return a job object pointer that can be started
> >>manually at a later point in time.
> >>
> >>For now, add the block_job_start mechanism and start the jobs
> >>automatically as we have been doing, with conversions job-by-job coming
> >>in later patches.
> >>
> >>Of note: cancellation of unstarted jobs will perform all the normal
> >>cleanup as if the job had started, particularly abort and clean. The
> >>only difference is that we will not emit any events, because the job
> >>never actually started.
> >>
> >>Signed-off-by: John Snow 
> >
> >>diff --git a/block/commit.c b/block/commit.c
> >>index 20d27e2..5b7c454 100644
> >>--- a/block/commit.c
> >>+++ b/block/commit.c
> >>@@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState 
> >>*bs,
> >> s->backing_file_str = g_strdup(backing_file_str);
> >>
> >> s->on_error = on_error;
> >>-s->common.co = qemu_coroutine_create(s->common.driver->start, s);
> >>
> >> trace_commit_start(bs, base, top, s, s->common.co);
> >
> >s->common.co is now uninitialised and should probably be removed from
> >the tracepoint arguments. The same is true for mirror and stream.
> >
> >>-qemu_coroutine_enter(s->common.co);
> >>+block_job_start(&s->common);
> >> }
> >
> >>diff --git a/blockjob.c b/blockjob.c
> >>index e3c458c..16c5159 100644
> >>--- a/blockjob.c
> >>+++ b/blockjob.c
> >>@@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const 
> >>BlockJobDriver *driver,
> >> job->blk   = blk;
> >> job->cb= cb;
> >> job->opaque= opaque;
> >>-job->busy  = true;
> >>+job->busy  = false;
> >>+job->paused= true;
> >> job->refcnt= 1;
> >> bs->job = job;
> >>
> >>@@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
> >> return (job->id == NULL);
> >> }
> >>
> >>+static bool block_job_started(BlockJob *job)
> >>+{
> >>+return job->co;
> >>+}
> >>+
> >>+void block_job_start(BlockJob *job)
> >>+{
> >>+assert(job && !block_job_started(job) && job->paused &&
> >>+   !job->busy && job->driver->start);
> >>+job->paused = false;
> >>+job->busy = true;
> >>+job->co = qemu_coroutine_create(job->driver->start, job);
> >>+qemu_coroutine_enter(job->co);
> >>+}
> >
> >We allow the user to pause a job while it's not started yet. You
> >classified this as "harmless". But if we accept this, can we really
> >unconditionally enter the coroutine even if the job has been paused?
> >Can't a user expect that a job remains in paused state when they
> >explicitly requested a pause and the job was already internally paused,
> >like in this case by block_job_create()?
> >
> 
> What will end up happening is that we'll enter the job, and then it'll pause
> immediately upon entrance. Is that a problem?
> 
> If the jobs themselves are not checking their pause state fastidiously, it
> could be (but block/backup does -- after it creates a write notifier.)
> 
> Do we want a stronger guarantee here?
> 
> Naively I think it's OK as-is, but I could add a stronger boolean that
> lets us know if it's okay to start or not, and we could delay the actual
> creation and start until the 'resume' comes in if you'd like.
> 
> I'd like to avoid the complexity if we can help it, but perhaps I'm not
> thinking carefully enough about the existing edge cases.
> 

Is there any reason we can't just use job->pause_count here?  When the job
is created, set job->paused = true, and job->pause_count = 1.  In the
block_job_start(), check the pause_count prior to qemu_coroutine_enter():

void block_job_start(BlockJob *job)
{
assert(job && !block_job_started(job) && job->paused &&
  !job->busy && job->driver->start);
job->co = qemu_coroutine_create(job->driver->start, job);
job->paused = --job->pause_count > 0;
if (!job->paused) {
job->busy = true;
qemu_coroutine_enter(job->co);
}
}
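
The pause_count idea above can be tried outside of QEMU with a small mock. This is only a sketch under stated assumptions: MockJob and the mock_job_* helpers are hypothetical stand-ins, "started" replaces the job->co check, and "entered" counts what would be a call to qemu_coroutine_enter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockJob; only the fields discussed here. */
typedef struct MockJob {
    bool started;      /* stands in for job->co != NULL */
    bool paused;
    bool busy;
    int pause_count;
    int entered;       /* how often the "coroutine" was entered */
} MockJob;

/* Creation leaves the job paused, holding one internal pause reference. */
static void mock_job_create(MockJob *job)
{
    job->started = false;
    job->paused = true;
    job->busy = false;
    job->pause_count = 1;
    job->entered = 0;
}

/* A user pause request before start just takes another reference. */
static void mock_job_user_pause(MockJob *job)
{
    job->pause_count++;
}

/* Start drops the creation reference and enters only if none remain. */
static void mock_job_start(MockJob *job)
{
    assert(job && !job->started && job->paused && !job->busy);
    job->started = true;                    /* coroutine "created" */
    job->paused = --job->pause_count > 0;
    if (!job->paused) {
        job->busy = true;
        job->entered++;                     /* coroutine "entered" */
    }
}
```

A job that is created and started runs immediately. A job that the user paused before mock_job_start() stays paused with pause_count == 1, and a later resume would be responsible for entering it.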


> >The same probably also applies to the internal job pausing during
> >bdrv_drain_all_begin/end, though as you know there is a larger problem
> >with starting jobs under drain_all anyway. For now, we just need to keep
> >in mind that we can neither create nor start a job in such sections.
> >
> 
> Yeah, there are deeper problems there. As long as the existing critical
> sections don't allow us to create jobs (started or not) I think we're
> probably already OK.
> 
> >Kevin
> >



Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start

2016-11-07 Thread John Snow



On 11/03/2016 08:17 AM, Kevin Wolf wrote:

Am 02.11.2016 um 18:50 hat John Snow geschrieben:

Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.

Signed-off-by: John Snow 



diff --git a/block/commit.c b/block/commit.c
index 20d27e2..5b7c454 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 s->backing_file_str = g_strdup(backing_file_str);

 s->on_error = on_error;
-s->common.co = qemu_coroutine_create(s->common.driver->start, s);

 trace_commit_start(bs, base, top, s, s->common.co);


s->common.co is now uninitialised and should probably be removed from
the tracepoint arguments. The same is true for mirror and stream.


-qemu_coroutine_enter(s->common.co);
+block_job_start(&s->common);
 }



diff --git a/blockjob.c b/blockjob.c
index e3c458c..16c5159 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->blk   = blk;
 job->cb= cb;
 job->opaque= opaque;
-job->busy  = true;
+job->busy  = false;
+job->paused= true;
 job->refcnt= 1;
 bs->job = job;

@@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
 return (job->id == NULL);
 }

+static bool block_job_started(BlockJob *job)
+{
+return job->co;
+}
+
+void block_job_start(BlockJob *job)
+{
+assert(job && !block_job_started(job) && job->paused &&
+   !job->busy && job->driver->start);
+job->paused = false;
+job->busy = true;
+job->co = qemu_coroutine_create(job->driver->start, job);
+qemu_coroutine_enter(job->co);
+}


We allow the user to pause a job while it's not started yet. You
classified this as "harmless". But if we accept this, can we really
unconditionally enter the coroutine even if the job has been paused?
Can't a user expect that a job remains in paused state when they
explicitly requested a pause and the job was already internally paused,
like in this case by block_job_create()?



What will end up happening is that we'll enter the job, and then it'll 
pause immediately upon entrance. Is that a problem?


If the jobs themselves are not checking their pause state fastidiously, 
it could be (but block/backup does -- after it creates a write notifier.)


Do we want a stronger guarantee here?

Naively I think it's OK as-is, but I could add a stronger boolean
that lets us know if it's okay to start or not, and we could delay the
actual creation and start until the 'resume' comes in if you'd like.


I'd like to avoid the complexity if we can help it, but perhaps I'm not 
thinking carefully enough about the existing edge cases.



The same probably also applies to the internal job pausing during
bdrv_drain_all_begin/end, though as you know there is a larger problem
with starting jobs under drain_all anyway. For now, we just need to keep
in mind that we can neither create nor start a job in such sections.



Yeah, there are deeper problems there. As long as the existing critical 
sections don't allow us to create jobs (started or not) I think we're 
probably already OK.



Kevin





Re: [Qemu-devel] [PATCH v4] This patch adds support for a new block device type called "vxhs".

2016-11-07 Thread Fam Zheng
On Mon, 11/07 17:39, ashish mittal wrote:
> I guess the email subject of the individual patches is OK to
> change?

Yes, that is fine, as long as it doesn't burden incremental reviewing
unnecessarily.

Fam



Re: [Qemu-devel] [PATCH v4] This patch adds support for a new block device type called "vxhs".

2016-11-07 Thread ashish mittal
On Fri, Nov 4, 2016 at 6:04 AM, Stefan Hajnoczi  wrote:
> Please keep using "block/vxhs: Add Veritas HyperScale VxHS block device
> support" as the cover letter email subject.  This way tools are able to
> automatically mark old versions of this patch series as obsolete.

Sent out the new set of patches as a series. Hope I got it right!
Will keep the same email subject on the cover letter in subsequent
versions. I guess the email subject of the individual patches is OK to
change?



[Qemu-devel] [PATCH v2 0/2] Add the generic ARM timer

2016-11-07 Thread Alistair Francis
These two patches add and connect the ARM Generic Timer. This includes
support for dropping insecure writes.

V2:
 - Fix couter/counter typo

Alistair Francis (2):
  arm_generic_timer: Add the ARM Generic Timer
  xlnx-zynqmp: Connect the ARM Generic Timer

 hw/arm/xlnx-zynqmp.c |  13 +++
 hw/timer/Makefile.objs   |   1 +
 hw/timer/arm_generic_timer.c | 216 +++
 include/hw/arm/xlnx-zynqmp.h |   2 +
 include/hw/timer/arm_generic_timer.h |  60 ++
 5 files changed, 292 insertions(+)
 create mode 100644 hw/timer/arm_generic_timer.c
 create mode 100644 include/hw/timer/arm_generic_timer.h

-- 
2.7.4




[Qemu-devel] [PATCH v2 1/2] arm_generic_timer: Add the ARM Generic Timer

2016-11-07 Thread Alistair Francis
Add the ARM Generic Timer. This allows the guest to poll the timer for
values; only secure writes are accepted.

Signed-off-by: Alistair Francis 
---
V2:
 - Fix couter/counter typo

 hw/timer/Makefile.objs   |   1 +
 hw/timer/arm_generic_timer.c | 216 +++
 include/hw/timer/arm_generic_timer.h |  60 ++
 3 files changed, 277 insertions(+)
 create mode 100644 hw/timer/arm_generic_timer.c
 create mode 100644 include/hw/timer/arm_generic_timer.h

diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 7ba8c23..f88c468 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -17,6 +17,7 @@ common-obj-$(CONFIG_IMX) += imx_epit.o
 common-obj-$(CONFIG_IMX) += imx_gpt.o
 common-obj-$(CONFIG_LM32) += lm32_timer.o
 common-obj-$(CONFIG_MILKYMIST) += milkymist-sysctl.o
+common-obj-$(CONFIG_XLNX_ZYNQMP) += arm_generic_timer.o
 
 obj-$(CONFIG_EXYNOS4) += exynos4210_mct.o
 obj-$(CONFIG_EXYNOS4) += exynos4210_pwm.o
diff --git a/hw/timer/arm_generic_timer.c b/hw/timer/arm_generic_timer.c
new file mode 100644
index 000..7642206
--- /dev/null
+++ b/hw/timer/arm_generic_timer.c
@@ -0,0 +1,216 @@
+/*
+ * QEMU model of the ARM Generic Timer
+ *
+ * Copyright (c) 2016 Xilinx Inc.
+ * Written by Alistair Francis 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/timer/arm_generic_timer.h"
+#include "qemu/timer.h"
+#include "qemu/log.h"
+
+#ifndef ARM_GEN_TIMER_ERR_DEBUG
+#define ARM_GEN_TIMER_ERR_DEBUG 0
+#endif
+
+static void counter_control_postw(RegisterInfo *reg, uint64_t val64)
+{
+ARMGenTimer *s = ARM_GEN_TIMER(reg->opaque);
+bool new_status = extract32(s->regs[R_COUNTER_CONTROL_REGISTER],
+R_COUNTER_CONTROL_REGISTER_EN_SHIFT,
+R_COUNTER_CONTROL_REGISTER_EN_LENGTH);
+uint64_t current_ticks;
+
+current_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
+ NANOSECONDS_PER_SECOND, 100);
+
+if ((s->enabled && !new_status) ||
+(!s->enabled && new_status)) {
+/* The timer is being disabled or enabled */
+s->tick_offset = current_ticks - s->tick_offset;
+}
+
+s->enabled = new_status;
+}
+
+static uint64_t counter_low_value_postr(RegisterInfo *reg, uint64_t val64)
+{
+ARMGenTimer *s = ARM_GEN_TIMER(reg->opaque);
+uint64_t current_ticks, total_ticks;
+uint32_t low_ticks;
+
+if (s->enabled) {
+current_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
+ NANOSECONDS_PER_SECOND, 100);
+total_ticks = current_ticks - s->tick_offset;
+low_ticks = (uint32_t) total_ticks;
+} else {
+/* Timer is disabled, return the time when it was disabled */
+low_ticks = (uint32_t) s->tick_offset;
+}
+
+return low_ticks;
+}
+
+static uint64_t counter_high_value_postr(RegisterInfo *reg, uint64_t val64)
+{
+ARMGenTimer *s = ARM_GEN_TIMER(reg->opaque);
+uint64_t current_ticks, total_ticks;
+uint32_t high_ticks;
+
+if (s->enabled) {
+current_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
+ NANOSECONDS_PER_SECOND, 100);
+total_ticks = current_ticks - s->tick_offset;
+high_ticks = (uint32_t) (total_ticks >> 32);
+} else {
+/* Timer is disabled, return the time when it was disabled */
+high_ticks = (uint32_t) (s->tick_offset >> 32);
+}
+
+return high_ticks;
+}
+
+
+static RegisterAccessInfo arm_gen_timer_regs_info[] = {
+{   .name = "COUNTER_CONTROL_REGISTER",
+.addr = A_COUNTER_CONTROL_REGISTER,
+.rsvd = 0xfffc,
+.post_write = counter_control_postw,
+},{ .name = "COUNTER_STATUS_REGISTER",
+.addr = A_COUNTER_STATUS_REGISTER,
+
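
The offset bookkeeping used by the counter callbacks above can be looked at in isolation. The following is a sketch only, not the device model: MockTimer and the mock_timer_* helpers are hypothetical names, and the current tick value is passed in as "now" instead of being derived from QEMU_CLOCK_VIRTUAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced timer state, mirroring s->enabled and s->tick_offset. */
typedef struct MockTimer {
    bool enabled;
    uint64_t tick_offset;
} MockTimer;

/*
 * On an enable/disable transition the offset is flipped:
 *  - while enabled it holds the tick value that corresponds to count 0,
 *  - while disabled it holds the frozen count.
 */
static void mock_timer_set_enabled(MockTimer *t, bool enable, uint64_t now)
{
    if (t->enabled != enable) {
        t->tick_offset = now - t->tick_offset;
    }
    t->enabled = enable;
}

/* Full 64-bit count as the guest would observe it at tick value "now". */
static uint64_t mock_timer_count(const MockTimer *t, uint64_t now)
{
    return t->enabled ? now - t->tick_offset : t->tick_offset;
}

/* The two 32-bit register views of the 64-bit count. */
static uint32_t mock_timer_low(const MockTimer *t, uint64_t now)
{
    return (uint32_t)mock_timer_count(t, now);
}

static uint32_t mock_timer_high(const MockTimer *t, uint64_t now)
{
    return (uint32_t)(mock_timer_count(t, now) >> 32);
}
```

Starting from a zeroed MockTimer, enabling at tick 100 and reading at tick 150 gives a count of 50. Disabling at 150 freezes the count at 50, and enabling again later continues from 50.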

[Qemu-devel] [PATCH v6 2/2] block/vxhs.c: Add qemu-iotests for new block device type "vxhs"

2016-11-07 Thread Ashish Mittal
These changes use a vxhs test server that is a part of the following
repository:
https://github.com/MittalAshish/libqnio.git

Signed-off-by: Ashish Mittal 
---
v6 changelog:
(1) Added iotests for VxHS block device.

 tests/qemu-iotests/common|  6 ++
 tests/qemu-iotests/common.config | 13 +
 tests/qemu-iotests/common.filter |  1 +
 tests/qemu-iotests/common.rc | 19 +++
 4 files changed, 39 insertions(+)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index d60ea2c..41430d8 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -158,6 +158,7 @@ check options
 -nfstest nfs
 -archipelagotest archipelago
 -luks   test luks
+-vxhs   test vxhs
 -xdiff  graphical mode diff
 -nocacheuse O_DIRECT on backing file
 -misalign   misalign memory allocations
@@ -261,6 +262,11 @@ testlist options
 xpand=false
 ;;
 
+-vxhs)
+IMGPROTO=vxhs
+xpand=false
+;;
+
 -ssh)
 IMGPROTO=ssh
 xpand=false
diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index f6384fb..c7a80c0 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -105,6 +105,10 @@ if [ -z "$QEMU_NBD_PROG" ]; then
 export QEMU_NBD_PROG="`set_prog_path qemu-nbd`"
 fi
 
+if [ -z "$QEMU_VXHS_PROG" ]; then
+export QEMU_VXHS_PROG="`set_prog_path qnio_server /usr/local/bin`"
+fi
+
 _qemu_wrapper()
 {
 (
@@ -156,10 +160,19 @@ _qemu_nbd_wrapper()
 )
 }
 
+_qemu_vxhs_wrapper()
+{
+(
+echo $BASHPID > "${TEST_DIR}/qemu-vxhs.pid"
+exec "$QEMU_VXHS_PROG" $QEMU_VXHS_OPTIONS "$@"
+)
+}
+
 export QEMU=_qemu_wrapper
 export QEMU_IMG=_qemu_img_wrapper
 export QEMU_IO=_qemu_io_wrapper
 export QEMU_NBD=_qemu_nbd_wrapper
+export QEMU_VXHS=_qemu_vxhs_wrapper
 
 QEMU_IMG_EXTRA_ARGS=
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 240ed06..a8a4d0e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -123,6 +123,7 @@ _filter_img_info()
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd://127.0.0.1:10810$#TEST_DIR/t.IMGFMT#g' \
+-e 's#json.*vdisk-id.*vxhs"}}#TEST_DIR/t.IMGFMT#' \
 -e "/encrypted: yes/d" \
 -e "/cluster_size: [0-9]\\+/d" \
 -e "/table_size: [0-9]\\+/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 3213765..06a3164 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -89,6 +89,9 @@ else
 TEST_IMG=$TEST_DIR/t.$IMGFMT
 elif [ "$IMGPROTO" = "archipelago" ]; then
 TEST_IMG="archipelago:at.$IMGFMT"
+elif [ "$IMGPROTO" = "vxhs" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="vxhs://127.0.0.1:/t.$IMGFMT"
 else
 TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
@@ -175,6 +178,12 @@ _make_test_img()
 eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT  $TEST_IMG_FILE 
&"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi
+
+# Start QNIO server on image directory for vxhs protocol
+if [ $IMGPROTO = "vxhs" ]; then
+eval "$QEMU_VXHS -d  $TEST_DIR &"
+sleep 1 # Wait for server to come up.
+fi
 }
 
 _rm_test_img()
@@ -201,6 +210,16 @@ _cleanup_test_img()
 fi
 rm -f "$TEST_IMG_FILE"
 ;;
+vxhs)
+if [ -f "${TEST_DIR}/qemu-vxhs.pid" ]; then
+local QEMU_VXHS_PID
+read QEMU_VXHS_PID < "${TEST_DIR}/qemu-vxhs.pid"
+kill ${QEMU_VXHS_PID} >/dev/null 2>&1
+rm -f "${TEST_DIR}/qemu-vxhs.pid"
+fi
+rm -f "$TEST_IMG_FILE"
+;;
+
 file)
 _rm_test_img "$TEST_DIR/t.$IMGFMT"
 _rm_test_img "$TEST_DIR/t.$IMGFMT.orig"
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] ppc/pnv: fix compile breakage on old gcc

2016-11-07 Thread David Gibson
On Mon, Nov 07, 2016 at 07:03:02PM +0100, Cédric Le Goater wrote:
> PnvChip is defined twice and this can confuse old compilers :
> 
>   CC  ppc64-softmmu/hw/ppc/pnv_xscom.o
> In file included from qemu.git/hw/ppc/pnv.c:29:
> qemu.git/include/hw/ppc/pnv.h:60: error: redefinition of typedef ‘PnvChip’
> qemu.git/include/hw/ppc/pnv_xscom.h:24: note: previous declaration of 
> ‘PnvChip’ was here
> make[1]: *** [hw/ppc/pnv.o] Error 1
> make[1]: *** Waiting for unfinished jobs
> 
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-2.8, thanks.
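
For reference, the constraint behind this fix is that repeating an identical typedef only became valid in C11, so compilers following the older rules (such as the gcc 4.4 mentioned below) reject it. A minimal sketch of keeping the opaque typedef in exactly one place follows. MockPnvChip, MOCK_PNV_TYPEDEFS_H and mock_xscom_chip_id() are hypothetical, and the guard-macro layout is just one way to do it; the patch itself drops the second typedef and reorders the includes instead.

```c
#include <assert.h>

/* Hypothetical shared header content: the opaque typedef appears once. */
#ifndef MOCK_PNV_TYPEDEFS_H
#define MOCK_PNV_TYPEDEFS_H
typedef struct MockPnvChip MockPnvChip;
#endif

/* "xscom header" side: only needs the name for a pointer parameter. */
int mock_xscom_chip_id(const MockPnvChip *chip);

/* "chip header" side: completes the struct without repeating the typedef. */
struct MockPnvChip {
    int chip_id;
};

/* Implementation can use the complete type once both parts are visible. */
int mock_xscom_chip_id(const MockPnvChip *chip)
{
    return chip->chip_id;
}
```

Because the typedef is guarded, any number of headers can pull it in without producing a second declaration.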

> ---
> 
>  Tested with gcc version 4.4.7 20120313
> 
>  hw/ppc/pnv_core.c  | 1 +
>  hw/ppc/pnv_lpc.c   | 3 ++-
>  hw/ppc/pnv_xscom.c | 2 +-
>  include/hw/ppc/pnv.h   | 1 -
>  include/hw/ppc/pnv_xscom.h | 2 --
>  5 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index 2acda9637db5..76ce854b0c40 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -24,6 +24,7 @@
>  #include "hw/ppc/ppc.h"
>  #include "hw/ppc/pnv.h"
>  #include "hw/ppc/pnv_core.h"
> +#include "hw/ppc/pnv_xscom.h"
>  
>  static void powernv_cpu_reset(void *opaque)
>  {
> diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
> index 00dbd8b07b38..0e2117f0f5bc 100644
> --- a/hw/ppc/pnv_lpc.c
> +++ b/hw/ppc/pnv_lpc.c
> @@ -23,8 +23,9 @@
>  #include "qapi/error.h"
>  #include "qemu/log.h"
>  
> -#include "hw/ppc/pnv_lpc.h"
>  #include "hw/ppc/pnv.h"
> +#include "hw/ppc/pnv_lpc.h"
> +#include "hw/ppc/pnv_xscom.h"
>  #include "hw/ppc/fdt.h"
>  
>  #include 
> diff --git a/hw/ppc/pnv_xscom.c b/hw/ppc/pnv_xscom.c
> index 5aaa264bd75c..f46646141a96 100644
> --- a/hw/ppc/pnv_xscom.c
> +++ b/hw/ppc/pnv_xscom.c
> @@ -25,8 +25,8 @@
>  #include "hw/sysbus.h"
>  
>  #include "hw/ppc/fdt.h"
> -#include "hw/ppc/pnv_xscom.h"
>  #include "hw/ppc/pnv.h"
> +#include "hw/ppc/pnv_xscom.h"
>  
>  #include 
>  
> diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
> index 02ac1c5f428e..7bee658733db 100644
> --- a/include/hw/ppc/pnv.h
> +++ b/include/hw/ppc/pnv.h
> @@ -21,7 +21,6 @@
>  
>  #include "hw/boards.h"
>  #include "hw/sysbus.h"
> -#include "hw/ppc/pnv_xscom.h"
>  #include "hw/ppc/pnv_lpc.h"
>  
>  #define TYPE_PNV_CHIP "powernv-chip"
> diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
> index c0a2fbb9f6f8..41a5127a1907 100644
> --- a/include/hw/ppc/pnv_xscom.h
> +++ b/include/hw/ppc/pnv_xscom.h
> @@ -21,8 +21,6 @@
>  
>  #include "qom/object.h"
>  
> -typedef struct PnvChip PnvChip;
> -
>  typedef struct PnvXScomInterface {
>  Object parent;
>  } PnvXScomInterface;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v2 2/2] xlnx-zynqmp: Connect the ARM Generic Timer

2016-11-07 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---

 hw/arm/xlnx-zynqmp.c | 13 +
 include/hw/arm/xlnx-zynqmp.h |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 0d86ba3..43c68c5 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -38,6 +38,8 @@
 #define SATA_ADDR   0xFD0C
 #define SATA_NUM_PORTS  2
 
+#define ARM_GEN_TIMER_ADDR  0xFF26
+
 #define DP_ADDR 0xfd4a
 #define DP_IRQ  113
 
@@ -172,6 +174,10 @@ static void xlnx_zynqmp_init(Object *obj)
 qdev_set_parent_bus(DEVICE(&s->spi[i]), sysbus_get_default());
 }
 
+object_initialize(&s->arm_gen_timer, sizeof(s->arm_gen_timer),
+  TYPE_ARM_GEN_TIMER);
+qdev_set_parent_bus(DEVICE(&s->arm_gen_timer), sysbus_get_default());
+
 object_initialize(&s->dp, sizeof(s->dp), TYPE_XLNX_DP);
 qdev_set_parent_bus(DEVICE(&s->dp), sysbus_get_default());
 
@@ -405,6 +411,13 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
**errp)
 g_free(bus_name);
 }
 
+object_property_set_bool(OBJECT(&s->arm_gen_timer), true, "realized", 
&err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->arm_gen_timer), 0, ARM_GEN_TIMER_ADDR);
+
 object_property_set_bool(OBJECT(&s->dp), true, "realized", &err);
 if (err) {
 error_propagate(errp, err);
diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index c2931bf..8deabb4 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -26,6 +26,7 @@
 #include "hw/ide/ahci.h"
 #include "hw/sd/sdhci.h"
 #include "hw/ssi/xilinx_spips.h"
+#include "hw/timer/arm_generic_timer.h"
 #include "hw/dma/xlnx_dpdma.h"
 #include "hw/display/xlnx_dp.h"
 
@@ -83,6 +84,7 @@ typedef struct XlnxZynqMPState {
 SysbusAHCIState sata;
 SDHCIState sdhci[XLNX_ZYNQMP_NUM_SDHCI];
 XilinxSPIPS spi[XLNX_ZYNQMP_NUM_SPIS];
+ARMGenTimer arm_gen_timer;
 XlnxDPState dp;
 XlnxDPDMAState dpdma;
 
-- 
2.7.4




[Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2016-11-07 Thread Ashish Mittal
Source code for the qnio library that this code loads can be downloaded from:
https://github.com/MittalAshish/libqnio.git

Sample command line using the JSON syntax:
./qemu-system-x86_64 -name instance-0008 -S -vnc 0.0.0.0:0 -k en-us
-vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
-msg timestamp=on
'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410",
"server":{"host":"172.172.17.4","port":""}}'

Sample command line using the URI syntax:
qemu-img convert -f raw -O raw -n
/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad
vxhs://192.168.0.1:/c6718f6b-0401-441d-a8c3-1f0064d75ee0

Signed-off-by: Ashish Mittal 
---
v6 changelog:
(1) Added qemu-iotests for VxHS as a new patch in the series.
(2) Replaced release version from 2.8 to 2.9 in block-core.json.

v5 changelog:
(1) Incorporated v4 review comments.

v4 changelog:
(1) Incorporated v3 review comments on QAPI changes.
(2) Added refcounting for device open/close.
Free library resources on last device close.

v3 changelog:
(1) Added QAPI schema for the VxHS driver.

v2 changelog:
(1) Changes done in response to v1 comments.

 block/Makefile.objs  |   2 +
 block/trace-events   |  21 ++
 block/vxhs.c | 689 +++
 configure|  41 +++
 qapi/block-core.json |  21 +-
 5 files changed, 772 insertions(+), 2 deletions(-)
 create mode 100644 block/vxhs.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 67a036a..58313a2 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
+block-obj-$(CONFIG_VXHS) += vxhs.o
 block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
@@ -38,6 +39,7 @@ rbd.o-cflags   := $(RBD_CFLAGS)
 rbd.o-libs := $(RBD_LIBS)
 gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
 gluster.o-libs := $(GLUSTERFS_LIBS)
+vxhs.o-libs:= $(VXHS_LIBS)
 ssh.o-cflags   := $(LIBSSH2_CFLAGS)
 ssh.o-libs := $(LIBSSH2_LIBS)
 archipelago.o-libs := $(ARCHIPELAGO_LIBS)
diff --git a/block/trace-events b/block/trace-events
index 882c903..efdd5ef 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -112,3 +112,24 @@ qed_aio_write_data(void *s, void *acb, int ret, uint64_t 
offset, size_t len) "s
 qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t 
offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
 qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, 
uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
 qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) 
"s %p acb %p ret %d offset %"PRIu64" len %zu"
+
+# block/vxhs.c
+vxhs_iio_callback(int error, int reason) "ctx is NULL: error %d, reason %d"
+vxhs_setup_qnio(void *s) "Context to HyperScale IO manager = %p"
+vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no i/o %d, 
%d"
+vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, errno %d"
+vxhs_open_fail(int ret) "Could not open the device. Error = %d"
+vxhs_open_epipe(int ret) "Could not create a pipe for device. Bailing out. 
Error=%d"
+vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d"
+vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, void 
*acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write = %d size = %lu 
offset = %lu ACB = %p. Error = %d, errno = %d"
+vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat ioctl 
failed, ret = %d, errno = %d"
+vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s stat 
ioctl returned size %lu"
+vxhs_qnio_iio_open(const char *ip) "Failed to connect to storage agent on 
host-ip %s"
+vxhs_qnio_iio_devopen(const char *fname) "Failed to open vdisk device: %s"
+vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %ld"
+vxhs_parse_uri_filename(const char *filename) "URI passed via 
bdrv_parse_filename %s"
+vxhs_qemu_init_vdisk(const char *vdisk_id) "vdisk-id from json %s"
+vxhs_parse_uri_hostinfo(int num, char *host, int port) "Host %d: IP %s, Port 
%d"
+vxhs_qemu_init(char *of_vsa_addr, int port) "Adding host %s:%d to 
BDRVVXHSState"
+vxhs_qemu_init_filename(const char *filename) "Filename passed as %s"
+vxhs_close(char *vdisk_guid) "Closing vdisk %s"
diff --git a/block/vxhs.c b/block/vxhs.c
new file mode 100644
index 000..8913e8f
--- /dev/null
+++ b/block/vxhs.c
@@ -0,0 +1,689 @@
+/*
+ * QEMU Block driver for Veritas HyperScale (VxHS)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include 
+#include "qapi/qmp/qerror.h"

[Qemu-devel] [PATCH v6 0/2] block/vxhs: Add Veritas HyperScale VxHS block device support

2016-11-07 Thread Ashish Mittal
- Veritas HyperScale block driver in QEMU is designed to provide an accelerated
  IO path from KVM virtual machines to Veritas HyperScale storage service.

- A network IO transfer library that translates block IO from HyperScale block
  driver to a network IO format to send it to Veritas HyperScale storage
  service. This library (libqnio) has been open sourced and is available on
  github here: https://github.com/MittalAshish/libqnio

Ashish Mittal (2):
  block/vxhs.c: Add support for a new block device type called "vxhs"
  block/vxhs.c: Add qemu-iotests for new block device type "vxhs"

 block/Makefile.objs  |   2 +
 block/trace-events   |  21 ++
 block/vxhs.c | 689 +++
 configure|  41 +++
 qapi/block-core.json |  21 +-
 tests/qemu-iotests/common|   6 +
 tests/qemu-iotests/common.config |  13 +
 tests/qemu-iotests/common.filter |   1 +
 tests/qemu-iotests/common.rc |  19 ++
 9 files changed, 811 insertions(+), 2 deletions(-)
 create mode 100644 block/vxhs.c

-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v1 1/2] arm_generic_timer: Add the ARM Generic Timer

2016-11-07 Thread Alistair Francis
On Thu, Nov 3, 2016 at 1:47 AM, KONRAD Frederic
 wrote:
>
>
> Le 02/11/2016 à 17:41, Alistair Francis a écrit :
>>
>> Add the ARM generic timer. This allows the guest to poll the timer for
>> values and also supports secure writes only.
>>
>> Signed-off-by: Alistair Francis 
>> ---
>>
>>  hw/timer/Makefile.objs   |   1 +
>>  hw/timer/arm_generic_timer.c | 216
>> +++
>>  include/hw/timer/arm_generic_timer.h |  60 ++
>>  3 files changed, 277 insertions(+)
>>  create mode 100644 hw/timer/arm_generic_timer.c
>>  create mode 100644 include/hw/timer/arm_generic_timer.h
>>
>> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
>> index 7ba8c23..f88c468 100644
>> --- a/hw/timer/Makefile.objs
>> +++ b/hw/timer/Makefile.objs
>> @@ -17,6 +17,7 @@ common-obj-$(CONFIG_IMX) += imx_epit.o
>>  common-obj-$(CONFIG_IMX) += imx_gpt.o
>>  common-obj-$(CONFIG_LM32) += lm32_timer.o
>>  common-obj-$(CONFIG_MILKYMIST) += milkymist-sysctl.o
>> +common-obj-$(CONFIG_XLNX_ZYNQMP) += arm_generic_timer.o
>>
>>  obj-$(CONFIG_EXYNOS4) += exynos4210_mct.o
>>  obj-$(CONFIG_EXYNOS4) += exynos4210_pwm.o
>> diff --git a/hw/timer/arm_generic_timer.c b/hw/timer/arm_generic_timer.c
>> new file mode 100644
>> index 000..8341e06
>> --- /dev/null
>> +++ b/hw/timer/arm_generic_timer.c
>> @@ -0,0 +1,216 @@
>> +/*
>> + * QEMU model of the ARM Generic Timer
>> + *
>> + * Copyright (c) 2016 Xilinx Inc.
>> + * Written by Alistair Francis 
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining
>> a copy
>> + * of this software and associated documentation files (the "Software"),
>> to deal
>> + * in the Software without restriction, including without limitation the
>> rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or
>> sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "hw/timer/arm_generic_timer.h"
>> +#include "qemu/timer.h"
>> +#include "qemu/log.h"
>> +
>> +#ifndef ARM_GEN_TIMER_ERR_DEBUG
>> +#define ARM_GEN_TIMER_ERR_DEBUG 0
>> +#endif
>> +
>> +static void counter_control_postw(RegisterInfo *reg, uint64_t val64)
>> +{
>> +ARMGenTimer *s = ARM_GEN_TIMER(reg->opaque);
>> +bool new_status = extract32(s->regs[R_COUNTER_CONTROL_REGISTER],
>> +R_COUNTER_CONTROL_REGISTER_EN_SHIFT,
>> +R_COUNTER_CONTROL_REGISTER_EN_LENGTH);
>> +uint64_t current_ticks;
>> +
>> +current_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>> + NANOSECONDS_PER_SECOND, 100);
>> +
>> +if ((s->enabled && !new_status) ||
>> +(!s->enabled && new_status)) {
>> +/* The timer is being disabled or enabled */
>> +s->tick_offset = current_ticks - s->tick_offset;
>> +}
>> +
>> +s->enabled = new_status;
>> +}
>> +
>> +static uint64_t couter_low_value_postr(RegisterInfo *reg, uint64_t val64)
>
>
> s/couter/counter ?
>
>> +{
>> +ARMGenTimer *s = ARM_GEN_TIMER(reg->opaque);
>> +uint64_t current_ticks, total_ticks;
>> +uint32_t low_ticks;
>> +
>> +if (s->enabled) {
>> +current_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>> + NANOSECONDS_PER_SECOND, 100);
>> +total_ticks = current_ticks - s->tick_offset;
>> +low_ticks = (uint32_t) total_ticks;
>> +} else {
>> +/* Timer is disabled, return the time when it was disabled */
>> +low_ticks = (uint32_t) s->tick_offset;
>> +}
>> +
>> +return low_ticks;
>> +}
>> +
>> +static uint64_t couter_high_value_postr(RegisterInfo *reg, uint64_t
>> val64)
>
>
> same here?

Thanks Fred, I'm sending out a V2 with this fixed.

Thanks,

Alistair

>
> Fred
>
>
>> +{
>> +ARMGenTimer *s = ARM_GEN_TIMER(reg->opaque);
>> +uint64_t current_ticks, total_ticks;
>> +uint32_t high_ticks;
>> +
>> +if (s->enabled) {
>> +current_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>> + NANOSECONDS_PER_SECOND, 100);
>> +  

[Qemu-devel] [PATCH v2 1/1] cadence_uart: Check baud rate generator and divider values on migration

2016-11-07 Thread Alistair Francis
The Cadence UART device emulator calculates speed by dividing the
baud rate by a 'baud rate generator' & 'baud rate divider' value.
The device specification defines these register values to be
non-zero and within certain limits. Checks were recently added when
writing to these registers but not when restoring from migration.

This patch adds checks when restoring from migration to avoid divide by
zero errors.
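
As a rough illustration of the kind of check described above, here is a standalone C sketch. The limits used (a 16-bit generator value of at least 1, an 8-bit divider value of at least 4) and the baud rate formula are assumptions made for this example only, and divisors_valid() and post_load_check() are hypothetical helpers, not functions from hw/char/cadence_uart.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits for this sketch only; consult the device specification. */
#define BRGR_MIN 1u
#define BRGR_MAX 0xFFFFu
#define BDIV_MIN 4u
#define BDIV_MAX 0xFFu

/* Return true when both divisor values are within the assumed limits. */
static bool divisors_valid(uint32_t brgr, uint32_t bdiv)
{
    return brgr >= BRGR_MIN && brgr <= BRGR_MAX &&
           bdiv >= BDIV_MIN && bdiv <= BDIV_MAX;
}

/*
 * Hypothetical post-load hook: reject the incoming state instead of
 * computing a baud rate from values that could be zero.
 * Returns 0 on success and nonzero to abort the migration.
 */
static int post_load_check(uint32_t brgr, uint32_t bdiv,
                           uint32_t clock_hz, uint32_t *baud_out)
{
    if (!divisors_valid(brgr, bdiv)) {
        return 1;
    }
    /* Assumed formula: clock / (generator * (divider + 1)). */
    *baud_out = clock_hz / (brgr * (bdiv + 1));
    return 0;
}
```

The point of validating before dividing is that migration data comes from outside the device model, so it cannot be trusted to respect the limits enforced on guest register writes.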

Reported-by: Huawei PSIRT 
Signed-off-by: Alistair Francis 
---
V2:
 - Abort the migration if the data is invalid

 hw/char/cadence_uart.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
index def34cd..9568ac6 100644
--- a/hw/char/cadence_uart.c
+++ b/hw/char/cadence_uart.c
@@ -487,6 +487,13 @@ static int cadence_uart_post_load(void *opaque, int 
version_id)
 {
 CadenceUARTState *s = opaque;
 
+/* Ensure these two aren't invalid numbers */
+if (s->r[R_BRGR] <= 1 || s->r[R_BRGR] & 0x ||
+s->r[R_BDIV] <= 3 || s->r[R_BDIV] & 0xFF) {
+/* Value is invalid, abort */
+return 1;
+}
+
 uart_parameters_setup(s);
 uart_update_status(s);
 return 0;
-- 
2.7.4




Re: [Qemu-devel] [PATCH] Document how x86 gdb_num_core_regs is computed.

2016-11-07 Thread Paolo Bonzini


On 03/11/2016 22:48, Doug Evans wrote:
> Hi.
> 
> It helps when reading the code to see how the number is arrived at.
> 
> Signed-off-by: Doug Evans 
> ---
>  target-i386/cpu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 14c5186..01f1ab0 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3721,6 +3721,8 @@ static void x86_cpu_common_class_init(ObjectClass
> *oc, void *data)
>  cc->write_elf32_qemunote = x86_cpu_write_elf32_qemunote;
>  cc->vmsd = &vmstate_x86_cpu;
>  #endif
> +/* CPU_NB_REGS * 2 = general regs + xmm regs
> +   25 = eip, eflags, 6 seg regs, st[0-7], fctrl,...,fop, mxcsr */
>  cc->gdb_num_core_regs = CPU_NB_REGS * 2 + 25;
>  #ifndef CONFIG_USER_ONLY
>  cc->debug_excp_handler = breakpoint_handler;
> -- 
> 
> 

Queued for 2.8, thanks.

Paolo
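
For readers checking the arithmetic in the comment, a small C sketch of the count follows. The group sizes come from the comment itself, except that the eight FPU control registers (fctrl through fop) are counted according to the usual gdb i386 layout, and the CPU_NB_REGS values of 8 (32-bit) and 16 (64-bit) are assumptions of this example. The names below are hypothetical.

```c
/* Register groups named in the comment above. */
enum {
    N_EIP      = 1,
    N_EFLAGS   = 1,
    N_SEG      = 6,   /* segment registers */
    N_ST       = 8,   /* st[0-7] */
    N_FPU_CTRL = 8,   /* fctrl ... fop (assumed gdb layout) */
    N_MXCSR    = 1,
    N_FIXED    = N_EIP + N_EFLAGS + N_SEG + N_ST + N_FPU_CTRL + N_MXCSR
};

/* General registers plus xmm registers, plus the fixed groups. */
static int num_core_regs(int cpu_nb_regs)
{
    return cpu_nb_regs * 2 + N_FIXED;
}
```

With these assumptions the fixed part sums to 25, which matches the constant in the patch.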



Re: [Qemu-devel] [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save

2016-11-07 Thread Marcelo Tosatti
On Mon, Nov 07, 2016 at 08:03:50PM +, Dr. David Alan Gilbert wrote:
> * Marcelo Tosatti (mtosa...@redhat.com) wrote:
> > On Mon, Nov 07, 2016 at 03:46:11PM +, Dr. David Alan Gilbert wrote:
> > > * Marcelo Tosatti (mtosa...@redhat.com) wrote:
> > > > This patch, relative to pre-copy migration codepath,
> > > > measures the time between vm_stop() and pre_save(),
> > > > which includes copying the remaining RAM to destination,
> > > > and advances the clock by that amount.
> > > > 
> > > > In a VM with 5 seconds downtime, this reduces the guest
> > > > clock difference on destination from 5s to 0.2s.
> > > > 
> > > > Tested with Linux and Windows 2012 R2 guests with -cpu XXX,+hv-time.
> > > 
> > > One thing that bothers me is that it's only this clock that's
> > > getting corrected; doesn't it cause things to get upset when
> > > one clock moves and the others don't?
> > 
> > If you are correlating the clocks, then yes.
> > 
> > Older Linux guests get upset (marking the TSC clocksource unstable
> > because the watchdog checks TSC vs kvmclock), but there is a workaround for 
> > it 
> > in newer guests
> > (kvmclock interface to notify watchdog to not complain).
> > 
> > Note marking TSC clocksource unstable on older guests is harmless
> > because kvmclock is the standard clocksource.
> > 
> > For Windows guests, i don't know that Windows correlates between different
> > clocks.
> > 
> > That is, there is relative control as to which software reads kvmclock 
> > or Windows TIMER MSR, so i don't see the need to advance every clock 
> > exposed.
> > 
> > > Shouldn't the pause delay be recorded somewhere architecturally
> > > independent and then be a thing that kvm-clock happens to use and
> > > other clocks might as well?
> > 
> > In theory, yes. In practice, i don't see the need for this... 
> 
> It seems unlikely to me that x86 is the only one that will want
> to do something similar.

Can't they copy what kvmclock is doing today? 
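
For reference, the adjustment discussed in this thread can be sketched in a few lines of C. This is only an illustration under stated assumptions: ClockSketch, on_vm_stop() and on_pre_save() are hypothetical names, and the real kvmclock code reads its timestamps from QEMU and KVM rather than taking them as parameters.

```c
#include <stdint.h>

/* Hypothetical state for this sketch; not QEMU's kvmclock state. */
typedef struct ClockSketch {
    uint64_t guest_clock_ns;  /* guest clock value captured at vm_stop */
    int64_t  stop_host_ns;    /* host monotonic time at vm_stop */
} ClockSketch;

/* Record the guest clock and the host time when the VM is stopped. */
static void on_vm_stop(ClockSketch *s, uint64_t guest_clock_ns,
                       int64_t host_now_ns)
{
    s->guest_clock_ns = guest_clock_ns;
    s->stop_host_ns = host_now_ns;
}

/*
 * At pre_save, return the clock value to migrate: the captured value
 * advanced by the host time that passed since vm_stop.
 */
static uint64_t on_pre_save(const ClockSketch *s, int64_t host_now_ns)
{
    int64_t delta = host_now_ns - s->stop_host_ns;

    if (delta < 0) {
        delta = 0;  /* defensive: never move the guest clock backwards */
    }
    return s->guest_clock_ns + (uint64_t)delta;
}
```

Keeping the elapsed-time measurement separate from the clock that consumes it is one way to let other clocks reuse the same delta, which is the concern raised in the quoted text.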




[Qemu-devel] [QEMU PATCH v12 4/4] migration: add error_report

2016-11-07 Thread Jianjun Duan
Added error_report where version_ids do not match in vmstate_load_state.

Signed-off-by: Jianjun Duan 
---
 migration/vmstate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 2f9d4ba..0e6fce4 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -85,6 +85,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 
 trace_vmstate_load_state(vmsd->name, version_id);
 if (version_id > vmsd->version_id) {
+error_report("%s %s",  vmsd->name, "too new");
 trace_vmstate_load_state_end(vmsd->name, "too new", -EINVAL);
 return -EINVAL;
 }
@@ -95,6 +96,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 trace_vmstate_load_state_end(vmsd->name, "old path", ret);
 return ret;
 }
+error_report("%s %s",  vmsd->name, "too old");
 trace_vmstate_load_state_end(vmsd->name, "too old", -EINVAL);
 return -EINVAL;
 }
-- 
1.9.1




[Qemu-devel] [QEMU PATCH v12 2/4] migration: migrate QTAILQ

2016-11-07 Thread Jianjun Duan
Currently we cannot directly transfer a QTAILQ instance because of a
limitation in the migration code. Here we introduce an approach to
transfer such structures. We created a VMStateInfo, vmstate_info_qtailq,
for QTAILQ. A similar VMStateInfo can be created for other data structures
such as lists.

When a QTAILQ is migrated from source to target, it is appended to the
corresponding QTAILQ structure, which is assumed to have been properly
initialized.

This approach will be used to transfer pending_events and ccs_list in spapr
state.

We also create some macros in qemu/queue.h to access a QTAILQ using pointer
arithmetic. This ensures that the migration code does not depend on the
implementation details of QTAILQ.
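
To make the pointer-arithmetic idea concrete, here is a self-contained C sketch. It does not use qemu/queue.h: struct raw_head, struct raw_entry and the RAW_* macros are hypothetical stand-ins that assume a classic BSD-style tail queue layout, so treat it as an illustration of the technique rather than the macros added by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for a tail queue entry and head (assumed layout). */
struct raw_entry {
    void *tqe_next;    /* next element, or NULL at the tail */
    void **tqe_prev;   /* address of the previous next pointer */
};

struct raw_head {
    void *tqh_first;   /* first element, or NULL if empty */
    void **tqh_last;   /* address of the last next pointer */
};

/* Reinterpret the bytes at base + offset as the given pointer type. */
#define FIELD_AT_OFFSET(base, offset, type) \
    ((type)(((char *)(base)) + (offset)))

#define RAW_FIRST(head) \
    (*FIELD_AT_OFFSET(head, offsetof(struct raw_head, tqh_first), void **))
#define RAW_NEXT(elm, entry) \
    (*FIELD_AT_OFFSET(elm, (entry) + offsetof(struct raw_entry, tqe_next), \
                      void **))

/* Example element; the generic code below only learns its entry offset. */
struct item {
    uint8_t value;
    struct raw_entry link;
};

static void raw_init(struct raw_head *head)
{
    head->tqh_first = NULL;
    head->tqh_last = &head->tqh_first;
}

/* Append elm, knowing only the byte offset of its entry field. */
static void raw_insert_tail(struct raw_head *head, void *elm, size_t entry)
{
    struct raw_entry *e = FIELD_AT_OFFSET(elm, entry, struct raw_entry *);

    e->tqe_next = NULL;
    e->tqe_prev = head->tqh_last;
    *head->tqh_last = elm;
    head->tqh_last = &e->tqe_next;
}

/* Walk the queue without naming the element type. */
static size_t raw_count(struct raw_head *head, size_t entry)
{
    size_t n = 0;

    for (void *elm = RAW_FIRST(head); elm; elm = RAW_NEXT(elm, entry)) {
        n++;
    }
    return n;
}
```

Generic code of this shape only needs the element size and the offset of the entry field, which is the information a VMState field description can carry.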

Signed-off-by: Jianjun Duan 
---
 include/migration/vmstate.h | 20 +
 include/qemu/queue.h| 60 +++
 migration/trace-events  |  4 +++
 migration/vmstate.c | 69 +
 4 files changed, 153 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index eafc8f2..6289327 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -253,6 +253,7 @@ extern const VMStateInfo vmstate_info_timer;
 extern const VMStateInfo vmstate_info_buffer;
 extern const VMStateInfo vmstate_info_unused_buffer;
 extern const VMStateInfo vmstate_info_bitmap;
+extern const VMStateInfo vmstate_info_qtailq;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
@@ -664,6 +665,25 @@ extern const VMStateInfo vmstate_info_bitmap;
 .offset   = offsetof(_state, _field),\
 }
 
+/* For a QTAILQ that needs customized handling.
+ * The target QTAILQ needs to be properly initialized.
+ * _type: type of QTAILQ element
+ * _next: name of QTAILQ entry field in QTAILQ element
+ * _vmsd: VMSD for QTAILQ element
+ * size: size of QTAILQ element
+ * start: offset of QTAILQ entry in QTAILQ element
+ */
+#define VMSTATE_QTAILQ_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = &vmstate_info_qtailq,\
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 342073f..75616e1 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -438,4 +438,64 @@ struct {   
 \
 #define QTAILQ_PREV(elm, headname, field) \
 (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
 
+#define field_at_offset(base, offset, type)
\
+((type) (((char *) (base)) + (offset)))
+
+typedef struct DUMMY_Q_ENTRY DUMMY_Q_ENTRY;
+typedef struct DUMMY_Q DUMMY_Q;
+
+struct DUMMY_Q_ENTRY {
+QTAILQ_ENTRY(DUMMY_Q_ENTRY) next;
+};
+
+struct DUMMY_Q {
+QTAILQ_HEAD(DUMMY_Q_HEAD, DUMMY_Q_ENTRY) head;
+};
+
+#define dummy_q ((DUMMY_Q *) 0)
+#define dummy_qe ((DUMMY_Q_ENTRY *) 0)
+
+/*
+ * Offsets of layout of a tail queue head.
+ */
+#define QTAILQ_FIRST_OFFSET (offsetof(typeof(dummy_q->head), tqh_first))
+#define QTAILQ_LAST_OFFSET  (offsetof(typeof(dummy_q->head), tqh_last))
+/*
+ * Raw access of elements of a tail queue
+ */
+#define QTAILQ_RAW_FIRST(head) 
\
+(*field_at_offset(head, QTAILQ_FIRST_OFFSET, void **))
+#define QTAILQ_RAW_LAST(head)  
\
+(*field_at_offset(head, QTAILQ_LAST_OFFSET, void ***))
+
+/*
+ * Offsets of layout of a tail queue element.
+ */
+#define QTAILQ_NEXT_OFFSET (offsetof(typeof(dummy_qe->next), tqe_next))
+#define QTAILQ_PREV_OFFSET (offsetof(typeof(dummy_qe->next), tqe_prev))
+
+/*
+ * Raw access of elements of a tail entry
+ */
+#define QTAILQ_RAW_NEXT(elm, entry)
\
+(*field_at_offset(elm, entry + QTAILQ_NEXT_OFFSET, void **))
+#define QTAILQ_RAW_PREV(elm, entry)
\
+(*field_at_offset(elm, entry + QTAILQ_PREV_OFFSET, void ***))
+/*
+ * Tail queue traversal using pointer arithmetic.
+ */
+#define QTAILQ_RAW_FOREACH(elm, head, entry)   
\
+for ((elm) = QTAILQ_RAW_FIRST(head);   
\
+ (elm); 

[Qemu-devel] [QEMU PATCH v12 3/4] tests/migration: Add test for QTAILQ migration

2016-11-07 Thread Jianjun Duan
Add a test for QTAILQ migration to tests/test-vmstate.c.

Signed-off-by: Jianjun Duan 
---
 tests/test-vmstate.c | 160 +++
 1 file changed, 160 insertions(+)

diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index d8da26f..a992408 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -475,6 +475,164 @@ static void test_load_skip(void)
 qemu_fclose(loading);
 }
 
+
+/* test QTAILQ migration */
+typedef struct TestQtailqElement TestQtailqElement;
+
+struct TestQtailqElement {
+bool b;
+uint8_t  u8;
+QTAILQ_ENTRY(TestQtailqElement) next;
+};
+
+typedef struct TestQtailq {
+int16_t  i16;
+QTAILQ_HEAD(TestQtailqHead, TestQtailqElement) q;
+int32_t  i32;
+} TestQtailq;
+
+static const VMStateDescription vmstate_q_element = {
+.name = "test/queue-element",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(b, TestQtailqElement),
+VMSTATE_UINT8(u8, TestQtailqElement),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static const VMStateDescription vmstate_q = {
+.name = "test/queue",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_INT16(i16, TestQtailq),
+VMSTATE_QTAILQ_V(q, TestQtailq, 1, vmstate_q_element, 
TestQtailqElement,
+ next),
+VMSTATE_INT32(i32, TestQtailq),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void test_save_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+save_vmstate(&vmstate_q, &obj_q);
+compare_vmstate(wire_q, sizeof(wire_q));
+}
+
+static void test_load_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+QEMUFile *fsave = open_test_file(true);
+
+qemu_put_buffer(fsave, wire_q, sizeof(wire_q));
+qemu_put_byte(fsave, QEMU_VM_EOF);
+g_assert(!qemu_file_get_error(fsave));
+qemu_fclose(fsave);
+
+QEMUFile *fload = open_test_file(false);
+TestQtailq tgt;
+
+QTAILQ_INIT(&tgt.q);
+vmstate_load_state(fload, &vmstate_q, &tgt, 1);
+char eof = qemu_get_byte(fload);
+g_assert(!qemu_file_get_error(fload));
+g_assert_cmpint(tgt.i16, ==, obj_q.i16);
+g_assert_cmpint(tgt.i32, ==, obj_q.i32);
+g_assert_cmpint(eof, ==, QEMU_VM_EOF);
+
+TestQtailqElement *qele_from = QTAILQ_FIRST(&obj_q.q);
+TestQtailqElement *qlast_from = QTAILQ_LAST(&obj_q.q, TestQtailqHead);
+TestQtailqElement *qele_to = QTAILQ_FIRST(&tgt.q);
+TestQtailqElement *qlast_to = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+
+while (1) {
+g_assert_cmpint(qele_to->b, ==, qele_from->b);
+g_assert_cmpint(qele_to->u8, ==, qele_from->u8);
+if ((qele_from == qlast_from) || (qele_to == qlast_to)) {
+break;
+}
+qele_from = QTAILQ_NEXT(qele_from, next);
+qele_to = QTAILQ_NEXT(qele_to, next);
+}
+
+g_assert_cmpint((uint64_t) qele_from, ==, (uint64_t) qlast_from);
+g_assert_cmpint((uint64_t) qele_to, ==, (uint64_t) qlast_to);
+
+/* clean up */
+TestQtailqElement *qele;
+while (!QTAILQ_EMPTY(&tgt.q)) {
+qele = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+QTAILQ_REMOVE(&tgt.q, qele, next);
+free(qele);
+qele = NULL;
+}
+qemu_fclose(fload);
+}
+
 int main(int argc, char **argv)
 {
 temp_fd = mkstemp(temp_file);
@@ 

[Qemu-devel] [QEMU PATCH v12 1/4] migration: extend VMStateInfo

2016-11-07 Thread Jianjun Duan
The current migration code cannot handle some data structures, such as
QTAILQ in qemu/queue.h. Here we extend the signatures of put/get
in VMStateInfo so that customized handling is supported. put now
returns int.
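
The shape of the extended callbacks can be sketched in standalone C as follows. Stream, FieldDesc, JsonDesc and InfoSketch are hypothetical stand-ins for QEMUFile, VMStateField, QJSON and VMStateInfo, so this only shows the calling convention (get and put receive the field description, and put returns int), not the QEMU definitions themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMUFile, VMStateField and QJSON. */
typedef struct Stream {
    uint8_t buf[16];
    size_t  len;   /* bytes written so far */
    size_t  pos;   /* read cursor */
} Stream;
typedef struct FieldDesc { const char *name; } FieldDesc;
typedef struct JsonDesc JsonDesc;

/* Handler table in the spirit of VMStateInfo. */
typedef struct InfoSketch {
    const char *name;
    int (*get)(Stream *f, void *pv, size_t size, FieldDesc *field);
    int (*put)(Stream *f, void *pv, size_t size, FieldDesc *field,
               JsonDesc *vmdesc);
} InfoSketch;

/* Write one byte; the int return lets the caller see a failure. */
static int put_u8(Stream *f, void *pv, size_t size, FieldDesc *field,
                  JsonDesc *vmdesc)
{
    (void)size; (void)field; (void)vmdesc;
    if (f->len >= sizeof(f->buf)) {
        return -1;
    }
    f->buf[f->len++] = *(uint8_t *)pv;
    return 0;
}

/* Read one byte back, failing when the stream is exhausted. */
static int get_u8(Stream *f, void *pv, size_t size, FieldDesc *field)
{
    (void)size; (void)field;
    if (f->pos >= f->len) {
        return -1;
    }
    *(uint8_t *)pv = f->buf[f->pos++];
    return 0;
}

static const InfoSketch info_u8 = {
    .name = "uint8",
    .get  = get_u8,
    .put  = put_u8,
};
```

Simple handlers such as this one ignore the extra arguments; a handler for a container type would use the field description to find the element size and entry offset.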

Signed-off-by: Jianjun Duan 
---
 hw/display/virtio-gpu.c |   8 +++-
 hw/intc/s390_flic_kvm.c |   8 +++-
 hw/net/vmxnet3.c|  24 +++---
 hw/nvram/eeprom93xx.c   |   8 +++-
 hw/nvram/fw_cfg.c   |   8 +++-
 hw/pci/msix.c   |   8 +++-
 hw/pci/pci.c|  16 +--
 hw/pci/shpc.c   |   7 ++-
 hw/scsi/scsi-bus.c  |   8 +++-
 hw/timer/twl92230.c |   8 +++-
 hw/usb/redirect.c   |  24 +++---
 hw/virtio/virtio-pci.c  |   8 +++-
 hw/virtio/virtio.c  |  15 --
 include/migration/vmstate.h |  19 ++--
 migration/savevm.c  |   7 ++-
 migration/vmstate.c | 113 +---
 target-alpha/machine.c  |   6 ++-
 target-arm/machine.c|  14 --
 target-i386/machine.c   |  26 +++---
 target-mips/machine.c   |  14 --
 target-ppc/machine.c|  12 +++--
 target-sparc/machine.c  |   6 ++-
 22 files changed, 262 insertions(+), 105 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 60bce94..c58fa1b 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -988,7 +988,8 @@ static const VMStateDescription vmstate_virtio_gpu_scanouts 
= {
 },
 };
 
-static void virtio_gpu_save(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_save(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field, QJSON *vmdesc)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
@@ -1013,9 +1014,12 @@ static void virtio_gpu_save(QEMUFile *f, void *opaque, 
size_t size)
 qemu_put_be32(f, 0); /* end of list */
 
 vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
+
+return 0;
 }
 
-static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index 21ac2e2..61f512f 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -286,7 +286,8 @@ static void kvm_s390_release_adapter_routes(S390FLICState 
*fs,
  * increase until buffer is sufficient or maxium size is
  * reached
  */
-static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field, QJSON *vmdesc)
 {
 KVMS390FLICState *flic = opaque;
 int len = FLIC_SAVE_INITIAL_SIZE;
@@ -319,6 +320,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t 
size)
 count * sizeof(struct kvm_s390_irq));
 }
 g_free(buf);
+
+return 0;
 }
 
 /**
@@ -331,7 +334,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t 
size)
  * Note: Do nothing when no interrupts where stored
  * in QEMUFile
  */
-static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field)
 {
 uint64_t len = 0;
 uint64_t count = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 92f6af9..4163ca8 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2451,7 +2451,8 @@ static void vmxnet3_put_tx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, tx_stat->pktsTxDiscard);
 }
 
-static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2465,7 +2466,8 @@ static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, 
size_t size)
 return 0;
 }
 
-static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size,
+ VMStateField *field, QJSON *vmdesc)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2474,6 +2476,8 @@ static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, 
size_t size)
 qemu_put_byte(f, r->intr_idx);
 qemu_put_be64(f, r->tx_stats_pa);
 vmxnet3_put_tx_stats_to_file(f, &r->txq_stats);
+
+return 0;
 }
 
 static const VMStateInfo txq_descr_info = {
@@ -2512,7 +2516,8 @@ static void vmxnet3_put_rx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, rx_stat->pktsRxError);
 }
 
-static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3RxqDescr *r = pv;
 int i;
@@ -2530,7 +2535,8 @@ static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, 
size_t size)
  

[Qemu-devel] [QEMU PATCH v12 0/4] migration: migrate QTAILQ

2016-11-07 Thread Jianjun Duan
Hi all,

I addressed some review comments. For QTAILQ, I hope we can reach a
compromise since there is more than one way to do it. Some work, such as
the ability to initialize a newly allocated QTAILQ element with default
values, is better done later in a separate series.
Comments are welcome.

v12: - Fixed return type for put_qtailq which caused build break.

Previous versions are:

v11: - Split error_report statements into a separate patch.
 - Changed the signature of put. It now returns int type.
 - Minor changes to QTAILQ macros. 
 
v10: - Fixed a typo.
(http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg01206.html)

v9: - No more hard encoding of QTAILQ layout information
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg01042.html)

v8: - Fixed a style issue. 
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00874.html)

v7: - Fixed merge errors.
- Simplified macro definitions related to pointer arithmetic based QTAILQ 
access.
- Added test case for QTAILQ migration in tests/test-vmstate.c.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00711.html)


v6: - Split from Power specific patches. 
- Dropped VMS_LINKED flag.
- Rebased to master.
- Added comments to clarify about put/get in VMStateInfo.  
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00336.html)

v5: - Rebased to David's ppc-for-2.8. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg00270.html)

v4: - Introduce a way to set customized instance_id in SaveStateEntry. Use it
  to set instance_id for DRC using its unique index to address David 
  Gibson's concern.
- Rename VMS_CSTM to VMS_LINKED based on Paolo Bonzini's suggestions.
- Clean up qjson stuff in put_qtailq. 
- Add trace for put_qtailq and get_qtailq based on David Gilbert's 
  suggestion.
- Based on David's ppc-for-2.7. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07720.html)

v3: - Simplify overall design following discussion with Paolo. No longer need
  metadata to migrate QTAILQ.
- Extend VMStateInfo instead of adding similar fields to VMStateField.
- Clean up macros in qemu/queue.h.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg05695.html)

v2: - Introduce a general approach to migrate QTAILQ in qemu/queue.h.
- Migrate signalled field in the DRC state.
- Put the newly added migrating fields in subsections so that backward 
  migration is not broken.  
- Set detach_cb field right after migration so that a migrated hot-unplug
  event could finish its course.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04188.html)

v1: - Initial version.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02601.html)

Jianjun Duan (4):
  migration: extend VMStateInfo
  migration: migrate QTAILQ
  tests/migration: Add test for QTAILQ migration
  migration: add error_report

 hw/display/virtio-gpu.c |   8 +-
 hw/intc/s390_flic_kvm.c |   8 +-
 hw/net/vmxnet3.c|  24 --
 hw/nvram/eeprom93xx.c   |   8 +-
 hw/nvram/fw_cfg.c   |   8 +-
 hw/pci/msix.c   |   8 +-
 hw/pci/pci.c|  16 +++-
 hw/pci/shpc.c   |   7 +-
 hw/scsi/scsi-bus.c  |   8 +-
 hw/timer/twl92230.c |   8 +-
 hw/usb/redirect.c   |  24 --
 hw/virtio/virtio-pci.c  |   8 +-
 hw/virtio/virtio.c  |  15 +++-
 include/migration/vmstate.h |  39 --
 include/qemu/queue.h|  60 +++
 migration/savevm.c  |   7 +-
 migration/trace-events  |   4 +
 migration/vmstate.c | 184 +++-
 target-alpha/machine.c  |   6 +-
 target-arm/machine.c|  14 +++-
 target-i386/machine.c   |  26 +--
 target-mips/machine.c   |  14 +++-
 target-ppc/machine.c|  12 ++-
 target-sparc/machine.c  |   6 +-
 tests/test-vmstate.c| 160 ++
 25 files changed, 577 insertions(+), 105 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [PATCH v11 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-07 Thread Alex Williamson
On Sat, 5 Nov 2016 02:40:46 +0530
Kirti Wankhede  wrote:

> Add a notifier callback to the parent's ops structure of the mdev device so
> that a per-device notifier for the vfio module is registered through the
> vfio_mdev module.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
> ---
>  drivers/vfio/mdev/vfio_mdev.c | 19 +++
>  include/linux/mdev.h  |  9 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index bb534d19e321..2b7c24aa9e46 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -24,6 +24,15 @@
>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>  
> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long 
> action,
> +   void *data)
> +{
> + struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
> + struct parent_device *parent = mdev->parent;
> +
> + return parent->ops->notifier(mdev, action, data);
> +}
> +
>  static int vfio_mdev_open(void *device_data)
>  {
>   struct mdev_device *mdev = device_data;
> @@ -40,6 +49,11 @@ static int vfio_mdev_open(void *device_data)
>   if (ret)
>   module_put(THIS_MODULE);
>  
> + if (likely(parent->ops->notifier)) {
> + mdev->nb.notifier_call = vfio_mdev_notifier;
> + if (vfio_register_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to register notifier for mdev\n");
> + }
>   return ret;
>  }
>  
> @@ -48,6 +62,11 @@ static void vfio_mdev_release(void *device_data)
>   struct mdev_device *mdev = device_data;
>   struct parent_device *parent = mdev->parent;
>  
> + if (likely(parent->ops->notifier)) {
> + if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to unregister notifier for mdev\n");
> + }
> +

Ok, I guess this is sufficient to automatically handle the unregister
at the mdev layer.  No need for my comments on the previous other
than the ordering of when the callback is called.  Thanks,

Alex

>   if (likely(parent->ops->release))
>   parent->ops->release(mdev);
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 0352febc1944..2999ef0ddaed 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -37,6 +37,7 @@ struct mdev_device {
>   struct kref ref;
>   struct list_headnext;
>   struct kobject  *type_kobj;
> + struct notifier_block   nb;
>  };
>  
>  
> @@ -84,6 +85,12 @@ struct mdev_device {
>   *   @cmd: mediated device structure
>   *   @arg: mediated device structure
>   * @mmap:mmap callback
> + *   @mdev: mediated device structure
> + *   @vma: vma structure
> + * @notifier: Notifier callback
> + *   @mdev: mediated device structure
> + *   @action: Action for which notifier is called
> + *   @data: Data associated with the notifier
>   * Parent device that support mediated device should be registered with mdev
>   * module with parent_ops structure.
>   **/
> @@ -105,6 +112,8 @@ struct parent_ops {
>   ssize_t (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
>unsigned long arg);
>   int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
> + int (*notifier)(struct mdev_device *mdev, unsigned long action,
> + void *data);
>  };
>  
>  /* interface for exporting mdev supported type attributes */
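
The forwarding pattern used by vfio_mdev_notifier() in the quoted patch relies on container_of() to get from the embedded notifier block back to its device. A userspace C sketch of that pattern follows; every type and function name in it is a hypothetical stand-in, and the container_of() shown is the simple portable form rather than the kernel macro.

```c
#include <stddef.h>

/* Portable container_of; the kernel version adds type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct notifier_block_sketch {
    int (*notifier_call)(struct notifier_block_sketch *nb,
                         unsigned long action, void *data);
};

struct device_sketch;

struct ops_sketch {
    int (*notifier)(struct device_sketch *dev, unsigned long action,
                    void *data);
};

struct device_sketch {
    int id;
    const struct ops_sketch *ops;
    struct notifier_block_sketch nb;   /* embedded, one per device */
};

/* Recover the device from its embedded block and call the parent op. */
static int forward_to_parent(struct notifier_block_sketch *nb,
                             unsigned long action, void *data)
{
    struct device_sketch *dev = container_of(nb, struct device_sketch, nb);

    if (!dev->ops || !dev->ops->notifier) {
        return 0;   /* the callback is optional */
    }
    return dev->ops->notifier(dev, action, data);
}

/* Example parent callback used only to show the call path. */
static int example_parent_notifier(struct device_sketch *dev,
                                   unsigned long action, void *data)
{
    (void)data;
    return dev->id + (int)action;
}
```

Because the block is embedded in the device structure, no lookup table is needed to find the device when the notifier fires.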




[Qemu-devel] [Bug 732155] Re: system_reset doesn't work with qemu-kvm and latest SeaBIOS

2016-11-07 Thread Matthew Bloch
Hi Thomas, thanks for the triage.  I'm a few years past working on this
project directly so if it's not affecting anyone else I'd probably just
close this bug.

https://bugs.launchpad.net/bugs/732155

Title:
  system_reset doesn't work with qemu-kvm and latest SeaBIOS

Status in QEMU:
  Incomplete

Bug description:
  I've built qemu-kvm and seabios from the latest git sources, and found
  that the system_reset monitor command causes a freeze if I start qemu-
  system-x86_64 with the -no-kvm flag.  This is a serial log from an
  attempt at rebooting:

  $ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio -bios ../seabios/out/bios.bin -serial /dev/stdout -no-kvm
  QEMU 0.14.50 monitor - type 'help' for more information
  (qemu) Changing serial settings was 0/0 now 3/0
  Start bios (version pre-0.6.3-20110309_171929-desk4)
  Ram Size=0x0800 (0x high)
  CPU Mhz=2202
  PCI: pci_bios_init_bus_rec bus = 0x0
  PIIX3/PIIX4 init: elcr=00 0c
  PCI: bus=0 devfn=0x00: vendor_id=0x8086 device_id=0x1237
  PCI: bus=0 devfn=0x08: vendor_id=0x8086 device_id=0x7000
  PCI: bus=0 devfn=0x09: vendor_id=0x8086 device_id=0x7010
  region 4: 0xc000
  PCI: bus=0 devfn=0x0b: vendor_id=0x8086 device_id=0x7113
  PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8
  region 0: 0xf000
  region 1: 0xf200
  region 6: 0xf201
  PCI: bus=0 devfn=0x18: vendor_id=0x10ec device_id=0x8139
  region 0: 0xc100
  region 1: 0xf202
  region 6: 0xf203
  Found 1 cpu(s) max supported 1 cpu(s)
  MP table addr=0x000fdb40 MPC table addr=0x000fdb50 size=224
  SMBIOS ptr=0x000fdb20 table=0x07fffef0
  ACPI tables: RSDP=0x000fdaf0 RSDT=0x07ffd6a0
  Scan for VGA option rom
  Running option rom at c000:0003
  Turning on vga text mode console
  SeaBIOS (version pre-0.6.3-20110309_171929-desk4)

  PS2 keyboard initialized
  Found 1 lpt ports
  Found 1 serial ports
  ATA controller 0 at 1f0/3f4/0 (irq 14 dev 9)
  ATA controller 1 at 170/374/0 (irq 15 dev 9)
  DVD/CD [ata1-0: QEMU DVD-ROM ATAPI-4 DVD/CD]
  Searching bootorder for: /pci@i0cf8/*@1,1/drive@1/disk@0
  Scan for option roms
  Running option rom at c900:0003
  pnp call arg1=60
  pmm call arg1=0
  pmm call arg1=2
  pmm call arg1=0
  Searching bootorder for: /pci@i0cf8/*@3
  Searching bootorder for: /rom@genroms/vapic.bin
  Running option rom at c980:0003
  ebda moved from 9fc00 to 9f400
  Returned 53248 bytes of ZoneHigh
  e820 map has 6 items:
0:  - 0009f400 = 1
1: 0009f400 - 000a = 2
2: 000f - 0010 = 2
3: 0010 - 07ffd000 = 1
4: 07ffd000 - 0800 = 2
5: fffc - 0001 = 2
  enter handle_19:
NULL
  Booting from DVD/CD...
  Device reports MEDIUM NOT PRESENT
  atapi_is_ready returned -1
  Boot failed: Could not read from CDROM (code 0003)
  enter handle_18:
NULL
  Booting from ROM...
  Booting from c900:0336

  (qemu) 
  (qemu) system_reset
  (qemu) RESET REQUESTEDChanging serial settings was 0/0 now 3/0
  Start bios (version pre-0.6.3-20110309_171929-desk4)
  Attempting a hard reboot
  prep_reset
  apm_shutdown?
  i8042_reboot
  i8042: wait to write...
  i8042: outb
  RESET REQUESTED
  (qemu) 
  (qemu) 
  (qemu) 
  (qemu) info cpus
  * CPU #0: pc=0xfff0 thread_id=18125 
  (qemu) system_reset
  (qemu) RESET REQUESTED
  (qemu) 
  (qemu) q

  I've tried fiddling a few build options in SeaBIOS but I'm not sure
  that's where the issue lies.  The RESET REQUESTED is me adding some
  extra debug to vl.c:1477 in the clause that tests for a reset request,
  and the i8042: lines are debug lines from seabios tracing the
  execution of the reset request.

  This may be a bug in SeaBIOS of course, since I can replicate the
  behaviour on my distro's qemu and kvm packages.  However it seems odd
  that qemu behaves differently with KVM turned on (i.e. system_reset
  works) than with it disabled.




Re: [Qemu-devel] [V3,6/7] nios2: Add Altera 10M50 GHRD emulation

2016-11-07 Thread Guenter Roeck
On Tue, Oct 18, 2016 at 11:50:30PM +0200, Marek Vasut wrote:
> Add the Altera 10M50 Nios2 GHRD model. This allows emulating the
> 10M50 development kit with the Nios2 GHRD loaded in the FPGA. It
> is possible to boot Linux kernel and run userspace, thus far only
> from initrd as storage support is not yet implemented.
> 
> Signed-off-by: Marek Vasut 
> Cc: Chris Wulff 
> Cc: Jeff Da Silva 
> Cc: Ley Foon Tan 
> Cc: Sandra Loosemore 
> Cc: Yves Vandervennet 
> ---
> V3: Checkpatch cleanup, move cpu_pic.c here
> ---
>  hw/nios2/10m50_devboard.c | 126 ++
>  hw/nios2/Makefile.objs|   1 +
>  hw/nios2/boot.c   | 223 
> ++
>  hw/nios2/boot.h   |  11 +++
>  hw/nios2/cpu_pic.c|  70 +++
>  5 files changed, 431 insertions(+)
>  create mode 100644 hw/nios2/10m50_devboard.c
>  create mode 100644 hw/nios2/Makefile.objs
>  create mode 100644 hw/nios2/boot.c
>  create mode 100644 hw/nios2/boot.h
>  create mode 100644 hw/nios2/cpu_pic.c
> 
> diff --git a/hw/nios2/10m50_devboard.c b/hw/nios2/10m50_devboard.c
> new file mode 100644
> index 000..62e5738
> --- /dev/null
> +++ b/hw/nios2/10m50_devboard.c
> @@ -0,0 +1,126 @@
> +/*
> + * Altera 10M50 Nios2 GHRD
> + *
> + * Copyright (c) 2016 Marek Vasut 
> + *
> + * Based on LabX device code
> + *
> + * Copyright (c) 2012 Chris Wulff 
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see
> + * 
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "cpu.h"
> +
> +#include "hw/sysbus.h"
> +#include "hw/hw.h"
> +#include "hw/char/serial.h"
> +#include "sysemu/sysemu.h"
> +#include "hw/boards.h"
> +#include "exec/memory.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/config-file.h"
> +
> +#include "boot.h"
> +
> +#define BINARY_DEVICE_TREE_FILE"10m50-devboard.dtb"
> +
> +static void nios2_10m50_ghrd_init(MachineState *machine)
> +{
> +Nios2CPU *cpu;
> +DeviceState *dev;
> +MemoryRegion *address_space_mem = get_system_memory();
> +MemoryRegion *phys_tcm = g_new(MemoryRegion, 1);
> +MemoryRegion *phys_tcm_alias = g_new(MemoryRegion, 1);
> +MemoryRegion *phys_ram = g_new(MemoryRegion, 1);
> +MemoryRegion *phys_ram_alias = g_new(MemoryRegion, 1);
> +ram_addr_t tcm_base = 0x0;
> +ram_addr_t tcm_size = 0x1000;/* 1 kiB, but QEMU limit is 4 kiB */
> +ram_addr_t ram_base = 0x0800;
> +ram_addr_t ram_size = 0x0800;
> +qemu_irq *cpu_irq, irq[32];
> +int i;
> +
> +/* Physical TCM (tb_ram_1k) with alias at 0xc000 */
> +memory_region_init_ram(phys_tcm, NULL, "nios2.tcm", tcm_size, &error_abort);
> +memory_region_init_alias(phys_tcm_alias, NULL, "nios2.tcm.alias",
> + phys_tcm, 0, tcm_size);
> +vmstate_register_ram_global(phys_tcm);
> +memory_region_add_subregion(address_space_mem, tcm_base, phys_tcm);
> +memory_region_add_subregion(address_space_mem, 0xc000 + tcm_base,
> +phys_tcm_alias);
> +
> +/* Physical DRAM with alias at 0xc000 */
> +memory_region_init_ram(phys_ram, NULL, "nios2.ram", ram_size, &error_abort);
> +memory_region_init_alias(phys_ram_alias, NULL, "nios2.ram.alias",
> + phys_ram, 0, ram_size);
> +vmstate_register_ram_global(phys_ram);
> +memory_region_add_subregion(address_space_mem, ram_base, phys_ram);
> +memory_region_add_subregion(address_space_mem, 0xc000 + ram_base,
> +phys_ram_alias);
> +
> +/* Create CPU -- FIXME */
> +cpu = cpu_nios2_init("nios2");
> +
> +/* Register: CPU interrupt controller (PIC) */
> +cpu_irq = nios2_cpu_pic_init(cpu);
> +
> +/* Register: Internal Interrupt Controller (IIC) */
> +dev = qdev_create(NULL, "altera,iic");
> +qdev_prop_set_ptr(dev, "cpu", cpu);
> +qdev_init_nofail(dev);
> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, cpu_irq[0]);
> +for (i = 0; i < 32; i++) {
> +irq[i] = qdev_get_gpio_in(dev, i);
> +}
> +
> +/* Register: Altera 16550 UART */
> +

Re: [Qemu-devel] [PATCH v11 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-07 Thread Alex Williamson
On Sat, 5 Nov 2016 02:40:45 +0530
Kirti Wankhede  wrote:

> Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
> about DMA_UNMAP.
> Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
> Notifier should be registered, if external user wants to use
> vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages.
> Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
> mappings.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
> ---
>  drivers/vfio/vfio.c | 73 
> +
>  drivers/vfio/vfio_iommu_type1.c | 47 --
>  include/linux/vfio.h| 11 +++
>  3 files changed, 121 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 76d260e98930..4ed1a6a247c6 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1895,6 +1895,79 @@ err_unpin_pages:
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)

Is the expectation here that this is a generic notifier for all
vfio->mdev signaling?  That should probably be made clear in the mdev
API to avoid vendor drivers assuming their notifier callback only
occurs for unmaps, even if that's currently the case.
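
A minimal sketch of what "generic" would mean for a vendor driver: the
callback dispatches on the action and ignores anything it does not
recognize, rather than assuming every call is an unmap.  The types and
the numeric values below are stand-ins chosen for illustration, not the
kernel definitions, and vendor_state/vendor_notifier are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the notifier return codes and the action named in the
 * patch; the numeric values are assumptions for illustration only. */
#define NOTIFY_DONE 0
#define NOTIFY_OK   1
#define VFIO_IOMMU_NOTIFY_DMA_UNMAP 1UL

/* Hypothetical vendor state: counts the unmap notifications handled. */
struct vendor_state {
    unsigned long unmaps_seen;
};

/* Hypothetical vendor callback: act only on known actions. */
static int vendor_notifier(struct vendor_state *st, unsigned long action,
                           void *data)
{
    (void)data;
    switch (action) {
    case VFIO_IOMMU_NOTIFY_DMA_UNMAP:
        st->unmaps_seen++;      /* a real driver would invalidate mappings */
        return NOTIFY_OK;
    default:
        return NOTIFY_DONE;     /* unknown action: do not treat as unmap */
    }
}
```

With this shape, adding a second action later does not silently trigger
the unmap path in existing vendor drivers.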

> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_register_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->register_notifier))
> + ret = driver->ops->register_notifier(container->iommu_data, nb);
> + else
> + ret = -EINVAL;

-ENOTTY again?  And below.

> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_register_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_register_notifier);
> +
> +int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_unregister_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unregister_notifier))
> + ret = driver->ops->unregister_notifier(container->iommu_data,
> +nb);
> + else
> + ret = -EINVAL;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);

The concern any time we have an unregister like this is whether the
vendor driver does proper cleanup.  Maybe we don't even need an
unregister, could we track this on the group such that releasing the
group automatically unregisters the notifier?  Maybe a single nb
slot and -EBUSY if already set, cleared on release?  Along those lines,
automatically unpinning anything would also be a nice feature (ie. if
an mdev device is unplugged while other devices are still in the
container), but then we'd need to track pinning per group and we already
have too much overhead in tracking pinning.
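
To make the single-slot idea concrete, here is a rough sketch under
stated assumptions: group_stub and notifier_block_stub are hypothetical
stand-ins, not the vfio structures, and locking is omitted.  It only
shows the -EBUSY-if-set, cleared-on-release behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct notifier_block. */
struct notifier_block_stub {
    int id;
};

/* Hypothetical stand-in for the group: one notifier slot, NULL if empty. */
struct group_stub {
    struct notifier_block_stub *nb;
};

/* Claim the slot; fail with -EBUSY if a notifier is already set. */
static int group_set_notifier(struct group_stub *g,
                              struct notifier_block_stub *nb)
{
    if (!g || !nb)
        return -EINVAL;
    if (g->nb)
        return -EBUSY;
    g->nb = nb;
    return 0;
}

/* Releasing the group drops the notifier; no explicit unregister call. */
static void group_release(struct group_stub *g)
{
    g->nb = NULL;
}
```

The point of the design is that cleanup no longer depends on the vendor
driver remembering to unregister.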

> +
> +err_unregister_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_unregister_notifier);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e511073446a0..c2d3a84c447b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -60,6 +61,7 @@ struct vfio_iommu {
>   struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutex    lock;
>   struct rb_root  dma_list;
> + struct blocking_notifier_head notifier;
>   bool            v2;
>   bool            nesting;
>  };
> @@ -550,7 +552,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>  
>   mutex_lock(&iommu->lock);
>  
> -   

Re: [Qemu-devel] [PATCH V3 09/10] memory: handle alias in memory_region_is_iommu()

2016-11-07 Thread Peter Xu
On Mon, Nov 07, 2016 at 03:09:54PM +0800, Jason Wang wrote:
> Cc: Paolo Bonzini 
> Acked-by: Paolo Bonzini 
> Signed-off-by: Jason Wang 

Reviewed-by: Peter Xu 



Re: [Qemu-devel] [PATCH V3 08/10] memory: handle alias for iommu notifier

2016-11-07 Thread Peter Xu
On Mon, Nov 07, 2016 at 03:09:53PM +0800, Jason Wang wrote:
> Cc: Paolo Bonzini 
> Acked-by: Paolo Bonzini 
> Signed-off-by: Jason Wang 

Reviewed-by: Peter Xu 



Re: [Qemu-devel] [PATCH V3 05/10] intel_iommu: support device iotlb descriptor

2016-11-07 Thread Peter Xu
On Mon, Nov 07, 2016 at 03:09:50PM +0800, Jason Wang wrote:

[...]

> +static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
> +  VTDInvDesc *inv_desc)
> +{
> +VTDAddressSpace *vtd_dev_as;
> +IOMMUTLBEntry entry;

Since "entry" is allocated on the stack...

[...]

> +entry.target_as = &vtd_dev_as->as;
> +entry.addr_mask = sz - 1;
> +entry.iova = addr;
> +memory_region_notify_iommu(entry.target_as->root, entry);

... here we need to assign entry.perm explicitly to IOMMU_NONE, right?

Also, I think it would be nice to set all the fields, even the unused
ones, to avoid passing stack garbage down to the notifier handlers.
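
Something along these lines is what I have in mind.  It is only a
sketch: tlb_entry_stub stands in for IOMMUTLBEntry with the fields
touched in the patch plus translated_addr, the helper name is made up,
and the enum values are assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the IOMMU access flags; values assumed for illustration. */
typedef enum { IOMMU_NONE = 0, IOMMU_RO = 1, IOMMU_WO = 2, IOMMU_RW = 3 } perm_stub;

/* Stand-in for IOMMUTLBEntry. */
struct tlb_entry_stub {
    void *target_as;
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
    perm_stub perm;
};

/* Hypothetical helper: build an invalidation entry with no stack garbage. */
static struct tlb_entry_stub make_invalidation(void *as, uint64_t addr,
                                               uint64_t sz)
{
    struct tlb_entry_stub entry = { 0 };    /* zero every field first */

    entry.target_as = as;
    entry.iova = addr;
    entry.addr_mask = sz - 1;
    entry.perm = IOMMU_NONE;                /* explicit: this is an unmap */
    return entry;
}
```

Zero-initializing first means fields added to the structure later are
also well defined without touching this code.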

[...]

> +static bool x86_iommu_device_iotlb_prop_get(Object *o, Error **errp)
> +{
> +X86IOMMUState *s = X86_IOMMU_DEVICE(o);
> +return s->dt_supported;
> +}
> +
> +static void x86_iommu_device_iotlb_prop_set(Object *o, bool value, Error 
> **errp)
> +{
> +X86IOMMUState *s = X86_IOMMU_DEVICE(o);
> +s->dt_supported = value;
> +}
> +
>  static void x86_iommu_instance_init(Object *o)
>  {
>  X86IOMMUState *s = X86_IOMMU_DEVICE(o);
> @@ -114,6 +126,11 @@ static void x86_iommu_instance_init(Object *o)
>  s->intr_supported = false;
>  object_property_add_bool(o, "intremap", x86_iommu_intremap_prop_get,
>   x86_iommu_intremap_prop_set, NULL);
> +s->dt_supported = false;
> +object_property_add_bool(o, "device-iotlb",
> + x86_iommu_device_iotlb_prop_get,
> + x86_iommu_device_iotlb_prop_set,
> + NULL);

Again, a nit-pick here is to use Property for "device-iotlb":

static Property vtd_properties[] = {
    DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
    DEFINE_PROP_END_OF_LIST(),
};

However not worth a repost.

Thanks,

-- peterx



Re: [Qemu-devel] [QEMU PATCH v11 0/4] migration: migrate QTAILQ

2016-11-07 Thread no-reply
Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Subject: [Qemu-devel] [QEMU PATCH v11 0/4] migration: migrate QTAILQ
Message-id: 1478559599-25667-1-git-send-email-du...@linux.vnet.ibm.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1e0f711 migration: add error_report
b226232 tests/migration: Add test for QTAILQ migration
b87f04f migration: migrate QTAILQ
2f6403b migration: extend VMStateInfo

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-t3roe8o5/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=8b5b8d7dc73c
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/var/tmp/qemu-build/install
BIOS directory/var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1-pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wmissing-include-dirs 
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS supportno
VNC support   yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support   no
brlapi supportno
bluez  supportno
Documentation no
PIE   yes
vde support   no
netmap supportno
Linux AIO support no
ATTR/XATTR support yes
Install blobs yes
KVM support   yes
COLO support  yes
RDMA support  no
TCG interpreter   no
fdt support   yes
preadv support

Re: [Qemu-devel] [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-07 Thread Alex Williamson
On Sat, 5 Nov 2016 02:40:44 +0530
Kirti Wankhede  wrote:

> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
> Mediated device only uses IOMMU APIs, the underlying hardware can be
> managed by an IOMMU domain.
> 
> Aim of this change is:
> - To use most of the code of TYPE1 IOMMU driver for mediated devices
> - To support direct assigned device and mediated device in single module
> 
> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
> backend module. More details:
> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>   that is, of the process who mapped that iova range.
> - Added pfn_list tracking logic to address space structure. All pages
>   pinned through this interface are trached in its address space.
s/trached/tracked/

> - Pinned pages list is used to verify unpinning request and to unpin
>   remaining pages while detaching the group for that device.
> - Page accounting is updated to account in its address space where the
>   pages are pinned/unpinned.
> -  Accouting for mdev device is only done if there is no iommu capable
>   domain in the container. When there is a direct device assigned to the
>   container and that domain is iommu capable, all pages are already pinned
>   during DMA_MAP.
> - Page accouting is updated on hot plug and unplug mdev device and pass
>   through device.
> 
> Tested by assigning below combinations of devices to a single VM:
> - GPU pass through only
> - vGPU device only
> - One GPU pass through and one vGPU device
> - Linux VM hot plug and unplug vGPU device while GPU pass through device
>   exist
> - Linux VM hot plug and unplug GPU pass through device while vGPU device
>   exist
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
> ---
>  drivers/vfio/vfio_iommu_type1.c | 538 
> +---
>  1 file changed, 500 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 8d64528dcc22..e511073446a0 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -56,6 +57,7 @@ MODULE_PARM_DESC(disable_hugepages,
>  struct vfio_iommu {
>   struct list_head    domain_list;
>   struct list_head    addr_space_list;
> + struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutex        lock;
>   struct rb_root      dma_list;
>   bool                v2;
> @@ -67,6 +69,9 @@ struct vfio_addr_space {
>   struct mm_struct    *mm;
>   struct list_head    next;
>   atomic_t            ref_count;
> + /* external user pinned pfns */
> + struct rb_root      pfn_list;       /* pinned Host pfn list */
> + struct mutex        pfn_list_lock;  /* mutex for pfn_list */
>  };
>  
>  struct vfio_domain {
> @@ -83,6 +88,7 @@ struct vfio_dma {
>   unsigned long   vaddr;  /* Process virtual addr */
>   size_t  size;   /* Map size (bytes) */
>   int prot;   /* IOMMU_READ/WRITE */
> + bool                    iommu_mapped;
>   struct vfio_addr_space  *addr_space;
>   struct task_struct      *task;
>   bool                    mlock_cap;
> @@ -94,6 +100,19 @@ struct vfio_group {
>  };
>  
>  /*
> + * Guest RAM pinning working set or DMA target
> + */
> +struct vfio_pfn {
> + struct rb_node  node;
> + unsigned long   pfn;    /* Host pfn */
> + int             prot;
> + atomic_t        ref_count;
> +};
> +
> +#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> + (!list_empty(&iommu->domain_list))
> +
> +/*
>   * This code handles mapping and unmapping of user data buffers
>   * into DMA'ble space using the IOMMU
>   */
> @@ -153,6 +172,93 @@ static struct vfio_addr_space 
> *vfio_find_addr_space(struct vfio_iommu *iommu,
>   return NULL;
>  }
>  
> +/*
> + * Helper Functions for host pfn list
> + */
> +static struct vfio_pfn *vfio_find_pfn(struct vfio_addr_space *addr_space,
> +   unsigned long pfn)
> +{
> + struct vfio_pfn *vpfn;
> + struct rb_node *node = addr_space->pfn_list.rb_node;
> +
> + while (node) {
> + vpfn = rb_entry(node, struct vfio_pfn, node);
> +
> + if (pfn < vpfn->pfn)
> + node = node->rb_left;
> + else if (pfn > vpfn->pfn)
> + node = node->rb_right;
> + else
> + 

Re: [Qemu-devel] [PATCH v3 04/14] qapi: fix schema symbol sections

2016-11-07 Thread Eric Blake
On 11/07/2016 01:30 AM, Marc-André Lureau wrote:
> According to docs/qapi-code-gen.txt, there needs to be '##' to start
> and end a symbol section; that's also what the documentation parser
> expects.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  qapi-schema.json | 18 +-
>  qapi/block-core.json |  1 +
>  qga/qapi-schema.json |  3 +++
>  3 files changed, 17 insertions(+), 5 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v3 05/14] qapi: fix missing symbol @prefix

2016-11-07 Thread Eric Blake
On 11/07/2016 01:30 AM, Marc-André Lureau wrote:
> Signed-off-by: Marc-André Lureau 
> ---
>  qapi-schema.json |  4 ++--
>  qapi/block-core.json |  4 ++--
>  qapi/crypto.json | 36 ++--
>  3 files changed, 22 insertions(+), 22 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [QEMU PATCH v11 2/4] migration: migrate QTAILQ

2016-11-07 Thread Jianjun Duan
Currently we cannot directly transfer a QTAILQ instance because of the
limitation in the migration code. Here we introduce an approach to
transfer such structures. We create a VMStateInfo, vmstate_info_qtailq,
for QTAILQ. A similar VMStateInfo can be created for other data structures
such as lists.

When a QTAILQ is migrated from source to target, it is appended to the
corresponding QTAILQ structure, which is assumed to have been properly
initialized.

This approach will be used to transfer pending_events and ccs_list in spapr
state.

We also create some macros in qemu/queue.h to access a QTAILQ using pointer
arithmetic. This ensures that we do not depend on the implementation
details about QTAILQ in the migration code.
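
As a rough illustration of the pointer-arithmetic idea (a simplified
sketch, not the macros added by this patch): a tail queue can be walked
and appended to knowing only the byte offset of the entry field inside
the element.  The tq_head/tq_entry layout, struct event, and the raw_*
helpers below are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal tail-queue layout, modelled on the BSD-style head/entry pair. */
struct tq_head  { void *tqh_first; void **tqh_last; };
struct tq_entry { void *tqe_next;  void **tqe_prev; };

/* Hypothetical element type; the entry is deliberately not the first field. */
struct event {
    int id;
    struct tq_entry link;
};

/* Address of a field at a byte offset from base, cast to the given type. */
#define FIELD_AT(base, off, type) ((type)((char *)(base) + (off)))
#define RAW_NEXT(elm, entry_off) \
    (*FIELD_AT(elm, (entry_off) + offsetof(struct tq_entry, tqe_next), void **))
#define RAW_PREV(elm, entry_off) \
    (*FIELD_AT(elm, (entry_off) + offsetof(struct tq_entry, tqe_prev), void ***))

static void raw_init(struct tq_head *h)
{
    h->tqh_first = NULL;
    h->tqh_last = &h->tqh_first;
}

/* Append elm using only the byte offset of its entry field. */
static void raw_insert_tail(struct tq_head *h, void *elm, size_t entry_off)
{
    RAW_NEXT(elm, entry_off) = NULL;
    RAW_PREV(elm, entry_off) = h->tqh_last;
    *h->tqh_last = elm;
    h->tqh_last = &RAW_NEXT(elm, entry_off);
}

/* Walk the queue without knowing the element type. */
static int raw_count(struct tq_head *h, size_t entry_off)
{
    int n = 0;
    void *e;

    for (e = h->tqh_first; e; e = RAW_NEXT(e, entry_off))
        n++;
    return n;
}
```

The caller supplies offsetof(struct event, link) once; the generic code
never names the element type, which is what lets a single VMStateInfo
serve every QTAILQ.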

Signed-off-by: Jianjun Duan 
---
 include/migration/vmstate.h | 20 ++
 include/qemu/queue.h| 60 
 migration/trace-events  |  4 +++
 migration/vmstate.c | 67 +
 4 files changed, 151 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index eafc8f2..6289327 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -253,6 +253,7 @@ extern const VMStateInfo vmstate_info_timer;
 extern const VMStateInfo vmstate_info_buffer;
 extern const VMStateInfo vmstate_info_unused_buffer;
 extern const VMStateInfo vmstate_info_bitmap;
+extern const VMStateInfo vmstate_info_qtailq;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
@@ -664,6 +665,25 @@ extern const VMStateInfo vmstate_info_bitmap;
 .offset   = offsetof(_state, _field),\
 }
 
+/* For QTAILQ that need customized handling.
+ * Target QTAILQ needs be properly initialized.
+ * _type: type of QTAILQ element
+ * _next: name of QTAILQ entry field in QTAILQ element
+ * _vmsd: VMSD for QTAILQ element
+ * size: size of QTAILQ element
+ * start: offset of QTAILQ entry in QTAILQ element
+ */
+#define VMSTATE_QTAILQ_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = &vmstate_info_qtailq,\
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 342073f..75616e1 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -438,4 +438,64 @@ struct {   
 \
 #define QTAILQ_PREV(elm, headname, field) \
 (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
 
+#define field_at_offset(base, offset, type)                                    \
+((type) (((char *) (base)) + (offset)))
+
+typedef struct DUMMY_Q_ENTRY DUMMY_Q_ENTRY;
+typedef struct DUMMY_Q DUMMY_Q;
+
+struct DUMMY_Q_ENTRY {
+QTAILQ_ENTRY(DUMMY_Q_ENTRY) next;
+};
+
+struct DUMMY_Q {
+QTAILQ_HEAD(DUMMY_Q_HEAD, DUMMY_Q_ENTRY) head;
+};
+
+#define dummy_q ((DUMMY_Q *) 0)
+#define dummy_qe ((DUMMY_Q_ENTRY *) 0)
+
+/*
+ * Offsets of layout of a tail queue head.
+ */
+#define QTAILQ_FIRST_OFFSET (offsetof(typeof(dummy_q->head), tqh_first))
+#define QTAILQ_LAST_OFFSET  (offsetof(typeof(dummy_q->head), tqh_last))
+/*
+ * Raw access of elements of a tail queue
+ */
+#define QTAILQ_RAW_FIRST(head)                                                 \
+(*field_at_offset(head, QTAILQ_FIRST_OFFSET, void **))
+#define QTAILQ_RAW_LAST(head)                                                  \
+(*field_at_offset(head, QTAILQ_LAST_OFFSET, void ***))
+
+/*
+ * Offsets of layout of a tail queue element.
+ */
+#define QTAILQ_NEXT_OFFSET (offsetof(typeof(dummy_qe->next), tqe_next))
+#define QTAILQ_PREV_OFFSET (offsetof(typeof(dummy_qe->next), tqe_prev))
+
+/*
+ * Raw access of elements of a tail entry
+ */
+#define QTAILQ_RAW_NEXT(elm, entry)                                            \
+(*field_at_offset(elm, entry + QTAILQ_NEXT_OFFSET, void **))
+#define QTAILQ_RAW_PREV(elm, entry)                                            \
+(*field_at_offset(elm, entry + QTAILQ_PREV_OFFSET, void ***))
+/*
+ * Tail queue traversal using pointer arithmetic.
+ */
+#define QTAILQ_RAW_FOREACH(elm, head, entry)                                   \
+for ((elm) = QTAILQ_RAW_FIRST(head);                                   \
+ (elm);   

[Qemu-devel] [QEMU PATCH v11 4/4] migration: add error_report

2016-11-07 Thread Jianjun Duan
Added error_report where version_ids do not match in vmstate_load_state.

Signed-off-by: Jianjun Duan 
---
 migration/vmstate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 4ef528c..344863f 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -85,6 +85,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 
 trace_vmstate_load_state(vmsd->name, version_id);
 if (version_id > vmsd->version_id) {
+error_report("%s %s",  vmsd->name, "too new");
 trace_vmstate_load_state_end(vmsd->name, "too new", -EINVAL);
 return -EINVAL;
 }
@@ -95,6 +96,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 trace_vmstate_load_state_end(vmsd->name, "old path", ret);
 return ret;
 }
+error_report("%s %s",  vmsd->name, "too old");
 trace_vmstate_load_state_end(vmsd->name, "too old", -EINVAL);
 return -EINVAL;
 }
-- 
1.9.1




[Qemu-devel] [QEMU PATCH v11 1/4] migration: extend VMStateInfo

2016-11-07 Thread Jianjun Duan
The current migration code cannot handle some data structures, such as
QTAILQ in qemu/queue.h. Here we extend the signatures of put/get
in VMStateInfo so that customized handling is supported. put now
returns int, so it can report failure.
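
A simplified sketch of the new callback shape, using stand-in types
(FileStub, FieldStub and InfoStub are made up for this example and are
not QEMUFile, VMStateField or VMStateInfo): both callbacks receive the
field description, put additionally receives a description object, and
put returns a status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the stream and field types; illustration only. */
typedef struct { uint8_t buf[16]; size_t pos; } FileStub;
typedef struct { const char *name; size_t size; } FieldStub;

/* Stand-in for the extended info structure: get/put both see the field. */
typedef struct {
    const char *name;
    int (*get)(FileStub *f, void *pv, size_t size, FieldStub *field);
    int (*put)(FileStub *f, void *pv, size_t size, FieldStub *field,
               void *vmdesc);
} InfoStub;

/* put now returns a status instead of void. */
static int put_u8(FileStub *f, void *pv, size_t size, FieldStub *field,
                  void *vmdesc)
{
    (void)size; (void)field; (void)vmdesc;
    if (f->pos >= sizeof(f->buf))
        return -1;                      /* no room left in the stream */
    f->buf[f->pos++] = *(uint8_t *)pv;
    return 0;
}

static int get_u8(FileStub *f, void *pv, size_t size, FieldStub *field)
{
    (void)size; (void)field;
    if (f->pos >= sizeof(f->buf))
        return -1;                      /* nothing left to read */
    *(uint8_t *)pv = f->buf[f->pos++];
    return 0;
}

static const InfoStub info_u8 = { "uint8", get_u8, put_u8 };
```

Handlers that do not need the extra arguments simply ignore them, which
is why most of the per-device changes below are mechanical.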

Signed-off-by: Jianjun Duan 
---
 hw/display/virtio-gpu.c |   8 +++-
 hw/intc/s390_flic_kvm.c |   8 +++-
 hw/net/vmxnet3.c|  24 +++---
 hw/nvram/eeprom93xx.c   |   8 +++-
 hw/nvram/fw_cfg.c   |   8 +++-
 hw/pci/msix.c   |   8 +++-
 hw/pci/pci.c|  16 +--
 hw/pci/shpc.c   |   7 ++-
 hw/scsi/scsi-bus.c  |   8 +++-
 hw/timer/twl92230.c |   8 +++-
 hw/usb/redirect.c   |  24 +++---
 hw/virtio/virtio-pci.c  |   8 +++-
 hw/virtio/virtio.c  |  15 --
 include/migration/vmstate.h |  19 ++--
 migration/savevm.c  |   7 ++-
 migration/vmstate.c | 113 +---
 target-alpha/machine.c  |   6 ++-
 target-arm/machine.c|  14 --
 target-i386/machine.c   |  26 +++---
 target-mips/machine.c   |  14 --
 target-ppc/machine.c|  12 +++--
 target-sparc/machine.c  |   6 ++-
 22 files changed, 262 insertions(+), 105 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 60bce94..c58fa1b 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -988,7 +988,8 @@ static const VMStateDescription vmstate_virtio_gpu_scanouts = {
 },
 };
 
-static void virtio_gpu_save(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_save(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field, QJSON *vmdesc)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
@@ -1013,9 +1014,12 @@ static void virtio_gpu_save(QEMUFile *f, void *opaque, size_t size)
 qemu_put_be32(f, 0); /* end of list */
 
 vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
+
+return 0;
 }
 
-static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index 21ac2e2..61f512f 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -286,7 +286,8 @@ static void kvm_s390_release_adapter_routes(S390FLICState *fs,
  * increase until buffer is sufficient or maxium size is
  * reached
  */
-static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field, QJSON *vmdesc)
 {
 KVMS390FLICState *flic = opaque;
 int len = FLIC_SAVE_INITIAL_SIZE;
@@ -319,6 +320,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size)
 count * sizeof(struct kvm_s390_irq));
 }
 g_free(buf);
+
+return 0;
 }
 
 /**
@@ -331,7 +334,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size)
  * Note: Do nothing when no interrupts where stored
  * in QEMUFile
  */
-static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field)
 {
 uint64_t len = 0;
 uint64_t count = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 92f6af9..4163ca8 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2451,7 +2451,8 @@ static void vmxnet3_put_tx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, tx_stat->pktsTxDiscard);
 }
 
-static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2465,7 +2466,8 @@ static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size)
 return 0;
 }
 
-static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size,
+ VMStateField *field, QJSON *vmdesc)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2474,6 +2476,8 @@ static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size)
 qemu_put_byte(f, r->intr_idx);
 qemu_put_be64(f, r->tx_stats_pa);
 vmxnet3_put_tx_stats_to_file(f, &r->txq_stats);
+
+return 0;
 }
 
 static const VMStateInfo txq_descr_info = {
@@ -2512,7 +2516,8 @@ static void vmxnet3_put_rx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, rx_stat->pktsRxError);
 }
 
-static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3RxqDescr *r = pv;
 int i;
@@ -2530,7 +2535,8 @@ static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size)
  

[Qemu-devel] [QEMU PATCH v11 0/4] migration: migrate QTAILQ

2016-11-07 Thread Jianjun Duan
Hi all,

I have addressed some of the review comments. For QTAILQ, I hope we can reach a
compromise, since there is more than one way to do it. Some work, such as the
ability to initialize a newly allocated QTAILQ element with default values, is
better done later in a separate series.
Comments are welcome.

v11: - Split error_report statements into a separate patch.
 - Changed the signature of put. It now returns int type.
 - Minor changes to QTAILQ macros. 
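
For context on the error_report patch: the two version checks it annotates can
be pictured with the stand-alone sketch below. It is an approximation under
assumptions, not the code in migration/vmstate.c. DescSketch and check_version
are hypothetical names, the lower-bound comparison is not visible in the hunk
and is assumed here, the "old path" fallback is left out, and fprintf stands in
for error_report.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for the version fields of a VMStateDescription (assumption). */
typedef struct DescSketch {
    const char *name;
    int version_id;          /* newest version this build understands */
    int minimum_version_id;  /* oldest version this build accepts */
} DescSketch;

/* Returns 0 if the incoming version_id is loadable, -EINVAL otherwise. */
static int check_version(const DescSketch *d, int version_id)
{
    assert(d != NULL);
    if (version_id > d->version_id) {
        fprintf(stderr, "%s %s\n", d->name, "too new");
        return -EINVAL;
    }
    if (version_id < d->minimum_version_id) {
        fprintf(stderr, "%s %s\n", d->name, "too old");
        return -EINVAL;
    }
    return 0;
}
```

Both failures return the same error code, so the printed message is the only
way to tell from the outside which bound was violated.
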
 
Previous versions are:

v10: - Fixed a typo.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg01206.html)

v9: - No more hard encoding of QTAILQ layout information
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg01042.html)

v8: - Fixed a style issue. 
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00874.html)

v7: - Fixed merge errors.
- Simplified macro definitions related to pointer arithmetic based QTAILQ 
access.
- Added test case for QTAILQ migration in tests/test-vmstate.c.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00711.html)


v6: - Split from Power specific patches. 
- Dropped VMS_LINKED flag.
- Rebased to master.
- Added comments to clarify about put/get in VMStateInfo.  
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00336.html)

v5: - Rebased to David's ppc-for-2.8. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg00270.html)

v4: - Introduce a way to set customized instance_id in SaveStateEntry. Use it
  to set instance_id for DRC using its unique index to address David 
  Gibson's concern.
- Rename VMS_CSTM to VMS_LINKED based on Paolo Bonzini's suggestions.
- Clean up qjson stuff in put_qtailq. 
- Add trace for put_qtailq and get_qtailq based on David Gilbert's 
  suggestion.
- Based on David's ppc-for-2.7. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07720.html)

v3: - Simplify overall design following discussion with Paolo. No longer need
  metadata to migrate QTAILQ.
- Extend VMStateInfo instead of adding similar fields to VMStateField.
- Clean up macros in qemu/queue.h.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg05695.html)

v2: - Introduce a general approach to migrate QTAILQ in qemu/queue.h.
- Migrate signalled field in the DRC state.
- Put the newly added migrating fields in subsections so that backward 
  migration is not broken.  
- Set detach_cb field right after migration so that a migrated hot-unplug
  event could finish its course.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04188.html)

v1: - Initial version.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02601.html)


Jianjun Duan (4):
  migration: extend VMStateInfo
  migration: migrate QTAILQ
  tests/migration: Add test for QTAILQ migration
  migration: add error_report

 hw/display/virtio-gpu.c |   8 +-
 hw/intc/s390_flic_kvm.c |   8 +-
 hw/net/vmxnet3.c|  24 --
 hw/nvram/eeprom93xx.c   |   8 +-
 hw/nvram/fw_cfg.c   |   8 +-
 hw/pci/msix.c   |   8 +-
 hw/pci/pci.c|  16 +++-
 hw/pci/shpc.c   |   7 +-
 hw/scsi/scsi-bus.c  |   8 +-
 hw/timer/twl92230.c |   8 +-
 hw/usb/redirect.c   |  24 --
 hw/virtio/virtio-pci.c  |   8 +-
 hw/virtio/virtio.c  |  15 +++-
 include/migration/vmstate.h |  39 --
 include/qemu/queue.h|  60 +++
 migration/savevm.c  |   7 +-
 migration/trace-events  |   4 +
 migration/vmstate.c | 182 +++-
 target-alpha/machine.c  |   6 +-
 target-arm/machine.c|  14 +++-
 target-i386/machine.c   |  26 +--
 target-mips/machine.c   |  14 +++-
 target-ppc/machine.c|  12 ++-
 target-sparc/machine.c  |   6 +-
 tests/test-vmstate.c| 160 ++
 25 files changed, 575 insertions(+), 105 deletions(-)

-- 
1.9.1




[Qemu-devel] [QEMU PATCH v11 3/4] tests/migration: Add test for QTAILQ migration

2016-11-07 Thread Jianjun Duan
Add a test for QTAILQ migration to tests/test-vmstate.c.

Signed-off-by: Jianjun Duan 
---
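As a reading aid for the wire_q arrays in the test below: the byte stream
appears to put a 0x01 marker in front of every queue element and a single 0x00
after the last one. The stand-alone sketch here encodes a queue that way. The
marker values are read off the test data, while Elem and encode_queue are
hypothetical names that do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the two migrated fields of TestQtailqElement. */
typedef struct Elem {
    bool b;
    uint8_t u8;
} Elem;

/*
 * Encode n elements into out. Returns the number of bytes written,
 * or 0 if the result would not fit into cap bytes.
 */
static size_t encode_queue(const Elem *e, size_t n, uint8_t *out, size_t cap)
{
    size_t need = 3 * n + 1;  /* marker + b + u8 per element, then the end byte */
    size_t pos = 0;

    assert(out != NULL);
    if (need > cap) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        out[pos++] = 0x01;           /* another element follows */
        out[pos++] = e[i].b ? 1 : 0;
        out[pos++] = e[i].u8;
    }
    out[pos++] = 0x00;               /* end of queue */
    return pos;
}
```

With the two elements used in the test (true/130 and false/65) this yields the
seven bytes 01 01 82 01 00 41 00, the middle part of wire_q.
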
 tests/test-vmstate.c | 160 +++
 1 file changed, 160 insertions(+)

diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index d8da26f..a992408 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -475,6 +475,164 @@ static void test_load_skip(void)
 qemu_fclose(loading);
 }
 
+
+/* test QTAILQ migration */
+typedef struct TestQtailqElement TestQtailqElement;
+
+struct TestQtailqElement {
+bool b;
+uint8_t  u8;
+QTAILQ_ENTRY(TestQtailqElement) next;
+};
+
+typedef struct TestQtailq {
+int16_t  i16;
+QTAILQ_HEAD(TestQtailqHead, TestQtailqElement) q;
+int32_t  i32;
+} TestQtailq;
+
+static const VMStateDescription vmstate_q_element = {
+.name = "test/queue-element",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(b, TestQtailqElement),
+VMSTATE_UINT8(u8, TestQtailqElement),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static const VMStateDescription vmstate_q = {
+.name = "test/queue",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_INT16(i16, TestQtailq),
+VMSTATE_QTAILQ_V(q, TestQtailq, 1, vmstate_q_element, TestQtailqElement,
+ next),
+VMSTATE_INT32(i32, TestQtailq),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void test_save_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+save_vmstate(&vmstate_q, &obj_q);
+compare_vmstate(wire_q, sizeof(wire_q));
+}
+
+static void test_load_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+QEMUFile *fsave = open_test_file(true);
+
+qemu_put_buffer(fsave, wire_q, sizeof(wire_q));
+qemu_put_byte(fsave, QEMU_VM_EOF);
+g_assert(!qemu_file_get_error(fsave));
+qemu_fclose(fsave);
+
+QEMUFile *fload = open_test_file(false);
+TestQtailq tgt;
+
+QTAILQ_INIT(&tgt.q);
+vmstate_load_state(fload, &vmstate_q, &tgt, 1);
+char eof = qemu_get_byte(fload);
+g_assert(!qemu_file_get_error(fload));
+g_assert_cmpint(tgt.i16, ==, obj_q.i16);
+g_assert_cmpint(tgt.i32, ==, obj_q.i32);
+g_assert_cmpint(eof, ==, QEMU_VM_EOF);
+
+TestQtailqElement *qele_from = QTAILQ_FIRST(&obj_q.q);
+TestQtailqElement *qlast_from = QTAILQ_LAST(&obj_q.q, TestQtailqHead);
+TestQtailqElement *qele_to = QTAILQ_FIRST(&tgt.q);
+TestQtailqElement *qlast_to = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+
+while (1) {
+g_assert_cmpint(qele_to->b, ==, qele_from->b);
+g_assert_cmpint(qele_to->u8, ==, qele_from->u8);
+if ((qele_from == qlast_from) || (qele_to == qlast_to)) {
+break;
+}
+qele_from = QTAILQ_NEXT(qele_from, next);
+qele_to = QTAILQ_NEXT(qele_to, next);
+}
+
+g_assert_cmpint((uint64_t) qele_from, ==, (uint64_t) qlast_from);
+g_assert_cmpint((uint64_t) qele_to, ==, (uint64_t) qlast_to);
+
+/* clean up */
+TestQtailqElement *qele;
+while (!QTAILQ_EMPTY(&tgt.q)) {
+qele = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+QTAILQ_REMOVE(&tgt.q, qele, next);
+free(qele);
+qele = NULL;
+}
+qemu_fclose(fload);
+}
+
 int main(int argc, char **argv)
 {
 temp_fd = mkstemp(temp_file);
@@ 

Re: [Qemu-devel] [PATCH v3 02/14] qga/schema: fix double-return in doc

2016-11-07 Thread Eric Blake
On 11/07/2016 01:30 AM, Marc-André Lureau wrote:
> guest-get-memory-block-info documentation should have only one
> "Returns:".
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  qga/qapi-schema.json | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
> index c21f308..758803a 100644
> --- a/qga/qapi-schema.json
> +++ b/qga/qapi-schema.json
> @@ -952,7 +952,6 @@
>  #
>  # Get information relating to guest memory blocks.
>  #
> -# Returns: memory block size in bytes.
>  # Returns: @GuestMemoryBlockInfo
>  #
>  # Since 2.3
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v3 01/14] qapi: add missing 'bus' argument in device_add

2016-11-07 Thread Eric Blake
On 11/07/2016 01:30 AM, Marc-André Lureau wrote:
> 'device_add' is incomplete for now, but 'bus' is a common argument for
> regarless of the device, and is present in documentation. Add it to the

s/regarless/regardless/

> device_add schema definition.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  qapi-schema.json | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index b0b4bf6..4ba5772 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2317,7 +2317,7 @@
>  # Since: 0.13
>  ##
>  { 'command': 'device_add',
> -  'data': {'driver': 'str', 'id': 'str'},
> +  'data': {'driver': 'str', '*bus': 'str', 'id': 'str'},
>'gen': false } # so we can get the additional arguments
>  
>  ##
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] Sphinx for QEMU docs? (and a doc-comment format question)

2016-11-07 Thread John Snow



On 11/07/2016 08:30 AM, Stefan Hajnoczi wrote:
> On Sat, Nov 05, 2016 at 06:42:23PM +, Peter Maydell wrote:
>> In particular I think we could:
>>  * set up a framework for our in-tree docs/ which gives us a
>>    place to put new docs (both for-users and for-developers) --
>>    I think having someplace to put things will reduce the barrier
>>    to people writing useful new docs
>>  * gradually convert the existing docs to rst
>>  * use the sphinx extension features to pull in the doc-comments
>>    we have been fairly consistently writing over the last few years
>>    (for instance a converted version of docs/memory.txt could pull
>>    in doc comments from memory.h; or we can just write simple
>>    wrapper files like a "Bitmap operations" document that
>>    displays the doc comments from bitops.h)
>
> You are suggesting Sphinx for two different purposes:
>
> 1. Formatting docs/ in HTML, PDF, etc.
>
> 2. API documentation from doc comments.
>
> It's a good idea for #1 since we can then publish automated builds of
> the docs.  They will be easy to view and link to in a web browser.
>
> I'm not a fan of #2.  QEMU is not a C library that people develop
> against and our APIs are not stable.  There is no incentive for pretty
> doc comments.  It might be cool to set it up once but things will
> deteriorate again quickly because we don't actually need external API
> docs.
>
> Instead of #2 we should focus on generating nice external QMP docs for
> libvirt and other clients.  That has a clear benefit.
>
> Stefan



I think that designating certain interfaces within QEMU as "Internal 
API" has some merit, and that these interfaces are worth documenting for 
the sake of device/format authors, as Peter suggests.


Things may be in flux often, but if we can generate the docs from source 
code comments, I don't think it's unjust or unreasonable to request that 
patches keep these docs up to date.


It's error prone, of course, but certainly more manageable if we have a 
build tool doing some robotic checking of doc completeness for select 
interfaces. I think it's not possible to be more error prone than our 
current solution of "Random GTK-doc-like comments strewn about that may 
or may not be accurate, that we don't actually check or verify or even 
use for any doc-building purposes."
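
To make the discussion concrete, here is roughly what one of those
GTK-doc-like comments looks like on a small helper. This is only a sketch of
the comment style: sketch_set_bit and sketch_test_bit are hypothetical names
and are not the functions in bitops.h, and whether a given Sphinx extension
would parse exactly this layout is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/**
 * sketch_set_bit:
 * @nr: index of the bit to set
 * @addr: start of the bitmap
 *
 * Sets bit @nr in the bitmap at @addr. Not atomic.
 */
static void sketch_set_bit(unsigned long nr, unsigned long *addr)
{
    assert(addr != NULL);
    addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/**
 * sketch_test_bit:
 * @nr: index of the bit to test
 * @addr: start of the bitmap
 *
 * Returns: 1 if bit @nr is set in the bitmap at @addr, 0 otherwise.
 */
static int sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
    assert(addr != NULL);
    return (addr[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

A completeness check of the kind I mean would only need to verify that each
@parameter named in the comment matches the prototype under it.
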


I'm also a fan of unifying our internal code documentation formats 
because it helps make the code look more consistent, but may also open 
up some parsing options for enhanced IDE support which could be nice for 
some.


I think at a minimum, having _A_ standard approach cannot possibly be 
*any* worse than _NO_ standard approach.


I'm a fan of the concept, but have no particular feelings on Sphinx yet.

--js



Re: [Qemu-devel] [PATCH] nbd: Don't inf-loop on early EOF

2016-11-07 Thread Eric Blake
On 11/07/2016 04:22 PM, Max Reitz wrote:
> On 07.11.2016 21:38, Eric Blake wrote:
>> Commit 7d3123e converted a single read_sync() into a while loop
>> that assumed that read_sync() would either make progress or give
>> an error. But when the server hangs up early, the client sees
>> EOF (a read_sync() of 0) and never makes progress, which in turn
>> caused qemu-iotest './check -nbd 83' to go into an infinite loop.
>>
>> Rework the loop to accommodate reads cut short by EOF.
>>
>> Reported-by: Max Reitz 
>> Signed-off-by: Eric Blake 
>> ---
>>  nbd/client.c | 13 +++--
>>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> Reviewed-by: Max Reitz 
> 
> But what about the server's nbd_negotiate_drop_sync()? It uses pretty
> much the same code, so it seems susceptible to the same issue (only that
> we don't have a test for that side).

If so, that's an older bug (pre-existing back to at least 2.6?), so it
should be a separate fix, if anything.

I guess it's time to figure out how to test the server against
ill-behaved clients...
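
For anyone following along without the patch body, the failure mode and the
general shape of a fix can be sketched stand-alone. This is not the code in
nbd/client.c: read_fn, drop_bytes, FakeSrc and fake_read are hypothetical
names, and returning -EIO on early EOF is an assumption. The point is only
that a read of 0 must end the loop instead of being retried forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical reader: bytes read, 0 on EOF, negative errno on error. */
typedef ssize_t (*read_fn)(void *opaque, void *buf, size_t len);

/* Discard size bytes from the reader; 0 on success, negative errno on failure. */
static int drop_bytes(read_fn rd, void *opaque, size_t size)
{
    char scratch[512];

    assert(rd != NULL);
    while (size > 0) {
        size_t want = size < sizeof(scratch) ? size : sizeof(scratch);
        ssize_t got = rd(opaque, scratch, want);

        if (got < 0) {
            return (int)got;   /* real error from the reader */
        }
        if (got == 0) {
            return -EIO;       /* early EOF: no progress is possible */
        }
        size -= (size_t)got;
    }
    return 0;
}

/* Fake peer that hangs up after `remaining` bytes. */
typedef struct FakeSrc {
    size_t remaining;
} FakeSrc;

static ssize_t fake_read(void *opaque, void *buf, size_t len)
{
    FakeSrc *src = opaque;
    size_t n = len < src->remaining ? len : src->remaining;

    (void)buf;                 /* contents do not matter when dropping */
    src->remaining -= n;
    return (ssize_t)n;         /* 0 once the peer has gone away */
}
```

A fake peer like FakeSrc is also one cheap way to exercise the server side
against a client that disconnects mid-negotiation.
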

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v1 1/1] cadence_uart: Check baud rate generator and divider values on migration

2016-11-07 Thread Alistair Francis
On Mon, Nov 7, 2016 at 2:13 PM, Peter Maydell wrote:
> On 7 November 2016 at 21:53, Alistair Francis wrote:
>> On Sat, Nov 5, 2016 at 6:51 AM, Peter Maydell wrote:
>>> Usually we just fail the migration if the incoming
>>> data is bogus -- any particular reason not to take that
>>> approach here?
>>
>> There is no reason, it just seemed a bit much to abort just for this.
>>
>> Should I change it to abort?
>
> I think there are two cases:
>  (1) migration from an old version could be in these
> bogus states (without having crashed the old version
> in the process) -- in that case you can argue for
> sanitizing as being most helpful to the user
> (and should comment that that's why we accept-but-squash)

I think this is actually very unlikely, anyone setting these values by
accident has probably already seen crashes.

>  (2) the out-of-bounds values only happen if somebody
> is deliberately feeding QEMU a bogus incoming data
> stream -- in this case (which is the usual one for
> bounds checks) it's best to return 1 to fail the
> migration.

This seems more likely, so it sounds like I should fail the migration.
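
Roughly what I have in mind for the next version, as a stand-alone sketch
rather than the actual device code: UartSketch and uart_post_load are
hypothetical names, and the numeric limits below are assumptions for
illustration, not values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two registers being validated. */
typedef struct UartSketch {
    uint32_t brgr;  /* baud rate generator value */
    uint32_t bdiv;  /* baud rate divider value */
} UartSketch;

/* Assumed limits, for illustration only. */
enum {
    BRGR_MIN = 1,
    BRGR_MAX = 0xFFFF,
    BDIV_MIN = 4,
    BDIV_MAX = 0xFF,
};

/* Post-load style check: nonzero fails the migration, 0 accepts the state. */
static int uart_post_load(const UartSketch *s)
{
    assert(s != NULL);
    if (s->brgr < BRGR_MIN || s->brgr > BRGR_MAX ||
        s->bdiv < BDIV_MIN || s->bdiv > BDIV_MAX) {
        return 1;  /* bogus incoming data: reject instead of sanitizing */
    }
    return 0;
}
```
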

Thanks,

Alistair

>
> thanks
> -- PMM
>



Re: [Qemu-devel] [PATCH v2 2/6] libqtest: add qmp_eventwait_ref

2016-11-07 Thread Eric Blake
On 11/07/2016 03:13 PM, John Snow wrote:
> Wait for an event, but return a copy so we can investigate parameters.
> 
> Signed-off-by: John Snow 
> ---
>  tests/libqtest.c | 13 ++---
>  tests/libqtest.h | 22 ++
>  2 files changed, 32 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




