Re: [Qemu-devel] [RFC] [tcg] Idea on refactoring target code generation loop (gen_intermediate_code)

2016-04-10 Thread Claudio Fontana
Clearly late to the party,

On 08.04.2016 22:14, Paolo Bonzini wrote:
> 
> On 08/04/2016 15:15, Markus Armbruster wrote:
>>> On the other hand, minimal usage of templates instead of some of the
>>> preprocessor gunk we have would be a very good thing IMNSHO.  I am
>>> referring to the multiply included header files and to the macros with
>>> type arguments (mostly QOM casts).
>>>
>>> I don't think we need more C++ than that, but using templates as
>>> basically a type-safe preprocessor would improve QEMU a little bit.
>>> More rarely, lambdas could replace some preprocessor magic too, but
>>> that's C11 and not many compilers support them.
>>>
>>> But I won't weep if people say no because we have a lot other
>>> low-hanging fruit to make QEMU better (especially the header file
>>
>> "No!"  (Hey, you asked for it)
>>
>> Back to serious.  As Peter Maydell said, "if we move away from C I'd
>> rather it to be a language that's nicer than C rather than one that's
>> uglier and larger and still retains all of C's flaws."
> 
> Sure, except that one plan is feasible now and can be done in small
> steps; the other is not feasible now (for example Rust is not even
> packaged in Fedora) and entails pretty much a rewrite of the whole code
> base.

I skip my personal take on these specific issue, but one very practical concern 
I have with a potential move to C++,
if I understand these proposals correctly, is with the speed of compilation and 
the binary size, and also QEMU startup time.

I noticed while developing for OSv that compilation with C++ and templates was 
very slow compared with C, so certain scripts around git I am using to build 
and check code from scratch after each patch took a long time to complete.

Do you know what these refactoring proposals' impact on compilation speed, 
binary size and startup time would be?

Thank you,

Claudio


> 
>> People sometimes propose to defang C++ by subsetting and/or coding
>> conventions.  I'll take that seriously when I see the tool that
>> rigorously checks adherence to subset / convention.
> 
> The problem with subsetting and conventions is that they always come
> with exceptions, besides the fact that no one writes the tool.
> 
> So I prefer common sense :) and common sense says that a million-line
> codebase that mixes procedural, home-grown OO and language OO is going
> to stink from ten miles.  And maintainers sit at most two feet from the
> screen.
> 
> Really even just -Wc++-compat would be a nice improvement so we enjoy a
> little more type safety.  IMHO, C++'s biggest tax is that Coccinelle
> does not like it.
> 
>>> cleanups that Markus started and I want to conclude very early in 2.7).
>>
>> Speaking of which: the plan was you post yours for 2.7, and then I can
>> build on top (assuming there's useful work left), right?
> 
> I have rebased my stuff already, but I'm going to disappear in about a
> week and for the first week or two after the 2.6 release (depending on
> whether it slips or not).  Also, I will travel most of next week.  So I
> either I'll post it soonish or it will have to wait a little.
> 
> Anyhow, there's definitely useful work left from your last two patches,
> even more so after Veronia Bahaa cleaned up qemu-common.h substantially
> so there may be more pointless inclusions of it and fewer headers that
> really need it.
> 
> Paolo
> 




Re: [Qemu-devel] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-04-10 Thread Changlong Xie

On 03/30/2016 11:07 PM, Max Reitz wrote:

On 30.03.2016 13:39, Alberto Garcia wrote:

On Tue 29 Mar 2016 05:51:22 PM CEST, Max Reitz wrote:

It sounds like the argument here, and in Max's thread on
query-block-node-tree, is that we DO have cases where order matters, and
so we need a way for the hot-add operation to explicitly specify where
in the list a child is inserted (whether it is being inserted as the new
primary image, or explicitly as the last resort, or somewhere in the
middle).  An optional parameter, that defaults to appending, may be ok,
but we definitely need to consider how the order of children is affected
by hot-add.


However, the order should be queriable after the fact, and there are
three ways I see to accomplish this:

(1) Make this information queriable as driver-specific BDS information.
 I personally don't like it very much, but it would be fine.
(2) Implement query-block-node-tree, make the order of child nodes
 significant and thus represent the FIFO order there. I don't like
 this because it would mean returning two orders through that child
 node list: One is the numeric order (children.0, children.1, ...)
 and another is the FIFO order, which are not necessarily equal.
(3) Fix FIFO order to the child name (its role). I'm very much in favor
 of this.

While I don't have good arguments against (1), I think I have good
arguments for (3) instead: It just doesn't make sense to have a numeric
order of children if this order doesn't mean anything; especially if you
suddenly do need the list of child nodes to be ordered. To me, it
doesn't make any sense to introduce a new hidden order which takes
precedence over this obvious user-visible order.


I'm not sure if I understand correctly what you mean in (3). The
user-visible FIFO order is the one specified when the Quorum is created:

children.0.file.filename=hd0.qcow2,
children.1.file.filename=hd1.qcow2,
...

Would you then call those BDS children.0, children.1, etc


They are already called that way; it's not their node name but their
"child role" name.


   and make those
names be the ones that actually define how they are ordered internally?


Yes, that's what I meant.


Hi Max

I think you just mean what i draw in below chart:

1) Insert 4 children.
0   1  2  3
+
|children.0|children.1|children.2|children.3|
+

2) Remove the 2th child (s->children[1])

{ "execute": "x-blockdev-change", 



  "arguments": { "parent": "xxx", 



 "child": "children.1" } }

0   1  2  3
+
|children.0|children.1|children.2|children.3|
+++--
 || 
  +--+   ++
0   1 |2 |
+v--v
|children.0|children.1|children.2|
+

Remove children.1 and shorten the array, then rename children.{2,3} to 
children.{1.2}


3) Insert a new child

0   1  2 3
+
|children.0|children.1|children.2|children.3|
+

But as Wen said: 
http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg00898.html


Everytime we try to remove a children.i (i < n-1, so it's not the last 
element of the array[n]), we have to rename children.{i+1, n-1} to 
children.{i, n-2}.


Thanks
-Xie


I also have another (not directly related) question: why not simply use
the node name when removing children? I understood that the idea was
that it's possible to have the same node attached twice to the same
Quorum, but can you actually do that? And what's the use case?


What I like about using the child role name is that it automatically
prevents you from specifying a node that is not a child of the given parent.

Which makes me notice that it might be a good idea to require the user
to specify the child's role when adding a new child. In this version of
this series (where only quorum is supported), the children are just
inserted in numerical order (first free slot is taken first), but maybe
the user wants to insert them in a different order.

For quorum, this is basically irrelevant if the order doesn't mean
anything (which I don't like), but it may be relevant for other block
drivers.

And the "filling up quorum's children" topic then makes me notice that
(x-)blockdev-change should probably be transaction-able (so you can
restructure the whole BDS graph in a single transaction), but that's
something we can care about later on.

Max







Re: [Qemu-devel] [PATCH] intel_iommu: fix incorrect mask for IOTLB invalidate desc

2016-04-10 Thread Peter Xu
On Mon, Apr 11, 2016 at 11:00:21AM +0800, Peter Xu wrote:
> Address Mask (AM) should be 6 bits long, not 7. (vt-d spec 6.5.2.3)
> 

OMG I was wrong... Sorry please ignore this one.

-- peterx



Re: [Qemu-devel] [PATCH for-2.7 v6 2/2] spapr: implement query-hotpluggable-cpus callback

2016-04-10 Thread David Gibson
On Fri,  8 Apr 2016 13:29:56 +0200
Igor Mammedov  wrote:

> it returns a list of present/possible to hotplug CPU
> objects with a list of properties to use with
> device_add.
> 
> in spapr case returned list would looks like:
> -> { "execute": "query-hotpluggable-cpus" }  
> <- {"return": [
>  { "props": { "core": 1 }, "type": "spapr-cpu-core",
>"vcpus-count": 2 },
>  { "props": { "core": 0 }, "type": "spapr-cpu-core",
>"vcpus-count": 2,
>"qom-path": "/machine/unattached/device[0]"}
>]}'
> 
> TODO:
>   add 'node' property for core <-> numa node mapping
> 
> Signed-off-by: Igor Mammedov 
> ---
> it's only compile tested
> v2:
>  - s/qmp_query_hotpluggable_cpus/MachineClass->query_hotpluggable_cpus/ 
> callback
> ---
>  hw/ppc/spapr.c | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 760a42f..c38995e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -66,6 +66,7 @@
>  #include "hw/compat.h"
>  #include "qemu/cutils.h"
>  #include "hw/ppc/spapr_cpu_core.h"
> +#include "qmp-commands.h"
>  
>  #include 
>  
> @@ -2382,6 +2383,37 @@ static unsigned spapr_cpu_index_to_socket_id(unsigned 
> cpu_index)
>  return cpu_index / smp_threads / smp_cores;
>  }
>  
> +static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState 
> *machine)
> +{
> +int i;
> +HotpluggableCPUList *head = NULL;
> +sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +int spapr_max_cores = max_cpus / smp_threads;
> +
> +for (i = 0; i < spapr_max_cores; i++) {
> +HotpluggableCPUList *list_item = g_new0(typeof(*list_item), 1);
> +HotpluggableCPU *cpu_item = g_new0(typeof(*cpu_item), 1);
> +CpuInstanceProperties *cpu_props = g_new0(typeof(*cpu_props), 1);
> +
> +cpu_item->type = g_strdup(TYPE_SPAPR_CPU_CORE);

So, Bharata's latest core stuff is (as suggested by you and others)
moving to using different derived types, rather than a cpu_model
property, so this will need to change a bit.

> +cpu_item->vcpus_count = smp_threads;
> +cpu_props->has_core = true;
> +cpu_props->core = i;
> +/* TODO: add 'has_node/node' here to describe
> +   to which node core belongs */
> +
> +cpu_item->props = cpu_props;
> +if (spapr->cores[i]) {
> +cpu_item->has_qom_path = true;
> +cpu_item->qom_path = object_get_canonical_path(spapr->cores[i]);
> +}
> +list_item->value = cpu_item;
> +list_item->next = head;
> +head = list_item;
> +}
> +return head;
> +}
> +
>  static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  {
>  MachineClass *mc = MACHINE_CLASS(oc);
> @@ -2412,6 +2444,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>  hc->plug = spapr_machine_device_plug;
>  hc->unplug = spapr_machine_device_unplug;
>  mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> +mc->query_hotpluggable_cpus = spapr_query_hotpluggable_cpus;
>  
>  smc->dr_lmb_enabled = true;
>  smc->dr_cpu_enabled = true;
> -- 
> 1.8.3.1
> 


-- 
David Gibson 
Senior Software Engineer, Virtualization, Red Hat


pgpyOqvIlXH5S.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.7 v6 1/2] QMP: add query-hotpluggable-cpus

2016-04-10 Thread David Gibson
On Fri,  8 Apr 2016 13:29:55 +0200
Igor Mammedov  wrote:

> it will allow mgmt to query present and hotpluggable
> CPU objects, it is required from a target platform that
> wish to support command to implement and set
>  MachineClass.query_hotpluggable_cpus
> callback, which will return a list of possible CPU objects
> with options that would be needed for hotplugging possible
> CPU objects.
> 
> There are:
> 'type': 'str' - QOM CPU object type for usage with device_add
> 'vcpus-count': 'int' - number of logical VCPU threads per
> CPU object (mgmt needs to know)
> 
> and a set of optional fields that are to used for hotplugging
> a CPU objects and would allows mgmt tools to know what/where
> it could be hotplugged;
> [node],[socket],[core],[thread]
> 
> For present CPUs there is a 'qom-path' field which
> would allow mgmt to inspect whatever object/abstraction
> the target platform considers as CPU object.
> 
> Signed-off-by: Igor Mammedov 
> ---
> v6:
>  - fix style issues in qapi-schema and qmp-commands,
>Eric Blake 
>  - rebase on top current master (query-gic-capabilities conflict)
> v5:
>  - fix s390 build failure:
> undefined reference to `qmp_query_hotpluggable_cpus'
> v4:
>  - add MachineClass method to get CPU object list
> v3:
>  - add 'vcpus-count' field, pkre...@redhat.com
>  - s/CpuInstanceProps/CpuInstanceProperties/
>  - use '#optional' marker
>  - make "props" as always present even if it's empty
>  - fix JSON examples
>  - fix minor typos
> 
> query_fixup
> ---
>  include/hw/boards.h |  5 +
>  monitor.c   | 13 +
>  qapi-schema.json| 46 ++
>  qmp-commands.hx | 41 +
>  4 files changed, 105 insertions(+)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index aad5f2a..c122a70 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -81,6 +81,10 @@ typedef struct {
>   *Returns an array of @CPUArchId architecture-dependent CPU IDs
>   *which includes CPU IDs for present and possible to hotplug CPUs.
>   *Caller is responsible for freeing returned list.
> + * @query_hotpluggable_cpus:
> + *Returns a @HotpluggableCPUList, which describes CPUs objects which
> + *could be added with -device/device_add.
> + *Caller is responsible for freeing returned list.
>   */
>  struct MachineClass {
>  /*< private >*/
> @@ -123,6 +127,7 @@ struct MachineClass {
> DeviceState *dev);
>  unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
>  CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
> +HotpluggableCPUList *(*query_hotpluggable_cpus)(MachineState *machine);
>  };
>  
>  /**
> diff --git a/monitor.c b/monitor.c
> index d1c1930..b469225 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4267,3 +4267,16 @@ GICCapabilityList *qmp_query_gic_capabilities(Error 
> **errp)
>  return NULL;
>  }
>  #endif
> +
> +HotpluggableCPUList *qmp_query_hotpluggable_cpus(Error **errp)
> +{
> +MachineState *ms = MACHINE(qdev_get_machine());
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
> +
> +if (!mc->query_hotpluggable_cpus) {
> +error_setg(errp, QERR_FEATURE_DISABLED, "query-hotpluggable-cpus");
> +return NULL;
> +}
> +
> +return mc->query_hotpluggable_cpus(ms);
> +}
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 54634c4..4d1d71d 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4178,3 +4178,49 @@
>  # Since: 2.6
>  ##
>  { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'] }
> +
> +##
> +# CpuInstanceProperties
> +#
> +# @node: #optional NUMA node ID the CPU belongs to
> +# @socket: #optional socket number within node/board the CPU belongs to
> +# @core: #optional core number within socket the CPU belongs to
> +# @thread: #optional thread number within core the CPU belongs to
> +#
> +# Since: 2.7
> +##
> +{ 'struct': 'CpuInstanceProperties',
> +  'data': { '*node': 'int',
> +'*socket': 'int',
> +'*core': 'int',
> +'*thread': 'int'
> +  }
> +}

Is there somewhere we should document the fact that these particular
properties are common ones, but there could be more.  The point is that
management should not assume these are the only fields here, but should
be prepared to accept anything, and echo it back to the device_add.


> +
> +##
> +# @HotpluggableCPU
> +#
> +# @type: CPU object type for usage with device_add command
> +# @props: list of properties to be used for hotplugging CPU
> +# @vcpus-count: number of logical VCPU threads @HotpluggableCPU provides
> +# @qom-path: #optional link to existing CPU object if CPU is present or
> +#omitted if CPU is not present.
> +#
> +# Since: 2.7
> +##
> +{ 'struct': 'HotpluggableCPU',
> +  'data': { 'type': 'str',
> +

[Qemu-devel] [PATCH v4 1/1] Introduce "xen-load-devices-state"

2016-04-10 Thread Changlong Xie
From: Wen Congyang 

Introduce a "xen-load-devices-state" QAPI command that can be used to
load the state of all devices, but not the RAM or the block devices of
the VM.

We only have hmp commands savevm/loadvm, and qmp commands
xen-save-devices-state.

We use this new command for COLO:
1. suspend both primary vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm

In such case, we need to update all devices' state in any time.

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
 migration/savevm.c | 36 
 qapi-schema.json   | 14 ++
 qmp-commands.hx| 27 +++
 3 files changed, 77 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 16ba443..d361a29 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -30,6 +30,7 @@
 #include "hw/boards.h"
 #include "hw/hw.h"
 #include "hw/qdev.h"
+#include "hw/xen/xen.h"
 #include "net/net.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
@@ -1775,6 +1776,12 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
+/* Validate if it is a device's state */
+if (xen_enabled() && se->is_ram) {
+error_report("loadvm: %s RAM loading not allowed on Xen", idstr);
+return -EINVAL;
+}
+
 /* Add entry */
 le = g_malloc0(sizeof(*le));
 
@@ -2084,6 +2091,35 @@ void qmp_xen_save_devices_state(const char *filename, 
Error **errp)
 }
 }
 
+void qmp_xen_load_devices_state(const char *filename, Error **errp)
+{
+QEMUFile *f;
+int ret;
+
+/* Guest must be paused before loading the device state; the RAM state
+ * will already have been loaded by xc
+ */
+if (runstate_is_running()) {
+error_setg(errp, "Cannot update device state while vm is running");
+return;
+}
+vm_stop(RUN_STATE_RESTORE_VM);
+
+f = qemu_fopen(filename, "rb");
+if (!f) {
+error_setg_file_open(errp, errno, filename);
+return;
+}
+
+migration_incoming_state_new(f);
+ret = qemu_loadvm_state(f);
+qemu_fclose(f);
+if (ret < 0) {
+error_setg(errp, QERR_IO_ERROR);
+}
+migration_incoming_state_destroy();
+}
+
 int load_vmstate(const char *name)
 {
 BlockDriverState *bs, *bs_vm_state;
diff --git a/qapi-schema.json b/qapi-schema.json
index 54634c4..132264f 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4144,6 +4144,20 @@
   'data': [ 'none', 'record', 'play' ] }
 
 ##
+# @xen-load-devices-state:
+#
+# Load the state of all devices from file. The RAM and the block devices
+# of the VM are not loaded by this command.
+#
+# @filename: the file to load the state of the devices from as binary
+# data. See xen-save-devices-state.txt for a description of the binary
+# format.
+#
+# Since: 2.7
+##
+{ 'command': 'xen-load-devices-state', 'data': {'filename': 'str'} }
+
+##
 # @GICCapability:
 #
 # The struct describes capability for a specific GIC (Generic
diff --git a/qmp-commands.hx b/qmp-commands.hx
index de896a5..68620f6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -587,6 +587,33 @@ Example:
 EQMP
 
 {
+.name   = "xen-load-devices-state",
+.args_type  = "filename:F",
+.mhandler.cmd_new = qmp_marshal_xen_load_devices_state,
+},
+
+SQMP
+xen-load-devices-state
+--
+
+Load the state of all devices from file. The RAM and the block devices
+of the VM are not loaded by this command.
+
+Arguments:
+
+- "filename": the file to load the state of the devices from as binary
+data. See xen-save-devices-state.txt for a description of the binary
+format.
+
+Example:
+
+-> { "execute": "xen-load-devices-state",
+ "arguments": { "filename": "/tmp/resume" } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "xen-set-global-dirty-log",
 .args_type  = "enable:b",
 .mhandler.cmd_new = qmp_marshal_xen_set_global_dirty_log,
-- 
1.9.3






[Qemu-devel] [PATCH v4 0/1] Introduce "xen-load-devices-state"

2016-04-10 Thread Changlong Xie
Changelog
v4:
1. Rebased to the lastest code
v3:
1. Addressed on David's commets, fix a bug
v2:
1. Rebased to the lastest code
2. Addressed on Eric's comments, fixed coding style

Wen Congyang (1):
  Introduce "xen-load-devices-state"

 migration/savevm.c | 36 
 qapi-schema.json   | 14 ++
 qmp-commands.hx| 27 +++
 3 files changed, 77 insertions(+)

-- 
1.9.3






Re: [Qemu-devel] [PATCH RFC v2] IOMMU: Add Support to VFIO devices with vIOMMU present

2016-04-10 Thread Peter Xu
Hi, Aviv,

On Sat, Apr 09, 2016 at 09:03:38PM +0300, Aviv B.D. wrote:

[...]

> +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num,
> uint8_t devfn, uint16_t * domain_id)

It seems that there are two lines above, however what I feel is that
this is a long line splitted by the email client or something
else... are you sending the patch using git-send-email? Not sure
whether this would be a problem.

I see the same thing in previous patches too.

[...]

> @@ -785,7 +816,7 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
>   * @entry: IOMMUTLBEntry that contain the addr to be translated and result
>   */
>  static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> -   uint8_t devfn, hwaddr addr, bool is_write,
> +   uint8_t devfn, hwaddr addr,
> IOMMUAccessPermissions is_write,

First of all, if we are to change this "is_write" into a permission,
I would prefer change it's name too from "is_write" to "perm" or
else, so that people would know this is not a boolean any more.

Secondly, I assume you have not handled all the is_write uses below,
right? So the code seems not consistent. Many places are still using
it as boolean (I suppose).

[...]

> uint16_t domain_id,
> +  hwaddr addr, uint8_t am)
> +{
> +VFIOGuestIOMMU * giommu;
> +
> +QLIST_FOREACH(giommu, &(s->giommu_list), iommu_next){
> +VTDAddressSpace *vtd_as = container_of(giommu->iommu,
> VTDAddressSpace, iommu);
> +uint16_t vfio_domain_id;
> +int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus),
> vtd_as->devfn, _domain_id);
> +int i=0;
> +if (!ret && domain_id == vfio_domain_id){
> +IOMMUTLBEntry entry;
> +
> +/* do vfio unmap */
> +VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d", addr, 
> am);
> +entry.target_as = NULL;
> +entry.iova = addr & VTD_PAGE_MASK_4K;
> +entry.translated_addr = 0;
> +entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> +entry.perm = IOMMU_NONE;
> +memory_region_notify_iommu(giommu->iommu, entry);
> +
> +/* do vfio map */
> +VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
> +/* call to vtd_iommu_translate */
> +for (i = 0; i < (1 << am); i++, addr+=(1 << VTD_PAGE_SHIFT_4K)){
> +IOMMUTLBEntry entry =
> s->iommu_ops.translate(giommu->iommu, addr, IOMMU_ANY);
> +if (entry.perm != IOMMU_NONE){
> +memory_region_notify_iommu(giommu->iommu, entry);
> +}

Questions (that I am curious about only, not to mean that there is
something worng):

- What should we do if entry.perm == IOMMU_NONE? Is it possible? If
  not, I'd prefer assert to if.

- Here, we do the remap for every 4K, guess even with huge
  pages. Better way to do? A stupid one from me: take special care
  for am == 9, 18 cases?

- Is it possible that multiple devices use same domain ID? Not
  sure. If so, we will always map/unmap the same address twice?

[...]

>  static void vfio_listener_region_add(MemoryListener *listener,
>   MemoryRegionSection *section)
>  {
> @@ -344,6 +347,7 @@ static void
> vfio_listener_region_add(MemoryListener *listener,
>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>  llend = int128_make64(section->offset_within_address_space);
>  llend = int128_add(llend, section->size);
> +llend = int128_add(llend, int128_exts64(-1));

It seems that Bandan has fixed this. Please try pull latest master
branch and check commit 55efcc537d330. If so, maybe we need to
rebase?

>  llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> 
>  if (int128_ge(int128_make64(iova), llend)) {
> @@ -381,11 +385,13 @@ static void
> vfio_listener_region_add(MemoryListener *listener,
>  giommu->n.notify = vfio_iommu_map_notify;
>  QLIST_INSERT_HEAD(>giommu_list, giommu, giommu_next);
> 
> +vtd_register_giommu(giommu);
>  memory_region_register_iommu_notifier(giommu->iommu, >n);
> +#if 0
>  memory_region_iommu_replay(giommu->iommu, >n,
> vfio_container_granularity(container),
> false);
> -
> +#endif
>  return;
>  }
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 2de7898..0e814ab 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -146,10 +146,14 @@ struct MemoryRegionOps {
>  };
> 
>  typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
> -
> +typedef enum IOMMUAccessPermissions{
> +IOMMU_READ = 0,
> +IOMMU_WRITE = 1,
> +IOMMU_ANY = 2
> +} IOMMUAccessPermissions;

I see this:

typedef enum {
IOMMU_NONE = 0,
IOMMU_RO   = 1,
IOMMU_WO   = 2,
IOMMU_RW   = 3,
} 

[Qemu-devel] ping [PATCH 3/3] hid.c: Add debug support

2016-04-10 Thread Programmingkid
https://patchwork.ozlabs.org/patch/602037/

Add debug macros to the code for easier debugging.

Signed-off-by: John Arbuckle 
---
 hw/input/hid.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/input/hid.c b/hw/input/hid.c
index 329a27b..42ca592 100644
--- a/hw/input/hid.c
+++ b/hw/input/hid.c
@@ -37,6 +37,13 @@
 #define RELEASED -1
 #define PUSHED -2
 
+/* #define DEBUG_HID_CODE */
+#ifdef DEBUG_HID_CODE
+#define DEBUG_HID(fmt, ...) printf(fmt, __VA_ARGS__)
+#else
+#define DEBUG_HID(fmt, ...) (void)0
+#endif
+
 /* Translates a QKeyCode to USB HID value */
 static const uint8_t qcode_to_usb_hid[] = {
 [Q_KEY_CODE_SHIFT] = USB_HID_LEFT_SHIFT,
@@ -331,6 +338,7 @@ static void hid_keyboard_event(DeviceState *dev, 
QemuConsole *src,
 return;
 }
 keycode = qcode_to_usb_hid[qcode];
+DEBUG_HID("keycode = 0x%x qcode:%d\n", keycode, qcode);
 
 count = 2;
 if (evt->u.key.data->down == false) { /* if key up event */
@@ -381,6 +389,9 @@ static void hid_keyboard_process_keycode(HIDState *hs)
 slot = hs->head & QUEUE_MASK; QUEUE_INCR(hs->head); hs->n--;
 keycode = hs->kbd.keycodes[slot];
 
+DEBUG_HID("keycode:0x%x status:%s\n", keycode, (status == PUSHED ? "Pushed"
+  : "Released"));
+
 /* handle Control, Option, GUI/Windows/Command, and Shift keys */
 if (keycode >= 0xe0) {
 process_modifier_key(status, keycode, &(hs->kbd.modifiers));
-- 
2.7.2





[Qemu-devel] ping [PATCH 2/3] hid.c: convert to QKeyCode

2016-04-10 Thread Programmingkid
https://patchwork.ozlabs.org/patch/602036/

Switches hid.c from PS/2 to QKeyCode support.

Signed-off-by: John Arbuckle 
---
 hw/input/hid.c | 270 ++---
 1 file changed, 179 insertions(+), 91 deletions(-)

diff --git a/hw/input/hid.c b/hw/input/hid.c
index 5912677..329a27b 100644
--- a/hw/input/hid.c
+++ b/hw/input/hid.c
@@ -27,47 +27,144 @@
 #include "ui/console.h"
 #include "qemu/timer.h"
 #include "hw/input/hid.h"
+#include "include/hw/input/usb-keys.h"
+#include 
 
 #define HID_USAGE_ERROR_ROLLOVER0x01
 #define HID_USAGE_POSTFAIL  0x02
 #define HID_USAGE_ERROR_UNDEFINED   0x03
 
-/* Indices are QEMU keycodes, values are from HID Usage Table.  Indices
- * above 0x80 are for keys that come after 0xe0 or 0xe1+0x1d or 0xe1+0x9d.  */
-static const uint8_t hid_usage_keys[0x100] = {
-0x00, 0x29, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23,
-0x24, 0x25, 0x26, 0x27, 0x2d, 0x2e, 0x2a, 0x2b,
-0x14, 0x1a, 0x08, 0x15, 0x17, 0x1c, 0x18, 0x0c,
-0x12, 0x13, 0x2f, 0x30, 0x28, 0xe0, 0x04, 0x16,
-0x07, 0x09, 0x0a, 0x0b, 0x0d, 0x0e, 0x0f, 0x33,
-0x34, 0x35, 0xe1, 0x31, 0x1d, 0x1b, 0x06, 0x19,
-0x05, 0x11, 0x10, 0x36, 0x37, 0x38, 0xe5, 0x55,
-0xe2, 0x2c, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e,
-0x3f, 0x40, 0x41, 0x42, 0x43, 0x53, 0x47, 0x5f,
-0x60, 0x61, 0x56, 0x5c, 0x5d, 0x5e, 0x57, 0x59,
-0x5a, 0x5b, 0x62, 0x63, 0x00, 0x00, 0x64, 0x44,
-0x45, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e,
-0xe8, 0xe9, 0x71, 0x72, 0x73, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x85, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0xe3, 0xe7, 0x65,
-
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x58, 0xe4, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x54, 0x00, 0x46,
-0xe6, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x00, 0x4a,
-0x52, 0x4b, 0x00, 0x50, 0x00, 0x4f, 0x00, 0x4d,
-0x51, 0x4e, 0x49, 0x4c, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0xe3, 0xe7, 0x65, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+#define RELEASED -1
+#define PUSHED -2
+
+/* Translates a QKeyCode to USB HID value */
+static const uint8_t qcode_to_usb_hid[] = {
+[Q_KEY_CODE_SHIFT] = USB_HID_LEFT_SHIFT,
+[Q_KEY_CODE_SHIFT_R] = USB_HID_RIGHT_SHIFT,
+[Q_KEY_CODE_ALT] = USB_HID_LEFT_OPTION,
+[Q_KEY_CODE_ALT_R] = USB_HID_RIGHT_OPTION,
+[Q_KEY_CODE_ALTGR] = USB_HID_LEFT_OPTION,
+[Q_KEY_CODE_ALTGR_R] = USB_HID_RIGHT_OPTION,
+[Q_KEY_CODE_CTRL] = USB_HID_LEFT_CONTROL,
+[Q_KEY_CODE_CTRL_R] = USB_HID_RIGHT_CONTROL,
+[Q_KEY_CODE_MENU] = USB_HID_MENU,
+[Q_KEY_CODE_ESC] = USB_HID_ESC,
+[Q_KEY_CODE_1] = USB_HID_1,
+[Q_KEY_CODE_2] = USB_HID_2,
+[Q_KEY_CODE_3] = USB_HID_3,
+[Q_KEY_CODE_4] = USB_HID_4,
+[Q_KEY_CODE_5] = USB_HID_5,
+[Q_KEY_CODE_6] = USB_HID_6,
+[Q_KEY_CODE_7] = USB_HID_7,
+[Q_KEY_CODE_8] = USB_HID_8,
+[Q_KEY_CODE_9] = USB_HID_9,
+[Q_KEY_CODE_0] = USB_HID_0,
+[Q_KEY_CODE_MINUS] = USB_HID_MINUS,
+[Q_KEY_CODE_EQUAL] = USB_HID_EQUALS,
+[Q_KEY_CODE_BACKSPACE] = USB_HID_DELETE,
+[Q_KEY_CODE_TAB] = USB_HID_TAB,
+[Q_KEY_CODE_Q] = USB_HID_Q,
+[Q_KEY_CODE_W] = USB_HID_W,
+[Q_KEY_CODE_E] = USB_HID_E,
+[Q_KEY_CODE_R] = USB_HID_R,
+[Q_KEY_CODE_T] = USB_HID_T,
+[Q_KEY_CODE_Y] = USB_HID_Y,
+[Q_KEY_CODE_U] = USB_HID_U,
+[Q_KEY_CODE_I] = USB_HID_I,
+[Q_KEY_CODE_O] = USB_HID_O,
+[Q_KEY_CODE_P] = USB_HID_P,
+[Q_KEY_CODE_BRACKET_LEFT] = USB_HID_LEFT_BRACKET,
+[Q_KEY_CODE_BRACKET_RIGHT] = USB_HID_RIGHT_BRACKET,
+[Q_KEY_CODE_RET] = USB_HID_RETURN,
+[Q_KEY_CODE_A] = USB_HID_A,
+[Q_KEY_CODE_S] = USB_HID_S,
+[Q_KEY_CODE_D] = USB_HID_D,
+[Q_KEY_CODE_F] = USB_HID_F,
+[Q_KEY_CODE_G] = USB_HID_G,
+[Q_KEY_CODE_H] = USB_HID_H,
+[Q_KEY_CODE_J] = USB_HID_J,
+[Q_KEY_CODE_K] = USB_HID_K,
+[Q_KEY_CODE_L] = USB_HID_L,
+[Q_KEY_CODE_SEMICOLON] = USB_HID_SEMICOLON,
+[Q_KEY_CODE_APOSTROPHE] = USB_HID_QUOTE,
+[Q_KEY_CODE_GRAVE_ACCENT] = USB_HID_GRAVE_ACCENT,
+[Q_KEY_CODE_BACKSLASH] = USB_HID_BACKSLASH,
+[Q_KEY_CODE_Z] = USB_HID_Z,
+[Q_KEY_CODE_X] = USB_HID_X,
+[Q_KEY_CODE_C] = USB_HID_C,
+[Q_KEY_CODE_V] = USB_HID_V,
+[Q_KEY_CODE_B] = USB_HID_B,
+[Q_KEY_CODE_N] = USB_HID_N,
+[Q_KEY_CODE_M] = USB_HID_M,
+[Q_KEY_CODE_COMMA] = USB_HID_COMMA,
+[Q_KEY_CODE_DOT] = USB_HID_PERIOD,
+[Q_KEY_CODE_SLASH] = 

[Qemu-devel] [PATCH] intel_iommu: fix incorrect mask for IOTLB invalidate desc

2016-04-10 Thread Peter Xu
Address Mask (AM) should be 6 bits long, not 7. (vt-d spec 6.5.2.3)

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu_internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e5f514c..db7da18 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -315,7 +315,7 @@ typedef struct VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IOTLB_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK)
 #define VTD_INV_DESC_IOTLB_ADDR(val)((val) & ~0xfffULL & \
  ((1ULL << VTD_MGAW) - 1))
-#define VTD_INV_DESC_IOTLB_AM(val)  ((val) & 0x3fULL)
+#define VTD_INV_DESC_IOTLB_AM(val)  ((val) & 0x2fULL)
 #define VTD_INV_DESC_IOTLB_RSVD_LO  0xff00ULL
 #define VTD_INV_DESC_IOTLB_RSVD_HI  0xf80ULL
 
-- 
2.4.3




[Qemu-devel] ping [PATCH 1/3] usb-keys.h: initial commit

2016-04-10 Thread Programmingkid
https://patchwork.ozlabs.org/patch/602035/

Create an emum of all the USB HID keyboard values.

Signed-off-by: John Arbuckle 
---
 include/hw/input/usb-keys.h | 154 
 1 file changed, 154 insertions(+)
 create mode 100644 include/hw/input/usb-keys.h

diff --git a/include/hw/input/usb-keys.h b/include/hw/input/usb-keys.h
new file mode 100644
index 000..6a9fba8
--- /dev/null
+++ b/include/hw/input/usb-keys.h
@@ -0,0 +1,154 @@
+/*
+ * QEMU USB HID Emulator
+ *
+ * Copyright (c) 2016 John Arbuckle
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ *  File: usb-keys.h
+ *  Description: Creates an enum of all the USB keycodes.
+ *  Additional information: http://www.usb.org/developers/hidpage/Hut1_12v2.pdf
+ *  page 53
+ */
+
+#ifndef USB_KEYS_H
+#define USB_KEYS_H
+
+enum {
+USB_HID_A = 0x04,
+USB_HID_B = 0x05,
+USB_HID_C = 0x06,
+USB_HID_D = 0x07,
+USB_HID_E = 0x08,
+USB_HID_F = 0x09,
+USB_HID_G = 0x0a,
+USB_HID_H = 0x0b,
+USB_HID_I = 0x0c,
+USB_HID_J = 0x0d,
+USB_HID_K = 0x0e,
+USB_HID_L = 0x0f,
+USB_HID_M = 0x10,
+USB_HID_N = 0x11,
+USB_HID_O = 0x12,
+USB_HID_P = 0x13,
+USB_HID_Q = 0x14,
+USB_HID_R = 0x15,
+USB_HID_S = 0x16,
+USB_HID_T = 0x17,
+USB_HID_U = 0x18,
+USB_HID_V = 0x19,
+USB_HID_W = 0x1a,
+USB_HID_X = 0x1b,
+USB_HID_Y = 0x1c,
+USB_HID_Z = 0x1d,
+
+USB_HID_1 = 0x1e,
+USB_HID_2 = 0x1f,
+USB_HID_3 = 0x20,
+USB_HID_4 = 0x21,
+USB_HID_5 = 0x22,
+USB_HID_6 = 0x23,
+USB_HID_7 = 0x24,
+USB_HID_8 = 0x25,
+USB_HID_9 = 0x26,
+USB_HID_0 = 0x27,
+
+USB_HID_RETURN = 0x28,
+USB_HID_ESC = 0x29,
+USB_HID_DELETE = 0x2a,
+USB_HID_TAB = 0x2b,
+USB_HID_SPACE = 0x2c,
+USB_HID_MINUS = 0x2d,
+USB_HID_EQUALS = 0x2e,
+USB_HID_LEFT_BRACKET = 0x2f,
+USB_HID_RIGHT_BRACKET = 0x30,
+USB_HID_BACKSLASH = 0x31,
+USB_HID_NON_US_NUMBER_SIGN = 0x32,
+USB_HID_SEMICOLON = 0x33,
+USB_HID_QUOTE = 0x34,
+USB_HID_GRAVE_ACCENT = 0x35,
+USB_HID_COMMA = 0x36,
+USB_HID_PERIOD = 0x37,
+USB_HID_FORWARD_SLASH = 0x38,
+USB_HID_CAPS_LOCK = 0x39,
+
+USB_HID_F1 = 0x3a,
+USB_HID_F2 = 0x3b,
+USB_HID_F3 = 0x3c,
+USB_HID_F4 = 0x3d,
+USB_HID_F5 = 0x3e,
+USB_HID_F6 = 0x3f,
+USB_HID_F7 = 0x40,
+USB_HID_F8 = 0x41,
+USB_HID_F9 = 0x42,
+USB_HID_F10 = 0x43,
+USB_HID_F11 = 0x44,
+USB_HID_F12 = 0x45,
+USB_HID_PRINT = 0x46,
+USB_HID_SCROLL_LOCK = 0x47,
+USB_HID_PAUSE = 0x48,
+
+USB_HID_INSERT = 0x49,
+USB_HID_HOME = 0x4a,
+USB_HID_PAGE_UP = 0x4b,
+USB_HID_FORWARD_DELETE = 0x4c,
+USB_HID_END = 0x4d,
+USB_HID_PAGE_DOWN = 0x4e,
+USB_HID_RIGHT_ARROW = 0x4f,
+USB_HID_LEFT_ARROW = 0x50,
+USB_HID_DOWN_ARROW = 0x51,
+USB_HID_UP_ARROW = 0x52,
+
+USB_HID_CLEAR = 0x53,
+USB_HID_KP_DIVIDE = 0x54,
+USB_HID_KP_MULTIPLY = 0x55,
+USB_HID_KP_MINUS = 0x56,
+USB_HID_KP_ADD = 0x57,
+USB_HID_KP_ENTER = 0x58,
+USB_HID_KP_1 = 0x59,
+USB_HID_KP_2 = 0x5a,
+USB_HID_KP_3  = 0x5b,
+USB_HID_KP_4 = 0x5c,
+USB_HID_KP_5 = 0x5d,
+USB_HID_KP_6 = 0x5e,
+USB_HID_KP_7 = 0x5f,
+USB_HID_KP_8 = 0x60,
+USB_HID_KP_9 = 0x61,
+USB_HID_KP_0 = 0x62,
+USB_HID_KP_PERIOD = 0x63,
+
+USB_HID_NON_US_BACKSLASH = 0x64,
+USB_HID_APPLICATION = 0x65,
+USB_HID_POWER = 0x66,
+USB_HID_KP_EQUALS = 0x67,
+USB_HID_F13 = 0x68,
+USB_HID_F14 = 0x69,
+USB_HID_F15 = 0x6a,
+USB_HID_EXECUTE = 0x74,
+USB_HID_HELP = 0x75,
+USB_HID_MENU = 0x76,
+USB_HID_SELECT = 0x77,
+USB_HID_STOP = 0x78,
+USB_HID_AGAIN = 0x79,
+USB_HID_UNDO = 0x7a,
+USB_HID_CUT = 0x7b,
+USB_HID_COPY = 0x7c,
+USB_HID_PASTE = 0x7d,
+USB_HID_FIND = 0x7e,
+USB_HID_MUTE = 0x7f,
+USB_HID_VOLUME_UP = 0x80,
+USB_HID_VOLUME_DOWN = 0x81,
+USB_HID_KP_COMMA = 0x85,
+
+USB_HID_LEFT_CONTROL = 0xe0,
+USB_HID_LEFT_SHIFT = 0xe1,
+USB_HID_LEFT_OPTION = 0xe2,
+USB_HID_LEFT_GUI = 0xe3,
+USB_HID_RIGHT_CONTROL = 0xe4,
+USB_HID_RIGHT_SHIFT = 0xe5,
+USB_HID_RIGHT_OPTION = 0xe6,
+USB_HID_RIGHT_GUI = 0xe7,
+};
+
+#endif /* USB_KEYS_H */
-- 
2.7.2





[Qemu-devel] ping [PATCH 0/3] Switch USB HID to QKeyCode

2016-04-10 Thread Programmingkid
https://patchwork.ozlabs.org/patch/602035/

This patchset switches from the PS/2 keycode to QKeyCode support in the hid.c
file. 

John Arbuckle (3):
 usb-keys.h: initial commit
 hid.c: convert to QKeyCode
 hid.c: Add debug support

hw/input/hid.c  | 279 ++--
include/hw/input/usb-keys.h | 154 
2 files changed, 343 insertions(+), 90 deletions(-)
create mode 100644 include/hw/input/usb-keys.h

-- 
2.7.2

> 




[Qemu-devel] [PATCH v3 09/10] tcg: Clean up direct block chaining safety checks

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

We don't take care of direct jumps when address mapping changes. Thus we
must be sure to generate direct jumps so that they always keep valid
even if address mapping changes. However we only allow to execute a TB
if it was generated from the pages which match with current mapping.

Document in comments in the translation code the reason for these checks
on a destination PC.

Some targets with variable length instructions allow TB to straddle a
page boundary. However, we make sure that both TB pages match the
current address mapping when looking up TBs. So it is safe to do direct
jumps into both pages. Correct the checks for those targets.

Given that, we can safely patch a TB which spans two pages. Remove the
unnecessary check in cpu_exec() and allow such TBs to be patched.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c| 7 ++-
 target-alpha/translate.c  | 5 -
 target-arm/translate-a64.c| 5 -
 target-arm/translate.c| 7 ++-
 target-cris/translate.c   | 8 +++-
 target-i386/translate.c   | 7 +--
 target-lm32/translate.c   | 4 
 target-m68k/translate.c   | 6 +-
 target-microblaze/translate.c | 4 
 target-mips/translate.c   | 4 
 target-moxie/translate.c  | 4 
 target-openrisc/translate.c   | 4 
 target-ppc/translate.c| 4 
 target-s390x/translate.c  | 9 ++---
 target-sh4/translate.c| 4 
 target-sparc/translate.c  | 4 
 target-tricore/translate.c| 4 
 target-unicore32/translate.c  | 4 
 target-xtensa/translate.c | 8 
 19 files changed, 87 insertions(+), 15 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bbfcbfb54385..065cc9159477 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -508,11 +508,8 @@ int cpu_exec(CPUState *cpu)
 next_tb = 0;
 tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
 }
-/* see if we can patch the calling TB. When the TB
-   spans two pages, we cannot safely do a direct
-   jump. */
-if (next_tb != 0 && tb->page_addr[1] == -1
-&& !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+/* See if we can patch the calling TB. */
+if (next_tb != 0 && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
 tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
 next_tb & TB_EXIT_MASK, tb);
 }
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5b86992dd367..5fa66309ce2e 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -464,7 +464,10 @@ static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
 if (in_superpage(ctx, dest)) {
 return true;
 }
-/* Check for the dest on the same page as the start of the TB.  */
+/* Direct jumps with goto_tb are only safe within the page this TB resides
+ * in because we don't take care of direct jumps when address mapping
+ * changes.
+ */
 return ((ctx->tb->pc ^ dest) & TARGET_PAGE_MASK) == 0;
 }
 
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index b13cff756ad1..bf8471c7fa99 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -274,7 +274,10 @@ static inline bool use_goto_tb(DisasContext *s, int n, 
uint64_t dest)
 return false;
 }
 
-/* Only link tbs from inside the same guest page */
+/* Direct jumps with goto_tb are only safe within the page this TB resides
+ * in because we don't take care of direct jumps when address mapping
+ * changes.
+ */
 if ((s->tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
 return false;
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 940ec8d981d1..aeb3e84e8d40 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4054,7 +4054,12 @@ static inline void gen_goto_tb(DisasContext *s, int n, 
target_ulong dest)
 TranslationBlock *tb;
 
 tb = s->tb;
-if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+/* Direct jumps with goto_tb are only safe within the pages this TB resides
+ * in because we don't take care of direct jumps when address mapping
+ * changes.
+ */
+if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+((s->pc - 1) & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
 tcg_gen_goto_tb(n);
 gen_set_pc_im(s, dest);
 tcg_gen_exit_tb((uintptr_t)tb + n);
diff --git a/target-cris/translate.c b/target-cris/translate.c
index a73176c118b0..7acac076167e 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -524,7 +524,13 @@ static void gen_goto_tb(DisasContext *dc, int n, 
target_ulong dest)
 {
 

[Qemu-devel] [PATCH v3 08/10] tcg: Clean up tb_jmp_unlink()

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Unify the code of this function with tb_jmp_remove_from_list(). Making
these functions similar improves their readability. Also this could be a
step towards making this function thread-safe.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
---

Changes in v2:
 * Tweaked a comment to better describe jmp_list_{next|first} usage
 * Eliminated duplicate dereference of 'ptb' in tb_jmp_unlink()

 translate-all.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 2d82ce5735a1..485bc9ccdfbe 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -964,25 +964,22 @@ static inline void tb_reset_jump(TranslationBlock *tb, 
int n)
 /* remove any jumps to the TB */
 static inline void tb_jmp_unlink(TranslationBlock *tb)
 {
-uintptr_t tb1, tb2;
+TranslationBlock *tb1;
+uintptr_t *ptb, ntb;
 unsigned int n1;
 
-tb1 = tb->jmp_list_first;
+ptb = >jmp_list_first;
 for (;;) {
-TranslationBlock *tmp_tb;
-n1 = tb1 & 3;
+ntb = *ptb;
+n1 = ntb & 3;
+tb1 = (TranslationBlock *)(ntb & ~3);
 if (n1 == 2) {
 break;
 }
-tmp_tb = (TranslationBlock *)(tb1 & ~3);
-tb2 = tmp_tb->jmp_list_next[n1];
-tb_reset_jump(tmp_tb, n1);
-tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
-tb1 = tb2;
+tb_reset_jump(tb1, n1);
+*ptb = tb1->jmp_list_next[n1];
+tb1->jmp_list_next[n1] = (uintptr_t)NULL;
 }
-
-assert(((uintptr_t)tb & 3) == 0);
-tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
 }
 
 /* invalidate one TB */
-- 
2.8.1




[Qemu-devel] [PATCH v3 00/10] tcg: Direct block chaining clean-up

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

This series combines a set of patches which is meant to improve overall code
structure and readability of direct block chaining mechanism. The other point
is to make a step towards thread safety of TB chainig.

The series' tree can be found in a public git repository [1].

[1] https://github.com/sergefdrv/qemu/tree/tb-chaining-cleanup-v3

Summary of changes:
 Changes in v3:
  * New patch to clean up safety checks [PATCH v3 09/10]
  * New patch to eliminate unneeded checks in user-mode [PATCH v3 10/10]
 Changes in v2:
  * Eliminated duplicate dereference of 'ptb' in tb_jmp_remove() [PATCH v2 2/8]
  * Tweaked a comment [PATCH v2 4/8]
  * Complete rewrite [PATCH v2 5/8]
  * Tweaked a comment; eliminated duplicate dereference of 'ptb' in
tb_jmp_unlink() [PATCH v2 8/8]

Sergey Fedorov (10):
  tcg: Clean up direct block chaining data fields
  tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB
  tcg: Rearrange tb_link_page() to avoid forward declaration
  tcg: Init TB's direct jumps before making it visible
  tcg: Clarify thread safety check in tb_add_jump()
  tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()
  tcg: Extract removing of jumps to TB from tb_phys_invalidate()
  tcg: Clean up tb_jmp_unlink()
  tcg: Clean up direct block chaining safety checks
  tcg: Moderate direct block chaining safety checks in user mode

 cpu-exec.c|   7 +-
 include/exec/exec-all.h   |  70 ++
 target-alpha/translate.c  |  13 +-
 target-arm/translate-a64.c|  11 +-
 target-arm/translate.c|  25 +++-
 target-cris/translate.c   |  24 +++-
 target-i386/translate.c   |  31 +++--
 target-lm32/translate.c   |  29 -
 target-m68k/translate.c   |  26 +++-
 target-microblaze/translate.c |  23 +++-
 target-mips/translate.c   |  28 +++-
 target-moxie/translate.c  |  29 -
 target-openrisc/translate.c   |  28 +++-
 target-ppc/translate.c|  28 +++-
 target-s390x/translate.c  |  25 +++-
 target-sh4/translate.c|  29 -
 target-sparc/translate.c  |  32 -
 target-tricore/translate.c|  28 +++-
 target-unicore32/translate.c  |  24 +++-
 target-xtensa/translate.c |  20 +++
 tcg/aarch64/tcg-target.inc.c  |   7 +-
 tcg/arm/tcg-target.inc.c  |   8 +-
 tcg/i386/tcg-target.inc.c |   8 +-
 tcg/ia64/tcg-target.inc.c |   6 +-
 tcg/mips/tcg-target.inc.c |   8 +-
 tcg/ppc/tcg-target.inc.c  |   6 +-
 tcg/s390/tcg-target.inc.c |  11 +-
 tcg/sparc/tcg-target.inc.c|   9 +-
 tcg/tcg.h |   6 +-
 tcg/tci/tcg-target.inc.c  |  10 +-
 translate-all.c   | 292 ++
 31 files changed, 605 insertions(+), 296 deletions(-)

-- 
2.8.1




[Qemu-devel] [PATCH v3 10/10] tcg: Moderate direct block chaining safety checks in user mode

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

In user mode, there's only a static address translation, TBs are always
invalidated properly and direct jumps are reset when mapping change.
Thus the destination address is always valid for direct jumps and
there's no need to restrict it to the pages the TB resides in.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 target-alpha/translate.c  | 10 +-
 target-arm/translate-a64.c|  8 +++-
 target-arm/translate.c| 26 ++
 target-cris/translate.c   | 26 ++
 target-i386/translate.c   | 30 --
 target-lm32/translate.c   | 27 ---
 target-m68k/translate.c   | 30 --
 target-microblaze/translate.c | 23 +--
 target-mips/translate.c   | 28 +---
 target-moxie/translate.c  | 29 +
 target-openrisc/translate.c   | 28 +---
 target-ppc/translate.c| 32 +++-
 target-s390x/translate.c  | 20 +++-
 target-sh4/translate.c| 27 ---
 target-sparc/translate.c  | 32 +++-
 target-tricore/translate.c| 28 +---
 target-unicore32/translate.c  | 24 +---
 target-xtensa/translate.c | 16 ++--
 18 files changed, 325 insertions(+), 119 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5fa66309ce2e..684559e694bd 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -464,11 +464,19 @@ static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
 if (in_superpage(ctx, dest)) {
 return true;
 }
+#ifndef CONFIG_USER_ONLY
 /* Direct jumps with goto_tb are only safe within the page this TB resides
  * in because we don't take care of direct jumps when address mapping
- * changes.
+ * changes in system mode.
  */
 return ((ctx->tb->pc ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+/* In user mode, there's only a static address translation, so the
+ * destination address is always valid. TBs are always invalidated properly
+ * and direct jumps are reset when mapping attributes change.
+ */
+return true;
+#endif
 }
 
 static ExitStatus gen_bdirect(DisasContext *ctx, int ra, int32_t disp)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index bf8471c7fa99..c2fccfe5a038 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -274,14 +274,20 @@ static inline bool use_goto_tb(DisasContext *s, int n, 
uint64_t dest)
 return false;
 }
 
+#ifndef CONFIG_USER_ONLY
 /* Direct jumps with goto_tb are only safe within the page this TB resides
  * in because we don't take care of direct jumps when address mapping
- * changes.
+ * changes in system mode.
  */
 if ((s->tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
 return false;
 }
+#endif
 
+/* In user mode, there's only a static address translation, so the
+ * destination address is always valid. TBs are always invalidated properly
+ * and direct jumps are reset when mapping attributes change.
+ */
 return true;
 }
 
diff --git a/target-arm/translate.c b/target-arm/translate.c
index aeb3e84e8d40..ae6fd7f1c313 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4049,20 +4049,30 @@ static int disas_vfp_insn(DisasContext *s, uint32_t 
insn)
 return 0;
 }
 
-static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
+static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
 {
-TranslationBlock *tb;
-
-tb = s->tb;
+#ifndef CONFIG_USER_ONLY
 /* Direct jumps with goto_tb are only safe within the pages this TB resides
  * in because we don't take care of direct jumps when address mapping
- * changes.
+ * changes in system mode.
  */
-if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
-((s->pc - 1) & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+return (s->tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+   ((s->pc - 1) & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
+#else
+/* In user mode, there's only a static address translation, so the
+ * destination address is always valid. TBs are always invalidated properly
+ * and direct jumps are reset when mapping attributes change.
+ */
+return true;
+#endif
+}
+
+static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
+{
+if (use_goto_tb(s, dest)) {
 tcg_gen_goto_tb(n);
 gen_set_pc_im(s, dest);
-tcg_gen_exit_tb((uintptr_t)tb + n);
+tcg_gen_exit_tb((uintptr_t)s->tb + n);
 } else {
 

[Qemu-devel] [PATCH v3 07/10] tcg: Extract removing of jumps to TB from tb_phys_invalidate()

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Move the code for removing jumps to a TB out of tb_phys_invalidate() to
a separate static inline function tb_jmp_unlink(). This simplifies
tb_phys_invalidate() and improves code structure.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
---
 translate-all.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index c2a800a7d8a0..2d82ce5735a1 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -961,14 +961,37 @@ static inline void tb_reset_jump(TranslationBlock *tb, 
int n)
 tb_set_jmp_target(tb, n, addr);
 }
 
+/* remove any jumps to the TB */
+static inline void tb_jmp_unlink(TranslationBlock *tb)
+{
+uintptr_t tb1, tb2;
+unsigned int n1;
+
+tb1 = tb->jmp_list_first;
+for (;;) {
+TranslationBlock *tmp_tb;
+n1 = tb1 & 3;
+if (n1 == 2) {
+break;
+}
+tmp_tb = (TranslationBlock *)(tb1 & ~3);
+tb2 = tmp_tb->jmp_list_next[n1];
+tb_reset_jump(tmp_tb, n1);
+tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
+tb1 = tb2;
+}
+
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
+}
+
 /* invalidate one TB */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
 CPUState *cpu;
 PageDesc *p;
-unsigned int h, n1;
+unsigned int h;
 tb_page_addr_t phys_pc;
-uintptr_t tb1, tb2;
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1002,22 +1025,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 tb_remove_from_jmp_list(tb, 1);
 
 /* suppress any remaining jumps to this TB */
-tb1 = tb->jmp_list_first;
-for (;;) {
-TranslationBlock *tmp_tb;
-n1 = tb1 & 3;
-if (n1 == 2) {
-break;
-}
-tmp_tb = (TranslationBlock *)(tb1 & ~3);
-tb2 = tmp_tb->jmp_list_next[n1];
-tb_reset_jump(tmp_tb, n1);
-tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
-tb1 = tb2;
-}
-
-assert(((uintptr_t)tb & 3) == 0);
-tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
+tb_jmp_unlink(tb);
 
 tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
-- 
2.8.1




[Qemu-devel] [PATCH v3 06/10] tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

tb_jmp_remove() was only used to remove the TB from a list of all TBs
jumping to the same TB which is n-th jump destination of the given TB.
Put a comment briefly describing the function behavior and rename it to
better reflect its purpose.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
---
 translate-all.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index dfa7f0d64e76..c2a800a7d8a0 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -923,7 +923,8 @@ static inline void tb_page_remove(TranslationBlock **ptb, 
TranslationBlock *tb)
 }
 }
 
-static inline void tb_jmp_remove(TranslationBlock *tb, int n)
+/* remove the TB from a list of TBs jumping to the n-th jump target of the TB 
*/
+static inline void tb_remove_from_jmp_list(TranslationBlock *tb, int n)
 {
 TranslationBlock *tb1;
 uintptr_t *ptb, ntb;
@@ -997,8 +998,8 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 }
 
 /* suppress this TB from the two jump lists */
-tb_jmp_remove(tb, 0);
-tb_jmp_remove(tb, 1);
+tb_remove_from_jmp_list(tb, 0);
+tb_remove_from_jmp_list(tb, 1);
 
 /* suppress any remaining jumps to this TB */
 tb1 = tb->jmp_list_first;
-- 
2.8.1




[Qemu-devel] [PATCH v3 04/10] tcg: Init TB's direct jumps before making it visible

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Initialize TB's direct jump list data fields and reset the jumps before
tb_link_page() puts it into the physical hash table and the physical
page list. So TB is completely initialized before it becomes visible.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---

Changes in v2:
 * Tweaked a comment

 translate-all.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 7ac7916f2792..dfa7f0d64e76 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1133,19 +1133,6 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 tb->page_addr[1] = -1;
 }
 
-assert(((uintptr_t)tb & 3) == 0);
-tb->jmp_list_first = (uintptr_t)tb | 2;
-tb->jmp_list_next[0] = (uintptr_t)NULL;
-tb->jmp_list_next[1] = (uintptr_t)NULL;
-
-/* init original jump addresses */
-if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
-tb_reset_jump(tb, 0);
-}
-if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
-tb_reset_jump(tb, 1);
-}
-
 #ifdef DEBUG_TB_CHECK
 tb_page_check();
 #endif
@@ -1254,6 +1241,20 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 ROUND_UP((uintptr_t)gen_code_buf + gen_code_size + search_size,
  CODE_GEN_ALIGN);
 
+/* init jump list */
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2;
+tb->jmp_list_next[0] = (uintptr_t)NULL;
+tb->jmp_list_next[1] = (uintptr_t)NULL;
+
+/* init original jump addresses wich has been set during tcg_gen_code() */
+if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 0);
+}
+if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 1);
+}
+
 /* check next page if needed */
 virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;
 phys_page2 = -1;
-- 
2.8.1




[Qemu-devel] [PATCH v3 03/10] tcg: Rearrange tb_link_page() to avoid forward declaration

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
---
 translate-all.c | 204 
 1 file changed, 101 insertions(+), 103 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index ba71ff73f55f..7ac7916f2792 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -153,8 +153,6 @@ void tb_lock_reset(void)
 #endif
 }
 
-static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
- tb_page_addr_t phys_page2);
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
@@ -1052,6 +1050,107 @@ static void build_page_bitmap(PageDesc *p)
 }
 }
 
+/* add the tb in the target page and protect it if necessary
+ *
+ * Called with mmap_lock held for user-mode emulation.
+ */
+static inline void tb_alloc_page(TranslationBlock *tb,
+ unsigned int n, tb_page_addr_t page_addr)
+{
+PageDesc *p;
+#ifndef CONFIG_USER_ONLY
+bool page_already_protected;
+#endif
+
+tb->page_addr[n] = page_addr;
+p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
+tb->page_next[n] = p->first_tb;
+#ifndef CONFIG_USER_ONLY
+page_already_protected = p->first_tb != NULL;
+#endif
+p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
+invalidate_page_bitmap(p);
+
+#if defined(CONFIG_USER_ONLY)
+if (p->flags & PAGE_WRITE) {
+target_ulong addr;
+PageDesc *p2;
+int prot;
+
+/* force the host page as non writable (writes will have a
+   page fault + mprotect overhead) */
+page_addr &= qemu_host_page_mask;
+prot = 0;
+for (addr = page_addr; addr < page_addr + qemu_host_page_size;
+addr += TARGET_PAGE_SIZE) {
+
+p2 = page_find(addr >> TARGET_PAGE_BITS);
+if (!p2) {
+continue;
+}
+prot |= p2->flags;
+p2->flags &= ~PAGE_WRITE;
+  }
+mprotect(g2h(page_addr), qemu_host_page_size,
+ (prot & PAGE_BITS) & ~PAGE_WRITE);
+#ifdef DEBUG_TB_INVALIDATE
+printf("protecting code page: 0x" TARGET_FMT_lx "\n",
+   page_addr);
+#endif
+}
+#else
+/* if some code is already present, then the pages are already
+   protected. So we handle the case where only the first TB is
+   allocated in a physical page */
+if (!page_already_protected) {
+tlb_protect_code(page_addr);
+}
+#endif
+}
+
+/* add a new TB and link it to the physical page tables. phys_page2 is
+ * (-1) to indicate that only one page contains the TB.
+ *
+ * Called with mmap_lock held for user-mode emulation.
+ */
+static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+ tb_page_addr_t phys_page2)
+{
+unsigned int h;
+TranslationBlock **ptb;
+
+/* add in the physical hash table */
+h = tb_phys_hash_func(phys_pc);
+ptb = _ctx.tb_ctx.tb_phys_hash[h];
+tb->phys_hash_next = *ptb;
+*ptb = tb;
+
+/* add in the page list */
+tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
+if (phys_page2 != -1) {
+tb_alloc_page(tb, 1, phys_page2);
+} else {
+tb->page_addr[1] = -1;
+}
+
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2;
+tb->jmp_list_next[0] = (uintptr_t)NULL;
+tb->jmp_list_next[1] = (uintptr_t)NULL;
+
+/* init original jump addresses */
+if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 0);
+}
+if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 1);
+}
+
+#ifdef DEBUG_TB_CHECK
+tb_page_check();
+#endif
+}
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
   target_ulong pc, target_ulong cs_base,
@@ -1409,107 +1508,6 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
 }
 #endif
 
-/* add the tb in the target page and protect it if necessary
- *
- * Called with mmap_lock held for user-mode emulation.
- */
-static inline void tb_alloc_page(TranslationBlock *tb,
- unsigned int n, tb_page_addr_t page_addr)
-{
-PageDesc *p;
-#ifndef CONFIG_USER_ONLY
-bool page_already_protected;
-#endif
-
-tb->page_addr[n] = page_addr;
-p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
-tb->page_next[n] = p->first_tb;
-#ifndef CONFIG_USER_ONLY
-page_already_protected = p->first_tb != NULL;
-#endif
-p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
-invalidate_page_bitmap(p);
-
-#if defined(CONFIG_USER_ONLY)
-if (p->flags & PAGE_WRITE) {
-target_ulong addr;
-PageDesc *p2;
-int prot;
-
-/* force the host page as non writable 

[Qemu-devel] [PATCH v3 05/10] tcg: Clarify thread safety check in tb_add_jump()

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

The check is to make sure that another thread hasn't already done the
same while we were outside of tb_lock. Mention this in a comment.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---

Changes in v2:
 * Typo fixed in the commit title
 * Complete rewrite of the commit body and the patch based on Paolo's comments

 include/exec/exec-all.h | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b055716ed690..8e81ef5fb2c2 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -391,21 +391,24 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 static inline void tb_add_jump(TranslationBlock *tb, int n,
TranslationBlock *tb_next)
 {
-/* NOTE: this test is only needed for thread safety */
-if (!tb->jmp_list_next[n]) {
-qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
-   "Linking TBs %p [" TARGET_FMT_lx
-   "] index %d -> %p [" TARGET_FMT_lx "]\n",
-   tb->tc_ptr, tb->pc, n,
-   tb_next->tc_ptr, tb_next->pc);
-/* patch the native jump address */
-tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
-
-/* add in TB jmp circular list */
-tb->jmp_list_next[n] = tb_next->jmp_list_first;
-assert(((uintptr_t)tb & 3) == 0);
-tb_next->jmp_list_first = (uintptr_t)tb | n;
+if (tb->jmp_list_next[n]) {
+/* Another thread has already done this while we were
+ * outside of the lock; nothing to do in this case */
+return;
 }
+qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
+   "Linking TBs %p [" TARGET_FMT_lx
+   "] index %d -> %p [" TARGET_FMT_lx "]\n",
+   tb->tc_ptr, tb->pc, n,
+   tb_next->tc_ptr, tb_next->pc);
+
+/* patch the native jump address */
+tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
+
+/* add in TB jmp circular list */
+tb->jmp_list_next[n] = tb_next->jmp_list_first;
+assert(((uintptr_t)tb & 3) == 0);
+tb_next->jmp_list_first = (uintptr_t)tb | n;
 }
 
 /* GETRA is the true target of the return instruction that we'll execute,
-- 
2.8.1




[Qemu-devel] [PATCH v3 02/10] tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

These fields do not contain pure pointers to a TranslationBlock
structure. So uintptr_t is the most appropriate type for them.
Also put an assert to assure that the two least significant bits of the
pointer are zero before assigning it to jmp_list_first.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---

Changes in v2:
 * Eliminated duplicate dereference of 'ptb' in tb_jmp_remove()

 include/exec/exec-all.h | 13 -
 translate-all.c | 38 --
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 38a149cc5e0c..b055716ed690 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -275,14 +275,16 @@ struct TranslationBlock {
  * jmp_list_first points to the first TB jumping to this one.
  * jmp_list_next is used to point to the next TB in a list.
  * Since each TB can have two jumps, it can participate in two lists.
- * The two least significant bits of a pointer are used to choose which
- * data field holds a pointer to the next TB:
+ * jmp_list_first and jmp_list_next are 4-byte aligned pointers to a
+ * TranslationBlock structure, but the two least significant bits of
+ * them are used to encode which data field of the pointed TB should
+ * be used to traverse the list further from that TB:
  * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
  * In other words, 0/1 tells which jump is used in the pointed TB,
  * and 2 means that this is a pointer back to the target TB of this list.
  */
-struct TranslationBlock *jmp_list_next[2];
-struct TranslationBlock *jmp_list_first;
+uintptr_t jmp_list_next[2];
+uintptr_t jmp_list_first;
 };
 
 #include "qemu/thread.h"
@@ -401,7 +403,8 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
 
 /* add in TB jmp circular list */
 tb->jmp_list_next[n] = tb_next->jmp_list_first;
-tb_next->jmp_list_first = (TranslationBlock *)((uintptr_t)tb | n);
+assert(((uintptr_t)tb & 3) == 0);
+tb_next->jmp_list_first = (uintptr_t)tb | n;
 }
 }
 
diff --git a/translate-all.c b/translate-all.c
index 33ca06c663d4..ba71ff73f55f 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -927,17 +927,17 @@ static inline void tb_page_remove(TranslationBlock **ptb, 
TranslationBlock *tb)
 
 static inline void tb_jmp_remove(TranslationBlock *tb, int n)
 {
-TranslationBlock *tb1, **ptb;
+TranslationBlock *tb1;
+uintptr_t *ptb, ntb;
 unsigned int n1;
 
 ptb = >jmp_list_next[n];
-tb1 = *ptb;
-if (tb1) {
+if (*ptb) {
 /* find tb(n) in circular list */
 for (;;) {
-tb1 = *ptb;
-n1 = (uintptr_t)tb1 & 3;
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
+ntb = *ptb;
+n1 = ntb & 3;
+tb1 = (TranslationBlock *)(ntb & ~3);
 if (n1 == n && tb1 == tb) {
 break;
 }
@@ -950,7 +950,7 @@ static inline void tb_jmp_remove(TranslationBlock *tb, int 
n)
 /* now we can suppress tb(n) from the list */
 *ptb = tb->jmp_list_next[n];
 
-tb->jmp_list_next[n] = NULL;
+tb->jmp_list_next[n] = (uintptr_t)NULL;
 }
 }
 
@@ -969,7 +969,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 PageDesc *p;
 unsigned int h, n1;
 tb_page_addr_t phys_pc;
-TranslationBlock *tb1, *tb2;
+uintptr_t tb1, tb2;
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1005,19 +1005,20 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 /* suppress any remaining jumps to this TB */
 tb1 = tb->jmp_list_first;
 for (;;) {
-n1 = (uintptr_t)tb1 & 3;
+TranslationBlock *tmp_tb;
+n1 = tb1 & 3;
 if (n1 == 2) {
 break;
 }
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
-tb2 = tb1->jmp_list_next[n1];
-tb_reset_jump(tb1, n1);
-tb1->jmp_list_next[n1] = NULL;
+tmp_tb = (TranslationBlock *)(tb1 & ~3);
+tb2 = tmp_tb->jmp_list_next[n1];
+tb_reset_jump(tmp_tb, n1);
+tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
 tb1 = tb2;
 }
 
-/* fail safe */
-tb->jmp_list_first = (TranslationBlock *)((uintptr_t)tb | 2);
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
 
 tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
@@ -1491,9 +1492,10 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 tb->page_addr[1] = -1;
 }
 
-tb->jmp_list_first = (TranslationBlock *)((uintptr_t)tb | 2);
-tb->jmp_list_next[0] = NULL;
-tb->jmp_list_next[1] = NULL;
+

[Qemu-devel] [PATCH v3 01/10] tcg: Clean up direct block chaining data fields

2016-04-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Briefly describe in a comment how direct block chaining is done. It
should help in understanding of the following data fields.

Rename some fields in TranslationBlock and TCGContext structures to
better reflect their purpose (dropping excessive 'tb_' prefix in
TranslationBlock but keeping it in TCGContext):
   tb_next_offset  =>  jmp_reset_offset
   tb_jmp_offset   =>  jmp_insn_offset
   tb_next =>  jmp_target_addr
   jmp_next=>  jmp_list_next
   jmp_first   =>  jmp_list_first

Avoid using a magic constant as an invalid offset which is used to
indicate that there's no n-th jump generated.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 include/exec/exec-all.h  | 44 --
 tcg/aarch64/tcg-target.inc.c |  7 +++---
 tcg/arm/tcg-target.inc.c |  8 +++
 tcg/i386/tcg-target.inc.c|  8 +++
 tcg/ia64/tcg-target.inc.c|  6 +++---
 tcg/mips/tcg-target.inc.c|  8 +++
 tcg/ppc/tcg-target.inc.c |  6 +++---
 tcg/s390/tcg-target.inc.c| 11 +-
 tcg/sparc/tcg-target.inc.c   |  9 
 tcg/tcg.h|  6 +++---
 tcg/tci/tcg-target.inc.c | 10 -
 translate-all.c  | 51 +++-
 12 files changed, 96 insertions(+), 78 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 736209505a68..38a149cc5e0c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -257,20 +257,32 @@ struct TranslationBlock {
 struct TranslationBlock *page_next[2];
 tb_page_addr_t page_addr[2];
 
-/* the following data are used to directly call another TB from
-   the code of this one. */
-uint16_t tb_next_offset[2]; /* offset of original jump target */
+/* The following data are used to directly call another TB from
+ * the code of this one. This can be done either by emitting direct or
+ * indirect native jump instructions. These jumps are reset so that the TB
+ * just continue its execution. The TB can be linked to another one by
+ * setting one of the jump targets (or patching the jump instruction). Only
+ * two of such jumps are supported.
+ */
+uint16_t jmp_reset_offset[2]; /* offset of original jump target */
+#define TB_JMP_RESET_OFFSET_INVALID 0x /* indicates no jump generated */
 #ifdef USE_DIRECT_JUMP
-uint16_t tb_jmp_offset[2]; /* offset of jump instruction */
+uint16_t jmp_insn_offset[2]; /* offset of native jump instruction */
 #else
-uintptr_t tb_next[2]; /* address of jump generated code */
+uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
 #endif
-/* list of TBs jumping to this one. This is a circular list using
-   the two least significant bits of the pointers to tell what is
-   the next pointer: 0 = jmp_next[0], 1 = jmp_next[1], 2 =
-   jmp_first */
-struct TranslationBlock *jmp_next[2];
-struct TranslationBlock *jmp_first;
+/* Each TB has an assosiated circular list of TBs jumping to this one.
+ * jmp_list_first points to the first TB jumping to this one.
+ * jmp_list_next is used to point to the next TB in a list.
+ * Since each TB can have two jumps, it can participate in two lists.
+ * The two least significant bits of a pointer are used to choose which
+ * data field holds a pointer to the next TB:
+ * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
+ * In other words, 0/1 tells which jump is used in the pointed TB,
+ * and 2 means that this is a pointer back to the target TB of this list.
+ */
+struct TranslationBlock *jmp_list_next[2];
+struct TranslationBlock *jmp_list_first;
 };
 
 #include "qemu/thread.h"
@@ -359,7 +371,7 @@ void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
 static inline void tb_set_jmp_target(TranslationBlock *tb,
  int n, uintptr_t addr)
 {
-uint16_t offset = tb->tb_jmp_offset[n];
+uint16_t offset = tb->jmp_insn_offset[n];
 tb_set_jmp_target1((uintptr_t)(tb->tc_ptr + offset), addr);
 }
 
@@ -369,7 +381,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 static inline void tb_set_jmp_target(TranslationBlock *tb,
  int n, uintptr_t addr)
 {
-tb->tb_next[n] = addr;
+tb->jmp_target_addr[n] = addr;
 }
 
 #endif
@@ -378,7 +390,7 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
TranslationBlock *tb_next)
 {
 /* NOTE: this test is only needed for thread safety */
-if (!tb->jmp_next[n]) {
+if (!tb->jmp_list_next[n]) {
 qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
"Linking TBs %p [" TARGET_FMT_lx
"] index %d -> %p [" TARGET_FMT_lx "]\n",
@@ -388,8 +400,8 @@ static 

[Qemu-devel] [Bug 1568589] Re: Compile for os x host failed

2016-04-10 Thread iQQator
fix: http://patchwork.ozlabs.org/patch/603499/

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1568589

Title:
  Compile for os x host failed

Status in QEMU:
  New

Bug description:
  Hello QEMU,

  I try compile qemu from git pulled by me today and have a troubles:

   GEN   trace/generated-helpers.c
CCaarch64-softmmu/trace/generated-helpers.o
LINK  aarch64-softmmu/qemu-system-aarch64
  Undefined symbols for architecture x86_64:
"_event_notifier_init_fd", referenced from:
_process_msg in ivshmem.o
  ld: symbol(s) not found for architecture x86_64
  clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
  make[1]: *** [qemu-system-aarch64] Error 1
  make: *** [subdir-aarch64-softmmu] Error 2

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1568589/+subscriptions



[Qemu-devel] [Bug 1568589] Re: Compile for os x host failed

2016-04-10 Thread iQQator
Also for i386

  LINK  i386-softmmu/qemu-system-i386
Undefined symbols for architecture x86_64:
  "_event_notifier_init_fd", referenced from:
  _process_msg in ivshmem.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [qemu-system-i386] Error 1
make: *** [subdir-i386-softmmu] Error 2

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1568589

Title:
  Compile for os x host failed

Status in QEMU:
  New

Bug description:
  Hello QEMU,

  I try compile qemu from git pulled by me today and have a troubles:

   GEN   trace/generated-helpers.c
CCaarch64-softmmu/trace/generated-helpers.o
LINK  aarch64-softmmu/qemu-system-aarch64
  Undefined symbols for architecture x86_64:
"_event_notifier_init_fd", referenced from:
_process_msg in ivshmem.o
  ld: symbol(s) not found for architecture x86_64
  clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
  make[1]: *** [qemu-system-aarch64] Error 1
  make: *** [subdir-aarch64-softmmu] Error 2

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1568589/+subscriptions



[Qemu-devel] [Bug 1568589] Re: Compile for os x host failed

2016-04-10 Thread iQQator
Host: 
# uname -a
Darwin MacBook-Pro 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 
2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

# ./configure
Configure: http://pastebin.com/L9ytLFDE

# make -v
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1568589

Title:
  Compile for os x host failed

Status in QEMU:
  New

Bug description:
  Hello QEMU,

  I try compile qemu from git pulled by me today and have a troubles:

   GEN   trace/generated-helpers.c
CCaarch64-softmmu/trace/generated-helpers.o
LINK  aarch64-softmmu/qemu-system-aarch64
  Undefined symbols for architecture x86_64:
"_event_notifier_init_fd", referenced from:
_process_msg in ivshmem.o
  ld: symbol(s) not found for architecture x86_64
  clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
  make[1]: *** [qemu-system-aarch64] Error 1
  make: *** [subdir-aarch64-softmmu] Error 2

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1568589/+subscriptions



[Qemu-devel] [Bug 1568589] [NEW] Compile for os x host failed

2016-04-10 Thread iQQator
Public bug reported:

Hello QEMU,

I try compile qemu from git pulled by me today and have a troubles:

 GEN   trace/generated-helpers.c
  CCaarch64-softmmu/trace/generated-helpers.o
  LINK  aarch64-softmmu/qemu-system-aarch64
Undefined symbols for architecture x86_64:
  "_event_notifier_init_fd", referenced from:
  _process_msg in ivshmem.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [qemu-system-aarch64] Error 1
make: *** [subdir-aarch64-softmmu] Error 2

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1568589

Title:
  Compile for os x host failed

Status in QEMU:
  New

Bug description:
  Hello QEMU,

  I try compile qemu from git pulled by me today and have a troubles:

   GEN   trace/generated-helpers.c
CCaarch64-softmmu/trace/generated-helpers.o
LINK  aarch64-softmmu/qemu-system-aarch64
  Undefined symbols for architecture x86_64:
"_event_notifier_init_fd", referenced from:
_process_msg in ivshmem.o
  ld: symbol(s) not found for architecture x86_64
  clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
  make[1]: *** [qemu-system-aarch64] Error 1
  make: *** [subdir-aarch64-softmmu] Error 2

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1568589/+subscriptions



Re: [Qemu-devel] configure qemu and compile it on windows

2016-04-10 Thread Marwa Hamza
and how can i add a package to cygwin after the installation ? there is any
command line ? , i didn't find a way to add the python package so i
downloaded from internet like the traditional way , then an other error msg
dispalyed ERROR: "cc" either does not exist or does not work
so i add this cross compiler from cygwin folder
 ./configure --target-list=arm-softmmu --python=/Python27/python
--cc=/cygwin64/bin/x86_64-w64-mingw32-gcc
and i got this :/
[image: Images intégrées 1]

2016-04-10 16:55 GMT+02:00 Stefan Weil :

> Am 10.04.2016 um 15:56 schrieb Marwa Hamza:
> > thanks stefan for your answer , i just download cygwin , mingw64 and
> > msys2 and packages u mentioned , all in c:/ , and i download qemu from
> > source in c:/msys2/ , but when i run ./configure i got this error ERROR:
> > Python not found. Use --python=/path/to/python
> > Images intégrées 1
> >
>
> You'll need package python-2.7.10-1, too.
>
> As I said, there might be some more packages needed for the QEMU build,
> but they are all available for Cygwin.
>
> Stefan
>
>


Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.

2016-04-10 Thread Michael S. Tsirkin
On Fri, Apr 01, 2016 at 04:38:28PM +0530, Jitendra Kolhe wrote:
> On 3/29/2016 5:58 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 28, 2016 at 09:46:05AM +0530, Jitendra Kolhe wrote:
> >> While measuring live migration performance for qemu/kvm guest, it
> >> was observed that the qemu doesn’t maintain any intelligence for the
> >> guest ram pages which are released by the guest balloon driver and
> >> treat such pages as any other normal guest ram pages. This has direct
> >> impact on overall migration time for the guest which has released
> >> (ballooned out) memory to the host.
> >>
> >> In case of large systems, where we can configure large guests with 1TB
> >> and with considerable amount of memory release by balloon driver to the,
> >> host the migration time gets worse.
> >>
> >> The solution proposed below is local only to qemu (and does not require
> >> any modification to Linux kernel or any guest driver). We have verified
> >> the fix for large guests =1TB on HPE Superdome X (which can support up
> >> to 240 cores and 12TB of memory) and in case where 90% of memory is
> >> released by balloon driver the migration time for an idle guests reduces
> >> to ~600 sec's from ~1200 sec’s.
> >>
> >> Detail: During live migration, as part of 1st iteration in 
> >> ram_save_iterate()
> >> -> ram_find_and_save_block () will try to migrate ram pages which are
> >> released by vitrio-balloon driver as part of dynamic memory delete.
> >> Even though the pages which are returned to the host by virtio-balloon
> >> driver are zero pages, the migration algorithm will still end up
> >> scanning the entire page ram_find_and_save_block() -> ram_save_page/
> >> ram_save_compressed_page -> save_zero_page() -> is_zero_range().  We
> >> also end-up sending some control information over network for these
> >> page during migration. This adds to total migration time.
> >>
> >> The proposed fix, uses the existing bitmap infrastructure to create
> >> a virtio-balloon bitmap. The bits in the bitmap represent a guest ram
> >> page of size 1UL<< VIRTIO_BALLOON_PFN_SHIFT. The bitmap represents
> >> entire guest ram memory till max configured memory. Guest ram pages
> >> claimed by the virtio-balloon driver will be represented by 1 in the
> >> bitmap. During live migration, each guest ram page (host VA offset)
> >> is checked against the virtio-balloon bitmap, if the bit is set the
> >> corresponding ram page will be excluded from scanning and sending
> >> control information during migration. The bitmap is also migrated to
> >> the target as part of every ram_save_iterate loop and after the
> >> guest is stopped remaining balloon bitmap is migrated as part of
> >> balloon driver save / load interface.
> > 
> > Migrating the bitmap might help a chained migration case
> > but will slow down the more typical case a bit.
> > Make it optional?
> > 
> > 
> 
> Disabling migration of balloon bitmap would disable the 
> optimization permanently for all other subsequent migrations.
> 
> The current implementation sends offsets and lengths of 1s 
> in the bitmap. In cases where we see highly fragmented 
> bitmap which may add to the bitmap migration time, but still 
> it would be less compared to total migration time.
> 
> Here are some values from my test setup for an idle guest 
> with 16GB memory ballooned down to 4GB memory (with no 
> ballooning operation in progress during migration).
> - Total migration time without proposed patch set - ~7600ms 
> - Total migration time with proposed patch set - ~5700ms
>   . Adds overhead of testing against bitmap - ~320ms
>   . Adds overhead of sending the bitmap - ~1.6ms
> - Saves zero page scan + protocol time - ~2348ms  
> We may see some higher pct for time taken to migrate bitmap
> in case ballooning is in progress during migration. 
> 
> The patch does introduces an option to enable/disable 
> testing ram_addr against balloon bitmap during migration 
> (since that’s the major overhead), but we still end-up 
> migrating the bitmap by default. 
> 
> >>
> >> With the proposed fix, the average migration time for an idle guest
> >> with 1TB maximum memory and 64vCpus
> >>  - reduces from ~1200 secs to ~600 sec, with guest memory ballooned
> >>down to 128GB (~10% of 1TB).
> >>  - reduces from ~1300 to ~1200 sec (7%), with guest memory ballooned
> >>down to 896GB (~90% of 1TB),
> >>  - with no ballooning configured, we don’t expect to see any impact
> >>on total migration time.
> >>
> >> The optimization gets temporarily disabled, if the balloon operation is
> >> in progress. Since the optimization skips scanning and migrating control
> >> information for ballooned out pages, we might skip guest ram pages in
> >> cases where the guest balloon driver has freed the ram page to the guest
> >> but not yet informed the host/qemu about the ram page
> >> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
> >> might skip migrating ram pages which the guest is using. Since this

Re: [Qemu-devel] [PATCH v4 00/16] Introduce Intel 82574 GbE Controller Emulation (e1000e)

2016-04-10 Thread Dmitry Fleytman


> On 10 Apr 2016, at 18:35, Michael S. Tsirkin  wrote:
> 
>> On Sun, Apr 10, 2016 at 06:14:49PM +0300, Dmitry Fleytman wrote:
>> From: Dmitry Fleytman 
>> 
>> Hello All,
>> 
>> This is v4 of e1000e series.
>> 
>> For convenience, the same patches are available at:
>> https://github.com/daynix/qemu-e1000e/tree/e1000e-submit-v4
>> 
>> Best regards,
>> Dmitry.
>> 
>> Changes since v3:
>> 
>> 1. Various code fixes as suggested by Jason and Michael
>> 2. Rebased to the latest master
> 
> Pls remember to repost when 2.6 is out.

Sure. When do you think 2.6 will be out?

> 
>> Changes since v2:
>> 
>> 1. Interrupt storm on latest Linux kernels fixed
>> 2. Device unit test added
>> 3. Introduced code sharing between e1000 and e1000e
>> 4. Various code fixes as suggested by Jason
>> 5. Rebased to the latest master
>> 
>> Changes since v1:
>> 
>> 1. PCI_PM_CAP_VER_1_1 is defined now in include/hw/pci/pci_regs.h and
>>   not in include/standard-headers/linux/pci_regs.h.
>> 2. Changes in naming and extra comments in hw/pci/pcie.c and in
>>   include/hw/pci/pcie.h.
>> 3. Defining pci_dsn_ver and pci_dsn_cap static const variables in
>>   hw/pci/pcie.c, instead of PCI_DSN_VER and PCI_DSN_CAP symbolic
>>   constants in include/hw/pci/pcie_regs.h.
>> 4. Changing the vmxnet3_device_serial_num function in hw/net/vmxnet3.c
>>   to avoid the cast when it is called.
>> 5. Avoiding a preceding underscore in all the e1000e-related names.
>> 6. Minor style changes.
>> 
>> ===
>> 
>> Hello All,
>> 
>> This series is the final code of the e1000e device emulation, that we
>> have developed. Please review, and consider acceptance of these patches
>> to the upstream QEMU repository.
>> 
>> The code stability was verified by various traffic tests using Fedora 22
>> Linux, and Windows Server 2012R2 guests. Also, Microsoft Hardware
>> Certification Kit (HCK) tests were run on a Windows Server 2012R2 guest.
>> 
>> There was a discussion on the possibility of code sharing between the
>> e1000e, and the existing e1000 devices. We have reviewed the final code
>> for parts that may be shared between this device and the currently
>> available e1000 emulation. The device specifications are very different,
>> and there are almost no registers, nor functions, that were left as is
>> from e1000. The ring descriptor structures were changed as well, by the
>> introduction of extended and PS descriptors, as well as additional bits.
>> 
>> Additional differences stem from the fact that the e1000e device re-uses
>> network packet abstractions introduced by the vmxnet3 device, while the
>> e1000 has its own code for packet handling. BTW, it may be worth reusing
>> those abstractions in e1000 as well. (Following these changes the
>> vmxnet3 device was successfully tested for possible regressions.)
>> 
>> There are a few minor parts that may be shared, e.g. the default
>> register handlers, and the ring management functions. The total amount
>> of shared lines will be about 100--150, so we're not sure if it makes
>> sense bothering, and taking a risk of breaking e1000, which is a good,
>> old, and stable device.
>> 
>> Currently, the e1000e code is stand alone w.r.t. e1000.
>> 
>> Please share your thoughts.
>> 
>> Thanks in advance,
>> Dmitry.
>> 
>> Changes since RFCv2:
>> 
>> 1. Device functionality verified using Microsoft Hardware Certification Test 
>> Kit (HCK)
>> 2. Introduced a number of performance improvements
>> 3. The code was cleaned, and rebased to the latest master
>> 4. Patches verified with checkpatch.pl
>> 
>> ===
>> 
>> Changes since RFCv1:
>> 
>> 1. Added support for all the device features:
>>  - Interrupt moderation.
>>  - RSS.
>>  - Multiqueue.
>> 2. Simulated exact PCI/PCIe configuration space layout.
>> 3. Made fixes needed to pass Microsoft's HW certification tests (HCK).
>> 
>> This series is still an RFC, because the following tasks are not done yet:
>> 
>> 1. See which code can be shared between this device and the existing e1000 
>> device.
>> 2. Rebase patches to the latest master (current base is v2.3.0).
>> 
>> Please share your thoughts,
>> Thanks, Dmitry.
>> 
>> ===
>> 
>> Hello qemu-devel,
>> 
>> This patch series is an RFC for the new networking device emulation
>> we're developing for QEMU.
>> 
>> This new device emulates the Intel 82574 GbE Controller and works
>> with unmodified Intel e1000e drivers from the Linux/Windows kernels.
>> 
>> The status of the current series is "Functional Device Ready, work
>> on Extended Features in Progress".
>> 
>> More precisely, these patches represent a functional device, which
>> is recognized by the standard Intel drivers, and is able to transfer
>> TX/RX packets with CSO/TSO offloads, according to the spec.
>> 
>> Extended features not supported yet (work in progress):
>>  1. TX/RX Interrupt moderation mechanisms
>>  2. RSS
>>  3. Full-featured multi-queue (use of 

Re: [Qemu-devel] [PATCH v4 00/16] Introduce Intel 82574 GbE Controller Emulation (e1000e)

2016-04-10 Thread Michael S. Tsirkin
On Sun, Apr 10, 2016 at 06:14:49PM +0300, Dmitry Fleytman wrote:
> From: Dmitry Fleytman 
> 
> Hello All,
> 
> This is v4 of e1000e series.
> 
> For convenience, the same patches are available at:
> https://github.com/daynix/qemu-e1000e/tree/e1000e-submit-v4
> 
> Best regards,
> Dmitry.
> 
> Changes since v3:
> 
> 1. Various code fixes as suggested by Jason and Michael
> 2. Rebased to the latest master

Pls remember to repost when 2.6 is out.

> Changes since v2:
> 
> 1. Interrupt storm on latest Linux kernels fixed
> 2. Device unit test added
> 3. Introduced code sharing between e1000 and e1000e
> 4. Various code fixes as suggested by Jason
> 5. Rebased to the latest master
> 
> Changes since v1:
> 
> 1. PCI_PM_CAP_VER_1_1 is defined now in include/hw/pci/pci_regs.h and
>not in include/standard-headers/linux/pci_regs.h.
> 2. Changes in naming and extra comments in hw/pci/pcie.c and in
>include/hw/pci/pcie.h.
> 3. Defining pci_dsn_ver and pci_dsn_cap static const variables in
>hw/pci/pcie.c, instead of PCI_DSN_VER and PCI_DSN_CAP symbolic
>constants in include/hw/pci/pcie_regs.h.
> 4. Changing the vmxnet3_device_serial_num function in hw/net/vmxnet3.c
>to avoid the cast when it is called.
> 5. Avoiding a preceding underscore in all the e1000e-related names.
> 6. Minor style changes.
> 
> ===
> 
> Hello All,
> 
> This series is the final code of the e1000e device emulation, that we
> have developed. Please review, and consider acceptance of these patches
> to the upstream QEMU repository.
> 
> The code stability was verified by various traffic tests using Fedora 22
> Linux, and Windows Server 2012R2 guests. Also, Microsoft Hardware
> Certification Kit (HCK) tests were run on a Windows Server 2012R2 guest.
> 
> There was a discussion on the possibility of code sharing between the
> e1000e, and the existing e1000 devices. We have reviewed the final code
> for parts that may be shared between this device and the currently
> available e1000 emulation. The device specifications are very different,
> and there are almost no registers, nor functions, that were left as is
> from e1000. The ring descriptor structures were changed as well, by the
> introduction of extended and PS descriptors, as well as additional bits.
> 
> Additional differences stem from the fact that the e1000e device re-uses
> network packet abstractions introduced by the vmxnet3 device, while the
> e1000 has its own code for packet handling. BTW, it may be worth reusing
> those abstractions in e1000 as well. (Following these changes the
> vmxnet3 device was successfully tested for possible regressions.)
> 
> There are a few minor parts that may be shared, e.g. the default
> register handlers, and the ring management functions. The total amount
> of shared lines will be about 100--150, so we're not sure if it makes
> sense bothering, and taking a risk of breaking e1000, which is a good,
> old, and stable device.
> 
> Currently, the e1000e code is stand alone w.r.t. e1000.
> 
> Please share your thoughts.
> 
> Thanks in advance,
> Dmitry.
> 
> Changes since RFCv2:
> 
> 1. Device functionality verified using Microsoft Hardware Certification Test 
> Kit (HCK)
> 2. Introduced a number of performance improvements
> 3. The code was cleaned, and rebased to the latest master
> 4. Patches verified with checkpatch.pl
> 
> ===
> 
> Changes since RFCv1:
> 
> 1. Added support for all the device features:
>   - Interrupt moderation.
>   - RSS.
>   - Multiqueue.
> 2. Simulated exact PCI/PCIe configuration space layout.
> 3. Made fixes needed to pass Microsoft's HW certification tests (HCK).
> 
> This series is still an RFC, because the following tasks are not done yet:
> 
> 1. See which code can be shared between this device and the existing e1000 
> device.
> 2. Rebase patches to the latest master (current base is v2.3.0).
> 
> Please share your thoughts,
> Thanks, Dmitry.
> 
> ===
> 
> Hello qemu-devel,
> 
> This patch series is an RFC for the new networking device emulation
> we're developing for QEMU.
> 
> This new device emulates the Intel 82574 GbE Controller and works
> with unmodified Intel e1000e drivers from the Linux/Windows kernels.
> 
> The status of the current series is "Functional Device Ready, work
> on Extended Features in Progress".
> 
> More precisely, these patches represent a functional device, which
> is recognized by the standard Intel drivers, and is able to transfer
> TX/RX packets with CSO/TSO offloads, according to the spec.
> 
> Extended features not supported yet (work in progress):
>   1. TX/RX Interrupt moderation mechanisms
>   2. RSS
>   3. Full-featured multi-queue (use of multiqueued network backend)
> 
> Also, there will be some code refactoring and performance
> optimization efforts.
> 
> This series was tested on Linux (Fedora 22) and Windows (2012R2)
> guests, using Iperf, with TX/RX and TCP/UDP streams, and various
> 

Re: [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit

2016-04-10 Thread Wei Xu



On 2016年04月05日 16:17, Michael S. Tsirkin wrote:

On Tue, Apr 05, 2016 at 10:05:17AM +0800, Jason Wang wrote:


On 04/04/2016 03:25 AM, w...@redhat.com wrote:

From: Wei Xu 

A new feature bit 'VIRTIO_NET_F_GUEST_RSC' is introduced to support WHQL
Receive-Segment-Offload test, this feature will coalesce tcp packets in
IPv4/6 for userspace virtio-net driver.

This feature can be enabled either by 'ACK' the feature when loading
the driver in the guest, or by sending the 'VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET'
command to the host via control queue.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 29 +++--
  include/standard-headers/linux/virtio_net.h |  1 +
  2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..bd91a4b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -537,6 +537,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
  virtio_clear_feature(, VIRTIO_NET_F_GUEST_TSO4);
  virtio_clear_feature(, VIRTIO_NET_F_GUEST_TSO6);
  virtio_clear_feature(, VIRTIO_NET_F_GUEST_ECN);
+virtio_clear_feature(, VIRTIO_NET_F_GUEST_RSC);

Several questions here:

- I think RSC should work even without vnet_hdr?
That's interesting, but i'm wondering how to test this? could you please 
point me out?

- Need we differentiate ipv4 and ipv6 like TSO here?

Sure, thanks.

- And looks like this patch should be squash to following patches.

OK.



  }
  
  if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {

@@ -582,7 +583,8 @@ static uint64_t 
virtio_net_guest_offloads_by_features(uint32_t features)
  (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
  (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
  (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
-(1ULL << VIRTIO_NET_F_GUEST_UFO);
+(1ULL << VIRTIO_NET_F_GUEST_UFO)  |
+(1ULL << VIRTIO_NET_F_GUEST_RSC);

Looks like this is unnecessary since we won't set peer offload based on
GUEST_RSC.
there is an exclusive check when handling set feature command from 
control queue, so looks it will broke the check if don't include this bit.


  
  return guest_offloads_mask & features;

  }
@@ -1089,7 +1091,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, int size)
  return 0;
  }
  
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)

+static ssize_t virtio_net_do_receive(NetClientState *nc,
+ const uint8_t *buf, size_t size)
  {
  VirtIONet *n = qemu_get_nic_opaque(nc);
  VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1685,6 +1688,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
QEMUFile *f,
  return 0;
  }
  
+

+static ssize_t virtio_net_rsc_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
+{
+return virtio_net_do_receive(nc, buf, size);
+}
+
+static ssize_t virtio_net_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
+{
+VirtIONet *n;
+
+n = qemu_get_nic_opaque(nc);
+if (n->curr_guest_offloads & VIRTIO_NET_F_GUEST_RSC) {
+return virtio_net_rsc_receive(nc, buf, size);
+} else {
+return virtio_net_do_receive(nc, buf, size);
+}
+}

The changes here looks odd since it did nothing. Like I've mentioned,
better merge the patch into following ones.

OK.



+
  static NetClientInfo net_virtio_info = {
  .type = NET_CLIENT_OPTIONS_KIND_NIC,
  .size = sizeof(NICState),
@@ -1909,6 +1932,8 @@ static Property virtio_net_properties[] = {
 TX_TIMER_INTERVAL),
  DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
  DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
+DEFINE_PROP_BIT("guest_rsc", VirtIONet, host_features,
+VIRTIO_NET_F_GUEST_RSC, true),

Need to compat the bit for old machine type to unbreak migration I believe?

And definitely disable by default.
There maybe some windows specific details about this, i'll discuss with 
Yan and update.



Btw, also need a patch for virtio spec.

Sure.


Thanks


  DEFINE_PROP_END_OF_LIST(),
  };
  
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h

index a78f33e..5b95762 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -55,6 +55,7 @@
  #define VIRTIO_NET_F_MQ   22  /* Device supports Receive Flow
 * Steering */
  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
+#define VIRTIO_NET_F_GUEST_RSC  24  /* Guest can coalesce tcp packets */
  
  #ifndef VIRTIO_NET_NO_LEGACY

  #define VIRTIO_NET_F_GSO  6   /* Host handles pkts w/ any GSO type */





[Qemu-devel] [PATCH v4 11/16] net_pkt: Extend packet abstraction as required by e1000e functionality

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.

Changes are:

  1. Support iovec lists for RX buffers
  2. Deeper RX packets parsing
  3. Loopback option for TX packets
  4. Extended VLAN headers handling
  5. RSS processing for RX packets

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/net_rx_pkt.c| 473 +
 hw/net/net_rx_pkt.h| 193 +++-
 hw/net/net_tx_pkt.c| 204 +
 hw/net/net_tx_pkt.h|  60 ++-
 include/net/checksum.h |   4 +-
 include/net/eth.h  | 153 +++-
 net/checksum.c |   7 +-
 net/eth.c  | 410 +-
 trace-events   |  40 +
 9 files changed, 1336 insertions(+), 208 deletions(-)

diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 8a4f29f..1019b50 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -16,24 +16,16 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "net_rx_pkt.h"
-#include "net/eth.h"
-#include "qemu-common.h"
-#include "qemu/iov.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 
-/*
- * RX packet may contain up to 2 fragments - rebuilt eth header
- * in case of VLAN tag stripping
- * and payload received from QEMU - in any case
- */
-#define NET_MAX_RX_PACKET_FRAGMENTS (2)
-
 struct NetRxPkt {
 struct virtio_net_hdr virt_hdr;
-uint8_t ehdr_buf[ETH_MAX_L2_HDR_LEN];
-struct iovec vec[NET_MAX_RX_PACKET_FRAGMENTS];
+uint8_t ehdr_buf[sizeof(struct eth_header)];
+struct iovec *vec;
+uint16_t vec_len_total;
 uint16_t vec_len;
 uint32_t tot_len;
 uint16_t tci;
@@ -46,17 +38,31 @@ struct NetRxPkt {
 bool isip6;
 bool isudp;
 bool istcp;
+
+size_t l3hdr_off;
+size_t l4hdr_off;
+size_t l5hdr_off;
+
+eth_ip6_hdr_info ip6hdr_info;
+eth_ip4_hdr_info ip4hdr_info;
+eth_l4_hdr_info  l4hdr_info;
 };
 
 void net_rx_pkt_init(struct NetRxPkt **pkt, bool has_virt_hdr)
 {
 struct NetRxPkt *p = g_malloc0(sizeof *p);
 p->has_virt_hdr = has_virt_hdr;
+p->vec = NULL;
+p->vec_len_total = 0;
 *pkt = p;
 }
 
 void net_rx_pkt_uninit(struct NetRxPkt *pkt)
 {
+if (pkt->vec_len_total != 0) {
+g_free(pkt->vec);
+}
+
 g_free(pkt);
 }
 
@@ -66,33 +72,88 @@ struct virtio_net_hdr *net_rx_pkt_get_vhdr(struct NetRxPkt 
*pkt)
 return >virt_hdr;
 }
 
-void net_rx_pkt_attach_data(struct NetRxPkt *pkt, const void *data,
-   size_t len, bool strip_vlan)
+static inline void
+net_rx_pkt_iovec_realloc(struct NetRxPkt *pkt,
+int new_iov_len)
+{
+if (pkt->vec_len_total < new_iov_len) {
+g_free(pkt->vec);
+pkt->vec = g_malloc(sizeof(*pkt->vec) * new_iov_len);
+pkt->vec_len_total = new_iov_len;
+}
+}
+
+static void
+net_rx_pkt_pull_data(struct NetRxPkt *pkt,
+const struct iovec *iov, int iovcnt,
+size_t ploff)
+{
+if (pkt->vlan_stripped) {
+net_rx_pkt_iovec_realloc(pkt, iovcnt + 1);
+
+pkt->vec[0].iov_base = pkt->ehdr_buf;
+pkt->vec[0].iov_len = sizeof(pkt->ehdr_buf);
+
+pkt->tot_len =
+iov_size(iov, iovcnt) - ploff + sizeof(struct eth_header);
+
+pkt->vec_len = iov_copy(pkt->vec + 1, pkt->vec_len_total - 1,
+iov, iovcnt, ploff, pkt->tot_len);
+} else {
+net_rx_pkt_iovec_realloc(pkt, iovcnt);
+
+pkt->tot_len = iov_size(iov, iovcnt) - ploff;
+pkt->vec_len = iov_copy(pkt->vec, pkt->vec_len_total,
+iov, iovcnt, ploff, pkt->tot_len);
+}
+
+eth_get_protocols(pkt->vec, pkt->vec_len, >isip4, >isip6,
+  >isudp, >istcp,
+  >l3hdr_off, >l4hdr_off, >l5hdr_off,
+  >ip6hdr_info, >ip4hdr_info, >l4hdr_info);
+
+trace_net_rx_pkt_parsed(pkt->isip4, pkt->isip6, pkt->isudp, pkt->istcp,
+pkt->l3hdr_off, pkt->l4hdr_off, pkt->l5hdr_off);
+}
+
+void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
+const struct iovec *iov, int iovcnt,
+size_t iovoff, bool strip_vlan)
 {
 uint16_t tci = 0;
-uint16_t ploff;
+uint16_t ploff = iovoff;
 assert(pkt);
 pkt->vlan_stripped = false;
 
 if (strip_vlan) {
-pkt->vlan_stripped = eth_strip_vlan(data, pkt->ehdr_buf, , );
+pkt->vlan_stripped = eth_strip_vlan(iov, iovcnt, iovoff, pkt->ehdr_buf,
+, );
 }
 
-if (pkt->vlan_stripped) {
-pkt->vec[0].iov_base = pkt->ehdr_buf;
-pkt->vec[0].iov_len = ploff - sizeof(struct vlan_header);
-   

[Qemu-devel] [PATCH v4 09/16] net_pkt: Name vmxnet3 packet abstractions more generic

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

This patch drops "vmx" prefix from packet abstrations names
to emphasize the fact they are generic and not tied to any
specific network device.

These abstrations will be reused by e1000e emulation implementation
introduced by following patches so their names need generalization.

This patch (except renamed files, adjusted comments and changes in MAINTAINTERS)
was produced by:

git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE 
"s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE 
"s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE 
"s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE 
"s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 MAINTAINERS|   8 +
 hw/net/Makefile.objs   |   2 +-
 hw/net/net_rx_pkt.c| 187 
 hw/net/net_rx_pkt.h| 174 +++
 hw/net/net_tx_pkt.c| 581 +
 hw/net/net_tx_pkt.h| 146 +
 hw/net/vmxnet3.c   |  88 
 hw/net/vmxnet_rx_pkt.c | 187 
 hw/net/vmxnet_rx_pkt.h | 174 ---
 hw/net/vmxnet_tx_pkt.c | 581 -
 hw/net/vmxnet_tx_pkt.h | 146 -
 tests/Makefile |   4 +-
 12 files changed, 1143 insertions(+), 1135 deletions(-)
 create mode 100644 hw/net/net_rx_pkt.c
 create mode 100644 hw/net/net_rx_pkt.h
 create mode 100644 hw/net/net_tx_pkt.c
 create mode 100644 hw/net/net_tx_pkt.h
 delete mode 100644 hw/net/vmxnet_rx_pkt.c
 delete mode 100644 hw/net/vmxnet_rx_pkt.h
 delete mode 100644 hw/net/vmxnet_tx_pkt.c
 delete mode 100644 hw/net/vmxnet_tx_pkt.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9277fbf..445c89e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -953,6 +953,14 @@ S: Maintained
 F: hw/*/xilinx_*
 F: include/hw/xilinx.h
 
+Network packet abstractions
+M: Dmitry Fleytman 
+S: Maintained
+F: include/net/eth.h
+F: net/eth.c
+F: hw/net/net_rx_pkt*
+F: hw/net/net_tx_pkt*
+
 Vmware
 M: Dmitry Fleytman 
 S: Maintained
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 64d0449..527d264 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -8,7 +8,7 @@ common-obj-$(CONFIG_PCNET_PCI) += pcnet-pci.o
 common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
 common-obj-$(CONFIG_E1000_PCI) += e1000.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
-common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet_tx_pkt.o vmxnet_rx_pkt.o
+common-obj-$(CONFIG_VMXNET3_PCI) += net_tx_pkt.o net_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
 
 common-obj-$(CONFIG_SMC91C111) += smc91c111.o
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
new file mode 100644
index 000..8a4f29f
--- /dev/null
+++ b/hw/net/net_rx_pkt.c
@@ -0,0 +1,187 @@
+/*
+ * QEMU RX packets abstractions
+ *
+ * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Tamir Shomer 
+ * Yan Vugenfirer 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "net_rx_pkt.h"
+#include "net/eth.h"
+#include "qemu-common.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+#include "net/tap.h"
+
+/*
+ * RX packet may contain up to 2 fragments - rebuilt eth header
+ * in case of VLAN tag stripping
+ * and payload received from QEMU - in any case
+ */
+#define NET_MAX_RX_PACKET_FRAGMENTS (2)
+
+struct NetRxPkt {
+struct virtio_net_hdr virt_hdr;
+uint8_t ehdr_buf[ETH_MAX_L2_HDR_LEN];
+struct iovec vec[NET_MAX_RX_PACKET_FRAGMENTS];
+uint16_t vec_len;
+uint32_t tot_len;
+uint16_t tci;
+bool vlan_stripped;
+bool has_virt_hdr;
+eth_pkt_types_e packet_type;
+
+/* Analysis results */
+bool isip4;
+bool isip6;
+bool isudp;
+bool istcp;
+};
+
+void net_rx_pkt_init(struct NetRxPkt **pkt, bool has_virt_hdr)
+{
+struct NetRxPkt *p = g_malloc0(sizeof *p);
+p->has_virt_hdr = has_virt_hdr;
+*pkt = p;
+}
+
+void net_rx_pkt_uninit(struct NetRxPkt *pkt)
+{
+g_free(pkt);
+}
+
+struct virtio_net_hdr *net_rx_pkt_get_vhdr(struct NetRxPkt *pkt)
+{
+assert(pkt);
+return >virt_hdr;
+}
+
+void net_rx_pkt_attach_data(struct NetRxPkt *pkt, const void *data,
+   size_t len, bool strip_vlan)

[Qemu-devel] [PATCH v4 12/16] vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

To make this device and network packets
abstractions ready for IOMMU.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/net_tx_pkt.c | 16 +++-
 hw/net/net_tx_pkt.h |  5 +++--
 hw/net/vmxnet3.c| 51 ++-
 3 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index ad2258c..dbcbe23 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -20,6 +20,7 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "net/net.h"
+#include "hw/pci/pci.h"
 
 enum {
 NET_TX_PKT_VHDR_FRAG = 0,
@@ -30,6 +31,8 @@ enum {
 
 /* TX packet private context */
 struct NetTxPkt {
+PCIDevice *pci_dev;
+
 struct virtio_net_hdr virt_hdr;
 bool has_virt_hdr;
 
@@ -54,11 +57,13 @@ struct NetTxPkt {
 bool is_loopback;
 };
 
-void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags,
-bool has_virt_hdr)
+void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
+uint32_t max_frags, bool has_virt_hdr)
 {
 struct NetTxPkt *p = g_malloc0(sizeof *p);
 
+p->pci_dev = pci_dev;
+
 p->vec = g_malloc((sizeof *p->vec) *
 (max_frags + NET_TX_PKT_PL_START_FRAG));
 
@@ -383,7 +388,8 @@ bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, 
hwaddr pa,
 ventry = >raw[pkt->raw_frags];
 mapped_len = len;
 
-ventry->iov_base = cpu_physical_memory_map(pa, _len, false);
+ventry->iov_base = pci_dma_map(pkt->pci_dev, pa,
+   _len, DMA_DIRECTION_TO_DEVICE);
 
 if ((ventry->iov_base != NULL) && (len == mapped_len)) {
 ventry->iov_len = mapped_len;
@@ -444,8 +450,8 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt)
 assert(pkt->raw);
 for (i = 0; i < pkt->raw_frags; i++) {
 assert(pkt->raw[i].iov_base);
-cpu_physical_memory_unmap(pkt->raw[i].iov_base, pkt->raw[i].iov_len,
-  false, pkt->raw[i].iov_len);
+pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base, pkt->raw[i].iov_len,
+  DMA_DIRECTION_TO_DEVICE, 0);
 }
 pkt->raw_frags = 0;
 
diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index e49772d..07b9a20 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -31,11 +31,12 @@ struct NetTxPkt;
  * Init function for tx packet functionality
  *
  * @pkt:packet pointer
+ * @pci_dev:PCI device processing this packet
  * @max_frags:  max tx ip fragments
  * @has_virt_hdr:   device uses virtio header.
  */
-void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags,
-bool has_virt_hdr);
+void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
+uint32_t max_frags, bool has_virt_hdr);
 
 /**
  * Clean all tx packet resources.
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 378a2eb..bbd5447 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -802,7 +802,9 @@ vmxnet3_pop_rxc_descr(VMXNET3State *s, int qidx, uint32_t 
*descr_gen)
 hwaddr daddr =
 vmxnet3_ring_curr_cell_pa(>rxq_descr[qidx].comp_ring);
 
-cpu_physical_memory_read(daddr, , sizeof(struct Vmxnet3_RxCompDesc));
+pci_dma_read(PCI_DEVICE(s), daddr,
+ , sizeof(struct Vmxnet3_RxCompDesc));
+
 ring_gen = vmxnet3_ring_curr_gen(>rxq_descr[qidx].comp_ring);
 
 if (rxcd.gen != ring_gen) {
@@ -1023,10 +1025,11 @@ nocsum:
 }
 
 static void
-vmxnet3_physical_memory_writev(const struct iovec *iov,
-   size_t start_iov_off,
-   hwaddr target_addr,
-   size_t bytes_to_copy)
+vmxnet3_pci_dma_writev(PCIDevice *pci_dev,
+   const struct iovec *iov,
+   size_t start_iov_off,
+   hwaddr target_addr,
+   size_t bytes_to_copy)
 {
 size_t curr_off = 0;
 size_t copied = 0;
@@ -1036,9 +1039,9 @@ vmxnet3_physical_memory_writev(const struct iovec *iov,
 size_t chunk_len =
 MIN((curr_off + iov->iov_len) - start_iov_off, bytes_to_copy);
 
-cpu_physical_memory_write(target_addr + copied,
-  iov->iov_base + start_iov_off - curr_off,
-  chunk_len);
+pci_dma_write(pci_dev, target_addr + copied,
+  iov->iov_base + start_iov_off - curr_off,
+  chunk_len);
 
 copied += chunk_len;
 start_iov_off += chunk_len;
@@ -1088,15 +1091,15 @@ vmxnet3_indicate_packet(VMXNET3State *s)
 }
 
 chunk_size = MIN(bytes_left, rxd.len);
-vmxnet3_physical_memory_writev(data, bytes_copied,
-   le64_to_cpu(rxd.addr), chunk_size);
+

[Qemu-devel] [PATCH v4 10/16] rtl8139: Move more TCP definitions to common header

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/rtl8139.c  | 5 -
 include/net/eth.h | 8 
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 1e5ec14..562c1fd 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -1867,11 +1867,6 @@ static int rtl8139_transmit_one(RTL8139State *s, int 
descriptor)
 return 1;
 }
 
-/* structures and macros for task offloading */
-#define TCP_HEADER_DATA_OFFSET(tcp) (((be16_to_cpu(tcp->th_offset_flags) >> 
12)&0xf) << 2)
-#define TCP_FLAGS_ONLY(flags) ((flags)&0x3f)
-#define TCP_HEADER_FLAGS(tcp) TCP_FLAGS_ONLY(be16_to_cpu(tcp->th_offset_flags))
-
 #define TCP_HEADER_CLEAR_FLAGS(tcp, off) ((tcp)->th_offset_flags &= 
cpu_to_be16(~TCP_FLAGS_ONLY(off)))
 
 /* produces ones' complement sum of data */
diff --git a/include/net/eth.h b/include/net/eth.h
index 18d0be3..5a32259 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -67,6 +67,14 @@ typedef struct tcp_header {
 uint16_t th_urp;/* urgent pointer */
 } tcp_header;
 
+#define TCP_FLAGS_ONLY(flags) ((flags) & 0x3f)
+
+#define TCP_HEADER_FLAGS(tcp) \
+TCP_FLAGS_ONLY(be16_to_cpu((tcp)->th_offset_flags))
+
+#define TCP_HEADER_DATA_OFFSET(tcp) \
+(((be16_to_cpu((tcp)->th_offset_flags) >> 12) & 0xf) << 2)
+
 typedef struct udp_header {
 uint16_t uh_sport; /* source port */
 uint16_t uh_dport; /* destination port */
-- 
2.5.0




[Qemu-devel] [PATCH v4 13/16] e1000_regs: Add definitions for Intel 82574-specific bits

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/e1000_regs.h | 345 +++-
 1 file changed, 342 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000_regs.h b/hw/net/e1000_regs.h
index 1c40244..d62b3fa 100644
--- a/hw/net/e1000_regs.h
+++ b/hw/net/e1000_regs.h
@@ -85,6 +85,7 @@
 #define E1000_DEV_ID_82573E  0x108B
 #define E1000_DEV_ID_82573E_IAMT 0x108C
 #define E1000_DEV_ID_82573L  0x109A
+#define E1000_DEV_ID_82574L  0x10D3
 #define E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3 0x10B5
 #define E1000_DEV_ID_80003ES2LAN_COPPER_DPT 0x1096
 #define E1000_DEV_ID_80003ES2LAN_SERDES_DPT 0x1098
@@ -104,6 +105,7 @@
 #define E1000_PHY_ID2_82544x 0xC30
 #define E1000_PHY_ID2_8254xx_DEFAULT 0xC20 /* 82540x, 82545x, and 82546x */
 #define E1000_PHY_ID2_82573x 0xCC0
+#define E1000_PHY_ID2_82574x 0xCB1
 
 /* Register Set. (82543, 82544)
  *
@@ -135,8 +137,11 @@
 #define E1000_ITR  0x000C4  /* Interrupt Throttling Rate - RW */
 #define E1000_ICS  0x000C8  /* Interrupt Cause Set - WO */
 #define E1000_IMS  0x000D0  /* Interrupt Mask Set - RW */
+#define E1000_EIAC 0x000DC  /* Ext. Interrupt Auto Clear - RW */
 #define E1000_IMC  0x000D8  /* Interrupt Mask Clear - WO */
 #define E1000_IAM  0x000E0  /* Interrupt Acknowledge Auto Mask */
+#define E1000_IVAR 0x000E4  /* Interrupt Vector Allocation Register - RW */
+#define E1000_EITR 0x000E8  /* Extended Interrupt Throttling Rate - RW */
 #define E1000_RCTL 0x00100  /* RX Control - RW */
 #define E1000_RDTR10x02820  /* RX Delay Timer (1) - RW */
 #define E1000_RDBAL1   0x02900  /* RX Descriptor Base Address Low (1) - RW */
@@ -145,6 +150,7 @@
 #define E1000_RDH1 0x02910  /* RX Descriptor Head (1) - RW */
 #define E1000_RDT1 0x02918  /* RX Descriptor Tail (1) - RW */
 #define E1000_FCTTV0x00170  /* Flow Control Transmit Timer Value - RW */
+#define E1000_FCRTV0x05F40  /* Flow Control Refresh Timer Value - RW */
 #define E1000_TXCW 0x00178  /* TX Configuration Word - RW */
 #define E1000_RXCW 0x00180  /* RX Configuration Word - RO */
 #define E1000_TCTL 0x00400  /* TX Control - RW */
@@ -161,6 +167,10 @@
 #define E1000_PBM  0x1  /* Packet Buffer Memory - RW */
 #define E1000_PBS  0x01008  /* Packet Buffer Size - RW */
 #define E1000_EEMNGCTL 0x01010  /* MNG EEprom Control */
+#define E1000_EEMNGDATA0x01014 /* MNG EEPROM Read/Write data */
+#define E1000_FLMNGCTL 0x01018 /* MNG Flash Control */
+#define E1000_FLMNGDATA0x0101C /* MNG FLASH Read data */
+#define E1000_FLMNGCNT 0x01020 /* MNG FLASH Read Counter */
 #define E1000_FLASH_UPDATES 1000
 #define E1000_EEARBC   0x01024  /* EEPROM Auto Read Bus Control */
 #define E1000_FLASHT   0x01028  /* FLASH Timer Register */
@@ -169,9 +179,12 @@
 #define E1000_FLSWDATA 0x01034  /* FLASH data register */
 #define E1000_FLSWCNT  0x01038  /* FLASH Access Counter */
 #define E1000_FLOP 0x0103C  /* FLASH Opcode Register */
+#define E1000_FLOL 0x01050  /* FEEP Auto Load */
 #define E1000_ERT  0x02008  /* Early Rx Threshold - RW */
 #define E1000_FCRTL0x02160  /* Flow Control Receive Threshold Low - RW */
+#define E1000_FCRTL_A  0x00168  /* Alias to FCRTL */
 #define E1000_FCRTH0x02168  /* Flow Control Receive Threshold High - RW */
+#define E1000_FCRTH_A  0x00160  /* Alias to FCRTH */
 #define E1000_PSRCTL   0x02170  /* Packet Split Receive Control - RW */
 #define E1000_RDBAL0x02800  /* RX Descriptor Base Address Low - RW */
 #define E1000_RDBAH0x02804  /* RX Descriptor Base Address High - RW */
@@ -179,11 +192,17 @@
 #define E1000_RDH  0x02810  /* RX Descriptor Head - RW */
 #define E1000_RDT  0x02818  /* RX Descriptor Tail - RW */
 #define E1000_RDTR 0x02820  /* RX Delay Timer - RW */
+#define E1000_RDTR_A   0x00108  /* Alias to RDTR */
 #define E1000_RDBAL0   E1000_RDBAL /* RX Desc Base Address Low (0) - RW */
+#define E1000_RDBAL0_A 0x00110 /* Alias to RDBAL0 */
 #define E1000_RDBAH0   E1000_RDBAH /* RX Desc Base Address High (0) - RW */
+#define E1000_RDBAH0_A 0x00114 /* Alias to RDBAH0 */
 #define E1000_RDLEN0   E1000_RDLEN /* RX Desc Length (0) - RW */
+#define E1000_RDLEN0_A 0x00118 /* Alias to RDLEN0 */
 #define E1000_RDH0 E1000_RDH   /* RX Desc Head (0) - RW */
+#define E1000_RDH0_A   0x00120 /* Alias to RDH0 */
 #define E1000_RDT0 E1000_RDT   /* RX Desc Tail (0) - RW */
+#define E1000_RDT0_A   0x00128 /* Alias to RDT0 */
 #define E1000_RDTR0E1000_RDTR  /* RX Delay Timer (0) - RW */
 #define E1000_RXDCTL   0x02828  /* RX Descriptor Control queue 0 - RW */
 #define E1000_RXDCTL1  0x02928  /* RX Descriptor Control queue 1 - RW */
@@ -192,22 +211,33 @@
 #define E1000_RAID 0x02C08  /* Receive Ack Interrupt Delay - RW */

[Qemu-devel] [PATCH v4 07/16] net: Add macros for MAC address tracing

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

These macros will be used by future commits introducing
e1000e device emulation and by vmxnet3 tracing code.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 include/net/net.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..129d46b 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -9,6 +9,11 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 
+#define MAC_FMT "%02X:%02X:%02X:%02X:%02X:%02X"
+#define MAC_ARG(x) ((uint8_t *)(x))[0], ((uint8_t *)(x))[1], \
+   ((uint8_t *)(x))[2], ((uint8_t *)(x))[3], \
+   ((uint8_t *)(x))[4], ((uint8_t *)(x))[5]
+
 #define MAX_QUEUE_NUM 1024
 
 /* Maximum GSO packet size (64k) plus plenty of room for
-- 
2.5.0




[Qemu-devel] [PATCH v4 08/16] vmxnet3: Use common MAC address tracing macros

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/vmxnet3.c  | 8 
 hw/net/vmxnet_debug.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 0a4db4d..26f6f90 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -474,7 +474,7 @@ static void vmxnet3_set_variable_mac(VMXNET3State *s, 
uint32_t h, uint32_t l)
 s->conf.macaddr.a[4] = VMXNET3_GET_BYTE(h, 0);
 s->conf.macaddr.a[5] = VMXNET3_GET_BYTE(h, 1);
 
-VMW_CFPRN("Variable MAC: " VMXNET_MF, VMXNET_MA(s->conf.macaddr.a));
+VMW_CFPRN("Variable MAC: " MAC_FMT, MAC_ARG(s->conf.macaddr.a));
 
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
@@ -1219,7 +1219,7 @@ static void vmxnet3_reset_interrupt_states(VMXNET3State 
*s)
 static void vmxnet3_reset_mac(VMXNET3State *s)
 {
 memcpy(>conf.macaddr.a, >perm_mac.a, sizeof(s->perm_mac.a));
-VMW_CFPRN("MAC address set to: " VMXNET_MF, VMXNET_MA(s->conf.macaddr.a));
+VMW_CFPRN("MAC address set to: " MAC_FMT, MAC_ARG(s->conf.macaddr.a));
 }
 
 static void vmxnet3_deactivate_device(VMXNET3State *s)
@@ -1301,7 +1301,7 @@ static void vmxnet3_update_mcast_filters(VMXNET3State *s)
 cpu_physical_memory_read(mcast_list_pa, s->mcast_list, list_bytes);
 VMW_CFPRN("Current multicast list len is %d:", s->mcast_list_len);
 for (i = 0; i < s->mcast_list_len; i++) {
-VMW_CFPRN("\t" VMXNET_MF, VMXNET_MA(s->mcast_list[i].a));
+VMW_CFPRN("\t" MAC_FMT, MAC_ARG(s->mcast_list[i].a));
 }
 }
 }
@@ -2102,7 +2102,7 @@ static void vmxnet3_net_init(VMXNET3State *s)
 
 s->link_status_and_speed = VMXNET3_LINK_SPEED | VMXNET3_LINK_STATUS_UP;
 
-VMW_CFPRN("Permanent MAC: " VMXNET_MF, VMXNET_MA(s->perm_mac.a));
+VMW_CFPRN("Permanent MAC: " MAC_FMT, MAC_ARG(s->perm_mac.a));
 
 s->nic = qemu_new_nic(_vmxnet3_info, >conf,
   object_get_typename(OBJECT(s)),
diff --git a/hw/net/vmxnet_debug.h b/hw/net/vmxnet_debug.h
index 96495db..5aab00b 100644
--- a/hw/net/vmxnet_debug.h
+++ b/hw/net/vmxnet_debug.h
@@ -142,7 +142,4 @@
 } \
 } while (0)
 
-#define VMXNET_MF   "%02X:%02X:%02X:%02X:%02X:%02X"
-#define VMXNET_MA(a)(a)[0], (a)[1], (a)[2], (a)[3], (a)[4], (a)[5]
-
 #endif /* _QEMU_VMXNET3_DEBUG_H  */
-- 
2.5.0




[Qemu-devel] [PATCH v4 04/16] pcie: Introduce function for DSN capability creation

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/pci/pcie.c | 10 ++
 include/hw/pci/pcie.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 24cfc3b..9599fde 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -695,3 +695,13 @@ void pcie_ari_init(PCIDevice *dev, uint16_t offset, 
uint16_t nextfn)
 offset, PCI_ARI_SIZEOF);
 pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & 0xff) << 8);
 }
+
+void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num)
+{
+static const int pci_dsn_ver = 1;
+static const int pci_dsn_cap = 4;
+
+pcie_add_capability(dev, PCI_EXT_CAP_ID_DSN, pci_dsn_ver, offset,
+PCI_EXT_CAP_DSN_SIZEOF);
+pci_set_quad(dev->config + offset + pci_dsn_cap, ser_num);
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index cbbf0c5..056d25e 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -119,6 +119,7 @@ void pcie_add_capability(PCIDevice *dev,
  uint16_t offset, uint16_t size);
 
 void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
 
 extern const VMStateDescription vmstate_pcie_device;
 
-- 
2.5.0




[Qemu-devel] [PATCH v4 16/16] e1000e: Introduce qtest for e1000e device

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 tests/Makefile  |   3 +
 tests/e1000e-test.c | 480 
 2 files changed, 483 insertions(+)
 create mode 100644 tests/e1000e-test.c

diff --git a/tests/Makefile b/tests/Makefile
index c91bb2a..2ce1ac3 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -138,6 +138,8 @@ gcov-files-virtio-y += $(gcov-files-virtioserial-y)
 
 check-qtest-pci-y += tests/e1000-test$(EXESUF)
 gcov-files-pci-y += hw/net/e1000.c
+check-qtest-pci-y += tests/e1000e-test$(EXESUF)
+gcov-files-pci-y += hw/net/e1000e.c hw/net/e1000e_core.c
 check-qtest-pci-y += tests/rtl8139-test$(EXESUF)
 gcov-files-pci-y += hw/net/rtl8139.c
 check-qtest-pci-y += tests/pcnet-test$(EXESUF)
@@ -544,6 +546,7 @@ tests/i440fx-test$(EXESUF): tests/i440fx-test.o 
$(libqos-pc-obj-y)
 tests/q35-test$(EXESUF): tests/q35-test.o $(libqos-pc-obj-y)
 tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y)
 tests/e1000-test$(EXESUF): tests/e1000-test.o
+tests/e1000e-test$(EXESUF): tests/e1000e-test.o $(libqos-pc-obj-y)
 tests/rtl8139-test$(EXESUF): tests/rtl8139-test.o $(libqos-pc-obj-y)
 tests/pcnet-test$(EXESUF): tests/pcnet-test.o
 tests/eepro100-test$(EXESUF): tests/eepro100-test.o
diff --git a/tests/e1000e-test.c b/tests/e1000e-test.c
new file mode 100644
index 000..d6e6311
--- /dev/null
+++ b/tests/e1000e-test.c
@@ -0,0 +1,480 @@
+ /*
+ * QTest testcase for e1000e NIC
+ *
+ * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Leonid Bloch 
+ * Yan Vugenfirer 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+
+#include "qemu/osdep.h"
+#include 
+#include "libqtest.h"
+#include "qemu-common.h"
+#include "libqos/pci-pc.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/bitops.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
+
+#define E1000E_IMS  (0x00d0)
+
+#define E1000E_STATUS   (0x0008)
+#define E1000E_STATUS_LU BIT(1)
+#define E1000E_STATUS_ASDV1000 BIT(9)
+
+#define E1000E_CTRL (0x)
+#define E1000E_CTRL_RESET BIT(26)
+
+#define E1000E_RCTL (0x0100)
+#define E1000E_RCTL_EN  BIT(1)
+#define E1000E_RCTL_UPE BIT(3)
+#define E1000E_RCTL_MPE BIT(4)
+
+#define E1000E_RFCTL (0x5008)
+#define E1000E_RFCTL_EXTEN  BIT(15)
+
+#define E1000E_TCTL (0x0400)
+#define E1000E_TCTL_EN  BIT(1)
+
+#define E1000E_CTRL_EXT (0x0018)
+#define E1000E_CTRL_EXT_DRV_LOADBIT(28)
+#define E1000E_CTRL_EXT_TXLSFLOWBIT(22)
+
+#define E1000E_RX0_MSG_ID   (0)
+#define E1000E_TX0_MSG_ID   (1)
+#define E1000E_OTHER_MSG_ID (2)
+
+#define E1000E_IVAR (0x00E4)
+#define E1000E_IVAR_TEST_CFG((E1000E_RX0_MSG_ID << 0)| BIT(3)  | \
+ (E1000E_TX0_MSG_ID << 8)| BIT(11) | \
+ (E1000E_OTHER_MSG_ID << 16) | BIT(19) | \
+ BIT(31))
+
+#define E1000E_RING_LEN (0x1000)
+#define E1000E_TXD_LEN  (16)
+#define E1000E_RXD_LEN  (16)
+
+#define E1000E_TDBAL(0x3800)
+#define E1000E_TDBAH(0x3804)
+#define E1000E_TDLEN(0x3808)
+#define E1000E_TDH  (0x3810)
+#define E1000E_TDT  (0x3818)
+
+#define E1000E_RDBAL(0x2800)
+#define E1000E_RDBAH(0x2804)
+#define E1000E_RDLEN(0x2808)
+#define E1000E_RDH  (0x2810)
+#define E1000E_RDT  (0x2818)
+
+typedef struct {
+QPCIDevice *pci_dev;
+void *mac_regs;
+
+uint64_t tx_ring;
+uint64_t rx_ring;
+} e1000e_device;
+
+static int test_sockets[2];
+static QGuestAllocator *test_alloc;
+static QPCIBus *test_bus;
+
+static void e1000e_pci_foreach_callback(QPCIDevice *dev, int devfn, void *data)
+{
+*(QPCIDevice **) data = dev;
+}
+
+static QPCIDevice *e1000e_device_find(QPCIBus *bus)
+{
+static const int e1000e_vendor_id = 0x8086;
+static const int e1000e_dev_id = 0x10D3;
+
+QPCIDevice *e1000e_dev = NULL;
+
+qpci_device_foreach(bus, e1000e_vendor_id, 

[Qemu-devel] [PATCH v4 03/16] pcie: Add support for PCIe CAP v1

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Added support for PCIe CAP v1, while reusing some of the existing v2
infrastructure.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/pci/pcie.c  | 84 --
 include/hw/pci/pcie.h  |  4 +++
 include/hw/pci/pcie_regs.h |  5 +--
 3 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 728386a..24cfc3b 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -43,26 +43,15 @@
 /***
  * pci express capability helper functions
  */
-int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
-{
-int pos;
-uint8_t *exp_cap;
-
-assert(pci_is_express(dev));
-
-pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
- PCI_EXP_VER2_SIZEOF);
-if (pos < 0) {
-return pos;
-}
-dev->exp.exp_cap = pos;
-exp_cap = dev->config + pos;
 
+static void
+pcie_cap_v1_fill(uint8_t *exp_cap, uint8_t port, uint8_t type, uint8_t version)
+{
 /* capability register
-   interrupt message number defaults to 0 */
+interrupt message number defaults to 0 */
 pci_set_word(exp_cap + PCI_EXP_FLAGS,
  ((type << PCI_EXP_FLAGS_TYPE_SHIFT) & PCI_EXP_FLAGS_TYPE) |
- PCI_EXP_FLAGS_VER2);
+ version);
 
 /* device capability register
  * table 7-12:
@@ -81,7 +70,27 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t 
type, uint8_t port)
 
 pci_set_word(exp_cap + PCI_EXP_LNKSTA,
  PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25 |PCI_EXP_LNKSTA_DLLLA);
+}
+
+int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
+{
+/* PCIe cap v2 init */
+int pos;
+uint8_t *exp_cap;
+
+assert(pci_is_express(dev));
+
+pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, PCI_EXP_VER2_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+dev->exp.exp_cap = pos;
+exp_cap = dev->config + pos;
+
+/* Filling values common with v1 */
+pcie_cap_v1_fill(exp_cap, port, type, PCI_EXP_FLAGS_VER2);
 
+/* Filling v2 specific values */
 pci_set_long(exp_cap + PCI_EXP_DEVCAP2,
  PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
 
@@ -89,7 +98,29 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t 
type, uint8_t port)
 return pos;
 }
 
-int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
+int pcie_cap_v1_init(PCIDevice *dev, uint8_t offset, uint8_t type,
+ uint8_t port)
+{
+/* PCIe cap v1 init */
+int pos;
+uint8_t *exp_cap;
+
+assert(pci_is_express(dev));
+
+pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, PCI_EXP_VER1_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+dev->exp.exp_cap = pos;
+exp_cap = dev->config + pos;
+
+pcie_cap_v1_fill(exp_cap, port, type, PCI_EXP_FLAGS_VER1);
+
+return pos;
+}
+
+static int
+pcie_endpoint_cap_common_init(PCIDevice *dev, uint8_t offset, uint8_t cap_size)
 {
 uint8_t type = PCI_EXP_TYPE_ENDPOINT;
 
@@ -102,7 +133,19 @@ int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
 type = PCI_EXP_TYPE_RC_END;
 }
 
-return pcie_cap_init(dev, offset, type, 0);
+return (cap_size == PCI_EXP_VER1_SIZEOF)
+? pcie_cap_v1_init(dev, offset, type, 0)
+: pcie_cap_init(dev, offset, type, 0);
+}
+
+int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
+{
+return pcie_endpoint_cap_common_init(dev, offset, PCI_EXP_VER2_SIZEOF);
+}
+
+int pcie_endpoint_cap_v1_init(PCIDevice *dev, uint8_t offset)
+{
+return pcie_endpoint_cap_common_init(dev, offset, PCI_EXP_VER1_SIZEOF);
 }
 
 void pcie_cap_exit(PCIDevice *dev)
@@ -110,6 +153,11 @@ void pcie_cap_exit(PCIDevice *dev)
 pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER2_SIZEOF);
 }
 
+void pcie_cap_v1_exit(PCIDevice *dev)
+{
+pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER1_SIZEOF);
+}
+
 uint8_t pcie_cap_get_type(const PCIDevice *dev)
 {
 uint32_t pos = dev->exp.exp_cap;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b48a7a2..cbbf0c5 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -80,8 +80,12 @@ struct PCIExpressDevice {
 
 /* PCI express capability helper functions */
 int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port);
+int pcie_cap_v1_init(PCIDevice *dev, uint8_t offset,
+ uint8_t type, uint8_t port);
 int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset);
 void pcie_cap_exit(PCIDevice *dev);
+int pcie_endpoint_cap_v1_init(PCIDevice *dev, uint8_t offset);
+void pcie_cap_v1_exit(PCIDevice *dev);
 uint8_t pcie_cap_get_type(const PCIDevice *dev);
 void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t 

[Qemu-devel] [PATCH v4 06/16] net: Introduce Toeplitz hash calculator

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 include/net/checksum.h | 45 +
 1 file changed, 45 insertions(+)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 7de1acb..dd8b4f6 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -18,6 +18,7 @@
 #ifndef QEMU_NET_CHECKSUM_H
 #define QEMU_NET_CHECKSUM_H
 
+#include "qemu/bswap.h"
 struct iovec;
 
 uint32_t net_checksum_add_cont(int len, uint8_t *buf, int seq);
@@ -50,4 +51,48 @@ uint32_t net_checksum_add_iov(const struct iovec *iov,
   const unsigned int iov_cnt,
   uint32_t iov_off, uint32_t size);
 
+typedef struct toeplitz_key_st {
+uint32_t leftmost_32_bits;
+uint8_t *next_byte;
+} net_toeplitz_key;
+
+static inline
+void net_toeplitz_key_init(net_toeplitz_key *key, uint8_t *key_bytes)
+{
+key->leftmost_32_bits = be32_to_cpu(*(uint32_t *)key_bytes);
+key->next_byte = key_bytes + sizeof(uint32_t);
+}
+
+static inline
+void net_toeplitz_add(uint32_t *result,
+  uint8_t *input,
+  uint32_t len,
+  net_toeplitz_key *key)
+{
+register uint32_t accumulator = *result;
+register uint32_t leftmost_32_bits = key->leftmost_32_bits;
+register uint32_t byte;
+
+for (byte = 0; byte < len; byte++) {
+register uint8_t input_byte = input[byte];
+register uint8_t key_byte = *(key->next_byte++);
+register uint8_t bit;
+
+for (bit = 0; bit < 8; bit++) {
+if (input_byte & (1 << 7)) {
+accumulator ^= leftmost_32_bits;
+}
+
+leftmost_32_bits =
+(leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
+
+input_byte <<= 1;
+key_byte <<= 1;
+}
+}
+
+key->leftmost_32_bits = leftmost_32_bits;
+*result = accumulator;
+}
+
 #endif /* QEMU_NET_CHECKSUM_H */
-- 
2.5.0




[Qemu-devel] [PATCH v4 00/16] Introduce Intel 82574 GbE Controller Emulation (e1000e)

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Hello All,

This is v4 of e1000e series.

For convenience, the same patches are available at:
https://github.com/daynix/qemu-e1000e/tree/e1000e-submit-v4

Best regards,
Dmitry.

Changes since v3:

1. Various code fixes as suggested by Jason and Michael
2. Rebased to the latest master

Changes since v2:

1. Interrupt storm on latest Linux kernels fixed
2. Device unit test added
3. Introduced code sharing between e1000 and e1000e
4. Various code fixes as suggested by Jason
5. Rebased to the latest master

Changes since v1:

1. PCI_PM_CAP_VER_1_1 is defined now in include/hw/pci/pci_regs.h and
   not in include/standard-headers/linux/pci_regs.h.
2. Changes in naming and extra comments in hw/pci/pcie.c and in
   include/hw/pci/pcie.h.
3. Defining pci_dsn_ver and pci_dsn_cap static const variables in
   hw/pci/pcie.c, instead of PCI_DSN_VER and PCI_DSN_CAP symbolic
   constants in include/hw/pci/pcie_regs.h.
4. Changing the vmxnet3_device_serial_num function in hw/net/vmxnet3.c
   to avoid the cast when it is called.
5. Avoiding a preceding underscore in all the e1000e-related names.
6. Minor style changes.

===

Hello All,

This series is the final code of the e1000e device emulation, that we
have developed. Please review, and consider acceptance of these patches
to the upstream QEMU repository.

The code stability was verified by various traffic tests using Fedora 22
Linux, and Windows Server 2012R2 guests. Also, Microsoft Hardware
Certification Kit (HCK) tests were run on a Windows Server 2012R2 guest.

There was a discussion on the possibility of code sharing between the
e1000e, and the existing e1000 devices. We have reviewed the final code
for parts that may be shared between this device and the currently
available e1000 emulation. The device specifications are very different,
and there are almost no registers, nor functions, that were left as is
from e1000. The ring descriptor structures were changed as well, by the
introduction of extended and PS descriptors, as well as additional bits.

Additional differences stem from the fact that the e1000e device re-uses
network packet abstractions introduced by the vmxnet3 device, while the
e1000 has its own code for packet handling. BTW, it may be worth reusing
those abstractions in e1000 as well. (Following these changes the
vmxnet3 device was successfully tested for possible regressions.)

There are a few minor parts that may be shared, e.g. the default
register handlers, and the ring management functions. The total amount
of shared lines will be about 100--150, so we're not sure if it makes
sense bothering, and taking a risk of breaking e1000, which is a good,
old, and stable device.

Currently, the e1000e code is stand alone w.r.t. e1000.

Please share your thoughts.

Thanks in advance,
Dmitry.

Changes since RFCv2:

1. Device functionality verified using Microsoft Hardware Certification Test 
Kit (HCK)
2. Introduced a number of performance improvements
3. The code was cleaned, and rebased to the latest master
4. Patches verified with checkpatch.pl

===

Changes since RFCv1:

1. Added support for all the device features:
  - Interrupt moderation.
  - RSS.
  - Multiqueue.
2. Simulated exact PCI/PCIe configuration space layout.
3. Made fixes needed to pass Microsoft's HW certification tests (HCK).

This series is still an RFC, because the following tasks are not done yet:

1. See which code can be shared between this device and the existing e1000 
device.
2. Rebase patches to the latest master (current base is v2.3.0).

Please share your thoughts,
Thanks, Dmitry.

===

Hello qemu-devel,

This patch series is an RFC for the new networking device emulation
we're developing for QEMU.

This new device emulates the Intel 82574 GbE Controller and works
with unmodified Intel e1000e drivers from the Linux/Windows kernels.

The status of the current series is "Functional Device Ready, work
on Extended Features in Progress".

More precisely, these patches represent a functional device, which
is recognized by the standard Intel drivers, and is able to transfer
TX/RX packets with CSO/TSO offloads, according to the spec.

Extended features not supported yet (work in progress):
  1. TX/RX Interrupt moderation mechanisms
  2. RSS
  3. Full-featured multi-queue (use of multiqueued network backend)

Also, there will be some code refactoring and performance
optimization efforts.

This series was tested on Linux (Fedora 22) and Windows (2012R2)
guests, using Iperf, with TX/RX and TCP/UDP streams, and various
packet sizes.

More thorough testing, including data streams with different MTU
sizes, and Microsoft Certification (HLK) tests, are pending missing
features' development.

See commit messages (esp. "net: Introduce e1000e device emulation")
for more information about the development approaches and the
architecture options chosen for this device.

This series is based 

[Qemu-devel] [PATCH v4 14/16] e1000: Move out code that will be reused in e1000e

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Code that will be shared moved to a separate files.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 MAINTAINERS|   5 +
 hw/net/Makefile.objs   |   2 +-
 hw/net/e1000.c | 411 +++--
 hw/net/e1000x_common.c | 267 
 hw/net/e1000x_common.h | 213 +
 trace-events   |  13 ++
 6 files changed, 591 insertions(+), 320 deletions(-)
 create mode 100644 hw/net/e1000x_common.c
 create mode 100644 hw/net/e1000x_common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 445c89e..ca44cec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -980,6 +980,11 @@ F: hw/acpi/nvdimm.c
 F: hw/mem/nvdimm.c
 F: include/hw/mem/nvdimm.h
 
+e1000x
+M: Dmitry Fleytman 
+S: Maintained
+F: hw/net/e1000x*
+
 Subsystems
 --
 Audio
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 527d264..bc69948 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -6,7 +6,7 @@ common-obj-$(CONFIG_NE2000_PCI) += ne2000.o
 common-obj-$(CONFIG_EEPRO100_PCI) += eepro100.o
 common-obj-$(CONFIG_PCNET_PCI) += pcnet-pci.o
 common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
-common-obj-$(CONFIG_E1000_PCI) += e1000.o
+common-obj-$(CONFIG_E1000_PCI) += e1000.o e1000x_common.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
 common-obj-$(CONFIG_VMXNET3_PCI) += net_tx_pkt.o net_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8e79b55..36e3dbe 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -36,7 +36,7 @@
 #include "qemu/iov.h"
 #include "qemu/range.h"
 
-#include "e1000_regs.h"
+#include "e1000x_common.h"
 
 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
@@ -64,11 +64,6 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 #define PNPMMIO_SIZE  0x2
 #define MIN_BUF_SIZE  60 /* Min. octets in an ethernet frame sans FCS */
 
-/* this is the size past which hardware will drop packets when setting LPE=0 */
-#define MAXIMUM_ETHERNET_VLAN_SIZE 1522
-/* this is the size past which hardware will drop packets when setting LPE=1 */
-#define MAXIMUM_ETHERNET_LPE_SIZE 16384
-
 #define MAXIMUM_ETHERNET_HDR_LEN (14+4)
 
 /*
@@ -102,22 +97,9 @@ typedef struct E1000State_st {
 unsigned char vlan[4];
 unsigned char data[0x1];
 uint16_t size;
-unsigned char sum_needed;
 unsigned char vlan_needed;
-uint8_t ipcss;
-uint8_t ipcso;
-uint16_t ipcse;
-uint8_t tucss;
-uint8_t tucso;
-uint16_t tucse;
-uint8_t hdr_len;
-uint16_t mss;
-uint32_t paylen;
+e1000x_txd_props props;
 uint16_t tso_frames;
-char tse;
-int8_t ip;
-int8_t tcp;
-char cptse; // current packet tse bit
 } tx;
 
 struct {
@@ -162,52 +144,19 @@ typedef struct E1000BaseClass {
 #define E1000_DEVICE_GET_CLASS(obj) \
 OBJECT_GET_CLASS(E1000BaseClass, (obj), TYPE_E1000_BASE)
 
-#define defreg(x)x = (E1000_##x>>2)
-enum {
-defreg(CTRL),defreg(EECD),defreg(EERD),defreg(GPRC),
-defreg(GPTC),defreg(ICR), defreg(ICS), defreg(IMC),
-defreg(IMS), defreg(LEDCTL),  defreg(MANC),defreg(MDIC),
-defreg(MPC), defreg(PBA), defreg(RCTL),defreg(RDBAH),
-defreg(RDBAL),   defreg(RDH), defreg(RDLEN),   defreg(RDT),
-defreg(STATUS),  defreg(SWSM),defreg(TCTL),defreg(TDBAH),
-defreg(TDBAL),   defreg(TDH), defreg(TDLEN),   defreg(TDT),
-defreg(TORH),defreg(TORL),defreg(TOTH),defreg(TOTL),
-defreg(TPR), defreg(TPT), defreg(TXDCTL),  defreg(WUFC),
-defreg(RA),  defreg(MTA), defreg(CRCERRS), defreg(VFTA),
-defreg(VET), defreg(RDTR),defreg(RADV),defreg(TADV),
-defreg(ITR), defreg(FCRUC),   defreg(TDFH),defreg(TDFT),
-defreg(TDFHS),   defreg(TDFTS),   defreg(TDFPC),   defreg(RDFH),
-defreg(RDFT),defreg(RDFHS),   defreg(RDFTS),   defreg(RDFPC),
-defreg(IPAV),defreg(WUC), defreg(WUS), defreg(AIT),
-defreg(IP6AT),   defreg(IP4AT),   defreg(FFLT),defreg(FFMT),
-defreg(FFVT),defreg(WUPM),defreg(PBM), defreg(SCC),
-defreg(ECOL),defreg(MCC), defreg(LATECOL), defreg(COLC),
-defreg(DC),  defreg(TNCRS),   defreg(SEC), defreg(CEXTERR),
-defreg(RLEC),defreg(XONRXC),  defreg(XONTXC),  defreg(XOFFRXC),
-defreg(XOFFTXC), defreg(RFC), defreg(RJC), defreg(RNBC),
-defreg(TSCTFC),  defreg(MGTPRC),  defreg(MGTPDC),  defreg(MGTPTC),
-defreg(RUC), defreg(ROC), defreg(GORCL),   defreg(GORCH),
-defreg(GOTCL),   defreg(GOTCH),   defreg(BPRC),defreg(MPRC),
-defreg(TSCTC),   defreg(PRC64),   defreg(PRC127),  defreg(PRC255),
- 

[Qemu-devel] [PATCH v4 05/16] vmxnet3: Use generic function for DSN capability definition

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/vmxnet3.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 093a71e..0a4db4d 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2255,9 +2255,9 @@ static const MemoryRegionOps b1_ops = {
 },
 };
 
-static uint8_t *vmxnet3_device_serial_num(VMXNET3State *s)
+static uint64_t vmxnet3_device_serial_num(VMXNET3State *s)
 {
-static uint64_t dsn_payload;
+uint64_t dsn_payload;
 uint8_t *dsnp = (uint8_t *)_payload;
 
 dsnp[0] = 0xfe;
@@ -2268,7 +2268,7 @@ static uint8_t *vmxnet3_device_serial_num(VMXNET3State *s)
 dsnp[5] = s->conf.macaddr.a[1];
 dsnp[6] = s->conf.macaddr.a[2];
 dsnp[7] = 0xff;
-return dsnp;
+return dsn_payload;
 }
 
 static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
@@ -2313,10 +2313,8 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, 
Error **errp)
 pcie_endpoint_cap_init(pci_dev, VMXNET3_EXP_EP_OFFSET);
 }
 
-pcie_add_capability(pci_dev, PCI_EXT_CAP_ID_DSN, 0x1,
-VMXNET3_DSN_OFFSET, PCI_EXT_CAP_DSN_SIZEOF);
-memcpy(pci_dev->config + VMXNET3_DSN_OFFSET + 4,
-   vmxnet3_device_serial_num(s), sizeof(uint64_t));
+pcie_dev_ser_num_init(pci_dev, VMXNET3_DSN_OFFSET,
+  vmxnet3_device_serial_num(s));
 }
 
 register_savevm(dev, "vmxnet3-msix", -1, 1,
-- 
2.5.0




[Qemu-devel] [PATCH v4 02/16] pci: Introduce define for PM capability version 1.1

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 include/hw/pci/pci_regs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index ba8cbe9..7a83142 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -1 +1,3 @@
 #include "standard-headers/linux/pci_regs.h"
+
+#define  PCI_PM_CAP_VER_1_1 0x0002  /* PCI PM spec ver. 1.1 */
-- 
2.5.0




[Qemu-devel] [PATCH v4 01/16] msix: make msix_clr_pending() visible for clients

2016-04-10 Thread Dmitry Fleytman
From: Dmitry Fleytman 

This function will be used by e1000e device code.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/pci/msix.c | 2 +-
 include/hw/pci/msix.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index b75f0e9..0ec1cb1 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -72,7 +72,7 @@ void msix_set_pending(PCIDevice *dev, unsigned int vector)
 *msix_pending_byte(dev, vector) |= msix_pending_mask(vector);
 }
 
-static void msix_clr_pending(PCIDevice *dev, int vector)
+void msix_clr_pending(PCIDevice *dev, int vector)
 {
 *msix_pending_byte(dev, vector) &= ~msix_pending_mask(vector);
 }
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 72e5f93..048a29d 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -29,6 +29,7 @@ int msix_present(PCIDevice *dev);
 
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
+void msix_clr_pending(PCIDevice *dev, int vector);
 
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
-- 
2.5.0




Re: [Qemu-devel] [PATCH v2 13/13] net: Introduce e1000e device emulation

2016-04-10 Thread Dmitry Fleytman
Hi Jason,

See below...

> On 7 Apr 2016, at 10:24 AM, Jason Wang  wrote:

[…]

> 
 +Device properties:
 +
 ++-++-+
 +| Propery name   | Description|  Type  | Default |
 ++|
 +|  subsys_ven| PCI device Subsystem Vendor ID | UINT16 | 0x8086  |
 +|||| |
 +|  subsys| PCI device Subsystem ID| UINT16 | 0   |
 +|||| |
 +|  vnet  | Use virtio headers or perform  | BOOL   | TRUE|
 +|| SW offloads emulation  || |
 +++++-+
>>> 
>>> You may using PropertyInfo which has a "description" field to do this
>>> better (e.g qdev_prop_vlan). Then there's even no need for this file.
>> 
>> We replaced this file with source code comments.
>> As for PropertyInfo description I can’t see any way to pass
>> description to DEFINE_PROP_* macros. Could you please elaborate on this?
> 
> You could use DEFINE_PROP() which can accept a PropertyInfo parameter.

I see now. Done.

> 
>> 
>>> 
 

[…]

 +if (!nc->peer || !qemu_has_vnet_hdr(nc->peer)) {
 +s->core.has_vnet = false;
>>> 
>>> So in fact we're probing the vnet support instead of doing vnet our own
>>> if backend does not support it. It seems to me that there's no need to
>>> having "vnet" property.
>> 
>> This property is intended for forcible disable of virtio headers
>> support. Possible use cases are verification of SW offloads logic and
>> work around for theoretical interoperability problems.
> 
> Ok, so maybe we'd better rename it to "disable_vnet”.

Ok. Renamed.

> 
>> 
>>> 
 +trace_e1000e_cfg_support_virtio(false);
 +return;
 +}
 +}
 +
> 
> 
> 
 +VMSTATE_UINT8(core.tx[1].sum_needed, E1000EState),
 +VMSTATE_UINT8(core.tx[1].ipcss, E1000EState),
 +VMSTATE_UINT8(core.tx[1].ipcso, E1000EState),
 +VMSTATE_UINT16(core.tx[1].ipcse, E1000EState),
 +VMSTATE_UINT8(core.tx[1].tucss, E1000EState),
 +VMSTATE_UINT8(core.tx[1].tucso, E1000EState),
 +VMSTATE_UINT16(core.tx[1].tucse, E1000EState),
 +VMSTATE_UINT8(core.tx[1].hdr_len, E1000EState),
 +VMSTATE_UINT16(core.tx[1].mss, E1000EState),
 +VMSTATE_UINT32(core.tx[1].paylen, E1000EState),
 +VMSTATE_INT8(core.tx[1].ip, E1000EState),
 +VMSTATE_INT8(core.tx[1].tcp, E1000EState),
 +VMSTATE_BOOL(core.tx[1].tse, E1000EState),
 +VMSTATE_BOOL(core.tx[1].cptse, E1000EState),
 +VMSTATE_BOOL(core.tx[1].skip_cp, E1000EState),
>>> 
>>> You can move the tx state into another structure and use VMSTATE_ARRAY()
>>> here.
>> 
>> Not sure I got you point. Could you please provide more details?
> 
> I mean e.g, move skip_cp into e1000e_txd_props and move tx_pkt out of
> e1000_tx. Then you can use VMSTATE_ARRAY to save and load
> e1000_txd_props array.

I got your point. Done via VMSTATE_STRUCT_ARRAY.


> 
>> 
>>> 
 +VMSTATE_END_OF_LIST()


[…]


>>> 
>>> Should we stop/start the above timers during vm stop/start through vm
>>> state change handler?
>> 
>> Not sure. Is there any reason for this?
> 
> Otherwise we may raise irq during vm stop?

Right. Done.

> 
> [...]
> 
 +
 +static inline void
 +e1000e_tx_ring_init(E1000ECore *core, E1000E_TxRing *txr, int idx)
 +{
 +static const E1000E_RingInfo i[E1000E_NUM_QUEUES] = {
 +{ TDBAH,  TDBAL,  TDLEN,  TDH,  TDT, 0 },
 +{ TDBAH1, TDBAL1, TDLEN1, TDH1, TDT1, 1 }
 +};
>>> 
>>> Instead of using static inside a function, why not just use a global
>>> array and then there's no need for this function and caller can access
>>> it directly?
>> 
>> Anyway there should be a function to combine the two assignments below.
>> And since this array is used by this function only it should better be
>> hidden.
> 
> I mean you can avoid the assignment before each transmission by just
> introducing arrays like:
> 
> int tdt[] = {TDT, TDT1};
> 
> And then access them through qidx, e.g: core->mac[tdt[qidx]].

I’m not sure about this.

Current approach requires assignment of one pointer.
Alternative approach requires assignment of queue index in the same place or 
passing queue index as function parameter everywhere.
Also as for me the current approach make code look simpler.

What do you think?

> 
>> 
>>> 
 +

[…]

I’m sending v4,

Thanks,
Dmitry




Re: [Qemu-devel] configure qemu and compile it on windows

2016-04-10 Thread Marwa Hamza
thanks stefan for your answer ,
 i just download cygwin , mingw64 and msys2 and packages u mentioned , all
in c:/ , and i download qemu from source in c:/msys2/ , but when i run
./configure i got this error ERROR: Python not found. Use
--python=/path/to/python
[image: Images intégrées 1]

2016-04-10 15:56 GMT+02:00 Marwa Hamza :

> thanks stefan for your answer , i just download cygwin , mingw64 and msys2
> and packages u mentioned , all in c:/ , and i download qemu from source in
> c:/msys2/ , but when i run ./configure i got this error ERROR: Python not
> found. Use --python=/path/to/python
> [image: Images intégrées 1]
>
> 2016-04-10 12:19 GMT+02:00 Stefan Weil :
>
>> Am 10.04.2016 um 11:27 schrieb Marwa Hamza:
>>
>> hello
>> i'm trying to configure qemu on windows so i'm using MinGW and msys
>>  every time i run the script configure (./configure) an error message
>> show about missing packages , i fixed those errors i installed pkg-config
>>  , zlib packages but the last error i couldn't fix it
>> ERROR: glib-2.12 gthread-2.0 is required to compile QEMU
>> please any one have any idea how to fix that
>> thanks
>>
>>
>> MinGW is no longer supported because of unmaintained header files,
>> missing packages, old compilers and other reasons.
>>
>> I suggest using Cygwin and the Mingw64 cross tools and packages
>> provided by Cygwin. For 64 bit Windows, see these packages:
>>
>>
>> https://cygwin.com/cgi-bin2/package-grep.cgi?grep=mingw64-x86_64=x86_64
>>
>> 32 bit uses these packages:
>>
>> https://cygwin.com/cgi-bin2/package-grep.cgi?grep=mingw64-i686=x86_64
>>
>> (Both examples are for 64 bit Cygwin)
>>
>> With Cygwin, it is no longer necessary to search packages for QEMU at
>> different places or to build them yourself.
>>
>> Either install all those packages, or start with
>> mingw64-i686-gcc-g++-4.9.2-2,
>> mingw64-i686-gtk3-3.18.6-1 and maybe some more (most packages needed
>> will be installed automatically because of dependencies).
>>
>> Regards,
>> Stefan
>>
>
>


[Qemu-devel] [PATCH] [PATCH] [WIP] [RFC] Add initial 9pfs support for Windows hosts

2016-04-10 Thread Michael Fritscher
It was tested on Windows & Linux hosts, on the later no obvious regressions 
could be found. The guest was a Knoppix 7.6.0 live cd.

This is WIP and a RFC - it isn't meant to be upstreamed yet. The error_printf 
are only for debugging and will be deleted in the final patch.

There are some comments with "mifritscher" - I'm not quite sure how to handle 
these problems (e.g. on mingw, this in 32 bit, while on Linux this is 16 bit 
and so on). Additionally, some places from which I suspect problems are marked 
with a #warning.

What works:
   * mount
   * stat
   * chdir
   * readdir (aka ls)
   * read of small files (<10k)
   * overwrite

What does not work:
   * create new files (in 90% problems with path handling (old path occurs)
   * read / probably write big files

TODO:
   * fix broken functionality
   * test & fix mapped permission style
   * prepend a 777 on all files if security modell "NONE"
   * test everything - on Windows and Linux

Signed-off-by: Michael Fritscher 
---
Many thanks to Stefan Weil for supporting me!
 Makefile.objs  |   1 +
 configure  |  15 +++-
 fsdev/9p-iov-marshal.c |   2 +
 fsdev/9p-marshal.c |   4 +-
 fsdev/file-op-9p.h |  35 +++-
 fsdev/qemu-fsdev.c |   2 +
 hw/9pfs/9p-local.c | 200 +
 hw/9pfs/9p-synth.c |  10 ++-
 hw/9pfs/9p.c   |  99 --
 hw/9pfs/9p.h   |  70 +++-
 hw/9pfs/Makefile.objs  |   7 +-
 hw/9pfs/codir.c|  35 
 hw/9pfs/cofile.c   |  15 
 hw/9pfs/cofs.c |  15 
 hw/9pfs/virtio-9p-device.c |   4 +-
 15 files changed, 478 insertions(+), 36 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 8f705f6..6fd02bc 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -48,6 +48,7 @@ common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
 
 common-obj-$(CONFIG_LINUX) += fsdev/
+common-obj-$(CONFIG_WIN32) += fsdev/
 
 common-obj-y += migration/
 common-obj-y += qemu-char.o #aio.o
diff --git a/configure b/configure
index 5db29f0..a4797c3 100755
--- a/configure
+++ b/configure
@@ -4566,12 +4566,21 @@ if test "$want_tools" = "yes" ; then
 fi
 if test "$softmmu" = yes ; then
   if test "$virtfs" != no ; then
-if test "$cap" = yes && test "$linux" = yes && test "$attr" = yes ; then
+if test "$linux" = yes ; then
+  if test "$cap" = yes  && test "$attr" = yes ; then
+virtfs=yes
+tools="$tools fsdev/virtfs-proxy-helper\$(EXESUF)"
+  else
+if test "$virtfs" = yes; then
+  error_exit "VirtFS requires libcap-devel and libattr-devel on Linux"
+fi
+virtfs=no
+  fi
+elif test "$mingw32" = yes; then
   virtfs=yes
-  tools="$tools fsdev/virtfs-proxy-helper\$(EXESUF)"
 else
   if test "$virtfs" = yes; then
-error_exit "VirtFS is supported only on Linux and requires 
libcap-devel and libattr-devel"
+error_exit "VirtFS is only supported on Linux or Windows"
   fi
   virtfs=no
 fi
diff --git a/fsdev/9p-iov-marshal.c b/fsdev/9p-iov-marshal.c
index fb40bdf..1a292c6 100644
--- a/fsdev/9p-iov-marshal.c
+++ b/fsdev/9p-iov-marshal.c
@@ -15,7 +15,9 @@
 #include 
 #include 
 #include 
+#ifndef _WIN32
 #include 
+#endif
 
 #include "9p-iov-marshal.h"
 #include "qemu/bswap.h"
diff --git a/fsdev/9p-marshal.c b/fsdev/9p-marshal.c
index 183d366..081cb88 100644
--- a/fsdev/9p-marshal.c
+++ b/fsdev/9p-marshal.c
@@ -16,7 +16,9 @@
 #include 
 #include 
 #include 
-#include 
+#ifndef WIN32
+#include 
+#endif
 
 #include "9p-marshal.h"
 
diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index b8c2602..446c6a5 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -14,12 +14,43 @@
 #define _FILEOP_H
 #include 
 #include 
-#include 
-#include 
+#ifndef _WIN32
+#include 
+#include 
+#endif
 
 #define SM_LOCAL_MODE_BITS0600
 #define SM_LOCAL_DIR_MODE_BITS0700
 
+#ifdef _WIN32
+typedef uint32_t uid_t;
+typedef uint32_t gid_t;
+
+/* from http://man7.org/linux/man-pages/man2/statfs.2.html */
+typedef uint32_t __fsword_t;
+typedef uint32_t fsblkcnt_t;
+typedef uint32_t fsfilcnt_t;
+
+/* from linux/include/uapi/asm-generic/posix_types.h */
+typedef struct {
+long__val[2];
+} fsid_t;
+
+struct statfs {
+__fsword_t f_type;
+__fsword_t f_bsize;
+fsblkcnt_t f_blocks;
+fsblkcnt_t f_bfree;
+fsblkcnt_t f_bavail;
+fsfilcnt_t f_files;
+fsfilcnt_t f_ffree;
+fsid_t f_fsid;
+__fsword_t f_namelen;
+__fsword_t f_frsize;
+__fsword_t f_flags;
+};
+#endif
+
 typedef struct FsCred
 {
 uid_t   fc_uid;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index bf7f0b0..384d72e 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -26,7 +26,9 @@ static FsDriverTable FsDrivers[] = {
 { .name = "handle", .ops = _ops},
 #endif
 { .name = "synth", .ops 

[Qemu-devel] [PATCH] [PATCH] [WIP] [RFC] [RESEND] Add initial 9pfs support for Windows hosts

2016-04-10 Thread Michael Fritscher
It was tested on Windows & Linux hosts, on the later no obvious regressions 
could be found. The guest was a Knoppix 7.6.0 live cd.

This is WIP and a RFC - it isn't meant to be upstreamed yet. The error_printf 
are only for debugging and will be deleted in the final patch.

There are some comments with "mifritscher" - I'm not quite sure how to handle 
these problems (e.g. on mingw, this in 32 bit, while on Linux this is 16 bit 
and so on). Additionally, some places from which I suspect problems are marked 
with a #warning.

What works:
   * mount
   * stat
   * chdir
   * readdir (aka ls)
   * read of small files (<10k)
   * overwrite

What does not work:
   * create new files (in 90% problems with path handling (old path occurs)
   * read / probably write big files

TODO:
   * fix broken functionality
   * test & fix mapped permission style
   * prepend a 777 on all files if security modell "NONE"
   * test everything - on Windows and Linux

Signed-off-by: Michael Fritscher 
---
Many thanks to Stefan Weil for supporting me!

This is a resend, because I forgot the fact that this mail is sent from another 
mail address than I'm subscribed to the list.
 Makefile.objs  |   1 +
 configure  |  15 +++-
 fsdev/9p-iov-marshal.c |   2 +
 fsdev/9p-marshal.c |   4 +-
 fsdev/file-op-9p.h |  35 +++-
 fsdev/qemu-fsdev.c |   2 +
 hw/9pfs/9p-local.c | 200 +
 hw/9pfs/9p-synth.c |  10 ++-
 hw/9pfs/9p.c   |  99 --
 hw/9pfs/9p.h   |  70 +++-
 hw/9pfs/Makefile.objs  |   7 +-
 hw/9pfs/codir.c|  35 
 hw/9pfs/cofile.c   |  15 
 hw/9pfs/cofs.c |  15 
 hw/9pfs/virtio-9p-device.c |   4 +-
 15 files changed, 478 insertions(+), 36 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 8f705f6..6fd02bc 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -48,6 +48,7 @@ common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
 
 common-obj-$(CONFIG_LINUX) += fsdev/
+common-obj-$(CONFIG_WIN32) += fsdev/
 
 common-obj-y += migration/
 common-obj-y += qemu-char.o #aio.o
diff --git a/configure b/configure
index 5db29f0..a4797c3 100755
--- a/configure
+++ b/configure
@@ -4566,12 +4566,21 @@ if test "$want_tools" = "yes" ; then
 fi
 if test "$softmmu" = yes ; then
   if test "$virtfs" != no ; then
-if test "$cap" = yes && test "$linux" = yes && test "$attr" = yes ; then
+if test "$linux" = yes ; then
+  if test "$cap" = yes  && test "$attr" = yes ; then
+virtfs=yes
+tools="$tools fsdev/virtfs-proxy-helper\$(EXESUF)"
+  else
+if test "$virtfs" = yes; then
+  error_exit "VirtFS requires libcap-devel and libattr-devel on Linux"
+fi
+virtfs=no
+  fi
+elif test "$mingw32" = yes; then
   virtfs=yes
-  tools="$tools fsdev/virtfs-proxy-helper\$(EXESUF)"
 else
   if test "$virtfs" = yes; then
-error_exit "VirtFS is supported only on Linux and requires 
libcap-devel and libattr-devel"
+error_exit "VirtFS is only supported on Linux or Windows"
   fi
   virtfs=no
 fi
diff --git a/fsdev/9p-iov-marshal.c b/fsdev/9p-iov-marshal.c
index fb40bdf..1a292c6 100644
--- a/fsdev/9p-iov-marshal.c
+++ b/fsdev/9p-iov-marshal.c
@@ -15,7 +15,9 @@
 #include 
 #include 
 #include 
+#ifndef _WIN32
 #include 
+#endif
 
 #include "9p-iov-marshal.h"
 #include "qemu/bswap.h"
diff --git a/fsdev/9p-marshal.c b/fsdev/9p-marshal.c
index 183d366..081cb88 100644
--- a/fsdev/9p-marshal.c
+++ b/fsdev/9p-marshal.c
@@ -16,7 +16,9 @@
 #include 
 #include 
 #include 
-#include 
+#ifndef WIN32
+#include 
+#endif
 
 #include "9p-marshal.h"
 
diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index b8c2602..446c6a5 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -14,12 +14,43 @@
 #define _FILEOP_H
 #include 
 #include 
-#include 
-#include 
+#ifndef _WIN32
+#include 
+#include 
+#endif
 
 #define SM_LOCAL_MODE_BITS0600
 #define SM_LOCAL_DIR_MODE_BITS0700
 
+#ifdef _WIN32
+typedef uint32_t uid_t;
+typedef uint32_t gid_t;
+
+/* from http://man7.org/linux/man-pages/man2/statfs.2.html */
+typedef uint32_t __fsword_t;
+typedef uint32_t fsblkcnt_t;
+typedef uint32_t fsfilcnt_t;
+
+/* from linux/include/uapi/asm-generic/posix_types.h */
+typedef struct {
+long__val[2];
+} fsid_t;
+
+struct statfs {
+__fsword_t f_type;
+__fsword_t f_bsize;
+fsblkcnt_t f_blocks;
+fsblkcnt_t f_bfree;
+fsblkcnt_t f_bavail;
+fsfilcnt_t f_files;
+fsfilcnt_t f_ffree;
+fsid_t f_fsid;
+__fsword_t f_namelen;
+__fsword_t f_frsize;
+__fsword_t f_flags;
+};
+#endif
+
 typedef struct FsCred
 {
 uid_t   fc_uid;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index bf7f0b0..384d72e 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ 

[Qemu-devel] [PATCHv8] Improve documentation for TLS

2016-04-10 Thread Alex Bligh
* Call out TLS into a separate section

* Add details of the TLS protocol itself

* Emphasise that actual TLS session initiation (i.e. the TLS handshake) can
  be initiated from either side (as required by the TLS standard I believe
  and as actually works in practice)

* Clarify what is a requirement on servers, and what is a requirement on
  clients, separately, specifying their behaviour in a single place
  in the document.

* Document the three possible modes of operation of a server.

* Add text defining what 'terminate the session' means during
  negotiation, and when it is available.

Signed-off-by: Alex Bligh 
---
 doc/proto.md | 342 +--
 1 file changed, 308 insertions(+), 34 deletions(-)

Changes since v7

* I missed committing the changes re consistent use of 'option' rather than 
'command'
  in v7. They are here now.

Changes from v6:

* Introduced language mandating a server to reply with NBD_ERR_INVALID
 to NBD_OPT_STARTTLS if TLS is already negotiatied.

* Removed some duplication in SELECTIVETLS over the prohibition on
 servers not returning NBD_ERR_TLSREQD to options other than
 NBD_OPT_EXPORTNAME, NBD_OPT_INFO and NBD_OPT_GO. The same thing
 was said a different way a couple of paragraphs below.

* Consistently refer to 'options' rather than 'commands' in the
 negotiation phase.

* Eric Blake's nits

Changes from v5:

* Delete OPTIONALTLS (RIP)

* Add NBD_REP_ERR_POLICY

* s/NBD_ERR_REP/NBD_REP_ERR/ in one place

* Consistently use the phrase 'terminate the session' to mean dropping
the connection, as per Wouter. Note there are other inconsistent
uses of 'dropping the connection', 'disconnecting' etc. elsewhere
which I haven't touched.

* Similarly refer to the connection as a 'session' when it doesn't
explicitly mean the L3 TCP connection (TLS section only).

* Introduce a paragraph under newstyle negotiation emphasising that
terminating the session is legal and sometimes required, and defining
it.

Changes from v4

* Minor grammar nit

Changes from v3:

* Delete confusing text about server omitting entries from NBD_OPT_LIST
if TLS is not negotiated and FORCETLS is used, as that (of course)
requires NBD_REP_ERR_TLS_REQD elsewhere in the text.

* Further nits from Eric Blake

Changes from v2:

* The response to a command is a response, not a NBD_REP_ACK

* Make it clear that the response can be errored

* Nits from Eric Blake

Changes from v1:

* Make a NBD_CMD_CLOSE imply a flush

* Nits from Eric Blake

diff --git a/doc/proto.md b/doc/proto.md
index f117394..5005552 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -195,6 +195,13 @@ request before sending the next one of the same type. The 
server MAY
 send replies in the order that the requests were received, but is not
 required to.
 
+There is no requirement for the client or server to complete a negotiation
+if it does not wish to do so. If the client does not find an export it
+is looking for (for instance) it may simply close the TCP connection.
+Under certain circumstances either the client or the server may be required
+by this document to close the TCP connection. In each case, this is
+referred to as 'terminate the session'.
+
 ### Transmission
 
 There are three message types in the transmission phase: the request,
@@ -286,6 +293,287 @@ S: (*length* bytes of data if the request is of type 
`NBD_CMD_READ`)
 This reply type MUST NOT be used except as documented by the
 experimental `STRUCTURED_REPLY` extension; see below.
 
+## TLS support
+
+The NBD protocol supports Transport Layer Security (TLS) (see
+[RFC5246](https://tools.ietf.org/html/rfc5246)
+as updated by
+[RFC6176](https://tools.ietf.org/html/rfc6176)
+).
+
+TLS is negotiated with the `NBD_OPT_STARTTLS`
+option. This is performed as an in-session upgrade. Below the term
+'negotiation' is used to refer to the sending and receiving of
+NBD options and option replies, and the term 'initiation' of TLS
+is used to refer to the actual upgrade to TLS.
+
+### Certificates, authentication and authorisation
+
+This standard does not specify what encryption, certification
+and signature algorithms are used. This standard does not
+specify authentication and authorisation (for instance
+whether client and/or server certificates are required and
+what they should contain); this is implementation dependent.
+
+TLS requires fixed newstyle negotiation to have completed.
+
+### Server-side requirements
+
+There are three modes of operation for a server. The
+server MUST support one of these modes.
+
+* The server operates entirely without TLS ('NOTLS'); OR
+
+* The server insists upon TLS, and forces the client to
+  upgrade by erroring any NBD options other than `NBD_OPT_STARTTLS`
+  with `NBD_REP_ERR_TLS_REQD` ('FORCEDTLS'); this in practice means
+  that all option negotiation (apart from the `NBD_OPT_STARTTLS`
+  itself) is carried out with TLS; OR
+
+* The server provides TLS, and it is mandatory on zero or 

Re: [Qemu-devel] configure qemu and compile it on windows

2016-04-10 Thread Stefan Weil
Am 10.04.2016 um 11:27 schrieb Marwa Hamza:
> hello
> i'm trying to configure qemu on windows so i'm using MinGW and msys
>  every time i run the script configure (./configure) an error message
> show about missing packages , i fixed those errors i installed
> pkg-config  , zlib packages but the last error i couldn't fix it
> ERROR: glib-2.12 gthread-2.0 is required to compile QEMU
> please any one have any idea how to fix that 
> thanks

MinGW is no longer supported because of unmaintained header files,
missing packages, old compilers and other reasons.

I suggest using Cygwin and the Mingw64 cross tools and packages
provided by Cygwin. For 64 bit Windows, see these packages:

https://cygwin.com/cgi-bin2/package-grep.cgi?grep=mingw64-x86_64=x86_64

32 bit uses these packages:

https://cygwin.com/cgi-bin2/package-grep.cgi?grep=mingw64-i686=x86_64

(Both examples are for 64 bit Cygwin)

With Cygwin, it is no longer necessary to search packages for QEMU at
different places or to build them yourself.

Either install all those packages, or start with
mingw64-i686-gcc-g++-4.9.2-2,
mingw64-i686-gtk3-3.18.6-1 and maybe some more (most packages needed
will be installed automatically because of dependencies).

Regards,
Stefan


signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 5/5] Add param Error ** for msi_init()

2016-04-10 Thread Cao jin



On 04/10/2016 04:20 PM, Marcel Apfelbaum wrote:



Hi,
I'll let Markus to continue the review, it brings very valuable
information,
I will only try to answer the questions below.


Several questions on this topic:
1. How to confirm whether a device model has non-MSI variant? AFAICT,
it is these who have msi property.



MSI is required for PCI Express devices, optional for PCI devices.
Even if a PCI device supports MSI, it is strongly advised to support
legacy INTx for backward compatibility.
Bottom line, as far as I know, almost all PCI devices support legacy
interrupts.
(an exception is the ivshmem device that requires MSI)


2. For those have non-MSI variant devices(have msi property), as I see
in the code, they all have it on by default, So we won`t know it is
user order, or user don`t set it at all.



I didn't quite understand the sentence, but some devices have a
"use_msi" property that can be set by the user. If no such property exists,
we can assume the user "prefers" the msi version.


Hi,
Sorry for my bad description. let me explain myself again.
I think(guess), if a device model has msi property, it means this device 
model has non-msi variant. For those devices who has msi property, I see 
most of them will have it on by default. So when these devices 
initialize msi, qemu won`t know whether it is user order or not.


If I understand Markus right:
1). If user orders msi on, when msi_init returns -ENOTSUP, It is error. 
Then I suggest to inform user "set msi=off and try again"
2). If user doesn`t order msi on(so device have msi on by default), when 
msi_init returns -ENOTSUP, I am ok with Markus`s suggestion: *caller 
should silently switch to the non-MSI variant*


But now the condition is, qemu can`t distinguish whether user ordered 
msi or not, so for the condition 2) above, my suggestion is the same as 1)





If user don`t know msi and don`t set it on, I think it is acceptable
to create the non-msi variant for user silently. But if it is user
order, like you said, it is an error.



I am not sure about this. At least a warning should be given IMHO.


So, how about: inform user to swich msi off and try again when
encounter -ENOTSUP, no matter it is user order, or user doesn`t set it
at all?



Not all devices have an "msi" switch. If the board has msi broken and
the devices supports legacy interrupts, its OK to continue without MSI.


Actually in this v4, I do checked whether device has a msi property,
like cover-letter said:

   3. most devices have msi/msix(except vmxnet3 & pvscsi) property as
a   switch, if it has and is switched on, then msi_init() failure
should   results in return directly. So in this version, mptsas
is updated



I don't see a "msi" properties on PCIDevice class or VirtioPCIClass, are
you sure we have an msi switch for most of the PCI devices?



My bad, I didn`t limit the range. I mean the devices who will call 
msi_init, they mostly have msi(or msix, or both) property




--
Yours Sincerely,

Cao jin





[Qemu-devel] configure qemu and compile it on windows

2016-04-10 Thread Marwa Hamza
hello
i'm trying to configure qemu on windows so i'm using MinGW and msys
 every time i run the script configure (./configure) an error message show
about missing packages , i fixed those errors i installed pkg-config  ,
zlib packages but the last error i couldn't fix it
ERROR: glib-2.12 gthread-2.0 is required to compile QEMU
please any one have any idea how to fix that
thanks


Re: [Qemu-devel] [PATCHv6] Improve documentation for TLS

2016-04-10 Thread Alex Bligh
Eric,

Thanks. In v7 I've taken all of these, save where indicated
(or as described) below.

>> +If the server receives `NBD_OPT_STARTTLS` it MUST reply with
> 
> Worth mentioning 'prior to initiating TLS', since...
> ...the behavior after initiating TLS is required to be different?

As you point out later (and as I found independently) this
difference was not explicitly set out. It now is.

> Worth mentioning that clients should be prepared to encounter buggy
> servers (qemu 2.5) that disconnect rather than properly send an error
> message?  Maybe not, since the buggy versions will eventually be ancient
> history, and since you already stated above that either party can
> disconnect at any time for any reason during handshake phase.

I think not worth mentioning. There may be (and probably are)
buggy clients of all sorts. In this case Qemu's behaviour isn't
technically a bug anyway as it can drop the session at any
time. It's merely being someone ungracious.

> You deleted the requirement about MUST reply with NBD_REP_ERR_INVALID
> when the client sends NBD_CMD_STARTTLS when already in TLS, but I don't
> see it above.  Arguably, since we required above that the client MUST
> NOT send NBD_CMD_STARTTLS twice, it's already undefined behavior, but
> requiring that the server MUST reject with NBD_REP_ERR_INVALID would
> make it harder for implementations to get it wrong when dealing with
> buggy clients.

Reinstated explicitly in the TLS section, and mentioned in the
NBD_OPT_ section too. Thanks for catching that one.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


[Qemu-devel] [PATCHv7] Improve documentation for TLS

2016-04-10 Thread Alex Bligh
* Call out TLS into a separate section

* Add details of the TLS protocol itself

* Emphasise that actual TLS session initiation (i.e. the TLS handshake) can
  be initiated from either side (as required by the TLS standard I believe
  and as actually works in practice)

* Clarify what is a requirement on servers, and what is a requirement on
  clients, separately, specifying their behaviour in a single place
  in the document.

* Document the three possible modes of operation of a server.

* Add text defining what 'terminate the session' means during
  negotiation, and when it is available.

Signed-off-by: Alex Bligh 
---
 doc/proto.md | 342 +--
 1 file changed, 308 insertions(+), 34 deletions(-)

Changes from v6:

* Introduced language mandating a server to reply with NBD_ERR_INVALID
  to NBD_OPT_STARTTLS if TLS is already negotiatied.

* Removed some duplication in SELECTIVETLS over the prohibition on
  servers not returning NBD_ERR_TLSREQD to options other than
  NBD_OPT_EXPORTNAME, NBD_OPT_INFO and NBD_OPT_GO. The same thing
  was said a different way a couple of paragraphs below.

* Consistently refer to 'options' rather than 'commands' in the
  negotiation phase.

* Eric Blake's nits

Changes from v5:

* Delete OPTIONALTLS (RIP)

* Add NBD_REP_ERR_POLICY

* s/NBD_ERR_REP/NBD_REP_ERR/ in one place

* Consistently use the phrase 'terminate the session' to mean dropping
 the connection, as per Wouter. Note there are other inconsistent
 uses of 'dropping the connection', 'disconnecting' etc. elsewhere
 which I haven't touched.

* Similarly refer to the connection as a 'session' when it doesn't
 explicitly mean the L3 TCP connection (TLS section only).

* Introduce a paragraph under newstyle negotiation emphasising that
 terminating the session is legal and sometimes required, and defining
 it.

Changes from v4

* Minor grammar nit

Changes from v3:

* Delete confusing text about server omitting entries from NBD_OPT_LIST
if TLS is not negotiated and FORCETLS is used, as that (of course)
requires NBD_REP_ERR_TLS_REQD elsewhere in the text.

* Further nits from Eric Blake

Changes from v2:

* The response to a command is a response, not a NBD_REP_ACK

* Make it clear that the response can be errored

* Nits from Eric Blake

Changes from v1:

* Make a NBD_CMD_CLOSE imply a flush

* Nits from Eric Blake

diff --git a/doc/proto.md b/doc/proto.md
index f117394..c38ca93 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -195,6 +195,13 @@ request before sending the next one of the same type. The 
server MAY
 send replies in the order that the requests were received, but is not
 required to.
 
+There is no requirement for the client or server to complete a negotiation
+if it does not wish to do so. If the client does not find an export it
+is looking for (for instance) it may simply close the TCP connection.
+Under certain circumstances either the client or the server may be required
+by this document to close the TCP connection. In each case, this is
+referred to as 'terminate the session'.
+
 ### Transmission
 
 There are three message types in the transmission phase: the request,
@@ -286,6 +293,287 @@ S: (*length* bytes of data if the request is of type 
`NBD_CMD_READ`)
 This reply type MUST NOT be used except as documented by the
 experimental `STRUCTURED_REPLY` extension; see below.
 
+## TLS support
+
+The NBD protocol supports Transport Layer Security (TLS) (see
+[RFC5246](https://tools.ietf.org/html/rfc5246)
+as updated by
+[RFC6176](https://tools.ietf.org/html/rfc6176)
+).
+
+TLS is negotiated with the `NBD_OPT_STARTTLS`
+option. This is performed as an in-session upgrade. Below the term
+'negotiation' is used to refer to the sending and receiving of
+NBD commands, and the term 'initiation' of TLS is used to refer to
+the actual upgrade to TLS.
+
+### Certificates, authentication and authorisation
+
+This standard does not specify what encryption, certification
+and signature algorithms are used. This standard does not
+specify authentication and authorisation (for instance
+whether client and/or server certificates are required and
+what they should contain); this is implementation dependent.
+
+TLS requires fixed newstyle negotiation to have completed.
+
+### Server-side requirements
+
+There are three modes of operation for a server. The
+server MUST support one of these modes.
+
+* The server operates entirely without TLS ('NOTLS'); OR
+
+* The server insists upon TLS, and forces the client to
+  upgrade by erroring any NBD options other than `NBD_OPT_STARTTLS`
+  with `NBD_REP_ERR_TLS_REQD` ('FORCEDTLS'); this in practice means
+  that all option negotiation (apart from the `NBD_OPT_STARTTLS`
+  itself) is carried out with TLS; OR
+
+* The server provides TLS, and it is mandatory on zero or more
+  exports, and is available at the client's option on all
+  other exports ('SELECTIVETLS'). The server does not force
+  the client to 

Re: [Qemu-devel] [Qemu-trivial] [PATCH] pci: fix identation

2016-04-10 Thread Marcel Apfelbaum

On 03/28/2016 10:32 AM, Marcel Apfelbaum wrote:

Signed-off-by: Marcel Apfelbaum 


Ping.

Thanks,
Marcel


---
  hw/pci/pci.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index a1d41aa..da8b0b6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -165,15 +165,15 @@ static inline int pci_irq_state(PCIDevice *d, int irq_num)
  {
  assert(irq_num >= 0);

-   return (d->irq_state >> irq_num) & 0x1;
+return (d->irq_state >> irq_num) & 0x1;
  }

  static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
  {
  assert(irq_num >= 0);

-   d->irq_state &= ~(0x1 << irq_num);
-   d->irq_state |= level << irq_num;
+d->irq_state &= ~(0x1 << irq_num);
+d->irq_state |= level << irq_num;
  }

  static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change)






Re: [Qemu-devel] [PATCH] pci: fix compilation warnings

2016-04-10 Thread Marcel Apfelbaum

On 03/28/2016 10:23 AM, Marcel Apfelbaum wrote:

Fix 'error: shift exponent -1 is negative' warning
by adding a corresponding assert.



Ping.

Thanks,
Marcel


Reported-by: Peter Maydell 
Signed-off-by: Markus Armbruster 
Signed-off-by: Marcel Apfelbaum 
---
  hw/pci/pci.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e67664d..a1d41aa 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -163,11 +163,15 @@ int pci_bar(PCIDevice *d, int reg)

  static inline int pci_irq_state(PCIDevice *d, int irq_num)
  {
+assert(irq_num >= 0);
+
return (d->irq_state >> irq_num) & 0x1;
  }

  static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
  {
+assert(irq_num >= 0);
+
d->irq_state &= ~(0x1 << irq_num);
d->irq_state |= level << irq_num;
  }






Re: [Qemu-devel] [PATCH v4 5/5] Add param Error ** for msi_init()

2016-04-10 Thread Marcel Apfelbaum

On 04/09/2016 03:19 PM, Cao jin wrote:

Hi

On 04/08/2016 04:44 PM, Markus Armbruster wrote:


diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 0a13334..db4fdb5 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -146,7 +146,7 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error 
**errp)
  /* Although the AHCI 1.3 specification states that the first capability
   * should be PMCAP, the Intel ICH9 data sheet specifies that the ICH9
   * AHCI device puts the MSI capability first, pointing to 0x80. */
-msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false);
+msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false, errp);


Sure there's nothing to undo on error?  Instead of undoing, you may want
to move msi_init() before the stuff that needs undoing.



ich9-ahci is a on-board device of Q35, like cover-letter says: when it fail, 
qemu will exit. So, is it necessary to undo on error?

maybe you saw, I did move msi_init() for some other devices.


diff --git a/hw/pci/msi.c b/hw/pci/msi.c


msi_init() has three failure modes:

* -ENOTSUP

   Board's MSI emulation is not known to work: !msi_nonbroken.

   This is not necessarily an error.

   It is when the device model requires MSI.

   It isnt' when a non-MSI variant of the device model exists.  Then
   caller should silently switch to the non-MSI variant[*].





Hi,
I'll let Markus to continue the review, it brings very valuable information,
I will only try to answer the questions below.


Several questions on this topic:
1. How to confirm whether a device model has non-MSI variant? AFAICT, it is 
these who have msi property.



MSI is required for PCI Express devices, optional for PCI devices.
Even if a PCI device supports MSI, it is strongly advised to support
legacy INTx for backward compatibility.
Bottom line, as far as I know, almost all PCI devices support legacy interrupts.
(an exception is the ivshmem device that requires MSI)


2. For those have non-MSI variant devices(have msi property), as I see in the 
code, they all have it on by default, So we won`t know it is user order, or 
user don`t set it at all.



I didn't quite understand the sentence, but some devices have a "use_msi" 
property that can be set by the user. If no such property exists,
we can assume the user "prefers" the msi version.



If user don`t know msi and don`t set it on, I think it is acceptable to create 
the non-msi variant for user silently. But if it is user order, like you said, 
it is an error.



I am not sure about this. At least a warning should be given IMHO.


So, how about: inform user to swich msi off and try again when encounter 
-ENOTSUP, no matter it is user order, or user doesn`t set it at all?



Not all devices have an "msi" switch. If the board has msi broken and the 
devices supports legacy interrupts, its OK to continue without MSI.


Actually in this v4, I do checked whether device has a msi property, like 
cover-letter said:

   3. most devices have msi/msix(except vmxnet3 & pvscsi) property as a   
switch, if it has and is switched on, then msi_init() failure should   results in 
return directly. So in this version, mptsas
is updated



I don't see a "msi" properties on PCIDevice class or VirtioPCIClass, are you 
sure we have an msi switch for most of the PCI devices?

Thanks for looking into this,
Marcel



Re: [Qemu-devel] [PATCH v4 4/5] mptsas: change .realize function name

2016-04-10 Thread Marcel Apfelbaum

On 04/05/2016 02:26 PM, Cao jin wrote:

All the other devices` .realize function name are xxx_realize, except this one

Signed-off-by: Cao jin 
CC: Paolo Bonzini 
---
  hw/scsi/mptsas.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
index 499c146..1c18c84 100644
--- a/hw/scsi/mptsas.c
+++ b/hw/scsi/mptsas.c
@@ -1270,7 +1270,7 @@ static const struct SCSIBusInfo mptsas_scsi_info = {
  .load_request = mptsas_load_request,
  };

-static void mptsas_scsi_init(PCIDevice *dev, Error **errp)
+static void mptsas_scsi_realize(PCIDevice *dev, Error **errp)
  {
  DeviceState *d = DEVICE(dev);
  MPTSASState *s = MPT_SAS(dev);
@@ -1413,7 +1413,7 @@ static void mptsas1068_class_init(ObjectClass *oc, void 
*data)
  DeviceClass *dc = DEVICE_CLASS(oc);
  PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);

-pc->realize = mptsas_scsi_init;
+pc->realize = mptsas_scsi_realize;
  pc->exit = mptsas_scsi_uninit;
  pc->romfile = 0;
  pc->vendor_id = PCI_VENDOR_ID_LSI_LOGIC;



Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel



Re: [Qemu-devel] [PATCH v4 2/5] change pvscsi_init_msi() type to void

2016-04-10 Thread Marcel Apfelbaum

On 04/05/2016 02:26 PM, Cao jin wrote:

Nobody use its return value, so change the type to void

Signed-off-by: Cao jin 
CC: Paolo Bonzini 
CC: Dmitry Fleytman 
---
  hw/scsi/vmw_pvscsi.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index 9abc086..4ce3581 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -1039,7 +1039,7 @@ pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
  }


-static bool
+static void
  pvscsi_init_msi(PVSCSIState *s)
  {
  int res;
@@ -1053,8 +1053,6 @@ pvscsi_init_msi(PVSCSIState *s)
  } else {
  s->msi_used = true;
  }
-
-return s->msi_used;
  }

  static void




Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel



Re: [Qemu-devel] [PATCH v4 3/5] megasas: bugfix

2016-04-10 Thread Marcel Apfelbaum

On 04/08/2016 10:16 AM, Markus Armbruster wrote:

Please use a more descriptive title.  Suggest "megasas: Fix

Cao jin  writes:


msi_init returns non-zero value on both failure and success


This is a sentence, should end with a period.

Bug's impact?  Here's my guess.

msi_init() either succeeds and returns 0x50, or fails and returns a
negative errno.  If it succeeds, we mistakenly clear
MEGASAS_MASK_USE_MSI.  Its only use is in megasas_scsi_uninit(), via
megasas_use_msi().  There, we fail to msi_uninit() on unrealize due to
the bug.

I figure that's harmless if we destroy the device next.  This is the
common case.

If we don't destroy it, and then realize it again, msi_init() fails,
because there's no space at 0x50: the MSI capability we neglected to
delete is still there.  We report the problem to the user, then realize
the device anyway (I hate that, but it's a separate issue).

Marcel, can you confirm my analysis?


Your analysis is accurate, I didn't even look so hard at consequences,
this is a clear bug that needs to be fixed.
However, now I looked into it and your explanation shows why it even works...





Signed-off-by: Cao jin 
CC: Hannes Reinecke 
CC: Paolo Bonzini 
---
  hw/scsi/megasas.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index a63a581..56fb645 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2348,7 +2348,7 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
"megasas-queue", 0x4);

  if (megasas_use_msi(s) &&
-msi_init(dev, 0x50, 1, true, false)) {
+msi_init(dev, 0x50, 1, true, false) < 0) {
  s->flags &= ~MEGASAS_MASK_USE_MSI;
  }
  if (megasas_use_msix(s) &&

msix_init(dev, 15, >mmio_io, b->mmio_bar, 0x2000,
  >mmio_io, b->mmio_bar, 0x3800, 0x68)) {
s->flags &= ~MEGASAS_MASK_USE_MSIX;
}

This looks like the same bug, but it's actually okay, since msix_init()
returns 0 on success.  Suggest to test < 0 anyway so that future readers
don't get misled into thinking there's a bug like I was.



I agree we should follow the same convention.


Marcel, this difference between msi_init() and msix_init() is just mean.


It keeps us alert :)


Please clean it up.


Sure, I'll take care of it.

Thanks,
Marcel








Re: [Qemu-devel] [PATCH] hw/9pfs: Add CephFS support in VirtFS

2016-04-10 Thread Jevon Qiao

Hi Greg,

Thank you for spending time reviewing this patch.
On 7/4/16 23:50, Greg Kurz wrote:

On Tue, 15 Mar 2016 00:02:48 +0800
Jevon Qiao  wrote:


Ceph as a promising unified distributed storage system is widely used in the
world of OpenStack. OpenStack users deploying Ceph for block (Cinder) and
object (S3/Swift) are unsurprisingly looking at Manila and CephFS to round out
a unified storage solution. Since the typical hypervisor people are using is
Qemu/KVM, it is necessary to provide a high performance, easy to use, file
system service in it. VirtFS aims to offers paravirtualized system services and
simple passthrough for directories from host to guest, which currently only
support local file system, this patch wants to add CephFS support in VirtFS.

Signed-off-by: Jevon Qiao 
---

Jevon,

There's still work to be done on this patch.

One general remark is that there are far too many traces: it obfuscates the code
and does not bring much benefit in my opinion. If you look at the other fsdev
drivers, you see they don't do traces at all !
Also, I've found several errors where the code simply cannot work... please run
a file/io oriented testsuite in the guest to check all the fsdev operations are
working as expected... maybe some tests from LTP ?
Well, I did run some tests against this patch with iozone to guarantee 
the IO path is correct and did some manual tests to make sure that the 
basic file/directory operations work. Anyway, I will run the file system 
related parts of LTP.

Don't forget to list all the changes when you post your v3.

Sure thing.

You'll find detailed comments below.
I'll revisit the code per your comments, please see my replies to some 
of them below.

Cheers.

--
Greg


  configure |  33 ++
  fsdev/qemu-fsdev.c|   1 +
  fsdev/qemu-fsdev.h|   3 +-
  hw/9pfs/9p-cephfs.c   | 836 ++
  hw/9pfs/Makefile.objs |   3 +
  scripts/analyse-9p-simpletrace.py | 213 --
  scripts/analyze-9p-simpletrace.py | 306 ++

Hmm... even though 'analyze' is the preferred spelling in the QEMU code,
'analyse' is correct. I don't think it is useful to rename a script
that's been around since 2011... And even if it was, I'd prefer it to
be done in a separate patch, because it just makes review more difficult
here...

I'll kick this change out in this patch.

  trace-events  |  33 ++
  8 files changed, 1214 insertions(+), 214 deletions(-)
  create mode 100644 hw/9pfs/9p-cephfs.c
  delete mode 100755 scripts/analyse-9p-simpletrace.py
  create mode 100755 scripts/analyze-9p-simpletrace.py

diff --git a/configure b/configure
index 0c0472a..c48f1af 100755
--- a/configure
+++ b/configure
@@ -275,6 +275,7 @@ trace_backends="log"
  trace_file="trace"
  spice=""
  rbd=""
+cephfs=""
  smartcard=""
  libusb=""
  usb_redir=""
@@ -1019,6 +1020,10 @@ for opt do
;;
--enable-rbd) rbd="yes"
;;
+  --disable-cephfs) cephfs="no"
+  ;;
+  --enable-cephfs) cephfs="yes"
+  ;;
--disable-xfsctl) xfs="no"
;;
--enable-xfsctl) xfs="yes"
@@ -1345,6 +1350,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
vhost-net   vhost-net acceleration support
spice   spice
rbd rados block device (rbd)
+  cephfs  Ceph File System
libiscsiiscsi support
libnfs  nfs support
smartcard   smartcard support (libcacard)
@@ -3087,6 +3093,28 @@ EOF
  fi

  ##
+# cephfs probe
+if test "$cephfs" != "no" ; then
+  cat > $TMPC <
+#include 
+int main(void) {
+struct ceph_mount_info *cmount;
+ceph_create(, NULL);
+return 0;
+}
+EOF
+  cephfs_libs="-lcephfs"
+  if compile_prog "" "$cephfs_libs" ; then
+cephfs=yes
+  else
+if test "$cephfs" = "yes" ; then
+  feature_not_found "cephfs" "Install libcephfs/ceph devel"
+fi
+cephfs=no
+  fi
+fi
+##
  # libssh2 probe
  min_libssh2_version=1.2.8
  if test "$libssh2" != "no" ; then
@@ -4760,6 +4788,7 @@ else
  echo "spice support $spice"
  fi
  echo "rbd support   $rbd"
+echo "cephfs support$cephfs"
  echo "xfsctl support$xfs"
  echo "smartcard support $smartcard"
  echo "libusb$libusb"
@@ -5224,6 +5253,10 @@ if test "$rbd" = "yes" ; then
echo "RBD_CFLAGS=$rbd_cflags" >> $config_host_mak
echo "RBD_LIBS=$rbd_libs" >> $config_host_mak
  fi
+if test "$cephfs" = "yes" ; then
+  echo "CONFIG_CEPHFS=m" >> $config_host_mak
+  echo "CEPHFS_LIBS=$cephfs_libs" >> $config_host_mak
+fi

  echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
  if test "$coroutine_pool" = "yes" ; then
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index bf7f0b0..7f07a2a 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -27,6 +27,7 @@ static FsDriverTable FsDrivers[] = {