Re: [Qemu-devel] [PATCH] PPC/KVM: early validation of vcpu id

2016-04-26 Thread David Gibson
On Tue, Apr 26, 2016 at 03:41:04PM +0200, Greg Kurz wrote:
> The KVM API restricts vcpu ids to be < KVM_CAP_MAX_VCPUS. On PowerPC
> targets, depending on the number of threads per core in the host and
> in the guest, some topologies do generate higher vcpu ids actually.
> When this happens, QEMU bails out with the following error:
> 
> kvm_init_vcpu failed: Invalid argument
> 
> The KVM_CREATE_VCPU ioctl has several EINVAL return paths, so it is
> not possible to fully disambiguate.
> 
> This patch adds a check in the code that computes vcpu ids, so that
> we can detect the error earlier, and print a friendlier message instead
> of calling KVM_CREATE_VCPU with an obviously bogus vcpu id.
> 
> Signed-off-by: Greg Kurz 

Applied to ppc-for-2.7, thanks.

I'm still kicking myself for doing the SMT / cpu ids that way way back
when.  Should have just had a "SET_SMT" ioctl() and allocated the cpu
ids sequentially.  Too clever by half :(

> ---
>  include/sysemu/kvm.h|2 ++
>  kvm-all.c   |6 ++
>  target-ppc/translate_init.c |8 
>  3 files changed, 16 insertions(+)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 0e18f15c9493..27bf50ef379e 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -344,6 +344,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s);
>  
>  int kvm_arch_init_vcpu(CPUState *cpu);
>  
> +bool kvm_vcpu_id_is_valid(int vcpu_id);
> +
>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>  
> diff --git a/kvm-all.c b/kvm-all.c
> index e7b66df197fd..3625a3e07457 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1459,6 +1459,12 @@ static int kvm_max_vcpus(KVMState *s)
>  return (ret) ? ret : kvm_recommended_vcpus(s);
>  }
>  
> +bool kvm_vcpu_id_is_valid(int vcpu_id)
> +{
> +KVMState *s = KVM_STATE(current_machine->accelerator);
> +return vcpu_id >= 0 && vcpu_id < kvm_max_vcpus(s);
> +}
> +
>  static int kvm_init(MachineState *ms)
>  {
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index f51572552bc2..6c89b18a05f9 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -9247,6 +9247,14 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  #if !defined(CONFIG_USER_ONLY)
>  cpu->cpu_dt_id = (cs->cpu_index / smp_threads) * max_smt
>  + (cs->cpu_index % smp_threads);
> +
> +if (kvm_enabled() && !kvm_vcpu_id_is_valid(cpu->cpu_dt_id)) {
> +error_setg(errp, "Can't create CPU with id %d in KVM", 
> cpu->cpu_dt_id);
> +error_append_hint(errp, "Adjust the number of cpus to %d "
> +  "or try to raise the number of threads per core\n",
> +  cpu->cpu_dt_id * smp_threads / max_smt);
> +return;
> +}
>  #endif
>  
>  if (tcg_enabled()) {
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Wei, Jiangang
On Wed, 2016-04-27 at 06:44 +0300, Max Filippov wrote:
> Hi Wei,
> 
> On Wed, Apr 27, 2016 at 03:27:47AM +, Wei, Jiangang wrote:
> > On Wed, 2016-04-27 at 10:07 +0800, Zhou Jie wrote:
> > >  static void open_eth_start_xmit(OpenEthState *s, desc *tx)
> > >  {
> > > -uint8_t buf[65536];
> > > +uint8_t *buf = NULL;
> > > +uint8_t buffer[0x600];
> > Hi,
> > 
> > I'm curious about 0x600.
> > How do you determine this size?
> > IMO, Max's suggestion looks more reasonable.
> > (1536 bytes, maximal frame length when HUGEN bit is not set in MODER)
> 
> This is the same value. Opencores 10/100 ethernet spec uses both
> decimal and hexadecimal notation.
I got it
Thanks for your reply.

Wei





Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Max Filippov
Hi Wei,

On Wed, Apr 27, 2016 at 03:27:47AM +, Wei, Jiangang wrote:
> On Wed, 2016-04-27 at 10:07 +0800, Zhou Jie wrote:
> >  static void open_eth_start_xmit(OpenEthState *s, desc *tx)
> >  {
> > -uint8_t buf[65536];
> > +uint8_t *buf = NULL;
> > +uint8_t buffer[0x600];
> Hi,
> 
> I'm curious about 0x600.
> How do you determine this size?
> IMO, Max's suggestion looks more reasonable.
> (1536 bytes, maximal frame length when HUGEN bit is not set in MODER)

This is the same value. Opencores 10/100 ethernet spec uses both
decimal and hexadecimal notation.

-- 
Thanks.
-- Max



Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Wei, Jiangang
On Wed, 2016-04-27 at 10:07 +0800, Zhou Jie wrote:
> open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
> Moving large arrays to heap to reduce stack usage.
> 
> Signed-off-by: Zhou Jie 
> ---
>  hw/net/opencores_eth.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
> index c6094fb..fa0a4e7 100644
> --- a/hw/net/opencores_eth.c
> +++ b/hw/net/opencores_eth.c
> @@ -483,7 +483,8 @@ static NetClientInfo net_open_eth_info = {
>  
>  static void open_eth_start_xmit(OpenEthState *s, desc *tx)
>  {
> -uint8_t buf[65536];
> +uint8_t *buf = NULL;
> +uint8_t buffer[0x600];
Hi,

I'm curious about 0x600.
How do you determine this size?
IMO, Max's suggestion looks more reasonable.
(1536 bytes, maximal frame length when HUGEN bit is not set in MODER)

Regards,
wei
>  unsigned len = GET_FIELD(tx->len_flags, TXD_LEN);
>  unsigned tx_len = len;
>  
> @@ -498,6 +499,11 @@ static void open_eth_start_xmit(OpenEthState *s, desc 
> *tx)
>  
>  trace_open_eth_start_xmit(tx->buf_ptr, len, tx_len);
>  
> +if (tx_len > 0x600) {
> +buf = g_new(uint8_t, tx_len);
> +} else {
> +buf = buffer;
> +}
>  if (len > tx_len) {
>  len = tx_len;
>  }
> @@ -506,6 +512,9 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
>  memset(buf + len, 0, tx_len - len);
>  }
>  qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
> +if (tx_len > 0x600) {
> +g_free(buf);
> +}
>  
>  if (tx->len_flags & TXD_WR) {
>  s->tx_desc = 0;





[Qemu-devel] 'tcg fatal error' with qemu v2.6.0-rc3 (bisected)

2016-04-26 Thread Guenter Roeck
Hi,

when running qemu version 2.6.0-rc3, I get the following error message.

/opt/buildbot/qemu/qemu/tcg/tcg.c:1769: tcg fatal error

qemu command line is as follows.

qemu-system-ppc64 -M mpc8544ds \
-cpu e5500 \
-m 1024 -kernel arch/powerpc/boot/uImage -initrd busybox-ppc.cpio \
-nographic -vga none -monitor null -no-reboot \
--append "rdinit=/sbin/init console=tty console=ttyS0" \
-machine "dt_compatible=fsl,,P5020DS"

Files are from my test suite at https://github.com/groeck/linux-build-test.

Bisect points to commit 40ae5c62ebaaf7d9d3b93b88c2d32bf6342f7889 ("tcg:
Introduce temp_load"). Bisect log is attached.

Guenter

---
# bad: [f419a626c76bcb26697883af702862e8623056f9] usb/uhci: move pid check
# good: [a8c40fa2d667e585382080db36ac44e216b37a1c] Update version for v2.5.0 
release
git bisect start 'HEAD' 'v2.5.0'
# bad: [253785e3b96f48c52568c312cec0a5ec596c527f] scripts/feature_to_c.sh: 
Include qemu/osdep.h rather than config.h
git bisect bad 253785e3b96f48c52568c312cec0a5ec596c527f
# good: [f4109dba216f2df61a6098fdd7a6f2d2be4ac848] scripts/kvm/kvm_stat: Moved 
DebugfsProvider
git bisect good f4109dba216f2df61a6098fdd7a6f2d2be4ac848
# good: [8983b670f62ab5e5e8dd2690bf8304123651bfe5] block: qemu-iotests - add 
test for snapshot, commit, snapshot bug
git bisect good 8983b670f62ab5e5e8dd2690bf8304123651bfe5
# good: [d7bea75d35a44023efc9d481d3a1a2600677b2ef] qapi: Avoid use of misnamed 
DO_UPCAST()
git bisect good d7bea75d35a44023efc9d481d3a1a2600677b2ef
# bad: [c9f19dff101e2c2cf3fa3967eceec2833e845e40] Merge remote-tracking branch 
'remotes/bonzini/tags/for-upstream' into staging
git bisect bad c9f19dff101e2c2cf3fa3967eceec2833e845e40
# bad: [977a82ab56daac83623d730174f47d5a7edd73c9] configure: sanity check the 
glib library that pkg-config finds
git bisect bad 977a82ab56daac83623d730174f47d5a7edd73c9
# good: [a86156401559cb4401cf9ecc704faeab6fc8bb19] qmp: Fix reference-counting 
of qnull on empty output visit
git bisect good a86156401559cb4401cf9ecc704faeab6fc8bb19
# good: [12b9b11a2743002232098afb41810f1c0cb211a0] tcg: Change temp_sync 
argument to TCGTemp
git bisect good 12b9b11a2743002232098afb41810f1c0cb211a0
# bad: [ac1be2ae6b2995b99430c48329eb971b0281acf1] Merge remote-tracking branch 
'remotes/armbru/tags/pull-qapi-2016-02-09' into staging
git bisect bad ac1be2ae6b2995b99430c48329eb971b0281acf1
# good: [423aeaf219890e8a7311dbeef1a925020027c2ea] qapi: Add missing JSON files 
in build dependencies
git bisect good 423aeaf219890e8a7311dbeef1a925020027c2ea
# bad: [40ae5c62ebaaf7d9d3b93b88c2d32bf6342f7889] tcg: Introduce temp_load
git bisect bad 40ae5c62ebaaf7d9d3b93b88c2d32bf6342f7889
# good: [b13eb728d33deaa53efc0dcef557da998e6ec40e] tcg: Change temp_save 
argument to TCGTemp
git bisect good b13eb728d33deaa53efc0dcef557da998e6ec40e
# first bad commit: [40ae5c62ebaaf7d9d3b93b88c2d32bf6342f7889] tcg: Introduce 
temp_load



Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Zhou Jie



On 2016/4/27 10:46, Max Filippov wrote:

Hi Zhou,

On Wed, Apr 27, 2016 at 10:18:56AM +0800, Zhou Jie wrote:

When I committed another patch which named as
"hw/net/virtio-net: Allocating Large sized arrays to heap" .

Christian Borntraeger said that 16k is usually perfectly fine
for a userspace stack and doing allocations in a hot path
might actually hurt performance.

Although the size is 65536 bytes here,
I think open_eth_start_xmit is in a hot path.
So, it is OK, if you think that this patch should not be applied.


With Linux as guest OS we shouldn't see any allocations
as it doesn't send huge packets, so I think this patch is fine.
I can take it through the xtensa tree if you don't have other
plan.


OK, Thanks

Sincerely,
Zhou Jie






Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Max Filippov
Hi Zhou,

On Wed, Apr 27, 2016 at 10:18:56AM +0800, Zhou Jie wrote:
>When I committed another patch which named as
>"hw/net/virtio-net: Allocating Large sized arrays to heap" .
> 
>Christian Borntraeger said that 16k is usually perfectly fine
>for a userspace stack and doing allocations in a hot path
>might actually hurt performance.
> 
>Although the size is 65536 bytes here,
>I think open_eth_start_xmit is in a hot path.
>So, it is OK, if you think that this patch should not be applied.

With Linux as guest OS we shouldn't see any allocations
as it doesn't send huge packets, so I think this patch is fine.
I can take it through the xtensa tree if you don't have other
plan.

-- 
Thanks.
-- Max



Re: [Qemu-devel] [PATCH v3 11/11] translate-all: add tb hash bucket info to 'info jit' dump

2016-04-26 Thread Emilio G. Cota
On Sun, Apr 24, 2016 at 18:06:51 -0400, Emilio G. Cota wrote:
> On Sun, Apr 24, 2016 at 12:46:08 -0700, Richard Henderson wrote:
> > On 04/22/2016 04:57 PM, Emilio G. Cota wrote:
> > >On Fri, Apr 22, 2016 at 12:59:52 -0700, Richard Henderson wrote:
> > >>FWIW, so that I could get an idea of how the stats change as we improve 
> > >>the
> > >>hashing, I inserted the attachment 1 patch between patches 5 and 6, and 
> > >>with
> > >>attachment 2 attempting to fix the accounting for patches 9 and 10.
> > >
> > >For qht, I dislike the approach of reporting "avg chain" per-element,
> > >instead of per-bucket. Performance for a bucket whose entries are
> > >all valid is virtually the same as that of a bucket that only
> > >has one valid element; thus, with per-bucket reporting, we'd say that
> > >the chain lenght is 1 in both cases, i.e. "perfect". With per-element
> > >reporting, we'd report 4 (on a 64-bit host, since that's the value of
> > >QHT_BUCKET_ENTRIES) when the bucket is full, which IMO gives the
> > >wrong idea (users would think they're in trouble, when they're not).
> > 
> > But otherwise you have no way of knowing how full the buckets are.  The
> > bucket size is just something that you have to keep in mind.
> 
> I'll make some changes in v4 that I think will address both your and
> my concerns:
> - Report the number of empty buckets
> - Do not count empty buckets when reporting avg bucket chain length
> - Report average bucket occupancy (in %, so that QHT_BUCKET_ENTRIES
>   does not have to be reported.)

How does the following look?

Example with good hashing, i.e. func5(phys_pc, pc, flags):
TB count704242/1342156
[...]
TB hash buckets 386484/524288 (73.72% used)
TB hash occupancy   32.57% avg chain occupancy. Histogram: 
0-10%▆▁█▁▁▅▁▃▁▁90-100%
TB hash avg chain   1.02 buckets. Histogram: 1█▁▁3

Example with bad hashing, i.e. func5(phys_pc, 0, 0):
TB count710748/1342156
[...]
TB hash buckets 113569/524288 (21.66% used)
TB hash occupancy   10.24% avg chain occupancy. Histogram: 
0-10%█▁90-100%
TB hash avg chain   2.11 buckets. Histogram: 1▇▁93

Note that:

- "TB hash avg chain" does _not_ count empty buckets. This gives
  an idea of how many buckets a typical hit goes through.

- "TB hash occupancy" _does_ count empty buckets. It is called
  "avg chain occupancy" and not "avg occupancy" because the
  counts are only valid per-chain due to the seqlock protecting
  each chain.

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Max Filippov
On Wed, Apr 27, 2016 at 10:07:48AM +0800, Zhou Jie wrote:
> open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
> Moving large arrays to heap to reduce stack usage.
> 
> Signed-off-by: Zhou Jie 
> ---
>  hw/net/opencores_eth.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Acked-by: Max Filippov 

-- 
Thanks.
-- Max



Re: [Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Zhou Jie

Hi Max,

   When I committed another patch which named as
   "hw/net/virtio-net: Allocating Large sized arrays to heap" .

   Christian Borntraeger said that 16k is usually perfectly fine
   for a userspace stack and doing allocations in a hot path
   might actually hurt performance.

   Although the size is 65536 bytes here,
   I think open_eth_start_xmit is in a hot path.
   So, it is OK, if you think that this patch should not be applied.

Sincerely,
Zhou Jie

On 2016/4/27 10:07, Zhou Jie wrote:

open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.

Signed-off-by: Zhou Jie 
---
  hw/net/opencores_eth.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
index c6094fb..fa0a4e7 100644
--- a/hw/net/opencores_eth.c
+++ b/hw/net/opencores_eth.c
@@ -483,7 +483,8 @@ static NetClientInfo net_open_eth_info = {

  static void open_eth_start_xmit(OpenEthState *s, desc *tx)
  {
-uint8_t buf[65536];
+uint8_t *buf = NULL;
+uint8_t buffer[0x600];
  unsigned len = GET_FIELD(tx->len_flags, TXD_LEN);
  unsigned tx_len = len;

@@ -498,6 +499,11 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)

  trace_open_eth_start_xmit(tx->buf_ptr, len, tx_len);

+if (tx_len > 0x600) {
+buf = g_new(uint8_t, tx_len);
+} else {
+buf = buffer;
+}
  if (len > tx_len) {
  len = tx_len;
  }
@@ -506,6 +512,9 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
  memset(buf + len, 0, tx_len - len);
  }
  qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
+if (tx_len > 0x600) {
+g_free(buf);
+}

  if (tx->len_flags & TXD_WR) {
  s->tx_desc = 0;



--

周潔
Dept 1
No. 6 Wenzhu Road,
Nanjing, 210012, China
TEL:+86+25-86630566-8557
FUJITSU INTERNAL:7998-8557
E-Mail:zhoujie2...@cn.fujitsu.com






[Qemu-devel] [PATCH v2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Zhou Jie
open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.

Signed-off-by: Zhou Jie 
---
 hw/net/opencores_eth.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
index c6094fb..fa0a4e7 100644
--- a/hw/net/opencores_eth.c
+++ b/hw/net/opencores_eth.c
@@ -483,7 +483,8 @@ static NetClientInfo net_open_eth_info = {
 
 static void open_eth_start_xmit(OpenEthState *s, desc *tx)
 {
-uint8_t buf[65536];
+uint8_t *buf = NULL;
+uint8_t buffer[0x600];
 unsigned len = GET_FIELD(tx->len_flags, TXD_LEN);
 unsigned tx_len = len;
 
@@ -498,6 +499,11 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
 
 trace_open_eth_start_xmit(tx->buf_ptr, len, tx_len);
 
+if (tx_len > 0x600) {
+buf = g_new(uint8_t, tx_len);
+} else {
+buf = buffer;
+}
 if (len > tx_len) {
 len = tx_len;
 }
@@ -506,6 +512,9 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
 memset(buf + len, 0, tx_len - len);
 }
 qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
+if (tx_len > 0x600) {
+g_free(buf);
+}
 
 if (tx->len_flags & TXD_WR) {
 s->tx_desc = 0;
-- 
2.5.5






Re: [Qemu-devel] [PATCH v2] vl: change runstate only if new state is different from current state

2016-04-26 Thread Li Zhijian

ping...


Thanks
Li Zhijian

On 04/14/2016 11:25 AM, Li Zhijian wrote:

Previously, qemu will abort at following scenario:
(qemu) stop
(qemu) system_reset
(qemu) system_reset
(qemu) 2016-04-13T20:54:38.979158Z qemu-system-x86_64: invalid runstate 
transition: 'prelaunch' -> 'prelaunch'

Signed-off-by: Li Zhijian 
Acked-by: Paolo Bonzini 
---
from v1
- fix patch title typo 'chanage' -> 'change'
- coding sytle: 'return ;' -> 'return;'
- add Acked tag
  vl.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/vl.c b/vl.c
index 9df534f..039a353 100644
--- a/vl.c
+++ b/vl.c
@@ -692,6 +692,10 @@ void runstate_set(RunState new_state)
  {
  assert(new_state < RUN_STATE__MAX);

+if (current_run_state == new_state) {
+return;
+}
+
  if (!runstate_valid_transitions[current_run_state][new_state]) {
  error_report("invalid runstate transition: '%s' -> '%s'",
   RunState_lookup[current_run_state],







Re: [Qemu-devel] [PATCH v14 13/19] qmp: Tighten output visitor rules

2016-04-26 Thread Eric Blake
On 04/15/2016 03:02 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Add a new qmp_output_visitor_reset(), which must be called before
>> reusing an exising QmpOutputVisitor on a new root object.  Tighten
>> assertions to require that qmp_output_get_qobject() can only be
>> called after pairing a visit_end_* for every visit_start_* (rather
>> than allowing it to return a partially built object), and that it
>> must not be called unless at least one visit_type_* or visit_start/
>> visit_end pair has occurred since creation/reset (the accidental
>> return of NULL fixed by commit ab8bf1d7 would have been much
>> easier to diagnose).
>>
>> Also, check that we are encountering the expected object or list
>> type (both during visit_end*, and also by validating whether 'name'
>> was NULL - although the latter may need to change later if we
>> improve error messages by always passing in a sensible name).
>> This provides protection against mismatched push(struct)/pop(list)
>> or push(list)/pop(struct), similar to the qmp-input protection
>> added in commit bdd8e6b5.
>>
>> Signed-off-by: Eric Blake 
> 
> As written, the commit message makes me wonder why we add
> qmp_output_visitor_reset() in the same commit.  I think the reason is
> the tightened rules make it necessary.  The commit message could make
> that clearer by explaining the rule changes first, then point out we
> need a reset to comply with the rules.

I think I'll try splitting the addition of qmp_output_visitor_reset()
into a separate patch.

>> @@ -93,6 +92,9 @@ static void qmp_output_add_obj(QmpOutputVisitor *qov, 
>> const char *name,
>>  qdict_put_obj(qobject_to_qdict(cur), name, value);
>>  break;
>>  case QTYPE_QLIST:
>> +/* FIXME: assertion needs adjustment if we fix visit-core
>> + * to pass "name.0" style name during lists.  */
> 
> visit-core merely passes through whatever name it gets from the client.
> Thus, saying we 'fix visit-core to pass "name.0"' is a bit misleading.
> What we'd do is change it to require "name.0", then update its users to
> comply.

Maybe it's not too inaccurate - the only callers are the generated
visit_type_FOOList() functions, but having a common "name.%d" generator
in the core may be easier than bloating each generated visit_type_FOOList.

> 
> Moreover, this is a note, not a FIXME: nothing is broken here.  The
> closest we get to "broken" are the bad error messages, but they're
> elsewhere.
> 
> I'd simply drop the comment.

But this solution nicely sidesteps the "how will we fix error messages",
so I've dropped the comment.

> 
>> +assert(!name);
> 
> PATCH 08 made this part of the contract.  It also added a bunch of
> contract-checking assertions.  Should this one be in PATCH 08, too?

Well, it's only weakly part of the contract unless (until?) we fix
callers/core to pass in "name.0", and then the assert would trigger.
However, checking the assertion in patch 8 is harder, without making the
core track whether it is currently in a list or struct visit (that is,
the only place where we know whether 'name' should be NULL or not is
where we've tracked a stack of our current visit_start_* calls; but the
core is not tracking a stack because that would be redundant with the
stacks in the qmp visitors).  So for now I think I'll just keep it here.


>> +++ b/tests/test-qmp-output-visitor.c
>> @@ -139,6 +139,7 @@ static void test_visitor_out_enum(TestOutputVisitorData 
>> *data,

>> @@ -455,6 +460,7 @@ static void 
>> test_visitor_out_alternate(TestOutputVisitorData *data,
>>  qapi_free_UserDefAlternate(tmp);
>>  qobject_decref(arg);
>>
>> +qmp_output_visitor_reset(data->qov);
>>  tmp = g_new0(UserDefAlternate, 1);
>>  tmp->type = QTYPE_QDICT;
>>  tmp->u.udfu.integer = 1;
> 
> How did you find the places that now need qmp_output_visitor_reset()?

Ran the test, found what asserted, and added a reset() to make the test
pass again.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-04-26 Thread Zhou Jie


On 2016/4/26 12:12, Max Filippov wrote:

Hi Zhou,

On Tue, Apr 26, 2016 at 6:35 AM, Zhou Jie  wrote:

open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.


It's an exception, not the rule when full 65536 byte long buffer might be
needed. Can we do a little better change and not allocate and free this
buffer every time unconditionally, but instead make buf smaller (1536
bytes, maximal frame length when HUGEN bit is not set in MODER)
and only do allocation when that's not enough?


Thank you for your suggestion.
I will modify this patch.

Sincerely,
Zhou Jie

--

周潔
Dept 1
No. 6 Wenzhu Road,
Nanjing, 210012, China
TEL:+86+25-86630566-8557
FUJITSU INTERNAL:7998-8557
E-Mail:zhoujie2...@cn.fujitsu.com






Re: [Qemu-devel] [PATCH] hw/net/virtio-net: Allocating Large sized arrays to heap

2016-04-26 Thread Zhou Jie

On 2016/4/26 20:42, Michael S. Tsirkin wrote:

On Tue, Apr 26, 2016 at 04:05:24PM +0800, Zhou Jie wrote:

virtio_net_flush_tx has a huge stack usage of 16392 bytes approx.
Moving large arrays to heap to reduce stack usage.

Signed-off-by: Zhou Jie 


I don't think it's appropriate for trivial.
Also - what's the point, exactly?

I think functions should not have very large stack frames.
For 64bit machine it will be 32k.
But according what Christian Borntraeger said, it doesn't matter.
So, considerate that virtio_net_flush_tx is in a hot path,
I agree to not change this.

Sincerely,
Zhou Jie




---
  hw/net/virtio-net.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..cab7bbc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1213,6 +1213,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  VirtIONet *n = q->n;
  VirtIODevice *vdev = VIRTIO_DEVICE(n);
  VirtQueueElement *elem;
+struct iovec *sg = NULL, *sg2 = NULL;
  int32_t num_packets = 0;
  int queue_index = vq2q(virtio_get_queue_index(q->tx_vq));
  if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
@@ -1224,10 +1225,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  return num_packets;
  }

+sg = g_new(struct iovec, VIRTQUEUE_MAX_SIZE);
+sg2 = g_new(struct iovec, VIRTQUEUE_MAX_SIZE + 1);
  for (;;) {
  ssize_t ret;
  unsigned int out_num;
-struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
+struct iovec *out_sg;
  struct virtio_net_hdr_mrg_rxbuf mhdr;

  elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
@@ -1252,7 +1255,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  virtio_net_hdr_swap(vdev, (void *) );
  sg2[0].iov_base = 
  sg2[0].iov_len = n->guest_hdr_len;
-out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1,
+out_num = iov_copy([1], VIRTQUEUE_MAX_SIZE,
 out_sg, out_num,
 n->guest_hdr_len, -1);
  if (out_num == VIRTQUEUE_MAX_SIZE) {
@@ -1269,10 +1272,10 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
   */
  assert(n->host_hdr_len <= n->guest_hdr_len);
  if (n->host_hdr_len != n->guest_hdr_len) {
-unsigned sg_num = iov_copy(sg, ARRAY_SIZE(sg),
+unsigned sg_num = iov_copy(sg, VIRTQUEUE_MAX_SIZE,
 out_sg, out_num,
 0, n->host_hdr_len);
-sg_num += iov_copy(sg + sg_num, ARRAY_SIZE(sg) - sg_num,
+sg_num += iov_copy(sg + sg_num, VIRTQUEUE_MAX_SIZE - sg_num,
   out_sg, out_num,
   n->guest_hdr_len, -1);
  out_num = sg_num;
@@ -1284,6 +1287,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  if (ret == 0) {
  virtio_queue_set_notification(q->tx_vq, 0);
  q->async_tx.elem = elem;
+g_free(sg);
+g_free(sg2);
  return -EBUSY;
  }

@@ -1296,6 +1301,8 @@ drop:
  break;
  }
  }
+g_free(sg);
+g_free(sg2);
  return num_packets;
  }

--
2.5.5





.



--

周潔
Dept 1
No. 6 Wenzhu Road,
Nanjing, 210012, China
TEL:+86+25-86630566-8557
FUJITSU INTERNAL:7998-8557
E-Mail:zhoujie2...@cn.fujitsu.com






Re: [Qemu-devel] [PATCH 02/19] block: Add BDS.backing_overridden

2016-04-26 Thread Eric Blake
On 04/26/2016 03:32 PM, Max Reitz wrote:
> If the backing file is overridden, this most probably does change the
> guest-visible data of a BDS. Therefore, we will need to consider this in
> bdrv_refresh_filename().
> 
> Adding a new field to the BDS is not nice, but it is very simple and
> exactly keeps track of whether the backing file has been overridden.
> 
> Signed-off-by: Max Reitz 
> ---
>  block.c   | 2 ++
>  include/block/block_int.h | 1 +
>  2 files changed, 3 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 01/19] block: Use children list in bdrv_refresh_filename

2016-04-26 Thread Eric Blake
On 04/26/2016 03:32 PM, Max Reitz wrote:
> bdrv_refresh_filename() should invoke itself recursively on all
> children, not just on file.
> 
> With that change, we can remove the manual invocations in blkverify and
> quorum.
> 
> Signed-off-by: Max Reitz 
> ---
>  block.c   | 9 +
>  block/blkverify.c | 3 ---
>  block/quorum.c| 1 -
>  3 files changed, 5 insertions(+), 8 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.7 v2 07/17] rbd: Implement image locking

2016-04-26 Thread Fam Zheng
On Tue, 04/26 10:42, Jason Dillaman wrote:
> On Sun, Apr 24, 2016 at 7:42 PM, Fam Zheng  wrote:
> > On Fri, 04/22 21:57, Jason Dillaman wrote:
> >> Since this cannot automatically recover from a crashed QEMU client with an
> >> RBD image, perhaps this RBD locking should not default to enabled.
> >> Additionally, this will conflict with the "exclusive-lock" feature
> >> available since the Ceph Hammer-release since both utilize the same locking
> >> construct.
> >>
> >> As a quick background, the optional exclusive-lock feature can be
> >> enabled/disabled on image and safely/automatically handles the case of
> >> recovery from a crashed client.  Under normal conditions, the RBD
> >> exclusive-lock feature automatically acquires the lock upon the first
> >> attempt to write to the image and transparently transitions ownership of
> >> the lock between two or more clients -- used for QEMU live-migration.
> >
> > Is it enabled by default?
> >
> 
> Starting with the Jewel release of Ceph it is enabled by default.

OK, then I'll leave rbd in this QEMU series for now.

> 
> >>
> >> I'd be happy to add a new librbd API method to explicitly expose acquiring
> >> and releasing the RBD exclusive lock since it certainly solves a couple
> >> compromises in our current QEMU integration.
> >
> > Does the API do enable/disable or acquire/relase? Could you explain the
> > differences between it and rbd_lock_exclusive?
> 
> There is already an API for enabling/disabling the exclusive-lock
> feature (and associated CLI tooling).  This would be a new API method
> to explicitly acquire / release the exclusive-lock (instead of
> implicitly acquiring the lock upon first write request as it currently
> does in order to support QEMU live-migration).
> 
> The rbd_lock_exclusive/rbd_lock_shared are essentially advisory locks.
> Nothing stops a client from accessing the image if it doesn't attempt
> to acquire the lock (even if another client already has the image
> locked exclusively).  It also doesn't support automatic recovery from
> failed clients.

I see, thanks for the explanation!

Fam



Re: [Qemu-devel] Updating documentation at http://wiki.qemu.org/download/qemu-doc.html

2016-04-26 Thread Mark Cave-Ayland
On 26/04/16 13:37, Jeff Cody wrote:

> On Tue, Apr 26, 2016 at 11:02:45AM +0100, Stefan Hajnoczi wrote:
>> On Tue, Apr 26, 2016 at 07:50:31AM +0100, Mark Cave-Ayland wrote:
>>> Does anyone know if it's possible to update the documentation at the
>>> above URL? It appears as a link at http://wiki.qemu-project.org/Manual
>>> and seems to be the first hit for most people looking for information on
>>> QEMU SPARC emulation, which is frustrating as the information there is
>>> well out of date. Presumably this is because the documentation is hosted
>>> on the qemu.org domain itself and so ranks higher?
>>
>> Hi Mark,
>> I've CCed Jeff Cody, who is taking over as system administrator for
>> QEMU.  The http://wiki.qemu.org/download/qemu-doc.html link is just a
>> static page from 2010.
>>
>> It should be possible to rebuild the HTML docs and replace the file.
>> Even better would be to update it from a cron job.
>>
>> Stefan
> 
> Thanks.
> 
> I've update the qemu-doc.html at the above link with the latest output of
> "make qemu-doc.html" from git master.  I'll set up a daily cronjob to take 
> care
> of that automatically, as well.
> 
> Jeff

Hi Jeff,

Thanks for getting all this set up - the new documentation update looks
good here. Shall I also update the link on
http://wiki.qemu-project.org/Manual to reflect the fact that the updates
are now generated regularly? "Older version of the above" doesn't quite
seem appropriate any more...


ATB,

Mark.




Re: [Qemu-devel] [PATCH v14 08/19] qapi: Document visitor interfaces, add assertions

2016-04-26 Thread Eric Blake
On 04/14/2016 09:22 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> The visitor interface for mapping between QObject/QemuOpts/string
>> and QAPI is scandalously under-documented, making changes to visitor
>> core, individual visitors, and users of visitors difficult to
>> coordinate.  Among other questions: when is it safe to pass NULL,
>> vs. when a string must be provided; which visitors implement which
>> callbacks; the difference between concrete and virtual visits.
>>
>> Correct this by retrofitting proper contracts, and document where some
>> of the interface warts remain (for example, we may want to modify
>> visit_end_* to require the same 'obj' as the visit_start counterpart,
>> so the dealloc visitor can be simplified).  Later patches in this
>> series will tackle some, but not all, of these warts.
>>
>> Add assertions to (partially) enforce the contract.  Some of these
>> were only made possible by recent cleanup commits.
>>
>> +/*
>> + * The QAPI schema defines both a set of C data types, and a QMP wire
>> + * format.  A QAPI object is formed as a directed acyclic graph of
>> + * QAPI values.
> 
> I understand what you're trying to say, but I find the value / object
> dichotomy odd.  For me, A QAPI object isn't a DAG, it's a node in a DAG.
> Perhaps: "QAPI objects can contain references to other QAPI objects,
> resulting in a directed acyclic graph."

Thanks for a lot of good comments. I'm replying only to the questions
you left amidst all the good review.


>> +++ b/qapi/qapi-visit-core.c
>> @@ -23,6 +23,10 @@
>>  void visit_start_struct(Visitor *v, const char *name, void **obj,
>>  size_t size, Error **errp)
>>  {
>> +if (obj) {
>> +assert(size);
> 
> Yes, because the generator puts a dummy member into empty structs.
> 
>> +assert(v->type != VISITOR_OUTPUT || *obj);
> 
> Can you point me to the spot in the contract that requires this?

Translation of the assert: If you are using an output visitor, and not
doing a virtual walk (obj is non-NULL), then the object must be
completely built (*obj is non-NULL).  For an input visitor, *obj is NULL
on entry (we're allocating it, after all); for the dealloc visitor, *obj
may or may not be NULL (since we handle cleanup of partial allocation).

In the text, "output visitors (QMP and string) take a completed QAPI
graph", but maybe I can further clarify that a completed object means
that *obj is non-NULL and all 'has_member' and 'member' members are
complete.

> 
> Unlike last time, my remarks are pretty much only about how to say
> things, not about what to say.  Progress!

Yay!  Hopefully I'll post v15 soon and we can get this in at the start
of the 2.7 cycle.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-arm] [PATCH] target-arm: Fix descriptor address masking in ARM address translation

2016-04-26 Thread Sergey Sorokin
26.04.2016, 19:35, "Tom Hanson" :On 03/21/2016 09:56 AM, Sergey Sorokin wrote: 17.03.2016, 18:24, "Peter Maydell" :   On 17 March 2016 at 15:21, Sergey Sorokin  wrote:17.03.2016, 14:40, "Peter Maydell" :On 13 March 2016 at 18:28, Sergey Sorokin  wrote:If you want to implement the AddressSize checks that's fine,but otherwise please leave this bit of the code alone. You said me that my code is not correct, I have proved that it conforms to the documentation. It's a bit obfuscating when the doc explicitly says to take bits up to 39 from the descriptor, but in QEMU we take bits up to 47 relying on the check in another part of the code, even if both ways are correct.The way the code in QEMU is structured is that we extract thedescriptor field in one go and then will operate on it(checking for need to AddressSize fault, etc) as a secondaction. The field descriptors themselves are the sizes I said.Well, may be it's enough just to change this comment as you intend:- /* The address field in the descriptor goes up to bit 39 for ARMv7- * but up to bit 47 for ARMv8.+ /* The address field in the descriptor goes up to bit 39 for AArch32+ * but up to bit 47 for AArch64.  */   The comment is correct as it stands.   thanks   -- PMM I mean in the patch. We need to fix lower bits in descaddrmask anyway. So: I could describe in the comment, that the descriptor field is up to bit 47 for ARMv8 (as long as you want it), but we use the descaddrmask up to bit 39 for AArch32, because we don't need other bits in that case to construct next descriptor address. It is clearly described in the ARM pseudo-code. Why should we keep in the mask bits from 40 up to 47 if we don't need them? Even if they are all zeroes. It is a bit obfuscating, as I said.  I agree with Peter. The original comment is correct.Looking at the TLBRecord AArch32.TranslationTableWalkLD pseudocode, it is treating the AArch32 address as 48 bits long. For example: if !IsZero(baseregister<47:40>) then level = 0; result.addrdesc.fault = AArch32.AddressSizeFault(ipaddress, domain, level, acctype, iswrite,  secondstage, s2fs1walk); return result;This requires that an AArch32 address have specific values up through bit 47. There is a newer version of the patch. I'm sorry, I forgot to report here about it.

[Qemu-devel] [PATCH 19/19] iotests: Add quorum case to test 110

2016-04-26 Thread Max Reitz
Test 110 tests relative backing filenames for complex BDS trees. Add
quorum as an example that can never work automatically (without
special-casing if all child nodes have the same base directory), and an
example on how to make it work manually (using the base-directory
option).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/110 | 48 ++
 tests/qemu-iotests/110.out |  9 +
 2 files changed, 57 insertions(+)

diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index ba1b3c6..f1b7b08 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -30,6 +30,7 @@ status=1  # failure is the default!
 _cleanup()
 {
_cleanup_test_img
+rm -f "$TEST_IMG.copy"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -87,6 +88,53 @@ echo
 # omit the image size; it should work anyway
 _make_test_img -b "$TEST_IMG_REL.base"
 
+echo
+echo '=== Nodes without a common directory ==='
+echo
+
+cp "$TEST_IMG" "$TEST_IMG.copy"
+
+# Should not work
+TEST_IMG="json:{
+'driver': '$IMGFMT',
+'file': {
+'driver': 'quorum',
+'vote-threshold': 1,
+'children': [
+{
+'driver': 'file',
+'filename': '$TEST_IMG'
+},
+{
+'driver': 'file',
+'filename': '$TEST_IMG.copy'
+}
+]
+}
+}" _img_info | _filter_img_info
+
+echo
+
+# Should work
+TEST_IMG="json:{
+'driver': '$IMGFMT',
+'file': {
+'driver': 'quorum',
+'base-directory': '$TEST_DIR/',
+'vote-threshold': 1,
+'children': [
+{
+'driver': 'file',
+'filename': '$TEST_IMG'
+},
+{
+'driver': 'file',
+'filename': '$TEST_IMG.copy'
+}
+]
+}
+}" _img_info | _filter_img_info
+
 
 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out
index 5370bc1..e586538 100644
--- a/tests/qemu-iotests/110.out
+++ b/tests/qemu-iotests/110.out
@@ -19,4 +19,13 @@ backing file: t.IMGFMT.base (actual path: 
TEST_DIR/t.IMGFMT.base)
 === Backing name is always relative to the backed image ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
backing_file=t.IMGFMT.base
+
+=== Nodes without a common directory ===
+
+qemu-img: Cannot generate a base directory for quorum nodes
+
+image: json:{"driver": "IMGFMT", "file": {"children": [{"driver": "file", 
"filename": "TEST_DIR/t.IMGFMT"}, {"driver": "file", "filename": 
"TEST_DIR/t.IMGFMT.copy"}], "driver": "quorum", "blkverify": false, 
"rewrite-corrupted": false, "vote-threshold": 1}}
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base)
 *** done
-- 
2.8.0




[Qemu-devel] [PATCH 17/19] block: Use bdrv_dirname() for relative filenames

2016-04-26 Thread Max Reitz
bdrv_get_full_backing_filename_from_filename() breaks down when it comes
to JSON filenames. Using bdrv_dirname() as the basis is better because
since we have BDS, we can descend through the BDS tree to the protocol
layer, which gives us a greater probability of finding a non-JSON name;
also, bdrv_dirname() is more correct as it allows block drivers to
override the generation of that directory name in a protocol-specific
way.

We still need to keep bdrv_get_full_backing_filename_from_filename(),
though, because it has valid callers which need it during image creation
when no BDS is available yet.

This makes a test case in qemu-iotest 110, which was supposed to fail,
work. That is actually good, but we need to change the reference output
(and the comment in 110) accordingly.

Signed-off-by: Max Reitz 
---
 block.c| 20 +++-
 tests/qemu-iotests/110 |  3 ++-
 tests/qemu-iotests/110.out |  2 +-
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 23962ce..a909b7a 100644
--- a/block.c
+++ b/block.c
@@ -212,12 +212,22 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
  const char *filename, Error **errp)
 {
-char *bs_filename = relative_to->exact_filename[0]
-? relative_to->exact_filename
-: relative_to->filename;
+char *dir, *full_name;
 
-return bdrv_get_full_backing_filename_from_filename(bs_filename, filename,
-errp);
+if (filename[0] == '\0' || path_has_protocol(filename) ||
+path_is_absolute(filename))
+{
+return g_strdup(filename);
+}
+
+dir = bdrv_dirname(relative_to, errp);
+if (!dir) {
+return NULL;
+}
+
+full_name = g_strconcat(dir, filename, NULL);
+g_free(dir);
+return full_name;
 }
 
 char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index 9de7369..ba1b3c6 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -61,7 +61,8 @@ echo '=== Non-reconstructable filename ==='
 echo
 
 # Across blkdebug without a config file, you cannot reconstruct filenames, so
-# qemu is incapable of knowing the directory of the top image
+# qemu is incapable of knowing the directory of the top image from the filename
+# alone. However, using bdrv_dirname(), it should still work.
 TEST_IMG="json:{
 'driver': '$IMGFMT',
 'file': {
diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out
index b3584ff..5370bc1 100644
--- a/tests/qemu-iotests/110.out
+++ b/tests/qemu-iotests/110.out
@@ -14,7 +14,7 @@ backing file: t.IMGFMT.base (actual path: 
TEST_DIR/t.IMGFMT.base)
 image: json:{"driver": "IMGFMT", "file": {"set-state.0.event": "read_aio", 
"image": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}, "driver": 
"blkdebug", "set-state.0.new_state": 42}}
 file format: IMGFMT
 virtual size: 64M (67108864 bytes)
-backing file: t.IMGFMT.base (cannot determine actual path)
+backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base)
 
 === Backing name is always relative to the backed image ===
 
-- 
2.8.0




[Qemu-devel] [PATCH 14/19] blkverify: Make bdrv_dirname() return NULL

2016-04-26 Thread Max Reitz
blkverify's BDSs have a file BDS, but we do not want this to be
preferred over the raw node. There is no way to decide between the two
(and not really a reason to, either), so just return NULL in blkverify's
implementation of bdrv_dirname().

Signed-off-by: Max Reitz 
---
 block/blkverify.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index f445e3e..35d080f 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -338,6 +338,15 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs, QDict *options)
 }
 }
 
+static char *blkverify_dirname(BlockDriverState *bs, Error **errp)
+{
+/* In general, there are two BDSs with different dirnames below this one;
+ * so there is no unique dirname we could return (unless both are equal by
+ * chance). Therefore, to be consistent, just always return NULL. */
+error_setg(errp, "Cannot generate a base directory for blkverify nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_blkverify = {
 .format_name  = "blkverify",
 .protocol_name= "blkverify",
@@ -348,6 +357,7 @@ static BlockDriver bdrv_blkverify = {
 .bdrv_close   = blkverify_close,
 .bdrv_getlength   = blkverify_getlength,
 .bdrv_refresh_filename= blkverify_refresh_filename,
+.bdrv_dirname = blkverify_dirname,
 
 .bdrv_aio_readv   = blkverify_aio_readv,
 .bdrv_aio_writev  = blkverify_aio_writev,
-- 
2.8.0




[Qemu-devel] [PATCH 12/19] block: Fix bdrv_find_backing_image()

2016-04-26 Thread Max Reitz
bdrv_find_backing_image() should use bdrv_get_full_backing_filename() or
bdrv_make_absolute_filename() instead of trying to do what those
functions do by itself.

path_combine_deprecated() can now be dropped, so let's do that.

Signed-off-by: Max Reitz 
---
 block.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 49dc2cb..6e89abd 100644
--- a/block.c
+++ b/block.c
@@ -192,15 +192,6 @@ char *path_combine(const char *base_path, const char 
*filename)
 return result;
 }
 
-static void path_combine_deprecated(char *dest, int dest_size,
-const char *base_path,
-const char *filename)
-{
-char *combined = path_combine(base_path, filename);
-pstrcpy(dest, dest_size, combined);
-g_free(combined);
-}
-
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp)
@@ -3147,7 +3138,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 filename_full = g_malloc(PATH_MAX);
 backing_file_full = g_malloc(PATH_MAX);
-filename_tmp  = g_malloc(PATH_MAX);
 
 is_protocol = path_has_protocol(backing_file);
 
@@ -3163,22 +3153,31 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 } else {
 /* If not an absolute filename path, make it relative to the 
current
  * image's filename path */
-path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
-backing_file);
+filename_tmp = bdrv_make_absolute_filename(curr_bs, backing_file,
+   NULL);
+if (!filename_tmp) {
+continue;
+}
 
 /* We are going to compare absolute pathnames */
 if (!realpath(filename_tmp, filename_full)) {
+g_free(filename_tmp);
 continue;
 }
+g_free(filename_tmp);
 
 /* We need to make sure the backing filename we are comparing 
against
  * is relative to the current image filename (or absolute) */
-path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
-curr_bs->backing_file);
+filename_tmp = bdrv_get_full_backing_filename(curr_bs, NULL);
+if (!filename_tmp) {
+continue;
+}
 
 if (!realpath(filename_tmp, backing_file_full)) {
+g_free(filename_tmp);
 continue;
 }
+g_free(filename_tmp);
 
 if (strcmp(backing_file_full, filename_full) == 0) {
 retval = curr_bs->backing->bs;
@@ -3189,7 +3188,6 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 g_free(filename_full);
 g_free(backing_file_full);
-g_free(filename_tmp);
 return retval;
 }
 
-- 
2.8.0




[Qemu-devel] [PATCH 11/19] block: Add bdrv_make_absolute_filename()

2016-04-26 Thread Max Reitz
This is a general function for making a filename that is relative to a
certain BDS absolute.

It calls bdrv_get_full_backing_filename_from_filename() for now, but
that will be changed in a follow-up patch.

Signed-off-by: Max Reitz 
---
 block.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 083f4ae..49dc2cb 100644
--- a/block.c
+++ b/block.c
@@ -218,15 +218,22 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 }
 }
 
-char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
+static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
+ const char *filename, Error **errp)
 {
-char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+char *bs_filename = relative_to->exact_filename[0]
+? relative_to->exact_filename
+: relative_to->filename;
 
-return bdrv_get_full_backing_filename_from_filename(backed,
-bs->backing_file,
+return bdrv_get_full_backing_filename_from_filename(bs_filename, filename,
 errp);
 }
 
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
+{
+return bdrv_make_absolute_filename(bs, bs->backing_file, errp);
+}
+
 void bdrv_register(BlockDriver *bdrv)
 {
 bdrv_setup_io_funcs(bdrv);
-- 
2.8.0




[Qemu-devel] [PATCH 18/19] block: Add 'base-directory' BDS option

2016-04-26 Thread Max Reitz
Using this option, one can directly override what bdrv_dirname() will
return. This is useful if one uses e.g. qcow2 on top of quorum (with
only protocol BDSs under the quorum BDS) and wants to be able to use
relative backing filenames.

Signed-off-by: Max Reitz 
---
 block.c   | 13 +
 include/block/block_int.h |  2 ++
 qapi/block-core.json  |  9 +
 3 files changed, 24 insertions(+)

diff --git a/block.c b/block.c
index a909b7a..2086c6d 100644
--- a/block.c
+++ b/block.c
@@ -889,6 +889,11 @@ static QemuOptsList bdrv_runtime_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "Ignore flush requests",
 },
+{
+.name = "base-directory",
+.type = QEMU_OPT_STRING,
+.help = "String to be prepended to filenames relative to this BDS 
to make them absolute",
+},
 { /* end of list */ }
 },
 };
@@ -948,6 +953,8 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild 
*file,
 goto fail_opts;
 }
 
+bs->dirname = g_strdup(qemu_opt_get(opts, "base-directory"));
+
 bs->request_alignment = 512;
 bs->zero_beyond_eof = true;
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
@@ -2359,6 +2366,8 @@ static void bdrv_delete(BlockDriverState *bs)
 
 bdrv_close(bs);
 
+g_free(bs->dirname);
+
 /* remove from list, if necessary */
 if (bs->node_name[0] != '\0') {
 QTAILQ_REMOVE(_bdrv_states, bs, node_list);
@@ -4043,6 +4052,10 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
 
+if (bs->dirname) {
+return g_strdup(bs->dirname);
+}
+
 if (!drv) {
 error_setg(errp, "Node '%s' is ejected", bs->node_name);
 return NULL;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 087d9ac..863caf2 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -415,6 +415,8 @@ struct BlockDriverState {
 QDict *full_open_options;
 char exact_filename[PATH_MAX];
 
+char *dirname;
+
 BdrvChild *backing;
 BdrvChild *file;
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1d09079..85b8e06 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2045,6 +2045,14 @@
 # @node-name: #optional the name of a block driver state node (Since 2.0).
 # This option is required on the top level of blockdev-add if
 # the @id option is not given there.
+# @base-directory #optional May be specified for any node. Normally, whenever a
+# filename is specified which is supposed to be relative to 
this
+# node (such as relative backing filenames), the base directory
+# to be used is the directory the image file of this node is 
in,
+# which is simply prepended to the relative filename. Using 
this
+# option, the string which is prepended (i.e. the base
+# directory) can be overridden.
+# (Since 2.7)
 # @discard:   #optional discard-related options (default: ignore)
 # @cache: #optional cache-related options
 # @aio:   #optional AIO backend (default: threads)
@@ -2073,6 +2081,7 @@
   'base': { 'driver': 'BlockdevDriver',
 '*id': 'str',
 '*node-name': 'str',
+'*base-directory': 'str',
 '*discard': 'BlockdevDiscardOptions',
 '*cache': 'BlockdevCacheOptions',
 '*aio': 'BlockdevAioOptions',
-- 
2.8.0




[Qemu-devel] [PATCH 08/19] block: Make path_combine() return the path

2016-04-26 Thread Max Reitz
Besides being safe for arbitrary path lengths, after some follow-up
patches all callers will want a freshly allocated buffer anyway.

In the meantime, path_combine_deprecated() is added which has the same
interface as path_combine() had before this patch. All callers to that
function will be converted in follow-up patches.

Signed-off-by: Max Reitz 
---
 block.c   | 81 +--
 block/vmdk.c  |  3 +-
 include/block/block.h |  4 +--
 3 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/block.c b/block.c
index b3ff08f..41f3c92 100644
--- a/block.c
+++ b/block.c
@@ -146,48 +146,59 @@ int path_is_absolute(const char *path)
 #endif
 }
 
-/* if filename is absolute, just copy it to dest. Otherwise, build a
+/* if filename is absolute, just return its duplicate. Otherwise, build a
path to it by considering it is relative to base_path. URL are
supported. */
-void path_combine(char *dest, int dest_size,
-  const char *base_path,
-  const char *filename)
+char *path_combine(const char *base_path, const char *filename)
 {
 const char *p, *p1;
+char *result;
 int len;
 
-if (dest_size <= 0)
-return;
 if (path_is_absolute(filename)) {
-pstrcpy(dest, dest_size, filename);
+return g_strdup(filename);
+}
+
+p = strchr(base_path, ':');
+if (p) {
+p++;
 } else {
-p = strchr(base_path, ':');
-if (p)
-p++;
-else
-p = base_path;
-p1 = strrchr(base_path, '/');
+p = base_path;
+}
+p1 = strrchr(base_path, '/');
 #ifdef _WIN32
-{
-const char *p2;
-p2 = strrchr(base_path, '\\');
-if (!p1 || p2 > p1)
-p1 = p2;
+{
+const char *p2;
+p2 = strrchr(base_path, '\\');
+if (!p1 || p2 > p1) {
+p1 = p2;
 }
+}
 #endif
-if (p1)
-p1++;
-else
-p1 = base_path;
-if (p1 > p)
-p = p1;
-len = p - base_path;
-if (len > dest_size - 1)
-len = dest_size - 1;
-memcpy(dest, base_path, len);
-dest[len] = '\0';
-pstrcat(dest, dest_size, filename);
+if (p1) {
+p1++;
+} else {
+p1 = base_path;
+}
+if (p1 > p) {
+p = p1;
 }
+len = p - base_path;
+
+result = g_malloc(len + strlen(filename) + 1);
+memcpy(result, base_path, len);
+strcpy(result + len, filename);
+
+return result;
+}
+
+static void path_combine_deprecated(char *dest, int dest_size,
+const char *base_path,
+const char *filename)
+{
+char *combined = path_combine(base_path, filename);
+pstrcpy(dest, dest_size, combined);
+g_free(combined);
 }
 
 void bdrv_get_full_backing_filename_from_filename(const char *backed,
@@ -203,7 +214,7 @@ void bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 error_setg(errp, "Cannot use relative backing file names for '%s'",
backed);
 } else {
-path_combine(dest, sz, backed, backing);
+path_combine_deprecated(dest, sz, backed, backing);
 }
 }
 
@@ -3147,8 +3158,8 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 } else {
 /* If not an absolute filename path, make it relative to the 
current
  * image's filename path */
-path_combine(filename_tmp, PATH_MAX, curr_bs->filename,
- backing_file);
+path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
+backing_file);
 
 /* We are going to compare absolute pathnames */
 if (!realpath(filename_tmp, filename_full)) {
@@ -3157,8 +3168,8 @@ BlockDriverState 
*bdrv_find_backing_image(BlockDriverState *bs,
 
 /* We need to make sure the backing filename we are comparing 
against
  * is relative to the current image filename (or absolute) */
-path_combine(filename_tmp, PATH_MAX, curr_bs->filename,
- curr_bs->backing_file);
+path_combine_deprecated(filename_tmp, PATH_MAX, curr_bs->filename,
+curr_bs->backing_file);
 
 if (!realpath(filename_tmp, backing_file_full)) {
 continue;
diff --git a/block/vmdk.c b/block/vmdk.c
index 45f9d3c..142de20 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -847,8 +847,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 return -EINVAL;
 }
 
-extent_path = g_malloc0(PATH_MAX);
-path_combine(extent_path, PATH_MAX, desc_file_path, fname);
+extent_path = path_combine(desc_file_path, fname);
 
 ret = snprintf(extent_opt_prefix, 32, 

[Qemu-devel] [PATCH 15/19] quorum: Make bdrv_dirname() return NULL

2016-04-26 Thread Max Reitz
While the common implementation for bdrv_dirname() should return NULL
for quorum BDSs already (because they do not have a file node and their
exact_filename field should be empty), there is no reason not to make
that explicit.

Signed-off-by: Max Reitz 
---
 block/quorum.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 98c6588..abe2148 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1053,6 +1053,16 @@ static void quorum_refresh_filename(BlockDriverState 
*bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *quorum_dirname(BlockDriverState *bs, Error **errp)
+{
+/* In general, there are multiple BDSs with different dirnames below this
+ * one; so there is no unique dirname we could return (unless all are equal
+ * by chance, or there is only one). Therefore, to be consistent, just
+ * always return NULL. */
+error_setg(errp, "Cannot generate a base directory for quorum nodes");
+return NULL;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1062,6 +1072,7 @@ static BlockDriver bdrv_quorum = {
 .bdrv_file_open = quorum_open,
 .bdrv_close = quorum_close,
 .bdrv_refresh_filename  = quorum_refresh_filename,
+.bdrv_dirname   = quorum_dirname,
 
 .bdrv_co_flush_to_disk  = quorum_co_flush,
 
-- 
2.8.0




[Qemu-devel] [PATCH 07/19] qcow2: Implement bdrv_refresh_filename()

2016-04-26 Thread Max Reitz
Implement this function by invoking
bdrv_default_refresh_format_filename(bs, false). None of the qcow2
runtime options change the guest-visible state of a BDS.

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 470734b..bcbf94e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3256,6 +3256,13 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool 
fatal, int64_t offset,
 s->signaled_corruption = true;
 }
 
+static void qcow2_refresh_filename(BlockDriverState *bs, QDict *opts)
+{
+(void)opts;
+
+bdrv_default_refresh_format_filename(bs, false);
+}
+
 static QemuOptsList qcow2_create_opts = {
 .name = "qcow2-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(qcow2_create_opts.head),
@@ -3346,6 +3353,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_snapshot_load_tmp = qcow2_snapshot_load_tmp,
 .bdrv_get_info  = qcow2_get_info,
 .bdrv_get_specific_info = qcow2_get_specific_info,
+.bdrv_refresh_filename  = qcow2_refresh_filename,
 
 .bdrv_save_vmstate= qcow2_save_vmstate,
 .bdrv_load_vmstate= qcow2_load_vmstate,
-- 
2.8.0




[Qemu-devel] [PATCH 05/19] block: Add bdrv_default_refresh_protocol_filename

2016-04-26 Thread Max Reitz
Split off the default code for protocol BDS from bdrv_refresh_filename()
into an own function, just as it has been done for format BDS.

Signed-off-by: Max Reitz 
---
 block.c | 58 +-
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index 447468c..511a0b1 100644
--- a/block.c
+++ b/block.c
@@ -3922,6 +3922,38 @@ static void 
bdrv_default_refresh_format_filename(BlockDriverState *bs)
 }
 }
 
+static void bdrv_default_refresh_protocol_filename(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+QDict *opts;
+
+/* There is no underlying file BDS (at least referenced by BDS.file),
+ * so the full options QDict should be equal to the options given
+ * specifically for this block device when it was opened (plus the
+ * driver specification).
+ * Because those options don't change, there is no need to update
+ * full_open_options when it's already set. */
+
+opts = qdict_new();
+append_open_options(opts, bs);
+qdict_put_obj(opts, "driver",
+  QOBJECT(qstring_from_str(drv->format_name)));
+
+if (bs->exact_filename[0]) {
+/* This may not work for all block protocol drivers (some may
+ * require this filename to be parsed), but we have to find some
+ * default solution here, so just include it. If some block driver
+ * does not support pure options without any filename at all or
+ * needs some special format of the options QDict, it needs to
+ * implement the driver-specific bdrv_refresh_filename() function.
+ */
+qdict_put_obj(opts, "filename",
+  QOBJECT(qstring_from_str(bs->exact_filename)));
+}
+
+bs->full_open_options = opts;
+}
+
 /* Updates the following BDS fields:
  *  - exact_filename: A filename which may be used for opening a block device
  *which (mostly) equals the given BDS (even without any
@@ -3967,31 +3999,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 /* Try to reconstruct valid information from the underlying file */
 bdrv_default_refresh_format_filename(bs);
 } else if (!bs->full_open_options && qdict_size(bs->options)) {
-/* There is no underlying file BDS (at least referenced by BDS.file),
- * so the full options QDict should be equal to the options given
- * specifically for this block device when it was opened (plus the
- * driver specification).
- * Because those options don't change, there is no need to update
- * full_open_options when it's already set. */
-
-opts = qdict_new();
-append_open_options(opts, bs);
-qdict_put_obj(opts, "driver",
-  QOBJECT(qstring_from_str(drv->format_name)));
-
-if (bs->exact_filename[0]) {
-/* This may not work for all block protocol drivers (some may
- * require this filename to be parsed), but we have to find some
- * default solution here, so just include it. If some block driver
- * does not support pure options without any filename at all or
- * needs some special format of the options QDict, it needs to
- * implement the driver-specific bdrv_refresh_filename() function.
- */
-qdict_put_obj(opts, "filename",
-  QOBJECT(qstring_from_str(bs->exact_filename)));
-}
-
-bs->full_open_options = opts;
+bdrv_default_refresh_protocol_filename(bs);
 }
 
 if (bs->exact_filename[0]) {
-- 
2.8.0




[Qemu-devel] [PATCH 16/19] block/nbd: Implement bdrv_dirname()

2016-04-26 Thread Max Reitz
The idea behind this implementation is that the export name might be
interpreted as a path (which is the only apparent interpretation of
relative filenames for NBD paths).

The default implementation of bdrv_dirname() would handle that just fine
for nbd+tcp, but not for nbd+unix, because in that case, the last
element of the path is the Unix socket path and not the export name.
Therefore, we need to implement an own bdrv_dirname() which uses the
legacy NBD URL which has the export name at the end.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index f7ea3b3..64534bb 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -469,6 +469,32 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 bs->full_open_options = opts;
 }
 
+static char *nbd_dirname(BlockDriverState *bs, Error **errp)
+{
+const char *path   = qdict_get_try_str(bs->options, "path");
+const char *host   = qdict_get_try_str(bs->options, "host");
+const char *port   = qdict_get_try_str(bs->options, "port");
+const char *export = qdict_get_try_str(bs->options, "export");
+
+if (path && export) {
+return g_strdup_printf("nbd:unix:%s:exportname=%s/", path, export);
+} else if (path && !export) {
+return g_strdup_printf("nbd:unix:%s:exportname=", path);
+} else if (!path && export && port) {
+return g_strdup_printf("nbd://%s:%s/%s/", host, port, export);
+} else if (!path && export && !port) {
+return g_strdup_printf("nbd://%s/%s/", host, export);
+} else if (!path && !export && port) {
+return g_strdup_printf("nbd://%s:%s/", host, port);
+} else if (!path && !export && !port) {
+return g_strdup_printf("nbd://%s/", host);
+}
+
+error_setg(errp, "Cannot generate a base directory for NBD node '%s'",
+   bs->filename);
+return NULL;
+}
+
 static BlockDriver bdrv_nbd = {
 .format_name= "nbd",
 .protocol_name  = "nbd",
@@ -487,6 +513,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -507,6 +534,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -527,6 +555,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_dirname   = nbd_dirname,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.8.0




[Qemu-devel] [PATCH 13/19] block: Add bdrv_dirname()

2016-04-26 Thread Max Reitz
This function may be implemented by block drivers to derive a directory
name from a BDS. Concatenating this g_free()-able string with a relative
filename must result in a valid (not necessarily existing) filename, so
this is a function that should generally be not implemented by format
drivers, because this is protocol-specific.

If a BDS's driver does not implement this function, bdrv_dirname() will
fall through to the BDS's file if it exists. If it does not, the
exact_filename field will be used to generate a directory name.

Signed-off-by: Max Reitz 
---
 block.c   | 26 ++
 include/block/block.h |  1 +
 include/block/block_int.h |  1 +
 3 files changed, 28 insertions(+)

diff --git a/block.c b/block.c
index 6e89abd..23962ce 100644
--- a/block.c
+++ b/block.c
@@ -4028,3 +4028,29 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
+
+char *bdrv_dirname(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (!drv) {
+error_setg(errp, "Node '%s' is ejected", bs->node_name);
+return NULL;
+}
+
+if (drv->bdrv_dirname) {
+return drv->bdrv_dirname(bs, errp);
+}
+
+if (bs->file) {
+return bdrv_dirname(bs->file->bs, errp);
+}
+
+if (bs->exact_filename[0] != '\0') {
+return path_combine(bs->exact_filename, "");
+}
+
+error_setg(errp, "Cannot generate a base directory for %s nodes",
+   drv->format_name);
+return NULL;
+}
diff --git a/include/block/block.h b/include/block/block.h
index 83f9f31..b0fe5dc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -437,6 +437,7 @@ char *bdrv_get_full_backing_filename(BlockDriverState *bs, 
Error **errp);
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp);
+char *bdrv_dirname(BlockDriverState *bs, Error **errp);
 int bdrv_is_snapshot(BlockDriverState *bs);
 
 int path_has_protocol(const char *path);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index eb3665a..087d9ac 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -137,6 +137,7 @@ struct BlockDriver {
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
+char *(*bdrv_dirname)(BlockDriverState *bs, Error **errp);
 
 /* aio */
 BlockAIOCB *(*bdrv_aio_readv)(BlockDriverState *bs,
-- 
2.8.0




[Qemu-devel] [PATCH 06/19] block: Make bdrv_default_refresh_format_filename public

2016-04-26 Thread Max Reitz
In order to allow block drivers to use that function, it needs to be
public. In order to be useful, it needs to take a parameter which allows
the caller to specify whether the runtime options allowed by the block
driver are actually significant for the guest-visible BDS content.

Signed-off-by: Max Reitz 
---
 block.c   | 9 ++---
 include/block/block_int.h | 3 +++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 511a0b1..b3ff08f 100644
--- a/block.c
+++ b/block.c
@@ -3872,7 +3872,10 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 return found_any;
 }
 
-static void bdrv_default_refresh_format_filename(BlockDriverState *bs)
+/* @options_significant must be true if any of the driver-specific runtime
+ * options may change the guest-visible content of the BDS */
+void bdrv_default_refresh_format_filename(BlockDriverState *bs,
+  bool options_significant)
 {
 BlockDriver *drv = bs->drv;
 QDict *opts;
@@ -3885,7 +3888,7 @@ static void 
bdrv_default_refresh_format_filename(BlockDriverState *bs)
 }
 
 opts = qdict_new();
-has_open_options = append_open_options(opts, bs);
+has_open_options = append_open_options(opts, bs) && options_significant;
 has_open_options |= bs->backing_overridden;
 
 /* If no specific options have been given for this BDS, the filename of
@@ -3997,7 +4000,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(opts);
 } else if (bs->file) {
 /* Try to reconstruct valid information from the underlying file */
-bdrv_default_refresh_format_filename(bs);
+bdrv_default_refresh_format_filename(bs, true);
 } else if (!bs->full_open_options && qdict_size(bs->options)) {
 bdrv_default_refresh_protocol_filename(bs);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index d73e9ce..eb3665a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -532,6 +532,9 @@ BlockDriver *bdrv_probe_all(const uint8_t *buf, int 
buf_size,
 void bdrv_set_io_limits(BlockDriverState *bs,
 ThrottleConfig *cfg);
 
+void bdrv_default_refresh_format_filename(BlockDriverState *bs,
+  bool options_significant);
+
 
 /**
  * bdrv_add_before_write_notifier:
-- 
2.8.0




[Qemu-devel] [PATCH 09/19] block: bdrv_get_full_backing_filename_from_...'s ret. val.

2016-04-26 Thread Max Reitz
Make bdrv_get_full_backing_filename_from_filename() return an allocated
string instead of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz 
---
 block.c   | 32 +++-
 block/vmdk.c  |  8 +++-
 include/block/block.h |  7 +++
 3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/block.c b/block.c
index 41f3c92..a12b1d3 100644
--- a/block.c
+++ b/block.c
@@ -201,20 +201,20 @@ static void path_combine_deprecated(char *dest, int 
dest_size,
 g_free(combined);
 }
 
-void bdrv_get_full_backing_filename_from_filename(const char *backed,
-  const char *backing,
-  char *dest, size_t sz,
-  Error **errp)
+char *bdrv_get_full_backing_filename_from_filename(const char *backed,
+   const char *backing,
+   Error **errp)
 {
 if (backing[0] == '\0' || path_has_protocol(backing) ||
 path_is_absolute(backing))
 {
-pstrcpy(dest, sz, backing);
+return g_strdup(backing);
 } else if (backed[0] == '\0' || strstart(backed, "json:", NULL)) {
 error_setg(errp, "Cannot use relative backing file names for '%s'",
backed);
+return NULL;
 } else {
-path_combine_deprecated(dest, sz, backed, backing);
+return path_combine(backed, backing);
 }
 }
 
@@ -222,9 +222,15 @@ void bdrv_get_full_backing_filename(BlockDriverState *bs, 
char *dest, size_t sz,
 Error **errp)
 {
 char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
+char *full_name;
 
-bdrv_get_full_backing_filename_from_filename(backed, bs->backing_file,
- dest, sz, errp);
+full_name = bdrv_get_full_backing_filename_from_filename(backed,
+ bs->backing_file,
+ errp);
+if (full_name) {
+pstrcpy(dest, sz, full_name);
+g_free(full_name);
+}
 }
 
 void bdrv_register(BlockDriver *bdrv)
@@ -3551,16 +3557,16 @@ void bdrv_img_create(const char *filename, const char 
*fmt,
 if (size == -1) {
 if (backing_file) {
 BlockDriverState *bs;
-char *full_backing = g_new0(char, PATH_MAX);
+char *full_backing;
 int64_t size;
 int back_flags;
 QDict *backing_options = NULL;
 
-bdrv_get_full_backing_filename_from_filename(filename, 
backing_file,
- full_backing, 
PATH_MAX,
- _err);
+full_backing =
+bdrv_get_full_backing_filename_from_filename(filename,
+ backing_file,
+ _err);
 if (local_err) {
-g_free(full_backing);
 goto out;
 }
 
diff --git a/block/vmdk.c b/block/vmdk.c
index 142de20..58fe380 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1934,12 +1934,10 @@ static int vmdk_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 if (backing_file) {
 BlockBackend *blk;
-char *full_backing = g_new0(char, PATH_MAX);
-bdrv_get_full_backing_filename_from_filename(filename, backing_file,
- full_backing, PATH_MAX,
- _err);
+char *full_backing =
+bdrv_get_full_backing_filename_from_filename(filename, 
backing_file,
+ _err);
 if (local_err) {
-g_free(full_backing);
 error_propagate(errp, local_err);
 ret = -ENOENT;
 goto exit;
diff --git a/include/block/block.h b/include/block/block.h
index 4604465..c6c00dd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -435,10 +435,9 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
 char *dest, size_t sz, Error **errp);
-void bdrv_get_full_backing_filename_from_filename(const char *backed,
-  const char *backing,
-  char *dest, size_t sz,
-  Error **errp);
+char *bdrv_get_full_backing_filename_from_filename(const char *backed,
+

[Qemu-devel] [PATCH 10/19] block: bdrv_get_full_backing_filename's ret. val.

2016-04-26 Thread Max Reitz
Make bdrv_get_full_backing_filename() return an allocated string instead
of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz 
---
 block.c   | 24 
 block/qapi.c  | 12 ++--
 include/block/block.h |  3 +--
 3 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index a12b1d3..083f4ae 100644
--- a/block.c
+++ b/block.c
@@ -218,19 +218,13 @@ char *bdrv_get_full_backing_filename_from_filename(const 
char *backed,
 }
 }
 
-void bdrv_get_full_backing_filename(BlockDriverState *bs, char *dest, size_t 
sz,
-Error **errp)
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
 {
 char *backed = bs->exact_filename[0] ? bs->exact_filename : bs->filename;
-char *full_name;
 
-full_name = bdrv_get_full_backing_filename_from_filename(backed,
- bs->backing_file,
- errp);
-if (full_name) {
-pstrcpy(dest, sz, full_name);
-g_free(full_name);
-}
+return bdrv_get_full_backing_filename_from_filename(backed,
+bs->backing_file,
+errp);
 }
 
 void bdrv_register(BlockDriver *bdrv)
@@ -1289,7 +1283,7 @@ out:
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
const char *bdref_key, Error **errp)
 {
-char *backing_filename = g_malloc0(PATH_MAX);
+char *backing_filename = NULL;
 char *bdref_key_dot;
 const char *reference = NULL;
 int ret = 0;
@@ -1317,13 +1311,12 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
 bs->backing_overridden = true;
-backing_filename[0] = '\0';
+backing_filename = NULL;
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
 goto free_exit;
 } else {
-bdrv_get_full_backing_filename(bs, backing_filename, PATH_MAX,
-   _err);
+backing_filename = bdrv_get_full_backing_filename(bs, _err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
@@ -1344,8 +1337,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 }
 
 backing_hd = NULL;
-ret = bdrv_open_inherit(_hd,
-*backing_filename ? backing_filename : NULL,
+ret = bdrv_open_inherit(_hd, backing_filename,
 reference, options, 0, bs, _backing,
 errp);
 if (ret < 0) {
diff --git a/block/qapi.c b/block/qapi.c
index c5f6ba6..a0fb7fb 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -265,18 +265,10 @@ void bdrv_query_image_info(BlockDriverState *bs,
 
 backing_filename = bs->backing_file;
 if (backing_filename[0] != '\0') {
-char *backing_filename2 = g_malloc0(PATH_MAX);
+char *backing_filename2;
 info->backing_filename = g_strdup(backing_filename);
 info->has_backing_filename = true;
-bdrv_get_full_backing_filename(bs, backing_filename2, PATH_MAX, );
-if (err) {
-/* Can't reconstruct the full backing filename, so we must omit
- * this field and apply a Best Effort to this query. */
-g_free(backing_filename2);
-backing_filename2 = NULL;
-error_free(err);
-err = NULL;
-}
+backing_filename2 = bdrv_get_full_backing_filename(bs, NULL);
 
 /* Always report the full_backing_filename if present, even if it's the
  * same as backing_filename. That they are same is useful info. */
diff --git a/include/block/block.h b/include/block/block.h
index c6c00dd..83f9f31 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -433,8 +433,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
-void bdrv_get_full_backing_filename(BlockDriverState *bs,
-char *dest, size_t sz, Error **errp);
+char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp);
 char *bdrv_get_full_backing_filename_from_filename(const char *backed,
const char *backing,
Error **errp);
-- 
2.8.0




[Qemu-devel] [PATCH 01/19] block: Use children list in bdrv_refresh_filename

2016-04-26 Thread Max Reitz
bdrv_refresh_filename() should invoke itself recursively on all
children, not just on file.

With that change, we can remove the manual invocations in blkverify and
quorum.

Signed-off-by: Max Reitz 
---
 block.c   | 9 +
 block/blkverify.c | 3 ---
 block/quorum.c| 1 -
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index d4939b4..e349e83 100644
--- a/block.c
+++ b/block.c
@@ -3885,16 +3885,17 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 void bdrv_refresh_filename(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+BdrvChild *child;
 QDict *opts;
 
 if (!drv) {
 return;
 }
 
-/* This BDS's file name will most probably depend on its file's name, so
- * refresh that first */
-if (bs->file) {
-bdrv_refresh_filename(bs->file->bs);
+/* This BDS's file name may depend on any of its children's file names, so
+ * refresh those first */
+QLIST_FOREACH(child, >children, next) {
+bdrv_refresh_filename(child->bs);
 }
 
 if (drv->bdrv_refresh_filename) {
diff --git a/block/blkverify.c b/block/blkverify.c
index 9414b7a..f445e3e 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -313,9 +313,6 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs, QDict *options)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-/* bs->file->bs has already been refreshed */
-bdrv_refresh_filename(s->test_file->bs);
-
 if (bs->file->bs->full_open_options
 && s->test_file->bs->full_open_options)
 {
diff --git a/block/quorum.c b/block/quorum.c
index da15465..98c6588 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1028,7 +1028,6 @@ static void quorum_refresh_filename(BlockDriverState *bs, 
QDict *options)
 int i;
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_refresh_filename(s->children[i]->bs);
 if (!s->children[i]->bs->full_open_options) {
 return;
 }
-- 
2.8.0




[Qemu-devel] [PATCH 00/19] block: Fix some filename generation issues

2016-04-26 Thread Max Reitz
(Fun fact: This series has been lying around on my disk since last
 November. I guess I forgot to send it because I still wanted to review
 it before sending it out, and that it what I forgot to do. Well, and
 now Berto noticed that we ought to fix something.)


There are some issues regarding filename generation right now:

- You always get a JSON filename if you set even a single qcow2-specific
  runtime options (as long as it does not have a dot in it, which is a
  bug, too, but here it is working in our favor...). That is not nice
  and actually breaks the usage of backing files with relative
  filenames with such qcow2 BDS.

- As hinted above, you cannot use relative backing filenames with BDS
  that have a JSON filename only, even though qemu might be able to
  obtain the directory name by walking through the BDS graph to the
  protocol level.

- Overriding the backing file at runtime should invalidate the filename
  because it actually changes the BDS's data. Therefore, we need to
  force a JSON filename in that case, containing the backing file
  override.

- Much of our code assumes paths never to exceed PATH_MAX in length.
  This is wrong, at least because of JSON filenames. This should be
  fixed wherever the opportunity arises.


This series addresses the first issue by splitting off code from
bdrv_refresh_filename() into bdrv_default_refresh_format_filename() and
bdrv_default_refresh_protocol_filename(). The former will take a boolean
signifying whether the driver-specific runtime options are actually
significant for the BDS data (guest-visible content). For qcow2, they
never are, therefore, the qcow2 driver has to implement
bdrv_refresh_filename() by invoking
bdrv_default_refresh_format_filename() with the boolean set
appropriately (deviating from the actual default implementation which
has to assume that all driver-specific runtime options are significant).
[Patches 4 -- 7]

The second issue is addressed by introducing bdrv_dirname() which
returns the directory of a specific BDS. By default, this is obtained by
walking through the BDS graph to the protocol level and processing that
BDS's filename (exact_filename, to be exact).
This behavior can be overridden by any block driver along the path
implementing bdrv_dirname() itself, or by the user explicitly specifying
the 'base-directory' option for any BDS node.
[Patches 11, 13 -- 18]

The third issue is addressed by paying respect to the backing file
options, and noting whether any of the backing file's options have been
overridden in a way that make opening the backing file implicitly (using
the filename specified in the overlay file) impossible.
[Patches 2 -- 3]

The fourth issue has been addressed as far as it made sense to do it
along fixing the second one.
[Patch 8 -- 10]


Furthermore, there are two fixes to code touched in this series.
[Patches 1 and 12]


Max Reitz (19):
  block: Use children list in bdrv_refresh_filename
  block: Add BDS.backing_overridden
  block: Respect backing bs in bdrv_refresh_filename
  block: Add bdrv_default_refresh_format_filename
  block: Add bdrv_default_refresh_protocol_filename
  block: Make bdrv_default_refresh_format_filename public
  qcow2: Implement bdrv_refresh_filename()
  block: Make path_combine() return the path
  block: bdrv_get_full_backing_filename_from_...'s ret. val.
  block: bdrv_get_full_backing_filename's ret. val.
  block: Add bdrv_make_absolute_filename()
  block: Fix bdrv_find_backing_image()
  block: Add bdrv_dirname()
  blkverify: Make bdrv_dirname() return NULL
  quorum: Make bdrv_dirname() return NULL
  block/nbd: Implement bdrv_dirname()
  block: Use bdrv_dirname() for relative filenames
  block: Add 'base-directory' BDS option
  iotests: Add quorum case to test 110

 block.c   | 340 +++---
 block/blkverify.c |  13 +-
 block/nbd.c   |  29 
 block/qapi.c  |  12 +-
 block/qcow2.c |   8 +
 block/quorum.c|  12 +-
 block/vmdk.c  |  11 +-
 include/block/block.h |  15 +-
 include/block/block_int.h |   7 +
 qapi/block-core.json  |   9 ++
 tests/qemu-iotests/051.out|   8 +-
 tests/qemu-iotests/051.pc.out |   8 +-
 tests/qemu-iotests/110|  51 ++-
 tests/qemu-iotests/110.out|  11 +-
 14 files changed, 372 insertions(+), 162 deletions(-)

-- 
2.8.0




[Qemu-devel] [PATCH 04/19] block: Add bdrv_default_refresh_format_filename

2016-04-26 Thread Max Reitz
Split off the default code for format BDS from bdrv_refresh_filename()
into an own function, first because it is nicer this way, and second
because this will allow block drivers to reuse this function in their
own specific implementation of bdrv_refresh_filename().

Signed-off-by: Max Reitz 
---
 block.c | 95 +++--
 1 file changed, 51 insertions(+), 44 deletions(-)

diff --git a/block.c b/block.c
index aff9ea3..447468c 100644
--- a/block.c
+++ b/block.c
@@ -3872,6 +3872,56 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 return found_any;
 }
 
+static void bdrv_default_refresh_format_filename(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+QDict *opts;
+bool has_open_options;
+
+bs->exact_filename[0] = '\0';
+if (bs->full_open_options) {
+QDECREF(bs->full_open_options);
+bs->full_open_options = NULL;
+}
+
+opts = qdict_new();
+has_open_options = append_open_options(opts, bs);
+has_open_options |= bs->backing_overridden;
+
+/* If no specific options have been given for this BDS, the filename of
+ * the underlying file should suffice for this one as well */
+if (bs->file->bs->exact_filename[0] && !has_open_options) {
+strcpy(bs->exact_filename, bs->file->bs->exact_filename);
+}
+/* Reconstructing the full options QDict is simple for most format block
+ * drivers, as long as the full options are known for the underlying
+ * file BDS. The full options QDict of that file BDS should somehow
+ * contain a representation of the filename, therefore the following
+ * suffices without querying the (exact_)filename of this BDS. */
+if (bs->file->bs->full_open_options &&
+(!bs->backing || bs->backing->bs->full_open_options))
+{
+qdict_put_obj(opts, "driver",
+  QOBJECT(qstring_from_str(drv->format_name)));
+
+QINCREF(bs->file->bs->full_open_options);
+qdict_put_obj(opts, "file",
+  QOBJECT(bs->file->bs->full_open_options));
+
+if (bs->backing) {
+QINCREF(bs->backing->bs->full_open_options);
+qdict_put_obj(opts, "backing",
+  QOBJECT(bs->backing->bs->full_open_options));
+} else if (bs->backing_overridden && !bs->backing) {
+qdict_put_obj(opts, "backing", QOBJECT(qstring_new()));
+}
+
+bs->full_open_options = opts;
+} else {
+QDECREF(opts);
+}
+}
+
 /* Updates the following BDS fields:
  *  - exact_filename: A filename which may be used for opening a block device
  *which (mostly) equals the given BDS (even without any
@@ -3915,50 +3965,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(opts);
 } else if (bs->file) {
 /* Try to reconstruct valid information from the underlying file */
-bool has_open_options;
-
-bs->exact_filename[0] = '\0';
-if (bs->full_open_options) {
-QDECREF(bs->full_open_options);
-bs->full_open_options = NULL;
-}
-
-opts = qdict_new();
-has_open_options = append_open_options(opts, bs);
-has_open_options |= bs->backing_overridden;
-
-/* If no specific options have been given for this BDS, the filename of
- * the underlying file should suffice for this one as well */
-if (bs->file->bs->exact_filename[0] && !has_open_options) {
-strcpy(bs->exact_filename, bs->file->bs->exact_filename);
-}
-/* Reconstructing the full options QDict is simple for most format 
block
- * drivers, as long as the full options are known for the underlying
- * file BDS. The full options QDict of that file BDS should somehow
- * contain a representation of the filename, therefore the following
- * suffices without querying the (exact_)filename of this BDS. */
-if (bs->file->bs->full_open_options &&
-(!bs->backing || bs->backing->bs->full_open_options))
-{
-qdict_put_obj(opts, "driver",
-  QOBJECT(qstring_from_str(drv->format_name)));
-
-QINCREF(bs->file->bs->full_open_options);
-qdict_put_obj(opts, "file",
-  QOBJECT(bs->file->bs->full_open_options));
-
-if (bs->backing) {
-QINCREF(bs->backing->bs->full_open_options);
-qdict_put_obj(opts, "backing",
-  QOBJECT(bs->backing->bs->full_open_options));
-} else if (bs->backing_overridden && !bs->backing) {
-qdict_put_obj(opts, "backing", QOBJECT(qstring_new()));
-}
-
-bs->full_open_options = opts;
-} else {
-QDECREF(opts);
-}
+bdrv_default_refresh_format_filename(bs);
 } else if (!bs->full_open_options && 

[Qemu-devel] [PATCH 02/19] block: Add BDS.backing_overridden

2016-04-26 Thread Max Reitz
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS. Therefore, we will need to consider this in
bdrv_refresh_filename().

Adding a new field to the BDS is not nice, but it is very simple and
exactly keeps track of whether the backing file has been overridden.

Signed-off-by: Max Reitz 
---
 block.c   | 2 ++
 include/block/block_int.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/block.c b/block.c
index e349e83..e178488 100644
--- a/block.c
+++ b/block.c
@@ -1299,6 +1299,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 
 reference = qdict_get_try_str(parent_options, bdref_key);
 if (reference || qdict_haskey(options, "file.filename")) {
+bs->backing_overridden = true;
 backing_filename[0] = '\0';
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
 QDECREF(options);
@@ -1589,6 +1590,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 backing = qdict_get_try_str(options, "backing");
 if (backing && *backing == '\0') {
 flags |= BDRV_O_NO_BACKING;
+bs->backing_overridden = true;
 qdict_del(options, "backing");
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 10d8759..d73e9ce 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -409,6 +409,7 @@ struct BlockDriverState {
 char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
 this file image */
 char backing_format[16]; /* if non-zero and backing_file exists */
+bool backing_overridden; /* backing file has been specified by the user */
 
 QDict *full_open_options;
 char exact_filename[PATH_MAX];
-- 
2.8.0




[Qemu-devel] [PATCH 03/19] block: Respect backing bs in bdrv_refresh_filename

2016-04-26 Thread Max Reitz
Basically, bdrv_refresh_filename() should respect all children of a
BlockDriverState. However, generally those children are driver-specific,
so this function cannot handle the general case. On the other hand,
there are only few drivers which use other children than @file and
@backing (that being vmdk, quorum, and blkverify).

Most block drivers only use @file and/or @backing (if they use any
children at all). Both can be implemented directly in
bdrv_refresh_filename.

The user overriding the file's filename is already handled, however, the
user overriding the backing file is not. If this is done, opening the
BDS with the plain filename of its file will not be correct, so we may
not set bs->exact_filename in that case.

iotest 051 contains test cases for overwriting the backing file, and so
its output changes with this patch applied (which I consider a good
thing).

Signed-off-by: Max Reitz 
---
 block.c   | 14 +-
 tests/qemu-iotests/051.out|  8 
 tests/qemu-iotests/051.pc.out |  8 
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index e178488..aff9ea3 100644
--- a/block.c
+++ b/block.c
@@ -3925,6 +3925,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 
 opts = qdict_new();
 has_open_options = append_open_options(opts, bs);
+has_open_options |= bs->backing_overridden;
 
 /* If no specific options have been given for this BDS, the filename of
  * the underlying file should suffice for this one as well */
@@ -3936,13 +3937,24 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  * file BDS. The full options QDict of that file BDS should somehow
  * contain a representation of the filename, therefore the following
  * suffices without querying the (exact_)filename of this BDS. */
-if (bs->file->bs->full_open_options) {
+if (bs->file->bs->full_open_options &&
+(!bs->backing || bs->backing->bs->full_open_options))
+{
 qdict_put_obj(opts, "driver",
   QOBJECT(qstring_from_str(drv->format_name)));
+
 QINCREF(bs->file->bs->full_open_options);
 qdict_put_obj(opts, "file",
   QOBJECT(bs->file->bs->full_open_options));
 
+if (bs->backing) {
+QINCREF(bs->backing->bs->full_open_options);
+qdict_put_obj(opts, "backing",
+  QOBJECT(bs->backing->bs->full_open_options));
+} else if (bs->backing_overridden && !bs->backing) {
+qdict_put_obj(opts, "backing", QOBJECT(qstring_new()));
+}
+
 bs->full_open_options = opts;
 } else {
 QDECREF(opts);
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 408d613..d936798 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -59,7 +59,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 Testing: -drive 
file=TEST_DIR/t.qcow2,driver=qcow2,backing.file.filename=TEST_DIR/t.qcow2.orig,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) iininfinfoinfo 
info binfo 
blinfo bloinfo 
blocinfo block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.orig"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.orig (chain depth: 1)
@@ -148,7 +148,7 @@ QEMU_PROG: -drive driver=null-co,cache=invalid_value: 
invalid cache option
 Testing: -drive 
file=TEST_DIR/t.qcow2,cache=writeback,backing.file.filename=TEST_DIR/t.qcow2.base,backing.cache.no-flush=on,backing.node-name=backing,backing.file.node-name=backing-file,file.node-name=file,if=none,id=drive0
 -nodefaults
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) iininfinfoinfo 
info binfo 
blinfo bloinfo 
blocinfo block
-drive0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
+drive0 (NODE_NAME): json:{"backing": {"driver": "qcow2", "file": {"driver": 
"file", "filename": "TEST_DIR/t.qcow2.base"}}, "driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/t.qcow2"}} (qcow2)
 Removable device: not locked, tray closed
 Cache mode:   writeback
 Backing file: TEST_DIR/t.qcow2.base (chain depth: 1)
@@ -168,7 +168,7 @@ backing-file: TEST_DIR/t.qcow2.base (file, read-only)
 Testing: -drive 

Re: [Qemu-devel] [PATCH] usb-mtp: fix usb_mtp_get_device_info so that libmtp on the guest doesn't complain

2016-04-26 Thread Bandan Das
Stefan Hajnoczi  writes:

> On Sun, Apr 17, 2016 at 04:29:53AM -0700, Isaac Lozano wrote:
>> If an application uses libmtp on the guest system,
>> it will complain with the warning message:
>> LIBMTP WARNING: VendorExtensionID: 
>> LIBMTP WARNING: VendorExtensionDesc: (null)
>> LIBMTP WARNING: this typically means the device is PTP (i.e. a camera) but
>> not a MTP device at all. Trying to continue anyway.
>> 
>> This is because libmtp expects a MTP Vendor Extension ID of 0x0006 and a
>> MTP Version of 0x0064. These numbers are taken from Microsoft's MTP Vendor
>> Extension Identification Message page and are what most physical devices
>> show.
>> 
>> Signed-off-by: Isaac Lozano <109loza...@gmail.com>
>> ---
>>  hw/usb/dev-mtp.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Welcome to QEMU!
>
> Link to MTP Vendor Extension Identification Message page for other
> reviewers:
> https://msdn.microsoft.com/en-us/library/ff632482.aspx

I am confused why the MTP spec on usb.org specifies in 5.1.1.2 that
this value should be 0x for MTP devices. Is it just not updated
or there are similar MS specific differences to watch out for ? Either
way, it seems libmtp will always be a good reference when we are stuck
with such discrepencies.

> Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [RFC PATCH v2 2/2] spapr: Memory hot-unplug support

2016-04-26 Thread Michael Roth
Quoting Igor Mammedov (2016-04-26 02:52:36)
> On Tue, 26 Apr 2016 10:39:23 +0530
> Bharata B Rao  wrote:
> 
> > On Mon, Apr 25, 2016 at 11:20:50AM +0200, Igor Mammedov wrote:
> > > On Wed, 16 Mar 2016 10:11:54 +0530
> > > Bharata B Rao  wrote:
> > >   
> > > > On Wed, Mar 16, 2016 at 12:36:05PM +1100, David Gibson wrote:  
> > > > > On Tue, Mar 15, 2016 at 10:08:56AM +0530, Bharata B Rao wrote:
> > > > > > Add support to hot remove pc-dimm memory devices.
> > > > > > 
> > > > > > Signed-off-by: Bharata B Rao 
> > > > > 
> > > > > Reviewed-by: David Gibson 
> > > > > 
> > > > > Looks correct, but again, needs to wait on the PAPR change.  
> > > [...]  
> > > > 
> > > > While we are here, I would also like to get some opinion on the real
> > > > need for memory unplug. Is there anything that memory unplug gives us
> > > > which memory ballooning (shrinking mem via ballooning) can't give ?  
> > > Sure ballooning can complement memory hotplug but turning it on would
> > > effectively reduce hotplug to balloning as it would enable overcommit
> > > capability instead of hard partitioning pc-dimms provides. So one
> > > could just use ballooning only and not bother with hotplug at all.
> > > 
> > > On the other hand memory hotplug/unplug (at least on x86) tries
> > > to model real hardware, thus removing need in paravirt ballooning
> > > solution in favor of native guest support.  
> > 
> > Thanks for your views.
> > 
> > > 
> > > PS:
> > > Guest wise, currently hot-unplug is not well supported in linux,
> > > i.e. it's not guarantied that guest will honor unplug request
> > > as it may pin dimm by using it as a non migratable memory. So
> > > there is something to work on guest side to make unplug more
> > > reliable/guarantied.  
> > 
> > In the above scenario where the guest doesn't allow removal of certain
> > parts of DIMM memory, what is the expected behaviour as far as QEMU
> > DIMM device is concerned ? I seem to be running into this situation
> > very often with PowerPC mem unplug where I am left with a DIMM device
> > that has only some memory blocks released. In this situation, I would like
> > to block further unplug requests on the same device, but QEMU seems
> > to allow more such unplug requests to come in via the monitor. So
> > qdev won't help me here ? Should I detect such condition from the
> > machine unplug() handler and take required action ?
> I think offlining is a guests task along with recovering from
> inability to offline (i.e. offline all + eject or restore original state).
> QUEM does it's job by notifying guest what dimm it wants to remove
> and removes it when guest asks it (at least in x86 world).

In the case of pseries, the DIMM abstraction isn't really exposed to
the guest, but rather the memory blocks we use to make the backing
memdev memory available to the guest. During unplug, the guest
completely releases these blocks back to QEMU, and if it can only
release a subset of what's requested it does not attempt to recover.
We can potentially change that behavior on the guest side, since
partially-freed DIMMs aren't currently useful on the host-side...

But, in the case of pseries, I wonder if it makes sense to maybe go
ahead and MADV_DONTNEED the ranges backing these released blocks so the
host can at least partially reclaim the memory from a partially
unplugged DIMM?

> 
> > 
> > On x86, if some pages are offlined and subsequently other pages couldn't
> > be offlined, then I see the full DIMM memory size remaining
> > with the guest. So I infer that on x86, QEMU memory unplug either
> > removes full DIMM or nothing. Is that understanding correct ?
> I wouldn't bet that it's guarantied behavior but it should be this way.
> 
> > 
> > Regards,
> > Bharata.
> > 
> 




Re: [Qemu-devel] Is anyone able to load a web page from a guest operating system?

2016-04-26 Thread Programmingkid

On Apr 26, 2016, at 4:12 PM, Thomas Huth wrote:

> On 26.04.2016 21:25, Programmingkid wrote:
>> 
>> On Apr 26, 2016, at 3:00 PM, Dr. David Alan Gilbert wrote:
>> 
>>> * Programmingkid (programmingk...@gmail.com) wrote:
 My three guest operating systems can't load a web page. I think this is a 
 bug with QEMU. Is there anyone who has the latest revision of QEMU that 
 can access the web from a guest? Or are you experiencing the same problem?
>>> 
>>> Works here.
>> So you are using a very recent version of QEMU? Maybe something that was 
>> pulled in the last day or so?
>> 
>>> Now - how about some basic debug!
>>> Does your guest see a network card?
>> yes
>> 
>>> Does DNS work?
>> Doesn't look like it. If I use just an ip address it still doesn't work.
>> 
>>> Does ping work?
>> I can ping the virtual router at 10.0.2.2. Any other ip address fails. 
> 
> That's normal for user-mode / slirp networking. You can't ping external
> hosts with this mode.
> 
>>> Does a telnet/ssh from the guest work?
>> telnet www.google.com 80 failed
>> Didn't have an address to use ssh on.
>> 
>>> What's the qemu command line you use?
>> qemu-system-ppc -hda  -hdb  -m 512 -boot c -M mac99 -netdev 
>> user,id=mynet0 -device usb-net,netdev=mynet0 -cpu 750 -prom-env boot-args=-v 
>> -device ich9-usb-uhci1,id=newusb -device usb-audio,bus=newusb.0 
>> 
>> and
>> 
>> qemu-system-ppc -hda  -hdb  -m 512 -boot c -M mac99 -netdev 
>> user,id=mynet0 -device rtl8139,netdev=mynet0 -cpu 750 -prom-env boot-args=-v 
>> -device ich9-usb-uhci1,id=newusb -device usb-audio,bus=newusb.0 
> 
> Ok, that means you're using user-mode / slirp networking.
> I just tried it with a pseries guest, and it seems to be working fine
> for me with the current git version of QEMU (f419a626c76bcb266).

So you are saying you can view web pages on your guest?

> 
> Now, what kind of host do you use? Mac OS X? 
Yes. Mac OS 10.6.

> Also can you determine a revision when it was still working fine for
> you? (and then maybe even bisect the problem?)

I will see what I can find out.


Re: [Qemu-devel] Is anyone able to load a web page from a guest operating system?

2016-04-26 Thread Thomas Huth
On 26.04.2016 21:25, Programmingkid wrote:
> 
> On Apr 26, 2016, at 3:00 PM, Dr. David Alan Gilbert wrote:
> 
>> * Programmingkid (programmingk...@gmail.com) wrote:
>>> My three guest operating systems can't load a web page. I think this is a 
>>> bug with QEMU. Is there anyone who has the latest revision of QEMU that can 
>>> access the web from a guest? Or are you experiencing the same problem?
>>
>> Works here.
> So you are using a very recent version of QEMU? Maybe something that was 
> pulled in the last day or so?
> 
>> Now - how about some basic debug!
>> Does your guest see a network card?
> yes
> 
>> Does DNS work?
> Doesn't look like it. If I use just an ip address it still doesn't work.
> 
>> Does ping work?
> I can ping the virtual router at 10.0.2.2. Any other ip address fails. 

That's normal for user-mode / slirp networking. You can't ping external
hosts with this mode.

>> Does a telnet/ssh from the guest work?
> telnet www.google.com 80 failed
> Didn't have an address to use ssh on.
> 
>> What's the qemu command line you use?
> qemu-system-ppc -hda  -hdb  -m 512 -boot c -M mac99 -netdev 
> user,id=mynet0 -device usb-net,netdev=mynet0 -cpu 750 -prom-env boot-args=-v 
> -device ich9-usb-uhci1,id=newusb -device usb-audio,bus=newusb.0 
> 
> and
> 
> qemu-system-ppc -hda  -hdb  -m 512 -boot c -M mac99 -netdev 
> user,id=mynet0 -device rtl8139,netdev=mynet0 -cpu 750 -prom-env boot-args=-v 
> -device ich9-usb-uhci1,id=newusb -device usb-audio,bus=newusb.0 

Ok, that means you're using user-mode / slirp networking.
I just tried it with a pseries guest, and it seems to be working fine
for me with the current git version of QEMU (f419a626c76bcb266).

Now, what kind of host do you use? Mac OS X?
Also can you determine a revision when it was still working fine for
you? (and then maybe even bisect the problem?)

 Thomas




Re: [Qemu-devel] Is anyone able to load a web page from a guest operating system?

2016-04-26 Thread Programmingkid

On Apr 26, 2016, at 3:00 PM, Dr. David Alan Gilbert wrote:

> * Programmingkid (programmingk...@gmail.com) wrote:
>> My three guest operating systems can't load a web page. I think this is a 
>> bug with QEMU. Is there anyone who has the latest revision of QEMU that can 
>> access the web from a guest? Or are you experiencing the same problem?
> 
> Works here.
So you are using a very recent version of QEMU? Maybe something that was pulled 
in the last day or so?

> Now - how about some basic debug!
> Does your guest see a network card?
yes

> Does DNS work?
Doesn't look like it. If I use just an ip address it still doesn't work.

> Does ping work?
I can ping the virtual router at 10.0.2.2. Any other ip address fails. 

> Does a telnet/ssh from the guest work?
telnet www.google.com 80 failed
Didn't have an address to use ssh on.

> What's the qemu command line you use?
qemu-system-ppc -hda  -hdb  -m 512 -boot c -M mac99 -netdev 
user,id=mynet0 -device usb-net,netdev=mynet0 -cpu 750 -prom-env boot-args=-v 
-device ich9-usb-uhci1,id=newusb -device usb-audio,bus=newusb.0 

and

qemu-system-ppc -hda  -hdb  -m 512 -boot c -M mac99 -netdev 
user,id=mynet0 -device rtl8139,netdev=mynet0 -cpu 750 -prom-env boot-args=-v 
-device ich9-usb-uhci1,id=newusb -device usb-audio,bus=newusb.0 

This and my other guest did access the internet before. 

Thanks for helping.


Re: [Qemu-devel] Is anyone able to load a web page from a guest operating system?

2016-04-26 Thread Dr. David Alan Gilbert
* Programmingkid (programmingk...@gmail.com) wrote:
> My three guest operating systems can't load a web page. I think this is a bug 
> with QEMU. Is there anyone who has the latest revision of QEMU that can 
> access the web from a guest? Or are you experiencing the same problem?

Works here.

Now - how about some basic debug!
Does your guest see a network card?
Does DNS work?
Does ping work?
Does a telnet/ssh from the guest work?
What's the qemu command line you use?

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [RFC for-2.7 05/11] pseries: Build device tree only at reset time

2016-04-26 Thread Thomas Huth
On 20.04.2016 04:33, David Gibson wrote:
> Currently the pseries code builds a "skeleton" device tree at machine init
> time, then adds a bunch of stuff to it at reset.  Over time, more and more
> logic has had to be moved from init to reset time, and there's really no
> advantage to doing any of it at init time.
> 
> This patch removes the init time spapr_create_fdt_skel() and moves its
> logic into the reset time spapr_build_fdt().  There's still a fairly
> pointless divide between the "skeleton" logic (using libfdt serial-write
> functions) and the "tweak" logic (using libfdt random access functions)
> but at least it all happens at the same time, making further consolidation
> easier.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr.c | 398 
> -
>  include/hw/ppc/spapr.h |   1 -
>  2 files changed, 192 insertions(+), 207 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index da10136..6e1192f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
...
> @@ -901,7 +702,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>   hwaddr rtas_addr,
>   hwaddr rtas_size)
>  {
> -MachineState *machine = MACHINE(qdev_get_machine());
> +MachineState *machine = MACHINE(spapr);
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
>  const char *boot_device = machine->boot_order;
>  int ret, i;
> @@ -909,11 +710,200 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>  char *bootlist;
>  void *fdt;
>  sPAPRPHBState *phb;
> +uint32_t start_prop = cpu_to_be32(spapr->initrd_base);
> +uint32_t end_prop = cpu_to_be32(spapr->initrd_base + spapr->initrd_size);
> +GString *hypertas = g_string_sized_new(256);
> +GString *qemu_hypertas = g_string_sized_new(256);
> +uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
> +uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(max_cpus)};
> +unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
> +char *buf;
> +
> +add_str(hypertas, "hcall-pft");
> +add_str(hypertas, "hcall-term");
> +add_str(hypertas, "hcall-dabr");
> +add_str(hypertas, "hcall-interrupt");
> +add_str(hypertas, "hcall-tce");
> +add_str(hypertas, "hcall-vio");
> +add_str(hypertas, "hcall-splpar");
> +add_str(hypertas, "hcall-bulk");
> +add_str(hypertas, "hcall-set-mode");
> +add_str(qemu_hypertas, "hcall-memop1");
> +
> +fdt = g_malloc0(FDT_MAX_SIZE);
> +_FDT((fdt_create(fdt, FDT_MAX_SIZE)));
> +
> +if (spapr->kernel_size) {
> +_FDT((fdt_add_reservemap_entry(fdt, KERNEL_LOAD_ADDR,
> +   spapr->kernel_size)));
> +}
> +if (spapr->initrd_size) {
> +_FDT((fdt_add_reservemap_entry(fdt, spapr->initrd_base,
> +   spapr->initrd_size)));
> +}
> +_FDT((fdt_finish_reservemap(fdt)));
> +
> +/* Root node */
> +_FDT((fdt_begin_node(fdt, "")));
> +_FDT((fdt_property_string(fdt, "device_type", "chrp")));
> +_FDT((fdt_property_string(fdt, "model", "IBM pSeries (emulated by 
> qemu)")));
> +_FDT((fdt_property_string(fdt, "compatible", "qemu,pseries")));
> +
> +/*
> + * Add info to guest to indentify which host is it being run on
> + * and what is the uuid of the guest
> + */
> +if (kvmppc_get_host_model()) {
> +_FDT((fdt_property_string(fdt, "host-model", buf)));
> +g_free(buf);
> +}
> +if (kvmppc_get_host_serial()) {
> +_FDT((fdt_property_string(fdt, "host-serial", buf)));
> +g_free(buf);
> +}
>  
> -fdt = g_malloc(FDT_MAX_SIZE);
> +buf = g_strdup_printf(UUID_FMT, qemu_uuid[0], qemu_uuid[1],
> +  qemu_uuid[2], qemu_uuid[3], qemu_uuid[4],
> +  qemu_uuid[5], qemu_uuid[6], qemu_uuid[7],
> +  qemu_uuid[8], qemu_uuid[9], qemu_uuid[10],
> +  qemu_uuid[11], qemu_uuid[12], qemu_uuid[13],
> +  qemu_uuid[14], qemu_uuid[15]);
> +
> +_FDT((fdt_property_string(fdt, "vm,uuid", buf)));
> +if (qemu_uuid_set) {
> +_FDT((fdt_property_string(fdt, "system-id", buf)));
> +}
> +g_free(buf);
> +
> +if (qemu_get_vm_name()) {
> +_FDT((fdt_property_string(fdt, "ibm,partition-name",
> +  qemu_get_vm_name(;
> +}
> +
> +_FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
> +_FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
> +
> +/* /chosen */
> +_FDT((fdt_begin_node(fdt, "chosen")));
> +
> +/* Set Form1_affinity */
> +_FDT((fdt_property(fdt, "ibm,architecture-vec-5", vec5, sizeof(vec5;
> +
> +_FDT((fdt_property_string(fdt, "bootargs", machine->kernel_cmdline)));
> +_FDT((fdt_property(fdt, "linux,initrd-start",
> + 

Re: [Qemu-devel] KVM Forum 2016: Call For Participation

2016-04-26 Thread Peter Maydell
On 10 March 2016 at 18:09, Paolo Bonzini  wrote:
> =
> KVM Forum 2016: Call For Participation
> August 24-26, 2016 - Westin Harbor Castle - Toronto, Canada
>
> (All submissions must be received before midnight May 1, 2016)
> =
>
> KVM Forum is an annual event that presents a rare opportunity
> for developers and users to meet, discuss the state of Linux
> virtualization technology, and plan for the challenges ahead.
> We invite you to lead part of the discussion by submitting a speaking
> proposal for KVM Forum 2016.

Hi; this is just a followup to remind people that the KVM Forum
CFP deadline of May 1st is rapidly approaching, so if you were
thinking of submitting a proposal now is the time.

All the CFP information is here:
http://events.linuxfoundation.org/events/kvm-forum/program/cfp

thanks
-- PMM (on behalf of the KVM Forum 2016 Program Committee)



Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Jan Kiszka
On 2016-04-26 18:07, Radim Krčmář wrote:
> 2016-04-26 17:28+0200, Jan Kiszka:
>> On 2016-04-26 16:59, Radim Krčmář wrote:
>>> 2016-04-26 16:24+0200, Jan Kiszka:
 On 2016-04-26 13:40, Peter Xu wrote:
> Currently, all the interrupts will be translated into one MSI in
> vtd_generate_msi_message(), in which only 8 bits of dest_id is used
> (msg.dest = irq->dest). We may possibly need to use the high 32 bits
> of MSI address to store the rest dest[31:8]? Don't know whether this
> would be enough though.

 Yes, I ran into this topic as well as I hacked up those line. Currently,
 KVM does not support more than 254 vCPUs, so 8 bits of those 32 are
 actually fine, and piggy-backing them in an MSI message works.

 Once KVM supports more CPUs, it has to come up with a new userspace
 interface to inject APIC events for more than 255 CPUs. Maybe the
 existing direct MSI inject with its unused flags could be "bended",
 maybe there are already better ideas (Radim?).
>>>
>>> Adding a flag to msi_msg and taking 3-4 bytes from padding to express
>>> x2APIC addresses is reasonable.  (It is what my prototype did. :])
>>>
>>> The conceptually different idea is forcing all userspace interrupts
>>> through irqfd routes, which would obsolete the ad-host inject.
>>
>> irqfd for userspace sources is a bit clumsy from the API POV. On the
>> other hand, we need to tweak the routing API anyway to achieve the same
>> address extension there, too.
> 
> I agree, both of them should change.  KVM_SIGNAL_MSI was added just
> because the route-based injection in kvm_irqchip_send_msi() sucked;
> maybe we could find a good solution now, but the direct interface is
> already here ...

We won't get away without irq routes because we need them for sources
that only issue binary signals for interrupt events (in-kernel, eventfd
from userspace). In fact, those sources should not know more than that
about their irqs. But we also do not want them to route everything
through userspace for on-the-fly translation and reinjection to kvm.

Jan




Re: [Qemu-devel] [RFC for-2.7 04/11] pseries: Make spapr_create_fdt_skel() get information from machine state

2016-04-26 Thread Thomas Huth
On 20.04.2016 04:33, David Gibson wrote:
> Currently spapr_create_fdt_skel() takes a bunch of individual parameters
> for various things it will put in the device tree.  Some of these can
> already be taken directly from sPAPRMachineState.  This patch alters it so
> that all of them can be taken from there, which will allow this code to
> be moved away from its current caller in future.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr.c | 81 
> ++
>  include/hw/ppc/spapr.h |  4 +++
>  2 files changed, 40 insertions(+), 45 deletions(-)

Reviewed-by: Thomas Huth 




Re: [Qemu-devel] how to enable nvme device in Qemu?

2016-04-26 Thread Laszlo Ersek
On 04/26/16 11:18, Stefan Hajnoczi wrote:
> On Tue, Apr 19, 2016 at 08:37:42AM +, Qingtao Sun wrote:
>>  I want to boot OS from NVMe device in qemu-2.5.1. But I didn't find 
>> the bootable NVMe device in guest BIOS. How to enable this feature?
> 
> I don't see an NVMe driver so I guess SeaBIOS does not support boot from
> NVMe:
> https://code.coreboot.org/p/seabios/source/tree/master/src/hw
> 
> I have CCed the SeaBIOS mailing list in case they have any advice.

Removing the SeaBIOS list before I achieve "generally hated" status
there, for peddling OVMF :)

Qingtao Sun: you can boot off of NVMe devices with recent OVMF.

https://github.com/tianocore/edk2/issues/48
https://www.kraxel.org/repos/

As far as QEMU is concerned, you will need a binary that has commit
a907ec52cc1a (v2.6.0-rc3, for example -- v2.5.* don't have it, directly
or via cherry-pick).

Thanks
Laszlo



[Qemu-devel] Working on AF_VSOCK packet capture

2016-04-26 Thread Gerard
Hi all,

My name is Gerard Garcia and I have been funded by GSOC16 to work on a
device driver to allow capturing host/guest traffic through AF_VSOCK
sockets: http://qemu-project.org/Features/VirtioVsock

I'll mostly work on the Linux kernel codebase but the device driver is
closely related to QEMU so I felt like introducing myself here!

My IRC nick is gerardgarcia. I'll try to be around!

Gerard


Re: [Qemu-devel] [Qemu-arm] [PATCH] target-arm: Fix descriptor address masking in ARM address translation

2016-04-26 Thread Tom Hanson

On 03/21/2016 09:56 AM, Sergey Sorokin wrote:

17.03.2016, 18:24, "Peter Maydell" :

  On 17 March 2016 at 15:21, Sergey Sorokin  wrote:

   17.03.2016, 14:40, "Peter Maydell" :

   On 13 March 2016 at 18:28, Sergey Sorokin  wrote:

   If you want to implement the AddressSize checks that's fine,
   but otherwise please leave this bit of the code alone.


You said me that my code is not correct, I have proved that it conforms
to the documentation.
It's a bit obfuscating when the doc explicitly says to take bits up to 39
from the descriptor, but in QEMU we take bits up to 47 relying on the check 
in
another part of the code, even if both ways are correct.


   The way the code in QEMU is structured is that we extract the
   descriptor field in one go and then will operate on it
   (checking for need to AddressSize fault, etc) as a second
   action. The field descriptors themselves are the sizes I said.


   Well, may be it's enough just to change this comment as you intend:


   - /* The address field in the descriptor goes up to bit 39 for ARMv7
   - * but up to bit 47 for ARMv8.
   + /* The address field in the descriptor goes up to bit 39 for AArch32
   + * but up to bit 47 for AArch64.
 */


  The comment is correct as it stands.

  thanks
  -- PMM


I mean in the patch.
We need to fix lower bits in descaddrmask anyway.
So:

I could describe in the comment, that the descriptor field is up to bit 47 for 
ARMv8 (as long as you want it),
but we use the descaddrmask up to bit 39 for AArch32,
because we don't need other bits in that case to construct next descriptor 
address.
It is clearly described in the ARM pseudo-code.
Why should we keep in the mask bits from 40 up to 47 if we don't need them? 
Even if they are all zeroes.
It is a bit obfuscating, as I said.


 I agree with Peter.  The original comment is correct.

Looking at the TLBRecord AArch32.TranslationTableWalkLD pseudocode, it is 
treating the AArch32 address as 48 bits long.  For example:
if !IsZero(baseregister<47:40>) then
level = 0;
result.addrdesc.fault = AArch32.AddressSizeFault(ipaddress, domain, 
level, acctype, iswrite,
 secondstage, 
s2fs1walk);
return result;

This requires that an AArch32 address have specific values up through bit 47.



[Qemu-devel] Is anyone able to load a web page from a guest operating system?

2016-04-26 Thread Programmingkid
My three guest operating systems can't load a web page. I think this is a bug 
with QEMU. Is there anyone who has the latest revision of QEMU that can access 
the web from a guest? Or are you experiencing the same problem?


Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Radim Krčmář
2016-04-26 17:28+0200, Jan Kiszka:
> On 2016-04-26 16:59, Radim Krčmář wrote:
>> 2016-04-26 16:24+0200, Jan Kiszka:
>>> On 2016-04-26 13:40, Peter Xu wrote:
 Currently, all the interrupts will be translated into one MSI in
 vtd_generate_msi_message(), in which only 8 bits of dest_id is used
 (msg.dest = irq->dest). We may possibly need to use the high 32 bits
 of MSI address to store the rest dest[31:8]? Don't know whether this
 would be enough though.
>>>
>>> Yes, I ran into this topic as well as I hacked up those line. Currently,
>>> KVM does not support more than 254 vCPUs, so 8 bits of those 32 are
>>> actually fine, and piggy-backing them in an MSI message works.
>>>
>>> Once KVM supports more CPUs, it has to come up with a new userspace
>>> interface to inject APIC events for more than 255 CPUs. Maybe the
>>> existing direct MSI inject with its unused flags could be "bended",
>>> maybe there are already better ideas (Radim?).
>> 
>> Adding a flag to msi_msg and taking 3-4 bytes from padding to express
>> x2APIC addresses is reasonable.  (It is what my prototype did. :])
>> 
>> The conceptually different idea is forcing all userspace interrupts
>> through irqfd routes, which would obsolete the ad-host inject.
> 
> irqfd for userspace sources is a bit clumsy from the API POV. On the
> other hand, we need to tweak the routing API anyway to achieve the same
> address extension there, too.

I agree, both of them should change.  KVM_SIGNAL_MSI was added just
because the route-based injection in kvm_irqchip_send_msi() sucked;
maybe we could find a good solution now, but the direct interface is
already here ...



Re: [Qemu-devel] [PATCH for-2.7 v2 07/17] rbd: Implement image locking

2016-04-26 Thread Jason Dillaman
On Sun, Apr 24, 2016 at 7:42 PM, Fam Zheng  wrote:
> On Fri, 04/22 21:57, Jason Dillaman wrote:
>> Since this cannot automatically recover from a crashed QEMU client with an
>> RBD image, perhaps this RBD locking should not default to enabled.
>> Additionally, this will conflict with the "exclusive-lock" feature
>> available since the Ceph Hammer-release since both utilize the same locking
>> construct.
>>
>> As a quick background, the optional exclusive-lock feature can be
>> enabled/disabled on image and safely/automatically handles the case of
>> recovery from a crashed client.  Under normal conditions, the RBD
>> exclusive-lock feature automatically acquires the lock upon the first
>> attempt to write to the image and transparently transitions ownership of
>> the lock between two or more clients -- used for QEMU live-migration.
>
> Is it enabled by default?
>

Starting with the Jewel release of Ceph it is enabled by default.

>>
>> I'd be happy to add a new librbd API method to explicitly expose acquiring
>> and releasing the RBD exclusive lock since it certainly solves a couple
>> compromises in our current QEMU integration.
>
> Does the API do enable/disable or acquire/relase? Could you explain the
> differences between it and rbd_lock_exclusive?

There is already an API for enabling/disabling the exclusive-lock
feature (and associated CLI tooling).  This would be a new API method
to explicitly acquire / release the exclusive-lock (instead of
implicitly acquiring the lock upon first write request as it currently
does in order to support QEMU live-migration).

The rbd_lock_exclusive/rbd_lock_shared are essentially advisory locks.
Nothing stops a client from accessing the image if it doesn't attempt
to acquire the lock (even if another client already has the image
locked exclusively).  It also doesn't support automatic recovery from
failed clients.

-- 
Jason



Re: [Qemu-devel] [RFC v3] translate-all: protect code_gen_buffer with RCU

2016-04-26 Thread Richard Henderson
On 04/25/2016 11:35 PM, Alex Bennée wrote:
> 
> Richard Henderson  writes:
> 
>> On 04/25/2016 04:46 PM, Emilio G. Cota wrote:
>>> +/*
>>> + * write the prologue into buf2. This is safe because we'll later call
>>> + * tcg_prologue_init on buf1, from which we'll start execution.
>>> + */
>>> +tcg_ctx.code_gen_buffer = code_gen_buf2;
>>> +tcg_prologue_init(_ctx);
>>> +
>>
>> Ah, no.  Write only one prologue, not one per buffer.
>>
>> If they're sufficiently close (i.e. one allocation under the max size),
>> then the same one can be used for both halves.
>>
>> The global variables that you didn't see in this revision are:
>>
>> aarch64/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> arm/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> i386/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> ia64/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> ia64/tcg-target.inc.c:tcg_insn_unit *thunks[8] = { };
>> mips/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> ppc/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> s390/tcg-target.inc.c:static tcg_insn_unit *tb_ret_addr;
>> sparc/tcg-target.inc.c:static tcg_insn_unit *qemu_ld_trampoline[16];
>> sparc/tcg-target.inc.c:static tcg_insn_unit *qemu_st_trampoline[16];
> 
> Aside from the existing code structure is there any reason to have only
> one prologue? 

Well, there's also the gdb jit unwind info.  But aside from those, no.

> It doesn't seem to be a large amount of code and in the
> case of having smaller translation regions I would posit having a
> "local" prologue/epilogue would make the jumps cheaper.

Not really.  The jumps are generally in range already, based on the restriction
in max buffer size.

Really only arm32 (and ppc32, post direct jump atomicity patchset) are the only
ones that require a tiny (less than 64MB) buffer.  Anything bigger than 64MB, I
don't see any reason to create two independent buffers.

The other consideration not yet mentioned is that you'd like to put on the
entire buffer, in the case of x86_64 and some others, within 2GB of the main
executable, so that helper calls can use a direct call insn.



r~



Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Jan Kiszka
On 2016-04-26 16:59, Radim Krčmář wrote:
> 2016-04-26 16:24+0200, Jan Kiszka:
>> On 2016-04-26 13:40, Peter Xu wrote:
>>> Currently, all the interrupts will be translated into one MSI in
>>> vtd_generate_msi_message(), in which only 8 bits of dest_id is used
>>> (msg.dest = irq->dest). We may possibly need to use the high 32 bits
>>> of MSI address to store the rest dest[31:8]? Don't know whether this
>>> would be enough though.
>>
>> Yes, I ran into this topic as well as I hacked up those line. Currently,
>> KVM does not support more than 254 vCPUs, so 8 bits of those 32 are
>> actually fine, and piggy-backing them in an MSI message works.
>>
>> Once KVM supports more CPUs, it has to come up with a new userspace
>> interface to inject APIC events for more than 255 CPUs. Maybe the
>> existing direct MSI inject with its unused flags could be "bended",
>> maybe there are already better ideas (Radim?).
> 
> Adding a flag to msi_msg and taking 3-4 bytes from padding to express
> x2APIC addresses is reasonable.  (It is what my prototype did. :])
> 
> The conceptually different idea is forcing all userspace interrupts
> through irqfd routes, which would obsolete the ad-host inject.

irqfd for userspace sources is a bit clumsy from the API POV. On the
other hand, we need to tweak the routing API anyway to achieve the same
address extension there, too.

Jan

> 
>> In any case, the KVM layer in userspace will then have to pick up all 32
>> destination bits from the IRTE and deliver them via that new interface
>> to the in-kernel APICs.
> 
> I'll keep that in mind, thanks.
> 




Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Radim Krčmář
2016-04-26 16:24+0200, Jan Kiszka:
> On 2016-04-26 13:40, Peter Xu wrote:
>> Currently, all the interrupts will be translated into one MSI in
>> vtd_generate_msi_message(), in which only 8 bits of dest_id is used
>> (msg.dest = irq->dest). We may possibly need to use the high 32 bits
>> of MSI address to store the rest dest[31:8]? Don't know whether this
>> would be enough though.
> 
> Yes, I ran into this topic as well as I hacked up those line. Currently,
> KVM does not support more than 254 vCPUs, so 8 bits of those 32 are
> actually fine, and piggy-backing them in an MSI message works.
> 
> Once KVM supports more CPUs, it has to come up with a new userspace
> interface to inject APIC events for more than 255 CPUs. Maybe the
> existing direct MSI inject with its unused flags could be "bended",
> maybe there are already better ideas (Radim?).

Adding a flag to msi_msg and taking 3-4 bytes from padding to express
x2APIC addresses is reasonable.  (It is what my prototype did. :])

The conceptually different idea is forcing all userspace interrupts
through irqfd routes, which would obsolete the ad-host inject.

> In any case, the KVM layer in userspace will then have to pick up all 32
> destination bits from the IRTE and deliver them via that new interface
> to the in-kernel APICs.

I'll keep that in mind, thanks.



Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume

2016-04-26 Thread Alex Williamson
On Tue, 26 Apr 2016 11:39:02 +0800
Chen Fan  wrote:

> On 04/14/2016 09:02 AM, Chen Fan wrote:
> >
> > On 04/12/2016 05:38 AM, Alex Williamson wrote:  
> >> On Tue, 5 Apr 2016 19:42:02 +0800
> >> Cao jin  wrote:
> >>  
> >>> From: Chen Fan 
> >>>
> >>> for supporting aer recovery, host and guest would run the same aer
> >>> recovery code, that would do the secondary bus reset if the error
> >>> is fatal, the aer recovery process:
> >>>1. error_detected
> >>>2. reset_link (if fatal)
> >>>3. slot_reset/mmio_enabled
> >>>4. resume
> >>>
> >>> it indicates that host will do secondary bus reset to reset
> >>> the physical devices under bus in step 2, that would cause
> >>> devices in D3 status in a short time. but in qemu, we register
> >>> an error detected handler, that would be invoked as host broadcasts
> >>> the error-detected event in step 1, in order to avoid guest do
> >>> reset_link when host do reset_link simultaneously. it may cause
> >>> fatal error. we introduce a resmue notifier to assure host reset
> >>> completely. then do guest aer injection.  
> >> Why is it safe to continue running the VM between the error detected
> >> notification and the resume notification?  We're just pushing back the
> >> point at which we inject the AER into the guest, potentially negating
> >> any benefit by allowing the VM to consume bad data.  Shouldn't we
> >> instead be immediately notifying the VM on error detected, but stalling
> >> any access to the device until resume is signaled?  How do we know that
> >> resume will ever be signaled?  We have both the problem that we may be
> >> running on an older kernel that won't support a resume notification and
> >> the problem that seeing a resume notification depends on the host being
> >> able to successfully complete a link reset after fatal error. We can
> >> detect support for resume notification, but we still need a strategy
> >> for never receiving it.  Thanks,  
> > That's make sense, but I haven't came up with a good idea. do you have
> > any idea, Alex?

I don't know that there are any good solutions here.  We need to
respond to the current error notifier interrupt and not regress from
our support there.  I think that means that if we want to switch from a
simple halt-on-error to a mechanism for the guest to handle recovery,
we need to disable access to the device between being notified that the
error occurred and being notified to resume.  We can do that by
disabling mmaps to the device and preventing access via the slow path
handlers.  I don't know what the best solution is for preventing access,
do we block and pause the VM or do we drop writes and return -1 for
reads, that's something that needs to be determined.  We also need to
inject the AER into the VM at the point we're notified of an error
because the VM needs to know as soon as possible to stop using the
device or trusting any data from it.  The next coordination point would
be something like the resume notifier that you've added and there are
numerous questions around the interaction of that with the guest
handling.  Clearly we can't do a guest directed bus reset until we get
the resume notifier, so do we block that execution path in QEMU until
the resume notification is received?  What happens if we don't get that
notification?  Is there any way that we can rely on the host having
done a bus reset to the point where we don't need to act on the guest
directed reset?  These are all things that need to be figured out.
Thanks,

Alex



Re: [Qemu-devel] [PATCH] iotests: fix the redirection order in 083

2016-04-26 Thread Eric Blake
On 04/26/2016 04:13 AM, Wei Jiangang wrote:
> It should redirect stdout to /dev/null first,
> then redirect stderr to whatever stdout currently points at.
> 
> Signed-off-by: Wei Jiangang 
> ---
>  tests/qemu-iotests/083 | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Jan Kiszka
On 2016-04-26 13:40, Peter Xu wrote:
> On Tue, Apr 26, 2016 at 12:51:44PM +0200, Jan Kiszka wrote:
>> On 2016-04-26 12:38, Peter Xu wrote:
>> Hi, Jan,
>>
>> The above issue should be caused by EOI missing of level-triggered
>> interrupts. Before that, I was always using edge-triggered
>> interrupts for test, so didn't encounter this one. Would you please
>> help try below patch? It can be applied directly onto the series,
>> and should solve the issue (it works on my test vm, and I'll take it
>> in v5 as well if it also works for you):
>>
>
> Works here as well. I even made EIM working with some hack, though
> Jailhouse spits out strange warnings, despite it works fine (x2apic
> mode, split irqchip).

 Corrections: the warnings are issued by qemu, not Jailhouse, e.g.

 qemu-system-x86_64: VT-d Failed to remap interrupt for gsi 22.

 I suspect that comes from the hand-over phase of Jailhouse, when it
 mutes all interrupts in the system while reconfiguring IR and IOAPIC.

 Please convert this error (in kvm_arch_fixup_msi_route) into a trace
 point. It shall not annoy the host. Also check if you have more of such
 guest-triggerable error messages.
>>>
>>> Okay. This should be the only one. I can use trace instead.
>>>
>>> Meanwhile, I still suppose we should not seen it even with
>>> error_report().. Would this happen when boot e.g. generic kernels?
>>
>> No, this is - most probably, still need to validate in details - an
>> effect due to the special case with Jailhouse that interrupt and VT-d
>> management is handed over from a running OS to a hypervisor.
>>
>> Jan
>>
>> PS: Current quick hack for EIM support (lacks compat mode blocking at
>> least):
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 1082ab5..709a92c 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -907,6 +907,7 @@ static void 
>> vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
>>  value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
>>  s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
>>  s->intr_root = value & VTD_IRTA_ADDR_MASK;
>> +s->intr_eime = value & VTD_IRTA_EIME;
>>  
>>  QLIST_FOREACH(notifier, >iec_notifiers, list) {
>>  if (notifier->iec_notify) {
>> @@ -2052,11 +2053,13 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, 
>> uint16_t index, VTDIrq *irq
>>  irq->trigger_mode = irte.trigger_mode;
>>  irq->vector = irte.vector;
>>  irq->delivery_mode = irte.delivery_mode;
>> -/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
>> +irq->dest = irte.dest_id;
>> +if (!iommu->intr_eime) {
>>  #define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
>>  #define  VTD_IR_APIC_DEST_SHIFT(8)
>> -irq->dest = (irte.dest_id & VTD_IR_APIC_DEST_MASK) >> \
>> -VTD_IR_APIC_DEST_SHIFT;
>> +irq->dest = (irq->dest & VTD_IR_APIC_DEST_MASK) >>
>> +VTD_IR_APIC_DEST_SHIFT;
>> +}
>>  irq->dest_mode = irte.dest_mode;
>>  irq->redir_hint = irte.redir_hint;
>>  
>> @@ -2316,7 +2319,7 @@ static void vtd_init(IntelIOMMUState *s)
>>  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
>>  
>>  if (ms->iommu_intr) {
>> -s->ecap |= VTD_ECAP_IR;
>> +s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
>>  }
>>  
>>  vtd_reset_context_cache(s);
>> @@ -2370,10 +2373,9 @@ static void vtd_init(IntelIOMMUState *s)
>>  vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000ULL);
>>  
>>  /*
>> - * Interrupt remapping registers, not support extended interrupt
>> - * mode for now.
>> + * Interrupt remapping registers.
>>   */
>> -vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf00fULL, 0);
>> +vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf80fULL, 0);
>>  }
>>  
>>  /* Should not reset address_spaces when reset because devices will still use
>> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
>> index 10c20fe..72b0114 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -176,6 +176,7 @@
>>  
>>  /* IRTA_REG */
>>  #define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
>> +#define VTD_IRTA_EIME   (1ULL << 11)
>>  #define VTD_IRTA_SIZE_MASK  (0xfULL)
>>  
>>  /* ECAP_REG */
>> @@ -184,6 +185,7 @@
>>  #define VTD_ECAP_QI (1ULL << 1)
>>  /* Interrupt Remapping support */
>>  #define VTD_ECAP_IR (1ULL << 3)
>> +#define VTD_ECAP_EIM(1ULL << 4)
>>  
>>  /* CAP_REG */
>>  /* (offset >> 4) << 24 */
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> index d7613af..c6c53c6 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -261,6 +261,7 @@ struct IntelIOMMUState {
>>  bool intr_enabled;  /* Whether guest enabled IR */
>>  dma_addr_t intr_root;

Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Radim Krčmář
2016-04-26 15:34+0800, Peter Xu:
> Hi, Jan,
> 
> The above issue should be caused by EOI missing of level-triggered
> interrupts. Before that, I was always using edge-triggered
> interrupts for test, so didn't encounter this one. Would you please
> help try below patch? It can be applied directly onto the series,
> and should solve the issue (it works on my test vm, and I'll take it
> in v5 as well if it also works for you):
> 
> -
> 
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int 
> size)
> +/*
> + * This is to satisfy the hack in Linux kernel. One hack of it is to
> + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> + * following:
> + *
> + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> + * Otherwise, we simulate the EOI message manually by changing the trigger
> + * mode to edge and then back to level, with RTE being masked during
> + * this."
> + *
> + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> + *
> + * This is based on the assumption that, Remote IRR bit will be
> + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> + * believe that's what the IOAPIC version 0x1X hardware does).

I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
on EOI broadcast from LAPIC -- does that change with IR?

> + * So
> + * if we are emulating it, we'd better do it the same here, so that
> + * the guest kernel hack will work as well on QEMU.

Totally.

> + * Without this, level-triggered interrupts in IR mode might fail to
> + * work correctly.

(I don't really understand why it worked before.)

> + */
> +static inline void
> +ioapic_fix_edge_remote_irr(uint64_t *entry)
> +{
> +if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> +/* Level triggered interrupts, make sure remote IRR is zero */
> +*entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);

(You can just unconditionally zero it, edge doesn't care.)

> +}
> +}
> +
> @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>  s->ioredtbl[index] &= ~0xULL;
>  s->ioredtbl[index] |= val;
>  }
> +ioapic_fix_edge_remote_irr(>ioredtbl[index]);

I think this can be done only in the else branch of (s->ioregsel & 1).

(If the guest kernel does level->edge->level, then remote_irr probably
 should be cleared only on edge->level transition and not on
 level->level, but I haven't seen that in the spec ...)

>  ioapic_service(s);
> 
> 
> I am still looking into guest part codes. Although the above patch
> should solve the issue, there are still issues in guest codes when
> IR is enabled:
> 
> - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>   required in vt-d spec 5.1.5.1, and required to correctly deliver
>   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():

"required" is a way of saying that the opposite is undefined.
No need to think about it in IOMMU.

> - I encountered that level-triggered entries in IOAPIC is marked as
>   edge-triggered interrupt in APIC (which is strange)...

What/where do you mean?
(The only difference I know of is that level triggered vectors in LAPIC
 have their respective TMR bit set while edge do not.)

Thanks.



Re: [Qemu-devel] Simultaneous vhostfds= and ifname= options

2016-04-26 Thread Michał Dubiel
Hi all,

Any thoughts to share on that?

Thanks,
Michal

On 15 April 2016 at 18:50, Michał Dubiel  wrote:

> Hi guys,
>
> Qemu does not allow currently to use vhostfds= and ifname= options of
> -netdev tap... at the same time. In order to use vhostfds=, you have to
> also use fds= instead. May ask you what is rationale behind that?
>
> I am asking as we need this for the OpenContrail vRouter multiqueue
> support. We are passing name of previously created tap, and vhost-net
> device descriptors (which are supplied by the libvirt).
>
> Actually I have internally a very simple patch that allows for
> simultaneous vhostfds= and ifname=. All seems to work fine with this, but
> perhaps there is something I do not know and this combination of options
> indeed should not be allowed.
>
> If there is no issues with allowing for that, I could contribute it.
>
> Regards,
> Michal
>


[Qemu-devel] [PATCH] PPC/KVM: early validation of vcpu id

2016-04-26 Thread Greg Kurz
The KVM API restricts vcpu ids to be < KVM_CAP_MAX_VCPUS. On PowerPC
targets, depending on the number of threads per core in the host and
in the guest, some topologies do generate higher vcpu ids actually.
When this happens, QEMU bails out with the following error:

kvm_init_vcpu failed: Invalid argument

The KVM_CREATE_VCPU ioctl has several EINVAL return paths, so it is
not possible to fully disambiguate.

This patch adds a check in the code that computes vcpu ids, so that
we can detect the error earlier, and print a friendlier message instead
of calling KVM_CREATE_VCPU with an obviously bogus vcpu id.

Signed-off-by: Greg Kurz 
---
 include/sysemu/kvm.h|2 ++
 kvm-all.c   |6 ++
 target-ppc/translate_init.c |8 
 3 files changed, 16 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0e18f15c9493..27bf50ef379e 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -344,6 +344,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s);
 
 int kvm_arch_init_vcpu(CPUState *cpu);
 
+bool kvm_vcpu_id_is_valid(int vcpu_id);
+
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
diff --git a/kvm-all.c b/kvm-all.c
index e7b66df197fd..3625a3e07457 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1459,6 +1459,12 @@ static int kvm_max_vcpus(KVMState *s)
 return (ret) ? ret : kvm_recommended_vcpus(s);
 }
 
+bool kvm_vcpu_id_is_valid(int vcpu_id)
+{
+KVMState *s = KVM_STATE(current_machine->accelerator);
+return vcpu_id >= 0 && vcpu_id < kvm_max_vcpus(s);
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index f51572552bc2..6c89b18a05f9 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9247,6 +9247,14 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
**errp)
 #if !defined(CONFIG_USER_ONLY)
 cpu->cpu_dt_id = (cs->cpu_index / smp_threads) * max_smt
 + (cs->cpu_index % smp_threads);
+
+if (kvm_enabled() && !kvm_vcpu_id_is_valid(cpu->cpu_dt_id)) {
+error_setg(errp, "Can't create CPU with id %d in KVM", cpu->cpu_dt_id);
+error_append_hint(errp, "Adjust the number of cpus to %d "
+  "or try to raise the number of threads per core\n",
+  cpu->cpu_dt_id * smp_threads / max_smt);
+return;
+}
 #endif
 
 if (tcg_enabled()) {




Re: [Qemu-devel] [RFC PATCH resend 00/11] Make CoMutex/CoQueue/CoRwlock thread-safe

2016-04-26 Thread Stefan Hajnoczi
On Fri, Apr 15, 2016 at 01:31:55PM +0200, Paolo Bonzini wrote:
> [this time including the mailing list]
> 
> This is yet another tiny bit of the multiqueue work, this time affecting
> the synchronization infrastructure for coroutines.  Currently, coroutines
> synchronize between the main I/O thread and the dataplane iothread through
> the AioContext lock.  However, for multiqueue a single BDS will be used
> by multiple iothreads and hence multiple AioContexts.  This calls for
> a different approach to coroutine synchronization.  This series is my
> first attempt at it.  Besides multiqueue, I think throttling groups
> (which can already span multiple AioContexts) could also take advantage
> of the new CoMutexes.
> 
> The series has two main parts:
> 
> - it makes CoMutex bind coroutines to an AioContexts.  It is of course
>   still possible to move coroutines around with explicit yield and
>   qemu_coroutine_enter, but a CoMutex will now reenter the coroutine
>   where it was running before, rather than in the AioContext of
>   the previous owner.  To do this, a new function aio_co_schedule is
>   introduced to run a coroutine on a given iothread.  I think this could
>   be used in other places too; for now it lets a CoMutex protect stuff
>   across multiple AioContexts without them moving around(*).  Of course
>   currently a CoMutex is generally not used across multiple iothreads,
>   because you have to acquire/release the AioContext around CoMutex
>   critical sections.  However...
> 
> - ... the second change is exactly to make CoMutex thread-safe and remove
>   the need for a "thread-based" mutex around it.  The new CoMutex is
>   exactly the same as a mutex implementation that you'd find in an
>   operating system.  iothreads are the moral equivalent of CPUs in
>   a kernel, while coroutines resemble kernel threads running without
>   preemption on a CPU.  Even if you have to take concurrency into account,
>   the lack of preemption while running coroutines or bottom halves
>   keeps the complexity at bay.  For example, it is easy to synchronize
>   between qemu_co_mutex_lock's yield and the qemu_coroutine_enter in
>   aio_co_schedule's bottom half.
> 
>   Same as before, CoMutex puts coroutines to sleep with
>   qemu_coroutine_yield and wake them up with the new aio_co_schedule.
>   I could have wrapped CoMutex's CoQueue with a "regular" thread mutex or
>   spinlock.  The resulting code would have looked a lot like RFifoLock
>   (with CoQueue replacing RFifoLock's condition variable).  Rather,
>   inspired by the parallel between coroutines and non-preemptive kernel
>   threads, I chose to adopt the same lock-free mutex algorithm as OSv.
>   The algorithm only needs two to four atomic ops for a lock-unlock pair
>   (two when uncontended).  To cover CoQueue, each CoQueue is made to
>   depend on a CoMutex, similar to condition variables.  Most CoQueues
>   already have a corresponding CoMutex so this is not a big deal;
>   converting the others is left for a future series.  I did this because
>   CoQueue looks a lot like OSv's waitqueues; so if necessary, we could
>   even take OSv's support for wait morphing (which avoids the thundering
>   herd problem) and add it to CoMutex and CoQueue.  This may be useful
>   when making tracked_requests thread-safe.
> 
> Kevin: this has nothing to do with my old plan from Brno, and it's
> possibly a lot closer to what you wanted.  Your idea was to automatically
> release the "thread mutex" when a coroutine yields, I think you should
> be fine with not having a thread mutex at all!
> 
> This will need some changes in the formats because, for multiqueue,
> CoMutexes would need to be used like "real" thread mutexes.  Code like
> this:
> 
> ...
> qemu_co_mutex_unlock()
> ... /* still access shared data, but don't yield */
> qemu_coroutine_yield()
> 
> might be required to use this other pattern:
> 
> ... /* access shared data, but don't yield */
> qemu_co_mutex_unlock()
> qemu_coroutine_yield()
> 
> because "adding a second CPU" is already introducing concurrency that
> wasn't there before.  The "non-preemptive multitasking" reference only
> applies to things that access AioContext-local data.  This includes the
> synchronization primitives implemented in this series, the thread pool,
> the Linux AIO support, but not much else.  It still simplifies _those_
> though. :)
> 
> Anyhow, we'll always have some BlockDriver that do not support multiqueue,
> such as the network protocols.  Thus it would be possible to handle the
> formats one at a time.  raw-posix, raw and qcow2 would already form a
> pretty good set, and the first two do not use CoMutex at all.
> 
> The patch has quite a lot of new code, but about half of it is testcases.
> The new test covers correctness and (when run with --verbose) also takes a
> stab at measuring performance; the results is that performance of CoMutex
> is comparable to pthread mutexes.  The only snag is 

Re: [Qemu-devel] [PATCH] hw/intc/arm_gic: add tracepoints

2016-04-26 Thread Stefan Hajnoczi
On Thu, Apr 21, 2016 at 08:24:41AM -0700, Hollis Blanchard wrote:
> These are obviously critical to understanding interrupt delivery:
> gic_enable_irq
> gic_disable_irq
> gic_set_irq (inbound irq from device models)
> gic_update_set_irq (outbound irq to CPU)
> gic_acknowledge_irq
> 
> The only one that I think might raise eyebrows is gic_update_bestirq, but I've
> (sadly) debugged problems that ended up being caused by unexpected priorities.
> Knowing that the GIC has an irq ready, but doesn't deliver to the CPU due to
> priority, has also proven important.
> 
> Signed-off-by: Hollis Blanchard 
> ---
>  hw/intc/arm_gic.c | 12 
>  trace-events  |  8 
>  2 files changed, 20 insertions(+)

Thanks, applied to my tracing tree (for QEMU 2.7):
https://github.com/stefanha/qemu/commits/tracing

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v14 15/19] qapi-commands: Wrap argument visit in visit_start_struct

2016-04-26 Thread Eric Blake
On 04/15/2016 05:42 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> The qmp-input visitor was playing rather fast and loose: when
> 
> I guess (some of) its *users* are playing fast and loose, and the
> visitor itself lets them.  The patch's title suggests "some of its
> users" == qapi-commands.py.
> 
>> visiting a QDict, you could grab members of the root dictionary
>> without first pushing into the dict.  But we are about to tighten
>> the input visitor, at which point the generated marshal code
>> MUST follow the same paradigms as everyone else, of pushing into
>> the struct before grabbing its keys, because the value of 'name'
>> should be ignored on the top-level visit.
>>
>> Generated code grows as follows:
>>
>> |@@ -515,7 +695,15 @@ void qmp_marshal_blockdev_backup(QDict *
>> | BlockdevBackup arg = {0};
>> |
>> | v = qmp_input_get_visitor(qiv);
>> |+visit_start_struct(v, NULL, NULL, 0, );
>> |+if (err) {
>> |+goto out;
>> |+}
>> | visit_type_BlockdevBackup_members(v, , );
>> |+if (!err) {
>> |+visit_check_struct(v, );
>> |+}
> 
> Does this find errors that weren't found before?

All that this could find is excess input, but we are already checking
for excess input prior to calling into the generated marshaling code, so
it doesn't find anything new.

>>
>> Note that this change could also make it possible for the
>> marshalling code to automatically detect excess input at the top
>> level, and not just in nested dictionaries.  However, that checking
>> is not currently useful (and we rely on the manual checking in
>> monitor.c:qmp_check_client_args() instead) as long as qmp-commands.hx
>> uses .args_type, and as long as we have 'name:O' as an arg-type that
>> explicitly allows unknown top-level keys because we haven't yet
>> converted 'device_add' and 'netdev_add' to introspectible use of
>> 'any'.
> 
> Hmm, that explains why finding these additional errors wouldn't be
> useful.  Good to know, but doesn't quite answer my question.

I guess what it really translates to is we are now doing redundant
checking, and I should do a followup patch to simplify monitor.c.

>> @@ -150,7 +158,9 @@ out:
>>  qmp_input_visitor_cleanup(qiv);
>>  qdv = qapi_dealloc_visitor_new();
>>  v = qapi_dealloc_get_visitor(qdv);
>> +visit_start_struct(v, NULL, NULL, 0, NULL);
>>  visit_type_%(c_name)s_members(v, , NULL);
>> +visit_end_struct(v);
>>  qapi_dealloc_visitor_cleanup(qdv);
>>  ''',
>>   c_name=arg_type.c_name())
> 
> No visit_check_struct() here.  I think its contract should explicitly
> state that you may omit it when you're not interested in errors.

Indeed, calling visit_check_struct(, NULL) can't report any errors, so
skipping it should be documented as safe.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] how should a device implement an array of link(?) properties?

2016-04-26 Thread Peter Maydell
Hi; I have what seems like a fairly straightforward requirement for
a QOM device but no idea how to implement it, so I'm looking for
advice on the right "modern" way to do it...

Specifically, this is the GICv3 device. It would like to have
a pointer to every CPU object it needs to be connected to.
My guess is that this should be an array of QOM link properties
of some kind, so that the board level code can do:
 * create the gicv3 device
 * set the 'num-cpu' property
 * for each cpu: set the cpu[i] property to that CPU
 * realize the device

But how should I implement this in the device? There are
qdev array properties, but there's no link qdev property type.
Or should I not be using a link property at all?
Suggestions for the right approach welcome.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] hw/net/virtio-net: Allocating Large sized arrays to heap

2016-04-26 Thread Michael S. Tsirkin
On Tue, Apr 26, 2016 at 04:05:24PM +0800, Zhou Jie wrote:
> virtio_net_flush_tx has a huge stack usage of 16392 bytes approx.
> Moving large arrays to heap to reduce stack usage.
> 
> Signed-off-by: Zhou Jie 

I don't think it's appropriate for trivial.
Also - what's the point, exactly?

> ---
>  hw/net/virtio-net.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 5798f87..cab7bbc 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1213,6 +1213,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  VirtIONet *n = q->n;
>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>  VirtQueueElement *elem;
> +struct iovec *sg = NULL, *sg2 = NULL;
>  int32_t num_packets = 0;
>  int queue_index = vq2q(virtio_get_queue_index(q->tx_vq));
>  if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> @@ -1224,10 +1225,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  return num_packets;
>  }
>  
> +sg = g_new(struct iovec, VIRTQUEUE_MAX_SIZE);
> +sg2 = g_new(struct iovec, VIRTQUEUE_MAX_SIZE + 1);
>  for (;;) {
>  ssize_t ret;
>  unsigned int out_num;
> -struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
> *out_sg;
> +struct iovec *out_sg;
>  struct virtio_net_hdr_mrg_rxbuf mhdr;
>  
>  elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
> @@ -1252,7 +1255,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  virtio_net_hdr_swap(vdev, (void *) );
>  sg2[0].iov_base = 
>  sg2[0].iov_len = n->guest_hdr_len;
> -out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1,
> +out_num = iov_copy([1], VIRTQUEUE_MAX_SIZE,
> out_sg, out_num,
> n->guest_hdr_len, -1);
>  if (out_num == VIRTQUEUE_MAX_SIZE) {
> @@ -1269,10 +1272,10 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>   */
>  assert(n->host_hdr_len <= n->guest_hdr_len);
>  if (n->host_hdr_len != n->guest_hdr_len) {
> -unsigned sg_num = iov_copy(sg, ARRAY_SIZE(sg),
> +unsigned sg_num = iov_copy(sg, VIRTQUEUE_MAX_SIZE,
> out_sg, out_num,
> 0, n->host_hdr_len);
> -sg_num += iov_copy(sg + sg_num, ARRAY_SIZE(sg) - sg_num,
> +sg_num += iov_copy(sg + sg_num, VIRTQUEUE_MAX_SIZE - sg_num,
>   out_sg, out_num,
>   n->guest_hdr_len, -1);
>  out_num = sg_num;
> @@ -1284,6 +1287,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  if (ret == 0) {
>  virtio_queue_set_notification(q->tx_vq, 0);
>  q->async_tx.elem = elem;
> +g_free(sg);
> +g_free(sg2);
>  return -EBUSY;
>  }
>  
> @@ -1296,6 +1301,8 @@ drop:
>  break;
>  }
>  }
> +g_free(sg);
> +g_free(sg2);
>  return num_packets;
>  }
>  
> -- 
> 2.5.5
> 
> 



Re: [Qemu-devel] [RFC 00/13] Multiple fd migration support

2016-04-26 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> "Dr. David Alan Gilbert"  wrote:
> > * Juan Quintela (quint...@redhat.com) wrote:
> >> Hi
> >> 
> >> This patch series is "an" initial implementation of multiple fd migration.
> >> This is to get something out for others to comment, it is not finished at 
> >> all.
> >
> > I've had a quick skim:
> >   a) I think mst is right about the risk of getting stale pages out of 
> > order.
> 
> I have been thinking about this.  We just need to send a "we have
> finish" this round packet.  And reception has to wait for all threads to
> finish before continue.  It is easier and not expensive.  We never
> resend the same page during the same round. 

Yes.

> >   b) Since you don't change the URI at all, it's a bit restricted; for 
> > example,
> >  it means I can't run separate sessions over different NICs unless I've
> >  done something clever at the routing/or bonded them.
> >  One thing I liked the sound of multi-fd for is NUMA; get a BIG box
> >  and give each numa node a separate NIC and run a separate thread on 
> > each
> >  node.
> 
> If we want this _how_ we want to configure it.  This was part of the
> reason to post the patch.  It works only for tcp, I don't even try the
> others, just to see what people want.

I was thinking this would work even for TCP; you'd just need a way to pass
different URIs (with address/port) for each connection.

> >   c) Hmm we do still have a single thread doing all the bitmap syncing and 
> > scanning,
> >  we'll have to watch out if that is the bottleneck at all.
> 
> Yeap.  My idea here was to still maintain the bitmap scanning on the
> main thread, but send work to the "worker threads" in batches, not in
> single pages.  But I haven't really profiled how long we spend there.

Yeh, it would be interesting to see what this profile looked like; if we
suddenly found that main thread had spare cycles perhaps we could do some
more interesting types of scanning.

> >   d) All the zero testing is still done in the main thread which we know is
> >  expensive.
> 
> Not trivial if we don't want to send control information over the
> "other" channels.  One solution would be split the main memory in
> different "main" threads.  No performance profiles.

Yes, and it's tricky because the order is:
   1) Send control information
   2) Farm it out to individual thread

  It's too late for '2' to say 'it's zero'.

> >   e) Do we need to do something for security with having multiple ports? How
> >  do we check that nothing snuck in on one of our extra ports, have we 
> > got
> >  sanity checks to make sure it's actually the right stream.
> 
> 
> We only have a single port.  We opened it several times.  It shouldn't
> require changes in either libvirt/firewall.  (Famous last words)

True I guess.

> 
> >   f) You're handing out pages to the sending threads on the basis of which 
> > one
> >  is free (in the same way as the multi threaded compression); but I 
> > think
> >  it needs some sanity adding to only hand out whole host pages - it 
> > feels
> >  like receiving all the chunks of one host page down separate FDs would
> >  be horrible.
> 
> Trivial optimization would be to send _whole_ huge pages in one go.  I
> wanted comments about what people wanted here.  My idea was really to
> add multipage or several pages in one go.  Would reduce synchronization
> a lot.   I do to the 1st that becomes free because .. I don't know
> how long a specific transmission is going to take.  TCP for you :-(

Sending huge pages would be very nice; the tricky thing is you don't want to 
send
a huge page unless it's all marked dirty.

> >   g) I think you might be able to combine the compression into the same 
> > threads;
> >  so that if multi-fd + multi-threaded-compresison is set you don't end
> >  up with 2 sets of threads and it might be the simplest way to make them
> >  work together.
> 
> Yeap, I thought that.  But I didn't want to merge them in a first
> stage.  It makes much more sense to _not_ send the compressed data
> through the main channel.  But that would be v2 (or 3, or 4 ...)

Right.

> >   h) You've used the last free RAM_SAVE_FLAG!  And the person who takes the 
> > last
> >  slice^Wbit has to get some more.
> >  Since arm, ppc, and 68k have variants that have TARGET_PAGE_BITS 10  
> > that
> >  means we're full; I suggest what you do is use that flag to mean that 
> > we
> >  send another 64bit word; and in that word you use the bottom 7 bits for
> >  the fd index and bit 7 is set to indicate it's fd.  The other bits are 
> > sent
> >  as zero and available for the next use.
> >  Either that or start combining with some other flags.
> >  (I may have a use for some more bits in mind!)
> 
> Ok.  I can looke at that.
> 
> >   i) Is this safe for xbzrle - what happens to the cache (or is it all
> >  still the main 

Re: [Qemu-devel] Updating documentation at http://wiki.qemu.org/download/qemu-doc.html

2016-04-26 Thread Jeff Cody
On Tue, Apr 26, 2016 at 11:02:45AM +0100, Stefan Hajnoczi wrote:
> On Tue, Apr 26, 2016 at 07:50:31AM +0100, Mark Cave-Ayland wrote:
> > Does anyone know if it's possible to update the documentation at the
> > above URL? It appears as a link at http://wiki.qemu-project.org/Manual
> > and seems to be the first hit for most people looking for information on
> > QEMU SPARC emulation, which is frustrating as the information there is
> > well out of date. Presumably this is because the documentation is hosted
> > on the qemu.org domain itself and so ranks higher?
> 
> Hi Mark,
> I've CCed Jeff Cody, who is taking over as system administrator for
> QEMU.  The http://wiki.qemu.org/download/qemu-doc.html link is just a
> static page from 2010.
> 
> It should be possible to rebuild the HTML docs and replace the file.
> Even better would be to update it from a cron job.
> 
> Stefan

Thanks.

I've update the qemu-doc.html at the above link with the latest output of
"make qemu-doc.html" from git master.  I'll set up a daily cronjob to take care
of that automatically, as well.

Jeff



Re: [Qemu-devel] [Qemu-discuss] iolimits for virtio-9p

2016-04-26 Thread Alberto Garcia
On Tue 19 Apr 2016 02:09:24 PM CEST, Pradeep Kiruvale wrote:

> We are planning to implement the io-limits for the virtio-9p driver
> i.e for fsdev devices.
> So, I am looking into the code base and how it has done for the block
> io devices.
>
> I would like to know how difficult is this and is there some one out
> there who has any plan to do this?

Hi,

as Stefan said already, the common API is in throttle.h.

It should be generic enough to be used in other parts of QEMU, but tell
me if you feel that it needs changes.

Once you configure the throttling settings you essentially only need to
call throttle_schedule_timer() to see if a request needs to be
throttled, and afterwards throttle_account() to register the I/O that
has been peformed.

You'll see that there's also throttle-group.c, but that's specific to
the block layer and not meant to be generic.

Berto



Re: [Qemu-devel] [PATCH v4 2/3] hw/arm/virt: Add PMU node for virt machine

2016-04-26 Thread Andrew Jones
On Tue, Apr 26, 2016 at 07:40:45PM +0800, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Add a virtual PMU device for virt machine while use PPI 7 for PMU
> overflow interrupt number.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  hw/arm/virt.c | 33 +
>  include/hw/arm/virt.h |  4 
>  include/sysemu/kvm.h  |  1 +
>  stubs/kvm.c   |  5 +
>  target-arm/kvm64.c| 41 +
>  5 files changed, 84 insertions(+)

Reviewed-by: Andrew Jones 

> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 56d35c7..376cb87 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -428,6 +428,37 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi, int 
> type)
>  qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", vbi->gic_phandle);
>  }
>  
> +static void fdt_add_pmu_nodes(const VirtBoardInfo *vbi, int gictype)
> +{
> +CPUState *cpu;
> +ARMCPU *armcpu;
> +uint32_t irqflags = GIC_FDT_IRQ_FLAGS_LEVEL_HI;
> +
> +CPU_FOREACH(cpu) {
> +armcpu = ARM_CPU(cpu);
> +if (!armcpu->has_pmu ||
> +!kvm_arm_pmu_create(cpu, PPI(VIRTUAL_PMU_IRQ))) {
> +return;
> +}
> +}
> +
> +if (gictype == 2) {
> +irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
> + GIC_FDT_IRQ_PPI_CPU_WIDTH,
> + (1 << vbi->smp_cpus) - 1);
> +}
> +
> +armcpu = ARM_CPU(qemu_get_cpu(0));
> +qemu_fdt_add_subnode(vbi->fdt, "/pmu");
> +if (arm_feature(>env, ARM_FEATURE_V8)) {
> +const char compat[] = "arm,armv8-pmuv3";
> +qemu_fdt_setprop(vbi->fdt, "/pmu", "compatible",
> + compat, sizeof(compat));
> +qemu_fdt_setprop_cells(vbi->fdt, "/pmu", "interrupts",
> +   GIC_FDT_IRQ_TYPE_PPI, VIRTUAL_PMU_IRQ, 
> irqflags);
> +}
> +}
> +
>  static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic)
>  {
>  int i;
> @@ -1246,6 +1277,8 @@ static void machvirt_init(MachineState *machine)
>  
>  create_gic(vbi, pic, gic_version, vms->secure);
>  
> +fdt_add_pmu_nodes(vbi, gic_version);
> +
>  create_uart(vbi, pic, VIRT_UART, sysmem);
>  
>  if (vms->secure) {
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index ecd8589..b50f095 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -40,6 +40,10 @@
>  #define ARCH_TIMER_NS_EL1_IRQ 14
>  #define ARCH_TIMER_NS_EL2_IRQ 10
>  
> +#define VIRTUAL_PMU_IRQ 7
> +
> +#define PPI(irq) ((irq) + 16)
> +
>  enum {
>  VIRT_FLASH,
>  VIRT_MEM,
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 0e18f15..4522043 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -523,4 +523,5 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void 
> *source);
>   * Returns: 0 on success, or a negative errno on failure.
>   */
>  int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target);
> +int kvm_arm_pmu_create(CPUState *cs, int irq);
>  #endif
> diff --git a/stubs/kvm.c b/stubs/kvm.c
> index ddd6204..667e269 100644
> --- a/stubs/kvm.c
> +++ b/stubs/kvm.c
> @@ -6,3 +6,8 @@ int kvm_arch_irqchip_create(MachineState *ms, KVMState *s)
>  {
>  return 0;
>  }
> +
> +int kvm_arm_pmu_create(CPUState *cs, int irq)
> +{
> +return 0;
> +}
> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
> index b364789..893f983 100644
> --- a/target-arm/kvm64.c
> +++ b/target-arm/kvm64.c
> @@ -382,6 +382,47 @@ static CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, 
> target_ulong addr)
>  return NULL;
>  }
>  
> +static bool kvm_arm_pmu_support_ctrl(CPUState *cs, struct kvm_device_attr 
> *attr)
> +{
> +return kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr) == 0;
> +}
> +
> +int kvm_arm_pmu_create(CPUState *cs, int irq)
> +{
> +int err;
> +
> +struct kvm_device_attr attr = {
> +.group = KVM_ARM_VCPU_PMU_V3_CTRL,
> +.addr = (intptr_t),
> +.attr = KVM_ARM_VCPU_PMU_V3_IRQ,
> +.flags = 0,
> +};
> +
> +if (!kvm_arm_pmu_support_ctrl(cs, )) {
> +return 0;
> +}
> +
> +err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, );
> +if (err < 0) {
> +fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
> +strerror(-err));
> +abort();
> +}
> +
> +attr.group = KVM_ARM_VCPU_PMU_V3_CTRL;
> +attr.attr = KVM_ARM_VCPU_PMU_V3_INIT;
> +attr.addr = 0;
> +attr.flags = 0;
> +
> +err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, );
> +if (err < 0) {
> +fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
> +strerror(-err));
> +abort();
> +}
> +
> +return 1;
> +}
>  
>  static inline void set_feature(uint64_t *features, int feature)
>  {
> -- 
> 2.0.4
> 
> 
> 



[Qemu-devel] [PATCH v4 0/3] Add guest PMU in machine virt

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

KVM-ARM64 supports guest PMU now. This series add the support in machine
virt so that guest could use PMU.

The ACPI part is tested with below guest kernel patches.
https://lkml.org/lkml/2016/4/12/755

Changes since v3:
* if kvm_arm_pmu_create returns a failure, don't create pmu dts node for
  guest

Changes since v2:
* address Andrew's comments on PATCH 2, thanks

Changes since v1:
* rebase on master
* Address Andrew's comments, add a macro PPI, fix code style, add
  cpu_to_le32()

Shannon Zhao (3):
  target-arm: kvm64: set guest PMUv3 feature bit if supported
  hw/arm/virt: Add PMU node for virt machine
  hw/arm/virt-acpi-build: Add PMU IRQ number in ACPI table

 hw/arm/virt-acpi-build.c |  3 +++
 hw/arm/virt.c| 33 +
 include/hw/arm/virt.h|  4 
 include/sysemu/kvm.h |  1 +
 stubs/kvm.c  |  5 +
 target-arm/cpu-qom.h |  2 ++
 target-arm/kvm64.c   | 46 ++
 7 files changed, 94 insertions(+)

-- 
2.0.4





[Qemu-devel] [PATCH v4 1/3] target-arm: kvm64: set guest PMUv3 feature bit if supported

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Check if kvm supports guest PMUv3. If so, set the corresponding feature
bit for vcpu.

Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 target-arm/cpu-qom.h | 2 ++
 target-arm/kvm64.c   | 5 +
 2 files changed, 7 insertions(+)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 1061c08..93aa6a4 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -105,6 +105,8 @@ typedef struct ARMCPU {
 bool powered_off;
 /* CPU has security extension */
 bool has_el3;
+/* CPU has PMU (Performance Monitor Unit) */
+bool has_pmu;
 
 /* CPU has memory protection unit */
 bool has_mpu;
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index e8527bf..b364789 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -461,6 +461,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (!arm_feature(>env, ARM_FEATURE_AARCH64)) {
 cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
 }
+if (kvm_irqchip_in_kernel() &&
+kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
+cpu->has_pmu = true;
+cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PMU_V3;
+}
 
 /* Do KVM_ARM_VCPU_INIT ioctl */
 ret = kvm_arm_vcpu_init(cs);
-- 
2.0.4





[Qemu-devel] [PATCH v4 2/3] hw/arm/virt: Add PMU node for virt machine

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Add a virtual PMU device for virt machine while use PPI 7 for PMU
overflow interrupt number.

Signed-off-by: Shannon Zhao 
---
 hw/arm/virt.c | 33 +
 include/hw/arm/virt.h |  4 
 include/sysemu/kvm.h  |  1 +
 stubs/kvm.c   |  5 +
 target-arm/kvm64.c| 41 +
 5 files changed, 84 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 56d35c7..376cb87 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -428,6 +428,37 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi, int type)
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", vbi->gic_phandle);
 }
 
+static void fdt_add_pmu_nodes(const VirtBoardInfo *vbi, int gictype)
+{
+CPUState *cpu;
+ARMCPU *armcpu;
+uint32_t irqflags = GIC_FDT_IRQ_FLAGS_LEVEL_HI;
+
+CPU_FOREACH(cpu) {
+armcpu = ARM_CPU(cpu);
+if (!armcpu->has_pmu ||
+!kvm_arm_pmu_create(cpu, PPI(VIRTUAL_PMU_IRQ))) {
+return;
+}
+}
+
+if (gictype == 2) {
+irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
+ GIC_FDT_IRQ_PPI_CPU_WIDTH,
+ (1 << vbi->smp_cpus) - 1);
+}
+
+armcpu = ARM_CPU(qemu_get_cpu(0));
+qemu_fdt_add_subnode(vbi->fdt, "/pmu");
+if (arm_feature(>env, ARM_FEATURE_V8)) {
+const char compat[] = "arm,armv8-pmuv3";
+qemu_fdt_setprop(vbi->fdt, "/pmu", "compatible",
+ compat, sizeof(compat));
+qemu_fdt_setprop_cells(vbi->fdt, "/pmu", "interrupts",
+   GIC_FDT_IRQ_TYPE_PPI, VIRTUAL_PMU_IRQ, 
irqflags);
+}
+}
+
 static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 int i;
@@ -1246,6 +1277,8 @@ static void machvirt_init(MachineState *machine)
 
 create_gic(vbi, pic, gic_version, vms->secure);
 
+fdt_add_pmu_nodes(vbi, gic_version);
+
 create_uart(vbi, pic, VIRT_UART, sysmem);
 
 if (vms->secure) {
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ecd8589..b50f095 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -40,6 +40,10 @@
 #define ARCH_TIMER_NS_EL1_IRQ 14
 #define ARCH_TIMER_NS_EL2_IRQ 10
 
+#define VIRTUAL_PMU_IRQ 7
+
+#define PPI(irq) ((irq) + 16)
+
 enum {
 VIRT_FLASH,
 VIRT_MEM,
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0e18f15..4522043 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -523,4 +523,5 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void 
*source);
  * Returns: 0 on success, or a negative errno on failure.
  */
 int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target);
+int kvm_arm_pmu_create(CPUState *cs, int irq);
 #endif
diff --git a/stubs/kvm.c b/stubs/kvm.c
index ddd6204..667e269 100644
--- a/stubs/kvm.c
+++ b/stubs/kvm.c
@@ -6,3 +6,8 @@ int kvm_arch_irqchip_create(MachineState *ms, KVMState *s)
 {
 return 0;
 }
+
+int kvm_arm_pmu_create(CPUState *cs, int irq)
+{
+return 0;
+}
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index b364789..893f983 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -382,6 +382,47 @@ static CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, 
target_ulong addr)
 return NULL;
 }
 
+static bool kvm_arm_pmu_support_ctrl(CPUState *cs, struct kvm_device_attr 
*attr)
+{
+return kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr) == 0;
+}
+
+int kvm_arm_pmu_create(CPUState *cs, int irq)
+{
+int err;
+
+struct kvm_device_attr attr = {
+.group = KVM_ARM_VCPU_PMU_V3_CTRL,
+.addr = (intptr_t),
+.attr = KVM_ARM_VCPU_PMU_V3_IRQ,
+.flags = 0,
+};
+
+if (!kvm_arm_pmu_support_ctrl(cs, )) {
+return 0;
+}
+
+err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, );
+if (err < 0) {
+fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
+strerror(-err));
+abort();
+}
+
+attr.group = KVM_ARM_VCPU_PMU_V3_CTRL;
+attr.attr = KVM_ARM_VCPU_PMU_V3_INIT;
+attr.addr = 0;
+attr.flags = 0;
+
+err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, );
+if (err < 0) {
+fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
+strerror(-err));
+abort();
+}
+
+return 1;
+}
 
 static inline void set_feature(uint64_t *features, int feature)
 {
-- 
2.0.4





[Qemu-devel] [PATCH v4 3/3] hw/arm/virt-acpi-build: Add PMU IRQ number in ACPI table

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Add PMU IRQ number in ACPI table, then we can use PMU in guest through
ACPI.

Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 hw/arm/virt-acpi-build.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f51fe39..5031232 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -491,6 +491,9 @@ build_madt(GArray *table_data, GArray *linker, 
VirtGuestInfo *guest_info)
 gicc->arm_mpidr = armcpu->mp_affinity;
 gicc->uid = i;
 gicc->flags = cpu_to_le32(ACPI_GICC_ENABLED);
+
+if (armcpu->has_pmu)
+gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
 }
 
 if (guest_info->gic_version == 3) {
-- 
2.0.4





Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Peter Xu
On Tue, Apr 26, 2016 at 12:51:44PM +0200, Jan Kiszka wrote:
> On 2016-04-26 12:38, Peter Xu wrote:
>  Hi, Jan,
> 
>  The above issue should be caused by EOI missing of level-triggered
>  interrupts. Before that, I was always using edge-triggered
>  interrupts for test, so didn't encounter this one. Would you please
>  help try below patch? It can be applied directly onto the series,
>  and should solve the issue (it works on my test vm, and I'll take it
>  in v5 as well if it also works for you):
> 
> >>>
> >>> Works here as well. I even made EIM working with some hack, though
> >>> Jailhouse spits out strange warnings, despite it works fine (x2apic
> >>> mode, split irqchip).
> >>
> >> Corrections: the warnings are issued by qemu, not Jailhouse, e.g.
> >>
> >> qemu-system-x86_64: VT-d Failed to remap interrupt for gsi 22.
> >>
> >> I suspect that comes from the hand-over phase of Jailhouse, when it
> >> mutes all interrupts in the system while reconfiguring IR and IOAPIC.
> >>
> >> Please convert this error (in kvm_arch_fixup_msi_route) into a trace
> >> point. It shall not annoy the host. Also check if you have more of such
> >> guest-triggerable error messages.
> > 
> > Okay. This should be the only one. I can use trace instead.
> > 
> > Meanwhile, I still suppose we should not seen it even with
> > error_report().. Would this happen when boot e.g. generic kernels?
> 
> No, this is - most probably, still need to validate in details - an
> effect due to the special case with Jailhouse that interrupt and VT-d
> management is handed over from a running OS to a hypervisor.
> 
> Jan
> 
> PS: Current quick hack for EIM support (lacks compat mode blocking at
> least):
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 1082ab5..709a92c 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -907,6 +907,7 @@ static void 
> vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
>  value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
>  s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
>  s->intr_root = value & VTD_IRTA_ADDR_MASK;
> +s->intr_eime = value & VTD_IRTA_EIME;
>  
>  QLIST_FOREACH(notifier, >iec_notifiers, list) {
>  if (notifier->iec_notify) {
> @@ -2052,11 +2053,13 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, 
> uint16_t index, VTDIrq *irq
>  irq->trigger_mode = irte.trigger_mode;
>  irq->vector = irte.vector;
>  irq->delivery_mode = irte.delivery_mode;
> -/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
> +irq->dest = irte.dest_id;
> +if (!iommu->intr_eime) {
>  #define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
>  #define  VTD_IR_APIC_DEST_SHIFT(8)
> -irq->dest = (irte.dest_id & VTD_IR_APIC_DEST_MASK) >> \
> -VTD_IR_APIC_DEST_SHIFT;
> +irq->dest = (irq->dest & VTD_IR_APIC_DEST_MASK) >>
> +VTD_IR_APIC_DEST_SHIFT;
> +}
>  irq->dest_mode = irte.dest_mode;
>  irq->redir_hint = irte.redir_hint;
>  
> @@ -2316,7 +2319,7 @@ static void vtd_init(IntelIOMMUState *s)
>  s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
>  
>  if (ms->iommu_intr) {
> -s->ecap |= VTD_ECAP_IR;
> +s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
>  }
>  
>  vtd_reset_context_cache(s);
> @@ -2370,10 +2373,9 @@ static void vtd_init(IntelIOMMUState *s)
>  vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000ULL);
>  
>  /*
> - * Interrupt remapping registers, not support extended interrupt
> - * mode for now.
> + * Interrupt remapping registers.
>   */
> -vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf00fULL, 0);
> +vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf80fULL, 0);
>  }
>  
>  /* Should not reset address_spaces when reset because devices will still use
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 10c20fe..72b0114 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -176,6 +176,7 @@
>  
>  /* IRTA_REG */
>  #define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
> +#define VTD_IRTA_EIME   (1ULL << 11)
>  #define VTD_IRTA_SIZE_MASK  (0xfULL)
>  
>  /* ECAP_REG */
> @@ -184,6 +185,7 @@
>  #define VTD_ECAP_QI (1ULL << 1)
>  /* Interrupt Remapping support */
>  #define VTD_ECAP_IR (1ULL << 3)
> +#define VTD_ECAP_EIM(1ULL << 4)
>  
>  /* CAP_REG */
>  /* (offset >> 4) << 24 */
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index d7613af..c6c53c6 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -261,6 +261,7 @@ struct IntelIOMMUState {
>  bool intr_enabled;  /* Whether guest enabled IR */
>  dma_addr_t intr_root;   /* Interrupt remapping table pointer */
>  uint32_t intr_size; /* Number of IR table entries */
> +  

Re: [Qemu-devel] [PATCH v8 0/5] ARM: Add NUMA support for machine virt

2016-04-26 Thread Andrew Jones
On Tue, Apr 26, 2016 at 06:40:24PM +0800, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Add NUMA support for machine virt. Tested successfully running a guest
> Linux kernel with the following patch applied:
> 
> - [PATCH v16 0/6] arm64, numa: Add numa support for arm64 platforms
> https://lkml.org/lkml/2016/4/8/571
> - [PATCH v5 00/14] ACPI NUMA support for ARM64
> https://lkml.org/lkml/2016/4/19/852

This series looks good to me, but I guess we still need to wait
for the DT spec[*] to be merged first. Hopefully that'll happen
around the time that the 2.7 dev window opens.

Thanks,
drew
[*] https://lkml.org/lkml/2016/4/8/572

> 
> Example qemu command line:
> qemu-system-aarch64 \
> -enable-kvm -smp 4\
> -kernel Image \
> -m 512 -machine virt,kernel_irqchip=on \
> -initrd guestfs.cpio.gz \
> -cpu host -nographic \
> -numa node,mem=256M,cpus=0-1,nodeid=0 \
> -numa node,mem=256M,cpus=2-3,nodeid=1 \
> -append "console=ttyAMA0 root=/dev/ram"
> 
> Changes since v7:
> * fix code style suggested by Marcel
> * rename acpi_build_srat_memory to build_srat_memory
> 
> Changes since v6:
> * squash first two patches of previous series together
> * fix the definition of proximity in AcpiSratMemoryAffinity
> * rename acpi_build_srat_memory to build_acpi_srat_memory
> 
> Changes since v5:
> * don't generate /distance-map node since it's optional
> * improve the /memory node name
> * move acpi_build_srat_memory to common place then reuse it to generate
>   SRAT table
> 
> Changes since v4:
> * rebased on new kernel driver and device bindings, especially the
>   compatible string "numa-distance-map-v1" of /distance-map node
> * set the numa-node-id for first /memory node
> 
> Changes since v3:
> * based on new kernel driver and device bindings
> * add ACPI part
> 
> Changes since v2:
> * update to use NUMA node property arm,associativity.
> 
> Changes since v1:
> Take into account Peter's comments:
> * rename virt_memory_init to arm_generate_memory_dtb
> * move arm_generate_memory_dtb to boot.c and make it a common func
> * use a struct numa_map to generate numa dtb
> 
> Shannon Zhao (5):
>   ARM: Virt: Set numa-node-id for cpu and memory nodes
>   ACPI: Add GICC Affinity Structure
>   ACPI: Fix the definition of proximity in AcpiSratMemoryAffinity
>   ACPI: move acpi_build_srat_memory to common place
>   ACPI: Virt: Generate SRAT table
> 
>  hw/acpi/aml-build.c | 11 ++
>  hw/arm/boot.c   | 43 +++--
>  hw/arm/virt-acpi-build.c| 52 
> +
>  hw/arm/virt.c   |  8 +++
>  hw/i386/acpi-build.c| 41 +--
>  include/hw/acpi/acpi-defs.h | 17 +--
>  include/hw/acpi/aml-build.h | 10 +
>  7 files changed, 143 insertions(+), 39 deletions(-)
> 
> -- 
> 2.0.4
> 
> 
> 



Re: [Qemu-devel] [PATCH v3 3/3] hw/arm/virt-acpi-build: Add PMU IRQ number in ACPI table

2016-04-26 Thread Andrew Jones
On Tue, Apr 26, 2016 at 07:08:25PM +0800, Shannon Zhao wrote:
> 
> 
> On 2016/4/25 21:49, Shannon Zhao wrote:
> > On 2016年04月25日 20:42, Andrew Jones wrote:
> >> > On Mon, Apr 25, 2016 at 03:11:46PM +0800, Shannon Zhao wrote:
>  >> > From: Shannon Zhao 
>  >> > 
>  >> > Add PMU IRQ number in ACPI table, then we can use PMU in guest 
>  >> > through
>  >> > ACPI.
>  >> > 
>  >> > Signed-off-by: Shannon Zhao 
>  >> > Reviewed-by: Andrew Jones 
>  >> > ---
>  >> >  hw/arm/virt-acpi-build.c | 3 +++
>  >> >  1 file changed, 3 insertions(+)
> >> > Question: for testing of this, did you use
> >> > https://lkml.org/lkml/2016/4/12/755 ?
> > Actually I didn't test PMU with ACPI since I didn't notice the patch
> > series you point out and it's very obvious for QEMU changes(adding irq
> > flag and irq number in GICC). I will have a test later.
> 
> I have test this patch series using ACPI and guest with the patches you
> point out. Guest prints below logs and perf works well in guest.
> 
> [0.00] ACPI-PMU: Assign CPU 0 girq 23 level 0
> [0.00] ACPI-PMU: Assign CPU 1 girq 23 level 0
> [0.00] ACPI-PMU: Assign CPU 2 girq 23 level 0
> [0.00] ACPI-PMU: Assign CPU 3 girq 23 level 0
> [...]
> [0.094782] ACPI-PMU: Setting up 4 PMUs for CPU type D07

Thanks for the additional testing Shannon!

drew



Re: [Qemu-devel] [PATCH v3 3/3] hw/arm/virt-acpi-build: Add PMU IRQ number in ACPI table

2016-04-26 Thread Shannon Zhao


On 2016/4/25 21:49, Shannon Zhao wrote:
> On 2016年04月25日 20:42, Andrew Jones wrote:
>> > On Mon, Apr 25, 2016 at 03:11:46PM +0800, Shannon Zhao wrote:
 >> > From: Shannon Zhao 
 >> > 
 >> > Add PMU IRQ number in ACPI table, then we can use PMU in guest through
 >> > ACPI.
 >> > 
 >> > Signed-off-by: Shannon Zhao 
 >> > Reviewed-by: Andrew Jones 
 >> > ---
 >> >  hw/arm/virt-acpi-build.c | 3 +++
 >> >  1 file changed, 3 insertions(+)
>> > Question: for testing of this, did you use
>> > https://lkml.org/lkml/2016/4/12/755 ?
> Actually I didn't test PMU with ACPI since I didn't notice the patch
> series you point out and it's very obvious for QEMU changes(adding irq
> flag and irq number in GICC). I will have a test later.

I have test this patch series using ACPI and guest with the patches you
point out. Guest prints below logs and perf works well in guest.

[0.00] ACPI-PMU: Assign CPU 0 girq 23 level 0
[0.00] ACPI-PMU: Assign CPU 1 girq 23 level 0
[0.00] ACPI-PMU: Assign CPU 2 girq 23 level 0
[0.00] ACPI-PMU: Assign CPU 3 girq 23 level 0
[...]
[0.094782] ACPI-PMU: Setting up 4 PMUs for CPU type D07

Thanks,
-- 
Shannon




[Qemu-devel] [PATCH v8 3/5] ACPI: Fix the definition of proximity in AcpiSratMemoryAffinity

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

ACPI spec says that Proximity Domain is an "Integer that represents
the proximity domain to which the processor belongs". So define it as a
uint32_t.

Cc: Michael S. Tsirkin 
Cc: Igor Mammedov 
Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 hw/i386/acpi-build.c| 3 +--
 include/hw/acpi/acpi-defs.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9ae4c0d..3c031aa 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2440,8 +2440,7 @@ acpi_build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
 {
 numamem->type = ACPI_SRAT_MEMORY;
 numamem->length = sizeof(*numamem);
-memset(numamem->proximity, 0, 4);
-numamem->proximity[0] = node;
+numamem->proximity = cpu_to_le32(node);
 numamem->flags = cpu_to_le32(flags);
 numamem->base_addr = cpu_to_le64(base);
 numamem->range_length = cpu_to_le64(len);
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index bcf5c3f..850a962 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -475,7 +475,7 @@ typedef struct AcpiSratProcessorAffinity 
AcpiSratProcessorAffinity;
 struct AcpiSratMemoryAffinity
 {
 ACPI_SUB_HEADER_DEF
-uint8_t proximity[4];
+uint32_tproximity;
 uint16_treserved1;
 uint64_tbase_addr;
 uint64_trange_length;
-- 
2.0.4





[Qemu-devel] [PATCH v8 5/5] ACPI: Virt: Generate SRAT table

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

To support NUMA, it needs to generate SRAT ACPI table.

Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 hw/arm/virt-acpi-build.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f51fe39..26a7bac 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -43,6 +43,7 @@
 #include "hw/acpi/aml-build.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
+#include "sysemu/numa.h"
 
 #define ARM_SPI_BASE 32
 #define ACPI_POWER_BUTTON_DEVICE "PWRB"
@@ -414,6 +415,52 @@ build_spcr(GArray *table_data, GArray *linker, 
VirtGuestInfo *guest_info)
 }
 
 static void
+build_srat(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
+{
+AcpiSystemResourceAffinityTable *srat;
+AcpiSratProcessorGiccAffinity *core;
+AcpiSratMemoryAffinity *numamem;
+int i, j, srat_start;
+uint64_t mem_base;
+uint32_t *cpu_node = g_malloc0(guest_info->smp_cpus * sizeof(uint32_t));
+
+for (i = 0; i < guest_info->smp_cpus; i++) {
+for (j = 0; j < nb_numa_nodes; j++) {
+if (test_bit(i, numa_info[j].node_cpu)) {
+cpu_node[i] = j;
+break;
+}
+}
+}
+
+srat_start = table_data->len;
+srat = acpi_data_push(table_data, sizeof(*srat));
+srat->reserved1 = cpu_to_le32(1);
+
+for (i = 0; i < guest_info->smp_cpus; ++i) {
+core = acpi_data_push(table_data, sizeof(*core));
+core->type = ACPI_SRAT_PROCESSOR_GICC;
+core->length = sizeof(*core);
+core->proximity = cpu_to_le32(cpu_node[i]);
+core->acpi_processor_uid = cpu_to_le32(i);
+core->flags = cpu_to_le32(1);
+}
+g_free(cpu_node);
+
+mem_base = guest_info->memmap[VIRT_MEM].base;
+for (i = 0; i < nb_numa_nodes; ++i) {
+numamem = acpi_data_push(table_data, sizeof(*numamem));
+build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
+  MEM_AFFINITY_ENABLED);
+mem_base += numa_info[i].node_mem;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + srat_start), "SRAT",
+ table_data->len - srat_start, 3, NULL, NULL);
+}
+
+static void
 build_mcfg(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
 {
 AcpiTableMcfg *mcfg;
@@ -638,6 +685,11 @@ void virt_acpi_build(VirtGuestInfo *guest_info, 
AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_spcr(tables_blob, tables->linker, guest_info);
 
+if (nb_numa_nodes > 0) {
+acpi_add_table(table_offsets, tables_blob);
+build_srat(tables_blob, tables->linker, guest_info);
+}
+
 /* RSDT is pointed to by RSDP */
 rsdt = tables_blob->len;
 build_rsdt(tables_blob, tables->linker, table_offsets, NULL, NULL);
-- 
2.0.4





Re: [Qemu-devel] [RFC for-2.7 01/11] qdt: IEEE1275-style device tree utility code

2016-04-26 Thread Thomas Huth
On 20.04.2016 04:33, David Gibson wrote:
...
> This patch introduces a new utility library "qdt" for runtime
> manipulation of device trees.  The intention is that machine type code
> can use these routines to construct the device tree conveniently,
> using a pointer-based representation doesn't have the limitations
> above.  They can then use qdt_flatten() to convert the completed tree
> to fdt format as a single O(n) operation to pass to the guest.

Good idea, the FDT format itself is really not very well suited for
dynamic manipulations...

...
> diff --git a/util/qdt.c b/util/qdt.c
> new file mode 100644
> index 000..e3a449a
> --- /dev/null
> +++ b/util/qdt.c
> @@ -0,0 +1,262 @@
> +/*
> + * Functions for manipulating IEEE1275 (Open Firmware) style device
> + * trees.

What does QDT stand for? Maybe add that in the description here.

> + * Copyright David Gibson, Red Hat Inc. 2016
> + *
> + * This work is licensed unter the GNU GPL version 2 or (at your
> + * option) any later version.
> + */
> +
> +#include 
> +#include 
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/qdt.h"
> +#include "qemu/error-report.h"
> +
> +/*
> + * Node functions
> + */
> +
> +QDTNode *qdt_new_node(const gchar *name)
> +{
> +QDTNode *node = g_new0(QDTNode, 1);
> +
> +g_assert(!strchr(name, '/'));
> +
> +node->name = g_strdup(name);
> +QTAILQ_INIT(>properties);
> +QTAILQ_INIT(>children);
> +
> +return node;
> +}
> +
> +static QDTNode *get_subnode(QDTNode *parent, const gchar *name, size_t 
> namelen)
> +{
> +QDTNode *child;
> +
> +g_assert(!memchr(name, '/', namelen));
> +
> +QTAILQ_FOREACH(child, >children, sibling) {
> +if ((strlen(child->name) == namelen)
> +&& (memcmp(child->name, name, namelen) == 0)) {

Too many parenthesis for my taste ;-)

> +return child;
> +}
> +}
> +
> +return NULL;
> +}
> +
> +QDTNode *qdt_get_node_relative(QDTNode *node, const gchar *path)
> +{
> +const gchar *slash;
> +gsize seglen;
> +
> +do {
> +slash = strchr(path, '/');
> +seglen = slash ? slash - path : strlen(path);
> +
> +node = get_subnode(node, path, seglen);
> +path += seglen + 1;
> +} while (node && slash);
> +
> +return node;
> +}
> +
> +QDTNode *qdt_get_node(QDTNode *root, const gchar *path)
> +{
> +g_assert(!root->parent);
> +g_assert(path[0] == '/');
> +return qdt_get_node_relative(root, path + 1);
> +}
> +
> +QDTNode *qdt_add_subnode(QDTNode *parent, const gchar *name)
> +{
> +QDTNode *new = qdt_new_node(name);
> +
> +new->parent = parent;
> +QTAILQ_INSERT_TAIL(>children, new, sibling);
> +return new;
> +}

In case somebody ever tries to compile this with a C++ compiler ... it's
maybe better avoid using "new" as name for a variable.

> +/*
> + * Property functions
> + */
> +
> +QDTProperty *qdt_new_property(const gchar *name, gconstpointer val, gsize 
> len)
> +{
> +QDTProperty *prop = g_malloc0(sizeof(*prop) + len);
> +
> +prop->name = g_strdup(name);
> +prop->len = len;
> +memcpy(prop->val, val, len);
> +return prop;
> +}
> +
> +static QDTProperty *getprop_(const QDTNode *node, const gchar *name)

Underscore at the end looks somewhat strange ... can't you simply drop that?

> +{
> +QDTProperty *prop;
> +
> +QTAILQ_FOREACH(prop, >properties, list) {
> +if (strcmp(prop->name, name) == 0) {
> +return prop;
> +}
> +}
> +return NULL;
> +}
> +
> +const QDTProperty *qdt_getprop(const QDTNode *node, const gchar *name)
> +{
> +return getprop_(node, name);
> +}
> +
> +void qdt_delprop(QDTNode *node, const gchar *name)
> +{
> +QDTProperty *prop = getprop_(node, name);
> +
> +if (prop) {
> +QTAILQ_REMOVE(>properties, prop, list);
> +g_free(prop->name);
> +g_free(prop);
> +}
> +}
> +
> +const QDTProperty *qdt_setprop(QDTNode *node, const gchar *name,
> +   gconstpointer val, gsize len)
> +{
> +QDTProperty *prop;
> +
> +qdt_delprop(node, name);
> +
> +prop = g_malloc0(sizeof(*prop) + len);
> +prop->name = g_strdup(name);
> +prop->len = len;
> +memcpy(prop->val, val, len);

Can you replace the above four lines with qdt_new_property ?

> +QTAILQ_INSERT_TAIL(>properties, prop, list);
> +return prop;
> +}
> +
> +const QDTProperty *qdt_setprop_string(QDTNode *node, const gchar *name,
> +  const gchar *val)
> +{
> +return qdt_setprop(node, name, val, strlen(val) + 1);
> +}
> +
> +const QDTProperty *qdt_setprop_cells_(QDTNode *node, const gchar *name,
> +  const uint32_t *val, gsize len)
> +{
> +uint32_t swapval[len];
> +gsize i;
> +
> +for (i = 0; i < len; i++) {
> +swapval[i] = cpu_to_fdt32(val[i]);
> +}
> +return qdt_setprop(node, name, swapval, sizeof(swapval));
> +}
> +
> +const 

Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Jan Kiszka
On 2016-04-26 12:38, Peter Xu wrote:
 Hi, Jan,

 The above issue should be caused by EOI missing of level-triggered
 interrupts. Before that, I was always using edge-triggered
 interrupts for test, so didn't encounter this one. Would you please
 help try below patch? It can be applied directly onto the series,
 and should solve the issue (it works on my test vm, and I'll take it
 in v5 as well if it also works for you):

>>>
>>> Works here as well. I even made EIM working with some hack, though
>>> Jailhouse spits out strange warnings, despite it works fine (x2apic
>>> mode, split irqchip).
>>
>> Corrections: the warnings are issued by qemu, not Jailhouse, e.g.
>>
>> qemu-system-x86_64: VT-d Failed to remap interrupt for gsi 22.
>>
>> I suspect that comes from the hand-over phase of Jailhouse, when it
>> mutes all interrupts in the system while reconfiguring IR and IOAPIC.
>>
>> Please convert this error (in kvm_arch_fixup_msi_route) into a trace
>> point. It shall not annoy the host. Also check if you have more of such
>> guest-triggerable error messages.
> 
> Okay. This should be the only one. I can use trace instead.
> 
> Meanwhile, I still suppose we should not seen it even with
> error_report().. Would this happen when boot e.g. generic kernels?

No, this is - most probably, still need to validate in details - an
effect due to the special case with Jailhouse that interrupt and VT-d
management is handed over from a running OS to a hypervisor.

Jan

PS: Current quick hack for EIM support (lacks compat mode blocking at
least):

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1082ab5..709a92c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -907,6 +907,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState 
*s)
 value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
 s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
 s->intr_root = value & VTD_IRTA_ADDR_MASK;
+s->intr_eime = value & VTD_IRTA_EIME;
 
 QLIST_FOREACH(notifier, >iec_notifiers, list) {
 if (notifier->iec_notify) {
@@ -2052,11 +2053,13 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, 
uint16_t index, VTDIrq *irq
 irq->trigger_mode = irte.trigger_mode;
 irq->vector = irte.vector;
 irq->delivery_mode = irte.delivery_mode;
-/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
+irq->dest = irte.dest_id;
+if (!iommu->intr_eime) {
 #define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
 #define  VTD_IR_APIC_DEST_SHIFT(8)
-irq->dest = (irte.dest_id & VTD_IR_APIC_DEST_MASK) >> \
-VTD_IR_APIC_DEST_SHIFT;
+irq->dest = (irq->dest & VTD_IR_APIC_DEST_MASK) >>
+VTD_IR_APIC_DEST_SHIFT;
+}
 irq->dest_mode = irte.dest_mode;
 irq->redir_hint = irte.redir_hint;
 
@@ -2316,7 +2319,7 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
 if (ms->iommu_intr) {
-s->ecap |= VTD_ECAP_IR;
+s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
 }
 
 vtd_reset_context_cache(s);
@@ -2370,10 +2373,9 @@ static void vtd_init(IntelIOMMUState *s)
 vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000ULL);
 
 /*
- * Interrupt remapping registers, not support extended interrupt
- * mode for now.
+ * Interrupt remapping registers.
  */
-vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf00fULL, 0);
+vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf80fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 10c20fe..72b0114 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,7 @@
 
 /* IRTA_REG */
 #define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_EIME   (1ULL << 11)
 #define VTD_IRTA_SIZE_MASK  (0xfULL)
 
 /* ECAP_REG */
@@ -184,6 +185,7 @@
 #define VTD_ECAP_QI (1ULL << 1)
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR (1ULL << 3)
+#define VTD_ECAP_EIM(1ULL << 4)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index d7613af..c6c53c6 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -261,6 +261,7 @@ struct IntelIOMMUState {
 bool intr_enabled;  /* Whether guest enabled IR */
 dma_addr_t intr_root;   /* Interrupt remapping table pointer */
 uint32_t intr_size; /* Number of IR table entries */
+bool intr_eime; /* Extended interrupt mode enabled */
 QLIST_HEAD(, VTD_IEC_Notifier) iec_notifiers; /* IEC notify list */
 };
 



[Qemu-devel] [PATCH v8 0/5] ARM: Add NUMA support for machine virt

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Add NUMA support for machine virt. Tested successfully running a guest
Linux kernel with the following patch applied:

- [PATCH v16 0/6] arm64, numa: Add numa support for arm64 platforms
https://lkml.org/lkml/2016/4/8/571
- [PATCH v5 00/14] ACPI NUMA support for ARM64
https://lkml.org/lkml/2016/4/19/852

Example qemu command line:
qemu-system-aarch64 \
-enable-kvm -smp 4\
-kernel Image \
-m 512 -machine virt,kernel_irqchip=on \
-initrd guestfs.cpio.gz \
-cpu host -nographic \
-numa node,mem=256M,cpus=0-1,nodeid=0 \
-numa node,mem=256M,cpus=2-3,nodeid=1 \
-append "console=ttyAMA0 root=/dev/ram"

Changes since v7:
* fix code style suggested by Marcel
* rename acpi_build_srat_memory to build_srat_memory

Changes since v6:
* squash first two patches of previous series together
* fix the definition of proximity in AcpiSratMemoryAffinity
* rename acpi_build_srat_memory to build_acpi_srat_memory

Changes since v5:
* don't generate /distance-map node since it's optional
* improve the /memory node name
* move acpi_build_srat_memory to common place then reuse it to generate
  SRAT table

Changes since v4:
* rebased on new kernel driver and device bindings, especially the
  compatible string "numa-distance-map-v1" of /distance-map node
* set the numa-node-id for first /memory node

Changes since v3:
* based on new kernel driver and device bindings
* add ACPI part

Changes since v2:
* update to use NUMA node property arm,associativity.

Changes since v1:
Take into account Peter's comments:
* rename virt_memory_init to arm_generate_memory_dtb
* move arm_generate_memory_dtb to boot.c and make it a common func
* use a struct numa_map to generate numa dtb

Shannon Zhao (5):
  ARM: Virt: Set numa-node-id for cpu and memory nodes
  ACPI: Add GICC Affinity Structure
  ACPI: Fix the definition of proximity in AcpiSratMemoryAffinity
  ACPI: move acpi_build_srat_memory to common place
  ACPI: Virt: Generate SRAT table

 hw/acpi/aml-build.c | 11 ++
 hw/arm/boot.c   | 43 +++--
 hw/arm/virt-acpi-build.c| 52 +
 hw/arm/virt.c   |  8 +++
 hw/i386/acpi-build.c| 41 +--
 include/hw/acpi/acpi-defs.h | 17 +--
 include/hw/acpi/aml-build.h | 10 +
 7 files changed, 143 insertions(+), 39 deletions(-)

-- 
2.0.4





[Qemu-devel] [PATCH v8 1/5] ARM: Virt: Set numa-node-id for cpu and memory nodes

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Generate memory nodes according to NUMA topology. Set numa-node-id
property for cpu and memory nodes.

Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 hw/arm/boot.c | 43 +--
 hw/arm/virt.c |  8 
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 5876945..1b913a4 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -14,6 +14,7 @@
 #include "hw/arm/linux-boot-if.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/numa.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
 #include "elf.h"
@@ -405,6 +406,9 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 void *fdt = NULL;
 int size, rc;
 uint32_t acells, scells;
+char *nodename;
+unsigned int i;
+hwaddr mem_base, mem_len;
 
 if (binfo->dtb_filename) {
 char *filename;
@@ -456,12 +460,39 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo,
 goto fail;
 }
 
-rc = qemu_fdt_setprop_sized_cells(fdt, "/memory", "reg",
-  acells, binfo->loader_start,
-  scells, binfo->ram_size);
-if (rc < 0) {
-fprintf(stderr, "couldn't set /memory/reg\n");
-goto fail;
+if (nb_numa_nodes > 0) {
+/*
+ * Turn the /memory node created before into a NOP node, then create
+ * /memory@addr nodes for all numa nodes respectively.
+ */
+qemu_fdt_nop_node(fdt, "/memory");
+mem_base = binfo->loader_start;
+for (i = 0; i < nb_numa_nodes; i++) {
+mem_len = numa_info[i].node_mem;
+nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
+rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
+  acells, mem_base,
+  scells, mem_len);
+if (rc < 0) {
+fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
+i);
+goto fail;
+}
+
+qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
+mem_base += mem_len;
+g_free(nodename);
+}
+} else {
+rc = qemu_fdt_setprop_sized_cells(fdt, "/memory", "reg",
+  acells, binfo->loader_start,
+  scells, binfo->ram_size);
+if (rc < 0) {
+fprintf(stderr, "couldn't set /memory/reg\n");
+goto fail;
+}
 }
 
 if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 56d35c7..fe6b11d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -38,6 +38,7 @@
 #include "net/net.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/device_tree.h"
+#include "sysemu/numa.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
 #include "hw/boards.h"
@@ -329,6 +330,7 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 {
 int cpu;
 int addr_cells = 1;
+unsigned int i;
 
 /*
  * From Documentation/devicetree/bindings/arm/cpus.txt
@@ -378,6 +380,12 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
   armcpu->mp_affinity);
 }
 
+for (i = 0; i < nb_numa_nodes; i++) {
+if (test_bit(cpu, numa_info[i].node_cpu)) {
+qemu_fdt_setprop_cell(vbi->fdt, nodename, "numa-node-id", i);
+}
+}
+
 g_free(nodename);
 }
 }
-- 
2.0.4





[Qemu-devel] [PATCH v5 4/6] tcg: Clean up from 'next_tb'

2016-04-26 Thread Sergey Fedorov
From: Sergey Fedorov 

The value returned from tcg_qemu_tb_exec() is the value passed to the
corresponding tcg_gen_exit_tb() at translation time of the last TB
attempted to execute. It is a little confusing to store it in a variable
named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer
and additional information in its two least significant bits. Break it
down right away into two variables named 'last_tb' and 'tb_exit' which
are a pointer to the last TB attempted to execute and the TB exit
reason, correspondingly. This simplifies the code and improves its
readability.

Correct a misleading documentation comment for tcg_qemu_tb_exec() and
fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
another couple of places.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---

Changes in v5:
 * Shortcut a non-null test

 cpu-exec.c   | 59 ---
 tcg/tcg.h| 19 ++-
 tci.c|  6 +++---
 trace-events |  2 +-
 4 files changed, 46 insertions(+), 40 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 36942340d7e3..6ad5fbdf1f8e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -136,7 +136,9 @@ static void init_delay_params(SyncClocks *sc, const 
CPUState *cpu)
 static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock 
*itb)
 {
 CPUArchState *env = cpu->env_ptr;
-uintptr_t next_tb;
+uintptr_t ret;
+TranslationBlock *last_tb;
+int tb_exit;
 uint8_t *tb_ptr = itb->tc_ptr;
 
 qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
@@ -160,36 +162,37 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, 
TranslationBlock *itb)
 #endif /* DEBUG_DISAS */
 
 cpu->can_do_io = !use_icount;
-next_tb = tcg_qemu_tb_exec(env, tb_ptr);
+ret = tcg_qemu_tb_exec(env, tb_ptr);
 cpu->can_do_io = 1;
-trace_exec_tb_exit((void *) (next_tb & ~TB_EXIT_MASK),
-   next_tb & TB_EXIT_MASK);
+last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
+tb_exit = ret & TB_EXIT_MASK;
+trace_exec_tb_exit(last_tb, tb_exit);
 
-if ((next_tb & TB_EXIT_MASK) > TB_EXIT_IDX1) {
+if (tb_exit > TB_EXIT_IDX1) {
 /* We didn't start executing this TB (eg because the instruction
  * counter hit zero); we must restore the guest PC to the address
  * of the start of the TB.
  */
 CPUClass *cc = CPU_GET_CLASS(cpu);
-TranslationBlock *tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK);
-qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
+qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
"Stopped execution of TB chain before %p ["
TARGET_FMT_lx "] %s\n",
-   itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));
+   last_tb->tc_ptr, last_tb->pc,
+   lookup_symbol(last_tb->pc));
 if (cc->synchronize_from_tb) {
-cc->synchronize_from_tb(cpu, tb);
+cc->synchronize_from_tb(cpu, last_tb);
 } else {
 assert(cc->set_pc);
-cc->set_pc(cpu, tb->pc);
+cc->set_pc(cpu, last_tb->pc);
 }
 }
-if ((next_tb & TB_EXIT_MASK) == TB_EXIT_REQUESTED) {
+if (tb_exit == TB_EXIT_REQUESTED) {
 /* We were asked to stop executing TBs (probably a pending
  * interrupt. We've now stopped, so clear the flag.
  */
 cpu->tcg_exit_req = 0;
 }
-return next_tb;
+return ret;
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -358,8 +361,8 @@ int cpu_exec(CPUState *cpu)
 CPUArchState *env = _cpu->env;
 #endif
 int ret, interrupt_request;
-TranslationBlock *tb;
-uintptr_t next_tb;
+TranslationBlock *tb, *last_tb;
+int tb_exit = 0;
 SyncClocks sc;
 
 /* replay_interrupt may need current_cpu */
@@ -442,7 +445,7 @@ int cpu_exec(CPUState *cpu)
 #endif
 }
 
-next_tb = 0; /* force lookup of first TB */
+last_tb = NULL; /* forget the last executed TB after exception */
 for(;;) {
 interrupt_request = cpu->interrupt_request;
 if (unlikely(interrupt_request)) {
@@ -487,7 +490,7 @@ int cpu_exec(CPUState *cpu)
 else {
 replay_interrupt();
 if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
-next_tb = 0;
+last_tb = NULL;
 }
 }
 /* Don't use the cached interrupt_request value,
@@ -496,7 +499,7 @@ int cpu_exec(CPUState *cpu)
 cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
 /* ensure that no TB jump will be modified as
the program flow was changed */

[Qemu-devel] [PATCH v5 6/6] cpu-exec: Move TB chaining into tb_find_fast()

2016-04-26 Thread Sergey Fedorov
From: Sergey Fedorov 

Move tb_add_jump() call and surrounding code from cpu_exec() into
tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
chaining optimization details into tb_find_fast(). It also allows to
move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
to tb_find_slow() which also manipulates the lock.

Suggested-by: Alex Bennée 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 41 ++---
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 8ba899b47a23..44dae04ae819 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,9 @@ found:
 return tb;
 }
 
-static inline TranslationBlock *tb_find_fast(CPUState *cpu)
+static inline TranslationBlock *tb_find_fast(CPUState *cpu,
+ TranslationBlock **last_tb,
+ int tb_exit)
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
 TranslationBlock *tb;
@@ -331,11 +333,27 @@ static inline TranslationBlock *tb_find_fast(CPUState 
*cpu)
always be the same before a given translated block
is executed. */
 cpu_get_tb_cpu_state(env, , _base, );
+tb_lock();
 tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
  tb->flags != flags)) {
 tb = tb_find_slow(cpu, pc, cs_base, flags);
 }
+if (cpu->tb_flushed) {
+/* Ensure that no TB jump will be modified as the
+ * translation buffer has been flushed.
+ */
+*last_tb = NULL;
+cpu->tb_flushed = false;
+}
+/* see if we can patch the calling TB. When the TB
+   spans two pages, we cannot safely do a direct
+   jump. */
+if (*last_tb && tb->page_addr[1] == -1 &&
+!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+tb_add_jump(*last_tb, tb_exit, tb);
+}
+tb_unlock();
 return tb;
 }
 
@@ -441,7 +459,8 @@ int cpu_exec(CPUState *cpu)
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
 /* try to cause an exception pending in the log */
-cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
+last_tb = NULL; /* Avoid chaining TBs */
+cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, _tb, 0), true);
 ret = -1;
 break;
 #endif
@@ -511,23 +530,7 @@ int cpu_exec(CPUState *cpu)
 cpu->exception_index = EXCP_INTERRUPT;
 cpu_loop_exit(cpu);
 }
-tb_lock();
-tb = tb_find_fast(cpu);
-if (cpu->tb_flushed) {
-/* Ensure that no TB jump will be modified as the
- * translation buffer has been flushed.
- */
-last_tb = NULL;
-cpu->tb_flushed = false;
-}
-/* see if we can patch the calling TB. When the TB
-   spans two pages, we cannot safely do a direct
-   jump. */
-if (last_tb && tb->page_addr[1] == -1
-&& !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-tb_add_jump(last_tb, tb_exit, tb);
-}
-tb_unlock();
+tb = tb_find_fast(cpu, _tb, tb_exit);
 if (likely(!cpu->exit_request)) {
 uintptr_t ret;
 trace_exec_tb(tb, tb->pc);
-- 
2.8.1




[Qemu-devel] [PATCH v8 2/5] ACPI: Add GICC Affinity Structure

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Cc: Michael S. Tsirkin 
Cc: Igor Mammedov 
Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 hw/i386/acpi-build.c|  2 +-
 include/hw/acpi/acpi-defs.h | 15 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6477003..9ae4c0d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2474,7 +2474,7 @@ build_srat(GArray *table_data, GArray *linker, 
MachineState *machine)
 int apic_id = apic_ids->cpus[i].arch_id;
 
 core = acpi_data_push(table_data, sizeof *core);
-core->type = ACPI_SRAT_PROCESSOR;
+core->type = ACPI_SRAT_PROCESSOR_APIC;
 core->length = sizeof(*core);
 core->local_apic_id = apic_id;
 curnode = pcms->node_cpu[apic_id];
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index c7a03d4..bcf5c3f 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -455,8 +455,10 @@ struct AcpiSystemResourceAffinityTable
 } QEMU_PACKED;
 typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
 
-#define ACPI_SRAT_PROCESSOR  0
+#define ACPI_SRAT_PROCESSOR_APIC 0
 #define ACPI_SRAT_MEMORY 1
+#define ACPI_SRAT_PROCESSOR_x2APIC   2
+#define ACPI_SRAT_PROCESSOR_GICC 3
 
 struct AcpiSratProcessorAffinity
 {
@@ -483,6 +485,17 @@ struct AcpiSratMemoryAffinity
 } QEMU_PACKED;
 typedef struct AcpiSratMemoryAffinity AcpiSratMemoryAffinity;
 
+struct AcpiSratProcessorGiccAffinity
+{
+ACPI_SUB_HEADER_DEF
+uint32_tproximity;
+uint32_tacpi_processor_uid;
+uint32_tflags;
+uint32_tclock_domain;
+} QEMU_PACKED;
+
+typedef struct AcpiSratProcessorGiccAffinity AcpiSratProcessorGiccAffinity;
+
 /* PCI fw r3.0 MCFG table. */
 /* Subtable */
 struct AcpiMcfgAllocation {
-- 
2.0.4





Re: [Qemu-devel] update status -- vTPM for HVM virtual machine

2016-04-26 Thread Xu, Quan
On April 26, 2016 6:25 PM, Wei Liu  wrote:
> For avoidance of doubt, this is purely status update, no action is needed on
> my part.

Yes.

> Let me know if I misunderstood.
> 

Quan



[Qemu-devel] [PATCH v5 3/6] cpu-exec: elide more icount code if CONFIG_USER_ONLY

2016-04-26 Thread Sergey Fedorov
From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
[Alex Bennée: #ifndef replay code to match elided functions]
Signed-off-by: Alex Bennée 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4cba4efc92b2..36942340d7e3 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -192,6 +192,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, 
TranslationBlock *itb)
 return next_tb;
 }
 
+#ifndef CONFIG_USER_ONLY
 /* Execute the code without caching the generated code. An interpreter
could be used if available. */
 static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
@@ -216,6 +217,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 tb_phys_invalidate(tb, -1);
 tb_free(tb);
 }
+#endif
 
 static TranslationBlock *tb_find_physical(CPUState *cpu,
   target_ulong pc,
@@ -430,12 +432,14 @@ int cpu_exec(CPUState *cpu)
 }
 #endif
 }
+#ifndef CONFIG_USER_ONLY
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
 /* try to cause an exception pending in the log */
 cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
 ret = -1;
 break;
+#endif
 }
 
 next_tb = 0; /* force lookup of first TB */
@@ -545,6 +549,9 @@ int cpu_exec(CPUState *cpu)
 case TB_EXIT_ICOUNT_EXPIRED:
 {
 /* Instruction counter expired.  */
+#ifdef CONFIG_USER_ONLY
+abort();
+#else
 int insns_left = cpu->icount_decr.u32;
 if (cpu->icount_extra && insns_left >= 0) {
 /* Refill decrementer and continue execution.  */
@@ -564,6 +571,7 @@ int cpu_exec(CPUState *cpu)
 cpu_loop_exit(cpu);
 }
 break;
+#endif
 }
 default:
 break;
-- 
2.8.1




[Qemu-devel] [PATCH v5 1/6] tcg: code_bitmap is not used by user-mode emulation

2016-04-26 Thread Sergey Fedorov
From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
[Sergey Fedorov: eliminate the field entirely in user-mode]
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson  
Reviewed-by: Alex Bennée 
---

Changes in v2:
 * The field is eliminated entirely in user-mode

 translate-all.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 8329ea60eeda..0d5d9449dc6b 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -75,8 +75,9 @@ typedef struct PageDesc {
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
 unsigned int code_write_count;
+#ifdef CONFIG_SOFTMMU
 unsigned long *code_bitmap;
-#if defined(CONFIG_USER_ONLY)
+#else
 unsigned long flags;
 #endif
 } PageDesc;
@@ -784,8 +785,10 @@ void tb_free(TranslationBlock *tb)
 
 static inline void invalidate_page_bitmap(PageDesc *p)
 {
+#ifdef CONFIG_SOFTMMU
 g_free(p->code_bitmap);
 p->code_bitmap = NULL;
+#endif
 p->code_write_count = 0;
 }
 
@@ -1019,6 +1022,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
 
+#ifdef CONFIG_SOFTMMU
 static void build_page_bitmap(PageDesc *p)
 {
 int n, tb_start, tb_end;
@@ -1047,6 +1051,7 @@ static void build_page_bitmap(PageDesc *p)
 tb = tb->page_next[n];
 }
 }
+#endif
 
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
@@ -1296,6 +1301,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif
 }
 
+#ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
@@ -1333,8 +1339,7 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, 
int len)
 tb_invalidate_phys_page_range(start, start + len, 1);
 }
 }
-
-#if !defined(CONFIG_SOFTMMU)
+#else
 /* Called with mmap_lock held.  */
 static void tb_invalidate_phys_page(tb_page_addr_t addr,
 uintptr_t pc, void *puc,
-- 
2.8.1




[Qemu-devel] [PATCH v8 4/5] ACPI: move acpi_build_srat_memory to common place

2016-04-26 Thread Shannon Zhao
From: Shannon Zhao 

Move acpi_build_srat_memory to common place so that it could be reused
by ARM. Rename it to build_srat_memory.

Cc: Michael S. Tsirkin 
Cc: Igor Mammedov 
Signed-off-by: Shannon Zhao 
Reviewed-by: Andrew Jones 
---
 hw/acpi/aml-build.c | 11 +++
 hw/i386/acpi-build.c| 38 +-
 include/hw/acpi/aml-build.h | 10 ++
 3 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ab89ca6..cedb74e 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1563,3 +1563,14 @@ build_rsdt(GArray *table_data, GArray *linker, GArray 
*table_offsets,
 build_header(linker, table_data,
  (void *)rsdt, "RSDT", rsdt_len, 1, oem_id, oem_table_id);
 }
+
+void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
+   uint64_t len, int node, MemoryAffinityFlags flags)
+{
+numamem->type = ACPI_SRAT_MEMORY;
+numamem->length = sizeof(*numamem);
+numamem->proximity = cpu_to_le32(node);
+numamem->flags = cpu_to_le32(flags);
+numamem->base_addr = cpu_to_le64(base);
+numamem->range_length = cpu_to_le64(len);
+}
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 3c031aa..279f0d7 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2427,25 +2427,6 @@ build_tpm2(GArray *table_data, GArray *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
-typedef enum {
-MEM_AFFINITY_NOFLAGS  = 0,
-MEM_AFFINITY_ENABLED  = (1 << 0),
-MEM_AFFINITY_HOTPLUGGABLE = (1 << 1),
-MEM_AFFINITY_NON_VOLATILE = (1 << 2),
-} MemoryAffinityFlags;
-
-static void
-acpi_build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
-   uint64_t len, int node, MemoryAffinityFlags flags)
-{
-numamem->type = ACPI_SRAT_MEMORY;
-numamem->length = sizeof(*numamem);
-numamem->proximity = cpu_to_le32(node);
-numamem->flags = cpu_to_le32(flags);
-numamem->base_addr = cpu_to_le64(base);
-numamem->range_length = cpu_to_le64(len);
-}
-
 static void
 build_srat(GArray *table_data, GArray *linker, MachineState *machine)
 {
@@ -2491,7 +2472,7 @@ build_srat(GArray *table_data, GArray *linker, 
MachineState *machine)
 numa_start = table_data->len;
 
 numamem = acpi_data_push(table_data, sizeof *numamem);
-acpi_build_srat_memory(numamem, 0, 640*1024, 0, MEM_AFFINITY_ENABLED);
+build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
 next_base = 1024 * 1024;
 for (i = 1; i < pcms->numa_nodes + 1; ++i) {
 mem_base = next_base;
@@ -2507,21 +2488,21 @@ build_srat(GArray *table_data, GArray *linker, 
MachineState *machine)
 mem_len -= next_base - pcms->below_4g_mem_size;
 if (mem_len > 0) {
 numamem = acpi_data_push(table_data, sizeof *numamem);
-acpi_build_srat_memory(numamem, mem_base, mem_len, i - 1,
-   MEM_AFFINITY_ENABLED);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
 next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
-acpi_build_srat_memory(numamem, mem_base, mem_len, i - 1,
-   MEM_AFFINITY_ENABLED);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
 }
 slots = (table_data->len - numa_start) / sizeof *numamem;
 for (; slots < pcms->numa_nodes + 2; slots++) {
 numamem = acpi_data_push(table_data, sizeof *numamem);
-acpi_build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
+build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
 }
 
 /*
@@ -2531,10 +2512,9 @@ build_srat(GArray *table_data, GArray *linker, 
MachineState *machine)
  */
 if (hotplugabble_address_space_size) {
 numamem = acpi_data_push(table_data, sizeof *numamem);
-acpi_build_srat_memory(numamem, pcms->hotplug_memory.base,
-   hotplugabble_address_space_size, 0,
-   MEM_AFFINITY_HOTPLUGGABLE |
-   MEM_AFFINITY_ENABLED);
+build_srat_memory(numamem, pcms->hotplug_memory.base,
+  hotplugabble_address_space_size, 0,
+  MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
 }
 
 build_header(linker, table_data,
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 2c994b3..7eb51c7 100644
--- a/include/hw/acpi/aml-build.h
+++ 

[Qemu-devel] [PATCH v5 0/6] tcg: Misc clean-up patches

2016-04-26 Thread Sergey Fedorov
From: Sergey Fedorov 

This patch series consists of various general TCG clean-up patches extracted
from Paolo's MTTCG tree [1] and Alex's MTTCG base enablement tree [2]. I also
add here a patch from myself to rework tb_invalidated_flag based on the Paolo's
"tcg: move tb_invalidated_flag to CPUState" patch from the original verions of
this series. Another patch of mine cleans up from a misleading 'next_tb'
variable.

The main idea is to review and merge these patches separately from the MTTCG
series to cut the latter and make it easier to review.

The series' tree can be found in a public git repository [3].

[1] https://github.com/bonzini/qemu/tree/mttcg
[2] https://github.com/stsquad/qemu/tree/mttcg/base-patches-v2
[3] https://github.com/sergefdrv/qemu/tree/tcg-cleanup-v5

Summary of changes:
 v5:
  * Shortcut non-null tests in [PATCH v5 4/6]
  * Add a patch to move TB chaining to tb_find_fast() [PATCH v5 6/6]
 v4:
  * Add a patch to clean up from 'next_tb' [PATCH v4 4/6]
 v3:
  * Add a patch to rework tb_invalidated_flag from myself [PATCH v3 4/4]
 v2:
  * Complete code_bitmap elimination [PATCH v2 1/3]
  * Take Alex's version of tb_find_physical() reorganization [PATCH v2 2/3]
  * Drop [PATCH 3/5] completely
  * Drop [PATCH 4/5] and [PATCH 5/5] for separate series
  * Take Alex's rebase of Paolo's icount code eliding [PATCH v2 3/3]

Alex Bennée (1):
  tcg: reorganize tb_find_physical loop

Paolo Bonzini (2):
  tcg: code_bitmap is not used by user-mode emulation
  cpu-exec: elide more icount code if CONFIG_USER_ONLY

Sergey Fedorov (3):
  tcg: Clean up from 'next_tb'
  tcg: Rework tb_invalidated_flag
  cpu-exec: Move TB chaining into tb_find_fast()

 cpu-exec.c  | 157 +++-
 include/exec/exec-all.h |   2 -
 include/qom/cpu.h   |   2 +
 tcg/tcg.h   |  19 +++---
 tci.c   |   6 +-
 trace-events|   2 +-
 translate-all.c |  16 ++---
 7 files changed, 114 insertions(+), 90 deletions(-)

-- 
2.8.1




[Qemu-devel] [PATCH v5 5/6] tcg: Rework tb_invalidated_flag

2016-04-26 Thread Sergey Fedorov
From: Sergey Fedorov 

'tb_invalidated_flag' was meant to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

Then it was checked:
 * in cpu_exec() to ensure that the last executed TB can be safely
   linked to directly call the next one;
 * in cpu_exec_nocache() to decide if the original TB should be provided
   for further possible invalidation along with the temporarily
   generated TB.

It is always safe to patch an invalidated TB since it is not going to be
used anyway. It is also safe to call tb_phys_invalidate() for an already
invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
simply unnecessary. Moreover, it can prevent from pretty proper linking
of TBs, if any arbitrary TB has been invalidated. So just don't touch it
in tb_phys_invalidate().

If this flag is only used to catch whether tb_flush() has been called
then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
only 'true' and 'false' to set its value. Also, instead of setting it in
tb_gen_code(), just after tb_flush() has been called, do it right inside
of tb_flush().

In cpu_exec(), this flag is used to track if tb_flush() has been called
and have made 'next_tb' (a reference to the last executed TB) invalid
for linking it to directly call the next TB. tb_flush() can be called
during the CPU execution loop from tb_gen_code(), during TB execution or
by another thread while 'tb_lock' is released. Catch for translation
buffer flush reliably by resetting this flag once before first TB lookup
and each time we find it set before trying to add a direct jump. Don't
touch in in tb_find_physical().

Each vCPU has its own execution loop in multithreaded mode and thus
should have its own copy of the flag to be able to reset it with its own
'next_tb' and don't affect any other vCPU execution thread. So make this
flag per-vCPU and move it to CPUState.

In cpu_exec_nocache(), we only need to check if tb_flush() has been
called from tb_gen_code() called by cpu_exec_nocache() itself. To do
this reliably, preserve the old value of the flag, reset it before
calling tb_gen_code(), check afterwards, and combine the saved value
back to the flag.

This patch is based on the patch "tcg: move tb_invalidated_flag to
CPUState" from Paolo Bonzini .

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---

Changes in v4:
 * Rebased on top of the previous patch

 cpu-exec.c  | 21 +++--
 include/exec/exec-all.h |  2 --
 include/qom/cpu.h   |  2 ++
 translate-all.c |  5 +
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 6ad5fbdf1f8e..8ba899b47a23 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -202,16 +202,20 @@ static void cpu_exec_nocache(CPUState *cpu, int 
max_cycles,
  TranslationBlock *orig_tb, bool ignore_icount)
 {
 TranslationBlock *tb;
+bool old_tb_flushed;
 
 /* Should never happen.
We only end up here when an existing TB is too long.  */
 if (max_cycles > CF_COUNT_MASK)
 max_cycles = CF_COUNT_MASK;
 
+old_tb_flushed = cpu->tb_flushed;
+cpu->tb_flushed = false;
 tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
  max_cycles | CF_NOCACHE
  | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
+tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
+cpu->tb_flushed |= old_tb_flushed;
 cpu->current_tb = tb;
 /* execute the generated code */
 trace_exec_tb_nocache(tb, tb->pc);
@@ -232,8 +236,6 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 unsigned int h;
 tb_page_addr_t phys_pc, phys_page1;
 
-tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
-
 /* find translated block using physical mappings */
 phys_pc = get_page_addr_code(env, pc);
 phys_page1 = phys_pc & TARGET_PAGE_MASK;
@@ -446,6 +448,7 @@ int cpu_exec(CPUState *cpu)
 }
 
 last_tb = NULL; /* forget the last executed TB after exception */
+cpu->tb_flushed = false; /* reset before first TB lookup */
 for(;;) {
 interrupt_request = cpu->interrupt_request;
 if (unlikely(interrupt_request)) {
@@ -510,14 +513,12 @@ int cpu_exec(CPUState *cpu)
 }
 tb_lock();
 tb = tb_find_fast(cpu);
-/* Note: we do it here to avoid a gcc bug on Mac OS X when
-   doing it in tb_find_slow */
-if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
-/* as some TB could have been invalidated because
-   of memory exceptions while generating the code, we
-   must recompute the hash 

[Qemu-devel] [PATCH v5 2/6] tcg: reorganize tb_find_physical loop

2016-04-26 Thread Sergey Fedorov
From: Alex Bennée 

Put some comments and improve code structure. This should help reading
the code.

Signed-off-by: Alex Bennée 
[Sergey Fedorov: provide commit message; bring back resetting of
tb_invalidated_flag]
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson  
---
 cpu-exec.c | 44 
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bbfcbfb54385..4cba4efc92b2 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -223,10 +223,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
   uint64_t flags)
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
-TranslationBlock *tb, **ptb1;
+TranslationBlock *tb, **tb_hash_head, **ptb1;
 unsigned int h;
 tb_page_addr_t phys_pc, phys_page1;
-target_ulong virt_page2;
 
 tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
 
@@ -234,37 +233,42 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 phys_pc = get_page_addr_code(env, pc);
 phys_page1 = phys_pc & TARGET_PAGE_MASK;
 h = tb_phys_hash_func(phys_pc);
-ptb1 = _ctx.tb_ctx.tb_phys_hash[h];
-for(;;) {
-tb = *ptb1;
-if (!tb) {
-return NULL;
-}
+
+/* Start at head of the hash entry */
+ptb1 = tb_hash_head = _ctx.tb_ctx.tb_phys_hash[h];
+tb = *ptb1;
+
+while (tb) {
 if (tb->pc == pc &&
 tb->page_addr[0] == phys_page1 &&
 tb->cs_base == cs_base &&
 tb->flags == flags) {
-/* check next page if needed */
-if (tb->page_addr[1] != -1) {
-tb_page_addr_t phys_page2;
 
-virt_page2 = (pc & TARGET_PAGE_MASK) +
-TARGET_PAGE_SIZE;
-phys_page2 = get_page_addr_code(env, virt_page2);
+if (tb->page_addr[1] == -1) {
+/* done, we have a match */
+break;
+} else {
+/* check next page if needed */
+target_ulong virt_page2 = (pc & TARGET_PAGE_MASK) +
+  TARGET_PAGE_SIZE;
+tb_page_addr_t phys_page2 = get_page_addr_code(env, 
virt_page2);
+
 if (tb->page_addr[1] == phys_page2) {
 break;
 }
-} else {
-break;
 }
 }
+
 ptb1 = >phys_hash_next;
+tb = *ptb1;
 }
 
-/* Move the TB to the head of the list */
-*ptb1 = tb->phys_hash_next;
-tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
-tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+if (tb) {
+/* Move the TB to the head of the list */
+*ptb1 = tb->phys_hash_next;
+tb->phys_hash_next = *tb_hash_head;
+*tb_hash_head = tb;
+}
 return tb;
 }
 
-- 
2.8.1




Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-26 Thread Peter Xu
On Tue, Apr 26, 2016 at 10:15:46AM +0200, Jan Kiszka wrote:
> On 2016-04-26 09:57, Jan Kiszka wrote:
> > On 2016-04-26 09:34, Peter Xu wrote:
> >> On Mon, Apr 25, 2016 at 09:24:12AM +0200, Jan Kiszka wrote:
> >>> On 2016-04-25 09:18, Peter Xu wrote:
>  On Mon, Apr 25, 2016 at 07:16:19AM +0200, Jan Kiszka wrote:
> > On 2016-04-19 10:38, Peter Xu wrote:
> 
>  [...]
> 
> >> By default, IR is disabled to be better compatible with current
> >> QEMU. To enable IR, we can using the following command to boot a
> >> IR-supported VM with virtio-net device with vhost (still do not
> >> support kvm-ioapic, so we need to specify kernel-irqchip={split|off}
> >> here):
> >>
> >> $ qemu-system-x86_64 -M q35,iommu=on,intr=on,kernel-irqchip=split \
> >
> > "intr" sounds a bit too much like "interrupt", not "interrupt
> > remapping". Why not use the kernel's form, "intremap"?
> 
>  Sure. It sounds nice to be aligned with the kernel one. Let me take
>  it in v5.
> 
> >
> >>  -enable-kvm -m 1024 \
> >> -netdev tap,id=net0,vhost=on \
> >>  -device virtio-net-pci,netdev=user.0 \
> >>  -monitor telnet::,server,nowait \
> >> /var/lib/libvirt/images/vm1.qcow2
> >>
> >> When guest boots, we can verify whether IR enabled by grepping the
> >> dmesg like:
> >>
> >> [root@localhost ~]# journalctl -k | grep "DMAR-IR"
> >> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 
> >> under DRHD base  0xfed9 IOMMU 0
> >> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ 
> >> remapping in xapic mode
> >>
> >> Currently supported devices:
> >>
> >> - Emulated/Splitted irqchip
> >> - Generic PCI Devices
> >> - vhost devices
> >> - pass through device support? Not tested, but suppose it should work.
> >
> > I've tested this series against my Jailhouse setup, and it works pretty
> > well! Actually considering to move my test setup over this branch.
> 
>  This is really encouraging feedback! Btw, thanks for all kinds of
>  help on this patchset. :-)
> 
> >
> > However, split irqchip still has some issues: When I boot a q35 machine
> > with Linux, the e1000 network adapter only gets a single IRQ delivered.
> > Interestingly, other IOAPIC IRQs like the keyboard work all the time. I
> > didn't debug this in details yet.
> 
>  I reproduced this problem. It seems that it fails even with
>  kernel-irqchip=off. Will try to dig it out.
> >>>
> >>> Very good. Hope it can be easily fixed.
> >>
> >> Hi, Jan,
> >>
> >> The above issue should be caused by EOI missing of level-triggered
> >> interrupts. Before that, I was always using edge-triggered
> >> interrupts for test, so didn't encounter this one. Would you please
> >> help try below patch? It can be applied directly onto the series,
> >> and should solve the issue (it works on my test vm, and I'll take it
> >> in v5 as well if it also works for you):
> >>
> > 
> > Works here as well. I even made EIM working with some hack, though
> > Jailhouse spits out strange warnings, despite it works fine (x2apic
> > mode, split irqchip).
> 
> Corrections: the warnings are issued by qemu, not Jailhouse, e.g.
> 
> qemu-system-x86_64: VT-d Failed to remap interrupt for gsi 22.
> 
> I suspect that comes from the hand-over phase of Jailhouse, when it
> mutes all interrupts in the system while reconfiguring IR and IOAPIC.
> 
> Please convert this error (in kvm_arch_fixup_msi_route) into a trace
> point. It shall not annoy the host. Also check if you have more of such
> guest-triggerable error messages.

Okay. This should be the only one. I can use trace instead.

Meanwhile, I still suppose we should not seen it even with
error_report().. Would this happen when boot e.g. generic kernels?

-- peterx



  1   2   >