Re: [Qemu-devel] [Qemu-arm] [PATCH 1/2] hw/adc: Add basic Aspeed ADC model

2017-05-23 Thread Andrew Jeffery
On Wed, 2017-05-24 at 02:15 -0300, Philippe Mathieu-Daudé wrote:
> Hi Andrew,
> 
> On 05/22/2017 02:14 AM, Andrew Jeffery wrote:
> > On Mon, 2017-05-22 at 03:15 +, Ryan Chen wrote:
> > > In ASPEED SoC chip, all register access have following rule.
> > > Most of controller write access is only support 32bit access.
> > > Read is support 8bits/16bits/32bits.
> 
> This makes sens thinking about how a DMA controller can take advantage 
> of the ADC.
> 
> > 
> > Thanks for clearing that up Ryan.
> > 
> > Phil: I'll rework the model so the reads are 16-bits.
> 
> This shouldn't be necessary, QEMU is supposed to supports different 
> access size for different implemented size, so you can declare your 
> implementation as 32-bit and valid accesses from 8 to 32:
> 
>   static const MemoryRegionOps aspeed_adc_ops = {
>   .read = aspeed_adc_read,
>   .write = aspeed_adc_write,
>   .endianness = DEVICE_LITTLE_ENDIAN,
>   .valid.min_access_size = 1,
>   .valid.max_access_size = 4,
>   .valid.unaligned = false,
> +.impl.min_access_size = 4,
> +.impl.max_access_size = 4,
>   };
> 
> This way an I/O access from the CPU or a DMA could use 8/16-bit while 
> you keep a 32-bit implementation. The adjustment is done by 
> access_with_adjusted_size() from memory.c.
> 
> Afaik there is, however, no distinction between read/write different 
> access size in current QEMU MemoryRegionOps model.

Yep, I realised all of the above when I went to implement it. Thanks
for pointing it out though!

Andrew

signature.asc
Description: This is a digitally signed message part


Re: [Qemu-devel] [PATCH v2] qapi: Fix some QMP documentation regressions

2017-05-23 Thread Markus Armbruster
Eric Blake  writes:

> In the process of getting rid of docs/qmp-commands.txt, we managed
> to regress on some of the text that changed after the point where
> the move was first branched and when the move actually occurred.
> For example, commit 3282eca for blockdev-snapshot re-added the
> extra "options" layer which had been cleaned up in commit 0153d2f.
>
> This clears up all regressions identified over the range
> 02b351d..bd6092e:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05127.html
> as well as a cleanup to x-blockdev-remove-medium to prefer
> 'id' over 'device' (matching the cleanup for 'eject').
>
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Eric Blake 

Reviewed-by: Markus Armbruster 

> ---
> v2: add 'eject', 'x-blockdev-remove-medium' cleanups, thanks to
> Markus' audit.
>
> Can go in either via trivial or via qapi tree

I'll put it in my next pull unless -trivial picks it up first.



[Qemu-devel] [PATCH] Fix nmi injection failure when vcpu got blocked

2017-05-23 Thread Zhuangyanying
From: ZhuangYanying 

Recently I found NMI could not be injected to vm via libvirt API
Reproduce the problem:
1 use guest of redhat 7.3
2 disable nmi_watchdog and trig spinlock deadlock inside the guest
check the running vcpu thread, make sure not vcpu0
3 inject NMI into the guest via libvirt API "inject-nmi"

Result:
The NMI could not be injected into the guest.

Reason:
1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and
sets cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile.
2 It sets nmi_queued to 0 in process_nmi(), before entering guest,
because cpu->kvm_vcpu_dirty is true.

Normally, vcpu could call vcpu_enter_guest successfully to inject the NMI.
However, in the problematic scenario, when the guest's threads hold
spin_lock_irqsave for a long time, such as entering a while loop after
spin_lock_irqsave(), other vcpus would enter into S state because of
pvspinlock scheme, then KVM module will loop in vcpu_block rather than
entry the guest.

I think that it's not suitable to decide whether to stay in vcpu_block()
or not just by checking nmi_queued, NMI should be injected immediately
even at this situation.

Solution:
There're 2 ways to solve the problem:
1 call cpu_synchronize_state_not_set_dirty() rather than
cpu_synchronize_state(), while injecting NMI, to avoid changing
nmi_queued to 0. But other workqueues may affect cpu->kvm_vcpu_dirty,
so it's not recommended.
2 add checking nmi_pending plus with nmi_queued in vm_vcpu_has_events()
in KVM module.

qemu_kvm_wait_io_event
  qemu_wait_io_event_common
flush_queued_work
  do_inject_external_nmi
cpu_synchronize_state
  kvm_cpu_synchronize_state
do_kvm_cpu_synchronize_state
  cpu->kvm_vcpu_dirty = true;/* trigger process_nmi */
kvm_vcpu_ioctl(cpu, KVM_NMI)
  kvm_vcpu_ioctl_nmi
kvm_inject_nmi
  atomic_inc(>arch.nmi_queued);
nmi_queued = 1
/* nmi_queued set to 1, when qemu ioctl KVM_NMI */
  kvm_make_request(KVM_REQ_NMI, vcpu);
kvm_cpu_exec
  kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
kvm_arch_put_registers
  kvm_put_vcpu_events
kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, );
  kvm_vcpu_ioctl_x86_set_vcpu_events
process_nmi(vcpu);
  vcpu->arch.nmi_pending +=
atomic_xchg(>arch.nmi_queued, 0);
nmi_queued = 0
/* nmi_queued set to 0, vcpu thread always block */
nmi_pending = 1
  kvm_make_request(KVM_REQ_EVENT, vcpu);
  kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
kvm_arch_vcpu_ioctl_run
  vcpu_run(vcpu);
kvm_vcpu_running(vcpu)
  /* always false, could not call vcpu_enter_guest */
  vcpu_block
kvm_arch_vcpu_runnable
  kvm_vcpu_has_events
if (atomic_read(>arch.nmi_queued))
   /* nmi_queued is 0, vcpu thread always block*/

Signed-off-by: Zhuang Yanying 
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02363e3..96983dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8394,7 +8394,8 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.pv.pv_unhalted)
return true;
 
-   if (atomic_read(>arch.nmi_queued))
+   if (vcpu->arch.nmi_pending ||
+   atomic_read(>arch.nmi_queued))
return true;
 
if (kvm_test_request(KVM_REQ_SMI, vcpu))
-- 
1.8.3.1





Re: [Qemu-devel] [PATCH v2 09/13] vvfat: correctly create base short names for non-ASCII filenames

2017-05-23 Thread Hervé Poussineau

Hi Philippe,

Le 24/05/2017 à 06:17, Philippe Mathieu-Daudé a écrit :

Hi Hervé,

On 05/22/2017 06:12 PM, Hervé Poussineau wrote:

More specifically, create short name from filename and change blacklist of
invalid chars to whitelist of valid chars.

Windows 9x also now correctly see long file names of filenames containing a 
space,
but Scandisk still complains about mismatch between SFN and LFN.

Specification: "FAT: General overview of on-disk format" v1.03, pages 30-31
Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 105 ++
 1 file changed, 77 insertions(+), 28 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index d34241da17..3cb65493cd 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -516,6 +516,81 @@ static void set_begin_of_direntry(direntry_t* direntry, 
uint32_t begin)
 direntry->begin_hi = cpu_to_le16((begin >> 16) & 0x);
 }

+static uint8_t to_valid_short_char(gunichar c)
+{
+c = g_unichar_toupper(c);
+if ((c >= '0' && c <= '9') ||
+(c >= 'A' && c <= 'Z') ||
+strchr("$%'-_@~`!(){}^#&", c) != 0) {
+return c;
+} else {
+return 0;
+}
+}
+
+static direntry_t *create_short_filename(BDRVVVFATState *s,
+ const char *filename)
+{
+int i, j = 0;
+direntry_t *entry = array_get_next(&(s->directory));
+const gchar *p, *last_dot = NULL;
+gunichar c;
+bool lossy_conversion = false;


What is the purpose of this variable? Do you plan to use it later or it is just 
for debugging?


I will use it later in patch 10/13, when generating numeric-tails of short 
names (~1, ~2...)




+char tail[11];


This also looks like old debug code...


Yes, good catch! I will remove it. Compiler didn't warn me.




+
+if (!entry) {
+return NULL;
+}
+memset(entry->name, 0x20, sizeof(entry->name));
+
+/* copy filename and search last dot */
+for (p = filename; ; p = g_utf8_next_char(p)) {
+c = g_utf8_get_char(p);
+if (c == '\0') {
+break;
+} else if (c == '.') {
+if (j == 0) {
+/* '.' at start of filename */
+lossy_conversion = true;
+} else {
+if (last_dot) {
+lossy_conversion = true;
+}
+last_dot = p;
+}
+} else if (!last_dot) {
+/* first part of the name; copy it */
+uint8_t v = to_valid_short_char(c);
+if (j < 8 && v) {
+entry->name[j++] = v;
+} else {
+lossy_conversion = true;
+}
+}
+}
+
+/* copy extension (if any) */
+if (last_dot) {
+j = 0;
+for (p = g_utf8_next_char(last_dot); ; p = g_utf8_next_char(p)) {
+c = g_utf8_get_char(p);
+if (c == '\0') {
+break;
+} else {
+/* extension; copy it */
+uint8_t v = to_valid_short_char(c);
+if (j < 3 && v) {
+entry->name[8 + (j++)] = v;
+} else {
+lossy_conversion = true;
+}
+}
+}
+}



+(void)lossy_conversion;
+return entry;


See the (void)lossy_conversion, which was put on purpose. Those two lines will 
be replaced in patch 10/13.


+}
+
 /* fat functions */

 static inline uint8_t fat_chksum(const direntry_t* entry)
@@ -614,7 +689,7 @@ static inline void init_fat(BDRVVVFATState* s)
 static inline direntry_t* create_short_and_long_name(BDRVVVFATState* s,
 unsigned int directory_start, const char* filename, int is_dot)
 {
-int i,j,long_index=s->directory.next;
+int long_index = s->directory.next;
 direntry_t* entry = NULL;
 direntry_t* entry_long = NULL;

@@ -626,33 +701,7 @@ static inline direntry_t* 
create_short_and_long_name(BDRVVVFATState* s,
 }

 entry_long=create_long_filename(s,filename);
-
-i = strlen(filename);
-for(j = i - 1; j>0  && filename[j]!='.';j--);
-if (j > 0)
-i = (j > 8 ? 8 : j);
-else if (i > 8)
-i = 8;
-
-entry=array_get_next(&(s->directory));
-memset(entry->name, 0x20, sizeof(entry->name));
-memcpy(entry->name, filename, i);
-
-if (j > 0) {
-for (i = 0; i < 3 && filename[j + 1 + i]; i++) {
-entry->name[8 + i] = filename[j + 1 + i];
-}
-}
-
-/* upcase & remove unwanted characters */
-for(i=10;i>=0;i--) {
-if(i==10 || i==7) for(;i>0 && entry->name[i]==' ';i--);
-if(entry->name[i]<=' ' || entry->name[i]>0x7f
-|| strchr(".*?<>|\":/\\[];,+='",entry->name[i]))
-entry->name[i]='_';
-else if(entry->name[i]>='a' && entry->name[i]<='z')
-entry->name[i]+='A'-'a';
-}
+entry = create_short_filename(s, filename);

 /* mangle duplicates */

Re: [Qemu-devel] [PATCH v5 1/4] net/rocker: Remove the dead error handling

2017-05-23 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> Hi Markus,
>
> On 05/23/2017 06:27 AM, Markus Armbruster wrote:
> [...]
>> There's one more cleanup opportunity:
>>
> [...]
>>>  if (pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf, 
>>> size)) {
>>>  return NULL;
>>>  }
>>
>> None of the pci_dma_read() calls outside rocker check the return value.
>> Just as well, because it always returns 0.  Please clean this up in a
>> separate followup patch.
>
> It may be the correct way to do it but this sounds like we are missing
> something somewhere... pci_dma_read() calls pci_dma_rw() which always
> returns 0. Why not let it returns void? It is inlined and never used
> by address. Else we should document why returning 0 is correct, and
> what is the reason to not use a void prototype.
>
> pci_dma_rw() calls dma_memory_rw() which does return a boolean value,
> false on success (MEMTX_OK) and true on error
> (MEMTX_ERROR/DECODE_ERROR)

PCI question.  Michael, Marcel?



Re: [Qemu-devel] [PATCH v3 02/24] docker: add --include-files argument to 'build' command

2017-05-23 Thread Fam Zheng
On Wed, 05/24 13:21, Fam Zheng wrote:
> > diff --git a/tests/docker/docker.py b/tests/docker/docker.py
> > index 6ddc6e4c2a..68cca25f89 100755
> > --- a/tests/docker/docker.py
> > +++ b/tests/docker/docker.py
> > @@ -237,6 +237,11 @@ class BuildCommand(SubCommand):
> >  help="""Specify a binary that will be copied 
> > to the
> >  container together with all its dependent
> >  libraries""")
> > +parser.add_argument("--extra-files", "-f", nargs='*',
> > +help="""Specify files that will be sent to the
> > +Docker daemon. The daemon will copy those 
> > files into
> > +the built image. The ADD directive of the 
> > Dockerfile
> > +specify where a file is placed into the 
> > image""")
> 
> Or more precisely, "The daemon will copy these files into the docker build
> directory, which can be copied to the built image with ADD directives in
> Dockerfile"?

Hmm, not copied by daemon, maybe "The files will be sent...". But you have the
idea.

Fam



Re: [Qemu-devel] [PATCH v3 00/24] docker/shippable: cross-build mipsel and powerpc targets

2017-05-23 Thread Fam Zheng
On Sun, 05/21 00:29, Philippe Mathieu-Daudé wrote:
> This patchset add 2 more architectures to the cross-build farm.
> 
> There is still some issue trying to cross-build mips64el-softmmu, it seems the
> cross tools use the system outdated libfdt instead of the one checkouted in 
> the
> dtc submodule. I disabled this target for now.
> 
> The branch https://github.com/philmd/qemu/tree/docker_shippable_v3 can be
> checked at Shippable:
>   https://app.shippable.com/github/philmd/qemu/status/dashboard
> 
> Each arch builds in around ~9min. Using Shippable free open source projects
> service, the 5 jobs take ~38-44min in total.
> 
> v3:
> - Addressed review feedbacks from Fam:
> - Keep building images in various layers, but use 
> DEBIAN_FRONTEND=noninteractive
> - Document '--extra-files', now it supports adding various files at once
> - Checksum extra files to trigger a docker image rebuild if modified
> - Use better regex to generate deb-src entries
> - Reordered extra libs, to ease further add/remove diffs

Looks good in general, thanks! I've had two small questions, apart from that
it's ready to go.

Fam



Re: [Qemu-devel] [PATCH v3 03/24] docker: rebuild image if 'extra files' checksum does not match

2017-05-23 Thread Fam Zheng
On Sun, 05/21 00:29, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/docker/docker.py | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/docker/docker.py b/tests/docker/docker.py
> index eef92a54f1..0dd6a3304f 100755
> --- a/tests/docker/docker.py
> +++ b/tests/docker/docker.py
> @@ -38,6 +38,9 @@ def _text_checksum(text):
>  """Calculate a digest string unique to the text content"""
>  return hashlib.sha1(text).hexdigest()
>  
> +def _file_checksum(filename):
> +return _text_checksum(open(filename, 'rb').read())
> +
>  def _guess_docker_command():
>  """ Guess a working docker command or raise exception if not found"""
>  commands = [["docker"], ["sudo", "-n", "docker"]]
> @@ -154,7 +157,7 @@ class Docker(object):
>  return labels.get("com.qemu.dockerfile-checksum", "")
>  
>  def build_image(self, tag, docker_dir, dockerfile,
> -quiet=True, user=False, argv=None):
> +quiet=True, user=False, argv=None, extra_files_cksum=[]):
>  if argv == None:
>  argv = []
>  
> @@ -170,7 +173,7 @@ class Docker(object):
>  
>  tmp_df.write("\n")
>  tmp_df.write("LABEL com.qemu.dockerfile-checksum=%s" %
> - _text_checksum(dockerfile))
> + _text_checksum(dockerfile + 
> "-".join(extra_files_cksum)))

Since @dockerfile is the content of the the Dockerfile, how about concatenating
with "\n":

_text_checksum("\n".join([dockerfile] + 
extra_files_cksum))

?


>  tmp_df.flush()
>  
>  self._do(["build", "-t", tag, "-f", tmp_df.name] + argv + \
> @@ -276,16 +279,22 @@ class BuildCommand(SubCommand):
>  return 1
>  
>  # Include files used by ADD directives found within the 
> Dockerfile.
> +cksum = []
>  if args.include_executable:
> +# FIXME: there is no checksum of this executable and the 
> linked
> +# libraries, once the image built any change of this 
> executable
> +# or any library won't trigger another build.
>  _copy_binary_with_libs(args.include_executable, docker_dir)
>  for filename in args.include_files or []:
>  _copy_with_mkdir(filename, docker_dir)
> +cksum += [_file_checksum(filename)]
>  
>  argv += ["--build-arg=" + k.lower() + "=" + v
>  for k, v in os.environ.iteritems()
>  if k.lower() in FILTERED_ENV_NAMES]
>  dkr.build_image(tag, docker_dir, dockerfile,
> -quiet=args.quiet, user=args.user, argv=argv)
> +quiet=args.quiet, user=args.user, argv=argv,
> +extra_files_cksum=cksum)
>  
>  rmtree(docker_dir)
>  
> -- 
> 2.11.0
> 



Re: [Qemu-devel] [PATCH v3 02/24] docker: add --include-files argument to 'build' command

2017-05-23 Thread Fam Zheng
On Sun, 05/21 00:29, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/docker/Makefile.include |  5 -
>  tests/docker/docker.py| 12 +---
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
> index 03eda37bf4..fe1a9a53ff 100644
> --- a/tests/docker/Makefile.include
> +++ b/tests/docker/Makefile.include
> @@ -51,6 +51,7 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker
>   $(SRC_PATH)/tests/docker/docker.py build qemu:$* $< \
>   $(if $V,,--quiet) $(if $(NOCACHE),--no-cache) \
>   $(if $(NOUSER),,--add-current-user) \
> + $(if $(EXTRA_FILES),--extra-files $(EXTRA_FILES))\
>   $(if $(EXECUTABLE),--include-executable=$(EXECUTABLE)),\
>   "BUILD","$*")
>  
> @@ -107,6 +108,8 @@ docker:
>   @echo 'NOUSER   Define to disable adding current user 
> to containers passwd.'
>   @echo 'NOCACHE=1Ignore cache when build images.'
>   @echo 'EXECUTABLE=Include executable in image.'
> + @echo 'EXTRA_FILES=" [... ]"'
> + @echo ' Include extra files in image.'
>  
>  # This rule if for directly running against an arbitrary docker target.
>  # It is called by the expanded docker targets (e.g. make
> diff --git a/tests/docker/docker.py b/tests/docker/docker.py
> index 6ddc6e4c2a..68cca25f89 100755
> --- a/tests/docker/docker.py
> +++ b/tests/docker/docker.py
> @@ -237,6 +237,11 @@ class BuildCommand(SubCommand):
>  help="""Specify a binary that will be copied to 
> the
>  container together with all its dependent
>  libraries""")
> +parser.add_argument("--extra-files", "-f", nargs='*',
> +help="""Specify files that will be sent to the
> +Docker daemon. The daemon will copy those files 
> into
> +the built image. The ADD directive of the 
> Dockerfile
> +specify where a file is placed into the image""")

Or more precisely, "The daemon will copy these files into the docker build
directory, which can be copied to the built image with ADD directives in
Dockerfile"?

>  parser.add_argument("--add-current-user", "-u", dest="user",
>  action="store_true",
>  help="Add the current user to image's passwd")
> @@ -270,10 +275,11 @@ class BuildCommand(SubCommand):
>  print "%s exited with code %d" % (docker_pre, rc)
>  return 1
>  
> -# Do we include a extra binary?
> +# Include files used by ADD directives found within the 
> Dockerfile.
>  if args.include_executable:
> -_copy_binary_with_libs(args.include_executable,
> -   docker_dir)
> +_copy_binary_with_libs(args.include_executable, docker_dir)
> +for filename in args.extra_files or []:
> +_copy_with_mkdir(filename, docker_dir)
>  
>  argv += ["--build-arg=" + k.lower() + "=" + v
>  for k, v in os.environ.iteritems()
> -- 
> 2.11.0
> 



Re: [Qemu-devel] [Qemu-arm] [PATCH 1/2] hw/adc: Add basic Aspeed ADC model

2017-05-23 Thread Philippe Mathieu-Daudé

Hi Andrew,

On 05/22/2017 02:14 AM, Andrew Jeffery wrote:

On Mon, 2017-05-22 at 03:15 +, Ryan Chen wrote:

In ASPEED SoC chip, all register access have following rule.
Most of controller write access is only support 32bit access.
Read is support 8bits/16bits/32bits.


This makes sens thinking about how a DMA controller can take advantage 
of the ADC.




Thanks for clearing that up Ryan.

Phil: I'll rework the model so the reads are 16-bits.


This shouldn't be necessary, QEMU is supposed to supports different 
access size for different implemented size, so you can declare your 
implementation as 32-bit and valid accesses from 8 to 32:


 static const MemoryRegionOps aspeed_adc_ops = {
 .read = aspeed_adc_read,
 .write = aspeed_adc_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .valid.min_access_size = 1,
 .valid.max_access_size = 4,
 .valid.unaligned = false,
+.impl.min_access_size = 4,
+.impl.max_access_size = 4,
 };

This way an I/O access from the CPU or a DMA could use 8/16-bit while 
you keep a 32-bit implementation. The adjustment is done by 
access_with_adjusted_size() from memory.c.


Afaik there is, however, no distinction between read/write different 
access size in current QEMU MemoryRegionOps model.




Thanks,

Andrew





Best Regards,
Ryan

信驊科技股份有限公司
ASPEED Technology Inc.
2F,No.15,Industry East Road 4.,Hsinchu Science Park, Hsinchu City 30077, Taiwan
Tel: 886-3-578-9568  #857
Fax: 886-3-578-9586
* Email Confidentiality Notice 
DISCLAIMER:
This message (and any attachments) may contain legally privileged and/or other 
confidential information. If you have received it in error, please notify the 
sender by reply e-mail and immediately delete the e-mail and any attachments 
without copying or disclosing the contents. Thank you.

-Original Message-

From: Andrew Jeffery [mailto:and...@aj.id.au]

Sent: Monday, May 22, 2017 8:24 AM

To: Philippe Mathieu-Daudé ; qemu-...@nongnu.org

Cc: Peter Maydell ; Ryan Chen ; Alistair 
Francis ; qemu-devel@nongnu.org; Cédric Le Goater ; Joel 
Stanley 

Subject: Re: [Qemu-arm] [PATCH 1/2] hw/adc: Add basic Aspeed ADC model

Hi Phil,

On Sat, 2017-05-20 at 00:21 -0300, Philippe Mathieu-Daudé wrote:

Hi Andrew,

On 05/19/2017 09:26 PM, Andrew Jeffery wrote:

This model implements enough behaviour to do basic functionality
tests such as device initialisation and read out of dummy sample
values. The sample value generation strategy is similar to the STM
ADC already in the tree.


Signed-off-by: Andrew Jeffery 


---
 hw/adc/Makefile.objs|   1 +
 hw/adc/aspeed_adc.c | 246

 include/hw/adc/aspeed_adc.h |  33 ++
 3 files changed, 280 insertions(+)
 create mode 100644 hw/adc/aspeed_adc.c
 create mode 100644 include/hw/adc/aspeed_adc.h

diff --git a/hw/adc/Makefile.objs b/hw/adc/Makefile.objs index
3f6dfdedaec7..2bf9362ba3c4 100644
--- a/hw/adc/Makefile.objs
+++ b/hw/adc/Makefile.objs
@@ -1 +1,2 @@
 obj-$(CONFIG_STM32F2XX_ADC) += stm32f2xx_adc.o
+obj-$(CONFIG_ASPEED_SOC) += aspeed_adc.o
diff --git a/hw/adc/aspeed_adc.c b/hw/adc/aspeed_adc.c new file mode
100644 index ..d08f1684f7bc
--- /dev/null
+++ b/hw/adc/aspeed_adc.c
@@ -0,0 +1,246 @@
+/*
+ * Aspeed ADC
+ *

+ * Andrew Jeffery 


+ *
+ * Copyright 2017 IBM Corp.
+ *
+ * This code is licensed under the GPL version 2 or later.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/adc/aspeed_adc.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+
+#define ASPEED_ADC_ENGINE_CTRL  0x00 #define
+ASPEED_ADC_ENGINE_CH_EN_MASK   0x #define
+ASPEED_ADC_ENGINE_CH_EN(x)((BIT(x)) << 16) #define
+ASPEED_ADC_ENGINE_INIT BIT(8) #define
+ASPEED_ADC_ENGINE_AUTO_COMPBIT(5) #define
+ASPEED_ADC_ENGINE_COMP BIT(4) #define
+ASPEED_ADC_ENGINE_MODE_MASK0x000e #define
+ASPEED_ADC_ENGINE_MODE_OFF(0b000 << 1) #define
+ASPEED_ADC_ENGINE_MODE_STANDBY(0b001 << 1) #define
+ASPEED_ADC_ENGINE_MODE_NORMAL (0b111 << 1) #define
+ASPEED_ADC_ENGINE_EN   BIT(0)
+
+#define ASPEED_ADC_L_MASK   ((1 << 10) - 1) #define
+ASPEED_ADC_L(x) ((x) & ASPEED_ADC_L_MASK) #define
+ASPEED_ADC_H(x) (((x) >> 16) & ASPEED_ADC_L_MASK) #define
+ASPEED_ADC_LH_MASK  (ASPEED_ADC_L_MASK << 16 |
+ASPEED_ADC_L_MASK)
+
+static inline uint32_t update_channels(uint32_t current) {
+const uint32_t next = (current + 7) & 0x3ff;
+
+return (next << 16) | next;
+}
+
+static bool breaks_threshold(AspeedADCState *s, int ch_off) {
+const uint32_t a = ASPEED_ADC_L(s->channels[ch_off]);
+const uint32_t a_lower = ASPEED_ADC_L(s->bounds[2 * ch_off]);
+

[Qemu-devel] [PATCH v3 3/3] spapr: Fix migration of Radix guests

2017-05-23 Thread Bharata B Rao
Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.

Reported-by: Nageswara R Sastry 
Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index daf335c..ea14bed 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int version_id)
 err = spapr_rtc_import_offset(>rtc, spapr->rtc_offset);
 }
 
+if (spapr->patb_entry) {
+PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+bool radix = !!(spapr->patb_entry & PATBE1_GR);
+bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE);
+
+err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry);
+if (err) {
+error_report("Process table config unsupported by the host");
+return -EINVAL;
+}
+}
+
 return err;
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 1/3] migration: Introduce unregister_savevm_live()

2017-05-23 Thread Bharata B Rao
Introduce a new function unregister_savevm_live() to unregister the vmstate
handlers registered via register_savevm_live().

register_savevm() allocates SaveVMHandlers while register_savevm_live()
gets passed with SaveVMHandlers. During unregistration, we  want to
free SaveVMHandlers in the former case but not free in the latter case.
Hence this new API is needed to differentiate this.

This new API will be needed by PowerPC to unregister the HTAB savevm
handlers.

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
Cc: Juan Quintela 
Cc: Dr. David Alan Gilbert 
---
 include/migration/vmstate.h |  1 +
 migration/savevm.c  | 17 +++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 8489659..02a1bac 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -79,6 +79,7 @@ int register_savevm_live(DeviceState *dev,
  void *opaque);
 
 void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque);
+void unregister_savevm_live(DeviceState *dev, const char *idstr, void *opaque);
 
 typedef struct VMStateInfo VMStateInfo;
 typedef struct VMStateDescription VMStateDescription;
diff --git a/migration/savevm.c b/migration/savevm.c
index f5e8194..4ef6fdc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -630,7 +630,8 @@ int register_savevm(DeviceState *dev,
 ops, opaque);
 }
 
-void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
+static void unregister_savevm_common(DeviceState *dev, const char *idstr,
+ void *opaque, bool free_savevmhandlers)
 {
 SaveStateEntry *se, *new_se;
 char id[256] = "";
@@ -649,12 +650,24 @@ void unregister_savevm(DeviceState *dev, const char 
*idstr, void *opaque)
 if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
 QTAILQ_REMOVE(_state.handlers, se, entry);
 g_free(se->compat);
-g_free(se->ops);
+if (free_savevmhandlers) {
+g_free(se->ops);
+}
 g_free(se);
 }
 }
 }
 
+void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
+{
+unregister_savevm_common(dev, idstr, opaque, true);
+}
+
+void unregister_savevm_live(DeviceState *dev, const char *idstr, void *opaque)
+{
+unregister_savevm_common(dev, idstr, opaque, false);
+}
+
 int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
const VMStateDescription *vmsd,
void *opaque, int alias_id,
-- 
2.7.4




[Qemu-devel] [PATCH v3 0/3] ppc/spapr: Fix migration of radix guests

2017-05-23 Thread Bharata B Rao
This patchset fixes the migration of sPAPR radix guests.

Changes in v3:
--
- Dropped the patch that made h_register_process_table flags global as
  it is not required with the changed logic.
- The post load logic is now same for both TCG as well as KVM radix guests.
  Also ensured that radix and future v3 hash migration post load is
  treated similarly.
- Ensure TCG Radix guest migration isn't broken.

v2: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04491.html

Bharata B Rao (3):
  migration: Introduce unregister_savevm_live()
  spapr: Unregister HPT savevm handlers for radix guests
  spapr: Fix migration of Radix guests

 hw/ppc/spapr.c  | 111 +---
 include/hw/ppc/spapr.h  |   2 +
 include/migration/vmstate.h |   1 +
 migration/savevm.c  |  17 ++-
 4 files changed, 80 insertions(+), 51 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v3 2/3] spapr: Unregister HPT savevm handlers for radix guests

2017-05-23 Thread Bharata B Rao
HPT gets created by default for TCG guests and later when the guest turns
out to be a radix guest, the HPT is destroyed when guest does
H_REGISTER_PROC_TBL hcall. Let HTAB savevm handlers registration and
unregistration follow the same model so that we don't end up having
unrequired HTAB savevm handlers for radix guests.

This also ensures that HTAB savevm handlers seemlessly get destroyed and
recreated like HTAB itself when hash guest reboots.

HTAB savevm handlers registration/unregistration is now done from
spapr_reallocate_hpt() which itself is called from one of the
savevm_htab_handlers.htab_load(). To cater to this circular dependency
spapr_reallocate_hpt() is made global.

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 hw/ppc/spapr.c | 99 +-
 include/hw/ppc/spapr.h |  2 +
 2 files changed, 52 insertions(+), 49 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 91f7434..daf335c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1233,53 +1233,7 @@ void spapr_free_hpt(sPAPRMachineState *spapr)
 spapr->htab = NULL;
 spapr->htab_shift = 0;
 close_htab_fd(spapr);
-}
-
-static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
- Error **errp)
-{
-long rc;
-
-/* Clean up any HPT info from a previous boot */
-spapr_free_hpt(spapr);
-
-rc = kvmppc_reset_htab(shift);
-if (rc < 0) {
-/* kernel-side HPT needed, but couldn't allocate one */
-error_setg_errno(errp, errno,
- "Failed to allocate KVM HPT of order %d (try smaller 
maxmem?)",
- shift);
-/* This is almost certainly fatal, but if the caller really
- * wants to carry on with shift == 0, it's welcome to try */
-} else if (rc > 0) {
-/* kernel-side HPT allocated */
-if (rc != shift) {
-error_setg(errp,
-   "Requested order %d HPT, but kernel allocated order %ld 
(try smaller maxmem?)",
-   shift, rc);
-}
-
-spapr->htab_shift = shift;
-spapr->htab = NULL;
-} else {
-/* kernel-side HPT not needed, allocate in userspace instead */
-size_t size = 1ULL << shift;
-int i;
-
-spapr->htab = qemu_memalign(size, size);
-if (!spapr->htab) {
-error_setg_errno(errp, errno,
- "Could not allocate HPT of order %d", shift);
-return;
-}
-
-memset(spapr->htab, 0, size);
-spapr->htab_shift = shift;
-
-for (i = 0; i < size / HASH_PTE_SIZE_64; i++) {
-DIRTY_HPTE(HPTE(spapr->htab, i));
-}
-}
+unregister_savevm_live(NULL, "spapr/htab", spapr);
 }
 
 void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr)
@@ -1879,6 +1833,55 @@ static SaveVMHandlers savevm_htab_handlers = {
 .load_state = htab_load,
 };
 
+void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
+ Error **errp)
+{
+long rc;
+
+/* Clean up any HPT info from a previous boot */
+spapr_free_hpt(spapr);
+
+rc = kvmppc_reset_htab(shift);
+if (rc < 0) {
+/* kernel-side HPT needed, but couldn't allocate one */
+error_setg_errno(errp, errno,
+ "Failed to allocate KVM HPT of order %d (try smaller 
maxmem?)",
+ shift);
+/* This is almost certainly fatal, but if the caller really
+ * wants to carry on with shift == 0, it's welcome to try */
+} else if (rc > 0) {
+/* kernel-side HPT allocated */
+if (rc != shift) {
+error_setg(errp,
+   "Requested order %d HPT, but kernel allocated order %ld 
(try smaller maxmem?)",
+   shift, rc);
+}
+
+spapr->htab_shift = shift;
+spapr->htab = NULL;
+} else {
+/* kernel-side HPT not needed, allocate in userspace instead */
+size_t size = 1ULL << shift;
+int i;
+
+spapr->htab = qemu_memalign(size, size);
+if (!spapr->htab) {
+error_setg_errno(errp, errno,
+ "Could not allocate HPT of order %d", shift);
+return;
+}
+
+memset(spapr->htab, 0, size);
+spapr->htab_shift = shift;
+
+for (i = 0; i < size / HASH_PTE_SIZE_64; i++) {
+DIRTY_HPTE(HPTE(spapr->htab, i));
+}
+}
+register_savevm_live(NULL, "spapr/htab", -1, 1,
+ _htab_handlers, spapr);
+}
+
 static void spapr_boot_set(void *opaque, const char *boot_device,
Error **errp)
 {
@@ -2341,8 +2344,6 @@ static void ppc_spapr_init(MachineState *machine)
  * interface, this is a legacy from the sPAPREnvironment structure
  * which predated MachineState but had a 

Re: [Qemu-devel] [PATCH v3 15/24] docker: add powerpc build target

2017-05-23 Thread Philippe Mathieu-Daudé

Hi Alex,

On 05/22/2017 11:08 AM, Alex Bennée wrote:


Philippe Mathieu-Daudé  writes:


Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/docker/Makefile.include  |  4 +--
 .../docker/dockerfiles/debian-powerpc-cross.docker | 40 ++
 2 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100644 tests/docker/dockerfiles/debian-powerpc-cross.docker

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 111b8090b2..9815976486 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -56,11 +56,13 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker
"BUILD","$*")

 docker-image-debian-mipsel-cross: 
EXTRA_FILES:=tests/docker/dockerfiles/debian-apt-fake.sh
+docker-image-debian-powerpc-cross: 
EXTRA_FILES:=tests/docker/dockerfiles/debian-apt-fake.sh

 # Enforce dependancies for composite images
 docker-image-debian-armhf-cross: docker-image-debian
 docker-image-debian-arm64-cross: docker-image-debian
 docker-image-debian-mipsel-cross: docker-image-debian
+docker-image-debian-powerpc-cross: docker-image-debian

 # Expand all the pre-requistes for each docker image and test combination
 $(foreach i,$(DOCKER_IMAGES), \
@@ -111,8 +113,6 @@ docker:
@echo 'NOUSER   Define to disable adding current user 
to containers passwd.'
@echo 'NOCACHE=1Ignore cache when build images.'
@echo 'EXECUTABLE=Include executable in image.'
-   @echo 'EXTRA_FILES=" [... ]"'
-   @echo ' Include extra files in image.'


I'm fairly sure you didn't want to do this.


Ups, good catch :) Rebase mistake.
Are you Ok with those 2 lines back? (I think since it is pretty much a 
copy/paste of mipsel Dockerfile).

I'll send fixed and your other r-b after Fam review, thanks!





 # This rule if for directly running against an arbitrary docker target.
 # It is called by the expanded docker targets (e.g. make
diff --git a/tests/docker/dockerfiles/debian-powerpc-cross.docker 
b/tests/docker/dockerfiles/debian-powerpc-cross.docker
new file mode 100644
index 00..fa2cc7a657
--- /dev/null
+++ b/tests/docker/dockerfiles/debian-powerpc-cross.docker
@@ -0,0 +1,40 @@
+#
+# Docker powerpc cross-compiler target
+#
+# This docker target builds on the base debian image.
+#
+FROM qemu:debian
+MAINTAINER Philippe Mathieu-Daudé 
+
+# Add the foreign architecture we want and install dependencies
+RUN dpkg --add-architecture powerpc
+RUN apt-get update
+RUN DEBIAN_FRONTEND=noninteractive eatmydata \
+apt-get install -y --no-install-recommends \
+crossbuild-essential-powerpc
+
+#  to fix "following packages have unmet dependencies" ...
+ADD debian-apt-fake.sh /usr/local/bin/apt-fake
+RUN apt-get install -y --no-install-recommends \
+equivs \
+pkg-config
+RUN apt-fake install \
+pkg-config:powerpc=0.28-1.1-fake && \
+ln -s pkg-config /usr/bin/powerpc-linux-gnu-pkg-config
+ENV PKG_CONFIG_PATH /usr/lib/powerpc-linux-gnu/pkgconfig
+# 
+
+# Specify the cross prefix for this image (see tests/docker/common.rc)
+ENV QEMU_CONFIGURE_OPTS --cross-prefix=powerpc-linux-gnu-
+
+RUN DEBIAN_FRONTEND=noninteractive eatmydata \
+apt-get build-dep -yy -a powerpc qemu
+RUN DEBIAN_FRONTEND=noninteractive \
+apt-get install -y --no-install-recommends \
+glusterfs-common:powerpc \
+libbz2-dev:powerpc \
+liblzo2-dev:powerpc \
+libncursesw5-dev:powerpc \
+libnfs-dev:powerpc \
+librdmacm-dev:powerpc \
+libsnappy-dev:powerpc



--
Alex Bennée





Re: [Qemu-devel] [PATCH v5 2/4] net/rocker: Plug memory leak in pci_rocker_init()

2017-05-23 Thread Philippe Mathieu-Daudé

On 05/23/2017 01:04 AM, Mao Zhongyi wrote:

pci_rocker_init() leaks a World when the name more than 9 chars,
then return a negative value directly, doesn't make a correct
cleanup. So add a new goto label to fix it.

Signed-off-by: Mao Zhongyi 
Reviewed-by: Markus Armbruster 


Reviewed-by: Philippe Mathieu-Daudé 


---
 hw/net/rocker/rocker.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index d01ba9d..6d44f37 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1357,7 +1357,8 @@ static int pci_rocker_init(PCIDevice *dev)
 fprintf(stderr,
 "rocker: name too long; please shorten to at most %d chars\n",
 MAX_ROCKER_NAME_LEN);
-return -EINVAL;
+err = -EINVAL;
+goto err_name_too_long;
 }

 if (memcmp(>fp_start_macaddr, , sizeof(zero)) == 0) {
@@ -1416,6 +1417,7 @@ static int pci_rocker_init(PCIDevice *dev)

 return 0;

+err_name_too_long:
 err_duplicate:
 rocker_msix_uninit(r);
 err_msix_init:





Re: [Qemu-devel] [PATCH v5 3/4] net/rocker: Convert to realize()

2017-05-23 Thread Philippe Mathieu-Daudé

On 05/23/2017 01:04 AM, Mao Zhongyi wrote:

The rocker device still implements the old PCIDeviceClass .init()
instead of the new .realize(). All devices need to be converted to
.realize().

.init() reports errors with fprintf() and return 0 on success, negative
number on failure. Meanwhile, when -device rocker fails, it first report
a specific error, then a generic one, like this:

$ x86_64-softmmu/qemu-system-x86_64 -device rocker,name=qemu-rocker
rocker: name too long; please shorten to at most 9 chars
qemu-system-x86_64: -device rocker,name=qemu-rocker: Device initialization 
failed

Now, convert it to .realize() that passes errors to its callers via its
errp argument. Also avoid the superfluous second error message. After
the patch, effect like this:

$ x86_64-softmmu/qemu-system-x86_64 -device rocker,name=qemu-rocker
qemu-system-x86_64: -device rocker,name=qemu-rocker: name too long; please 
shorten to at most 9 chars

Signed-off-by: Mao Zhongyi 
Reviewed-by: Markus Armbruster 


Reviewed-by: Philippe Mathieu-Daudé 


---
 hw/net/rocker/rocker.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 6d44f37..2764529 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1238,20 +1238,18 @@ rollback:
 return err;
 }

-static int rocker_msix_init(Rocker *r)
+static int rocker_msix_init(Rocker *r, Error **errp)
 {
 PCIDevice *dev = PCI_DEVICE(r);
 int err;
-Error *local_err = NULL;

 err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
-0, _err);
+0, errp);
 if (err) {
-error_report_err(local_err);
 return err;
 }

@@ -1287,7 +1285,7 @@ static World *rocker_world_type_by_name(Rocker *r, const 
char *name)
 return NULL;
 }

-static int pci_rocker_init(PCIDevice *dev)
+static void pci_rocker_realize(PCIDevice *dev, Error **errp)
 {
 Rocker *r = to_rocker(dev);
 const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
@@ -1305,10 +1303,9 @@ static int pci_rocker_init(PCIDevice *dev)

 r->world_dflt = rocker_world_type_by_name(r, r->world_name);
 if (!r->world_dflt) {
-fprintf(stderr,
-"rocker: requested world \"%s\" does not exist\n",
+error_setg(errp,
+"invalid argument requested world %s does not exist",
 r->world_name);
-err = -EINVAL;
 goto err_world_type_by_name;
 }

@@ -1328,7 +1325,7 @@ static int pci_rocker_init(PCIDevice *dev)

 /* MSI-X init */

-err = rocker_msix_init(r);
+err = rocker_msix_init(r, errp);
 if (err) {
 goto err_msix_init;
 }
@@ -1340,7 +1337,7 @@ static int pci_rocker_init(PCIDevice *dev)
 }

 if (rocker_find(r->name)) {
-err = -EEXIST;
+error_setg(errp, "%s already exists", r->name);
 goto err_duplicate;
 }

@@ -1354,10 +1351,9 @@ static int pci_rocker_init(PCIDevice *dev)
 #define ROCKER_IFNAMSIZ 16
 #define MAX_ROCKER_NAME_LEN  (ROCKER_IFNAMSIZ - 1 - 3 - 3)
 if (strlen(r->name) > MAX_ROCKER_NAME_LEN) {
-fprintf(stderr,
-"rocker: name too long; please shorten to at most %d chars\n",
+error_setg(errp,
+"name too long; please shorten to at most %d chars",
 MAX_ROCKER_NAME_LEN);
-err = -EINVAL;
 goto err_name_too_long;
 }

@@ -1415,7 +1411,7 @@ static int pci_rocker_init(PCIDevice *dev)

 QLIST_INSERT_HEAD(, r, next);

-return 0;
+return;

 err_name_too_long:
 err_duplicate:
@@ -1429,7 +1425,6 @@ err_world_type_by_name:
 world_free(r->worlds[i]);
 }
 }
-return err;
 }

 static void pci_rocker_uninit(PCIDevice *dev)
@@ -1514,7 +1509,7 @@ static void rocker_class_init(ObjectClass *klass, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);

-k->init = pci_rocker_init;
+k->realize = pci_rocker_realize;
 k->exit = pci_rocker_uninit;
 k->vendor_id = PCI_VENDOR_ID_REDHAT;
 k->device_id = PCI_DEVICE_ID_REDHAT_ROCKER;





Re: [Qemu-devel] [PATCH v5 4/4] net/rocker: Fix the unusual macro name

2017-05-23 Thread Philippe Mathieu-Daudé

On 05/23/2017 01:04 AM, Mao Zhongyi wrote:

Suggested-by: Markus Armbruster 
Signed-off-by: Mao Zhongyi 


Reviewed-by: Philippe Mathieu-Daudé 


---
 hw/net/rocker/rocker.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 2764529..f8a32f7 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -69,10 +69,10 @@ struct rocker {
 QLIST_ENTRY(rocker) next;
 };

-#define ROCKER "rocker"
+#define TYPE_ROCKER "rocker"

-#define to_rocker(obj) \
-OBJECT_CHECK(Rocker, (obj), ROCKER)
+#define ROCKER(obj) \
+OBJECT_CHECK(Rocker, (obj), TYPE_ROCKER)

 static QLIST_HEAD(, rocker) rockers;

@@ -1287,7 +1287,7 @@ static World *rocker_world_type_by_name(Rocker *r, const 
char *name)

 static void pci_rocker_realize(PCIDevice *dev, Error **errp)
 {
-Rocker *r = to_rocker(dev);
+Rocker *r = ROCKER(dev);
 const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
 const MACAddr dflt = { .a = { 0x52, 0x54, 0x00, 0x12, 0x35, 0x01 } };
 static int sw_index;
@@ -1333,7 +1333,7 @@ static void pci_rocker_realize(PCIDevice *dev, Error 
**errp)
 /* validate switch properties */

 if (!r->name) {
-r->name = g_strdup(ROCKER);
+r->name = g_strdup(TYPE_ROCKER);
 }

 if (rocker_find(r->name)) {
@@ -1429,7 +1429,7 @@ err_world_type_by_name:

 static void pci_rocker_uninit(PCIDevice *dev)
 {
-Rocker *r = to_rocker(dev);
+Rocker *r = ROCKER(dev);
 int i;

 QLIST_REMOVE(r, next);
@@ -1462,7 +1462,7 @@ static void pci_rocker_uninit(PCIDevice *dev)

 static void rocker_reset(DeviceState *dev)
 {
-Rocker *r = to_rocker(dev);
+Rocker *r = ROCKER(dev);
 int i;

 for (i = 0; i < ROCKER_WORLD_TYPE_MAX; i++) {
@@ -1500,7 +1500,7 @@ static Property rocker_properties[] = {
 };

 static const VMStateDescription rocker_vmsd = {
-.name = ROCKER,
+.name = TYPE_ROCKER,
 .unmigratable = 1,
 };

@@ -1523,7 +1523,7 @@ static void rocker_class_init(ObjectClass *klass, void 
*data)
 }

 static const TypeInfo rocker_info = {
-.name  = ROCKER,
+.name  = TYPE_ROCKER,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof(Rocker),
 .class_init= rocker_class_init,





Re: [Qemu-devel] [PATCH v5 1/4] net/rocker: Remove the dead error handling

2017-05-23 Thread Philippe Mathieu-Daudé

Hi Markus,

On 05/23/2017 06:27 AM, Markus Armbruster wrote:
[...]

There's one more cleanup opportunity:


[...]

 if (pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf, size)) {
 return NULL;
 }


None of the pci_dma_read() calls outside rocker check the return value.
Just as well, because it always returns 0.  Please clean this up in a
separate followup patch.


It may be the correct way to do it but this sounds like we are missing 
something somewhere... pci_dma_read() calls pci_dma_rw() which always 
returns 0. Why not let it returns void? It is inlined and never used by 
address. Else we should document why returning 0 is correct, and what is 
the reason to not use a void prototype.


pci_dma_rw() calls dma_memory_rw() which does return a boolean value, 
false on success (MEMTX_OK) and true on error (MEMTX_ERROR/DECODE_ERROR)


Regards,

Phil.



Re: [Qemu-devel] [PATCH v2 13/13] vvfat: change OEM name to 'MSWIN4.1'

2017-05-23 Thread Philippe Mathieu-Daudé

On 05/23/2017 04:41 PM, Hervé Poussineau wrote:

Hi Philippe,

Le 23/05/2017 à 06:23, Philippe Mathieu-Daudé a écrit :

Hi Hervé,

On 05/22/2017 06:12 PM, Hervé Poussineau wrote:

According to specification:
"'MSWIN4.1' is the recommanded setting, because it is the setting
least likely
to cause compatibility problems. If you want to put something else in
here,
that is your option, but the result may be that some FAT drivers
might not
recognize the volume."


It also says "Typically this is some indication of what system formatted
the volume."

From https://technet.microsoft.com/en-us/library/cc976796.aspx:

"Following the jump instruction is the 8-byte OEM ID, a string of
characters that identifies the name and version number of the
operating system that formatted the volume. To preserve compatibility
with MS-DOS, Windows 2000 records "MSDOS5.0" in this field on FAT16
and FAT32 disks. On NTFS disks, Windows 2000 records "NTFS."
You may also see the OEM ID "MSWIN4.0" on disks formatted by Windows
95 and "MSWIN4.1" on disks formatted by Windows 95 OSR2 and Windows
98. Windows 2000 does not use the OEM ID field in the boot
sector except for verifying NTFS volumes."

I'm curious which one is the most up-to-date, the specification v1.03
or a Windows 2000. Do you have an idea if there is more MSDOS/W2K vs
W95/98 users? Hopefully W95 is a Long time no see to me.

You think having "QEMU" OEM does more harm than good?


That's complicated. Indeed, OEM ID should be the name of the OS/tool
which formatted the partition.
However, some FAT drivers take different paths according to OEM ID.
See for example:
https://jdebp.eu/FGA/volume-boot-block-oem-name-field.html


"Linux (util-linux up to at least version 2.12) ... [first characters 
are QEMU] Treat the volume as unformatted" :facepalm:



http://seasip.info./Misc/oemid.html

So, in my opinion, it should be better to stick to some specification
for the whole FAT format, so we have a reference document to know if
there is a bug in the code or not.


Once explained, that's fine :) Can you add those references in the code? 
Eventually add a #define VOLUME_BOOT_BLOCKS_OEM_NAME with a fast 
explanation and those urls.


With that:
Reviewed-by: Philippe Mathieu-Daudé 


FAT specification 1.03 is dated December 6, 2000, while Windows 2000 has
been released December 15, 1999, so both are around the same date.
FAT specification gives all details about all disk structures, so I
think that's better to stick to what it says.

So to answer your question, and knowing the tricks played with OEM ID, I
think that's better to use "MSWIN4.1" than "QEMU".

Regards,

Hervé





Specification: "FAT: General overview of on-disk format" v1.03, page 9
Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 53e8faa54c..1f7f46ecea 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1024,7 +1024,7 @@ static int init_directories(BDRVVVFATState* s,
 bootsector->jump[0]=0xeb;
 bootsector->jump[1]=0x3e;
 bootsector->jump[2]=0x90;
-memcpy(bootsector->name,"QEMU",8);
+memcpy(bootsector->name, "MSWIN4.1", 8);
 bootsector->sector_size=cpu_to_le16(0x200);
 bootsector->sectors_per_cluster=s->sectors_per_cluster;
 bootsector->reserved_sectors=cpu_to_le16(1);



Regards,

Phil.







Re: [Qemu-devel] [PATCH v2 12/13] vvfat: handle KANJI lead byte 0xe5

2017-05-23 Thread Philippe Mathieu-Daudé

Hi Hervé,

You explained in the cover "fix problems detected by disk checking 
utilities in read-only mode", do you think it would be doable to have 
unit-tests for those corner cases?


On 05/22/2017 06:12 PM, Hervé Poussineau wrote:

Specification: "FAT: General overview of on-disk format" v1.03, page 23
Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 5376659010..53e8faa54c 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -590,6 +590,10 @@ static direntry_t *create_short_filename(BDRVVVFATState *s,
 }
 }

+if (entry->name[0] == 0xe5) {


This deserves a comment, and a #define would also be welcomed.


+entry->name[0] = 0x05;
+}
+
 /* numeric-tail generation */
 for (j = 0; j < 8; j++) {
 if (entry->name[j] == ' ') {
@@ -710,8 +714,6 @@ static inline void init_fat(BDRVVVFATState* s)

 }

-/* TODO: in create_short_filename, 0xe5->0x05 is not yet handled! */
-/* TODO: in parse_short_filename, 0x05->0xe5 is not yet handled! */
 static inline direntry_t* create_short_and_long_name(BDRVVVFATState* s,
 unsigned int directory_start, const char* filename, int is_dot)
 {
@@ -1744,6 +1746,9 @@ static int parse_short_name(BDRVVVFATState* s,
 } else
 lfn->name[i + j + 1] = '\0';

+if (lfn->name[0] == 0x05) {
+lfn->name[0] = 0xe5;
+}
 lfn->len = strlen((char*)lfn->name);

 return 0;





Re: [Qemu-devel] [PATCH] net/rocker: Cleanup the useless return value check

2017-05-23 Thread Mao Zhongyi



On 05/24/2017 11:31 AM, Jason Wang wrote:



On 2017年05月24日 10:57, Mao Zhongyi wrote:

None of pci_dma_read()'s callers check the return value except
rocker. There is no need to check it because it always return
0. So the check work is useless. Remove it entirely.

Suggested-by: Markus Armbruster 
Signed-off-by: Mao Zhongyi 
---

[...]


Applied, thanks.


Thanks:)




















Re: [Qemu-devel] [PATCH v2 09/13] vvfat: correctly create base short names for non-ASCII filenames

2017-05-23 Thread Philippe Mathieu-Daudé

Hi Hervé,

On 05/22/2017 06:12 PM, Hervé Poussineau wrote:

More specifically, create short name from filename and change blacklist of
invalid chars to whitelist of valid chars.

Windows 9x also now correctly see long file names of filenames containing a 
space,
but Scandisk still complains about mismatch between SFN and LFN.

Specification: "FAT: General overview of on-disk format" v1.03, pages 30-31
Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 105 ++
 1 file changed, 77 insertions(+), 28 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index d34241da17..3cb65493cd 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -516,6 +516,81 @@ static void set_begin_of_direntry(direntry_t* direntry, 
uint32_t begin)
 direntry->begin_hi = cpu_to_le16((begin >> 16) & 0x);
 }

+static uint8_t to_valid_short_char(gunichar c)
+{
+c = g_unichar_toupper(c);
+if ((c >= '0' && c <= '9') ||
+(c >= 'A' && c <= 'Z') ||
+strchr("$%'-_@~`!(){}^#&", c) != 0) {
+return c;
+} else {
+return 0;
+}
+}
+
+static direntry_t *create_short_filename(BDRVVVFATState *s,
+ const char *filename)
+{
+int i, j = 0;
+direntry_t *entry = array_get_next(&(s->directory));
+const gchar *p, *last_dot = NULL;
+gunichar c;
+bool lossy_conversion = false;


What is the purpose of this variable? Do you plan to use it later or it 
is just for debugging?



+char tail[11];


This also looks like old debug code...


+
+if (!entry) {
+return NULL;
+}
+memset(entry->name, 0x20, sizeof(entry->name));
+
+/* copy filename and search last dot */
+for (p = filename; ; p = g_utf8_next_char(p)) {
+c = g_utf8_get_char(p);
+if (c == '\0') {
+break;
+} else if (c == '.') {
+if (j == 0) {
+/* '.' at start of filename */
+lossy_conversion = true;
+} else {
+if (last_dot) {
+lossy_conversion = true;
+}
+last_dot = p;
+}
+} else if (!last_dot) {
+/* first part of the name; copy it */
+uint8_t v = to_valid_short_char(c);
+if (j < 8 && v) {
+entry->name[j++] = v;
+} else {
+lossy_conversion = true;
+}
+}
+}
+
+/* copy extension (if any) */
+if (last_dot) {
+j = 0;
+for (p = g_utf8_next_char(last_dot); ; p = g_utf8_next_char(p)) {
+c = g_utf8_get_char(p);
+if (c == '\0') {
+break;
+} else {
+/* extension; copy it */
+uint8_t v = to_valid_short_char(c);
+if (j < 3 && v) {
+entry->name[8 + (j++)] = v;
+} else {
+lossy_conversion = true;
+}
+}
+}
+}
+(void)lossy_conversion;
+return entry;
+}
+
 /* fat functions */

 static inline uint8_t fat_chksum(const direntry_t* entry)
@@ -614,7 +689,7 @@ static inline void init_fat(BDRVVVFATState* s)
 static inline direntry_t* create_short_and_long_name(BDRVVVFATState* s,
 unsigned int directory_start, const char* filename, int is_dot)
 {
-int i,j,long_index=s->directory.next;
+int long_index = s->directory.next;
 direntry_t* entry = NULL;
 direntry_t* entry_long = NULL;

@@ -626,33 +701,7 @@ static inline direntry_t* 
create_short_and_long_name(BDRVVVFATState* s,
 }

 entry_long=create_long_filename(s,filename);
-
-i = strlen(filename);
-for(j = i - 1; j>0  && filename[j]!='.';j--);
-if (j > 0)
-i = (j > 8 ? 8 : j);
-else if (i > 8)
-i = 8;
-
-entry=array_get_next(&(s->directory));
-memset(entry->name, 0x20, sizeof(entry->name));
-memcpy(entry->name, filename, i);
-
-if (j > 0) {
-for (i = 0; i < 3 && filename[j + 1 + i]; i++) {
-entry->name[8 + i] = filename[j + 1 + i];
-}
-}
-
-/* upcase & remove unwanted characters */
-for(i=10;i>=0;i--) {
-if(i==10 || i==7) for(;i>0 && entry->name[i]==' ';i--);
-if(entry->name[i]<=' ' || entry->name[i]>0x7f
-|| strchr(".*?<>|\":/\\[];,+='",entry->name[i]))
-entry->name[i]='_';
-else if(entry->name[i]>='a' && entry->name[i]<='z')
-entry->name[i]+='A'-'a';
-}
+entry = create_short_filename(s, filename);

 /* mangle duplicates */
 while(1) {





Re: [Qemu-devel] [PATCH v2 05/13] vvfat: introduce offset_to_bootsector, offset_to_fat and offset_to_root_dir

2017-05-23 Thread Philippe Mathieu-Daudé

Hi Hervé,

On 05/22/2017 06:11 PM, Hervé Poussineau wrote:

- offset_to_bootsector is the number of sectors up to FAT bootsector
- offset_to_fat is the number of sectors up to first File Allocation Table
- offset_to_root_dir is the number of sectors up to root directory sector


Eventually your commit description can end here, adding the 3 following 
lines below the "---" separator.



Replace first_sectors_number - 1 by offset_to_bootsector.
Replace first_sectors_number by offset_to_fat.
Replace faked_sectors by offset_to_rootdir.


This is "offset_to_root_dir"



Signed-off-by: Hervé Poussineau 
---


  ^-- separator


 block/vvfat.c | 70 ---
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 6a36d4f7fa..e694d82df4 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -320,22 +320,24 @@ static void print_mapping(const struct mapping_t* 
mapping);
 typedef struct BDRVVVFATState {
 CoMutex lock;
 BlockDriverState* bs; /* pointer to parent */
-unsigned int first_sectors_number; /* 1 for a single partition, 0x40 for a 
disk with partition table */
 unsigned char first_sectors[0x40*0x200];

 int fat_type; /* 16 or 32 */
 array_t fat,directory,mapping;
 char volume_label[11];

+uint32_t offset_to_bootsector; /* 0 for floppy, 0x3f for disk */
+
 unsigned int cluster_size;
 unsigned int sectors_per_cluster;
 unsigned int sectors_per_fat;
 unsigned int sectors_of_root_directory;
 uint32_t last_cluster_of_root_directory;
-unsigned int faked_sectors; /* how many sectors are faked before file data 
*/
 uint32_t sector_count; /* total number of sectors of the partition */
 uint32_t cluster_count; /* total number of clusters of this partition */
 uint32_t max_fat_value;
+uint32_t offset_to_fat;
+uint32_t offset_to_root_dir;

 int current_fd;
 mapping_t* current_mapping;
@@ -394,15 +396,15 @@ static void init_mbr(BDRVVVFATState *s, int cyls, int 
heads, int secs)
 partition->attributes=0x80; /* bootable */

 /* LBA is used when partition is outside the CHS geometry */
-lba  = sector2CHS(>start_CHS, s->first_sectors_number - 1,
+lba  = sector2CHS(>start_CHS, s->offset_to_bootsector,
  cyls, heads, secs);
 lba |= sector2CHS(>end_CHS,   s->bs->total_sectors - 1,
  cyls, heads, secs);

 /*LBA partitions are identified only by start/length_sector_long not by 
CHS*/
-partition->start_sector_long  = cpu_to_le32(s->first_sectors_number - 1);
+partition->start_sector_long  = cpu_to_le32(s->offset_to_bootsector);
 partition->length_sector_long = cpu_to_le32(s->bs->total_sectors
-- s->first_sectors_number + 1);
+- s->offset_to_bootsector);

 /* FAT12/FAT16/FAT32 */
 /* DOS uses different types when partition is LBA,
@@ -823,12 +825,12 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)

 static inline uint32_t sector2cluster(BDRVVVFATState* s,off_t sector_num)
 {
-return (sector_num-s->faked_sectors)/s->sectors_per_cluster;
+return (sector_num - s->offset_to_root_dir) / s->sectors_per_cluster;
 }

 static inline off_t cluster2sector(BDRVVVFATState* s, uint32_t cluster_num)
 {
-return s->faked_sectors + s->sectors_per_cluster * cluster_num;
+return s->offset_to_root_dir + s->sectors_per_cluster * cluster_num;
 }

 static int init_directories(BDRVVVFATState* s,
@@ -855,6 +857,9 @@ static int init_directories(BDRVVVFATState* s,
 i = 1+s->sectors_per_cluster*0x200*8/s->fat_type;
 s->sectors_per_fat=(s->sector_count+i)/i; /* round up */

+s->offset_to_fat = s->offset_to_bootsector + 1;
+s->offset_to_root_dir = s->offset_to_fat + s->sectors_per_fat * 2;
+
 array_init(&(s->mapping),sizeof(mapping_t));
 array_init(&(s->directory),sizeof(direntry_t));

@@ -868,7 +873,6 @@ static int init_directories(BDRVVVFATState* s,
 /* Now build FAT, and write back information into directory */
 init_fat(s);

-s->faked_sectors=s->first_sectors_number+s->sectors_per_fat*2;
 s->cluster_count=sector2cluster(s, s->sector_count);

 mapping = array_get_next(&(s->mapping));
@@ -946,7 +950,8 @@ static int init_directories(BDRVVVFATState* s,

 s->current_mapping = NULL;

-
bootsector=(bootsector_t*)(s->first_sectors+(s->first_sectors_number-1)*0x200);
+bootsector = (bootsector_t *)(s->first_sectors
+  + s->offset_to_bootsector * 0x200);
 bootsector->jump[0]=0xeb;
 bootsector->jump[1]=0x3e;
 bootsector->jump[2]=0x90;
@@ -957,16 +962,18 @@ static int init_directories(BDRVVVFATState* s,
 bootsector->number_of_fats=0x2; /* number of FATs */
 bootsector->root_entries=cpu_to_le16(s->sectors_of_root_directory*0x10);
 

Re: [Qemu-devel] [PATCH] docker: Add flex and bison to centos6 image

2017-05-23 Thread Philippe Mathieu-Daudé

On 05/23/2017 09:52 PM, Fam Zheng wrote:

Currently there are warnings about flex and bison being missing when
building in the centos6 image:

make[1]: flex: Command not found
 BISON dtc-parser.tab.c
make[1]: bison: Command not found

Add them.


I had the same issue with debian based images :/



Reported-by: Thomas Huth 
Signed-off-by: Fam Zheng 


Reviewed-by: Philippe Mathieu-Daudé 


---
 tests/docker/dockerfiles/centos6.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/centos6.docker 
b/tests/docker/dockerfiles/centos6.docker
index 34e0d3b..17a4d24 100644
--- a/tests/docker/dockerfiles/centos6.docker
+++ b/tests/docker/dockerfiles/centos6.docker
@@ -1,7 +1,7 @@
 FROM centos:6
 RUN yum install -y epel-release
 ENV PACKAGES libfdt-devel ccache \
-tar git make gcc g++ \
+tar git make gcc g++ flex bison \
 zlib-devel glib2-devel SDL-devel pixman-devel \
 epel-release
 RUN yum install -y $PACKAGES





Re: [Qemu-devel] [PATCH V6 06/10] migration: add postcopy blocktime ctx into MigrationIncomingState

2017-05-23 Thread Peter Xu
On Tue, May 23, 2017 at 02:31:07PM +0300, Alexey Perevalov wrote:
> This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID,
> in case when this feature is provided by kernel.
> 
> PostcopyBlocktimeContext is incapsulated inside postcopy-ram.c,
> due to it's postcopy only feature.
> Also it defines PostcopyBlocktimeContext's instance live time.
> Information from PostcopyBlocktimeContext instance will be provided
> much after postcopy migration end, instance of PostcopyBlocktimeContext
> will live till QEMU exit, but part of it (vcpu_addr,
> page_fault_vcpu_time) used only during calculation, will be released
> when postcopy ended or failed.
> 
> To enable postcopy blocktime calculation on destination, need to request
> proper capabiltiy (Patch for documentation will be at the tail of the patch
> set).
> 
> As an example following command enable that capability, assume QEMU was
> started with
> -chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
> option to control it
> 
> [root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
> {\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
> \"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
> true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock
> 
> Or just with HMP
> (qemu) migrate_set_capability postcopy-blocktime on
> 
> Signed-off-by: Alexey Perevalov 
> ---
>  include/migration/migration.h |  8 +
>  migration/postcopy-ram.c  | 80 
> +++
>  2 files changed, 88 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 2951253..449cb07 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -57,6 +57,8 @@ enum mig_rp_message_type {
>  
>  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
>  
> +struct PostcopyBlocktimeContext;
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>  QEMUFile *from_src_file;
> @@ -97,6 +99,12 @@ struct MigrationIncomingState {
>  
>  /* See savevm.c */
>  LoadStateEntry_Head loadvm_handlers;
> +
> +/*
> + * PostcopyBlocktimeContext to keep information for postcopy
> + * live migration, to calculate vCPU block time
> + * */
> +struct PostcopyBlocktimeContext *blocktime_ctx;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 4f3f495..5435a40 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -59,6 +59,73 @@ struct PostcopyDiscardState {
>  #include 
>  #include 
>  
> +typedef struct PostcopyBlocktimeContext {
> +/* time when page fault initiated per vCPU */
> +int64_t *page_fault_vcpu_time;
> +/* page address per vCPU */
> +uint64_t *vcpu_addr;
> +int64_t total_blocktime;
> +/* blocktime per vCPU */
> +int64_t *vcpu_blocktime;
> +/* point in time when last page fault was initiated */
> +int64_t last_begin;
> +/* number of vCPU are suspended */
> +int smp_cpus_down;
> +
> +/*
> + * Handler for exit event, necessary for
> + * releasing whole blocktime_ctx
> + */
> +Notifier exit_notifier;
> +/*
> + * Handler for postcopy event, necessary for
> + * releasing unnecessary part of blocktime_ctx
> + */
> +Notifier postcopy_notifier;
> +} PostcopyBlocktimeContext;
> +
> +static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
> +{
> +g_free(ctx->page_fault_vcpu_time);
> +g_free(ctx->vcpu_addr);
> +g_free(ctx->vcpu_blocktime);
> +g_free(ctx);
> +}
> +
> +static void postcopy_migration_cb(Notifier *n, void *data)
> +{
> +PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
> +   postcopy_notifier);
> +MigrationState *s = data;
> +if (migration_has_finished(s) || migration_has_failed(s)) {
> +g_free(ctx->page_fault_vcpu_time);
> +/* g_free is NULL robust */
> +ctx->page_fault_vcpu_time = NULL;
> +g_free(ctx->vcpu_addr);
> +ctx->vcpu_addr = NULL;
> +}
> +}
> +
> +static void migration_exit_cb(Notifier *n, void *data)
> +{
> +PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
> +   exit_notifier);
> +destroy_blocktime_context(ctx);
> +}
> +
> +static struct PostcopyBlocktimeContext *blocktime_context_new(void)
> +{
> +PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
> +ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
> +ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
> +ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
> +
> +ctx->exit_notifier.notify = migration_exit_cb;
> +ctx->postcopy_notifier.notify = postcopy_migration_cb;
> +qemu_add_exit_notifier(>exit_notifier);
> +

Re: [Qemu-devel] [PATCH] net/rocker: Cleanup the useless return value check

2017-05-23 Thread Jason Wang



On 2017年05月24日 10:57, Mao Zhongyi wrote:

None of pci_dma_read()'s callers check the return value except
rocker. There is no need to check it because it always return
0. So the check work is useless. Remove it entirely.

Suggested-by: Markus Armbruster 
Signed-off-by: Mao Zhongyi 
---
  hw/net/rocker/rocker.c  | 9 +++--
  hw/net/rocker/rocker_desc.c | 4 +---
  2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 6e70fdd..4f0f6d7 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -244,11 +244,9 @@ static int tx_consume(Rocker *r, DescInfo *info)
  goto err_no_mem;
  }
  
-if (pci_dma_read(dev, frag_addr, iov[iovcnt].iov_base,

- iov[iovcnt].iov_len)) {
-err = -ROCKER_ENXIO;
-goto err_bad_io;
-}
+pci_dma_read(dev, frag_addr, iov[iovcnt].iov_base,
+ iov[iovcnt].iov_len);
+
  iovcnt++;
  }
  
@@ -261,7 +259,6 @@ static int tx_consume(Rocker *r, DescInfo *info)

  err = fp_port_eg(r->fp_port[port], iov, iovcnt);
  
  err_too_many_frags:

-err_bad_io:
  err_no_mem:
  err_bad_attr:
  for (i = 0; i < ROCKER_TX_FRAGS_MAX; i++) {
diff --git a/hw/net/rocker/rocker_desc.c b/hw/net/rocker/rocker_desc.c
index ac02797..6184c40 100644
--- a/hw/net/rocker/rocker_desc.c
+++ b/hw/net/rocker/rocker_desc.c
@@ -69,9 +69,7 @@ char *desc_get_buf(DescInfo *info, bool read_only)
  return NULL;
  }
  
-if (pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf, size)) {

-return NULL;
-}
+pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf, size);
  
  return info->buf;

  }


Applied, thanks.



Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication

2017-05-23 Thread Jason Wang



On 2017年05月23日 18:48, Wei Wang wrote:

On 05/23/2017 02:32 PM, Jason Wang wrote:



On 2017年05月23日 13:47, Wei Wang wrote:

On 05/23/2017 10:08 AM, Jason Wang wrote:



On 2017年05月22日 19:46, Wang, Wei W wrote:

On Monday, May 22, 2017 10:28 AM, Jason Wang wrote:

On 2017年05月19日 23:33, Stefan Hajnoczi wrote:

On Fri, May 19, 2017 at 11:10:33AM +0800, Jason Wang wrote:

On 2017年05月18日 11:03, Wei Wang wrote:

On 05/17/2017 02:22 PM, Jason Wang wrote:

On 2017年05月17日 14:16, Jason Wang wrote:

On 2017年05月16日 15:12, Wei Wang wrote:

Hi:

Care to post the driver codes too?

OK. It may take some time to clean up the driver code 
before post
it out. You can first have a check of the draft at the repo 
here:

https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei

Interesting, looks like there's one copy on tx side. We used to
have zerocopy support for tun for VM2VM traffic. Could you 
please

try to compare it with your vhost-pci-net by:

We can analyze from the whole data path - from VM1's network 
stack

to send packets -> VM2's network stack to receive packets. The
number of copies are actually the same for both.
That's why I'm asking you to compare the performance. The only 
reason

for vhost-pci is performance. You should prove it.

There is another reason for vhost-pci besides maximum performance:

vhost-pci makes it possible for end-users to run networking or 
storage
appliances in compute clouds.  Cloud providers do not allow 
end-users
to run custom vhost-user processes on the host so you need 
vhost-pci.


Stefan
Then it has non NFV use cases and the question goes back to the 
performance
comparing between vhost-pci and zerocopy vhost_net. If it does 
not perform

better, it was less interesting at least in this case.


Probably I can share what we got about vhost-pci and vhost-user:
https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost_pci_vs_vhost_user.pdf 


Right now, I don’t have the environment to add the vhost_net test.


Thanks, the number looks good. But I have some questions:

- Is the number measured through your vhost-pci kernel driver code?


Yes, the kernel driver code.


Interesting, in the above link, "l2fwd" was used in vhost-pci 
testing. I want to know more about the test configuration: If l2fwd 
is the one that dpdk had, want to know how can you make it work for 
kernel driver. (Maybe packet socket I think?) If not, want to know 
how do you configure it (e.g through bridge or act_mirred or others). 
And in OVS dpdk, is dpdk l2fwd + pmd used in the testing?




Oh, that l2fwd is a kernel module from OPNFV vsperf
(http://artifacts.opnfv.org/vswitchperf/docs/userguide/quickstart.html)
For both legacy and vhost-pci cases, they use the same l2fwd module.
No bridge is used, the module already works at L2 to forward packets
between two net devices.


Thanks for the pointer. Just to confirm, I think virtio-net kernel 
driver is used in OVS-dpdk test?


Another question is, can we manage to remove the copy in tx? If not, is 
it a limitation of virtio protocol?


Thanks



Best,
Wei









Re: [Qemu-devel] [virtio-dev] Re: [PATCH RFC] virtio-net: enable configurable tx queue size

2017-05-23 Thread Jason Wang



On 2017年05月23日 18:36, Wei Wang wrote:

On 05/23/2017 02:24 PM, Jason Wang wrote:



On 2017年05月23日 13:15, Wei Wang wrote:

On 05/23/2017 10:04 AM, Jason Wang wrote:



On 2017年05月22日 19:52, Wei Wang wrote:

On 05/20/2017 04:42 AM, Michael S. Tsirkin wrote:

On Fri, May 19, 2017 at 10:32:19AM +0800, Wei Wang wrote:

This patch enables the virtio-net tx queue size to be configurable
between 256 (the default queue size) and 1024 by the user. The 
queue

size specified by the user should be power of 2.

Setting the tx queue size to be 1024 requires the guest driver to
support the VIRTIO_NET_F_MAX_CHAIN_SIZE feature.
This should be a generic ring feature, not one specific to virtio 
net.

OK. How about making two more changes below:

1) make the default tx queue size = 1024 (instead of 256).


As has been pointed out, you need compat the default value too in 
this case.


The driver gets the size info from the device, then would it cause any
compatibility issue if we change the default ring size to 1024 in the
vhost case? In other words, is there any software (i.e. any 
virtio-net driver)

functions based on the assumption of 256 queue size?


I don't know. But is it safe e.g we migrate from 1024 to an older 
qemu with 256 as its queue size?


Yes, I think it is safe, because the default queue size is used when 
the device is being

set up (e.g. feature negotiation).
During migration (the device has already been running), the 
destination machine will
load the device state based on the the queue size that is being used 
(i.e. vring.num).

The default value is not used any more after the setup phase.


I haven't checked all cases, but there's two obvious things:

- After migration and after a reset, it will go back to 256 on dst.
- ABI is changed, e.g -M pc-q35-2.10 returns 1024 on 2.11







For live migration, the queue size that is being used will also be 
transferred

to the destination.




We can reduce the size (to 256) if the MAX_CHAIN_SIZE feature
is not supported by the guest.
In this way, people who apply the QEMU patch can directly use the
largest queue size(1024) without adding the booting command line.

2) The vhost backend does not use writev, so I think when the vhost
backed is used, using 1024 queue size should not depend on the
MAX_CHAIN_SIZE feature.


But do we need to consider even larger queue size now?


Need Michael's feedback on this. Meanwhile, I'll get the next 
version of

code ready and check if larger queue size would cause any corner case.


The problem is, do we really need a new config filed for this? Or 
just introduce a flag which means "I support up to 1024 sgs" is 
sufficient?




For now, it also works without the new config field, max_chain_size,
But I would prefer to keep the new config field, because:

Without that, the driver will work on  an assumed value, 1023.


This is the fact, and it's too late to change legacy driver.


If the future, QEMU needs to change it to 1022, then how can the
new QEMU tell the old driver, which supports the MAX_CHAIN_SIZE
feature but works with the old hardcode value 1023?


Can config filed help in this case? The problem is similar to 
ANY_HEADER_SG, the only thing we can is to clarify the limitation for 
new drivers.


Thanks



So, I think using that config value would be good for future updates.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org






Re: [Qemu-devel] new to qemu with a simple question

2017-05-23 Thread Fam Zheng
On Wed, 05/24 10:56, Wang Dong wrote:
> Some C source code is generate from json file.
> 
> I wonder why this? What is the benefits of this?

This allows you to concentrate on the high level semantics of the QMP API,
without worrying about C syntax and structural code here and there. In one
word, more conciseness and little code duplication.

If done in C, it would be much more lines of code and hardly readable.

Fam



[Qemu-devel] [PATCH] net/rocker: Cleanup the useless return value check

2017-05-23 Thread Mao Zhongyi
None of pci_dma_read()'s callers check the return value except
rocker. There is no need to check it because it always return
0. So the check work is useless. Remove it entirely.

Suggested-by: Markus Armbruster 
Signed-off-by: Mao Zhongyi 
---
 hw/net/rocker/rocker.c  | 9 +++--
 hw/net/rocker/rocker_desc.c | 4 +---
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 6e70fdd..4f0f6d7 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -244,11 +244,9 @@ static int tx_consume(Rocker *r, DescInfo *info)
 goto err_no_mem;
 }
 
-if (pci_dma_read(dev, frag_addr, iov[iovcnt].iov_base,
- iov[iovcnt].iov_len)) {
-err = -ROCKER_ENXIO;
-goto err_bad_io;
-}
+pci_dma_read(dev, frag_addr, iov[iovcnt].iov_base,
+ iov[iovcnt].iov_len);
+
 iovcnt++;
 }
 
@@ -261,7 +259,6 @@ static int tx_consume(Rocker *r, DescInfo *info)
 err = fp_port_eg(r->fp_port[port], iov, iovcnt);
 
 err_too_many_frags:
-err_bad_io:
 err_no_mem:
 err_bad_attr:
 for (i = 0; i < ROCKER_TX_FRAGS_MAX; i++) {
diff --git a/hw/net/rocker/rocker_desc.c b/hw/net/rocker/rocker_desc.c
index ac02797..6184c40 100644
--- a/hw/net/rocker/rocker_desc.c
+++ b/hw/net/rocker/rocker_desc.c
@@ -69,9 +69,7 @@ char *desc_get_buf(DescInfo *info, bool read_only)
 return NULL;
 }
 
-if (pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf, size)) {
-return NULL;
-}
+pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf, size);
 
 return info->buf;
 }
-- 
2.9.3






[Qemu-devel] [PATCH v3 16/16] tests: Add test case for BLK_PERM_AIO_CONTEXT_CHANGE

2017-05-23 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/Makefile.include |  2 ++
 tests/test-blk-perm.c  | 59 ++
 2 files changed, 61 insertions(+)
 create mode 100644 tests/test-blk-perm.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 7589383..5811296 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -56,6 +56,7 @@ gcov-files-test-thread-pool-y = thread-pool.c
 gcov-files-test-hbitmap-y = util/hbitmap.c
 check-unit-y += tests/test-hbitmap$(EXESUF)
 gcov-files-test-hbitmap-y = blockjob.c
+check-unit-y += tests/test-blk-perm$(EXESUF)
 check-unit-y += tests/test-blockjob$(EXESUF)
 check-unit-y += tests/test-blockjob-txn$(EXESUF)
 check-unit-y += tests/test-x86-cpuid$(EXESUF)
@@ -548,6 +549,7 @@ tests/test-coroutine$(EXESUF): tests/test-coroutine.o 
$(test-block-obj-y)
 tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y)
 tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o 
$(test-block-obj-y)
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
+tests/test-blk-perm$(EXESUF): tests/test-blk-perm.o $(test-block-obj-y)
 tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) 
$(test-util-obj-y)
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o 
$(test-block-obj-y) $(test-util-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
diff --git a/tests/test-blk-perm.c b/tests/test-blk-perm.c
new file mode 100644
index 000..1c7b5d2
--- /dev/null
+++ b/tests/test-blk-perm.c
@@ -0,0 +1,59 @@
+/*
+ * Block permission tests
+ *
+ * Copyright Red Hat, Inc. 2017
+ *
+ * Authors:
+ *  Fam Zheng 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+
+static void test_aio_context_success(void)
+{
+BlockBackend *blk1 = blk_new(BLK_PERM_AIO_CONTEXT_CHANGE, BLK_PERM_ALL);
+BlockBackend *blk2 = blk_new(BLK_PERM_AIO_CONTEXT_CHANGE, BLK_PERM_ALL);
+BlockDriverState *bs = bdrv_open("null-co://", NULL, NULL, 0, 
_abort);
+
+blk_insert_bs(blk1, bs, _abort);
+blk_insert_bs(blk2, bs, _abort);
+
+blk_unref(blk1);
+blk_unref(blk2);
+bdrv_unref(bs);
+}
+
+static void test_aio_context_failure(void)
+{
+Error *local_err = NULL;
+BlockBackend *blk1 = blk_new(BLK_PERM_AIO_CONTEXT_CHANGE,
+ BLK_PERM_ALL & ~BLK_PERM_AIO_CONTEXT_CHANGE);
+BlockBackend *blk2 = blk_new(BLK_PERM_AIO_CONTEXT_CHANGE, BLK_PERM_ALL);
+BlockDriverState *bs = bdrv_open("null-co://", NULL, NULL, 0, 
_abort);
+
+blk_insert_bs(blk1, bs, _abort);
+blk_insert_bs(blk2, bs, _err);
+
+error_free_or_abort(_err);
+
+blk_unref(blk1);
+blk_unref(blk2);
+bdrv_unref(bs);
+}
+
+int main(int argc, char **argv)
+{
+bdrv_init();
+qemu_init_main_loop(_abort);
+g_test_init(, , NULL);
+g_test_add_func("/block/perm/aio-context/success",
+test_aio_context_success);
+g_test_add_func("/block/perm/aio-context/failure",
+test_aio_context_failure);
+return g_test_run();
+}
-- 
2.9.4




[Qemu-devel] [PATCH v3 14/16] nbd: Allow BLK_PERM_AIO_CONTEXT_CHANGE on BB

2017-05-23 Thread Fam Zheng
This is safe because of the aio context notifier we'll register on this
node. So allow it.

Signed-off-by: Fam Zheng 
---
 nbd/server.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 924a1fe..a8f58fb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -900,8 +900,10 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
 perm |= BLK_PERM_WRITE;
 }
-blk = blk_new(perm, BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
-BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
+blk = blk_new(perm,
+  BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+  BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD |
+  BLK_PERM_AIO_CONTEXT_CHANGE);
 ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
 goto fail;
-- 
2.9.4




[Qemu-devel] new to qemu with a simple question

2017-05-23 Thread Wang Dong


Hi guys,

I am new to qemu. But I need do some job in it right now.

When I try read qmp code. I found a interesting part against it.

Some C source code is generate from json file.

I wonder why this? What is the benefits of this?

Thanks in advance.


--
Best regards. Wang Dong




[Qemu-devel] [PATCH v3 15/16] block: Add perm assertion on blk_set_aio_context

2017-05-23 Thread Fam Zheng
Now that all BB users comply with the BLK_PERM_AIO_CONTEXT_CHANGE
rule, we can assert it.

Signed-off-by: Fam Zheng 
---
 block/block-backend.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index dfe577d..51c62ea 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1695,8 +1695,12 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB 
*acb)
 
 void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
 {
+uint64_t perm, shared_perm;
 BlockDriverState *bs = blk_bs(blk);
 
+blk_get_perm(blk, , _perm);
+assert(perm & BLK_PERM_AIO_CONTEXT_CHANGE);
+
 blk->aio_context = new_context;
 if (bs) {
 if (blk->public.throttle_state) {
-- 
2.9.4




[Qemu-devel] [PATCH v3 10/16] mirror: Do initial aio context move of target via BB interface

2017-05-23 Thread Fam Zheng
While blockdev-backup tried to verify before moving target's aio context, the
same is missing for blockdev-mirror. Now that we have the right interface, fix
this as well.

As a bonus, the aio context move is now conditional, which avoids some
unnecessary operations in bdrv_set_aio_context.

Signed-off-by: Fam Zheng 
---
 block/mirror.c | 54 --
 blockdev.c |  4 
 2 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index a3337ee..4eccb8d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1118,13 +1118,44 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
  bool auto_complete, const char *filter_node_name,
  Error **errp)
 {
+AioContext *aio_context, *target_context;
 MirrorBlockJob *s;
 BlockDriverState *mirror_top_bs;
+BlockBackend *target_blk;
 bool target_graph_mod;
 bool target_is_backing;
 Error *local_err = NULL;
 int ret;
 
+/* No resize for the target either; while the mirror is still running, a
+ * consistent read isn't necessarily possible. We could possibly allow
+ * writes and graph modifications, though it would likely defeat the
+ * purpose of a mirror, so leave them blocked for now.
+ *
+ * In the case of active commit, things look a bit different, though,
+ * because the target is an already populated backing file in active use.
+ * We can allow anything except resize there.*/
+target_is_backing = bdrv_chain_contains(bs, target);
+target_graph_mod = (backing_mode != MIRROR_LEAVE_BACKING_CHAIN);
+target_blk = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE |
+ BLK_PERM_AIO_CONTEXT_CHANGE |
+ (target_graph_mod ? BLK_PERM_GRAPH_MOD : 0),
+ BLK_PERM_WRITE_UNCHANGED |
+ (target_is_backing ? BLK_PERM_CONSISTENT_READ |
+  BLK_PERM_WRITE |
+  BLK_PERM_GRAPH_MOD : 0));
+ret = blk_insert_bs(target_blk, target, errp);
+if (ret < 0) {
+blk_unref(target_blk);
+return;
+}
+aio_context = bdrv_get_aio_context(bs);
+target_context = bdrv_get_aio_context(target);
+if (target_context != aio_context) {
+aio_context_acquire(target_context);
+blk_set_aio_context(target_blk, aio_context);
+aio_context_release(target_context);
+}
 if (granularity == 0) {
 granularity = bdrv_get_default_bitmap_granularity(target);
 }
@@ -1179,28 +1210,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 s->source = bs;
 s->mirror_top_bs = mirror_top_bs;
 
-/* No resize for the target either; while the mirror is still running, a
- * consistent read isn't necessarily possible. We could possibly allow
- * writes and graph modifications, though it would likely defeat the
- * purpose of a mirror, so leave them blocked for now.
- *
- * In the case of active commit, things look a bit different, though,
- * because the target is an already populated backing file in active use.
- * We can allow anything except resize there.*/
-target_is_backing = bdrv_chain_contains(bs, target);
-target_graph_mod = (backing_mode != MIRROR_LEAVE_BACKING_CHAIN);
-s->target = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE |
-BLK_PERM_AIO_CONTEXT_CHANGE |
-(target_graph_mod ? BLK_PERM_GRAPH_MOD : 0),
-BLK_PERM_WRITE_UNCHANGED |
-(target_is_backing ? BLK_PERM_CONSISTENT_READ |
- BLK_PERM_WRITE |
- BLK_PERM_GRAPH_MOD : 0));
-ret = blk_insert_bs(s->target, target, errp);
-if (ret < 0) {
-goto fail;
-}
-
+s->target = target_blk;
 s->replaces = g_strdup(replaces);
 s->on_source_error = on_source_error;
 s->on_target_error = on_target_error;
diff --git a/blockdev.c b/blockdev.c
index e8f72a1..52d81c4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3545,8 +3545,6 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
 goto out;
 }
 
-bdrv_set_aio_context(target_bs, aio_context);
-
 blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
arg->has_replaces, arg->replaces, arg->sync,
backing_mode, arg->has_speed, arg->speed,
@@ -3597,8 +3595,6 @@ void qmp_blockdev_mirror(bool has_job_id, const char 
*job_id,
 aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(aio_context);
 
-bdrv_set_aio_context(target_bs, aio_context);
-
 blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
has_replaces, 

[Qemu-devel] [PATCH v3 11/16] virtio-scsi: Request BLK_PERM_AIO_CONTEXT_CHANGE for dataplane

2017-05-23 Thread Fam Zheng
blk_set_aio_context is audited by perm API, so follow the protocol and
request for permission first.

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 46a3e3f..074e235 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -794,6 +794,10 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 return;
 }
 virtio_scsi_acquire(s);
+if (!blk_request_perm(sd->conf.blk, BLK_PERM_AIO_CONTEXT_CHANGE, 
errp)) {
+virtio_scsi_release(s);
+return;
+}
 blk_set_aio_context(sd->conf.blk, s->ctx);
 virtio_scsi_release(s);
 
-- 
2.9.4




[Qemu-devel] [PATCH v3 03/16] blockjob: Add BLK_PERM_AIO_CONTEXT_CHANGE shared perm on bs

2017-05-23 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 blockjob.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 6e48932..b20fb86 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -214,6 +214,9 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 }
 }
 
+/* The notifier we'll register on @blk takes care of following context
+ * change, so permit it. */
+shared_perm |= BLK_PERM_AIO_CONTEXT_CHANGE;
 blk = blk_new(perm, shared_perm);
 ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
-- 
2.9.4




[Qemu-devel] [PATCH v3 12/16] virtio-blk: Request BLK_PERM_AIO_CONTEXT_CHANGE for dataplane

2017-05-23 Thread Fam Zheng
blk_set_aio_context is audited by perm API, so follow the protocol and
request for permission first.

Previously the return code in error path was hardcoded as -EPERM, but
returning 'r' is more meaningful here both for the new error and
existing ones.

Signed-off-by: Fam Zheng 
---
 hw/block/dataplane/virtio-blk.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5556f0e..8f2ff2df 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -168,6 +168,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 unsigned i;
 unsigned nvqs = s->conf->num_queues;
 int r;
+Error *local_err = NULL;
 
 if (vblk->dataplane_started || s->starting) {
 return 0;
@@ -175,12 +176,18 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
 s->starting = true;
 
+r = blk_request_perm(s->conf->conf.blk, BLK_PERM_AIO_CONTEXT_CHANGE,
+ _err);
+if (r) {
+error_report_err(local_err);
+goto fail;
+}
 /* Set up guest notifier (irq) */
 r = k->set_guest_notifiers(qbus->parent, nvqs, true);
 if (r != 0) {
 fprintf(stderr, "virtio-blk failed to set guest notifier (%d), "
 "ensure -enable-kvm is set\n", r);
-goto fail_guest_notifiers;
+goto fail;
 }
 
 /* Set up virtqueue notify */
@@ -191,7 +198,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 while (i--) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 }
-goto fail_guest_notifiers;
+goto fail;
 }
 }
 
@@ -219,11 +226,11 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(s->ctx);
 return 0;
 
-  fail_guest_notifiers:
+fail:
 vblk->dataplane_disabled = true;
 s->starting = false;
 vblk->dataplane_started = true;
-return -ENOSYS;
+return r;
 }
 
 /* Context: QEMU global mutex held */
-- 
2.9.4




[Qemu-devel] [PATCH v3 09/16] commit: Allow aio context change on s->base

2017-05-23 Thread Fam Zheng
The block job has the aio context change notifier, we should allow it
here as well.

Signed-off-by: Fam Zheng 
---
 block/commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/commit.c b/block/commit.c
index e2ee0ff..bbf7768 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -391,7 +391,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
   | BLK_PERM_RESIZE,
   BLK_PERM_CONSISTENT_READ
   | BLK_PERM_GRAPH_MOD
-  | BLK_PERM_WRITE_UNCHANGED);
+  | BLK_PERM_WRITE_UNCHANGED
+  | BLK_PERM_AIO_CONTEXT_CHANGE);
 ret = blk_insert_bs(s->base, base, errp);
 if (ret < 0) {
 goto fail;
-- 
2.9.4




[Qemu-devel] [PATCH v3 02/16] block-backend: Add blk_request_perm

2017-05-23 Thread Fam Zheng
This function tries to request, if not granted yet, for the given
permissions.

Signed-off-by: Fam Zheng 
---
 block/block-backend.c  | 12 
 include/sysemu/block-backend.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index f3a6008..5492f64 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -654,6 +654,18 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, 
uint64_t *shared_perm)
 *shared_perm = blk->shared_perm;
 }
 
+int blk_request_perm(BlockBackend *blk, uint64_t perm, Error **errp)
+{
+uint64_t blk_perm, shared_perm;
+
+blk_get_perm(blk, _perm, _perm);
+if ((blk_perm & perm) == perm) {
+return 0;
+}
+blk_perm |= perm;
+return blk_set_perm(blk, blk_perm, shared_perm, errp);
+}
+
 static int blk_do_attach_dev(BlockBackend *blk, void *dev)
 {
 if (blk->dev) {
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 840ad61..fc23a9e 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -116,6 +116,7 @@ bool bdrv_is_root_node(BlockDriverState *bs);
 int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
  Error **errp);
 void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm);
+int blk_request_perm(BlockBackend *blk, uint64_t perm, Error **errp);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_iostatus_enable(BlockBackend *blk);
-- 
2.9.4




[Qemu-devel] [PATCH v3 08/16] mirror: Request aio context change permission on target

2017-05-23 Thread Fam Zheng
What's done in the source's context change notifier is moving the
target's context to follow the new one, so we request this permission
here.

Signed-off-by: Fam Zheng 
---
 block/mirror.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/mirror.c b/block/mirror.c
index 03e82eb..a3337ee 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1190,6 +1190,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 target_is_backing = bdrv_chain_contains(bs, target);
 target_graph_mod = (backing_mode != MIRROR_LEAVE_BACKING_CHAIN);
 s->target = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE |
+BLK_PERM_AIO_CONTEXT_CHANGE |
 (target_graph_mod ? BLK_PERM_GRAPH_MOD : 0),
 BLK_PERM_WRITE_UNCHANGED |
 (target_is_backing ? BLK_PERM_CONSISTENT_READ |
-- 
2.9.4




[Qemu-devel] [PATCH v3 07/16] backup: Do initial aio context move of target via BB interface

2017-05-23 Thread Fam Zheng
The old aio context check is hacky because when it was added we didn't
have the permission system to enforce a general rule. It only checks if
the target BDS has a BB, which is in fact insufficient because there may
be other BBs in the graph that cannot handle the aio context change.

Do this through blk_set_aio_context interface, in backup_job_create()
which is a central place for both drive-backup and blockdev-backup, to
take care of the compatibility check.

Also the bdrv_set_aio_context in do_drive_backup could have been
conditional, to save a recursion when possible.

Signed-off-by: Fam Zheng 
---
 block/backup.c |  9 +
 blockdev.c | 14 --
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 546c5c5..9136f91 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -564,6 +564,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 BlockDriverInfo bdi;
 BackupBlockJob *job = NULL;
 int ret;
+AioContext *aio_context, *target_context;
 
 assert(bs);
 assert(target);
@@ -644,6 +645,14 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
+aio_context = bdrv_get_aio_context(bs);
+target_context = bdrv_get_aio_context(target);
+if (target_context != aio_context) {
+aio_context_acquire(target_context);
+blk_set_aio_context(job->target, aio_context);
+aio_context_release(target_context);
+}
+
 job->on_source_error = on_source_error;
 job->on_target_error = on_target_error;
 job->sync_mode = sync_mode;
diff --git a/blockdev.c b/blockdev.c
index c63f4e8..e8f72a1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3240,8 +3240,6 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
BlockJobTxn *txn,
 goto out;
 }
 
-bdrv_set_aio_context(target_bs, aio_context);
-
 if (set_backing_hd) {
 bdrv_set_backing_hd(target_bs, source, _err);
 if (local_err) {
@@ -3326,18 +3324,6 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
BlockJobTxn *txn,
 if (!target_bs) {
 goto out;
 }
-
-if (bdrv_get_aio_context(target_bs) != aio_context) {
-if (!bdrv_has_blk(target_bs)) {
-/* The target BDS is not attached, we can safely move it to another
- * AioContext. */
-bdrv_set_aio_context(target_bs, aio_context);
-} else {
-error_setg(errp, "Target is attached to a different thread from "
- "source.");
-goto out;
-}
-}
 job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
 backup->sync, NULL, backup->compress,
 backup->on_source_error, backup->on_target_error,
-- 
2.9.4




[Qemu-devel] [PATCH v3 06/16] backup: Request BLK_PERM_AIO_CONTEXT_CHANGE on target

2017-05-23 Thread Fam Zheng
What's done in the source's context change notifier is moving the
target's context to follow the new one, so we request this permission
here.

Signed-off-by: Fam Zheng 
---
 block/backup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/backup.c b/block/backup.c
index a4fb288..546c5c5 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -636,7 +636,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 }
 
 /* The target must match the source in size, so no resize here either */
-job->target = blk_new(BLK_PERM_WRITE,
+job->target = blk_new(BLK_PERM_WRITE | BLK_PERM_AIO_CONTEXT_CHANGE,
   BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
   BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
 ret = blk_insert_bs(job->target, target, errp);
-- 
2.9.4




[Qemu-devel] [PATCH v3 00/16] block: Protect AIO context change with perm API

2017-05-23 Thread Fam Zheng
v3: Move blk_set_aio_context to the front of mirror_start_job to avoid
accessing target without acquiring its aio context. [Stefan]
Use error_free_or_abort in test code. [Stefan]

v2: Address Stefan's comments:

- Clean up redundancy in bdrv_format_default_perms change.
- Add a test case to check both success/failure cases.
  A failure case is not possible at user interface level because of other
  checks we have, so write a unit test in tests/test-blk-perm.c.

Eject / change of scsi-cd on a virtio-scsi dataplane bus causes abort() because
the new BDS doesn't get proper bdrv_set_aio_context().

Store the AioContext in BB and do it in blk_insert_bs. That is done by
Vladimir's patch.

Other patches are to make sure such a bdrv_set_aio_context() doesn't interfere
with other BBs using other nodes from this graph.

Fam

Fam Zheng (15):
  block: Define BLK_PERM_AIO_CONTEXT_CHANGE
  block-backend: Add blk_request_perm
  blockjob: Add BLK_PERM_AIO_CONTEXT_CHANGE shared perm on bs
  blockjob: Allow aio context change on intermediate nodes
  block: Propagate BLK_PERM_AIO_CONTEXT_CHANGE down the graph
  backup: Request BLK_PERM_AIO_CONTEXT_CHANGE on target
  backup: Do initial aio context move of target via BB interface
  mirror: Request aio context change permission on target
  commit: Allow aio context change on s->base
  mirror: Do initial aio context move of target via BB interface
  virtio-scsi: Request BLK_PERM_AIO_CONTEXT_CHANGE for dataplane
  virtio-blk: Request BLK_PERM_AIO_CONTEXT_CHANGE for dataplane
  nbd: Allow BLK_PERM_AIO_CONTEXT_CHANGE on BB
  block: Add perm assertion on blk_set_aio_context
  tests: Add test case for BLK_PERM_AIO_CONTEXT_CHANGE

Vladimir Sementsov-Ogievskiy (1):
  blk: fix aio context loss on media change

 block.c | 12 ++---
 block/backup.c  | 11 +++-
 block/block-backend.c   | 22 +++
 block/commit.c  |  6 +++--
 block/mirror.c  | 56 +++---
 block/stream.c  |  3 ++-
 block/vvfat.c   |  2 +-
 blockdev.c  | 18 -
 blockjob.c  |  3 +++
 hw/block/dataplane/virtio-blk.c | 15 ---
 hw/scsi/virtio-scsi.c   |  4 +++
 include/block/block.h   |  7 -
 include/sysemu/block-backend.h  |  1 +
 nbd/server.c|  6 +++--
 tests/Makefile.include  |  2 ++
 tests/test-blk-perm.c   | 59 +
 16 files changed, 171 insertions(+), 56 deletions(-)
 create mode 100644 tests/test-blk-perm.c

-- 
2.9.4




[Qemu-devel] [PATCH v3 13/16] blk: fix aio context loss on media change

2017-05-23 Thread Fam Zheng
From: Vladimir Sementsov-Ogievskiy 

If we have separate iothread for cdrom, we lose connection to it on
qmp_blockdev_change_medium, as aio_context is on bds which is dropped
and switched with new one.

As an example result, after such media change we have crash on
virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' 
failed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fam Zheng 
---
 block/block-backend.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5492f64..dfe577d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -68,6 +68,7 @@ struct BlockBackend {
 NotifierList remove_bs_notifiers, insert_bs_notifiers;
 
 int quiesce_counter;
+AioContext *aio_context;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -618,6 +619,10 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, 
Error **errp)
 }
 bdrv_ref(bs);
 
+if (blk->aio_context != NULL) {
+bdrv_set_aio_context(bs, blk->aio_context);
+}
+
 notifier_list_notify(>insert_bs_notifiers, blk);
 if (blk->public.throttle_state) {
 throttle_timers_attach_aio_context(
@@ -1692,6 +1697,7 @@ void blk_set_aio_context(BlockBackend *blk, AioContext 
*new_context)
 {
 BlockDriverState *bs = blk_bs(blk);
 
+blk->aio_context = new_context;
 if (bs) {
 if (blk->public.throttle_state) {
 throttle_timers_detach_aio_context(>public.throttle_timers);
-- 
2.9.4




[Qemu-devel] [PATCH v3 04/16] blockjob: Allow aio context change on intermediate nodes

2017-05-23 Thread Fam Zheng
The intermediate nodes do work with aio context change, so allow that
operations.

Signed-off-by: Fam Zheng 
---
 block/commit.c | 3 ++-
 block/mirror.c | 3 ++-
 block/stream.c | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 76a0d98..e2ee0ff 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -365,7 +365,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
  * for its backing file). The other options would be a second filter
  * driver above s->base. */
 ret = block_job_add_bdrv(>common, "intermediate node", iter, 0,
- BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
+ BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
+ BLK_PERM_AIO_CONTEXT_CHANGE,
  errp);
 if (ret < 0) {
 goto fail;
diff --git a/block/mirror.c b/block/mirror.c
index e86f8f8..03e82eb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1232,7 +1232,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
  * also blocked for its backing file). The other options would be a
  * second filter driver above s->base (== target). */
 ret = block_job_add_bdrv(>common, "intermediate node", iter, 0,
- BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
+ BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE 
|
+ BLK_PERM_AIO_CONTEXT_CHANGE,
  errp);
 if (ret < 0) {
 goto fail;
diff --git a/block/stream.c b/block/stream.c
index 0113710..2fab1f4 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -265,7 +265,8 @@ void stream_start(const char *job_id, BlockDriverState *bs,
  * and resizes. */
 for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) 
{
 block_job_add_bdrv(>common, "intermediate node", iter, 0,
-   BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
+   BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED 
|
+   BLK_PERM_AIO_CONTEXT_CHANGE,
_abort);
 }
 
-- 
2.9.4




[Qemu-devel] [PATCH v3 05/16] block: Propagate BLK_PERM_AIO_CONTEXT_CHANGE down the graph

2017-05-23 Thread Fam Zheng
bdrv_set_aio_context can take care of children recursively, so it is
okay to pass down the perm.

Signed-off-by: Fam Zheng 
---
 block.c   | 10 ++
 block/vvfat.c |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index d98662f..1e5eae7 100644
--- a/block.c
+++ b/block.c
@@ -1772,7 +1772,8 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, 
uint64_t shared,
 #define DEFAULT_PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
  | BLK_PERM_WRITE \
  | BLK_PERM_WRITE_UNCHANGED \
- | BLK_PERM_RESIZE)
+ | BLK_PERM_RESIZE \
+ | BLK_PERM_AIO_CONTEXT_CHANGE)
 #define DEFAULT_PERM_UNCHANGED (BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH)
 
 void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
@@ -1815,9 +1816,10 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
 perm |= BLK_PERM_CONSISTENT_READ;
 shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 } else {
-/* We want consistent read from backing files if the parent needs it.
+/* We want consistent read and aio context change from backing files if
+ * the parent needs it.
  * No other operations are performed on backing files. */
-perm &= BLK_PERM_CONSISTENT_READ;
+perm &= BLK_PERM_CONSISTENT_READ | BLK_PERM_AIO_CONTEXT_CHANGE;
 
 /* If the parent can deal with changing data, we're okay with a
  * writable and resizable backing file. */
@@ -1829,7 +1831,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
 }
 
 shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_GRAPH_MOD |
-  BLK_PERM_WRITE_UNCHANGED;
+  BLK_PERM_WRITE_UNCHANGED | BLK_PERM_AIO_CONTEXT_CHANGE;
 }
 
 if (bs->open_flags & BDRV_O_INACTIVE) {
diff --git a/block/vvfat.c b/block/vvfat.c
index 426ca70..599a370 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3091,7 +3091,7 @@ static void vvfat_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 if (c == s->qcow) {
 /* This is a private node, nobody should try to attach to it */
 *nperm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
-*nshared = BLK_PERM_WRITE_UNCHANGED;
+*nshared = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_AIO_CONTEXT_CHANGE;
 } else {
 /* The backing file is there so 'commit' can use it. vvfat doesn't
  * access it in any way. */
-- 
2.9.4




[Qemu-devel] [PATCH v3 01/16] block: Define BLK_PERM_AIO_CONTEXT_CHANGE

2017-05-23 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 block.c   | 2 ++
 include/block/block.h | 7 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 50ba264..d98662f 100644
--- a/block.c
+++ b/block.c
@@ -1649,6 +1649,8 @@ char *bdrv_perm_names(uint64_t perm)
 { BLK_PERM_WRITE_UNCHANGED, "write unchanged" },
 { BLK_PERM_RESIZE,  "resize" },
 { BLK_PERM_GRAPH_MOD,   "change children" },
+{ BLK_PERM_AIO_CONTEXT_CHANGE,
+"aio context change" },
 { 0, NULL }
 };
 
diff --git a/include/block/block.h b/include/block/block.h
index 9b355e9..017d6c8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -225,7 +225,12 @@ enum {
  */
 BLK_PERM_GRAPH_MOD  = 0x10,
 
-BLK_PERM_ALL= 0x1f,
+/**
+ * This permission is required to change the AioContext of this node.
+ */
+BLK_PERM_AIO_CONTEXT_CHANGE = 0x20,
+
+BLK_PERM_ALL= 0x3f,
 };
 
 char *bdrv_perm_names(uint64_t perm);
-- 
2.9.4




Re: [Qemu-devel] [PATCH V6 04/10] migration: split ufd_version_check onto receive/request features part

2017-05-23 Thread Peter Xu
On Tue, May 23, 2017 at 02:31:05PM +0300, Alexey Perevalov wrote:
> This modification is necessary for userfault fd features which are
> required to be requested from userspace.
> UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
> be introduced in the next patch.
> 
> QEMU have to use separate userfault file descriptor, due to
> userfault context has internal state, and after first call of
> ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
> success), but kernel while handling ioctl UFFD_API expects 
> UFFD_STATE_WAIT_API.
> So only one ioctl with UFFD_API is possible per ufd.
> 
> Signed-off-by: Alexey Perevalov 

Hi, Alexey,

Mostly good to me, some nitpicks below.

> ---
>  migration/postcopy-ram.c | 100 
> ++-
>  1 file changed, 91 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 3ed78bf..4f3f495 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -59,32 +59,114 @@ struct PostcopyDiscardState {
>  #include 
>  #include 
>  
> -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> +
> +/**
> + * receive_ufd_features: check userfault fd features, to request only 
> supported
> + * features in the future.
> + *
> + * Returns: true on success
> + *
> + * __NR_userfaultfd - should be checked before

I don't see this line necessary. After all we will detect the error no
matter what...

> + *  @features: out parameter will contain uffdio_api.features provided by 
> kernel
> + *  in case of success
> + */
> +static bool receive_ufd_features(uint64_t *features)
>  {
> -struct uffdio_api api_struct;
> -uint64_t ioctl_mask;
> +struct uffdio_api api_struct = {0};
> +int ufd;
> +bool ret = true;
> +
> +/* if we are here __NR_userfaultfd should exists */
> +ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +if (ufd == -1) {
> +error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
> + strerror(errno));
> +return false;
> +}
>  
> +/* ask features */
>  api_struct.api = UFFD_API;
>  api_struct.features = 0;
>  if (ioctl(ufd, UFFDIO_API, _struct)) {
> -error_report("%s: UFFDIO_API failed: %s", __func__
> +error_report("%s: UFFDIO_API failed: %s", __func__,
>   strerror(errno));
> +ret = false;
> +goto release_ufd;
> +}
> +
> +*features = api_struct.features;
> +
> +release_ufd:
> +close(ufd);
> +return ret;
> +}
> +
> +/**
> + * request_ufd_features: this function should be called only once on a newly
> + * opened ufd, subsequent calls will lead to error.
> + *
> + * Returns: true on succes
> + *
> + * @ufd: fd obtained from userfaultfd syscall
> + * @features: bit mask see UFFD_API_FEATURES
> + */
> +static bool request_ufd_features(int ufd, uint64_t features)
> +{
> +struct uffdio_api api_struct = {0};
> +uint64_t ioctl_mask;
> +
> +api_struct.api = UFFD_API;
> +api_struct.features = features;
> +if (ioctl(ufd, UFFDIO_API, _struct)) {
> +error_report("%s failed: UFFDIO_API failed: %s", __func__,
> +strerror(errno));

Maybe we can indent this line to follow this file's rule?

error_report("%s failed: UFFDIO_API failed: %s", __func__,
 strerror(errno));

>  return false;
>  }
>  
> -ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
> - (__u64)1 << _UFFDIO_UNREGISTER;
> +ioctl_mask = 1 << _UFFDIO_REGISTER |
> + 1 << _UFFDIO_UNREGISTER;

Could I ask why we explicitly removed (__u64) here? Since I see the
old one better.

>  if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
>  error_report("Missing userfault features: %" PRIx64,
>   (uint64_t)(~api_struct.ioctls & ioctl_mask));
>  return false;
>  }
>  
> +return true;
> +}
> +
> +static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
> +{
> +uint64_t asked_features = 0;
> +static uint64_t supported_features;
> +
> +/*
> + * it's not possible to
> + * request UFFD_API twice per one fd
> + * userfault fd features is persistent
> + */
> +if (!supported_features) {

I would prefer not having this static variable. After all, this
function call is rare, and the receive_ufd_features() is not that slow
as well.

> +if (!receive_ufd_features(_features)) {
> +error_report("%s failed", __func__);
> +return false;
> +}
> +}
> +
> +/*
> + * request features, even if asked_features is 0, due to
> + * kernel expects UFFD_API before UFFDIO_REGISTER, per
> + * userfault file descriptor
> + */
> +if (!request_ufd_features(ufd, asked_features)) {
> +error_report("%s failed: features %" PRIu64, __func__,
> +

Re: [Qemu-devel] [PATCH risu] ppc64: Fix patterns for rotate doubleword instructions

2017-05-23 Thread joserz
On Tue, May 23, 2017 at 11:47:30AM +0530, Nikunj A Dadhania wrote:
> G 3  writes:
> 
> > On May 22, 2017, at 4:32 AM, qemu-devel-requ...@nongnu.org wrote:
> >
> > Hello I have also done some work risu. My patches add ppc32 support.  
> > Well my patches were made to work with Mac OS X but they are required  
> > to work with Linux. Do you think you could help port these patches to  
> > Linux?
> 
> Ziviani did the ppc64 work, lets see if he can spare some time.
> 
> The patches haven't come right on the mailing list, its tedious to pull
> them. Please resend them with git send-mail.
> 

Hey, sure I can help. I'll take a look on them.

> >
> > ppc.risu:
> > https://patchwork.kernel.org/patch/9697489/
> >
> > risu_ppc.c:
> > https://patchwork.kernel.org/patch/9697491/
> >
> > risu_reginfo_ppc.c:
> > https://patchwork.kernel.org/patch/9697493/
> >
> > risu_reginfo_ppc.h:
> > https://patchwork.kernel.org/patch/9697495/
> >
> > risugen_ppc.pm:
> > https://patchwork.kernel.org/patch/9697497/
> >
> > Add ppc support to configure:
> > https://patchwork.kernel.org/patch/9697499/
> >
> > Add verbose option:
> > https://patchwork.kernel.org/patch/9697501/
> >
> > Add end of test message:
> > https://patchwork.kernel.org/patch/9697503/
> >
> > Add more descriptive comment for mismatch or end of test condition:
> > https://patchwork.kernel.org/patch/9697505/
> 
> Regards
> Nikunj
> 




Re: [Qemu-devel] [Qemu-block] [PATCH v2 10/16] mirror: Do initial aio context move of target via BB interface

2017-05-23 Thread Fam Zheng
On Thu, 05/11 16:27, Stefan Hajnoczi wrote:
> On Wed, Apr 19, 2017 at 05:43:50PM +0800, Fam Zheng wrote:
> > diff --git a/blockdev.c b/blockdev.c
> > index dfd1385..53f9874 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -3556,8 +3556,6 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
> >  goto out;
> >  }
> >  
> > -bdrv_set_aio_context(target_bs, aio_context);
> > -
> >  blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, 
> > target_bs,
> > arg->has_replaces, arg->replaces, arg->sync,
> > backing_mode, arg->has_speed, arg->speed,
> > @@ -3608,8 +3606,6 @@ void qmp_blockdev_mirror(bool has_job_id, const char 
> > *job_id,
> >  aio_context = bdrv_get_aio_context(bs);
> >  aio_context_acquire(aio_context);
> >  
> > -bdrv_set_aio_context(target_bs, aio_context);
> > -
> >  blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
> > has_replaces, replaces, sync, backing_mode,
> > has_speed, speed,
> 
> This patch post-pones bdrv_set_aio_context() until later on.  We now
> call bdrv_get_info() without holding target_bs's AioContext in
> mirror_start_job().
> 
> Please acquire the AioContext around target_bs accesses until we move it
> to our AioContext.

Good catch! Will fix.

Fam




Re: [Qemu-devel] [Qemu-block] [PATCH v2 06/16] backup: Request BLK_PERM_AIO_CONTEXT_CHANGE on target

2017-05-23 Thread Fam Zheng
On Thu, 05/11 15:41, Stefan Hajnoczi wrote:
> On Wed, Apr 19, 2017 at 05:43:46PM +0800, Fam Zheng wrote:
> > What's done in the source's context change notifier is moving the
> > target's context to follow the new one, so we request this permission
> > here.
> 
> It's true that the backup block job must be able to set target's
> AioContext, but does this change also allow other users to set target's
> AioContext while the backup job is running?  If yes, then we need to
> handle that.

If through job->target, yes, but I don't think there is any user of job->target.
Otherwise, it's not allowed, because the second parameter of blk_new doesn't
have BLK_PERM_AIO_CONTEXT_CHANGE.

So it's okay.

Fam



Re: [Qemu-devel] [Qemu-block] [PATCH v2 02/16] block-backend: Add blk_request_perm

2017-05-23 Thread Fam Zheng
On Thu, 05/11 15:32, Stefan Hajnoczi wrote:
> On Wed, Apr 19, 2017 at 05:43:42PM +0800, Fam Zheng wrote:
> > This function tries to request, if not granted yet, for the given
> > permissions.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/block-backend.c  | 12 
> >  include/sysemu/block-backend.h |  1 +
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index 7405024..6bdd9ce 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -629,6 +629,18 @@ void blk_resume_after_migration(Error **errp)
> >  }
> >  }
> >  
> > +int blk_request_perm(BlockBackend *blk, uint64_t perm, Error **errp)
> > +{
> > +uint64_t blk_perm, shared_perm;
> > +
> > +blk_get_perm(blk, _perm, _perm);
> > +if ((blk_perm & perm) == perm) {
> > +return 0;
> > +}
> > +blk_perm |= perm;
> > +return blk_set_perm(blk, blk_perm, shared_perm, errp);
> > +}
> 
> I'm slightly confused about why this function is needed.  blk_set_perm()
> doesn't do quite the right thing?

Sorry for the late reply!

I think blk_set_perm() always calls bs->drv->bdrv_check_perm() even when the
required perm bits are already acquired.

Fam



Re: [Qemu-devel] [Xen-devel] [RFC PATCH V2 1/2] xen-pt: bind/unbind interrupt remapping format MSI

2017-05-23 Thread Lan Tianyu
On 2017年05月24日 01:06, Anthony PERARD wrote:
> On Tue, May 23, 2017 at 08:16:25PM +0800, Lan Tianyu wrote:
>> On 2017年05月19日 20:04, Jan Beulich wrote:
>> On 19.05.17 at 13:16,  wrote:
 On Thu, May 18, 2017 at 01:32:59AM -0400, Lan Tianyu wrote:
> --- a/include/hw/i386/apic-msidef.h
> +++ b/include/hw/i386/apic-msidef.h
> @@ -26,6 +26,7 @@
>  
>  #define MSI_ADDR_DEST_ID_SHIFT  12
>  #define MSI_ADDR_DEST_IDX_SHIFT 4
> -#define  MSI_ADDR_DEST_ID_MASK  0x000
> +#define  MSI_ADDR_DEST_ID_MASK  0x000fff00
 The value of MSI_ADDR_DEST_ID_MASK is changed here. I think the patch
 should be:
 +#define  MSI_ADDR_DEST_ID_MASK  0x0000
>>> Judging from other sources, rather the other way around - the
>>> mask needs to have further bits removed (should be 0x000ff000
>>> afaict). Xen sources confirm this, and while Linux has the value
>>> you suggest, that contradicts
>> Agree. Defining the mask as "0x000ff000" makes more sense.
>> Just check Qemu source code. Only apic_send_msi() and msi_dest_id() use
>> the mask
>> to get dest apic id. They mask MSI address field with 
>> MSI_ADDR_DEST_ID_MASK and
>> then right-shift 12bit. The low 12bit won't be used.
>>
>> Anthony, does this make sense?
> Yes, it does.
> The change to MSI_ADDR_DEST_ID_MASK should probably go in its own patch.
>
OK. Will update.


-- 
Best regards
Tianyu Lan



Re: [Qemu-devel] virtio crypto device implemenation

2017-05-23 Thread Michael S. Tsirkin
On Tue, May 23, 2017 at 04:08:25PM +, Zeng, Xin wrote:
> Hi, Michael, 
>As you know, Lei Gong from Huawei and I are co-working  on virtio crypto 
> device spec, he is focusing on symmetric algorithm part, I am focusing on 
> asymmetric part.  Now I am planning the implementation for asymmetric part, 
> would you please give me your point regarding the questions below?
>Current virtio crypto device implementation from Lei Gong:
>The virtio crypto device implementation has been upstreamed to QEMU and it 
> has a qemu backend implementation for symmetric algorithm part, the front end 
> Linux device driver for symmetric part has been upstreamed to Linux kernel as 
> well.
>My questions:
>From my side, I planned to add the asymmetric part support in upstreamed 
> front end device driver, and I don't want to add the asymmetric algorithm 
> support to current virtio crypto device's qemu backend, instead, I would like 
> to implement and upstream a DPDK vhost-user based backend for asymmetric 
> algorithm, and accordingly Lei Gong will help to upstream a vhost user agent 
> for virtio crypto device in QEMU,  is this approach acceptable? Is a qemu 
> backend a mandatory requirement for the virtio crypto device?  Is there a 
> general policy for this?
> 
> Thanks

Parity on QEMU side is naturally preferable.  I don't think we should require it
at all times, but if there's no implementation outside vhost-user,
and if the feature includes a non-trivial amount of code, how
will it be tested? I don't think we want to require all testers to use
dpdk. An implementation under tests using libvhost-user might
be a solution.

-- 
MST



[Qemu-devel] [PATCH] docker: Add flex and bison to centos6 image

2017-05-23 Thread Fam Zheng
Currently there are warnings about flex and bison being missing when
building in the centos6 image:

make[1]: flex: Command not found
 BISON dtc-parser.tab.c
make[1]: bison: Command not found

Add them.

Reported-by: Thomas Huth 
Signed-off-by: Fam Zheng 
---
 tests/docker/dockerfiles/centos6.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/centos6.docker 
b/tests/docker/dockerfiles/centos6.docker
index 34e0d3b..17a4d24 100644
--- a/tests/docker/dockerfiles/centos6.docker
+++ b/tests/docker/dockerfiles/centos6.docker
@@ -1,7 +1,7 @@
 FROM centos:6
 RUN yum install -y epel-release
 ENV PACKAGES libfdt-devel ccache \
-tar git make gcc g++ \
+tar git make gcc g++ flex bison \
 zlib-devel glib2-devel SDL-devel pixman-devel \
 epel-release
 RUN yum install -y $PACKAGES
-- 
2.9.4




[Qemu-devel] [Bug 1693050] [NEW] xhci HCIVERSION register read emulation incorrectly handled

2017-05-23 Thread Robert Mustacchi
Public bug reported:

We had an illumos user trying to run illumos in QEMU 2.9.0 with the
qemu-xhci device enabled. Note, that while this was discovered against
QEMU 2.9.0, from my current read of the HEAD, it is still present. The
illumos bug at https://www.illumos.org/issues/8173 has additional
information on how the user invoked qemu. While investigating the
problem we found that the illumos driver was reading a version of 0x
when reading the HCIVERSION register from qemu.

In the illumos driver we're performing a 16-bit read of the version
register at offset 0x2. From looking around at other OSes, while Linux
performs a 4 byte read at offset 0x0 and masks out the version, others
that care about the version are doing a two byte read, though not all
actually act on the version and some just discard the information.

The user who hit this was able to enable tracing (note the tracing file
is attached to the illumos bug linked previously) and we hit the
unimplemented register read with offset 0x2 at
http://git.qemu.org/?p=qemu.git;a=blob;f=hw/usb/hcd-
xhci.c;hb=HEAD#l2960. The xhci register specifies today that its allowed
for users to do 1-4 byte reads; however, that it implements only four
byte reads in its implementation
(http://git.qemu.org/?p=qemu.git;a=blob;f=hw/usb/hcd-
xhci.c;hb=HEAD#l). Hence why when we read the HCIVERSION register at
offset 0x2, it isn't handled in xhci_cap_read() which then returns
zeros.

>From digging into this, I think that we're coming into
memory_region_dispatch_read() and then memory_region_dispatch_read1().
However, I don't see anything in either the general memory region logic
or in the xhci_cap_read() function that would deal with adjusting the
offset that we're reading at and then masking it off again. While the
access_with_adjusted_size() attempts to deal with this, it never adjusts
the hardware address that's passed in to be a multiple of the
implementation defined offset that I can see. I suspect that the FIXME
at line 582
(http://git.qemu.org/?p=qemu.git;a=blob;f=memory.c;hb=HEAD#l582) and the
implementation in the xhci code is the crux of the problem.

For the time being we're working around this in the illumos driver, but
I wanted to point this out such that it might be helpful for other
systems which are assuming that they can do the two byte read like on
hardware.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1693050

Title:
  xhci HCIVERSION register read emulation incorrectly handled

Status in QEMU:
  New

Bug description:
  We had an illumos user trying to run illumos in QEMU 2.9.0 with the
  qemu-xhci device enabled. Note, that while this was discovered against
  QEMU 2.9.0, from my current read of the HEAD, it is still present. The
  illumos bug at https://www.illumos.org/issues/8173 has additional
  information on how the user invoked qemu. While investigating the
  problem we found that the illumos driver was reading a version of
  0x when reading the HCIVERSION register from qemu.

  In the illumos driver we're performing a 16-bit read of the version
  register at offset 0x2. From looking around at other OSes, while Linux
  performs a 4 byte read at offset 0x0 and masks out the version, others
  that care about the version are doing a two byte read, though not all
  actually act on the version and some just discard the information.

  The user who hit this was able to enable tracing (note the tracing
  file is attached to the illumos bug linked previously) and we hit the
  unimplemented register read with offset 0x2 at
  http://git.qemu.org/?p=qemu.git;a=blob;f=hw/usb/hcd-
  xhci.c;hb=HEAD#l2960. The xhci register specifies today that its
  allowed for users to do 1-4 byte reads; however, that it implements
  only four byte reads in its implementation
  (http://git.qemu.org/?p=qemu.git;a=blob;f=hw/usb/hcd-
  xhci.c;hb=HEAD#l). Hence why when we read the HCIVERSION register
  at offset 0x2, it isn't handled in xhci_cap_read() which then returns
  zeros.

  From digging into this, I think that we're coming into
  memory_region_dispatch_read() and then memory_region_dispatch_read1().
  However, I don't see anything in either the general memory region
  logic or in the xhci_cap_read() function that would deal with
  adjusting the offset that we're reading at and then masking it off
  again. While the access_with_adjusted_size() attempts to deal with
  this, it never adjusts the hardware address that's passed in to be a
  multiple of the implementation defined offset that I can see. I
  suspect that the FIXME at line 582
  (http://git.qemu.org/?p=qemu.git;a=blob;f=memory.c;hb=HEAD#l582) and
  the implementation in the xhci code is the crux of the problem.

  For the time being we're working around this in the illumos driver,
  but I wanted to point this out such that it might be 

Re: [Qemu-devel] [PATCH 02/31] target/s390x: Implement EXECUTE via new TranslationBlock

2017-05-23 Thread Richard Henderson

On 05/23/2017 10:28 AM, Aurelien Jarno wrote:

Something like this, as a delta patch.


Unfortunately it doesn't work. So far I have no real idea what could be
the root cause of the issue. I have just determined that up to the crash,
only a very limited set of instructions are being executed. They are the
4 bytes long versions of MVC, CLC, XC, TR.


Yeah, it appears XC is the culprit, though I have not yet determined exactly 
what's going wrong.


That said, perhaps I'll delay this for later and just add some extra helpers 
for now.  It does seem slightly wasteful to create a TB for at least these 
common cases.



r~



Re: [Qemu-devel] [PATCH v2 4/4] 9pfs: local: metadata file for the VirtFS root

2017-05-23 Thread Leo Gaspard
On 05/23/2017 07:13 PM, Eric Blake wrote:> We have to block
VIRTFS_META_DIR at any depth in the hierarchy, but
> can/should we change the blocking of VIRTFS_META_ROOT_FILE to only
> happen at the root directory, rather than at all directories?  On the
> other hand, if you can simultaneously map /path/to/a for one mount
> point, and /path/to/a/b for another, then the root file for B is visible
> at a lower depth than the root file for A, and blocking the metafile
> name everywhere means that the mount A can't influence the behavior on
> the mount for B.

If you take this kind of vulnerabilities into account, then you also
have to consider a mix of mapped-file and mapped-attr mounts, or even
worse a proxy with a mapped-file mount (which I think is currently
vulnerable to this threat if the "proxy" path points above the
"local,security_model=mapped-file" path, as the check is done in
"local_" functions, which are I guess not used for proxy-type virtfs)

I'm clearly not saying it's an invalid attack (there is no explicit
documentation stating it's insecure to "nest" virtual mounts"), just
saying it's a much larger attack surface than one internal to virtfs
mapped-file only. Then the question of what is reasonably possible to
forbid to the user arises, and that's not one I could answer.

Cheers & HTH,
Leo



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 0/4] 9pfs: local: fix metadata of mapped-file security mode

2017-05-23 Thread Leo Gaspard
On 05/23/2017 04:32 PM, Greg Kurz wrote:
> v2: - posted patch for CVE-2017-7493 separately
> - other changes available in each patch changelog
> 
> Leo,
> 
> If you find time to test this series, I'll gladly add your Tested-by: to
> it before merging.

Just tested with a base of 2.9.0 with patches [1] [2] (from my
distribution), [3] (required to apply cleanly) and this patchset.

Things appear to work as expected, and .virtfs_metadata{,_root} appear
to be neither readable nor writable by any user.

That said, one thing still bothering me with the fix in [3] is that it
still "leaks" the host's uid/gid to the guest when a corresponding file
in .virtfs_metadata is not present (while I'd have expected it to appear
as root:root in the guest), but that's a separate issue, and I guess
retro-compatibility prevents any fixing it.

Thanks for these patches!
Leo


[1]
https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/virtualization/qemu/force-uid0-on-9p.patch

[2]
https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/virtualization/qemu/no-etc-install.patch

[3] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03663.html



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 0/4] 9pfs: local: fix metadata of mapped-file security mode

2017-05-23 Thread Fam Zheng
On Tue, 05/23 17:04, Greg Kurz wrote:
> >   CC  hw/9pfs/9p-xattr-user.o
> > In file included from 
> > /var/tmp/patchew-tester-tmp-5niyvzwz/src/hw/9pfs/9p-local.c:18:0:
> > /var/tmp/patchew-tester-tmp-5niyvzwz/src/hw/9pfs/9p-local.c: In function 
> > ‘local_set_mapped_file_attrat’:
> > /var/tmp/patchew-tester-tmp-5niyvzwz/src/hw/9pfs/9p-util.h:19:5: error: 
> > ‘map_dirfd’ may be used uninitialized in this function 
> > [-Werror=maybe-uninitialized]
> >  close(fd);
> >  ^
> > /var/tmp/patchew-tester-tmp-5niyvzwz/src/hw/9pfs/9p-local.c:235:9: note: 
> > ‘map_dirfd’ was declared here
> >  int map_dirfd, map_fd;
> >  ^
> > cc1: all warnings being treated as errors
> 
> This is a false positive AFAICT... what compiler is this ?

It's gcc, as in the snipped package list above.

Fam



Re: [Qemu-devel] [PATCH v2 3/4] 9pfs: local: simplify file opening

2017-05-23 Thread Greg Kurz
On Tue, 23 May 2017 10:51:26 -0500
Eric Blake  wrote:

> On 05/23/2017 09:32 AM, Greg Kurz wrote:
> > The logic to open a path currently sits between local_open_nofollow() and
> > the relative_openat_nofollow() helper, which has no other user.
> > 
> > For the sake of clarity, this patch moves all the code of the helper into
> > its unique caller. While here we also:
> > - drop the code to skip leading "/" because the backend isn't supposed to
> >   pass anything but relative paths without consecutive slashes. The assert()
> >   is kept because we really don't want a buggy backend to pass   an 
> > absolute  
> 
> odd spacing
> 
> >   path to openat().
> > - use strchrnul() to get a simpler code. This is ok since virtfs if for  
> 
> s/if/is/
> 

Yeah, I spotted these two nits just after posting the series, as usual :)

I'll fix them before merging.

> >   linux+glibc hosts only.
> > - don't dup() the initial directory and add an assert() to ensure we don't
> >   return the global mountfd to the caller. BTW, this would mean that the
> >   caller passed an empty path, which isn't supposed to happen either.
> > 
> > Signed-off-by: Greg Kurz 
> > ---  
> 
> Reviewed-by: Eric Blake 
> 



pgpD7XR2CQwF1.pgp
Description: OpenPGP digital signature


[Qemu-devel] [Bug 1318091] Re: Perfctr MSRs not available to Guest OS on AMD Phenom II

2017-05-23 Thread Oliver Kaltenreiner
i don't understand this in detail, but since the last update of qemu i can't 
start my virtual win7 machine. i use gnome-boxes 3.24. qemu 2.8 works, 2.9 
leads to this:
Preformatted text(gnome-boxes:4301): Boxes-WARNING **: machine.vala:611: Failed 
to start win7: Unable to start domain: the CPU is incompatible with host CPU: 
Host CPU does not provide required features: monitor, rdtscp, svm

i ask, because i also use an phenom 2 x4 and if this is the bug i don't
need to opan a new one.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1318091

Title:
  Perfctr MSRs not available to Guest OS on AMD Phenom II

Status in QEMU:
  New

Bug description:
  The  AMD Phenom(tm) II X4 965 Processor (family 16, model 4, stepping
  3) has the 4 architecturally supported perf counters at MSRs.  The
  selectors are c001000-c001003, and the counters are c001004-c001007.
  I've verified that the MSRs are there and working by manually setting
  the MSRs with msr-tools to count cycles.

  The processor does not support the extended perfctr or the perfctr_nb.
  These are in cpuid leaf 8000_0001.  Qemu also sees that these cpuid
  flags are not set, when I try launching with  -cpu
  host,perfctr_core,check.  However, this flag is only for the extended
  perfctr MSRs, which also happen to map the original four counters at
  c0010200.

  When I run a Guest OS, that OS is unable to use the perf counter
  registers from c001000-7.  rdmsr and wrmsr complete, but the results
  are always 0.  By contrast, a wrmsr to one of the c0010200 registers
  causes a general protection fault (as it should, since those aren't
  supported).

  Kernel: 3.14.0-gentoo
  Qemu: 2.0.0 (gentoo) and also with 2.0.50 (commit 06b4f00d5)
  Qemu command: qemu-system-x86_64 -enable-kvm -cpu host -smp 8 -m 1024 
-nographic -monitor /dev/pts/4 mnt/hdd.img
  cat /proc/cpuinfo:
  processor : 3
  vendor_id : AuthenticAMD
  cpu family: 16
  model : 4
  model name: AMD Phenom(tm) II X4 965 Processor
  stepping  : 3
  cpu MHz   : 800.000
  cache size: 512 KB
  physical id   : 0
  siblings  : 4
  core id   : 3
  cpu cores : 4
  apicid: 3
  initial apicid: 3
  fpu   : yes
  fpu_exception : yes
  cpuid level   : 5
  wp: yes
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni 
monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock 
nrip_save
  bogomips  : 6803.79
  TLB size  : 1024 4K pages
  clflush size  : 64
  cache_alignment   : 64
  address sizes : 48 bits physical, 48 bits virtual
  power management: ts ttp tm stc 100mhzsteps hwpstate

  thanks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1318091/+subscriptions



Re: [Qemu-devel] [PATCH v2] tests/libqtest: Print error instead of aborting when env variable is missing

2017-05-23 Thread Michael Roth
Quoting Thomas Huth (2017-05-23 02:25:30)
> On 22.05.2017 17:58, no-re...@patchew.org wrote:
> > This series failed automatic build test. Please find the testing commands 
> > and
> > their output below. If you have docker installed, you can probably 
> > reproduce it
> > locally.
> [...]
> >LEX convert-dtsv0-lexer.lex.c
> >DEP /tmp/qemu-test/src/dtc/fdtdump.c
> > make[1]: flex: Command not found
> >BISON dtc-parser.tab.c
> > make[1]: bison: Command not found
> >DEP /tmp/qemu-test/src/dtc/srcpos.c
> >LEX dtc-lexer.lex.c
> > make[1]: flex: Command not found
> >DEP /tmp/qemu-test/src/dtc/treesource.c
> >DEP /tmp/qemu-test/src/dtc/livetree.c
> >DEP /tmp/qemu-test/src/dtc/fstree.c
> >DEP /tmp/qemu-test/src/dtc/flattree.c
> >DEP /tmp/qemu-test/src/dtc/dtc.c
> >DEP /tmp/qemu-test/src/dtc/data.c
> >DEP /tmp/qemu-test/src/dtc/checks.c
> >   CHK version_gen.h
> >BISON dtc-parser.tab.c
> > make[1]: bison: Command not found
> >LEX convert-dtsv0-lexer.lex.c
> >   UPD version_gen.h
> > make[1]: flex: Command not found
> >LEX dtc-lexer.lex.c
> > make[1]: flex: Command not found
> >DEP /tmp/qemu-test/src/dtc/util.c
> >LEX convert-dtsv0-lexer.lex.c
> >BISON dtc-parser.tab.c
> > make[1]: flex: Command not found
> > make[1]: bison: Command not found
> >LEX dtc-lexer.lex.c
> > make[1]: flex: Command not found
> 
> Looks like flex and bison are missing in the docker image? Could someone
> please add it?
> 
> [...]
> >   CC  replay/replay-internal.o
> > /tmp/qemu-test/src/replay/replay-internal.c: In function ‘replay_put_array’:
> > /tmp/qemu-test/src/replay/replay-internal.c:65: warning: ignoring return 
> > value of ‘fwrite’, declared with attribute warn_unused_result
> 
> I guess that should be fixed?
> 
> [...]
> >   CC  slirp/tcp_input.o
> >   CC  slirp/tcp_output.o
> >   CC  slirp/tcp_subr.o
> >   CC  slirp/tcp_timer.o
> > /tmp/qemu-test/src/slirp/tcp_input.c: In function ‘tcp_input’:
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_p’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_len’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_tos’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_id’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_off’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_ttl’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_sum’ may be 
> > used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_src.s_addr’ 
> > may be used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_dst.s_addr’ 
> > may be used uninitialized in this function
> > /tmp/qemu-test/src/slirp/tcp_input.c:220: warning: ‘save_ip6.ip_nh’ may be 
> > used uninitialized in this function
> 
> I've never seen these warnings in tcp_input.c before ... and they also
> look like false positives too me ... is that GCC in the docker image too
> sensitive?
> 
> [...]
> >   CC  aarch64-softmmu/hw/timer/exynos4210_rtc.o
> > /tmp/qemu-test/src/hw/i386/pc_piix.c: In function 
> > ‘igd_passthrough_isa_bridge_create’:
> > /tmp/qemu-test/src/hw/i386/pc_piix.c:1067: warning: ‘pch_rev_id’ may be 
> > used uninitialized in this function
> 
> dito
> 
> >   CC  aarch64-softmmu/hw/usb/tusb6010.o
> > /tmp/qemu-test/src/hw/i386/acpi-build.c: In function 
> > ‘build_append_pci_bus_devices’:
> > /tmp/qemu-test/src/hw/i386/acpi-build.c:525: warning: ‘notify_method’ may 
> > be used uninitialized in this function
> 
> dito
> 
> [...]
> > /tmp/qemu-test/src/tests/ide-test.c: In function ‘cdrom_pio_impl’:
> > /tmp/qemu-test/src/tests/ide-test.c:803: warning: ignoring return value of 
> > ‘fwrite’, declared with attribute warn_unused_result
> > /tmp/qemu-test/src/tests/ide-test.c: In function ‘test_cdrom_dma’:
> > /tmp/qemu-test/src/tests/ide-test.c:899: warning: ignoring return value of 
> > ‘fwrite’, declared with attribute warn_unused_result
> 
> Needs a patch, I guess?
> 
> >   GTESTER tests/test-vmstate
> > Failed to load simple/primitive:b_1
> > Failed to load simple/primitive:i64_2
> > Failed to load simple/primitive:i32_1
> > Failed to load simple/primitive:i32_1
> > Failed to load test/with_tmp:a
> > Failed to load test/tmp_child_parent:f
> > Failed to load test/tmp_child:parent
> > Failed to load test/with_tmp:tmp
> > Failed to load test/tmp_child:diff
> > Failed to load test/with_tmp:tmp
> > Failed to load test/tmp_child:diff
> > Failed to load test/with_tmp:tmp
> 
> 

Re: [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow()

2017-05-23 Thread Eric Blake
On 05/23/2017 06:22 AM, Alberto Garcia wrote:
> qcow2_encrypt_sectors() does not need an Error parameter, and we're
> not checking its value anyway, so we can safely remove it.

Misleading. You are NOT removing the Error parameter from
qcow2_encrypt_sectors(), but rather are explicitly ignoring any errors
by passing NULL.

I'd update the commit message to something like:

We are relying on the return value of qcow2_encrypt_sectors() to flag
problems, but have no way to report that error to the end user.  Since
we are just throwing away the error, we can pass NULL instead for
simpler code.

A more robust solution would figure out how to pass the original error
(rather than a new message related to our -EIO return) back to the
caller, but that is more invasive.

> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 

With a better commit message,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 02/31] target/s390x: Implement EXECUTE via new TranslationBlock

2017-05-23 Thread Thomas Huth
On 23.05.2017 17:54, Richard Henderson wrote:
> On 05/23/2017 03:48 AM, Aurelien Jarno wrote:
>> On 2017-05-22 20:02, Richard Henderson wrote:
>>> Previously, helper_ex would construct the insn and then implement
>>> the insn via direct calls other helpers.  This was sufficient to
>>> boot Linux but that is all.
>>>
>>> It is easy enough to go the whole nine yards by stashing state for
>>> EXECUTE within the cpu, and then relying on a new TB to be created
>>> that properly and completely interprets the insn.
>>>
>>> Signed-off-by: Richard Henderson 
>>> ---
>>>   target/s390x/cpu.h |   4 +-
>>>   target/s390x/helper.h  |   2 +-
>>>   target/s390x/insn-data.def |   4 +-
>>>   target/s390x/machine.c |  19 +++
>>>   target/s390x/mem_helper.c  | 136
>>> +++--
>>>   target/s390x/translate.c   | 124
>>> +
>>>   6 files changed, 133 insertions(+), 156 deletions(-)
>>
>> This looks good on the principle, and finally removes a big hack. That
>> said it prevent my test system to boot. I haven't investigated why yet.
> 
> Hmm.  I've not got a complete environment -- merely booting a kernel up
> to the point it fails to find a rootfs.  Which did find several problems
> with my first attempts at this, but wouldn't have exercised paging. 
> I'll try again to get a full install working...

Something nice for a quick test is also:

http://www.qemu-advent-calendar.org/2014/download/s390-moon-buggy.tar.xz

Not sure whether it will trigger your EXECUTE problem, though.

 Thomas



Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS

2017-05-23 Thread Laurent Vivier
On 23/05/2017 20:22, Daniel Henrique Barboza wrote:
> 
> 
> This is the Libvirt use case that fails with this patch set applied in
> QEMU master, Libvirt 3.4.0 compiled from
> source:
> 
> # ./virsh start dhb_ub1704_nfs 2
> #
> # ./virsh setvcpus dhb_ub1704_nfs 2 --live
> #
> # ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs
> --desturi qemu+ssh://target_ip/system --timeout 60 --verbose
> error: internal error: unable to execute QEMU command 'device_add': CPU
> hotplug not supported without OS

Good point.

I think I should refine my series to allow hotplug if the machine is not
started.

Thanks,
Laurent



Re: [Qemu-devel] [PATCH v2 13/13] vvfat: change OEM name to 'MSWIN4.1'

2017-05-23 Thread Hervé Poussineau

Hi Philippe,

Le 23/05/2017 à 06:23, Philippe Mathieu-Daudé a écrit :

Hi Hervé,

On 05/22/2017 06:12 PM, Hervé Poussineau wrote:

According to specification:
"'MSWIN4.1' is the recommanded setting, because it is the setting least likely
to cause compatibility problems. If you want to put something else in here,
that is your option, but the result may be that some FAT drivers might not
recognize the volume."


It also says "Typically this is some indication of what system formatted
the volume."

From https://technet.microsoft.com/en-us/library/cc976796.aspx:

"Following the jump instruction is the 8-byte OEM ID, a string of characters 
that identifies the name and version number of the operating system that formatted 
the volume. To preserve compatibility
with MS-DOS, Windows 2000 records "MSDOS5.0" in this field on FAT16 and FAT32 disks. On 
NTFS disks, Windows 2000 records "NTFS."
You may also see the OEM ID "MSWIN4.0" on disks formatted by Windows 95 and 
"MSWIN4.1" on disks formatted by Windows 95 OSR2 and Windows 98. Windows 2000 does not 
use the OEM ID field in the boot
sector except for verifying NTFS volumes."

I'm curious which one is the most up-to-date, the specification v1.03 or a 
Windows 2000. Do you have an idea if there is more MSDOS/W2K vs W95/98 users? 
Hopefully W95 is a Long time no see to me.

You think having "QEMU" OEM does more harm than good?


That's complicated. Indeed, OEM ID should be the name of the OS/tool which 
formatted the partition.
However, some FAT drivers take different paths according to OEM ID.
See for example:
https://jdebp.eu/FGA/volume-boot-block-oem-name-field.html
http://seasip.info./Misc/oemid.html

So, in my opinion, it should be better to stick to some specification for the 
whole FAT format, so we have a reference document to know if there is a bug in 
the code or not.
FAT specification 1.03 is dated December 6, 2000, while Windows 2000 has been 
released December 15, 1999, so both are around the same date.
FAT specification gives all details about all disk structures, so I think 
that's better to stick to what it says.

So to answer your question, and knowing the tricks played with OEM ID, I think that's better to use 
"MSWIN4.1" than "QEMU".

Regards,

Hervé





Specification: "FAT: General overview of on-disk format" v1.03, page 9
Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 53e8faa54c..1f7f46ecea 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1024,7 +1024,7 @@ static int init_directories(BDRVVVFATState* s,
 bootsector->jump[0]=0xeb;
 bootsector->jump[1]=0x3e;
 bootsector->jump[2]=0x90;
-memcpy(bootsector->name,"QEMU",8);
+memcpy(bootsector->name, "MSWIN4.1", 8);
 bootsector->sector_size=cpu_to_le16(0x200);
 bootsector->sectors_per_cluster=s->sectors_per_cluster;
 bootsector->reserved_sectors=cpu_to_le16(1);



Regards,

Phil.






Re: [Qemu-devel] [PATCH v2 4/4] 9pfs: local: metadata file for the VirtFS root

2017-05-23 Thread Greg Kurz
On Tue, 23 May 2017 12:13:17 -0500
Eric Blake  wrote:

> On 05/23/2017 09:32 AM, Greg Kurz wrote:
> > When using the mapped-file security, credentials are stored in a metadata
> > directory located in the parent directory. This is okay for all paths with
> > the notable exception of the root path, since we don't want and probably
> > can't create a metadata directory above the virtfs directory on the host.
> > 
> > This patch introduces a dedicated metadata file, sitting in the virtfs root
> > for this purpose. It relies on the fact that the "." name necessarily refers
> > to the virtfs root.
> > 
> > As for the metadata directory, we don't want the client to see this file.
> > The current code only cares for readdir() but there are many other places
> > to fix actually. The filtering logic is hence put in a separate function.
> > 
> > @@ -478,7 +504,8 @@ static off_t local_telldir(FsContext *ctx, 
> > V9fsFidOpenState *fs)
> >  
> >  static bool local_is_mapped_file_metadata(FsContext *fs_ctx, const char 
> > *name)
> >  {
> > -return !strcmp(name, VIRTFS_META_DIR);
> > +return
> > +!strcmp(name, VIRTFS_META_DIR) || !strcmp(name, 
> > VIRTFS_META_ROOT_FILE);  
> 
> We have to block VIRTFS_META_DIR at any depth in the hierarchy, but
> can/should we change the blocking of VIRTFS_META_ROOT_FILE to only
> happen at the root directory, rather than at all directories?  On the
> other hand, if you can simultaneously map /path/to/a for one mount
> point, and /path/to/a/b for another, then the root file for B is visible
> at a lower depth than the root file for A, and blocking the metafile
> name everywhere means that the mount A can't influence the behavior on
> the mount for B.
> 

I must confess I hadn't this scenario in mind but it's safer from a
security standpoint indeed.

> Not tested, but looks sane, so:
> Reviewed-by: Eric Blake 
> 

Thanks again for the review, Eric.

Cheers,

--
Greg


pgpZnM41L6JdP.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS

2017-05-23 Thread Daniel Henrique Barboza



On 05/23/2017 03:07 PM, Daniel Henrique Barboza wrote:



On 05/23/2017 02:52 PM, Daniel Henrique Barboza wrote:

Hi Laurent,

This is an interesting patch series. I've been working in the last 
weeks in the DRC
migration, mainly to solve the problem in which a hot CPU unplug will 
not succeed after
a migration if the CPU was hotplugged in the source. The problem 
happened when
migrating with virsh because Libvirt hotplugs the CPU in both source 
and target, and
the DRC state of the hotplugged after the migration is inconsistent. 
This series solves the
issue by preventing it from happening in the first place. Of course 
that migrating DRC states
has other uses (pending unplug operations during a migration, for 
example) so both patch

series can coexist.

One possible issue I see with this series is that it breaks Libvirt 
migration entirely if
a CPU/mem hotplug happens in the target. With your series applied the 
migration

fails before start with:


Sorry: if a migration happens in the *source*, before the migration.


Hehe nothing like fixing a typo with another one ...

This is the Libvirt use case that fails with this patch set applied in 
QEMU master, Libvirt 3.4.0 compiled from

source:

# ./virsh start dhb_ub1704_nfs 2
#
# ./virsh setvcpus dhb_ub1704_nfs 2 --live
#
# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs 
--desturi qemu+ssh://target_ip/system --timeout 60 --verbose
error: internal error: unable to execute QEMU command 'device_add': CPU 
hotplug not supported without OS


This is the error msg that appears in Libvirt daemon:

2017-05-23 18:17:17.844+: 159678: error : 
qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU 
command 'device_add': CPU hotplug not supported without OS




Daniel






# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs 
--desturi qemu+ssh://target_ip/system --timeout 60 --verbose
error: internal error: unable to execute QEMU command 'device_add': 
CPU hotplug not supported without OS


Note that I say "possible issue" because, although I believe we do 
not want to break Libvirt
if possible, I also believe that we need to think about what makes 
sense in QEMU first.




Thanks,


Daniel


On 05/23/2017 08:18 AM, Laurent Vivier wrote:

If the OS is not started, QEMU sends an event to the OS
that is lost and cannot be recovered. An unplug is not
able to restore QEMU in a coherent state.
So, while the OS is not started, disable CPU and memory hotplug.
We use option vector 6 to know if the OS is started

This series moves error checking for memory hotplug
in a pre_plug function, and introduces the option
vector 6 management. It also revert previous
fix which was not really fixing the hotplug problem
when the OS is not running.

Laurent Vivier (4):
   spapr: add pre_plug function for memory
   spapr: add option vector 6
   spapr: disable hotplugging without OS
   Revert "spapr: fix memory hot-unplugging"

  hw/ppc/spapr.c  | 103 


  hw/ppc/spapr_drc.c  |  20 ++---
  hw/ppc/spapr_hcall.c|   5 ++-
  hw/ppc/spapr_ovec.c |   8 
  include/hw/ppc/spapr.h  |   2 +
  include/hw/ppc/spapr_drc.h  |   1 -
  include/hw/ppc/spapr_ovec.h |   7 +++
  7 files changed, 109 insertions(+), 37 deletions(-)












Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS

2017-05-23 Thread Daniel Henrique Barboza



On 05/23/2017 02:52 PM, Daniel Henrique Barboza wrote:

Hi Laurent,

This is an interesting patch series. I've been working in the last 
weeks in the DRC
migration, mainly to solve the problem in which a hot CPU unplug will 
not succeed after
a migration if the CPU was hotplugged in the source. The problem 
happened when
migrating with virsh because Libvirt hotplugs the CPU in both source 
and target, and
the DRC state of the hotplugged after the migration is inconsistent. 
This series solves the
issue by preventing it from happening in the first place. Of course 
that migrating DRC states
has other uses (pending unplug operations during a migration, for 
example) so both patch

series can coexist.

One possible issue I see with this series is that it breaks Libvirt 
migration entirely if
a CPU/mem hotplug happens in the target. With your series applied the 
migration

fails before start with:


Sorry: if a migration happens in the *source*, before the migration.




# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs 
--desturi qemu+ssh://target_ip/system --timeout 60 --verbose
error: internal error: unable to execute QEMU command 'device_add': 
CPU hotplug not supported without OS


Note that I say "possible issue" because, although I believe we do not 
want to break Libvirt
if possible, I also believe that we need to think about what makes 
sense in QEMU first.




Thanks,


Daniel


On 05/23/2017 08:18 AM, Laurent Vivier wrote:

If the OS is not started, QEMU sends an event to the OS
that is lost and cannot be recovered. An unplug is not
able to restore QEMU in a coherent state.
So, while the OS is not started, disable CPU and memory hotplug.
We use option vector 6 to know if the OS is started

This series moves error checking for memory hotplug
in a pre_plug function, and introduces the option
vector 6 management. It also revert previous
fix which was not really fixing the hotplug problem
when the OS is not running.

Laurent Vivier (4):
   spapr: add pre_plug function for memory
   spapr: add option vector 6
   spapr: disable hotplugging without OS
   Revert "spapr: fix memory hot-unplugging"

  hw/ppc/spapr.c  | 103 


  hw/ppc/spapr_drc.c  |  20 ++---
  hw/ppc/spapr_hcall.c|   5 ++-
  hw/ppc/spapr_ovec.c |   8 
  include/hw/ppc/spapr.h  |   2 +
  include/hw/ppc/spapr_drc.h  |   1 -
  include/hw/ppc/spapr_ovec.h |   7 +++
  7 files changed, 109 insertions(+), 37 deletions(-)









Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS

2017-05-23 Thread Daniel Henrique Barboza

Hi Laurent,

This is an interesting patch series. I've been working in the last weeks 
in the DRC
migration, mainly to solve the problem in which a hot CPU unplug will 
not succeed after
a migration if the CPU was hotplugged in the source. The problem 
happened when
migrating with virsh because Libvirt hotplugs the CPU in both source and 
target, and
the DRC state of the hotplugged after the migration is inconsistent. 
This series solves the
issue by preventing it from happening in the first place. Of course that 
migrating DRC states
has other uses (pending unplug operations during a migration, for 
example) so both patch

series can coexist.

One possible issue I see with this series is that it breaks Libvirt 
migration entirely if
a CPU/mem hotplug happens in the target. With your series applied the 
migration

fails before start with:


# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs 
--desturi qemu+ssh://target_ip/system --timeout 60 --verbose
error: internal error: unable to execute QEMU command 'device_add': CPU 
hotplug not supported without OS


Note that I say "possible issue" because, although I believe we do not 
want to break Libvirt
if possible, I also believe that we need to think about what makes sense 
in QEMU first.




Thanks,


Daniel


On 05/23/2017 08:18 AM, Laurent Vivier wrote:

If the OS is not started, QEMU sends an event to the OS
that is lost and cannot be recovered. An unplug is not
able to restore QEMU in a coherent state.
So, while the OS is not started, disable CPU and memory hotplug.
We use option vector 6 to know if the OS is started

This series moves error checking for memory hotplug
in a pre_plug function, and introduces the option
vector 6 management. It also revert previous
fix which was not really fixing the hotplug problem
when the OS is not running.

Laurent Vivier (4):
   spapr: add pre_plug function for memory
   spapr: add option vector 6
   spapr: disable hotplugging without OS
   Revert "spapr: fix memory hot-unplugging"

  hw/ppc/spapr.c  | 103 
  hw/ppc/spapr_drc.c  |  20 ++---
  hw/ppc/spapr_hcall.c|   5 ++-
  hw/ppc/spapr_ovec.c |   8 
  include/hw/ppc/spapr.h  |   2 +
  include/hw/ppc/spapr_drc.h  |   1 -
  include/hw/ppc/spapr_ovec.h |   7 +++
  7 files changed, 109 insertions(+), 37 deletions(-)






Re: [Qemu-devel] [PATCH] block/gluster: glfs_lseek() workaround

2017-05-23 Thread Eric Blake
On 05/23/2017 12:27 PM, Jeff Cody wrote:
> On current released versions of glusterfs, glfs_lseek() will sometimes
> return invalid values for SEEK_DATA or SEEK_HOLE.  For SEEK_DATA and
> SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in
> the case of error:
> 

> However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a
> value less than the passed offset, yet greater than zero.

Ouch.

> 
> Although this is being fixed in gluster, we still should work around it
> in QEMU, given that multiple released versions of gluster behave this
> way.

Fair enough.

> 
> This patch treats the return case of (offs < start) the same as if an
> error value other than ENXIO is returned; we will assume we learned
> nothing, and there are no holes in the file.

Yes, holes are merely an optimization, so treating bugs by the
pessimistic fallback of no holes rather than aborting is reasonable.


> +++ b/block/gluster.c
> @@ -1275,7 +1275,14 @@ static int find_allocation(BlockDriverState *bs, off_t 
> start,
>  if (offs < 0) {
>  return -errno;  /* D3 or D4 */
>  }
> -assert(offs >= start);
> +
> +if (offs < start) {
> +/* This is not a valid return by lseek().  We are safe to just return
> + * -EIO in this case, and we'll treat it like D4. Unfortunately some
> + *  versions of libgfapi will return offs < start, so an assert here
> + *  will unneccesarily abort QEMU. */

s/unneccesarily/unnecessarily/ (twice)

With spelling fix,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 29/31] target/s390x: Use atomic operations for COMPARE SWAP PURGE

2017-05-23 Thread Aurelien Jarno
On 2017-05-23 09:31, Richard Henderson wrote:
> On 05/23/2017 05:28 AM, Aurelien Jarno wrote:
> > On 2017-05-22 20:03, Richard Henderson wrote:
> > > +/* flush global tlb */
> > > +void HELPER(purge)(CPUS390XState *env)
> > > +{
> > > +S390CPU *cpu = s390_env_get_cpu(env);
> > > +
> > > +tlb_flush_all_cpus(CPU(cpu));
> > 
> > > From what I understand from the PoP, the instruction should not complete
> > before the TLB has been purged on all CPUs. Therefore I guess
> > tlb_flush_all_cpus_synced() should be used instead.
> I don't read that from this:
> 
> # (1) all specified entries have been cleared
> # from the ALB and TLB of this CPU and
> 
> # (2) all other
> # CPUs in the configuration have completed any stor-
> # age accesses, including the updating of the change
> # and reference bits, by using the specified ALB and
> # TLB entries.
> 
> It talks about referenced bits being updated -- presumably before the tlb
> entry is flushed.  But it doesn't say "all specified ALB and TLB entries of
> other CPUs in the configuration".
> 
> But if you still disagree, it's certainly an easy change as you note.

Well i have to say it's not very clear. My point is that given the way
QEMU model things, if we want to ensure that all storage accesses using
the specified TLB entries are completed, we currently can only just make
sure that the all TLB entries have been flushed.
 
> > > +tcg_gen_atomic_cmpxchg_i64(old, addr, o->in1, o->out2,
> > 
> > Here the prep generator took the 32-bit version of in1. I guess the same
> > should be done for out2.
> 
> No, in1 is zero-extended for its use ...
> 
> > 
> > > +   get_mem_index(s), mop | MO_ALIGN);
> > > +tcg_temp_free_i64(addr);
> > > +
> > > +/* Are the memory and expected values (un)equal?  */
> > > +cc = tcg_temp_new_i64();
> > > +tcg_gen_setcond_i64(TCG_COND_NE, cc, o->in1, old);
> 
> ... here.
> 
> For out2 above, cmpxchg acts as any other store wrt MO_TEUL, in that it
> ignores the unused upper bits.

Indeed you are correct, I read it too fast.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



[Qemu-devel] [PATCH v2] qapi: Fix some QMP documentation regressions

2017-05-23 Thread Eric Blake
In the process of getting rid of docs/qmp-commands.txt, we managed
to regress on some of the text that changed after the point where
the move was first branched and when the move actually occurred.
For example, commit 3282eca for blockdev-snapshot re-added the
extra "options" layer which had been cleaned up in commit 0153d2f.

This clears up all regressions identified over the range
02b351d..bd6092e:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05127.html
as well as a cleanup to x-blockdev-remove-medium to prefer
'id' over 'device' (matching the cleanup for 'eject').

CC: qemu-triv...@nongnu.org
Signed-off-by: Eric Blake 
---
v2: add 'eject', 'x-blockdev-remove-medium' cleanups, thanks to
Markus' audit.

Can go in either via trivial or via qapi tree

 qapi/block-core.json | 28 ++--
 qapi/block.json  |  2 +-
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 309b1df..88a7471 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1206,11 +1206,11 @@
 # Example:
 #
 # -> { "execute": "blockdev-add",
-#  "arguments": { "options": { "driver": "qcow2",
-#  "node-name": "node1534",
-#  "file": { "driver": "file",
-#"filename": "hd1.qcow2" },
-#  "backing": "" } } }
+#  "arguments": { "driver": "qcow2",
+# "node-name": "node1534",
+# "file": { "driver": "file",
+#   "filename": "hd1.qcow2" },
+# "backing": "" } }
 #
 # <- { "return": {} }
 #
@@ -3214,7 +3214,7 @@
 # <- { "return": {} }
 #
 # -> { "execute": "x-blockdev-remove-medium",
-#  "arguments": { "device": "ide0-1-0" } }
+#  "arguments": { "id": "ide0-1-0" } }
 #
 # <- { "return": {} }
 #
@@ -3245,10 +3245,10 @@
 #
 # -> { "execute": "blockdev-add",
 #  "arguments": {
-#  "options": { "node-name": "node0",
-#   "driver": "raw",
-#   "file": { "driver": "file",
-# "filename": "fedora.iso" } } } }
+#  "node-name": "node0",
+#  "driver": "raw",
+#  "file": { "driver": "file",
+#"filename": "fedora.iso" } } }
 # <- { "return": {} }
 #
 # -> { "execute": "x-blockdev-insert-medium",
@@ -3701,10 +3701,10 @@
 # 1. Add a new node to a quorum
 # -> { "execute": "blockdev-add",
 #  "arguments": {
-#  "options": { "driver": "raw",
-#   "node-name": "new_node",
-#"file": { "driver": "file",
-#  "filename": "test.raw" } } } }
+#  "driver": "raw",
+#  "node-name": "new_node",
+#  "file": { "driver": "file",
+#"filename": "test.raw" } } }
 # <- { "return": {} }
 # -> { "execute": "x-blockdev-change",
 #  "arguments": { "parent": "disk1",
diff --git a/qapi/block.json b/qapi/block.json
index 6a2fdc7..414b61b 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -180,7 +180,7 @@
 #
 # Example:
 #
-# -> { "execute": "eject", "arguments": { "device": "ide1-0-1" } }
+# -> { "execute": "eject", "arguments": { "id": "ide1-0-1" } }
 # <- { "return": {} }
 ##
 { 'command': 'eject',
-- 
2.9.4




Re: [Qemu-devel] [PATCH 1/1] Improve Cocoa modifier key handling

2017-05-23 Thread Ian McKellar via Qemu-devel
On Tue, May 23, 2017 at 8:52 AM Ian McKellar  wrote:

> On Tue, May 23, 2017 at 3:03 AM Gerd Hoffmann  wrote:
>
>>
>> I'm wondering whenever we should just use modifierFlags all the time.
>>
>
> Probably. My initial patch tried to be minimally intrusive but I can try
> reworking the NSEventTypeFlagsChanged handling to compare the new
> modifierFlags to the last we've seen and synthesize the appropriate key
> down/up events.
>
> After a little more experimentation I think that the approach in this
patch is the right one. modifierFlags doesn't[1] indicate which instance of
a modifier (ie: left or right) is being held. Except for
the NSEventTypeFlagsChanged that's synthesized when a window takes focus
the event's keyCode indicates which physical key is affected. That first
event after focus has keyCode==0 which we can detect because 0 is the
keyCode of the 'A' key which isn't a modifier.

Ian

[1] actually there are undocumented flags outside the
NSEventModifierFlagDeviceIndependentFlagsMask mask that indicate left/right
keys but I don't think we should use them.


Re: [Qemu-devel] [PATCH 02/31] target/s390x: Implement EXECUTE via new TranslationBlock

2017-05-23 Thread Aurelien Jarno
On 2017-05-23 08:54, Richard Henderson wrote:
> On 05/23/2017 03:48 AM, Aurelien Jarno wrote:
> > On 2017-05-22 20:02, Richard Henderson wrote:
> > > Previously, helper_ex would construct the insn and then implement
> > > the insn via direct calls other helpers.  This was sufficient to
> > > boot Linux but that is all.
> > > 
> > > It is easy enough to go the whole nine yards by stashing state for
> > > EXECUTE within the cpu, and then relying on a new TB to be created
> > > that properly and completely interprets the insn.
> > > 
> > > Signed-off-by: Richard Henderson 
> > > ---
> > >   target/s390x/cpu.h |   4 +-
> > >   target/s390x/helper.h  |   2 +-
> > >   target/s390x/insn-data.def |   4 +-
> > >   target/s390x/machine.c |  19 +++
> > >   target/s390x/mem_helper.c  | 136 
> > > +++--
> > >   target/s390x/translate.c   | 124 
> > > +
> > >   6 files changed, 133 insertions(+), 156 deletions(-)
> > 
> > This looks good on the principle, and finally removes a big hack. That
> > said it prevent my test system to boot. I haven't investigated why yet.
> 
> Hmm.  I've not got a complete environment -- merely booting a kernel up to
> the point it fails to find a rootfs.  Which did find several problems with
> my first attempts at this, but wouldn't have exercised paging.  I'll try
> again to get a full install working...
> 
> I wonder if I needed to adjust s390_cpu_handle_mmu_fault (and its myriad
> subroutines) to handle setting ILEN correctly.
> 
> There might be a simpler fix though.  Currently I advance the PC and
> remember the ilen of the EX(RL).  Maybe better to *not* advance the PC so as
> to have the original EX(RL) right there for ILEN_LATER and ILEN_LATER_INC to
> operate on.
> 
> Something like this, as a delta patch.

Unfortunately it doesn't work. So far I have no real idea what could be
the root cause of the issue. I have just determined that up to the crash,
only a very limited set of instructions are being executed. They are the
4 bytes long versions of MVC, CLC, XC, TR.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



[Qemu-devel] [PATCH] block/gluster: glfs_lseek() workaround

2017-05-23 Thread Jeff Cody
On current released versions of glusterfs, glfs_lseek() will sometimes
return invalid values for SEEK_DATA or SEEK_HOLE.  For SEEK_DATA and
SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in
the case of error:

LSEEK(2):

off_t lseek(int fd, off_t offset, int whence);

[...]

SEEK_HOLE
  Adjust  the file offset to the next hole in the file greater
  than or equal to offset.  If offset points into the middle of
  a hole, then the file offset is set to offset.  If there is no
  hole past offset, then the file offset is adjusted to the end
  of the file (i.e., there is  an implicit hole at the end of
  any file).

[...]

RETURN VALUE
  Upon  successful  completion,  lseek()  returns  the resulting
  offset location as measured in bytes from the beginning of the
  file.  On error, the value (off_t) -1 is returned and errno is
  set to indicate the error

However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a
value less than the passed offset, yet greater than zero.

For instance, here are example values observed from this call:

offs = glfs_lseek(s->fd, start, SEEK_HOLE);
if (offs < 0) {
return -errno;  /* D1 and (H3 or H4) */
}

start == 7608336384
offs == 7607877632

This causes QEMU to abort on the assert test.  When this value is
returned, errno is also 0.

This is a reported and known bug to glusterfs:
https://bugzilla.redhat.com/show_bug.cgi?id=1425293

Although this is being fixed in gluster, we still should work around it
in QEMU, given that multiple released versions of gluster behave this
way.

This patch treats the return case of (offs < start) the same as if an
error value other than ENXIO is returned; we will assume we learned
nothing, and there are no holes in the file.

Signed-off-by: Jeff Cody 
---
 block/gluster.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 7c76cd0..c147909e 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1275,7 +1275,14 @@ static int find_allocation(BlockDriverState *bs, off_t 
start,
 if (offs < 0) {
 return -errno;  /* D3 or D4 */
 }
-assert(offs >= start);
+
+if (offs < start) {
+/* This is not a valid return by lseek().  We are safe to just return
+ * -EIO in this case, and we'll treat it like D4. Unfortunately some
+ *  versions of libgfapi will return offs < start, so an assert here
+ *  will unneccesarily abort QEMU. */
+return -EIO;
+}
 
 if (offs > start) {
 /* D2: in hole, next data at offs */
@@ -1307,7 +1314,14 @@ static int find_allocation(BlockDriverState *bs, off_t 
start,
 if (offs < 0) {
 return -errno;  /* D1 and (H3 or H4) */
 }
-assert(offs >= start);
+
+if (offs < start) {
+/* This is not a valid return by lseek().  We are safe to just return
+ * -EIO in this case, and we'll treat it like H4. Unfortunately some
+ *  versions of libgfapi will return offs < start, so an assert here
+ *  will unneccesarily abort QEMU. */
+return -EIO;
+}
 
 if (offs > start) {
 /*
-- 
2.9.3




Re: [Qemu-devel] [PATCH] qapi: Fix some blockdev-add documentation regressions

2017-05-23 Thread Eric Blake
On 05/23/2017 09:32 AM, Eric Blake wrote:
> On 05/23/2017 03:39 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>>
>>> In the process of getting rid of docs/qmp-commands.txt, we
>>> managed to regress on any text that changed after the point
>>> where the move was first branched and when the move actually
>>> occurred.  For example, commit 3282eca for blockdev-snapshot
>>> re-added the extra "options" layer which had been cleaned up
>>> in commit 0153d2f.
>>>

> 
> I'll post a v2, since I have another pending doc patch that hasn't been
> reviewed yet, and since your audit means I need to update my commit
> message anyway.

Actually, it just got queued as trivial:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05323.html

although git handles things just fine if it goes in earlier through any
other tree.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 30/31] target/s390x: Implement CSPG

2017-05-23 Thread Aurelien Jarno
On 2017-05-23 09:33, Richard Henderson wrote:
> On 05/23/2017 04:12 AM, Aurelien Jarno wrote:
> > On 2017-05-22 20:03, Richard Henderson wrote:
> > > Signed-off-by: Richard Henderson 
> > > ---
> > >   target/s390x/insn-data.def | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
> > > index 4c91f30..8604847 100644
> > > --- a/target/s390x/insn-data.def
> > > +++ b/target/s390x/insn-data.def
> > > @@ -838,6 +838,7 @@
> > >   #ifndef CONFIG_USER_ONLY
> > >   /* COMPARE AND SWAP AND PURGE */
> > >   D(0xb250, CSP, RRE,   Z,   r1_32u, ra2, r1_P, 0, csp, 0, 
> > > MO_TEUL)
> > > +D(0xb98a, CSPG,RRE,   Z,   r1_o, ra2, r1_P, 0, csp, 0, MO_TEQ)
> > 
> > CSPG is part of the of the DAT-enhancement facility. I called it DAT_ENH
> > in my local patches to match the name we have in the CPU features.
> 
> The translator needs a large overhaul to bring it into line with the (more
> recently added) facilities infrastructure.  We don't currently enforce
> anything.

Agreed. That said it's actually useful to quickly check if all the
instructions of a given facility have been implemented and then flip the
corresponding facility bit.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH v2] tests/libqtest: Print error instead of aborting when env variable is missing

2017-05-23 Thread John Snow


On 05/23/2017 03:25 AM, Thomas Huth wrote:
> On 22.05.2017 17:58, no-re...@patchew.org wrote:
>> This series failed automatic build test. Please find the testing commands and
>> their output below. If you have docker installed, you can probably reproduce 
>> it
>> locally.
> [...]
>>   LEX convert-dtsv0-lexer.lex.c
>>   DEP /tmp/qemu-test/src/dtc/fdtdump.c
>> make[1]: flex: Command not found
>>   BISON dtc-parser.tab.c
>> make[1]: bison: Command not found
>>   DEP /tmp/qemu-test/src/dtc/srcpos.c
>>   LEX dtc-lexer.lex.c
>> make[1]: flex: Command not found
>>   DEP /tmp/qemu-test/src/dtc/treesource.c
>>   DEP /tmp/qemu-test/src/dtc/livetree.c
>>   DEP /tmp/qemu-test/src/dtc/fstree.c
>>   DEP /tmp/qemu-test/src/dtc/flattree.c
>>   DEP /tmp/qemu-test/src/dtc/dtc.c
>>   DEP /tmp/qemu-test/src/dtc/data.c
>>   DEP /tmp/qemu-test/src/dtc/checks.c
>>  CHK version_gen.h
>>   BISON dtc-parser.tab.c
>> make[1]: bison: Command not found
>>   LEX convert-dtsv0-lexer.lex.c
>>  UPD version_gen.h
>> make[1]: flex: Command not found
>>   LEX dtc-lexer.lex.c
>> make[1]: flex: Command not found
>>   DEP /tmp/qemu-test/src/dtc/util.c
>>   LEX convert-dtsv0-lexer.lex.c
>>   BISON dtc-parser.tab.c
>> make[1]: flex: Command not found
>> make[1]: bison: Command not found
>>   LEX dtc-lexer.lex.c
>> make[1]: flex: Command not found
> 
> Looks like flex and bison are missing in the docker image? Could someone
> please add it?
> 
> [...]
>>   CC  replay/replay-internal.o
>> /tmp/qemu-test/src/replay/replay-internal.c: In function ‘replay_put_array’:
>> /tmp/qemu-test/src/replay/replay-internal.c:65: warning: ignoring return 
>> value of ‘fwrite’, declared with attribute warn_unused_result
> 
> I guess that should be fixed?
> 
> [...]
>>   CC  slirp/tcp_input.o
>>   CC  slirp/tcp_output.o
>>   CC  slirp/tcp_subr.o
>>   CC  slirp/tcp_timer.o
>> /tmp/qemu-test/src/slirp/tcp_input.c: In function ‘tcp_input’:
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_p’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_len’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_tos’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_id’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_off’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_ttl’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_sum’ may be 
>> used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_src.s_addr’ 
>> may be used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_dst.s_addr’ 
>> may be used uninitialized in this function
>> /tmp/qemu-test/src/slirp/tcp_input.c:220: warning: ‘save_ip6.ip_nh’ may be 
>> used uninitialized in this function
> 
> I've never seen these warnings in tcp_input.c before ... and they also
> look like false positives too me ... is that GCC in the docker image too
> sensitive?
> 
> [...]
>>   CC  aarch64-softmmu/hw/timer/exynos4210_rtc.o
>> /tmp/qemu-test/src/hw/i386/pc_piix.c: In function 
>> ‘igd_passthrough_isa_bridge_create’:
>> /tmp/qemu-test/src/hw/i386/pc_piix.c:1067: warning: ‘pch_rev_id’ may be used 
>> uninitialized in this function
> 
> dito
> 
>>   CC  aarch64-softmmu/hw/usb/tusb6010.o
>> /tmp/qemu-test/src/hw/i386/acpi-build.c: In function 
>> ‘build_append_pci_bus_devices’:
>> /tmp/qemu-test/src/hw/i386/acpi-build.c:525: warning: ‘notify_method’ may be 
>> used uninitialized in this function
> 
> dito
> 
> [...]
>> /tmp/qemu-test/src/tests/ide-test.c: In function ‘cdrom_pio_impl’:
>> /tmp/qemu-test/src/tests/ide-test.c:803: warning: ignoring return value of 
>> ‘fwrite’, declared with attribute warn_unused_result
>> /tmp/qemu-test/src/tests/ide-test.c: In function ‘test_cdrom_dma’:
>> /tmp/qemu-test/src/tests/ide-test.c:899: warning: ignoring return value of 
>> ‘fwrite’, declared with attribute warn_unused_result
> 
> Needs a patch, I guess?
> 

Huh, I've never seen these come up locally, I'll send a patch (if only
to quiet patchew.)

>>   GTESTER tests/test-vmstate
>> Failed to load simple/primitive:b_1
>> Failed to load simple/primitive:i64_2
>> Failed to load simple/primitive:i32_1
>> Failed to load simple/primitive:i32_1
>> Failed to load test/with_tmp:a
>> Failed to load test/tmp_child_parent:f
>> Failed to load test/tmp_child:parent
>> Failed to load test/with_tmp:tmp
>> Failed to load test/tmp_child:diff
>> Failed to load test/with_tmp:tmp
>> Failed to load test/tmp_child:diff
>> Failed to load test/with_tmp:tmp
> 
> That works for me when I 

Re: [Qemu-devel] [PATCH v2 4/4] 9pfs: local: metadata file for the VirtFS root

2017-05-23 Thread Eric Blake
On 05/23/2017 09:32 AM, Greg Kurz wrote:
> When using the mapped-file security, credentials are stored in a metadata
> directory located in the parent directory. This is okay for all paths with
> the notable exception of the root path, since we don't want and probably
> can't create a metadata directory above the virtfs directory on the host.
> 
> This patch introduces a dedicated metadata file, sitting in the virtfs root
> for this purpose. It relies on the fact that the "." name necessarily refers
> to the virtfs root.
> 
> As for the metadata directory, we don't want the client to see this file.
> The current code only cares for readdir() but there are many other places
> to fix actually. The filtering logic is hence put in a separate function.
> 
> @@ -478,7 +504,8 @@ static off_t local_telldir(FsContext *ctx, 
> V9fsFidOpenState *fs)
>  
>  static bool local_is_mapped_file_metadata(FsContext *fs_ctx, const char 
> *name)
>  {
> -return !strcmp(name, VIRTFS_META_DIR);
> +return
> +!strcmp(name, VIRTFS_META_DIR) || !strcmp(name, 
> VIRTFS_META_ROOT_FILE);

We have to block VIRTFS_META_DIR at any depth in the hierarchy, but
can/should we change the blocking of VIRTFS_META_ROOT_FILE to only
happen at the root directory, rather than at all directories?  On the
other hand, if you can simultaneously map /path/to/a for one mount
point, and /path/to/a/b for another, then the root file for B is visible
at a lower depth than the root file for A, and blocking the metafile
name everywhere means that the mount A can't influence the behavior on
the mount for B.

Not tested, but looks sane, so:
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Xen-devel] [RFC PATCH V2 1/2] xen-pt: bind/unbind interrupt remapping format MSI

2017-05-23 Thread Anthony PERARD
On Tue, May 23, 2017 at 08:16:25PM +0800, Lan Tianyu wrote:
> On 2017年05月19日 20:04, Jan Beulich wrote:
>  On 19.05.17 at 13:16,  wrote:
> >> On Thu, May 18, 2017 at 01:32:59AM -0400, Lan Tianyu wrote:
> >>> --- a/include/hw/i386/apic-msidef.h
> >>> +++ b/include/hw/i386/apic-msidef.h
> >>> @@ -26,6 +26,7 @@
> >>>  
> >>>  #define MSI_ADDR_DEST_ID_SHIFT  12
> >>>  #define MSI_ADDR_DEST_IDX_SHIFT 4
> >>> -#define  MSI_ADDR_DEST_ID_MASK  0x000
> >>> +#define  MSI_ADDR_DEST_ID_MASK  0x000fff00
> >> The value of MSI_ADDR_DEST_ID_MASK is changed here. I think the patch
> >> should be:
> >> +#define  MSI_ADDR_DEST_ID_MASK  0x0000
> > Judging from other sources, rather the other way around - the
> > mask needs to have further bits removed (should be 0x000ff000
> > afaict). Xen sources confirm this, and while Linux has the value
> > you suggest, that contradicts
> Agree. Defining the mask as "0x000ff000" makes more sense.
> Just check Qemu source code. Only apic_send_msi() and msi_dest_id() use
> the mask
> to get dest apic id. They mask MSI address field with 
> MSI_ADDR_DEST_ID_MASK and
> then right-shift 12bit. The low 12bit won't be used.
> 
> Anthony, does this make sense?

Yes, it does.
The change to MSI_ADDR_DEST_ID_MASK should probably go in its own patch.

-- 
Anthony PERARD



Re: [Qemu-devel] [PATCH 1/4] block: Fix anonymous BBs in blk_root_inactivate()

2017-05-23 Thread Juan Quintela
Kevin Wolf  wrote:
> blk->name isn't an array, but a pointer that can be NULL. Checking for
> an anonymous BB must involve a NULL check first, otherwise we get
> crashes.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH 30/31] target/s390x: Implement CSPG

2017-05-23 Thread Richard Henderson

On 05/23/2017 04:12 AM, Aurelien Jarno wrote:

On 2017-05-22 20:03, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/s390x/insn-data.def | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 4c91f30..8604847 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -838,6 +838,7 @@
  #ifndef CONFIG_USER_ONLY
  /* COMPARE AND SWAP AND PURGE */
  D(0xb250, CSP, RRE,   Z,   r1_32u, ra2, r1_P, 0, csp, 0, MO_TEUL)
+D(0xb98a, CSPG,RRE,   Z,   r1_o, ra2, r1_P, 0, csp, 0, MO_TEQ)


CSPG is part of the of the DAT-enhancement facility. I called it DAT_ENH
in my local patches to match the name we have in the CPU features.


The translator needs a large overhaul to bring it into line with the (more 
recently added) facilities infrastructure.  We don't currently enforce anything.


But you're also right that I shouldn't just ignore the issue and leave it 
marked incorrectly.



r~




Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] spapr: add option vector 6

2017-05-23 Thread Greg Kurz
On Tue, 23 May 2017 13:18:10 +0200
Laurent Vivier  wrote:

> This allows to know when the OS is started and its type.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ppc/spapr.c  | 36 
>  hw/ppc/spapr_hcall.c|  5 -
>  hw/ppc/spapr_ovec.c |  8 
>  include/hw/ppc/spapr.h  |  2 ++
>  include/hw/ppc/spapr_ovec.h |  7 +++
>  5 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0e8d8d1..eceb4cc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1369,6 +1369,7 @@ static void ppc_spapr_reset(void)
>  first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
>  
>  spapr->cas_reboot = false;
> +spapr->os_name = OV6_NONE;
>  }
>  
>  static void spapr_create_nvram(sPAPRMachineState *spapr)
> @@ -1524,10 +1525,41 @@ static const VMStateDescription 
> vmstate_spapr_patb_entry = {
>  },
>  };
>  
> +static bool spapr_os_name_needed(void *opaque)
> +{
> +sPAPRMachineState *spapr = opaque;
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +return smc->need_os_name;

So this will have the subsection migrated unconditionally even if the value 
wasn't
changed from the default yet ? Also, it looks weird to involve a machine compat
flag here... if the concern is backwards migration then I guess you should check
the compat flag in h_client_architecture_support() and not set @os_name for 
older
machines.

> +}
> +
> +static const VMStateDescription vmstate_spapr_os_name = {
> +.name = "spapr_os_name",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_os_name_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT8(os_name, sPAPRMachineState),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
> +static int spapr_pre_load(void *opaque)
> +{
> +sPAPRMachineState *spapr = opaque;
> +
> +/* if the os_name is not migrated from the source,
> + * we must allow hotplug, so set os_name to linux
> + */
> +spapr->os_name = OV6_LINUX;

But maybe the source was in SLOF and I guess you don't want @os_name
to be set in this case... The correct way to restore older machines
behavior is to set @os_name to OV6_LINUX according to the compat flag.

> +
> +return 0;
> +}
> +
>  static const VMStateDescription vmstate_spapr = {
>  .name = "spapr",
>  .version_id = 3,
>  .minimum_version_id = 1,
> +.pre_load = spapr_pre_load,
>  .post_load = spapr_post_load,
>  .fields = (VMStateField[]) {
>  /* used to be @next_irq */
> @@ -1542,6 +1574,7 @@ static const VMStateDescription vmstate_spapr = {
>  .subsections = (const VMStateDescription*[]) {
>  _spapr_ov5_cas,
>  _spapr_patb_entry,
> +_spapr_os_name,
>  NULL
>  }
>  };
> @@ -3216,6 +3249,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>   * in which LMBs are represented and hot-added
>   */
>  mc->numa_mem_align_shift = 28;
> +smc->need_os_name = true;

The name seems to indicate the machine requires this, which is obviously not
the case... what about @parse_os_name instead ?

>  }
>  
>  static const TypeInfo spapr_machine_info = {
> @@ -3293,9 +3327,11 @@ static void 
> spapr_machine_2_9_instance_options(MachineState *machine)
>  
>  static void spapr_machine_2_9_class_options(MachineClass *mc)
>  {
> +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>  spapr_machine_2_10_class_options(mc);
>  SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
>  mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> +smc->need_os_name = false;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 0d608d6..5dbe3c7 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1058,7 +1058,8 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  uint32_t max_compat = cpu->max_compat;
>  uint32_t best_compat = 0;
>  int i;
> -sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> +sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates,
> +  *ov6_guest;
>  bool guest_radix;
>  
>  /*
> @@ -1112,6 +1113,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  
>  ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
>  ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
> +ov6_guest = spapr_ovec_parse_vector(ov_table, 6);
>  if (spapr_ovec_test(ov5_guest, OV5_MMU_BOTH)) {
>  error_report("guest requested hash and radix MMU, which is 
> invalid.");
>  exit(EXIT_FAILURE);
> @@ -1154,6 +1156,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  }
>  spapr->cas_legacy_guest_workaround = !spapr_ovec_test(ov1_guest,
> 

Re: [Qemu-devel] [PATCH 29/31] target/s390x: Use atomic operations for COMPARE SWAP PURGE

2017-05-23 Thread Richard Henderson

On 05/23/2017 05:28 AM, Aurelien Jarno wrote:

On 2017-05-22 20:03, Richard Henderson wrote:

+/* flush global tlb */
+void HELPER(purge)(CPUS390XState *env)
+{
+S390CPU *cpu = s390_env_get_cpu(env);
+
+tlb_flush_all_cpus(CPU(cpu));



From what I understand from the PoP, the instruction should not complete

before the TLB has been purged on all CPUs. Therefore I guess
tlb_flush_all_cpus_synced() should be used instead.

I don't read that from this:

# (1) all specified entries have been cleared
# from the ALB and TLB of this CPU and

# (2) all other
# CPUs in the configuration have completed any stor-
# age accesses, including the updating of the change
# and reference bits, by using the specified ALB and
# TLB entries.

It talks about referenced bits being updated -- presumably before the tlb entry 
is flushed.  But it doesn't say "all specified ALB and TLB entries of other 
CPUs in the configuration".


But if you still disagree, it's certainly an easy change as you note.



+tcg_gen_atomic_cmpxchg_i64(old, addr, o->in1, o->out2,


Here the prep generator took the 32-bit version of in1. I guess the same
should be done for out2.


No, in1 is zero-extended for its use ...




+   get_mem_index(s), mop | MO_ALIGN);
+tcg_temp_free_i64(addr);
+
+/* Are the memory and expected values (un)equal?  */
+cc = tcg_temp_new_i64();
+tcg_gen_setcond_i64(TCG_COND_NE, cc, o->in1, old);


... here.

For out2 above, cmpxchg acts as any other store wrt MO_TEUL, in that it ignores 
the unused upper bits.



r~



Re: [Qemu-devel] [PATCH 4/4] qemu-iotests: Block migration test

2017-05-23 Thread Eric Blake
On 05/23/2017 11:18 AM, Kevin Wolf wrote:
> Am 23.05.2017 um 17:46 hat Eric Blake geschrieben:
>> On 05/23/2017 09:01 AM, Kevin Wolf wrote:
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  tests/qemu-iotests/183 | 121 
>>> +
>>>  tests/qemu-iotests/183.out |  48 ++
>>>  tests/qemu-iotests/group   |   1 +
>>>  3 files changed, 170 insertions(+)
>>>  create mode 100755 tests/qemu-iotests/183
>>>  create mode 100644 tests/qemu-iotests/183.out
>>
>> Does this test gracefully skip when coupled with the efforts for a
>> configure option to disable block migration?
>> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04398.html
> 
> Obviously it doesn't. How would we even check? Just grep whether the
> magic error string turns up somewhere after doing 'migrate -b'?

I think the easiest way (once Juan's series lands) would be to try this
QMP on a standalone qemu execution prior to firing up the rest of the test:

 { "execute":"migrate-set-capabilities", "arguments":{
"capabilities": [ { "capability":"block", "state":true } ] } }

If that command fails, block migration is not compiled in (or we're
talking to an older qemu that lacks the capability altogether, but we
don't have to worry about that in our iotests).  Of course, if we do
that, then we could use QMP 'migrate' with the capabilities rather than
HMP -b to drive the same aspect of the test.

> 
> I don't think we have other test cases that don't skip immediately, but
> only after doing half of the test, but I think it might work anyway.

That's an option too - grepping for the magic failure string and
gracefully exiting if we were unable to start migration.  I think we
have done something like that recently to grep whether we have
op-blocking support via fcntl OFD semantics, and exit early if it is an
older kernel that has less-sane locks based on the error message.


>> Should we also check the use of -i?
> 
> Good point, though I don't even know how to use -i manually. :-)
> 
> We can either have a follow-up patch extending this one or a completely
> separate test case for it. I would have to try out -i before I could say
> which makes more sense.

An incremental patch to expand this test is fine.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/31] target/s390x: Use unwind data for helper_lra

2017-05-23 Thread Richard Henderson

On 05/23/2017 05:11 AM, Aurelien Jarno wrote:

On 2017-05-22 20:03, Richard Henderson wrote:

Note that exception_index is not live during a TB,
so there is no point saving it around mmu_translate.


What do you mean by "is not live"? Indeed cpu_loop_exit() is not called
so the TB is not terminated immediately. That said the while loop in
cpu_exec() will trigger the exception after the TB.



Ah, yes.  I'll undo that bit.


r~



Re: [Qemu-devel] [PATCH 4/4] qemu-iotests: Block migration test

2017-05-23 Thread Kevin Wolf
Am 23.05.2017 um 17:46 hat Eric Blake geschrieben:
> On 05/23/2017 09:01 AM, Kevin Wolf wrote:
> > Signed-off-by: Kevin Wolf 
> > ---
> >  tests/qemu-iotests/183 | 121 
> > +
> >  tests/qemu-iotests/183.out |  48 ++
> >  tests/qemu-iotests/group   |   1 +
> >  3 files changed, 170 insertions(+)
> >  create mode 100755 tests/qemu-iotests/183
> >  create mode 100644 tests/qemu-iotests/183.out
> 
> Does this test gracefully skip when coupled with the efforts for a
> configure option to disable block migration?
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04398.html

Obviously it doesn't. How would we even check? Just grep whether the
magic error string turns up somewhere after doing 'migrate -b'?

I don't think we have other test cases that don't skip immediately, but
only after doing half of the test, but I think it might work anyway.

> > +echo
> > +echo === Do block migration to destination ===
> > +echo
> > +
> > +_send_qemu_cmd $src "migrate -b unix:${MIG_SOCKET}" "(qemu)"
> 
> I don't see any earlier checks, so it looks like you'd choke here if -b
> is not compiled in.
> 
> Should we also check the use of -i?

Good point, though I don't even know how to use -i manually. :-)

We can either have a follow-up patch extending this one or a completely
separate test case for it. I would have to try out -i before I could say
which makes more sense.

Kevin


pgpIbuEY8Irv7.pgp
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory

2017-05-23 Thread Laurent Vivier
On 23/05/2017 17:28, Greg Kurz wrote:
> On Tue, 23 May 2017 13:18:09 +0200
> Laurent Vivier  wrote:
> 
>> This allows to manage errors before the memory
>> has started to be hotplugged. We already have
>> the function for the CPU cores.
>>
>> Signed-off-by: Laurent Vivier 
>> ---
>>  hw/ppc/spapr.c | 45 ++---
>>  1 file changed, 30 insertions(+), 15 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0980d73..0e8d8d1 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2569,20 +2569,6 @@ static void spapr_memory_plug(HotplugHandler 
>> *hotplug_dev, DeviceState *dev,
>>  uint64_t align = memory_region_get_alignment(mr);
>>  uint64_t size = memory_region_size(mr);
>>  uint64_t addr;
>> -char *mem_dev;
>> -
>> -if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> -error_setg(_err, "Hotplugged memory size must be a multiple 
>> of "
>> -  "%lld MB", SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
>> -goto out;
>> -}
>> -
>> -mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, 
>> NULL);
>> -if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
>> -error_setg(_err, "Memory backend has bad page size. "
>> -   "Use 'memory-backend-file' with correct mem-path.");
>> -goto out;
>> -}
>>  
>>  pc_dimm_memory_plug(dev, >hotplug_memory, mr, align, _err);
>>  if (local_err) {
>> @@ -2603,6 +2589,33 @@ out:
>>  error_propagate(errp, local_err);
>>  }
>>  
>> +static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState 
>> *dev,
>> +Error **errp)
> 
> Indentation nit

ok

> 
>> +{
>> +PCDIMMDevice *dimm = PC_DIMM(dev);
>> +PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>> +MemoryRegion *mr = ddc->get_memory_region(dimm);
>> +uint64_t size = memory_region_size(mr);
>> +Error *local_err = NULL;
>> +char *mem_dev;
>> +
>> +if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> +error_setg(_err, "Hotplugged memory size must be a multiple 
>> of "
>> +  "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
>> +goto out;
>> +}
>> +
>> +mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, 
>> NULL);
>> +if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
>> +error_setg(_err, "Memory backend has bad page size. "
>> +   "Use 'memory-backend-file' with correct mem-path.");
>> +goto out;
>> +}
>> +
>> +out:
>> +error_propagate(errp, local_err);
> 
> As recently discussed with Markus Armbruster, it isn't necessary to have a
> local Error * if you don't do anything else with it but propagate it.

Yes, you are right, it's a stupid cut'n'paste.

Thanks,
Laurent



[Qemu-devel] virtio crypto device implemenation

2017-05-23 Thread Zeng, Xin
Hi, Michael, 
   As you know, Lei Gong from Huawei and I are co-working  on virtio crypto 
device spec, he is focusing on symmetric algorithm part, I am focusing on 
asymmetric part.  Now I am planning the implementation for asymmetric part, 
would you please give me your point regarding the questions below?
   Current virtio crypto device implementation from Lei Gong:
   The virtio crypto device implementation has been upstreamed to QEMU and it 
has a qemu backend implementation for symmetric algorithm part, the front end 
Linux device driver for symmetric part has been upstreamed to Linux kernel as 
well.
   My questions:
   From my side, I planned to add the asymmetric part support in upstreamed 
front end device driver, and I don't want to add the asymmetric algorithm 
support to current virtio crypto device's qemu backend, instead, I would like 
to implement and upstream a DPDK vhost-user based backend for asymmetric 
algorithm, and accordingly Lei Gong will help to upstream a vhost user agent 
for virtio crypto device in QEMU,  is this approach acceptable? Is a qemu 
backend a mandatory requirement for the virtio crypto device?  Is there a 
general policy for this?

Thanks




Re: [Qemu-devel] [PATCH 02/31] target/s390x: Implement EXECUTE via new TranslationBlock

2017-05-23 Thread Aurelien Jarno
On 2017-05-23 12:48, Aurelien Jarno wrote:
> On 2017-05-22 20:02, Richard Henderson wrote:
> > Previously, helper_ex would construct the insn and then implement
> > the insn via direct calls other helpers.  This was sufficient to
> > boot Linux but that is all.
> > 
> > It is easy enough to go the whole nine yards by stashing state for
> > EXECUTE within the cpu, and then relying on a new TB to be created
> > that properly and completely interprets the insn.
> > 
> > Signed-off-by: Richard Henderson 
> > ---
> >  target/s390x/cpu.h |   4 +-
> >  target/s390x/helper.h  |   2 +-
> >  target/s390x/insn-data.def |   4 +-
> >  target/s390x/machine.c |  19 +++
> >  target/s390x/mem_helper.c  | 136 
> > +++--
> >  target/s390x/translate.c   | 124 +
> >  6 files changed, 133 insertions(+), 156 deletions(-)
> 
> This looks good on the principle, and finally removes a big hack. That
> said it prevent my test system to boot. I haven't investigated why yet.

This can aslo be reproduced using the kernel and initrd from the daily
Debian installer:

  https://d-i.debian.org/daily-images/s390x/daily/generic/

I am personally using the following command line:

  qemu-system-s390x -M s390-ccw-virtio -m 512 -nographic -kernel kernel.debian 
-initrd initrd.debian

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH 02/31] target/s390x: Implement EXECUTE via new TranslationBlock

2017-05-23 Thread Richard Henderson

On 05/23/2017 03:48 AM, Aurelien Jarno wrote:

On 2017-05-22 20:02, Richard Henderson wrote:

Previously, helper_ex would construct the insn and then implement
the insn via direct calls other helpers.  This was sufficient to
boot Linux but that is all.

It is easy enough to go the whole nine yards by stashing state for
EXECUTE within the cpu, and then relying on a new TB to be created
that properly and completely interprets the insn.

Signed-off-by: Richard Henderson 
---
  target/s390x/cpu.h |   4 +-
  target/s390x/helper.h  |   2 +-
  target/s390x/insn-data.def |   4 +-
  target/s390x/machine.c |  19 +++
  target/s390x/mem_helper.c  | 136 +++--
  target/s390x/translate.c   | 124 +
  6 files changed, 133 insertions(+), 156 deletions(-)


This looks good on the principle, and finally removes a big hack. That
said it prevent my test system to boot. I haven't investigated why yet.


Hmm.  I've not got a complete environment -- merely booting a kernel up to the 
point it fails to find a rootfs.  Which did find several problems with my first 
attempts at this, but wouldn't have exercised paging.  I'll try again to get a 
full install working...


I wonder if I needed to adjust s390_cpu_handle_mmu_fault (and its myriad 
subroutines) to handle setting ILEN correctly.


There might be a simpler fix though.  Currently I advance the PC and remember 
the ilen of the EX(RL).  Maybe better to *not* advance the PC so as to have the 
original EX(RL) right there for ILEN_LATER and ILEN_LATER_INC to operate on.


Something like this, as a delta patch.


r~


diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 67c85f0..5773f92 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -2206,8 +2206,10 @@ static ExitStatus op_ex(DisasContext *s, DisasOps *o)
 tcg_temp_free_i64(v1);
 }

-/* End the TB; a new TB will be created for modified insn.  */
-return EXIT_PC_STALE;
+/* End the TB; a new TB will be created for modified insn.
+   Note that the modified insn runs with this same PC.  */
+update_psw_addr(s);
+return EXIT_PC_UPDATED;
 }

 static ExitStatus op_fieb(DisasContext *s, DisasOps *o)
@@ -5189,14 +5191,10 @@ static const DisasInsn *extract_insn
 insn = s->ex_value & 0xull;
 ilen = s->ex_value & 0xff;
 op = insn >> 56;
-s->ilen = ilen;
-s->next_pc = s->pc;
 } else {
 insn = ld_code2(env, pc);
 op = (insn >> 8) & 0xff;
 ilen = get_ilen(op);
-s->ilen = ilen;
-s->next_pc = s->pc + ilen;

 switch (ilen) {
 case 2:
@@ -5212,6 +5210,8 @@ static const DisasInsn *extract_insn
 g_assert_not_reached();
 }
 }
+s->next_pc = s->pc + ilen;
+s->ilen = ilen;

 /* We can't actually determine the insn format until we've looked up
the full insn opcode.  Which we can't do without locating the
@@ -5470,17 +5470,14 @@ void gen_intermediate_code

 /* If we reach a page boundary, are single stepping,
or exhaust instruction count, stop generation.  */
-if (status == NO_EXIT) {
-if (unlikely(dc.ex_value)) {
-/* The PC on entry is already advanced.  */
-status = EXIT_PC_UPDATED;
-} else if (dc.pc >= next_page_start
-   || tcg_op_buf_full()
-   || num_insns >= max_insns
-   || singlestep
-   || cs->singlestep_enabled) {
-status = EXIT_PC_STALE;
-}
+if (status == NO_EXIT
+&& (dc.pc >= next_page_start
+|| tcg_op_buf_full()
+|| num_insns >= max_insns
+|| singlestep
+|| cs->singlestep_enabled
+|| dc.ex_value)) {
+status = EXIT_PC_STALE;
 }
 } while (status == NO_EXIT);




Re: [Qemu-devel] [PATCH 1/1] Improve Cocoa modifier key handling

2017-05-23 Thread Ian McKellar via Qemu-devel
On Tue, May 23, 2017 at 3:03 AM Gerd Hoffmann  wrote:

> Sounds like this happens in case there is a modifier state change
> without linked key event, such as state change while qemu did not have
> the keyboard focus.  Nice that macos sends notifications in that case.
>

Yeah, I guess it makes sense to send an event to update modifier state when
keyboard focus returns.

>
> I'm wondering whenever we should just use modifierFlags all the time.
>

Probably. My initial patch tried to be minimally intrusive but I can try
reworking the NSEventTypeFlagsChanged handling to compare the new
modifierFlags to the last we've seen and synthesize the appropriate key
down/up events.

Ian


Re: [Qemu-devel] [PATCH v2 3/4] 9pfs: local: simplify file opening

2017-05-23 Thread Eric Blake
On 05/23/2017 09:32 AM, Greg Kurz wrote:
> The logic to open a path currently sits between local_open_nofollow() and
> the relative_openat_nofollow() helper, which has no other user.
> 
> For the sake of clarity, this patch moves all the code of the helper into
> its unique caller. While here we also:
> - drop the code to skip leading "/" because the backend isn't supposed to
>   pass anything but relative paths without consecutive slashes. The assert()
>   is kept because we really don't want a buggy backend to pass   an absolute

odd spacing

>   path to openat().
> - use strchrnul() to get a simpler code. This is ok since virtfs if for

s/if/is/

>   linux+glibc hosts only.
> - don't dup() the initial directory and add an assert() to ensure we don't
>   return the global mountfd to the caller. BTW, this would mean that the
>   caller passed an empty path, which isn't supposed to happen either.
> 
> Signed-off-by: Greg Kurz 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 2/4] 9pfs: local: resolve special directories in paths

2017-05-23 Thread Eric Blake
On 05/23/2017 09:32 AM, Greg Kurz wrote:
> When using the mapped-file security mode, the creds of a path /foo/bar
> are stored in the /foo/.virtfs_metadata/bar file. This is okay for all
> paths unless they end with '.' or '..', because we cannot create the
> corresponding file in the metadata directory.
> 
> This patch ensures that '.' and '..' are resolved in all paths.
> 
> The core code only passes path elements (no '/') to the backend, with
> the notable exception of the '/' path, which refers to the virtfs root.
> This patch preserves the current behavior of converting it to '.' so
> that it can be passed to "*at()" syscalls ('/' would mean the host root).
> 
> Signed-off-by: Greg Kurz 
> ---
> +} else {
> +char *tmp = g_path_get_dirname(dir_path->data);
> +/* Symbolic links are resolved by the client. We can assume
> + * that ".." relative to "foo/bar" is equivalent to "foo"
> + */

Thanks for tweaking this since v1.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 0/2] docs: Minor updates around ide-drive

2017-05-23 Thread John Snow


On 05/23/2017 10:56 AM, Markus Armbruster wrote:
> Markus Armbruster  writes:
> 
>> John Snow  writes:
>>
>>> On 05/09/2017 05:41 AM, Markus Armbruster wrote:
 Markus Armbruster (2):
   docs qemu-doc: Avoid ide-drive, it's deprecated
   docs/qdev-device-use.txt: update section Default Devices

  docs/bootindex.txt   |  2 +-
  docs/qdev-device-use.txt | 13 +++--
  qemu-options.hx  |  2 +-
  3 files changed, 9 insertions(+), 8 deletions(-)

>>>
>>> Shame on me for not noticing, but thank you.
>>>
>>> Acked-by: John Snow 
>>
>> Thanks!
>>
>> The canonical path is through your tree, but I could stick it into a
>> miscellaneous pull request if you like.
> 
> Nevermind, qemu-trivial picked it up.
> 

That was my assumption since it went to -trivial :)
Glad my inaction helped sort it out.



Re: [Qemu-devel] [PATCH 4/4] qemu-iotests: Block migration test

2017-05-23 Thread Eric Blake
On 05/23/2017 09:01 AM, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/183 | 121 
> +
>  tests/qemu-iotests/183.out |  48 ++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 170 insertions(+)
>  create mode 100755 tests/qemu-iotests/183
>  create mode 100644 tests/qemu-iotests/183.out
> 

Does this test gracefully skip when coupled with the efforts for a
configure option to disable block migration?
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04398.html

> +echo
> +echo === Do block migration to destination ===
> +echo
> +
> +_send_qemu_cmd $src "migrate -b unix:${MIG_SOCKET}" "(qemu)"

I don't see any earlier checks, so it looks like you'd choke here if -b
is not compiled in.

Should we also check the use of -i?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


  1   2   3   4   >