Re: [Qemu-devel] [PATCH v2 00/13] vvfat: misc fixes for read-only mode

2017-07-05 Thread Hervé Poussineau

Hi,

Thanks for taking this patch series.

However, I already have in my repository the v3 patch series, whose changelog 
is:

Changes v2->v3:
- added patches 5, 12, 16
- fixed warning (unused variable) (patch 11)
- added #defines for constants for the deleted byte, following Philippe's remarks (patch 14)
- added #define and explanations for the OEM name, following Philippe's remarks (patch 15)

Changes v1->v2:
- small changes following Kevin's remarks (patches 3, 5, 6)
- use g_utf8_* functions instead of ad-hoc code (patches 8 and 9)
- fix a bug with filenames starting with a dot (patch 9)

Hervé Poussineau (16):
  vvfat: fix qemu-img map and qemu-img convert
  vvfat: replace tabs by 8 spaces
  vvfat: fix typos
  vvfat: rename useless enumeration values
  vvfat: add constants for special values of name[0]
  vvfat: introduce offset_to_bootsector, offset_to_fat and
offset_to_root_dir
  vvfat: fix field names in FAT12/FAT16 and FAT32 boot sectors
  vvfat: always create . and .. entries at first and in that order
  vvfat: correctly create long names for non-ASCII filenames
  vvfat: correctly create base short names for non-ASCII filenames
  vvfat: correctly generate numeric-tail of short file names
  vvfat: correctly parse non-ASCII short and long file names
  vvfat: limit number of entries in root directory in FAT12/FAT16
  vvfat: handle KANJI lead byte 0xe5
  vvfat: change OEM name to 'MSWIN4.1'
  vvfat: initialize memory after allocating it

Should I rebase on top of your branch, or should I send the v3 as is?

It fixes the last random errors I had in Win9x Scandisk (uninitialized memory).

Regards,

Hervé

On 03/07/2017 at 18:50, Kevin Wolf wrote:

On 22.05.2017 at 23:11, Hervé Poussineau wrote:

Hi,

This patchset fixes some of the issues I encountered when trying to use vvfat,
and fixes bug #1599539: https://bugs.launchpad.net/qemu/+bug/1599539

Patch 1 fixes a crash when using 'qemu-img convert'.
Patches 2 to 6 are code cleanup. No functional changes.
Patches 7 to 13 fix problems detected by disk checking utilities in read-only mode.

With these patches, vvfat creates valid FAT volumes and can be used with QEMU 
disk utilities.

Read-write mode is still buggy after this patchset, but at least, I was not
able to crash QEMU anymore.

Note that patch 2 doesn't pass checkpatch.pl, as it changes indentation only.


Thanks, fixed the build error in patch 9 (yet unused variables) and
applied to the block branch.

There were a few more minor comments for this series, but it has been on
the list for long enough and I figured that they can be addressed on
top.

Kevin






Re: [Qemu-devel] [Qemu-arm] [PATCH] target-arm: v7M: ignore writes to CONTROL.SPSEL from Thread mode

2017-07-05 Thread Philippe Mathieu-Daudé

Hi Peter,

On 06/30/2017 08:06 AM, Peter Maydell wrote:

For v7M, writes to the CONTROL register are only permitted for
privileged code. However even if the code is privileged, the
write must not affect the SPSEL bit in the CONTROL register
if the CPU is in Thread mode (as documented in the pseudocode
for the MSR instruction). Implement this, instead of permitting
SPSEL to be written in all cases.

This was causing mbed applications not to run, because the
RTX RTOS they use relies on this behaviour.

Signed-off-by: Peter Maydell 
---
  target/arm/helper.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2594faa..4ed32c5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8768,9 +8768,16 @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, 
uint32_t val)
  }
  break;
  case 20: /* CONTROL */
-switch_v7m_sp(env, (val & R_V7M_CONTROL_SPSEL_MASK) != 0);
-env->v7m.control = val & (R_V7M_CONTROL_SPSEL_MASK |
-  R_V7M_CONTROL_NPRIV_MASK);
+/* Writing to the SPSEL bit only has an effect if we are in
+ * thread mode; other bits can be updated by any privileged code.
+ * switch_v7m_sp() deals with updating the SPSEL bit in
+ * env->v7m.control, so we only need update the others.
+ */


I've been thinking about adding helper functions like v7m_is_privileged(),
v7m_is_thread_mode() or !v7m_exception_pending() to ease code readability,
in the spirit of armv7m_nvic_can_take_pending_exception() or
is_singlestepping().

Not much inspired yet :(
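
As a strawman, here is a minimal sketch of what one of those helpers could
look like, derived only from the check used in this patch
(env->v7m.exception == 0 meaning Thread mode); the name is just a
placeholder, not a proposal:

static inline bool v7m_is_thread_mode(CPUARMState *env)
{
    /* No exception is being handled, so the CPU is in Thread mode. */
    return env->v7m.exception == 0;
}

The hunk below would then read
"if (v7m_is_thread_mode(env)) { switch_v7m_sp(...); }".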


+if (env->v7m.exception == 0) {
+switch_v7m_sp(env, (val & R_V7M_CONTROL_SPSEL_MASK) != 0);
+}
+env->v7m.control &= ~R_V7M_CONTROL_NPRIV_MASK;
+env->v7m.control |= val & R_V7M_CONTROL_NPRIV_MASK;
  break;
  default:
  qemu_log_mask(LOG_GUEST_ERROR, "Attempt to write unknown special"



Reviewed-by: Philippe Mathieu-Daudé 

Regards,

Phil.



Re: [Qemu-devel] [PATCH 1/3] include/hw/boards.h: Document memory_region_allocate_system_memory()

2017-07-05 Thread Philippe Mathieu-Daudé

Hi Peter, Paolo,

On 07/04/2017 02:02 PM, Peter Maydell wrote:

Add a documentation comment for memory_region_allocate_system_memory().

In particular, the reason for this function's existence and the
requirement on board code to call it exactly once are non-obvious.

Signed-off-by: Peter Maydell 
---
  include/hw/boards.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 76ce021..1bc5389 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -9,6 +9,34 @@
  #include "qom/object.h"
  #include "qom/cpu.h"
  
+/**

+ * memory_region_allocate_system_memory - Allocate a board's main memory
+ * @mr: the #MemoryRegion to be initialized
+ * @owner: the object that tracks the region's reference count
+ * @name: name of the memory region
+ * @ram_size: size of the region in bytes
+ *
+ * This function allocates the main memory for a board model, and
+ * initializes @mr appropriately. It also arranges for the memory
+ * to be migrated (by calling vmstate_register_ram_global()).
+ *
+ * Memory allocated via this function will be backed with the memory
+ * backend the user provided using -mem-path if appropriate; this
+ * is typically used to cause host huge pages to be used.
+ * This function should therefore be called by a board exactly once,


Using memory-backend-file objects, one can use different mem-paths.

Maybe by removing the global mem_path used by vl.c for "main memory"
(which is effectively an unnamed memory-backend-file), this "exactly once"
case can be avoided.



+ * for the primary or largest RAM area it implements.
+ *
+ * For boards where the major RAM is split into two parts in the memory
+ * map, you can deal with this by calling 
memory_region_allocate_system_memory()
+ * once to get a MemoryRegion with enough RAM for both parts, and then
+ * creating alias MemoryRegions via memory_region_init_alias() which
+ * alias into different parts of the RAM MemoryRegion and can be mapped
+ * into the memory map in the appropriate places.
+ *
+ * Smaller pieces of memory (display RAM, static RAMs, etc) don't need
+ * to be backed via the -mem-path memory backend and can simply
+ * be created via memory_region_init_ram().
+ */
  void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
const char *name,
uint64_t ram_size);
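
As an aside, here is a minimal sketch of the split-RAM pattern described in
the comment above; the board name, region names, sizes and addresses are
made up purely for illustration:

static void example_board_init_ram(MemoryRegion *sysmem, Object *owner)
{
    MemoryRegion *ram = g_new(MemoryRegion, 1);
    MemoryRegion *lo = g_new(MemoryRegion, 1);
    MemoryRegion *hi = g_new(MemoryRegion, 1);

    /* Allocate the whole 512MB once, so -mem-path backing applies. */
    memory_region_allocate_system_memory(ram, owner, "example.ram",
                                         512 * 1024 * 1024);

    /* The first 256MB appears at address 0... */
    memory_region_init_alias(lo, owner, "example.ram-lo", ram,
                             0, 256 * 1024 * 1024);
    memory_region_add_subregion(sysmem, 0x00000000, lo);

    /* ...and the second 256MB at 0x80000000. */
    memory_region_init_alias(hi, owner, "example.ram-hi", ram,
                             256 * 1024 * 1024, 256 * 1024 * 1024);
    memory_region_add_subregion(sysmem, 0x80000000, hi);
}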





Re: [Qemu-devel] [RFC v3 3/3] char-socket: Report TCP socket waiting as information

2017-07-05 Thread Thomas Huth
On 05.07.2017 19:36, Alistair Francis wrote:
> When QEMU is waiting for a TCP socket connection it reports that message as
> an error. This isn't an error, it is just information, so let's change the
> report to use info_report() instead.
> 
> Signed-off-by: Alistair Francis 
> ---
> 
>  chardev/char-socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index ccc499cfa1..a050a686ea 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -765,8 +765,8 @@ static int tcp_chr_wait_connected(Chardev *chr, Error 
> **errp)
>   * in TLS and telnet cases, only wait for an accepted socket */
>  while (!s->ioc) {
>  if (s->is_listen) {
> -error_report("QEMU waiting for connection on: %s",
> - chr->filename);
> +info_report("QEMU waiting for connection on: %s",
> +chr->filename);
>  qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), true, NULL);
>  tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
>  qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, 
> NULL);
> 

Reviewed-by: Thomas Huth 

And in case you also want to add some warn_reports, I suggest to do a

grep -r "error_report.*[Ww]arning:" *

in the sources - there seem to be quite a lot of error_reports that are
rather a warning instead.
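
For illustration, one such conversion found by that grep might look like
this (a hypothetical call site, not one from the tree, and assuming the
new warn_report() adds its own prefix):

-    error_report("Warning: device %s is deprecated", name);
+    warn_report("device %s is deprecated", name);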

 Thomas





Re: [Qemu-devel] [RFC v3 2/3] qemu-error: Implement a more generic error reporting

2017-07-05 Thread Thomas Huth
On 05.07.2017 19:36, Alistair Francis wrote:
> This patch converts the existing error_vreport() function into a generic
> qmesg_vreport() function that takes an enum describing the

s/qmesg/qmsg/

> information to be reported.
> 
> As part of this change a new qmesg_report() function is added as well with the

s/qmesg/qmsg/

> same capability.
> 
> To maintain full compatibility the original error_report() function is
> maintained and no changes to the way errors are printed have been made.
> To improve access to the new informaiton and warning options wrapper functions

s/informaiton/information/

> similar to error_report() have been added for warnings and information
> printing.
> 
> Signed-off-by: Alistair Francis 
> ---
> RFC V3:
>  - Change the function and enum names to be more descriptive
>  - Add wrapper functions for *_report() and *_vreport()
> 
>  include/qemu/error-report.h | 16 +
>  scripts/checkpatch.pl   |  8 -
>  util/qemu-error.c   | 80 
> +++--
>  3 files changed, 100 insertions(+), 4 deletions(-)
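
(For readers without the patch at hand, the new wrappers presumably look
roughly like the sketch below; the enum constant and the argument order
are guesses on my side, only the qmsg_vreport() name comes from the
commit message.)

void warn_report(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    qmsg_vreport(REPORT_TYPE_WARNING, fmt, ap);  /* enum name is a guess */
    va_end(ap);
}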

With the typos fixed:

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [RFC v3 1/3] util/qemu-error: Rename error_print_loc() to be more generic

2017-07-05 Thread Thomas Huth
On 05.07.2017 19:36, Alistair Francis wrote:
> Rename the error_print_loc() function in preparation for using it to
> print warnings as well.
> 
> Signed-off-by: Alistair Francis 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
> 
>  util/qemu-error.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/util/qemu-error.c b/util/qemu-error.c
> index b331f8f4a4..1c5e35ecdb 100644
> --- a/util/qemu-error.c
> +++ b/util/qemu-error.c
> @@ -146,7 +146,7 @@ const char *error_get_progname(void)
>  /*
>   * Print current location to current monitor if we have one, else to stderr.
>   */
> -static void error_print_loc(void)
> +static void print_loc(void)
>  {
>  const char *sep = "";
>  int i;
> @@ -197,7 +197,7 @@ void error_vreport(const char *fmt, va_list ap)
>  g_free(timestr);
>  }
>  
> -error_print_loc();
> +print_loc();
>  error_vprintf(fmt, ap);
>  error_printf("\n");
>  }

Reviewed-by: Thomas Huth 





Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-07-05 Thread Eric Blake
On 07/03/2017 05:14 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  In particular, bdrv_is_allocated() cares more
> about finding the largest run of allocated data from the guest
> perspective, whether or not that data is consecutive from the
> host perspective.  Therefore, doing subsequent refinements such
> as checking how much of the format-layer allocation also satisfies
> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
> case, it just costs extra CPU cycles during a single
> bdrv_is_allocated(), but in the worst case, it results in a smaller
> *pnum, and forces callers to iterate through more status probes when
> visiting the entire file for even more extra CPU cycles.
> 
> This patch only optimizes the block layer.  But subsequent patches
> will tweak the driver callback to be byte-based, and in the process,
> can also pass this hint through to the driver.
> 
> Signed-off-by: Eric Blake 
> 

> @@ -1810,12 +1817,13 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
>  }
>  }
> 
> -if (local_file && local_file != bs &&
> +if (!allocation && local_file && local_file != bs &&
>  (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>  (ret & BDRV_BLOCK_OFFSET_VALID)) {
>  int file_pnum;
> 
> -ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +ret2 = bdrv_co_get_block_status(local_file, true,
> +ret >> BDRV_SECTOR_BITS,
>  *pnum, &file_pnum, NULL);
>  if (ret2 >= 0) {
>  /* Ignore errors.  This is just providing extra information, it

Hmm. My initial thinking here was that if we already have a good primary
status, we want our secondary status (where we are probing bs->file for
whether we can add BDRV_BLOCK_ZERO) to be as fast as possible, so I
hard-coded the answer that favors is_allocated (I have to be careful
describing this, since v3 will switch from 'bool allocated=true' to
'bool mapping=false' to express that same request).  But it turns out
that, at least for file-posix.c (for that matter, for several protocol
drivers), it's a LOT faster to just blindly state that the entire file
is allocated and data than it is to lseek(SEEK_HOLE).  So favoring
allocation status instead of mapping status defeats the purpose, and
this should be s/true/allocation/ (which is always false at this point)
[or conversely s/false/mapping/, which is always true, in v3].
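
Concretely, with that substitution the recursive call above would become
(sketch only, using this version's naming):

    ret2 = bdrv_co_get_block_status(local_file, allocation,
                                    ret >> BDRV_SECTOR_BITS,
                                    *pnum, &file_pnum, NULL);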

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH v4 0/1] virtio-scsi-ccw: fix iotest 068 for s390x

2017-07-05 Thread QingFeng Hao



On 2017/7/5 at 23:15, Stefan Hajnoczi wrote:

On Tue, Jul 04, 2017 at 03:23:49PM +0200, QingFeng Hao wrote:

This commit fixes iotest 068 for s390x as s390x uses virtio-scsi-ccw.
It's based on commit c324fd0a39c by Stefan Hajnoczi.
Thanks!

Change history:
v4:
 Got Cornelia Huck's Reviewed-by and took her comment to change the
 commit message.

v3:
 Took Christian Borntraeger's and Cornelia Huck's comment to check
 if KVM is enabled in s390_assign_subch_ioeventfd instead of in
 kvm_s390_assign_subch_ioeventfd, as the former is the more general one.

v2:
 Remove Stefan from sign-off list and change the patch's commit message
 according to Christian Borntraeger's comment.

QingFeng Hao (1):
   virtio-scsi-ccw: use ioeventfd even when KVM is disabled

  hw/s390x/virtio-ccw.c | 2 +-
  target/s390x/cpu.h| 6 +-
  2 files changed, 6 insertions(+), 2 deletions(-)
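
(For context, the check described in the v3 changelog amounts to something
like the following in target/s390x/cpu.h; this is a sketch based on the
changelog, and the exact condition and signature may differ from the real
patch:)

static inline int s390_assign_subch_ioeventfd(EventNotifier *notifier,
                                              uint32_t sch_id, int vq,
                                              bool assign)
{
    if (!kvm_enabled()) {
        /* Without KVM there is nothing to (de)assign; report success. */
        return 0;
    }
    return kvm_s390_assign_subch_ioeventfd(notifier, sch_id, vq, assign);
}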

I didn't realize s390 also has this old check.  Thanks for fixing it!

Thanks Stefan!


Reviewed-by: Stefan Hajnoczi 


--
Regards
QingFeng Hao




[Qemu-devel] [PATCH v2.1 3/4] doc: add item for "-M enforce-config-section"

2017-07-05 Thread Peter Xu
It has never been documented, and now we have one more parameter (which
obsoletes this one). Document it properly.

Although enforce-config-section, when set, currently overrides the
"-global migration.send-configuration" parameter, that is not necessarily
a rule. Forbid that combined usage in the documentation.

Suggested-by: Eduardo Habkost 
Signed-off-by: Peter Xu 
---
v2.1:
- remove the "undefined behavior" sentence [Markus]

 qemu-options.hx | 8 
 1 file changed, 8 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 297bd8a..1ce7a37 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -85,6 +85,14 @@ Enables or disables NVDIMM support. The default is off.
 @item s390-squash-mcss=on|off
 Enables or disables squashing subchannels into the default css.
 The default is off.
+@item enforce-config-section=on|off
+If @option{enforce-config-section} is set to @var{on}, force migration
+code to send configuration section even if the machine-type sets the
+@option{migration.send-configuration} property to @var{off}.
+NOTE: this parameter is deprecated. Please use @option{-global}
+@option{migration.send-configuration}=@var{on|off} instead.
+@option{enforce-config-section} cannot be used together with
+@option{-global} @option{migration.send-configuration}.
 @end table
 ETEXI
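
For clarity, the two spellings being discussed are, roughly (the binary
name and machine type are only examples):

  # deprecated machine property
  qemu-system-x86_64 -M pc,enforce-config-section=on ...

  # preferred replacement
  qemu-system-x86_64 -global migration.send-configuration=on ...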
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH 3/4] doc: add item for "-M enforce-config-section"

2017-07-05 Thread Peter Xu
On Wed, Jul 05, 2017 at 05:31:22PM +0200, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> > (CCing Greg, the original author of the code that added the
> > enforce-config-section option)
> >
> > On Tue, Jul 04, 2017 at 10:06:54AM +0200, Markus Armbruster wrote:
> >> Peter Xu  writes:
> >> 
> >> > It's never documented, and now we have one more parameter for it (which
> >> > means this one can be obsolete in the future). Document it properly.
> >> >
> >> > Although now when enforce-config-section is set, it'll override the
> >> > other "-global" parameter, that is not necessarily a rule. Forbid that
> >> > usage in the document.
> >> >
> >> > Suggested-by: Eduardo Habkost 
> >> > Signed-off-by: Peter Xu 
> >> > ---
> >> >  qemu-options.hx | 8 
> >> >  1 file changed, 8 insertions(+)
> >> >
> >> > diff --git a/qemu-options.hx b/qemu-options.hx
> >> > index 297bd8a..927c51f 100644
> >> > --- a/qemu-options.hx
> >> > +++ b/qemu-options.hx
> >> > @@ -85,6 +85,14 @@ Enables or disables NVDIMM support. The default is 
> >> > off.
> >> >  @item s390-squash-mcss=on|off
> >> >  Enables or disables squashing subchannels into the default css.
> >> >  The default is off.
> >> > +@item enforce-config-section=on|off
> >> > +Decides whether we will send the configuration section when doing
> >> > +migration. By default, it is turned on. We can set this to off to
> >> > +explicitly disable it. Note: this parameter will be obsolete soon,
> >> > +please use "-global migration.send-configuration=on|off" instead.
> >> 
> >> Please say "... is deprecated, please use ...", to make it visible in
> >> "git-grep -i deprecat".
> >> 
> >> > +"enforce-config-section" cannot be used together with "-global
> >> > +migration.send-configuration". If it happens, the behavior is
> >> > +undefined.
> >> 
> >> Nasty.  Could we catch and reject such invalid usage?
> >
> > Actually, the machine option will override
> > migration.send-configuration=off, and I don't believe we will break that
> > rule.  Documenting that behavior (but warning that it is deprecated)
> > sounds easier than adding extra code to detect the conflicting options.
> >
> > We can simply replace "is undefined"  with "enforce-config-section will
> > override migration.send-configuration, but enforce-config-section is
> > deprecated".
> 
> Better than the scary "behavior is undefined".  But do we have to say
> anything at all?  If somebody gives both options with different values,
> the conflicting settings fight it out.  In a sane command line, the last
> one wins.  Ours isn't sane.  Do we really have to specify who wins?
> 
> > I don't think anybody is relying on that option. If nobody is using it,
> > we can remove its code soon if we make it trigger a warning.
> > Machine-type compatibility code is already using
> > migration.send-configuration instead.
> 
> What about
> 
>   @item enforce-config-section=on|off
>   Controls sending of the configuration section when doing migration
>   (default on).

Sorry for being misleading in the current patch. Its default should be
"off", not "on" (as Eduardo pointed out, we are talking about
"enforce-config-section" here, not "migration.send-configuration").
Actually it won't make much sense if someone specifies "off" here.

> Note: this parameter is deprecated, please use
>   "-global migration.send-configuration=on|off" instead.

This is a suggestion I would like to take. I'll remove the whole
"undefined" sentence and update my post. Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 07/11] target/sh4: Unify cpu_fregs into FREG

2017-07-05 Thread Philippe Mathieu-Daudé

On 07/05/2017 09:23 PM, Richard Henderson wrote:

We were treating FREG as an index and REG as a TCGv.
Making FREG return a TCGv is both less confusing and
a step toward cleaner banking of cpu_fregs.

Signed-off-by: Richard Henderson 


Reviewed-by: Philippe Mathieu-Daudé 


---
  target/sh4/translate.c | 123 +
  1 file changed, 52 insertions(+), 71 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 20e24d5..e4fd6f2 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -382,10 +382,11 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
  #define REG(x) ctx->gregs[x]
  #define ALTREG(x)  ctx->altregs[x]
  
-#define FREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))

+#define FREG(x) cpu_fregs[ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x)]
  #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
-#define XREG(x) (ctx->tbflags & FPSCR_FR ? XHACK(x) ^ 0x10 : XHACK(x))
-#define DREG(x) FREG(x) /* Assumes lsb of (x) is always 0 */
+#define XREG(x) FREG(XHACK(x))
+/* Assumes lsb of (x) is always 0 */
+#define DREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
  
  #define CHECK_NOT_DELAY_SLOT \

  if (ctx->envflags & DELAY_SLOT_MASK) {   \
@@ -1005,56 +1006,51 @@ static void _decode_opc(DisasContext * ctx)
CHECK_FPU_ENABLED
  if (ctx->tbflags & FPSCR_SZ) {
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, XREG(B7_4));
-   gen_store_fpr64(fp, XREG(B11_8));
+   gen_load_fpr64(fp, XHACK(B7_4));
+   gen_store_fpr64(fp, XHACK(B11_8));
tcg_temp_free_i64(fp);
} else {
-   tcg_gen_mov_i32(cpu_fregs[FREG(B11_8)], cpu_fregs[FREG(B7_4)]);
+   tcg_gen_mov_i32(FREG(B11_8), FREG(B7_4));
}
return;
  case 0xf00a: /* fmov {F,D,X}Rm,@Rn - FPSCR: Nothing */
CHECK_FPU_ENABLED
  if (ctx->tbflags & FPSCR_SZ) {
TCGv addr_hi = tcg_temp_new();
-   int fr = XREG(B7_4);
+   int fr = XHACK(B7_4);
tcg_gen_addi_i32(addr_hi, REG(B11_8), 4);
-tcg_gen_qemu_st_i32(cpu_fregs[fr], REG(B11_8),
-ctx->memidx, MO_TEUL);
-tcg_gen_qemu_st_i32(cpu_fregs[fr+1], addr_hi,
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_st_i32(FREG(fr), REG(B11_8), ctx->memidx, MO_TEUL);
+tcg_gen_qemu_st_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
tcg_temp_free(addr_hi);
} else {
-tcg_gen_qemu_st_i32(cpu_fregs[FREG(B7_4)], REG(B11_8),
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_st_i32(FREG(B7_4), REG(B11_8), ctx->memidx, MO_TEUL);
}
return;
  case 0xf008: /* fmov @Rm,{F,D,X}Rn - FPSCR: Nothing */
CHECK_FPU_ENABLED
  if (ctx->tbflags & FPSCR_SZ) {
TCGv addr_hi = tcg_temp_new();
-   int fr = XREG(B11_8);
+   int fr = XHACK(B11_8);
tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr], REG(B7_4), ctx->memidx, 
MO_TEUL);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr_hi, ctx->memidx, 
MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
tcg_temp_free(addr_hi);
} else {
-tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], REG(B7_4),
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
}
return;
  case 0xf009: /* fmov @Rm+,{F,D,X}Rn - FPSCR: Nothing */
CHECK_FPU_ENABLED
  if (ctx->tbflags & FPSCR_SZ) {
TCGv addr_hi = tcg_temp_new();
-   int fr = XREG(B11_8);
+   int fr = XHACK(B11_8);
tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr], REG(B7_4), ctx->memidx, 
MO_TEUL);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr_hi, ctx->memidx, 
MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 8);
tcg_temp_free(addr_hi);
} else {
-tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], REG(B7_4),
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 4);
}
return;
@@ -1063,13 +1059,12 @@ static void _decode_opc(DisasContext * ctx)
  TCGv addr = tcg_temp_new_i32();
  tcg_gen_subi_i32(addr, REG(B11_8), 4);
  if (ctx->tbflags & FPSCR_SZ) {
-   int fr = XREG(B7_4);
-

Re: [Qemu-devel] [PATCH 00/11] target/sh4 improvments

2017-07-05 Thread Laurent Vivier
On 06/07/2017 at 02:23, Richard Henderson wrote:
> This fixes two problems with atomic operations on sh4,
> including an attempt at supporting the user-space atomics
> technique used by most sh-linux-user binaries.

I tried some time ago to support the gUSA hack by decoding the
instructions[1] the application wants to be atomic, but it is not viable
as we would need to know all possible sequences the user can generate
(though we can guess they are always generated by glibc, and thus well
known).

Laurent

[1]
https://github.com/vivier/qemu/commit/25013c121661c6831f08c2140114c8de06cf48da



Re: [Qemu-devel] [PATCH 05/11] linux-user/sh4: Notice gUSA regions during signal delivery

2017-07-05 Thread Laurent Vivier
On 06/07/2017 at 02:23, Richard Henderson wrote:
> We translate gUSA regions atomically in a parallel context.
> But in a serial context a gUSA region may be interrupted.
> In that case, restart the region as the kernel would.
> 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/signal.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 3d18d1b..1e716a9 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -3471,6 +3471,23 @@ static abi_ulong get_sigframe(struct target_sigaction 
> *ka,
>  return (sp - frame_size) & -8ul;
>  }
>  
> +/* Notice when we're in the middle of a gUSA region and reset.
> +   Note that this will only occur for !parallel_cpus, as we will
> +   translate such sequences differently in a parallel context.  */
> +static void unwind_gusa(CPUSH4State *regs)
> +{
> +/* If the stack pointer is sufficiently negative... */
> +if ((regs->gregs[15] & 0xc0000000u) == 0xc0000000u) {
> +/* Reset the PC to before the gUSA region, as computed from
> +   R0 = region end, SP = -(region size), plus one more insn
> +   that actually sets SP to the region size.  */
> +regs->pc = regs->gregs[0] + regs->gregs[15] - 2;
> +
> +/* Reset the SP to the saved version in R1.  */
> +regs->gregs[15] = regs->gregs[1];
> +}
> +}
> +
>  static void setup_sigcontext(struct target_sigcontext *sc,
>   CPUSH4State *regs, unsigned long mask)
>  {
> @@ -3534,6 +3551,8 @@ static void setup_frame(int sig, struct 
> target_sigaction *ka,
>  abi_ulong frame_addr;
>  int i;
>  
> +unwind_gusa(regs);
> +
>  frame_addr = get_sigframe(ka, regs->gregs[15], sizeof(*frame));
>  trace_user_setup_frame(regs, frame_addr);
>  if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
> @@ -3583,6 +3602,8 @@ static void setup_rt_frame(int sig, struct 
> target_sigaction *ka,
>  abi_ulong frame_addr;
>  int i;
>  
> +unwind_gusa(regs);
> +
>  frame_addr = get_sigframe(ka, regs->gregs[15], sizeof(*frame));
>  trace_user_setup_rt_frame(regs, frame_addr);
>  if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
> 

Reviewed-by: Laurent Vivier 



[Qemu-devel] [PATCH 08/11] target/sh4: Pass DisasContext to fpr64 routines

2017-07-05 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index e4fd6f2..05657a9 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -359,12 +359,12 @@ static void gen_delayed_conditional_jump(DisasContext * 
ctx)
 gen_jump(ctx);
 }
 
-static inline void gen_load_fpr64(TCGv_i64 t, int reg)
+static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
 tcg_gen_concat_i32_i64(t, cpu_fregs[reg + 1], cpu_fregs[reg]);
 }
 
-static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
+static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
 tcg_gen_extr_i64_i32(cpu_fregs[reg + 1], cpu_fregs[reg], t);
 }
@@ -1006,8 +1006,8 @@ static void _decode_opc(DisasContext * ctx)
CHECK_FPU_ENABLED
 if (ctx->tbflags & FPSCR_SZ) {
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, XHACK(B7_4));
-   gen_store_fpr64(fp, XHACK(B11_8));
+   gen_load_fpr64(ctx, fp, XHACK(B7_4));
+   gen_store_fpr64(ctx, fp, XHACK(B11_8));
tcg_temp_free_i64(fp);
} else {
tcg_gen_mov_i32(FREG(B11_8), FREG(B7_4));
@@ -1116,8 +1116,8 @@ static void _decode_opc(DisasContext * ctx)
break; /* illegal instruction */
fp0 = tcg_temp_new_i64();
fp1 = tcg_temp_new_i64();
-   gen_load_fpr64(fp0, DREG(B11_8));
-   gen_load_fpr64(fp1, DREG(B7_4));
+   gen_load_fpr64(ctx, fp0, DREG(B11_8));
+   gen_load_fpr64(ctx, fp1, DREG(B7_4));
 switch (ctx->opcode & 0xf00f) {
 case 0xf000:   /* fadd Rm,Rn */
 gen_helper_fadd_DT(fp0, cpu_env, fp0, fp1);
@@ -1138,7 +1138,7 @@ static void _decode_opc(DisasContext * ctx)
 gen_helper_fcmp_gt_DT(cpu_env, fp0, fp1);
 return;
 }
-   gen_store_fpr64(fp0, DREG(B11_8));
+   gen_store_fpr64(ctx, fp0, DREG(B11_8));
 tcg_temp_free_i64(fp0);
 tcg_temp_free_i64(fp1);
} else {
@@ -1728,7 +1728,7 @@ static void _decode_opc(DisasContext * ctx)
break; /* illegal instruction */
fp = tcg_temp_new_i64();
 gen_helper_float_DT(fp, cpu_env, cpu_fpul);
-   gen_store_fpr64(fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, DREG(B11_8));
tcg_temp_free_i64(fp);
}
else {
@@ -1742,7 +1742,7 @@ static void _decode_opc(DisasContext * ctx)
if (ctx->opcode & 0x0100)
break; /* illegal instruction */
fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, DREG(B11_8));
 gen_helper_ftrc_DT(cpu_fpul, cpu_env, fp);
tcg_temp_free_i64(fp);
}
@@ -1762,9 +1762,9 @@ static void _decode_opc(DisasContext * ctx)
if (ctx->opcode & 0x0100)
break; /* illegal instruction */
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, DREG(B11_8));
gen_helper_fabs_DT(fp, fp);
-   gen_store_fpr64(fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, DREG(B11_8));
tcg_temp_free_i64(fp);
} else {
gen_helper_fabs_FT(FREG(B11_8), FREG(B11_8));
@@ -1776,9 +1776,9 @@ static void _decode_opc(DisasContext * ctx)
if (ctx->opcode & 0x0100)
break; /* illegal instruction */
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, DREG(B11_8));
 gen_helper_fsqrt_DT(fp, cpu_env, fp);
-   gen_store_fpr64(fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, DREG(B11_8));
tcg_temp_free_i64(fp);
} else {
 gen_helper_fsqrt_FT(FREG(B11_8), cpu_env, FREG(B11_8));
@@ -1804,7 +1804,7 @@ static void _decode_opc(DisasContext * ctx)
{
TCGv_i64 fp = tcg_temp_new_i64();
 gen_helper_fcnvsd_FT_DT(fp, cpu_env, cpu_fpul);
-   gen_store_fpr64(fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, DREG(B11_8));
tcg_temp_free_i64(fp);
}
return;
@@ -1812,7 +1812,7 @@ static void _decode_opc(DisasContext * ctx)
CHECK_FPU_ENABLED
{
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, DREG(B11_8));
 gen_helper_fcnvds_DT_FT(cpu_fpul, cpu_env, fp);
tcg_temp_free_i64(fp);
}
-- 
2.9.4




[Qemu-devel] [PATCH 05/11] linux-user/sh4: Notice gUSA regions during signal delivery

2017-07-05 Thread Richard Henderson
We translate gUSA regions atomically in a parallel context.
But in a serial context a gUSA region may be interrupted.
In that case, restart the region as the kernel would.

Signed-off-by: Richard Henderson 
---
 linux-user/signal.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 3d18d1b..1e716a9 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -3471,6 +3471,23 @@ static abi_ulong get_sigframe(struct target_sigaction 
*ka,
 return (sp - frame_size) & -8ul;
 }
 
+/* Notice when we're in the middle of a gUSA region and reset.
+   Note that this will only occur for !parallel_cpus, as we will
+   translate such sequences differently in a parallel context.  */
+static void unwind_gusa(CPUSH4State *regs)
+{
+/* If the stack pointer is sufficiently negative... */
+if ((regs->gregs[15] & 0xc0000000u) == 0xc0000000u) {
+/* Reset the PC to before the gUSA region, as computed from
+   R0 = region end, SP = -(region size), plus one more insn
+   that actually sets SP to the region size.  */
+regs->pc = regs->gregs[0] + regs->gregs[15] - 2;
+
+/* Reset the SP to the saved version in R1.  */
+regs->gregs[15] = regs->gregs[1];
+}
+}
+
 static void setup_sigcontext(struct target_sigcontext *sc,
  CPUSH4State *regs, unsigned long mask)
 {
@@ -3534,6 +3551,8 @@ static void setup_frame(int sig, struct target_sigaction 
*ka,
 abi_ulong frame_addr;
 int i;
 
+unwind_gusa(regs);
+
 frame_addr = get_sigframe(ka, regs->gregs[15], sizeof(*frame));
 trace_user_setup_frame(regs, frame_addr);
 if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
@@ -3583,6 +3602,8 @@ static void setup_rt_frame(int sig, struct 
target_sigaction *ka,
 abi_ulong frame_addr;
 int i;
 
+unwind_gusa(regs);
+
 frame_addr = get_sigframe(ka, regs->gregs[15], sizeof(*frame));
 trace_user_setup_rt_frame(regs, frame_addr);
 if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
-- 
2.9.4




[Qemu-devel] [PATCH 09/11] target/sh4: Avoid a potential translator crash for malformed FPR64

2017-07-05 Thread Richard Henderson
Produce valid, but nonsensical, code given an odd register index.

Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 05657a9..7f015c3 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -359,14 +359,18 @@ static void gen_delayed_conditional_jump(DisasContext * 
ctx)
 gen_jump(ctx);
 }
 
-static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
+/* Assumes lsb of (x) is always 0.  */
+/* ??? Should the translator signal an invalid opc?
+   In the meantime, using OR instead of PLUS to form the index of the
+   low register means we can't crash the translator for REG==15.  */
+static void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
-tcg_gen_concat_i32_i64(t, cpu_fregs[reg + 1], cpu_fregs[reg]);
+tcg_gen_concat_i32_i64(t, cpu_fregs[reg | 1], cpu_fregs[reg]);
 }
 
-static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
+static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
-tcg_gen_extr_i64_i32(cpu_fregs[reg + 1], cpu_fregs[reg], t);
+tcg_gen_extr_i64_i32(cpu_fregs[reg | 1], cpu_fregs[reg], t);
 }
 
 #define B3_0 (ctx->opcode & 0xf)
@@ -385,7 +389,6 @@ static inline void gen_store_fpr64(DisasContext *ctx, 
TCGv_i64 t, int reg)
 #define FREG(x) cpu_fregs[ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x)]
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
 #define XREG(x) FREG(XHACK(x))
-/* Assumes lsb of (x) is always 0 */
 #define DREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
 
 #define CHECK_NOT_DELAY_SLOT \
-- 
2.9.4




[Qemu-devel] [PATCH 11/11] target/sh4: Eliminate DREG macro

2017-07-05 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index a45d0ee..7e3de74 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -398,7 +398,6 @@ static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, 
int reg)
 #define FREG(x)    ctx->fregs[x]
 #define XHACK(x)   ((((x) & 1 ) << 4) | ((x) & 0xe))
 #define XREG(x)    FREG(XHACK(x))
-#define DREG(x)    (x)
 
 #define CHECK_NOT_DELAY_SLOT \
 if (ctx->envflags & DELAY_SLOT_MASK) {   \
@@ -1128,8 +1127,8 @@ static void _decode_opc(DisasContext * ctx)
break; /* illegal instruction */
fp0 = tcg_temp_new_i64();
fp1 = tcg_temp_new_i64();
-   gen_load_fpr64(ctx, fp0, DREG(B11_8));
-   gen_load_fpr64(ctx, fp1, DREG(B7_4));
+   gen_load_fpr64(ctx, fp0, B11_8);
+   gen_load_fpr64(ctx, fp1, B7_4);
 switch (ctx->opcode & 0xf00f) {
 case 0xf000:   /* fadd Rm,Rn */
 gen_helper_fadd_DT(fp0, cpu_env, fp0, fp1);
@@ -1150,7 +1149,7 @@ static void _decode_opc(DisasContext * ctx)
 gen_helper_fcmp_gt_DT(cpu_env, fp0, fp1);
 return;
 }
-   gen_store_fpr64(ctx, fp0, DREG(B11_8));
+   gen_store_fpr64(ctx, fp0, B11_8);
 tcg_temp_free_i64(fp0);
 tcg_temp_free_i64(fp1);
} else {
@@ -1740,7 +1739,7 @@ static void _decode_opc(DisasContext * ctx)
break; /* illegal instruction */
fp = tcg_temp_new_i64();
 gen_helper_float_DT(fp, cpu_env, cpu_fpul);
-   gen_store_fpr64(ctx, fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, B11_8);
tcg_temp_free_i64(fp);
}
else {
@@ -1754,7 +1753,7 @@ static void _decode_opc(DisasContext * ctx)
if (ctx->opcode & 0x0100)
break; /* illegal instruction */
fp = tcg_temp_new_i64();
-   gen_load_fpr64(ctx, fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, B11_8);
 gen_helper_ftrc_DT(cpu_fpul, cpu_env, fp);
tcg_temp_free_i64(fp);
}
@@ -1774,9 +1773,9 @@ static void _decode_opc(DisasContext * ctx)
if (ctx->opcode & 0x0100)
break; /* illegal instruction */
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(ctx, fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, B11_8);
gen_helper_fabs_DT(fp, fp);
-   gen_store_fpr64(ctx, fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, B11_8);
tcg_temp_free_i64(fp);
} else {
gen_helper_fabs_FT(FREG(B11_8), FREG(B11_8));
@@ -1788,9 +1787,9 @@ static void _decode_opc(DisasContext * ctx)
if (ctx->opcode & 0x0100)
break; /* illegal instruction */
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(ctx, fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, B11_8);
 gen_helper_fsqrt_DT(fp, cpu_env, fp);
-   gen_store_fpr64(ctx, fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, B11_8);
tcg_temp_free_i64(fp);
} else {
 gen_helper_fsqrt_FT(FREG(B11_8), cpu_env, FREG(B11_8));
@@ -1816,7 +1815,7 @@ static void _decode_opc(DisasContext * ctx)
{
TCGv_i64 fp = tcg_temp_new_i64();
 gen_helper_fcnvsd_FT_DT(fp, cpu_env, cpu_fpul);
-   gen_store_fpr64(ctx, fp, DREG(B11_8));
+   gen_store_fpr64(ctx, fp, B11_8);
tcg_temp_free_i64(fp);
}
return;
@@ -1824,7 +1823,7 @@ static void _decode_opc(DisasContext * ctx)
CHECK_FPU_ENABLED
{
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(ctx, fp, DREG(B11_8));
+   gen_load_fpr64(ctx, fp, B11_8);
 gen_helper_fcnvds_DT_FT(cpu_fpul, cpu_env, fp);
tcg_temp_free_i64(fp);
}
-- 
2.9.4




[Qemu-devel] [PATCH 10/11] target/sh4: Hoist fp bank selection

2017-07-05 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 7f015c3..a45d0ee 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -37,6 +37,7 @@ typedef struct DisasContext {
 struct TranslationBlock *tb;
 TCGv *gregs; /* active bank */
 TCGv *altregs;   /* inactive, alternate, bank */
+TCGv *fregs; /* active bank */
 target_ulong pc;
 uint16_t opcode;
 uint32_t tbflags;/* should stay unmodified during the TB translation */
@@ -72,7 +73,7 @@ static TCGv cpu_pc, cpu_ssr, cpu_spc, cpu_gbr;
 static TCGv cpu_vbr, cpu_sgr, cpu_dbr, cpu_mach, cpu_macl;
 static TCGv cpu_pr, cpu_fpscr, cpu_fpul;
 static TCGv cpu_lock_addr, cpu_lock_value;
-static TCGv cpu_fregs[32];
+static TCGv cpu_fregs[2][16];
 
 /* internal register indexes */
 static TCGv cpu_flags, cpu_delayed_pc, cpu_delayed_cond;
@@ -176,10 +177,18 @@ void sh4_translate_init(void)
offsetof(CPUSH4State, lock_value),
 "_lock_value_");
 
-for (i = 0; i < 32; i++)
-cpu_fregs[i] = tcg_global_mem_new_i32(cpu_env,
-  offsetof(CPUSH4State, fregs[i]),
-  fregnames[i]);
+for (i = 0; i < 16; i++) {
+cpu_fregs[0][i]
+= tcg_global_mem_new_i32(cpu_env,
+ offsetof(CPUSH4State, fregs[i]),
+ fregnames[i]);
+}
+for (i = 16; i < 32; i++) {
+cpu_fregs[1][i - 16]
+= tcg_global_mem_new_i32(cpu_env,
+ offsetof(CPUSH4State, fregs[i]),
+ fregnames[i]);
+}
 
 done_init = 1;
 }
@@ -365,12 +374,12 @@ static void gen_delayed_conditional_jump(DisasContext * 
ctx)
low register means we can't crash the translator for REG==15.  */
 static void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
-tcg_gen_concat_i32_i64(t, cpu_fregs[reg | 1], cpu_fregs[reg]);
+tcg_gen_concat_i32_i64(t, ctx->fregs[reg | 1], ctx->fregs[reg]);
 }
 
 static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
-tcg_gen_extr_i64_i32(cpu_fregs[reg | 1], cpu_fregs[reg], t);
+tcg_gen_extr_i64_i32(ctx->fregs[reg | 1], ctx->fregs[reg], t);
 }
 
 #define B3_0 (ctx->opcode & 0xf)
@@ -386,10 +395,10 @@ static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 
t, int reg)
 #define REG(x) ctx->gregs[x]
 #define ALTREG(x)  ctx->altregs[x]
 
-#define FREG(x) cpu_fregs[ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x)]
-#define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
-#define XREG(x) FREG(XHACK(x))
-#define DREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
+#define FREG(x)    ctx->fregs[x]
+#define XHACK(x)   ((((x) & 1 ) << 4) | ((x) & 0xe))
+#define XREG(x)    FREG(XHACK(x))
+#define DREG(x)    (x)
 
 #define CHECK_NOT_DELAY_SLOT \
 if (ctx->envflags & DELAY_SLOT_MASK) {   \
@@ -2230,6 +2239,9 @@ void gen_intermediate_code(CPUSH4State * env, struct 
TranslationBlock *tb)
 ctx.gregs = cpu_gregs[bank];
 ctx.altregs = cpu_gregs[bank ^ 1];
 
+bank = (ctx.tbflags & FPSCR_FR) != 0;
+ctx.fregs = cpu_fregs[bank];
+
 max_insns = tb->cflags & CF_COUNT_MASK;
 if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
-- 
2.9.4




[Qemu-devel] [PATCH 02/11] target/sh4: Consolidate end-of-TB tests

2017-07-05 Thread Richard Henderson
We can fold 3 different tests within the decode loop
into a more accurate computation of max_insns to start.

Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 6b247fa..e1661e9 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1856,7 +1856,6 @@ void gen_intermediate_code(CPUSH4State * env, struct 
TranslationBlock *tb)
 ctx.features = env->features;
 ctx.has_movcal = (ctx.tbflags & TB_FLAG_PENDING_MOVCA);
 
-num_insns = 0;
 max_insns = tb->cflags & CF_COUNT_MASK;
 if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
@@ -1864,9 +1863,23 @@ void gen_intermediate_code(CPUSH4State * env, struct 
TranslationBlock *tb)
 if (max_insns > TCG_MAX_INSNS) {
 max_insns = TCG_MAX_INSNS;
 }
+/* Since the ISA is fixed-width, we can bound by the number
+   of instructions remaining on the page.  */
+num_insns = (TARGET_PAGE_SIZE - (ctx.pc & (TARGET_PAGE_SIZE - 1))) / 2;
+if (max_insns > num_insns) {
+max_insns = num_insns;
+}
+/* Single stepping means just that.  */
+if (ctx.singlestep_enabled || singlestep) {
+max_insns = 1;
+}
 
 gen_tb_start(tb);
-while (ctx.bstate == BS_NONE && !tcg_op_buf_full()) {
+num_insns = 0;
+
+while (ctx.bstate == BS_NONE
+   && num_insns < max_insns
+   && !tcg_op_buf_full()) {
 tcg_gen_insn_start(ctx.pc, ctx.envflags);
 num_insns++;
 
@@ -1890,18 +1903,10 @@ void gen_intermediate_code(CPUSH4State * env, struct 
TranslationBlock *tb)
 ctx.opcode = cpu_lduw_code(env, ctx.pc);
decode_opc();
ctx.pc += 2;
-   if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
-   break;
-if (cs->singlestep_enabled) {
-   break;
-}
-if (num_insns >= max_insns)
-break;
-if (singlestep)
-break;
 }
-if (tb->cflags & CF_LAST_IO)
+if (tb->cflags & CF_LAST_IO) {
 gen_io_end();
+}
 if (cs->singlestep_enabled) {
 gen_save_cpu_state(, true);
 gen_helper_debug(cpu_env);
-- 
2.9.4




[Qemu-devel] [PATCH 07/11] target/sh4: Unify cpu_fregs into FREG

2017-07-05 Thread Richard Henderson
We were treating FREG as an index and REG as a TCGv.
Making FREG return a TCGv is both less confusing and
a step toward cleaner banking of cpu_fregs.

Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 123 +
 1 file changed, 52 insertions(+), 71 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 20e24d5..e4fd6f2 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -382,10 +382,11 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define REG(x) ctx->gregs[x]
 #define ALTREG(x)  ctx->altregs[x]
 
-#define FREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
+#define FREG(x) cpu_fregs[ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x)]
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
-#define XREG(x) (ctx->tbflags & FPSCR_FR ? XHACK(x) ^ 0x10 : XHACK(x))
-#define DREG(x) FREG(x) /* Assumes lsb of (x) is always 0 */
+#define XREG(x) FREG(XHACK(x))
+/* Assumes lsb of (x) is always 0 */
+#define DREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
 
 #define CHECK_NOT_DELAY_SLOT \
 if (ctx->envflags & DELAY_SLOT_MASK) {   \
@@ -1005,56 +1006,51 @@ static void _decode_opc(DisasContext * ctx)
CHECK_FPU_ENABLED
 if (ctx->tbflags & FPSCR_SZ) {
TCGv_i64 fp = tcg_temp_new_i64();
-   gen_load_fpr64(fp, XREG(B7_4));
-   gen_store_fpr64(fp, XREG(B11_8));
+   gen_load_fpr64(fp, XHACK(B7_4));
+   gen_store_fpr64(fp, XHACK(B11_8));
tcg_temp_free_i64(fp);
} else {
-   tcg_gen_mov_i32(cpu_fregs[FREG(B11_8)], cpu_fregs[FREG(B7_4)]);
+   tcg_gen_mov_i32(FREG(B11_8), FREG(B7_4));
}
return;
 case 0xf00a: /* fmov {F,D,X}Rm,@Rn - FPSCR: Nothing */
CHECK_FPU_ENABLED
 if (ctx->tbflags & FPSCR_SZ) {
TCGv addr_hi = tcg_temp_new();
-   int fr = XREG(B7_4);
+   int fr = XHACK(B7_4);
tcg_gen_addi_i32(addr_hi, REG(B11_8), 4);
-tcg_gen_qemu_st_i32(cpu_fregs[fr], REG(B11_8),
-ctx->memidx, MO_TEUL);
-tcg_gen_qemu_st_i32(cpu_fregs[fr+1], addr_hi,
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_st_i32(FREG(fr), REG(B11_8), ctx->memidx, MO_TEUL);
+tcg_gen_qemu_st_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
tcg_temp_free(addr_hi);
} else {
-tcg_gen_qemu_st_i32(cpu_fregs[FREG(B7_4)], REG(B11_8),
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_st_i32(FREG(B7_4), REG(B11_8), ctx->memidx, MO_TEUL);
}
return;
 case 0xf008: /* fmov @Rm,{F,D,X}Rn - FPSCR: Nothing */
CHECK_FPU_ENABLED
 if (ctx->tbflags & FPSCR_SZ) {
TCGv addr_hi = tcg_temp_new();
-   int fr = XREG(B11_8);
+   int fr = XHACK(B11_8);
tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr], REG(B7_4), ctx->memidx, 
MO_TEUL);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr_hi, ctx->memidx, 
MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
tcg_temp_free(addr_hi);
} else {
-tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], REG(B7_4),
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
}
return;
 case 0xf009: /* fmov @Rm+,{F,D,X}Rn - FPSCR: Nothing */
CHECK_FPU_ENABLED
 if (ctx->tbflags & FPSCR_SZ) {
TCGv addr_hi = tcg_temp_new();
-   int fr = XREG(B11_8);
+   int fr = XHACK(B11_8);
tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr], REG(B7_4), ctx->memidx, 
MO_TEUL);
-tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr_hi, ctx->memidx, 
MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 8);
tcg_temp_free(addr_hi);
} else {
-tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], REG(B7_4),
-ctx->memidx, MO_TEUL);
+tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 4);
}
return;
@@ -1063,13 +1059,12 @@ static void _decode_opc(DisasContext * ctx)
 TCGv addr = tcg_temp_new_i32();
 tcg_gen_subi_i32(addr, REG(B11_8), 4);
 if (ctx->tbflags & FPSCR_SZ) {
-   int fr = XREG(B7_4);
-tcg_gen_qemu_st_i32(cpu_fregs[fr+1], addr, ctx->memidx, MO_TEUL);
+   int fr = XHACK(B7_4);
+

[Qemu-devel] [PATCH 04/11] target/sh4: Recognize common gUSA sequences

2017-07-05 Thread Richard Henderson
For many of the sequences produced by gcc or glibc,
we can translate these as host atomic operations.
Which saves the need to acquire the exclusive lock.

Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 300 +++--
 1 file changed, 290 insertions(+), 10 deletions(-)
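
For readers unfamiliar with gUSA, a typical sequence as emitted by glibc
for an atomic add looks roughly like the following; this is reconstructed
from memory and the register choices are purely illustrative:

        mova   1f, r0       ! r0 = address of the end of the region
        mov    r15, r1      ! r1 = saved stack pointer
        mov    #-6, r15     ! r15 = -(region size): region becomes active
    0:  mov.l  @r4, r2      ! load
        add    r5, r2       ! operate
        mov.l  r2, @r4      ! store
    1:  mov    r1, r15      ! restore sp: region ends

The state machine below recognizes exactly this load/operate/store shape
(plus a few variants such as compare-and-swap) and maps it onto host
atomic operations.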

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 02c6efc..9ab7d6e 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1896,11 +1896,296 @@ static void decode_opc(DisasContext * ctx)
 }
 
 #ifdef CONFIG_USER_ONLY
-static int decode_gusa(DisasContext *ctx)
+/* For uniprocessors, SH4 uses optimistic restartable atomic sequences.
+   Upon an interrupt, a real kernel would simply notice magic values in
+   the registers and reset the PC to the start of the sequence.
+
+   For QEMU, we cannot do this in quite the same way.  Instead, we notice
+   the normal start of such a sequence (mov #-x,r15).  While we can handle
+   any sequence via cpu_exec_step_atomic, we can recognize the "normal"
+   sequences and transform them into atomic operations as seen by the host.
+*/
+static int decode_gusa(DisasContext *ctx, CPUSH4State *env, int *pmax_insns)
 {
+uint16_t insns[5];
+int ld_adr, ld_reg, ld_mop;
+int op_reg, op_arg, op_opc;
+int mt_reg, st_reg, st_mop;
+
 uint32_t pc = ctx->pc;
 uint32_t pc_end = ctx->tb->cs_base;
+int backup = sextract32(ctx->tbflags, GUSA_SHIFT, 8);
+int max_insns = (pc_end - pc) / 2;
+int i;
+
+if (pc != pc_end + backup || max_insns < 2) {
+/* This is a malformed gUSA region.  Don't do anything special,
+   since the interpreter is likely to get confused.  */
+ctx->envflags &= ~GUSA_MASK;
+return 0;
+}
+
+if (ctx->tbflags & GUSA_EXCLUSIVE) {
+/* Regardless of single-stepping or the end of the page,
+   we must complete execution of the gUSA region while
+   holding the exclusive lock.  */
+*pmax_insns = max_insns;
+return 0;
+}
+
+/* The state machine below will consume only a few insns.
+   If there are more than that in a region, fail now.  */
+if (max_insns > ARRAY_SIZE(insns)) {
+goto fail;
+}
+
+/* Read all of the insns for the region.  */
+for (i = 0; i < max_insns; ++i) {
+insns[i] = cpu_lduw_code(env, pc + i * 2);
+}
+
+ld_adr = ld_reg = ld_mop = -1;
+op_reg = op_arg = op_opc = -1;
+mt_reg = -1;
+st_reg = st_mop = -1;
+i = 0;
+
+#define NEXT_INSN \
+do { if (i >= max_insns) goto fail; ctx->opcode = insns[i++]; } while (0)
+
+/*
+ * Expect a load to begin the region.
+ */
+NEXT_INSN;
+switch (ctx->opcode & 0xf00f) {
+case 0x6000: /* mov.b @Rm,Rn */
+ld_mop = MO_SB;
+break;
+case 0x6001: /* mov.w @Rm,Rn */
+ld_mop = MO_TESW;
+break;
+case 0x6002: /* mov.l @Rm,Rn */
+ld_mop = MO_TESL;
+break;
+default:
+goto fail;
+}
+ld_adr = B7_4;
+op_reg = ld_reg = B11_8;
+if (ld_adr == ld_reg) {
+goto fail;
+}
+
+/*
+ * Expect an optional register move.
+ */
+NEXT_INSN;
+switch (ctx->opcode & 0xf00f) {
+case 0x6003: /* mov Rm,Rn */
+/* Here we want to recognize the ld output being
+   saved for later consumption (e.g. atomic_fetch_op).  */
+if (ld_reg != B7_4) {
+goto fail;
+}
+op_reg = B11_8;
+break;
+
+default:
+/* Put back and re-examine as operation.  */
+--i;
+}
+
+/*
+ * Expect the operation.
+ */
+NEXT_INSN;
+switch (ctx->opcode & 0xf00f) {
+case 0x300c: /* add Rm,Rn */
+op_opc = INDEX_op_add_i32;
+goto do_reg_op;
+case 0x2009: /* and Rm,Rn */
+op_opc = INDEX_op_and_i32;
+goto do_reg_op;
+case 0x200a: /* xor Rm,Rn */
+op_opc = INDEX_op_xor_i32;
+goto do_reg_op;
+case 0x200b: /* or Rm,Rn */
+op_opc = INDEX_op_or_i32;
+do_reg_op:
+/* The operation register should be as expected, and the
+   other input cannot depend on the load.  */
+op_arg = B7_4;
+if (op_reg != B11_8 || op_arg == op_reg || op_arg == ld_reg) {
+goto fail;
+}
+break;
+
+case 0x3000: /* cmp/eq Rm,Rn */
+/* Looking for the middle of a compare-and-swap sequence,
+   beginning with the compare.  Operands can be either order,
+   but with only one overlapping the load.  */
+if ((op_reg == B11_8) + (op_reg == B7_4) != 1) {
+goto fail;
+}
+op_opc = INDEX_op_setcond_i32;  /* placeholder */
+op_arg = (op_reg == B11_8 ? B7_4 : B11_8);
+
+NEXT_INSN;
+switch (ctx->opcode & 0xff00) {
+case 0x8b00: /* bf label */
+case 0x8f00: /* bf/s label */
+if (pc + (i + 1 + B7_0s) * 2 != pc_end) {
+  

[Qemu-devel] [PATCH 06/11] target/sh4: Hoist register bank selection

2017-07-05 Thread Richard Henderson
Compute which register bank to use once at the start of translation.

Signed-off-by: Richard Henderson 
---
 target/sh4/translate.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9ab7d6e..20e24d5 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -35,6 +35,8 @@
 
 typedef struct DisasContext {
 struct TranslationBlock *tb;
+TCGv *gregs; /* active bank */
+TCGv *altregs;   /* inactive, alternate, bank */
 target_ulong pc;
 uint16_t opcode;
 uint32_t tbflags;/* should stay unmodified during the TB translation */
@@ -64,7 +66,7 @@ enum {
 
 /* global register indexes */
 static TCGv_env cpu_env;
-static TCGv cpu_gregs[24];
+static TCGv cpu_gregs[2][16];
 static TCGv cpu_sr, cpu_sr_m, cpu_sr_q, cpu_sr_t;
 static TCGv cpu_pc, cpu_ssr, cpu_spc, cpu_gbr;
 static TCGv cpu_vbr, cpu_sgr, cpu_dbr, cpu_mach, cpu_macl;
@@ -99,16 +101,31 @@ void sh4_translate_init(void)
 "FPR12_BANK1", "FPR13_BANK1", "FPR14_BANK1", "FPR15_BANK1",
 };
 
-if (done_init)
+if (done_init) {
 return;
+}
 
 cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
 tcg_ctx.tcg_env = cpu_env;
 
-for (i = 0; i < 24; i++)
-cpu_gregs[i] = tcg_global_mem_new_i32(cpu_env,
-  offsetof(CPUSH4State, gregs[i]),
-  gregnames[i]);
+for (i = 0; i < 8; i++) {
+cpu_gregs[0][i]
+= tcg_global_mem_new_i32(cpu_env,
+ offsetof(CPUSH4State, gregs[i]),
+ gregnames[i]);
+}
+for (i = 8; i < 16; i++) {
+cpu_gregs[0][i] = cpu_gregs[1][i]
+= tcg_global_mem_new_i32(cpu_env,
+ offsetof(CPUSH4State, gregs[i]),
+ gregnames[i]);
+}
+for (i = 16; i < 24; i++) {
+cpu_gregs[1][i - 16]
+= tcg_global_mem_new_i32(cpu_env,
+ offsetof(CPUSH4State, gregs[i]),
+ gregnames[i]);
+}
 
 cpu_pc = tcg_global_mem_new_i32(cpu_env,
 offsetof(CPUSH4State, pc), "PC");
@@ -362,13 +379,8 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define B11_8 ((ctx->opcode >> 8) & 0xf)
 #define B15_12 ((ctx->opcode >> 12) & 0xf)
 
-#define REG(x) ((x) < 8 && (ctx->tbflags & (1u << SR_MD))\
-&& (ctx->tbflags & (1u << SR_RB))\
-? (cpu_gregs[x + 16]) : (cpu_gregs[x]))
-
-#define ALTREG(x) ((x) < 8 && (!(ctx->tbflags & (1u << SR_MD))\
-   || !(ctx->tbflags & (1u << SR_RB)))\
-   ? (cpu_gregs[x + 16]) : (cpu_gregs[x]))
+#define REG(x) ctx->gregs[x]
+#define ALTREG(x)  ctx->altregs[x]
 
 #define FREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
@@ -2214,6 +2226,7 @@ void gen_intermediate_code(CPUSH4State * env, struct 
TranslationBlock *tb)
 target_ulong pc_start;
 int num_insns;
 int max_insns;
+int bank;
 
 pc_start = tb->pc;
 ctx.pc = pc_start;
@@ -2229,6 +2242,10 @@ void gen_intermediate_code(CPUSH4State * env, struct 
TranslationBlock *tb)
 ctx.features = env->features;
 ctx.has_movcal = (ctx.tbflags & TB_FLAG_PENDING_MOVCA);
 
+bank = (ctx.tbflags & (1 << SR_MD)) && (ctx.tbflags & (1 << SR_RB));
+ctx.gregs = cpu_gregs[bank];
+ctx.altregs = cpu_gregs[bank ^ 1];
+
 max_insns = tb->cflags & CF_COUNT_MASK;
 if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
-- 
2.9.4




[Qemu-devel] [PATCH 01/11] target/sh4: Use cmpxchg for movco

2017-07-05 Thread Richard Henderson
As for other targets, cmpxchg isn't quite right for ll/sc,
suffering from an ABA race, but is sufficient to implement
portable atomic operations.

Signed-off-by: Richard Henderson 
---
 target/sh4/cpu.h   |  3 ++-
 target/sh4/translate.c | 56 +-
 2 files changed, 39 insertions(+), 20 deletions(-)
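
To make the ABA caveat concrete (illustration only, not part of the patch):

1. CPU0 executes MOVLI.L @Rm,R0 and records lock_addr = Rm and
   lock_value = A (the value loaded).
2. CPU1 stores B to that address and later stores A back.
3. CPU0 executes MOVCO.L R0,@Rn; the cmpxchg still sees A, so the store
   succeeds, whereas a real LL/SC pair would have failed here.

For portable lock-free code the guest-visible result is still correct,
which is the trade-off the commit message refers to.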

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index ffb9168..b15116e 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -169,7 +169,8 @@ typedef struct CPUSH4State {
 tlb_t itlb[ITLB_SIZE]; /* instruction translation table */
 tlb_t utlb[UTLB_SIZE]; /* unified translation table */
 
-uint32_t ldst;
+uint32_t lock_addr;
+uint32_t lock_value;
 
 /* Fields up to this point are cleared by a CPU reset */
 struct {} end_reset_fields;
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 8bc132b..6b247fa 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -68,7 +68,8 @@ static TCGv cpu_gregs[24];
 static TCGv cpu_sr, cpu_sr_m, cpu_sr_q, cpu_sr_t;
 static TCGv cpu_pc, cpu_ssr, cpu_spc, cpu_gbr;
 static TCGv cpu_vbr, cpu_sgr, cpu_dbr, cpu_mach, cpu_macl;
-static TCGv cpu_pr, cpu_fpscr, cpu_fpul, cpu_ldst;
+static TCGv cpu_pr, cpu_fpscr, cpu_fpul;
+static TCGv cpu_lock_addr, cpu_lock_value;
 static TCGv cpu_fregs[32];
 
 /* internal register indexes */
@@ -151,8 +152,12 @@ void sh4_translate_init(void)
   offsetof(CPUSH4State,
delayed_cond),
   "_delayed_cond_");
-cpu_ldst = tcg_global_mem_new_i32(cpu_env,
- offsetof(CPUSH4State, ldst), "_ldst_");
+cpu_lock_addr = tcg_global_mem_new_i32(cpu_env,
+  offsetof(CPUSH4State, lock_addr),
+   "_lock_addr_");
+cpu_lock_value = tcg_global_mem_new_i32(cpu_env,
+   offsetof(CPUSH4State, lock_value),
+"_lock_value_");
 
 for (i = 0; i < 32; i++)
 cpu_fregs[i] = tcg_global_mem_new_i32(cpu_env,
@@ -1526,20 +1531,32 @@ static void _decode_opc(DisasContext * ctx)
return;
 case 0x0073:
 /* MOVCO.L
-  LDST -> T
+   LDST -> T
If (T == 1) R0 -> (Rn)
0 -> LDST
 */
 if (ctx->features & SH_FEATURE_SH4A) {
-TCGLabel *label = gen_new_label();
-tcg_gen_mov_i32(cpu_sr_t, cpu_ldst);
-   tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_ldst, 0, label);
-tcg_gen_qemu_st_i32(REG(0), REG(B11_8), ctx->memidx, MO_TEUL);
-   gen_set_label(label);
-   tcg_gen_movi_i32(cpu_ldst, 0);
-   return;
-   } else
-   break;
+TCGLabel *fail = gen_new_label();
+TCGLabel *done = gen_new_label();
+TCGv tmp;
+
+tcg_gen_brcond_i32(TCG_COND_NE, REG(B11_8), cpu_lock_addr, fail);
+
+tmp = tcg_temp_new();
+tcg_gen_atomic_cmpxchg_i32(tmp, REG(B11_8), cpu_lock_value,
+   REG(0), ctx->memidx, MO_TEUL);
+tcg_gen_setcond_i32(TCG_COND_EQ, cpu_sr_t, tmp, cpu_lock_value);
+tcg_temp_free(tmp);
+tcg_gen_br(done);
+
+gen_set_label(fail);
+tcg_gen_movi_i32(cpu_sr_t, 0);
+
+gen_set_label(done);
+return;
+} else {
+break;
+}
 case 0x0063:
 /* MOVLI.L @Rm,R0
1 -> LDST
@@ -1547,13 +1564,14 @@ static void _decode_opc(DisasContext * ctx)
When interrupt/exception
occurred 0 -> LDST
 */
-   if (ctx->features & SH_FEATURE_SH4A) {
-   tcg_gen_movi_i32(cpu_ldst, 0);
+if (ctx->features & SH_FEATURE_SH4A) {
 tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx, MO_TESL);
-   tcg_gen_movi_i32(cpu_ldst, 1);
-   return;
-   } else
-   break;
+tcg_gen_mov_i32(cpu_lock_addr, REG(B11_8));
+tcg_gen_mov_i32(cpu_lock_value, REG(0));
+return;
+} else {
+break;
+}
 case 0x0093:   /* ocbi @Rn */
{
 gen_helper_ocbi(cpu_env, REG(B11_8));
-- 
2.9.4




[Qemu-devel] [PATCH 03/11] target/sh4: Handle user-space atomics

2017-07-05 Thread Richard Henderson
For uniprocessors, SH4 uses optimistic restartable atomic sequences.
Upon an interrupt, a real kernel would simply notice magic values in
the registers and reset the PC to the start of the sequence.

For QEMU, we cannot do this in quite the same way.  Instead, we notice
the normal start of such a sequence (mov #-x,r15), and start a new TB
that can be executed under cpu_exec_step_atomic.
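
A rough sketch of how such a start can be recognised (illustrative
only, not the patch's actual code): on SH, "mov #imm,Rn" encodes as
0xe000 | (n << 8) | imm8, so the "mov #-x,r15" that opens a gUSA
region is 0xef00 with a negative 8-bit immediate.

    #include <stdbool.h>
    #include <stdint.h>

    static bool is_gusa_start(uint16_t opcode)
    {
        return (opcode & 0xff00) == 0xef00    /* mov #imm,r15 */
            && (opcode & 0x0080) != 0;        /* immediate is negative */
    }

Once the region is flagged, the translator can arrange for the code up
to the end address to be retranslated and run under
cpu_exec_step_atomic(), as described above.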

Reported-by: Bruno Haible  
LP: https://bugs.launchpad.net/bugs/1701971
Signed-off-by: Richard Henderson 
---
 target/sh4/cpu.h   |  21 ++--
 target/sh4/helper.h|   1 +
 target/sh4/op_helper.c |   6 +++
 target/sh4/translate.c | 137 +++--
 4 files changed, 147 insertions(+), 18 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index b15116e..0a08b12 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -96,6 +96,12 @@
 #define DELAY_SLOT_CONDITIONAL (1 << 1)
 #define DELAY_SLOT_RTE (1 << 2)
 
+#define TB_FLAG_PENDING_MOVCA  (1 << 3)
+
+#define GUSA_SHIFT 4
+#define GUSA_EXCLUSIVE (1 << 12)
+#define GUSA_MASK  ((0xff << GUSA_SHIFT) | GUSA_EXCLUSIVE)
+
 typedef struct tlb_t {
 uint32_t vpn;  /* virtual page number */
 uint32_t ppn;  /* physical page number */
@@ -367,7 +373,11 @@ static inline int cpu_ptel_pr (uint32_t ptel)
 #define PTEA_TC(1 << 3)
 #define cpu_ptea_tc(ptea) (((ptea) & PTEA_TC) >> 3)
 
-#define TB_FLAG_PENDING_MOVCA  (1 << 4)
+#ifdef CONFIG_USER_ONLY
+#define TB_FLAG_ENVFLAGS_MASK  (DELAY_SLOT_MASK | GUSA_MASK)
+#else
+#define TB_FLAG_ENVFLAGS_MASK  DELAY_SLOT_MASK
+#endif
 
 static inline target_ulong cpu_read_sr(CPUSH4State *env)
 {
@@ -388,12 +398,17 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, 
target_ulong *pc,
 target_ulong *cs_base, uint32_t *flags)
 {
 *pc = env->pc;
+#ifdef CONFIG_USER_ONLY
+/* For a gUSA region, notice the end of the region.  */
+*cs_base = env->flags & GUSA_MASK ? env->gregs[0] : 0;
+#else
 *cs_base = 0;
-*flags = (env->flags & DELAY_SLOT_MASK)/* Bits  0- 2 */
+#endif
+*flags = env->flags /* TB_FLAG_ENVFLAGS_MASK: bits 0-2, 4-12 */
 | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
 | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))  /* Bits 29-30 */
 | (env->sr & (1u << SR_FD))/* Bit 15 */
-| (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 4 */
+| (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 3 */
 }
 
 #endif /* SH4_CPU_H */
diff --git a/target/sh4/helper.h b/target/sh4/helper.h
index dce859c..efbb560 100644
--- a/target/sh4/helper.h
+++ b/target/sh4/helper.h
@@ -6,6 +6,7 @@ DEF_HELPER_1(raise_slot_fpu_disable, noreturn, env)
 DEF_HELPER_1(debug, noreturn, env)
 DEF_HELPER_1(sleep, noreturn, env)
 DEF_HELPER_2(trapa, noreturn, env, i32)
+DEF_HELPER_1(exclusive, noreturn, env)
 
 DEF_HELPER_3(movcal, void, env, i32, i32)
 DEF_HELPER_1(discard_movcal_backup, void, env)
diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c
index 528a40a..3139ad2 100644
--- a/target/sh4/op_helper.c
+++ b/target/sh4/op_helper.c
@@ -115,6 +115,12 @@ void helper_trapa(CPUSH4State *env, uint32_t tra)
 raise_exception(env, 0x160, 0);
 }
 
+void helper_exclusive(CPUSH4State *env)
+{
+/* We do not want cpu_restore_state to run.  */
+cpu_loop_exit_atomic(ENV_GET_CPU(env), 0);
+}
+
 void helper_movcal(CPUSH4State *env, uint32_t address, uint32_t value)
 {
 if (cpu_sh4_is_cached (env, address))
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index e1661e9..02c6efc 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -225,7 +225,7 @@ static inline void gen_save_cpu_state(DisasContext *ctx, 
bool save_pc)
 if (ctx->delayed_pc != (uint32_t) -1) {
 tcg_gen_movi_i32(cpu_delayed_pc, ctx->delayed_pc);
 }
-if ((ctx->tbflags & DELAY_SLOT_MASK) != ctx->envflags) {
+if ((ctx->tbflags & TB_FLAG_ENVFLAGS_MASK) != ctx->envflags) {
 tcg_gen_movi_i32(cpu_flags, ctx->envflags);
 }
 }
@@ -239,7 +239,7 @@ static inline bool use_goto_tb(DisasContext *ctx, 
target_ulong dest)
 #ifndef CONFIG_USER_ONLY
 return (ctx->tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
 #else
-return true;
+return (ctx->tbflags & GUSA_EXCLUSIVE) == 0;
 #endif
 }
 
@@ -260,16 +260,17 @@ static void gen_goto_tb(DisasContext *ctx, int n, 
target_ulong dest)
 
 static void gen_jump(DisasContext * ctx)
 {
-if (ctx->delayed_pc == (uint32_t) - 1) {
-   /* Target is not statically known, it comes necessarily from a
-  delayed jump as immediate jump are conditinal jumps */
-   tcg_gen_mov_i32(cpu_pc, cpu_delayed_pc);
+if (ctx->delayed_pc == -1) {
+/* Target is not statically known, it comes necessarily from a
+   

[Qemu-devel] [PATCH 00/11] target/sh4 improvments

2017-07-05 Thread Richard Henderson
This fixes two problems with atomic operations on sh4,
including an attempt at supporting the user-space atomics
technique used by most sh-linux-user binaries.

This is good enough to run the one occurrence in linux-user-test-0.3.
I'm still downloading enough of a cross environment to be able to run
more recent sh4 binaries.  Including the one in the LP bug report.

Thoughts and more extensive testing appreciated.


r~


Richard Henderson (11):
  target/sh4: Use cmpxchg for movco
  target/sh4: Consolidate end-of-TB tests
  target/sh4: Handle user-space atomics
  target/sh4: Recognize common gUSA sequences
  linux-user/sh4: Notice gUSA regions during signal delivery
  target/sh4: Hoist register bank selection
  target/sh4: Unify cpu_fregs into FREG
  target/sh4: Pass DisasContext to fpr64 routines
  target/sh4: Avoid a potential translator crash for malformed FPR64
  target/sh4: Hoist fp bank selection
  target/sh4: Eliminate DREG macro

 target/sh4/cpu.h   |  24 +-
 target/sh4/helper.h|   1 +
 linux-user/signal.c|  21 ++
 target/sh4/op_helper.c |   6 +
 target/sh4/translate.c | 724 ++---
 5 files changed, 621 insertions(+), 155 deletions(-)

-- 
2.9.4




Re: [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors

2017-07-05 Thread John Snow


On 07/05/2017 05:08 PM, Eric Blake wrote:
> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
> which is a field that can have stale contents depending on the value
> of intermediate->has_variable_length.  An audit shows that we are safe
> (we were first calling through bdrv_co_get_block_status() which in
> turn calls bdrv_nb_sectors() and therefore just refreshed the current
> length), but it's nicer to favor our accessor functions to avoid having
> to repeat such an audit, even if it means refresh_total_sectors() is
> called more frequently.
> 
> Suggested-by: John Snow 
> Signed-off-by: Eric Blake 
> Reviewed-by: Manos Pitsidianakis 
> Reviewed-by: Jeff Cody 
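
For illustration, the pattern the commit message argues for boils down
to this (a sketch assuming QEMU's block-layer headers and the
"intermediate" node mentioned above, not code from the patch itself):

    /* prefer the accessor: it refreshes total_sectors for drivers
     * that set has_variable_length, instead of trusting the cached
     * intermediate->total_sectors field */
    int64_t nb_sectors = bdrv_nb_sectors(intermediate);
    /* a negative value is -errno and should be propagated */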

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread John Snow


On 07/05/2017 05:08 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to the
> buffer size.
> 
> Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
> (necessary for interaction with sector-based dirty bitmaps, until
> a later patch converts those to be byte-based) does not suffer from
> truncation problems.
> 
> [checkpatch has a false positive on use of MIN() in this patch]
> 
> Signed-off-by: Eric Blake 
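
What such a truncation guard can look like (an editorial sketch
assuming QEMU's block headers; this is not the patch's actual
assertion):

    /* converting a byte granularity to sectors, with an explicit
     * guard against silent truncation */
    static int64_t granularity_to_sectors(int64_t granularity)
    {
        assert(granularity >= BDRV_SECTOR_SIZE &&
               granularity % BDRV_SECTOR_SIZE == 0);
        return granularity >> BDRV_SECTOR_BITS;
    }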
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4] virtio-net: enable configurable tx queue size

2017-07-05 Thread Michael S. Tsirkin
On Tue, Jul 04, 2017 at 07:03:51PM +0800, Wei Wang wrote:
> On 07/04/2017 03:18 AM, Michael S. Tsirkin wrote:
> > On Wed, Jun 28, 2017 at 10:37:59AM +0800, Wei Wang wrote:
> > > This patch enables the virtio-net tx queue size to be configurable
> > > between 256 (the default queue size) and 1024 by the user when the
> > > vhost-user backend is used.
> > > 
> > > Currently, the maximum tx queue size for other backends is 512 due
> > > to the following limitations:
> > > - QEMU backend: the QEMU backend implementation in some cases may
> > > send 1024+1 iovs to writev.
> > > - Vhost_net backend: there are possibilities that the guest sends
> > > a vring_desc of memory which crosses a MemoryRegion thereby
> > > generating more than 1024 iovs after translation from guest-physical
> > > address in the backend.
> > > 
> > > Signed-off-by: Wei Wang 
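
For context, the configuration this feature targets looks roughly like
the following; the socket path and ids are made up, and the chardev
must be backed by a vhost-user server such as a DPDK vhost port:

    $ qemu-system-x86_64 \
        -chardev socket,id=chr0,path=/tmp/vhost-user.sock \
        -netdev vhost-user,id=net0,chardev=chr0 \
        -device virtio-net-pci,netdev=net0,tx_queue_size=1024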
> > I was going to apply this, but run into a host of issues:
> > 
> > This segfaults:
> > $ ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net,tx_queue_size=1024
> > Segmentation fault (core dumped)
> > 
> > I tried to tweak this code a bit to avoid the crash, and I run into a 
> > further issue:
> > $ ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net,tx_queue_size=1024
> > Bad ram offset aa49002
> > Aborted (core dumped)
> > 
> > the second issue is especially concerning.
> > 
> 
> AFAIK, all the virtio-net backends require "-netdev". I'm wondering if there
> is any case that virtio-net can work without a "-netdev" created in QEMU?

Of course. Old style -net with vlans still work.

> If not, would it be better if we just stop the device creation at the
> beginning of
> virtio_net_device_realize() if "-netdev" is not given (i.e.
> !n->nic_conf.peers.ncs[0])?
> 
> Best,
> Wei

That will break a ton of scripts without any real benefit
to users.


-- 
MST



Re: [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()

2017-07-05 Thread John Snow


On 07/05/2017 05:08 PM, Eric Blake wrote:
> stream_complete() skips the work of rewriting the backing file if
> the job was cancelled, if data->reached_end is false, or if there
> was an error detected (non-zero data->ret) during the streaming.
> But note that in stream_run(), data->reached_end is only set if the
> loop ran to completion, and data->ret is only 0 in two cases:
> either the loop ran to completion (possibly by cancellation, but
> stream_complete checks for that), or we took an early goto out
> because there is no bs->backing.  Thus, we can preserve the same
> semantics without the use of reached_end, by merely checking for
> bs->backing (and logically, if there was no backing file, streaming
> is a no-op, so there is no backing file to rewrite).
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Eric Blake 
> 
> ---
> v4: new patch
> ---
>  block/stream.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 746d525..12f1659 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -59,7 +59,6 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
> 
>  typedef struct {
>  int ret;
> -bool reached_end;
>  } StreamCompleteData;
> 
>  static void stream_complete(BlockJob *job, void *opaque)
> @@ -70,7 +69,7 @@ static void stream_complete(BlockJob *job, void *opaque)
>  BlockDriverState *base = s->base;
>  Error *local_err = NULL;
> 
> -if (!block_job_is_cancelled(&s->common) && data->reached_end &&
> +if (!block_job_is_cancelled(&s->common) && bs->backing &&
>  data->ret == 0) {
>  const char *base_id = NULL, *base_fmt = NULL;
>  if (base) {
> @@ -211,7 +210,6 @@ out:
>  /* Modify backing chain and close BDSes in main loop */
>  data = g_malloc(sizeof(*data));
>  data->ret = ret;
> -data->reached_end = sector_num == end;
>  block_job_defer_to_main_loop(&s->common, stream_complete, data);
>  }
> 

This seems ever so slightly less intuitive to me, but it is functionally
identical.

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 2/2 v2] xenfb: Allow vkbd to connect without a DisplayState

2017-07-05 Thread Stefano Stabellini
On Mon, 3 Jul 2017, Owen Smith wrote:
> If the vkbd device model is registered and the vfb device model
> is not registered, the backend will not transition to connected.
> If there is no DisplayState, then the absolute coordinates cannot
> be scaled, and will remain in the range [0, 0x7fff].
> Backend writes "feature-raw-pointer" to indicate that the backend
> supports reporting absolute position without rescaling.
> The frontend uses "request-raw-pointer" to request raw unscaled
> pointer values. If there is no DisplayState, the absolute values
> are always raw unscaled values.
> 
> Signed-off-by: Owen Smith 
> ---
>  hw/display/xenfb.c | 36 ++--
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
> index 88815df..d40af6e 100644
> --- a/hw/display/xenfb.c
> +++ b/hw/display/xenfb.c
> @@ -53,6 +53,7 @@ struct common {
>  struct XenInput {
>  struct common c;
>  int abs_pointer_wanted; /* Whether guest supports absolute pointer */
> +int raw_pointer_wanted; /* Whether guest supports unscaled pointer */
>  int button_state;   /* Last seen pointer button state */
>  int extended;
>  /* kbd */
> @@ -329,18 +330,22 @@ static void xenfb_mouse_event(void *opaque,
> int dx, int dy, int dz, int button_state)
>  {
>  struct XenInput *xenfb = opaque;
> -DisplaySurface *surface = qemu_console_surface(xenfb->c.con);
> -int dw = surface_width(surface);
> -int dh = surface_height(surface);
> -int i;
> +int i, x, y;
> +if (xenfb->c.con && xenfb->raw_pointer_wanted != 1) {
> +DisplaySurface *surface = qemu_console_surface(xenfb->c.con);
> +int dw = surface_width(surface);
> +int dh = surface_height(surface);
> +x = dx * (dw - 1) / 0x7fff;
> +y = dy * (dh - 1) / 0x7fff;
> +} else {
> +x = dx;
> +y = dy;
> +}
>  
>  trace_xenfb_mouse_event(opaque, dx, dy, dz, button_state,
>  xenfb->abs_pointer_wanted);
>  if (xenfb->abs_pointer_wanted)

Shouldn't this be:

  if (xenfb->abs_pointer_wanted || xenfb->raw_pointer_wanted)

?
Is it possible to have raw_pointer_wanted && !abs_pointer_wanted? If
not, we should check at connection or initialization time.


> - xenfb_send_position(xenfb,
> - dx * (dw - 1) / 0x7fff,
> - dy * (dh - 1) / 0x7fff,
> - dz);
> +xenfb_send_position(xenfb, x, y, dz);
>  else
>   xenfb_send_motion(xenfb, dx, dy, dz);
>  
> @@ -423,6 +428,7 @@ static void xenfb_legacy_mouse_sync(DeviceState *dev)
>  static int input_init(struct XenDevice *xendev)
>  {
>  xenstore_write_be_int(xendev, "feature-abs-pointer", 1);
> +xenstore_write_be_int(xendev, "feature-raw-pointer", 1);
>  return 0;
>  }
>  
> @@ -432,8 +438,14 @@ static int input_initialise(struct XenDevice *xendev)
>  int rc;
>  
>  if (!in->c.con) {
> -xen_pv_printf(xendev, 1, "ds not set (yet)\n");
> -return -1;
> +char *vfb = xenstore_read_str(NULL, "device/vfb");

Isn't it better to do xenstore_read_str("device", "vfb") ?


> +if (vfb == NULL) {
> +/* there is no vfb, run vkbd on its own */
> +} else {

if (vfb != NULL)


> +free(vfb);

g_free


> +xen_pv_printf(xendev, 1, "ds not set (yet)\n");
> +return -1;
> +}
>  }
>  
>  rc = common_bind(&in->c);
> @@ -451,6 +463,10 @@ static void input_connected(struct XenDevice *xendev)
>  if (xenstore_read_fe_int(xendev, "request-abs-pointer",
>   &in->abs_pointer_wanted) == -1) {
>  in->abs_pointer_wanted = 0;
>  }
> +if (xenstore_read_fe_int(xendev, "request-raw-pointer",
> + &in->raw_pointer_wanted) == -1) {
> +in->raw_pointer_wanted = 0;
> +}
>  
>  if (in->qkbd) {
>  qemu_input_handler_unregister(in->qkbd);
> -- 
> 2.1.4
> 



Re: [Qemu-devel] [PATCH v1 1/2] target-arm: Move the regime_xxx helpers

2017-07-05 Thread Alistair Francis
On Fri, Jun 30, 2017 at 6:45 AM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Move the regime_xxx helpers in preparation for future code
> that will reuse them.
>
> No functional change.
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Alistair Francis 

Thanks,
Alistair

> ---
>  target/arm/helper.c | 404 
> ++--
>  1 file changed, 202 insertions(+), 202 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 2594faa..fd1027e 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -35,6 +35,208 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> target_ulong address,
>  #define PMCRD   0x8
>  #define PMCRC   0x4
>  #define PMCRE   0x1
> +
> +/* Return the exception level which controls this address translation regime 
> */
> +static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
> +{
> +switch (mmu_idx) {
> +case ARMMMUIdx_S2NS:
> +case ARMMMUIdx_S1E2:
> +return 2;
> +case ARMMMUIdx_S1E3:
> +return 3;
> +case ARMMMUIdx_S1SE0:
> +return arm_el_is_aa64(env, 3) ? 1 : 3;
> +case ARMMMUIdx_S1SE1:
> +case ARMMMUIdx_S1NSE0:
> +case ARMMMUIdx_S1NSE1:
> +case ARMMMUIdx_MPriv:
> +case ARMMMUIdx_MNegPri:
> +case ARMMMUIdx_MUser:
> +return 1;
> +default:
> +g_assert_not_reached();
> +}
> +}
> +
> +/* Return true if this address translation regime is secure */
> +static inline bool regime_is_secure(CPUARMState *env, ARMMMUIdx mmu_idx)
> +{
> +switch (mmu_idx) {
> +case ARMMMUIdx_S12NSE0:
> +case ARMMMUIdx_S12NSE1:
> +case ARMMMUIdx_S1NSE0:
> +case ARMMMUIdx_S1NSE1:
> +case ARMMMUIdx_S1E2:
> +case ARMMMUIdx_S2NS:
> +case ARMMMUIdx_MPriv:
> +case ARMMMUIdx_MNegPri:
> +case ARMMMUIdx_MUser:
> +return false;
> +case ARMMMUIdx_S1E3:
> +case ARMMMUIdx_S1SE0:
> +case ARMMMUIdx_S1SE1:
> +return true;
> +default:
> +g_assert_not_reached();
> +}
> +}
> +
> +/* Return the SCTLR value which controls this address translation regime */
> +static inline uint32_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
> +{
> +return env->cp15.sctlr_el[regime_el(env, mmu_idx)];
> +}
> +
> +/* Return true if the specified stage of address translation is disabled */
> +static inline bool regime_translation_disabled(CPUARMState *env,
> +   ARMMMUIdx mmu_idx)
> +{
> +if (arm_feature(env, ARM_FEATURE_M)) {
> +switch (env->v7m.mpu_ctrl &
> +(R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) 
> {
> +case R_V7M_MPU_CTRL_ENABLE_MASK:
> +/* Enabled, but not for HardFault and NMI */
> +return mmu_idx == ARMMMUIdx_MNegPri;
> +case R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK:
> +/* Enabled for all cases */
> +return false;
> +case 0:
> +default:
> +/* HFNMIENA set and ENABLE clear is UNPREDICTABLE, but
> + * we warned about that in armv7m_nvic.c when the guest set it.
> + */
> +return true;
> +}
> +}
> +
> +if (mmu_idx == ARMMMUIdx_S2NS) {
> +return (env->cp15.hcr_el2 & HCR_VM) == 0;
> +}
> +return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
> +}
> +
> +static inline bool regime_translation_big_endian(CPUARMState *env,
> + ARMMMUIdx mmu_idx)
> +{
> +return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
> +}
> +
> +/* Return the TCR controlling this translation regime */
> +static inline TCR *regime_tcr(CPUARMState *env, ARMMMUIdx mmu_idx)
> +{
> +if (mmu_idx == ARMMMUIdx_S2NS) {
> +return &env->cp15.vtcr_el2;
> +}
> +return &env->cp15.tcr_el[regime_el(env, mmu_idx)];
> +}
> +
> +/* Convert a possible stage1+2 MMU index into the appropriate
> + * stage 1 MMU index
> + */
> +static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
> +{
> +if (mmu_idx == ARMMMUIdx_S12NSE0 || mmu_idx == ARMMMUIdx_S12NSE1) {
> +mmu_idx += (ARMMMUIdx_S1NSE0 - ARMMMUIdx_S12NSE0);
> +}
> +return mmu_idx;
> +}
> +
> +/* Returns TBI0 value for current regime el */
> +uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx)
> +{
> +TCR *tcr;
> +uint32_t el;
> +
> +/* For EL0 and EL1, TBI is controlled by stage 1's TCR, so convert
> + * a stage 1+2 mmu index into the appropriate stage 1 mmu index.
> + */
> +mmu_idx = stage_1_mmu_idx(mmu_idx);
> +
> +tcr = regime_tcr(env, mmu_idx);
> +el = regime_el(env, mmu_idx);
> +
> +if (el > 1) {
> +return extract64(tcr->raw_tcr, 20, 1);
> +} else {
> +return extract64(tcr->raw_tcr, 37, 1);
> +}
> +}
> +
> +/* Returns TBI1 value for current 

Re: [Qemu-devel] [PATCH v2 0/2] Add global device ID in virt machine

2017-07-05 Thread Michael S. Tsirkin
On Wed, May 31, 2017 at 12:02:56PM +, Diana Madalina Craciun wrote:
> On 05/25/2017 01:12 AM, Michael S. Tsirkin wrote:
> > On Tue, May 23, 2017 at 02:12:43PM +0300, Diana Craciun wrote:
> >> The NXP DPAA2 is a hardware architecture designed for high-speed network
> >> packet processing. The DPAA2 hardware components are managed by a hardware
> >> component called the Management Complex (or MC) which provides an
> >> object-base abstraction for software drivers to use the DPAA2 hardware.
> >> For more details you can see: 
> >> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Ffsl-mc%2FREADME.txt%3Fh%3Dv4.10=01%7C01%7Cdiana.craciun%40nxp.com%7Cce2cc4d066944ce2759308d4a2f1f3f8%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0=CJAiTF6Qnq4gklSqon7xcRby0O1HQvytTUSdPwaHuSE%3D=0
> >>
> >> The interrupts generated by the DPAA2 hardware components are MSIs. We 
> >> will add
> >> support for direct assigning these DPAA2 components/objects to a virtual 
> >> machine. However, this will add the need to expand the MSI usage in QEMU.
> >>
> >> Currently the MSIs in QEMU are pretty much tied to PCI. For ARM the
> >> GIC ITS is using a device ID for interrupt translation. Currently, for
> >> PCI, the requester ID is used as device ID. This will not work when
> >> we add another entity that needs also a device ID which is supposed to
> >> be unique across the system.
> >>
> >> My proposal is to add a static allocation in the virt machine. I considered
> >> that this allocation is specific to each machine/platform. Currently only
> >> virt machine has it, but other implementations may use the same mechanism
> >> as well.
> >> So, I used a static allocation with this formula:
> >>
> >> DeviceID = zero_extend( RequesterID[15:0] ) + 0x1 * Constant
> >>
> >> This formula was taken from SBSA spec (Appendix I: DeviceID generation and
> >> ITS groups). In case of QEMU the constant will be different for each 
> >> entity.
> >> In this way a unique DeviceID will be generated and the device ID will be
> >> derived from a requesterID (in case of PCI) or other means in case of other
> >> entities.
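
As an editorial sketch of that allocation in C (the 0x10000 stride per
"Constant" is an assumption taken from the SBSA wording; the exact
value is not legible in this archive copy):

    #include <stdint.h>

    static uint32_t make_device_id(uint16_t requester_id, uint32_t constant)
    {
        /* zero-extended RequesterID plus a per-entity offset */
        return (uint32_t)requester_id + 0x10000u * constant;
    }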
> >>
> >> The implementation is generic as there might be in the future other 
> >> non-pci devices
> >> that are using MSIs or IOMMU. Any architecture can use it, though currently
> >> only the ARM architecture is using the function that retrieves the stream 
> >> ID. I
> >> did not change all the replacements of the pci_requester_id (with 
> >> pci_stream_id)
> >> in the code (although if the constant is 0, the stream_id is equal with 
> >> requester_id).
> >> The other architectures (e.g. intel iommu code) assume that the ID is the
> >> requester ID.
> >>
> >> Tested on NXP LS2080 platform.
> >>
> >> History:
> > I am confused. I get it that non-PCI things want something else
> > in their requester ID, but why require it for PCI devices?
> > How about using Constant == 0 for PCI? This way you do
> > not need to touch PCI at all as DeviceID == RequesterID ...
> 
> It is not that other devices need something else in the requester ID,
> but more about finding a way to provide an unique ID across the system
> (more precisely it should be unique for all devices connected to the
> same IOMMU/ITS). The DT already offers support to describe the
> translation between stream IDs/device IDs to requester ID for PCI
> devices (iommu-map for IOMMU ([1]) and msi-map for MSIs ([2]). It will
> not change the way the requester ID is used in PCI in general, but only
> the places that need a unique ID (which are the MSIs and IOMMU).
> 
> If we are to use a value of 0 for the constant in case of PCI devices,
> what happens if we have multiple PCI controllers?

I guess we'd use the PCI Segment number for that?

> [1]
> https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
> [2]
> https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-msi.txt
> 
> Thanks,
> 
> Diana
> 



Re: [Qemu-devel] [PATCH 1/2 v2] xenfb: Use qemu_input_handler_* calls directly

2017-07-05 Thread Stefano Stabellini
On Mon, 3 Jul 2017, Owen Smith wrote:
> The xenvkbd input device uses functions from input-legacy.c
> Use the appropriate qemu_input_handler_* functions instead
> of calling functions in input-legacy.c that in turn call
> the correct functions.
> The bulk of this patch removes the extra layer of calls
> by moving the required structure members into the XenInput
> struct.
> 
> Signed-off-by: Owen Smith 
> ---
>  hw/display/xenfb.c | 121 
> +
>  1 file changed, 113 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
> index e76c0d8..88815df 100644
> --- a/hw/display/xenfb.c
> +++ b/hw/display/xenfb.c
> @@ -27,6 +27,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "hw/hw.h"
> +#include "ui/input.h"
>  #include "ui/console.h"
>  #include "hw/xen/xen_backend.h"
>  
> @@ -54,7 +55,14 @@ struct XenInput {
>  int abs_pointer_wanted; /* Whether guest supports absolute pointer */
>  int button_state;   /* Last seen pointer button state */
>  int extended;
> -QEMUPutMouseEntry *qmouse;
> +/* kbd */
> +QemuInputHandler hkbd;
> +QemuInputHandlerState *qkbd;
> +/* mouse */
> +QemuInputHandler hmouse;
> +QemuInputHandlerState *qmouse;
> +int axis[INPUT_AXIS__MAX];
> +int buttons;
>  };
>  
>  #define UP_QUEUE 8
> @@ -293,6 +301,21 @@ static void xenfb_key_event(void *opaque, int scancode)
>  xenfb_send_key(xenfb, down, scancode2linux[scancode]);
>  }
>  
> +static void xenfb_legacy_key_event(DeviceState *dev, QemuConsole *src,
> +   InputEvent *evt)
> +{
> +struct XenInput *in = (struct XenInput *)dev;
> +int scancodes[3], i, count;
> +InputKeyEvent *key = evt->u.key.data;
> +
> +count = qemu_input_key_value_to_scancode(key->key,
> + key->down,
> + scancodes);
> +for (i = 0; i < count; ++i) {
> +xenfb_key_event(in, scancodes[i]);
> +}
> +}
>  /*
>   * Send a mouse event from the client to the guest OS
>   *
> @@ -333,6 +356,70 @@ static void xenfb_mouse_event(void *opaque,
>  xenfb->button_state = button_state;
>  }
>  
> +static void xenfb_legacy_mouse_event(DeviceState *dev, QemuConsole *src,
> + InputEvent *evt)
> +{
> +static const int bmap[INPUT_BUTTON__MAX] = {
> +[INPUT_BUTTON_LEFT]   = MOUSE_EVENT_LBUTTON,
> +[INPUT_BUTTON_MIDDLE] = MOUSE_EVENT_MBUTTON,
> +[INPUT_BUTTON_RIGHT]  = MOUSE_EVENT_RBUTTON,
> +};
> +struct XenInput *in = (struct XenInput *)dev;
> +InputBtnEvent *btn;
> +InputMoveEvent *move;
> +
> +switch (evt->type) {
> +case INPUT_EVENT_KIND_BTN:
> +btn = evt->u.btn.data;
> +if (btn->down) {
> +in->buttons |= bmap[btn->button];
> +} else {
> +in->buttons &= ~bmap[btn->button];
> +}
> +if (btn->down && btn->button == INPUT_BUTTON_WHEEL_UP) {
> +xenfb_mouse_event(in,
> +  in->axis[INPUT_AXIS_X],
> +  in->axis[INPUT_AXIS_Y],
> +  -1,
> +  in->buttons);
> +}
> +if (btn->down && btn->button == INPUT_BUTTON_WHEEL_DOWN) {
> +xenfb_mouse_event(in,
> +  in->axis[INPUT_AXIS_X],
> +  in->axis[INPUT_AXIS_Y],
> +  1,
> +  in->buttons);
> +}

Why are we sending the WHEEL events from here rather than from
xenfb_legacy_mouse_sync?

Can't we add WHEEL_UP/DOWN to bmap? Unless it is due to a quirk
somewhere, I would store the wheel events in in->buttons or a new field,
then I would send the event to the other end from
xenfb_legacy_mouse_sync. You might have to reset the wheel event in
xenfb_legacy_mouse_sync, because, differently from the buttons, I don't
think we are going to get a corresponding "up" event.


> +break;
> +case INPUT_EVENT_KIND_ABS:
> +move = evt->u.abs.data;
> +in->axis[move->axis] = move->value;
> +break;
> +case INPUT_EVENT_KIND_REL:
> +move = evt->u.rel.data;
> +in->axis[move->axis] += move->value;
> +break;
> +default:
> +break;
> +}
> +}
> +
> +static void xenfb_legacy_mouse_sync(DeviceState *dev)
> +{
> +struct XenInput *in = (struct XenInput *)dev;
> +
> +xenfb_mouse_event(in,
> +  in->axis[INPUT_AXIS_X],
> +  in->axis[INPUT_AXIS_Y],
> +  0,
> +  in->buttons);
> +
> +if (!in->abs_pointer_wanted) {
> +in->axis[INPUT_AXIS_X] = 0;
> +in->axis[INPUT_AXIS_Y] = 0;
> +}

I think we should take the opportunity to rework and simplify
xenfb_mouse_event: we shouldn't keep track of the 

Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base

2017-07-05 Thread Michael S. Tsirkin
On Wed, Jul 05, 2017 at 06:16:17PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > When we receive a GET_VRING_BASE message set enable = false
> > > to stop any new received packets modifying the ring.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > 
> > I think I already reviewed a similar patch.
> 
> Yes you replied to my off-list mail; I hadn't got
> around to fixing it yet.
> 
> > Spec says:
> > Client must only process each ring when it is started.
> 
> but in that reply you said the spec said 
> 
>   Client must only pass data between the ring and the
>   backend, when the ring is enabled.
> 
> So does the spec say 'started' or 'enabled'

Both. Ring processing is limited by ring being started.
Passing actual data - by ring also being enabled.
With ring started but disabled you drop all packets.
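
In other words, a client-side check along these lines (a sketch using
the started/enable field names from the libvhost-user diff below; the
struct and helper themselves are illustrative only):

    #include <stdbool.h>

    typedef struct {
        bool started;   /* cleared by GET_VRING_BASE, see the diff below */
        bool enable;    /* driven by VHOST_USER_SET_VRING_ENABLE */
    } RingStateSketch;

    /* a started-but-disabled ring may be processed, but its packets
     * must be dropped rather than passed to the backend */
    static bool ring_may_pass_data(const RingStateSketch *vq)
    {
        return vq->started && vq->enable;
    }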

> (Pointer to the spec?)

It's part of QEMU source:
docs/interop/vhost-user.txt


> > IMHO the real fix is to fix client to check the started
> > flag before processing the ring.
> 
> Yep I can do that.  I was curious however whether it was
> specified as 'started' or 'enabled' or both.
> 
> Dave
> 
> > > ---
> > >  contrib/libvhost-user/libvhost-user.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/contrib/libvhost-user/libvhost-user.c 
> > > b/contrib/libvhost-user/libvhost-user.c
> > > index ceddeac74f..d37052b7b0 100644
> > > --- a/contrib/libvhost-user/libvhost-user.c
> > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
> > >  vmsg->size = sizeof(vmsg->payload.state);
> > >  
> > >  dev->vq[index].started = false;
> > > +dev->vq[index].enable = false;
> > >  if (dev->iface->queue_set_started) {
> > >  dev->iface->queue_set_started(dev, index, false);
> > >  }
> > > -- 
> > > 2.13.0
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-05 Thread Stefano Stabellini
On Wed, 5 Jul 2017, Paul Durrant wrote:
> > -Original Message-
> > From: Igor Druzhinin
> > Sent: 04 July 2017 17:47
> > To: Paul Durrant ; xen-de...@lists.xenproject.org;
> > qemu-devel@nongnu.org
> > Cc: sstabell...@kernel.org; Anthony Perard ;
> > pbonz...@redhat.com
> > Subject: Re: [PATCH v2 3/4] xen/mapcache: introduce
> > xen_replace_cache_entry()
> > 
> > On 04/07/17 17:42, Paul Durrant wrote:
> > >> -Original Message-
> > >> From: Igor Druzhinin
> > >> Sent: 04 July 2017 17:34
> > >> To: Paul Durrant ; xen-
> > de...@lists.xenproject.org;
> > >> qemu-devel@nongnu.org
> > >> Cc: sstabell...@kernel.org; Anthony Perard
> > ;
> > >> pbonz...@redhat.com
> > >> Subject: Re: [PATCH v2 3/4] xen/mapcache: introduce
> > >> xen_replace_cache_entry()
> > >>
> > >> On 04/07/17 17:27, Paul Durrant wrote:
> >  -Original Message-
> >  From: Igor Druzhinin
> >  Sent: 04 July 2017 16:48
> >  To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
> >  Cc: Igor Druzhinin ; sstabell...@kernel.org;
> >  Anthony Perard ; Paul Durrant
> >  ; pbonz...@redhat.com
> >  Subject: [PATCH v2 3/4] xen/mapcache: introduce
> >  xen_replace_cache_entry()
> > 
> >  This new call is trying to update a requested map cache entry
> >  according to the changes in the physmap. The call is searching
> >  for the entry, unmaps it and maps again at the same place using
> >  a new guest address. If the mapping is dummy this call will
> >  make it real.
> > 
> >  This function makes use of a new xenforeignmemory_map2() call
> >  with an extended interface that was recently introduced in
> >  libxenforeignmemory [1].
> > >>>
> > >>> I don't understand how the compat layer works here. If
> > >> xenforeignmemory_map2() is not available then you can't control the
> > >> placement in virtual address space.
> > >>>
> > >>
> > >> If it's not 4.10 or newer xenforeignmemory_map2() doesn't exist and is
> > >> going to be defined as xenforeignmemory_map(). At the same time
> > >> XEN_COMPAT_PHYSMAP is defined and the entry replace function
> > (which
> > >> relies on xenforeignmemory_map2 functionality) is never going to be
> > called.
> > >>
> > >> If you mean that I should incorporate this into the description I can do 
> > >> it.
> > >
> > > AFAICT XEN_COMPAT_PHYSMAP is not introduced until patch #4 though.
> > >
> > > The problem really comes down to defining xenforeignmemory_map2() in
> > terms of xenforeignmemory_map(). It basically can't be safely done. Could
> > you define xenforeignmemory_map2() as abort() in the compat case
> > instead?
> > >
> > 
> > xen_replace_cache_entry() is not called in patch #3. Which means it's
> > safe to use a fallback version (xenforeignmemory_map) in
> > xen_remap_bucket here.
> 
> I still don't like the fact that the compat definition of 
> xenforeignmemory_map2() loses the extra argument. That's going to catch 
> someone out one day. Is there any way you could re-work it so that 
> xenforeignmemory_map() is used in the cases where the memory placement does 
> not matter?

We could assert(vaddr == NULL) in the compat implementation of
xenforeignmemory_map2. Would that work?
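
Something like this, for instance (sketch only, not a proposed patch):

    #include <assert.h>

    #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
        (assert((a) == NULL), xenforeignmemory_map(h, d, p, ps, ar, e))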



> > Igor
> > 
> > >   Paul
> > >
> > >>
> > >> Igor
> > >>
> > >>>   Paul
> > >>>
> > 
> >  [1] https://www.mail-archive.com/xen-
> > >> de...@lists.xen.org/msg113007.html
> > 
> >  Signed-off-by: Igor Druzhinin 
> >  ---
> >   configure | 18 ++
> >   hw/i386/xen/xen-mapcache.c| 79
> >  ++-
> >   include/hw/xen/xen_common.h   |  7 
> >   include/sysemu/xen-mapcache.h | 11 +-
> >   4 files changed, 106 insertions(+), 9 deletions(-)
> > 
> >  diff --git a/configure b/configure
> >  index c571ad1..ad6156b 100755
> >  --- a/configure
> >  +++ b/configure
> >  @@ -2021,6 +2021,24 @@ EOF
> >   # Xen unstable
> >   elif
> >   cat > $TMPC < >  +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> >  +#include 
> >  +int main(void) {
> >  +  xenforeignmemory_handle *xfmem;
> >  +
> >  +  xfmem = xenforeignmemory_open(0, 0);
> >  +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
> >  +
> >  +  return 0;
> >  +}
> >  +EOF
> >  +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
> >  +  then
> >  +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
> >  +  xen_ctrl_version=41000
> >  +  xen=yes
> >  +elif
> >  +cat > $TMPC < >   #undef XC_WANT_COMPAT_DEVICEMODEL_API
> >   #define __XEN_TOOLS__
> >   #include 
> >  diff --git 

Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/5] spapr: DRC cleanups (part VI)

2017-07-05 Thread Michael Roth
Quoting Daniel Henrique Barboza (2017-07-05 16:53:57)
> 
> 
> On 07/05/2017 08:04 AM, David Gibson wrote:
> > On Tue, Jul 04, 2017 at 06:13:31PM -0300, Daniel Henrique Barboza wrote:
> >> I just tested this patch set on top of current ppc-for-2.10 branch (which
> >> contains
> >> the patches from part V). It applied cleanly but required a couple of
> >> trivial
> >> fixes to build probably because it was made on top of an older code base.
> > Right, I fixed that up locally already, but haven't gotten around to
> > reposting yet.  You can look at the 'drcVI' branch on my github tree,
> > if you're interested.
> >
> >> The trivial migration test worked fine. The libvirt scenario (attaching a
> >> device on
> >> target before migration, try to unplug after migration) isn't working as
> >> expected
> >> but we have a different result with this series. Instead of silently 
> >> failing
> >> to unplug
> >> with error messages on dmesg, the hot unplug works on QEMU level:
> > Thanks for testing.  Just to clarify what you're saying here, you
> > haven't spotted a regression with this series, but there is a case
> > which was broken and is still broken with slightly different
> > symptoms.  Yes?
> In my opinion, yes. It is debatable if the patch series made it worse 
> because
> the guest is now misbehaving, but the feature per se wasn't working
> prior to it.

I think it's the removal of awaiting_allocation. We know currently that
in the libvirt scenario the DRC is exposed in a pre-hotplug state of
ISOLATED/UNALLOCATED. In that state, spapr_drc_detach() completes
immediately because from the perspective of QEMU it apparently has not
been exposed to the guest yet, or the guest has already quiesced it on
its end.

awaiting_allocation guarded against this, as its intention was to make
sure that resource was put into an ALLOCATED state prior to getting moved
back into an UNALLOCATED state, so we didn't immediately unplug a CPU
while the hotplug was in progress.

So in your scenario the CPU is just mysteriously vanishing out from
under the guest, which probably explains the hang. The fix for this
particular scenario is to fix the initial DRC on the target. I think
we have a plan for that so I wouldn't consider this a regression
necessarily.

> 
> >
> >> (qemu) device_del core1
> >> (qemu)
> >> (qemu) info cpus
> >> * CPU #0: nip=0xc00a3e0c thread_id=86162
> >> (qemu) info hotpluggable-cpus
> >> Hotpluggable CPUs:
> >>type: "host-spapr-cpu-core"
> >>vcpus_count: "1"
> >>CPUInstance Properties:
> >>  core-id: "3"
> >>type: "host-spapr-cpu-core"
> >>vcpus_count: "1"
> >>CPUInstance Properties:
> >>  core-id: "2"
> >>type: "host-spapr-cpu-core"
> >>vcpus_count: "1"
> >>CPUInstance Properties:
> >>  core-id: "1"
> >>type: "host-spapr-cpu-core"
> >>vcpus_count: "1"
> >>qom_path: "/machine/unattached/device[0]"
> >>CPUInstance Properties:
> >>  core-id: "0"
> >> (qemu)
> >>
> >>
> >> However, any operation on the guest afterwards (tried with lscpu and dmesg)
> >> seems
> >> to hung the guest. This is what I got when trying to do a dmesg after the
> >> hot unplug:
> > Ouch.  That's bad.  I'll have to look into it.
> >
> > I have rather a lot on my plate at the moment - if you get a chance to
> > work out which of the patches in the series causes this behaviour,
> > that could be handy.
> 
> ***long post warning***
> 
> With the current master and current HEAD of ppc-for-2.10, the behavior in
> the Libvirt scenario (device_add in both source and target before migration,
> hot unplug after migration is completed) is that QEMU fails to hot 
> unplug the
> CPU from the guest OS. lscpu reports the same # of cpus even after 
> device_del,
> dmesg shows an error like this:
> 
> [  108.182291] pseries-hotplug-cpu: CPU with drc index 1008 already 
> exists
> 
> 
With this patch series, QEMU removes the CPU but the guest misbehaves 
> like I said
> in my previous message.
> 
> With the current HEAD of drcVI branch, I rolled back until I found the 
> patch that was
> causing this new symptom. The patch is the very first of the series:
> 
> b752844 spapr: Remove 'awaiting_allocation' DRC flag
> 
> In short, adding this single patch into the HEAD of ppc-for-2.10 is 
> causing this new
> behavior I saw in my tests. This is the retrieved kernel log after the 
> failed unplug:
> 
> [  176.434099] random: crng init done
> [  461.182729] pseries-hotplug-cpu: CPU with drc index 1008 already 
> exists
> [  604.707369] INFO: task kworker/0:2:920 blocked for more than 120 seconds.
> [  604.707666]   Not tainted 4.10.0-26-generic #30-Ubuntu
> [  604.707881] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [  604.708194] kworker/0:2 D0   920  2 0x0800
> [  604.708248] Workqueue: events vmstat_shepherd
> [  604.708251] Call Trace:
> [  604.708265] [c55a7830] [c1492090] 
> 

Re: [Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-05 Thread Stefano Stabellini
On Tue, 4 Jul 2017, Igor Druzhinin wrote:
> This new call is trying to update a requested map cache entry
> according to the changes in the physmap. The call is searching
> for the entry, unmaps it and maps again at the same place using
> a new guest address. If the mapping is dummy this call will
> make it real.
> 
> This function makes use of a new xenforeignmemory_map2() call
> with an extended interface that was recently introduced in
> libxenforeignmemory [1].
> 
> [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html
> 
> Signed-off-by: Igor Druzhinin 
> ---
>  configure | 18 ++
>  hw/i386/xen/xen-mapcache.c| 79 
> ++-
>  include/hw/xen/xen_common.h   |  7 
>  include/sysemu/xen-mapcache.h | 11 +-
>  4 files changed, 106 insertions(+), 9 deletions(-)
> 
> diff --git a/configure b/configure
> index c571ad1..ad6156b 100755
> --- a/configure
> +++ b/configure
> @@ -2021,6 +2021,24 @@ EOF
>  # Xen unstable
>  elif
>  cat > $TMPC <<EOF &&
> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> +#include <xenforeignmemory.h>
> +int main(void) {
> +  xenforeignmemory_handle *xfmem;
> +
> +  xfmem = xenforeignmemory_open(0, 0);
> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
> +
> +  return 0;
> +}
> +EOF
> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
> +  then
> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
> +  xen_ctrl_version=41000
> +  xen=yes
> +elif
> +cat > $TMPC <<EOF &&
>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>  #define __XEN_TOOLS__
>  #include <xendevicemodel.h>
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index cd4e746..a988be7 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
> *opaque)
>  }
>  
>  static void xen_remap_bucket(MapCacheEntry *entry,
> + void *vaddr,
>   hwaddr size,
>   hwaddr address_index,
>   bool dummy)
> @@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  err = g_malloc0(nb_pfn * sizeof (int));
>  
>  if (entry->vaddr_base != NULL) {
> -ram_block_notify_remove(entry->vaddr_base, entry->size);
> +if (entry->vaddr_base != vaddr) {
> +ram_block_notify_remove(entry->vaddr_base, entry->size);
> +}

I would prefer to see checks based on the dummy flag, rather than
entry->vaddr_base != vaddr.


>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>  perror("unmap fails");
>  exit(-1);
> @@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  }
>  
>  if (!dummy) {
> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> -   PROT_READ|PROT_WRITE,
> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
> +   PROT_READ|PROT_WRITE, 0,
> nb_pfn, pfns, err);
>  if (vaddr_base == NULL) {
> -perror("xenforeignmemory_map");
> +perror("xenforeignmemory_map2");
>  exit(-1);
>  }

Can we print a warning if (!dummy && vaddr != NULL)?


>  entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
> @@ -194,7 +197,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>   * We create dummy mappings where we are unable to create a foreign
>   * mapping immediately due to certain circumstances (i.e. on resume 
> now)
>   */
> -vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
> +vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
>MAP_ANON|MAP_SHARED, -1, 0);
>  if (vaddr_base == NULL) {
>  perror("mmap");
> @@ -203,13 +206,16 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
>  }
>  
> +if (entry->vaddr_base == NULL || entry->vaddr_base != vaddr) {
> +ram_block_notify_add(vaddr_base, size);
> +}

Please also check (or check instead) on the dummy flag.


>  entry->vaddr_base = vaddr_base;
>  entry->paddr_index = address_index;
>  entry->size = size;
>  entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned long) 
> *
>  BITS_TO_LONGS(size >> XC_PAGE_SHIFT));
>  
> -ram_block_notify_add(entry->vaddr_base, entry->size);
>  bitmap_zero(entry->valid_mapping, nb_pfn);
>  for (i = 0; i < nb_pfn; i++) {
>  if (!err[i]) {
> @@ -282,14 +288,14 @@ tryagain:
>  if (!entry) {
>  entry = g_malloc0(sizeof (MapCacheEntry));
>  pentry->next = entry;
> -xen_remap_bucket(entry, cache_size, address_index, dummy);
> +xen_remap_bucket(entry, NULL, 

Re: [Qemu-devel] [PATCH v2 4/4] xen: don't use xenstore to save/restore physmap anymore

2017-07-05 Thread Stefano Stabellini
On Tue, 4 Jul 2017, Igor Druzhinin wrote:
> If we have a system with xenforeignmemory_map2() implemented
> we don't need to save/restore physmap on suspend/restore
> anymore. In case we resume a VM without physmap - try to
> recreate the physmap during memory region restore phase and
> remap map cache entries accordingly. The old code is left
> for compatibility reasons.
> 
> Signed-off-by: Igor Druzhinin 
> ---
>  hw/i386/xen/xen-hvm.c   | 48 
> ++---
>  include/hw/xen/xen_common.h |  1 +
>  2 files changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index d259cf7..d24ca47 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -289,6 +289,7 @@ static XenPhysmap *get_physmapping(XenIOState *state,
>  return NULL;
>  }
>  
> +#ifdef XEN_COMPAT_PHYSMAP
>  static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
> ram_addr_t size, void 
> *opaque)
>  {
> @@ -334,6 +335,12 @@ static int xen_save_physmap(XenIOState *state, 
> XenPhysmap *physmap)
>  }
>  return 0;
>  }
> +#else
> +static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
> +{
> +return 0;
> +}
> +#endif
>  
>  static int xen_add_to_physmap(XenIOState *state,
>hwaddr start_addr,
> @@ -368,6 +375,26 @@ go_physmap:
>  DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
>  start_addr, start_addr + size);
>  
> +mr_name = memory_region_name(mr);
> +
> +physmap = g_malloc(sizeof (XenPhysmap));
> +
> +physmap->start_addr = start_addr;
> +physmap->size = size;
> +physmap->name = mr_name;
> +physmap->phys_offset = phys_offset;
> +
> +QLIST_INSERT_HEAD(&state->physmap, physmap, list);
> +
> +if (runstate_check(RUN_STATE_INMIGRATE)) {
> +/* Now when we have a physmap entry we can replace a dummy mapping 
> with
> + * a real one of guest foreign memory. */
> +uint8_t *p = xen_replace_cache_entry(phys_offset, start_addr, size);
> +assert(p && p == memory_region_get_ram_ptr(mr));
> +
> +return 0;
> +}
> +
>  pfn = phys_offset >> TARGET_PAGE_BITS;
>  start_gpfn = start_addr >> TARGET_PAGE_BITS;
>  for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
> @@ -382,17 +409,6 @@ go_physmap:
>  }
>  }
>  
> -mr_name = memory_region_name(mr);
> -
> -physmap = g_malloc(sizeof (XenPhysmap));
> -
> -physmap->start_addr = start_addr;
> -physmap->size = size;
> -physmap->name = mr_name;
> -physmap->phys_offset = phys_offset;
> -
> -QLIST_INSERT_HEAD(&state->physmap, physmap, list);
> -
>  xc_domain_pin_memory_cacheattr(xen_xc, xen_domid,
> start_addr >> TARGET_PAGE_BITS,
> (start_addr + size - 1) >> 
> TARGET_PAGE_BITS,
> @@ -1158,6 +1174,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
>  xs_daemon_close(state->xenstore);
>  }
>  
> +#ifdef XEN_COMPAT_PHYSMAP
>  static void xen_read_physmap(XenIOState *state)
>  {
>  XenPhysmap *physmap = NULL;
> @@ -1205,6 +1222,11 @@ static void xen_read_physmap(XenIOState *state)
>  }
>  free(entries);
>  }
> +#else
> +static void xen_read_physmap(XenIOState *state)
> +{
> +}
> +#endif
>  
>  static void xen_wakeup_notifier(Notifier *notifier, void *data)
>  {
> @@ -1331,7 +1353,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion 
> **ram_memory)
>  state->bufioreq_local_port = rc;
>  
>  /* Init RAM management */
> +#ifdef XEN_COMPAT_PHYSMAP
>  xen_map_cache_init(xen_phys_offset_to_gaddr, state);
> +#else
> +xen_map_cache_init(NULL, state);
> +#endif

This is good. I would also like to #ifdef the

  if (!translated && mapcache->phys_offset_to_gaddr) {

block in xen_map_cache_unlocked


>  xen_ram_init(pcms, ram_size, ram_memory);
>  
>  qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 70a5cad..c04c5c9 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -80,6 +80,7 @@ extern xenforeignmemory_handle *xen_fmem;
>  
>  #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
>  
> +#define XEN_COMPAT_PHYSMAP
>  #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
>  xenforeignmemory_map(h, d, p, ps, ar, e)
>  
> -- 
> 2.7.4
> 



Re: [Qemu-devel] [PATCH v2 2/4] xen/mapcache: add an ability to create dummy mappings

2017-07-05 Thread Stefano Stabellini
On Tue, 4 Jul 2017, Igor Druzhinin wrote:
> Dummys are simple anonymous mappings that are placed instead
> of regular foreign mappings in certain situations when we need
> to postpone the actual mapping but still have to give a
> memory region to QEMU to play with.
> 
> This is planned to be used for restore on Xen.
> 
> Signed-off-by: Igor Druzhinin 
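
The dummy-then-replace idea can be illustrated with plain POSIX mmap
(an editorial sketch, not QEMU code; in the series the in-place
replacement is done with xenforeignmemory_map2(), which behaves much
like MAP_FIXED does here):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t size = 1 << 20;

        /* placeholder handed out while the real backing is unavailable */
        void *dummy = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        if (dummy == MAP_FAILED) {
            perror("mmap dummy");
            return 1;
        }

        /* later: map the real backing at the same address, so pointers
         * previously handed out stay valid */
        void *real = mmap(dummy, size, PROT_READ | PROT_WRITE,
                          MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED, -1, 0);
        if (real == MAP_FAILED) {
            perror("mmap replace");
            return 1;
        }
        memset(real, 0, size);
        printf("placeholder at %p replaced in place\n", real);
        return 0;
    }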

Reviewed-by: Stefano Stabellini 


> ---
>  hw/i386/xen/xen-mapcache.c | 40 
>  1 file changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index e60156c..cd4e746 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -53,6 +53,8 @@ typedef struct MapCacheEntry {
>  uint8_t *vaddr_base;
>  unsigned long *valid_mapping;
>  uint8_t lock;
> +#define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
> +uint8_t flags;
>  hwaddr size;
>  struct MapCacheEntry *next;
>  } MapCacheEntry;
> @@ -150,7 +152,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
> *opaque)
>  
>  static void xen_remap_bucket(MapCacheEntry *entry,
>   hwaddr size,
> - hwaddr address_index)
> + hwaddr address_index,
> + bool dummy)
>  {
>  uint8_t *vaddr_base;
>  xen_pfn_t *pfns;
> @@ -177,11 +180,27 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
>  }
>  
> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, 
> PROT_READ|PROT_WRITE,
> -  nb_pfn, pfns, err);
> -if (vaddr_base == NULL) {
> -perror("xenforeignmemory_map");
> -exit(-1);
> +if (!dummy) {
> +vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> +   PROT_READ|PROT_WRITE,
> +   nb_pfn, pfns, err);
> +if (vaddr_base == NULL) {
> +perror("xenforeignmemory_map");
> +exit(-1);
> +}
> +entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
> +} else {
> +/*
> + * We create dummy mappings where we are unable to create a foreign
> + * mapping immediately due to certain circumstances (i.e. on resume 
> now)
> + */
> +vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
> +  MAP_ANON|MAP_SHARED, -1, 0);
> +if (vaddr_base == NULL) {
> +perror("mmap");
> +exit(-1);
> +}
> +entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
>  }
>  
>  entry->vaddr_base = vaddr_base;
> @@ -211,6 +230,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, 
> hwaddr size,
>  hwaddr cache_size = size;
>  hwaddr test_bit_size;
>  bool translated = false;
> +bool dummy = false;
>  
>  tryagain:
>  address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
> @@ -262,14 +282,14 @@ tryagain:
>  if (!entry) {
>  entry = g_malloc0(sizeof (MapCacheEntry));
>  pentry->next = entry;
> -xen_remap_bucket(entry, cache_size, address_index);
> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>  } else if (!entry->lock) {
>  if (!entry->vaddr_base || entry->paddr_index != address_index ||
>  entry->size != cache_size ||
>  !test_bits(address_offset >> XC_PAGE_SHIFT,
>  test_bit_size >> XC_PAGE_SHIFT,
>  entry->valid_mapping)) {
> -xen_remap_bucket(entry, cache_size, address_index);
> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>  }
>  }
>  
> @@ -282,6 +302,10 @@ tryagain:
>  translated = true;
>  goto tryagain;
>  }
> +if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
> +dummy = true;
> +goto tryagain;
> +}
>  trace_xen_map_cache_return(NULL);
>  return NULL;
>  }
> -- 
> 2.7.4
> 



Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-05 Thread Tian, Kevin
> From: Jean-Philippe Brucker
> Sent: Wednesday, July 5, 2017 8:42 PM
> 
> On 05/07/17 07:45, Tian, Kevin wrote:
> >> From: Liu, Yi L
> >> Sent: Monday, July 3, 2017 6:31 PM
> >>
> >> Hi Jean,
> >>
> >>
> >>>
>  2. Define a structure in include/uapi/linux/iommu.h(newly added
> header
> >> file)
> 
>  struct iommu_tlb_invalidate {
>   __u32   scope;
>  /* pasid-selective invalidation described by @pasid */
>  #define IOMMU_INVALIDATE_PASID   (1 << 0)
>  /* address-selevtive invalidation described by (@vaddr, @size) */
>  #define IOMMU_INVALIDATE_VADDR   (1 << 1)
> >
> > For VT-d above two flags are related. There is no method of flushing
> > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > selective invalidation is valid only for a given pasid. So it's not 
> > appropriate
> > to put them in same level of scope definition at least for VT-d.
> 
> For ARM SMMU the "flush all by VA" operation is valid. Although it's
> unclear at this point if we will ever allow that, it should probably stay
> in the common format, if there is one.

fine in common format. earlier I was thinking whether it should
be in scope. possibly fine after another thinking. :-)

> 
>   __u32   flags;
>  /*  targets non-pasid mappings, @pasid is not valid */
>  #define IOMMU_INVALIDATE_NO_PASID(1 << 0)
> >>>
> >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> >>> using a special mode where PASID 0 is reserved and any traffic without
> >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> >>> to invalidate that special context explicitly. But this means that
> >>> invalidation packet targeted at that context will have "scope = PASID"
> and
> >>> "flags = NO_PASID", which is utterly confusing.
> >>>
> >>> I now think that we should get rid of the
> IOMMU_INVALIDATE_NO_PASID
> >> flag
> >>> and just use PASID 0 to invalidate this context on ARM. I don't think
> >>> other architectures would use the NO_PASID flag anyway, but might be
> >> mistaken.
> >>
> >> I may suggest to keep it so far. On VT-d, we may pass some data in
> opaque,
> >> so
> >> we may work without it. But if other vendor want to issue non-PASID
> tagged
> >> cache, then may encounter problem.
> >
> > I'm worried about what's the criteria which attribute should be abstracted
> > in common structure and which can be left to opaque. It doesn't make
> > much sense to do such abstraction purely because different vendor
> formats
> > have some common fields. Usually we do such abstraction because
> > vendor-agnostic code need to do some common handling before going to
> > vendor specific code. However in this case VFIO is not expected to do
> anything
> > with those IOMMU specific attributes. Then the structure is directly
> forwarded
> > to IOMMU driver, which simply translates the structure into vendor specific
> > opaque data again. Then why bothering to do double translations in Qemu
> > and IOMMU driver side?>
> > Take VT-d for example. Below is a summary of all possible selections
> around
> > invalidation of 1st level structure for svm:
> >
> > Scope: All PASIDs, single PASID
> > for each PASID:
> > all mappings, or page-selective mappings (addr, size)
> > invalidation target:
> > IOTLB entries (leaf)
> > paging structure cache (non-leaf)
> 
> I'm curious, can you invalidate all intermediate paging structures for a
> given PASID without invalidating the leaves?

I don't think so. usually IOTLB flush is the base. one can further
specify whether flush should apply to non-leaves.
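
Gathering the fields referenced in the quoted proposal into one place
(an editorial sketch only; nothing here is a committed uAPI, and the
pasid/vaddr/size members are inferred from the @pasid, @vaddr and
@size annotations above):

    #include <linux/types.h>

    struct iommu_tlb_invalidate_sketch {
            __u32   scope;
    /* pasid-selective invalidation described by @pasid */
    #define IOMMU_INVALIDATE_PASID          (1 << 0)
    /* address-selective invalidation described by (@vaddr, @size) */
    #define IOMMU_INVALIDATE_VADDR          (1 << 1)
            __u32   flags;
    /* targets non-pasid mappings, @pasid is not valid */
    #define IOMMU_INVALIDATE_NO_PASID       (1 << 0)
            __u32   pasid;
            __u64   vaddr;
            __u64   size;
    };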

Thanks
Kevin


Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-05 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, July 6, 2017 1:28 AM
> 
> On Wed, 5 Jul 2017 13:42:03 +0100
> Jean-Philippe Brucker  wrote:
> 
> > On 05/07/17 07:45, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Monday, July 3, 2017 6:31 PM
> > >>
> > >> Hi Jean,
> > >>
> > >>
> > >>>
> >  2. Define a structure in include/uapi/linux/iommu.h(newly added
> header
> > >> file)
> > 
> >  struct iommu_tlb_invalidate {
> > __u32   scope;
> >  /* pasid-selective invalidation described by @pasid */
> >  #define IOMMU_INVALIDATE_PASID (1 << 0)
>  /* address-selective invalidation described by (@vaddr, @size) */
> >  #define IOMMU_INVALIDATE_VADDR (1 << 1)
> > >
> > > For VT-d above two flags are related. There is no method of flushing
> > > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > > selective invalidation is valid only for a given pasid. So it's not 
> > > appropriate
> > > to put them in same level of scope definition at least for VT-d.
> >
> > For ARM SMMU the "flush all by VA" operation is valid. Although it's
> > unclear at this point if we will ever allow that, it should probably stay
> > in the common format, if there is one.
> >
> > __u32   flags;
> >  /*  targets non-pasid mappings, @pasid is not valid */
> >  #define IOMMU_INVALIDATE_NO_PASID  (1 << 0)
> > >>>
> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> > >>> using a special mode where PASID 0 is reserved and any traffic without
> > >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID"
> flag
> > >>> to invalidate that special context explicitly. But this means that
> > >>> invalidation packet targeted at that context will have "scope = PASID"
> and
> > >>> "flags = NO_PASID", which is utterly confusing.
> > >>>
> > >>> I now think that we should get rid of the
> IOMMU_INVALIDATE_NO_PASID
> > >> flag
> > >>> and just use PASID 0 to invalidate this context on ARM. I don't think
> > >>> other architectures would use the NO_PASID flag anyway, but might be
> > >> mistaken.
> > >>
> > >> I may suggest to keep it so far. On VT-d, we may pass some data in
> opaque,
> > >> so
> > >> we may work without it. But if other vendor want to issue non-PASID
> tagged
> > >> cache, then may encounter problem.
> > >
> > > I'm worried about what's the criteria which attribute should be
> abstracted
> > > in common structure and which can be left to opaque. It doesn't make
> > > much sense to do such abstraction purely because different vendor
> formats
> > > have some common fields. Usually we do such abstraction because
> > > vendor-agnostic code need to do some common handling before going to
> > > vendor specific code. However in this case VFIO is not expected to do
> anything
> > > with those IOMMU specific attributes. Then the structure is directly
> forwarded
> > > to IOMMU driver, which simply translates the structure into vendor
> specific
> > > opaque data again. Then why bothering to do double translations in
> Qemu
> > > and IOMMU driver side?>
> > > Take VT-d for example. Below is a summary of all possible selections
> around
> > > invalidation of 1st level structure for svm:
> > >
> > > Scope: All PASIDs, single PASID
> > > for each PASID:
> > >   all mappings, or page-selective mappings (addr, size)
> > > invalidation target:
> > >   IOTLB entries (leaf)
> > >   paging structure cache (non-leaf)
> >
> > I'm curious, can you invalidate all intermediate paging structures for a
> > given PASID without invalidating the leaves?
> >
> > >   PASID cache (pasid->cr3)
> > I guess any implementations that gives the whole PASID table to userspace
> > will need the PASID cache invalidation. This was missing from my proposal
> > since it was from virtio-iommu.
> >
> > > invalidation hint:
> > >   whether global pages are included
> > >   drain reads/writes>
> > > Above are pretty architectural attributes if just looking at functional
> > > purpose. Then if we really consider defining a common structure, it
> > > might be more natural to define a superset of all vendors' capabilities
> > > and remove the opaque field at all. But as said earlier the purpose of
> > > doing such abstraction is not clear if there is no vendor-agnostic
> > > user actually digesting those fields. Then should we reconsider the
> > > full opaque approach?
> > >
> > > Welcome comments since I may overlook something here. :-)
> >
> > I guess on x86 the invalidation packet formats are stable, but for ARM I'm
> > reluctant to deal with vendor-specific formats at the API level, because
> > they tend to be volatile. If a virtual IOMMU version is different from the
> > physical one, then the page table format will be the same but invalidation
> > format will not.
> >
> > So it would be good to define common fields that have the same effects
> > regardless on the underlying pIOMMU. And the 

Re: [Qemu-devel] [PATCH v2 1/4] xen: move physmap saving into a separate function

2017-07-05 Thread Stefano Stabellini
On Tue, 4 Jul 2017, Igor Druzhinin wrote:
> Non-functional change.
> 
> Signed-off-by: Igor Druzhinin 

Unless you change something from a previous version, please retain the
acked-by and reviewed-by that were given (see
alpine.DEB.2.10.1706301629170.2919@sstabellini-ThinkPad-X260).


> ---
>  hw/i386/xen/xen-hvm.c | 57 
> ---
>  1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index cffa7e2..d259cf7 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
>  return start_addr;
>  }
>  
> +static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
> +{
> +char path[80], value[17];
> +
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->start_addr);
> +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> +return -1;
> +}
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
> +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> +return -1;
> +}
> +if (physmap->name) {
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +if (!xs_write(state->xenstore, 0, path,
> +  physmap->name, strlen(physmap->name))) {
> +return -1;
> +}
> +}
> +return 0;
> +}
> +
>  static int xen_add_to_physmap(XenIOState *state,
>hwaddr start_addr,
>ram_addr_t size,
> @@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
>  XenPhysmap *physmap = NULL;
>  hwaddr pfn, start_gpfn;
>  hwaddr phys_offset = memory_region_get_ram_addr(mr);
> -char path[80], value[17];
>  const char *mr_name;
>  
>  if (get_physmapping(state, start_addr, size)) {
> @@ -368,31 +397,7 @@ go_physmap:
> start_addr >> TARGET_PAGE_BITS,
> (start_addr + size - 1) >> 
> TARGET_PAGE_BITS,
> XEN_DOMCTL_MEM_CACHEATTR_WB);
> -
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
> -xen_domid, (uint64_t)phys_offset);
> -snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
> -if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> -return -1;
> -}
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
> -xen_domid, (uint64_t)phys_offset);
> -snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
> -if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> -return -1;
> -}
> -if (mr_name) {
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
> -xen_domid, (uint64_t)phys_offset);
> -if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
> -return -1;
> -}
> -}
> -
> -return 0;
> +return xen_save_physmap(state, physmap);
>  }
>  
>  static int xen_remove_from_physmap(XenIOState *state,
> -- 
> 2.7.4
> 



Re: [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status()

2017-07-05 Thread Eric Blake
On 07/03/2017 05:14 PM, Eric Blake wrote:
> From: Manos Pitsidianakis 
> 
> bdrv_co_get_block_status_from_file() and
> bdrv_co_get_block_status_from_backing() set *file to bs->file and
> bs->backing respectively, so that bdrv_co_get_block_status() can recurse
> to them. Future block drivers won't have to duplicate code to implement
> this.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Manos Pitsidianakis 
> Message-Id: <20170629184320.7151-4-el13...@mail.ntua.gr>
> 

Missing my Signed-off-by if we take this one in isolation through my
series, and still awaiting resolution on what will happen to the rest of
his series (v3 had some valid review comments that still need addressing)

> ---
> v2: Including this patch from Manos, since it affects my later patches;
> however, I anticipate that we will get a full v3 series from Manos
> merged first


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-block] [PATCH v2 00/11] Block layer thread-safety, part 2

2017-07-05 Thread Paolo Bonzini
On 29/06/2017 15:27, Paolo Bonzini wrote:
> This part takes care of drivers and devices, making sure that they can
> accept concurrent I/O from multiple AioContext.
> 
> The following drivers are thread-safe without using any QemuMutex/CoMutex:
> crypto, gluster, null, rbd, win32-aio.  NBD has already been fixed,
> because the patch fixed an unrelated testcase.
> 
> The following drivers already use mutexes for everything except possibly
> snapshots, which do not (yet?) need protection: bochs, cloop, dmg, qcow,
> parallels, vhdx, vmdk, curl, iscsi, nfs.
> 
> The following drivers already use mutexes for _almost_ everything: vpc
> (missing get_block_status), vdi (missing bitmap access), vvfat (missing
> commit, not protected), qcow2 (must call CoQueue APIs under CoMutex).
> They are fixed by patches 1-5.
> 
> The following drivers must be changed to use CoMutex to protect internal
> data: qed (patches 6-9), sheepdog (patch 10).
> 
> The following driver must be changed to support I/O from any AioContext:
> ssh.  It is fixed by patch 11.
> 
> Paolo
> 
> v1->v2: new patch 8 + adjustments to patch 9 to fix qemu-iotests testcase
> 183 (bdrv_invalidate_cache from block migration)
> 
> Paolo Bonzini (11):
>   qcow2: call CoQueue APIs under CoMutex
>   coroutine-lock: add qemu_co_rwlock_downgrade and
> qemu_co_rwlock_upgrade
>   vdi: make it thread-safe
>   vpc: make it thread-safe
>   vvfat: make it thread-safe
>   qed: move tail of qed_aio_write_main to qed_aio_write_{cow,alloc}
>   block: invoke .bdrv_drain callback in coroutine context and from
> AioContext
>   qed: introduce bdrv_qed_init_state
>   qed: protect table cache with CoMutex
>   sheepdog: add queue_lock
>   ssh: support I/O from any AioContext
> 
>  block/io.c |  42 +++--
>  block/qcow2.c  |   4 +-
>  block/qed-cluster.c|   4 +-
>  block/qed-l2-cache.c   |   6 ++
>  block/qed-table.c  |  24 +++--
>  block/qed.c| 214 
> -
>  block/qed.h|  11 ++-
>  block/sheepdog.c   |  21 -
>  block/ssh.c|  24 +++--
>  block/vdi.c|  48 +-
>  block/vpc.c|  20 ++---
>  block/vvfat.c  |   8 +-
>  include/block/block_int.h  |   2 +-
>  include/qemu/coroutine.h   |  18 
>  util/qemu-coroutine-lock.c |  35 
>  15 files changed, 331 insertions(+), 150 deletions(-)
> 


ping?

Paolo
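For readers following along, the per-driver pattern the series applies is
roughly the sketch below. The driver and field names are invented; only the
CoMutex API (qemu_co_mutex_init/lock/unlock) is real:

/* Minimal sketch: serialize access to shared driver state from coroutines. */
#include "qemu/osdep.h"
#include "qemu/coroutine.h"

typedef struct BDRVFooState {
    CoMutex lock;           /* protects the fields below */
    uint64_t free_cluster;  /* example of shared mutable state */
} BDRVFooState;

static void foo_state_init(BDRVFooState *s)
{
    qemu_co_mutex_init(&s->lock);
}

static uint64_t coroutine_fn foo_alloc_cluster(BDRVFooState *s)
{
    uint64_t ret;

    qemu_co_mutex_lock(&s->lock);
    ret = s->free_cluster++;
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}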



Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/5] spapr: DRC cleanups (part VI)

2017-07-05 Thread Daniel Henrique Barboza



On 07/05/2017 08:04 AM, David Gibson wrote:

On Tue, Jul 04, 2017 at 06:13:31PM -0300, Daniel Henrique Barboza wrote:

I just tested this patch set on top of the current ppc-for-2.10 branch (which
contains the patches from part V). It applied cleanly but required a couple of
trivial build fixes, probably because it was made on top of an older code base.

Right, I fixed that up locally already, but haven't gotten around to
reposting yet.  You can look at the 'drcVI' branch on my github tree,
if you're interested.


The trivial migration test worked fine. The libvirt scenario (attaching a
device on the target before migration, then trying to unplug it after
migration) still isn't working as expected, but the result is different with
this series. Instead of silently failing to unplug, with error messages in
dmesg, the hot unplug now succeeds at the QEMU level:

Thanks for testing.  Just to clarify what you're saying here, you
haven't spotted a regression with this series, but there is a case
which was broken and is still broken with slightly different
symptoms.  Yes?
In my opinion, yes. It is debatable whether the patch series made things
worse, because the guest is now misbehaving, but the feature per se wasn't
working before it either.




(qemu) device_del core1
(qemu)
(qemu) info cpus
* CPU #0: nip=0xc00a3e0c thread_id=86162
(qemu) info hotpluggable-cpus
Hotpluggable CPUs:
   type: "host-spapr-cpu-core"
   vcpus_count: "1"
   CPUInstance Properties:
 core-id: "3"
   type: "host-spapr-cpu-core"
   vcpus_count: "1"
   CPUInstance Properties:
 core-id: "2"
   type: "host-spapr-cpu-core"
   vcpus_count: "1"
   CPUInstance Properties:
 core-id: "1"
   type: "host-spapr-cpu-core"
   vcpus_count: "1"
   qom_path: "/machine/unattached/device[0]"
   CPUInstance Properties:
 core-id: "0"
(qemu)


However, any operation on the guest afterwards (I tried lscpu and dmesg)
seems to hang it. This is what I got when running dmesg after the hot unplug:

Ouch.  That's bad.  I'll have to look into it.

I have rather a lot on my plate at the moment - if you get a chance to
work out which of the patches in the series causes this behaviour,
that could be handy.


***long post warning***

With the current master and the current HEAD of ppc-for-2.10, the behavior in
the libvirt scenario (device_add on both source and target before migration,
hot unplug after migration is completed) is that QEMU fails to hot unplug the
CPU from the guest OS. lscpu reports the same number of CPUs even after
device_del, and dmesg shows an error like this:

[  108.182291] pseries-hotplug-cpu: CPU with drc index 1008 already exists



With this patch series, QEMU removes the CPU, but the guest misbehaves as
described in my previous message.

With the current HEAD of the drcVI branch, I rolled back until I found the
patch that causes this new symptom. It is the very first patch of the series:

b752844 spapr: Remove 'awaiting_allocation' DRC flag

In short, adding this single patch on top of the HEAD of ppc-for-2.10 causes
the new behavior I saw in my tests. This is the kernel log retrieved after the
failed unplug:


[  176.434099] random: crng init done
[  461.182729] pseries-hotplug-cpu: CPU with drc index 1008 already exists
[  604.707369] INFO: task kworker/0:2:920 blocked for more than 120 seconds.
[  604.707666]   Not tainted 4.10.0-26-generic #30-Ubuntu
[  604.707881] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  604.708194] kworker/0:2 D0   920  2 0x0800
[  604.708248] Workqueue: events vmstat_shepherd
[  604.708251] Call Trace:
[  604.708265] [c55a7830] [c1492090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
[  604.708271] [c55a7a00] [c001b770] __switch_to+0x2c0/0x450
[  604.708286] [c55a7a60] [c0b51238] __schedule+0x2f8/0x990
[  604.708289] [c55a7b40] [c0b51918] schedule+0x48/0xc0
[  604.708292] [c55a7b70] [c0b51e40] schedule_preempt_disabled+0x20/0x30
[  604.708295] [c55a7b90] [c0b54598] __mutex_lock_slowpath+0x208/0x380
[  604.708310] [c55a7c10] [c00e2ec8] get_online_cpus+0x58/0xa0
[  604.708312] [c55a7c40] [c02a5a18] vmstat_shepherd+0x38/0x160
[  604.708316] [c55a7c90] [c01061a0] process_one_work+0x2b0/0x5a0
[  604.708319] [c55a7d20] [c0106538] worker_thread+0xa8/0x650
[  604.708322] [c55a7dc0] [c010f0a4] kthread+0x164/0x1b0
[  604.708326] [c55a7e30] [c000b4e8] ret_from_kernel_thread+0x5c/0x74
[  604.708341] INFO: task kworker/u8:0:3068 blocked for more than 120 seconds.
[  604.708601]   Not tainted 4.10.0-26-generic #30-Ubuntu
[  604.708809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  604.709107] kworker/u8:0D0  3068  2 0x0800
[  604.709114] Workqueue: pseries hotplug workque pseries_hp_work_fn
[  604.709116] Call Trace:
[  

Re: [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note

2017-07-05 Thread Marc-André Lureau
Hi

On Wed, Jul 5, 2017 at 1:48 AM, Laszlo Ersek  wrote:
> On 06/29/17 15:23, Marc-André Lureau wrote:
>> Read the vmcoreinfo ELF PT_NOTE from guest memory when vmcoreinfo
>> device provides the location, and write it as an ELF note in the dump.
>>
>> There are now 2 possible sources of phys_base information.
>>
>> (1) arch guessed value from arch_dump_info_get()
>
> The function is called cpu_get_dump_info().
>
>> (2) vmcoreinfo ELF note NUMBER(phys_base)= field
>>
>> NUMBER(phys_base) in vmcoreinfo has only been recently introduced
>> in Linux 4.10 (401721ecd1dc "kexec: export the value of phys_base
>> instead of symbol address").
>>
>> Since (2) has better chances to be accurate, the guessed value is
>> replaced by the value from the vmcoreinfo ELF note.
>>
>> The phys_base value is stored in the same dump field locations as
>> before, and may duplicate the information available in the vmcoreinfo
>> ELF PT_NOTE. Crash tools should be prepared to handle this case.
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>  include/sysemu/dump.h |   2 +
>>  dump.c| 117 
>> ++
>>  2 files changed, 119 insertions(+)
>>
>> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
>> index 2672a15f8b..111a7dcaa4 100644
>> --- a/include/sysemu/dump.h
>> +++ b/include/sysemu/dump.h
>> @@ -192,6 +192,8 @@ typedef struct DumpState {
>>* this could be used to calculate
>>* how much work we have
>>* finished. */
>> +uint8_t *vmcoreinfo; /* ELF note content */
>> +size_t vmcoreinfo_size;
>>  } DumpState;
>>
>>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
>> diff --git a/dump.c b/dump.c
>> index d9090a24cc..8fda5cc1ed 100644
>> --- a/dump.c
>> +++ b/dump.c
>> @@ -26,6 +26,8 @@
>>  #include "qapi/qmp/qerror.h"
>>  #include "qmp-commands.h"
>>  #include "qapi-event.h"
>> +#include "qemu/error-report.h"
>> +#include "hw/acpi/vmcoreinfo.h"
>>
>>  #include 
>>  #ifdef CONFIG_LZO
>> @@ -38,6 +40,11 @@
>>  #define ELF_MACHINE_UNAME "Unknown"
>>  #endif
>>
>> +#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
>> +((DIV_ROUND_UP((hdr_size), 4)   \
>> +  + DIV_ROUND_UP((name_size), 4)\
>> +  + DIV_ROUND_UP((desc_size), 4)) * 4)
>> +
>
> This looks really useful to me, but (I think?) we generally leave the
> operator hanging at the end of the line:
>
> #define ELF_NOTE_SIZE(hdr_size, name_size, desc_size) \
> ((DIV_ROUND_UP((hdr_size), 4) +   \
>   DIV_ROUND_UP((name_size), 4) +  \
>   DIV_ROUND_UP((desc_size), 4)) * 4)

ok
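(As a worked example, with numbers of my own rather than from the patch: with
a 12-byte Elf64_Nhdr, an 11-byte name ("VMCOREINFO" plus its terminating NUL)
and a 100-byte descriptor, the macro yields (3 + 3 + 25) * 4 = 124 bytes, i.e.
each component is rounded up to a 4-byte boundary before the three are summed.)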

>
>>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
>>  {
>>  if (s->dump_info.d_endian == ELFDATA2LSB) {
>> @@ -76,6 +83,8 @@ static int dump_cleanup(DumpState *s)
>>  guest_phys_blocks_free(&s->guest_phys_blocks);
>>  memory_mapping_list_free(&s->list);
>>  close(s->fd);
>> +g_free(s->vmcoreinfo);
>> +s->vmcoreinfo = NULL;
>>  if (s->resume) {
>>  if (s->detached) {
>>  qemu_mutex_lock_iothread();
>> @@ -235,6 +244,19 @@ static inline int cpu_index(CPUState *cpu)
>>  return cpu->cpu_index + 1;
>>  }
>>
>> +static void write_vmcoreinfo_note(WriteCoreDumpFunction f, DumpState *s,
>> +  Error **errp)
>> +{
>> +int ret;
>> +
>> +if (s->vmcoreinfo) {
>> +ret = f(s->vmcoreinfo, s->vmcoreinfo_size, s);
>> +if (ret < 0) {
>> +error_setg(errp, "dump: failed to write vmcoreinfo");
>> +}
>> +}
>> +}
>> +
>>  static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>>Error **errp)
>>  {
>> @@ -258,6 +280,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, 
>> DumpState *s,
>>  return;
>>  }
>>  }
>> +
>> +write_vmcoreinfo_note(f, s, errp);
>>  }
>>
>>  static void write_elf32_note(DumpState *s, Error **errp)
>> @@ -303,6 +327,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, 
>> DumpState *s,
>>  return;
>>  }
>>  }
>> +
>> +write_vmcoreinfo_note(f, s, errp);
>>  }
>>
>
> Wait, I'm confused again. You explained why it was OK to hook this logic
> into the kdump handling too, but I don't think I understand your
> explanation, so let me repeat my confusion below :)
>
> In the ELF case, this code works fine, I think. As long as the guest
> provided us with a well-formed note, a well-formed note will be appended
> to the ELF dump.
>
> But, this code is also invoked in the kdump case, and I don't understand
> why that's a good thing. If I understand the next patch correctly, the
> kdump format already provides crash with a (trimmed) copy of the guest
> kernel's vmcoreinfo note. So in the kdump case, why do we have to create
> yet 

Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime

2017-07-05 Thread Alex Bennée

Peter Maydell  writes:

> On 5 July 2017 at 20:30, Alex Bennée  wrote:
>>
>> Peter Maydell  writes:
>>
>>> On 5 July 2017 at 17:01, Alex Bennée  wrote:
 An interesting bug was reported on #qemu today. It was bisected to
 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
 with taskset -c 0. Originally the fingers were pointed at mttcg but it
 occurs in both single and multi-threaded modes.

 I think the problem is qemu_system_reset_request() is certainly racy
 when resetting a running CPU. AFAICT:

   - Guest resets board, writing to some hw address (e.g.
 arm_sysctl_write)
   - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
   - We exit iowrite and drop the BQL
   - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
   - we start writing new values to CPU env while still in TCG code
   - CHAOS!

 The general solution for this is to ensure this sort of task is done
 as safe work in the CPU's context when we know nothing else is running.
 It seems this is probably best done by modifying
 qemu_system_reset_request to queue work up on current_cpu and execute it
 as safe work - I don't think the vl.c thread should ever be messing
 about with calling cpu_reset directly.
>>>
>>> My first thought is that qemu_system_reset() should absolutely
>>> stop every CPU (or other runnable thing like a DMA agent) in the
>>> system.
>>
>> Are all these reset calls system wide though?
>
> It's called 'system_reset' because it resets the entire system...
>
>> After all with PSCI you
>> can bring individual cores up and down. I appreciate the vexpress stuff
>> pre-dates those well defined semantics though.
>
> It's individual core reset that's a more ad-hoc afterthought,
> really.
>
>> vm_stop certainly tries to deal with things gracefully as well as send
>> qapi events, drain IO queues and the rest of it. My only concern is it
>> handles two cases - external vm_stops and those from the current CPU.
>>
>> I think it may be cleaner for CPU originated halts to use the
>> async_safe_run_on_cpu() mechanism.
>
> System reset already has an async component to it -- you call
> qemu_system_reset_request(), which just says "schedule a system
> reset as soon as convenient". qemu_system_reset() is the thing
> that runs later and actually does the job (from the io thread,
> not the CPU thread).
>
> Looking more closely at the vl.c code, it looks like it
> calls pause_all_vcpus() before calling qemu_system_reset():
> shouldn't that be pausing all the TCG CPUs?

Looking deeper, it seems cpu_stop_current() is doing the wrong thing.
Because it sets cpu->stopped, the pause_all_vcpus() call in the vl.c thread
doesn't wait for it.

I suspect it should really be doing a cpu_loop_exit. I'll see if I can
work up a patch.
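For reference, the safe-work approach discussed above would look roughly like
the sketch below. async_safe_run_on_cpu() and RUN_ON_CPU_NULL are the existing
helpers; the two function names and the body are invented for illustration:

/* Rough sketch only: queue the reset as safe work on the vCPU that
 * requested it, so the actual reset runs once every vCPU has left the
 * TCG execution loop instead of racing with it. */
#include "qemu/osdep.h"
#include "qom/cpu.h"

static void system_reset_safe_work(CPUState *cpu, run_on_cpu_data data)
{
    /* All vCPUs are quiescent here; it is now safe to call the device
     * and CPU reset code (e.g. qemu_devices_reset()). */
}

static void system_reset_request_from_guest(CPUState *cpu)
{
    async_safe_run_on_cpu(cpu, system_reset_safe_work, RUN_ON_CPU_NULL);
}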

>
> thanks
> -- PMM


--
Alex Bennée



Re: [Qemu-devel] [PATCH 12/17] migration: add postcopy migration of dirty bitmaps

2017-07-05 Thread John Snow


On 07/05/2017 05:24 AM, Vladimir Sementsov-Ogievskiy wrote:
> 16.02.2017 16:04, Fam Zheng wrote:
>>> +dbms->node_name = bdrv_get_node_name(bs);
>>> +if (!dbms->node_name || dbms->node_name[0] == '\0') {
>>> +dbms->node_name = bdrv_get_device_name(bs);
>>> +}
>>> +dbms->bitmap = bitmap;
>> What protects the case that the bitmap is released before migration
>> completes?
>>
> What is the source of such deletion? qmp command? Theoretically possible.
> 
> I see the following variants:
> 
> 1. additional variable BdrvDirtyBitmap.migration, which forbids bitmap
> deletion
> 
> 2. make bitmap anonymous (bdrv_dirty_bitmap_make_anon) - it will not be
> available through qmp
> 

Making the bitmap anonymous would prevent us from querying it, and there is
no general reason to forbid that, beyond the idea that a third party trying
to use the bitmap during a migration is probably a bad idea. I don't really
like the idea of "hiding" information from the user, though, because then
we'd have to worry about name collisions when we de-anonymize the bitmap
again. That's not very palatable.

> what do you think?
> 

The modes for bitmaps are getting messy.

As a reminder, the officially exposed "modes" of a bitmap are currently:

FROZEN: Cannot be reset/deleted. Implication is that the bitmap is
otherwise "ACTIVE."
DISABLED: Not recording any writes (by choice.)
ACTIVE: Actively recording writes.

These are documented in the public API as possibilities for
DirtyBitmapStatus in block-core.json. We didn't add a new condition for
"readonly" either, which I think is actually required:

READONLY: Not recording any writes (by necessity.)


Your new use case here sounds like Frozen to me, but it simply does not
have an anonymous successor to force it to be recognized as "frozen." We
can add a `bool protected` or `bool frozen` field to force recognition
of this status and adjust the json documentation accordingly.

I think then we'd have four recognized states:

FROZEN: Cannot be reset/deleted. Bitmap is in-use by a block job or
other internal process. Bitmap is otherwise ACTIVE.
DISABLED: Not recording any writes (by choice.)
READONLY: Not able to record any writes (by necessity.)
ACTIVE: Normal bitmap status.

Sound right?
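As a purely illustrative sketch of that split (the real definition would live
in the QAPI schema in block-core.json, not in a C enum like this):

typedef enum DirtyBitmapStatusSketch {
    BITMAP_STATUS_ACTIVE,   /* normal: recording writes */
    BITMAP_STATUS_DISABLED, /* not recording writes, by choice */
    BITMAP_STATUS_READONLY, /* not able to record writes, by necessity */
    BITMAP_STATUS_FROZEN,   /* in use by a job or migration; no reset/delete */
} DirtyBitmapStatusSketch;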



Re: [Qemu-devel] [PATCH v5 03/13] char: chardevice hotswap

2017-07-05 Thread Paolo Bonzini

> So instead we'll need to use proper locks in each of the front-ends?

Hi,

the only front-end actually writing from multiple threads is the monitor.
You can skip everything else, as it will be locked on the "big QEMU lock".

Paolo

> Or do you mean that it can be skipped for most of them? I don't know
> about all possible threading cases.
> e.g. for serial/virtio-serial? Will they always share the same thread
> with hmp/qmp driven chardev-change command? And won't yield and hotswap
> in the middle of some write handler?
> 
> /Anton
> 



[Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors

2017-07-05 Thread Eric Blake
bdrv_is_allocated_above() was relying on intermediate->total_sectors,
which is a field that can have stale contents depending on the value
of intermediate->has_variable_length.  An audit shows that we are safe
(we were first calling through bdrv_co_get_block_status() which in
turn calls bdrv_nb_sectors() and therefore just refreshed the current
length), but it's nicer to favor our accessor functions to avoid having
to repeat such an audit, even if it means refresh_total_sectors() is
called more frequently.

Suggested-by: John Snow 
Signed-off-by: Eric Blake 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: new patch
---
 block/io.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index cb40069..fb8d1c7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1952,6 +1952,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 intermediate = top;
 while (intermediate && intermediate != base) {
 int64_t pnum_inter;
+int64_t size_inter;
 int psectors_inter;

 ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
@@ -1969,13 +1970,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,

 /*
  * [sector_num, nb_sectors] is unallocated on top but intermediate
- * might have
- *
- * [sector_num+x, nr_sectors] allocated.
+ * might have [sector_num+x, nb_sectors-x] allocated.
  */
+size_inter = bdrv_nb_sectors(intermediate);
+if (size_inter < 0) {
+return size_inter;
+}
 if (n > psectors_inter &&
-(intermediate == top ||
- sector_num + psectors_inter < intermediate->total_sectors)) {
+(intermediate == top || sector_num + psectors_inter < size_inter)) 
{
 n = psectors_inter;
 }

-- 
2.9.4




[Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of backups to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are cluster-aligned).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v2-v4: no change
---
 block/backup.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c029d44..04def91 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -370,11 +370,10 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 int ret = 0;
 int clusters_per_iter;
 uint32_t granularity;
-int64_t sector;
+int64_t offset;
 int64_t cluster;
 int64_t end;
 int64_t last_cluster = -1;
-int64_t sectors_per_cluster = cluster_size_sectors(job);
 BdrvDirtyBitmapIter *dbi;

 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
@@ -382,8 +381,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);

 /* Find the next dirty sector(s) */
-while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
-cluster = sector / sectors_per_cluster;
+while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
+cluster = offset / job->cluster_size;

 /* Fake progress updates for any clusters we skipped */
 if (cluster != last_cluster + 1) {
@@ -410,7 +409,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 /* If the bitmap granularity is smaller than the backup granularity,
  * we need to advance the iterator pointer to the next cluster. */
 if (granularity < job->cluster_size) {
-bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
+bdrv_set_dirty_iter(dbi,
+cluster * job->cluster_size / 
BDRV_SECTOR_SIZE);
 }

 last_cluster = cluster - 1;
@@ -432,17 +432,15 @@ static void coroutine_fn backup_run(void *opaque)
 BackupBlockJob *job = opaque;
 BackupCompleteData *data;
 BlockDriverState *bs = blk_bs(job->common.blk);
-int64_t start, end;
+int64_t offset;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
 int ret = 0;

 QLIST_INIT(&job->inflight_reqs);
 qemu_co_rwlock_init(&job->flush_rwlock);

-start = 0;
-end = DIV_ROUND_UP(job->common.len, job->cluster_size);
-
-job->done_bitmap = bitmap_new(end);
+job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
+   job->cluster_size));

 job->before_write.notify = backup_before_write_notify;
 bdrv_add_before_write_notifier(bs, &job->before_write);
@@ -457,7 +455,8 @@ static void coroutine_fn backup_run(void *opaque)
 ret = backup_run_incremental(job);
 } else {
 /* Both FULL and TOP SYNC_MODE's require copying.. */
-for (; start < end; start++) {
+for (offset = 0; offset < job->common.len;
+ offset += job->cluster_size) {
 bool error_is_read;
 int alloced = 0;

@@ -480,8 +479,8 @@ static void coroutine_fn backup_run(void *opaque)
  * needed but at some point that is always the case. */
 alloced =
 bdrv_is_allocated(bs,
-start * sectors_per_cluster + i,
-sectors_per_cluster - i, &n);
+  (offset >> BDRV_SECTOR_BITS) + i,
+  sectors_per_cluster - i, &n);
 i += n;

 if (alloced || n == 0) {
@@ -499,9 +498,8 @@ static void coroutine_fn backup_run(void *opaque)
 if (alloced < 0) {
 ret = alloced;
 } else {
-ret = backup_do_cow(job, start * job->cluster_size,
-job->cluster_size, &error_is_read,
-false);
+ret = backup_do_cow(job, offset, job->cluster_size,
+&error_is_read, false);
 }
 if (ret < 0) {
 /* Depending on error action, fail now or retry cluster */
@@ -510,7 +508,7 @@ static void coroutine_fn backup_run(void *opaque)
 if (action == BLOCK_ERROR_ACTION_REPORT) {
 break;
 } else {
-start--;
+offset -= job->cluster_size;
 continue;
 }
 }
-- 
2.9.4




[Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to
tracking progress.  Drop a redundant local variable bytes_per_cluster.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v2-v4: no change
---
 block/backup.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 06431ac..4e64710 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -39,7 +39,7 @@ typedef struct BackupBlockJob {
 BlockdevOnError on_source_error;
 BlockdevOnError on_target_error;
 CoRwlock flush_rwlock;
-uint64_t sectors_read;
+uint64_t bytes_read;
 unsigned long *done_bitmap;
 int64_t cluster_size;
 bool compress;
@@ -102,16 +102,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 void *bounce_buffer = NULL;
 int ret = 0;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
-int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
-int64_t start, end;
-int n;
+int64_t start, end; /* clusters */
+int n; /* bytes */

 qemu_co_rwlock_rdlock(&job->flush_rwlock);

 start = sector_num / sectors_per_cluster;
 end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+trace_backup_do_cow_enter(job, start * job->cluster_size,
   sector_num * BDRV_SECTOR_SIZE,
   nb_sectors * BDRV_SECTOR_SIZE);

@@ -120,28 +119,27 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,

 for (; start < end; start++) {
 if (test_bit(start, job->done_bitmap)) {
-trace_backup_do_cow_skip(job, start * bytes_per_cluster);
+trace_backup_do_cow_skip(job, start * job->cluster_size);
 continue; /* already copied */
 }

-trace_backup_do_cow_process(job, start * bytes_per_cluster);
+trace_backup_do_cow_process(job, start * job->cluster_size);

-n = MIN(sectors_per_cluster,
-job->common.len / BDRV_SECTOR_SIZE -
-start * sectors_per_cluster);
+n = MIN(job->cluster_size,
+job->common.len - start * job->cluster_size);

 if (!bounce_buffer) {
 bounce_buffer = blk_blockalign(blk, job->cluster_size);
 }
 iov.iov_base = bounce_buffer;
-iov.iov_len = n * BDRV_SECTOR_SIZE;
+iov.iov_len = n;
 qemu_iovec_init_external(_qiov, , 1);

 ret = blk_co_preadv(blk, start * job->cluster_size,
 bounce_qiov.size, _qiov,
 is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
+trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
@@ -157,7 +155,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
0);
 }
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, 
ret);
+trace_backup_do_cow_write_fail(job, start * job->cluster_size, 
ret);
 if (error_is_read) {
 *error_is_read = false;
 }
@@ -169,8 +167,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 /* Publish progress, guest I/O counts as progress too.  Note that the
  * offset field is an opaque progress value, it is not a disk offset.
  */
-job->sectors_read += n;
-job->common.offset += n * BDRV_SECTOR_SIZE;
+job->bytes_read += n;
+job->common.offset += n;
 }

 out:
@@ -363,9 +361,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob 
*job)
  */
 if (job->common.speed) {
 uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-  job->sectors_read *
-  BDRV_SECTOR_SIZE);
-job->sectors_read = 0;
+  job->bytes_read);
+job->bytes_read = 0;
 block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
 } else {
 block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
-- 
2.9.4




Re: [Qemu-devel] [PATCH v11 6/6] trace: [trivial] Statically enable all guest events

2017-07-05 Thread Eric Blake
On 07/04/2017 03:54 AM, Lluís Vilanova wrote:
> The existing optimizations make it feasible to have them available on all
> builds.

While this change may feel trivial, I think it is a misnomer to include
"[trivial]" in the subject line, and I also think it should not go in
through qemu-trivial.  The ideal trivial patch is one that can be
applied in isolation, but your patch can only be applied as part of a
series that includes the earlier optimizations that made this one possible.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).
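As a concrete illustration of the new cluster rounding (the numbers are my
own, not from the patch), assuming a 64 KiB cluster size:

/* Sketch: how backup_do_cow()'s start/end are derived from a byte request. */
#include "qemu/osdep.h"

int main(void)
{
    int64_t cluster_size = 64 * 1024;
    int64_t offset = 100000, bytes = 50000;
    int64_t start = QEMU_ALIGN_DOWN(offset, cluster_size);       /* 65536 */
    int64_t end = QEMU_ALIGN_UP(offset + bytes, cluster_size);   /* 196608 */

    /* The copy loop then walks (end - start) / cluster_size = 2 clusters. */
    return !(start == 65536 && end == 196608);
}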

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v2-v4: no change
---
 block/backup.c | 62 --
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index cfbd921..c029d44 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -91,7 +91,7 @@ static void cow_request_end(CowRequest *req)
 }

 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-  int64_t sector_num, int nb_sectors,
+  int64_t offset, uint64_t bytes,
   bool *error_is_read,
   bool is_write_notifier)
 {
@@ -101,34 +101,28 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 QEMUIOVector bounce_qiov;
 void *bounce_buffer = NULL;
 int ret = 0;
-int64_t sectors_per_cluster = cluster_size_sectors(job);
-int64_t start, end; /* clusters */
+int64_t start, end; /* bytes */
 int n; /* bytes */

 qemu_co_rwlock_rdlock(&job->flush_rwlock);

-start = sector_num / sectors_per_cluster;
-end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
+end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);

-trace_backup_do_cow_enter(job, start * job->cluster_size,
-  sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE);
+trace_backup_do_cow_enter(job, start, offset, bytes);

-wait_for_overlapping_requests(job, start * job->cluster_size,
-  end * job->cluster_size);
-cow_request_begin(&cow_request, job, start * job->cluster_size,
-  end * job->cluster_size);
+wait_for_overlapping_requests(job, start, end);
+cow_request_begin(&cow_request, job, start, end);

-for (; start < end; start++) {
-if (test_bit(start, job->done_bitmap)) {
-trace_backup_do_cow_skip(job, start * job->cluster_size);
+for (; start < end; start += job->cluster_size) {
+if (test_bit(start / job->cluster_size, job->done_bitmap)) {
+trace_backup_do_cow_skip(job, start);
 continue; /* already copied */
 }

-trace_backup_do_cow_process(job, start * job->cluster_size);
+trace_backup_do_cow_process(job, start);

-n = MIN(job->cluster_size,
-job->common.len - start * job->cluster_size);
+n = MIN(job->cluster_size, job->common.len - start);

 if (!bounce_buffer) {
 bounce_buffer = blk_blockalign(blk, job->cluster_size);
@@ -137,11 +131,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 iov.iov_len = n;
 qemu_iovec_init_external(&bounce_qiov, &iov, 1);

-ret = blk_co_preadv(blk, start * job->cluster_size,
-bounce_qiov.size, &bounce_qiov,
+ret = blk_co_preadv(blk, start, bounce_qiov.size, &bounce_qiov,
 is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
+trace_backup_do_cow_read_fail(job, start, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
@@ -149,22 +142,22 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 }

 if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
-ret = blk_co_pwrite_zeroes(job->target, start * job->cluster_size,
+ret = blk_co_pwrite_zeroes(job->target, start,
bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
 } else {
-ret = blk_co_pwritev(job->target, start * job->cluster_size,
+ret = blk_co_pwritev(job->target, start,
  bounce_qiov.size, &bounce_qiov,
  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
0);
 }
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start * job->cluster_size, 
ret);
+trace_backup_do_cow_write_fail(job, start, ret);
 if (error_is_read) {
 *error_is_read = false;
 }
 goto out;
 }

-set_bit(start, job->done_bitmap);
+set_bit(start / job->cluster_size, job->done_bitmap);

 /* Publish progress, guest I/O counts as progress too.  Note that the
  * offset field is an opaque progress value, it is not a disk offset.
@@ -180,8 +173,7 @@ out:

 cow_request_end(&cow_request);

-trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
-

[Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters()

2017-07-05 Thread Eric Blake
Now that the last user [mirror_iteration()] has converted to using
bytes, we no longer need a function to round sectors to clusters.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v3-v4: no change
v2: hoist to earlier series, no change
---
 include/block/block.h |  4 
 block/io.c| 21 -
 2 files changed, 25 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index f432ea6..a9dc753 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -472,10 +472,6 @@ const char *bdrv_get_device_or_node_name(const 
BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors);
 void bdrv_round_to_clusters(BlockDriverState *bs,
 int64_t offset, unsigned int bytes,
 int64_t *cluster_offset,
diff --git a/block/io.c b/block/io.c
index 9186c5a..f662c97 100644
--- a/block/io.c
+++ b/block/io.c
@@ -419,27 +419,6 @@ static void mark_request_serialising(BdrvTrackedRequest 
*req, uint64_t align)
 }

 /**
- * Round a region to cluster boundaries (sector-based)
- */
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors)
-{
-BlockDriverInfo bdi;
-
-if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-*cluster_sector_num = sector_num;
-*cluster_nb_sectors = nb_sectors;
-} else {
-int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
-*cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
-*cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
-nb_sectors, c);
-}
-}
-
-/**
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-- 
2.9.4




[Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and add mirror_clip_bytes() as a
counterpart to mirror_clip_sectors().  Some of the conversion is
a bit tricky, requiring temporaries to convert between units; it
will be cleared up in a following patch.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: tweak mirror_clip_bytes() signature to match previous patch
---
 block/mirror.c | 64 ++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 70682b6..374cefd 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,6 +176,15 @@ static void mirror_read_complete(void *opaque, int ret)
 aio_context_release(blk_get_aio_context(s->common.blk));
 }

+/* Clip bytes relative to offset to not exceed end-of-file */
+static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
+int64_t offset,
+int64_t bytes)
+{
+return MIN(bytes, s->bdev_length - offset);
+}
+
+/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
 static inline int mirror_clip_sectors(MirrorBlockJob *s,
   int64_t sector_num,
   int nb_sectors)
@@ -184,44 +193,39 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

-/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
- * return the offset of the adjusted tail sector against original. */
-static int mirror_cow_align(MirrorBlockJob *s,
-int64_t *sector_num,
-int *nb_sectors)
+/* Round offset and/or bytes to target cluster if COW is needed, and
+ * return the offset of the adjusted tail against original. */
+static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
+unsigned int *bytes)
 {
 bool need_cow;
 int ret = 0;
-int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
-int64_t align_sector_num = *sector_num;
-int align_nb_sectors = *nb_sectors;
-int max_sectors = chunk_sectors * s->max_iov;
+int64_t align_offset = *offset;
+unsigned int align_bytes = *bytes;
+int max_bytes = s->granularity * s->max_iov;

-need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
-need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
+need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
+need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
   s->cow_bitmap);
 if (need_cow) {
-bdrv_round_sectors_to_clusters(blk_bs(s->target), *sector_num,
-   *nb_sectors, &align_sector_num,
-   &align_nb_sectors);
+bdrv_round_to_clusters(blk_bs(s->target), *offset, *bytes,
+   &align_offset, &align_bytes);
 }

-if (align_nb_sectors > max_sectors) {
-align_nb_sectors = max_sectors;
+if (align_bytes > max_bytes) {
+align_bytes = max_bytes;
 if (need_cow) {
-align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
-   s->target_cluster_size >>
-   BDRV_SECTOR_BITS);
+align_bytes = QEMU_ALIGN_DOWN(align_bytes,
+  s->target_cluster_size);
 }
 }
-/* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
+/* Clipping may result in align_bytes unaligned to chunk boundary, but
  * that doesn't matter because it's already the end of source image. */
-align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
-   align_nb_sectors);
+align_bytes = mirror_clip_bytes(s, align_offset, align_bytes);

-ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
-*sector_num = align_sector_num;
-*nb_sectors = align_nb_sectors;
+ret = align_offset + align_bytes - (*offset + *bytes);
+*offset = align_offset;
+*bytes = align_bytes;
 assert(ret >= 0);
 return ret;
 }
@@ -257,10 +261,18 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
 nb_sectors = MIN(max_sectors, nb_sectors);
 assert(nb_sectors);
+assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
 ret = nb_sectors;

 if (s->cow_bitmap) {
-ret += mirror_cow_align(s, &sector_num, &nb_sectors);
+int64_t offset = sector_num * BDRV_SECTOR_SIZE;
+unsigned int bytes = nb_sectors * 

[Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
the public interface to backup jobs (no semantic change), including
a change to CowRequest to track by bytes instead of cluster indices.

Note that this does not change the difference between the public
interface (starting point, and size of the subsequent range) and
the internal interface (starting and end points).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Xie Changlong 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: change a couple more parameter names
---
 include/block/block_backup.h | 11 +--
 block/backup.c   | 33 -
 block/replication.c  | 12 
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 8a75947..994a3bd 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -21,17 +21,16 @@
 #include "block/block_int.h"

 typedef struct CowRequest {
-int64_t start;
-int64_t end;
+int64_t start_byte;
+int64_t end_byte;
 QLIST_ENTRY(CowRequest) list;
 CoQueue wait_queue; /* coroutines blocked on this request */
 } CowRequest;

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-  int nb_sectors);
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+  uint64_t bytes);
 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-  int64_t sector_num,
-  int nb_sectors);
+  int64_t offset, uint64_t bytes);
 void backup_cow_request_end(CowRequest *req);

 void backup_do_checkpoint(BlockJob *job, Error **errp);
diff --git a/block/backup.c b/block/backup.c
index 4e64710..cfbd921 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -55,7 +55,7 @@ static inline int64_t cluster_size_sectors(BackupBlockJob 
*job)

 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
-   int64_t start,
+   int64_t offset,
int64_t end)
 {
 CowRequest *req;
@@ -64,7 +64,7 @@ static void coroutine_fn 
wait_for_overlapping_requests(BackupBlockJob *job,
 do {
 retry = false;
 QLIST_FOREACH(req, &job->inflight_reqs, list) {
-if (end > req->start && start < req->end) {
+if (end > req->start_byte && offset < req->end_byte) {
 qemu_co_queue_wait(&req->wait_queue, NULL);
 retry = true;
 break;
@@ -75,10 +75,10 @@ static void coroutine_fn 
wait_for_overlapping_requests(BackupBlockJob *job,

 /* Keep track of an in-flight request */
 static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
- int64_t start, int64_t end)
+  int64_t offset, int64_t end)
 {
-req->start = start;
-req->end = end;
+req->start_byte = offset;
+req->end_byte = end;
 qemu_co_queue_init(&req->wait_queue);
 QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
 }
@@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   sector_num * BDRV_SECTOR_SIZE,
   nb_sectors * BDRV_SECTOR_SIZE);

-wait_for_overlapping_requests(job, start, end);
-cow_request_begin(&cow_request, job, start, end);
+wait_for_overlapping_requests(job, start * job->cluster_size,
+  end * job->cluster_size);
+cow_request_begin(&cow_request, job, start * job->cluster_size,
+  end * job->cluster_size);

 for (; start < end; start++) {
 if (test_bit(start, job->done_bitmap)) {
@@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 bitmap_zero(backup_job->done_bitmap, len);
 }

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-  int nb_sectors)
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+  uint64_t bytes)
 {
 BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
 int64_t start, end;

 assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);

-start = sector_num / sectors_per_cluster;
-end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
+end = 

[Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: rebase to earlier changes
---
 block/mirror.c | 75 ++
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 374cefd..d3325d0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -196,7 +196,7 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
-unsigned int *bytes)
+uint64_t *bytes)
 {
 bool need_cow;
 int ret = 0;
@@ -204,6 +204,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
*offset,
 unsigned int align_bytes = *bytes;
 int max_bytes = s->granularity * s->max_iov;

+assert(*bytes < INT_MAX);
 need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
 need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
   s->cow_bitmap);
@@ -239,59 +240,50 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
 }

 /* Submit async read while handling COW.
- * Returns: The number of sectors copied after and including sector_num,
- *  excluding any sectors copied prior to sector_num due to alignment.
- *  This will be nb_sectors if no alignment is necessary, or
- *  (new_end - sector_num) if tail is rounded up or down due to
+ * Returns: The number of bytes copied after and including offset,
+ *  excluding any bytes copied prior to offset due to alignment.
+ *  This will be @bytes if no alignment is necessary, or
+ *  (new_end - offset) if tail is rounded up or down due to
  *  alignment or buffer limit.
  */
-static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
-  int nb_sectors)
+static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
+   uint64_t bytes)
 {
 BlockBackend *source = s->common.blk;
-int sectors_per_chunk, nb_chunks;
-int ret;
+int nb_chunks;
+uint64_t ret;
 MirrorOp *op;
-int max_sectors;
+uint64_t max_bytes;

-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-max_sectors = sectors_per_chunk * s->max_iov;
+max_bytes = s->granularity * s->max_iov;

 /* We can only handle as much as buf_size at a time. */
-nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
-nb_sectors = MIN(max_sectors, nb_sectors);
-assert(nb_sectors);
-assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
-ret = nb_sectors;
+bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
+assert(bytes);
+assert(bytes < BDRV_REQUEST_MAX_BYTES);
+ret = bytes;

 if (s->cow_bitmap) {
-int64_t offset = sector_num * BDRV_SECTOR_SIZE;
-unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
-int gap;
-
-gap = mirror_cow_align(s, &offset, &bytes);
-sector_num = offset / BDRV_SECTOR_SIZE;
-nb_sectors = bytes / BDRV_SECTOR_SIZE;
-ret += gap / BDRV_SECTOR_SIZE;
+ret += mirror_cow_align(s, &offset, &bytes);
 }
-assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
-/* The sector range must meet granularity because:
+assert(bytes <= s->buf_size);
+/* The range will be sector-aligned because:
  * 1) Caller passes in aligned values;
- * 2) mirror_cow_align is used only when target cluster is larger. */
-assert(!(sector_num % sectors_per_chunk));
-nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+ * 2) mirror_cow_align is used only when target cluster is larger.
+ * But it might not be cluster-aligned at end-of-file. */
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+nb_chunks = DIV_ROUND_UP(bytes, s->granularity);

 while (s->buf_free_count < nb_chunks) {
-trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
- s->in_flight);
+trace_mirror_yield_in_flight(s, offset, s->in_flight);
 mirror_wait_for_io(s);
 }

 /* Allocate a MirrorOp that is used as an AIO callback.  */
 op = g_new(MirrorOp, 1);
 op->s = s;
-op->offset = sector_num * BDRV_SECTOR_SIZE;
-op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+op->offset = offset;
+op->bytes = bytes;

 /* Now make a QEMUIOVector taking enough granularity-sized chunks
  * from s->buf_free.
@@ -299,7 +291,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 qemu_iovec_init(&op->qiov, nb_chunks);
 while (nb_chunks-- > 0) {
 

[Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to the
buffer size.

Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
(necessary for interaction with sector-based dirty bitmaps, until
a later patch converts those to be byte-based) does not suffer from
truncation problems.

[checkpatch has a false positive on use of MIN() in this patch]

Signed-off-by: Eric Blake 

---
v4: add assertion and formatting tweak [Kevin], R-b dropped
v2-v3: no change
---
 block/mirror.c | 82 +-
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b4dfe95..9aca0cb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -24,9 +24,8 @@

 #define SLICE_TIME1ULL /* ns */
 #define MAX_IN_FLIGHT 16
-#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 Mb */
-#define DEFAULT_MIRROR_BUF_SIZE \
-(MAX_IN_FLIGHT * MAX_IO_SECTORS * BDRV_SECTOR_SIZE)
+#define MAX_IO_BYTES (1 << 20) /* 1 Mb */
+#define DEFAULT_MIRROR_BUF_SIZE (MAX_IN_FLIGHT * MAX_IO_BYTES)

 /* The mirroring buffer is a list of granularity-sized chunks.
  * Free chunks are organized in a list.
@@ -67,11 +66,11 @@ typedef struct MirrorBlockJob {
 uint64_t last_pause_ns;
 unsigned long *in_flight_bitmap;
 int in_flight;
-int64_t sectors_in_flight;
+int64_t bytes_in_flight;
 int ret;
 bool unmap;
 bool waiting_for_io;
-int target_cluster_sectors;
+int target_cluster_size;
 int max_iov;
 bool initial_zeroing_ongoing;
 } MirrorBlockJob;
@@ -79,8 +78,8 @@ typedef struct MirrorBlockJob {
 typedef struct MirrorOp {
 MirrorBlockJob *s;
 QEMUIOVector qiov;
-int64_t sector_num;
-int nb_sectors;
+int64_t offset;
+uint64_t bytes;
 } MirrorOp;

 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
@@ -101,13 +100,12 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 MirrorBlockJob *s = op->s;
 struct iovec *iov;
 int64_t chunk_num;
-int i, nb_chunks, sectors_per_chunk;
+int i, nb_chunks;

-trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
-op->nb_sectors * BDRV_SECTOR_SIZE, ret);
+trace_mirror_iteration_done(s, op->offset, op->bytes, ret);

 s->in_flight--;
-s->sectors_in_flight -= op->nb_sectors;
+s->bytes_in_flight -= op->bytes;
 iov = op->qiov.iov;
 for (i = 0; i < op->qiov.niov; i++) {
 MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
@@ -115,16 +113,15 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 s->buf_free_count++;
 }

-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-chunk_num = op->sector_num / sectors_per_chunk;
-nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
+chunk_num = op->offset / s->granularity;
+nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
 bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
 if (ret >= 0) {
 if (s->cow_bitmap) {
 bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
 }
 if (!s->initial_zeroing_ongoing) {
-s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
+s->common.offset += op->bytes;
 }
 }
 qemu_iovec_destroy(&op->qiov);
@@ -144,7 +141,8 @@ static void mirror_write_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+  op->bytes >> BDRV_SECTOR_BITS);
 action = mirror_error_action(s, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -163,7 +161,8 @@ static void mirror_read_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+  op->bytes >> BDRV_SECTOR_BITS);
 action = mirror_error_action(s, true, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -171,7 +170,7 @@ static void mirror_read_complete(void *opaque, int ret)

 mirror_iteration_done(op, ret);
 } else {
-blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
+blk_aio_pwritev(s->target, op->offset, &op->qiov,
 0, mirror_write_complete, op);
 }
 aio_context_release(blk_get_aio_context(s->common.blk));
@@ -211,7 +210,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
 

[Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based

2017-07-05 Thread Eric Blake
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned
on input and that *pnum is sector-aligned on return to the caller,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, this code adds usages like
DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
values, where the call might reasonably give non-aligned results
in the future; on the other hand, no rounding is needed for callers
that should just continue to work with byte alignment.
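
As a rough standalone illustration of that caller-side scaling (local
copies of the macros and made-up values, not code from the patch):

    #include <stdint.h>
    #include <stdio.h>

    #define BDRV_SECTOR_SIZE 512
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    /* A byte count reported by a byte-based call, converted back to sectors
     * for a caller that still thinks in sector units; rounding up covers a
     * possible unaligned tail at end-of-file. */
    int main(void)
    {
        int64_t aligned_bytes = 1536;   /* exactly 3 sectors */
        int64_t tail_bytes = 1000;      /* unaligned end-of-file tail */

        printf("%lld\n", (long long)DIV_ROUND_UP(aligned_bytes, BDRV_SECTOR_SIZE)); /* 3 */
        printf("%lld\n", (long long)DIV_ROUND_UP(tail_bytes, BDRV_SECTOR_SIZE));    /* 2 */
        return 0;
    }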

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_is_allocated().  But
some code, particularly bdrv_commit(), gets a lot simpler because it
no longer has to mess with sectors; also, it is now possible to pass
NULL if the caller does not care how much of the image is allocated
beyond the initial offset.

For ease of review, bdrv_is_allocated_above() will be tackled
separately.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Juan Quintela 
Reviewed-by: Jeff Cody 

---
v4: rebase to vvfat TAB cleanup, R-b kept
v3: no change
v2: rebase to earlier changes, tweak commit message
---
 include/block/block.h |  4 +--
 block/backup.c| 17 -
 block/commit.c| 21 +++-
 block/io.c| 49 +---
 block/stream.c|  5 ++--
 block/vvfat.c | 34 ++---
 migration/block.c |  9 ---
 qemu-img.c|  5 +++-
 qemu-io-cmds.c| 70 +++
 9 files changed, 114 insertions(+), 100 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a9dc753..d3e01fb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -427,8 +427,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 int64_t sector_num,
 int nb_sectors, int *pnum,
 BlockDriverState **file);
-int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-  int *pnum);
+int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
+  int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
 int64_t sector_num, int nb_sectors, int *pnum);

diff --git a/block/backup.c b/block/backup.c
index 04def91..b2048bf 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
 QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;

-/* Size of a cluster in sectors, instead of bytes. */
-static inline int64_t cluster_size_sectors(BackupBlockJob *job)
-{
-  return job->cluster_size / BDRV_SECTOR_SIZE;
-}
-
 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
int64_t offset,
@@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
 BackupCompleteData *data;
 BlockDriverState *bs = blk_bs(job->common.blk);
 int64_t offset;
-int64_t sectors_per_cluster = cluster_size_sectors(job);
 int ret = 0;

 QLIST_INIT(&job->inflight_reqs);
@@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
 }

 if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
-int i, n;
+int i;
+int64_t n;

 /* Check to see if these blocks are already in the
  * backing file. */

-for (i = 0; i < sectors_per_cluster;) {
+for (i = 0; i < job->cluster_size;) {
 /* bdrv_is_allocated() only returns true/false based
  * on the first set of sectors it comes across that
  * are are all in the same state.
@@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
  * backup cluster length.  We end up copying more than
  * needed but at some point that is always the case. */
 alloced =
-bdrv_is_allocated(bs,
-  (offset >> BDRV_SECTOR_BITS) + i,
-  sectors_per_cluster - i, &n);
+ 

[Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of mirroring to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are both sector-aligned and multiples of the granularity).  Drop
the now-unused mirror_clip_sectors().
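
A standalone sketch of the resulting iteration pattern (values are made
up; the real job derives them from its dirty bitmap and buffer state):

    #include <stdint.h>
    #include <stdio.h>

    /* Stepping the byte offset by whole chunks of the granularity keeps
     * every iteration both granularity- and sector-aligned, except possibly
     * for a short tail at end-of-file. */
    int main(void)
    {
        int64_t length = (8 << 20) + 1000;   /* 8 MiB plus an unaligned tail */
        int64_t granularity = 64 * 1024;
        int64_t offset = 0;

        while (offset < length) {
            int nb_chunks = 4;               /* picked per iteration in the real job */
            int64_t step = nb_chunks * granularity;
            if (step > length - offset) {
                step = length - offset;      /* the tail may be shorter and unaligned */
            }
            printf("iterate over [%lld, +%lld)\n", (long long)offset, (long long)step);
            offset += step;
        }
        return 0;
    }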

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v4: no change
v3: rebase to Paolo's thread-safety changes, R-b kept
v2: straightforward rebase to earlier mirror_clip_bytes() change, R-b kept
---
 block/mirror.c | 105 +
 1 file changed, 46 insertions(+), 59 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d3325d0..f54a8d7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -184,15 +184,6 @@ static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
 return MIN(bytes, s->bdev_length - offset);
 }

-/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
-static inline int mirror_clip_sectors(MirrorBlockJob *s,
-  int64_t sector_num,
-  int nb_sectors)
-{
-return MIN(nb_sectors,
-   s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
-}
-
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
@@ -336,30 +327,28 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
 BlockDriverState *source = s->source;
-int64_t sector_num, first_chunk;
+int64_t offset, first_chunk;
 uint64_t delay_ns = 0;
 /* At least the first dirty chunk is mirrored in one iteration. */
 int nb_chunks = 1;
-int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
 int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
 bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
 int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

 bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-sector_num = bdrv_dirty_iter_next(s->dbi);
-if (sector_num < 0) {
+offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+if (offset < 0) {
 bdrv_set_dirty_iter(s->dbi, 0);
-sector_num = bdrv_dirty_iter_next(s->dbi);
+offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
   BDRV_SECTOR_SIZE);
-assert(sector_num >= 0);
+assert(offset >= 0);
 }
 bdrv_dirty_bitmap_unlock(s->dirty_bitmap);

-first_chunk = sector_num / sectors_per_chunk;
+first_chunk = offset / s->granularity;
 while (test_bit(first_chunk, s->in_flight_bitmap)) {
-trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
- s->in_flight);
+trace_mirror_yield_in_flight(s, offset, s->in_flight);
 mirror_wait_for_io(s);
 }

@@ -368,25 +357,26 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 /* Find the number of consective dirty chunks following the first dirty
  * one, and wait for in flight requests in them. */
 bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
+while (nb_chunks * s->granularity < s->buf_size) {
 int64_t next_dirty;
-int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
-int64_t next_chunk = next_sector / sectors_per_chunk;
-if (next_sector >= end ||
-!bdrv_get_dirty_locked(source, s->dirty_bitmap, next_sector)) {
+int64_t next_offset = offset + nb_chunks * s->granularity;
+int64_t next_chunk = next_offset / s->granularity;
+if (next_offset >= s->bdev_length ||
+!bdrv_get_dirty_locked(source, s->dirty_bitmap,
+   next_offset >> BDRV_SECTOR_BITS)) {
 break;
 }
 if (test_bit(next_chunk, s->in_flight_bitmap)) {
 break;
 }

-next_dirty = bdrv_dirty_iter_next(s->dbi);
-if (next_dirty > next_sector || next_dirty < 0) {
+next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+if (next_dirty > next_offset || next_dirty < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
-bdrv_set_dirty_iter(s->dbi, next_sector);
-next_dirty = bdrv_dirty_iter_next(s->dbi);
+bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
+next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 }
-assert(next_dirty == next_sector);
+assert(next_dirty == 

[Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based

2017-07-05 Thread Eric Blake
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly stream_run(),
gets a lot simpler because it no longer has to mess with sectors.

For ease of review, bdrv_is_allocated() was tackled separately.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Xie Changlong  [replication 
part]
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: tweak function comments, favor bdrv_getlength() over ->total_sectors
---
 include/block/block.h |  2 +-
 block/commit.c| 20 
 block/io.c| 42 --
 block/mirror.c|  5 -
 block/replication.c   | 17 -
 block/stream.c| 21 +
 qemu-img.c| 10 +++---
 7 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index d3e01fb..f0fdbe8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -430,7 +430,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
   int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
-int64_t sector_num, int nb_sectors, int *pnum);
+int64_t offset, int64_t bytes, int64_t *pnum);

 bool bdrv_is_read_only(BlockDriverState *bs);
 bool bdrv_is_writable(BlockDriverState *bs);
diff --git a/block/commit.c b/block/commit.c
index 241aa95..774a8a5 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -146,7 +146,7 @@ static void coroutine_fn commit_run(void *opaque)
 int64_t offset;
 uint64_t delay_ns = 0;
 int ret = 0;
-int n = 0; /* sectors */
+int64_t n = 0; /* bytes */
 void *buf = NULL;
 int bytes_written = 0;
 int64_t base_len;
@@ -171,7 +171,7 @@ static void coroutine_fn commit_run(void *opaque)

 buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
+for (offset = 0; offset < s->common.len; offset += n) {
 bool copy;

 /* Note that even when no rate limit is applied we need to yield
@@ -183,15 +183,12 @@ static void coroutine_fn commit_run(void *opaque)
 }
 /* Copy if allocated above the base */
 ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-  offset / BDRV_SECTOR_SIZE,
-  COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
-  &n);
+  offset, COMMIT_BUFFER_SIZE, &n);
 copy = (ret == 1);
-trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
+trace_commit_one_iteration(s, offset, n, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base, offset,
-  n * BDRV_SECTOR_SIZE, buf);
-bytes_written += n * BDRV_SECTOR_SIZE;
+ret = commit_populate(s->top, s->base, offset, n, buf);
+bytes_written += n;
 }
 if (ret < 0) {
 BlockErrorAction action =
@@ -204,11 +201,10 @@ static void coroutine_fn commit_run(void *opaque)
 }
 }
 /* Publish progress */
-s->common.offset += n * BDRV_SECTOR_SIZE;
+s->common.offset += n;

 if (copy && s->common.speed) {
-delay_ns = ratelimit_calculate_delay(&s->limit,
- n * BDRV_SECTOR_SIZE);
+delay_ns = ratelimit_calculate_delay(&s->limit, n);
 }
 }

diff --git a/block/io.c b/block/io.c
index fb8d1c7..569c503 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1931,54 +1931,52 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
- * Return true if the given sector is allocated in any image between
- * BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * sector is allocated in any image of the chain.  Return false otherwise,
+ * Return true if 

[Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based

2017-07-05 Thread Eric Blake
There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

The overall conversion currently looks like:
part 1: bdrv_is_allocated (this series, v3 was at [1])
part 2: dirty-bitmap (v4 is posted [2]; needs reviews)
part 3: bdrv_get_block_status (v2 is posted [3] and is mostly reviewed)
part 4: upcoming series, for .bdrv_co_block_status (second half of v1 [4])

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v4

Depends on Max's block branch (which in turn includes Kevin's)

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06077.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00269.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00427.html
[4] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02642.html

Changes since v3:
new patch 4/21 to avoid a semantic change in 5/21 [Kevin]
new assertion in 8/21 [Kevin]
whitespace rebase churn in vvfat in 19/21

Most patches are now reviewed multiple times, although Kevin has not yet
reviewed the second half of the series.

001/21:[] [--] 'blockjob: Track job ratelimits via bytes, not sectors'
002/21:[] [--] 'trace: Show blockjob actions via bytes, not sectors'
003/21:[] [--] 'stream: Switch stream_populate() to byte-based'
004/21:[down] 'stream: Drop reached_end for stream_complete()'
005/21:[0002] [FC] 'stream: Switch stream_run() to byte-based'
006/21:[] [--] 'commit: Switch commit_populate() to byte-based'
007/21:[] [--] 'commit: Switch commit_run() to byte-based'
008/21:[0005] [FC] 'mirror: Switch MirrorBlockJob to byte-based'
009/21:[] [--] 'mirror: Switch mirror_do_zero_or_discard() to byte-based'
010/21:[] [--] 'mirror: Update signature of mirror_clip_sectors()'
011/21:[] [--] 'mirror: Switch mirror_cow_align() to byte-based'
012/21:[] [--] 'mirror: Switch mirror_do_read() to byte-based'
013/21:[] [--] 'mirror: Switch mirror_iteration() to byte-based'
014/21:[] [--] 'block: Drop unused bdrv_round_sectors_to_clusters()'
015/21:[] [--] 'backup: Switch BackupBlockJob to byte-based'
016/21:[] [--] 'backup: Switch block_backup.h to byte-based'
017/21:[] [--] 'backup: Switch backup_do_cow() to byte-based'
018/21:[] [--] 'backup: Switch backup_run() to byte-based'
019/21:[0004] [FC] 'block: Make bdrv_is_allocated() byte-based'
020/21:[] [--] 'block: Minimize raw use of bds->total_sectors'
021/21:[] [--] 'block: Make bdrv_is_allocated_above() byte-based'

Eric Blake (21):
  blockjob: Track job ratelimits via bytes, not sectors
  trace: Show blockjob actions via bytes, not sectors
  stream: Switch stream_populate() to byte-based
  stream: Drop reached_end for stream_complete()
  stream: Switch stream_run() to byte-based
  commit: Switch commit_populate() to byte-based
  commit: Switch commit_run() to byte-based
  mirror: Switch MirrorBlockJob to byte-based
  mirror: Switch mirror_do_zero_or_discard() to byte-based
  mirror: Update signature of mirror_clip_sectors()
  mirror: Switch mirror_cow_align() to byte-based
  mirror: Switch mirror_do_read() to byte-based
  mirror: Switch mirror_iteration() to byte-based
  block: Drop unused bdrv_round_sectors_to_clusters()
  backup: Switch BackupBlockJob to byte-based
  backup: Switch block_backup.h to byte-based
  backup: Switch backup_do_cow() to byte-based
  backup: Switch backup_run() to byte-based
  block: Make bdrv_is_allocated() byte-based
  block: Minimize raw use of bds->total_sectors
  block: Make bdrv_is_allocated_above() byte-based

 include/block/block.h|  10 +-
 include/block/block_backup.h |  11 +-
 include/qemu/ratelimit.h |   3 +-
 block/backup.c   | 130 --
 block/commit.c   |  54 
 block/io.c   |  92 +++--
 block/mirror.c   | 305 ++-
 block/replication.c  |  29 ++--
 block/stream.c   |  37 +++---
 block/vvfat.c|  34 +++--
 migration/block.c|   9 +-
 qemu-img.c   |  15 ++-
 qemu-io-cmds.c   |  70 +-
 block/trace-events   |  14 +-
 14 files changed, 400 insertions(+), 413 deletions(-)

-- 
2.9.4




[Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of streaming to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v4: avoid reached_end change by rebasing to earlier changes [Kevin], R-b kept
v2-v3: no change
---
 block/stream.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 12f1659..e3dd2ac 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -107,12 +107,11 @@ static void coroutine_fn stream_run(void *opaque)
 BlockBackend *blk = s->common.blk;
 BlockDriverState *bs = blk_bs(blk);
 BlockDriverState *base = s->base;
-int64_t sector_num = 0;
-int64_t end = -1;
+int64_t offset = 0;
 uint64_t delay_ns = 0;
 int error = 0;
 int ret = 0;
-int n = 0;
+int n = 0; /* sectors */
 void *buf;

 if (!bs->backing) {
@@ -125,7 +124,6 @@ static void coroutine_fn stream_run(void *opaque)
 goto out;
 }

-end = s->common.len >> BDRV_SECTOR_BITS;
 buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);

 /* Turn on copy-on-read for the whole block device so that guest read
@@ -137,7 +135,7 @@ static void coroutine_fn stream_run(void *opaque)
 bdrv_enable_copy_on_read(bs);
 }

-for (sector_num = 0; sector_num < end; sector_num += n) {
+for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
 bool copy;

 /* Note that even when no rate limit is applied we need to yield
@@ -150,28 +148,26 @@ static void coroutine_fn stream_run(void *opaque)

 copy = false;

-ret = bdrv_is_allocated(bs, sector_num,
+ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
 STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
 if (ret == 1) {
 /* Allocated in the top, no need to copy.  */
 } else if (ret >= 0) {
 /* Copy if allocated in the intermediate images.  Limit to the
- * known-unallocated area [sector_num, sector_num+n).  */
+ * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
 ret = bdrv_is_allocated_above(backing_bs(bs), base,
-  sector_num, n, &n);
+  offset / BDRV_SECTOR_SIZE, n, &n);

 /* Finish early if end of backing file has been reached */
 if (ret == 0 && n == 0) {
-n = end - sector_num;
+n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
 }

 copy = (ret == 1);
 }
-trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-   n * BDRV_SECTOR_SIZE, ret);
+trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
-  n * BDRV_SECTOR_SIZE, buf);
+ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
 }
 if (ret < 0) {
 BlockErrorAction action =
-- 
2.9.4




[Qemu-devel] [PATCH v4 02/21] trace: Show blockjob actions via bytes, not sectors

2017-07-05 Thread Eric Blake
Upcoming patches are going to switch to byte-based interfaces
instead of sector-based.  Even worse, trace_backup_do_cow_enter()
had a weird mix of cluster and sector indices.

The trace interface is low enough that there are no stability
guarantees, and therefore nothing wrong with changing our units,
even in cases like trace_backup_do_cow_skip() where we are not
changing the trace output.  So make the tracing uniformly use
bytes.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: improve commit message, no code change
---
 block/backup.c | 16 ++--
 block/commit.c |  3 ++-
 block/mirror.c | 26 +-
 block/stream.c |  3 ++-
 block/trace-events | 14 +++---
 5 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 9ca1d8e..06431ac 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -102,6 +102,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 void *bounce_buffer = NULL;
 int ret = 0;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
+int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
 int64_t start, end;
 int n;

@@ -110,18 +111,20 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 start = sector_num / sectors_per_cluster;
 end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
+trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+  sector_num * BDRV_SECTOR_SIZE,
+  nb_sectors * BDRV_SECTOR_SIZE);

 wait_for_overlapping_requests(job, start, end);
 cow_request_begin(&cow_request, job, start, end);

 for (; start < end; start++) {
 if (test_bit(start, job->done_bitmap)) {
-trace_backup_do_cow_skip(job, start);
+trace_backup_do_cow_skip(job, start * bytes_per_cluster);
 continue; /* already copied */
 }

-trace_backup_do_cow_process(job, start);
+trace_backup_do_cow_process(job, start * bytes_per_cluster);

 n = MIN(sectors_per_cluster,
 job->common.len / BDRV_SECTOR_SIZE -
@@ -138,7 +141,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 bounce_qiov.size, &bounce_qiov,
 is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start, ret);
+trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
@@ -154,7 +157,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
0);
 }
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start, ret);
+trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, 
ret);
 if (error_is_read) {
 *error_is_read = false;
 }
@@ -177,7 +180,8 @@ out:

 cow_request_end(&cow_request);

-trace_backup_do_cow_return(job, sector_num, nb_sectors, ret);
+trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
+   nb_sectors * BDRV_SECTOR_SIZE, ret);

 qemu_co_rwlock_unlock(&job->flush_rwlock);

diff --git a/block/commit.c b/block/commit.c
index 6993994..4cda7f2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -190,7 +190,8 @@ static void coroutine_fn commit_run(void *opaque)
   COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
  &n);
 copy = (ret == 1);
-trace_commit_one_iteration(s, sector_num, n, ret);
+trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+   n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
 ret = commit_populate(s->top, s->base, sector_num, n, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
diff --git a/block/mirror.c b/block/mirror.c
index eb27efc..b4dfe95 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -103,7 +103,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 int64_t chunk_num;
 int i, nb_chunks, sectors_per_chunk;

-trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
+trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
+op->nb_sectors * BDRV_SECTOR_SIZE, ret);

 s->in_flight--;
 s->sectors_in_flight -= op->nb_sectors;
@@ -268,7 +269,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);

 while (s->buf_free_count < nb_chunks) {
-

[Qemu-devel] [PATCH v4 06/21] commit: Switch commit_populate() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: no change
---
 block/commit.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 4cda7f2..6f67d78 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -47,26 +47,25 @@ typedef struct CommitBlockJob {
 } CommitBlockJob;

 static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
-int64_t sector_num, int nb_sectors,
+int64_t offset, uint64_t bytes,
 void *buf)
 {
 int ret = 0;
 QEMUIOVector qiov;
 struct iovec iov = {
 .iov_base = buf,
-.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+.iov_len = bytes,
 };

+assert(bytes < SIZE_MAX);
 qemu_iovec_init_external(&qiov, &iov, 1);

-ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
-qiov.size, &qiov, 0);
+ret = blk_co_preadv(bs, offset, qiov.size, &qiov, 0);
 if (ret < 0) {
 return ret;
 }

-ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
- qiov.size, &qiov, 0);
+ret = blk_co_pwritev(base, offset, qiov.size, &qiov, 0);
 if (ret < 0) {
 return ret;
 }
@@ -193,7 +192,9 @@ static void coroutine_fn commit_run(void *opaque)
 trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base, sector_num, n, buf);
+ret = commit_populate(s->top, s->base,
+  sector_num * BDRV_SECTOR_SIZE,
+  n * BDRV_SECTOR_SIZE, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
 }
 if (ret < 0) {
-- 
2.9.4




[Qemu-devel] [PATCH v4 03/21] stream: Switch stream_populate() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: no change
---
 block/stream.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 6cb3939..746d525 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -41,20 +41,20 @@ typedef struct StreamBlockJob {
 } StreamBlockJob;

 static int coroutine_fn stream_populate(BlockBackend *blk,
-int64_t sector_num, int nb_sectors,
+int64_t offset, uint64_t bytes,
 void *buf)
 {
 struct iovec iov = {
 .iov_base = buf,
-.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
+.iov_len  = bytes,
 };
 QEMUIOVector qiov;

+assert(bytes < SIZE_MAX);
 qemu_iovec_init_external(&qiov, &iov, 1);

 /* Copy-on-read the unallocated clusters */
-return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, &qiov,
- BDRV_REQ_COPY_ON_READ);
+return blk_co_preadv(blk, offset, qiov.size, &qiov, BDRV_REQ_COPY_ON_READ);
 }

 typedef struct {
@@ -171,7 +171,8 @@ static void coroutine_fn stream_run(void *opaque)
 trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = stream_populate(blk, sector_num, n, buf);
+ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
+  n * BDRV_SECTOR_SIZE, buf);
 }
 if (ret < 0) {
 BlockErrorAction action =
-- 
2.9.4




[Qemu-devel] [PATCH v4 10/21] mirror: Update signature of mirror_clip_sectors()

2017-07-05 Thread Eric Blake
Rather than having a void function that modifies its input
in-place as the output, change the signature to reduce a layer
of indirection and return the result.

Suggested-by: John Snow 
Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 

---
v3-v4: no change
v2: new patch
---
 block/mirror.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2736d0b..70682b6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,12 +176,12 @@ static void mirror_read_complete(void *opaque, int ret)
 aio_context_release(blk_get_aio_context(s->common.blk));
 }

-static inline void mirror_clip_sectors(MirrorBlockJob *s,
-   int64_t sector_num,
-   int *nb_sectors)
+static inline int mirror_clip_sectors(MirrorBlockJob *s,
+  int64_t sector_num,
+  int nb_sectors)
 {
-*nb_sectors = MIN(*nb_sectors,
-  s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
+return MIN(nb_sectors,
+   s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

 /* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
@@ -216,7 +216,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
 }
 /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
  * that doesn't matter because it's already the end of source image. */
-mirror_clip_sectors(s, align_sector_num, &align_nb_sectors);
+align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
+   align_nb_sectors);

 ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
 *sector_num = align_sector_num;
@@ -445,7 +446,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 return 0;
 }

-mirror_clip_sectors(s, sector_num, &io_sectors);
+io_sectors = mirror_clip_sectors(s, sector_num, io_sectors);
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
 io_sectors = mirror_do_read(s, sector_num, io_sectors);
-- 
2.9.4




[Qemu-devel] [PATCH v4 07/21] commit: Switch commit_run() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of committing to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 

---
v2: no change
---
 block/commit.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 6f67d78..c3a7bca 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -143,17 +143,16 @@ static void coroutine_fn commit_run(void *opaque)
 {
 CommitBlockJob *s = opaque;
 CommitCompleteData *data;
-int64_t sector_num, end;
+int64_t offset;
 uint64_t delay_ns = 0;
 int ret = 0;
-int n = 0;
+int n = 0; /* sectors */
 void *buf = NULL;
 int bytes_written = 0;
 int64_t base_len;

 ret = s->common.len = blk_getlength(s->top);

-
 if (s->common.len < 0) {
 goto out;
 }
@@ -170,10 +169,9 @@ static void coroutine_fn commit_run(void *opaque)
 }
 }

-end = s->common.len >> BDRV_SECTOR_BITS;
 buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-for (sector_num = 0; sector_num < end; sector_num += n) {
+for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
 bool copy;

 /* Note that even when no rate limit is applied we need to yield
@@ -185,15 +183,13 @@ static void coroutine_fn commit_run(void *opaque)
 }
 /* Copy if allocated above the base */
 ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-  sector_num,
+  offset / BDRV_SECTOR_SIZE,
   COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
  &n);
 copy = (ret == 1);
-trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-   n * BDRV_SECTOR_SIZE, ret);
+trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base,
-  sector_num * BDRV_SECTOR_SIZE,
+ret = commit_populate(s->top, s->base, offset,
   n * BDRV_SECTOR_SIZE, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
 }
-- 
2.9.4




[Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()

2017-07-05 Thread Eric Blake
stream_complete() skips the work of rewriting the backing file if
the job was cancelled, if data->reached_end is false, or if there
was an error detected (non-zero data->ret) during the streaming.
But note that in stream_run(), data->reached_end is only set if the
loop ran to completion, and data->ret is only 0 in two cases:
either the loop ran to completion (possibly by cancellation, but
stream_complete checks for that), or we took an early goto out
because there is no bs->backing.  Thus, we can preserve the same
semantics without the use of reached_end, by merely checking for
bs->backing (and logically, if there was no backing file, streaming
is a no-op, so there is no backing file to rewrite).

Suggested-by: Kevin Wolf 
Signed-off-by: Eric Blake 

---
v4: new patch
---
 block/stream.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 746d525..12f1659 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -59,7 +59,6 @@ static int coroutine_fn stream_populate(BlockBackend *blk,

 typedef struct {
 int ret;
-bool reached_end;
 } StreamCompleteData;

 static void stream_complete(BlockJob *job, void *opaque)
@@ -70,7 +69,7 @@ static void stream_complete(BlockJob *job, void *opaque)
 BlockDriverState *base = s->base;
 Error *local_err = NULL;

-if (!block_job_is_cancelled(&s->common) && data->reached_end &&
+if (!block_job_is_cancelled(&s->common) && bs->backing &&
 data->ret == 0) {
 const char *base_id = NULL, *base_fmt = NULL;
 if (base) {
@@ -211,7 +210,6 @@ out:
 /* Modify backing chain and close BDSes in main loop */
 data = g_malloc(sizeof(*data));
 data->ret = ret;
-data->reached_end = sector_num == end;
 block_job_defer_to_main_loop(&s->common, stream_complete, data);
 }

-- 
2.9.4




[Qemu-devel] [PATCH v4 09/21] mirror: Switch mirror_do_zero_or_discard() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 

---
v2-v4: no change
---
 block/mirror.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9aca0cb..2736d0b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -305,8 +305,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 }

 static void mirror_do_zero_or_discard(MirrorBlockJob *s,
-  int64_t sector_num,
-  int nb_sectors,
+  int64_t offset,
+  uint64_t bytes,
   bool is_discard)
 {
 MirrorOp *op;
@@ -315,16 +315,16 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
  * so the freeing in mirror_iteration_done is nop. */
 op = g_new0(MirrorOp, 1);
 op->s = s;
-op->offset = sector_num * BDRV_SECTOR_SIZE;
-op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+op->offset = offset;
+op->bytes = bytes;

 s->in_flight++;
-s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
+s->bytes_in_flight += bytes;
 if (is_discard) {
-blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
+blk_aio_pdiscard(s->target, offset,
  op->bytes, mirror_write_complete, op);
 } else {
-blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
+blk_aio_pwrite_zeroes(s->target, offset,
   op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
   mirror_write_complete, op);
 }
@@ -453,7 +453,8 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 break;
 case MIRROR_METHOD_ZERO:
 case MIRROR_METHOD_DISCARD:
-mirror_do_zero_or_discard(s, sector_num, io_sectors,
+mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+  io_sectors * BDRV_SECTOR_SIZE,
   mirror_method == MIRROR_METHOD_DISCARD);
 if (write_zeroes_ok) {
 io_bytes_acct = 0;
@@ -657,7 +658,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 continue;
 }

-mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
+mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+  nb_sectors * BDRV_SECTOR_SIZE, false);
 sector_num += nb_sectors;
 }

-- 
2.9.4




[Qemu-devel] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors

2017-07-05 Thread Eric Blake
The user interface specifies job rate limits in bytes/second.
It's pointless to have our internal representation track things
in sectors/second, particularly since we want to move away from
sector-based interfaces.

Fix up a doc typo found while verifying that the ratelimit
code handles the scaling difference.

Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be
cleaned up later when functions are converted to iterate over
images by bytes rather than by sectors.
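
A toy sketch of the accounting difference; the struct and helper below
are illustrative stand-ins, not the real ratelimit.h interface:

    #include <stdint.h>
    #include <stdio.h>

    #define BDRV_SECTOR_SIZE 512

    /* Both the configured budget and the per-operation accounting are in
     * bytes, so callers pass n * BDRV_SECTOR_SIZE instead of pre-dividing
     * the speed by the sector size. */
    typedef struct {
        uint64_t budget_bytes;   /* bytes allowed per time slice */
        uint64_t used_bytes;     /* bytes accounted in the current slice */
    } ToyRateLimit;

    static int toy_account(ToyRateLimit *rl, uint64_t bytes)
    {
        rl->used_bytes += bytes;
        return rl->used_bytes > rl->budget_bytes;   /* nonzero: caller should delay */
    }

    int main(void)
    {
        ToyRateLimit rl = { .budget_bytes = 1 << 20, .used_bytes = 0 };
        int n = 2048;                                /* sectors copied this round */
        printf("delay? %d\n", toy_account(&rl, (uint64_t)n * BDRV_SECTOR_SIZE));
        return 0;
    }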

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: adjust commit message based on review; no code change
---
 include/qemu/ratelimit.h |  3 ++-
 block/backup.c   |  5 +++--
 block/commit.c   |  5 +++--
 block/mirror.c   | 13 +++--
 block/stream.c   |  5 +++--
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index 8da1232..8dece48 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -24,7 +24,8 @@ typedef struct {

 /** Calculate and return delay for next request in ns
  *
- * Record that we sent @p n data units. If we may send more data units
+ * Record that we sent @n data units (where @n matches the scale chosen
+ * during ratelimit_set_speed). If we may send more data units
  * in the current time slice, return 0 (i.e. no delay). Otherwise
  * return the amount of time (in ns) until the start of the next time
  * slice that will permit sending the next chunk of data.
diff --git a/block/backup.c b/block/backup.c
index 5387fbd..9ca1d8e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -208,7 +208,7 @@ static void backup_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 error_setg(errp, QERR_INVALID_PARAMETER, "speed");
 return;
 }
-ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -359,7 +359,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob 
*job)
  */
 if (job->common.speed) {
 uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-  job->sectors_read);
+  job->sectors_read *
+  BDRV_SECTOR_SIZE);
 job->sectors_read = 0;
 block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
 } else {
diff --git a/block/commit.c b/block/commit.c
index 524bd54..6993994 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -209,7 +209,8 @@ static void coroutine_fn commit_run(void *opaque)
 s->common.offset += n * BDRV_SECTOR_SIZE;

 if (copy && s->common.speed) {
-delay_ns = ratelimit_calculate_delay(&s->limit, n);
+delay_ns = ratelimit_calculate_delay(&s->limit,
+ n * BDRV_SECTOR_SIZE);
 }
 }

@@ -231,7 +232,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 error_setg(errp, QERR_INVALID_PARAMETER, "speed");
 return;
 }
-ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static const BlockJobDriver commit_job_driver = {
diff --git a/block/mirror.c b/block/mirror.c
index 61a862d..eb27efc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -396,7 +396,8 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
 while (nb_chunks > 0 && sector_num < end) {
 int64_t ret;
-int io_sectors, io_sectors_acct;
+int io_sectors;
+int64_t io_bytes_acct;
 BlockDriverState *file;
 enum MirrorMethod {
 MIRROR_METHOD_COPY,
@@ -444,16 +445,16 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
 io_sectors = mirror_do_read(s, sector_num, io_sectors);
-io_sectors_acct = io_sectors;
+io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
 break;
 case MIRROR_METHOD_ZERO:
 case MIRROR_METHOD_DISCARD:
 mirror_do_zero_or_discard(s, sector_num, io_sectors,
   mirror_method == MIRROR_METHOD_DISCARD);
 if (write_zeroes_ok) {
-io_sectors_acct = 0;
+io_bytes_acct = 0;
 } else {
-io_sectors_acct = io_sectors;
+io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
 }
 break;
 default:
@@ -463,7 +464,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 

[Qemu-devel] Modular hardware emulation

2017-07-05 Thread Tom Cook
I'm using qemu to emulate a certain fruit-flavoured single-board computer -
which is fantastic for testing configuration changes, by the way.

We have a bit of custom hardware attached to that single-board computer -
essentially an SPI-slave 8-channel ADC with some analog electronics
attached.

I'm trying to get a feel for the level of effort involved in writing an
emulation of this custom hardware that will work with qemu.  I can see two
extremes:  At one end of the scale, there might be a nice Python interface
already in place for writing SPI device emulations and all I need to do is
point a command-line parameter at a suitable script and away it goes.  At
the other end, I need to hack around in the qemu source, understand about
3/4 of it and create a custom build of the whole emulator.

Can someone give me some sort of idea what level of effort is involved, and
perhaps a pointer in the right direction, please?

Regards,
Tom


Re: [Qemu-devel] trigonometric functions in softfloat

2017-07-05 Thread Aurelien Jarno
Hi,

On 2017-07-05 10:25, Laurent Vivier wrote:
> Hi,
> 
> Thomas has pointed out that WinUAE[1] has an updated softfloat library
> implementing missing operations for 680x0.
> 
> Do you think these changes can be merged in QEMU?

From the licensing point of view, they seem to use the SoftFloat-2a
license, so that should be fine.

Personally I am fine adding general trigonometric functions to
softfloat; that might help other targets like x86, which uses the math
functions from libm and thus has different behaviour/precision
depending on the host.

The question is more how m68k specific are those trigonometric
functions. If we can have a way to implement them as generic
trigonometric functions reusable by other targets, I am all for it. If
not that code would probably be better in target/m68k.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH v3 07/20] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread Eric Blake
On 07/05/2017 06:42 AM, Kevin Wolf wrote:
> Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
>> We are gradually converting to byte-based interfaces, as they are
>> easier to reason about than sector-based.  Continue by converting an
>> internal structure (no semantic change), and all references to the
>> buffer size.
>>
>> [checkpatch has a false positive on use of MIN() in this patch]
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: John Snow 
> 
> I wouldn't mind an assertion that granularity is a multiple of
> BDRV_SECTOR_SIZE, along with a comment that explains that this is
> required so that we avoid rounding problems when dealing with the bitmap
> functions.

That goes away later when series two converts the bitmap functions to be
byte-based, but you're right that the intermediate state should be
easier to follow.

> 
> blockdev_mirror_common() does already check this, but it feels like it's
> a bit far away from where the actual problem would happen in the mirror
> job code.

Indeed.

> 
>> @@ -768,17 +765,17 @@ static void coroutine_fn mirror_run(void *opaque)
>>   * the destination do COW.  Instead, we copy sectors around the
>>   * dirty data if needed.  We need a bitmap to do that.
>>   */
>> +s->target_cluster_size = BDRV_SECTOR_SIZE;
>>  bdrv_get_backing_filename(target_bs, backing_filename,
>>sizeof(backing_filename));
>>  if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
>> -target_cluster_size = bdi.cluster_size;
>> +s->target_cluster_size = bdi.cluster_size;
>>  }
> 
> Why have the unrelated bdrv_get_backing_filename() between the two
> assignments of s->target_cluster_size? Or actually, wouldn't it be
> even easier to read with an else branch?
> 
> if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
> s->target_cluster_size = bdi.cluster_size;
> } else {
> s->target_cluster_size = BDRV_SECTOR_SIZE;
> }

Yes, that looks nicer.

> 
> None of these comments are critical, so anyway:
> 
> Reviewed-by: Kevin Wolf 

I'm respinning v4 anyways, so I'll make the change (and while it is
small, it's still enough that I'll drop R-b).

> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime

2017-07-05 Thread Alex Bennée

Peter Maydell  writes:

> On 5 July 2017 at 20:30, Alex Bennée  wrote:
>>
>> Peter Maydell  writes:
>>
>>> On 5 July 2017 at 17:01, Alex Bennée  wrote:
 An interesting bug was reported on #qemu today. It was bisected to
 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
 with taskset -c 0. Originally the fingers were pointed at mttcg but it
 occurs in both single and multi-threaded modes.

 I think the problem is qemu_system_reset_request() is certainly racy
 when resetting a running CPU. AFAICT:

   - Guest resets board, writing to some hw address (e.g.
 arm_sysctl_write)
   - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
   - We exit iowrite and drop the BQL
   - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
   - we start writing new values to CPU env while still in TCG code
   - CHAOS!

 The general solution for this is to ensure these sort of tasks are done
 with safe work in the CPUs context when we know nothing else is running.
 It seems this is probably best done by modifying
 qemu_system_reset_request to queue work up on current_cpu and execute it
 as safe work - I don't think the vl.c thread should ever be messing
 about with calling cpu_reset directly.
>>>
>>> My first thought is that qemu_system_reset() should absolutely
>>> stop every CPU (or other runnable thing like a DMA agent) in the
>>> system.
>>
>> Are all these reset calls system wide though?
>
> It's called 'system_reset' because it resets the entire system...
>
>> After all, with PSCI you
>> can bring individual cores up and down. I appreciate the vexpress stuff
>> pre-dates those well defined semantics though.
>
> It's individual core reset that's a more ad-hoc afterthought,
> really.
>
>> vm_stop certainly tries to deal with things gracefully as well as send
>> qapi events, drain IO queues and the rest of it. My only concern is it
>> handles two cases - external vm_stops and those from the current CPU.
>>
>> I think it may be cleaner for CPU originated halts to use the
>> async_safe_run_on_cpu() mechanism.
>
> System reset already has an async component to it -- you call
> qemu_system_reset_request(), which just says "schedule a system
> reset as soon as convenient". qemu_system_reset() is the thing
> that runs later and actually does the job (from the io thread,
> not the CPU thread).
>
> Looking more closely at the vl.c code, it looks like it
> calls pause_all_vcpus() before calling qemu_system_reset():
> shouldn't that be pausing all the TCG CPUs?

Hmm it should - but it doesn't seem to have in this backtrace:

#0  0x5593fdd3 in arm_cpu_reset (s=0x569abb90) at 
/home/alex/lsrc/qemu/qemu.git/target/arm/cpu.c:119
#1  0x55bcc74a in cpu_reset (cpu=0x569abb90) at qom/cpu.c:268
#2  0x5589d82a in do_cpu_reset (opaque=0x569abb90) at 
/home/alex/lsrc/qemu/qemu.git/hw/arm/boot.c:570
#3  0x55a257e4 in qemu_devices_reset () at hw/core/reset.c:69
#4  0x559697a8 in qemu_system_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET) 
at vl.c:1713
#5  0x55969c0d in main_loop_should_exit () at vl.c:1885
#6  0x55969cda in main_loop () at vl.c:1922
#7  0x55971aca in main (argc=16, argv=0x7fffd918, 
envp=0x7fffd9a0) at vl.c:4749

Thread 4 (Thread 0x7fff731ff700 (LWP 10098)):
#0  0x7fffdf4f5a15 in do_futex_wait (private=0, abstime=0x7fff731fc670, 
expected=0, futex_word=0x7fff64cbb5b8) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  0x7fffdf4f5a15 in do_futex_wait (sem=sem@entry=0x7fff64cbb5b8, 
abstime=abstime@entry=0x7fff731fc670) at sem_waitcommon.c:111
#2  0x7fffdf4f5adf in __new_sem_wait_slow (sem=0x7fff64cbb5b8, 
abstime=0x7fff731fc670) at sem_waitcommon.c:181
#3  0x7fffdf4f5b92 in sem_timedwait (sem=, 
abstime=) at sem_timedwait.c:36
#4  0x55d27488 in qemu_sem_timedwait (sem=0x7fff64cbb5b8, ms=1) at 
util/qemu-thread-posix.c:271
#5  0x55d20aad in worker_thread (opaque=0x7fff64cbb550) at 
util/thread-pool.c:92
#6  0x7fffdf4ed6ba in start_thread (arg=0x7fff731ff700) at 
pthread_create.c:333
#7  0x7fffdf2233dd in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fff7ebff700 (LWP 10097)):
#0  0x7fffdf4f630a in __lll_unlock_wake () at 
../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
#1  0x7fffdf4f14ff in __GI___pthread_mutex_unlock (decr=1, 
mutex=0x5641ae20 ) at pthread_mutex_unlock.c:55
#2  0x7fffdf4f14ff in __GI___pthread_mutex_unlock (mutex=0x5641ae20 
) at pthread_mutex_unlock.c:314
#3  0x55d27091 in qemu_mutex_unlock (mutex=0x5641ae20 
) at util/qemu-thread-posix.c:88
#4  0x557aa911 in qemu_mutex_unlock_iothread () at 
/home/alex/lsrc/qemu/qemu.git/cpus.c:1589
#5  0x557d791a in 

Re: [Qemu-devel] [PATCH v4 5/5] tests: Add check-qobject for equality tests

2017-07-05 Thread Eric Blake
On 07/05/2017 02:04 PM, Max Reitz wrote:
> Add a new test file (check-qobject.c) for unit tests that concern
> QObjects as a whole.
> 
> Its only purpose for now is to test the qobject_is_equal() function.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/Makefile.include |   4 +-
>  qobject/qnum.c |  16 +-
>  tests/check-qobject.c  | 404 
> +
>  3 files changed, 417 insertions(+), 7 deletions(-)
>  create mode 100644 tests/check-qobject.c
> 

> +++ b/qobject/qnum.c
> @@ -217,12 +217,16 @@ QNum *qobject_to_qnum(const QObject *obj)
>  /**
>   * qnum_is_equal(): Test whether the two QNums are equal
>   *
> - * Negative integers are never considered equal to unsigned integers.
> - * Doubles are only considered equal to integers if their fractional
> - * part is zero and their integral part is exactly equal to the
> - * integer.  Because doubles have limited precision, there are
> - * therefore integers which do not have an equal double (e.g.
> - * INT64_MAX).
> + * This comparison is done independently of the internal
> + * representation.  Any two numbers are considered equal if they are
> + * mathmatically equal, that means:

s/mathmatically/mathematically/

> + * - Negative integers are never considered equal to unsigned
> + *   integers.
> + * - Floating point values are only considered equal to integers if
> + *   their fractional part is zero and their integral part is exactly
> + *   equal to the integer.  Because doubles have limited precision,
> + *   there are therefore integers which do not have an equal floating
> + *   point value (e.g. INT64_MAX).
>   */

> +static void qobject_is_equal_num_test(void)
> +{
> +QNum *u0, *i0, *d0, *d0p25, *dnan, *um42, *im42, *dm42;

Given my comments on 2/5, do you want a dinf?

> +QNum *umax, *imax, *umax_exact, *umax_exact_p1;
> +QNum *dumax, *dimax, *dumax_exact, *dumax_exact_p1;
> +QString *s0, *s_empty;
> +QBool *bfalse;
> +
> +u0 = qnum_from_uint(0u);
> +i0 = qnum_from_int(0);
> +d0 = qnum_from_double(0.0);
> +d0p25 = qnum_from_double(0.25);
> +dnan = qnum_from_double(0.0 / 0.0);

Are there compilers that complain if we open-code division by zero
instead of using NAN from <math.h>? (Similarly, if you test infinity, I'd
use the INFINITY macro instead of an open-coded computation.)

> +um42 = qnum_from_uint((uint64_t)-42);
> +im42 = qnum_from_int(-42);
> +dm42 = qnum_from_int(-42.0);
> +
> +/* 2^64 - 1: Not exactly representable as a double (needs 64 bits
> + * of precision, but double only has 53).  The double equivalent
> + * may be either 2^64 or 2^64 - 2^11. */
> +umax = qnum_from_uint(UINT64_MAX);
> +
> +/* 2^63 - 1: Not exactly representable as a double (needs 63 bits
> + * of precision, but double only has 53).  The double equivalent
> + * may be either 2^63 or 2^63 - 2^10. */
> +imax = qnum_from_int(INT64_MAX);
> +/* 2^64 - 2^11: Exactly representable as a double (the least
> + * significant 11 bits are set to 0, so we only need the 53 bits
> + * of precision double offers).  This is the maximum value which
> + * is exactly representable both as a uint64_t and a double. */
> +umax_exact = qnum_from_uint(UINT64_MAX - 0x7ff);
> +
> +/* 2^64 - 2^11 + 1: Not exactly representable as a double (needs
> + * 64 bits again), but whereas (double)UINT64_MAX may be rounded
> + * up to 2^64, this will most likely be rounded down to
> + * 2^64 - 2^11. */
> +umax_exact_p1 = qnum_from_uint(UINT64_MAX - 0x7ff + 1);

Nice.

> +
> +dumax = qnum_from_double((double)qnum_get_uint(umax));
> +dimax = qnum_from_double((double)qnum_get_int(imax));
> +dumax_exact = qnum_from_double((double)qnum_get_uint(umax_exact));
> +dumax_exact_p1 = qnum_from_double((double)qnum_get_uint(umax_exact_p1));

Compiler-dependent what values (some) of these doubles hold.

> +
> +s0 = qstring_from_str("0");
> +s_empty = qstring_new();
> +bfalse = qbool_from_bool(false);
> +
> +/* The internal representation should not matter, as long as the
> + * precision is sufficient */
> +test_equality(true, u0, i0, d0);
> +
> +/* No automatic type conversion */
> +test_equality(false, u0, s0, s_empty, bfalse, qnull(), NULL);
> +test_equality(false, i0, s0, s_empty, bfalse, qnull(), NULL);
> +test_equality(false, d0, s0, s_empty, bfalse, qnull(), NULL);
> +
> +/* Do not round */
> +test_equality(false, u0, d0p25);
> +test_equality(false, i0, d0p25);
> +
> +/* Do not assume any object is equal to itself -- note however
> + * that NaN cannot occur in a JSON object anyway. */
> +g_assert(qobject_is_equal(QOBJECT(dnan), QOBJECT(dnan)) == false);

If you test infinity, that also cannot occur in JSON objects.

> +
> +/* No unsigned overflow */
> +test_equality(false, um42, im42);
> +test_equality(false, um42, dm42);
> +test_equality(true, 

Re: [Qemu-devel] [PATCH v4 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Eric Blake
On 07/05/2017 02:04 PM, Max Reitz wrote:
> Currently, bdrv_reopen_prepare() assumes that all BDS options are
> strings. However, this is not the case if the BDS has been created
> through the json: pseudo-protocol or blockdev-add.
> 
> Note that the user-invokable reopen command is an HMP command, so you
> can only specify strings there. Therefore, specifying a non-string
> option with the "same" value as it was when originally created will now
> return an error because the values are supposedly similar (and there is
> no way for the user to circumvent this but to just not specify the
> option again -- however, this is still strictly better than just
> crashing).
> 
> Signed-off-by: Max Reitz 
> ---
>  block.c | 29 ++---
>  1 file changed, 18 insertions(+), 11 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH v4 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Eric Blake
On 07/05/2017 02:04 PM, Max Reitz wrote:
> This generic function (along with its implementations for different
> types) determines whether two QObjects are equal.
> 
> Signed-off-by: Max Reitz 
> ---
> Markus also proposed just reporting two values as unequal if they have a
> different internal representation (i.e. a different QNum kind).
> 
> I don't like this very much, because I feel like QInt and QFloat have
> been unified for a reason: Outside of these classes, nobody should care
> about the exact internal representation.  In JSON, there is no
> difference anyway.  We probably want to use integers as long as we can
> and doubles whenever we cannot.
> 
> In any case, I feel like the class should hide the different internal
> representations from the user.  This necessitates being able to compare
> floating point values against integers.  Since apparently the main use
> of QObject is to parse and emit JSON (and represent such objects
> internally), we also have to agree that JSON doesn't make a difference:
> 42 is just the same as 42.0.
> 
> Finally, I think it's rather pointless not to consider 42u and 42 the
> same value.  But since unsigned/signed are two different kinds of QNums
> already, we cannot consider them equal without considering 42.0 equal,
> too.
> 
> Because of this, I have decided to continue to compare QNum values even
> if they are of a different kind.

This explanation may deserve to be in the commit log proper.

>  /**
> + * qnum_is_equal(): Test whether the two QNums are equal
> + *
> + * Negative integers are never considered equal to unsigned integers.
> + * Doubles are only considered equal to integers if their fractional
> + * part is zero and their integral part is exactly equal to the
> + * integer.  Because doubles have limited precision, there are
> + * therefore integers which do not have an equal double (e.g.
> + * INT64_MAX).
> + */
> +bool qnum_is_equal(const QObject *x, const QObject *y)
> +{
> +QNum *num_x = qobject_to_qnum(x);
> +QNum *num_y = qobject_to_qnum(y);
> +double integral_part; /* Needed for the modf() calls below */
> +
> +switch (num_x->kind) {
> +case QNUM_I64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +/* Comparison in native int64_t type */
> +return num_x->u.i64 == num_y->u.i64;
> +case QNUM_U64:
> +/* Implicit conversion of x to uint64_t, so we have to
> + * check its sign before */
> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Comparing x to y in double (which the implicit
> + * conversion would do) is not exact.  So after having
> + * checked that y is an integer in the int64_t range
> + * (i.e. that it is within bounds and its fractional part
> + * is zero), compare both as integers. */
> +return num_y->u.dbl >= -0x1p63 && num_y->u.dbl < 0x1p63 &&
> +modf(num_y->u.dbl, &integral_part) == 0.0 &&

'man modf': given modf(x, &iptr), if x is a NaN, a NaN is returned
(good, NaN is never equal to any integer value). But if x is positive
infinity, +0 is returned...

> +num_x->u.i64 == (int64_t)num_y->u.dbl;

...and *iptr is set to positive infinity.  You are now converting
infinity to int64_t (whether via num_y->u.dbl or via &integral_part),
which falls in the unspecified portion of C99 (your quotes from 6.3.1.4
mentioned converting a finite value of real to integer, and say nothing
about converting NaN or infinity to integer).

Adding an 'isfinite(num_y->u.dbl) &&' to the expression would cover your
bases (or even 'isfinite(integral_part)', if we are worried about a
static checker complaining that we assign but never read integral_part).
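
I.e., roughly (untested sketch, reusing the names from the patch):

    /* QNUM_I64 vs QNUM_DOUBLE, with the suggested isfinite() guard */
    return isfinite(num_y->u.dbl) &&
        num_y->u.dbl >= -0x1p63 && num_y->u.dbl < 0x1p63 &&
        modf(num_y->u.dbl, &integral_part) == 0.0 &&
        num_x->u.i64 == (int64_t)num_y->u.dbl;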

> +}
> +abort();
> +case QNUM_U64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +return qnum_is_equal(y, x);
> +case QNUM_U64:
> +/* Comparison in native uint64_t type */
> +return num_x->u.u64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Comparing x to y in double (which the implicit
> + * conversion would do) is not exact.  So after having
> + * checked that y is an integer in the uint64_t range
> + * (i.e. that it is within bounds and its fractional part
> + * is zero), compare both as integers. */
> +return num_y->u.dbl >= 0 && num_y->u.dbl < 0x1p64 &&
> +modf(num_y->u.dbl, &integral_part) == 0.0 &&
> +num_x->u.u64 == (uint64_t)num_y->u.dbl;

And again.

With that addition,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime

2017-07-05 Thread Peter Maydell
On 5 July 2017 at 20:30, Alex Bennée  wrote:
>
> Peter Maydell  writes:
>
>> On 5 July 2017 at 17:01, Alex Bennée  wrote:
>>> An interesting bug was reported on #qemu today. It was bisected to
>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>>> occurs in both single and multi-threaded modes.
>>>
>>> I think the problem is qemu_system_reset_request() is certainly racy
>>> when resetting a running CPU. AFAICT:
>>>
>>>   - Guest resets board, writing to some hw address (e.g.
>>> arm_sysctl_write)
>>>   - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>   - We exit iowrite and drop the BQL
>>>   - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>   - we start writing new values to CPU env while still in TCG code
>>>   - CHAOS!
>>>
>>> The general solution for this is to ensure these sort of tasks are done
>>> with safe work in the CPUs context when we know nothing else is running.
>>> It seems this is probably best done by modifying
>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>> as safe work - I don't think the vl.c thread should ever be messing
>>> about with calling cpu_reset directly.
>>
>> My first thought is that qemu_system_reset() should absolutely
>> stop every CPU (or other runnable thing like a DMA agent) in the
>> system.
>
> Are all these reset calls system wide though?

It's called 'system_reset' because it resets the entire system...

> After all with PSCI you
> can bring individual cores up and down. I appreciate the vexpress stuff
> pre-dates those well defined semantics though.

It's individual core reset that's a more ad-hoc afterthought,
really.

> vm_stop certainly tries to deal with things gracefully as well as send
> qapi events, drain IO queues and the rest of it. My only concern is it
> handles two cases - external vm_stops and those from the current CPU.
>
> I think it may be cleaner for CPU originated halts to use the
> async_safe_run_on_cpu() mechanism.

System reset already has an async component to it -- you call
qemu_system_reset_request(), which just says "schedule a system
reset as soon as convenient". qemu_system_reset() is the thing
that runs later and actually does the job (from the io thread,
not the CPU thread).

Looking more closely at the vl.c code, it looks like it
calls pause_all_vcpus() before calling qemu_system_reset():
shouldn't that be pausing all the TCG CPUs?

thanks
-- PMM



Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime

2017-07-05 Thread Alex Bennée

Paolo Bonzini  writes:

> On 05/07/2017 18:14, Peter Maydell wrote:
>>>   - Guest resets board, writing to some hw address (e.g.
>>> arm_sysctl_write)
>>>   - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>   - We exit iowrite and drop the BQL
>>>   - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>   - we start writing new values to CPU env while still in TCG code
>>>   - CHAOS!
>>>
>>> The general solution for this is to ensure these sort of tasks are done
>>> with safe work in the CPUs context when we know nothing else is running.
>>> It seems this is probably best done by modifying
>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>> as safe work - I don't think the vl.c thread should ever be messing
>>> about with calling cpu_reset directly.
>> My first thought is that qemu_system_reset() should absolutely
>> stop every CPU (or other runnable thing like a DMA agent) in the
>> system. The semantics are basically "like a power cycle", so
>> that should include a complete stop of the world. (Is this
>> what vm_stop() does? Dunno...)
>
> I agree, it should do vm_stop() as the first thing and, if applicable,
> vm_start() as the last thing, similar to e.g. savevm.

Why not use our async_safe_run_on_cpu mechanism for it? Certainly I
wouldn't expect the vCPU hitting it's own reset button to need to be
graceful about it.

>
> In fact, the above bug probably has existed forever in KVM.
>
> Paolo


--
Alex Bennée



Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime

2017-07-05 Thread Alex Bennée

Peter Maydell  writes:

> On 5 July 2017 at 17:01, Alex Bennée  wrote:
>> An interesting bug was reported on #qemu today. It was bisected to
>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>> occurs in both single and multi-threaded modes.
>>
>> I think the problem is qemu_system_reset_request() is certainly racy
>> when resetting a running CPU. AFAICT:
>>
>>   - Guest resets board, writing to some hw address (e.g.
>> arm_sysctl_write)
>>   - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>   - We exit iowrite and drop the BQL
>>   - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>   - we start writing new values to CPU env while still in TCG code
>>   - CHAOS!
>>
>> The general solution for this is to ensure these sort of tasks are done
>> with safe work in the CPUs context when we know nothing else is running.
>> It seems this is probably best done by modifying
>> qemu_system_reset_request to queue work up on current_cpu and execute it
>> as safe work - I don't think the vl.c thread should ever be messing
>> about with calling cpu_reset directly.
>
> My first thought is that qemu_system_reset() should absolutely
> stop every CPU (or other runnable thing like a DMA agent) in the
> system.

Are all these reset calls system wide though? After all with PSCI you
can bring individual cores up and down. I appreciate the vexpress stuff
pre-dates those well defined semantics though.

> The semantics are basically "like a power cycle", so
> that should include a complete stop of the world. (Is this
> what vm_stop() does? Dunno...)

vm_stop certainly tries to deal with things gracefully as well as send
qapi events, drain IO queues and the rest of it. My only concern is it
handles two cases - external vm_stops and those from the current CPU.

I think it may be cleaner for CPU originated halts to use the
async_safe_run_on_cpu() mechanism. It has clear semantics with respect
to the behaviour of other CPUs. If you queue work with
async_safe_run_on_cpu and do a cpu_loop_exit you can guarantee all vCPUs
have stopped and the work has been serviced before the originating vCPU
executes its next instruction.
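
For concreteness, a rough and untested sketch of what that could look like
(only async_safe_run_on_cpu(), cpu_loop_exit(), current_cpu and
qemu_devices_reset() are real APIs here; the two function names are made up):

    static void system_reset_safe_work(CPUState *cpu, run_on_cpu_data data)
    {
        /* Runs with every vCPU stopped, so resetting CPU/device state
         * cannot race with TCG execution. */
        qemu_devices_reset();   /* stand-in for the full vl.c reset path */
    }

    static void vcpu_triggered_reset(CPUState *cpu)
    {
        async_safe_run_on_cpu(cpu, system_reset_safe_work, RUN_ON_CPU_NULL);
        cpu_loop_exit(cpu);     /* leave the TCG loop; the work runs next */
    }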

>
> thanks
> -- PMM


--
Alex Bennée



[Qemu-devel] [PATCH v4 5/5] tests: Add check-qobject for equality tests

2017-07-05 Thread Max Reitz
Add a new test file (check-qobject.c) for unit tests that concern
QObjects as a whole.

Its only purpose for now is to test the qobject_is_equal() function.

Signed-off-by: Max Reitz 
---
 tests/Makefile.include |   4 +-
 qobject/qnum.c |  16 +-
 tests/check-qobject.c  | 404 +
 3 files changed, 417 insertions(+), 7 deletions(-)
 create mode 100644 tests/check-qobject.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 42e17e2..07b130c 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -18,6 +18,7 @@ check-unit-y += tests/check-qlist$(EXESUF)
 gcov-files-check-qlist-y = qobject/qlist.c
 check-unit-y += tests/check-qnull$(EXESUF)
 gcov-files-check-qnull-y = qobject/qnull.c
+check-unit-y += tests/check-qobject$(EXESUF)
 check-unit-y += tests/check-qjson$(EXESUF)
 gcov-files-check-qjson-y = qobject/qjson.c
 check-unit-y += tests/test-qobject-output-visitor$(EXESUF)
@@ -508,7 +509,7 @@ GENERATED_FILES += tests/test-qapi-types.h 
tests/test-qapi-visit.h \
tests/test-qmp-introspect.h
 
 test-obj-y = tests/check-qnum.o tests/check-qstring.o tests/check-qdict.o \
-   tests/check-qlist.o tests/check-qnull.o \
+   tests/check-qlist.o tests/check-qnull.o tests/check-qobject.o \
tests/check-qjson.o \
tests/test-coroutine.o tests/test-string-output-visitor.o \
tests/test-string-input-visitor.o tests/test-qobject-output-visitor.o \
@@ -541,6 +542,7 @@ tests/check-qstring$(EXESUF): tests/check-qstring.o 
$(test-util-obj-y)
 tests/check-qdict$(EXESUF): tests/check-qdict.o $(test-util-obj-y)
 tests/check-qlist$(EXESUF): tests/check-qlist.o $(test-util-obj-y)
 tests/check-qnull$(EXESUF): tests/check-qnull.o $(test-util-obj-y)
+tests/check-qobject$(EXESUF): tests/check-qobject.o $(test-util-obj-y)
 tests/check-qjson$(EXESUF): tests/check-qjson.o $(test-util-obj-y)
 tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o 
$(test-qom-obj-y)
 tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y)
diff --git a/qobject/qnum.c b/qobject/qnum.c
index 96c348c..3d029f6 100644
--- a/qobject/qnum.c
+++ b/qobject/qnum.c
@@ -217,12 +217,16 @@ QNum *qobject_to_qnum(const QObject *obj)
 /**
  * qnum_is_equal(): Test whether the two QNums are equal
  *
- * Negative integers are never considered equal to unsigned integers.
- * Doubles are only considered equal to integers if their fractional
- * part is zero and their integral part is exactly equal to the
- * integer.  Because doubles have limited precision, there are
- * therefore integers which do not have an equal double (e.g.
- * INT64_MAX).
+ * This comparison is done independently of the internal
+ * representation.  Any two numbers are considered equal if they are
+ * mathmatically equal, that means:
+ * - Negative integers are never considered equal to unsigned
+ *   integers.
+ * - Floating point values are only considered equal to integers if
+ *   their fractional part is zero and their integral part is exactly
+ *   equal to the integer.  Because doubles have limited precision,
+ *   there are therefore integers which do not have an equal floating
+ *   point value (e.g. INT64_MAX).
  */
 bool qnum_is_equal(const QObject *x, const QObject *y)
 {
diff --git a/tests/check-qobject.c b/tests/check-qobject.c
new file mode 100644
index 000..fd964bf
--- /dev/null
+++ b/tests/check-qobject.c
@@ -0,0 +1,404 @@
+/*
+ * Generic QObject unit-tests.
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+
+#include "qapi/qmp/types.h"
+#include "qemu-common.h"
+
+/* Marks the end of the test_equality() argument list.
+ * We cannot use NULL there because that is a valid argument. */
+static QObject _test_equality_end_of_arguments;
+
+/**
+ * Test whether all variadic QObject *arguments are equal (@expected
+ * is true) or whether they are all not equal (@expected is false).
+ * Every QObject is tested to be equal to itself (to test
+ * reflexivity), all tests are done both ways (to test symmetry), and
+ * transitivity is not assumed but checked (each object is compared to
+ * every other one).
+ *
+ * Note that qobject_is_equal() is not really an equivalence relation,
+ * so this function may not be used for all objects (reflexivity is
+ * not guaranteed, e.g. in the case of a QNum containing NaN).
+ */
+static void do_test_equality(bool expected, ...)
+{
+va_list ap_count, ap_extract;
+QObject **args;
+int arg_count = 0;
+int i, j;
+
+va_start(ap_count, expected);
+va_copy(ap_extract, ap_count);
+while (va_arg(ap_count, QObject *) != &_test_equality_end_of_arguments) {
+arg_count++;
+}
+va_end(ap_count);
+
+args = g_new(QObject *, arg_count);
+for (i = 0; i < arg_count; i++) {
+args[i] = 

[Qemu-devel] [PATCH v4 4/5] iotests: Add test for non-string option reopening

2017-07-05 Thread Max Reitz
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/133 | 9 +
 tests/qemu-iotests/133.out | 5 +
 2 files changed, 14 insertions(+)

diff --git a/tests/qemu-iotests/133 b/tests/qemu-iotests/133
index 9d35a6a..af6b3e1 100755
--- a/tests/qemu-iotests/133
+++ b/tests/qemu-iotests/133
@@ -83,6 +83,15 @@ $QEMU_IO -c 'reopen -o driver=qcow2' $TEST_IMG
 $QEMU_IO -c 'reopen -o file.driver=file' $TEST_IMG
 $QEMU_IO -c 'reopen -o backing.driver=qcow2' $TEST_IMG
 
+echo
+echo "=== Check that reopening works with non-string options ==="
+echo
+
+# Using the json: pseudo-protocol we can create non-string options
+# (Invoke 'info' just so we get some output afterwards)
+IMGOPTSSYNTAX=false $QEMU_IO -f null-co -c 'reopen' -c 'info' \
+"json:{'driver': 'null-co', 'size': 65536}"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/133.out b/tests/qemu-iotests/133.out
index cc86b94..f4a85ae 100644
--- a/tests/qemu-iotests/133.out
+++ b/tests/qemu-iotests/133.out
@@ -19,4 +19,9 @@ Cannot change the option 'driver'
 
 === Check that unchanged driver is okay ===
 
+
+=== Check that reopening works with non-string options ===
+
+format name: null-co
+format name: null-co
 *** done
-- 
2.9.4




[Qemu-devel] [PATCH v4 1/5] qapi/qnull: Add own header

2017-07-05 Thread Max Reitz
Reviewed-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 include/qapi/qmp/qnull.h   | 26 ++
 include/qapi/qmp/qobject.h |  8 
 include/qapi/qmp/types.h   |  1 +
 qobject/qnull.c|  2 +-
 tests/check-qnull.c|  2 +-
 5 files changed, 29 insertions(+), 10 deletions(-)
 create mode 100644 include/qapi/qmp/qnull.h

diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
new file mode 100644
index 000..48edad4
--- /dev/null
+++ b/include/qapi/qmp/qnull.h
@@ -0,0 +1,26 @@
+/*
+ * QNull
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ *
+ * Authors:
+ *  Markus Armbruster 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#ifndef QNULL_H
+#define QNULL_H
+
+#include "qapi/qmp/qobject.h"
+
+extern QObject qnull_;
+
+static inline QObject *qnull(void)
+{
+qobject_incref(&qnull_);
+return &qnull_;
+}
+
+#endif /* QNULL_H */
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index b8ddbca..ef1d1a9 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -93,12 +93,4 @@ static inline QType qobject_type(const QObject *obj)
 return obj->type;
 }
 
-extern QObject qnull_;
-
-static inline QObject *qnull(void)
-{
-qobject_incref(&qnull_);
-return &qnull_;
-}
-
 #endif /* QOBJECT_H */
diff --git a/include/qapi/qmp/types.h b/include/qapi/qmp/types.h
index a4bc662..749ac44 100644
--- a/include/qapi/qmp/types.h
+++ b/include/qapi/qmp/types.h
@@ -19,5 +19,6 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnull.h"
 
 #endif /* QAPI_QMP_TYPES_H */
diff --git a/qobject/qnull.c b/qobject/qnull.c
index c124d05..43918f1 100644
--- a/qobject/qnull.c
+++ b/qobject/qnull.c
@@ -12,7 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qnull.h"
 
 QObject qnull_ = {
 .type = QTYPE_QNULL,
diff --git a/tests/check-qnull.c b/tests/check-qnull.c
index 8dd1c96..4a67b9a 100644
--- a/tests/check-qnull.c
+++ b/tests/check-qnull.c
@@ -8,7 +8,7 @@
  */
 #include "qemu/osdep.h"
 
-#include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qnull.h"
 #include "qemu-common.h"
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
-- 
2.9.4




[Qemu-devel] [PATCH v4 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Max Reitz
Currently, bdrv_reopen_prepare() assumes that all BDS options are
strings. However, this is not the case if the BDS has been created
through the json: pseudo-protocol or blockdev-add.

Note that the user-invokable reopen command is an HMP command, so you
can only specify strings there. Therefore, specifying a non-string
option with the "same" value as it was when originally created will now
return an error because the values are supposedly similar (and there is
no way for the user to circumvent this but to just not specify the
option again -- however, this is still strictly better than just
crashing).

Signed-off-by: Max Reitz 
---
 block.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 913bb43..f8c8e92 100644
--- a/block.c
+++ b/block.c
@@ -2947,19 +2947,26 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 const QDictEntry *entry = qdict_first(reopen_state->options);
 
 do {
-QString *new_obj = qobject_to_qstring(entry->value);
-const char *new = qstring_get_str(new_obj);
+QObject *new = entry->value;
+QObject *old = qdict_get(reopen_state->bs->options, entry->key);
+
 /*
- * Caution: while qdict_get_try_str() is fine, getting
- * non-string types would require more care.  When
- * bs->options come from -blockdev or blockdev_add, its
- * members are typed according to the QAPI schema, but
- * when they come from -drive, they're all QString.
+ * TODO: When using -drive to specify blockdev options, all values
+ * will be strings; however, when using -blockdev, blockdev-add or
+ * filenames using the json:{} pseudo-protocol, they will be
+ * correctly typed.
+ * In contrast, reopening options are (currently) always strings
+ * (because you can only specify them through qemu-io; all other
+ * callers do not specify any options).
+ * Therefore, when using anything other than -drive to create a 
BDS,
+ * this cannot detect non-string options as unchanged, because
+ * qobject_is_equal() always returns false for objects of different
+ * type.  In the future, this should be remedied by correctly 
typing
+ * all options.  For now, this is not too big of an issue because
+ * the user can simply omit options which cannot be changed anyway,
+ * so they will stay unchanged.
  */
-const char *old = qdict_get_try_str(reopen_state->bs->options,
-entry->key);
-
-if (!old || strcmp(new, old)) {
+if (!qobject_is_equal(new, old)) {
 error_setg(errp, "Cannot change the option '%s'", entry->key);
 ret = -EINVAL;
 goto error;
-- 
2.9.4




[Qemu-devel] [PATCH v4 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
This generic function (along with its implementations for different
types) determines whether two QObjects are equal.

Signed-off-by: Max Reitz 
---
Markus also proposed just reporting two values as unequal if they have a
different internal representation (i.e. a different QNum kind).

I don't like this very much, because I feel like QInt and QFloat have
been unified for a reason: Outside of these classes, nobody should care
about the exact internal representation.  In JSON, there is no
difference anyway.  We probably want to use integers as long as we can
and doubles whenever we cannot.

In any case, I feel like the class should hide the different internal
representations from the user.  This necessitates being able to compare
floating point values against integers.  Since apparently the main use
of QObject is to parse and emit JSON (and represent such objects
internally), we also have to agree that JSON doesn't make a difference:
42 is just the same as 42.0.

Finally, I think it's rather pointless not to consider 42u and 42 the
same value.  But since unsigned/signed are two different kinds of QNums
already, we cannot consider them equal without considering 42.0 equal,
too.

Because of this, I have decided to continue to compare QNum values even
if they are of a different kind.
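
As a quick illustration of the intended semantics (not part of the patch;
the temporaries are leaked for brevity):

    qnum_is_equal(QOBJECT(qnum_from_uint(42u)), QOBJECT(qnum_from_int(42)));      /* true  */
    qnum_is_equal(QOBJECT(qnum_from_int(42)), QOBJECT(qnum_from_double(42.0)));   /* true  */
    qnum_is_equal(QOBJECT(qnum_from_int(0)), QOBJECT(qnum_from_double(0.25)));    /* false */
    qnum_is_equal(QOBJECT(qnum_from_uint((uint64_t)-42)), QOBJECT(qnum_from_int(-42))); /* false */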
---
 include/qapi/qmp/qbool.h   |  1 +
 include/qapi/qmp/qdict.h   |  1 +
 include/qapi/qmp/qlist.h   |  1 +
 include/qapi/qmp/qnull.h   |  2 ++
 include/qapi/qmp/qnum.h|  1 +
 include/qapi/qmp/qobject.h |  9 ++
 include/qapi/qmp/qstring.h |  1 +
 qobject/qbool.c|  8 +
 qobject/qdict.c| 29 ++
 qobject/qlist.c| 32 
 qobject/qnull.c|  9 ++
 qobject/qnum.c | 73 ++
 qobject/qobject.c  | 29 ++
 qobject/qstring.c  |  9 ++
 14 files changed, 205 insertions(+)

diff --git a/include/qapi/qmp/qbool.h b/include/qapi/qmp/qbool.h
index a4c..f77ea86 100644
--- a/include/qapi/qmp/qbool.h
+++ b/include/qapi/qmp/qbool.h
@@ -24,6 +24,7 @@ typedef struct QBool {
 QBool *qbool_from_bool(bool value);
 bool qbool_get_bool(const QBool *qb);
 QBool *qobject_to_qbool(const QObject *obj);
+bool qbool_is_equal(const QObject *x, const QObject *y);
 void qbool_destroy_obj(QObject *obj);
 
 #endif /* QBOOL_H */
diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 363e431..84f8ea7 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -42,6 +42,7 @@ void qdict_del(QDict *qdict, const char *key);
 int qdict_haskey(const QDict *qdict, const char *key);
 QObject *qdict_get(const QDict *qdict, const char *key);
 QDict *qobject_to_qdict(const QObject *obj);
+bool qdict_is_equal(const QObject *x, const QObject *y);
 void qdict_iter(const QDict *qdict,
 void (*iter)(const char *key, QObject *obj, void *opaque),
 void *opaque);
diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
index c4b5fda..24e1e9f 100644
--- a/include/qapi/qmp/qlist.h
+++ b/include/qapi/qmp/qlist.h
@@ -58,6 +58,7 @@ QObject *qlist_peek(QList *qlist);
 int qlist_empty(const QList *qlist);
 size_t qlist_size(const QList *qlist);
 QList *qobject_to_qlist(const QObject *obj);
+bool qlist_is_equal(const QObject *x, const QObject *y);
 void qlist_destroy_obj(QObject *obj);
 
 static inline const QListEntry *qlist_first(const QList *qlist)
diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
index 48edad4..f4fbcae 100644
--- a/include/qapi/qmp/qnull.h
+++ b/include/qapi/qmp/qnull.h
@@ -23,4 +23,6 @@ static inline QObject *qnull(void)
return &qnull_;
 }
 
+bool qnull_is_equal(const QObject *x, const QObject *y);
+
 #endif /* QNULL_H */
diff --git a/include/qapi/qmp/qnum.h b/include/qapi/qmp/qnum.h
index 09d745c..237d01b 100644
--- a/include/qapi/qmp/qnum.h
+++ b/include/qapi/qmp/qnum.h
@@ -48,6 +48,7 @@ double qnum_get_double(QNum *qn);
 char *qnum_to_string(QNum *qn);
 
 QNum *qobject_to_qnum(const QObject *obj);
+bool qnum_is_equal(const QObject *x, const QObject *y);
 void qnum_destroy_obj(QObject *obj);
 
 #endif /* QNUM_H */
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index ef1d1a9..38ac688 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -68,6 +68,15 @@ static inline void qobject_incref(QObject *obj)
 }
 
 /**
+ * qobject_is_equal(): Return whether the two objects are equal.
+ *
+ * Any of the pointers may be NULL; return true if both are.  Always
+ * return false if only one is (therefore a QNull object is not
+ * considered equal to a NULL pointer).
+ */
+bool qobject_is_equal(const QObject *x, const QObject *y);
+
+/**
  * qobject_destroy(): Free resources used by the object
  */
 void qobject_destroy(QObject *obj);
diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
index 10076b7..65c05a9 100644
--- a/include/qapi/qmp/qstring.h

[Qemu-devel] [PATCH v4 0/5] block: Don't compare strings in bdrv_reopen_prepare()

2017-07-05 Thread Max Reitz
bdrv_reopen_prepare() assumes that all BDS options are strings, which is
not necessarily correct. This series introduces a new qobject_is_equal()
function which can be used to test whether any options have changed,
independently of their type.


v4:
- Patch 1: Kept, because Markus gave his R-b already...
- Patch 2: Check that doubles match integers exactly, not just after the
   integer has been (lossily) converted to a double
   [Markus/Eric]
- Patch 3: Winged comment and s/simply can not/can simply omit/ (:-()
   [Markus/Eric]
- Patch 5:
  - Mention (at least one) reason for the non-reflexivity of
qobject_is_equal() (that being NaN) [Eric]
  - Mention that NaN cannot occur in JSON [Eric]
  - Tests for integer values that cannot be exactly represented as a
double


git-backport-diff against v3:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[----] [--] 'qapi/qnull: Add own header'
002/5:[0032] [FC] 'qapi: Add qobject_is_equal()'
003/5:[0022] [FC] 'block: qobject_is_equal() in bdrv_reopen_prepare()'
004/5:[----] [--] 'iotests: Add test for non-string option reopening'
005/5:[0114] [FC] 'tests: Add check-qobject for equality tests'


Max Reitz (5):
  qapi/qnull: Add own header
  qapi: Add qobject_is_equal()
  block: qobject_is_equal() in bdrv_reopen_prepare()
  iotests: Add test for non-string option reopening
  tests: Add check-qobject for equality tests

 tests/Makefile.include |   4 +-
 include/qapi/qmp/qbool.h   |   1 +
 include/qapi/qmp/qdict.h   |   1 +
 include/qapi/qmp/qlist.h   |   1 +
 include/qapi/qmp/qnull.h   |  28 
 include/qapi/qmp/qnum.h|   1 +
 include/qapi/qmp/qobject.h |  17 +-
 include/qapi/qmp/qstring.h |   1 +
 include/qapi/qmp/types.h   |   1 +
 block.c|  29 ++--
 qobject/qbool.c|   8 +
 qobject/qdict.c|  29 
 qobject/qlist.c|  32 
 qobject/qnull.c|  11 +-
 qobject/qnum.c |  77 +
 qobject/qobject.c  |  29 
 qobject/qstring.c  |   9 +
 tests/check-qnull.c|   2 +-
 tests/check-qobject.c  | 404 +
 tests/qemu-iotests/133 |   9 +
 tests/qemu-iotests/133.out |   5 +
 21 files changed, 677 insertions(+), 22 deletions(-)
 create mode 100644 include/qapi/qmp/qnull.h
 create mode 100644 tests/check-qobject.c

-- 
2.9.4




Re: [Qemu-devel] [Qemu devel v6 PATCH 4/5] msf2: Add Smartfusion2 SoC.

2017-07-05 Thread Alistair Francis
On Sun, Jul 2, 2017 at 9:45 PM, Subbaraya Sundeep
 wrote:

The patch title shouldn't end in a full stop.

> Smartfusion2 SoC has hardened Microcontroller subsystem
> and flash based FPGA fabric. This patch adds support for
> Microcontroller subsystem in the SoC.
>
> Signed-off-by: Subbaraya Sundeep 

Once you have fixed up the title.

Reviewed-by: Alistair Francis 

Thanks,
Alistair


> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/arm/Makefile.objs|   1 +
>  hw/arm/msf2-soc.c   | 216 
> 
>  include/hw/arm/msf2-soc.h   |  67 +
>  4 files changed, 285 insertions(+)
>  create mode 100644 hw/arm/msf2-soc.c
>  create mode 100644 include/hw/arm/msf2-soc.h
>
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 78d7af0..7062512 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -122,3 +122,4 @@ CONFIG_ACPI=y
>  CONFIG_SMBIOS=y
>  CONFIG_ASPEED_SOC=y
>  CONFIG_GPIO_KEY=y
> +CONFIG_MSF2=y
> diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
> index 4c5c4ee..c828061 100644
> --- a/hw/arm/Makefile.objs
> +++ b/hw/arm/Makefile.objs
> @@ -18,3 +18,4 @@ obj-$(CONFIG_FSL_IMX25) += fsl-imx25.o imx25_pdk.o
>  obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
>  obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
>  obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
> +obj-$(CONFIG_MSF2) += msf2-soc.o
> diff --git a/hw/arm/msf2-soc.c b/hw/arm/msf2-soc.c
> new file mode 100644
> index 000..d45827f
> --- /dev/null
> +++ b/hw/arm/msf2-soc.c
> @@ -0,0 +1,216 @@
> +/*
> + * SmartFusion2 SoC emulation.
> + *
> + * Copyright (c) 2017 Subbaraya Sundeep 
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "hw/arm/arm.h"
> +#include "exec/address-spaces.h"
> +#include "hw/char/serial.h"
> +#include "hw/boards.h"
> +#include "sysemu/block-backend.h"
> +#include "hw/arm/msf2-soc.h"
> +
> +#define MSF2_TIMER_BASE 0x40004000
> +#define MSF2_SYSREG_BASE    0x40038000
> +
> +#define ENVM_BASE_ADDRESS 0x60000000
> +
> +#define SRAM_BASE_ADDRESS 0x20000000
> +
> +#define MSF2_ENVM_SIZE    (512 * K_BYTE)
> +#define MSF2_ESRAM_SIZE   (64 * K_BYTE)
> +
> +static const uint32_t spi_addr[MSF2_NUM_SPIS] = { 0x40001000 , 0x40011000 };
> +static const uint32_t uart_addr[MSF2_NUM_UARTS] = { 0x40000000 , 0x40010000 };
> +
> +static const int spi_irq[MSF2_NUM_SPIS] = { 2, 3 };
> +static const int uart_irq[MSF2_NUM_UARTS] = { 10, 11 };
> +static const int timer_irq[MSF2_NUM_TIMERS] = { 14, 15 };
> +
> +static void m2sxxx_soc_initfn(Object *obj)
> +{
> +MSF2State *s = MSF2_SOC(obj);
> +int i;
> +
> +object_initialize(&s->armv7m, sizeof(s->armv7m), TYPE_ARMV7M);
> +qdev_set_parent_bus(DEVICE(&s->armv7m), sysbus_get_default());
> +
> +object_initialize(&s->sysreg, sizeof(s->sysreg), TYPE_MSF2_SYSREG);
> +qdev_set_parent_bus(DEVICE(&s->sysreg), sysbus_get_default());
> +
> +object_initialize(&s->timer, sizeof(s->timer), TYPE_MSS_TIMER);
> +qdev_set_parent_bus(DEVICE(&s->timer), sysbus_get_default());
> +
> +for (i = 0; i < MSF2_NUM_SPIS; i++) {
> +object_initialize(&s->spi[i], sizeof(s->spi[i]),
> +  TYPE_MSS_SPI);
> +qdev_set_parent_bus(DEVICE(&s->spi[i]), sysbus_get_default());
> +}
> +}
> +
> +static void m2sxxx_soc_realize(DeviceState *dev_soc, Error **errp)
> +{
> +MSF2State *s = MSF2_SOC(dev_soc);
> +DeviceState *dev, *armv7m;
> +SysBusDevice *busdev;
> +Error *err = NULL;
> +int i;
> +
> +MemoryRegion *system_memory = get_system_memory();
> +MemoryRegion *nvm = g_new(MemoryRegion, 1);
> +MemoryRegion 

Re: [Qemu-devel] [Qemu devel v6 PATCH 3/5] msf2: Add Smartfusion2 SPI controller

2017-07-05 Thread Alistair Francis
On Sun, Jul 2, 2017 at 9:45 PM, Subbaraya Sundeep
 wrote:
> Modelled Microsemi's Smartfusion2 SPI controller.
>
> Signed-off-by: Subbaraya Sundeep 
> ---
>  hw/ssi/Makefile.objs |   1 +
>  hw/ssi/mss-spi.c | 414 
> +++
>  include/hw/ssi/mss-spi.h |  62 +++
>  3 files changed, 477 insertions(+)
>  create mode 100644 hw/ssi/mss-spi.c
>  create mode 100644 include/hw/ssi/mss-spi.h
>
> diff --git a/hw/ssi/Makefile.objs b/hw/ssi/Makefile.objs
> index 487add2..f5bcc65 100644
> --- a/hw/ssi/Makefile.objs
> +++ b/hw/ssi/Makefile.objs
> @@ -4,6 +4,7 @@ common-obj-$(CONFIG_XILINX_SPI) += xilinx_spi.o
>  common-obj-$(CONFIG_XILINX_SPIPS) += xilinx_spips.o
>  common-obj-$(CONFIG_ASPEED_SOC) += aspeed_smc.o
>  common-obj-$(CONFIG_STM32F2XX_SPI) += stm32f2xx_spi.o
> +common-obj-$(CONFIG_MSF2) += mss-spi.o
>
>  obj-$(CONFIG_OMAP) += omap_spi.o
>  obj-$(CONFIG_IMX) += imx_spi.o
> diff --git a/hw/ssi/mss-spi.c b/hw/ssi/mss-spi.c
> new file mode 100644
> index 000..a572abc
> --- /dev/null
> +++ b/hw/ssi/mss-spi.c
> @@ -0,0 +1,414 @@
> +/*
> + * Block model of SPI controller present in
> + * Microsemi's SmartFusion2 and SmartFusion SoCs.
> + *
> + * Copyright (C) 2017 Subbaraya Sundeep 
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "hw/ssi/mss-spi.h"

Same comment as earlier patches.

> +
> +#ifndef MSS_SPI_ERR_DEBUG
> +#define MSS_SPI_ERR_DEBUG   0
> +#endif
> +
> +#define DB_PRINT_L(lvl, fmt, args...) do { \
> +if (MSS_SPI_ERR_DEBUG >= lvl) { \
> +qemu_log("%s: " fmt "\n", __func__, ## args); \
> +} \
> +} while (0);
> +
> +#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
> +
> +#define FIFO_CAPACITY 32
> +#define FIFO_CAPACITY 32
> +
> +#define R_SPI_CONTROL 0
> +#define R_SPI_DFSIZE  1
> +#define R_SPI_STATUS  2
> +#define R_SPI_INTCLR  3
> +#define R_SPI_RX  4
> +#define R_SPI_TX  5
> +#define R_SPI_CLKGEN  6
> +#define R_SPI_SS  7
> +#define R_SPI_MIS 8
> +#define R_SPI_RIS 9
> +
> +#define S_TXDONE (1 << 0)
> +#define S_RXRDY  (1 << 1)
> +#define S_RXCHOVRF   (1 << 2)
> +#define S_RXFIFOFUL  (1 << 4)
> +#define S_RXFIFOFULNXT   (1 << 5)
> +#define S_RXFIFOEMP  (1 << 6)
> +#define S_RXFIFOEMPNXT   (1 << 7)
> +#define S_TXFIFOFUL  (1 << 8)
> +#define S_TXFIFOFULNXT   (1 << 9)
> +#define S_TXFIFOEMP  (1 << 10)
> +#define S_TXFIFOEMPNXT   (1 << 11)
> +#define S_FRAMESTART (1 << 12)
> +#define S_SSEL   (1 << 13)
> +#define S_ACTIVE (1 << 14)
> +
> +#define C_ENABLE (1 << 0)
> +#define C_MODE   (1 << 1)
> +#define C_INTRXDATA  (1 << 4)
> +#define C_INTTXDATA  (1 << 5)
> +#define C_INTRXOVRFLO(1 << 6)
> +#define C_SPS(1 << 26)
> +#define C_BIGFIFO(1 << 29)
> +#define C_RESET  (1 << 31)
> +
> +#define FRAMESZ_MASK 0x1F
> +#define FMCOUNT_MASK 0x00ffff00
> +#define FMCOUNT_SHIFT    8
> +
> +static void txfifo_reset(MSSSpiState *s)
> +{
> +fifo32_reset(&s->tx_fifo);
> +
> +s->regs[R_SPI_STATUS] &= ~S_TXFIFOFUL;
> +s->regs[R_SPI_STATUS] |= S_TXFIFOEMP;
> +}
> +
> +static void rxfifo_reset(MSSSpiState *s)
> +{
> +fifo32_reset(&s->rx_fifo);
> +
> +s->regs[R_SPI_STATUS] &= ~S_RXFIFOFUL;
> +s->regs[R_SPI_STATUS] |= S_RXFIFOEMP;
> +}
> +
> +static void set_fifodepth(MSSSpiState *s)
> +{
> +unsigned int size = s->regs[R_SPI_DFSIZE] & FRAMESZ_MASK;
> +
> +if (size <= 8) {
> +s->fifo_depth = 32;
> +} else if (size <= 16) {
> +s->fifo_depth = 16;
> +} else if (size <= 32) {
> +s->fifo_depth = 8;

Re: [Qemu-devel] [Qemu devel v6 PATCH 2/5] msf2: Microsemi Smartfusion2 System Register block.

2017-07-05 Thread Alistair Francis
On Sun, Jul 2, 2017 at 9:45 PM, Subbaraya Sundeep
 wrote:
> Added System register block of Smartfusion2.
> This block has PLL registers which are accessed by guest.
>
> Signed-off-by: Subbaraya Sundeep 
> ---
>  hw/misc/Makefile.objs |   1 +
>  hw/misc/msf2-sysreg.c | 200 
> ++
>  include/hw/misc/msf2-sysreg.h |  82 +
>  3 files changed, 283 insertions(+)
>  create mode 100644 hw/misc/msf2-sysreg.c
>  create mode 100644 include/hw/misc/msf2-sysreg.h
>
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index c8b4893..0f52354 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -56,3 +56,4 @@ obj-$(CONFIG_EDU) += edu.o
>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
>  obj-$(CONFIG_AUX) += auxbus.o
>  obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
> +obj-$(CONFIG_MSF2) += msf2-sysreg.o
> diff --git a/hw/misc/msf2-sysreg.c b/hw/misc/msf2-sysreg.c
> new file mode 100644
> index 000..64ee141
> --- /dev/null
> +++ b/hw/misc/msf2-sysreg.c
> @@ -0,0 +1,200 @@
> +/*
> + * System Register block model of Microsemi SmartFusion2.
> + *
> + * Copyright (c) 2017 Subbaraya Sundeep 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "hw/misc/msf2-sysreg.h"

Same #include comment from patch 1.

> +
> +#ifndef MSF2_SYSREG_ERR_DEBUG
> +#define MSF2_SYSREG_ERR_DEBUG  0
> +#endif
> +
> +#define DB_PRINT_L(lvl, fmt, args...) do { \
> +if (MSF2_SYSREG_ERR_DEBUG >= lvl) { \
> +qemu_log("%s: " fmt "\n", __func__, ## args); \
> +} \
> +} while (0);
> +
> +#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
> +
> +static inline int msf2_divbits(uint32_t div)
> +{
> +int ret = 0;
> +
> +switch (div) {
> +case 1:
> +ret = 0;
> +break;
> +case 2:
> +ret = 1;
> +break;
> +case 4:
> +ret = 2;
> +break;
> +case 8:
> +ret = 4;
> +break;
> +case 16:
> +ret = 5;
> +break;
> +case 32:
> +ret = 6;
> +break;
> +default:
> +break;
> +}
> +
> +return ret;
> +}
> +
> +static void msf2_sysreg_reset(DeviceState *d)
> +{
> +MSF2SysregState *s = MSF2_SYSREG(d);
> +
> +DB_PRINT("RESET");
> +
> +s->regs[MSSDDR_PLL_STATUS_LOW_CR] = 0x021A2358;
> +s->regs[MSSDDR_PLL_STATUS] = 0x3;
> +s->regs[MSSDDR_FACC1_CR] = msf2_divbits(s->apb0div) << 5 |
> +   msf2_divbits(s->apb1div) << 2;
> +}
> +
> +static uint64_t msf2_sysreg_read(void *opaque, hwaddr offset,
> +unsigned size)
> +{
> +MSF2SysregState *s = opaque;
> +offset /= 4;

Probably best to use a bitshift.
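
I.e.:

    offset >>= 2;   /* same as offset /= 4 for the unsigned hwaddr, but makes the word indexing explicit */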

> +uint32_t ret = 0;
> +
> +if (offset < ARRAY_SIZE(s->regs)) {
> +ret = s->regs[offset];
> +DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx32,
> +offset * 4, ret);

Bitshift here as well.

> +} else {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +"%s: Bad offset 0x%08" HWADDR_PRIx "\n", __func__,
> +offset * 4);
> +}
> +
> +return ret;
> +}
> +
> +static void msf2_sysreg_write(void *opaque, hwaddr offset,
> +  uint64_t val, unsigned size)
> +{
> +MSF2SysregState *s = (MSF2SysregState *)opaque;
> +uint32_t newval = val;
> +uint32_t oldval;
> +
> +DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx64,
> +offset, val);
> +
> +offset /= 4;

Same here

> +
> +switch (offset) {
> +case MSSDDR_PLL_STATUS:
> +break;
> +
> +case ESRAM_CR:
> +oldval = s->regs[ESRAM_CR];
> +if (oldval ^ newval) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +   TYPE_MSF2_SYSREG": eSRAM remapping not supported\n");
> +abort();

The guest should not be able to kill QEMU, a guest error should never
result in an abort.
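
Something along these lines instead (sketch only, reusing the names from the
patch):

    case ESRAM_CR:
        if (s->regs[ESRAM_CR] != newval) {
            /* log the unsupported remap attempt and ignore the write,
             * rather than killing QEMU */
            qemu_log_mask(LOG_GUEST_ERROR,
                          TYPE_MSF2_SYSREG": eSRAM remapping not supported\n");
        }
        break;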

> +}
> +break;
> +
> +case DDR_CR:
> +oldval = s->regs[DDR_CR];
> +if (oldval ^ newval) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +   TYPE_MSF2_SYSREG": DDR remapping not supported\n");
> +abort();
> +}
> +break;
> +
> +case ENVM_REMAP_BASE_CR:
> +oldval = s->regs[ENVM_REMAP_BASE_CR];
> +if (oldval ^ newval) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +   TYPE_MSF2_SYSREG": eNVM remapping not supported\n");
> +abort();
> +}
> +

Re: [Qemu-devel] Managing architectural restrictions with -device and libvirt

2017-07-05 Thread Markus Armbruster
Mark Cave-Ayland  writes:

> On 05/07/17 16:46, Markus Armbruster wrote:
>
> I've been working on a patchset that brings the sun4u machine on
> qemu-system-sparc64 much closer to a real Ultra 5, however due to
> various design restrictions I need to be able to restrict how devices
> are added to the machine with -device.
>
> On a real Ultra 5, the root PCI bus (sabre) has 2 PCI bridges (simba A
> and simba B) with the onboard devices attached to simba A with 2 free
> slots, and an initially empty simba B.
>
> Firstly, is it possible to restrict the machine so that devices cannot
> be directly plugged into the root PCI bus, but only behind one of the
> PCI bridges? There is also an additional restriction in that slot 0
> behind simba A must be left empty to ensure that the ebus (containing
> the onboard devices) is the first device allocated.

 I figure sabre, simba A, simba B and the onboard devices attached to
 simba A are all created by MachineClass init().
>>>
>>> Yes that is effectively correct, although the Simba devices are created
>>> as part of the PCI host bridge (apb) creation in pci_apb_init().
>> 
>> Anything that runs within init() counts as "created by init()".
>
> Okay, in that case we should be fine here.
>
 What device provides "the ebus", and how is it created?
>>>
>>> It's actually just an ISA bus, so the ebus device is effectively a
>>> PCI-ISA bridge for legacy devices.
>> 
>> Is this bridge created by init()?
>
> Yes, it too is called via the machine init function.
>
 Can you provide a list of all onboard PCI devices and how they are
 connected?  Diagram would be best.
>>>
>>> I can try and come up with something more concise later, however I can
>>> quickly give you the OpenBIOS DT from my WIP patchset if that helps:
>>>
>>> 0 > show-devs
>>> ffe1bf38 /
>>> ffe1c110 /aliases
>>> ffe1c238 /openprom (BootROM)
>>> ffe26b50 /openprom/client-services
>>> ffe1c4f0 /options
>>> ffe1c5d0 /chosen
>>> ffe1c710 /builtin
>>> ffe1c838 /builtin/console
>>> ffe26618 /packages
>>> ffe28640 /packages/cmdline
>>> ffe28890 /packages/disk-label
>>> ffe2c8d8 /packages/deblocker
>>> ffe2cef0 /packages/grubfs-files
>>> ffe2d300 /packages/sun-parts
>>> ffe2d718 /packages/elf-loader
>>> ffe2b210 /memory@0,0 (memory)
>>> ffe2b370 /virtual-memory
>>> ffe2d878 /pci@1fe,0 (pci)
>>> ffe2e1a8 /pci@1fe,0/pci@1,1 (pci)
>>> ffe2e960 /pci@1fe,0/pci@1,1/ebus@1
>>> ffe2f1b0 /pci@1fe,0/pci@1,1/ebus@1/eeprom@0
>>> ffe2f328 /pci@1fe,0/pci@1,1/ebus@1/fdthree@0 (block)
>>> ffe2f878 /pci@1fe,0/pci@1,1/ebus@1/su@0 (serial)
>>> ffe2fc08 /pci@1fe,0/pci@1,1/ebus@1/8042@0 (8042)
>>> ffe2fe00 /pci@1fe,0/pci@1,1/ebus@1/8042@0/kb_ps2@0 (serial)
>>> ffe301b0 /pci@1fe,0/pci@1,1/NE2000@1,1 (network)
>>> ffe307c8 /pci@1fe,0/pci@1,1/QEMU,VGA@2 (display)
>>> ffe31e40 /pci@1fe,0/pci@1,1/ide@3 (ide)
>>> ffe32398 /pci@1fe,0/pci@1,1/ide@3/ide0@4100 (ide)
>>> ffe32678 /pci@1fe,0/pci@1,1/ide@3/ide1@4200 (ide)
>>> ffe32910 /pci@1fe,0/pci@1,1/ide@3/ide1@4200/cdrom@0 (block)
>>> ffe32f98 /pci@1fe,0/pci@1 (pci)
>>> ffe336e8 /SUNW,UltraSPARC-IIi (cpu)
>>>  ok
>>>
>>> For comparison you can see the DT from a real Ultra 5 here:
>>> http://www.pearsonitcertification.com/articles/article.aspx?p=440286=7
>>>
 The real sabre has two slots, and doesn't support hot (un)plug.  Can we
 simply model that?  If yes, the root PCI bus is full after init(), and
 remains full.  Takes care of "cannot directly plugged into the root PCI
 bus".
>>>
>>> Right. So what you're saying is that if we add the 2 simba devices to
>>> the sabre PCI host bridge during machine init and then mark the sabre
>>> PCI root bus as not hotplug-able then that will prevent people adding
>>> extra devices from the command line via -device? I will see if I can
>>> find time to try this later this evening.
>> 
>> No.  Marking the bus "not hotpluggable" only prevents *hotplug*,
>> i.e. plug/unplug after machine initialization completed, commonly with
>> device_add.  -device is *cold* plug; it happens during machine
>> initialization.
>> 
>> However, if you limit sabre's bus to two slots (modelling real hardware
>> faithfully), then you can't cold plug anything (there's no free slot).
>> If you additionally mark the bus or both simba devices not hotpluggable
>> (again modelling real hardware faithfully), you can't unplug the simbas.
>> I believe that's what you want.
>
> It seems like limiting the size of the bus would solve the majority of
> the problem. I've had a quick look around pci.c and while I can see that
> the PCIBus creation functions take a devfn_min parameter, I can't see
> anything that limits the number of slots available on the bus?

Marcel?

> And presumably if the user did try and coldplug something into a full
> bus then they would get the standard "PCI: no slot/function
> available..." error?

That's what I'd expect.

>>> My understanding from reading 

Re: [Qemu-devel] [Qemu devel v6 PATCH 1/5] msf2: Add Smartfusion2 System timer

2017-07-05 Thread Alistair Francis
On Wed, Jul 5, 2017 at 10:56 AM, Alistair Francis  wrote:
> On Sun, Jul 2, 2017 at 9:45 PM, Subbaraya Sundeep
>  wrote:
>> Modelled System Timer in Microsemi's Smartfusion2 Soc.
>> Timer has two 32bit down counters and two interrupts.
>>
>> Signed-off-by: Subbaraya Sundeep 
>> ---
>>  hw/timer/Makefile.objs   |   1 +
>>  hw/timer/mss-timer.c | 261 
>> +++
>>  include/hw/timer/mss-timer.h |  67 +++
>>  3 files changed, 329 insertions(+)
>>  create mode 100644 hw/timer/mss-timer.c
>>  create mode 100644 include/hw/timer/mss-timer.h
>>
>> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
>> index dd6f27e..fc4d2da 100644
>> --- a/hw/timer/Makefile.objs
>> +++ b/hw/timer/Makefile.objs
>> @@ -41,3 +41,4 @@ common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
>>  common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o
>>
>>  common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
>> +common-obj-$(CONFIG_MSF2) += mss-timer.o
>> diff --git a/hw/timer/mss-timer.c b/hw/timer/mss-timer.c
>> new file mode 100644
>> index 000..e46d118
>> --- /dev/null
>> +++ b/hw/timer/mss-timer.c
>> @@ -0,0 +1,261 @@
>> +/*
>> + * Block model of System timer present in
>> + * Microsemi's SmartFusion2 and SmartFusion SoCs.
>> + *
>> + * Copyright (c) 2017 Subbaraya Sundeep .
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a 
>> copy
>> + * of this software and associated documentation files (the "Software"), to 
>> deal
>> + * in the Software without restriction, including without limitation the 
>> rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included 
>> in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
>> FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "hw/timer/mss-timer.h"

Also just noticed (this applies to the whole series) that you need the
headers in the C file.

Move every #include that you can from the mss-timer.h files to this
file, headers should be local to where they are called. Also make sure
you include the os-dep header in every file.
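
For example, the top of mss-timer.c would then look roughly like this
(sketch; the exact set of headers depends on what the file ends up using):

    #include "qemu/osdep.h"          /* always the first include */
    #include "qemu/log.h"            /* qemu_log() used by DB_PRINT */
    #include "hw/timer/mss-timer.h"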

Thanks,
Alistair

>> +
>> +#ifndef MSS_TIMER_ERR_DEBUG
>> +#define MSS_TIMER_ERR_DEBUG  0
>> +#endif
>> +
>> +#define DB_PRINT_L(lvl, fmt, args...) do { \
>> +if (MSS_TIMER_ERR_DEBUG >= lvl) { \
>> +qemu_log("%s: " fmt "\n", __func__, ## args); \
>> +} \
>> +} while (0);
>> +
>> +#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
>> +
>> +#define R_TIM_VAL 0
>> +#define R_TIM_LOADVAL 1
>> +#define R_TIM_BGLOADVAL   2
>> +#define R_TIM_CTRL3
>> +#define R_TIM_RIS 4
>> +#define R_TIM_MIS 5
>> +
>> +#define TIMER_CTRL_ENBL (1 << 0)
>> +#define TIMER_CTRL_ONESHOT  (1 << 1)
>> +#define TIMER_CTRL_INTR (1 << 2)
>> +#define TIMER_RIS_ACK   (1 << 0)
>> +#define TIMER_RST_CLR   (1 << 6)
>> +#define TIMER_MODE  (1 << 0)
>> +
>> +static void timer_update_irq(struct Msf2Timer *st)
>> +{
>> +bool isr, ier;
>> +
>> +isr = !!(st->regs[R_TIM_RIS] & TIMER_RIS_ACK);
>> +ier = !!(st->regs[R_TIM_CTRL] & TIMER_CTRL_INTR);
>> +qemu_set_irq(st->irq, (ier && isr));
>> +}
>> +
>> +static void timer_update(struct Msf2Timer *st)
>> +{
>> +uint64_t count;
>> +
>> +if (!(st->regs[R_TIM_CTRL] & TIMER_CTRL_ENBL)) {
>> +ptimer_stop(st->ptimer);
>> +return;
>> +}
>> +
>> +count = st->regs[R_TIM_LOADVAL];
>> +ptimer_set_limit(st->ptimer, count, 1);
>> +ptimer_run(st->ptimer, 1);
>> +}
>> +
>> +static uint64_t
>> +timer_read(void *opaque, hwaddr offset, unsigned int size)
>> +{
>> +MSSTimerState *t = opaque;
>> +hwaddr addr;
>> +struct Msf2Timer *st;
>> +uint32_t ret = 0;
>> +int timer = 0;
>> +int isr;
>> +int ier;
>> +
>> +addr = offset >> 2;
>> +/*
>> + * Two independent timers has same base address.
>> + * Based on address passed figure out which timer is being used.
>> + */
>> +if ((addr >= R_TIM1_MAX) && (addr < NUM_TIMERS * R_TIM1_MAX)) {
>> +timer = 1;
>> +addr -= R_TIM1_MAX;
>> +}
>> +
> +st = &t->timers[timer];
>> +
>> +switch (addr) {
>> +

Re: [Qemu-devel] [Qemu devel v6 PATCH 1/5] msf2: Add Smartfusion2 System timer

2017-07-05 Thread Alistair Francis
On Sun, Jul 2, 2017 at 9:45 PM, Subbaraya Sundeep
 wrote:
> Modelled System Timer in Microsemi's Smartfusion2 Soc.
> Timer has two 32bit down counters and two interrupts.
>
> Signed-off-by: Subbaraya Sundeep 
> ---
>  hw/timer/Makefile.objs   |   1 +
>  hw/timer/mss-timer.c | 261 
> +++
>  include/hw/timer/mss-timer.h |  67 +++
>  3 files changed, 329 insertions(+)
>  create mode 100644 hw/timer/mss-timer.c
>  create mode 100644 include/hw/timer/mss-timer.h
>
> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
> index dd6f27e..fc4d2da 100644
> --- a/hw/timer/Makefile.objs
> +++ b/hw/timer/Makefile.objs
> @@ -41,3 +41,4 @@ common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
>  common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o
>
>  common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
> +common-obj-$(CONFIG_MSF2) += mss-timer.o
> diff --git a/hw/timer/mss-timer.c b/hw/timer/mss-timer.c
> new file mode 100644
> index 000..e46d118
> --- /dev/null
> +++ b/hw/timer/mss-timer.c
> @@ -0,0 +1,261 @@
> +/*
> + * Block model of System timer present in
> + * Microsemi's SmartFusion2 and SmartFusion SoCs.
> + *
> + * Copyright (c) 2017 Subbaraya Sundeep .
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "hw/timer/mss-timer.h"
> +
> +#ifndef MSS_TIMER_ERR_DEBUG
> +#define MSS_TIMER_ERR_DEBUG  0
> +#endif
> +
> +#define DB_PRINT_L(lvl, fmt, args...) do { \
> +if (MSS_TIMER_ERR_DEBUG >= lvl) { \
> +qemu_log("%s: " fmt "\n", __func__, ## args); \
> +} \
> +} while (0);
> +
> +#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
> +
> +#define R_TIM_VAL 0
> +#define R_TIM_LOADVAL 1
> +#define R_TIM_BGLOADVAL   2
> +#define R_TIM_CTRL3
> +#define R_TIM_RIS 4
> +#define R_TIM_MIS 5
> +
> +#define TIMER_CTRL_ENBL (1 << 0)
> +#define TIMER_CTRL_ONESHOT  (1 << 1)
> +#define TIMER_CTRL_INTR (1 << 2)
> +#define TIMER_RIS_ACK   (1 << 0)
> +#define TIMER_RST_CLR   (1 << 6)
> +#define TIMER_MODE  (1 << 0)
> +
> +static void timer_update_irq(struct Msf2Timer *st)
> +{
> +bool isr, ier;
> +
> +isr = !!(st->regs[R_TIM_RIS] & TIMER_RIS_ACK);
> +ier = !!(st->regs[R_TIM_CTRL] & TIMER_CTRL_INTR);
> +qemu_set_irq(st->irq, (ier && isr));
> +}
> +
> +static void timer_update(struct Msf2Timer *st)
> +{
> +uint64_t count;
> +
> +if (!(st->regs[R_TIM_CTRL] & TIMER_CTRL_ENBL)) {
> +ptimer_stop(st->ptimer);
> +return;
> +}
> +
> +count = st->regs[R_TIM_LOADVAL];
> +ptimer_set_limit(st->ptimer, count, 1);
> +ptimer_run(st->ptimer, 1);
> +}
> +
> +static uint64_t
> +timer_read(void *opaque, hwaddr offset, unsigned int size)
> +{
> +MSSTimerState *t = opaque;
> +hwaddr addr;
> +struct Msf2Timer *st;
> +uint32_t ret = 0;
> +int timer = 0;
> +int isr;
> +int ier;
> +
> +addr = offset >> 2;
> +/*
> + * Two independent timers has same base address.
> + * Based on address passed figure out which timer is being used.
> + */
> +if ((addr >= R_TIM1_MAX) && (addr < NUM_TIMERS * R_TIM1_MAX)) {
> +timer = 1;
> +addr -= R_TIM1_MAX;
> +}
> +
> +st = &t->timers[timer];
> +
> +switch (addr) {
> +case R_TIM_VAL:
> +ret = ptimer_get_count(st->ptimer);
> +break;
> +
> +case R_TIM_MIS:
> +isr = !!(st->regs[R_TIM_RIS] & TIMER_RIS_ACK);
> +ier = !!(st->regs[R_TIM_CTRL] & TIMER_CTRL_INTR);
> +ret = ier & isr;
> +break;
> +
> +default:
> +if (addr < NUM_TIMERS * R_TIM1_MAX) {

Shouldn't this just be: addr < R_TIM1_MAX?

At this point you have already adjusted the offset for multiple timers.
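
To make the arithmetic concrete, here is a small standalone sketch of the
same decode (R_TIM1_MAX and NUM_TIMERS live in the header that is not
quoted here, so the values below are assumptions for illustration only):

#include <inttypes.h>
#include <stdio.h>

#define R_TIM1_MAX 6   /* assumed per-timer register count, R_TIM_VAL..R_TIM_MIS */
#define NUM_TIMERS 2

/* Same reduction as timer_read(): byte offset -> (timer index, local register). */
static void decode(uint64_t offset, int *timer, uint64_t *reg)
{
    uint64_t addr = offset >> 2;

    *timer = 0;
    if (addr >= R_TIM1_MAX && addr < NUM_TIMERS * R_TIM1_MAX) {
        *timer = 1;
        addr -= R_TIM1_MAX;   /* local index for the second timer */
    }
    *reg = addr;
}

int main(void)
{
    int timer;
    uint64_t reg;

    decode(0x18, &timer, &reg);   /* word 6 -> timer 1, local register 0 (TIM_VAL) */
    printf("timer %d, reg %" PRIu64 "\n", timer, reg);
    return 0;
}

For in-range offsets the local index after the reduction is always below
R_TIM1_MAX, which is the point of the comment above: the tighter bound
expresses the intent directly.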

> +ret = st->regs[addr];
> +} else {
> + 

Re: [Qemu-devel] [PATCH v3 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Max Reitz
On 2017-07-05 09:14, Markus Armbruster wrote:
> Max Reitz  writes:
> 
>> Currently, bdrv_reopen_prepare() assumes that all BDS options are
>> strings. However, this is not the case if the BDS has been created
>> through the json: pseudo-protocol or blockdev-add.
>>
>> Note that the user-invokable reopen command is an HMP command, so you
>> can only specify strings there. Therefore, specifying a non-string
>> option with the "same" value as it was when originally created will now
>> return an error because the values are supposedly similar (and there is
>> no way for the user to circumvent this but to just not specify the
>> option again -- however, this is still strictly better than just
>> crashing).
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block.c | 31 ++-
>>  1 file changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 913bb43..45eb248 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2947,19 +2947,24 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>>  const QDictEntry *entry = qdict_first(reopen_state->options);
>>  
>>  do {
>> -QString *new_obj = qobject_to_qstring(entry->value);
>> -const char *new = qstring_get_str(new_obj);
>> -/*
>> - * Caution: while qdict_get_try_str() is fine, getting
>> - * non-string types would require more care.  When
>> - * bs->options come from -blockdev or blockdev_add, its
>> - * members are typed according to the QAPI schema, but
>> - * when they come from -drive, they're all QString.
>> - */
>> -const char *old = qdict_get_try_str(reopen_state->bs->options,
>> -entry->key);
>> -
>> -if (!old || strcmp(new, old)) {
>> +QObject *new = entry->value;
>> +QObject *old = qdict_get(reopen_state->bs->options, entry->key);
>> +
>> +/* TODO: When using -drive to specify blockdev options, all 
>> values
>> + * will be strings; however, when using -blockdev, blockdev-add 
>> or
>> + * filenames using the json:{} pseudo-protocol, they will be
>> + * correctly typed.
>> + * In contrast, reopening options are (currently) always strings
>> + * (because you can only specify them through qemu-io; all other
>> + * callers do not specify any options).
>> + * Therefore, when using anything other than -drive to create a 
>> BDS,
>> + * this cannot detect non-string options as unchanged, because
>> + * qobject_is_equal() always returns false for objects of 
>> different
>> + * type.  In the future, this should be remedied by correctly 
>> typing
>> + * all options.  For now, this is not too big of an issue 
>> because
>> + * the user simply can not specify options which cannot be 
>> changed
>> + * anyway, so they will stay unchanged. */
> 
> I'm not the maintainer, and this is not a demand: consider winging this
> comment and wrapping lines around column 70.

I actually did consider wrapping around column 70 and decided against it
because this comment is already indented by 12 characters, so none of
the lines exceed 65 characters (80 - (12 + strlen(" * "))).

About winging...  For some reason I don't quite like it here.  But it
probably is better because the comment is not immediately related to the
qobject_is_equal() call below, so I'll do it.

> As much as I fancy word play (see your reply to Eric), I have to admit
> that I had to read "the user simply can not specify options" about three
> times to make sense of it.  Please consider a wording that's easier to
> grasp, perhaps "the user can simply refrain from specifying options that
> cannot be changed".

Aw.  I've wanted this to be an example I could point people to to
educate them about the difference.  Now, alas, none shall learn. :-(

Max
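
For readers skimming the thread, a minimal sketch of the type sensitivity
under discussion (this assumes the qobject_is_equal() helper added earlier
in this series; the demo function itself is hypothetical):

#include "qemu/osdep.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qstring.h"
#include "qapi/qmp/qobject.h"

/* A string "1" (what -drive produces) and an integer 1 (what -blockdev or
 * blockdev-add produces) are different QObject types, so they compare as
 * unequal and the reopen check above reports the option as changed. */
static void qobject_equal_demo(void)
{
    QObject *from_drive = QOBJECT(qstring_from_str("1"));
    QObject *from_blockdev = QOBJECT(qint_from_int(1));

    g_assert(!qobject_is_equal(from_drive, from_blockdev));

    qobject_decref(from_drive);
    qobject_decref(from_blockdev);
}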





Re: [Qemu-devel] [PATCH] hw/s390x/ipl: Fix endianness problem with netboot_start_addr

2017-07-05 Thread Christian Borntraeger
On 07/05/2017 05:25 PM, Thomas Huth wrote:
> The start address has to be stored in big endian byte order
> in the iplb.ccw block for the guest.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/s390x/ipl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
> index 4e6469d..cc36003 100644
> --- a/hw/s390x/ipl.c
> +++ b/hw/s390x/ipl.c
> @@ -418,7 +418,7 @@ void s390_ipl_prepare_cpu(S390CPU *cpu)
>  error_report_err(err);
>  vm_stop(RUN_STATE_INTERNAL_ERROR);
>  }
> -ipl->iplb.ccw.netboot_start_addr = ipl->start_addr;
> +ipl->iplb.ccw.netboot_start_addr = cpu_to_be64(ipl->start_addr);
>  }
>  }
> 

Thanks, also applied to s390-next.
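
For reference, cpu_to_be64() converts a host-endian 64-bit value to
big-endian byte order; a standalone sketch of the equivalent store (the
example address is made up):

#include <stdint.h>
#include <stdio.h>

/* Write a 64-bit value into memory most-significant byte first, which is
 * what assigning cpu_to_be64(x) to the in-memory iplb field achieves. */
static void store_be64(uint8_t *dst, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

int main(void)
{
    uint8_t buf[8];
    int i;

    store_be64(buf, 0x10000ULL);
    for (i = 0; i < 8; i++) {
        printf("%02x ", buf[i]);   /* prints: 00 00 00 00 00 01 00 00 */
    }
    printf("\n");
    return 0;
}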




Re: [Qemu-devel] [PATCH v5 03/13] char: chardevice hotswap

2017-07-05 Thread Anton Nefedov

On 07/05/2017 06:09 PM, Paolo Bonzini wrote:



On 05/07/2017 16:01, Anton Nefedov wrote:

This patch adds the possibility to change a char device without removing
the frontend.

1. Ideally, this would happen transparently to the frontend, i.e. the
frontend would continue its regular operation.
However, backends are not stateless and are set up by the frontends
via qemu_chr_fe_<> functions, and it's not (generally) possible to replay
that setup entirely in backend code: different chardevs respond
to the setup calls differently, and frontends in turn behave differently
based on those setup responses.
Moreover, a frontend can also get and save the backend pointer
(qemu_chr_fe_get_driver()), which becomes invalid after a backend change.

So, a frontend which would like to support chardev hotswap has to register
a "backend change" handler, and redo its backend setup there.

2. The write path can be used by multiple threads and is therefore protected
with chr_write_lock.
So hotswap also has to take that lock, so that write functions won't access
a backend that is being replaced.


Does this matter in practice?  CharBackend thread safety can be left to
the front-end.

Paolo



Hi Paolo,

So instead we'll need to use proper locks in each of the front-ends?
Or do you mean that it can be skipped for most of them? I don't know
about all possible threading cases,
e.g. for serial/virtio-serial: will they always share the same thread
with the HMP/QMP-driven chardev-change command? And won't they yield and
hotswap in the middle of some write handler?

/Anton
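
As a sketch of the pattern the commit message describes (all names below
are hypothetical, not the actual API of this series): a frontend that wants
to survive a hotswap registers a backend-change callback and redoes its
setup there, because the freshly swapped-in backend carries none of the
state the frontend negotiated at realize time.

/* Hypothetical frontend-side callback. */
static int my_frontend_chr_be_change(void *opaque)
{
    MyFrontendState *s = opaque;           /* hypothetical frontend state */

    my_frontend_register_chr_handlers(s);  /* re-register read/event handlers */
    my_frontend_replay_setup(s);           /* redo device-specific qemu_chr_fe_* setup */
    return 0;                              /* report success so the swap can complete */
}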



[Qemu-devel] [RFC v3 2/3] qemu-error: Implement a more generic error reporting

2017-07-05 Thread Alistair Francis
This patch converts the existing error_vreport() function into a generic
qmsg_vreport() function that takes an enum describing the type of
message to be reported.

As part of this change a new qmsg_report() function is added as well, with the
same capability.

To maintain full compatibility the original error_report() function is
kept and no changes to the way errors are printed have been made.
To improve access to the new information and warning options, wrapper functions
similar to error_report() have been added for warning and information
printing.

Signed-off-by: Alistair Francis 
---
RFC V3:
 - Change the function and enum names to be more descriptive
 - Add wrapper functions for *_report() and *_vreport()

 include/qemu/error-report.h | 16 +
 scripts/checkpatch.pl   |  8 -
 util/qemu-error.c   | 80 +++--
 3 files changed, 100 insertions(+), 4 deletions(-)
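
A short usage sketch of the new entry points (the format strings and
variables are illustrative only; only the prefixes come from the patch):

error_report("could not open '%s'", filename);               /* unchanged behaviour */
warn_report("'%s' is deprecated", option_name);              /* prefixed with "warning: " */
info_report("QEMU waiting for connection on: %s", address);  /* prefixed with "info: " */

/* or through the generic function the wrappers call: */
qmsg_report(REPORT_TYPE_WARNING, "'%s' is deprecated", option_name);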

diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
index 3001865896..62fc167ace 100644
--- a/include/qemu/error-report.h
+++ b/include/qemu/error-report.h
@@ -21,6 +21,12 @@ typedef struct Location {
 struct Location *prev;
 } Location;
 
+typedef enum {
+REPORT_TYPE_ERROR,
+REPORT_TYPE_WARNING,
+REPORT_TYPE_INFO,
+} report_type;
+
 Location *loc_push_restore(Location *loc);
 Location *loc_push_none(Location *loc);
 Location *loc_pop(Location *loc);
@@ -30,13 +36,23 @@ void loc_set_none(void);
 void loc_set_cmdline(char **argv, int idx, int cnt);
 void loc_set_file(const char *fname, int lno);
 
+void qmsg_vreport(report_type type, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0);
+void qmsg_report(report_type type, const char *fmt, ...)  GCC_FMT_ATTR(2, 3);
+
 void error_vprintf(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
 void error_printf(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 void error_vprintf_unless_qmp(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
 void error_printf_unless_qmp(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 void error_set_progname(const char *argv0);
+
 void error_vreport(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
+void warn_vreport(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
+void info_vreport(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
+
 void error_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+void warn_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+void info_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+
 const char *error_get_progname(void);
 extern bool enable_timestamp_msg;
 
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 45027b9281..8b02621739 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2530,8 +2530,14 @@ sub process {
error_set|
error_prepend|
error_reportf_err|
+   qmsg_vreport|
error_vreport|
-   error_report}x;
+   warn_vreport|
+   info_vreport|
+   qmsg_report|
+   error_report|
+   warn_report|
+   info_report}x;
 
if ($rawline =~ /\b(?:$qemu_error_funcs)\s*\(.*\".*\\n/) {
ERROR("Error messages should not contain newlines\n" . 
$herecurr);
diff --git a/util/qemu-error.c b/util/qemu-error.c
index 1c5e35ecdb..63fdc0e174 100644
--- a/util/qemu-error.c
+++ b/util/qemu-error.c
@@ -179,17 +179,29 @@ static void print_loc(void)
 
 bool enable_timestamp_msg;
 /*
- * Print an error message to current monitor if we have one, else to stderr.
+ * Print a message to current monitor if we have one, else to stderr.
  * Format arguments like vsprintf().  The resulting message should be
  * a single phrase, with no newline or trailing punctuation.
  * Prepend the current location and append a newline.
  * It's wrong to call this in a QMP monitor.  Use error_setg() there.
  */
-void error_vreport(const char *fmt, va_list ap)
+void qmsg_vreport(report_type type, const char *fmt, va_list ap)
 {
 GTimeVal tv;
 gchar *timestr;
 
+switch (type) {
+case REPORT_TYPE_ERROR:
+/* To maintain compatibility we don't add anything here */
+break;
+case REPORT_TYPE_WARNING:
+error_printf("warning: ");
+break;
+case REPORT_TYPE_INFO:
+error_printf("info: ");
+break;
+}
+
 if (enable_timestamp_msg && !cur_mon) {
 g_get_current_time(&tv);
 timestr = g_time_val_to_iso8601(&tv);
@@ -204,16 +216,78 @@ void error_vreport(const char *fmt, va_list ap)
 
 /*
  * Print an error message to current monitor if we have one, else to stderr.
+ */
+void error_vreport(const char *fmt, va_list ap)
+{
+qmsg_vreport(REPORT_TYPE_ERROR, fmt, ap);
+}
+
+/*
+ * Print a warning message to current monitor 

Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base

2017-07-05 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > When we receive a GET_VRING_BASE message set enable = false
> > to stop any new received packets modifying the ring.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> 
> I think I already reviewed a similar patch.

Yes you replied to my off-list mail; I hadn't got
around to fixing it yet.

> Spec says:
> Client must only process each ring when it is started.

But in that reply you said the spec said:

  Client must only pass data between the ring and the
  backend, when the ring is enabled.

So does the spec say 'started' or 'enabled'?
(Pointer to the spec?)

> IMHO the real fix is to fix client to check the started
> flag before processing the ring.

Yep I can do that.  I was curious however whether it was
specified as 'started' or 'enabled' or both.

Dave

> > ---
> >  contrib/libvhost-user/libvhost-user.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/contrib/libvhost-user/libvhost-user.c 
> > b/contrib/libvhost-user/libvhost-user.c
> > index ceddeac74f..d37052b7b0 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
> >  vmsg->size = sizeof(vmsg->payload.state);
> >  
> >  dev->vq[index].started = false;
> > +dev->vq[index].enable = false;
> >  if (dev->iface->queue_set_started) {
> >  dev->iface->queue_set_started(dev, index, false);
> >  }
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
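
A sketch of the client-side check discussed above, using the per-queue
flags visible in the quoted patch (the helper name is made up):

/* Only touch a ring the master has started (and, depending on how the
 * 'started' vs. 'enabled' question is resolved, enabled as well). */
static bool vq_may_process(VuDev *dev, int index)
{
    return dev->vq[index].started && dev->vq[index].enable;
}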



[Qemu-devel] [RFC v3 3/3] char-socket: Report TCP socket waiting as information

2017-07-05 Thread Alistair Francis
When QEMU is waiting for a TCP socket connection it reports that message as
an error. This isn't an error, it is just information, so let's change the
report to use info_report() instead.

Signed-off-by: Alistair Francis 
---

 chardev/char-socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index ccc499cfa1..a050a686ea 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -765,8 +765,8 @@ static int tcp_chr_wait_connected(Chardev *chr, Error **errp)
  * in TLS and telnet cases, only wait for an accepted socket */
 while (!s->ioc) {
 if (s->is_listen) {
-error_report("QEMU waiting for connection on: %s",
- chr->filename);
+info_report("QEMU waiting for connection on: %s",
+chr->filename);
 qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), true, NULL);
 tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
 qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, NULL);
-- 
2.11.0



