Re: [Qemu-devel] [PATCH] [v2 PATCH] qemu-img: sort block formats in help message

2014-05-16 Thread Mike Day
On Fri, May 16, 2014 at 9:59 AM, Laurent Desnogues
laurent.desnog...@gmail.com wrote:

 just noticed the use of some too recent glib features: g_strcmp0
 and GSequence related ones.

There is a patch upstream that removes g_sequence_lookup, which was
not linking on one platform. I haven't seen any other reports of
breakage. g_strcmp would be trivial to replace, of course.

thanks!

Mike



Re: [Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot

2014-05-15 Thread Mike Day
On Wed, May 14, 2014 at 11:15 AM, Kevin Wolf kw...@redhat.com wrote:
 freeing the extra clusters.

 Do you have an easy reproducer? Because I can't see the bug.

Thanks for the review! I was having a hard time reproducing this until
I did a bisect. This bug was fixed by 65f33bc which was merged at or
after  the time I submitted the patch:

 qcow2: Fix alloc_clusters_noref() overflow detection

I can reproduce the bug by checking out the immediate ancestor
43cbeffb1, creating a single snapshot in a qcow2 image, and then
attempting to delete that snapshot.

The error I get is:
qemu-img: Could not delete snapshot 'snapone': (Failed to remove
snapshot from snapshot list: File too large)

This is the error that is fixed by 65f33bc

Mike



Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message

2014-05-14 Thread Mike Day
On Wed, May 14, 2014 at 8:35 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 Jeff Cody recently wanted to eliminate duplicate entries in the list.  I
 thought part of your intention was to address the duplicates with your
 patch.

 We can back out the sequence API if it's not supported on older glib but
 it would be nice to eliminate duplicates later, too.

I agree. I can submit an additional patch that uses an older API.
What, exactly is the cause of duplicate entries in the list? I've only
seen it one time but its disconcerting when it happens.

Mike



Re: [Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-14 Thread Mike Day
On Wed, May 14, 2014 at 9:02 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 Yes, we cannot change the git commit history once a commit is in the
 public qemu.git repository.  The only options are to:

 1. Add a patch on top that fixes the issue.
 2. Use git-revert(1) to undo your commit entirely (this adds a new patch
on top the applies the reverse diff).

 In this case #1 seems like a good choice.

I followed your advice and went with #1, I think its already been pulled.

Mike



Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message

2014-05-14 Thread Mike Day
On Wed, May 14, 2014 at 9:43 AM, Jeff Cody jc...@redhat.com wrote:
 Prior to a recent commit, this function did not make distinction on
 duplicates.  As of commit e855e4fb7, duplicates are not longer printed
 in the help message:

 e855e4fb7: Ignore duplicate or NULL format_name in bdrv_iterate_format):


That makes sense, thanks for the background!

Mike



[Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-13 Thread Mike Day
The help message for qemu-img lists the supported block formats, of
which there are 27 as of version 2.0.50. The formats are printed in
the order of their driver's position in a linked list, which appears
random. This patch prints the formats in sorted order, making it
easier to read and to find a specific format in the list.

[Added suggestions from Fam Zheng f...@redhat.com to declare variables
at the top of the scope in help() and to omit explicit cast for void*
opaque.
--Stefan]

[Removed call to g_sequence_lookup because it breaks the build on
 machines with glib  2.28.
--Mike]

Signed-off-by: Mike Day ncm...@ncultra.org
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 qemu-img.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 96f4463..93e51d1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -32,6 +32,7 @@
 #include block/block_int.h
 #include block/qapi.h
 #include getopt.h
+#include glib.h
 
 #define QEMU_IMG_VERSION qemu-img version  QEMU_VERSION \
   , Copyright (c) 2004-2008 Fabrice Bellard\n
@@ -55,9 +56,22 @@ typedef enum OutputFormat {
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
 #define BDRV_DEFAULT_CACHE writeback
 
-static void format_print(void *opaque, const char *name)
+static gint compare_data(gconstpointer a, gconstpointer b, gpointer user)
 {
-printf( %s, name);
+return g_strcmp0(a, b);
+}
+
+static void print_format(gpointer data, gpointer user)
+{
+printf( %s, (char *)data);
+}
+
+static void add_format_to_seq(void *opaque, const char *fmt_name)
+{
+GSequence *seq = opaque;
+
+g_sequence_insert_sorted(seq, (gpointer)fmt_name,
+ compare_data, NULL);
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
@@ -142,10 +156,15 @@ static void QEMU_NORETURN help(void)
  '-f' first image format\n
  '-F' second image format\n
  '-s' run in Strict mode - fail on different image size or sector 
allocation\n;
+GSequence *seq;
 
 printf(%s\nSupported formats:, help_msg);
-bdrv_iterate_format(format_print, NULL);
+seq = g_sequence_new(NULL);
+bdrv_iterate_format(add_format_to_seq, seq);
+g_sequence_foreach(seq, print_format, NULL);
 printf(\n);
+g_sequence_free(seq);
+
 exit(EXIT_SUCCESS);
 }
 
-- 
1.9.0




Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message

2014-05-13 Thread Mike Day
On Tue, May 13, 2014 at 10:18 AM, Cornelia Huck
cornelia.h...@de.ibm.com wrote:

 qemu-img.o: In function `add_format_to_seq':
 /home/cohuck/git/qemu/qemu-img.c:73: undefined reference to 
 `g_sequence_lookup'
 collect2: ld returned 1 exit status

 g_sequence_lookup has been added with glib 2.28, and this box has
 2.22.5. configure looks for glib = 2.12 (2.20 for mingw).

g_sequence_lookup is there because I was getting duplicate formats in
the format list, as in vfat vfat vfat qcow. That was a temporary
situation. the lookup serves to guarantee each format is unique in the
list. It may not be needed.

Mike



[Qemu-devel] [PATCH] Remove g_sequence_lookup from qemu-img help function

2014-05-13 Thread Mike Day
g_sequence_lookup is not supported by glib  2.28. The usage
of g_sequence_lookup is not essential in this context (its a
safeguard against duplicate values in the help message).
Removing the call enables the build on all platforms and
does not change the operation of the help function.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-img.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 04ce02a..bf5e74c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -70,11 +70,8 @@ static void add_format_to_seq(void *opaque, const char 
*fmt_name)
 {
 GSequence *seq = opaque;
 
-if (!g_sequence_lookup(seq, (gpointer)fmt_name,
-   compare_data, NULL)) {
 g_sequence_insert_sorted(seq, (gpointer)fmt_name,
  compare_data, NULL);
-}
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
-- 
1.9.0




[Qemu-devel] [PATCH] Remove g_sequence_lookup from qemu-img help function

2014-05-13 Thread Mike Day
g_sequence_lookup is not supported by glib  2.28. The usage
of g_sequence_lookup is not essential in this context (it's a
safeguard against duplicate values in the help message).
Removing the call enables the build on all platforms and
does not change the operation of the help function.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-img.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 04ce02a..1ad899e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -70,11 +70,8 @@ static void add_format_to_seq(void *opaque, const char 
*fmt_name)
 {
 GSequence *seq = opaque;
 
-if (!g_sequence_lookup(seq, (gpointer)fmt_name,
-   compare_data, NULL)) {
-g_sequence_insert_sorted(seq, (gpointer)fmt_name,
- compare_data, NULL);
-}
+g_sequence_insert_sorted(seq, (gpointer)fmt_name,
+ compare_data, NULL);
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
-- 
1.9.0




[Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
For the create subcommand the backing file (-b) option is documented
on-line but not in the binary. Add it.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-img-cmds.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d029609..7724709 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -16,9 +16,9 @@ STEXI
 ETEXI
 
 DEF(create, img_create,
-create [-q] [-f fmt] [-o options] filename [size])
+create [-q] [-f fmt] [-b backing-file] [-o options] filename [size])
 STEXI
-@item create [-q] [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+@item create [-q] [-f @var{fmt}] [-b @var{backing_file}] [-o @var{options}] 
@var{filename} [@var{size}]
 ETEXI
 
 DEF(commit, img_commit,
-- 
1.9.0




Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 11:20 AM, Eric Blake ebl...@redhat.com wrote:
 Online where?  In 'qemu-img --help', or on some web page (at what URL)?

http://qemu.weilnetz.de/qemu-doc.html#vm_005fsnapshots

-o backing file is not documented in the help command.

Mike



Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 11:53 AM, Eric Blake ebl...@redhat.com wrote:
 Ah, but it is:

 $ qemu-img create -f qcow2 -o help
 Supported options:
 size Virtual disk size
 compat   Compatibility level (0.10 or 1.1)
 backing_file File name of a base image
 backing_fmt  Image format of the base image
 encryption   Encrypt the image
 cluster_size qcow2 cluster size
 preallocationPreallocation mode (allowed values: off, metadata)
 lazy_refcounts   Postpone refcount updates

 and more importantly, it only appears in the help output of -f modes
 that actually support a backing file.  Contrast:

 $ qemu-img create -f raw -o help
 Supported options:
 size Virtual disk size

This is much more prominent in the top-level help:

rebase [-q] [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F
backing_fmt] filename



Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 12:36 PM, Kevin Wolf kw...@redhat.com wrote:
 What would qemu-img rebase do with -o? It is just for (safely) changing
 the backing file, not for updating options. There is qemu-img amend for
 that, and it does have an -o option.

A little background ... I'm writing a 4-day  KVM training course for
the Linux foundation. From time to time I come across undocumented,
buggy, or confusing options and if I can I try to fix them. In the
case of qemu-img I looked in a lot of places for documentation and
ended up at the unofficial web page that documented the -b backing
file option which works just fine for create and rebase. If an
official page had the information I needed in one place instead of
bits and pieces scattered everywhere I would have used it, naturally.

Here's why the help is confusing. On the top level help, all the
options except for -b are documented. A couple subcommands are even
listed in the top level help with their options. I expected that -b
would be documented on the top level with the others. The fact that -b
basefile is documented for the rebase subcommand reenforced my
expectation. I was assuming there is some consistency in using options
with backing files. (Especially if the option is called
backingfile.)

I'm not advocating anything here, and I think its fine to leave things
as they are. I just don't want to be the guy who complains and never
fixes anything.

Mike



Re: [Qemu-devel] [PATCH] qemu-img fails to delete last snapshot

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 12:39 PM, Eric Blake ebl...@redhat.com wrote:
 This information is actually quite useful in understanding the patch,
 and I would have included it prior to the --- for inclusion in git,
 rather than in the reviewer-only commentary that gets stripped during
 'git am'.


  block/qcow2-snapshot.c | 25 +++--
  1 file changed, 15 insertions(+), 10 deletions(-)

 I'd suggest posting a v2 with a better commit message; but the code
 itself seems fine, so you can add:

 Reviewed-by: Eric Blake ebl...@redhat.com

Got it, thanks!

Mike



[Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot

2014-05-12 Thread Mike Day
When deleting the last snapshot, copying the resulting snapshot table
currently fails, causing the delete operation to also fail. Fix the
failure by skipping the copy and just writing the snapshot header and
freeing the extra clusters.

There are two specific problems in the current code. First is a lack of
parenthesis in the calculation of the memmove size parameter:

s-nb_snapshots - snapshot_index - 1

When s-nb_snapshots is 0, snapshot_index is 1.

0 - 1 - 1 = 0xfffe

it should be:

0 - (1 - 1) = 0x00

The second problem is shifting the snapshot table to the left. After
removing the last snapshot there are no existing snapshots to be
shifted. All that needs to be done is to write the header and
unallocate the blocks.

Signed-off-by: Mike Day ncm...@ncultra.org
Reviewed-by: Eric Blake ebl...@redhat.com
---
v2: improved the git log entry
added Eric Blake as a reviewer

 block/qcow2-snapshot.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0aa9def..c8b842c 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 assert(offset = INT_MAX);
 snapshots_size = offset;
-
 /* Allocate space for the new snapshot list */
-snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+snapshots_offset = 0;
+if (snapshots_size) {
+snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+}
 offset = snapshots_offset;
 if (offset  0) {
 ret = offset;
@@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 /* The snapshot list position has not yet been updated, so these clusters
  * must indeed be completely free */
-ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
-if (ret  0) {
-goto fail;
+if (snapshots_size) {
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
+if (ret  0) {
+goto fail;
+}
 }
 
-
 /* Write all snapshots to the new list */
 for(i = 0; i  s-nb_snapshots; i++) {
 sn = s-snapshots + i;
@@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
 return -ENOENT;
 }
 sn = s-snapshots[snapshot_index];
-
 /* Remove it from the snapshot list */
-memmove(s-snapshots + snapshot_index,
-s-snapshots + snapshot_index + 1,
-(s-nb_snapshots - snapshot_index - 1) * sizeof(sn));
 s-nb_snapshots--;
+if (s-nb_snapshots) {
+memmove(s-snapshots + snapshot_index,
+s-snapshots + snapshot_index + 1,
+(s-nb_snapshots - (snapshot_index - 1)) * sizeof(sn));
+}
+
 ret = qcow2_write_snapshots(bs);
 if (ret  0) {
 error_setg_errno(errp, -ret,
-- 
1.9.0




Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts

2014-05-09 Thread Mike Day
On Thu, May 8, 2014 at 11:12 PM, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 There are
 a couple ways to mitigate this type of situation by using alternative
 data structures to inform the loop traversal. I don't know if it is
 worth the effort, though.

 Here I lost you :)

If I read the code correctly, the problem I'm wondering  about is that
the loop will waste time traversing the array  when there are only
unallocated interrupts from the current index to the end. For example,
if the interrupt array is 256 entries and the highest index that is
allocated is 16, the outside loop will traverse all 256 entries while
it should have exited after the 16th.

To mitigate this you could keep a shadow index of the current highest
allocated index and check for that in the outside loop. Or you could
maintain a shadow linked list that only includes allocated array
entries and just traverse that list. Each element on the list would be
an allocated entry in the interrupt array.

 btw I just realized that in patch#2 it should be xics_find_source instead
 of xics_find_server. There are many interrupt servers already and just one
 interrupt source (we could have many like one per PHB or something like
 that but we are not there yet), this is what I meant.



[Qemu-devel] [PATCH] qemu-img fails to delete last snapshot

2014-05-09 Thread Mike Day
When deleting the last snapshot, copying the resulting snapshot table
currently fails, causing the delete operation to also fail. Fix the
failure by skipping the copy and just writing the snapshot header and
freeing the extra clusters.

Signed-off-by: Mike Day ncm...@ncultra.org
---

There are two specific problems in the curent code. First is a lack of
parenthesis in the calculation of a memmove parameter:

s-nb_snapshots - snapshot_index - 1

When s-nb_snapshots is 0, snapshot_index is 1. 

0 - 1 - 1 = 0xfffe

it should be:

0 - (1 - 1) = 0x00

The second problem is shifting the snapshot table to the left. After
removing the last snapshot there are no existing snapshots to be
shifted. All that needs to be done is to write the header and
unallocate the blocks.

 block/qcow2-snapshot.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0aa9def..c8b842c 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 assert(offset = INT_MAX);
 snapshots_size = offset;
-
 /* Allocate space for the new snapshot list */
-snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+snapshots_offset = 0;
+if (snapshots_size) {
+snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+}
 offset = snapshots_offset;
 if (offset  0) {
 ret = offset;
@@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 /* The snapshot list position has not yet been updated, so these clusters
  * must indeed be completely free */
-ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
-if (ret  0) {
-goto fail;
+if (snapshots_size) {
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
+if (ret  0) {
+goto fail;
+}
 }
 
-
 /* Write all snapshots to the new list */
 for(i = 0; i  s-nb_snapshots; i++) {
 sn = s-snapshots + i;
@@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
 return -ENOENT;
 }
 sn = s-snapshots[snapshot_index];
-
 /* Remove it from the snapshot list */
-memmove(s-snapshots + snapshot_index,
-s-snapshots + snapshot_index + 1,
-(s-nb_snapshots - snapshot_index - 1) * sizeof(sn));
 s-nb_snapshots--;
+if (s-nb_snapshots) {
+memmove(s-snapshots + snapshot_index,
+s-snapshots + snapshot_index + 1,
+(s-nb_snapshots - (snapshot_index - 1)) * sizeof(sn));
+}
+
 ret = qcow2_write_snapshots(bs);
 if (ret  0) {
 error_setg_errno(errp, -ret,
-- 
1.9.0




Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-08 Thread Mike Day
On Wed, May 7, 2014 at 5:16 PM, Hani Benhabiles kroo...@gmail.com wrote:
 FWIW, this new version doesn't trigger a false positive I was having with
 patches 05-07 in [1].

 However, from a quick test for this patch on a couple of patches, I am getting
 warnings like this: WARNING: braces {} are not necessary for single statement
 blocks so there are still some Qemu related changes missing.

Thanks for the feedback! As Peter pointed out (and you confirm), I
missed forward-porting QEMU's brace rules.



Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-08 Thread Mike Day
On Thu, May 8, 2014 at 10:02 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 total: 0 errors, 0 warnings, 79 lines checked

 /tmp/a has no obvious style problems and is ready for submission.
 Check 500, Bad 52

 How does your new checkpatch compare to our current one?

 Yes; we do sometimes let checkpatch-failing commits through,
 so it would be more interesting to know:
  * how many commits passed old-checkpatch but fail the new one
  * how many commits failed with old-checkpatch but pass now
  * in both cases, how many of the failures are false positives?

 (the last of those is tricky to answer without eyeballing
 the errors, unfortunately)

That seems like a good test regime and not a lot of effort using the
bash script Don shared.

There appears to be some interest in this patch. I'll work on a 2nd
revision, following Peter's suggestion to separate the QEMU rules into
a separate, smaller patch (and with the correct handling of brackets).

Mike



Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts

2014-05-07 Thread Mike Day

  
  for (i = 0; i  ics-nr_irqs; i++) {
  /* FIXME: filter by server#? */
 -if (ics-islsi[i]) {
 +if (ics-irqs[i].flags  XICS_FLAGS_LSI) {
  resend_lsi(ics, i);

Not part of your patch, but I'm curious about this FIXME (there's an
identical FIXME in resend_msi). Has this proved to be a problem? With
these patches you could have many unallocated interrupts in array AFTER
the last allocated interrupt, correct? In that case, the loop would
continue beyond the last allocated interrupt for no purpose.  There are
a couple ways to mitigate this type of situation by using alternative
data structures to inform the loop traversal. I don't know if it is
worth the effort, though.


 +/* @flags == 0 measn the interrupt is not allocated */
 +#define XICS_FLAGS_LSI 0x1
 +#define XICS_FLAGS_MSI 0x2

(nit) typo in the above comment

Mike

-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [PATCH 2/6] xics: add find_server

2014-05-07 Thread Mike Day

Alexey Kardashevskiy a...@ozlabs.ru writes:

 PAPR allows having multiple interrupr servers.


typo above in the commit log entry,

Mike
-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-06 Thread Mike Day
On Tue, May 6, 2014 at 2:28 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 I think this is going to be difficult to review, to say the least.

 Where does your patch come from? Is the kernel's checkpatch.pl
 just a single commit between 0.31 and 0.32 (surely not) or
 a series of fixes?

Yes, this is today's version of the kernel's checkpatch.pl.  Version
0.32 is the product of many patches onto 0.31

 A couple of ideas about how we could approach this:
  (1) make a commit which is simply copying the kernel's 0.32
   into our repo; then follow that with a series of commits which
   re-apply our local changes.
  (2) apply all the individual commits from the kernel between 0.31
  and 0.32 to our repo
  (3) give up and stick with 0.31...

idea (1) makes perfect sense to me, and the local changes will be easy
to review.

 It might also be helpful if you could describe the benefits
 we get from this update (any bugfixes for false positives we
 tend to run into? useful new checks?)

I think its a valid question whether to forward-port the Qemu checks
to 0.32. I'm not sure, myself, but this gives folks an idea of what
the cost/benefit is.

Yesterday I was bitten by the StudlyCaps syndrome in a patch I
submitted which was checkpatch-clean. Afterward I noticed that version
0.31 has the check for StudlyCaps commented out. This is corrected in
version 0.32 with a more ambitious check for CamelCase.

The CamelCase check in v0.32 tries to determine if the patch
introduced StudlyCaps or if they already exist in the unpatched source
file. (I was hoping for a three-line check I could paste into v0.31.)

I noticed some other new features - it uses an optional configuration
file, checks for validly formed email addresses. The other changes I
believe, are either experimental or providing tools for filtering or
typing messages.

Mike



Re: [Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-05 Thread Mike Day
Thanks for the review! I've been able to shorten the next version quite a bit.

On Mon, May 5, 2014 at 9:42 AM, Stefan Hajnoczi stefa...@redhat.com wrote:
  +static void GFunc_print_format(gpointer data, gpointer user)

 QEMU coding style is lowercase function and variable names.  The
 scripts/checkpatch.pl script should identify coding style violations,
 please run it.

Yeah this is ugly. I've long had checkpatch.pl configured as my
pre-commit hook. I've confirmed again this morning that these ugly
caps pass through with no error or warning. I'll look at the script.

  +
  +static void add_format_to_seq(void *opaque, const char *fmt_name)
  +{
  +GSequence *seq = (GSequence *)opaque;
  +
  +if (!g_sequence_lookup(seq, (gpointer)fmt_name,
  +  compare_data, NULL)) {
  +g_sequence_insert_sorted(seq, (gpointer)fmt_name,
  + compare_data, NULL);
  +}

 The type casts in this patch aren't necessary.  In C void* casts to and
 from any type without an explicit cast.  Only C++ demands explicit
 casts of void*.

The casts are ugly, and I don't know how to get rid of them here
beyond a pragma.  Due to the block and glib APIs I have to cast away a
const in the second parameter. I'm bracketed on both ends:
g_sequence_* needs that second parameter to be void * (pointer), while
the the block api (bdrv_iterate_format) defines the this function
pointer type as (*)(void *, const char *). Ideas?

Mike



[Qemu-devel] [PATCH] [v2 PATCH] qemu-img: sort block formats in help message

2014-05-05 Thread Mike Day
The help message for qemu-img lists the supported block formats, of
which there are 27 as of version 2.0.50. The formats are printed in
the order of their driver's position in a linked list, which appears
random. This patch prints the formats in sorted order, making it
easier to read and to find a specific format in the list.

v2: Incorporated feedback from Stefan Hajnoczi

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-img.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 96f4463..a559108 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -32,6 +32,7 @@
 #include block/block_int.h
 #include block/qapi.h
 #include getopt.h
+#include glib.h
 
 #define QEMU_IMG_VERSION qemu-img version  QEMU_VERSION \
   , Copyright (c) 2004-2008 Fabrice Bellard\n
@@ -55,9 +56,25 @@ typedef enum OutputFormat {
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
 #define BDRV_DEFAULT_CACHE writeback
 
-static void format_print(void *opaque, const char *name)
+static gint compare_data(gconstpointer a, gconstpointer b, gpointer user)
 {
-printf( %s, name);
+return g_strcmp0(a, b);
+}
+
+static void print_format(gpointer data, gpointer user)
+{
+printf( %s, (char *)data);
+}
+
+static void add_format_to_seq(void *opaque, const char *fmt_name)
+{
+GSequence *seq = (GSequence *)opaque;
+
+if (!g_sequence_lookup(seq, (gpointer)fmt_name,
+   compare_data, NULL)) {
+g_sequence_insert_sorted(seq, (gpointer)fmt_name,
+ compare_data, NULL);
+}
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
@@ -144,8 +161,12 @@ static void QEMU_NORETURN help(void)
  '-s' run in Strict mode - fail on different image size or sector 
allocation\n;
 
 printf(%s\nSupported formats:, help_msg);
-bdrv_iterate_format(format_print, NULL);
+GSequence *seq = g_sequence_new(NULL);
+bdrv_iterate_format(add_format_to_seq, seq);
+g_sequence_foreach(seq, print_format, NULL);
 printf(\n);
+g_sequence_free(seq);
+
 exit(EXIT_SUCCESS);
 }
 
-- 
1.9.0




[Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-02 Thread Mike Day
The help message for qemu-img lists the supported block formats, of
which there are 27 as of version 2.0.50. The formats are printed in
the order of their driver's position in a linked list, which appears
random. This patch prints the formats in sorted order, making it
easier to read and to find a specific format in the list.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-img.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

 diff --git a/qemu-img.c b/qemu-img.c
 index 96f4463..d8b7ef4 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -32,6 +32,7 @@
 #include block/block_int.h
 #include block/qapi.h
 #include getopt.h
 +#include glib.h
 
 #define QEMU_IMG_VERSION qemu-img version  QEMU_VERSION \
 , Copyright (c) 2004-2008 Fabrice Bellard\n
 @@ -60,6 +61,32 @@ static void format_print(void *opaque, const char *name)
 printf( %s, name);
 }
 
 +static gint compare_data(gconstpointer a, gconstpointer b, gpointer user)
 +{
 +return g_strcmp0((const char *)a, (const char *)b);
 +}
 +
 +static void GFunc_print_format(gpointer data, gpointer user)
 +{
 +format_print(user, data);
 +}
 +
 +static GSequence *init_sequence(void)
 +{
 +return g_sequence_new(NULL);
 +}
 +
 +static void add_format_to_seq(void *opaque, const char *fmt_name)
 +{
 +GSequence *seq = (GSequence *)opaque;
 +
 +if (!g_sequence_lookup(seq, (gpointer)fmt_name,
 +  compare_data, NULL)) {
 +g_sequence_insert_sorted(seq, (gpointer)fmt_name,
 + compare_data, NULL);
 +}
 +}
 +
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
 {
 va_list ap;
 @@ -144,8 +171,12 @@ static void QEMU_NORETURN help(void)
   '-s' run in Strict mode - fail on different image size or sector 
allocation\n;
 
 printf(%s\nSupported formats:, help_msg);
 -bdrv_iterate_format(format_print, NULL);
 +GSequence *seq = init_sequence();
 +bdrv_iterate_format(add_format_to_seq, seq);
 +g_sequence_foreach(seq, GFunc_print_format, NULL);
 printf(\n);
 +g_sequence_free(seq);
 +
 exit(EXIT_SUCCESS);
 }
 
 -- 
 1.9.0




Re: [Qemu-devel] [PULL 14/16] qemu-img: Improve error messages

2014-04-30 Thread Mike Day
On Sun, Apr 27, 2014 at 9:55 PM, Fam Zheng f...@redhat.com wrote:
   /* not found */
  -help();
  -return 0;
  +error_exit(Command not found: %s, cmdname);

 Looks like we just relied previously on the default 'not found' case
 for help() to provide the --help option.


 Oops! Thanks for noticing this and sending the fix.

One of you guys going to send in the trivial patch to restore help?
(Or is it upstream in another patch?)

thanks, Mike



Re: [Qemu-devel] [PULL 14/16] qemu-img: Improve error messages

2014-04-30 Thread Mike Day
On Wed, Apr 30, 2014 at 10:16 AM, Eric Blake ebl...@redhat.com wrote:
 Additional patches:
 https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04074.html
 https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04468.html
 both already on the queue for the next PULL request of the block branch.

Excellent, thank you!

Mike



Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-29 Thread Mike Day
On Tue, Apr 29, 2014 at 3:09 AM, Gerd Hoffmann kra...@redhat.com wrote:

 I don't feel like adding more band-aid to paper over the fundamental
 issue.  Too much band-aid tends to cause other unwanted side effects.

Fair enough, I tend to agree. thanks for the review.

Mike



Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-29 Thread Mike Day
On Tue, Apr 29, 2014 at 1:02 AM, Michael Tokarev m...@tls.msk.ru wrote:
 I guess we should have some global variable like stdio_occuped, set
 it to 1 when -daemonize is specified, and set and check it each time
 we try to use stdio for something.  This way we'll prevent various
 parts of qemu from fighting for stdio.

That does seem to be a more fundamental solution. Thanks for the review.

Mike



[Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-28 Thread Mike Day
When the deprecated -nographic option is used with the -mon option in
readline mode, qemu will create a second character device for stdio
and place it over the stdio chardev put into place by the -mon
option. This causes the terminal to stop echoeing characters upon exit
from Qemu.

Fix by checking for the existing chardev before adding another.
Signed-off-by: Mike Day ncm...@ncultra.org

---

To reproduce, use -mon and -nographic together. I was able to
reproduce it using

# qemu-system-x86_64 -enable-kvm -m 1G  -chardev stdio,id=mon0 \
# -mon chardev=mon0,mode=readline -nographic



---
 vl.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index 236f95e..3104b20 100644
--- a/vl.c
+++ b/vl.c
@@ -93,6 +93,7 @@ int main(int argc, char **argv)
 #include sysemu/kvm.h
 #include qapi/qmp/qjson.h
 #include qemu/option.h
+#include qemu/option_int.h
 #include qemu/config-file.h
 #include qemu-options.h
 #include qmp-commands.h
@@ -524,6 +525,20 @@ static QemuOptsList qemu_mem_opts = {
 },
 };
 
+static int qemu_opts_loopfun(QemuOpts *opts, void *opaque)
+{
+QemuOpt *opt;
+
+if (!strncmp(((QemuOpt *)opts-list)-name, mon, 3)) {
+QTAILQ_FOREACH(opt, opts-head, next) {
+if (!strncmp(opt-name, chardev, 7)) {
+return 1;
+}
+}
+}
+return 0;
+}
+
 /**
  * Get machine options
  *
@@ -4113,6 +4128,11 @@ int main(int argc, char **argv, char **envp)
 }
 
 if (display_type == DT_NOGRAPHIC) {
+int have_stdio = 0;
+QemuOptsList *opts = qemu_find_opts(mon);
+if (opts) {
+have_stdio = qemu_opts_foreach(opts, qemu_opts_loopfun, NULL, 0);
+}
 if (default_parallel)
 add_device_config(DEV_PARALLEL, null);
 if (default_serial  default_monitor) {
@@ -4122,7 +4142,7 @@ int main(int argc, char **argv, char **envp)
 } else if (default_sclp  default_monitor) {
 add_device_config(DEV_SCLP, mon:stdio);
 } else {
-if (default_serial)
+if (default_serial  !have_stdio)
 add_device_config(DEV_SERIAL, stdio);
 if (default_virtcon)
 add_device_config(DEV_VIRTCON, stdio);
-- 
1.9.0




Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-28 Thread Mike Day
On Mon, Apr 28, 2014 at 3:53 PM, Michael Tokarev m...@tls.msk.ru wrote:
 Gosh. This is uuugly.

 Maybe, at least, we can add some variable which gets incremented for
 each -mon chardev= ?  I dunno, that's about the same ugliness.  Oh
 well... :(

Suggestions much appreciated, I think its ugly versus intrusive, but
I'm not familiar with this part of qemu.

There is another case with -monitor where you can configure two
chardevs for stdio and it does the same thing. I left that one alone,
its more complicated.

Mike



Re: [Qemu-devel] Monitor Readline - no terminal echo after exit

2014-04-24 Thread Mike Day
On Thu, Apr 24, 2014 at 3:31 AM, Markus Armbruster arm...@redhat.com wrote:
 I believe someone on the list mentioned they are seeing a couple
 problems entering and exiting the Monitor. I'd like to look at this more
 closely, starting with my most pending issue: losing the terminal echo
 after exiting the Monitor.

Thanks for the reply Markus.

 Reproducer?

I've found a couple of ways to reproduce the problem. The easiest is
to use -nographic on the qemu command line when starting a qemu
session. In this case the monitor opens stdio but there is no visible
input or output.

Another way is to use -nographic along with -mon chardev =
stdio,mode=readline. In this case the monitor works, but when you
exit from the monitor your terminal will not echo characters.

For reference, here are the chardev and mon options I use:

 -chardev stdio,id=mon0 -mon chardev=mon0,mode=readline

I see that -nographic is a deprecated option, fwiw.

 The monitor runs on top of a QEMU chardev.  Suggest to start digging at
 monitor_init(), both into the monitor itself, and into the
 CharDriverState object.

Thus far I've confirmed that when the -nographic option is passed, the
mon_init_func does not get called (as it does for readline mode). I
know why this is, but I'm not yet sure the right way to fix it.  Also,
with -nographic and mon:stdio  monitor_flush is called for every line
entered execpt for the last line. Normally monitor_flush is called for
every line including the last line, at least in readline mode.

I've run out of time looking at this today, but would but would be
happy if anyone has further ideas.

Mike



[Qemu-devel] Monitor Readline - no terminal echo after exit

2014-04-23 Thread Mike Day

I believe someone on the list mentioned they are seeing a couple
problems entering and exiting the Monitor. I'd like to look at this more
closely, starting with my most pending issue: losing the terminal echo
after exiting the Monitor.

Does anyone have a quick pointer as to where I should look for this
code? Otherwise I'll start looking through main_loop and friends and
vl.c for init and destroy routines. 

Mike

-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [PATCH v6] spapr_hcall: add address-translation-mode-on-interrupt resource in H_SET_MODE

2014-04-15 Thread Mike Day
On Tue, Apr 8, 2014 at 2:28 AM, Alexander Graf ag...@suse.de wrote:
 On 22.03.14 14:03, Alexey Kardashevskiy wrote:

 This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
 the H_SET_MODE, for POWER8 (PowerISA 2.07) only.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Reviewed-by: Mike Day ncm...@ncultra.org


 ... but I can comment on style issues regardless for now ;)

Alex, would a document under the CDA we have with SUSE be acceptable
to you? That is the fastest option available at the moment.

Mike



Re: [Qemu-devel] [PATCH v5 3/3] spapr_hcall: add address-translation-mode-on-interrupt resource in H_SET_MODE

2014-03-20 Thread Mike Day

Alexey Kardashevskiy a...@ozlabs.ru writes:

 This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
 the H_SET_MODE, for POWER8 (PowerISA 2.07) only.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: Mike Day ncm...@ncultra.org
 ---
  hw/ppc/spapr_hcall.c | 26 ++
  target-ppc/cpu.h |  2 ++
  2 files changed, 28 insertions(+)

 diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
 index fc5211b..fb23730 100644
 --- a/hw/ppc/spapr_hcall.c
 +++ b/hw/ppc/spapr_hcall.c
 @@ -747,6 +747,32 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  default:
  ret = H_UNSUPPORTED_FLAG;
  }
 +} else if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) {
 +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 +
 +if (!(pcc-insns_flags2  PPC2_ISA207S)) {
 +return H_P2;
   
   ret = H_P2;
   goto out; 
   
Just a nit to make for easier review. The above would be more
consistent. (Even though ret is already initialized to H_P2.)

 +}
 +if (value1) {
 +ret = H_P3;
 +goto out;
 +}
 +if (value2) {
 +ret = H_P4;
 +goto out;
 +}
 +switch (mflags) {
 +case 0:
 +case 2:
 +case 3:
 +CPU_FOREACH(cs) {
 +set_spr(cs, SPR_LPCR, mflags  LPCR_AIL_SH, LPCR_AIL);
 +}
 +return H_SUCCESS;
 +
 +default:
 +return H_UNSUPPORTED_FLAG;
 +}
  }
  
  out:
 diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
 index 72cb546..577193a 100644
 --- a/target-ppc/cpu.h
 +++ b/target-ppc/cpu.h
 @@ -462,6 +462,8 @@ struct ppc_slb_t {
  #define MSR_LE   0  /* Little-endian mode   1 hflags 
 */
  
  #define LPCR_ILE (1  (63-38))
 +#define LPCR_AIL  0x0180  /* Alternate interrupt location */
 +#define LPCR_AIL_SH   (63-40)
  
  #define msr_sf   ((env-msr  MSR_SF)1)
  #define msr_isf  ((env-msr  MSR_ISF)   1)
 -- 
 1.8.4.rc4



-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] qemu patch for adding functionality to rtas_ibm_get_system_parameter [Version 2]

2014-03-14 Thread Mike Day

Tomohiro B Berry tbbe...@us.ibm.com writes:

 Hi again,

 I believe I have added the appropriate format changes and made a couple 
 changes to the code.  This patch should add functionality to the function 
 rtas_ibm_get_system_parameter to return a string containing the values for 
 partition_max_entitled_capacity and system_potential_processors. 

The text before the diff goes into the git commit log. The patch
guidelines say the commit log entry should describe the patch and
anything else should go below -- which will keep it out of the commit
log. I believe I have... is an example of the type of text that
shouldn't be in the commit log.

There is still a line wrap - If you are not doing so consider running
your patch through scripts/chechpatch.pl before submitting it.

Also consider using git-format-patch to generate the patch you will
send if you're not doing so already.

Mike

 Regards, 
 Tomo Berry

 Signed-off-by: Tomo Berry tbbe...@us.ibm.com
 diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
 index 1cb276d..931ba06 100644
 --- a/hw/ppc/spapr_rtas.c
 +++ b/hw/ppc/spapr_rtas.c
 @@ -225,6 +225,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  }
  
  #define DIAGNOSTICS_RUN_MODE42
 +#define SPLPAR_CHARACTERISTICS_TOKEN 20
 +#define CMO_CHARACTERISTICS_TOKEN 44
 +#define CEDE_LATENCY_TOKEN 45
  
  static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
sPAPREnvironment *spapr,
 @@ -236,6 +239,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
 *cpu,
  target_ulong buffer = rtas_ld(args, 1);
  target_ulong length = rtas_ld(args, 2);
  target_ulong ret = RTAS_OUT_NOT_SUPPORTED;
 +char *local_buffer;
  
  switch (parameter) {
  case DIAGNOSTICS_RUN_MODE:
 @@ -244,6 +248,36 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
 *cpu,
  ret = RTAS_OUT_SUCCESS;
  }
  break;
 +case SPLPAR_CHARACTERISTICS_TOKEN:
 +
 +/*  Create a string of length bytes locally to copy to buffer  */
 +
 +local_buffer = calloc(length, 1);
 +
 +/*  These are the current system parameters supported.  The first
 + *  16 bits of the buffer must contain the length of the string.
 + *  These 16 bits are not included in the length of the string. 
 */
 +
 +((uint16_t *)local_buffer)[0] = cpu_to_be16(snprintf(local_buffer 
 + 2,
 + length - 2, MaxEntCap=%d,MaxPlatProcs=%d, max_cpus, 
 smp_cpus));
 +
 +/*  Copy the string into buffer using rtas_st_buffer to
 + *  convert the addresses. */
 +
 +rtas_st_buffer(buffer, length, (uint8_t *)local_buffer);
 +
 +free(local_buffer);
 +ret = RTAS_OUT_SUCCESS;
 +break;
 +case CMO_CHARACTERISTICS_TOKEN:
 +ret = RTAS_OUT_NOT_SUPPORTED;
 +break;
 +case CEDE_LATENCY_TOKEN:
 +ret = RTAS_OUT_NOT_SUPPORTED;
 +break;
 +default:
 +ret = RTAS_OUT_NOT_SUPPORTED;
 +break;
  }
  
  rtas_st(rets, 0, ret);
 diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
 index b2f11e9..ee6ed2d 100644
 --- a/include/hw/ppc/spapr.h
 +++ b/include/hw/ppc/spapr.h
 @@ -356,6 +356,18 @@ static inline void rtas_st(target_ulong phys, int n, 
 uint32_t val)
  stl_be_phys(ppc64_phys_to_real(phys + 4*n), val);
  }
  
 +/*  This function will store a buffer 1 byte at a time into the
 + *  address at phys up to a maximum of phys_len bytes.*/
 +
 +static inline void rtas_st_buffer(target_ulong phys,
 +  target_ulong phys_len,
 +  uint8_t *buffer) {
 +uint32_t i;
 +for (i = 0; i  phys_len; i++) {
 +stb_phys(ppc64_phys_to_real(phys + i), buffer[i]);
 +}
 +}
 +
  typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
uint32_t token,
uint32_t nargs, target_ulong args,



 From:   Mike Day ncm...@ncultra.org
 To: Tomohiro B Berry/Austin/IBM@IBMUS, qemu-devel@nongnu.org, 
 Date:   03/13/2014 03:24 PM
 Subject:Re: [Qemu-devel] qemu patch for adding functionality to 
 rtas_ibm_get_system_parameter




 Tomohiro, 

 Please follow the guidelines for submitting a patch to Qemu that are
 found in:

 http://wiki.qemu.org/Contribute/SubmitAPatch

 This patch has an inappropriate commit log, is missing a signed-off-by:
 tag, and some of the lines wrapped in my reader. These are explained in
 the document above. You should be able to fix these issues quickly and
 resubmit this patch. 

 Mike

 Tomohiro B Berry tbbe...@us.ibm.com writes:

 Hi all,

 rtas_ibm_get_system_parameter did not previously have the functionality 
 to 
 return the appropriate string when called with the 
 SPLPAR_CHARACTERISTICS_TOKEN.  I am proposing the following patch to add 

 that functionality.  I am including the cases for 
 CMO_CHARACTERISTICS_TOKEN

Re: [Qemu-devel] spapr-iommu: extend SPAPR_TCE_TABLE class

2014-03-13 Thread Mike Day
On 20/11/13 16:39 +1100, Alexey Kardashevskiy wrote:
 This adds a put_tce() callback to the SPAPR TCE TABLE device class.
 The new callback allows to have different IOMMU types such as upcoming
 VFIO IOMMU and it will be used more by the upcoming Multi-TCE support.
 
 This reworks the H_PUT_TCE handler to make use of the new put_tce()
 callback.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: Mike Day ncm...@ncultra.org



 ---
  hw/ppc/spapr_iommu.c   | 21 +
  include/hw/ppc/spapr.h | 13 +
  2 files changed, 30 insertions(+), 4 deletions(-)
 
 diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
 index ef45f4f..0016c13 100644
 --- a/hw/ppc/spapr_iommu.c
 +++ b/hw/ppc/spapr_iommu.c
 @@ -207,7 +207,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, 
 target_ulong ioba,
  IOMMUTLBEntry entry;
  
  if (ioba = tcet-window_size) {
 -hcall_dprintf(spapr_vio_put_tce on out-of-bounds IOBA 0x
 +hcall_dprintf(spapr put_tce_emu on out-of-bounds IOBA 0x
TARGET_FMT_lx \n, ioba);
  return H_PARAMETER;
  }
 @@ -232,12 +232,21 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  target_ulong tce = args[2];
  target_ulong ret = H_PARAMETER;
  sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
 +sPAPRTCETableClass *info;
 +
 +if (!tcet) {
 +return H_PARAMETER;
 +}
 +
 +info = SPAPR_TCE_TABLE_GET_CLASS(tcet);
 +if (!info || !info-put_tce) {
 +return H_PARAMETER;
 +}
  
  ioba = ~(SPAPR_TCE_PAGE_SIZE - 1);
  
 -if (tcet) {
 -ret = put_tce_emu(tcet, ioba, tce);
 -}
 +ret = info-put_tce(tcet, ioba, tce);
 +
  trace_spapr_iommu_put(liobn, ioba, tce, ret);
  
  return ret;
 @@ -287,9 +296,12 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const 
 char *propname,
  static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
 +sPAPRTCETableClass *stc = SPAPR_TCE_TABLE_CLASS(klass);
 +
  dc-vmsd = vmstate_spapr_tce_table;
  dc-init = spapr_tce_table_realize;
  dc-reset = spapr_tce_reset;
 +stc-put_tce = put_tce_emu;
  
  QLIST_INIT(spapr_tce_tables);
  
 @@ -302,6 +314,7 @@ static TypeInfo spapr_tce_table_info = {
  .parent = TYPE_DEVICE,
  .instance_size = sizeof(sPAPRTCETable),
  .class_init = spapr_tce_table_class_init,
 +.class_size = sizeof(sPAPRTCETableClass),
  .instance_finalize = spapr_tce_table_finalize,
  };
  
 diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
 index fdaab2d..827cda2 100644
 --- a/include/hw/ppc/spapr.h
 +++ b/include/hw/ppc/spapr.h
 @@ -367,12 +367,25 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr 
 rtas_addr,
  
  #define RTAS_ERROR_LOG_MAX  2048
  
 +typedef struct sPAPRTCETableClass sPAPRTCETableClass;
  typedef struct sPAPRTCETable sPAPRTCETable;
  
  #define TYPE_SPAPR_TCE_TABLE spapr-tce-table
  #define SPAPR_TCE_TABLE(obj) \
  OBJECT_CHECK(sPAPRTCETable, (obj), TYPE_SPAPR_TCE_TABLE)
  
 +#define SPAPR_TCE_TABLE_CLASS(klass) \
 + OBJECT_CLASS_CHECK(sPAPRTCETableClass, (klass), TYPE_SPAPR_TCE_TABLE)
 +#define SPAPR_TCE_TABLE_GET_CLASS(obj) \
 + OBJECT_GET_CLASS(sPAPRTCETableClass, (obj), TYPE_SPAPR_TCE_TABLE)
 +
 +struct sPAPRTCETableClass {
 +DeviceClass parent_class;
 +
 +target_ulong (*put_tce)(sPAPRTCETable *tcet, target_ulong ioba,
 +target_ulong tce);
 +};
 +
  struct sPAPRTCETable {
  DeviceState parent;
  uint32_t liobn;
 -- 
 1.8.4.rc4
 
 
 
 



Re: [Qemu-devel] qemu patch for adding functionality to rtas_ibm_get_system_parameter

2014-03-13 Thread Mike Day

Tomohiro, 

Please follow the guidelines for submitting a patch to Qemu that are
found in:

http://wiki.qemu.org/Contribute/SubmitAPatch

This patch has an inappropriate commit log, is missing a signed-off-by:
tag, and some of the lines wrapped in my reader. These are explained in
the document above. You should be able to fix these issues quickly and
resubmit this patch. 

Mike

Tomohiro B Berry tbbe...@us.ibm.com writes:

 Hi all,

 rtas_ibm_get_system_parameter did not previously have the functionality to 
 return the appropriate string when called with the 
 SPLPAR_CHARACTERISTICS_TOKEN.  I am proposing the following patch to add 
 that functionality.  I am including the cases for 
 CMO_CHARACTERISTICS_TOKEN and CEDE_LATENCY_TOKEN because the function gets 
 called with those tokens during the boot process, but I understand that 
 they are currently redundant with the default case.  I just figured they 
 would be useful for future development, but if we don't want them in there 
 right now I think that would be okay, too. 

 Regards,
 Tomo Berry

 diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
 index 1cb276d..318fdcd 100644
 --- a/hw/ppc/spapr_rtas.c
 +++ b/hw/ppc/spapr_rtas.c
 @@ -225,6 +225,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  }
  
  #define DIAGNOSTICS_RUN_MODE42
 +#define SPLPAR_CHARACTERISTICS_TOKEN 20
 +#define CMO_CHARACTERISTICS_TOKEN 44
 +#define CEDE_LATENCY_TOKEN 45
  
  static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
sPAPREnvironment *spapr,
 @@ -236,6 +239,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
 *cpu,
  target_ulong buffer = rtas_ld(args, 1);
  target_ulong length = rtas_ld(args, 2);
  target_ulong ret = RTAS_OUT_NOT_SUPPORTED;
 +char *local_buffer;
  
  switch (parameter) {
  case DIAGNOSTICS_RUN_MODE:
 @@ -244,6 +248,42 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
 *cpu,
  ret = RTAS_OUT_SUCCESS;
  }
  break;
 +case SPLPAR_CHARACTERISTICS_TOKEN:
 +
 +   /*  Create a string locally to copy to buffer  */
 + 
 +   local_buffer=(char*)malloc(length*sizeof(char));
 +   memset(local_buffer,0,length);
 +
 +   /*  These are the current system parameters supported.  The spaces 
 at the 
 +*  start of the string are place holders for the string length. 
 */
 +
 +   snprintf(local_buffer,length, 
 MaxEntCap=%d,MaxPlatProcs=%d,max_cpus,smp_cpus);
 +
 +   /*  The first 16 bits of the buffer must contain the length of the 
 string.
 +*  These 16 bits are not included in the length of the string. */
 + 
 + 
 ((uint16_t*)local_buffer)[0]=cpu_to_be16((uint16_t)strlen(local_buffer[2]));
 + 
 +   /*  Copy the string into buffer using rtas_st_buffer to 
 +*  convert the addresses. */
 + 
 +   rtas_st_buffer(buffer,length,(uint8_t*)local_buffer);
 +
 +   /*  Free the local buffer and return success. */ 
 + 
 +   free(local_buffer);
 +   ret = RTAS_OUT_SUCCESS;
 +   break;
 +case CMO_CHARACTERISTICS_TOKEN:
 +   ret = RTAS_OUT_NOT_SUPPORTED;
 +   break;
 +case CEDE_LATENCY_TOKEN:
 +   ret = RTAS_OUT_NOT_SUPPORTED;
 +   break;
 +default:
 +   ret = RTAS_OUT_NOT_SUPPORTED;
 +   break;
  }
  
  rtas_st(rets, 0, ret);
 diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
 index b2f11e9..87c039c 100644
 --- a/include/hw/ppc/spapr.h
 +++ b/include/hw/ppc/spapr.h
 @@ -356,6 +356,16 @@ static inline void rtas_st(target_ulong phys, int n, 
 uint32_t val)
  stl_be_phys(ppc64_phys_to_real(phys + 4*n), val);
  }
  
 +/*  This function will store a buffer 1 byte at a time into the 
 + *  address at phys up to a maximum of phys_len bytes. */
 +
 +static inline void rtas_st_buffer(target_ulong phys, target_ulong 
 phys_len, uint8_t* buffer){
 +uint32_t i; 
 +for(i=0;iphys_len;i++){
 +stb_phys(ppc64_phys_to_real(phys + i),buffer[i]);
 +}
 +}
 +
  typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
uint32_t token,
uint32_t nargs, target_ulong args,
 diff --git a/pixman b/pixman
 --- a/pixman
 +++ b/pixman
 @@ -1 +1 @@
 -Subproject commit 97336fad32acf802003855cd8bd6477fa49a12e3
 +Subproject commit 97336fad32acf802003855cd8bd6477fa49a12e3-dirty
-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] spapr-pci: convert init() callback to realize()

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
 This converts the old-style init() callback to a new style realize()
 callback as init() now is supposed to do only trivial initialization.
 
 As a part of convertion, this replaces fprintf(stderr) with error_setg()
 as realize() does not return any value, instead it puts the extended
 error into **errp.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: Mike Day ncm...@ncultra.org

 ---
 Changes:
 v5:
 * finish_finalize() moved to a separate patch
 ---
  hw/ppc/spapr_pci.c | 44 ++--
  1 file changed, 22 insertions(+), 22 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index 7763149..aeb012d 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -32,6 +32,7 @@
  #include exec/address-spaces.h
  #include libfdt.h
  #include trace.h
 +#include qemu/error-report.h
  
  #include hw/pci/pci_bus.h
  
 @@ -494,9 +495,9 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, 
 void *opaque, int devfn)
  return phb-iommu_as;
  }
  
 -static int spapr_phb_init(SysBusDevice *s)
 +static void spapr_phb_realize(DeviceState *dev, Error **errp)
  {
 -DeviceState *dev = DEVICE(s);
 +SysBusDevice *s = SYS_BUS_DEVICE(dev);
  sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
  PCIHostState *phb = PCI_HOST_BRIDGE(s);
  const char *busname;
 @@ -510,9 +511,9 @@ static int spapr_phb_init(SysBusDevice *s)
  if ((sphb-buid != -1) || (sphb-dma_liobn != -1)
  || (sphb-mem_win_addr != -1)
  || (sphb-io_win_addr != -1)) {
 -fprintf(stderr, Either \index\ or other parameters must
 - be specified for PAPR PHB, not both\n);
 -return -1;
 +error_setg(errp, Either \index\ or other parameters must
 +be specified for PAPR PHB, not both);
 +return;
  }

Seems like these exit clauses should return an integer here and below.
  
  sphb-buid = SPAPR_PCI_BASE_BUID + sphb-index;
 @@ -525,28 +526,28 @@ static int spapr_phb_init(SysBusDevice *s)
  }
  
  if (sphb-buid == -1) {
 -fprintf(stderr, BUID not specified for PHB\n);
 -return -1;
 +error_setg(errp, BUID not specified for PHB);
 +return;
  }
  
  if (sphb-dma_liobn == -1) {
 -fprintf(stderr, LIOBN not specified for PHB\n);
 -return -1;
 +error_setg(errp, LIOBN not specified for PHB);
 +return;
  }
  
  if (sphb-mem_win_addr == -1) {
 -fprintf(stderr, Memory window address not specified for PHB\n);
 -return -1;
 +error_setg(errp, Memory window address not specified for PHB);
 +return;
  }
  
  if (sphb-io_win_addr == -1) {
 -fprintf(stderr, IO window address not specified for PHB\n);
 -return -1;
 +error_setg(errp, IO window address not specified for PHB);
 +return;
  }
  
  if (find_phb(spapr, sphb-buid)) {
 -fprintf(stderr, PCI host bridges must have unique BUIDs\n);
 -return -1;
 +error_setg(errp, PCI host bridges must have unique BUIDs);
 +return;
  }
  
  sphb-dtbusname = g_strdup_printf(pci@% PRIx64, sphb-buid);
 @@ -613,8 +614,9 @@ static int spapr_phb_init(SysBusDevice *s)
  sphb-tcet = spapr_tce_new_table(dev, sphb-dma_liobn,
   sphb-dma_window_size);
  if (!sphb-tcet) {
 -fprintf(stderr, Unable to create TCE table for %s\n, 
 sphb-dtbusname);
 -return -1;
 +error_setg(errp, Unable to create TCE table for %s,
 +   sphb-dtbusname);
 +return;
  }
  address_space_init(sphb-iommu_as, spapr_tce_get_iommu(sphb-tcet),
 sphb-dtbusname);
 @@ -631,13 +633,12 @@ static int spapr_phb_init(SysBusDevice *s)
  
  irq = spapr_allocate_lsi(0);
  if (!irq) {
 -return -1;
 +error_setg(errp, spapr_allocate_lsi failed);
 +return;
  }
  
  sphb-lsi_table[i].irq = irq;
  }
 -
 -return 0;
  }
  
  static void spapr_phb_reset(DeviceState *qdev)
 @@ -720,11 +721,10 @@ static const char *spapr_phb_root_bus_path(PCIHostState 
 *host_bridge,
  static void spapr_phb_class_init(ObjectClass *klass, void *data)
  {
  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
 -SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
  DeviceClass *dc = DEVICE_CLASS(klass);
  
  hc-root_bus_path = spapr_phb_root_bus_path;
 -sdc-init = spapr_phb_init;
 +dc-realize = spapr_phb_realize;
  dc-props = spapr_phb_properties;
  dc-reset = spapr_phb_reset;
  dc-vmsd = vmstate_spapr_pci;
 -- 
 1.8.4.rc4
 
 
 
 



Re: [Qemu-devel] spapr-pci: convert init() callback to realize()

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
 This converts the old-style init() callback to a new style realize()
 callback as init() now is supposed to do only trivial initialization.
 
 As a part of convertion, this replaces fprintf(stderr) with error_setg()
 as realize() does not return any value, instead it puts the extended
 error into **errp.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: Mike Day ncm...@ncultra.org

 ---
 Changes:
 v5:
 * finish_finalize() moved to a separate patch
 ---
  hw/ppc/spapr_pci.c | 44 ++--
  1 file changed, 22 insertions(+), 22 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index 7763149..aeb012d 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -32,6 +32,7 @@
  #include exec/address-spaces.h
  #include libfdt.h
  #include trace.h
 +#include qemu/error-report.h
  
  #include hw/pci/pci_bus.h
  
 @@ -494,9 +495,9 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, 
 void *opaque, int devfn)
  return phb-iommu_as;
  }
  
 -static int spapr_phb_init(SysBusDevice *s)
 +static void spapr_phb_realize(DeviceState *dev, Error **errp)
  {
 -DeviceState *dev = DEVICE(s);
 +SysBusDevice *s = SYS_BUS_DEVICE(dev);
  sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
  PCIHostState *phb = PCI_HOST_BRIDGE(s);
  const char *busname;
 @@ -510,9 +511,9 @@ static int spapr_phb_init(SysBusDevice *s)
  if ((sphb-buid != -1) || (sphb-dma_liobn != -1)
  || (sphb-mem_win_addr != -1)
  || (sphb-io_win_addr != -1)) {
 -fprintf(stderr, Either \index\ or other parameters must
 - be specified for PAPR PHB, not both\n);
 -return -1;
 +error_setg(errp, Either \index\ or other parameters must
 +be specified for PAPR PHB, not both);
 +return;
 
shouldn't these return an int? Or if the error is returned by pointer
perhaps the function should have a void return?

  }
  
  sphb-buid = SPAPR_PCI_BASE_BUID + sphb-index;
 @@ -525,28 +526,28 @@ static int spapr_phb_init(SysBusDevice *s)
  }
  
  if (sphb-buid == -1) {
 -fprintf(stderr, BUID not specified for PHB\n);
 -return -1;
 +error_setg(errp, BUID not specified for PHB);
 +return;
  }
  
  if (sphb-dma_liobn == -1) {
 -fprintf(stderr, LIOBN not specified for PHB\n);
 -return -1;
 +error_setg(errp, LIOBN not specified for PHB);
 +return;
  }
  
  if (sphb-mem_win_addr == -1) {
 -fprintf(stderr, Memory window address not specified for PHB\n);
 -return -1;
 +error_setg(errp, Memory window address not specified for PHB);
 +return;
  }
  
  if (sphb-io_win_addr == -1) {
 -fprintf(stderr, IO window address not specified for PHB\n);
 -return -1;
 +error_setg(errp, IO window address not specified for PHB);
 +return;
  }
  
  if (find_phb(spapr, sphb-buid)) {
 -fprintf(stderr, PCI host bridges must have unique BUIDs\n);
 -return -1;
 +error_setg(errp, PCI host bridges must have unique BUIDs);
 +return;
  }
  
  sphb-dtbusname = g_strdup_printf(pci@% PRIx64, sphb-buid);
 @@ -613,8 +614,9 @@ static int spapr_phb_init(SysBusDevice *s)
  sphb-tcet = spapr_tce_new_table(dev, sphb-dma_liobn,
   sphb-dma_window_size);
  if (!sphb-tcet) {
 -fprintf(stderr, Unable to create TCE table for %s\n, 
 sphb-dtbusname);
 -return -1;
 +error_setg(errp, Unable to create TCE table for %s,
 +   sphb-dtbusname);
 +return;
  }
  address_space_init(sphb-iommu_as, spapr_tce_get_iommu(sphb-tcet),
 sphb-dtbusname);
 @@ -631,13 +633,12 @@ static int spapr_phb_init(SysBusDevice *s)
  
  irq = spapr_allocate_lsi(0);
  if (!irq) {
 -return -1;
 +error_setg(errp, spapr_allocate_lsi failed);
 +return;
  }
  
  sphb-lsi_table[i].irq = irq;
  }
 -
 -return 0;
  }
  
  static void spapr_phb_reset(DeviceState *qdev)
 @@ -720,11 +721,10 @@ static const char *spapr_phb_root_bus_path(PCIHostState 
 *host_bridge,
  static void spapr_phb_class_init(ObjectClass *klass, void *data)
  {
  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
 -SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
  DeviceClass *dc = DEVICE_CLASS(klass);
  
  hc-root_bus_path = spapr_phb_root_bus_path;
 -sdc-init = spapr_phb_init;
 +dc-realize = spapr_phb_realize;
  dc-props = spapr_phb_properties;
  dc-reset = spapr_phb_reset;
  dc-vmsd = vmstate_spapr_pci;
 -- 
 1.8.4.rc4
 
 
 
 



Re: [Qemu-devel] spapr-pci: introduce a finish_realize() callback

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
 The spapr-pci PHB initializes IOMMU for emulataed devices only.
 The upcoming VFIO support will do it different. However both emulated
 and VFIO PHB types share most of the initialization code.
 For the type specific things a new finish_realize() callback is
 introduced.
 
 This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
 adds the callback pointer.
 
 This implements finish_realize() for emulated devices.
 
 This changes initialization steps order to have the finish_realize()
 call at the end of the spapr_finalize().
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: Mike Day ncm...@ncultra.org

 ---
 Changes:
 v5:
 * this is a new patch in the series, it was a part of a previous patch
 ---
  hw/ppc/spapr_pci.c  | 46 
 +
  include/hw/pci-host/spapr.h | 18 --
  2 files changed, 46 insertions(+), 18 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index aeb012d..963841c 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -500,6 +500,7 @@ static void spapr_phb_realize(DeviceState *dev, Error 
 **errp)
  SysBusDevice *s = SYS_BUS_DEVICE(dev);
  sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
  PCIHostState *phb = PCI_HOST_BRIDGE(s);
 +sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
  const char *busname;
  char *namebuf;
  int i;
 @@ -609,22 +610,6 @@ static void spapr_phb_realize(DeviceState *dev, Error 
 **errp)
 PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
  phb-bus = bus;
  
 -sphb-dma_window_start = 0;
 -sphb-dma_window_size = 0x4000;
 -sphb-tcet = spapr_tce_new_table(dev, sphb-dma_liobn,
 - sphb-dma_window_size);
 -if (!sphb-tcet) {
 -error_setg(errp, Unable to create TCE table for %s,
 -   sphb-dtbusname);
 -return;
 -}
 -address_space_init(sphb-iommu_as, spapr_tce_get_iommu(sphb-tcet),
 -   sphb-dtbusname);
 -
 -pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
 -
 -pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
 -
  QLIST_INSERT_HEAD(spapr-phbs, sphb, list);
  
  /* Initialize the LSI table */
 @@ -639,6 +624,32 @@ static void spapr_phb_realize(DeviceState *dev, Error 
 **errp)
  
  sphb-lsi_table[i].irq = irq;
  }
 +
 +pci_setup_iommu(sphb-parent_obj.bus, spapr_pci_dma_iommu, sphb);
 +
 +pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
 +
 +if (!info-finish_realize) {
 +error_setg(errp, finish_realize not defined);
 +return;
 +}
 +
 +info-finish_realize(sphb, errp);
 +}
 +
 +static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
 +{
 +sphb-dma_window_start = 0;
 +sphb-dma_window_size = 0x4000;
 +sphb-tcet = spapr_tce_new_table(DEVICE(sphb), sphb-dma_liobn,
 + sphb-dma_window_size);
 +if (!sphb-tcet) {
 +error_setg(errp, Unable to create TCE table for %s,
 +   sphb-dtbusname);
 +return ;
 +}
 +address_space_init(sphb-iommu_as, spapr_tce_get_iommu(sphb-tcet),
 +   sphb-dtbusname);
  }
  
  static void spapr_phb_reset(DeviceState *qdev)
 @@ -722,12 +733,14 @@ static void spapr_phb_class_init(ObjectClass *klass, 
 void *data)
  {
  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
  DeviceClass *dc = DEVICE_CLASS(klass);
 +sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
  
  hc-root_bus_path = spapr_phb_root_bus_path;
  dc-realize = spapr_phb_realize;
  dc-props = spapr_phb_properties;
  dc-reset = spapr_phb_reset;
  dc-vmsd = vmstate_spapr_pci;
 +spc-finish_realize = spapr_phb_finish_realize;
  }
  
  static const TypeInfo spapr_phb_info = {
 @@ -735,6 +748,7 @@ static const TypeInfo spapr_phb_info = {
  .parent= TYPE_PCI_HOST_BRIDGE,
  .instance_size = sizeof(sPAPRPHBState),
  .class_init= spapr_phb_class_init,
 +.class_size= sizeof(sPAPRPHBClass),
  };
  
  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
 diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
 index 970b4a9..0f428a1 100644
 --- a/include/hw/pci-host/spapr.h
 +++ b/include/hw/pci-host/spapr.h
 @@ -34,7 +34,21 @@
  #define SPAPR_PCI_HOST_BRIDGE(obj) \
  OBJECT_CHECK(sPAPRPHBState, (obj), TYPE_SPAPR_PCI_HOST_BRIDGE)
  
 -typedef struct sPAPRPHBState {
 +#define SPAPR_PCI_HOST_BRIDGE_CLASS(klass) \
 + OBJECT_CLASS_CHECK(sPAPRPHBClass, (klass), TYPE_SPAPR_PCI_HOST_BRIDGE)
 +#define SPAPR_PCI_HOST_BRIDGE_GET_CLASS(obj) \
 + OBJECT_GET_CLASS(sPAPRPHBClass, (obj), TYPE_SPAPR_PCI_HOST_BRIDGE)
 +
 +typedef struct sPAPRPHBClass sPAPRPHBClass;
 +typedef struct sPAPRPHBState sPAPRPHBState;
 +
 +struct sPAPRPHBClass

Re: [Qemu-devel] spapr-pci: converts fprintf to error_report

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: Mike Day ncm...@ncultra.org

 ---
  hw/ppc/spapr_pci.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index 963841c..d102d82 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -293,7 +293,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  ret_intr_type = RTAS_TYPE_MSIX;
  break;
  default:
 -fprintf(stderr, rtas_ibm_change_msi(%u) is not implemented\n, 
 func);
 +error_report(rtas_ibm_change_msi(%u) is not implemented, func);
  rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
  return;
  }
 @@ -327,7 +327,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  /* Find a device number in the map to add or reuse the existing one */
  ndev = spapr_msicfg_find(phb, config_addr, true);
  if (ndev = SPAPR_MSIX_MAX_DEVS || ndev  0) {
 -fprintf(stderr, No free entry for a new MSI device\n);
 +error_report(No free entry for a new MSI device);
  rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
  return;
  }
 @@ -336,7 +336,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  /* Check if there is an old config and MSI number has not changed */
  if (phb-msi_table[ndev].nvec  (req_num != phb-msi_table[ndev].nvec)) 
 {
  /* Unexpected behaviour */
 -fprintf(stderr, Cannot reuse MSI config for device#%d, ndev);
 +error_report(Cannot reuse MSI config for device#%d, ndev);
  rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
  return;
  }
 @@ -346,7 +346,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  irq = spapr_allocate_irq_block(req_num, false,
 ret_intr_type == RTAS_TYPE_MSI);
  if (irq  0) {
 -fprintf(stderr, Cannot allocate MSIs for device#%d, ndev);
 +error_report(Cannot allocate MSIs for device#%d, ndev);
  rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
  return;
  }
 -- 
 1.8.4.rc4
 
 
 
 



Re: [Qemu-devel] [PATCH] usb-ohci: add vmstate descriptor

2014-03-06 Thread Mike Day

Alexey Kardashevskiy a...@ozlabs.ru writes:

 This adds migration support for OHCI.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: Mike Day ncm...@ncultra.org

 ---
  hw/usb/hcd-ohci.c | 12 
  1 file changed, 12 insertions(+)

 diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
 index e38cdeb..c42e091 100644
 --- a/hw/usb/hcd-ohci.c
 +++ b/hw/usb/hcd-ohci.c
 @@ -1984,6 +1984,17 @@ static Property ohci_pci_properties[] = {
  DEFINE_PROP_END_OF_LIST(),
  };
  
 +static const VMStateDescription vmstate_ohci = {
 +.name = ohci,
 +.version_id = 1,
 +.minimum_version_id = 1,
 +.minimum_version_id_old = 1,
 +.fields  = (VMStateField[]) {
 +VMSTATE_PCI_DEVICE(parent_obj, OHCIPCIState),
 +VMSTATE_END_OF_LIST()
 +}
 +};
 +
  static void ohci_pci_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
 @@ -1997,6 +2008,7 @@ static void ohci_pci_class_init(ObjectClass *klass, 
 void *data)
  set_bit(DEVICE_CATEGORY_USB, dc-categories);
  dc-desc = Apple USB Controller;
  dc-props = ohci_pci_properties;
 +dc-vmsd = vmstate_ohci;
  }
  
  static const TypeInfo ohci_pci_info = {
 -- 
 1.8.4.rc4






[Qemu-devel] [PATCH 0/2][RFC] Convert Clock lists to RCU (V3)

2014-03-06 Thread Mike Day
The first patch in the series converts the
clock-timerlists-active_timers list to RCU. It also creates
QemuMutex in the parent QemuClock data structure.

The second patch converts clock-timerlists to RCU.

Mike Day (2):
  [RFC] Convert active timers list to use RCU V3
  [RFC] Convert Clock Timerlists to RCU V3

 include/qemu/timer.h |  19 ---
 qemu-timer.c | 142 ---
 2 files changed, 98 insertions(+), 63 deletions(-)

-- 
1.9.0




[Qemu-devel] [PATCH 1/2] [RFC] Convert active timers list to use RCU V3

2014-03-06 Thread Mike Day
active_timers is a list of timer lists, associated with a Qemu Clock,
that is read-mostly. This patch converts read accesses to the list to
use RCU, which should hopefully mitigate most of the synchronization
overhead. Write accesses are now via a mutex in the parent data
structure. Functions that change the change the parent data structure
are also protected by the mutex.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu

V3:
- Address comments from Alex Bligh and Paolo Bonzini
- Move the mutex into the parent QemuClock structure

Signed-off-by: Mike Day ncm...@ncultra.org
---
 include/qemu/timer.h |  19 +
 qemu-timer.c | 111 +--
 2 files changed, 72 insertions(+), 58 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5afcffc..f69ec49 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -5,8 +5,15 @@
 #include qemu-common.h
 #include qemu/notify.h
 
-/* timers */
+#ifdef __GNUC__
+#ifndef __ATOMIC_RELEASE
+#define __ATOMIC_RELEASE
+#endif
+#endif
+#include qemu/atomic.h
+#include qemu/rcu.h
 
+/* timers */
 #define SCALE_MS 100
 #define SCALE_US 1000
 #define SCALE_NS 1
@@ -61,6 +68,7 @@ struct QEMUTimer {
 void *opaque;
 QEMUTimer *next;
 int scale;
+struct rcu_head rcu;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
@@ -189,12 +197,6 @@ void qemu_clock_notify(QEMUClockType type);
  * @enabled: true to enable, false to disable
  *
  * Enable or disable a clock
- * Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
  */
 void qemu_clock_enable(QEMUClockType type, bool enabled);
 
@@ -543,7 +545,6 @@ void timer_del(QEMUTimer *ts);
  * freed while this function is running.
  */
 void timer_mod_ns(QEMUTimer *ts, int64_t expire_time);
-
 /**
  * timer_mod_anticipate_ns:
  * @ts: the timer
@@ -685,9 +686,7 @@ static inline int64_t qemu_soonest_timeout(int64_t 
timeout1, int64_t timeout2)
 void init_clocks(void);
 
 int64_t cpu_get_ticks(void);
-/* Caller must hold BQL */
 void cpu_enable_ticks(void);
-/* Caller must hold BQL */
 void cpu_disable_ticks(void);
 
 static inline int64_t get_ticks_per_sec(void)
diff --git a/qemu-timer.c b/qemu-timer.c
index fb9e680..57a1545 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -29,6 +29,7 @@
 #include hw/hw.h
 
 #include qemu/timer.h
+#include qemu/rcu_queue.h
 #ifdef CONFIG_POSIX
 #include pthread.h
 #endif
@@ -45,12 +46,10 @@
 /* timers */
 
 typedef struct QEMUClock {
-/* We rely on BQL to protect the timerlists */
 QLIST_HEAD(, QEMUTimerList) timerlists;
-
+QemuMutex timer_lock;
 NotifierList reset_notifiers;
 int64_t last;
-
 QEMUClockType type;
 bool enabled;
 
@@ -70,11 +69,11 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
 
 struct QEMUTimerList {
 QEMUClock *clock;
-QemuMutex active_timers_lock;
 QEMUTimer *active_timers;
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+QemuEvent timers_done_ev;
 };
 
 /**
@@ -87,6 +86,7 @@ struct QEMUTimerList {
  */
 static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
 {
+smp_rmb();
 return qemu_clocks[type];
 }
 
@@ -107,7 +107,6 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 timer_list-clock = clock;
 timer_list-notify_cb = cb;
 timer_list-notify_opaque = opaque;
-qemu_mutex_init(timer_list-active_timers_lock);
 QLIST_INSERT_HEAD(clock-timerlists, timer_list, list);
 return timer_list;
 }
@@ -118,7 +117,7 @@ void timerlist_free(QEMUTimerList *timer_list)
 if (timer_list-clock) {
 QLIST_REMOVE(timer_list, list);
 }
-qemu_mutex_destroy(timer_list-active_timers_lock);
+qemu_event_destroy(timer_list-timers_done_ev);
 g_free(timer_list);
 }
 
@@ -129,6 +128,7 @@ static void qemu_clock_init(QEMUClockType type)
 clock-type = type;
 clock-enabled = true;
 clock-last = INT64_MIN;
+qemu_mutex_init(clock-timer_lock);
 QLIST_INIT(clock-timerlists);
 notifier_list_init(clock-reset_notifiers);
 main_loop_tlg.tl[type] = timerlist_new(type, NULL, NULL);
@@ -148,13 +148,6 @@ void qemu_clock_notify(QEMUClockType type)
 }
 }
 
-/* Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
- */
 void qemu_clock_enable(QEMUClockType type, bool enabled)
 {
 QEMUClock *clock = qemu_clock_ptr(type);
@@ -172,7 +165,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 
 bool timerlist_has_timers(QEMUTimerList *timer_list)
 {
-return !!timer_list-active_timers

[Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V3

2014-03-06 Thread Mike Day
timerlists is a list of lists that holds active timers, among other
items. It is read-mostly. This patch converts read access to the
timerlists to use RCU.

Rather than introduce a second mutex for timerlists, which would
require nested mutexes to in order to write to the timerlists, use one
QemuMutex in the QemuClock structure for all write access to any list
hanging off the QemuClock structure.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu and also requires the
previously submitted patch ae11e1c Convert active timers list to use
RCU for read operations V3.

V3:
- timerlists modifications split to a separate patch (this one).
- Addressed comments from Alex Bligh and Paolo Bonzini.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-timer.c | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 57a1545..4144e54 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -74,6 +74,7 @@ struct QEMUTimerList {
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
 QemuEvent timers_done_ev;
+struct rcu_head rcu;
 };
 
 /**
@@ -111,6 +112,13 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 return timer_list;
 }
 
+static void reclaim_timerlist(struct rcu_head *rcu)
+{
+QEMUTimerList *tl = container_of(rcu, QEMUTimerList, rcu);
+g_free(tl);
+}
+
+
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
@@ -118,7 +126,7 @@ void timerlist_free(QEMUTimerList *timer_list)
 QLIST_REMOVE(timer_list, list);
 }
 qemu_event_destroy(timer_list-timers_done_ev);
-g_free(timer_list);
+call_rcu1(timer_list-rcu, reclaim_timerlist);
 }
 
 static void qemu_clock_init(QEMUClockType type)
@@ -143,9 +151,11 @@ void qemu_clock_notify(QEMUClockType type)
 {
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, clock-timerlists, list) {
 timerlist_notify(timer_list);
 }
+rcu_read_unlock();
 }
 
 void qemu_clock_enable(QEMUClockType type, bool enabled)
@@ -157,9 +167,11 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 if (enabled  !old) {
 qemu_clock_notify(type);
 } else if (!enabled  old) {
-QLIST_FOREACH(tl, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(tl, clock-timerlists, list) {
 qemu_event_wait(tl-timers_done_ev);
 }
+rcu_read_unlock();
 }
 }
 
@@ -243,10 +255,12 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
 int64_t deadline = -1;
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, clock-timerlists, list) {
 deadline = qemu_soonest_timeout(deadline,
 timerlist_deadline_ns(timer_list));
 }
+rcu_read_unlock();
 return deadline;
 }
 
@@ -262,11 +276,13 @@ QEMUTimerList 
*qemu_clock_get_main_loop_timerlist(QEMUClockType type)
 
 void timerlist_notify(QEMUTimerList *timer_list)
 {
-if (timer_list-notify_cb) {
+rcu_read_lock();
+if (atomic_rcu_read(timer_list-notify_cb)) {
 timer_list-notify_cb(timer_list-notify_opaque);
 } else {
 qemu_notify_event();
 }
+rcu_read_unlock();
 }
 
 /* Transition function to convert a nanosecond timeout to ms
@@ -585,13 +601,18 @@ void qemu_clock_register_reset_notifier(QEMUClockType 
type,
 Notifier *notifier)
 {
 QEMUClock *clock = qemu_clock_ptr(type);
+qemu_mutex_lock(clock-timer_lock);
 notifier_list_add(clock-reset_notifiers, notifier);
+qemu_mutex_unlock(clock-timer_lock);
 }
 
 void qemu_clock_unregister_reset_notifier(QEMUClockType type,
   Notifier *notifier)
 {
+QEMUClock *clock = qemu_clock_ptr(type);
+qemu_mutex_lock(clock-timer_lock);
 notifier_remove(notifier);
+qemu_mutex_unlock(clock-timer_lock);
 }
 
 void init_clocks(void)
-- 
1.9.0




Re: [Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V2

2014-03-03 Thread Mike Day
On Fri, Feb 28, 2014 at 3:05 PM, Alex Bligh a...@alex.org.uk wrote:

 Rather than introduce a second mutex for timerlists, which would
 require nested mutexes to in orderwrite to the timerlists, use one
 QemuMutex in the QemuClock structure for all write access to any list
 hanging off the QemuClock structure.

 I sort of wonder why you don't just use one Mutex across the whole
 of QEMU rather than one per clock.

 This would save worrying about whether when you do things like:
   qemu_mutex_lock(timer_list-clock-clock_mutex);

I like it, it solves another problem I spotted after I submitted the patch.


 @@ -108,19 +108,24 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 timer_list-clock = clock;
 timer_list-notify_cb = cb;
 timer_list-notify_opaque = opaque;
 -qemu_mutex_init(timer_list-active_timers_lock);
 +qemu_mutex_init(clock-clock_mutex);
 QLIST_INSERT_HEAD(clock-timerlists, timer_list, list);
 return timer_list;
 }

 That can't be right, you are calling qemu_mutex_init for each
 timerlist, but the timerlists share the mutex which is now in the
 clock. The mutex should therefore be initialized in qemu_clock_init.

 Also, clock_mutex appears never to get destroyed.

Yes, thank you, on both points.


 QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list)
 {
 -return timer_list-clock-type;
 +return atomic_rcu_read(timer_list-clock-type);
 }

 I don't think this works because of the double indirection. It's
 The '' means it's not actually dereferencing clock-type, but it
 is dereferencing timer_list-clock outside of either an rcu read
 lock or an atomic read. I think you'd be better with than
 rcu_read_lock() etc. (sadly).

I should fix this with parenthesis as in (timer_list-clock-type)

 QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
 @@ -261,11 +272,13 @@ QEMUTimerList 
 *qemu_clock_get_main_loop_timerlist(QEMUClockType type)

 void timerlist_notify(QEMUTimerList *timer_list)
 {
 -if (timer_list-notify_cb) {
 +rcu_read_lock();
 +if (atomic_rcu_read(timer_list-notify_cb)) {
 timer_list-notify_cb(timer_list-notify_opaque);
 } else {
 qemu_notify_event();
 }
 +rcu_read_unlock();
 }

 If you have the rcu_read_lock why do you need the atomic_rcu_read?
 And if you need it, why do you not need it on the following line?

rcu_read_lock/unlock and atomic_rcu_read/set do different things and
frequently need to be used together. rcu_read_lock/unlock causes the
thread to enter an RCU critical section by getting a counter out of
TLS. atomic_rcu_read/set will use memory barriers to flush caches
(depending on the memory model and platform) and ensure that reads and
writes are ordered. You typically would use atomic_rcu_read on the
first read, thereafter the memory is up-to-date and writes have been
flushed. And you typically use atomic_rcu_set on the last write, when
the data structure is complete. You don't need to use them on every
update. Just for completeness, rcu_read_unlock ends the rcu critical
section but its usually a no-op.

 However, as any QEMUTimerList can (now) be reclaimed, surely all
 callers of this function are going to need to hold the rcu_read_lock
 anyway?

Yes

 I think this is used only by timerlist_rearm and qemu_clock_notify.
 Provided these call this function holding the rcu_read_lock
 I don't think this function needs changing.

 /* Transition function to convert a nanosecond timeout to ms
 @@ -389,24 +402,30 @@ static void timerlist_rearm(QEMUTimerList *timer_list)
 /* stop a timer, but do not dealloc it */
 void timer_del(QEMUTimer *ts)
 {
 -QEMUTimerList *timer_list = ts-timer_list;
 +QEMUTimerList *timer_list;

 -qemu_mutex_lock(timer_list-active_timers_lock);
 +timer_list = atomic_rcu_read(ts-timer_list);
 +rcu_read_lock();
 +qemu_mutex_lock(timer_list-clock-clock_mutex);
 +rcu_read_unlock();
 timer_del_locked(timer_list, ts);
 -qemu_mutex_unlock(timer_list-active_timers_lock);
 +qemu_mutex_unlock(timer_list-clock-clock_mutex);
 }

 Again in my ignorance of RCU I don't know whether taking a mutex
 implicitly takes an rcu read lock. If not, that rcu_read_unlock
 should be after the mutex unlock.

I am going to change this because of a race condition, To answer your
question, holding a mutex does not imply holding an rcu read lock. It
does indicate the potential need for readers to use rcu to read the
list because they can do so when the mutex holder is updating the
list. And, I'm working under the assumption that holding a mutex
implies a memory barrier so there is no need for atomic_rcu_read/set.


 @@ -447,7 +468,10 @@ void timer_mod_anticipate(QEMUTimer *ts, int64_t 
 expire_time)

 bool timer_pending(QEMUTimer *ts)
 {
 -return ts-expire_time = 0;
 +int64 expire_time;
 +
 +expire_time = atomic_rcu_read(ts-expire_time);
 +return expire_time = 0;
 }

 Why not just

return atomic_rcu_read(ts-expire_time) = 0;


Re: [Qemu-devel] [PATCH v7 1/2] target-ppc: add PowerPCCPU::cpu_dt_id

2014-03-03 Thread Mike Day
On Sat, Feb 1, 2014 at 9:45 AM, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 Normally CPUState::cpu_index is used to pick the right CPU for various
 operations. However default consecutive numbering does not always work
 for POWERPC.

 These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
 and used to call KVM VCPU's ioctls. In order to achieve this,
 kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
 cpu_index by the number of threads per core.

 This approach has disadvantages such as:
 1. NUMA configuration stays broken after the fixup;
 2. CPU-targeted commands from the QEMU Monitor do not work properly as
 CPU indexes have been fixed and there is no clear way for the user to
 know what the new CPU indexes are.

 This introduces a @cpu_dt_id field in the CPUPPCState struct which
 is initialized from @cpu_index by default and can be fixed later
 to meet the device tree requirements.

 This adds an API to handle @cpu_dt_id.

 This removes kvmppc_fixup_cpu() as it is not more needed, @cpu_dt_id
 is calculated in ppc_cpu_realize().

 This will be used later in machine code.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Acked-by: Mike Day ncm...@ncultra.org

 ---
 Changes:
 v6: inlined kvmppc_fixup_cpu()
 ---
  hw/ppc/ppc.c| 22 ++
  target-ppc/cpu-qom.h|  2 ++
  target-ppc/cpu.h| 18 ++
  target-ppc/kvm.c| 13 -
  target-ppc/kvm_ppc.h|  6 --
  target-ppc/translate_init.c | 10 --
  6 files changed, 46 insertions(+), 25 deletions(-)

 diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
 index 114be64..0e82719 100644
 --- a/hw/ppc/ppc.c
 +++ b/hw/ppc/ppc.c
 @@ -26,6 +26,7 @@
  #include hw/ppc/ppc_e500.h
  #include qemu/timer.h
  #include sysemu/sysemu.h
 +#include sysemu/cpus.h
  #include hw/timer/m48t59.h
  #include qemu/log.h
  #include hw/loader.h
 @@ -1362,3 +1363,24 @@ int PPC_NVRAM_set_params (nvram_t *nvram, uint16_t 
 NVRAM_size,

  return 0;
  }
 +
 +/* CPU device-tree ID helpers */
 +int ppc_get_vcpu_dt_id(PowerPCCPU *cpu)
 +{
 +return cpu-cpu_dt_id;
 +}
 +
 +PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id)
 +{
 +CPUState *cs;
 +
 +CPU_FOREACH(cs) {
 +PowerPCCPU *cpu = POWERPC_CPU(cs);
 +
 +if (cpu-cpu_dt_id == cpu_dt_id) {
 +return cpu;
 +}
 +}
 +
 +return NULL;
 +}
 diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
 index 72b2232..b17c024 100644
 --- a/target-ppc/cpu-qom.h
 +++ b/target-ppc/cpu-qom.h
 @@ -79,6 +79,7 @@ typedef struct PowerPCCPUClass {
  /**
   * PowerPCCPU:
   * @env: #CPUPPCState
 + * @cpu_dt_id: CPU index used in the device tree. KVM uses this index too
   *
   * A PowerPC CPU.
   */
 @@ -88,6 +89,7 @@ typedef struct PowerPCCPU {
  /* public */

  CPUPPCState env;
 +int cpu_dt_id;
  } PowerPCCPU;

  static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env)
 diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
 index 51bcd4a..d8577ae 100644
 --- a/target-ppc/cpu.h
 +++ b/target-ppc/cpu.h
 @@ -2154,4 +2154,22 @@ static inline bool cpu_has_work(CPUState *cpu)

  void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env);

 +/**
 + * ppc_get_vcpu_dt_id:
 + * @cs: a PowerPCCPU struct.
 + *
 + * Returns a device-tree ID for a CPU.
 + */
 +int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
 +
 +/**
 + * ppc_get_vcpu_by_dt_id:
 + * @cpu_dt_id: a device tree id
 + *
 + * Searches for a CPU by @cpu_dt_id.
 + *
 + * Returns: a PowerPCCPU struct
 + */
 +PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
 +
  #endif /* !defined (__CPU_PPC_H__) */
 diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
 index 781b72f..8bcc5fb 100644
 --- a/target-ppc/kvm.c
 +++ b/target-ppc/kvm.c
 @@ -1766,19 +1766,6 @@ static void kvmppc_host_cpu_class_init(ObjectClass 
 *oc, void *data)
  }
  }

 -int kvmppc_fixup_cpu(PowerPCCPU *cpu)
 -{
 -CPUState *cs = CPU(cpu);
 -int smt;
 -
 -/* Adjust cpu index for SMT */
 -smt = kvmppc_smt_threads();
 -cs-cpu_index = (cs-cpu_index / smp_threads) * smt
 -+ (cs-cpu_index % smp_threads);
 -
 -return 0;
 -}
 -
  bool kvmppc_has_cap_epr(void)
  {
  return cap_epr;
 diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
 index 5f78e4b..f3afcdb 100644
 --- a/target-ppc/kvm_ppc.h
 +++ b/target-ppc/kvm_ppc.h
 @@ -36,7 +36,6 @@ int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t 
 window_size);
  int kvmppc_reset_htab(int shift_hint);
  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
  #endif /* !CONFIG_USER_ONLY */
 -int kvmppc_fixup_cpu(PowerPCCPU *cpu);
  bool kvmppc_has_cap_epr(void);
  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
  int kvmppc_get_htab_fd(bool write);
 @@ -155,11 +154,6 @@ static inline int kvmppc_update_sdr1(CPUPPCState *env)

  #endif /* !CONFIG_USER_ONLY */

 -static inline int kvmppc_fixup_cpu(PowerPCCPU *cpu)
 -{
 -return -1

Re: [Qemu-devel] [PATCH v7 2/2] target-ppc: spapr: e500: fix to use cpu_dt_id

2014-03-03 Thread Mike Day
On Sat, Feb 1, 2014 at 9:45 AM, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 This makes use of @cpu_dt_id and related API in:
 1. emulated XICS hypercall handlers as they receive fixed CPU indexes;
 2. XICS-KVM to enable in-kernel XICS on right CPU;
 3. device-tree renderer.

 This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
 can accept command-line CPU indexes again.

 This changes kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id() as at the moment
 KVM CPU id and device tree ID are calculated using the same algorithm.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Acked-by: Mike Day ncm...@ncultra.org

 ---
 Changes:
 v7:
 * replaced referencing to PowerPCCPU::parent_obj with the CPU macro
 ---
  hw/intc/openpic_kvm.c   |  2 +-
  hw/intc/xics.c  | 15 +--
  hw/intc/xics_kvm.c  | 10 +-
  hw/ppc/e500.c   |  7 +--
  hw/ppc/spapr.c  |  9 +
  hw/ppc/spapr_hcall.c|  6 +++---
  hw/ppc/spapr_rtas.c | 14 +++---
  target-ppc/kvm.c|  2 +-
  target-ppc/translate_init.c |  1 +
  9 files changed, 41 insertions(+), 25 deletions(-)

 diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
 index c7f7b84..87fdb12 100644
 --- a/hw/intc/openpic_kvm.c
 +++ b/hw/intc/openpic_kvm.c
 @@ -228,7 +228,7 @@ int kvm_openpic_connect_vcpu(DeviceState *d, CPUState *cs)

  encap.cap = KVM_CAP_IRQ_MPIC;
  encap.args[0] = opp-fd;
 -encap.args[1] = cs-cpu_index;
 +encap.args[1] = kvm_arch_vcpu_id(cs);

  return kvm_vcpu_ioctl(cs, KVM_ENABLE_CAP, encap);
  }
 diff --git a/hw/intc/xics.c b/hw/intc/xics.c
 index b437563..64aabe7 100644
 --- a/hw/intc/xics.c
 +++ b/hw/intc/xics.c
 @@ -33,6 +33,17 @@
  #include qemu/error-report.h
  #include qapi/visitor.h

 +static int get_cpu_index_by_dt_id(int cpu_dt_id)
 +{
 +PowerPCCPU *cpu = ppc_get_vcpu_by_dt_id(cpu_dt_id);
 +
 +if (cpu) {
 +return cpu-parent_obj.cpu_index;
 +}
 +
 +return -1;
 +}
 +
  void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
  {
  CPUState *cs = CPU(cpu);
 @@ -659,7 +670,7 @@ static target_ulong h_cppr(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  static target_ulong h_ipi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
target_ulong opcode, target_ulong *args)
  {
 -target_ulong server = args[0];
 +target_ulong server = get_cpu_index_by_dt_id(args[0]);
  target_ulong mfrr = args[1];

  if (server = spapr-icp-nr_servers) {
 @@ -728,7 +739,7 @@ static void rtas_set_xive(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  }

  nr = rtas_ld(args, 0);
 -server = rtas_ld(args, 1);
 +server = get_cpu_index_by_dt_id(rtas_ld(args, 1));
  priority = rtas_ld(args, 2);

  if (!ics_valid_irq(ics, nr) || (server = ics-icp-nr_servers)
 diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
 index c203646..a5bbc24 100644
 --- a/hw/intc/xics_kvm.c
 +++ b/hw/intc/xics_kvm.c
 @@ -65,7 +65,7 @@ static void icp_get_kvm_state(ICPState *ss)
  ret = kvm_vcpu_ioctl(ss-cs, KVM_GET_ONE_REG, reg);
  if (ret != 0) {
  error_report(Unable to retrieve KVM interrupt controller state
 - for CPU %d: %s, ss-cs-cpu_index, strerror(errno));
 + for CPU %ld: %s, kvm_arch_vcpu_id(ss-cs), 
 strerror(errno));
  exit(1);
  }

 @@ -97,7 +97,7 @@ static int icp_set_kvm_state(ICPState *ss, int version_id)
  ret = kvm_vcpu_ioctl(ss-cs, KVM_SET_ONE_REG, reg);
  if (ret != 0) {
  error_report(Unable to restore KVM interrupt controller state (0x%
 -PRIx64 ) for CPU %d: %s, state, ss-cs-cpu_index,
 +PRIx64 ) for CPU %ld: %s, state, kvm_arch_vcpu_id(ss-cs),
  strerror(errno));
  return ret;
  }
 @@ -325,15 +325,15 @@ static void xics_kvm_cpu_setup(XICSState *icp, 
 PowerPCCPU *cpu)
  struct kvm_enable_cap xics_enable_cap = {
  .cap = KVM_CAP_IRQ_XICS,
  .flags = 0,
 -.args = {icpkvm-kernel_xics_fd, cs-cpu_index, 0, 0},
 +.args = {icpkvm-kernel_xics_fd, kvm_arch_vcpu_id(cs), 0, 0},
  };

  ss-cs = cs;

  ret = kvm_vcpu_ioctl(ss-cs, KVM_ENABLE_CAP, xics_enable_cap);
  if (ret  0) {
 -error_report(Unable to connect CPU%d to kernel XICS: %s,
 -cs-cpu_index, strerror(errno));
 +error_report(Unable to connect CPU%ld to kernel XICS: %s,
 +kvm_arch_vcpu_id(cs), strerror(errno));
  exit(1);
  }
  }
 diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
 index b37ce9d..8a08752 100644
 --- a/hw/ppc/e500.c
 +++ b/hw/ppc/e500.c
 @@ -238,6 +238,7 @@ static int ppce500_load_device_tree(QEMUMachineInitArgs 
 *args,
 the first node as boot node and be happy */
  for (i = smp_cpus - 1; i = 0; i--) {
  CPUState *cpu;
 +PowerPCCPU *pcpu;
  char cpu_name[128

[Qemu-devel] [PATCH 0/2] [RFC] Convert Clock lists to RCU (V2)

2014-02-27 Thread Mike Day
The first patch in the series convers the
clock-timerlists-active_timers list to RCU. The second patch
converts clock-timerlists to RCU and also protects access to
timerlists-active_timers-timer_list.

Mike Day (2):
  [RFC] Convert active timers list to use RCU V2
  [RFC] Convert Clock Timerlists to RCU V2

 include/qemu/timer.h |  19 ---
 qemu-timer.c | 148 +++
 2 files changed, 98 insertions(+), 69 deletions(-)

-- 
1.8.5.3




[Qemu-devel] [PATCH 1/2] [RFC] Convert active timers list to use RCU V2

2014-02-27 Thread Mike Day
active_timers is a list of timer lists, associated with a Qemu Clock,
that is read-mostly. This patch converts read accesses to the list to
use RCU, which should hopefully mitigate most of the synchronization
overhead.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu

V2:
- Addresses comments from Alex Bligh

Signed-off-by: Mike Day ncm...@ncultra.org
---
 include/qemu/timer.h | 19 +++
 qemu-timer.c | 69 ++--
 2 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5afcffc..f69ec49 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -5,8 +5,15 @@
 #include qemu-common.h
 #include qemu/notify.h
 
-/* timers */
+#ifdef __GNUC__
+#ifndef __ATOMIC_RELEASE
+#define __ATOMIC_RELEASE
+#endif
+#endif
+#include qemu/atomic.h
+#include qemu/rcu.h
 
+/* timers */
 #define SCALE_MS 100
 #define SCALE_US 1000
 #define SCALE_NS 1
@@ -61,6 +68,7 @@ struct QEMUTimer {
 void *opaque;
 QEMUTimer *next;
 int scale;
+struct rcu_head rcu;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
@@ -189,12 +197,6 @@ void qemu_clock_notify(QEMUClockType type);
  * @enabled: true to enable, false to disable
  *
  * Enable or disable a clock
- * Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
  */
 void qemu_clock_enable(QEMUClockType type, bool enabled);
 
@@ -543,7 +545,6 @@ void timer_del(QEMUTimer *ts);
  * freed while this function is running.
  */
 void timer_mod_ns(QEMUTimer *ts, int64_t expire_time);
-
 /**
  * timer_mod_anticipate_ns:
  * @ts: the timer
@@ -685,9 +686,7 @@ static inline int64_t qemu_soonest_timeout(int64_t 
timeout1, int64_t timeout2)
 void init_clocks(void);
 
 int64_t cpu_get_ticks(void);
-/* Caller must hold BQL */
 void cpu_enable_ticks(void);
-/* Caller must hold BQL */
 void cpu_disable_ticks(void);
 
 static inline int64_t get_ticks_per_sec(void)
diff --git a/qemu-timer.c b/qemu-timer.c
index fb9e680..dad30a3 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -29,6 +29,7 @@
 #include hw/hw.h
 
 #include qemu/timer.h
+#include qemu/rcu_queue.h
 #ifdef CONFIG_POSIX
 #include pthread.h
 #endif
@@ -45,12 +46,10 @@
 /* timers */
 
 typedef struct QEMUClock {
-/* We rely on BQL to protect the timerlists */
 QLIST_HEAD(, QEMUTimerList) timerlists;
 
 NotifierList reset_notifiers;
 int64_t last;
-
 QEMUClockType type;
 bool enabled;
 
@@ -75,6 +74,7 @@ struct QEMUTimerList {
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+QemuEvent timers_done_ev;
 };
 
 /**
@@ -87,6 +87,7 @@ struct QEMUTimerList {
  */
 static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
 {
+smp_rmb();
 return qemu_clocks[type];
 }
 
@@ -148,13 +149,6 @@ void qemu_clock_notify(QEMUClockType type)
 }
 }
 
-/* Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
- */
 void qemu_clock_enable(QEMUClockType type, bool enabled)
 {
 QEMUClock *clock = qemu_clock_ptr(type);
@@ -172,7 +166,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 
 bool timerlist_has_timers(QEMUTimerList *timer_list)
 {
-return !!timer_list-active_timers;
+return !!atomic_rcu_read(timer_list-active_timers);
 }
 
 bool qemu_clock_has_timers(QEMUClockType type)
@@ -184,16 +178,17 @@ bool qemu_clock_has_timers(QEMUClockType type)
 bool timerlist_expired(QEMUTimerList *timer_list)
 {
 int64_t expire_time;
+bool ret;
 
-qemu_mutex_lock(timer_list-active_timers_lock);
-if (!timer_list-active_timers) {
-qemu_mutex_unlock(timer_list-active_timers_lock);
+rcu_read_lock();
+if (!atomic_rcu_read(timer_list-active_timers)) {
+rcu_read_unlock();
 return false;
 }
 expire_time = timer_list-active_timers-expire_time;
-qemu_mutex_unlock(timer_list-active_timers_lock);
-
-return expire_time  qemu_clock_get_ns(timer_list-clock-type);
+ret = (expire_time  qemu_clock_get_ns(timer_list-clock-type));
+rcu_read_unlock();
+return ret;
 }
 
 bool qemu_clock_expired(QEMUClockType type)
@@ -220,16 +215,16 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
  * value but -notify_cb() is called when the deadline changes.  Therefore
  * the caller should notice the change and there is no race condition.
  */
-qemu_mutex_lock(timer_list-active_timers_lock);
-if (!timer_list-active_timers) {
-qemu_mutex_unlock(timer_list-active_timers_lock);
+
+rcu_read_lock

[Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V2

2014-02-27 Thread Mike Day
timerlists is a list of lists that holds active timers, among other
items. It is read-mostly. This patch converts read access to the
timerlists to use RCU.

Rather than introduce a second mutex for timerlists, which would
require nested mutexes to in orderwrite to the timerlists, use one
QemuMutex in the QemuClock structure for all write access to any list
hanging off the QemuClock structure.

This patch also protects timerlists-active_timers-timer_list by the
new clock mutex for write access and by RCU for read access.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu and also requires the
previously submitted patch 03fba95 Convert active timers list to use
RCU for read operations V2.

V2:
- timerlists modifications split to a separate patch (this one).
- Addressed Alex Blighs comments.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 qemu-timer.c | 79 +---
 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index dad30a3..e335a7a 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -47,7 +47,7 @@
 
 typedef struct QEMUClock {
 QLIST_HEAD(, QEMUTimerList) timerlists;
-
+QemuMutex clock_mutex;
 NotifierList reset_notifiers;
 int64_t last;
 QEMUClockType type;
@@ -69,12 +69,12 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
 
 struct QEMUTimerList {
 QEMUClock *clock;
-QemuMutex active_timers_lock;
 QEMUTimer *active_timers;
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
 QemuEvent timers_done_ev;
+struct rcu_head rcu;
 };
 
 /**
@@ -108,19 +108,24 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 timer_list-clock = clock;
 timer_list-notify_cb = cb;
 timer_list-notify_opaque = opaque;
-qemu_mutex_init(timer_list-active_timers_lock);
+qemu_mutex_init(clock-clock_mutex);
 QLIST_INSERT_HEAD(clock-timerlists, timer_list, list);
 return timer_list;
 }
 
+static void reclaim_timerlist(struct rcu_head *rcu)
+{
+QEMUTimerList *tl = container_of(rcu, QEMUTimerList, rcu);
+g_free(tl);
+}
+
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
 if (timer_list-clock) {
 QLIST_REMOVE(timer_list, list);
 }
-qemu_mutex_destroy(timer_list-active_timers_lock);
-g_free(timer_list);
+call_rcu1(timer_list-rcu, reclaim_timerlist);
 }
 
 static void qemu_clock_init(QEMUClockType type)
@@ -144,9 +149,11 @@ void qemu_clock_notify(QEMUClockType type)
 {
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, clock-timerlists, list) {
 timerlist_notify(timer_list);
 }
+rcu_read_unlock();
 }
 
 void qemu_clock_enable(QEMUClockType type, bool enabled)
@@ -154,13 +161,15 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 QEMUClock *clock = qemu_clock_ptr(type);
 QEMUTimerList *tl;
 bool old = clock-enabled;
-clock-enabled = enabled;
+atomic_rcu_set(clock-enabled, enabled);
 if (enabled  !old) {
 qemu_clock_notify(type);
 } else if (!enabled  old) {
-QLIST_FOREACH(tl, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(tl, clock-timerlists, list) {
 qemu_event_wait(tl-timers_done_ev);
 }
+rcu_read_unlock();
 }
 }
 
@@ -242,16 +251,18 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
 int64_t deadline = -1;
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, clock-timerlists, list) {
 deadline = qemu_soonest_timeout(deadline,
 timerlist_deadline_ns(timer_list));
 }
+rcu_read_unlock();
 return deadline;
 }
 
 QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list)
 {
-return timer_list-clock-type;
+return atomic_rcu_read(timer_list-clock-type);
 }
 
 QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
@@ -261,11 +272,13 @@ QEMUTimerList 
*qemu_clock_get_main_loop_timerlist(QEMUClockType type)
 
 void timerlist_notify(QEMUTimerList *timer_list)
 {
-if (timer_list-notify_cb) {
+rcu_read_lock();
+if (atomic_rcu_read(timer_list-notify_cb)) {
 timer_list-notify_cb(timer_list-notify_opaque);
 } else {
 qemu_notify_event();
 }
+rcu_read_unlock();
 }
 
 /* Transition function to convert a nanosecond timeout to ms
@@ -389,24 +402,30 @@ static void timerlist_rearm(QEMUTimerList *timer_list)
 /* stop a timer, but do not dealloc it */
 void timer_del(QEMUTimer *ts)
 {
-QEMUTimerList *timer_list = ts-timer_list;
+QEMUTimerList *timer_list;
 
-qemu_mutex_lock

Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-17 Thread Mike Day
 1. You seem to be removing the use of the active_timers_lock and replacing it 
 by
rcu (fine). However, you seem to have left the qemu_mutex_destroy in
timerlist_free, and left the mutex in QEMUTimerList. Any reason why we 
 need  both?


I responded incorrectly to this yesterday. We still need the mutex
here (active_timers_lock) to provide synchronization for list updates.
The difference now is that we don't need to hold the mutex for
traversing the list. But to update the list we still need to hold the
mutex.

Mike



Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-15 Thread Mike Day
 more unlocks than locks?

Yes, I think the right fix is to omit the last rcu_read_unlock.

-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-13 Thread Mike Day
On Thu, Feb 13, 2014 at 4:25 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 13/02/2014 10:11, Alex Bligh ha scritto:

 I'll certainly have a look through this. However before I do, what
 problem is this trying to solve? Do we think there is possibility
 of contention on the active timers lock? I used to think this was
 taken (let alone contented) relatively infrequently, but Rob Herring's
 recent email suggests to me the list is being modified in some
 circumstances rather more frequently than I thought.

 I think that, more than contention, it tries to reduce the cost of
 synchronization primitives, especially the locking and unlocking of the list
 around the invocation of timer callbacks.

Yes, the assumption is that the active timers are a read-mostly list,
so rcu is a win.

Mike



[Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-12 Thread Mike Day

Allow readers to use RCU when reading Qemu timer lists. Applies to
Paolo Bonzini's RCU branch, https://github.com/bonzini/qemu/tree/rcu.

This patch is for comment and review only.  The rcu branch needs to be
rebased on upstream.

Signed-off-by: Mike Day ncm...@ncultra.org
---
 include/qemu/timer.h |  9 +-
 qemu-timer.c | 88 +++-
 2 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index b58903b..5aaa213 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -4,7 +4,13 @@
 #include qemu/typedefs.h
 #include qemu-common.h
 #include qemu/notify.h
-
+#ifdef __GNUC__
+#ifndef __ATOMIC_RELEASE
+#define __ATOMIC_RELEASE
+#endif
+#endif
+#include qemu/atomic.h
+#include qemu/rcu.h
 /* timers */
 
 #define SCALE_MS 100
@@ -61,6 +67,7 @@ struct QEMUTimer {
 void *opaque;
 QEMUTimer *next;
 int scale;
+struct rcu_head rcu;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
diff --git a/qemu-timer.c b/qemu-timer.c
index 6b62e88..27285ca 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -29,6 +29,7 @@
 #include hw/hw.h
 
 #include qemu/timer.h
+#include qemu/rcu_queue.h
 #ifdef CONFIG_POSIX
 #include pthread.h
 #endif
@@ -49,7 +50,6 @@ typedef struct QEMUClock {
 
 NotifierList reset_notifiers;
 int64_t last;
-
 QEMUClockType type;
 bool enabled;
 } QEMUClock;
@@ -71,6 +71,7 @@ struct QEMUTimerList {
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+struct rcu_head rcu;
 };
 
 /**
@@ -107,6 +108,12 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 return timer_list;
 }
 
+static void reclaim_timer_list(struct rcu_head *rcu)
+{
+QEMUTimerList *timer_list = container_of(rcu, QEMUTimerList, rcu);
+g_free(timer_list);
+}
+
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
@@ -114,7 +121,7 @@ void timerlist_free(QEMUTimerList *timer_list)
 QLIST_REMOVE(timer_list, list);
 }
 qemu_mutex_destroy(timer_list-active_timers_lock);
-g_free(timer_list);
+call_rcu1(timer_list-rcu, reclaim_timer_list);
 }
 
 static void qemu_clock_init(QEMUClockType type)
@@ -138,9 +145,11 @@ void qemu_clock_notify(QEMUClockType type)
 {
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, clock-timerlists, list) {
 timerlist_notify(timer_list);
 }
+rcu_read_unlock();
 }
 
 void qemu_clock_enable(QEMUClockType type, bool enabled)
@@ -155,7 +164,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 
 bool timerlist_has_timers(QEMUTimerList *timer_list)
 {
-return !!timer_list-active_timers;
+return !!atomic_rcu_read(timer_list-active_timers);
 }
 
 bool qemu_clock_has_timers(QEMUClockType type)
@@ -168,15 +177,15 @@ bool timerlist_expired(QEMUTimerList *timer_list)
 {
 int64_t expire_time;
 
-qemu_mutex_lock(timer_list-active_timers_lock);
-if (!timer_list-active_timers) {
-qemu_mutex_unlock(timer_list-active_timers_lock);
+rcu_read_lock();
+if (atomic_rcu_read(timer_list-active_timers)) {
+rcu_read_unlock();
 return false;
 }
-expire_time = timer_list-active_timers-expire_time;
-qemu_mutex_unlock(timer_list-active_timers_lock);
 
+expire_time = timer_list-active_timers-expire_time;
 return expire_time  qemu_clock_get_ns(timer_list-clock-type);
+rcu_read_unlock();
 }
 
 bool qemu_clock_expired(QEMUClockType type)
@@ -194,8 +203,10 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
 {
 int64_t delta;
 int64_t expire_time;
-
-if (!timer_list-clock-enabled) {
+bool enabled;
+rcu_read_lock();
+enabled = atomic_rcu_read(timer_list-clock-enabled);
+if (!enabled) {
 return -1;
 }
 
@@ -203,16 +214,13 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
  * value but -notify_cb() is called when the deadline changes.  Therefore
  * the caller should notice the change and there is no race condition.
  */
-qemu_mutex_lock(timer_list-active_timers_lock);
 if (!timer_list-active_timers) {
-qemu_mutex_unlock(timer_list-active_timers_lock);
+rcu_read_unlock();
 return -1;
 }
 expire_time = timer_list-active_timers-expire_time;
-qemu_mutex_unlock(timer_list-active_timers_lock);
-
 delta = expire_time - qemu_clock_get_ns(timer_list-clock-type);
-
+rcu_read_unlock();
 if (delta = 0) {
 return 0;
 }
@@ -230,16 +238,22 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
 int64_t deadline = -1;
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, clock-timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, clock

Re: [Qemu-devel] [PATCH v3] virtio: Introduce virtio-testdev

2014-01-30 Thread Mike Day

Andrew Jones drjo...@redhat.com writes:

 This is a virtio version of hw/misc/debugexit and should evolve into a
 virtio version of pc-testdev. pc-testdev uses the PC's ISA bus, whereas
 this testdev can be plugged into a virtio-mmio transport, which is
 needed for kvm-unit-tests/arm. virtio-testdev uses the virtio device
 config space as a communication channel, and implements an RTAS-like
 protocol through it allowing guests to execute commands. Only three
 commands are currently implemented;
 1) VERSION: for version compatibility checks
 2) CLEAR:   set all the config space back to zero
 3) EXIT:exit() from qemu with a status code

 +static uint32_t virtio_testdev_get_features(VirtIODevice *vdev, uint32_t f)
 +{
 +return f;
 +}
 +

Is this meant to be a stub currently? 

Mike

-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node

2014-01-20 Thread Mike Day
On Sun, Jan 19, 2014 at 9:58 PM, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 I did not realize DRC is not just for PCI. How hard would it be to add hot
 plug support for a whole PHB? The current QEMU trend is to make QEMU
 monitor's device_add equal to the command line's -device which is not
 (yet) true for PHB but could be. Thanks.

We discussed this approach (hot-plug the whole bus) during the design
phase and at one point started to work on it. I don't think we
established whether or not the Linux sys/bus/pci/* file system would
work with it.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-20 Thread Mike Day
Do you know which device is writing to the BAR below? From the trace
it appears it should be restoring the memory address to the BAR after
writing all 1s to the BAR and reading back the contents. (the protocol
for finding the length of the bar memory.)

On Thu, Jan 9, 2014 at 12:24 PM, Alex Williamson
alex.william...@redhat.com wrote:
 On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote:
 From: Paolo Bonzini pbonz...@redhat.com
 vfio: vfio_pci_read_config(:01:10.0, @0x10, len=0x4) febe0004
 (save lower 32bits of BAR)
 vfio: vfio_pci_write_config(:01:10.0, @0x10, 0x, len=0x4)
 (write mask to BAR)

Here the device should restore the memory address (original contents)
to the BAR.

 vfio: region_del febe - febe3fff
 (memory region gets unmapped)
 vfio: vfio_pci_read_config(:01:10.0, @0x10, len=0x4) c004
 (read size mask)
 vfio: vfio_pci_write_config(:01:10.0, @0x10, 0xfebe0004, len=0x4)
 (restore BAR)
 vfio: region_add febe - febe3fff [0x7fcf3654d000]
 (memory region re-mapped)
 vfio: vfio_pci_read_config(:01:10.0, @0x14, len=0x4) 0
 (save upper 32bits of BAR)
 vfio: vfio_pci_write_config(:01:10.0, @0x14, 0x, len=0x4)
 (write mask to BAR)

and here ...

 vfio: region_del febe - febe3fff
 (memory region gets unmapped)
 vfio: region_add febe - febe3fff [0x7fcf3654d000]
 (memory region gets re-mapped with new address)
 qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfebe, 0x4000, 
 0x7fcf3654d000) = -14 (Bad address)
 (iommu barfs because it can only handle 48bit physical addresses)

I looked around some but I couldn't find an obvious culprit. Could it
be that the BAR is getting unmapped automatically due to
x-intx-mmap-timeout-ms before the device has a chance to finish
restoring the correct value to the BAR?

Mike



Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node

2014-01-20 Thread Mike Day
On Mon, Jan 20, 2014 at 12:24 PM, Michael Roth
mdr...@linux.vnet.ibm.com wrote:
 Quoting Alexey Kardashevskiy (2014-01-19 20:58:20)

 Would need to look at it a bit more closely to say for certain, but after
 discussing it a bit Tyrel/Mike, I think the main considerations would be:

 1) PHB hotplug/unplug would need to signal a different event type in it's
check-exception/epow message, we have stubs in place for a PHB event type,
so that's mostly a matter of adding special-casing in the hotplug callback
for spapr-pci-host-bridge devices
 2) The required properties for the OF node corresponding PHB will be 
 different.
Currently these are generated as part of the hotplug callback, and attached
to the corresponding ConfigureConnectorState node to be fed to the guest
via subsequent ibm,configure-connector RTAS calls, so we'd just hook the
PHB's OF node generation code in there as.
 3) The sysctl/kernel interface for handling PHB hotplug would be different,
we'd be relying on the rpadlar kernel module
(/sys/bus/pci/slots/control/add_slot) rather than rpaphp
(/sys/bus/pci/slots/slot/power) or the PCI rescan fallback.
This is mostly a matter of modifying the handling in the guest tools, 
 namely
in rtas_errd, to handle the event accordingly.

 We also haven't done anything extensive using rpadlpar operations within qemu
 guests, so there may be various odds/ends and possibly kernel changes needed 
 to
 get that working properly (as was the case for rpaphp, though thanks to the 
 PCI
 rescan workaround a new kernel isn't required for existing guests... a similar
 fallback likely won't be available for rpadlpar)

 But from a high-level view at least it seems fairly straight-forward. I'll see
 if we can get a prototype working.

The fact that it just works now by rescanning the pci filesystem is
a significant benefit. I don't think we want to lose it. There can be
many PHBs on one of these systems. Maybe we could make the PHB
hot-pluggable and also always have one PHB plugged in at startup. Then
the guest will see the bus when it starts and it will build the pci
file system.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day

Michael S. Tsirkin m...@redhat.com writes:

 On Fri, Jan 10, 2014 at 08:31:36AM -0700, Alex Williamson wrote:

 Short term, just assume 48 bits on x86.

 We need to figure out what's the limitation on ppc and arm -
 maybe there's none and it can address full 64 bit range.

 Cc some people who might know about these platforms.

The document you need is here: 

http://goo.gl/fJYxdN

PCI Bus Binding To: IEEE Std 1275-1994

The short answer is that Power (OpenFirmware-to-PCI) supports both MMIO
and Memory mappings for BARs.

Also, both 32-bit and 64-bit BARs are required to be supported. It is
legal to construct a 64-bit BAR by masking all the high bits to
zero. Presumably it would be OK to mask the 16 high bits to zero as
well, constructing a 48-bit address.

Mike

-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
On Tue, Jan 14, 2014 at 9:05 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Jan 14, 2014 at 08:50:54AM -0500, Mike Day wrote:


 Also, both 32-bit and 64-bit BARs are required to be supported. It is
 legal to construct a 64-bit BAR by masking all the high bits to
 zero. Presumably it would be OK to mask the 16 high bits to zero as
 well, constructing a 48-bit address.

 The question was whether addresses such as
 0xfec0 can be a valid BAR value on these
 platforms, whether it's accessible to the CPU and
 to other PCI devices.

The answer has to be no at least for Linux. Linux uses the high bit of
the page table address as state to indicate a huge page and uses
48-bit addresses. Each PCI device is different but right now Power7
supports 16TB of RAM so I don't think the PCI bridge would necessarily
decode the high 16 bits of the memory address. For two PCI devices to
communicate with each other using 64-bit addresses they both need to
support 64-bit memory in the same address range, which is possible.
All this info subject to Paul Mackerras or Alexy …

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
   
Prior to this change, there was no re-map with the 
febe

 If we choose not to map them, how do we distinguish them from guest RAM?
 There's no MemoryRegion flag that I'm aware of to distinguish a ram_ptr
 that points to a chunk of guest memory from one that points to the mmap
 of a device BAR.  I think I'd need to explicitly walk all of the vfio
 device and try to match the MemoryRegion pointer to one of my devices.
 That only solves the problem for vfio devices and not ivshmem devices or
 pci-assign devices.


I don't know if this will save you doing your memory region search or
not. But a BAR that ends with the low bit set is MMIO, and BAR that
ends with the low bit clear is RAM. So the address above is RAM as was
pointed out earlier in the thread. If you got an ambitious address in
the future you could test the low bit. But MMIO is deprecated
according to http://wiki.osdev.org/PCI so you probably won't see it,
at least for 64-bit addresses.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
On Tue, Jan 14, 2014 at 12:49 PM, Mike Day ncm...@ncultra.org wrote:
   
Prior to this change, there was no re-map with the 
febe

 If we choose not to map them, how do we distinguish them from guest RAM?
 There's no MemoryRegion flag that I'm aware of to distinguish a ram_ptr
 that points to a chunk of guest memory from one that points to the mmap
 of a device BAR.  I think I'd need to explicitly walk all of the vfio
 device and try to match the MemoryRegion pointer to one of my devices.
 That only solves the problem for vfio devices and not ivshmem devices or
 pci-assign devices.


 I don't know if this will save you doing your memory region search or
 not. But a BAR that ends with the low bit set is MMIO, and BAR that
 ends with the low bit clear is RAM. So the address above is RAM as was
 pointed out earlier in the thread. If you got an ambitious address in
 the future you could test the low bit. But MMIO is deprecated
 according to http://wiki.osdev.org/PCI so you probably won't see it,
 at least for 64-bit addresses.

s/ambitious/ambiguous/

The address above has already been masked. What you need to do is read
the BAR. If the value from the BAR end in '1', its MMIO. If it ends in
'10', its RAM. If it ends in '0n' its disabled. The first thing that
the PCI software does after reading the BAR is mask off the two low
bits.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day

 The address above has already been masked. What you need to do is read
 the BAR. If the value from the BAR end in '1', its MMIO. If it ends in
 '10', its RAM. If it ends in '0n' its disabled. The first thing that
 the PCI software does after reading the BAR is mask off the two low
 bits.

 Are you perhaps confusing MMIO and I/O port?  I/O port cannot be mmap'd
 on x86, so it can't be directly mapped.  It also doesn't come through
 the address_space_memory filter.  I/O port is deprecated, or at least
 discouraged, MMIO is not.  Thanks,

You're right, sorry I missed that. It doesn't solve the problem.

Mike



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day

Alexey Kardashevskiy a...@ozlabs.ru writes:

 On 01/10/2014 10:40 AM, Alexander Graf wrote:
 

 What if we make the max thread count a property of our cpu class? The
 we
 can add a threads=max option which will be identical between kvm and tcg.


 You lost me here :)
 Right now the sequence is:
 1. smp_parse
 2. config_accelerator
 3. machine_init

 I proposed
 1. config_accelerator - reads max threads from KVM (and initializes host
 type)
 2. smp_parse - does the parsing using smp_threads tweaked in 1)
 3. machine_init - creates CPUs which may or may be not host.

The patch as it its now is very simple and well-contained. I wonder how
much it would expand if we added a max thread count to the cpu class. It
seems like the need for a max thread count is idiomatic to powerpc. 

Mike

-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day

Alexander Graf ag...@suse.de writes:

 The patch as it its now is very simple and well-contained. I wonder how
 much it would expand if we added a max thread count to the cpu class. It
 seems like the need for a max thread count is idiomatic to powerpc. 

 It's only ever useful on IBM POWER. Any other PowerPC system can partition 
 vcpus by host threads, it's only IBM POWER hardware that's as broken as it is.


 I do see that the user experience is slightly suboptimal, but by creating 
 this special case there's a good chance you're doing more harm than good.

All the problems you raise about special cases are true. But, as you
point out, ibm power is uniquely broken, which would possibly justify a
special case, unless there is a better general solution. In other words,
special cases exist for unique circumstances and if I understand you
correctly this is a unique circumstance.

Alexy explained the use case in his initial posting. The user needs to
allocate all threads of a core to an instance of KVM, but has no easy
way to obtain that information (threads per core) for the Qemu threads
option. So specifying threads=max is a more user-friendly option. In
my understanding this is strictly a usability issue.

Mike
-- 
Mike Day | Endurance is a Virtue



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day
On Fri, Jan 10, 2014 at 9:13 AM, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 On 01/11/2014 01:00 AM, Alexander Graf wrote:

 Can't we determine the number of default threads at a common place,
 preferably derived from cpu type?

 We can do anything. I asked how exactly as I really (really) do not
 understand the details.


Are you suggesting we create a dictionary with all the cpu type
information stored in it
(stepping, cores, threads, memory channels, caches) that we need to
keep updated?



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day
On Fri, Jan 10, 2014 at 9:25 AM, Alexander Graf ag...@suse.de wrote:
 On 01/11/2014 01:00 AM, Alexander Graf wrote:

 Can't we determine the number of default threads at a common place,
 preferably derived from cpu type?

 We can do anything. I asked how exactly as I really (really) do not
 understand the details.


 Are you suggesting we create a dictionary with all the cpu type
 information stored in it
 (stepping, cores, threads, memory channels, caches) that we need to
 keep updated?

 We can always talk in extremes :). Today we have a dictionary of core types 
 in QEMU. If a certain core type comes with a specific number of threads, 
 that's a property of the core, no?

Yes, very true. I'm just considering the eventual situation where each
cpu type has several or more child classes to represent different
configurations (threads, cores). That would be a lot more complicated
than now.

Mike



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-09 Thread Mike Day

Alexey Kardashevskiy a...@ozlabs.ru writes:

  /* compute missing values, prefer sockets over cores over threads */
  if (cpus == 0 || sockets == 0) {
  sockets = sockets  0 ? sockets : 1;
  cores = cores  0 ? cores : 1;
 -threads = threads  0 ? threads : 1;
 +if (threads_max) {
 +if (threads  0) {
 +fprintf(stderr, Use either threads or threads_max\n);
 +exit(1);

If you went ahead with the threads=max string option you wouldn't need
to check here for mutual excusivity and the user wouldn't need to worry
about an extra command options.

 +}
 +threads = smp_threads  0 ? smp_threads : 1;
 +} else {
 +threads = threads  0 ? threads : 1;
 +}
  if (cpus == 0) {
  cpus = cores * threads * sockets;
  }

-- 
Mike Day | Endurance is a Virtue



[Qemu-devel] [PATCH] A hexdump function that also displays UTF-8 strings contained in the dumped buffer.

2013-11-11 Thread Mike Day
This function is used by a forthcomingQemu monitor command that dumps
contents of OpenFirmware Device Trees. It dumps contents of a buffer
as hex in the same format as the existing function but also also
appends any UTF-8 strings in human-readable format.

Like the existing hexdump function, this function may be used
elsewhere in Qemu, and it shares the same prototype as the existing
function.

In both functions, check for a NULL prefix parameter and omit printing
the prefix if it is null.

Here is a sample of the output of both functions with no prefix string:

:  61 62 33 64  62 65 65 66  65 62 34 64  66 62 65 03
0010:  67 62 35 64  68 01 05 03  69 62 36 64  6a 01 06 03
0020:  6b 62 37 64  6c 01 07 03  6d 62 38 64  6e 01 08 03
0030:  6f 62 39 64  70 01 09 03  71 62 78 64

:  61 62 33 64  62 65 65 66  65 62 34 64  66 62 65 03   ab3dbeefeb4dfbe.
0010:  67 62 35 64  68 01 05 03  69 62 36 64  6a 01 06 03   gb5dh...ib6dj...
0020:  6b 62 37 64  6c 01 07 03  6d 62 38 64  6e 01 08 03   kb7dl...mb8dn...
0030:  6f 62 39 64  70 01 09 03  71 62 78 64ob9dp...qbxd

Signed-off-by: Mike Day ncm...@ncultra.org
---
 include/qemu-common.h |  2 ++
 util/hexdump.c| 48 +++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 5054836..7b8e2b9 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -435,6 +435,8 @@ int mod_utf8_codepoint(const char *s, size_t n, char **end);
  */
 
 void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
+/* include any strings alongside the hex output */
+void qemu_hexdump_str(gchar *buf, FILE *fp, const gchar *prefix, size_t len);
 
 /* vector definitions */
 #ifdef __ALTIVEC__
diff --git a/util/hexdump.c b/util/hexdump.c
index 969b340..a920c81 100644
--- a/util/hexdump.c
+++ b/util/hexdump.c
@@ -21,7 +21,11 @@ void qemu_hexdump(const char *buf, FILE *fp, const char 
*prefix, size_t size)
 
 for (b = 0; b  size; b++) {
 if ((b % 16) == 0) {
-fprintf(fp, %s: %04x:, prefix, b);
+if (prefix) {
+fprintf(fp, %s: %04x:, prefix, b);
+} else {
+fprintf(fp, %04x:, b);
+}
 }
 if ((b % 4) == 0) {
 fprintf(fp,  );
@@ -35,3 +39,45 @@ void qemu_hexdump(const char *buf, FILE *fp, const char 
*prefix, size_t size)
 fprintf(fp, \n);
 }
 }
+
+/* print any strings along side the hex dump */
+void qemu_hexdump_str(gchar *buf, FILE *fp, const gchar *prefix, size_t len)
+{
+
+gchar *inp, *linep;
+int i, offset = 0;
+inp = linep = buf;
+
+do {
+if (prefix) {
+fprintf(fp, %s: %04x:  , prefix, offset);
+} else {
+fprintf(fp, %04x:  , offset);
+}
+for (i = 0; i  16  len  0; i++, len--, offset++, inp++) {
+if (i  !(i % 4)) {
+fprintf(fp,  );
+}
+fprintf(fp, %02hx , *inp);
+}
+int j;
+if (i  16) {
+for (j = 16 - i; j; --j) {
+fprintf(fp,);
+if (j  (!(j % 4))) {
+fprintf(fp,  );
+}
+}
+}
+fprintf(fp,   );
+for (j = 0; j  i; j++) {
+if (*(linep + j)  0x20 || *(linep + j)  0x7e) {
+fprintf(fp, %c, '.');
+} else {
+fprintf(fp, %c, *(linep + j));
+}
+}
+fprintf(fp, \n);
+linep = inp;
+} while (len);
+}



[Qemu-devel] [RFC] SPAPR-PCI Hotplug Support in Qemu

2013-10-10 Thread Mike Day

[RFC] SPAPR-PCI Hotplug Support in Qemu

Background:
ppc64 has a unique bus structure for PCI slots: each slot is connected
to its PHB by a pci switch. This is true in some IBM hardware as well as
paravirtual hardware (PAPR).

SLOF firmware normally scans the hardware bus and creates the correct
slot/PCI switch -complex in the open firmware device tree. It also
configures the slot and PCI switch (BARs, etc.)

For devices set up by platform firmware, each PCI device is attached to
its PHB and correctly configured.

For Linux hot-plugged devices running under PowerVM today, each device
is created with a PCI switch hanging off the dev-subordinate
pointer. (PowerVM gets this info from the open firmware device tree in
rtas.)

Problem:

The Qemu hot-plug path doesn't anticipate a PCI switch being attached
to every PHB slot. 

When hot-plugging a device, Qemu qdev creates the device, which allows
the device to initialize itself. Qemu then passes this initialized
device to the ppc PHB via the hot-plug path.[1]

The current ppc hot-plug code then creates a device tree node for the
device [2], and allocates resources (BARs etc) for the new device. [3]

The ppc64 kernel expects each hot-plugged PCI device structure to
point to a subordinate bus dev-subordinate. This assumption is held
throughout the ppc PCI code, and there are numerous opportunities for
panics when the device gets passed to a kernel routine with a
subordinate pointer. [4]


Proposed Solutions:

(1) Create and hook an inert PCI switch to every hot-plugged PCI
device in Qemu. 

(a) After the device has initialized itself, at hot-plug time, create a
new PCI switch, configure the switch, allocate BARs, and attach the
switch to the hot-plugged devices (dev-subordinate).

(b) create a new device tree node that begins with the PCI switch and
the parent of the hot-plugged device. Add the PCI switch/device
complex to the device tree under the PHB.

(2) Add each hot-plugged PCI device to its own complex of PHB
(Processor Host Bus) and PCI switch.

Simplify (1) by creating a new PHB for each hot-plugged device. 

(a) At PHB creation time, create a PCI switch device node for each PHB
slot. 

(b) At hot-plug time, create and configure a new PHB and add the
hot-plugged device to one of the slots. Configure and allocate
resources as normally. 

Comments:

The current code has only one PHB. We know we need to support more
than one PHB ultimately. Solution #2 is consistent with this approach.


[1] https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c

[2] ibm,rtas_configure_connector:
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L575

[3] spapr_phb_add_pci_dt
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L900

[4] dlpar_pci_add_bus
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/pci/hotplug/rpadlpar_core.c?id=8bf3379a74bc9132751bfa685bad2da318fd59d7#n165


-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



Re: [Qemu-devel] [RFC] SPAPR-PCI Hotplug Support in Qemu

2013-10-10 Thread Mike Day
Adding Anthony's corrected address.

On Thu, Oct 10, 2013 at 6:25 PM, Mike Day ncm...@ncultra.org wrote:
[RFC] SPAPR-PCI Hotplug Support in Qemu

 Background:
 ppc64 has a unique bus structure for PCI slots: each slot is connected
 to its PHB by a pci switch. This is true in some IBM hardware as well as
 paravirtual hardware (PAPR).

 SLOF firmware normally scans the hardware bus and creates the correct
 slot/PCI switch -complex in the open firmware device tree. It also
 configures the slot and PCI switch (BARs, etc.)

 For devices set up by platform firmware, each PCI device is attached to
 its PHB and correctly configured.

 For Linux hot-plugged devices running under PowerVM today, each device
 is created with a PCI switch hanging off the dev-subordinate
 pointer. (PowerVM gets this info from the open firmware device tree in
 rtas.)

 Problem:

 The Qemu hot-plug path doesn't anticipate a PCI switch being attached
 to every PHB slot.

 When hot-plugging a device, Qemu qdev creates the device, which allows
 the device to initialize itself. Qemu then passes this initialized
 device to the ppc PHB via the hot-plug path.[1]

 The current ppc hot-plug code then creates a device tree node for the
 device [2], and allocates resources (BARs etc) for the new device. [3]

 The ppc64 kernel expects each hot-plugged PCI device structure to
 point to a subordinate bus dev-subordinate. This assumption is held
 throughout the ppc PCI code, and there are numerous opportunities for
 panics when the device gets passed to a kernel routine with a
 subordinate pointer. [4]


 Proposed Solutions:

 (1) Create and hook an inert PCI switch to every hot-plugged PCI
 device in Qemu.

 (a) After the device has initialized itself, at hot-plug time, create a
 new PCI switch, configure the switch, allocate BARs, and attach the
 switch to the hot-plugged devices (dev-subordinate).

 (b) create a new device tree node that begins with the PCI switch and
 the parent of the hot-plugged device. Add the PCI switch/device
 complex to the device tree under the PHB.

 (2) Add each hot-plugged PCI device to its own complex of PHB
 (Processor Host Bus) and PCI switch.

 Simplify (1) by creating a new PHB for each hot-plugged device.

 (a) At PHB creation time, create a PCI switch device node for each PHB
 slot.

 (b) At hot-plug time, create and configure a new PHB and add the
 hot-plugged device to one of the slots. Configure and allocate
 resources as normally.

 Comments:

 The current code has only one PHB. We know we need to support more
 than one PHB ultimately. Solution #2 is consistent with this approach.


 [1]
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c

 [2] ibm,rtas_configure_connector:

https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L575

 [3] spapr_phb_add_pci_dt

https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L900

 [4] dlpar_pci_add_bus

http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/pci/hotplug/rpadlpar_core.c?id=8bf3379a74bc9132751bfa685bad2da318fd59d7#n165


 --

 Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
 Endurance is a Virtue


Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-10-07 Thread Mike Day

Paolo Bonzini pbonz...@redhat.com writes:

 Il 30/09/2013 15:34, Alex Bligh ha scritto:
 
 I think the most likely change here is that the walkers might
 move outside the BQL. Given modification of this list is so rare,
 the lock would be very very read heavy, so RCU is probably a
 sensible option.

 I agree.  Keeping the write side on the BQL is sane, but RCU-protecting
 the read side actually makes the rules simpler.

 Mike, would you like to give it a shot?

Yes, I will. I'll have a patchset for review within a couple of days. 

Mike
-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-09-30 Thread Mike Day
) */
 -ts-cb(ts-opaque);
 +cb(opaque);
  progress = true;
  }
  return progress;
 -- 
 1.8.3.1



-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-09-30 Thread Mike Day


 On Mon, Sep 30, 2013 at 8:55 AM, Alex Bligh a...@alex.org.uk wrote:
 
 
  On 30 Sep 2013, at 13:45, Mike Day wrote:
 
   I've applied this set to Paolo's rcu tree - I see a couple of routines
   that appear to need the active_timers_lock:
  
   (line 137 of qemu-timer.c in my tree)
   void qemu_clock_notify(QEMUClockType type)
   {
  QEMUTimerList *timer_list;
  QEMUClock *clock = qemu_clock_ptr(type);
  QLIST_FOREACH(timer_list, clock-timerlists, list) {
  timerlist_notify(timer_list);
  }
   }
  
   (line 228 of qemu-timer.c in my tree)
   int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
   {
  int64_t deadline = -1;
  QEMUTimerList *timer_list;
  QEMUClock *clock = qemu_clock_ptr(type);
  QLIST_FOREACH(timer_list, clock-timerlists, list) {
  deadline = qemu_soonest_timeout(deadline,
  
  timerlist_deadline_ns(timer_list));
  }
  return deadline;
   }
  
   I think these functions are always called now with the BQL held, so I
   wonder if they are good candidates for RCU?
 
  These routines iterate through the list of timerlists held by
  a clock.
 
  They do not iterate through the list of active timers in a timer
  list. I believe the latter is what active_timers_lock protects.
 
  The list of timers attached to a clock is only modified when timers
  are created and deleted which is (currently) under the BQL.
 


Sorry, and thanks for the correction re: active_timers_lock. I should have
said that clock-timerlists may need its own mutex if and when we remove
the BQL from the timer code.

Mike


Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-09-30 Thread Mike Day
On Mon, Sep 30, 2013 at 9:34 AM, Alex Bligh a...@alex.org.uk wrote:
  void qemu_clock_notify(QEMUClockType type)

 ...

  int64_t qemu_clock_deadline_ns_all(QEMUClockType type)

 ...

  I think these functions are always called now with the BQL held, so I
  wonder if they are good candidates for RCU?

 These routines iterate through the list of timerlists held by
 a clock.

 They do not iterate through the list of active timers in a timer
 list. I believe the latter is what active_timers_lock protects.

 The list of timers attached to a clock is only modified when timers
 are created and deleted which is (currently) under the BQL.

 Sorry, and thanks for the correction re: active_timers_lock. I should
 have said that clock-timerlists may need its own mutex if and when we
 remove the BQL from the timer code.


 Correct. The list of timerlists is only modified when a
 QEMUTimerListGroup is created or destroyed (currently when an
 AioContext is created or destroyed), and that is done with the
 BQL held.

 As you point out, it's currently walked by a couple of functions
 that also have the BQL held.

 I think the most likely change here is that the walkers might
 move outside the BQL. Given modification of this list is so rare,
 the lock would be very very read heavy, so RCU is probably a
 sensible option.

 This link may (or may not) help in understanding:
  http://blog.alex.org.uk/2013/08/24/changes-to-qemus-timer-system/

Alex, thanks for the link -

Mike


Re: [Qemu-devel] [PATCH v5 0/5] bugs fix for hpet

2013-09-26 Thread Mike Day

Paolo Bonzini pbonz...@redhat.com writes:

 Il 25/09/2013 08:27, liu ping fan ha scritto:
 Hi, is hpet orphan? Or who can help me to merge this patch-set if my
 patch is fine.

 Anthony, Michael?

Yes, happy to help out with this. I'll start looking at it now and work
with Liu Ping, 

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



Re: [Qemu-devel] [PATCH v5 0/5] bugs fix for hpet

2013-09-26 Thread Mike Day

Paolo Bonzini pbonz...@redhat.com writes:

 Il 25/09/2013 08:27, liu ping fan ha scritto:
 Hi, is hpet orphan? Or who can help me to merge this patch-set if my
 patch is fine.

 Anthony, Michael?

Sorry, wrong Michael - 

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4,2

2013-09-09 Thread Mike Day
Changes from V4.1:

* Correct memory barriers for ram_list globals.

Changes from V4:

* rebased on https://github.com/bonzini/qemu/tree/rcu
  commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30

Changes from V3:

* now passes virt-test -t qemu
* uses call_rcu instead of call_rcu1
* completely removed the ram_list mutex and its locking functions
* cleaned up some comments

Changes from V2:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow unlocked reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been virt-tested using KVM -t qemu and
passes 15/15 migration tests.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day ncm...@ncultra.org

Signed-off-by: Mike Day ncm...@ncultra.org
---
 arch_init.c| 107 ++-
 exec.c | 167 +++--
 include/exec/cpu-all.h |  13 ++--
 3 files changed, 161 insertions(+), 126 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0471cd5..e24d29e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include trace.h
 #include exec/cpu-all.h
 #include hw/acpi/acpi.h
+#include qemu/rcu_queue.h
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -399,7 +400,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block-mr,
addr, TARGET_PAGE_SIZE,
@@ -408,6 +410,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -447,6 +451,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+
 if (!block)
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 
 while (true) {
 mr = block-mr;
@@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset = block-length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+
 last_seen_block = block;
 last_offset = offset;
-
 return bytes_sent;
 }
 
@@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next)
 total += block-length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -601,12 +608,22 @@ static void reset_ram_globals(void)
 last_seen_block = NULL;
 last_sent_block = NULL;
 last_offset = 0;
-last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
+last_version = ram_list.version;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup

Re: [Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4,2

2013-09-09 Thread Mike Day
On Mon, Sep 9, 2013 at 12:21 PM, Paolo Bonzini pbonz...@redhat.com wrote:

  @@ -601,12 +608,22 @@ static void reset_ram_globals(void)
   last_seen_block = NULL;
   last_sent_block = NULL;
   last_offset = 0;
  -last_version = ram_list.version;
   ram_bulk_stage = true;
  +smp_wmb();
  +last_version = ram_list.version;

 This barrier is still not needed.

Yes, I agree, because of the calling context. It is brittle though because
reset_ram_globals is a static function and may be called differently in the
future (or by new code at a different location). The need for a barrier may
change and it would be opaque to the internals of the reset function. Also,
the globals are writable elsewhere in the file. It needs more organization
but I agree that should be a discrete patch.

 Can you take a look at my rcu branch?

The comments clarify to me why writing to last_mru does not need a write
barrier, thank you.

 I pushed there the conversion of mru_block to RCU (based on 4.1), and
 clarified a bit more the locking conventions.  Basically we have:

 - functions called under iothread lock (e.g. qemu_ram_alloc)

 - functions called under RCU or iothread lock (e.g. qemu_get_ram_ptr)

 - functions called under RCU or iothread lock, or while holding a
 reference to the MemoryRegion that is looked up again (e.g.
 qemu_ram_addr_from_host).

 The difference between the second and third group is simply that the
 second will not call rcu_read_lock()/rcu_read_unlock(), the third will.

 Does it make sense?  Should we simplify it by always calling
 rcu_read_lock()/rcu_read_unlock(), which removes the second group?

I think the benefits of simplification and future reliability are greater
than the performance cost of the rcu_read_lock. The latter should be
something we verify, though.

Thank you!

Mike


[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4.1

2013-09-05 Thread Mike Day
Changes from V4:

rebased on https://github.com/bonzini/qemu/tree/rcu
commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30

This brings the patchset down to three files.

Changes from V3:

* now passes virt-test -t qemu
* uses call_rcu instead of call_rcu1
* completely removed the ram_list mutex and its locking functions
* cleaned up some comments

Changes from V2:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow unlocked reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been virt-tested using KVM -t qemu and
passes 15/15 migration tests.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day ncm...@ncultra.org
---
 arch_init.c| 105 ++-
 exec.c | 164 +++--
 include/exec/cpu-all.h |  13 ++--
 3 files changed, 158 insertions(+), 124 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0471cd5..29f1da2 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include trace.h
 #include exec/cpu-all.h
 #include hw/acpi/acpi.h
+#include qemu/rcu_queue.h
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -399,7 +400,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block-mr,
addr, TARGET_PAGE_SIZE,
@@ -408,6 +410,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -447,6 +451,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+
 if (!block)
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 
 while (true) {
 mr = block-mr;
@@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset = block-length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+
 last_seen_block = block;
 last_offset = offset;
-
 return bytes_sent;
 }
 
@@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next)
 total += block-length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -603,10 +610,20 @@ static void reset_ram_globals(void)
 last_offset = 0;
 last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
 RAMBlock *block;
@@ -632,7 +649,7 @@ static int ram_save_setup

[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V3

2013-09-04 Thread Mike Day
Changes from V2:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow unlocked reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has not been tested further than that at
this point.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day ncm...@ncultra.org
---
 arch_init.c  | 103 ++
 exec.c   | 160 +--
 include/exec/cpu-all.h   |   6 +-
 include/qemu/rcu_queue.h |   8 +++
 4 files changed, 173 insertions(+), 104 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..4cb1543 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include trace.h
 #include exec/cpu-all.h
 #include hw/acpi/acpi.h
+#include qemu/rcu_queue.h
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -398,7 +399,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block-mr,
addr, TARGET_PAGE_SIZE,
@@ -407,6 +409,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -446,6 +450,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -457,8 +463,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+
 if (!block)
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 
 while (true) {
 mr = block-mr;
@@ -469,9 +476,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset = block-length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -526,9 +533,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+
 last_seen_block = block;
 last_offset = offset;
-
 return bytes_sent;
 }
 
@@ -565,10 +572,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next)
 total += block-length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -602,10 +609,20 @@ static void reset_ram_globals(void)
 last_offset = 0;
 last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
 RAMBlock *block;
@@ -631,7 +648,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +658,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE

[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4

2013-09-04 Thread Mike Day
* now passes virt-test -t qemu
* uses call_rcu instead of call_rcu1
* completely removed the ram_list mutex and its locking functions
* cleaned up some comments

Changes from V3:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow unlocked reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been virt-tested using KVM -t qemu and
passes 15/15 migration tests.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day ncm...@ncultra.org
---
 arch_init.c  |  96 +---
 exec.c   | 162 +--
 include/exec/cpu-all.h   |  13 ++--
 include/qemu/rcu_queue.h |   8 +++
 4 files changed, 160 insertions(+), 119 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..76ef5c9 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include trace.h
 #include exec/cpu-all.h
 #include hw/acpi/acpi.h
+#include qemu/rcu_queue.h
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -398,7 +399,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block-mr,
addr, TARGET_PAGE_SIZE,
@@ -407,6 +409,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -446,6 +450,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,7 +464,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 ram_addr_t current_addr;
 
 if (!block)
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 
 while (true) {
 mr = block-mr;
@@ -469,9 +475,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset = block-length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -565,10 +571,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next)
 total += block-length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -602,10 +608,20 @@ static void reset_ram_globals(void)
 last_offset = 0;
 last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
 RAMBlock *block;
@@ -631,7 +647,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +657,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next

Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4

2013-09-04 Thread Mike Day
On Wed, Sep 4, 2013 at 3:58 PM, Paolo Bonzini pbonz...@redhat.com wrote:

  @@ -1323,23 +1325,21 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t
addr)
   {
   RAMBlock *block;
 
  -/* The list is protected by the iothread lock here.  */
  +/* This assumes the iothread lock is taken here too. */
  +
   block = ram_list.mru_block;
   if (block  addr - block-offset  block-length) {
  -goto found;
  +return block;
   }
  -QTAILQ_FOREACH(block, ram_list.blocks, next) {
  +QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
   if (addr - block-offset  block-length) {
  -goto found;
  +atomic_rcu_set(ram_list.mru_block, block);

 I think this is not needed, block was already published into the block
 list.  What is important is to order mru_block == NULL so that it
 happens before the node is removed.  Which we don't do, but we can fix
 later.

I am thinking of writing some macros and possibly reorganizing the ram
globals into a struct so that we can update it by exchanging pointers
atomically. It seems to disjoint right and error-prone now that we are
making it rcu-enabled.

thanks!

Mike


Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2

2013-09-03 Thread Mike Day
On Fri, Aug 30, 2013 at 12:38 PM, Paolo Bonzini pbonz...@redhat.com wrote:

  @@ -867,7 +879,12 @@ static int ram_load(QEMUFile *f, void *opaque, int
version_id)
   if (version_id  4 || version_id  4) {
   return -EINVAL;
   }
  -
  +/* this implements a long-running RCU critical section.
  + * When rcu reclaims in the code start to become numerous
  + * it will be necessary to reduce the granularity of this critical
  + * section.
  + */

 Please add the same comment (and a rcu_read_lock/unlock pair replacing
 the ramlist mutex) in ram_save_iterate, too.

Just double checking on this particular change. In practice ram_save
manipulates the ram_list indirectly through ram_save_block. But I'm
assuming you want this change because of the ram state info that persists
between calls to ram_save (ram_list version in particular). Also, there is
potential for the callback functions ram_control_*_iterate to manipulate
the ram_list.

I'm adding the rcu_read_lock/unlock pair in ram_load. It will be recursive
with the same calls in ram_save_block, but as you pointed out this is low
overhead.

With this change in my working code, ram_control_*_iterate are called from
within an rcu critical section.

Mike


Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2

2013-09-03 Thread Mike Day
On Tue, Sep 3, 2013 at 10:09 AM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 03/09/2013 15:56, Mike Day ha scritto:
   +/* this implements a long-running RCU critical section.
   + * When rcu reclaims in the code start to become numerous
   + * it will be necessary to reduce the granularity of this
critical
   + * section.
   + */
 
  Please add the same comment (and a rcu_read_lock/unlock pair replacing
  the ramlist mutex) in ram_save_iterate, too.
 
  Just double checking on this particular change. In practice ram_save
  manipulates the ram_list indirectly through ram_save_block. But I'm
  assuming you want this change because of the ram state info that
  persists between calls to ram_save (ram_list version in particular).

 ram_list.version is not really a problem, but last_seen_block has to
 persist across ram_save_block calls.

Got it. that's a subtle point.

  Also, there is potential for the callback functions
  ram_control_*_iterate to manipulate the ram_list.

 I think that's right now not possible (and they could use
 rcu_read_lock/unlock as well).

Yeah. So how about we say for now that the rcu critical section status upon
entry to the ram_control_*_iterate functions is undefined. I'll make some
updates.

Mike


Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ

2013-08-30 Thread Mike Day
On Fri, Aug 30, 2013 at 4:19 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 I'm not sure about that; returning an RCU-protected variable after
 rcu_read_unlock() seems wrong to me because the pointer may not be valid
 at that point.  I suggest using a comment that asks to call
 host_from_stream_offset within rcu_read_lock()/rcu_read_unlock().
 However, if existing practice in the kernel is different, I'll bow to
that.

In this case the only caller is ram_load so I'm removing the critical
section from within host_from_stream_offset, and adding comments to note
that the ram_list needs to be protected by the caller. The current
docs/rcu.txt information indicates that rcu critical sections can be
nested or overlapping. But your suggestion results in cleaner code - we
will have to go back to this later as you noted earlier.

Mike


[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2

2013-08-30 Thread Mike Day
Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow unlocked reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has not been tested further than that at
this point.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day ncm...@ncultra.org
---
 arch_init.c   |  80 +++---
 exec.c| 142 ++
 hw/9pfs/virtio-9p-synth.c |   2 +-
 include/exec/cpu-all.h|   4 +-
 include/qemu/rcu_queue.h  |   8 +++
 5 files changed, 151 insertions(+), 85 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..3f4d676 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include trace.h
 #include exec/cpu-all.h
 #include hw/acpi/acpi.h
+#include qemu/rcu_queue.h
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -398,7 +399,11 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+ /* This assumes the iothread lock or the ram_list mutex is taken.
+ * if that changes, accesses to ram_list need to be protected
+ * by a mutex (writes) or an rcu read lock (reads)
+ */
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block-mr,
addr, TARGET_PAGE_SIZE,
@@ -457,8 +462,13 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+/* Sometimes called with the ram_list mutex held (ram_save_complete)
+ * also called WITHOUT the ram_list mutex held. (ram_save_iterate).
+ * Protect ram_list with an rcu critical section.
+ */
+rcu_read_lock();
 if (!block)
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 
 while (true) {
 mr = block-mr;
@@ -469,9 +479,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset = block-length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -526,6 +536,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+rcu_read_unlock();
 last_seen_block = block;
 last_offset = offset;
 
@@ -565,10 +576,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next)
 total += block-length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -631,7 +642,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +652,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 qemu_put_byte(f, strlen(block-idstr));
 qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr));
 qemu_put_be64(f, block-length);
 }
 
-qemu_mutex_unlock_ramlist();
+rcu_read_unlock();
 
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
@@ -664,8 +675,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 int64_t t0;
 int total_sent = 0;
 
-qemu_mutex_lock_ramlist();
-
 if (ram_list.version != last_version) {
 reset_ram_globals();
 }
@@ -701,8 +710,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 i++;
 }
 
-qemu_mutex_unlock_ramlist();
-
 /*
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
@@ -815,6 +822,10 @@ static inline void *host_from_stream_offset(QEMUFile

Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ

2013-08-29 Thread Mike Day
On Wed, Aug 28, 2013 at 12:35 PM, Paolo Bonzini pbonz...@redhat.com wrote:

  @@ -828,13 +829,18 @@ static inline void
*host_from_stream_offset(QEMUFile *f,
   qemu_get_buffer(f, (uint8_t *)id, len);
   id[len] = 0;
 
  -QTAILQ_FOREACH(block, ram_list.blocks, next) {
  -if (!strncmp(id, block-idstr, sizeof(id)))
  -return memory_region_get_ram_ptr(block-mr) + offset;
  +rcu_read_lock();
  +QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
  +if (!strncmp(id, block-idstr, sizeof(id))) {
  +ptr = memory_region_get_ram_ptr(block-mr) + offset;
  +goto unlock_out;
  +}
   }
 
   fprintf(stderr, Can't find block %s!\n, id);
  -return NULL;
  +unlock_out:
  +rcu_read_unlock();
  +return ptr;
   }


 Similarly, here the critical section includes the caller, and block is
 living across calls to host.  Again, for now just put all of ram_load
 under a huge RCU critical section.  Later we can use ram_list.version to
 refresh the list and make the critical sections smaller.

And: rcu_read_lock() and rcu_read_unlock() can be called recursively, so I
can still leave the lock/unlock pair in host_from_stream_offset.


[Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ

2013-08-28 Thread Mike Day
Allow unlocked reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has not been tested further than that at
this point.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day ncm...@ncultra.org
---
 arch_init.c   |  51 -
 exec.c| 111 +-
 hw/9pfs/virtio-9p-synth.c |   2 +-
 include/exec/cpu-all.h|   4 +-
 include/qemu/rcu_queue.h  |   8 
 5 files changed, 111 insertions(+), 65 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..5c7b284 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include trace.h
 #include exec/cpu-all.h
 #include hw/acpi/acpi.h
+#include qemu/rcu_queue.h
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -397,8 +398,8 @@ static void migration_bitmap_sync(void)
 
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
-
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block-mr,
addr, TARGET_PAGE_SIZE,
@@ -407,6 +408,7 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -457,8 +459,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+rcu_read_lock();
 if (!block)
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 
 while (true) {
 mr = block-mr;
@@ -469,9 +472,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset = block-length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(ram_list.blocks);
+block = QLIST_FIRST_RCU(ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -526,6 +529,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+rcu_read_unlock();
 last_seen_block = block;
 last_offset = offset;
 
@@ -565,10 +569,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, ram_list.blocks, next)
 total += block-length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -631,7 +635,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +645,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-QTAILQ_FOREACH(block, ram_list.blocks, next) {
+QLIST_FOREACH_RCU(block, ram_list.blocks, next) {
 qemu_put_byte(f, strlen(block-idstr));
 qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr));
 qemu_put_be64(f, block-length);
 }
 
-qemu_mutex_unlock_ramlist();
+rcu_read_unlock();
 
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
@@ -664,8 +668,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 int64_t t0;
 int total_sent = 0;
 
-qemu_mutex_lock_ramlist();
-
 if (ram_list.version != last_version) {
 reset_ram_globals();
 }
@@ -701,8 +703,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 i++;
 }
 
-qemu_mutex_unlock_ramlist();
-
 /*
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
@@ -814,6 +814,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 static RAMBlock *block = NULL;
 char id[256];
 uint8_t len;
+void *ptr = NULL;
 
 if (flags  RAM_SAVE_FLAG_CONTINUE) {
 if (!block) {
@@ -828,13 +829,18 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 qemu_get_buffer(f, (uint8_t *)id, len);
 id[len] = 0

[Qemu-devel] [RFC PATCH V3 ] Introduce RCU-enabled DQs

2013-08-27 Thread Mike Day
Add RCU-enabled variants on the existing bsd DQ facility. Each Q
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro for now.

Using the RCU-enabled DQ, existing DQ users will be able to convert to
RCU without using a different list interface.

Changes since V2:

* reverted QLIST_FOREACH_REVERSE_RCU, it's not needed in the interface
  file. (But it is used in the test program.)  

* added g_test support to the test program

To accompany the RCU-enabled DQ, there is also a test file that uses
concurrent readers to contend with a single updater.

This patchset builds on top of Paolo Bonzini's rcu tree:
https://github.com/bonzini/qemu/tree/rcu


Signed-off-by: Mike Day ncm...@ncultra.org
---
 docs/rcu.txt |   2 +-
 include/qemu/queue.h |  11 --
 include/qemu/rcu_queue.h | 137 +++
 tests/Makefile   |   5 +-
 tests/rcuq_test.c| 337 +++
 5 files changed, 479 insertions(+), 13 deletions(-)
 create mode 100644 include/qemu/rcu_queue.h
 create mode 100644 tests/rcuq_test.c

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b3c593c..de59896 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -106,7 +106,7 @@ The core RCU API is small:
 so that the reclaimer function can fetch the struct foo address
 and free it:
 
-call_rcu1(foo_reclaim, foo.rcu);
+call_rcu1(foo.rcu, foo_reclaim);
 
 void foo_reclaim(struct rcu_head *rp)
 {
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 847ddd1..f6f0636 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -139,17 +139,6 @@ struct {   
 \
 (elm)-field.le_prev = (head)-lh_first;   \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
-(elm)-field.le_prev = (head)-lh_first;   \
-(elm)-field.le_next = (head)-lh_first;\
-smp_wmb(); /* fill elm before linking it */ \
-if ((head)-lh_first != NULL)  {\
-(head)-lh_first-field.le_prev = (elm)-field.le_next;\
-}   \
-(head)-lh_first = (elm);   \
-smp_wmb();  \
-} while (/* CONSTCOND*/0)
-
 #define QLIST_REMOVE(elm, field) do {   \
 if ((elm)-field.le_next != NULL)   \
 (elm)-field.le_next-field.le_prev =   \
diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
new file mode 100644
index 000..e2b8ba5
--- /dev/null
+++ b/include/qemu/rcu_queue.h
@@ -0,0 +1,137 @@
+#ifndef QEMU_RCU_SYS_QUEUE_H
+#define QEMU_RCU_SYS_QUEUE_H
+
+/*
+ * rcu_queue.h
+ *
+ * Userspace RCU QSBR header.
+ *
+ * LGPL-compatible code should include this header with :
+ *
+ * #define _LGPL_SOURCE
+ * #include urcu.h
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Copyright (c) 2013 Mike D. Day, IBM Corporation.
+ *
+ * IBM's contributions to this file may be relicensed under LGPLv2 or later.
+ */
+
+#include qemu/rcu.h  /* rcu.h includes qemu/queue.h and qemu/atomic.h */
+
+
+#ifdef __cplusplus
+extern C {
+#endif
+
+/*
+ * List functions.
+ */
+
+
+/*
+ *  The difference between atomic_read/set and atomic_rcu_read/set
+ *  is in the including of a read/write memory barrier to the volatile
+ *  access. atomic_rcu_* macros include the memory barrier, the
+ *  plain atomic macros do not. Therefore, it should be correct to
+ *  issue a series of reads or writes to the same element using only
+ *  the atomic_* macro, until the last read or write, which should be
+ *  atomic_rcu_* to introduce a read or write memory barrier as
+ *  appropriate.
+ */
+
+/* Upon publication of the listelm-next value, list readers
+ * will see the new node when following next pointers from
+ * antecedent nodes, but may not see the new node when following
+ * prev pointers from subsequent nodes until

Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)

2013-08-26 Thread Mike Day
On Sun, Aug 25, 2013 at 3:18 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

 I'm not very comfortable with your DQ implementation not providing any
 kind of guarantee to a forward traversal followed by backward traversal,
 nor for backward followed by forward traversal. If a list add is
 executed concurrently with traversals, and we can ensure there are no
 list del of the node, if a traversal sees the added node when doing
 forward iteration, I would clearly expect to still see it if a backward
 iteration follows.

 I took the liberty of implementing a couple of ideas I had to provide
 a RCU DQ with those guarantees. I just pushed the code here (beware, I
 just did some basic single-threaded unit tests so far, so consider this
 code as largely untested concurrency-wise):

   git clone git://git.urcu.io/urcu.git
   branch: rcudq
   file: urcu/rcudq.h

 Direct link to the file via gitweb:

http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcudq.h;h=4a8d7b0d5143a958514cf130b1c124d99f3194ca;hb=refs/heads/rcudq

Mathieu - Thanks for the review! And thanks for the code, I'm working with
it right now. I like the idea of using a flag to provide a form of
atomicity for the doubly-linked list elements. I'm also planning on running
some timing tests to see of the additional memory barriers and atomic
accesses make *any* difference whatsoever.

Mike

Mike Day | ncm...@ncultra.org | +1 919 371-8786


Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)

2013-08-25 Thread Mike Day

Paolo Bonzini pbonz...@redhat.com writes:

 Just a couple of questions, one of them on the new macro...

 +/* prior to publication of the elm-prev-next value, some list
 + * readers  may still see the removed element when following
 + * the antecedent's next pointer.
 + */
 +
 +#define QLIST_REMOVE_RCU(elm, field) do {   \
 +if ((elm)-field.le_next != NULL) { \
 +   (elm)-field.le_next-field.le_prev =\
 +(elm)-field.le_prev;   \
 +}   \
 +atomic_rcu_set((elm)-field.le_prev, (elm)-field.le_next); \
 +} while (/*CONSTCOND*/0)

 Why is the barrier needed here, but not in Linux's list_del_rcu?

 I think it is not needed because all involved elements have already been
 published and just have their pointers shuffled.

I read this as more than shuffling pointers. The intent here is 
that the previous element's next pointer is being updated to omit the
current element from the list.

atomic_set always deferences the pointer passed to it, and
(field)-le_pre is a double pointer. So looking at the macro:

#define atomic_set(ptr, i) ((*(__typeof__(*ptr) *volatile) (ptr)) = (i))

It translates to: 

( ( * (__typeof(*elm-field.le_prev) *volatile) (elm)-field.le_prev)  = 
elm-field.le_next; ) 

Which is: 

 *((struct *elm) *volatile)(elm)-field.le_prev = elm-field.le_next; 

Which is:

*(elm)-field.le_prev = elm-field.le_next;

Because field.le_prev is a double pointer that has previously been set
to prev (the address of the previous list element) this is assiging the
*previous* element's next pointer, the way I read it.

The Linux list_del_rcu is dealing with a singly linked list and
therefore does not set a value in the previous node's element. 

But I'm still unclear on whether or not the memory barrier is needed
because the deleted element won't be reclaimed right away.

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)

2013-08-24 Thread Mike Day
Add RCU-enabled variants on the existing bsd DQ facility. Each Q
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro for now.

Using the RCU-enabled DQ, existing DQ users will be able to convert to
RCU without using a different list interface.

This version (2) adds a macro to walk a Q in reverse:

QLIST_FOREACH_REVERSE_RCU(el, head, field)

Accordingly the reader threads in the test program walk the Q in
reverse in addition to walking forward.

To accompany the RCU-enabled DQ, there is also a test file that uses
concurrent readers to contend with a single updater.

This patchset builds on top of Paolo Bonzini's rcu tree:
https://github.com/bonzini/qemu/tree/rcu

Signed-off-by: Mike Day ncm...@ncultra.org
---
 docs/rcu.txt |   2 +-
 include/qemu/queue.h |  11 --
 include/qemu/rcu_queue.h | 145 
 tests/Makefile   |   6 +-
 tests/rcuq_test.c| 290 +++
 5 files changed, 440 insertions(+), 14 deletions(-)
 create mode 100644 include/qemu/rcu_queue.h
 create mode 100644 tests/rcuq_test.c

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b3c593c..de59896 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -106,7 +106,7 @@ The core RCU API is small:
 so that the reclaimer function can fetch the struct foo address
 and free it:
 
-call_rcu1(foo_reclaim, foo.rcu);
+call_rcu1(foo.rcu, foo_reclaim);
 
 void foo_reclaim(struct rcu_head *rp)
 {
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 847ddd1..f6f0636 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -139,17 +139,6 @@ struct {   
 \
 (elm)-field.le_prev = (head)-lh_first;   \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
-(elm)-field.le_prev = (head)-lh_first;   \
-(elm)-field.le_next = (head)-lh_first;\
-smp_wmb(); /* fill elm before linking it */ \
-if ((head)-lh_first != NULL)  {\
-(head)-lh_first-field.le_prev = (elm)-field.le_next;\
-}   \
-(head)-lh_first = (elm);   \
-smp_wmb();  \
-} while (/* CONSTCOND*/0)
-
 #define QLIST_REMOVE(elm, field) do {   \
 if ((elm)-field.le_next != NULL)   \
 (elm)-field.le_next-field.le_prev =   \
diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
new file mode 100644
index 000..198a87d
--- /dev/null
+++ b/include/qemu/rcu_queue.h
@@ -0,0 +1,145 @@
+#ifndef QEMU_RCU_SYS_QUEUE_H
+#define QEMU_RCU_SYS_QUEUE_H
+
+/*
+ * rc_queue.h
+ *
+ * Userspace RCU QSBR header.
+ *
+ * LGPL-compatible code should include this header with :
+ *
+ * #define _LGPL_SOURCE
+ * #include urcu.h
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Copyright (c) 2013 Mike D. Day, IBM Corporation.
+ *
+ * IBM's contributions to this file may be relicensed under LGPLv2 or later.
+ */
+
+#include qemu/rcu.h  /* rcu.h includes qemu/queue.h and qemu/atomic.h */
+
+
+#ifdef __cplusplus
+extern C {
+#endif
+
+/*
+ * List functions.
+ */
+
+
+/*
+ *  The difference between atomic_read/set and atomic_rcu_read/set
+ *  is in the including of a read/write memory barrier to the volatile
+ *  access. atomic_rcu_* macros include the memory barrier, the
+ *  plain atomic macros do not. Therefore, it should be correct to
+ *  issue a series of reads or writes to the same element using only
+ *  the atomic_* macro, until the last read or write, which should be
+ *  atomic_rcu_* to introduce a read or write memory barrier as
+ *  appropriate.
+ */
+
+/* Upon publication of the listelm-next value, list readers
+ * will see the new node when following next pointers from
+ * antecedent nodes, but may not see the new node when following
+ * prev pointers from

[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs.

2013-08-23 Thread Mike Day
Add RCU-enabled variants on the existing bsd DQ facility. Each Q
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro for now.

Using the RCU-enabled DQ, existing DQ users will be able to convert to
RCU without using a different list interface.

To accompany the RCU-enabled DQ, there is also a test file that uses
concurrent readers to contend with a single updater.

This patchset builds on top of Paolo Bonzini's rcu tree:
https://github.com/bonzini/qemu/tree/rcu

Signed-off-by: Mike Day ncm...@ncultra.org
---
 docs/rcu.txt |   2 +-
 include/qemu/queue.h |  11 --
 include/qemu/rcu_queue.h | 137 +++
 tests/Makefile   |  14 ++-
 tests/rcuq_test.c| 279 +++
 5 files changed, 430 insertions(+), 13 deletions(-)

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b3c593c..de59896 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -106,7 +106,7 @@ The core RCU API is small:
 so that the reclaimer function can fetch the struct foo address
 and free it:
 
-call_rcu1(foo_reclaim, foo.rcu);
+call_rcu1(foo.rcu, foo_reclaim);
 
 void foo_reclaim(struct rcu_head *rp)
 {
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 847ddd1..f6f0636 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -139,17 +139,6 @@ struct {   
 \
 (elm)-field.le_prev = (head)-lh_first;   \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
-(elm)-field.le_prev = (head)-lh_first;   \
-(elm)-field.le_next = (head)-lh_first;\
-smp_wmb(); /* fill elm before linking it */ \
-if ((head)-lh_first != NULL)  {\
-(head)-lh_first-field.le_prev = (elm)-field.le_next;\
-}   \
-(head)-lh_first = (elm);   \
-smp_wmb();  \
-} while (/* CONSTCOND*/0)
-
 #define QLIST_REMOVE(elm, field) do {   \
 if ((elm)-field.le_next != NULL)   \
 (elm)-field.le_next-field.le_prev =   \
diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
new file mode 100644
index 000..e82196d
--- /dev/null
+++ b/include/qemu/rcu_queue.h
@@ -0,0 +1,137 @@
+#ifndef QEMU_RCU_SYS_QUEUE_H
+#define QEMU_RCU_SYS_QUEUE_H
+
+/*
+ * rc_queue.h
+ *
+ * Userspace RCU QSBR header.
+ *
+ * LGPL-compatible code should include this header with :
+ *
+ * #define _LGPL_SOURCE
+ * #include urcu.h
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Copyright (c) 2013 Mike D. Day, IBM Corporation.
+ *
+ * IBM's contributions to this file may be relicensed under LGPLv2 or later.
+ */
+
+#include qemu/rcu.h  /* rcu.h includes qemu/queue.h and qemu/atomic.h */
+
+
+#ifdef __cplusplus
+extern C {
+#endif
+
+/*
+ * List functions.
+ */
+
+
+/* 
+ *  The difference between atomic_read/set and atomic_rcu_read/set
+ *  is in the including of a read/write memory barrier to the volatile
+ *  access. atomic_rcu_* macros include the memory barrier, the 
+ *  plain atomic macros do not. Therefore, it should be correct to 
+ *  issue a series of reads or writes to the same element using only 
+ *  the atomic_* macro, until the last read or write, which should be 
+ *  atomic_rcu_* to introduce a read or write memory barrier as 
+ *  appropriate.  
+ */
+
+/* Upon publication of the listelm-next value, list readers
+ * will see the new node when following next pointers from 
+ * antecedent nodes, but may not see the new node when following 
+ * prev pointers from subsequent nodes until after the rcu grace 
+ * period expires.   
+ * see linux/include/rculist.h __list_add_rcu(new, prev, next)
+ */
+#define QLIST_INSERT_AFTER_RCU(listelm, elm, field) do {\
+(elm)-field.le_next = (listelm)-field.le_next;\
+(elm

[Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object orien

2013-08-19 Thread Mike Day
These patches apply to Paolo Bonzini's rcu tree:

https://github.com/bonzini/qemu/tree/rcu
commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135

Signed-off-by: Mike Day ncm...@ncultra.org
---
 hw/misc/ivshmem.c  | 2 +-
 hw/pci-bridge/pci_bridge_dev.c | 6 +++---
 hw/pci/pci_bridge.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index ebcb52a..46d8c27 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
 
 static void pci_ivshmem_instance_finalize(Object *obj)
 {
-IVShmemState *s = IVSHMEM(dev);
+IVShmemState *s = IVSHMEM(obj);
 
 if (s-migration_blocker) {
 migrate_del_blocker(s-migration_blocker);
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index c995d5d..22caf14 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -87,7 +87,6 @@ shpc_error:
 bridge_error:
 return err;
 }
-
 static void pci_bridge_dev_exitfn(PCIDevice *dev)
 {
 PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
@@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
 static void pci_bridge_dev_instance_finalize(Object *obj)
 {
 PCIDevice *dev = PCI_DEVICE(obj);
-PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
-PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+PCIBridge *br = PCI_BRIDGE(dev);
+PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br);
+
 shpc_free(dev);
 memory_region_destroy(bridge_dev-bar);
 pci_bridge_free(dev);
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 63f9912..307e076 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev)
 
 void pci_bridge_free(PCIDevice *pci_dev)
 {
-PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev);
+PCIBridge *s = PCI_BRIDGE(pci_dev);
 pci_bridge_region_cleanup(s, s-windows);
 memory_region_destroy(s-address_space_mem);
 memory_region_destroy(s-address_space_io);
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND] RCU implementation for Qemu. Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system.

2013-08-19 Thread Mike Day

These patches change from using Linux kernel style upcasts to typesafe
object oriented casts with runtime checking semantics.

These patches apply to Paolo Bonzini's rcu tree:

https://github.com/bonzini/qemu/tree/rcu
commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135

Signed-off-by: Mike Day ncm...@ncultra.org
---
 hw/misc/ivshmem.c  | 2 +-
 hw/pci-bridge/pci_bridge_dev.c | 6 +++---
 hw/pci/pci_bridge.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index ebcb52a..46d8c27 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
 
 static void pci_ivshmem_instance_finalize(Object *obj)
 {
-IVShmemState *s = IVSHMEM(dev);
+IVShmemState *s = IVSHMEM(obj);
 
 if (s-migration_blocker) {
 migrate_del_blocker(s-migration_blocker);
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index c995d5d..22caf14 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -87,7 +87,6 @@ shpc_error:
 bridge_error:
 return err;
 }
-
 static void pci_bridge_dev_exitfn(PCIDevice *dev)
 {
 PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
@@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
 static void pci_bridge_dev_instance_finalize(Object *obj)
 {
 PCIDevice *dev = PCI_DEVICE(obj);
-PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
-PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+PCIBridge *br = PCI_BRIDGE(dev);
+PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br);
+
 shpc_free(dev);
 memory_region_destroy(bridge_dev-bar);
 pci_bridge_free(dev);
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 63f9912..307e076 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev)
 
 void pci_bridge_free(PCIDevice *pci_dev)
 {
-PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev);
+PCIBridge *s = PCI_BRIDGE(pci_dev);
 pci_bridge_region_cleanup(s, s-windows);
 memory_region_destroy(s-address_space_mem);
 memory_region_destroy(s-address_space_io);
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object o

2013-08-19 Thread Mike Day

Peter Maydell peter.mayd...@linaro.org writes:

 On 19 August 2013 19:33, Mike Day ncm...@ncultra.org wrote:
 These patches apply to Paolo Bonzini's rcu tree:

 https://github.com/bonzini/qemu/tree/rcu
 commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135

 Signed-off-by: Mike Day ncm...@ncultra.org
 ---
  hw/misc/ivshmem.c  | 2 +-
  hw/pci-bridge/pci_bridge_dev.c | 6 +++---
  hw/pci/pci_bridge.c| 2 +-
  3 files changed, 5 insertions(+), 5 deletions(-)

 diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
 index ebcb52a..46d8c27 100644
 --- a/hw/misc/ivshmem.c
 +++ b/hw/misc/ivshmem.c
 @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)

  static void pci_ivshmem_instance_finalize(Object *obj)
  {
 -IVShmemState *s = IVSHMEM(dev);
 +IVShmemState *s = IVSHMEM(obj);

 This should have been a flat-out compiler error, right?

Yes, correct, but Paolo hasn't previously submitted this specific change
code afaik.

  if (s-migration_blocker) {
  migrate_del_blocker(s-migration_blocker);
 diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
 index c995d5d..22caf14 100644
 --- a/hw/pci-bridge/pci_bridge_dev.c
 +++ b/hw/pci-bridge/pci_bridge_dev.c
 @@ -87,7 +87,6 @@ shpc_error:
  bridge_error:
  return err;
  }
 -

 Stray blank line change.

Thanks - and git just found it for me too. Apologies.

Mike
-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
Endurance is a Virtue



[Qemu-devel] [RFC PATCH 0/3] v2.2 RCU Implementation for QEMU

2013-08-16 Thread Mike Day
This patch set merges Paolo's conversion of address spaces to enable
RCU. There is one more patchset coming for TLB access that I'm
debugging right now. After I submit the last one I'll start working on
enabling RCU more widely - and the series will need to be refactored.

The series is availale online at: 

https://github.com/ncultra/qemu/tree/rcu-for-1.7

Mike Day (2):
  fixup changes from commit f63ca950
  fixup changes from commit f62a6b2f from Paulo Bonzini

Paolo Bonzini (1):
  exec: change iotlb APIs to take AddressSpaceDispatch

 aio-posix.c|  1 +
 aio-win32.c|  1 +
 cpus.c |  3 +++
 cputlb.c   |  7 ---
 exec.c | 13 -
 include/exec/cpu-all.h |  3 +++
 include/exec/cputlb.h  |  9 ++---
 7 files changed, 26 insertions(+), 11 deletions(-)

-- 
1.8.3.1




  1   2   >