Re: [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

2021-03-01 Thread Philippe Mathieu-Daudé
On 3/1/21 7:13 PM, Stefan Hajnoczi wrote:
> On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
>> If the block drive is read-only we will model a "protected" flash
>> device. We can thus use memory_region_init_rom_device_from_file()
>> which mmap the backing file when creating the MemoryRegion.
>> If the same backing file is used by multiple QEMU instances, this
>> reduces the memory footprint (this is often the case with the
>> CODE flash image from OVMF and AAVMF).
>>
>> Suggested-by: Stefan Hajnoczi 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/block/pflash_cfi01.c | 39 +++
>>  1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
>> index a5fa8d8b74a..ec290636298 100644
>> --- a/hw/block/pflash_cfi01.c
>> +++ b/hw/block/pflash_cfi01.c
>> @@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
>> **errp)
>>  int ret;
>>  uint64_t blocks_per_device, sector_len_per_device, device_len;
>>  int num_devices;
>> +bool romd_mr_shared_mapped;
>>  
>>  if (pfl->sector_len == 0) {
>>  error_setg(errp, "attribute \"sector-length\" not specified or 
>> zero.");
>> @@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, 
>> Error **errp)
>>  pfl->ro = 0;
>>  }
>>  
>> -memory_region_init_rom_device(
>> ->mem, OBJECT(dev),
>> -_cfi01_ops,
>> -pfl,
>> -pfl->name, total_len, errp);
>> -if (*errp) {
>> -return;
>> +if (pfl->ro && pfl->blk) {
>> +BlockDriverState *bs = blk_bs(pfl->blk);
>> +
>> +/* If "raw" driver used, try to mmap the backing file as RAM_SHARED 
>> */
>> +if (bs->drv == _raw) { /* FIXME check offset=0 ? */
> 
> Bypassing the block layer is tricky because there are a lot of features
> that conflict (you already pointed out the offset= option). Checking
> bdrv_raw is not enough because the underlying protocol driver could be
> GlusterFS, iSCSI, etc.

OK.

> I think the goal here is to avoid changing the command-line/QMP so that
> users don't need to modify their guests. Therefore changing the pflash
> qdev properties is not desirable (we could have added a separate code
> path that bypasses the block layer cleanly).

Yes, this is the limitation.

> This seems like a
> worthwhile optimization that the block layer should support. I suggest
> adding a new API like:
> 
>   /* Returns a filename string if @blk supports read-only mmap */
>   char *blk_get_read_only_mmap_filename(BlockBackend *blk, Error **errp);
> 
> Then block/raw-format.c would forward the call to bs->file and
> block/raw-posix.c would implement it by returning a new filename string
> when bs->read_only is true.

Thanks :) Kevin suggested something similar too.

> 
> FWIW this API isn't perfect because the file could be reopened with QMP
> and the existing mmap would remain in place.

Can you show me a QMP example or point me at the command?
This shouldn't happen with the pflash.

Thanks for reviewing,

Phil.




Re: [PATCH v2 2/2] storage-daemon: include current command line option in the errors

2021-03-01 Thread Markus Armbruster
Paolo Bonzini  writes:

> Use the location management facilities that the emulator uses, so that
> the current command line option appears in the error message.
>
> Before:
>
>   $ storage-daemon/qemu-storage-daemon --nbd key..=
>   qemu-storage-daemon: Invalid parameter 'key..'
>
> After:
>
>   $ storage-daemon/qemu-storage-daemon --nbd key..=
>   qemu-storage-daemon: --nbd key..=: Invalid parameter 'key..'
>
> Reviewed-by: Eric Blake 
> Signed-off-by: Paolo Bonzini 

I have a similar patch in an unfinished branch.  You win :)

> ---
>  storage-daemon/qemu-storage-daemon.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/storage-daemon/qemu-storage-daemon.c 
> b/storage-daemon/qemu-storage-daemon.c
> index 9aa82e7d96..78ddf619d4 100644
> --- a/storage-daemon/qemu-storage-daemon.c
> +++ b/storage-daemon/qemu-storage-daemon.c
> @@ -152,6 +152,20 @@ static void init_qmp_commands(void)
>   qmp_marshal_qmp_capabilities, QCO_ALLOW_PRECONFIG);
>  }
>  
> +static int getopt_set_loc(int argc, char **argv, const char *optstring,
> +  const struct option *longopts)
> +{
> +int c, save_index;
> +
> +optarg = NULL;
> +save_index = optind;
> +c = getopt_long(argc, argv, optstring, longopts, NULL);
> +if (optarg) {
> +loc_set_cmdline(argv, save_index, MAX(1, optind - save_index));
> +}
> +return c;
> +}
> +

I think this function is more widely applicable:

$ git-grep -l getopt_long | xargs grep -l error_report
qemu-img.c
qemu-io.c
qemu-nbd.c
scsi/qemu-pr-helper.c
storage-daemon/qemu-storage-daemon.c

>  static void process_options(int argc, char *argv[])
>  {
>  int c;
> @@ -174,7 +188,7 @@ static void process_options(int argc, char *argv[])
>   * they are given on the command lines. This means that things must be
>   * defined first before they can be referenced in another option.
>   */
> -while ((c = getopt_long(argc, argv, "-hT:V", long_options, NULL)) != -1) 
> {
> +while ((c = getopt_set_loc(argc, argv, "-hT:V", long_options)) != -1) {
>  switch (c) {
>  case '?':
>  exit(EXIT_FAILURE);
> @@ -275,12 +289,13 @@ static void process_options(int argc, char *argv[])
>  break;
>  }
>  case 1:
> -error_report("Unexpected argument: %s", optarg);
> +error_report("Unexpected argument");
>  exit(EXIT_FAILURE);
>  default:
>  g_assert_not_reached();
>  }
>  }
> +loc_set_none();
>  }
>  
>  int main(int argc, char *argv[])




Re: [PATCH v2 1/2] storage-daemon: report unexpected arguments on the fly

2021-03-01 Thread Markus Armbruster
Eric Blake  writes:

> On 3/1/21 9:28 AM, Paolo Bonzini wrote:
>> If the first character of optstring is '-', then each nonoption argv
>> element is handled as if it were the argument of an option with character
>> code 1.  This removes the reordering of the argv array, and enables usage
>> of loc_set_cmdline to provide better error messages.
>> 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  storage-daemon/qemu-storage-daemon.c | 9 -
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> Nice.  The man page for 'getopt_long' is unclear whether setting
> POSIXLY_CORRECT in the environment would break this (that is, setting
> POSIXLY_CORRECT has the same effect as a leading '+'; but you can't have
> both leading '+' and leading '-' and when both are set, it is not clear
> which one wins).  But that's a corner case that I don't think will ever
> bite us in real life.
>
> Reviewed-by: Eric Blake 

I'd consider environment overruling the programmer's express intent a
bug.

GLibc's _getopt_initialize():

  /* Determine how to handle the ordering of options and nonoptions.  */
  if (optstring[0] == '-')
{
  d->__ordering = RETURN_IN_ORDER;
  ++optstring;
}
  else if (optstring[0] == '+')
{
  d->__ordering = REQUIRE_ORDER;
  ++optstring;
}
  else if (posixly_correct || !!getenv ("POSIXLY_CORRECT"))
d->__ordering = REQUIRE_ORDER;
  else
d->__ordering = PERMUTE;

No surprises here.




Re: [PATCH v2 0/5] hw/block/nvme: misc fixes

2021-03-01 Thread Klaus Jensen
On Feb 22 19:47, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Small set of misc fixes from Gollu.
> 
> v2 changes
> 
>   * Split off the trace event additions from "[PATCH 1/3] hw/block/nvme:
> nvme_identify fixes" and "[PATCH 2/3] hw/block/nvme: fix potential
> compilation error" into their own commits (Minwoo, Philippe)
>   * Fix a missing check on the zasl_bs param in the
> nvme_identify_ctrl_csi refactor (Minwoo)
> 
> Gollu Appalanaidu (5):
>   hw/block/nvme: remove unnecessary endian conversion
>   hw/block/nvme: add identify trace event
>   hw/block/nvme: fix potential compilation error
>   hw/block/nvme: add trace event for zone read check
>   hw/block/nvme: report non-mdts command size limit for dsm
> 
>  hw/block/nvme.h   |  1 +
>  include/block/nvme.h  | 11 +++
>  hw/block/nvme.c   | 45 ---
>  hw/block/trace-events |  2 ++
>  4 files changed, 43 insertions(+), 16 deletions(-)
> 

Applied to nvme-next!


signature.asc
Description: PGP signature


[PATCH] file-posix: allow -EBUSY errors during write zeros on block

2021-03-01 Thread ChangLimin
After Linux 5.10, write zeros to a multipath device using
ioctl(fd, BLKZEROOUT, range) with cache none or directsync will return EBUSY.

Similar to handle_aiocb_write_zeroes_unmap, handle_aiocb_write_zeroes_block
allow -EBUSY errors during ioctl(fd, BLKZEROOUT, range).

Reference commit in Linux 5.10:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02

Signed-off-by: ChangLimin 
---
 block/file-posix.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 05079b40ca..3e60c96214 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1629,8 +1629,13 @@ static ssize_t 
handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
         } while (errno == EINTR);

         ret = translate_err(-errno);
-        if (ret == -ENOTSUP) {
+        switch (ret) {
+        case -ENOTSUP:
+        case -EINVAL:
+        case -EBUSY:
             s->has_write_zeroes = false;
+            return -ENOTSUP;
+            break;
         }
     }
 #endif
--
2.27.0



Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Mon, Mar 01, 2021 at 09:49:21AM -0600, Eric Blake wrote:
>> On 3/1/21 9:41 AM, Daniel P. Berrangé wrote:
>> > On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
>> >> The QMP monitor, NBD server, and vhost-user-blk export all support file
>> >> descriptor passing. This is a useful technique because it allows the
>> >> parent process to spawn and wait for qemu-storage-daemon without busy
>> >> waiting, which may delay startup due to arbitrary sleep() calls.
>> >>
>> >> This Python example is inspired by the test case written for libnbd by
>> >> Richard W.M. Jones :
>> >> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
>> >>
>> >> Thanks to Daniel P. Berrangé  for suggestions on
>> >> how to get this working. Now let's document it!
>> >>
>> 
>> >> +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
>> > 
>> > Example code inevitably gets cut+paste into real world apps, and this
>> > example is a tmpfile CVE flaw. At least put it in $CWD instead.
>> 
>> Except $CWD may be too long for a sock file name to be created.
>> Creating the sock in a securely-created subdirectory of /tmp is more
>> reliable.
>
> $XDG_RUNTIME_DIR then, which is /run/user/$UID, so safely per user on all
> modern OS.

Reach under your pillow and check the standard library:

import tempfile

with tempfile.TemporaryDirectory() as tmpdirname:
print('created temporary directory', tmpdirname)

https://docs.python.org/3.6/library/tempfile.html#tempfile.TemporaryDirectory




[PATCH 2/2] blockdev: Clarify error messages pertaining to 'node-name'

2021-03-01 Thread Connor Kuehl
Signed-off-by: Connor Kuehl 
---
 blockdev.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index cd438e60e3..7c7ab2b386 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1515,13 +1515,13 @@ static void external_snapshot_prepare(BlkActionState 
*common,
 s->has_snapshot_node_name ? s->snapshot_node_name : NULL;
 
 if (node_name && !snapshot_node_name) {
-error_setg(errp, "New overlay node name missing");
+error_setg(errp, "New overlay node-name missing");
 goto out;
 }
 
 if (snapshot_node_name &&
 bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) {
-error_setg(errp, "New overlay node name already in use");
+error_setg(errp, "New overlay node-name already in use");
 goto out;
 }
 
@@ -3598,13 +3598,14 @@ void qmp_x_blockdev_reopen(BlockdevOptions *options, 
Error **errp)
 
 /* Check for the selected node name */
 if (!options->has_node_name) {
-error_setg(errp, "Node name not specified");
+error_setg(errp, "node-name not specified");
 goto fail;
 }
 
 bs = bdrv_find_node(options->node_name);
 if (!bs) {
-error_setg(errp, "Cannot find node named '%s'", options->node_name);
+error_setg(errp, "Failed to find node with node-name='%s'",
+   options->node_name);
 goto fail;
 }
 
@@ -3635,7 +3636,7 @@ void qmp_blockdev_del(const char *node_name, Error **errp)
 
 bs = bdrv_find_node(node_name);
 if (!bs) {
-error_setg(errp, "Cannot find node %s", node_name);
+error_setg(errp, "Failed to find node with node-name='%s'", node_name);
 return;
 }
 if (bdrv_has_blk(bs)) {
@@ -3758,7 +3759,7 @@ void qmp_x_blockdev_set_iothread(const char *node_name, 
StrOrNull *iothread,
 
 bs = bdrv_find_node(node_name);
 if (!bs) {
-error_setg(errp, "Cannot find node %s", node_name);
+error_setg(errp, "Failed to find node with node-name='%s'", node_name);
 return;
 }
 
-- 
2.29.2




[PATCH 1/2] block: Clarify error messages pertaining to 'node-name'

2021-03-01 Thread Connor Kuehl
Reported-by: Tingting Mao 
Fixes: https://bugzilla.redhat.com/1651437
Signed-off-by: Connor Kuehl 
---
 block.c| 8 
 tests/qemu-iotests/040 | 4 ++--
 tests/qemu-iotests/249.out | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index a1f3cecd75..2daff6d29a 100644
--- a/block.c
+++ b/block.c
@@ -1440,7 +1440,7 @@ static void bdrv_assign_node_name(BlockDriverState *bs,
  * Check for empty string or invalid characters, but not if it is
  * generated (generated names use characters not available to the user)
  */
-error_setg(errp, "Invalid node name");
+error_setg(errp, "Invalid node-name: '%s'", node_name);
 return;
 }
 
@@ -1453,7 +1453,7 @@ static void bdrv_assign_node_name(BlockDriverState *bs,
 
 /* takes care of avoiding duplicates node names */
 if (bdrv_find_node(node_name)) {
-error_setg(errp, "Duplicate node name");
+error_setg(errp, "Duplicate nodes with node-name='%s'", node_name);
 goto out;
 }
 
@@ -5432,7 +5432,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
 }
 }
 
-error_setg(errp, "Cannot find device=%s nor node_name=%s",
+error_setg(errp, "Cannot find device=\'%s\' nor node-name=\'%s\'",
  device ? device : "",
  node_name ? node_name : "");
 return NULL;
@@ -6752,7 +6752,7 @@ BlockDriverState *check_to_replace_node(BlockDriverState 
*parent_bs,
 AioContext *aio_context;
 
 if (!to_replace_bs) {
-error_setg(errp, "Node name '%s' not found", node_name);
+error_setg(errp, "Failed to find node with node-name='%s'", node_name);
 return NULL;
 }
 
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 7ebc9ed825..336ff7c4f2 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -175,13 +175,13 @@ class TestSingleDrive(ImageCommitTestCase):
 self.assert_no_active_block_jobs()
 result = self.vm.qmp('block-commit', device='drive0', 
top_node='badfile', base_node='base')
 self.assert_qmp(result, 'error/class', 'GenericError')
-self.assert_qmp(result, 'error/desc', "Cannot find device= nor 
node_name=badfile")
+self.assert_qmp(result, 'error/desc', "Cannot find device='' nor 
node-name='badfile'")
 
 def test_base_node_invalid(self):
 self.assert_no_active_block_jobs()
 result = self.vm.qmp('block-commit', device='drive0', top_node='mid', 
base_node='badfile')
 self.assert_qmp(result, 'error/class', 'GenericError')
-self.assert_qmp(result, 'error/desc', "Cannot find device= nor 
node_name=badfile")
+self.assert_qmp(result, 'error/desc', "Cannot find device='' nor 
node-name='badfile'")
 
 def test_top_path_and_node(self):
 self.assert_no_active_block_jobs()
diff --git a/tests/qemu-iotests/249.out b/tests/qemu-iotests/249.out
index 92ec81db03..d2bf9be85e 100644
--- a/tests/qemu-iotests/249.out
+++ b/tests/qemu-iotests/249.out
@@ -18,7 +18,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 
backing_file=TEST_DIR/t.
  'filter-node-name': '1234'}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{"error": {"class": "GenericError", "desc": "Invalid node name"}}
+{"error": {"class": "GenericError", "desc": "Invalid node-name: '1234'"}}
 
 === Send a write command to a drive opened in read-only mode (2)
 
-- 
2.29.2




[PATCH 0/2] Clarify error messages pertaining to 'node-name'

2021-03-01 Thread Connor Kuehl
Some error messages contain ambiguous representations of the 'node-name'
parameter. This can be particularly confusing when exchanging QMP
messages (C = client, S = server):

C: {"execute": "block_resize", "arguments": { "device": "my_file", "size": 
26843545600 }}
S: {"error": {"class": "GenericError", "desc": "Cannot find device=my_file nor 
node_name="}}
   
^

This error message suggests one could send a message with a key called
'node_name':

C: {"execute": "block_resize", "arguments": { "node_name": "my_file", "size": 
26843545600 }}
   ^

but using the underscore is actually incorrect, the parameter should be
'node-name':

S: {"error": {"class": "GenericError", "desc": "Parameter 'node_name' is 
unexpected"}}

This behavior was uncovered in bz1651437[1], but I ended up going down a
rabbit hole looking for other areas where this miscommunication might
occur and changing those accordingly as well.

[1] https://bugzilla.redhat.com/1651437

Connor Kuehl (2):
  block: Clarify error messages pertaining to 'node-name'
  blockdev: Clarify error messages pertaining to 'node-name'

 block.c|  8 
 blockdev.c | 13 +++--
 tests/qemu-iotests/040 |  4 ++--
 tests/qemu-iotests/249.out |  2 +-
 4 files changed, 14 insertions(+), 13 deletions(-)

-- 
2.29.2




Re: [PATCH v3 2/2] docs: replace insecure /tmp examples in qsd docs

2021-03-01 Thread Richard W.M. Jones


For the series:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH v4 00/12] hw/block/nvme: metadata and end-to-end data protection support

2021-03-01 Thread Keith Busch
On Mon, Mar 01, 2021 at 03:00:35PM +0100, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This is v4 (RFC removed) of a series that adds support for metadata and
> end-to-end data protection.
> 
> First, on the subject of metadata, in v1, support was restricted to
> extended logical blocks, which was pretty trivial to implement, but
> required special initialization and broke DULBE. In v2, metadata is
> always stored continuously at the end of the underlying block device.
> This has the advantage of not breaking DULBE since the data blocks
> remains aligned and allows bdrv_block_status to be used to determinate
> allocation status. It comes at the expense of complicating the extended
> LBA emulation, but on the other hand it also gains support for metadata
> transfered as a separate buffer.
> 
> The end-to-end data protection support blew up in terms of required
> changes. This is due to the fact that a bunch of new commands has been
> added to the device since v1 (zone append, compare, copy), and they all
> require various special handling for protection information. If
> potential reviewers would like it split up into multiple patches, each
> adding pi support to one command, shout out.
> 
> The core of the series (metadata and eedp) is preceeded by a set of
> patches that refactors mapping (yes, again) and tries to deal with the
> qsg/iov duality mess (maybe also again?).
> 
> Support fro metadata and end-to-end data protection is all joint work
> with Gollu Appalanaidu.

Looks fine.

Reviewed-by: Keith Busch 



Re: [PATCH v3 2/2] docs: replace insecure /tmp examples in qsd docs

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 05:27:28PM +, Stefan Hajnoczi wrote:
> World-writeable directories have security issues. Avoid showing them in
> the documentation since someone might accidentally use them in
> situations where they are insecure.
> 
> There tend to be 3 security problems:
> 1. Denial of service. An adversary may be able to create the file
>beforehand, consume all space/inodes, etc to sabotage us.
> 2. Impersonation. An adversary may be able to create a listen socket and
>accept incoming connections that were meant for us.
> 3. Unauthenticated client access. An adversary may be able to connect to
>us if we did not set the uid/gid and permissions correctly.
> 
> These can be prevented or mitigated with private /tmp, carefully setting
> the umask, etc but that requires special action and does not apply to
> all situations. Just avoid using /tmp in examples.
> 
> Reported-by: Richard W.M. Jones 
> Reported-by: Daniel P. Berrangé 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/tools/qemu-storage-daemon.rst | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 1/2] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 05:27:27PM +, Stefan Hajnoczi wrote:
> The QMP monitor, NBD server, and vhost-user-blk export all support file
> descriptor passing. This is a useful technique because it allows the
> parent process to spawn and wait for qemu-storage-daemon without busy
> waiting, which may delay startup due to arbitrary sleep() calls.
> 
> This Python example is inspired by the test case written for libnbd by
> Richard W.M. Jones :
> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> 
> Thanks to Daniel P. Berrangé  for suggestions on
> how to get this working. Now let's document it!
> 
> Reported-by: Richard W.M. Jones 
> Cc: Kevin Wolf 
> Cc: Daniel P. Berrangé 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v2:
>  * Use /var/run/qmp.sock instead of /tmp/qmp-$PID.sock to prevent
>security issues with world-writeable directories [Rich, Daniel]
> ---
>  docs/tools/qemu-storage-daemon.rst | 42 --
>  1 file changed, 40 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 0/2] storage-daemon: include current command line option in the errors

2021-03-01 Thread Kevin Wolf
Am 01.03.2021 um 16:28 hat Paolo Bonzini geschrieben:
> Use the location management facilities that the emulator uses, so that
> the current command line option appears in the error message.
> 
> Before:
> 
>   $ storage-daemon/qemu-storage-daemon --nbd key..=
>   qemu-storage-daemon: Invalid parameter 'key..'
> 
> After:
> 
>   $ storage-daemon/qemu-storage-daemon --nbd key..=
>   qemu-storage-daemon: --nbd key..=: Invalid parameter 'key..'
> 
> The first patch tweaks the command line parsing so that argv is
> not reordered by getopt_long and optind is only advanced by one
> option+argument on every call to getopt_long.  This is required
> by loc_set_cmdline.

Thanks, very useful to know about the "-" switch in getopts.

Applied to the block branch.

Kevin




Re: [PATCH v2 0/2] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
On Mon, Mar 01, 2021 at 05:11:05PM +, Stefan Hajnoczi wrote:
> Add an example of how to spawn qemu-storage-daemon with fd passing. This
> approach eliminates the need to busy wait for the QMP, NBD, or vhost-user
> socket to become available.
> 
> v2:
>  * Use /var/run/qmp.sock instead of /tmp/qmp-$PID.sock to prevent security
>issues with world-writeable directories [Rich, Daniel]
>  * Add Patch 2 to fix insecure examples in the documentation [Rich, Daniel]
> 
> Stefan Hajnoczi (2):
>   docs: show how to spawn qemu-storage-daemon with fd passing
>   docs: replace insecure /tmp examples in qsd docs
> 
>  docs/tools/qemu-storage-daemon.rst | 44 ++
>  1 file changed, 39 insertions(+), 5 deletions(-)

Obsoleted by v3.

Stefan


signature.asc
Description: PGP signature


Re: [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

2021-03-01 Thread Stefan Hajnoczi
On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
> If the block drive is read-only we will model a "protected" flash
> device. We can thus use memory_region_init_rom_device_from_file()
> which mmap the backing file when creating the MemoryRegion.
> If the same backing file is used by multiple QEMU instances, this
> reduces the memory footprint (this is often the case with the
> CODE flash image from OVMF and AAVMF).
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/pflash_cfi01.c | 39 +++
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
> index a5fa8d8b74a..ec290636298 100644
> --- a/hw/block/pflash_cfi01.c
> +++ b/hw/block/pflash_cfi01.c
> @@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
> **errp)
>  int ret;
>  uint64_t blocks_per_device, sector_len_per_device, device_len;
>  int num_devices;
> +bool romd_mr_shared_mapped;
>  
>  if (pfl->sector_len == 0) {
>  error_setg(errp, "attribute \"sector-length\" not specified or 
> zero.");
> @@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, 
> Error **errp)
>  pfl->ro = 0;
>  }
>  
> -memory_region_init_rom_device(
> ->mem, OBJECT(dev),
> -_cfi01_ops,
> -pfl,
> -pfl->name, total_len, errp);
> -if (*errp) {
> -return;
> +if (pfl->ro && pfl->blk) {
> +BlockDriverState *bs = blk_bs(pfl->blk);
> +
> +/* If "raw" driver used, try to mmap the backing file as RAM_SHARED 
> */
> +if (bs->drv == _raw) { /* FIXME check offset=0 ? */

Bypassing the block layer is tricky because there are a lot of features
that conflict (you already pointed out the offset= option). Checking
bdrv_raw is not enough because the underlying protocol driver could be
GlusterFS, iSCSI, etc.

I think the goal here is to avoid changing the command-line/QMP so that
users don't need to modify their guests. Therefore changing the pflash
qdev properties is not desirable (we could have added a separate code
path that bypasses the block layer cleanly). This seems like a
worthwhile optimization that the block layer should support. I suggest
adding a new API like:

  /* Returns a filename string if @blk supports read-only mmap */
  char *blk_get_read_only_mmap_filename(BlockBackend *blk, Error **errp);

Then block/raw-format.c would forward the call to bs->file and
block/raw-posix.c would implement it by returning a new filename string
when bs->read_only is true.

FWIW this API isn't perfect because the file could be reopened with QMP
and the existing mmap would remain in place.

> +Error *local_err = NULL;
> +
> +memory_region_init_rom_device_from_file(>mem, OBJECT(dev),
> +_cfi01_ops, pfl,
> +pfl->name, total_len,
> +qemu_real_host_page_size,
> +RAM_SHARED,
> +bs->exact_filename,
> +true, _err);
> +if (local_err) {
> +error_report_err(local_err);
> +/* fall back to memory_region_init_rom_device() */
> +} else {
> +romd_mr_shared_mapped = true;
> +}
> +}
> +}
> +if (!romd_mr_shared_mapped) {
> +memory_region_init_rom_device(>mem, OBJECT(dev),
> +  _cfi01_ops, pfl,
> +  pfl->name, total_len, errp);
> +if (*errp) {
> +return;
> +}
>  }
>  
>  pfl->storage = memory_region_get_ram_ptr(>mem);
>  sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
>  
> -if (pfl->blk) {
> +if (pfl->blk && !romd_mr_shared_mapped) {
>  if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
>   errp)) {
>  vmstate_unregister_ram(>mem, DEVICE(pfl));
> -- 
> 2.26.2
> 


signature.asc
Description: PGP signature


[PATCH v3 1/2] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
The QMP monitor, NBD server, and vhost-user-blk export all support file
descriptor passing. This is a useful technique because it allows the
parent process to spawn and wait for qemu-storage-daemon without busy
waiting, which may delay startup due to arbitrary sleep() calls.

This Python example is inspired by the test case written for libnbd by
Richard W.M. Jones :
https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543

Thanks to Daniel P. Berrangé  for suggestions on
how to get this working. Now let's document it!

Reported-by: Richard W.M. Jones 
Cc: Kevin Wolf 
Cc: Daniel P. Berrangé 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Use /var/run/qmp.sock instead of /tmp/qmp-$PID.sock to prevent
   security issues with world-writeable directories [Rich, Daniel]
---
 docs/tools/qemu-storage-daemon.rst | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index f63627eaf6..789a8e4a75 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -101,10 +101,12 @@ Standard options:
 
 .. option:: --nbd-server 
addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
   --nbd-server 
addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
+  --nbd-server 
addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
 
   is a server for NBD exports. Both TCP and UNIX domain sockets are supported.
-  TLS encryption can be configured using ``--object`` tls-creds-* and authz-*
-  secrets (see below).
+  A listen socket can be provided via file descriptor passing (see Examples
+  below). TLS encryption can be configured using ``--object`` tls-creds-* and
+  authz-* secrets (see below).
 
   To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
 
@@ -127,6 +129,42 @@ QMP commands::
   --chardev socket,path=qmp.sock,server,nowait,id=char1 \
   --monitor chardev=char1
 
+Launch the daemon from Python with a QMP monitor socket using file descriptor
+passing so there is no need to busy wait for the QMP monitor to become
+available::
+
+  #!/usr/bin/env python3
+  import subprocess
+  import socket
+
+  sock_path = '/var/run/qmp.sock'
+
+  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
+  listen_sock.bind(sock_path)
+  listen_sock.listen()
+
+  fd = listen_sock.fileno()
+
+  subprocess.Popen(
+  ['qemu-storage-daemon',
+   '--chardev', f'socket,fd={fd},server=on,id=char1',
+   '--monitor', 'chardev=char1'],
+  pass_fds=[fd],
+  )
+
+  # listen_sock was automatically closed when leaving the 'with' statement
+  # body. If the daemon process terminated early then the following connect()
+  # will fail with "Connection refused" because no process has the listen
+  # socket open anymore. Launch errors can be detected this way.
+
+  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+  qmp_sock.connect(sock_path)
+  ...QMP interaction...
+
+The same socket spawning approach also works with the ``--nbd-server
+addr.type=fd,addr.str=`` and ``--export
+type=vhost-user-blk,addr.type=fd,addr.str=`` options.
+
 Export raw image file ``disk.img`` over NBD UNIX domain socket ``nbd.sock``::
 
   $ qemu-storage-daemon \
-- 
2.29.2



[PATCH v3 2/2] docs: replace insecure /tmp examples in qsd docs

2021-03-01 Thread Stefan Hajnoczi
World-writeable directories have security issues. Avoid showing them in
the documentation since someone might accidentally use them in
situations where they are insecure.

There tend to be 3 security problems:
1. Denial of service. An adversary may be able to create the file
   beforehand, consume all space/inodes, etc to sabotage us.
2. Impersonation. An adversary may be able to create a listen socket and
   accept incoming connections that were meant for us.
3. Unauthenticated client access. An adversary may be able to connect to
   us if we did not set the uid/gid and permissions correctly.

These can be prevented or mitigated with private /tmp, carefully setting
the umask, etc but that requires special action and does not apply to
all situations. Just avoid using /tmp in examples.

Reported-by: Richard W.M. Jones 
Reported-by: Daniel P. Berrangé 
Signed-off-by: Stefan Hajnoczi 
---
 docs/tools/qemu-storage-daemon.rst | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index 789a8e4a75..2da28a447a 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -69,7 +69,7 @@ Standard options:
   a description of character device properties. A common character device
   definition configures a UNIX domain socket::
 
-  --chardev socket,id=char1,path=/tmp/qmp.sock,server,nowait
+  --chardev socket,id=char1,path=/var/run/qsd-qmp.sock,server,nowait
 
 .. option:: --export 
[type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=]
   --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
@@ -108,9 +108,10 @@ Standard options:
   below). TLS encryption can be configured using ``--object`` tls-creds-* and
   authz-* secrets (see below).
 
-  To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
+  To configure an NBD server on UNIX domain socket path
+  ``/var/run/qsd-nbd.sock``::
 
-  --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock
+  --nbd-server addr.type=unix,addr.path=/var/run/qsd-nbd.sock
 
 .. option:: --object help
   --object ,help
-- 
2.29.2



[PATCH v3 0/2] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
v3:
 * Explain how to detect launch errors and that the listen socket must be
   closed in the parent process in order for this to work [Daniel]

v2:
 * Use /var/run/qmp.sock instead of /tmp/qmp-$PID.sock to prevent security
   issues with world-writeable directories [Rich, Daniel]
 * Add Patch 2 to fix insecure examples in the documentation [Rich, Daniel]

Add an example of how to spawn qemu-storage-daemon with fd passing. This
approach eliminates the need to busy wait for the QMP, NBD, or vhost-user
socket to become available.

Stefan Hajnoczi (2):
  docs: show how to spawn qemu-storage-daemon with fd passing
  docs: replace insecure /tmp examples in qsd docs

 docs/tools/qemu-storage-daemon.rst | 49 +++---
 1 file changed, 44 insertions(+), 5 deletions(-)

-- 
2.29.2



Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
On Mon, Mar 01, 2021 at 04:56:17PM +, Daniel P. Berrangé wrote:
> On Mon, Mar 01, 2021 at 04:50:14PM +, Stefan Hajnoczi wrote:
> > On Mon, Mar 01, 2021 at 03:44:42PM +, Daniel P. Berrangé wrote:
> > > On Mon, Mar 01, 2021 at 03:39:06PM +, Richard W.M. Jones wrote:
> > > > On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> > > > > The QMP monitor, NBD server, and vhost-user-blk export all support 
> > > > > file
> > > > > descriptor passing. This is a useful technique because it allows the
> > > > > parent process to spawn and wait for qemu-storage-daemon without busy
> > > > > waiting, which may delay startup due to arbitrary sleep() calls.
> > > > > 
> > > > > This Python example is inspired by the test case written for libnbd by
> > > > > Richard W.M. Jones :
> > > > > https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> > > > > 
> > > > > Thanks to Daniel P. Berrangé  for suggestions on
> > > > > how to get this working. Now let's document it!
> > > > > 
> > > > > Reported-by: Richard W.M. Jones 
> > > > > Cc: Kevin Wolf 
> > > > > Cc: Daniel P. Berrangé 
> > > > > Signed-off-by: Stefan Hajnoczi 
> > > > > ---
> > > > >  docs/tools/qemu-storage-daemon.rst | 38 
> > > > > --
> > > > >  1 file changed, 36 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/docs/tools/qemu-storage-daemon.rst 
> > > > > b/docs/tools/qemu-storage-daemon.rst
> > > > > index f63627eaf6..45854c131e 100644
> > > > > --- a/docs/tools/qemu-storage-daemon.rst
> > > > > +++ b/docs/tools/qemu-storage-daemon.rst
> > > > > @@ -101,10 +101,12 @@ Standard options:
> > > > >  
> > > > >  .. option:: --nbd-server 
> > > > > addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
> > > > >--nbd-server 
> > > > > addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> > > > > +  --nbd-server 
> > > > > addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
> > > > >  
> > > > >is a server for NBD exports. Both TCP and UNIX domain sockets are 
> > > > > supported.
> > > > > -  TLS encryption can be configured using ``--object`` tls-creds-* 
> > > > > and authz-*
> > > > > -  secrets (see below).
> > > > > +  A listen socket can be provided via file descriptor passing (see 
> > > > > Examples
> > > > > +  below). TLS encryption can be configured using ``--object`` 
> > > > > tls-creds-* and
> > > > > +  authz-* secrets (see below).
> > > > >  
> > > > >To configure an NBD server on UNIX domain socket path 
> > > > > ``/tmp/nbd.sock``::
> > > > >  
> > > > > @@ -127,6 +129,38 @@ QMP commands::
> > > > >--chardev socket,path=qmp.sock,server,nowait,id=char1 \
> > > > >--monitor chardev=char1
> > > > >  
> > > > > +Launch the daemon from Python with a QMP monitor socket using file 
> > > > > descriptor
> > > > > +passing so there is no need to busy wait for the QMP monitor to 
> > > > > become
> > > > > +available::
> > > > > +
> > > > > +  #!/usr/bin/env python3
> > > > > +  import os
> > > > > +  import subprocess
> > > > > +  import socket
> > > > > +
> > > > > +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> > > > 
> > > > Not sure how much you worry about the insecure / easily guessable tmp
> > > > path here.  I notice that there's already one in the surrounding
> > > > documentation (/tmp/nbd.sock) so maybe it's not a problem :-)
> > > > 
> > > > > +  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as 
> > > > > listen_sock:
> > > > > +  listen_sock.bind(sock_path)
> > > > > +  listen_sock.listen()
> > > > > +
> > > > > +  fd = listen_sock.fileno()
> > > > > +
> > > > > +  subprocess.Popen(
> > > > > +  ['qemu-storage-daemon',
> > > > > +   '--chardev', f'socket,fd={fd},server=on,id=char1',
> > > > > +   '--monitor', 'chardev=char1'],
> > > > > +  pass_fds=[fd],
> > > > > +  )
> > > > > +
> > > > > +  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> > > > > +  qmp_sock.connect(sock_path)
> > > > 
> > > > A note that the order of opening the sockets is slightly different
> > > > from how I did it in the interop test.  But I believe it makes no
> > > > difference, as long as you don't connect to the socket until it's in
> > > > the listening state, which is what you're doing here.  So it should be
> > > > fine.
> > > 
> > > Nothing here is closing listen_sock in the parent though.
> > > 
> > > The trick of passing the listener FD into the child relies on the
> > > listener being closed in the parent, so that the parent can get
> > > a socket error if the child exits abnormally during startup. Keeping
> > > the listen socket open means the parent will wait forever for an
> > > accept() that never comes.
> > 
> > The listen socket is closed by the context manager at the end of the
> > 'with' statement. This is the modern Python approach for resource
> > acquisition that also 

[PATCH v2 2/2] docs: replace insecure /tmp examples in qsd docs

2021-03-01 Thread Stefan Hajnoczi
World-writeable directories have security issues. Avoid showing them in
the documentation since someone might accidentally use them in
situations where they are insecure.

There tend to be 3 security problems:
1. Denial of service. An adversary may be able to create the file
   beforehand, consume all space/inodes, etc to sabotage us.
2. Impersonation. An adversary may be able to create a listen socket and
   accept incoming connections that were meant for us.
3. Unauthenticated client access. An adversary may be able to connect to
   us if we did not set the uid/gid and permissions correctly.

These can be prevented or mitigated with private /tmp, carefully setting
the umask, etc but that requires special action and does not apply to
all situations. Just avoid using /tmp in examples.

Reported-by: Richard W.M. Jones 
Reported-by: Daniel P. Berrangé 
Signed-off-by: Stefan Hajnoczi 
---
 docs/tools/qemu-storage-daemon.rst | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index 3b67ca72df..0c2a915434 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -69,7 +69,7 @@ Standard options:
   a description of character device properties. A common character device
   definition configures a UNIX domain socket::
 
-  --chardev socket,id=char1,path=/tmp/qmp.sock,server,nowait
+  --chardev socket,id=char1,path=/var/run/qsd-qmp.sock,server,nowait
 
 .. option:: --export 
[type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=]
   --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
@@ -108,9 +108,10 @@ Standard options:
   below). TLS encryption can be configured using ``--object`` tls-creds-* and
   authz-* secrets (see below).
 
-  To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
+  To configure an NBD server on UNIX domain socket path
+  ``/var/run/qsd-nbd.sock``::
 
-  --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock
+  --nbd-server addr.type=unix,addr.path=/var/run/qsd-nbd.sock
 
 .. option:: --object help
   --object ,help
-- 
2.29.2



[PATCH v2 0/2] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
Add an example of how to spawn qemu-storage-daemon with fd passing. This
approach eliminates the need to busy wait for the QMP, NBD, or vhost-user
socket to become available.

v2:
 * Use /var/run/qmp.sock instead of /tmp/qmp-$PID.sock to prevent security
   issues with world-writeable directories [Rich, Daniel]
 * Add Patch 2 to fix insecure examples in the documentation [Rich, Daniel]

Stefan Hajnoczi (2):
  docs: show how to spawn qemu-storage-daemon with fd passing
  docs: replace insecure /tmp examples in qsd docs

 docs/tools/qemu-storage-daemon.rst | 44 ++
 1 file changed, 39 insertions(+), 5 deletions(-)

-- 
2.29.2



[PATCH v2 1/2] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
The QMP monitor, NBD server, and vhost-user-blk export all support file
descriptor passing. This is a useful technique because it allows the
parent process to spawn and wait for qemu-storage-daemon without busy
waiting, which may delay startup due to arbitrary sleep() calls.

This Python example is inspired by the test case written for libnbd by
Richard W.M. Jones :
https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543

Thanks to Daniel P. Berrangé  for suggestions on
how to get this working. Now let's document it!

Reported-by: Richard W.M. Jones 
Cc: Kevin Wolf 
Cc: Daniel P. Berrangé 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Use /var/run/qmp.sock instead of /tmp/qmp-$PID.sock to prevent
   security issues with world-writeable directories [Rich, Daniel]
---
 docs/tools/qemu-storage-daemon.rst | 37 --
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index f63627eaf6..3b67ca72df 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -101,10 +101,12 @@ Standard options:
 
 .. option:: --nbd-server 
addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
   --nbd-server 
addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
+  --nbd-server 
addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
 
   is a server for NBD exports. Both TCP and UNIX domain sockets are supported.
-  TLS encryption can be configured using ``--object`` tls-creds-* and authz-*
-  secrets (see below).
+  A listen socket can be provided via file descriptor passing (see Examples
+  below). TLS encryption can be configured using ``--object`` tls-creds-* and
+  authz-* secrets (see below).
 
   To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
 
@@ -127,6 +129,37 @@ QMP commands::
   --chardev socket,path=qmp.sock,server,nowait,id=char1 \
   --monitor chardev=char1
 
+Launch the daemon from Python with a QMP monitor socket using file descriptor
+passing so there is no need to busy wait for the QMP monitor to become
+available::
+
+  #!/usr/bin/env python3
+  import subprocess
+  import socket
+
+  sock_path = '/var/run/qmp.sock'
+
+  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
+  listen_sock.bind(sock_path)
+  listen_sock.listen()
+
+  fd = listen_sock.fileno()
+
+  subprocess.Popen(
+  ['qemu-storage-daemon',
+   '--chardev', f'socket,fd={fd},server=on,id=char1',
+   '--monitor', 'chardev=char1'],
+  pass_fds=[fd],
+  )
+
+  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+  qmp_sock.connect(sock_path)
+  ...QMP interaction...
+
+The same socket spawning approach also works with the ``--nbd-server
+addr.type=fd,addr.str=`` and ``--export
+type=vhost-user-blk,addr.type=fd,addr.str=`` options.
+
 Export raw image file ``disk.img`` over NBD UNIX domain socket ``nbd.sock``::
 
   $ qemu-storage-daemon \
-- 
2.29.2



Re: [PATCH RFC v3 12/12] hw/block/nvme: add support for the format nvm command

2021-03-01 Thread Klaus Jensen
On Mar  2 01:09, Keith Busch wrote:
> On Wed, Feb 17, 2021 at 09:26:37AM +0100, Klaus Jensen wrote:
> > On Feb 16 15:16, Keith Busch wrote:
> > > On Mon, Feb 15, 2021 at 12:02:40AM +0100, Klaus Jensen wrote:
> > > > From: Minwoo Im 
> > > > 
> > > > Format NVM admin command can make a namespace or namespaces to be
> > > > with different LBA size and metadata size with protection information
> > > > types.
> > > > 
> > > > This patch introduces Format NVM command with LBA format, Metadata, and
> > > > Protection Information for the device. The secure erase operation things
> > > > are yet to be added.
> > > > 
> > > > The parameter checks inside of this patch has been referred from
> > > > Keith's old branch.
> > > 
> > > Oh, and here's the format command now, so my previous comment on patch
> > > 11 doesn't matter.
> > > 
> > > > +struct nvme_aio_format_ctx {
> > > > +NvmeRequest   *req;
> > > > +NvmeNamespace *ns;
> > > > +
> > > > +/* number of outstanding write zeroes for this namespace */
> > > > +int *count;
> > > 
> > > Shouldn't this count be the NvmeRequest's opaque value?
> > 
> > That is already occupied by `num_formats` which tracks formats of
> > individual namespaces. `count` is for outstanding write zeroes on one
> > particular namespace.
> 
> But why are they tracked separately? It looks like we only care about
> the number of outstanding zero-out commands. It doesn't matter how that
> number is spread across multiple namespaces.

It is to allow the Format In Progress status code to be returned
individually on the namespaces. When `count` is zero we set ns->status
back to 0x0.


signature.asc
Description: PGP signature


Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 04:50:14PM +, Stefan Hajnoczi wrote:
> On Mon, Mar 01, 2021 at 03:44:42PM +, Daniel P. Berrangé wrote:
> > On Mon, Mar 01, 2021 at 03:39:06PM +, Richard W.M. Jones wrote:
> > > On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> > > > The QMP monitor, NBD server, and vhost-user-blk export all support file
> > > > descriptor passing. This is a useful technique because it allows the
> > > > parent process to spawn and wait for qemu-storage-daemon without busy
> > > > waiting, which may delay startup due to arbitrary sleep() calls.
> > > > 
> > > > This Python example is inspired by the test case written for libnbd by
> > > > Richard W.M. Jones :
> > > > https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> > > > 
> > > > Thanks to Daniel P. Berrangé  for suggestions on
> > > > how to get this working. Now let's document it!
> > > > 
> > > > Reported-by: Richard W.M. Jones 
> > > > Cc: Kevin Wolf 
> > > > Cc: Daniel P. Berrangé 
> > > > Signed-off-by: Stefan Hajnoczi 
> > > > ---
> > > >  docs/tools/qemu-storage-daemon.rst | 38 --
> > > >  1 file changed, 36 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/docs/tools/qemu-storage-daemon.rst 
> > > > b/docs/tools/qemu-storage-daemon.rst
> > > > index f63627eaf6..45854c131e 100644
> > > > --- a/docs/tools/qemu-storage-daemon.rst
> > > > +++ b/docs/tools/qemu-storage-daemon.rst
> > > > @@ -101,10 +101,12 @@ Standard options:
> > > >  
> > > >  .. option:: --nbd-server 
> > > > addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
> > > >--nbd-server 
> > > > addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> > > > +  --nbd-server 
> > > > addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
> > > >  
> > > >is a server for NBD exports. Both TCP and UNIX domain sockets are 
> > > > supported.
> > > > -  TLS encryption can be configured using ``--object`` tls-creds-* and 
> > > > authz-*
> > > > -  secrets (see below).
> > > > +  A listen socket can be provided via file descriptor passing (see 
> > > > Examples
> > > > +  below). TLS encryption can be configured using ``--object`` 
> > > > tls-creds-* and
> > > > +  authz-* secrets (see below).
> > > >  
> > > >To configure an NBD server on UNIX domain socket path 
> > > > ``/tmp/nbd.sock``::
> > > >  
> > > > @@ -127,6 +129,38 @@ QMP commands::
> > > >--chardev socket,path=qmp.sock,server,nowait,id=char1 \
> > > >--monitor chardev=char1
> > > >  
> > > > +Launch the daemon from Python with a QMP monitor socket using file 
> > > > descriptor
> > > > +passing so there is no need to busy wait for the QMP monitor to become
> > > > +available::
> > > > +
> > > > +  #!/usr/bin/env python3
> > > > +  import os
> > > > +  import subprocess
> > > > +  import socket
> > > > +
> > > > +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> > > 
> > > Not sure how much you worry about the insecure / easily guessable tmp
> > > path here.  I notice that there's already one in the surrounding
> > > documentation (/tmp/nbd.sock) so maybe it's not a problem :-)
> > > 
> > > > +  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as 
> > > > listen_sock:
> > > > +  listen_sock.bind(sock_path)
> > > > +  listen_sock.listen()
> > > > +
> > > > +  fd = listen_sock.fileno()
> > > > +
> > > > +  subprocess.Popen(
> > > > +  ['qemu-storage-daemon',
> > > > +   '--chardev', f'socket,fd={fd},server=on,id=char1',
> > > > +   '--monitor', 'chardev=char1'],
> > > > +  pass_fds=[fd],
> > > > +  )
> > > > +
> > > > +  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> > > > +  qmp_sock.connect(sock_path)
> > > 
> > > A note that the order of opening the sockets is slightly different
> > > from how I did it in the interop test.  But I believe it makes no
> > > difference, as long as you don't connect to the socket until it's in
> > > the listening state, which is what you're doing here.  So it should be
> > > fine.
> > 
> > Nothing here is closing listen_sock in the parent though.
> > 
> > The trick of passing the listener FD into the child relies on the
> > listener being closed in the parent, so that the parent can get
> > a socket error if the child exits abnormally during startup. Keeping
> > the listen socket open means the parent will wait forever for an
> > accept() that never comes.
> 
> The listen socket is closed by the context manager at the end of the
> 'with' statement. This is the modern Python approach for resource
> acquisition that also handles exceptions automatically. It's like RAII
> in C++.

Hmm, yes, I didn't remember that at first. I'm not sure that is a good
idea as an example code, because people mapping this example into other
languages are likely to miss that critical detail.


Regards,
Daniel
-- 
|: https://berrange.com  -o-

Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
On Mon, Mar 01, 2021 at 04:06:47PM +, Daniel P. Berrangé wrote:
> On Mon, Mar 01, 2021 at 09:49:21AM -0600, Eric Blake wrote:
> > On 3/1/21 9:41 AM, Daniel P. Berrangé wrote:
> > > On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> > >> The QMP monitor, NBD server, and vhost-user-blk export all support file
> > >> descriptor passing. This is a useful technique because it allows the
> > >> parent process to spawn and wait for qemu-storage-daemon without busy
> > >> waiting, which may delay startup due to arbitrary sleep() calls.
> > >>
> > >> This Python example is inspired by the test case written for libnbd by
> > >> Richard W.M. Jones :
> > >> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> > >>
> > >> Thanks to Daniel P. Berrangé  for suggestions on
> > >> how to get this working. Now let's document it!
> > >>
> > 
> > >> +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> > > 
> > > Example code inevitably gets cut+paste into real world apps, and this
> > > example is a tmpfile CVE flaw. At least put it in $CWD instead.
> > 
> > Except $CWD may be too long for a sock file name to be created.
> > Creating the sock in a securely-created subdirectory of /tmp is more
> > reliable.
> 
> $XDG_RUNTIME_DIR then, which is /run/user/$UID, so safely per user on all
> modern OS.

Both you and Rich mentioned this. I'll change it to /var/run/qsd.pid.

I'll also update the /tmp/nbd.sock example in the documentation.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
On Mon, Mar 01, 2021 at 03:39:06PM +, Richard W.M. Jones wrote:
> On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> > The QMP monitor, NBD server, and vhost-user-blk export all support file
> > descriptor passing. This is a useful technique because it allows the
> > parent process to spawn and wait for qemu-storage-daemon without busy
> > waiting, which may delay startup due to arbitrary sleep() calls.
> > 
> > This Python example is inspired by the test case written for libnbd by
> > Richard W.M. Jones :
> > https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> > 
> > Thanks to Daniel P. Berrangé  for suggestions on
> > how to get this working. Now let's document it!
> > 
> > Reported-by: Richard W.M. Jones 
> > Cc: Kevin Wolf 
> > Cc: Daniel P. Berrangé 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  docs/tools/qemu-storage-daemon.rst | 38 --
> >  1 file changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/tools/qemu-storage-daemon.rst 
> > b/docs/tools/qemu-storage-daemon.rst
> > index f63627eaf6..45854c131e 100644
> > --- a/docs/tools/qemu-storage-daemon.rst
> > +++ b/docs/tools/qemu-storage-daemon.rst
> > @@ -101,10 +101,12 @@ Standard options:
> >  
> >  .. option:: --nbd-server 
> > addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
> >--nbd-server 
> > addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> > +  --nbd-server 
> > addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
> >  
> >is a server for NBD exports. Both TCP and UNIX domain sockets are 
> > supported.
> > -  TLS encryption can be configured using ``--object`` tls-creds-* and 
> > authz-*
> > -  secrets (see below).
> > +  A listen socket can be provided via file descriptor passing (see Examples
> > +  below). TLS encryption can be configured using ``--object`` tls-creds-* 
> > and
> > +  authz-* secrets (see below).
> >  
> >To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
> >  
> > @@ -127,6 +129,38 @@ QMP commands::
> >--chardev socket,path=qmp.sock,server,nowait,id=char1 \
> >--monitor chardev=char1
> >  
> > +Launch the daemon from Python with a QMP monitor socket using file 
> > descriptor
> > +passing so there is no need to busy wait for the QMP monitor to become
> > +available::
> > +
> > +  #!/usr/bin/env python3
> > +  import os
> > +  import subprocess
> > +  import socket
> > +
> > +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> 
> Not sure how much you worry about the insecure / easily guessable tmp
> path here.  I notice that there's already one in the surrounding
> documentation (/tmp/nbd.sock) so maybe it's not a problem :-)

Yes, the documentation doesn't address those issues. If I respin I'll
change the path to something that's less likely to be a globally
writeable directory (/var/run/... where the pid files usually go).

Stefan


signature.asc
Description: PGP signature


Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
On Mon, Mar 01, 2021 at 03:44:42PM +, Daniel P. Berrangé wrote:
> On Mon, Mar 01, 2021 at 03:39:06PM +, Richard W.M. Jones wrote:
> > On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> > > The QMP monitor, NBD server, and vhost-user-blk export all support file
> > > descriptor passing. This is a useful technique because it allows the
> > > parent process to spawn and wait for qemu-storage-daemon without busy
> > > waiting, which may delay startup due to arbitrary sleep() calls.
> > > 
> > > This Python example is inspired by the test case written for libnbd by
> > > Richard W.M. Jones :
> > > https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> > > 
> > > Thanks to Daniel P. Berrangé  for suggestions on
> > > how to get this working. Now let's document it!
> > > 
> > > Reported-by: Richard W.M. Jones 
> > > Cc: Kevin Wolf 
> > > Cc: Daniel P. Berrangé 
> > > Signed-off-by: Stefan Hajnoczi 
> > > ---
> > >  docs/tools/qemu-storage-daemon.rst | 38 --
> > >  1 file changed, 36 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/docs/tools/qemu-storage-daemon.rst 
> > > b/docs/tools/qemu-storage-daemon.rst
> > > index f63627eaf6..45854c131e 100644
> > > --- a/docs/tools/qemu-storage-daemon.rst
> > > +++ b/docs/tools/qemu-storage-daemon.rst
> > > @@ -101,10 +101,12 @@ Standard options:
> > >  
> > >  .. option:: --nbd-server 
> > > addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
> > >--nbd-server 
> > > addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> > > +  --nbd-server 
> > > addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
> > >  
> > >is a server for NBD exports. Both TCP and UNIX domain sockets are 
> > > supported.
> > > -  TLS encryption can be configured using ``--object`` tls-creds-* and 
> > > authz-*
> > > -  secrets (see below).
> > > +  A listen socket can be provided via file descriptor passing (see 
> > > Examples
> > > +  below). TLS encryption can be configured using ``--object`` 
> > > tls-creds-* and
> > > +  authz-* secrets (see below).
> > >  
> > >To configure an NBD server on UNIX domain socket path 
> > > ``/tmp/nbd.sock``::
> > >  
> > > @@ -127,6 +129,38 @@ QMP commands::
> > >--chardev socket,path=qmp.sock,server,nowait,id=char1 \
> > >--monitor chardev=char1
> > >  
> > > +Launch the daemon from Python with a QMP monitor socket using file 
> > > descriptor
> > > +passing so there is no need to busy wait for the QMP monitor to become
> > > +available::
> > > +
> > > +  #!/usr/bin/env python3
> > > +  import os
> > > +  import subprocess
> > > +  import socket
> > > +
> > > +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> > 
> > Not sure how much you worry about the insecure / easily guessable tmp
> > path here.  I notice that there's already one in the surrounding
> > documentation (/tmp/nbd.sock) so maybe it's not a problem :-)
> > 
> > > +  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
> > > +  listen_sock.bind(sock_path)
> > > +  listen_sock.listen()
> > > +
> > > +  fd = listen_sock.fileno()
> > > +
> > > +  subprocess.Popen(
> > > +  ['qemu-storage-daemon',
> > > +   '--chardev', f'socket,fd={fd},server=on,id=char1',
> > > +   '--monitor', 'chardev=char1'],
> > > +  pass_fds=[fd],
> > > +  )
> > > +
> > > +  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> > > +  qmp_sock.connect(sock_path)
> > 
> > A note that the order of opening the sockets is slightly different
> > from how I did it in the interop test.  But I believe it makes no
> > difference, as long as you don't connect to the socket until it's in
> > the listening state, which is what you're doing here.  So it should be
> > fine.
> 
> Nothing here is closing listen_sock in the parent though.
> 
> The trick of passing the listener FD into the child relies on the
> listener being closed in the parent, so that the parent can get
> a socket error if the child exits abnormally during startup. Keeping
> the listen socket open means the parent will wait forever for an
> accept() that never comes.

The listen socket is closed by the context manager at the end of the
'with' statement. This is the modern Python approach for resource
acquisition that also handles exceptions automatically. It's like RAII
in C++.

https://www.python.org/dev/peps/pep-0343/#specification-the-with-statement

Stefan


signature.asc
Description: PGP signature


Re: [PATCH] iotests: Fix up python style in 300

2021-03-01 Thread John Snow

On 2/26/21 2:04 AM, Vladimir Sementsov-Ogievskiy wrote:

16.02.2021 02:21, John Snow wrote:

On 2/15/21 5:05 PM, Eric Blake wrote:

Break some long lines, and relax our type hints to be more generic to
any JSON, in order to more easily permit the additional JSON depth now
possible in migration parameters.  Detected by iotest 297.

Fixes: ca4bfec41d56
  (qemu-iotests: 300: Add test case for modifying persistence of bitmap)
Reported-by: Kevin Wolf 
Signed-off-by: Eric Blake 


Reviewed-by: John Snow 


---
  tests/qemu-iotests/300 | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index 63036f6a6e13..adb927629747 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -22,7 +22,7 @@
  import os
  import random
  import re
-from typing import Dict, List, Optional, Union
+from typing import Dict, List, Optional

  import iotests

@@ -30,7 +30,7 @@ import iotests
  # pylint: disable=wrong-import-order
  import qemu

-BlockBitmapMapping = List[Dict[str, Union[str, List[Dict[str, str]
+BlockBitmapMapping = List[Dict[str, object]]



Assuming iotest 297 didn't yap about this, I think this has the 
necessary power for this file and we don't have to work any harder.


If in the future you try to treat e.g. bmap['persistent'] as a 
particular kind of value (string? bool? int?) mypy will likely 
complain about that a little, saying it has no insight into the type 
beyond "object".


If *that* becomes annoying, you can degrade this type to use 'Any' 
instead of 'object' and even those checks will cease.


Probably at some future moment we'll have generated python types for 
QAPI structures ? :)




That's my hope, yes!

Typing the QAPI generator is something I see as a step to doing this so 
that we can safely work on the QAPI generator a bit more vigorously.


Marc-Andre is adding rust backends, I'd like to add either a Python or a 
JSON-Schema backend to help generate a fully typed SDK for us in Python.


I don't know how suitable those tools will be to use in the test suite; 
I suspect that every last build of QEMU from the development tree will 
have to possibly re-generate such a Python module.


When I get a little closer to a prototype for this I will try to 
announce it. In the meantime I am very fastidiously trying to strictly 
type the QAPI generator and move it to ./python/qemu/qapi.


--js




Re: [PATCH v2 1/2] storage-daemon: report unexpected arguments on the fly

2021-03-01 Thread Paolo Bonzini

On 01/03/21 16:38, Eric Blake wrote:

On 3/1/21 9:28 AM, Paolo Bonzini wrote:

If the first character of optstring is '-', then each nonoption argv
element is handled as if it were the argument of an option with character
code 1.  This removes the reordering of the argv array, and enables usage
of loc_set_cmdline to provide better error messages.

Signed-off-by: Paolo Bonzini 
---
  storage-daemon/qemu-storage-daemon.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)


Nice.  The man page for 'getopt_long' is unclear whether setting
POSIXLY_CORRECT in the environment would break this


It doesn't.  In fact, with this patch the behavior is the same as for 
POSIXLY_CORRECT, though for unrelated reasons, and interestingly enough 
I think the POSIXLY_CORRECT behavior is an improvement for 
qemu-storage-daemon.


Unpatched:

$ qemu-storage-daemon foo --object iothread
qemu-storage-daemon: Parameter 'id' is missing

$ POSIXLY_CORRECT=1 qemu-storage-daemon foo --object iothread
qemu-storage-daemon: Unexpected argument: foo

Patched:

$ storage-daemon/qemu-storage-daemon foo --object iothread
qemu-storage-daemon: foo: Unexpected argument
$ POSIXLY_CORRECT=1 storage-daemon/qemu-storage-daemon foo --object iothread
qemu-storage-daemon: foo: Unexpected argument

Paolo




Re: [PATCH] qemu-storage-daemon: add --pidfile option

2021-03-01 Thread Richard W.M. Jones
On Mon, Mar 01, 2021 at 04:15:47PM +, Daniel P. Berrangé wrote:
> On Mon, Mar 01, 2021 at 04:08:57PM +, Stefan Hajnoczi wrote:
> > Daemons often have a --pidfile option where the pid is written to a file
> > so that scripts can stop the daemon by sending a signal.
> > 
> > The pid file also acts as a lock to prevent multiple instances of the
> > daemon from launching for a given pid file.
> > 
> > QEMU, qemu-nbd, qemu-ga, virtiofsd, and qemu-pr-helper all support the
> > --pidfile option. Add it to qemu-storage-daemon too.
> > 
> > Reported-by: Richard W.M. Jones 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  docs/tools/qemu-storage-daemon.rst   | 10 ++
> >  storage-daemon/qemu-storage-daemon.c | 29 
> >  2 files changed, 39 insertions(+)
> > 
> > diff --git a/docs/tools/qemu-storage-daemon.rst 
> > b/docs/tools/qemu-storage-daemon.rst
> > index f63627eaf6..8f4ab16ffc 100644
> > --- a/docs/tools/qemu-storage-daemon.rst
> > +++ b/docs/tools/qemu-storage-daemon.rst
> > @@ -118,6 +118,16 @@ Standard options:
> >List object properties with ``,help``. See the :manpage:`qemu(1)`
> >manual page for a description of the object properties.
> >  
> > +.. option:: --pidfile PATH
> > +
> > +  is the path to a file where the daemon writes its pid. This allows 
> > scripts to
> > +  stop the daemon by sending a signal::
> > +
> > +$ kill -SIGTERM $( > +
> > +  A file lock is applied to the file so only one instance of the daemon 
> > can run
> > +  with a given pid file path. The daemon unlinks its pid file when 
> > terminating.
> 
> Usually a pidfile wants to have some explicit synchronization rules
> defined. AFAICS, qsd doesn't have a --daemonize option to sync against.
> If we're using the FD passing trick for the monitor, however, we want
> a guarantee that the pidfile is written before the monitor accepts the
> first client.
> 
> IOW, the parent process needs to know that once it has done the QMP
> handshake, there is guaranteed tobe a pidfile present, or if the
> QMP handshake fails, then the app is guaranteed to have quit.
> 
> IIUC, this impl should be ok in this respect, because we won't process
> the QMP handdshake until the main loop runs, at which point the pidfile
> is present. So we just need to document that the pidfile is guaranteed
> to be written by the time QMP is active.

I'm not sure if I follow this exactly, but from my point of view I'd
like to know that:

(1) If we're using --nbd-server addr.type=inet|unix then the PID file
must not be created until the socket has been created and is
listening.  Here I mean the NBD socket, but the same would apply to
the QMP socket or any other listening socket.  This allows you to do
scripting sanely (qemu-storage-daemon ... &) without arbitrary sleeps.

(2) If we're using the FD passing trick instead, we don't care and
would probably not use the --pidfile option anyway (since we have the
PID from calling fork).

Rich.

> 
> > +
> >  Examples
> >  
> >  Launch the daemon with QMP monitor socket ``qmp.sock`` so clients can 
> > execute
> > diff --git a/storage-daemon/qemu-storage-daemon.c 
> > b/storage-daemon/qemu-storage-daemon.c
> > index 9021a46b3a..011ae49ac3 100644
> > --- a/storage-daemon/qemu-storage-daemon.c
> > +++ b/storage-daemon/qemu-storage-daemon.c
> > @@ -59,6 +59,7 @@
> >  #include "sysemu/runstate.h"
> >  #include "trace/control.h"
> >  
> > +static const char *pid_file;
> >  static volatile bool exit_requested = false;
> >  
> >  void qemu_system_killed(int signal, pid_t pid)
> > @@ -126,6 +127,7 @@ enum {
> >  OPTION_MONITOR,
> >  OPTION_NBD_SERVER,
> >  OPTION_OBJECT,
> > +OPTION_PIDFILE,
> >  };
> >  
> >  extern QemuOptsList qemu_chardev_opts;
> > @@ -164,6 +166,7 @@ static void process_options(int argc, char *argv[])
> >  {"monitor", required_argument, NULL, OPTION_MONITOR},
> >  {"nbd-server", required_argument, NULL, OPTION_NBD_SERVER},
> >  {"object", required_argument, NULL, OPTION_OBJECT},
> > +{"pidfile", required_argument, NULL, OPTION_PIDFILE},
> >  {"trace", required_argument, NULL, 'T'},
> >  {"version", no_argument, NULL, 'V'},
> >  {0, 0, 0, 0}
> > @@ -275,6 +278,9 @@ static void process_options(int argc, char *argv[])
> >  qobject_unref(args);
> >  break;
> >  }
> > +case OPTION_PIDFILE:
> > +pid_file = optarg;
> > +break;
> >  default:
> >  g_assert_not_reached();
> >  }
> > @@ -285,6 +291,27 @@ static void process_options(int argc, char *argv[])
> >  }
> >  }
> >  
> > +static void pid_file_cleanup(void)
> > +{
> > +unlink(pid_file);
> > +}
> > +
> > +static void pid_file_init(void)
> > +{
> > +Error *err = NULL;
> > +
> > +if (!pid_file) {
> > +return;
> > +}
> > +
> > +if (!qemu_write_pidfile(pid_file, )) {
> > +error_reportf_err(err, "cannot 

Re: [PATCH] qemu-storage-daemon: add --pidfile option

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 04:08:57PM +, Stefan Hajnoczi wrote:
> Daemons often have a --pidfile option where the pid is written to a file
> so that scripts can stop the daemon by sending a signal.
> 
> The pid file also acts as a lock to prevent multiple instances of the
> daemon from launching for a given pid file.
> 
> QEMU, qemu-nbd, qemu-ga, virtiofsd, and qemu-pr-helper all support the
> --pidfile option. Add it to qemu-storage-daemon too.
> 
> Reported-by: Richard W.M. Jones 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/tools/qemu-storage-daemon.rst   | 10 ++
>  storage-daemon/qemu-storage-daemon.c | 29 
>  2 files changed, 39 insertions(+)
> 
> diff --git a/docs/tools/qemu-storage-daemon.rst 
> b/docs/tools/qemu-storage-daemon.rst
> index f63627eaf6..8f4ab16ffc 100644
> --- a/docs/tools/qemu-storage-daemon.rst
> +++ b/docs/tools/qemu-storage-daemon.rst
> @@ -118,6 +118,16 @@ Standard options:
>List object properties with ``,help``. See the :manpage:`qemu(1)`
>manual page for a description of the object properties.
>  
> +.. option:: --pidfile PATH
> +
> +  is the path to a file where the daemon writes its pid. This allows scripts 
> to
> +  stop the daemon by sending a signal::
> +
> +$ kill -SIGTERM $( +
> +  A file lock is applied to the file so only one instance of the daemon can 
> run
> +  with a given pid file path. The daemon unlinks its pid file when 
> terminating.

Usually a pidfile wants to have some explicit synchronization rules
defined. AFAICS, qsd doesn't have a --daemonize option to sync against.
If we're using the FD passing trick for the monitor, however, we want
a guarantee that the pidfile is written before the monitor accepts the
first client.

IOW, the parent process needs to know that once it has done the QMP
handshake, there is guaranteed tobe a pidfile present, or if the
QMP handshake fails, then the app is guaranteed to have quit.

IIUC, this impl should be ok in this respect, because we won't process
the QMP handdshake until the main loop runs, at which point the pidfile
is present. So we just need to document that the pidfile is guaranteed
to be written by the time QMP is active.


> +
>  Examples
>  
>  Launch the daemon with QMP monitor socket ``qmp.sock`` so clients can execute
> diff --git a/storage-daemon/qemu-storage-daemon.c 
> b/storage-daemon/qemu-storage-daemon.c
> index 9021a46b3a..011ae49ac3 100644
> --- a/storage-daemon/qemu-storage-daemon.c
> +++ b/storage-daemon/qemu-storage-daemon.c
> @@ -59,6 +59,7 @@
>  #include "sysemu/runstate.h"
>  #include "trace/control.h"
>  
> +static const char *pid_file;
>  static volatile bool exit_requested = false;
>  
>  void qemu_system_killed(int signal, pid_t pid)
> @@ -126,6 +127,7 @@ enum {
>  OPTION_MONITOR,
>  OPTION_NBD_SERVER,
>  OPTION_OBJECT,
> +OPTION_PIDFILE,
>  };
>  
>  extern QemuOptsList qemu_chardev_opts;
> @@ -164,6 +166,7 @@ static void process_options(int argc, char *argv[])
>  {"monitor", required_argument, NULL, OPTION_MONITOR},
>  {"nbd-server", required_argument, NULL, OPTION_NBD_SERVER},
>  {"object", required_argument, NULL, OPTION_OBJECT},
> +{"pidfile", required_argument, NULL, OPTION_PIDFILE},
>  {"trace", required_argument, NULL, 'T'},
>  {"version", no_argument, NULL, 'V'},
>  {0, 0, 0, 0}
> @@ -275,6 +278,9 @@ static void process_options(int argc, char *argv[])
>  qobject_unref(args);
>  break;
>  }
> +case OPTION_PIDFILE:
> +pid_file = optarg;
> +break;
>  default:
>  g_assert_not_reached();
>  }
> @@ -285,6 +291,27 @@ static void process_options(int argc, char *argv[])
>  }
>  }
>  
> +static void pid_file_cleanup(void)
> +{
> +unlink(pid_file);
> +}
> +
> +static void pid_file_init(void)
> +{
> +Error *err = NULL;
> +
> +if (!pid_file) {
> +return;
> +}
> +
> +if (!qemu_write_pidfile(pid_file, )) {
> +error_reportf_err(err, "cannot create PID file: ");
> +exit(EXIT_FAILURE);
> +}
> +
> +atexit(pid_file_cleanup);
> +}
> +
>  int main(int argc, char *argv[])
>  {
>  #ifdef CONFIG_POSIX
> @@ -312,6 +339,8 @@ int main(int argc, char *argv[])
>  qemu_init_main_loop(_fatal);
>  process_options(argc, argv);
>  
> +pid_file_init();
> +
>  while (!exit_requested) {
>  main_loop_wait(false);
>  }
> -- 
> 2.29.2
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH RFC v3 12/12] hw/block/nvme: add support for the format nvm command

2021-03-01 Thread Keith Busch
On Wed, Feb 17, 2021 at 09:26:37AM +0100, Klaus Jensen wrote:
> On Feb 16 15:16, Keith Busch wrote:
> > On Mon, Feb 15, 2021 at 12:02:40AM +0100, Klaus Jensen wrote:
> > > From: Minwoo Im 
> > > 
> > > Format NVM admin command can make a namespace or namespaces to be
> > > with different LBA size and metadata size with protection information
> > > types.
> > > 
> > > This patch introduces Format NVM command with LBA format, Metadata, and
> > > Protection Information for the device. The secure erase operation things
> > > are yet to be added.
> > > 
> > > The parameter checks inside of this patch has been referred from
> > > Keith's old branch.
> > 
> > Oh, and here's the format command now, so my previous comment on patch
> > 11 doesn't matter.
> > 
> > > +struct nvme_aio_format_ctx {
> > > +NvmeRequest   *req;
> > > +NvmeNamespace *ns;
> > > +
> > > +/* number of outstanding write zeroes for this namespace */
> > > +int *count;
> > 
> > Shouldn't this count be the NvmeRequest's opaque value?
> 
> That is already occupied by `num_formats` which tracks formats of
> individual namespaces. `count` is for outstanding write zeroes on one
> particular namespace.

But why are they tracked separately? It looks like we only care about
the number of outstanding zero-out commands. It doesn't matter how that
number is spread across multiple namespaces.



[PATCH] qemu-storage-daemon: add --pidfile option

2021-03-01 Thread Stefan Hajnoczi
Daemons often have a --pidfile option where the pid is written to a file
so that scripts can stop the daemon by sending a signal.

The pid file also acts as a lock to prevent multiple instances of the
daemon from launching for a given pid file.

QEMU, qemu-nbd, qemu-ga, virtiofsd, and qemu-pr-helper all support the
--pidfile option. Add it to qemu-storage-daemon too.

Reported-by: Richard W.M. Jones 
Signed-off-by: Stefan Hajnoczi 
---
 docs/tools/qemu-storage-daemon.rst   | 10 ++
 storage-daemon/qemu-storage-daemon.c | 29 
 2 files changed, 39 insertions(+)

diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index f63627eaf6..8f4ab16ffc 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -118,6 +118,16 @@ Standard options:
   List object properties with ``,help``. See the :manpage:`qemu(1)`
   manual page for a description of the object properties.
 
+.. option:: --pidfile PATH
+
+  is the path to a file where the daemon writes its pid. This allows scripts to
+  stop the daemon by sending a signal::
+
+$ kill -SIGTERM $(

Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 09:49:21AM -0600, Eric Blake wrote:
> On 3/1/21 9:41 AM, Daniel P. Berrangé wrote:
> > On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> >> The QMP monitor, NBD server, and vhost-user-blk export all support file
> >> descriptor passing. This is a useful technique because it allows the
> >> parent process to spawn and wait for qemu-storage-daemon without busy
> >> waiting, which may delay startup due to arbitrary sleep() calls.
> >>
> >> This Python example is inspired by the test case written for libnbd by
> >> Richard W.M. Jones :
> >> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> >>
> >> Thanks to Daniel P. Berrangé  for suggestions on
> >> how to get this working. Now let's document it!
> >>
> 
> >> +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> > 
> > Example code inevitably gets cut+paste into real world apps, and this
> > example is a tmpfile CVE flaw. At least put it in $CWD instead.
> 
> Except $CWD may be too long for a sock file name to be created.
> Creating the sock in a securely-created subdirectory of /tmp is more
> reliable.

$XDG_RUNTIME_DIR then, which is /run/user/$UID, so safely per user on all
modern OS.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Eric Blake
On 3/1/21 9:41 AM, Daniel P. Berrangé wrote:
> On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
>> The QMP monitor, NBD server, and vhost-user-blk export all support file
>> descriptor passing. This is a useful technique because it allows the
>> parent process to spawn and wait for qemu-storage-daemon without busy
>> waiting, which may delay startup due to arbitrary sleep() calls.
>>
>> This Python example is inspired by the test case written for libnbd by
>> Richard W.M. Jones :
>> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
>>
>> Thanks to Daniel P. Berrangé  for suggestions on
>> how to get this working. Now let's document it!
>>

>> +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> 
> Example code inevitably gets cut+paste into real world apps, and this
> example is a tmpfile CVE flaw. At least put it in $CWD instead.

Except $CWD may be too long for a sock file name to be created.
Creating the sock in a securely-created subdirectory of /tmp is more
reliable.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 03:39:06PM +, Richard W.M. Jones wrote:
> On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> > The QMP monitor, NBD server, and vhost-user-blk export all support file
> > descriptor passing. This is a useful technique because it allows the
> > parent process to spawn and wait for qemu-storage-daemon without busy
> > waiting, which may delay startup due to arbitrary sleep() calls.
> > 
> > This Python example is inspired by the test case written for libnbd by
> > Richard W.M. Jones :
> > https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> > 
> > Thanks to Daniel P. Berrangé  for suggestions on
> > how to get this working. Now let's document it!
> > 
> > Reported-by: Richard W.M. Jones 
> > Cc: Kevin Wolf 
> > Cc: Daniel P. Berrangé 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  docs/tools/qemu-storage-daemon.rst | 38 --
> >  1 file changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/tools/qemu-storage-daemon.rst 
> > b/docs/tools/qemu-storage-daemon.rst
> > index f63627eaf6..45854c131e 100644
> > --- a/docs/tools/qemu-storage-daemon.rst
> > +++ b/docs/tools/qemu-storage-daemon.rst
> > @@ -101,10 +101,12 @@ Standard options:
> >  
> >  .. option:: --nbd-server 
> > addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
> >--nbd-server 
> > addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> > +  --nbd-server 
> > addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
> >  
> >is a server for NBD exports. Both TCP and UNIX domain sockets are 
> > supported.
> > -  TLS encryption can be configured using ``--object`` tls-creds-* and 
> > authz-*
> > -  secrets (see below).
> > +  A listen socket can be provided via file descriptor passing (see Examples
> > +  below). TLS encryption can be configured using ``--object`` tls-creds-* 
> > and
> > +  authz-* secrets (see below).
> >  
> >To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
> >  
> > @@ -127,6 +129,38 @@ QMP commands::
> >--chardev socket,path=qmp.sock,server,nowait,id=char1 \
> >--monitor chardev=char1
> >  
> > +Launch the daemon from Python with a QMP monitor socket using file 
> > descriptor
> > +passing so there is no need to busy wait for the QMP monitor to become
> > +available::
> > +
> > +  #!/usr/bin/env python3
> > +  import os
> > +  import subprocess
> > +  import socket
> > +
> > +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
> 
> Not sure how much you worry about the insecure / easily guessable tmp
> path here.  I notice that there's already one in the surrounding
> documentation (/tmp/nbd.sock) so maybe it's not a problem :-)
> 
> > +  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
> > +  listen_sock.bind(sock_path)
> > +  listen_sock.listen()
> > +
> > +  fd = listen_sock.fileno()
> > +
> > +  subprocess.Popen(
> > +  ['qemu-storage-daemon',
> > +   '--chardev', f'socket,fd={fd},server=on,id=char1',
> > +   '--monitor', 'chardev=char1'],
> > +  pass_fds=[fd],
> > +  )
> > +
> > +  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> > +  qmp_sock.connect(sock_path)
> 
> A note that the order of opening the sockets is slightly different
> from how I did it in the interop test.  But I believe it makes no
> difference, as long as you don't connect to the socket until it's in
> the listening state, which is what you're doing here.  So it should be
> fine.

Nothing here is closing listen_sock in the parent though.

The trick of passing the listener FD into the child relies on the
listener being closed in the parent, so that the parent can get
a socket error if the child exits abnormally during startup. Keeping
the listen socket open means the parent will wait forever for an
accept() that never comes.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Daniel P . Berrangé
On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> The QMP monitor, NBD server, and vhost-user-blk export all support file
> descriptor passing. This is a useful technique because it allows the
> parent process to spawn and wait for qemu-storage-daemon without busy
> waiting, which may delay startup due to arbitrary sleep() calls.
> 
> This Python example is inspired by the test case written for libnbd by
> Richard W.M. Jones :
> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> 
> Thanks to Daniel P. Berrangé  for suggestions on
> how to get this working. Now let's document it!
> 
> Reported-by: Richard W.M. Jones 
> Cc: Kevin Wolf 
> Cc: Daniel P. Berrangé 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/tools/qemu-storage-daemon.rst | 38 --
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/tools/qemu-storage-daemon.rst 
> b/docs/tools/qemu-storage-daemon.rst
> index f63627eaf6..45854c131e 100644
> --- a/docs/tools/qemu-storage-daemon.rst
> +++ b/docs/tools/qemu-storage-daemon.rst
> @@ -101,10 +101,12 @@ Standard options:
>  
>  .. option:: --nbd-server 
> addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
>--nbd-server 
> addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> +  --nbd-server 
> addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
>  
>is a server for NBD exports. Both TCP and UNIX domain sockets are 
> supported.
> -  TLS encryption can be configured using ``--object`` tls-creds-* and authz-*
> -  secrets (see below).
> +  A listen socket can be provided via file descriptor passing (see Examples
> +  below). TLS encryption can be configured using ``--object`` tls-creds-* and
> +  authz-* secrets (see below).
>  
>To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
>  
> @@ -127,6 +129,38 @@ QMP commands::
>--chardev socket,path=qmp.sock,server,nowait,id=char1 \
>--monitor chardev=char1
>  
> +Launch the daemon from Python with a QMP monitor socket using file descriptor
> +passing so there is no need to busy wait for the QMP monitor to become
> +available::
> +
> +  #!/usr/bin/env python3
> +  import os
> +  import subprocess
> +  import socket
> +
> +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())

Example code inevitably gets cut+paste into real world apps, and this
example is a tmpfile CVE flaw. At least put it in $CWD instead.

> +
> +  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
> +  listen_sock.bind(sock_path)
> +  listen_sock.listen()
> +
> +  fd = listen_sock.fileno()
> +
> +  subprocess.Popen(
> +  ['qemu-storage-daemon',
> +   '--chardev', f'socket,fd={fd},server=on,id=char1',
> +   '--monitor', 'chardev=char1'],
> +  pass_fds=[fd],
> +  )
> +
> +  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> +  qmp_sock.connect(sock_path)
> +  ...QMP interaction...
> +
> +The same socket spawning approach also works with the ``--nbd-server
> +addr.type=fd,addr.str=`` and ``--export
> +type=vhost-user-blk,addr.type=fd,addr.str=`` options.
> +
>  Export raw image file ``disk.img`` over NBD UNIX domain socket ``nbd.sock``::
>  
>$ qemu-storage-daemon \
> -- 
> 2.29.2
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Richard W.M. Jones
On Mon, Mar 01, 2021 at 03:31:59PM +, Stefan Hajnoczi wrote:
> The QMP monitor, NBD server, and vhost-user-blk export all support file
> descriptor passing. This is a useful technique because it allows the
> parent process to spawn and wait for qemu-storage-daemon without busy
> waiting, which may delay startup due to arbitrary sleep() calls.
> 
> This Python example is inspired by the test case written for libnbd by
> Richard W.M. Jones :
> https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543
> 
> Thanks to Daniel P. Berrangé  for suggestions on
> how to get this working. Now let's document it!
> 
> Reported-by: Richard W.M. Jones 
> Cc: Kevin Wolf 
> Cc: Daniel P. Berrangé 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/tools/qemu-storage-daemon.rst | 38 --
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/tools/qemu-storage-daemon.rst 
> b/docs/tools/qemu-storage-daemon.rst
> index f63627eaf6..45854c131e 100644
> --- a/docs/tools/qemu-storage-daemon.rst
> +++ b/docs/tools/qemu-storage-daemon.rst
> @@ -101,10 +101,12 @@ Standard options:
>  
>  .. option:: --nbd-server 
> addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
>--nbd-server 
> addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
> +  --nbd-server 
> addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
>  
>is a server for NBD exports. Both TCP and UNIX domain sockets are 
> supported.
> -  TLS encryption can be configured using ``--object`` tls-creds-* and authz-*
> -  secrets (see below).
> +  A listen socket can be provided via file descriptor passing (see Examples
> +  below). TLS encryption can be configured using ``--object`` tls-creds-* and
> +  authz-* secrets (see below).
>  
>To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
>  
> @@ -127,6 +129,38 @@ QMP commands::
>--chardev socket,path=qmp.sock,server,nowait,id=char1 \
>--monitor chardev=char1
>  
> +Launch the daemon from Python with a QMP monitor socket using file descriptor
> +passing so there is no need to busy wait for the QMP monitor to become
> +available::
> +
> +  #!/usr/bin/env python3
> +  import os
> +  import subprocess
> +  import socket
> +
> +  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())

Not sure how much you worry about the insecure / easily guessable tmp
path here.  I notice that there's already one in the surrounding
documentation (/tmp/nbd.sock) so maybe it's not a problem :-)

> +  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
> +  listen_sock.bind(sock_path)
> +  listen_sock.listen()
> +
> +  fd = listen_sock.fileno()
> +
> +  subprocess.Popen(
> +  ['qemu-storage-daemon',
> +   '--chardev', f'socket,fd={fd},server=on,id=char1',
> +   '--monitor', 'chardev=char1'],
> +  pass_fds=[fd],
> +  )
> +
> +  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> +  qmp_sock.connect(sock_path)

A note that the order of opening the sockets is slightly different
from how I did it in the interop test.  But I believe it makes no
difference, as long as you don't connect to the socket until it's in
the listening state, which is what you're doing here.  So it should be
fine.

> +  ...QMP interaction...
> +
> +The same socket spawning approach also works with the ``--nbd-server
> +addr.type=fd,addr.str=`` and ``--export
> +type=vhost-user-blk,addr.type=fd,addr.str=`` options.
> +
>  Export raw image file ``disk.img`` over NBD UNIX domain socket ``nbd.sock``::

The patch looks fine in general:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH v2 1/2] storage-daemon: report unexpected arguments on the fly

2021-03-01 Thread Eric Blake
On 3/1/21 9:28 AM, Paolo Bonzini wrote:
> If the first character of optstring is '-', then each nonoption argv
> element is handled as if it were the argument of an option with character
> code 1.  This removes the reordering of the argv array, and enables usage
> of loc_set_cmdline to provide better error messages.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  storage-daemon/qemu-storage-daemon.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Nice.  The man page for 'getopt_long' is unclear whether setting
POSIXLY_CORRECT in the environment would break this (that is, setting
POSIXLY_CORRECT has the same effect as a leading '+'; but you can't have
both leading '+' and leading '-' and when both are set, it is not clear
which one wins).  But that's a corner case that I don't think will ever
bite us in real life.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 5/5] hw/block/nvme: report non-mdts command size limit for dsm

2021-03-01 Thread Keith Busch
On Mon, Mar 01, 2021 at 12:15:26PM +0100, Klaus Jensen wrote:
> On Feb 22 22:12, Klaus Jensen wrote:
> > On Feb 23 05:55, Keith Busch wrote:
> > > On Mon, Feb 22, 2021 at 07:47:59PM +0100, Klaus Jensen wrote:
> > > > +typedef struct NvmeIdCtrlNvm {
> > > > +uint8_t vsl;
> > > > +uint8_t wzsl;
> > > > +uint8_t wusl;
> > > > +uint8_t dmrl;
> > > > +uint32_tdmrsl;
> > > > +uint64_tdmsl;
> > > > +uint8_t rsvd16[4080];
> > > > +} NvmeIdCtrlNvm;
> > > 
> > > TP 4040a still displays these fields with preceding '...' indicating
> > > something comes before this. Is that just left-over from the integration
> > > for TBD offsets, or is there something that still hasn't been accounted
> > > for?
> > 
> > Good question.
> > 
> > But since the TBDs have been assigned I believe it is just a left-over.
> > I must admit that I have not cross checked this with all other TPs, but
> > AFAIK this is the only ratified TP that adds something to the
> > NVM-specific identify controller data structure.
> 
> Are you otherwise OK with this?

Yes, I compared other TP's and it appears to be set for good once an
actual numeric value is assigned, so okay to go here.

Reviewed-by: Keith Busch 



[PATCH] docs: show how to spawn qemu-storage-daemon with fd passing

2021-03-01 Thread Stefan Hajnoczi
The QMP monitor, NBD server, and vhost-user-blk export all support file
descriptor passing. This is a useful technique because it allows the
parent process to spawn and wait for qemu-storage-daemon without busy
waiting, which may delay startup due to arbitrary sleep() calls.

This Python example is inspired by the test case written for libnbd by
Richard W.M. Jones :
https://gitlab.com/nbdkit/libnbd/-/commit/89113f484effb0e6c322314ba75c1cbe07a04543

Thanks to Daniel P. Berrangé  for suggestions on
how to get this working. Now let's document it!

Reported-by: Richard W.M. Jones 
Cc: Kevin Wolf 
Cc: Daniel P. Berrangé 
Signed-off-by: Stefan Hajnoczi 
---
 docs/tools/qemu-storage-daemon.rst | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index f63627eaf6..45854c131e 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -101,10 +101,12 @@ Standard options:
 
 .. option:: --nbd-server 
addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
   --nbd-server 
addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
+  --nbd-server 
addr.type=fd,addr.str=[,tls-creds=][,tls-authz=][,max-connections=]
 
   is a server for NBD exports. Both TCP and UNIX domain sockets are supported.
-  TLS encryption can be configured using ``--object`` tls-creds-* and authz-*
-  secrets (see below).
+  A listen socket can be provided via file descriptor passing (see Examples
+  below). TLS encryption can be configured using ``--object`` tls-creds-* and
+  authz-* secrets (see below).
 
   To configure an NBD server on UNIX domain socket path ``/tmp/nbd.sock``::
 
@@ -127,6 +129,38 @@ QMP commands::
   --chardev socket,path=qmp.sock,server,nowait,id=char1 \
   --monitor chardev=char1
 
+Launch the daemon from Python with a QMP monitor socket using file descriptor
+passing so there is no need to busy wait for the QMP monitor to become
+available::
+
+  #!/usr/bin/env python3
+  import os
+  import subprocess
+  import socket
+
+  sock_path = '/tmp/qmp-{}.sock'.format(os.getpid())
+
+  with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listen_sock:
+  listen_sock.bind(sock_path)
+  listen_sock.listen()
+
+  fd = listen_sock.fileno()
+
+  subprocess.Popen(
+  ['qemu-storage-daemon',
+   '--chardev', f'socket,fd={fd},server=on,id=char1',
+   '--monitor', 'chardev=char1'],
+  pass_fds=[fd],
+  )
+
+  qmp_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+  qmp_sock.connect(sock_path)
+  ...QMP interaction...
+
+The same socket spawning approach also works with the ``--nbd-server
+addr.type=fd,addr.str=`` and ``--export
+type=vhost-user-blk,addr.type=fd,addr.str=`` options.
+
 Export raw image file ``disk.img`` over NBD UNIX domain socket ``nbd.sock``::
 
   $ qemu-storage-daemon \
-- 
2.29.2



[PATCH v2 1/2] storage-daemon: report unexpected arguments on the fly

2021-03-01 Thread Paolo Bonzini
If the first character of optstring is '-', then each nonoption argv
element is handled as if it were the argument of an option with character
code 1.  This removes the reordering of the argv array, and enables usage
of loc_set_cmdline to provide better error messages.

Signed-off-by: Paolo Bonzini 
---
 storage-daemon/qemu-storage-daemon.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index 9021a46b3a..9aa82e7d96 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -174,7 +174,7 @@ static void process_options(int argc, char *argv[])
  * they are given on the command lines. This means that things must be
  * defined first before they can be referenced in another option.
  */
-while ((c = getopt_long(argc, argv, "hT:V", long_options, NULL)) != -1) {
+while ((c = getopt_long(argc, argv, "-hT:V", long_options, NULL)) != -1) {
 switch (c) {
 case '?':
 exit(EXIT_FAILURE);
@@ -275,14 +275,13 @@ static void process_options(int argc, char *argv[])
 qobject_unref(args);
 break;
 }
+case 1:
+error_report("Unexpected argument: %s", optarg);
+exit(EXIT_FAILURE);
 default:
 g_assert_not_reached();
 }
 }
-if (optind != argc) {
-error_report("Unexpected argument: %s", argv[optind]);
-exit(EXIT_FAILURE);
-}
 }
 
 int main(int argc, char *argv[])
-- 
2.26.2





[PATCH v2 0/2] storage-daemon: include current command line option in the errors

2021-03-01 Thread Paolo Bonzini
Use the location management facilities that the emulator uses, so that
the current command line option appears in the error message.

Before:

  $ storage-daemon/qemu-storage-daemon --nbd key..=
  qemu-storage-daemon: Invalid parameter 'key..'

After:

  $ storage-daemon/qemu-storage-daemon --nbd key..=
  qemu-storage-daemon: --nbd key..=: Invalid parameter 'key..'

The first patch tweaks the command line parsing so that argv is
not reordered by getopt_long and optind is only advanced by one
option+argument on every call to getopt_long.  This is required
by loc_set_cmdline.

Paolo Bonzini (2):
  storage-daemon: report unexpected arguments on the fly
  storage-daemon: include current command line option in the errors

 storage-daemon/qemu-storage-daemon.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

-- 
2.26.2




[PATCH v2 2/2] storage-daemon: include current command line option in the errors

2021-03-01 Thread Paolo Bonzini
Use the location management facilities that the emulator uses, so that
the current command line option appears in the error message.

Before:

  $ storage-daemon/qemu-storage-daemon --nbd key..=
  qemu-storage-daemon: Invalid parameter 'key..'

After:

  $ storage-daemon/qemu-storage-daemon --nbd key..=
  qemu-storage-daemon: --nbd key..=: Invalid parameter 'key..'

Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
---
 storage-daemon/qemu-storage-daemon.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index 9aa82e7d96..78ddf619d4 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -152,6 +152,20 @@ static void init_qmp_commands(void)
  qmp_marshal_qmp_capabilities, QCO_ALLOW_PRECONFIG);
 }
 
+static int getopt_set_loc(int argc, char **argv, const char *optstring,
+  const struct option *longopts)
+{
+int c, save_index;
+
+optarg = NULL;
+save_index = optind;
+c = getopt_long(argc, argv, optstring, longopts, NULL);
+if (optarg) {
+loc_set_cmdline(argv, save_index, MAX(1, optind - save_index));
+}
+return c;
+}
+
 static void process_options(int argc, char *argv[])
 {
 int c;
@@ -174,7 +188,7 @@ static void process_options(int argc, char *argv[])
  * they are given on the command lines. This means that things must be
  * defined first before they can be referenced in another option.
  */
-while ((c = getopt_long(argc, argv, "-hT:V", long_options, NULL)) != -1) {
+while ((c = getopt_set_loc(argc, argv, "-hT:V", long_options)) != -1) {
 switch (c) {
 case '?':
 exit(EXIT_FAILURE);
@@ -275,12 +289,13 @@ static void process_options(int argc, char *argv[])
 break;
 }
 case 1:
-error_report("Unexpected argument: %s", optarg);
+error_report("Unexpected argument");
 exit(EXIT_FAILURE);
 default:
 g_assert_not_reached();
 }
 }
+loc_set_none();
 }
 
 int main(int argc, char *argv[])
-- 
2.26.2




Re: [PATCH] storage-daemon: include current command line option in the errors

2021-03-01 Thread Paolo Bonzini

On 01/03/21 13:26, Kevin Wolf wrote:

This save_index approach isn't perfect because getopt may skip
non-option arguments and they will be included in the help text:

 $ build/storage-daemon/qemu-storage-daemon foo --object iothread
 qemu-storage-daemon: foo --object iothread: Parameter 'id' is missing

 $ build/storage-daemon/qemu-storage-daemon foo --object iothread,id=t
 qemu-storage-daemon: --object iothread,id=t foo: Unexpected argument: foo

However, without changing the interface of loc_set_cmdline(), getting
the right index seems hard. Not sure what is the best way for fixing
this or if it's worth the effort.


We can do better by passing "-hT:V" to getopt_long.  Then each 
non-option argument is returned directly, and everything works as 
getopt_long no longer needs to reorder argv.


Paolo




[PATCH v4 10/12] hw/block/nvme: add non-mdts command size limit for verify

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

Verify is not subject to MDTS, so a single Verify command may result in
excessive amounts of allocated memory. Impose a limit on the data size
by adding support for TP 4040 ("Non-MDTS Command Size Limits").

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h  |  1 +
 include/block/nvme.h |  5 +
 hw/block/nvme.c  | 49 +++-
 3 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 2afbece68b87..5154196ad5a3 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -20,6 +20,7 @@ typedef struct NvmeParams {
 uint8_t  aerl;
 uint32_t aer_max_queued;
 uint8_t  mdts;
+uint8_t  vsl;
 bool use_intel_id;
 uint8_t  zasl;
 bool legacy_cmb;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index c2fd7e817e5d..ec5262d17e12 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1042,6 +1042,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl {
 uint8_t vs[1024];
 } NvmeIdCtrl;
 
+typedef struct NvmeIdCtrlNvm {
+uint8_t vsl;
+uint8_t rsvd1[4095];
+} NvmeIdCtrlNvm;
+
 typedef struct NvmeIdCtrlZoned {
 uint8_t zasl;
 uint8_t rsvd1[4095];
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 0bf667f824ce..beaf7f850bd3 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -22,7 +22,8 @@
  *  [pmrdev=,] \
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
- *  mdts=,zoned.append_size_limit=, \
+ *  mdts=,vsl=, \
+ *  zoned.append_size_limit=, \
  *  subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -69,12 +70,26 @@
  *   as a power of two (2^n) and is in units of the minimum memory page size
  *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
  *
+ * - `vsl`
+ *   Indicates the maximum data size limit for the Verify command. Like `mdts`,
+ *   this value is specified as a power of two (2^n) and is in units of the
+ *   minimum memory page size (CAP.MPSMIN). The default value is 7 (i.e. 512
+ *   KiB).
+ *
  * - `zoned.zasl`
  *   Indicates the maximum data transfer size for the Zone Append command. Like
  *   `mdts`, the value is specified as a power of two (2^n) and is in units of
  *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
  *   defaulting to the value of `mdts`).
  *
+ * - `zoned.append_size_limit`
+ *   The maximum I/O size in bytes that is allowed in Zone Append command.
+ *   The default is 128KiB. Since internally this this value is maintained as
+ *   ZASL = log2( / ), some values assigned
+ *   to this property may be rounded down and result in a lower maximum ZA
+ *   data size being in effect. By setting this property to 0, users can make
+ *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
+ *
  * nvme namespace device parameters
  * 
  * - `subsys`
@@ -2505,6 +2520,10 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest 
*req)
 }
 }
 
+if (len > n->page_size << n->params.vsl) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
 status = nvme_check_bounds(ns, slba, nlb);
 if (status) {
 trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
@@ -4022,19 +4041,24 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, 
NvmeRequest *req)
 static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
-NvmeIdCtrlZoned id = {};
+uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {};
 
 trace_pci_nvme_identify_ctrl_csi(c->csi);
 
-if (c->csi == NVME_CSI_NVM) {
-return nvme_rpt_empty_id_struct(n, req);
-} else if (c->csi == NVME_CSI_ZONED) {
-id.zasl = n->params.zasl;
+switch (c->csi) {
+case NVME_CSI_NVM:
+((NvmeIdCtrlNvm *))->vsl = n->params.vsl;
+break;
 
-return nvme_c2h(n, (uint8_t *), sizeof(id), req);
+case NVME_CSI_ZONED:
+((NvmeIdCtrlZoned *))->zasl = n->params.zasl;
+break;
+
+default:
+return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-return NVME_INVALID_FIELD | NVME_DNR;
+return nvme_c2h(n, id, sizeof(id), req);
 }
 
 static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
@@ -5418,6 +5442,11 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
"than or equal to mdts (Maximum Data Transfer Size)");
 return;
 }
+
+if (!n->params.vsl) {
+error_setg(errp, "vsl must be non-zero");
+return;
+}
 }
 
 static void nvme_init_state(NvmeCtrl *n)
@@ -5640,8 +5669,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->nn = cpu_to_le32(n->num_namespaces);
 id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
NVME_ONCS_FEATURES | NVME_ONCS_DSM |
-   

[PATCH v4 12/12] hw/block/nvme: add support for the format nvm command

2021-03-01 Thread Klaus Jensen
From: Minwoo Im 

Format NVM admin command can make a namespace or namespaces to be
with different LBA size and metadata size with protection information
types.

This patch introduces Format NVM command with LBA format, Metadata, and
Protection Information for the device. The secure erase operation things
are yet to be added.

The parameter checks inside of this patch has been referred from
Keith's old branch.

Signed-off-by: Minwoo Im 
[anaidu.gollu: rebased on e2e]
Signed-off-by: Gollu Appalanaidu 
[k.jensen: rebased for reworked aio tracking]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h|   6 ++
 hw/block/nvme.h   |   1 +
 include/block/nvme.h  |   1 +
 hw/block/nvme-ns.c|   1 +
 hw/block/nvme.c   | 167 +-
 hw/block/trace-events |   3 +
 6 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 5a41522a4b33..94b97595ff4e 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -58,6 +58,7 @@ typedef struct NvmeNamespace {
 NvmeIdNs id_ns;
 const uint32_t *iocs;
 uint8_t  csi;
+uint16_t status;
 
 NvmeSubsystem   *subsys;
 
@@ -82,6 +83,11 @@ typedef struct NvmeNamespace {
 } features;
 } NvmeNamespace;
 
+static inline uint16_t nvme_ns_status(NvmeNamespace *ns)
+{
+return ns->status;
+}
+
 static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 {
 if (ns) {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 5154196ad5a3..e9f6bba2e788 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -80,6 +80,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
 case NVME_ADM_CMD_SET_FEATURES: return "NVME_ADM_CMD_SET_FEATURES";
 case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES";
 case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ";
+case NVME_ADM_CMD_FORMAT_NVM:   return "NVME_ADM_CMD_FORMAT_NVM";
 default:return "NVME_ADM_CMD_UNKNOWN";
 }
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index ec5262d17e12..2f21cd60cea3 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -825,6 +825,7 @@ enum NvmeStatusCodes {
 NVME_CAP_EXCEEDED   = 0x0081,
 NVME_NS_NOT_READY   = 0x0082,
 NVME_NS_RESV_CONFLICT   = 0x0083,
+NVME_FORMAT_IN_PROGRESS = 0x0084,
 NVME_INVALID_CQID   = 0x0100,
 NVME_INVALID_QID= 0x0101,
 NVME_MAX_QSIZE_EXCEEDED = 0x0102,
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 9aa9de335348..6ddccdb38dcf 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -102,6 +102,7 @@ lbaf_found:
 ns->mdata_offset = nvme_l2b(ns, nlbas);
 
 ns->csi = NVME_CSI_NVM;
+ns->status = 0x0;
 
 /* no thin provisioning */
 id_ns->ncap = id_ns->nsze;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index beaf7f850bd3..9f55ac1ae3da 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -196,6 +196,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
 static const uint32_t nvme_cse_iocs_none[256];
@@ -1831,6 +1832,42 @@ out:
 nvme_rw_complete_cb(req, ret);
 }
 
+struct nvme_aio_format_ctx {
+NvmeRequest   *req;
+NvmeNamespace *ns;
+
+/* number of outstanding write zeroes for this namespace */
+int *count;
+};
+
+static void nvme_aio_format_cb(void *opaque, int ret)
+{
+struct nvme_aio_format_ctx *ctx = opaque;
+NvmeRequest *req = ctx->req;
+NvmeNamespace *ns = ctx->ns;
+uintptr_t *num_formats = (uintptr_t *)>opaque;
+int *count = ctx->count;
+
+g_free(ctx);
+
+if (ret) {
+nvme_aio_err(req, ret);
+}
+
+if (--(*count)) {
+return;
+}
+
+g_free(count);
+ns->status = 0x0;
+
+if (--(*num_formats)) {
+return;
+}
+
+nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
 struct nvme_aio_flush_ctx {
 NvmeRequest *req;
 NvmeNamespace   *ns;
@@ -3514,6 +3551,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uint16_t status;
 
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
   req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
@@ -3555,6 +3593,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_OPCODE | NVME_DNR;
 }
 
+status = nvme_ns_status(req->ns);
+if (unlikely(status)) {
+return status;
+}
+
 switch (req->cmd.opcode) {
 case NVME_CMD_WRITE_ZEROES:
 return nvme_write_zeroes(n, req);
@@ -4670,6 +4713,126 @@ static uint16_t 

[PATCH v4 11/12] hw/block/nvme: support multiple lba formats

2021-03-01 Thread Klaus Jensen
From: Minwoo Im 

This patch introduces multiple LBA formats supported with the typical
logical block sizes of 512 bytes and 4096 bytes as well as metadata
sizes of 0, 8, 16 and 64 bytes. The format will be chosed based on the
lbads and ms parameters of the nvme-ns device.

Signed-off-by: Minwoo Im 
[k.jensen: resurrected and rebased]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.c | 60 +++---
 1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index f50e094c3d98..9aa9de335348 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -36,13 +36,15 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 {
 BlockDriverInfo bdi;
 NvmeIdNs *id_ns = >id_ns;
-int lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
 int npdg, nlbas;
+uint8_t ds;
+uint16_t ms;
+int i;
 
 ns->id_ns.dlfeat = 0x1;
 
-id_ns->lbaf[lba_index].ds = 31 - clz32(ns->blkconf.logical_block_size);
-id_ns->lbaf[lba_index].ms = ns->params.ms;
+ds = 31 - clz32(ns->blkconf.logical_block_size);
+ms = ns->params.ms;
 
 if (ns->params.ms) {
 id_ns->mc = 0x3;
@@ -53,8 +55,47 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->dpc = 0x1f;
 id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
+
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9   },
+[1] = { .ds =  9, .ms =  8 },
+[2] = { .ds =  9, .ms = 16 },
+[3] = { .ds =  9, .ms = 64 },
+[4] = { .ds = 12   },
+[5] = { .ds = 12, .ms =  8 },
+[6] = { .ds = 12, .ms = 16 },
+[7] = { .ds = 12, .ms = 64 },
+};
+
+memcpy(_ns->lbaf, , sizeof(lbaf));
+id_ns->nlbaf = 7;
+} else {
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9 },
+[1] = { .ds = 12 },
+};
+
+memcpy(_ns->lbaf, , sizeof(lbaf));
+id_ns->nlbaf = 1;
 }
 
+for (i = 0; i <= id_ns->nlbaf; i++) {
+NvmeLBAF *lbaf = _ns->lbaf[i];
+if (lbaf->ds == ds) {
+if (lbaf->ms == ms) {
+id_ns->flbas |= i;
+goto lbaf_found;
+}
+}
+}
+
+/* add non-standard lba format */
+id_ns->nlbaf++;
+id_ns->lbaf[id_ns->nlbaf].ds = ds;
+id_ns->lbaf[id_ns->nlbaf].ms = ms;
+id_ns->flbas |= id_ns->nlbaf;
+
+lbaf_found:
 nlbas = nvme_ns_nlbas(ns);
 
 id_ns->nsze = cpu_to_le64(nlbas);
@@ -244,9 +285,10 @@ static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 }
 }
 
-static void nvme_ns_init_zoned(NvmeNamespace *ns, int lba_index)
+static void nvme_ns_init_zoned(NvmeNamespace *ns)
 {
 NvmeIdNsZoned *id_ns_z;
+int i;
 
 nvme_ns_zoned_init_state(ns);
 
@@ -258,9 +300,11 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns, int 
lba_index)
 id_ns_z->zoc = 0;
 id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;
 
-id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size);
-id_ns_z->lbafe[lba_index].zdes =
-ns->params.zd_extension_size >> 6; /* Units of 64B */
+for (i = 0; i <= ns->id_ns.nlbaf; i++) {
+id_ns_z->lbafe[i].zsze = cpu_to_le64(ns->zone_size);
+id_ns_z->lbafe[i].zdes =
+ns->params.zd_extension_size >> 6; /* Units of 64B */
+}
 
 ns->csi = NVME_CSI_ZONED;
 ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
@@ -367,7 +411,7 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 if (nvme_ns_zoned_check_calc_geometry(ns, errp) != 0) {
 return -1;
 }
-nvme_ns_init_zoned(ns, 0);
+nvme_ns_init_zoned(ns);
 }
 
 return 0;
-- 
2.30.1




[PATCH v4 07/12] hw/block/nvme: add metadata support

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

Add support for metadata in the form of extended logical blocks as well
as a separate buffer of data. The new `ms` nvme-ns device parameter
specifies the size of metadata per logical block in bytes. The `mset`
nvme-ns device parameter controls whether metadata is transfered as part
of an extended lba (set to '1') or in a separate buffer (set to '0',
the default).

Regardsless of the scheme chosen with `mset`, metadata is stored at the
end of the namespace backing block device. This requires the user
provided PRP/SGLs to be walked and "split" into data and metadata
scatter/gather lists if the extended logical block scheme is used, but
has the advantage of not breaking the deallocated blocks support.

Co-authored-by: Gollu Appalanaidu 
Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.h|  39 ++-
 hw/block/nvme-ns.c|  18 +-
 hw/block/nvme.c   | 642 --
 hw/block/trace-events |   4 +-
 4 files changed, 610 insertions(+), 93 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7af6884862b5..2281fd39930a 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,9 @@ typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 QemuUUID uuid;
 
+uint16_t ms;
+uint8_t  mset;
+
 uint16_t mssrl;
 uint32_t mcl;
 uint8_t  msrc;
@@ -47,6 +50,7 @@ typedef struct NvmeNamespace {
 BlockConfblkconf;
 int32_t  bootindex;
 int64_t  size;
+int64_t  mdata_offset;
 NvmeIdNs id_ns;
 const uint32_t *iocs;
 uint8_t  csi;
@@ -99,18 +103,41 @@ static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
 return nvme_ns_lbaf(ns)->ds;
 }
 
-/* calculate the number of LBAs that the namespace can accomodate */
-static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
-{
-return ns->size >> nvme_ns_lbads(ns);
-}
-
 /* convert an LBA to the equivalent in bytes */
 static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
 {
 return lba << nvme_ns_lbads(ns);
 }
 
+static inline size_t nvme_lsize(NvmeNamespace *ns)
+{
+return 1 << nvme_ns_lbads(ns);
+}
+
+static inline uint16_t nvme_msize(NvmeNamespace *ns)
+{
+return nvme_ns_lbaf(ns)->ms;
+}
+
+static inline size_t nvme_m2b(NvmeNamespace *ns, uint64_t lba)
+{
+return nvme_msize(ns) * lba;
+}
+
+static inline bool nvme_ns_ext(NvmeNamespace *ns)
+{
+return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas);
+}
+
+/* calculate the number of LBAs that the namespace can accomodate */
+static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
+{
+if (ns->params.ms) {
+return ns->size / (nvme_lsize(ns) + nvme_msize(ns));
+}
+return ns->size >> nvme_ns_lbads(ns);
+}
+
 typedef struct NvmeCtrl NvmeCtrl;
 
 static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone)
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 0e8760020483..d0c79318aad7 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -37,13 +37,25 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 BlockDriverInfo bdi;
 NvmeIdNs *id_ns = >id_ns;
 int lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
-int npdg;
+int npdg, nlbas;
 
 ns->id_ns.dlfeat = 0x9;
 
 id_ns->lbaf[lba_index].ds = 31 - clz32(ns->blkconf.logical_block_size);
+id_ns->lbaf[lba_index].ms = ns->params.ms;
 
-id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
+if (ns->params.ms) {
+id_ns->mc = 0x3;
+
+if (ns->params.mset) {
+id_ns->flbas |= 0x10;
+}
+}
+
+nlbas = nvme_ns_nlbas(ns);
+
+id_ns->nsze = cpu_to_le64(nlbas);
+ns->mdata_offset = nvme_l2b(ns, nlbas);
 
 ns->csi = NVME_CSI_NVM;
 
@@ -401,6 +413,8 @@ static Property nvme_ns_props[] = {
  NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+DEFINE_PROP_UINT16("ms", NvmeNamespace, params.ms, 0),
+DEFINE_PROP_UINT8("mset", NvmeNamespace, params.mset, 0),
 DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
 DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
 DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9b5c8de115ea..71bf550a25e6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -343,6 +343,26 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 return pci_dma_read(>parent_obj, addr, buf, size);
 }
 
+static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+{
+hwaddr hi = addr + size - 1;
+if (hi < addr) {
+return 1;
+}
+
+if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
+memcpy(nvme_addr_to_cmb(n, addr), buf, size);
+return 0;
+}
+
+if (nvme_addr_is_pmr(n, addr) && nvme_addr_is_pmr(n, hi)) {
+   

[PATCH v4 08/12] hw/block/nvme: end-to-end data protection

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

Add support for namespaces formatted with protection information. The
type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is
selected with the `pi` nvme-ns device parameter. If the number of
metadata bytes is larger than 8, the `pil` nvme-ns device parameter may
be used to control the location of the 8-byte DIF tuple. The default
`pil` value of '0', causes the DIF tuple to be transferred as the last
8 bytes of the metadata. Set to 1 to store this in the first eight bytes
instead.

Co-authored-by: Gollu Appalanaidu 
Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-dif.h   |  51 +
 hw/block/nvme-ns.h|   4 +
 hw/block/nvme.h   |  31 +++
 include/block/nvme.h  |  26 ++-
 hw/block/nvme-dif.c   | 513 ++
 hw/block/nvme-ns.c|  13 +-
 hw/block/nvme.c   | 257 +
 hw/block/meson.build  |   2 +-
 hw/block/trace-events |  11 +
 9 files changed, 861 insertions(+), 47 deletions(-)
 create mode 100644 hw/block/nvme-dif.h
 create mode 100644 hw/block/nvme-dif.c

diff --git a/hw/block/nvme-dif.h b/hw/block/nvme-dif.h
new file mode 100644
index ..793829782c9d
--- /dev/null
+++ b/hw/block/nvme-dif.h
@@ -0,0 +1,51 @@
+#ifndef HW_NVME_DIF_H
+#define HW_NVME_DIF_H
+
+/* from Linux kernel (crypto/crct10dif_common.c) */
+static const uint16_t t10_dif_crc_table[256] = {
+0x, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
+uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint16_t ctrl, uint64_t slba,
+   uint32_t reftag);
+void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t *buf, size_t len,
+ uint8_t *mbuf, size_t mlen, uint16_t apptag,
+ uint32_t reftag);
+uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
+uint8_t *mbuf, size_t mlen, uint16_t ctrl,
+uint64_t slba, uint16_t apptag,
+uint16_t appmask, uint32_t reftag);
+uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req);
+
+#endif /* HW_NVME_DIF_H */
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2281fd39930a..5a41522a4b33 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -15,6 +15,8 @@
 #ifndef NVME_NS_H
 #define NVME_NS_H
 
+#include "qemu/uuid.h"
+
 #define TYPE_NVME_NS "nvme-ns"
 #define NVME_NS(obj) \
 OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
@@ -31,6 +33,8 @@ typedef struct NvmeNamespaceParams {
 
 uint16_t ms;
 uint8_t  mset;
+uint8_t  pi;
+uint8_t  pil;
 
 uint16_t mssrl;
 uint32_t mcl;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 9e0b56f41ea8..fe5bb11131cf 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 

[PATCH v4 04/12] hw/block/nvme: try to deal with the iov/qsg duality

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
haunts all the memory-related functions.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h |  17 -
 hw/block/nvme.c | 191 ++--
 2 files changed, 117 insertions(+), 91 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index f45ace0cff5b..9e0b56f41ea8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -29,6 +29,20 @@ typedef struct NvmeAsyncEvent {
 NvmeAerResult result;
 } NvmeAsyncEvent;
 
+enum {
+NVME_SG_ALLOC = 1 << 0,
+NVME_SG_DMA   = 1 << 1,
+};
+
+typedef struct NvmeSg {
+int flags;
+
+union {
+QEMUSGList   qsg;
+QEMUIOVector iov;
+};
+} NvmeSg;
+
 typedef struct NvmeRequest {
 struct NvmeSQueue   *sq;
 struct NvmeNamespace*ns;
@@ -38,8 +52,7 @@ typedef struct NvmeRequest {
 NvmeCqe cqe;
 NvmeCmd cmd;
 BlockAcctCookie acct;
-QEMUSGList  qsg;
-QEMUIOVectoriov;
+NvmeSg  sg;
 QTAILQ_ENTRY(NvmeRequest)entry;
 } NvmeRequest;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ed6068d1306d..ae411f04752b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -432,15 +432,31 @@ static void nvme_req_clear(NvmeRequest *req)
 req->status = NVME_SUCCESS;
 }
 
-static void nvme_req_exit(NvmeRequest *req)
+static inline void nvme_sg_init(NvmeCtrl *n, NvmeSg *sg, bool dma)
 {
-if (req->qsg.sg) {
-qemu_sglist_destroy(>qsg);
+if (dma) {
+pci_dma_sglist_init(>qsg, >parent_obj, 0);
+sg->flags = NVME_SG_DMA;
+} else {
+qemu_iovec_init(>iov, 0);
 }
 
-if (req->iov.iov) {
-qemu_iovec_destroy(>iov);
+sg->flags |= NVME_SG_ALLOC;
+}
+
+static inline void nvme_sg_unmap(NvmeSg *sg)
+{
+if (!(sg->flags & NVME_SG_ALLOC)) {
+return;
 }
+
+if (sg->flags & NVME_SG_DMA) {
+qemu_sglist_destroy(>qsg);
+} else {
+qemu_iovec_destroy(>iov);
+}
+
+memset(sg, 0x0, sizeof(*sg));
 }
 
 static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
@@ -477,8 +493,7 @@ static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector 
*iov, hwaddr addr,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
-  hwaddr addr, size_t len)
+static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len)
 {
 bool cmb = false, pmr = false;
 
@@ -495,38 +510,31 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList 
*qsg, QEMUIOVector *iov,
 }
 
 if (cmb || pmr) {
-if (qsg && qsg->sg) {
+if (sg->flags & NVME_SG_DMA) {
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
-assert(iov);
-
-if (!iov->iov) {
-qemu_iovec_init(iov, 1);
-}
-
 if (cmb) {
-return nvme_map_addr_cmb(n, iov, addr, len);
+return nvme_map_addr_cmb(n, >iov, addr, len);
 } else {
-return nvme_map_addr_pmr(n, iov, addr, len);
+return nvme_map_addr_pmr(n, >iov, addr, len);
 }
 }
 
-if (iov && iov->iov) {
+if (!(sg->flags & NVME_SG_DMA)) {
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
-assert(qsg);
-
-if (!qsg->sg) {
-pci_dma_sglist_init(qsg, >parent_obj, 1);
-}
-
-qemu_sglist_add(qsg, addr, len);
+qemu_sglist_add(>qsg, addr, len);
 
 return NVME_SUCCESS;
 }
 
+static inline bool nvme_addr_is_dma(NvmeCtrl *n, hwaddr addr)
+{
+return !(nvme_addr_is_cmb(n, addr) || nvme_addr_is_pmr(n, addr));
+}
+
 static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
  uint32_t len, NvmeRequest *req)
 {
@@ -536,20 +544,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 uint16_t status;
 int ret;
 
-QEMUSGList *qsg = >qsg;
-QEMUIOVector *iov = >iov;
-
 trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
 
-if (nvme_addr_is_cmb(n, prp1) || (nvme_addr_is_pmr(n, prp1))) {
-qemu_iovec_init(iov, num_prps);
-} else {
-pci_dma_sglist_init(qsg, >parent_obj, num_prps);
-}
+nvme_sg_init(n, >sg, nvme_addr_is_dma(n, prp1));
 
-status = nvme_map_addr(n, qsg, iov, prp1, trans_len);
+status = nvme_map_addr(n, >sg, prp1, trans_len);
 if (status) {
-return status;
+goto unmap;
 }
 
 len -= trans_len;
@@ -564,7 +565,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
 if (ret) {
 trace_pci_nvme_err_addr_read(prp2);
-return NVME_DATA_TRAS_ERROR;
+status = NVME_DATA_TRAS_ERROR;
+goto unmap;
 }
 

[PATCH v4 03/12] hw/block/nvme: fix strerror printing

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

Fix missing sign inversion.

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 8244909562a2..ed6068d1306d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1155,7 +1155,7 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 break;
 }
 
-trace_pci_nvme_err_aio(nvme_cid(req), strerror(ret), status);
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
 
 error_setg_errno(_err, -ret, "aio failed");
 error_report_err(local_err);
-- 
2.30.1




[PATCH v4 09/12] hw/block/nvme: add verify command

2021-03-01 Thread Klaus Jensen
From: Gollu Appalanaidu 

See NVM Express 1.4, section 6.14 ("Verify Command").

Signed-off-by: Gollu Appalanaidu 
[k.jensen: rebased, refactored for e2e]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-dif.h   |   2 +
 hw/block/nvme.h   |   1 +
 include/block/nvme.h  |   2 +
 hw/block/nvme-dif.c   |   4 +-
 hw/block/nvme.c   | 147 +-
 hw/block/trace-events |   3 +
 6 files changed, 156 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme-dif.h b/hw/block/nvme-dif.h
index 793829782c9d..5a8e37c8525b 100644
--- a/hw/block/nvme-dif.h
+++ b/hw/block/nvme-dif.h
@@ -39,6 +39,8 @@ static const uint16_t t10_dif_crc_table[256] = {
 
 uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint16_t ctrl, uint64_t slba,
uint32_t reftag);
+uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t *mbuf, size_t mlen,
+   uint64_t slba);
 void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t *buf, size_t len,
  uint8_t *mbuf, size_t mlen, uint16_t apptag,
  uint32_t reftag);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index fe5bb11131cf..2afbece68b87 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -92,6 +92,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_COMPARE:  return "NVME_NVM_CMD_COMPARE";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
 case NVME_CMD_DSM:  return "NVME_NVM_CMD_DSM";
+case NVME_CMD_VERIFY:   return "NVME_NVM_CMD_VERIFY";
 case NVME_CMD_COPY: return "NVME_NVM_CMD_COPY";
 case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_MGMT_SEND";
 case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_MGMT_RECV";
diff --git a/include/block/nvme.h b/include/block/nvme.h
index a7debf29c644..c2fd7e817e5d 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -579,6 +579,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_VERIFY = 0x0c,
 NVME_CMD_COPY   = 0x19,
 NVME_CMD_ZONE_MGMT_SEND = 0x79,
 NVME_CMD_ZONE_MGMT_RECV = 0x7a,
@@ -1060,6 +1061,7 @@ enum NvmeIdCtrlOncs {
 NVME_ONCS_FEATURES  = 1 << 4,
 NVME_ONCS_RESRVATIONS   = 1 << 5,
 NVME_ONCS_TIMESTAMP = 1 << 6,
+NVME_ONCS_VERIFY= 1 << 7,
 NVME_ONCS_COPY  = 1 << 8,
 };
 
diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index d7154d302ab0..4df411a2bb18 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -162,8 +162,8 @@ uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, 
size_t len,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t *mbuf,
-  size_t mlen, uint64_t slba)
+uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t *mbuf, size_t mlen,
+   uint64_t slba)
 {
 BlockBackend *blk = ns->blkconf.blk;
 BlockDriverState *bs = blk_bs(blk);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b88b5c956178..0bf667f824ce 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -191,6 +191,7 @@ static const uint32_t nvme_cse_iocs_nvm[256] = {
 [NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_DSM]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_VERIFY]   = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_COPY] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
 };
@@ -201,6 +202,7 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 [NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_DSM]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_VERIFY]   = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_COPY] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_ZONE_APPEND]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
@@ -1849,6 +1851,90 @@ static void nvme_aio_flush_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+static void nvme_verify_cb(void *opaque, int ret)
+{
+NvmeBounceContext *ctx = opaque;
+NvmeRequest *req = ctx->req;
+NvmeNamespace *ns = req->ns;
+BlockBackend *blk = ns->blkconf.blk;
+BlockAcctCookie *acct = >acct;
+BlockAcctStats *stats = blk_get_stats(blk);
+NvmeRwCmd *rw = (NvmeRwCmd *)>cmd;
+uint64_t slba = le64_to_cpu(rw->slba);
+uint16_t ctrl = le16_to_cpu(rw->control);
+uint16_t apptag = le16_to_cpu(rw->apptag);
+uint16_t appmask = 

[PATCH v4 05/12] hw/block/nvme: remove the req dependency in map functions

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

The PRP and SGL mapping functions does not have any particular need for
the entire NvmeRequest as a parameter. Clean it up.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c   | 61 ++-
 hw/block/trace-events |  4 +--
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ae411f04752b..fda2d480cb66 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -535,8 +535,8 @@ static inline bool nvme_addr_is_dma(NvmeCtrl *n, hwaddr 
addr)
 return !(nvme_addr_is_cmb(n, addr) || nvme_addr_is_pmr(n, addr));
 }
 
-static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
- uint32_t len, NvmeRequest *req)
+static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
+ uint64_t prp2, uint32_t len)
 {
 hwaddr trans_len = n->page_size - (prp1 % n->page_size);
 trans_len = MIN(len, trans_len);
@@ -546,9 +546,9 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 
 trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
 
-nvme_sg_init(n, >sg, nvme_addr_is_dma(n, prp1));
+nvme_sg_init(n, sg, nvme_addr_is_dma(n, prp1));
 
-status = nvme_map_addr(n, >sg, prp1, trans_len);
+status = nvme_map_addr(n, sg, prp1, trans_len);
 if (status) {
 goto unmap;
 }
@@ -598,7 +598,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 }
 
 trans_len = MIN(len, n->page_size);
-status = nvme_map_addr(n, >sg, prp_ent, trans_len);
+status = nvme_map_addr(n, sg, prp_ent, trans_len);
 if (status) {
 goto unmap;
 }
@@ -612,7 +612,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
 goto unmap;
 }
-status = nvme_map_addr(n, >sg, prp2, len);
+status = nvme_map_addr(n, sg, prp2, len);
 if (status) {
 goto unmap;
 }
@@ -622,7 +622,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 return NVME_SUCCESS;
 
 unmap:
-nvme_sg_unmap(>sg);
+nvme_sg_unmap(sg);
 return status;
 }
 
@@ -632,7 +632,7 @@ unmap:
  */
 static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
   NvmeSglDescriptor *segment, uint64_t nsgld,
-  size_t *len, NvmeRequest *req)
+  size_t *len, NvmeCmd *cmd)
 {
 dma_addr_t addr, trans_len;
 uint32_t dlen;
@@ -643,7 +643,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
 
 switch (type) {
 case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
-if (req->cmd.opcode == NVME_CMD_WRITE) {
+if (cmd->opcode == NVME_CMD_WRITE) {
 continue;
 }
 case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
@@ -672,7 +672,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
 break;
 }
 
-trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req));
+trace_pci_nvme_err_invalid_sgl_excess_length(dlen);
 return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
 }
 
@@ -701,7 +701,7 @@ next:
 }
 
 static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
- size_t len, NvmeRequest *req)
+ size_t len, NvmeCmd *cmd)
 {
 /*
  * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
@@ -722,7 +722,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 sgld = 
 addr = le64_to_cpu(sgl.addr);
 
-trace_pci_nvme_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), len);
+trace_pci_nvme_map_sgl(NVME_SGL_TYPE(sgl.type), len);
 
 nvme_sg_init(n, sg, nvme_addr_is_dma(n, addr));
 
@@ -731,7 +731,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
  * be mapped directly.
  */
 if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
-status = nvme_map_sgl_data(n, sg, sgld, 1, , req);
+status = nvme_map_sgl_data(n, sg, sgld, 1, , cmd);
 if (status) {
 goto unmap;
 }
@@ -770,7 +770,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 }
 
 status = nvme_map_sgl_data(n, sg, segment, SEG_CHUNK_SIZE,
-   , req);
+   , cmd);
 if (status) {
 goto unmap;
 }
@@ -796,7 +796,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 switch (NVME_SGL_TYPE(last_sgld->type)) {
 case 

[PATCH v4 06/12] hw/block/nvme: refactor nvme_dma

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

The nvme_dma function doesn't just do DMA (QEMUSGList-based) memory transfers;
it also handles QEMUIOVector copies.

Introduce the NvmeTxDirection enum and rename to nvme_tx. Remove mapping
of PRPs/SGLs from nvme_tx and instead assert that they have been mapped
previously. This allows more fine-grained use in subsequent patches.

Add new (better named) helpers, nvme_{c2h,h2c}, that does both PRP/SGL
mapping and transfer.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 139 ++--
 1 file changed, 76 insertions(+), 63 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index fda2d480cb66..9b5c8de115ea 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -862,45 +862,71 @@ static uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, 
size_t len,
 }
 }
 
-static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
- DMADirection dir, NvmeRequest *req)
+typedef enum NvmeTxDirection {
+NVME_TX_DIRECTION_TO_DEVICE   = 0,
+NVME_TX_DIRECTION_FROM_DEVICE = 1,
+} NvmeTxDirection;
+
+static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t *ptr, uint32_t len,
+NvmeTxDirection dir)
 {
-uint16_t status = NVME_SUCCESS;
+assert(sg->flags & NVME_SG_ALLOC);
+
+if (sg->flags & NVME_SG_DMA) {
+uint64_t residual;
+
+if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+residual = dma_buf_write(ptr, len, >qsg);
+} else {
+residual = dma_buf_read(ptr, len, >qsg);
+}
+
+if (unlikely(residual)) {
+trace_pci_nvme_err_invalid_dma();
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+} else {
+size_t bytes;
+
+if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+bytes = qemu_iovec_to_buf(>iov, 0, ptr, len);
+} else {
+bytes = qemu_iovec_from_buf(>iov, 0, ptr, len);
+}
+
+if (unlikely(bytes != len)) {
+trace_pci_nvme_err_invalid_dma();
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+}
+
+return NVME_SUCCESS;
+}
+
+static inline uint16_t nvme_c2h(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+NvmeRequest *req)
+{
+uint16_t status;
 
 status = nvme_map_dptr(n, >sg, len, >cmd);
 if (status) {
 return status;
 }
 
-if (req->sg.flags & NVME_SG_DMA) {
-uint64_t residual;
+return nvme_tx(n, >sg, ptr, len, NVME_TX_DIRECTION_FROM_DEVICE);
+}
 
-if (dir == DMA_DIRECTION_TO_DEVICE) {
-residual = dma_buf_write(ptr, len, >sg.qsg);
-} else {
-residual = dma_buf_read(ptr, len, >sg.qsg);
-}
+static inline uint16_t nvme_h2c(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+NvmeRequest *req)
+{
+uint16_t status;
 
-if (unlikely(residual)) {
-trace_pci_nvme_err_invalid_dma();
-status = NVME_INVALID_FIELD | NVME_DNR;
-}
-} else {
-size_t bytes;
-
-if (dir == DMA_DIRECTION_TO_DEVICE) {
-bytes = qemu_iovec_to_buf(>sg.iov, 0, ptr, len);
-} else {
-bytes = qemu_iovec_from_buf(>sg.iov, 0, ptr, len);
-}
-
-if (unlikely(bytes != len)) {
-trace_pci_nvme_err_invalid_dma();
-status = NVME_INVALID_FIELD | NVME_DNR;
-}
+status = nvme_map_dptr(n, >sg, len, >cmd);
+if (status) {
+return status;
 }
 
-return status;
+return nvme_tx(n, >sg, ptr, len, NVME_TX_DIRECTION_TO_DEVICE);
 }
 
 static inline void nvme_blk_read(BlockBackend *blk, int64_t offset,
@@ -1740,8 +1766,7 @@ static void nvme_compare_cb(void *opaque, int ret)
 
 buf = g_malloc(ctx->iov.size);
 
-status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(nvme_ctrl(req), buf, ctx->iov.size, req);
 if (status) {
 req->status = status;
 goto out;
@@ -1777,8 +1802,7 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
 NvmeDsmRange range[nr];
 uintptr_t *discards = (uintptr_t *)>opaque;
 
-status = nvme_dma(n, (uint8_t *)range, sizeof(range),
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(n, (uint8_t *)range, sizeof(range), req);
 if (status) {
 return status;
 }
@@ -1860,8 +1884,8 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
 range = g_new(NvmeCopySourceRange, nr);
 
-status = nvme_dma(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
+  req);
 if (status) {
 return status;
 }
@@ -2512,8 +2536,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 

[PATCH v4 00/12] hw/block/nvme: metadata and end-to-end data protection support

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

This is v4 (RFC removed) of a series that adds support for metadata and
end-to-end data protection.

First, on the subject of metadata, in v1, support was restricted to
extended logical blocks, which was pretty trivial to implement, but
required special initialization and broke DULBE. In v2, metadata is
always stored continuously at the end of the underlying block device.
This has the advantage of not breaking DULBE since the data blocks
remains aligned and allows bdrv_block_status to be used to determinate
allocation status. It comes at the expense of complicating the extended
LBA emulation, but on the other hand it also gains support for metadata
transfered as a separate buffer.

The end-to-end data protection support blew up in terms of required
changes. This is due to the fact that a bunch of new commands has been
added to the device since v1 (zone append, compare, copy), and they all
require various special handling for protection information. If
potential reviewers would like it split up into multiple patches, each
adding pi support to one command, shout out.

The core of the series (metadata and eedp) is preceeded by a set of
patches that refactors mapping (yes, again) and tries to deal with the
qsg/iov duality mess (maybe also again?).

Support fro metadata and end-to-end data protection is all joint work
with Gollu Appalanaidu.

v4:
  * promoted from RFC
  * moved most eedp additions to nvme-dif.{c,h}. (Keith)

v3:

  * added patch with Verify command
  * added patches for multiple LBA formats and Format NVM
  * changed NvmeSG to be a union (Keith)

Gollu Appalanaidu (1):
  hw/block/nvme: add verify command

Klaus Jensen (9):
  hw/block/nvme: remove redundant len member in compare context
  hw/block/nvme: remove block accounting for write zeroes
  hw/block/nvme: fix strerror printing
  hw/block/nvme: try to deal with the iov/qsg duality
  hw/block/nvme: remove the req dependency in map functions
  hw/block/nvme: refactor nvme_dma
  hw/block/nvme: add metadata support
  hw/block/nvme: end-to-end data protection
  hw/block/nvme: add non-mdts command size limit for verify

Minwoo Im (2):
  hw/block/nvme: support multiple lba formats
  hw/block/nvme: add support for the format nvm command

 hw/block/nvme-dif.h   |   53 ++
 hw/block/nvme-ns.h|   49 +-
 hw/block/nvme.h   |   51 +-
 include/block/nvme.h  |   34 +-
 hw/block/nvme-dif.c   |  513 ++
 hw/block/nvme-ns.c|   90 ++-
 hw/block/nvme.c   | 1548 ++---
 hw/block/meson.build  |2 +-
 hw/block/trace-events |   25 +-
 9 files changed, 2071 insertions(+), 294 deletions(-)
 create mode 100644 hw/block/nvme-dif.h
 create mode 100644 hw/block/nvme-dif.c

-- 
2.30.1




Re: [RFC PATCH 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

2021-03-01 Thread Philippe Mathieu-Daudé
On 3/1/21 2:38 PM, David Edmondson wrote:
> On Monday, 2021-03-01 at 12:50:33 +01, Philippe Mathieu-Daudé wrote:
> 
>> On 2/26/21 9:23 AM, David Edmondson wrote:
>>> On Friday, 2021-02-26 at 00:02:38 +01, Philippe Mathieu-Daudé wrote:
>>>
 If the block drive is read-only we will model a "protected" flash
 device. We can thus use memory_region_init_rom_device_from_file()
 which mmap the backing file when creating the MemoryRegion.
 If the same backing file is used by multiple QEMU instances, this
 reduces the memory footprint (this is often the case with the
 CODE flash image from OVMF and AAVMF).

 Suggested-by: Stefan Hajnoczi 
 Signed-off-by: Philippe Mathieu-Daudé 
 ---
  hw/block/pflash_cfi01.c | 20 ++--
  hw/block/pflash_cfi02.c | 18 ++
  2 files changed, 28 insertions(+), 10 deletions(-)

 diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
 index a5fa8d8b74a..5757391df1c 100644
 --- a/hw/block/pflash_cfi01.c
 +++ b/hw/block/pflash_cfi01.c
 @@ -743,11 +743,19 @@ static void pflash_cfi01_realize(DeviceState *dev, 
 Error **errp)
  pfl->ro = 0;
  }
  
 -memory_region_init_rom_device(
 ->mem, OBJECT(dev),
 -_cfi01_ops,
 -pfl,
 -pfl->name, total_len, errp);
 +if (pfl->blk && pfl->ro) {
 +memory_region_init_rom_device_from_file(>mem, OBJECT(dev),
 +_cfi01_ops, pfl,
 +pfl->name, total_len,
 +qemu_real_host_page_size,
 +RAM_SHARED,
 +
 blk_bs(pfl->blk)->filename,
>>>
>>> How will this behave if someone does:
>>>
>>> -drive 
>>> file=OVMF_CODE.fd.qcow2,index=0,if=pflash,format=qcow2,readonly=on
>>>
>>> Honestly, I'm not sure why they would, but it works today.
>>
>> OK I can add a check for "raw" driver, but I don't know to check for
>> offset == 0.
> 
> This is pretty much where I got to when I tried using mmap() and gave up
> (mostly because I figured that adding layer violating checks to the
> pflash driver would not be well received, but also because we don't
> share the same underlying file between multiple VMs and I wasn't sure
> that it would eventually work well for writable devices).

Kevin suggested on IRC (#qemu-block, you are welcome to join) to
introduce a new blk_*() interface to mmap an image (or possibly
part of it), and have it work with non-zero raw offsets.




[PATCH v4 02/12] hw/block/nvme: remove block accounting for write zeroes

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

A Write Zeroes commands should not be counted in either the 'Data Units
Written' or in 'Host Write Commands' SMART/Health Information Log page.

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index baa69a4a6859..8244909562a2 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2171,7 +2171,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
  nvme_rw_cb, req);
 }
 } else {
-block_acct_start(blk_get_stats(blk), >acct, 0, BLOCK_ACCT_WRITE);
 req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
req);
-- 
2.30.1




[PATCH v4 01/12] hw/block/nvme: remove redundant len member in compare context

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

The 'len' member of the nvme_compare_ctx struct is redundant since the
same information is available in the 'iov' member.

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index edd0b85c10ce..baa69a4a6859 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1697,7 +1697,6 @@ static void nvme_aio_copy_in_cb(void *opaque, int ret)
 struct nvme_compare_ctx {
 QEMUIOVector iov;
 uint8_t *bounce;
-size_t len;
 };
 
 static void nvme_compare_cb(void *opaque, int ret)
@@ -1718,16 +1717,16 @@ static void nvme_compare_cb(void *opaque, int ret)
 goto out;
 }
 
-buf = g_malloc(ctx->len);
+buf = g_malloc(ctx->iov.size);
 
-status = nvme_dma(nvme_ctrl(req), buf, ctx->len, DMA_DIRECTION_TO_DEVICE,
-  req);
+status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
+  DMA_DIRECTION_TO_DEVICE, req);
 if (status) {
 req->status = status;
 goto out;
 }
 
-if (memcmp(buf, ctx->bounce, ctx->len)) {
+if (memcmp(buf, ctx->bounce, ctx->iov.size)) {
 req->status = NVME_CMP_FAILURE;
 }
 
@@ -1964,7 +1963,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 ctx = g_new(struct nvme_compare_ctx, 1);
 ctx->bounce = bounce;
-ctx->len = len;
 
 req->opaque = ctx;
 
-- 
2.30.1




Re: [RFC PATCH 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

2021-03-01 Thread David Edmondson
On Monday, 2021-03-01 at 12:50:33 +01, Philippe Mathieu-Daudé wrote:

> On 2/26/21 9:23 AM, David Edmondson wrote:
>> On Friday, 2021-02-26 at 00:02:38 +01, Philippe Mathieu-Daudé wrote:
>> 
>>> If the block drive is read-only we will model a "protected" flash
>>> device. We can thus use memory_region_init_rom_device_from_file()
>>> which mmap the backing file when creating the MemoryRegion.
>>> If the same backing file is used by multiple QEMU instances, this
>>> reduces the memory footprint (this is often the case with the
>>> CODE flash image from OVMF and AAVMF).
>>>
>>> Suggested-by: Stefan Hajnoczi 
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>  hw/block/pflash_cfi01.c | 20 ++--
>>>  hw/block/pflash_cfi02.c | 18 ++
>>>  2 files changed, 28 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
>>> index a5fa8d8b74a..5757391df1c 100644
>>> --- a/hw/block/pflash_cfi01.c
>>> +++ b/hw/block/pflash_cfi01.c
>>> @@ -743,11 +743,19 @@ static void pflash_cfi01_realize(DeviceState *dev, 
>>> Error **errp)
>>>  pfl->ro = 0;
>>>  }
>>>  
>>> -memory_region_init_rom_device(
>>> ->mem, OBJECT(dev),
>>> -_cfi01_ops,
>>> -pfl,
>>> -pfl->name, total_len, errp);
>>> +if (pfl->blk && pfl->ro) {
>>> +memory_region_init_rom_device_from_file(>mem, OBJECT(dev),
>>> +_cfi01_ops, pfl,
>>> +pfl->name, total_len,
>>> +qemu_real_host_page_size,
>>> +RAM_SHARED,
>>> +blk_bs(pfl->blk)->filename,
>> 
>> How will this behave if someone does:
>> 
>> -drive file=OVMF_CODE.fd.qcow2,index=0,if=pflash,format=qcow2,readonly=on
>> 
>> Honestly, I'm not sure why they would, but it works today.
>
> OK I can add a check for "raw" driver, but I don't know to check for
> offset == 0.

This is pretty much where I got to when I tried using mmap() and gave up
(mostly because I figured that adding layer violating checks to the
pflash driver would not be well received, but also because we don't
share the same underlying file between multiple VMs and I wasn't sure
that it would eventually work well for writable devices).

dme.
-- 
Driving at 90 down those country lanes, singing to "Tiny Dancer".



Re: [PATCH] storage-daemon: include current command line option in the errors

2021-03-01 Thread Kevin Wolf
Am 26.02.2021 um 12:03 hat Paolo Bonzini geschrieben:
> Use the location management facilities that the emulator uses, so that
> the current command line option appears in the error message.
> 
> Before:
> 
>   $ storage-daemon/qemu-storage-daemon --nbd key..=
>   qemu-storage-daemon: Invalid parameter 'key..'
> 
> After:
> 
>   $ storage-daemon/qemu-storage-daemon --nbd key..=
>   qemu-storage-daemon: --nbd key..=: Invalid parameter 'key..'
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  storage-daemon/qemu-storage-daemon.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/storage-daemon/qemu-storage-daemon.c 
> b/storage-daemon/qemu-storage-daemon.c
> index 9021a46b3a..a8f8d83f6f 100644
> --- a/storage-daemon/qemu-storage-daemon.c
> +++ b/storage-daemon/qemu-storage-daemon.c
> @@ -152,6 +152,20 @@ static void init_qmp_commands(void)
>   qmp_marshal_qmp_capabilities, QCO_ALLOW_PRECONFIG);
>  }
>  
> +static int getopt_set_loc(int argc, char **argv, const char *optstring,
> +  const struct option *longopts)
> +{
> +int c, save_index;
> +
> +optarg = NULL;
> +save_index = optind;
> +c = getopt_long(argc, argv, optstring, longopts, NULL);
> +if (optarg) {
> +loc_set_cmdline(argv, save_index, MAX(1, optind - save_index));

This save_index approach isn't perfect because getopt may skip
non-option arguments and they will be included in the help text:

$ build/storage-daemon/qemu-storage-daemon foo --object iothread
qemu-storage-daemon: foo --object iothread: Parameter 'id' is missing

$ build/storage-daemon/qemu-storage-daemon foo --object iothread,id=t
qemu-storage-daemon: --object iothread,id=t foo: Unexpected argument: foo

However, without changing the interface of loc_set_cmdline(), getting
the right index seems hard. Not sure what is the best way for fixing
this or if it's worth the effort.

Kevin

> +}
> +return c;
> +}
> +
>  static void process_options(int argc, char *argv[])
>  {
>  int c;
> @@ -174,7 +188,7 @@ static void process_options(int argc, char *argv[])
>   * they are given on the command lines. This means that things must be
>   * defined first before they can be referenced in another option.
>   */
> -while ((c = getopt_long(argc, argv, "hT:V", long_options, NULL)) != -1) {
> +while ((c = getopt_set_loc(argc, argv, "hT:V", long_options)) != -1) {
>  switch (c) {
>  case '?':
>  exit(EXIT_FAILURE);
> @@ -283,6 +297,7 @@ static void process_options(int argc, char *argv[])
>  error_report("Unexpected argument: %s", argv[optind]);
>  exit(EXIT_FAILURE);
>  }
> +loc_set_none();
>  }
>  
>  int main(int argc, char *argv[])
> -- 
> 2.26.2
> 




Re: [RFC PATCH v2 3/4] block: Support multiple reopening with x-blockdev-reopen

2021-03-01 Thread Peter Krempa
On Mon, Mar 01, 2021 at 12:07:26 +0100, Kevin Wolf wrote:
> Am 25.02.2021 um 18:02 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 24.02.2021 15:33, Kevin Wolf wrote:
> > > Am 09.02.2021 um 09:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > 08.02.2021 21:44, Alberto Garcia wrote:
> > > > > Signed-off-by: Alberto Garcia 
> > > > > ---
> > > > >qapi/block-core.json   |  2 +-
> > > > >include/block/block.h  |  1 +
> > > > >block.c| 16 +--
> > > > >blockdev.c | 85 
> > > > > +-
> > > > >tests/qemu-iotests/155 |  9 ++--
> > > > >tests/qemu-iotests/165 |  4 +-
> > > > >tests/qemu-iotests/245 | 27 +++-
> > > > >tests/qemu-iotests/248 |  2 +-
> > > > >tests/qemu-iotests/248.out |  2 +-
> > > > >tests/qemu-iotests/298 |  4 +-
> > > > >10 files changed, 89 insertions(+), 63 deletions(-)
> > > > > 
> > > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > > index c0e7c23331..b9fcf20a81 100644
> > > > > --- a/qapi/block-core.json
> > > > > +++ b/qapi/block-core.json
> > > > > @@ -4177,7 +4177,7 @@
> > > > ># Since: 4.0
> > > > >##
> > > > >{ 'command': 'x-blockdev-reopen',
> > > > > -  'data': 'BlockdevOptions', 'boxed': true }
> > > > > +  'data': { 'options': ['BlockdevOptions'] } }
> > > > 
> > > > Do we also want to drop x- prefix?
> > > 
> > > libvirt really wants to have a stable blockdev-reopen interface in 6.0
> > > because enabling the incremental backup code depends on this (they just
> > > toggle the readonly flag if I understand correctly, so most of the work
> > > we're currently doing isn't even relevant at this moment for libvirt).
> > 
> > Do you know what is the case exactly? If they do it to remove dirty bitmap
> > from backing image after snapshot operation, probably we'd better improve
> > block-dirty-bitmap-remove command to be able to reopen r-o image?
> > 
> > (I just recently faced such a task)
> 
> I think it was to switch nodes between read-only and read-write, but I
> don't remember the exact context.
> 
> Peter, can you add the details?

Libvirt indeed has code to drive blockdev-reopen to use it to toggle the
read-write state for block dirty bitmap manipulation.

Few of the cases where we use it:

- after blockjobs to toggle the destination image writable (the
  blockjobs turns it RO when it finishes) so that we can merge the
  appropriate bitmaps

- for deletion of bitmaps in backing images in the checkpoint APIs

Both of the operations really require blockdev-reopen and are required
for the incremental backup feature as whole since bitmap manipulation is
crucial for it and it doesn't happen automatically.

The blockdev code has the facilities to also use blockdev-reopen to
change RO/RW state for blockjobs if it will be needed in the future or
it can be later used to even replace the auto-read-only feature of the
protocol node drivers which were actually added to make -blockdev work
in libvirt's s-virt environment properly.




Re: block/throttle and burst bucket

2021-03-01 Thread Peter Lieven
Am 01.03.21 um 11:59 schrieb Kevin Wolf:
> Am 26.02.2021 um 13:33 hat Peter Lieven geschrieben:
>> Am 26.02.21 um 10:27 schrieb Alberto Garcia:
>>> On Thu 25 Feb 2021 06:34:48 PM CET, Peter Lieven  wrote:
 I was wondering if there is a way to check from outside (qmp etc.) if
 a throttled block device has exceeded the iops_max_length seconds of
 time bursting up to iops_max and is now hard limited to the iops limit
 that is supplied?

 Would it be also a good idea to exetend the accounting to account for
 requests that must have waited before being sent out to the backend
 device?
>>> No, there's no such interface as far as I'm aware. I think one problem
>>> is that throttling is now done using a filter, that can be inserted
>>> anywhere in the node graph, and accounting is done at the BlockBackend
>>> level.
>>>
>>> We don't even have a query-block-throttle function. I actually started
>>> to write one six years ago but it was never finished.
>>
>> A quick idea that came to my mind was to add an option to emit a QMP
>> event if the burst_bucket is exhausted and hard limits are enforced.
> Do you actually need to do something every time that it's exceeded, so
> QEMU needs to be the active part sending out an event, or is it
> something that you need to check in specific places and could reasonably
> query on demand?
>
> For the latter, my idea would have been adding a new read-only QOM
> property to the throttle group object that exposes how much is still
> left. When it becomes 0, the hard limits are enforced.
>
>> There seems to be something wrong in the throttling code anyway.
>> Throttling causes addtional i/o latency always even if the actual iops
>> rate is far away from the limits and ever more far away from the burst
>> limits. I will dig into this.
>>
>> My wishlist:
>>
>>  - have a possibility to query the throttling state.
>>  - have counters for no of delayed ops and for how long they were delayed.
>>  - have counters for untrottled <= 4k request performance for a backend 
>> storage device.
>>
>> The later two seem not trivial as you mentioned.
> Do you need the information per throttle node or per throttle group? For
> the latter, the same QOM property approach would work.


Hi Kevin,


per throttle-group information would be sufficient. So you would expose the the 
level of the bucket and

additionally a counter for throttled vs. total ops and total delay?


Why we talk about throttling I still do not understand the following part in 
util/throttle.c function throttle_compute_wait


    if (!bkt->max) {
    /* If bkt->max is 0 we still want to allow short bursts of I/O
 * from the guest, otherwise every other request will be throttled
 * and performance will suffer considerably. */
    bucket_size = (double) bkt->avg / 10;
    burst_bucket_size = 0;
    } else {
    /* If we have a burst limit then we have to wait until all I/O
 * at burst rate has finished before throttling to bkt->avg */
    bucket_size = bkt->max * bkt->burst_length;
    burst_bucket_size = (double) bkt->max / 10;
    }


Why burst_bucket_size = bkt->max / 10?

>From what I understand it should be bkt->max. Otherwise we compare the "extra" 
>against a tenth of the bucket capacity

and schedule a timer where it is not necessary.


What am I missing here?



Peter






Re: [PATCH v3 15/21] sd: emmc: Update CID structure for eMMC

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> CID structure is little different for eMMC, w.r.t to product name and
> manufacturing date.
> 
> Signed-off-by: Sai Pavan Boddu 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/sd/sd.c | 47 ++-
>  1 file changed, 30 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index bba0446..08b77ad 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -365,23 +365,36 @@ static void sd_set_scr(SDState *sd)
>  
>  static void sd_set_cid(SDState *sd)
>  {
> -sd->cid[0] = MID;   /* Fake card manufacturer ID (MID) */
> -sd->cid[1] = OID[0];/* OEM/Application ID (OID) */
> -sd->cid[2] = OID[1];
> -sd->cid[3] = PNM[0];/* Fake product name (PNM) */
> -sd->cid[4] = PNM[1];
> -sd->cid[5] = PNM[2];
> -sd->cid[6] = PNM[3];
> -sd->cid[7] = PNM[4];
> -sd->cid[8] = PRV;   /* Fake product revision (PRV) */
> -sd->cid[9] = 0xde;  /* Fake serial number (PSN) */
> -sd->cid[10] = 0xad;
> -sd->cid[11] = 0xbe;
> -sd->cid[12] = 0xef;
> -sd->cid[13] = 0x00 |/* Manufacture date (MDT) */
> -((MDT_YR - 2000) / 10);
> -sd->cid[14] = ((MDT_YR % 10) << 4) | MDT_MON;
> -sd->cid[15] = (sd_crc7(sd->cid, 15) << 1) | 1;
> +if (sd->emmc) {
> +sd->cid[0] = MID;
> +sd->cid[1] = 0x1;   /* CBX */
> +sd->cid[2] = OID[0];/* OEM/Application ID (OID) */
> +sd->cid[8] = 0x0;
> +sd->cid[9] = PRV;/* Fake product revision (PRV) */
> +sd->cid[10] = 0xde;  /* Fake serial number (PSN) */
> +sd->cid[11] = 0xad;
> +sd->cid[12] = 0xbe;
> +sd->cid[13] = 0xef;
> +sd->cid[14] = ((MDT_YR - 1997) % 0x10); /* MDT */
> +} else {
> +sd->cid[0] = MID;   /* Fake card manufacturer ID (MID) */
> +sd->cid[1] = OID[0];/* OEM/Application ID (OID) */
> +sd->cid[2] = OID[1];
> +sd->cid[8] = PRV;   /* Fake product revision (PRV) */
> +sd->cid[9] = 0xde;  /* Fake serial number (PSN) */
> +sd->cid[10] = 0xad;
> +sd->cid[11] = 0xbe;
> +sd->cid[12] = 0xef;
> +sd->cid[13] = 0x00 |/* Manufacture date (MDT) */
> +((MDT_YR - 2000) / 10);
> +sd->cid[14] = ((MDT_YR % 10) << 4) | MDT_MON;
> +   }
> +   sd->cid[3] = PNM[0];/* Fake product name (PNM) 48bit */
> +   sd->cid[4] = PNM[1];
> +   sd->cid[5] = PNM[2];
> +   sd->cid[6] = PNM[3];
> +   sd->cid[7] = PNM[4];
> +   sd->cid[15] = (sd_crc7(sd->cid, 15) << 1) | 1;
>  }


These are constant values and could be kept in SDCardClass. 

C.


>  #define HWBLOCK_SHIFT   9   /* 512 bytes */
> 




Re: [PATCH v3 16/21] sd: emmc: Support boot area in emmc image

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> From: Joel Stanley 
> 
> This assumes a specially constructued image:
> 
>   dd if=/dev/zero of=mmc-bootarea.img count=2 bs=1M
>   dd if=u-boot-spl.bin of=mmc-bootarea.img conv=notrunc
>   dd if=u-boot.bin of=mmc-bootarea.img conv=notrunc count=64 bs=1K
>   cat mmc-bootarea.img obmc-phosphor-image.wic > mmc.img
>   truncate --size 16GB mmc.img
>   truncate --size 128MB mmc-bootarea.img
> 
> Signed-off-by: Joel Stanley 
> [clg: - changes on the definition names ]
> Signed-off-by: Cédric Le Goater 
> [spb: use data_start property to access right emmc partition,
>   Clean up PARTITION_ENABLE support as incomplete,
>   Fix commit message to be generic.]
> Signed-off-by: Sai Pavan Boddu 
> ---
>  hw/sd/sd.c | 40 
>  1 file changed, 40 insertions(+)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 08b77ad..d311477 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1044,6 +1044,34 @@ static void sd_lock_command(SDState *sd)
>  sd->card_status &= ~CARD_IS_LOCKED;
>  }
>  
> +/*
> + * This requires a disk image that has two boot partitions inserted at the
> + * beginning of it. The size of the boot partitions are configured in the
> + * ext_csd structure, which is hardcoded in qemu. They are currently set to
> + * 1MB each.
> + */
> +static uint32_t sd_bootpart_offset(SDState *sd)
> +{
> +unsigned int access = sd->ext_csd[EXT_CSD_PART_CONFIG] &
> +EXT_CSD_PART_CONFIG_ACC_MASK;
> +unsigned int boot_capacity = sd->ext_csd[EXT_CSD_BOOT_MULT] << 17;
> +
> +if (!sd->emmc) {
> +return 0;
> +}
> +
> +switch (access) {
> +case EXT_CSD_PART_CONFIG_ACC_DEFAULT:
> +return boot_capacity * 2;
> +case EXT_CSD_PART_CONFIG_ACC_BOOT0:
> +return 0;
> +case EXT_CSD_PART_CONFIG_ACC_BOOT0 + 1:
> +return boot_capacity * 1;
> +default:
> + g_assert_not_reached();
> +}
> +}
> +
>  static sd_rsp_type_t sd_normal_command(SDState *sd, SDRequest req)
>  {
>  uint32_t rca = 0x;
> @@ -1359,6 +1387,9 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  return sd_r1;
>  }
>  
> +if (sd->emmc) {
> +addr += sd_bootpart_offset(sd);


This could be a class handler. The default behavior would be to return 0.

C.

> +}
>  sd->state = sd_sendingdata_state;
>  sd->data_start = addr;
>  sd->data_offset = 0;
> @@ -1378,6 +1409,9 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  return sd_r1;
>  }
>  
> +if (sd->emmc) {
> +addr += sd_bootpart_offset(sd);
> +}
>  sd->state = sd_sendingdata_state;
>  sd->data_start = addr;
>  sd->data_offset = 0;
> @@ -1434,6 +1468,9 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  return sd_r1;
>  }
>  
> +if (sd->emmc) {
> +addr += sd_bootpart_offset(sd);
> +}
>  sd->state = sd_receivingdata_state;
>  sd->data_start = addr;
>  sd->data_offset = 0;
> @@ -1464,6 +1501,9 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  return sd_r1;
>  }
>  
> +if (sd->emmc) {
> +addr += sd_bootpart_offset(sd);
> +}
>  sd->state = sd_receivingdata_state;
>  sd->data_start = addr;
>  sd->data_offset = 0;
> 




Re: [PATCH v2 19/31] qapi/qom: QAPIfy object-add

2021-03-01 Thread Paolo Bonzini

On 01/03/21 12:54, Kevin Wolf wrote:

Please add a check in object_property_add_child that the id is well formed
(using the id_wellformed function).  This is pre-existing, but it becomes a
regression for -object later in the series.

Are the conditions for internally called object_property_add_child()
actually the same as for IDs specified by the user? For example, I seem
to remember some array-ish properties with [] in their name which aren't
allowed by id_wellformed().


Yes, you are right.


The obvious place to affect only the external interfaces would be
user_creatable_add_type().


Makes sense, thanks.

Paolo




Re: [PATCH v3 07/21] sd: sdmmc-internal: Add command string for SEND_OP_CMD

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> From: Cédric Le Goater 
> 
> This adds extra info to trace log.
> 
> Signed-off-by: Sai Pavan Boddu 

>From and Signed-off-by are not in sync :)

C.

> ---
>  hw/sd/sdmmc-internal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/sd/sdmmc-internal.c b/hw/sd/sdmmc-internal.c
> index 2053def..8648a78 100644
> --- a/hw/sd/sdmmc-internal.c
> +++ b/hw/sd/sdmmc-internal.c
> @@ -14,7 +14,7 @@
>  const char *sd_cmd_name(uint8_t cmd)
>  {
>  static const char *cmd_abbrev[SDMMC_CMD_MAX] = {
> - [0]= "GO_IDLE_STATE",
> + [0]= "GO_IDLE_STATE",   [1]= "SEND_OP_CMD",
>   [2]= "ALL_SEND_CID",[3]= "SEND_RELATIVE_ADDR",
>   [4]= "SET_DSR", [5]= "IO_SEND_OP_COND",
>   [6]= "SWITCH_FUNC", [7]= "SELECT/DESELECT_CARD",
> 




Re: [PATCH v3 06/21] sd: emmc: Update CMD8 to send EXT_CSD register

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> From: Vincent Palatin 
> 
> Sends the EXT_CSD register as response to CMD8.
> 
> Signed-off-by: Vincent Palatin 
> Signed-off-by: Sai Pavan Boddu 
> ---
>  hw/sd/sd.c | 52 
>  1 file changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index a26695b..181e7e2 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1141,24 +1141,37 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  }
>  break;
>  
> -case 8: /* CMD8:   SEND_IF_COND */
> -if (sd->spec_version < SD_PHY_SPECv2_00_VERS) {
> -break;
> -}
> -if (sd->state != sd_idle_state) {
> -break;
> -}
> -sd->vhs = 0;
> -
> -/* No response if not exactly one VHS bit is set.  */
> -if (!(req.arg >> 8) || (req.arg >> (ctz32(req.arg & ~0xff) + 1))) {
> -return sd->spi ? sd_r7 : sd_r0;
> -}
> +case 8:/* CMD8:   SEND_IF_COND / SEND_EXT_CSD */
> +if (sd->emmc) {
> +switch (sd->state) {
> +case sd_transfer_state:
> +/* MMC : Sends the EXT_CSD register as a Block of data */
> +sd->state = sd_sendingdata_state;
> +memcpy(sd->data, sd->ext_csd, sizeof(sd->ext_csd));
> +sd->data_start = addr;
> +sd->data_offset = 0;
> +return sd_r1;
> +default:
> +break;
> +}

This is big enough to be a SDCardClass handler.

Thanks,

C.


> +} else {
> +if (sd->spec_version < SD_PHY_SPECv2_00_VERS) {
> +break;
> +}
> +if (sd->state != sd_idle_state) {
> +break;
> +}
> +sd->vhs = 0;
>  
> -/* Accept.  */
> -sd->vhs = req.arg;
> -return sd_r7;
> +/* No response if not exactly one VHS bit is set.  */
> +if (!(req.arg >> 8) || (req.arg >> (ctz32(req.arg & ~0xff) + 
> 1))) {
> +return sd->spi ? sd_r7 : sd_r0;
> +}
>  
> +/* Accept.  */
> +sd->vhs = req.arg;
> +return sd_r7;
> +}
>  case 9: /* CMD9:   SEND_CSD */
>  switch (sd->state) {
>  case sd_standby_state:
> @@ -2081,6 +2094,13 @@ uint8_t sd_read_byte(SDState *sd)
>  sd->state = sd_transfer_state;
>  break;
>  
> +case 8: /* CMD8: SEND_EXT_CSD on MMC */
> +ret = sd->data[sd->data_offset++];
> +if (sd->data_offset >= sizeof(sd->ext_csd)) {
> +sd->state = sd_transfer_state;
> +}
> +break;
> +
>  case 9: /* CMD9:   SEND_CSD */
>  case 10:/* CMD10:  SEND_CID */
>  ret = sd->data[sd->data_offset ++];
> 




Re: [RFC PATCH 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file()

2021-03-01 Thread Philippe Mathieu-Daudé
On 3/1/21 12:53 PM, Stefano Garzarella wrote:
> I don't know this code very well, but I have a couple of comments below :-)
> 
> On Fri, Feb 26, 2021 at 12:02:36AM +0100, Philippe Mathieu-Daudé wrote:
>> Introduce memory_region_init_rom_device_from_file() which mmap
>> the backing file of ROM devices. This allows to reduce QEMU memory
>> footprint as the same file can be shared between multiple instances
>> of QEMU.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> include/exec/memory.h | 85 +
>> softmmu/memory.c  | 98 +++
>> 2 files changed, 183 insertions(+)
...
>> +void memory_region_init_rom_device_from_file_nomigrate(MemoryRegion *mr,
>> +   Object *owner,
>> +   const
>> MemoryRegionOps *ops,
>> +   void *opaque,
>> +   const char *name,
>> +   uint64_t size,
>> +   uint64_t align,
>> +   uint32_t
>> ram_flags,
>> +   const char *path,
>> +   bool readonly,
>> +   Error **errp)
>> +{
>> +    Error *err = NULL;
>> +
>> +    assert(ops);
>> +#ifdef CONFIG_POSIX
>> +    memory_region_init(mr, owner, name, size);
>> +    mr->opaque = opaque;
>> +    mr->ops = ops;
>> +    mr->rom_device = true;
>> +    mr->readonly = readonly;
>> +    mr->ram = true;
>> +    mr->align = align;
>> +    mr->terminates = true;
>> +    mr->destructor = memory_region_destructor_ram;
>> +    mr->ram_block = qemu_ram_alloc_from_file(size, mr, ram_flags, path,
>> + readonly, );
>> +    if (err) {
>> +    mr->size = int128_zero();
>> +    object_unparent(OBJECT(mr));
>> +    error_propagate(errp, err);
>> +    }
>> +#else
>> +    g_autoptr(GError) gerr = NULL;
>> +    gsize len;
>> +
>> +    memory_region_init(mr, owner, name, size);
>> +    mr->ops = ops;
>> +    mr->opaque = opaque;
>> +    mr->terminates = true;
>> +    mr->rom_device = true;
> 
> Why when CONFIG_POSIX is defined we set 'mr->ram', 'mr->align', and
> 'mr->readonly = readonly' but not here?
> (I honestly don't know if they are important, I ask out of curiosity.  :-)

I suppose we should and I forgot :/

> 
>> +
>> +    if (!g_file_get_contents(path, >contents, , )) {
> 
> Should we do these steps in case of an error?
> 
>   mr->size = int128_zero();
>   object_unparent(OBJECT(mr));

Yes...

> 
>> +    error_setg(errp, "Unable to read '%s': %s", path,
>> gerr->message);
>> +    return;
>> +    }
>> +    mr->destructor = memory_region_destructor_contents;
>> +    mr->contents = g_realloc(mr->contents, size);
>> +    mr->ram_block = qemu_ram_alloc_from_ptr(size, mr->contents, mr,
>> );
>> +    if (err) {
>> +    mr->size = int128_zero();
>> +    object_unparent(OBJECT(mr));
>> +    error_propagate(errp, err);
>> +    }
>> +#endif
> 
> Maybe I would reorganize the code inside ifdef like this:
> 
>     memory_region_init(mr, owner, name, size);
>     mr->opaque = opaque;
>     ...
>     #ifdef CONFIG_POSIX
>     mr->destructor = memory_region_destructor_ram;
>     mr->ram_block = qemu_ram_alloc_from_file(...);
>     #else
>     if (!g_file_get_contents(..)) {
>     ...
>     }
>     mr->destructor = memory_region_destructor_contents;
>     mr->contents = g_realloc(mr->contents, size);
>     mr->ram_block = qemu_ram_alloc_from_ptr(...)
>     #endif
> 
>     if (err) {
>     ...
>     }

Yes, thanks :)

> 
> I don't have a strong opinion, just an idea.
> 
> Thanks,
> Stefano




Re: [RFC PATCH v2 3/4] block: Support multiple reopening with x-blockdev-reopen

2021-03-01 Thread Vladimir Sementsov-Ogievskiy

01.03.2021 14:07, Kevin Wolf wrote:

Am 25.02.2021 um 18:02 hat Vladimir Sementsov-Ogievskiy geschrieben:

24.02.2021 15:33, Kevin Wolf wrote:

Am 09.02.2021 um 09:03 hat Vladimir Sementsov-Ogievskiy geschrieben:

08.02.2021 21:44, Alberto Garcia wrote:

Signed-off-by: Alberto Garcia 
---
qapi/block-core.json   |  2 +-
include/block/block.h  |  1 +
block.c| 16 +--
blockdev.c | 85 +-
tests/qemu-iotests/155 |  9 ++--
tests/qemu-iotests/165 |  4 +-
tests/qemu-iotests/245 | 27 +++-
tests/qemu-iotests/248 |  2 +-
tests/qemu-iotests/248.out |  2 +-
tests/qemu-iotests/298 |  4 +-
10 files changed, 89 insertions(+), 63 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c0e7c23331..b9fcf20a81 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4177,7 +4177,7 @@
# Since: 4.0
##
{ 'command': 'x-blockdev-reopen',
-  'data': 'BlockdevOptions', 'boxed': true }
+  'data': { 'options': ['BlockdevOptions'] } }


Do we also want to drop x- prefix?


libvirt really wants to have a stable blockdev-reopen interface in 6.0
because enabling the incremental backup code depends on this (they just
toggle the readonly flag if I understand correctly, so most of the work
we're currently doing isn't even relevant at this moment for libvirt).


Do you know what is the case exactly? If they do it to remove dirty bitmap
from backing image after snapshot operation, probably we'd better improve
block-dirty-bitmap-remove command to be able to reopen r-o image?

(I just recently faced such a task)


I think it was to switch nodes between read-only and read-write, but I
don't remember the exact context.



I already don't think that making implicit reopen-to-rw is a good idea. It's OK 
for blockdev-commit, but may be unexpected for bitmaps manipulation.




Given that the soft freeze is coming closer (March 16), I wonder if we
should just make this API change and declare the interface stable. We
can then make Vladimir's fixes and the file reopening on top of it - if
it's in time for 6.0, that would be good, but if not we could move it to
6.1 without impacting libvirt.

I think we're reasonable confident that the QAPI interfaces are right,
even if maybe not that all aspects of the implementation are right yet.

What do you think?


I think it's OK.. We have it since 4.0. What will we win keeping -x
for years? Even latest change from updating one device to several
could be easily done with help of 'alternate' if the command was
already stable.


I think your series is kind of important to really call the
implementation stable. We can always feature flags to indicate the fixes
if necessary, but it would still feel better to declare something stable
that doesn't have known bugs. :-)

Do you think your series will still take a while? Maybe my first
comments sounded a bit negative because it was really hard to review at
first without knowing the final state, but after all I think the
approach is sane and apart from some implementation details, we're not
that far away from getting it into a mergable state.



Thanks :)

I'm now busy with our bugs for Virtuozzo release.. Still, I hope, I'll have a 
chance to reroll permission-update series this week.

--
Best regards,
Vladimir



Re: [PATCH v2 19/31] qapi/qom: QAPIfy object-add

2021-03-01 Thread Kevin Wolf
Am 26.02.2021 um 12:30 hat Paolo Bonzini geschrieben:
> On 24/02/21 14:52, Kevin Wolf wrote:
> > +v = qobject_output_visitor_new();
> > +visit_type_ObjectOptions(v, NULL, , _abort);
> > +visit_complete(v, );
> > +visit_free(v);
> > +
> > +props = qobject_to(QDict, qobj);
> > +qdict_del(props, "qom-type");
> > +qdict_del(props, "id");
> > +
> > +v = qobject_input_visitor_new(QOBJECT(props));
> > +obj = user_creatable_add_type(ObjectType_str(options->qom_type),
> > +  options->id, props, v, errp);
> > +object_unref(obj);
> 
> Please add a check in object_property_add_child that the id is well formed
> (using the id_wellformed function).  This is pre-existing, but it becomes a
> regression for -object later in the series.

Are the conditions for internally called object_property_add_child()
actually the same as for IDs specified by the user? For example, I seem
to remember some array-ish properties with [] in their name which aren't
allowed by id_wellformed().

The obvious place to affect only the external interfaces would be
user_creatable_add_type().

Kevin




[RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

2021-03-01 Thread Philippe Mathieu-Daudé
If the block drive is read-only we will model a "protected" flash
device. We can thus use memory_region_init_rom_device_from_file()
which mmap the backing file when creating the MemoryRegion.
If the same backing file is used by multiple QEMU instances, this
reduces the memory footprint (this is often the case with the
CODE flash image from OVMF and AAVMF).

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi01.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index a5fa8d8b74a..ec290636298 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 int ret;
 uint64_t blocks_per_device, sector_len_per_device, device_len;
 int num_devices;
+bool romd_mr_shared_mapped;
 
 if (pfl->sector_len == 0) {
 error_setg(errp, "attribute \"sector-length\" not specified or zero.");
@@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 pfl->ro = 0;
 }
 
-memory_region_init_rom_device(
->mem, OBJECT(dev),
-_cfi01_ops,
-pfl,
-pfl->name, total_len, errp);
-if (*errp) {
-return;
+if (pfl->ro && pfl->blk) {
+BlockDriverState *bs = blk_bs(pfl->blk);
+
+/* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
+if (bs->drv == _raw) { /* FIXME check offset=0 ? */
+Error *local_err = NULL;
+
+memory_region_init_rom_device_from_file(>mem, OBJECT(dev),
+_cfi01_ops, pfl,
+pfl->name, total_len,
+qemu_real_host_page_size,
+RAM_SHARED,
+bs->exact_filename,
+true, _err);
+if (local_err) {
+error_report_err(local_err);
+/* fall back to memory_region_init_rom_device() */
+} else {
+romd_mr_shared_mapped = true;
+}
+}
+}
+if (!romd_mr_shared_mapped) {
+memory_region_init_rom_device(>mem, OBJECT(dev),
+  _cfi01_ops, pfl,
+  pfl->name, total_len, errp);
+if (*errp) {
+return;
+}
 }
 
 pfl->storage = memory_region_get_ram_ptr(>mem);
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
 
-if (pfl->blk) {
+if (pfl->blk && !romd_mr_shared_mapped) {
 if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
  errp)) {
 vmstate_unregister_ram(>mem, DEVICE(pfl));
-- 
2.26.2




[RFC PATCH v2 2/3] hw/block/pflash: Move code around

2021-03-01 Thread Philippe Mathieu-Daudé
First do the block checks, so we know if it is read-only or not.
Then create the MemoryRegion. This will allow optimization in
the next commit.

Reviewed-by: David Edmondson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi01.c | 24 
 hw/block/pflash_cfi02.c | 18 +-
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 22287a1522e..a5fa8d8b74a 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -731,18 +731,6 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 device_len = sector_len_per_device * blocks_per_device;
 
-memory_region_init_rom_device(
->mem, OBJECT(dev),
-_cfi01_ops,
-pfl,
-pfl->name, total_len, errp);
-if (*errp) {
-return;
-}
-
-pfl->storage = memory_region_get_ram_ptr(>mem);
-sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
-
 if (pfl->blk) {
 uint64_t perm;
 pfl->ro = !blk_supports_write_perm(pfl->blk);
@@ -755,6 +743,18 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 pfl->ro = 0;
 }
 
+memory_region_init_rom_device(
+>mem, OBJECT(dev),
+_cfi01_ops,
+pfl,
+pfl->name, total_len, errp);
+if (*errp) {
+return;
+}
+
+pfl->storage = memory_region_get_ram_ptr(>mem);
+sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
+
 if (pfl->blk) {
 if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
  errp)) {
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 7962cff7455..4f62ce8917d 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -791,15 +791,6 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-memory_region_init_rom_device(>orig_mem, OBJECT(pfl),
-  _cfi02_ops, pfl, pfl->name,
-  pfl->chip_len, errp);
-if (*errp) {
-return;
-}
-
-pfl->storage = memory_region_get_ram_ptr(>orig_mem);
-
 if (pfl->blk) {
 uint64_t perm;
 pfl->ro = !blk_supports_write_perm(pfl->blk);
@@ -812,6 +803,15 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 pfl->ro = 0;
 }
 
+memory_region_init_rom_device(>orig_mem, OBJECT(pfl),
+  _cfi02_ops, pfl, pfl->name,
+  pfl->chip_len, errp);
+if (*errp) {
+return;
+}
+
+pfl->storage = memory_region_get_ram_ptr(>orig_mem);
+
 if (pfl->blk) {
 if (!blk_check_size_and_read_all(pfl->blk, pfl->storage,
  pfl->chip_len, errp)) {
-- 
2.26.2




[RFC PATCH v2 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file()

2021-03-01 Thread Philippe Mathieu-Daudé
Introduce memory_region_init_rom_device_from_file() which mmap
the backing file of ROM devices. This allows to reduce QEMU memory
footprint as the same file can be shared between multiple instances
of QEMU.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/exec/memory.h | 85 +
 softmmu/memory.c  | 98 +++
 2 files changed, 183 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c6fb714e499..bacf7495003 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -487,6 +487,9 @@ struct MemoryRegion {
 const char *name;
 unsigned ioeventfd_nb;
 MemoryRegionIoeventfd *ioeventfds;
+#ifndef CONFIG_POSIX
+gchar *contents;
+#endif
 };
 
 struct IOMMUMemoryRegion {
@@ -1131,6 +1134,43 @@ void 
memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
  uint64_t size,
  Error **errp);
 
+/**
+ * memory_region_init_rom_device_from_file_nomigrate:
+ * Initialize a ROM memory region from the specified backing file.
+ * Writes are handled via callbacks.
+ *
+ * Note that this function does not do anything to cause the data in the
+ * RAM side of the memory region to be migrated; that is the responsibility
+ * of the caller.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @ops: callbacks for write access handling (must not be NULL).
+ * @opaque: passed to the read and write callbacks of the @ops structure.
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *must be unique within any device
+ * @size: size of the region.
+ * @ram_flags: specify the properties of the ram block, which can be one
+ * or bit-or of following values
+ * - RAM_SHARED: mmap the backing file or device with MAP_SHARED
+ * - RAM_PMEM: the backend @mem_path is persistent memory
+ * Other bits are ignored.
+ * @path: specify the backing file
+ * @readonly: true to open @path for reading, false for read/write.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_rom_device_from_file_nomigrate(MemoryRegion *mr,
+   Object *owner,
+   const MemoryRegionOps 
*ops,
+   void *opaque,
+   const char *name,
+   uint64_t size,
+   uint64_t align,
+   uint32_t ram_flags,
+   const char *path,
+   bool readonly,
+   Error **errp);
+
 /**
  * memory_region_init_iommu: Initialize a memory region of a custom type
  * that translates addresses
@@ -1249,6 +1289,51 @@ void memory_region_init_rom_device(MemoryRegion *mr,
Error **errp);
 
 
+/**
+ * memory_region_init_rom_device_from_file:
+ * Initialize a ROM memory region from the specified backing file.
+ * Writes are handled via callbacks.
+ *
+ * This function initializes a memory region backed by RAM for reads
+ * and callbacks for writes, and arranges for the RAM backing to
+ * be migrated (by calling vmstate_register_ram()
+ * if @owner is a DeviceState, or vmstate_register_ram_global() if
+ * @owner is NULL).
+ *
+ * TODO: Currently we restrict @owner to being either NULL (for
+ * global RAM regions with no owner) or devices, so that we can
+ * give the RAM block a unique name for migration purposes.
+ * We should lift this restriction and allow arbitrary Objects.
+ * If you pass a non-NULL non-device @owner then we will assert.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @ops: callbacks for write access handling (must not be NULL).
+ * @opaque: passed to the read and write callbacks of the @ops structure.
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *must be unique within any device
+ * @size: size of the region.
+ * @ram_flags: specify the properties of the ram block, which can be one
+ * or bit-or of following values
+ * - RAM_SHARED: mmap the backing file or device with MAP_SHARED
+ * - RAM_PMEM: the backend @mem_path is persistent memory
+ * Other bits are ignored.
+ * @path: specify the backing file
+ * @readonly: true to open @path for reading, false for read/write.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void 

[RFC PATCH v2 0/3] hw/block/pflash: Mmap read-only backend files with MAP_SHARED

2021-03-01 Thread Philippe Mathieu-Daudé
Since v1:
- check driver is "raw" (David)
- ignore CFI02 for now

Hi,

This series aims to reduce the memory footprint of flash devices
when the backing file is read-only.

When a backing file is read-only, the model considers the flash
is in "protected" mode. No write are allowed, but the MMIO
state machine is still usable.

This series introduces a new memory region helper to mmap files
and use it with the pflash device (only with read-only backing
files).

The goal is to reduce QEMU's memory footprint when multiple VMs
are instantiated using the same read-only backing file, which is
the case with the CODE flash from OVMF and AAVMF.

Previous attempts:

- Huawei
https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
- Tencent
https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
- Oracle
https://www.mail-archive.com/qemu-devel@nongnu.org/msg760065.html

RFC because yet another approach to tackle this technical debt,
and very little tested.

Regards,

Phil.

Philippe Mathieu-Daudé (3):
  exec/memory: Introduce memory_region_init_rom_device_from_file()
  hw/block/pflash: Move code around
  hw/block/pflash: use memory_region_init_rom_device_from_file()

 include/exec/memory.h   | 85 +++
 hw/block/pflash_cfi01.c | 49 +++--
 hw/block/pflash_cfi02.c | 18 
 softmmu/memory.c| 98 +
 4 files changed, 228 insertions(+), 22 deletions(-)

-- 
2.26.2





Re: [RFC PATCH 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file()

2021-03-01 Thread Stefano Garzarella
I don't know this code very well, but I have a couple of comments below 
:-)


On Fri, Feb 26, 2021 at 12:02:36AM +0100, Philippe Mathieu-Daudé wrote:

Introduce memory_region_init_rom_device_from_file() which mmap
the backing file of ROM devices. This allows to reduce QEMU memory
footprint as the same file can be shared between multiple instances
of QEMU.

Signed-off-by: Philippe Mathieu-Daudé 
---
include/exec/memory.h | 85 +
softmmu/memory.c  | 98 +++
2 files changed, 183 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c6fb714e499..bacf7495003 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -487,6 +487,9 @@ struct MemoryRegion {
const char *name;
unsigned ioeventfd_nb;
MemoryRegionIoeventfd *ioeventfds;
+#ifndef CONFIG_POSIX
+gchar *contents;
+#endif
};

struct IOMMUMemoryRegion {
@@ -1131,6 +1134,43 @@ void 
memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
 uint64_t size,
 Error **errp);

+/**
+ * memory_region_init_rom_device_from_file_nomigrate:
+ * Initialize a ROM memory region from the specified backing file.
+ * Writes are handled via callbacks.
+ *
+ * Note that this function does not do anything to cause the data in the
+ * RAM side of the memory region to be migrated; that is the responsibility
+ * of the caller.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @ops: callbacks for write access handling (must not be NULL).
+ * @opaque: passed to the read and write callbacks of the @ops structure.
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *must be unique within any device
+ * @size: size of the region.
+ * @ram_flags: specify the properties of the ram block, which can be one
+ * or bit-or of following values
+ * - RAM_SHARED: mmap the backing file or device with MAP_SHARED
+ * - RAM_PMEM: the backend @mem_path is persistent memory
+ * Other bits are ignored.
+ * @path: specify the backing file
+ * @readonly: true to open @path for reading, false for read/write.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_rom_device_from_file_nomigrate(MemoryRegion *mr,
+   Object *owner,
+   const MemoryRegionOps 
*ops,
+   void *opaque,
+   const char 
*name,

+   uint64_t size,
+   uint64_t align,
+   uint32_t ram_flags,
+   const char *path,
+   bool readonly,
+   Error **errp);
+
/**
 * memory_region_init_iommu: Initialize a memory region of a custom type
 * that translates addresses
@@ -1249,6 +1289,51 @@ void memory_region_init_rom_device(MemoryRegion *mr,
   Error **errp);


+/**
+ * memory_region_init_rom_device_from_file:
+ * Initialize a ROM memory region from the specified backing file.
+ * Writes are handled via callbacks.
+ *
+ * This function initializes a memory region backed by RAM for reads
+ * and callbacks for writes, and arranges for the RAM backing to
+ * be migrated (by calling vmstate_register_ram()
+ * if @owner is a DeviceState, or vmstate_register_ram_global() if
+ * @owner is NULL).
+ *
+ * TODO: Currently we restrict @owner to being either NULL (for
+ * global RAM regions with no owner) or devices, so that we can
+ * give the RAM block a unique name for migration purposes.
+ * We should lift this restriction and allow arbitrary Objects.
+ * If you pass a non-NULL non-device @owner then we will assert.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @ops: callbacks for write access handling (must not be NULL).
+ * @opaque: passed to the read and write callbacks of the @ops structure.
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *must be unique within any device
+ * @size: size of the region.
+ * @ram_flags: specify the properties of the ram block, which can be one
+ * or bit-or of following values
+ * - RAM_SHARED: mmap the backing file or device with MAP_SHARED
+ * - RAM_PMEM: the backend @mem_path is persistent memory
+ * Other bits are ignored.
+ * @path: specify the backing file
+ * @readonly: true to open @path for reading, false 

Re: [RFC PATCH 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

2021-03-01 Thread Philippe Mathieu-Daudé
On 2/26/21 9:23 AM, David Edmondson wrote:
> On Friday, 2021-02-26 at 00:02:38 +01, Philippe Mathieu-Daudé wrote:
> 
>> If the block drive is read-only we will model a "protected" flash
>> device. We can thus use memory_region_init_rom_device_from_file()
>> which mmap the backing file when creating the MemoryRegion.
>> If the same backing file is used by multiple QEMU instances, this
>> reduces the memory footprint (this is often the case with the
>> CODE flash image from OVMF and AAVMF).
>>
>> Suggested-by: Stefan Hajnoczi 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/block/pflash_cfi01.c | 20 ++--
>>  hw/block/pflash_cfi02.c | 18 ++
>>  2 files changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
>> index a5fa8d8b74a..5757391df1c 100644
>> --- a/hw/block/pflash_cfi01.c
>> +++ b/hw/block/pflash_cfi01.c
>> @@ -743,11 +743,19 @@ static void pflash_cfi01_realize(DeviceState *dev, 
>> Error **errp)
>>  pfl->ro = 0;
>>  }
>>  
>> -memory_region_init_rom_device(
>> ->mem, OBJECT(dev),
>> -_cfi01_ops,
>> -pfl,
>> -pfl->name, total_len, errp);
>> +if (pfl->blk && pfl->ro) {
>> +memory_region_init_rom_device_from_file(>mem, OBJECT(dev),
>> +_cfi01_ops, pfl,
>> +pfl->name, total_len,
>> +qemu_real_host_page_size,
>> +RAM_SHARED,
>> +blk_bs(pfl->blk)->filename,
> 
> How will this behave if someone does:
> 
> -drive file=OVMF_CODE.fd.qcow2,index=0,if=pflash,format=qcow2,readonly=on
> 
> Honestly, I'm not sure why they would, but it works today.

OK I can add a check for "raw" driver, but I don't know to check for
offset == 0.




Re: [PATCH v2] hw/nvme: move nvme emulation out of hw/block

2021-03-01 Thread Klaus Jensen
On Mar  1 12:35, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> With the introduction of the nvme-subsystem device we are really
> cluttering up the hw/block directory.
> 
> As suggested by Philippe previously, move the nvme emulation to
> hw/nvme.
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Klaus Jensen 
> ---
> v2:
>   * rebased on nvme-next
>   * got rid of the second patch (Minwoo)
> 

Argh. I forgot to add Minwoo's Acked-by and the change to the commit
messages that was suggested.

Anyway, would be nice to get this merged before I post v4 of the eedp
series since that will add yet another file in the hw/block directory.


signature.asc
Description: PGP signature


[PATCH v2] hw/nvme: move nvme emulation out of hw/block

2021-03-01 Thread Klaus Jensen
From: Klaus Jensen 

With the introduction of the nvme-subsystem device we are really
cluttering up the hw/block directory.

As suggested by Philippe previously, move the nvme emulation to
hw/nvme.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Klaus Jensen 
---
v2:
  * rebased on nvme-next
  * got rid of the second patch (Minwoo)

 meson.build   |   1 +
 hw/block/nvme-ns.h| 193 -
 hw/block/nvme-subsys.h|  32 
 hw/{block => nvme}/nvme.h | 198 +-
 hw/nvme/trace.h   |   1 +
 hw/{block/nvme.c => nvme/ctrl.c}  |   1 -
 hw/{block/nvme-ns.c => nvme/ns.c} |   1 -
 hw/{block/nvme-subsys.c => nvme/subsys.c} |   2 +-
 MAINTAINERS   |   2 +-
 hw/Kconfig|   1 +
 hw/block/Kconfig  |   5 -
 hw/block/meson.build  |   1 -
 hw/block/trace-events | 182 
 hw/meson.build|   1 +
 hw/nvme/Kconfig   |   4 +
 hw/nvme/meson.build   |   1 +
 hw/nvme/trace-events  | 180 
 17 files changed, 387 insertions(+), 419 deletions(-)
 delete mode 100644 hw/block/nvme-ns.h
 delete mode 100644 hw/block/nvme-subsys.h
 rename hw/{block => nvme}/nvme.h (55%)
 create mode 100644 hw/nvme/trace.h
 rename hw/{block/nvme.c => nvme/ctrl.c} (99%)
 rename hw/{block/nvme-ns.c => nvme/ns.c} (99%)
 rename hw/{block/nvme-subsys.c => nvme/subsys.c} (98%)
 create mode 100644 hw/nvme/Kconfig
 create mode 100644 hw/nvme/meson.build
 create mode 100644 hw/nvme/trace-events

diff --git a/meson.build b/meson.build
index a923f249d89e..59507b125cfe 100644
--- a/meson.build
+++ b/meson.build
@@ -1793,6 +1793,7 @@ if have_system
 'hw/misc/macio',
 'hw/net',
 'hw/net/can',
+'hw/nvme',
 'hw/nvram',
 'hw/pci',
 'hw/pci-host',
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
deleted file mode 100644
index 7af6884862b5..
--- a/hw/block/nvme-ns.h
+++ /dev/null
@@ -1,193 +0,0 @@
-/*
- * QEMU NVM Express Virtual Namespace
- *
- * Copyright (c) 2019 CNEX Labs
- * Copyright (c) 2020 Samsung Electronics
- *
- * Authors:
- *  Klaus Jensen  
- *
- * This work is licensed under the terms of the GNU GPL, version 2. See the
- * COPYING file in the top-level directory.
- *
- */
-
-#ifndef NVME_NS_H
-#define NVME_NS_H
-
-#define TYPE_NVME_NS "nvme-ns"
-#define NVME_NS(obj) \
-OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
-
-typedef struct NvmeZone {
-NvmeZoneDescr   d;
-uint64_tw_ptr;
-QTAILQ_ENTRY(NvmeZone) entry;
-} NvmeZone;
-
-typedef struct NvmeNamespaceParams {
-uint32_t nsid;
-QemuUUID uuid;
-
-uint16_t mssrl;
-uint32_t mcl;
-uint8_t  msrc;
-
-bool zoned;
-bool cross_zone_read;
-uint64_t zone_size_bs;
-uint64_t zone_cap_bs;
-uint32_t max_active_zones;
-uint32_t max_open_zones;
-uint32_t zd_extension_size;
-} NvmeNamespaceParams;
-
-typedef struct NvmeNamespace {
-DeviceState  parent_obj;
-BlockConfblkconf;
-int32_t  bootindex;
-int64_t  size;
-NvmeIdNs id_ns;
-const uint32_t *iocs;
-uint8_t  csi;
-
-NvmeSubsystem   *subsys;
-
-NvmeIdNsZoned   *id_ns_zoned;
-NvmeZone*zone_array;
-QTAILQ_HEAD(, NvmeZone) exp_open_zones;
-QTAILQ_HEAD(, NvmeZone) imp_open_zones;
-QTAILQ_HEAD(, NvmeZone) closed_zones;
-QTAILQ_HEAD(, NvmeZone) full_zones;
-uint32_tnum_zones;
-uint64_tzone_size;
-uint64_tzone_capacity;
-uint32_tzone_size_log2;
-uint8_t *zd_extensions;
-int32_t nr_open_zones;
-int32_t nr_active_zones;
-
-NvmeNamespaceParams params;
-
-struct {
-uint32_t err_rec;
-} features;
-} NvmeNamespace;
-
-static inline uint32_t nvme_nsid(NvmeNamespace *ns)
-{
-if (ns) {
-return ns->params.nsid;
-}
-
-return -1;
-}
-
-static inline bool nvme_ns_shared(NvmeNamespace *ns)
-{
-return !!ns->subsys;
-}
-
-static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
-{
-NvmeIdNs *id_ns = >id_ns;
-return _ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
-}
-
-static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
-{
-return nvme_ns_lbaf(ns)->ds;
-}
-
-/* calculate the number of LBAs that the namespace can accomodate */
-static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
-{
-return ns->size >> nvme_ns_lbads(ns);
-}
-
-/* convert an LBA to the equivalent in bytes */
-static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
-{
-return lba << nvme_ns_lbads(ns);
-}
-
-typedef struct NvmeCtrl NvmeCtrl;
-
-static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone)
-{
-return zone->d.zs >> 4;
-}
-
-static inline 

Re: [PATCH v2] blockjob: report a better error message

2021-03-01 Thread Kevin Wolf
Am 25.02.2021 um 11:36 hat Stefano Garzarella geschrieben:
> When a block job fails, we report strerror(-job->job.ret) error
> message, also if the job set an error object.
> Let's report a better error message using error_get_pretty(job->job.err).
> 
> If an error object was not set, strerror(-job->ret) is used as fallback,
> as explained in include/qemu/job.h:
> 
> typedef struct Job {
> ...
> /**
>  * Error object for a failed job.
>  * If job->ret is nonzero and an error object was not set, it will be set
>  * to strerror(-job->ret) during job_completed.
>  */
> Error *err;
> }
> 
> In block_job_query() there can be a transient where 'job.err' is not set
> by a scheduled bottom half. In that case we use strerror(-job->ret) as it
> was before.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Stefano Garzarella 

Thanks, applied to the block branch.

Kevin




Re: [PATCH 00/14] deprecations: remove many old deprecations

2021-03-01 Thread Kevin Wolf
Am 25.02.2021 um 18:32 hat Jim Fehlig geschrieben:
> Adding xen-devel and Ian to cc.
> 
> On 2/24/21 6:11 AM, Daniel P. Berrangé wrote:
> > The following features have been deprecated for well over the 2
> > release cycle we promise
> 
> This reminded me of a bug report we received late last year when updating to
> 5.2.0. 'virsh setvcpus' suddenly stopped working for Xen HVM guests. Turns
> out libxl uses cpu-add under the covers.
> 
> > 
> >``-usbdevice`` (since 2.10.0)
> >``-drive file=3Djson:{...{'driver':'file'}}`` (since 3.0)
> >``-vnc acl`` (since 4.0.0)
> >``-mon ...,control=3Dreadline,pretty=3Don|off`` (since 4.1)
> >``migrate_set_downtime`` and ``migrate_set_speed`` (since 2.8.0)
> >``query-named-block-nodes`` result ``encryption_key_missing`` (since 
> > 2.10.0)
> >``query-block`` result ``inserted.encryption_key_missing`` (since 2.10.0)
> >``migrate-set-cache-size`` and ``query-migrate-cache-size`` (since 
> > 2.11.0)
> >``query-named-block-nodes`` and ``query-block`` result 
> > dirty-bitmaps[i].sta=
> > tus (ince 4.0)
> >``query-cpus`` (since 2.12.0)
> >``query-cpus-fast`` ``arch`` output member (since 3.0.0)
> >``query-events`` (since 4.0)
> >chardev client socket with ``wait`` option (since 4.0)
> >``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` 
> > (s=
> > ince 4.0.0)
> >``ide-drive`` (since 4.2)
> >``scsi-disk`` (since 4.2)
> > 
> > AFAICT, libvirt has ceased to use all of these too.
> 
> A quick grep of the libxl code shows it uses -usbdevice, query-cpus, and 
> scsi-disk.
> 
> > There are many more similarly old deprecations not (yet) tackled.
> 
> The Xen tools maintainers will need to be more vigilant of the deprecations.
> I don't follow Xen development close enough to know if this topic has
> already been discussed.

MAINTAINERS has a section for "Incompatible changes" that covers
docs/system/deprecated.rst. Maybe if the Xen maintainers are interested
in that, we could add another list or individual people there so they
would see patches that deprecate something?

But either way, it would probably be useful to check the full
deprecation list rather than just what we're going to remove right now.

Kevin




Re: [PATCH v2 5/5] hw/block/nvme: report non-mdts command size limit for dsm

2021-03-01 Thread Klaus Jensen
On Feb 22 22:12, Klaus Jensen wrote:
> On Feb 23 05:55, Keith Busch wrote:
> > On Mon, Feb 22, 2021 at 07:47:59PM +0100, Klaus Jensen wrote:
> > > +typedef struct NvmeIdCtrlNvm {
> > > +uint8_t vsl;
> > > +uint8_t wzsl;
> > > +uint8_t wusl;
> > > +uint8_t dmrl;
> > > +uint32_tdmrsl;
> > > +uint64_tdmsl;
> > > +uint8_t rsvd16[4080];
> > > +} NvmeIdCtrlNvm;
> > 
> > TP 4040a still displays these fields with preceding '...' indicating
> > something comes before this. Is that just left-over from the integration
> > for TBD offsets, or is there something that still hasn't been accounted
> > for?
> 
> Good question.
> 
> But since the TBDs have been assigned I believe it is just a left-over.
> I must admit that I have not cross checked this with all other TPs, but
> AFAIK this is the only ratified TP that adds something to the
> NVM-specific identify controller data structure.

Are you otherwise OK with this?


signature.asc
Description: PGP signature


Re: [PATCH v3 05/21] sd: emmc: Add support for EXT_CSD & CSD for eMMC

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> From: Vincent Palatin 
> 
> eMMC CSD is similar to SD with an option to refer EXT_CSD for larger
> devices.
> 
> Signed-off-by: Vincent Palatin 
> [clg: Add user friendly macros for EXT_CSD register]
> Signed-off-by: Cédric Le Goater 
> [spb: updated commit message]
> Signed-off-by: Sai Pavan Boddu 
> ---
>  hw/sd/sdmmc-internal.h | 97 
> ++
>  hw/sd/sd.c | 61 +--
>  2 files changed, 156 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sd/sdmmc-internal.h b/hw/sd/sdmmc-internal.h
> index d8bf17d..7ab7b4d 100644
> --- a/hw/sd/sdmmc-internal.h
> +++ b/hw/sd/sdmmc-internal.h
> @@ -37,4 +37,101 @@ const char *sd_cmd_name(uint8_t cmd);
>   */
>  const char *sd_acmd_name(uint8_t cmd);
>  
> +/*
> + * EXT_CSD fields
> + */
> +
> +#define EXT_CSD_CMDQ_MODE_EN15  /* R/W */
> +#define EXT_CSD_FLUSH_CACHE   32  /* W */
> +#define EXT_CSD_CACHE_CTRL33  /* R/W */
> +#define EXT_CSD_POWER_OFF_NOTIFICATION  34  /* R/W */
> +#define EXT_CSD_PACKED_FAILURE_INDEX  35  /* RO */
> +#define EXT_CSD_PACKED_CMD_STATUS 36  /* RO */
> +#define EXT_CSD_EXP_EVENTS_STATUS 54  /* RO, 2 bytes */
> +#define EXT_CSD_EXP_EVENTS_CTRL   56  /* R/W, 2 bytes */
> +#define EXT_CSD_DATA_SECTOR_SIZE  61  /* R */
> +#define EXT_CSD_GP_SIZE_MULT143 /* R/W */
> +#define EXT_CSD_PARTITION_SETTING_COMPLETED 155 /* R/W */
> +#define EXT_CSD_PARTITION_ATTRIBUTE 156 /* R/W */
> +#define EXT_CSD_PARTITION_SUPPORT 160 /* RO */
> +#define EXT_CSD_HPI_MGMT161 /* R/W */
> +#define EXT_CSD_RST_N_FUNCTION162 /* R/W */
> +#define EXT_CSD_BKOPS_EN163 /* R/W */
> +#define EXT_CSD_BKOPS_START   164 /* W */
> +#define EXT_CSD_SANITIZE_START165 /* W */
> +#define EXT_CSD_WR_REL_PARAM166 /* RO */
> +#define EXT_CSD_RPMB_MULT   168 /* RO */
> +#define EXT_CSD_FW_CONFIG   169 /* R/W */
> +#define EXT_CSD_BOOT_WP 173 /* R/W */
> +#define EXT_CSD_ERASE_GROUP_DEF   175 /* R/W */
> +#define EXT_CSD_PART_CONFIG   179 /* R/W */
> +#define EXT_CSD_ERASED_MEM_CONT   181 /* RO */
> +#define EXT_CSD_BUS_WIDTH   183 /* R/W */
> +#define EXT_CSD_STROBE_SUPPORT184 /* RO */
> +#define EXT_CSD_HS_TIMING   185 /* R/W */
> +#define EXT_CSD_POWER_CLASS   187 /* R/W */
> +#define EXT_CSD_REV 192 /* RO */
> +#define EXT_CSD_STRUCTURE   194 /* RO */
> +#define EXT_CSD_CARD_TYPE   196 /* RO */
> +#define EXT_CSD_DRIVER_STRENGTH   197 /* RO */
> +#define EXT_CSD_OUT_OF_INTERRUPT_TIME 198 /* RO */
> +#define EXT_CSD_PART_SWITCH_TIME199 /* RO */
> +#define EXT_CSD_PWR_CL_52_195   200 /* RO */
> +#define EXT_CSD_PWR_CL_26_195   201 /* RO */
> +#define EXT_CSD_PWR_CL_52_360   202 /* RO */
> +#define EXT_CSD_PWR_CL_26_360   203 /* RO */
> +#define EXT_CSD_SEC_CNT 212 /* RO, 4 bytes */
> +#define EXT_CSD_S_A_TIMEOUT   217 /* RO */
> +#define EXT_CSD_S_C_VCCQ  219 /* RO */
> +#define EXT_CSD_S_C_VCC 220 /* RO */
> +#define EXT_CSD_REL_WR_SEC_C222 /* RO */
> +#define EXT_CSD_HC_WP_GRP_SIZE221 /* RO */
> +#define EXT_CSD_ERASE_TIMEOUT_MULT  223 /* RO */
> +#define EXT_CSD_HC_ERASE_GRP_SIZE 224 /* RO */
> +#define EXT_CSD_ACC_SIZE225 /* RO */
> +#define EXT_CSD_BOOT_MULT   226 /* RO */
> +#define EXT_CSD_BOOT_INFO   228 /* RO */
> +#define EXT_CSD_SEC_TRIM_MULT   229 /* RO */
> +#define EXT_CSD_SEC_ERASE_MULT230 /* RO */
> +#define EXT_CSD_SEC_FEATURE_SUPPORT 231 /* RO */
> +#define EXT_CSD_TRIM_MULT   232 /* RO */
> +#define EXT_CSD_PWR_CL_200_195236 /* RO */
> +#define EXT_CSD_PWR_CL_200_360237 /* RO */
> +#define EXT_CSD_PWR_CL_DDR_52_195 238 /* RO */
> +#define EXT_CSD_PWR_CL_DDR_52_360 239 /* RO */
> +#define EXT_CSD_BKOPS_STATUS246 /* RO */
> +#define EXT_CSD_POWER_OFF_LONG_TIME 247 /* RO */
> +#define EXT_CSD_GENERIC_CMD6_TIME 248 /* RO */
> +#define EXT_CSD_CACHE_SIZE249 /* RO, 4 bytes */
> +#define EXT_CSD_PWR_CL_DDR_200_360  253 /* RO */
> +#define EXT_CSD_FIRMWARE_VERSION  254 /* RO, 8 bytes */
> +#define EXT_CSD_PRE_EOL_INFO267 /* RO */
> +#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A  268 /* RO */
> +#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B  269 /* RO */
> +#define EXT_CSD_CMDQ_DEPTH307 /* RO */
> +#define EXT_CSD_CMDQ_SUPPORT308 /* RO */
> +#define EXT_CSD_SUPPORTED_MODE493 /* RO */
> +#define EXT_CSD_TAG_UNIT_SIZE   498 /* RO */
> +#define EXT_CSD_DATA_TAG_SUPPORT  499 /* RO */
> +#define EXT_CSD_MAX_PACKED_WRITES 500 /* RO */
> +#define EXT_CSD_MAX_PACKED_READS  501 /* RO */
> +#define EXT_CSD_BKOPS_SUPPORT   502 /* RO */
> +#define EXT_CSD_HPI_FEATURES503 /* RO */
> +#define EXT_CSD_S_CMD_SET   504 /* RO */
> +
> +/*
> + * EXT_CSD field definitions
> + */
> +
> +#define EXT_CSD_WR_REL_PARAM_EN   (1 << 2)
> +#define EXT_CSD_WR_REL_PARAM_EN_RPMB_REL_WR (1 << 4)
> +
> +#define EXT_CSD_PART_CONFIG_ACC_MASK  (0x7)
> +#define EXT_CSD_PART_CONFIG_ACC_DEFAULT (0x0)
> +#define EXT_CSD_PART_CONFIG_ACC_BOOT0 

Re: [RFC PATCH v2 3/4] block: Support multiple reopening with x-blockdev-reopen

2021-03-01 Thread Kevin Wolf
Am 25.02.2021 um 18:02 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 24.02.2021 15:33, Kevin Wolf wrote:
> > Am 09.02.2021 um 09:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 08.02.2021 21:44, Alberto Garcia wrote:
> > > > Signed-off-by: Alberto Garcia 
> > > > ---
> > > >qapi/block-core.json   |  2 +-
> > > >include/block/block.h  |  1 +
> > > >block.c| 16 +--
> > > >blockdev.c | 85 
> > > > +-
> > > >tests/qemu-iotests/155 |  9 ++--
> > > >tests/qemu-iotests/165 |  4 +-
> > > >tests/qemu-iotests/245 | 27 +++-
> > > >tests/qemu-iotests/248 |  2 +-
> > > >tests/qemu-iotests/248.out |  2 +-
> > > >tests/qemu-iotests/298 |  4 +-
> > > >10 files changed, 89 insertions(+), 63 deletions(-)
> > > > 
> > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > index c0e7c23331..b9fcf20a81 100644
> > > > --- a/qapi/block-core.json
> > > > +++ b/qapi/block-core.json
> > > > @@ -4177,7 +4177,7 @@
> > > ># Since: 4.0
> > > >##
> > > >{ 'command': 'x-blockdev-reopen',
> > > > -  'data': 'BlockdevOptions', 'boxed': true }
> > > > +  'data': { 'options': ['BlockdevOptions'] } }
> > > 
> > > Do we also want to drop x- prefix?
> > 
> > libvirt really wants to have a stable blockdev-reopen interface in 6.0
> > because enabling the incremental backup code depends on this (they just
> > toggle the readonly flag if I understand correctly, so most of the work
> > we're currently doing isn't even relevant at this moment for libvirt).
> 
> Do you know what is the case exactly? If they do it to remove dirty bitmap
> from backing image after snapshot operation, probably we'd better improve
> block-dirty-bitmap-remove command to be able to reopen r-o image?
> 
> (I just recently faced such a task)

I think it was to switch nodes between read-only and read-write, but I
don't remember the exact context.

Peter, can you add the details?

> > Given that the soft freeze is coming closer (March 16), I wonder if we
> > should just make this API change and declare the interface stable. We
> > can then make Vladimir's fixes and the file reopening on top of it - if
> > it's in time for 6.0, that would be good, but if not we could move it to
> > 6.1 without impacting libvirt.
> > 
> > I think we're reasonable confident that the QAPI interfaces are right,
> > even if maybe not that all aspects of the implementation are right yet.
> > 
> > What do you think?
> 
> I think it's OK.. We have it since 4.0. What will we win keeping -x
> for years? Even latest change from updating one device to several
> could be easily done with help of 'alternate' if the command was
> already stable.

I think your series is kind of important to really call the
implementation stable. We can always feature flags to indicate the fixes
if necessary, but it would still feel better to declare something stable
that doesn't have known bugs. :-)

Do you think your series will still take a while? Maybe my first
comments sounded a bit negative because it was really hard to review at
first without knowing the final state, but after all I think the
approach is sane and apart from some implementation details, we're not
that far away from getting it into a mergable state.

Kevin




Re: [PATCH v3 04/21] sd: emmc: update OCR fields for eMMC

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> From: Vincent Palatin 
> 
> eMMC OCR register doesn't has UHS-II field and voltage fields are
> different.
> 
> Signed-off-by: Vincent Palatin 
> Signed-off-by: Sai Pavan Boddu 
> ---
>  hw/sd/sd.c | 27 ---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 6de414b..bc9d913 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -287,6 +287,15 @@ FIELD(OCR, UHS_II_CARD,29,  1) /* Only 
> UHS-II */
>  FIELD(OCR, CARD_CAPACITY,  30,  1) /* 0:SDSC, 1:SDHC/SDXC */
>  FIELD(OCR, CARD_POWER_UP,  31,  1)
>  
> +/*
> + * eMMC OCR register
> + */
> +FIELD(EMMC_OCR, VDD_WINDOW_0,  7, 1)
> +FIELD(EMMC_OCR, VDD_WINDOW_1,  8, 7)
> +FIELD(EMMC_OCR, VDD_WINDOW_2, 15, 9)
> +FIELD(EMMC_OCR, ACCESS_MODE,  29, 2)
> +FIELD(EMMC_OCR, POWER_UP, 31, 1)
> +
>  #define ACMD41_ENQUIRY_MASK 0x00ff
>  #define ACMD41_R3_MASK  (R_OCR_VDD_VOLTAGE_WIN_HI_MASK \
> | R_OCR_ACCEPT_SWITCH_1V8_MASK \
> @@ -296,8 +305,16 @@ FIELD(OCR, CARD_POWER_UP,  31,  1)
>  
>  static void sd_set_ocr(SDState *sd)
>  {
> -/* All voltages OK */
> -sd->ocr = R_OCR_VDD_VOLTAGE_WIN_HI_MASK;
> +if (sd->emmc) {
> +/*
> + * Dual Voltage eMMC card
> + */
> +sd->ocr = R_EMMC_OCR_VDD_WINDOW_0_MASK |
> +  R_EMMC_OCR_VDD_WINDOW_2_MASK;
> +} else {
> +/* All voltages OK */
> +sd->ocr = R_OCR_VDD_VOLTAGE_WIN_HI_MASK;

'ocr' is a constant that could be in SDCardClass 

> +}
>  }
>  
>  static void sd_ocr_powerup(void *opaque)
> @@ -525,7 +542,11 @@ static void sd_response_r1_make(SDState *sd, uint8_t 
> *response)
>  
>  static void sd_response_r3_make(SDState *sd, uint8_t *response)
>  {
> -stl_be_p(response, sd->ocr & ACMD41_R3_MASK);
> +if (sd->emmc) {
> +stl_be_p(response, sd->ocr);
> +} else {
> +stl_be_p(response, sd->ocr & ACMD41_R3_MASK);
> +}
>  }
>  
>  static void sd_response_r6_make(SDState *sd, uint8_t *response)
> 




Re: [PATCH v3 02/21] sd: emmc: Add support for eMMC cards

2021-03-01 Thread Cédric Le Goater
On 2/28/21 8:33 PM, Sai Pavan Boddu wrote:
> Add eMMC device built on top of SD card.
> 
> Signed-off-by: Sai Pavan Boddu 
> ---
>  include/hw/sd/sd.h |  2 ++
>  hw/sd/sd.c | 20 
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/hw/sd/sd.h b/include/hw/sd/sd.h
> index 05ef9b7..b402dad 100644
> --- a/include/hw/sd/sd.h
> +++ b/include/hw/sd/sd.h
> @@ -90,6 +90,8 @@ typedef struct {
>  } SDRequest;
>  
>  
> +#define TYPE_EMMC "emmc"
> +OBJECT_DECLARE_SIMPLE_TYPE(EMMCState, EMMC)
>  #define TYPE_SD_CARD "sd-card"
>  OBJECT_DECLARE_TYPE(SDState, SDCardClass, SD_CARD)
>  
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 74b9162..a23af6d 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -108,6 +108,7 @@ struct SDState {
>  uint8_t spec_version;
>  BlockBackend *blk;
>  bool spi;
> +bool emmc;
>  
>  /* Runtime changeables */
>  
> @@ -143,6 +144,10 @@ struct SDState {
>  bool cmd_line;
>  };
>  
> +struct EMMCState {
> +SDState parent;
> +};
> +
>  static void sd_realize(DeviceState *dev, Error **errp);
>  
>  static const char *sd_state_name(enum SDCardStates state)
> @@ -2105,6 +2110,13 @@ static void sd_instance_init(Object *obj)
>  sd->ocr_power_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, sd_ocr_powerup, 
> sd);
>  }
>  
> +static void emmc_instance_init(Object *obj)
> +{
> +SDState *sd = SD_CARD(obj);
> +
> +sd->emmc = true;
> +}
I think field 'emmc' would fit better in SDCardClass since it is a device 
constant. You should not need 'struct EMMCState'. So something like below.
Then you can add handlers for specific emmc commands.

Thanks,

C.


diff --git a/include/hw/sd/sd.h b/include/hw/sd/sd.h
index 47360ba4ee98..80e7cd526a57 100644
--- a/include/hw/sd/sd.h
+++ b/include/hw/sd/sd.h
@@ -93,6 +93,8 @@ typedef struct {
 #define TYPE_SD_CARD "sd-card"
 OBJECT_DECLARE_TYPE(SDState, SDCardClass, SD_CARD)
 
+#define TYPE_EMMC "emmc"
+
 struct SDCardClass {
 /*< private >*/
 DeviceClass parent_class;
@@ -124,6 +126,8 @@ struct SDCardClass {
 void (*enable)(SDState *sd, bool enable);
 bool (*get_inserted)(SDState *sd);
 bool (*get_readonly)(SDState *sd);
+
+bool emmc;
 };
 
 #define TYPE_SD_BUS "sd-bus"
diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 660026f2a667..95608f11b36e 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -2447,9 +2447,24 @@ static const TypeInfo sd_info = {
 .instance_finalize = sd_instance_finalize,
 };
 
+static void emmc_class_init(ObjectClass *klass, void *data)
+{
+SDCardClass *sc = SD_CARD_CLASS(klass);
+
+sc->emmc = true;
+}
+
+static const TypeInfo emmc_info = {
+.name = TYPE_EMMC,
+.parent = TYPE_SD_CARD,
+.instance_size = sizeof(SDState),
+.class_init = emmc_class_init,
+};
+
 static void sd_register_types(void)
 {
 type_register_static(_info);
+type_register_static(_info);
 }
 
 type_init(sd_register_types)




Re: block/throttle and burst bucket

2021-03-01 Thread Kevin Wolf
Am 26.02.2021 um 13:33 hat Peter Lieven geschrieben:
> Am 26.02.21 um 10:27 schrieb Alberto Garcia:
> > On Thu 25 Feb 2021 06:34:48 PM CET, Peter Lieven  wrote:
> >> I was wondering if there is a way to check from outside (qmp etc.) if
> >> a throttled block device has exceeded the iops_max_length seconds of
> >> time bursting up to iops_max and is now hard limited to the iops limit
> >> that is supplied?
> >>
> >> Would it be also a good idea to exetend the accounting to account for
> >> requests that must have waited before being sent out to the backend
> >> device?
> > No, there's no such interface as far as I'm aware. I think one problem
> > is that throttling is now done using a filter, that can be inserted
> > anywhere in the node graph, and accounting is done at the BlockBackend
> > level.
> >
> > We don't even have a query-block-throttle function. I actually started
> > to write one six years ago but it was never finished.
> 
> 
> A quick idea that came to my mind was to add an option to emit a QMP
> event if the burst_bucket is exhausted and hard limits are enforced.

Do you actually need to do something every time that it's exceeded, so
QEMU needs to be the active part sending out an event, or is it
something that you need to check in specific places and could reasonably
query on demand?

For the latter, my idea would have been adding a new read-only QOM
property to the throttle group object that exposes how much is still
left. When it becomes 0, the hard limits are enforced.

> There seems to be something wrong in the throttling code anyway.
> Throttling causes addtional i/o latency always even if the actual iops
> rate is far away from the limits and ever more far away from the burst
> limits. I will dig into this.
> 
> My wishlist:
> 
>  - have a possibility to query the throttling state.
>  - have counters for no of delayed ops and for how long they were delayed.
>  - have counters for untrottled <= 4k request performance for a backend 
> storage device.
> 
> The later two seem not trivial as you mentioned.

Do you need the information per throttle node or per throttle group? For
the latter, the same QOM property approach would work.

Kevin




Re: [PATCH v3 12/21] sd: emmc: add CMD21 tuning sequence

2021-03-01 Thread Dr. David Alan Gilbert
* Sai Pavan Boddu (sai.pavan.bo...@xilinx.com) wrote:
> eMMC cards support tuning sequence for entering HS200 mode.
> 
> Signed-off-by: Sai Pavan Boddu 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  hw/sd/sd.c | 47 +++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index bf963ec..174c28e 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1386,6 +1386,14 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  }
>  break;
>  
> +case 21:/* CMD21: mmc SEND TUNING_BLOCK */
> +if (sd->emmc && (sd->state == sd_transfer_state)) {
> +sd->state = sd_sendingdata_state;
> +sd->data_offset = 0;
> +return sd_r1;
> +}
> +break;
> +
>  case 23:/* CMD23: SET_BLOCK_COUNT */
>  if (sd->spec_version < SD_PHY_SPECv3_01_VERS) {
>  break;
> @@ -2120,6 +2128,30 @@ static const uint8_t 
> sd_tuning_block_pattern[SD_TUNING_BLOCK_SIZE] = {
>  0xbb, 0xff, 0xf7, 0xff, 0xf7, 0x7f, 0x7b, 0xde,
>  };
>  
> +#define EXCSD_BUS_WIDTH_OFFSET 183
> +#define BUS_WIDTH_8_MASK0x4
> +#define BUS_WIDTH_4_MASK0x2
> +#define MMC_TUNING_BLOCK_SIZE   128
> +
> +static const uint8_t mmc_tunning_block_pattern[128] = {
> +   0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0x00, 0x00,
> +   0xff, 0xff, 0xcc, 0xcc, 0xcc, 0x33, 0xcc, 0xcc,
> +   0xcc, 0x33, 0x33, 0xcc, 0xcc, 0xcc, 0xff, 0xff,
> +   0xff, 0xee, 0xff, 0xff, 0xff, 0xee, 0xee, 0xff,
> +   0xff, 0xff, 0xdd, 0xff, 0xff, 0xff, 0xdd, 0xdd,
> +   0xff, 0xff, 0xff, 0xbb, 0xff, 0xff, 0xff, 0xbb,
> +   0xbb, 0xff, 0xff, 0xff, 0x77, 0xff, 0xff, 0xff,
> +   0x77, 0x77, 0xff, 0x77, 0xbb, 0xdd, 0xee, 0xff,
> +   0xff, 0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0x00,
> +   0x00, 0xff, 0xff, 0xcc, 0xcc, 0xcc, 0x33, 0xcc,
> +   0xcc, 0xcc, 0x33, 0x33, 0xcc, 0xcc, 0xcc, 0xff,
> +   0xff, 0xff, 0xee, 0xff, 0xff, 0xff, 0xee, 0xee,
> +   0xff, 0xff, 0xff, 0xdd, 0xff, 0xff, 0xff, 0xdd,
> +   0xdd, 0xff, 0xff, 0xff, 0xbb, 0xff, 0xff, 0xff,
> +   0xbb, 0xbb, 0xff, 0xff, 0xff, 0x77, 0xff, 0xff,
> +   0xff, 0x77, 0x77, 0xff, 0x77, 0xbb, 0xdd, 0xee,

Where does this magic pattern come from?  Is it part of some spec?

Dave

> +};
> +
>  uint8_t sd_read_byte(SDState *sd)
>  {
>  /* TODO: Append CRCs */
> @@ -2213,6 +2245,21 @@ uint8_t sd_read_byte(SDState *sd)
>  ret = sd_tuning_block_pattern[sd->data_offset++];
>  break;
>  
> +case 21:/* CMD21: SEND_TUNNING_BLOCK (MMC) */
> +if (sd->data_offset >= MMC_TUNING_BLOCK_SIZE - 1) {
> +sd->state = sd_transfer_state;
> +}
> +if (sd->ext_csd[EXCSD_BUS_WIDTH_OFFSET] & BUS_WIDTH_8_MASK) {
> +ret = mmc_tunning_block_pattern[sd->data_offset++];
> +} else {
> +/* Return LSB Nibbles of two byte from the 8bit tuning block
> + * for 4bit mode
> + */
> +ret = mmc_tunning_block_pattern[sd->data_offset++] & 0x0F;
> +ret |= (mmc_tunning_block_pattern[sd->data_offset++] & 0x0F) << 
> 4;
> +}
> +break;
> +
>  case 22:/* ACMD22: SEND_NUM_WR_BLOCKS */
>  ret = sd->data[sd->data_offset ++];
>  
> -- 
> 2.7.4
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] iotests: Fix up python style in 300

2021-03-01 Thread Kevin Wolf
Am 15.02.2021 um 23:05 hat Eric Blake geschrieben:
> Break some long lines, and relax our type hints to be more generic to
> any JSON, in order to more easily permit the additional JSON depth now
> possible in migration parameters.  Detected by iotest 297.
> 
> Fixes: ca4bfec41d56
>  (qemu-iotests: 300: Add test case for modifying persistence of bitmap)
> Reported-by: Kevin Wolf 
> Signed-off-by: Eric Blake 

Thanks, applied to the block branch.

Kevin




Re: [PATCH] iotests: Fix up python style in 300

2021-03-01 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 16.02.2021 02:21, John Snow wrote:
>> On 2/15/21 5:05 PM, Eric Blake wrote:
>>> Break some long lines, and relax our type hints to be more generic to
>>> any JSON, in order to more easily permit the additional JSON depth now
>>> possible in migration parameters.  Detected by iotest 297.
>>>
>>> Fixes: ca4bfec41d56
>>>   (qemu-iotests: 300: Add test case for modifying persistence of bitmap)
>>> Reported-by: Kevin Wolf 
>>> Signed-off-by: Eric Blake 
>> Reviewed-by: John Snow 
>> 
>>> ---
>>>   tests/qemu-iotests/300 | 10 ++
>>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
>>> index 63036f6a6e13..adb927629747 100755
>>> --- a/tests/qemu-iotests/300
>>> +++ b/tests/qemu-iotests/300
>>> @@ -22,7 +22,7 @@
>>>   import os
>>>   import random
>>>   import re
>>> -from typing import Dict, List, Optional, Union
>>> +from typing import Dict, List, Optional
>>>
>>>   import iotests
>>>
>>> @@ -30,7 +30,7 @@ import iotests
>>>   # pylint: disable=wrong-import-order
>>>   import qemu
>>>
>>> -BlockBitmapMapping = List[Dict[str, Union[str, List[Dict[str, str]
>>> +BlockBitmapMapping = List[Dict[str, object]]
>>>
>> Assuming iotest 297 didn't yap about this, I think this has the
>> necessary power for this file and we don't have to work any harder.
>> If in the future you try to treat e.g. bmap['persistent'] as a
>> particular kind of value (string? bool? int?) mypy will likely
>> complain about that a little, saying it has no insight into the type
>> beyond "object".
>> If *that* becomes annoying, you can degrade this type to use 'Any'
>> instead of 'object' and even those checks will cease.
>
> Probably at some future moment we'll have generated python types for QAPI 
> structures ? :)

Generating Python from the QAPI schema is possible.  I'm not aware of
anyone planning to work on it near term.