Re: [PATCH] doc: Describe missing generic -blockdev options
On Tue, 15 Oct 2019 at 13:40, Kevin Wolf wrote:
>
> We added more generic options after introducing -blockdev and forgot to
> update the documentation (man page and --help output) accordingly. Do
> that now.
>
> Signed-off-by: Kevin Wolf
> ---
>  qemu-options.hx | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 793d70ff93..9f6aa3dde3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -849,7 +849,8 @@ ETEXI
>  DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
>      "-blockdev [driver=]driver[,node-name=N][,discard=ignore|unmap]\n"
>      "          [,cache.direct=on|off][,cache.no-flush=on|off]\n"
> -    "          [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
> +    "          [,read-only=on|off][,auto-read-only=on|off]\n"
> +    "          [,force-share=on|off][,detect-zeroes=on|off|unmap]\n"
>      "          [,driver specific parameters...]\n"
>      "                configure a block backend\n", QEMU_ARCH_ALL)
>  STEXI
> @@ -885,6 +886,22 @@ name is not intended to be predictable and changes
>  between QEMU invocations.
>  For the top level, an explicit node name must be specified.
>  @item read-only
>  Open the node read-only. Guest write attempts will fail.
> +
> +Note that some block drivers support only read-only access, either generally or
> +in certain configurations. In this case, the default value
> +@option{read-only=off} does not work and the option must be specified
> +explicitly.
> +@item auto-read-only
> +If @option{auto-read-only=on} is set, QEMU is allowed not to open the image
> +read-write even if @option{read-only=off} is requested, but fall back to
> +read-only instead (and switch between the modes later), e.g. depending on
> +whether the image file is writable or whether a writing user is attached to the
> +node.
> +@item force-share
> +Override the image locking system of QEMU and force the node to allowing
> +sharing all permissions with other uses.

Grammar nit: "to allow sharing"; but maybe the phrasing could
be clarified anyway -- I'm not entirely sure what "sharing
permissions" would be. The first part of the sentence suggests
this option is "force the image file to be opened even if some
other QEMU instance has it open already", but the second half
sounds like "don't lock the image, so that some other use later
is allowed to open it"? Or is it both, or something else?

> +
> +Enabling @option{force-share=on} requires @option{read-only=on}.

thanks
-- PMM
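The documentation above lists the new generic options and one constraint between them (force-share=on requires read-only=on). The following is a minimal Python sketch that assembles those options into a single -blockdev string and enforces that constraint. The helper `blockdev_opts`, the node name `disk0` and the file name `disk.img` are hypothetical and used only for illustration. QEMU performs its own validation.

```python
def blockdev_opts(driver, node_name, *, read_only=False,
                  auto_read_only=False, force_share=False, **extra):
    """Build a -blockdev option string from the generic options.

    Hypothetical helper for illustration only; it is not part of QEMU.
    """
    if force_share and not read_only:
        # Documented constraint: force-share=on requires read-only=on
        raise ValueError("force-share=on requires read-only=on")

    def onoff(flag):
        return "on" if flag else "off"

    opts = [
        f"driver={driver}",
        f"node-name={node_name}",
        f"read-only={onoff(read_only)}",
        f"auto-read-only={onoff(auto_read_only)}",
        f"force-share={onoff(force_share)}",
    ]
    # Driver-specific parameters (e.g. filename for the file driver) go last
    opts += [f"{key}={value}" for key, value in extra.items()]
    return ",".join(opts)


if __name__ == "__main__":
    argv = ["qemu-system-x86_64", "-blockdev",
            blockdev_opts("file", "disk0", read_only=True, force_share=True,
                          filename="disk.img")]
    print(" ".join(argv))
```

Rejecting the invalid combination early keeps the error next to the option that caused it, rather than leaving it to surface at startup.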
Re: [PATCH] doc: Describe missing generic -blockdev options
On 10/15/19 7:38 AM, Kevin Wolf wrote:
> We added more generic options after introducing -blockdev and forgot to
> update the documentation (man page and --help output) accordingly. Do
> that now.
>
> Signed-off-by: Kevin Wolf
> ---
>  qemu-options.hx | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>

> @@ -885,6 +886,22 @@ name is not intended to be predictable and changes
>  between QEMU invocations.
>  For the top level, an explicit node name must be specified.
>  @item read-only
>  Open the node read-only. Guest write attempts will fail.
> +
> +Note that some block drivers support only read-only access, either generally or
> +in certain configurations. In this case, the default value
> +@option{read-only=off} does not work and the option must be specified
> +explicitly.
> +@item auto-read-only
> +If @option{auto-read-only=on} is set, QEMU is allowed not to open the image
> +read-write even if @option{read-only=off} is requested, but fall back to
> +read-only instead (and switch between the modes later), e.g. depending on
> +whether the image file is writable or whether a writing user is attached to the
> +node.

Hard to read. Maybe:

If @option{auto-read-only=on} is set, QEMU may fall back to read-only
usage even when @option{read-only=off} is requested, or even switch
between modes as needed, e.g. depending on whether the image file is
writable or whether a writing user is attached to the node.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
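To make the fallback wording concrete, the following is a rough Python sketch of the decision described for auto-read-only at open time. It does not model the later switching between modes. `open_node`, `try_open` and the set of errno values treated as "not writable" are assumptions for illustration, not QEMU's actual logic.

```python
import errno

# Errors treated as "the image cannot be opened writable" (assumption)
_FALLBACK_ERRNOS = (errno.EACCES, errno.EPERM, errno.EROFS)


def open_node(try_open, read_only=False, auto_read_only=False):
    """Return the mode actually used for the node: "ro" or "rw".

    try_open(mode) is a caller-supplied stand-in for opening the image in
    the given mode; it raises OSError on failure.
    """
    if read_only:
        try_open("ro")
        return "ro"
    try:
        try_open("rw")
        return "rw"
    except OSError as err:
        # With auto-read-only=on a permission-type failure is not fatal:
        # fall back to read-only instead of failing the open.
        if auto_read_only and err.errno in _FALLBACK_ERRNOS:
            try_open("ro")
            return "ro"
        raise


def write_protected_image(mode):
    """Simulated image file that the current user may only read."""
    if mode == "rw":
        raise OSError(errno.EACCES, "Permission denied")
```

Without auto-read-only=on, the same permission error propagates and the open fails. This matches read-only=off meaning "must be writable".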
Re: [PATCH] blockdev: Use error_report() in hmp_commit()
On 10/15/19 7:39 AM, Kevin Wolf wrote:
> Instead of using monitor_printf() to report errors, hmp_commit() should
> use error_report() like other places do.
>
> Signed-off-by: Kevin Wolf
> ---
>  blockdev.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: [PATCH] doc: Describe missing generic -blockdev options
Am 15.10.2019 um 15:55 hat Peter Maydell geschrieben:
> On Tue, 15 Oct 2019 at 13:40, Kevin Wolf wrote:
> >
> > We added more generic options after introducing -blockdev and forgot to
> > update the documentation (man page and --help output) accordingly. Do
> > that now.
> >
> > Signed-off-by: Kevin Wolf
> > ---
> >  qemu-options.hx | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 793d70ff93..9f6aa3dde3 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -849,7 +849,8 @@ ETEXI
> >  DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
> >      "-blockdev [driver=]driver[,node-name=N][,discard=ignore|unmap]\n"
> >      "          [,cache.direct=on|off][,cache.no-flush=on|off]\n"
> > -    "          [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
> > +    "          [,read-only=on|off][,auto-read-only=on|off]\n"
> > +    "          [,force-share=on|off][,detect-zeroes=on|off|unmap]\n"
> >      "          [,driver specific parameters...]\n"
> >      "                configure a block backend\n", QEMU_ARCH_ALL)
> >  STEXI
> > @@ -885,6 +886,22 @@ name is not intended to be predictable and changes
> >  between QEMU invocations.
> >  For the top level, an explicit node name must be specified.
> >  @item read-only
> >  Open the node read-only. Guest write attempts will fail.
> > +
> > +Note that some block drivers support only read-only access, either generally or
> > +in certain configurations. In this case, the default value
> > +@option{read-only=off} does not work and the option must be specified
> > +explicitly.
> > +@item auto-read-only
> > +If @option{auto-read-only=on} is set, QEMU is allowed not to open the image
> > +read-write even if @option{read-only=off} is requested, but fall back to
> > +read-only instead (and switch between the modes later), e.g. depending on
> > +whether the image file is writable or whether a writing user is attached to the
> > +node.
> > +@item force-share
> > +Override the image locking system of QEMU and force the node to allowing
> > +sharing all permissions with other uses.
>
> Grammar nit: "to allow sharing"; but maybe the phrasing could
> be clarified anyway -- I'm not entirely sure what "sharing
> permissions" would be. The first part of the sentence suggests
> this option is "force the image file to be opened even if some
> other QEMU instance has it open already", but the second half
> sounds like "don't lock the image, so that some other use later
> is allowed to open it"? Or is it both, or something else?

It's more the latter. Open the image file and allow other instances to
have it open as well (existing and future instances), but still error
out if the other instance doesn't allow sharing.

I'm open for suggestions on how to phrase this better.

Kevin
Re: [PATCH] doc: Describe missing generic -blockdev options
On 10/15/19 9:05 AM, Kevin Wolf wrote:

>>> +@item force-share
>>> +Override the image locking system of QEMU and force the node to allowing
>>> +sharing all permissions with other uses.
>>
>> Grammar nit: "to allow sharing"; but maybe the phrasing could
>> be clarified anyway -- I'm not entirely sure what "sharing
>> permissions" would be. The first part of the sentence suggests
>> this option is "force the image file to be opened even if some
>> other QEMU instance has it open already", but the second half
>> sounds like "don't lock the image, so that some other use later
>> is allowed to open it"? Or is it both, or something else?
>
> It's more the latter. Open the image file and allow other instances to
> have it open as well (existing and future instances), but still error
> out if the other instance doesn't allow sharing.
>
> I'm open for suggestions on how to phrase this better.

Here's a shot (although I'm not 100% certain I've captured the nuances
correctly):

Override the image locking system of QEMU by forcing the node to utilize
weaker shared access for permissions where it would normally request
exclusive access. When there is the potential for multiple instances to
have the same file open (whether this invocation of qemu is the first or
the second instance), both instances must permit shared access for the
second instance to succeed at opening the file.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
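Kevin's and Eric's descriptions reduce to a symmetric rule. Each user takes some permissions and shares some permissions, and two users can coexist only if each shares everything the other takes. The following is a toy Python model of that rule. The permission names are loosely modelled on QEMU's BLK_PERM_* flags, and the default sharing sets are simplifying assumptions. This is not QEMU's actual permission code.

```python
from dataclasses import dataclass

# Illustrative permission names, loosely modelled on QEMU's BLK_PERM_* flags
ALL_PERMS = frozenset({"consistent-read", "write", "resize"})


@dataclass(frozen=True)
class Opener:
    perms: frozenset   # permissions this user takes on the image
    shared: frozenset  # permissions it lets other users take


def make_opener(read_only=True, force_share=False, shared=None):
    if force_share and not read_only:
        raise ValueError("force-share=on requires read-only=on")
    if read_only:
        perms = frozenset({"consistent-read"})
        default_shared = ALL_PERMS - {"write", "resize"}
    else:
        perms = ALL_PERMS
        default_shared = frozenset({"consistent-read"})
    if force_share:
        # Share everything instead of locking other users out
        default_shared = ALL_PERMS
    return Opener(perms, default_shared if shared is None else frozenset(shared))


def can_coexist(first, second):
    """Both users must share every permission the other one takes."""
    return first.perms <= second.shared and second.perms <= first.shared
```

In this model, a force-share reader can join a running writer. It still fails against a user that shares nothing, which is the "still error out if the other instance doesn't allow sharing" case.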
[PATCH v2 02/21] iotests/qcow2.py: Split feature fields into bits
Print the feature fields as a set of bits so that filtering is easier.

Signed-off-by: Max Reitz
---
 tests/qemu-iotests/031.out | 36 +--
 tests/qemu-iotests/036.out | 18 +-
 tests/qemu-iotests/039.out | 22 ++--
 tests/qemu-iotests/060.out | 20 +--
 tests/qemu-iotests/061.out | 72 ++---
 tests/qemu-iotests/137.out | 2 +-
 tests/qemu-iotests/qcow2.py | 18 +++---
 7 files changed, 99 insertions(+), 89 deletions(-)

diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 68a74d03b9..d535e407bc 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -18,9 +18,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             72
 
@@ -46,9 +46,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             72
 
@@ -74,9 +74,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             72
 
@@ -109,9 +109,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             104
 
@@ -142,9 +142,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             104
 
@@ -175,9 +175,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             104
 
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index e489b44386..15229a9604 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -16,9 +16,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x8000000000000000
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     [63]
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             104
 
@@ -50,9 +50,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x8000000000000000
+incompatible_features     []
+compatible_features       []
+autoclear_features        [63]
 refcount_order            4
 header_length             104
 
@@ -78,9 +78,9 @@ refcount_table_offset     0x10000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-incompatible_features     0x0
-compatible_features       0x0
-autoclear_features        0x0
+incompatible_features     []
+compatible_features       []
+autoclear_features        []
 refcount_order            4
 header_length             104
 
diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
index 2e356d51b6..bdafa3ace3 100644
--- a/tests/qemu-iotests/039.out
+++ b/tests/qemu-iotests/039.out
@@ -4,7 +4,7 @@ QA output created by 039
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-incompatible_features     0x0
+incompatible_features     []
 No errors were found on the image.
 
 == Creating a dirty image file ==
@@ -12,7 +12,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX
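The new output format replaces each 64-bit feature mask with the list of bit indices that are set. For example, 0x8000000000000000 becomes [63]. The following is a minimal Python sketch of that conversion. The function name `feature_bits` is hypothetical and is not necessarily how qcow2.py implements it.

```python
def feature_bits(mask):
    """Return the indices of the bits set in a 64-bit feature mask."""
    return [bit for bit in range(64) if (mask >> bit) & 1]


if __name__ == "__main__":
    print("incompatible_features", feature_bits(0x8000000000000000))
    print("compatible_features", feature_bits(0x0))
```

Printing a Python list gives exactly the `[]` and `[63]` forms seen in the updated .out files. It also lets a test grep for a single bit without caring about the others.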
[PATCH v2 05/21] iotests: Replace IMGOPTS by _unsupported_imgopts
Some tests require compat=1.1 and thus set IMGOPTS='compat=1.1'
globally. That is not how it should be done; instead, they should
simply set _unsupported_imgopts to compat=0.10 (compat=1.1 is the
default anyway).

This makes the tests heed user-specified $IMGOPTS. Some do not work
with all image options, though, so we need to disable them accordingly.

Signed-off-by: Max Reitz
---
 tests/qemu-iotests/036 | 3 +--
 tests/qemu-iotests/060 | 4 ++--
 tests/qemu-iotests/062 | 3 ++-
 tests/qemu-iotests/066 | 3 ++-
 tests/qemu-iotests/068 | 3 ++-
 tests/qemu-iotests/098 | 4 ++--
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/036 b/tests/qemu-iotests/036
index 5f929ad3be..bbaf0ef45b 100755
--- a/tests/qemu-iotests/036
+++ b/tests/qemu-iotests/036
@@ -43,9 +43,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
 _supported_proto file
-
 # Only qcow2v3 and later supports feature bits
-IMGOPTS="compat=1.1"
+_unsupported_imgopts 'compat=0.10'
 
 echo
 echo === Image with unknown incompatible feature bit ===
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index b91d8321bb..9c2ef42522 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -48,6 +48,8 @@ _filter_io_error()
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# These tests only work for compat=1.1 images with refcount_bits=16
+_unsupported_imgopts 'compat=0.10' 'refcount_bits=\([^1]\|.\([^6]\|$\)\)'
 
 rt_offset=65536 # 0x10000 (XXX: just an assumption)
 rb_offset=131072 # 0x20000 (XXX: just an assumption)
@@ -55,8 +57,6 @@ l1_offset=196608 # 0x30000 (XXX: just an assumption)
 l2_offset=262144 # 0x40000 (XXX: just an assumption)
 l2_offset_after_snapshot=524288 # 0x80000 (XXX: just an assumption)
 
-IMGOPTS="compat=1.1"
-
 OPEN_RW="open -o overlap-check=all $TEST_IMG"
 # Overlap checks are done before write operations only, therefore opening an
 # image read-only makes the overlap-check option irrelevant
diff --git a/tests/qemu-iotests/062 b/tests/qemu-iotests/062
index d5f818fcce..ac0d2a9a3b 100755
--- a/tests/qemu-iotests/062
+++ b/tests/qemu-iotests/062
@@ -40,8 +40,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # This tests qocw2-specific low-level functionality
 _supported_fmt qcow2
 _supported_proto generic
+# We need zero clusters and snapshots
+_unsupported_imgopts 'compat=0.10' 'refcount_bits=1[^0-9]'
 
-IMGOPTS="compat=1.1"
 IMG_SIZE=64M
 
 echo
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 28f8c98412..00eb80d89e 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -39,9 +39,10 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # This tests qocw2-specific low-level functionality
 _supported_fmt qcow2
 _supported_proto generic
+# We need zero clusters and snapshots
+_unsupported_imgopts 'compat=0.10' 'refcount_bits=1[^0-9]'
 
 # Intentionally create an unaligned image
-IMGOPTS="compat=1.1"
 IMG_SIZE=$((64 * 1024 * 1024 + 512))
 
 echo
diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068
index 22f5ca3ba6..65650fca9a 100755
--- a/tests/qemu-iotests/068
+++ b/tests/qemu-iotests/068
@@ -39,8 +39,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # This tests qocw2-specific low-level functionality
 _supported_fmt qcow2
 _supported_proto generic
+# Internal snapshots are (currently) impossible with refcount_bits=1
+_unsupported_imgopts 'compat=0.10' 'refcount_bits=1[^0-9]'
 
-IMGOPTS="compat=1.1"
 IMG_SIZE=128K
 
 case "$QEMU_DEFAULT_MACHINE" in
diff --git a/tests/qemu-iotests/098 b/tests/qemu-iotests/098
index 1c1d1c468f..700068b328 100755
--- a/tests/qemu-iotests/098
+++ b/tests/qemu-iotests/098
@@ -40,8 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2
 _supported_proto file
-
-IMGOPTS="compat=1.1"
+# The code path we want to test here only works for compat=1.1 images
+_unsupported_imgopts 'compat=0.10'
 
 for event in l1_update empty_image_prepare reftable_update refblock_alloc; do
 
-- 
2.21.0
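The 060 pattern `refcount_bits=\([^1]\|.\([^6]\|$\)\)` is written as a GNU grep basic regular expression. It is meant to reject every refcount_bits value except 16. The following is a Python translation that shows which $IMGOPTS strings it flags. It assumes the harness applies each pattern as an unanchored search over $IMGOPTS, as grep does. `is_unsupported` is a hypothetical name.

```python
import re

# Python equivalent of the GNU BRE used in 060:
#   refcount_bits=\([^1]\|.\([^6]\|$\)\)
# BRE \( \) and \| correspond to ( ) and | in Python's syntax.
PATTERN_060 = r"refcount_bits=([^1]|.([^6]|$))"


def is_unsupported(imgopts, pattern=PATTERN_060):
    """True if the pattern matches anywhere in the image options string."""
    return re.search(pattern, imgopts) is not None


if __name__ == "__main__":
    for opts in ("compat=1.1", "refcount_bits=16", "refcount_bits=1",
                 "refcount_bits=64", "refcount_bits=16,compat=1.1"):
        print(f"{opts!r:32} unsupported={is_unsupported(opts)}")
```

The first alternative catches any value that does not start with 1. The second catches values starting with 1 whose next character is not 6, including the end of the string, so only "16" survives.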
[PATCH v2 01/21] iotests/qcow2.py: Add dump-header-exts
This is useful for tests that want to whitelist fields from dump-header
(with grep) but still print all header extensions.

Signed-off-by: Max Reitz
---
 tests/qemu-iotests/qcow2.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index b392972d1b..d813b4fc81 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -154,6 +154,10 @@ def cmd_dump_header(fd):
     h.dump()
     h.dump_extensions()
 
+def cmd_dump_header_exts(fd):
+    h = QcowHeader(fd)
+    h.dump_extensions()
+
 def cmd_set_header(fd, name, value):
     try:
         value = int(value, 0)
@@ -230,6 +234,7 @@ def cmd_set_feature_bit(fd, group, bit):
 
 cmds = [
     [ 'dump-header', cmd_dump_header, 0, 'Dump image header and header extensions' ],
+    [ 'dump-header-exts', cmd_dump_header_exts, 0, 'Dump image header extensions' ],
     [ 'set-header', cmd_set_header, 2, 'Set a field in the header'],
     [ 'add-header-ext', cmd_add_header_ext, 2, 'Add a header extension' ],
     [ 'add-header-ext-stdio', cmd_add_header_ext_stdio, 1, 'Add a header extension, data from stdin' ],
-- 
2.21.0
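Adding the subcommand only requires a new row in the `cmds` table: name, handler, argument count and help text. The following is a toy Python sketch of how such a table can be dispatched. The `dispatch` function and the stand-in handlers that return strings are hypothetical, and qcow2.py's real main loop may differ.

```python
def cmd_dump_header(fd):
    return "header fields + header extensions"  # stand-in output


def cmd_dump_header_exts(fd):
    return "header extensions only"  # stand-in output


# Same row shape as the patch: [name, handler, number of args, help text]
cmds = [
    ['dump-header', cmd_dump_header, 0, 'Dump image header and header extensions'],
    ['dump-header-exts', cmd_dump_header_exts, 0, 'Dump image header extensions'],
]


def dispatch(fd, cmd, args):
    """Look up cmd in the table, check its arity, and call the handler."""
    for name, handler, num_args, _help in cmds:
        if name == cmd:
            if len(args) != num_args:
                raise ValueError(f"{cmd} takes {num_args} argument(s)")
            return handler(fd, *args)
    raise ValueError(f"Unknown command '{cmd}'")
```

Keeping the arity next to the handler lets one generic loop validate arguments and build the usage text for every command.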
[PATCH v2 00/21] iotests: Allow ./check -o data_file
Hi,

The cover letter from v1 (explaining the motivation behind this series
and the general structure) is here:
https://lists.nongnu.org/archive/html/qemu-block/2019-09/msg01323.html

For v2, I’ve tried to address Maxim’s comments:
- Patch 1 through 3: New
- Patch 4: Only print feature bits instead of blacklisting stuff that we
  don’t need
- Patch 5:
  - Fix typo
  - Add comment why 098 needs compat=1.1
- Patch 16: Use _check_test_img
- Patch 17: Use the new _filter_json_filename
- Patch 18: Rethink the incompatible feature filter approach: Instead of
  filtering out the data_file bit, just check whether the dirty bit is
  present (because that is all we want to know)
- Patch 19: Use the new _filter_json_filename
- Patch 20: Rebase conflicts due to the changes to patch 5
- Patch 21:
  - Add and use _get_data_file
  - Add a comment how the data_file_filter in _filter_qemu_img_map works


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/21:[down] 'iotests/qcow2.py: Add dump-header-exts'
002/21:[down] 'iotests/qcow2.py: Split feature fields into bits'
003/21:[down] 'iotests: Add _filter_json_filename'
004/21:[0060] [FC] 'iotests: Filter refcount_order in 036'
005/21:[0003] [FC] 'iotests: Replace IMGOPTS by _unsupported_imgopts'
006/21:[----] [--] 'iotests: Drop compat=1.1 in 050'
007/21:[----] [--] 'iotests: Let _make_test_img parse its parameters'
008/21:[----] [--] 'iotests: Add -o and --no-opts to _make_test_img'
009/21:[----] [--] 'iotests: Inject space into -ocompat=0.10 in 051'
010/21:[----] [--] 'iotests: Replace IMGOPTS= by -o'
011/21:[----] [--] 'iotests: Replace IMGOPTS='' by --no-opts'
012/21:[----] [--] 'iotests: Drop IMGOPTS use in 267'
013/21:[----] [--] 'iotests: Avoid qemu-img create'
014/21:[----] [--] 'iotests: Use _rm_test_img for deleting test images'
015/21:[----] [--] 'iotests: Avoid cp/mv of test images'
016/21:[0004] [FC] 'iotests: Make 091 work with data_file'
017/21:[0004] [FC] 'iotests: Make 110 work with data_file'
018/21:[0002] [FC] 'iotests: Make 137 work with data_file'
019/21:[0004] [FC] 'iotests: Make 198 work with data_file'
020/21:[0002] [FC] 'iotests: Disable data_file where it cannot be used'
021/21:[0034] [FC] 'iotests: Allow check -o data_file'


Max Reitz (21):
  iotests/qcow2.py: Add dump-header-exts
  iotests/qcow2.py: Split feature fields into bits
  iotests: Add _filter_json_filename
  iotests: Filter refcount_order in 036
  iotests: Replace IMGOPTS by _unsupported_imgopts
  iotests: Drop compat=1.1 in 050
  iotests: Let _make_test_img parse its parameters
  iotests: Add -o and --no-opts to _make_test_img
  iotests: Inject space into -ocompat=0.10 in 051
  iotests: Replace IMGOPTS= by -o
  iotests: Replace IMGOPTS='' by --no-opts
  iotests: Drop IMGOPTS use in 267
  iotests: Avoid qemu-img create
  iotests: Use _rm_test_img for deleting test images
  iotests: Avoid cp/mv of test images
  iotests: Make 091 work with data_file
  iotests: Make 110 work with data_file
  iotests: Make 137 work with data_file
  iotests: Make 198 work with data_file
  iotests: Disable data_file where it cannot be used
  iotests: Allow check -o data_file

 tests/qemu-iotests/005 | 2 +-
 tests/qemu-iotests/007 | 5 ++-
 tests/qemu-iotests/014 | 2 +
 tests/qemu-iotests/015 | 5 ++-
 tests/qemu-iotests/019 | 6 +--
 tests/qemu-iotests/020 | 6 +--
 tests/qemu-iotests/024 | 10 ++---
 tests/qemu-iotests/026 | 5 ++-
 tests/qemu-iotests/028 | 2 +-
 tests/qemu-iotests/029 | 7 ++--
 tests/qemu-iotests/031 | 9 ++--
 tests/qemu-iotests/031.out | 36
 tests/qemu-iotests/036 | 15 ---
 tests/qemu-iotests/036.out | 66 -
 tests/qemu-iotests/039 | 27 +---
 tests/qemu-iotests/039.out | 22 +-
 tests/qemu-iotests/043 | 4 +-
 tests/qemu-iotests/046 | 2 +
 tests/qemu-iotests/048 | 4 +-
 tests/qemu-iotests/050 | 8 +---
 tests/qemu-iotests/051 | 7 ++--
 tests/qemu-iotests/053 | 4 +-
 tests/qemu-iotests/058 | 7 ++--
 tests/qemu-iotests/059 | 20 -
 tests/qemu-iotests/060 | 12 +++---
 tests/qemu-iotests/060.out | 20 -
 tests/qemu-iotests/061 | 61 ++-
 tests/qemu-iotests/061.out | 72
 tests/qemu-iotests/062 | 3 +-
 tests/qemu-iotests/063 | 18
 tests/qemu-iotests/063.out | 3 +-
 tests/qemu-iotests/066 | 3 +-
 tests/qemu-iotests/067 | 6 ++-
 tests/qemu-iotests/068 | 4 +-
 tests/qemu-iotests/069 |
[PATCH v2 03/21] iotests: Add _filter_json_filename
Signed-off-by: Max Reitz
---
 tests/qemu-iotests/common.filter | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 9f418b4881..63bc6f6f26 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -227,5 +227,29 @@ _filter_qmp_empty_return()
     grep -v '{"return": {}}'
 }
 
+_filter_json_filename()
+{
+    $PYTHON -c 'import sys
+result, *fnames = sys.stdin.read().split("json:{")
+depth = 0
+for fname in fnames:
+    depth += 1 # For the opening brace in the split separator
+    for chr_i, chr in enumerate(fname):
+        if chr == "{":
+            depth += 1
+        elif chr == "}":
+            depth -= 1
+            if depth == 0:
+                break
+
+    # json:{} filenames may be nested; filter out everything from
+    # inside the outermost one
+    if depth == 0:
+        chr_i += 1 # First character past the filename
+        result += "json:{ /* filtered */ }" + fname[chr_i:]
+
+sys.stdout.write(result)'
+}
+
 # make sure this script returns success
 true
-- 
2.21.0
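The filter above is an inline Python program fed from stdin. The following sketch reshapes the same brace-counting logic into a function so that its behaviour on flat and nested json:{} filenames can be checked directly. `filter_json_filename` is a hypothetical name. The loop variable `chr` is renamed to `ch` so that it does not shadow the builtin, and `chr_i` is pre-initialised to guard against an empty fragment.

```python
def filter_json_filename(text):
    """Replace each outermost json:{...} filename with a placeholder."""
    result, *fnames = text.split("json:{")
    depth = 0
    for fname in fnames:
        depth += 1  # for the "{" consumed by the split separator
        chr_i = -1
        for chr_i, ch in enumerate(fname):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    break
        # Fragments that end while still nested are dropped entirely;
        # only the text after the outermost closing brace is kept.
        if depth == 0:
            result += "json:{ /* filtered */ }" + fname[chr_i + 1:]
    return result


if __name__ == "__main__":
    print(filter_json_filename(
        'Image: json:{"driver": "qcow2", "file": {"driver": "file"}} end'))
```

A nested `json:{` inside the outer one only raises `depth`, so the whole outer object collapses to a single placeholder.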
[PATCH v3 1/5] qcow2: Allow writing compressed data of multiple clusters
QEMU currently supports writing compressed data of the size equal to
one cluster. This patch allows writing QCOW2 compressed data that
exceed one cluster. Now, we split buffered data into separate clusters
and write them compressed using the existing functionality.
To inform the block layer about writing all the data compressed, we
introduce the 'compress' command line option. Based on that option,
the written data will be aligned by the cluster size at the generic
layer.

Suggested-by: Pavel Butsykin
Suggested-by: Vladimir Sementsov-Ogievskiy
Suggested-by: Roman Kagan
Signed-off-by: Andrey Shinkevich
---
 block.c | 12 +-
 block/io.c | 2 +-
 block/qcow2.c | 106 ++
 block/qcow2.h | 1 +
 blockdev.c | 4 ++
 include/block/block.h | 1 +
 include/block/block_int.h | 2 +
 qapi/block-core.json | 6 ++-
 qemu-options.hx | 6 ++-
 9 files changed, 108 insertions(+), 32 deletions(-)

diff --git a/block.c b/block.c
index 5944124..4cfbea2 100644
--- a/block.c
+++ b/block.c
@@ -1418,6 +1418,11 @@ QemuOptsList bdrv_runtime_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "always accept other writers (default: off)",
         },
+        {
+            .name = BDRV_OPT_COMPRESS,
+            .type = QEMU_OPT_BOOL,
+            .help = "compress all writes to the image (default: off)",
+        },
         { /* end of list */ }
     },
 };
@@ -2983,6 +2988,11 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
         flags &= ~BDRV_O_RDWR;
     }
 
+    if (!g_strcmp0(qdict_get_try_str(options, BDRV_OPT_COMPRESS), "on") ||
+        qdict_get_try_bool(options, BDRV_OPT_COMPRESS, false)) {
+        bs->all_write_compressed = true;
+    }
+
     if (flags & BDRV_O_SNAPSHOT) {
         snapshot_options = qdict_new();
         bdrv_temp_snapshot_options(_flags, snapshot_options,
@@ -3208,7 +3218,7 @@ static int bdrv_reset_options_allowed(BlockDriverState *bs,
      * in bdrv_reopen_prepare() so they can be left out of @new_opts */
     const char *const common_options[] = {
         "node-name", "discard", "cache.direct", "cache.no-flush",
-        "read-only", "auto-read-only", "detect-zeroes", NULL
+        "read-only", "auto-read-only", "detect-zeroes", "compress", NULL
     };
 
     for (e = qdict_first(bs->options); e; e = qdict_next(bs->options, e)) {
diff --git a/block/io.c b/block/io.c
index f8c3596..6a5509c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1922,7 +1922,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     } else if (flags & BDRV_REQ_ZERO_WRITE) {
         bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
         ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags);
-    } else if (flags & BDRV_REQ_WRITE_COMPRESSED) {
+    } else if (flags & BDRV_REQ_WRITE_COMPRESSED || bs->all_write_compressed) {
         ret = bdrv_driver_pwritev_compressed(bs, offset, bytes,
                                              qiov, qiov_offset);
     } else if (bytes <= max_transfer) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 7961c05..9a85d73 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1787,6 +1787,10 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
         /* Encryption works on a sector granularity */
         bs->bl.request_alignment = qcrypto_block_get_sector_size(s->crypto);
     }
+    if (bs->all_write_compressed) {
+        bs->bl.request_alignment = MAX(bs->bl.request_alignment,
+                                       s->cluster_size);
+    }
     bs->bl.pwrite_zeroes_alignment = s->cluster_size;
     bs->bl.pdiscard_alignment = s->cluster_size;
 }
@@ -4152,10 +4156,8 @@ fail:
     return ret;
 }
 
-/* XXX: put compressed sectors first, then all the cluster aligned
-   tables to avoid losing bytes in alignment */
 static coroutine_fn int
-qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
+qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
                                  uint64_t offset, uint64_t bytes,
                                  QEMUIOVector *qiov, size_t qiov_offset)
 {
@@ -4165,32 +4167,11 @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
     uint8_t *buf, *out_buf;
     uint64_t cluster_offset;
 
-    if (has_data_file(bs)) {
-        return -ENOTSUP;
-    }
-
-    if (bytes == 0) {
-        /* align end of file to a sector boundary to ease reading with
-           sector based I/Os */
-        int64_t len = bdrv_getlength(bs->file->bs);
-        if (len < 0) {
-            return len;
-        }
-        return bdrv_co_truncate(bs->file, len, PREALLOC_MODE_OFF, NULL);
-    }
-
-    if (offset_into_cluster(s, offset)) {
-        return -EINVAL;
-    }
+    assert(bytes == s->cluster_size || (bytes < s->cluster_size &&
+           (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS)));
 
     buf = qemu_blockalign(bs, s->cluster_size);
-
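The commit message describes splitting a multi-cluster write into separate clusters and compressing each one. The assert added to the per-cluster task allows a short chunk only when it ends exactly at the end of the image. The following is a minimal Python sketch of that splitting step. `compressed_cluster_writes` is a hypothetical name. `zlib.compress` is only a stand-in compressor, and the sketch does not reproduce qcow2's on-disk compressed cluster format.

```python
import zlib


def compressed_cluster_writes(offset, data, cluster_size, image_size):
    """Split a cluster-aligned write into per-cluster compressed chunks.

    Returns a list of (guest_offset, compressed_bytes) pairs.
    """
    if offset % cluster_size:
        raise ValueError("offset must be cluster-aligned")
    writes = []
    for start in range(0, len(data), cluster_size):
        chunk = data[start:start + cluster_size]
        chunk_offset = offset + start
        # Mirrors the assert() added by the patch: only a chunk that ends
        # exactly at the end of the image may be shorter than a cluster.
        if len(chunk) < cluster_size and chunk_offset + len(chunk) != image_size:
            raise ValueError("short chunk must end at the end of the image")
        writes.append((chunk_offset, zlib.compress(chunk)))
    return writes
```

Because the caller is expected to pass cluster-aligned requests (the generic layer pads them once compress=on raises request_alignment), every chunk except possibly the image's last is a full cluster.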
[PATCH v3 2/5] tests/qemu-iotests: add case to write compressed data of multiple clusters
Add the test case to the iotest #214 that checks possibility of writing
compressed data of more than one cluster size.

Signed-off-by: Andrey Shinkevich
---
 tests/qemu-iotests/214     | 35 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/214.out | 15 +++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/tests/qemu-iotests/214 b/tests/qemu-iotests/214
index 21ec8a2..0003dc2 100755
--- a/tests/qemu-iotests/214
+++ b/tests/qemu-iotests/214
@@ -89,6 +89,41 @@ _check_test_img -r all
 $QEMU_IO -c "read -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 $QEMU_IO -c "read -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 
+echo
+echo "=== Write compressed data of multiple clusters ==="
+echo
+cluster_size=0x10000
+_make_test_img 2M -o cluster_size=$cluster_size
+
+echo "Uncompressed data:"
+let data_size="8 * $cluster_size"
+$QEMU_IO -c "write -P 0xaa 0 $data_size" "$TEST_IMG" \
+    2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IMG info "$TEST_IMG" | sed -n '/disk size:/ s/^ *//p'
+
+_make_test_img 2M -o cluster_size=$cluster_size
+let data_size="3 * $cluster_size + ($cluster_size >> 1)"
+# Set compress=on. That will align the written data
+# by the cluster size and will write them compressed.
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \
+$QEMU_IO -c "write -P 0xbb 0 $data_size" --image-opts \
+    driver=$IMGFMT,compress=on,file.filename=$TEST_IMG \
+    2>&1 | _filter_qemu_io | _filter_testdir
+
+let offset="4 * $cluster_size"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \
+$QEMU_IO -c "write -P 0xcc $offset $data_size" "json:{\
+    'driver': '$IMGFMT',
+    'file': {
+        'driver': 'file',
+        'filename': '$TEST_IMG'
+    },
+    'compress': true
+}" | _filter_qemu_io | _filter_testdir
+
+echo "After the multiple cluster data have been written compressed,"
+$QEMU_IMG info "$TEST_IMG" | sed -n '/disk size:/ s/^ *//p'
+
 # success, all done
 echo '*** done'
 rm -f $seq.full
diff --git a/tests/qemu-iotests/214.out b/tests/qemu-iotests/214.out
index 0fcd8dc..09a2e9a 100644
--- a/tests/qemu-iotests/214.out
+++ b/tests/qemu-iotests/214.out
@@ -32,4 +32,19 @@ read 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 4194304/4194304 bytes at offset 4194304
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Write compressed data of multiple clusters ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+Uncompressed data:
+wrote 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+disk size: 772 KiB
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+wrote 229376/229376 bytes at offset 0
+224 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 229376/229376 bytes at offset 262144
+224 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+After the multiple cluster data have been written compressed,
+disk size: 268 KiB
 *** done
-- 
1.8.3.1
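The sizes in the new test case can be checked by hand. With a 64 KiB cluster, `3 * $cluster_size + ($cluster_size >> 1)` is 229376 bytes, which is 3.5 clusters. The following Python sketch assumes, as the commit message says, that the generic layer widens each request to request_alignment (the cluster size once compress=on). Under that assumption it computes the cluster-aligned byte range that each compressed write actually covers. The helper names are hypothetical.

```python
def align_down(value, alignment):
    return value - value % alignment


def align_up(value, alignment):
    return align_down(value + alignment - 1, alignment)


def padded_range(offset, nbytes, alignment):
    """Byte range [start, end) after widening a request to the alignment."""
    return align_down(offset, alignment), align_up(offset + nbytes, alignment)


CLUSTER = 0x10000                         # cluster_size in the test (64 KiB)
DATA_SIZE = 3 * CLUSTER + (CLUSTER >> 1)  # 3.5 clusters

if __name__ == "__main__":
    print(DATA_SIZE)                                   # bytes per write
    print(padded_range(0, DATA_SIZE, CLUSTER))         # first write
    print(padded_range(4 * CLUSTER, DATA_SIZE, CLUSTER))  # second write
```

Each 3.5-cluster write therefore touches four whole clusters. The second write starts at 4 * 0x10000 = 262144, so the two padded ranges sit back to back without overlapping.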
[PATCH v2 3/3] tests: More iotest 223 improvements
Run the core of the test twice, once without iothreads, and again
with, for more coverage of both setups.

Suggested-by: Nir Soffer
Signed-off-by: Eric Blake
---
 tests/qemu-iotests/223 | 16 ++-
 tests/qemu-iotests/223.out | 85 +-
 2 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
index 2ba3d8124b4f..8b43ddb02b2c 100755
--- a/tests/qemu-iotests/223
+++ b/tests/qemu-iotests/223
@@ -117,10 +117,19 @@ _send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
   "arguments":{"driver":"qcow2", "node-name":"n",
     "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"
-_send_qemu_cmd $QEMU_HANDLE '{"execute":"x-blockdev-set-iothread",
-  "arguments":{"node-name":"n", "iothread":"io0"}}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"block-dirty-bitmap-disable",
   "arguments":{"node":"n", "name":"b"}}' "return"
+
+for attempt in normal iothread; do
+
+echo
+echo "=== Set up NBD with $attempt access ==="
+echo
+if [ $attempt = iothread ]; then
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"x-blockdev-set-iothread",
+  "arguments":{"node-name":"n", "iothread":"io0"}}' "return"
+fi
+
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
   "arguments":{"device":"n"}}' "error" # Attempt add without server
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
@@ -180,6 +189,9 @@ _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-remove",
   "arguments":{"name":"n2"}}' "error" # Attempt duplicate clean
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "error" # Again
+
+done
+
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' "return"
 wait=yes _cleanup_qemu
 
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 8bfc5072ea9d..ed543047956f 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -28,10 +28,91 @@ wrote 2097152/2097152 bytes at offset 2097152
 {"return": {}}
 {"execute":"blockdev-add", "arguments":{"driver":"qcow2", "node-name":"n", "file":{"driver":"file", "filename":"TEST_DIR/t.qcow2"}}}
 {"return": {}}
-{"execute":"x-blockdev-set-iothread", "arguments":{"node-name":"n", "iothread":"io0"}}
-{"return": {}}
 {"execute":"block-dirty-bitmap-disable", "arguments":{"node":"n", "name":"b"}}
 {"return": {}}
+
+=== Set up NBD with normal access ===
+
+{"execute":"nbd-server-add", "arguments":{"device":"n"}}
+{"error": {"class": "GenericError", "desc": "NBD server not running"}}
+{"execute":"nbd-server-start", "arguments":{"addr":{"type":"unix", "data":{"path":"TEST_DIR/nbd"}}}}
+{"return": {}}
+{"execute":"nbd-server-start", "arguments":{"addr":{"type":"unix", "data":{"path":"TEST_DIR/nbd1"}}}}
+{"error": {"class": "GenericError", "desc": "NBD server already running"}}
+exports available: 0
+{"execute":"nbd-server-add", "arguments":{"device":"n", "bitmap":"b"}}
+{"return": {}}
+{"execute":"nbd-server-add", "arguments":{"device":"nosuch"}}
+{"error": {"class": "GenericError", "desc": "Cannot find device=nosuch nor node_name=nosuch"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n"}}
+{"error": {"class": "GenericError", "desc": "NBD server already has export named 'n'"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n", "name":"n2", "bitmap":"b2"}}
+{"error": {"class": "GenericError", "desc": "Enabled bitmap 'b2' incompatible with readonly export"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n", "name":"n2", "bitmap":"b3"}}
+{"error": {"class": "GenericError", "desc": "Bitmap 'b3' is not found"}}
+{"execute":"nbd-server-add", "arguments":{"device":"n", "name":"n2", "writable":true, "bitmap":"b2"}}
+{"return": {}}
+exports available: 2
+ export: 'n'
+  size:  4194304
+  flags: 0x58f ( readonly flush fua df multi cache )
+  min block: 1
+  opt block: 4096
+  max block: 33554432
+  available meta contexts: 2
+   base:allocation
+   qemu:dirty-bitmap:b
+ export: 'n2'
+  size:  4194304
+  flags: 0xced
( flush fua trim zeroes df cache fast-zero ) + min block: 1 + opt block: 4096 + max block: 33554432 + available meta contexts: 2 + base:allocation + qemu:dirty-bitmap:b2 + +=== Contrast normal status to large granularity dirty-bitmap === + +read 512/512 bytes at offset 512 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 524288/524288 bytes at offset 524288 +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1048576/1048576 bytes at offset 1048576 +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 2097152/2097152 bytes at offset 2097152 +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, +{ "start": 4096, "length": 1044480, "depth": 0, "zero": true, "data": false, "offset": OFFSET}, +{ "start": 1048576, "length": 3145728,
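The loop restructuring in this patch can be pictured with a small Python sketch. It is not part of the test. `qmp()` and `nbd_setup_commands()` are hypothetical helpers that only model which commands open each pass: the iothread pass first moves node "n" into iothread io0, and then both passes replay the same NBD commands.

```python
import json


def qmp(execute, **arguments):
    """Serialize one QMP command as JSON (hypothetical helper)."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)


def nbd_setup_commands(attempt):
    """Commands sent at the top of one loop pass (hypothetical helper)."""
    cmds = []
    if attempt == "iothread":
        # Only the second pass moves node "n" into iothread io0.
        cmds.append(qmp("x-blockdev-set-iothread",
                        **{"node-name": "n", "iothread": "io0"}))
    # Expected to fail in both passes: no NBD server is running yet.
    cmds.append(qmp("nbd-server-add", device="n"))
    return cmds


for attempt in ("normal", "iothread"):
    print(f"=== Set up NBD with {attempt} access ===")
    for cmd in nbd_setup_commands(attempt):
        print(cmd)
```

Because nbd-server-stop runs at the end of each pass, the second pass starts from the same "no server" state as the first, which is why the expected errors repeat.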
[PATCH v2 2/3] iotests: Include QMP input in .out files
We generally include relevant HMP input in .out files, by virtue of the fact that HMP echoes its input. But QMP does not, so we have to explicitly inject it in the output stream, in order to make it easier to read .out files to see what behavior is being tested (especially true where the output file is a sequence of {'return': {}}). Suggested-by: Max Reitz Signed-off-by: Eric Blake --- tests/qemu-iotests/common.qemu | 9 tests/qemu-iotests/085.out | 26 ++ tests/qemu-iotests/094.out | 4 ++ tests/qemu-iotests/095.out | 2 + tests/qemu-iotests/109.out | 88 ++ tests/qemu-iotests/117.out | 5 ++ tests/qemu-iotests/127.out | 4 ++ tests/qemu-iotests/140.out | 5 ++ tests/qemu-iotests/141.out | 26 ++ tests/qemu-iotests/143.out | 3 ++ tests/qemu-iotests/144.out | 5 ++ tests/qemu-iotests/153.out | 11 + tests/qemu-iotests/156.out | 11 + tests/qemu-iotests/161.out | 8 tests/qemu-iotests/173.out | 4 ++ tests/qemu-iotests/182.out | 8 tests/qemu-iotests/183.out | 11 + tests/qemu-iotests/185.out | 18 +++ tests/qemu-iotests/191.out | 8 tests/qemu-iotests/200.out | 1 + tests/qemu-iotests/223.out | 19 tests/qemu-iotests/229.out | 3 ++ tests/qemu-iotests/249.out | 6 +++ 23 files changed, 285 insertions(+) diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu index 8d2021a7eb0c..abc231743e82 100644 --- a/tests/qemu-iotests/common.qemu +++ b/tests/qemu-iotests/common.qemu @@ -123,6 +123,9 @@ _timed_wait_for() # until either timeout, or a response. If it is not set, or <=0, # then the command is only sent once. # +# If neither $silent nor $mismatch_only is set, and $cmd begins with '{', +# echo the command before sending it the first time. +# # If $qemu_error_no_exit is set, then even if the expected response # is not seen, we will not exit. $QEMU_STATUS[$1] will be set it -1 in # that case. 
@@ -152,6 +155,12 @@ _send_qemu_cmd() shift $(($# - 2)) fi +# Display QMP being sent, but not HMP (since HMP already echoes its +# input back to output); decide based on leading '{' +if [ -z "$silent" ] && [ -z "$mismatch_only" ] && +[ "$cmd" != "${cmd#{}" ]; then +echo "${cmd}" | _filter_testdir +fi while [ ${count} -gt 0 ] do echo "${cmd}" >&${QEMU_IN[${h}]} diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out index 2a5f256cd3ec..e92f125b63c4 100644 --- a/tests/qemu-iotests/085.out +++ b/tests/qemu-iotests/085.out @@ -7,48 +7,61 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 === Sending capabilities === +{ 'execute': 'qmp_capabilities' } {"return": {}} === Create a single snapshot on virtio0 === +{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/1-snapshot-v0.qcow2', 'format': 'qcow2' } } Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16 {"return": {}} === Invalid command - missing device and nodename === +{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'snapshot-file':'TEST_DIR/1-snapshot-v0.qcow2', 'format': 'qcow2' } } {"error": {"class": "GenericError", "desc": "Cannot find device= nor node_name="}} === Invalid command - missing snapshot-file === +{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'format': 'qcow2' } } {"error": {"class": "GenericError", "desc": "Parameter 'snapshot-file' is missing"}} === Create several transactional group snapshots === +{ 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/2-snapshot-v0.qcow2' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/2-snapshot-v1.qcow2' } } ] } } Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 
backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16 Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16 {"return": {}} +{ 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/3-snapshot-v0.qcow2' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/3-snapshot-v1.qcow2' } } ] } } Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16 Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2
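The echo rule added to _send_qemu_cmd can be restated as a Python sketch. `should_echo()`, `filter_testdir()` and `send()` are hypothetical names. The path replacement is only a rough stand-in for _filter_testdir, and the step where the real function writes the command to QEMU's stdin is omitted.

```python
def should_echo(cmd, silent="", mismatch_only=""):
    """Echo only QMP (leading '{') when neither $silent nor $mismatch_only is set."""
    # Shell: [ "$cmd" != "${cmd#{}" ] is true exactly when cmd starts with '{'.
    return not silent and not mismatch_only and cmd.startswith("{")


def filter_testdir(line, test_dir):
    """Rough stand-in for _filter_testdir: hide the real test directory."""
    return line.replace(test_dir, "TEST_DIR")


def send(cmd, test_dir, out, silent="", mismatch_only=""):
    """Record what the patched _send_qemu_cmd would print before sending."""
    if should_echo(cmd, silent, mismatch_only):
        out.append(filter_testdir(cmd, test_dir))
    # The real function now writes cmd to QEMU's stdin (omitted here).


log = []
send("{ 'execute': 'qmp_capabilities' }", "/tmp/t", log)
send("info block", "/tmp/t", log)  # HMP echoes itself, so no extra copy
send("{ 'execute': 'quit' }", "/tmp/t", log, silent="yes")
print(log)
```

Keying on the leading '{' keeps HMP output unchanged while every QMP command gains exactly one echoed line, which is what the .out updates in this patch show.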
[PATCH v3 0/5] qcow2: advanced compression options
New enhancements for writing compressed data to a QCOW2 image.

The preceding patches have been queued in Max's block branch:
Based-on: <20190916175324.18478-1-vsement...@virtuozzo.com>

v2: Instead of introducing multiple driver-specific options, a single 'compress' option has been introduced in the generic block layer, as suggested by Roman Kagan. Discussed in the thread with ID <1570026166-748566-1-git-send-email-andrey.shinkev...@virtuozzo.com>

Andrey Shinkevich (5):
  qcow2: Allow writing compressed data of multiple clusters
  tests/qemu-iotests: add case to write compressed data of multiple clusters
  block: support compressed write for copy-on-read
  block-stream: add compress option
  tests/qemu-iotests: add case for block-stream compress

 block.c | 12 -
 block/io.c | 23 +++---
 block/qcow2.c | 106 +
 block/qcow2.h | 1 +
 block/stream.c | 10 -
 block/trace-events | 2 +-
 blockdev.c | 16 ++-
 include/block/block.h | 1 +
 include/block/block_int.h | 2 +
 qapi/block-core.json | 6 ++-
 qemu-options.hx | 6 ++-
 tests/qemu-iotests/030 | 51 +-
 tests/qemu-iotests/030.out | 4 +-
 tests/qemu-iotests/214 | 35 +++
 tests/qemu-iotests/214.out | 15 +++
 15 files changed, 246 insertions(+), 44 deletions(-)

--
1.8.3.1
[PATCH v3 5/5] tests/qemu-iotests: add case for block-stream compress
Add a case to the iotest #030 that tests the 'compress' option for a block-stream job. Signed-off-by: Andrey Shinkevich --- tests/qemu-iotests/030 | 51 +- tests/qemu-iotests/030.out | 4 ++-- 2 files changed, 52 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 index f3766f2..f0f0e26 100755 --- a/tests/qemu-iotests/030 +++ b/tests/qemu-iotests/030 @@ -21,7 +21,8 @@ import time import os import iotests -from iotests import qemu_img, qemu_io +from iotests import qemu_img, qemu_io, qemu_img_pipe +import json backing_img = os.path.join(iotests.test_dir, 'backing.img') mid_img = os.path.join(iotests.test_dir, 'mid.img') @@ -956,6 +957,54 @@ class TestSetSpeed(iotests.QMPTestCase): self.cancel_and_wait(resume=True) +class TestCompressed(iotests.QMPTestCase): +test_img_init_size = 0 + +def setUp(self): +qemu_img('create', '-f', iotests.imgfmt, backing_img, '1M') +qemu_img('create', '-f', iotests.imgfmt, '-o', + 'backing_file=%s' % backing_img, mid_img) +qemu_img('create', '-f', iotests.imgfmt, '-o', + 'backing_file=%s' % mid_img, test_img) +qemu_io('-c', 'write -P 0x1 0 512k', backing_img) +top = json.loads(qemu_img_pipe('info', '--output=json', test_img)) +self.test_img_init_size = top['actual-size'] +self.vm = iotests.VM().add_drive(test_img, "backing.node-name=mid," + + "backing.backing.node-name=base," + + "compress=on") +self.vm.launch() + +def tearDown(self): +self.vm.shutdown() +os.remove(test_img) +os.remove(mid_img) +os.remove(backing_img) + +def test_stream_compress(self): +self.assert_no_active_block_jobs() + +result = self.vm.qmp('block-stream', device='mid', job_id='stream-mid') +self.assert_qmp(result, 'return', {}) + +self.wait_until_completed(drive='stream-mid') +# Remove other 'JOB_STATUS_CHANGE' events for the job 'stream-mid' +self.vm.get_qmp_events(wait=True) + +result = self.vm.qmp('block-stream', device='drive0', + job_id='stream-top') +self.assert_qmp(result, 'return', {}) + 
+self.wait_until_completed(drive='stream-top') +self.vm.shutdown() + +top = json.loads(qemu_img_pipe('info', '--output=json', test_img)) +mid = json.loads(qemu_img_pipe('info', '--output=json', mid_img)) +base = json.loads(qemu_img_pipe('info', '--output=json', backing_img)) + +self.assertEqual(mid['actual-size'], base['actual-size']) +self.assertLess(top['actual-size'], mid['actual-size']) +self.assertLess(self.test_img_init_size, top['actual-size']) + if __name__ == '__main__': iotests.main(supported_fmts=['qcow2', 'qed'], supported_protocols=['file']) diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out index 6d9bee1..af8dac1 100644 --- a/tests/qemu-iotests/030.out +++ b/tests/qemu-iotests/030.out @@ -1,5 +1,5 @@ -... + -- -Ran 27 tests +Ran 28 tests OK -- 1.8.3.1
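A rough illustration of why the test expects top's actual-size to be smaller than mid's: mid receives plain copies of the 512 KiB written with pattern 0x1, while top (opened with compress=on) receives compressed clusters. This sketch uses Python's zlib as a stand-in for qcow2's compression and assumes the default 64 KiB cluster size. Real file sizes also include metadata, so it shows only the direction of the effect, not the values the test observes.

```python
import zlib

CLUSTER_SIZE = 64 * 1024      # assumed qcow2 default (cluster_size=65536)
WRITTEN = 512 * 1024          # "write -P 0x1 0 512k" on the backing image

cluster = bytes([0x01]) * CLUSTER_SIZE
clusters = WRITTEN // CLUSTER_SIZE

# mid: data copied as-is, one full cluster per allocated cluster
plain_bytes = clusters * CLUSTER_SIZE
# top (compress=on): each cluster stored compressed; zlib is a stand-in here
compressed_bytes = clusters * len(zlib.compress(cluster))

print(clusters, plain_bytes, compressed_bytes)
```

A single repeated byte is close to the best case for compression, which is what makes the strict assertLess comparisons in the test robust.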
[PATCH v3 4/5] block-stream: add compress option
Allow data compression during block-stream job for backup backing chain. Signed-off-by: Andrey Shinkevich --- block/stream.c | 10 -- blockdev.c | 12 +++- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/block/stream.c b/block/stream.c index 5562ccb..25f9324 100644 --- a/block/stream.c +++ b/block/stream.c @@ -41,10 +41,16 @@ typedef struct StreamBlockJob { static int coroutine_fn stream_populate(BlockBackend *blk, int64_t offset, uint64_t bytes) { +BlockDriverState *bs = blk_bs(blk); +int flags = BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH; + +if (bs->all_write_compressed) { +flags |= BDRV_REQ_WRITE_COMPRESSED; +} + assert(bytes < SIZE_MAX); -return blk_co_preadv(blk, offset, bytes, NULL, - BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH); +return blk_co_preadv(blk, offset, bytes, NULL, flags); } static void stream_abort(Job *job) diff --git a/blockdev.c b/blockdev.c index 2103730..fd824da 100644 --- a/blockdev.c +++ b/blockdev.c @@ -471,7 +471,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, int bdrv_flags = 0; int on_read_error, on_write_error; bool account_invalid, account_failed; -bool writethrough, read_only; +bool writethrough, read_only, compress; BlockBackend *blk; BlockDriverState *bs; ThrottleConfig cfg; @@ -570,6 +570,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, } read_only = qemu_opt_get_bool(opts, BDRV_OPT_READ_ONLY, false); +compress = qemu_opt_get_bool(opts, BDRV_OPT_COMPRESS, false); /* init */ if ((!file || !*file) && !qdict_size(bs_opts)) { @@ -595,6 +596,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, qdict_set_default_str(bs_opts, BDRV_OPT_READ_ONLY, read_only ? "on" : "off"); qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on"); +qdict_set_default_str(bs_opts, BDRV_OPT_COMPRESS, + compress ? 
"on" : "off"); assert((bdrv_flags & BDRV_O_CACHE_MASK) == 0); if (runstate_check(RUN_STATE_INMIGRATE)) { @@ -3308,6 +3311,13 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, goto out; } +if (bs->all_write_compressed && +bs->drv->bdrv_co_pwritev_compressed_part == NULL) { +error_setg(errp, "Compression is not supported for this drive %s", + bdrv_get_device_name(bs)); +goto out; +} + /* backing_file string overrides base bs filename */ base_name = has_backing_file ? backing_file : base_name; -- 1.8.3.1
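The flag selection in stream_populate() and the new check in qmp_block_stream() can be modeled as below. This is a sketch, not QEMU code: the bit values in `ReqFlags` are placeholders rather than QEMU's real BdrvRequestFlags, and `check_stream_allowed()` returns the error text instead of calling error_setg().

```python
from enum import IntFlag


class ReqFlags(IntFlag):
    # Placeholder bit values only, not QEMU's real BdrvRequestFlags.
    COPY_ON_READ = 0x1
    WRITE_COMPRESSED = 0x2
    PREFETCH = 0x4


def stream_populate_flags(all_write_compressed):
    """Mirror stream_populate(): copy-on-read + prefetch, plus compression if enabled."""
    flags = ReqFlags.COPY_ON_READ | ReqFlags.PREFETCH
    if all_write_compressed:
        flags |= ReqFlags.WRITE_COMPRESSED
    return flags


def check_stream_allowed(all_write_compressed, has_compressed_write, device):
    """Mirror the new qmp_block_stream() check; returns an error string or None."""
    if all_write_compressed and not has_compressed_write:
        return f"Compression is not supported for this drive {device}"
    return None
```

Rejecting the job up front keeps a driver without bdrv_co_pwritev_compressed_part from failing later, in the middle of the copy-on-read path.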
[PATCH v3 3/5] block: support compressed write for copy-on-read
Support compressed writes in the copy-on-read path, which the following patch 'block-stream: add compress option' uses to compress data during a block-stream job over a backup backing chain. Signed-off-by: Anton Nefedov Signed-off-by: Denis V. Lunev Signed-off-by: Andrey Shinkevich --- block/io.c | 21 - block/trace-events | 2 +- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/block/io.c b/block/io.c index 6a5509c..fc7f157 100644 --- a/block/io.c +++ b/block/io.c @@ -1264,12 +1264,13 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child, * allocating cluster in the image file. Note that this value may exceed * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which * is one reason we loop rather than doing it all at once. + * Also, this is crucial for compressed copy-on-read. */ bdrv_round_to_clusters(bs, offset, bytes, _offset, _bytes); skip_bytes = offset - cluster_offset; trace_bdrv_co_do_copy_on_readv(bs, offset, bytes, - cluster_offset, cluster_bytes); + cluster_offset, cluster_bytes, flags); while (cluster_bytes) { int64_t pnum; @@ -1328,9 +1329,15 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child, /* This does not change the data on the disk, it is not * necessary to flush even in cache=writethrough mode. */ -ret = bdrv_driver_pwritev(bs, cluster_offset, pnum, - _qiov, 0, - BDRV_REQ_WRITE_UNCHANGED); +if (flags & BDRV_REQ_WRITE_COMPRESSED) { +ret = bdrv_driver_pwritev_compressed(bs, cluster_offset, + pnum, _qiov, + qiov_offset); +} else { +ret = bdrv_driver_pwritev(bs, cluster_offset, pnum, + _qiov, 0, + BDRV_REQ_WRITE_UNCHANGED); +} } if (ret < 0) { @@ -1396,7 +1403,11 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, * to pass through to drivers. For now, there aren't any * passthrough flags.
*/ assert(!(flags & ~(BDRV_REQ_NO_SERIALISING | BDRV_REQ_COPY_ON_READ | - BDRV_REQ_PREFETCH))); + BDRV_REQ_PREFETCH | BDRV_REQ_WRITE_COMPRESSED))); + +/* write compressed only makes sense with copy on read */ +assert(!(flags & BDRV_REQ_WRITE_COMPRESSED) || + (flags & BDRV_REQ_COPY_ON_READ)); /* Handle Copy on Read and associated serialisation */ if (flags & BDRV_REQ_COPY_ON_READ) { diff --git a/block/trace-events b/block/trace-events index 3aa27e6..f444548 100644 --- a/block/trace-events +++ b/block/trace-events @@ -14,7 +14,7 @@ blk_root_detach(void *child, void *blk, void *bs) "child %p blk %p bs %p" bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x" bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x" bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x" -bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %"PRId64 +bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, int64_t cluster_bytes, int flags) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %"PRId64" flags 0x%x" bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x" bdrv_co_copy_range_to(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x" -- 1.8.3.1
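The comment added to bdrv_co_do_copy_on_readv() says cluster rounding is crucial for compressed copy-on-read. Our reading is that a compressed qcow2 write has to cover whole clusters, so the read range must first be widened to cluster boundaries. Below is a Python sketch of that rounding together with the new flag assertion. The function names and flag values are illustrative, and the real bdrv_round_to_clusters() queries the cluster size from the driver instead of taking it as a parameter.

```python
COPY_ON_READ = 0x1       # illustrative flag values, not QEMU's
WRITE_COMPRESSED = 0x2


def round_to_clusters(offset, nbytes, cluster_size):
    """Widen [offset, offset + nbytes) to whole clusters; returns (start, length)."""
    start = offset - offset % cluster_size                      # align start down
    end = -(-(offset + nbytes) // cluster_size) * cluster_size  # align end up
    return start, end - start


def check_read_flags(flags):
    """Same rule as the new assertion: compressed writes only with copy-on-read."""
    assert not (flags & WRITE_COMPRESSED) or (flags & COPY_ON_READ)


check_read_flags(COPY_ON_READ | WRITE_COMPRESSED)  # allowed
check_read_flags(COPY_ON_READ)                     # allowed
print(round_to_clusters(65000, 1000, 65536))       # (0, 131072)
```

A 1000-byte read that straddles a cluster boundary thus becomes a two-cluster copy-on-read, and each of those clusters can then be written in compressed form.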
[PATCH v2 0/3] tests: More iotest 223 improvements
[subject line kept for continuity with v1, but now touches much more]

Max suggested that, instead of special-casing just 223 to trace QMP input as well as output, we should patch common.qemu to do it for all tests. Doing that revealed that test 173 has been broken since v3.0. Max also suggested that 223 use a for loop rather than massive code duplication, which does indeed look nicer.

Eric Blake (3):
  iotests: Fix 173
  iotests: Include QMP input in .out files
  tests: More iotest 223 improvements

 tests/qemu-iotests/common.qemu | 9 +++
 tests/qemu-iotests/085.out | 26 +
 tests/qemu-iotests/094.out | 4 ++
 tests/qemu-iotests/095.out | 2 +
 tests/qemu-iotests/109.out | 88 +
 tests/qemu-iotests/117.out | 5 ++
 tests/qemu-iotests/127.out | 4 ++
 tests/qemu-iotests/140.out | 5 ++
 tests/qemu-iotests/141.out | 26 +
 tests/qemu-iotests/143.out | 3 +
 tests/qemu-iotests/144.out | 5 ++
 tests/qemu-iotests/153.out | 11
 tests/qemu-iotests/156.out | 11
 tests/qemu-iotests/161.out | 8 +++
 tests/qemu-iotests/173 | 4 +-
 tests/qemu-iotests/173.out | 10 +++-
 tests/qemu-iotests/182.out | 8 +++
 tests/qemu-iotests/183.out | 11
 tests/qemu-iotests/185.out | 18 ++
 tests/qemu-iotests/191.out | 8 +++
 tests/qemu-iotests/200.out | 1 +
 tests/qemu-iotests/223 | 16 +-
 tests/qemu-iotests/223.out | 100 +
 tests/qemu-iotests/229.out | 3 +
 tests/qemu-iotests/249.out | 6 ++
 25 files changed, 387 insertions(+), 5 deletions(-)

--
2.21.0
[PATCH v2 16/21] iotests: Make 091 work with data_file
The image end offset as reported by qemu-img check is different when using an external data file; we do not care about its value here, so we can just filter it. Incidentally, common.rc already has _check_test_img for us which does exactly that. Signed-off-by: Max Reitz --- tests/qemu-iotests/091 | 2 +- tests/qemu-iotests/091.out | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091 index f4b44659ae..0874fa84c8 100755 --- a/tests/qemu-iotests/091 +++ b/tests/qemu-iotests/091 @@ -101,7 +101,7 @@ echo "Check image pattern" ${QEMU_IO} -c "read -P 0x22 0 4M" "${TEST_IMG}" | _filter_testdir | _filter_qemu_io echo "Running 'qemu-img check -r all \$TEST_IMG'" -"${QEMU_IMG}" check -r all "${TEST_IMG}" 2>&1 | _filter_testdir | _filter_qemu +_check_test_img -r all echo "*** done" rm -f $seq.full diff --git a/tests/qemu-iotests/091.out b/tests/qemu-iotests/091.out index 5017f8c2d9..5ec7b00f13 100644 --- a/tests/qemu-iotests/091.out +++ b/tests/qemu-iotests/091.out @@ -23,6 +23,4 @@ read 4194304/4194304 bytes at offset 0 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) Running 'qemu-img check -r all $TEST_IMG' No errors were found on the image. -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed clusters -Image end offset: 5570560 *** done -- 2.21.0
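What the switch to _check_test_img buys here can be read off the .out diff: the two lines whose values change with an external data file disappear. The exact filtering lives in the iotests common scripts. `filter_img_check()` below is a hypothetical Python equivalent that drops only those two line shapes.

```python
import re

# The two line shapes that 091.out no longer contains.
_VARIABLE_LINES = (
    re.compile(r"allocated, .* fragmented, .* compressed clusters"),
    re.compile(r"^Image end offset: \d+$"),
)


def filter_img_check(output):
    """Drop qemu-img check lines whose values depend on the image layout."""
    return "\n".join(
        line for line in output.splitlines()
        if not any(p.search(line) for p in _VARIABLE_LINES)
    )
```

Only the verdict line survives, and it reads the same whether or not the image uses a data_file.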
[PATCH v2 08/21] iotests: Add -o and --no-opts to _make_test_img
Blindly overriding IMGOPTS is suboptimal as this discards user-specified options. Whatever options the test needs should simply be appended. Some tests do this (with IMGOPTS=$(_optstr_add "$IMGOPTS" "...")), but that is cumbersome. It’s simpler to just give _make_test_img an -o parameter with which tests can add options. Some tests actually must override the user-specified options, though, for example when creating an image in a different format than the test $IMGFMT. For such cases, --no-opts allows clearing the current option list. Signed-off-by: Max Reitz Reviewed-by: Maxim Levitsky --- tests/qemu-iotests/common.rc | 13 + 1 file changed, 13 insertions(+) diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc index 3e7adc4834..f3784077de 100644 --- a/tests/qemu-iotests/common.rc +++ b/tests/qemu-iotests/common.rc @@ -287,6 +287,7 @@ _make_test_img() local use_backing=0 local backing_file="" local object_options="" +local opts_param=false local misc_params=() if [ -n "$TEST_IMG_FILE" ]; then @@ -307,6 +308,10 @@ _make_test_img() if [ "$use_backing" = "1" -a -z "$backing_file" ]; then backing_file=$param continue +elif $opts_param; then +optstr=$(_optstr_add "$optstr" "$param") +opts_param=false +continue fi case "$param" in @@ -314,6 +319,14 @@ _make_test_img() use_backing=1 ;; +-o) +opts_param=true +;; + +--no-opts) +optstr="" +;; + *) misc_params=("${misc_params[@]}" "$param") ;; -- 2.21.0
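The new parameter handling can be summarized with a Python sketch. It assumes, as --no-opts implies, that optstr already holds $IMGOPTS when the loop starts. It leaves out the -b/backing-file handling and assumes _optstr_add joins options with a comma. Both functions are hypothetical stand-ins for the shell code.

```python
def optstr_add(optstr, opt):
    """Assumed _optstr_add behavior: append with a comma unless the list is empty."""
    return f"{optstr},{opt}" if optstr else opt


def parse_make_test_img_args(args, imgopts):
    """Sketch of the new -o / --no-opts handling; returns (optstr, misc_params)."""
    optstr = imgopts
    opts_param = False
    misc = []
    for param in args:
        if opts_param:
            # The argument after -o is merged into the option string.
            optstr = optstr_add(optstr, param)
            opts_param = False
            continue
        if param == "-o":
            opts_param = True
        elif param == "--no-opts":
            optstr = ""  # discard user-specified $IMGOPTS
        else:
            misc.append(param)
    return optstr, misc


print(parse_make_test_img_args(["-o", "compat=0.10", "64M"], "refcount_bits=16"))
print(parse_make_test_img_args(["--no-opts", "4G"], "compat=1.1"))
print(parse_make_test_img_args(["-ocompat=0.10", "64M"], ""))
```

The third call shows why 051 needed a space: a fused -ocompat=0.10 never matches "-o", so it bypasses the option merge and is passed through as a plain parameter.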
[PATCH v2 11/21] iotests: Replace IMGOPTS='' by --no-opts
Signed-off-by: Max Reitz Reviewed-by: Maxim Levitsky --- tests/qemu-iotests/071 | 4 ++-- tests/qemu-iotests/174 | 2 +- tests/qemu-iotests/178 | 4 ++-- tests/qemu-iotests/197 | 4 ++-- tests/qemu-iotests/215 | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071 index fab52b..4e31943244 100755 --- a/tests/qemu-iotests/071 +++ b/tests/qemu-iotests/071 @@ -58,7 +58,7 @@ echo echo "=== Testing blkverify through filename ===" echo -TEST_IMG="$TEST_IMG.base" IMGOPTS="" IMGFMT="raw" _make_test_img $IMG_SIZE |\ +TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\ _filter_imgfmt _make_test_img $IMG_SIZE $QEMU_IO -c "open -o driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base $TEST_IMG" \ @@ -73,7 +73,7 @@ echo echo "=== Testing blkverify through file blockref ===" echo -TEST_IMG="$TEST_IMG.base" IMGOPTS="" IMGFMT="raw" _make_test_img $IMG_SIZE |\ +TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\ _filter_imgfmt _make_test_img $IMG_SIZE $QEMU_IO -c "open -o driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base,file.test.driver=$IMGFMT,file.test.file.filename=$TEST_IMG" \ diff --git a/tests/qemu-iotests/174 b/tests/qemu-iotests/174 index 0a952a73fd..e2f14a38c6 100755 --- a/tests/qemu-iotests/174 +++ b/tests/qemu-iotests/174 @@ -40,7 +40,7 @@ _unsupported_fmt raw size=256K -IMGFMT=raw IMGKEYSECRET= IMGOPTS= _make_test_img $size | _filter_imgfmt +IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size | _filter_imgfmt echo echo "== reading wrong format should fail ==" diff --git a/tests/qemu-iotests/178 b/tests/qemu-iotests/178 index 21231cadd3..75b5e8f314 100755 --- a/tests/qemu-iotests/178 +++ b/tests/qemu-iotests/178 @@ -62,8 +62,8 @@ $QEMU_IMG measure -O foo "$TEST_IMG" # unknown image file format make_test_img_with_fmt() { # Shadow global variables within this function -local IMGFMT="$1" IMGOPTS="" -_make_test_img "$2" +local 
IMGFMT="$1" +_make_test_img --no-opts "$2" } qemu_io_with_fmt() { diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197 index 1d4f6786db..4d3d08ad6f 100755 --- a/tests/qemu-iotests/197 +++ b/tests/qemu-iotests/197 @@ -66,8 +66,8 @@ if [ "$IMGFMT" = "vpc" ]; then fi _make_test_img 4G $QEMU_IO -c "write -P 55 3G 1k" "$TEST_IMG" | _filter_qemu_io -IMGPROTO=file IMGFMT=qcow2 IMGOPTS= TEST_IMG_FILE="$TEST_WRAP" \ -_make_test_img -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create +IMGPROTO=file IMGFMT=qcow2 TEST_IMG_FILE="$TEST_WRAP" \ +_make_test_img --no-opts -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create $QEMU_IO -f qcow2 -c "write -z -u 1M 64k" "$TEST_WRAP" | _filter_qemu_io # Ensure that a read of two clusters, but where one is already allocated, diff --git a/tests/qemu-iotests/215 b/tests/qemu-iotests/215 index 2eb377d682..55a1874dcd 100755 --- a/tests/qemu-iotests/215 +++ b/tests/qemu-iotests/215 @@ -63,8 +63,8 @@ if [ "$IMGFMT" = "vpc" ]; then fi _make_test_img 4G $QEMU_IO -c "write -P 55 3G 1k" "$TEST_IMG" | _filter_qemu_io -IMGPROTO=file IMGFMT=qcow2 IMGOPTS= TEST_IMG_FILE="$TEST_WRAP" \ -_make_test_img -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create +IMGPROTO=file IMGFMT=qcow2 TEST_IMG_FILE="$TEST_WRAP" \ +_make_test_img --no-opts -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create $QEMU_IO -f qcow2 -c "write -z -u 1M 64k" "$TEST_WRAP" | _filter_qemu_io # Ensure that a read of two clusters, but where one is already allocated, -- 2.21.0
[PATCH v2 15/21] iotests: Avoid cp/mv of test images
This will not work with external data files, so try to get tests working without it as far as possible. Signed-off-by: Max Reitz Reviewed-by: Maxim Levitsky --- tests/qemu-iotests/063 | 12 tests/qemu-iotests/063.out | 3 ++- tests/qemu-iotests/085 | 9 +++-- tests/qemu-iotests/085.out | 8 4 files changed, 13 insertions(+), 19 deletions(-) diff --git a/tests/qemu-iotests/063 b/tests/qemu-iotests/063 index eef2b8a534..c750b3806e 100755 --- a/tests/qemu-iotests/063 +++ b/tests/qemu-iotests/063 @@ -51,15 +51,13 @@ _unsupported_imgopts "subformat=monolithicFlat" \ _make_test_img 4M echo "== Testing conversion with -n fails with no target file ==" -# check .orig file does not exist -rm -f "$TEST_IMG.orig" if $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n "$TEST_IMG" "$TEST_IMG.orig" >/dev/null 2>&1; then exit 1 fi echo "== Testing conversion with -n succeeds with a target file ==" -rm -f "$TEST_IMG.orig" -cp "$TEST_IMG" "$TEST_IMG.orig" +_rm_test_img "$TEST_IMG.orig" +TEST_IMG="$TEST_IMG.orig" _make_test_img 4M if ! 
$QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n "$TEST_IMG" "$TEST_IMG.orig" ; then exit 1 fi @@ -85,10 +83,8 @@ fi _check_test_img echo "== Testing conversion to a smaller file fails ==" -rm -f "$TEST_IMG.orig" -mv "$TEST_IMG" "$TEST_IMG.orig" -_make_test_img 2M -if $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n "$TEST_IMG.orig" "$TEST_IMG" >/dev/null 2>&1; then +TEST_IMG="$TEST_IMG.target" _make_test_img 2M +if $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n "$TEST_IMG" "$TEST_IMG.target" >/dev/null 2>&1; then exit 1 fi diff --git a/tests/qemu-iotests/063.out b/tests/qemu-iotests/063.out index 7b691b2c9e..890b719bf0 100644 --- a/tests/qemu-iotests/063.out +++ b/tests/qemu-iotests/063.out @@ -2,11 +2,12 @@ QA output created by 063 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304 == Testing conversion with -n fails with no target file == == Testing conversion with -n succeeds with a target file == +Formatting 'TEST_DIR/t.IMGFMT.orig', fmt=IMGFMT size=4194304 == Testing conversion to raw is the same after conversion with -n == == Testing conversion back to original format == No errors were found on the image. 
== Testing conversion to a smaller file fails == -Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152 +Formatting 'TEST_DIR/t.IMGFMT.target', fmt=IMGFMT size=2097152 == Regression testing for copy offloading bug == Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 Formatting 'TEST_DIR/t.IMGFMT.target', fmt=IMGFMT size=1048576 diff --git a/tests/qemu-iotests/085 b/tests/qemu-iotests/085 index bbea1252d2..46981dbb64 100755 --- a/tests/qemu-iotests/085 +++ b/tests/qemu-iotests/085 @@ -105,8 +105,7 @@ add_snapshot_image() { base_image="${TEST_DIR}/$((${1}-1))-${snapshot_virt0}" snapshot_file="${TEST_DIR}/${1}-${snapshot_virt0}" -_make_test_img -u -b "${base_image}" "$size" -mv "${TEST_IMG}" "${snapshot_file}" +TEST_IMG=$snapshot_file _make_test_img -u -b "${base_image}" "$size" do_blockdev_add "$1" "'backing': null, " "${snapshot_file}" } @@ -122,10 +121,8 @@ blockdev_snapshot() size=128M -_make_test_img $size -mv "${TEST_IMG}" "${TEST_IMG}.1" -_make_test_img $size -mv "${TEST_IMG}" "${TEST_IMG}.2" +TEST_IMG="$TEST_IMG.1" _make_test_img $size +TEST_IMG="$TEST_IMG.2" _make_test_img $size echo echo === Running QEMU === diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out index 2a5f256cd3..313198f182 100644 --- a/tests/qemu-iotests/085.out +++ b/tests/qemu-iotests/085.out @@ -1,6 +1,6 @@ QA output created by 085 -Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 -Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +Formatting 'TEST_DIR/t.IMGFMT.1', fmt=IMGFMT size=134217728 +Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=134217728 === Running QEMU === @@ -55,10 +55,10 @@ Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_fil === Create a couple of snapshots using blockdev-snapshot === -Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/10-snapshot-v0.IMGFMT +Formatting 'TEST_DIR/11-snapshot-v0.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/10-snapshot-v0.IMGFMT 
{"return": {}} {"return": {}} -Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/11-snapshot-v0.IMGFMT +Formatting 'TEST_DIR/12-snapshot-v0.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/11-snapshot-v0.IMGFMT {"return": {}} {"return": {}} -- 2.21.0
[PATCH v2 09/21] iotests: Inject space into -ocompat=0.10 in 051
It did not matter before, but now that _make_test_img understands -o, we should use it properly here. Signed-off-by: Max Reitz Reviewed-by: Maxim Levitsky --- tests/qemu-iotests/051 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051 index 53bcdbc911..9cd1d60d45 100755 --- a/tests/qemu-iotests/051 +++ b/tests/qemu-iotests/051 @@ -157,7 +157,7 @@ echo echo === With version 2 images enabling lazy refcounts must fail === echo -_make_test_img -ocompat=0.10 $size +_make_test_img -o compat=0.10 $size run_qemu -drive file="$TEST_IMG",format=qcow2,lazy-refcounts=on run_qemu -drive file="$TEST_IMG",format=qcow2,lazy-refcounts=off -- 2.21.0
Re: [PATCH] blockdev: Use error_report() in hmp_commit()
On 10/15/19 2:39 PM, Kevin Wolf wrote: Instead of using monitor_printf() to report errors, hmp_commit() should use error_report() like other places do. Signed-off-by: Kevin Wolf --- blockdev.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/blockdev.c b/blockdev.c index f89e48fc79..e2358966c3 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1088,11 +1088,11 @@ void hmp_commit(Monitor *mon, const QDict *qdict) blk = blk_by_name(device); if (!blk) { -monitor_printf(mon, "Device '%s' not found\n", device); +error_report("Device '%s' not found", device); return; } if (!blk_is_available(blk)) { -monitor_printf(mon, "Device '%s' has no medium\n", device); +error_report("Device '%s' has no medium", device); return; } @@ -1105,8 +1105,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict) aio_context_release(aio_context); } if (ret < 0) { -monitor_printf(mon, "'commit' error for '%s': %s\n", device, - strerror(-ret)); +error_report("'commit' error for '%s': %s", device, strerror(-ret)); } } Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v2 00/20] nvme: support NVMe v1.3d, SGLs and multiple namespaces
Patchew URL: https://patchew.org/QEMU/20191015103900.313928-1-...@irrelevant.dk/ Hi, This series seems to have some coding style problems. See output below for more information: Subject: [PATCH v2 00/20] nvme: support NVMe v1.3d, SGLs and multiple namespaces Type: series Message-id: 20191015103900.313928-1-...@irrelevant.dk === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Switched to a new branch 'test' c68f7e0 nvme: handle dma errors 855f2b8 nvme: make lba data size configurable 68fc575 nvme: remove redundant NvmeCmd pointer parameter eb585d1 nvme: bump controller pci device id 227280c nvme: support multiple namespaces ccc877b nvme: add support for scatter gather lists 76d6fe6 nvme: allow multiple aios per command 73227cb nvme: refactor prp mapping df5fd9f nvme: bump supported specification version to 1.3 c85c0ff nvme: add missing mandatory features 1188552 nvme: add logging to error information log page 714808c nvme: add support for the asynchronous event request command 88bdfce nvme: add support for the get log page command 7716649 nvme: refactor device realization 7d2d51e nvme: add support for the abort command 4ec0e81 nvme: allow completion queues in the cmb 68f00db nvme: populate the mandatory subnqn and ver fields f08d66a nvme: add missing fields in the identify controller data structure 315a6eb nvme: move device parameters to separate struct b94cf4a nvme: remove superfluous breaks === OUTPUT BEGIN === 1/20 Checking commit b94cf4aea07b (nvme: remove superfluous breaks) 2/20 Checking commit 315a6eb1f09f (nvme: move device parameters to separate struct) ERROR: Macros with complex values should be enclosed in parenthesis #177: FILE: hw/block/nvme.h:6: +#define DEFINE_NVME_PROPERTIES(_state, _props) \ +DEFINE_PROP_STRING("serial", _state, 
_props.serial), \ +DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \ +DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64) total: 1 errors, 0 warnings, 181 lines checked Patch 2/20 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 3/20 Checking commit f08d66aa761b (nvme: add missing fields in the identify controller data structure) 4/20 Checking commit 68f00db57e87 (nvme: populate the mandatory subnqn and ver fields) 5/20 Checking commit 4ec0e81a8ca5 (nvme: allow completion queues in the cmb) 6/20 Checking commit 7d2d51e5da89 (nvme: add support for the abort command) 7/20 Checking commit 7716649c3d6d (nvme: refactor device realization) 8/20 Checking commit 88bdfce1a599 (nvme: add support for the get log page command) 9/20 Checking commit 714808cd3ef8 (nvme: add support for the asynchronous event request command) 10/20 Checking commit 11885522fa87 (nvme: add logging to error information log page) 11/20 Checking commit c85c0ff5ea35 (nvme: add missing mandatory features) 12/20 Checking commit df5fd9f283a4 (nvme: bump supported specification version to 1.3) 13/20 Checking commit 73227cb3c83c (nvme: refactor prp mapping) 14/20 Checking commit 76d6fe6ea1cf (nvme: allow multiple aios per command) 15/20 Checking commit ccc877b6f72b (nvme: add support for scatter gather lists) 16/20 Checking commit 227280c8d08c (nvme: support multiple namespaces) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #42: new file mode 100644 total: 0 errors, 1 warnings, 801 lines checked Patch 16/20 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
17/20 Checking commit eb585d1231e3 (nvme: bump controller pci device id) 18/20 Checking commit 68fc575b3fc7 (nvme: remove redundant NvmeCmd pointer parameter) 19/20 Checking commit 855f2b86dd6c (nvme: make lba data size configurable) 20/20 Checking commit c68f7e0d0c55 (nvme: handle dma errors) WARNING: line over 80 characters #77: FILE: hw/block/nvme.c:257: +if (nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans)) { WARNING: line over 80 characters #103: FILE: hw/block/nvme.c:428: +if (nvme_addr_read(n, addr, segment, nsgld * sizeof(NvmeSglDescriptor))) { total: 0 errors, 2 warnings, 148 lines checked Patch 20/20 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20191015103900.313928-1-...@irrelevant.dk/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH v2 6/6] tests/qemu-iotests: add case for block-stream compress
On 03/10/2019 17:58, Vladimir Sementsov-Ogievskiy wrote: > 02.10.2019 17:22, Andrey Shinkevich wrote: >> Add a test case to the iotest #030 that checks 'compress' option for a >> block-stream job. >> >> Signed-off-by: Andrey Shinkevich >> --- >>tests/qemu-iotests/030 | 49 >> +- >>tests/qemu-iotests/030.out | 4 ++-- >>2 files changed, 50 insertions(+), 3 deletions(-) >> >> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 >> index f3766f2..13fe5a2 100755 >> --- a/tests/qemu-iotests/030 >> +++ b/tests/qemu-iotests/030 >> @@ -21,7 +21,8 @@ >>import time >>import os >>import iotests >> -from iotests import qemu_img, qemu_io >> +from iotests import qemu_img, qemu_io, qemu_img_pipe >> +import json >> >>backing_img = os.path.join(iotests.test_dir, 'backing.img') >>mid_img = os.path.join(iotests.test_dir, 'mid.img') >> @@ -956,6 +957,52 @@ class TestSetSpeed(iotests.QMPTestCase): >> >>self.cancel_and_wait(resume=True) >> >> +class TestCompressed(iotests.QMPTestCase): >> + >> +def setUp(self): >> +qemu_img('create', '-f', iotests.imgfmt, backing_img, '1M') >> +qemu_img('create', '-f', iotests.imgfmt, '-o', >> + 'backing_file=%s' % backing_img, mid_img) >> +qemu_img('create', '-f', iotests.imgfmt, '-o', >> + 'backing_file=%s' % mid_img, test_img) >> +qemu_io('-c', 'write -P 0x1 0 512k', backing_img) >> +self.vm = iotests.VM().add_drive(test_img, "backing.node-name=mid," >> + >> + "backing.backing.node-name=base") >> +self.vm.launch() > > Why you can't just add a test-case to TestSingleDrive class? Their setUp() functions differ. 
> >> + >> +def tearDown(self): >> +self.vm.shutdown() >> +os.remove(test_img) >> +os.remove(mid_img) >> +os.remove(backing_img) >> + >> +def test_stream_compress(self): >> +self.assert_no_active_block_jobs() >> + >> +result = self.vm.qmp('block-stream', device='mid', >> job_id='stream-mid') >> +self.assert_qmp(result, 'return', {}) >> + >> +self.wait_until_completed(drive='stream-mid') >> +for event in self.vm.get_qmp_events(wait=True): >> +if event['event'] == 'BLOCK_JOB_COMPLETED': >> +self.dictpath(event, 'data/device') >> +self.assert_qmp_absent(event, 'data/error') > > COMPLETED event is for sure already waited by wait_until_completed > >> + >> +result = self.vm.qmp('block-stream', device='drive0', base=mid_img, >> + job_id='stream-top', compress=True) >> +self.assert_qmp(result, 'return', {}) >> + >> +self.wait_until_completed(drive='stream-top') >> +self.assert_no_active_block_jobs() > > this assertion is done in wait_until_completed > >> +self.vm.shutdown() >> + >> +top = json.loads(qemu_img_pipe('info', '--output=json', test_img)) >> +mid = json.loads(qemu_img_pipe('info', '--output=json', mid_img)) >> +base = json.loads(qemu_img_pipe('info', '--output=json', >> backing_img)) >> + >> +self.assertEqual(mid['actual-size'], base['actual-size']) >> +self.assertLess(top['actual-size'], mid['actual-size']) >> + >>if __name__ == '__main__': >>iotests.main(supported_fmts=['qcow2', 'qed'], >> supported_protocols=['file']) >> diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out >> index 6d9bee1..af8dac1 100644 >> --- a/tests/qemu-iotests/030.out >> +++ b/tests/qemu-iotests/030.out >> @@ -1,5 +1,5 @@ >> -... >> + >>-- >> -Ran 27 tests >> +Ran 28 tests >> >>OK >> > > -- With the best regards, Andrey Shinkevich
[PATCH v2 1/3] iotests: Fix 173
This test has been broken since 3.0. It used TEST_IMG to influence the name of a file created during _make_test_img, but commit 655ae6bb changed things so that the wrong file name is being created, which then caused _launch_qemu to fail. In the meantime, the set of events issued for the actions of the test has increased.

Why haven't we noticed the failure? Because the test rarely gets run:
'./check -qcow2 173' is insufficient (that defaults to using file protocol)
'./check -nfs 173' is insufficient (that defaults to using raw format)
so the test is only run with:
./check -qcow2 -nfs 173

Note that we already have a number of other problems with -nfs:
./check -nfs (fails 18/30)
./check -qcow2 -nfs (fails 45/76 after this patch)
and it's not on my priority list to fix those. Rather, I found this because of my next patch's work on tests using _send_qemu_cmd.

Fixes: 655ae6b
Signed-off-by: Eric Blake
---
 tests/qemu-iotests/173     | 4 ++--
 tests/qemu-iotests/173.out | 6 +++++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
index 9e2fa2e73cb9..29dcaa1960df 100755
--- a/tests/qemu-iotests/173
+++ b/tests/qemu-iotests/173
@@ -47,9 +47,9 @@ size=100M
 BASE_IMG="${TEST_DIR}/image.base"
 TOP_IMG="${TEST_DIR}/image.snp1"
 
-TEST_IMG="${BASE_IMG}" _make_test_img $size
+TEST_IMG_FILE="${BASE_IMG}" _make_test_img $size
 
-TEST_IMG="${TOP_IMG}" _make_test_img $size
+TEST_IMG_FILE="${TOP_IMG}" _make_test_img $size
 
 echo
 echo === Running QEMU, using block-stream to find backing image ===
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
index f477a0099a32..e83d17ec2f64 100644
--- a/tests/qemu-iotests/173.out
+++ b/tests/qemu-iotests/173.out
@@ -7,6 +7,10 @@ Formatting 'TEST_DIR/image.snp1', fmt=IMGFMT size=104857600
 {"return": {}}
 {"return": {}}
 {"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk2"}}
 {"return": {}}
-{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk2", "len": 104857600, "offset": 104857600, "speed": 0, "type": "stream"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "disk2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "disk2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk2", "len": 0, "offset": 0, "speed": 0, "type": "stream"}}
 *** done
-- 
2.21.0
Re: [PULL 0/2] Tracing patches
On 10/15/19 2:24 PM, Peter Maydell wrote: On Mon, 14 Oct 2019 at 09:57, Stefan Hajnoczi wrote: The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d: Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2019-10-08 16:08:35 +0100) are available in the Git repository at: https://github.com/stefanha/qemu.git tags/tracing-pull-request for you to fetch changes up to a1f4fc951a277c49a25418cafb028ec5529707fa: trace: avoid "is" with a literal Python 3.8 warnings (2019-10-14 09:54:46 +0100) Pull request Stefan Hajnoczi (2): trace: add --group=all to tracing.txt trace: avoid "is" with a literal Python 3.8 warnings Applied, thanks. Buh, v2 missed :(
Re: [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
On 10/15/19 10:23 AM, Alberto Garcia wrote: Now that the implementation of subclusters is complete we can finally add the necessary options to create and read images with this feature, which we call "extended L2 entries". Signed-off-by: Alberto Garcia --- +++ b/qapi/block-core.json @@ -85,6 +85,7 @@ 'compat': 'str', '*data-file': 'str', '*data-file-raw': 'bool', + '*extended-l2': 'bool', '*lazy-refcounts': 'bool', '*corrupt': 'bool', 'refcount-bits': 'int', Missing documentation for the new member. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2
On 10/15/19 10:23 AM, Alberto Garcia wrote:
> Hi,
>
> this series adds a new feature to the qcow2 on-disk format called
> "Extended L2 Entries", which allows us to do subcluster allocation.
>
> This cover letter explains the reasons behind this proposal, the
> changes to the on-disk format, test results and pending work. If you
> are curious you can also have a look at previous discussions about
> this feature:
>
> === Changes to the on-disk format ===
>
> An L2 entry is 64 bits wide, with this format (for uncompressed clusters):
>
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> **<----> <--------------------------------------------------><------->*
>   Rsrved             host cluster offset of data              Reserved
> (6 bits)                      (47 bits)                       (8 bits)
>
> bit 63: refcount == 1 (QCOW_OFLAG_COPIED)
> bit 62: compressed = 1 (QCOW_OFLAG_COMPRESSED)
> bit 0: all zeros (QCOW_OFLAG_ZERO)
>
> If Extended L2 Entries are enabled, bit 0 becomes reserved and must
> be unset, and this 64-bit bitmap follows the entry:
>
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> <---------------------------------> <--------------------------------->
>      subcluster reads as zeros            subcluster is allocated
>              (32 bits)                           (32 bits)

I like the grouping - you can then do a 4-byte read and comparison to 0 to see if the entire cluster reads as zeroes or is unallocated.

With 32k clusters, this results in subclusters of 1k each. In cluster 1 (offset 32k), which bits map where? (The obvious choices are that sub-cluster 32k maps to bit 0, 33k maps to bit 1, ...; or that sub-cluster 32k maps to bit 31, 33k maps to bit 30, ...)

/me reads ahead

Okay, in patch 5, you said you map the most significant bit to the first subcluster. That feels backwards to me; I wonder if the math is any easier if you map sub-clusters starting from the least-significant, because then you get:

bit = (address >> subcluster_bits) & 31

rather than

bit = 31 - ((address >> subcluster_bits) & 31)

> Some comments about the results:
>
> - The smallest allowed cluster size for an image with subclusters is
> 16 KB (in this case the subcluster size is 512 bytes), hence the
> missing values in the 4 KB and 8 KB rows.
Again reading ahead, I see that patch 5 requires a 16k minimum cluster for using extended L2. Could we still permit clusters smaller than that, but merely document that subclusters are always a minimum of 512 bytes and therefore for an 8k cluster we only use 16 bits (leaving the other 16 bits zero)? But I'm also fine with the simplicity of just stating that subclusters require at least 16k clusters. === To do === A couple of things are missing from this series: - The ability to efficiently zero individual subclusters using qcow2_co_pwrite_zeroes(). At the moment only full clusters can be zeroed with this method. - Alternatively we could get rid of the individual "all zeroes" bits altogether and have 64 subclusters per cluster. We would still have the QCOW_OFLAG_ZERO bit in the standard cluster descriptor. I think you've got more flexibility with the two bits per sub-cluster than you would with just 1 bit and 64 subclusters, so I don't think this direction is going to get us far. - The number of subclusters per cluster is always 32. It would be trivial to allow configuring this, but I don't see any use case. Agreed. - Tests: I have a few written that I'll add in future revisions of this series. - handle_alloc_space() works at the subclusters level. That is, if you have an unallocated 2MB cluster with 64KB subclusters, no backing image and you write 4KB of data, QEMU won't write zeroes to the affected subcluster(s) and will use handle_alloc_space() instead. The other subclusters won't be touched and will remain unallocated. This behavior is consistent with how subclusters work and saves disk space, but offers slightly lower performance (see test results above). Theoretically we could offer a setting to configure this, but I'm not convinced that this is very useful. === As usual, feedback is welcome, Looks promising! How do subclusters interact with external data files? -- Eric Blake, Principal Software Engineer Red Hat, Inc. 
+1-919-301-3226 Virtualization: qemu.org | libvirt.org
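To make the two bit orderings concrete, here is a small C sketch with hypothetical helper names, assuming 32k clusters split into 32 subclusters of 1k each (so subcluster_bits is taken to be 10). The index of a subcluster within its cluster is the subcluster number masked with 31; the two functions then show where that index lands in the 32-bit "allocated" half under each ordering:

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..31) of the subcluster containing guest_offset, assuming
 * 32 subclusters per cluster of (1 << subcluster_bits) bytes each. */
static unsigned sc_index(uint64_t guest_offset, unsigned subcluster_bits)
{
    return (unsigned)((guest_offset >> subcluster_bits) & 31);
}

/* Ordering used by the series: subcluster 0 is the most significant
 * bit of the 32-bit "allocated" half. */
static unsigned alloc_bit_msb_first(unsigned sc)
{
    assert(sc < 32);
    return 31 - sc;
}

/* Ordering suggested in the review: subcluster 0 is bit 0. */
static unsigned alloc_bit_lsb_first(unsigned sc)
{
    assert(sc < 32);
    return sc;
}
```

For guest offset 33k this gives subcluster 1, which is bit 30 with the most-significant-bit-first layout and bit 1 with the least-significant-bit-first one.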
[RFC PATCH 00/23] Add subcluster allocation to qcow2
Hi,

this series adds a new feature to the qcow2 on-disk format called "Extended L2 Entries", which allows us to do subcluster allocation.

This cover letter explains the reasons behind this proposal, the changes to the on-disk format, test results and pending work. If you are curious you can also have a look at previous discussions about this feature:

https://lists.gnu.org/archive/html/qemu-block/2017-04/msg00178.html
https://lists.gnu.org/archive/html/qemu-block/2019-06/msg01155.html

This is the first proper version of the patches, and I believe that the implementation is complete. However since I'm proposing a change to the on-disk format I'm labeling this as RFC because I'm expecting some debate. I'll remove the RFC tag and add more tests in future revisions.

=== Problem ===

A qcow2 image is divided into units of constant size called clusters, and among other things it contains metadata that maps guest addresses to host addresses (the so-called L1 and L2 tables).

There are two basic problems that result from this:

1) Reading from or writing to a qcow2 image involves reading the corresponding entry on the L2 table that maps the guest address to the host address. This is very slow because it involves two I/O operations: one on the L2 table and the other one on the actual data cluster.

2) A cluster is the smallest unit of allocation. Therefore writing a mere 512 bytes to an empty disk requires allocating a complete cluster and filling it with zeroes (or with data from the backing image if there is one). This wastes more disk space and also has a negative impact on I/O.

Problem (1) can be solved by caching the L2 tables in memory. The maximum amount of disk space used by L2 tables depends on the virtual disk size and the cluster size:

   max_l2_size = virtual_disk_size * 8 / cluster_size

Because of this, the only way to reduce the size of the L2 tables is by increasing the cluster size (which can be any power of two between 512 bytes and 2 MB).
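As a worked example of the formula, with arbitrarily chosen sizes: for a 1 TB (2^40 bytes) virtual disk, 64 KB clusters (the qcow2 default) allow up to 128 MB of L2 tables, while 2 MB clusters bring that down to 4 MB. The helper name below is hypothetical; it only evaluates the formula:

```c
#include <assert.h>
#include <stdint.h>

/* Maximum L2 table size from the formula above: one 8-byte L2 entry
 * per cluster of the virtual disk. */
static uint64_t max_l2_size(uint64_t virtual_disk_size, uint64_t cluster_size)
{
    /* qcow2 cluster sizes are powers of two between 512 bytes and 2 MB. */
    assert(cluster_size >= 512 && cluster_size <= (2ULL << 20));
    assert((cluster_size & (cluster_size - 1)) == 0);
    return virtual_disk_size * 8 / cluster_size;
}
```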
But then we hit problem (2): I/O is slower and more disk space is wasted.

=== The proposal ===

The proposal is to extend the qcow2 format by allowing subcluster allocation. The on-disk format remains essentially the same, except that each data cluster is internally divided into 32 subclusters of equal size.

In practice this is done with a new optional feature called "Extended L2 Entries", which must be enabled when an image is created. With this, each entry on an L2 table is accompanied by a bitmap indicating the allocation state of each one of the subclusters for that cluster. The size of an L2 entry doubles from 64 to 128 bits.

Other than L2 entries, all other data structures remain unchanged, but for data clusters the smallest unit of allocation is now the subcluster. Reference counting is still at the cluster level, because there is no way to reference individual subclusters. Copy-on-write on internal snapshots needs to copy complete clusters, so that scenario would not benefit from this change.

I see two main use cases for this feature:

a) The qcow2 image is not too large / the L2 cache is not a problem, but you want to increase the allocation performance. In this case you can have a 128KB cluster with 4KB subclusters (with 4KB being a common block size in ext4 and other filesystems).

b) The qcow2 image is very large and you want to save metadata space in order to have a smaller L2 cache. In this case you can go for the maximum cluster size (2MB) but you want to have smaller subclusters to increase the allocation performance and optimize the disk usage.
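The two use cases can be compared with a little arithmetic. This sketch (hypothetical helper names) derives the subcluster size from the fixed count of 32 and the maximum L2 metadata with 8-byte standard entries versus 16-byte extended entries, again for an arbitrary 1 TB disk:

```c
#include <assert.h>
#include <stdint.h>

#define SUBCLUSTERS_PER_CLUSTER 32

/* Each data cluster is split into 32 subclusters of equal size. */
static uint64_t subcluster_size(uint64_t cluster_size)
{
    assert(cluster_size % SUBCLUSTERS_PER_CLUSTER == 0);
    return cluster_size / SUBCLUSTERS_PER_CLUSTER;
}

/* Maximum L2 metadata: one entry per cluster, 64 bits (8 bytes) each,
 * or 128 bits (16 bytes) each with Extended L2 Entries. */
static uint64_t max_l2_size_ext(uint64_t virtual_disk_size,
                                uint64_t cluster_size, int extended_l2)
{
    uint64_t entry_bytes = extended_l2 ? 16 : 8;
    return virtual_disk_size / cluster_size * entry_bytes;
}
```

Under these assumptions, case (a), 128KB clusters with 4KB subclusters, needs the same 128MB of L2 metadata as a plain 64KB-cluster image, while case (b), 2MB clusters with 64KB subclusters, needs only 8MB.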
=== Changes to the on-disk format ===

An L2 entry is 64 bits wide, with this format (for uncompressed clusters):

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
**<----> <--------------------------------------------------><------->*
  Rsrved             host cluster offset of data              Reserved
(6 bits)                      (47 bits)                       (8 bits)

bit 63: refcount == 1 (QCOW_OFLAG_COPIED)
bit 62: compressed = 1 (QCOW_OFLAG_COMPRESSED)
bit 0: all zeros (QCOW_OFLAG_ZERO)

If Extended L2 Entries are enabled, bit 0 becomes reserved and must be unset, and this 64-bit bitmap follows the entry:

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
<---------------------------------> <--------------------------------->
     subcluster reads as zeros            subcluster is allocated
             (32 bits)                           (32 bits)

All this applies to uncompressed clusters. Compressed clusters are not divided into subclusters, the cluster descriptor remains exactly the same, and the 64-bit bitmap is not used (i.e. all bits are always 0).

=== Test results ===

I made all tests on an SSD drive,
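A minimal C sketch of decoding a standard (uncompressed) L2 entry and splitting the extended bitmap, following the bit positions in the layout described above: host offset in bits 9..55, flags in bits 63, 62 and 0, "reads as zeros" in the upper half of the bitmap and "allocated" in the lower half. The struct, function and macro names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L2_FLAG_COPIED      (1ULL << 63)            /* refcount == 1 */
#define L2_FLAG_COMPRESSED  (1ULL << 62)
#define L2_FLAG_ZERO        (1ULL << 0)             /* reserved with extended L2 */
#define L2_OFFSET_MASK      0x00fffffffffffe00ULL   /* bits 9..55 */

struct l2_fields {
    uint64_t host_offset;  /* only meaningful for uncompressed clusters */
    bool copied;
    bool compressed;
    bool zero;
};

static struct l2_fields decode_l2_entry(uint64_t entry)
{
    struct l2_fields f;
    f.host_offset = entry & L2_OFFSET_MASK;
    f.copied = (entry & L2_FLAG_COPIED) != 0;
    f.compressed = (entry & L2_FLAG_COMPRESSED) != 0;
    f.zero = (entry & L2_FLAG_ZERO) != 0;
    return f;
}

/* Upper 32 bits: "subcluster reads as zeros"; lower 32: "allocated". */
static uint32_t bitmap_zero_half(uint64_t bitmap)
{
    return (uint32_t)(bitmap >> 32);
}

static uint32_t bitmap_alloc_half(uint64_t bitmap)
{
    return (uint32_t)bitmap;
}
```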
[RFC PATCH 21/23] qcow2: Add subcluster support to handle_alloc_space()
The bdrv_co_pwrite_zeroes() call here fills complete clusters with zeroes, but it can happen that some subclusters are not part of the write request or the copy-on-write. This patch makes sure that only the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other subclusters if we can guarantee that we are not overwriting existing data. However this would waste more disk space, so we should first evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia
---
 block/qcow2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c222cd261d..c54278ab0b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2194,6 +2194,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
+        uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+        uint64_t nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+            m->cow_start.offset;
 
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
             continue;
@@ -2208,16 +2211,14 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
          * efficiently zero out the whole clusters
          */
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-                                            m->nb_clusters * s->cluster_size,
+        ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
                                             true);
         if (ret < 0) {
             return ret;
         }
 
         BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-        ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-                                    m->nb_clusters * s->cluster_size,
+        ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
                                     BDRV_REQ_NO_FALLBACK);
         if (ret < 0) {
             if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1
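The new start_offset/nb_bytes arithmetic can be checked in isolation. In this sketch the structs are simplified, hypothetical stand-ins for the cow_start/cow_end fields of QCowL2Meta, not QEMU's types. The example numbers are also hypothetical: a 4KB write at offset 0x31000 into a newly allocated 2MB cluster at host offset 0x200000 with 64KB subclusters, where the COW regions cover only the rest of subcluster #3:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a COW region of QCowL2Meta. */
struct cow_region {
    uint64_t offset;   /* relative to the start of the allocation */
    uint64_t nb_bytes;
};

struct zero_range {
    uint64_t start;
    uint64_t nb_bytes;
};

/* Range zeroed by the patched handle_alloc_space(): from the start of the
 * head COW region to the end of the tail COW region, instead of
 * nb_clusters * cluster_size bytes starting at alloc_offset. */
static struct zero_range alloc_zero_range(uint64_t alloc_offset,
                                          struct cow_region cow_start,
                                          struct cow_region cow_end)
{
    struct zero_range r;
    assert(cow_end.offset + cow_end.nb_bytes >= cow_start.offset);
    r.start = alloc_offset + cow_start.offset;
    r.nb_bytes = cow_end.offset + cow_end.nb_bytes - cow_start.offset;
    return r;
}
```

With those numbers only 64KB (one subcluster) starting at host offset 0x230000 is zeroed, rather than the whole 2MB cluster.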
[RFC PATCH 14/23] qcow2: Add subcluster support to qcow2_get_cluster_offset()
The logic of this function remains pretty much the same, except that it uses count_contiguous_subclusters(), which combines the logic of count_contiguous_clusters() / count_contiguous_clusters_unallocated() and checks individual subclusters. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 111 -- 1 file changed, 52 insertions(+), 59 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 8df0f67316..71d4cc518a 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -372,66 +372,51 @@ fail: } /* - * Checks how many clusters in a given L2 slice are contiguous in the image - * file. As soon as one of the flags in the bitmask stop_flags changes compared - * to the first cluster, the search is stopped and the cluster is not counted - * as contiguous. (This allows it, for example, to stop at the first compressed - * cluster which may require a different handling) + * Return the number of contiguous subclusters of the exact same type + * in a given L2 slice, starting from cluster @l2_index, subcluster + * @sc_index. At most @nb_clusters are checked. Allocated clusters are + * also required to be contiguous in the image file. 
*/ -static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters, -int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags) +static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters, +unsigned sc_index, uint64_t *l2_slice, +int l2_index) { BDRVQcow2State *s = bs->opaque; -int i; -QCow2ClusterType first_cluster_type; -uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED; -uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index); -uint64_t offset = first_entry & mask; - -first_cluster_type = qcow2_get_cluster_type(bs, first_entry); -if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) { -return 0; +int i, j, count = 0; +uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index); +uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index); +uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK; +bool check_offset = true; +QCow2ClusterType type = +qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index); + +assert(type != QCOW2_CLUSTER_INVALID); /* The caller should check this */ + +if (type == QCOW2_CLUSTER_COMPRESSED) { +return 1; /* Compressed clusters are always counted one by one */ } -/* must be allocated */ -assert(first_cluster_type == QCOW2_CLUSTER_NORMAL || - first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC); - -for (i = 0; i < nb_clusters; i++) { -uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask; -if (offset + (uint64_t) i * cluster_size != l2_entry) { -break; -} +if (type == QCOW2_CLUSTER_UNALLOCATED || type == QCOW2_CLUSTER_ZERO_PLAIN) { +check_offset = false; } -return i; -} - -/* - * Checks how many consecutive unallocated clusters in a given L2 - * slice have the same cluster type. 
- */ -static int count_contiguous_clusters_unallocated(BlockDriverState *bs, - int nb_clusters, - uint64_t *l2_slice, - int l2_index, - QCow2ClusterType wanted_type) -{ -BDRVQcow2State *s = bs->opaque; -int i; - -assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN || - wanted_type == QCOW2_CLUSTER_UNALLOCATED); for (i = 0; i < nb_clusters; i++) { -uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i); -QCow2ClusterType type = qcow2_get_cluster_type(bs, entry); - -if (type != wanted_type) { -break; +l2_entry = get_l2_entry(s, l2_slice, l2_index + i); +l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i); +if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) { +goto out; +} +for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) { +if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) { +goto out; +} +count++; } +expected_offset += s->cluster_size; } -return i; +out: +return count; } static int coroutine_fn do_perform_cow_read(BlockDriverState *bs, @@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset, unsigned int *bytes, uint64_t *cluster_offset) { BDRVQcow2State *s = bs->opaque; -unsigned int l2_index; -uint64_t l1_index, l2_offset, *l2_slice; +unsigned int l2_index, sc_index; +uint64_t l1_index, l2_offset, *l2_slice, l2_bitmap; int c;
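The loop structure of count_contiguous_subclusters() can be illustrated with a toy model. Everything here is hypothetical and simplified: each cluster in the toy slice is assumed to be allocated, and only the "allocated" bit of each subcluster (most significant bit = subcluster 0) decides its type. Because every toy cluster has a host offset, the contiguity check always applies; the real function skips it for unallocated and zero-plain clusters:

```c
#include <assert.h>
#include <stdint.h>

#define SC_PER_CLUSTER 32
#define OFFSET_MASK 0x00fffffffffffe00ULL   /* bits 9..55 */

/* Toy extended L2 entry: standard 64-bit entry plus 64-bit bitmap. */
struct ext_l2 {
    uint64_t entry;
    uint64_t bitmap;
};

enum toy_type { TOY_UNALLOCATED_SUBCLUSTER, TOY_NORMAL };

/* Simplified classifier: looks only at the "allocated" bit of sc. */
static enum toy_type classify_toy(const struct ext_l2 *e, unsigned sc)
{
    return (e->bitmap & ((1ULL << 31) >> sc)) ? TOY_NORMAL
                                              : TOY_UNALLOCATED_SUBCLUSTER;
}

/* Count subclusters of the starting type, beginning at subcluster
 * sc_index of slice[0]; stop at the first type change or when the next
 * cluster is not contiguous in the host file. */
static int toy_count_contiguous(const struct ext_l2 *slice, int nb_clusters,
                                unsigned sc_index, uint64_t cluster_size)
{
    enum toy_type type = classify_toy(&slice[0], sc_index);
    uint64_t expected_offset = slice[0].entry & OFFSET_MASK;
    int count = 0;

    for (int i = 0; i < nb_clusters; i++) {
        if (expected_offset != (slice[i].entry & OFFSET_MASK)) {
            return count;
        }
        for (unsigned j = (i == 0) ? sc_index : 0; j < SC_PER_CLUSTER; j++) {
            if (classify_toy(&slice[i], j) != type) {
                return count;
            }
            count++;
        }
        expected_offset += cluster_size;
    }
    return count;
}
```

For example, with two contiguous 64KB clusters where the first is fully allocated and the second has only its first 16 subclusters allocated, starting at subcluster 4 counts 28 + 16 = 44 subclusters.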
[RFC PATCH 13/23] qcow2: Add subcluster support to calculate_l2_meta()
If an image has subclusters then there are more copy-on-write scenarios that we need to consider. Let's say we have a write request from the middle of subcluster #3 until the end of the cluster: - If the cluster is new, then subclusters #0 to #3 from the old cluster must be copied into the new one. - If the cluster is new but the old cluster was unallocated, then only subcluster #3 needs copy-on-write. #0 to #2 are marked as unallocated in the bitmap of the new L2 entry. - If we are overwriting an old cluster and subcluster #3 is unallocated or has the all-zeroes bit set then we need copy-on-write on subcluster #3. - If we are overwriting an old cluster and subcluster #3 was allocated then there is no need to copy-on-write. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 136 +- 1 file changed, 108 insertions(+), 28 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 67f90e415d..8df0f67316 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1034,14 +1034,16 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m) * If @keep_old is true it means that the clusters were already * allocated and will be overwritten. If false then the clusters are * new and we have to decrease the reference count of the old ones. + * + * Returns 1 on success, -errno on failure. 
*/ -static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset, - uint64_t guest_offset, uint64_t bytes, - uint64_t *l2_slice, QCowL2Meta **m, bool keep_old) +static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset, + uint64_t guest_offset, uint64_t bytes, + uint64_t *l2_slice, QCowL2Meta **m, bool keep_old) { BDRVQcow2State *s = bs->opaque; -int l2_index = offset_to_l2_slice_index(s, guest_offset); -uint64_t l2_entry; +int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset); +uint64_t l2_entry, l2_bitmap; unsigned cow_start_from, cow_end_to; unsigned cow_start_to = offset_into_cluster(s, guest_offset); unsigned cow_end_from = cow_start_to + bytes; @@ -1049,38 +1051,108 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset, QCowL2Meta *old_m = *m; QCow2ClusterType type; -/* Return if there's no COW (all clusters are normal and we keep them) */ +/* Return if there's no COW (all subclusters are normal and we are + * keeping the clusters) */ if (keep_old) { +unsigned first_sc = cow_start_to / s->subcluster_size; +unsigned last_sc = (cow_end_from - 1) / s->subcluster_size; int i; -for (i = 0; i < nb_clusters; i++) { -l2_entry = get_l2_entry(s, l2_slice, l2_index + i); -if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) { +for (i = first_sc; i <= last_sc; i++) { +unsigned c = i / s->subclusters_per_cluster; +unsigned sc = i % s->subclusters_per_cluster; +l2_entry = get_l2_entry(s, l2_slice, l2_index + c); +l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c); +type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc); +if (type == QCOW2_CLUSTER_INVALID) { +l2_index += c; /* Point to the invalid entry */ +goto fail; +} +if (type != QCOW2_CLUSTER_NORMAL) { break; } } -if (i == nb_clusters) { -return; +if (i == last_sc + 1) { +return 1; } } /* Get the L2 entry from the first cluster */ l2_entry = get_l2_entry(s, l2_slice, l2_index); -type = qcow2_get_cluster_type(bs, l2_entry); +l2_bitmap = 
get_l2_bitmap(s, l2_slice, l2_index); +sc_index = offset_to_sc_index(s, guest_offset); +type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index); -if (type == QCOW2_CLUSTER_NORMAL && keep_old) { -cow_start_from = cow_start_to; +if (type == QCOW2_CLUSTER_INVALID) { +goto fail; +} + +if (!keep_old) { +switch (type) { +case QCOW2_CLUSTER_NORMAL: +case QCOW2_CLUSTER_COMPRESSED: +case QCOW2_CLUSTER_ZERO_ALLOC: +case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER: +cow_start_from = 0; +break; +case QCOW2_CLUSTER_ZERO_PLAIN: +case QCOW2_CLUSTER_UNALLOCATED: +cow_start_from = sc_index << s->subcluster_bits; +break; +default: +g_assert_not_reached(); +} } else { -cow_start_from = 0; +switch (type) { +case QCOW2_CLUSTER_NORMAL: +cow_start_from = cow_start_to; +break; +case QCOW2_CLUSTER_ZERO_ALLOC: +case
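The keep_old loop above iterates over "flattened" subcluster numbers that can run past the first cluster, then splits each number into a cluster (c) and a subcluster within it (sc). A sketch of that index arithmetic with hypothetical helper names, using the write from the commit message (middle of subcluster #3 to the end of the cluster) on a 64KB cluster, which gives 2KB subclusters:

```c
#include <assert.h>
#include <stdint.h>

struct sc_pos {
    unsigned cluster;     /* offset from the first cluster (c in the patch) */
    unsigned subcluster;  /* index inside that cluster (sc in the patch) */
};

/* First and last flattened subcluster touched by [cow_start_to, cow_end_from),
 * where cow_start_to is relative to the start of the first cluster. */
static void sc_range(unsigned cow_start_to, unsigned cow_end_from,
                     unsigned subcluster_size,
                     unsigned *first_sc, unsigned *last_sc)
{
    assert(cow_end_from > cow_start_to);
    *first_sc = cow_start_to / subcluster_size;
    *last_sc = (cow_end_from - 1) / subcluster_size;
}

static struct sc_pos sc_split(unsigned i, unsigned subclusters_per_cluster)
{
    struct sc_pos p = { i / subclusters_per_cluster,
                        i % subclusters_per_cluster };
    return p;
}
```

A write starting at 7168 (halfway through subcluster #3) and ending at 65536 touches subclusters 3..31; if it instead ended 4KB into the next cluster, the last flattened index would be 33, i.e. subcluster 1 of the second cluster.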
[RFC PATCH 17/23] qcow2: Add subcluster support to check_refcounts_l2()
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an image has subclusters. Instead, the individual 'all zeroes' bits must be used.

Signed-off-by: Alberto Garcia
---
 block/qcow2-refcount.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index a2c4d36378..3eda523e25 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1685,8 +1685,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                 int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                    QCOW2_OL_INACTIVE_L2;
 
-                l2_entry = QCOW_OFLAG_ZERO;
-                set_l2_entry(s, l2_table, i, l2_entry);
+                if (has_subclusters(s)) {
+                    set_l2_entry(s, l2_table, i, 0);
+                    set_l2_bitmap(s, l2_table, i,
+                                  QCOW_L2_BITMAP_ALL_ZEROES);
+                } else {
+                    set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+                }
                 ret = qcow2_pre_write_overlap_check(bs, ign,
                         l2e_offset, l2_entry_size(s), false);
                 if (ret < 0) {
-- 
2.20.1
[RFC PATCH 11/23] qcow2: Add qcow2_get_subcluster_type()
This function returns the type of an individual subcluster. If an
image does not have subclusters then this returns the exact same value
as qcow2_get_cluster_type().

The information in standard and extended L2 entries is encoded in a
slightly different way, but all existing QCow2ClusterType values are
also valid for subclusters and have the same meanings (although they
typically only apply to the requested subcluster). There are two
important exceptions to this:

a) QCOW2_CLUSTER_COMPRESSED means that the whole cluster is
   compressed. We do not support compression at the subcluster level.

b) QCOW2_CLUSTER_UNALLOCATED means that the cluster is unallocated,
   that is, the offset field of the L2 entry does not point to a host
   cluster. All subclusters are obviously unallocated too, but any of
   them could be of type QCOW2_CLUSTER_ZERO_PLAIN.

In addition to that, extended L2 entries allow one new scenario where
the cluster is normally allocated but an individual subcluster is not.
This is very different from (b), and because of that this patch adds a
new value called QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER.

Finally, this patch adds QCOW2_CLUSTER_INVALID to detect the cases
where an L2 entry has a value that violates the spec. The caller is
responsible for handling these situations. To prevent compatibility
problems with images that have invalid values but are currently being
read by QEMU without causing side effects, QCOW2_CLUSTER_INVALID is
only returned for images with extended L2 entries.
Signed-off-by: Alberto Garcia
---
 block/qcow2.h | 62 +++
 1 file changed, 62 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index d9fe883fe0..60e4bf963e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -74,6 +74,15 @@

 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32

+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)    ((1ULL << 63) >> (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   ((1ULL << 31) >> (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xFFFFFFFF00000000ULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0x00000000FFFFFFFFULL
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21

@@ -435,10 +444,12 @@ typedef struct QCowL2Meta

 typedef enum QCow2ClusterType {
     QCOW2_CLUSTER_UNALLOCATED,
+    QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER,
     QCOW2_CLUSTER_ZERO_PLAIN,
     QCOW2_CLUSTER_ZERO_ALLOC,
     QCOW2_CLUSTER_NORMAL,
     QCOW2_CLUSTER_COMPRESSED,
+    QCOW2_CLUSTER_INVALID,
 } QCow2ClusterType;

 typedef enum QCow2MetadataOverlap {
@@ -618,6 +629,57 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }

+/* In an image without subclusters this returns the same value as
+ * qcow2_get_cluster_type() */
+static inline int qcow2_get_subcluster_type(BlockDriverState *bs,
+                                            uint64_t l2_entry,
+                                            uint64_t l2_bitmap,
+                                            unsigned sc_index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+    assert(sc_index < s->subclusters_per_cluster);
+
+    if (has_subclusters(s)) {
+        bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+        bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            if (l2_bitmap != 0) {
+                return QCOW2_CLUSTER_INVALID;
+            }
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_CLUSTER_INVALID;
+        case QCOW2_CLUSTER_NORMAL:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_CLUSTER_NORMAL;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_ZERO_ALLOC;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_CLUSTER_INVALID;
+            }
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_UNALLOCATED;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_CLUSTER_INVALID;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_ZERO_PLAIN;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_CLUSTER_INVALID;
+            }
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return type;
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
--
2.20.1
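The decoding table is easier to follow in isolation. This self-contained sketch mirrors the has_subclusters() branch of qcow2_get_subcluster_type(). The DemoType enum and demo_subcluster_type() are illustrative stand-ins. Unlike the patch, the sketch returns DEMO_INVALID for unexpected input instead of calling g_assert_not_reached().

```c
#include <assert.h>
#include <stdint.h>

#define QCOW_OFLAG_SUB_ZERO(X)  ((1ULL << 63) >> (X))
#define QCOW_OFLAG_SUB_ALLOC(X) ((1ULL << 31) >> (X))

/* Local copy of the cluster types, in the same order as the patch */
typedef enum {
    DEMO_UNALLOCATED,
    DEMO_UNALLOCATED_SUBCLUSTER,
    DEMO_ZERO_PLAIN,
    DEMO_ZERO_ALLOC,
    DEMO_NORMAL,
    DEMO_COMPRESSED,
    DEMO_INVALID,
} DemoType;

/* Type of subcluster 'sc', given the type of the whole cluster and the
 * extended L2 bitmap (image with subclusters assumed). */
static DemoType demo_subcluster_type(DemoType cluster_type,
                                     uint64_t l2_bitmap, unsigned sc)
{
    int zero, alloc;

    assert(sc < 32);
    zero = (l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc)) != 0;
    alloc = (l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc)) != 0;

    switch (cluster_type) {
    case DEMO_COMPRESSED:
        /* compression is per cluster, so the bitmap must be empty */
        return l2_bitmap ? DEMO_INVALID : DEMO_COMPRESSED;
    case DEMO_ZERO_PLAIN:
    case DEMO_ZERO_ALLOC:
        /* QCOW_OFLAG_ZERO must not be used together with subclusters */
        return DEMO_INVALID;
    case DEMO_NORMAL:
        if (zero && alloc) {
            return DEMO_INVALID;
        }
        if (zero) {
            return DEMO_ZERO_ALLOC;
        }
        return alloc ? DEMO_NORMAL : DEMO_UNALLOCATED_SUBCLUSTER;
    case DEMO_UNALLOCATED:
        /* no host cluster, so no subcluster can be allocated */
        if (alloc) {
            return DEMO_INVALID;
        }
        return zero ? DEMO_ZERO_PLAIN : DEMO_UNALLOCATED;
    default:
        return DEMO_INVALID;
    }
}
```

The helper shows that QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER can only arise from an allocated (normal) cluster whose two bits for that subcluster are both clear.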
[RFC PATCH 07/23] qcow2: Add subcluster-related fields to BDRVQcow2State
This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: Number of bits such that
  1 << subcluster_bits == subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.

Signed-off-by: Alberto Garcia
---
 block/qcow2.c | 5 +
 block/qcow2.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4d16393e61..be9854c5ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1341,6 +1341,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         }
     }

+    s->subclusters_per_cluster =
+        has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
+    s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+    s->subcluster_bits = ctz32(s->subcluster_size);
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
diff --git a/block/qcow2.h b/block/qcow2.h
index 6d6fc57f41..e6486a2cf8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -72,6 +72,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)

+#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21

@@ -274,6 +276,9 @@ typedef struct BDRVQcow2State {
     int cluster_bits;
     int cluster_size;
     int l2_slice_size;
+    int subcluster_bits;
+    int subcluster_size;
+    int subclusters_per_cluster;
     int l2_bits;
     int l2_size;
     int l1_size;
--
2.20.1
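All three fields follow from the cluster size alone. The sketch below derives them using a hypothetical DemoSubclusterGeometry struct, with a portable demo_ctz32() standing in for QEMU's ctz32() from qemu/host-utils.h.

```c
#include <assert.h>
#include <stdbool.h>

#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32

/* Hypothetical mirror of the three new BDRVQcow2State fields */
typedef struct {
    int subclusters_per_cluster;
    int subcluster_size;
    int subcluster_bits;
} DemoSubclusterGeometry;

/* Count trailing zero bits of a nonzero value */
static int demo_ctz32(unsigned v)
{
    int n = 0;

    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

static DemoSubclusterGeometry demo_geometry(int cluster_size,
                                            bool has_subclusters)
{
    DemoSubclusterGeometry g;

    /* Images without subclusters behave as if they had exactly one */
    g.subclusters_per_cluster =
        has_subclusters ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
    g.subcluster_size = cluster_size / g.subclusters_per_cluster;
    g.subcluster_bits = demo_ctz32(g.subcluster_size);
    return g;
}
```

For 64 KiB clusters with subclusters, this yields 32 subclusters of 2048 bytes (subcluster_bits = 11). Without subclusters, it yields one 65536-byte subcluster (subcluster_bits = 16), matching cluster_bits.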
[RFC PATCH 08/23] qcow2: Add offset_to_sc_index()
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index e6486a2cf8..c450267c88 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -556,6 +556,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State *s, int64_t offset)
     return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }

+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+    return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
     return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
--
2.20.1
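Here is a standalone version of the same bit arithmetic, with the geometry passed in explicitly (the function name is hypothetical). With 2 KiB subclusters (subcluster_bits = 11) and 32 subclusters per cluster, offset 70536 (64 KiB + 5000) maps to subcluster 2 of its cluster, since 70536 >> 11 = 34 and 34 & 31 = 2.

```c
#include <assert.h>
#include <stdint.h>

/* Same computation as offset_to_sc_index(), without BDRVQcow2State */
static int demo_offset_to_sc_index(int64_t offset, int subcluster_bits,
                                   int subclusters_per_cluster)
{
    /* The mask trick requires a power of two (1 or 32 in practice) */
    assert((subclusters_per_cluster & (subclusters_per_cluster - 1)) == 0);
    return (offset >> subcluster_bits) & (subclusters_per_cluster - 1);
}
```

Without subclusters the mask is 0, so every offset maps to subcluster 0. That is consistent with treating such images as having a single subcluster per cluster.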
[RFC PATCH 18/23] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
Two changes are needed in order to add subcluster support to this
function: deallocated clusters must have their bitmaps cleared, and
expanded clusters must have all the "subcluster allocated" bits set.

Signed-off-by: Alberto Garcia
---
 block/qcow2-cluster.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index bf32447d18..dc72f0e595 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2033,6 +2033,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 /* not backed; therefore we can simply deallocate the
                  * cluster */
                 set_l2_entry(s, l2_slice, j, 0);
+                set_l2_bitmap(s, l2_slice, j, 0);
                 l2_dirty = true;
                 continue;
             }
@@ -2099,6 +2100,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             } else {
                 set_l2_entry(s, l2_slice, j, offset);
             }
+            set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
             l2_dirty = true;
         }

--
2.20.1
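The two changes can be modelled with a hypothetical in-memory DemoL2Entry; QEMU itself goes through set_l2_entry() and set_l2_bitmap() on an L2 slice. QCOW_L2_BITMAP_ALL_ALLOC is written out as the low 32 bits, derived from QCOW_OFLAG_SUB_ALLOC(X) for X in 0..31. The expanded entry value is passed in unchanged, because the flags it carries are decided by surrounding code not shown in this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* All "subcluster allocated" bits (bits 0..31) */
#define QCOW_L2_BITMAP_ALL_ALLOC 0x00000000FFFFFFFFULL

/* Hypothetical stand-in for one extended L2 entry */
typedef struct {
    uint64_t entry;
    uint64_t bitmap;
} DemoL2Entry;

/* Not backed: deallocate the cluster and clear its bitmap as well */
static void demo_deallocate(DemoL2Entry *e)
{
    e->entry = 0;
    e->bitmap = 0;
}

/* Expanded into a real host cluster: every subcluster is allocated and
 * none of them is marked as reading zeroes any more. */
static void demo_expand(DemoL2Entry *e, uint64_t new_entry)
{
    e->entry = new_entry;
    e->bitmap = QCOW_L2_BITMAP_ALL_ALLOC;
}
```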
[RFC PATCH 02/23] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
We are going to need it in other places.

Signed-off-by: Alberto Garcia
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fe2523ed66..f462e169c0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1068,6 +1068,24 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }

+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+    switch (qcow2_get_cluster_type(bs, l2_entry)) {
+    case QCOW2_CLUSTER_NORMAL:
+        if (l2_entry & QCOW_OFLAG_COPIED) {
+            return false;
+        }
+    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return true;
+    default:
+        abort();
+    }
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1080,25 +1098,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,

     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-        switch(cluster_type) {
-        case QCOW2_CLUSTER_NORMAL:
-            if (l2_entry & QCOW_OFLAG_COPIED) {
-                goto out;
-            }
+        if (!cluster_needs_cow(bs, l2_entry)) {
             break;
-        case QCOW2_CLUSTER_UNALLOCATED:
-        case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
-            break;
-        default:
-            abort();
         }
     }

-out:
     assert(i <= nb_clusters);
     return i;
 }
--
2.20.1
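This simplified model of the refactoring represents clusters with a hypothetical DemoCluster struct instead of raw L2 entries. It keeps the patch's fall-through from the NORMAL case, marked with a /* fallthrough */ comment that the patch itself does not have. It also shows that count_cow_clusters() now just measures the leading run of clusters that need COW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified cluster types; QEMU derives these from L2 entries */
typedef enum {
    DEMO_UNALLOCATED,
    DEMO_ZERO_PLAIN,
    DEMO_ZERO_ALLOC,
    DEMO_NORMAL,
    DEMO_COMPRESSED,
} DemoClusterType;

typedef struct {
    DemoClusterType type;
    bool copied; /* stands in for QCOW_OFLAG_COPIED (refcount == 1) */
} DemoCluster;

/* Only a NORMAL cluster with the COPIED flag can be written in place */
static bool demo_needs_cow(const DemoCluster *c)
{
    switch (c->type) {
    case DEMO_NORMAL:
        if (c->copied) {
            return false;
        }
        /* fallthrough */
    case DEMO_UNALLOCATED:
    case DEMO_COMPRESSED:
    case DEMO_ZERO_PLAIN:
    case DEMO_ZERO_ALLOC:
        return true;
    default:
        abort();
    }
}

/* Length of the leading run of clusters that all need COW */
static int demo_count_cow(const DemoCluster *clusters, int nb_clusters)
{
    int i;

    for (i = 0; i < nb_clusters; i++) {
        if (!demo_needs_cow(&clusters[i])) {
            break;
        }
    }
    assert(i <= nb_clusters);
    return i;
}
```

Because the predicate now lives in its own function, the loop can use a plain break instead of the old goto out.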
[PATCH v2 04/21] iotests: Filter refcount_order in 036
This test can run just fine with other values for refcount_bits, so we
should filter the value from qcow2.py's dump-header. In fact, we can
filter everything but the feature bits and header extensions, because
that is what the test is about.

(036 currently ignores user-specified image options, but that will be
fixed in the next patch.)

Signed-off-by: Max Reitz
---
 tests/qemu-iotests/036     |  9 ---
 tests/qemu-iotests/036.out | 48 --
 2 files changed, 6 insertions(+), 51 deletions(-)

diff --git a/tests/qemu-iotests/036 b/tests/qemu-iotests/036
index f06ff67408..5f929ad3be 100755
--- a/tests/qemu-iotests/036
+++ b/tests/qemu-iotests/036
@@ -55,7 +55,8 @@ $PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 63

 # Without feature table
 $PYTHON qcow2.py "$TEST_IMG" del-header-ext 0x6803f857
-$PYTHON qcow2.py "$TEST_IMG" dump-header
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep features
+$PYTHON qcow2.py "$TEST_IMG" dump-header-exts
 _img_info

 # With feature table containing bit 63
@@ -103,14 +104,16 @@ echo === Create image with unknown autoclear feature bit ===
 echo
 _make_test_img 64M
 $PYTHON qcow2.py "$TEST_IMG" set-feature-bit autoclear 63
-$PYTHON qcow2.py "$TEST_IMG" dump-header
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep features
+$PYTHON qcow2.py "$TEST_IMG" dump-header-exts

 echo
 echo === Repair image ===
 echo
 _check_test_img -r all
-$PYTHON qcow2.py "$TEST_IMG" dump-header
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep features
+$PYTHON qcow2.py "$TEST_IMG" dump-header-exts

 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index 15229a9604..0b52b934e1 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -3,25 +3,9 @@ QA output created by 036
 === Image with unknown incompatible feature bit ===

 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-magic 0x514649fb
-version 3
-backing_file_offset 0x0
-backing_file_size 0x0
-cluster_bits 16
-size 67108864
-crypt_method 0
-l1_size 1
-l1_table_offset 0x30000
-refcount_table_offset 0x10000
-refcount_table_clusters 1
-nb_snapshots 0
-snapshot_offset 0x0
 incompatible_features [63]
 compatible_features []
 autoclear_features []
-refcount_order 4
-header_length 104
-
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Unsupported IMGFMT feature(s): Unknown incompatible feature: 8000000000000000
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Unsupported IMGFMT feature(s): Test feature

@@ -37,25 +21,9 @@ qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Unsupported IMGFMT feature(s): tes
 === Create image with unknown autoclear feature bit ===

 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-magic 0x514649fb
-version 3
-backing_file_offset 0x0
-backing_file_size 0x0
-cluster_bits 16
-size 67108864
-crypt_method 0
-l1_size 1
-l1_table_offset 0x30000
-refcount_table_offset 0x10000
-refcount_table_clusters 1
-nb_snapshots 0
-snapshot_offset 0x0
 incompatible_features []
 compatible_features []
 autoclear_features [63]
-refcount_order 4
-header_length 104
-
 Header extension:
 magic 0x6803f857
 length 192
@@ -65,25 +33,9 @@ data
 === Repair image ===

 No errors were found on the image.
-magic 0x514649fb
-version 3
-backing_file_offset 0x0
-backing_file_size 0x0
-cluster_bits 16
-size 67108864
-crypt_method 0
-l1_size 1
-l1_table_offset 0x30000
-refcount_table_offset 0x10000
-refcount_table_clusters 1
-nb_snapshots 0
-snapshot_offset 0x0
 incompatible_features []
 compatible_features []
 autoclear_features []
-refcount_order 4
-header_length 104
-
 Header extension:
 magic 0x6803f857
 length 192
--
2.21.0
[PATCH v2 13/21] iotests: Avoid qemu-img create
Use _make_test_img whenever possible. This way, we will not ignore
user-specified image options.

Signed-off-by: Max Reitz
Reviewed-by: Maxim Levitsky
---
 tests/qemu-iotests/094 | 2 +-
 tests/qemu-iotests/111 | 3 +--
 tests/qemu-iotests/123 | 2 +-
 tests/qemu-iotests/153 | 2 +-
 tests/qemu-iotests/200 | 4 ++--
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094
index 9343e09492..d645952d54 100755
--- a/tests/qemu-iotests/094
+++ b/tests/qemu-iotests/094
@@ -45,7 +45,7 @@ _supported_proto nbd
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"

 _make_test_img 64M
-$QEMU_IMG create -f $IMGFMT "$TEST_DIR/source.$IMGFMT" 64M | _filter_img_create
+TEST_IMG_FILE="$TEST_DIR/source.$IMGFMT" IMGPROTO=file _make_test_img 64M

 _launch_qemu -drive if=none,id=src,file="$TEST_DIR/source.$IMGFMT",format=raw \
     -nodefaults
diff --git a/tests/qemu-iotests/111 b/tests/qemu-iotests/111
index 490a5bbcb5..3b43d1bd83 100755
--- a/tests/qemu-iotests/111
+++ b/tests/qemu-iotests/111
@@ -41,8 +41,7 @@ _supported_fmt qed qcow qcow2 vmdk
 _supported_proto file
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"

-$QEMU_IMG create -f $IMGFMT -b "$TEST_IMG.inexistent" "$TEST_IMG" 2>&1 \
-    | _filter_testdir | _filter_imgfmt
+_make_test_img -b "$TEST_IMG.inexistent"

 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/123 b/tests/qemu-iotests/123
index d33950eb54..74d40d0478 100755
--- a/tests/qemu-iotests/123
+++ b/tests/qemu-iotests/123
@@ -44,7 +44,7 @@ _supported_os Linux

 SRC_IMG="$TEST_DIR/source.$IMGFMT"

 _make_test_img 1M
-$QEMU_IMG create -f $IMGFMT "$SRC_IMG" 1M | _filter_img_create
+TEST_IMG_FILE=$SRC_IMG IMGPROTO=file _make_test_img 1M

 $QEMU_IO -c 'write -P 42 0 1M' "$SRC_IMG" | _filter_qemu_io
diff --git a/tests/qemu-iotests/153 b/tests/qemu-iotests/153
index c969a1a16f..e59090259c 100755
--- a/tests/qemu-iotests/153
+++ b/tests/qemu-iotests/153
@@ -98,7 +98,7 @@ for opts1 in "" "read-only=on" "read-only=on,force-share=on"; do

     echo
     echo "== Creating test image =="
-    $QEMU_IMG create -f $IMGFMT "${TEST_IMG}" -b ${TEST_IMG}.base | _filter_img_create
+    _make_test_img -b "${TEST_IMG}.base"

     echo
     echo "== Launching QEMU, opts: '$opts1' =="
diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index 72d431f251..d904885136 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -46,8 +46,8 @@ _supported_proto file

 BACKING_IMG="${TEST_DIR}/backing.img"
 TEST_IMG="${TEST_DIR}/test.img"

-${QEMU_IMG} create -f $IMGFMT "${BACKING_IMG}" 512M | _filter_img_create
-${QEMU_IMG} create -f $IMGFMT -F $IMGFMT "${TEST_IMG}" -b "${BACKING_IMG}" 512M | _filter_img_create
+TEST_IMG="$BACKING_IMG" _make_test_img 512M
+_make_test_img -F $IMGFMT -b "$BACKING_IMG" 512M

 ${QEMU_IO} -c "write -P 0xa5 512 300M" "${BACKING_IMG}" | _filter_qemu_io

--
2.21.0
[PATCH v2 14/21] iotests: Use _rm_test_img for deleting test images
Just rm will not delete external data files. Use _rm_test_img every time we delete a test image. (In the process, clean up the indentation of every _cleanup() this patch touches.) ((Also, use quotes consistently. I am happy to see unquoted instances like "rm -rf $TEST_DIR/..." go.)) Signed-off-by: Max Reitz --- tests/qemu-iotests/005 | 2 +- tests/qemu-iotests/019 | 6 +++--- tests/qemu-iotests/020 | 6 +++--- tests/qemu-iotests/024 | 10 +- tests/qemu-iotests/028 | 2 +- tests/qemu-iotests/029 | 2 +- tests/qemu-iotests/043 | 4 +++- tests/qemu-iotests/048 | 2 +- tests/qemu-iotests/050 | 4 ++-- tests/qemu-iotests/053 | 4 ++-- tests/qemu-iotests/058 | 2 +- tests/qemu-iotests/059 | 2 +- tests/qemu-iotests/061 | 2 +- tests/qemu-iotests/063 | 6 -- tests/qemu-iotests/069 | 2 +- tests/qemu-iotests/074 | 2 +- tests/qemu-iotests/080 | 2 +- tests/qemu-iotests/081 | 6 +++--- tests/qemu-iotests/085 | 9 ++--- tests/qemu-iotests/088 | 2 +- tests/qemu-iotests/092 | 2 +- tests/qemu-iotests/094 | 2 +- tests/qemu-iotests/095 | 5 +++-- tests/qemu-iotests/099 | 7 --- tests/qemu-iotests/109 | 4 ++-- tests/qemu-iotests/110 | 4 ++-- tests/qemu-iotests/122 | 6 -- tests/qemu-iotests/123 | 2 +- tests/qemu-iotests/141 | 4 +++- tests/qemu-iotests/142 | 2 +- tests/qemu-iotests/144 | 4 +++- tests/qemu-iotests/153 | 10 +++--- tests/qemu-iotests/156 | 8 ++-- tests/qemu-iotests/159 | 2 +- tests/qemu-iotests/160 | 3 ++- tests/qemu-iotests/161 | 4 ++-- tests/qemu-iotests/170 | 2 +- tests/qemu-iotests/172 | 6 +++--- tests/qemu-iotests/173 | 3 ++- tests/qemu-iotests/178 | 2 +- tests/qemu-iotests/182 | 2 +- tests/qemu-iotests/183 | 2 +- tests/qemu-iotests/185 | 4 ++-- tests/qemu-iotests/187 | 6 +++--- tests/qemu-iotests/190 | 2 +- tests/qemu-iotests/191 | 6 +++--- tests/qemu-iotests/195 | 2 +- tests/qemu-iotests/197 | 2 +- tests/qemu-iotests/200 | 3 ++- tests/qemu-iotests/215 | 2 +- tests/qemu-iotests/225 | 2 +- tests/qemu-iotests/229 | 3 ++- tests/qemu-iotests/232 | 4 +++- tests/qemu-iotests/243 | 2 +- 
tests/qemu-iotests/244 | 4 ++-- tests/qemu-iotests/247 | 4 +++- tests/qemu-iotests/249 | 4 ++-- tests/qemu-iotests/252 | 2 +- 58 files changed, 119 insertions(+), 96 deletions(-) diff --git a/tests/qemu-iotests/005 b/tests/qemu-iotests/005 index 58442762fe..2b651f2c37 100755 --- a/tests/qemu-iotests/005 +++ b/tests/qemu-iotests/005 @@ -62,7 +62,7 @@ if [ "$IMGFMT" = "raw" ]; then if ! truncate --size=5T "$TEST_IMG"; then _notrun "file system on $TEST_DIR does not support large enough files" fi -rm "$TEST_IMG" +_rm_test_img "$TEST_IMG" fi echo diff --git a/tests/qemu-iotests/019 b/tests/qemu-iotests/019 index b4f5234609..813a84acac 100755 --- a/tests/qemu-iotests/019 +++ b/tests/qemu-iotests/019 @@ -30,9 +30,9 @@ status=1 # failure is the default! _cleanup() { - _cleanup_test_img -rm -f "$TEST_IMG.base" -rm -f "$TEST_IMG.orig" +_cleanup_test_img +_rm_test_img "$TEST_IMG.base" +_rm_test_img "$TEST_IMG.orig" } trap "_cleanup; exit \$status" 0 1 2 3 15 diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020 index f41b92f35f..20f8f185d0 100755 --- a/tests/qemu-iotests/020 +++ b/tests/qemu-iotests/020 @@ -28,9 +28,9 @@ status=1 # failure is the default! _cleanup() { - _cleanup_test_img -rm -f "$TEST_IMG.base" -rm -f "$TEST_IMG.orig" +_cleanup_test_img +_rm_test_img "$TEST_IMG.base" +_rm_test_img "$TEST_IMG.orig" } trap "_cleanup; exit \$status" 0 1 2 3 15 diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024 index 23298c6f59..e2e766241e 100755 --- a/tests/qemu-iotests/024 +++ b/tests/qemu-iotests/024 @@ -29,12 +29,12 @@ status=1# failure is the default! 
_cleanup() { _cleanup_test_img -rm -f "$TEST_DIR/t.$IMGFMT.base_old" -rm -f "$TEST_DIR/t.$IMGFMT.base_new" +_rm_test_img "$TEST_DIR/t.$IMGFMT.base_old" +_rm_test_img "$TEST_DIR/t.$IMGFMT.base_new" -rm -f "$TEST_DIR/subdir/t.$IMGFMT" -rm -f "$TEST_DIR/subdir/t.$IMGFMT.base_old" -rm -f "$TEST_DIR/subdir/t.$IMGFMT.base_new" +_rm_test_img "$TEST_DIR/subdir/t.$IMGFMT" +_rm_test_img "$TEST_DIR/subdir/t.$IMGFMT.base_old" +_rm_test_img "$TEST_DIR/subdir/t.$IMGFMT.base_new" rmdir "$TEST_DIR/subdir" 2> /dev/null } trap "_cleanup; exit \$status" 0 1 2 3 15 diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028 index 71301ec6e5..caf1258647 100755 --- a/tests/qemu-iotests/028 +++ b/tests/qemu-iotests/028 @@ -32,7 +32,7 @@ status=1 # failure is the default! _cleanup() { _cleanup_qemu -rm -f "${TEST_IMG}.copy" +_rm_test_img "${TEST_IMG}.copy" _cleanup_test_img } trap "_cleanup; exit \$status" 0 1 2 3 15 diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029 index 94c2713132..9254ede5e5 100755 --- a/tests/qemu-iotests/029 +++ b/tests/qemu-iotests/029 @@
[PATCH v2 10/21] iotests: Replace IMGOPTS= by -o
Tests should not overwrite all user-supplied image options, but only add to it (which will effectively overwrite conflicting values). Accomplish this by passing options to _make_test_img via -o instead of $IMGOPTS. For some tests, there is no functional change because they already only appended options to IMGOPTS. For these, this patch is just a simplification. For others, this is a change, so they now heed user-specified $IMGOPTS. Some of those tests do not work with all image options, though, so we need to disable them accordingly. Signed-off-by: Max Reitz --- tests/qemu-iotests/031 | 9 --- tests/qemu-iotests/039 | 24 ++ tests/qemu-iotests/059 | 18 ++--- tests/qemu-iotests/060 | 6 ++--- tests/qemu-iotests/061 | 57 ++ tests/qemu-iotests/079 | 3 +-- tests/qemu-iotests/106 | 2 +- tests/qemu-iotests/108 | 2 +- tests/qemu-iotests/112 | 32 tests/qemu-iotests/115 | 3 +-- tests/qemu-iotests/121 | 6 ++--- tests/qemu-iotests/125 | 2 +- tests/qemu-iotests/137 | 2 +- tests/qemu-iotests/138 | 3 +-- tests/qemu-iotests/175 | 2 +- tests/qemu-iotests/190 | 2 +- tests/qemu-iotests/191 | 3 +-- tests/qemu-iotests/220 | 4 ++- tests/qemu-iotests/243 | 6 +++-- tests/qemu-iotests/244 | 10 +--- tests/qemu-iotests/250 | 3 +-- tests/qemu-iotests/265 | 2 +- 22 files changed, 100 insertions(+), 101 deletions(-) diff --git a/tests/qemu-iotests/031 b/tests/qemu-iotests/031 index a3c25ec237..c44fcf91bb 100755 --- a/tests/qemu-iotests/031 +++ b/tests/qemu-iotests/031 @@ -40,19 +40,22 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 # This tests qcow2-specific low-level functionality _supported_fmt qcow2 _supported_proto file +# We want to test compat=0.10, which does not support refcount widths +# other than 16 +_unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' CLUSTER_SIZE=65536 # qcow2.py output depends on the exact options used, so override the command # line here as an exception -for IMGOPTS in "compat=0.10" "compat=1.1"; do +for compat in "compat=0.10" "compat=1.1"; do echo -echo = 
Testing with -o $IMGOPTS = +echo = Testing with -o $compat = echo echo === Create image with unknown header extension === echo -_make_test_img 64M +_make_test_img -o $compat 64M $PYTHON qcow2.py "$TEST_IMG" add-header-ext 0x12345678 "This is a test header extension" $PYTHON qcow2.py "$TEST_IMG" dump-header _check_test_img diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039 index 325da63a4c..99563bf126 100755 --- a/tests/qemu-iotests/039 +++ b/tests/qemu-iotests/039 @@ -50,8 +50,7 @@ size=128M echo echo "== Checking that image is clean on shutdown ==" -IMGOPTS="compat=1.1,lazy_refcounts=on" -_make_test_img $size +_make_test_img -o "compat=1.1,lazy_refcounts=on" $size $QEMU_IO -c "write -P 0x5a 0 512" "$TEST_IMG" | _filter_qemu_io @@ -62,8 +61,7 @@ _check_test_img echo echo "== Creating a dirty image file ==" -IMGOPTS="compat=1.1,lazy_refcounts=on" -_make_test_img $size +_make_test_img -o "compat=1.1,lazy_refcounts=on" $size _NO_VALGRIND \ $QEMU_IO -c "write -P 0x5a 0 512" \ @@ -98,8 +96,7 @@ $QEMU_IO -c "read -P 0x5a 0 512" "$TEST_IMG" | _filter_qemu_io echo echo "== Opening a dirty image read/write should repair it ==" -IMGOPTS="compat=1.1,lazy_refcounts=on" -_make_test_img $size +_make_test_img -o "compat=1.1,lazy_refcounts=on" $size _NO_VALGRIND \ $QEMU_IO -c "write -P 0x5a 0 512" \ @@ -117,8 +114,7 @@ $PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features echo echo "== Creating an image file with lazy_refcounts=off ==" -IMGOPTS="compat=1.1,lazy_refcounts=off" -_make_test_img $size +_make_test_img -o "compat=1.1,lazy_refcounts=off" $size _NO_VALGRIND \ $QEMU_IO -c "write -P 0x5a 0 512" \ @@ -132,11 +128,9 @@ _check_test_img echo echo "== Committing to a backing file with lazy_refcounts=on ==" -IMGOPTS="compat=1.1,lazy_refcounts=on" -TEST_IMG="$TEST_IMG".base _make_test_img $size +TEST_IMG="$TEST_IMG".base _make_test_img -o "compat=1.1,lazy_refcounts=on" $size -IMGOPTS="compat=1.1,lazy_refcounts=on,backing_file=$TEST_IMG.base" 
-_make_test_img $size +_make_test_img -o "compat=1.1,lazy_refcounts=on,backing_file=$TEST_IMG.base" $size $QEMU_IO -c "write 0 512" "$TEST_IMG" | _filter_qemu_io $QEMU_IMG commit "$TEST_IMG" @@ -151,8 +145,7 @@ TEST_IMG="$TEST_IMG".base _check_test_img echo echo "== Changing lazy_refcounts setting at runtime ==" -IMGOPTS="compat=1.1,lazy_refcounts=off" -_make_test_img $size +_make_test_img -o "compat=1.1,lazy_refcounts=off" $size _NO_VALGRIND \ $QEMU_IO -c "reopen -o lazy-refcounts=on" \ @@ -164,8 +157,7 @@ $QEMU_IO -c "reopen -o lazy-refcounts=on" \ $PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features _check_test_img -IMGOPTS="compat=1.1,lazy_refcounts=on" -_make_test_img $size +_make_test_img -o
[PATCH v2 20/21] iotests: Disable data_file where it cannot be used
Signed-off-by: Max Reitz --- tests/qemu-iotests/007 | 5 +++-- tests/qemu-iotests/014 | 2 ++ tests/qemu-iotests/015 | 5 +++-- tests/qemu-iotests/026 | 5 - tests/qemu-iotests/029 | 5 +++-- tests/qemu-iotests/031 | 6 +++--- tests/qemu-iotests/036 | 5 +++-- tests/qemu-iotests/039 | 3 +++ tests/qemu-iotests/046 | 2 ++ tests/qemu-iotests/048 | 2 ++ tests/qemu-iotests/051 | 5 +++-- tests/qemu-iotests/058 | 5 +++-- tests/qemu-iotests/060 | 6 -- tests/qemu-iotests/061 | 6 -- tests/qemu-iotests/062 | 2 +- tests/qemu-iotests/066 | 2 +- tests/qemu-iotests/067 | 6 -- tests/qemu-iotests/068 | 5 +++-- tests/qemu-iotests/071 | 3 +++ tests/qemu-iotests/073 | 2 ++ tests/qemu-iotests/074 | 2 ++ tests/qemu-iotests/080 | 5 +++-- tests/qemu-iotests/090 | 2 ++ tests/qemu-iotests/098 | 6 -- tests/qemu-iotests/099 | 3 ++- tests/qemu-iotests/103 | 5 +++-- tests/qemu-iotests/108 | 6 -- tests/qemu-iotests/112 | 5 +++-- tests/qemu-iotests/114 | 2 ++ tests/qemu-iotests/121 | 3 +++ tests/qemu-iotests/138 | 2 ++ tests/qemu-iotests/156 | 2 ++ tests/qemu-iotests/176 | 7 +-- tests/qemu-iotests/191 | 2 ++ tests/qemu-iotests/201 | 6 +++--- tests/qemu-iotests/214 | 3 ++- tests/qemu-iotests/217 | 3 ++- tests/qemu-iotests/220 | 5 +++-- tests/qemu-iotests/243 | 6 -- tests/qemu-iotests/244 | 5 +++-- tests/qemu-iotests/250 | 2 ++ tests/qemu-iotests/267 | 5 +++-- 42 files changed, 117 insertions(+), 52 deletions(-) diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007 index 7d3544b479..160683adf8 100755 --- a/tests/qemu-iotests/007 +++ b/tests/qemu-iotests/007 @@ -41,8 +41,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 _supported_fmt qcow2 _supported_proto generic # refcount_bits must be at least 4 so we can create ten internal snapshots -# (1 bit supports none, 2 bits support two, 4 bits support 14) -_unsupported_imgopts 'refcount_bits=\(1\|2\)[^0-9]' +# (1 bit supports none, 2 bits support two, 4 bits support 14); +# snapshot are generally impossible with external data files +_unsupported_imgopts 
'refcount_bits=\(1\|2\)[^0-9]' data_file echo echo "creating image" diff --git a/tests/qemu-iotests/014 b/tests/qemu-iotests/014 index 2f728a1956..e1221c0fff 100755 --- a/tests/qemu-iotests/014 +++ b/tests/qemu-iotests/014 @@ -43,6 +43,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 _supported_fmt qcow2 _supported_proto file _supported_os Linux +# Compression and snapshots do not work with external data files +_unsupported_imgopts data_file TEST_OFFSETS="0 4294967296" TEST_OPS="writev read write readv" diff --git a/tests/qemu-iotests/015 b/tests/qemu-iotests/015 index eec5387f3d..4d8effd0ae 100755 --- a/tests/qemu-iotests/015 +++ b/tests/qemu-iotests/015 @@ -40,8 +40,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 # actually any format that supports snapshots _supported_fmt qcow2 _supported_proto generic -# Internal snapshots are (currently) impossible with refcount_bits=1 -_unsupported_imgopts 'refcount_bits=1[^0-9]' +# Internal snapshots are (currently) impossible with refcount_bits=1, +# and generally impossible with external data files +_unsupported_imgopts 'refcount_bits=1[^0-9]' data_file echo echo "creating image" diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026 index 3430029ed6..a4aa74764f 100755 --- a/tests/qemu-iotests/026 +++ b/tests/qemu-iotests/026 @@ -49,7 +49,10 @@ _supported_cache_modes writethrough none # 32 and 64 bits do not work either, however, due to different leaked cluster # count on error. # Thus, the only remaining option is refcount_bits=16. -_unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' +# +# As for data_file, none of the refcount tests can work for it. 
+_unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' \ +data_file echo "Errors while writing 128 kB" echo diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029 index 9254ede5e5..2161a4b87a 100755 --- a/tests/qemu-iotests/029 +++ b/tests/qemu-iotests/029 @@ -42,8 +42,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 _supported_fmt qcow2 _supported_proto generic _unsupported_proto vxhs -# Internal snapshots are (currently) impossible with refcount_bits=1 -_unsupported_imgopts 'refcount_bits=1[^0-9]' +# Internal snapshots are (currently) impossible with refcount_bits=1, +# and generally impossible with external data files +_unsupported_imgopts 'refcount_bits=1[^0-9]' data_file offset_size=24 offset_l1_size=36 diff --git a/tests/qemu-iotests/031 b/tests/qemu-iotests/031 index c44fcf91bb..646ecd593f 100755 --- a/tests/qemu-iotests/031 +++ b/tests/qemu-iotests/031 @@ -40,9 +40,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 # This tests qcow2-specific low-level functionality _supported_fmt qcow2 _supported_proto file -# We want to test compat=0.10, which does not support refcount widths -# other than 16 -_unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' +# We want to test compat=0.10, which does not support external
Re: [PATCH v2 1/2] nbd: Don't send oversize strings
On 10/11/19 2:32 AM, Vladimir Sementsov-Ogievskiy wrote: 11.10.2019 0:00, Eric Blake wrote: Qemu as server currently won't accept export names larger than 256 bytes, nor create dirty bitmap names longer than 1023 bytes, so most uses of qemu as client or server have no reason to get anywhere near the NBD spec maximum of a 4k limit per string. However, we weren't actually enforcing things, ignoring when the remote side violates the protocol on input, and also having several code paths where we send oversize strings on output (for example, qemu-nbd --description could easily send more than 4k). Tighten things up as follows: client: - Perform bounds check on export name and dirty bitmap request prior to handing it to server - Validate that copied server replies are not too long (ignoring NBD_INFO_* replies that are not copied is not too bad) server: - Perform bounds check on export name and description prior to advertising it to client - Reject client name or metadata query that is too long Signed-off-by: Eric Blake --- +++ b/include/block/nbd.h @@ -232,6 +232,7 @@ enum { * going larger would require an audit of more code to make sure we * aren't overflowing some other buffer. */ This comment says, that we restrict export name to 256... Yes, because we still stack-allocate the name in places, but 4k is too large for stack allocation. But we're inconsistent on where we use the smaller 256-limit; the server won't serve an image that large, but doesn't prevent a client from requesting a 4k name export (even though that export will not be present). +++ b/blockdev-nbd.c @@ -162,6 +162,11 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name, name = device; } +if (strlen(name) > NBD_MAX_STRING_SIZE) { +error_setg(errp, "export name '%s' too long", name); +return; +} Hmmm, no, so here we restrict to 4096, but, we will not allow client to request more than 256. 
It seems that, to correctly update the server part, we should drop NBD_MAX_NAME_SIZE and do the audit mentioned in the comment above its definition. Yeah, I guess it's time to just get rid of NBD_MAX_NAME_SIZE, and move away from stack allocations. Should I do that as a followup to this patch, or spin a v3? +++ b/nbd/client.c @@ -289,8 +289,8 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, char **description, return -1; } len -= sizeof(namelen); -if (len < namelen) { -error_setg(errp, "incorrect option name length"); +if (len < namelen || namelen > NBD_MAX_STRING_SIZE) { +error_setg(errp, "incorrect list name length"); The new wording made me go back up and read the comment describing what the function does. The comment is good, but without it, this sounds like the name of the list to me... Maybe: "incorrect name length in server's list response" nbd_send_opt_abort(ioc); return -1; } @@ -303,6 +303,11 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, char **description, local_name[namelen] = '\0'; len -= namelen; if (len) { +if (len > NBD_MAX_STRING_SIZE) { +error_setg(errp, "incorrect list description length"); and "incorrect description length in server's list response" @@ -648,6 +657,7 @@ static int nbd_send_meta_query(QIOChannel *ioc, uint32_t opt, if (query) { query_len = strlen(query); data_len += sizeof(query_len) + query_len; +assert(query_len <= NBD_MAX_STRING_SIZE); } else { assert(opt == NBD_OPT_LIST_META_CONTEXT); } You may assert export_len as well. It was asserted earlier, but doing it again might not hurt, especially if I do the followup patch getting rid of NBD_MAX_NAME_SIZE @@ -1561,6 +1569,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset, exp->export_bitmap = bm; exp->export_bitmap_context = g_strdup_printf("qemu:dirty-bitmap:%s", bitmap); +/* See BME_MAX_NAME_SIZE in block/qcow2-bitmap.c */ Hmm. BME_MAX_NAME_SIZE is checked only when creating persistent bitmaps. But for non-persistent bitmaps, the name length is actually unlimited.
So, we should either limit all bitmap names to 1023 (hopefully this will not break existing scenarios) or error out here (or earlier) instead of asserting. I'm leaning towards limiting ALL bitmaps to the same length (as we've already debated the idea of being able to convert an existing bitmap from transient to persistent). We also may want QEMU_BUILD_BUG_ON(NBD_MAX_STRING_SIZE < BME_MAX_NAME_SIZE + sizeof("qemu:dirty-bitmap:") - 1) Except that BME_MAX_NAME_SIZE is not (currently) in a public .h file. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
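For illustration, a minimal standalone C sketch of the two kinds of check discussed above: client-side validation of the lengths in a server's list reply, and a compile-time guard equivalent to the suggested QEMU_BUILD_BUG_ON. The constants mirror the values quoted in this thread (4096 for NBD_MAX_STRING_SIZE, 1023 for BME_MAX_NAME_SIZE). The helper names are hypothetical and not part of QEMU, and C11 _Static_assert stands in for QEMU_BUILD_BUG_ON.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as quoted in this thread (assumed to match QEMU's definitions). */
#define NBD_MAX_STRING_SIZE 4096  /* NBD spec limit per string */
#define BME_MAX_NAME_SIZE   1023  /* qcow2 persistent bitmap name limit */

/* Compile-time guard in the spirit of the suggested QEMU_BUILD_BUG_ON:
 * "qemu:dirty-bitmap:" plus the longest bitmap name must fit. */
_Static_assert(NBD_MAX_STRING_SIZE >=
               BME_MAX_NAME_SIZE + sizeof("qemu:dirty-bitmap:") - 1,
               "dirty bitmap context name could exceed NBD string limit");

/*
 * Hypothetical helper: validate the name length announced in one entry of
 * a server's list reply. 'len' is the payload left after the 4-byte length
 * field; names that overrun the payload or the NBD limit are rejected.
 */
static bool list_reply_name_ok(uint32_t len, uint32_t namelen)
{
    return namelen <= len && namelen <= NBD_MAX_STRING_SIZE;
}

/* Hypothetical helper: whatever follows the name is the description. */
static bool list_reply_desc_ok(uint32_t remaining)
{
    return remaining <= NBD_MAX_STRING_SIZE;
}
```

With these values, the longest dirty-bitmap context name is 1023 + 18 = 1041 bytes, well below the 4096-byte limit.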
[RFC PATCH 09/23] qcow2: Add l2_entry_size()
qcow2 images with subclusters have 128-bit L2 entries. The first 64 bits contain the same information as traditional images and the last 64 bits form a bitmap with the status of each individual subcluster. Because of that we cannot assume that L2 entries are sizeof(uint64_t) anymore. This function returns the proper value for the image. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 12 ++-- block/qcow2-refcount.c | 14 -- block/qcow2.c | 6 +++--- block/qcow2.h | 5 + 4 files changed, 22 insertions(+), 15 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index b2045d51bf..67f90e415d 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -209,7 +209,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset, uint64_t l2_offset, uint64_t **l2_slice) { BDRVQcow2State *s = bs->opaque; -int start_of_slice = sizeof(uint64_t) * +int start_of_slice = l2_entry_size(s) * (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset)); return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice, @@ -277,7 +277,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index) /* allocate a new l2 entry */ -l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t)); +l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s)); if (l2_offset < 0) { ret = l2_offset; goto fail; @@ -301,7 +301,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index) /* allocate a new entry in the l2 cache */ -slice_size2 = s->l2_slice_size * sizeof(uint64_t); +slice_size2 = s->l2_slice_size * l2_entry_size(s); n_slices = s->cluster_size / slice_size2; trace_qcow2_l2_allocate_get_empty(bs, l1_index); @@ -365,7 +365,7 @@ fail: } s->l1_table[l1_index] = old_l2_offset; if (l2_offset > 0) { -qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t), +qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s), QCOW2_DISCARD_ALWAYS); } return ret; @@ -708,7 +708,7 @@ static int 
get_cluster_table(BlockDriverState *bs, uint64_t offset, /* Then decrease the refcount of the old table */ if (l2_offset) { -qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t), +qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s), QCOW2_DISCARD_OTHER); } @@ -1880,7 +1880,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table, int ret; int i, j; -slice_size2 = s->l2_slice_size * sizeof(uint64_t); +slice_size2 = s->l2_slice_size * l2_entry_size(s); n_slices = s->cluster_size / slice_size2; if (!is_active_l1) { diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 14f71df7da..a2c4d36378 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1253,7 +1253,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs, l2_slice = NULL; l1_table = NULL; l1_size2 = l1_size * sizeof(uint64_t); -slice_size2 = s->l2_slice_size * sizeof(uint64_t); +slice_size2 = s->l2_slice_size * l2_entry_size(s); n_slices = s->cluster_size / slice_size2; s->cache_discards = true; @@ -1604,7 +1604,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, int i, l2_size, nb_csectors, ret; /* Read L2 table from disk */ -l2_size = s->l2_size * sizeof(uint64_t); +l2_size = s->l2_size * l2_entry_size(s); l2_table = g_malloc(l2_size); ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size); @@ -1679,15 +1679,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", offset); if (fix & BDRV_FIX_ERRORS) { +int idx = i * (l2_entry_size(s) / sizeof(uint64_t)); uint64_t l2e_offset = -l2_offset + (uint64_t)i * sizeof(uint64_t); +l2_offset + (uint64_t)i * l2_entry_size(s); int ign = active ? 
QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2; l2_entry = QCOW_OFLAG_ZERO; set_l2_entry(s, l2_table, i, l2_entry); ret = qcow2_pre_write_overlap_check(bs, ign, -l2e_offset, sizeof(uint64_t), false); +l2e_offset, l2_entry_size(s), false); if (ret < 0) { fprintf(stderr, "ERROR: Overlap check failed\n"); res->check_errors++; @@ -1697,7 +1698,8 @@ static int
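A minimal sketch of what l2_entry_size() changes in the size and offset calculations above, using a hypothetical reduced state struct in place of BDRVQcow2State. It assumes that l2_size equals cluster_size / l2_entry_size(), so that an L2 table still fills exactly one cluster; this patch does not show that part, so treat it as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for BDRVQcow2State. */
typedef struct {
    bool subclusters;   /* image uses extended (128-bit) L2 entries */
    int l2_size;        /* number of entries per L2 table */
} Qcow2StateSketch;

/* 8 bytes for traditional entries, 16 for extended ones. */
static size_t l2_entry_size_sketch(const Qcow2StateSketch *s)
{
    return s->subclusters ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Bytes in one L2 table, i.e. s->l2_size * l2_entry_size(s). */
static uint64_t l2_table_bytes(const Qcow2StateSketch *s)
{
    return (uint64_t)s->l2_size * l2_entry_size_sketch(s);
}

/* Host offset of entry i, as computed for l2e_offset in the repair path. */
static uint64_t l2_entry_host_offset(const Qcow2StateSketch *s,
                                     uint64_t l2_offset, int i)
{
    return l2_offset + (uint64_t)i * l2_entry_size_sketch(s);
}
```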
[RFC PATCH 04/23] qcow2: Add get_l2_entry() and set_l2_entry()
The size of an L2 entry is 64 bits, but if we want to have subclusters we need extended L2 entries. This means that we have to access L2 tables and slices differently depending on whether an image has extended L2 entries or not. This patch replaces all l2_slice[] accesses with calls to get_l2_entry() and set_l2_entry(). Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 65 ++ block/qcow2-refcount.c | 17 +-- block/qcow2.h | 12 3 files changed, 55 insertions(+), 39 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 70b2e32f7e..b2045d51bf 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -379,12 +379,13 @@ fail: * cluster which may require a different handling) */ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters, -int cluster_size, uint64_t *l2_slice, uint64_t stop_flags) +int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags) { +BDRVQcow2State *s = bs->opaque; int i; QCow2ClusterType first_cluster_type; uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED; -uint64_t first_entry = be64_to_cpu(l2_slice[0]); +uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index); uint64_t offset = first_entry & mask; first_cluster_type = qcow2_get_cluster_type(bs, first_entry); @@ -397,7 +398,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters, first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC); for (i = 0; i < nb_clusters; i++) { -uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask; +uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask; if (offset + (uint64_t) i * cluster_size != l2_entry) { break; } @@ -413,14 +414,16 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters, static int count_contiguous_clusters_unallocated(BlockDriverState *bs, int nb_clusters, uint64_t *l2_slice, + int l2_index, QCow2ClusterType wanted_type) { +BDRVQcow2State *s = bs->opaque; int i; assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN 
|| wanted_type == QCOW2_CLUSTER_UNALLOCATED); for (i = 0; i < nb_clusters; i++) { -uint64_t entry = be64_to_cpu(l2_slice[i]); +uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i); QCow2ClusterType type = qcow2_get_cluster_type(bs, entry); if (type != wanted_type) { @@ -566,7 +569,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset, /* find the cluster offset for the given disk offset */ l2_index = offset_to_l2_slice_index(s, offset); -*cluster_offset = be64_to_cpu(l2_slice[l2_index]); +*cluster_offset = get_l2_entry(s, l2_slice, l2_index); nb_clusters = size_to_clusters(s, bytes_needed); /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned @@ -601,14 +604,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset, case QCOW2_CLUSTER_UNALLOCATED: /* how many empty clusters ? */ c = count_contiguous_clusters_unallocated(bs, nb_clusters, - &l2_slice[l2_index], type); + l2_slice, l2_index, type); *cluster_offset = 0; break; case QCOW2_CLUSTER_ZERO_ALLOC: case QCOW2_CLUSTER_NORMAL: /* how many allocated clusters ? */ c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size, - &l2_slice[l2_index], QCOW_OFLAG_ZERO); + l2_slice, l2_index, QCOW_OFLAG_ZERO); *cluster_offset &= L2E_OFFSET_MASK; if (offset_into_cluster(s, *cluster_offset)) { qcow2_signal_corruption(bs, true, -1, -1, @@ -761,7 +764,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs, /* Compression can't overwrite anything. Fail if the cluster was already * allocated.
*/ -cluster_offset = be64_to_cpu(l2_slice[l2_index]); +cluster_offset = get_l2_entry(s, l2_slice, l2_index); if (cluster_offset & L2E_OFFSET_MASK) { qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice); return -EIO; @@ -786,7 +789,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs, BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED); qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); -l2_slice[l2_index] = cpu_to_be64(cluster_offset); +set_l2_entry(s, l2_slice, l2_index, cluster_offset); qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice); *host_offset = cluster_offset &
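The callers now pass (l2_slice, l2_index) instead of &l2_slice[l2_index] because, as the commit message says, slices must be accessed differently once entries can be 128 bits wide: an index can be scaled inside get_l2_entry(), whereas a raw uint64_t pointer cannot. Below is a simplified standalone sketch of the contiguity scan in count_contiguous_clusters(). It operates on already-decoded (host-endian) entries; the mask constants mirror those in block/qcow2.h, while the function name and the pre-decoded array are assumptions of this sketch.

```c
#include <stdint.h>

/* Flag and mask values as defined in block/qcow2.h. */
#define L2E_OFFSET_MASK       0x00fffffffffffe00ULL
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)

/*
 * Count how many of the nb_clusters entries starting at l2_index map to
 * host clusters contiguous with the first one. 'entries' holds decoded
 * 64-bit L2 entries (what get_l2_entry() would return for each index).
 * Bits in stop_flags stay in the mask, so an entry whose stop flags
 * differ from the first entry's breaks the run.
 */
static int count_contiguous_sketch(const uint64_t *entries, int l2_index,
                                   int nb_clusters, uint64_t cluster_size,
                                   uint64_t stop_flags)
{
    uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
    uint64_t offset = entries[l2_index] & mask;
    int i;

    for (i = 0; i < nb_clusters; i++) {
        uint64_t e = entries[l2_index + i] & mask;
        if (offset + (uint64_t)i * cluster_size != e) {
            break;
        }
    }
    return i;
}
```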
[RFC PATCH 16/23] qcow2: Add subcluster support to discard_in_l2_slice()
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an image has subclusters. Instead, the individual 'all zeroes' bits must be used. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index c554b1a88c..bf32447d18 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1769,7 +1769,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset, /* First remove L2 entries */ qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); -if (!full_discard && s->qcow_version >= 3) { +if (has_subclusters(s)) { +set_l2_entry(s, l2_slice, l2_index + i, 0); +set_l2_bitmap(s, l2_slice, l2_index + i, + full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES); +} else if (!full_discard && s->qcow_version >= 3) { set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); } else { set_l2_entry(s, l2_slice, l2_index + i, 0); -- 2.20.1
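The branch added to discard_in_l2_slice() can be summarised as a small decision function. In this sketch, the return struct and function name are hypothetical. The all-zeroes bitmap value (upper 32 bits set, one per subcluster) is an assumption about the extended L2 bitmap layout, because QCOW_L2_BITMAP_ALL_ZEROES is not defined in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW_OFLAG_ZERO (1ULL << 0)
/* Assumption: upper 32 bits of the bitmap are the per-subcluster
 * "all zeroes" flags, so this stands in for QCOW_L2_BITMAP_ALL_ZEROES. */
#define SKETCH_L2_BITMAP_ALL_ZEROES (((1ULL << 32) - 1) << 32)

/* New values for the L2 entry and its bitmap after discarding a cluster. */
typedef struct {
    uint64_t entry;
    uint64_t bitmap;   /* ignored for traditional images (set is a no-op) */
} L2DiscardUpdate;

static L2DiscardUpdate discard_l2_update(bool has_subclusters,
                                         bool full_discard, int qcow_version)
{
    L2DiscardUpdate u = { 0, 0 };

    if (has_subclusters) {
        /* QCOW_OFLAG_ZERO is forbidden here; use the per-subcluster bits */
        u.bitmap = full_discard ? 0 : SKETCH_L2_BITMAP_ALL_ZEROES;
    } else if (!full_discard && qcow_version >= 3) {
        u.entry = QCOW_OFLAG_ZERO;
    }
    /* otherwise: entry stays 0 (unallocated) */
    return u;
}
```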
[PATCH v2 06/21] iotests: Drop compat=1.1 in 050
IMGOPTS can never be empty for qcow2, because the check script adds compat=1.1 unless the user has specified a compat option themselves. Thus, this block does not do anything and can be dropped. Signed-off-by: Max Reitz Reviewed-by: Maxim Levitsky --- tests/qemu-iotests/050 | 4 1 file changed, 4 deletions(-) diff --git a/tests/qemu-iotests/050 b/tests/qemu-iotests/050 index 211fc00797..272ecab195 100755 --- a/tests/qemu-iotests/050 +++ b/tests/qemu-iotests/050 @@ -41,10 +41,6 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 _supported_fmt qcow2 qed _supported_proto file -if test "$IMGFMT" = qcow2 && test $IMGOPTS = ""; then - IMGOPTS=compat=1.1 -fi - echo echo "== Creating images ==" -- 2.21.0
[PATCH v2 18/21] iotests: Make 137 work with data_file
When using an external data file, there are no refcounts for data clusters. We thus have to adjust the corruption test in this patch so that it is based not on a data cluster allocation but on the L2 table allocation (L2 tables are still refcounted with external data files). Furthermore, we should not print qcow2.py's list of incompatible features because it differs depending on whether there is an external data file or not. With those two changes, the test will work both with and without an external data file (once that option works with the iotests at all). Signed-off-by: Max Reitz --- tests/qemu-iotests/137 | 15 +++ tests/qemu-iotests/137.out | 6 ++ 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/tests/qemu-iotests/137 b/tests/qemu-iotests/137 index 6cf2997577..7ae86892f7 100755 --- a/tests/qemu-iotests/137 +++ b/tests/qemu-iotests/137 @@ -138,14 +138,21 @@ $QEMU_IO \ "$TEST_IMG" 2>&1 | _filter_qemu_io # The dirty bit must not be set -$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features +# (Filter the external data file bit) +if $PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features \ +| grep -q '\<0\>' +then +echo 'ERROR: Dirty bit set' +else +echo 'OK: Dirty bit not set' +fi # Similarly we can test whether corruption detection has been enabled: -# Create L1/L2, overwrite first entry in refcount block, allocate something. +# Create L1, overwrite refcounts, force allocation of L2 by writing +# data. # Disabling the checks should fail, so the corruption must be detected.
_make_test_img 64M -$QEMU_IO -c "write 0 64k" "$TEST_IMG" | _filter_qemu_io -poke_file "$TEST_IMG" "$((0x2))" "\x00\x00" +poke_file "$TEST_IMG" "$((0x2))" "\x00\x00\x00\x00\x00\x00\x00\x00" $QEMU_IO \ -c "reopen -o overlap-check=none,lazy-refcounts=42" \ -c "write 64k 64k" \ diff --git a/tests/qemu-iotests/137.out b/tests/qemu-iotests/137.out index bd4523a853..86377c80cd 100644 --- a/tests/qemu-iotests/137.out +++ b/tests/qemu-iotests/137.out @@ -36,11 +36,9 @@ qemu-io: Unsupported value 'blubb' for qcow2 option 'overlap-check'. Allowed are wrote 512/512 bytes at offset 0 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) ./common.rc: Killed ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" ) -incompatible_features [] +OK: Dirty bit not set Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 -wrote 65536/65536 bytes at offset 0 -64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) qemu-io: Parameter 'lazy-refcounts' expects 'on' or 'off' -qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with qcow2_header); further corruption events will be suppressed +qcow2: Marking image as corrupt: Preventing invalid allocation of L2 table at offset 0; further corruption events will be suppressed write failed: Input/output error *** done -- 2.21.0
[PATCH v2 21/21] iotests: Allow check -o data_file
The problem with allowing the data_file option is that you want to use a different data file per image used in the test. Therefore, we need to allow patterns like -o data_file='$TEST_IMG.data_file'. Then, we need to filter it out from qemu-img map, qemu-img create, and remove the data file in _rm_test_img. Signed-off-by: Max Reitz --- tests/qemu-iotests/common.filter | 23 +-- tests/qemu-iotests/common.rc | 22 +- 2 files changed, 42 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter index 63bc6f6f26..9dd05689d1 100644 --- a/tests/qemu-iotests/common.filter +++ b/tests/qemu-iotests/common.filter @@ -121,7 +121,13 @@ _filter_actual_image_size() # replace driver-specific options in the "Formatting..." line _filter_img_create() { -$SED -e "s#$REMOTE_TEST_DIR#TEST_DIR#g" \ +data_file_filter=() +if data_file=$(_get_data_file "$TEST_IMG"); then +data_file_filter=(-e "s# data_file=$data_file##") +fi + +$SED "${data_file_filter[@]}" \ +-e "s#$REMOTE_TEST_DIR#TEST_DIR#g" \ -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \ -e "s#$TEST_DIR#TEST_DIR#g" \ -e "s#$IMGFMT#IMGFMT#g" \ @@ -204,9 +210,22 @@ _filter_img_info() # human and json output _filter_qemu_img_map() { +# Assuming the data_file value in $IMGOPTS contains a '$TEST_IMG', +# create a filter that replaces the data file name by $TEST_IMG. 
+# Example: +# In $IMGOPTS: 'data_file=$TEST_IMG.data_file' +# Then data_file_pattern == '\(.*\).data_file' +# And data_file_filter == -e 's#\(.*\).data_file#\1# +data_file_filter=() +if data_file_pattern=$(_get_data_file '\\(.*\\)'); then +data_file_filter=(-e "s#$data_file_pattern#\\1#") +fi + $SED -e 's/\([0-9a-fx]* *[0-9a-fx]* *\)[0-9a-fx]* */\1/g' \ -e 's/"offset": [0-9]\+/"offset": OFFSET/g' \ --e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt +-e 's/Mapped to *//' \ +"${data_file_filter[@]}" \ +| _filter_testdir | _filter_imgfmt } _filter_nbd() diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc index f3784077de..bed789a691 100644 --- a/tests/qemu-iotests/common.rc +++ b/tests/qemu-iotests/common.rc @@ -277,6 +277,20 @@ _stop_nbd_server() fi } +# Gets the data_file value from IMGOPTS and replaces the '$TEST_IMG' +# pattern by '$1' +# Caution: The replacement is done with sed, so $1 must be escaped +# properly. (The delimiter is '#'.) +_get_data_file() +{ +if ! echo "$IMGOPTS" | grep -q 'data_file='; then +return 1 +fi + +echo "$IMGOPTS" | sed -e 's/.*data_file=\([^,]*\).*/\1/' \ +| sed -e "s#\\\$TEST_IMG#$1#" +} + _make_test_img() { # extra qemu-img options can be added by tests @@ -297,7 +311,8 @@ _make_test_img() fi if [ -n "$IMGOPTS" ]; then -optstr=$(_optstr_add "$optstr" "$IMGOPTS") +imgopts_expanded=$(echo "$IMGOPTS" | sed -e "s#\\\$TEST_IMG#$img_name#") +optstr=$(_optstr_add "$optstr" "$imgopts_expanded") fi if [ -n "$IMGKEYSECRET" ]; then object_options="--object secret,id=keysec0,data=$IMGKEYSECRET" @@ -376,6 +391,11 @@ _rm_test_img() # Remove all the extents for vmdk "$QEMU_IMG" info "$img" 2>/dev/null | grep 'filename:' | cut -f 2 -d: \ | xargs -I {} rm -f "{}" +elif [ "$IMGFMT" = "qcow2" ]; then +# Remove external data file +if data_file=$(_get_data_file "$img"); then +rm -f "$data_file" +fi fi rm -f "$img" } -- 2.21.0
[RFC PATCH 10/23] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
Extended L2 entries are 128 bits wide: 64 bits for the entry itself and 64 bits for the subcluster allocation bitmap. To support them correctly, get/set_l2_entry() need to be updated to take the entry width into account when calculating the offset. This patch also adds the get/set_l2_bitmap() functions that are used to access the bitmaps. For convenience, these functions are no-ops when used on traditional qcow2 images (get_l2_bitmap() simply returns 0). Signed-off-by: Alberto Garcia --- block/qcow2.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/block/qcow2.h b/block/qcow2.h index 9a7648af47..d9fe883fe0 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -504,15 +504,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s) static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice, int idx) { +idx *= l2_entry_size(s) / sizeof(uint64_t); return be64_to_cpu(l2_slice[idx]); } +static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice, + int idx) +{ +if (has_subclusters(s)) { +idx *= l2_entry_size(s) / sizeof(uint64_t); +return be64_to_cpu(l2_slice[idx + 1]); +} else { +return 0; +} +} + static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice, int idx, uint64_t entry) { +idx *= l2_entry_size(s) / sizeof(uint64_t); l2_slice[idx] = cpu_to_be64(entry); } +static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice, + int idx, uint64_t bitmap) +{ +if (has_subclusters(s)) { +idx *= l2_entry_size(s) / sizeof(uint64_t); +l2_slice[idx + 1] = cpu_to_be64(bitmap); +} +} + static inline bool has_data_file(BlockDriverState *bs) { BDRVQcow2State *s = bs->opaque; -- 2.20.1
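A standalone sketch of the indexing scheme above: an extended entry occupies two consecutive big-endian 64-bit words, the entry followed by its bitmap. The reduced state struct and the byte-wise be64 helpers are stand-ins (assumptions) for BDRVQcow2State, be64_to_cpu() and cpu_to_be64(). The accessor logic follows the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BDRVQcow2State / has_subclusters(s). */
typedef struct {
    bool subclusters;
} Qcow2StateSketch;

/* Portable big-endian load/store, standing in for be64_to_cpu() and
 * cpu_to_be64() applied to on-disk words. */
static uint64_t be64_load(const uint64_t *p)
{
    const uint8_t *b = (const uint8_t *)p;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

static void be64_store(uint64_t *p, uint64_t v)
{
    uint8_t *b = (uint8_t *)p;
    for (int i = 7; i >= 0; i--) {
        b[i] = (uint8_t)v;
        v >>= 8;
    }
}

static size_t l2_entry_size(const Qcow2StateSketch *s)
{
    return s->subclusters ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Index of the first 64-bit word of entry idx (the "idx *= ..." step). */
static size_t l2_word_index(const Qcow2StateSketch *s, int idx)
{
    return (size_t)idx * (l2_entry_size(s) / sizeof(uint64_t));
}

static uint64_t get_l2_entry(const Qcow2StateSketch *s,
                             const uint64_t *l2_slice, int idx)
{
    return be64_load(&l2_slice[l2_word_index(s, idx)]);
}

static uint64_t get_l2_bitmap(const Qcow2StateSketch *s,
                              const uint64_t *l2_slice, int idx)
{
    if (!s->subclusters) {
        return 0;   /* traditional entries have no bitmap */
    }
    return be64_load(&l2_slice[l2_word_index(s, idx) + 1]);
}

static void set_l2_entry(const Qcow2StateSketch *s, uint64_t *l2_slice,
                         int idx, uint64_t entry)
{
    be64_store(&l2_slice[l2_word_index(s, idx)], entry);
}

static void set_l2_bitmap(const Qcow2StateSketch *s, uint64_t *l2_slice,
                          int idx, uint64_t bitmap)
{
    if (s->subclusters) {   /* no-op for traditional images */
        be64_store(&l2_slice[l2_word_index(s, idx) + 1], bitmap);
    }
}
```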
[RFC PATCH 03/23] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
When writing to a qcow2 file, there are two functions that take a virtual offset and return a host offset, possibly allocating new clusters if necessary: - handle_copied() looks for normal data clusters that are already allocated and have a reference count of 1. In those clusters we can simply write the data and there is no need to perform any copy-on-write. - handle_alloc() looks for clusters that do need copy-on-write, either because they haven't been allocated yet, because their reference count is != 1 or because they are ZERO_ALLOC clusters. The ZERO_ALLOC case is a bit special because those are clusters that are already allocated and they could perfectly well be dealt with in handle_copied() (as long as copy-on-write is performed when required). In fact, there is extra code specifically for them in handle_alloc() that tries to reuse the existing allocation if possible and frees them otherwise. This patch changes the handling of ZERO_ALLOC clusters so the semantics of these two functions are now like this: - handle_copied() looks for clusters that are already allocated and which we can overwrite (NORMAL and ZERO_ALLOC clusters with a reference count of 1). - handle_alloc() looks for clusters for which we need a new allocation (all other cases). One important difference after this change is that clusters found in handle_copied() may now require copy-on-write, but this will be necessary anyway once we add support for subclusters. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 177 +++--- 1 file changed, 96 insertions(+), 81 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index f462e169c0..70b2e32f7e 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1021,7 +1021,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m) /* * For a given write request, create a new QCowL2Meta structure and - * add it to @m.
If the write request does not need copy-on-write or + * changes to the L2 metadata then this function does nothing. * * @host_offset points to the beginning of the first cluster. * @@ -1034,15 +1035,51 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m) */ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset, uint64_t guest_offset, uint64_t bytes, - QCowL2Meta **m, bool keep_old) + uint64_t *l2_slice, QCowL2Meta **m, bool keep_old) { BDRVQcow2State *s = bs->opaque; -unsigned cow_start_from = 0; +int l2_index = offset_to_l2_slice_index(s, guest_offset); +uint64_t l2_entry; +unsigned cow_start_from, cow_end_to; unsigned cow_start_to = offset_into_cluster(s, guest_offset); unsigned cow_end_from = cow_start_to + bytes; -unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size); unsigned nb_clusters = size_to_clusters(s, cow_end_from); QCowL2Meta *old_m = *m; +QCow2ClusterType type; + +/* Return if there's no COW (all clusters are normal and we keep them) */ +if (keep_old) { +int i; +for (i = 0; i < nb_clusters; i++) { +l2_entry = be64_to_cpu(l2_slice[l2_index + i]); +if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) { +break; +} +} +if (i == nb_clusters) { +return; +} +} + +/* Get the L2 entry from the first cluster */ +l2_entry = be64_to_cpu(l2_slice[l2_index]); +type = qcow2_get_cluster_type(bs, l2_entry); + +if (type == QCOW2_CLUSTER_NORMAL && keep_old) { +cow_start_from = cow_start_to; +} else { +cow_start_from = 0; +} + +/* Get the L2 entry from the last cluster */ +l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]); +type = qcow2_get_cluster_type(bs, l2_entry); + +if (type == QCOW2_CLUSTER_NORMAL && keep_old) { +cow_end_to = cow_end_from; +} else { +cow_end_to = ROUND_UP(cow_end_from, s->cluster_size); +} *m = g_malloc0(sizeof(**m)); **m = (QCowL2Meta) { @@ -1068,18 +1105,18 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset, QLIST_INSERT_HEAD(&s->cluster_allocs, *m,
next_in_flight); } -/* Returns true if writing to a cluster requires COW */ +/* Returns true if the cluster is unallocated or has refcount > 1 */ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry) { switch (qcow2_get_cluster_type(bs, l2_entry)) { case QCOW2_CLUSTER_NORMAL: +case QCOW2_CLUSTER_ZERO_ALLOC: if (l2_entry & QCOW_OFLAG_COPIED) { return false; } case QCOW2_CLUSTER_UNALLOCATED: case QCOW2_CLUSTER_COMPRESSED: case QCOW2_CLUSTER_ZERO_PLAIN: -case QCOW2_CLUSTER_ZERO_ALLOC: return true; default:
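The new classification in cluster_needs_cow() can be sketched as follows. The enum is a hypothetical stand-in for QCow2ClusterType, and the cluster type is passed in directly instead of being derived with qcow2_get_cluster_type(). QCOW_OFLAG_COPIED (bit 63) marks a cluster whose reference count is exactly 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW_OFLAG_COPIED (1ULL << 63)

/* Hypothetical stand-in for QCow2ClusterType. */
typedef enum {
    CLUSTER_UNALLOCATED,
    CLUSTER_ZERO_PLAIN,
    CLUSTER_ZERO_ALLOC,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
} ClusterTypeSketch;

/* true: handle_alloc() territory (needs a new allocation);
 * false: handle_copied() can overwrite the existing cluster in place. */
static bool cluster_needs_cow_sketch(ClusterTypeSketch type, uint64_t l2_entry)
{
    switch (type) {
    case CLUSTER_NORMAL:
    case CLUSTER_ZERO_ALLOC:    /* newly treated like NORMAL by the patch */
        if (l2_entry & QCOW_OFLAG_COPIED) {
            return false;       /* allocated, refcount == 1 */
        }
        return true;            /* shared cluster (refcount > 1) */
    case CLUSTER_UNALLOCATED:
    case CLUSTER_COMPRESSED:
    case CLUSTER_ZERO_PLAIN:
    default:
        return true;
    }
}
```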
[RFC PATCH 01/23] qcow2: Add calculate_l2_meta()
handle_alloc() creates a QCowL2Meta structure in order to update the image metadata and perform the necessary copy-on-write operations. This patch moves that code to a separate function so it can be used from other places. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 76 +-- 1 file changed, 52 insertions(+), 24 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 8d5fa1539c..fe2523ed66 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1019,6 +1019,55 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m) QCOW2_DISCARD_NEVER); } +/* + * For a given write request, create a new QCowL2Meta structure and + * add it to @m. + * + * @host_offset points to the beginning of the first cluster. + * + * @guest_offset and @bytes indicate the offset and length of the + * request. + * + * If @keep_old is true it means that the clusters were already + * allocated and will be overwritten. If false then the clusters are + * new and we have to decrease the reference count of the old ones. 
+ */ +static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset, + uint64_t guest_offset, uint64_t bytes, + QCowL2Meta **m, bool keep_old) +{ +BDRVQcow2State *s = bs->opaque; +unsigned cow_start_from = 0; +unsigned cow_start_to = offset_into_cluster(s, guest_offset); +unsigned cow_end_from = cow_start_to + bytes; +unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size); +unsigned nb_clusters = size_to_clusters(s, cow_end_from); +QCowL2Meta *old_m = *m; + +*m = g_malloc0(sizeof(**m)); +**m = (QCowL2Meta) { +.next = old_m, + +.alloc_offset = host_offset, +.offset = start_of_cluster(s, guest_offset), +.nb_clusters= nb_clusters, + +.keep_old_clusters = keep_old, + +.cow_start = { +.offset = cow_start_from, +.nb_bytes = cow_start_to - cow_start_from, +}, +.cow_end = { +.offset = cow_end_from, +.nb_bytes = cow_end_to - cow_end_from, +}, +}; + +qemu_co_queue_init(&(*m)->dependent_requests); +QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight); +} + /* * Returns the number of contiguous clusters that can be used for an allocating * write, but require COW to be performed (this includes yet unallocated space, @@ -1414,35 +1463,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset, uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset); int avail_bytes = MIN(INT_MAX, nb_clusters << s->cluster_bits); int nb_bytes = MIN(requested_bytes, avail_bytes); -QCowL2Meta *old_m = *m; - -*m = g_malloc0(sizeof(**m)); - -**m = (QCowL2Meta) { -.next = old_m, - -.alloc_offset = alloc_cluster_offset, -.offset = start_of_cluster(s, guest_offset), -.nb_clusters= nb_clusters, - -.keep_old_clusters = keep_old_clusters, - -.cow_start = { -.offset = 0, -.nb_bytes = offset_into_cluster(s, guest_offset), -}, -.cow_end = { -.offset = nb_bytes, -.nb_bytes = avail_bytes - nb_bytes, -}, -}; -qemu_co_queue_init(&(*m)->dependent_requests); -QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight); *host_offset = alloc_cluster_offset +
offset_into_cluster(s, guest_offset); *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset)); assert(*bytes != 0); +calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, + m, keep_old_clusters); + return 1; fail: -- 2.20.1
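A worked sketch of the copy-on-write regions that calculate_l2_meta() computes in this version, before the keep_old refinement from patch 03. Offsets are relative to the start of the first affected cluster. cluster_size is assumed to be a power of two, and the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned offset;    /* relative to the start of the first cluster */
    unsigned nb_bytes;
} CowRegionSketch;

typedef struct {
    uint64_t guest_cluster_start;  /* start_of_cluster(s, guest_offset) */
    unsigned nb_clusters;
    CowRegionSketch cow_start;     /* existing data to copy before the write */
    CowRegionSketch cow_end;       /* existing data to copy after the write */
} L2MetaSketch;

static L2MetaSketch calc_cow_regions(uint64_t cluster_size,
                                     uint64_t guest_offset, uint64_t bytes)
{
    uint64_t mask = cluster_size - 1;   /* cluster_size is a power of two */
    unsigned cow_start_from = 0;
    unsigned cow_start_to = (unsigned)(guest_offset & mask);
    unsigned cow_end_from = cow_start_to + (unsigned)bytes;
    /* ROUND_UP(cow_end_from, cluster_size) */
    unsigned cow_end_to = (unsigned)((cow_end_from + mask) & ~mask);

    L2MetaSketch m = {
        .guest_cluster_start = guest_offset & ~mask,
        /* size_to_clusters(s, cow_end_from) */
        .nb_clusters = (unsigned)((cow_end_from + mask) / cluster_size),
        .cow_start = { cow_start_from, cow_start_to - cow_start_from },
        .cow_end = { cow_end_from, cow_end_to - cow_end_from },
    };
    return m;
}
```

For example, a 4 KiB write at guest offset 0x12000 with 64 KiB clusters touches one cluster and needs 8 KiB of COW before the data (0x0 to 0x2000) and 52 KiB after it (0x3000 to 0x10000).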
Re: [PULL 0/2] Tracing patches
On Tue, 15 Oct 2019 at 16:38, Philippe Mathieu-Daudé wrote: > > On 10/15/19 2:24 PM, Peter Maydell wrote: > > On Mon, 14 Oct 2019 at 09:57, Stefan Hajnoczi wrote: > >> > >> The following changes since commit > >> 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d: > >> > >>Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' > >> into staging (2019-10-08 16:08:35 +0100) > >> > >> are available in the Git repository at: > >> > >>https://github.com/stefanha/qemu.git tags/tracing-pull-request > >> > >> for you to fetch changes up to a1f4fc951a277c49a25418cafb028ec5529707fa: > >> > >>trace: avoid "is" with a literal Python 3.8 warnings (2019-10-14 > >> 09:54:46 +0100) > >> > >> > >> Pull request > >> > >> > >> > >> Stefan Hajnoczi (2): > >>trace: add --group=all to tracing.txt > >>trace: avoid "is" with a literal Python 3.8 warnings > >> > > > > > > Applied, thanks. > > Buh, v2 missed :( Oops. I don't necessarily notice updated pullreq versions unless somebody follows up to the v1 coverletter to say the pull is out of date. thanks -- PMM
[PATCH v2 12/21] iotests: Drop IMGOPTS use in 267
Overwriting IMGOPTS means ignoring all user-supplied options, which is not what we want. Replace the current IMGOPTS use by a new BACKING_FILE variable. Signed-off-by: Max Reitz --- tests/qemu-iotests/267 | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/tests/qemu-iotests/267 b/tests/qemu-iotests/267 index d37a67c012..eda45449d4 100755 --- a/tests/qemu-iotests/267 +++ b/tests/qemu-iotests/267 @@ -68,7 +68,11 @@ size=128M run_test() { -_make_test_img $size +if [ -n "$BACKING_FILE" ]; then +_make_test_img -b "$BACKING_FILE" $size +else +_make_test_img $size +fi printf "savevm snap0\ninfo snapshots\nloadvm snap0\n" | run_qemu "$@" | _filter_date } @@ -119,12 +123,12 @@ echo TEST_IMG="$TEST_IMG.base" _make_test_img $size -IMGOPTS="backing_file=$TEST_IMG.base" \ +BACKING_FILE="$TEST_IMG.base" \ run_test -blockdev driver=file,filename="$TEST_IMG.base",node-name=backing-file \ -blockdev driver=file,filename="$TEST_IMG",node-name=file \ -blockdev driver=$IMGFMT,file=file,backing=backing-file,node-name=fmt -IMGOPTS="backing_file=$TEST_IMG.base" \ +BACKING_FILE="$TEST_IMG.base" \ run_test -blockdev driver=file,filename="$TEST_IMG.base",node-name=backing-file \ -blockdev driver=$IMGFMT,file=backing-file,node-name=backing-fmt \ -blockdev driver=file,filename="$TEST_IMG",node-name=file \ @@ -141,7 +145,7 @@ echo echo "=== -blockdev with NBD server on the backing file ===" echo -IMGOPTS="backing_file=$TEST_IMG.base" _make_test_img $size +_make_test_img -b "$TEST_IMG.base" $size cat <
[PATCH v2 19/21] iotests: Make 198 work with data_file
We do not care about the json:{} filenames here, so we can just filter them out and thus make the test work both with and without external data files. Signed-off-by: Max Reitz --- tests/qemu-iotests/198 | 6 -- tests/qemu-iotests/198.out | 4 ++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/tests/qemu-iotests/198 b/tests/qemu-iotests/198 index c8f824cfae..fb0d5a29d3 100755 --- a/tests/qemu-iotests/198 +++ b/tests/qemu-iotests/198 @@ -92,13 +92,15 @@ echo echo "== checking image base ==" $QEMU_IMG info --image-opts $IMGSPECBASE | _filter_img_info --format-specific \ | sed -e "/^disk size:/ D" -e '/refcount bits:/ D' -e '/compat:/ D' \ - -e '/lazy refcounts:/ D' -e '/corrupt:/ D' + -e '/lazy refcounts:/ D' -e '/corrupt:/ D' -e '/^\s*data file/ D' \ +| _filter_json_filename echo echo "== checking image layer ==" $QEMU_IMG info --image-opts $IMGSPECLAYER | _filter_img_info --format-specific \ | sed -e "/^disk size:/ D" -e '/refcount bits:/ D' -e '/compat:/ D' \ - -e '/lazy refcounts:/ D' -e '/corrupt:/ D' + -e '/lazy refcounts:/ D' -e '/corrupt:/ D' -e '/^\s*data file/ D' \ +| _filter_json_filename # success, all done diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out index e86b175e39..831ce3a289 100644 --- a/tests/qemu-iotests/198.out +++ b/tests/qemu-iotests/198.out @@ -32,7 +32,7 @@ read 16777216/16777216 bytes at offset 0 16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) == checking image base == -image: json:{"encrypt.key-secret": "sec0", "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT.base"}} +image: json:{ /* filtered */ } file format: IMGFMT virtual size: 16 MiB (16777216 bytes) Format specific information: @@ -74,7 +74,7 @@ Format specific information: master key iters: 1024 == checking image layer == -image: json:{"encrypt.key-secret": "sec1", "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}} +image: json:{ /* filtered */ } file format: IMGFMT virtual size: 16 
MiB (16777216 bytes) backing file: TEST_DIR/t.IMGFMT.base -- 2.21.0
[PATCH v2 17/21] iotests: Make 110 work with data_file
The only difference is that the json:{} filename of the image looks different. We actually do not care about that filename in this test, we are only interested in (1) that there is a json:{} filename, and (2) whether the backing filename can be constructed. So just filter out the json:{} data, thus making this test pass both with and without data_file. Signed-off-by: Max Reitz --- tests/qemu-iotests/110 | 7 +-- tests/qemu-iotests/110.out | 4 ++-- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110 index f78df0e6e1..139c02c2cf 100755 --- a/tests/qemu-iotests/110 +++ b/tests/qemu-iotests/110 @@ -67,6 +67,7 @@ echo # Across blkdebug without a config file, you cannot reconstruct filenames, so # qemu is incapable of knowing the directory of the top image from the filename # alone. However, using bdrv_dirname(), it should still work. +# (Filter out the json:{} filename so this test works with external data files) TEST_IMG="json:{ 'driver': '$IMGFMT', 'file': { @@ -82,7 +83,8 @@ TEST_IMG="json:{ } ] } -}" _img_info | _filter_img_info | grep -v 'backing file format' +}" _img_info | _filter_img_info | grep -v 'backing file format' \ +| _filter_json_filename echo echo '=== Backing name is always relative to the backed image ===' @@ -114,7 +116,8 @@ TEST_IMG="json:{ } ] } -}" _img_info | _filter_img_info | grep -v 'backing file format' +}" _img_info | _filter_img_info | grep -v 'backing file format' \ +| _filter_json_filename # success, all done diff --git a/tests/qemu-iotests/110.out b/tests/qemu-iotests/110.out index f60b26390e..f835553a99 100644 --- a/tests/qemu-iotests/110.out +++ b/tests/qemu-iotests/110.out @@ -11,7 +11,7 @@ backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base) === Non-reconstructable filename === -image: json:{"driver": "IMGFMT", "file": {"set-state.0.event": "read_aio", "image": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}, "driver": "blkdebug", "set-state.0.new_state": 42}} 
+image: json:{ /* filtered */ } file format: IMGFMT virtual size: 64 MiB (67108864 bytes) backing file: t.IMGFMT.base (actual path: TEST_DIR/t.IMGFMT.base) @@ -22,7 +22,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 backing_file=t.IMGFMT.b === Nodes without a common directory === -image: json:{"driver": "IMGFMT", "file": {"children": [{"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}, {"driver": "file", "filename": "TEST_DIR/t.IMGFMT.copy"}], "driver": "quorum", "vote-threshold": 1}} +image: json:{ /* filtered */ } file format: IMGFMT virtual size: 64 MiB (67108864 bytes) backing file: t.IMGFMT.base (cannot determine actual path) -- 2.21.0
[RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
Now that the implementation of subclusters is complete we can finally add the necessary options to create and read images with this feature, which we call "extended L2 entries". Signed-off-by: Alberto Garcia --- block/qcow2.c| 47 ++ block/qcow2.h| 8 ++- include/block/block_int.h| 1 + qapi/block-core.json | 2 + tests/qemu-iotests/031.out | 8 +-- tests/qemu-iotests/036.out | 4 +- tests/qemu-iotests/049.out | 102 +++ tests/qemu-iotests/060.out | 1 + tests/qemu-iotests/061.out | 20 +++--- tests/qemu-iotests/065 | 18 -- tests/qemu-iotests/082.out | 48 --- tests/qemu-iotests/085.out | 38 ++-- tests/qemu-iotests/144.out | 4 +- tests/qemu-iotests/182.out | 2 +- tests/qemu-iotests/185.out | 8 +-- tests/qemu-iotests/198.out | 2 + tests/qemu-iotests/206.out | 4 ++ tests/qemu-iotests/242.out | 5 ++ tests/qemu-iotests/255.out | 8 +-- tests/qemu-iotests/common.filter | 1 + 20 files changed, 221 insertions(+), 110 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 2eb032aed7..44d97d30b1 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1346,6 +1346,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, s->subcluster_size = s->cluster_size / s->subclusters_per_cluster; s->subcluster_bits = ctz32(s->subcluster_size); +if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) { +error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size); +ret = -EINVAL; +goto fail; +} + /* Check support for various header values */ if (header.refcount_order > 6) { error_setg(errp, "Reference count entry width too large; may not " @@ -2646,6 +2652,11 @@ int qcow2_update_header(BlockDriverState *bs) .bit = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR, .name = "lazy refcounts", }, +{ +.type = QCOW2_FEAT_TYPE_INCOMPATIBLE, +.bit = QCOW2_INCOMPAT_EXTL2_BITNR, +.name = "extended L2 entries", +}, }; ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE, @@ -3138,6 +3149,27 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) goto out; } +if 
(!qcow2_opts->has_extended_l2) { +qcow2_opts->extended_l2 = false; +} +if (qcow2_opts->extended_l2) { +unsigned min_cluster_size = +(1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER; +if (version < 3) { +error_setg(errp, "Extended L2 entries are only supported with " + "compatibility level 1.1 and above (use version=v3 or " + "greater)"); +ret = -EINVAL; +goto out; +} +if (cluster_size < min_cluster_size) { +error_setg(errp, "Extended L2 entries are only supported with " + "cluster sizes of at least %u bytes", min_cluster_size); +ret = -EINVAL; +goto out; +} +} + if (!qcow2_opts->has_refcount_bits) { qcow2_opts->refcount_bits = 16; } @@ -3232,6 +3264,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW); } +if (qcow2_opts->extended_l2) { +header->incompatible_features |= +cpu_to_be64(QCOW2_INCOMPAT_EXTL2); +} + ret = blk_pwrite(blk, 0, header, cluster_size, 0); g_free(header); if (ret < 0) { @@ -3409,6 +3446,7 @@ static int coroutine_fn qcow2_co_create_opts(const char *filename, QemuOpts *opt { BLOCK_OPT_BACKING_FMT,"backing-fmt" }, { BLOCK_OPT_CLUSTER_SIZE, "cluster-size" }, { BLOCK_OPT_LAZY_REFCOUNTS, "lazy-refcounts" }, +{ BLOCK_OPT_EXTL2, "extended-l2" }, { BLOCK_OPT_REFCOUNT_BITS, "refcount-bits" }, { BLOCK_OPT_ENCRYPT,BLOCK_OPT_ENCRYPT_FORMAT }, { BLOCK_OPT_COMPAT_LEVEL, "version" }, @@ -4612,6 +4650,9 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs, .corrupt= s->incompatible_features & QCOW2_INCOMPAT_CORRUPT, .has_corrupt= true, +.has_extended_l2= true, +.extended_l2= s->incompatible_features & + QCOW2_INCOMPAT_EXTL2, .refcount_bits = s->refcount_bits, .has_bitmaps= !!bitmaps, .bitmaps= bitmaps, @@ -5205,6 +5246,12 @@ static QemuOptsList qcow2_create_opts = { .help = "Postpone refcount updates", .def_value_str = "off"
[RFC PATCH 20/23] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
The L2 bitmap needs to be updated after each write to indicate what new subclusters are now allocated. This needs to happen even if the cluster was already allocated and the L2 entry was otherwise valid. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 16 1 file changed, 16 insertions(+) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 75579c1470..9a4bf672b3 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -980,6 +980,22 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m) set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED | (cluster_offset + (i << s->cluster_bits))); + +/* Update bitmap with the subclusters that were just written */ +if (has_subclusters(s)) { +uint64_t written_from = m->cow_start.offset; +uint64_t written_to = m->cow_end.offset + m->cow_end.nb_bytes; +uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i); +int sc; +for (sc = 0; sc < s->subclusters_per_cluster; sc++) { +uint64_t sc_off = i * s->cluster_size + sc * s->subcluster_size; +if (sc_off >= written_from && sc_off < written_to) { +l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc); +l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc); +} +} +set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap); +} } -- 2.20.1
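The bitmap update loop in this patch can be exercised in isolation. The sketch below assumes the bit layout described in the RFC documentation patch (the allocation bit for subcluster sc is bit 31 - sc, and its "reads as zeros" bit sits 32 positions higher). SKETCH_SUB_ALLOC, SKETCH_SUB_ZERO and update_l2_bitmap() are hypothetical stand-ins for QCOW_OFLAG_SUB_ALLOC, QCOW_OFLAG_SUB_ZERO and the inline loop. As in the patch, written_from and written_to are byte offsets relative to the first cluster of the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical macros): allocation bit for subcluster sc is
 * bit 31 - sc; its "reads as zeros" bit is the same bit shifted up by 32. */
#define SKETCH_SUB_ALLOC(sc) (1ULL << (31 - (sc)))
#define SKETCH_SUB_ZERO(sc)  (SKETCH_SUB_ALLOC(sc) << 32)

/* For cluster number 'cluster_idx' of an allocation, mark every subcluster
 * whose start offset lies in [written_from, written_to) as allocated and
 * clear its "reads as zeros" bit. Other bits are left untouched. */
static uint64_t update_l2_bitmap(uint64_t l2_bitmap, unsigned cluster_idx,
                                 uint64_t cluster_size, unsigned nb_subclusters,
                                 uint64_t written_from, uint64_t written_to)
{
    assert(nb_subclusters > 0 && nb_subclusters <= 32);
    uint64_t subcluster_size = cluster_size / nb_subclusters;

    for (unsigned sc = 0; sc < nb_subclusters; sc++) {
        uint64_t sc_off = cluster_idx * cluster_size + sc * subcluster_size;
        if (sc_off >= written_from && sc_off < written_to) {
            l2_bitmap |= SKETCH_SUB_ALLOC(sc);
            l2_bitmap &= ~SKETCH_SUB_ZERO(sc);
        }
    }
    return l2_bitmap;
}
```

For example, with 64 KiB clusters (2 KiB subclusters), a write covering bytes [0, 4096) of the first cluster sets only the allocation bits for subclusters 0 and 1.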
[PATCH v2 07/21] iotests: Let _make_test_img parse its parameters
This will allow us to add more options than just -b. Signed-off-by: Max Reitz Reviewed-by: Maxim Levitsky --- tests/qemu-iotests/common.rc | 28 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc index 12b4751848..3e7adc4834 100644 --- a/tests/qemu-iotests/common.rc +++ b/tests/qemu-iotests/common.rc @@ -282,12 +282,12 @@ _make_test_img() # extra qemu-img options can be added by tests # at least one argument (the image size) needs to be added local extra_img_options="" -local image_size=$* local optstr="" local img_name="" local use_backing=0 local backing_file="" local object_options="" +local misc_params=() if [ -n "$TEST_IMG_FILE" ]; then img_name=$TEST_IMG_FILE @@ -303,11 +303,23 @@ _make_test_img() optstr=$(_optstr_add "$optstr" "key-secret=keysec0") fi -if [ "$1" = "-b" ]; then -use_backing=1 -backing_file=$2 -image_size=$3 -fi +for param; do +if [ "$use_backing" = "1" -a -z "$backing_file" ]; then +backing_file=$param +continue +fi + +case "$param" in +-b) +use_backing=1 +;; + +*) +misc_params=("${misc_params[@]}" "$param") +;; +esac +done + if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE") fi @@ -323,9 +335,9 @@ _make_test_img() # XXX(hch): have global image options? ( if [ $use_backing = 1 ]; then -$QEMU_IMG create $object_options -f $IMGFMT $extra_img_options -b "$backing_file" "$img_name" $image_size 2>&1 +$QEMU_IMG create $object_options -f $IMGFMT $extra_img_options -b "$backing_file" "$img_name" "${misc_params[@]}" 2>&1 else -$QEMU_IMG create $object_options -f $IMGFMT $extra_img_options "$img_name" $image_size 2>&1 +$QEMU_IMG create $object_options -f $IMGFMT $extra_img_options "$img_name" "${misc_params[@]}" 2>&1 fi ) | _filter_img_create -- 2.21.0
[RFC PATCH 06/23] qcow2: Add dummy has_subclusters() function
This function will be used by the qcow2 code to check if an image has subclusters or not. At the moment this simply returns false. Once all patches needed for subcluster support are ready then QEMU will be able to create and read images with subclusters and this function will return the actual value. Signed-off-by: Alberto Garcia --- block/qcow2.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/block/qcow2.h b/block/qcow2.h index 0b68c55c01..6d6fc57f41 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -485,6 +485,12 @@ typedef enum QCow2MetadataOverlap { #define INV_OFFSET (-1ULL) +static inline bool has_subclusters(BDRVQcow2State *s) +{ +/* FIXME: Return false until this feature is complete */ +return false; +} + static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice, int idx) { -- 2.20.1
[RFC PATCH 12/23] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
In the previous patch we added a new QCow2ClusterType named QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There are a couple of places where this new value needs to be handled, and that is what this patch does. Signed-off-by: Alberto Garcia --- block/qcow2.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 131711d6fa..c222cd261d 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1922,8 +1922,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs, *pnum = bytes; -if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) && -!s->crypto) { +if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC || + ret == QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) && !s->crypto) { index_in_cluster = offset & (s->cluster_size - 1); *map = cluster_offset | index_in_cluster; *file = s->data_file->bs; @@ -1931,7 +1931,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs, } if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) { status |= BDRV_BLOCK_ZERO; -} else if (ret != QCOW2_CLUSTER_UNALLOCATED) { +} else if (ret != QCOW2_CLUSTER_UNALLOCATED && + ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) { status |= BDRV_BLOCK_DATA; } if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) && @@ -2009,6 +2010,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs, switch (ret) { case QCOW2_CLUSTER_UNALLOCATED: +case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER: if (bs->backing) { BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO); @@ -3542,6 +3544,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs, nr = s->cluster_size; ret = qcow2_get_cluster_offset(bs, offset, , ); if (ret != QCOW2_CLUSTER_UNALLOCATED && +ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER && ret != QCOW2_CLUSTER_ZERO_PLAIN && ret != QCOW2_CLUSTER_ZERO_ALLOC) { qemu_co_mutex_unlock(&s->lock); @@ -3612,6 +3615,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs, switch (ret) { case 
QCOW2_CLUSTER_UNALLOCATED: +case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER: if (bs->backing && bs->backing->bs) { int64_t backing_length = bdrv_getlength(bs->backing->bs); if (src_offset >= backing_length) { -- 2.20.1
[RFC PATCH 05/23] qcow2: Document the Extended L2 Entries feature
Subcluster allocation in qcow2 is implemented by extending the existing L2 table entries and adding additional information to indicate the allocation status of each subcluster. This patch documents the changes to the qcow2 format and how they affect the calculation of the L2 cache size. Signed-off-by: Alberto Garcia --- docs/interop/qcow2.txt | 68 -- docs/qcow2-cache.txt | 19 +++- 2 files changed, 83 insertions(+), 4 deletions(-) diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt index af5711e533..d34261f955 100644 --- a/docs/interop/qcow2.txt +++ b/docs/interop/qcow2.txt @@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header: as the maximum cluster size and won't be able to open images with larger cluster sizes. +Note: if the image has Extended L2 Entries then cluster_bits +must be at least 14 (i.e. 16384 byte clusters). + 24 - 31: size Virtual disk size in bytes. @@ -109,7 +112,12 @@ in the description of a field. An External Data File Name header extension may be present if this bit is set. -Bits 3-63: Reserved (set to 0) +Bit 3: Extended L2 Entries. If this bit is set then +L2 table entries use an extended format that +allows subcluster-based allocation. See the +Extended L2 Entries section for more details. + +Bits 4-63: Reserved (set to 0) 80 - 87: compatible_features Bitmask of compatible features. An implementation can @@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change). 
Given an offset into the virtual disk, the offset into the image file can be obtained as follows: -l2_entries = (cluster_size / sizeof(uint64_t)) +l2_entries = (cluster_size / sizeof(uint64_t))[*] l2_index = (offset / cluster_size) % l2_entries l1_index = (offset / cluster_size) / l2_entries @@ -447,6 +455,8 @@ obtained as follows: return cluster_offset + (offset % cluster_size) +[*] this changes if Extended L2 Entries are enabled, see next section + L1 table entry: Bit 0 - 8:Reserved (set to 0) @@ -487,7 +497,8 @@ Standard Cluster Descriptor: nor is data read from the backing file if the cluster is unallocated. -With version 2, this is always 0. +With version 2 or with extended L2 entries (see the next +section), this is always 0. 1 - 8:Reserved (set to 0) @@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is no backing file or the backing file is smaller than the image, they shall read zeros for all parts that are not covered by the backing file. +== Extended L2 Entries == + +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features +field of the header. + +In these images standard data clusters are divided into 32 subclusters of the +same size. They are contiguous and start from the beginning of the cluster. +Subclusters can be allocated independently and the L2 entry contains information +indicating the status of each one of them. Compressed data clusters don't have +subclusters so they are treated like in images without this feature. + +The size of an extended L2 entry is 128 bits so the number of entries per table +is calculated using this formula: + +l2_entries = (cluster_size / (2 * sizeof(uint64_t))) + +The first 64 bits have the same format as the standard L2 table entry described +in the previous section, with the exception of bit 0 of the standard cluster +descriptor. 
+ +The last 64 bits contain a subcluster allocation bitmap with this format: + +Subcluster Allocation Bitmap (for standard clusters): + +Bit 0 - 31: Allocation status (one bit per subcluster) + +1: the subcluster is allocated. In this case the + host cluster offset field must contain a valid + offset. +0: the subcluster is not allocated. In this case + read requests shall go to the backing file or + return zeros if there is no backing file data. + +Bits are assigned starting from the most significant one. +(i.e. bit x is used for subcluster 31 - x) + +32 - 63: Subcluster reads as zeros (one bit per subcluster) + +1: the subcluster reads as zeros. In this case the + allocation status bit must be unset. The host + cluster offset field may or may not be set. +
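The table-size formula and the bitmap layout documented here can be checked with a few helpers. This is a hedged sketch based only on the text of this RFC: the function names are hypothetical, and the bit numbering follows the "bit x is used for subcluster 31 - x" rule. It also assumes that the "reads as zeros" half uses the same ordering, so subcluster sc maps to bit 63 - sc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entries per L2 table: standard entries are 64 bits wide, extended
 * entries are 128 bits (L2 entry + subcluster bitmap). */
static uint64_t l2_entries_per_table(uint64_t cluster_size, bool extended_l2)
{
    uint64_t entry_size = (extended_l2 ? 2 : 1) * sizeof(uint64_t);
    return cluster_size / entry_size;
}

/* Allocation status of subcluster sc: bit 31 - sc (assumed layout). */
static bool sc_allocated(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (bitmap >> (31 - sc)) & 1;
}

/* "Reads as zeros" flag of subcluster sc: bit 63 - sc (assumed layout). */
static bool sc_reads_zero(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (bitmap >> (63 - sc)) & 1;
}

/* The allocation bit must be unset whenever the "reads as zeros" bit is
 * set, so the two 32-bit halves must not share any set bit. */
static bool l2_bitmap_valid(uint64_t bitmap)
{
    uint32_t alloc = (uint32_t)bitmap;
    uint32_t zero = (uint32_t)(bitmap >> 32);
    return (alloc & zero) == 0;
}
```

With 64 KiB clusters, an L2 table holds 8192 standard entries but only 4096 extended ones. The same amount of L2 cache therefore maps half as much guest data.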
[RFC PATCH 22/23] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
Ideally it should be possible to zero individual subclusters using this function, but this is currently not implemented. Signed-off-by: Alberto Garcia --- block/qcow2.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index c54278ab0b..2eb032aed7 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3544,6 +3544,12 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs, bytes = s->cluster_size; nr = s->cluster_size; ret = qcow2_get_cluster_offset(bs, offset, , ); +/* TODO: allow zeroing separate subclusters, we only allow + * zeroing full clusters at the moment. */ +if (nr != bytes) { +qemu_co_mutex_unlock(&s->lock); +return -ENOTSUP; +} if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER && ret != QCOW2_CLUSTER_ZERO_PLAIN && -- 2.20.1
[RFC PATCH 19/23] qcow2: Fix offset calculation in handle_dependencies()
l2meta_cow_start() and l2meta_cow_end() are not necessarily cluster-aligned if the image has subclusters, so update the calculation of old_start and old_end to guarantee that no two requests try to write on the same cluster. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index dc72f0e595..75579c1470 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1262,8 +1262,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset, uint64_t start = guest_offset; uint64_t end = start + bytes; -uint64_t old_start = l2meta_cow_start(old_alloc); -uint64_t old_end = l2meta_cow_end(old_alloc); +uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc)); +uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size); if (end <= old_start || start >= old_end) { /* No intersection */ -- 2.20.1
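The effect of the rounding can be shown with a standalone overlap test. This is a hedged sketch: overlaps_in_flight() and the rounding helpers are hypothetical, and cluster_size is assumed to be a power of two, as in qcow2. As in the patch, only the in-flight COW region is widened to cluster boundaries, while the new request is used as is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round down/up to a power-of-two cluster size. */
static uint64_t round_down_cluster(uint64_t x, uint64_t cluster_size)
{
    return x & ~(cluster_size - 1);
}

static uint64_t round_up_cluster(uint64_t x, uint64_t cluster_size)
{
    return (x + cluster_size - 1) & ~(cluster_size - 1);
}

/* Does the new request [start, start + bytes) touch any cluster touched by
 * an in-flight allocation whose COW region is [cow_start, cow_end)? */
static bool overlaps_in_flight(uint64_t start, uint64_t bytes,
                               uint64_t cow_start, uint64_t cow_end,
                               uint64_t cluster_size)
{
    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    uint64_t end = start + bytes;
    uint64_t old_start = round_down_cluster(cow_start, cluster_size);
    uint64_t old_end = round_up_cluster(cow_end, cluster_size);

    return !(end <= old_start || start >= old_end);
}
```

With 64 KiB clusters, an in-flight COW region [2048, 4096) and a new request [8192, 10240) do not intersect byte-wise. After rounding, however, both fall into cluster 0 and are correctly treated as dependent.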
[RFC PATCH 15/23] qcow2: Add subcluster support to zero_in_l2_slice()
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an image has subclusters. Instead, the individual 'all zeroes' bits must be used. Signed-off-by: Alberto Garcia --- block/qcow2-cluster.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 71d4cc518a..c554b1a88c 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1849,7 +1849,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset, assert(nb_clusters <= INT_MAX); for (i = 0; i < nb_clusters; i++) { -uint64_t old_offset; +uint64_t old_offset, l2_entry = 0; QCow2ClusterType cluster_type; old_offset = get_l2_entry(s, l2_slice, l2_index + i); @@ -1866,12 +1866,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset, qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) { -set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); } else { -uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i); -set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO); +l2_entry = get_l2_entry(s, l2_slice, l2_index + i); } + +if (has_subclusters(s)) { +set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES); +} else { +l2_entry |= QCOW_OFLAG_ZERO; +} + +set_l2_entry(s, l2_slice, l2_index + i, l2_entry); } qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice); -- 2.20.1
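The per-cluster decision made by zero_in_l2_slice() after this patch can be modelled as a pure function. This is a hedged sketch: zero_cluster() and struct l2_pair are hypothetical. SKETCH_L2_BITMAP_ALL_ZEROES is an assumed value derived from the RFC layout (all "reads as zeros" bits set, all allocation bits clear). SKETCH_OFLAG_ZERO corresponds to bit 0 of a standard L2 entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OFLAG_ZERO           (1ULL << 0)
/* Assumed: bits 32-63 ("reads as zeros") set, bits 0-31 (allocated) clear. */
#define SKETCH_L2_BITMAP_ALL_ZEROES 0xFFFFFFFF00000000ULL

struct l2_pair {
    uint64_t entry;   /* first 64 bits of the L2 entry */
    uint64_t bitmap;  /* subcluster bitmap (extended entries only) */
};

/* New L2 entry/bitmap when zeroing one cluster. 'keep_mapping' is false
 * when the old cluster is freed (compressed cluster or unmap requested),
 * in which case the entry starts from 0, as in the patch. */
static struct l2_pair zero_cluster(struct l2_pair old, bool has_subclusters,
                                   bool keep_mapping)
{
    struct l2_pair res = {
        .entry = keep_mapping ? old.entry : 0,
        .bitmap = old.bitmap,
    };

    if (has_subclusters) {
        /* QCOW_OFLAG_ZERO is forbidden: mark every subcluster instead. */
        res.bitmap = SKETCH_L2_BITMAP_ALL_ZEROES;
    } else {
        res.entry |= SKETCH_OFLAG_ZERO;
    }
    return res;
}
```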
Re: [PATCH v2 1/2] nbd: Don't send oversize strings
15.10.2019 18:07, Eric Blake wrote: > On 10/11/19 2:32 AM, Vladimir Sementsov-Ogievskiy wrote: >> 11.10.2019 0:00, Eric Blake wrote: >>> Qemu as server currently won't accept export names larger than 256 >>> bytes, nor create dirty bitmap names longer than 1023 bytes, so most >>> uses of qemu as client or server have no reason to get anywhere near >>> the NBD spec maximum of a 4k limit per string. >>> >>> However, we weren't actually enforcing things, ignoring when the >>> remote side violates the protocol on input, and also having several >>> code paths where we send oversize strings on output (for example, >>> qemu-nbd --description could easily send more than 4k). Tighten >>> things up as follows: >>> >>> client: >>> - Perform bounds check on export name and dirty bitmap request prior >>> to handing it to server >>> - Validate that copied server replies are not too long (ignoring >>> NBD_INFO_* replies that are not copied is not too bad) >>> server: >>> - Perform bounds check on export name and description prior to >>> advertising it to client >>> - Reject client name or metadata query that is too long >>> >>> Signed-off-by: Eric Blake >>> --- > >>> +++ b/include/block/nbd.h >>> @@ -232,6 +232,7 @@ enum { >>> * going larger would require an audit of more code to make sure we >>> * aren't overflowing some other buffer. */ >> >> This comment says, that we restrict export name to 256... > > Yes, because we still stack-allocate the name in places, but 4k is too large > for stack allocation. But we're inconsistent on where we use the smaller > 256-limit; the server won't serve an image that large, but doesn't prevent a > client from requesting a 4k name export (even though that export will not be > present). 
> > >>> +++ b/blockdev-nbd.c >>> @@ -162,6 +162,11 @@ void qmp_nbd_server_add(const char *device, bool >>> has_name, const char *name, >>> name = device; >>> } >>> >>> + if (strlen(name) > NBD_MAX_STRING_SIZE) { >>> + error_setg(errp, "export name '%s' too long", name); >>> + return; >>> + } >> >> Hmmm, no, so here we restrict to 4096, but, we will not allow client to request more than >> 256. Seems, to correctly update server-part, we should drop NBD_MAX_NAME_SIZE and do the >> audit mentioned in the comment above its definition. > > Yeah, I guess it's time to just get rid of NBD_MAX_NAME_SIZE, and move away > from stack allocations. Should I do that as a followup to this patch, or > spin a v3? Hmm. A followup is OK too. With: - the mem-leak in nbd_process_options fixed - s/x_dirty_bitmap/x-dirty-bitmap/ in the nbd_process_options error message - your new wordings applied: Reviewed-by: Vladimir Sementsov-Ogievskiy However, this patch introduces a possible crash point (the assertion on the bitmap name below), so it would be better to fix that before this patch or immediately after it. Still, a bitmap with a name longer than 4k is unlikely. > >>> +++ b/nbd/client.c >>> @@ -289,8 +289,8 @@ static int nbd_receive_list(QIOChannel *ioc, char >>> **name, char **description, >>> return -1; >>> } >>> len -= sizeof(namelen); >>> - if (len < namelen) { >>> - error_setg(errp, "incorrect option name length"); >>> + if (len < namelen || namelen > NBD_MAX_STRING_SIZE) { >>> + error_setg(errp, "incorrect list name length"); >> >> New wording made me go above and read the comment, what functions does. >> Comment is good, but without >> it, it sounds like name of the list for me...
> > Maybe: > > incorrect name length in server's list response Yes, this is better, thanks > >> >>> nbd_send_opt_abort(ioc); >>> return -1; >>> } >>> @@ -303,6 +303,11 @@ static int nbd_receive_list(QIOChannel *ioc, char >>> **name, char **description, >>> local_name[namelen] = '\0'; >>> len -= namelen; >>> if (len) { >>> + if (len > NBD_MAX_STRING_SIZE) { >>> + error_setg(errp, "incorrect list description length"); > > and > > incorrect description length in server's list response > > >>> @@ -648,6 +657,7 @@ static int nbd_send_meta_query(QIOChannel *ioc, >>> uint32_t opt, >>> if (query) { >>> query_len = strlen(query); >>> data_len += sizeof(query_len) + query_len; >>> + assert(query_len <= NBD_MAX_STRING_SIZE); >>> } else { >>> assert(opt == NBD_OPT_LIST_META_CONTEXT); >>> } >> >> you may assert export_len as well.. > > It was asserted earlier, but doing it again might not hurt, especially if I > do the followup patch getting rid of NBD_MAX_NAME_SIZE > > >>> @@ -1561,6 +1569,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, >>> uint64_t dev_offset, >>> exp->export_bitmap = bm; >>> exp->export_bitmap_context = >>> g_strdup_printf("qemu:dirty-bitmap:%s", >>> bitmap); >>> + /* See
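The checks discussed in this thread reduce to two kinds of length comparison. On input, a peer-supplied length must fit both the remaining payload and the protocol limit. On output, QEMU must refuse to send a string the peer would have to reject. This is a hedged sketch: SKETCH_NBD_MAX_STRING_SIZE stands in for NBD_MAX_STRING_SIZE (4096, the NBD spec limit mentioned above), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* NBD spec limit per string, 4 KiB (stand-in for NBD_MAX_STRING_SIZE). */
#define SKETCH_NBD_MAX_STRING_SIZE 4096

/* Input side: a name length read from the server's list reply, where 'len'
 * bytes of option payload remain, must fit the payload and the limit
 * (the condition added to nbd_receive_list()). */
static bool nbd_name_len_ok(uint32_t len, uint32_t namelen)
{
    return namelen <= len && namelen <= SKETCH_NBD_MAX_STRING_SIZE;
}

/* Output side: only send strings the peer is allowed to accept. */
static bool nbd_string_sendable(const char *s)
{
    return strlen(s) <= SKETCH_NBD_MAX_STRING_SIZE;
}
```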
Re: [PULL 00/15] Block layer patches
On Mon, 14 Oct 2019 at 17:03, Kevin Wolf wrote: > > The following changes since commit 22dbfdecc3c52228d3489da3fe81da92b21197bf: > > Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20191010.0' > into staging (2019-10-14 15:09:08 +0100) > > are available in the Git repository at: > > git://repo.or.cz/qemu/kevin.git tags/for-upstream > > for you to fetch changes up to a1406a9262a087d9ec9627b88da13c4590b61dae: > > iotests: Test large write request to qcow2 file (2019-10-14 17:12:48 +0200) > > > Block layer patches: > > - block: Fix crash with qcow2 partial cluster COW with small cluster > sizes (misaligned write requests with BDRV_REQ_NO_FALLBACK) > - qcow2: Fix integer overflow potentially causing corruption with huge > requests > - vhdx: Detect truncated image files > - tools: Support help options for --object > - Various block-related replay improvements > - iotests/028: Fix for long $TEST_DIRs Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2 for any user-visible changes. -- PMM
Re: [PATCH v2 00/20] nvme: support NVMe v1.3d, SGLs and multiple namespaces
Patchew URL: https://patchew.org/QEMU/20191015103900.313928-1-...@irrelevant.dk/ Hi, This series failed the docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #! /bin/bash export ARCH=x86_64 make docker-image-fedora V=1 NETWORK=1 time make docker-test-mingw@fedora J=14 NETWORK=1 === TEST SCRIPT END === CC hw/misc/imx7_gpr.o CC hw/misc/mst_fpga.o /tmp/qemu-test/src/hw/block/nvme.c: In function 'nvme_map_prp': /tmp/qemu-test/src/hw/block/nvme.c:232:42: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) prp2); ^ /tmp/qemu-test/src/hw/block/nvme.c:258:50: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) prp_ent); ^ /tmp/qemu-test/src/hw/block/nvme.c: In function 'nvme_map_sgl': /tmp/qemu-test/src/hw/block/nvme.c:414:42: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) addr); ^ /tmp/qemu-test/src/hw/block/nvme.c:429:38: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) addr); ^ /tmp/qemu-test/src/hw/block/nvme.c:478:38: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) addr); ^ /tmp/qemu-test/src/hw/block/nvme.c:493:34: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) addr); ^ /tmp/qemu-test/src/hw/block/nvme.c: In function 'nvme_post_cqes': /tmp/qemu-test/src/hw/block/nvme.c:847:39: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_write((void *) addr); ^ /tmp/qemu-test/src/hw/block/nvme.c: In function 'nvme_process_sq': /tmp/qemu-test/src/hw/block/nvme.c:1971:38: error: cast to 
pointer from integer of different size [-Werror=int-to-pointer-cast] trace_nvme_err_addr_read((void *) addr); ^ cc1: all warnings being treated as errors make: *** [/tmp/qemu-test/src/rules.mak:69: hw/block/nvme.o] Error 1 make: *** Waiting for unfinished jobs Traceback (most recent call last): File "./tests/docker/docker.py", line 662, in --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=8aa0a85fff1f457c9dc7c826d7b3189d', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-2g1bl41s/src/docker-src.2019-10-15-13.13.48.993:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=8aa0a85fff1f457c9dc7c826d7b3189d make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-2g1bl41s/src' make: *** [docker-run-test-mingw@fedora] Error 2 real5m56.522s user0m7.913s The full log is available at http://patchew.org/logs/20191015103900.313928-1-...@irrelevant.dk/testing.docker-mingw@fedora/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH 2/2] core: replace getpagesize() with qemu_real_host_page_size
On Tue, Oct 15, 2019 at 02:45:15PM +0300, Yuval Shaia wrote: >On Sun, Oct 13, 2019 at 10:11:45AM +0800, Wei Yang wrote: >> There are three page sizes in qemu: >> >> real host page size >> host page size >> target page size >> >> Each of them has a dedicated variable to represent it. For the last two, we >> use the same form in the whole qemu project, while for the first one we >> use two forms: qemu_real_host_page_size and getpagesize(). >> >> qemu_real_host_page_size is defined to be a replacement of >> getpagesize(), so let it serve the role. >> >> [Note] Not fully tested for some arch or device. >> >> Signed-off-by: Wei Yang >> --- >> accel/kvm/kvm-all.c| 6 +++--- >> backends/hostmem.c | 2 +- >> block.c| 4 ++-- >> block/file-posix.c | 9 + >> block/io.c | 2 +- >> block/parallels.c | 2 +- >> block/qcow2-cache.c| 2 +- >> contrib/vhost-user-gpu/vugbm.c | 2 +- >> exec.c | 6 +++--- >> hw/intc/s390_flic_kvm.c| 2 +- >> hw/ppc/mac_newworld.c | 2 +- >> hw/ppc/spapr_pci.c | 2 +- >> hw/rdma/vmw/pvrdma_main.c | 2 +- > >for pvrdma stuff: > >Reviewed-by: Yuval Shaia >Tested-by: Yuval Shaia Thanks > >> hw/vfio/spapr.c| 7 --- >> include/exec/ram_addr.h| 2 +- >> include/qemu/osdep.h | 4 ++-- >> migration/migration.c | 2 +- >> migration/postcopy-ram.c | 4 ++-- >> monitor/misc.c | 2 +- >> target/ppc/kvm.c | 2 +- >> tests/vhost-user-bridge.c | 8 >> util/mmap-alloc.c | 10 +- >> util/oslib-posix.c | 4 ++-- >> util/oslib-win32.c | 2 +- >> util/vfio-helpers.c| 12 ++-- >> 25 files changed, 52 insertions(+), 50 deletions(-) >> >> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c >> index d2d96d73e8..140b0bd8f6 100644 >> --- a/accel/kvm/kvm-all.c >> +++ b/accel/kvm/kvm-all.c >> @@ -52,7 +52,7 @@ >> /* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We >> * need to use the real host PAGE_SIZE, as that's what KVM will use.
>>   */
>> -#define PAGE_SIZE getpagesize()
>> +#define PAGE_SIZE qemu_real_host_page_size
>>
>>  //#define DEBUG_KVM
>>
>> @@ -507,7 +507,7 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>>  {
>>      ram_addr_t start = section->offset_within_region +
>>                         memory_region_get_ram_addr(section->mr);
>> -    ram_addr_t pages = int128_get64(section->size) / getpagesize();
>> +    ram_addr_t pages = int128_get64(section->size) / qemu_real_host_page_size;
>>
>>      cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
>>      return 0;
>> @@ -1841,7 +1841,7 @@ static int kvm_init(MachineState *ms)
>>       * even with KVM. TARGET_PAGE_SIZE is assumed to be the minimum
>>       * page size for the system though.
>>       */
>> -    assert(TARGET_PAGE_SIZE <= getpagesize());
>> +    assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
>>
>>      s->sigmask_len = 8;
>>
>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>> index 6d333dc23c..e773bdfa6e 100644
>> --- a/backends/hostmem.c
>> +++ b/backends/hostmem.c
>> @@ -304,7 +304,7 @@ size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
>>  #else
>>  size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
>>  {
>> -    return getpagesize();
>> +    return qemu_real_host_page_size;
>>  }
>>  #endif
>>
>> diff --git a/block.c b/block.c
>> index 5944124845..98f47e2902 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -106,7 +106,7 @@ size_t bdrv_opt_mem_align(BlockDriverState *bs)
>>  {
>>      if (!bs || !bs->drv) {
>>          /* page size or 4k (hdd sector size) should be on the safe side */
>> -        return MAX(4096, getpagesize());
>> +        return MAX(4096, qemu_real_host_page_size);
>>      }
>>
>>      return bs->bl.opt_mem_alignment;
>> @@ -116,7 +116,7 @@ size_t bdrv_min_mem_align(BlockDriverState *bs)
>>  {
>>      if (!bs || !bs->drv) {
>>          /* page size or 4k (hdd sector size) should be on the safe side */
>> -        return MAX(4096, getpagesize());
>> +        return MAX(4096, qemu_real_host_page_size);
>>      }
>>
>>      return bs->bl.min_mem_alignment;
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index f12c06de2d..f60ac3f93f 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -322,7 +322,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>  {
>>      BDRVRawState *s = bs->opaque;
>>      char *buf;
>> -    size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
>> +    size_t max_align = MAX(MAX_BLOCKSIZE, qemu_real_host_page_size);
>>      size_t alignments[] = {1, 512, 1024, 2048, 4096};
>>
>>      /* For SCSI generic devices the alignment is not really used.
>> @@
Re: [PATCH v3 0/5] qcow2: advanced compression options
Patchew URL: https://patchew.org/QEMU/1571163625-642312-1-git-send-email-andrey.shinkev...@virtuozzo.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      block/blklogwrites.o
  CC      block/block-backend.o
/tmp/qemu-test/src/block/qcow2.c: In function 'qcow2_co_pwritev_compressed_part':
/tmp/qemu-test/src/block/qcow2.c:4244:9: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     int ret;
         ^~~
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: block/qcow2.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in
---
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=4299392cefd911e9addb68b59973b7d0', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-r2c14at8/src/docker-src.2019-10-16-01.53.08.3890:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-r2c14at8/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m50.343s
user    0m8.261s


The full log is available at
http://patchew.org/logs/1571163625-642312-1-git-send-email-andrey.shinkev...@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
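The qcow2 code that triggered this warning is not quoted in the report, so the following is only a generic sketch of the pattern GCC's -Wmaybe-uninitialized tends to flag: a return value that is assigned only inside a loop that may run zero times. `write_chunk()` and `write_all()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-chunk write: 0 on success, -EINVAL for an empty chunk. */
static int write_chunk(size_t len)
{
    return len > 0 ? 0 : -EINVAL;
}

/*
 * Without the "= 0" below, ret would be returned uninitialized whenever
 * n == 0, because the loop body (its only assignment) never runs. That is
 * the kind of path -Wmaybe-uninitialized warns about.
 */
static int write_all(const size_t *chunks, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        ret = write_chunk(chunks[i]);
        if (ret < 0) {
            break;
        }
    }
    return ret;
}
```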
Re: [PATCH v2 00/21] iotests: Allow ./check -o data_file
Patchew URL: https://patchew.org/QEMU/20191015142729.18123-1-mre...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v2 00/21] iotests: Allow ./check -o data_file
Type: series
Message-id: 20191015142729.18123-1-mre...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
7e75916 iotests: Allow check -o data_file
a21918d iotests: Disable data_file where it cannot be used
1eb7209 iotests: Make 198 work with data_file
02453ff iotests: Make 137 work with data_file
cdb651c iotests: Make 110 work with data_file
1b30e90 iotests: Make 091 work with data_file
26ebffa iotests: Avoid cp/mv of test images
5d6ba79 iotests: Use _rm_test_img for deleting test images
4c20fa0 iotests: Avoid qemu-img create
944555b iotests: Drop IMGOPTS use in 267
9037b83 iotests: Replace IMGOPTS='' by --no-opts
e62282b iotests: Replace IMGOPTS= by -o
26d39b5 iotests: Inject space into -ocompat=0.10 in 051
99d129e iotests: Add -o and --no-opts to _make_test_img
301f2c3 iotests: Let _make_test_img parse its parameters
53a8dea iotests: Drop compat=1.1 in 050
85b18f8 iotests: Replace IMGOPTS by _unsupported_imgopts
476fb23 iotests: Filter refcount_order in 036
67b9119 iotests: Add _filter_json_filename
fbf9402 iotests/qcow2.py: Split feature fields into bits
afe3486 iotests/qcow2.py: Add dump-header-exts

=== OUTPUT BEGIN ===
1/21 Checking commit afe348661672 (iotests/qcow2.py: Add dump-header-exts)
ERROR: line over 90 characters
#32: FILE: tests/qemu-iotests/qcow2.py:237:
+[ 'dump-header-exts', cmd_dump_header_exts, 0, 'Dump image header extensions' ],

total: 1 errors, 0 warnings, 17 lines checked

Patch 1/21 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/21 Checking commit fbf940255d05 (iotests/qcow2.py: Split feature fields into bits)
3/21 Checking commit 67b9119032ad (iotests: Add _filter_json_filename)
4/21 Checking commit 476fb233c777 (iotests: Filter refcount_order in 036)
5/21 Checking commit 85b18f83a826 (iotests: Replace IMGOPTS by _unsupported_imgopts)
6/21 Checking commit 53a8dea8fb7b (iotests: Drop compat=1.1 in 050)
7/21 Checking commit 301f2c32204c (iotests: Let _make_test_img parse its parameters)
8/21 Checking commit 99d129e91dbe (iotests: Add -o and --no-opts to _make_test_img)
9/21 Checking commit 26d39b59dfe1 (iotests: Inject space into -ocompat=0.10 in 051)
10/21 Checking commit e62282b2ad38 (iotests: Replace IMGOPTS= by -o)
11/21 Checking commit 9037b83425c4 (iotests: Replace IMGOPTS='' by --no-opts)
12/21 Checking commit 944555b5c283 (iotests: Drop IMGOPTS use in 267)
13/21 Checking commit 4c20fa09b6c5 (iotests: Avoid qemu-img create)
14/21 Checking commit 5d6ba791204b (iotests: Use _rm_test_img for deleting test images)
15/21 Checking commit 26ebffafbd87 (iotests: Avoid cp/mv of test images)
16/21 Checking commit 1b30e9035908 (iotests: Make 091 work with data_file)
17/21 Checking commit cdb651c3c22b (iotests: Make 110 work with data_file)
18/21 Checking commit 02453ff71311 (iotests: Make 137 work with data_file)
19/21 Checking commit 1eb720910a65 (iotests: Make 198 work with data_file)
20/21 Checking commit a21918dcdf92 (iotests: Disable data_file where it cannot be used)
21/21 Checking commit 7e7591696382 (iotests: Allow check -o data_file)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20191015142729.18123-1-mre...@redhat.com/testing.checkpatch/?type=message.
Re: [PATCH 2/2] core: replace getpagesize() with qemu_real_host_page_size
On Sun, Oct 13, 2019 at 08:28:41PM +1100, David Gibson wrote:
>On Sun, Oct 13, 2019 at 10:11:45AM +0800, Wei Yang wrote:
>> There are three page size in qemu:
>>
>>   real host page size
>>   host page size
>>   target page size
>>
>> All of them have dedicate variable to represent. For the last two, we
>> use the same form in the whole qemu project, while for the first one we
>> use two forms: qemu_real_host_page_size and getpagesize().
>>
>> qemu_real_host_page_size is defined to be a replacement of
>> getpagesize(), so let it serve the role.
>>
>> [Note] Not fully tested for some arch or device.
>>
>> Signed-off-by: Wei Yang
>
>Reviewed-by: David Gibson
>
>Although the chances of someone messing this up again are almost 100%.
>

Hi, David

I found that putting a check in checkpatch.pl may be a good way to prevent
it. I just drafted a patch; I hope you like it.

>-- 
>David Gibson                   | I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
>                               | _way_ _around_!
>http://www.ozlabs.org/~dgibson

-- 
Wei Yang
Help you, Help me
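A minimal sketch of the idea behind `qemu_real_host_page_size`: query the host page size once and reuse the cached value everywhere, instead of calling getpagesize() at each call site. The names below are hypothetical stand-ins, not QEMU's code, and `sysconf(_SC_PAGESIZE)` is used as the POSIX way to obtain the value.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for qemu_real_host_page_size: set once at startup. */
static size_t real_host_page_size;

static void page_size_init_sketch(void)
{
    if (!real_host_page_size) {
        long ps = sysconf(_SC_PAGESIZE);
        /* Fall back to 4 KiB if the query fails (an arbitrary choice here). */
        real_host_page_size = ps > 0 ? (size_t)ps : 4096;
    }
}

/* Round a length up to whole host pages; the page size is a power of two. */
static size_t round_up_to_host_page(size_t len)
{
    return (len + real_host_page_size - 1) & ~(real_host_page_size - 1);
}
```

With a single variable, a checkpatch rule like the one proposed above only has to flag new direct calls to getpagesize().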
Re: [PULL 1/1] test-bdrv-drain: fix iothread_join() hang
On Mon, Oct 14, 2019 at 01:11:41PM +0200, Paolo Bonzini wrote:
> On 14/10/19 10:52, Stefan Hajnoczi wrote:
> > tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
> >
> >   while (!atomic_read(&iothread->stopping)) {
> >       aio_poll(iothread->ctx, true);
> >   }
> >
> > The iothread_join() function works as follows:
> >
> >   void iothread_join(IOThread *iothread)
> >   {
> >       iothread->stopping = true;
> >       aio_notify(iothread->ctx);
> >       qemu_thread_join(&iothread->thread);
> >
> > If iothread_run() checks iothread->stopping before the iothread_join()
> > thread sets stopping to true, then aio_notify() may be optimized away
> > and iothread_run() hangs forever in aio_poll().
> >
> > The correct way to change iothread->stopping is from a BH that executes
> > within iothread_run(). This ensures that iothread->stopping is checked
> > after we set it to true.
> >
> > This was already fixed for ./iothread.c (note this is a different source
> > file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
> > fix iothread_stop() race condition"), but not for tests/iothread.c.
>
> Aha, I did have some kind of dejavu when sending the patch I have just
> sent; let's see if this also fixes the test-aio-multithread assertion
> failure.
>
> Note that with this change the atomic read of iothread->stopping can go
> away; I can send a separate patch later.

Yes, I thought about the atomic_read() later as well.

Stefan
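The fix described above, setting `stopping` from a callback that runs inside the loop thread, can be illustrated with a small pthread-based sketch. This is not QEMU's AioContext or BH API: `MiniLoop` and its functions are hypothetical, and a mutex plus condition variable stand in for aio_poll() and aio_notify(). Because the only write to `stopping` happens in the loop thread, the loop condition is always re-evaluated after that write, so the stop request cannot be lost. The same property is why the atomic read becomes unnecessary: while the loop runs, only one thread touches the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A "bottom half": a callback queued by another thread, run by the loop. */
typedef void BHFunc(void *opaque);

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    BHFunc *pending_bh;      /* one queued callback is enough for this sketch */
    void *pending_opaque;
    bool stopping;           /* written and read only by the loop thread */
    pthread_t thread;
} MiniLoop;

static void *mini_loop_run(void *opaque)
{
    MiniLoop *l = opaque;

    while (!l->stopping) {
        /* Block until a callback arrives, like aio_poll(ctx, true). */
        pthread_mutex_lock(&l->lock);
        while (!l->pending_bh) {
            pthread_cond_wait(&l->cond, &l->lock);
        }
        BHFunc *bh = l->pending_bh;
        void *bh_opaque = l->pending_opaque;
        l->pending_bh = NULL;
        pthread_mutex_unlock(&l->lock);

        bh(bh_opaque);       /* runs in the loop thread */
    }
    return NULL;
}

static void mini_loop_schedule(MiniLoop *l, BHFunc *bh, void *opaque)
{
    pthread_mutex_lock(&l->lock);
    l->pending_bh = bh;
    l->pending_opaque = opaque;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->lock);
}

static void mini_loop_start(MiniLoop *l)
{
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->pending_bh = NULL;
    l->pending_opaque = NULL;
    l->stopping = false;
    pthread_create(&l->thread, NULL, mini_loop_run, l);
}

static void mini_loop_stop_bh(void *opaque)
{
    MiniLoop *l = opaque;

    l->stopping = true;      /* the while condition is checked right after */
}

/* Join: the stop request is itself a callback, so it cannot be missed. */
static void mini_loop_join(MiniLoop *l)
{
    mini_loop_schedule(l, mini_loop_stop_bh, l);
    pthread_join(l->thread, NULL);
    pthread_cond_destroy(&l->cond);
    pthread_mutex_destroy(&l->lock);
}
```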
Re: [PULL 01/19] util/hbitmap: strict hbitmap_reset
Am 14.10.2019 um 20:10 hat John Snow geschrieben:
>
>
> On 10/11/19 7:18 PM, John Snow wrote:
> >
> >
> > On 10/11/19 5:48 PM, Eric Blake wrote:
> >> On 10/11/19 4:25 PM, John Snow wrote:
> >>> From: Vladimir Sementsov-Ogievskiy
> >>>
> >>> hbitmap_reset has an unobvious property: it rounds requested region up.
> >>> It may provoke bugs, like in recently fixed write-blocking mode of
> >>> mirror: user calls reset on unaligned region, not keeping in mind that
> >>> there are possible unrelated dirty bytes, covered by rounded-up region
> >>> and information of this unrelated "dirtiness" will be lost.
> >>>
> >>> Make hbitmap_reset strict: assert that arguments are aligned, allowing
> >>> only one exception when @start + @count == hb->orig_size. It's needed
> >>> to comfort users of hbitmap_next_dirty_area, which cares about
> >>> hb->orig_size.
> >>>
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy
> >>> Reviewed-by: Max Reitz
> >>> Message-Id: <20190806152611.280389-1-vsement...@virtuozzo.com>
> >>> [Maintainer edit: Max's suggestions from on-list. --js]
> >>> Signed-off-by: John Snow
> >>> ---
> >>>  include/qemu/hbitmap.h | 5 +
> >>>  tests/test-hbitmap.c   | 2 +-
> >>>  util/hbitmap.c         | 4
> >>>  3 files changed, 10 insertions(+), 1 deletion(-)
> >>>
> >>
> >>> +++ b/util/hbitmap.c
> >>> @@ -476,6 +476,10 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
> >>>      /* Compute range in the last layer. */
> >>>      uint64_t first;
> >>>      uint64_t last = start + count - 1;
> >>> +    uint64_t gran = 1ULL << hb->granularity;
> >>> +
> >>> +    assert(!(start & (gran - 1)));
> >>> +    assert(!(count & (gran - 1)) || (start + count == hb->orig_size));
> >>
> >> I know I'm replying a bit late (since this is now a pull request), but
> >> would it be worth using the dedicated macro:
> >>
> >> assert(QEMU_IS_ALIGNED(start, gran));
> >> assert(QEMU_IS_ALIGNED(count, gran) || start + count == hb->orig_size);
> >>
> >> instead of open-coding it? (I would also drop the extra () around the
> >> right half of ||). If we want it, that would now be a followup patch.
>
> I've noticed that seasoned C programmers hate extra parentheses a lot.
> I've noticed that I cannot remember operator precedence enough to ever
> feel like this is actually an improvement.
>
> Something about a nice weighted tree of ((expr1) || (expr2)) feels
> soothing to my weary eyes. So, if it's not terribly important, I'd
> prefer to leave it as-is.

I don't mind the parentheses, but I do prefer QEMU_IS_ALIGNED() to the
open-coded version. Would that be a viable compromise?

Kevin
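For reference, a small sketch comparing the two spellings of the alignment checks discussed above. It assumes `QEMU_IS_ALIGNED(n, m)` expands to `(((n) % (m)) == 0)`, as in QEMU's include/qemu/osdep.h; `reset_args_ok()` and `reset_args_ok_mask()` are hypothetical helpers mirroring the two assertions added to hbitmap_reset(). For a power-of-two granularity both forms accept exactly the same arguments, so the choice is purely about readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the definition in QEMU's include/qemu/osdep.h. */
#define QEMU_IS_ALIGNED(n, m) (((n) % (m)) == 0)

/*
 * start must be aligned to the granularity; count must be aligned too,
 * unless the range ends exactly at orig_size.
 */
static bool reset_args_ok(uint64_t start, uint64_t count,
                          int granularity, uint64_t orig_size)
{
    uint64_t gran = 1ULL << granularity;

    return QEMU_IS_ALIGNED(start, gran) &&
           (QEMU_IS_ALIGNED(count, gran) || start + count == orig_size);
}

/* The open-coded mask form from the patch, valid because gran is 2^k. */
static bool reset_args_ok_mask(uint64_t start, uint64_t count,
                               int granularity, uint64_t orig_size)
{
    uint64_t gran = 1ULL << granularity;

    return !(start & (gran - 1)) &&
           (!(count & (gran - 1)) || (start + count == orig_size));
}
```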
Re: [PULL 1/2] trace: add --group=all to tracing.txt
On Mon, Oct 14, 2019 at 11:08:25AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
>
> On 10/14/19 10:57 AM, Stefan Hajnoczi wrote:
> > tracetool needs to know the group name ("all", "root", or a specific
> > subdirectory). Also remove the stdin redirection because tracetool.py
> > needs the path to the trace-events file. Update the documentation.
> >
> > Fixes: 2098c56a9bc5901e145fa5d4759f075808811685
> >        ("trace: move setting of group name into Makefiles")
> > Launchpad: https://bugs.launchpad.net/bugs/1844814
>
> Sorry I didn't noticed that earlier, but on
> https://wiki.qemu.org/Contribute/SubmitAPatch#Write_a_meaningful_commit_message
> we recommend using the 'Buglink' tag.
> Not sure it's worth resending another pull request...

Sure, it hasn't been merged yet so I can send a v2.

Stefan
[PATCH v2 07/20] nvme: refactor device realization
This patch splits up nvme_realize into multiple individual functions,
each initializing a different subset of the device.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 176 +++-
 hw/block/nvme.h |  22 ++
 2 files changed, 135 insertions(+), 63 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 84e4f2ea7a15..1fdb3b8655ed 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -43,6 +43,8 @@
 #include "trace.h"
 #include "nvme.h"
 
+#define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
+
 #define NVME_GUEST_ERR(trace, fmt, ...) \
     do { \
         (trace_##trace)(__VA_ARGS__); \
@@ -1336,67 +1338,106 @@ static const MemoryRegionOps nvme_cmb_ops = {
     },
 };
 
-static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
 {
-    NvmeCtrl *n = NVME(pci_dev);
-    NvmeIdCtrl *id = &n->id_ctrl;
-
-    int i;
-    int64_t bs_size;
-    uint8_t *pci_conf;
-
-    if (!n->params.num_queues) {
-        error_setg(errp, "num_queues can't be zero");
-        return;
-    }
+    NvmeParams *params = &n->params;
 
     if (!n->conf.blk) {
-        error_setg(errp, "drive property not set");
-        return;
+        error_setg(errp, "nvme: block backend not configured");
+        return 1;
     }
 
-    bs_size = blk_getlength(n->conf.blk);
-    if (bs_size < 0) {
-        error_setg(errp, "could not get backing file size");
-        return;
+    if (!params->serial) {
+        error_setg(errp, "nvme: serial not configured");
+        return 1;
     }
 
-    if (!n->params.serial) {
-        error_setg(errp, "serial property not set");
-        return;
+    if ((params->num_queues < 1 || params->num_queues > NVME_MAX_QS)) {
+        error_setg(errp, "nvme: invalid queue configuration");
+        return 1;
     }
+
+    return 0;
+}
+
+static int nvme_init_blk(NvmeCtrl *n, Error **errp)
+{
     blkconf_blocksizes(&n->conf);
     if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
-                                       false, errp)) {
-        return;
+        false, errp)) {
+        return 1;
     }
 
-    pci_conf = pci_dev->config;
-    pci_conf[PCI_INTERRUPT_PIN] = 1;
-    pci_config_set_prog_interface(pci_dev->config, 0x2);
-    pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
-    pcie_endpoint_cap_init(pci_dev, 0x80);
+    return 0;
+}
 
+static void nvme_init_state(NvmeCtrl *n)
+{
     n->num_namespaces = 1;
     n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
-    n->ns_size = bs_size / (uint64_t)n->num_namespaces;
-
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
     n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
     n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
+}
+
+static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+    NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
+    NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
+
+    NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+    NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2);
+    NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+
+    n->cmbloc = n->bar.cmbloc;
+    n->cmbsz = n->bar.cmbsz;
+
+    n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+    memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
+        "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+    pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
+        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
+        PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
+}
+
+static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+    uint8_t *pci_conf = pci_dev->config;
 
-    memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
-                          "nvme", n->reg_size);
+    pci_conf[PCI_INTERRUPT_PIN] = 1;
+    pci_config_set_prog_interface(pci_conf, 0x2);
+    pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
+    pci_config_set_device_id(pci_conf, 0x5845);
+    pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+    pcie_endpoint_cap_init(pci_dev, 0x80);
+
+    memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
+        n->reg_size);
     pci_register_bar(pci_dev, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
     msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
 
+    if (n->params.cmb_size_mb) {
+        nvme_init_cmb(n, pci_dev);
+    }
+}
+
+static void nvme_init_ctrl(NvmeCtrl *n)
+{
+    NvmeIdCtrl *id = &n->id_ctrl;
+    NvmeParams *params = &n->params;
+    uint8_t *pci_conf = n->parent_obj.config;
+
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
     strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe
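The queue-count check and the BAR0 size computation from the refactored code can be exercised in isolation. A minimal sketch, assuming `PCI_MSIX_FLAGS_QSIZE` is 0x07FF as in Linux's pci_regs.h (the MSI-X table-size field, which bounds the number of vectors, and with it the queues, that msix_init_exclusive_bar() can serve). `pow2ceil_sketch()` stands in for QEMU's pow2ceil(); the other helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PCI_MSIX_FLAGS_QSIZE is 0x07FF, as in Linux's pci_regs.h. */
#define NVME_MAX_QS 0x07FF

/* Smallest power of two >= value (a stand-in for QEMU's pow2ceil()). */
static uint64_t pow2ceil_sketch(uint64_t value)
{
    uint64_t p = 1;

    while (p < value) {
        p <<= 1;
    }
    return p;
}

/* Mirrors the num_queues range check in nvme_check_constraints(). */
static int check_num_queues(uint32_t num_queues)
{
    return (num_queues < 1 || num_queues > NVME_MAX_QS) ? 1 : 0;
}

/*
 * Mirrors the reg_size formula in nvme_init_state(): the doorbell
 * registers start at offset 0x1000 of BAR0, and the total is rounded up
 * to a power of two because PCI BAR sizes must be powers of two.
 */
static uint64_t nvme_reg_size(uint32_t num_queues)
{
    return pow2ceil_sketch(0x1004 + 2 * ((uint64_t)num_queues + 1) * 4);
}
```

For example, with 64 queues the unrounded size is 0x1004 + 520 = 4620 bytes, which rounds up to an 8 KiB BAR.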
[PATCH v2 00/20] nvme: support NVMe v1.3d, SGLs and multiple namespaces
Hi,

(Quick note to Fam): most of this series is irrelevant to you as the
maintainer of the nvme block driver, but patch "nvme: add support for
scatter gather lists" touches block/nvme.c due to changes in the shared
NvmeCmd struct.

Anyway, v2 comes with a good bunch of changes. Compared to v1[1], I have
squashed some commits in the beginning of the series and heavily
refactored "nvme: support multiple block requests per request" into the
new commit "nvme: allow multiple aios per command".

I have also removed the original implementation of the Abort command
(commit "nvme: add support for the abort command") as it is currently
too tricky to test reliably. It has been replaced by a stub that, besides
a trivial sanity check, just fails to abort the given command. *Some*
implementation of the Abort command is mandatory, but given the "best
effort" nature of the command this is acceptable for now. When the device
gains support for arbitration it should be less tricky to test.

The support for multiple namespaces is now backwards compatible. The nvme
device still accepts a 'drive' parameter, but for multiple namespaces the
use of 'nvme-ns' devices is required. I also integrated some feedback
from Paul so the device supports non-consecutive namespace ids.

I have also added some new commits at the end:

  - "nvme: bump controller pci device id" makes sure the Linux kernel
    doesn't apply any quirks to the controller that it no longer has.

  - "nvme: handle dma errors" won't actually do anything before this[2]
    fix to include/hw/pci/pci.h is merged.

With these two patches added, the device reliably passes some additional
nasty tests from blktests (block/011 "disable PCI device while doing I/O"
and block/019 "break PCI link device while doing I/O"). Before this
patch, block/011 would pass from time to time if you were lucky, but
would at least mess up the controller pretty badly, causing a reset in
the best case.
  [1]: https://patchwork.kernel.org/project/qemu-devel/list/?series=142383
  [2]: https://patchwork.kernel.org/patch/11184911/

Klaus Jensen (20):
  nvme: remove superfluous breaks
  nvme: move device parameters to separate struct
  nvme: add missing fields in the identify controller data structure
  nvme: populate the mandatory subnqn and ver fields
  nvme: allow completion queues in the cmb
  nvme: add support for the abort command
  nvme: refactor device realization
  nvme: add support for the get log page command
  nvme: add support for the asynchronous event request command
  nvme: add logging to error information log page
  nvme: add missing mandatory features
  nvme: bump supported specification version to 1.3
  nvme: refactor prp mapping
  nvme: allow multiple aios per command
  nvme: add support for scatter gather lists
  nvme: support multiple namespaces
  nvme: bump controller pci device id
  nvme: remove redundant NvmeCmd pointer parameter
  nvme: make lba data size configurable
  nvme: handle dma errors

 block/nvme.c           |   18 +-
 hw/block/Makefile.objs |    2 +-
 hw/block/nvme-ns.c     |  139 +++
 hw/block/nvme-ns.h     |   60 ++
 hw/block/nvme.c        | 1863 +---
 hw/block/nvme.h        |  219 -
 hw/block/trace-events  |   37 +-
 include/block/nvme.h   |  132 ++-
 8 files changed, 2094 insertions(+), 376 deletions(-)
 create mode 100644 hw/block/nvme-ns.c
 create mode 100644 hw/block/nvme-ns.h

-- 
2.23.0
[PATCH v2 06/20] nvme: add support for the abort command
Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.1 ("Abort command").

The Abort command is a best effort command; for now, the device always
fails to abort the given command.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index daa2367b0863..84e4f2ea7a15 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -741,6 +741,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
     }
 }
 
+static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+    uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
+
+    req->cqe.result = 1;
+    if (nvme_check_sqid(n, sqid)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
 static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
 {
     trace_nvme_setfeat_timestamp(ts);
@@ -859,6 +871,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         trace_nvme_err_invalid_setfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
+
     return NVME_SUCCESS;
 }
 
@@ -875,6 +888,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return nvme_create_cq(n, cmd);
     case NVME_ADM_CMD_IDENTIFY:
         return nvme_identify(n, cmd);
+    case NVME_ADM_CMD_ABORT:
+        return nvme_abort(n, cmd, req);
     case NVME_ADM_CMD_SET_FEATURES:
         return nvme_set_feature(n, cmd, req);
     case NVME_ADM_CMD_GET_FEATURES:
@@ -1388,6 +1403,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     id->ieee[2] = 0xb3;
     id->ver = cpu_to_le32(0x00010201);
     id->oacs = cpu_to_le16(0);
+    id->acl = 3;
     id->frmw = 7 << 1;
     id->lpa = 1 << 0;
     id->sqes = (0x6 << 4) | 0x6;
-- 
2.23.0
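A sketch of how the Abort command's Command Dword 10 is laid out and what the stub's `req->cqe.result = 1` means, following NVM Express 1.2.1, Section 5.1. The helpers are hypothetical and operate on values already converted to host byte order (the patch does that with le32_to_cpu()).

```c
#include <assert.h>
#include <stdint.h>

/* CDW10 bits 15:00: Submission Queue Identifier of the command to abort. */
static uint16_t abort_sqid(uint32_t cdw10)
{
    return cdw10 & 0xffff;
}

/* CDW10 bits 31:16: Command Identifier of the command to abort. */
static uint16_t abort_cid(uint32_t cdw10)
{
    return cdw10 >> 16;
}

/*
 * Completion Dword 0, bit 0: 0 means the command was aborted, 1 means it
 * was not. The stub in the patch always reports 1 ("not aborted").
 */
static uint32_t abort_result(int aborted)
{
    return aborted ? 0 : 1;
}
```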
[PATCH v2 02/20] nvme: move device parameters to separate struct
Move device configuration parameters to separate struct to make it
explicit what is configurable and what is set internally.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 44 ++--
 hw/block/nvme.h | 16 +---
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c06e3ca31905..277700fdcc58 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -64,12 +64,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-    return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1;
+    return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-    return cqid < n->num_queues && n->cq[cqid] != NULL ? 0 : -1;
+    return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -631,7 +631,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
         trace_nvme_err_invalid_create_cq_addr(prp1);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
-    if (unlikely(vector > n->num_queues)) {
+    if (unlikely(vector > n->params.num_queues)) {
         trace_nvme_err_invalid_create_cq_vector(vector);
         return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
     }
@@ -783,7 +783,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         trace_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
         break;
     case NVME_NUMBER_OF_QUEUES:
-        result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+        result = cpu_to_le32((n->params.num_queues - 2) |
+            ((n->params.num_queues - 2) << 16));
         trace_nvme_getfeat_numq(result);
         break;
     case NVME_TIMESTAMP:
@@ -827,9 +828,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_NUMBER_OF_QUEUES:
         trace_nvme_setfeat_numq((dw11 & 0xFFFF) + 1,
                                 ((dw11 >> 16) & 0xFFFF) + 1,
-                                n->num_queues - 1, n->num_queues - 1);
-        req->cqe.result =
-            cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+                                n->params.num_queues - 1,
+                                n->params.num_queues - 1);
+        req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
+            ((n->params.num_queues - 2) << 16));
         break;
     case NVME_TIMESTAMP:
         return nvme_set_feature_timestamp(n, cmd);
@@ -900,12 +902,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
     blk_drain(n->conf.blk);
 
-    for (i = 0; i < n->num_queues; i++) {
+    for (i = 0; i < n->params.num_queues; i++) {
         if (n->sq[i] != NULL) {
             nvme_free_sq(n->sq[i], n);
         }
     }
-    for (i = 0; i < n->num_queues; i++) {
+    for (i = 0; i < n->params.num_queues; i++) {
         if (n->cq[i] != NULL) {
             nvme_free_cq(n->cq[i], n);
         }
@@ -1308,7 +1310,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     int64_t bs_size;
     uint8_t *pci_conf;
 
-    if (!n->num_queues) {
+    if (!n->params.num_queues) {
         error_setg(errp, "num_queues can't be zero");
         return;
     }
@@ -1324,7 +1326,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         return;
     }
 
-    if (!n->serial) {
+    if (!n->params.serial) {
         error_setg(errp, "serial property not set");
         return;
     }
@@ -1341,25 +1343,25 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     pcie_endpoint_cap_init(pci_dev, 0x80);
 
     n->num_namespaces = 1;
-    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
+    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
     n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-    n->sq = g_new0(NvmeSQueue *, n->num_queues);
-    n->cq = g_new0(NvmeCQueue *, n->num_queues);
+    n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
+    n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
 
     memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
                           "nvme", n->reg_size);
     pci_register_bar(pci_dev, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
-    msix_init_exclusive_bar(pci_dev, n->num_queues, 4, NULL);
+    msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
     strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
     strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
-    strpadcpy((char *)id->sn, sizeof(id->sn), n->serial, ' ');
+    strpadcpy((char *)id->sn,
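The Number of Queues encoding that this patch touches in both nvme_get_feature() and nvme_set_feature() can be checked in isolation. Completion Dword 0 carries the number of I/O submission queues allocated (NSQA, bits 15:00) and completion queues allocated (NCQA, bits 31:16), both zero-based and excluding the admin queue. A minimal sketch with hypothetical helper names; it assumes, as the code above implies, that `num_queues` counts the admin queue, and it leaves out the cpu_to_le32() conversion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * num_queues includes the admin queue, so there are num_queues - 1 I/O
 * queues, and the zero-based value reported is num_queues - 2.
 */
static uint32_t numq_result(uint32_t num_queues)
{
    uint32_t io_queues_0based = num_queues - 2;

    return io_queues_0based | (io_queues_0based << 16);
}

/* Decode back to one-based counts, as the trace call above does for dw11. */
static uint32_t numq_nsqa(uint32_t result)
{
    return (result & 0xffff) + 1;
}

static uint32_t numq_ncqa(uint32_t result)
{
    return ((result >> 16) & 0xffff) + 1;
}
```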
[PATCH v2 03/20] nvme: add missing fields in the identify controller data structure
Not used by the device model but added for completeness. See NVM Express
1.2.1, Section 5.11 ("Identify command"), Figure 90.

Signed-off-by: Klaus Jensen
---
 include/block/nvme.h | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3ec8efcc435e..1b0accd4fe2b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -543,7 +543,13 @@ typedef struct NvmeIdCtrl {
     uint8_t     ieee[3];
     uint8_t     cmic;
     uint8_t     mdts;
-    uint8_t     rsvd255[178];
+    uint16_t    cntlid;
+    uint32_t    ver;
+    uint16_t    rtd3r;
+    uint32_t    rtd3e;
+    uint32_t    oaes;
+    uint32_t    ctratt;
+    uint8_t     rsvd255[156];
     uint16_t    oacs;
     uint8_t     acl;
     uint8_t     aerl;
@@ -551,10 +557,22 @@ typedef struct NvmeIdCtrl {
     uint8_t     lpa;
     uint8_t     elpe;
     uint8_t     npss;
-    uint8_t     rsvd511[248];
+    uint8_t     avscc;
+    uint8_t     apsta;
+    uint16_t    wctemp;
+    uint16_t    cctemp;
+    uint16_t    mtfa;
+    uint32_t    hmpre;
+    uint32_t    hmmin;
+    uint8_t     tnvmcap[16];
+    uint8_t     unvmcap[16];
+    uint32_t    rpmbs;
+    uint8_t     rsvd319[4];
+    uint16_t    kas;
+    uint8_t     rsvd511[190];
     uint8_t     sqes;
     uint8_t     cqes;
-    uint16_t    rsvd515;
+    uint16_t    maxcmd;
     uint32_t    nn;
     uint16_t    oncs;
     uint16_t    fuses;
@@ -562,8 +580,14 @@ typedef struct NvmeIdCtrl {
     uint8_t     vwc;
     uint16_t    awun;
     uint16_t    awupf;
-    uint8_t     rsvd703[174];
-    uint8_t     rsvd2047[1344];
+    uint8_t     nvscc;
+    uint8_t     rsvd531;
+    uint16_t    acwu;
+    uint16_t    rsvd535;
+    uint32_t    sgls;
+    uint8_t     rsvd767[228];
+    uint8_t     subnqn[256];
+    uint8_t     rsvd2047[1024];
     NvmePSD     psd[32];
     uint8_t     vs[1024];
 } NvmeIdCtrl;
-- 
2.23.0
[PATCH v2 04/20] nvme: populate the mandatory subnqn and ver fields
Required for compliance with NVMe revision 1.2.1 or later. See NVM
Express 1.2.1, Section 5.11 ("Identify command"), Figure 90 and Section
7.9 ("NVMe Qualified Names").

This also bumps the supported version to 1.2.1.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 277700fdcc58..16f0fba10b08 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -9,9 +9,9 @@
  */
 
 /**
- * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e
+ * Reference Specification: NVM Express 1.2.1
  *
- * http://www.nvmexpress.org/resources/
+ * https://nvmexpress.org/resources/specifications/
  */
 
 /**
@@ -1366,6 +1366,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     id->ieee[0] = 0x00;
     id->ieee[1] = 0x02;
     id->ieee[2] = 0xb3;
+    id->ver = cpu_to_le32(0x00010201);
     id->oacs = cpu_to_le16(0);
     id->frmw = 7 << 1;
     id->lpa = 1 << 0;
@@ -1373,6 +1374,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     id->cqes = (0x4 << 4) | 0x4;
     id->nn = cpu_to_le32(n->num_namespaces);
     id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
+
+    strcpy((char *) id->subnqn, "nqn.2019-08.org.qemu:");
+    pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
+
     id->psd[0].mp = cpu_to_le16(0x9c4);
     id->psd[0].enlat = cpu_to_le32(0x10);
     id->psd[0].exlat = cpu_to_le32(0x4);
@@ -1387,7 +1392,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     NVME_CAP_SET_CSS(n->bar.cap, 1);
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
-    n->bar.vs = 0x00010200;
+    n->bar.vs = 0x00010201;
     n->bar.intmc = n->bar.intms = 0;
 
     if (n->params.cmb_size_mb) {
-- 
2.23.0
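A short sketch of the two values this patch fills in: the Version encoding used for both `id->ver` and `n->bar.vs` (major version in bits 31:16, minor in 15:08, tertiary in 07:00) and the subsystem NQN built from a fixed prefix plus the serial. The helper names are hypothetical, and snprintf() stands in for the patch's strcpy() plus pstrcat() pair; like pstrcat(), it truncates to the buffer and always NUL-terminates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* NVMe version layout: MJR in bits 31:16, MNR in 15:08, TER in 07:00. */
static uint32_t nvme_version(uint16_t mjr, uint8_t mnr, uint8_t ter)
{
    return ((uint32_t)mjr << 16) | ((uint32_t)mnr << 8) | ter;
}

/* Prefix plus serial, truncated to fit the caller's SUBNQN buffer. */
static void build_subnqn(char *subnqn, size_t len, const char *serial)
{
    snprintf(subnqn, len, "nqn.2019-08.org.qemu:%s", serial);
}
```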
[PATCH v2 10/20] nvme: add logging to error information log page
This adds the nvme_set_error_page function, which allows errors to be written to the error information log page. The functionality is largely unused in the device, but with this in place we can at least try to push new contributions to use it. NOTE: In violation of the specification, the Error Count field is *not* retained across power off conditions because the device currently has no place to store this kind of persistent state. Cribbed from Keith's qemu-nvme tree. Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 22 -- hw/block/nvme.h | 2 ++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 5cdee37582f9..32381d7df655 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -161,6 +161,22 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq) } } +static void nvme_set_error_page(NvmeCtrl *n, uint16_t sqid, uint16_t cid, +uint16_t status, uint16_t location, uint64_t lba, uint32_t nsid) +{ +NvmeErrorLog *elp; + +elp = &n->elpes[n->elp_index]; +elp->error_count = n->error_count++; +elp->sqid = sqid; +elp->cid = cid; +elp->status_field = status; +elp->param_error_location = location; +elp->lba = lba; +elp->nsid = nsid; +n->elp_index = (n->elp_index + 1) % n->params.elpe; +} + static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1, uint64_t prp2, uint32_t len, NvmeCtrl *n) { @@ -386,7 +402,9 @@ static void nvme_rw_cb(void *opaque, int ret) req->status = NVME_SUCCESS; } else { block_acct_failed(blk_get_stats(n->conf.blk), &req->acct); -req->status = NVME_INTERNAL_DEV_ERROR; +nvme_set_error_page(n, sq->sqid, cpu_to_le16(req->cid), +NVME_INTERNAL_DEV_ERROR, 0, 0, 1); +req->status = NVME_INTERNAL_DEV_ERROR | NVME_MORE; } if (req->has_sg) { qemu_sglist_destroy(&req->qsg); @@ -678,7 +696,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae, smart.host_read_commands[0] = cpu_to_le64(read_commands); smart.host_write_commands[0] = cpu_to_le64(write_commands); 
-smart.number_of_error_log_entries[0] = cpu_to_le64(0); +smart.number_of_error_log_entries[0] = cpu_to_le64(n->error_count); smart.temperature[0] = n->temperature & 0xff; smart.temperature[1] = (n->temperature >> 8) & 0xff; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 3fc36f577b46..d74b0e0f9b2c 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -100,6 +100,8 @@ typedef struct NvmeCtrl { uint64_t timestamp_set_qemu_clock_ms; /* QEMU clock time */ uint64_t starttime_ms; uint16_t temperature; +uint8_t elp_index; +uint64_t error_count; QEMUTimer *aer_timer; uint8_t aer_mask; -- 2.23.0
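The error log behaves as a fixed-size ring in which the newest entry overwrites the oldest. A minimal sketch with hypothetical names (`ErrLog`, `errlog_add`) and a simplified entry layout (the real entry is 64 bytes). Two assumptions are labeled here: ELPE is 0's based, so the ring holds `ELPE + 1` entries, matching how patch 08 sizes the Error Information log as `elpe + 1` entries; and, following the specification's description of the Error Count field, counting starts at 1 so that a count of 0 can mark an unused entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified error log entry. */
typedef struct ErrEntry {
    uint64_t error_count;  /* assumption: 0 marks an unused/invalid entry */
    uint16_t sqid;
    uint16_t cid;
    uint16_t status;
} ErrEntry;

#define ELPE     3           /* 0's based: the log holds ELPE + 1 entries */
#define NENTRIES (ELPE + 1)

typedef struct ErrLog {
    ErrEntry entries[NENTRIES];
    unsigned index;          /* next slot to (over)write */
    uint64_t error_count;    /* total errors recorded so far */
} ErrLog;

static void errlog_init(ErrLog *log)
{
    memset(log, 0, sizeof(*log));
}

static void errlog_add(ErrLog *log, uint16_t sqid, uint16_t cid,
                       uint16_t status)
{
    ErrEntry *e = &log->entries[log->index];

    e->error_count = ++log->error_count;  /* first error gets count 1 */
    e->sqid = sqid;
    e->cid = cid;
    e->status = status;
    /* Wrap at the number of entries; the oldest entry is overwritten. */
    log->index = (log->index + 1) % NENTRIES;
}
```

After five errors in a four-entry ring, slot 0 holds the fifth error and slot 1 still holds the second.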
[PATCH v2 09/20] nvme: add support for the asynchronous event request command
Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1, Section 5.2 ("Asynchronous Event Request command"). Mostly imported from Keith's qemu-nvme tree. Modified to not enqueue events if something of the same type is already queued (but not cleared by the host). Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 180 -- hw/block/nvme.h | 13 ++- hw/block/trace-events | 8 ++ include/block/nvme.h | 4 +- 4 files changed, 196 insertions(+), 9 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 4412a3bea3bc..5cdee37582f9 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -334,6 +334,46 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req) timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500); } +static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type, +uint8_t event_info, uint8_t log_page) +{ +NvmeAsyncEvent *event; + +trace_nvme_enqueue_event(event_type, event_info, log_page); + +/* + * Do not enqueue the event if something of this type is already queued. + * This bounds the size of the event queue and makes sure it does not grow + * indefinitely when events are not processed by the host (i.e. does not + * issue any AERs). 
+ */ +if (n->aer_mask_queued & (1 << event_type)) { +trace_nvme_enqueue_event_masked(event_type); +return; +} +n->aer_mask_queued |= (1 << event_type); + +event = g_new(NvmeAsyncEvent, 1); +event->result = (NvmeAerResult) { +.event_type = event_type, +.event_info = event_info, +.log_page = log_page, +}; + +QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry); + +timer_mod(n->aer_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500); +} + +static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type) +{ +n->aer_mask &= ~(1 << event_type); +if (!QTAILQ_EMPTY(&n->aer_queue)) { +timer_mod(n->aer_timer, +qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500); +} +} + static void nvme_rw_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -578,7 +618,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd) return NVME_SUCCESS; } -static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae, uint32_t buf_len, uint64_t off, NvmeRequest *req) { uint32_t trans_len; @@ -591,12 +631,16 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, trans_len = MIN(sizeof(*n->elpes) * (n->params.elpe + 1) - off, buf_len); +if (!rae) { +nvme_clear_events(n, NVME_AER_TYPE_ERROR); +} + return nvme_dma_read_prp(n, (uint8_t *) n->elpes + off, trans_len, prp1, prp2); } -static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len, -uint64_t off, NvmeRequest *req) +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae, +uint32_t buf_len, uint64_t off, NvmeRequest *req) { uint64_t prp1 = le64_to_cpu(cmd->prp1); uint64_t prp2 = le64_to_cpu(cmd->prp2); @@ -646,6 +690,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len, smart.power_on_hours[0] = cpu_to_le64( (((current_ms - n->starttime_ms) / 1000) / 60) / 60); +if (!rae) { +nvme_clear_events(n, NVME_AER_TYPE_SMART); +} + return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2); } @@ -698,9 
+746,9 @@ static 
uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) switch (lid) { case NVME_LOG_ERROR_INFO: -return nvme_error_info(n, cmd, len, off, req); +return nvme_error_info(n, cmd, rae, len, off, req); case NVME_LOG_SMART_INFO: -return nvme_smart_info(n, cmd, len, off, req); +return nvme_smart_info(n, cmd, rae, len, off, req); case NVME_LOG_FW_SLOT_INFO: return nvme_fw_log_info(n, cmd, len, off, req); default: @@ -958,6 +1006,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) break; case NVME_TIMESTAMP: return nvme_get_feature_timestamp(n, cmd); +case NVME_ASYNCHRONOUS_EVENT_CONF: +result = cpu_to_le32(n->features.async_config); +break; default: trace_nvme_err_invalid_getfeat(dw10); return NVME_INVALID_FIELD | NVME_DNR; @@ -993,6 +1044,12 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) switch (dw10) { case NVME_TEMPERATURE_THRESHOLD: n->features.temp_thresh = dw11; + +if (n->features.temp_thresh <= n->temperature) { +nvme_enqueue_event(n, NVME_AER_TYPE_SMART, +NVME_AER_INFO_SMART_TEMP_THRESH, NVME_LOG_SMART_INFO); +} + break; case NVME_VOLATILE_WRITE_CACHE: @@ -1008,6 +1065,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n,
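The commit message describes the queueing rule as follows: an event is not enqueued while another event of the same type is still queued and has not been cleared by the host. This keeps the queue bounded by the number of event types. A minimal model of that rule, with hypothetical names and one bit per event type. The two type values (0h Error status, 1h SMART/Health status) follow the specification's event type encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AER_TYPE_ERROR 0x0   /* Error status */
#define AER_TYPE_SMART 0x1   /* SMART / Health status */

/* Hypothetical state: bit t set means an event of type t is outstanding. */
typedef struct AerState {
    uint8_t  queued_mask;
    unsigned naccepted;      /* events actually enqueued */
} AerState;

/* Returns true if the event was enqueued, false if it was suppressed. */
static bool aer_enqueue(AerState *s, uint8_t event_type)
{
    assert(event_type < 8);
    if (s->queued_mask & (1u << event_type)) {
        return false;        /* same type still outstanding: drop it */
    }
    s->queued_mask |= (uint8_t)(1u << event_type);
    s->naccepted++;
    return true;
}

/* Called when the host clears the event, e.g. by reading the associated
 * log page with RAE cleared. */
static void aer_clear(AerState *s, uint8_t event_type)
{
    s->queued_mask &= (uint8_t)~(1u << event_type);
}
```

Because suppression is per type, a pending SMART event does not block an Error event.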
[PATCH v2 08/20] nvme: add support for the get log page command
Add support for the Get Log Page command and basic implementations of the mandatory Error Information, SMART/Health Information and Firmware Slot Information log pages. In violation of the specification, the SMART/Health Information log page does not persist information over the lifetime of the controller because the device has no place to store such persistent state. Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1, Section 5.10 ("Get Log Page command"). Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 150 +- hw/block/nvme.h | 9 ++- hw/block/trace-events | 2 + include/block/nvme.h | 2 +- 4 files changed, 160 insertions(+), 3 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 1fdb3b8655ed..4412a3bea3bc 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -44,6 +44,7 @@ #include "nvme.h" #define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE +#define NVME_TEMPERATURE 0x143 #define NVME_GUEST_ERR(trace, fmt, ...) \ do { \ @@ -577,6 +578,137 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd) return NVME_SUCCESS; } +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, +uint32_t buf_len, uint64_t off, NvmeRequest *req) +{ +uint32_t trans_len; +uint64_t prp1 = le64_to_cpu(cmd->prp1); +uint64_t prp2 = le64_to_cpu(cmd->prp2); + +if (off > sizeof(*n->elpes) * (n->params.elpe + 1)) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +trans_len = MIN(sizeof(*n->elpes) * (n->params.elpe + 1) - off, buf_len); + +return nvme_dma_read_prp(n, (uint8_t *) n->elpes + off, trans_len, prp1, +prp2); +} + +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len, +uint64_t off, NvmeRequest *req) +{ +uint64_t prp1 = le64_to_cpu(cmd->prp1); +uint64_t prp2 = le64_to_cpu(cmd->prp2); +uint32_t nsid = le32_to_cpu(cmd->nsid); + +uint32_t trans_len; +time_t current_ms; +uint64_t units_read = 0, units_written = 0, read_commands = 0, +write_commands = 0; +NvmeSmartLog smart; +BlockAcctStats *s; + +if (!nsid || (nsid != 0x && nsid > 
n->num_namespaces)) { +trace_nvme_err_invalid_ns(nsid, n->num_namespaces); +return NVME_INVALID_NSID | NVME_DNR; +} + +s = blk_get_stats(n->conf.blk); + +units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS; +units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS; +read_commands = s->nr_ops[BLOCK_ACCT_READ]; +write_commands = s->nr_ops[BLOCK_ACCT_WRITE]; + +if (off > sizeof(smart)) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +trans_len = MIN(sizeof(smart) - off, buf_len); + +memset(&smart, 0x0, sizeof(smart)); + +smart.data_units_read[0] = cpu_to_le64(units_read / 1000); +smart.data_units_written[0] = cpu_to_le64(units_written / 1000); +smart.host_read_commands[0] = cpu_to_le64(read_commands); +smart.host_write_commands[0] = cpu_to_le64(write_commands); + +smart.number_of_error_log_entries[0] = cpu_to_le64(0); +smart.temperature[0] = n->temperature & 0xff; +smart.temperature[1] = (n->temperature >> 8) & 0xff; + +if (n->features.temp_thresh <= n->temperature) { +smart.critical_warning |= NVME_SMART_TEMPERATURE; +} + +current_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); +smart.power_on_hours[0] = cpu_to_le64( +(((current_ms - n->starttime_ms) / 1000) / 60) / 60); + +return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1, +prp2); +} + +static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len, +uint64_t off, NvmeRequest *req) +{ +uint32_t trans_len; +uint64_t prp1 = le64_to_cpu(cmd->prp1); +uint64_t prp2 = le64_to_cpu(cmd->prp2); +NvmeFwSlotInfoLog fw_log; + +if (off > sizeof(fw_log)) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +memset(&fw_log, 0, sizeof(NvmeFwSlotInfoLog)); + +trans_len = MIN(sizeof(fw_log) - off, buf_len); + +return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, +prp2); +} + +static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) +{ +uint32_t dw10 = le32_to_cpu(cmd->cdw10); +uint32_t dw11 = le32_to_cpu(cmd->cdw11); +uint32_t dw12 = le32_to_cpu(cmd->cdw12); +uint32_t dw13 = 
le32_to_cpu(cmd->cdw13); +uint16_t lid = dw10 & 0xff; +uint8_t rae = (dw10 >> 15) & 0x1; +uint32_t numdl, numdu; +uint64_t off, lpol, lpou; +size_t len; + +numdl = (dw10 >> 16); +numdu = (dw11 & 0x); +lpol = dw12; +lpou = dw13; + +len = (((numdu << 16) | numdl) + 1) << 2; +off = (lpou << 32ULL) | lpol; + +if (off & 0x3) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +trace_nvme_get_log(req->cid, lid, rae, len, off); + +switch (lid) { +case NVME_LOG_ERROR_INFO: +return nvme_error_info(n, cmd, len, off, req); +case
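The handler assembles the transfer length from a 0's based dword count split across CDW10 (NUMDL, bits 31:16) and CDW11 (NUMDU, bits 15:0), and assembles the byte offset from CDW12 (LPOL) and CDW13 (LPOU). A standalone sketch of that decoding, using a hypothetical `GetLogArgs` struct. It widens to 64 bits before adding 1 and shifting, so the maximum NUMD value cannot wrap a 32-bit intermediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded Get Log Page fields. */
typedef struct GetLogArgs {
    uint8_t  lid;   /* Log Page Identifier, CDW10 bits 7:0 */
    bool     rae;   /* Retain Asynchronous Event, CDW10 bit 15 */
    uint64_t len;   /* transfer length in bytes */
    uint64_t off;   /* byte offset into the log page */
} GetLogArgs;

/* Decodes CDW10..CDW13; returns false if the offset is not dword aligned. */
static bool decode_get_log(uint32_t dw10, uint32_t dw11, uint32_t dw12,
                           uint32_t dw13, GetLogArgs *a)
{
    uint32_t numdl = dw10 >> 16;            /* lower 16 bits of NUMD */
    uint32_t numdu = dw11 & 0xffff;         /* upper 16 bits of NUMD */
    uint64_t numd = ((uint64_t)numdu << 16) | numdl;  /* 0's based dwords */

    a->lid = dw10 & 0xff;
    a->rae = (dw10 >> 15) & 0x1;
    a->len = (numd + 1) << 2;               /* dwords -> bytes */
    a->off = ((uint64_t)dw13 << 32) | dw12; /* LPOU:LPOL */

    return (a->off & 0x3) == 0;
}
```

For example, NUMDL = 3FFh with NUMDU = 0 requests 1024 dwords, that is, 4096 bytes.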
[PATCH v2 11/20] nvme: add missing mandatory features
Add support for returning a reasonable response to Get/Set Features for the mandatory features. Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 51 --- hw/block/trace-events | 2 ++ include/block/nvme.h | 3 ++- 3 files changed, 52 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 32381d7df655..e7d46dcc6afe 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1007,12 +1007,24 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd) static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) { uint32_t dw10 = le32_to_cpu(cmd->cdw10); +uint32_t dw11 = le32_to_cpu(cmd->cdw11); uint32_t result; +trace_nvme_getfeat(dw10); + switch (dw10) { +case NVME_ARBITRATION: +result = cpu_to_le32(n->features.arbitration); +break; +case NVME_POWER_MANAGEMENT: +result = cpu_to_le32(n->features.power_mgmt); +break; case NVME_TEMPERATURE_THRESHOLD: result = cpu_to_le32(n->features.temp_thresh); break; +case NVME_ERROR_RECOVERY: +result = cpu_to_le32(n->features.err_rec); +break; case NVME_VOLATILE_WRITE_CACHE: result = blk_enable_write_cache(n->conf.blk); trace_nvme_getfeat_vwcache(result ? 
"enabled" : "disabled"); @@ -1024,6 +1036,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) break; case NVME_TIMESTAMP: return nvme_get_feature_timestamp(n, cmd); +case NVME_INTERRUPT_COALESCING: +result = cpu_to_le32(n->features.int_coalescing); +break; +case NVME_INTERRUPT_VECTOR_CONF: +if ((dw11 & 0x) > n->params.num_queues) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +result = cpu_to_le32(n->features.int_vector_config[dw11 & 0x]); +break; +case NVME_WRITE_ATOMICITY: +result = cpu_to_le32(n->features.write_atomicity); +break; case NVME_ASYNCHRONOUS_EVENT_CONF: result = cpu_to_le32(n->features.async_config); break; @@ -1059,6 +1084,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) uint32_t dw10 = le32_to_cpu(cmd->cdw10); uint32_t dw11 = le32_to_cpu(cmd->cdw11); +trace_nvme_setfeat(dw10, dw11); + switch (dw10) { case NVME_TEMPERATURE_THRESHOLD: n->features.temp_thresh = dw11; @@ -1086,6 +1113,13 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) case NVME_ASYNCHRONOUS_EVENT_CONF: n->features.async_config = dw11; break; +case NVME_ARBITRATION: +case NVME_POWER_MANAGEMENT: +case NVME_ERROR_RECOVERY: +case NVME_INTERRUPT_COALESCING: +case NVME_INTERRUPT_VECTOR_CONF: +case NVME_WRITE_ATOMICITY: +return NVME_FEAT_NOT_CHANGABLE | NVME_DNR; default: trace_nvme_err_invalid_setfeat(dw10); return NVME_INVALID_FIELD | NVME_DNR; @@ -1709,6 +1743,14 @@ static void nvme_init_state(NvmeCtrl *n) n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); n->temperature = NVME_TEMPERATURE; n->features.temp_thresh = 0x14d; +n->features.int_vector_config = g_malloc0_n(n->params.num_queues, +sizeof(*n->features.int_vector_config)); + +/* disable coalescing (not supported) */ +for (int i = 0; i < n->params.num_queues; i++) { +n->features.int_vector_config[i] = i | (1 << 16); +} + n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1); } @@ -1786,15 +1828,17 @@ static void 
nvme_init_ctrl(NvmeCtrl *n) id->nn = cpu_to_le32(n->num_namespaces); id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP); + +if (blk_enable_write_cache(n->conf.blk)) { +id->vwc = 1; +} + strcpy((char *) id->subnqn, "nqn.2019-08.org.qemu:"); pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial); id->psd[0].mp = cpu_to_le16(0x9c4); id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); -if (blk_enable_write_cache(n->conf.blk)) { -id->vwc = 1; -} n->bar.cap = 0; NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); @@ -1866,6 +1910,7 @@ static void nvme_exit(PCIDevice *pci_dev) g_free(n->sq); g_free(n->elpes); g_free(n->aer_reqs); +g_free(n->features.int_vector_config); if (n->params.cmb_size_mb) { g_free(n->cmbuf); diff --git a/hw/block/trace-events b/hw/block/trace-events index 6ddb13d34061..a20a68d85d5a 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -41,6 +41,8 @@ nvme_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16"" nvme_identify_ctrl(void) "identify controller" nvme_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16"" nvme_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16"" +nvme_getfeat(uint32_t fid) "fid 0x%"PRIx32""
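The initializer `i | (1 << 16)` stores, per vector, the vector number in bits 15:0 and sets bit 16 (Coalescing Disable) of the Interrupt Vector Configuration value. A minimal sketch of encoding and looking up such values, with hypothetical helper names. The lookup rejects `iv >= nvectors`, because an array of `nvectors` entries only has valid indices `0 .. nvectors - 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IVC_CD (1u << 16)   /* Coalescing Disable */

/* Hypothetical helper: build an Interrupt Vector Configuration value. */
static uint32_t ivc_encode(uint16_t iv, bool coalescing_disabled)
{
    return iv | (coalescing_disabled ? IVC_CD : 0);
}

/* Looks up the stored value for the vector named in CDW11 bits 15:0.
 * Returns false if that vector index is out of range. */
static bool ivc_lookup(const uint32_t *cfg, unsigned nvectors, uint32_t dw11,
                       uint32_t *result)
{
    uint16_t iv = dw11 & 0xffff;

    if (iv >= nvectors) {   /* valid indices are 0 .. nvectors - 1 */
        return false;
    }
    *result = cfg[iv];
    return true;
}
```

Bits above 15 in CDW11 are ignored by the lookup, so only the vector number selects the entry.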
[PATCH v2 15/20] nvme: add support for scatter gather lists
For now, support the Data Block, Segment and Last Segment descriptor types. See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)"). Signed-off-by: Klaus Jensen --- block/nvme.c | 18 +- hw/block/nvme.c | 380 -- hw/block/trace-events | 3 + include/block/nvme.h | 62 ++- 4 files changed, 398 insertions(+), 65 deletions(-) diff --git a/block/nvme.c b/block/nvme.c index 5be3a39b632e..8825c19c72c2 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -440,7 +440,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp) error_setg(errp, "Cannot map buffer for DMA"); goto out; } -cmd.prp1 = cpu_to_le64(iova); +cmd.dptr.prp.prp1 = cpu_to_le64(iova); if (nvme_cmd_sync(bs, s->queues[0], )) { error_setg(errp, "Failed to identify controller"); @@ -529,7 +529,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp) } cmd = (NvmeCmd) { .opcode = NVME_ADM_CMD_CREATE_CQ, -.prp1 = cpu_to_le64(q->cq.iova), +.dptr.prp.prp1 = cpu_to_le64(q->cq.iova), .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)), .cdw11 = cpu_to_le32(0x3), }; @@ -540,7 +540,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp) } cmd = (NvmeCmd) { .opcode = NVME_ADM_CMD_CREATE_SQ, -.prp1 = cpu_to_le64(q->sq.iova), +.dptr.prp.prp1 = cpu_to_le64(q->sq.iova), .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)), .cdw11 = cpu_to_le32(0x1 | (n << 16)), }; @@ -889,16 +889,16 @@ try_map: case 0: abort(); case 1: -cmd->prp1 = pagelist[0]; -cmd->prp2 = 0; +cmd->dptr.prp.prp1 = pagelist[0]; +cmd->dptr.prp.prp2 = 0; break; case 2: -cmd->prp1 = pagelist[0]; -cmd->prp2 = pagelist[1]; +cmd->dptr.prp.prp1 = pagelist[0]; +cmd->dptr.prp.prp2 = pagelist[1]; break; default: -cmd->prp1 = pagelist[0]; -cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t)); +cmd->dptr.prp.prp1 = pagelist[0]; +cmd->dptr.prp.prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t)); break; } trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries); diff --git a/hw/block/nvme.c 
b/hw/block/nvme.c index f4b9bd36a04e..0a5cd079df9a 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -296,6 +296,198 @@ unmap: return status; } +static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg, +NvmeSglDescriptor *segment, uint64_t nsgld, uint32_t *len, +NvmeRequest *req) +{ +dma_addr_t addr, trans_len; + +for (int i = 0; i < nsgld; i++) { +if (NVME_SGL_TYPE(segment[i].type) != SGL_DESCR_TYPE_DATA_BLOCK) { +trace_nvme_err_invalid_sgl_descriptor(req->cid, +NVME_SGL_TYPE(segment[i].type)); +return NVME_SGL_DESCRIPTOR_TYPE_INVALID | NVME_DNR; +} + +if (*len == 0) { +if (!NVME_CTRL_SGLS_EXCESS_LENGTH(n->id_ctrl.sgls)) { +trace_nvme_err_invalid_sgl_excess_length(req->cid); +return NVME_DATA_SGL_LENGTH_INVALID | NVME_DNR; +} + +break; +} + +addr = le64_to_cpu(segment[i].addr); +trans_len = MIN(*len, le64_to_cpu(segment[i].len)); + +if (nvme_addr_is_cmb(n, addr)) { +/* + * All data and metadata, if any, associated with a particular + * command shall be located in either the CMB or host memory. Thus, + * if an address is found to be in the CMB and we have already + * mapped data that is in host memory, the use is invalid. + */ +if (!nvme_req_is_cmb(req) && qsg->size) { +return NVME_INVALID_USE_OF_CMB | NVME_DNR; +} + +nvme_req_set_cmb(req); +} else { +/* + * Similarly, if the address does not reference the CMB, but we + * have already established that the request has data or metadata + * in the CMB, the use is invalid.
+ */ +if (nvme_req_is_cmb(req)) { +return NVME_INVALID_USE_OF_CMB | NVME_DNR; +} +} + +qemu_sglist_add(qsg, addr, trans_len); + +*len -= trans_len; +} + +return NVME_SUCCESS; +} + +static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, +NvmeSglDescriptor sgl, uint32_t len, NvmeRequest *req) +{ +const int MAX_NSGLD = 256; + +NvmeSglDescriptor segment[MAX_NSGLD]; +uint64_t nsgld; +uint16_t status; +bool sgl_in_cmb = false; +hwaddr addr = le64_to_cpu(sgl.addr); + +trace_nvme_map_sgl(req->cid, NVME_SGL_TYPE(sgl.type), req->nlb, len); + +pci_dma_sglist_init(qsg, &n->parent_obj, 1); + +/* + * If the entire transfer can be described with a
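The data-block walk above does three things: it rejects non-data-block descriptors, it refuses to mix CMB and host addresses within one command, and it consumes descriptors until the requested length is covered. A standalone sketch of those rules over a flat array, with hypothetical names and a hypothetical CMB window `[cmb_lo, cmb_hi)`. The descriptor type sits in bits 7:4 of the SGL identifier byte, and Data Block is type 0h. Unlike the patch, which consults the SGLS excess-length capability, this sketch simply ignores surplus descriptors once the length is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified 16-byte SGL descriptor: address, length, reserved, identifier. */
typedef struct SglDesc {
    uint64_t addr;
    uint32_t len;
    uint8_t  rsvd[3];
    uint8_t  type;          /* bits 7:4 type, bits 3:0 sub type */
} SglDesc;

#define SGL_TYPE(t)    ((t) >> 4)
#define SGL_DATA_BLOCK 0x0

typedef enum { MAP_OK, MAP_BAD_TYPE, MAP_MIXED_CMB, MAP_TOO_SHORT } MapStatus;

static bool in_cmb(uint64_t addr, uint64_t cmb_lo, uint64_t cmb_hi)
{
    return addr >= cmb_lo && addr < cmb_hi;
}

/* Walks nd data block descriptors until len bytes are covered;
 * *mapped receives the number of bytes mapped. */
static MapStatus map_data_blocks(const SglDesc *d, unsigned nd, uint32_t len,
                                 uint64_t cmb_lo, uint64_t cmb_hi,
                                 uint32_t *mapped)
{
    int cmb = -1;           /* -1: undecided, 0: host memory, 1: CMB */

    *mapped = 0;
    for (unsigned i = 0; i < nd && len > 0; i++) {
        if (SGL_TYPE(d[i].type) != SGL_DATA_BLOCK) {
            return MAP_BAD_TYPE;
        }
        int here = in_cmb(d[i].addr, cmb_lo, cmb_hi);
        if (cmb != -1 && here != cmb) {
            return MAP_MIXED_CMB;   /* all data in CMB or all in host memory */
        }
        cmb = here;
        uint32_t n = d[i].len < len ? d[i].len : len;
        *mapped += n;
        len -= n;
    }
    return len == 0 ? MAP_OK : MAP_TOO_SHORT;
}
```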
Re: [PULL 0/1] Block patches
On Mon, 14 Oct 2019 at 09:52, Stefan Hajnoczi wrote: > > The following changes since commit 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d: > > Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' > into staging (2019-10-08 16:08:35 +0100) > > are available in the Git repository at: > > https://github.com/stefanha/qemu.git tags/block-pull-request > > for you to fetch changes up to 69de48445a0d6169f1e2a6c5bfab994e1c810e33: > > test-bdrv-drain: fix iothread_join() hang (2019-10-14 09:48:01 +0100) > > > Pull request > > > > Stefan Hajnoczi (1): > test-bdrv-drain: fix iothread_join() hang > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2 for any user-visible changes. -- PMM
[PATCH v2 01/20] nvme: remove superfluous breaks
These break statements were left over when commit 3036a626e9ef ("nvme: add Get/Set Feature Timestamp support") was merged. Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 4 1 file changed, 4 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 12d825425016..c06e3ca31905 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -788,7 +788,6 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) break; case NVME_TIMESTAMP: return nvme_get_feature_timestamp(n, cmd); -break; default: trace_nvme_err_invalid_getfeat(dw10); return NVME_INVALID_FIELD | NVME_DNR; @@ -832,11 +831,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) req->cqe.result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16)); break; - case NVME_TIMESTAMP: return nvme_set_feature_timestamp(n, cmd); -break; - default: trace_nvme_err_invalid_setfeat(dw10); return NVME_INVALID_FIELD | NVME_DNR; -- 2.23.0
[PATCH v2 05/20] nvme: allow completion queues in the cmb
Allow completion queues in the controller memory buffer. This also inlines the nvme_addr_{read,write} functions and adds an nvme_addr_is_cmb helper. Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 38 +- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 16f0fba10b08..daa2367b0863 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -52,14 +52,34 @@ static void nvme_process_sq(void *opaque); -static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size) +static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { -if (n->cmbsz && addr >= n->ctrl_mem.addr && -addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) { -memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size); -} else { -pci_dma_read(&n->parent_obj, addr, buf, size); +hwaddr low = n->ctrl_mem.addr; +hwaddr hi = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size); + +return addr >= low && addr < hi; +} + +static inline void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, +int size) +{ +if (n->cmbsz && nvme_addr_is_cmb(n, addr)) { +memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size); +return; } + +pci_dma_read(&n->parent_obj, addr, buf, size); +} + +static inline void nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf, +int size) +{ +if (n->cmbsz && nvme_addr_is_cmb(n, addr)) { +memcpy((void *) &n->cmbuf[addr - n->ctrl_mem.addr], buf, size); +return; +} + +pci_dma_write(&n->parent_obj, addr, buf, size); } static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid) @@ -281,6 +301,7 @@ static void nvme_post_cqes(void *opaque) QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) { NvmeSQueue *sq; +NvmeCqe *cqe = &req->cqe; hwaddr addr; if (nvme_cq_full(cq)) { @@ -294,8 +315,7 @@ static void nvme_post_cqes(void *opaque) req->cqe.sq_head = cpu_to_le16(sq->head); addr = cq->dma_addr + cq->tail * n->cqe_size; nvme_inc_cq_tail(cq); -pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe, -sizeof(req->cqe)); +nvme_addr_write(n, addr, 
(void *) cqe, sizeof(*cqe)); 
QTAILQ_INSERT_TAIL(&sq->req_list, req, entry); } if (cq->tail != cq->head) { @@ -1401,7 +1421,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0); NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1); -NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0); +NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 1); NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0); NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1); NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1); -- 2.23.0
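`nvme_addr_is_cmb` tests a half-open window `[low, hi)` against the start address only. A minimal sketch of a variant that checks the whole access `[addr, addr + len)` instead and serves it from the device-side buffer, using hypothetical names (`Cmb`, `cmb_contains`, `cmb_read`). The check subtracts the base first rather than computing `addr + len`, so it cannot overflow near the top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CMB description. */
typedef struct Cmb {
    uint64_t base;   /* address at which the CMB is mapped */
    uint64_t size;   /* bytes; 0 means no CMB (like cmbsz == 0) */
    uint8_t *buf;    /* device-side backing memory */
} Cmb;

/* True if [addr, addr + len) lies entirely within the CMB window. */
static bool cmb_contains(const Cmb *c, uint64_t addr, uint64_t len)
{
    if (c->size == 0 || addr < c->base) {
        return false;
    }
    uint64_t off = addr - c->base;
    return off < c->size && len <= c->size - off;
}

/* Copies from the CMB if the whole range is inside it; otherwise returns
 * false so a caller can fall back to DMA from host memory. */
static bool cmb_read(const Cmb *c, uint64_t addr, void *dst, uint64_t len)
{
    if (!cmb_contains(c, addr, len)) {
        return false;
    }
    memcpy(dst, c->buf + (addr - c->base), len);
    return true;
}
```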
[PATCH v2 12/20] nvme: bump supported specification version to 1.3
Add the new Namespace Identification Descriptor List (CNS 03h) and track creation of queues to enable the controller to return Command Sequence Error if Set Features is called for Number of Queues after any queues have been created. Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 82 +++ hw/block/nvme.h | 1 + hw/block/trace-events | 8 +++-- include/block/nvme.h | 30 +--- 4 files changed, 100 insertions(+), 21 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index e7d46dcc6afe..1e2320b38b14 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -9,20 +9,22 @@ */ /** - * Reference Specification: NVM Express 1.2.1 + * Reference Specification: NVM Express 1.3d * * https://nvmexpress.org/resources/specifications/ */ /** * Usage: add options: - * -drive file=,if=none,id= - * -device nvme,drive=,serial=,id=, \ - * cmb_size_mb=, \ - * num_queues= + * -drive file=,if=none,id= + * -device nvme,drive=,serial=,id= * - * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at - * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. + * Advanced optional options: + * + * num_queues= : Maximum number of IO Queues. + * Default: 64 + * cmb_size_mb= : Size of Controller Memory Buffer in MBs. 
+ * Default: 0 (disabled) */ #include "qemu/osdep.h" @@ -345,6 +347,8 @@ static void nvme_post_cqes(void *opaque) static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req) { assert(cq->cqid == req->sq->cqid); + +trace_nvme_enqueue_req_completion(req->cid, cq->cqid, req->status); QTAILQ_REMOVE(>sq->out_req_list, req, entry); QTAILQ_INSERT_TAIL(>req_list, req, entry); timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500); @@ -530,6 +534,7 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n) if (sq->sqid) { g_free(sq); } +n->qs_created--; } static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd) @@ -596,6 +601,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr, cq = n->cq[cqid]; QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry); n->sq[sqid] = sq; +n->qs_created++; } static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd) @@ -742,7 +748,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) uint32_t dw11 = le32_to_cpu(cmd->cdw11); uint32_t dw12 = le32_to_cpu(cmd->cdw12); uint32_t dw13 = le32_to_cpu(cmd->cdw13); -uint16_t lid = dw10 & 0xff; +uint8_t lid = dw10 & 0xff; +uint8_t lsp = (dw10 >> 8) & 0xf; uint8_t rae = (dw10 >> 15) & 0x1; uint32_t numdl, numdu; uint64_t off, lpol, lpou; @@ -760,7 +767,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -trace_nvme_get_log(req->cid, lid, rae, len, off); +trace_nvme_get_log(req->cid, lid, lsp, rae, len, off); switch (lid) { case NVME_LOG_ERROR_INFO: @@ -784,6 +791,7 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n) if (cq->cqid) { g_free(cq); } +n->qs_created--; } static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd) @@ -824,6 +832,7 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr, msix_vector_use(>parent_obj, cq->vector); n->cq[cqid] = cq; cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq); +n->qs_created++; } static uint16_t 
nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd) @@ -897,7 +906,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c) prp1, prp2); } -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c) +static uint16_t nvme_identify_ns_list(NvmeCtrl *n, NvmeIdentify *c) { static const int data_len = 4 * KiB; uint32_t min_nsid = le32_to_cpu(c->nsid); @@ -907,7 +916,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c) uint16_t ret; int i, j = 0; -trace_nvme_identify_nslist(min_nsid); +trace_nvme_identify_ns_list(min_nsid); list = g_malloc0(data_len); for (i = 0; i < n->num_namespaces; i++) { @@ -924,6 +933,41 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c) return ret; } +static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeCmd *c) +{ +static const int len = 4096; + +struct ns_descr { +uint8_t nidt; +uint8_t nidl; +uint8_t rsvd2[2]; +uint8_t nid[16]; +}; + +uint32_t nsid = le32_to_cpu(c->nsid); +uint64_t prp1 = le64_to_cpu(c->prp1); +uint64_t prp2 = le64_to_cpu(c->prp2); + +struct ns_descr *list; +uint16_t ret; + +trace_nvme_identify_ns_descr_list(nsid); + +if (unlikely(nsid == 0 || nsid > n->num_namespaces)) { +
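The `struct ns_descr` above is a 4-byte header (NIDT, NIDL, two reserved bytes) followed by the identifier. CNS 03h returns a 4096-byte list of such descriptors, and a zero NIDL marks the end. A minimal sketch of building and walking such a list, with hypothetical helper names. The identifier types follow the specification: 1h is IEEE EUI-64 (8 bytes), 2h is NGUID (16 bytes), and 3h is UUID (16 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NS_DESCR_LIST_LEN 4096
#define NIDT_EUI64 0x1   /* NIDL 8 */
#define NIDT_NGUID 0x2   /* NIDL 16 */
#define NIDT_UUID  0x3   /* NIDL 16 */

/* Appends one descriptor at *pos; returns 0 on success, -1 if full. */
static int ns_descr_append(uint8_t *list, size_t *pos, uint8_t nidt,
                           const uint8_t *nid, uint8_t nidl)
{
    if (*pos + 4 + nidl > NS_DESCR_LIST_LEN) {
        return -1;
    }
    list[*pos + 0] = nidt;
    list[*pos + 1] = nidl;
    list[*pos + 2] = 0;      /* reserved */
    list[*pos + 3] = 0;      /* reserved */
    memcpy(&list[*pos + 4], nid, nidl);
    *pos += 4 + (size_t)nidl;
    return 0;
}

/* Counts descriptors up to the first one with NIDL == 0 (the zero-filled
 * remainder of the buffer terminates the list). */
static unsigned ns_descr_count(const uint8_t *list)
{
    size_t pos = 0;
    unsigned n = 0;

    while (pos + 4 <= NS_DESCR_LIST_LEN && list[pos + 1] != 0) {
        pos += 4 + (size_t)list[pos + 1];
        n++;
    }
    return n;
}
```

A single UUID descriptor therefore occupies 20 bytes, which is the size of the patch's `struct ns_descr`.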
[PATCH v2 18/20] nvme: remove redundant NvmeCmd pointer parameter
The command struct is available in the NvmeRequest that we generally
pass around anyway.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 219 +++-
 1 file changed, 106 insertions(+), 113 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bcd801c345b6..67f92bf5a3ac 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -574,14 +574,14 @@ static uint16_t nvme_dma_write_sgl(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
 }
 
 static uint16_t nvme_dma_write(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-    NvmeCmd *cmd, NvmeRequest *req)
+    NvmeRequest *req)
 {
-    if (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
-        return nvme_dma_write_sgl(n, ptr, len, cmd->dptr.sgl, req);
+    if (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+        return nvme_dma_write_sgl(n, ptr, len, req->cmd.dptr.sgl, req);
     }
 
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
+    uint64_t prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
+    uint64_t prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
 
     return nvme_dma_write_prp(n, ptr, len, prp1, prp2, req);
 }
@@ -624,7 +624,7 @@ out:
 }
 
 static uint16_t nvme_dma_read_sgl(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-    NvmeSglDescriptor sgl, NvmeCmd *cmd, NvmeRequest *req)
+    NvmeSglDescriptor sgl, NvmeRequest *req)
 {
     QEMUSGList qsg;
     uint16_t err = NVME_SUCCESS;
@@ -662,29 +662,29 @@ out:
 }
 
 static uint16_t nvme_dma_read(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-    NvmeCmd *cmd, NvmeRequest *req)
+    NvmeRequest *req)
 {
-    if (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
-        return nvme_dma_read_sgl(n, ptr, len, cmd->dptr.sgl, cmd, req);
+    if (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+        return nvme_dma_read_sgl(n, ptr, len, req->cmd.dptr.sgl, req);
     }
 
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
+    uint64_t prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
+    uint64_t prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
 
     return nvme_dma_read_prp(n, ptr, len, prp1, prp2, req);
 }
 
-static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_map(NvmeCtrl *n, NvmeRequest *req)
 {
     uint32_t len = req->nlb << nvme_ns_lbads(req->ns);
     uint64_t prp1, prp2;
 
-    if (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
-        return nvme_map_sgl(n, >qsg, cmd->dptr.sgl, len, req);
+    if (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+        return nvme_map_sgl(n, >qsg, req->cmd.dptr.sgl, len, req);
     }
 
-    prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
-    prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
+    prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
+    prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
 
     return nvme_map_prp(n, >qsg, prp1, prp2, len, req);
 }
@@ -1045,7 +1045,7 @@ static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeNamespace *ns = req->ns;
 
@@ -1057,12 +1057,12 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeAIO *aio;
     NvmeNamespace *ns = req->ns;
-    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+    NvmeRwCmd *rw = (NvmeRwCmd *) >cmd;
     int64_t offset;
     size_t count;
@@ -1092,9 +1092,9 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+    NvmeRwCmd *rw = (NvmeRwCmd *) >cmd;
     NvmeNamespace *ns = req->ns;
     int status;
 
@@ -1114,7 +1114,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return status;
     }
 
-    status = nvme_map(n, cmd, req);
+    status = nvme_map(n, req);
     if (status) {
         block_acct_invalid(blk_get_stats(ns->conf.blk), acct);
         return status;
@@ -1126,11 +1126,12 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
-    uint32_t nsid = le32_to_cpu(cmd->nsid);
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 
-    trace_nvme_io_cmd(req->cid, nsid, le16_to_cpu(req->sq->sqid), cmd->opcode);
+    trace_nvme_io_cmd(req->cid, nsid, le16_to_cpu(req->sq->sqid),
+                      req->cmd.opcode);
 
     req->ns = nvme_ns(n, nsid);
 
@@ -1139,16 +1140,16 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return NVME_INVALID_NSID | NVME_DNR;
     }
-
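The pattern in this patch (handlers take only the request and read the command through it) can be illustrated with a minimal sketch. SketchCmd, SketchRequest and SKETCH_CMD_FLAGS_PSDT are hypothetical, heavily simplified stand-ins, not QEMU's definitions. The PSDT extraction assumes the NVMe layout in which PSDT occupies bits 15:14 of command dword 0, i.e. bits 7:6 of the flags byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real NvmeCmd/NvmeRequest carry many more fields. */
typedef struct SketchCmd {
    uint8_t  opcode;
    uint8_t  flags;   /* CDW0 bits 15:8: FUSE in bits 1:0, PSDT in bits 7:6 */
    uint32_t nsid;
} SketchCmd;

typedef struct SketchRequest {
    uint16_t  cid;
    SketchCmd cmd;    /* the request owns a copy of the submission entry */
} SketchRequest;

#define SKETCH_CMD_FLAGS_PSDT(flags) (((flags) >> 6) & 0x3)

/* Before the refactor: f(n, cmd, req). After: f(n, req), reading req->cmd. */
static bool sketch_uses_sgl(const SketchRequest *req)
{
    return SKETCH_CMD_FLAGS_PSDT(req->cmd.flags) != 0;
}
```

Dropping the separate pointer removes the chance of a handler being called with a cmd that does not belong to req.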
[PATCH v2 14/20] nvme: allow multiple aios per command
This refactors how the device issues asynchronous block backend
requests. The NvmeRequest now holds a queue of NvmeAIOs that are
associated with the command. This allows multiple aios to be issued for
a command. Only when all requests have been completed will the device
post a completion queue entry.

Because the device is currently guaranteed to only issue a single aio
request per command, the benefit is not immediately obvious. But this
functionality is required to support metadata.

Signed-off-by: Klaus Jensen
Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c       | 455 +-
 hw/block/nvme.h       | 165 ---
 hw/block/trace-events |   8 +
 3 files changed, 511 insertions(+), 117 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cbc0b6a660b6..f4b9bd36a04e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,8 @@
  *      Default: 64
  *   cmb_size_mb= : Size of Controller Memory Buffer in MBs.
  *      Default: 0 (disabled)
+ *   mdts= : Maximum Data Transfer Size (power of two)
+ *      Default: 7
  */
 
 #include "qemu/osdep.h"
@@ -56,6 +58,7 @@
     } while (0)
 
 static void nvme_process_sq(void *opaque);
+static void nvme_aio_cb(void *opaque, int ret);
 
 static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 {
@@ -197,7 +200,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
     }
 
     if (nvme_addr_is_cmb(n, prp1)) {
-        req->is_cmb = true;
+        nvme_req_set_cmb(req);
     }
 
     pci_dma_sglist_init(qsg, >parent_obj, num_prps);
@@ -255,8 +258,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
                 }
 
                 addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
-                if ((req->is_cmb && !addr_is_cmb) ||
-                    (!req->is_cmb && addr_is_cmb)) {
+                if ((nvme_req_is_cmb(req) && !addr_is_cmb) ||
+                    (!nvme_req_is_cmb(req) && addr_is_cmb)) {
                     status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
                     goto unmap;
                 }
@@ -269,8 +272,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
             }
         } else {
             bool addr_is_cmb = nvme_addr_is_cmb(n, prp2);
-            if ((req->is_cmb && !addr_is_cmb) ||
-                (!req->is_cmb && addr_is_cmb)) {
+            if ((nvme_req_is_cmb(req) && !addr_is_cmb) ||
+                (!nvme_req_is_cmb(req) && addr_is_cmb)) {
                 status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
                 goto unmap;
             }
@@ -312,7 +315,7 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
         return status;
     }
 
-    if (req->is_cmb) {
+    if (nvme_req_is_cmb(req)) {
         QEMUIOVector iov;
 
         qemu_iovec_init(, qsg.nsg);
@@ -341,19 +344,18 @@ static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
 static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
     uint64_t prp1, uint64_t prp2, NvmeRequest *req)
 {
-    QEMUSGList qsg;
     uint16_t status = NVME_SUCCESS;
 
-    status = nvme_map_prp(n, , prp1, prp2, len, req);
+    status = nvme_map_prp(n, >qsg, prp1, prp2, len, req);
     if (status) {
         return status;
     }
 
-    if (req->is_cmb) {
+    if (nvme_req_is_cmb(req)) {
         QEMUIOVector iov;
 
-        qemu_iovec_init(, qsg.nsg);
-        dma_to_cmb(n, , );
+        qemu_iovec_init(, req->qsg.nsg);
+        dma_to_cmb(n, >qsg, );
 
         if (unlikely(qemu_iovec_from_buf(, 0, ptr, len) != len)) {
             trace_nvme_err_invalid_dma();
@@ -365,17 +367,137 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
         goto out;
     }
 
-    if (unlikely(dma_buf_read(ptr, len, ))) {
+    if (unlikely(dma_buf_read(ptr, len, >qsg))) {
         trace_nvme_err_invalid_dma();
         status = NVME_INVALID_FIELD | NVME_DNR;
     }
 
 out:
-    qemu_sglist_destroy();
+    qemu_sglist_destroy(>qsg);
     return status;
 }
 
+static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+    NvmeNamespace *ns = req->ns;
+
+    uint32_t len = req->nlb << nvme_ns_lbads(ns);
+    uint64_t prp1 = le64_to_cpu(cmd->prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->prp2);
+
+    return nvme_map_prp(n, >qsg, prp1, prp2, len, req);
+}
+
+static void nvme_aio_destroy(NvmeAIO *aio)
+{
+    if (aio->iov.nalloc) {
+        qemu_iovec_destroy(>iov);
+    }
+
+    g_free(aio);
+}
+
+static NvmeAIO *nvme_aio_new(BlockBackend *blk, int64_t offset,
+    QEMUSGList *qsg, NvmeRequest *req, NvmeAIOCompletionFunc *cb)
+{
+    NvmeAIO *aio = g_malloc0(sizeof(*aio));
+
+    *aio = (NvmeAIO) {
+        .blk = blk,
+        .offset = offset,
+        .req = req,
+        .qsg = qsg,
+        .cb = cb,
+    };
+
+    if (qsg &&
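The completion rule described above ("only when all requests have been completed will the device post a completion queue entry") can be sketched with a simple outstanding-aio counter. This is a simplification: the patch keeps a queue of NvmeAIOs on the request. SketchReq and its functions are hypothetical, and the "first error wins" status policy is an assumption of this sketch, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified request; the patch keeps a queue of NvmeAIOs. */
typedef struct SketchReq {
    unsigned outstanding;  /* aios issued but not yet completed */
    uint16_t status;       /* first non-zero status seen (assumed policy) */
    bool     cqe_posted;   /* stands in for posting the completion entry */
} SketchReq;

static void sketch_aio_issue(SketchReq *req)
{
    req->outstanding++;
}

/* Called from each aio's completion callback. */
static void sketch_aio_complete(SketchReq *req, uint16_t status)
{
    assert(req->outstanding > 0 && !req->cqe_posted);

    if (status != 0 && req->status == 0) {
        req->status = status;
    }

    /* Only the last aio to complete posts the CQE. */
    if (--req->outstanding == 0) {
        req->cqe_posted = true;
    }
}
```

One design point the counter makes visible: if an aio can complete before its siblings have been issued, the count can reach zero too early, so all aios must be issued (or an extra reference held) before the first completion can be processed.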
[PATCH v2 20/20] nvme: handle dma errors
Handling DMA errors gracefully is required for the device to pass the
block/011 test ("disable PCI device while doing I/O") in the blktests
suite.

With this patch the device passes the test by retrying "critical"
transfers (posting of completion entries and processing of submission
queue entries).

If DMA errors occur at any other point in the execution of the command
(say, while mapping the PRPs or SGLs), the command is aborted with a
Data Transfer Error status code.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c       | 63 +--
 hw/block/trace-events |  2 ++
 include/block/nvme.h  |  2 +-
 3 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d0103c16cfe9..00c5b843295b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -71,26 +71,26 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
     return addr >= low && addr < hi;
 }
 
-static inline void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf,
+static inline int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf,
     int size)
 {
     if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
         memcpy(buf, (void *) >cmbuf[addr - n->ctrl_mem.addr], size);
-        return;
+        return 0;
     }
 
-    pci_dma_read(>parent_obj, addr, buf, size);
+    return pci_dma_read(>parent_obj, addr, buf, size);
 }
 
-static inline void nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf,
+static inline int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf,
     int size)
 {
     if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
         memcpy((void *) >cmbuf[addr - n->ctrl_mem.addr], buf, size);
-        return;
+        return 0;
     }
 
-    pci_dma_write(>parent_obj, addr, buf, size);
+    return pci_dma_write(>parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
@@ -228,7 +228,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
             nents = (len + n->page_size - 1) >> n->page_bits;
             prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-            nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
+            if (nvme_addr_read(n, prp2, (void *) prp_list, prp_trans)) {
+                trace_nvme_err_addr_read((void *) prp2);
+                status = NVME_DATA_TRANSFER_ERROR;
+                goto unmap;
+            }
             while (len != 0) {
                 bool addr_is_cmb;
                 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
@@ -250,7 +254,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
                     i = 0;
                     nents = (len + n->page_size - 1) >> n->page_bits;
                     prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-                    nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans);
+                    if (nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans)) {
+                        trace_nvme_err_addr_read((void *) prp_ent);
+                        status = NVME_DATA_TRANSFER_ERROR;
+                        goto unmap;
+                    }
                     prp_ent = le64_to_cpu(prp_list[i]);
                 }
 
@@ -402,7 +410,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
 
         /* read the segment in chunks of 256 descriptors (4k) */
         while (nsgld > MAX_NSGLD) {
-            nvme_addr_read(n, addr, segment, sizeof(segment));
+            if (nvme_addr_read(n, addr, segment, sizeof(segment))) {
+                trace_nvme_err_addr_read((void *) addr);
+                status = NVME_DATA_TRANSFER_ERROR;
+                goto unmap;
+            }
 
             status = nvme_map_sgl_data(n, qsg, segment, MAX_NSGLD, , req);
             if (status) {
@@ -413,7 +425,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
             addr += MAX_NSGLD * sizeof(NvmeSglDescriptor);
         }
 
-        nvme_addr_read(n, addr, segment, nsgld * sizeof(NvmeSglDescriptor));
+        if (nvme_addr_read(n, addr, segment, nsgld * sizeof(NvmeSglDescriptor))) {
+            trace_nvme_err_addr_read((void *) addr);
+            status = NVME_DATA_TRANSFER_ERROR;
+            goto unmap;
+        }
 
         sgl = segment[nsgld - 1];
         addr = le64_to_cpu(sgl.addr);
@@ -458,7 +474,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
     nsgld = le64_to_cpu(sgl.len) / sizeof(NvmeSglDescriptor);
 
     while (nsgld > MAX_NSGLD) {
-        nvme_addr_read(n, addr, segment, sizeof(segment));
+        if (nvme_addr_read(n, addr, segment, sizeof(segment))) {
+            trace_nvme_err_addr_read((void *) addr);
+            status = NVME_DATA_TRANSFER_ERROR;
+            goto unmap;
+        }
 
         status = nvme_map_sgl_data(n, qsg, segment, MAX_NSGLD, , req);
         if (status) {
@@ -469,7 +489,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg,
         addr +=
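The split between non-critical transfers (abort with Data Transfer Error) and critical ones (retry) can be sketched as follows. Everything named sketch_* is hypothetical. The status value assumes NVMe generic status 04h (Data Transfer Error). The bounded retry loop is an illustrative assumption only; a real device model would more plausibly reschedule the retry (e.g. on a timer) than spin, and the patch's actual mechanism is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SUCCESS             0x0000
#define SKETCH_DATA_TRANSFER_ERROR 0x0004  /* NVMe generic status 04h */

/* Hypothetical DMA hooks: 0 on success, non-zero on failure, mirroring how
 * nvme_addr_read() now propagates the pci_dma_read() return value. */
typedef int (*SketchDmaRead)(uint64_t addr, void *buf, size_t len);
typedef int (*SketchDmaWrite)(void);

/* Non-critical transfer (e.g. reading a PRP list): abort the command. */
static uint16_t sketch_read_prp_list(SketchDmaRead rd, uint64_t addr,
                                     uint64_t *list, size_t nents)
{
    if (rd(addr, list, nents * sizeof(*list))) {
        return SKETCH_DATA_TRANSFER_ERROR;
    }
    return SKETCH_SUCCESS;
}

/* Critical transfer (e.g. posting a CQE): retry instead of dropping it. */
static int sketch_post_cqe(SketchDmaWrite wr, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (wr() == 0) {
            return 0;
        }
    }
    return -1;
}

/* Test doubles. */
static int sketch_dma_ok(uint64_t addr, void *buf, size_t len)
{
    (void)addr;
    memset(buf, 0, len);
    return 0;
}

static int sketch_dma_fail(uint64_t addr, void *buf, size_t len)
{
    (void)addr;
    (void)buf;
    (void)len;
    return -1;  /* e.g. bus mastering disabled while I/O is in flight */
}

static int sketch_flaky_calls;

static int sketch_flaky_write(void)
{
    return ++sketch_flaky_calls < 3 ? -1 : 0;  /* fails twice, then succeeds */
}
```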
[PATCH v2 16/20] nvme: support multiple namespaces
This adds support for multiple namespaces by introducing a new 'nvme-ns'
device model. The nvme device creates a bus named from the device name
('id'). The nvme-ns devices then connect to this and register themselves
with the nvme device.

This changes how an nvme device is created. Example with two namespaces:

  -drive file=nvme0n1.img,if=none,id=disk1
  -drive file=nvme0n2.img,if=none,id=disk2
  -device nvme,serial=deadbeef,id=nvme0
  -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
  -device nvme-ns,drive=disk2,bus=nvme0,nsid=2

The drive property is kept on the nvme device to keep the change
backward compatible, but the property is now optional. Specifying a
drive for the nvme device will always create the namespace with nsid 1.

Signed-off-by: Klaus Jensen
Signed-off-by: Klaus Jensen
---
 hw/block/Makefile.objs |   2 +-
 hw/block/nvme-ns.c     | 139 +++
 hw/block/nvme-ns.h     |  58 +++
 hw/block/nvme.c        | 212 +
 hw/block/nvme.h        |  51 +-
 hw/block/trace-events  |   5 +-
 6 files changed, 352 insertions(+), 115 deletions(-)
 create mode 100644 hw/block/nvme-ns.c
 create mode 100644 hw/block/nvme-ns.h

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index f5f643f0cc06..d44a2f4b780d 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,7 +7,7 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
+common-obj-$(CONFIG_NVME_PCI) += nvme.o nvme-ns.o
 
 obj-$(CONFIG_SH4) += tc58128.o
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
new file mode 100644
index ..aa76bb63ef45
--- /dev/null
+++ b/hw/block/nvme-ns.c
@@ -0,0 +1,139 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "hw/block/block.h"
+#include "hw/pci/msix.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+
+#include "nvme.h"
+#include "nvme-ns.h"
+
+static int nvme_ns_init(NvmeNamespace *ns)
+{
+    NvmeIdNs *id_ns = >id_ns;
+
+    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+    id_ns->nuse = id_ns->ncap = id_ns->nsze =
+        cpu_to_le64(nvme_ns_nlbas(ns));
+
+    return 0;
+}
+
+static int nvme_ns_init_blk(NvmeNamespace *ns, NvmeIdCtrl *id, Error **errp)
+{
+    blkconf_blocksizes(>conf);
+
+    if (!blkconf_apply_backend_options(>conf,
+        blk_is_read_only(ns->conf.blk), false, errp)) {
+        return 1;
+    }
+
+    ns->size = blk_getlength(ns->conf.blk);
+    if (ns->size < 0) {
+        error_setg_errno(errp, -ns->size, "blk_getlength");
+        return 1;
+    }
+
+    if (!blk_enable_write_cache(ns->conf.blk)) {
+        id->vwc = 0;
+    }
+
+    return 0;
+}
+
+static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
+{
+    if (!ns->conf.blk) {
+        error_setg(errp, "block backend not configured");
+        return 1;
+    }
+
+    return 0;
+}
+
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (nvme_ns_check_constraints(ns, _err)) {
+        error_propagate_prepend(errp, local_err,
+            "nvme_ns_check_constraints: ");
+        return 1;
+    }
+
+    if (nvme_ns_init_blk(ns, >id_ctrl, _err)) {
+        error_propagate_prepend(errp, local_err, "nvme_ns_init_blk: ");
+        return 1;
+    }
+
+    nvme_ns_init(ns);
+    if (nvme_register_namespace(n, ns, _err)) {
+        error_propagate_prepend(errp, local_err, "nvme_register_namespace: ");
+        return 1;
+    }
+
+    return 0;
+}
+
+static void nvme_ns_realize(DeviceState *dev, Error **errp)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    BusState *s = qdev_get_parent_bus(dev);
+    NvmeCtrl *n = NVME(s->parent);
+    Error *local_err = NULL;
+
+    if (nvme_ns_setup(n, ns, _err)) {
+        error_propagate_prepend(errp, local_err, "nvme_ns_setup: ");
+        return;
+    }
+}
+
+static Property nvme_ns_props[] = {
+    DEFINE_BLOCK_PROPERTIES(NvmeNamespace, conf),
+    DEFINE_NVME_NS_PROPERTIES(NvmeNamespace, params),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void nvme_ns_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+    dc->bus_type = TYPE_NVME_BUS;
+    dc->realize = nvme_ns_realize;
+    dc->props = nvme_ns_props;
+    dc->desc = "virtual nvme namespace";
+}
+
+static void nvme_ns_instance_init(Object *obj)
+{
+    NvmeNamespace *ns = NVME_NS(obj);
+    char *bootindex = g_strdup_printf("/namespace@%d,0", ns->params.nsid);
+
+    device_add_bootindex_property(obj, >conf.bootindex, "bootindex",
+        bootindex, DEVICE(obj), _abort);
+
+    g_free(bootindex);
+}
+
+static const TypeInfo nvme_ns_info = {
+    .name = TYPE_NVME_NS,
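The body of nvme_register_namespace() is not part of this excerpt. The following is a hypothetical sketch of what a controller-side namespace registry of this kind might look like: nsid-indexed slots, rejection of NSID 0 (not a valid namespace ID in NVMe) and of duplicates. The 256-slot limit and all sketch_* names are illustrative assumptions. How the patch treats the nsid property's default of 0 is not shown here, so this sketch simply rejects it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NAMESPACES 256  /* illustrative limit, not QEMU's */

typedef struct SketchNs {
    uint32_t nsid;
} SketchNs;

typedef struct SketchCtrl {
    SketchNs *namespaces[SKETCH_MAX_NAMESPACES];  /* slot nsid - 1 */
    uint32_t  nn;  /* highest registered nsid (cf. Identify Controller NN) */
} SketchCtrl;

static int sketch_register_namespace(SketchCtrl *n, SketchNs *ns)
{
    uint32_t nsid = ns->nsid;

    /* NSID 0 is not a valid namespace ID. */
    if (nsid == 0 || nsid > SKETCH_MAX_NAMESPACES) {
        return -1;
    }
    if (n->namespaces[nsid - 1] != NULL) {
        return -1;  /* nsid already in use */
    }

    n->namespaces[nsid - 1] = ns;
    if (nsid > n->nn) {
        n->nn = nsid;
    }
    return 0;
}

static SketchNs *sketch_ns_lookup(const SketchCtrl *n, uint32_t nsid)
{
    if (nsid == 0 || nsid > SKETCH_MAX_NAMESPACES) {
        return NULL;
    }
    return n->namespaces[nsid - 1];
}
```

With the two-namespace command line from the commit message, the two nvme-ns devices would land in slots 0 and 1. The backward-compatible drive= on the nvme device would occupy nsid 1, so combining it with an nvme-ns using nsid=1 should fail in a registry like this one.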
[PATCH v2 13/20] nvme: refactor prp mapping
Instead of handling both QSGs and IOVs in multiple places, simply use
QSGs everywhere by assuming that the request does not involve the
controller memory buffer (CMB). If the request is found to involve the
CMB, convert the QSG to an IOV and issue the I/O. The QSG is converted
to an IOV by the dma helpers anyway, so the CMB path is not unfairly
affected by this simplifying change.

As a side-effect, this patch also allows PRPs to be located in the CMB.
The logic ensures that if some of the PRP is in the CMB, all of it must
be located there, as per the specification.

Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c       | 255 --
 hw/block/nvme.h       |   4 +-
 hw/block/trace-events |   1 +
 include/block/nvme.h  |   1 +
 4 files changed, 174 insertions(+), 87 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1e2320b38b14..cbc0b6a660b6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -179,138 +179,200 @@ static void nvme_set_error_page(NvmeCtrl *n, uint16_t sqid, uint16_t cid,
     n->elp_index = (n->elp_index + 1) % n->params.elpe;
 }
 
-static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
-                             uint64_t prp2, uint32_t len, NvmeCtrl *n)
+static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, uint64_t prp1,
+    uint64_t prp2, uint32_t len, NvmeRequest *req)
 {
     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
     trans_len = MIN(len, trans_len);
     int num_prps = (len >> n->page_bits) + 1;
+    uint16_t status = NVME_SUCCESS;
+    bool prp_list_in_cmb = false;
+
+    trace_nvme_map_prp(req->cid, req->cmd.opcode, trans_len, len, prp1, prp2,
+        num_prps);
 
     if (unlikely(!prp1)) {
         trace_nvme_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
-    } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
-               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
-        qsg->nsg = 0;
-        qemu_iovec_init(iov, num_prps);
-        qemu_iovec_add(iov, (void *)>cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
-    } else {
-        pci_dma_sglist_init(qsg, >parent_obj, num_prps);
-        qemu_sglist_add(qsg, prp1, trans_len);
     }
+
+    if (nvme_addr_is_cmb(n, prp1)) {
+        req->is_cmb = true;
+    }
+
+    pci_dma_sglist_init(qsg, >parent_obj, num_prps);
+    qemu_sglist_add(qsg, prp1, trans_len);
+
     len -= trans_len;
     if (len) {
         if (unlikely(!prp2)) {
             trace_nvme_err_invalid_prp2_missing();
+            status = NVME_INVALID_FIELD | NVME_DNR;
             goto unmap;
         }
+
         if (len > n->page_size) {
             uint64_t prp_list[n->max_prp_ents];
             uint32_t nents, prp_trans;
             int i = 0;
 
+            if (nvme_addr_is_cmb(n, prp2)) {
+                prp_list_in_cmb = true;
+            }
+
             nents = (len + n->page_size - 1) >> n->page_bits;
             prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-            nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+            nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
             while (len != 0) {
+                bool addr_is_cmb;
                 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
                 if (i == n->max_prp_ents - 1 && len > n->page_size) {
                     if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
                         trace_nvme_err_invalid_prplist_ent(prp_ent);
+                        status = NVME_INVALID_FIELD | NVME_DNR;
+                        goto unmap;
+                    }
+
+                    addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
+                    if ((prp_list_in_cmb && !addr_is_cmb) ||
+                        (!prp_list_in_cmb && addr_is_cmb)) {
+                        status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
                         goto unmap;
                     }
                     i = 0;
                     nents = (len + n->page_size - 1) >> n->page_bits;
                     prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-                    nvme_addr_read(n, prp_ent, (void *)prp_list,
-                        prp_trans);
+                    nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans);
                     prp_ent = le64_to_cpu(prp_list[i]);
                 }
                 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
                     trace_nvme_err_invalid_prplist_ent(prp_ent);
+                    status = NVME_INVALID_FIELD | NVME_DNR;
                     goto unmap;
                 }
-                trans_len = MIN(len, n->page_size);
-                if (qsg->nsg){
-                    qemu_sglist_add(qsg, prp_ent, trans_len);
-                } else {
-                    qemu_iovec_add(iov, (void *)>cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
+                addr_is_cmb = nvme_addr_is_cmb(n, prp_ent);
+
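The "all of it or none of it in the CMB" rule can be sketched in isolation. This is a simplification: the patch tracks the first data pointer (req->is_cmb) and the PRP list location (prp_list_in_cmb) separately, whereas the sketch below only checks that every address agrees with the first. SketchCmb and the sketch_* functions are hypothetical; the range test mirrors the `addr >= low && addr < hi` form used by nvme_addr_is_cmb(). Note that the patch's `(a && !b) || (!a && b)` is the same test as `a != b` on booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CMB window; QEMU derives this from n->ctrl_mem. */
typedef struct SketchCmb {
    uint64_t addr;
    uint64_t size;  /* 0 means no CMB */
} SketchCmb;

static bool sketch_addr_is_cmb(const SketchCmb *cmb, uint64_t addr)
{
    return cmb->size != 0 && addr >= cmb->addr && addr < cmb->addr + cmb->size;
}

/* True if every entry agrees with the first one about CMB residency. */
static bool sketch_prps_consistent(const SketchCmb *cmb,
                                   const uint64_t *prps, size_t n)
{
    if (n == 0) {
        return true;
    }

    bool first_in_cmb = sketch_addr_is_cmb(cmb, prps[0]);
    for (size_t i = 1; i < n; i++) {
        /* Same as (a && !b) || (!a && b) in the patch. */
        if (sketch_addr_is_cmb(cmb, prps[i]) != first_in_cmb) {
            return false;  /* the patch reports NVME_INVALID_USE_OF_CMB */
        }
    }
    return true;
}
```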
[PATCH v2 19/20] nvme: make lba data size configurable
Signed-off-by: Klaus Jensen
---
 hw/block/nvme-ns.c | 2 +-
 hw/block/nvme-ns.h | 4 +++-
 hw/block/nvme.c    | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index aa76bb63ef45..70ff622a5729 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -18,7 +18,7 @@ static int nvme_ns_init(NvmeNamespace *ns)
 {
     NvmeIdNs *id_ns = >id_ns;
 
-    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+    id_ns->lbaf[0].ds = ns->params.lbads;
     id_ns->nuse = id_ns->ncap = id_ns->nsze =
         cpu_to_le64(nvme_ns_nlbas(ns));
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 64dd054cf6a9..aa1c81d85cde 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -6,10 +6,12 @@
     OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
 
 #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
-    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
+    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
+    DEFINE_PROP_UINT8("lbads", _state, _props.lbads, 9)
 
 typedef struct NvmeNamespaceParams {
     uint32_t nsid;
+    uint8_t lbads;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 67f92bf5a3ac..d0103c16cfe9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2602,6 +2602,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     if (n->namespace.conf.blk) {
         ns = >namespace;
         ns->params.nsid = 1;
+        ns->params.lbads = 9;
 
         if (nvme_ns_setup(n, ns, _err)) {
             error_propagate_prepend(errp, local_err, "nvme_ns_setup: ");
-- 
2.23.0
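For reference, LBADS is the base-2 logarithm of the LBA data size, so the property's default of 9 keeps the previous 512-byte (BDRV_SECTOR_BITS) behaviour and lbads=12 would give 4096-byte blocks. The sketch below shows that arithmetic. sketch_nlbas() is a hypothetical helper written in the spirit of nvme_ns_nlbas(), whose body is not shown here. The lower bound of 9 comes from the NVMe specification (sizes below 512 bytes are not supported), and the patch itself does not visibly validate it.

```c
#include <assert.h>
#include <stdint.h>

/* LBA data size in bytes for a given LBADS (log2) value. */
static uint64_t sketch_lba_size(uint8_t lbads)
{
    assert(lbads >= 9 && lbads < 64);  /* NVMe: below 512 bytes unsupported */
    return UINT64_C(1) << lbads;
}

/* Hypothetical helper: number of whole LBAs that fit in the backing image. */
static uint64_t sketch_nlbas(uint64_t size_bytes, uint8_t lbads)
{
    return size_bytes / sketch_lba_size(lbads);
}
```

With the nvme-ns device from the multiple-namespaces patch, the property would presumably be set as, for example, `-device nvme-ns,drive=disk1,bus=nvme0,nsid=1,lbads=12`.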