Re: [PATCH v2 01/11] qcow2: make function update_refcount_discard() global
On Mon 13 May 2024 09:31:53 AM +03, Andrey Drobyshev wrote: > We are going to need it for discarding separate subclusters. The > function itself doesn't do anything with the refcount tables, it simply > adds a discard request to the queue, so rename it to qcow2_queue_discard(). > > Signed-off-by: Andrey Drobyshev > Reviewed-by: Hanna Czenczek Reviewed-by: Alberto Garcia Berto
Re: [PATCH v2 02/11] qcow2: simplify L2 entries accounting for discard-no-unref
On Mon 13 May 2024 09:31:54 AM +03, Andrey Drobyshev wrote: > Commits 42a2890a and b2b10904 introduce handling of discard-no-unref > option in discard_in_l2_slice() and zero_in_l2_slice(). They add even > more if's when chosing the right l2 entry. What we really need for this > option is the new entry simply to contain the same host cluster offset, > no matter whether we unmap or zeroize the cluster. For that OR'ing with > the old entry is enough. > > This patch doesn't change the logic and is pure refactoring. > > Signed-off-by: Andrey Drobyshev Reviewed-by: Alberto Garcia Berto
Re: [PATCH v2 04/11] block/file-posix: add trace event for fallocate() calls
On Mon 13 May 2024 09:31:56 AM +03, Andrey Drobyshev wrote: > This would ease debugging of write zeroes and discard operations. > > Signed-off-by: Andrey Drobyshev Reviewed-by: Alberto Garcia Berto
Re: [PATCH v2 05/11] iotests/common.rc: add disk_usage function
On Mon 13 May 2024 09:31:57 AM +03, Andrey Drobyshev wrote: > Move the definition from iotests/250 to common.rc. This is used to > detect real disk usage of sparse files. In particular, we want to use > it for checking subclusters-based discards. > > Signed-off-by: Andrey Drobyshev Reviewed-by: Alberto Garcia Berto
Re: [PATCH v2 06/11] iotests/290: add test case to check 'discard-no-unref' option behavior
On Mon 13 May 2024 09:31:58 AM +03, Andrey Drobyshev wrote: > We basically fill 2 images with identical data and perform discard > operations with and without 'discard-no-unref' enabled. Then we check > that images still read identically, that their disk usage is the same > (i.e. fallocate(FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE) is called for > both) and that with the option enabled cluster is still marked as > allocated in "qemu-img map" output. We also check that the option > doesn't work with qcow2 v2 images. > > Signed-off-by: Andrey Drobyshev Reviewed-by: Alberto Garcia Berto
[PATCH] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
This tool converts a disk image to qcow2, writing the result directly to stdout. This can be used for example to send the generated file over the network. This is equivalent to using qemu-img to convert a file to qcow2 and then writing the result to stdout, with the difference that this tool does not need to create this temporary qcow2 file and therefore does not need any additional disk space. The input file is read twice. The first pass is used to determine which clusters contain non-zero data and that information is used to create the qcow2 header, refcount table and blocks, and L1 and L2 tables. After all that metadata is created then the second pass is used to write the guest data. By default qcow2-to-stdout.py expects the input to be a raw file, but if qemu-storage-daemon is available then it can also be used to read images in other formats. Alternatively the user can also run qemu-ndb or qemu-storage-daemon manually instead. Signed-off-by: Alberto Garcia Signed-off-by: Madeeha Javed --- scripts/qcow2-to-stdout.py | 330 + 1 file changed, 330 insertions(+) create mode 100755 scripts/qcow2-to-stdout.py diff --git a/scripts/qcow2-to-stdout.py b/scripts/qcow2-to-stdout.py new file mode 100755 index 00..b9f75de690 --- /dev/null +++ b/scripts/qcow2-to-stdout.py @@ -0,0 +1,330 @@ +#!/usr/bin/env python3 + +# This tool reads a disk image in any format and converts it to qcow2, +# writing the result directly to stdout. +# +# Copyright (C) 2024 Igalia, S.L. +# +# Authors: Alberto Garcia +# Madeeha Javed +# +# SPDX-License-Identifier: GPL-2.0-or-later +# +# qcow2 files produced by this script are always arranged like this: +# +# - qcow2 header +# - refcount table +# - refcount blocks +# - L1 table +# - L2 tables +# - Data clusters +# +# A note about variable names: in qcow2 there is one refcount table +# and one (active) L1 table, although each can occupy several +# clusters. For the sake of simplicity the code sometimes talks about +# refcount tables and L1 tables when referring to those clusters. + +import argparse +import atexit +import math +import os +import signal +import struct +import sys +import subprocess +import tempfile +import time + +QCOW2_DEFAULT_CLUSTER_SIZE = 65536 +QCOW2_DEFAULT_REFCOUNT_BITS = 16 +QCOW2_DEFAULT_VERSION = 3 +QCOW_OFLAG_COPIED = 1 << 63 + +def bitmap_set(bitmap, idx): +bitmap[int(idx / 8)] |= (1 << (idx % 8)) + +def bitmap_test(bitmap, idx): +return (bitmap[int(idx / 8)] & (1 << (idx % 8))) != 0 + +# Kill the storage daemon on exit +def kill_storage_daemon(pid_file, raw_file, temp_dir): +if os.path.exists(pid_file): +with open(pid_file, 'r') as f: +pid=int(f.readline()) +os.kill(pid, signal.SIGTERM) +while os.path.exists(pid_file): +time.sleep(0.1) +os.unlink(raw_file) +os.rmdir(temp_dir) + +def write_features(header): +qcow2_features = [ +# Incompatible +(0, 0, 'dirty bit'), +(0, 1, 'corrupt bit'), +(0, 2, 'external data file'), +(0, 3, 'compression type'), +(0, 4, 'extended L2 entries'), +# Compatible +(1, 0, 'lazy refcounts'), +# Autoclear +(2, 0, 'bitmaps'), +(2, 1, 'raw external data') +] +struct.pack_into('>I', header, 0x70, 0x6803f857) +struct.pack_into('>I', header, 0x74, len(qcow2_features) * 48) +cur_offset = 0x78 +for (feature_type, feature_bit, feature_name) in qcow2_features: +struct.pack_into('>BB46s', header, cur_offset, + feature_type, feature_bit, feature_name.encode('ascii')) +cur_offset += 48 + +# Command-line arguments +parser = argparse.ArgumentParser(description='This program converts a QEMU disk image to qcow2 ' + 'and writes it to the standard output') +parser.add_argument('input_file', help='name of the input file') +parser.add_argument('-f', dest='input_format', metavar='input_format', +default='raw', +help='format of the input file (default: raw)') +parser.add_argument('-c', dest='cluster_size', metavar='cluster_size', +help=f'qcow2 cluster size (default: {QCOW2_DEFAULT_CLUSTER_SIZE})', +default=QCOW2_DEFAULT_CLUSTER_SIZE, type=int, +choices=[1 << x for x in range(9,22)]) +parser.add_argument('-r', dest='refcount_bits', metavar='refcount_bits', +help=f'width of the reference count entries (default: {QCOW2_DEFAULT_REFCOUNT_BITS})', +default=QCOW2_DEFAULT_R
Re: [PATCH] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
On Wed 12 Jun 2024 09:01:01 AM +03, Manos Pitsidianakis wrote: > Hello Alberto, Hello Manos! > > This is equivalent to using qemu-img to convert a file to qcow2 and > > then writing the result to stdout, with the difference that this > > tool does not need to create this temporary qcow2 file and therefore > > does not need any additional disk space. > > Can you expand on this a little bit? Would modifying qemu-img to write > to stdout if given, say, - instead of a file output path be enough to > make this script unnecessary? Yes, it would be enough. Allowing qemu-img convert to write to stdout would indeed be very nice for the end user but it's a bit of a niche use case and it's also not a trivial task so I don't think that it's worth the effort. The output files that you pass to qemu-img convert need to be seekable because the only way to produce a qcow2 file without doing that is by precalculating all the metadata in advance before starting to write anything (that's why this script reads the input file twice). This is fundamentally different to what qemu-img convert does, which is to read the input file from start to finish and write it to the output file, relying on the relevant driver's existing write operations. All those assume random access to the output file. qemu-img is also much more generic in the sense that it supports many different output formats and image options. In contrast, writing the algorithm for a basic subset of qcow2 is quite simple and that's why I think that it makes sense to do it in a separate tool. Berto
Re: [PATCH] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
On Wed 12 Jun 2024 02:00:19 PM +03, Manos Pitsidianakis wrote: Hi, thanks for the review and sorry for taking so long to reply, I was on vacation. >> scripts/qcow2-to-stdout.py | 330 + >> 1 file changed, 330 insertions(+) >> create mode 100755 scripts/qcow2-to-stdout.py > > I recommend running the `black` formatter on this script, it makes the > code more diff-friendly and uniform. Also it has become the de-facto > python style. Hmmm, I don't like how it reformats some of the lines. However other changes do make sense, so I'll apply those. > Also, it's more pythonic to name constants in uppercase, like > allocated_l2_tables. You can check such lints with pylint > scripts/qcow2-to-stdout.py allocated_l2_tables is not a constant :-? >>+struct.pack_into('>I', header, 0x70, 0x6803f857) >>+struct.pack_into('>I', header, 0x74, len(qcow2_features) * 48) >>+cur_offset = 0x78 > > Minor comment: extract magic values/offsets into constant globals with > descriptive names, it'd help the code be more readable and easier to > maintain if ported in the future to other formats. Good idea, will do. >>+for (feature_type, feature_bit, feature_name) in qcow2_features: >>+struct.pack_into('>BB46s', header, cur_offset, >>+ feature_type, feature_bit, >>feature_name.encode('ascii')) >>+cur_offset += 48 >>+ > >>From here onwards put everything under a main block like so: Ok. >>+# Command-line arguments >>+parser = argparse.ArgumentParser(description='This program converts a QEMU >>disk image to qcow2 ' >>+ 'and writes it to the standard output') >>+parser.add_argument('input_file', help='name of the input file') > > Suggestion: > > -parser.add_argument('input_file', help='name of the input file') > +parser.add_argument('input_file', help='name of the input file', > type=pathlib.Path, required=True) 'required' is not valid in positional arguments, and I'm not sure what benefits using pathlib brings in this case. >>+parser.add_argument('-v', dest='qcow2_version', metavar='qcow2_version', > > Maybe -q instead of -v? No strong feelings on this one, it's just that > -v is usually version. -q is also usually --quiet so not sure... Yeah, I thought the same but I didn't want to complicate this too much, this is just a helper script. >>+# If the input file is not in raw format we can use >>+# qemu-storage-daemon to make it appear as such >>+if input_format != 'raw': >>+temp_dir = tempfile.mkdtemp() > > Consider using the tempfile.TemporaryDirectory as with... context > manager so that the temp dir cleanup is performed automatically I don't think I can do that directly here because the temp dir has to live until the very end (qemu-storage-daemon needs it). >>+pid_file = temp_dir + "/pid" >>+raw_file = temp_dir + "/raw" >>+open(raw_file, 'wb').close() > > Consider using a with open(...) open manager for opening the file How would that be? Like this? with open(raw_file, 'wb'): pass If so I don't see the benefit, I just need to create an empty file and close it immediately. >>+atexit.register(kill_storage_daemon, pid_file, raw_file, temp_dir) > > Hm, this too could be a context manager. Seems very C-like to use > atexit here. Yeah it is, but I think that using the context manager would require me to split the main function in two, and I'm not sure that it's worth it for this case. Other Python scripts in the QEMU repo use atexit already. >>+ret = subprocess.run(["qemu-storage-daemon", "--daemonize", "--pidfile", >>pid_file, >>+ "--blockdev", >>f"driver=file,node-name=file0,driver=file,filename={input_file},read-only=on", >>+ "--blockdev", >>f"driver={input_format},node-name=disk0,file=file0,read-only=on", >>+ "--export", >>f"type=fuse,id=export0,node-name=disk0,mountpoint={raw_file},writable=off"]) > > You can add shell=True, check=False arguments to subprocess.run() so > that it captures the outputs. (check=False is the default behavior, but > better make it explicit) I'm not sure that I understand, why would I need to use a shell here? >>+sys.stdout.buffer.write(cluster) > > Would it be a good idea to check if stdout is a tty and not a > pipe/redirection? You can check it with isatty() and error out to > prevent printing binary to the terminal. Yeah this is a good idea, thanks. Berto
Re: [PATCH] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
On Mon 01 Jul 2024 02:07:01 PM +03, Manos Pitsidianakis wrote: >> and I'm not sure what benefits using pathlib brings in this case. > > implicit type requirement, argument value validations, path > normalization etc. Do you have a specific example? I don't see any difference in behavior if I make input_file a pathlib.Path, I still need to check if the file exists, etc., I don't see that this is validating anything. >> with open(raw_file, 'wb'): >> pass >> >> If so I don't see the benefit, I just need to create an empty file and >> close it immediately. > > My only argument here is that it's "more pythonic" which I know is of > little value and consequence :) Feel free to ignore! They were mere > suggestions. In general I would agree (that's why I'm opening files this way in other parts of the script) but for this case I don't think it's worth it. >> I'm not sure that I understand, why would I need to use a shell here? > > I must have meant capture_output=True, not shell=True, sorry for that > š¤. The explicit check=False says to the reader that this won't throw > an exception so it's just for readability. The capture_output part is > so that you can print the outputs if the return code is an error. Ah I see. I'll send a new version soon. Thanks for the review, Berto
[PATCH v2] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
This tool converts a disk image to qcow2, writing the result directly to stdout. This can be used for example to send the generated file over the network. This is equivalent to using qemu-img to convert a file to qcow2 and then writing the result to stdout, with the difference that this tool does not need to create this temporary qcow2 file and therefore does not need any additional disk space. Implementing this directly in qemu-img is not really an option because it expects the output file to be seekable and it is also meant to be a generic tool that supports all combinations of file formats and image options. Instead, this tool can only produce qcow2 files with the basic options, without compression, encryption or other features. The input file is read twice. The first pass is used to determine which clusters contain non-zero data and that information is used to create the qcow2 header, refcount table and blocks, and L1 and L2 tables. After all that metadata is created then the second pass is used to write the guest data. By default qcow2-to-stdout.py expects the input to be a raw file, but if qemu-storage-daemon is available then it can also be used to read images in other formats. Alternatively the user can also run qemu-ndb or qemu-storage-daemon manually instead. Signed-off-by: Alberto Garcia Signed-off-by: Madeeha Javed --- scripts/qcow2-to-stdout.py | 377 + 1 file changed, 377 insertions(+) create mode 100755 scripts/qcow2-to-stdout.py v2: - Define the QCOW2_V3_HDR_LENGTH and QCOW2_FEATURE_NAME_TABLE constants [Manos] - Define the QEMU_STORAGE_DAEMON constant - Use isfile() instead of exists() for the input file - Refuse to write to stdout if it's a tty [Manos] - Move the bulk of the code to a function called from __main__ [Manos] - Remove the qcow2_ prefix from qcow2_cluster_size and qcow2_refcount_bits - Formatting fixes suggested by the Python black formatter [Manos] - On error pass the string directly to sys.exit() - Capture the output of qemu-storage-daemon [Manos] - Use a contextmanager to run qemu-storage-daemon [Manos] - Update patch description to mention why this cannot be implemeted directly in qemu-img [Manos] v1: https://lists.gnu.org/archive/html/qemu-block/2024-06/msg00073.html diff --git a/scripts/qcow2-to-stdout.py b/scripts/qcow2-to-stdout.py new file mode 100755 index 00..d486a80e86 --- /dev/null +++ b/scripts/qcow2-to-stdout.py @@ -0,0 +1,377 @@ +#!/usr/bin/env python3 + +# This tool reads a disk image in any format and converts it to qcow2, +# writing the result directly to stdout. +# +# Copyright (C) 2024 Igalia, S.L. +# +# Authors: Alberto Garcia +# Madeeha Javed +# +# SPDX-License-Identifier: GPL-2.0-or-later +# +# qcow2 files produced by this script are always arranged like this: +# +# - qcow2 header +# - refcount table +# - refcount blocks +# - L1 table +# - L2 tables +# - Data clusters +# +# A note about variable names: in qcow2 there is one refcount table +# and one (active) L1 table, although each can occupy several +# clusters. For the sake of simplicity the code sometimes talks about +# refcount tables and L1 tables when referring to those clusters. + +import argparse +import atexit +import math +import os +import signal +import struct +import subprocess +import sys +import tempfile +import time +from contextlib import contextmanager + +QCOW2_DEFAULT_CLUSTER_SIZE = 65536 +QCOW2_DEFAULT_REFCOUNT_BITS = 16 +QCOW2_DEFAULT_VERSION = 3 +QCOW2_FEATURE_NAME_TABLE = 0x6803F857 +QCOW2_V3_HEADER_LENGTH = 112 # Header length in QEMU 9.0. Must be a multiple of 8 +QCOW_OFLAG_COPIED = 1 << 63 +QEMU_STORAGE_DAEMON = "qemu-storage-daemon" + + +def bitmap_set(bitmap, idx): +bitmap[int(idx / 8)] |= 1 << (idx % 8) + + +def bitmap_test(bitmap, idx): +return (bitmap[int(idx / 8)] & (1 << (idx % 8))) != 0 + + +# create_qcow2_file() expects a raw input file. If we have a different +# format we can use qemu-storage-daemon to make it appear as raw. +@contextmanager +def get_input_as_raw_file(input_file, input_format): +if input_format == "raw": +yield input_file +return +try: +temp_dir = tempfile.mkdtemp() +pid_file = temp_dir + "/pid" +raw_file = temp_dir + "/raw" +open(raw_file, "wb").close() +ret = subprocess.run( +[ +QEMU_STORAGE_DAEMON, +"--daemonize", +"--pidfile", pid_file, +"--blockdev", f"driver=file,node-name=file0,driver=file,filename={input_file},read-only=on", +"--blockdev", f"driver={input_format},node-name=disk0,file=file0,read-only=on", +"--export", f"type=fuse,id=export0,node-name=disk0,mountpoint={raw_file},writable=off", +
Re: [PATCH v2] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
ping On Mon, Jul 01, 2024 at 05:11:40PM +0200, Alberto Garcia wrote: > This tool converts a disk image to qcow2, writing the result directly > to stdout. This can be used for example to send the generated file > over the network.
Re: [PATCH v2] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
On Sun 28 Jul 2024 01:01:07 AM +03, Nir Soffer wrote: >> +def bitmap_set(bitmap, idx): >> +bitmap[int(idx / 8)] |= 1 << (idx % 8) > > Should use floor division operator (//): > > bitmap[idx // 8] |= 1 << (idx % 8) > > Same for bitmap_test(). Right, also for all other cases of int(foo / bar). I'll update them. >> +ret = subprocess.run( >> +[ >> +QEMU_STORAGE_DAEMON, >> +"--daemonize", > > Any reason to daemonize? managing the child process is easier if you > don't daemonize. --daemonize guarantees that when subprocess.run() returns the exported raw_file is ready to use. >> +if len(cluster) < cluster_size: >> +cluster += bytes(cluster_size - len(cluster)) > > This should be done only for the last cluster. But that's what that condition is for, we'll only read less than cluster_size bytes at the end of the file. >> +if not bitmap_test(l1_bitmap, l1_idx): >> +bitmap_set(l1_bitmap, l1_idx) >> +allocated_l2_tables += 1 > > This could be much more efficient using SEEK_DATA/SEEK_HOLE, avoiding > reading the entire image twice. Or using "qemu-img map --output json" > to simplify. I prefer not to have external dependencies(*) so I would rather not use qemu-img, but I certainly can use SEEK_DATA/SEEK_HOLE to avoid reading data that is known to be zero in the first pass. (*) there is of course qemu-storage-daemon but that one is optional and I anyway cannot implement its functionality in this script. >> +sys.stdout.buffer.write(cluster) >> +else: >> +skip += 1 > > If would be easier to work with if you add a function iterating over > the l2_entries, yielding the he cluster index to copy: > > def iter_l2_entries(bitmap, clusters): > for idx in range(clusters): > if bitmap_test(bitmap, idx): > yield idx > > The copy loop can read using os.pread(): > > for idx in iter_l2_entries(l2_bitmap, total_data_clusters): > cluster = os.pread(fd, cluster_size, cluster_size * idx) > sys.stdout.buffer.write(cluster) > > I'm not sure the offset is right in my example, it is hard to > understand the semantics of skip in your code. That part reads the input file sequentially from start to end, but instead of reading empty clusters we use seek() to skip them. The 'skip' variable keeps a counter of empty clusters since the last read. Your proposal requires an additional function but I see that it can make the code more readable, so I'll give it a try. >> +if __name__ == "__main__": > > Usually code is easier to work with when __main__ calls main(). Good idea. Thanks for the detailed review! Berto
[PATCH v3] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
This tool converts a disk image to qcow2, writing the result directly to stdout. This can be used for example to send the generated file over the network. This is equivalent to using qemu-img to convert a file to qcow2 and then writing the result to stdout, with the difference that this tool does not need to create this temporary qcow2 file and therefore does not need any additional disk space. Implementing this directly in qemu-img is not really an option because it expects the output file to be seekable and it is also meant to be a generic tool that supports all combinations of file formats and image options. Instead, this tool can only produce qcow2 files with the basic options, without compression, encryption or other features. The input file is read twice. The first pass is used to determine which clusters contain non-zero data and that information is used to create the qcow2 header, refcount table and blocks, and L1 and L2 tables. After all that metadata is created then the second pass is used to write the guest data. By default qcow2-to-stdout.py expects the input to be a raw file, but if qemu-storage-daemon is available then it can also be used to read images in other formats. Alternatively the user can also run qemu-ndb or qemu-storage-daemon manually instead. Signed-off-by: Alberto Garcia Signed-off-by: Madeeha Javed --- scripts/qcow2-to-stdout.py | 400 + 1 file changed, 400 insertions(+) create mode 100755 scripts/qcow2-to-stdout.py v3: - Remove stale 'import atexit' [Nir] - Replace all instances of int(x / y) with x // y [Nir] - Rename bitmap_test() to bitmap_is_set() [Nir] - Use os.path.join() to construct path names [Nir] - Rename create_qcow2_file() to write_qcow2_content() [Nir] - Create a main() function [Nir] - Use os.pread() with the absolute offset instead of seek() [Nir] - Detect holes in the input file and skip reading them [Nir] v2: https://lists.gnu.org/archive/html/qemu-block/2024-07/msg00018.html - Define the QCOW2_V3_HDR_LENGTH and QCOW2_FEATURE_NAME_TABLE constants [Manos] - Define the QEMU_STORAGE_DAEMON constant - Use isfile() instead of exists() for the input file - Refuse to write to stdout if it's a tty [Manos] - Move the bulk of the code to a function called from __main__ [Manos] - Remove the qcow2_ prefix from qcow2_cluster_size and qcow2_refcount_bits - Formatting fixes suggested by the Python black formatter [Manos] - On error pass the string directly to sys.exit() - Capture the output of qemu-storage-daemon [Manos] - Use a contextmanager to run qemu-storage-daemon [Manos] - Update patch description to mention why this cannot be implemeted directly in qemu-img [Manos] v1: https://lists.gnu.org/archive/html/qemu-block/2024-06/msg00073.html diff --git a/scripts/qcow2-to-stdout.py b/scripts/qcow2-to-stdout.py new file mode 100755 index 00..1b45bc5bd2 --- /dev/null +++ b/scripts/qcow2-to-stdout.py @@ -0,0 +1,400 @@ +#!/usr/bin/env python3 + +# This tool reads a disk image in any format and converts it to qcow2, +# writing the result directly to stdout. +# +# Copyright (C) 2024 Igalia, S.L. +# +# Authors: Alberto Garcia +# Madeeha Javed +# +# SPDX-License-Identifier: GPL-2.0-or-later +# +# qcow2 files produced by this script are always arranged like this: +# +# - qcow2 header +# - refcount table +# - refcount blocks +# - L1 table +# - L2 tables +# - Data clusters +# +# A note about variable names: in qcow2 there is one refcount table +# and one (active) L1 table, although each can occupy several +# clusters. For the sake of simplicity the code sometimes talks about +# refcount tables and L1 tables when referring to those clusters. + +import argparse +import errno +import math +import os +import signal +import struct +import subprocess +import sys +import tempfile +import time +from contextlib import contextmanager + +QCOW2_DEFAULT_CLUSTER_SIZE = 65536 +QCOW2_DEFAULT_REFCOUNT_BITS = 16 +QCOW2_DEFAULT_VERSION = 3 +QCOW2_FEATURE_NAME_TABLE = 0x6803F857 +QCOW2_V3_HEADER_LENGTH = 112 # Header length in QEMU 9.0. Must be a multiple of 8 +QCOW_OFLAG_COPIED = 1 << 63 +QEMU_STORAGE_DAEMON = "qemu-storage-daemon" + + +def bitmap_set(bitmap, idx): +bitmap[idx // 8] |= 1 << (idx % 8) + + +def bitmap_is_set(bitmap, idx): +return (bitmap[idx // 8] & (1 << (idx % 8))) != 0 + + +def bitmap_iterator(bitmap, length): +for idx in range(length): +if bitmap_is_set(bitmap, idx): +yield idx + + +# Holes in the input file contain only zeroes so we can skip them and +# save time. This function returns the indexes of the clusters that +# are known to contain data. Those are the ones that we need to read. +def clusters_with_data(fd, cluster_size): +data_off = 0 +while True: +hole_off = os.lseek(fd, data_off, os.SEEK_HOLE) +for idx in range(data_off // cluster_size, math.ceil(hole_off / cluster_size)): +yield idx +
Re: [PATCH v3] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
On Mon 29 Jul 2024 03:20:18 PM -05, Eric Blake wrote: > On Mon, Jul 29, 2024 at 05:02:26PM GMT, Alberto Garcia wrote: >> +# qcow2 files produced by this script are always arranged like this: >> +# >> +# - qcow2 header >> +# - refcount table >> +# - refcount blocks >> +# - L1 table >> +# - L2 tables >> +# - Data clusters > > Is it easy to make your tool spit out a qcow2 image with external data > file (to write a quick qcow2 wrapper for an existing file to now be > used as external data)? Or is that too much of a difference from the > intended use of this tool? I didn't consider that use case and I didn't want to make the tool too complicated. It's probably not very hard to do, if I can do it without too many changes to the code I'll give it a try. >> +def clusters_with_data(fd, cluster_size): >> +data_off = 0 >> +while True: >> +hole_off = os.lseek(fd, data_off, os.SEEK_HOLE) >> +for idx in range(data_off // cluster_size, math.ceil(hole_off / >> cluster_size)): >> +yield idx >> +try: >> +data_off = os.lseek(fd, hole_off, os.SEEK_DATA) > > Depending on the size of cluster_size, this could return the same > offset more than once (for example, for 1M clusters but 64k > granularity on holes, consider what happens if lseek(0, SEEK_HOLE) > returns 64k, then lseek(64k, SEEK_DATA) returns 128k: you end up > yielding idx 0 twice). You may need to be more careful than that. I literally opened my laptop because I just realized that this could be a problem :D I'll fix it in the next version. >> +ret = subprocess.run( >> +[ >> +QEMU_STORAGE_DAEMON, >> +"--daemonize", >> +"--pidfile", pid_file, >> +"--blockdev", >> f"driver=file,node-name=file0,driver=file,filename={input_file},read-only=on", >> +"--blockdev", >> f"driver={input_format},node-name=disk0,file=file0,read-only=on", >> +"--export", >> f"type=fuse,id=export0,node-name=disk0,mountpoint={raw_file},writable=off", >> +], >> +capture_output=True, >> +) > > Does q-s-d exposing an image as raw still support lseek(SEEK_HOLE) > efficiently? Yes, I tested it before sending the new version, the improvements are actually quite noticeable, thanks Nir for suggesting this. >> +parser.add_argument( >> +"-v", >> +dest="qcow2_version", >> +metavar="qcow2_version", >> +help=f"qcow2 version (default: {QCOW2_DEFAULT_VERSION})", >> +default=QCOW2_DEFAULT_VERSION, >> +type=int, >> +choices=[2, 3], > > Is it really worth trying to create v2 images? I don't know, I added this because it required almost no changes to the code. But I don't have a strong opinion, I can remove support for v2 images. Berto
[PATCH v4] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
This tool converts a disk image to qcow2, writing the result directly to stdout. This can be used for example to send the generated file over the network. This is equivalent to using qemu-img to convert a file to qcow2 and then writing the result to stdout, with the difference that this tool does not need to create this temporary qcow2 file and therefore does not need any additional disk space. Implementing this directly in qemu-img is not really an option because it expects the output file to be seekable and it is also meant to be a generic tool that supports all combinations of file formats and image options. Instead, this tool can only produce qcow2 files with the basic options, without compression, encryption or other features. The input file is read twice. The first pass is used to determine which clusters contain non-zero data and that information is used to create the qcow2 header, refcount table and blocks, and L1 and L2 tables. After all that metadata is created then the second pass is used to write the guest data. By default qcow2-to-stdout.py expects the input to be a raw file, but if qemu-storage-daemon is available then it can also be used to read images in other formats. Alternatively the user can also run qemu-nbd or qemu-storage-daemon manually instead. Signed-off-by: Alberto Garcia Signed-off-by: Madeeha Javed --- scripts/qcow2-to-stdout.py | 449 + 1 file changed, 449 insertions(+) create mode 100755 scripts/qcow2-to-stdout.py v4: - Rewrite clusters_with_data() to simplify the code and to prevent the same cluster from being returned more than once. - Add support for data_file and data_file_raw [Eric] - When allocating the l2_bitmap use the actual number of L1 entries instead of the theoretical maximum. This reduces the memory requirements significantly with larger cluster sizes. - Add align_up() function - Remove support for v2 images [Eric] - Fix typo in commit message [Eric] - Minor style fixes v3: https://lists.gnu.org/archive/html/qemu-block/2024-07/msg00597.html - Remove stale 'import atexit' [Nir] - Replace all instances of int(x / y) with x // y [Nir] - Rename bitmap_test() to bitmap_is_set() [Nir] - Use os.path.join() to construct path names [Nir] - Rename create_qcow2_file() to write_qcow2_content() [Nir] - Create a main() function [Nir] - Use os.pread() with the absolute offset instead of seek() [Nir] - Detect holes in the input file and skip reading them [Nir] v2: https://lists.gnu.org/archive/html/qemu-block/2024-07/msg00018.html - Define the QCOW2_V3_HDR_LENGTH and QCOW2_FEATURE_NAME_TABLE constants [Manos] - Define the QEMU_STORAGE_DAEMON constant - Use isfile() instead of exists() for the input file - Refuse to write to stdout if it's a tty [Manos] - Move the bulk of the code to a function called from __main__ [Manos] - Remove the qcow2_ prefix from qcow2_cluster_size and qcow2_refcount_bits - Formatting fixes suggested by the Python black formatter [Manos] - On error pass the string directly to sys.exit() - Capture the output of qemu-storage-daemon [Manos] - Use a contextmanager to run qemu-storage-daemon [Manos] - Update patch description to mention why this cannot be implemeted directly in qemu-img [Manos] v1: https://lists.gnu.org/archive/html/qemu-block/2024-06/msg00073.html diff --git a/scripts/qcow2-to-stdout.py b/scripts/qcow2-to-stdout.py new file mode 100755 index 00..06b7c13ccb --- /dev/null +++ b/scripts/qcow2-to-stdout.py @@ -0,0 +1,449 @@ +#!/usr/bin/env python3 + +# This tool reads a disk image in any format and converts it to qcow2, +# writing the result directly to stdout. +# +# Copyright (C) 2024 Igalia, S.L. +# +# Authors: Alberto Garcia +# Madeeha Javed +# +# SPDX-License-Identifier: GPL-2.0-or-later +# +# qcow2 files produced by this script are always arranged like this: +# +# - qcow2 header +# - refcount table +# - refcount blocks +# - L1 table +# - L2 tables +# - Data clusters +# +# A note about variable names: in qcow2 there is one refcount table +# and one (active) L1 table, although each can occupy several +# clusters. For the sake of simplicity the code sometimes talks about +# refcount tables and L1 tables when referring to those clusters. + +import argparse +import errno +import math +import os +import signal +import struct +import subprocess +import sys +import tempfile +import time +from contextlib import contextmanager + +QCOW2_DEFAULT_CLUSTER_SIZE = 65536 +QCOW2_DEFAULT_REFCOUNT_BITS = 16 +QCOW2_FEATURE_NAME_TABLE = 0x6803F857 +QCOW2_DATA_FILE_NAME_STRING = 0x44415441 +QCOW2_V3_HEADER_LENGTH = 112 # Header length in QEMU 9.0. Must be a multiple of 8 +QCOW2_INCOMPAT_DATA_FILE_BIT = 2 +QCOW2_AUTOCLEAR_DATA_FILE_RAW_BIT = 1 +QCOW_OFLAG_COPIED = 1 << 63 +QEMU_STORAGE_DAEMON = "qemu-storage-daemon" + + +def bitmap_set(bitmap, idx): +bitmap[idx // 8] |= 1 << (idx % 8) + + +def bitmap_is_set(bitmap, idx): +return (bitm
Re: [PATCH v4] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
ping On Tue, Jul 30, 2024 at 04:15:52PM +0200, Alberto Garcia wrote: > This tool converts a disk image to qcow2, writing the result directly > to stdout. This can be used for example to send the generated file > over the network.
Re: [PATCH v4] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
On Tue, Jul 30, 2024 at 04:15:52PM +0200, Alberto Garcia wrote: > This tool converts a disk image to qcow2, writing the result directly > to stdout. This can be used for example to send the generated file > over the network. ping
[Qemu-block] [PATCH v2 14/22] iotests: Add test for the block device statistics
Signed-off-by: Alberto Garcia --- tests/qemu-iotests/136 | 349 + tests/qemu-iotests/136.out | 5 + tests/qemu-iotests/group | 1 + 3 files changed, 355 insertions(+) create mode 100644 tests/qemu-iotests/136 create mode 100644 tests/qemu-iotests/136.out diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136 new file mode 100644 index 000..f574d83 --- /dev/null +++ b/tests/qemu-iotests/136 @@ -0,0 +1,349 @@ +#!/usr/bin/env python +# +# Tests for block device statistics +# +# Copyright (C) 2015 Igalia, S.L. +# Author: Alberto Garcia +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. +# + +import iotests +import os + +interval_length = 10 +nsec_per_sec = 10 +op_latency = nsec_per_sec / 1000 # See qtest_latency_ns in accounting.c +bad_sector = 8192 +bad_offset = bad_sector * 512 +blkdebug_file = os.path.join(iotests.test_dir, 'blkdebug.conf') + +class BlockDeviceStatsTestCase(iotests.QMPTestCase): +test_img = "null-aio://" +total_rd_bytes = 0 +total_rd_ops = 0 +total_wr_bytes = 0 +total_wr_ops = 0 +total_wr_merged = 0 +total_flush_ops = 0 +failed_rd_ops = 0 +failed_wr_ops = 0 +invalid_rd_ops = 0 +invalid_wr_ops = 0 +wr_highest_offset = 0 +account_invalid = False +account_failed = False + +def blockstats(self, device): +result = self.vm.qmp("query-blockstats") +for r in result['return']: +if r['device'] == device: +return r['stats'] +raise Exception("Device not found for blockstats: %s" % device) + +def create_blkdebug_file(self): +file = open(blkdebug_file, 'w') +file.write(''' +[inject-error] +event = "read_aio" +errno = "5" +sector = "%d" + +[inject-error] +event = "write_aio" +errno = "5" +sector = "%d" +''' % (bad_sector, bad_sector)) +file.close() + +def setUp(self): +drive_args = [] +drive_args.append("stats-intervals=%d" % interval_length) +drive_args.append("stats-account-invalid=%s" % + (self.account_invalid and "on" or "off")) +drive_args.append("stats-account-failed=%s" % + (self.account_failed and "on" or "off")) +self.create_blkdebug_file() +self.vm = iotests.VM().add_drive('blkdebug:%s:%s ' % + (blkdebug_file, self.test_img), + ','.join(drive_args)) +self.vm.launch() +# Set an initial value for the clock +self.vm.qtest("clock_step %d" % nsec_per_sec) + +def tearDown(self): +self.vm.shutdown() +os.remove(blkdebug_file) + +def accounted_ops(self, read = False, write = False, flush = False): +ops = 0 +if write: +ops += self.total_wr_ops +if self.account_failed: +ops += self.failed_wr_ops +if self.account_invalid: +ops += self.invalid_wr_ops +if read: +ops += self.total_rd_ops +if self.account_failed: +ops += self.failed_rd_ops +if self.account_invalid: +ops += self.invalid_rd_ops +if flush: +ops += self.total_flush_ops +return ops + +def accounted_latency(self, read = False, write = False, flush = False): +latency = 0 +if write: +latency += self.total_wr_ops * op_latency +if self.account_failed: +latency += self.failed_wr_ops * op_latency +if read: +latency += self.total_rd_ops * op_latency +if self.account_failed: +latency += self.failed_rd_ops * op_latency +if flush: +latency += self.total_flush_ops * op_latency +return latency + +def check_values(self): +stats = self.blockstats('drive0') + +# Check that the totals match with what we have calculated +self.assertEqual(self.total_rd_bytes, stats['rd_bytes']) +
[Qemu-block] [PATCH v2 02/22] ide: Account for write operations correctly
Signed-off-by: Alberto Garcia --- hw/ide/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 317406d..b559f1b 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -895,7 +895,7 @@ static void ide_sector_write(IDEState *s) qemu_iovec_init_external(&s->qiov, &s->iov, 1); block_acct_start(blk_get_stats(s->blk), &s->acct, - n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); + n * BDRV_SECTOR_SIZE, BLOCK_ACCT_WRITE); s->pio_aiocb = blk_aio_writev(s->blk, sector_num, &s->qiov, n, ide_sector_write_cb, s); } -- 2.5.3
[Qemu-block] [PATCH v2 12/22] qemu-io: Account for failed, invalid and flush operations
Signed-off-by: Alberto Garcia --- qemu-io-cmds.c | 9 + 1 file changed, 9 insertions(+) diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index d6572a8..f8f02ab 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -1364,6 +1364,7 @@ static void aio_write_done(void *opaque, int ret) if (ret < 0) { printf("aio_write failed: %s\n", strerror(-ret)); +block_acct_failed(blk_get_stats(ctx->blk), &ctx->acct); goto out; } @@ -1392,6 +1393,7 @@ static void aio_read_done(void *opaque, int ret) if (ret < 0) { printf("readv failed: %s\n", strerror(-ret)); +block_acct_failed(blk_get_stats(ctx->blk), &ctx->acct); goto out; } @@ -1505,6 +1507,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv) if (ctx->offset & 0x1ff) { printf("offset %" PRId64 " is not sector aligned\n", ctx->offset); +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); g_free(ctx); return 0; } @@ -1512,6 +1515,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv) nr_iov = argc - optind; ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov, 0xab); if (ctx->buf == NULL) { +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); g_free(ctx); return 0; } @@ -1600,6 +1604,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv) if (ctx->offset & 0x1ff) { printf("offset %" PRId64 " is not sector aligned\n", ctx->offset); +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); g_free(ctx); return 0; } @@ -1607,6 +1612,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv) nr_iov = argc - optind; ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov, pattern); if (ctx->buf == NULL) { +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); g_free(ctx); return 0; } @@ -1621,7 +1627,10 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv) static int aio_flush_f(BlockBackend *blk, int argc, char **argv) { +BlockAcctCookie cookie = { 0 }; +block_acct_start(blk_get_stats(blk), &cookie, 0, BLOCK_ACCT_FLUSH); blk_drain_all(); +block_acct_done(blk_get_stats(blk), &cookie); return 0; } -- 2.5.3
[Qemu-block] [PATCH v2 13/22] block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode
This patch switches to QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode, and makes the latency of the operation constant. This way we can perform tests on the accounting code with reproducible results. Signed-off-by: Alberto Garcia --- block/accounting.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/block/accounting.c b/block/accounting.c index d94ebed..8eb59fc 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -25,14 +25,20 @@ #include "block/accounting.h" #include "block/block_int.h" #include "qemu/timer.h" +#include "sysemu/qtest.h" static QEMUClockType clock_type = QEMU_CLOCK_REALTIME; +static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000; void block_acct_init(BlockAcctStats *stats, bool account_invalid, bool account_failed) { stats->account_invalid = account_invalid; stats->account_failed = account_failed; + +if (qtest_enabled()) { +clock_type = QEMU_CLOCK_VIRTUAL; +} } void block_acct_cleanup(BlockAcctStats *stats) @@ -84,6 +90,10 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) int64_t time_ns = qemu_clock_get_ns(clock_type); int64_t latency_ns = time_ns - cookie->start_time_ns; +if (qtest_enabled()) { +latency_ns = qtest_latency_ns; +} + assert(cookie->type < BLOCK_MAX_IOTYPE); stats->nr_bytes[cookie->type] += cookie->bytes; @@ -107,6 +117,10 @@ void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) int64_t time_ns = qemu_clock_get_ns(clock_type); int64_t latency_ns = time_ns - cookie->start_time_ns; +if (qtest_enabled()) { +latency_ns = qtest_latency_ns; +} + stats->total_time_ns[cookie->type] += latency_ns; stats->last_access_time_ns = time_ns; -- 2.5.3
[Qemu-block] [PATCH v2 05/22] block: Add idle_time_ns to BlockDeviceStats
This patch adds the new field 'idle_time_ns' to the BlockDeviceStats structure, indicating the time that has passed since the previous I/O operation. It also adds the block_acct_idle_time_ns() call, to ensure that all references to the clock type used for accounting are in the same place. This will later allow us to use a different clock for iotests. Signed-off-by: Alberto Garcia --- block/accounting.c | 12 ++-- block/qapi.c | 5 + hmp.c | 4 +++- include/block/accounting.h | 2 ++ qapi/block-core.json | 6 +- qmp-commands.hx| 10 -- 6 files changed, 33 insertions(+), 6 deletions(-) diff --git a/block/accounting.c b/block/accounting.c index 6f4c0f1..d427fa8 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -40,12 +40,15 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) { +int64_t time_ns = qemu_clock_get_ns(clock_type); +int64_t latency_ns = time_ns - cookie->start_time_ns; + assert(cookie->type < BLOCK_MAX_IOTYPE); stats->nr_bytes[cookie->type] += cookie->bytes; stats->nr_ops[cookie->type]++; -stats->total_time_ns[cookie->type] += -qemu_clock_get_ns(clock_type) - cookie->start_time_ns; +stats->total_time_ns[cookie->type] += latency_ns; +stats->last_access_time_ns = time_ns; } @@ -55,3 +58,8 @@ void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, assert(type < BLOCK_MAX_IOTYPE); stats->merged[type] += num_requests; } + +int64_t block_acct_idle_time_ns(BlockAcctStats *stats) +{ +return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns; +} diff --git a/block/qapi.c b/block/qapi.c index e936ba7..66f2604 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -357,6 +357,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE]; s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ]; s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH]; + +s->stats->has_idle_time_ns = stats->last_access_time_ns > 0; +if (s->stats->has_idle_time_ns) { +s->stats->idle_time_ns = block_acct_idle_time_ns(stats); +} } s->stats->wr_highest_offset = bs->wr_highest_offset; diff --git a/hmp.c b/hmp.c index 6e2bbc8..289c240 100644 --- a/hmp.c +++ b/hmp.c @@ -511,6 +511,7 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict) " flush_total_time_ns=%" PRId64 " rd_merged=%" PRId64 " wr_merged=%" PRId64 + " idle_time_ns=%" PRId64 "\n", stats->value->stats->rd_bytes, stats->value->stats->wr_bytes, @@ -521,7 +522,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict) stats->value->stats->rd_total_time_ns, stats->value->stats->flush_total_time_ns, stats->value->stats->rd_merged, - stats->value->stats->wr_merged); + stats->value->stats->wr_merged, + stats->value->stats->idle_time_ns); } qapi_free_BlockStatsList(stats_list); diff --git a/include/block/accounting.h b/include/block/accounting.h index 66637cd..4b2b999 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -40,6 +40,7 @@ typedef struct BlockAcctStats { uint64_t nr_ops[BLOCK_MAX_IOTYPE]; uint64_t total_time_ns[BLOCK_MAX_IOTYPE]; uint64_t merged[BLOCK_MAX_IOTYPE]; +int64_t last_access_time_ns; } BlockAcctStats; typedef struct BlockAcctCookie { @@ -53,5 +54,6 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie); void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, int num_requests); +int64_t block_acct_idle_time_ns(BlockAcctStats *stats); #endif diff --git a/qapi/block-core.json b/qapi/block-core.json index 5f12af7..a529e2f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -448,6 +448,10 @@ # @wr_merged: Number of write requests that have been merged into another # request (Since 2.3). # +# @idle_time_ns: #optional Time since the last I/O operation, in +#miliseconds. If the field is absent it means that +#there haven't been any operations yet (Since 2.5). +# # Since: 0.14.0 ##
[Qemu-block] [PATCH v2 17/22] xen_disk: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/block/xen_disk.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index 4869518..02eda6e 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -537,7 +537,11 @@ static void qemu_aio_complete(void *opaque, int ret) break; } case BLKIF_OP_READ: -block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct); +if (ioreq->status == BLKIF_RSP_OKAY) { +block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct); +} else { +block_acct_failed(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct); +} break; case BLKIF_OP_DISCARD: default: @@ -722,6 +726,23 @@ static void blk_handle_requests(struct XenBlkDev *blkdev) /* parse them */ if (ioreq_parse(ioreq) != 0) { + +switch (ioreq->req.operation) { +case BLKIF_OP_READ: +block_acct_invalid(blk_get_stats(blkdev->blk), + BLOCK_ACCT_READ); +break; +case BLKIF_OP_WRITE: +block_acct_invalid(blk_get_stats(blkdev->blk), + BLOCK_ACCT_WRITE); +break; +case BLKIF_OP_FLUSH_DISKCACHE: +block_acct_invalid(blk_get_stats(blkdev->blk), + BLOCK_ACCT_FLUSH); +default: +break; +}; + if (blk_send_response_one(ioreq)) { xen_be_send_notify(&blkdev->xendev); } -- 2.5.3
[Qemu-block] [PATCH v2 18/22] atapi: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/ide/atapi.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c index 747f466..cf0b78e 100644 --- a/hw/ide/atapi.c +++ b/hw/ide/atapi.c @@ -108,27 +108,30 @@ static void cd_data_to_raw(uint8_t *buf, int lba) static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size) { int ret; +block_acct_start(blk_get_stats(s->blk), &s->acct, + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); switch(sector_size) { case 2048: -block_acct_start(blk_get_stats(s->blk), &s->acct, - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4); -block_acct_done(blk_get_stats(s->blk), &s->acct); break; case 2352: -block_acct_start(blk_get_stats(s->blk), &s->acct, - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4); -block_acct_done(blk_get_stats(s->blk), &s->acct); -if (ret < 0) -return ret; -cd_data_to_raw(buf, lba); +if (ret >= 0) { +cd_data_to_raw(buf, lba); +} break; default: -ret = -EIO; -break; +block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ); +return -EIO; } + +if (ret < 0) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); +} else { +block_acct_done(blk_get_stats(s->blk), &s->acct); +} + return ret; } @@ -357,7 +360,11 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret) return; eot: -block_acct_done(blk_get_stats(s->blk), &s->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); +} else { +block_acct_done(blk_get_stats(s->blk), &s->acct); +} ide_set_inactive(s, false); } -- 2.5.3
[Qemu-block] [PATCH v2 03/22] block: define 'clock_type' for the accounting code
Its value is still QEMU_CLOCK_REALTIME, but having it in a variable will allow us to change its value easily in the future when running in qtest mode. Signed-off-by: Alberto Garcia --- block/accounting.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/block/accounting.c b/block/accounting.c index a423560..6f4c0f1 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -26,13 +26,15 @@ #include "block/block_int.h" #include "qemu/timer.h" +static QEMUClockType clock_type = QEMU_CLOCK_REALTIME; + void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, int64_t bytes, enum BlockAcctType type) { assert(type < BLOCK_MAX_IOTYPE); cookie->bytes = bytes; -cookie->start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); +cookie->start_time_ns = qemu_clock_get_ns(clock_type); cookie->type = type; } @@ -43,7 +45,7 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) stats->nr_bytes[cookie->type] += cookie->bytes; stats->nr_ops[cookie->type]++; stats->total_time_ns[cookie->type] += -qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cookie->start_time_ns; +qemu_clock_get_ns(clock_type) - cookie->start_time_ns; } -- 2.5.3
[Qemu-block] [PATCH v2 00/22] Extended I/O accounting
Hi, A few months ago I announced that I was planning to extend the I/O accounting in QEMU based on the previous plans by BenoƮt and his discussions in the mailing list in 2014. Here are the links for reference: https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html https://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00080.html https://lists.nongnu.org/archive/html/qemu-block/2015-06/msg00071.html In June I sent a patch series with the infrastructure that I was going to use to compute averages. After that the 2.4 release happened and I had some time to continue working on this, so this new series contains the complete implementation of the new statistics, plus tests and a couple of bug fixes. The series is long but most patches are quite simple so they should be easy to understand. It applies on top of Max's "BlockBackend and media v5" that moves BlockAcctStats to BlockBackend. Here's the summary of what's new: - New block_acct_failed() and block_acct_invalid() calls. We keep track now of the number of successful, failed and invalid operations (each one separated into read, write and flush). So from the API point of view, BlockDeviceStats contains 6 new fields for those. - idle_time_ns: time since the last I/O operation. - New BlockDeviceTimedStats struct: it has statistics for the I/O during a given interval of time. It keeps minimum, maximum and average latencies for read, write and flush operations. It also keeps the average read and write queue depths. - New 'stats-intervals' option that allows the user to define the intervals used to keep the aforementioned statistics. An arbitrary number of intervals can be specified, the length of each one is in seconds. For the API I opted for a colon-separated list of numbers, stats-intervals=60:3600:86400 I also considered something a different syntax, stats-intervals.0.length=60, stats-intervals.1.length=3600, stats-intervals.2.length=86400 This one could be useful if we want to specify any other attribute for each interval, but I couldn't come up with any, so I chose the simpler solution. - Two new options, stats-account-invalid and stats-account-failed, which allow the user to decide whether to count invalid and failed operations when computing the idle time and total latency. - 'supports_stats': a new field for the BlockStats structure that tells you whether the BDS supports statistics or not. This one can probably be improved by asking the device model. I think that's all. I'm sure there will be questions and rough edges to discuss, so I'm all yours. Regards, Berto v2: - First complete implementation of the new statistics v1: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg03321.html - Initial series containing only the timed average infrastructure. Alberto Garcia (22): xen_disk: Account for flush operations ide: Account for write operations correctly block: define 'clock_type' for the accounting code util: Infrastructure for computing recent averages block: Add idle_time_ns to BlockDeviceStats block: Add "supports_stats" field to BlockStats block: Add statistics for failed and invalid I/O operations block: Allow configuring whether to account failed and invalid ops block: Compute minimum, maximum and average I/O latencies block: Add average I/O queue depth to BlockDeviceTimedStats block: New option to define the intervals for collecting I/O statistics qemu-io: Account for failed, invalid and flush operations block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode iotests: Add test for the block device statistics nvme: Account for failed and invalid operations virtio-blk: Account for failed and invalid operations xen_disk: Account for failed and invalid operations atapi: Account for failed and invalid operations ide: Account for failed and invalid operations macio: Account for failed operations scsi-disk: Account for failed operations block: Update copyright of the accounting code block/accounting.c | 118 ++- block/block-backend.c| 1 + block/qapi.c | 53 +++ blockdev.c | 53 +++ hmp.c| 4 +- hw/block/nvme.c | 11 +- hw/block/virtio-blk.c| 4 +- hw/block/xen_disk.c | 27 +++- hw/ide/atapi.c | 31 ++-- hw/ide/core.c| 12 +- hw/ide/macio.c | 12 +- hw/scsi/scsi-disk.c | 46 -- include/block/accounting.h | 28 include/qemu/timed-average.h | 64 qapi/block-core.json | 106 - qemu-io-cmds.c | 9 ++ qmp-commands.hx | 86 ++- tests/Makefile | 4 + tests/qemu-iotests/136 | 349 +++
[Qemu-block] [PATCH v2 15/22] nvme: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/block/nvme.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 5da41b2..169e4fa 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -201,10 +201,11 @@ static void nvme_rw_cb(void *opaque, int ret) NvmeCtrl *n = sq->ctrl; NvmeCQueue *cq = n->cq[sq->cqid]; -block_acct_done(blk_get_stats(n->conf.blk), &req->acct); if (!ret) { +block_acct_done(blk_get_stats(n->conf.blk), &req->acct); req->status = NVME_SUCCESS; } else { +block_acct_failed(blk_get_stats(n->conf.blk), &req->acct); req->status = NVME_INTERNAL_DEV_ERROR; } if (req->has_sg) { @@ -238,18 +239,22 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, uint64_t data_size = (uint64_t)nlb << data_shift; uint64_t aio_slba = slba << (data_shift - BDRV_SECTOR_BITS); int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0; +enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; if ((slba + nlb) > ns->id_ns.nsze) { +block_acct_invalid(blk_get_stats(n->conf.blk), acct); return NVME_LBA_RANGE | NVME_DNR; } + if (nvme_map_prp(&req->qsg, prp1, prp2, data_size, n)) { +block_acct_invalid(blk_get_stats(n->conf.blk), acct); return NVME_INVALID_FIELD | NVME_DNR; } + assert((nlb << data_shift) == req->qsg.size); req->has_sg = true; -dma_acct_start(n->conf.blk, &req->acct, &req->qsg, - is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ); +dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct); req->aiocb = is_write ? dma_blk_write(n->conf.blk, &req->qsg, aio_slba, nvme_rw_cb, req) : dma_blk_read(n->conf.blk, &req->qsg, aio_slba, nvme_rw_cb, req); -- 2.5.3
[Qemu-block] [PATCH v2 22/22] block: Update copyright of the accounting code
Signed-off-by: Alberto Garcia --- block/accounting.c | 1 + include/block/accounting.h | 1 + 2 files changed, 2 insertions(+) diff --git a/block/accounting.c b/block/accounting.c index 8eb59fc..ebe5ad6 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -2,6 +2,7 @@ * QEMU System Emulator block accounting * * Copyright (c) 2011 Christoph Hellwig + * Copyright (c) 2015 Igalia, S.L. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal diff --git a/include/block/accounting.h b/include/block/accounting.h index f41ddde..0215a4a 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -2,6 +2,7 @@ * QEMU System Emulator block accounting * * Copyright (c) 2011 Christoph Hellwig + * Copyright (c) 2015 Igalia, S.L. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal -- 2.5.3
Re: [Qemu-block] [PATCH v2 03/16] blkverify: Convert s->test_file to BdrvChild
On Thu 01 Oct 2015 03:13:21 PM CEST, Kevin Wolf wrote: > Signed-off-by: Kevin Wolf > Reviewed-by: Max Reitz Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v2 05/16] block: Convert bs->file to BdrvChild
On Thu 01 Oct 2015 03:13:23 PM CEST, Kevin Wolf wrote: > This patch removes the temporary duplication between bs->file and > bs->file_child by converting everything to BdrvChild. > > Signed-off-by: Kevin Wolf Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v2 07/16] block: Convert bs->backing_hd to BdrvChild
On Thu 01 Oct 2015 03:13:25 PM CEST, Kevin Wolf wrote: > This is the final step in converting all of the BlockDriverState > pointers that block drivers use to BdrvChild. > > After this patch, bs->children contains the full list of child nodes > that are referenced by a given BDS, and these children are only > referenced through BdrvChild, so that updating the pointer in there is > enough for changing edges in the graph. > > Signed-off-by: Kevin Wolf Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v2 08/16] block: Manage backing file references in bdrv_set_backing_hd()
On Thu 01 Oct 2015 03:13:26 PM CEST, Kevin Wolf wrote: > @@ -2428,12 +2434,9 @@ int bdrv_drop_intermediate(BlockDriverState *active, > BlockDriverState *top, > BlockDriverState *intermediate; > BlockDriverState *base_bs = NULL; > BlockDriverState *new_top_bs = NULL; > -BlkIntermediateStates *intermediate_state, *next; > +BlkIntermediateStates *intermediate_state; > int ret = -EIO; > > -QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete; > -QSIMPLEQ_INIT(&states_to_delete); > - > if (!top->drv || !base->drv) { > goto exit; > } > @@ -2460,7 +2463,6 @@ int bdrv_drop_intermediate(BlockDriverState *active, > BlockDriverState *top, > while (intermediate) { > intermediate_state = g_new0(BlkIntermediateStates, 1); > intermediate_state->bs = intermediate; > -QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry); > > if (backing_bs(intermediate) == base) { > base_bs = backing_bs(intermediate); The purpose of this while() loop in the original code was to make a list of all the intermediate states whose references need to be dropped after the bdrv_set_backing_hd() call. Since now bdrv_set_backing_hd() is the one who manages the backing references, this list is no longer necessary, and you are actually leaking the BlkIntermediateStates objects here (see the rest of the patch below). The loop is also useful to check that 'base' is indeed part of the backing chain, so that we should keep. > @@ -2483,17 +2485,8 @@ int bdrv_drop_intermediate(BlockDriverState *active, > BlockDriverState *top, > } > bdrv_set_backing_hd(new_top_bs, base_bs); > > -QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, > next) { > -/* so that bdrv_close() does not recursively close the chain */ > -bdrv_set_backing_hd(intermediate_state->bs, NULL); > -bdrv_unref(intermediate_state->bs); > -} > ret = 0; > - > exit: > -QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, > next) { > -g_free(intermediate_state); > -} > return ret; > } Berto
Re: [Qemu-block] [PATCH v2 11/16] block-backend: Add blk_set_bs()
On Thu 01 Oct 2015 03:13:29 PM CEST, Kevin Wolf wrote: > It allows changing the BlockDriverState that a BlockBackend points to. > > Signed-off-by: Kevin Wolf Reviewed-by: Alberto Garcia Berto
[Qemu-block] blockdev-del
I would like to implement blockdev-del. Since 48f364dd0b there's no way to remove a block device added with blockdev-add. Here's the use case: 1. We can attach a disk to a running QEMU instance: {"execute": "blockdev-add", "arguments": { "options": { "driver": "qcow2", "file": { "driver": "file", "filename": "hd.img"}, "id": "disk1"}}} 2. We can attach a front-end device that uses the disk that we just added. {"execute": "device_add", "arguments": { "id": "dev1", "driver": "virtio-blk-pci", "drive": "disk1"}} 3. When we're done, we can remove the front-end: {"execute": "device_del", "arguments": { "id": "dev1" }} However, it is not possible to remove the disk added in step 1 so if we want to attach a new disk we cannot reuse the device name ("disk1"). I know that blockdev-add is still a work in progress, but I wonder if there's anything else that is preventing us from writing blockdev-del (that's maybe not even the best name here, since what we want to destroy is the BlockBackend; the BDS can be already removed with 'eject'). Is this a good idea? Has it been discussed already? I would be happy to hear your opinions. Thanks, Berto
Re: [Qemu-block] [PATCH v2 14/16] blockjob: Store device name at job creation
On Thu 01 Oct 2015 03:13:32 PM CEST, Kevin Wolf wrote: > Some block jobs change the block device graph on completion. This means > that the device that owns the job and originally was addressed with its > device name may no longer be what the corresponding BlockBackend points > to. > > Previously, the effects of bdrv_swap() ensured that the job was (at > least partially) transferred to the target image. Events that contain > the device name could still use bdrv_get_device_name(job->bs) and get > the same result. > > After removing bdrv_swap(), this won't work any more. Instead, save the > device name at job creation and use that copy for QMP events and > anything else identifying the job. > > Signed-off-by: Kevin Wolf Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH] throttle: test that snapshots move the throttling configuration
Ping On Thu 17 Sep 2015 04:33:06 PM CEST, Alberto Garcia wrote: > If a snapshot is performed on a device that has I/O limits they should > be moved to the target image (the new active layer). > > Signed-off-by: Alberto Garcia > --- > tests/qemu-iotests/096 | 69 > ++ > tests/qemu-iotests/096.out | 5 > tests/qemu-iotests/group | 1 + > 3 files changed, 75 insertions(+) > create mode 100644 tests/qemu-iotests/096 > create mode 100644 tests/qemu-iotests/096.out > > diff --git a/tests/qemu-iotests/096 b/tests/qemu-iotests/096 > new file mode 100644 > index 000..e34204b > --- /dev/null > +++ b/tests/qemu-iotests/096 > @@ -0,0 +1,69 @@ > +#!/usr/bin/env python > +# > +# Test that snapshots move the throttling configuration to the active > +# layer > +# > +# Copyright (C) 2015 Igalia, S.L. > +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 2 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see <http://www.gnu.org/licenses/>. > +# > + > +import iotests > +import os > + > +class TestLiveSnapshot(iotests.QMPTestCase): > +base_img = os.path.join(iotests.test_dir, 'base.img') > +target_img = os.path.join(iotests.test_dir, 'target.img') > +group = 'mygroup' > +iops = 6000 > +iops_size = 1024 > + > +def setUp(self): > +opts = [] > +opts.append('node-name=base') > +opts.append('throttling.group=%s' % self.group) > +opts.append('throttling.iops-total=%d' % self.iops) > +opts.append('throttling.iops-size=%d' % self.iops_size) > +iotests.qemu_img('create', '-f', iotests.imgfmt, self.base_img, > '100M') > +self.vm = iotests.VM().add_drive(self.base_img, ','.join(opts)) > +self.vm.launch() > + > +def tearDown(self): > +self.vm.shutdown() > +os.remove(self.base_img) > +os.remove(self.target_img) > + > +def checkConfig(self, active_layer): > +result = self.vm.qmp('query-named-block-nodes') > +for r in result['return']: > +if r['node-name'] == active_layer: > +self.assertEqual(r['group'], self.group) > +self.assertEqual(r['iops'], self.iops) > +self.assertEqual(r['iops_size'], self.iops_size) > +else: > +self.assertFalse(r.has_key('group')) > +self.assertEqual(r['iops'], 0) > +self.assertFalse(r.has_key('iops_size')) > + > +def testSnapshot(self): > +self.checkConfig('base') > +self.vm.qmp('blockdev-snapshot-sync', > +node_name = 'base', > +snapshot_node_name = 'target', > +snapshot_file = self.target_img, > +format = iotests.imgfmt) > +self.checkConfig('target') > + > +if __name__ == '__main__': > +iotests.main(supported_fmts=['qcow2']) > diff --git a/tests/qemu-iotests/096.out b/tests/qemu-iotests/096.out > new file mode 100644 > index 000..ae1213e > --- /dev/null > +++ b/tests/qemu-iotests/096.out > @@ -0,0 +1,5 @@ > +. > +-- > +Ran 1 tests > + > +OK > diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group > index 439b1d2..30c784e 100644 > --- a/tests/qemu-iotests/group > +++ b/tests/qemu-iotests/group > @@ -102,6 +102,7 @@ > 093 auto > 094 rw auto quick > 095 rw auto quick > +096 rw auto quick > 097 rw auto backing > 098 rw auto backing quick > 099 rw auto quick > -- > 2.5.1
Re: [Qemu-block] [PATCH v6 3/4] block: add a 'blockdev-snapshot' QMP command
On Tue 06 Oct 2015 05:30:07 PM CEST, Kevin Wolf wrote: >> -options = qdict_new(); >> -if (has_snapshot_node_name) { >> -qdict_put(options, "node-name", >> - qstring_from_str(snapshot_node_name)); >> +if (snapshot_node_name && bdrv_find_node(snapshot_node_name)) { >> +error_setg(errp, "New snapshot node name already exists"); >> +return; >> +} > > Preexisting, but shouldn't we use bdrv_lookup_bs() here (because devices > and node names share a namespace)? I think you're right, good catch! >> +if (state->new_bs->blk != NULL) { >> +error_setg(errp, "The snapshot is already in use by %s", >> + blk_name(state->new_bs->blk)); >> +return; >> +} > > Is it even possible yet to create a root BDS without a BB? It is possible with Max's series, on which mine depends. http://patchwork.ozlabs.org/patch/519375/ >> +if (bdrv_op_is_blocked(state->new_bs, BLOCK_OP_TYPE_EXTERNAL_SNAPSHOT, >> + errp)) { >> +return; >> +} >> + >> +if (state->new_bs->backing_hd != NULL) { >> +error_setg(errp, "The snapshot already has a backing image"); >> } > > The error cases after bdrv_open() should probably bdrv_unref() the > node. I don't think it's necessary, external_snapshot_abort() already takes care of that. Thanks for reviewing! Berto
Re: [Qemu-block] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
On Tue 22 Sep 2015 09:44:19 AM CEST, Wen Congyang wrote: > In some cases, we want to take a quorum child offline, and take > another child online. > > Signed-off-by: Wen Congyang > Signed-off-by: zhanghailiang > Signed-off-by: Gonglei > Reviewed-by: Eric Blake > +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs, > +Error **errp) > +{ > + > +if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) { > +error_setg(errp, "The BDS %s doesn't support adding a child", > + bdrv_get_device_or_node_name(parent_bs)); > +return; > +} > + > +if (!QLIST_EMPTY(&child_bs->parents)) { > +error_setg(errp, "The BDS %s already has parent", > + child_bs->node_name); I think there's one 'a' missing: "The BDS %s already has a parent". I also don't think we should use "BDS" in error messages, that's an acronym for the name of a C data type, not something that the user is supposed to know about. I suggest using 'Node' instead. Otherwise the patch looks good to me, thanks! Berto
Re: [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
On Tue 22 Sep 2015 09:44:20 AM CEST, Wen Congyang wrote: > +++ b/block/quorum.c > @@ -66,6 +66,9 @@ typedef struct QuorumVotes { > typedef struct BDRVQuorumState { > BlockDriverState **bs; /* children BlockDriverStates */ > int num_children; /* children count */ > +int max_children; /* The maximum children count, we need to > reallocate > +* bs if num_children grows larger than maximum. > +*/ > int threshold; /* if less than threshold children reads gave the > * same result a quorum error occurs. > */ As you announce in the cover letter of this series, your code depends on the parents list patch written by Kevin here: http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html As you might be aware, and as part of the same series by Kevin, BDRVQuorumState will no longer hold a list of BlockDriverState but a list of BdrvChild instead: https://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04571.html > +static void quorum_add_child(BlockDriverState *bs, BlockDriverState > *child_bs, > + Error **errp) > +{ > +BDRVQuorumState *s = bs->opaque; > + > +bdrv_drain(bs); > + > +if (s->num_children == s->max_children) { > +if (s->max_children >= INT_MAX) { > +error_setg(errp, "Too many children"); > +return; > +} max_children can never be greater than INT_MAX. Use == instead. > +s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1); > +s->bs[s->num_children] = NULL; No need to set the pointer to NULL here, and you are anyway setting the pointer to the new child a few lines afterwards. > +s->max_children++; > +} > + > +bdrv_ref(child_bs); > +bdrv_attach_child(bs, child_bs, &child_format); > +s->bs[s->num_children++] = child_bs; > +} > + > +static void quorum_del_child(BlockDriverState *bs, BlockDriverState > *child_bs, > + Error **errp) > +{ > +BDRVQuorumState *s = bs->opaque; > +BdrvChild *child; > +int i; > + > +for (i = 0; i < s->num_children; i++) { > +if (s->bs[i] == child_bs) { > +break; > +} > +} > + > +QLIST_FOREACH(child, &bs->children, next) { > +if (child->bs == child_bs) { > +break; > +} > +} > + > +/* we have checked it in bdrv_del_child() */ > +assert(i < s->num_children && child); > + > +if (s->num_children <= s->threshold) { > +error_setg(errp, > +"The number of children cannot be lower than the vote threshold > %d", > +s->threshold); > +return; > +} > + > +bdrv_drain(bs); > +/* We can safely remove this child now */ > +memmove(&s->bs[i], &s->bs[i + 1], > +(s->num_children - i - 1) * sizeof(void *)); > +s->num_children--; > +s->bs[s->num_children] = NULL; Same here, no one will check or use s->bs[s->num_children] so there's no need to make it NULL. Apart from the issue of using only part of Kevin's series, the rest are minor things. Thanks and sorry for the late review! Berto
Re: [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
On Tue 22 Sep 2015 09:44:21 AM CEST, Wen Congyang wrote: > The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del. > It justs for adding/removing quorum's child now, and don't support all > kinds of children, nor all block drivers. So it is experimental now. Better "So it is experimental for now". Otherwise it suggests that it was not experimental before but now it is. > +void qmp_x_blockdev_child_add(const char *parent, const char *child, > + Error **errp) > +{ > +BlockDriverState *parent_bs, *child_bs; > +Error *local_err = NULL; > + > +parent_bs = bdrv_lookup_bs(parent, parent, &local_err); > +if (!parent_bs) { > +error_propagate(errp, local_err); > +return; > +} I think this is fine as it is now, but since you are checking the value of parent_bs to see if bdrv_lookup_bs() succeeded you can use errp directly instead and skip the error_propagate() call. > +void qmp_x_blockdev_child_del(const char *parent, const char *child, > + Error **errp) > +{ > +BlockDriverState *parent_bs, *child_bs; > +Error *local_err = NULL; > + > +parent_bs = bdrv_lookup_bs(parent, parent, &local_err); > +if (!parent_bs) { > +error_propagate(errp, local_err); > +return; > +} Same here. > +## > +# @x-blockdev-child-add > +# > +# Add a new child to the parent BDS. Currently only the Quorum driver > +# implements this feature. This is useful to fix a broken quorum > child. Again, I would not use 'BDS' here. "Add a new child to an existing block device", or something like that. > +# @parent: graph node name or id which the child will be added to. > +# > +# @child: graph node name that will be added. > +# > +# Note: this command is experimental, and not a stable API. "[..] and does not have a stable API". > +x-blockdev-child-add > + > + > +Add a child to a quorum node. > + > +Arguments: > + > +- "parent": the quorum's id or node name > +- "child": the child node-name which will be added > + I would use the same kind of description than in the json file. "Add a child to an existing block device. Currently only Quorum supports this feature". Same for the 'x-blockdev-child-del' documentation. Berto
Re: [Qemu-block] [PATCH v5 4/4] hmp: add monitor command to add/remove a child
On Tue 22 Sep 2015 09:44:22 AM CEST, Wen Congyang wrote: > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -193,6 +193,34 @@ actions (drive options rerror, werror). > ETEXI > > { > +.name = "blockdev_child_add", > +.args_type = "id:B,child:B", > +.params = "parent child", > +.help = "add a child to a BDS", Once again, I would not use 'BDS' in the documentation. > +.mhandler.cmd = hmp_blockdev_child_add, > +}, > + > +STEXI > +@item blockdev_child_add @var{parent} @var{child} > +@findex blockdev_child_add > +Add a child to the block device. "Add a child to a block device." Thanks! Berto
Re: [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
On Wed 07 Oct 2015 08:51:21 PM CEST, Max Reitz wrote: >> +if (s->num_children == s->max_children) { >> +if (s->max_children >= INT_MAX) { > > Opposing Berto (:-)), I like >= even if the > part is actually > impossible. I myself like to use constructs such as: > > for (i = 0; i < n; i++) { > /* ... */ > } > if (i >= n) { > /* ... */ > } > > Even though @i can never exceed @n after the loop. This is because my > way of thinking is "What if it could exceed @n? Then I'd like to take > this branch as well." The same applies here. s->max_children can never > exceed INT_MAX, but if it could, we'd want that to be an error, too. If s->max_children (and therefore s->num_children) could exceed the upper limit then we would probably want to assert on that, because it means that there's something seriously broken. The purpose of this code is to make sure that the upper limit is... well, an upper limit :-) If that invariant no longer holds then I don't think we want a simple "Too many children" error. >> +s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1); >> +s->bs[s->num_children] = NULL; >> +s->max_children++; >> +} > > Just a suggestion, please feel free to ignore it completely: > > You can drop the s->max_children field and just always call g_renew() > with s->num_children + 1 as the @count parameter. There shouldn't be > any (visible) performance penalty, but it would simplify the code. If s->num_children has decreased since the previous g_renew() call (because the user called quorum_del_child()) that could actually reduce the array size. Nothing would break though, at worst it would just be a bit counter-intuitive :-) https://www.gnu.org/software/libc/manual/html_node/Changing-Block-Size.html Berto
Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote: >> For the second point, you should also consider how useful this >> feature is to management tools. Just being able to remove and attach >> children from a quorum node seems very useful on its own. I don't see >> why we should wait for having support for other block drivers; also, >> for most block drivers there is no meaningful way of adding or >> removing children as nicely as that is possible for quorum. > > Okay, this is an argument I might be able to buy. Note that if we want to make this interface stable there's one use case missing: there's currently no way to change the vote threshold. This is maybe not so important for the COLO use case, but for the general case of adding and removing children from a quorum node having the possibility to change the threshold makes a lot of sense. That would probably require a its own API ('quorum-set-threshold' or something like that) so I don't think it has an effect on these child-add and child-del commands, but I wanted to mention it here anyway in case someone sees something that I'm overlooking. Berto
Re: [Qemu-block] [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
On Thu 08 Oct 2015 12:03:45 PM CEST, Kevin Wolf wrote: >> Note that if we want to make this interface stable there's one use >> case missing: there's currently no way to change the vote threshold. >> > The right way to change the voting threshold is bdrv_reopen(). Adding > support to quorum for changing the threshold this way should be > trivial. Ah, and that would mean implementing quorum_reopen_*. Works for me. > The hard part is how to expose it in QMP, which is why I've only added > it to qemu-io so far (which happens to be accessible from HMP). > > If you need this functionality, let's start a new email thread and > discuss the right QMP interface for blockdev-reopen (or blockdev- > set-options or whatever you would call it). Ok, sounds good. Thanks! Berto
Re: [Qemu-block] blockdev-del
On Mon 05 Oct 2015 05:58:14 PM CEST, Kevin Wolf wrote: > What I recall is that the desired semantics for blockdev-del is that it > only works if the monitor reference is the only remaining reference and > that the name blockdev-del is therefore actually not too bad. > > The reason why we want this restriction is that otherwise images could > stay in use without the user even noticing, which is rather dangerous. > Explicit monitor commands for adding and for removing an image seem to > be much easier to handle than images disappearing after e.g. a block job > decided to drop its reference (without any user interaction). One thing that is new is that we'll soon have the ability to create BDSs without a BlockBackend using blockdev-add, therefore blockdev-del could a) destroy the BlockDriverState if it doesn't have a BlockBackend and the monitor reference is the only remaining one. b) remove the BlockDriverState from the backend (that would be essentially 'blockdev-remove-medium') and destroy the backend. That would require that the backend is also unused. blk_get_attached_dev() doesn't seem to be enough because others can also hold a reference to the backend (e.g. the NBD server). c) it could also happen that the device name passed to blockdev-del is that of a BlockBackend that has no BDS attached. In that case it would simply destroy the BlockBackend as described in b). Does that sound right? Berto
[Qemu-block] [PATCH v7 1/5] block: check for existing device IDs in external_snapshot_prepare()
The 'snapshot-node-name' parameter of blockdev-snapshot-sync allows setting the node name of the image that is going to be created. Before creating the image, external_snapshot_prepare() checks that the name is not already being used. The check is however incomplete since it only considers existing node names, but node names must not clash with device IDs either because they share the same namespace. If the user attempts to create a snapshot using the name of an existing device for the 'snapshot-node-name' parameter the operation will eventually fail, but only after the new image has been created. This patch replaces bdrv_find_node() with bdrv_lookup_bs() to extend the check to existing device IDs, and thus detect possible name clashes before the new image is created. Signed-off-by: Alberto Garcia --- blockdev.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/blockdev.c b/blockdev.c index 4731843..0898d1f 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1570,8 +1570,9 @@ static void external_snapshot_prepare(BlkTransactionState *common, return; } -if (has_snapshot_node_name && bdrv_find_node(snapshot_node_name)) { -error_setg(errp, "New snapshot node name already existing"); +if (has_snapshot_node_name && +bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) { +error_setg(errp, "New snapshot node name already in use"); return; } -- 2.6.1
[Qemu-block] [PATCH v7 3/5] block: support passing 'backing': '' to 'blockdev-add'
Passing an empty string allows opening an image but not its backing file. This was already described in the API documentation, only the implementation was missing. This is useful for creating snapshots using images opened with blockdev-add, since they are not supposed to have a backing image before the operation. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Reviewed-by: Eric Blake Reviewed-by: Kevin Wolf --- block.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/block.c b/block.c index c9e3c6c..31d1b6e 100644 --- a/block.c +++ b/block.c @@ -1406,6 +1406,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename, BlockDriverState *file = NULL, *bs; BlockDriver *drv = NULL; const char *drvname; +const char *backing; Error *local_err = NULL; int snapshot_flags = 0; @@ -1473,6 +1474,12 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename, assert(drvname || !(flags & BDRV_O_PROTOCOL)); +backing = qdict_get_try_str(options, "backing"); +if (backing && *backing == '\0') { +flags |= BDRV_O_NO_BACKING; +qdict_del(options, "backing"); +} + bs->open_flags = flags; bs->options = options; options = qdict_clone_shallow(options); -- 2.6.1
[Qemu-block] [PATCH v7 0/5] Add 'blockdev-snapshot' command
This series adds a new 'blockdev-snapshot' command, that is similar to 'blockdev-snapshot-sync' but takes references to two existing block devices. This depends on Max's "BlockBackend and media" series: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00497.html v7: - Rebase on top of the current master. qmp_marshal_input_blockdev_snapshot is renamed to qmp_marshal_blockdev_snapshot in order to make it build. - New patch to use bdrv_lookup_bs() instead of bdrv_find_node() in external_snapshot_prepare(). This way, if the user attempts to use blockdev-snapshot-sync using an existing device ID in the snapshot-node-name parameter, the code will detect the error before the new image is created. v6: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00575.html - Update documentation and parameter names following Eric's suggestions: 'device' -> 'node', 'snapshot' -> 'overlay'. - Rebased on top of Max's "BlockBackend and media" v5 v5: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00483.html - Don't delete the 'backing' option if it contains something different from an empty string. - Rebase on top of the current master. v4: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00372.html - Implement the support for 'backing': '', drop 'ignore-backing', and update iotest 085 accordingly. - Include sample 'blockdev-add' call in the 'blockdev-snapshot' documentation. - Clarify that the snapshot must not have a backing file in the BlockdevSnapshot documentation. - Update error message ("...node name already existing" -> "...exists"). v3: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00272.html - Add 'ignore-backing' field to BlockdevOptionsGenericCOWFormat. This allows opening images but not their backing images. - Check for op blockers in the snapshot node and make sure that it doesn't have any backing image. - Remove extra check for the existence of the snapshot node: bdrv_open() already does that. - Extend iotest 085 to add tests for 'blockdev-snapshot'. - Replace local_err with errp in some places where the former is unnecessary. - Update command description. - Add 'since' tag to the 'blockdev-snapshot' field in TransactionAction. v2: https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00094.html - Add 'blockdev-snapshot' command instead of allowing passing options to 'blockdev-snapshot-sync'. - Rename BlockdevSnapshot to BlockdevSnapshotSync v1: https://lists.gnu.org/archive/html/qemu-block/2015-08/msg00236.html Alberto Garcia (5): block: check for existing device IDs in external_snapshot_prepare() block: rename BlockdevSnapshot to BlockdevSnapshotSync block: support passing 'backing': '' to 'blockdev-add' block: add a 'blockdev-snapshot' QMP command block: add tests for the 'blockdev-snapshot' command block.c| 7 ++ blockdev.c | 166 - qapi-schema.json | 4 +- qapi/block-core.json | 34 +- qmp-commands.hx| 38 +++ tests/qemu-iotests/085 | 102 ++-- tests/qemu-iotests/085.out | 34 +- 7 files changed, 312 insertions(+), 73 deletions(-) -- 2.6.1
[Qemu-block] [PATCH v7 4/5] block: add a 'blockdev-snapshot' QMP command
One of the limitations of the 'blockdev-snapshot-sync' command is that it does not allow passing BlockdevOptions to the newly created snapshots, so they are always opened using the default values. Extending the command to allow passing options is not a practical solution because there is overlap between those options and some of the existing parameters of the command. This patch introduces a new 'blockdev-snapshot' command with a simpler interface: it just takes two references to existing block devices that will be used as the source and target for the snapshot. Since the main difference between the two commands is that one of them creates and opens the target image, while the other uses an already opened one, the bulk of the implementation is shared. Signed-off-by: Alberto Garcia Cc: Eric Blake Reviewed-by: Max Reitz --- blockdev.c | 165 --- qapi-schema.json | 2 + qapi/block-core.json | 28 + qmp-commands.hx | 38 4 files changed, 172 insertions(+), 61 deletions(-) diff --git a/blockdev.c b/blockdev.c index 12741a0..b5470c9 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1183,6 +1183,18 @@ void qmp_blockdev_snapshot_sync(bool has_device, const char *device, &snapshot, errp); } +void qmp_blockdev_snapshot(const char *node, const char *overlay, + Error **errp) +{ +BlockdevSnapshot snapshot_data = { +.node = (char *) node, +.overlay = (char *) overlay +}; + +blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT, + &snapshot_data, errp); +} + void qmp_blockdev_snapshot_internal_sync(const char *device, const char *name, Error **errp) @@ -1521,58 +1533,48 @@ typedef struct ExternalSnapshotState { static void external_snapshot_prepare(BlkTransactionState *common, Error **errp) { -int flags, ret; -QDict *options; +int flags = 0, ret; +QDict *options = NULL; Error *local_err = NULL; -bool has_device = false; +/* Device and node name of the image to generate the snapshot from */ const char *device; -bool has_node_name = false; const char *node_name; -bool has_snapshot_node_name = false; -const char *snapshot_node_name; +/* Reference to the new image (for 'blockdev-snapshot') */ +const char *snapshot_ref; +/* File name of the new image (for 'blockdev-snapshot-sync') */ const char *new_image_file; -const char *format = "qcow2"; -enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; ExternalSnapshotState *state = DO_UPCAST(ExternalSnapshotState, common, common); TransactionAction *action = common->action; -/* get parameters */ -g_assert(action->kind == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC); - -has_device = action->blockdev_snapshot_sync->has_device; -device = action->blockdev_snapshot_sync->device; -has_node_name = action->blockdev_snapshot_sync->has_node_name; -node_name = action->blockdev_snapshot_sync->node_name; -has_snapshot_node_name = -action->blockdev_snapshot_sync->has_snapshot_node_name; -snapshot_node_name = action->blockdev_snapshot_sync->snapshot_node_name; - -new_image_file = action->blockdev_snapshot_sync->snapshot_file; -if (action->blockdev_snapshot_sync->has_format) { -format = action->blockdev_snapshot_sync->format; -} -if (action->blockdev_snapshot_sync->has_mode) { -mode = action->blockdev_snapshot_sync->mode; +/* 'blockdev-snapshot' and 'blockdev-snapshot-sync' have similar + * purpose but a different set of parameters */ +switch (action->kind) { +case TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT: +{ +BlockdevSnapshot *s = action->blockdev_snapshot; +device = s->node; +node_name = s->node; +new_image_file = NULL; +snapshot_ref = s->overlay; +} +break; +case TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC: +{ +BlockdevSnapshotSync *s = action->blockdev_snapshot_sync; +device = s->has_device ? s->device : NULL; +node_name = s->has_node_name ? s->node_name : NULL; +new_image_file = s->snapshot_file; +snapshot_ref = NULL; +} +break; +default: +g_assert_not_reached(); } /* start processing */ -state->old_bs = bdrv_lookup_bs(has_device ? device : NULL, - has_node_name ? node_name : NULL, - &local_err); -
[Qemu-block] [PATCH v7 5/5] block: add tests for the 'blockdev-snapshot' command
Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz --- tests/qemu-iotests/085 | 102 ++--- tests/qemu-iotests/085.out | 34 ++- 2 files changed, 128 insertions(+), 8 deletions(-) diff --git a/tests/qemu-iotests/085 b/tests/qemu-iotests/085 index 56cd6f8..9484117 100755 --- a/tests/qemu-iotests/085 +++ b/tests/qemu-iotests/085 @@ -7,6 +7,7 @@ # snapshots are performed. # # Copyright (C) 2014 Red Hat, Inc. +# Copyright (C) 2015 Igalia, S.L. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -34,17 +35,17 @@ status=1# failure is the default! snapshot_virt0="snapshot-v0.qcow2" snapshot_virt1="snapshot-v1.qcow2" -MAX_SNAPSHOTS=10 +SNAPSHOTS=10 _cleanup() { _cleanup_qemu -for i in $(seq 1 ${MAX_SNAPSHOTS}) +for i in $(seq 1 ${SNAPSHOTS}) do rm -f "${TEST_DIR}/${i}-${snapshot_virt0}" rm -f "${TEST_DIR}/${i}-${snapshot_virt1}" done - _cleanup_test_img +rm -f "${TEST_IMG}.1" "${TEST_IMG}.2" } trap "_cleanup; exit \$status" 0 1 2 3 15 @@ -85,18 +86,50 @@ function create_group_snapshot() _send_qemu_cmd $h "${cmd}" "return" } +# ${1}: unique identifier for the snapshot filename +# ${2}: true: open backing images; false: don't open them (default) +function add_snapshot_image() +{ +if [ "${2}" = "true" ]; then +extra_params="" +else +extra_params="'backing': '', " +fi +base_image="${TEST_DIR}/$((${1}-1))-${snapshot_virt0}" +snapshot_file="${TEST_DIR}/${1}-${snapshot_virt0}" +_make_test_img -b "${base_image}" "$size" +mv "${TEST_IMG}" "${snapshot_file}" +cmd="{ 'execute': 'blockdev-add', 'arguments': + { 'options': + { 'driver': 'qcow2', 'node-name': 'snap_"${1}"', "${extra_params}" + 'file': + { 'driver': 'file', 'filename': '"${snapshot_file}"' } } } }" +_send_qemu_cmd $h "${cmd}" "return" +} + +# ${1}: unique identifier for the snapshot filename +# ${2}: expected response, defaults to 'return' +function blockdev_snapshot() +{ +cmd="{ 'execute': 'blockdev-snapshot', + 'arguments': { 'node': 'virtio0', + 'overlay':'snap_"${1}"' } }" +_send_qemu_cmd $h "${cmd}" "${2:-return}" +} + size=128M _make_test_img $size -mv "${TEST_IMG}" "${TEST_IMG}.orig" +mv "${TEST_IMG}" "${TEST_IMG}.1" _make_test_img $size +mv "${TEST_IMG}" "${TEST_IMG}.2" echo echo === Running QEMU === echo qemu_comm_method="qmp" -_launch_qemu -drive file="${TEST_IMG}.orig",if=virtio -drive file="${TEST_IMG}",if=virtio +_launch_qemu -drive file="${TEST_IMG}.1",if=virtio -drive file="${TEST_IMG}.2",if=virtio h=$QEMU_HANDLE echo @@ -105,6 +138,8 @@ echo _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" "return" +# Tests for the blockdev-snapshot-sync command + echo echo === Create a single snapshot on virtio0 === echo @@ -132,11 +167,66 @@ echo echo === Create several transactional group snapshots === echo -for i in $(seq 2 ${MAX_SNAPSHOTS}) +for i in $(seq 2 ${SNAPSHOTS}) do create_group_snapshot ${i} done +# Tests for the blockdev-snapshot command + +echo +echo === Create a couple of snapshots using blockdev-snapshot === +echo + +SNAPSHOTS=$((${SNAPSHOTS}+1)) +add_snapshot_image ${SNAPSHOTS} +blockdev_snapshot ${SNAPSHOTS} + +SNAPSHOTS=$((${SNAPSHOTS}+1)) +add_snapshot_image ${SNAPSHOTS} +blockdev_snapshot ${SNAPSHOTS} + +echo +echo === Invalid command - snapshot node used as active layer === +echo + +blockdev_snapshot ${SNAPSHOTS} error + +_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot', + 'arguments': { 'node':'virtio0', +'overlay':'virtio0' } + }" "error" + +_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot', + 'arguments': { 'node':'virtio0', +'overlay':'virtio1' } + }" "error" + +echo +echo === Invalid command - snaps
[Qemu-block] [PATCH v7 2/5] block: rename BlockdevSnapshot to BlockdevSnapshotSync
We will introduce the 'blockdev-snapshot' command that will require its own struct for the parameters, so we need to rename this one in order to avoid name clashes. Signed-off-by: Alberto Garcia Reviewed-by: Eric Blake Reviewed-by: Max Reitz Reviewed-by: Kevin Wolf --- blockdev.c | 2 +- qapi-schema.json | 2 +- qapi/block-core.json | 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/blockdev.c b/blockdev.c index 0898d1f..12741a0 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1166,7 +1166,7 @@ void qmp_blockdev_snapshot_sync(bool has_device, const char *device, bool has_format, const char *format, bool has_mode, NewImageMode mode, Error **errp) { -BlockdevSnapshot snapshot = { +BlockdevSnapshotSync snapshot = { .has_device = has_device, .device = (char *) device, .has_node_name = has_node_name, diff --git a/qapi-schema.json b/qapi-schema.json index a05794e..65701dc 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1534,7 +1534,7 @@ ## { 'union': 'TransactionAction', 'data': { - 'blockdev-snapshot-sync': 'BlockdevSnapshot', + 'blockdev-snapshot-sync': 'BlockdevSnapshotSync', 'drive-backup': 'DriveBackup', 'blockdev-backup': 'BlockdevBackup', 'abort': 'Abort', diff --git a/qapi/block-core.json b/qapi/block-core.json index 5f12af7..6b5ac02 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -682,7 +682,7 @@ 'data': [ 'existing', 'absolute-paths' ] } ## -# @BlockdevSnapshot +# @BlockdevSnapshotSync # # Either @device or @node-name must be set but not both. # @@ -699,7 +699,7 @@ # @mode: #optional whether and how QEMU should create a new image, default is #'absolute-paths'. ## -{ 'struct': 'BlockdevSnapshot', +{ 'struct': 'BlockdevSnapshotSync', 'data': { '*device': 'str', '*node-name': 'str', 'snapshot-file': 'str', '*snapshot-node-name': 'str', '*format': 'str', '*mode': 'NewImageMode' } } @@ -790,7 +790,7 @@ # # Generates a synchronous snapshot of a block device. # -# For the arguments, see the documentation of BlockdevSnapshot. +# For the arguments, see the documentation of BlockdevSnapshotSync. # # Returns: nothing on success # If @device is not a valid block device, DeviceNotFound @@ -798,7 +798,7 @@ # Since 0.14.0 ## { 'command': 'blockdev-snapshot-sync', - 'data': 'BlockdevSnapshot' } + 'data': 'BlockdevSnapshotSync' } ## # @change-backing-file -- 2.6.1
Re: [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
On Fri 09 Oct 2015 05:51:55 PM CEST, Max Reitz wrote: +s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1); +s->bs[s->num_children] = NULL; +s->max_children++; +} >>> >>> Just a suggestion, please feel free to ignore it completely: >>> >>> You can drop the s->max_children field and just always call g_renew() >>> with s->num_children + 1 as the @count parameter. There shouldn't be >>> any (visible) performance penalty, but it would simplify the code. >> >> If s->num_children has decreased since the previous g_renew() call >> (because the user called quorum_del_child()) that could actually reduce >> the array size. > > Yes, it could. And that would be just fine. ;-) > > We'd just keep the array exactly as big as it needs to be. I find that > pretty intuitive. It's just counter-intuitive if you think one should > never use realloc() for reducing the size of a buffer (and I know I > myself tend to write my code thinking that). If the goal is to keep the array exactly as big as it needs to be then we should use g_renew() in quorum_del_child()... Anyway we're digressing :-) this array is one pointer per Quorum child, so the amount of memory we're talking about here is probably negligible. I'm fine with any solution. Berto
Re: [Qemu-block] [PATCH v3 07/16] block: Convert bs->backing_hd to BdrvChild
On Fri 09 Oct 2015 02:15:32 PM CEST, Kevin Wolf wrote: > This is the final step in converting all of the BlockDriverState > pointers that block drivers use to BdrvChild. > > After this patch, bs->children contains the full list of child nodes > that are referenced by a given BDS, and these children are only > referenced through BdrvChild, so that updating the pointer in there is > enough for changing edges in the graph. > > Signed-off-by: Kevin Wolf Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v3 08/16] block: Manage backing file references in bdrv_set_backing_hd()
On Fri 09 Oct 2015 02:15:33 PM CEST, Kevin Wolf wrote: > This simplifies the code somewhat, especially when dropping whole > backing file subchains. > > The exception is the mirroring code that does adventurous things with > bdrv_swap() and in order to keep it working, I had to duplicate most of > bdrv_set_backing_hd() locally. We'll get rid again of this ugliness > shortly. > > Signed-off-by: Kevin Wolf Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v3 09/16] block: Split bdrv_move_feature_fields()
On Fri 09 Oct 2015 02:15:34 PM CEST, Kevin Wolf wrote: > After bdrv_swap(), some fields must be moved back to their original BDS > to compensate for the effects that a swap of the contents of the objects > has while keeping the old addresses. Other fields must be moved back > because they should logically be moved and must stay on top > > When replacing bdrv_swap() with operations changing the pointers in the > parents, we only need the latter and must avoid swapping the former. > Split the function accordingly. > > Signed-off-by: Kevin Wolf > Reviewed-by: Max Reitz > Reviewed-by: Fam Zheng Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v3 13/16] block: Implement bdrv_append() without bdrv_swap()
On Fri 09 Oct 2015 02:15:38 PM CEST, Kevin Wolf wrote: > +static void change_parent_backing_link(BlockDriverState *from, > + BlockDriverState *to) > +{ > +BdrvChild *c, *next; > + > +QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) { > +assert(c->role != &child_backing); > +c->bs = to; > +QLIST_REMOVE(c, next_parent); > +QLIST_INSERT_HEAD(&to->parents, c, next_parent); > +bdrv_ref(to); > +bdrv_unref(from); > +} > +if (from->blk) { > +blk_set_bs(from->blk, to); > +if (!to->device_list.tqe_prev) { > +QTAILQ_INSERT_BEFORE(from, to, device_list); > +} Is it even possible that this last condition is false? In what case would 'to' be already in bdrv_states? I understand that it would mean that it would already be attached to a BlockBackend, but that's not possible in this case. Berto
Re: [Qemu-block] [PATCH v7 4/5] block: add a 'blockdev-snapshot' QMP command
On Mon 12 Oct 2015 10:29:35 PM CEST, Max Reitz wrote: >> -if (has_snapshot_node_name && >> -bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) { >> -error_setg(errp, "New snapshot node name already in use"); > > There's a difference from v6 here... [...] >> -options = qdict_new(); >> -if (has_snapshot_node_name) { >> -qdict_put(options, "node-name", >> - qstring_from_str(snapshot_node_name)); >> +if (snapshot_node_name && >> +bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) { >> +error_setg(errp, "New snapshot node name already in use"); > > ...and here, but how to resolve the conflict resulting from the newly > added patch 1 was obvious, so my R-b stands, of course. The differences are because this patch is now rebased on top of the new one, sorry if I overstepped by keeping your R-b here! >> +if (state->new_bs->backing_hd != NULL) { >> +error_setg(errp, "The snapshot already has a backing image"); >> } > > It's here: In case Kevin's series is applied before this one (which > I'm assuming since you were brave enough to base this series on my BB > series), this needs to be s/backing_hd/backing/. I'm just saying this > so you know you can keep my R-b then. Sure, I was about to test your new version of the series. Thanks, Berto
Re: [Qemu-block] [PATCH v3 13/16] block: Implement bdrv_append() without bdrv_swap()
On Tue 13 Oct 2015 10:39:22 AM CEST, Kevin Wolf wrote: >> > +static void change_parent_backing_link(BlockDriverState *from, >> > + BlockDriverState *to) >> > +{ >> > +BdrvChild *c, *next; >> > + >> > +QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) { >> > +assert(c->role != &child_backing); >> > +c->bs = to; >> > +QLIST_REMOVE(c, next_parent); >> > +QLIST_INSERT_HEAD(&to->parents, c, next_parent); >> > +bdrv_ref(to); >> > +bdrv_unref(from); >> > +} >> > +if (from->blk) { >> > +blk_set_bs(from->blk, to); >> > +if (!to->device_list.tqe_prev) { >> > +QTAILQ_INSERT_BEFORE(from, to, device_list); >> > +} >> >> Is it even possible that this last condition is false? In what case >> would 'to' be already in bdrv_states? >> >> I understand that it would mean that it would already be attached to >> a BlockBackend, but that's not possible in this case. > > Yes, I think it's not possible currently (hopefully, because that > would cause other bugs), just being careful. Eventually we'll allow > more than one BlockBackend pointing to the same BDS, and then this is > a scenario that could happen. blk_set_bs() already asserts that to->blk == NULL, so if this is not possible here wouldn't it be better to use another assertion instead? When I was reviewing the code I kept wondering in what kind of scenario this condition could be false. Berto
[Qemu-block] [PATCH 0/3] Add 'blockdev-del' command
Here's my first attempt at the 'blockdev-del' command. This series goes on top of Max's "BlockBackend and media" v6: https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02810.html With Max's code, 'blockdev-add' can now create a BlockDriverState with or without a BlockBackend (depending on whether the 'id' parameter is passed). Therefore, 'blockdev-del' can be used to delete a backend and/or a BDS. The way it works is simple: it receives a single 'device' parameter which can refer to a block backend or a node name. - If it's a block backend, it will delete it along with its attached BDS (if any). - If it's a node name, it will delete the BDS along with the backend it is attached to (if any). - If you're wondering whether it's possible to delete the BDS but not the backend it is attached to, there is already eject or blockdev-remove-medium for that. If either the backend or the BDS are being used (refcount > 1, or if the BDS has any parents) the command fails. The series includes bunch of test cases with different scenarios (nodes with and without backend, empty backend, backing images, block jobs, Quorum). Regards, Berto Alberto Garcia (3): block: Add blk_get_refcnt() block: Add 'blockdev-del' QMP command iotests: Add test for the blockdev-del command block/block-backend.c | 5 ++ blockdev.c | 42 + include/sysemu/block-backend.h | 1 + qapi/block-core.json | 21 + qmp-commands.hx| 36 tests/qemu-iotests/139 | 189 + tests/qemu-iotests/139.out | 5 ++ tests/qemu-iotests/group | 1 + 8 files changed, 300 insertions(+) create mode 100644 tests/qemu-iotests/139 create mode 100644 tests/qemu-iotests/139.out -- 2.6.1
[Qemu-block] [PATCH 1/3] block: Add blk_get_refcnt()
This function returns the reference count of a given BlockBackend. For convenience, it returns 0 if the BlockBackend pointer is NULL. Signed-off-by: Alberto Garcia --- block/block-backend.c | 5 + include/sysemu/block-backend.h | 1 + 2 files changed, 6 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 10e4d71..5f1e395 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -189,6 +189,11 @@ static void drive_info_del(DriveInfo *dinfo) g_free(dinfo); } +int blk_get_refcnt(BlockBackend *blk) +{ +return blk ? blk->refcnt : 0; +} + /* * Increment @blk's reference count. * @blk must not be null. diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 14a6d32..8a6413d 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -65,6 +65,7 @@ BlockBackend *blk_new_with_bs(const char *name, Error **errp); BlockBackend *blk_new_open(const char *name, const char *filename, const char *reference, QDict *options, int flags, Error **errp); +int blk_get_refcnt(BlockBackend *blk); void blk_ref(BlockBackend *blk); void blk_unref(BlockBackend *blk); const char *blk_name(BlockBackend *blk); -- 2.6.1
[Qemu-block] [PATCH 3/3] iotests: Add test for the blockdev-del command
Signed-off-by: Alberto Garcia --- tests/qemu-iotests/139 | 189 + tests/qemu-iotests/139.out | 5 ++ tests/qemu-iotests/group | 1 + 3 files changed, 195 insertions(+) create mode 100644 tests/qemu-iotests/139 create mode 100644 tests/qemu-iotests/139.out diff --git a/tests/qemu-iotests/139 b/tests/qemu-iotests/139 new file mode 100644 index 000..325ac7a --- /dev/null +++ b/tests/qemu-iotests/139 @@ -0,0 +1,189 @@ +#!/usr/bin/env python +# +# Test cases for the QMP 'blockdev-del' command +# +# Copyright (C) 2015 Igalia, S.L. +# Author: Alberto Garcia +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. +# + +import os +import iotests +import time + +base_img = os.path.join(iotests.test_dir, 'base.img') +new_img = os.path.join(iotests.test_dir, 'new.img') + +class TestBlockdevDel(iotests.QMPTestCase): + +def setUp(self): +iotests.qemu_img('create', '-f', iotests.imgfmt, base_img, '1M') +self.vm = iotests.VM() +self.vm.launch() + +def tearDown(self): +self.vm.shutdown() +os.remove(base_img) +if os.path.isfile(new_img): +os.remove(new_img) + +def addBlockDev(self, block_backend = True): +opts = {'node-name': 'node0', +'driver': iotests.imgfmt, +'file': {'filename': base_img, + 'driver': 'file'}} +if block_backend: +opts['id'] = 'drive0' +result = self.vm.qmp('blockdev-add', conv_keys = False, options = opts) +self.assert_qmp(result, 'return', {}) + +def delBlockDev(self, use_node_name = False, expect_error = False): +if use_node_name: +result = self.vm.qmp('blockdev-del', device = 'node0') +else: +result = self.vm.qmp('blockdev-del', device = 'drive0') +if expect_error: +self.assert_qmp(result, 'error/class', 'GenericError') +else: +self.assert_qmp(result, 'return', {}) + +def addDeviceModel(self): +result = self.vm.qmp('device_add', id = 'device0', + driver = 'virtio-blk-pci', drive = 'drive0') +self.assert_qmp(result, 'return', {}) + +def delDeviceModel(self): +result = self.vm.qmp('device_del', id = 'device0') +self.assert_qmp(result, 'return', {}) + +result = self.vm.qmp('system_reset') +self.assert_qmp(result, 'return', {}) + +device_path = '/machine/peripheral/device0/virtio-backend' +event = self.vm.event_wait(name="DEVICE_DELETED", + match={'data': {'path': device_path}}) +self.assertNotEqual(event, None) + +event = self.vm.event_wait(name="DEVICE_DELETED", + match={'data': {'device': 'device0'}}) +self.assertNotEqual(event, None) + +def ejectDrive(self, expect_error = False): +result = self.vm.qmp('eject', device = 'drive0') +if expect_error: +self.assert_qmp(result, 'error/class', 'GenericError') +else: +self.assert_qmp(result, 'return', {}) + +def createSnapshot(self): +opts = {'node-name': 'node0', +'snapshot-file': new_img, +'snapshot-node-name': 'overlay0', +'format': iotests.imgfmt} +result = self.vm.qmp('blockdev-snapshot-sync', conv_keys=False, **opts) +self.assert_qmp(result, 'return', {}) + +def createMirror(self): +opts = {'device': 'drive0', +'target': new_img, +'sync': 'top', +'format': iotests.imgfmt} +result = self.vm.qmp('drive-mirror', conv_keys=False, **opt
[Qemu-block] [PATCH 2/3] block: Add 'blockdev-del' QMP command
This is the companion to 'blockdev-add'. It allows deleting a BlockBackend with its associated BlockDriverState tree, or a BlockDriverState that is not attached to any backend. In either case, the command fails if the reference count is greater than 1 or the BlockDriverState has any parents. Signed-off-by: Alberto Garcia --- blockdev.c | 42 ++ qapi/block-core.json | 21 + qmp-commands.hx | 36 3 files changed, 99 insertions(+) diff --git a/blockdev.c b/blockdev.c index 2360c1f..c448a0b 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3402,6 +3402,48 @@ fail: qmp_output_visitor_cleanup(ov); } +void qmp_blockdev_del(const char *device, Error **errp) +{ +AioContext *aio_context; +BlockBackend *blk; +BlockDriverState *bs; + +blk = blk_by_name(device); +if (blk) { +bs = blk_bs(blk); +aio_context = blk_get_aio_context(blk); +} else { +bs = bdrv_find_node(device); +if (!bs) { +error_setg(errp, "Cannot find block device %s", device); +return; +} +blk = bs->blk; +aio_context = bdrv_get_aio_context(bs); +} + +aio_context_acquire(aio_context); + +if (bs && bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, errp)) { +goto out; +} + +if (blk_get_refcnt(blk) > 1 || +(bs && (bs->refcnt > 1 || !QLIST_EMPTY(&bs->parents { +error_setg(errp, "Block device %s is being used", device); +goto out; +} + +if (blk) { +blk_unref(blk); +} else { +bdrv_unref(bs); +} + +out: +aio_context_release(aio_context); +} + BlockJobInfoList *qmp_query_block_jobs(Error **errp) { BlockJobInfoList *head = NULL, **p_next = &head; diff --git a/qapi/block-core.json b/qapi/block-core.json index 5f12af7..20897eb 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1877,6 +1877,27 @@ { 'command': 'blockdev-add', 'data': { 'options': 'BlockdevOptions' } } ## +# @blockdev-del: +# +# Deletes a block device. The selected device can be either a block +# backend or a graph node. +# +# In the former case the backend will be destroyed, along with its +# inserted medium if there's any. +# +# In the latter case the node will be destroyed, along with the +# backend it is attached to if there's any. +# +# The command will fail if the selected block device is still being +# used. +# +# @device: Name of the block backend or the graph node to be deleted. +# +# Since: 2.5 +## +{ 'command': 'blockdev-del', 'data': { 'device': 'str' } } + +## # @blockdev-open-tray: # # Opens a block device's tray. If there is a block driver state tree inserted as diff --git a/qmp-commands.hx b/qmp-commands.hx index 4f03d11..b8b133e 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -3921,6 +3921,42 @@ Example (2): EQMP { +.name = "blockdev-del", +.args_type = "device:s", +.mhandler.cmd_new = qmp_marshal_blockdev_del, +}, + +SQMP +blockdev-del + +Since 2.5 + +Deletes a block device. The selected device can be either a block +backend or a graph node. + +In the former case the backend will be destroyed, along with its +inserted medium if there's any. + +In the latter case the node will be destroyed, along with the +backend it is attached to if there's any. + +The command will fail if the selected block device is still being +used. + +Arguments: + +- "device": Identifier of the block device or graph node name (json-string) + +Example: + +-> { "execute": "blockdev-del", +"arguments": { "device": "virtio0" } + } +<- { "return": {} } + +EQMP + +{ .name = "blockdev-open-tray", .args_type = "device:s,force:b?", .mhandler.cmd_new = qmp_marshal_blockdev_open_tray, -- 2.6.1
Re: [Qemu-block] [PATCH v2 07/22] block: Add statistics for failed and invalid I/O operations
On Tue 13 Oct 2015 05:42:33 PM CEST, Stefan Hajnoczi wrote: > On Fri, Oct 02, 2015 at 05:26:17PM +0300, Alberto Garcia wrote: >> +void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) >> +{ >> +int64_t time_ns = qemu_clock_get_ns(clock_type); >> + >> +assert(cookie->type < BLOCK_MAX_IOTYPE); >> + >> +stats->failed_ops[cookie->type]++; >> +stats->total_time_ns[cookie->type] += time_ns - cookie->start_time_ns; >> +stats->last_access_time_ns = time_ns; >> +} >> + >> +void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type) >> +{ >> +assert(type < BLOCK_MAX_IOTYPE); >> + >> +stats->invalid_ops[type]++; >> +stats->last_access_time_ns = qemu_clock_get_ns(clock_type); >> +} > > block_acct_failed() updates total_time_ns[] but block_acct_invalid() > does not. I guess that's because block_acct_invalid() is expected to > happen during request submission and has effectively 0 duration? That's right, I put it in the commit message but I guess it's a good idea to mention it here as well. I was actually updating total_time_ns[] in block_acct_invalid() as well, but then when I was adding the block_acct_invalid() calls to the device models I realized that most/all of them happen before we even call block_acct_start(). So I decided to leave it like it is now, first because it makes the code simpler and second because as you say there would be no I/O to measure. Berto
Re: [Qemu-block] [PATCH 1/3] block: Use bdrv_lookup_bs() instead of bdrv_find_node()
On Wed 14 Oct 2015 03:16:00 PM CEST, Jeff Cody wrote: > This is a precursor to making bdrv_find_node() static, and internal > to block.c > > To find a BlockDriverState interface, it can be done via blk_by_name(), > bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place > of the other two, in the instances where we are only concerned with > the BlockDriverState. > > There is no benefit in calling bdrv_find_node() directly. This patch > replaces all calls to bdrv_find_node() outside of block.c with > bdrv_lookup_bs(). > > Signed-off-by: Jeff Cody Reviewed-by: Alberto Garcia
Re: [Qemu-block] [PATCH 2/3] block: make bdrv_find_node() static
On Wed 14 Oct 2015 03:16:01 PM CEST, Jeff Cody wrote: > This patch does two things: it moves bdrv_find_node() up before the > first usage in block.c, and it makes the function static so that it > is only internal to block.c. > > Signed-off-by: Jeff Cody Reviewed-by: Alberto Garcia
Re: [Qemu-block] [PATCH v2 06/22] block: Add "supports_stats" field to BlockStats
On Tue 13 Oct 2015 05:38:30 PM CEST, Stefan Hajnoczi wrote: >> query-blockstats always returns a BlockDeviceStats structure for each >> BDS, regardless of whether it implements accounting or not. Since the >> field is mandatory there's no way to tell that from the values alone. >> >> This field solves that problem by indicating which BDSs support I/O >> accounting. > > I'm not sure I understand the problem. > > If I/O accounting isn't being used then all fields will be 0? Yes, but there's no way to tell if that happens because I/O accounting is not supported or because there hasn't been any I/O yet. There's one additional problem: this patch assumes that accounting is supported if this BDS is attached to a BlockBackend. But we don't know if the device model supports accounting or not, I still need to figure out what's the best way to do it. Berto
Re: [Qemu-block] [PATCH] block: fix memory leak in early exit
On Thu 15 Oct 2015 05:54:27 PM CEST, Stefan Hajnoczi wrote: > The stream block job has two early exit code paths. They do not free > s->backing_file_str. > > Also, the early exits rely on the fact that the coroutine hasn't yielded > yet and was launched from the main thread. Therefore the coroutine is > guaranteed to be running in the main thread where block_job_completed() > may be called safely. This is very subtle so it's nice to eliminate the > assumption by unifying the early exit with the normal exit code path. > > Cc: Fam Zheng > Cc: Jeff Cody > Signed-off-by: Stefan Hajnoczi I had a slightly simpler version of this in my intermediate block streaming series in case you're interested: https://patchwork.ozlabs.org/patch/471881/ But this one looks good to me too, so: Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v2 06/22] block: Add "supports_stats" field to BlockStats
On Thu 15 Oct 2015 04:58:22 PM CEST, Stefan Hajnoczi wrote: >> > If I/O accounting isn't being used then all fields will be 0? >> >> Yes, but there's no way to tell if that happens because I/O >> accounting is not supported or because there hasn't been any I/O yet. >> >> There's one additional problem: this patch assumes that accounting is >> supported if this BDS is attached to a BlockBackend. But we don't >> know if the device model supports accounting or not, I still need to >> figure out what's the best way to do it. > > Is there a corresponding libvirt patch or why does it matter whether > the QMP client can detect whether blockstats are available? I'm thinking that keeping this patch as it is now is probably not very useful. Block statistics are kept in the BlockBackend, so the only BDS that is going to have data != 0 when you call query-blockstats is the topmost one. There's probably no need to have an additional flag for this. If you disconnect a BlockBackend from a device model that implements accounting and then connect it to one that does not, there's no way for the client to know that. That's probably worth exposing in the API, but this patch does not do that yet, so I think we can skip it for now. Berto
Re: [Qemu-block] [PATCH] blkdebug: Don't confuse image as backing file
On Fri 16 Oct 2015 12:46:04 PM CEST, Fam Zheng wrote: > The word "backing file" nowadays refers to the backing_hd in the > external snapshot sense (i.e. bs->backing_hd), instead of the file sense > (bs->file). Correct the comment to use the right term. > > Signed-off-by: Fam Zheng Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH 2/3] block: Add 'blockdev-del' QMP command
On Sat 17 Oct 2015 08:23:26 PM CEST, Max Reitz wrote: >> +void qmp_blockdev_del(const char *device, Error **errp) >> +{ >> +AioContext *aio_context; >> +BlockBackend *blk; >> +BlockDriverState *bs; >> + >> +blk = blk_by_name(device); >> +if (blk) { >> +bs = blk_bs(blk); >> +aio_context = blk_get_aio_context(blk); >> +} else { >> +bs = bdrv_find_node(device); >> +if (!bs) { >> +error_setg(errp, "Cannot find block device %s", device); >> +return; >> +} >> +blk = bs->blk; > > After seeing testBlockBackendUsingNodeName() in patch 3, I don't know > about this. We often have this notion of "If you need a BDS, you can > specify both the BB and the node name", but here it's the other way > around: One "needs" a BB (at least that's what the command will work > on) and one gets it by specifying a node name. That seems a bit > strange to me. I'm also not completely sure about this to be honest and I'm not surprised that you don't like it. I was thinking about this these last couple of days and I think that this alternative is probably better: - If you specify the name of a backend, delete the backend and its BDS (if there's any). - If you specify the name of a BDS (using its node name), delete the BDS and only the BDS. Berto
Re: [Qemu-block] [PATCH v6 1/4] Add new block driver interface to add/delete a BDS's child
On Fri 16 Oct 2015 10:57:43 AM CEST, Wen Congyang wrote: > In some cases, we want to take a quorum child offline, and take > another child online. > > Signed-off-by: Wen Congyang > Signed-off-by: zhanghailiang > Signed-off-by: Gonglei > Reviewed-by: Eric Blake > --- Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH v6 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
On Fri 16 Oct 2015 10:57:44 AM CEST, Wen Congyang wrote: > +static void quorum_add_child(BlockDriverState *bs, BlockDriverState > *child_bs, > + Error **errp) > +{ > +BDRVQuorumState *s = bs->opaque; > +BdrvChild *child; > + > +bdrv_drain(bs); > + > +assert(s->num_children <= INT_MAX / sizeof(BdrvChild *)); > +if (s->num_children == INT_MAX / sizeof(BdrvChild *)) { > +error_setg(errp, "Too many children"); > +return; > +} This limit guarantees that s->num_children <= INT_MAX and that s->num_children * sizeof(BdrvChild *) <= SIZE_MAX, right? > +s->children = g_renew(BdrvChild *, s->children, s->num_children + 1); > + > +bdrv_ref(child_bs); > +child = bdrv_attach_child(bs, child_bs, &child_format); > +s->children[s->num_children++] = child; > +} Maybe you want to use ++s->num_children in the g_renew() call to make it symmetric to the one in quorum_del_child()... > +/* We can safely remove this child now */ > +memmove(&s->children[i], &s->children[i + 1], > +(s->num_children - i - 1) * sizeof(void *)); > +s->children = g_renew(BdrvChild *, s->children, --s->num_children); > +bdrv_unref_child(bs, child); > +} But it's not really important, so you can leave it as it is now if you prefer. Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [PATCH 0/3] Add 'blockdev-del' command
On Mon 19 Oct 2015 01:27:45 PM CEST, Kevin Wolf wrote: > I've been thinking a bit about the creation and deletion of > BlockBackends a bit last week, and honestly it still feels a bit messy > (or maybe I just don't fully understand it yet). Anyway, the cover > letter of this series seems to be a good place for trying to write it > down. Thanks for the summary! > So we seem to have two criteria to distinguish BDSes: > > 1. Does/Doesn't have a BlockBackend on top. >In the future, multiple BlockBackends are possible, too. One single BDS attached to multiple BlockBackends at the same time? What's the use case? > 1. BB without BDS > 2. BB attached to BDS without monitor reference > 3. BB attached to BDS with monitor reference > 4. BDS without BB and without monitor reference > 5. BDS without BB and with monitor reference I would say that the rule that we should follow is: if the outcome is not clear for a particular case, then the command fails. We should try not to guess what the user wants. blockdev-del should only delete something when there's only one sane alternative. > Let me start with the first two questions for all cases: > > 1. As far as I know, it's currently not possible to directly create a BB >without a BDS (or before Max' series with an empty BDS). You have to >create a BB with an opened BDS and then eject that. Oops. > >blockdev-del is easy: It deletes the BB and nothing else. I agree. > 2. This is the normal case, easy to produce with blockdev-add. >The two options for deleting something are removing the BDS while >keeping the BB (this is already covered by 'eject') and dropping the >BB (probably makes sense to use 'blockdev-del' here). There is no >monitor reference to the BDS anyway, so blockdev-dev can't delete >that. > >Dropping the BB automatically also drops the BDS in simple cases. In >more complicated cases where the BDS has got more references later, >it might still exist. Then the blockdev-del wasn't the exact opposite >operation of blockdev-add. > >Is this a problem for a user/management tool that wants to keep track >of which image files are still in use? I would say that if the BDS has additional references then 'blockdev-del' should fail and do nothing (what the current patch does). > 3. A BDS with a monitor reference can be created (with some patches) by >using blockdev-add with a node-name, but no id. Directly attaching it >to a BB is tricky today, even though I'm sure you could do it with >some clever use of block jobs... With 1. fixed (i.e. you can create >an empty BB), insert-medium should do the job. > >Here we have even more choices for deleting something: Delete the BB, >but leave the BDS alone; delete the BDS and leave an empty BB behind >(this is different from eject because eject would drop the connection >between BB and BDS, but both would continue to exist!); delete both >at once. > >We had two separate blockdev-add commands for the BDS and the BB, so >it makes sense to require two separate blockdev-del commands as well. >In order to keep blockdev-add/del symmetrical, we would have to make >blockdev-del of a node-name delete the BDS and blockdev-del of an id >delete the BB. This is a tricky case, thanks for pointing it out. I would also go for the conservative option here: if you create a BDS with blockdev-add and then attach it to a BlockBackend, you're adding one additional reference (via blk_insert_bs()). Therefore blockdev-del would fail like in case 2. You need to eject the BDS first in order to decide which one you want to delete. > 4. That's the typical bs->file. Created by nested blockdev-add >descriptions. blockdev-del errors out, there is no monitor reference >to drop, and there is also no BB that the request could be used >for. Correct. > 5. Created by a top-level blockdev-add with node-name and no id (like 3. >but without attaching it to a BB afterwards). blockdev-del can only >mean deleting the BDS. Correct. > I haven't thought about it enough yet, but it seems to me that we > can't do the BDS/BB aliasing with blockdev-del, but need to interpret > node-name as BDS and id as BB. Perhaps we also shouldn't use a single > 'device' option then, but a union of 'node-name' and 'id' (just like > the same devices were created in blockdev-add). Having two separate options could make things more clear, indeed. Berto
Re: [Qemu-block] [PATCH 2/3] block: Add 'blockdev-del' QMP command
On Sat 17 Oct 2015 08:06:19 PM CEST, Max Reitz wrote: >> +if (blk_get_refcnt(blk) > 1 || >> +(bs && (bs->refcnt > 1 || !QLIST_EMPTY(&bs->parents { >> +error_setg(errp, "Block device %s is being used", device); >> +goto out; >> +} > > I can't think of a way off the top of my head that a BB with a refcount > of one or a BDS with a refcount of one without any parents would not be > monitor-owned, but I don't quite like assuming it just from the refcount > and the parent list anyway. I agree, I would also like to have a clearer way to do it. I'll try to go over all possible ways to create a BDS and see other cases that I might have missed. At the very list I would like to extend the iotest and make it as comprehensive as possible. > The two patches which are significant for this patch are: > - "blockdev: Keep track of monitor-owned BDS" from the first series > - "blockdev: Add list of monitor-owned BlockBackends" from the second > > Explaining what they do and why they are relevant to this patch doesn't > really seem necessary, but anyway: They add one list for the > monitor-owned BDSs and one for the monitor-owned BBs. Yeah, those would be handy indeed. > It's up to you. It probably works just how you implemented it, and > considering the pace of the "BB and media" series (and other series of > mine in the past year), getting those two series merged may be a > matter of decades. So it's probably best to do it like this for now > and I'll fix it up then... I think I'll first try to go over all possible cases and see what I find. Thanks, Berto
Re: [Qemu-block] [PATCH 0/3] Add 'blockdev-del' command
On Mon 19 Oct 2015 05:04:50 PM CEST, Kevin Wolf wrote: >> > So we seem to have two criteria to distinguish BDSes: >> > >> > 1. Does/Doesn't have a BlockBackend on top. >> >In the future, multiple BlockBackends are possible, too. >> >> One single BDS attached to multiple BlockBackends at the same time? >> What's the use case? > > For having multiple users of the BDS, e.g. a guest device and an NBD > server. > > Or sometimes maybe just because it's the logical thing to happen: > Imagine an image B with a backing file A. Both have a BlockBackend on > them. Do a live commit of B into A. On completion, the BDS B is > deleted and both BBs point to A. Can you have a BlockBackend on a BDS that is being used as a backing file? What is that for? And can you even write to it? >> > I haven't thought about it enough yet, but it seems to me that we >> > can't do the BDS/BB aliasing with blockdev-del, but need to interpret >> > node-name as BDS and id as BB. Perhaps we also shouldn't use a single >> > 'device' option then, but a union of 'node-name' and 'id' (just like >> > the same devices were created in blockdev-add). >> >> Having two separate options could make things more clear, indeed. > > Note that it doesn't improve things with your generalised rule (if I got > it right anyway). > > So I think these are the options we have: > > a) blockdev-del only works on a BDS or empty BB without any references. >Before deleting a BDS, it must be ejected from the BB; before >deleting a BB, its BDS must be ejected. > > b) Your rule: Same as a), but treating BB and BDS as a unit if the BDS >is only referenced by the BB. > > c) No aliasing. You explicitly refer to a BB or to a BDS. The difference >from A is that it works without an explicit eject: When a BDS is >deleted, it is automatically ejected from the BB; and when the BB is >deleted, the BDS stays around if it still has other references. I'm currently thinking about d), which tries to maintain the symmetry with blockdev-add: - blockdev-del would have two parameters, 'id' and 'node-name', and only one of them can be set, so you must choose whether you want to delete a backend or a BDS. - blockdev-add can either create a backend with a BDS, or a BDS alone, so: - If you created a backend and you try to delete it you can do it (along with its BDS) as long as neither the backend nor the BDS are being used (no extra references, no parents). This means that the operation will fail if there's a BDS that has been created separately and manually attached to the the backend. - If you created a BDS alone and you try to delete it you can do it as long as no one else is using it. This would delete the BDS and only the BDS (because that's what you create with blockdev-add). If it's currently attached to a backend then the operation fails. I think this is what best mirrors blockdev-add. It however has the following consequence: blockdev-add id=drive0 node-name=node0 ... blockdev-del node-name=node0 <-- This fails, node0 is used by drive0 blockdev-del id=drive0 <-- This works and removes drive0 and node0 Berto
Re: [Qemu-block] [Qemu-devel] [PATCH 0/3] Add 'blockdev-del' command
On Wed 21 Oct 2015 10:57:32 AM CEST, Markus Armbruster wrote: > blockdev-add is a big & hairy feature that has taken considerable time > to develop, spanning multiple releases, and still isn't quite done > (never been closer, though). As such, it's a textbook example of an > experimental interface, and should have been x-blockdev-add. We > messed that up, and it's probably not worth renaming now. > > Quite a few users have been trying to use blockdev-add, and ran into > roadblocks sooner or later. Inevitable, given its incomplete state. > Perhaps an x- prefix providing fair warning would've saved them > trouble, but that's water under the bridge now. We tried to warn them > off with scary documentation instead (commit da2cf4e). > > Naturally, running into an obvious roadblock early is less bad than > running into a subtle one late. The lack of blockdev-del has served > as an obvious early one. > > Mind, that's not an objection to implementing it now. I think the > time's ripe, actually. I just want us to try to erect a proper > warning sign where the "no blockdev-del" roadblock used to be. Naming > it x-blockdev-del together with a note explaining why it's > experimental should do the trick. That works for me, I can rename it in my next submission and add the appropriate warnings. Berto
Re: [Qemu-block] [PATCH v7 31/39] blockdev: Add blockdev-insert-medium
On Mon 19 Oct 2015 05:53:37 PM CEST, Max Reitz wrote: > And a helper function for that, which directly takes a pointer to the > BDS to be inserted instead of its node-name (which will be used for > implementing 'change' using blockdev-insert-medium). Shouldn't this update bdrv_states? Consider this scenario: 1) We add a drive and eject its BDS { "execute": "blockdev-add", "arguments": { "options": { "driver": "qcow2", "file": { "driver": "file", "filename": "/tmp/hd0.img"}, "id": "drive0" }}} { "execute": "eject", "arguments": {"device": "drive0"}} 2) We create a new BDS and insert it in drive0 { "execute": "blockdev-add", "arguments": { "options": { "driver": "qcow2", "file": { "driver": "file", "filename": "/tmp/hd0.img"}, "node-name": "hd0" }}} { "execute": "blockdev-insert-medium", "arguments": { "device": "drive0", "node-name": "hd0" }} 3) Now we try to create a snapshot... { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive0", "snapshot-file": "/tmp/new.img", "format": "qcow2" }} {"error": {"class": "GenericError", "desc": "The feature 'snapshot' is not enabled"}} Ooops! It seems that this is because the new node hd0 is not in the bdrv_states list. 4) Let's try to create a mirror instead { "execute": "drive-mirror", "arguments": { "device": "drive0", "target": "/tmp/new.img", "sync": "top"}} {"return": {}} {"timestamp": {"seconds": 1445427560, "microseconds": 765993}, "event": "BLOCK_JOB_READY", "data": {"device": "drive0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}} 5) Ok, the block job is ready, so let's complete it: { "execute": "query-block-jobs" } {"return": []} Ooops! Again, hd0 is not in bdrv_states so QEMU cannot find the block job. 6) Anyway, we only need the backend name in order to complete a block job, so surely we can do it even if it's not in the list: { "execute": "block-job-complete", "arguments": { "device": "drive0"}} Segmentation fault That's QTAILQ_INSERT_BEFORE() in change_parent_backing_link(). This code assumes that since the 'from' BDS is attached to a backend, it must also be in bdrv_states. Berto
[Qemu-block] [PATCH] throttle: Remove throttle_group_lock/unlock()
The group throttling code was always meant to handle its locking internally. However, bdrv_swap() was touching the ThrottleGroup structure directly and therefore needed an API for that. Now that bdrv_swap() no longer exists there's no need for the throttle_group_lock() API anymore. Signed-off-by: Alberto Garcia --- block/throttle-groups.c | 31 +-- include/block/throttle-groups.h | 3 --- 2 files changed, 1 insertion(+), 33 deletions(-) diff --git a/block/throttle-groups.c b/block/throttle-groups.c index 1abc6fc..f70c31a 100644 --- a/block/throttle-groups.c +++ b/block/throttle-groups.c @@ -33,8 +33,7 @@ * its own locking. * * This locking is however handled internally in this file, so it's - * mostly transparent to outside users (but see the documentation in - * throttle_groups_lock()). + * transparent to outside users. * * The whole ThrottleGroup structure is private and invisible to * outside users, that only use it through its ThrottleState. @@ -465,34 +464,6 @@ void throttle_group_unregister_bs(BlockDriverState *bs) bs->throttle_state = NULL; } -/* Acquire the lock of this throttling group. - * - * You won't normally need to use this. None of the functions from the - * ThrottleGroup API require you to acquire the lock since all of them - * deal with it internally. - * - * This should only be used in exceptional cases when you want to - * access the protected fields of a BlockDriverState directly - * (e.g. bdrv_swap()). - * - * @bs: a BlockDriverState that is member of the group - */ -void throttle_group_lock(BlockDriverState *bs) -{ -ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts); -qemu_mutex_lock(&tg->lock); -} - -/* Release the lock of this throttling group. - * - * See the comments in throttle_group_lock(). - */ -void throttle_group_unlock(BlockDriverState *bs) -{ -ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts); -qemu_mutex_unlock(&tg->lock); -} - static void throttle_groups_init(void) { qemu_mutex_init(&throttle_groups_lock); diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h index fab113f..322139a 100644 --- a/include/block/throttle-groups.h +++ b/include/block/throttle-groups.h @@ -40,7 +40,4 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs, unsigned int bytes, bool is_write); -void throttle_group_lock(BlockDriverState *bs); -void throttle_group_unlock(BlockDriverState *bs); - #endif -- 2.6.1
[Qemu-block] [PATCH v3 06/21] block: Add statistics for failed and invalid I/O operations
This patch adds the block_acct_failed() and block_acct_invalid() functions to allow keeping track of failed and invalid I/O operations. The number of failed and invalid operations is exposed in BlockDeviceStats. We don't keep track of the time spent on invalid operations because they are cancelled immediately when they are started. Signed-off-by: Alberto Garcia --- block/accounting.c | 23 +++ block/qapi.c | 10 ++ include/block/accounting.h | 4 qapi/block-core.json | 23 ++- qmp-commands.hx| 12 5 files changed, 71 insertions(+), 1 deletion(-) diff --git a/block/accounting.c b/block/accounting.c index d427fa8..49a9444 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -51,6 +51,29 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) stats->last_access_time_ns = time_ns; } +void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) +{ +int64_t time_ns = qemu_clock_get_ns(clock_type); + +assert(cookie->type < BLOCK_MAX_IOTYPE); + +stats->failed_ops[cookie->type]++; +stats->total_time_ns[cookie->type] += time_ns - cookie->start_time_ns; +stats->last_access_time_ns = time_ns; +} + +void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type) +{ +assert(type < BLOCK_MAX_IOTYPE); + +/* block_acct_done() and block_acct_failed() update + * total_time_ns[], but this one does not. The reason is that + * invalid requests are accounted during their submission, + * therefore there's no actual I/O involved. */ + +stats->invalid_ops[type]++; +stats->last_access_time_ns = qemu_clock_get_ns(clock_type); +} void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, int num_requests) diff --git a/block/qapi.c b/block/qapi.c index 539c2e3..84d8412 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -351,6 +351,16 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, s->stats->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE]; s->stats->rd_operations = stats->nr_ops[BLOCK_ACCT_READ]; s->stats->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE]; + +s->stats->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ]; +s->stats->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE]; +s->stats->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH]; + +s->stats->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ]; +s->stats->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE]; +s->stats->invalid_flush_operations = +stats->invalid_ops[BLOCK_ACCT_FLUSH]; + s->stats->rd_merged = stats->merged[BLOCK_ACCT_READ]; s->stats->wr_merged = stats->merged[BLOCK_ACCT_WRITE]; s->stats->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH]; diff --git a/include/block/accounting.h b/include/block/accounting.h index 4b2b999..b50e3cc 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -38,6 +38,8 @@ enum BlockAcctType { typedef struct BlockAcctStats { uint64_t nr_bytes[BLOCK_MAX_IOTYPE]; uint64_t nr_ops[BLOCK_MAX_IOTYPE]; +uint64_t invalid_ops[BLOCK_MAX_IOTYPE]; +uint64_t failed_ops[BLOCK_MAX_IOTYPE]; uint64_t total_time_ns[BLOCK_MAX_IOTYPE]; uint64_t merged[BLOCK_MAX_IOTYPE]; int64_t last_access_time_ns; @@ -52,6 +54,8 @@ typedef struct BlockAcctCookie { void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, int64_t bytes, enum BlockAcctType type); void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie); +void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie); +void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type); void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, int num_requests); int64_t block_acct_idle_time_ns(BlockAcctStats *stats); diff --git a/qapi/block-core.json b/qapi/block-core.json index 69c3e1f..1e9b9a6 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -452,6 +452,24 @@ #nanoseconds. If the field is absent it means that #there haven't been any operations yet (Since 2.5). # +# @failed_rd_operations: The number of failed read operations +#performed by the device (Since 2.5) +# +# @failed_wr_operations: The number of failed write operations +#performed by the device (Since 2.5) +# +# @failed_flush_operations: The number of failed flush operations +# performed by the device (Since 2.5) +# +# @invalid_rd_operations: The number of invalid read operations +#
[Qemu-block] [PATCH v3 02/21] ide: Account for write operations correctly
Signed-off-by: Alberto Garcia Reviewed-by: Stefan Hajnoczi --- hw/ide/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 317406d..b559f1b 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -895,7 +895,7 @@ static void ide_sector_write(IDEState *s) qemu_iovec_init_external(&s->qiov, &s->iov, 1); block_acct_start(blk_get_stats(s->blk), &s->acct, - n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); + n * BDRV_SECTOR_SIZE, BLOCK_ACCT_WRITE); s->pio_aiocb = blk_aio_writev(s->blk, sector_num, &s->qiov, n, ide_sector_write_cb, s); } -- 2.6.1
[Qemu-block] [PATCH v3 05/21] block: Add idle_time_ns to BlockDeviceStats
This patch adds the new field 'idle_time_ns' to the BlockDeviceStats structure, indicating the time that has passed since the previous I/O operation. It also adds the block_acct_idle_time_ns() call, to ensure that all references to the clock type used for accounting are in the same place. This will later allow us to use a different clock for iotests. Signed-off-by: Alberto Garcia --- block/accounting.c | 12 ++-- block/qapi.c | 5 + hmp.c | 4 +++- include/block/accounting.h | 2 ++ qapi/block-core.json | 6 +- qmp-commands.hx| 10 -- 6 files changed, 33 insertions(+), 6 deletions(-) diff --git a/block/accounting.c b/block/accounting.c index 6f4c0f1..d427fa8 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -40,12 +40,15 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) { +int64_t time_ns = qemu_clock_get_ns(clock_type); +int64_t latency_ns = time_ns - cookie->start_time_ns; + assert(cookie->type < BLOCK_MAX_IOTYPE); stats->nr_bytes[cookie->type] += cookie->bytes; stats->nr_ops[cookie->type]++; -stats->total_time_ns[cookie->type] += -qemu_clock_get_ns(clock_type) - cookie->start_time_ns; +stats->total_time_ns[cookie->type] += latency_ns; +stats->last_access_time_ns = time_ns; } @@ -55,3 +58,8 @@ void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, assert(type < BLOCK_MAX_IOTYPE); stats->merged[type] += num_requests; } + +int64_t block_acct_idle_time_ns(BlockAcctStats *stats) +{ +return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns; +} diff --git a/block/qapi.c b/block/qapi.c index ec0f513..539c2e3 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -357,6 +357,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE]; s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ]; s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH]; + +s->stats->has_idle_time_ns = stats->last_access_time_ns > 0; +if (s->stats->has_idle_time_ns) { +s->stats->idle_time_ns = block_acct_idle_time_ns(stats); +} } s->stats->wr_highest_offset = bs->wr_highest_offset; diff --git a/hmp.c b/hmp.c index 28caa7d..8ee473e 100644 --- a/hmp.c +++ b/hmp.c @@ -522,6 +522,7 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict) " flush_total_time_ns=%" PRId64 " rd_merged=%" PRId64 " wr_merged=%" PRId64 + " idle_time_ns=%" PRId64 "\n", stats->value->stats->rd_bytes, stats->value->stats->wr_bytes, @@ -532,7 +533,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict) stats->value->stats->rd_total_time_ns, stats->value->stats->flush_total_time_ns, stats->value->stats->rd_merged, - stats->value->stats->wr_merged); + stats->value->stats->wr_merged, + stats->value->stats->idle_time_ns); } qapi_free_BlockStatsList(stats_list); diff --git a/include/block/accounting.h b/include/block/accounting.h index 66637cd..4b2b999 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -40,6 +40,7 @@ typedef struct BlockAcctStats { uint64_t nr_ops[BLOCK_MAX_IOTYPE]; uint64_t total_time_ns[BLOCK_MAX_IOTYPE]; uint64_t merged[BLOCK_MAX_IOTYPE]; +int64_t last_access_time_ns; } BlockAcctStats; typedef struct BlockAcctCookie { @@ -53,5 +54,6 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie); void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, int num_requests); +int64_t block_acct_idle_time_ns(BlockAcctStats *stats); #endif diff --git a/qapi/block-core.json b/qapi/block-core.json index 5f12af7..69c3e1f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -448,6 +448,10 @@ # @wr_merged: Number of write requests that have been merged into another # request (Since 2.3). # +# @idle_time_ns: #optional Time since the last I/O operation, in +#nanoseconds. If the field is absent it means that +#there haven't been any operations yet (Since 2.5). +# # Since: 0.14.0 ##
[Qemu-block] [PATCH v3 12/21] block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode
This patch switches to QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode, and makes the latency of the operation constant. This way we can perform tests on the accounting code with reproducible results. Signed-off-by: Alberto Garcia --- block/accounting.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/block/accounting.c b/block/accounting.c index a941931..05a5c5f 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -25,14 +25,20 @@ #include "block/accounting.h" #include "block/block_int.h" #include "qemu/timer.h" +#include "sysemu/qtest.h" static QEMUClockType clock_type = QEMU_CLOCK_REALTIME; +static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000; void block_acct_init(BlockAcctStats *stats, bool account_invalid, bool account_failed) { stats->account_invalid = account_invalid; stats->account_failed = account_failed; + +if (qtest_enabled()) { +clock_type = QEMU_CLOCK_VIRTUAL; +} } void block_acct_cleanup(BlockAcctStats *stats) @@ -84,6 +90,10 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) int64_t time_ns = qemu_clock_get_ns(clock_type); int64_t latency_ns = time_ns - cookie->start_time_ns; +if (qtest_enabled()) { +latency_ns = qtest_latency_ns; +} + assert(cookie->type < BLOCK_MAX_IOTYPE); stats->nr_bytes[cookie->type] += cookie->bytes; @@ -107,6 +117,10 @@ void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) int64_t time_ns = qemu_clock_get_ns(clock_type); int64_t latency_ns = time_ns - cookie->start_time_ns; +if (qtest_enabled()) { +latency_ns = qtest_latency_ns; +} + stats->total_time_ns[cookie->type] += latency_ns; stats->last_access_time_ns = time_ns; -- 2.6.1
[Qemu-block] [PATCH v3 10/21] block: New option to define the intervals for collecting I/O statistics
The BlockAcctStats structure contains a list of BlockAcctTimedStats. Each one of these collects statistics about the minimum, maximum and average latencies of all I/O operations in a certain interval of time. This patch adds a new "stats-intervals" option that allows defining these intervals. Signed-off-by: Alberto Garcia --- blockdev.c | 37 + qapi/block-core.json | 4 2 files changed, 41 insertions(+) diff --git a/blockdev.c b/blockdev.c index 94635b5..c316c0a 100644 --- a/blockdev.c +++ b/blockdev.c @@ -468,6 +468,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, int bdrv_flags = 0; int on_read_error, on_write_error; bool account_invalid, account_failed; +const char *stats_intervals; BlockBackend *blk; BlockDriverState *bs; ThrottleConfig cfg; @@ -507,6 +508,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true); account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true); +stats_intervals = qemu_opt_get(opts, "stats-intervals"); + extract_common_blockdev_options(opts, &bdrv_flags, &throttling_group, &cfg, &detect_zeroes, &error); if (error) { @@ -605,6 +608,35 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, } block_acct_init(blk_get_stats(blk), account_invalid, account_failed); + +if (stats_intervals) { +char **intervals = g_strsplit(stats_intervals, ":", 0); +unsigned i; + +if (*stats_intervals == '\0') { +error_setg(&error, "stats-intervals can't have an empty value"); +} + +for (i = 0; !error && intervals[i] != NULL; i++) { +unsigned long long val; +if (parse_uint_full(intervals[i], &val, 10) == 0 && +val > 0 && val <= UINT_MAX) { +block_acct_add_interval(blk_get_stats(blk), val); +} else { +error_setg(&error, "Invalid interval length: '%s'", + intervals[i]); +} +} + +g_strfreev(intervals); + +if (error) { +error_propagate(errp, error); +blk_unref(blk); +blk = NULL; +goto err_no_bs_opts; +} +} } blk_set_on_error(blk, on_read_error, on_write_error); @@ -3535,6 +3567,11 @@ QemuOptsList qemu_common_drive_opts = { .type = QEMU_OPT_BOOL, .help = "whether to account for failed I/O operations " "in the statistics", +},{ +.name = "stats-intervals", +.type = QEMU_OPT_STRING, +.help = "colon-separated list of intervals " +"for collecting I/O statistics, in seconds", }, { /* end of list */ } }, diff --git a/qapi/block-core.json b/qapi/block-core.json index e32b523..2c1600b 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1503,6 +1503,9 @@ # @stats-account-failed: #optional whether to include failed # operations when computing latency and last # access statistics (default: true) (Since 2.5) +# @stats-intervals: #optional colon-separated list of intervals for +# collecting I/O statistics, in seconds (default: none) +# (Since 2.5) # @detect-zeroes: #optional detect and optimize zero writes (Since 2.1) # (default: off) # @@ -1520,6 +1523,7 @@ '*read-only': 'bool', '*stats-account-invalid': 'bool', '*stats-account-failed': 'bool', +'*stats-intervals': 'str', '*detect-zeroes': 'BlockdevDetectZeroesOptions' } } ## -- 2.6.1
[Qemu-block] [PATCH v3 03/21] block: define 'clock_type' for the accounting code
Its value is still QEMU_CLOCK_REALTIME, but having it in a variable will allow us to change its value easily in the future when running in qtest mode. Signed-off-by: Alberto Garcia Reviewed-by: Stefan Hajnoczi --- block/accounting.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/block/accounting.c b/block/accounting.c index a423560..6f4c0f1 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -26,13 +26,15 @@ #include "block/block_int.h" #include "qemu/timer.h" +static QEMUClockType clock_type = QEMU_CLOCK_REALTIME; + void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, int64_t bytes, enum BlockAcctType type) { assert(type < BLOCK_MAX_IOTYPE); cookie->bytes = bytes; -cookie->start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); +cookie->start_time_ns = qemu_clock_get_ns(clock_type); cookie->type = type; } @@ -43,7 +45,7 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) stats->nr_bytes[cookie->type] += cookie->bytes; stats->nr_ops[cookie->type]++; stats->total_time_ns[cookie->type] += -qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cookie->start_time_ns; +qemu_clock_get_ns(clock_type) - cookie->start_time_ns; } -- 2.6.1
[Qemu-block] [PATCH v3 09/21] block: Add average I/O queue depth to BlockDeviceTimedStats
This patch adds two new fields to BlockDeviceTimedStats that track the average number of pending read and write requests for a block device. The values are calculated for the period of time defined for that interval. Signed-off-by: Alberto Garcia --- block/accounting.c | 12 block/qapi.c | 5 + include/block/accounting.h | 2 ++ include/qemu/timed-average.h | 1 + qapi/block-core.json | 9 - qmp-commands.hx | 6 ++ util/timed-average.c | 17 + 7 files changed, 51 insertions(+), 1 deletion(-) diff --git a/block/accounting.c b/block/accounting.c index 61de8ce..a941931 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -143,3 +143,15 @@ int64_t block_acct_idle_time_ns(BlockAcctStats *stats) { return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns; } + +double block_acct_queue_depth(BlockAcctTimedStats *stats, + enum BlockAcctType type) +{ +uint64_t sum, elapsed; + +assert(type < BLOCK_MAX_IOTYPE); + +sum = timed_average_sum(&stats->latency[type], &elapsed); + +return (double) sum / elapsed; +} diff --git a/block/qapi.c b/block/qapi.c index 4baf6e1..99d5303 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -402,6 +402,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, dev_stats->min_flush_latency_ns = timed_average_min(fl); dev_stats->max_flush_latency_ns = timed_average_max(fl); dev_stats->avg_flush_latency_ns = timed_average_avg(fl); + +dev_stats->avg_rd_queue_depth = +block_acct_queue_depth(ts, BLOCK_ACCT_READ); +dev_stats->avg_wr_queue_depth = +block_acct_queue_depth(ts, BLOCK_ACCT_WRITE); } } diff --git a/include/block/accounting.h b/include/block/accounting.h index 09605bb..f41ddde 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -78,5 +78,7 @@ void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type); void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, int num_requests); int64_t block_acct_idle_time_ns(BlockAcctStats *stats); +double block_acct_queue_depth(BlockAcctTimedStats *stats, + enum BlockAcctType type); #endif diff --git a/include/qemu/timed-average.h b/include/qemu/timed-average.h index f1cdddc..364bf88 100644 --- a/include/qemu/timed-average.h +++ b/include/qemu/timed-average.h @@ -59,5 +59,6 @@ void timed_average_account(TimedAverage *ta, uint64_t value); uint64_t timed_average_min(TimedAverage *ta); uint64_t timed_average_avg(TimedAverage *ta); uint64_t timed_average_max(TimedAverage *ta); +uint64_t timed_average_sum(TimedAverage *ta, uint64_t *elapsed); #endif diff --git a/qapi/block-core.json b/qapi/block-core.json index 741f7e6..e32b523 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -450,6 +450,12 @@ # @avg_flush_latency_ns: Average latency of flush operations in the #defined interval, in nanoseconds. # +# @avg_rd_queue_depth: Average number of pending read operations +# in the defined interval. +# +# @avg_wr_queue_depth: Average number of pending write operations +# in the defined interval. +# # Since: 2.5 ## @@ -458,7 +464,8 @@ 'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int', 'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int', 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int', -'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int' } } +'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int', +'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } } ## # @BlockDeviceStats: diff --git a/qmp-commands.hx b/qmp-commands.hx index 9f1d2ab..526a317 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2578,6 +2578,12 @@ Each json-object contain the following: - "avg_flush_latency_ns": average latency of flush operations in the defined interval, in nanoseconds (json-int) +- "avg_rd_queue_depth": average number of pending read +operations in the defined interval +(json-number) +- "avg_wr_queue_depth": average number of pending write +operations in the defined interval +(json-number). - "parent": Contains recursively the
[Qemu-block] [PATCH v3 00/21] Extended I/O accounting
Here's a new version of the extended I/O accounting patches. This one is rebased on top of the current master, has a few minor documentation fixes and drops the 'supports_stats' field completely. Regards, Berto v3: - Rebased on top of the current master and on Max's BlockBackend series v7 - patch 4: minor documentation fixes - patch 5: s/miliseconds/nanoseconds/ - patch 6: dropped, there's no "supports_stats" anymore - patch 7 (now 6): explain why block_acct_invalid() does not update total_time_ns[] - patch 12 (now 11): don't initialize BlockAcctCookie to { 0 }, it's not needed v2: https://lists.gnu.org/archive/html/qemu-block/2015-10/msg00161.html - First complete implementation of the new statistics v1: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg03321.html - Initial series containing only the timed average infrastructure. Alberto Garcia (21): xen_disk: Account for flush operations ide: Account for write operations correctly block: define 'clock_type' for the accounting code util: Infrastructure for computing recent averages block: Add idle_time_ns to BlockDeviceStats block: Add statistics for failed and invalid I/O operations block: Allow configuring whether to account failed and invalid ops block: Compute minimum, maximum and average I/O latencies block: Add average I/O queue depth to BlockDeviceTimedStats block: New option to define the intervals for collecting I/O statistics qemu-io: Account for failed, invalid and flush operations block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode iotests: Add test for the block device statistics nvme: Account for failed and invalid operations virtio-blk: Account for failed and invalid operations xen_disk: Account for failed and invalid operations atapi: Account for failed and invalid operations ide: Account for failed and invalid operations macio: Account for failed operations scsi-disk: Account for failed operations block: Update copyright of the accounting code block/accounting.c | 123 ++- block/block-backend.c| 1 + block/qapi.c | 51 +++ blockdev.c | 53 +++ hmp.c| 4 +- hw/block/nvme.c | 11 +- hw/block/virtio-blk.c| 4 +- hw/block/xen_disk.c | 27 +++- hw/ide/atapi.c | 31 ++-- hw/ide/core.c| 12 +- hw/ide/macio.c | 12 +- hw/scsi/scsi-disk.c | 46 -- include/block/accounting.h | 28 include/qemu/timed-average.h | 64 qapi/block-core.json | 103 - qemu-io-cmds.c | 9 ++ qmp-commands.hx | 80 +- tests/Makefile | 4 + tests/qemu-iotests/136 | 349 +++ tests/qemu-iotests/136.out | 5 + tests/qemu-iotests/group | 1 + tests/test-timed-average.c | 90 +++ util/Makefile.objs | 1 + util/timed-average.c | 227 24 files changed, 1288 insertions(+), 48 deletions(-) create mode 100644 include/qemu/timed-average.h create mode 100644 tests/qemu-iotests/136 create mode 100644 tests/qemu-iotests/136.out create mode 100644 tests/test-timed-average.c create mode 100644 util/timed-average.c -- 2.6.1
[Qemu-block] [PATCH v3 01/21] xen_disk: Account for flush operations
Currently both BLKIF_OP_WRITE and BLKIF_OP_FLUSH_DISKCACHE are being accounted as write operations. Signed-off-by: Alberto Garcia Reviewed-by: Stefan Hajnoczi --- hw/block/xen_disk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index 1bbc111..4869518 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -576,7 +576,9 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq) } block_acct_start(blk_get_stats(blkdev->blk), &ioreq->acct, - ioreq->v.size, BLOCK_ACCT_WRITE); + ioreq->v.size, + ioreq->req.operation == BLKIF_OP_WRITE ? + BLOCK_ACCT_WRITE : BLOCK_ACCT_FLUSH); ioreq->aio_inflight++; blk_aio_writev(blkdev->blk, ioreq->start / BLOCK_SIZE, &ioreq->v, ioreq->v.size / BLOCK_SIZE, -- 2.6.1
[Qemu-block] [PATCH v3 15/21] virtio-blk: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/block/virtio-blk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8beb26b..aea9728 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -76,7 +76,7 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error, s->rq = req; } else if (action == BLOCK_ERROR_ACTION_REPORT) { virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); -block_acct_done(blk_get_stats(s->blk), &req->acct); +block_acct_failed(blk_get_stats(s->blk), &req->acct); virtio_blk_free_request(req); } @@ -536,6 +536,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) if (!virtio_blk_sect_range_ok(req->dev, req->sector_num, req->qiov.size)) { virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); +block_acct_invalid(blk_get_stats(req->dev->blk), + is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ); virtio_blk_free_request(req); return; } -- 2.6.1
[Qemu-block] [PATCH v3 14/21] nvme: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/block/nvme.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 5da41b2..169e4fa 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -201,10 +201,11 @@ static void nvme_rw_cb(void *opaque, int ret) NvmeCtrl *n = sq->ctrl; NvmeCQueue *cq = n->cq[sq->cqid]; -block_acct_done(blk_get_stats(n->conf.blk), &req->acct); if (!ret) { +block_acct_done(blk_get_stats(n->conf.blk), &req->acct); req->status = NVME_SUCCESS; } else { +block_acct_failed(blk_get_stats(n->conf.blk), &req->acct); req->status = NVME_INTERNAL_DEV_ERROR; } if (req->has_sg) { @@ -238,18 +239,22 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, uint64_t data_size = (uint64_t)nlb << data_shift; uint64_t aio_slba = slba << (data_shift - BDRV_SECTOR_BITS); int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0; +enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; if ((slba + nlb) > ns->id_ns.nsze) { +block_acct_invalid(blk_get_stats(n->conf.blk), acct); return NVME_LBA_RANGE | NVME_DNR; } + if (nvme_map_prp(&req->qsg, prp1, prp2, data_size, n)) { +block_acct_invalid(blk_get_stats(n->conf.blk), acct); return NVME_INVALID_FIELD | NVME_DNR; } + assert((nlb << data_shift) == req->qsg.size); req->has_sg = true; -dma_acct_start(n->conf.blk, &req->acct, &req->qsg, - is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ); +dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct); req->aiocb = is_write ? dma_blk_write(n->conf.blk, &req->qsg, aio_slba, nvme_rw_cb, req) : dma_blk_read(n->conf.blk, &req->qsg, aio_slba, nvme_rw_cb, req); -- 2.6.1
[Qemu-block] [PATCH v3 18/21] ide: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/ide/core.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index b559f1b..dd7e9c5 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -574,7 +574,6 @@ static void ide_sector_read_cb(void *opaque, int ret) if (ret == -ECANCELED) { return; } -block_acct_done(blk_get_stats(s->blk), &s->acct); if (ret != 0) { if (ide_handle_rw_error(s, -ret, IDE_RETRY_PIO | IDE_RETRY_READ)) { @@ -582,6 +581,8 @@ static void ide_sector_read_cb(void *opaque, int ret) } } +block_acct_done(blk_get_stats(s->blk), &s->acct); + n = s->nsector; if (n > s->req_nb_sectors) { n = s->req_nb_sectors; @@ -621,6 +622,7 @@ static void ide_sector_read(IDEState *s) if (!ide_sect_range_ok(s, sector_num, n)) { ide_rw_error(s); +block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ); return; } @@ -672,6 +674,7 @@ static int ide_handle_rw_error(IDEState *s, int error, int op) assert(s->bus->retry_unit == s->unit); s->bus->error_status = op; } else if (action == BLOCK_ERROR_ACTION_REPORT) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); if (op & IDE_RETRY_DMA) { ide_dma_error(s); } else { @@ -750,6 +753,7 @@ static void ide_dma_cb(void *opaque, int ret) if ((s->dma_cmd == IDE_DMA_READ || s->dma_cmd == IDE_DMA_WRITE) && !ide_sect_range_ok(s, sector_num, n)) { ide_dma_error(s); +block_acct_invalid(blk_get_stats(s->blk), s->acct.type); return; } @@ -826,7 +830,6 @@ static void ide_sector_write_cb(void *opaque, int ret) if (ret == -ECANCELED) { return; } -block_acct_done(blk_get_stats(s->blk), &s->acct); s->pio_aiocb = NULL; s->status &= ~BUSY_STAT; @@ -837,6 +840,8 @@ static void ide_sector_write_cb(void *opaque, int ret) } } +block_acct_done(blk_get_stats(s->blk), &s->acct); + n = s->nsector; if (n > s->req_nb_sectors) { n = s->req_nb_sectors; @@ -887,6 +892,7 @@ static void ide_sector_write(IDEState *s) if (!ide_sect_range_ok(s, sector_num, n)) { ide_rw_error(s); +block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_WRITE); return; } -- 2.6.1
[Qemu-block] [PATCH v3 08/21] block: Compute minimum, maximum and average I/O latencies
This patch keeps track of the minimum, maximum and average latencies of I/O operations during a certain interval of time. The values are exposed in the BlockDeviceTimedStats structure. An option to define the intervals to collect these statistics will be added in a separate patch. Signed-off-by: Alberto Garcia --- block/accounting.c | 43 ++ block/block-backend.c | 1 + block/qapi.c | 28 + include/block/accounting.h | 14 + qapi/block-core.json | 52 +- qmp-commands.hx| 31 +++ 6 files changed, 168 insertions(+), 1 deletion(-) diff --git a/block/accounting.c b/block/accounting.c index 923aeaf..61de8ce 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -35,6 +35,39 @@ void block_acct_init(BlockAcctStats *stats, bool account_invalid, stats->account_failed = account_failed; } +void block_acct_cleanup(BlockAcctStats *stats) +{ +BlockAcctTimedStats *s, *next; +QSLIST_FOREACH_SAFE(s, &stats->intervals, entries, next) { +g_free(s); +} +} + +void block_acct_add_interval(BlockAcctStats *stats, unsigned interval_length) +{ +BlockAcctTimedStats *s; +unsigned i; + +s = g_new0(BlockAcctTimedStats, 1); +s->interval_length = interval_length; +QSLIST_INSERT_HEAD(&stats->intervals, s, entries); + +for (i = 0; i < BLOCK_MAX_IOTYPE; i++) { +timed_average_init(&s->latency[i], clock_type, + (uint64_t) interval_length * NANOSECONDS_PER_SECOND); +} +} + +BlockAcctTimedStats *block_acct_interval_next(BlockAcctStats *stats, + BlockAcctTimedStats *s) +{ +if (s == NULL) { +return QSLIST_FIRST(&stats->intervals); +} else { +return QSLIST_NEXT(s, entries); +} +} + void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, int64_t bytes, enum BlockAcctType type) { @@ -47,6 +80,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) { +BlockAcctTimedStats *s; int64_t time_ns = qemu_clock_get_ns(clock_type); int64_t latency_ns = time_ns - cookie->start_time_ns; @@ -56,6 +90,10 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) stats->nr_ops[cookie->type]++; stats->total_time_ns[cookie->type] += latency_ns; stats->last_access_time_ns = time_ns; + +QSLIST_FOREACH(s, &stats->intervals, entries) { +timed_average_account(&s->latency[cookie->type], latency_ns); +} } void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) @@ -65,11 +103,16 @@ void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) stats->failed_ops[cookie->type]++; if (stats->account_failed) { +BlockAcctTimedStats *s; int64_t time_ns = qemu_clock_get_ns(clock_type); int64_t latency_ns = time_ns - cookie->start_time_ns; stats->total_time_ns[cookie->type] += latency_ns; stats->last_access_time_ns = time_ns; + +QSLIST_FOREACH(s, &stats->intervals, entries) { +timed_average_account(&s->latency[cookie->type], latency_ns); +} } } diff --git a/block/block-backend.c b/block/block-backend.c index 10e4d71..c3490fb 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -176,6 +176,7 @@ static void blk_delete(BlockBackend *blk) } g_free(blk->name); drive_info_del(blk->legacy_dinfo); +block_acct_cleanup(&blk->stats); g_free(blk); } diff --git a/block/qapi.c b/block/qapi.c index 56c8139..4baf6e1 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -346,6 +346,7 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, s->stats = g_malloc0(sizeof(*s->stats)); if (bs->blk) { BlockAcctStats *stats = blk_get_stats(bs->blk); +BlockAcctTimedStats *ts = NULL; s->stats->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ]; s->stats->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE]; @@ -375,6 +376,33 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, s->stats->account_invalid = stats->account_invalid; s->stats->account_failed = stats->account_failed; + +while ((ts = block_acct_interval_next(stats, ts))) { +BlockDeviceTimedStatsList *timed_stats = +g_malloc0(sizeof(*timed_stats)); +BlockDeviceTimedStats *dev_stats = g_malloc0(sizeof(*dev_stats)); +timed_stats->next = s->stats->timed_stats; +timed_stats->value = dev_stats; +s->stats->ti
[Qemu-block] [PATCH v3 07/21] block: Allow configuring whether to account failed and invalid ops
This patch adds two options, "stats-account-invalid" and "stats-account-failed", that can be used to decide whether invalid and failed I/O operations must be used when collecting statistics for latency and last access time. Signed-off-by: Alberto Garcia Reviewed-by: Stefan Hajnoczi --- block/accounting.c | 24 +++- block/qapi.c | 3 +++ blockdev.c | 16 include/block/accounting.h | 5 + qapi/block-core.json | 17 - qmp-commands.hx| 25 - 6 files changed, 79 insertions(+), 11 deletions(-) diff --git a/block/accounting.c b/block/accounting.c index 49a9444..923aeaf 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -28,6 +28,13 @@ static QEMUClockType clock_type = QEMU_CLOCK_REALTIME; +void block_acct_init(BlockAcctStats *stats, bool account_invalid, + bool account_failed) +{ +stats->account_invalid = account_invalid; +stats->account_failed = account_failed; +} + void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, int64_t bytes, enum BlockAcctType type) { @@ -53,13 +60,17 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie) void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie) { -int64_t time_ns = qemu_clock_get_ns(clock_type); - assert(cookie->type < BLOCK_MAX_IOTYPE); stats->failed_ops[cookie->type]++; -stats->total_time_ns[cookie->type] += time_ns - cookie->start_time_ns; -stats->last_access_time_ns = time_ns; + +if (stats->account_failed) { +int64_t time_ns = qemu_clock_get_ns(clock_type); +int64_t latency_ns = time_ns - cookie->start_time_ns; + +stats->total_time_ns[cookie->type] += latency_ns; +stats->last_access_time_ns = time_ns; +} } void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type) @@ -72,7 +83,10 @@ void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type) * therefore there's no actual I/O involved. */ stats->invalid_ops[type]++; -stats->last_access_time_ns = qemu_clock_get_ns(clock_type); + +if (stats->account_invalid) { +stats->last_access_time_ns = qemu_clock_get_ns(clock_type); +} } void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, diff --git a/block/qapi.c b/block/qapi.c index 84d8412..56c8139 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -372,6 +372,9 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, if (s->stats->has_idle_time_ns) { s->stats->idle_time_ns = block_acct_idle_time_ns(stats); } + +s->stats->account_invalid = stats->account_invalid; +s->stats->account_failed = stats->account_failed; } s->stats->wr_highest_offset = bs->wr_highest_offset; diff --git a/blockdev.c b/blockdev.c index b79b0a6..94635b5 100644 --- a/blockdev.c +++ b/blockdev.c @@ -467,6 +467,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, const char *buf; int bdrv_flags = 0; int on_read_error, on_write_error; +bool account_invalid, account_failed; BlockBackend *blk; BlockDriverState *bs; ThrottleConfig cfg; @@ -503,6 +504,9 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, /* extract parameters */ snapshot = qemu_opt_get_bool(opts, "snapshot", 0); +account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true); +account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true); + extract_common_blockdev_options(opts, &bdrv_flags, &throttling_group, &cfg, &detect_zeroes, &error); if (error) { @@ -599,6 +603,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, if (bdrv_key_required(bs)) { autostart = 0; } + +block_acct_init(blk_get_stats(blk), account_invalid, account_failed); } blk_set_on_error(blk, on_read_error, on_write_error); @@ -3519,6 +3525,16 @@ QemuOptsList qemu_common_drive_opts = { .name = "detect-zeroes", .type = QEMU_OPT_STRING, .help = "try to optimize zero writes (off, on, unmap)", +},{ +.name = "stats-account-invalid", +.type = QEMU_OPT_BOOL, +.help = "whether to account for invalid I/O operations " +"in the statistics", +},{ +.name = "stats-account-failed", +.type = QEMU_OPT_BOOL, +.help = "whether to account for failed I/O operations " +"in the statistic
[Qemu-block] [PATCH v3 04/21] util: Infrastructure for computing recent averages
This module computes the average of a set of values within a time window, keeping also track of the minimum and maximum values. In order to produce more accurate results it works internally by creating two time windows of the same period, offsetted by half of that period. Values are accounted on both windows and the data is always returned from the oldest one. Signed-off-by: Alberto Garcia --- include/qemu/timed-average.h | 63 + tests/Makefile | 4 + tests/test-timed-average.c | 90 +++ util/Makefile.objs | 1 + util/timed-average.c | 210 +++ 5 files changed, 368 insertions(+) create mode 100644 include/qemu/timed-average.h create mode 100644 tests/test-timed-average.c create mode 100644 util/timed-average.c diff --git a/include/qemu/timed-average.h b/include/qemu/timed-average.h new file mode 100644 index 000..f1cdddc --- /dev/null +++ b/include/qemu/timed-average.h @@ -0,0 +1,63 @@ +/* + * QEMU timed average computation + * + * Copyright (C) Nodalink, EURL. 2014 + * Copyright (C) Igalia, S.L. 2015 + * + * Authors: + * BenoƮt Canet + * Alberto Garcia + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) version 3 or any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef TIMED_AVERAGE_H +#define TIMED_AVERAGE_H + +#include + +#include "qemu/timer.h" + +typedef struct TimedAverageWindow TimedAverageWindow; +typedef struct TimedAverage TimedAverage; + +/* All fields of both structures are private */ + +struct TimedAverageWindow { +uint64_t min; /* minimum value accounted in the window */ +uint64_t max; /* maximum value accounted in the window */ +uint64_t sum; /* sum of all values */ +uint64_t count; /* number of values */ +int64_t expiration; /* the end of the current window in ns */ +}; + +struct TimedAverage { +uint64_t period; /* period in nanoseconds */ +TimedAverageWindow windows[2]; /* two overlapping windows of with +* an offset of period / 2 between them */ +unsigned current;/* the current window index: it's also the +* oldest window index */ +QEMUClockType clock_type; /* the clock used */ +}; + +void timed_average_init(TimedAverage *ta, QEMUClockType clock_type, +uint64_t period); + +void timed_average_account(TimedAverage *ta, uint64_t value); + +uint64_t timed_average_min(TimedAverage *ta); +uint64_t timed_average_avg(TimedAverage *ta); +uint64_t timed_average_max(TimedAverage *ta); + +#endif diff --git a/tests/Makefile b/tests/Makefile index 0531b30..0c12112 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -81,6 +81,7 @@ check-unit-y += tests/test-crypto-cipher$(EXESUF) check-unit-$(CONFIG_GNUTLS) += tests/test-crypto-tlscredsx509$(EXESUF) check-unit-$(CONFIG_GNUTLS) += tests/test-crypto-tlssession$(EXESUF) check-unit-$(CONFIG_LINUX) += tests/test-qga$(EXESUF) +check-unit-y += tests/test-timed-average$(EXESUF) check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh @@ -403,6 +404,9 @@ tests/test-vmstate$(EXESUF): tests/test-vmstate.o \ migration/vmstate.o migration/qemu-file.o migration/qemu-file-buf.o \ migration/qemu-file-unix.o qjson.o \ $(test-qom-obj-y) +tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \ + libqemuutil.a stubs/clock-warp.o stubs/cpu-get-icount.o \ + stubs/notify-event.o tests/test-qapi-types.c tests/test-qapi-types.h :\ $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py) diff --git a/tests/test-timed-average.c b/tests/test-timed-average.c new file mode 100644 index 000..a049799 --- /dev/null +++ b/tests/test-timed-average.c @@ -0,0 +1,90 @@ +/* + * Timed average computation tests + * + * Copyright Nodalink, EURL. 2014 + * + * Authors: + * BenoƮt Canet + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + */ + +#include +#include + +#include "qemu/timed-average.h" + +/* This is the clock for QEMU_CLOCK_VIRTUAL */ +static int64_t my_clock_value; + +int64_t cpu_get_clock(void) +{ +return my_clock_v
[Qemu-block] [PATCH v3 21/21] block: Update copyright of the accounting code
Signed-off-by: Alberto Garcia --- block/accounting.c | 1 + include/block/accounting.h | 1 + 2 files changed, 2 insertions(+) diff --git a/block/accounting.c b/block/accounting.c index 05a5c5f..185025e 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -2,6 +2,7 @@ * QEMU System Emulator block accounting * * Copyright (c) 2011 Christoph Hellwig + * Copyright (c) 2015 Igalia, S.L. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal diff --git a/include/block/accounting.h b/include/block/accounting.h index f41ddde..0215a4a 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -2,6 +2,7 @@ * QEMU System Emulator block accounting * * Copyright (c) 2011 Christoph Hellwig + * Copyright (c) 2015 Igalia, S.L. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal -- 2.6.1
[Qemu-block] [PATCH v3 11/21] qemu-io: Account for failed, invalid and flush operations
Signed-off-by: Alberto Garcia --- qemu-io-cmds.c | 9 + 1 file changed, 9 insertions(+) diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index 6e5d1e4..0cac623 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -1364,6 +1364,7 @@ static void aio_write_done(void *opaque, int ret) if (ret < 0) { printf("aio_write failed: %s\n", strerror(-ret)); +block_acct_failed(blk_get_stats(ctx->blk), &ctx->acct); goto out; } @@ -1392,6 +1393,7 @@ static void aio_read_done(void *opaque, int ret) if (ret < 0) { printf("readv failed: %s\n", strerror(-ret)); +block_acct_failed(blk_get_stats(ctx->blk), &ctx->acct); goto out; } @@ -1505,6 +1507,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv) if (ctx->offset & 0x1ff) { printf("offset %" PRId64 " is not sector aligned\n", ctx->offset); +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); g_free(ctx); return 0; } @@ -1512,6 +1515,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv) nr_iov = argc - optind; ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov, 0xab); if (ctx->buf == NULL) { +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); g_free(ctx); return 0; } @@ -1600,6 +1604,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv) if (ctx->offset & 0x1ff) { printf("offset %" PRId64 " is not sector aligned\n", ctx->offset); +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); g_free(ctx); return 0; } @@ -1607,6 +1612,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv) nr_iov = argc - optind; ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov, pattern); if (ctx->buf == NULL) { +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); g_free(ctx); return 0; } @@ -1621,7 +1627,10 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv) static int aio_flush_f(BlockBackend *blk, int argc, char **argv) { +BlockAcctCookie cookie; +block_acct_start(blk_get_stats(blk), &cookie, 0, BLOCK_ACCT_FLUSH); blk_drain_all(); +block_acct_done(blk_get_stats(blk), &cookie); return 0; } -- 2.6.1
[Qemu-block] [PATCH v3 20/21] scsi-disk: Account for failed operations
Signed-off-by: Alberto Garcia --- hw/scsi/scsi-disk.c | 46 +++--- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index bada9a7..20a31a7 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -90,7 +90,7 @@ struct SCSIDiskState bool tray_locked; }; -static int scsi_handle_rw_error(SCSIDiskReq *r, int error); +static int scsi_handle_rw_error(SCSIDiskReq *r, int error, bool acct_failed); static void scsi_free_request(SCSIRequest *req) { @@ -169,18 +169,18 @@ static void scsi_aio_complete(void *opaque, int ret) assert(r->req.aiocb != NULL); r->req.aiocb = NULL; -block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); if (r->req.io_canceled) { scsi_req_cancel_complete(&r->req); goto done; } if (ret < 0) { -if (scsi_handle_rw_error(r, -ret)) { +if (scsi_handle_rw_error(r, -ret, true)) { goto done; } } +block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); scsi_req_complete(&r->req, GOOD); done: @@ -247,7 +247,7 @@ static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret) } if (ret < 0) { -if (scsi_handle_rw_error(r, -ret)) { +if (scsi_handle_rw_error(r, -ret, false)) { goto done; } } @@ -273,7 +273,11 @@ static void scsi_dma_complete(void *opaque, int ret) assert(r->req.aiocb != NULL); r->req.aiocb = NULL; -block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); +} else { +block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); +} scsi_dma_complete_noio(r, ret); } @@ -285,18 +289,18 @@ static void scsi_read_complete(void * opaque, int ret) assert(r->req.aiocb != NULL); r->req.aiocb = NULL; -block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); if (r->req.io_canceled) { scsi_req_cancel_complete(&r->req); goto done; } if (ret < 0) { -if (scsi_handle_rw_error(r, -ret)) { +if (scsi_handle_rw_error(r, -ret, true)) { goto done; } } +block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->qiov.size); n = r->qiov.size / 512; @@ -322,7 +326,7 @@ static void scsi_do_read(SCSIDiskReq *r, int ret) } if (ret < 0) { -if (scsi_handle_rw_error(r, -ret)) { +if (scsi_handle_rw_error(r, -ret, false)) { goto done; } } @@ -355,7 +359,11 @@ static void scsi_do_read_cb(void *opaque, int ret) assert (r->req.aiocb != NULL); r->req.aiocb = NULL; -block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); +} else { +block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); +} scsi_do_read(opaque, ret); } @@ -407,7 +415,7 @@ static void scsi_read_data(SCSIRequest *req) * scsi_handle_rw_error always manages its reference counts, independent * of the return value. */ -static int scsi_handle_rw_error(SCSIDiskReq *r, int error) +static int scsi_handle_rw_error(SCSIDiskReq *r, int error, bool acct_failed) { bool is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV); SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev); @@ -415,6 +423,9 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error) is_read, error); if (action == BLOCK_ERROR_ACTION_REPORT) { +if (acct_failed) { +block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); +} switch (error) { case ENOMEDIUM: scsi_check_condition(r, SENSE_CODE(NO_MEDIUM)); @@ -452,7 +463,7 @@ static void scsi_write_complete_noio(SCSIDiskReq *r, int ret) } if (ret < 0) { -if (scsi_handle_rw_error(r, -ret)) { +if (scsi_handle_rw_error(r, -ret, false)) { goto done; } } @@ -481,7 +492,11 @@ static void scsi_write_complete(void * opaque, int ret) assert (r->req.aiocb != NULL); r->req.aiocb = NULL; -block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); +} else { +block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); +} scsi_write_complete_noio(r, ret); } @@ -1592,7 +1607,7 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
[Qemu-block] [PATCH v3 17/21] atapi: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/ide/atapi.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c index 747f466..cf0b78e 100644 --- a/hw/ide/atapi.c +++ b/hw/ide/atapi.c @@ -108,27 +108,30 @@ static void cd_data_to_raw(uint8_t *buf, int lba) static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size) { int ret; +block_acct_start(blk_get_stats(s->blk), &s->acct, + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); switch(sector_size) { case 2048: -block_acct_start(blk_get_stats(s->blk), &s->acct, - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4); -block_acct_done(blk_get_stats(s->blk), &s->acct); break; case 2352: -block_acct_start(blk_get_stats(s->blk), &s->acct, - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4); -block_acct_done(blk_get_stats(s->blk), &s->acct); -if (ret < 0) -return ret; -cd_data_to_raw(buf, lba); +if (ret >= 0) { +cd_data_to_raw(buf, lba); +} break; default: -ret = -EIO; -break; +block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ); +return -EIO; } + +if (ret < 0) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); +} else { +block_acct_done(blk_get_stats(s->blk), &s->acct); +} + return ret; } @@ -357,7 +360,11 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret) return; eot: -block_acct_done(blk_get_stats(s->blk), &s->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); +} else { +block_acct_done(blk_get_stats(s->blk), &s->acct); +} ide_set_inactive(s, false); } -- 2.6.1
[Qemu-block] [PATCH v3 13/21] iotests: Add test for the block device statistics
Signed-off-by: Alberto Garcia --- tests/qemu-iotests/136 | 349 + tests/qemu-iotests/136.out | 5 + tests/qemu-iotests/group | 1 + 3 files changed, 355 insertions(+) create mode 100644 tests/qemu-iotests/136 create mode 100644 tests/qemu-iotests/136.out diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136 new file mode 100644 index 000..f574d83 --- /dev/null +++ b/tests/qemu-iotests/136 @@ -0,0 +1,349 @@ +#!/usr/bin/env python +# +# Tests for block device statistics +# +# Copyright (C) 2015 Igalia, S.L. +# Author: Alberto Garcia +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. +# + +import iotests +import os + +interval_length = 10 +nsec_per_sec = 10 +op_latency = nsec_per_sec / 1000 # See qtest_latency_ns in accounting.c +bad_sector = 8192 +bad_offset = bad_sector * 512 +blkdebug_file = os.path.join(iotests.test_dir, 'blkdebug.conf') + +class BlockDeviceStatsTestCase(iotests.QMPTestCase): +test_img = "null-aio://" +total_rd_bytes = 0 +total_rd_ops = 0 +total_wr_bytes = 0 +total_wr_ops = 0 +total_wr_merged = 0 +total_flush_ops = 0 +failed_rd_ops = 0 +failed_wr_ops = 0 +invalid_rd_ops = 0 +invalid_wr_ops = 0 +wr_highest_offset = 0 +account_invalid = False +account_failed = False + +def blockstats(self, device): +result = self.vm.qmp("query-blockstats") +for r in result['return']: +if r['device'] == device: +return r['stats'] +raise Exception("Device not found for blockstats: %s" % device) + +def create_blkdebug_file(self): +file = open(blkdebug_file, 'w') +file.write(''' +[inject-error] +event = "read_aio" +errno = "5" +sector = "%d" + +[inject-error] +event = "write_aio" +errno = "5" +sector = "%d" +''' % (bad_sector, bad_sector)) +file.close() + +def setUp(self): +drive_args = [] +drive_args.append("stats-intervals=%d" % interval_length) +drive_args.append("stats-account-invalid=%s" % + (self.account_invalid and "on" or "off")) +drive_args.append("stats-account-failed=%s" % + (self.account_failed and "on" or "off")) +self.create_blkdebug_file() +self.vm = iotests.VM().add_drive('blkdebug:%s:%s ' % + (blkdebug_file, self.test_img), + ','.join(drive_args)) +self.vm.launch() +# Set an initial value for the clock +self.vm.qtest("clock_step %d" % nsec_per_sec) + +def tearDown(self): +self.vm.shutdown() +os.remove(blkdebug_file) + +def accounted_ops(self, read = False, write = False, flush = False): +ops = 0 +if write: +ops += self.total_wr_ops +if self.account_failed: +ops += self.failed_wr_ops +if self.account_invalid: +ops += self.invalid_wr_ops +if read: +ops += self.total_rd_ops +if self.account_failed: +ops += self.failed_rd_ops +if self.account_invalid: +ops += self.invalid_rd_ops +if flush: +ops += self.total_flush_ops +return ops + +def accounted_latency(self, read = False, write = False, flush = False): +latency = 0 +if write: +latency += self.total_wr_ops * op_latency +if self.account_failed: +latency += self.failed_wr_ops * op_latency +if read: +latency += self.total_rd_ops * op_latency +if self.account_failed: +latency += self.failed_rd_ops * op_latency +if flush: +latency += self.total_flush_ops * op_latency +return latency + +def check_values(self): +stats = self.blockstats('drive0') + +# Check that the totals match with what we have calculated +self.assertEqual(self.total_rd_bytes, stats['rd_bytes']) +
[Qemu-block] [PATCH v3 19/21] macio: Account for failed operations
Signed-off-by: Alberto Garcia --- hw/ide/macio.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/hw/ide/macio.c b/hw/ide/macio.c index 66ac2ba..176e331 100644 --- a/hw/ide/macio.c +++ b/hw/ide/macio.c @@ -286,7 +286,11 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret) return; done: -block_acct_done(blk_get_stats(s->blk), &s->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); +} else { +block_acct_done(blk_get_stats(s->blk), &s->acct); +} io->dma_end(opaque); return; @@ -348,7 +352,11 @@ static void pmac_ide_transfer_cb(void *opaque, int ret) done: if (s->dma_cmd == IDE_DMA_READ || s->dma_cmd == IDE_DMA_WRITE) { -block_acct_done(blk_get_stats(s->blk), &s->acct); +if (ret < 0) { +block_acct_failed(blk_get_stats(s->blk), &s->acct); +} else { +block_acct_done(blk_get_stats(s->blk), &s->acct); +} } io->dma_end(opaque); } -- 2.6.1
[Qemu-block] [PATCH v3 16/21] xen_disk: Account for failed and invalid operations
Signed-off-by: Alberto Garcia --- hw/block/xen_disk.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index 4869518..02eda6e 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -537,7 +537,11 @@ static void qemu_aio_complete(void *opaque, int ret) break; } case BLKIF_OP_READ: -block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct); +if (ioreq->status == BLKIF_RSP_OKAY) { +block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct); +} else { +block_acct_failed(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct); +} break; case BLKIF_OP_DISCARD: default: @@ -722,6 +726,23 @@ static void blk_handle_requests(struct XenBlkDev *blkdev) /* parse them */ if (ioreq_parse(ioreq) != 0) { + +switch (ioreq->req.operation) { +case BLKIF_OP_READ: +block_acct_invalid(blk_get_stats(blkdev->blk), + BLOCK_ACCT_READ); +break; +case BLKIF_OP_WRITE: +block_acct_invalid(blk_get_stats(blkdev->blk), + BLOCK_ACCT_WRITE); +break; +case BLKIF_OP_FLUSH_DISKCACHE: +block_acct_invalid(blk_get_stats(blkdev->blk), + BLOCK_ACCT_FLUSH); +default: +break; +}; + if (blk_send_response_one(ioreq)) { xen_be_send_notify(&blkdev->xendev); } -- 2.6.1