Hey, I was benchmarking the qcow2 file format and I noticed that if the storage backend is fast enough then the overlap checks can have a very significant impact on performance.
One important bottleneck that applies to all images is in the refcount-block check, which scans the refcount table to see if a write request overlaps with an existing refcount block. The problem is that, unlike the L1 table, the refcount table occupies entire clusters. With the default cluster size of 64 KB, a normal refcount table has 8192 entries, all of which have to be checked for each write request. This is an expensive operation.

Refcounts are used for host clusters, and we need one refcount block (and therefore one entry in the refcount table) per 2 GB of qcow2 file. This means that the default refcount table can address up to 16 TB (we're talking about actual image size here, not virtual size). In other words: the vast majority of the entries in the refcount table will probably never be used, but we're still checking all of them on every write request.

One user reported a >200% performance increase on a fast SSD when using overlap-check=constant.

I think this is at least worth documenting a bit better (unless there's existing documentation that I have missed), but my main question is: does it make sense to try to optimize these checks, or is it better to simply tell the user to disable them in these scenarios?

Berto
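P.S. For reference, here's where those numbers come from with the default settings (refcount table entries are 8 bytes each, and I'm assuming the default refcount_bits=16):

   refcount table size:     1 cluster = 64 KB / 8 B per entry = 8192 entries
   refcounts per block:     64 KB / 2 B per refcount          = 32768 refcounts
   data per refcount block: 32768 host clusters * 64 KB       = 2 GB
   maximum addressable:     8192 entries * 2 GB               = 16 TB

And this is roughly how the check mode was being set for the tests (from memory, so double-check the exact syntax):

   qemu-system-x86_64 -drive file=hd.qcow2,format=qcow2,overlap-check=constant ...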