On Thu, 2020-11-12 at 17:04 +0200, Maxim Levitsky wrote:
> On Thu, 2020-11-12 at 07:45 -0600, Eric Blake wrote:
> > On 11/12/20 6:40 AM, Peter Lieven wrote:
> > > >
> > > >      /*
> > > > -     * Avoid that s->sector_next_status becomes unaligned to the source
> > > > -     * request alignment and/or cluster size to avoid unnecessary read
> > > > -     * cycles.
> > > > +     * Avoid that s->sector_next_status becomes unaligned to the
> > > > +     * source/destination request alignment and/or cluster size to avoid
> > > > +     * unnecessary read/write cycles.
> > > >      */
> > > > -    tail = (sector_num - src_cur_offset + n) % s->src_alignment[src_cur];
> > > > +    alignment = MAX(s->src_alignment[src_cur], s->alignment);
> > > > +    assert(is_power_of_2(alignment));
> > > > +
> > > > +    tail = (sector_num - src_cur_offset + n) % alignment;
> > > >      if (n > tail) {
> > > >          n -= tail;
> > > >      }
> > >
> > > I was also considering including the s->alignment when adding this
> > > change. However, you need the least common multiple of both alignments,
> > > not the maximum, otherwise you might get misaligned to either source
> > > or destination.
> > >
> > > Why exactly do you need the power of two requirement?
> >
> > The power of two requirement ensures that you have the least common
> > multiple of both alignments ;)
>
> Yes, and in addition to that, both alignments are already a power of two,
> because we align them to it. The assert I added is just in case.
>
> This is how we calculate the destination alignment:
>
>     s.alignment = MAX(pow2floor(s.min_sparse),
>                       DIV_ROUND_UP(out_bs->bl.request_alignment,
>                                    BDRV_SECTOR_SIZE));
>
> This is how we calculate the source alignments (it is possible to have
> several source files):
>
>     s.src_alignment[bs_i] = DIV_ROUND_UP(src_bs->bl.request_alignment,
>                                          BDRV_SECTOR_SIZE);
>     if (!bdrv_get_info(src_bs, &bdi)) {
>         s.src_alignment[bs_i] = MAX(s.src_alignment[bs_i],
>                                     bdi.cluster_size / BDRV_SECTOR_SIZE);
>     }
>
> The bl.request_alignment comment mentions that it must be a power of 2,
> and cluster sizes are also very likely to be a power of 2 in all drivers
> we support. An assert for s.src_alignment won't hurt though.
>
> Note though that we don't really read the discard alignment.
> We have max_pdiscard and pdiscard_alignment, but max_pdiscard is only
> used to split 'too large' discard requests, and pdiscard_alignment is
> advisory and used only in a couple of places.
> Neither is set by file-posix.
>
> We do have request_alignment, which file-posix tries to align to the
> logical block size, which is still often 512 for backward compatibility
> on many devices (even NVMe).
>
> Now both source and destination alignments in qemu-img convert are based
> on request_alignment and on min_sparse, which is 4K by default and is
> controlled by the qemu-img -S parameter (which otherwise controls the
> minimum number of blocks that must be zero to consider discarding them
> in convert).
>
> This means that there is no guarantee to have 4K alignment, and besides,
> with a sufficiently fragmented source file, bdrv_block_status_above can
> return arbitrarily short and unaligned extents which can't be aligned.
> Thus, as I said, this patch alone can't guarantee that we won't have a
> write and a discard sharing the same page.
>
> Another thing that can be discussed is why is_allocated_sectors was
> patched to convert short discards to writes. Probably because the
> underlying hardware ignores them or something?
> In theory we should detect this and fall back to regular zeroing in
> this case.
> Again though, while in the case of converting an empty file this is
> the only source of writes, and eliminating it also 'fixes' this case,
> with a sufficiently fragmented source file we can get a write and a
> discard sharing a page even without this code.
>
> Best regards,
> 	Maxim Levitsky
>
> > However, there ARE devices in practice that have advertised a
> > non-power-of-2 discard granularity, such as 15MiB (see commits
> > 430b26a8, 63188c24). Which means you probably don't want to assert
> > power-of-2, and in turn need to worry about least common multiple.
Any update on this patch?

Best regards,
	Maxim Levitsky