jay xiong wrote: > I have two questions for zfs code: > > In dbuf_hold_imple(): > In this piece of code, it will copy the data to the extra buffer if the > current one is syncing into disks. My question is that why not deferring this > data copy to dbuf_dirty, where we check the db_data_pending and do the copy. > The similar case is in dbuf_sync_leaf(): > > Here it copies the data also if there are extra reference count on the > syncing dbuf. What I don't understand is why it need to do this > operations since dbuf_dirty can handle this scenario perfectly.
The DMU provides an interface such that the db_data pointer will not change from the consumer's point of view. If we did the copy in dbuf_dirty(), we would have to change db_data (to the new copy). We can not not make a new copy for the syncing thread, because we don't know what state it is in (eg, it might have already given that pointer to the i/o subsystem to write). Note that we need to check and possibly make a copy in both places (in dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while the dbuf is being written out, so we must check at the beginning of that section (dbuf_sync_leaf), and also if we want to add a hold while we are in the critical section (dbuf_hold_impl). Alternatively, dbuf_hold_impl() could simply wait for the dbuf to finish being written out, but that would hurt performance. > And the further question, perhaps related to the above one, why does zfs > need to release the arc buffer in dbuf_dirty? The comment says this is > needed to protect other from reading the cached data block. But I don't > know if there is other calling patch to hit the arc buffer except via > dmu interface. Perhaps this is needed to protect the snapshot from reading > the new data? That is correct. arc_release() essentially disassociates the arc_buf from its blkptr (ie, from the arc_buf_hdr, from the arc's point of view), creating a new "copy" for the DMU to modify. There may be many arc_buf's for one blkptr because other datasets (eg, snapshots, clones) may reference the same block. --matt