jay xiong wrote:
> I have two questions for zfs code:
> 
> In dbuf_hold_imple():
> In this piece of code, it will copy the data to the extra buffer if the 
> current one is syncing into disks. My question is that why not deferring this 
> data copy to dbuf_dirty, where we check the db_data_pending and do the copy. 
> The similar case is in dbuf_sync_leaf():
> 
> Here it copies the data also if there are extra reference count on the 
> syncing dbuf. What I don't understand is why it need to do this 
> operations since dbuf_dirty can handle this scenario perfectly.

The DMU provides an interface such that the db_data pointer will not change 
from the consumer's point of view.  If we did the copy in dbuf_dirty(), we 
would have to change db_data (to the new copy).  We can not not make a new 
copy for the syncing thread, because we don't know what state it is in (eg, 
it might have already given that pointer to the i/o subsystem to write).

Note that we need to check and possibly make a copy in both places (in 
dbuf_sync_leaf() and dbuf_hold_impl()).  The critical section is while the 
dbuf is being written out, so we must check at the beginning of that section 
(dbuf_sync_leaf), and also if we want to add a hold while we are in the 
critical section (dbuf_hold_impl).

Alternatively, dbuf_hold_impl() could simply wait for the dbuf to finish 
being written out, but that would hurt performance.

> And the further question, perhaps related to the above one, why does zfs 
> need to release the arc buffer in dbuf_dirty? The comment says this is 
> needed to protect other from reading the cached data block. But I don't 
> know if there is other calling patch to hit the arc buffer except via 
> dmu interface. Perhaps this is needed to protect the snapshot from reading 
> the new data?

That is correct.  arc_release() essentially disassociates the arc_buf from 
its blkptr (ie, from the arc_buf_hdr, from the arc's point of view), creating 
a new "copy" for the DMU to modify.  There may be many arc_buf's for one 
blkptr because other datasets (eg, snapshots, clones) may reference the same 
block.

--matt

Reply via email to