I have two questions about the ZFS code. The first concerns dbuf_hold_impl():

    /*
     * If this buffer is currently syncing out, and we are
     * still referencing it from db_data, we need to make a copy
     * of it in case we decide we want to dirty it again in this txg.
     */
    if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
        dn->dn_object != DMU_META_DNODE_OBJECT &&
        db->db_state == DB_CACHED && db->db_data_pending) {
            dbuf_dirty_record_t *dr = db->db_data_pending;

            if (dr->dt.dl.dr_data == db->db_buf) {
                    arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);

                    dbuf_set_data(db,
                        arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
                        db->db.db_size, db, type));
                    bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
                        db->db.db_size);
            }
    }

This code copies the data into a new buffer if the current one is being synced to disk. My question is: why not defer this copy to dbuf_dirty(), where we could check db_data_pending and do the copy then?

A similar case is in dbuf_sync_leaf():

    if (dn->dn_object != DMU_META_DNODE_OBJECT) {
            /*
             * If this buffer is currently "in use" (i.e., there are
             * active holds and db_data still references it), then make
             * a copy before we start the write so that any modifications
             * from the open txg will not leak into this write.
             *
             * NOTE: this copy does not need to be made for objects only
             * modified in the syncing context (e.g. DNONE_DNODE blocks).
             */
            if (refcount_count(&db->db_holds) > 1 && *datap == db->db_buf) {
                    arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);

                    *datap = arc_buf_alloc(os->os_spa, blksz, db, type);
                    bcopy(db->db.db_data, (*datap)->b_data, blksz);
            }
    } else {

Here it also copies the data if there are extra holds on the syncing dbuf. What I don't understand is why it needs to do this, since dbuf_dirty() can handle this scenario perfectly.

A further question, perhaps related to the one above: why does ZFS need to release the ARC buffer in dbuf_dirty()? The comment says this is needed so that other users of the cached data block are not affected.
But I don't know whether there is any other code path that can reach the ARC buffer except through the DMU interface. Perhaps this is needed to protect snapshots from reading the new data?

    } else if (db->db.db_object != DMU_META_DNODE_OBJECT) {
            /*
             * Release the data buffer from the cache so that we
             * can modify it without impacting possible other users
             * of this cached data block.  Note that indirect
             * blocks and private objects are not released until the
             * syncing state (since they are only modified then).
             */
            arc_release(db->db_buf, db);
            dbuf_fix_old_data(db, tx->tx_txg);
            data_old = db->db_buf;
    }

Thanks,
Jay

--
This message posted from opensolaris.org