Hello, I believe the following is true, correct me if it is not:
If more than one object references a block (e.g. two files have the same block open), there must be multiple clones of the arc_buf_t (and the associated dmu_buf_impl_t) present, one for each of the objects. This is always so, even if the block is not modified, "just in case the block should end up being modified". So if there are 100 files accessing the same block in the same txg, there will be 100 clones of the data, even if none of the files ultimately modifies this block.

That seems a bit wasteful. It does not feel like COW to me; rather, "copy always, just in case", at least in the arc/dmu realm. I fail to see why the above scenario could not get by with a single, shared, reference-counted record, with a clone made only when a given file actually decides to modify the block (a rough sketch of what I have in mind is at the end of this message). As it is, reference counting is significantly complicated by mixing it with this pre-cloning.

On to some code comprehension questions.

My understanding of the conceptual model of a file in the dmu layer: a number of dmu buffers hanging off a dnode, i.e. the per-dnode list formed via the db_link list node. Not all blocks of the file are on this list, only the "active" ones, where I take "active" to mean "recently accessed". (A stripped-down sketch of this model is also appended below.)

There is a somewhat opaque aspect of the dmu that is missing from the otherwise excellent data structure chart: dirty buffer management. db_data_pending? db_last_dirty? db_dirtycnt? Could someone provide the 10,000-mile overview of dirty buffers?

The dbuf states are a bit of a mystery. What is the difference between DB_READ and DB_FILL? My guess: the data is coming into the cache from different directions. From below: a read from disk (maybe). From above: nascent data coming from an application (newly created data?). I am guessing DB_NOFILL is a short-circuit path for throwing obsoleted data away. It would be nice to have the states commented, beyond an unexplained state transition diagram; I have appended the kind of commenting I mean.

ZFS would be more approachable to newcomers if the code were a bit more commented. I am not talking about copious comments, just a comment on every field of the major data structures and at minimum a one-liner per function saying what the function does. Yes, given enough perseverance and a lot of time one can figure everything out from studying the usage patterns, but the pain of this could be lessened. The more people understand ZFS, the stronger it will become.
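P.S. To make the reference-counting suggestion concrete, here is a rough, self-contained sketch of the scheme I have in mind: a single shared buffer with a hold count, cloned only at the moment a holder actually wants to write. The names (shared_buf_t, buf_hold, buf_rele, buf_make_writable) are made up for illustration and are not the real arc/dmu interfaces; locking, the dbuf hash and txg syncing are all ignored.

#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical shared buffer: one copy of the data, many holders. */
typedef struct shared_buf {
	uint64_t	sb_holds;	/* number of holders */
	size_t		sb_size;	/* size of sb_data in bytes */
	void		*sb_data;	/* the block contents */
} shared_buf_t;

/* Create a buffer with a single hold. */
shared_buf_t *
buf_create(size_t size)
{
	shared_buf_t *sb = malloc(sizeof (*sb));

	sb->sb_holds = 1;
	sb->sb_size = size;
	sb->sb_data = calloc(1, size);
	return (sb);
}

/* Take an additional (read-only) hold on an existing buffer. */
shared_buf_t *
buf_hold(shared_buf_t *sb)
{
	sb->sb_holds++;
	return (sb);
}

/* Drop a hold; free the buffer when the last holder goes away. */
void
buf_rele(shared_buf_t *sb)
{
	if (--sb->sb_holds == 0) {
		free(sb->sb_data);
		free(sb);
	}
}

/*
 * Clone only at the point where a holder wants to modify the data.
 * If the caller is the sole holder, no copy is made at all.
 */
shared_buf_t *
buf_make_writable(shared_buf_t *sb)
{
	shared_buf_t *clone;

	if (sb->sb_holds == 1)
		return (sb);		/* sole holder: write in place */

	clone = buf_create(sb->sb_size);
	memcpy(clone->sb_data, sb->sb_data, sb->sb_size);
	buf_rele(sb);			/* drop this caller's hold on the shared copy */
	return (clone);
}

The only point of the sketch is to show where the copy would happen: in buf_make_writable(), and only when somebody other than the would-be writer still holds the buffer.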
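Similarly, here is the stripped-down mental model of a file at the dmu layer that I described above: a dnode with a list of "active" dbufs hanging off it via db_link. This is just my reading of the structures, reduced to the fields I mentioned; the real definitions obviously carry much more.

#include <stdint.h>

/* Minimal stand-ins for the kernel list types, just so this is self-contained. */
typedef struct list_node { struct list_node *next, *prev; } list_node_t;
typedef struct list { list_node_t head; } list_t;

/* Stripped-down view of a dbuf: one cached ("active") block of an object. */
typedef struct dbuf_model {
	uint64_t	db_blkid;	/* which block of the object this is */
	void		*db_data;	/* the cached data for that block */
	list_node_t	db_link;	/* linkage on the owning dnode's dbuf list */
} dbuf_model_t;

/* Stripped-down view of a dnode: one object, e.g. one file. */
typedef struct dnode_model {
	uint64_t	dn_object;	/* object number */
	list_t		dn_dbufs;	/* list of the active dbufs, linked via db_link */
} dnode_model_t;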
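Finally, to show the level of commenting I am asking for, here are the dbuf states as I remember them from dbuf.h, annotated with my guesses from above. The comments are mine, not from the source, and may well be wrong; corrections welcome.

/*
 * dbuf states, annotated with my guesses (the comments are NOT from the
 * source and may be wrong):
 */
typedef enum dbuf_states {
	DB_UNCACHED,	/* no valid data associated with the dbuf yet? */
	DB_FILL,	/* being filled "from above", i.e. by an application writing new data? */
	DB_NOFILL,	/* contents are about to be overwritten, so don't bother reading them? */
	DB_READ,	/* a read "from below" (from disk) is in progress? */
	DB_CACHED,	/* valid data sitting in memory */
	DB_EVICTING	/* on its way out of the cache, being torn down? */
} dbuf_states_t;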