Free tags Free tags in Tux3 will perforrm a similar function to Ext2's "unallocated block" counts. In Ext2, an unallocated block count is a 16 bit field in the group descriptor for each block group. Tux3 does not have group descriptors, but will use for a similar purpose a table of "free tags", where the table is just a large linear array at some offset in a special inode.
Accounting data provided by free tags is required by several different allocators in Tux3 in order to scale effectively to large allocation spaces. Our allocation algorithms need to be able to locate available free objects in volumes theoretically as large as one exabyte, and in practice, up to several terabytes for personal workstation usage. We need to track three kinds of free objects: Free Blocks Free Inode numbers Free directory records Each of these kinds of objects has its own unique allocation characteristics. For example with free directory records we are primarily interested in the availability of records of a given size, while for free blocks we are mainly interested in the total free block count in some given region of a volume. We will attempt to view these three kinds of allocation space as similar in some important respects, and handle them with similar data structures and algorithms. For now, we will just concentrate on prototyping sufficient functionality for the first of these, free blocks, to enable us to proceed with work on high level block allocation algorithms. Initial free block tracking Our initial implementation of free block tracking will use a simple persistent table of 16 bit counts of the allocated blocks in a single 4K block bitmap block, starting at offset zero in the table. Note: the size of a "bitmap block" is probably going to change for non-4K volumes. For now it is one filesystem block. But this does not scale nicely as block size increases. We will eventually change to a fixed size bitmaps that do not change size as filesystem block size changes, but have not yet done so. For now, each free tag map element will always cover one block's worth of bitmap bits. Note: we will be storing the allocated block count, not the free block count, so that a block full of zeroes is interpreted as maximum availability. The relationship is simply: free_blocks = 2**(12 + 3) - allocated blocks. As with block bitmaps, we need to consider the possibility of allocation recursion in this structure, that is, the case where allocating a new block for the free block map itself causes a free block tag to be updated. We will deal with this recursion in the same way as the bitmap: we will log changes to the free block map instead of directly writing out changed free block map blocks per delta. Eventually, dirty free block map blocks will be written out as part of a unify (rollup), just like bitmap blocks. As an initial implementation, we will simply increment a free block tag every time a bit changes from zero to one in the respective bitmap block (a block is allocated), and decrement it every time a bit changes from one to zero (a block is freed). Future Optimization: lazy updates It is possible that the simplistic update strategy above may slow down writes measurably by generating additional CPU work updating tags, increasing the size of per-delta logs, and increasing the number of blocks needing to be flushed in a unify (rollup). If this turns out to be the case, then it is thought that such overhead can be reduced by lazily updating the tag maps. This proposition may be tested experimentally, given the simple implementation described here to test it against. Future Optimization: one byte tags It is possible that exact 16 bit block counts are higher precision than is actually required by effective volume layout optimization algorithms. An approximate representation that can be stored in a single byte might do just as well, and cut down the size of the mapping tables by a factor of two. This can be tested experimentally, later. Future Optimization: hierarchical maps For very large volumes, we may find that scanning the free tag map becomes inefficient. Then it may be worth mapping the tag map itself, with a higher level tag map so that we can rapidly locate free space in extremely large volumes. Whether such an optimization is actually needed for realistic volume sizes is an open question that we can test, given an initial, simple implementation. _______________________________________________ Tux3 mailing list Tux3@phunq.net http://phunq.net/mailman/listinfo/tux3