Hi all,

I've been looking at rgrp.c:gfs2_alloc_blocks(), which is called from
various places to allocate single/multiple blocks for inodes. I've come up
with some data structures to record these allocations as extents.

I'm proposing we add a new metadata type for journal blocks that will hold
these extent records.

#define GFS2_METATYPE_EX 15 /* new metadata type: block that holds extents */

The structure below sits at the start of the block, followed by a number
of alloc_ext structures.

struct gfs2_extents { /* This structure is 32 bytes long */
    struct gfs2_meta_header ex_header;
    __be32 ex_count; /* number of alloc_ext structs that follow this header */
    __be32 __pad;
};
/* flags for the alloc_ext struct */
#define AE_FL_XXX

struct alloc_ext { /* This structure is 48 bytes long */
    struct gfs2_inum ae_num; /* inode this allocation/deallocation belongs to */
    __be32 ae_flags; /* allocating/deallocating, data/metadata, etc. */
    __be64 ae_start; /* starting physical block number of the extent */
    __be64 ae_len;   /* length of the extent */
    __be32 ae_uid;   /* user this belongs to, for quota accounting */
    __be32 ae_gid;   /* group this belongs to, for quota accounting */
    __be32 __pad;
};
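
To sanity-check the sizes quoted in the comments above, here is a rough
userspace sketch of the layout. Everything with an _sk suffix is a
hypothetical stand-in, not kernel code: fields are kept in host byte order
(the kernel would use __be32/__be64 with cpu_to_be32() and friends), and
the two helper functions are made up for illustration. One assumption to
flag loudly: on common 64-bit ABIs a 64-bit member placed directly after a
lone 32-bit ae_flags would get 4 bytes of compiler padding in front of it,
so the sketch moves the pad field up next to ae_flags to keep the record at
48 bytes with natural alignment. That reordering belongs to the sketch, not
to the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct meta_header_sk {           /* stand-in for gfs2_meta_header, 24 bytes */
    uint32_t mh_magic;
    uint32_t mh_type;
    uint64_t pad0;
    uint32_t mh_format;
    uint32_t mh_jid;
};

struct inum_sk {                  /* stand-in for gfs2_inum, 16 bytes */
    uint64_t no_formal_ino;
    uint64_t no_addr;
};

struct extents_sk {               /* block header, 32 bytes */
    struct meta_header_sk ex_header;
    uint32_t ex_count;            /* number of records that follow */
    uint32_t pad;
};

struct alloc_ext_sk {             /* one extent record, 48 bytes */
    struct inum_sk ae_num;
    uint32_t ae_flags;
    uint32_t pad;                 /* ASSUMPTION: moved up so ae_start is aligned */
    uint64_t ae_start;
    uint64_t ae_len;
    uint32_t ae_uid;
    uint32_t ae_gid;
};

_Static_assert(sizeof(struct extents_sk) == 32, "header must be 32 bytes");
_Static_assert(sizeof(struct alloc_ext_sk) == 48, "record must be 48 bytes");
_Static_assert(offsetof(struct alloc_ext_sk, ae_start) % 8 == 0,
               "ae_start must be 8-byte aligned");

/* Number of records that fit after the header in a block of bsize bytes. */
unsigned int ext_per_block(unsigned int bsize)
{
    assert(bsize >= sizeof(struct extents_sk));
    return (unsigned int)((bsize - sizeof(struct extents_sk)) /
                          sizeof(struct alloc_ext_sk));
}

/*
 * Append one record to the block held in buf. Returns 0 on success, or -1
 * if the block is full and the caller has to start a new one.
 */
int ext_block_append(void *buf, unsigned int bsize,
                     const struct alloc_ext_sk *rec)
{
    struct extents_sk hdr;
    size_t off;

    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.ex_count >= ext_per_block(bsize))
        return -1;

    /* Records are packed back to back right after the header. */
    off = sizeof(hdr) + (size_t)hdr.ex_count * sizeof(*rec);
    memcpy((char *)buf + off, rec, sizeof(*rec));

    hdr.ex_count++;
    memcpy(buf, &hdr, sizeof(hdr));
    return 0;
}
```

A caller would zero a block-sized buffer, fill in the header, and call
ext_block_append() until it returns -1, at which point it moves on to a
fresh block.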

With a 4k block size, we can fit 84 extents in one block: (4096 - 32) / 48,
rounded down. The same arithmetic gives 10 for 512-byte, 20 for 1k and 42
for 2k blocks. As we process more allocs/deallocs, we keep creating
alloc_ext records and append them to this block if there's space, or else
start a new block. For small extents this might not be efficient, so we
might want to fall back to the old method of recording the bitmap blocks
instead.

During journal replay, we decode these new blocks and flip the
corresponding bitmap bits for each of the blocks represented in the
extents. Anything that was recorded as bitmap blocks the old-fashioned way
is also replayed the old-fashioned way. This keeps us backward compatible
with older versions of gfs2 that only record the bitmaps.

Since we record the uid/gid with each extent, we can do the quota
accounting without relying on the quota change file. We might still need to
keep the quota change file around for backward compatibility and for the
cases where we record allocs/deallocs the old-fashioned way.
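
To make the replay and quota ideas a bit more concrete, here is a rough
userspace sketch, not a patch. All names are hypothetical: ext_rec_sk
carries only the fields replay needs (in host byte order), and
AE_SK_FL_DEALLOC is a made-up flag value, since the real AE_FL_* flags
aren't defined yet. The bitmap is a toy with one bit per block, whereas
real GFS2 resource-group bitmaps use two bits per block. The sketch also
sets each bit to the recorded state rather than toggling it, on the
assumption that replaying the same journal twice should be harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AE_SK_FL_DEALLOC 0x1u     /* hypothetical: set means deallocation */

struct ext_rec_sk {               /* trimmed-down extent record */
    uint32_t flags;
    uint64_t start;               /* first block of the extent */
    uint64_t len;                 /* number of blocks */
    uint32_t uid;
    uint32_t gid;
};

/* Toy bitmap helpers: one bit per block, 1 = in use. */
void bitmap_set(uint8_t *bm, uint64_t blk, int used)
{
    uint8_t mask = (uint8_t)(1u << (blk % 8));

    if (used)
        bm[blk / 8] |= mask;
    else
        bm[blk / 8] &= (uint8_t)~mask;
}

int bitmap_get(const uint8_t *bm, uint64_t blk)
{
    return (bm[blk / 8] >> (blk % 8)) & 1;
}

/*
 * Replay n extent records against a bitmap covering nblocks blocks.
 * Returns 0 on success, or -1 at the first extent that does not fit in
 * the bitmap (records before it stay applied).
 */
int replay_extents(const struct ext_rec_sk *recs, size_t n,
                   uint8_t *bm, uint64_t nblocks)
{
    for (size_t i = 0; i < n; i++) {
        const struct ext_rec_sk *r = &recs[i];
        int used = !(r->flags & AE_SK_FL_DEALLOC);

        if (r->start >= nblocks || r->len > nblocks - r->start)
            return -1;
        for (uint64_t b = 0; b < r->len; b++)
            bitmap_set(bm, r->start + b, used);
    }
    return 0;
}

/*
 * Net change in blocks charged to one id: allocations count up,
 * deallocations count down. is_gid selects gid instead of uid matching.
 */
int64_t quota_delta(const struct ext_rec_sk *recs, size_t n,
                    uint32_t id, int is_gid)
{
    int64_t delta = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t owner = is_gid ? recs[i].gid : recs[i].uid;

        if (owner != id)
            continue;
        if (recs[i].flags & AE_SK_FL_DEALLOC)
            delta -= (int64_t)recs[i].len;
        else
            delta += (int64_t)recs[i].len;
    }
    return delta;
}
```

The point of quota_delta() is only to show that the per-extent uid/gid is
enough to derive the usage change directly from the journaled records.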

I'm going to play around with this and come up with some patches to see if
this works and what kind of performance improvement we get. These data
structures will most likely need reworking and renaming, but this is the
general direction I'm thinking along.

Please let me know what you think.

Cheers!
--Abhi
