Re: [PATCHv3 1/1] ext4: Optimize file overwrites
On Fri, Sep 18, 2020 at 10:36:35AM +0530, Ritesh Harjani wrote: > In case if the file already has underlying blocks/extents allocated > then we don't need to start a journal txn and can directly return > the underlying mapping. Currently ext4_iomap_begin() is used by > both DAX & DIO path. We can check if the write request is an > overwrite & then directly return the mapping information. > > This could give a significant perf boost for multi-threaded writes > specially random overwrites. > On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement > could be seen in random writes (overwrite). Also bcoz this optimizes > away the spinlock contention during jbd2 slab cache allocation > (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed. > > Reported-by: Dan Williams > Suggested-by: Jan Kara > Signed-off-by: Ritesh Harjani Thanks, applied. - Ted
Re: [PATCHv3 1/1] ext4: Optimize file overwrites
On Fri 18-09-20 10:36:35, Ritesh Harjani wrote: > In case if the file already has underlying blocks/extents allocated > then we don't need to start a journal txn and can directly return > the underlying mapping. Currently ext4_iomap_begin() is used by > both DAX & DIO path. We can check if the write request is an > overwrite & then directly return the mapping information. > > This could give a significant perf boost for multi-threaded writes > specially random overwrites. > On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement > could be seen in random writes (overwrite). Also bcoz this optimizes > away the spinlock contention during jbd2 slab cache allocation > (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed. > > Reported-by: Dan Williams > Suggested-by: Jan Kara > Signed-off-by: Ritesh Harjani The patch looks good to me. You can add: Reviewed-by: Jan Kara Honza > --- > fs/ext4/inode.c | 18 +++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 10dd470876b3..6eae17758ece 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -3437,14 +3437,26 @@ static int ext4_iomap_begin(struct inode *inode, > loff_t offset, loff_t length, > map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > > - if (flags & IOMAP_WRITE) > + if (flags & IOMAP_WRITE) { > + /* > + * We check here if the blocks are already allocated, then we > + * don't need to start a journal txn and we can directly return > + * the mapping information. This could boost performance > + * especially in multi-threaded overwrite requests. > + */ > + if (offset + length <= i_size_read(inode)) { > + ret = ext4_map_blocks(NULL, inode, , 0); > + if (ret > 0 && (map.m_flags & EXT4_MAP_MAPPED)) > + goto out; > + } > ret = ext4_iomap_alloc(inode, , flags); > - else > + } else { > ret = ext4_map_blocks(NULL, inode, , 0); > + } > > if (ret < 0) > return ret; > - > +out: > ext4_set_iomap(inode, iomap, , offset, length); > > return 0; > -- > 2.26.2 > -- Jan Kara SUSE Labs, CR
Re: [PATCHv3 1/1] ext4: Optimize file overwrites
On Fri, Sep 18, 2020 at 7:09 AM Ritesh Harjani wrote: > > In case if the file already has underlying blocks/extents allocated > then we don't need to start a journal txn and can directly return > the underlying mapping. Currently ext4_iomap_begin() is used by > both DAX & DIO path. We can check if the write request is an > overwrite & then directly return the mapping information. > > This could give a significant perf boost for multi-threaded writes > specially random overwrites. > On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement > could be seen in random writes (overwrite). Also bcoz this optimizes > away the spinlock contention during jbd2 slab cache allocation > (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed. > > Reported-by: Dan Williams > Suggested-by: Jan Kara > Signed-off-by: Ritesh Harjani I have applied your patch on top of recent Linus Git and boot-tested on x86-64. Here I have LTP installed. If you have a LTP filesystem test-/use-case you know for testing, please let me know. Yes, I have seen the FIO config in the cover-letter. Maybe you have a different FIO config - 16G AFAIK is too big here. Feel free to add... Tested-by: Sedat Dilek # Compile and boot on x86-64 Debian/unstable Thanks. - Sedat - > --- > fs/ext4/inode.c | 18 +++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 10dd470876b3..6eae17758ece 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -3437,14 +3437,26 @@ static int ext4_iomap_begin(struct inode *inode, > loff_t offset, loff_t length, > map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > > - if (flags & IOMAP_WRITE) > + if (flags & IOMAP_WRITE) { > + /* > +* We check here if the blocks are already allocated, then we > +* don't need to start a journal txn and we can directly > return > +* the mapping information. This could boost performance > +* especially in multi-threaded overwrite requests. > +*/ > + if (offset + length <= i_size_read(inode)) { > + ret = ext4_map_blocks(NULL, inode, , 0); > + if (ret > 0 && (map.m_flags & EXT4_MAP_MAPPED)) > + goto out; > + } > ret = ext4_iomap_alloc(inode, , flags); > - else > + } else { > ret = ext4_map_blocks(NULL, inode, , 0); > + } > > if (ret < 0) > return ret; > - > +out: > ext4_set_iomap(inode, iomap, , offset, length); > > return 0; > -- > 2.26.2 >
[PATCHv3 1/1] ext4: Optimize file overwrites
In case if the file already has underlying blocks/extents allocated then we don't need to start a journal txn and can directly return the underlying mapping. Currently ext4_iomap_begin() is used by both DAX & DIO path. We can check if the write request is an overwrite & then directly return the mapping information. This could give a significant perf boost for multi-threaded writes specially random overwrites. On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement could be seen in random writes (overwrite). Also bcoz this optimizes away the spinlock contention during jbd2 slab cache allocation (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed. Reported-by: Dan Williams Suggested-by: Jan Kara Signed-off-by: Ritesh Harjani --- fs/ext4/inode.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 10dd470876b3..6eae17758ece 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3437,14 +3437,26 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; - if (flags & IOMAP_WRITE) + if (flags & IOMAP_WRITE) { + /* +* We check here if the blocks are already allocated, then we +* don't need to start a journal txn and we can directly return +* the mapping information. This could boost performance +* especially in multi-threaded overwrite requests. +*/ + if (offset + length <= i_size_read(inode)) { + ret = ext4_map_blocks(NULL, inode, , 0); + if (ret > 0 && (map.m_flags & EXT4_MAP_MAPPED)) + goto out; + } ret = ext4_iomap_alloc(inode, , flags); - else + } else { ret = ext4_map_blocks(NULL, inode, , 0); + } if (ret < 0) return ret; - +out: ext4_set_iomap(inode, iomap, , offset, length); return 0; -- 2.26.2