Re: [PATCH] ext4: delayed inode update for the consistency of file size after a crash

2017-12-18 Thread Christoph Hellwig
> that the update of the timestamp and i_size needs to be moved to an I/O
> completion handler.  We do this already to convert unwritten requests
> to be written in fs/ext4/page_io.c.  See ext4_put_io_end_defer() in
> fs/ext4/page_io.c; if we need to convert unwritten extents the
> EXT4_IO_END_UNWRITTEN flag is set, and ext4_add_complete_io() tacks
> the io_end queue onto a workqueue.  This infrastructure could be made
> more general so that it can do other work after the I/O has been
> completed, including the i_size update.

That's what we do for the i_size update in XFS.


Re: [PATCH] ext4: delayed inode update for the consistency of file size after a crash

2017-12-16 Thread Theodore Ts'o
On Sat, Dec 16, 2017 at 01:33:26PM +0900, Seongbae Son wrote:
> > > Details can be found as follows.
> > >
> > > Son et al. "Guaranteeing the Metadata Update Atomicity in EXT4 Filesystem",
> > > In Proc. of APSYS 2017, Mumbai, India
> 
> > This is behind a paywall, so I can't access it.  I am sorry I wasn't
> > on the program committee, or I would have pointed this out while the
> > paper was being reviewed.
> 
> Thanks for your quick answer.
> I am sorry about that. I had not thought about the paywall.

If you want to send me a PDF of the paper, I'm happy to look at it.

> I have run the above scenario on xfs, btrfs, f2fs, and zfs.
> In the test results, none of the four file systems has the problem
> that fileA, on which fsync() was not executed, has the wrong file size
> after a system crash. So I think the portability of applications might be
> okay even if EXT4 guarantees consistency between the file size and
> the data blocks of a file on which fsync() was not executed before a system crash.

So first of all, there are more file systems than xfs, btrfs, f2fs,
and zfs, and there are more operating systems than Linux and Solaris.

Secondly, your patch doesn't solve the problem.  Updating the
timestamps without putting the changes in a transaction is no
guarantee; if some other process does some operation such as chmod,
et al., the inode timestamps will get updated ahead of time.  Worse,
you are updating the i_size in a separate handle, right *before* the
write request is submitted.  So if a transaction commit gets started
immediately after the call to ext4_journal_stop() which was added in
mpage_process_page_bufs(), it's still possible for i_size to be
updated and be visible after a crash, without the data block being
updated.  It's a narrower race window, but it's still there.

Furthermore, the patch is huge because the introduction of the new
functions _ext4_update_time(), ext4_update_time(), and
_generic_file_write_iter() has included a large number of extra lines
of code which are copied from elsewhere --- i.e., this is "reverse
code factoring" --- and it makes the resulting source less
maintainable.  And the fact that it forces every single write which
affects the last block in the file to be written *twice* means that it
has really unfortunate real performance impacts for workloads which
are appending to a file (e.g., any kind of log file).

If you really want to solve this problem, what needs to be done is
that the update of the timestamp and i_size needs to be moved to an I/O
completion handler.  We do this already to convert unwritten requests
to be written in fs/ext4/page_io.c.  See ext4_put_io_end_defer() in
fs/ext4/page_io.c; if we need to convert unwritten extents the
EXT4_IO_END_UNWRITTEN flag is set, and ext4_add_complete_io() tacks
the io_end queue onto a workqueue.  This infrastructure could be made
more general so that it can do other work after the I/O has been
completed, including the i_size update.
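
A rough user-space sketch of that ordering may help. It is a toy model only: struct toy_inode, struct toy_io_end, toy_write(), toy_io_complete() and toy_commit_size() are made-up names, not ext4 or JBD2 interfaces, and the real io_end machinery in fs/ext4/page_io.c is considerably more involved. The point it illustrates is that the size a journal commit can record advances only from the completion handler, after the data I/O has finished.

```c
#include <assert.h>

/* Toy user-space model. None of these names are ext4 or kernel APIs. */
struct toy_inode {
    long long mem_size;       /* size the application sees right after write() */
    long long journaled_size; /* size a journal commit would record */
};

struct toy_io_end {
    struct toy_inode *inode;
    long long new_size;       /* end offset of the data being written */
};

/* write() path: remember the new size in memory, but do not make it
 * visible to the journal yet. */
void toy_write(struct toy_inode *inode, struct toy_io_end *io,
               long long offset, long long len)
{
    io->inode = inode;
    io->new_size = offset + len;
    if (io->new_size > inode->mem_size)
        inode->mem_size = io->new_size;
}

/* I/O completion handler: the data block has reached storage, so the
 * larger size may now be handed to the journal. */
void toy_io_complete(struct toy_io_end *io)
{
    if (io->new_size > io->inode->journaled_size)
        io->inode->journaled_size = io->new_size;
}

/* What a journal commit at this instant would record as the file size. */
long long toy_commit_size(const struct toy_inode *inode)
{
    return inode->journaled_size;
}
```

With a 14 KB file and a 2 KB append, toy_commit_size() keeps reporting 14336 until toy_io_complete() has run, and 16384 afterwards, so a commit that slips in between cannot expose a size that is ahead of the data.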

I'm not really convinced it's worth it, but this would be a much more
efficient way of solving the problem, and it would avoid the need to clone
a large amount of code both in ext4 and in the generic mm/filemap.c
file.

Best regards,

- Ted


[PATCH] ext4: delayed inode update for the consistency of file size after a crash

2017-12-15 Thread Seongbae Son
> > 1. Current file offset of fileA is 14 KB. An application appends 2 KB data to
> > fileA by executing a write() system call. At this time, the file size in
> > the ext4_inode of fileA is updated to 16 KB by ext4_da_write_end().
> > 2. Current file offset of fileB is 14 KB. An application appends 2 KB data to
> > fileB by executing a write() system call. At this time, the file size in
> > the ext4_inode of fileB is updated to 16 KB by ext4_da_write_end().
> > 3. A fsync(fileB) is called before the kworker thread runs. At this time,
> > the application thread transfers the data block of fileB to storage and
> > wakes up the JBD2. Then, JBD2 writes the ext4_inodes of fileA and fileB in
> > the running transaction to the journal area. The ext4_inode of fileA in
> > the journal area has the file size, 16 KB, even though the data block of
> > fileA has not been written to storage.
> > 4. Assume that a system crash occurs. The EXT4 recovery module recovers
> > the inodes of fileA and fileB. The recovered inode of fileA has the updated
> > file size, 16 KB, even though the data of fileA has not been made durable.
> > The data block of fileA between 14 KB and 16 KB is seen as zeros.

> There's nothing wrong with this.  The user space application called
> fsync on fileB, and *not* on fileA.  Therefore, there is absolutely no
> guarantee that fileA's data contents are valid.
> 
> Consider that the exact same thing would happen if the application had
> written data to fileA at offsets 6k to 8k.  If those offsets were
> previously zero, then after the crash, those offsets *might* still be
> zero, *unless* the application had first called
> fsync() or fdatasync().

> > Details can be found as follows.
> >
> > Son et al. "Guaranteeing the Metadata Update Atomicity in EXT4 Filesystem",
> > In Proc. of APSYS 2017, Mumbai, India

> This is behind a paywall, so I can't access it.  I am sorry I wasn't
> on the program committee, or I would have pointed this out while the
> paper was being reviewed.

Hello Ted,

Thanks for your quick answer.
I am sorry about that. I had not thought about the paywall.

> The problem with providing more guarantees than what is strictly
> provided for by POSIX is that it degrades the performance of the file
> system.  It can also encourage application writers to depend on semantics
> which are non-portable, which can cause problems when they try to run
> that program on other operating systems or other file systems.

I have run the above scenario on xfs, btrfs, f2fs, and zfs.
In the test results, none of the four file systems has the problem
that fileA, on which fsync() was not executed, has the wrong file size
after a system crash. So I think the portability of applications might be
okay even if EXT4 guarantees consistency between the file size and
the data blocks of a file on which fsync() was not executed before a system crash.

Many thanks,

Seongbae Son.


Re: [PATCH] ext4: delayed inode update for the consistency of file size after a crash

2017-12-10 Thread Theodore Ts'o
On Sun, Dec 10, 2017 at 09:12:57PM +0900, seongbaeSon wrote:
> 1. Current file offset of fileA is 14 KB. An application appends 2 KB data to
> fileA by executing a write() system call. At this time, the file size in 
> the ext4_inode of fileA is updated to 16 KB by ext4_da_write_end().
> 2. Current file offset of fileB is 14 KB. An application appends 2 KB data to
> fileB by executing a write() system call. At this time, the file size in
> the ext4_inode of fileB is updated to 16 KB by ext4_da_write_end().
> 3. A fsync(fileB) is called before the kworker thread runs. At this time,
> the application thread transfers the data block of fileB to storage and
> wakes up the JBD2. Then, JBD2 writes the ext4_inodes of fileA and fileB in
> the running transaction to the journal area. The ext4_inode of fileA in 
> the journal area has the file size, 16 KB, even though the data block of
> fileA has not been written to storage.
> 4. Assume that a system crash occurs. The EXT4 recovery module recovers
> the inodes of fileA and fileB. The recovered inode of fileA has the updated
> file size, 16 KB, even though the data of fileA has not been made durable.
> The data block of fileA between 14 KB and 16 KB is seen as zeros.

There's nothing wrong with this.  The user space application called
fsync on fileB, and *not* on fileA.  Therefore, there is absolutely no
guarantee that fileA's data contents are valid.

Consider that the exact same thing would happen if the application had
written data to fileA at offsets 6k to 8k.  If those offsets were
previously zero, then after the crash, those offsets *might* still be
zero, *unless* the application had first called
fsync() or fdatasync().
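
For an application that does need fileA's appended data to survive a crash, the call sequence is roughly as follows. This is a minimal sketch using only POSIX calls; append_durably() is a hypothetical helper name, and error handling is kept deliberately simple.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Append len bytes to path and flush them before returning.
 * Returns 0 on success, -1 on error (errno comes from the failing call). */
int append_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* write() may be interrupted or write fewer bytes than requested. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Without this call there is no guarantee about the contents of the
     * appended range after a crash. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

In the scenario above, calling such a helper for fileA as well as fileB is what would make fileA's 14 KB to 16 KB range durable; calling fsync() on fileB alone says nothing about fileA.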

> Details can be found as follows.
> 
> Son et al. "Guaranteeing the Metadata Update Atomicity in EXT4 Filesystem",
> In Proc. of APSYS 2017, Mumbai, India

This is behind a paywall, so I can't access it.  I am sorry I wasn't
on the program committee, or I would have pointed this out while the
paper was being reviewed.

The problem with providing more guarantees than what is strictly
provided for by POSIX is that it degrades the performance of the file
system.  It can also encourage application writers to depend on semantics
which are non-portable, which can cause problems when they try to run
that program on other operating systems or other file systems.

Cheers,

- Ted


[PATCH] ext4: delayed inode update for the consistency of file size after a crash

2017-12-10 Thread seongbaeSon
For one write() system call, the file size in the ext4_inode can be updated
twice: at the write() system call and when dirty pages of the file are written
to storage (e.g., by fsync() or a kworker thread). A write() system call that
appends data to a file but does not need to allocate a block updates
the page cache entry and marks the page as dirty. Then, the write() system
call updates the file size in the ext4_inode to cover the appended data and
inserts the ext4_inode into the running transaction. After that, if the
application calls fsync(), the application thread dispatches the dirty
pages of the file and wakes up JBD2. Once JBD2 is woken up,
it commits the running transaction. When JBD2 commits the running
transaction, the transaction may contain ext4_inodes whose file size has been
updated even though their dirty pages have not been written to storage.
If an unexpected power failure occurs after the fsync(),
the EXT4 recovery module recovers the ext4_inodes in the journal area, but
without the appropriate data blocks on disk.

Consider the following scenario. There are two files, fileA and fileB,
each 14 KB in size. The EXT4 mount options include delayed allocation
and ordered-mode journaling.

1. Current file offset of fileA is 14 KB. An application appends 2 KB data to
fileA by executing a write() system call. At this time, the file size in 
the ext4_inode of fileA is updated to 16 KB by ext4_da_write_end().
2. Current file offset of fileB is 14 KB. An application appends 2 KB data to
fileB by executing a write() system call. At this time, the file size in
the ext4_inode of fileB is updated to 16 KB by ext4_da_write_end().
3. A fsync(fileB) is called before the kworker thread runs. At this time,
the application thread transfers the data block of fileB to storage and
wakes up the JBD2. Then, JBD2 writes the ext4_inodes of fileA and fileB in
the running transaction to the journal area. The ext4_inode of fileA in 
the journal area has the file size, 16 KB, even though the data block of
fileA has not been written to storage.
4. Assume that a system crash occurs. The EXT4 recovery module recovers
the inodes of fileA and fileB. The recovered inode of fileA has the updated
file size, 16 KB, even though the data of fileA has not been made durable.
The data block of fileA between 14 KB and 16 KB is seen as zeros.
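
After a crash, the symptom described in step 4 can be checked from user space by reading the appended range and testing whether it is all zeros. A minimal sketch, assuming POSIX pread() and assuming the application had written non-zero data; range_is_zero() is a hypothetical helper, not part of any existing tool:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if every byte in [offset, offset + len) of path is zero,
 * 0 if some byte is non-zero, and -1 on error or if the range extends
 * past the end of the file. */
int range_is_zero(const char *path, off_t offset, size_t len)
{
    unsigned char buf[4096];
    int result = 1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while (len > 0 && result == 1) {
        size_t want = len < sizeof buf ? len : sizeof buf;
        ssize_t n = pread(fd, buf, want, offset);
        if (n <= 0) {          /* read error, or EOF before the range ended */
            result = -1;
            break;
        }
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != 0) {
                result = 0;
                break;
            }
        }
        offset += n;
        len -= (size_t)n;
    }
    close(fd);
    return result;
}
```

For the scenario above, range_is_zero("fileA", 14 * 1024, 2 * 1024) returning 1 after recovery would indicate that the file size was recovered as 16 KB while the appended data was not.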

EXT4 updates the file size of the ext4_inode at ext4_da_write_end() so that
both the time stamps and the file size of the ext4_inode get updated at
the write() system call. This saves EXT4 from having to journal the ext4_inode
again when dirty pages of the file are written to storage. If the file size in
the ext4_inode were updated again when dirty pages of the file are written
to storage, the ext4_inode could be inserted into two different transactions,
one for the update of the time stamps and one for the update of the file size.
This would increase the number of blocks written to the journal area.
In order to address the problem that the ext4_inode has a wrong file size
after an unexpected power failure, this commit does not update the file size
in the ext4_inode at the write() system call, but delays the update of
the file size in the ext4_inode until dirty pages of the file are written
to storage (e.g., by fsync() or a kworker thread). On its own, this would
allow the ext4_inode to be inserted into two different transactions.
In order to keep the above EXT4 optimization, our technique also delays journaling
the ext4_inode in which the time stamps are updated until dirty pages of
the file are written to storage. Therefore, the time stamps and the file size
of the ext4_inode get updated at the same time in our technique as well.

Additionally, we should consider the kswapd behavior for this commit.
When there is memory pressure, kswapd cleans dirty pages in the page
cache. Consider the case where all dirty pages of a file are written to
storage by kswapd. The ext4_inode of the file is not journaled due to this commit.
After the fsync() system call or kswapd writes the dirty pages to storage,
the state of each page is changed to clean or writeback, and the ext4_inode
associated with the pages is still not in a journal transaction and
will never be inserted into one afterwards. In order to update and
insert the ext4_inode into a running transaction in this situation,
I make kswapd re-dirty the last selected victim page among the dirty
pages of the file.

This commit applies to kernel 4.15-rc2.

Details can be found as follows.

Son et al. "Guaranteeing the Metadata Update Atomicity in EXT4 Filesystem",
In Proc. of APSYS 2017, Mumbai, India

Signed-off-by: Seongbae Son, ESOSLab, Hanyang University 

---
 fs/ext4/ext4.h |   4 ++
 fs/ext4/file.c |  10 +++-
 fs/ext4/inode.c| 170 ++---
 include/linux/fs.h |   1 +
 mm/filemap.c   |  88