[f2fs-dev] [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

2020-05-29 Thread Chao Yu
Under heavy fsstress, we may triggle panic while issuing discard,
because __check_sit_bitmap() detects that discard command may earse
valid data blocks, the root cause is as below race stack described,
since we removed lock when flushing quota data, quota data writeback
may race with write_checkpoint(), so that it causes inconsistency in
between cached discard entry and segment bitmap.

- f2fs_write_checkpoint
 - block_operations
  - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
 - f2fs_flush_sit_entries
  - add_discard_addrs
   - __set_bit_le(i, (void *)de->discard_map);
- f2fs_write_data_pages
 - f2fs_write_single_data_page
   : inode is quota one, 
cp_rwsem won't be locked
  - f2fs_do_write_data_page
   - f2fs_allocate_data_block
- f2fs_wait_discard_bio
  : discard entry has not 
been added yet.
- update_sit_entry
 - f2fs_clear_prefree_segments
  - f2fs_issue_discard
  : add discard entry

This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
failure due to f2fs_lock_op").

Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
Signed-off-by: Chao Yu 
---
 fs/f2fs/compress.c | 8 +++-
 fs/f2fs/data.c | 4 ++--
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index a53578a89211..62c4857ae46d 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1097,7 +1097,7 @@ static int f2fs_write_compressed_pages(struct 
compress_ctx *cc,
loff_t psize;
int i, err;
 
-   if (!IS_NOQUOTA(inode) && !f2fs_trylock_op(sbi))
+   if (!f2fs_trylock_op(sbi))
return -EAGAIN;
 
set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
@@ -1204,8 +1204,7 @@ static int f2fs_write_compressed_pages(struct 
compress_ctx *cc,
set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN);
 
f2fs_put_dnode(&dn);
-   if (!IS_NOQUOTA(inode))
-   f2fs_unlock_op(sbi);
+   f2fs_unlock_op(sbi);
 
spin_lock(&fi->i_size_lock);
if (fi->last_disk_size < psize)
@@ -1231,8 +1230,7 @@ static int f2fs_write_compressed_pages(struct 
compress_ctx *cc,
 out_put_dnode:
f2fs_put_dnode(&dn);
 out_unlock_op:
-   if (!IS_NOQUOTA(inode))
-   f2fs_unlock_op(sbi);
+   f2fs_unlock_op(sbi);
return -EAGAIN;
 }
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 999a36ff05f1..d983ad54f318 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2724,8 +2724,8 @@ int f2fs_write_single_data_page(struct page *page, int 
*submitted,
f2fs_available_free_memory(sbi, BASE_CHECK
goto redirty_out;
 
-   /* Dentry/quota blocks are controlled by checkpoint */
-   if (S_ISDIR(inode->i_mode) || IS_NOQUOTA(inode)) {
+   /* Dentry blocks are controlled by checkpoint */
+   if (S_ISDIR(inode->i_mode)) {
fio.need_lock = LOCK_DONE;
err = f2fs_do_write_data_page(&fio);
goto done;
-- 
2.18.0.rc1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page

2020-05-29 Thread Michal Hocko
On Fri 29-05-20 02:56:44, Chris Down wrote:
> Yafang Shao writes:
> > Look at this patch[1] carefully you will find that it introduces the
> > same issue that I tried to fix in another patch [2]. Even more sad is
> > these two patches are in the same patchset. Although this issue isn't
> > related with the issue found by Naresh, we have to ask ourselves why
> > we always make the same mistake ?
> > One possible answer is that we always forget the lifecyle of
> > memory.emin before we read it. memory.emin doesn't have the same
> > lifecycle with the memcg, while it really has the same lifecyle with
> > the reclaimer. IOW, once a reclaimer begins the protetion value should
> > be set to 0, and after we traversal the memcg tree we calculate a
> > protection value for this reclaimer, finnaly it disapears after the
> > reclaimer stops. That is why I highly suggest to add an new protection
> > member in scan_control before.
> 
> I agree with you that the e{min,low} lifecycle is confusing for everyone --
> the only thing I've not seen confirmation of is any confirmed correlation
> with the i386 oom killer issue. If you've validated that, I'd like to see
> the data :-)

Agreed. Even if e{low,min} might still have some rough edges I am
completely puzzled how we could end up oom if none of the protection
path triggers which the additional debugging should confirm. Maybe my
debugging patch is incomplete or used incorrectly (maybe it would be
esier to use printk rather than trace_printk?).
-- 
Michal Hocko
SUSE Labs


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?

2020-05-29 Thread Hongwei
Hi,
>On 05/28, Hongwei wrote:
>> Hi F2FS experts,
>> As written in f2fs_do_sync_file():
>> "Both of fdatasync() and fsync() are able to be recovered from 
>> sudden-power-off."
>>
>> Please consider this workflow:
>> 1. Start atomic write
>> 2. Multiple file writes
>> 3. Commit atomic write
>> 4. fdatasync()
>> 5. Powerloss.
>>
>> In the 4th step, the fdatasync() doesn't wait for node writeback.
>> So we may loss node blocks after powerloss.
>>
>> If the data blocks are persisted but node blocks aren't, can the recovery 
>> program recover the transaction?
>
>#3 will guarantee the blocks written by #2. So, if there's no written between 
>#3
>and #4, I think we have nothing to recover.
>Does this make sense to you?

Thanks for your reply. Please consider this:
f2fs_do_sync_file() doesn't wait for node writeback if atomic==1. So it is 
possible that after #3, node is still writing back.
#4 fdatasync() doesn't wait for node write back either.
Considering node writeback BIO is flagged with PREFLUSH and FUA, it may take a 
long time to complete.
Therefore, when #5 power failure happens, it is possible that the node block is 
not persisted?
If I was correct about this, can the recovery program recover the transaction?

>
>>
>> Thanks!
>>
>> Hongwei
>> ___
>> Linux-f2fs-devel mailing list
>> Linux-f2fs-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2] blkdev: Replace blksize_bits() with ilog2()

2020-05-29 Thread Kaitao Cheng
There is a function named ilog2() exist which can replace blksize.
The generated code will be shorter and more efficient on some
architecture, such as arm64. And ilog2() can be optimized according
to different architecture.

Signed-off-by: Kaitao Cheng 
---
changes in v2:
Remove all blksize_bits

 drivers/nvme/target/io-cmd-bdev.c|  2 +-
 drivers/s390/block/dasd_ioctl.c  |  2 +-
 drivers/usb/gadget/function/storage_common.c |  2 +-
 fs/block_dev.c   |  6 +++---
 fs/btrfs/disk-io.c   |  4 ++--
 fs/buffer.c  |  2 +-
 fs/direct-io.c   |  2 +-
 fs/f2fs/data.c   |  2 +-
 fs/iomap/direct-io.c |  2 +-
 fs/ocfs2/super.c |  2 +-
 fs/romfs/super.c |  2 +-
 include/linux/blkdev.h   | 11 ---
 12 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c 
b/drivers/nvme/target/io-cmd-bdev.c
index bcf979eb8e83..58bd947e232e 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -63,7 +63,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
return ret;
}
ns->size = i_size_read(ns->bdev->bd_inode);
-   ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
+   ns->blksize_shift = ilog2(bdev_logical_block_size(ns->bdev));
return 0;
 }
 
diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 34d1b4e5..55adb134451b 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -228,7 +228,7 @@ dasd_format(struct dasd_block *block, struct format_data_t 
*fdata)
 */
if (fdata->start_unit == 0) {
struct block_device *bdev = bdget_disk(block->gdp, 0);
-   bdev->bd_inode->i_blkbits = blksize_bits(fdata->blksize);
+   bdev->bd_inode->i_blkbits = ilog2(fdata->blksize);
bdput(bdev);
}
 
diff --git a/drivers/usb/gadget/function/storage_common.c 
b/drivers/usb/gadget/function/storage_common.c
index f7e6c42558eb..eada3e801dd7 100644
--- a/drivers/usb/gadget/function/storage_common.c
+++ b/drivers/usb/gadget/function/storage_common.c
@@ -233,7 +233,7 @@ int fsg_lun_open(struct fsg_lun *curlun, const char 
*filename)
blkbits = 11;
} else if (inode->i_bdev) {
blksize = bdev_logical_block_size(inode->i_bdev);
-   blkbits = blksize_bits(blksize);
+   blkbits = ilog2(blksize);
} else {
blksize = 512;
blkbits = 9;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index a333a648244e..d18496dfc6e7 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -115,7 +115,7 @@ static void set_init_blocksize(struct block_device *bdev)
bsize <<= 1;
}
bdev->bd_block_size = bsize;
-   bdev->bd_inode->i_blkbits = blksize_bits(bsize);
+   bdev->bd_inode->i_blkbits = ilog2(bsize);
 }
 
 int set_blocksize(struct block_device *bdev, int size)
@@ -132,7 +132,7 @@ int set_blocksize(struct block_device *bdev, int size)
if (bdev->bd_block_size != size) {
sync_blockdev(bdev);
bdev->bd_block_size = size;
-   bdev->bd_inode->i_blkbits = blksize_bits(size);
+   bdev->bd_inode->i_blkbits = ilog2(size);
kill_bdev(bdev);
}
return 0;
@@ -147,7 +147,7 @@ int sb_set_blocksize(struct super_block *sb, int size)
/* If we get here, we know size is power of two
 * and it's value is between 512 and PAGE_SIZE */
sb->s_blocksize = size;
-   sb->s_blocksize_bits = blksize_bits(size);
+   sb->s_blocksize_bits = ilog2(size);
return sb->s_blocksize;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7c6f0bbb54a5..711b9fc31c94 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2765,7 +2765,7 @@ static int init_mount_fs_info(struct btrfs_fs_info 
*fs_info, struct super_block
 
fs_info->sb = sb;
sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE;
-   sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);
+   sb->s_blocksize_bits = ilog2(BTRFS_BDEV_BLOCKSIZE);
 
ret = percpu_counter_init(&fs_info->dio_bytes, 0, GFP_KERNEL);
if (ret)
@@ -3059,7 +3059,7 @@ int __cold open_ctree(struct super_block *sb, struct 
btrfs_fs_devices *fs_device
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
sb->s_blocksize = sectorsize;
-   sb->s_blocksize_bits = blksize_bits(sectorsize);
+   sb->s_blocksize_bits = ilog2(sectorsize);
memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
 
mutex_lock(&fs_info->chunk_mutex);
diff --git a/fs/buffer.c b/fs/buffer.c
index fc8831c392d7..fa92e

Re: [f2fs-dev] [PATCH v2] blkdev: Replace blksize_bits() with ilog2()

2020-05-29 Thread Christoph Hellwig
>   ns->size = i_size_read(ns->bdev->bd_inode);
> - ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
> + ns->blksize_shift = ilog2(bdev_logical_block_size(ns->bdev));

This should just be:

ns->blksize_shift = ns->bdev->bd_inode->i_blkbits;

> diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
> index 34d1b4e5..55adb134451b 100644
> --- a/drivers/s390/block/dasd_ioctl.c
> +++ b/drivers/s390/block/dasd_ioctl.c
> @@ -228,7 +228,7 @@ dasd_format(struct dasd_block *block, struct 
> format_data_t *fdata)
>*/
>   if (fdata->start_unit == 0) {
>   struct block_device *bdev = bdget_disk(block->gdp, 0);
> - bdev->bd_inode->i_blkbits = blksize_bits(fdata->blksize);
> + bdev->bd_inode->i_blkbits = ilog2(fdata->blksize);

This also needs to set bdev->bd_block_size, so this probably warrants
a separate fix that be backported.  It might be nice to split out
a helper that sets bd_block_size and bd_inode->i_blkbits together
so that such a use is more obvious.

>   } else if (inode->i_bdev) {
>   blksize = bdev_logical_block_size(inode->i_bdev);
> - blkbits = blksize_bits(blksize);
> + blkbits = ilog2(blksize);

This can just use inode->i_bdev->bd_inode->i_blkbits.

> diff --git a/fs/buffer.c b/fs/buffer.c
> index fc8831c392d7..fa92e0afe349 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -907,7 +907,7 @@ static sector_t blkdev_max_block(struct block_device 
> *bdev, unsigned int size)
>   loff_t sz = i_size_read(bdev->bd_inode);
>  
>   if (sz) {
> - unsigned int sizebits = blksize_bits(size);
> + unsigned int sizebits = ilog2(size);

bdev->bd_inode->i_blkbits.

> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 1543b5af400e..7ea2cd3effcc 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1148,7 +1148,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode 
> *inode,
>  
>   if (align & blocksize_mask) {
>   if (bdev)
> - blkbits = blksize_bits(bdev_logical_block_size(bdev));
> + blkbits = ilog2(bdev_logical_block_size(bdev));

bdev->bd_inode->i_blkbits.

> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index cb05f71cf850..b896da27942a 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -3458,7 +3458,7 @@ static int check_direct_IO(struct inode *inode, struct 
> iov_iter *iter,
>  
>   if (align & blocksize_mask) {
>   if (bdev)
> - blkbits = blksize_bits(bdev_logical_block_size(bdev));
> + blkbits = ilog2(bdev_logical_block_size(bdev));

bdev->bd_inode->i_blkbits.

>   blocksize_mask = (1 << blkbits) - 1;
>   if (align & blocksize_mask)
>   return -EINVAL;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index ec7b78e6feca..2a807657d544 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -203,7 +203,7 @@ static loff_t
>  iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
>   struct iomap_dio *dio, struct iomap *iomap)
>  {
> - unsigned int blkbits = 
> blksize_bits(bdev_logical_block_size(iomap->bdev));
> + unsigned int blkbits = ilog2(bdev_logical_block_size(iomap->bdev));

iomap->bdev->bd_inode->i_blkbits.


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] blkdev: Replace blksize_bits() with ilog2()

2020-05-29 Thread Jens Axboe
On 5/29/20 8:11 AM, Kaitao Cheng wrote:
> There is a function named ilog2() exist which can replace blksize.
> The generated code will be shorter and more efficient on some
> architecture, such as arm64. And ilog2() can be optimized according
> to different architecture.

When you posted this last time, I said:

"I like the simplification, but do you have any results to back up
 that claim? Is the generated code shorter? Runs faster?"

which you handily ignored, yet sending out a new version. I'm not
going to apply this without justification, your commit message is
handwavy at best.

-- 
Jens Axboe



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] blkdev: Replace blksize_bits() with ilog2()

2020-05-29 Thread Matthew Wilcox
On Fri, May 29, 2020 at 10:11:00PM +0800, Kaitao Cheng wrote:
> There is a function named ilog2() exist which can replace blksize.
> The generated code will be shorter and more efficient on some
> architecture, such as arm64. And ilog2() can be optimized according
> to different architecture.

We'd get the same benefit from a smaller patch with just:

--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1502,15 +1502,9 @@ static inline int blk_rq_aligned(struct request_queue 
*q, unsigned long addr,
return !(addr & alignment) && !(len & alignment);
 }
 
-/* assumes size > 256 */
 static inline unsigned int blksize_bits(unsigned int size)
 {
-   unsigned int bits = 8;
-   do {
-   bits++;
-   size >>= 1;
-   } while (size > 256);
-   return bits;
+   return ilog2(size);
 }
 
 static inline unsigned int block_size(struct block_device *bdev)


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] fsck.f2fs: fix dirent position check for encrypted+casefolded dentries

2020-05-29 Thread Eric Biggers
From: Eric Biggers 

fsck.f2fs reports corruption if the filesystem contains any encrypted +
casefolded directories with any substantial number of dentries:

[ASSERT] (f2fs_check_dirent_position:1374)  --> Wrong position of dirent 
pino:8, name:۟�[I�^*�(�5~�}�D��#]7�8�ˎ�, level:1, dir_level:0, pgofs:4, correct 
range:[2, 3]

The problem is that f2fs_check_dirent_position() computes the wrong hash
for encrypted+casefolded dentries.  It's not actually possible for it to
compute the correct hash, because it would need the encryption key.

However, the on-disk dentry already contains the hash code, and its
correctness was already verified by f2fs_check_hash_code() if possible.

So, make f2fs_check_dirent_position() use the hash code from disk rather
than recompute it.

Also fix it to print the filename in human-readable form.

This bug was causing 'kvm-xfstests -c f2fs/encrypt -g casefold'
to fail with the test_dummy_encryption_v2 and encryption+casefolding
kernel patches applied.

Fixes: 7f3767ee8dc5 ("f2fs-tools: Casefolded Encryption support")
Cc: Daniel Rosenberg 
Signed-off-by: Eric Biggers 
---
 fsck/fsck.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index 0389146..c249dfa 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -1348,11 +1348,10 @@ static int __get_current_level(int dir_level, u32 pgofs)
return i;
 }
 
-static int f2fs_check_dirent_position(int encoding, int casefolded,
-   u8 *name, u16 name_len, u32 
pgofs,
-   u8 dir_level, u32 pino)
+static int f2fs_check_dirent_position(const struct f2fs_dir_entry *dentry,
+ const char *printable_name,
+ u32 pgofs, u8 dir_level, u32 pino)
 {
-   f2fs_hash_t namehash = f2fs_dentry_hash(encoding, casefolded, name, 
name_len);
unsigned int nbucket, nblock;
unsigned int bidx, end_block;
int level;
@@ -1363,7 +1362,7 @@ static int f2fs_check_dirent_position(int encoding, int 
casefolded,
nblock = bucket_blocks(level);
 
bidx = dir_block_index(level, dir_level,
-   le32_to_cpu(namehash) % nbucket);
+  le32_to_cpu(dentry->hash_code) % nbucket);
end_block = bidx + nblock;
 
if (pgofs >= bidx && pgofs < end_block)
@@ -1371,7 +1370,8 @@ static int f2fs_check_dirent_position(int encoding, int 
casefolded,
 
ASSERT_MSG("Wrong position of dirent pino:%u, name:%s, level:%d, "
"dir_level:%d, pgofs:%u, correct range:[%u, %u]\n",
-   pino, name, level, dir_level, pgofs, bidx, end_block - 1);
+   pino, printable_name, level, dir_level, pgofs, bidx,
+   end_block - 1);
return 1;
 }
 
@@ -1552,10 +1552,12 @@ static int __chk_dentries(struct f2fs_sb_info *sbi, int 
casefolded,
if (f2fs_check_hash_code(get_encoding(sbi), casefolded, dentry 
+ i, name, name_len, enc_name))
fixed = 1;
 
+   pretty_print_filename(name, name_len, en, enc_name);
+
if (max == NR_DENTRY_IN_BLOCK) {
-   ret = f2fs_check_dirent_position(get_encoding(sbi), 
casefolded,
-   name, name_len, child->pgofs,
-   child->dir_level, child->p_ino);
+   ret = f2fs_check_dirent_position(dentry + i, en,
+   child->pgofs, child->dir_level,
+   child->p_ino);
if (ret) {
if (c.fix_on) {
FIX_MSG("Clear bad dentry 0x%x", i);
@@ -1568,7 +1570,6 @@ static int __chk_dentries(struct f2fs_sb_info *sbi, int 
casefolded,
}
}
 
-   pretty_print_filename(name, name_len, en, enc_name);
DBG(1, "[%3u]-[0x%x] name[%s] len[0x%x] ino[0x%x] type[0x%x]\n",
fsck->dentry_depth, i, en, name_len,
le32_to_cpu(dentry[i].ino),
-- 
2.27.0.rc0.183.gde8f92d652-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] blkdev: Replace blksize_bits() with ilog2()

2020-05-29 Thread Bart Van Assche
On 2020-05-29 13:27, Matthew Wilcox wrote:
> On Fri, May 29, 2020 at 10:11:00PM +0800, Kaitao Cheng wrote:
>> There is a function named ilog2() exist which can replace blksize.
>> The generated code will be shorter and more efficient on some
>> architecture, such as arm64. And ilog2() can be optimized according
>> to different architecture.
> 
> We'd get the same benefit from a smaller patch with just:
> 
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1502,15 +1502,9 @@ static inline int blk_rq_aligned(struct request_queue 
> *q, unsigned long addr,
>   return !(addr & alignment) && !(len & alignment);
>  }
>  
> -/* assumes size > 256 */
>  static inline unsigned int blksize_bits(unsigned int size)
>  {
> - unsigned int bits = 8;
> - do {
> - bits++;
> - size >>= 1;
> - } while (size > 256);
> - return bits;
> + return ilog2(size);
>  }
>  
>  static inline unsigned int block_size(struct block_device *bdev)

Hi Matthew,

I had suggested to change all blksize_bits() calls into ilog2() calls
because I think that a single function to calculate the logarithm base 2
is sufficient.

Thanks,

Bart.




___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

2020-05-29 Thread Jaegeuk Kim
On 05/29, Chao Yu wrote:
> Under heavy fsstress, we may triggle panic while issuing discard,
> because __check_sit_bitmap() detects that discard command may earse
> valid data blocks, the root cause is as below race stack described,
> since we removed lock when flushing quota data, quota data writeback
> may race with write_checkpoint(), so that it causes inconsistency in
> between cached discard entry and segment bitmap.
> 
> - f2fs_write_checkpoint
>  - block_operations
>   - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
>  - f2fs_flush_sit_entries
>   - add_discard_addrs
>- __set_bit_le(i, (void *)de->discard_map);
>   - f2fs_write_data_pages
>- f2fs_write_single_data_page
>  : inode is quota one, 
> cp_rwsem won't be locked
> - f2fs_do_write_data_page
>  - f2fs_allocate_data_block
>   - f2fs_wait_discard_bio
> : discard entry has not 
> been added yet.
>   - update_sit_entry
>  - f2fs_clear_prefree_segments
>   - f2fs_issue_discard
>   : add discard entry
> 
> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
> failure due to f2fs_lock_op").
> 
> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")

The previous patch fixes quota_sync gets EAGAIN all the time.
How about this? It seems this works for fsstress test.

---
 fs/f2fs/segment.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index ebbadde6cbced..f67cffc38975e 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3095,6 +3095,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, 
struct page *page,
struct curseg_info *curseg = CURSEG_I(sbi, type);
bool put_pin_sem = false;
 
+   /*
+* We need to wait for node_write to avoid block allocation during
+* checkpoint. This can only happen to quota writes which can cause
+* the below discard race condition.
+*/
+   if (IS_DATASEG(type))
+   down_write(&sbi->node_write);
+
if (type == CURSEG_COLD_DATA) {
/* GC during CURSEG_COLD_DATA_PINNED allocation */
if (down_read_trylock(&sbi->pin_sem)) {
@@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, 
struct page *page,
 
if (put_pin_sem)
up_read(&sbi->pin_sem);
+
+   if (IS_DATASEG(type))
+   up_write(&sbi->node_write);
 }
 
 static void update_device_state(struct f2fs_io_info *fio)
-- 
2.27.0.rc0.183.gde8f92d652-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] blkdev: Replace blksize_bits() with ilog2()

2020-05-29 Thread Jens Axboe
On 5/29/20 4:27 PM, Bart Van Assche wrote:
> On 2020-05-29 13:27, Matthew Wilcox wrote:
>> On Fri, May 29, 2020 at 10:11:00PM +0800, Kaitao Cheng wrote:
>>> There is a function named ilog2() exist which can replace blksize.
>>> The generated code will be shorter and more efficient on some
>>> architecture, such as arm64. And ilog2() can be optimized according
>>> to different architecture.
>>
>> We'd get the same benefit from a smaller patch with just:
>>
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -1502,15 +1502,9 @@ static inline int blk_rq_aligned(struct request_queue 
>> *q, unsigned long addr,
>>  return !(addr & alignment) && !(len & alignment);
>>  }
>>  
>> -/* assumes size > 256 */
>>  static inline unsigned int blksize_bits(unsigned int size)
>>  {
>> -unsigned int bits = 8;
>> -do {
>> -bits++;
>> -size >>= 1;
>> -} while (size > 256);
>> -return bits;
>> +return ilog2(size);
>>  }
>>  
>>  static inline unsigned int block_size(struct block_device *bdev)
> 
> Hi Matthew,
> 
> I had suggested to change all blksize_bits() calls into ilog2() calls
> because I think that a single function to calculate the logarithm base 2
> is sufficient.

I think this should be a two-parter:

1) Use the inode blkbits where appropriate
2) Then do this change

I'm leaning towards just doing it in blksize_bits() instead of updating
every caller, unless there aren't that many left once we've gone
through patch 1.

-- 
Jens Axboe



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

2020-05-29 Thread Chao Yu
On 2020/5/30 6:34, Jaegeuk Kim wrote:
> On 05/29, Chao Yu wrote:
>> Under heavy fsstress, we may triggle panic while issuing discard,
>> because __check_sit_bitmap() detects that discard command may earse
>> valid data blocks, the root cause is as below race stack described,
>> since we removed lock when flushing quota data, quota data writeback
>> may race with write_checkpoint(), so that it causes inconsistency in
>> between cached discard entry and segment bitmap.
>>
>> - f2fs_write_checkpoint
>>  - block_operations
>>   - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
>>  - f2fs_flush_sit_entries
>>   - add_discard_addrs
>>- __set_bit_le(i, (void *)de->discard_map);
>>  - f2fs_write_data_pages
>>   - f2fs_write_single_data_page
>> : inode is quota one, 
>> cp_rwsem won't be locked
>>- f2fs_do_write_data_page
>> - f2fs_allocate_data_block
>>  - f2fs_wait_discard_bio
>>: discard entry has not 
>> been added yet.
>>  - update_sit_entry
>>  - f2fs_clear_prefree_segments
>>   - f2fs_issue_discard
>>   : add discard entry
>>
>> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
>> failure due to f2fs_lock_op").
>>
>> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
> 
> The previous patch fixes quota_sync gets EAGAIN all the time.
> How about this? It seems this works for fsstress test.
> 
> ---
>  fs/f2fs/segment.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index ebbadde6cbced..f67cffc38975e 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3095,6 +3095,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info 
> *sbi, struct page *page,
>   struct curseg_info *curseg = CURSEG_I(sbi, type);
>   bool put_pin_sem = false;
>  
> + /*
> +  * We need to wait for node_write to avoid block allocation during
> +  * checkpoint. This can only happen to quota writes which can cause
> +  * the below discard race condition.
> +  */
> + if (IS_DATASEG(type))

type is CURSEG_COLD_DATA_PINNED, IS_DATASEG(CURSEG_COLD_DATA_PINNED) should be 
false,
then node_write lock will not be held, later type will be updated to 
CURSEG_COLD_DATA,
then we will try to release node_write.

Thanks,

> + down_write(&sbi->node_write);
> +
>   if (type == CURSEG_COLD_DATA) {
>   /* GC during CURSEG_COLD_DATA_PINNED allocation */
>   if (down_read_trylock(&sbi->pin_sem)) {
> @@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, 
> struct page *page,
>  
>   if (put_pin_sem)
>   up_read(&sbi->pin_sem);
> +
> + if (IS_DATASEG(type))
> + up_write(&sbi->node_write);
>  }
>  
>  static void update_device_state(struct f2fs_io_info *fio)
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

2020-05-29 Thread Jaegeuk Kim
On 05/30, Chao Yu wrote:
> On 2020/5/30 6:34, Jaegeuk Kim wrote:
> > On 05/29, Chao Yu wrote:
> >> Under heavy fsstress, we may triggle panic while issuing discard,
> >> because __check_sit_bitmap() detects that discard command may earse
> >> valid data blocks, the root cause is as below race stack described,
> >> since we removed lock when flushing quota data, quota data writeback
> >> may race with write_checkpoint(), so that it causes inconsistency in
> >> between cached discard entry and segment bitmap.
> >>
> >> - f2fs_write_checkpoint
> >>  - block_operations
> >>   - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
> >>  - f2fs_flush_sit_entries
> >>   - add_discard_addrs
> >>- __set_bit_le(i, (void *)de->discard_map);
> >>- f2fs_write_data_pages
> >> - f2fs_write_single_data_page
> >>   : inode is quota one, 
> >> cp_rwsem won't be locked
> >>  - f2fs_do_write_data_page
> >>   - f2fs_allocate_data_block
> >>- f2fs_wait_discard_bio
> >>  : discard entry has not 
> >> been added yet.
> >>- update_sit_entry
> >>  - f2fs_clear_prefree_segments
> >>   - f2fs_issue_discard
> >>   : add discard entry
> >>
> >> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix 
> >> quota_sync
> >> failure due to f2fs_lock_op").
> >>
> >> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
> > 
> > The previous patch fixes quota_sync gets EAGAIN all the time.
> > How about this? It seems this works for fsstress test.
> > 

Then this?

---
 fs/f2fs/segment.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index ebbadde6cbced..ed11dcf2d69ed 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3107,6 +3107,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, 
struct page *page,
type = CURSEG_COLD_DATA;
}
 
+   /*
+* We need to wait for node_write to avoid block allocation during
+* checkpoint. This can only happen to quota writes which can cause
+* the below discard race condition.
+*/
+   if (IS_DATASEG(type))
+   down_write(&sbi->node_write);
+
down_read(&SM_I(sbi)->curseg_lock);
 
mutex_lock(&curseg->curseg_mutex);
@@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, 
struct page *page,
 
if (put_pin_sem)
up_read(&sbi->pin_sem);
+
+   if (IS_DATASEG(type))
+   up_write(&sbi->node_write);
 }
 
 static void update_device_state(struct f2fs_io_info *fio)
-- 
2.27.0.rc0.183.gde8f92d652-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

2020-05-29 Thread Chao Yu
On 2020/5/30 9:49, Jaegeuk Kim wrote:
> On 05/30, Chao Yu wrote:
>> On 2020/5/30 6:34, Jaegeuk Kim wrote:
>>> On 05/29, Chao Yu wrote:
 Under heavy fsstress, we may triggle panic while issuing discard,
 because __check_sit_bitmap() detects that discard command may earse
 valid data blocks, the root cause is as below race stack described,
 since we removed lock when flushing quota data, quota data writeback
 may race with write_checkpoint(), so that it causes inconsistency in
 between cached discard entry and segment bitmap.

 - f2fs_write_checkpoint
  - block_operations
   - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
  - f2fs_flush_sit_entries
   - add_discard_addrs
- __set_bit_le(i, (void *)de->discard_map);
- f2fs_write_data_pages
 - f2fs_write_single_data_page
   : inode is quota one, 
 cp_rwsem won't be locked
  - f2fs_do_write_data_page
   - f2fs_allocate_data_block
- f2fs_wait_discard_bio
  : discard entry has not 
 been added yet.
- update_sit_entry
  - f2fs_clear_prefree_segments
   - f2fs_issue_discard
   : add discard entry

 This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix 
 quota_sync
 failure due to f2fs_lock_op").

 Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
>>>
>>> The previous patch fixes quota_sync gets EAGAIN all the time.
>>> How about this? It seems this works for fsstress test.
>>>
> 
> Then this?
> 
> ---
>  fs/f2fs/segment.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index ebbadde6cbced..ed11dcf2d69ed 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3107,6 +3107,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info 
> *sbi, struct page *page,
>   type = CURSEG_COLD_DATA;
>   }
>  
> + /*
> +  * We need to wait for node_write to avoid block allocation during
> +  * checkpoint. This can only happen to quota writes which can cause
> +  * the below discard race condition.
> +  */
> + if (IS_DATASEG(type))
> + down_write(&sbi->node_write);
> +
>   down_read(&SM_I(sbi)->curseg_lock);
>  
>   mutex_lock(&curseg->curseg_mutex);
> @@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, 
> struct page *page,

Minor thing: unlock order.

if (IS_DATASEG(type))
up_write(&sbi->node_write);

Could you merge the diff into this patch?

>  
>   if (put_pin_sem)
>   up_read(&sbi->pin_sem);
> +
> + if (IS_DATASEG(type))
> + up_write(&sbi->node_write);
>  }
>  
>  static void update_device_state(struct f2fs_io_info *fio)
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] ext4: avoid utf8_strncasecmp() with unstable name

2020-05-29 Thread Eric Biggers
From: Eric Biggers 

If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
it may be concurrently modified by a rename.  This can cause undefined
behavior (possibly out-of-bounds memory accesses or crashes) in
utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
that may be concurrently modified.

Fix this by first copying the filename to a stack buffer if needed.
This way we get a stable snapshot of the filename.

Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
Cc:  # v5.2+
Cc: Al Viro 
Cc: Daniel Rosenberg 
Cc: Gabriel Krisman Bertazi 
Signed-off-by: Eric Biggers 
---
 fs/ext4/dir.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index c654205f648dd..19aef8328bb18 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -675,6 +675,7 @@ static int ext4_d_compare(const struct dentry *dentry, 
unsigned int len,
struct qstr qstr = {.name = str, .len = len };
const struct dentry *parent = READ_ONCE(dentry->d_parent);
const struct inode *inode = READ_ONCE(parent->d_inode);
+   char strbuf[DNAME_INLINE_LEN];
 
if (!inode || !IS_CASEFOLDED(inode) ||
!EXT4_SB(inode->i_sb)->s_encoding) {
@@ -683,6 +684,22 @@ static int ext4_d_compare(const struct dentry *dentry, 
unsigned int len,
return memcmp(str, name->name, len);
}
 
+   /*
+* If the dentry name is stored in-line, then it may be concurrently
+* modified by a rename.  If this happens, the VFS will eventually retry
+* the lookup, so it doesn't matter what ->d_compare() returns.
+* However, it's unsafe to call utf8_strncasecmp() with an unstable
+* string.  Therefore, we have to copy the name into a temporary buffer.
+*/
+   if (len <= DNAME_INLINE_LEN - 1) {
+   unsigned int i;
+
+   for (i = 0; i < len; i++)
+   strbuf[i] = READ_ONCE(str[i]);
+   strbuf[len] = 0;
+   qstr.name = strbuf;
+   }
+
return ext4_ci_compare(inode, name, &qstr, false);
 }
 
-- 
2.26.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] f2fs: avoid utf8_strncasecmp() with unstable name

2020-05-29 Thread Eric Biggers
From: Eric Biggers 

If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
it may be concurrently modified by a rename.  This can cause undefined
behavior (possibly out-of-bounds memory accesses or crashes) in
utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
that may be concurrently modified.

Fix this by first copying the filename to a stack buffer if needed.
This way we get a stable snapshot of the filename.

Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
Cc:  # v5.4+
Cc: Al Viro 
Cc: Daniel Rosenberg 
Cc: Gabriel Krisman Bertazi 
Signed-off-by: Eric Biggers 
---
 fs/f2fs/dir.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 44bfc464df787..5c179b72eb8a8 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -1083,6 +1083,7 @@ static int f2fs_d_compare(const struct dentry *dentry, 
unsigned int len,
struct qstr qstr = {.name = str, .len = len };
const struct dentry *parent = READ_ONCE(dentry->d_parent);
const struct inode *inode = READ_ONCE(parent->d_inode);
+   char strbuf[DNAME_INLINE_LEN];
 
if (!inode || !IS_CASEFOLDED(inode)) {
if (len != name->len)
@@ -1090,6 +1091,22 @@ static int f2fs_d_compare(const struct dentry *dentry, 
unsigned int len,
return memcmp(str, name->name, len);
}
 
+   /*
+* If the dentry name is stored in-line, then it may be concurrently
+* modified by a rename.  If this happens, the VFS will eventually retry
+* the lookup, so it doesn't matter what ->d_compare() returns.
+* However, it's unsafe to call utf8_strncasecmp() with an unstable
+* string.  Therefore, we have to copy the name into a temporary buffer.
+*/
+   if (len <= DNAME_INLINE_LEN - 1) {
+   unsigned int i;
+
+   for (i = 0; i < len; i++)
+   strbuf[i] = READ_ONCE(str[i]);
+   strbuf[len] = 0;
+   qstr.name = strbuf;
+   }
+
return f2fs_ci_compare(inode, name, &qstr, false);
 }
 
-- 
2.26.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: avoid utf8_strncasecmp() with unstable name

2020-05-29 Thread Gabriel Krisman Bertazi
Eric Biggers  writes:

> From: Eric Biggers 
>
> If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
> it may be concurrently modified by a rename.  This can cause undefined
> behavior (possibly out-of-bounds memory accesses or crashes) in
> utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
> that may be concurrently modified.
>
> Fix this by first copying the filename to a stack buffer if needed.
> This way we get a stable snapshot of the filename.
>
> Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
> Cc:  # v5.2+
> Cc: Al Viro 
> Cc: Daniel Rosenberg 
> Cc: Gabriel Krisman Bertazi 
> Signed-off-by: Eric Biggers 
> ---
>  fs/ext4/dir.c | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index c654205f648dd..19aef8328bb18 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -675,6 +675,7 @@ static int ext4_d_compare(const struct dentry *dentry, 
> unsigned int len,
>   struct qstr qstr = {.name = str, .len = len };
>   const struct dentry *parent = READ_ONCE(dentry->d_parent);
>   const struct inode *inode = READ_ONCE(parent->d_inode);
> + char strbuf[DNAME_INLINE_LEN];
>  
>   if (!inode || !IS_CASEFOLDED(inode) ||
>   !EXT4_SB(inode->i_sb)->s_encoding) {
> @@ -683,6 +684,22 @@ static int ext4_d_compare(const struct dentry *dentry, 
> unsigned int len,
>   return memcmp(str, name->name, len);
>   }
>  
> + /*
> +  * If the dentry name is stored in-line, then it may be concurrently
> +  * modified by a rename.  If this happens, the VFS will eventually retry
> +  * the lookup, so it doesn't matter what ->d_compare() returns.
> +  * However, it's unsafe to call utf8_strncasecmp() with an unstable
> +  * string.  Therefore, we have to copy the name into a temporary buffer.
> +  */
> + if (len <= DNAME_INLINE_LEN - 1) {
> + unsigned int i;
> +
> + for (i = 0; i < len; i++)
> + strbuf[i] = READ_ONCE(str[i]);
> + strbuf[len] = 0;
> + qstr.name = strbuf;
> + }
> +

Could we avoid this if the casefolded version were cached in the dentry?
Then we could use utf8_strncasecmp_folded which would be safe.  Would
this be acceptable for vfs?

>   return ext4_ci_compare(inode, name, &qstr, false);
>  }

-- 
Gabriel Krisman Bertazi


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: avoid utf8_strncasecmp() with unstable name

2020-05-29 Thread Eric Biggers
On Sat, May 30, 2020 at 02:17:02AM -0400, Gabriel Krisman Bertazi wrote:
> >  > > +  /*
> > +* If the dentry name is stored in-line, then it may be concurrently
> > +* modified by a rename.  If this happens, the VFS will eventually retry
> > +* the lookup, so it doesn't matter what ->d_compare() returns.
> > +* However, it's unsafe to call utf8_strncasecmp() with an unstable
> > +* string.  Therefore, we have to copy the name into a temporary buffer.
> > +*/
> > +   if (len <= DNAME_INLINE_LEN - 1) {
> > +   unsigned int i;
> > +
> > +   for (i = 0; i < len; i++)
> > +   strbuf[i] = READ_ONCE(str[i]);
> > +   strbuf[len] = 0;
> > +   qstr.name = strbuf;
> > +   }
> > +
> 
> Could we avoid this if the casefolded version were cached in the dentry?
> Then we could use utf8_strncasecmp_folded which would be safe.  Would
> this be acceptable for vfs?

The VFS assumes that each dentry has one name, the one in d_name.  That's what
it passes to ->d_compare(), and that's what it updates in __d_move().

So while ext4 and f2fs could put the casefolded name in ->d_fsdata,
->d_compare() wouldn't actually have access to it (unless we added d_fsdata as a
parameter to ->d_compare()).  Also, the casefolded name would get outdated when
__d_move() changes d_name.

We could instead make d_name always be the casefolded name.  I'm not sure that
would be possible, though.  For one, I don't think ->lookup() is allowed to just
change the dentry name.  It would also make getcwd(), /proc/*/fd/, etc. always
show casefolded names, which could be problematic.  And probably other issues I
can't think of off the top of my head.

- Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: avoid utf8_strncasecmp() with unstable name

2020-05-29 Thread Gabriel Krisman Bertazi
Eric Biggers  writes:

> On Sat, May 30, 2020 at 02:17:02AM -0400, Gabriel Krisman Bertazi wrote:
>> >  > > + /*
>> > +   * If the dentry name is stored in-line, then it may be concurrently
>> > +   * modified by a rename.  If this happens, the VFS will eventually retry
>> > +   * the lookup, so it doesn't matter what ->d_compare() returns.
>> > +   * However, it's unsafe to call utf8_strncasecmp() with an unstable
>> > +   * string.  Therefore, we have to copy the name into a temporary buffer.
>> > +   */
>> > +  if (len <= DNAME_INLINE_LEN - 1) {
>> > +  unsigned int i;
>> > +
>> > +  for (i = 0; i < len; i++)
>> > +  strbuf[i] = READ_ONCE(str[i]);
>> > +  strbuf[len] = 0;
>> > +  qstr.name = strbuf;
>> > +  }
>> > +
>> 
>> Could we avoid this if the casefolded version were cached in the dentry?
>> Then we could use utf8_strncasecmp_folded which would be safe.  Would
>> this be acceptable for vfs?
>
> The VFS assumes that each dentry has one name, the one in d_name.  That's what
> it passes to ->d_compare(), and that's what it updates in __d_move().
>
> So while ext4 and f2fs could put the casefolded name in ->d_fsdata,
> ->d_compare() wouldn't actually have access to it (unless we added d_fsdata 
> as a
> parameter to ->d_compare()).  Also, the casefolded name would get outdated 
> when
> __d_move() changes d_name.

Thanks for the explanation.  I see.  I was thinking this cached name
could be invalidated if the name changes and all that jazz, but I didn't
understand this code well enough to give it a try.  I hate the need to
do this copy here, though.

Anyway,

Reviewed-by: Gabriel Krisman Bertazi 

>
> We could instead make d_name always be the casefolded name.  I'm not sure that
> would be possible, though.  For one, I don't think ->lookup() is allowed to 
> just
> change the dentry name.  It would also make getcwd(), /proc/*/fd/, etc. always
> show casefolded names, which could be problematic.  And probably other issues 
> I
> can't think of off the top of my head.
>
> - Eric

-- 
Gabriel Krisman Bertazi


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel