Re: [f2fs-dev] [PATCH v10 3/8] libfs: Introduce case-insensitive string comparison helper

2024-02-20 Thread Gabriel Krisman Bertazi
Eugen Hristev  writes:

> Okay, I am changing it.
>
> By the way, is this supposed to work like this on case-insensitive 
> directories ?
>
> user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/*cuc
> ls: cannot access '/media/CI_dir/*cuc': No such file or directory
> user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/*CUC
> -rw-r--r-- 1 root root 0 Feb 12 17:47 /media/CI_dir/CUC
> user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/cuc
> -rw-r--r-- 1 root root 0 Feb 12 17:47 /media/CI_dir/cuc
> user@debian-rockchip-rock5b-rk3588:~$
>
>
> basically wildcards don't work.

Yes, at least from a kernel point of view.  Your shell does wildcards in
userspace, probably by doing getdents and then comparing with possible
matches.  Since the shell itself is not case-insensitive aware, its
comparison is case-sensitive, and you get these apparent weird
semantics.

Not ideal from a user point of view.  But not a kernel bug.  If it
pushes people away from using case-insensitive directories in their
day-to-day work and leave it to only be used by Windows compatibility
layers, maybe that's a win? :)

-- 
Gabriel Krisman Bertazi


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v10 3/8] libfs: Introduce case-insensitive string comparison helper

2024-02-19 Thread Eugen Hristev via Linux-f2fs-devel
On 2/19/24 16:55, Gabriel Krisman Bertazi wrote:
> Eugen Hristev  writes:
> 
>> On 2/16/24 18:12, Gabriel Krisman Bertazi wrote:
>>> Eugen Hristev  writes:
>>>
 From: Gabriel Krisman Bertazi 

 generic_ci_match can be used by case-insensitive filesystems to compare
 strings under lookup with dirents in a case-insensitive way.  This
 function is currently reimplemented by each filesystem supporting
 casefolding, so this reduces code duplication in filesystem-specific
 code.

 Signed-off-by: Gabriel Krisman Bertazi 
 [eugen.hris...@collabora.com: rework to first test the exact match]
 Signed-off-by: Eugen Hristev 
 ---
  fs/libfs.c | 80 ++
  include/linux/fs.h |  4 +++
  2 files changed, 84 insertions(+)

 diff --git a/fs/libfs.c b/fs/libfs.c
 index bb18884ff20e..82871fa1b066 100644
 --- a/fs/libfs.c
 +++ b/fs/libfs.c
 @@ -1773,6 +1773,86 @@ static const struct dentry_operations 
 generic_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
  };
 +
 +/**
 + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
 + * This is a filesystem helper for comparison with directory entries.
 + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
 + *
 + * @parent: Inode of the parent of the dirent under comparison
 + * @name: name under lookup.
 + * @folded_name: Optional pre-folded name under lookup
 + * @de_name: Dirent name.
 + * @de_name_len: dirent name length.
 + *
 + *
>>>
>>> Since this need a respin, mind dropping the extra empty line here?
>>>
 + * Test whether a case-insensitive directory entry matches the filename
 + * being searched.  If @folded_name is provided, it is used instead of
 + * recalculating the casefold of @name.
 + *
 + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
 + * < 0 on error.
 + */
 +int generic_ci_match(const struct inode *parent,
 +   const struct qstr *name,
 +   const struct qstr *folded_name,
 +   const u8 *de_name, u32 de_name_len)
 +{
 +  const struct super_block *sb = parent->i_sb;
 +  const struct unicode_map *um = sb->s_encoding;
 +  struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
 +  struct qstr dirent = QSTR_INIT(de_name, de_name_len);
 +  int res;
 +
 +  if (IS_ENCRYPTED(parent)) {
 +  const struct fscrypt_str encrypted_name =
 +  FSTR_INIT((u8 *) de_name, de_name_len);
 +
 +  if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
 +  return -EINVAL;
 +
 +  decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
 +  if (!decrypted_name.name)
 +  return -ENOMEM;
 +  res = fscrypt_fname_disk_to_usr(parent, 0, 0, _name,
 +  _name);
 +  if (res < 0)
 +  goto out;
 +  dirent.name = decrypted_name.name;
 +  dirent.len = decrypted_name.len;
 +  }
 +
 +  /*
 +   * Attempt a case-sensitive match first. It is cheaper and
 +   * should cover most lookups, including all the sane
 +   * applications that expect a case-sensitive filesystem.
 +   *
>>>
>>>
 +   * This comparison is safe under RCU because the caller
 +   * guarantees the consistency between str and len. See
 +   * __d_lookup_rcu_op_compare() for details.
 +   */
>>>
>>> This paragraph doesn't really make sense here.  It is originally from
>>> the d_compare hook, which can be called under RCU, but there is no RCU
>>> here.  Also, here we are comparing the dirent with the
>>> name-under-lookup, name which is already safe.
>>>
>>>
 +  if (folded_name->name) {
 +  if (dirent.len == folded_name->len &&
 +  !memcmp(folded_name->name, dirent.name, dirent.len)) {
 +  res = 1;
 +  goto out;
 +  }
 +  res = !utf8_strncasecmp_folded(um, folded_name, );
>>>
>>> Hmm, second thought on this.  This will ignore errors from 
>>> utf8_strncasecmp*,
>>> which CAN happen for the first time here, if the dirent itself is
>>> corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop 
>>> the
>>> error, but we want to propagate it from here, such that the warning on
>>> patch 6 can trigger.
>>>
>>> This is why I did that match dance on the original submission.  Sorry
>>> for suggesting it.  We really want to get the error from utf8 and
>>> propagate it if it is negative. basically:
>>>
>>> res > 0: match
>>> res == 0: no match.
>>> res < 0: propagate error and let the caller handle it
>>
>> In that case I will revert to 

Re: [f2fs-dev] [PATCH v10 3/8] libfs: Introduce case-insensitive string comparison helper

2024-02-19 Thread Gabriel Krisman Bertazi
Eugen Hristev  writes:

> On 2/16/24 18:12, Gabriel Krisman Bertazi wrote:
>> Eugen Hristev  writes:
>> 
>>> From: Gabriel Krisman Bertazi 
>>>
>>> generic_ci_match can be used by case-insensitive filesystems to compare
>>> strings under lookup with dirents in a case-insensitive way.  This
>>> function is currently reimplemented by each filesystem supporting
>>> casefolding, so this reduces code duplication in filesystem-specific
>>> code.
>>>
>>> Signed-off-by: Gabriel Krisman Bertazi 
>>> [eugen.hris...@collabora.com: rework to first test the exact match]
>>> Signed-off-by: Eugen Hristev 
>>> ---
>>>  fs/libfs.c | 80 ++
>>>  include/linux/fs.h |  4 +++
>>>  2 files changed, 84 insertions(+)
>>>
>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>> index bb18884ff20e..82871fa1b066 100644
>>> --- a/fs/libfs.c
>>> +++ b/fs/libfs.c
>>> @@ -1773,6 +1773,86 @@ static const struct dentry_operations 
>>> generic_ci_dentry_ops = {
>>> .d_hash = generic_ci_d_hash,
>>> .d_compare = generic_ci_d_compare,
>>>  };
>>> +
>>> +/**
>>> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
>>> + * This is a filesystem helper for comparison with directory entries.
>>> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
>>> + *
>>> + * @parent: Inode of the parent of the dirent under comparison
>>> + * @name: name under lookup.
>>> + * @folded_name: Optional pre-folded name under lookup
>>> + * @de_name: Dirent name.
>>> + * @de_name_len: dirent name length.
>>> + *
>>> + *
>> 
>> Since this need a respin, mind dropping the extra empty line here?
>> 
>>> + * Test whether a case-insensitive directory entry matches the filename
>>> + * being searched.  If @folded_name is provided, it is used instead of
>>> + * recalculating the casefold of @name.
>>> + *
>>> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
>>> + * < 0 on error.
>>> + */
>>> +int generic_ci_match(const struct inode *parent,
>>> +const struct qstr *name,
>>> +const struct qstr *folded_name,
>>> +const u8 *de_name, u32 de_name_len)
>>> +{
>>> +   const struct super_block *sb = parent->i_sb;
>>> +   const struct unicode_map *um = sb->s_encoding;
>>> +   struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
>>> +   struct qstr dirent = QSTR_INIT(de_name, de_name_len);
>>> +   int res;
>>> +
>>> +   if (IS_ENCRYPTED(parent)) {
>>> +   const struct fscrypt_str encrypted_name =
>>> +   FSTR_INIT((u8 *) de_name, de_name_len);
>>> +
>>> +   if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>> +   return -EINVAL;
>>> +
>>> +   decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>> +   if (!decrypted_name.name)
>>> +   return -ENOMEM;
>>> +   res = fscrypt_fname_disk_to_usr(parent, 0, 0, _name,
>>> +   _name);
>>> +   if (res < 0)
>>> +   goto out;
>>> +   dirent.name = decrypted_name.name;
>>> +   dirent.len = decrypted_name.len;
>>> +   }
>>> +
>>> +   /*
>>> +* Attempt a case-sensitive match first. It is cheaper and
>>> +* should cover most lookups, including all the sane
>>> +* applications that expect a case-sensitive filesystem.
>>> +*
>> 
>> 
>>> +* This comparison is safe under RCU because the caller
>>> +* guarantees the consistency between str and len. See
>>> +* __d_lookup_rcu_op_compare() for details.
>>> +*/
>> 
>> This paragraph doesn't really make sense here.  It is originally from
>> the d_compare hook, which can be called under RCU, but there is no RCU
>> here.  Also, here we are comparing the dirent with the
>> name-under-lookup, name which is already safe.
>> 
>> 
>>> +   if (folded_name->name) {
>>> +   if (dirent.len == folded_name->len &&
>>> +   !memcmp(folded_name->name, dirent.name, dirent.len)) {
>>> +   res = 1;
>>> +   goto out;
>>> +   }
>>> +   res = !utf8_strncasecmp_folded(um, folded_name, );
>> 
>> Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
>> which CAN happen for the first time here, if the dirent itself is
>> corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop 
>> the
>> error, but we want to propagate it from here, such that the warning on
>> patch 6 can trigger.
>> 
>> This is why I did that match dance on the original submission.  Sorry
>> for suggesting it.  We really want to get the error from utf8 and
>> propagate it if it is negative. basically:
>> 
>> res > 0: match
>> res == 0: no match.
>> res < 0: propagate error and let the caller handle it
>
> In that case I will revert to the original v9 implementation and send a v11 to
> handle that.

Please, note that the memcmp optimization is still valid. On 

Re: [f2fs-dev] [PATCH v10 3/8] libfs: Introduce case-insensitive string comparison helper

2024-02-18 Thread Eugen Hristev via Linux-f2fs-devel
On 2/16/24 18:12, Gabriel Krisman Bertazi wrote:
> Eugen Hristev  writes:
> 
>> From: Gabriel Krisman Bertazi 
>>
>> generic_ci_match can be used by case-insensitive filesystems to compare
>> strings under lookup with dirents in a case-insensitive way.  This
>> function is currently reimplemented by each filesystem supporting
>> casefolding, so this reduces code duplication in filesystem-specific
>> code.
>>
>> Signed-off-by: Gabriel Krisman Bertazi 
>> [eugen.hris...@collabora.com: rework to first test the exact match]
>> Signed-off-by: Eugen Hristev 
>> ---
>>  fs/libfs.c | 80 ++
>>  include/linux/fs.h |  4 +++
>>  2 files changed, 84 insertions(+)
>>
>> diff --git a/fs/libfs.c b/fs/libfs.c
>> index bb18884ff20e..82871fa1b066 100644
>> --- a/fs/libfs.c
>> +++ b/fs/libfs.c
>> @@ -1773,6 +1773,86 @@ static const struct dentry_operations 
>> generic_ci_dentry_ops = {
>>  .d_hash = generic_ci_d_hash,
>>  .d_compare = generic_ci_d_compare,
>>  };
>> +
>> +/**
>> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
>> + * This is a filesystem helper for comparison with directory entries.
>> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
>> + *
>> + * @parent: Inode of the parent of the dirent under comparison
>> + * @name: name under lookup.
>> + * @folded_name: Optional pre-folded name under lookup
>> + * @de_name: Dirent name.
>> + * @de_name_len: dirent name length.
>> + *
>> + *
> 
> Since this need a respin, mind dropping the extra empty line here?
> 
>> + * Test whether a case-insensitive directory entry matches the filename
>> + * being searched.  If @folded_name is provided, it is used instead of
>> + * recalculating the casefold of @name.
>> + *
>> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
>> + * < 0 on error.
>> + */
>> +int generic_ci_match(const struct inode *parent,
>> + const struct qstr *name,
>> + const struct qstr *folded_name,
>> + const u8 *de_name, u32 de_name_len)
>> +{
>> +const struct super_block *sb = parent->i_sb;
>> +const struct unicode_map *um = sb->s_encoding;
>> +struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
>> +struct qstr dirent = QSTR_INIT(de_name, de_name_len);
>> +int res;
>> +
>> +if (IS_ENCRYPTED(parent)) {
>> +const struct fscrypt_str encrypted_name =
>> +FSTR_INIT((u8 *) de_name, de_name_len);
>> +
>> +if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>> +return -EINVAL;
>> +
>> +decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>> +if (!decrypted_name.name)
>> +return -ENOMEM;
>> +res = fscrypt_fname_disk_to_usr(parent, 0, 0, _name,
>> +_name);
>> +if (res < 0)
>> +goto out;
>> +dirent.name = decrypted_name.name;
>> +dirent.len = decrypted_name.len;
>> +}
>> +
>> +/*
>> + * Attempt a case-sensitive match first. It is cheaper and
>> + * should cover most lookups, including all the sane
>> + * applications that expect a case-sensitive filesystem.
>> + *
> 
> 
>> + * This comparison is safe under RCU because the caller
>> + * guarantees the consistency between str and len. See
>> + * __d_lookup_rcu_op_compare() for details.
>> + */
> 
> This paragraph doesn't really make sense here.  It is originally from
> the d_compare hook, which can be called under RCU, but there is no RCU
> here.  Also, here we are comparing the dirent with the
> name-under-lookup, name which is already safe.
> 
> 
>> +if (folded_name->name) {
>> +if (dirent.len == folded_name->len &&
>> +!memcmp(folded_name->name, dirent.name, dirent.len)) {
>> +res = 1;
>> +goto out;
>> +}
>> +res = !utf8_strncasecmp_folded(um, folded_name, );
> 
> Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
> which CAN happen for the first time here, if the dirent itself is
> corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop 
> the
> error, but we want to propagate it from here, such that the warning on
> patch 6 can trigger.
> 
> This is why I did that match dance on the original submission.  Sorry
> for suggesting it.  We really want to get the error from utf8 and
> propagate it if it is negative. basically:
> 
> res > 0: match
> res == 0: no match.
> res < 0: propagate error and let the caller handle it

In that case I will revert to the original v9 implementation and send a v11 to
handle that.

Eugen
> 
> 
>> +} else {
>> +if (dirent.len == name->len &&
>> +!memcmp(name->name, dirent.name, dirent.len) &&
>> +

Re: [f2fs-dev] [PATCH v10 3/8] libfs: Introduce case-insensitive string comparison helper

2024-02-16 Thread Gabriel Krisman Bertazi
Eugen Hristev  writes:

> From: Gabriel Krisman Bertazi 
>
> generic_ci_match can be used by case-insensitive filesystems to compare
> strings under lookup with dirents in a case-insensitive way.  This
> function is currently reimplemented by each filesystem supporting
> casefolding, so this reduces code duplication in filesystem-specific
> code.
>
> Signed-off-by: Gabriel Krisman Bertazi 
> [eugen.hris...@collabora.com: rework to first test the exact match]
> Signed-off-by: Eugen Hristev 
> ---
>  fs/libfs.c | 80 ++
>  include/linux/fs.h |  4 +++
>  2 files changed, 84 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index bb18884ff20e..82871fa1b066 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1773,6 +1773,86 @@ static const struct dentry_operations 
> generic_ci_dentry_ops = {
>   .d_hash = generic_ci_d_hash,
>   .d_compare = generic_ci_d_compare,
>  };
> +
> +/**
> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
> + * This is a filesystem helper for comparison with directory entries.
> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
> + *
> + * @parent: Inode of the parent of the dirent under comparison
> + * @name: name under lookup.
> + * @folded_name: Optional pre-folded name under lookup
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + *

Since this need a respin, mind dropping the extra empty line here?

> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched.  If @folded_name is provided, it is used instead of
> + * recalculating the casefold of @name.
> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> +  const struct qstr *name,
> +  const struct qstr *folded_name,
> +  const u8 *de_name, u32 de_name_len)
> +{
> + const struct super_block *sb = parent->i_sb;
> + const struct unicode_map *um = sb->s_encoding;
> + struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> + struct qstr dirent = QSTR_INIT(de_name, de_name_len);
> + int res;
> +
> + if (IS_ENCRYPTED(parent)) {
> + const struct fscrypt_str encrypted_name =
> + FSTR_INIT((u8 *) de_name, de_name_len);
> +
> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
> + return -EINVAL;
> +
> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> + if (!decrypted_name.name)
> + return -ENOMEM;
> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, _name,
> + _name);
> + if (res < 0)
> + goto out;
> + dirent.name = decrypted_name.name;
> + dirent.len = decrypted_name.len;
> + }
> +
> + /*
> +  * Attempt a case-sensitive match first. It is cheaper and
> +  * should cover most lookups, including all the sane
> +  * applications that expect a case-sensitive filesystem.
> +  *


> +  * This comparison is safe under RCU because the caller
> +  * guarantees the consistency between str and len. See
> +  * __d_lookup_rcu_op_compare() for details.
> +  */

This paragraph doesn't really make sense here.  It is originally from
the d_compare hook, which can be called under RCU, but there is no RCU
here.  Also, here we are comparing the dirent with the
name-under-lookup, name which is already safe.


> + if (folded_name->name) {
> + if (dirent.len == folded_name->len &&
> + !memcmp(folded_name->name, dirent.name, dirent.len)) {
> + res = 1;
> + goto out;
> + }
> + res = !utf8_strncasecmp_folded(um, folded_name, );

Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
which CAN happen for the first time here, if the dirent itself is
corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop the
error, but we want to propagate it from here, such that the warning on
patch 6 can trigger.

This is why I did that match dance on the original submission.  Sorry
for suggesting it.  We really want to get the error from utf8 and
propagate it if it is negative. basically:

res > 0: match
res == 0: no match.
res < 0: propagate error and let the caller handle it


> + } else {
> + if (dirent.len == name->len &&
> + !memcmp(name->name, dirent.name, dirent.len) &&
> + (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
> + res = 1;
> + goto out;
> + }
> + res = !utf8_strncasecmp(um, name, );
> + }
> +
> +out:
> + kfree(decrypted_name.name);
> +