Re: [PATCH v4 2/5] lib/lz4: update LZ4 decompressor module

2024-05-28 Thread Gao Xiang




On 2024/5/28 21:28, Jonathan Liu wrote:

Hi Jianan,

Here are the LZ4 decompression times I got running "time unlz4 ..." on
Rock 4 SE with RK3399 CPU.

v2024.04: 1.329 seconds
v2024.04 with 26c7fdadcb7 ("lib/lz4: update LZ4 decompressor module")
reverted: 0.665 seconds
v2024.04 with your patch: 1.216 seconds

I managed to get better performance by optimizing it myself.
v2024.04 with my patch: 0.785 seconds

With my patch, it makes no difference whether I use __builtin_memcpy or
memcpy in lz4.c and lz4_wrapper.c, so I just left it using memcpy.
It is still slower than reverting the LZ4 update, though.


I'm not sure where the old version came from, but judging from the
copyright it seems to be some earlier 2015 version.  Note that neither
version absolutely outperforms the other: the old version may lack some
necessary boundary checks, or trigger msan warnings or the like, as
addressed by this 2016 commit:
https://github.com/lz4/lz4/commit/f094f531441140f10fd461ba769f49d10f5cd581


/* costs ~1%; silence an msan warning when offset == 0 */
/*
 * note : when partialDecoding, there is no guarantee that
 * at least 4 bytes remain available in output buffer
 */
if (!partialDecoding) {
	assert(oend > op);
	assert(oend - op >= 4);

	LZ4_write32(op, (U32)offset);
}
For the example above, you could remove the LZ4_write32() line entirely
if you don't care about such warnings, since it's claimed to cost ~1%
in performance.

Also, since no U-Boot user does in-place decompression, if memmove()
doesn't perform well on your platform, you could update the
following line
/*
 * supports overlapping memory regions; only matters
 * for in-place decompression scenarios
 */
memmove(op, ip, length);
to memcpy() instead.

The new lz4 codebase relies on memcpy()/memmove() compiler optimization
more than the old version did; if memcpy() doesn't compile to good
assembly on your platform, that could cause such a slowdown.

Thanks,
Gao Xiang



My patch:
--- a/lib/lz4.c
+++ b/lib/lz4.c
@@ -41,6 +41,16 @@ static FORCE_INLINE u16 LZ4_readLE16(const void *src)
 return get_unaligned_le16(src);
  }

+static FORCE_INLINE void LZ4_copy2(void *dst, const void *src)
+{
+ put_unaligned(get_unaligned((const u16 *)src), (u16 *)dst);
+}
+
+static FORCE_INLINE void LZ4_copy4(void *dst, const void *src)
+{
+ put_unaligned(get_unaligned((const u32 *)src), (u32 *)dst);
+}
+
  static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
  {
 put_unaligned(get_unaligned((const u64 *)src), (u64 *)dst);
@@ -215,7 +225,10 @@ static FORCE_INLINE int LZ4_decompress_generic(
&& likely((endOnInput ? ip < shortiend : 1) &
  (op <= shortoend))) {
 /* Copy the literals */
-   memcpy(op, ip, endOnInput ? 16 : 8);
+ LZ4_copy8(op, ip);
+ if (endOnInput)
+ LZ4_copy8(op + 8, ip + 8);
+
 op += length; ip += length;

 /*
@@ -234,9 +247,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
 (offset >= 8) &&
 (dict == withPrefix64k || match >= lowPrefix)) {
 /* Copy the match. */
-   memcpy(op + 0, match + 0, 8);
-   memcpy(op + 8, match + 8, 8);
-   memcpy(op + 16, match + 16, 2);
+ LZ4_copy8(op, match);
+ LZ4_copy8(op + 8, match + 8);
+ LZ4_copy2(op + 16, match + 16);
 op += length + MINMATCH;
 /* Both stages worked, load the next token. */
 continue;
@@ -466,7 +479,7 @@ _copy_match:
 op[2] = match[2];
 op[3] = match[3];
 match += inc32table[offset];
-   memcpy(op + 4, match, 4);
+ LZ4_copy4(op + 4, match);
 match -= dec64table[offset];
 } else {
 LZ4_copy8(op, match);

Let me know if you have any further suggestions.

Thanks.

Regards,
Jonathan

On Sun, 26 May 2024 at 22:18, Jianan Huang  wrote:


Hi Jonathan,

Could you please try the following patch? It replaces all memcpy() calls in
lz4 with __builtin_memcpy().

diff --git a/lib/lz4.c b/lib/lz4.c
index d365dc727c..2afe31c1c3 100644
--- a/lib/lz4.c
+++ b/lib/lz4.c
@@ -34,6 +34,8 @@
  #include 
  #include 

+#define LZ4_memcpy(dst, src, size) __builtin_memcpy(dst, src, size)
+
  #define FORCE_INLINE inline

[PATCH] erofs-utils: fix false-positive errors on gcc 4.8.5

2024-05-28 Thread Gao Xiang
Just old compiler bugs.

Signed-off-by: Gao Xiang 
---
 lib/data.c  | 2 +-
 lib/dedupe.c| 2 +-
 lib/fragments.c | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/data.c b/lib/data.c
index a87053f..c139e0c 100644
--- a/lib/data.c
+++ b/lib/data.c
@@ -420,7 +420,7 @@ static void *erofs_read_metadata_bdi(struct erofs_sb_info 
*sbi,
ret = blk_read(sbi, 0, data, erofs_blknr(sbi, *offset), 1);
if (ret)
return ERR_PTR(ret);
-   len = le16_to_cpu(*(__le16 *)&data[erofs_blkoff(sbi, *offset)]);
+   len = le16_to_cpu(*(__le16 *)(data + erofs_blkoff(sbi, *offset)));
if (!len)
return ERR_PTR(-EFSCORRUPTED);
 
diff --git a/lib/dedupe.c b/lib/dedupe.c
index aaaccb5..e475635 100644
--- a/lib/dedupe.c
+++ b/lib/dedupe.c
@@ -99,7 +99,7 @@ int z_erofs_dedupe_match(struct z_erofs_dedupe_ctx *ctx)
struct z_erofs_dedupe_item *e;
 
unsigned int extra = 0;
-   u64 xxh64_csum;
+   u64 xxh64_csum = 0;
 
if (initial) {
/* initial try */
diff --git a/lib/fragments.c b/lib/fragments.c
index d4f6be1..f4c9bd7 100644
--- a/lib/fragments.c
+++ b/lib/fragments.c
@@ -289,6 +289,8 @@ int z_erofs_pack_file_from_fd(struct erofs_inode *inode, 
int fd,
if (memblock)
rc = z_erofs_fragments_dedupe_insert(memblock,
inode->fragment_size, inode->fragmentoff, tofcrc);
+   else
+   rc = 0;
 out:
if (memblock)
munmap(memblock, inode->i_size);
-- 
2.39.3



Re: [PATCH v4 2/5] lib/lz4: update LZ4 decompressor module

2024-05-24 Thread Gao Xiang

Hi,

On 2024/5/24 22:26, Jonathan Liu wrote:

Hi Jianan,

On Sat, 26 Feb 2022 at 18:05, Huang Jianan  wrote:


Update the LZ4 compression module based on LZ4 v1.8.3 in order to
use the newest LZ4_decompress_safe_partial(), which can now decode
exactly the number of bytes requested.

Signed-off-by: Huang Jianan 


I noticed LZ4 decompression is slower after this commit.
The ulz4fn() call takes 1.209670 seconds with this commit.
After reverting this commit, the ulz4fn() call takes 0.587032 seconds.

I am decompressing a LZ4 compressed kernel (compressed with lz4 v1.9.4
using -9 option for maximum compression) on RK3399.

Any ideas why it is slower with this commit and how the performance
regression can be fixed?


Just at a quick glance, I think the issue may be due to memcpy/memmove,
since that seems to be the main difference between these two codebases
(I'm not sure which LZ4 version the old codebase was based on): the
new version mainly relies on memcpy/memmove instead of its own copy
helpers.

Would you mind checking how the memcpy/memset assembly is generated
on your platform?

Thanks,
Gao Xiang



Thanks.

Regards,
Jonathan


[GIT PULL] erofs more updates for 6.10-rc1

2024-05-24 Thread Gao Xiang
Hi Linus,

Could you consider these extra patches for 6.10-rc1?

The main ones are the metadata API conversion to byte offsets by
Al Viro.  Since some of the patches are also part of the VFS
"->bd_inode elimination" work (and they were merged upstream days ago),
I did a merge commit to resolve the dependency, with a detailed
description.

Another patch gets rid of an unnecessary memory allocation in the
DEFLATE decompressor.  The remaining one is a trivial cleanup.

All commits have been in -next, and no merge conflicts have been
observed.

Thanks,
Gao Xiang

The following changes since commit 7c35de4df1056a5a1fb4de042197b8f5b1033b61:

  erofs: Zstandard compression support (2024-05-09 07:46:56 +0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git 
tags/erofs-for-6.10-rc1-2

for you to fetch changes up to 80eb4f62056d6ae709bdd0636ab96ce660f494b2:

  erofs: avoid allocating DEFLATE streams before mounting (2024-05-21 03:07:39 
+0800)


Changes since last update:

 - Convert metadata APIs to byte offsets;

 - Avoid allocating DEFLATE streams unnecessarily;

 - Some erofs_show_options() cleanup.


Al Viro (6):
  erofs: switch erofs_bread() to passing offset instead of block number
  erofs_buf: store address_space instead of inode
  erofs: mechanically convert erofs_read_metabuf() to offsets
  erofs: don't align offset for erofs_read_metabuf() (simple cases)
  erofs: don't round offset down for erofs_read_metabuf()
  z_erofs_pcluster_begin(): don't bother with rounding position down

Gao Xiang (2):
  Merge branch 'misc.erofs' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
  erofs: avoid allocating DEFLATE streams before mounting

Hongzhen Luo (1):
  erofs: clean up erofs_show_options()

 fs/erofs/data.c | 25 +--
 fs/erofs/decompressor_deflate.c | 55 ++---
 fs/erofs/dir.c  |  4 +--
 fs/erofs/fscache.c  | 12 +++--
 fs/erofs/inode.c|  4 +--
 fs/erofs/internal.h |  9 +++
 fs/erofs/namei.c|  6 ++---
 fs/erofs/super.c| 44 +++--
 fs/erofs/xattr.c| 37 +++
 fs/erofs/zdata.c|  8 +++---
 fs/erofs/zmap.c | 24 +-
 11 files changed, 97 insertions(+), 131 deletions(-)


Re: [PATCH 1/2] erofs-utils: lib: provide helper to disable hashmap shrinking

2024-05-23 Thread Gao Xiang




On 2024/5/24 05:01, Sandeep Dhavale wrote:

This helper sets hashmap.shrink_at to 0. This is helpful for iterating
over the hashmap using hashmap_iter_next() and removing entries with
hashmap_remove() in a single pass efficiently.

Signed-off-by: Sandeep Dhavale 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH 2/2] erofs-utils: lib: improve freeing hashmap in erofs_blob_exit()

2024-05-23 Thread Gao Xiang

Hi Sandeep,

On 2024/5/24 05:01, Sandeep Dhavale wrote:

Depending on the size of the filesystem being built, there can be a huge
number of elements in the hashmap. Currently we call hashmap_iter_first()
in a while loop to iterate over and free the elements. While technically
correct, this is inefficient in two respects:

- As we are iterating over the elements for removal, we do not need the
overhead of rehashing.
- The part that contributes most to the slowdown is hashmap_iter_first()
itself, which starts scanning from index 0 and throws away the previous
successful scan. For a sparsely populated hashmap this becomes
O(n^2) in the worst case.

Let's fix this by disabling hashmap shrinking, which avoids rehashing,
and by using hashmap_iter_next(), which is now guaranteed to iterate over
all the elements while removing, avoiding the performance pitfalls
of hashmap_iter_first().

A test with random data shows the performance improvement:

fs_size  Before   After
1G       23s      7s
2G       81s      15s
4G       272s     31s
8G       1252s    61s


Sigh.. BTW, in the long term, I guess we might need to
find a better hashmap implementation (with MIT or BSD
license) instead of this one.



Signed-off-by: Sandeep Dhavale 
---
  lib/blobchunk.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/blobchunk.c b/lib/blobchunk.c
index 645bcc1..8082aa4 100644
--- a/lib/blobchunk.c
+++ b/lib/blobchunk.c
@@ -548,11 +548,17 @@ void erofs_blob_exit(void)
if (blobfile)
fclose(blobfile);
  
-	while ((e = hashmap_iter_first(&blob_hashmap, &iter))) {

+   /* Disable hashmap shrink, effectively disabling rehash.
+* This way we can iterate over entire hashmap efficiently
+* and safely by using hashmap_iter_next() */


I will fix up the comment style manually, otherwise it
looks good to me...

Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH] build: support building static library

2024-05-23 Thread Gao Xiang

Hi Comix!

On 2024/5/23 15:31, ComixHe wrote:

In some cases, developers may need to integrate erofs-utils into their
project as a static library to reduce package dependencies and
have finer control over the features used by the project.


Thanks for sharing this.



For example, squashfuse provides a static library `libsquashfuse.a` and
exposes some useful functions; AppImage uses this static library to build
images. It ensures that the executable image can be executed directly
on most Linux platforms without the user needing to install squashfuse
in order to execute the image.

Signed-off-by: ComixHe 
---
  configure.ac | 28 
  dump/Makefile.am | 10 ++
  fsck/Makefile.am | 10 ++
  fuse/Makefile.am | 10 ++
  mkfs/Makefile.am | 10 ++
  5 files changed, 68 insertions(+)

diff --git a/configure.ac b/configure.ac
index 1989bca..16ddb7c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -147,6 +147,30 @@ AC_ARG_ENABLE(fuse,
 [AS_HELP_STRING([--enable-fuse], [enable erofsfuse @<:@default=no@:>@])],
 [enable_fuse="$enableval"], [enable_fuse="no"])
  
+AC_ARG_ENABLE([static-fuse],

+[AS_HELP_STRING([--enable-static-fuse],
+[build erofsfuse as a static library @<:@default=no@:>@])],
+[enable_static_fuse="$enableval"],
+[enable_static_fuse="no"])
+
+AC_ARG_ENABLE([static-dump],
+[AS_HELP_STRING([--enable-static-dump],
+[build dump.erofs as a static library 
@<:@default=no@:>@])],
+[enable_static_dump="$enableval"],
+[enable_static_dump="no"])
+
+AC_ARG_ENABLE([static-mkfs],
+[AS_HELP_STRING([--enable-static-mkfs],
+[build mkfs.erofs as a static library 
@<:@default=no@:>@])],
+[enable_static_mkfs="$enableval"],
+[enable_static_mkfs="no"])
+
+AC_ARG_ENABLE([static-fsck],
+[AS_HELP_STRING([--enable-static-fsck],
+[build fsck.erofs as a static library 
@<:@default=no@:>@])],
+[enable_static_fsck="$enableval"],
+[enable_static_fsck="no"])


But how could we support static libraries from binaries?

I guess you need a static liberofs instead?

Thanks,
Gao Xiang


[PATCH] erofs-utils: lib: fix uncompressed packed inode

2024-05-22 Thread Gao Xiang
Currently, the packed inode can also be used in the unencoded
(uncompressed) way, such as for xattr prefixes.

Signed-off-by: Gao Xiang 
---
 lib/inode.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index cbe0810..cd48e55 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1710,24 +1710,24 @@ struct erofs_inode 
*erofs_mkfs_build_special_from_fd(int fd, const char *name)
inode->nid = inode->sbi->packed_nid;
}
 
-   ictx = erofs_begin_compressed_file(inode, fd, 0);
-   if (IS_ERR(ictx))
-   return ERR_CAST(ictx);
+   if (cfg.c_compr_opts[0].alg &&
+   erofs_file_is_compressible(inode)) {
+   ictx = erofs_begin_compressed_file(inode, fd, 0);
+   if (IS_ERR(ictx))
+   return ERR_CAST(ictx);
+
+   DBG_BUGON(!ictx);
+   ret = erofs_write_compressed_file(ictx);
+   if (ret && ret != -ENOSPC)
+   return ERR_PTR(ret);
 
-   DBG_BUGON(!ictx);
-   ret = erofs_write_compressed_file(ictx);
-   if (ret == -ENOSPC) {
ret = lseek(fd, 0, SEEK_SET);
if (ret < 0)
return ERR_PTR(-errno);
-
-   ret = write_uncompressed_file_from_fd(inode, fd);
}
-
-   if (ret) {
-   DBG_BUGON(ret == -ENOSPC);
+   ret = write_uncompressed_file_from_fd(inode, fd);
+   if (ret)
return ERR_PTR(ret);
-   }
erofs_prepare_inode_buffer(inode);
erofs_write_tail_end(inode);
return inode;
-- 
2.39.3



[PATCH 6.6.y 2/2] erofs: reliably distinguish block based and fscache mode

2024-05-21 Thread Gao Xiang
From: Christian Brauner 

commit 7af2ae1b1531feab5d38ec9c8f472dc6cceb4606 upstream.

When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:


ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
 
 erofs_kill_sb+0x81/0x90
 deactivate_locked_super+0x35/0x80
 get_tree_bdev+0x136/0x1e0
 vfs_get_tree+0x2c/0xf0
 do_new_mount+0x190/0x2f0
 [...]


Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.

Signed-off-by: Christian Brauner 
Signed-off-by: Baokun Li 
Reviewed-by: Jingbo Xu 
Reviewed-by: Gao Xiang 
Reviewed-by: Chao Yu 
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaok...@huawei.com
Signed-off-by: Gao Xiang 
---
 fs/erofs/super.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index ea9bb0ad2a7c..113414e6f35b 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -786,17 +786,13 @@ static int erofs_init_fs_context(struct fs_context *fc)
 
 static void erofs_kill_sb(struct super_block *sb)
 {
-   struct erofs_sb_info *sbi;
+   struct erofs_sb_info *sbi = EROFS_SB(sb);
 
-   if (erofs_is_fscache_mode(sb))
+   if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && sbi->fsid)
kill_anon_super(sb);
else
kill_block_super(sb);
 
-   sbi = EROFS_SB(sb);
-   if (!sbi)
-   return;
-
erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev, NULL);
erofs_fscache_unregister_fs(sb);
-- 
2.39.3



[PATCH 6.6.y 1/2] erofs: get rid of erofs_fs_context

2024-05-21 Thread Gao Xiang
From: Baokun Li 

commit 07abe43a28b2c660f726d66f5470f7f114f9643a upstream.

Instead of allocating the erofs_sb_info in fill_super() allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this erofs_fs_context is no longer
needed, replace ctx with sbi, no functional changes.

Suggested-by: Jingbo Xu 
Signed-off-by: Baokun Li 
Reviewed-by: Jingbo Xu 
Reviewed-by: Gao Xiang 
Reviewed-by: Chao Yu 
Link: https://lore.kernel.org/r/20240419123611.947084-2-libaok...@huawei.com
[ Gao Xiang: trivial conflict due to a warning message. ]
Signed-off-by: Gao Xiang 
---
 fs/erofs/internal.h |   7 ---
 fs/erofs/super.c| 116 
 2 files changed, 53 insertions(+), 70 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 1a4fe9f60295..787cc9ff9029 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -82,13 +82,6 @@ struct erofs_dev_context {
bool flatdev;
 };
 
-struct erofs_fs_context {
-   struct erofs_mount_opts opt;
-   struct erofs_dev_context *devs;
-   char *fsid;
-   char *domain_id;
-};
-
 /* all filesystem-wide lz4 configurations */
 struct erofs_sb_lz4_info {
/* # of pages needed for EROFS lz4 rolling decompression */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index c9f9a43197db..ea9bb0ad2a7c 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -367,18 +367,18 @@ static int erofs_read_superblock(struct super_block *sb)
return ret;
 }
 
-static void erofs_default_options(struct erofs_fs_context *ctx)
+static void erofs_default_options(struct erofs_sb_info *sbi)
 {
 #ifdef CONFIG_EROFS_FS_ZIP
-   ctx->opt.cache_strategy = EROFS_ZIP_CACHE_READAROUND;
-   ctx->opt.max_sync_decompress_pages = 3;
-   ctx->opt.sync_decompress = EROFS_SYNC_DECOMPRESS_AUTO;
+   sbi->opt.cache_strategy = EROFS_ZIP_CACHE_READAROUND;
+   sbi->opt.max_sync_decompress_pages = 3;
+   sbi->opt.sync_decompress = EROFS_SYNC_DECOMPRESS_AUTO;
 #endif
 #ifdef CONFIG_EROFS_FS_XATTR
-   set_opt(&ctx->opt, XATTR_USER);
+   set_opt(&sbi->opt, XATTR_USER);
 #endif
 #ifdef CONFIG_EROFS_FS_POSIX_ACL
-   set_opt(&ctx->opt, POSIX_ACL);
+   set_opt(&sbi->opt, POSIX_ACL);
 #endif
 }
 
@@ -423,17 +423,17 @@ static const struct fs_parameter_spec 
erofs_fs_parameters[] = {
 static bool erofs_fc_set_dax_mode(struct fs_context *fc, unsigned int mode)
 {
 #ifdef CONFIG_FS_DAX
-   struct erofs_fs_context *ctx = fc->fs_private;
+   struct erofs_sb_info *sbi = fc->s_fs_info;
 
switch (mode) {
case EROFS_MOUNT_DAX_ALWAYS:
warnfc(fc, "DAX enabled. Warning: EXPERIMENTAL, use at your own 
risk");
-   set_opt(&ctx->opt, DAX_ALWAYS);
-   clear_opt(&ctx->opt, DAX_NEVER);
+   set_opt(&sbi->opt, DAX_ALWAYS);
+   clear_opt(&sbi->opt, DAX_NEVER);
return true;
case EROFS_MOUNT_DAX_NEVER:
-   set_opt(&ctx->opt, DAX_NEVER);
-   clear_opt(&ctx->opt, DAX_ALWAYS);
+   set_opt(&sbi->opt, DAX_NEVER);
+   clear_opt(&sbi->opt, DAX_ALWAYS);
return true;
default:
DBG_BUGON(1);
@@ -448,7 +448,7 @@ static bool erofs_fc_set_dax_mode(struct fs_context *fc, 
unsigned int mode)
 static int erofs_fc_parse_param(struct fs_context *fc,
struct fs_parameter *param)
 {
-   struct erofs_fs_context *ctx = fc->fs_private;
+   struct erofs_sb_info *sbi = fc->s_fs_info;
struct fs_parse_result result;
struct erofs_device_info *dif;
int opt, ret;
@@ -461,9 +461,9 @@ static int erofs_fc_parse_param(struct fs_context *fc,
case Opt_user_xattr:
 #ifdef CONFIG_EROFS_FS_XATTR
if (result.boolean)
-   set_opt(&ctx->opt, XATTR_USER);
+   set_opt(&sbi->opt, XATTR_USER);
else
-   clear_opt(&ctx->opt, XATTR_USER);
+   clear_opt(&sbi->opt, XATTR_USER);
 #else
errorfc(fc, "{,no}user_xattr options not supported");
 #endif
@@ -471,16 +471,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
case Opt_acl:
 #ifdef CONFIG_EROFS_FS_POSIX_ACL
if (result.boolean)
-   set_opt(&ctx->opt, POSIX_ACL);
+   set_opt(&sbi->opt, POSIX_ACL);
else
-   clear_opt(&ctx->opt, POSIX_ACL);
+   clear_opt(&sbi->opt, POSIX_ACL);
 #else
errorfc(fc, "{,no}acl options not supported");
 #endif
break;
case Opt_cache_strategy:
 #ifdef CONFIG_EROFS_FS_ZIP
-   ctx->opt.cache_strategy = result.uint_32;
+   sbi->opt.cache_strategy = result.uint_32;
 #else
errorfc(fc, "compre

[PATCH 6.8.y 1/2] erofs: get rid of erofs_fs_context

2024-05-21 Thread Gao Xiang
From: Baokun Li 

commit 07abe43a28b2c660f726d66f5470f7f114f9643a upstream.

Instead of allocating the erofs_sb_info in fill_super() allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this erofs_fs_context is no longer
needed, replace ctx with sbi, no functional changes.

Suggested-by: Jingbo Xu 
Signed-off-by: Baokun Li 
Reviewed-by: Jingbo Xu 
Reviewed-by: Gao Xiang 
Reviewed-by: Chao Yu 
Link: https://lore.kernel.org/r/20240419123611.947084-2-libaok...@huawei.com
[ Gao Xiang: trivial conflict due to a warning message. ]
Signed-off-by: Gao Xiang 
---
 fs/erofs/internal.h |   7 ---
 fs/erofs/super.c| 116 
 2 files changed, 53 insertions(+), 70 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 410f5af62354..c69174675caf 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -84,13 +84,6 @@ struct erofs_dev_context {
bool flatdev;
 };
 
-struct erofs_fs_context {
-   struct erofs_mount_opts opt;
-   struct erofs_dev_context *devs;
-   char *fsid;
-   char *domain_id;
-};
-
 /* all filesystem-wide lz4 configurations */
 struct erofs_sb_lz4_info {
/* # of pages needed for EROFS lz4 rolling decompression */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 24788c230b49..8d7a3abb9c1b 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -370,18 +370,18 @@ static int erofs_read_superblock(struct super_block *sb)
return ret;
 }
 
-static void erofs_default_options(struct erofs_fs_context *ctx)
+static void erofs_default_options(struct erofs_sb_info *sbi)
 {
 #ifdef CONFIG_EROFS_FS_ZIP
-   ctx->opt.cache_strategy = EROFS_ZIP_CACHE_READAROUND;
-   ctx->opt.max_sync_decompress_pages = 3;
-   ctx->opt.sync_decompress = EROFS_SYNC_DECOMPRESS_AUTO;
+   sbi->opt.cache_strategy = EROFS_ZIP_CACHE_READAROUND;
+   sbi->opt.max_sync_decompress_pages = 3;
+   sbi->opt.sync_decompress = EROFS_SYNC_DECOMPRESS_AUTO;
 #endif
 #ifdef CONFIG_EROFS_FS_XATTR
-   set_opt(&ctx->opt, XATTR_USER);
+   set_opt(&sbi->opt, XATTR_USER);
 #endif
 #ifdef CONFIG_EROFS_FS_POSIX_ACL
-   set_opt(&ctx->opt, POSIX_ACL);
+   set_opt(&sbi->opt, POSIX_ACL);
 #endif
 }
 
@@ -426,17 +426,17 @@ static const struct fs_parameter_spec 
erofs_fs_parameters[] = {
 static bool erofs_fc_set_dax_mode(struct fs_context *fc, unsigned int mode)
 {
 #ifdef CONFIG_FS_DAX
-   struct erofs_fs_context *ctx = fc->fs_private;
+   struct erofs_sb_info *sbi = fc->s_fs_info;
 
switch (mode) {
case EROFS_MOUNT_DAX_ALWAYS:
warnfc(fc, "DAX enabled. Warning: EXPERIMENTAL, use at your own 
risk");
-   set_opt(&ctx->opt, DAX_ALWAYS);
-   clear_opt(&ctx->opt, DAX_NEVER);
+   set_opt(&sbi->opt, DAX_ALWAYS);
+   clear_opt(&sbi->opt, DAX_NEVER);
return true;
case EROFS_MOUNT_DAX_NEVER:
-   set_opt(&ctx->opt, DAX_NEVER);
-   clear_opt(&ctx->opt, DAX_ALWAYS);
+   set_opt(&sbi->opt, DAX_NEVER);
+   clear_opt(&sbi->opt, DAX_ALWAYS);
return true;
default:
DBG_BUGON(1);
@@ -451,7 +451,7 @@ static bool erofs_fc_set_dax_mode(struct fs_context *fc, 
unsigned int mode)
 static int erofs_fc_parse_param(struct fs_context *fc,
struct fs_parameter *param)
 {
-   struct erofs_fs_context *ctx = fc->fs_private;
+   struct erofs_sb_info *sbi = fc->s_fs_info;
struct fs_parse_result result;
struct erofs_device_info *dif;
int opt, ret;
@@ -464,9 +464,9 @@ static int erofs_fc_parse_param(struct fs_context *fc,
case Opt_user_xattr:
 #ifdef CONFIG_EROFS_FS_XATTR
if (result.boolean)
-   set_opt(&ctx->opt, XATTR_USER);
+   set_opt(&sbi->opt, XATTR_USER);
else
-   clear_opt(&ctx->opt, XATTR_USER);
+   clear_opt(&sbi->opt, XATTR_USER);
 #else
errorfc(fc, "{,no}user_xattr options not supported");
 #endif
@@ -474,16 +474,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
case Opt_acl:
 #ifdef CONFIG_EROFS_FS_POSIX_ACL
if (result.boolean)
-   set_opt(&ctx->opt, POSIX_ACL);
+   set_opt(&sbi->opt, POSIX_ACL);
else
-   clear_opt(&ctx->opt, POSIX_ACL);
+   clear_opt(&sbi->opt, POSIX_ACL);
 #else
errorfc(fc, "{,no}acl options not supported");
 #endif
break;
case Opt_cache_strategy:
 #ifdef CONFIG_EROFS_FS_ZIP
-   ctx->opt.cache_strategy = result.uint_32;
+   sbi->opt.cache_strategy = result.uint_32;
 #else
errorfc(fc, "compre

[PATCH 6.8.y 2/2] erofs: reliably distinguish block based and fscache mode

2024-05-21 Thread Gao Xiang
From: Christian Brauner 

commit 7af2ae1b1531feab5d38ec9c8f472dc6cceb4606 upstream.

When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:


ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
 
 erofs_kill_sb+0x81/0x90
 deactivate_locked_super+0x35/0x80
 get_tree_bdev+0x136/0x1e0
 vfs_get_tree+0x2c/0xf0
 do_new_mount+0x190/0x2f0
 [...]


Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.

Signed-off-by: Christian Brauner 
Signed-off-by: Baokun Li 
Reviewed-by: Jingbo Xu 
Reviewed-by: Gao Xiang 
Reviewed-by: Chao Yu 
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaok...@huawei.com
Signed-off-by: Gao Xiang 
---
 fs/erofs/super.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 8d7a3abb9c1b..a2fa74558570 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -790,17 +790,13 @@ static int erofs_init_fs_context(struct fs_context *fc)
 
 static void erofs_kill_sb(struct super_block *sb)
 {
-   struct erofs_sb_info *sbi;
+   struct erofs_sb_info *sbi = EROFS_SB(sb);
 
-   if (erofs_is_fscache_mode(sb))
+   if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && sbi->fsid)
kill_anon_super(sb);
else
kill_block_super(sb);
 
-   sbi = EROFS_SB(sb);
-   if (!sbi)
-   return;
-
erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev, NULL);
erofs_fscache_unregister_fs(sb);
-- 
2.39.3



Re: [PATCH v2 4/5] cachefiles: cyclic allocation of msg_id to avoid reuse

2024-05-20 Thread Gao Xiang




On 2024/5/21 10:36, Baokun Li wrote:

On 2024/5/20 22:56, Gao Xiang wrote:

Hi Baokun,

On 2024/5/20 21:24, Baokun Li wrote:

On 2024/5/20 20:54, Gao Xiang wrote:



On 2024/5/20 20:42, Baokun Li wrote:

On 2024/5/20 18:04, Jeff Layton wrote:

On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:

Hi Jeff,

Thank you very much for your review!

On 2024/5/19 19:11, Jeff Layton wrote:

On Wed, 2024-05-15 at 20:51 +0800, libao...@huaweicloud.com wrote:

From: Baokun Li 

Reusing the msg_id after a maliciously completed reopen request may cause
a read request to remain unprocessed and result in a hung task, as shown
below:

 t1   |  t2   |  t3
-
cachefiles_ondemand_select_req
   cachefiles_ondemand_object_is_close(A)
   cachefiles_ondemand_set_object_reopening(A)
   queue_work(fscache_object_wq, >work)
  ondemand_object_worker
cachefiles_ondemand_init_object(A)
cachefiles_ondemand_send_req(OPEN)
  // get msg_id 6
wait_for_completion(&req_A->done)
cachefiles_ondemand_daemon_read
   // read msg_id 6 req_A
   cachefiles_ondemand_get_fd
   copy_to_user
  // Malicious completion msg_id 6
  copen 6,-1
cachefiles_ondemand_copen
complete(&req_A->done)
   // will not set the object to close
   // because ondemand_id && fd is valid.

  // ondemand_object_worker() is done
  // but the object is still reopening.

  // new open req_B
cachefiles_ondemand_init_object(B)
cachefiles_ondemand_send_req(OPEN)
   // reuse msg_id 6
process_open_req
   copen 6,A.size
   // The expected failed copen was executed successfully

Expect copen to fail, and when it does, it closes fd, which sets the
object to close, and then close triggers reopen again. However, due to
msg_id reuse resulting in a successful copen, the anonymous fd is not
closed until the daemon exits. Therefore read requests waiting for reopen
to complete may trigger hung task.

To avoid this issue, allocate the msg_id cyclically to avoid reusing the
msg_id for a very short duration of time.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up 
cookie")
Signed-off-by: Baokun Li 
---
   fs/cachefiles/internal.h |  1 +
   fs/cachefiles/ondemand.c | 20 
   2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8ecd296cc1c4..9200c00f3e98 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -128,6 +128,7 @@ struct cachefiles_cache {
   unsigned long    req_id_next;
   struct xarray    ondemand_ids;    /* xarray for ondemand_id 
allocation */
   u32    ondemand_id_next;
+    u32    msg_id_next;
   };
   static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache 
*cache)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index f6440b3e7368..b10952f77472 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct 
cachefiles_object *object,
   smp_mb();
   if (opcode == CACHEFILES_OP_CLOSE &&
- !cachefiles_ondemand_object_is_open(object)) {
+ !cachefiles_ondemand_object_is_open(object)) {
WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
   xas_unlock();
   ret = -EIO;
   goto out;
   }
-    xas.xa_index = 0;
+    /*
+ * Cyclically find a free xas to avoid msg_id reuse that would
+ * cause the daemon to successfully copen a stale msg_id.
+ */
+    xas.xa_index = cache->msg_id_next;
    xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+    if (xas.xa_node == XAS_RESTART) {
+    xas.xa_index = 0;
+    xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
+    }
    if (xas.xa_node == XAS_RESTART)
    xas_set_err(&xas, -EBUSY);
+
    xas_store(&xas, req);
-    xas_clear_mark(&xas, XA_FREE_MARK);
-    xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+    if (xas_valid(&xas)) {
+    cache->msg_id_next = xas.xa_index + 1;

If you have a long-standing stuck request, could this counter wrap
around and you still end up with reuse?

Yes, msg_id_next is declared as type u32 in the hope that when
xa_index == UINT_MAX, a wrap-around occurs so that msg_id_next
goes to zero. Limiting xa_index to no more than UINT_MAX avoids
the xarray becoming too deep.

If msg_id_next is equal to the id of a long-standing stuck request
after the wrap-around, it is true that the reuse in the above problem
may also occur.

But I feel that a long-stuck request is problematic in itself; it means
that after we have sent 4294967

Re: [PATCH v2 4/5] cachefiles: cyclic allocation of msg_id to avoid reuse

2024-05-20 Thread Gao Xiang

Hi Baokun,

On 2024/5/20 21:24, Baokun Li wrote:

On 2024/5/20 20:54, Gao Xiang wrote:



On 2024/5/20 20:42, Baokun Li wrote:

On 2024/5/20 18:04, Jeff Layton wrote:

On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:

Hi Jeff,

Thank you very much for your review!

On 2024/5/19 19:11, Jeff Layton wrote:

On Wed, 2024-05-15 at 20:51 +0800, libao...@huaweicloud.com wrote:

From: Baokun Li 

Reusing the msg_id after a maliciously completed reopen request may cause
a read request to remain unprocessed and result in a hang, as shown below:

 t1   |  t2   |  t3
-
cachefiles_ondemand_select_req
   cachefiles_ondemand_object_is_close(A)
   cachefiles_ondemand_set_object_reopening(A)
   queue_work(fscache_object_wq, >work)
  ondemand_object_worker
   cachefiles_ondemand_init_object(A)
    cachefiles_ondemand_send_req(OPEN)
  // get msg_id 6
wait_for_completion(&req_A->done)
cachefiles_ondemand_daemon_read
   // read msg_id 6 req_A
   cachefiles_ondemand_get_fd
   copy_to_user
  // Malicious completion msg_id 6
  copen 6,-1
cachefiles_ondemand_copen
complete(&req_A->done)
   // will not set the object to close
   // because ondemand_id && fd is valid.

  // ondemand_object_worker() is done
  // but the object is still reopening.

  // new open req_B
cachefiles_ondemand_init_object(B)
cachefiles_ondemand_send_req(OPEN)
   // reuse msg_id 6
process_open_req
   copen 6,A.size
   // The expected failed copen was executed successfully

Expect copen to fail, and when it does, it closes fd, which sets the
object to close, and then close triggers reopen again. However, due to
msg_id reuse resulting in a successful copen, the anonymous fd is not
closed until the daemon exits. Therefore read requests waiting for reopen
to complete may trigger a hung task.

To avoid this issue, allocate the msg_id cyclically to avoid reusing the
msg_id for a very short duration of time.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up 
cookie")
Signed-off-by: Baokun Li 
---
   fs/cachefiles/internal.h |  1 +
   fs/cachefiles/ondemand.c | 20 
   2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8ecd296cc1c4..9200c00f3e98 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -128,6 +128,7 @@ struct cachefiles_cache {
   unsigned long    req_id_next;
   struct xarray    ondemand_ids;    /* xarray for ondemand_id 
allocation */
   u32    ondemand_id_next;
+    u32    msg_id_next;
   };
   static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache 
*cache)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index f6440b3e7368..b10952f77472 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct 
cachefiles_object *object,
   smp_mb();
   if (opcode == CACHEFILES_OP_CLOSE &&
- !cachefiles_ondemand_object_is_open(object)) {
+ !cachefiles_ondemand_object_is_open(object)) {
WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
   xas_unlock();
   ret = -EIO;
   goto out;
   }
-    xas.xa_index = 0;
+    /*
+ * Cyclically find a free xas to avoid msg_id reuse that would
+ * cause the daemon to successfully copen a stale msg_id.
+ */
+    xas.xa_index = cache->msg_id_next;
   xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+    if (xas.xa_node == XAS_RESTART) {
+    xas.xa_index = 0;
+    xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
+    }
   if (xas.xa_node == XAS_RESTART)
   xas_set_err(&xas, -EBUSY);
+
   xas_store(&xas, req);
-    xas_clear_mark(&xas, XA_FREE_MARK);
-    xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+    if (xas_valid(&xas)) {
+    cache->msg_id_next = xas.xa_index + 1;

If you have a long-standing stuck request, could this counter wrap
around and you still end up with reuse?

Yes, msg_id_next is declared to be of type u32 in the hope that when
xa_index == UINT_MAX, a wrap around occurs so that msg_id_next
goes to zero. Limiting xa_index to no more than UINT_MAX avoids
the xarray becoming too deep.

If msg_id_next is equal to the id of a long-standing stuck request
after the wrap-around, it is true that the reuse in the above problem
may also occur.

But I feel that a long stuck request is problematic in itself, it means
that after we have sent 4294967295 requests, the first one has not
been processed yet, and even if we send a million requests per
second, this one hasn't been completed for more than an hour.

We have a keep-alive process that pulls the daemon back up as
soon as it exits, and there is a timeout mechanism for requests in
the daemon to prevent the kernel from waiting for long periods
of time. In other words, we should avoid the situation where
a request is stuck for a long period of time.

If you think UINT_MAX is not enough, perhaps we could raise
the maximum value of msg_id_next to ULONG_MAX?

Maybe this should be using
ida_alloc/free instead, which would prevent that too?


The id reuse here is that the kernel has finished the open request
req_A and freed its id_A and used it again when sending the open
request req_B, but the daemon is still working on req_A, so the
copen id_A succeeds but operates on req_B.

The id that is being used by the kernel will not be allocated here
so it seems that ida_alloc/free does not prevent reuse either,
could you elaborate a bit more on how this works?


ida_alloc and free absolutely prevent reuse while the id is in use.
That's sort of the point of those functions. Basically it uses a set of
bitmaps in an xarray to track which IDs are in use, so ida_alloc only
hands out values which are not in use. See the comments over
ida_alloc_range() in lib/idr.c.


Thank you for the explanation!

The logic now provides the same guarantees as ida_alloc/free.
The "reused" id, indeed, is no longer in use in the kernel, but it is still
in use in the userland, so a multi-threaded daemon could be handling
two different requests for the same msg_id at the same time.

Previously, the logic for allocating msg_ids was to start at 0 and look
for a free xas.index, so it was possible for an id to be allocated to a
new request just as the id was being freed.

With the change to cyclic allocation, the kernel will not use the same
id again until INT_MAX requests have been sent, and during the time
it takes to send requests, the daemon has enough time to process
requests whose ids are still in use by the daemon, but have already
been freed in the kernel.


Again, if I understand correctly, I think the main point
here is

wait_for_completion(&req_A->done)

which could hang due to some malicious daemon.  But I think it
should be switched to wait_for_completion_killable() instead.
It's up to users to kill the mount instance if there is a
malicious user daemon.

So in that case, hung task will not be triggered anymore, and
you don't need to care about cyclic allocation too.

Thanks,
Gao Xiang



Regards,
Baokun

+    xas_clear_mark(&xas, XA_FREE_MARK);
+    xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+    }
   xas_unlock();
   } while (xas_nomem(&xas, GFP_KERNEL));



Re: [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid

2024-05-20 Thread Gao Xiang




On 2024/5/20 19:14, Baokun Li wrote:

On 2024/5/20 17:24, Jingbo Xu wrote:


On 5/20/24 5:07 PM, Baokun Li wrote:

On 2024/5/20 16:43, Jingbo Xu wrote:

On 5/15/24 4:45 PM, libao...@huaweicloud.com wrote:

From: Baokun Li 


SNIP


To avoid this, allocate a new anonymous fd only if no anonymous fd has
been allocated (ondemand_id == 0) or if the previously allocated
anonymous
fd has been closed (ondemand_id == -1). Moreover, returns an error if
ondemand_id is valid, letting the daemon know that the current userland
restore logic is abnormal and needs to be checked.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking
up cookie")
Signed-off-by: Baokun Li 

The size of this fix is quite under control.  But still it seems that
the worst consequence is that the (potential) malicious daemon gets
hung.  No more effect on the system or other processes.  Or does a
non-malicious daemon have any chance of hitting the same issue?

If we enable hung_task_panic, it may cause panic to crash the server.

Then this issue has nothing to do with this patch?  As long as a
malicious daemon doesn't close the anonymous fd after umounting, then I
guess a following attempt of mounting cookie with the same name will
also wait and hung there?


Yes, a daemon that only reads requests but doesn't process them will
cause a hang, but the daemon will obey the basic constraints when we
test it.


If we'd really like to enhance this ("hung_task_panic"), I think
you'd better switch wait_for_completion() to
wait_for_completion_killable() at least IMHO anyway.

Thanks,
Gao Xiang



[PATCH] erofs: avoid allocating DEFLATE streams before mounting

2024-05-20 Thread Gao Xiang
Currently, each DEFLATE stream takes one 32 KiB permanent internal
window buffer even if there is no running instance which uses DEFLATE
algorithm.

It's unexpected and wasteful on embedded devices with limited resources
and servers with hundreds of CPU cores if DEFLATE is enabled but unused.

Fixes: ffa09b3bd024 ("erofs: DEFLATE compression support")
Cc:  # 6.6+
Signed-off-by: Gao Xiang 
---
 fs/erofs/decompressor_deflate.c | 55 +
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/fs/erofs/decompressor_deflate.c b/fs/erofs/decompressor_deflate.c
index 81e65c453ef0..3a3461561a3c 100644
--- a/fs/erofs/decompressor_deflate.c
+++ b/fs/erofs/decompressor_deflate.c
@@ -46,39 +46,15 @@ int __init z_erofs_deflate_init(void)
/* by default, use # of possible CPUs instead */
if (!z_erofs_deflate_nstrms)
z_erofs_deflate_nstrms = num_possible_cpus();
-
-   for (; z_erofs_deflate_avail_strms < z_erofs_deflate_nstrms;
-++z_erofs_deflate_avail_strms) {
-   struct z_erofs_deflate *strm;
-
-   strm = kzalloc(sizeof(*strm), GFP_KERNEL);
-   if (!strm)
-   goto out_failed;
-
-   /* XXX: in-kernel zlib cannot shrink windowbits currently */
-   strm->z.workspace = vmalloc(zlib_inflate_workspacesize());
-   if (!strm->z.workspace) {
-   kfree(strm);
-   goto out_failed;
-   }
-
-   spin_lock(&z_erofs_deflate_lock);
-   strm->next = z_erofs_deflate_head;
-   z_erofs_deflate_head = strm;
-   spin_unlock(&z_erofs_deflate_lock);
-   }
return 0;
-
-out_failed:
-   erofs_err(NULL, "failed to allocate zlib workspace");
-   z_erofs_deflate_exit();
-   return -ENOMEM;
 }
 
 int z_erofs_load_deflate_config(struct super_block *sb,
struct erofs_super_block *dsb, void *data, int size)
 {
struct z_erofs_deflate_cfgs *dfl = data;
+   static DEFINE_MUTEX(deflate_resize_mutex);
+   static bool inited;
 
if (!dfl || size < sizeof(struct z_erofs_deflate_cfgs)) {
erofs_err(sb, "invalid deflate cfgs, size=%u", size);
@@ -89,9 +65,36 @@ int z_erofs_load_deflate_config(struct super_block *sb,
erofs_err(sb, "unsupported windowbits %u", dfl->windowbits);
return -EOPNOTSUPP;
}
+   mutex_lock(&deflate_resize_mutex);
+   if (!inited) {
+   for (; z_erofs_deflate_avail_strms < z_erofs_deflate_nstrms;
+++z_erofs_deflate_avail_strms) {
+   struct z_erofs_deflate *strm;
+
+   strm = kzalloc(sizeof(*strm), GFP_KERNEL);
+   if (!strm)
+   goto failed;
+   /* XXX: in-kernel zlib cannot customize windowbits */
+   strm->z.workspace = vmalloc(zlib_inflate_workspacesize());
+   if (!strm->z.workspace) {
+   kfree(strm);
+   goto failed;
+   }
 
+   spin_lock(&z_erofs_deflate_lock);
+   strm->next = z_erofs_deflate_head;
+   z_erofs_deflate_head = strm;
+   spin_unlock(&z_erofs_deflate_lock);
+   }
+   inited = true;
+   }
+   mutex_unlock(&deflate_resize_mutex);
erofs_info(sb, "EXPERIMENTAL DEFLATE feature in use. Use at your own 
risk!");
return 0;
+failed:
+   mutex_unlock(&deflate_resize_mutex);
+   z_erofs_deflate_exit();
+   return -ENOMEM;
 }
 
 int z_erofs_deflate_decompress(struct z_erofs_decompress_req *rq,
-- 
2.39.3



Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()

2024-05-20 Thread Gao Xiang
Reviewed-by: Jia Zhu 
---
  fs/cachefiles/internal.h |  1 +
  fs/cachefiles/ondemand.c | 44 ++--
  2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index d33169f0018b..7745b8abc3aa 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct 
cachefiles_cache *cache)
  struct cachefiles_req {
  struct cachefiles_object *object;
  struct completion done;
+    refcount_t ref;
  int error;
  struct cachefiles_msg msg;
  };
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index fd49728d8bae..56d12fe4bf73 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -4,6 +4,12 @@
  #include 
  #include "internal.h"
+static inline void cachefiles_req_put(struct cachefiles_req *req)
+{
+    if (refcount_dec_and_test(&req->ref))
+    kfree(req);
+}
+
  static int cachefiles_ondemand_fd_release(struct inode *inode,
    struct file *file)
  {
@@ -299,7 +305,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct 
cachefiles_cache *cache,
  {
  struct cachefiles_req *req;
  struct cachefiles_msg *msg;
-    unsigned long id = 0;
  size_t n;
  int ret = 0;
   XA_STATE(xas, &cache->reqs, cache->req_id_next);
@@ -330,41 +335,39 @@ ssize_t cachefiles_ondemand_daemon_read(struct 
cachefiles_cache *cache,
  xas_clear_mark(, CACHEFILES_REQ_NEW);
  cache->req_id_next = xas.xa_index + 1;
+    refcount_inc(&req->ref);
   xa_unlock(&cache->reqs);
-    id = xas.xa_index;
-
  if (msg->opcode == CACHEFILES_OP_OPEN) {
  ret = cachefiles_ondemand_get_fd(req);
  if (ret) {
  cachefiles_ondemand_set_object_close(req->object);
-    goto error;
+    goto out;
  }
  }
-    msg->msg_id = id;
+    msg->msg_id = xas.xa_index;
  msg->object_id = req->object->ondemand->ondemand_id;
  if (copy_to_user(_buffer, msg, n) != 0) {
  ret = -EFAULT;
  if (msg->opcode == CACHEFILES_OP_OPEN)
  close_fd(((struct cachefiles_open *)msg->data)->fd);
-    goto error;
  }
-
-    /* CLOSE request has no reply */
-    if (msg->opcode == CACHEFILES_OP_CLOSE) {
-    xa_erase(&cache->reqs, id);
-    complete(&req->done);
+out:
+    /* Remove error request and CLOSE request has no reply */
+    if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
+    xas_reset(&xas);
+    xas_lock(&xas);
+    if (xas_load(&xas) == req) {

Just out of curiosity... how could xas_load() not equal req?


As mentioned above, the req may have been deleted or even the id may have been reused.




+    req->error = ret;
+    complete(&req->done);
+    xas_store(&xas, NULL);
+    }
+    xas_unlock(&xas);
  }
-
-    return n;
-
-error:
-    xa_erase(&cache->reqs, id);
-    req->error = ret;
-    complete(&req->done);
-    return ret;
+    cachefiles_req_put(req);
+    return ret ? ret : n;
  }

This is actually a combination of a fix and a cleanup: it combines the
logic of removing error requests and CLOSE requests into one place.
Also it relies on the cleanup made in patch 2 ("cachefiles: remove
err_put_fd tag in cachefiles_ondemand_daemon_read()"), making it
difficult to be automatically backported to stable (as patch 2 is
not marked as "Fixes").

Thus could we make the fix first, and then make the cleanup.

I don't think that's necessary, stable automatically backports the
relevant dependency patches in case of backport patch conflicts,
and later patches modify the logic here as well.
Or add Fixes tag for patch 2?


I think we might better to avoid unnecessary dependencies
since it relies on some "AI" magic and often mis-backports
real dependencies.

I tend to leave real bugfixes first, and do cleanup next.
But please don't leave cleanup patches with "Fixes:" tags
anyway since it just misleads people.

Thanks,
Gao Xiang


[PATCH v2] erofs-utils: unify the tree traversal for the rebuild mode

2024-05-20 Thread Gao Xiang
Let's drop the legacy approach and `tarerofs` will be applied too.

Signed-off-by: Gao Xiang 
---
change since v1:
 - fix incorrect unencoded data reported by:
   https://github.com/erofs/erofsnightly/actions/runs/9153196829

 include/erofs/internal.h |   7 +-
 lib/diskbuf.c|  14 +-
 lib/inode.c  | 383 ++-
 3 files changed, 231 insertions(+), 173 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index ecbbdf6..46345e0 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -290,7 +290,12 @@ static inline unsigned int erofs_inode_datalayout(unsigned 
int value)
  EROFS_I_DATALAYOUT_BITS);
 }
 
-#define IS_ROOT(x) ((x) == (x)->i_parent)
+static inline struct erofs_inode *erofs_parent_inode(struct erofs_inode *inode)
+{
+   return (void *)((unsigned long)inode->i_parent & ~1UL);
+}
+
+#define IS_ROOT(x) ((x) == erofs_parent_inode(x))
 
 struct erofs_dentry {
struct list_head d_child;   /* child of parent list */
diff --git a/lib/diskbuf.c b/lib/diskbuf.c
index 8205ba5..e5889df 100644
--- a/lib/diskbuf.c
+++ b/lib/diskbuf.c
@@ -10,7 +10,7 @@
 
 /* A simple approach to avoid creating too many temporary files */
 static struct erofs_diskbufstrm {
-   u64 count;
+   erofs_atomic_t count;
u64 tailoffset, devpos;
int fd;
unsigned int alignsize;
@@ -25,8 +25,6 @@ int erofs_diskbuf_getfd(struct erofs_diskbuf *db, u64 *fpos)
if (!strm)
return -1;
offset = db->offset + strm->devpos;
-   if (lseek(strm->fd, offset, SEEK_SET) != offset)
-   return -E2BIG;
if (fpos)
*fpos = offset;
return strm->fd;
@@ -46,7 +44,7 @@ int erofs_diskbuf_reserve(struct erofs_diskbuf *db, int sid, 
u64 *off)
if (off)
*off = db->offset + strm->devpos;
db->sp = strm;
-   ++strm->count;
+   (void)erofs_atomic_inc_return(&strm->count);
strm->locked = true;/* TODO: need a real lock for MT */
return strm->fd;
 }
@@ -66,8 +64,8 @@ void erofs_diskbuf_close(struct erofs_diskbuf *db)
struct erofs_diskbufstrm *strm = db->sp;
 
DBG_BUGON(!strm);
-   DBG_BUGON(strm->count <= 1);
-   --strm->count;
+   DBG_BUGON(erofs_atomic_read(&strm->count) <= 1);
+   (void)erofs_atomic_dec_return(&strm->count);
db->sp = NULL;
 }
 
@@ -122,7 +120,7 @@ int erofs_diskbuf_init(unsigned int nstrms)
return -ENOSPC;
 setupone:
strm->tailoffset = 0;
-   strm->count = 1;
+   erofs_atomic_set(&strm->count, 1);
   if (fstat(strm->fd, &st))
return -errno;
strm->alignsize = max_t(u32, st.st_blksize, getpagesize());
@@ -138,7 +136,7 @@ void erofs_diskbuf_exit(void)
return;
 
for (strm = dbufstrm; strm->fd >= 0; ++strm) {
-   DBG_BUGON(strm->count != 1);
+   DBG_BUGON(erofs_atomic_read(&strm->count) != 1);
 
close(strm->fd);
strm->fd = -1;
diff --git a/lib/inode.c b/lib/inode.c
index fda98a4..cbe0810 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -264,7 +264,7 @@ int erofs_init_empty_dir(struct erofs_inode *dir)
d = erofs_d_alloc(dir, "..");
if (IS_ERR(d))
return PTR_ERR(d);
-   d->inode = erofs_igrab(dir->i_parent);
+   d->inode = erofs_igrab(erofs_parent_inode(dir));
d->type = EROFS_FT_DIR;
 
dir->i_nlink = 2;
@@ -494,29 +494,6 @@ int erofs_write_unencoded_file(struct erofs_inode *inode, 
int fd, u64 fpos)
return write_uncompressed_file_from_fd(inode, fd);
 }
 
-int erofs_write_file(struct erofs_inode *inode, int fd, u64 fpos)
-{
-   DBG_BUGON(!inode->i_size);
-
-   if (cfg.c_compr_opts[0].alg && erofs_file_is_compressible(inode)) {
-   void *ictx;
-   int ret;
-
-   ictx = erofs_begin_compressed_file(inode, fd, fpos);
-   if (IS_ERR(ictx))
-   return PTR_ERR(ictx);
-
-   ret = erofs_write_compressed_file(ictx);
-   if (ret != -ENOSPC)
-   return ret;
-
-   if (lseek(fd, fpos, SEEK_SET) < 0)
-   return -errno;
-   }
-   /* fallback to all data uncompressed */
-   return erofs_write_unencoded_file(inode, fd, fpos);
-}
-
 static int erofs_bh_flush_write_inode(struct erofs_buffer_head *bh)
 {
struct erofs_inode *const inode = bh->fsprivate;
@@ -1113,6 +1090,7 @@ struct erofs_mkfs_job_ndir_ctx {
struct erofs_inode *inode;
void *ictx;
int fd;
+   u64 fpos;
 };
 
 static int erofs_mkfs_job_write_file(struct erofs_mkfs_job_ndir_ctx *ctx)
@@ -1120,19 +1098,31

Re: [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests

2024-05-19 Thread Gao Xiang




On 2024/5/15 16:45, libao...@huaweicloud.com wrote:

From: Baokun Li 



The subject line can be
"cachefiles: remove requests from xarray during flushing requests"



Even with CACHEFILES_DEAD set, we can still read the requests, so in the
following concurrency the request may be used after it has been freed:

  mount  |   daemon_thread1|daemon_thread2

  cachefiles_ondemand_init_object
   cachefiles_ondemand_send_req
REQ_A = kzalloc(sizeof(*req) + data_len)
wait_for_completion(&req_A->done)
 cachefiles_daemon_read
  cachefiles_ondemand_daemon_read
   // close dev fd
   cachefiles_flush_reqs
complete(&req_A->done)
kfree(REQ_A)
   xa_lock(&cache->reqs);
   cachefiles_ondemand_select_req
 req->msg.opcode != CACHEFILES_OP_READ
 // req use-after-free !!!
   xa_unlock(&cache->reqs);
xa_destroy(&cache->reqs)

Hence remove requests from cache->reqs when flushing them to avoid
accessing freed requests.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up 
cookie")
Signed-off-by: Baokun Li 
Reviewed-by: Jia Zhu 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


---
  fs/cachefiles/daemon.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 6465e2574230..ccb7b707ea4b 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -159,6 +159,7 @@ static void cachefiles_flush_reqs(struct cachefiles_cache 
*cache)
xa_for_each(xa, index, req) {
req->error = -EIO;
complete(&req->done);
+   __xa_erase(xa, index);
}
xa_unlock(xa);
  


Re: [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()

2024-05-19 Thread Gao Xiang




On 2024/5/15 16:45, libao...@huaweicloud.com wrote:

From: Baokun Li 

The err_put_fd tag is only used once, so remove it to make the code more


The err_put_fd label ..

Also the subject line needs to be updated too.  ("C goto label")


readable.

Signed-off-by: Baokun Li 
Reviewed-by: Jia Zhu 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


---
  fs/cachefiles/ondemand.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 4ba42f1fa3b4..fd49728d8bae 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -347,7 +347,9 @@ ssize_t cachefiles_ondemand_daemon_read(struct 
cachefiles_cache *cache,
  
  	if (copy_to_user(_buffer, msg, n) != 0) {

ret = -EFAULT;
-   goto err_put_fd;
+   if (msg->opcode == CACHEFILES_OP_OPEN)
+   close_fd(((struct cachefiles_open *)msg->data)->fd);
+   goto error;
}
  
  	/* CLOSE request has no reply */

@@ -358,9 +360,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct 
cachefiles_cache *cache,
  
  	return n;
  
-err_put_fd:

-   if (msg->opcode == CACHEFILES_OP_OPEN)
-   close_fd(((struct cachefiles_open *)msg->data)->fd);
  error:
xa_erase(&cache->reqs, id);
req->error = ret;


Re: [PATCH] erofs: clean up erofs_show_options()

2024-05-17 Thread Gao Xiang




On 2024/5/17 17:56, Hongzhen Luo wrote:

Avoid unnecessary #ifdefs and simplify the code a bit.

Signed-off-by: Hongzhen Luo 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


[PATCH] erofs-utils: mkfs: add `--zfeature-bits` option

2024-05-17 Thread Gao Xiang
Thus, we could traverse all compression features with a number
easily in the testcases.

Signed-off-by: Gao Xiang 
---
 mkfs/main.c | 160 
 1 file changed, 124 insertions(+), 36 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index c26cb56..12321ed 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -81,6 +81,7 @@ static struct option long_options[] = {
 #ifdef EROFS_MT_ENABLED
{"workers", required_argument, NULL, 520},
 #endif
+   {"zfeature-bits", required_argument, NULL, 521},
{0, 0, 0, 0},
 };
 
@@ -166,6 +167,7 @@ static void usage(int argc, char **argv)
" --gid-offset=#add offset # to all file gids (# = id 
offset)\n"
" --ignore-mtimeuse build time instead of strict 
per-file modification time\n"
" --max-extent-bytes=#  set maximum decompressed extent size # 
in bytes\n"
+   " --mount-point=X   X=prefix of target fs path (default: 
/)\n"
" --preserve-mtime  keep per-file modification time 
strictly\n"
" --offset=#skip # bytes at the beginning of 
IMAGE.\n"
" --aufsreplace aufs special files with 
overlayfs metadata\n"
@@ -190,7 +192,7 @@ static void usage(int argc, char **argv)
" --workers=#   set the number of worker threads to # 
(default: %u)\n"
 #endif
" --xattr-prefix=X  X=extra xattr name prefix\n"
-   " --mount-point=X   X=prefix of target fs path (default: 
/)\n"
+   " --zfeature-bits   toggle filesystem compression features 
according to given bits\n"
 #ifdef WITH_ANDROID
"\n"
"Android-specific options:\n"
@@ -220,10 +222,81 @@ static bool tar_mode, rebuild_mode;
 static unsigned int rebuild_src_count;
 static LIST_HEAD(rebuild_src_list);
 
+static int erofs_mkfs_feat_set_legacy_compress(bool en, const char *val,
+  unsigned int vallen)
+{
+   if (vallen)
+   return -EINVAL;
+   /* disable compacted indexes and 0padding */
+   cfg.c_legacy_compress = en;
+   return 0;
+}
+
+static int erofs_mkfs_feat_set_ztailpacking(bool en, const char *val,
+   unsigned int vallen)
+{
+   if (vallen)
+   return -EINVAL;
+   cfg.c_ztailpacking = en;
+   return 0;
+}
+
+static int erofs_mkfs_feat_set_fragments(bool en, const char *val,
+unsigned int vallen)
+{
+   if (!en) {
+   if (vallen)
+   return -EINVAL;
+   cfg.c_fragments = false;
+   return 0;
+   }
+
+   if (vallen) {
+   char *endptr;
u64 i = strtoull(val, &endptr, 0);
+
+   if (endptr - val != vallen) {
erofs_err("invalid pcluster size %s for the packed file %s", val);
+   return -EINVAL;
+   }
+   pclustersize_packed = i;
+   }
+   cfg.c_fragments = true;
+   return 0;
+}
+
+static int erofs_mkfs_feat_set_all_fragments(bool en, const char *val,
+unsigned int vallen)
+{
+   cfg.c_all_fragments = en;
+   return erofs_mkfs_feat_set_fragments(en, val, vallen);
+}
+
+static int erofs_mkfs_feat_set_dedupe(bool en, const char *val,
+ unsigned int vallen)
+{
+   if (vallen)
+   return -EINVAL;
+   cfg.c_dedupe = en;
+   return 0;
+}
+
+static struct {
+   char *feat;
+   int (*set)(bool en, const char *val, unsigned int len);
+} z_erofs_mkfs_features[] = {
+   {"legacy-compress", erofs_mkfs_feat_set_legacy_compress},
+   {"ztailpacking", erofs_mkfs_feat_set_ztailpacking},
+   {"fragments", erofs_mkfs_feat_set_fragments},
+   {"all-fragments", erofs_mkfs_feat_set_all_fragments},
+   {"dedupe", erofs_mkfs_feat_set_dedupe},
+   {NULL, NULL},
+};
+
 static int parse_extended_opts(const char *opts)
 {
 #define MATCH_EXTENTED_OPT(opt, token, keylen) \
-   (keylen == sizeof(opt) - 1 && !memcmp(token, opt, sizeof(opt) - 1))
+   (keylen == strlen(opt) && !memcmp(token, opt, keylen))
 
const char *token, *next, *tokenend, *value __maybe_unused;
unsigned int keylen, vallen;
@@ -262,12 +335,7 @@ static int parse_extended_opts(const char *opts)
clear = true;
}
 
-   if (MATCH_EXTENTED_OPT("legacy-compress", token, keylen)) {
-   if (vallen)
-   return -EINVAL;
-   /*

[PATCH 2/2] erofs-utils: pretty root directory progressinfo

2024-05-15 Thread Gao Xiang
Avoid `Processing  ...` or `file  dumped (mode 40755)`..

Signed-off-by: Gao Xiang 
---
 lib/inode.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index 8ec87e6..67a572d 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1405,10 +1405,11 @@ static int erofs_rebuild_handle_directory(struct 
erofs_inode *dir)
 
 static int erofs_mkfs_handle_inode(struct erofs_inode *inode)
 {
+   const char *relpath = erofs_fspath(inode->i_srcpath);
char *trimmed;
int ret;
 
-   trimmed = erofs_trim_for_progressinfo(erofs_fspath(inode->i_srcpath),
+   trimmed = erofs_trim_for_progressinfo(relpath[0] ? relpath : "/",
  sizeof("Processing  ...") - 1);
erofs_update_progressinfo("Processing %s ...", trimmed);
free(trimmed);
@@ -1442,8 +1443,7 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
} else {
ret = erofs_mkfs_handle_directory(inode);
}
-   erofs_info("file %s dumped (mode %05o)", erofs_fspath(inode->i_srcpath),
-  inode->i_mode);
+   erofs_info("file /%s dumped (mode %05o)", relpath, inode->i_mode);
return ret;
 }
 
-- 
2.39.3



[PATCH 1/2] erofs-utils: correct the default number of workers in the usage

2024-05-15 Thread Gao Xiang
Fixes: 59c36e7a4008 ("erofs-utils: mkfs: use all available processors by 
default")
Signed-off-by: Gao Xiang 
---
 mkfs/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 4c3620d..455c152 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -187,7 +187,7 @@ static void usage(int argc, char **argv)
"   (and optionally dump the raw stream to 
X together)\n"
 #endif
 #ifdef EROFS_MT_ENABLED
-   " --workers=#   set the number of worker threads to # 
(default=1)\n"
+   " --workers=#   set the number of worker threads to # 
(default: %u)\n"
 #endif
" --xattr-prefix=X  X=extra xattr name prefix\n"
" --mount-point=X   X=prefix of target fs path (default: 
/)\n"
@@ -198,7 +198,7 @@ static void usage(int argc, char **argv)
" --fs-config-file=XX=fs_config file\n"
" --block-list-file=X   X=block_list file\n"
 #endif
-   );
+   , erofs_get_available_processors() /* --workers= */);
 }
 
 static void version(void)
-- 
2.39.3



[PATCH] erofs-utils: unify the tree traversal for the rebuild mode

2024-05-15 Thread Gao Xiang
Let's drop the legacy approach and `tarerofs` will be applied too.

Signed-off-by: Gao Xiang 
---
 include/erofs/internal.h |   7 +-
 lib/diskbuf.c|  14 +-
 lib/inode.c  | 384 ++-
 3 files changed, 232 insertions(+), 173 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index ecbbdf6..46345e0 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -290,7 +290,12 @@ static inline unsigned int erofs_inode_datalayout(unsigned 
int value)
  EROFS_I_DATALAYOUT_BITS);
 }
 
-#define IS_ROOT(x) ((x) == (x)->i_parent)
+static inline struct erofs_inode *erofs_parent_inode(struct erofs_inode *inode)
+{
+   return (void *)((unsigned long)inode->i_parent & ~1UL);
+}
+
+#define IS_ROOT(x) ((x) == erofs_parent_inode(x))
 
 struct erofs_dentry {
struct list_head d_child;   /* child of parent list */
diff --git a/lib/diskbuf.c b/lib/diskbuf.c
index 8205ba5..e5889df 100644
--- a/lib/diskbuf.c
+++ b/lib/diskbuf.c
@@ -10,7 +10,7 @@
 
 /* A simple approach to avoid creating too many temporary files */
 static struct erofs_diskbufstrm {
-   u64 count;
+   erofs_atomic_t count;
u64 tailoffset, devpos;
int fd;
unsigned int alignsize;
@@ -25,8 +25,6 @@ int erofs_diskbuf_getfd(struct erofs_diskbuf *db, u64 *fpos)
if (!strm)
return -1;
offset = db->offset + strm->devpos;
-   if (lseek(strm->fd, offset, SEEK_SET) != offset)
-   return -E2BIG;
if (fpos)
*fpos = offset;
return strm->fd;
@@ -46,7 +44,7 @@ int erofs_diskbuf_reserve(struct erofs_diskbuf *db, int sid, 
u64 *off)
if (off)
*off = db->offset + strm->devpos;
db->sp = strm;
-   ++strm->count;
+   (void)erofs_atomic_inc_return(&strm->count);
strm->locked = true;/* TODO: need a real lock for MT */
return strm->fd;
 }
@@ -66,8 +64,8 @@ void erofs_diskbuf_close(struct erofs_diskbuf *db)
struct erofs_diskbufstrm *strm = db->sp;
 
DBG_BUGON(!strm);
-   DBG_BUGON(strm->count <= 1);
-   --strm->count;
+   DBG_BUGON(erofs_atomic_read(&strm->count) <= 1);
+   (void)erofs_atomic_dec_return(&strm->count);
db->sp = NULL;
 }
 
@@ -122,7 +120,7 @@ int erofs_diskbuf_init(unsigned int nstrms)
return -ENOSPC;
 setupone:
strm->tailoffset = 0;
-   strm->count = 1;
+   erofs_atomic_set(&strm->count, 1);
   if (fstat(strm->fd, &st))
return -errno;
strm->alignsize = max_t(u32, st.st_blksize, getpagesize());
@@ -138,7 +136,7 @@ void erofs_diskbuf_exit(void)
return;
 
for (strm = dbufstrm; strm->fd >= 0; ++strm) {
-   DBG_BUGON(strm->count != 1);
+   DBG_BUGON(erofs_atomic_read(&strm->count) != 1);
 
close(strm->fd);
strm->fd = -1;
diff --git a/lib/inode.c b/lib/inode.c
index 44d684f..8ec87e6 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -264,7 +264,7 @@ int erofs_init_empty_dir(struct erofs_inode *dir)
d = erofs_d_alloc(dir, "..");
if (IS_ERR(d))
return PTR_ERR(d);
-   d->inode = erofs_igrab(dir->i_parent);
+   d->inode = erofs_igrab(erofs_parent_inode(dir));
d->type = EROFS_FT_DIR;
 
dir->i_nlink = 2;
@@ -494,29 +494,6 @@ int erofs_write_unencoded_file(struct erofs_inode *inode, 
int fd, u64 fpos)
return write_uncompressed_file_from_fd(inode, fd);
 }
 
-int erofs_write_file(struct erofs_inode *inode, int fd, u64 fpos)
-{
-   DBG_BUGON(!inode->i_size);
-
-   if (cfg.c_compr_opts[0].alg && erofs_file_is_compressible(inode)) {
-   void *ictx;
-   int ret;
-
-   ictx = erofs_begin_compressed_file(inode, fd, fpos);
-   if (IS_ERR(ictx))
-   return PTR_ERR(ictx);
-
-   ret = erofs_write_compressed_file(ictx);
-   if (ret != -ENOSPC)
-   return ret;
-
-   if (lseek(fd, fpos, SEEK_SET) < 0)
-   return -errno;
-   }
-   /* fallback to all data uncompressed */
-   return erofs_write_unencoded_file(inode, fd, fpos);
-}
-
 static int erofs_bh_flush_write_inode(struct erofs_buffer_head *bh)
 {
struct erofs_inode *const inode = bh->fsprivate;
@@ -1113,6 +1090,7 @@ struct erofs_mkfs_job_ndir_ctx {
struct erofs_inode *inode;
void *ictx;
int fd;
+   u64 fpos;
 };
 
 static int erofs_mkfs_job_write_file(struct erofs_mkfs_job_ndir_ctx *ctx)
@@ -1121,18 +1099,31 @@ static int erofs_mkfs_job_write_file(struct 
erofs_mkfs_job_ndir_ctx *ctx)
int ret;
 
if (ctx->ictx)

[PATCH v3] erofs-utils: add preliminary zstd support

2024-05-14 Thread Gao Xiang
This patch just adds preliminary Zstandard support to erofs-utils,
since Zstandard doesn't officially support fixed-sized output
compression yet.  Mkfs could take more time to finish, but it works at
least.

The built-in zstd compressor for erofs-utils is a slow work in
progress, so apparently it will take more effort.

[ TODO: Later I tend to add another way to generate fixed-sized input
pclusters temporarily for relatively large pcluster sizes as
an option since it will have minor impacts to the results. ]

Signed-off-by: Gao Xiang 
---
changes since v2:
 - use ZSTD_compress2() since only this can enable the previous applied
   parameters. 

 configure.ac |  35 ++
 dump/Makefile.am |   3 +-
 fsck/Makefile.am |   6 +-
 fuse/Makefile.am |   2 +-
 include/erofs_fs.h   |  10 +++
 lib/Makefile.am  |   3 +
 lib/compress.c   |  24 +++
 lib/compressor.c |   8 +++
 lib/compressor.h |   1 +
 lib/compressor_libzstd.c | 143 +++
 lib/decompress.c |  67 ++
 mkfs/Makefile.am |   2 +-
 12 files changed, 299 insertions(+), 5 deletions(-)
 create mode 100644 lib/compressor_libzstd.c

diff --git a/configure.ac b/configure.ac
index 4a940a8..1560f84 100644
--- a/configure.ac
+++ b/configure.ac
@@ -139,6 +139,10 @@ AC_ARG_WITH(libdeflate,
   [Enable and build with libdeflate inflate support 
@<:@default=disabled@:>@])], [],
   [with_libdeflate="no"])
 
+AC_ARG_WITH(libzstd,
+   [AS_HELP_STRING([--with-libzstd],
+  [Enable and build with libzstd support @<:@default=auto@:>@])])
+
 AC_ARG_ENABLE(fuse,
[AS_HELP_STRING([--enable-fuse], [enable erofsfuse @<:@default=no@:>@])],
[enable_fuse="$enableval"], [enable_fuse="no"])
@@ -474,6 +478,32 @@ AS_IF([test "x$with_libdeflate" != "xno"], [
   LIBS="${saved_LIBS}"
   CPPFLAGS="${saved_CPPFLAGS}"], [have_libdeflate="no"])
 
+# Configure libzstd
+have_libzstd="no"
+AS_IF([test "x$with_libzstd" != "xno"], [
+  PKG_CHECK_MODULES([libzstd], [libzstd >= 1.4.0], [
+# Paranoia: don't trust the result reported by pkgconfig before trying out
+saved_LIBS="$LIBS"
+saved_CPPFLAGS=${CPPFLAGS}
+CPPFLAGS="${libzstd_CFLAGS} ${CPPFLAGS}"
+LIBS="${libzstd_LIBS} $LIBS"
+AC_CHECK_HEADERS([zstd.h],[
+  AC_CHECK_LIB(zstd, ZSTD_compress2, [], [
+AC_MSG_ERROR([libzstd doesn't work properly])])
+  AC_CHECK_DECL(ZSTD_compress2, [have_libzstd="yes"],
+[AC_MSG_ERROR([libzstd doesn't work properly])], [[
+#include <zstd.h>
+  ]])
+  AC_CHECK_FUNCS([ZSTD_getFrameContentSize])
+])
+LIBS="${saved_LIBS}"
+CPPFLAGS="${saved_CPPFLAGS}"], [
+AS_IF([test "x$with_libzstd" = "xyes"], [
+  AC_MSG_ERROR([Cannot find proper libzstd])
+])
+  ])
+])
+
 # Enable 64-bit off_t
 CFLAGS+=" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"
 
@@ -494,6 +524,7 @@ AM_CONDITIONAL([ENABLE_LZ4HC], [test "x${have_lz4hc}" = 
"xyes"])
 AM_CONDITIONAL([ENABLE_FUSE], [test "x${have_fuse}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBLZMA], [test "x${have_liblzma}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBDEFLATE], [test "x${have_libdeflate}" = "xyes"])
+AM_CONDITIONAL([ENABLE_LIBZSTD], [test "x${have_libzstd}" = "xyes"])
 
 if test "x$have_uuid" = "xyes"; then
   AC_DEFINE([HAVE_LIBUUID], 1, [Define to 1 if libuuid is found])
@@ -539,6 +570,10 @@ if test "x$have_libdeflate" = "xyes"; then
   AC_DEFINE([HAVE_LIBDEFLATE], 1, [Define to 1 if libdeflate is found])
 fi
 
+if test "x$have_libzstd" = "xyes"; then
+  AC_DEFINE([HAVE_LIBZSTD], 1, [Define to 1 if libzstd is found])
+fi
+
 # Dump maximum block size
 AS_IF([test "x$erofs_cv_max_block_size" = "x"],
   [$erofs_cv_max_block_size = 4096], [])
diff --git a/dump/Makefile.am b/dump/Makefile.am
index aed20c2..09c483e 100644
--- a/dump/Makefile.am
+++ b/dump/Makefile.am
@@ -7,4 +7,5 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 dump_erofs_SOURCES = main.c
 dump_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 dump_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS}
+   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS} \
+   ${libzstd_LIBS}
diff --git a/fsck/Makefile.am b/fsck/Makefile.am
index d024405..70eacc0 100644
--- a/fsck/Makefile.am
+++ b/fsck/Makefile.am
@@ -7,7 +7,8 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 fsck_erofs_SOURCES = main.c
 fsck_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 fsck_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinu

[GIT PULL] erofs updates for 6.10-rc1

2024-05-13 Thread Gao Xiang
Hi Linus,

Could you consider this pull request for 6.10-rc1?

In this cycle, LZ4 global buffer count is now configurable instead
of the previous per-CPU buffers, which is useful for bare metals with
hundreds of CPUs.  A reserved buffer pool for LZ4 decompression can
also be enabled to minimize the tail allocation latencies under the
low memory scenarios with heavy memory pressure.

In addition, Zstandard algorithm is now supported as an alternative
since it has been requested by users for a while.

There are some random cleanups as usual.  All commits have been in
-next and no potential merge conflict is observed.

Thanks,
Gao Xiang

The following changes since commit dd5a440a31fae6e459c0d627162825505361:

  Linux 6.9-rc7 (2024-05-05 14:06:01 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git 
tags/erofs-for-6.10-rc1

for you to fetch changes up to 7c35de4df1056a5a1fb4de042197b8f5b1033b61:

  erofs: Zstandard compression support (2024-05-09 07:46:56 +0800)


Changes since last update:

 - Make LZ4 global buffers configurable instead of per-CPU buffers;

 - Add a reserved buffer pool for LZ4 decompression for lower latencies;

 - Support Zstandard compression algorithm as an alternative;

 - Derive fsid from on-disk UUID for .statfs() if possible;

 - Minor cleanups.


Chunhai Guo (4):
  erofs: rename utils.c to zutil.c
  erofs: rename per-CPU buffers to global buffer pool and make it 
configurable
  erofs: do not use pagepool in z_erofs_gbuf_growsize()
  erofs: add a reserved buffer pool for lz4 decompression

Gao Xiang (2):
  erofs: clean up z_erofs_load_full_lcluster()
  erofs: Zstandard compression support

Hongzhen Luo (1):
  erofs: derive fsid from on-disk UUID for .statfs() if possible

 fs/erofs/Kconfig  |  15 +++
 fs/erofs/Makefile |   5 +-
 fs/erofs/compress.h   |   4 +
 fs/erofs/decompressor.c   |  15 ++-
 fs/erofs/decompressor_zstd.c  | 279 ++
 fs/erofs/erofs_fs.h   |  15 ++-
 fs/erofs/internal.h   |  28 +++--
 fs/erofs/pcpubuf.c| 148 --
 fs/erofs/super.c  |  28 +++--
 fs/erofs/zmap.c   |  24 ++--
 fs/erofs/{utils.c => zutil.c} | 206 ---
 11 files changed, 555 insertions(+), 212 deletions(-)
 create mode 100644 fs/erofs/decompressor_zstd.c
 delete mode 100644 fs/erofs/pcpubuf.c
 rename fs/erofs/{utils.c => zutil.c} (58%)


[PATCH v3] erofs-utils: add preliminary zstd support

2024-05-09 Thread Gao Xiang
This patch just adds preliminary Zstandard support to erofs-utils,
since Zstandard doesn't officially support fixed-sized output
compression yet.  Mkfs could take more time to finish, but it works at
least.

The built-in zstd compressor for erofs-utils is a slow work in
progress, so apparently it will take more effort.

[ TODO: Later I tend to add another way to generate fixed-sized input
pclusters temporarily for relatively large pcluster sizes as
an option since it will have minor impacts to the results. ]

Signed-off-by: Gao Xiang 
---
v3:
 - Fix incorrect assumption of sub-block compression.

 configure.ac |  35 +++
 dump/Makefile.am |   3 +-
 fsck/Makefile.am |   6 +-
 fuse/Makefile.am |   2 +-
 include/erofs_fs.h   |  10 +++
 lib/Makefile.am  |   3 +
 lib/compress.c   |  24 
 lib/compressor.c |   8 +++
 lib/compressor.h |   1 +
 lib/compressor_libzstd.c | 128 +++
 lib/decompress.c |  67 
 mkfs/Makefile.am |   2 +-
 12 files changed, 284 insertions(+), 5 deletions(-)
 create mode 100644 lib/compressor_libzstd.c

diff --git a/configure.ac b/configure.ac
index 4a940a8..1560f84 100644
--- a/configure.ac
+++ b/configure.ac
@@ -139,6 +139,10 @@ AC_ARG_WITH(libdeflate,
   [Enable and build with libdeflate inflate support 
@<:@default=disabled@:>@])], [],
   [with_libdeflate="no"])
 
+AC_ARG_WITH(libzstd,
+   [AS_HELP_STRING([--with-libzstd],
+  [Enable and build with libzstd support @<:@default=auto@:>@])])
+
 AC_ARG_ENABLE(fuse,
[AS_HELP_STRING([--enable-fuse], [enable erofsfuse @<:@default=no@:>@])],
[enable_fuse="$enableval"], [enable_fuse="no"])
@@ -474,6 +478,32 @@ AS_IF([test "x$with_libdeflate" != "xno"], [
   LIBS="${saved_LIBS}"
   CPPFLAGS="${saved_CPPFLAGS}"], [have_libdeflate="no"])
 
+# Configure libzstd
+have_libzstd="no"
+AS_IF([test "x$with_libzstd" != "xno"], [
+  PKG_CHECK_MODULES([libzstd], [libzstd >= 1.4.0], [
+# Paranoia: don't trust the result reported by pkgconfig before trying out
+saved_LIBS="$LIBS"
+saved_CPPFLAGS=${CPPFLAGS}
+CPPFLAGS="${libzstd_CFLAGS} ${CPPFLAGS}"
+LIBS="${libzstd_LIBS} $LIBS"
+AC_CHECK_HEADERS([zstd.h],[
+  AC_CHECK_LIB(zstd, ZSTD_decompressDCtx, [], [
+AC_MSG_ERROR([libzstd doesn't work properly])])
+  AC_CHECK_DECL(ZSTD_decompressDCtx, [have_libzstd="yes"],
+[AC_MSG_ERROR([libzstd doesn't work properly])], [[
+#include <zstd.h>
+  ]])
+  AC_CHECK_FUNCS([ZSTD_getFrameContentSize])
+])
+LIBS="${saved_LIBS}"
+CPPFLAGS="${saved_CPPFLAGS}"], [
+AS_IF([test "x$with_libzstd" = "xyes"], [
+  AC_MSG_ERROR([Cannot find proper libzstd])
+])
+  ])
+])
+
 # Enable 64-bit off_t
 CFLAGS+=" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"
 
@@ -494,6 +524,7 @@ AM_CONDITIONAL([ENABLE_LZ4HC], [test "x${have_lz4hc}" = 
"xyes"])
 AM_CONDITIONAL([ENABLE_FUSE], [test "x${have_fuse}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBLZMA], [test "x${have_liblzma}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBDEFLATE], [test "x${have_libdeflate}" = "xyes"])
+AM_CONDITIONAL([ENABLE_LIBZSTD], [test "x${have_libzstd}" = "xyes"])
 
 if test "x$have_uuid" = "xyes"; then
   AC_DEFINE([HAVE_LIBUUID], 1, [Define to 1 if libuuid is found])
@@ -539,6 +570,10 @@ if test "x$have_libdeflate" = "xyes"; then
   AC_DEFINE([HAVE_LIBDEFLATE], 1, [Define to 1 if libdeflate is found])
 fi
 
+if test "x$have_libzstd" = "xyes"; then
+  AC_DEFINE([HAVE_LIBZSTD], 1, [Define to 1 if libzstd is found])
+fi
+
 # Dump maximum block size
 AS_IF([test "x$erofs_cv_max_block_size" = "x"],
   [$erofs_cv_max_block_size = 4096], [])
diff --git a/dump/Makefile.am b/dump/Makefile.am
index aed20c2..09c483e 100644
--- a/dump/Makefile.am
+++ b/dump/Makefile.am
@@ -7,4 +7,5 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 dump_erofs_SOURCES = main.c
 dump_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 dump_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS}
+   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS} \
+   ${libzstd_LIBS}
diff --git a/fsck/Makefile.am b/fsck/Makefile.am
index d024405..70eacc0 100644
--- a/fsck/Makefile.am
+++ b/fsck/Makefile.am
@@ -7,7 +7,8 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 fsck_erofs_SOURCES = main.c
 fsck_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 fsck_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${liblz4_LIBS} ${liblzma_L

Re: [PATCH] erofs-utils: add preliminary zstd support

2024-05-09 Thread Gao Xiang




On 2024/5/9 17:53, Yifan Zhao wrote:

Hi Gao,

I found this zstd implementation failed in my smoke test[1] with
-Eall-fragments and -Eztailpacking enabled. You could reproduce it with
the enwik8 workload.

After a glance, I noticed that libzstd_compress_destsize() may return a
value larger than `dstsize`. I marked it in the code below.


That is expected.



I believe this will break the assumption in its caller and lead to a broken image.

However I'm not an expert of zstd algorithms, could you please take a look at 
it?


[1] https://github.com/SToPire/erofsnightly/actions/runs/9004052391



I will look into that, thanks!

Thanks,
Gao Xiang


[PATCH v2] erofs-utils: add preliminary zstd support

2024-05-08 Thread Gao Xiang
This patch just adds preliminary Zstandard support to erofs-utils,
since Zstandard doesn't officially support fixed-sized output
compression yet.  Mkfs could take more time to finish, but it works at
least.

The built-in zstd compressor for erofs-utils is a slow work in
progress, so apparently it will take more effort.

[ TODO: Later I tend to add another way to generate fixed-sized input
pclusters temporarily for relatively large pcluster sizes as
an option since it will have minor impacts to the results. ]

Signed-off-by: Gao Xiang 
---
v2:
 Use a newer helper ZSTD_getFrameContentSize() for now if possible.

 configure.ac |  35 +++
 dump/Makefile.am |   3 +-
 fsck/Makefile.am |   6 +-
 fuse/Makefile.am |   2 +-
 include/erofs_fs.h   |  10 +++
 lib/Makefile.am  |   3 +
 lib/compress.c   |  24 
 lib/compressor.c |   8 +++
 lib/compressor.h |   1 +
 lib/compressor_libzstd.c | 130 +++
 lib/decompress.c |  67 
 mkfs/Makefile.am |   2 +-
 12 files changed, 286 insertions(+), 5 deletions(-)
 create mode 100644 lib/compressor_libzstd.c

diff --git a/configure.ac b/configure.ac
index 4a940a8..1560f84 100644
--- a/configure.ac
+++ b/configure.ac
@@ -139,6 +139,10 @@ AC_ARG_WITH(libdeflate,
   [Enable and build with libdeflate inflate support 
@<:@default=disabled@:>@])], [],
   [with_libdeflate="no"])
 
+AC_ARG_WITH(libzstd,
+   [AS_HELP_STRING([--with-libzstd],
+  [Enable and build with libzstd support @<:@default=auto@:>@])])
+
 AC_ARG_ENABLE(fuse,
[AS_HELP_STRING([--enable-fuse], [enable erofsfuse @<:@default=no@:>@])],
[enable_fuse="$enableval"], [enable_fuse="no"])
@@ -474,6 +478,32 @@ AS_IF([test "x$with_libdeflate" != "xno"], [
   LIBS="${saved_LIBS}"
   CPPFLAGS="${saved_CPPFLAGS}"], [have_libdeflate="no"])
 
+# Configure libzstd
+have_libzstd="no"
+AS_IF([test "x$with_libzstd" != "xno"], [
+  PKG_CHECK_MODULES([libzstd], [libzstd >= 1.4.0], [
+# Paranoia: don't trust the result reported by pkgconfig before trying out
+saved_LIBS="$LIBS"
+saved_CPPFLAGS=${CPPFLAGS}
+CPPFLAGS="${libzstd_CFLAGS} ${CPPFLAGS}"
+LIBS="${libzstd_LIBS} $LIBS"
+AC_CHECK_HEADERS([zstd.h],[
+  AC_CHECK_LIB(zstd, ZSTD_decompressDCtx, [], [
+AC_MSG_ERROR([libzstd doesn't work properly])])
+  AC_CHECK_DECL(ZSTD_decompressDCtx, [have_libzstd="yes"],
+[AC_MSG_ERROR([libzstd doesn't work properly])], [[
+#include <zstd.h>
+  ]])
+  AC_CHECK_FUNCS([ZSTD_getFrameContentSize])
+])
+LIBS="${saved_LIBS}"
+CPPFLAGS="${saved_CPPFLAGS}"], [
+AS_IF([test "x$with_libzstd" = "xyes"], [
+  AC_MSG_ERROR([Cannot find proper libzstd])
+])
+  ])
+])
+
 # Enable 64-bit off_t
 CFLAGS+=" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"
 
@@ -494,6 +524,7 @@ AM_CONDITIONAL([ENABLE_LZ4HC], [test "x${have_lz4hc}" = 
"xyes"])
 AM_CONDITIONAL([ENABLE_FUSE], [test "x${have_fuse}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBLZMA], [test "x${have_liblzma}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBDEFLATE], [test "x${have_libdeflate}" = "xyes"])
+AM_CONDITIONAL([ENABLE_LIBZSTD], [test "x${have_libzstd}" = "xyes"])
 
 if test "x$have_uuid" = "xyes"; then
   AC_DEFINE([HAVE_LIBUUID], 1, [Define to 1 if libuuid is found])
@@ -539,6 +570,10 @@ if test "x$have_libdeflate" = "xyes"; then
   AC_DEFINE([HAVE_LIBDEFLATE], 1, [Define to 1 if libdeflate is found])
 fi
 
+if test "x$have_libzstd" = "xyes"; then
+  AC_DEFINE([HAVE_LIBZSTD], 1, [Define to 1 if libzstd is found])
+fi
+
 # Dump maximum block size
 AS_IF([test "x$erofs_cv_max_block_size" = "x"],
   [$erofs_cv_max_block_size = 4096], [])
diff --git a/dump/Makefile.am b/dump/Makefile.am
index aed20c2..09c483e 100644
--- a/dump/Makefile.am
+++ b/dump/Makefile.am
@@ -7,4 +7,5 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 dump_erofs_SOURCES = main.c
 dump_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 dump_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS}
+   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS} \
+   ${libzstd_LIBS}
diff --git a/fsck/Makefile.am b/fsck/Makefile.am
index d024405..70eacc0 100644
--- a/fsck/Makefile.am
+++ b/fsck/Makefile.am
@@ -7,7 +7,8 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 fsck_erofs_SOURCES = main.c
 fsck_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 fsck_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${li

[PATCH v2] erofs: Zstandard compression support

2024-05-08 Thread Gao Xiang
From: Gao Xiang 

Add Zstandard compression as the 4th supported algorithm since it
has become more popular and some end users have asked for it for
quite a while [1][2].

Each EROFS physical cluster contains only one valid standard
Zstandard frame as described in [3] so that decompression can be
performed on a per-pcluster basis independently.

Currently, it just leverages multi-call stream decompression APIs with
internal sliding window buffers.  One-shot or bufferless decompression
could be implemented later for even better performance if needed.

[1] https://github.com/erofs/erofs-utils/issues/6
[2] https://lore.kernel.org/r/y08h+z6czdns1...@b-p7tqmd6m-0146.lan
[3] https://www.rfc-editor.org/rfc/rfc8478.txt

Acked-by: Chao Yu 
Signed-off-by: Gao Xiang 
---
change since v1:
 - Fix an unused warning:
https://lore.kernel.org/r/202405090343.ziq0crfw-...@intel.com

 fs/erofs/Kconfig |  15 ++
 fs/erofs/Makefile|   1 +
 fs/erofs/compress.h  |   4 +
 fs/erofs/decompressor.c  |   7 +
 fs/erofs/decompressor_zstd.c | 279 +++
 fs/erofs/erofs_fs.h  |  10 ++
 fs/erofs/internal.h  |   8 +
 fs/erofs/super.c |   7 +
 fs/erofs/zmap.c  |   3 +-
 9 files changed, 333 insertions(+), 1 deletion(-)
 create mode 100644 fs/erofs/decompressor_zstd.c

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index fffd3919343e..7dcdce660cac 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -112,6 +112,21 @@ config EROFS_FS_ZIP_DEFLATE
 
  If unsure, say N.
 
+config EROFS_FS_ZIP_ZSTD
+   bool "EROFS Zstandard compressed data support"
+   depends on EROFS_FS_ZIP
+   select ZSTD_DECOMPRESS
+   help
+ Saying Y here includes support for reading EROFS file systems
+ containing Zstandard compressed data.  It gives better compression
+ ratios than the default LZ4 format, while it costs more CPU
+ overhead.
+
+ Zstandard support is an experimental feature for now and so most
+ file systems will be readable without selecting this option.
+
+ If unsure, say N.
+
 config EROFS_FS_ONDEMAND
bool "EROFS fscache-based on-demand read support"
depends on EROFS_FS
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 20d1ec422443..097d672e6b14 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -6,4 +6,5 @@ erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o zutil.o
 erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
 erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
+erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 333587ba6183..19d53c30c8af 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -90,8 +90,12 @@ int z_erofs_load_lzma_config(struct super_block *sb,
struct erofs_super_block *dsb, void *data, int size);
 int z_erofs_load_deflate_config(struct super_block *sb,
struct erofs_super_block *dsb, void *data, int size);
+int z_erofs_load_zstd_config(struct super_block *sb,
+   struct erofs_super_block *dsb, void *data, int size);
 int z_erofs_lzma_decompress(struct z_erofs_decompress_req *rq,
struct page **pagepool);
 int z_erofs_deflate_decompress(struct z_erofs_decompress_req *rq,
   struct page **pagepool);
+int z_erofs_zstd_decompress(struct z_erofs_decompress_req *rq,
+   struct page **pgpl);
 #endif
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index d2fe8130819e..9d85b6c11c6b 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -399,6 +399,13 @@ const struct z_erofs_decompressor erofs_decompressors[] = {
.name = "deflate"
},
 #endif
+#ifdef CONFIG_EROFS_FS_ZIP_ZSTD
+   [Z_EROFS_COMPRESSION_ZSTD] = {
+   .config = z_erofs_load_zstd_config,
+   .decompress = z_erofs_zstd_decompress,
+   .name = "zstd"
+   },
+#endif
 };
 
 int z_erofs_parse_cfgs(struct super_block *sb, struct erofs_super_block *dsb)
diff --git a/fs/erofs/decompressor_zstd.c b/fs/erofs/decompressor_zstd.c
new file mode 100644
index 000000000000..63a23cac3af4
--- /dev/null
+++ b/fs/erofs/decompressor_zstd.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/zstd.h>
+#include "compress.h"
+
+struct z_erofs_zstd {
+   struct z_erofs_zstd *next;
+   u8 bounce[PAGE_SIZE];
+   void *wksp;
+   unsigned int wkspsz;
+};
+
+static DEFINE_SPINLOCK(z_erofs_zstd_lock);
+static unsigned int z_erofs_zstd_max_dictsize;
+static unsigned int z_erofs_zstd_nstrms, z_erofs_zstd_avail_strms;
+static struct z_erofs_zstd *z_erofs_zstd_head;
+static DECLARE_WAIT_

[PATCH] erofs: clean up z_erofs_load_full_lcluster()

2024-05-08 Thread Gao Xiang
Only four lcluster types here, remove redundant code.
No real logic changes.

Signed-off-by: Gao Xiang 
---
Some random cleanup out of the upcoming big lclusters..

 fs/erofs/erofs_fs.h |  5 +
 fs/erofs/zmap.c | 21 +
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index 4bc11602aac8..6c0c270c42e1 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -406,8 +406,7 @@ enum {
Z_EROFS_LCLUSTER_TYPE_MAX
 };
 
-#define Z_EROFS_LI_LCLUSTER_TYPE_BITS	2
-#define Z_EROFS_LI_LCLUSTER_TYPE_BIT 0
+#define Z_EROFS_LI_LCLUSTER_TYPE_MASK  (Z_EROFS_LCLUSTER_TYPE_MAX - 1)
 
 /* (noncompact only, HEAD) This pcluster refers to partial decompressed data */
 #define Z_EROFS_LI_PARTIAL_REF (1 << 15)
@@ -461,8 +460,6 @@ static inline void 
erofs_check_ondisk_layout_definitions(void)
 sizeof(struct z_erofs_lcluster_index));
BUILD_BUG_ON(sizeof(struct erofs_deviceslot) != 128);
 
-   BUILD_BUG_ON(BIT(Z_EROFS_LI_LCLUSTER_TYPE_BITS) <
-Z_EROFS_LCLUSTER_TYPE_MAX - 1);
/* exclude old compiler versions like gcc 7.5.0 */
BUILD_BUG_ON(__builtin_constant_p(fmh) ?
 fmh != cpu_to_le64(1ULL << 63) : 0);
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index e313c936351d..26637a60eba5 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -31,7 +31,7 @@ static int z_erofs_load_full_lcluster(struct 
z_erofs_maprecorder *m,
vi->inode_isize + vi->xattr_isize) +
lcn * sizeof(struct z_erofs_lcluster_index);
struct z_erofs_lcluster_index *di;
-   unsigned int advise, type;
+   unsigned int advise;
 
m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
  erofs_blknr(inode->i_sb, pos), 
EROFS_KMAP);
@@ -43,10 +43,8 @@ static int z_erofs_load_full_lcluster(struct 
z_erofs_maprecorder *m,
di = m->kaddr + erofs_blkoff(inode->i_sb, pos);
 
advise = le16_to_cpu(di->di_advise);
-   type = (advise >> Z_EROFS_LI_LCLUSTER_TYPE_BIT) &
-   ((1 << Z_EROFS_LI_LCLUSTER_TYPE_BITS) - 1);
-   switch (type) {
-   case Z_EROFS_LCLUSTER_TYPE_NONHEAD:
+   m->type = advise & Z_EROFS_LI_LCLUSTER_TYPE_MASK;
+   if (m->type == Z_EROFS_LCLUSTER_TYPE_NONHEAD) {
m->clusterofs = 1 << vi->z_logical_clusterbits;
m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
if (m->delta[0] & Z_EROFS_LI_D0_CBLKCNT) {
@@ -60,24 +58,15 @@ static int z_erofs_load_full_lcluster(struct 
z_erofs_maprecorder *m,
m->delta[0] = 1;
}
m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
-   break;
-   case Z_EROFS_LCLUSTER_TYPE_PLAIN:
-   case Z_EROFS_LCLUSTER_TYPE_HEAD1:
-   case Z_EROFS_LCLUSTER_TYPE_HEAD2:
-   if (advise & Z_EROFS_LI_PARTIAL_REF)
-   m->partialref = true;
+   } else {
+   m->partialref = !!(advise & Z_EROFS_LI_PARTIAL_REF);
m->clusterofs = le16_to_cpu(di->di_clusterofs);
if (m->clusterofs >= 1 << vi->z_logical_clusterbits) {
DBG_BUGON(1);
return -EFSCORRUPTED;
}
m->pblk = le32_to_cpu(di->di_u.blkaddr);
-   break;
-   default:
-   DBG_BUGON(1);
-   return -EOPNOTSUPP;
}
-   m->type = type;
return 0;
 }
 
-- 
2.39.3



[PATCH] erofs-utils: add preliminary zstd support

2024-05-08 Thread Gao Xiang
This patch just adds preliminary Zstandard support to erofs-utils,
since Zstandard doesn't officially support fixed-sized output
compression yet.  Mkfs could take more time to finish, but it works at
least.

The built-in zstd compressor for erofs-utils is a slow work in
progress, so apparently it will take more effort.

[ TODO: Later I tend to add another way to generate fixed-sized input
pclusters temporarily for relatively large pcluster sizes as
an option since it will have minor impacts to the results. ]
Signed-off-by: Gao Xiang 
---
 configure.ac |  34 ++
 dump/Makefile.am |   3 +-
 fsck/Makefile.am |   6 +-
 fuse/Makefile.am |   2 +-
 include/erofs_fs.h   |  10 +++
 lib/Makefile.am  |   3 +
 lib/compress.c   |  24 
 lib/compressor.c |   8 +++
 lib/compressor.h |   1 +
 lib/compressor_libzstd.c | 130 +++
 lib/decompress.c |  59 ++
 mkfs/Makefile.am |   2 +-
 12 files changed, 277 insertions(+), 5 deletions(-)
 create mode 100644 lib/compressor_libzstd.c

diff --git a/configure.ac b/configure.ac
index 4a940a8..d862b4d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -139,6 +139,10 @@ AC_ARG_WITH(libdeflate,
   [Enable and build with libdeflate inflate support 
@<:@default=disabled@:>@])], [],
   [with_libdeflate="no"])
 
+AC_ARG_WITH(libzstd,
+   [AS_HELP_STRING([--with-libzstd],
+  [Enable and build with libzstd support @<:@default=auto@:>@])])
+
 AC_ARG_ENABLE(fuse,
[AS_HELP_STRING([--enable-fuse], [enable erofsfuse @<:@default=no@:>@])],
[enable_fuse="$enableval"], [enable_fuse="no"])
@@ -474,6 +478,31 @@ AS_IF([test "x$with_libdeflate" != "xno"], [
   LIBS="${saved_LIBS}"
   CPPFLAGS="${saved_CPPFLAGS}"], [have_libdeflate="no"])
 
+# Configure libzstd
+have_libzstd="no"
+AS_IF([test "x$with_libzstd" != "xno"], [
+  PKG_CHECK_MODULES([libzstd], [libzstd >= 1.4.0], [
+# Paranoia: don't trust the result reported by pkgconfig before trying out
+saved_LIBS="$LIBS"
+saved_CPPFLAGS=${CPPFLAGS}
+CPPFLAGS="${libzstd_CFLAGS} ${CPPFLAGS}"
+LIBS="${libzstd_LIBS} $LIBS"
+AC_CHECK_HEADERS([zstd.h],[
+  AC_CHECK_LIB(zstd, ZSTD_decompressDCtx, [], [
+AC_MSG_ERROR([libzstd doesn't work properly])])
+  AC_CHECK_DECL(ZSTD_decompressDCtx, [have_libzstd="yes"],
+[AC_MSG_ERROR([libzstd doesn't work properly])], [[
+#include <zstd.h>
+  ]])
+])
+LIBS="${saved_LIBS}"
+CPPFLAGS="${saved_CPPFLAGS}"], [
+AS_IF([test "x$with_libzstd" = "xyes"], [
+  AC_MSG_ERROR([Cannot find proper libzstd])
+])
+  ])
+])
+
 # Enable 64-bit off_t
 CFLAGS+=" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"
 
@@ -494,6 +523,7 @@ AM_CONDITIONAL([ENABLE_LZ4HC], [test "x${have_lz4hc}" = 
"xyes"])
 AM_CONDITIONAL([ENABLE_FUSE], [test "x${have_fuse}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBLZMA], [test "x${have_liblzma}" = "xyes"])
 AM_CONDITIONAL([ENABLE_LIBDEFLATE], [test "x${have_libdeflate}" = "xyes"])
+AM_CONDITIONAL([ENABLE_LIBZSTD], [test "x${have_libzstd}" = "xyes"])
 
 if test "x$have_uuid" = "xyes"; then
   AC_DEFINE([HAVE_LIBUUID], 1, [Define to 1 if libuuid is found])
@@ -539,6 +569,10 @@ if test "x$have_libdeflate" = "xyes"; then
   AC_DEFINE([HAVE_LIBDEFLATE], 1, [Define to 1 if libdeflate is found])
 fi
 
+if test "x$have_libzstd" = "xyes"; then
+  AC_DEFINE([HAVE_LIBZSTD], 1, [Define to 1 if libzstd is found])
+fi
+
 # Dump maximum block size
 AS_IF([test "x$erofs_cv_max_block_size" = "x"],
   [$erofs_cv_max_block_size = 4096], [])
diff --git a/dump/Makefile.am b/dump/Makefile.am
index aed20c2..09c483e 100644
--- a/dump/Makefile.am
+++ b/dump/Makefile.am
@@ -7,4 +7,5 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 dump_erofs_SOURCES = main.c
 dump_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 dump_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS}
+   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS} \
+   ${libzstd_LIBS}
diff --git a/fsck/Makefile.am b/fsck/Makefile.am
index d024405..70eacc0 100644
--- a/fsck/Makefile.am
+++ b/fsck/Makefile.am
@@ -7,7 +7,8 @@ AM_CPPFLAGS = ${libuuid_CFLAGS}
 fsck_erofs_SOURCES = main.c
 fsck_erofs_CFLAGS = -Wall -I$(top_srcdir)/include
 fsck_erofs_LDADD = $(top_builddir)/lib/liberofs.la ${libselinux_LIBS} \
-   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_LIBS}
+   ${liblz4_LIBS} ${liblzma_LIBS} ${zlib_LIBS} ${libdeflate_L

[PATCH] erofs: Zstandard compression support

2024-05-08 Thread Gao Xiang
Add Zstandard compression as the 4th supported algorithm since it
has become more popular and some end users have asked for it for
quite a while [1][2].

Each EROFS physical cluster contains only one valid standard
Zstandard frame as described in [3] so that decompression can be
performed on a per-pcluster basis independently.

Currently, it just leverages multi-call stream decompression APIs with
internal sliding window buffers.  One-shot or bufferless decompression
could be implemented later for even better performance if needed.

[1] https://github.com/erofs/erofs-utils/issues/6
[2] https://lore.kernel.org/r/y08h+z6czdns1...@b-p7tqmd6m-0146.lan
[3] https://www.rfc-editor.org/rfc/rfc8478.txt
Signed-off-by: Gao Xiang 
---
 fs/erofs/Kconfig |  15 ++
 fs/erofs/Makefile|   1 +
 fs/erofs/compress.h  |   4 +
 fs/erofs/decompressor.c  |   7 +
 fs/erofs/decompressor_zstd.c | 279 +++
 fs/erofs/erofs_fs.h  |  10 ++
 fs/erofs/internal.h  |   8 +
 fs/erofs/super.c |   7 +
 8 files changed, 331 insertions(+)
 create mode 100644 fs/erofs/decompressor_zstd.c

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index fffd3919343e..7dcdce660cac 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -112,6 +112,21 @@ config EROFS_FS_ZIP_DEFLATE
 
  If unsure, say N.
 
+config EROFS_FS_ZIP_ZSTD
+   bool "EROFS Zstandard compressed data support"
+   depends on EROFS_FS_ZIP
+   select ZSTD_DECOMPRESS
+   help
+ Saying Y here includes support for reading EROFS file systems
+ containing Zstandard compressed data.  It gives better compression
+ ratios than the default LZ4 format, while it costs more CPU
+ overhead.
+
+ Zstandard support is an experimental feature for now and so most
+ file systems will be readable without selecting this option.
+
+ If unsure, say N.
+
 config EROFS_FS_ONDEMAND
bool "EROFS fscache-based on-demand read support"
depends on EROFS_FS
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 20d1ec422443..097d672e6b14 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -6,4 +6,5 @@ erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o zutil.o
 erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
 erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
+erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 333587ba6183..19d53c30c8af 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -90,8 +90,12 @@ int z_erofs_load_lzma_config(struct super_block *sb,
struct erofs_super_block *dsb, void *data, int size);
 int z_erofs_load_deflate_config(struct super_block *sb,
struct erofs_super_block *dsb, void *data, int size);
+int z_erofs_load_zstd_config(struct super_block *sb,
+   struct erofs_super_block *dsb, void *data, int size);
 int z_erofs_lzma_decompress(struct z_erofs_decompress_req *rq,
struct page **pagepool);
 int z_erofs_deflate_decompress(struct z_erofs_decompress_req *rq,
   struct page **pagepool);
+int z_erofs_zstd_decompress(struct z_erofs_decompress_req *rq,
+   struct page **pgpl);
 #endif
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index d2fe8130819e..9d85b6c11c6b 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -399,6 +399,13 @@ const struct z_erofs_decompressor erofs_decompressors[] = {
.name = "deflate"
},
 #endif
+#ifdef CONFIG_EROFS_FS_ZIP_ZSTD
+   [Z_EROFS_COMPRESSION_ZSTD] = {
+   .config = z_erofs_load_zstd_config,
+   .decompress = z_erofs_zstd_decompress,
+   .name = "zstd"
+   },
+#endif
 };
 
 int z_erofs_parse_cfgs(struct super_block *sb, struct erofs_super_block *dsb)
diff --git a/fs/erofs/decompressor_zstd.c b/fs/erofs/decompressor_zstd.c
new file mode 100644
index ..24279511db3b
--- /dev/null
+++ b/fs/erofs/decompressor_zstd.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include 
+#include "compress.h"
+
+struct z_erofs_zstd {
+   struct z_erofs_zstd *next;
+   u8 bounce[PAGE_SIZE];
+   void *wksp;
+   unsigned int wkspsz;
+};
+
+static DEFINE_SPINLOCK(z_erofs_zstd_lock);
+static unsigned int z_erofs_zstd_max_dictsize;
+static unsigned int z_erofs_zstd_nstrms, z_erofs_zstd_avail_strms;
+static struct z_erofs_zstd *z_erofs_zstd_head;
+static DECLARE_WAIT_QUEUE_HEAD(z_erofs_zstd_wq);
+
+module_param_named(zstd_streams, z_erofs_zstd_nstrms, uint, 0444);
+
+static struct z_erofs_zstd *z_erofs_isolate_strms(bool all)
+{
+   struct z_erofs_zstd *

Re: [PATCH AUTOSEL 6.8 26/52] erofs: reliably distinguish block based and fscache mode

2024-05-07 Thread Gao Xiang

Hi,

On 2024/5/8 07:06, Sasha Levin wrote:

From: Christian Brauner 

[ Upstream commit 7af2ae1b1531feab5d38ec9c8f472dc6cceb4606 ]

When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:


ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
  
  erofs_kill_sb+0x81/0x90
  deactivate_locked_super+0x35/0x80
  get_tree_bdev+0x136/0x1e0
  vfs_get_tree+0x2c/0xf0
  do_new_mount+0x190/0x2f0
  [...]


Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.

Signed-off-by: Christian Brauner 
Signed-off-by: Baokun Li 
Reviewed-by: Jingbo Xu 
Reviewed-by: Gao Xiang 
Reviewed-by: Chao Yu 
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaok...@huawei.com
Signed-off-by: Gao Xiang 
Signed-off-by: Sasha Levin 



Please help drop this patch for now too, you should backport
the dependency commit
07abe43a28b2 ("erofs: get rid of erofs_fs_context") in advance.

Otherwise it doesn't work and breaks the functionality.

Thanks,
Gao Xiang



Re: [PATCH AUTOSEL 6.6 21/43] erofs: reliably distinguish block based and fscache mode

2024-05-07 Thread Gao Xiang

Hi,

On 2024/5/8 07:09, Sasha Levin wrote:

From: Christian Brauner 

[ Upstream commit 7af2ae1b1531feab5d38ec9c8f472dc6cceb4606 ]

When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:


ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
  
  erofs_kill_sb+0x81/0x90
  deactivate_locked_super+0x35/0x80
  get_tree_bdev+0x136/0x1e0
  vfs_get_tree+0x2c/0xf0
  do_new_mount+0x190/0x2f0
  [...]


Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.

Signed-off-by: Christian Brauner 
Signed-off-by: Baokun Li 
Reviewed-by: Jingbo Xu 
Reviewed-by: Gao Xiang 
Reviewed-by: Chao Yu 
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaok...@huawei.com
Signed-off-by: Gao Xiang 
Signed-off-by: Sasha Levin 


Please help drop this patch, you should backport the dependency
commit 07abe43a28b2 ("erofs: get rid of erofs_fs_context")

in advance.

Thanks,
Gao Xiang


[PATCH v2 1/2] erofs-utils: record pclustersize in bytes instead of blocks

2024-04-30 Thread Gao Xiang
So that we don't need to handle blocksizes everywhere.

Signed-off-by: Gao Xiang 
---
v1:
  fix CI failures:

https://github.com/erofs/erofsnightly/actions/runs/8896341493/job/24428809079

 include/erofs/config.h |  4 +++-
 lib/compress.c | 38 +++---
 lib/compress_hints.c   | 11 ++-
 lib/config.c   |  2 --
 mkfs/main.c| 16 +---
 5 files changed, 37 insertions(+), 34 deletions(-)
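The byte-based fields introduced here relate to the old block counts by a shift of the superblock's blkszbits; a minimal standalone sketch of that conversion (hypothetical helpers, not erofs-utils' actual API):

```c
#include <assert.h>
#include <stdint.h>

/* With a block size of 1 << blkszbits bytes (e.g. blkszbits == 12 for
 * 4KiB blocks), a pcluster of `blks` blocks spans `blks << blkszbits`
 * bytes -- the conversion this change pushes out of most call sites. */
static uint32_t pclusterblks_to_bytes(uint32_t blks, uint8_t blkszbits)
{
	return blks << blkszbits;
}

static uint32_t pclustersize_to_blks(uint32_t bytes, uint8_t blkszbits)
{
	return bytes >> blkszbits;
}
```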

diff --git a/include/erofs/config.h b/include/erofs/config.h
index 16910ea..3ce8c59 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -79,7 +79,9 @@ struct erofs_configure {
u64 c_mkfs_segment_size;
u32 c_mt_workers;
 #endif
-   u32 c_pclusterblks_max, c_pclusterblks_def, c_pclusterblks_packed;
+   u32 c_mkfs_pclustersize_max;
+   u32 c_mkfs_pclustersize_def;
+   u32 c_mkfs_pclustersize_packed;
u32 c_max_decompressed_extent_bytes;
u64 c_unix_timestamp;
u32 c_uid, c_gid;
diff --git a/lib/compress.c b/lib/compress.c
index f918322..9772543 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -416,22 +416,21 @@ static int write_uncompressed_extent(struct z_erofs_compress_sctx *ctx,
 
 static unsigned int z_erofs_get_max_pclustersize(struct erofs_inode *inode)
 {
-   unsigned int pclusterblks;
-
-   if (erofs_is_packed_inode(inode))
-   pclusterblks = cfg.c_pclusterblks_packed;
+   if (erofs_is_packed_inode(inode)) {
+   return cfg.c_mkfs_pclustersize_packed;
 #ifndef NDEBUG
-   else if (cfg.c_random_pclusterblks)
-   pclusterblks = 1 + rand() % cfg.c_pclusterblks_max;
+   } else if (cfg.c_random_pclusterblks) {
+   unsigned int pclusterblks =
+   cfg.c_mkfs_pclustersize_max >> inode->sbi->blkszbits;
+
+   return (1 + rand() % pclusterblks) << inode->sbi->blkszbits;
 #endif
-   else if (cfg.c_compress_hints_file) {
+   } else if (cfg.c_compress_hints_file) {
z_erofs_apply_compress_hints(inode);
DBG_BUGON(!inode->z_physical_clusterblks);
-   pclusterblks = inode->z_physical_clusterblks;
-   } else {
-   pclusterblks = cfg.c_pclusterblks_def;
+   return inode->z_physical_clusterblks << inode->sbi->blkszbits;
}
-   return pclusterblks * erofs_blksiz(inode->sbi);
+   return cfg.c_mkfs_pclustersize_def;
 }
 
 static int z_erofs_fill_inline_data(struct erofs_inode *inode, void *data,
@@ -1591,7 +1590,8 @@ static int z_erofs_build_compr_cfgs(struct erofs_sb_info *sbi,
.lz4 = {
.max_distance =
cpu_to_le16(sbi->lz4_max_distance),
-   .max_pclusterblks = cfg.c_pclusterblks_max,
+   .max_pclusterblks =
+   cfg.c_mkfs_pclustersize_max >> sbi->blkszbits,
}
};
 
@@ -1696,17 +1696,17 @@ int z_erofs_compress_init(struct erofs_sb_info *sbi, struct erofs_buffer_head *s
 * if big pcluster is enabled, an extra CBLKCNT lcluster index needs
 * to be loaded in order to get those compressed block counts.
 */
-   if (cfg.c_pclusterblks_max > 1) {
-   if (cfg.c_pclusterblks_max >
-   Z_EROFS_PCLUSTER_MAX_SIZE / erofs_blksiz(sbi)) {
-   erofs_err("unsupported clusterblks %u (too large)",
- cfg.c_pclusterblks_max);
+   if (cfg.c_mkfs_pclustersize_max > erofs_blksiz(sbi)) {
+   if (cfg.c_mkfs_pclustersize_max > Z_EROFS_PCLUSTER_MAX_SIZE) {
+   erofs_err("unsupported pclustersize %u (too large)",
+ cfg.c_mkfs_pclustersize_max);
return -EINVAL;
}
erofs_sb_set_big_pcluster(sbi);
}
-   if (cfg.c_pclusterblks_packed > cfg.c_pclusterblks_max) {
-   erofs_err("invalid physical cluster size for the packed file");
+   if (cfg.c_mkfs_pclustersize_packed > cfg.c_mkfs_pclustersize_max) {
+   erofs_err("invalid pclustersize for the packed file %u",
+ cfg.c_mkfs_pclustersize_packed);
return -EINVAL;
}
 
diff --git a/lib/compress_hints.c b/lib/compress_hints.c
index 8b78f80..e79bd48 100644
--- a/lib/compress_hints.c
+++ b/lib/compress_hints.c
@@ -55,7 +55,7 @@ bool z_erofs_apply_compress_hints(struct erofs_inode *inode)
return true;
 
s = erofs_fspath(inode->i_srcpath);
-   pclusterblks = cfg.c_pclusterblks_def;
+   pclusterblks = cfg.c_mkfs_pclustersize_def >> inode->sbi->blkszbits;
algorithmtype =

Re: [PATCH] erofs-utils: optimize pthread_cond_signal calling

2024-04-30 Thread Gao Xiang




On 2024/5/1 10:24, Noboru Asai wrote:

Call pthread_cond_signal once per file.

Signed-off-by: Noboru Asai 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH] erofs-utils: simplify file handling

2024-04-30 Thread Gao Xiang

Hi Noboru,

On 2024/4/30 14:37, Noboru Asai wrote:

Open files again when data compression doesn't save space, to
simplify file handling.

* remove dup and lseek.
* call pthread_cond_signal once per file.

I think the probability of the above case occurring is a few percent.

Signed-off-by: Noboru Asai 
---
  lib/compress.c | 11 ++-
  lib/inode.c| 24 
  2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index 7fef698..4c7351f 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -1261,8 +1261,10 @@ void z_erofs_mt_workfn(struct erofs_work *work, void *tlsp)
  out:
cwork->errcode = ret;
pthread_mutex_lock(>mutex);
-   ++ictx->nfini;
-   pthread_cond_signal(>cond);
+   if (++ictx->nfini == ictx->seg_num) {
+   close(ictx->fd);


Thanks for the patch.

I think it's better to close fd in the main writer thread
(erofs_mt_write_compressed_file) rather than some random
compression worker.


+   pthread_cond_signal(>cond);


Could you send this fix separately? Also, I'm not sure whether there
would be some benefit in merging segments in advance (maybe in bulk)
without waiting for all tasks to finish.

But I may not have enough time to work on this for now.
If you have some interest, I think it's worth doing.


+   }
pthread_mutex_unlock(>mutex);
  }
  
@@ -1406,7 +1408,6 @@ int erofs_mt_write_compressed_file(struct z_erofs_compress_ictx *ictx)

blkaddr - compressed_blocks, compressed_blocks);
  
  out:

-   close(ictx->fd);
free(ictx);
return ret;
  }
@@ -1456,7 +1457,6 @@ void *erofs_begin_compressed_file(struct erofs_inode *inode, int fd, u64 fpos)
ictx = malloc(sizeof(*ictx));
if (!ictx)
return ERR_PTR(-ENOMEM);
-   ictx->fd = dup(fd);
} else {
  #ifdef EROFS_MT_ENABLED
pthread_mutex_lock(_ictx.mutex);
@@ -1466,8 +1466,8 @@ void *erofs_begin_compressed_file(struct erofs_inode *inode, int fd, u64 fpos)
pthread_mutex_unlock(_ictx.mutex);
  #endif
ictx = _ictx;
-   ictx->fd = fd;
}
+   ictx->fd = fd;
  
  	ictx->ccfg = _ccfg[inode->z_algorithmtype[0]];

inode->z_algorithmtype[0] = ictx->ccfg->algorithmtype;
@@ -1551,6 +1551,7 @@ int erofs_write_compressed_file(struct z_erofs_compress_ictx *ictx)
init_list_head();
  
  	ret = z_erofs_compress_segment(, -1, blkaddr);

+   close(ictx->fd);
if (ret)
goto err_free_idata;
  
diff --git a/lib/inode.c b/lib/inode.c

index 44d684f..a30975b 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1112,27 +1112,27 @@ static void erofs_fixup_meta_blkaddr(struct erofs_inode *rootdir)
  struct erofs_mkfs_job_ndir_ctx {
struct erofs_inode *inode;
void *ictx;
-   int fd;
  };
  
  static int erofs_mkfs_job_write_file(struct erofs_mkfs_job_ndir_ctx *ctx)

  {
struct erofs_inode *inode = ctx->inode;
+   int fd;
int ret;
  
  	if (ctx->ictx) {

ret = erofs_write_compressed_file(ctx->ictx);
if (ret != -ENOSPC)
-   goto out;
-   if (lseek(ctx->fd, 0, SEEK_SET) < 0) {
-   ret = -errno;
-   goto out;
-   }
+   return ret;
}
+
/* fallback to all data uncompressed */
-   ret = erofs_write_unencoded_file(inode, ctx->fd, 0);
-out:
-   close(ctx->fd);
+   fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);


At a quick glance, here we need to open i_srcpath again; I tend to
avoid that, so I used dup() instead.

Thanks,
Gao Xiang


[PATCH 1/2] erofs-utils: record pclustersize in bytes instead of blocks

2024-04-30 Thread Gao Xiang
So that we don't need to handle blocksizes everywhere.

Signed-off-by: Gao Xiang 
---
 include/erofs/config.h |  4 +++-
 lib/compress.c | 30 --
 lib/compress_hints.c   | 11 ++-
 lib/config.c   |  2 --
 mkfs/main.c| 16 +---
 5 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/include/erofs/config.h b/include/erofs/config.h
index 16910ea..3ce8c59 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -79,7 +79,9 @@ struct erofs_configure {
u64 c_mkfs_segment_size;
u32 c_mt_workers;
 #endif
-   u32 c_pclusterblks_max, c_pclusterblks_def, c_pclusterblks_packed;
+   u32 c_mkfs_pclustersize_max;
+   u32 c_mkfs_pclustersize_def;
+   u32 c_mkfs_pclustersize_packed;
u32 c_max_decompressed_extent_bytes;
u64 c_unix_timestamp;
u32 c_uid, c_gid;
diff --git a/lib/compress.c b/lib/compress.c
index f918322..20d1568 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -416,22 +416,23 @@ static int write_uncompressed_extent(struct z_erofs_compress_sctx *ctx,
 
 static unsigned int z_erofs_get_max_pclustersize(struct erofs_inode *inode)
 {
-   unsigned int pclusterblks;
+   unsigned int pclustersize;
 
if (erofs_is_packed_inode(inode))
-   pclusterblks = cfg.c_pclusterblks_packed;
+   pclustersize = cfg.c_mkfs_pclustersize_packed;
 #ifndef NDEBUG
else if (cfg.c_random_pclusterblks)
-   pclusterblks = 1 + rand() % cfg.c_pclusterblks_max;
+   pclustersize = ((1 + rand()) << inode->sbi->blkszbits) %
+   cfg.c_mkfs_pclustersize_max;
 #endif
else if (cfg.c_compress_hints_file) {
z_erofs_apply_compress_hints(inode);
DBG_BUGON(!inode->z_physical_clusterblks);
-   pclusterblks = inode->z_physical_clusterblks;
+   pclustersize = inode->z_physical_clusterblks << inode->sbi->blkszbits;
} else {
-   pclusterblks = cfg.c_pclusterblks_def;
+   pclustersize = cfg.c_mkfs_pclustersize_def;
}
-   return pclusterblks * erofs_blksiz(inode->sbi);
+   return pclustersize;
 }
 
 static int z_erofs_fill_inline_data(struct erofs_inode *inode, void *data,
@@ -1591,7 +1592,8 @@ static int z_erofs_build_compr_cfgs(struct erofs_sb_info *sbi,
.lz4 = {
.max_distance =
cpu_to_le16(sbi->lz4_max_distance),
-   .max_pclusterblks = cfg.c_pclusterblks_max,
+   .max_pclusterblks =
+   cfg.c_mkfs_pclustersize_max >> sbi->blkszbits,
}
};
 
@@ -1696,17 +1698,17 @@ int z_erofs_compress_init(struct erofs_sb_info *sbi, struct erofs_buffer_head *s
 * if big pcluster is enabled, an extra CBLKCNT lcluster index needs
 * to be loaded in order to get those compressed block counts.
 */
-   if (cfg.c_pclusterblks_max > 1) {
-   if (cfg.c_pclusterblks_max >
-   Z_EROFS_PCLUSTER_MAX_SIZE / erofs_blksiz(sbi)) {
-   erofs_err("unsupported clusterblks %u (too large)",
- cfg.c_pclusterblks_max);
+   if (cfg.c_mkfs_pclustersize_max > erofs_blksiz(sbi)) {
+   if (cfg.c_mkfs_pclustersize_max > Z_EROFS_PCLUSTER_MAX_SIZE) {
+   erofs_err("unsupported pclustersize %u (too large)",
+ cfg.c_mkfs_pclustersize_max);
return -EINVAL;
}
erofs_sb_set_big_pcluster(sbi);
}
-   if (cfg.c_pclusterblks_packed > cfg.c_pclusterblks_max) {
-   erofs_err("invalid physical cluster size for the packed file");
+   if (cfg.c_mkfs_pclustersize_packed > cfg.c_mkfs_pclustersize_max) {
+   erofs_err("invalid pclustersize for the packed file %u",
+ cfg.c_mkfs_pclustersize_packed);
return -EINVAL;
}
 
diff --git a/lib/compress_hints.c b/lib/compress_hints.c
index 8b78f80..e79bd48 100644
--- a/lib/compress_hints.c
+++ b/lib/compress_hints.c
@@ -55,7 +55,7 @@ bool z_erofs_apply_compress_hints(struct erofs_inode *inode)
return true;
 
s = erofs_fspath(inode->i_srcpath);
-   pclusterblks = cfg.c_pclusterblks_def;
+   pclusterblks = cfg.c_mkfs_pclustersize_def >> inode->sbi->blkszbits;
algorithmtype = 0;
 
list_for_each_entry(r, _hints_head, list) {
@@ -136,7 +136,7 @@ int erofs_load_compress_hints(struct erofs_sb_info *sbi)
if (pclustersize % erofs_blksiz(sbi)) {
erofs_warn("invalid physical clust

[PATCH 2/2] erofs-utils: lib: adjust MicroLZMA default dictionary size

2024-04-30 Thread Gao Xiang
If dict_size is not given, it will be set as max(32k, pclustersize * 8)
but no more than Z_EROFS_LZMA_MAX_DICT_SIZE.

Also kill an obsolete warning since multi-threaded support is landed.

Signed-off-by: Gao Xiang 
---
 lib/compressor_liblzma.c | 19 +++
 mkfs/main.c  |  8 ++--
 2 files changed, 17 insertions(+), 10 deletions(-)
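The default selection described in the commit message — max(32k, pclustersize * 8), bounded by Z_EROFS_LZMA_MAX_DICT_SIZE — can be sketched as below; the 8 MiB value used for the cap here is an assumption for illustration, not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define Z_EROFS_LZMA_MAX_DICT_SIZE (8U * 1024 * 1024)	/* assumed cap */

/* If no dict_size is given: pclustersize_max * 8, clamped to the max
 * dictionary size, but never below 32KiB. */
static uint32_t default_dictsize(uint32_t pclustersize_max)
{
	uint32_t dict_size = pclustersize_max << 3;

	if (dict_size > Z_EROFS_LZMA_MAX_DICT_SIZE)
		dict_size = Z_EROFS_LZMA_MAX_DICT_SIZE;
	if (dict_size < 32768)
		dict_size = 32768;
	return dict_size;
}
```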

diff --git a/lib/compressor_liblzma.c b/lib/compressor_liblzma.c
index 2f19a93..d609a28 100644
--- a/lib/compressor_liblzma.c
+++ b/lib/compressor_liblzma.c
@@ -70,11 +70,18 @@ static int erofs_compressor_liblzma_setlevel(struct erofs_compress *c,
 static int erofs_compressor_liblzma_setdictsize(struct erofs_compress *c,
u32 dict_size)
 {
-   if (!dict_size)
-   dict_size = erofs_compressor_lzma.default_dictsize;
+   if (!dict_size) {
+   if (erofs_compressor_lzma.default_dictsize) {
+   dict_size = erofs_compressor_lzma.default_dictsize;
+   } else {
+   dict_size = min_t(u32, Z_EROFS_LZMA_MAX_DICT_SIZE,
+ cfg.c_mkfs_pclustersize_max << 3);
+   if (dict_size < 32768)
+   dict_size = 32768;
+   }
+   }
 
-   if (dict_size > erofs_compressor_lzma.max_dictsize ||
-   dict_size < 4096) {
+   if (dict_size > Z_EROFS_LZMA_MAX_DICT_SIZE || dict_size < 4096) {
erofs_err("invalid dictionary size %u", dict_size);
return -EINVAL;
}
@@ -86,7 +93,6 @@ static int erofs_compressor_liblzma_init(struct erofs_compress *c)
 {
struct erofs_liblzma_context *ctx;
u32 preset;
-   static erofs_atomic_bool_t __warnonce;
 
ctx = malloc(sizeof(*ctx));
if (!ctx)
@@ -105,15 +111,12 @@ static int erofs_compressor_liblzma_init(struct erofs_compress *c)
ctx->opt.dict_size = c->dict_size;
 
c->private_data = ctx;
-   if (!erofs_atomic_test_and_set(&__warnonce))
-   erofs_warn("It may take a longer time since MicroLZMA is still single-threaded for now.");
return 0;
 }
 
 const struct erofs_compressor erofs_compressor_lzma = {
.default_level = LZMA_PRESET_DEFAULT,
.best_level = 109,
-   .default_dictsize = Z_EROFS_LZMA_MAX_DICT_SIZE,
.max_dictsize = Z_EROFS_LZMA_MAX_DICT_SIZE,
.init = erofs_compressor_liblzma_init,
.exit = erofs_compressor_liblzma_exit,
diff --git a/mkfs/main.c b/mkfs/main.c
index 3d19f60..bbf4f43 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -137,8 +137,12 @@ static void usage(int argc, char **argv)
   spaces, s->c->best_level, 
s->c->default_level);
}
if (s->c->setdictsize) {
-   printf("%s  [,dictsize=]\t(default=%u, max=%u)\n",
-  spaces, s->c->default_dictsize, s->c->max_dictsize);
+   if (s->c->default_dictsize)
+   printf("%s  [,dictsize=]\t(default=%u, max=%u)\n",
+  spaces, s->c->default_dictsize, s->c->max_dictsize);
+   else
+   printf("%s  [,dictsize=]\t(default=, max=%u)\n",
+  spaces, s->c->max_dictsize);
}
}
printf(
-- 
2.39.3



[GIT PULL] erofs fixes for 6.9-rc7

2024-04-29 Thread Gao Xiang
Hi Linus,

Could you consider this pull request for 6.9-rc7?

Here are three fixes related to EROFS fscache mode.  The most important
two patches fix calling kill_block_super() in bdev-based mode instead of
kill_anon_super(), as mentioned in [1].  The remaining patch is an
informative one.

All commits have been in -next and no potential merge conflict is
observed.

[1] https://lore.kernel.org/r/15ab9875-5123-7bc2-bb25-fc683129a...@huawei.com

Thanks,
Gao Xiang

The following changes since commit ed30a4a51bb196781c8058073ea720133a65596f:

  Linux 6.9-rc5 (2024-04-21 12:35:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git tags/erofs-for-6.9-rc7-fixes

for you to fetch changes up to 7af2ae1b1531feab5d38ec9c8f472dc6cceb4606:

  erofs: reliably distinguish block based and fscache mode (2024-04-28 20:36:52 +0800)


Changes since last update:

 - Better error message when prepare_ondemand_read failed;

 - Fix unmount of bdev-based mode if CONFIG_EROFS_FS_ONDEMAND is on.


Baokun Li (1):
  erofs: get rid of erofs_fs_context

Christian Brauner (1):
  erofs: reliably distinguish block based and fscache mode

Hongbo Li (1):
  erofs: modify the error message when prepare_ondemand_read failed

 fs/erofs/fscache.c  |   2 +-
 fs/erofs/internal.h |   7 ---
 fs/erofs/super.c| 124 +++-
 3 files changed, 56 insertions(+), 77 deletions(-)


Re: [PATCH -next v3 2/2] erofs: reliably distinguish block based and fscache mode

2024-04-28 Thread Gao Xiang




On 2024/4/19 20:36, Baokun Li wrote:

From: Christian Brauner 

When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:


ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
  
  erofs_kill_sb+0x81/0x90
  deactivate_locked_super+0x35/0x80
  get_tree_bdev+0x136/0x1e0
  vfs_get_tree+0x2c/0xf0
  do_new_mount+0x190/0x2f0
  [...]


Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.

Signed-off-by: Christian Brauner 
Signed-off-by: Baokun Li 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH -next v3 1/2] erofs: get rid of erofs_fs_context

2024-04-28 Thread Gao Xiang




On 2024/4/19 20:36, Baokun Li wrote:

Instead of allocating the erofs_sb_info in fill_super() allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this erofs_fs_context is no longer
needed, replace ctx with sbi, no functional changes.

Suggested-by: Jingbo Xu 
Signed-off-by: Baokun Li 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


[PATCH] erofs-utils: mkfs: use all available processors by default

2024-04-27 Thread Gao Xiang
Fulfill the needs of most users.

Signed-off-by: Gao Xiang 
---
 include/erofs/config.h |  3 +--
 lib/compress.c | 16 
 lib/config.c   |  5 -
 mkfs/main.c| 10 --
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/erofs/config.h b/include/erofs/config.h
index d2f91ff..16910ea 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -76,10 +76,9 @@ struct erofs_configure {
/* < 0, xattr disabled and INT_MAX, always use inline xattrs */
int c_inline_xattr_tolerance;
 #ifdef EROFS_MT_ENABLED
-   u64 c_segment_size;
+   u64 c_mkfs_segment_size;
u32 c_mt_workers;
 #endif
-
u32 c_pclusterblks_max, c_pclusterblks_def, c_pclusterblks_packed;
u32 c_max_decompressed_extent_bytes;
u64 c_unix_timestamp;
diff --git a/lib/compress.c b/lib/compress.c
index 7fef698..f918322 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -1255,7 +1255,7 @@ void z_erofs_mt_workfn(struct erofs_work *work, void *tlsp)
}
sctx->memoff = 0;
 
-   ret = z_erofs_compress_segment(sctx, sctx->seg_idx * cfg.c_segment_size,
+   ret = z_erofs_compress_segment(sctx, sctx->seg_idx * cfg.c_mkfs_segment_size,
   EROFS_NULL_ADDR);
 
 out:
@@ -1304,7 +1304,7 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx)
struct erofs_compress_work *cur, *head = NULL, **last = 
struct erofs_compress_cfg *ccfg = ictx->ccfg;
struct erofs_inode *inode = ictx->inode;
-   int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_segment_size);
+   int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_mkfs_segment_size);
int i;
 
ictx->seg_num = nsegs;
@@ -1338,9 +1338,9 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx)
if (i == nsegs - 1)
cur->ctx.remaining = inode->i_size -
  inode->fragment_size -
- i * cfg.c_segment_size;
+ i * cfg.c_mkfs_segment_size;
else
-   cur->ctx.remaining = cfg.c_segment_size;
+   cur->ctx.remaining = cfg.c_mkfs_segment_size;
 
cur->alg_id = ccfg->handle.alg->id;
cur->alg_name = ccfg->handle.alg->name;
@@ -1718,6 +1718,14 @@ int z_erofs_compress_init(struct erofs_sb_info *sbi, struct erofs_buffer_head *s
 
z_erofs_mt_enabled = false;
 #ifdef EROFS_MT_ENABLED
+   if (cfg.c_mt_workers > 1 && (cfg.c_dedupe || cfg.c_fragments)) {
+   if (cfg.c_dedupe)
+   erofs_warn("multi-threaded dedupe is NOT implemented for now");
+   if (cfg.c_fragments)
+   erofs_warn("multi-threaded fragments is NOT implemented for now");
+   cfg.c_mt_workers = 0;
+   }
+
if (cfg.c_mt_workers > 1) {
ret = erofs_alloc_workqueue(_erofs_mt_ctrl.wq,
cfg.c_mt_workers,
diff --git a/lib/config.c b/lib/config.c
index 2530274..98adaef 100644
--- a/lib/config.c
+++ b/lib/config.c
@@ -38,11 +38,6 @@ void erofs_init_configure(void)
cfg.c_pclusterblks_max = 1;
cfg.c_pclusterblks_def = 1;
cfg.c_max_decompressed_extent_bytes = -1;
-#ifdef EROFS_MT_ENABLED
-   cfg.c_segment_size = 16ULL * 1024 * 1024;
-   cfg.c_mt_workers = 1;
-#endif
-
erofs_stdout_tty = isatty(STDOUT_FILENO);
 }
 
diff --git a/mkfs/main.c b/mkfs/main.c
index d632f74..9ad213b 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -838,12 +838,6 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
}
cfg.c_pclusterblks_packed = pclustersize_packed >> sbi.blkszbits;
}
-#ifdef EROFS_MT_ENABLED
-   if (cfg.c_mt_workers > 1 && (cfg.c_dedupe || cfg.c_fragments)) {
-   erofs_warn("Note that dedupe/fragments are NOT supported in multi-threaded mode for now, resetting --workers=1.");
-   cfg.c_mt_workers = 1;
-   }
-#endif
return 0;
 }
 
@@ -954,6 +948,10 @@ static void erofs_mkfs_default_options(void)
cfg.c_legacy_compress = false;
cfg.c_inline_data = true;
cfg.c_xattr_name_filter = true;
+#ifdef EROFS_MT_ENABLED
+   cfg.c_mt_workers = erofs_get_available_processors();
+   cfg.c_mkfs_segment_size = 16ULL * 1024 * 1024;
+#endif
sbi.blkszbits = ilog2(min_t(u32, getpagesize(), EROFS_MAX_BLOCK_SIZE));
sbi.feature_incompat = EROFS_FEATURE_INCOMPAT_ZERO_PADDING;
sbi.feature_compat = EROFS_FEATURE_COMPAT_SB_CHKSUM |
-- 
2.39.3
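The per-file segmentation these workers operate on — DIV_ROUND_UP of i_size by the segment size, with only the last segment taking the remainder — can be sketched as below (fragment-size handling is omitted and the helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_SIZE (16ULL * 1024 * 1024)	/* default from the patch */

/* Number of compression segments covering a file of i_size bytes. */
static uint64_t nsegs(uint64_t i_size)
{
	return (i_size + SEGMENT_SIZE - 1) / SEGMENT_SIZE; /* DIV_ROUND_UP */
}

/* Bytes handled by 0-based segment `i`; only the last one is short. */
static uint64_t seg_remaining(uint64_t i_size, uint64_t i)
{
	if (i == nsegs(i_size) - 1)
		return i_size - i * SEGMENT_SIZE;
	return SEGMENT_SIZE;
}
```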



Re: [PATCH v2] erofs-utils: add missing block counting

2024-04-25 Thread Gao Xiang




On 2024/4/25 10:48, Noboru Asai wrote:

Hi Gao,

Oh, sorry.
I knew i_blkaddr is accessed for uncompressed files, but the case didn't
occur on the filesystem I used for testing, so I overlooked it.
I need to be more careful.


np, I've fixed it :)

Thanks,
Gao Xiang


Re: Trying to work with the tests

2024-04-24 Thread Gao Xiang




On 2024/4/24 18:37, Ian Kent wrote:

On 24/4/24 17:47, Ian Kent wrote:

On 23/4/24 18:34, Gao Xiang wrote:

Hi Ian,

On 2024/4/22 21:10, Ian Kent wrote:

On 22/4/24 17:12, Gao Xiang wrote:

Hi Ian,

(+Cc Jingbo here).

On 2024/4/22 16:31, Ian Kent wrote:

I'm new to the list so Hi to all,


I'm working with a heavily patched 5.14 kernel and I've gathered together patches to bring erofs up to 5.19, and I'm trying to run the erofs and fscache tests from a checkout of the 1.7.1 repo (branch experimental-tests-fscache). I have a couple of fails I can't quite work out, so I'm hoping for a little help.


Thanks for your interest and for providing the detailed info.

I guess the modified 5.14 kernel may originate from RHEL 9?


Yes, that's right.

I am working on improving erofs support in RHEL which of course goes via CentOS Stream 9.


BTW, could you submit the current patches to CentOS Stream 9 mainline,
so I could review them as well?


CentOS Stream is meant to allow our development to be public so, yes, I would like to do that.


It will be interesting to see how it works, I'll have a look around the CentOS web site to see if I can work out how it looks to external people.


Timing is good too as I'm about to construct a merge request and our process requires that to be against the CentOS Stream repo.


That repository is located on GitLab ... so we'll need to work out how to go about that.


Looking at the CentOS web page at https://docs.centos.org/en-US/stream-contrib/quickstart/, you would need a GitLab account to take part in the merge request review process.


Yes, I have a gitlab account.




If you wanted to take part in the case discussion as well you would need a Red Hat Issues account (sign up https://issues.redhat.com/). This is only needed if you want to take part in development/log bug reports, etc. since a Jira bug is required for each merge request.


I guess I don't need a Red Hat Issues account, since I could comment in the PR itself.




As the Mandalorian would say, "this is the way".


If you don't wish to do this then I can post elsewhere, perhaps a kernel.org repo, but it gets a bit harder if we work outside of the development process.


Nope, gitlab repo is fine, and I already participated in CentOS
Stream 9 repo before.

Thanks,
Gao Xiang




Ian



Re: [PATCH v2] erofs-utils: add missing block counting

2024-04-24 Thread Gao Xiang
On Wed, Apr 24, 2024 at 02:15:58PM +0800, Gao Xiang wrote:
> 
> 
> On 2024/4/24 13:59, Noboru Asai wrote:
> > Add missing block counting when the data to be inlined is not inlined.
> > 
> > ---
> > v2:
> > - move from erofs_write_tail_end() to erofs_prepare_tail_block()
> > 
> > Signed-off-by: Noboru Asai 
> 
> Reviewed-by: Gao Xiang 
> 
> Thanks,
> Gao Xiang

I applied the following version since v2 caused CI failure:
https://github.com/erofs/erofsnightly/actions/runs/8812585654


From 89e76dda5fd4956709bbb88b76063ef165fa3882 Mon Sep 17 00:00:00 2001
From: Noboru Asai 
Date: Wed, 24 Apr 2024 14:59:23 +0900
Subject: [PATCH] erofs-utils: add missing block counting

Add missing block counting when the data to be inlined is not inlined.

Signed-off-by: Noboru Asai 
Reviewed-by: Gao Xiang 
Signed-off-by: Gao Xiang 
---
 lib/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/inode.c b/lib/inode.c
index 7508c74..896a257 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -664,6 +664,8 @@ static int erofs_prepare_tail_block(struct erofs_inode 
*inode)
} else {
inode->lazy_tailblock = true;
}
+   if (is_inode_layout_compression(inode))
+   inode->u.i_blocks += 1;
return 0;
 }
 
-- 
2.30.2



Re: Trying to work with the tests

2024-04-24 Thread Gao Xiang




On 2024/4/24 17:47, Ian Kent wrote:

On 23/4/24 18:34, Gao Xiang wrote:

Hi Ian,

On 2024/4/22 21:10, Ian Kent wrote:

On 22/4/24 17:12, Gao Xiang wrote:

Hi Ian,

(+Cc Jingbo here).

On 2024/4/22 16:31, Ian Kent wrote:

I'm new to the list so Hi to all,


I'm working with a heavily patched 5.14 kernel and I've gathered together patches to bring erofs up to 5.19, and I'm trying to run the erofs and fscache tests from a checkout of the 1.7.1 repo (branch experimental-tests-fscache). I have a couple of fails I can't quite work out, so I'm hoping for a little help.


Thanks for your interest and for providing the detailed info.

I guess the modified 5.14 kernel may originate from RHEL 9?


Yes, that's right.

I am working on improving erofs support in RHEL which of course goes via CentOS Stream 9.


BTW, could you submit the current patches to CentOS Stream 9 mainline,
so I could review them as well?


CentOS Stream is meant to allow our development to be public so, yes, I would like to do that.


It will be interesting to see how it works, I'll have a look around the CentOS web site to see if I can work out how it looks to external people.


Timing is good too as I'm about to construct a merge request and our process requires that to be against the CentOS Stream repo.


That repository is located on GitLab ... so we'll need to work out how to go about that.


Yeah, so I could apply the pull request and play on my local development machine.

I have a plan to backport the latest EROFS to CentOS Stream 9, but currently I'm busy with internal stuff, so it's still a bit delayed...


Right, Eric mentioned you were keen to help out.


The full back port is a bit much to do in one step, I'd like to let it settle 
for a minor release before considering

further back port effort. Of course any assistance is also welcome if and when 
you have time.


Yeah, since you've already picked patches up to Linux 5.19.  So I think I could
also give it a try if these commits are available on GitLab...

I think I can help nail down the issue (fscache/005) too.


Right, there are only a couple of problems with the tests so I've decided to go 
ahead with the merge request and

work those test problems out as I go.


Yes, that is more effective for the problem on my side :-)

Thanks,
Gao Xiang




Ian


Re: [PATCH v2] erofs-utils: add missing block counting

2024-04-24 Thread Gao Xiang




On 2024/4/24 13:59, Noboru Asai wrote:

Add missing block counting when the data to be inlined is not inlined.

---
v2:
- move from erofs_write_tail_end() to erofs_prepare_tail_block()

Signed-off-by: Noboru Asai 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH v2] erofs-utils: fsck: extract chunk-based file with hole correctly

2024-04-24 Thread Gao Xiang
On Mon, Apr 22, 2024 at 07:31:32PM +0800, Yifan Zhao wrote:
> Currently fsck skips file extraction if it finds that EROFS_MAP_MAPPED
> is unset, which is not the case for chunk-based files with hole. This
> patch handles the corner case correctly.
> 
> Signed-off-by: Yifan Zhao 

I will apply the following version:

From 56e2f73cec3fa45d8b1dd1e9ec571b1f075d2275 Mon Sep 17 00:00:00 2001
From: Yifan Zhao 
Date: Mon, 22 Apr 2024 19:31:32 +0800
Subject: [PATCH] erofs-utils: fsck: extract chunk-based file with hole correctly

Currently fsck skips file extraction if it finds that EROFS_MAP_MAPPED
is unset, which is not the case for chunk-based files with holes.

This patch handles the corner case correctly.

Signed-off-by: Yifan Zhao 
Signed-off-by: Gao Xiang 
---
 fsck/main.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fsck/main.c b/fsck/main.c
index e5c37be..4dcb49d 100644
--- a/fsck/main.c
+++ b/fsck/main.c
@@ -470,9 +470,18 @@ static int erofs_verify_inode_data(struct erofs_inode 
*inode, int outfd)
pos += map.m_llen;
 
/* should skip decomp? */
-   if (!(map.m_flags & EROFS_MAP_MAPPED) || !fsckcfg.check_decomp)
+   if (map.m_la >= inode->i_size || !fsckcfg.check_decomp)
continue;
 
+   if (outfd >= 0 && !(map.m_flags & EROFS_MAP_MAPPED)) {
+   ret = lseek(outfd, map.m_llen, SEEK_CUR);
+   if (ret < 0) {
+   ret = -errno;
+   goto out;
+   }
+   continue;
+   }
+
if (map.m_plen > Z_EROFS_PCLUSTER_MAX_SIZE) {
if (compressed) {
erofs_err("invalid pcluster size %" PRIu64 " @ 
offset %" PRIu64 " of nid %" PRIu64,
-- 
2.30.2
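The lseek()-based hole handling applied above can be sketched as a small standalone userspace program (hypothetical names, not fsck code): seeking past the hole instead of writing zeros still advances the file offset and extends the file size, but leaves the skipped range unallocated on filesystems that support sparse files.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path`, skip `hole` bytes with lseek() instead of writing
 * zeros, then write a single byte so the file size covers the hole.
 * Returns the resulting file size, or -1 on error.
 */
static long make_sparse(const char *path, long hole)
{
	struct stat st;
	int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (lseek(fd, hole, SEEK_CUR) < 0 ||
	    write(fd, "x", 1) != 1 || fstat(fd, &st) < 0) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);
	unlink(path);
	return (long)st.st_size;
}
```

This is why the fsck hunk can simply lseek(outfd, map.m_llen, SEEK_CUR) over unmapped extents instead of writing a zero-filled buffer.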



Re: [PATCH] erofs-utils: add missing block counting

2024-04-23 Thread Gao Xiang




On 2024/4/24 13:33, Noboru Asai wrote:

Hi Gao,

I think that erofs_balloc() and erofs_bh_baloon() function in
erofs_write_tail_end()
also allocate a tail block, is that not true?


erofs_prepare_tail_block() is the place to decide the fallback
tail block. But due to some dependency, bh can be allocated in
erofs_write_tail_end() later.

erofs_write_tail_end() is designed for filling tail data, not
for deciding to get a fallback tail block, anyway.

commit 21d84349e79a ("erofs-utils: rearrange on-disk metadata")
changed the timing due to some dependency as I said before, but
later I need to revisit it.  erofs_prepare_tail_block() is the
original place to decide if a fallback tail block is needed,
that is also true for old versions.

Thanks,
Gao Xiang


Re: [PATCH] erofs-utils: add missing block counting

2024-04-23 Thread Gao Xiang

Hi Noboru,

On 2024/4/24 12:34, Noboru Asai wrote:

Add missing block counting when the data to be inlined is not inlined.

Signed-off-by: Noboru Asai 



Thanks for catching this! Could we fix this up in
erofs_prepare_tail_block()?

since currently it's the place to allocate a tail block for this.

Thanks,
Gao Xiang


---
  lib/inode.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/lib/inode.c b/lib/inode.c
index cf22bbe..727dcee 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -840,6 +840,7 @@ static int erofs_write_tail_end(struct erofs_inode *inode)
inode->idata_size = 0;
free(inode->idata);
inode->idata = NULL;
+   inode->u.i_blocks += 1;
  
  		erofs_droid_blocklist_write_tail_end(inode, erofs_blknr(sbi, pos));

}


Re: [PATCH 5/5] cachefiles: add missing lock protection when polling

2024-04-23 Thread Gao Xiang

Hi Baokun,

On 2024/4/24 11:34, libao...@huaweicloud.com wrote:

From: Jingbo Xu 

Add missing lock protection in poll routine when iterating xarray,
otherwise:

Even with RCU read lock held, only the slot of the radix tree is
ensured to be pinned there, while the data structure (e.g. struct
cachefiles_req) stored in the slot has no such guarantee.  The poll
routine will iterate the radix tree and dereference cachefiles_req
accordingly.  Thus RCU read lock is not adequate in this case and
spinlock is needed here.

Fixes: b817e22b2e91 ("cachefiles: narrow the scope of triggering EPOLLIN events in 
ondemand mode")
Signed-off-by: Jingbo Xu 
Reviewed-by: Joseph Qi 
Reviewed-by: Gao Xiang 


I'm not sure why this patch didn't send upstream,
https://gitee.com/anolis/cloud-kernel/commit/324ecaaa10fefb0e3d94b547e3170e40b90cda1f

But since we're now working on upstreaming, let's drop
the previous in-house review tags..

Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Signed-off-by: Baokun Li 
---
  fs/cachefiles/daemon.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 6465e2574230..73ed2323282a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -365,14 +365,14 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
  
  	if (cachefiles_in_ondemand_mode(cache)) {

if (!xa_empty(&cache->reqs)) {
-   rcu_read_lock();
+   xas_lock(&xas);
xas_for_each_marked(&xas, req, ULONG_MAX, 
CACHEFILES_REQ_NEW) {
if 
(!cachefiles_ondemand_is_reopening_read(req)) {
mask |= EPOLLIN;
break;
}
}
-   rcu_read_unlock();
+   xas_unlock(&xas);
}
} else {
if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
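A userspace analogue of the fix above (hypothetical names; a pthread mutex stands in for the xarray spinlock): dereferencing objects reachable from a shared structure needs the same lock that writers take, because merely keeping the slots readable does not pin the objects stored in them.

```c
#include <pthread.h>
#include <stddef.h>

struct node {
	int ready;
	struct node *next;
};

struct reqlist {
	pthread_mutex_t lock;
	struct node *head;
};

/*
 * Walk the list and report whether any node is ready; holding the
 * writers' lock pins the nodes themselves, not just the list head.
 */
static int any_ready(struct reqlist *l)
{
	struct node *n;
	int ret = 0;

	pthread_mutex_lock(&l->lock);
	for (n = l->head; n; n = n->next) {
		if (n->ready) {
			ret = 1;
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return ret;
}

/* Fixed three-node list with exactly one ready entry. */
static int run_demo(void)
{
	struct node c = { 1, NULL }, b = { 0, &c }, a = { 0, &b };
	struct reqlist l = { PTHREAD_MUTEX_INITIALIZER, &a };

	return any_ready(&l);
}
```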


Re: [PATCH -next] erofs: modify the error message when prepare_ondemand_read failed

2024-04-23 Thread Gao Xiang



(+cc linux-erofs & LKML)

On 2024/4/24 10:39, Hongbo Li wrote:

When prepare_ondemand_read fails, a wrong error message is printed.
prepare_read is also implemented in cachefiles, so we amend the message.

Signed-off-by: Hongbo Li 


Reviewed-by: Gao Xiang 

Could you resend the patch with the proper mailing list cced and with my
"Reviewed-by:" tag?  So I could apply it with the "b4" tool.

Thanks,
Gao Xiang


---
  fs/erofs/fscache.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 8aff1a724805..62da538d91cb 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -151,7 +151,7 @@ static int erofs_fscache_read_io_async(struct 
fscache_cookie *cookie,
if (WARN_ON(len == 0))
source = NETFS_INVALID_READ;
if (source != NETFS_READ_FROM_CACHE) {
-   erofs_err(NULL, "prepare_read failed (source %d)", 
source);
+   erofs_err(NULL, "prepare_ondemand_read failed (source 
%d)", source);
return -EIO;
}
  


Re: [syzbot] [erofs?] BUG: using smp_processor_id() in preemptible code in z_erofs_get_gbuf

2024-04-23 Thread Gao Xiang




On 2024/4/10 13:19, syzbot wrote:

Hello,

syzbot tried to test the proposed patch but the build/boot failed:



#syz invalid

Since
https://lore.kernel.org/r/20240408215231.3376659-1-dhav...@google.com
has been folded into the original patch and this issue is only in -next.

Thanks,
Gao Xiang


Re: [PATCH -next v3 1/2] erofs: get rid of erofs_fs_context

2024-04-23 Thread Gao Xiang




On 2024/4/19 20:36, Baokun Li wrote:

Instead of allocating the erofs_sb_info in fill_super() allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this erofs_fs_context is no longer
needed, replace ctx with sbi, no functional changes.

Suggested-by: Jingbo Xu 
Signed-off-by: Baokun Li 


Thanks, it looks good to me, let's see how it behaves after applying
to -next.

Thanks,
Gao Xiang


Re: Trying to work with the tests

2024-04-23 Thread Gao Xiang

Hi Ian,

On 2024/4/22 21:10, Ian Kent wrote:

On 22/4/24 17:12, Gao Xiang wrote:

Hi Ian,

(+Cc Jingbo here).

On 2024/4/22 16:31, Ian Kent wrote:

I'm new to the list so Hi to all,


I'm working with a heavily patched 5.14 kernel and I've gathered together 
patches to bring erofs

up to 5.19 and I'm trying to run the erofs and fscache tests from a checkout of 
the 1.7.1 repo.

(branch experimental-tests-fscache) and I have a couple of fails I can't quite 
work out so I'm

hoping for a little help.


Thanks for your interest and for providing the detailed info.

I guess a modified 5.14 kernel may be originated from RHEL 9?


Yes, that's right.

I am working on improving erofs support in RHEL which of course goes via CentOS 
Stream 9.


BTW, could you submit the current patches to CentOS stream 9 mainline?
so I could review as well.






I have a plan to backport the latest EROFS to CentOS stream 9, but
currently I'm busy with internal stuff, so it's still a bit delayed...


Right, Eric mentioned you were keen to help out.


The full back port is a bit much to do in one step, I'd like to let it settle 
for a minor release before considering

further back port effort. Of course any assistance is also welcome if and when 
you have time.


Yeah, since you've already picked patches up to Linux 5.19.  So I think I could
also give it a try if these commits are available on GitLab...

I think I can help nail down the issue (fscache/005) too.

Thanks,
Gao Xiang


Re: [PATCH] erofs-utils: fsck: extract chunk-based file with hole correctly

2024-04-22 Thread Gao Xiang




On 2024/4/22 18:05, Yifan Zhao wrote:

Currently fsck skips file extraction if it finds that EROFS_MAP_MAPPED
is unset, which is not the case for chunk-based files with holes. This
patch handles the corner case correctly.

Signed-off-by: Yifan Zhao 
---
  fsck/main.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fsck/main.c b/fsck/main.c
index e5c37be..c10b68e 100644
--- a/fsck/main.c
+++ b/fsck/main.c
@@ -470,7 +470,7 @@ static int erofs_verify_inode_data(struct erofs_inode 
*inode, int outfd)
pos += map.m_llen;
  
  		/* should skip decomp? */

-   if (!(map.m_flags & EROFS_MAP_MAPPED) || !fsckcfg.check_decomp)
+   if (map.m_la >= inode->i_size || !fsckcfg.check_decomp)
continue;
  
  		if (map.m_plen > Z_EROFS_PCLUSTER_MAX_SIZE) {

@@ -517,9 +517,14 @@ static int erofs_verify_inode_data(struct erofs_inode 
*inode, int outfd)
u64 count = min_t(u64, alloc_rawsize,
  map.m_llen);
  
-ret = erofs_read_one_data(inode, &map, raw, p, count);

-   if (ret)
-   goto out;
+   if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+   memset(raw, 0, count);


I think we could use lseek instead of write
zeros explicitly..


Thanks,
Gao Xiang


Re: Trying to work with the tests

2024-04-22 Thread Gao Xiang
-asynchronous-io-for-fscache-readpage_readahead.patch
+ erofs-scan-devices-from-device-table.patch
+ erofs-leave-compressed-inodes-unsupported-in-fscache-mode-for-now.patch
+ erofs-fix-crash-when-enable-tracepoint-cachefiles_prep_read.patch
+ erofs-get-rid-of-struct-z_erofs_collection.patch
+ erofs-get-rid-of-label-restart_now.patch
+ erofs-simplify-z_erofs_pcluster_readmore.patch
+ erofs-fix-backmost-member-of-z_erofs_decompress_frontend.patch
+ erofs-missing-hunks.patch


The last patch consists of what looks like a few hunks added by Linus to 
complete a folio pull

request that came in at the same time as the 5.19 erofs merge request. I know 
the list of

patches isn't very useful but it should give some idea of what I have and maybe 
someone can

spot a missing patch or so.


Anyway, my failing tests are erofs/021, erofs/022, erofs/024 and fscache/005.


I guess the following failures are expected:
erofs/021  -- uncompressed sub-page block sizes (esp. 512-byte block sizes, 
since v6.4)
erofs/022  -- long xattr prefix (since v6.4)
erofs/024  -- deflate algorithm support (since v6.6)

So these failures can be skipped on your side; I think I need to modify
these tests to skip gracefully ... That is also why all testcases
are marked as "experimental" :-)

I'm not quite sure why "fscache/005" fails, hopefully Jingbo could
help you on this.

Thanks!
Gao Xiang



erofs/018 does not run due to "lzma compression is disabled, skipped." message 
which I think

is due to too old a version of xz.


Any insight into cases that could cause these tests to fail would be much 
appreciated.


Ian




[PATCH v2 8/8] erofs-utils: mkfs: enable inter-file multi-threaded compression

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

Dispatch deferred ops in another per-sb worker thread.  Note that
deferred ops are strictly FIFOed.

Signed-off-by: Gao Xiang 
---
 include/erofs/internal.h |   6 ++
 lib/inode.c  | 119 ++-
 2 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index f31e548..ecbbdf6 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -71,6 +71,7 @@ struct erofs_xattr_prefix_item {
 
 #define EROFS_PACKED_NID_UNALLOCATED   -1
 
+struct erofs_mkfs_dfops;
 struct erofs_sb_info {
struct erofs_device_info *devs;
char *devname;
@@ -124,6 +125,11 @@ struct erofs_sb_info {
struct list_head list;
 
u64 saved_by_deduplication;
+
+#ifdef EROFS_MT_ENABLED
+   pthread_t dfops_worker;
+   struct erofs_mkfs_dfops *mkfs_dfops;
+#endif
 };
 
 /* make sure that any user of the erofs headers has atleast 64bit off_t type */
diff --git a/lib/inode.c b/lib/inode.c
index 6ad66bf..cf22bbe 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1165,6 +1165,7 @@ enum erofs_mkfs_jobtype { /* ordered job types */
EROFS_MKFS_JOB_NDIR,
EROFS_MKFS_JOB_DIR,
EROFS_MKFS_JOB_DIR_BH,
+   EROFS_MKFS_JOB_MAX
 };
 
 struct erofs_mkfs_jobitem {
@@ -1203,6 +1204,73 @@ static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem 
*item)
return -EINVAL;
 }
 
+#ifdef EROFS_MT_ENABLED
+
+struct erofs_mkfs_dfops {
+   pthread_t worker;
+   pthread_mutex_t lock;
+   pthread_cond_t full, empty;
+   struct erofs_mkfs_jobitem *queue;
+   unsigned int entries, head, tail;
+};
+
+#define EROFS_MT_QUEUE_SIZE 128
+
+void *erofs_mkfs_pop_jobitem(struct erofs_mkfs_dfops *q)
+{
+   struct erofs_mkfs_jobitem *item;
+
+   pthread_mutex_lock(&q->lock);
+   while (q->head == q->tail)
+   pthread_cond_wait(&q->empty, &q->lock);
+
+   item = q->queue + q->head;
+   q->head = (q->head + 1) & (q->entries - 1);
+
+   pthread_cond_signal(&q->full);
+   pthread_mutex_unlock(&q->lock);
+   return item;
+}
+
+void *z_erofs_mt_dfops_worker(void *arg)
+{
+   struct erofs_sb_info *sbi = arg;
+   int ret = 0;
+
+   while (1) {
+   struct erofs_mkfs_jobitem *item;
+
+   item = erofs_mkfs_pop_jobitem(sbi->mkfs_dfops);
+   if (item->type >= EROFS_MKFS_JOB_MAX)
+   break;
+   ret = erofs_mkfs_jobfn(item);
+   if (ret)
+   break;
+   }
+   pthread_exit((void *)(uintptr_t)ret);
+}
+
+int erofs_mkfs_go(struct erofs_sb_info *sbi,
+ enum erofs_mkfs_jobtype type, void *elem, int size)
+{
+   struct erofs_mkfs_jobitem *item;
+   struct erofs_mkfs_dfops *q = sbi->mkfs_dfops;
+
+   pthread_mutex_lock(&q->lock);
+
+   while (((q->tail + 1) & (q->entries - 1)) == q->head)
+   pthread_cond_wait(&q->full, &q->lock);
+
+   item = q->queue + q->tail;
+   item->type = type;
+   memcpy(&item->u, elem, size);
+   q->tail = (q->tail + 1) & (q->entries - 1);
+
+   pthread_cond_signal(&q->empty);
+   pthread_mutex_unlock(&q->lock);
+   return 0;
+}
+#else
 int erofs_mkfs_go(struct erofs_sb_info *sbi,
  enum erofs_mkfs_jobtype type, void *elem, int size)
 {
@@ -1212,6 +1280,7 @@ int erofs_mkfs_go(struct erofs_sb_info *sbi,
memcpy(&item.u, elem, size);
return erofs_mkfs_jobfn(&item);
 }
+#endif
 
 static int erofs_mkfs_handle_directory(struct erofs_inode *dir)
 {
@@ -1344,7 +1413,11 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
return ret;
 }
 
-struct erofs_inode *erofs_mkfs_build_tree_from_path(const char *path)
+#ifndef EROFS_MT_ENABLED
+#define __erofs_mkfs_build_tree_from_path erofs_mkfs_build_tree_from_path
+#endif
+
+struct erofs_inode *__erofs_mkfs_build_tree_from_path(const char *path)
 {
struct erofs_inode *root, *dumpdir;
int err;
@@ -1399,10 +1472,52 @@ struct erofs_inode 
*erofs_mkfs_build_tree_from_path(const char *path)
if (err)
return ERR_PTR(err);
} while (dumpdir);
-
return root;
 }
 
+#ifdef EROFS_MT_ENABLED
+struct erofs_inode *erofs_mkfs_build_tree_from_path(const char *path)
+{
+   struct erofs_mkfs_dfops *q;
+   struct erofs_inode *root;
+   int err;
+
+   q = malloc(sizeof(*q));
+   if (!q)
+   return ERR_PTR(-ENOMEM);
+
+   q->entries = EROFS_MT_QUEUE_SIZE;
+   q->queue = malloc(q->entries * sizeof(*q->queue));
+   if (!q->queue) {
+   free(q);
+   return ERR_PTR(-ENOMEM);
+   }
+   pthread_mutex_init(&q->lock, NULL);
+   pthread_cond_init(&q->empty, NULL);
+   pthread_cond_init(&q->full, NULL);
+
+   q->head = 0;
+   q->tail = 0;
+   s
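The bounded FIFO behind erofs_mkfs_pop_jobitem()/erofs_mkfs_go() above is a classic power-of-two ring buffer guarded by one mutex and two condition variables. A minimal standalone sketch (hypothetical names; erofs-utils queues jobitems rather than ints, and the real consumer runs in a worker thread):

```c
#include <pthread.h>

#define QSZ 8	/* power of two; one slot is kept empty to tell full from empty */

struct ring {
	pthread_mutex_t lock;
	pthread_cond_t full, empty;
	int queue[QSZ];
	unsigned int head, tail;
};

static void ring_push(struct ring *q, int v)
{
	pthread_mutex_lock(&q->lock);
	while (((q->tail + 1) & (QSZ - 1)) == q->head)	/* queue full */
		pthread_cond_wait(&q->full, &q->lock);
	q->queue[q->tail] = v;
	q->tail = (q->tail + 1) & (QSZ - 1);
	pthread_cond_signal(&q->empty);
	pthread_mutex_unlock(&q->lock);
}

static int ring_pop(struct ring *q)
{
	int v;

	pthread_mutex_lock(&q->lock);
	while (q->head == q->tail)			/* queue empty */
		pthread_cond_wait(&q->empty, &q->lock);
	v = q->queue[q->head];
	q->head = (q->head + 1) & (QSZ - 1);
	pthread_cond_signal(&q->full);
	pthread_mutex_unlock(&q->lock);
	return v;
}

/* Single-threaded demo staying within capacity: push 1..5, pop the sum. */
static int run_demo(void)
{
	struct ring q = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.full = PTHREAD_COND_INITIALIZER,
		.empty = PTHREAD_COND_INITIALIZER,
	};
	int i, sum = 0;

	for (i = 1; i <= 5; i++)
		ring_push(&q, i);
	for (i = 0; i < 5; i++)
		sum += ring_pop(&q);
	return sum;
}
```

Because producer and consumer block on separate condition variables, deferred ops stay strictly FIFO while the tree walk and the dfops worker run concurrently.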

[PATCH v2 7/8] erofs-utils: lib: introduce non-directory jobitem context

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

It describes the EROFS_MKFS_JOB_NDIR deferred work.  Also, start
compression before queueing EROFS_MKFS_JOB_NDIR.

Signed-off-by: Gao Xiang 
---
 lib/inode.c | 62 +++--
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index 0d044f4..6ad66bf 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1107,8 +1107,36 @@ static void erofs_fixup_meta_blkaddr(struct erofs_inode 
*rootdir)
rootdir->nid = (off - meta_offset) >> EROFS_ISLOTBITS;
 }
 
-static int erofs_mkfs_handle_nondirectory(struct erofs_inode *inode)
+struct erofs_mkfs_job_ndir_ctx {
+   struct erofs_inode *inode;
+   void *ictx;
+   int fd;
+};
+
+static int erofs_mkfs_job_write_file(struct erofs_mkfs_job_ndir_ctx *ctx)
 {
+   struct erofs_inode *inode = ctx->inode;
+   int ret;
+
+   if (ctx->ictx) {
+   ret = erofs_write_compressed_file(ctx->ictx);
+   if (ret != -ENOSPC)
+   goto out;
+   if (lseek(ctx->fd, 0, SEEK_SET) < 0) {
+   ret = -errno;
+   goto out;
+   }
+   }
+   /* fallback to all data uncompressed */
+   ret = erofs_write_unencoded_file(inode, ctx->fd, 0);
+out:
+   close(ctx->fd);
+   return ret;
+}
+
+static int erofs_mkfs_handle_nondirectory(struct erofs_mkfs_job_ndir_ctx *ctx)
+{
+   struct erofs_inode *inode = ctx->inode;
int ret = 0;
 
if (S_ISLNK(inode->i_mode)) {
@@ -1124,12 +1152,7 @@ static int erofs_mkfs_handle_nondirectory(struct 
erofs_inode *inode)
ret = erofs_write_file_from_buffer(inode, symlink);
free(symlink);
} else if (inode->i_size) {
-   int fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);
-
-   if (fd < 0)
-   return -errno;
-   ret = erofs_write_file(inode, fd, 0);
-   close(fd);
+   ret = erofs_mkfs_job_write_file(ctx);
}
if (ret)
return ret;
@@ -1148,6 +1171,7 @@ struct erofs_mkfs_jobitem {
enum erofs_mkfs_jobtype type;
union {
struct erofs_inode *inode;
+   struct erofs_mkfs_job_ndir_ctx ndir;
} u;
 };
 
@@ -1157,7 +1181,7 @@ static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem 
*item)
int ret;
 
if (item->type == EROFS_MKFS_JOB_NDIR)
-   return erofs_mkfs_handle_nondirectory(inode);
+   return erofs_mkfs_handle_nondirectory(&item->u.ndir);
 
if (item->type == EROFS_MKFS_JOB_DIR) {
ret = erofs_prepare_inode_buffer(inode);
@@ -1294,11 +1318,27 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
if (ret < 0)
return ret;
 
-   if (!S_ISDIR(inode->i_mode))
+   if (!S_ISDIR(inode->i_mode)) {
+   struct erofs_mkfs_job_ndir_ctx ctx = { .inode = inode };
+
+   if (!S_ISLNK(inode->i_mode) && inode->i_size) {
+   ctx.fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);
+   if (ctx.fd < 0)
+   return -errno;
+
+   if (cfg.c_compr_opts[0].alg &&
+   erofs_file_is_compressible(inode)) {
+   ctx.ictx = erofs_begin_compressed_file(inode,
+   ctx.fd, 0);
+   if (IS_ERR(ctx.ictx))
+   return PTR_ERR(ctx.ictx);
+   }
+   }
ret = erofs_mkfs_go(inode->sbi, EROFS_MKFS_JOB_NDIR,
-   &inode, sizeof(inode));
-   else
+   &ctx, sizeof(ctx));
+   } else {
ret = erofs_mkfs_handle_directory(inode);
+   }
erofs_info("file %s dumped (mode %05o)", erofs_fspath(inode->i_srcpath),
   inode->i_mode);
return ret;
-- 
2.30.2



[PATCH v2 6/8] erofs-utils: mkfs: prepare inter-file multi-threaded compression

2024-04-21 Thread Gao Xiang
From: Yifan Zhao 

This patch separates the compression process into two parts.

Specifically, erofs_begin_compressed_file() will trigger compression.
erofs_write_compressed_file() will wait for the compression finish and
write compressed (meta)data.

Note that it's possible that erofs_begin_compressed_file() and
erofs_write_compressed_file() run with different threads even the
global inode context is used, thus add another synchronization point.

Signed-off-by: Yifan Zhao 
Co-authored-by: Tong Xin 
Signed-off-by: Gao Xiang 
---
 include/erofs/compress.h |   5 +-
 lib/compress.c   | 138 ---
 lib/inode.c  |  17 -
 3 files changed, 118 insertions(+), 42 deletions(-)

diff --git a/include/erofs/compress.h b/include/erofs/compress.h
index 871db54..c9831a7 100644
--- a/include/erofs/compress.h
+++ b/include/erofs/compress.h
@@ -17,8 +17,11 @@ extern "C"
 #define EROFS_CONFIG_COMPR_MAX_SZ  (4000 * 1024)
 #define Z_EROFS_COMPR_QUEUE_SZ (EROFS_CONFIG_COMPR_MAX_SZ * 2)
 
+struct z_erofs_compress_ictx;
+
 void z_erofs_drop_inline_pcluster(struct erofs_inode *inode);
-int erofs_write_compressed_file(struct erofs_inode *inode, int fd, u64 fpos);
+void *erofs_begin_compressed_file(struct erofs_inode *inode, int fd, u64 fpos);
+int erofs_write_compressed_file(struct z_erofs_compress_ictx *ictx);
 
 int z_erofs_compress_init(struct erofs_sb_info *sbi,
  struct erofs_buffer_head *bh);
diff --git a/lib/compress.c b/lib/compress.c
index 4ac4760..7fef698 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -109,6 +109,7 @@ struct erofs_compress_work {
 static struct {
struct erofs_workqueue wq;
struct erofs_compress_work *idle;
+   pthread_mutex_t mutex;
 } z_erofs_mt_ctrl;
 #endif
 
@@ -1312,11 +1313,13 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx)
pthread_cond_init(&ictx->cond, NULL);
 
for (i = 0; i < nsegs; i++) {
+   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
cur = z_erofs_mt_ctrl.idle;
if (cur) {
z_erofs_mt_ctrl.idle = cur->next;
cur->next = NULL;
}
+   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
if (!cur) {
cur = calloc(1, sizeof(*cur));
if (!cur)
@@ -1364,8 +1367,10 @@ int erofs_mt_write_compressed_file(struct 
z_erofs_compress_ictx *ictx)
pthread_mutex_unlock(&ictx->mutex);
 
bh = erofs_balloc(DATA, 0, 0, 0);
-   if (IS_ERR(bh))
-   return PTR_ERR(bh);
+   if (IS_ERR(bh)) {
+   ret = PTR_ERR(bh);
+   goto out;
+   }
 
DBG_BUGON(!head);
blkaddr = erofs_mapbh(bh->block);
@@ -1389,27 +1394,31 @@ int erofs_mt_write_compressed_file(struct 
z_erofs_compress_ictx *ictx)
blkaddr = cur->ctx.blkaddr;
}
 
+   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
cur->next = z_erofs_mt_ctrl.idle;
z_erofs_mt_ctrl.idle = cur;
-   } while(head);
+   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   } while (head);
 
if (ret)
-   return ret;
-
-   return erofs_commit_compressed_file(ictx, bh,
+   goto out;
+   ret = erofs_commit_compressed_file(ictx, bh,
blkaddr - compressed_blocks, compressed_blocks);
+
+out:
+   close(ictx->fd);
+   free(ictx);
+   return ret;
 }
 #endif
 
-int erofs_write_compressed_file(struct erofs_inode *inode, int fd, u64 fpos)
+static struct z_erofs_compress_ictx g_ictx;
+
+void *erofs_begin_compressed_file(struct erofs_inode *inode, int fd, u64 fpos)
 {
-   static u8 g_queue[Z_EROFS_COMPR_QUEUE_SZ];
-   struct erofs_buffer_head *bh;
-   static struct z_erofs_compress_ictx ctx;
-   static struct z_erofs_compress_sctx sctx;
-   erofs_blk_t blkaddr;
-   int ret;
struct erofs_sb_info *sbi = inode->sbi;
+   struct z_erofs_compress_ictx *ictx;
+   int ret;
 
/* initialize per-file compression setting */
inode->z_advise = 0;
@@ -1440,43 +1449,87 @@ int erofs_write_compressed_file(struct erofs_inode 
*inode, int fd, u64 fpos)
}
}
 #endif
-   ctx.ccfg = &erofs_ccfg[inode->z_algorithmtype[0]];
-   inode->z_algorithmtype[0] = ctx.ccfg->algorithmtype;
-   inode->z_algorithmtype[1] = 0;
-
inode->idata_size = 0;
inode->fragment_size = 0;
 
+   if (z_erofs_mt_enabled) {
+   ictx = malloc(sizeof(*ictx));
+   if (!ictx)
+   return ERR_PTR(-ENOMEM);
+   ictx->fd = dup(fd);
+   } else {
+#ifdef EROFS_MT_ENABLED
+   pthread_mutex_lock(&g_ictx.mutex);
+   if (g_ictx.seg_num)
+   pthread_cond_w

[PATCH v2 5/8] erofs-utils: lib: split up z_erofs_mt_compress()

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

The on-disk compressed data write will be moved into a new function
erofs_mt_write_compressed_file().

Signed-off-by: Gao Xiang 
---
 lib/compress.c | 162 -
 1 file changed, 93 insertions(+), 69 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index 0bc5426..4ac4760 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -57,6 +57,8 @@ struct z_erofs_compress_ictx {/* inode 
context */
pthread_mutex_t mutex;
pthread_cond_t cond;
int nfini;
+
+   struct erofs_compress_work *mtworks;
 #endif
 };
 
@@ -530,11 +532,11 @@ static int __z_erofs_compress_one(struct 
z_erofs_compress_sctx *ctx,
if (len <= ctx->pclustersize) {
if (!final || !len)
return 1;
+   if (inode->fragment_size && !ictx->fix_dedupedfrag) {
+   ctx->pclustersize = roundup(len, blksz);
+   goto fix_dedupedfrag;
+   }
if (may_packing) {
-   if (inode->fragment_size && !ictx->fix_dedupedfrag) {
-   ctx->pclustersize = roundup(len, blksz);
-   goto fix_dedupedfrag;
-   }
e->length = len;
goto frag_packing;
}
@@ -1034,6 +1036,26 @@ int z_erofs_compress_segment(struct 
z_erofs_compress_sctx *ctx,
z_erofs_commit_extent(ctx, ctx->pivot);
ctx->pivot = NULL;
}
+
+   /* generate an extra extent for the deduplicated fragment */
+   if (ctx->seg_idx >= ictx->seg_num - 1 &&
+   ictx->inode->fragment_size && !ictx->fragemitted) {
+   struct z_erofs_extent_item *ei;
+
+   ei = malloc(sizeof(*ei));
+   if (!ei)
+   return -ENOMEM;
+
+   ei->e = (struct z_erofs_inmem_extent) {
+   .length = ictx->inode->fragment_size,
+   .compressedblks = 0,
+   .raw = false,
+   .partial = false,
+   .blkaddr = ctx->blkaddr,
+   };
+   init_list_head(>list);
+   z_erofs_commit_extent(ctx, ei);
+   }
return 0;
 }
 
@@ -1048,6 +1070,8 @@ int erofs_commit_compressed_file(struct 
z_erofs_compress_ictx *ictx,
u8 *compressmeta;
int ret;
 
+   z_erofs_fragments_commit(inode);
+
/* fall back to no compression mode */
DBG_BUGON(compressed_blocks < !!inode->idata_size);
compressed_blocks -= !!inode->idata_size;
@@ -1125,11 +1149,11 @@ err_free_meta:
free(compressmeta);
inode->compressmeta = NULL;
 err_free_idata:
+   erofs_bdrop(bh, true);  /* revoke buffer */
if (inode->idata) {
free(inode->idata);
inode->idata = NULL;
}
-   erofs_bdrop(bh, true);  /* revoke buffer */
return ret;
 }
 
@@ -1260,7 +1284,7 @@ int z_erofs_merge_segment(struct z_erofs_compress_ictx 
*ictx,
sctx->blkaddr += ei->e.compressedblks;
 
/* skip write data but leave blkaddr for inline fallback */
-   if (ei->e.inlined)
+   if (ei->e.inlined || !ei->e.compressedblks)
continue;
ret2 = blk_write(sbi, sctx->membuf + blkoff * erofs_blksiz(sbi),
 ei->e.blkaddr, ei->e.compressedblks);
@@ -1274,15 +1298,13 @@ int z_erofs_merge_segment(struct z_erofs_compress_ictx 
*ictx,
return ret;
 }
 
-int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx,
-   erofs_blk_t blkaddr,
-   erofs_blk_t *compressed_blocks)
+int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx)
 {
struct erofs_compress_work *cur, *head = NULL, **last = &head;
struct erofs_compress_cfg *ccfg = ictx->ccfg;
struct erofs_inode *inode = ictx->inode;
int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_segment_size);
-   int ret, i;
+   int i;
 
ictx->seg_num = nsegs;
ictx->nfini = 0;
@@ -1290,11 +1312,12 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
pthread_cond_init(>cond, NULL);
 
for (i = 0; i < nsegs; i++) {
-   if (z_erofs_mt_ctrl.idle) {
-   cur = z_erofs_mt_ctrl.idle;
+   cur = z_erofs_mt_ctrl.idle;
+   if (cur) {
z_erofs_mt_ctrl.idle = cur->next;
cur->next = NULL;
-   } else {
+   }
+   if (!cur) {
cur = calloc(1, sizeof(*cur));
if (!cur)
   

[PATCH v2 4/8] erofs-utils: rearrange several fields for multi-threaded mkfs

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

They should be located in `struct z_erofs_compress_ictx`.

Signed-off-by: Gao Xiang 
---
 lib/compress.c | 57 +++---
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index 8ca4033..0bc5426 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -38,6 +38,7 @@ struct z_erofs_extent_item {
 
 struct z_erofs_compress_ictx { /* inode context */
struct erofs_inode *inode;
+   struct erofs_compress_cfg *ccfg;
int fd;
u64 fpos;
 
@@ -49,6 +50,14 @@ struct z_erofs_compress_ictx {   /* inode 
context */
u8 *metacur;
struct list_head extents;
u16 clusterofs;
+
+   int seg_num;
+
+#if EROFS_MT_ENABLED
+   pthread_mutex_t mutex;
+   pthread_cond_t cond;
+   int nfini;
+#endif
 };
 
 struct z_erofs_compress_sctx { /* segment context */
@@ -68,7 +77,7 @@ struct z_erofs_compress_sctx {/* segment 
context */
erofs_blk_t blkaddr;/* pointing to the next blkaddr */
u16 clusterofs;
 
-   int seg_num, seg_idx;
+   int seg_idx;
 
void *membuf;
erofs_off_t memoff;
@@ -98,9 +107,6 @@ struct erofs_compress_work {
 static struct {
struct erofs_workqueue wq;
struct erofs_compress_work *idle;
-   pthread_mutex_t mutex;
-   pthread_cond_t cond;
-   int nfini;
 } z_erofs_mt_ctrl;
 #endif
 
@@ -513,7 +519,7 @@ static int __z_erofs_compress_one(struct 
z_erofs_compress_sctx *ctx,
struct erofs_compress *const h = ctx->chandle;
unsigned int len = ctx->tail - ctx->head;
bool is_packed_inode = erofs_is_packed_inode(inode);
-   bool tsg = (ctx->seg_idx + 1 >= ctx->seg_num), final = !ctx->remaining;
+   bool tsg = (ctx->seg_idx + 1 >= ictx->seg_num), final = !ctx->remaining;
bool may_packing = (cfg.c_fragments && tsg && final &&
!is_packed_inode && !z_erofs_mt_enabled);
bool may_inline = (cfg.c_ztailpacking && tsg && final && !may_packing);
@@ -1201,7 +1207,8 @@ void z_erofs_mt_workfn(struct erofs_work *work, void 
*tlsp)
struct erofs_compress_work *cwork = (struct erofs_compress_work *)work;
struct erofs_compress_wq_tls *tls = tlsp;
struct z_erofs_compress_sctx *sctx = >ctx;
-   struct erofs_inode *inode = sctx->ictx->inode;
+   struct z_erofs_compress_ictx *ictx = sctx->ictx;
+   struct erofs_inode *inode = ictx->inode;
struct erofs_sb_info *sbi = inode->sbi;
int ret = 0;
 
@@ -1228,10 +1235,10 @@ void z_erofs_mt_workfn(struct erofs_work *work, void 
*tlsp)
 
 out:
cwork->errcode = ret;
-   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
-   ++z_erofs_mt_ctrl.nfini;
-   pthread_cond_signal(&z_erofs_mt_ctrl.cond);
-   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   pthread_mutex_lock(&ictx->mutex);
+   ++ictx->nfini;
+   pthread_cond_signal(&ictx->cond);
+   pthread_mutex_unlock(&ictx->mutex);
 }
 
 int z_erofs_merge_segment(struct z_erofs_compress_ictx *ictx,
@@ -1268,16 +1275,19 @@ int z_erofs_merge_segment(struct z_erofs_compress_ictx 
*ictx,
 }
 
 int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx,
-   struct erofs_compress_cfg *ccfg,
erofs_blk_t blkaddr,
erofs_blk_t *compressed_blocks)
 {
struct erofs_compress_work *cur, *head = NULL, **last = &head;
+   struct erofs_compress_cfg *ccfg = ictx->ccfg;
struct erofs_inode *inode = ictx->inode;
int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_segment_size);
int ret, i;
 
-   z_erofs_mt_ctrl.nfini = 0;
+   ictx->seg_num = nsegs;
+   ictx->nfini = 0;
+   pthread_mutex_init(&ictx->mutex, NULL);
+   pthread_cond_init(&ictx->cond, NULL);
 
for (i = 0; i < nsegs; i++) {
if (z_erofs_mt_ctrl.idle) {
@@ -1294,7 +1304,6 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
 
cur->ctx = (struct z_erofs_compress_sctx) {
.ictx = ictx,
-   .seg_num = nsegs,
.seg_idx = i,
.pivot = &dummy_pivot,
};
@@ -1316,11 +1325,10 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
erofs_queue_work(&z_erofs_mt_ctrl.wq, &cur->work);
}
 
-   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
-   while (z_erofs_mt_ctrl.nfini != nsegs)
-   pthread_cond_wait(&z_erofs_mt_ctrl.cond,
- &z_erofs_mt_ctrl.mutex);
-   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   pthread_mutex_lock(&ictx->mutex);
+   while (ictx->nfini < ictx->seg_num)
+   pthread_cond_wait(&ictx->cond, &ictx->mutex);
+   pthread_mutex_unlock(&ictx->mutex);

[PATCH v2 3/8] erofs-utils: lib: split out erofs_commit_compressed_file()

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

Just split out on-disk compressed metadata commit logic.

Signed-off-by: Gao Xiang 
---
 lib/compress.c | 191 +++--
 1 file changed, 105 insertions(+), 86 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index b084446..8ca4033 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -1031,6 +1031,102 @@ int z_erofs_compress_segment(struct 
z_erofs_compress_sctx *ctx,
return 0;
 }
 
+int erofs_commit_compressed_file(struct z_erofs_compress_ictx *ictx,
+struct erofs_buffer_head *bh,
+erofs_blk_t blkaddr,
+erofs_blk_t compressed_blocks)
+{
+   struct erofs_inode *inode = ictx->inode;
+   struct erofs_sb_info *sbi = inode->sbi;
+   unsigned int legacymetasize;
+   u8 *compressmeta;
+   int ret;
+
+   /* fall back to no compression mode */
+   DBG_BUGON(compressed_blocks < !!inode->idata_size);
+   compressed_blocks -= !!inode->idata_size;
+
+   compressmeta = malloc(BLK_ROUND_UP(sbi, inode->i_size) *
+ sizeof(struct z_erofs_lcluster_index) +
+ Z_EROFS_LEGACY_MAP_HEADER_SIZE);
+   if (!compressmeta) {
+   ret = -ENOMEM;
+   goto err_free_idata;
+   }
+   ictx->metacur = compressmeta + Z_EROFS_LEGACY_MAP_HEADER_SIZE;
+   z_erofs_write_indexes(ictx);
+
+   legacymetasize = ictx->metacur - compressmeta;
+   /* estimate if data compression saves space or not */
+   if (!inode->fragment_size &&
+   compressed_blocks * erofs_blksiz(sbi) + inode->idata_size +
+   legacymetasize >= inode->i_size) {
+   z_erofs_dedupe_commit(true);
+   ret = -ENOSPC;
+   goto err_free_meta;
+   }
+   z_erofs_dedupe_commit(false);
+   z_erofs_write_mapheader(inode, compressmeta);
+
+   if (!ictx->fragemitted)
+   sbi->saved_by_deduplication += inode->fragment_size;
+
+   /* if the entire file is a fragment, a simplified form is used. */
+   if (inode->i_size <= inode->fragment_size) {
+   DBG_BUGON(inode->i_size < inode->fragment_size);
+   DBG_BUGON(inode->fragmentoff >> 63);
+   *(__le64 *)compressmeta =
+   cpu_to_le64(inode->fragmentoff | 1ULL << 63);
+   inode->datalayout = EROFS_INODE_COMPRESSED_FULL;
+   legacymetasize = Z_EROFS_LEGACY_MAP_HEADER_SIZE;
+   }
+
+   if (compressed_blocks) {
+   ret = erofs_bh_balloon(bh, erofs_pos(sbi, compressed_blocks));
+   DBG_BUGON(ret != erofs_blksiz(sbi));
+   } else {
+   if (!cfg.c_fragments && !cfg.c_dedupe)
+   DBG_BUGON(!inode->idata_size);
+   }
+
+   erofs_info("compressed %s (%llu bytes) into %u blocks",
+  inode->i_srcpath, (unsigned long long)inode->i_size,
+  compressed_blocks);
+
+   if (inode->idata_size) {
+   bh->op = &erofs_skip_write_bhops;
+   inode->bh_data = bh;
+   } else {
+   erofs_bdrop(bh, false);
+   }
+
+   inode->u.i_blocks = compressed_blocks;
+
+   if (inode->datalayout == EROFS_INODE_COMPRESSED_FULL) {
+   inode->extent_isize = legacymetasize;
+   } else {
+   ret = z_erofs_convert_to_compacted_format(inode, blkaddr,
+ legacymetasize,
+ compressmeta);
+   DBG_BUGON(ret);
+   }
+   inode->compressmeta = compressmeta;
+   if (!erofs_is_packed_inode(inode))
+   erofs_droid_blocklist_write(inode, blkaddr, compressed_blocks);
+   return 0;
+
+err_free_meta:
+   free(compressmeta);
+   inode->compressmeta = NULL;
+err_free_idata:
+   if (inode->idata) {
+   free(inode->idata);
+   inode->idata = NULL;
+   }
+   erofs_bdrop(bh, true);  /* revoke buffer */
+   return ret;
+}
+
 #ifdef EROFS_MT_ENABLED
 void *z_erofs_mt_wq_tls_alloc(struct erofs_workqueue *wq, void *ptr)
 {
@@ -1260,23 +1356,9 @@ int erofs_write_compressed_file(struct erofs_inode 
*inode, int fd, u64 fpos)
static struct z_erofs_compress_sctx sctx;
struct erofs_compress_cfg *ccfg;
erofs_blk_t blkaddr, compressed_blocks = 0;
-   unsigned int legacymetasize;
int ret;
bool ismt = false;
struct erofs_sb_info *sbi = inode->sbi;
-   u8 *compressmeta = malloc(BLK_ROUND_UP(sbi, inode->i_size) *
- sizeof(struct z_erofs_lcluster_index) +
- Z_EROFS_LEGACY_MAP_HEADER_SIZE);
-
- 

[PATCH v2 2/8] erofs-utils: lib: prepare for later deferred work

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

Split out ordered metadata operations and add the following helpers:

 - erofs_mkfs_jobfn()

 - erofs_mkfs_go()

to handle these mkfs job items for multi-threading support.

Signed-off-by: Gao Xiang 
---
 lib/inode.c | 69 -
 1 file changed, 58 insertions(+), 11 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index 55969d9..1ff05e1 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1133,6 +1133,57 @@ static int erofs_mkfs_handle_nondirectory(struct 
erofs_inode *inode)
return 0;
 }
 
+enum erofs_mkfs_jobtype {  /* ordered job types */
+   EROFS_MKFS_JOB_NDIR,
+   EROFS_MKFS_JOB_DIR,
+   EROFS_MKFS_JOB_DIR_BH,
+};
+
+struct erofs_mkfs_jobitem {
+   enum erofs_mkfs_jobtype type;
+   union {
+   struct erofs_inode *inode;
+   } u;
+};
+
+static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem *item)
+{
+   struct erofs_inode *inode = item->u.inode;
+   int ret;
+
+   if (item->type == EROFS_MKFS_JOB_NDIR)
+   return erofs_mkfs_handle_nondirectory(inode);
+
+   if (item->type == EROFS_MKFS_JOB_DIR) {
+   ret = erofs_prepare_inode_buffer(inode);
+   if (ret)
+   return ret;
+   inode->bh->op = &erofs_skip_write_bhops;
+   if (IS_ROOT(inode)) /* assign root NID */
+   erofs_fixup_meta_blkaddr(inode);
+   return 0;
+   }
+
+   if (item->type == EROFS_MKFS_JOB_DIR_BH) {
+   erofs_write_dir_file(inode);
+   erofs_write_tail_end(inode);
+   inode->bh->op = &erofs_write_inode_bhops;
+   erofs_iput(inode);
+   return 0;
+   }
+   return -EINVAL;
+}
+
+int erofs_mkfs_go(struct erofs_sb_info *sbi,
+ enum erofs_mkfs_jobtype type, void *elem, int size)
+{
+   struct erofs_mkfs_jobitem item;
+
+   item.type = type;
+   memcpy(&item.u, elem, size);
+   return erofs_mkfs_jobfn(&item);
+}
+
 static int erofs_mkfs_handle_directory(struct erofs_inode *dir)
 {
DIR *_dir;
@@ -1213,11 +1264,7 @@ static int erofs_mkfs_handle_directory(struct 
erofs_inode *dir)
else
dir->i_nlink = i_nlink;
 
-   ret = erofs_prepare_inode_buffer(dir);
-   if (ret)
-   return ret;
-   dir->bh->op = &erofs_skip_write_bhops;
-   return 0;
+   return erofs_mkfs_go(dir->sbi, EROFS_MKFS_JOB_DIR, &dir, sizeof(dir));
 
 err_closedir:
closedir(_dir);
@@ -1243,7 +1290,8 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
return ret;
 
if (!S_ISDIR(inode->i_mode))
-   ret = erofs_mkfs_handle_nondirectory(inode);
+   ret = erofs_mkfs_go(inode->sbi, EROFS_MKFS_JOB_NDIR,
+   &inode, sizeof(inode));
else
ret = erofs_mkfs_handle_directory(inode);
erofs_info("file %s dumped (mode %05o)", erofs_fspath(inode->i_srcpath),
@@ -1268,7 +1316,6 @@ struct erofs_inode *erofs_mkfs_build_tree_from_path(const 
char *path)
err = erofs_mkfs_handle_inode(root);
if (err)
return ERR_PTR(err);
-   erofs_fixup_meta_blkaddr(root);
 
do {
int err;
@@ -1302,10 +1349,10 @@ struct erofs_inode 
*erofs_mkfs_build_tree_from_path(const char *path)
}
*last = dumpdir;/* fixup the last (or the only) one */
dumpdir = head;
-   erofs_write_dir_file(dir);
-   erofs_write_tail_end(dir);
-   dir->bh->op = &erofs_write_inode_bhops;
-   erofs_iput(dir);
+   err = erofs_mkfs_go(dir->sbi, EROFS_MKFS_JOB_DIR_BH,
+   &dir, sizeof(dir));
+   if (err)
+   return ERR_PTR(err);
} while (dumpdir);
 
return root;
-- 
2.30.2



[PATCH v2 1/8] erofs-utils: use erofs_atomic_t for inode->i_count

2024-04-21 Thread Gao Xiang
From: Gao Xiang 

`inode->i_count` can be touched by more than one thread if
multi-threading is enabled.

Signed-off-by: Gao Xiang 
---
patchset v1->v2:
 - Fix `--all-fragments` functionality;
 - Fix issues pointed out by Yifan. 

 include/erofs/atomic.h   | 10 ++
 include/erofs/inode.h|  2 +-
 include/erofs/internal.h |  3 ++-
 lib/inode.c  |  5 +++--
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/erofs/atomic.h b/include/erofs/atomic.h
index 214cdb1..f28687e 100644
--- a/include/erofs/atomic.h
+++ b/include/erofs/atomic.h
@@ -25,4 +25,14 @@ __n;})
 #define erofs_atomic_test_and_set(ptr) \
__atomic_test_and_set(ptr, __ATOMIC_RELAXED)
 
+#define erofs_atomic_add_return(ptr, i) \
+   __atomic_add_fetch(ptr, i, __ATOMIC_RELAXED)
+
+#define erofs_atomic_sub_return(ptr, i) \
+   __atomic_sub_fetch(ptr, i, __ATOMIC_RELAXED)
+
+#define erofs_atomic_inc_return(ptr) erofs_atomic_add_return(ptr, 1)
+
+#define erofs_atomic_dec_return(ptr) erofs_atomic_sub_return(ptr, 1)
+
 #endif
diff --git a/include/erofs/inode.h b/include/erofs/inode.h
index d5a732a..5d6bc98 100644
--- a/include/erofs/inode.h
+++ b/include/erofs/inode.h
@@ -17,7 +17,7 @@ extern "C"
 
 static inline struct erofs_inode *erofs_igrab(struct erofs_inode *inode)
 {
-   ++inode->i_count;
+   (void)erofs_atomic_inc_return(&inode->i_count);
return inode;
 }
 
diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index 4cd2059..f31e548 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -25,6 +25,7 @@ typedef unsigned short umode_t;
 #ifdef HAVE_PTHREAD_H
#include <pthread.h>
 #endif
+#include "atomic.h"
 
 #ifndef PATH_MAX
#define PATH_MAX	4096	/* # chars in a path name including nul */
@@ -169,7 +170,7 @@ struct erofs_inode {
/* (mkfs.erofs) next pointer for directory dumping */
struct erofs_inode *next_dirwrite;
};
-   unsigned int i_count;
+   erofs_atomic_t i_count;
struct erofs_sb_info *sbi;
struct erofs_inode *i_parent;
 
diff --git a/lib/inode.c b/lib/inode.c
index 7508c74..55969d9 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -129,9 +129,10 @@ struct erofs_inode *erofs_iget_by_nid(erofs_nid_t nid)
 unsigned int erofs_iput(struct erofs_inode *inode)
 {
struct erofs_dentry *d, *t;
+   unsigned long got = erofs_atomic_dec_return(&inode->i_count);
 
-   if (inode->i_count > 1)
-   return --inode->i_count;
+   if (got >= 1)
+   return got;
 
list_for_each_entry_safe(d, t, &inode->i_subdirs, d_child)
free(d);
-- 
2.30.2



[PATCH v2] erofs-utils: mkfs: skip the redundant write for ztailpacking block

2024-04-18 Thread Gao Xiang
From: Yifan Zhao 

z_erofs_merge_segment() doesn't consider the ztailpacking block in the
extent list and unnecessarily writes it back to the disk. This patch
fixes this issue by introducing a new `inlined` field in the struct
`z_erofs_inmem_extent`.

Fixes: 830b27bc2334 ("erofs-utils: mkfs: introduce inner-file multi-threaded 
compression")
Signed-off-by: Yifan Zhao 
[ Gao Xiang: simplify a bit. ]
Signed-off-by: Gao Xiang 
---
v2:
  Yifan's patch is almost correct, it just has minor changes.
 include/erofs/dedupe.h |  2 +-
 lib/compress.c | 14 +++---
 lib/dedupe.c   |  1 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/erofs/dedupe.h b/include/erofs/dedupe.h
index 153bd4c..4cbfb2c 100644
--- a/include/erofs/dedupe.h
+++ b/include/erofs/dedupe.h
@@ -16,7 +16,7 @@ struct z_erofs_inmem_extent {
erofs_blk_t blkaddr;
unsigned int compressedblks;
unsigned int length;
-   bool raw, partial;
+   bool raw, partial, inlined;
 };
 
 struct z_erofs_dedupe_ctx {
diff --git a/lib/compress.c b/lib/compress.c
index 74c5707..b084446 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -305,6 +305,7 @@ static int z_erofs_compress_dedupe(struct 
z_erofs_compress_sctx *ctx,
if (z_erofs_dedupe_match(&dctx))
break;
 
+   DBG_BUGON(dctx.e.inlined);
delta = ctx->queue + ctx->head - dctx.cur;
/*
 * For big pcluster dedupe, leave two indices at least to store
@@ -519,6 +520,7 @@ static int __z_erofs_compress_one(struct 
z_erofs_compress_sctx *ctx,
unsigned int compressedsize;
int ret;
 
+   *e = (struct z_erofs_inmem_extent){};
if (len <= ctx->pclustersize) {
if (!final || !len)
return 1;
@@ -553,16 +555,18 @@ static int __z_erofs_compress_one(struct 
z_erofs_compress_sctx *ctx,
if (may_inline && len < blksz) {
ret = z_erofs_fill_inline_data(inode,
ctx->queue + ctx->head, len, true);
+   if (ret < 0)
+   return ret;
+   e->inlined = true;
} else {
may_inline = false;
may_packing = false;
 nocompression:
/* TODO: reset clusterofs to 0 if permitted */
ret = write_uncompressed_extent(ctx, len, dst);
+   if (ret < 0)
+   return ret;
}
-
-   if (ret < 0)
-   return ret;
e->length = ret;
 
/*
@@ -598,6 +602,7 @@ frag_packing:
compressedsize, false);
if (ret < 0)
return ret;
+   e->inlined = true;
e->compressedblks = 1;
e->raw = false;
} else {
@@ -1151,6 +1156,9 @@ int z_erofs_merge_segment(struct z_erofs_compress_ictx 
*ictx,
ei->e.blkaddr = sctx->blkaddr;
sctx->blkaddr += ei->e.compressedblks;
 
+   /* skip write data but leave blkaddr for inline fallback */
+   if (ei->e.inlined)
+   continue;
ret2 = blk_write(sbi, sctx->membuf + blkoff * erofs_blksiz(sbi),
 ei->e.blkaddr, ei->e.compressedblks);
blkoff += ei->e.compressedblks;
diff --git a/lib/dedupe.c b/lib/dedupe.c
index 19a1c8d..aaaccb5 100644
--- a/lib/dedupe.c
+++ b/lib/dedupe.c
@@ -138,6 +138,7 @@ int z_erofs_dedupe_match(struct z_erofs_dedupe_ctx *ctx)
ctx->e.partial = e->partial ||
(window_size + extra < e->original_length);
ctx->e.raw = e->raw;
+   ctx->e.inlined = false;
ctx->e.blkaddr = e->compressed_blkaddr;
ctx->e.compressedblks = e->compressed_blks;
return 0;
-- 
2.30.2



Re: [PATCH v2] erofs: reliably distinguish block based and fscache mode

2024-04-18 Thread Gao Xiang
> > > > > -    if (!sbi)
> > > > > -    return -ENOMEM;
> > > > > -
> > > > > -    sb->s_fs_info = sbi;
> > > > > -    sbi->opt = ctx->opt;
> > > > > -    sbi->devs = ctx->devs;
> > > > > -    ctx->devs = NULL;
> > > > > -    sbi->fsid = ctx->fsid;
> > > > > -    ctx->fsid = NULL;
> > > > > -    sbi->domain_id = ctx->domain_id;
> > > > > -    ctx->domain_id = NULL;
> > > > > -
> > > > >    sbi->blkszbits = PAGE_SHIFT;
> > > > >    if (erofs_is_fscache_mode(sb)) {
> > > > >    sb->s_blocksize = PAGE_SIZE;
> > > > > @@ -704,11 +690,32 @@ static int erofs_fc_fill_super(struct
> > > > > super_block *sb, struct fs_context *fc)
> > > > >    return 0;
> > > > >    }
> > > > >    -static int erofs_fc_get_tree(struct fs_context *fc)
> > > > > +static void erofs_ctx_to_info(struct fs_context *fc)
> > > > >    {
> > > > >    struct erofs_fs_context *ctx = fc->fs_private;
> > > > > +    struct erofs_sb_info *sbi = fc->s_fs_info;
> > > > > +
> > > > > +    sbi->opt = ctx->opt;
> > > > > +    sbi->devs = ctx->devs;
> > > > > +    ctx->devs = NULL;
> > > > > +    sbi->fsid = ctx->fsid;
> > > > > +    ctx->fsid = NULL;
> > > > > +    sbi->domain_id = ctx->domain_id;
> > > > > +    ctx->domain_id = NULL;
> > > > > +}
> > > > I'm not sure if abstracting this logic into a separate helper really
> > > > helps understanding the code as the logic itself is quite simple and
> > > > easy to be understood. Usually it's a hint of over-abstraction when a
> > > > simple helper has only one caller.
> > > > 
> > > Static functions that have only one caller are compiled inline, so we
> > > don't have to worry about how that affects the code.
> > > 
> > > The reason these codes are encapsulated in a separate function is so
> > > that the code reader understands that these codes are integrated
> > > as a whole, and that we shouldn't have to move one or two of these
> > > lines individually.
> > > 
> > > But after we get rid of erofs_fs_context, those won't be needed
> > > anymore.
> > Yeah, I understand. It's only coding style concerns.
> > 
> > 
> > 
> Okay, thanks!

I'm fine to get rid of those (erofs_fs_context) as long as the codebase
is more clearer and simple.  BTW, for the current codebase, I also think
it's unneeded to have a separate helper called once without extra actual
meaning...

Thanks,
Gao Xiang

> 
> -- 
> With Best Regards,
> Baokun Li


Re: [PATCH 1/3] erofs-utils: determine the [un]compressed data block is inline or not early

2024-04-18 Thread Gao Xiang
Hi Noboru,

On Thu, Apr 18, 2024 at 02:52:29PM +0900, Noboru Asai wrote:
> Introducing erofs_get_inodesize function and erofs_get_lowest_offset function,
> we can determine the [un]compressed data block is inline or not before
> executing z_erofs_merge_segment function. It enable the following,
> 
> * skip the redundant write for ztailpacking block.
> * simplify erofs_prepare_inode_buffer function. (remove handling ENOSPC error)
> 
> Signed-off-by: Noboru Asai 

I appreciate and thanks for your effort and time.

Yet I tend to avoid assuming if the inline is ok or not before
prepare_inode_buffer() since it will be free for space allocator
to decide inline or not at that time.

So personally I still would like to write a final compressed
index for inline fallback.

I will fix this issue myself later (it should be just a small
patch to fix this). 

Thanks for your effort on this issue again!

Thanks,
Gao Xiang



Re: [PATCH v3] erofs-utils: lib: treat data blocks filled with 0s as a hole

2024-04-17 Thread Gao Xiang
On Wed, Apr 17, 2024 at 04:48:44PM -0700, Sandeep Dhavale wrote:
> Add optimization to treat data blocks filled with 0s as a hole.
> Even though diskspace savings are comparable to chunk based or dedupe,
> having no block assigned saves us redundant disk IOs during read.
> 
> To detect blocks filled with zeros during chunking, we insert block
> filled with zeros (zerochunk) in the hashmap. If we detect a possible
> dedupe, we map it to the hole so there is no physical block assigned.
> 
> Signed-off-by: Sandeep Dhavale 

Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH] erofs-utils: mkfs: skip the redundant write for ztailpacking block

2024-04-17 Thread Gao Xiang
Hi Noboru,

On Thu, Apr 18, 2024 at 10:09:22AM +0900, Noboru Asai wrote:
> In this patch, the value of blkaddr in z_erofs_lcluster_index
> corresponding to the ztailpacking block in the extent list
> is invalid value. It looks that the linux kernel doesn't refer to this
> value, but what value is correct?
> 0 or -1 (EROFS_NULL_ADDR) or don't care?

Thanks for pointing out!

On the kernel side, it doesn't care this value if it's really
_inlined_.

But on the mkfs side, since we have inline fallback so I don't
think an invalid blkaddr is correct.  The next blkaddr should
be filled for inline fallback instead.

Let me think more about it and update the patch.

Thanks,
Gao Xiang


Re: [PATCH v3] erofs-utils: dump: print filesystem blocksize

2024-04-17 Thread Gao Xiang
On Wed, Apr 17, 2024 at 05:00:54PM -0700, Sandeep Dhavale wrote:
> mkfs.erofs supports creating filesystem images with different
> blocksizes. Add filesystem blocksize in super block dump so
> its easier to inspect the filesystem.
> 
> The field is added after FS magic, so the output now looks like:
> 
> Filesystem magic number:  0xE0F5E1E2
> Filesystem blocksize: 65536
> Filesystem blocks:21
> Filesystem inode metadata start block:0
> Filesystem shared xattr metadata start block: 0
> Filesystem root nid:  36
> Filesystem lz4_max_distance:  65535
> Filesystem sb_extslots:   0
> Filesystem inode count:   10
> Filesystem created:   Wed Apr 17 16:53:10 2024
> Filesystem features:  sb_csum mtime 0padding
> Filesystem UUID:  
> e66f6dd1-6882-48c3-9770-fee7c4841a93
> 
> Signed-off-by: Sandeep Dhavale 

Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH] erofs: set SB_NODEV sb_flags when mounting with fsid

2024-04-16 Thread Gao Xiang
On Wed, Apr 17, 2024 at 10:59:53AM +0800, Baokun Li wrote:
> On 2024/4/16 22:49, Gao Xiang wrote:
> > On Tue, Apr 16, 2024 at 02:35:08PM +0200, Christian Brauner wrote:
> > > > > I'm not sure how to resolve it in EROFS itself, anyway...
> > > Instead of allocating the erofs_sb_info in fill_super() allocate it
> > > during erofs_get_tree() and then you can ensure that you always have the
> > > info you need available during erofs_kill_sb(). See the appended
> > > (untested) patch.
> > Hi Christian,
> > 
> > Yeah, that is a good way I think.  Although sbi will be allocated
> > unconditionally instead but that is minor.
> > 
> > I'm on OSSNA this week, will test this patch more when returning.
> > 
> > Hi Baokun,
> > 
> > Could you also check this on your side?
> > 
> > Thanks,
> > Gao Xiang
> Hi Xiang,
> 
> This patch does fix the initial problem.
> 
> 
> Hi Christian,
> 
> Thanks for the patch, this is a good idea. Just with nits below.
> Otherwise feel free to add.
> 
> Reviewed-and-tested-by: Baokun Li 
> > 
> > >  From e4f586a41748b6edc05aca36d49b7b39e55def81 Mon Sep 17 00:00:00 2001
> > > From: Christian Brauner 
> > > Date: Mon, 15 Apr 2024 20:17:46 +0800
> > > Subject: [PATCH] erofs: reliably distinguish block based and fscache mode
> > > 
> SNIP
> 
> > > 
> > > diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> > > index c0eb139adb07..4ed80154edf8 100644
> > > --- a/fs/erofs/super.c
> > > +++ b/fs/erofs/super.c
> > > @@ -581,7 +581,7 @@ static const struct export_operations 
> > > erofs_export_ops = {
> > >   static int erofs_fc_fill_super(struct super_block *sb, struct 
> > > fs_context *fc)
> > >   {
> > >   struct inode *inode;
> > > - struct erofs_sb_info *sbi;
> > > + struct erofs_sb_info *sbi = EROFS_SB(sb);
> > >   struct erofs_fs_context *ctx = fc->fs_private;
> > >   int err;
> > > @@ -590,15 +590,10 @@ static int erofs_fc_fill_super(struct super_block 
> > > *sb, struct fs_context *fc)
> > >   sb->s_maxbytes = MAX_LFS_FILESIZE;
> > >   sb->s_op = _sops;
> > > - sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
> > > - if (!sbi)
> > > - return -ENOMEM;
> > > -
> > >   sb->s_fs_info = sbi;
> This line is no longer needed.
> > >   sbi->opt = ctx->opt;
> > >   sbi->devs = ctx->devs;
> > >   ctx->devs = NULL;
> > > - sbi->fsid = ctx->fsid;
> > >   ctx->fsid = NULL;
> > >   sbi->domain_id = ctx->domain_id;
> > >   ctx->domain_id = NULL;
> Since erofs_sb_info is now allocated in erofs_fc_get_tree(), why not
> encapsulate the above lines as erofs_ctx_to_info() helper function
> to be called in erofs_fc_get_tree()?Then erofs_fc_fill_super() wouldn't
> have to use erofs_fs_context and would prevent the fsid from being
> freed twice.

Hi Baokun,

I'm not sure if Christian has enough time to polish the whole
codebase (I'm happy if do so).  Basically, that is just a hint
to the issue, if you have more time, I guess you could also help
revive this patch together (also because you also have a real
EROFS test environment).

Let me also check this next week after OSSNA travelling.

Thanks,
Gao Xiang


Re: [PATCH] fs/erofs: add DEFLATE algorithm support

2024-04-16 Thread Gao Xiang
Hi Jianan,

On Sun, Apr 14, 2024 at 11:04:14PM +0800, Jianan Huang wrote:
> This patch adds DEFLATE compression algorithm support. It's a good choice
> to trade off between compression ratios and performance compared to LZ4.
> Alternatively, DEFLATE could be used for some specific files since EROFS
> supports multiple compression algorithms in one image.
> 
> Signed-off-by: Jianan Huang 

Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


Re: [PATCH 4/8] erofs-utils: rearrange several fields for multi-threaded mkfs

2024-04-16 Thread Gao Xiang
On Tue, Apr 16, 2024 at 07:55:05PM +0800, Yifan Zhao wrote:
> 
> On 4/16/24 4:04 PM, Gao Xiang wrote:
> > From: Gao Xiang 
> > 
> > They should be located in `struct z_erofs_compress_ictx`.
> > 
> > Signed-off-by: Gao Xiang 
> > ---
> >   lib/compress.c | 55 --
> >   1 file changed, 31 insertions(+), 24 deletions(-)
> > 
> > diff --git a/lib/compress.c b/lib/compress.c
> > index a2e0d0f..72f33d2 100644
> > --- a/lib/compress.c
> > +++ b/lib/compress.c
> > @@ -38,6 +38,7 @@ struct z_erofs_extent_item {
> >   struct z_erofs_compress_ictx {/* inode context */
> > struct erofs_inode *inode;
> > +   struct erofs_compress_cfg *ccfg;
> > int fd;
> > u64 fpos;
> > @@ -49,6 +50,14 @@ struct z_erofs_compress_ictx {   /* inode 
> > context */
> > u8 *metacur;
> > struct list_head extents;
> > u16 clusterofs;
> > +
> > +   int seg_num;
> > +
> > +#if EROFS_MT_ENABLED
> > +   pthread_mutex_t mutex;
> > +   pthread_cond_t cond;
> > +   int nfini;
> > +#endif
> >   };
> >   struct z_erofs_compress_sctx {/* segment context */
> > @@ -68,7 +77,7 @@ struct z_erofs_compress_sctx {/* segment 
> > context */
> > erofs_blk_t blkaddr;/* pointing to the next blkaddr */
> > u16 clusterofs;
> > -   int seg_num, seg_idx;
> > +   int seg_idx;
> > void *membuf;
> > erofs_off_t memoff;
> > @@ -99,8 +108,6 @@ static struct {
> > struct erofs_workqueue wq;
> > struct erofs_compress_work *idle;
> > pthread_mutex_t mutex;
> I think `mutex` should also be removed. Do you miss it?

Yeah, will fix in the next version.

> > -   pthread_cond_t cond;
> > -   int nfini;
> >   } z_erofs_mt_ctrl;
> >   #endif
> > @@ -512,7 +519,7 @@ static int __z_erofs_compress_one(struct 
> > z_erofs_compress_sctx *ctx,
> > struct erofs_compress *const h = ctx->chandle;
> > unsigned int len = ctx->tail - ctx->head;
> > bool is_packed_inode = erofs_is_packed_inode(inode);
> > -   bool tsg = (ctx->seg_idx + 1 >= ctx->seg_num), final = !ctx->remaining;
> > +   bool tsg = (ctx->seg_idx + 1 >= ictx->seg_num), final = !ctx->remaining;
> > bool may_packing = (cfg.c_fragments && tsg && final &&
> > !is_packed_inode && !z_erofs_mt_enabled);
> > bool may_inline = (cfg.c_ztailpacking && tsg && final && !may_packing);
> > @@ -1196,7 +1203,8 @@ void z_erofs_mt_workfn(struct erofs_work *work, void 
> > *tlsp)
> > struct erofs_compress_work *cwork = (struct erofs_compress_work *)work;
> > struct erofs_compress_wq_tls *tls = tlsp;
> > struct z_erofs_compress_sctx *sctx = &cwork->ctx;
> > -   struct erofs_inode *inode = sctx->ictx->inode;
> > +   struct z_erofs_compress_ictx *ictx = sctx->ictx;
> > +   struct erofs_inode *inode = ictx->inode;
> > struct erofs_sb_info *sbi = inode->sbi;
> > int ret = 0;
> > @@ -1223,10 +1231,10 @@ void z_erofs_mt_workfn(struct erofs_work *work, 
> > void *tlsp)
> >   out:
> > cwork->errcode = ret;
> > -   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
> > -   ++z_erofs_mt_ctrl.nfini;
> > -   pthread_cond_signal(&z_erofs_mt_ctrl.cond);
> > -   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
> > +   pthread_mutex_lock(&ictx->mutex);
> > +   ++ictx->nfini;
> > +   pthread_cond_signal(&ictx->cond);
> > +   pthread_mutex_unlock(&ictx->mutex);
> >   }
> >   int z_erofs_merge_segment(struct z_erofs_compress_ictx *ictx,
> > @@ -1260,16 +1268,19 @@ int z_erofs_merge_segment(struct 
> > z_erofs_compress_ictx *ictx,
> >   }
> >   int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx,
> > -   struct erofs_compress_cfg *ccfg,
> > erofs_blk_t blkaddr,
> > erofs_blk_t *compressed_blocks)
> >   {
> > struct erofs_compress_work *cur, *head = NULL, **last = &head;
> > +   struct erofs_compress_cfg *ccfg = ictx->ccfg;
> > struct erofs_inode *inode = ictx->inode;
> > int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_segment_size);
> > int ret, i;
> > -   z_erofs_mt_ctrl.nfini = 0;
> > +   ictx->seg_num = nsegs;
> > +   ictx->nfini = 0;
> > +   pthread_mutex_init(&ictx->mutex, NULL);
> > +   pthread_cond_init(&ictx->cond, NULL);
> > for (i = 0; i < nsegs; i++) {

Re: [PATCH 2/8] erofs-utils: lib: prepare for later deferred work

2024-04-16 Thread Gao Xiang
Hi Yifan,

On Tue, Apr 16, 2024 at 07:58:30PM +0800, Yifan Zhao wrote:
> 
> On 4/16/24 4:04 PM, Gao Xiang wrote:
> > From: Gao Xiang 
> > 
> > Split out ordered metadata operations and add the following helpers:
> > 
> >   - erofs_mkfs_jobfn()
> > 
> >   - erofs_mkfs_go()
> > 
> > to handle these mkfs job items for multi-threading support.
> > 
> > Signed-off-by: Gao Xiang 
> > ---
> >   lib/inode.c | 68 +
> >   1 file changed, 58 insertions(+), 10 deletions(-)
> > 
> > diff --git a/lib/inode.c b/lib/inode.c
> > index 55969d9..8ef0604 100644
> > --- a/lib/inode.c
> > +++ b/lib/inode.c
> > @@ -1133,6 +1133,57 @@ static int erofs_mkfs_handle_nondirectory(struct 
> > erofs_inode *inode)
> > return 0;
> >   }
> > +enum erofs_mkfs_jobtype {  /* ordered job types */
> > +   EROFS_MKFS_JOB_NDIR,
> > +   EROFS_MKFS_JOB_DIR,
> > +   EROFS_MKFS_JOB_DIR_BH,
> > +};
> > +
> > +struct erofs_mkfs_jobitem {
> > +   enum erofs_mkfs_jobtype type;
> > +   union {
> > +   struct erofs_inode *inode;
> > +   } u;
> > +};
> > +
> > +static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem *item)
> > +{
> > +   struct erofs_inode *inode = item->u.inode;
> > +   int ret;
> > +
> > +   if (item->type == EROFS_MKFS_JOB_NDIR)
> > +   return erofs_mkfs_handle_nondirectory(inode);
> > +
> > +   if (item->type == EROFS_MKFS_JOB_DIR) {
> > +   ret = erofs_prepare_inode_buffer(inode);
> > +   if (ret)
> > +   return ret;
> > +   inode->bh->op = &erofs_skip_write_bhops;
> > +   if (IS_ROOT(inode))
> > +   erofs_fixup_meta_blkaddr(inode);
> 
> I think this 2 line above does not exist in the logic replaced by
> `erofs_mkfs_jobfn`, should it appear in this patch, or need further
> explanation in the commit msg?

Because erofs_fixup_meta_blkaddr() needs to be called
strictly after erofs_prepare_inode_buffer(root) is
done, which allocates on-disk inode so NID is also
meaningful then.

But you're right. This part is not quite good, let me
think more about it.

Thanks,
Gao Xiang

> 
> 
> Thanks,
> 
> Yifan Zhao


Re: [PATCH] erofs: set SB_NODEV sb_flags when mounting with fsid

2024-04-16 Thread Gao Xiang
On Tue, Apr 16, 2024 at 02:35:08PM +0200, Christian Brauner wrote:
> > > I'm not sure how to resolve it in EROFS itself, anyway...
> 
> Instead of allocating the erofs_sb_info in fill_super() allocate it
> during erofs_get_tree() and then you can ensure that you always have the
> info you need available during erofs_kill_sb(). See the appended
> (untested) patch.

Hi Christian,

Yeah, that is a good way I think.  Although sbi will be allocated
unconditionally instead but that is minor.

I'm on OSSNA this week, will test this patch more when returning.

Hi Baokun,

Could you also check this on your side?

Thanks,
Gao Xiang


> From e4f586a41748b6edc05aca36d49b7b39e55def81 Mon Sep 17 00:00:00 2001
> From: Christian Brauner 
> Date: Mon, 15 Apr 2024 20:17:46 +0800
> Subject: [PATCH] erofs: reliably distinguish block based and fscache mode
> 
> When erofs_kill_sb() is called in block dev based mode, s_bdev may not have
> been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled, it will
> be mistaken for fscache mode, and then attempt to free an anon_dev that has
> never been allocated, triggering the following warning:
> 
> 
> ida_free called for id=0 which is not allocated.
> WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
> Modules linked in:
> CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
> RIP: 0010:ida_free+0x134/0x140
> Call Trace:
>  
>  erofs_kill_sb+0x81/0x90
>  deactivate_locked_super+0x35/0x80
>  get_tree_bdev+0x136/0x1e0
>  vfs_get_tree+0x2c/0xf0
>  do_new_mount+0x190/0x2f0
>  [...]
> 
> 
> Instead of allocating the erofs_sb_info in fill_super() allocate it
> during erofs_get_tree() and ensure that erofs can always have the info
> available during erofs_kill_sb().
> 
> Signed-off-by: Baokun Li 
> Signed-off-by: Christian Brauner 
> ---
>  fs/erofs/super.c | 29 -
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index c0eb139adb07..4ed80154edf8 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -581,7 +581,7 @@ static const struct export_operations erofs_export_ops = {
>  static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>  {
>   struct inode *inode;
> - struct erofs_sb_info *sbi;
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
>   struct erofs_fs_context *ctx = fc->fs_private;
>   int err;
>  
> @@ -590,15 +590,10 @@ static int erofs_fc_fill_super(struct super_block *sb, 
> struct fs_context *fc)
>   sb->s_maxbytes = MAX_LFS_FILESIZE;
>   sb->s_op = _sops;
>  
> - sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
> - if (!sbi)
> - return -ENOMEM;
> -
>   sb->s_fs_info = sbi;
>   sbi->opt = ctx->opt;
>   sbi->devs = ctx->devs;
>   ctx->devs = NULL;
> - sbi->fsid = ctx->fsid;
>   ctx->fsid = NULL;
>   sbi->domain_id = ctx->domain_id;
>   ctx->domain_id = NULL;
> @@ -707,8 +702,15 @@ static int erofs_fc_fill_super(struct super_block *sb, 
> struct fs_context *fc)
>  static int erofs_fc_get_tree(struct fs_context *fc)
>  {
>   struct erofs_fs_context *ctx = fc->fs_private;
> + struct erofs_sb_info *sbi;
> +
> + sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
> + if (!sbi)
> + return -ENOMEM;
>  
> - if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->fsid)
> + fc->s_fs_info = sbi;
> + sbi->fsid = ctx->fsid;
> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && sbi->fsid)
>   return get_tree_nodev(fc, erofs_fc_fill_super);
>  
>   return get_tree_bdev(fc, erofs_fc_fill_super);
> @@ -762,11 +764,15 @@ static void erofs_free_dev_context(struct 
> erofs_dev_context *devs)
>  static void erofs_fc_free(struct fs_context *fc)
>  {
>   struct erofs_fs_context *ctx = fc->fs_private;
> + struct erofs_sb_info *sbi = fc->s_fs_info;
>  
>   erofs_free_dev_context(ctx->devs);
>   kfree(ctx->fsid);
>   kfree(ctx->domain_id);
>   kfree(ctx);
> +
> + if (sbi)
> + kfree(sbi);
>  }
>  
>  static const struct fs_context_operations erofs_context_ops = {
> @@ -783,6 +789,7 @@ static int erofs_init_fs_context(struct fs_context *fc)
>   ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>   if (!ctx)
>   return -ENOMEM;
> +
>   ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL);
>   if (!ctx->devs) {
>   kfree(ctx);
> @@ -799,17 +8

[PATCH 7/8] erofs-utils: lib: introduce non-directory jobitem context

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

It will describe the EROFS_MKFS_JOB_NDIR deferred work.  Also, start
compression before queueing EROFS_MKFS_JOB_NDIR.
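The deferred write path keeps the old fallback behavior: if the compressed write reports -ENOSPC (compression would not save space), the file is rewound and stored uncompressed. A standalone sketch of just that control flow, with invented writer stubs standing in for erofs_write_compressed_file() / erofs_write_unencoded_file():

```c
#include <assert.h>
#include <errno.h>

/* invented stand-ins: "big" payloads pretend compression gained nothing */
static int write_compressed(int nbytes)
{
	return nbytes > 100 ? -ENOSPC : 0;
}

static int write_uncompressed(int nbytes)
{
	(void)nbytes;
	return 0;
}

/*
 * mirrors the shape of erofs_mkfs_job_write_file(): only -ENOSPC
 * triggers the raw fallback; any other error is returned as-is.
 * (The real code also lseek()s the fd back to offset 0 here.)
 */
static int write_file(int nbytes)
{
	int ret = write_compressed(nbytes);

	if (ret != -ENOSPC)
		return ret;
	return write_uncompressed(nbytes);
}
```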

Signed-off-by: Gao Xiang 
---
 lib/inode.c | 62 +++--
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index 66eacab..681460c 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1107,8 +1107,36 @@ static void erofs_fixup_meta_blkaddr(struct erofs_inode 
*rootdir)
rootdir->nid = (off - meta_offset) >> EROFS_ISLOTBITS;
 }
 
-static int erofs_mkfs_handle_nondirectory(struct erofs_inode *inode)
+struct erofs_mkfs_job_ndir_ctx {
+   struct erofs_inode *inode;
+   void *ictx;
+   int fd;
+};
+
+static int erofs_mkfs_job_write_file(struct erofs_mkfs_job_ndir_ctx *ctx)
 {
+   struct erofs_inode *inode = ctx->inode;
+   int ret;
+
+   if (ctx->ictx) {
+   ret = erofs_write_compressed_file(ctx->ictx);
+   if (ret != -ENOSPC)
+   goto out;
+   if (lseek(ctx->fd, 0, SEEK_SET) < 0) {
+   ret = -errno;
+   goto out;
+   }
+   }
+   /* fallback to all data uncompressed */
+   ret = erofs_write_unencoded_file(inode, ctx->fd, 0);
+out:
+   close(ctx->fd);
+   return ret;
+}
+
+static int erofs_mkfs_handle_nondirectory(struct erofs_mkfs_job_ndir_ctx *ctx)
+{
+   struct erofs_inode *inode = ctx->inode;
int ret = 0;
 
if (S_ISLNK(inode->i_mode)) {
@@ -1124,12 +1152,7 @@ static int erofs_mkfs_handle_nondirectory(struct 
erofs_inode *inode)
ret = erofs_write_file_from_buffer(inode, symlink);
free(symlink);
} else if (inode->i_size) {
-   int fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);
-
-   if (fd < 0)
-   return -errno;
-   ret = erofs_write_file(inode, fd, 0);
-   close(fd);
+   ret = erofs_mkfs_job_write_file(ctx);
}
if (ret)
return ret;
@@ -1148,6 +1171,7 @@ struct erofs_mkfs_jobitem {
enum erofs_mkfs_jobtype type;
union {
struct erofs_inode *inode;
+   struct erofs_mkfs_job_ndir_ctx ndir;
} u;
 };
 
@@ -1157,7 +1181,7 @@ static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem 
*item)
int ret;
 
if (item->type == EROFS_MKFS_JOB_NDIR)
-   return erofs_mkfs_handle_nondirectory(inode);
+   return erofs_mkfs_handle_nondirectory(&item->u.ndir);
 
if (item->type == EROFS_MKFS_JOB_DIR) {
ret = erofs_prepare_inode_buffer(inode);
@@ -1294,11 +1318,27 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
if (ret < 0)
return ret;
 
-   if (!S_ISDIR(inode->i_mode))
+   if (!S_ISDIR(inode->i_mode)) {
+   struct erofs_mkfs_job_ndir_ctx ctx = { .inode = inode };
+
+   if (!S_ISLNK(inode->i_mode) && inode->i_size) {
+   ctx.fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);
+   if (ctx.fd < 0)
+   return -errno;
+
+   if (cfg.c_compr_opts[0].alg &&
+   erofs_file_is_compressible(inode)) {
+   ctx.ictx = erofs_begin_compressed_file(inode,
+   ctx.fd, 0);
+   if (IS_ERR(ctx.ictx))
+   return PTR_ERR(ctx.ictx);
+   }
+   }
ret = erofs_mkfs_go(inode->sbi, EROFS_MKFS_JOB_NDIR,
-   &inode, sizeof(inode));
-   else
+   &ctx, sizeof(ctx));
+   } else {
ret = erofs_mkfs_handle_directory(inode);
+   }
erofs_info("file %s dumped (mode %05o)", erofs_fspath(inode->i_srcpath),
   inode->i_mode);
return ret;
-- 
2.30.2



[PATCH 6/8] erofs-utils: mkfs: prepare inter-file multi-threaded compression

2024-04-16 Thread Gao Xiang
From: Yifan Zhao 

This patch separates the compression process into two parts.

Specifically, erofs_begin_compressed_file() will trigger compression,
and erofs_write_compressed_file() will wait for compression to finish
and write the compressed (meta)data.

Signed-off-by: Yifan Zhao 
Co-authored-by: Tong Xin 
Signed-off-by: Gao Xiang 
---
 include/erofs/compress.h |   5 +-
 lib/compress.c   | 115 ++-
 lib/inode.c  |  17 +-
 3 files changed, 95 insertions(+), 42 deletions(-)

diff --git a/include/erofs/compress.h b/include/erofs/compress.h
index 871db54..c9831a7 100644
--- a/include/erofs/compress.h
+++ b/include/erofs/compress.h
@@ -17,8 +17,11 @@ extern "C"
 #define EROFS_CONFIG_COMPR_MAX_SZ  (4000 * 1024)
 #define Z_EROFS_COMPR_QUEUE_SZ (EROFS_CONFIG_COMPR_MAX_SZ * 2)
 
+struct z_erofs_compress_ictx;
+
 void z_erofs_drop_inline_pcluster(struct erofs_inode *inode);
-int erofs_write_compressed_file(struct erofs_inode *inode, int fd, u64 fpos);
+void *erofs_begin_compressed_file(struct erofs_inode *inode, int fd, u64 fpos);
+int erofs_write_compressed_file(struct z_erofs_compress_ictx *ictx);
 
 int z_erofs_compress_init(struct erofs_sb_info *sbi,
  struct erofs_buffer_head *bh);
diff --git a/lib/compress.c b/lib/compress.c
index 3fd3874..45ff128 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -1359,8 +1359,10 @@ int erofs_mt_write_compressed_file(struct 
z_erofs_compress_ictx *ictx)
pthread_mutex_unlock(&ictx->mutex);
 
bh = erofs_balloc(DATA, 0, 0, 0);
-   if (IS_ERR(bh))
-   return PTR_ERR(bh);
+   if (IS_ERR(bh)) {
+   ret = PTR_ERR(bh);
+   goto out;
+   }
 
DBG_BUGON(!head);
blkaddr = erofs_mapbh(bh->block);
@@ -1384,27 +1386,31 @@ int erofs_mt_write_compressed_file(struct 
z_erofs_compress_ictx *ictx)
blkaddr = cur->ctx.blkaddr;
}
 
+   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
cur->next = z_erofs_mt_ctrl.idle;
z_erofs_mt_ctrl.idle = cur;
-   } while(head);
+   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   } while (head);
 
if (ret)
-   return ret;
-
-   return erofs_commit_compressed_file(ictx, bh,
+   goto out;
+   ret = erofs_commit_compressed_file(ictx, bh,
blkaddr - compressed_blocks, compressed_blocks);
+
+out:
+   close(ictx->fd);
+   free(ictx);
+   return ret;
 }
 #endif
 
-int erofs_write_compressed_file(struct erofs_inode *inode, int fd, u64 fpos)
+static struct z_erofs_compress_ictx g_ictx;
+
+void *erofs_begin_compressed_file(struct erofs_inode *inode, int fd, u64 fpos)
 {
-   static u8 g_queue[Z_EROFS_COMPR_QUEUE_SZ];
-   struct erofs_buffer_head *bh;
-   static struct z_erofs_compress_ictx ctx;
-   static struct z_erofs_compress_sctx sctx;
-   erofs_blk_t blkaddr;
-   int ret;
struct erofs_sb_info *sbi = inode->sbi;
+   struct z_erofs_compress_ictx *ictx;
+   int ret;
 
/* initialize per-file compression setting */
inode->z_advise = 0;
@@ -1435,45 +1441,79 @@ int erofs_write_compressed_file(struct erofs_inode 
*inode, int fd, u64 fpos)
}
}
 #endif
-   ctx.ccfg = &erofs_ccfg[inode->z_algorithmtype[0]];
-   inode->z_algorithmtype[0] = ctx.ccfg->algorithmtype;
-   inode->z_algorithmtype[1] = 0;
-
inode->idata_size = 0;
inode->fragment_size = 0;
 
+   if (z_erofs_mt_enabled) {
+   ictx = malloc(sizeof(*ictx));
+   if (!ictx)
+   return ERR_PTR(-ENOMEM);
+   ictx->fd = dup(fd);
+   } else {
+   ictx = &g_ictx;
+   ictx->fd = fd;
+   }
+
+   ictx->ccfg = &erofs_ccfg[inode->z_algorithmtype[0]];
+   inode->z_algorithmtype[0] = ictx->ccfg->algorithmtype;
+   inode->z_algorithmtype[1] = 0;
+
/*
 * Handle tails in advance to avoid writing duplicated
 * parts into the packed inode.
 */
if (cfg.c_fragments && !erofs_is_packed_inode(inode)) {
-   ret = z_erofs_fragments_dedupe(inode, fd, &tof_chksum);
+   ret = z_erofs_fragments_dedupe(inode, fd, &ictx->tof_chksum);
if (ret < 0)
-   return ret;
+   goto err_free_ictx;
}
 
-   ctx.inode = inode;
-   ctx.fd = fd;
-   ctx.fpos = fpos;
-   init_list_head(&ctx.extents);
-   ctx.fix_dedupedfrag = false;
-   ctx.fragemitted = false;
+   ictx->inode = inode;
+   ictx->fpos = fpos;
+   init_list_head(&ictx->extents);
+   ictx->fix_dedupedfrag = false;
+   ictx->fragemitted = false;
 
if (cfg.c_all_fragments && !erofs_is_packed_inode(inode) &&
!i

[PATCH 4/8] erofs-utils: rearrange several fields for multi-threaded mkfs

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

They should be located in `struct z_erofs_compress_ictx`.

Signed-off-by: Gao Xiang 
---
 lib/compress.c | 55 --
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index a2e0d0f..72f33d2 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -38,6 +38,7 @@ struct z_erofs_extent_item {
 
 struct z_erofs_compress_ictx { /* inode context */
struct erofs_inode *inode;
+   struct erofs_compress_cfg *ccfg;
int fd;
u64 fpos;
 
@@ -49,6 +50,14 @@ struct z_erofs_compress_ictx {   /* inode 
context */
u8 *metacur;
struct list_head extents;
u16 clusterofs;
+
+   int seg_num;
+
+#if EROFS_MT_ENABLED
+   pthread_mutex_t mutex;
+   pthread_cond_t cond;
+   int nfini;
+#endif
 };
 
 struct z_erofs_compress_sctx { /* segment context */
@@ -68,7 +77,7 @@ struct z_erofs_compress_sctx {/* segment 
context */
erofs_blk_t blkaddr;/* pointing to the next blkaddr */
u16 clusterofs;
 
-   int seg_num, seg_idx;
+   int seg_idx;
 
void *membuf;
erofs_off_t memoff;
@@ -99,8 +108,6 @@ static struct {
struct erofs_workqueue wq;
struct erofs_compress_work *idle;
pthread_mutex_t mutex;
-   pthread_cond_t cond;
-   int nfini;
 } z_erofs_mt_ctrl;
 #endif
 
@@ -512,7 +519,7 @@ static int __z_erofs_compress_one(struct 
z_erofs_compress_sctx *ctx,
struct erofs_compress *const h = ctx->chandle;
unsigned int len = ctx->tail - ctx->head;
bool is_packed_inode = erofs_is_packed_inode(inode);
-   bool tsg = (ctx->seg_idx + 1 >= ctx->seg_num), final = !ctx->remaining;
+   bool tsg = (ctx->seg_idx + 1 >= ictx->seg_num), final = !ctx->remaining;
bool may_packing = (cfg.c_fragments && tsg && final &&
!is_packed_inode && !z_erofs_mt_enabled);
bool may_inline = (cfg.c_ztailpacking && tsg && final && !may_packing);
@@ -1196,7 +1203,8 @@ void z_erofs_mt_workfn(struct erofs_work *work, void 
*tlsp)
struct erofs_compress_work *cwork = (struct erofs_compress_work *)work;
struct erofs_compress_wq_tls *tls = tlsp;
struct z_erofs_compress_sctx *sctx = &cwork->ctx;
-   struct erofs_inode *inode = sctx->ictx->inode;
+   struct z_erofs_compress_ictx *ictx = sctx->ictx;
+   struct erofs_inode *inode = ictx->inode;
struct erofs_sb_info *sbi = inode->sbi;
int ret = 0;
 
@@ -1223,10 +1231,10 @@ void z_erofs_mt_workfn(struct erofs_work *work, void 
*tlsp)
 
 out:
cwork->errcode = ret;
-   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
-   ++z_erofs_mt_ctrl.nfini;
-   pthread_cond_signal(&z_erofs_mt_ctrl.cond);
-   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   pthread_mutex_lock(&ictx->mutex);
+   ++ictx->nfini;
+   pthread_cond_signal(&ictx->cond);
+   pthread_mutex_unlock(&ictx->mutex);
 }
 
 int z_erofs_merge_segment(struct z_erofs_compress_ictx *ictx,
@@ -1260,16 +1268,19 @@ int z_erofs_merge_segment(struct z_erofs_compress_ictx 
*ictx,
 }
 
 int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx,
-   struct erofs_compress_cfg *ccfg,
erofs_blk_t blkaddr,
erofs_blk_t *compressed_blocks)
 {
struct erofs_compress_work *cur, *head = NULL, **last = &head;
+   struct erofs_compress_cfg *ccfg = ictx->ccfg;
struct erofs_inode *inode = ictx->inode;
int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_segment_size);
int ret, i;
 
-   z_erofs_mt_ctrl.nfini = 0;
+   ictx->seg_num = nsegs;
+   ictx->nfini = 0;
+   pthread_mutex_init(&ictx->mutex, NULL);
+   pthread_cond_init(&ictx->cond, NULL);
 
for (i = 0; i < nsegs; i++) {
if (z_erofs_mt_ctrl.idle) {
@@ -1286,7 +1297,6 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
 
cur->ctx = (struct z_erofs_compress_sctx) {
.ictx = ictx,
-   .seg_num = nsegs,
.seg_idx = i,
.pivot = &dummy_pivot,
};
@@ -1308,11 +1318,10 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
erofs_queue_work(&z_erofs_mt_ctrl.wq, &cur->work);
}
 
-   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
-   while (z_erofs_mt_ctrl.nfini != nsegs)
-   pthread_cond_wait(&z_erofs_mt_ctrl.cond,
- &z_erofs_mt_ctrl.mutex);
-   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   pthread_mutex_lock(&ictx->mutex);
+   while (ictx->nfini < ictx->seg_num)
+   pthread_cond_wait(&ictx->cond, &ictx->mutex);
+   pthread_mutex_unlock(&ictx->mutex);
 

[PATCH 8/8] erofs-utils: mkfs: enable inter-file multi-threaded compression

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

Dispatch deferred ops in another per-sb worker thread.  Note that
deferred ops are strictly FIFOed.
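The dfops queue below is a classic bounded ring buffer guarded by one mutex and two condition variables (`full` throttles the producer, `empty` wakes the consumer). A minimal standalone sketch of the same pattern, with an `int` payload and invented `job_queue_*` names rather than the real mkfs structures:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* illustrative bounded FIFO; same locking shape as the mkfs dfops queue */
struct job_queue {
	pthread_mutex_t lock;
	pthread_cond_t full, empty;
	int buf[8];
	size_t size, head, tail;
};

static void job_queue_init(struct job_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->full, NULL);
	pthread_cond_init(&q->empty, NULL);
	q->size = 8;
	q->head = q->tail = 0;
}

/* producer: sleep while the ring is full, then append at the tail */
static void job_queue_push(struct job_queue *q, int v)
{
	pthread_mutex_lock(&q->lock);
	while ((q->tail + 1) % q->size == q->head)
		pthread_cond_wait(&q->full, &q->lock);
	q->buf[q->tail] = v;
	q->tail = (q->tail + 1) % q->size;
	pthread_cond_signal(&q->empty);
	pthread_mutex_unlock(&q->lock);
}

/* consumer: sleep while the ring is empty, then take from the head */
static int job_queue_pop(struct job_queue *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->head == q->tail)
		pthread_cond_wait(&q->empty, &q->lock);
	int v = q->buf[q->head];
	q->head = (q->head + 1) % q->size;
	pthread_cond_signal(&q->full);
	pthread_mutex_unlock(&q->lock);
	return v;
}
```

Because one slot is sacrificed to distinguish full from empty, a queue of size 8 holds 7 in-flight jobs; the FIFO order the commit message promises falls out of the head/tail arithmetic.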

Signed-off-by: Gao Xiang 
---
 include/erofs/internal.h |   6 ++
 lib/inode.c  | 121 ++-
 2 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index f31e548..ecbbdf6 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -71,6 +71,7 @@ struct erofs_xattr_prefix_item {
 
 #define EROFS_PACKED_NID_UNALLOCATED   -1
 
+struct erofs_mkfs_dfops;
 struct erofs_sb_info {
struct erofs_device_info *devs;
char *devname;
@@ -124,6 +125,11 @@ struct erofs_sb_info {
struct list_head list;
 
u64 saved_by_deduplication;
+
+#ifdef EROFS_MT_ENABLED
+   pthread_t dfops_worker;
+   struct erofs_mkfs_dfops *mkfs_dfops;
+#endif
 };
 
 /* make sure that any user of the erofs headers has atleast 64bit off_t type */
diff --git a/lib/inode.c b/lib/inode.c
index 681460c..3c952b2 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1165,6 +1165,7 @@ enum erofs_mkfs_jobtype { /* ordered job types */
EROFS_MKFS_JOB_NDIR,
EROFS_MKFS_JOB_DIR,
EROFS_MKFS_JOB_DIR_BH,
+   EROFS_MKFS_JOB_MAX
 };
 
 struct erofs_mkfs_jobitem {
@@ -1203,6 +1204,74 @@ static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem 
*item)
return -EINVAL;
 }
 
+#ifdef EROFS_MT_ENABLED
+
+struct erofs_mkfs_dfops {
+   pthread_t worker;
+   pthread_mutex_t lock;
+   pthread_cond_t full, empty;
+   struct erofs_mkfs_jobitem *queue;
+   size_t size, elem_size;
+   size_t head, tail;
+};
+
+#define EROFS_MT_QUEUE_SIZE 256
+
+void *erofs_mkfs_pop_jobitem(struct erofs_mkfs_dfops *q)
+{
+   struct erofs_mkfs_jobitem *item;
+
+   pthread_mutex_lock(&q->lock);
+   while (q->head == q->tail)
+   pthread_cond_wait(&q->empty, &q->lock);
+
+   item = q->queue + q->head;
+   q->head = (q->head + 1) % q->size;
+
+   pthread_cond_signal(&q->full);
+   pthread_mutex_unlock(&q->lock);
+   return item;
+}
+
+void *z_erofs_mt_dfops_worker(void *arg)
+{
+   struct erofs_sb_info *sbi = arg;
+   int ret = 0;
+
+   while (1) {
+   struct erofs_mkfs_jobitem *item;
+
+   item = erofs_mkfs_pop_jobitem(sbi->mkfs_dfops);
+   if (item->type >= EROFS_MKFS_JOB_MAX)
+   break;
+   ret = erofs_mkfs_jobfn(item);
+   if (ret)
+   break;
+   }
+   pthread_exit((void *)(uintptr_t)ret);
+}
+
+int erofs_mkfs_go(struct erofs_sb_info *sbi,
+ enum erofs_mkfs_jobtype type, void *elem, int size)
+{
+   struct erofs_mkfs_jobitem *item;
+   struct erofs_mkfs_dfops *q = sbi->mkfs_dfops;
+
+   pthread_mutex_lock(&q->lock);
+
+   while ((q->tail + 1) % q->size == q->head)
+   pthread_cond_wait(&q->full, &q->lock);
+
+   item = q->queue + q->tail;
+   item->type = type;
+   memcpy(&item->u, elem, size);
+   q->tail = (q->tail + 1) % q->size;
+
+   pthread_cond_signal(&q->empty);
+   pthread_mutex_unlock(&q->lock);
+   return 0;
+}
+#else
 int erofs_mkfs_go(struct erofs_sb_info *sbi,
  enum erofs_mkfs_jobtype type, void *elem, int size)
 {
@@ -1212,6 +1281,7 @@ int erofs_mkfs_go(struct erofs_sb_info *sbi,
memcpy(&item.u, elem, size);
return erofs_mkfs_jobfn(&item);
 }
+#endif
 
 static int erofs_mkfs_handle_directory(struct erofs_inode *dir)
 {
@@ -1344,7 +1414,11 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
return ret;
 }
 
-struct erofs_inode *erofs_mkfs_build_tree_from_path(const char *path)
+#ifndef EROFS_MT_ENABLED
+#define __erofs_mkfs_build_tree_from_path erofs_mkfs_build_tree_from_path
+#endif
+
+struct erofs_inode *__erofs_mkfs_build_tree_from_path(const char *path)
 {
struct erofs_inode *root, *dumpdir;
int err;
@@ -1361,7 +1435,6 @@ struct erofs_inode *erofs_mkfs_build_tree_from_path(const 
char *path)
err = erofs_mkfs_handle_inode(root);
if (err)
return ERR_PTR(err);
-   erofs_fixup_meta_blkaddr(root);
 
do {
int err;
@@ -1400,10 +1473,52 @@ struct erofs_inode 
*erofs_mkfs_build_tree_from_path(const char *path)
if (err)
return ERR_PTR(err);
} while (dumpdir);
-
return root;
 }
 
+#ifdef EROFS_MT_ENABLED
+struct erofs_inode *erofs_mkfs_build_tree_from_path(const char *path)
+{
+   struct erofs_mkfs_dfops *q;
+   struct erofs_inode *root;
+   int err;
+
+   q = malloc(sizeof(*q));
+   if (!q)
+   return ERR_PTR(-ENOMEM);
+
+   q->queue = malloc(q->size * sizeof(*q->queue));
+   if (!q->queue) {
+   free(q);
+   return ER

[PATCH 3/8] erofs-utils: lib: split out erofs_commit_compressed_file()

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

Just split out on-disk compressed metadata commit logic.
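One piece of the commit logic being split out is the bail-out that checks whether compression actually saved space: if the compressed blocks plus inline data plus index metadata would meet or exceed the raw file size, the code returns -ENOSPC and the caller falls back to uncompressed storage. A simplified sketch of that estimate (invented helper name, plain integer arguments):

```c
#include <assert.h>

/*
 * Invented helper: returns 1 when storing the file compressed is a net
 * win, mirroring the "estimate if data compression saves space" check.
 */
static int compression_saves_space(unsigned long long i_size,
				   unsigned int blksz,
				   unsigned int compressed_blocks,
				   unsigned int idata_size,
				   unsigned int metasize)
{
	unsigned long long footprint =
		(unsigned long long)compressed_blocks * blksz +
		idata_size + metasize;

	return footprint < i_size;	/* otherwise fall back to raw data */
}
```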

Signed-off-by: Gao Xiang 
---
 lib/compress.c | 191 +++--
 1 file changed, 105 insertions(+), 86 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index 74c5707..a2e0d0f 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -1026,6 +1026,102 @@ int z_erofs_compress_segment(struct 
z_erofs_compress_sctx *ctx,
return 0;
 }
 
+int erofs_commit_compressed_file(struct z_erofs_compress_ictx *ictx,
+struct erofs_buffer_head *bh,
+erofs_blk_t blkaddr,
+erofs_blk_t compressed_blocks)
+{
+   struct erofs_inode *inode = ictx->inode;
+   struct erofs_sb_info *sbi = inode->sbi;
+   unsigned int legacymetasize;
+   u8 *compressmeta;
+   int ret;
+
+   /* fall back to no compression mode */
+   DBG_BUGON(compressed_blocks < !!inode->idata_size);
+   compressed_blocks -= !!inode->idata_size;
+
+   compressmeta = malloc(BLK_ROUND_UP(sbi, inode->i_size) *
+ sizeof(struct z_erofs_lcluster_index) +
+ Z_EROFS_LEGACY_MAP_HEADER_SIZE);
+   if (!compressmeta) {
+   ret = -ENOMEM;
+   goto err_free_idata;
+   }
+   ictx->metacur = compressmeta + Z_EROFS_LEGACY_MAP_HEADER_SIZE;
+   z_erofs_write_indexes(ictx);
+
+   legacymetasize = ictx->metacur - compressmeta;
+   /* estimate if data compression saves space or not */
+   if (!inode->fragment_size &&
+   compressed_blocks * erofs_blksiz(sbi) + inode->idata_size +
+   legacymetasize >= inode->i_size) {
+   z_erofs_dedupe_commit(true);
+   ret = -ENOSPC;
+   goto err_free_meta;
+   }
+   z_erofs_dedupe_commit(false);
+   z_erofs_write_mapheader(inode, compressmeta);
+
+   if (!ictx->fragemitted)
+   sbi->saved_by_deduplication += inode->fragment_size;
+
+   /* if the entire file is a fragment, a simplified form is used. */
+   if (inode->i_size <= inode->fragment_size) {
+   DBG_BUGON(inode->i_size < inode->fragment_size);
+   DBG_BUGON(inode->fragmentoff >> 63);
+   *(__le64 *)compressmeta =
+   cpu_to_le64(inode->fragmentoff | 1ULL << 63);
+   inode->datalayout = EROFS_INODE_COMPRESSED_FULL;
+   legacymetasize = Z_EROFS_LEGACY_MAP_HEADER_SIZE;
+   }
+
+   if (compressed_blocks) {
+   ret = erofs_bh_balloon(bh, erofs_pos(sbi, compressed_blocks));
+   DBG_BUGON(ret != erofs_blksiz(sbi));
+   } else {
+   if (!cfg.c_fragments && !cfg.c_dedupe)
+   DBG_BUGON(!inode->idata_size);
+   }
+
+   erofs_info("compressed %s (%llu bytes) into %u blocks",
+  inode->i_srcpath, (unsigned long long)inode->i_size,
+  compressed_blocks);
+
+   if (inode->idata_size) {
+   bh->op = &erofs_skip_write_bhops;
+   inode->bh_data = bh;
+   } else {
+   erofs_bdrop(bh, false);
+   }
+
+   inode->u.i_blocks = compressed_blocks;
+
+   if (inode->datalayout == EROFS_INODE_COMPRESSED_FULL) {
+   inode->extent_isize = legacymetasize;
+   } else {
+   ret = z_erofs_convert_to_compacted_format(inode, blkaddr,
+ legacymetasize,
+ compressmeta);
+   DBG_BUGON(ret);
+   }
+   inode->compressmeta = compressmeta;
+   if (!erofs_is_packed_inode(inode))
+   erofs_droid_blocklist_write(inode, blkaddr, compressed_blocks);
+   return 0;
+
+err_free_meta:
+   free(compressmeta);
+   inode->compressmeta = NULL;
+err_free_idata:
+   if (inode->idata) {
+   free(inode->idata);
+   inode->idata = NULL;
+   }
+   erofs_bdrop(bh, true);  /* revoke buffer */
+   return ret;
+}
+
 #ifdef EROFS_MT_ENABLED
 void *z_erofs_mt_wq_tls_alloc(struct erofs_workqueue *wq, void *ptr)
 {
@@ -1252,23 +1348,9 @@ int erofs_write_compressed_file(struct erofs_inode 
*inode, int fd, u64 fpos)
static struct z_erofs_compress_sctx sctx;
struct erofs_compress_cfg *ccfg;
erofs_blk_t blkaddr, compressed_blocks = 0;
-   unsigned int legacymetasize;
int ret;
bool ismt = false;
struct erofs_sb_info *sbi = inode->sbi;
-   u8 *compressmeta = malloc(BLK_ROUND_UP(sbi, inode->i_size) *
- sizeof(struct z_erofs_lcluster_index) +
- Z_EROFS_LEGACY_MAP_HEADER_SIZE);
-
- 

[PATCH 5/8] erofs-utils: lib: split up z_erofs_mt_compress()

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

The on-disk compressed data write will be moved into a new function
erofs_mt_write_compressed_file().

Signed-off-by: Gao Xiang 
---
 lib/compress.c | 172 -
 1 file changed, 99 insertions(+), 73 deletions(-)

diff --git a/lib/compress.c b/lib/compress.c
index 72f33d2..3fd3874 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -57,6 +57,8 @@ struct z_erofs_compress_ictx {/* inode 
context */
pthread_mutex_t mutex;
pthread_cond_t cond;
int nfini;
+
+   struct erofs_compress_work *mtworks;
 #endif
 };
 
@@ -1030,6 +1032,26 @@ int z_erofs_compress_segment(struct 
z_erofs_compress_sctx *ctx,
z_erofs_commit_extent(ctx, ctx->pivot);
ctx->pivot = NULL;
}
+
+   /* generate an extra extent for the deduplicated fragment */
+   if (ctx->seg_idx >= ictx->seg_num - 1 &&
+   ictx->inode->fragment_size && !ictx->fragemitted) {
+   struct z_erofs_extent_item *ei;
+
+   ei = malloc(sizeof(*ei));
+   if (!ei)
+   return -ENOMEM;
+
+   ei->e = (struct z_erofs_inmem_extent) {
+   .length = ictx->inode->fragment_size,
+   .compressedblks = 0,
+   .raw = false,
+   .partial = false,
+   .blkaddr = ctx->blkaddr,
+   };
+   init_list_head(&ei->list);
+   z_erofs_commit_extent(ctx, ei);
+   }
return 0;
 }
 
@@ -1044,6 +1066,8 @@ int erofs_commit_compressed_file(struct 
z_erofs_compress_ictx *ictx,
u8 *compressmeta;
int ret;
 
+   z_erofs_fragments_commit(inode);
+
/* fall back to no compression mode */
DBG_BUGON(compressed_blocks < !!inode->idata_size);
compressed_blocks -= !!inode->idata_size;
@@ -1121,11 +1145,11 @@ err_free_meta:
free(compressmeta);
inode->compressmeta = NULL;
 err_free_idata:
+   erofs_bdrop(bh, true);  /* revoke buffer */
if (inode->idata) {
free(inode->idata);
inode->idata = NULL;
}
-   erofs_bdrop(bh, true);  /* revoke buffer */
return ret;
 }
 
@@ -1267,15 +1291,13 @@ int z_erofs_merge_segment(struct z_erofs_compress_ictx 
*ictx,
return ret;
 }
 
-int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx,
-   erofs_blk_t blkaddr,
-   erofs_blk_t *compressed_blocks)
+int z_erofs_mt_compress(struct z_erofs_compress_ictx *ictx)
 {
struct erofs_compress_work *cur, *head = NULL, **last = &head;
struct erofs_compress_cfg *ccfg = ictx->ccfg;
struct erofs_inode *inode = ictx->inode;
int nsegs = DIV_ROUND_UP(inode->i_size, cfg.c_segment_size);
-   int ret, i;
+   int i;
 
ictx->seg_num = nsegs;
ictx->nfini = 0;
@@ -1283,11 +1305,14 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
pthread_cond_init(&ictx->cond, NULL);
 
for (i = 0; i < nsegs; i++) {
-   if (z_erofs_mt_ctrl.idle) {
-   cur = z_erofs_mt_ctrl.idle;
+   pthread_mutex_lock(&z_erofs_mt_ctrl.mutex);
+   cur = z_erofs_mt_ctrl.idle;
+   if (cur) {
z_erofs_mt_ctrl.idle = cur->next;
cur->next = NULL;
-   } else {
+   }
+   pthread_mutex_unlock(&z_erofs_mt_ctrl.mutex);
+   if (!cur) {
cur = calloc(1, sizeof(*cur));
if (!cur)
return -ENOMEM;
@@ -1317,14 +1342,31 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
cur->work.fn = z_erofs_mt_workfn;
erofs_queue_work(&z_erofs_mt_ctrl.wq, &cur->work);
}
+   ictx->mtworks = head;
+   return 0;
+}
+
+int erofs_mt_write_compressed_file(struct z_erofs_compress_ictx *ictx)
+{
+   struct erofs_buffer_head *bh = NULL;
+   struct erofs_compress_work *head = ictx->mtworks, *cur;
+   erofs_blk_t blkaddr, compressed_blocks = 0;
+   int ret;
 
pthread_mutex_lock(&ictx->mutex);
while (ictx->nfini < ictx->seg_num)
pthread_cond_wait(&ictx->cond, &ictx->mutex);
pthread_mutex_unlock(&ictx->mutex);
 
+   bh = erofs_balloc(DATA, 0, 0, 0);
+   if (IS_ERR(bh))
+   return PTR_ERR(bh);
+
+   DBG_BUGON(!head);
+   blkaddr = erofs_mapbh(bh->block);
+
ret = 0;
-   while (head) {
+   do {
cur = head;
head = cur->next;
 
@@ -1338,14 +1380,19 @@ int z_erofs_mt_compress(struct z_erofs_compress_ictx 
*ictx,
if (ret2)
ret = ret2;
 
-  

[PATCH 2/8] erofs-utils: lib: prepare for later deferred work

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

Split out ordered metadata operations and add the following helpers:

 - erofs_mkfs_jobfn()

 - erofs_mkfs_go()

to handle these mkfs job items for multi-threading support.
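The jobfn/go split boils down to marshalling a tagged union and dispatching on its type; in this patch erofs_mkfs_go() still runs the job synchronously, which is what makes the later threaded queue a drop-in replacement. A standalone sketch with invented names and an `int` payload instead of an inode pointer:

```c
#include <assert.h>
#include <string.h>

/* invented names; the payload is an int instead of an inode pointer */
enum jobtype { JOB_NDIR, JOB_DIR, JOB_DIR_BH };

struct jobitem {
	enum jobtype type;
	union { int inode; } u;
};

/* one handler, dispatching on the ordered job type */
static int jobfn(struct jobitem *item)
{
	switch (item->type) {
	case JOB_NDIR:   return item->u.inode + 1;
	case JOB_DIR:    return item->u.inode + 2;
	case JOB_DIR_BH: return item->u.inode + 3;
	}
	return -1;
}

/* synchronous "go": marshal the element into the item, run it at once */
static int go(enum jobtype type, void *elem, size_t size)
{
	struct jobitem item;

	item.type = type;
	memcpy(&item.u, elem, size);
	return jobfn(&item);
}
```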

Signed-off-by: Gao Xiang 
---
 lib/inode.c | 68 +
 1 file changed, 58 insertions(+), 10 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index 55969d9..8ef0604 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -1133,6 +1133,57 @@ static int erofs_mkfs_handle_nondirectory(struct 
erofs_inode *inode)
return 0;
 }
 
+enum erofs_mkfs_jobtype {  /* ordered job types */
+   EROFS_MKFS_JOB_NDIR,
+   EROFS_MKFS_JOB_DIR,
+   EROFS_MKFS_JOB_DIR_BH,
+};
+
+struct erofs_mkfs_jobitem {
+   enum erofs_mkfs_jobtype type;
+   union {
+   struct erofs_inode *inode;
+   } u;
+};
+
+static int erofs_mkfs_jobfn(struct erofs_mkfs_jobitem *item)
+{
+   struct erofs_inode *inode = item->u.inode;
+   int ret;
+
+   if (item->type == EROFS_MKFS_JOB_NDIR)
+   return erofs_mkfs_handle_nondirectory(inode);
+
+   if (item->type == EROFS_MKFS_JOB_DIR) {
+   ret = erofs_prepare_inode_buffer(inode);
+   if (ret)
+   return ret;
+   inode->bh->op = &erofs_skip_write_bhops;
+   if (IS_ROOT(inode))
+   erofs_fixup_meta_blkaddr(inode);
+   return 0;
+   }
+
+   if (item->type == EROFS_MKFS_JOB_DIR_BH) {
+   erofs_write_dir_file(inode);
+   erofs_write_tail_end(inode);
+   inode->bh->op = &erofs_write_inode_bhops;
+   erofs_iput(inode);
+   return 0;
+   }
+   return -EINVAL;
+}
+
+int erofs_mkfs_go(struct erofs_sb_info *sbi,
+ enum erofs_mkfs_jobtype type, void *elem, int size)
+{
+   struct erofs_mkfs_jobitem item;
+
+   item.type = type;
+   memcpy(&item.u, elem, size);
+   return erofs_mkfs_jobfn(&item);
+}
+
 static int erofs_mkfs_handle_directory(struct erofs_inode *dir)
 {
DIR *_dir;
@@ -1213,11 +1264,7 @@ static int erofs_mkfs_handle_directory(struct 
erofs_inode *dir)
else
dir->i_nlink = i_nlink;
 
-   ret = erofs_prepare_inode_buffer(dir);
-   if (ret)
-   return ret;
-   dir->bh->op = &erofs_skip_write_bhops;
-   return 0;
+   return erofs_mkfs_go(dir->sbi, EROFS_MKFS_JOB_DIR, &dir, sizeof(dir));
 
 err_closedir:
closedir(_dir);
@@ -1243,7 +1290,8 @@ static int erofs_mkfs_handle_inode(struct erofs_inode 
*inode)
return ret;
 
if (!S_ISDIR(inode->i_mode))
-   ret = erofs_mkfs_handle_nondirectory(inode);
+   ret = erofs_mkfs_go(inode->sbi, EROFS_MKFS_JOB_NDIR,
+   &inode, sizeof(inode));
else
ret = erofs_mkfs_handle_directory(inode);
erofs_info("file %s dumped (mode %05o)", erofs_fspath(inode->i_srcpath),
@@ -1302,10 +1350,10 @@ struct erofs_inode 
*erofs_mkfs_build_tree_from_path(const char *path)
}
*last = dumpdir;/* fixup the last (or the only) one */
dumpdir = head;
-   erofs_write_dir_file(dir);
-   erofs_write_tail_end(dir);
-   dir->bh->op = &erofs_write_inode_bhops;
-   erofs_iput(dir);
+   err = erofs_mkfs_go(dir->sbi, EROFS_MKFS_JOB_DIR_BH,
+   &dir, sizeof(dir));
+   if (err)
+   return ERR_PTR(err);
} while (dumpdir);
 
return root;
-- 
2.30.2



[PATCH 1/8] erofs-utils: use erofs_atomic_t for inode->i_count

2024-04-16 Thread Gao Xiang
From: Gao Xiang 

Since it can be touched by more than one thread when multi-threading
is enabled.
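The point of switching i_count to atomics is that a plain ++/-- is a read-modify-write and can lose updates once two threads touch the count. A standalone sketch of the igrab/iput-style pattern using the same GCC/Clang __atomic builtins (invented `struct obj` and helper names):

```c
#include <assert.h>

/* assumes GCC/Clang __atomic builtins, as in include/erofs/atomic.h */
typedef int erofs_atomic_t;

#define erofs_atomic_inc_return(ptr) __atomic_add_fetch(ptr, 1, __ATOMIC_RELAXED)
#define erofs_atomic_dec_return(ptr) __atomic_sub_fetch(ptr, 1, __ATOMIC_RELAXED)

struct obj {
	erofs_atomic_t count;
	int freed;
};

static struct obj *grab(struct obj *o)
{
	(void)erofs_atomic_inc_return(&o->count);	/* like erofs_igrab() */
	return o;
}

/*
 * like the reworked erofs_iput(): dec_return yields the *new* count, so
 * only the thread that drops it to zero tears the object down
 */
static int put(struct obj *o)
{
	int got = erofs_atomic_dec_return(&o->count);

	if (got >= 1)
		return got;
	o->freed = 1;			/* last reference: release resources */
	return 0;
}
```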

Signed-off-by: Gao Xiang 
---
 include/erofs/atomic.h   | 10 ++
 include/erofs/inode.h|  2 +-
 include/erofs/internal.h |  3 ++-
 lib/inode.c  |  5 +++--
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/erofs/atomic.h b/include/erofs/atomic.h
index 214cdb1..f28687e 100644
--- a/include/erofs/atomic.h
+++ b/include/erofs/atomic.h
@@ -25,4 +25,14 @@ __n;})
 #define erofs_atomic_test_and_set(ptr) \
__atomic_test_and_set(ptr, __ATOMIC_RELAXED)
 
+#define erofs_atomic_add_return(ptr, i) \
+   __atomic_add_fetch(ptr, i, __ATOMIC_RELAXED)
+
+#define erofs_atomic_sub_return(ptr, i) \
+   __atomic_sub_fetch(ptr, i, __ATOMIC_RELAXED)
+
+#define erofs_atomic_inc_return(ptr) erofs_atomic_add_return(ptr, 1)
+
+#define erofs_atomic_dec_return(ptr) erofs_atomic_sub_return(ptr, 1)
+
 #endif
diff --git a/include/erofs/inode.h b/include/erofs/inode.h
index d5a732a..5d6bc98 100644
--- a/include/erofs/inode.h
+++ b/include/erofs/inode.h
@@ -17,7 +17,7 @@ extern "C"
 
 static inline struct erofs_inode *erofs_igrab(struct erofs_inode *inode)
 {
-   ++inode->i_count;
+   (void)erofs_atomic_inc_return(&inode->i_count);
return inode;
 }
 
diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index 4cd2059..f31e548 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -25,6 +25,7 @@ typedef unsigned short umode_t;
 #ifdef HAVE_PTHREAD_H
 #include 
 #endif
+#include "atomic.h"
 
 #ifndef PATH_MAX
 #define PATH_MAX4096/* # chars in a path name including nul */
@@ -169,7 +170,7 @@ struct erofs_inode {
/* (mkfs.erofs) next pointer for directory dumping */
struct erofs_inode *next_dirwrite;
};
-   unsigned int i_count;
+   erofs_atomic_t i_count;
struct erofs_sb_info *sbi;
struct erofs_inode *i_parent;
 
diff --git a/lib/inode.c b/lib/inode.c
index 7508c74..55969d9 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -129,9 +129,10 @@ struct erofs_inode *erofs_iget_by_nid(erofs_nid_t nid)
 unsigned int erofs_iput(struct erofs_inode *inode)
 {
struct erofs_dentry *d, *t;
+   unsigned long got = erofs_atomic_dec_return(&inode->i_count);
 
-   if (inode->i_count > 1)
-   return --inode->i_count;
+   if (got >= 1)
+   return got;
 
list_for_each_entry_safe(d, t, &inode->i_subdirs, d_child)
free(d);
-- 
2.30.2



Re: [PATCH v2] erofs-utils: dump: print filesystem blocksize

2024-04-15 Thread Gao Xiang via Linux-erofs
Hi Sandeep,

On Mon, Apr 15, 2024 at 11:35:38AM -0700, Sandeep Dhavale wrote:
> mkfs.erofs supports creating filesystem images with different
> blocksizes. Add filesystem blocksize in super block dump so
> it's easier to inspect the filesystem.
>
> The field is added after FS magic, so the output now looks like:
>
> Filesystem magic number:  0xE0F5E1E2
> Filesystem blocksize: 65536
> Filesystem blocks:21
> Filesystem inode metadata start block:0
> Filesystem shared xattr metadata start block: 0
> Filesystem root nid:  36
> Filesystem lz4_max_distance:  65535
> Filesystem sb_extslots:   0
> Filesystem inode count:   10
> Filesystem created:   Fri Apr 12 15:43:40 2024
> Filesystem features:  sb_csum mtime 0padding
> Filesystem UUID:  
> a84a2acc-08d8-4b72-8b8c-b811a815fa07
>
> Signed-off-by: Sandeep Dhavale 
> ---
> Changes since v2:
>   - Moved the field after FS magic as suggested by Gao
>  dump/main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/dump/main.c b/dump/main.c
> index a89fc6b..928909d 100644
> --- a/dump/main.c
> +++ b/dump/main.c
> @@ -633,6 +633,8 @@ static void erofsdump_show_superblock(void)
>
>   fprintf(stdout, "Filesystem magic number:  
> 0x%04X\n",
>   EROFS_SUPER_MAGIC_V1);
> + fprintf(stdout, "Filesystem blocksize: %llu\n",
> + erofs_blksiz(&sbi) | 0ULL);

Could we use `%u` for `erofs_blksiz()`? since currently EROFS
block size isn't possible to be larger than PAGE_SIZE.

Even if block size > page size is supported later, I don't think we
need to consider overly large block sizes.

Otherwise it looks good to me.

Thanks,
Gao Xiang


Re: [PATCH] erofs: set SB_NODEV sb_flags when mounting with fsid

2024-04-15 Thread Gao Xiang
Hi Christian, Baokun,

On Mon, Apr 15, 2024 at 11:23:58PM +0800, Baokun Li wrote:
> On 2024/4/15 21:38, Christian Brauner wrote:
> > On Mon, Apr 15, 2024 at 08:17:46PM +0800, Baokun Li wrote:
> > > When erofs_kill_sb() is called in block dev based mode, s_bdev may not 
> > > have
> > > been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled, it will
> > > be mistaken for fscache mode, and then attempt to free an anon_dev that 
> > > has
> > > never been allocated, triggering the following warning:
> > > 
> > > 
> > > ida_free called for id=0 which is not allocated.
> > > WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
> > > Modules linked in:
> > > CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
> > > RIP: 0010:ida_free+0x134/0x140
> > > Call Trace:
> > >   
> > >   erofs_kill_sb+0x81/0x90
> > >   deactivate_locked_super+0x35/0x80
> > >   get_tree_bdev+0x136/0x1e0
> > >   vfs_get_tree+0x2c/0xf0
> > >   do_new_mount+0x190/0x2f0
> > >   [...]
> > > 
> > > 
> > > To avoid this problem, add SB_NODEV to fc->sb_flags after successfully
> > > parsing the fsid, and then the superblock inherits this flag when it is
> > > allocated, so that the sb_flags can be used to distinguish whether it is
> > > in block dev based mode when calling erofs_kill_sb().
> > > 
> > > Signed-off-by: Baokun Li 
> > > ---
> > >   fs/erofs/super.c | 7 +++
> > >   1 file changed, 3 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> > > index b21bd8f78dc1..7539ce7d64bc 100644
> > > --- a/fs/erofs/super.c
> > > +++ b/fs/erofs/super.c
> > > @@ -520,6 +520,7 @@ static int erofs_fc_parse_param(struct fs_context *fc,
> > >   ctx->fsid = kstrdup(param->string, GFP_KERNEL);
> > >   if (!ctx->fsid)
> > >   return -ENOMEM;
> > > + fc->sb_flags |= SB_NODEV;
> > Hm, I wouldn't do it this way. That's an abuse of that flag imho.
> > Record the information in the erofs_fs_context if you need to.
> The stack diagram that triggers the problem is as follows, the call to
> erofs_kill_sb() fails before fill_super() has been executed, and we can
> only use super_block to determine whether it is currently in nodev
> fscahe mode or block device based mode. So if it is recorded in
> erofs_fs_context (aka fc->fs_private), we can't access the recorded data
> unless we pass fc into erofs_kill_sb() as well.
> 

If I understand correctly from the discussion above, I think
there exists a gap between alloc_super() and the point where
sb->s_bdev is set.  But .kill_sb() can be called in between,
and fc is not passed into .kill_sb().

I'm not sure how to resolve it in EROFS itself, anyway...

Thanks,
Gao Xiang


Re: [PATCH] erofs: Consider NUMA affinity when allocating memory for per-CPU pcpubuf

2024-04-15 Thread Gao Xiang
Hi RongQing,

On Mon, Apr 15, 2024 at 02:19:40PM +0800, Li RongQing wrote:
> per-CPU pcpubufs are dominantly accessed from their own local CPUs,
> so allocate them node-local to improve performance.
> 
> Signed-off-by: Li RongQing 

Thanks for your patch!  Yeah, NUMA-aware is important to
NUMA bare metal server scenarios.

In the next cycle, we also reduce the total number of buffers
since we don't need that many per-CPU buffers when there are
too many CPUs: we call these "global buffers" and maintain
CPU->global buffer mappings.  Also, erofs_allocpage() will no
longer be used for the allocation.

For more details, see my for-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git/log/?h=dev

So could you look into making the new global buffers
NUMA-aware instead? (both for allocation and mapping; maybe
the allocation is still the priority if per-CPU affinity is
needed anyway.)

Thanks,
Gao Xiang


Re: [PATCH v1] erofs-utils: dump: print filesystem blocksize

2024-04-12 Thread Gao Xiang




On 2024/4/13 09:00, Sandeep Dhavale wrote:

On Fri, Apr 12, 2024 at 5:09 PM Gao Xiang  wrote:


Hi Sandeep,

On 2024/4/13 06:51, Sandeep Dhavale wrote:

mkfs.erofs supports creating filesystem images with different
blocksizes. Add filesystem blocksize in super block dump so
it's easier to inspect the filesystem.

The field is added at the end, so the output now looks like:

Filesystem magic number:  0xE0F5E1E2
Filesystem blocks:21
Filesystem inode metadata start block:0
Filesystem shared xattr metadata start block: 0
Filesystem root nid:  36
Filesystem lz4_max_distance:  65535
Filesystem sb_extslots:   0
Filesystem inode count:   10
Filesystem created:   Fri Apr 12 15:43:40 2024
Filesystem features:  sb_csum mtime 0padding
Filesystem UUID:  a84a2acc-08d8-4b72-8b8c-b811a815fa07
Filesystem blocksize: 65536

Signed-off-by: Sandeep Dhavale 


Just a minor nit:
Could we move "Filesystem blocksize:" between the line of
"Filesystem magic number:" and "Filesystem blocks:" ?

Otherwise it looks good to me, thanks for this!


Hi Gao,
Sure, I can change the location. I added it last assuming someone
might have scripted around the existing output (not that I know of!).


Yeah, I could guess your original intention. But it seems somewhat
strange to show `Filesystem blocksize` at the very end...

Thanks,
Gao Xiang


I will send V2 by moving it after magic number.

Thanks,
Sandeep.

Thanks,
Gao Xiang


Re: [PATCH v1] erofs-utils: dump: print filesystem blocksize

2024-04-12 Thread Gao Xiang

Hi Sandeep,

On 2024/4/13 06:51, Sandeep Dhavale wrote:

mkfs.erofs supports creating filesystem images with different
blocksizes. Add filesystem blocksize in super block dump so
it's easier to inspect the filesystem.

The field is added at the end, so the output now looks like:

Filesystem magic number:  0xE0F5E1E2
Filesystem blocks:21
Filesystem inode metadata start block:0
Filesystem shared xattr metadata start block: 0
Filesystem root nid:  36
Filesystem lz4_max_distance:  65535
Filesystem sb_extslots:   0
Filesystem inode count:   10
Filesystem created:   Fri Apr 12 15:43:40 2024
Filesystem features:  sb_csum mtime 0padding
Filesystem UUID:  a84a2acc-08d8-4b72-8b8c-b811a815fa07
Filesystem blocksize: 65536

Signed-off-by: Sandeep Dhavale 


Just a minor nit:
Could we move "Filesystem blocksize:" between the line of
"Filesystem magic number:" and "Filesystem blocks:" ?

Otherwise it looks good to me, thanks for this!

Thanks,
Gao Xiang


Re: [PATCH v2] erofs-utils: lib: treat data blocks filled with 0s as a hole

2024-04-12 Thread Gao Xiang

Hi Sandeep,

On 2024/4/10 06:14, Sandeep Dhavale wrote:

Add optimization to treat data blocks filled with 0s as a hole.
Even though diskspace savings are comparable to chunk based or dedupe,
having no block assigned saves us redundant disk IOs during read.

To detect blocks filled with zeros during chunking, we insert block
filled with zeros (zerochunk) in the hashmap. If we detect a possible
dedupe, we map it to the hole so there is no physical block assigned.

Signed-off-by: Sandeep Dhavale 
---
Changes since v1:
- Instead of checking every block for 0s word by word,
  add a zerochunk in blobs during init. So we effectively
  detect the zero blocks by comparing the hash.
  include/erofs/blobchunk.h |  2 +-
  lib/blobchunk.c   | 41 ---
  mkfs/main.c   |  2 +-
  3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/erofs/blobchunk.h b/include/erofs/blobchunk.h
index a674640..ebe2efe 100644
--- a/include/erofs/blobchunk.h
+++ b/include/erofs/blobchunk.h
@@ -23,7 +23,7 @@ int erofs_write_zero_inode(struct erofs_inode *inode);
  int tarerofs_write_chunkes(struct erofs_inode *inode, erofs_off_t 
data_offset);
  int erofs_mkfs_dump_blobs(struct erofs_sb_info *sbi);
  void erofs_blob_exit(void);
-int erofs_blob_init(const char *blobfile_path);
+int erofs_blob_init(const char *blobfile_path, erofs_off_t chunksize);
  int erofs_mkfs_init_devices(struct erofs_sb_info *sbi, unsigned int devices);
  
  #ifdef __cplusplus

diff --git a/lib/blobchunk.c b/lib/blobchunk.c
index 641e3d4..87c153f 100644
--- a/lib/blobchunk.c
+++ b/lib/blobchunk.c
@@ -323,13 +323,21 @@ int erofs_blob_write_chunked_file(struct erofs_inode 
*inode, int fd,
ret = -EIO;
goto err;
}
-
chunk = erofs_blob_getchunk(sbi, chunkdata, len);
if (IS_ERR(chunk)) {
ret = PTR_ERR(chunk);
goto err;
}



Sorry for late reply since I'm working on multi-threaded mkfs.

Can erofs_blob_getchunk() directly return &erofs_holechunk? I mean,

static struct erofs_blobchunk *erofs_blob_getchunk(struct erofs_sb_info *sbi,
u8 *buf, erofs_off_t chunksize)
{
...
	chunk = hashmap_get_from_hash(&blob_hashmap, hash, sha256);
if (chunk) {
DBG_BUGON(chunksize != chunk->chunksize);

if (chunk->blkaddr == erofs_holechunk.blkaddr)
			chunk = &erofs_holechunk;

sbi->saved_by_deduplication += chunksize;
erofs_dbg("Found duplicated chunk at %u", chunk->blkaddr);
return chunk;
}
...
}

  
+		if (chunk->blkaddr == erofs_holechunk.blkaddr) {

+   *(void **)idx++ = &erofs_holechunk;
+   erofs_update_minextblks(sbi, interval_start, pos,
+   &minextblks);
+   interval_start = pos + len;



I guess several zerochunks could also be merged?  Is this line
the expected behavior?


+   lastch = NULL;
+   continue;
+   }
+
if (lastch && (lastch->device_id != chunk->device_id ||
erofs_pos(sbi, lastch->blkaddr) + lastch->chunksize !=
erofs_pos(sbi, chunk->blkaddr))) {


I guess we could form a helper like
static bool erofs_blob_can_merge(struct erofs_sb_info *sbi,
struct erofs_blobchunk *lastch,
struct erofs_blobchunk *chunk)
{
	if (lastch == &erofs_holechunk && chunk == &erofs_holechunk)
return true;
if (lastch->device_id == chunk->device_id &&
erofs_pos(sbi, lastch->blkaddr) + lastch->chunksize ==
erofs_pos(sbi, chunk->blkaddr))
return true;
return false;
}

if (lastch && erofs_blob_can_merge(sbi, lastch, chunk)) {
...
}




@@ -540,7 +548,34 @@ void erofs_blob_exit(void)
}
  }
  
-int erofs_blob_init(const char *blobfile_path)

+static int erofs_insert_zerochunk(erofs_off_t chunksize)
+{
+   u8 *zeros;
+   struct erofs_blobchunk *chunk;
+   u8 sha256[32];
+   unsigned int hash;
+
+   zeros = calloc(1, chunksize);
+   if (!zeros)
+   return -ENOMEM;
+
+   erofs_sha256(zeros, chunksize, sha256);
+   hash = memhash(sha256, sizeof(sha256));



`zeros` needs to be freed here I guess:
free(zeros);

Thanks,
Gao Xiang


+   chunk = malloc(sizeof(struct erofs_blobchunk));
+   if (!chunk)
+   return -ENOMEM;> +
+   chunk->chunksize = chunksize;
+   /* treat chunk filled with zeros as hole */
+   chunk->blkaddr = erofs_holechunk.blkaddr

[PATCH] erofs-utils: lib: fix tarerofs 32-bit overflows

2024-04-11 Thread Gao Xiang
Otherwise, large files won't be imported properly.

Fixes: e3dfe4b8db26 ("erofs-utils: mkfs: support tgz streams for tarerofs")
Fixes: 95d315fd7958 ("erofs-utils: introduce tarerofs")
Signed-off-by: Gao Xiang 
---
 lib/tar.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/tar.c b/lib/tar.c
index b45657d..8d606f9 100644
--- a/lib/tar.c
+++ b/lib/tar.c
@@ -233,7 +233,7 @@ int erofs_iostream_read(struct erofs_iostream *ios, void 
**buf, u64 bytes)
  ret, erofs_strerror(-errno));
}
*buf = ios->buffer;
-   ret = min_t(int, ios->tail, bytes);
+   ret = min_t(int, ios->tail, min_t(u64, bytes, INT_MAX));
ios->head = ret;
return ret;
 }
@@ -605,10 +605,9 @@ void tarerofs_remove_inode(struct erofs_inode *inode)
 static int tarerofs_write_file_data(struct erofs_inode *inode,
struct erofs_tarfile *tar)
 {
-   unsigned int j;
void *buf;
int fd, nread;
-   u64 off;
+   u64 off, j;
 
if (!inode->i_diskbuf) {
inode->i_diskbuf = calloc(1, sizeof(*inode->i_diskbuf));
-- 
2.39.3



Re: [PATCH] erofs: derive fsid from on-disk UUID for .statfs() if possible

2024-04-09 Thread Gao Xiang




On 2024/4/9 16:11, Hongzhen Luo wrote:

Use the superblock's UUID to generate the fsid when it's non-null.

Signed-off-by: Hongzhen Luo 


Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


---
  fs/erofs/super.c | 12 +---
  1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index c0eb139adb07..83bd8ee3b5ba 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -923,22 +923,20 @@ static int erofs_statfs(struct dentry *dentry, struct 
kstatfs *buf)
  {
struct super_block *sb = dentry->d_sb;
struct erofs_sb_info *sbi = EROFS_SB(sb);
-   u64 id = 0;
-
-   if (!erofs_is_fscache_mode(sb))
-   id = huge_encode_dev(sb->s_bdev->bd_dev);
  
  	buf->f_type = sb->s_magic;

buf->f_bsize = sb->s_blocksize;
buf->f_blocks = sbi->total_blocks;
buf->f_bfree = buf->f_bavail = 0;
-
buf->f_files = ULLONG_MAX;
buf->f_ffree = ULLONG_MAX - sbi->inos;
-
buf->f_namelen = EROFS_NAME_LEN;
  
-	buf->f_fsid= u64_to_fsid(id);

+   if (uuid_is_null(&sb->s_uuid))
+   buf->f_fsid = u64_to_fsid(erofs_is_fscache_mode(sb) ? 0 :
+   huge_encode_dev(sb->s_bdev->bd_dev));
+   else
+   buf->f_fsid = uuid_to_fsid((__u8 *)>s_uuid);
return 0;
  }
  


Re: [syzbot] [erofs?] BUG: using smp_processor_id() in preemptible code in z_erofs_get_gbuf

2024-04-08 Thread Gao Xiang

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev


Re: [PATCH v1] erofs: use raw_smp_processor_id() to get buffer from global buffer pool

2024-04-08 Thread Gao Xiang




On 2024/4/9 07:05, Sandeep Dhavale wrote:


Thanks for catching this, since the original patch is
for next upstream cycle, may I fold this fix in the
original patch?


Hi Gao,
Sounds good. As the fix is simple, it makes sense to fold it into the
original one.

Thanks,
Sandeep.


Thanks, folded.

Thanks,
Gao Xiang


Re: [PATCH v1] erofs: use raw_smp_processor_id() to get buffer from global buffer pool

2024-04-08 Thread Gao Xiang

Hi Sandeep,

On 2024/4/9 05:52, Sandeep Dhavale wrote:

erofs will decompress in a preemptible context (kworker or per-CPU
thread). As smp_processor_id() cannot be used in preemptible contexts,
use raw_smp_processor_id() instead to index into the global buffer pool.

Reported-by: syzbot+27cc650ef45b379df...@syzkaller.appspotmail.com
Fixes: 7a7513292cc6 ("erofs: rename per-CPU buffers to global buffer pool and make 
it configurable")
Signed-off-by: Sandeep Dhavale 


Thanks for catching this, since the original patch is
for next upstream cycle, may I fold this fix in the
original patch?

I will add your credit into the original patch.

Thanks,
Gao Xiang


---
  fs/erofs/zutil.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/erofs/zutil.c b/fs/erofs/zutil.c
index b9b99158bb4e..036024bce9f7 100644
--- a/fs/erofs/zutil.c
+++ b/fs/erofs/zutil.c
@@ -30,7 +30,7 @@ static struct shrinker *erofs_shrinker_info;
  
  static unsigned int z_erofs_gbuf_id(void)

  {
-   return smp_processor_id() % z_erofs_gbuf_count;
+   return raw_smp_processor_id() % z_erofs_gbuf_count;
  }
  
  void *z_erofs_get_gbuf(unsigned int requiredpages)


Re: [PATCH] erofs-utils: change temporal buffer to non static

2024-04-08 Thread Gao Xiang

Hi Yifan,

On 2024/4/8 18:34, Yifan Zhao wrote:

Hi Noboru,


AFAIK, this `tryrecompress_trailing` is only used when `may_inline` is true,
indicating that this segment is the last one in the file. In the current
inner-file implementation, it means that only one worker will use the `tmp`
buffer at a given time.

In fact, the `static` modifier was removed in the first version of the
patchset, but the change was reverted during review. I think Xiang may
share his opinion about this.



Yes, I think it will impact the inter-file implementation, but that doesn't
matter since we'll enable that eventually.  So I will apply this first :)

Thanks,
Gao Xiang




Thanks,

Yifan Zhao

On 4/8/24 5:16 PM, Noboru Asai wrote:

In multi-threaded mode, each thread must use a different buffer in the
tryrecompress_trailing function, so change this buffer to non-static.

Signed-off-by: Noboru Asai 
---
  lib/compress.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/compress.c b/lib/compress.c
index 641fde6..7415fda 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -447,7 +447,7 @@ static void tryrecompress_trailing(struct 
z_erofs_compress_sctx *ctx,
 void *out, unsigned int *compressedsize)
  {
  struct erofs_sb_info *sbi = ctx->ictx->inode->sbi;
-    static char tmp[Z_EROFS_PCLUSTER_MAX_SIZE];
+    char tmp[Z_EROFS_PCLUSTER_MAX_SIZE];
  unsigned int count;
  int ret = *compressedsize;

