Re: [PATCH] staging: erofs: removing an extra call to iloc() in fill_inode()

2019-08-14 Thread Gao Xiang
On Wed, Aug 14, 2019 at 09:22:53AM +0530, Pratik Shinde wrote:
> Yes, since we already have a function with the same name (and we are using it in
> the same context).
> 'inode_loc' was the most meaningful name I could come up with :)
> 
> --Pratik.

And one more small suggestion... see the following,
https://lore.kernel.org/lkml/20190805044225.ga14...@kroah.com/

Happy hacking! :)

Thanks,
Gao Xiang

> 
> On Wed, Aug 14, 2019 at 7:37 AM Gao Xiang  wrote:
> 
> > On Wed, Aug 14, 2019 at 09:56:09AM +0800, Chao Yu wrote:
> > > On 2019/8/14 9:59, Gao Xiang wrote:
> > > > Hi Pratik,
> > > >
> > > > On Wed, Aug 14, 2019 at 02:08:40AM +0530, Pratik Shinde wrote:
> > > > >> fill_inode() calls iloc() twice. Avoid the extra call by
> > > > >> storing the result.
> > > >>
> > > >> Signed-off-by: Pratik Shinde 
> > > >
> > > > > I have no objection to this patch, but I'd like to
> > > > > hear Chao's and Greg's thoughts on it...
> > >
> > > > It looks cleaner. :)
> > > >
> > > > Nitpick: maybe shortening 'inode_loc' to 'iloc' would be better.
> >
> > iloc is already the name of a static inline helper function in internal.h,
> > used for shorter lines...
> >
> > Thanks,
> > Gao Xiang
> >
> > >
> > > Reviewed-by: Chao Yu 
> > >
> > > Thanks,
> > >
> > > >
> > > > Thanks,
> > > > Gao Xiang
> > > >
> > > >> ---
> > > > >>  drivers/staging/erofs/inode.c | 7 ++++---
> > > >>  1 file changed, 4 insertions(+), 3 deletions(-)
> > > >>
> > > >> diff --git a/drivers/staging/erofs/inode.c
> > b/drivers/staging/erofs/inode.c
> > > >> index 4c3d8bf..d82ba6c 100644
> > > >> --- a/drivers/staging/erofs/inode.c
> > > >> +++ b/drivers/staging/erofs/inode.c
> > > >> @@ -167,11 +167,12 @@ static int fill_inode(struct inode *inode, int
> > isdir)
> > > >>int err;
> > > >>erofs_blk_t blkaddr;
> > > >>unsigned int ofs;
> > > >> +  erofs_off_t inode_loc;
> > > >>
> > > >>trace_erofs_fill_inode(inode, isdir);
> > > >> -
> > > >> -  blkaddr = erofs_blknr(iloc(sbi, vi->nid));
> > > >> -  ofs = erofs_blkoff(iloc(sbi, vi->nid));
> > > >> +  inode_loc = iloc(sbi, vi->nid);
> > > >> +  blkaddr = erofs_blknr(inode_loc);
> > > >> +  ofs = erofs_blkoff(inode_loc);
> > > >>
> > > >>debugln("%s, reading inode nid %llu at %u of blkaddr %u",
> > > >>__func__, vi->nid, ofs, blkaddr);
> > > >> --
> > > >> 2.9.3
> > > >>
> > > > .
> > > >
> >
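
For readers following the thread, the effect of the patch can be sketched in userspace as below. This is only an illustration under stated assumptions: 4 KiB blocks and 32-byte inode slots are assumed, and every demo_* name and constant is a hypothetical stand-in rather than the erofs helper it imitates.

```c
#include <stdint.h>

/* Assumptions for this sketch only: 4 KiB blocks, 32-byte inode slots. */
#define DEMO_BLKBITS    12
#define DEMO_ISLOTBITS  5

/* Byte address of inode `nid`: start of the metadata area plus nid slots. */
static inline uint64_t demo_iloc(uint32_t meta_blkaddr, uint64_t nid)
{
	return ((uint64_t)meta_blkaddr << DEMO_BLKBITS) + (nid << DEMO_ISLOTBITS);
}

/* Block number that contains a byte address. */
static inline uint32_t demo_blknr(uint64_t addr)
{
	return (uint32_t)(addr >> DEMO_BLKBITS);
}

/* Offset of a byte address inside its block. */
static inline uint32_t demo_blkoff(uint64_t addr)
{
	return (uint32_t)(addr & ((1u << DEMO_BLKBITS) - 1));
}

/* Compute the location once, then derive both values from it. */
static inline void demo_locate(uint32_t meta_blkaddr, uint64_t nid,
			       uint32_t *blkaddr, uint32_t *ofs)
{
	const uint64_t inode_loc = demo_iloc(meta_blkaddr, nid);

	*blkaddr = demo_blknr(inode_loc);
	*ofs = demo_blkoff(inode_loc);
}
```

Since the helper is a static inline, a compiler may well merge the two calls on its own; the stored result mainly makes the intent explicit.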


Re: [PATCH v2] Add flags option to get xattr method paired to __vfs_getxattr

2019-08-14 Thread kbuild test robot
Hi Mark,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc4 next-20190813]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Mark-Salyzyn/Add-flags-option-to-get-xattr-method-paired-to-__vfs_getxattr/20190814-124805
config: nds32-allmodconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=nds32 

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   fs/ubifs/xattr.c:326:9: error: conflicting types for 'ubifs_xattr_get'
ssize_t ubifs_xattr_get(struct inode *host, const char *name, void *buf,
^~~
   In file included from fs/ubifs/xattr.c:46:
   fs/ubifs/ubifs.h:2006:9: note: previous declaration of 'ubifs_xattr_get' was 
here
ssize_t ubifs_xattr_get(struct inode *host, const char *name, void *buf,
^~~
   fs/ubifs/xattr.c: In function 'xattr_get':
   fs/ubifs/xattr.c:678:9: error: too few arguments to function 
'ubifs_xattr_get'
 return ubifs_xattr_get(inode, name, buffer, size);
^~~
   fs/ubifs/xattr.c:326:9: note: declared here
ssize_t ubifs_xattr_get(struct inode *host, const char *name, void *buf,
^~~
   fs/ubifs/xattr.c: At top level:
>> fs/ubifs/xattr.c:699:9: error: initialization of 'int (*)(const struct 
>> xattr_handler *, struct dentry *, struct inode *, const char *, void *, 
>> size_t,  int)' {aka 'int (*)(const struct xattr_handler *, struct dentry *, 
>> struct inode *, const char *, void *, unsigned int,  int)'} from 
>> incompatible pointer type 'int (*)(const struct xattr_handler *, struct 
>> dentry *, struct inode *, const char *, void *, size_t)' {aka 'int (*)(const 
>> struct xattr_handler *, struct dentry *, struct inode *, const char *, void 
>> *, unsigned int)'} [-Werror=incompatible-pointer-types]
 .get = xattr_get,
^
   fs/ubifs/xattr.c:699:9: note: (near initialization for 
'ubifs_user_xattr_handler.get')
   fs/ubifs/xattr.c:705:9: error: initialization of 'int (*)(const struct 
xattr_handler *, struct dentry *, struct inode *, const char *, void *, size_t, 
 int)' {aka 'int (*)(const struct xattr_handler *, struct dentry *, struct 
inode *, const char *, void *, unsigned int,  int)'} from incompatible pointer 
type 'int (*)(const struct xattr_handler *, struct dentry *, struct inode *, 
const char *, void *, size_t)' {aka 'int (*)(const struct xattr_handler *, 
struct dentry *, struct inode *, const char *, void *, unsigned int)'} 
[-Werror=incompatible-pointer-types]
 .get = xattr_get,
^
   fs/ubifs/xattr.c:705:9: note: (near initialization for 
'ubifs_trusted_xattr_handler.get')
   fs/ubifs/xattr.c:712:9: error: initialization of 'int (*)(const struct 
xattr_handler *, struct dentry *, struct inode *, const char *, void *, size_t, 
 int)' {aka 'int (*)(const struct xattr_handler *, struct dentry *, struct 
inode *, const char *, void *, unsigned int,  int)'} from incompatible pointer 
type 'int (*)(const struct xattr_handler *, struct dentry *, struct inode *, 
const char *, void *, size_t)' {aka 'int (*)(const struct xattr_handler *, 
struct dentry *, struct inode *, const char *, void *, unsigned int)'} 
[-Werror=incompatible-pointer-types]
 .get = xattr_get,
^
   fs/ubifs/xattr.c:712:9: note: (near initialization for 
'ubifs_security_xattr_handler.get')
   fs/ubifs/xattr.c: In function 'xattr_get':
   fs/ubifs/xattr.c:679:1: warning: control reaches end of non-void function 
[-Wreturn-type]
}
^
   cc1: some warnings being treated as errors
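
The log above appears to come from one pattern: a callback type gained a trailing `int flags` argument, while a declaration, a definition, a caller and the handler callbacks were not all updated together. The following is a minimal userspace sketch of a consistent update. All demo_* names are hypothetical and are not the ubifs or VFS symbols.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type whose ->get callback takes a trailing flags argument. */
struct demo_xattr_handler {
	const char *prefix;
	int (*get)(const struct demo_xattr_handler *handler, const char *name,
		   void *buffer, size_t size, int flags);
};

/* Low-level getter: declaration, definition and callers all carry `flags`. */
static int demo_backend_get(const char *name, void *buffer, size_t size,
			    int flags)
{
	static const char value[] = "demo";

	(void)name;
	(void)flags;
	if (size < sizeof(value))
		return -1;
	memcpy(buffer, value, sizeof(value));
	return (int)sizeof(value);
}

/* Callback with the new signature; it forwards flags to the backend. */
static int demo_xattr_get(const struct demo_xattr_handler *handler,
			  const char *name, void *buffer, size_t size,
			  int flags)
{
	(void)handler;
	return demo_backend_get(name, buffer, size, flags);
}

/* The initializer compiles cleanly because the callback type now matches. */
static const struct demo_xattr_handler demo_user_handler = {
	.prefix = "user.",
	.get = demo_xattr_get,
};
```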

vim +699 fs/ubifs/xattr.c

2b88fc21cae91e Andreas Gruenbacher 2016-04-22  669  
ade46c3a6029de Richard Weinberger  2016-09-19  670  static int xattr_get(const 
struct xattr_handler *handler,
2b88fc21cae91e Andreas Gruenbacher 2016-04-22  671 
struct dentry *dentry, struct inode *inode,
2b88fc21cae91e Andreas Gruenbacher 2016-04-22  672 
const char *name, void *buffer, size_t size)
2b88fc21cae91e Andreas Gruenbacher 2016-04-22  673  {
2b88fc21cae91e Andreas Gruenbacher 2016-04-22  674  dbg_gen("xattr '%s', 
ino %lu ('%pd'), buf size %zd", name,
2b88fc21cae91e Andreas Gruenbacher 2016-04-22  675  inode->i_ino, 
dentry, size);
2b88fc21cae91e Andreas Gruenbacher 2016-04-22  676  
17ce1eb0b64eb2 Richard Weinberger  2016-07-31  677  name = 
xattr_full_name(handler, name);
ade46c3a6029de Richard Weinberger  2016-09-19 @678 

[PATCH v8 18/24] erofs: introduce pagevec for decompression subsystem

2019-08-14 Thread Gao Xiang
For each physical cluster, the straightforward approach is
to allocate a fixed or variable-sized array that records the
file pages involved in its decompression, which is needed
when these pages are decompressed asynchronously (e.g. the
read-ahead case). However, this takes variable-sized on-heap
memory that traditional uncompressed filesystems do not need.

This patch introduces a pagevec solution that reuses some
already-allocated file pages, in a time-sharing manner, to
store parts of the array itself. This minimizes the extra
memory overhead: only a small constant-sized array is needed
to bootstrap the whole array.
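
The idea can be sketched in userspace as follows. This is a simplified model rather than the code in this patch: it covers enqueueing only, uses a tiny hypothetical page size, and replaces the tagged page types with a single `reusable` flag. All demo_* names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE  64	/* hypothetical tiny "page" */
#define DEMO_SLOTS      (DEMO_PAGE_SIZE / sizeof(void *))

/* A page viewed as an array of pointer-sized slots. */
struct demo_page {
	void *words[DEMO_SLOTS];
};

struct demo_pagevec {
	void **slots;		/* storage currently being filled */
	unsigned int nr, index;	/* capacity and fill level of slots */
	struct demo_page *next;	/* page reserved to become the next storage */
};

/* Start with a small caller-provided array as the bootstrap storage. */
static void demo_pagevec_init(struct demo_pagevec *pv, void **inline_slots,
			      unsigned int nr)
{
	pv->slots = inline_slots;
	pv->nr = nr;
	pv->index = 0;
	pv->next = NULL;
}

/* Record a page; returns false if the page cannot be recorded right now. */
static bool demo_pagevec_enqueue(struct demo_pagevec *pv,
				 struct demo_page *page, bool reusable)
{
	/* keep the last free slot for a reusable page, or we could never grow */
	if (!pv->next && !reusable && pv->index + 1 == pv->nr)
		return false;

	if (pv->index >= pv->nr) {
		/* current storage is full: continue inside the reserved page */
		if (!pv->next)
			return false;
		pv->slots = pv->next->words;
		pv->nr = DEMO_SLOTS;
		pv->index = 0;
		pv->next = NULL;
	}

	/* the first reusable page seen is reserved as the next storage */
	if (!pv->next && reusable)
		pv->next = page;

	pv->slots[pv->index++] = page;
	return true;
}
```

The reserved page is still recorded in the previous storage, so it is not lost when its contents are temporarily used to hold further slots.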

Signed-off-by: Gao Xiang 
---
 fs/erofs/zpvec.h | 159 +++
 1 file changed, 159 insertions(+)
 create mode 100644 fs/erofs/zpvec.h

diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h
new file mode 100644
index 000000000000..bb7689e67836
--- /dev/null
+++ b/fs/erofs/zpvec.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * linux/fs/erofs/zpvec.h
+ *
+ * Copyright (C) 2018 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang 
+ */
+#ifndef __EROFS_FS_ZPVEC_H
+#define __EROFS_FS_ZPVEC_H
+
+#include "tagptr.h"
+
+/* page type in pagevec for decompress subsystem */
+enum z_erofs_page_type {
+   /* including Z_EROFS_VLE_PAGE_TAIL_EXCLUSIVE */
+   Z_EROFS_PAGE_TYPE_EXCLUSIVE,
+
+   Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED,
+
+   Z_EROFS_VLE_PAGE_TYPE_HEAD,
+   Z_EROFS_VLE_PAGE_TYPE_MAX
+};
+
+extern void __compiletime_error("Z_EROFS_PAGE_TYPE_EXCLUSIVE != 0")
+   __bad_page_type_exclusive(void);
+
+/* pagevec tagged pointer */
+typedef tagptr2_t  erofs_vtptr_t;
+
+/* pagevec collector */
+struct z_erofs_pagevec_ctor {
+   struct page *curr, *next;
+   erofs_vtptr_t *pages;
+
+   unsigned int nr, index;
+};
+
+static inline void z_erofs_pagevec_ctor_exit(struct z_erofs_pagevec_ctor *ctor,
+bool atomic)
+{
+   if (!ctor->curr)
+   return;
+
+   if (atomic)
+   kunmap_atomic(ctor->pages);
+   else
+   kunmap(ctor->curr);
+}
+
+static inline struct page *
+z_erofs_pagevec_ctor_next_page(struct z_erofs_pagevec_ctor *ctor,
+  unsigned int nr)
+{
+   unsigned int index;
+
+   /* keep away from occupied pages */
+   if (ctor->next)
+   return ctor->next;
+
+   for (index = 0; index < nr; ++index) {
+   const erofs_vtptr_t t = ctor->pages[index];
+   const unsigned int tags = tagptr_unfold_tags(t);
+
+   if (tags == Z_EROFS_PAGE_TYPE_EXCLUSIVE)
+   return tagptr_unfold_ptr(t);
+   }
+   DBG_BUGON(nr >= ctor->nr);
+   return NULL;
+}
+
+static inline void
+z_erofs_pagevec_ctor_pagedown(struct z_erofs_pagevec_ctor *ctor,
+ bool atomic)
+{
+   struct page *next = z_erofs_pagevec_ctor_next_page(ctor, ctor->nr);
+
+   z_erofs_pagevec_ctor_exit(ctor, atomic);
+
+   ctor->curr = next;
+   ctor->next = NULL;
+   ctor->pages = atomic ?
+   kmap_atomic(ctor->curr) : kmap(ctor->curr);
+
+   ctor->nr = PAGE_SIZE / sizeof(struct page *);
+   ctor->index = 0;
+}
+
+static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor,
+unsigned int nr,
+erofs_vtptr_t *pages,
+unsigned int i)
+{
+   ctor->nr = nr;
+   ctor->curr = ctor->next = NULL;
+   ctor->pages = pages;
+
+   if (i >= nr) {
+   i -= nr;
+   z_erofs_pagevec_ctor_pagedown(ctor, false);
+   while (i > ctor->nr) {
+   i -= ctor->nr;
+   z_erofs_pagevec_ctor_pagedown(ctor, false);
+   }
+   }
+   ctor->next = z_erofs_pagevec_ctor_next_page(ctor, i);
+   ctor->index = i;
+}
+
+static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor,
+  struct page *page,
+  enum z_erofs_page_type type,
+  bool *occupied)
+{
+   *occupied = false;
+   if (unlikely(!ctor->next && type))
+   if (ctor->index + 1 == ctor->nr)
+   return false;
+
+   if (unlikely(ctor->index >= ctor->nr))
+   z_erofs_pagevec_ctor_pagedown(ctor, false);
+
+   /* exclusive page type must be 0 */
+   if (Z_EROFS_PAGE_TYPE_EXCLUSIVE != (uintptr_t)NULL)
+   __bad_page_type_exclusive();
+
+   /* should remind that collector->next never equal to 1, 2 */
+   if (type == (uintptr_t)ctor->next) {
+   ctor->next = page;
+   *occupied = true;
+   }
+   ctor->pages[ctor->index++] = 

[PATCH v8 01/24] erofs: add on-disk layout

2019-08-14 Thread Gao Xiang
This commit adds the on-disk layout header file of erofs.
On-disk format is compatible with erofs-staging added in 4.19.

In addition, add EROFS_SUPER_MAGIC_V1 to magic.h.
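
As a small illustration of how the on-disk header is meant to be consumed, the sketch below decodes the i_advise field using the bit definitions from erofs_fs.h in this patch. The demo_* helpers are hypothetical, and the value passed in is assumed to have been converted from little endian already.

```c
#include <stdint.h>

/* Bit layout of i_advise, copied from the header in this patch. */
#define EROFS_I_VERSION_BITS        1
#define EROFS_I_DATA_MAPPING_BITS   3
#define EROFS_I_VERSION_BIT         0
#define EROFS_I_DATA_MAPPING_BIT    1

/* Extract `bits` bits starting at bit position `bit`. */
static inline unsigned int demo_bitrange(unsigned int value, unsigned int bit,
					 unsigned int bits)
{
	return (value >> bit) & ((1u << bits) - 1);
}

/* 0 selects the 32-byte v1 inode, 1 selects the 64-byte v2 inode. */
static inline unsigned int demo_inode_version(uint16_t advise)
{
	return demo_bitrange(advise, EROFS_I_VERSION_BIT, EROFS_I_VERSION_BITS);
}

/* Data mapping mode, i.e. one of the EROFS_INODE_FLAT_* values. */
static inline unsigned int demo_inode_datamode(uint16_t advise)
{
	return demo_bitrange(advise, EROFS_I_DATA_MAPPING_BIT,
			     EROFS_I_DATA_MAPPING_BITS);
}

/* On-disk inode size implied by the version bit. */
static inline unsigned int demo_inode_isize(uint16_t advise)
{
	return demo_inode_version(advise) ? 64 : 32;
}
```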

Signed-off-by: Gao Xiang 
---
 fs/erofs/erofs_fs.h| 316 +
 include/uapi/linux/magic.h |   1 +
 2 files changed, 317 insertions(+)
 create mode 100644 fs/erofs/erofs_fs.h

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
new file mode 100644
index 000000000000..230fcba1099d
--- /dev/null
+++ b/fs/erofs/erofs_fs.h
@@ -0,0 +1,316 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR Apache-2.0 */
+/*
+ * linux/fs/erofs/erofs_fs.h
+ *
+ * Copyright (C) 2017-2018 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang 
+ */
+#ifndef __EROFS_FS_H
+#define __EROFS_FS_H
+
+/* Enhanced(Extended) ROM File System */
+#define EROFS_SUPER_OFFSET  1024
+
+/*
+ * Any bits that aren't in EROFS_ALL_REQUIREMENTS should be
+ * incompatible with this kernel version.
+ */
+#define EROFS_REQUIREMENT_LZ4_0PADDING 0x0001
+#define EROFS_ALL_REQUIREMENTS 0
+
+struct erofs_super_block {
+/*  0 */__le32 magic;   /* in the little endian */
+/*  4 */__le32 checksum;/* crc32c(super_block) */
+/*  8 */__le32 features;/* (aka. feature_compat) */
+/* 12 */__u8 blkszbits; /* support block_size == PAGE_SIZE only */
+/* 13 */__u8 reserved;
+
+/* 14 */__le16 root_nid;
+/* 16 */__le64 inos;/* total valid ino # (== f_files - f_favail) */
+
+/* 24 */__le64 build_time;  /* inode v1 time derivation */
+/* 32 */__le32 build_time_nsec;
+/* 36 */__le32 blocks;  /* used for statfs */
+/* 40 */__le32 meta_blkaddr;
+/* 44 */__le32 xattr_blkaddr;
+/* 48 */__u8 uuid[16];  /* 128-bit uuid for volume */
+/* 64 */__u8 volume_name[16];   /* volume name */
+/* 80 */__le32 requirements;/* (aka. feature_incompat) */
+
+/* 84 */__u8 reserved2[44];
+} __packed; /* 128 bytes */
+
+/*
+ * erofs inode data mapping:
+ * 0 - inode plain without inline data A:
+ * inode, [xattrs], ... | ... | no-holed data
+ * 1 - inode VLE compression B (legacy):
+ * inode, [xattrs], extents ... | ...
+ * 2 - inode plain with inline data C:
+ * inode, [xattrs], last_inline_data, ... | ... | no-holed data
+ * 3 - inode compression D:
+ * inode, [xattrs], map_header, extents ... | ...
+ * 4~7 - reserved
+ */
+enum {
+   EROFS_INODE_FLAT_PLAIN,
+   EROFS_INODE_FLAT_COMPRESSION_LEGACY,
+   EROFS_INODE_FLAT_INLINE,
+   EROFS_INODE_FLAT_COMPRESSION,
+   EROFS_INODE_LAYOUT_MAX
+};
+
+static inline bool erofs_inode_is_data_compressed(unsigned int datamode)
+{
+   if (datamode == EROFS_INODE_FLAT_COMPRESSION)
+   return true;
+   return datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY;
+}
+
+/* bit definitions of inode i_advise */
+#define EROFS_I_VERSION_BITS            1
+#define EROFS_I_DATA_MAPPING_BITS       3
+
+#define EROFS_I_VERSION_BIT             0
+#define EROFS_I_DATA_MAPPING_BIT        1
+
+struct erofs_inode_v1 {
+/*  0 */__le16 i_advise;
+
+/* 1 header + n-1 * 4 bytes inline xattr to keep continuity */
+/*  2 */__le16 i_xattr_icount;
+/*  4 */__le16 i_mode;
+/*  6 */__le16 i_nlink;
+/*  8 */__le32 i_size;
+/* 12 */__le32 i_reserved;
+/* 16 */union {
+   /* file total compressed blocks for data mapping 1 */
+   __le32 compressed_blocks;
+   __le32 raw_blkaddr;
+
+   /* for device files, used to indicate old/new device # */
+   __le32 rdev;
+   } i_u __packed;
+/* 20 */__le32 i_ino;   /* only used for 32-bit stat compatibility */
+/* 24 */__le16 i_uid;
+/* 26 */__le16 i_gid;
+/* 28 */__le32 i_reserved2;
+} __packed;
+
+/* 32 bytes on-disk inode */
+#define EROFS_INODE_LAYOUT_V1   0
+/* 64 bytes on-disk inode */
+#define EROFS_INODE_LAYOUT_V2   1
+
+struct erofs_inode_v2 {
+/*  0 */__le16 i_advise;
+
+/* 1 header + n-1 * 4 bytes inline xattr to keep continuity */
+/*  2 */__le16 i_xattr_icount;
+/*  4 */__le16 i_mode;
+/*  6 */__le16 i_reserved;
+/*  8 */__le64 i_size;
+/* 16 */union {
+   /* file total compressed blocks for data mapping 1 */
+   __le32 compressed_blocks;
+   __le32 raw_blkaddr;
+
+   /* for device files, used to indicate old/new device # */
+   __le32 rdev;
+   } i_u __packed;
+
+   /* only used for 32-bit stat compatibility */
+/* 20 */__le32 i_ino;
+
+/* 24 */__le32 i_uid;
+/* 28 */__le32 i_gid;
+/* 32 */__le64 i_ctime;
+/* 40 */__le32 i_ctime_nsec;
+/* 44 */__le32 i_nlink;
+/* 48 */__u8   i_reserved2[16];
+} __packed; /* 64 bytes */
+
+#define EROFS_MAX_SHARED_XATTRS (128)
+/* h_shared_count between 129 ... 255 are special # */
+#define EROFS_SHARED_XATTR_EXTENT   (255)
+
+/*
+ * inline xattrs (n == i_xattr_icount):
+ * erofs_xattr_ibody_header(1) + (n - 1) * 4 bytes
+ *  12 bytes   /   

[PATCH v8 23/24] erofs: introduce cached decompression

2019-08-14 Thread Gao Xiang
This patch adds user-selectable strategies for caching
both incomplete ends of compressed physical clusters, as
a complement to in-place I/O. Caching boosts random
reads, but it costs more memory than in-place I/O alone.
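
The strategy is chosen by name through the cache_strategy= mount option. A table-driven userspace sketch of that name lookup is shown below. It is a variation for illustration only, not the parser in this patch, and the demo_* names are hypothetical.

```c
#include <string.h>

enum demo_cache_strategy {
	DEMO_CACHE_DISABLED,
	DEMO_CACHE_READAHEAD,
	DEMO_CACHE_READAROUND
};

/* Returns 0 and stores the strategy, or -1 if the name is not recognized. */
static int demo_parse_cache_strategy(const char *name,
				     enum demo_cache_strategy *out)
{
	static const struct {
		const char *name;
		enum demo_cache_strategy value;
	} table[] = {
		{ "disabled",   DEMO_CACHE_DISABLED },
		{ "readahead",  DEMO_CACHE_READAHEAD },
		{ "readaround", DEMO_CACHE_READAROUND },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (!strcmp(name, table[i].name)) {
			*out = table[i].value;
			return 0;
		}
	}
	return -1;
}
```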

Signed-off-by: Gao Xiang 
---
 fs/erofs/internal.h |  16 +
 fs/erofs/super.c| 126 -
 fs/erofs/utils.c|  40 ---
 fs/erofs/zdata.c| 165 ++--
 fs/erofs/zdata.h|   7 +-
 5 files changed, 336 insertions(+), 18 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 2be1ae700aca..ad3b6ba75979 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -72,6 +72,12 @@ struct erofs_sb_info {
unsigned int max_sync_decompress_pages;
 
unsigned int shrinker_run_no;
+
+   /* current strategy of how to use managed cache */
+   unsigned char cache_strategy;
+
+   /* pseudo inode to manage cached pages */
+   struct inode *managed_cache;
 #endif /* CONFIG_EROFS_FS_ZIP */
u32 blocks;
u32 meta_blkaddr;
@@ -157,6 +163,12 @@ static inline void *erofs_kmalloc(struct erofs_sb_info 
*sbi,
 #define test_opt(sbi, option)  ((sbi)->mount_opt & EROFS_MOUNT_##option)
 
 #ifdef CONFIG_EROFS_FS_ZIP
+enum {
+   EROFS_ZIP_CACHE_DISABLED,
+   EROFS_ZIP_CACHE_READAHEAD,
+   EROFS_ZIP_CACHE_READAROUND
+};
+
 #define EROFS_LOCKED_MAGIC (INT_MIN | 0xE0F510CCL)
 
 /* basic unit of the workstation of a super_block */
@@ -524,6 +536,10 @@ int __init erofs_init_shrinker(void);
 void erofs_exit_shrinker(void);
 int __init z_erofs_init_zip_subsystem(void);
 void z_erofs_exit_zip_subsystem(void);
+int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
+  struct erofs_workgroup *egrp);
+int erofs_try_to_free_cached_page(struct address_space *mapping,
+ struct page *page);
 #else
 static inline void erofs_shrinker_register(struct super_block *sb) {}
 static inline void erofs_shrinker_unregister(struct super_block *sb) {}
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index bdac8abf3aa7..95187619b3e3 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -197,10 +197,45 @@ static unsigned int erofs_get_fault_rate(struct 
erofs_sb_info *sbi)
 }
 #endif
 
+#ifdef CONFIG_EROFS_FS_ZIP
+static int erofs_build_cache_strategy(struct erofs_sb_info *sbi,
+ substring_t *args)
+{
+   const char *cs = match_strdup(args);
+   int err = 0;
+
+   if (!cs) {
+   errln("Not enough memory to store cache strategy");
+   return -ENOMEM;
+   }
+
+   if (!strcmp(cs, "disabled")) {
+   sbi->cache_strategy = EROFS_ZIP_CACHE_DISABLED;
+   } else if (!strcmp(cs, "readahead")) {
+   sbi->cache_strategy = EROFS_ZIP_CACHE_READAHEAD;
+   } else if (!strcmp(cs, "readaround")) {
+   sbi->cache_strategy = EROFS_ZIP_CACHE_READAROUND;
+   } else {
+   errln("Unrecognized cache strategy \"%s\"", cs);
+   err = -EINVAL;
+   }
+   kfree(cs);
+   return err;
+}
+#else
+static int erofs_build_cache_strategy(struct erofs_sb_info *sbi,
+ substring_t *args)
+{
+   infoln("EROFS compression is disabled, so cache strategy is ignored");
+   return 0;
+}
+#endif
+
 /* set up default EROFS parameters */
 static void default_options(struct erofs_sb_info *sbi)
 {
 #ifdef CONFIG_EROFS_FS_ZIP
+   sbi->cache_strategy = EROFS_ZIP_CACHE_READAROUND;
sbi->max_sync_decompress_pages = 3;
 #endif
 #ifdef CONFIG_EROFS_FS_XATTR
@@ -217,6 +252,7 @@ enum {
Opt_acl,
Opt_noacl,
Opt_fault_injection,
+   Opt_cache_strategy,
Opt_err
 };
 
@@ -226,6 +262,7 @@ static match_table_t erofs_tokens = {
{Opt_acl, "acl"},
{Opt_noacl, "noacl"},
{Opt_fault_injection, "fault_injection=%u"},
+   {Opt_cache_strategy, "cache_strategy=%s"},
{Opt_err, NULL}
 };
 
@@ -283,6 +320,11 @@ static int parse_options(struct super_block *sb, char 
*options)
if (err)
return err;
break;
+   case Opt_cache_strategy:
+   err = erofs_build_cache_strategy(EROFS_SB(sb), args);
+   if (err)
+   return err;
+   break;
default:
errln("Unrecognized mount option \"%s\" or missing 
value", p);
return -EINVAL;
@@ -291,6 +333,65 @@ static int parse_options(struct super_block *sb, char 
*options)
return 0;
 }
 
+#ifdef CONFIG_EROFS_FS_ZIP
+static const struct address_space_operations managed_cache_aops;
+
+static int managed_cache_releasepage(struct page *page, gfp_t gfp_mask)
+{
+   int ret = 1;    /* 0 - busy */
+  

[PATCH v8 12/24] erofs: introduce tagged pointer

2019-08-14 Thread Gao Xiang
Currently the kernel has scattered tagged-pointer usages
hacked by hand in plain code, without a single portable
set of functions that makes the tagged pointer explicit
and wraps such hand-written code, so meaningless magic
masks are spread all over.

This patch introduces simple generic methods to fold
tags into a pointer integer. Currently it supports using
the last n bits of the pointer for tags, where n can be
selected by users.

In addition, it will also be used by the upcoming EROFS
filesystem, which heavily uses the tagged pointer approach
to reduce extra memory allocation.
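
For readers unfamiliar with the technique, here is a minimal standalone sketch of folding a 2-bit tag into the low bits of a pointer. It assumes the pointed-to object is at least 4-byte aligned, so those bits are otherwise zero. The demo_* names are hypothetical and much simpler than the macros in this patch.

```c
#include <stdint.h>

/* A pointer whose two lowest bits carry a tag (sketch-only type). */
typedef struct {
	uintptr_t v;
} demo_tagptr2_t;

#define DEMO_TAG_MASK ((uintptr_t)3)

/* Fold a tag into the low bits of an aligned pointer. */
static inline demo_tagptr2_t demo_tagptr_fold(void *ptr, unsigned int tags)
{
	demo_tagptr2_t t = { (uintptr_t)ptr | ((uintptr_t)tags & DEMO_TAG_MASK) };

	return t;
}

/* Recover the original pointer by masking the tag bits off. */
static inline void *demo_tagptr_unfold_ptr(demo_tagptr2_t t)
{
	return (void *)(t.v & ~DEMO_TAG_MASK);
}

/* Recover the tag stored in the low bits. */
static inline unsigned int demo_tagptr_unfold_tags(demo_tagptr2_t t)
{
	return (unsigned int)(t.v & DEMO_TAG_MASK);
}
```

Wrapping the integer in a struct, as the patch does as well, keeps a tagged value from being dereferenced by accident as if it were a plain pointer.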

Link: https://en.wikipedia.org/wiki/Tagged_pointer

Signed-off-by: Gao Xiang 
---
 fs/erofs/tagptr.h | 110 ++
 1 file changed, 110 insertions(+)
 create mode 100644 fs/erofs/tagptr.h

diff --git a/fs/erofs/tagptr.h b/fs/erofs/tagptr.h
new file mode 100644
index 000000000000..a72897c86744
--- /dev/null
+++ b/fs/erofs/tagptr.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * A tagged pointer implementation
+ *
+ * Copyright (C) 2018 Gao Xiang 
+ */
+#ifndef __EROFS_FS_TAGPTR_H
+#define __EROFS_FS_TAGPTR_H
+
+#include 
+#include 
+
+/*
+ * the name of tagged pointer types are tagptr{1, 2, 3...}_t
+ * avoid directly using the internal structs __tagptr{1, 2, 3...}
+ */
+#define __MAKE_TAGPTR(n) \
+typedef struct __tagptr##n {   \
+   uintptr_t v;\
+} tagptr##n##_t;
+
+__MAKE_TAGPTR(1)
+__MAKE_TAGPTR(2)
+__MAKE_TAGPTR(3)
+__MAKE_TAGPTR(4)
+
+#undef __MAKE_TAGPTR
+
+extern void __compiletime_error("bad tagptr tags")
+   __bad_tagptr_tags(void);
+
+extern void __compiletime_error("bad tagptr type")
+   __bad_tagptr_type(void);
+
+/* fix the broken usage of "#define tagptr2_t tagptr3_t" by users */
+#define __tagptr_mask_1(ptr, n)\
+   __builtin_types_compatible_p(typeof(ptr), struct __tagptr##n) ? \
+   (1UL << (n)) - 1 :
+
+#define __tagptr_mask(ptr) (\
+   __tagptr_mask_1(ptr, 1) ( \
+   __tagptr_mask_1(ptr, 2) ( \
+   __tagptr_mask_1(ptr, 3) ( \
+   __tagptr_mask_1(ptr, 4) ( \
+   __bad_tagptr_type(), 0)
+
+/* generate a tagged pointer from a raw value */
+#define tagptr_init(type, val) \
+   ((typeof(type)){ .v = (uintptr_t)(val) })
+
+/*
+ * directly cast a tagged pointer to the native pointer type, which
+ * could be used for backward compatibility of existing code.
+ */
+#define tagptr_cast_ptr(tptr) ((void *)(tptr).v)
+
+/* encode tagged pointers */
+#define tagptr_fold(type, ptr, _tags) ({ \
+   const typeof(_tags) tags = (_tags); \
+   if (__builtin_constant_p(tags) && (tags & ~__tagptr_mask(type))) \
+   __bad_tagptr_tags(); \
+tagptr_init(type, (uintptr_t)(ptr) | tags); })
+
+/* decode tagged pointers */
+#define tagptr_unfold_ptr(tptr) \
+   ((void *)((tptr).v & ~__tagptr_mask(tptr)))
+
+#define tagptr_unfold_tags(tptr) \
+   ((tptr).v & __tagptr_mask(tptr))
+
+/* operations for the tagged pointer */
+#define tagptr_eq(_tptr1, _tptr2) ({ \
+   typeof(_tptr1) tptr1 = (_tptr1); \
+   typeof(_tptr2) tptr2 = (_tptr2); \
+   (void)(&tptr1 == &tptr2); \
+(tptr1).v == (tptr2).v; })
+
+/* lock-free CAS operation */
+#define tagptr_cmpxchg(_ptptr, _o, _n) ({ \
+   typeof(_ptptr) ptptr = (_ptptr); \
+   typeof(_o) o = (_o); \
+   typeof(_n) n = (_n); \
+   (void)(&o == &n); \
+   (void)(&o == ptptr); \
+tagptr_init(o, cmpxchg(&ptptr->v, o.v, n.v)); })
+
+/* wrap WRITE_ONCE if atomic update is needed */
+#define tagptr_replace_tags(_ptptr, tags) ({ \
+   typeof(_ptptr) ptptr = (_ptptr); \
+   *ptptr = tagptr_fold(*ptptr, tagptr_unfold_ptr(*ptptr), tags); \
+*ptptr; })
+
+#define tagptr_set_tags(_ptptr, _tags) ({ \
+   typeof(_ptptr) ptptr = (_ptptr); \
+   const typeof(_tags) tags = (_tags); \
+   if (__builtin_constant_p(tags) && (tags & ~__tagptr_mask(*ptptr))) \
+   __bad_tagptr_tags(); \
+   ptptr->v |= tags; \
+*ptptr; })
+
+#define tagptr_clear_tags(_ptptr, _tags) ({ \
+   typeof(_ptptr) ptptr = (_ptptr); \
+   const typeof(_tags) tags = (_tags); \
+   if (__builtin_constant_p(tags) && (tags & ~__tagptr_mask(*ptptr))) \
+   __bad_tagptr_tags(); \
+   ptptr->v &= ~tags; \
+*ptptr; })
+
+#endif /* __EROFS_FS_TAGPTR_H */
+
-- 
2.17.1



[PATCH v8 22/24] erofs: introduce the decompression frontend

2019-08-14 Thread Gao Xiang
This patch introduces the basic in-place fixed-sized
output decompression implementation for the erofs
filesystem.

In contrast to fixed-sized input compression, each
compressed cluster has a fixed-sized capacity for
compressed data, with the following advantages:
 1) improved storage density;
 2) in-place decompression support;
 3) all data in a compressed physical cluster can be
    decompressed and utilized.

The key point of "in-place" refers to one of the erofs
decompression strategies: instead of allocating extra
compressed pages and data management structures, it
reuses the allocated file cache pages as much as
possible to store the compressed data (called in-place
I/O) and the corresponding pagevec in a time-sharing
approach, which is particularly useful for low-memory
scenarios.

In addition, in-place decompression builds on in-place
I/O, which eliminates page allocation and all extra
memcpy of compressed data.

Signed-off-by: Gao Xiang 
---
 fs/erofs/Kconfig|1 +
 fs/erofs/Makefile   |2 +-
 fs/erofs/internal.h |   14 +-
 fs/erofs/super.c|   11 +
 fs/erofs/zdata.c| 1254 +++
 fs/erofs/zdata.h|  192 +++
 fs/erofs/zmap.c |4 +-
 7 files changed, 1474 insertions(+), 4 deletions(-)
 create mode 100644 fs/erofs/zdata.c
 create mode 100644 fs/erofs/zdata.h

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 5f8787c0cf89..16316d1adca3 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -76,6 +76,7 @@ config EROFS_FS_ZIP
bool "EROFS Data Compression Support"
depends on EROFS_FS
select LZ4_DECOMPRESS
+   default y
help
  Enable fixed-sized output compression for EROFS.
 
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 5594abca6f95..46f2aa4ba46c 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -7,5 +7,5 @@ ccflags-y += -DEROFS_VERSION=\"$(EROFS_VERSION)\"
 obj-$(CONFIG_EROFS_FS) += erofs.o
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
 
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 9dc3d47347db..2be1ae700aca 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -68,6 +68,9 @@ struct erofs_sb_info {
/* the dedicated workstation for compression */
struct radix_tree_root workstn_tree;
 
+   /* threshold for decompression synchronously */
+   unsigned int max_sync_decompress_pages;
+
unsigned int shrinker_run_no;
 #endif /* CONFIG_EROFS_FS_ZIP */
u32 blocks;
@@ -327,6 +330,9 @@ static inline bool is_inode_flat_inline(struct inode *inode)
 extern const struct super_operations erofs_sops;
 
 extern const struct address_space_operations erofs_raw_access_aops;
+#ifdef CONFIG_EROFS_FS_ZIP
+extern const struct address_space_operations z_erofs_vle_normalaccess_aops;
+#endif
 
 /*
  * Logical to physical block mapping, used by erofs_map_blocks()
@@ -487,7 +493,7 @@ int erofs_namei(struct inode *dir, struct qstr *name,
 /* dir.c */
 extern const struct file_operations erofs_dir_fops;
 
-/* utils.c */
+/* utils.c / zdata.c */
 struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp, bool nofail);
 
 #if (EROFS_PCPUBUF_NR_PAGES > 0)
@@ -511,16 +517,20 @@ struct erofs_workgroup *erofs_find_workgroup(struct 
super_block *sb,
 pgoff_t index, bool *tag);
 int erofs_register_workgroup(struct super_block *sb,
 struct erofs_workgroup *grp, bool tag);
-static inline void erofs_workgroup_free_rcu(struct erofs_workgroup *grp) {}
+void erofs_workgroup_free_rcu(struct erofs_workgroup *grp);
 void erofs_shrinker_register(struct super_block *sb);
 void erofs_shrinker_unregister(struct super_block *sb);
 int __init erofs_init_shrinker(void);
 void erofs_exit_shrinker(void);
+int __init z_erofs_init_zip_subsystem(void);
+void z_erofs_exit_zip_subsystem(void);
 #else
 static inline void erofs_shrinker_register(struct super_block *sb) {}
 static inline void erofs_shrinker_unregister(struct super_block *sb) {}
 static inline int erofs_init_shrinker(void) { return 0; }
 static inline void erofs_exit_shrinker(void) {}
+static inline int z_erofs_init_zip_subsystem(void) { return 0; }
+static inline void z_erofs_exit_zip_subsystem(void) {}
 #endif /* !CONFIG_EROFS_FS_ZIP */
 
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index ea8d065068fa..bdac8abf3aa7 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -200,6 +200,9 @@ static unsigned int erofs_get_fault_rate(struct 
erofs_sb_info *sbi)
 /* set up default EROFS parameters */
 static void default_options(struct erofs_sb_info *sbi)
 {
+#ifdef CONFIG_EROFS_FS_ZIP
+   sbi->max_sync_decompress_pages = 3;
+#endif
 #ifdef CONFIG_EROFS_FS_XATTR

[PATCH v8 03/24] erofs: add super block operations

2019-08-14 Thread Gao Xiang
This commit adds erofs super block operations, including (u)mount,
remount_fs, show_options, statfs, in addition to some private
icache management functions.
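
The mount-time validation of the super block can be sketched in userspace as below. This is only an illustration: the field offsets follow struct erofs_super_block from the on-disk layout patch in this series, while the magic value, the block size and all demo_* names are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SUPER_OFFSET      1024
#define DEMO_SUPER_SIZE        128
#define DEMO_SUPER_MAGIC       0xE0F5E1E2u	/* assumed EROFS_SUPER_MAGIC_V1 */
#define DEMO_LOG_BLOCK_SIZE    12		/* assumed 4 KiB blocks */
#define DEMO_ALL_REQUIREMENTS  0u		/* no incompat feature supported */

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t demo_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 if the buffer holds a super block this sketch would accept. */
static int demo_check_super(const uint8_t *block, size_t len)
{
	const uint8_t *sb;

	if (len < DEMO_SUPER_OFFSET + DEMO_SUPER_SIZE)
		return -1;
	sb = block + DEMO_SUPER_OFFSET;

	if (demo_le32(sb) != DEMO_SUPER_MAGIC)		/* offset 0: magic */
		return -1;
	if (sb[12] != DEMO_LOG_BLOCK_SIZE)		/* offset 12: blkszbits */
		return -1;
	/* offset 80: requirements; any unknown bit means "too new for us" */
	if (demo_le32(sb + 80) & ~DEMO_ALL_REQUIREMENTS)
		return -1;
	return 0;
}
```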

Signed-off-by: Gao Xiang 
---
 fs/erofs/super.c | 437 +++
 1 file changed, 437 insertions(+)
 create mode 100644 fs/erofs/super.c

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
new file mode 100644
index 000000000000..cd4bd6f48173
--- /dev/null
+++ b/fs/erofs/super.c
@@ -0,0 +1,437 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/fs/erofs/super.c
+ *
+ * Copyright (C) 2017-2018 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+#define CREATE_TRACE_POINTS
+#include 
+
+static struct kmem_cache *erofs_inode_cachep __read_mostly;
+
+static void init_once(void *ptr)
+{
+   struct erofs_vnode *vi = ptr;
+
+   inode_init_once(&vi->vfs_inode);
+}
+
+static int __init erofs_init_inode_cache(void)
+{
+   erofs_inode_cachep = kmem_cache_create("erofs_inode",
+  sizeof(struct erofs_vnode), 0,
+  SLAB_RECLAIM_ACCOUNT,
+  init_once);
+
+   return erofs_inode_cachep ? 0 : -ENOMEM;
+}
+
+static void erofs_exit_inode_cache(void)
+{
+   kmem_cache_destroy(erofs_inode_cachep);
+}
+
+static struct inode *alloc_inode(struct super_block *sb)
+{
+   struct erofs_vnode *vi =
+   kmem_cache_alloc(erofs_inode_cachep, GFP_KERNEL);
+
+   if (!vi)
+   return NULL;
+
+   /* zero out everything except vfs_inode */
+   memset(vi, 0, offsetof(struct erofs_vnode, vfs_inode));
+   return &vi->vfs_inode;
+}
+
+static void free_inode(struct inode *inode)
+{
+   struct erofs_vnode *vi = EROFS_V(inode);
+
+   /* be careful RCU symlink path (see ext4_inode_info->i_data)! */
+   if (is_inode_fast_symlink(inode))
+   kfree(inode->i_link);
+
+   kmem_cache_free(erofs_inode_cachep, vi);
+}
+
+static bool check_layout_compatibility(struct super_block *sb,
+  struct erofs_super_block *layout)
+{
+   const unsigned int requirements = le32_to_cpu(layout->requirements);
+
+   EROFS_SB(sb)->requirements = requirements;
+
+   /* check if current kernel meets all mandatory requirements */
+   if (requirements & (~EROFS_ALL_REQUIREMENTS)) {
+   errln("unidentified requirements %x, please upgrade kernel version",
+ requirements & ~EROFS_ALL_REQUIREMENTS);
+   return false;
+   }
+   return true;
+}
+
+static int superblock_read(struct super_block *sb)
+{
+   struct erofs_sb_info *sbi;
+   struct buffer_head *bh;
+   struct erofs_super_block *layout;
+   unsigned int blkszbits;
+   int ret;
+
+   bh = sb_bread(sb, 0);
+
+   if (!bh) {
+   errln("cannot read erofs superblock");
+   return -EIO;
+   }
+
+   sbi = EROFS_SB(sb);
+   layout = (struct erofs_super_block *)((u8 *)bh->b_data
+        + EROFS_SUPER_OFFSET);
+
+   ret = -EINVAL;
+   if (le32_to_cpu(layout->magic) != EROFS_SUPER_MAGIC_V1) {
+   errln("cannot find valid erofs superblock");
+   goto out;
+   }
+
+   blkszbits = layout->blkszbits;
+   /* 9(512 bytes) + LOG_SECTORS_PER_BLOCK == LOG_BLOCK_SIZE */
+   if (unlikely(blkszbits != LOG_BLOCK_SIZE)) {
+   errln("blksize %u isn't supported on this platform",
+ 1 << blkszbits);
+   goto out;
+   }
+
+   if (!check_layout_compatibility(sb, layout))
+   goto out;
+
+   sbi->blocks = le32_to_cpu(layout->blocks);
+   sbi->meta_blkaddr = le32_to_cpu(layout->meta_blkaddr);
+   sbi->islotbits = ffs(sizeof(struct erofs_inode_v1)) - 1;
+   sbi->root_nid = le16_to_cpu(layout->root_nid);
+   sbi->inos = le64_to_cpu(layout->inos);
+
+   sbi->build_time = le64_to_cpu(layout->build_time);
+   sbi->build_time_nsec = le32_to_cpu(layout->build_time_nsec);
+
+   memcpy(&sb->s_uuid, layout->uuid, sizeof(layout->uuid));
+   memcpy(sbi->volume_name, layout->volume_name,
+  sizeof(layout->volume_name));
+
+   ret = 0;
+out:
+   brelse(bh);
+   return ret;
+}
+
+#ifdef CONFIG_EROFS_FAULT_INJECTION
+const char *erofs_fault_name[FAULT_MAX] = {
+   [FAULT_KMALLOC] = "kmalloc",
+   [FAULT_READ_IO] = "read IO error",
+};
+
+static void __erofs_build_fault_attr(struct erofs_sb_info *sbi,
+unsigned int rate)
+{
+   struct erofs_fault_info *ffi = &sbi->fault_info;
+
+   if (rate) {
+   atomic_set(&ffi->inject_ops, 0);
+   ffi->inject_rate = rate;
+   ffi->inject_type = (1 << FAULT_MAX) - 1;
+   } else {
+   

[PATCH v8 15/24] erofs: introduce erofs shrinker

2019-08-14 Thread Gao Xiang
This patch adds a dedicated shrinker that frees
unneeded memory consumed by a number of erofs in-memory
data structures.

Like F2FS and UBIFS, it also adds:
  - sbi->umount_mutex to avoid races on shrinker and put_super;
  - sbi->shrinker_run_no to not revisit recently scanned objects.
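
For illustration only, the run-number idea can be sketched in plain
userspace C. The names struct instance, next_run_no() and shrink_scan()
are hypothetical, an array walked with wrap-around stands in for the
kernel list, and the locking (erofs_sb_list_lock, umount_mutex) is left
out entirely, so this is not the code in this patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct erofs_sb_info; only what the sketch needs. */
struct instance {
	unsigned int shrinker_run_no;	/* last pass that scanned this instance */
	unsigned long cached;		/* reclaimable objects currently held */
};

/* Global pass counter; 0 is reserved to mean "never scanned". */
static unsigned int shrinker_run_no;

/* Advance the pass counter, skipping 0 on wrap-around. */
static unsigned int next_run_no(void)
{
	unsigned int run_no;

	do {
		run_no = ++shrinker_run_no;
	} while (run_no == 0);

	assert(run_no != 0);
	return run_no;
}

/*
 * One shrink pass: walk the instances circularly and free up to nr objects.
 * Each visited instance is stamped with the pass number, so meeting an
 * instance that already carries the stamp means the walk has come full
 * circle and the pass must stop.
 */
static unsigned long shrink_scan(struct instance *v, size_t n, unsigned long nr)
{
	const unsigned int run_no = next_run_no();
	unsigned long freed = 0;
	size_t i = 0;

	if (n == 0)
		return 0;

	while (freed < nr) {
		struct instance *it = &v[i];
		unsigned long take;

		if (it->shrinker_run_no == run_no)
			break;		/* already scanned in this pass */
		it->shrinker_run_no = run_no;

		take = it->cached < nr - freed ? it->cached : nr - freed;
		it->cached -= take;
		freed += take;

		i = (i + 1) % n;
	}
	return freed;
}
```

Reserving 0 lets a freshly zeroed instance never compare equal to the
current pass number, which is presumably why the counter skips 0 on
overflow.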

Signed-off-by: Gao Xiang 
---
 fs/erofs/internal.h |  7 
 fs/erofs/super.c|  6 +++
 fs/erofs/utils.c| 93 -
 3 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 62f1e3ffe0a2..4bcdf32a45ad 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -63,6 +63,9 @@ struct erofs_sb_info {
 #ifdef CONFIG_EROFS_FS_ZIP
/* list for all registered superblocks, mainly for shrinker */
struct list_head list;
+   struct mutex umount_mutex;
+
+   unsigned int shrinker_run_no;
 #endif /* CONFIG_EROFS_FS_ZIP */
u32 blocks;
u32 meta_blkaddr;
@@ -408,9 +411,13 @@ extern const struct file_operations erofs_dir_fops;
 #ifdef CONFIG_EROFS_FS_ZIP
 void erofs_shrinker_register(struct super_block *sb);
 void erofs_shrinker_unregister(struct super_block *sb);
+int __init erofs_init_shrinker(void);
+void erofs_exit_shrinker(void);
 #else
 static inline void erofs_shrinker_register(struct super_block *sb) {}
 static inline void erofs_shrinker_unregister(struct super_block *sb) {}
+static inline int erofs_init_shrinker(void) { return 0; }
+static inline void erofs_exit_shrinker(void) {}
 #endif /* !CONFIG_EROFS_FS_ZIP */
 
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 2eca3b25db75..09992cc3b2fd 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -413,6 +413,9 @@ static int __init erofs_module_init(void)
if (err)
goto icache_err;
 
+   err = erofs_init_shrinker();
+   if (err)
+   goto shrinker_err;
err = register_filesystem(&erofs_fs_type);
if (err)
goto fs_err;
@@ -421,6 +424,8 @@ static int __init erofs_module_init(void)
return 0;
 
 fs_err:
+   erofs_exit_shrinker();
+shrinker_err:
erofs_exit_inode_cache();
 icache_err:
return err;
@@ -429,6 +434,7 @@ static int __init erofs_module_init(void)
 static void __exit erofs_module_exit(void)
 {
unregister_filesystem(&erofs_fs_type);
+   erofs_exit_shrinker();
erofs_exit_inode_cache();
infoln("successfully finalize erofs");
 }
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index 791b2df1f761..cab7d77c4e59 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -9,6 +9,12 @@
 #include "internal.h"
 
 #ifdef CONFIG_EROFS_FS_ZIP
+/* global shrink count (for all mounted EROFS instances) */
+static atomic_long_t erofs_global_shrink_cnt;
+
+/* protected by 'erofs_sb_list_lock' */
+static unsigned int shrinker_run_no;
+
 /* protects the mounted 'erofs_sb_list' */
 static DEFINE_SPINLOCK(erofs_sb_list_lock);
 static LIST_HEAD(erofs_sb_list);
@@ -17,6 +23,8 @@ void erofs_shrinker_register(struct super_block *sb)
 {
struct erofs_sb_info *sbi = EROFS_SB(sb);
 
+   mutex_init(&sbi->umount_mutex);
+
spin_lock(&erofs_sb_list_lock);
list_add(&sbi->list, &erofs_sb_list);
spin_unlock(&erofs_sb_list_lock);
@@ -24,9 +32,92 @@ void erofs_shrinker_register(struct super_block *sb)
 
 void erofs_shrinker_unregister(struct super_block *sb)
 {
+   struct erofs_sb_info *const sbi = EROFS_SB(sb);
+
+   mutex_lock(&sbi->umount_mutex);
+   /* will add shrink final handler here */
+
+   spin_lock(&erofs_sb_list_lock);
+   list_del(&sbi->list);
+   spin_unlock(&erofs_sb_list_lock);
+   mutex_unlock(&sbi->umount_mutex);
+}
+
+static unsigned long erofs_shrink_count(struct shrinker *shrink,
+   struct shrink_control *sc)
+{
+   return atomic_long_read(&erofs_global_shrink_cnt);
+}
+
+static unsigned long erofs_shrink_scan(struct shrinker *shrink,
+  struct shrink_control *sc)
+{
+   struct erofs_sb_info *sbi;
+   struct list_head *p;
+
+   unsigned long nr = sc->nr_to_scan;
+   unsigned int run_no;
+   unsigned long freed = 0;
+
spin_lock(&erofs_sb_list_lock);
-   list_del(&EROFS_SB(sb)->list);
+   do {
+   run_no = ++shrinker_run_no;
+   } while (run_no == 0);
+
+   /* Iterate over all mounted superblocks and try to shrink them */
+   p = erofs_sb_list.next;
+   while (p != &erofs_sb_list) {
+   sbi = list_entry(p, struct erofs_sb_info, list);
+
+   /*
+* We move the ones we do to the end of the list, so we stop
+* when we see one we have already done.
+*/
+   if (sbi->shrinker_run_no == run_no)
+   break;
+
+   if (!mutex_trylock(&sbi->umount_mutex)) {
+   p = p->next;
+   continue;
+   }
+

[PATCH v8 19/24] erofs: add erofs_allocpage()

2019-08-14 Thread Gao Xiang
This patch introduces a temporary _on-stack_ page
pool that reuses freed pages directly as much as
possible for better performance and releases all pages
at once. It also slightly reduces the likelihood of
memory allocation failure.
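
As a rough userspace sketch of the pattern (not the kernel code below):
a caller-owned pool is tried first, and the general allocator is only
used when the pool is empty. struct buf, struct pool, pool_alloc(),
pool_put() and pool_drain() are hypothetical names; the patch itself
only adds the "take from pool or allocate" half, and the put/drain
helpers here are an assumption about how a caller would use it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: a payload plus a pool link. */
struct buf {
	struct buf *next;
	unsigned char data[4096];
};

/* Caller-owned pool head; intended to live on the caller's stack. */
struct pool {
	struct buf *head;
};

/* Reuse a pooled buffer if there is one, otherwise allocate a new one. */
static struct buf *pool_alloc(struct pool *pool)
{
	struct buf *b = pool->head;

	if (b) {
		pool->head = b->next;
		b->next = NULL;
		return b;
	}
	return calloc(1, sizeof(*b));	/* may return NULL on failure */
}

/* Hand a buffer back for later reuse instead of freeing it right away. */
static void pool_put(struct pool *pool, struct buf *b)
{
	assert(b != NULL);
	b->next = pool->head;
	pool->head = b;
}

/* Release everything still pooled in one go; returns how many were freed. */
static unsigned int pool_drain(struct pool *pool)
{
	unsigned int n = 0;

	while (pool->head) {
		struct buf *b = pool->head;

		pool->head = b->next;
		free(b);
		n++;
	}
	return n;
}
```

A caller would declare a struct pool on its stack, route all temporary
buffer requests through pool_alloc() / pool_put(), and call pool_drain()
once before returning.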

Signed-off-by: Gao Xiang 
---
 fs/erofs/internal.h |  2 ++
 fs/erofs/utils.c| 14 ++
 2 files changed, 16 insertions(+)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3222947c9bab..9dc3d47347db 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -488,6 +488,8 @@ int erofs_namei(struct inode *dir, struct qstr *name,
 extern const struct file_operations erofs_dir_fops;
 
 /* utils.c */
+struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp, bool nofail);
+
 #if (EROFS_PCPUBUF_NR_PAGES > 0)
 void *erofs_get_pcpubuf(unsigned int pagenr);
 #define erofs_put_pcpubuf(buf) do { \
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index f3eed9af24d6..ae6362abed67 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -9,6 +9,20 @@
 #include "internal.h"
 #include 
 
+struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp, bool nofail)
+{
+   struct page *page;
+
+   if (!list_empty(pool)) {
+   page = lru_to_page(pool);
+   DBG_BUGON(page_ref_count(page) != 1);
+   list_del(&page->lru);
+   } else {
+   page = alloc_pages(gfp | (nofail ? __GFP_NOFAIL : 0), 0);
+   }
+   return page;
+}
+
 #if (EROFS_PCPUBUF_NR_PAGES > 0)
 static struct {
u8 data[PAGE_SIZE * EROFS_PCPUBUF_NR_PAGES];
-- 
2.17.1



[PATCH v8 14/24] erofs: introduce superblock registration

2019-08-14 Thread Gao Xiang
In order to introduce a shrinker solution for erofs,
let's manage all mounted erofs instances first.

Signed-off-by: Gao Xiang 
---
 fs/erofs/Makefile   |  2 +-
 fs/erofs/internal.h | 13 +
 fs/erofs/super.c|  9 +
 fs/erofs/utils.c| 32 
 4 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 fs/erofs/utils.c

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 481a966caf06..930770be124f 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -5,7 +5,7 @@ EROFS_VERSION = "1.0"
 ccflags-y += -DEROFS_VERSION=\"$(EROFS_VERSION)\"
 
 obj-$(CONFIG_EROFS_FS) += erofs.o
-erofs-objs := super.o inode.o data.o namei.o dir.o
+erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += zmap.o
 
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 8432f488409d..62f1e3ffe0a2 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -60,6 +60,10 @@ typedef u64 erofs_off_t;
 typedef u32 erofs_blk_t;
 
 struct erofs_sb_info {
+#ifdef CONFIG_EROFS_FS_ZIP
+   /* list for all registered superblocks, mainly for shrinker */
+   struct list_head list;
+#endif /* CONFIG_EROFS_FS_ZIP */
u32 blocks;
u32 meta_blkaddr;
 #ifdef CONFIG_EROFS_FS_XATTR
@@ -400,6 +404,15 @@ int erofs_namei(struct inode *dir, struct qstr *name,
 /* dir.c */
 extern const struct file_operations erofs_dir_fops;
 
+/* utils.c */
+#ifdef CONFIG_EROFS_FS_ZIP
+void erofs_shrinker_register(struct super_block *sb);
+void erofs_shrinker_unregister(struct super_block *sb);
+#else
+static inline void erofs_shrinker_register(struct super_block *sb) {}
+static inline void erofs_shrinker_unregister(struct super_block *sb) {}
+#endif /* !CONFIG_EROFS_FS_ZIP */
+
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
 
 #endif /* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 561ae6f7fe13..2eca3b25db75 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -354,6 +354,8 @@ static int erofs_fill_super(struct super_block *sb, void *data, int silent)
if (unlikely(!sb->s_root))
return -ENOMEM;
 
+   erofs_shrinker_register(sb);
+
if (!silent)
infoln("mounted on %s with opts: %s.", sb->s_id, (char *)data);
return 0;
@@ -385,6 +387,12 @@ static void erofs_kill_sb(struct super_block *sb)
sb->s_fs_info = NULL;
 }
 
+/* called when ->s_root is non-NULL */
+static void erofs_put_super(struct super_block *sb)
+{
+   erofs_shrinker_unregister(sb);
+}
+
 static struct file_system_type erofs_fs_type = {
.owner  = THIS_MODULE,
.name   = "erofs",
@@ -496,6 +504,7 @@ static int erofs_remount(struct super_block *sb, int *flags, char *data)
 }
 
 const struct super_operations erofs_sops = {
+   .put_super = erofs_put_super,
.alloc_inode = alloc_inode,
.free_inode = free_inode,
.statfs = erofs_statfs,
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
new file mode 100644
index ..791b2df1f761
--- /dev/null
+++ b/fs/erofs/utils.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/fs/erofs/utils.c
+ *
+ * Copyright (C) 2018 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang 
+ */
+#include "internal.h"
+
+#ifdef CONFIG_EROFS_FS_ZIP
+/* protects the mounted 'erofs_sb_list' */
+static DEFINE_SPINLOCK(erofs_sb_list_lock);
+static LIST_HEAD(erofs_sb_list);
+
+void erofs_shrinker_register(struct super_block *sb)
+{
+   struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+   spin_lock(&erofs_sb_list_lock);
+   list_add(&sbi->list, &erofs_sb_list);
+   spin_unlock(&erofs_sb_list_lock);
+}
+
+void erofs_shrinker_unregister(struct super_block *sb)
+{
+   spin_lock(&erofs_sb_list_lock);
+   list_del(&EROFS_SB(sb)->list);
+   spin_unlock(&erofs_sb_list_lock);
+}
+#endif /* !CONFIG_EROFS_FS_ZIP */
+
-- 
2.17.1



[PATCH v8 04/24] erofs: add raw address_space operations

2019-08-14 Thread Gao Xiang
This commit adds functions for meta and raw data, and also
provides address_space_operations for raw data access.

Signed-off-by: Gao Xiang 
---
 fs/erofs/data.c | 419 
 1 file changed, 419 insertions(+)
 create mode 100644 fs/erofs/data.c

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
new file mode 100644
index ..3d8f1511cacb
--- /dev/null
+++ b/fs/erofs/data.c
@@ -0,0 +1,419 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/fs/erofs/data.c
+ *
+ * Copyright (C) 2017-2018 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang 
+ */
+#include "internal.h"
+#include 
+
+#include 
+
+static inline void read_endio(struct bio *bio)
+{
+   struct super_block *const sb = bio->bi_private;
+   struct bio_vec *bvec;
+   blk_status_t err = bio->bi_status;
+   struct bvec_iter_all iter_all;
+
+   if (time_to_inject(EROFS_SB(sb), FAULT_READ_IO)) {
+   erofs_show_injection_info(FAULT_READ_IO);
+   err = BLK_STS_IOERR;
+   }
+
+   bio_for_each_segment_all(bvec, bio, iter_all) {
+   struct page *page = bvec->bv_page;
+
+   /* page is already locked */
+   DBG_BUGON(PageUptodate(page));
+
+   if (unlikely(err))
+   SetPageError(page);
+   else
+   SetPageUptodate(page);
+
+   unlock_page(page);
+   /* page could be reclaimed now */
+   }
+   bio_put(bio);
+}
+
+/* prio -- true is used for dir */
+struct page *__erofs_get_meta_page(struct super_block *sb,
+  erofs_blk_t blkaddr, bool prio, bool nofail)
+{
+   struct inode *const bd_inode = sb->s_bdev->bd_inode;
+   struct address_space *const mapping = bd_inode->i_mapping;
+   /* prefer retrying in the allocator to blindly looping below */
+   const gfp_t gfp = mapping_gfp_constraint(mapping, ~__GFP_FS) |
+   (nofail ? __GFP_NOFAIL : 0);
+   unsigned int io_retries = nofail ? EROFS_IO_MAX_RETRIES_NOFAIL : 0;
+   struct page *page;
+   int err;
+
+repeat:
+   page = find_or_create_page(mapping, blkaddr, gfp);
+   if (unlikely(!page)) {
+   DBG_BUGON(nofail);
+   return ERR_PTR(-ENOMEM);
+   }
+   DBG_BUGON(!PageLocked(page));
+
+   if (!PageUptodate(page)) {
+   struct bio *bio;
+
+   bio = erofs_grab_bio(sb, blkaddr, 1, sb, read_endio, nofail);
+   if (IS_ERR(bio)) {
+   DBG_BUGON(nofail);
+   err = PTR_ERR(bio);
+   goto err_out;
+   }
+
+   err = bio_add_page(bio, page, PAGE_SIZE, 0);
+   if (unlikely(err != PAGE_SIZE)) {
+   err = -EFAULT;
+   goto err_out;
+   }
+
+   __submit_bio(bio, REQ_OP_READ,
+REQ_META | (prio ? REQ_PRIO : 0));
+
+   lock_page(page);
+
+   /* this page has been truncated by others */
+   if (unlikely(page->mapping != mapping)) {
+unlock_repeat:
+   unlock_page(page);
+   put_page(page);
+   goto repeat;
+   }
+
+   /* more likely a read error */
+   if (unlikely(!PageUptodate(page))) {
+   if (io_retries) {
+   --io_retries;
+   goto unlock_repeat;
+   }
+   err = -EIO;
+   goto err_out;
+   }
+   }
+   return page;
+
+err_out:
+   unlock_page(page);
+   put_page(page);
+   return ERR_PTR(err);
+}
+
+static int erofs_map_blocks_flatmode(struct inode *inode,
+struct erofs_map_blocks *map,
+int flags)
+{
+   int err = 0;
+   erofs_blk_t nblocks, lastblk;
+   u64 offset = map->m_la;
+   struct erofs_vnode *vi = EROFS_V(inode);
+
+   trace_erofs_map_blocks_flatmode_enter(inode, map, flags);
+
+   nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
+   lastblk = nblocks - is_inode_flat_inline(inode);
+
+   if (unlikely(offset >= inode->i_size)) {
+   /* leave out-of-bound access unmapped */
+   map->m_flags = 0;
+   map->m_plen = 0;
+   goto out;
+   }
+
+   /* there is no hole in flatmode */
+   map->m_flags = EROFS_MAP_MAPPED;
+
+   if (offset < blknr_to_addr(lastblk)) {
+   map->m_pa = blknr_to_addr(vi->raw_blkaddr) + map->m_la;
+   map->m_plen = blknr_to_addr(lastblk) - offset;
+   } else if (is_inode_flat_inline(inode)) {
+   /* 2 - inode inline B: inode, [xattrs], inline last blk... */
+   struct erofs_sb_info *sbi = 

[PATCH v8 24/24] erofs: add document

2019-08-14 Thread Gao Xiang
This documents key features, usage, and
on-disk design of erofs.

Signed-off-by: Gao Xiang 
---
 Documentation/filesystems/erofs.txt | 225 
 1 file changed, 225 insertions(+)
 create mode 100644 Documentation/filesystems/erofs.txt

diff --git a/Documentation/filesystems/erofs.txt b/Documentation/filesystems/erofs.txt
new file mode 100644
index ..457e601e0467
--- /dev/null
+++ b/Documentation/filesystems/erofs.txt
@@ -0,0 +1,225 @@
+Overview
+========
+
+EROFS file-system stands for Enhanced Read-Only File System. Different
+from other read-only file systems, it is designed for flexibility and
+scalability while being kept simple and high-performance.
+
+It is designed as a better filesystem solution for the following scenarios:
+ - read-only storage media or
+
+ - part of a fully trusted read-only solution, which means it needs to be
+   immutable and bit-for-bit identical to the official golden image for
+   their releases due to security and other considerations and
+
+ - hope to save some extra storage space with guaranteed end-to-end performance
+   by using reduced metadata and transparent file compression, especially
+   for those embedded devices with limited memory (ex, smartphone);
+
+Here are the main features of EROFS:
+ - Little endian on-disk design;
+
+ - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
+
+ - Metadata & data could be mixed by design;
+
+ - 2 inode versions for different requirements:
+                          v1            v2
+   Inode metadata size:   32 bytes      64 bytes
+   Max file size:         4 GB          16 EB (also limited by max. vol size)
+   Max uids/gids:         65536         4294967296
+   File creation time:    no            yes (64 + 32-bit timestamp)
+   Max hardlinks:         65536         4294967296
+   Metadata reserved:     4 bytes       14 bytes
+
+ - Support extended attributes (xattrs) as an option;
+
+ - Support xattr inline and tail-end data inline for all files;
+
+ - Support POSIX.1e ACLs by using xattrs;
+
+ - Support statx();
+
+ - Support transparent file compression as an option:
+   LZ4 algorithm with 4 KB fixed-output compression for high performance;
+
+The following git tree provides the file system user-space tools under
+development (ex, formatting tool mkfs.erofs):
+>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
+
+Bug reports and patches are welcome; please kindly send them to the
+following linux-erofs mailing list:
+>> linux-erofs mailing list   
+
+Note that EROFS is still a work in progress as a Linux staging driver;
+Cc'ing the staging mailing list as well is highly recommended:
+>> Linux Driver Project Developer List 
+
+Mount options
+=============
+
+fault_injection=%d     Enable fault injection in all supported types with
+                       specified injection rate. Supported injection type:
+                       Type_Name                Type_Value
+                       FAULT_KMALLOC            0x1
+                       FAULT_READ_IO            0x2
+(no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
+                       by default if CONFIG_EROFS_FS_XATTR is selected.
+(no)acl                Setup POSIX Access Control List. Note: acl is enabled
+                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
+cache_strategy=%s      Select a strategy for cached decompression from now on:
+                         disabled: In-place I/O decompression only;
+                        readahead: Cache the last incomplete compressed physical
+                                   cluster for further reading. It still does
+                                   in-place I/O decompression for the rest
+                                   compressed physical clusters;
+                       readaround: Cache the both ends of incomplete compressed
+                                   physical clusters for further reading.
+                                   It still does in-place I/O decompression
+                                   for the rest compressed physical clusters.
+
+Module parameters
+=================
+use_vmap=[0|1]         Use vmap() instead of vm_map_ram() (default 0).
+
+On-disk details
+===============
+
+Summary
+-------
+Different from other read-only file systems, an EROFS volume is designed
+to be as simple as possible:
+
+                                |-> aligned with the block size
+   ____________________________________________________________
+  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
+  |_|__|_|_____|__________|_____|______|__________|_____|______|
+  0 +1K
+
+All data areas should be aligned with the block size, but metadata areas
+may not. All metadata can now be observed in two different spaces (views):
+ 1. Inode metadata space
+Each valid inode should be aligned with an inode slot, which is a fixed
+value (32 bytes) and 

[PATCH v8 00/24] erofs: promote erofs from staging v8

2019-08-14 Thread Gao Xiang
[I strip the previous cover letter, the old one can be found in v6:
 https://lore.kernel.org/r/20190802125347.166018-1-gaoxian...@huawei.com/]

We'd like to submit a formal moving patch against the staging tree
for 5.4. Before that, we'd like to hear any ACKs, suggestions, NAKs
or objections regarding EROFS, so that we can improve it in this
round or rethink the whole thing.

As mentioned in the related materials [1] [2], the goal of EROFS is to
save extra storage space with guaranteed end-to-end performance
for read-only files. It performs better than existing Linux
compression filesystems thanks to fixed-sized output compression
and in-place decompression. With proper CPU-storage combinations, it
even outperforms generic uncompressed filesystems over a large range
of compression ratios. We think this direction is correct: a dedicated
kernel team is continuously and actively working on improving it, and
there are enough testers and beta / end users using it.

EROFS has been applied to almost all in-service HUAWEI smartphones
(yes, the number is still increasing over time) and it seems like
a success. It can be used in wider scenarios. We think it's
useful for the Linux / Android OS community and that it's time to
move it out of staging.

To get started, the latest stable mkfs.erofs is available at

git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b dev

with README in the repository.

We are still tuning sequential read performance for ultra-fast
NVMe SSDs like the Samsung 970 PRO, but at least now you can
try it on your PC with some data with a proper compression ratio,
the latest Linux kernel, a USB stick for convenience and
a not-too-old CPU. There are also benchmarks available
in the materials mentioned above.

EROFS is a self-contained filesystem driver. Although there are
still some TODOs to make it more generic, we will actively keep on
developing / tuning EROFS along with the evolution of the Linux
kernel, as with the other in-kernel filesystems.

As I mentioned at LSF/MM 2019, in the future we'd like
to generalize the decompression engine into a library for other
filesystems to use, like fscrypt, once the whole system is mature.
However, such metadata should be designed separately for
each fs, and the synchronous metadata read cost will be larger
than in EROFS because of those on-disk limitations. Therefore EROFS
is still a better choice for read-only scenarios.

EROFS is now ready for review and for the move, and the code has
already been cleaned up as shiny floors... Please kindly take some
of your precious time to share your comments about EROFS and let us
know your opinion. It's really important for us since,
generally speaking, we prefer to use Linux _in-tree_ code rather
than poorly supported out-of-tree / orphaned code.

Thank you in advance,
Gao Xiang

[1] https://kccncosschn19eng.sched.com/event/Nru2/erofs-an-introduction-and-our-smartphone-practice-xiang-gao-huawei
[2] https://www.usenix.org/conference/atc19/presentation/gao

Changelog from v7:
 o keep up with the latest staging tree in addition to
   the latest staging patch:
   https://lore.kernel.org/r/20190814103705.60698-1-gaoxian...@huawei.com/
   - use EUCLEAN for fs corruption cases suggested by Pavel;
   - turn EIO into EOPNOTSUPP for unsupported on-disk format;
   - fix all misused ENOTSUPP into EOPNOTSUPP pointed out by Chao;
 o update cover letter

It can also be found in git at tag "erofs_2019-08-15" (will be shown later) at:
 https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git/

and the latest fs code is available at:
 https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git/tree/fs/erofs?h=erofs-outofstaging

Changelog from v6:
 o keep up with the latest staging patchset
   https://lore.kernel.org/linux-fsdevel/20190813023054.73126-1-gaoxian...@huawei.com/
   in order to fix the following cases:
   - inline erofs_inode_is_data_compressed() in erofs_fs.h;
   - remove incomplete cleancache;
   - remove all BUG_ON in EROFS.
 o Removing the file names from the comments at the top of the files
   suggested by Stephen will be applied to the real moving patch later.

Changelog from v5:
 o keep up with "[PATCH v2] staging: erofs: updates according to erofs-outofstaging v4"
   https://lore.kernel.org/lkml/20190731155752.210602-1-gaoxian...@huawei.com/
   which mainly addresses review comments from Chao:
  - keep the macro EROFS_IO_MAX_RETRIES_NOFAIL in internal.h;
  - kill a redundant NULL check in "__stagingpage_alloc";
  - add some descriptions in document about "use_vmap";
  - rearrange erofs_vmap of "staging: erofs: kill CONFIG_EROFS_FS_USE_VM_MAP_RAM";

 o all changes have been merged into staging tree, which are under staging-testing:
   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/log/?h=staging-testing

Changelog from v4:
 o rebase on Linux 5.3-rc1;

 o keep up with "staging: erofs: updates according to erofs-outofstaging v4"
   in order to get main 

[PATCH v8 06/24] erofs: support special inode

2019-08-14 Thread Gao Xiang
This patch adds support for special inodes, such as
block device, character device, socket and pipe inodes.

Signed-off-by: Gao Xiang 
---
 fs/erofs/inode.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 9960edaf6f7a..f55193856359 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -34,7 +34,16 @@ static int read_inode(struct inode *inode, void *data)
vi->xattr_isize = ondisk_xattr_ibody_size(v2->i_xattr_icount);
 
inode->i_mode = le16_to_cpu(v2->i_mode);
-   vi->raw_blkaddr = le32_to_cpu(v2->i_u.raw_blkaddr);
+   if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+   S_ISLNK(inode->i_mode))
+   vi->raw_blkaddr = le32_to_cpu(v2->i_u.raw_blkaddr);
+   else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode))
+   inode->i_rdev =
+   new_decode_dev(le32_to_cpu(v2->i_u.rdev));
+   else if (S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode))
+   inode->i_rdev = 0;
+   else
+   goto bogusimode;
 
i_uid_write(inode, le32_to_cpu(v2->i_uid));
i_gid_write(inode, le32_to_cpu(v2->i_gid));
@@ -58,7 +67,16 @@ static int read_inode(struct inode *inode, void *data)
vi->xattr_isize = ondisk_xattr_ibody_size(v1->i_xattr_icount);
 
inode->i_mode = le16_to_cpu(v1->i_mode);
-   vi->raw_blkaddr = le32_to_cpu(v1->i_u.raw_blkaddr);
+   if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+   S_ISLNK(inode->i_mode))
+   vi->raw_blkaddr = le32_to_cpu(v1->i_u.raw_blkaddr);
+   else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode))
+   inode->i_rdev =
+   new_decode_dev(le32_to_cpu(v1->i_u.rdev));
+   else if (S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode))
+   inode->i_rdev = 0;
+   else
+   goto bogusimode;
 
i_uid_write(inode, le16_to_cpu(v1->i_uid));
i_gid_write(inode, le16_to_cpu(v1->i_gid));
@@ -86,6 +104,11 @@ static int read_inode(struct inode *inode, void *data)
else
inode->i_blocks = nblks << LOG_SECTORS_PER_BLOCK;
return 0;
+
+bogusimode:
+   errln("bogus i_mode (%o) @ nid %llu", inode->i_mode, vi->nid);
+   DBG_BUGON(1);
+   return -EFSCORRUPTED;
 }
 
 /*
@@ -178,6 +201,11 @@ static int fill_inode(struct inode *inode, int isdir)
/* by default, page_get_link is used for symlink */
inode->i_op = &erofs_symlink_iops;
inode_nohighmem(inode);
+   } else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
+   S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
+   inode->i_op = &erofs_generic_iops;
+   init_special_inode(inode, inode->i_mode, inode->i_rdev);
+   goto out_unlock;
} else {
err = -EFSCORRUPTED;
goto out_unlock;
-- 
2.17.1



[PATCH v8 09/24] erofs: support tracepoint

2019-08-14 Thread Gao Xiang
Add basic tracepoints for ->readpage{,s}, ->lookup,
->destroy_inode, fill_inode and map_blocks.

Signed-off-by: Gao Xiang 
---
 include/trace/events/erofs.h | 241 +++
 1 file changed, 241 insertions(+)
 create mode 100644 include/trace/events/erofs.h

diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
new file mode 100644
index ..0c5847c54b60
--- /dev/null
+++ b/include/trace/events/erofs.h
@@ -0,0 +1,241 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM erofs
+
+#if !defined(_TRACE_EROFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_EROFS_H
+
+#include 
+
+#define show_dev(dev)  MAJOR(dev), MINOR(dev)
+#define show_dev_nid(entry)show_dev(entry->dev), entry->nid
+
+#define show_file_type(type)   \
+   __print_symbolic(type,  \
+   { 0,"FILE" },   \
+   { 1,"DIR" })
+
+#define show_map_flags(flags) __print_flags(flags, "|",\
+   { EROFS_GET_BLOCKS_RAW, "RAW" })
+
+#define show_mflags(flags) __print_flags(flags, "",\
+   { EROFS_MAP_MAPPED, "M" },  \
+   { EROFS_MAP_META,   "I" })
+
+TRACE_EVENT(erofs_lookup,
+
+   TP_PROTO(struct inode *dir, struct dentry *dentry, unsigned int flags),
+
+   TP_ARGS(dir, dentry, flags),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev )
+   __field(erofs_nid_t,nid )
+   __field(const char *,   name)
+   __field(unsigned int,   flags   )
+   ),
+
+   TP_fast_assign(
+   __entry->dev= dir->i_sb->s_dev;
+   __entry->nid= EROFS_V(dir)->nid;
+   __entry->name   = dentry->d_name.name;
+   __entry->flags  = flags;
+   ),
+
+   TP_printk("dev = (%d,%d), pnid = %llu, name:%s, flags:%x",
+   show_dev_nid(__entry),
+   __entry->name,
+   __entry->flags)
+);
+
+TRACE_EVENT(erofs_fill_inode,
+   TP_PROTO(struct inode *inode, int isdir),
+   TP_ARGS(inode, isdir),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev )
+   __field(erofs_nid_t,nid )
+   __field(erofs_blk_t,blkaddr )
+   __field(unsigned int,   ofs )
+   __field(int,isdir   )
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->nid= EROFS_V(inode)->nid;
+   __entry->blkaddr = erofs_blknr(iloc(EROFS_I_SB(inode), __entry->nid));
+   __entry->ofs = erofs_blkoff(iloc(EROFS_I_SB(inode), __entry->nid));
+   __entry->isdir  = isdir;
+   ),
+
+   TP_printk("dev = (%d,%d), nid = %llu, blkaddr %u ofs %u, isdir %d",
+ show_dev_nid(__entry),
+ __entry->blkaddr, __entry->ofs,
+ __entry->isdir)
+);
+
+TRACE_EVENT(erofs_readpage,
+
+   TP_PROTO(struct page *page, bool raw),
+
+   TP_ARGS(page, raw),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev )
+   __field(erofs_nid_t,nid )
+   __field(int,dir )
+   __field(pgoff_t,index   )
+   __field(int,uptodate)
+   __field(bool,   raw )
+   ),
+
+   TP_fast_assign(
+   __entry->dev= page->mapping->host->i_sb->s_dev;
+   __entry->nid= EROFS_V(page->mapping->host)->nid;
+   __entry->dir= S_ISDIR(page->mapping->host->i_mode);
+   __entry->index  = page->index;
+   __entry->uptodate = PageUptodate(page);
+   __entry->raw = raw;
+   ),
+
+   TP_printk("dev = (%d,%d), nid = %llu, %s, index = %lu, uptodate = %d "
+   "raw = %d",
+   show_dev_nid(__entry),
+   show_file_type(__entry->dir),
+   (unsigned long)__entry->index,
+   __entry->uptodate,
+   __entry->raw)
+);
+
+TRACE_EVENT(erofs_readpages,
+
+   TP_PROTO(struct inode *inode, struct page *page, unsigned int nrpage,
+   bool raw),
+
+   TP_ARGS(inode, page, nrpage, raw),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev )
+   __field(erofs_nid_t,nid )
+   __field(pgoff_t,start   )
+   __field(unsigned int,   nrpage  )
+   __field(bool,   raw )
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->nid= EROFS_V(inode)->nid;
+   __entry->start  = page->index;
+   __entry->nrpage = nrpage;
+ 

[PATCH v8 07/24] erofs: add directory operations

2019-08-14 Thread Gao Xiang
This adds functions for directories, mainly readdir.
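
Names are not NUL-terminated on disk, so the length of each name is
derived from the name offsets, as erofs_fill_dentries() below does. A
simplified userspace sketch of that derivation follows; dirent_namelen()
and NAME_LEN_MAX are hypothetical, the offsets are passed as a plain
host-endian array instead of being read from struct erofs_dirent, and
the early "off > maxsize" guard is my addition:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN_MAX 255	/* assumed cap, standing in for EROFS_NAME_LEN */

/*
 * Return the name length of entry idx in a directory block, or -1 if the
 * entry looks corrupted.
 *
 * blk:     start of the directory block
 * nameoff: name offsets of the n entries, in on-disk order
 * maxsize: number of valid bytes in the block
 */
static int dirent_namelen(const char *blk, const uint16_t *nameoff,
			  unsigned int n, unsigned int idx,
			  unsigned int maxsize)
{
	unsigned int off, len;

	assert(idx < n);
	off = nameoff[idx];
	if (off > maxsize)
		return -1;

	if (idx + 1 < n) {
		/* not the last entry: the name ends where the next one starts */
		if (nameoff[idx + 1] < off)
			return -1;
		len = nameoff[idx + 1] - off;
	} else {
		/* last entry: the name runs to the first NUL or the block end */
		const char *nul = memchr(blk + off, '\0', maxsize - off);

		len = nul ? (unsigned int)(nul - (blk + off)) : maxsize - off;
	}

	if (off + len > maxsize || len > NAME_LEN_MAX)
		return -1;
	return (int)len;
}
```

Only the last entry needs a scan for a terminator; every other length is
a subtraction, so a corrupted offset shows up as a length that fails the
final bounds check.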

Signed-off-by: Gao Xiang 
---
 fs/erofs/dir.c | 148 +
 1 file changed, 148 insertions(+)
 create mode 100644 fs/erofs/dir.c

diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
new file mode 100644
index ..c52d27bedff4
--- /dev/null
+++ b/fs/erofs/dir.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/fs/erofs/dir.c
+ *
+ * Copyright (C) 2017-2018 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang 
+ */
+#include "internal.h"
+
+static const unsigned char erofs_filetype_table[EROFS_FT_MAX] = {
+   [EROFS_FT_UNKNOWN]  = DT_UNKNOWN,
+   [EROFS_FT_REG_FILE] = DT_REG,
+   [EROFS_FT_DIR]  = DT_DIR,
+   [EROFS_FT_CHRDEV]   = DT_CHR,
+   [EROFS_FT_BLKDEV]   = DT_BLK,
+   [EROFS_FT_FIFO] = DT_FIFO,
+   [EROFS_FT_SOCK] = DT_SOCK,
+   [EROFS_FT_SYMLINK]  = DT_LNK,
+};
+
+static void debug_one_dentry(unsigned char d_type, const char *de_name,
+unsigned int de_namelen)
+{
+#ifdef CONFIG_EROFS_FS_DEBUG
+   /* the on-disk name may lack the trailing '\0', so copy and terminate it */
+   unsigned char dbg_namebuf[EROFS_NAME_LEN + 1];
+
+   memcpy(dbg_namebuf, de_name, de_namelen);
+   dbg_namebuf[de_namelen] = '\0';
+
+   debugln("found dirent %s de_len %u d_type %d", dbg_namebuf,
+   de_namelen, d_type);
+#endif
+}
+
+static int erofs_fill_dentries(struct inode *dir, struct dir_context *ctx,
+  void *dentry_blk, unsigned int *ofs,
+  unsigned int nameoff, unsigned int maxsize)
+{
+   struct erofs_dirent *de = dentry_blk + *ofs;
+   const struct erofs_dirent *end = dentry_blk + nameoff;
+
+   while (de < end) {
+   const char *de_name;
+   unsigned int de_namelen;
+   unsigned char d_type;
+
+   if (de->file_type < EROFS_FT_MAX)
+   d_type = erofs_filetype_table[de->file_type];
+   else
+   d_type = DT_UNKNOWN;
+
+   nameoff = le16_to_cpu(de->nameoff);
+   de_name = (char *)dentry_blk + nameoff;
+
+   /* the last dirent in the block? */
+   if (de + 1 >= end)
+   de_namelen = strnlen(de_name, maxsize - nameoff);
+   else
+   de_namelen = le16_to_cpu(de[1].nameoff) - nameoff;
+
+   /* a corrupted entry is found */
+   if (unlikely(nameoff + de_namelen > maxsize ||
+de_namelen > EROFS_NAME_LEN)) {
+   errln("bogus dirent @ nid %llu", EROFS_V(dir)->nid);
+   DBG_BUGON(1);
+   return -EFSCORRUPTED;
+   }
+
+   debug_one_dentry(d_type, de_name, de_namelen);
+   if (!dir_emit(ctx, de_name, de_namelen,
+ le64_to_cpu(de->nid), d_type))
+   /* stopped by some reason */
+   return 1;
+   ++de;
+   *ofs += sizeof(struct erofs_dirent);
+   }
+   *ofs = maxsize;
+   return 0;
+}
+
+static int erofs_readdir(struct file *f, struct dir_context *ctx)
+{
+   struct inode *dir = file_inode(f);
+   struct address_space *mapping = dir->i_mapping;
+   const size_t dirsize = i_size_read(dir);
+   unsigned int i = ctx->pos / EROFS_BLKSIZ;
+   unsigned int ofs = ctx->pos % EROFS_BLKSIZ;
+   int err = 0;
+   bool initial = true;
+
+   while (ctx->pos < dirsize) {
+   struct page *dentry_page;
+   struct erofs_dirent *de;
+   unsigned int nameoff, maxsize;
+
+   dentry_page = read_mapping_page(mapping, i, NULL);
+   if (IS_ERR(dentry_page))
+   continue;
+
+   de = (struct erofs_dirent *)kmap(dentry_page);
+
+   nameoff = le16_to_cpu(de->nameoff);
+
+   if (unlikely(nameoff < sizeof(struct erofs_dirent) ||
+nameoff >= PAGE_SIZE)) {
+   errln("%s, invalid de[0].nameoff %u @ nid %llu",
+ __func__, nameoff, EROFS_V(dir)->nid);
+   err = -EFSCORRUPTED;
+   goto skip_this;
+   }
+
+   maxsize = min_t(unsigned int,
+   dirsize - ctx->pos + ofs, PAGE_SIZE);
+
+   /* search dirents at the arbitrary position */
+   if (unlikely(initial)) {
+   initial = false;
+
+   ofs = roundup(ofs, sizeof(struct erofs_dirent));
+   if (unlikely(ofs >= nameoff))
+   goto skip_this;
+   }
+
+   err = erofs_fill_dentries(dir, ctx, de, &ofs,
+  

[PATCH v8 21/24] erofs: introduce LZ4 decompression inplace

2019-08-14 Thread Gao Xiang
Compressed data is usually loaded into the last pages of
the extent (the last page for 4k) for in-place decompression
(more specifically, in-place IO), as illustrated below:

         start of compressed logical extent
           |                          end of this logical extent
           |                           |
     ______v___________________________v________
... |  page 6  |  page 7  |  page 8  |  page 9  | ...
    |__________|__________|__________|__________|
           .                         ^ .        ^
           .                         |compressed|
           .                         |   data   |
           .                           .        .
           |<          dstsize        >|<margin>|
                                       oend     iend
           op                        ip

Therefore, it's possible to decompress in place (thus with no
memcpy at all) if the margin is sufficient and safe enough [1].
This can be implemented only for fixed-size output compression,
as opposed to fixed-size input compression.

With in-place decompression implemented, most in-place IO (about
99% for enwik9) needs no memcpy, and sequential read improves
accordingly (see the following patches for test results).
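
The margin check described above can be sketched in plain C as follows.
This is only an illustration: the macro is the one added by this patch,
while can_decompress_inplace() and its parameter names are hypothetical
and do not exist in the driver. The extra oend > inputsize guard is an
assumption added here to keep the unsigned subtraction from wrapping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same formula as in the patch: safety margin for in-place LZ4 decoding. */
#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize)  (((srcsize) >> 8) + 32)

/*
 * Hypothetical helper (not part of the patch).
 *   inputsize: size of the compressed input, including any zero padding
 *   oend:      offset in the last page where the decompressed output ends
 *   inlen:     compressed length left after skipping the zero padding
 * Returns true when the compressed data ends far enough past the output.
 */
static bool can_decompress_inplace(size_t inputsize, size_t oend, size_t inlen)
{
	if (oend > inputsize)	/* assumption: avoid unsigned wrap-around */
		return false;
	return inputsize - oend >= LZ4_DECOMPRESS_INPLACE_MARGIN(inlen);
}
```

For a 4096-byte input the required margin is (4096 >> 8) + 32 = 48
bytes, so output ending at offset 4000 qualifies while output ending
at offset 4060 falls back to the copy path.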

[1] https://github.com/lz4/lz4/commit/b17f578a919b7e6b078cede2d52be29dd48c8e8c
https://github.com/lz4/lz4/commit/5997e139f53169fa3a1c1b4418d2452a90b01602

Signed-off-by: Gao Xiang 
---
 fs/erofs/decompressor.c | 36 
 fs/erofs/erofs_fs.h |  2 +-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 2374dd3c967c..9a750bf662a5 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -15,6 +15,9 @@
 #endif
 
 #define LZ4_MAX_DISTANCE_PAGES (DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE) + 1)
+#ifndef LZ4_DECOMPRESS_INPLACE_MARGIN
+#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize)  (((srcsize) >> 8) + 32)
+#endif
 
 struct z_erofs_decompressor {
/*
@@ -117,7 +120,7 @@ static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
 {
unsigned int inputmargin, inlen;
u8 *src;
-   bool copied;
+   bool copied, support_0padding;
int ret;
 
if (rq->inputsize > PAGE_SIZE)
@@ -125,13 +128,38 @@ static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
 
src = kmap_atomic(*rq->in);
inputmargin = 0;
+   support_0padding = false;
+
+   /* decompression inplace is only safe when 0padding is enabled */
+   if (EROFS_SB(rq->sb)->requirements & EROFS_REQUIREMENT_LZ4_0PADDING) {
+   support_0padding = true;
+
+   while (!src[inputmargin & ~PAGE_MASK])
+   if (!(++inputmargin & ~PAGE_MASK))
+   break;
+
+   if (inputmargin >= rq->inputsize) {
+   kunmap_atomic(src);
+   return -EIO;
+   }
+   }
 
copied = false;
inlen = rq->inputsize - inputmargin;
if (rq->inplace_io) {
-   src = generic_copy_inplace_data(rq, src, inputmargin);
-   inputmargin = 0;
-   copied = true;
+   const uint oend = (rq->pageofs_out +
+  rq->outputsize) & ~PAGE_MASK;
+   const uint nr = PAGE_ALIGN(rq->pageofs_out +
+  rq->outputsize) >> PAGE_SHIFT;
+
+   if (rq->partial_decoding || !support_0padding ||
+   rq->out[nr - 1] != rq->in[0] ||
+   rq->inputsize - oend <
+ LZ4_DECOMPRESS_INPLACE_MARGIN(inlen)) {
+   src = generic_copy_inplace_data(rq, src, inputmargin);
+   inputmargin = 0;
+   copied = true;
+   }
}
 
ret = LZ4_decompress_safe_partial(src + inputmargin, out,
diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index 230fcba1099d..c0fb7d6ebfcb 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -17,7 +17,7 @@
  * incompatible with this kernel version.
  */
 #define EROFS_REQUIREMENT_LZ4_0PADDING 0x0001
-#define EROFS_ALL_REQUIREMENTS 0
+#define EROFS_ALL_REQUIREMENTS EROFS_REQUIREMENT_LZ4_0PADDING
 
 struct erofs_super_block {
 /*  0 */__le32 magic;   /* in the little endian */
-- 
2.17.1



[PATCH v2 2/3] staging: erofs: differentiate unsupported on-disk format

2019-08-14 Thread Gao Xiang
For some specific fields, use EOPNOTSUPP instead of EIO
for values which look sane but aren't supported right now.
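
The distinction can be sketched in userspace C as below. This is a
minimal illustration, assuming Linux (where <errno.h> provides EUCLEAN);
the enum values and demo_check_layout() are hypothetical and are not
taken from the driver. EFSCORRUPTED is mirrored here the way the kernel
defines it, as an alias of EUCLEAN.

```c
#include <errno.h>

/* The kernel defines EFSCORRUPTED as EUCLEAN; mirrored here for userspace. */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN
#endif

/* Hypothetical on-disk layout values, for illustration only. */
enum demo_layout {
	DEMO_LAYOUT_V1  = 0,
	DEMO_LAYOUT_V2  = 1,
	DEMO_LAYOUT_MAX = 4,	/* values >= this cannot appear in a valid image */
};

static int demo_check_layout(unsigned int layout)
{
	if (layout >= DEMO_LAYOUT_MAX)
		return -EFSCORRUPTED;	/* outside the format: image is corrupted */
	if (layout > DEMO_LAYOUT_V2)
		return -EOPNOTSUPP;	/* sane value, but not implemented here */
	return 0;
}
```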

Reviewed-by: Chao Yu 
Signed-off-by: Gao Xiang 
---
change log from v1:
 - use EOPNOTSUPP rather than ENOTSUPP, as pointed out by Chao;

 drivers/staging/erofs/inode.c | 4 ++--
 drivers/staging/erofs/zmap.c  | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/erofs/inode.c b/drivers/staging/erofs/inode.c
index 461fd4213ce7..c8f3ded17583 100644
--- a/drivers/staging/erofs/inode.c
+++ b/drivers/staging/erofs/inode.c
@@ -24,7 +24,7 @@ static int read_inode(struct inode *inode, void *data)
errln("unsupported data mapping %u of nid %llu",
  vi->datamode, vi->nid);
DBG_BUGON(1);
-   return -EIO;
+   return -EOPNOTSUPP;
}
 
if (__inode_version(advise) == EROFS_INODE_LAYOUT_V2) {
@@ -95,7 +95,7 @@ static int read_inode(struct inode *inode, void *data)
errln("unsupported on-disk inode version %u of nid %llu",
  __inode_version(advise), vi->nid);
DBG_BUGON(1);
-   return -EIO;
+   return -EOPNOTSUPP;
}
 
if (!nblks)
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
index 16b3625604f4..5551e615e8ea 100644
--- a/drivers/staging/erofs/zmap.c
+++ b/drivers/staging/erofs/zmap.c
@@ -178,7 +178,7 @@ static int vle_legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
break;
default:
DBG_BUGON(1);
-   return -EIO;
+   return -EOPNOTSUPP;
}
m->type = type;
return 0;
@@ -362,7 +362,7 @@ static int vle_extent_lookback(struct z_erofs_maprecorder *m,
errln("unknown type %u at lcn %lu of nid %llu",
  m->type, lcn, vi->nid);
DBG_BUGON(1);
-   return -EIO;
+   return -EOPNOTSUPP;
}
return 0;
 }
@@ -436,7 +436,7 @@ int z_erofs_map_blocks_iter(struct inode *inode,
default:
errln("unknown type %u at offset %llu of nid %llu",
  m.type, ofs, vi->nid);
-   err = -EIO;
+   err = -EOPNOTSUPP;
goto unmap_out;
}
 
-- 
2.17.1



Re: [PATCH v2] Add flags option to get xattr method paired to __vfs_getxattr

2019-08-14 Thread Jan Kara
On Tue 13-08-19 07:55:06, Mark Salyzyn wrote:
...
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 90dd78f0eb27..71f887518d6f 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
...
>  ssize_t
>  __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
> -void *value, size_t size)
> +void *value, size_t size, int flags)
>  {
>   const struct xattr_handler *handler;
> -
> - handler = xattr_resolve_name(inode, &name);
> - if (IS_ERR(handler))
> - return PTR_ERR(handler);
> - if (!handler->get)
> - return -EOPNOTSUPP;
> - return handler->get(handler, dentry, inode, name, value, size);
> -}
> -EXPORT_SYMBOL(__vfs_getxattr);
> -
> -ssize_t
> -vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
> -{
> - struct inode *inode = dentry->d_inode;
>   int error;
>  
> + if (flags & XATTR_NOSECURITY)
> + goto nolsm;

Hum, is it OK for XATTR_NOSECURITY to skip even the xattr_permission()
check? I understand that for reads of security xattrs it actually does not
matter in practice but conceptually that seems wrong to me as
XATTR_NOSECURITY is supposed to skip just security-module checks to avoid
recursion AFAIU.

> diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
> index c1395b5bd432..1216d777d210 100644
> --- a/include/uapi/linux/xattr.h
> +++ b/include/uapi/linux/xattr.h
> @@ -17,8 +17,9 @@
>  #if __UAPI_DEF_XATTR
>  #define __USE_KERNEL_XATTR_DEFS
>  
> -#define XATTR_CREATE 0x1 /* set value, fail if attr already exists */
> -#define XATTR_REPLACE 0x2 /* set value, fail if attr does not exist */
> +#define XATTR_CREATE  0x1 /* set value, fail if attr already exists */
> +#define XATTR_REPLACE 0x2 /* set value, fail if attr does not exist */
> +#define XATTR_NOSECURITY 0x4 /* get value, do not involve security check */
>  #endif

It seems confusing to export the XATTR_NOSECURITY definition to userspace
when it is a kernel-internal flag. I'd just define it in include/linux/xattr.h
somewhere near the top of the flags space (like 0x4000).
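
Roughly what I have in mind, as an untested sketch. The 0x4000 value is
just the example from above, and demo_user_flags_valid() is a made-up
helper that mimics how the set-xattr syscall path rejects any flag other
than XATTR_CREATE and XATTR_REPLACE:

```c
#include <stdbool.h>

/* UAPI flags, as in include/uapi/linux/xattr.h. */
#define XATTR_CREATE	0x1	/* set value, fail if attr already exists */
#define XATTR_REPLACE	0x2	/* set value, fail if attr does not exist */

/* Hypothetical kernel-internal flag, e.g. in include/linux/xattr.h. */
#define XATTR_NOSECURITY	0x4000	/* get value, skip security-module checks */

/* The internal flag must never alias a flag userspace may pass. */
_Static_assert((XATTR_NOSECURITY & (XATTR_CREATE | XATTR_REPLACE)) == 0,
	       "internal xattr flag overlaps the UAPI flags");

/* Made-up helper: only UAPI flags are accepted from userspace. */
static bool demo_user_flags_valid(int flags)
{
	return (flags & ~(XATTR_CREATE | XATTR_REPLACE)) == 0;
}
```

That way the flag never becomes part of the userspace ABI and can be
renumbered later if needed.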

Otherwise the patch looks OK to me (cannot really comment on the security
module aspect of this whole thing though).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH RESEND 2/2] staging: erofs: differentiate unsupported on-disk format

2019-08-14 Thread Chao Yu
On 2019/8/14 12:32, Gao Xiang wrote:
> For some specific fields, use ENOTSUPP instead of EIO
> for values which look sane but aren't supported right now.
> 
> Signed-off-by: Gao Xiang 

Reviewed-by: Chao Yu 

> + return -ENOTSUPP;

I'm a little confused about when we should use ENOTSUPP versus EOPNOTSUPP.
I checked several syscall manual pages, and it looks like EOPNOTSUPP is widely used.
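
For reference, a small sketch of the difference. ENOTSUPP (524) lives
only in the kernel's include/linux/errno.h and is not in the userspace
<errno.h>, so a program that sees it cannot name it, whereas EOPNOTSUPP
is a regular errno value. DEMO_KERNEL_ENOTSUPP and demo_to_user_errno()
are made-up names for illustration:

```c
#include <errno.h>

/* Kernel-internal ENOTSUPP value; userspace <errno.h> has no such name. */
#define DEMO_KERNEL_ENOTSUPP	524

/*
 * Made-up helper: translate the kernel-internal code into the one
 * userspace can interpret; every other error passes through unchanged.
 */
static int demo_to_user_errno(int err)
{
	return err == -DEMO_KERNEL_ENOTSUPP ? -EOPNOTSUPP : err;
}
```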

Thanks,


Re: [PATCH RESEND 1/2] staging: erofs: introduce EFSCORRUPTED and more logs

2019-08-14 Thread Chao Yu
On 2019/8/14 12:32, Gao Xiang wrote:
> Previously, EROFS used EIO to indicate that the filesystem is
> corrupted as well, but other filesystems tend to use
> EUCLEAN instead, so let's follow what others do right now.
> 
> Also, add some more prints to the syslog.
> 
> Suggested-by: Pavel Machek 
> Signed-off-by: Gao Xiang 

Reviewed-by: Chao Yu 

Thanks,