Re: [PATCH 00/19] pramfs

2013-09-09 Thread Marco Stornelli

On 09/09/2013 01:40, Dave Chinner wrote:

On Sat, Sep 07, 2013 at 10:14:04AM +0200, Marco Stornelli wrote:

Hi all,

this is an attempt to include pramfs in mainline. At the moment pramfs
has been included in LTSI kernel. Since last review the code is more
or less the same but, with a really big thanks to Vladimir Davydov and
Parallels, the development of fsck has been started and we have now
the possibility to correct fs errors due to corruption. It's a "young"
tool but we are working on it. You can clone the code from our repos:

git clone git://git.code.sf.net/p/pramfs/code pramfs-code
git clone git://git.code.sf.net/p/pramfs/Tools pramfs-Tools


The 1980s are calling, and they want their filesystem back. :)

So, Devil's Advocate time. Convince me as to why pramfs should be
merged.



I have never used it in an embedded project with more than a few MB, so
maybe you are asking about cases it does not target. Pramfs is in LTSI
not because I asked Greg to include it, but because several companies
asked me to. The message wasn't "hey, pramfs is in LTSI, so we must
include it in mainline"; it was only an observation, an indication that
there is a need for something like this out there. Sure, we can discuss
whether it is the best option or not, but this is the real world. In
addition, this kind of comment is exactly why the kernel community is
not really friendly to time-to-market, and we have to say it. A 10-year
review is not possible. It's a never-ending story. Maybe when there is a
big company behind a patch the behavior is different, and I could give
several examples, but I don't want to say more; life is hard, I know,
even in the kernel community. I think I can close the review here. My
disappointment is not for myself, I can live well either way, but for
the companies and all the people who believe in the project.


Thanks to all.

Marco

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/19] pramfs

2013-09-08 Thread Marco Stornelli

On 08/09/2013 11:05, Vladimir Davydov wrote:

On 09/07/2013 08:22 PM, Marco Stornelli wrote:

On 07/09/2013 16:58, richard -rw- weinberger wrote:

On Sat, Sep 7, 2013 at 10:14 AM, Marco Stornelli wrote:

Hi all,

this is an attempt to include pramfs in mainline. At the moment pramfs
has been included in LTSI kernel. Since last review the code is more
or less the same but, with a really big thanks to Vladimir Davydov and
Parallels, the development of fsck has been started and we have now
the possibility to correct fs errors due to corruption. It's a "young"
tool but we are working on it. You can clone the code from our repos:

git clone git://git.code.sf.net/p/pramfs/code pramfs-code
git clone git://git.code.sf.net/p/pramfs/Tools pramfs-Tools


I'm a bit confused, what kind of non-volatile RAM is your fs targeting?
Wouldn't it make sense to use pstore like
arch/powerpc/platforms/pseries/nvram.c does?



Usually battery-backed SRAM, but it can actually be used on any directly
accessible region of RAM, and it provides a normal, complete fs
interface. I usually test the fs by remapping my system RAM. You can
find documentation here:

http://pramfs.sourceforge.net


I'd like to add that in contrast to pstore, pramfs allows storing any
files in it, not only system logs. This can be of value even on machines
w/o special devices like sram/nvram: one can store data that should be
quickly restored after reboot in conventional ram and use kexec to boot
to a new kernel. One of the use cases of this could be checkpointing
time-critical services to ram (using criu.org) to be quickly restored
after a kernel update providing almost zero-downtime.



Yep. I'd add that if you use your system RAM, your bootloader must be
aware of it, because it mustn't clear that memory on reboot. Indeed, you
can find a reference to pramfs in the U-Boot documentation:


http://www.denx.de/wiki/view/DULG/PersistentRAMFileSystem

Marco


Re: [PATCH 00/19] pramfs

2013-09-08 Thread Marco Stornelli

On 07/09/2013 10:14, Marco Stornelli wrote:

Hi all,

this is an attempt to include pramfs in mainline. At the moment pramfs
has been included in LTSI kernel. Since last review the code is more
or less the same but, with a really big thanks to Vladimir Davydov and
Parallels, the development of fsck has been started and we have now
the possibility to correct fs errors due to corruption. It's a "young"
tool but we are working on it. You can clone the code from our repos:

git clone git://git.code.sf.net/p/pramfs/code pramfs-code
git clone git://git.code.sf.net/p/pramfs/Tools pramfs-Tools



Other comments before v2? Please let me know.

Marco



Re: [PATCH 00/19] pramfs

2013-09-07 Thread Marco Stornelli

On 07/09/2013 16:58, richard -rw- weinberger wrote:

On Sat, Sep 7, 2013 at 10:14 AM, Marco Stornelli wrote:

Hi all,

this is an attempt to include pramfs in mainline. At the moment pramfs
has been included in LTSI kernel. Since last review the code is more
or less the same but, with a really big thanks to Vladimir Davydov and
Parallels, the development of fsck has been started and we have now
the possibility to correct fs errors due to corruption. It's a "young"
tool but we are working on it. You can clone the code from our repos:

git clone git://git.code.sf.net/p/pramfs/code pramfs-code
git clone git://git.code.sf.net/p/pramfs/Tools pramfs-Tools


I'm a bit confused, what kind of non-volatile RAM is your fs targeting?
Wouldn't it make sense to use pstore like
arch/powerpc/platforms/pseries/nvram.c does?



Usually battery-backed SRAM, but it can actually be used on any directly
accessible region of RAM, and it provides a normal, complete fs
interface. I usually test the fs by remapping my system RAM. You can
find documentation here:


http://pramfs.sourceforge.net

Marco


Re: [PATCH 09/19] pramfs: inode operations for dirs

2013-09-07 Thread Marco Stornelli

On 07/09/2013 17:08, Al Viro wrote:

On Sat, Sep 07, 2013 at 10:23:42AM +0200, Marco Stornelli wrote:


+static int pram_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+	struct pram_inode *pi;
+	int err = -ENOTEMPTY;
+
+	if (!inode)
+		return -ENOENT;
+
+	pi = pram_get_inode(dir->i_sb, inode->i_ino);
+
+	/* directory to delete is empty? */
+	if (pi->i_type.dir.tail == 0) {
+		inode->i_ctime = dir->i_ctime;
+		inode->i_size = 0;
+		clear_nlink(inode);
+		pram_write_inode(inode, NULL);
+		pram_dec_count(dir);
+		err = 0;
+	} else {
+		pram_dbg("dir not empty\n");
+	}
+
+	return err;
+}


... and here you are paying for delayed removal of entries:
mkdir foo
touch foo/bar
rm -rf foo 

Yep. Same problem as before. I think I can move the link removal into
pram_dec_count and modify the evict path accordingly; it should be easy
to manage.
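
A minimal userspace sketch of the behaviour Al describes, assuming the
directory keeps its entries on a list and rmdir only checks whether that
list is empty. All names here (model_inode, model_dir,
model_unlink_deferred, model_unlink_eager, model_rmdir) are hypothetical
and are not pramfs functions. With deferred removal the unlinked entry
is still listed, so rmdir reports -ENOTEMPTY; with eager removal it
succeeds:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from pramfs. */
struct model_inode {
	int nlink;                 /* 0 once the entry has been unlinked */
	struct model_inode *next;  /* next sibling on the parent's entry list */
};

struct model_dir {
	struct model_inode *head;  /* first entry, NULL when the list is empty */
};

/* Drop one entry from the parent's list, if it is on the list. */
void model_remove_link(struct model_dir *dir, struct model_inode *victim)
{
	struct model_inode **pp = &dir->head;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = victim->next;
		victim->next = NULL;
	}
}

/* Deferred variant: the entry stays listed until a later eviction. */
void model_unlink_deferred(struct model_dir *dir, struct model_inode *victim)
{
	(void)dir;
	victim->nlink = 0;
}

/* Eager variant: the entry leaves the list as soon as it is unlinked. */
void model_unlink_eager(struct model_dir *dir, struct model_inode *victim)
{
	victim->nlink = 0;
	model_remove_link(dir, victim);
}

/* Emptiness test in the spirit of the quoted pram_rmdir(): list only. */
int model_rmdir(const struct model_dir *dir)
{
	return dir->head ? -ENOTEMPTY : 0;
}
```

In this model the eager variant also needs no per-list lock to guard
against half-removed entries, which is the extra cost Al points at; how
that maps onto the real eviction path is not something the sketch tries
to answer.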


Thanks for your comments Al

Marco


Re: [PATCH 08/19] pramfs: file operations for dirs

2013-09-07 Thread Marco Stornelli

On 07/09/2013 17:01, Al Viro wrote:

On Sat, Sep 07, 2013 at 10:22:36AM +0200, Marco Stornelli wrote:

+int pram_add_link(struct dentry *dentry, struct inode *inode)
+{
+	struct inode *dir = dentry->d_parent->d_inode;
+	struct pram_inode *pidir, *pi, *pitail = NULL;
+	u64 tail_ino, prev_ino;
+
+	const char *name = dentry->d_name.name;
+
+	int namelen = min_t(unsigned int, dentry->d_name.len, PRAM_NAME_LEN);


Whatever the hell for?  Your ->lookup() rejects dentries with names longer
than PRAM_NAME_LEN with an error, so they won't reach this function at all.



Ok. I'll remove it.
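
A small userspace sketch of why the clamp is redundant, assuming the
lookup path fails with -ENAMETOOLONG for over-long names (Al only says
"an error"). MODEL_NAME_LEN, model_lookup and model_add_link_namelen are
hypothetical stand-ins, and the limit of 16 is arbitrary:

```c
#include <errno.h>
#include <string.h>

#define MODEL_NAME_LEN 16	/* stand-in for PRAM_NAME_LEN; value is arbitrary */

/* Lookup-time check: over-long names are rejected before anything else. */
int model_lookup(const char *name)
{
	if (strlen(name) > MODEL_NAME_LEN)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Runs only for names that already passed model_lookup(), so the length
 * can be used as-is and no min() against the limit is needed.
 */
size_t model_add_link_namelen(const char *name)
{
	return strlen(name);
}
```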


+int pram_remove_link(struct inode *inode)


Umm...  That's called on rename (for old one) *and* inode eviction when link
count goes to zero.  What's the point of keeping unlinked ones (unlink/rmdir/
rename victims) on those lists?  Sure, you skip them on lookups, but why
delay link removal until eviction?  You pay for that with extra locking,
BTW - if not for that, you wouldn't need your i_link_mutex at all.



Good question. The only answer I have for now is "historical reasons".
At the moment I can't see why we couldn't remove the link information
right away in the opened-but-unlinked case, instead of delaying the
operation until evict.



+	pi = pram_get_inode(sb, inode->i_ino);
+
+	switch ((u32)file->f_pos) {
+	case 0:
+		ret = dir_emit_dot(file, ctx);
+		ctx->pos = 1;
+		return ret;


Really?  So on the first call of ->iterate() you just generate one
entry and don't even try to produce more?  And it looks like the
rest is no nicer...



I'll try to improve the behavior here.
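
One possible shape for that improvement, as a userspace sketch: a single
call keeps emitting until the directory is exhausted or the caller's
buffer is full, rather than returning after the first entry. model_ctx
and model_emit are hypothetical stand-ins for the kernel's dir_context
and dir_emit helpers, and the position numbering (0 and 1 for the dot
entries, n + 2 for the n-th name) is an assumption of the sketch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct dir_context. */
struct model_ctx {
	long pos;		/* next position to emit */
	size_t room;		/* entries the caller can still accept */
	size_t emitted;		/* entries delivered so far */
};

/* Deliver one entry; returns false when the caller's buffer is full. */
bool model_emit(struct model_ctx *ctx, const char *name)
{
	(void)name;
	if (ctx->room == 0)
		return false;
	ctx->room--;
	ctx->emitted++;
	return true;
}

/*
 * Emit ".", "..", then the real entries, continuing until either the
 * directory is exhausted or model_emit() reports a full buffer. The
 * position is advanced only after a successful emit, so the next call
 * resumes exactly where this one stopped.
 */
int model_iterate(struct model_ctx *ctx, const char *const *names, size_t count)
{
	if (ctx->pos == 0) {
		if (!model_emit(ctx, "."))
			return 0;
		ctx->pos = 1;
	}
	if (ctx->pos == 1) {
		if (!model_emit(ctx, ".."))
			return 0;
		ctx->pos = 2;
	}
	while ((size_t)(ctx->pos - 2) < count) {
		if (!model_emit(ctx, names[ctx->pos - 2]))
			return 0;
		ctx->pos++;
	}
	return 0;
}
```

With room for four entries and a directory holding three names, the
first call delivers ".", ".." and two names, and a second call delivers
the remaining one.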

Marco


Re: [PATCH 12/19] pramfs: symlink operations

2013-09-07 Thread Marco Stornelli

On 07/09/2013 16:41, Al Viro wrote:

On Sat, Sep 07, 2013 at 10:29:15AM +0200, Marco Stornelli wrote:

+static int pram_readlink(struct dentry *dentry, char __user *buffer, int buflen)
+{
+	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = inode->i_sb;
+	u64 block;
+	char *blockp;
+
+	block = pram_find_data_block(inode, 0);
+	blockp = pram_get_block(sb, block);
+	return vfs_readlink(dentry, buffer, buflen, blockp);
+}



+static void *pram_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = inode->i_sb;
+	off_t block;
+	int status;
+	char *blockp;
+
+	block = pram_find_data_block(inode, 0);
+	blockp = pram_get_block(sb, block);
+	status = vfs_follow_link(nd, blockp);
+	return ERR_PTR(status);
+}


Just nd_set_link(nd, blockp) instead of that vfs_follow_link() and be
done with that; that way you can use generic_readlink() instead of
pram_readlink() *and* get lower stack footprint on traversing them.




Yep, you're right (as usual :))

Marco



[PATCH 19/19] pramfs: Kconfig and makefile

2013-09-07 Thread Marco Stornelli
Add Kconfig and makefile.

Signed-off-by: Marco Stornelli 
---
 fs/Kconfig |6 +++-
 fs/Makefile|1 +
 fs/pramfs/Kconfig  |   72 
 fs/pramfs/Makefile |   14 ++
 4 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 fs/pramfs/Kconfig
 create mode 100644 fs/pramfs/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index c229f82..fd86a48 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -17,7 +17,7 @@ source "fs/ext4/Kconfig"
 config FS_XIP
 # execute in place
bool
-   depends on EXT2_FS_XIP
+   depends on EXT2_FS_XIP || PRAMFS_XIP
default y
 
 source "fs/jbd/Kconfig"
@@ -29,7 +29,8 @@ config FS_MBCACHE
default y if EXT2_FS=y && EXT2_FS_XATTR
default y if EXT3_FS=y && EXT3_FS_XATTR
default y if EXT4_FS=y
-   default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS
+   default y if PRAMFS=y && PRAMFS_XATTR
+   default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS || PRAMFS_XATTR
 
 source "fs/reiserfs/Kconfig"
 source "fs/jfs/Kconfig"
@@ -209,6 +210,7 @@ source "fs/romfs/Kconfig"
 source "fs/pstore/Kconfig"
 source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
+source "fs/pramfs/Kconfig"
 source "fs/exofs/Kconfig"
 source "fs/f2fs/Kconfig"
 source "fs/efivarfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 4fe6df3..f8e70df 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -126,3 +126,4 @@ obj-y   += exofs/ # Multiple modules
 obj-$(CONFIG_CEPH_FS)  += ceph/
 obj-$(CONFIG_PSTORE)   += pstore/
 obj-$(CONFIG_EFIVAR_FS)+= efivarfs/
+obj-$(CONFIG_PRAMFS)   += pramfs/
diff --git a/fs/pramfs/Kconfig b/fs/pramfs/Kconfig
new file mode 100644
index 000..0ca2402
--- /dev/null
+++ b/fs/pramfs/Kconfig
@@ -0,0 +1,72 @@
+config PRAMFS
+   tristate "Persistent and Protected RAM file system support"
+   depends on HAS_IOMEM
+   select CRC32
+   help
+  If your system has a block of fast (comparable in access speed to
+  system memory) and non-volatile RAM and you wish to mount a
+  light-weight, full-featured, and space-efficient filesystem over it,
+  say Y here, and read .
+
+  To compile this as a module,  choose M here: the module will be
+  called pramfs.
+
+config PRAMFS_XIP
+   bool "Execute-in-place in PRAMFS"
+   depends on PRAMFS && BLOCK
+   help
+  Say Y here to enable XIP feature of PRAMFS.
+
+config PRAMFS_WRITE_PROTECT
+   bool "PRAMFS write protection"
+   depends on PRAMFS && MMU && HAVE_SET_MEMORY_RO
+   default y
+   help
+  Say Y here to enable the write protect feature of PRAMFS.
+
+config PRAMFS_XATTR
+   bool "PRAMFS extended attributes"
+   depends on PRAMFS && BLOCK
+   help
+ Extended attributes are name:value pairs associated with inodes by
+ the kernel or by users (see the attr(5) manual page, or visit
+ <http://acl.bestbits.at/> for details).
+
+ If unsure, say N.
+
+config PRAMFS_POSIX_ACL
+   bool "PRAMFS POSIX Access Control Lists"
+   depends on PRAMFS_XATTR
+   select FS_POSIX_ACL
+   help
+ Posix Access Control Lists (ACLs) support permissions for users and
+ groups beyond the owner/group/world scheme.
+
+ To learn more about Access Control Lists, visit the Posix ACLs for
+ Linux website <http://acl.bestbits.at/>.
+
+ If you don't know what Access Control Lists are, say N.
+
+config PRAMFS_SECURITY
+   bool "PRAMFS Security Labels"
+   depends on PRAMFS_XATTR
+   help
+ Security labels support alternative access control models
+ implemented by security modules like SELinux.  This option
+ enables an extended attribute handler for file security
+ labels in the pram filesystem.
+
+ If you are not using a security module that requires using
+ extended attributes for file security labels, say N.
+
+config PRAMFS_TEST
+   boolean
+   depends on PRAMFS
+
+config PRAMFS_TEST_MODULE
+   tristate "PRAMFS Test"
+   depends on PRAMFS && PRAMFS_WRITE_PROTECT && m
+   select PRAMFS_TEST
+   help
+ Say Y here to build a simple module to test the protection of
+ PRAMFS. The module will be called pramfs_test.
diff --git a/fs/pramfs/Makefile b/fs/pramfs/Makefile
new file mode 100644
index 000..055f0bb
--- /dev/null
+++ b/fs/pramfs/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for the linux pram-filesystem routines.
+#
+
+obj-$(CONFIG_PRAMFS) += pramfs.o
+obj-$(CONFIG_PRAMFS_TEST_MODULE) += pramfs_test.o
+
+pramfs-y :=

[PATCH 18/19] pramfs: test module

2013-09-07 Thread Marco Stornelli
Add test module.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/pramfs_test.c |   47 +++
 1 files changed, 47 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/pramfs_test.c

diff --git a/fs/pramfs/pramfs_test.c b/fs/pramfs/pramfs_test.c
new file mode 100644
index 000..7acfda2
--- /dev/null
+++ b/fs/pramfs/pramfs_test.c
@@ -0,0 +1,47 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Pramfs test module.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+
+int __init test_pramfs_write(void)
+{
+   struct pram_super_block *psb;
+
+   psb = get_pram_super();
+   if (!psb) {
+   printk(KERN_ERR
+   "%s: PRAMFS super block not found (not mounted?)\n",
+   __func__);
+   return 1;
+   }
+
+   /*
+* Attempt an unprotected clear of checksum information in the
+* superblock, this should cause a kernel page protection fault.
+*/
+   printk("%s: writing to kernel VA %p\n", __func__, psb);
+   psb->s_sum = 0;
+
+   return 0;
+}
+
+void test_pramfs_write_cleanup(void) {}
+
+/* Module information */
+MODULE_LICENSE("GPL");
+module_init(test_pramfs_write);
+module_exit(test_pramfs_write_cleanup);
-- 
1.7.3.4


[PATCH 17/19] pramfs: write protection

2013-09-07 Thread Marco Stornelli
Add write protection.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/wprotect.c |   39 ++
 fs/pramfs/wprotect.h |  144 ++
 2 files changed, 183 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/wprotect.c
 create mode 100644 fs/pramfs/wprotect.h

diff --git a/fs/pramfs/wprotect.c b/fs/pramfs/wprotect.c
new file mode 100644
index 000..ba1e488
--- /dev/null
+++ b/fs/pramfs/wprotect.c
@@ -0,0 +1,39 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Write protection for the filesystem pages.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+
+void pram_writeable(void *vaddr, unsigned long size, int rw)
+{
+   int ret = 0;
+   unsigned long nrpages = size >> PAGE_SHIFT;
+   unsigned long addr = (unsigned long)vaddr;
+
+   /* Page aligned */
+   addr &= PAGE_MASK;
+
+   if (size & (PAGE_SIZE - 1))
+   nrpages++;
+
+   if (rw)
+   ret = set_memory_rw(addr, nrpages);
+   else
+   ret = set_memory_ro(addr, nrpages);
+
+   BUG_ON(ret);
+}
diff --git a/fs/pramfs/wprotect.h b/fs/pramfs/wprotect.h
new file mode 100644
index 000..f5ee08d
--- /dev/null
+++ b/fs/pramfs/wprotect.h
@@ -0,0 +1,144 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Memory protection definitions for the PRAMFS filesystem.
+ *
+ * Copyright 2010-2011 Marco Stornelli 
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef __WPROTECT_H
+#define __WPROTECT_H
+
+#include 
+
+/* pram_memunlock_super() before calling! */
+static inline void pram_sync_super(struct pram_super_block *ps)
+{
+   u32 crc = 0;
+   ps->s_wtime = cpu_to_be32(get_seconds());
+   ps->s_sum = 0;
+   crc = crc32(~0, (__u8 *)ps + sizeof(__be32), PRAM_SB_SIZE -
+   sizeof(__be32));
+   ps->s_sum = cpu_to_be32(crc);
+   /* Keep sync redundant super block */
+   memcpy((void *)ps + PRAM_SB_SIZE, (void *)ps, PRAM_SB_SIZE);
+}
+
+/* pram_memunlock_inode() before calling! */
+static inline void pram_sync_inode(struct pram_inode *pi)
+{
+   u32 crc = 0;
+   pi->i_sum = 0;
+   crc = crc32(~0, (__u8 *)pi + sizeof(__be32), PRAM_INODE_SIZE -
+   sizeof(__be32));
+   pi->i_sum = cpu_to_be32(crc);
+}
+
+#ifdef CONFIG_PRAMFS_WRITE_PROTECT
+extern void pram_writeable(void *vaddr, unsigned long size, int rw);
+
+static inline int pram_is_protected(struct super_block *sb)
+{
+   struct pram_sb_info *sbi = (struct pram_sb_info *)sb->s_fs_info;
+   return sbi->s_mount_opt & PRAM_MOUNT_PROTECT;
+}
+
+static inline void __pram_memunlock_range(void *p, unsigned long len)
+{
+   pram_writeable(p, len, 1);
+}
+
+static inline void __pram_memlock_range(void *p, unsigned long len)
+{
+   pram_writeable(p, len, 0);
+}
+
+static inline void pram_memunlock_range(struct super_block *sb, void *p,
+   unsigned long len)
+{
+   if (pram_is_protected(sb))
+   __pram_memunlock_range(p, len);
+}
+
+static inline void pram_memlock_range(struct super_block *sb, void *p,
+   unsigned long len)
+{
+   if (pram_is_protected(sb))
+   __pram_memlock_range(p, len);
+}
+
+static inline void pram_memunlock_super(struct super_block *sb,
+   struct pram_super_block *ps)
+{
+   if (pram_is_protected(sb))
+   __pram_memunlock_range(ps, PRAM_SB_SIZE);
+}
+
+static inline void pram_memlock_super(struct super_block *sb,
+   struct pram_super_block *ps)
+{
+   pram_sync_super(ps);
+   if (pram_is_protected(sb))
+   __pram_memlock_range(ps, PRAM_SB_SIZE);
+}
+
+static inline void pram_memunlock_inode(struct super_block *sb,
+   struct pram_inode *pi)
+{
+   if (pram_is_protected(sb))
+   __pram_memunlock_range(pi, PRAM_SB_SIZE);
+}
+
+static inline void pram_memlock_inode(struct super_block *sb,
+   struct pram_inode *pi)
+{
+   pram_sync_inode(pi);
+   if (pram_is_protected(sb))
+   __pram_memlock_range(pi, PRAM_SB_SIZE);
+}
+
+static inline void pram_memunlock_block(str
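
A note on the page arithmetic in pram_writeable() above: the start
address is rounded down to a page boundary, but the page count is
derived from the size alone. The userspace sketch below compares that
with a count taken over the whole span [addr, addr + size). The names
are hypothetical and 4 KiB pages are assumed. Whether any pramfs caller
passes a range that is unaligned and crosses a page boundary cannot be
told from this excerpt, so this illustrates the arithmetic and is not a
bug report:

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define MODEL_PAGE_SIZE  ((uintptr_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))

/* Page count derived from the size alone, as in the posted helper. */
unsigned long model_pages_from_size(unsigned long size)
{
	unsigned long nrpages = size >> MODEL_PAGE_SHIFT;

	if (size & (MODEL_PAGE_SIZE - 1))
		nrpages++;
	return nrpages;
}

/* Page count for the span [addr, addr + size), start offset included. */
unsigned long model_pages_spanned(uintptr_t addr, unsigned long size)
{
	uintptr_t first = addr & MODEL_PAGE_MASK;
	uintptr_t last;

	if (size == 0)
		return 0;
	/* page that holds the last byte of the range */
	last = (addr + size - 1) & MODEL_PAGE_MASK;
	return (unsigned long)((last - first) >> MODEL_PAGE_SHIFT) + 1;
}
```

For a 128-byte object starting at 0x1fc0 the size-only count is one page
while the span covers two; for page-aligned or naturally aligned small
objects the two counts agree.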

[PATCH 16/19] pramfs: acl operations

2013-09-07 Thread Marco Stornelli
Add acl operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/acl.c |  415 +++
 fs/pramfs/acl.h |   85 +++
 2 files changed, 500 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/acl.c
 create mode 100644 fs/pramfs/acl.h

diff --git a/fs/pramfs/acl.c b/fs/pramfs/acl.c
new file mode 100644
index 000..c0f1f63
--- /dev/null
+++ b/fs/pramfs/acl.c
@@ -0,0 +1,415 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * POSIX ACL operations
+ *
+ * Copyright 2010-2011 Marco Stornelli 
+ *
+ * based on fs/ext2/acl.c with the following copyright:
+ *
+ * Copyright (C) 2001-2003 Andreas Gruenbacher, 
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+#include "xattr.h"
+#include "acl.h"
+
+/*
+ * Load ACL information from filesystem.
+ */
+static struct posix_acl *pram_acl_load(const void *value, size_t size)
+{
+   const char *end = (char *)value + size;
+   int n, count;
+   struct posix_acl *acl;
+
+   if (!value)
+   return NULL;
+   if (size < sizeof(struct pram_acl_header))
+   return ERR_PTR(-EINVAL);
+   if (((struct pram_acl_header *)value)->a_version !=
+   cpu_to_be32(PRAM_ACL_VERSION))
+   return ERR_PTR(-EINVAL);
+   value = (char *)value + sizeof(struct pram_acl_header);
+   count = pram_acl_count(size);
+   if (count < 0)
+   return ERR_PTR(-EINVAL);
+   if (count == 0)
+   return NULL;
+   acl = posix_acl_alloc(count, GFP_KERNEL);
+   if (!acl)
+   return ERR_PTR(-ENOMEM);
+   for (n = 0; n < count; n++) {
+   struct pram_acl_entry *entry = (struct pram_acl_entry *)value;
+   if ((char *)value + sizeof(struct pram_acl_entry_short) > end)
+   goto fail;
+   acl->a_entries[n].e_tag  = be16_to_cpu(entry->e_tag);
+   acl->a_entries[n].e_perm = be16_to_cpu(entry->e_perm);
+   switch (acl->a_entries[n].e_tag) {
+   case ACL_USER_OBJ:
+   case ACL_GROUP_OBJ:
+   case ACL_MASK:
+   case ACL_OTHER:
+   value = (char *)value +
+   sizeof(struct pram_acl_entry_short);
+   break;
+   case ACL_USER:
+   value = (char *)value + sizeof(struct pram_acl_entry);
+   if ((char *)value > end)
+   goto fail;
+   acl->a_entries[n].e_uid = make_kuid(&init_user_ns,
+   be32_to_cpu(entry->e_id));
+   break;
+   case ACL_GROUP:
+   value = (char *)value + sizeof(struct pram_acl_entry);
+   if ((char *)value > end)
+   goto fail;
+   acl->a_entries[n].e_gid = make_kgid(&init_user_ns,
+   be32_to_cpu(entry->e_id));
+   break;
+   default:
+   goto fail;
+   }
+   }
+   if (value != end)
+   goto fail;
+   return acl;
+
+fail:
+   posix_acl_release(acl);
+   return ERR_PTR(-EINVAL);
+}
+
+/*
+ * Save ACL information into the filesystem.
+ */
+static void *pram_acl_save(const struct posix_acl *acl, size_t *size)
+{
+   struct pram_acl_header *ext_acl;
+   char *e;
+   size_t n;
+
+   *size = pram_acl_size(acl->a_count);
+   ext_acl = kmalloc(sizeof(struct pram_acl_header) + acl->a_count *
+   sizeof(struct pram_acl_entry), GFP_KERNEL);
+   if (!ext_acl)
+   return ERR_PTR(-ENOMEM);
+   ext_acl->a_version = cpu_to_be32(PRAM_ACL_VERSION);
+   e = (char *)ext_acl + sizeof(struct pram_acl_header);
+   for (n = 0; n < acl->a_count; n++) {
+   const struct posix_acl_entry *acl_e = &acl->a_entries[n];
+   struct pram_acl_entry *entry = (struct pram_acl_entry *)e;
+   entry->e_tag  = cpu_to_be16(acl_e->e_tag);
+   entry->e_perm = cpu_to_be16(acl_e->e_perm);
+   switch(acl_e->e_tag) {
+   case ACL_USER:
+   entry->e_id = cpu_to_be32(
+   from_kuid(&init_user_ns, acl_e->e_uid));
+   e += sizeof(struct pram_acl_entry);
+   break;
+   case ACL_GROUP:
+   entry->e_id = cpu_to_be32(
+   from_kgid(&init_user_ns, acl_e->e_gid));
+   e +
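
The load path above decodes e_tag and e_perm with be16_to_cpu() and e_id
with be32_to_cpu(), so the save path has to encode with the same byte
order for an ACL to survive a round trip. The userspace sketch below
shows that idea with a simplified, hypothetical layout: two tag values
only, 4-byte short entries, and 8-byte entries that carry an id. It is
not the pramfs on-media format:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values and layout, for illustration only. */
enum { MODEL_USER_OBJ = 1, MODEL_USER = 2 };

struct model_entry {
	uint16_t tag;
	uint16_t perm;
	uint32_t id;	/* meaningful only for MODEL_USER */
};

void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

void put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)(v & 0xffff));
}

uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)get_be16(p) << 16) | get_be16(p + 2);
}

/* Short entries take 4 bytes (tag, perm); MODEL_USER adds a 4-byte id. */
size_t model_save(const struct model_entry *e, size_t n, uint8_t *out)
{
	size_t off = 0, i;

	for (i = 0; i < n; i++) {
		put_be16(out + off, e[i].tag);
		put_be16(out + off + 2, e[i].perm);
		off += 4;
		if (e[i].tag == MODEL_USER) {
			put_be32(out + off, e[i].id);
			off += 4;
		}
	}
	return off;
}

/* Returns the number of entries decoded, or -1 on malformed input. */
int model_load(const uint8_t *in, size_t len, struct model_entry *e, size_t max)
{
	size_t off = 0, n = 0;

	while (off < len) {
		if (n == max || len - off < 4)
			return -1;
		e[n].tag = get_be16(in + off);
		e[n].perm = get_be16(in + off + 2);
		e[n].id = 0;
		off += 4;
		if (e[n].tag == MODEL_USER) {
			if (len - off < 4)
				return -1;	/* truncated id */
			e[n].id = get_be32(in + off);
			off += 4;
		} else if (e[n].tag != MODEL_USER_OBJ) {
			return -1;		/* unknown tag */
		}
		n++;
	}
	return (int)n;
}
```

Packing bytes by hand keeps the sketch independent of host endianness;
the kernel code gets the same effect from the cpu_to_be*() and
be*_to_cpu() helpers.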

[PATCH 15/19] pramfs: extended attributes

2013-09-07 Thread Marco Stornelli
Add extended attributes.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/xattr.c  | 1118 
 fs/pramfs/xattr.h  |   92 
 fs/pramfs/xattr_security.c |   80 
 fs/pramfs/xattr_trusted.c  |   65 +++
 fs/pramfs/xattr_user.c |   69 +++
 5 files changed, 1424 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/xattr.c
 create mode 100644 fs/pramfs/xattr.h
 create mode 100644 fs/pramfs/xattr_security.c
 create mode 100644 fs/pramfs/xattr_trusted.c
 create mode 100644 fs/pramfs/xattr_user.c

diff --git a/fs/pramfs/xattr.c b/fs/pramfs/xattr.c
new file mode 100644
index 000..a78bf1d
--- /dev/null
+++ b/fs/pramfs/xattr.c
@@ -0,0 +1,1118 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Extended attributes operations.
+ *
+ * Copyright 2010-2011 Marco Stornelli 
+ *
+ * based on fs/ext2/xattr.c with the following copyright:
+ *
+ * Fix by Harrison Xing .
+ * Extended attributes for symlinks and special files added per
+ *  suggestion of Luka Renko .
+ * xattr consolidation Copyright (c) 2004 James Morris ,
+ *  Red Hat Inc.
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+/*
+ * Extended attributes are stored in blocks allocated outside of
+ * any inode. The i_xattr field is then made to point to this allocated
+ * block. If all extended attributes of an inode are identical, these
+ * inodes may share the same extended attribute block. Such situations
+ * are automatically detected by keeping a cache of recent attribute block
+ * numbers and hashes over the block's contents in memory.
+ *
+ *
+ * Extended attribute block layout:
+ *
+ *   +------------------+
+ *   | header           |
+ *   | entry 1          | |
+ *   | entry 2          | | growing downwards
+ *   | entry 3          | v
+ *   | four null bytes  |
+ *   | . . .            |
+ *   | value 1          | ^
+ *   | value 3          | | growing upwards
+ *   | value 2          | |
+ *   +------------------+
+ *
+ * The block header is followed by multiple entry descriptors. These entry
+ * descriptors are variable in size, and aligned to PRAM_XATTR_PAD
+ * byte boundaries. The entry descriptors are sorted by attribute name,
+ * so that two extended attribute blocks can be compared efficiently.
+ *
+ * Attribute values are aligned to the end of the block, stored in
+ * no specific order. They are also padded to PRAM_XATTR_PAD byte
+ * boundaries. No additional gaps are left between them.
+ *
+ * Locking strategy
+ * ----------------
+ * pi->i_xattr is protected by PRAM_I(inode)->xattr_sem.
+ * EA blocks are only changed if they are exclusive to an inode, so
+ * holding xattr_sem also means that nothing but the EA block's reference
+ * count will change. Multiple writers to an EA block are synchronized
+ * by the mutex in each block descriptor. Block descriptors are kept in a
+ * red black tree and the key is the absolute block number.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+#include "xattr.h"
+#include "acl.h"
+#include "desctree.h"
+
+#define HDR(bp) ((struct pram_xattr_header *)(bp))
+#define ENTRY(ptr) ((struct pram_xattr_entry *)(ptr))
+#define FIRST_ENTRY(bh) ENTRY(HDR(bh)+1)
+#define IS_LAST_ENTRY(entry) (*(__u32 *)(entry) == 0)
+#define GET_DESC(sbi, blocknr) \
+   lookup_xblock_desc(sbi, blocknr, pram_xblock_desc_cache, 1)
+#define LOOKUP_DESC(sbi, blocknr) lookup_xblock_desc(sbi, blocknr, NULL, 0)
+
+#ifdef PRAM_XATTR_DEBUG
+# define ea_idebug(inode, f...) do { \
+   printk(KERN_DEBUG "inode %ld: ", inode->i_ino); \
+   printk(f); \
+   printk("\n"); \
+   } while (0)
+# define ea_bdebug(blocknr, f...) do { \
+   printk(KERN_DEBUG "block %lu: ", blocknr); \
+   printk(f); \
+   printk("\n"); \
+   } while (0)
+#else
+# define ea_idebug(f...)
+# define ea_bdebug(f...)
+#endif
+
+static int pram_xattr_set2(struct inode *, char *, struct pram_xblock_desc *,
+  struct pram_xattr_header *);
+
+static int pram_xattr_cache_insert(struct super_block *sb,
+  unsigned long blocknr, u32 xhash);
+static struct pram_xblock_desc *pram_xattr_cache_find(struct inode *,
+struct pram_xattr_header *);
+static void pram_xattr_rehash(struct pram_xattr_header *,
+ struct pram_xattr_entry *);
+
+static struct mb_cache *pram_xattr_cache;
+static struct kmem_cache *pram_xblock_desc_cache;
+
+static const struct xattr_handler *pram_xattr_handler_map[] = {
+   [PRAM_XATTR_INDEX_USER]  = &pram_xattr_user_handler,
+#ifdef CONFIG_PRAMFS_POSIX_ACL
+   [PRAM_XATTR_INDEX_POSIX_ACL_ACCESS]  = _

[PATCH 14/19] pramfs: extended attributes block description tree

2013-09-07 Thread Marco Stornelli
Add extended attributes block description tree.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/desctree.c |  181 ++
 fs/pramfs/desctree.h |   44 
 2 files changed, 225 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/desctree.c
 create mode 100644 fs/pramfs/desctree.h

diff --git a/fs/pramfs/desctree.c b/fs/pramfs/desctree.c
new file mode 100644
index 000..fa1c9fc
--- /dev/null
+++ b/fs/pramfs/desctree.c
@@ -0,0 +1,181 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Extended attributes block descriptors tree.
+ *
+ * Copyright 2010-2011 Marco Stornelli 
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include "desctree.h"
+#include "pram.h"
+
+/* xblock_desc_init_always()
+ *
+ * These are initializations that need to be done on every
+ * descriptor allocation as the fields are not initialised
+ * by slab allocation.
+ */
+void xblock_desc_init_always(struct pram_xblock_desc *desc)
+{
+   atomic_set(&desc->refcount, 0);
+   desc->blocknr = 0;
+   desc->flags = 0;
+}
+
+/* xblock_desc_init_once()
+ *
+ * These initializations only need to be done once, because
+ * the fields stay valid across reuse of the descriptor, so
+ * let the slab allocator know about that.
+ */
+void xblock_desc_init_once(struct pram_xblock_desc *desc)
+{
+   mutex_init(&desc->lock);
+}
+
+/* __insert_xblock_desc()
+ *
+ * Insert a new descriptor in the tree.
+ */
+static void __insert_xblock_desc(struct pram_sb_info *sbi,
+unsigned long blocknr, struct rb_node *node)
+{
+   struct rb_node **p = &(sbi->desc_tree.rb_node);
+   struct rb_node *parent = NULL;
+   struct pram_xblock_desc *desc;
+
+   while (*p) {
+   parent = *p;
+   desc = rb_entry(parent, struct pram_xblock_desc, node);
+
+   if (blocknr < desc->blocknr)
+   p = &(*p)->rb_left;
+   else if (blocknr > desc->blocknr)
+   p = &(*p)->rb_right;
+   else
+   /* Oops... another descriptor for the same block? */
+   BUG();
+   }
+
+   rb_link_node(node, parent, p);
+   rb_insert_color(node, &sbi->desc_tree);
+}
+
+void insert_xblock_desc(struct pram_sb_info *sbi, struct pram_xblock_desc *desc)
+{
+   spin_lock(&sbi->desc_tree_lock);
+   __insert_xblock_desc(sbi, desc->blocknr, &desc->node);
+   spin_unlock(&sbi->desc_tree_lock);
+}
+
+/* __lookup_xblock_desc()
+ *
+ * Search for an extended attribute descriptor in the tree by
+ * block number. Returns the descriptor if it is found. If it is
+ * not found, a new descriptor is created when create is nonzero;
+ * otherwise NULL is returned.
+ */
+static struct pram_xblock_desc *__lookup_xblock_desc(struct pram_sb_info *sbi,
+   unsigned long blocknr,
+   struct kmem_cache *cache,
+   int create)
+{
+   struct rb_node *n = sbi->desc_tree.rb_node;
+   struct pram_xblock_desc *desc = NULL;
+
+   while (n) {
+   desc = rb_entry(n, struct pram_xblock_desc, node);
+
+   if (blocknr < desc->blocknr)
+   n = n->rb_left;
+   else if (blocknr > desc->blocknr)
+   n = n->rb_right;
+   else {
+   atomic_inc(&desc->refcount);
+   goto out;
+   }
+   }
+
+   /* not found: do not return the last node visited by the search */
+   desc = NULL;
+   if (create) {
+   desc = kmem_cache_alloc(cache, GFP_NOFS);
+   if (!desc)
+   return ERR_PTR(-ENOMEM);
+   xblock_desc_init_always(desc);
+   atomic_set(&desc->refcount, 1);
+   desc->blocknr = blocknr;
+   __insert_xblock_desc(sbi, desc->blocknr, &desc->node);
+   }
+out:
+   return desc;
+}
+
+struct pram_xblock_desc *lookup_xblock_desc(struct pram_sb_info *sbi,
+   unsigned long blocknr,
+   struct kmem_cache *cache,
+   int create)
+{
+   struct pram_xblock_desc *desc = NULL;
+
+   spin_lock(&sbi->desc_tree_lock);
+   desc = __lookup_xblock_desc(sbi, blocknr, cache, create);
+   spin_unlock(&sbi->desc_tree_lock);
+   return desc;
+}
+
+/* put_xblock_desc()
+ *
+ * Decrement the reference count. If it reaches zero and the
+ * descriptor has been marked to be freed, then we free it.
+ * It returns 0 if the descriptor has been deleted and 1 otherwise.
+ */
+int put_xblock_desc(struct pram_sb_info *sbi, struct pram_xblock_desc *desc)

[PATCH 12/19] pramfs: symlink operations

2013-09-07 Thread Marco Stornelli
Add symlink operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/symlink.c |   76 +++
 1 files changed, 76 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/symlink.c

diff --git a/fs/pramfs/symlink.c b/fs/pramfs/symlink.c
new file mode 100644
index 000..0d5213f
--- /dev/null
+++ b/fs/pramfs/symlink.c
@@ -0,0 +1,76 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Symlink operations
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include "pram.h"
+#include "xattr.h"
+
+int pram_block_symlink(struct inode *inode, const char *symname, int len)
+{
+   struct super_block *sb = inode->i_sb;
+   u64 block;
+   char *blockp;
+   int err;
+
+   err = pram_alloc_blocks(inode, 0, 1);
+   if (err)
+   return err;
+
+   block = pram_find_data_block(inode, 0);
+   blockp = pram_get_block(sb, block);
+
+   pram_memunlock_block(sb, blockp);
+   memcpy(blockp, symname, len);
+   blockp[len] = '\0';
+   pram_memlock_block(sb, blockp);
+   return 0;
+}
+
+static int pram_readlink(struct dentry *dentry, char __user *buffer, int buflen)
+{
+   struct inode *inode = dentry->d_inode;
+   struct super_block *sb = inode->i_sb;
+   u64 block;
+   char *blockp;
+
+   block = pram_find_data_block(inode, 0);
+   blockp = pram_get_block(sb, block);
+   return vfs_readlink(dentry, buffer, buflen, blockp);
+}
+
+static void *pram_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+   struct inode *inode = dentry->d_inode;
+   struct super_block *sb = inode->i_sb;
+   off_t block;
+   int status;
+   char *blockp;
+
+   block = pram_find_data_block(inode, 0);
+   blockp = pram_get_block(sb, block);
+   status = vfs_follow_link(nd, blockp);
+   return ERR_PTR(status);
+}
+
+const struct inode_operations pram_symlink_inode_operations = {
+   .readlink   = pram_readlink,
+   .follow_link= pram_follow_link,
+   .setattr= pram_notify_change,
+#ifdef CONFIG_PRAMFS_XATTR
+   .setxattr   = generic_setxattr,
+   .getxattr   = generic_getxattr,
+   .listxattr  = pram_listxattr,
+   .removexattr= generic_removexattr,
+#endif
+};
-- 
1.7.3.4


[PATCH 13/19] pramfs: xip operations

2013-09-07 Thread Marco Stornelli
Add xip operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/xip.c |  119 +++
 fs/pramfs/xip.h |   33 +++
 2 files changed, 152 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/xip.c
 create mode 100644 fs/pramfs/xip.h

diff --git a/fs/pramfs/xip.c b/fs/pramfs/xip.c
new file mode 100644
index 000..26b8afe
--- /dev/null
+++ b/fs/pramfs/xip.c
@@ -0,0 +1,119 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * XIP operations.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include "pram.h"
+#include "xip.h"
+
+/*
+ * Wrappers. We need to take the RCU read lock to guard against a
+ * concurrent truncate operation. Writes are not a problem because
+ * they hold i_mutex.
+ */
+ssize_t pram_xip_file_read(struct file *filp, char __user *buf,
+   size_t len, loff_t *ppos)
+{
+   ssize_t res;
+   rcu_read_lock();
+   res = xip_file_read(filp, buf, len, ppos);
+   rcu_read_unlock();
+   return res;
+}
+
+static int pram_xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   int ret = 0;
+   rcu_read_lock();
+   ret = xip_file_fault(vma, vmf);
+   rcu_read_unlock();
+   return ret;
+}
+
+static const struct vm_operations_struct pram_xip_vm_ops = {
+   .fault  = pram_xip_file_fault,
+   .page_mkwrite = filemap_page_mkwrite,
+   .remap_pages = generic_file_remap_pages,
+};
+
+int pram_xip_file_mmap(struct file * file, struct vm_area_struct * vma)
+{
+   BUG_ON(!file->f_mapping->a_ops->get_xip_mem);
+
+   file_accessed(file);
+   vma->vm_ops = &pram_xip_vm_ops;
+   vma->vm_flags |= VM_MIXEDMAP;
+   return 0;
+}
+
+static int pram_find_and_alloc_blocks(struct inode *inode, sector_t iblock,
+sector_t *data_block, int create)
+{
+   int err = -EIO;
+   u64 block;
+
+   block = pram_find_data_block(inode, iblock);
+
+   if (!block) {
+   if (!create) {
+   err = -ENODATA;
+   goto err;
+   }
+
+   err = pram_alloc_blocks(inode, iblock, 1);
+   if (err)
+   goto err;
+
+   block = pram_find_data_block(inode, iblock);
+   if (!block) {
+   err = -ENODATA;
+   goto err;
+   }
+   }
+
+   *data_block = block;
+   err = 0;
+
+ err:
+   return err;
+}
+
+static inline int __pram_get_block(struct inode *inode, pgoff_t pgoff,
+  int create, sector_t *result)
+{
+   int rc = 0;
+
+   rc = pram_find_and_alloc_blocks(inode, (sector_t)pgoff, result, create);
+
+   if (rc == -ENODATA)
+   BUG_ON(create);
+
+   return rc;
+}
+
+int pram_get_xip_mem(struct address_space *mapping, pgoff_t pgoff, int create,
+void **kmem, unsigned long *pfn)
+{
+   int rc;
+   sector_t block = 0;
+
+   /* first, retrieve the block */
+   rc = __pram_get_block(mapping->host, pgoff, create, &block);
+   if (rc)
+   goto exit;
+
+   *kmem = pram_get_block(mapping->host->i_sb, block);
+   *pfn =  pram_get_pfn(mapping->host->i_sb, block);
+
+exit:
+   return rc;
+}
diff --git a/fs/pramfs/xip.h b/fs/pramfs/xip.h
new file mode 100644
index 000..5bd82f2
--- /dev/null
+++ b/fs/pramfs/xip.h
@@ -0,0 +1,33 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * XIP operations.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifdef CONFIG_PRAMFS_XIP
+int pram_get_xip_mem(struct address_space *, pgoff_t, int, void **,
+ unsigned long *);
+ssize_t pram_xip_file_read(struct file *filp, char __user *buf,
+   size_t len, loff_t *ppos);
+int pram_xip_file_mmap(struct file * file, struct vm_area_struct * vma);
+static inline int pram_use_xip(struct super_block *sb)
+{
+   struct pram_sb_info *sbi = PRAM_SB(sb);
+   return sbi->s_mount_opt & PRAM_MOUNT_XIP;
+}
+#define mapping_is_xip(map) (map->a_ops->get_xip_mem)
+
+#else
+
+#define mapping_is_xip(map)   0
+#define pram_use_xip(sb)   0
+#define pram_get_xip_mem   NULL
+#define pram_xip_file_read NULL
+#define pram_xip_file_mmap NULL
+
+#endif
-- 
1.7.3.4

[PATCH 11/19] pramfs: ioctl operations

2013-09-07 Thread Marco Stornelli
Add ioctl operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/ioctl.c |  127 +
 1 files changed, 127 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/ioctl.c

diff --git a/fs/pramfs/ioctl.c b/fs/pramfs/ioctl.c
new file mode 100644
index 000..565cc46
--- /dev/null
+++ b/fs/pramfs/ioctl.c
@@ -0,0 +1,127 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Ioctl operations.
+ *
+ * Copyright 2010-2011 Marco Stornelli 
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+
+long pram_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+   struct inode *inode = file_inode(filp);
+   struct pram_inode *pi;
+   unsigned int flags;
+   int ret;
+
+   pi = pram_get_inode(inode->i_sb, inode->i_ino);
+   if (!pi)
+   return -EACCES;
+
+   switch (cmd) {
+   case FS_IOC_GETFLAGS:
+   flags = be32_to_cpu(pi->i_flags) & PRAM_FL_USER_VISIBLE;
+   return put_user(flags, (int __user *) arg);
+   case FS_IOC_SETFLAGS: {
+   unsigned int oldflags;
+
+   ret = mnt_want_write_file(filp);
+   if (ret)
+   return ret;
+
+   if (!inode_owner_or_capable(inode)) {
+   ret = -EPERM;
+   goto flags_out;
+   }
+
+   if (get_user(flags, (int __user *) arg)) {
+   ret = -EFAULT;
+   goto flags_out;
+   }
+
+   mutex_lock(&inode->i_mutex);
+   oldflags = be32_to_cpu(pi->i_flags);
+
+   if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
+   if (!capable(CAP_LINUX_IMMUTABLE)) {
+   mutex_unlock(&inode->i_mutex);
+   ret = -EPERM;
+   goto flags_out;
+   }
+   }
+
+   if (!S_ISDIR(inode->i_mode))
+   flags &= ~FS_DIRSYNC_FL;
+
+   flags = flags & FS_FL_USER_MODIFIABLE;
+   flags |= oldflags & ~FS_FL_USER_MODIFIABLE;
+   pram_memunlock_inode(inode->i_sb, pi);
+   pi->i_flags = cpu_to_be32(flags);
+   inode->i_ctime = CURRENT_TIME_SEC;
+   pi->i_ctime = cpu_to_be32(inode->i_ctime.tv_sec);
+   pram_set_inode_flags(inode, pi);
+   pram_memlock_inode(inode->i_sb, pi);
+   mutex_unlock(&inode->i_mutex);
+flags_out:
+   mnt_drop_write_file(filp);
+   return ret;
+   }
+   case FS_IOC_GETVERSION:
+   return put_user(inode->i_generation, (int __user *) arg);
+   case FS_IOC_SETVERSION: {
+   __u32 generation;
+   if (!inode_owner_or_capable(inode))
+   return -EPERM;
+   ret = mnt_want_write_file(filp);
+   if (ret)
+   return ret;
+   if (get_user(generation, (int __user *) arg)) {
+   ret = -EFAULT;
+   goto setversion_out;
+   }
+   mutex_lock(&inode->i_mutex);
+   inode->i_ctime = CURRENT_TIME_SEC;
+   inode->i_generation = generation;
+   pram_update_inode(inode);
+   mutex_unlock(&inode->i_mutex);
+setversion_out:
+   mnt_drop_write_file(filp);
+   return ret;
+   }
+   default:
+   return -ENOTTY;
+   }
+}
+
+#ifdef CONFIG_COMPAT
+long pram_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+   switch (cmd) {
+   case FS_IOC32_GETFLAGS:
+   cmd = FS_IOC_GETFLAGS;
+   break;
+   case FS_IOC32_SETFLAGS:
+   cmd = FS_IOC_SETFLAGS;
+   break;
+   case FS_IOC32_GETVERSION:
+   cmd = FS_IOC_GETVERSION;
+   break;
+   case FS_IOC32_SETVERSION:
+   cmd = FS_IOC_SETVERSION;
+   break;
+   default:
+   return -ENOIOCTLCMD;
+   }
+   return pram_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+}
+#endif
-- 
1.7.3.4
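
The flag handling in the FS_IOC_SETFLAGS branch of this patch boils down to two bit operations: only bits inside a "user modifiable" mask are taken from the request, and every other bit is carried over from the old value. A minimal sketch of that merge, assuming an illustrative mask (DEMO_USER_MODIFIABLE is hypothetical and is not the kernel's FS_FL_USER_MODIFIABLE value; the two protected-bit constants are chosen to mirror FS_IMMUTABLE_FL and FS_APPEND_FL):

```c
#include <assert.h>

/* Illustrative values for this sketch only. */
#define DEMO_USER_MODIFIABLE 0x00ffu
#define DEMO_IMMUTABLE_FL    0x0010u
#define DEMO_APPEND_FL       0x0020u

/*
 * Nonzero if the request toggles the append-only or immutable bit,
 * which the patch only allows with CAP_LINUX_IMMUTABLE.
 */
int changes_protected(unsigned int requested, unsigned int old)
{
	return ((requested ^ old) &
		(DEMO_APPEND_FL | DEMO_IMMUTABLE_FL)) != 0;
}

/*
 * Take the user-modifiable bits from the request and keep all
 * other bits from the old on-media value.
 */
unsigned int merge_flags(unsigned int requested, unsigned int old)
{
	unsigned int flags = requested & DEMO_USER_MODIFIABLE;

	flags |= old & ~DEMO_USER_MODIFIABLE;

	/* Bits outside the mask must come through unchanged. */
	assert((flags & ~DEMO_USER_MODIFIABLE) ==
	       (old & ~DEMO_USER_MODIFIABLE));
	return flags;
}
```

With these demo values, merge_flags(0x1234, 0xab00) yields 0xab34: the low byte comes from the request and the high byte from the old flags.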


[PATCH 10/19] pramfs: block allocation

2013-09-07 Thread Marco Stornelli
Add block allocation operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/balloc.c |  160 
 1 files changed, 160 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/balloc.c

diff --git a/fs/pramfs/balloc.c b/fs/pramfs/balloc.c
new file mode 100644
index 000..8291bf3
--- /dev/null
+++ b/fs/pramfs/balloc.c
@@ -0,0 +1,160 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * The blocks allocation and deallocation routines.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include "pram.h"
+
+void pram_bitmap_fill(unsigned long *dst, int nbits)
+{
+   size_t nlongs = BITS_TO_LONGS(nbits);
+   if (!small_const_nbits(nbits)) {
+   int len = (nlongs - 1) * sizeof(unsigned long);
+   memset(dst, 0xff,  len);
+   }
+   if (BITS_PER_LONG == 64)
+   dst[nlongs - 1] = cpu_to_le64(BITMAP_LAST_WORD_MASK(nbits));
+   else
+   dst[nlongs - 1] = cpu_to_le32(BITMAP_LAST_WORD_MASK(nbits));
+}
+
+/*
+ * This just marks in-use the blocks that make up the bitmap.
+ * The bitmap must be writeable before calling.
+ */
+void pram_init_bitmap(struct super_block *sb)
+{
+   struct pram_super_block *ps = pram_get_super(sb);
+   unsigned long *bitmap = pram_get_bitmap(sb);
+   int blocks = be32_to_cpu(ps->s_bitmap_blocks);
+
+   memset(bitmap, 0, blocks << sb->s_blocksize_bits);
+
+   pram_bitmap_fill(bitmap, blocks);
+}
+
+
+/* Free absolute blocknr */
+void pram_free_block(struct super_block *sb, unsigned long blocknr)
+{
+   struct pram_super_block *ps;
+   u64 bitmap_block;
+   unsigned long bitmap_bnr;
+   void *bitmap;
+   void *bp;
+
+   mutex_lock(&PRAM_SB(sb)->s_lock);
+
+   bitmap = pram_get_bitmap(sb);
+   /*
+    * find the block within the bitmap that contains the inuse bit
+    * for the block we need to free. We need to unlock this bitmap
+    * block to clear the inuse bit.
+    */
+   bitmap_bnr = blocknr >> (3 + sb->s_blocksize_bits);
+   bitmap_block = pram_get_block_off(sb, bitmap_bnr);
+   bp = pram_get_block(sb, bitmap_block);
+
+   pram_memunlock_block(sb, bp);
+   pram_clear_bit(blocknr, bitmap); /* mark the block free */
+   pram_memlock_block(sb, bp);
+
+   ps = pram_get_super(sb);
+   pram_memunlock_super(sb, ps);
+
+   if (blocknr < be32_to_cpu(ps->s_free_blocknr_hint))
+   ps->s_free_blocknr_hint = cpu_to_be32(blocknr);
+   be32_add_cpu(&ps->s_free_blocks_count, 1);
+   pram_memlock_super(sb, ps);
+
+   mutex_unlock(&PRAM_SB(sb)->s_lock);
+}
+
+
+/*
+ * allocate a block and return its absolute blocknr. Zeroes out the
+ * block if zero is set.
+ */
+int pram_new_block(struct super_block *sb, unsigned long *blocknr, int zero)
+{
+   struct pram_super_block *ps;
+   u64 bitmap_block;
+   unsigned long bnr, bitmap_bnr;
+   int errval;
+   void *bitmap;
+   void *bp;
+
+   mutex_lock(&PRAM_SB(sb)->s_lock);
+   ps = pram_get_super(sb);
+   bitmap = pram_get_bitmap(sb);
+
+   if (ps->s_free_blocks_count) {
+   /* find the oldest unused block */
+   bnr = pram_find_next_zero_bit(bitmap,
+be32_to_cpu(ps->s_blocks_count),
+be32_to_cpu(ps->s_free_blocknr_hint));
+
+   if (bnr < be32_to_cpu(ps->s_bitmap_blocks) ||
+   bnr >= be32_to_cpu(ps->s_blocks_count)) {
+   pram_dbg("no free blocks found!\n");
+   errval = -ENOSPC;
+   goto fail;
+   }
+
+   pram_memunlock_super(sb, ps);
+   be32_add_cpu(&ps->s_free_blocks_count, -1);
+   if (bnr < (be32_to_cpu(ps->s_blocks_count)-1))
+   ps->s_free_blocknr_hint = cpu_to_be32(bnr+1);
+   else
+   ps->s_free_blocknr_hint = 0;
+   pram_memlock_super(sb, ps);
+   } else {
+   pram_dbg("all blocks allocated\n");
+   errval = -ENOSPC;
+   goto fail;
+   }
+
+   /*
+    * find the block within the bitmap that contains the inuse bit
+    * for the unused block we just found. We need to unlock it to
+    * set the inuse bit.
+    */
+   bitmap_bnr = bnr >> (3 + sb->s_blocksize_bits);
+   bitmap_block = pram_get_block_off(sb, bitmap_bnr);
+   bp
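
The shift by (3 + s_blocksize_bits) used in pram_free_block() and pram_new_block() follows from the bitmap geometry: one bitmap block of 2^bits bytes holds 8 * 2^bits in-use bits, so dividing the block number by that count gives the bitmap block that covers it. A small worked sketch, with hypothetical helper names; the bit-offset helper is an addition for illustration and does not appear in the patch:

```c
#include <assert.h>

/* Index of the bitmap block that holds the in-use bit for blocknr. */
unsigned long bitmap_block_index(unsigned long blocknr,
				 unsigned int blocksize_bits)
{
	/* Keep the shift count within the width of unsigned long. */
	assert(blocksize_bits < 8 * sizeof(unsigned long) - 3);
	return blocknr >> (3 + blocksize_bits);
}

/* Position of that bit inside the bitmap block found above. */
unsigned long bitmap_bit_offset(unsigned long blocknr,
				unsigned int blocksize_bits)
{
	unsigned long bits_per_block;

	assert(blocksize_bits < 8 * sizeof(unsigned long) - 3);
	bits_per_block = 1UL << (3 + blocksize_bits);
	return blocknr & (bits_per_block - 1);
}
```

For 1 KiB blocks (blocksize_bits = 10) each bitmap block covers 8192 blocks, so block 8191 maps to bitmap block 0 and block 8192 is the first bit of bitmap block 1.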

[PATCH 09/19] pramfs: inode operations for dirs

2013-09-07 Thread Marco Stornelli
Add inode operations for dirs.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/namei.c |  391 +
 1 files changed, 391 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/namei.c

diff --git a/fs/pramfs/namei.c b/fs/pramfs/namei.c
new file mode 100644
index 000..64dd946
--- /dev/null
+++ b/fs/pramfs/namei.c
@@ -0,0 +1,391 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Inode operations for directories.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#include 
+#include 
+#include "pram.h"
+#include "acl.h"
+#include "xattr.h"
+#include "xip.h"
+
+/*
+ * Couple of helper functions - make the code slightly cleaner.
+ */
+
+static inline void pram_inc_count(struct inode *inode)
+{
+   inc_nlink(inode);
+   pram_write_inode(inode, NULL);
+}
+
+static inline void pram_dec_count(struct inode *inode)
+{
+   if (inode->i_nlink) {
+   drop_nlink(inode);
+   pram_write_inode(inode, NULL);
+   }
+}
+
+static inline int pram_add_nondir(struct inode *dir,
+  struct dentry *dentry,
+  struct inode *inode)
+{
+   int err = pram_add_link(dentry, inode);
+   if (!err) {
+   unlock_new_inode(inode);
+   d_instantiate(dentry, inode);
+   return 0;
+   }
+   pram_dec_count(inode);
+   unlock_new_inode(inode);
+   iput(inode);
+   return err;
+}
+
+/*
+ * Methods themselves.
+ */
+
+static ino_t pram_inode_by_name(struct inode *dir, struct dentry *dentry)
+{
+   struct pram_inode *pi;
+   ino_t ino;
+   int namelen;
+
+   pi = pram_get_inode(dir->i_sb, dir->i_ino);
+   ino = be64_to_cpu(pi->i_type.dir.head);
+
+   mutex_lock(&PRAM_I(dir)->i_link_mutex);
+   while (ino) {
+   pi = pram_get_inode(dir->i_sb, ino);
+
+   if (pi->i_links_count) {
+   namelen = strlen(pi->i_d.d_name);
+
+   if (namelen == dentry->d_name.len &&
+   !memcmp(dentry->d_name.name,
+   pi->i_d.d_name, namelen))
+   break;
+   }
+
+   ino = be64_to_cpu(pi->i_d.d_next);
+   }
+   mutex_unlock(&PRAM_I(dir)->i_link_mutex);
+   return ino;
+}
+
+static struct dentry *pram_lookup(struct inode *dir, struct dentry *dentry,
+ unsigned int flags)
+{
+   struct inode *inode = NULL;
+   ino_t ino;
+
+   if (dentry->d_name.len > PRAM_NAME_LEN)
+   return ERR_PTR(-ENAMETOOLONG);
+
+   ino = pram_inode_by_name(dir, dentry);
+   if (ino) {
+   inode = pram_iget(dir->i_sb, ino);
+   if (inode == ERR_PTR(-ESTALE)) {
+   pram_err(dir->i_sb,
+   "deleted inode referenced: %lu",
+   (unsigned long) ino);
+   return ERR_PTR(-EIO);
+   }
+   }
+
+   return d_splice_alias(inode, dentry);
+}
+
+
+/*
+ * By the time this is called, we already have created
+ * the directory cache entry for the new file, but it
+ * is so far negative - it has no inode.
+ *
+ * If the create succeeds, we fill in the inode information
+ * with d_instantiate().
+ */
+static int pram_create(struct inode *dir, struct dentry *dentry, umode_t mode,
+  bool flags)
+{
+   struct inode *inode = pram_new_inode(dir, mode, &dentry->d_name);
+   int err = PTR_ERR(inode);
+   if (!IS_ERR(inode)) {
+   inode->i_op = &pram_file_inode_operations;
+   if (pram_use_xip(inode->i_sb)) {
+   inode->i_mapping->a_ops = &pram_aops_xip;
+   inode->i_fop = &pram_xip_file_operations;
+   } else {
+   inode->i_fop = &pram_file_operations;
+   inode->i_mapping->a_ops = &pram_aops;
+   }
+   }
+   err = pram_add_nondir(dir, dentry, inode);
+   }
+   return err;
+}
+
+static int pram_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+   struct inode *inode = pram_new_inode(dir, mode, NULL);
+   if (IS_ERR(inode))
+   return PTR_ERR(inode);
+
+   inode->i_op = &pram_file_inode_operations;
+   if (pram_use_xip(inode->i_sb)) {
+   inode->i_mapping->a_ops = &pram_aops_xip;
+   inode->i_fop = &pram_xip_file_operati

[PATCH 08/19] pramfs: file operations for dirs

2013-09-07 Thread Marco Stornelli
Add file operations for dirs.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/dir.c |  226 +++
 1 files changed, 226 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/dir.c

diff --git a/fs/pramfs/dir.c b/fs/pramfs/dir.c
new file mode 100644
index 000..137a592
--- /dev/null
+++ b/fs/pramfs/dir.c
@@ -0,0 +1,226 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * File operations for directories.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include "pram.h"
+
+/*
+ * Parent is locked.
+ */
+int pram_add_link(struct dentry *dentry, struct inode *inode)
+{
+   struct inode *dir = dentry->d_parent->d_inode;
+   struct pram_inode *pidir, *pi, *pitail = NULL;
+   u64 tail_ino, prev_ino;
+
+   const char *name = dentry->d_name.name;
+
+   int namelen = min_t(unsigned int, dentry->d_name.len, PRAM_NAME_LEN);
+
+   pidir = pram_get_inode(dir->i_sb, dir->i_ino);
+
+   mutex_lock(&PRAM_I(dir)->i_link_mutex);
+
+   pi = pram_get_inode(dir->i_sb, inode->i_ino);
+
+   dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+
+   tail_ino = be64_to_cpu(pidir->i_type.dir.tail);
+   if (tail_ino != 0) {
+   pitail = pram_get_inode(dir->i_sb, tail_ino);
+   pram_memunlock_inode(dir->i_sb, pitail);
+   pitail->i_d.d_next = cpu_to_be64(inode->i_ino);
+   pram_memlock_inode(dir->i_sb, pitail);
+
+   prev_ino = tail_ino;
+
+   pram_memunlock_inode(dir->i_sb, pidir);
+   pidir->i_type.dir.tail = cpu_to_be64(inode->i_ino);
+   pidir->i_mtime = cpu_to_be32(dir->i_mtime.tv_sec);
+   pidir->i_ctime = cpu_to_be32(dir->i_ctime.tv_sec);
+   pram_memlock_inode(dir->i_sb, pidir);
+   } else {
+   /* the directory is empty */
+   prev_ino = 0;
+
+   pram_memunlock_inode(dir->i_sb, pidir);
+   pidir->i_type.dir.tail = cpu_to_be64(inode->i_ino);
+   pidir->i_type.dir.head = cpu_to_be64(inode->i_ino);
+   pidir->i_mtime = cpu_to_be32(dir->i_mtime.tv_sec);
+   pidir->i_ctime = cpu_to_be32(dir->i_ctime.tv_sec);
+   pram_memlock_inode(dir->i_sb, pidir);
+   }
+
+
+   pram_memunlock_inode(dir->i_sb, pi);
+   pi->i_d.d_prev = cpu_to_be64(prev_ino);
+   pi->i_d.d_parent = cpu_to_be64(dir->i_ino);
+   memcpy(pi->i_d.d_name, name, namelen);
+   pi->i_d.d_name[namelen] = '\0';
+   pram_memlock_inode(dir->i_sb, pi);
+   mutex_unlock(&PRAM_I(dir)->i_link_mutex);
+   return 0;
+}
+
+int pram_remove_link(struct inode *inode)
+{
+   struct super_block *sb = inode->i_sb;
+   struct pram_inode *prev = NULL;
+   struct pram_inode *next = NULL;
+   struct pram_inode *pidir, *pi;
+   struct inode *dir = NULL;
+
+   pi = pram_get_inode(sb, inode->i_ino);
+   pidir = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_parent));
+   if (!pidir)
+   return -EACCES;
+
+   dir = pram_iget(inode->i_sb, be64_to_cpu(pi->i_d.d_parent));
+   if (IS_ERR(dir))
+   return -EACCES;
+   mutex_lock(&PRAM_I(dir)->i_link_mutex);
+
+   if (inode->i_ino == be64_to_cpu(pidir->i_type.dir.head)) {
+   /* first inode in directory */
+   next = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_next));
+
+   if (next) {
+   pram_memunlock_inode(sb, next);
+   next->i_d.d_prev = 0;
+   pram_memlock_inode(sb, next);
+
+   pram_memunlock_inode(sb, pidir);
+   pidir->i_type.dir.head = pi->i_d.d_next;
+   } else {
+   pram_memunlock_inode(sb, pidir);
+   pidir->i_type.dir.head = 0;
+   pidir->i_type.dir.tail = 0;
+   }
+   pram_memlock_inode(sb, pidir);
+   } else if (inode->i_ino == be64_to_cpu(pidir->i_type.dir.tail)) {
+   /* last inode in directory */
+   prev = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_prev));
+
+   pram_memunlock_inode(sb, prev);
+   prev->i_d.d_next = 0;
+   pram_memlock_inode(sb, prev);
+
+   pram_memunlock_inode(sb, pidir);
+   pidir->i_type.dir.tail = pi->i_d.d_prev;
+   pram_memlock_
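
The directory representation in this patch is a doubly linked list threaded through the inodes: the directory inode stores head and tail inode numbers, each child stores d_prev and d_next, and 0 means "none". A minimal userspace sketch of append and unlink under those assumptions follows. struct demo_inode and both helpers are hypothetical, the inode table is a plain array indexed by inode number, and the removal is written as one unified case rather than the separate head, tail and middle branches of pram_remove_link().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the on-media fields the list uses. */
struct demo_inode {
	unsigned long d_prev, d_next;  /* sibling inode numbers, 0 = none */
	unsigned long head, tail;      /* used only by directory inodes */
};

/* Append inode ino to the end of directory dir's entry list. */
void dir_append(struct demo_inode *t, unsigned long dir, unsigned long ino)
{
	unsigned long tail;

	assert(t != NULL && ino != 0 && ino != dir);

	tail = t[dir].tail;
	if (tail)
		t[tail].d_next = ino;  /* chain after the old tail */
	else
		t[dir].head = ino;     /* directory was empty */
	t[dir].tail = ino;
	t[ino].d_prev = tail;
	t[ino].d_next = 0;
}

/* Unlink inode ino from directory dir's entry list. */
void dir_remove(struct demo_inode *t, unsigned long dir, unsigned long ino)
{
	unsigned long prev, next;

	assert(t != NULL && ino != 0 && ino != dir);

	prev = t[ino].d_prev;
	next = t[ino].d_next;
	if (prev)
		t[prev].d_next = next;
	else
		t[dir].head = next;    /* ino was the first entry */
	if (next)
		t[next].d_prev = prev;
	else
		t[dir].tail = prev;    /* ino was the last entry */
	t[ino].d_prev = 0;
	t[ino].d_next = 0;
}
```

Appending inodes 2, 3 and 4 to directory 1 and then removing 3 leaves 2 and 4 linked to each other, with head 2 and tail 4 unchanged.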

[PATCH 07/19] pramfs: file operations

2013-09-07 Thread Marco Stornelli
Add file operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/file.c |  417 ++
 1 files changed, 417 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/file.c

diff --git a/fs/pramfs/file.c b/fs/pramfs/file.c
new file mode 100644
index 000..5aaba99
--- /dev/null
+++ b/fs/pramfs/file.c
@@ -0,0 +1,417 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * File operations for files.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+#include "acl.h"
+#include "xip.h"
+#include "xattr.h"
+
+/*
+ * The following functions are helper routines that copy to/from
+ * user space and iterate over I/O vectors (mainly for readv/writev).
+ * They are used in the direct I/O path.
+ */
+static size_t __pram_iov_copy_from(char *vaddr,
+   const struct iovec *iov, size_t base, size_t bytes)
+{
+   size_t copied = 0, left = 0;
+
+   while (bytes) {
+   char __user *buf = iov->iov_base + base;
+   int copy = min(bytes, iov->iov_len - base);
+
+   base = 0;
+   left = __copy_from_user(vaddr, buf, copy);
+   copied += copy;
+   bytes -= copy;
+   vaddr += copy;
+   iov++;
+
+   if (unlikely(left))
+   break;
+   }
+   return copied - left;
+}
+
+static size_t __pram_iov_copy_to(char *vaddr,
+   const struct iovec *iov, size_t base, size_t bytes)
+{
+   size_t copied = 0, left = 0;
+
+   while (bytes) {
+   char __user *buf = iov->iov_base + base;
+   int copy = min(bytes, iov->iov_len - base);
+
+   base = 0;
+   left = __copy_to_user(buf, vaddr, copy);
+   copied += copy;
+   bytes -= copy;
+   vaddr += copy;
+   iov++;
+
+   if (unlikely(left))
+   break;
+   }
+   return copied - left;
+}
+
+static size_t pram_iov_copy_from(void *to, struct iov_iter *i, size_t bytes)
+{
+   size_t copied;
+
+   if (likely(i->nr_segs == 1)) {
+   int left;
+   char __user *buf = i->iov->iov_base + i->iov_offset;
+   left = __copy_from_user(to, buf, bytes);
+   copied = bytes - left;
+   } else {
+   copied = __pram_iov_copy_from(to, i->iov, i->iov_offset, bytes);
+   }
+
+   return copied;
+}
+
+static size_t pram_iov_copy_to(void *from, struct iov_iter *i, size_t bytes)
+{
+   size_t copied;
+
+   if (likely(i->nr_segs == 1)) {
+   int left;
+   char __user *buf = i->iov->iov_base + i->iov_offset;
+   left = __copy_to_user(buf, from, bytes);
+   copied = bytes - left;
+   } else {
+   copied = __pram_iov_copy_to(from, i->iov, i->iov_offset, bytes);
+   }
+
+   return copied;
+}
+
+static size_t __pram_clear_user(const struct iovec *iov, size_t base,
+   size_t bytes)
+{
+   size_t cleaned = 0, left = 0;
+
+   while (bytes) {
+   char __user *buf = iov->iov_base + base;
+   int clear = min(bytes, iov->iov_len - base);
+
+   base = 0;
+   left = __clear_user(buf, clear);
+   cleaned += clear;
+   bytes -= clear;
+   iov++;
+
+   if (unlikely(left))
+   break;
+   }
+   return cleaned - left;
+}
+
+static size_t pram_clear_user(struct iov_iter *i, size_t bytes)
+{
+   size_t clear;
+
+   if (likely(i->nr_segs == 1)) {
+   int left;
+   char __user *buf = i->iov->iov_base + i->iov_offset;
+   left = __clear_user(buf, bytes);
+   clear = bytes - left;
+   } else {
+   clear = __pram_clear_user(i->iov, i->iov_offset, bytes);
+   }
+
+   return clear;
+}
+
+static int pram_open_file(struct inode *inode, struct file *filp)
+{
+   filp->f_flags |= O_DIRECT;
+   return generic_file_open(inode, filp);
+}
+
+ssize_t pram_direct_IO(int rw, struct kiocb *iocb,
+  const struct iovec *iov,
+  loff_t offset, unsigned long nr_segs)
+{
+   struct file *file = iocb->ki_filp;
+   struct inode *inode = file->f_mapping->host;
+   struct super_block *sb = inode->i_s
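
The segment-walking loop in `__pram_iov_copy_from()` above can be tried out in userspace. The following is a minimal sketch, assuming `memcpy()` as a stand-in for `__copy_from_user()` (so the partial-copy "left" case does not arise) and adding an `nr_segs` bound that the kernel helper does not take. The name `iov_gather` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Gather up to `bytes` bytes from an iovec array into dst, starting
 * `base` bytes into the first segment. Returns the number of bytes
 * actually copied, which is smaller than `bytes` if the segments
 * run out first.
 */
static size_t iov_gather(char *dst, const struct iovec *iov, size_t nr_segs,
                         size_t base, size_t bytes)
{
    size_t copied = 0;

    while (bytes && nr_segs) {
        const char *src;
        size_t avail, copy;

        assert(base <= iov->iov_len);
        src = (const char *)iov->iov_base + base;
        avail = iov->iov_len - base;
        copy = bytes < avail ? bytes : avail;

        memcpy(dst, src, copy);
        base = 0;           /* only the first segment is entered mid-way */
        copied += copy;
        bytes -= copy;
        dst += copy;
        iov++;
        nr_segs--;
    }
    return copied;
}
```

As in the patch, the offset applies only to the first segment and is reset to zero before moving on to the next one.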

[PATCH 05/19] pramfs: super block operations

2013-09-07 Thread Marco Stornelli
Add super block operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/super.c |  994 +
 1 files changed, 994 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/super.c

diff --git a/fs/pramfs/super.c b/fs/pramfs/super.c
new file mode 100644
index 000..1c9801c
--- /dev/null
+++ b/fs/pramfs/super.c
@@ -0,0 +1,994 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Super block operations.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "xattr.h"
+#include "pram.h"
+
+static struct super_operations pram_sops;
+static const struct export_operations pram_export_ops;
+static struct kmem_cache *pram_inode_cachep;
+
+#ifdef CONFIG_PRAMFS_TEST
+static void *first_pram_super;
+
+struct pram_super_block *get_pram_super(void)
+{
+   return (struct pram_super_block *)first_pram_super;
+}
+EXPORT_SYMBOL(get_pram_super);
+#endif
+
+void pram_error_mng(struct super_block *sb, const char *fmt, ...)
+{
+   va_list args;
+
+   va_start(args, fmt);
+   printk(KERN_ERR "pramfs error: ");
+   vprintk(fmt, args);
+   printk("\n");
+   va_end(args);
+
+   if (test_opt(sb, ERRORS_PANIC))
+   panic("pramfs: panic from previous error\n");
+   if (test_opt(sb, ERRORS_RO)) {
+   printk(KERN_CRIT "pramfs err: remounting filesystem read-only");
+   sb->s_flags |= MS_RDONLY;
+   }
+}
+
+static void pram_set_blocksize(struct super_block *sb, unsigned long size)
+{
+   int bits;
+
+   /*
+* We've already validated the user input and the value here must be
+* between PRAM_MAX_BLOCK_SIZE and PRAM_MIN_BLOCK_SIZE
+* and it must be a power of 2.
+*/
+   bits = fls(size) - 1;
+   sb->s_blocksize_bits = bits;
+   sb->s_blocksize = (1 << bits);
+}
[...]
+   if (res > MAX_LFS_FILESIZE)
+   res = MAX_LFS_FILESIZE;
+
+   pram_info("max file size %llu bytes\n", res);
+   return res;
+}
+
+enum {
+   Opt_addr, Opt_bpi, Opt_size,
+   Opt_num_inodes, Opt_mode, Opt_uid,
+   Opt_gid, Opt_blocksize, Opt_user_xattr,
+   Opt_nouser_xattr, Opt_noprotect,
+   Opt_acl, Opt_noacl, Opt_xip,
+   Opt_err_cont, Opt_err_panic, Opt_err_ro,
+   Opt_err
+};
+
+static const match_table_t tokens = {
+   {Opt_addr,  "physaddr=%x"},
+   {Opt_bpi,   "bpi=%u"},
+   {Opt_size,  "init=%s"},
+   {Opt_num_inodes,"N=%u"},
+   {Opt_mode,  "mode=%o"},
+   {Opt_uid,   "uid=%u"},
+   {Opt_gid,   "gid=%u"},
+   {Opt_blocksize, "bs=%s"},
+   {Opt_user_xattr,"user_xattr"},
+   {Opt_nouser_xattr, "nouser_xattr"},
+   {Opt_noprotect, "noprotect"},
+   {Opt_acl,   "acl"},
+   {Opt_noacl, "noacl"},
+   {Opt_xip,   "xip"},
+   {Opt_err_cont,  "errors=continue"},
+   {Opt_err_panic, "errors=panic"},
+   {Opt_err_ro,"errors=remount-ro"},
+   {Opt_err,   NULL},
+};
+
+static phys_addr_t get_phys_addr(void **data)
+{
+   phys_addr_t phys_addr;
+   char *options = (char *) *data;
+   unsigned long long ulltmp;
+   char *end;
+   char org_end;
+   int err;
+
+   if (!options || strncmp(options, "physaddr=", 9) != 0)
+   return (phys_addr_t)ULLONG_MAX;
+   options += 9;
+   end = strchr(options, ',') ?: options + strlen(options);
+   org_end = *end;
+   *end = '\0';
+   err = kstrtoull(options, 0, &ulltmp);
+   *end = org_end;
+   options = end;
+   phys_addr = (phys_addr_t)ulltmp;
+   if (err) {
+   printk(KERN_ERR "Invalid phys addr specification: %s\n",
+  (char *) *data);
+   return (phys_addr_t)ULLONG_MAX;
+   }
+   if (phys_addr & (PAGE_SIZE - 1)) {
+   printk(KERN_ERR "physical address 0x%16llx for pramfs isn't "
+ "aligned to a page boundary\n",
+ (u64)phys_addr);
+   return (phys_addr_t)ULLO
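
The checks made by `get_phys_addr()` above (the option must come first, must parse as a number, and must be page aligned) can be illustrated with a small userspace sketch. It assumes `strtoull()` in place of the kernel's `kstrtoull()` and a fixed 4096-byte page. `parse_physaddr` and `SKETCH_PAGE_SIZE` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed page size, illustration only */

/*
 * Parse a leading "physaddr=<number>" from a comma-separated option
 * string. Returns 0 and stores the address on success, or -1 if the
 * option is missing, malformed, or not aligned to a page boundary.
 */
static int parse_physaddr(const char *options, unsigned long long *addr)
{
    char *end;
    unsigned long long val;

    assert(addr != NULL);
    if (!options || strncmp(options, "physaddr=", 9) != 0)
        return -1;
    options += 9;

    errno = 0;
    val = strtoull(options, &end, 0);   /* base 0: accepts 0x... and decimal */
    if (errno || end == options || (*end != ',' && *end != '\0'))
        return -1;
    if (val & (SKETCH_PAGE_SIZE - 1))
        return -1;

    *addr = val;
    return 0;
}
```

Unlike the kernel function, the sketch does not modify the option string in place; it simply checks that the number ends at a comma or at the end of the string.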

[PATCH 06/19] pramfs: inode operations

2013-09-07 Thread Marco Stornelli
Add inode operations.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/inode.c |  907 +
 1 files changed, 907 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/inode.c

diff --git a/fs/pramfs/inode.c b/fs/pramfs/inode.c
new file mode 100644
index 000..d0f03b1
--- /dev/null
+++ b/fs/pramfs/inode.c
@@ -0,0 +1,907 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Inode methods (allocate/free/read/write).
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pram.h"
+#include "xattr.h"
+#include "xip.h"
+#include "acl.h"
+
+struct backing_dev_info pram_backing_dev_info __read_mostly = {
+   .ra_pages   = 0,/* No readahead */
+   .capabilities   = BDI_CAP_NO_ACCT_AND_WRITEBACK,
+};
+
+/*
+ * allocate a data block for inode and return its absolute blocknr.
+ * Zeroes out the block if zero is set. Increments inode->i_blocks.
+ */
+static int pram_new_data_block(struct inode *inode, unsigned long *blocknr,
+  int zero)
+{
+   int errval = pram_new_block(inode->i_sb, blocknr, zero);
+
+   if (!errval) {
+   struct pram_inode *pi = pram_get_inode(inode->i_sb,
+   inode->i_ino);
+   inode->i_blocks++;
+   pram_memunlock_inode(inode->i_sb, pi);
+   pi->i_blocks = cpu_to_be32(inode->i_blocks);
+   pram_memlock_inode(inode->i_sb, pi);
+   }
+
+   return errval;
+}
+
+/*
+ * find the offset to the block represented by the given inode's file
+ * relative block number.
+ */
+u64 pram_find_data_block(struct inode *inode, unsigned long file_blocknr)
+{
+   struct super_block *sb = inode->i_sb;
+   struct pram_inode *pi;
+   u64 *row; /* ptr to row block */
+   u64 *col; /* ptr to column blocks */
+   u64 bp = 0;
+   unsigned int i_row, i_col;
+   unsigned int N = sb->s_blocksize >> 3; /* num block ptrs per block */
+   unsigned int Nbits = sb->s_blocksize_bits - 3;
+
+   pi = pram_get_inode(sb, inode->i_ino);
+
+   i_row = file_blocknr >> Nbits;
+   i_col  = file_blocknr & (N-1);
+
+   row = pram_get_block(sb, be64_to_cpu(pi->i_type.reg.row_block));
+   if (row) {
+   col = pram_get_block(sb, be64_to_cpu(row[i_row]));
+   if (col)
+   bp = be64_to_cpu(col[i_col]);
+   }
+
+   return bp;
+}
+
+/*
+ * find the file offset for SEEK_DATA/SEEK_HOLE
+ */
+int pram_find_region(struct inode *inode, loff_t *offset, int hole)
+{
+   struct super_block *sb = inode->i_sb;
+   struct pram_inode *pi = pram_get_inode(sb, inode->i_ino);
+   int N = sb->s_blocksize >> 3; /* num block ptrs per block */
+   int first_row_index, last_row_index, i, j;
+   unsigned long first_blocknr, last_blocknr, blocks = 0, offset_in_block;
+   u64 *row; /* ptr to row block */
+   u64 *col; /* ptr to column blocks */
+   int data_found = 0, hole_found = 0;
+
+   if (*offset >= inode->i_size)
+   return -ENXIO;
+
+   if (!inode->i_blocks || !pi->i_type.reg.row_block) {
+   if (hole)
+   return inode->i_size;
+   else
+   return -ENXIO;
+   }
+
+   offset_in_block = *offset & (sb->s_blocksize - 1);
+
+   first_blocknr = *offset >> sb->s_blocksize_bits;
+   last_blocknr = inode->i_size >> sb->s_blocksize_bits;
+
+   first_row_index = first_blocknr >> (sb->s_blocksize_bits - 3);
+   last_row_index  = last_blocknr >> (sb->s_blocksize_bits - 3);
+
+   row = pram_get_block(sb, be64_to_cpu(pi->i_type.reg.row_block));
+
+   for (i = first_row_index; i <= last_row_index; i++) {
+   int first_col_index = (i == first_row_index) ?
+   first_blocknr & (N-1) : 0;
+   int last_col_index = (i == last_row_index) ?
+   last_blocknr & (N-1) : N-1;
+
+   if (!row[i]) {
+   hole_found = 1;
+   if (!hole)
+   blocks += sb->s_blocksize >> 3;
+   continue;
+   }
+
+   col = pram_get_block(sb, be64_to_cpu(row[i]));
+
+   for (j = first_col_index; j <= last_col_index; j++) {
+
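
`pram_find_data_block()` above splits a file-relative block number into a row index and a column index, based on how many 8-byte block pointers fit in one block. A minimal sketch of just that arithmetic follows. `split_blocknr` and `struct blk_index` are hypothetical names; the shift and mask are the ones used in the patch.

```c
#include <assert.h>

struct blk_index {
    unsigned int row;   /* index into the row block */
    unsigned int col;   /* index into the selected column block */
};

/*
 * With a block size of 2^blocksize_bits bytes and 8-byte pointers,
 * each block holds N = 2^(blocksize_bits - 3) pointers. The high bits
 * of the block number select the row, the low bits select the column.
 */
static struct blk_index split_blocknr(unsigned long file_blocknr,
                                      unsigned int blocksize_bits)
{
    unsigned int nbits;
    unsigned long n;
    struct blk_index idx;

    assert(blocksize_bits > 3);
    nbits = blocksize_bits - 3;
    n = 1UL << nbits;

    idx.row = (unsigned int)(file_blocknr >> nbits);
    idx.col = (unsigned int)(file_blocknr & (n - 1));
    return idx;
}
```

For example, with 4096-byte blocks (512 pointers per block), file block 1000 lands in row 1, column 488.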

[PATCH 04/19] pramfs: add include files

2013-09-07 Thread Marco Stornelli
Added include files.

Signed-off-by: Marco Stornelli 
---
 fs/pramfs/pram.h |  283 ++
 include/linux/pram_fs.h  |   48 +++
 include/uapi/linux/Kbuild|1 +
 include/uapi/linux/magic.h   |1 +
 include/uapi/linux/pram_fs.h |  192 
 5 files changed, 525 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/pram.h
 create mode 100644 include/linux/pram_fs.h
 create mode 100644 include/uapi/linux/pram_fs.h

diff --git a/fs/pramfs/pram.h b/fs/pramfs/pram.h
new file mode 100644
index 000..08d724b
--- /dev/null
+++ b/fs/pramfs/pram.h
@@ -0,0 +1,283 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Definitions for the PRAMFS filesystem.
+ *
+ * Copyright 2009-2011 Marco Stornelli 
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#ifndef __PRAM_H
+#define __PRAM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "wprotect.h"
+
+/*
+ * Debug code
+ */
+#ifdef pr_fmt
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#endif
+
+#define pram_dbg(s, args...)   pr_debug(s, ## args)
+#define pram_warn(s, args...)  pr_warning(s, ## args)
+#define pram_info(s, args...)  pr_info(s, ## args)
+
+#ifdef CONFIG_PRINTK
+#define pram_err(sb, s, args...)pram_error_mng(sb, s, ## args)
+#else
+#define pram_err(sb, s, args...)   \
+do {   \
+   no_printk(s, ## args);  \
+   pram_error_mng(sb, " ");\
+} while (0)
+#endif
+
+#define pram_set_bit   __test_and_set_bit_le
+#define pram_clear_bit __test_and_clear_bit_le
+#define pram_find_next_zero_bitfind_next_zero_bit_le
+
+#define clear_opt(o, opt)  (o &= ~PRAM_MOUNT_##opt)
+#define set_opt(o, opt)(o |= PRAM_MOUNT_##opt)
+#define test_opt(sb, opt)  (PRAM_SB(sb)->s_mount_opt & PRAM_MOUNT_##opt)
+
+/* Function Prototypes */
+extern void pram_error_mng(struct super_block *sb, const char *fmt, ...);
+
+/* file.c */
+extern ssize_t pram_direct_IO(int rw, struct kiocb *iocb,
+ const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs);
+extern int pram_mmap(struct file *file, struct vm_area_struct *vma);
+
+/* balloc.c */
+extern void pram_init_bitmap(struct super_block *sb);
+extern void pram_free_block(struct super_block *sb, unsigned long blocknr);
+extern int pram_new_block(struct super_block *sb, unsigned long *blocknr,
+ int zero);
+extern unsigned long pram_count_free_blocks(struct super_block *sb);
+
+/* dir.c */
+extern int pram_add_link(struct dentry *dentry, struct inode *inode);
+extern int pram_remove_link(struct inode *inode);
+
+/* namei.c */
+extern struct dentry *pram_get_parent(struct dentry *child);
+
+/* inode.c */
+extern int pram_alloc_blocks(struct inode *inode, int file_blocknr,
+unsigned int num);
+extern u64 pram_find_data_block(struct inode *inode,
+   unsigned long file_blocknr);
+
+extern struct inode *pram_iget(struct super_block *sb, unsigned long ino);
+extern void pram_put_inode(struct inode *inode);
+extern void pram_evict_inode(struct inode *inode);
+extern struct inode *pram_new_inode(struct inode *dir, umode_t mode,
+   const struct qstr *qstr);
+extern int pram_update_inode(struct inode *inode);
+extern int pram_write_inode(struct inode *inode, struct writeback_control *wbc);
+extern void pram_dirty_inode(struct inode *inode, int flags);
+extern int pram_notify_change(struct dentry *dentry, struct iattr *attr);
+extern void pram_set_inode_flags(struct inode *inode, struct pram_inode *pi);
+extern void pram_get_inode_flags(struct inode *inode, struct pram_inode *pi);
+extern int pram_find_region(struct inode *inode, loff_t *offset, int hole);
+
+/* ioctl.c */
+extern long pram_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+#ifdef CONFIG_COMPAT
+extern long pram_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg);
+#endif
+
+/* super.c */
+#ifdef CONFIG_PRAMFS_TEST
+extern struct pram_super_block *get_pram_super(void);
+#endif
+extern struct super_block *pram_read_super(struct super_block *sb,
+ void *data,
+ int silent);
+extern int pram_statfs(struct dentry *d, st

[PATCH 03/19] pramfs: export xip_file_fault

2013-09-07 Thread Marco Stornelli
Export xip_file_fault to modules.

Signed-off-by: Marco Stornelli 
---
 include/linux/fs.h |2 ++
 mm/filemap_xip.c   |3 ++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3b4cd82..1f61e07 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -41,6 +41,7 @@ struct kobject;
 struct pipe_inode_info;
 struct poll_table_struct;
 struct kstatfs;
+struct vm_fault;
 struct vm_area_struct;
 struct vfsmount;
 struct cred;
@@ -2445,6 +2446,7 @@ extern int nonseekable_open(struct inode * inode, struct file * filp);
 #ifdef CONFIG_FS_XIP
 extern ssize_t xip_file_read(struct file *filp, char __user *buf, size_t len,
 loff_t *ppos);
+extern int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern int xip_file_mmap(struct file * file, struct vm_area_struct * vma);
 extern ssize_t xip_file_write(struct file *filp, const char __user *buf,
  size_t len, loff_t *ppos);
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index 28fe26b..50bbc5d 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -219,7 +219,7 @@ retry:
  *
  * This function is derived from filemap_fault, but used for execute in place
  */
-static int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
struct file *file = vma->vm_file;
struct address_space *mapping = file->f_mapping;
@@ -303,6 +303,7 @@ out:
}
 }
 
+EXPORT_SYMBOL_GPL(xip_file_fault);
 static const struct vm_operations_struct xip_file_vm_ops = {
.fault  = xip_file_fault,
.page_mkwrite   = filemap_page_mkwrite,
-- 
1.7.3.4


[PATCH 02/19] pramfs: add x86 set_memory_{rw|ro} flag

2013-09-07 Thread Marco Stornelli
Add a Kconfig flag (HAVE_SET_MEMORY_RO) that indicates whether the
architecture supports set_memory_{rw|ro}, and select it on x86.

Signed-off-by: Marco Stornelli 
---
 arch/Kconfig |3 +++
 arch/x86/Kconfig |1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 1feb169..589a043 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -171,6 +171,9 @@ config USER_RETURN_NOTIFIER
 config HAVE_IOREMAP_PROT
bool
 
+config HAVE_SET_MEMORY_RO
+   bool
+
 config HAVE_KPROBES
bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c0ed72..c500462 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -30,6 +30,7 @@ config X86
select HAVE_OPROFILE
select HAVE_PCSPKR_PLATFORM
select HAVE_PERF_EVENTS
+   select HAVE_SET_MEMORY_RO
select HAVE_IOREMAP_PROT
select HAVE_KPROBES
select HAVE_MEMBLOCK
-- 
1.7.3.4


[PATCH 01/19] pramfs: documentation

2013-09-07 Thread Marco Stornelli
Added pramfs documentation.

Signed-off-by: Marco Stornelli 
---
 Documentation/filesystems/pramfs.txt |  177 ++
 Documentation/filesystems/xip.txt|2 +
 MAINTAINERS  |9 ++
 3 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/pramfs.txt

diff --git a/Documentation/filesystems/pramfs.txt b/Documentation/filesystems/pramfs.txt
new file mode 100644
index 000..a61d7a0
--- /dev/null
+++ b/Documentation/filesystems/pramfs.txt
@@ -0,0 +1,177 @@
+
+PRAMFS Overview
+===============
+
+Many embedded systems have a block of non-volatile RAM separate from
+normal system memory, i.e. of which the kernel maintains no memory page
+descriptors. For such systems it would be beneficial to mount a
+fast read/write filesystem over this "I/O memory", for storing frequently
+accessed data that must survive system reboots and power cycles, or volatile
+data that should not be written to a disk or flash. An example usage might be
+system logs under /var/log or the debug information of a flight recorder.
+
+Linux traditionally had no support for a persistent, non-volatile RAM-based
+filesystem, persistent meaning the filesystem survives a system reboot
+or power cycle intact. The RAM-based filesystems such as tmpfs and ramfs
+have no actual backing store but exist entirely in the page and buffer
+caches, hence the filesystem disappears after a system reboot or
+power cycle.
+
+A relatively straightforward solution is to write a simple block driver
+for the non-volatile RAM, and mount over it any disk-based filesystem such
+as ext2, ext3, ext4, etc.
+
+But the disk-based fs over non-volatile RAM block driver approach has
+some drawbacks:
+
+1. Complexity of disk-based fs: disk-based filesystems such as ext2/ext3/ext4
+   were designed for optimum performance on spinning disk media, so they
+   implement features such as block groups, which attempt to group inode data
+   into a contiguous set of data blocks to minimize disk seeking when accessing
+   files. For RAM there is no such concern; a file's data blocks can be
+   scattered throughout the media with no access speed penalty at all. So block
+   groups in a filesystem mounted over RAM just add unnecessary
+   complexity. A better approach is to use a filesystem specifically
+   tailored to RAM media which does away with these disk-based features.
+   This increases the efficient use of space on the media, i.e. more
+   space is dedicated to actual file data storage and less to meta-data
+   needed to maintain that file data.
+
+2. Different problems between disks and RAM: Because PRAMFS attempts to avoid
+   filesystem corruption caused by kernel bugs, dirty pages in the page cache
+   are not allowed to be written back to the backing-store RAM. This way, an
+   errant write into the page cache will not get written back to the
+   filesystem. However, if the backing-store RAM is comparable in access speed
+   to system memory, the penalty of not using caching is minimal. With this
+   consideration it's better to move file data directly between the user
+   buffers and the backing-store RAM, i.e. use direct I/O. This prevents the
+   unnecessary populating of the page cache with dirty pages. However, direct
+   I/O has to be enabled at every file open. To enable direct I/O at all times
+   for all regular files requires either that applications be modified to
+   include the O_DIRECT flag on all file opens, or that the filesystem used
+   performs direct I/O by default.
+
+The Persistent/Protected RAM Special Filesystem (PRAMFS) is a read/write
+filesystem that has been designed to address these issues. PRAMFS is targeted
+to fast I/O memory, and if the memory is non-volatile, the filesystem will be
+persistent.
+
+In PRAMFS, direct I/O is enabled across all files in the filesystem, in other
+words the O_DIRECT flag is forced on every open of a PRAMFS file. Also, file
+I/O in the PRAMFS is always synchronous. There is no need to block the current
+process while the transfer to/from the PRAMFS is in progress, since one of
+the requirements of the PRAMFS is that the filesystem exists in fast RAM. So
+file I/O in PRAMFS is always direct, synchronous, and never blocks.
+
+PRAMFS supports execute-in-place (XIP). With XIP, instead of doing
+memory-to-memory copies to transfer data between user space and kernel
+space, read and write operations are performed directly on the memory. For
+file mappings, the RAM itself is mapped directly into userspace. XIP also
+speeds up application start-up time because it removes the need for
+any copies.
+
+PRAMFS is write protected. The page table entries that map the backing-store
+RAM are normally marked read-only. Write operations into the filesystem
+temporarily mark the affected pages as writeable, the write operation is
+carried out with locks held, and then the page table entries are
+marked read-only again.
+Th
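
The write-protection window described above (read-only by default, writable only for the duration of a write) can be mimicked in userspace with `mprotect()`. This is a rough analogy only. The kernel code changes page table entries through its own helpers (the wprotect.c and wprotect.h files listed in the diffstat), which are not shown in this excerpt, and the locking mentioned in the text is left out. `protected_write` is a hypothetical name.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS when callers use mmap() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Write `len` bytes at offset `off` into a region that is normally
 * mapped read-only. The region is made writable only while the copy
 * runs and is re-protected afterwards. `region` must be page aligned
 * (for example, obtained from mmap()).
 * Returns 0 on success, -1 on a bounds or mprotect() failure.
 */
static int protected_write(void *region, size_t region_len,
                           size_t off, const void *src, size_t len)
{
    assert(region != NULL);
    if (off > region_len || len > region_len - off)
        return -1;

    if (mprotect(region, region_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy((char *)region + off, src, len);
    return mprotect(region, region_len, PROT_READ);
}
```

A stray store into the region outside `protected_write()` faults instead of silently corrupting data, which is the property the documentation is describing.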

[PATCH 00/19] pramfs

2013-09-07 Thread Marco Stornelli
Hi all,

this is an attempt to include pramfs in mainline. At the moment pramfs
is included in the LTSI kernel. Since the last review the code is more
or less the same but, with really big thanks to Vladimir Davydov and
Parallels, development of an fsck tool has started and we can now
correct fs errors due to corruption. It's a "young"
tool but we are working on it. You can clone the code from our repos:

git clone git://git.code.sf.net/p/pramfs/code pramfs-code
git clone git://git.code.sf.net/p/pramfs/Tools pramfs-Tools

Marco Stornelli (19):
  pramfs: documentation
  pramfs: add x86 set_memory_{rw|ro} flag
  pramfs: export xip_file_fault
  pramfs: add include files
  pramfs: super block operations
  pramfs: inode operations
  pramfs: file operations
  pramfs: file operations for dirs
  pramfs: inode operations for dirs
  pramfs: block allocation
  pramfs: ioctl operations
  pramfs: symlink operations
  pramfs: xip operations
  pramfs: extended attributes block description tree
  pramfs: extended attributes
  pramfs: acl operations
  pramfs: write protection
  pramfs: test module
  pramfs: Kconfig and makefile

 Documentation/filesystems/pramfs.txt |  177 ++
 Documentation/filesystems/xip.txt|2 +
 MAINTAINERS  |9 +
 arch/Kconfig |3 +
 arch/x86/Kconfig |1 +
 fs/Kconfig   |6 +-
 fs/Makefile  |1 +
 fs/pramfs/Kconfig|   72 +++
 fs/pramfs/Makefile   |   14 +
 fs/pramfs/acl.c  |  415 +
 fs/pramfs/acl.h  |   85 +++
 fs/pramfs/balloc.c   |  160 +
 fs/pramfs/desctree.c |  181 ++
 fs/pramfs/desctree.h |   44 ++
 fs/pramfs/dir.c  |  226 +++
 fs/pramfs/file.c |  417 +
 fs/pramfs/inode.c|  907 +++
 fs/pramfs/ioctl.c|  127 
 fs/pramfs/namei.c|  391 
 fs/pramfs/pram.h |  283 +
 fs/pramfs/pramfs_test.c  |   47 ++
 fs/pramfs/super.c|  994 ++
 fs/pramfs/symlink.c  |   76 +++
 fs/pramfs/wprotect.c |   39 ++
 fs/pramfs/wprotect.h |  144 +
 fs/pramfs/xattr.c| 1118 ++
 fs/pramfs/xattr.h|   92 +++
 fs/pramfs/xattr_security.c   |   80 +++
 fs/pramfs/xattr_trusted.c|   65 ++
 fs/pramfs/xattr_user.c   |   69 +++
 fs/pramfs/xip.c  |  119 
 fs/pramfs/xip.h  |   33 +
 include/linux/fs.h   |2 +
 include/linux/pram_fs.h  |   48 ++
 include/uapi/linux/Kbuild|1 +
 include/uapi/linux/magic.h   |1 +
 include/uapi/linux/pram_fs.h |  192 ++
 mm/filemap_xip.c |3 +-
 38 files changed, 6641 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/filesystems/pramfs.txt
 create mode 100644 fs/pramfs/Kconfig
 create mode 100644 fs/pramfs/Makefile
 create mode 100644 fs/pramfs/acl.c
 create mode 100644 fs/pramfs/acl.h
 create mode 100644 fs/pramfs/balloc.c
 create mode 100644 fs/pramfs/desctree.c
 create mode 100644 fs/pramfs/desctree.h
 create mode 100644 fs/pramfs/dir.c
 create mode 100644 fs/pramfs/file.c
 create mode 100644 fs/pramfs/inode.c
 create mode 100644 fs/pramfs/ioctl.c
 create mode 100644 fs/pramfs/namei.c
 create mode 100644 fs/pramfs/pram.h
 create mode 100644 fs/pramfs/pramfs_test.c
 create mode 100644 fs/pramfs/super.c
 create mode 100644 fs/pramfs/symlink.c
 create mode 100644 fs/pramfs/wprotect.c
 create mode 100644 fs/pramfs/wprotect.h
 create mode 100644 fs/pramfs/xattr.c
 create mode 100644 fs/pramfs/xattr.h
 create mode 100644 fs/pramfs/xattr_security.c
 create mode 100644 fs/pramfs/xattr_trusted.c
 create mode 100644 fs/pramfs/xattr_user.c
 create mode 100644 fs/pramfs/xip.c
 create mode 100644 fs/pramfs/xip.h
 create mode 100644 include/linux/pram_fs.h
 create mode 100644 include/uapi/linux/pram_fs.h

-- 
1.7.3.4
---



[PATCH 01/19] pramfs: documentation

2013-09-07 Thread Marco Stornelli
Added pramfs documentation.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 Documentation/filesystems/pramfs.txt |  177 ++
 Documentation/filesystems/xip.txt|2 +
 MAINTAINERS  |9 ++
 3 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/pramfs.txt

diff --git a/Documentation/filesystems/pramfs.txt b/Documentation/filesystems/pramfs.txt
new file mode 100644
index 000..a61d7a0
--- /dev/null
+++ b/Documentation/filesystems/pramfs.txt
@@ -0,0 +1,177 @@
+
+PRAMFS Overview
+===============
+
+Many embedded systems have a block of non-volatile RAM separate from
+normal system memory, i.e. of which the kernel maintains no memory page
+descriptors. For such systems it would be beneficial to mount a
+fast read/write filesystem over this I/O memory, for storing frequently
+accessed data that must survive system reboots and power cycles, or volatile
+data that should not be written to disk or flash. Example uses are system
+logs under /var/log or the debug information of a flight recorder.
+
+Linux traditionally had no support for a persistent, non-volatile RAM-based
+filesystem, persistent meaning the filesystem survives a system reboot
+or power cycle intact. The RAM-based filesystems such as tmpfs and ramfs
+have no actual backing store but exist entirely in the page and buffer
+caches, hence the filesystem disappears after a system reboot or
+power cycle.
+
+A relatively straightforward solution is to write a simple block driver
+for the non-volatile RAM, and mount over it any disk-based filesystem such
+as ext2, ext3, ext4, etc.
+
+But the disk-based fs over non-volatile RAM block driver approach has
+some drawbacks:
+
+1. Complexity of disk-based fs: disk-based filesystems such as ext2/ext3/ext4
+   were designed for optimum performance on spinning disk media, so they
+   implement features such as block groups, which attempt to group inode data
+   into a contiguous set of data blocks to minimize disk seeking when accessing
+   files. For RAM there is no such concern; a file's data blocks can be
+   scattered throughout the media with no access speed penalty at all. So block
+   groups in a filesystem mounted over RAM just add unnecessary
+   complexity. A better approach is to use a filesystem specifically
+   tailored to RAM media which does away with these disk-based features.
+   This increases the efficient use of space on the media, i.e. more
+   space is dedicated to actual file data storage and less to meta-data
+   needed to maintain that file data.
+
+2. Different problems between disks and RAM: Because PRAMFS attempts to avoid
+   filesystem corruption caused by kernel bugs, dirty pages in the page cache
+   are not allowed to be written back to the backing-store RAM. This way, an
+   errant write into the page cache will not get written back to the filesystem.
+   However, if the backing-store RAM is comparable in access speed to system
+   memory, the penalty of not using caching is minimal. With this consideration
+   it's better to move file data directly between the user buffers and the backing
+   store RAM, i.e. use direct I/O. This prevents the unnecessary populating of
+   the page cache with dirty pages. However direct I/O has to be enabled at
+   every file open. To enable direct I/O at all times for all regular files
+   requires either that applications be modified to include the O_DIRECT flag on
+   all file opens, or that the filesystem used performs direct I/O by default.
+
+The Persistent/Protected RAM Special Filesystem (PRAMFS) is a read/write
+filesystem that has been designed to address these issues. PRAMFS is targeted
+to fast I/O memory, and if the memory is non-volatile, the filesystem will be
+persistent.
+
+In PRAMFS, direct I/O is enabled across all files in the filesystem, in other
+words the O_DIRECT flag is forced on every open of a PRAMFS file. Also, file
+I/O in the PRAMFS is always synchronous. There is no need to block the current
+process while the transfer to/from the PRAMFS is in progress, since one of
+the requirements of the PRAMFS is that the filesystem exists in fast RAM. So
+file I/O in PRAMFS is always direct, synchronous, and never blocks.
+
+PRAMFS supports execute-in-place (XIP). With XIP, instead of doing
+memory-to-memory copies to transfer data between user space and kernel
+space, read/write operations are performed directly from/to the memory. For
+file mappings, the RAM itself is mapped directly into userspace. In addition,
+XIP speeds up application start-up time because it removes the
+need for any copies.
+
+PRAMFS is write protected. The page table entries that map the backing-store
+RAM are normally marked read-only. Write operations into the filesystem
+temporarily mark the affected pages as writeable, the write operation is
+carried out with locks held, and then the page table entries are
+marked read-only
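
The write-protection scheme described above (pages normally read-only, opened for writing only for the duration of a filesystem write) can be illustrated with a userspace analogy. This is a minimal sketch, not the PRAMFS implementation: it assumes a Linux/POSIX system and uses mmap() and mprotect() on an anonymous page in place of the kernel page-table manipulation, and the helper names region_create() and region_write() are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page read-only. It stands in for
 * the write-protected backing-store RAM. Returns NULL on failure. */
static void *region_create(size_t *len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return NULL;
	*len = (size_t)page;
	p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: open a write window, copy, close the window again.
 * Returns 0 on success, -1 on a bad range or an mprotect() failure. */
static int region_write(void *region, size_t len, size_t off,
			const void *src, size_t n)
{
	if (off > len || n > len - off)
		return -1;
	if (mprotect(region, len, PROT_READ | PROT_WRITE) != 0)
		return -1;
	memcpy((char *)region + off, src, n);
	return mprotect(region, len, PROT_READ);
}
```

Outside region_write() a stray store into the region faults instead of silently corrupting it, which is the property the documentation describes. The kernel code additionally holds locks around the window; the sketch omits locking.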

[PATCH 02/19] pramfs: add x86 set_memory_{rw|ro} flag

2013-09-07 Thread Marco Stornelli
Add a flag to x86 arch to know if a set_memory_{rw|ro} is supported.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 arch/Kconfig |3 +++
 arch/x86/Kconfig |1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 1feb169..589a043 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -171,6 +171,9 @@ config USER_RETURN_NOTIFIER
 config HAVE_IOREMAP_PROT
bool
 
+config HAVE_SET_MEMORY_RO
+   bool
+
 config HAVE_KPROBES
bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c0ed72..c500462 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -30,6 +30,7 @@ config X86
select HAVE_OPROFILE
select HAVE_PCSPKR_PLATFORM
select HAVE_PERF_EVENTS
+   select HAVE_SET_MEMORY_RO
select HAVE_IOREMAP_PROT
select HAVE_KPROBES
select HAVE_MEMBLOCK
-- 
1.7.3.4


[PATCH 03/19] pramfs: export xip_file_fault

2013-09-07 Thread Marco Stornelli
Export xip_file_fault to modules.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 include/linux/fs.h |2 ++
 mm/filemap_xip.c   |3 ++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3b4cd82..1f61e07 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -41,6 +41,7 @@ struct kobject;
 struct pipe_inode_info;
 struct poll_table_struct;
 struct kstatfs;
+struct vm_fault;
 struct vm_area_struct;
 struct vfsmount;
 struct cred;
@@ -2445,6 +2446,7 @@ extern int nonseekable_open(struct inode * inode, struct file * filp);
 #ifdef CONFIG_FS_XIP
 extern ssize_t xip_file_read(struct file *filp, char __user *buf, size_t len,
 loff_t *ppos);
+extern int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern int xip_file_mmap(struct file * file, struct vm_area_struct * vma);
 extern ssize_t xip_file_write(struct file *filp, const char __user *buf,
  size_t len, loff_t *ppos);
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index 28fe26b..50bbc5d 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -219,7 +219,7 @@ retry:
  *
  * This function is derived from filemap_fault, but used for execute in place
  */
-static int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
struct file *file = vma->vm_file;
struct address_space *mapping = file->f_mapping;
@@ -303,6 +303,7 @@ out:
}
 }
 
+EXPORT_SYMBOL_GPL(xip_file_fault);
 static const struct vm_operations_struct xip_file_vm_ops = {
.fault  = xip_file_fault,
.page_mkwrite   = filemap_page_mkwrite,
-- 
1.7.3.4


[PATCH 04/19] pramfs: add include files

2013-09-07 Thread Marco Stornelli
Added include files.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/pram.h |  283 ++
 include/linux/pram_fs.h  |   48 +++
 include/uapi/linux/Kbuild|1 +
 include/uapi/linux/magic.h   |1 +
 include/uapi/linux/pram_fs.h |  192 
 5 files changed, 525 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/pram.h
 create mode 100644 include/linux/pram_fs.h
 create mode 100644 include/uapi/linux/pram_fs.h

diff --git a/fs/pramfs/pram.h b/fs/pramfs/pram.h
new file mode 100644
index 000..08d724b
--- /dev/null
+++ b/fs/pramfs/pram.h
@@ -0,0 +1,283 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Definitions for the PRAMFS filesystem.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+#ifndef __PRAM_H
+#define __PRAM_H
+
+#include <linux/buffer_head.h>
+#include <linux/pram_fs.h>
+#include <linux/crc32.h>
+#include <linux/mutex.h>
+#include <linux/rcupdate.h>
+#include <linux/types.h>
+#include "wprotect.h"
+
+/*
+ * Debug code
+ */
+#ifdef pr_fmt
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#endif
+
+#define pram_dbg(s, args...)   pr_debug(s, ## args)
+#define pram_warn(s, args...)  pr_warning(s, ## args)
+#define pram_info(s, args...)  pr_info(s, ## args)
+
+#ifdef CONFIG_PRINTK
+#define pram_err(sb, s, args...)	pram_error_mng(sb, s, ## args)
+#else
+#define pram_err(sb, s, args...)	\
+do {					\
+	no_printk(s, ## args);		\
+	pram_error_mng(sb, " ");	\
+} while (0)
+#endif
+
+#define pram_set_bit   __test_and_set_bit_le
+#define pram_clear_bit __test_and_clear_bit_le
+#define pram_find_next_zero_bit	find_next_zero_bit_le
+
+#define clear_opt(o, opt)	(o &= ~PRAM_MOUNT_##opt)
+#define set_opt(o, opt)		(o |= PRAM_MOUNT_##opt)
+#define test_opt(sb, opt)	(PRAM_SB(sb)->s_mount_opt & PRAM_MOUNT_##opt)
+
+/* Function Prototypes */
+extern void pram_error_mng(struct super_block *sb, const char *fmt, ...);
+
+/* file.c */
+extern ssize_t pram_direct_IO(int rw, struct kiocb *iocb,
+ const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs);
+extern int pram_mmap(struct file *file, struct vm_area_struct *vma);
+
+/* balloc.c */
+extern void pram_init_bitmap(struct super_block *sb);
+extern void pram_free_block(struct super_block *sb, unsigned long blocknr);
+extern int pram_new_block(struct super_block *sb, unsigned long *blocknr,
+ int zero);
+extern unsigned long pram_count_free_blocks(struct super_block *sb);
+
+/* dir.c */
+extern int pram_add_link(struct dentry *dentry, struct inode *inode);
+extern int pram_remove_link(struct inode *inode);
+
+/* namei.c */
+extern struct dentry *pram_get_parent(struct dentry *child);
+
+/* inode.c */
+extern int pram_alloc_blocks(struct inode *inode, int file_blocknr,
+unsigned int num);
+extern u64 pram_find_data_block(struct inode *inode,
+   unsigned long file_blocknr);
+
+extern struct inode *pram_iget(struct super_block *sb, unsigned long ino);
+extern void pram_put_inode(struct inode *inode);
+extern void pram_evict_inode(struct inode *inode);
+extern struct inode *pram_new_inode(struct inode *dir, umode_t mode,
+   const struct qstr *qstr);
+extern int pram_update_inode(struct inode *inode);
+extern int pram_write_inode(struct inode *inode, struct writeback_control *wbc);
+extern void pram_dirty_inode(struct inode *inode, int flags);
+extern int pram_notify_change(struct dentry *dentry, struct iattr *attr);
+extern void pram_set_inode_flags(struct inode *inode, struct pram_inode *pi);
+extern void pram_get_inode_flags(struct inode *inode, struct pram_inode *pi);
+extern int pram_find_region(struct inode *inode, loff_t *offset, int hole);
+
+/* ioctl.c */
+extern long pram_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+#ifdef CONFIG_COMPAT
+extern long pram_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg);
+#endif
+
+/* super.c */
+#ifdef CONFIG_PRAMFS_TEST
+extern struct pram_super_block *get_pram_super(void);
+#endif
+extern struct super_block *pram_read_super(struct super_block *sb,
+ void *data
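
The clear_opt/set_opt/test_opt macros in pram.h are ordinary bit-flag helpers built with token pasting. The following is a small standalone sketch of the same pattern. The flag values are illustrative assumptions (the real ones live in pram_fs.h, which is not shown here), and test_opt() is simplified to take the flag word directly rather than a super_block; demo_mount_opts() is a hypothetical helper.

```c
/* Flag values are assumptions for illustration only. */
#define PRAM_MOUNT_PROTECT	0x01u
#define PRAM_MOUNT_XIP		0x02u
#define PRAM_MOUNT_ERRORS_RO	0x04u

/* opt is pasted onto the PRAM_MOUNT_ prefix, so callers write XIP,
 * not PRAM_MOUNT_XIP. */
#define clear_opt(o, opt)	((o) &= ~PRAM_MOUNT_##opt)
#define set_opt(o, opt)		((o) |= PRAM_MOUNT_##opt)
/* Simplified: the patch's version reads PRAM_SB(sb)->s_mount_opt. */
#define test_opt(o, opt)	((o) & PRAM_MOUNT_##opt)

/* Hypothetical helper: build a flag word the way mount-option parsing
 * might, then drop one option again. */
static unsigned int demo_mount_opts(void)
{
	unsigned int opts = 0;

	set_opt(opts, PROTECT);
	set_opt(opts, XIP);
	clear_opt(opts, PROTECT);
	return opts;
}
```

Note that clear_opt() needs the compound `&=` with the complemented mask; a plain assignment would set every other bit instead of clearing one.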

[PATCH 06/19] pramfs: inode operations

2013-09-07 Thread Marco Stornelli
Add inode operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/inode.c |  907 +
 1 files changed, 907 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/inode.c

diff --git a/fs/pramfs/inode.c b/fs/pramfs/inode.c
new file mode 100644
index 000..d0f03b1
--- /dev/null
+++ b/fs/pramfs/inode.c
@@ -0,0 +1,907 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Inode methods (allocate/free/read/write).
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/highuid.h>
+#include <linux/module.h>
+#include <linux/mpage.h>
+#include <linux/backing-dev.h>
+#include "pram.h"
+#include "xattr.h"
+#include "xip.h"
+#include "acl.h"
+
+struct backing_dev_info pram_backing_dev_info __read_mostly = {
+	.ra_pages	= 0,	/* No readahead */
+	.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK,
+};
+
+/*
+ * Allocate a data block for the inode and return its absolute blocknr.
+ * Zeroes out the block if zero is set. Increments inode->i_blocks.
+ */
+static int pram_new_data_block(struct inode *inode, unsigned long *blocknr,
+			       int zero)
+{
+	int errval = pram_new_block(inode->i_sb, blocknr, zero);
+
+	if (!errval) {
+		struct pram_inode *pi = pram_get_inode(inode->i_sb,
+							inode->i_ino);
+		inode->i_blocks++;
+		pram_memunlock_inode(inode->i_sb, pi);
+		pi->i_blocks = cpu_to_be32(inode->i_blocks);
+		pram_memlock_inode(inode->i_sb, pi);
+	}
+
+	return errval;
+}
+
+/*
+ * find the offset to the block represented by the given inode's file
+ * relative block number.
+ */
+u64 pram_find_data_block(struct inode *inode, unsigned long file_blocknr)
+{
+	struct super_block *sb = inode->i_sb;
+	struct pram_inode *pi;
+	u64 *row; /* ptr to row block */
+	u64 *col; /* ptr to column blocks */
+	u64 bp = 0;
+	unsigned int i_row, i_col;
+	unsigned int N = sb->s_blocksize >> 3; /* num block ptrs per block */
+	unsigned int Nbits = sb->s_blocksize_bits - 3;
+
+	pi = pram_get_inode(sb, inode->i_ino);
+
+	i_row = file_blocknr >> Nbits;
+	i_col  = file_blocknr & (N-1);
+
+	row = pram_get_block(sb, be64_to_cpu(pi->i_type.reg.row_block));
+	if (row) {
+		col = pram_get_block(sb, be64_to_cpu(row[i_row]));
+		if (col)
+			bp = be64_to_cpu(col[i_col]);
+	}
+
+	return bp;
+}
+
+/*
+ * find the file offset for SEEK_DATA/SEEK_HOLE
+ */
+int pram_find_region(struct inode *inode, loff_t *offset, int hole)
+{
+	struct super_block *sb = inode->i_sb;
+	struct pram_inode *pi = pram_get_inode(sb, inode->i_ino);
+	int N = sb->s_blocksize >> 3; /* num block ptrs per block */
+	int first_row_index, last_row_index, i, j;
+	unsigned long first_blocknr, last_blocknr, blocks = 0, offset_in_block;
+	u64 *row; /* ptr to row block */
+	u64 *col; /* ptr to column blocks */
+	int data_found = 0, hole_found = 0;
+
+	if (*offset >= inode->i_size)
+		return -ENXIO;
+
+	if (!inode->i_blocks || !pi->i_type.reg.row_block) {
+		if (hole)
+			return inode->i_size;
+		else
+			return -ENXIO;
+	}
+
+	offset_in_block = *offset & (sb->s_blocksize - 1);
+
+	first_blocknr = *offset >> sb->s_blocksize_bits;
+	last_blocknr = inode->i_size >> sb->s_blocksize_bits;
+
+	first_row_index = first_blocknr >> (sb->s_blocksize_bits - 3);
+	last_row_index  = last_blocknr >> (sb->s_blocksize_bits - 3);
+
+	row = pram_get_block(sb, be64_to_cpu(pi->i_type.reg.row_block));
+
+	for (i = first_row_index; i <= last_row_index; i++) {
+		int first_col_index = (i == first_row_index) ?
+			first_blocknr & (N-1) : 0;
+		int last_col_index = (i == last_row_index) ?
+			last_blocknr & (N-1) : N-1;
+
+		if (!row[i]) {
+			hole_found = 1;
+			if (!hole)
+				blocks += sb->s_blocksize >> 3;
+			continue;
+		}
+
+		col = pram_get_block(sb, be64_to_cpu(row[i]));
+
+		for (j = first_col_index; j <= last_col_index; j++) {
+
+			if (col[j]) {
+				data_found = 1;
+   if (!hole
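
pram_find_data_block() and pram_find_region() both rely on splitting a file-relative block number into a row index and a column index: the high bits select an entry in the row block, the low bits select an entry in the column block that entry points to. Below is a standalone sketch of just that arithmetic, assuming 8-byte block pointers as in the patch. The struct and function names are hypothetical, not part of PRAMFS.

```c
/* Position of a data block pointer in the two-level row/column layout. */
struct block_index {
	unsigned int row;	/* slot in the row block */
	unsigned int col;	/* slot in the selected column block */
};

/* blocksize_bits is log2(block size in bytes). Each pointer is 8 bytes,
 * so one block holds 2^(blocksize_bits - 3) pointers. */
static struct block_index split_blocknr(unsigned long file_blocknr,
					unsigned int blocksize_bits)
{
	unsigned int nbits = blocksize_bits - 3;	/* log2(ptrs per block) */
	unsigned long n = 1UL << nbits;			/* ptrs per block */
	struct block_index idx;

	idx.row = (unsigned int)(file_blocknr >> nbits);
	idx.col = (unsigned int)(file_blocknr & (n - 1));
	return idx;
}
```

With 4096-byte blocks (blocksize_bits = 12) a block holds 512 pointers, so file block 1000 maps to row 1, column 488.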

[PATCH 05/19] pramfs: super block operations

2013-09-07 Thread Marco Stornelli
Add super block operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/super.c |  994 +
 1 files changed, 994 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/super.c

diff --git a/fs/pramfs/super.c b/fs/pramfs/super.c
new file mode 100644
index 000..1c9801c
--- /dev/null
+++ b/fs/pramfs/super.c
@@ -0,0 +1,994 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Super block operations.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/parser.h>
+#include <linux/vfs.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+#include <linux/seq_file.h>
+#include <linux/mount.h>
+#include <linux/mm.h>
+#include <linux/ctype.h>
+#include <linux/bitops.h>
+#include <linux/magic.h>
+#include <linux/exportfs.h>
+#include <linux/random.h>
+#include <linux/cred.h>
+#include <linux/backing-dev.h>
+#include <linux/ioport.h>
+#include "xattr.h"
+#include "pram.h"
+
+static struct super_operations pram_sops;
+static const struct export_operations pram_export_ops;
+static struct kmem_cache *pram_inode_cachep;
+
+#ifdef CONFIG_PRAMFS_TEST
+static void *first_pram_super;
+
+struct pram_super_block *get_pram_super(void)
+{
+   return (struct pram_super_block *)first_pram_super;
+}
+EXPORT_SYMBOL(get_pram_super);
+#endif
+
+void pram_error_mng(struct super_block *sb, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	printk(KERN_ERR "pramfs error: ");
+	vprintk(fmt, args);
+	printk("\n");
+	va_end(args);
+
+	if (test_opt(sb, ERRORS_PANIC))
+		panic("pramfs: panic from previous error\n");
+	if (test_opt(sb, ERRORS_RO)) {
+		printk(KERN_CRIT "pramfs err: remounting filesystem read-only");
+		sb->s_flags |= MS_RDONLY;
+	}
+}
+
+static void pram_set_blocksize(struct super_block *sb, unsigned long size)
+{
+   int bits;
+
+   /*
+* We've already validated the user input and the value here must be
+* between PRAM_MAX_BLOCK_SIZE and PRAM_MIN_BLOCK_SIZE
+* and it must be a power of 2.
+*/
+   bits = fls(size) - 1;
+   sb->s_blocksize_bits = bits;
+   sb->s_blocksize = (1<<bits);
+}
+
+static inline void *pram_ioremap(phys_addr_t phys_addr, ssize_t size,
+				 bool protect)
+{
+	void *retval;
+
+	/*
+	 * NOTE: Userland may not map this resource, we will mark the region so
+	 * /dev/mem and the sysfs MMIO access will not be allowed. This
+	 * restriction depends on STRICT_DEVMEM option. If this option is
+	 * disabled or not available we mark the region only as busy.
+	 */
+	retval = request_mem_region_exclusive(phys_addr, size, "pramfs");
+	if (!retval)
+		goto fail;
+
+	if (protect) {
+		retval = (__force void *)ioremap_nocache(phys_addr, size);
+		if (!retval)
+			goto fail;
+		pram_writeable(retval, size, 0);
+	} else
+		retval = (__force void *)ioremap(phys_addr, size);
+fail:
+	return retval;
+}
+
+static loff_t pram_max_size(int bits)
+{
+	loff_t res;
+	res = (1ULL << (3*bits - 6)) - 1;
+
+	if (res > MAX_LFS_FILESIZE)
+		res = MAX_LFS_FILESIZE;
+
+	pram_info("max file size %llu bytes\n", res);
+	return res;
+}
+
+enum {
+   Opt_addr, Opt_bpi, Opt_size,
+   Opt_num_inodes, Opt_mode, Opt_uid,
+   Opt_gid, Opt_blocksize, Opt_user_xattr,
+   Opt_nouser_xattr, Opt_noprotect,
+   Opt_acl, Opt_noacl, Opt_xip,
+   Opt_err_cont, Opt_err_panic, Opt_err_ro,
+   Opt_err
+};
+
+static const match_table_t tokens = {
+	{Opt_addr,		"physaddr=%x"},
+	{Opt_bpi,		"bpi=%u"},
+	{Opt_size,		"init=%s"},
+	{Opt_num_inodes,	"N=%u"},
+	{Opt_mode,		"mode=%o"},
+	{Opt_uid,		"uid=%u"},
+	{Opt_gid,		"gid=%u"},
+	{Opt_blocksize,		"bs=%s"},
+	{Opt_user_xattr,	"user_xattr"},
+	{Opt_nouser_xattr,	"nouser_xattr"},
+	{Opt_noprotect,		"noprotect"},
+	{Opt_acl,		"acl"},
+	{Opt_noacl,		"noacl"},
+	{Opt_xip,		"xip"},
+	{Opt_err_cont,		"errors=continue"},
+	{Opt_err_panic,		"errors=panic"},
+	{Opt_err_ro,		"errors=remount-ro"},
+	{Opt_err,		NULL},
+};
+
+static phys_addr_t get_phys_addr(void **data
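
The expression in pram_max_size(), 2^(3*bits - 6) - 1, appears to follow from the two-level row/column block layout used in inode.c: a block of 2^bits bytes holds 2^(bits - 3) eight-byte pointers, two levels address 2^(2*(bits - 3)) data blocks, and each data block is 2^bits bytes, giving 2^(3*bits - 6) bytes in total. The sketch below checks that arithmetic in plain C. The function names are hypothetical and the derivation is an inference from the code shown, not a statement from the patch author.

```c
#include <stdint.h>

/* Capacity in bytes of a two-level pointer tree: one row block whose
 * 8-byte entries point at column blocks, whose 8-byte entries point at
 * data blocks. bits is log2(block size); usable for 3 <= bits <= 20. */
static uint64_t two_level_capacity(unsigned int bits)
{
	uint64_t ptrs_per_block = UINT64_C(1) << (bits - 3);
	uint64_t block_size = UINT64_C(1) << bits;

	return ptrs_per_block * ptrs_per_block * block_size;
}

/* The same quantity written the way pram_max_size() writes it,
 * before the MAX_LFS_FILESIZE clamp is applied. */
static uint64_t max_size_formula(unsigned int bits)
{
	return (UINT64_C(1) << (3 * bits - 6)) - 1;
}
```

For 4096-byte blocks (bits = 12) both forms give a 1 GiB limit; for 1024-byte blocks (bits = 10) the limit is 16 MiB.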

[PATCH 07/19] pramfs: file operations

2013-09-07 Thread Marco Stornelli
Add file operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/file.c |  417 ++
 1 files changed, 417 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/file.c

diff --git a/fs/pramfs/file.c b/fs/pramfs/file.c
new file mode 100644
index 000..5aaba99
--- /dev/null
+++ b/fs/pramfs/file.c
@@ -0,0 +1,417 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * File operations for files.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/fs.h>
+#include <linux/aio.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <linux/falloc.h>
+#include "pram.h"
+#include "acl.h"
+#include "xip.h"
+#include "xattr.h"
+
+/*
+ * The following functions are helper routines to copy to/from
+ * user space and iter over io vectors (mainly for readv/writev).
+ * They are used in the direct IO path.
+ */
+static size_t __pram_iov_copy_from(char *vaddr,
+			const struct iovec *iov, size_t base, size_t bytes)
+{
+	size_t copied = 0, left = 0;
+
+	while (bytes) {
+		char __user *buf = iov->iov_base + base;
+		int copy = min(bytes, iov->iov_len - base);
+
+		base = 0;
+		left = __copy_from_user(vaddr, buf, copy);
+		copied += copy;
+		bytes -= copy;
+		vaddr += copy;
+		iov++;
+
+		if (unlikely(left))
+			break;
+	}
+	return copied - left;
+}
+
+static size_t __pram_iov_copy_to(char *vaddr,
+			const struct iovec *iov, size_t base, size_t bytes)
+{
+	size_t copied = 0, left = 0;
+
+	while (bytes) {
+		char __user *buf = iov->iov_base + base;
+		int copy = min(bytes, iov->iov_len - base);
+
+		base = 0;
+		left = __copy_to_user(buf, vaddr, copy);
+		copied += copy;
+		bytes -= copy;
+		vaddr += copy;
+		iov++;
+
+		if (unlikely(left))
+			break;
+	}
+	return copied - left;
+}
+
+static size_t pram_iov_copy_from(void *to, struct iov_iter *i, size_t bytes)
+{
+	size_t copied;
+
+	if (likely(i->nr_segs == 1)) {
+		int left;
+		char __user *buf = i->iov->iov_base + i->iov_offset;
+		left = __copy_from_user(to, buf, bytes);
+		copied = bytes - left;
+	} else {
+		copied = __pram_iov_copy_from(to, i->iov, i->iov_offset, bytes);
+	}
+
+	return copied;
+}
+
+static size_t pram_iov_copy_to(void *from, struct iov_iter *i, size_t bytes)
+{
+	size_t copied;
+
+	if (likely(i->nr_segs == 1)) {
+		int left;
+		char __user *buf = i->iov->iov_base + i->iov_offset;
+		left = __copy_to_user(buf, from, bytes);
+		copied = bytes - left;
+	} else {
+		copied = __pram_iov_copy_to(from, i->iov, i->iov_offset, bytes);
+	}
+
+	return copied;
+}
+
+static size_t __pram_clear_user(const struct iovec *iov, size_t base,
+				size_t bytes)
+{
+	size_t cleaned = 0, left = 0;
+
+	while (bytes) {
+		char __user *buf = iov->iov_base + base;
+		int clear = min(bytes, iov->iov_len - base);
+
+		base = 0;
+		left = __clear_user(buf, clear);
+		cleaned += clear;
+		bytes -= clear;
+		iov++;
+
+		if (unlikely(left))
+			break;
+	}
+	return cleaned - left;
+}
+
+static size_t pram_clear_user(struct iov_iter *i, size_t bytes)
+{
+	size_t clear;
+
+	if (likely(i->nr_segs == 1)) {
+		int left;
+		char __user *buf = i->iov->iov_base + i->iov_offset;
+		left = __clear_user(buf, bytes);
+		clear = bytes - left;
+	} else {
+		clear = __pram_clear_user(i->iov, i->iov_offset, bytes);
+	}
+
+	return clear;
+}
+
+static int pram_open_file(struct inode *inode, struct file *filp)
+{
+	filp->f_flags |= O_DIRECT;
+	return generic_file_open(inode, filp);
+}
+
+ssize_t pram_direct_IO(int rw, struct kiocb *iocb,
+		       const struct iovec *iov,
+		       loff_t offset, unsigned long nr_segs)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct super_block *sb = inode->i_sb;
+   int progress = 0, hole = 0, alloc_once = 1
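
The iovec helpers in file.c all share one loop shape: start `base` bytes into the first segment, copy as much as that segment still holds, then continue from offset 0 of the next segment. The following userspace sketch shows that loop with memcpy() standing in for __copy_from_user(). It is an illustration under stated assumptions, not the kernel code: iov_gather() is a hypothetical name, an explicit segment count is added so the loop cannot run past the array, and since memcpy() cannot fail partway the `left` bookkeeping of the kernel version is dropped.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy up to `bytes` bytes from the segment list into dst, starting
 * `base` bytes into the first segment. Stops early if the segments run
 * out. Returns the number of bytes actually copied. */
static size_t iov_gather(char *dst, const struct iovec *iov, size_t nr_segs,
			 size_t base, size_t bytes)
{
	size_t copied = 0;

	while (bytes && nr_segs) {
		const char *src = (const char *)iov->iov_base + base;
		size_t avail = iov->iov_len - base;
		size_t copy = bytes < avail ? bytes : avail;

		memcpy(dst, src, copy);
		base = 0;	/* only the first segment is entered mid-way */
		copied += copy;
		bytes -= copy;
		dst += copy;
		iov++;
		nr_segs--;
	}
	return copied;
}
```

Gathering 6 bytes at base 2 from the segments "hello" and "world" yields "llowor": three bytes finish the first segment and three come from the start of the second.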

[PATCH 08/19] pramfs: file operations for dirs

2013-09-07 Thread Marco Stornelli
Add file operations for dirs.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/dir.c |  226 +++
 1 files changed, 226 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/dir.c

diff --git a/fs/pramfs/dir.c b/fs/pramfs/dir.c
new file mode 100644
index 000..137a592
--- /dev/null
+++ b/fs/pramfs/dir.c
@@ -0,0 +1,226 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * File operations for directories.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include "pram.h"
+
+/*
+ * Parent is locked.
+ */
+int pram_add_link(struct dentry *dentry, struct inode *inode)
+{
+	struct inode *dir = dentry->d_parent->d_inode;
+	struct pram_inode *pidir, *pi, *pitail = NULL;
+	u64 tail_ino, prev_ino;
+
+	const char *name = dentry->d_name.name;
+
+	int namelen = min_t(unsigned int, dentry->d_name.len, PRAM_NAME_LEN);
+
+	pidir = pram_get_inode(dir->i_sb, dir->i_ino);
+
+	mutex_lock(&PRAM_I(dir)->i_link_mutex);
+
+	pi = pram_get_inode(dir->i_sb, inode->i_ino);
+
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+
+	tail_ino = be64_to_cpu(pidir->i_type.dir.tail);
+	if (tail_ino != 0) {
+		pitail = pram_get_inode(dir->i_sb, tail_ino);
+		pram_memunlock_inode(dir->i_sb, pitail);
+		pitail->i_d.d_next = cpu_to_be64(inode->i_ino);
+		pram_memlock_inode(dir->i_sb, pitail);
+
+		prev_ino = tail_ino;
+
+		pram_memunlock_inode(dir->i_sb, pidir);
+		pidir->i_type.dir.tail = cpu_to_be64(inode->i_ino);
+		pidir->i_mtime = cpu_to_be32(dir->i_mtime.tv_sec);
+		pidir->i_ctime = cpu_to_be32(dir->i_ctime.tv_sec);
+		pram_memlock_inode(dir->i_sb, pidir);
+	} else {
+		/* the directory is empty */
+		prev_ino = 0;
+
+		pram_memunlock_inode(dir->i_sb, pidir);
+		pidir->i_type.dir.tail = cpu_to_be64(inode->i_ino);
+		pidir->i_type.dir.head = cpu_to_be64(inode->i_ino);
+		pidir->i_mtime = cpu_to_be32(dir->i_mtime.tv_sec);
+		pidir->i_ctime = cpu_to_be32(dir->i_ctime.tv_sec);
+		pram_memlock_inode(dir->i_sb, pidir);
+	}
+
+
+	pram_memunlock_inode(dir->i_sb, pi);
+	pi->i_d.d_prev = cpu_to_be64(prev_ino);
+	pi->i_d.d_parent = cpu_to_be64(dir->i_ino);
+	memcpy(pi->i_d.d_name, name, namelen);
+	pi->i_d.d_name[namelen] = '\0';
+	pram_memlock_inode(dir->i_sb, pi);
+	mutex_unlock(&PRAM_I(dir)->i_link_mutex);
+	return 0;
+}
+
+int pram_remove_link(struct inode *inode)
+{
+	struct super_block *sb = inode->i_sb;
+	struct pram_inode *prev = NULL;
+	struct pram_inode *next = NULL;
+	struct pram_inode *pidir, *pi;
+	struct inode *dir = NULL;
+
+	pi = pram_get_inode(sb, inode->i_ino);
+	pidir = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_parent));
+	if (!pidir)
+		return -EACCES;
+
+	dir = pram_iget(inode->i_sb, be64_to_cpu(pi->i_d.d_parent));
+	if (IS_ERR(dir))
+		return -EACCES;
+	mutex_lock(&PRAM_I(dir)->i_link_mutex);
+
+	if (inode->i_ino == be64_to_cpu(pidir->i_type.dir.head)) {
+		/* first inode in directory */
+		next = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_next));
+
+		if (next) {
+			pram_memunlock_inode(sb, next);
+			next->i_d.d_prev = 0;
+			pram_memlock_inode(sb, next);
+
+			pram_memunlock_inode(sb, pidir);
+			pidir->i_type.dir.head = pi->i_d.d_next;
+		} else {
+			pram_memunlock_inode(sb, pidir);
+			pidir->i_type.dir.head = 0;
+			pidir->i_type.dir.tail = 0;
+		}
+		pram_memlock_inode(sb, pidir);
+	} else if (inode->i_ino == be64_to_cpu(pidir->i_type.dir.tail)) {
+		/* last inode in directory */
+		prev = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_prev));
+
+		pram_memunlock_inode(sb, prev);
+		prev->i_d.d_next = 0;
+		pram_memlock_inode(sb, prev);
+
+		pram_memunlock_inode(sb, pidir);
+		pidir->i_type.dir.tail = pi->i_d.d_prev;
+		pram_memlock_inode(sb, pidir);
+	} else {
+		/* somewhere in the middle */
+		prev = pram_get_inode(sb, be64_to_cpu(pi->i_d.d_prev));
+   next = pram_get_inode(sb
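
pram_add_link() and pram_remove_link() maintain each directory as a doubly linked list threaded through the inodes themselves: the directory inode stores head and tail inode numbers, and every child stores d_prev, d_next and d_parent, with 0 meaning "none". The sketch below models that structure in userspace with a small array indexed by inode number. All names are hypothetical, the write-protection unlock/lock pairs and the mutex are omitted, and the three removal cases of the patch (head, tail, middle) are folded into one generic unlink.

```c
#include <stdint.h>

#define NINODES 8	/* illustrative table size; inode 0 is never used */

struct demo_inode {
	uint64_t d_next, d_prev, d_parent;	/* 0 means "none" */
	uint64_t head, tail;			/* used only by directories */
};

static struct demo_inode table[NINODES];	/* index == inode number */

/* Append inode `ino` to the end of directory `dir`'s list. */
static void demo_add_link(uint64_t dir, uint64_t ino)
{
	uint64_t tail = table[dir].tail;

	if (tail)
		table[tail].d_next = ino;
	else
		table[dir].head = ino;	/* the directory was empty */
	table[dir].tail = ino;
	table[ino].d_prev = tail;
	table[ino].d_next = 0;
	table[ino].d_parent = dir;
}

/* Unlink inode `ino` from its parent's list, wherever it sits. */
static void demo_remove_link(uint64_t ino)
{
	uint64_t dir = table[ino].d_parent;
	uint64_t prev = table[ino].d_prev, next = table[ino].d_next;

	if (prev)
		table[prev].d_next = next;
	else
		table[dir].head = next;
	if (next)
		table[next].d_prev = prev;
	else
		table[dir].tail = prev;
	table[ino].d_prev = table[ino].d_next = 0;
}
```

Because entries are found by walking this list from head, a name lookup is linear in the number of entries in the directory.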

[PATCH 09/19] pramfs: inode operations for dirs

2013-09-07 Thread Marco Stornelli
Add inode operations for dirs.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/namei.c |  391 +
 1 files changed, 391 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/namei.c

diff --git a/fs/pramfs/namei.c b/fs/pramfs/namei.c
new file mode 100644
index 000..64dd946
--- /dev/null
+++ b/fs/pramfs/namei.c
@@ -0,0 +1,391 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Inode operations for directories.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include "pram.h"
+#include "acl.h"
+#include "xattr.h"
+#include "xip.h"
+
+/*
+ * Couple of helper functions - make the code slightly cleaner.
+ */
+
+static inline void pram_inc_count(struct inode *inode)
+{
+   inc_nlink(inode);
+   pram_write_inode(inode, NULL);
+}
+
+static inline void pram_dec_count(struct inode *inode)
+{
+	if (inode->i_nlink) {
+		drop_nlink(inode);
+		pram_write_inode(inode, NULL);
+	}
+}
+
+static inline int pram_add_nondir(struct inode *dir,
+  struct dentry *dentry,
+  struct inode *inode)
+{
+   int err = pram_add_link(dentry, inode);
+   if (!err) {
+   unlock_new_inode(inode);
+   d_instantiate(dentry, inode);
+   return 0;
+   }
+   pram_dec_count(inode);
+   unlock_new_inode(inode);
+   iput(inode);
+   return err;
+}
+
+/*
+ * Methods themselves.
+ */
+
+static ino_t pram_inode_by_name(struct inode *dir, struct dentry *dentry)
+{
+	struct pram_inode *pi;
+	ino_t ino;
+	int namelen;
+
+	pi = pram_get_inode(dir->i_sb, dir->i_ino);
+	ino = be64_to_cpu(pi->i_type.dir.head);
+
+	mutex_lock(&PRAM_I(dir)->i_link_mutex);
+	while (ino) {
+		pi = pram_get_inode(dir->i_sb, ino);
+
+		if (pi->i_links_count) {
+			namelen = strlen(pi->i_d.d_name);
+
+			if (namelen == dentry->d_name.len &&
+			    !memcmp(dentry->d_name.name,
+				    pi->i_d.d_name, namelen))
+				break;
+		}
+
+		ino = be64_to_cpu(pi->i_d.d_next);
+	}
+	mutex_unlock(&PRAM_I(dir)->i_link_mutex);
+	return ino;
+}
+
+static struct dentry *pram_lookup(struct inode *dir, struct dentry *dentry,
+				  unsigned int flags)
+{
+	struct inode *inode = NULL;
+	ino_t ino;
+
+	if (dentry->d_name.len > PRAM_NAME_LEN)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	ino = pram_inode_by_name(dir, dentry);
+	if (ino) {
+		inode = pram_iget(dir->i_sb, ino);
+		if (inode == ERR_PTR(-ESTALE)) {
+			pram_err(dir->i_sb,
+				"deleted inode referenced: %lu",
+				(unsigned long) ino);
+			return ERR_PTR(-EIO);
+		}
+	}
+
+	return d_splice_alias(inode, dentry);
+}
+
+
+/*
+ * By the time this is called, we already have created
+ * the directory cache entry for the new file, but it
+ * is so far negative - it has no inode.
+ *
+ * If the create succeeds, we fill in the inode information
+ * with d_instantiate().
+ */
+static int pram_create(struct inode *dir, struct dentry *dentry, umode_t mode,
+  bool flags)
+{
+   struct inode *inode = pram_new_inode(dir, mode, &dentry->d_name);
+   int err = PTR_ERR(inode);
+   if (!IS_ERR(inode)) {
+       inode->i_op = &pram_file_inode_operations;
+       if (pram_use_xip(inode->i_sb)) {
+           inode->i_mapping->a_ops = &pram_aops_xip;
+           inode->i_fop = &pram_xip_file_operations;
+       } else {
+           inode->i_fop = &pram_file_operations;
+           inode->i_mapping->a_ops = &pram_aops;
+       }
+       err = pram_add_nondir(dir, dentry, inode);
+   }
+   return err;
+}
+
+static int pram_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+   struct inode *inode = pram_new_inode(dir, mode, NULL);
+   if (IS_ERR(inode))
+       return PTR_ERR(inode);
+
+   inode->i_op = &pram_file_inode_operations;
+   if (pram_use_xip(inode->i_sb)) {
+       inode->i_mapping->a_ops = &pram_aops_xip;
+       inode->i_fop = &pram_xip_file_operations;
+   } else {
+       inode->i_mapping->a_ops = &pram_aops;
+   inode
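
(The archived message is cut off here.)

Editorial illustration, not part of the patch: pram_inode_by_name() above walks a singly linked list of sibling inodes, skips entries whose link count is zero, and compares names by length and then by content. A minimal userspace sketch of that walk follows, assuming a hypothetical in-memory table in place of pram_get_inode(); struct demo_inode and demo_inode_by_name() are invented names, and inode number 0 is assumed to terminate the list as it does in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_LEN 32

/* Hypothetical in-memory stand-in for an on-media inode. */
struct demo_inode {
    unsigned int links_count;     /* 0 means the entry is deleted */
    char name[DEMO_NAME_LEN];     /* NUL-terminated entry name */
    size_t next;                  /* next sibling inode number, 0 ends the list */
};

/*
 * Walk the sibling list starting at head and return the inode number
 * whose name matches, or 0 when nothing matches.
 */
static size_t demo_inode_by_name(const struct demo_inode *table, size_t head,
                                 const char *name, size_t len)
{
    size_t ino = head;

    assert(table != NULL);

    while (ino) {
        const struct demo_inode *pi = &table[ino];

        /* Skip entries whose link count has dropped to zero. */
        if (pi->links_count &&
            strlen(pi->name) == len &&
            !memcmp(name, pi->name, len))
            break;

        ino = pi->next;
    }
    return ino;
}
```

The lookup cost is linear in the number of directory entries, which is the trade-off a list-based directory makes for simplicity.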

[PATCH 10/19] pramfs: block allocation

2013-09-07 Thread Marco Stornelli
Add block allocation operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/balloc.c |  160 
 1 files changed, 160 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/balloc.c

diff --git a/fs/pramfs/balloc.c b/fs/pramfs/balloc.c
new file mode 100644
index 000..8291bf3
--- /dev/null
+++ b/fs/pramfs/balloc.c
@@ -0,0 +1,160 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * The blocks allocation and deallocation routines.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/fs.h>
+#include <linux/bitops.h>
+#include "pram.h"
+
+void pram_bitmap_fill(unsigned long *dst, int nbits)
+{
+   size_t nlongs = BITS_TO_LONGS(nbits);
+   if (!small_const_nbits(nbits)) {
+       int len = (nlongs - 1) * sizeof(unsigned long);
+       memset(dst, 0xff, len);
+   }
+   if (BITS_PER_LONG == 64)
+       dst[nlongs - 1] = cpu_to_le64(BITMAP_LAST_WORD_MASK(nbits));
+   else
+       dst[nlongs - 1] = cpu_to_le32(BITMAP_LAST_WORD_MASK(nbits));
+}
+
+/*
+ * This just marks in-use the blocks that make up the bitmap.
+ * The bitmap must be writeable before calling.
+ */
+void pram_init_bitmap(struct super_block *sb)
+{
+   struct pram_super_block *ps = pram_get_super(sb);
+   unsigned long *bitmap = pram_get_bitmap(sb);
+   int blocks = be32_to_cpu(ps->s_bitmap_blocks);
+
+   memset(bitmap, 0, blocks << sb->s_blocksize_bits);
+
+   pram_bitmap_fill(bitmap, blocks);
+}
+
+
+/* Free absolute blocknr */
+void pram_free_block(struct super_block *sb, unsigned long blocknr)
+{
+   struct pram_super_block *ps;
+   u64 bitmap_block;
+   unsigned long bitmap_bnr;
+   void *bitmap;
+   void *bp;
+
+   mutex_lock(&PRAM_SB(sb)->s_lock);
+
+   bitmap = pram_get_bitmap(sb);
+   /*
+    * find the block within the bitmap that contains the inuse bit
+    * for the block we need to free. We need to unlock this bitmap
+    * block to clear the inuse bit.
+    */
+   bitmap_bnr = blocknr >> (3 + sb->s_blocksize_bits);
+   bitmap_block = pram_get_block_off(sb, bitmap_bnr);
+   bp = pram_get_block(sb, bitmap_block);
+
+   pram_memunlock_block(sb, bp);
+   pram_clear_bit(blocknr, bitmap); /* mark the block free */
+   pram_memlock_block(sb, bp);
+
+   ps = pram_get_super(sb);
+   pram_memunlock_super(sb, ps);
+
+   if (blocknr < be32_to_cpu(ps->s_free_blocknr_hint))
+       ps->s_free_blocknr_hint = cpu_to_be32(blocknr);
+   be32_add_cpu(&ps->s_free_blocks_count, 1);
+   pram_memlock_super(sb, ps);
+
+   mutex_unlock(&PRAM_SB(sb)->s_lock);
+}
+
+
+/*
+ * Allocate a block and return its absolute blocknr. Zeroes out the
+ * block if zero is set.
+ */
+int pram_new_block(struct super_block *sb, unsigned long *blocknr, int zero)
+{
+   struct pram_super_block *ps;
+   u64 bitmap_block;
+   unsigned long bnr, bitmap_bnr;
+   int errval;
+   void *bitmap;
+   void *bp;
+
+   mutex_lock(&PRAM_SB(sb)->s_lock);
+   ps = pram_get_super(sb);
+   bitmap = pram_get_bitmap(sb);
+
+   if (ps->s_free_blocks_count) {
+       /* find the oldest unused block */
+       bnr = pram_find_next_zero_bit(bitmap,
+                   be32_to_cpu(ps->s_blocks_count),
+                   be32_to_cpu(ps->s_free_blocknr_hint));
+
+       if (bnr < be32_to_cpu(ps->s_bitmap_blocks) ||
+           bnr >= be32_to_cpu(ps->s_blocks_count)) {
+           pram_dbg("no free blocks found!\n");
+           errval = -ENOSPC;
+           goto fail;
+       }
+
+       pram_memunlock_super(sb, ps);
+       be32_add_cpu(&ps->s_free_blocks_count, -1);
+       if (bnr < (be32_to_cpu(ps->s_blocks_count)-1))
+           ps->s_free_blocknr_hint = cpu_to_be32(bnr+1);
+       else
+           ps->s_free_blocknr_hint = 0;
+       pram_memlock_super(sb, ps);
+   } else {
+       pram_dbg("all blocks allocated\n");
+       errval = -ENOSPC;
+       goto fail;
+   }
+
+   /*
+    * find the block within the bitmap that contains the inuse bit
+    * for the unused block we just found. We need to unlock it to
+    * set the inuse bit.
+    */
+   bitmap_bnr = bnr >> (3 + sb->s_blocksize_bits);
+   bitmap_block = pram_get_block_off(sb, bitmap_bnr);
+   bp = pram_get_block(sb, bitmap_block);
+
+   pram_memunlock_block(sb, bp
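
(The archived message is cut off here.)

Editorial illustration, not part of the patch: both pram_free_block() and pram_new_block() locate the bitmap block that holds a given in-use bit with a shift by (3 + s_blocksize_bits). The sketch below works through that arithmetic in userspace C. The function names are invented for the example, and the only assumption carried over from the patch is that each bitmap byte tracks eight blocks.

```c
#include <assert.h>

/*
 * Index of the bitmap block holding the in-use bit for blocknr.
 * One bitmap block of (1 << blocksize_bits) bytes tracks
 * 8 * (1 << blocksize_bits) blocks, hence the extra shift by 3.
 */
static unsigned long demo_bitmap_bnr(unsigned long blocknr,
                                     unsigned int blocksize_bits)
{
    assert(blocksize_bits + 3 < sizeof(unsigned long) * 8);
    return blocknr >> (3 + blocksize_bits);
}

/* Number of bitmap blocks needed to track nblocks blocks, rounded up. */
static unsigned long demo_bitmap_blocks(unsigned long nblocks,
                                        unsigned int blocksize_bits)
{
    unsigned long bits_per_block = 8UL << blocksize_bits;

    return (nblocks + bits_per_block - 1) / bits_per_block;
}
```

With 4096-byte blocks (blocksize_bits of 12), a single bitmap block covers 32768 filesystem blocks, so block 32768 is the first one tracked by the second bitmap block.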

[PATCH 11/19] pramfs: ioctl operations

2013-09-07 Thread Marco Stornelli
Add ioctl operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/ioctl.c |  127 +
 1 files changed, 127 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/ioctl.c

diff --git a/fs/pramfs/ioctl.c b/fs/pramfs/ioctl.c
new file mode 100644
index 000..565cc46
--- /dev/null
+++ b/fs/pramfs/ioctl.c
@@ -0,0 +1,127 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Ioctl operations.
+ *
+ * Copyright 2010-2011 Marco Stornelli marco.storne...@gmail.com
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/capability.h>
+#include <linux/time.h>
+#include <linux/sched.h>
+#include <linux/compat.h>
+#include <linux/mount.h>
+#include "pram.h"
+
+long pram_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+   struct inode *inode = file_inode(filp);
+   struct pram_inode *pi;
+   unsigned int flags;
+   int ret;
+
+   pi = pram_get_inode(inode->i_sb, inode->i_ino);
+   if (!pi)
+       return -EACCES;
+
+   switch (cmd) {
+   case FS_IOC_GETFLAGS:
+       flags = be32_to_cpu(pi->i_flags) & PRAM_FL_USER_VISIBLE;
+       return put_user(flags, (int __user *) arg);
+   case FS_IOC_SETFLAGS: {
+       unsigned int oldflags;
+
+       ret = mnt_want_write_file(filp);
+       if (ret)
+           return ret;
+
+       if (!inode_owner_or_capable(inode)) {
+           ret = -EPERM;
+           goto flags_out;
+       }
+
+       if (get_user(flags, (int __user *) arg)) {
+           ret = -EFAULT;
+           goto flags_out;
+       }
+
+       mutex_lock(&inode->i_mutex);
+       oldflags = be32_to_cpu(pi->i_flags);
+
+       if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
+           if (!capable(CAP_LINUX_IMMUTABLE)) {
+               mutex_unlock(&inode->i_mutex);
+               ret = -EPERM;
+               goto flags_out;
+           }
+       }
+
+       if (!S_ISDIR(inode->i_mode))
+           flags &= ~FS_DIRSYNC_FL;
+
+       flags = flags & FS_FL_USER_MODIFIABLE;
+       flags |= oldflags & ~FS_FL_USER_MODIFIABLE;
+       pram_memunlock_inode(inode->i_sb, pi);
+       pi->i_flags = cpu_to_be32(flags);
+       inode->i_ctime = CURRENT_TIME_SEC;
+       pi->i_ctime = cpu_to_be32(inode->i_ctime.tv_sec);
+       pram_set_inode_flags(inode, pi);
+       pram_memlock_inode(inode->i_sb, pi);
+       mutex_unlock(&inode->i_mutex);
+flags_out:
+       mnt_drop_write_file(filp);
+       return ret;
+   }
+   case FS_IOC_GETVERSION:
+       return put_user(inode->i_generation, (int __user *) arg);
+   case FS_IOC_SETVERSION: {
+       __u32 generation;
+       if (!inode_owner_or_capable(inode))
+           return -EPERM;
+       ret = mnt_want_write_file(filp);
+       if (ret)
+           return ret;
+       if (get_user(generation, (int __user *) arg)) {
+           ret = -EFAULT;
+           goto setversion_out;
+       }
+       mutex_lock(&inode->i_mutex);
+       inode->i_ctime = CURRENT_TIME_SEC;
+       inode->i_generation = generation;
+       pram_update_inode(inode);
+       mutex_unlock(&inode->i_mutex);
+setversion_out:
+       mnt_drop_write_file(filp);
+       return ret;
+   }
+   default:
+       return -ENOTTY;
+   }
+}
+
+#ifdef CONFIG_COMPAT
+long pram_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+   switch (cmd) {
+   case FS_IOC32_GETFLAGS:
+       cmd = FS_IOC_GETFLAGS;
+       break;
+   case FS_IOC32_SETFLAGS:
+       cmd = FS_IOC_SETFLAGS;
+       break;
+   case FS_IOC32_GETVERSION:
+       cmd = FS_IOC_GETVERSION;
+       break;
+   case FS_IOC32_SETVERSION:
+       cmd = FS_IOC_SETVERSION;
+       break;
+   default:
+       return -ENOIOCTLCMD;
+   }
+   return pram_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+}
+#endif
-- 
1.7.3.4
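
Editorial illustration, not part of the patch: the FS_IOC_SETFLAGS branch above merges the user-supplied flag word with the stored one so that only user-modifiable bits change, and it requires an extra capability when the request toggles the append-only or immutable bits. A small userspace sketch of those two bit manipulations follows. The function names and the mask values used in any call are placeholders, not the kernel's FS_FL_USER_MODIFIABLE or FS_*_FL constants.

```c
#include <assert.h>

/*
 * Combine a user-requested flag word with the stored one: bits inside
 * modifiable_mask come from the request, all other bits keep their
 * old value.
 */
static unsigned int demo_merge_flags(unsigned int requested,
                                     unsigned int oldflags,
                                     unsigned int modifiable_mask)
{
    unsigned int flags = requested & modifiable_mask;

    flags |= oldflags & ~modifiable_mask;
    return flags;
}

/* Nonzero when the request toggles any bit in protected_mask. */
static int demo_touches_protected(unsigned int requested,
                                  unsigned int oldflags,
                                  unsigned int protected_mask)
{
    return ((requested ^ oldflags) & protected_mask) != 0;
}
```

The XOR isolates the bits that differ between the old and the requested word, so a request that leaves the protected bits as they are never trips the capability check.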


[PATCH 13/19] pramfs: xip operations

2013-09-07 Thread Marco Stornelli
Add xip operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/xip.c |  119 +++
 fs/pramfs/xip.h |   33 +++
 2 files changed, 152 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/xip.c
 create mode 100644 fs/pramfs/xip.h

diff --git a/fs/pramfs/xip.c b/fs/pramfs/xip.c
new file mode 100644
index 000..26b8afe
--- /dev/null
+++ b/fs/pramfs/xip.c
@@ -0,0 +1,119 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * XIP operations.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/buffer_head.h>
+#include "pram.h"
+#include "xip.h"
+
+/*
+ * Wrappers. We need to take the RCU read lock to guard against a
+ * concurrent truncate operation. Writes are not a problem because
+ * they hold i_mutex.
+ */
+ssize_t pram_xip_file_read(struct file *filp, char __user *buf,
+   size_t len, loff_t *ppos)
+{
+   ssize_t res;
+   rcu_read_lock();
+   res = xip_file_read(filp, buf, len, ppos);
+   rcu_read_unlock();
+   return res;
+}
+
+static int pram_xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   int ret = 0;
+   rcu_read_lock();
+   ret = xip_file_fault(vma, vmf);
+   rcu_read_unlock();
+   return ret;
+}
+
+static const struct vm_operations_struct pram_xip_vm_ops = {
+   .fault          = pram_xip_file_fault,
+   .page_mkwrite   = filemap_page_mkwrite,
+   .remap_pages    = generic_file_remap_pages,
+};
+
+int pram_xip_file_mmap(struct file * file, struct vm_area_struct * vma)
+{
+   BUG_ON(!file->f_mapping->a_ops->get_xip_mem);
+
+   file_accessed(file);
+   vma->vm_ops = &pram_xip_vm_ops;
+   vma->vm_flags |= VM_MIXEDMAP;
+   return 0;
+}
+
+static int pram_find_and_alloc_blocks(struct inode *inode, sector_t iblock,
+sector_t *data_block, int create)
+{
+   int err = -EIO;
+   u64 block;
+
+   block = pram_find_data_block(inode, iblock);
+
+   if (!block) {
+       if (!create) {
+           err = -ENODATA;
+           goto err;
+       }
+
+       err = pram_alloc_blocks(inode, iblock, 1);
+       if (err)
+           goto err;
+
+       block = pram_find_data_block(inode, iblock);
+       if (!block) {
+           err = -ENODATA;
+           goto err;
+       }
+   }
+
+   *data_block = block;
+   err = 0;
+
+ err:
+   return err;
+}
+
+static inline int __pram_get_block(struct inode *inode, pgoff_t pgoff,
+  int create, sector_t *result)
+{
+   int rc = 0;
+
+   rc = pram_find_and_alloc_blocks(inode, (sector_t)pgoff, result, create);
+
+   if (rc == -ENODATA)
+       BUG_ON(create);
+
+   return rc;
+}
+
+int pram_get_xip_mem(struct address_space *mapping, pgoff_t pgoff, int create,
+void **kmem, unsigned long *pfn)
+{
+   int rc;
+   sector_t block = 0;
+
+   /* first, retrieve the block */
+   rc = __pram_get_block(mapping->host, pgoff, create, &block);
+   if (rc)
+       goto exit;
+
+   *kmem = pram_get_block(mapping->host->i_sb, block);
+   *pfn = pram_get_pfn(mapping->host->i_sb, block);
+
+exit:
+   return rc;
+}
diff --git a/fs/pramfs/xip.h b/fs/pramfs/xip.h
new file mode 100644
index 000..5bd82f2
--- /dev/null
+++ b/fs/pramfs/xip.h
@@ -0,0 +1,33 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * XIP operations.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifdef CONFIG_PRAMFS_XIP
+int pram_get_xip_mem(struct address_space *, pgoff_t, int, void **,
+ unsigned long *);
+ssize_t pram_xip_file_read(struct file *filp, char __user *buf,
+   size_t len, loff_t *ppos);
+int pram_xip_file_mmap(struct file * file, struct vm_area_struct * vma);
+static inline int pram_use_xip(struct super_block *sb)
+{
+   struct pram_sb_info *sbi = PRAM_SB(sb);
+   return sbi->s_mount_opt & PRAM_MOUNT_XIP;
+}
+#define mapping_is_xip(map) (map->a_ops->get_xip_mem)
+
+#else
+
+#define mapping_is_xip(map)    0
+#define pram_use_xip(sb)       0
+#define pram_get_xip_mem       NULL
+#define pram_xip_file_read     NULL
+#define pram_xip_file_mmap     NULL
+
+#endif
-- 
1.7.3.4

[PATCH 12/19] pramfs: symlink operations

2013-09-07 Thread Marco Stornelli
Add symlink operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/symlink.c |   76 +++
 1 files changed, 76 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/symlink.c

diff --git a/fs/pramfs/symlink.c b/fs/pramfs/symlink.c
new file mode 100644
index 000..0d5213f
--- /dev/null
+++ b/fs/pramfs/symlink.c
@@ -0,0 +1,76 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Symlink operations
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/fs.h>
+#include "pram.h"
+#include "xattr.h"
+
+int pram_block_symlink(struct inode *inode, const char *symname, int len)
+{
+   struct super_block *sb = inode->i_sb;
+   u64 block;
+   char *blockp;
+   int err;
+
+   err = pram_alloc_blocks(inode, 0, 1);
+   if (err)
+       return err;
+
+   block = pram_find_data_block(inode, 0);
+   blockp = pram_get_block(sb, block);
+
+   pram_memunlock_block(sb, blockp);
+   memcpy(blockp, symname, len);
+   blockp[len] = '\0';
+   pram_memlock_block(sb, blockp);
+   return 0;
+}
+
+static int pram_readlink(struct dentry *dentry, char __user *buffer, int buflen)
+{
+   struct inode *inode = dentry->d_inode;
+   struct super_block *sb = inode->i_sb;
+   u64 block;
+   char *blockp;
+
+   block = pram_find_data_block(inode, 0);
+   blockp = pram_get_block(sb, block);
+   return vfs_readlink(dentry, buffer, buflen, blockp);
+}
+
+static void *pram_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+   struct inode *inode = dentry->d_inode;
+   struct super_block *sb = inode->i_sb;
+   off_t block;
+   int status;
+   char *blockp;
+
+   block = pram_find_data_block(inode, 0);
+   blockp = pram_get_block(sb, block);
+   status = vfs_follow_link(nd, blockp);
+   return ERR_PTR(status);
+}
+
+const struct inode_operations pram_symlink_inode_operations = {
+   .readlink       = pram_readlink,
+   .follow_link    = pram_follow_link,
+   .setattr        = pram_notify_change,
+#ifdef CONFIG_PRAMFS_XATTR
+   .setxattr       = generic_setxattr,
+   .getxattr       = generic_getxattr,
+   .listxattr      = pram_listxattr,
+   .removexattr    = generic_removexattr,
+#endif
+};
-- 
1.7.3.4


[PATCH 14/19] pramfs: extended attributes block description tree

2013-09-07 Thread Marco Stornelli
Add extended attributes block description tree.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/desctree.c |  181 ++
 fs/pramfs/desctree.h |   44 
 2 files changed, 225 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/desctree.c
 create mode 100644 fs/pramfs/desctree.h

diff --git a/fs/pramfs/desctree.c b/fs/pramfs/desctree.c
new file mode 100644
index 000..fa1c9fc
--- /dev/null
+++ b/fs/pramfs/desctree.c
@@ -0,0 +1,181 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Extended attributes block descriptors tree.
+ *
+ * Copyright 2010-2011 Marco Stornelli marco.storne...@gmail.com
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/spinlock.h>
+#include "desctree.h"
+#include "pram.h"
+
+/* xblock_desc_init_always()
+ *
+ * These are initializations that need to be done on every
+ * descriptor allocation as the fields are not initialised
+ * by slab allocation.
+ */
+void xblock_desc_init_always(struct pram_xblock_desc *desc)
+{
+   atomic_set(&desc->refcount, 0);
+   desc->blocknr = 0;
+   desc->flags = 0;
+}
+
+/* xblock_desc_init_once()
+ *
+ * These are initializations that only need to be done
+ * once, because the fields are idempotent across use
+ * of the descriptor, so let the slab allocator know that.
+ */
+void xblock_desc_init_once(struct pram_xblock_desc *desc)
+{
+   mutex_init(&desc->lock);
+}
+
+/* __insert_xblock_desc()
+ *
+ * Insert a new descriptor in the tree.
+ */
+static void __insert_xblock_desc(struct pram_sb_info *sbi,
+unsigned long blocknr, struct rb_node *node)
+{
+   struct rb_node **p = &(sbi->desc_tree.rb_node);
+   struct rb_node *parent = NULL;
+   struct pram_xblock_desc *desc;
+
+   while (*p) {
+       parent = *p;
+       desc = rb_entry(parent, struct pram_xblock_desc, node);
+
+       if (blocknr < desc->blocknr)
+           p = &(*p)->rb_left;
+       else if (blocknr > desc->blocknr)
+           p = &(*p)->rb_right;
+       else
+           /* Oops... another descriptor for the same block? */
+           BUG();
+   }
+
+   rb_link_node(node, parent, p);
+   rb_insert_color(node, &sbi->desc_tree);
+}
+
+void insert_xblock_desc(struct pram_sb_info *sbi, struct pram_xblock_desc *desc)
+{
+   spin_lock(&sbi->desc_tree_lock);
+   __insert_xblock_desc(sbi, desc->blocknr, &desc->node);
+   spin_unlock(&sbi->desc_tree_lock);
+}
+
+/* __lookup_xblock_desc()
+ *
+ * Search an extended attribute descriptor in the tree via the
+ * block number. It returns the descriptor if it's found or
+ * NULL. If not found it creates a new descriptor if create is not 0.
+ */
+static struct pram_xblock_desc *__lookup_xblock_desc(struct pram_sb_info *sbi,
+   unsigned long blocknr,
+   struct kmem_cache *cache,
+   int create)
+{
+   struct rb_node *n = sbi->desc_tree.rb_node;
+   struct pram_xblock_desc *desc = NULL;
+
+   while (n) {
+       desc = rb_entry(n, struct pram_xblock_desc, node);
+
+       if (blocknr < desc->blocknr)
+           n = n->rb_left;
+       else if (blocknr > desc->blocknr)
+           n = n->rb_right;
+       else {
+           atomic_inc(&desc->refcount);
+           goto out;
+       }
+   }
+
+   /* not found: clear desc so the last node visited is not returned */
+   desc = NULL;
+   if (create) {
+       desc = kmem_cache_alloc(cache, GFP_NOFS);
+       if (!desc)
+           return ERR_PTR(-ENOMEM);
+       xblock_desc_init_always(desc);
+       atomic_set(&desc->refcount, 1);
+       desc->blocknr = blocknr;
+       __insert_xblock_desc(sbi, desc->blocknr, &desc->node);
+   }
+out:
+   return desc;
+}
+
+struct pram_xblock_desc *lookup_xblock_desc(struct pram_sb_info *sbi,
+   unsigned long blocknr,
+   struct kmem_cache *cache,
+   int create)
+{
+   struct pram_xblock_desc *desc = NULL;
+
+   spin_lock(&sbi->desc_tree_lock);
+   desc = __lookup_xblock_desc(sbi, blocknr, cache, create);
+   spin_unlock(&sbi->desc_tree_lock);
+   return desc;
+}
+
+/* put_xblock_desc()
+ *
+ * Decrement the reference count and, if it reaches zero and the
+ * descriptor has been marked to be freed, free it.
+ * It returns 0 if the descriptor has been deleted and 1 otherwise.
+ */
+int put_xblock_desc(struct pram_sb_info *sbi, struct pram_xblock_desc *desc)
+{
+   int ret = 1;
+   if (!desc)
+   return ret
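
(The archived message is cut off here.)

Editorial illustration, not part of the patch: the descriptor tree above is keyed by absolute block number, a successful lookup takes a reference, and a miss optionally creates a descriptor whose reference count starts at 1. The sketch below shows that lookup-or-create pattern in userspace C. It swaps in a plain unbalanced binary search tree for the kernel red-black tree, leaves out locking, and uses invented names (struct demo_desc, demo_lookup_desc).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor: one node per extended attribute block. */
struct demo_desc {
    unsigned long blocknr;
    int refcount;
    struct demo_desc *left, *right;
};

/*
 * Find the descriptor for blocknr and take a reference on it.
 * When it is missing and create is nonzero, allocate one with a
 * reference count of 1 and link it into the tree.
 * Returns NULL when nothing is found or the allocation fails.
 */
static struct demo_desc *demo_lookup_desc(struct demo_desc **root,
                                          unsigned long blocknr, int create)
{
    struct demo_desc **p = root;
    struct demo_desc *desc;

    while (*p) {
        desc = *p;
        if (blocknr < desc->blocknr)
            p = &desc->left;
        else if (blocknr > desc->blocknr)
            p = &desc->right;
        else {
            desc->refcount++;
            return desc;
        }
    }

    if (!create)
        return NULL;

    /* p now points at the empty child slot where the new node belongs. */
    desc = calloc(1, sizeof(*desc));
    if (!desc)
        return NULL;
    desc->blocknr = blocknr;
    desc->refcount = 1;
    *p = desc;
    return desc;
}
```

Keeping a pointer to the child slot during the descent means the insert needs no second walk, which is the same idea rb_link_node() relies on in the patch.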

[PATCH 15/19] pramfs: extended attributes

2013-09-07 Thread Marco Stornelli
Add extended attributes.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/xattr.c  | 1118 
 fs/pramfs/xattr.h  |   92 
 fs/pramfs/xattr_security.c |   80 
 fs/pramfs/xattr_trusted.c  |   65 +++
 fs/pramfs/xattr_user.c |   69 +++
 5 files changed, 1424 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/xattr.c
 create mode 100644 fs/pramfs/xattr.h
 create mode 100644 fs/pramfs/xattr_security.c
 create mode 100644 fs/pramfs/xattr_trusted.c
 create mode 100644 fs/pramfs/xattr_user.c

diff --git a/fs/pramfs/xattr.c b/fs/pramfs/xattr.c
new file mode 100644
index 000..a78bf1d
--- /dev/null
+++ b/fs/pramfs/xattr.c
@@ -0,0 +1,1118 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Extended attributes operations.
+ *
+ * Copyright 2010-2011 Marco Stornelli marco.storne...@gmail.com
+ *
+ * based on fs/ext2/xattr.c with the following copyright:
+ *
+ * Fix by Harrison Xing harri...@mountainviewdata.com.
+ * Extended attributes for symlinks and special files added per
+ *  suggestion of Luka Renko luka.re...@hermes.si.
+ * xattr consolidation Copyright (c) 2004 James Morris jmor...@redhat.com,
+ *  Red Hat Inc.
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+/*
+ * Extended attributes are stored in blocks allocated outside of
+ * any inode. The i_xattr field is then made to point to this allocated
+ * block. If all extended attributes of an inode are identical, these
+ * inodes may share the same extended attribute block. Such situations
+ * are automatically detected by keeping a cache of recent attribute block
+ * numbers and hashes over the block's contents in memory.
+ *
+ *
+ * Extended attribute block layout:
+ *
+ *   +------------------+
+ *   | header           |
+ *   | entry 1          | |
+ *   | entry 2          | | growing downwards
+ *   | entry 3          | v
+ *   | four null bytes  |
+ *   | . . .            |
+ *   | value 1          | ^
+ *   | value 3          | | growing upwards
+ *   | value 2          | |
+ *   +------------------+
+ *
+ * The block header is followed by multiple entry descriptors. These entry
+ * descriptors are variable in size, and aligned to PRAM_XATTR_PAD
+ * byte boundaries. The entry descriptors are sorted by attribute name,
+ * so that two extended attribute blocks can be compared efficiently.
+ *
+ * Attribute values are aligned to the end of the block, stored in
+ * no specific order. They are also padded to PRAM_XATTR_PAD byte
+ * boundaries. No additional gaps are left between them.
+ *
+ * Locking strategy
+ * ----------------
+ * pi->i_xattr is protected by PRAM_I(inode)->xattr_sem.
+ * EA blocks are only changed if they are exclusive to an inode, so
+ * holding xattr_sem also means that nothing but the EA block's reference
+ * count will change. Multiple writers to an EA block are synchronized
+ * by the mutex in each block descriptor. Block descriptors are kept in a
+ * red black tree and the key is the absolute block number.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/mbcache.h>
+#include <linux/rwsem.h>
+#include <linux/security.h>
+#include "pram.h"
+#include "xattr.h"
+#include "acl.h"
+#include "desctree.h"
+
+#define HDR(bp) ((struct pram_xattr_header *)(bp))
+#define ENTRY(ptr) ((struct pram_xattr_entry *)(ptr))
+#define FIRST_ENTRY(bh) ENTRY(HDR(bh)+1)
+#define IS_LAST_ENTRY(entry) (*(__u32 *)(entry) == 0)
+#define GET_DESC(sbi, blocknr) \
+   lookup_xblock_desc(sbi, blocknr, pram_xblock_desc_cache, 1)
+#define LOOKUP_DESC(sbi, blocknr) lookup_xblock_desc(sbi, blocknr, NULL, 0)
+
+#ifdef PRAM_XATTR_DEBUG
+# define ea_idebug(inode, f...) do { \
+       printk(KERN_DEBUG "inode %ld: ", inode->i_ino); \
+       printk(f); \
+       printk("\n"); \
+   } while (0)
+# define ea_bdebug(blocknr, f...) do { \
+       printk(KERN_DEBUG "block %lu: ", blocknr); \
+       printk(f); \
+       printk("\n"); \
+   } while (0)
+#else
+# define ea_idebug(f...)
+# define ea_bdebug(f...)
+#endif
+
+static int pram_xattr_set2(struct inode *, char *, struct pram_xblock_desc *,
+  struct pram_xattr_header *);
+
+static int pram_xattr_cache_insert(struct super_block *sb,
+  unsigned long blocknr, u32 xhash);
+static struct pram_xblock_desc *pram_xattr_cache_find(struct inode *,
+struct pram_xattr_header *);
+static void pram_xattr_rehash(struct pram_xattr_header *,
+ struct pram_xattr_entry *);
+
+static struct mb_cache *pram_xattr_cache;
+static struct kmem_cache *pram_xblock_desc_cache;
+
+static const struct xattr_handler *pram_xattr_handler_map[] = {
+   [PRAM_XATTR_INDEX_USER]  = &pram_xattr_user_handler,
+#ifdef
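
(The archived message is cut off here.)

Editorial illustration, not part of the patch: the layout comment above says that entry descriptors and values are padded to PRAM_XATTR_PAD byte boundaries and that values are packed against the end of the block. The sketch below shows the usual round-up arithmetic and how the offset of the next value could be derived. The pad value of 4 is an assumption borrowed from ext2 (the value of PRAM_XATTR_PAD is not visible in this excerpt), and all demo_ names are invented.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_XATTR_PAD   4u   /* assumed pad size; must be a power of two */
#define DEMO_XATTR_ROUND (DEMO_XATTR_PAD - 1)

/* Round len up to the next multiple of DEMO_XATTR_PAD. */
static size_t demo_xattr_pad_len(size_t len)
{
    return (len + DEMO_XATTR_ROUND) & ~(size_t)DEMO_XATTR_ROUND;
}

/*
 * Offset at which a value of value_len bytes would be placed when values
 * are packed against the end of the block: start from the lowest offset
 * used so far (the block size for an empty block) and move down by the
 * padded size.
 */
static size_t demo_value_offset(size_t min_offs, size_t value_len)
{
    size_t padded = demo_xattr_pad_len(value_len);

    assert(padded <= min_offs);
    return min_offs - padded;
}
```

For a 4096-byte block, a first value of 5 bytes would land at offset 4088 under these assumptions, and a following 4-byte value at 4084, leaving no gap between them.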

[PATCH 17/19] pramfs: write protection

2013-09-07 Thread Marco Stornelli
Add write protection.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/wprotect.c |   39 ++
 fs/pramfs/wprotect.h |  144 ++
 2 files changed, 183 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/wprotect.c
 create mode 100644 fs/pramfs/wprotect.h

diff --git a/fs/pramfs/wprotect.c b/fs/pramfs/wprotect.c
new file mode 100644
index 000..ba1e488
--- /dev/null
+++ b/fs/pramfs/wprotect.c
@@ -0,0 +1,39 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Write protection for the filesystem pages.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/io.h>
+#include "pram.h"
+
+void pram_writeable(void *vaddr, unsigned long size, int rw)
+{
+   int ret = 0;
+   unsigned long nrpages = size >> PAGE_SHIFT;
+   unsigned long addr = (unsigned long)vaddr;
+
+   /* Page aligned */
+   addr &= PAGE_MASK;
+
+   if (size & (PAGE_SIZE - 1))
+       nrpages++;
+
+   if (rw)
+       ret = set_memory_rw(addr, nrpages);
+   else
+       ret = set_memory_ro(addr, nrpages);
+
+   BUG_ON(ret);
+}
diff --git a/fs/pramfs/wprotect.h b/fs/pramfs/wprotect.h
new file mode 100644
index 000..f5ee08d
--- /dev/null
+++ b/fs/pramfs/wprotect.h
@@ -0,0 +1,144 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Memory protection definitions for the PRAMFS filesystem.
+ *
+ * Copyright 2010-2011 Marco Stornelli marco.storne...@gmail.com
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef __WPROTECT_H
+#define __WPROTECT_H
+
+#include <linux/pram_fs.h>
+
+/* pram_memunlock_super() before calling! */
+static inline void pram_sync_super(struct pram_super_block *ps)
+{
+   u32 crc = 0;
+   ps->s_wtime = cpu_to_be32(get_seconds());
+   ps->s_sum = 0;
+   crc = crc32(~0, (__u8 *)ps + sizeof(__be32), PRAM_SB_SIZE -
+               sizeof(__be32));
+   ps->s_sum = cpu_to_be32(crc);
+   /* Keep the redundant super block in sync */
+   memcpy((void *)ps + PRAM_SB_SIZE, (void *)ps, PRAM_SB_SIZE);
+}
+
+/* pram_memunlock_inode() before calling! */
+static inline void pram_sync_inode(struct pram_inode *pi)
+{
+   u32 crc = 0;
+   pi->i_sum = 0;
+   crc = crc32(~0, (__u8 *)pi + sizeof(__be32), PRAM_INODE_SIZE -
+               sizeof(__be32));
+   pi->i_sum = cpu_to_be32(crc);
+}
+
+#ifdef CONFIG_PRAMFS_WRITE_PROTECT
+extern void pram_writeable(void *vaddr, unsigned long size, int rw);
+
+static inline int pram_is_protected(struct super_block *sb)
+{
+   struct pram_sb_info *sbi = (struct pram_sb_info *)sb->s_fs_info;
+   return sbi->s_mount_opt & PRAM_MOUNT_PROTECT;
+}
+
+static inline void __pram_memunlock_range(void *p, unsigned long len)
+{
+   pram_writeable(p, len, 1);
+}
+
+static inline void __pram_memlock_range(void *p, unsigned long len)
+{
+   pram_writeable(p, len, 0);
+}
+
+static inline void pram_memunlock_range(struct super_block *sb, void *p,
+                                        unsigned long len)
+{
+   if (pram_is_protected(sb))
+       __pram_memunlock_range(p, len);
+}
+
+static inline void pram_memlock_range(struct super_block *sb, void *p,
+                                      unsigned long len)
+{
+   if (pram_is_protected(sb))
+       __pram_memlock_range(p, len);
+}
+
+static inline void pram_memunlock_super(struct super_block *sb,
+                                        struct pram_super_block *ps)
+{
+   if (pram_is_protected(sb))
+       __pram_memunlock_range(ps, PRAM_SB_SIZE);
+}
+
+static inline void pram_memlock_super(struct super_block *sb,
+                                      struct pram_super_block *ps)
+{
+   pram_sync_super(ps);
+   if (pram_is_protected(sb))
+       __pram_memlock_range(ps, PRAM_SB_SIZE);
+}
+
+static inline void pram_memunlock_inode(struct super_block *sb,
+                                        struct pram_inode *pi)
+{
+   if (pram_is_protected(sb))
+       __pram_memunlock_range(pi, PRAM_SB_SIZE);
+}
+
+static inline void pram_memlock_inode(struct super_block *sb,
+                                      struct pram_inode *pi)
+{
+   pram_sync_inode(pi);
+   if (pram_is_protected(sb))
+       __pram_memlock_range(pi, PRAM_SB_SIZE);
+}
+
+static inline void
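
(The archived message is cut off here.)

Editorial illustration, not part of the patch: pram_writeable() above rounds the start address down to a page boundary and converts a byte count into a page count, adding one page when the size is not a multiple of the page size. The sketch below reproduces that arithmetic in userspace C, assuming 4 KiB pages. The DEMO_ macros stand in for the kernel's PAGE_SHIFT, PAGE_SIZE and PAGE_MASK, and the function names are invented.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Round an address down to the start of its page. */
static unsigned long demo_page_base(unsigned long addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* Number of pages needed for size bytes, counting a partial last page. */
static unsigned long demo_nrpages(unsigned long size)
{
    unsigned long nrpages = size >> DEMO_PAGE_SHIFT;

    if (size & (DEMO_PAGE_SIZE - 1))
        nrpages++;
    return nrpages;
}
```

The page count here depends on the size alone, as it does in the patch, so a range that starts in the middle of a page and crosses a page boundary can touch one more page than this count reports.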

[PATCH 16/19] pramfs: acl operations

2013-09-07 Thread Marco Stornelli
Add acl operations.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/acl.c |  415 +++
 fs/pramfs/acl.h |   85 +++
 2 files changed, 500 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/acl.c
 create mode 100644 fs/pramfs/acl.h

diff --git a/fs/pramfs/acl.c b/fs/pramfs/acl.c
new file mode 100644
index 000..c0f1f63
--- /dev/null
+++ b/fs/pramfs/acl.c
@@ -0,0 +1,415 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * POSIX ACL operations
+ *
+ * Copyright 2010-2011 Marco Stornelli marco.storne...@gmail.com
+ *
+ * based on fs/ext2/acl.c with the following copyright:
+ *
+ * Copyright (C) 2001-2003 Andreas Gruenbacher, agr...@suse.de
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/capability.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include "pram.h"
+#include "xattr.h"
+#include "acl.h"
+
+/*
+ * Load ACL information from filesystem.
+ */
+static struct posix_acl *pram_acl_load(const void *value, size_t size)
+{
+   const char *end = (char *)value + size;
+   int n, count;
+   struct posix_acl *acl;
+
+   if (!value)
+       return NULL;
+   if (size < sizeof(struct pram_acl_header))
+       return ERR_PTR(-EINVAL);
+   if (((struct pram_acl_header *)value)->a_version !=
+       cpu_to_be32(PRAM_ACL_VERSION))
+       return ERR_PTR(-EINVAL);
+   value = (char *)value + sizeof(struct pram_acl_header);
+   count = pram_acl_count(size);
+   if (count < 0)
+       return ERR_PTR(-EINVAL);
+   if (count == 0)
+       return NULL;
+   acl = posix_acl_alloc(count, GFP_KERNEL);
+   if (!acl)
+       return ERR_PTR(-ENOMEM);
+   for (n = 0; n < count; n++) {
+       struct pram_acl_entry *entry = (struct pram_acl_entry *)value;
+       if ((char *)value + sizeof(struct pram_acl_entry_short) > end)
+           goto fail;
+       acl->a_entries[n].e_tag  = be16_to_cpu(entry->e_tag);
+       acl->a_entries[n].e_perm = be16_to_cpu(entry->e_perm);
+       switch (acl->a_entries[n].e_tag) {
+       case ACL_USER_OBJ:
+       case ACL_GROUP_OBJ:
+       case ACL_MASK:
+       case ACL_OTHER:
+           value = (char *)value +
+               sizeof(struct pram_acl_entry_short);
+           break;
+       case ACL_USER:
+           value = (char *)value + sizeof(struct pram_acl_entry);
+           if ((char *)value > end)
+               goto fail;
+           acl->a_entries[n].e_uid = make_kuid(&init_user_ns,
+               be32_to_cpu(entry->e_id));
+           break;
+       case ACL_GROUP:
+           value = (char *)value + sizeof(struct pram_acl_entry);
+           if ((char *)value > end)
+               goto fail;
+           acl->a_entries[n].e_gid = make_kgid(&init_user_ns,
+               be32_to_cpu(entry->e_id));
+           break;
+       default:
+           goto fail;
+       }
+   }
+   if (value != end)
+       goto fail;
+   return acl;
+
+fail:
+   posix_acl_release(acl);
+   return ERR_PTR(-EINVAL);
+}
+
+/*
+ * Save ACL information into the filesystem.
+ */
+static void *pram_acl_save(const struct posix_acl *acl, size_t *size)
+{
+	struct pram_acl_header *ext_acl;
+	char *e;
+	size_t n;
+
+	*size = pram_acl_size(acl->a_count);
+	ext_acl = kmalloc(sizeof(struct pram_acl_header) + acl->a_count *
+			sizeof(struct pram_acl_entry), GFP_KERNEL);
+	if (!ext_acl)
+		return ERR_PTR(-ENOMEM);
+	ext_acl->a_version = cpu_to_be32(PRAM_ACL_VERSION);
+	e = (char *)ext_acl + sizeof(struct pram_acl_header);
+	for (n = 0; n < acl->a_count; n++) {
+		const struct posix_acl_entry *acl_e = &acl->a_entries[n];
+		struct pram_acl_entry *entry = (struct pram_acl_entry *)e;
+		entry->e_tag  = cpu_to_le16(acl_e->e_tag);
+		entry->e_perm = cpu_to_le16(acl_e->e_perm);
+		switch(acl_e->e_tag) {
+		case ACL_USER:
+			entry->e_id = cpu_to_be32(
+				from_kuid(&init_user_ns, acl_e->e_uid));
+			e += sizeof(struct pram_acl_entry);
+			break;
+		case ACL_GROUP:
+			entry->e_id = cpu_to_be32(
+				from_kgid(&init_user_ns, acl_e->e_gid));
+			e += sizeof(struct

[PATCH 18/19] pramfs: test module

2013-09-07 Thread Marco Stornelli
Add test module.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/pramfs/pramfs_test.c |   47 +++
 1 files changed, 47 insertions(+), 0 deletions(-)
 create mode 100644 fs/pramfs/pramfs_test.c

diff --git a/fs/pramfs/pramfs_test.c b/fs/pramfs/pramfs_test.c
new file mode 100644
index 000..7acfda2
--- /dev/null
+++ b/fs/pramfs/pramfs_test.c
@@ -0,0 +1,47 @@
+/*
+ * BRIEF DESCRIPTION
+ *
+ * Pramfs test module.
+ *
+ * Copyright 2009-2011 Marco Stornelli marco.storne...@gmail.com
+ * Copyright 2003 Sony Corporation
+ * Copyright 2003 Matsushita Electric Industrial Co., Ltd.
+ * 2003-2004 (c) MontaVista Software, Inc. , Steve Longerbeam
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/init.h>
+#include <linux/fs.h>
+#include "pram.h"
+
+int __init test_pramfs_write(void)
+{
+	struct pram_super_block *psb;
+
+	psb = get_pram_super();
+	if (!psb) {
+		printk(KERN_ERR
+		       "%s: PRAMFS super block not found (not mounted?)\n",
+		       __func__);
+		return 1;
+	}
+
+	/*
+	 * Attempt an unprotected clear of checksum information in the
+	 * superblock, this should cause a kernel page protection fault.
+	 */
+	printk("%s: writing to kernel VA %p\n", __func__, psb);
+	psb->s_sum = 0;
+
+	return 0;
+}
+
+void test_pramfs_write_cleanup(void) {}
+
+/* Module information */
+MODULE_LICENSE("GPL");
+module_init(test_pramfs_write);
+module_exit(test_pramfs_write_cleanup);
-- 
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 19/19] pramfs: Kconfig and makefile

2013-09-07 Thread Marco Stornelli
Add Kconfig and makefile.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/Kconfig |6 +++-
 fs/Makefile|1 +
 fs/pramfs/Kconfig  |   72 
 fs/pramfs/Makefile |   14 ++
 4 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 fs/pramfs/Kconfig
 create mode 100644 fs/pramfs/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index c229f82..fd86a48 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -17,7 +17,7 @@ source "fs/ext4/Kconfig"
 config FS_XIP
 # execute in place
 	bool
-	depends on EXT2_FS_XIP
+	depends on EXT2_FS_XIP || PRAMFS_XIP
 	default y
 
 source "fs/jbd/Kconfig"
@@ -29,7 +29,8 @@ config FS_MBCACHE
 	default y if EXT2_FS=y && EXT2_FS_XATTR
 	default y if EXT3_FS=y && EXT3_FS_XATTR
 	default y if EXT4_FS=y
-	default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS
+	default y if PRAMFS=y && PRAMFS_XATTR
+	default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS || PRAMFS_XATTR
 
 source "fs/reiserfs/Kconfig"
 source "fs/jfs/Kconfig"
@@ -209,6 +210,7 @@ source "fs/romfs/Kconfig"
 source "fs/pstore/Kconfig"
 source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
+source "fs/pramfs/Kconfig"
 source "fs/exofs/Kconfig"
 source "fs/f2fs/Kconfig"
 source "fs/efivarfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 4fe6df3..f8e70df 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -126,3 +126,4 @@ obj-y   += exofs/ # Multiple modules
 obj-$(CONFIG_CEPH_FS)  += ceph/
 obj-$(CONFIG_PSTORE)   += pstore/
 obj-$(CONFIG_EFIVAR_FS)+= efivarfs/
+obj-$(CONFIG_PRAMFS)   += pramfs/
diff --git a/fs/pramfs/Kconfig b/fs/pramfs/Kconfig
new file mode 100644
index 000..0ca2402
--- /dev/null
+++ b/fs/pramfs/Kconfig
@@ -0,0 +1,72 @@
+config PRAMFS
+	tristate "Persistent and Protected RAM file system support"
+	depends on HAS_IOMEM
+	select CRC32
+	help
+	  If your system has a block of fast (comparable in access speed to
+	  system memory) and non-volatile RAM and you wish to mount a
+	  light-weight, full-featured, and space-efficient filesystem over it,
+	  say Y here, and read <file:Documentation/filesystems/pramfs.txt>.
+
+	  To compile this as a module, choose M here: the module will be
+	  called pramfs.
+
+config PRAMFS_XIP
+	bool "Execute-in-place in PRAMFS"
+	depends on PRAMFS && BLOCK
+	help
+	  Say Y here to enable XIP feature of PRAMFS.
+
+config PRAMFS_WRITE_PROTECT
+	bool "PRAMFS write protection"
+	depends on PRAMFS && MMU && HAVE_SET_MEMORY_RO
+	default y
+	help
+	  Say Y here to enable the write protect feature of PRAMFS.
+
+config PRAMFS_XATTR
+	bool "PRAMFS extended attributes"
+	depends on PRAMFS && BLOCK
+	help
+	  Extended attributes are name:value pairs associated with inodes by
+	  the kernel or by users (see the attr(5) manual page, or visit
+	  <http://acl.bestbits.at/> for details).
+
+	  If unsure, say N.
+
+config PRAMFS_POSIX_ACL
+	bool "PRAMFS POSIX Access Control Lists"
+	depends on PRAMFS_XATTR
+	select FS_POSIX_ACL
+	help
+	  Posix Access Control Lists (ACLs) support permissions for users and
+	  groups beyond the owner/group/world scheme.
+
+	  To learn more about Access Control Lists, visit the Posix ACLs for
+	  Linux website <http://acl.bestbits.at/>.
+
+	  If you don't know what Access Control Lists are, say N.
+
+config PRAMFS_SECURITY
+	bool "PRAMFS Security Labels"
+	depends on PRAMFS_XATTR
+	help
+	  Security labels support alternative access control models
+	  implemented by security modules like SELinux.  This option
+	  enables an extended attribute handler for file security
+	  labels in the pram filesystem.
+
+	  If you are not using a security module that requires using
+	  extended attributes for file security labels, say N.
+
+config PRAMFS_TEST
+	boolean
+	depends on PRAMFS
+
+config PRAMFS_TEST_MODULE
+	tristate "PRAMFS Test"
+	depends on PRAMFS && PRAMFS_WRITE_PROTECT && m
+	select PRAMFS_TEST
+	help
+	  Say Y here to build a simple module to test the protection of
+	  PRAMFS. The module will be called pramfs_test.
diff --git a/fs/pramfs/Makefile b/fs/pramfs/Makefile
new file mode 100644
index 000..055f0bb
--- /dev/null
+++ b/fs/pramfs/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for the linux pram-filesystem routines.
+#
+
+obj-$(CONFIG_PRAMFS) += pramfs.o
+obj-$(CONFIG_PRAMFS_TEST_MODULE) += pramfs_test.o
+
+pramfs-y := balloc.o dir.o file.o inode.o namei.o super.o symlink.o ioctl.o
+
+pramfs-$(CONFIG_PRAMFS_WRITE_PROTECT) += wprotect.o
+pramfs-$(CONFIG_PRAMFS_XIP) += xip.o
+pramfs-$(CONFIG_PRAMFS_XATTR) += xattr.o xattr_user.o xattr_trusted.o desctree.o
+pramfs

Re: [PATCH 12/19] pramfs: symlink operations

2013-09-07 Thread Marco Stornelli

Il 07/09/2013 16:41, Al Viro ha scritto:

On Sat, Sep 07, 2013 at 10:29:15AM +0200, Marco Stornelli wrote:

+static int pram_readlink(struct dentry *dentry, char __user *buffer, int buflen)
+{
+	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = inode->i_sb;
+	u64 block;
+	char *blockp;
+
+	block = pram_find_data_block(inode, 0);
+	blockp = pram_get_block(sb, block);
+	return vfs_readlink(dentry, buffer, buflen, blockp);
+}



+static void *pram_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = inode->i_sb;
+	off_t block;
+	int status;
+	char *blockp;
+
+	block = pram_find_data_block(inode, 0);
+	blockp = pram_get_block(sb, block);
+	status = vfs_follow_link(nd, blockp);
+	return ERR_PTR(status);
+}


Just nd_set_link(nd, blockp) instead of that vfs_follow_link() and be
done with that; that way you can use generic_readlink() instead of
pram_readlink() *and* get lower stack footprint on traversing them.




Yep, you're right (as usual :))

Marco



Re: [PATCH 08/19] pramfs: file operations for dirs

2013-09-07 Thread Marco Stornelli

Il 07/09/2013 17:01, Al Viro ha scritto:

On Sat, Sep 07, 2013 at 10:22:36AM +0200, Marco Stornelli wrote:

+int pram_add_link(struct dentry *dentry, struct inode *inode)
+{
+	struct inode *dir = dentry->d_parent->d_inode;
+	struct pram_inode *pidir, *pi, *pitail = NULL;
+	u64 tail_ino, prev_ino;
+
+	const char *name = dentry->d_name.name;
+
+	int namelen = min_t(unsigned int, dentry->d_name.len, PRAM_NAME_LEN);


Whatever the hell for?  Your ->lookup() rejects dentries with names longer
than PRAM_NAME_LEN with an error, so they won't reach this function at all.



Ok. I'll remove it.


+int pram_remove_link(struct inode *inode)


Umm...  That's called on rename (for old one) *and* inode eviction when link
count goes to zero.  What's the point of keeping unlinked ones (unlink/rmdir/
rename victims) on those lists?  Sure, you skip them on lookups, but why
delay link removal until eviction?  You pay for that with extra locking,
BTW - if not for that, you wouldn't need your i_link_mutex at all.



Good question. The only answer I have right now is historical reasons. 
At the moment I can't see why we couldn't remove the link information 
right away in the opened-but-unlinked case, instead of delaying the 
operation until evict.



+	pi = pram_get_inode(sb, inode->i_ino);
+
+	switch ((u32)file->f_pos) {
+	case 0:
+		ret = dir_emit_dot(file, ctx);
+		ctx->pos = 1;
+		return ret;


Really?  So on the first call of ->iterate() you just generate one
entry and don't even try to produce more?  And it looks like the
rest is no nicer...



I'll try to improve the behavior here.

Marco
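
For readers following the ->iterate() comment above: the usual pattern is to keep emitting entries within a single call until the caller's buffer is full, and to resume from the saved position on the next call. Below is a minimal user-space sketch of that pattern. It is not pramfs or kernel code: struct demo_dir_ctx, demo_emit() and demo_iterate() are hypothetical stand-ins for struct dir_context, dir_emit() and the filesystem's ->iterate() method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct dir_context. */
struct demo_dir_ctx {
	long pos;		/* resume point: 0 = ".", 1 = "..", 2+ = entries */
	size_t room;		/* names the caller's buffer can still take */
	const char *out[16];	/* names emitted so far */
	size_t nout;
};

/* Returns 1 if the name fit, 0 if the buffer is full (modelled on dir_emit()). */
static int demo_emit(struct demo_dir_ctx *ctx, const char *name)
{
	if (ctx->room == 0 || ctx->nout == 16)
		return 0;
	ctx->out[ctx->nout++] = name;
	ctx->room--;
	return 1;
}

/*
 * Emit as many entries as fit in one call. pos is advanced only after a
 * successful emit, so a later call resumes exactly where this one stopped.
 */
static int demo_iterate(const char *const *names, size_t count,
			struct demo_dir_ctx *ctx)
{
	assert(ctx->pos >= 0);

	if (ctx->pos == 0) {
		if (!demo_emit(ctx, "."))
			return 0;
		ctx->pos = 1;
	}
	if (ctx->pos == 1) {
		if (!demo_emit(ctx, ".."))
			return 0;
		ctx->pos = 2;
	}
	while ((size_t)(ctx->pos - 2) < count) {
		if (!demo_emit(ctx, names[ctx->pos - 2]))
			return 0;
		ctx->pos++;
	}
	return 0;
}
```

With three names and enough room, one call emits ".", ".." and all three names; with room for only two names, the first call stops after ".." and a second call picks up at the first real entry.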


Re: [PATCH 09/19] pramfs: inode operations for dirs

2013-09-07 Thread Marco Stornelli

Il 07/09/2013 17:08, Al Viro ha scritto:

On Sat, Sep 07, 2013 at 10:23:42AM +0200, Marco Stornelli wrote:


+static int pram_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+	struct pram_inode *pi;
+	int err = -ENOTEMPTY;
+
+	if (!inode)
+		return -ENOENT;
+
+	pi = pram_get_inode(dir->i_sb, inode->i_ino);
+
+	/* directory to delete is empty? */
+	if (pi->i_type.dir.tail == 0) {
+		inode->i_ctime = dir->i_ctime;
+		inode->i_size = 0;
+		clear_nlink(inode);
+		pram_write_inode(inode, NULL);
+		pram_dec_count(dir);
+		err = 0;
+	} else {
+		pram_dbg("dir not empty\n");
+	}
+
+	return err;
+}


... and here you are paying for delayed removal of entries:
mkdir foo
touch foo/bar
rm -rf foo <foo/bar
will fail, since opened-and-unlinked file in effect keeps the directory
where it used to be not empty from your rmdir (and rename) POV.



Yep, same problem as before. I think I can move the link removal into 
pram_dec_count and adjust the evict path; it should be easy to 
manage.


Thanks for your comments Al

Marco
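
The behaviour Al describes can be checked from user space. The sketch below is a hypothetical test helper, not part of the patch set; the name rmdir_with_open_unlinked() and the foo/bar layout are chosen only to mirror the shell example above. It creates a directory, opens a file inside it, unlinks the file while the descriptor is still open, and then tries rmdir(). On a filesystem that drops the directory entry at unlink time, the rmdir() is expected to succeed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for mkdtemp() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 0 if rmdir() succeeds while an unlinked file from that
 * directory is still held open, -1 on any failure.
 */
static int rmdir_with_open_unlinked(const char *base)
{
	char dir[4096], file[4096 + 8];
	int fd, ret = -1;

	assert(base != NULL);
	if (snprintf(dir, sizeof(dir), "%s/foo.XXXXXX", base) >= (int)sizeof(dir))
		return -1;
	if (!mkdtemp(dir))			/* mkdir foo */
		return -1;
	snprintf(file, sizeof(file), "%s/bar", dir);

	fd = open(file, O_CREAT | O_RDWR, 0600);	/* touch foo/bar, keep it open */
	if (fd < 0) {
		rmdir(dir);
		return -1;
	}
	if (unlink(file) == 0)		/* foo/bar is now opened-and-unlinked */
		ret = rmdir(dir);	/* should not see a leftover entry */
	close(fd);
	if (ret != 0) {			/* best-effort cleanup on failure */
		unlink(file);
		rmdir(dir);
		ret = -1;
	}
	return ret;
}
```

Calling rmdir_with_open_unlinked() with the mount point of the filesystem under test as the argument gives a quick regression check for the delayed-removal problem.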


Re: [PATCH 00/19] pramfs

2013-09-07 Thread Marco Stornelli

Il 07/09/2013 16:58, richard -rw- weinberger ha scritto:

On Sat, Sep 7, 2013 at 10:14 AM, Marco Stornelli
marco.storne...@gmail.com wrote:

Hi all,

this is an attempt to include pramfs in mainline. At the moment pramfs
has been included in LTSI kernel. Since last review the code is more
or less the same but, with a really big thanks to Vladimir Davydov and
Parallels, the development of fsck has been started and we have now
the possibility to correct fs errors due to corruption. It's a "young"
tool but we are working on it. You can clone the code from our repos:

git clone git://git.code.sf.net/p/pramfs/code pramfs-code
git clone git://git.code.sf.net/p/pramfs/Tools pramfs-Tools


I'm a bit confused, what kind of non-volatile RAM is your fs targeting?
Wouldn't it make sense to use pstore like
arch/powerpc/platforms/pseries/nvram.c does?



Usually battery-backed SRAM, but it can actually be used on any directly 
accessible piece of RAM, and it provides a normal, complete fs 
interface. I usually test the fs by remapping part of my system RAM. You 
can find documentation here:


http://pramfs.sourceforge.net

Marco


O_TMPFILE problem

2013-07-28 Thread Marco Stornelli

Hi,

I'm running a couple of tests with O_TMPFILE on my fs. I can see that when 
the file is closed, the allocated blocks are not freed. This happens 
because of i_mode: the inode is not a regular file, a directory, or a 
link. I added S_IFREG in my implementation of the tmpfile callback, where 
I call new_inode(), and now it works, but am I missing something here?


Marco
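
As a user-space cross-check of the above: open(2) documents O_TMPFILE as creating an unnamed temporary regular file, so fstat() on such a descriptor is expected to report S_IFREG. A minimal sketch follows, assuming Linux 3.11 or later and a libc that exposes O_TMPFILE. The helper name tmpfile_is_regular() is hypothetical and not part of pramfs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for O_TMPFILE in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if an O_TMPFILE descriptor created in dir reports S_IFREG,
 * 0 if it reports another file type, and -1 if O_TMPFILE is unavailable
 * or the open()/fstat() call fails (e.g. the fs has no tmpfile support).
 */
static int tmpfile_is_regular(const char *dir)
{
	assert(dir != NULL);
#ifdef O_TMPFILE
	struct stat st;
	int fd, ret;

	fd = open(dir, O_TMPFILE | O_RDWR, 0600);
	if (fd < 0)
		return -1;
	ret = fstat(fd, &st) == 0 ? (S_ISREG(st.st_mode) != 0) : -1;
	close(fd);	/* the unnamed inode and its blocks should be freed here */
	return ret;
#else
	return -1;
#endif
}
```

Passing the mount point of the filesystem under test shows which mode the inode ends up with. A result of 0 would point at the missing S_IFREG described above.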


Re: [PATCH RFC] pram: persistent over-kexec memory file system

2013-07-28 Thread Marco Stornelli

Il 28/07/2013 12:05, Vladimir Davydov ha scritto:

On 07/27/2013 09:37 PM, Marco Stornelli wrote:

Il 27/07/2013 19:35, Vladimir Davydov ha scritto:

On 07/27/2013 07:41 PM, Marco Stornelli wrote:

Il 26/07/2013 14:29, Vladimir Davydov ha scritto:

Hi,

We want to propose a way to upgrade a kernel on a machine without
restarting all the user-space services. This is to be done with CRIU
project, but we need help from the kernel to preserve some data in
memory while doing kexec.

The key point of our implementation is leaving process memory in-place
during reboot. This should eliminate most io operations the services
would produce during initialization. To achieve this, we have
implemented a pseudo file system that preserves its content during
kexec. We propose saving CRIU dump files to this file system,
kexec'ing
and then restoring the processes in the newly booted kernel.



http://pramfs.sourceforge.net/


AFAIU it's a bit different thing: PRAMFS as well as pstore, which has
already been merged, requires hardware support for over-reboot
persistency, so called non-volatile RAM, i.e. RAM which is not directly
accessible and so is not used by the kernel. On the contrary, what we'd
like to have is preserving usual RAM on kexec. It is possible, because
RAM is not reset during kexec. This would allow leaving applications
working set as well as filesystem caches in place, speeding the reboot
process as a whole and reducing the downtime significantly.

Thanks.


Actually not. You can use normal system RAM reserved at boot with the mem
parameter, without any kernel change. Until a hard reset happens, that
area will be "persistent".


Thank you, we'll look at PRAMFS closer, but right now, after trying it I
have a couple of concerns I'd appreciate if you could clarify:

1) As you advised, I tried to reserve a range of memory (passing
memmap=4G$4G at boot) and mounted PRAMFS using the following options:

# mount -t pramfs -o physaddr=0x1,init=4G,bs=4096 none /mnt/pramfs

And it turned out that PRAMFS is very slow as compared to ramfs:

# dd if=/dev/zero of=/mnt/pramfs/dummy bs=4096 count=$[100*1024]
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 9.23498 s, 45.4 MB/s
# dd if=/dev/zero of=/mnt/pramfs/dummy bs=4096 count=$[100*1024] conv=notrunc
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 3.04692 s, 138 MB/s

We need it to be as fast as usual RAM, because otherwise the benefit of
it over hdd disappears. So before diving into the code, I'd like to ask
you if it's intrinsic to PRAMFS, or can it be fixed? Or, perhaps, I used
wrong mount/boot/config options (btw, I enabled only CONFIG_PRAMFS)?



On x86 you probably have write protection enabled. Turn it off, or 
mount with the noprotect option.



2) To enable saving application dump files in memory using PRAMFS, one
should reserve half of RAM for it. That's too expensive. While with
ramfs, once SPLICE_F_MOVE flag is implemented, one could move anonymous
memory pages to ramfs page cache and after kexec move it back so that
almost no extra memory space costs would be required. Of course,
SPLICE_F_MOVE is to be yet implemented, but with PRAMFS significant
memory costs are inevitable... or am I wrong?

Thanks.


From this point of view you are right. Pramfs (or any similar solution) 
sits outside the page cache, so you can't do any memory transfer. It's 
like having a disk that is actually a separate piece of RAM. We could 
talk about it again once this kind of implementation is done.


Marco


Re: [PATCH RFC] pram: persistent over-kexec memory file system

2013-07-27 Thread Marco Stornelli

Il 27/07/2013 19:35, Vladimir Davydov ha scritto:

On 07/27/2013 07:41 PM, Marco Stornelli wrote:

Il 26/07/2013 14:29, Vladimir Davydov ha scritto:

Hi,

We want to propose a way to upgrade a kernel on a machine without
restarting all the user-space services. This is to be done with CRIU
project, but we need help from the kernel to preserve some data in
memory while doing kexec.

The key point of our implementation is leaving process memory in-place
during reboot. This should eliminate most io operations the services
would produce during initialization. To achieve this, we have
implemented a pseudo file system that preserves its content during
kexec. We propose saving CRIU dump files to this file system, kexec'ing
and then restoring the processes in the newly booted kernel.



http://pramfs.sourceforge.net/


AFAIU it's a bit different thing: PRAMFS as well as pstore, which has
already been merged, requires hardware support for over-reboot
persistency, so called non-volatile RAM, i.e. RAM which is not directly
accessible and so is not used by the kernel. On the contrary, what we'd
like to have is preserving usual RAM on kexec. It is possible, because
RAM is not reset during kexec. This would allow leaving applications
working set as well as filesystem caches in place, speeding the reboot
process as a whole and reducing the downtime significantly.

Thanks.


Actually not. You can use normal system RAM reserved at boot with the mem 
parameter, without any kernel change. Until a hard reset happens, that 
area will be "persistent".


Regards,

Marco


Re: [PATCH RFC] pram: persistent over-kexec memory file system

2013-07-27 Thread Marco Stornelli

Il 26/07/2013 14:29, Vladimir Davydov ha scritto:

Hi,

We want to propose a way to upgrade a kernel on a machine without
restarting all the user-space services. This is to be done with CRIU
project, but we need help from the kernel to preserve some data in
memory while doing kexec.

The key point of our implementation is leaving process memory in-place
during reboot. This should eliminate most io operations the services
would produce during initialization. To achieve this, we have
implemented a pseudo file system that preserves its content during
kexec. We propose saving CRIU dump files to this file system, kexec'ing
and then restoring the processes in the newly booted kernel.



http://pramfs.sourceforge.net/

Marco



[3.11-rc1] kernel hangs

2013-07-21 Thread Marco Stornelli

Hi,

I'm trying 3.11-rc1 but my vm hangs at boot. I can't see anything 
because the system hangs after "booting the kernel". Attached my 
.config. Extra patches: pramfs (with or without it's the same). My 
environment: intel cpu, 32bit, vm with 64MB RAM, no initramfs, rootfs 
over NFS on the host system. Is it a known problem?


Regards,

Marco


.config
Description: application/config


Re: [PATCH v2] fs: add jfsv3 (AIX powerpc native JFS file system) read-only support

2013-05-24 Thread Marco Stornelli

Il 22/05/2013 18:57, p...@macq.eu ha scritto:

From: Philippe De Muyter 

This is a file system driver for the file system called JFS on AIX, but
different from what's called jfs on linux.  In AIX header files this
file system seems to be called "Version 3" or "Version 3p", hence its
name here.  This driver supports only read-only access to such file systems,
and has been tested successfully on AIX 3.5, AIX 4.1 and AIX 4.2 filesystems.

Signed-off-by: Philippe De Muyter 
Tested-by: Jori Mantysalo 
---
  fs/Kconfig|1 +
  fs/Makefile   |1 +
  fs/jfsv3/Kconfig  |   10 +
  fs/jfsv3/Makefile |7 +
  fs/jfsv3/inode.c  |  707 +
  5 files changed, 726 insertions(+), 0 deletions(-)
  create mode 100644 fs/jfsv3/Kconfig
  create mode 100644 fs/jfsv3/Makefile
  create mode 100644 fs/jfsv3/inode.c

diff --git a/fs/Kconfig b/fs/Kconfig
index c229f82..807823a 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -212,6 +212,7 @@ source "fs/ufs/Kconfig"
  source "fs/exofs/Kconfig"
  source "fs/f2fs/Kconfig"
  source "fs/efivarfs/Kconfig"
+source "fs/jfsv3/Kconfig"

  endif # MISC_FILESYSTEMS

diff --git a/fs/Makefile b/fs/Makefile
index 4fe6df3..99cd8e6 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -122,6 +122,7 @@ obj-$(CONFIG_OCFS2_FS)  += ocfs2/
  obj-$(CONFIG_BTRFS_FS)+= btrfs/
  obj-$(CONFIG_GFS2_FS)   += gfs2/
  obj-$(CONFIG_F2FS_FS) += f2fs/
+obj-$(CONFIG_JFSV3_FS) += jfsv3/
  obj-y += exofs/ # Multiple modules
  obj-$(CONFIG_CEPH_FS) += ceph/
  obj-$(CONFIG_PSTORE)  += pstore/
diff --git a/fs/jfsv3/Kconfig b/fs/jfsv3/Kconfig
new file mode 100644
index 000..4ba73c5
--- /dev/null
+++ b/fs/jfsv3/Kconfig
@@ -0,0 +1,10 @@
+config JFSV3_FS
+   tristate "AIX jfsv3 file system support"
+   ---help---
+ Read-only support for AIX jfs file systems (not to be confused
+ with linux jfs).  You should normally also select support for
+ AIX LVM partitions, but if you manage to get a AIX file system
+ image by another way (dd, e.g.), selecting this is enough.  You'll
+ be able to mount your disk image using the loop driver.
+ To compile this file system support as a module, choose M here: the
+ module will be called jfsv3.
diff --git a/fs/jfsv3/Makefile b/fs/jfsv3/Makefile
new file mode 100644
index 000..d6ecd66
--- /dev/null
+++ b/fs/jfsv3/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the AIX jfsv3 filesystem routines.
+#
+
+obj-$(CONFIG_JFSV3_FS) += jfsv3.o
+
+jfsv3-objs := inode.o
diff --git a/fs/jfsv3/inode.c b/fs/jfsv3/inode.c
new file mode 100644
index 000..bd40103
--- /dev/null
+++ b/fs/jfsv3/inode.c
@@ -0,0 +1,707 @@
+/*
+ * AIX JFS Version 3/3p file system, Linux read-only implementation
+ *
+ * Copyright (C) 2012-2013  Philippe De Muyter <p...@macqel.be>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
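+#include <linux/module.h>
+#include <linux/namei.h>
+#include <linux/statfs.h>
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/fs.h>
+#include <linux/slab.h>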
+
+struct  jfsv3_superblock {
+   __be32 s_magic;/* magic number */
+   char   s_cpu;  /* Target cpu type code */
+   char   s_flag1;/* reserved */
+   char   s_flag2;/* reserved */
+   char   s_type; /* File system type code */
+   __be32 s_agsize;   /* fragments per allocation group */
+   __be32 s_logserial;/* serial number of log when fs mounted */
+   __be32 s_fsize;/* size (in 512 bytes) of entire fs */
+   __be16 s_bsize;/* block size in bytes */
+   __be16 s_spare;/* unused. */
+   char   s_fname[6]; /* name of this file system */
+   char   s_fpack[6]; /* name of this volume */
+   __be32 s_logdev;   /* device address of log */
+
+   /* current file system state information, values change over time */
+   char   s_fmod; /* flag: set when file system is mounted */
+   char   s_ronly;/* flag: file system is read only */
+   __be32 s_time; /* time of last superblock update */
+
+   /* more persistent information */
+   __be32 s_version;  /* version number */
+   __be32 s_fragsize; /* fragment size in bytes (fsv3p only) */
+   __be32 s_iagsize;  /* disk inode per alloc grp (fsv3p only) */
+   __be32 s_compress; /* > 0 if data compression */
+};
+
+#define JFSV3_SUPER_MAGIC 0x43218765   /* Version 3 fs magic number */
+#define JFSV3P_SUPER_MAGIC 0x65872143  /* Version 3p fs magic number */
+


Please define these magic numbers in magic.h.


+#define D_PRIVATE 48   /* max len of in-inode symlink */
+
+struct jfsv3_dinode {
+   __be32 di_gen;
+   __be32 di_mode;
+   __be16 di_nlink;
+   __be16 di_acct;
+  

[PATCH 4/4] fsfreeze: return EINTR from mnt_want_write and mnt_want_write_file

2013-05-04 Thread Marco Stornelli
Replaced sb_start_write with sb_start_write_killable inside
mnt_want_write and mnt_want_write_file.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/namei.c |6 ++
 fs/namespace.c |8 ++--
 ipc/mqueue.c   |6 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8..5f239fd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2750,6 +2750,8 @@ static int do_last(struct nameidata *nd, struct path *path,
 retry_lookup:
if (op->open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
error = mnt_want_write(nd->path.mnt);
+   if (error == -EINTR)
+   goto out;
if (!error)
got_write = true;
/*
@@ -3053,6 +3055,10 @@ struct dentry *kern_path_create(int dfd, const char *pathname,
 
/* don't fail immediately if it's r/o, at least try to report other errors */
err2 = mnt_want_write(nd.path.mnt);
+   if (err2 == -EINTR) {
+   dentry = ERR_PTR(-EINTR);
+   goto out;
+   }
/*
 * Do the final lookup.
 */
diff --git a/fs/namespace.c b/fs/namespace.c
index b4f96a5..2028e74 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -345,7 +345,9 @@ int mnt_want_write(struct vfsmount *m)
 {
int ret;
 
-   sb_start_write(m->mnt_sb);
+   ret = sb_start_write_killable(m->mnt_sb);
+   if (ret < 0)
+   return ret;
ret = __mnt_want_write(m);
if (ret)
sb_end_write(m->mnt_sb);
@@ -405,7 +407,9 @@ int mnt_want_write_file(struct file *file)
 {
int ret;
 
-   sb_start_write(file->f_path.mnt->mnt_sb);
+   ret = sb_start_write_killable(file->f_path.mnt->mnt_sb);
+   if (ret < 0)
+   return ret;
ret = __mnt_want_write_file(file);
if (ret)
sb_end_write(file->f_path.mnt->mnt_sb);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index e4e47f6..e8fdc03 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -800,7 +800,11 @@ SYSCALL_DEFINE4(mq_open, const char __user *, u_name, int, oflag, umode_t, mode,
if (fd < 0)
goto out_putname;
 
-   ro = mnt_want_write(mnt);   /* we'll drop it in any case */
+   ro = mnt_want_write(mnt);
+   if (ro == -EINTR) {
+   fd = ro;
+   goto out_putname;
+   }
error = 0;
mutex_lock(&root->d_inode->i_mutex);
path.dentry = lookup_one_len(name->name, root, strlen(name->name));
-- 
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
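
The change to kern_path_create() in the patch above can be read as an error-precedence rule. What follows is a simplified user-space model of that rule, not kernel code: the names create_path_model() and create_path_model_selfcheck() and the two integer parameters are hypothetical, and -EEXIST is used only as an example of a lookup error.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model, not kernel code. want_write_err stands for the value
 * returned by mnt_want_write(); lookup_err stands for the outcome of the
 * final lookup (0 meaning success).
 */
int create_path_model(int want_write_err, int lookup_err)
{
	/* An interrupted wait aborts at once; the lookup is never attempted. */
	if (want_write_err == -EINTR)
		return -EINTR;

	/* Other failures such as -EROFS are deferred so a lookup error wins. */
	if (lookup_err)
		return lookup_err;

	/* No lookup error: report the deferred result (0 on success). */
	return want_write_err;
}

/* Usage example covering the four interesting combinations. */
void create_path_model_selfcheck(void)
{
	assert(create_path_model(-EINTR, -EEXIST) == -EINTR);
	assert(create_path_model(-EROFS, -EEXIST) == -EEXIST);
	assert(create_path_model(-EROFS, 0) == -EROFS);
	assert(create_path_model(0, 0) == 0);
}
```

The point of the sketch is that -EINTR is the only mnt_want_write() result that short-circuits; a read-only failure is still held back, as the existing comment in the hunk describes.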


[PATCH 3/4] fsfreeze: use sb_start_write_killable instead of sb_start_write

2013-05-04 Thread Marco Stornelli
Replace sb_start_write with sb_start_write_killable where
possible.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/open.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 8c74100..d621d76 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -182,7 +182,9 @@ static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
if (IS_APPEND(inode))
goto out_putf;
 
-   sb_start_write(inode->i_sb);
+   error = sb_start_write_killable(inode->i_sb);
+   if (error < 0)
+   return error;
error = locks_verify_truncate(inode, f.file, length);
if (!error)
error = security_path_truncate(&f.file->f_path);
@@ -273,7 +275,9 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
if (!file->f_op->fallocate)
return -EOPNOTSUPP;
 
-   sb_start_write(inode->i_sb);
+   ret = sb_start_write_killable(inode->i_sb);
+   if (ret < 0)
+   return ret;
ret = file->f_op->fallocate(file, mode, offset, len);
sb_end_write(inode->i_sb);
return ret;
-- 
1.7.3.4


[PATCH 2/4] fsfreeze: added new file_start_write_killable

2013-05-04 Thread Marco Stornelli
Replace file_start_write with file_start_write_killable where
possible.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/read_write.c  |   38 ++
 fs/splice.c  |4 +++-
 include/linux/fs.h   |   17 +
 6 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b2955b3..321cf26 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -230,7 +230,9 @@ static int __do_lo_send_write(struct file *file,
ssize_t bw;
mm_segment_t old_fs = get_fs();
 
-   file_start_write(file);
+   bw = file_start_write_killable(file);
+   if (bw < 0)
+   return bw;
set_fs(get_ds());
bw = file->f_op->write(file, buf, len, &pos);
set_fs(old_fs);
diff --git a/fs/aio.c b/fs/aio.c
index 351afe7..692b408 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1324,8 +1324,11 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb)
if (iocb->ki_pos < 0)
return -EINVAL;
 
-   if (opcode == IOCB_CMD_PWRITEV)
-   file_start_write(file);
+   if (opcode == IOCB_CMD_PWRITEV) {
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   return ret;
+   }
do {
ret = rw_op(iocb, &iocb->ki_iovec[iocb->ki_cur_seg],
iocb->ki_nr_segs - iocb->ki_cur_seg,
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 380b798..c5708d0 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -79,7 +79,9 @@ coda_file_write(struct file *coda_file, const char __user *buf, size_t count, lo
return -EINVAL;
 
host_inode = file_inode(host_file);
-   file_start_write(host_file);
+   ret = file_start_write_killable(host_file);
+   if (ret < 0)
+   return ret;
mutex_lock(&coda_inode->i_mutex);
 
ret = host_file->f_op->write(host_file, buf, count, ppos);
diff --git a/fs/read_write.c b/fs/read_write.c
index 605dbbc..b561818 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -457,21 +457,23 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
return -EFAULT;
 
ret = rw_verify_area(WRITE, file, pos, count);
-   if (ret >= 0) {
-   count = ret;
-   file_start_write(file);
-   if (file->f_op->write)
-   ret = file->f_op->write(file, buf, count, pos);
-   else
-   ret = do_sync_write(file, buf, count, pos);
-   if (ret > 0) {
-   fsnotify_modify(file);
-   add_wchar(current, ret);
-   }
-   inc_syscw(current);
-   file_end_write(file);
+   if (ret < 0)
+   goto out;
+   count = ret;
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
+   if (file->f_op->write)
+   ret = file->f_op->write(file, buf, count, pos);
+   else
+   ret = do_sync_write(file, buf, count, pos);
+   if (ret > 0) {
+   fsnotify_modify(file);
+   add_wchar(current, ret);
}
-
+   inc_syscw(current);
+   file_end_write(file);
+out:
return ret;
 }
 
@@ -745,7 +747,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
} else {
fn = (io_fn_t)file->f_op->write;
fnv = file->f_op->aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
}
 
if (fnv)
@@ -925,7 +929,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
} else {
fn = (io_fn_t)file->f_op->write;
fnv = file->f_op->aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
}
 
if (fnv)
diff --git a/fs/splice.c b/fs/splice.c
index e6b2559..b37c30e 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1115,7 +1115,9 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
else
splice_write = default_file_splice_write;
 
-   file_start_write(out);
+   ret = file_start_write_killable(out);
+   if (ret < 0)
+   return ret;
ret = splice_write(pipe, out, ppos, len, flags);
file_end_write(out);
return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6d9bcef..a85091e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1404,6 +1404,16 @@ static inline void sb_start_write(struct super_block *sb)

[PATCH 1/4] fsfreeze: wait in killable state in __sb_start_write

2013-05-04 Thread Marco Stornelli
Add a new enum that selects whether __sb_start_write sleeps in
uninterruptible state, sleeps in killable state, or returns immediately.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/super.c |   24 ++--
 include/linux/fs.h |   19 +--
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 7465d43..6b70c7f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1190,14 +1190,25 @@ static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
  * This is an internal function, please use sb_start_{write,pagefault,intwrite}
  * instead.
  */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
+int __sb_start_write(struct super_block *sb, int level, int wait)
 {
+   int ret = 0;
 retry:
if (unlikely(sb->s_writers.frozen >= level)) {
-   if (!wait)
-   return 0;
-   wait_event(sb->s_writers.wait_unfrozen,
-  sb->s_writers.frozen < level);
+   switch (wait) {
+   case FREEZE_NOWAIT:
+   return ret;
+   case FREEZE_WAIT:
+   wait_event(sb->s_writers.wait_unfrozen,
+  sb->s_writers.frozen < level);
+   break;
+   case FREEZE_WAIT_KILLABLE:
+   ret = wait_event_killable(sb->s_writers.wait_unfrozen,
+  sb->s_writers.frozen < level);
+   if (ret)
+   return -EINTR;
+   break;
+   }
}
 
 #ifdef CONFIG_LOCKDEP
@@ -1213,7 +1224,8 @@ retry:
__sb_end_write(sb, level);
goto retry;
}
-   return 1;
+   ret = 1;
+   return ret;
 }
 EXPORT_SYMBOL(__sb_start_write);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e8cd6b8..6d9bcef 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1220,6 +1220,13 @@ enum {
 
 #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
 
+/* Possible waiting modes */
+enum {
+   FREEZE_NOWAIT = 0,  /* no blocking call */
+   FREEZE_WAIT = 1,/* wait in uninterruptible state */
+   FREEZE_WAIT_KILLABLE = 2,   /* wait in killable state */
+};
+
 struct sb_writers {
/* Counters for counting writers at each level */
struct percpu_counter   counter[SB_FREEZE_LEVELS];
@@ -1335,7 +1342,7 @@ extern struct timespec current_fs_time(struct super_block *sb);
  */
 
 void __sb_end_write(struct super_block *sb, int level);
-int __sb_start_write(struct super_block *sb, int level, bool wait);
+int __sb_start_write(struct super_block *sb, int level, int wait);
 
 /**
  * sb_end_write - drop write access to a superblock
@@ -1394,12 +1401,12 @@ static inline void sb_end_intwrite(struct super_block *sb)
  */
 static inline void sb_start_write(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline int sb_start_write_trylock(struct super_block *sb)
 {
-   return __sb_start_write(sb, SB_FREEZE_WRITE, false);
+   return __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_NOWAIT);
 }
 
 /**
@@ -1423,7 +1430,7 @@ static inline int sb_start_write_trylock(struct super_block *sb)
  */
 static inline void sb_start_pagefault(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, true);
+   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, FREEZE_WAIT);
 }
 
 /*
@@ -1441,7 +1448,7 @@ static inline void sb_start_pagefault(struct super_block *sb)
  */
 static inline void sb_start_intwrite(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_FS, true);
+   __sb_start_write(sb, SB_FREEZE_FS, FREEZE_WAIT);
 }
 
 
@@ -2224,7 +2231,7 @@ static inline void file_start_write(struct file *file)
 {
if (!S_ISREG(file_inode(file)->i_mode))
return;
-   __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline void file_end_write(struct file *file)
-- 
1.7.3.4
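
The three wait modes introduced by this patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_sb, its fields, model_sb_start_write() and model_sb_selfcheck() are hypothetical names, the "wait" is simulated by clearing a flag instead of sleeping on a wait queue, and only the FREEZE_* constants and the 1 / 0 / -EINTR return values are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Wait modes, with the values the patch assigns to them. */
enum {
	FREEZE_NOWAIT = 0,        /* no blocking call */
	FREEZE_WAIT = 1,          /* wait in uninterruptible state */
	FREEZE_WAIT_KILLABLE = 2, /* wait in killable state */
};

/* Hypothetical stand-in for a super block; these are not kernel fields. */
struct model_sb {
	bool frozen;       /* models "s_writers.frozen >= level" */
	bool fatal_signal; /* a fatal signal would arrive during the wait */
	int writers;       /* number of writers that were granted access */
};

/* Returns 1 on success, 0 if NOWAIT hit a frozen fs, -EINTR if killed. */
int model_sb_start_write(struct model_sb *sb, int wait)
{
	if (sb->frozen) {
		switch (wait) {
		case FREEZE_NOWAIT:
			return 0;
		case FREEZE_WAIT:
			/* Model: an uninterruptible wait always sees the thaw. */
			sb->frozen = false;
			break;
		case FREEZE_WAIT_KILLABLE:
			/* Model: a fatal signal ends the wait before the thaw. */
			if (sb->fatal_signal)
				return -EINTR;
			sb->frozen = false;
			break;
		}
	}
	sb->writers++;
	return 1;
}

/* Usage example: one frozen super block, tried with each mode in turn. */
void model_sb_selfcheck(void)
{
	struct model_sb sb = { .frozen = true, .fatal_signal = true, .writers = 0 };

	assert(model_sb_start_write(&sb, FREEZE_NOWAIT) == 0);
	assert(model_sb_start_write(&sb, FREEZE_WAIT_KILLABLE) == -EINTR);
	assert(sb.writers == 0);
	assert(model_sb_start_write(&sb, FREEZE_WAIT) == 1);
	assert(sb.writers == 1);
}
```

In this model only the two failing modes leave the writer count untouched, which mirrors why callers of the killable variant must check for a negative return before touching the file system.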


Re: [PATCH 4/4] fsfreeze: return EINTR from mnt_want_write and mnt_want_write_file

2013-04-30 Thread Marco Stornelli
2013/4/29 Jan Kara <j...@suse.cz>:
> On Fri 26-04-13 10:53:27, Marco Stornelli wrote:
>> Replaced sb_start_write with sb_start_write_killable inside
>> mnt_want_write and mnt_want_write_file.
>   The patch looks good. You can add:
> Reviewed-by: Jan Kara <j...@suse.cz>
> Honza
>>

Thanks for the review. I'll submit the patches for mainline in the
current merge-window.

Regards,

Marco


Re: [PATCH 2/4] fsfreeze: added new file_start_write_killable

2013-04-26 Thread Marco Stornelli

Hi,

On 26/04/2013 14:06, Matthew Wilcox wrote:
> On Fri, Apr 26, 2013 at 10:50:52AM +0200, Marco Stornelli wrote:
>> Replace file_start_write with file_start_write_killable where
>> possible.
>
> I feel like I'm missing context here.  Possibly because you only cc'd me
> on patch 2/4.  In particular, file_start_write doesn't exist upstream,
> so I'm not sure what it's for.  But returning 1 for non-regular files
> looks dodgy:

The patch series is based on -next because of several fsfreeze changes
made by Al. file_start_write_killable returns 1 because it is mainly a
wrapper around __sb_start_write. __sb_start_write returns 1 when
everything is OK, 0 when the lock can't be taken (when the trylock
version is used) and _now_ a value < 0 when the wait is interrupted
(i.e. -EINTR).
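
To make the convention above concrete, here is a minimal user-space sketch (not kernel code). The function names model_sb_start_write and model_file_start_write and their boolean inputs are hypothetical stand-ins; only the three wait modes and the 1 / 0 / -EINTR return values are taken from the patches.

```c
#include <errno.h>
#include <stdbool.h>

/* Wait modes; the names mirror the enum added in patch 1/4. */
enum freeze_wait_mode {
	FREEZE_NOWAIT = 0,		/* no blocking call */
	FREEZE_WAIT = 1,		/* wait in uninterruptible state */
	FREEZE_WAIT_KILLABLE = 2,	/* wait in killable state */
};

/*
 * Toy model of the return convention (hypothetical helper):
 *   1      -> write access granted
 *   0      -> filesystem frozen and the caller asked not to wait
 *   -EINTR -> killable wait interrupted by a fatal signal
 * 'frozen' and 'fatal_signal' are illustrative inputs, not kernel state.
 */
static int model_sb_start_write(bool frozen, bool fatal_signal,
				enum freeze_wait_mode mode)
{
	if (!frozen)
		return 1;

	switch (mode) {
	case FREEZE_NOWAIT:
		return 0;
	case FREEZE_WAIT_KILLABLE:
		if (fatal_signal)
			return -EINTR;
		break;
	case FREEZE_WAIT:
		break;
	}
	/* Model: the wait ended because the filesystem was thawed. */
	return 1;
}

/* Non-regular files skip freeze protection and simply report success. */
static int model_file_start_write(bool is_regular, bool frozen,
				  bool fatal_signal)
{
	if (!is_regular)
		return 1;
	return model_sb_start_write(frozen, fatal_signal,
				    FREEZE_WAIT_KILLABLE);
}
```

In this model a caller only needs to test for a negative value, because 1 is the success value both for the non-regular-file shortcut and for the normal path.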





>> +static inline int file_start_write_killable(struct file *file)
>> +{
>> +   if (!S_ISREG(file_inode(file)->i_mode))
>> +   return 1;
>> +   return sb_start_write_killable(file_inode(file)->i_sb);
>> +}
>>
>> +++ b/fs/aio.c
>> @@ -1103,8 +1103,11 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb, int rw, aio_rw_op *rw_op)
>> if (iocb->ki_pos < 0)
>> return -EINVAL;
>>
>> -   if (rw == WRITE)
>> -   file_start_write(file);
>> +   if (rw == WRITE) {
>> +   ret = file_start_write_killable(file);
>> +   if (ret < 0)
>> +   return ret;
>> +   }
>> do {
>
> So ... it's OK to do this write to pipes/directories/devices/... ?  Or is
> that check always taken care of elsewhere?  If so, why do we need this
> check?  I'm confused.  None of the callers check for the 'ret == 1' case,
> so I'm sure there's something wrong here, I just can't tell what it is.

See above.


>> +++ b/fs/read_write.c
>> @@ -438,17 +438,19 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
>> ret = rw_verify_area(WRITE, file, pos, count);
>> if (ret >= 0) {
>> count = ret;
>> -   file_start_write(file);
>> -   if (file->f_op->write)
>> -   ret = file->f_op->write(file, buf, count, pos);
>> -   else
>> -   ret = do_sync_write(file, buf, count, pos);
>> +   ret = file_start_write_killable(file);
>> if (ret > 0) {
>> -   fsnotify_modify(file);
>> -   add_wchar(current, ret);
>> +   if (file->f_op->write)
>> +   ret = file->f_op->write(file, buf, count, pos);
>> +   else
>> +   ret = do_sync_write(file, buf, count, pos);
>> +   if (ret > 0) {
>> +   fsnotify_modify(file);
>> +   add_wchar(current, ret);
>> +   }
>> +   inc_syscw(current);
>> +   file_end_write(file);
>> }
>> -   inc_syscw(current);
>> -   file_end_write(file);
>> }
>>
>> return ret;
>
> I don't like it that you've increased the indentation here.  Better to do
> a preliminary patch which just converts to our normal style with gotos.  ie:



Ok, I can change the style here, no problem.

Marco
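
The goto-based layout mentioned in the review could look roughly like the following user-space sketch. This is not the example from Matthew's mail, which is not quoted here; write_goto_style, start_write and struct write_stats are hypothetical names that only mimic the shape of vfs_write.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the accounting done in vfs_write. */
struct write_stats {
	int syscalls;	/* bumped once per write that got write access */
	long bytes;	/* bytes successfully written */
};

/* Hypothetical: returns 1 on success or -EINTR when interrupted. */
static int start_write(bool interrupted)
{
	return interrupted ? -EINTR : 1;
}

/*
 * Flat, goto-based error handling: every failure jumps to 'out',
 * so the success path stays at a single indentation level.
 */
static long write_goto_style(long verify_result, bool interrupted,
			     struct write_stats *st)
{
	long ret = verify_result;	/* models the area check: <0 is an error */
	long count;

	if (ret < 0)
		goto out;
	count = ret;

	ret = start_write(interrupted);
	if (ret < 0)
		goto out;

	ret = count;			/* models a write that stores everything */
	if (ret > 0)
		st->bytes += ret;
	st->syscalls++;
	/* a real caller would drop write access here */
out:
	return ret;
}
```

Compared with nesting the write inside `if (ret > 0) { ... }`, each early exit here is one line and the main path does not move to the right.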


[PATCH 4/4] fsfreeze: return EINTR from mnt_want_write and mnt_want_write_file

2013-04-26 Thread Marco Stornelli
Replaced sb_start_write with sb_start_write_killable inside
mnt_want_write and mnt_want_write_file.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
---
 fs/namei.c |6 ++
 fs/namespace.c |8 ++--
 ipc/mqueue.c   |6 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8..5f239fd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2750,6 +2750,8 @@ static int do_last(struct nameidata *nd, struct path 
*path,
 retry_lookup:
if (op->open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
error = mnt_want_write(nd->path.mnt);
+   if (error == -EINTR)
+   goto out;
if (!error)
got_write = true;
/*
@@ -3053,6 +3055,10 @@ struct dentry *kern_path_create(int dfd, const char 
*pathname,
 
/* don't fail immediately if it's r/o, at least try to report other 
errors */
err2 = mnt_want_write(nd.path.mnt);
+   if (err2 == -EINTR) {
+   dentry = ERR_PTR(-EINTR);
+   goto out;
+   }
/*
 * Do the final lookup.
 */
diff --git a/fs/namespace.c b/fs/namespace.c
index af73554..db09ecb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -344,7 +344,9 @@ int mnt_want_write(struct vfsmount *m)
 {
int ret;
 
-   sb_start_write(m->mnt_sb);
+   ret = sb_start_write_killable(m->mnt_sb);
+   if (ret < 0)
+   return ret;
ret = __mnt_want_write(m);
if (ret)
sb_end_write(m->mnt_sb);
@@ -404,7 +406,9 @@ int mnt_want_write_file(struct file *file)
 {
int ret;
 
-   sb_start_write(file->f_path.mnt->mnt_sb);
+   ret = sb_start_write_killable(file->f_path.mnt->mnt_sb);
+   if (ret < 0)
+   return ret;
ret = __mnt_want_write_file(file);
if (ret)
sb_end_write(file->f_path.mnt->mnt_sb);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index e4e47f6..e8fdc03 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -800,7 +800,11 @@ SYSCALL_DEFINE4(mq_open, const char __user *, u_name, int, 
oflag, umode_t, mode,
if (fd < 0)
goto out_putname;
 
-   ro = mnt_want_write(mnt);   /* we'll drop it in any case */
+   ro = mnt_want_write(mnt);
+   if (ro == -EINTR) {
+   fd = ro;
+   goto out_putname;
+   }
error = 0;
mutex_lock(&root->d_inode->i_mutex);
path.dentry = lookup_one_len(name->name, root, strlen(name->name));
-- 
1.7.3.4
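
As an illustration of the error ordering in the kern_path_create hunk, a simplified user-space model follows. model_create_path and its two integer parameters are hypothetical; the sketch assumes, based on the existing comment in that function, that a read-only error is reported only after the final lookup, while -EINTR is returned at once.

```c
#include <errno.h>

/*
 * Hypothetical model of the error ordering:
 *   want_write_err - result of asking for write access (0, -EROFS, -EINTR)
 *   lookup_err     - result of the final lookup (0 or a negative errno)
 */
static int model_create_path(int want_write_err, int lookup_err)
{
	if (want_write_err == -EINTR)
		return -EINTR;		/* killed while waiting: bail out at once */
	if (lookup_err)
		return lookup_err;	/* lookup errors are reported first */
	return want_write_err;		/* 0, or the deferred error such as -EROFS */
}
```

The point of the special case is that a task killed while waiting on a frozen filesystem should not go on to take locks and do the lookup, whereas a read-only mount can still yield a more specific error from the lookup.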


[PATCH 3/4] fsfreeze: use sb_start_write_killable instead of sb_start_write

2013-04-26 Thread Marco Stornelli
Replace sb_start_write with sb_start_write_killable where
possible.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/open.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 8c74100..d621d76 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -182,7 +182,9 @@ static long do_sys_ftruncate(unsigned int fd, loff_t 
length, int small)
if (IS_APPEND(inode))
goto out_putf;
 
-   sb_start_write(inode->i_sb);
+   error = sb_start_write_killable(inode->i_sb);
+   if (error < 0)
+   return error;
error = locks_verify_truncate(inode, f.file, length);
if (!error)
error = security_path_truncate(&f.file->f_path);
@@ -273,7 +275,9 @@ int do_fallocate(struct file *file, int mode, loff_t 
offset, loff_t len)
if (!file->f_op->fallocate)
return -EOPNOTSUPP;
 
-   sb_start_write(inode->i_sb);
+   ret = sb_start_write_killable(inode->i_sb);
+   if (ret < 0)
+   return ret;
ret = file->f_op->fallocate(file, mode, offset, len);
sb_end_write(inode->i_sb);
return ret;
-- 
1.7.3.4


[PATCH 2/4] fsfreeze: added new file_start_write_killable

2013-04-26 Thread Marco Stornelli
Replace file_start_write with file_start_write_killable where
possible.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/read_write.c  |   28 +---
 fs/splice.c  |4 +++-
 include/linux/fs.h   |   17 +
 6 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index be9a101..2c0d0a3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -230,7 +230,9 @@ static int __do_lo_send_write(struct file *file,
ssize_t bw;
mm_segment_t old_fs = get_fs();
 
-   file_start_write(file);
+   bw = file_start_write_killable(file);
+   if (bw < 0)
+   return bw;
set_fs(get_ds());
bw = file->f_op->write(file, buf, len, &pos);
set_fs(old_fs);
diff --git a/fs/aio.c b/fs/aio.c
index 5b7ed78..5deddf5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1103,8 +1103,11 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb, int 
rw, aio_rw_op *rw_op)
if (iocb->ki_pos < 0)
return -EINVAL;
 
-   if (rw == WRITE)
-   file_start_write(file);
+   if (rw == WRITE) {
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   return ret;
+   }
do {
ret = rw_op(iocb, &iocb->ki_iovec[iocb->ki_cur_seg],
iocb->ki_nr_segs - iocb->ki_cur_seg,
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 380b798..c5708d0 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -79,7 +79,9 @@ coda_file_write(struct file *coda_file, const char __user 
*buf, size_t count, lo
return -EINVAL;
 
host_inode = file_inode(host_file);
-   file_start_write(host_file);
+   ret = file_start_write_killable(host_file);
+   if (ret < 0)
+   return ret;
mutex_lock(&coda_inode->i_mutex);
 
ret = host_file->f_op->write(host_file, buf, count, ppos);
diff --git a/fs/read_write.c b/fs/read_write.c
index 7eb7ef3..ed9006f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -438,17 +438,19 @@ ssize_t vfs_write(struct file *file, const char __user 
*buf, size_t count, loff_
ret = rw_verify_area(WRITE, file, pos, count);
if (ret >= 0) {
count = ret;
-   file_start_write(file);
-   if (file->f_op->write)
-   ret = file->f_op->write(file, buf, count, pos);
-   else
-   ret = do_sync_write(file, buf, count, pos);
+   ret = file_start_write_killable(file);
if (ret > 0) {
-   fsnotify_modify(file);
-   add_wchar(current, ret);
+   if (file->f_op->write)
+   ret = file->f_op->write(file, buf, count, pos);
+   else
+   ret = do_sync_write(file, buf, count, pos);
+   if (ret > 0) {
+   fsnotify_modify(file);
+   add_wchar(current, ret);
+   }
+   inc_syscw(current);
+   file_end_write(file);
}
-   inc_syscw(current);
-   file_end_write(file);
}
 
return ret;
@@ -718,7 +720,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
} else {
fn = (io_fn_t)file->f_op->write;
fnv = file->f_op->aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
}
 
if (fnv)
@@ -898,7 +902,9 @@ static ssize_t compat_do_readv_writev(int type, struct file 
*file,
} else {
fn = (io_fn_t)file->f_op->write;
fnv = file->f_op->aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
}
 
if (fnv)
diff --git a/fs/splice.c b/fs/splice.c
index e6b2559..b37c30e 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1115,7 +1115,9 @@ static long do_splice_from(struct pipe_inode_info *pipe, 
struct file *out,
else
splice_write = default_file_splice_write;
 
-   file_start_write(out);
+   ret = file_start_write_killable(out);
+   if (ret < 0)
+   return ret;
ret = splice_write(pipe, out, ppos, len, flags);
file_end_write(out);
return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8b7325..998ec2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1404,6 +1404,16 @@ static inline void sb_start_write(struct super_block *sb)

[PATCH 1/4] fsfreeze: wait in killable state in __sb_start_write

2013-04-26 Thread Marco Stornelli
Added a new enum to decide if we want to sleep in uninterruptible or
killable state or we want simply to return immediately.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/super.c |   24 ++--
 include/linux/fs.h |   19 +--
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 7465d43..6b70c7f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1190,14 +1190,25 @@ static void acquire_freeze_lock(struct super_block *sb, 
int level, bool trylock,
  * This is an internal function, please use sb_start_{write,pagefault,intwrite}
  * instead.
  */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
+int __sb_start_write(struct super_block *sb, int level, int wait)
 {
+   int ret = 0;
 retry:
if (unlikely(sb->s_writers.frozen >= level)) {
-   if (!wait)
-   return 0;
-   wait_event(sb->s_writers.wait_unfrozen,
-  sb->s_writers.frozen < level);
+   switch (wait) {
+   case FREEZE_NOWAIT:
+   return ret;
+   case FREEZE_WAIT:
+   wait_event(sb->s_writers.wait_unfrozen,
+  sb->s_writers.frozen < level);
+   break;
+   case FREEZE_WAIT_KILLABLE:
+   ret = wait_event_killable(sb->s_writers.wait_unfrozen,
+  sb->s_writers.frozen < level);
+   if (ret)
+   return -EINTR;
+   break;
+   }
}
 
 #ifdef CONFIG_LOCKDEP
@@ -1213,7 +1224,8 @@ retry:
__sb_end_write(sb, level);
goto retry;
}
-   return 1;
+   ret = 1;
+   return ret;
 }
 EXPORT_SYMBOL(__sb_start_write);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8d47c9a..c8b7325 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1220,6 +1220,13 @@ enum {
 
 #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
 
+/* Possible waiting modes */
+enum {
+   FREEZE_NOWAIT = 0,  /* no blocking call */
+   FREEZE_WAIT = 1,/* wait in uninterruptible state */
+   FREEZE_WAIT_KILLABLE = 2,   /* wait in killable state */
+};
+
 struct sb_writers {
/* Counters for counting writers at each level */
struct percpu_counter   counter[SB_FREEZE_LEVELS];
@@ -1335,7 +1342,7 @@ extern struct timespec current_fs_time(struct super_block 
*sb);
  */
 
 void __sb_end_write(struct super_block *sb, int level);
-int __sb_start_write(struct super_block *sb, int level, bool wait);
+int __sb_start_write(struct super_block *sb, int level, int wait);
 
 /**
  * sb_end_write - drop write access to a superblock
@@ -1394,12 +1401,12 @@ static inline void sb_end_intwrite(struct super_block 
*sb)
  */
 static inline void sb_start_write(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline int sb_start_write_trylock(struct super_block *sb)
 {
-   return __sb_start_write(sb, SB_FREEZE_WRITE, false);
+   return __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_NOWAIT);
 }
 
 /**
@@ -1423,7 +1430,7 @@ static inline int sb_start_write_trylock(struct 
super_block *sb)
  */
 static inline void sb_start_pagefault(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, true);
+   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, FREEZE_WAIT);
 }
 
 /*
@@ -1441,7 +1448,7 @@ static inline void sb_start_pagefault(struct super_block 
*sb)
  */
 static inline void sb_start_intwrite(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_FS, true);
+   __sb_start_write(sb, SB_FREEZE_FS, FREEZE_WAIT);
 }
 
 
@@ -2224,7 +2231,7 @@ static inline void file_start_write(struct file *file)
 {
if (!S_ISREG(file_inode(file)->i_mode))
return;
-   __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline void file_end_write(struct file *file)
-- 
1.7.3.4


[PATCH 0/4] fsfreeze: from uninterruptible to killable

2013-04-26 Thread Marco Stornelli

Hi,

I am resending the patch series. The first three patches are unchanged
since the last review, but I have added a fourth patch to handle the
killable state in mnt_want_write/mnt_want_write_file. I ran some tests
and it looks OK to me. The critical points were do_last,
kern_path_create and mq_open. The one path not yet covered is
page_mkwrite, which could be handled by a future patch. The work done
so far is enough to let the user do a "kill -9" in several cases.


Marco Stornelli (4):
  fsfreeze: wait in killable state in __sb_start_write
  fsfreeze: added new file_start_write_killable
  fsfreeze: use sb_start_write_killable instead of sb_start_write
  fsfreeze: return EINTR from mnt_want_write and mnt_want_write_file

 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/namei.c   |6 ++
 fs/namespace.c   |8 ++--
 fs/open.c|8 ++--
 fs/read_write.c  |   28 +---
 fs/splice.c  |4 +++-
 fs/super.c   |   24 ++--
 include/linux/fs.h   |   36 ++--
 ipc/mqueue.c |6 +-
 11 files changed, 102 insertions(+), 33 deletions(-)

--
1.7.3.4
---



[PATCH 0/4] fsfreeze: from uninterruptible to killable

2013-04-26 Thread Marco Stornelli

Hi,

I re-send the patch series. The first three patches are not changed 
since the last review, but I add a fourth patch to manage the killable 
state in mnt_want_write/mnt_want_write_file. I did some tests and it 
seems ok to me. The hot points were do_last, kern_path_create and 
mq_open. At the moment the path not covered is the page_mkwrite, however 
it could be covered with a future patch. The work made until now is 
sufficient to give the user the possibility to do a kill -9 in several 
cases.


Marco Stornelli (4):
  fsfreeze: wait in killable state in __sb_start_write
  fsfreeze: added new file_start_write_killable
  fsfreeze: use sb_start_write_killable instead of sb_start_write
  fsfreeze: return EINTR from mnt_want_write and mnt_want_write_file

 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/namei.c   |6 ++
 fs/namespace.c   |8 ++--
 fs/open.c|8 ++--
 fs/read_write.c  |   28 +---
 fs/splice.c  |4 +++-
 fs/super.c   |   24 ++--
 include/linux/fs.h   |   36 ++--
 ipc/mqueue.c |6 +-
 11 files changed, 102 insertions(+), 33 deletions(-)

--
1.7.3.4
---

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/4] fsfreeze: wait in killable state in __sb_start_write

2013-04-26 Thread Marco Stornelli
Added a new enum to decide if we want to sleep in uninterruptible or
killable state or we want simply to return immediately.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/super.c |   24 ++--
 include/linux/fs.h |   19 +--
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 7465d43..6b70c7f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1190,14 +1190,25 @@ static void acquire_freeze_lock(struct super_block *sb, 
int level, bool trylock,
  * This is an internal function, please use sb_start_{write,pagefault,intwrite}
  * instead.
  */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
+int __sb_start_write(struct super_block *sb, int level, int wait)
 {
+   int ret = 0;
 retry:
if (unlikely(sb-s_writers.frozen = level)) {
-   if (!wait)
-   return 0;
-   wait_event(sb-s_writers.wait_unfrozen,
-  sb-s_writers.frozen  level);
+   switch (wait) {
+   case FREEZE_NOWAIT:
+   return ret;
+   case FREEZE_WAIT:
+   wait_event(sb-s_writers.wait_unfrozen,
+  sb-s_writers.frozen  level);
+   break;
+   case FREEZE_WAIT_KILLABLE:
+   ret = wait_event_killable(sb-s_writers.wait_unfrozen,
+  sb-s_writers.frozen  level);
+   if (ret)
+   return -EINTR;
+   break;
+   }
}
 
 #ifdef CONFIG_LOCKDEP
@@ -1213,7 +1224,8 @@ retry:
__sb_end_write(sb, level);
goto retry;
}
-   return 1;
+   ret = 1;
+   return ret;
 }
 EXPORT_SYMBOL(__sb_start_write);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8d47c9a..c8b7325 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1220,6 +1220,13 @@ enum {
 
 #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
 
+/* Possible waiting modes */
+enum {
+   FREEZE_NOWAIT = 0,  /* no blocking call */
+   FREEZE_WAIT = 1,/* wait in uninterruptible state */
+   FREEZE_WAIT_KILLABLE = 2,   /* wait in killable state */
+};
+
 struct sb_writers {
/* Counters for counting writers at each level */
struct percpu_counter   counter[SB_FREEZE_LEVELS];
@@ -1335,7 +1342,7 @@ extern struct timespec current_fs_time(struct super_block 
*sb);
  */
 
 void __sb_end_write(struct super_block *sb, int level);
-int __sb_start_write(struct super_block *sb, int level, bool wait);
+int __sb_start_write(struct super_block *sb, int level, int wait);
 
 /**
  * sb_end_write - drop write access to a superblock
@@ -1394,12 +1401,12 @@ static inline void sb_end_intwrite(struct super_block 
*sb)
  */
 static inline void sb_start_write(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline int sb_start_write_trylock(struct super_block *sb)
 {
-   return __sb_start_write(sb, SB_FREEZE_WRITE, false);
+   return __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_NOWAIT);
 }
 
 /**
@@ -1423,7 +1430,7 @@ static inline int sb_start_write_trylock(struct 
super_block *sb)
  */
 static inline void sb_start_pagefault(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, true);
+   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, FREEZE_WAIT);
 }
 
 /*
@@ -1441,7 +1448,7 @@ static inline void sb_start_pagefault(struct super_block 
*sb)
  */
 static inline void sb_start_intwrite(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_FS, true);
+   __sb_start_write(sb, SB_FREEZE_FS, FREEZE_WAIT);
 }
 
 
@@ -2224,7 +2231,7 @@ static inline void file_start_write(struct file *file)
 {
if (!S_ISREG(file_inode(file)-i_mode))
return;
-   __sb_start_write(file_inode(file)-i_sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(file_inode(file)-i_sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline void file_end_write(struct file *file)
-- 
1.7.3.4
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/4] fsfreeze: added new file_start_write_killable

2013-04-26 Thread Marco Stornelli
Replace file_start_write with file_start_write_killable where
possible.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
Reviewed-by: Jan Kara j...@suse.cz
---
 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/read_write.c  |   28 +---
 fs/splice.c  |4 +++-
 include/linux/fs.h   |   17 +
 6 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index be9a101..2c0d0a3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -230,7 +230,9 @@ static int __do_lo_send_write(struct file *file,
ssize_t bw;
mm_segment_t old_fs = get_fs();
 
-   file_start_write(file);
+   bw = file_start_write_killable(file);
+   if (bw  0)
+   return bw;
set_fs(get_ds());
bw = file-f_op-write(file, buf, len, pos);
set_fs(old_fs);
diff --git a/fs/aio.c b/fs/aio.c
index 5b7ed78..5deddf5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1103,8 +1103,11 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb, int 
rw, aio_rw_op *rw_op)
if (iocb-ki_pos  0)
return -EINVAL;
 
-   if (rw == WRITE)
-   file_start_write(file);
+   if (rw == WRITE) {
+   ret = file_start_write_killable(file);
+   if (ret  0)
+   return ret;
+   }
do {
ret = rw_op(iocb, iocb-ki_iovec[iocb-ki_cur_seg],
iocb-ki_nr_segs - iocb-ki_cur_seg,
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 380b798..c5708d0 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -79,7 +79,9 @@ coda_file_write(struct file *coda_file, const char __user 
*buf, size_t count, lo
return -EINVAL;
 
host_inode = file_inode(host_file);
-   file_start_write(host_file);
+   ret = file_start_write_killable(host_file);
+   if (ret  0)
+   return ret;
mutex_lock(coda_inode-i_mutex);
 
ret = host_file-f_op-write(host_file, buf, count, ppos);
diff --git a/fs/read_write.c b/fs/read_write.c
index 7eb7ef3..ed9006f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -438,17 +438,19 @@ ssize_t vfs_write(struct file *file, const char __user 
*buf, size_t count, loff_
ret = rw_verify_area(WRITE, file, pos, count);
if (ret = 0) {
count = ret;
-   file_start_write(file);
-   if (file-f_op-write)
-   ret = file-f_op-write(file, buf, count, pos);
-   else
-   ret = do_sync_write(file, buf, count, pos);
+   ret = file_start_write_killable(file);
if (ret  0) {
-   fsnotify_modify(file);
-   add_wchar(current, ret);
+   if (file-f_op-write)
+   ret = file-f_op-write(file, buf, count, pos);
+   else
+   ret = do_sync_write(file, buf, count, pos);
+   if (ret  0) {
+   fsnotify_modify(file);
+   add_wchar(current, ret);
+   }
+   inc_syscw(current);
+   file_end_write(file);
}
-   inc_syscw(current);
-   file_end_write(file);
}
 
return ret;
@@ -718,7 +720,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
} else {
fn = (io_fn_t)file-f_op-write;
fnv = file-f_op-aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret  0)
+   goto out;
}
 
if (fnv)
@@ -898,7 +902,9 @@ static ssize_t compat_do_readv_writev(int type, struct file 
*file,
} else {
fn = (io_fn_t)file-f_op-write;
fnv = file-f_op-aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret  0)
+   goto out;
}
 
if (fnv)
diff --git a/fs/splice.c b/fs/splice.c
index e6b2559..b37c30e 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1115,7 +1115,9 @@ static long do_splice_from(struct pipe_inode_info *pipe, 
struct file *out,
else
splice_write = default_file_splice_write;
 
-   file_start_write(out);
+   ret = file_start_write_killable(out);
+   if (ret  0)
+   return ret;
ret = splice_write(pipe, out, ppos, len, flags);
file_end_write(out);
return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8b7325..998ec2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1404,6 +1404,16 @@ static inline void sb_start_write(struct super_block *sb)
__sb_start_write(sb

[PATCH 3/4] fsfreeze: use sb_start_write_killable instead of sb_start_write

2013-04-26 Thread Marco Stornelli
Replace sb_start_write with sb_start_write_killable where
possible.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/open.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 8c74100..d621d76 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -182,7 +182,9 @@ static long do_sys_ftruncate(unsigned int fd, loff_t 
length, int small)
if (IS_APPEND(inode))
goto out_putf;
 
-   sb_start_write(inode-i_sb);
+   error = sb_start_write_killable(inode-i_sb);
+   if (error  0)
+   return error;
error = locks_verify_truncate(inode, f.file, length);
if (!error)
error = security_path_truncate(f.file-f_path);
@@ -273,7 +275,9 @@ int do_fallocate(struct file *file, int mode, loff_t 
offset, loff_t len)
if (!file-f_op-fallocate)
return -EOPNOTSUPP;
 
-   sb_start_write(inode-i_sb);
+   ret = sb_start_write_killable(inode-i_sb);
+   if (ret  0)
+   return ret;
ret = file-f_op-fallocate(file, mode, offset, len);
sb_end_write(inode-i_sb);
return ret;
-- 
1.7.3.4
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/4] fsfreeze: return EINTR from mnt_want_write and mnt_want_write_file

2013-04-26 Thread Marco Stornelli
Replaced sb_start_write with sb_start_write_killable inside
mnt_want_write and mnt_want_write_file.

Signed-off-by: Marco Stornelli marco.storne...@gmail.com
---
 fs/namei.c |6 ++
 fs/namespace.c |8 ++--
 ipc/mqueue.c   |6 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8..5f239fd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2750,6 +2750,8 @@ static int do_last(struct nameidata *nd, struct path 
*path,
 retry_lookup:
if (op-open_flag  (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
error = mnt_want_write(nd-path.mnt);
+   if (error == -EINTR)
+   goto out;
if (!error)
got_write = true;
/*
@@ -3053,6 +3055,10 @@ struct dentry *kern_path_create(int dfd, const char 
*pathname,
 
/* don't fail immediately if it's r/o, at least try to report other 
errors */
err2 = mnt_want_write(nd.path.mnt);
+   if (err2 == -EINTR) {
+   dentry = ERR_PTR(-EINTR);
+   goto out;
+   }
/*
 * Do the final lookup.
 */
diff --git a/fs/namespace.c b/fs/namespace.c
index af73554..db09ecb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -344,7 +344,9 @@ int mnt_want_write(struct vfsmount *m)
 {
int ret;
 
-   sb_start_write(m-mnt_sb);
+   ret = sb_start_write_killable(m-mnt_sb);
+   if (ret  0)
+   return ret;
ret = __mnt_want_write(m);
if (ret)
sb_end_write(m-mnt_sb);
@@ -404,7 +406,9 @@ int mnt_want_write_file(struct file *file)
 {
int ret;
 
-   sb_start_write(file-f_path.mnt-mnt_sb);
+   ret = sb_start_write_killable(file-f_path.mnt-mnt_sb);
+   if (ret  0)
+   return ret;
ret = __mnt_want_write_file(file);
if (ret)
sb_end_write(file-f_path.mnt-mnt_sb);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index e4e47f6..e8fdc03 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -800,7 +800,11 @@ SYSCALL_DEFINE4(mq_open, const char __user *, u_name, int, 
oflag, umode_t, mode,
if (fd  0)
goto out_putname;
 
-   ro = mnt_want_write(mnt);   /* we'll drop it in any case */
+   ro = mnt_want_write(mnt);
+   if (ro == -EINTR) {
+   fd = ro;
+   goto out_putname;
+   }
error = 0;
mutex_lock(root-d_inode-i_mutex);
path.dentry = lookup_one_len(name-name, root, strlen(name-name));
-- 
1.7.3.4
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] fsfreeze: added new file_start_write_killable

2013-04-26 Thread Marco Stornelli

Hi,

Il 26/04/2013 14:06, Matthew Wilcox ha scritto:

On Fri, Apr 26, 2013 at 10:50:52AM +0200, Marco Stornelli wrote:

Replace file_start_write with file_start_write_killable where
possible.


I feel like I'm missing context here.  Possibly because you only cc'd me
on patch 2/4.  In particular, file_start_write doesn't exist upstream,
so I'm not sure what it's for.  But returning 1 for non-regular files
looks dodgy:


The patch series is based on -next due to several changes done by Al 
about fsfreeze. file_start_write_killable returns 1 because it's mainly 
a wrapper of __st_start_write. __sb_start_write returns 1 when 
everything is ok, 0 when the lock can't be gotten (we are using the 
trylock version) and _now_ a value  0 when something happens (i.e. -EINTR).





+static inline int file_start_write_killable(struct file *file)
+{
+   if (!S_ISREG(file_inode(file)-i_mode))
+   return 1;
+   return sb_start_write_killable(file_inode(file)-i_sb);
+}



+++ b/fs/aio.c
@@ -1103,8 +1103,11 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb, int 
rw, aio_rw_op *rw_op)
if (iocb-ki_pos  0)
return -EINVAL;

-   if (rw == WRITE)
-   file_start_write(file);
+   if (rw == WRITE) {
+   ret = file_start_write_killable(file);
+   if (ret  0)
+   return ret;
+   }
do {


So ... it's OK to do this write to pipes/directories/devices/... ?  Or is
that check always taken care of elsewhere?  If so, why do we need this
check?  I'm confused.  None of the callers check for the 'ret == 1' case,
so I'm sure there's something wrong here, I just can't tell what it is.



See above.


+++ b/fs/read_write.c
@@ -438,17 +438,19 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
        ret = rw_verify_area(WRITE, file, pos, count);
        if (ret >= 0) {
                count = ret;
-               file_start_write(file);
-               if (file->f_op->write)
-                       ret = file->f_op->write(file, buf, count, pos);
-               else
-                       ret = do_sync_write(file, buf, count, pos);
+               ret = file_start_write_killable(file);
                if (ret > 0) {
-                       fsnotify_modify(file);
-                       add_wchar(current, ret);
+                       if (file->f_op->write)
+                               ret = file->f_op->write(file, buf, count, pos);
+                       else
+                               ret = do_sync_write(file, buf, count, pos);
+                       if (ret > 0) {
+                               fsnotify_modify(file);
+                               add_wchar(current, ret);
+                       }
+                       inc_syscw(current);
+                       file_end_write(file);
                }
-               inc_syscw(current);
-               file_end_write(file);
        }

return ret;


I don't like it that you've increased the indentation here.  Better to do
a preliminary patch which just converts to our normal style with gotos.  ie:



Ok, I can change the style here, no problem.
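
For readers following the thread, one possible shape of a goto-based layout is sketched below in user space. This is not the example that was elided from the review, and it is not the real vfs_write: the `stub_*` helpers, `write_model` and `end_write_calls` are hypothetical names used only to show how early exits keep the success path at a single indentation level.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stubs standing in for the real helpers. */
static int stub_start_write(bool interrupted)
{
	return interrupted ? -EINTR : 1;
}

static int stub_do_write(int count)
{
	return count;
}

static int end_write_calls;	/* counts stub_end_write() invocations */

static void stub_end_write(void)
{
	end_write_calls++;
}

/*
 * Early-exit layout: every failure jumps to the label, so the code that
 * runs on success is never nested inside an "if (ret > 0)" block.
 */
static int write_model(int verified, bool interrupted)
{
	int ret = verified;	/* models the result of the area check */

	if (ret < 0)
		goto out;

	ret = stub_start_write(interrupted);
	if (ret < 0)
		goto out;

	ret = stub_do_write(verified);
	stub_end_write();	/* only reached when the start succeeded */
out:
	return ret;
}
```

The point of the layout is that the end-of-write call is paired with a successful start by construction, without an extra level of indentation.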

Marco


Re: [PATCH 1/3] fsfreeze: wait in killable state in __sb_start_write

2013-04-17 Thread Marco Stornelli
Resend due to mail client problem.

Marco

2013/4/17 Marco Stornelli <marco.storne...@gmail.com>
>
> Hi,
>
>
> 2013/4/15 Jan Kara <j...@suse.cz>
>>
>> On Sat 13-04-13 12:35:54, Marco Stornelli wrote:
>> > Added a new enum to decide if we want to sleep in uninterruptible or
>> > killable state or we want simply to return immediately.
>>   I like the patch. You can add:
>> Reviewed-by: Jan Kara <j...@suse.cz>
>>
>> Honza
>>
>

I'm happy if the patches can be included. Here is an update on the
ongoing and additional work: the submitted patches can be applied
as-is, but I'm still working on extending the killable path to
mnt_want_write/mnt_want_write_file, and I'm checking whether the
page_mkwrite path can be changed as well. In the first case, as Al
said, there are three clear "hot" points: do_last, kern_path_create
and mq_open. I modified these code paths carefully, ran some basic
tests, and it works. I'm going to submit the patch for review next
week if I manage to do more tests.
About the page_mkwrite path and the return value VM_FAULT_RETRY:
get_user_pages can return 0 in case of VM_FAULT_RETRY. A caller that
sets the nonblocking flag can handle that situation. Blocking callers,
instead, have a BUG_ON on a return value of 0: they want either an
error code or the number of pages obtained. A small modification of
__get_user_pages can do the job. However, page_mkwrite could then
return VM_FAULT_RETRY even if FAULT_FLAG_ALLOW_RETRY is not set. I
don't know if that is correct, and in any case this flag is currently
not visible in page_mkwrite. In addition, I need to understand whether
skipping the wait is useful at all, or whether the process will go to
sleep in uninterruptible state a step later anyway. Any comments are
welcome.

Marco


Re: Return value of __mm_populate

2013-04-14 Thread Marco Stornelli

Hi,

On 14/04/2013 02:18, KOSAKI Motohiro wrote:

(4/13/13 5:14 AM), Marco Stornelli wrote:

Hi,

I was looking at the code of __mm_populate (in -next) and I have a doubt
about the return value. The function __mlock_posix_error_return should
return a proper error for mlock by converting the return value of
__get_user_pages. It checks for EFAULT and ENOMEM. However,
__get_user_pages can also return ERESTARTSYS and EHWPOISON.


__get_user_pages doesn't return EHWPOISON if FOLL_HWPOISON is not specified.
I'm no expert on ERESTARTSYS. If I understand correctly, ERESTARTSYS is only
returned when a signal is received, and the signal handling routine (e.g.
do_signal) modifies EIP and hides ERESTARTSYS from userland generically.



Yep, you're right, the "magic" is inside the signal management. Thanks!!
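
The "magic" can be modelled in a few lines of user-space C. This is a simplified sketch of the behaviour described above, not the kernel's signal code: `signal_fixup_model`, `struct syscall_result` and `MODEL_ERESTARTSYS` are hypothetical names, and the model assumes the usual rule that a handler installed without SA_RESTART turns the internal code into EINTR, while every other case restarts the system call.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal restart code; it is never meant to reach user space. */
#define MODEL_ERESTARTSYS 512

enum syscall_outcome {
	OUTCOME_RETURN,		/* value is handed back to user space */
	OUTCOME_RESTART		/* system call is re-executed transparently */
};

struct syscall_result {
	enum syscall_outcome outcome;
	int value;		/* meaningful only for OUTCOME_RETURN */
};

static struct syscall_result
signal_fixup_model(int kernel_ret, bool handler_installed, bool sa_restart)
{
	struct syscall_result res = { OUTCOME_RETURN, kernel_ret };

	/* Ordinary errors such as -EFAULT or -ENOMEM pass through untouched. */
	if (kernel_ret != -MODEL_ERESTARTSYS)
		return res;

	/* A handler without SA_RESTART makes user space see EINTR. */
	if (handler_installed && !sa_restart) {
		res.value = -EINTR;
		return res;
	}

	/* Otherwise the call is restarted and user space sees no error. */
	res.outcome = OUTCOME_RESTART;
	res.value = 0;
	return res;
}
```

In both branches the internal value is consumed before returning to user space, which is why a function like __mlock_posix_error_return does not need to translate it.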

Marco


[PATCH 3/3] fsfreeze: use sb_start_write_killable instead of sb_start_write

2013-04-13 Thread Marco Stornelli
Replace sb_start_write with sb_start_write_killable where
possible.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
---
 fs/open.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 8c74100..d621d76 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -182,7 +182,9 @@ static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
if (IS_APPEND(inode))
goto out_putf;
 
-   sb_start_write(inode->i_sb);
+   error = sb_start_write_killable(inode->i_sb);
+   if (error < 0)
+       goto out_putf;
error = locks_verify_truncate(inode, f.file, length);
if (!error)
error = security_path_truncate(&f.file->f_path);
@@ -273,7 +275,9 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
if (!file->f_op->fallocate)
return -EOPNOTSUPP;
 
-   sb_start_write(inode->i_sb);
+   ret = sb_start_write_killable(inode->i_sb);
+   if (ret < 0)
+   return ret;
ret = file->f_op->fallocate(file, mode, offset, len);
sb_end_write(inode->i_sb);
return ret;
-- 
1.7.3.4


[PATCH 2/3] fsfreeze: added new file_start_write_killable

2013-04-13 Thread Marco Stornelli
Replace file_start_write with file_start_write_killable where
possible.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
---
 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/read_write.c  |   28 +---
 fs/splice.c  |4 +++-
 include/linux/fs.h   |   17 +
 6 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index be9a101..2c0d0a3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -230,7 +230,9 @@ static int __do_lo_send_write(struct file *file,
ssize_t bw;
mm_segment_t old_fs = get_fs();
 
-   file_start_write(file);
+   bw = file_start_write_killable(file);
+   if (bw < 0)
+   return bw;
set_fs(get_ds());
bw = file->f_op->write(file, buf, len, &pos);
set_fs(old_fs);
diff --git a/fs/aio.c b/fs/aio.c
index 5b7ed78..5deddf5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1103,8 +1103,11 @@ static ssize_t aio_rw_vect_retry(struct kiocb *iocb, int rw, aio_rw_op *rw_op)
if (iocb->ki_pos < 0)
return -EINVAL;
 
-   if (rw == WRITE)
-   file_start_write(file);
+   if (rw == WRITE) {
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   return ret;
+   }
do {
ret = rw_op(iocb, &iocb->ki_iovec[iocb->ki_cur_seg],
iocb->ki_nr_segs - iocb->ki_cur_seg,
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 380b798..c5708d0 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -79,7 +79,9 @@ coda_file_write(struct file *coda_file, const char __user *buf, size_t count, lo
return -EINVAL;
 
host_inode = file_inode(host_file);
-   file_start_write(host_file);
+   ret = file_start_write_killable(host_file);
+   if (ret < 0)
+   return ret;
mutex_lock(&coda_inode->i_mutex);
 
ret = host_file->f_op->write(host_file, buf, count, ppos);
diff --git a/fs/read_write.c b/fs/read_write.c
index 7eb7ef3..ed9006f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -438,17 +438,19 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
ret = rw_verify_area(WRITE, file, pos, count);
if (ret >= 0) {
count = ret;
-   file_start_write(file);
-   if (file->f_op->write)
-   ret = file->f_op->write(file, buf, count, pos);
-   else
-   ret = do_sync_write(file, buf, count, pos);
+   ret = file_start_write_killable(file);
if (ret > 0) {
-   fsnotify_modify(file);
-   add_wchar(current, ret);
+   if (file->f_op->write)
+   ret = file->f_op->write(file, buf, count, pos);
+   else
+   ret = do_sync_write(file, buf, count, pos);
+   if (ret > 0) {
+   fsnotify_modify(file);
+   add_wchar(current, ret);
+   }
+   inc_syscw(current);
+   file_end_write(file);
}
-   inc_syscw(current);
-   file_end_write(file);
}
 
return ret;
@@ -718,7 +720,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
} else {
fn = (io_fn_t)file->f_op->write;
fnv = file->f_op->aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
}
 
if (fnv)
@@ -898,7 +902,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
} else {
fn = (io_fn_t)file->f_op->write;
fnv = file->f_op->aio_write;
-   file_start_write(file);
+   ret = file_start_write_killable(file);
+   if (ret < 0)
+   goto out;
}
 
if (fnv)
diff --git a/fs/splice.c b/fs/splice.c
index e6b2559..b37c30e 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1115,7 +1115,9 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
else
splice_write = default_file_splice_write;
 
-   file_start_write(out);
+   ret = file_start_write_killable(out);
+   if (ret < 0)
+   return ret;
ret = splice_write(pipe, out, ppos, len, flags);
file_end_write(out);
return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8b7325..998ec2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1404,6 +1404,16 @@ static inline void sb_start_write(struct super_block *sb)

[PATCH 1/3] fsfreeze: wait in killable state in __sb_start_write

2013-04-13 Thread Marco Stornelli
Added a new enum to decide if we want to sleep in uninterruptible or
killable state or we want simply to return immediately.

Signed-off-by: Marco Stornelli <marco.storne...@gmail.com>
---
 fs/super.c |   24 ++--
 include/linux/fs.h |   19 +--
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 7465d43..6b70c7f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1190,14 +1190,25 @@ static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
  * This is an internal function, please use sb_start_{write,pagefault,intwrite}
  * instead.
  */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
+int __sb_start_write(struct super_block *sb, int level, int wait)
 {
+   int ret = 0;
 retry:
if (unlikely(sb->s_writers.frozen >= level)) {
-   if (!wait)
-   return 0;
-   wait_event(sb->s_writers.wait_unfrozen,
-  sb->s_writers.frozen < level);
+   switch (wait) {
+   case FREEZE_NOWAIT:
+   return ret;
+   case FREEZE_WAIT:
+   wait_event(sb->s_writers.wait_unfrozen,
+  sb->s_writers.frozen < level);
+   break;
+   case FREEZE_WAIT_KILLABLE:
+   ret = wait_event_killable(sb->s_writers.wait_unfrozen,
+  sb->s_writers.frozen < level);
+   if (ret)
+   return -EINTR;
+   break;
+   }
}
 
 #ifdef CONFIG_LOCKDEP
@@ -1213,7 +1224,8 @@ retry:
__sb_end_write(sb, level);
goto retry;
}
-   return 1;
+   ret = 1;
+   return ret;
 }
 EXPORT_SYMBOL(__sb_start_write);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8d47c9a..c8b7325 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1220,6 +1220,13 @@ enum {
 
 #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
 
+/* Possible waiting modes */
+enum {
+   FREEZE_NOWAIT = 0,  /* no blocking call */
+   FREEZE_WAIT = 1,/* wait in uninterruptible state */
+   FREEZE_WAIT_KILLABLE = 2,   /* wait in killable state */
+};
+
 struct sb_writers {
/* Counters for counting writers at each level */
struct percpu_counter   counter[SB_FREEZE_LEVELS];
@@ -1335,7 +1342,7 @@ extern struct timespec current_fs_time(struct super_block *sb);
  */
 
 void __sb_end_write(struct super_block *sb, int level);
-int __sb_start_write(struct super_block *sb, int level, bool wait);
+int __sb_start_write(struct super_block *sb, int level, int wait);
 
 /**
  * sb_end_write - drop write access to a superblock
@@ -1394,12 +1401,12 @@ static inline void sb_end_intwrite(struct super_block *sb)
  */
 static inline void sb_start_write(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline int sb_start_write_trylock(struct super_block *sb)
 {
-   return __sb_start_write(sb, SB_FREEZE_WRITE, false);
+   return __sb_start_write(sb, SB_FREEZE_WRITE, FREEZE_NOWAIT);
 }
 
 /**
@@ -1423,7 +1430,7 @@ static inline int sb_start_write_trylock(struct super_block *sb)
  */
 static inline void sb_start_pagefault(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, true);
+   __sb_start_write(sb, SB_FREEZE_PAGEFAULT, FREEZE_WAIT);
 }
 
 /*
@@ -1441,7 +1448,7 @@ static inline void sb_start_pagefault(struct super_block *sb)
  */
 static inline void sb_start_intwrite(struct super_block *sb)
 {
-   __sb_start_write(sb, SB_FREEZE_FS, true);
+   __sb_start_write(sb, SB_FREEZE_FS, FREEZE_WAIT);
 }
 
 
@@ -2224,7 +2231,7 @@ static inline void file_start_write(struct file *file)
 {
if (!S_ISREG(file_inode(file)->i_mode))
return;
-   __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+   __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, FREEZE_WAIT);
 }
 
 static inline void file_end_write(struct file *file)
-- 
1.7.3.4
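
The behaviour of the three waiting modes in the patch above can be summarised with a small user-space model. It is only a sketch under stated assumptions: `struct freeze_model` and `sb_start_write_model` are hypothetical, the real wait queue is replaced by a flag that is cleared to represent "slept until thaw", and the lockdep and retry logic of the real function is left out. The enum values mirror the ones added by the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Same names and values as the waiting modes introduced by the patch. */
enum {
	FREEZE_NOWAIT = 0,		/* no blocking call */
	FREEZE_WAIT = 1,		/* wait in uninterruptible state */
	FREEZE_WAIT_KILLABLE = 2,	/* wait in killable state */
};

/* Hypothetical snapshot of what the real function would observe. */
struct freeze_model {
	bool frozen;		/* frozen at or above the requested level */
	bool fatal_signal;	/* a fatal signal arrives while waiting */
};

/* Returns 1 on success, 0 if it would block, -EINTR if the wait was killed. */
static int sb_start_write_model(struct freeze_model *m, int wait)
{
	if (m->frozen) {
		switch (wait) {
		case FREEZE_NOWAIT:
			return 0;
		case FREEZE_WAIT:
			m->frozen = false;	/* models sleeping until thaw */
			break;
		case FREEZE_WAIT_KILLABLE:
			if (m->fatal_signal)
				return -EINTR;
			m->frozen = false;	/* thawed before any signal */
			break;
		}
	}
	return 1;
}
```

Only the killable mode can produce a negative value, which is why the existing void wrappers can keep ignoring the return value while the new killable wrappers must check it.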


[PATCH 0/3 v3][RFC] fsfreeze: from uninterruptible to killable

2013-04-13 Thread Marco Stornelli
Hi,

I rebased the work on top of -next and applied Jan's comment about
__sb_start_write.
I did some basic tests and they are ok.

Open points:
- without changing mnt_want_write, several paths are still blocking paths;
- page_mkwrite still calls the blocking variant of __sb_start_write.

Any comments are welcome.

Regards.

Marco Stornelli (3):
  fsfreeze: wait in killable state in __sb_start_write
  fsfreeze: added new file_start_write_killable
  fsfreeze: use sb_start_write_killable instead of sb_start_write

 drivers/block/loop.c |4 +++-
 fs/aio.c |7 +--
 fs/coda/file.c   |4 +++-
 fs/open.c|8 ++--
 fs/read_write.c  |   28 +---
 fs/splice.c  |4 +++-
 fs/super.c   |   24 ++--
 include/linux/fs.h   |   36 ++--
 8 files changed, 85 insertions(+), 30 deletions(-)

-- 
1.7.3.4
---


