Re: [patch 0/8] unprivileged mount syscall

2007-04-16 Thread Miklos Szeredi
Aren't there ways to escape chroot jails? Serge had pointed me to a URL which showed chroots can be escaped. And if that is true, then having all users' private mount trees in the same namespace can be a security issue? No. In fact chrooting the user into /share/$USER will actually _grant_ a

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-16 Thread Miklos Szeredi
Given the existence of shared subtrees, allowing/denying this at the mount namespace level is silly and wrong. If we need more than just the filesystem permission checks, can we make it a mount flag settable with mount and remount that allows non-privileged users the ability to create

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-16 Thread Miklos Szeredi
Also for bind-mount and remount operations the flag has to be propagated down its propagation tree. Otherwise an unprivileged mount in a shared mount won't get reflected in its peers and slaves, leading to non-identical shared subtrees. That's an interesting question. Do we want

[patch 01/10] add user mounts to the kernel

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add ownership information to mounts. A new mount flag, MS_SETUSER is used to make a mount owned by a user. If this flag is specified, then the owner will be set to the current real user id and the mount will be marked with the MNT_USER flag. On remount
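The ownership rule can be modelled in a few lines of C. This is a sketch under stated assumptions, not the patch's code: `struct model_vfsmount`, `MODEL_MS_SETUSER` and `MODEL_MNT_USER` are hypothetical stand-ins with arbitrary flag values, and the caller's real uid is passed in explicitly instead of being read from the current task.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical flag values; the patch's real MS_SETUSER/MNT_USER bits are not shown here. */
#define MODEL_MS_SETUSER 0x1UL
#define MODEL_MNT_USER   0x1U

struct model_vfsmount {
	unsigned int mnt_flags;
	uid_t mnt_uid;          /* owner; meaningful only when MODEL_MNT_USER is set */
};

/* If MS_SETUSER was requested, record the caller's real uid as the owner
 * and mark the mount as a user mount. Otherwise leave the mount untouched. */
void model_set_owner(struct model_vfsmount *mnt, unsigned long ms_flags,
		     uid_t real_uid)
{
	if (ms_flags & MODEL_MS_SETUSER) {
		mnt->mnt_uid = real_uid;
		mnt->mnt_flags |= MODEL_MNT_USER;
	}
}

bool model_is_user_mount(const struct model_vfsmount *mnt)
{
	return (mnt->mnt_flags & MODEL_MNT_USER) != 0;
}
```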

[patch 02/10] allow unprivileged umount

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] The owner doesn't need sysadmin capabilities to call umount(). Similar behavior as umount(8) on mounts having user=UID option in /etc/mtab. The difference is that umount also checks /etc/fstab, presumably to exclude another mount on the same mountpoint
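A minimal sketch of the resulting umount permission check, assuming simplified types: `model_permit_umount()` and `struct model_vfsmount` are hypothetical, and the `has_sys_admin` boolean stands in for a `capable(CAP_SYS_ADMIN)` test.

```c
#include <stdbool.h>
#include <sys/types.h>

#define MODEL_MNT_USER 0x1U  /* hypothetical stand-in for MNT_USER */

struct model_vfsmount {
	unsigned int mnt_flags;
	uid_t mnt_uid;           /* owner recorded when the mount was created */
};

/* A caller with sysadmin capability may always unmount; anyone else may
 * unmount only a user mount whose recorded owner matches their uid. */
bool model_permit_umount(const struct model_vfsmount *mnt, uid_t caller_uid,
			 bool has_sys_admin)
{
	if (has_sys_admin)
		return true;
	if (!(mnt->mnt_flags & MODEL_MNT_USER))
		return false;
	return mnt->mnt_uid == caller_uid;
}
```

Comparing raw uid values is the simplest choice; later in the thread reviewers argue for comparing `user_struct` pointers instead, which this sketch does not attempt.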

[patch 07/10] allow unprivileged bind mounts

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Allow bind mounts to unprivileged users if the following conditions are met: - user submounts are permitted on the mountpoint's mount - mountpoint is not a symlink or special file - mountpoint is not a sticky directory or is owned by the current user

[patch 04/10] allow per-mount flags to be set/cleared individually

2007-04-16 Thread Miklos Szeredi
solves this problem by generalizing do_change_type() so that not only the propagation property can be changed, but mnt_flags can be set/cleared individually. From: Miklos Szeredi [EMAIL PROTECTED] Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/namespace.c

[patch 00/10] mount ownership and unprivileged mount syscall (v3)

2007-04-16 Thread Miklos Szeredi
This patchset adds support for keeping mount ownership information in the kernel, and allows unprivileged mount(2) and umount(2) in certain cases. This can be useful for the following reasons: - mount(8) can store ownership (user=XY option) in the kernel instead of, or in addition to, storing it in

[patch 03/10] account user mounts

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add sysctl variables for accounting and limiting the number of user mounts. The maximum number of user mounts is set to 1024 by default. This won't in itself enable user mounts, setting the permit user submount mount flag will also be needed. Signed-off
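The accounting described above amounts to a bounded counter. The sketch below is hypothetical (`struct model_user_mount_acct` and its helpers are illustrative names) and omits the locking a real kernel counter would need. Only the default limit of 1024 comes from the patch description.

```c
#include <stdbool.h>

/* Hypothetical model: a global count of user mounts plus a
 * sysctl-style maximum. Not thread-safe. */
struct model_user_mount_acct {
	int nr_user_mounts;
	int max_user_mounts;
};

void model_acct_init(struct model_user_mount_acct *a)
{
	a->nr_user_mounts = 0;
	a->max_user_mounts = 1024;   /* default from the patch description */
}

/* Reserve one slot for a new user mount; fails once the limit is reached. */
bool model_acct_inc(struct model_user_mount_acct *a)
{
	if (a->nr_user_mounts >= a->max_user_mounts)
		return false;
	a->nr_user_mounts++;
	return true;
}

/* Release a slot when a user mount goes away. */
void model_acct_dec(struct model_user_mount_acct *a)
{
	if (a->nr_user_mounts > 0)
		a->nr_user_mounts--;
}
```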

[patch 08/10] put declaration of put_filesystem() in fs.h

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Declarations go into headers. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/super.c === --- linux.orig/fs/super.c 2007-04-13 12:26:11.0 +0200 +++ linux/fs

[patch 06/10] propagate error values from clone_mnt

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Allow clone_mnt() to return errors other than ENOMEM. This will be used for returning a different error value when the number of user mounts goes over the limit. Fix copy_tree() to return EPERM for unbindable mounts. Don't propagate further from

[patch 10/10] allow unprivileged fuse mounts

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Use FS_SAFE for fuse fs type, but not for fuseblk. FUSE was designed from the beginning to be safe for unprivileged users. This has also been verified in practice over many years. In addition unprivileged fuse mounts require the usermnt mount option

[patch 05/10] Add permit user submounts flag to vfsmount

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] If MNT_USERMNT flag is not set in the target vfsmount, then unprivileged mounts will be denied. By default this flag is cleared, and can be set on new mounts, on remounts or with the MS_SETFLAGS option. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED

[patch 09/10] allow unprivileged mounts

2007-04-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Define a new fs flag FS_SAFE, which denotes that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind, a thorough audit is needed before setting
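A sketch of how an `FS_SAFE`-style flag could gate unprivileged mounts, assuming a tiny hypothetical registry. `model_fs_types`, `model_may_mount()` and the flag value are illustrative, and the fuse/fuseblk entries follow the split described in the fuse patch (fuse marked safe, fuseblk not).

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_FS_SAFE 0x1  /* hypothetical value for the FS_SAFE fs flag */

struct model_fs_type {
	const char *name;
	int fs_flags;
};

/* Per the fuse patch: fuse is marked safe, fuseblk is not. */
static const struct model_fs_type model_fs_types[] = {
	{ "fuse",    MODEL_FS_SAFE },
	{ "fuseblk", 0 },
};

/* Privileged callers may mount any known type; unprivileged callers
 * only types carrying the safe flag. Unknown types are rejected. */
bool model_may_mount(const char *type, bool privileged)
{
	size_t n = sizeof(model_fs_types) / sizeof(model_fs_types[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(model_fs_types[i].name, type) == 0)
			return privileged ||
			       (model_fs_types[i].fs_flags & MODEL_FS_SAFE);
	}
	return false;
}
```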

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-16 Thread Miklos Szeredi
That depends. Current patches check the unprivileged submounts allowed under this mount flag only on the requested mount and not on the propagated mounts. Do you see a problem with this? I think privileges of this sort should propagate. If I read what you just said correctly, if I have

Re: ZFS with Linux: An Open Plea

2007-04-17 Thread Miklos Szeredi
FUSE is nice for trying out new and interesting ideas in userspace - it has its uses. Yes, but it is not really for the end-user. To paraphrase another, it is mostly academic. Oh? I thought those ~10,000 downloads of SSHFS and ~200,000 downloads of NTFS-3G were end users.(*) Maybe I

Re: [patch 05/10] Add permit user submounts flag to vfsmount

2007-04-17 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] If MNT_USERMNT flag is not set in the target vfsmount, then MNT_USER and MNT_USERMNT? I claim no way will people keep those straight. How about MNT_ALLOWUSER and MNT_USER? Umm, is allowuser more clear than usermnt? What is allowed to the user

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Miklos Szeredi
Interesting. So far, even today, these things can happen; however, they are sufficiently unlikely that the tools don't account for them. Once a hostile user can cause them, things are more of a problem. (Unless you want to tackle each problem legacy tool one at a time to remove problems -

Re: [patch 05/10] Add permit user submounts flag to vfsmount

2007-04-17 Thread Miklos Szeredi
MNT_USER and MNT_USERMNT? I claim no way will people keep those straight. How about MNT_ALLOWUSER and MNT_USER? Umm, is allowuser more clear than usermnt? What is allowed to the I think so, yes. One makes it clear that we're talking about allowing user (somethings :), one might

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Miklos Szeredi
I'm a bit lost about what is currently done and who advocates for what. It seems to me the MNT_ALLOWUSERMNT (or whatever :) flag should be propagated. In the /share rbind+chroot example, I assume the admin would start by doing mount --bind /share /share; mount --make-slave

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Miklos Szeredi
I'm still not sure what your problem is. My problem right now is that I see a serious complexity escalation in the user interface that we must support indefinitely. I see us taking a nice powerful concept and seriously watering it down. To some extent we have to avoid confusing suid

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Miklos Szeredi
mount --make-rshared /; mkdir -p /mnt/ns/$USER; mount --rbind / /mnt/ns/$USER; mount --make-rslave /mnt/ns/$USER. This was my main point - that the tree in which users can mount will be a slave of /, so that propagating the "are user mounts allowed" flag among peers is safe and intuitive.

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-18 Thread Miklos Szeredi
I've tried to make this unprivileged mount thing as simple as possible, and no simpler. If we can make it even simpler, all the better. We are certainly much more complex than the code in plan9 (just read through it) so I think we have room for improvement. Just for reference what I

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-18 Thread Miklos Szeredi
Allowing this and other flags to NOT be propagated just makes it possible to have a set of shared mounts with asymmetric properties, which may actually be desirable. The shared mount feature was designed to ensure that the mount remained identical at all the locations. OK, so remount

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-18 Thread Miklos Szeredi
Don't forget that almost all mount flags are per-superblock. How are you planning on dealing with the case that one user mounts a filesystem read-only, while another is trying to mount the same one read-write? Yeah, I forgot, the per-mount read-only patches are not yet in. That

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-19 Thread Miklos Szeredi
As I said earlier, I see a case where two mounts that are peers of each other can become non-identical if we don't propagate the allowusermnt flag. As a practical example: /tmp and /mnt are peers of each other. /tmp has its allowusermnt flag set, which has not been propagated to
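To make the peer-propagation question concrete, here is a hypothetical model of one peer group: `struct model_mount` links peers in a circular list and `model_set_allowuser()` either updates only the target mount or walks every peer. Slave propagation and the real `vfsmount` fields are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: peers of a shared mount form a circular list. */
struct model_mount {
	const char *path;
	bool allow_user_mounts;
	struct model_mount *next_peer;   /* points to itself if there are no peers */
};

/* Set the flag on `mnt`; if `propagate` is true, also on every peer. */
void model_set_allowuser(struct model_mount *mnt, bool value, bool propagate)
{
	struct model_mount *m = mnt;

	do {
		m->allow_user_mounts = value;
		if (!propagate)
			break;
		m = m->next_peer;
	} while (m != mnt);
}
```

With `propagate` false, setting the flag on /tmp leaves /mnt unchanged, which is exactly the asymmetric state described in the message.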

Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-19 Thread Miklos Szeredi
Checking the permissions on the mountpoint to allow unmounting is - rather inelegant: user can't see those permissions, can only determine if umount is allowed by trial and error - may be a security hole, e.g.: sysadmin: mkdir -m 777 /mnt/disk mount

Re: [PATCH 09/12] mm: count unstable pages per BDI

2007-04-19 Thread Miklos Szeredi
Count per BDI unstable pages. I'm wondering, is it really worth having this category separate from per BDI dirty pages? With the exception of the export to sysfs, the sum of unstable + dirty is always used. Miklos

Re: [PATCH 11/12] mm: per device dirty threshold

2007-04-19 Thread Miklos Szeredi
+static inline unsigned long bdi_stat_delta(void) +{ +#ifdef CONFIG_SMP + return NR_CPUS * FBC_BATCH; Shouldn't this be multiplied by the number of counters to sum? I.e. 3 if dirty and unstable are separate, and 2 if they are not. Miklos
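The point about the error bound can be checked with a toy per-CPU counter. In this sketch (hypothetical names; `MODEL_NR_CPUS` and `MODEL_BATCH` are small illustrative values, not the kernel's `NR_CPUS` and `FBC_BATCH`), each cheap read can lag the exact sum by less than `NR_CPUS * BATCH`, so a test over several counters needs that slack once per counter. It assumes the three counters are dirty, unstable and writeback.

```c
#include <stdlib.h>

#define MODEL_NR_CPUS 4
#define MODEL_BATCH   32   /* illustrative; not the kernel's FBC_BATCH */

struct model_pcpu_counter {
	long count;                  /* approximate global value */
	long local[MODEL_NR_CPUS];   /* per-CPU deltas not yet folded in */
};

/* Accumulate locally; fold into the global count once |delta| reaches the batch. */
void model_counter_add(struct model_pcpu_counter *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (labs(v) >= MODEL_BATCH) {
		c->count += v;
		v = 0;
	}
	c->local[cpu] = v;
}

/* Cheap, approximate read. */
long model_counter_read(const struct model_pcpu_counter *c)
{
	return c->count;
}

/* Exact but expensive read: walks every CPU's local delta. */
long model_counter_sum(const struct model_pcpu_counter *c)
{
	long sum = c->count;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}

/* Worst-case slack when `nr_counters` cheap reads are added together. */
long model_stat_delta(int nr_counters)
{
	return (long)nr_counters * MODEL_NR_CPUS * MODEL_BATCH;
}
```

With 31 unfolded pages per CPU in each of three counters, the approximate total reads 0 while the exact total is 372: that exceeds the single-counter slack of 128 but stays within the three-counter slack of 384.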

Re: [PATCH 09/12] mm: count unstable pages per BDI

2007-04-19 Thread Miklos Szeredi
Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c 2007-04-19 19:59:26.0 +0200 +++ linux-2.6/fs/buffer.c 2007-04-19 20:35:39.0 +0200 @@ -733,7 +733,7 @@ int __set_page_dirty_buffers(struct

[patch 2/8] allow unprivileged umount

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] The owner doesn't need sysadmin capabilities to call umount(). Similar behavior as umount(8) on mounts having user=UID option in /etc/mtab. The difference is that umount also checks /etc/fstab, presumably to exclude another mount on the same mountpoint

[patch 1/8] add user mounts to the kernel

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add ownership information to mounts. A new mount flag, MS_SETUSER is used to make a mount owned by a user. If this flag is specified, then the owner will be set to the current real user id and the mount will be marked with the MNT_USER flag. On remount

[patch 4/8] propagate error values from clone_mnt

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Allow clone_mnt() to return errors other than ENOMEM. This will be used for returning a different error value when the number of user mounts goes over the limit. Fix copy_tree() to return EPERM for unbindable mounts. Don't propagate further from

[patch 6/8] put declaration of put_filesystem() in fs.h

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Declarations go into headers. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/super.c === --- linux.orig/fs/super.c 2007-04-20 11:55:02.0 +0200 +++ linux/fs

[patch 5/8] allow unprivileged bind mounts

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Allow bind mounts to unprivileged users if the following conditions are met: - mountpoint is not a symlink or special file - parent mount is owned by the user - the number of user mounts is below the maximum Unprivileged mounts imply MS_SETUSER
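The three v4 conditions can be read as one predicate. This is a hypothetical model: `struct model_bind_request`, `enum model_file_kind` and `model_may_bind()` are illustrative, and the mountpoint's type is passed in as an enum rather than derived from an inode.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical classification of the mountpoint. */
enum model_file_kind { MODEL_DIR, MODEL_REG, MODEL_SYMLINK, MODEL_SPECIAL };

struct model_bind_request {
	enum model_file_kind mountpoint_kind;
	bool parent_is_user_mnt;   /* parent mount carries MNT_USER */
	uid_t parent_owner;        /* owner recorded on the parent mount */
	uid_t caller_uid;
	int nr_user_mounts;        /* current count from the accounting */
	int max_user_mounts;       /* sysctl limit, 1024 by default */
};

bool model_may_bind(const struct model_bind_request *r)
{
	/* 1. mountpoint is not a symlink or special file */
	if (r->mountpoint_kind != MODEL_DIR && r->mountpoint_kind != MODEL_REG)
		return false;
	/* 2. parent mount is owned by the user */
	if (!r->parent_is_user_mnt || r->parent_owner != r->caller_uid)
		return false;
	/* 3. the number of user mounts is below the maximum */
	return r->nr_user_mounts < r->max_user_mounts;
}
```

The 2007-04-22 reply in this thread questions whether the first condition is still needed; dropping it would mean deleting the first `if`.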

[patch 8/8] allow unprivileged fuse mounts

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Use FS_SAFE for fuse fs type, but not for fuseblk. FUSE was designed from the beginning to be safe for unprivileged users. This has also been verified in practice over many years. In addition unprivileged mounts require the parent mount to be owned

[patch 3/8] account user mounts

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add sysctl variables for accounting and limiting the number of user mounts. The maximum number of user mounts is set to 1024 by default. This won't in itself enable user mounts, setting a mount to be owned by a user is first needed Signed-off-by: Miklos

[patch 7/8] allow unprivileged mounts

2007-04-20 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Define a new fs flag FS_SAFE, which denotes that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind, a thorough audit is needed before setting

[patch 0/8] mount ownership and unprivileged mount syscall (v4)

2007-04-20 Thread Miklos Szeredi
This patchset has now been pared down to the lowest common denominator that everybody can agree on. Or at least there weren't any objections to this proposal. Andrew, please consider it for -mm. Thanks, Miklos v3 -> v4: - simplify interface as much as possible, now only a single option

Re: [d_path 0/7] Fixes to d_path: Respin

2007-04-20 Thread Miklos Szeredi
I gave a chroot example that showed that in the current implementation, you can get pretty random clashes between mounts; there are other cases with lazy unmounts as well. Irrelevant as well. If you create chroot problems it's your problem. The fact is that if you have a normal setup

Re: [patch 1/8] add user mounts to the kernel

2007-04-21 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add ownership information to mounts. A new mount flag, MS_SETUSER is used to make a mount owned by a user. If this flag is specified, then the owner will be set to the current real user id and the mount will be marked with the MNT_USER flag

Re: [patch 2/8] allow unprivileged umount

2007-04-21 Thread Miklos Szeredi
+static bool permit_umount(struct vfsmount *mnt, int flags) +{ ... + return mnt->mnt_uid == current->uid; +} Yes, this seems very wrong. I'd have thought that comparing user_struct*'s would get us a heck of a lot closer to being able to support aliasing of UIDs between

Re: [patch 7/8] allow unprivileged mounts

2007-04-21 Thread Miklos Szeredi
On Fri, 20 Apr 2007 12:25:39 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: Define a new fs flag FS_SAFE, which denotes that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind

Re: [patch 8/8] allow unprivileged fuse mounts

2007-04-21 Thread Miklos Szeredi
Use FS_SAFE for fuse fs type, but not for fuseblk. FUSE was designed from the beginning to be safe for unprivileged users. This has also been verified in practice over many years. How does FUSE do this? There are obvious cases like crafting a filesystem which has setuid

Re: [patch 7/8] allow unprivileged mounts

2007-04-21 Thread Miklos Szeredi
Define a new fs flag FS_SAFE, which denotes that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind, a thorough audit is needed before setting this flag.

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-21 Thread Miklos Szeredi
On Fri, 20 Apr 2007 17:52:04 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: Scale writeback cache per backing device, proportional to its writeout speed. By decoupling the BDI dirty thresholds a number of problems we currently have will go away, namely: - mutual interference

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-21 Thread Miklos Szeredi
The other deadlock, in throttle_vm_writeout() is still to be solved. Let's go back to the original changelog: Author: marcelo.tosatti marcelo.tosatti Date: Tue Mar 8 17:25:19 2005 + [PATCH] vm: pageout throttling With silly pageout testcases it is possible to place

Re: [patch 2/8] allow unprivileged umount

2007-04-22 Thread Miklos Szeredi
On Sat, 21 Apr 2007 10:09:42 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: +static bool permit_umount(struct vfsmount *mnt, int flags) +{ ... + return mnt->mnt_uid == current->uid; +} Yes, this seems very wrong. I'd have thought that comparing

Re: [patch 1/8] add user mounts to the kernel

2007-04-22 Thread Miklos Szeredi
The MNT_USER flag is not copied on any kind of mount cloning: namespace creation, binding or propagation. I half agree, and as an initial approximation this works. Ultimately we should be at the point that for mount propagation that we copy the owner of the from the owner of our parent

Re: [patch 2/8] allow unprivileged umount

2007-04-22 Thread Miklos Szeredi
I suspect we can allow MNT_FORCE for non-privileged users as well if we can trust the filesystem. I don't think so. MNT_FORCE has side effects on the superblock. So a user shouldn't be able to force an unmount on a bind mount s/he did, but there's no problem with allowing plain/lazy

Re: [patch 3/8] account user mounts

2007-04-22 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add sysctl variables for accounting and limiting the number of user mounts. The maximum number of user mounts is set to 1024 by default. This won't in itself enable user mounts, setting a mount to be owned by a user is first needed Since

Re: [patch 5/8] allow unprivileged bind mounts

2007-04-22 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Allow bind mounts to unprivileged users if the following conditions are met: - mountpoint is not a symlink or special file Why? This sounds like a left over from when we were checking permissions. Hmm, yes. Don't know. Maybe only

Re: [patch 8/8] allow unprivileged fuse mounts

2007-04-22 Thread Miklos Szeredi
+ /* +* For unprivileged mounts use current uid/gid. Still allow +* user_id and group_id options for compatibility, but +* only if they match these values. +*/ + if (!capable(CAP_SYS_ADMIN)) { + d->user_id = current->uid; + d->user_id_present =
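The rule in the quoted fuse hunk (use the caller's uid/gid, and accept explicit `user_id`/`group_id` options only if they match) can be sketched as follows. `struct model_fuse_opts` and `model_fuse_fix_ids()` are hypothetical, `privileged` stands in for `capable(CAP_SYS_ADMIN)`, and the privileged path simply leaves the options alone here.

```c
#include <stdbool.h>
#include <sys/types.h>

struct model_fuse_opts {
	bool user_id_present, group_id_present;
	uid_t user_id;
	gid_t group_id;
};

/* Returns false if an explicit option conflicts with the caller's ids. */
bool model_fuse_fix_ids(struct model_fuse_opts *d, bool privileged,
			uid_t cur_uid, gid_t cur_gid)
{
	if (privileged)
		return true;   /* options left to normal parsing in this sketch */
	if (d->user_id_present && d->user_id != cur_uid)
		return false;
	if (d->group_id_present && d->group_id != cur_gid)
		return false;
	d->user_id = cur_uid;
	d->user_id_present = true;
	d->group_id = cur_gid;
	d->group_id_present = true;
	return true;
}
```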

Re: [patch 2/8] allow unprivileged umount

2007-04-22 Thread Miklos Szeredi
Does this mean, that containers will need this? Or that you don't know yet? The uid namespace is something we have to handle carefully and we have not decided on the final design. What is clear is that all permission checks will need to become either (uid namespace, uid) tuple

Re: [patch 1/8] add user mounts to the kernel

2007-04-22 Thread Miklos Szeredi
+if (mnt->mnt_flags & MNT_USER) +seq_printf(m, ",user=%i", mnt->mnt_uid); How about making the test if (mnt->mnt_user != &root_user) We don't want to treat root_user special. That's what capabilities were invented for. For the print statement? Whatever it is

Re: [patch 7/8] allow unprivileged mounts

2007-04-22 Thread Miklos Szeredi
On Apr 21 2007 10:57, Eric W. Biederman wrote: tmpfs! tmpfs is a possible problem because it can consume lots of ram/swap. Which is why it has limits on the amount of space it can consume. Users can gobble up all RAM and swap already today. (Unless they are confined into an

Re: [patch 1/8] add user mounts to the kernel

2007-04-22 Thread Miklos Szeredi
+ +uid_t mnt_uid; /* owner of the mount */ Can we please make this a user struct. That requires a bit of reference counting but it has uid namespace benefits as well as making it easy to implement per user mount rlimits. OK, can you elaborate, what

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-23 Thread Miklos Szeredi
The other deadlock, in throttle_vm_writeout() is still to be solved. Let's go back to the original changelog: Author: marcelo.tosatti marcelo.tosatti Date: Tue Mar 8 17:25:19 2005 + [PATCH] vm: pageout throttling With silly pageout testcases it

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Miklos Szeredi
This is probably a reasonable thing to do but it doesn't feel like the right place. I think get_dirty_limits should return the raw threshold, and balance_dirty_pages should do both tests - the bdi-local test and the system-wide test. Ok, that makes sense I guess. Well, my narrow

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Miklos Szeredi
Ahh, now I see; I had totally blocked out these few lines: pages_written += write_chunk - wbc.nr_to_write; if (pages_written >= write_chunk) break; /* We've done our duty */ yeah, those look dubious
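A hypothetical model of the quoted loop shows why it is dubious: each pass credits only the pages actually written to this device, so if the device has fewer than `write_chunk` dirty pages the "done our duty" exit can never fire. `model_balance()` and `model_writeback()` are illustrative, and the real loop's other exit (dropping below the dirty thresholds) is omitted.

```c
/* Hypothetical writeback control: nr_to_write is decremented per page written. */
struct model_wbc { long nr_to_write; };

/* Stand-in for writeback: writes min(available, nr_to_write) pages. */
static void model_writeback(struct model_wbc *wbc, long *available)
{
	long n = *available < wbc->nr_to_write ? *available : wbc->nr_to_write;

	*available -= n;
	wbc->nr_to_write -= n;
}

/* Returns the pass on which the "done our duty" exit is taken,
 * or -1 if it never fires within max_passes. */
int model_balance(long write_chunk, long dirty_on_this_bdi, int max_passes)
{
	long pages_written = 0;

	for (int pass = 1; pass <= max_passes; pass++) {
		struct model_wbc wbc = { .nr_to_write = write_chunk };

		model_writeback(&wbc, &dirty_on_this_bdi);
		pages_written += write_chunk - wbc.nr_to_write;
		if (pages_written >= write_chunk)
			return pass;   /* We've done our duty */
	}
	return -1;
}
```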

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Miklos Szeredi
On Tue, 24 Apr 2007 12:12:18 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: On Tue, 2007-04-24 at 03:00 -0700, Andrew Morton wrote: On Tue, 24 Apr 2007 11:47:20 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: Ahh, now I see; I had totally blocked out these few lines

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Miklos Szeredi
No, we _start_ writeback for 1.5*ratelimit_pages pages, but do not wait for those writebacks to finish. So for a slow device and a fast writer, dirty+writeback can indeed increase beyond the dirty threshold. Nope, try it. If a process dirties 1000 pages it'll then go into

Re: [patch 0/8] mount ownership and unprivileged mount syscall (v4)

2007-04-25 Thread Miklos Szeredi
The following extra security measures are taken for unprivileged mounts: - usermounts are limited by a sysctl tunable - force nosuid,nodev mount options on the created mount The original userspace user= solution also implies the noexec option by default (you can override the

[patch] unprivileged mounts update

2007-04-25 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] - refine adding nosuid and nodev flags for unprivileged mounts: o add nosuid, only if mounter doesn't have CAP_SETUID capability o add nodev, only if mounter doesn't have CAP_MKNOD capability - allow unprivileged forced unmount, but only for FS_SAFE
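The refined nosuid/nodev rule is a pair of capability tests. In this sketch the flag values are hypothetical stand-ins for `MNT_NOSUID`/`MNT_NODEV`, and the two booleans stand in for `capable(CAP_SETUID)` and `capable(CAP_MKNOD)`.

```c
#include <stdbool.h>

/* Hypothetical flag values. */
#define MODEL_MNT_NOSUID 0x1U
#define MODEL_MNT_NODEV  0x2U

/* Add nosuid unless the mounter has CAP_SETUID, and nodev unless it has CAP_MKNOD. */
unsigned int model_unpriv_mnt_flags(unsigned int mnt_flags,
				    bool has_cap_setuid, bool has_cap_mknod)
{
	if (!has_cap_setuid)
		mnt_flags |= MODEL_MNT_NOSUID;
	if (!has_cap_mknod)
		mnt_flags |= MODEL_MNT_NODEV;
	return mnt_flags;
}
```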

Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-12 Thread Miklos Szeredi
I have no idea how serious the scalability problems with this are. If they are serious, different solutions can probably be found for the above, but this is certainly the simplest. Atomic operations to a single per-backing device from all CPUs at once? That's a pretty serious

Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-12 Thread Miklos Szeredi
I'll try to explain the reason for the deadlock first. IIUC, your problem is that there's another bdi that holds all the dirty pages, and this throttle loop never flushes pages from that other bdi and we sleep instead. It seems to me that the fundamental problem is that to clean the pages we

Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-13 Thread Miklos Szeredi
IIUC, your problem is that there's another bdi that holds all the dirty pages, and this throttle loop never flushes pages from that other bdi and we sleep instead. It seems to me that the fundamental problem is that to clean the pages we need to flush both bdi's, not just the bdi we

Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-14 Thread Miklos Szeredi
Only if the queue depth is not bound. Queue depths are bound and so the distance we can go over the threshold is limited. This is the fundamental principle on which the throttling is based. Hence, if the queue is not full, then we will have either written dirty pages to it (i.e

Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-11 Thread Miklos Szeredi
Unfortunately this patch doesn't completely solve this problem, since the system will still be hosed due to all memory being used up by dentries. And I bet the OOM killer won't find the real target (du) but will kill anything before that. So the second part of the problem is to

[RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse

Re: [RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the

Re: [RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
-static struct file_system_type **find_filesystem(const char *name) +static struct file_system_type **find_filesystem(const char *name, unsigned len) { struct file_system_type **p; for (p=&file_systems; *p; p=&(*p)->next) -if (strcmp((*p)->name,name) == 0) +
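The extra `len` argument lets a lookup match only a prefix of the requested type string. The sketch below assumes a `type.subtype` spelling such as `fuse.sshfs` (the separator is an assumption, not stated in the snippet). `struct model_fs_type`, `model_find_filesystem()` and `model_split_type()` are hypothetical, and the match also requires the registered name to end at `len`, so that a request for `fuse` cannot match a registered `fuseblk`.

```c
#include <stddef.h>
#include <string.h>

struct model_fs_type {
	const char *name;
	struct model_fs_type *next;
};

/* Match the first `len` bytes, and require the registered name to end there. */
struct model_fs_type **model_find_filesystem(struct model_fs_type **list,
					     const char *name, size_t len)
{
	struct model_fs_type **p;

	for (p = list; *p; p = &(*p)->next)
		if (strncmp((*p)->name, name, len) == 0 && !(*p)->name[len])
			break;
	return p;
}

/* Split "fuse.sshfs" into type length 4 and subtype "sshfs" (NULL if none). */
size_t model_split_type(const char *fstype, const char **subtype)
{
	const char *dot = strchr(fstype, '.');

	*subtype = dot ? dot + 1 : NULL;
	return dot ? (size_t)(dot - fstype) : strlen(fstype);
}
```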

Re: [uml-devel] UML hang with 100% CPU

2007-02-15 Thread Miklos Szeredi
Strangely enough after continuing in gdb, UML is back to normal, and I can't make it hang any more. It must be something timing related. Can you see if the patch below fixes it? Yay! Got my nice fast UML back instead of ugly slow QEmu ;) Seems to work perfectly now. Thanks, Miklos

[PATCH] consolidate generic_writepages and mpage_writepages

2007-02-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Clean up massive code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function pointer argument, which will be called for each page to be written. Maybe cifs_writepages() too can use
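The shape of the consolidation (one page walker parameterised by a per-page callback) can be sketched in plain C. `struct model_page`, `model_write_cache_pages()` and the callback type are hypothetical simplifications; the real function works on an address space and a writeback control structure rather than an array.

```c
#include <stddef.h>

struct model_page { int index; int dirty; };

typedef int (*model_writepage_t)(struct model_page *page, void *data);

/* Walk the pages, hand each dirty one to `writepage`, stop at the first
 * error or once *nr_to_write pages have been handed off. */
int model_write_cache_pages(struct model_page *pages, size_t n, long *nr_to_write,
			    model_writepage_t writepage, void *data)
{
	for (size_t i = 0; i < n && *nr_to_write > 0; i++) {
		if (!pages[i].dirty)
			continue;
		pages[i].dirty = 0;
		int ret = writepage(&pages[i], data);
		if (ret)
			return ret;
		(*nr_to_write)--;
	}
	return 0;
}

/* Example callback: just count the pages it was asked to write. */
int model_count_writepage(struct model_page *page, void *data)
{
	(void)page;
	(*(int *)data)++;
	return 0;
}
```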

Re: [Fwd: [PATCH] consolidate generic_writepages and mpage_writepages]

2007-02-17 Thread Miklos Szeredi
Maybe cifs_writepages() too can use this infrastructure, but I'm not touching that with a ten-foot pole. The cifs case ought to be one of the simpler ones, pseudo-code is pretty easy, the hard part is all of the stuff unrelated to cifs: Ideally if there were generic functions to help

dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
I was testing the new fuse shared writable mmap support, and finding that bash-shared-mapping deadlocks (which isn't so strange ;). What is more strange is that this is not an OOM situation at all, with plenty of free and cached pages. A little more investigation shows that a similar deadlock

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
I was testing the new fuse shared writable mmap support, and finding that bash-shared-mapping deadlocks (which isn't so strange ;). What is more strange is that this is not an OOM situation at all, with plenty of free and cached pages. A little more investigation shows that a similar

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
Andrew Morton wrote: On Sun, 18 Feb 2007 19:28:18 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote: I was testing the new fuse shared writable mmap support, and finding that bash-shared-mapping deadlocks (which isn't so strange ;). What is more strange is that this is not an OOM situation

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
If so, writes to B will decrease the dirty memory threshold. Yes, but not by enough. Say A dirties 1100 pages, limit is 1000. Some pages queued for writeback (doesn't matter how much). B writes back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for B doesn't know
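The scenario above can be replayed with a hypothetical model in which each device's throttling pass can only write back its own pages. `struct model_bdi` and `model_throttle_pass()` are illustrative; the numbers are the ones from the message.

```c
#include <stdbool.h>

struct model_bdi { long dirty; };

/* One throttling pass for `self`: write back up to `chunk` of its own
 * pages, then report whether the global dirty count is under the limit. */
bool model_throttle_pass(struct model_bdi *self, const struct model_bdi *other,
			 long chunk, long global_limit)
{
	long n = self->dirty < chunk ? self->dirty : chunk;

	self->dirty -= n;
	return self->dirty + other->dirty <= global_limit;
}
```

B has nothing left to write, yet the global count stays at 1099, so every further pass fails; if cleaning A's pages depends on the writer that is stuck here (as with a FUSE daemon), nothing makes progress.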

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
--- a/fs/fs-writeback.c~a +++ a/fs/fs-writeback.c @@ -356,7 +356,7 @@ int generic_sync_sb_inodes(struct super_ continue; /* Skip a congested blockdev */ } - if (wbc->bdi && bdi != wbc->bdi) { + if (wbc->bdi && bdi !=

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
If so, writes to B will decrease the dirty memory threshold. Yes, but not by enough. Say A dirties 1100 pages, limit is 1000. Some pages queued for writeback (doesn't matter how much). B writes back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for B doesn't

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
If so, writes to B will decrease the dirty memory threshold. Yes, but not by enough. Say A dirties 1100 pages, limit is 1000. Some pages queued for writeback (doesn't matter how much). B writes back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for B

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
In general, writepage is supposed to do work without blocking on expensive locks that will get pdflush and dirty reclaim stuck in this fashion. You'll probably have to take the same approach reiserfs does in data=journal mode, which is leaving the page dirty if fuse_get_req_wp is
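
The suggested approach (leave the page dirty instead of sleeping for a request) can be sketched as follows. Everything here is hypothetical: `toy_try_get_req()` stands in for a non-blocking variant of request allocation and is not `fuse_get_req_wp`, and the page and connection structs are toy models. As in the kernel, the caller is assumed to have cleared the dirty bit before calling writepage.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy models, not kernel or FUSE structures. */
struct toy_page { bool dirty; bool writeback; };
struct toy_conn { int free_requests; };

/* Non-blocking: fails immediately instead of waiting for a request. */
static bool toy_try_get_req(struct toy_conn *fc)
{
    if (fc->free_requests == 0)
        return false;
    fc->free_requests--;
    return true;
}

/* Returns 0 in both cases; a page that could not be sent is simply
 * redirtied so that a later writeback pass retries it. */
static int toy_writepage(struct toy_conn *fc, struct toy_page *page)
{
    /* The caller has already cleared the dirty bit. */
    assert(!page->dirty && !page->writeback);
    if (!toy_try_get_req(fc)) {
        page->dirty = true;     /* redirty instead of sleeping on a request */
        return 0;
    }
    page->writeback = true;     /* request obtained: start writeback */
    return 0;
}
```

The point of the design is that pdflush and dirty reclaim never sleep inside writepage waiting on the filesystem; they move on and revisit the page later.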

Re: dirty balancing deadlock

2007-02-19 Thread Miklos Szeredi
How about this? Solves the FUSE deadlock, but not the throttle_vm_writeout() one. I'll try to tackle that one as well. If the per-bdi dirty counter goes below 16, balance_dirty_pages() returns. Does the constant need to be tunable? If it's too large, then the global threshold is more easily

Re: dirty balancing deadlock

2007-02-19 Thread Miklos Szeredi
Solves the FUSE deadlock, but not the throttle_vm_writeout() one. I'll try to tackle that one as well. If the per-bdi dirty counter goes below 16, balance_dirty_pages() returns. Does the constant need to be tunable? If it's too large, then the global threshold is more easily exceeded. If
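
The proposed exit condition can be written as a small predicate. A sketch under assumptions: `toy_may_return()` and `TOY_BDI_MIN_DIRTY` are hypothetical names, the value 16 is taken from the message, and the real balance_dirty_pages() also starts writeout and sleeps between checks, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Value from the message; whether it should be tunable is left open there. */
#define TOY_BDI_MIN_DIRTY 16

/* Decides whether a throttled writer may leave the balancing loop. */
static bool toy_may_return(long global_dirty, long global_limit, long bdi_dirty)
{
    assert(bdi_dirty >= 0);
    if (global_dirty <= global_limit)
        return true;    /* under the global limit */
    if (bdi_dirty < TOY_BDI_MIN_DIRTY)
        return true;    /* this device has almost nothing dirty: stop throttling it */
    return false;       /* keep throttling */
}
```

In the 1100/1000 scenario discussed earlier in the thread, a writer to B with zero dirty pages of its own now returns even though the global count is still over the limit, while a writer to A with many dirty pages keeps being throttled.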

[patch 1/3] fix illogical behavior in balance_dirty_pages()

2007-03-24 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Current behavior of balance_dirty_pages() is to try to start writeout into the specified queue for at least write_chunk number of pages. If write_chunk pages have been submitted, then return. However if there are fewer than write_chunk dirty pages for this queue
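
The problem with the write_chunk exit condition can be shown with a toy loop. This is a hypothetical sketch, not the patch: `toy_balance()` only counts submitted pages, and the `stop_when_clean` exit is an assumption about the kind of fix intended, since the excerpt is cut off before the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the number of pages submitted when the loop exits, or -1 if it
 * never exits within max_rounds. */
static long toy_balance(long queue_dirty, long write_chunk,
                        bool stop_when_clean, int max_rounds)
{
    assert(queue_dirty >= 0 && write_chunk > 0);
    long written = 0;
    for (int round = 0; round < max_rounds; round++) {
        if (written >= write_chunk)
            return written;             /* existing exit: chunk submitted */
        if (stop_when_clean && queue_dirty == 0)
            return written;             /* assumed exit: queue has no dirty pages */
        long n = write_chunk - written; /* try to submit the rest of the chunk */
        if (n > queue_dirty)
            n = queue_dirty;            /* but only what is actually dirty */
        queue_dirty -= n;
        written += n;
    }
    return -1;                          /* exit condition was never met */
}
```

With 5 dirty pages in the queue and a write_chunk of 8, the loop without the extra exit never returns on its own, while the variant that also stops on a clean queue returns after submitting 5 pages.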

[patch 2/3] remove throttle_vm_writeout()

2007-03-24 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Remove this function. Its purpose was to limit the global number of writeback pages submitted by direct reclaim. But this is equally well accomplished by limited queue lengths. When this function was added, the device queues had much larger default

[patch 3/3] balance dirty pages from loop device

2007-03-24 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] The function do_lo_send_aops() should call balance_dirty_pages_ratelimited() after each page similarly to generic_file_buffered_write(). Without this, writing the loop device directly (not through a filesystem) is very slow, and also slows the whole system
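
Calling a ratelimited balancing hook once per page keeps the per-page cost low while still throttling large writes. A minimal sketch of that pattern, assuming a hypothetical per-caller counter: the real balance_dirty_pages_ratelimited() keeps per-CPU counts and calls balance_dirty_pages() when they exceed a ratelimit, which this toy only approximates. `toy_lo_send()` is a hypothetical stand-in for the loop in do_lo_send_aops().

```c
#include <assert.h>

/* Toy ratelimit state; not the kernel's per-CPU implementation. */
struct toy_ratelimit {
    int count;      /* pages dirtied since the last full balance */
    int limit;      /* pages allowed between full balances */
    int balances;   /* number of full balances performed */
};

static void toy_balance_ratelimited(struct toy_ratelimit *rl)
{
    assert(rl->limit > 0);
    if (++rl->count >= rl->limit) {
        rl->count = 0;
        rl->balances++;     /* stands in for a full balance_dirty_pages() */
    }
}

/* Hypothetical send loop: one ratelimited call after each page. */
static void toy_lo_send(struct toy_ratelimit *rl, int pages)
{
    for (int i = 0; i < pages; i++) {
        /* ... copy one page into the backing file (omitted) ... */
        toy_balance_ratelimited(rl);
    }
}
```

Writing 100 pages with a limit of 32 triggers three full balances and leaves 4 pages counted toward the next one.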

[patch] add file position info to proc

2007-03-24 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] This patch adds support for finding out the current file position, open flags and possibly other info in the future. These new entries are added: /proc/PID/fdinfo/FD /proc/PID/task/TID/fdinfo/FD For each fd the information is provided in the following
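
A reader of these files needs to pull individual fields out of the text. The excerpt is cut off before the format is given, so this sketch assumes the layout current Linux kernels use: a `pos:` line with the position in decimal and a `flags:` line with the open flags in octal. `parse_fdinfo()` is a hypothetical helper that works on a buffer, so it can be tested without a live /proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses the "pos:" and "flags:" fields of an fdinfo text buffer.
 * Assumed format: "pos:\t<decimal>\nflags:\t0<octal>\n".
 * Returns 0 if both fields were found, -1 otherwise. */
static int parse_fdinfo(const char *buf, long long *pos, unsigned int *flags)
{
    assert(buf != NULL && pos != NULL && flags != NULL);
    int have_pos = 0, have_flags = 0;
    const char *line = buf;
    while (line != NULL && *line != '\0') {
        if (sscanf(line, "pos: %lld", pos) == 1)
            have_pos = 1;
        else if (sscanf(line, "flags: %o", flags) == 1)
            have_flags = 1;
        line = strchr(line, '\n');  /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return (have_pos && have_flags) ? 0 : -1;
}
```

On a kernel with this interface, the buffer would be the contents of /proc/PID/fdinfo/FD (or /proc/self/fdinfo/FD for the calling process).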

[patch 1/3] split mmap

2007-03-24 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] This is a straightforward split of do_mmap_pgoff() into two functions: - do_mmap_pgoff() checks the parameters, and calculates the vma flags. Then it calls - mmap_region(), which does the actual mapping Signed-off-by: Miklos Szeredi [EMAIL PROTECTED

[patch 2/3] only allow nonlinear vmas for ram backed filesystems

2007-03-24 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Dirty page accounting/limiting doesn't work for nonlinear mappings, so for non-ram backed filesystems emulate with linear mappings. This retains ABI compatibility with previous kernels at minimal code cost. All known users of nonlinear mappings actually
