Re: big kernel locks for inode ops

2000-12-06 Thread Matthew Wilcox
On Wed, Dec 06, 2000 at 03:03:00PM -0800, LA Walsh wrote: I notice many places in open.c where 'lock_kernel()' has been dropped in 2.4. There are still a few places. Should there be a separate file-sub-system lock, perhaps later evolving into a /per filesystem lock? Under what

Re: dnotify enhancement

2000-12-13 Thread Matthew Wilcox
On Tue, Dec 12, 2000 at 03:47:54PM -0500, James Antill wrote: Say you have... /a/b/c/d ...and you are monitoring the 'd' directory. Then someone does... % mv /a/b /a/b.old % mkdir -p /a/b/c/d that shouldn't trigger an event. i'm strongly against doing recursion of this in the

Re: More better in mount(2)

2001-01-05 Thread Matthew Wilcox
On Fri, Jan 05, 2001 at 01:00:58PM -0800, LA Walsh wrote: There is no file-system specific code. even better, you can do it in the vfs. file systems, but that seems as ugly or uglier than adding a 6th parameter. Note that the 6th parameter would only be looked at if the MS_EXTRA bit

NULL f_ops

2001-03-08 Thread Matthew Wilcox
Someone tell me if my chain of reasoning is wrong here... (1) The only way to get a `struct file' is to call get_empty_filp() (2) All callers of get_empty_filp() set ->f_ops to a non-NULL value (3) All checks for f_ops being NULL can be removed

Re: NULL f_ops

2001-03-08 Thread Matthew Wilcox
On Thu, Mar 08, 2001 at 11:16:10AM -0500, Alexander Viro wrote: I'm not sure on your #2. In principle, ->i_fop can be NULL. It may be a good thing to declare that it should never happen, but right now it's not guaranteed. Besides, revoke-like thing in proc/generic.c _does_ set f_op to NULL.

remove unused entries from ext2 inode

2001-03-12 Thread Matthew Wilcox
Andi Kleen mentioned that the ext2 part of the inode union was rather large. So I wondered if I could chop it down a bit. * i_osync is only referenced, never set * i_faddr, i_frag_no, i_frag_size -- we don't support fragments. * not_used_1 can clearly be removed. * i_high_size is obsoleted

Re: (struct dentry *)->vfsmnt;

2001-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2001 at 10:26:50AM -0700, Andreas Dilger wrote: Let me put it that way: I don't understand why (if it is useful at all) it is done in the fs. Looks like a wrong level... For the same reason that the UUID and LABEL are stored in the superblock: you want this information kept

Re: [RFC] sane access to per-fs metadata (was Re: [PATCH] Documentation/ioctl-number.txt)

2001-03-23 Thread Matthew Wilcox
On Fri, Mar 23, 2001 at 09:56:47AM -0700, Bryan Henderson wrote: There's a lot of cool simplicity in this, both in implementation and application, but it leaves something to be desired in functionality. This is partly because the price you pay for being able to use existing, well-worn

Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Matthew Wilcox
On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote: I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. 4k page size * 2GB =
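Worked numbers for the limits under discussion, as a small C sketch (the helper max_bytes is illustrative, not kernel code): a 32-bit count of 512-byte sectors tops out at 2 TiB, a signed 32-bit page index (2^31 pages) with 4 KiB pages at 8 TiB, and an unsigned 32-bit page index at 16 TiB.

```c
#include <assert.h>
#include <stdint.h>

/* Largest device addressable with `units` units of `unit_size` bytes.
 * Computed in 64 bits so the product itself cannot overflow. */
static uint64_t max_bytes(uint64_t units, uint64_t unit_size)
{
    return units * unit_size;
}

/* Same value expressed in TiB (2^40 bytes). */
static uint64_t max_tib(uint64_t units, uint64_t unit_size)
{
    return max_bytes(units, unit_size) >> 40;
}
```

For example, max_tib(1ULL << 31, 4096) evaluates to 8, the "4k page size * 2GB" case.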

Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Matthew Wilcox
On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote: What do you mean by problems 5 years down the road? The real issue is that this 32-bit block count limit affects composite devices like MD RAID and LVM today, not just individual disks. There have been several postings I have

Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Matthew Wilcox
On Mon, Mar 26, 2001 at 01:13:32PM -0800, Ion Badulescu wrote: Yes, there are millions of 32-bit systems in use today. They do their job just fine with the 32-bit device support we have right now. Do you really want to penalize them *all* for the sake of the few idiotic sysadmins who want

File Locking in 2.5

2001-04-30 Thread Matthew Wilcox
... Links: [1] POSIX file locking, [2] Olaf Kirch's page on NLM (warning: out of date). References: 1. http://www.opengroup.org/onlinepubs/007908799/xsh/fcntl.html 2. http

Re: [gfs-admin@sistina.com: Your message to gfs awaits moderator approval]

2001-04-30 Thread Matthew Wilcox
On Mon, Apr 30, 2001 at 02:15:50PM -0700, David S. Miller wrote: Now since kpreslan is the only one from sistina.com on linux-fsdevel I would say that would be the problem address. However, I've made some postings to linux-kernel today (which this address is also on) yet I received no such

Re: File Locking in 2.5

2001-05-02 Thread Matthew Wilcox
On Wed, May 02, 2001 at 04:49:30PM +0200, Andi Kleen wrote: On Mon, Apr 30, 2001 at 12:39:23PM -0600, Matthew Wilcox wrote: * All filesystems will fill in their ->lock method. Why when a common stub should work for 90% of them? Please keep global search-and-edit operation low when

Re: File Locking in Linux 2.5

2001-05-03 Thread Matthew Wilcox
On Thu, May 03, 2001 at 12:19:31AM -0700, Jeremy Allison wrote: I want to add another byte-range lock, which looks and smells like a POSIX fcntl lock except that it is not removed by closing any fd which happens to be open on this file. If you do this I'll be eternally in your debt ! I
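The behaviour Jeremy wants to avoid is easy to demonstrate on Linux with classic fcntl() locks. The sketch below (file path and helper names are illustrative) takes a write lock through one descriptor, then closes a second, unrelated descriptor for the same file; a forked child asks with F_GETLK whether the lock is still visible. With POSIX semantics the lock disappears at that close.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* From a forked child, ask whether another process holds a lock that
 * would conflict with a whole-file write lock.  Returns 1 if so, 0 if
 * not, 2 if the child hit an error, -1 if fork/wait failed. */
static int locked_by_other(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open(path, O_RDWR);
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type != F_UNLCK);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Lock through fd1, close a second descriptor for the same file, and
 * record whether the lock is visible to other processes before and after. */
static int posix_close_demo(const char *path, int *before, int *after)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fd1 < 0 || fd2 < 0 || fcntl(fd1, F_SETLK, &fl) < 0)
        return -1;
    *before = locked_by_other(path);
    close(fd2);   /* POSIX: drops every lock this process holds on the file */
    *after = locked_by_other(path);
    close(fd1);
    return 0;
}
```

Expected result: the lock is seen before the close (1) and is gone after it (0), even though fd1 is still open.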

Re: File Locking in Linux 2.5

2001-05-03 Thread Matthew Wilcox
On Thu, May 03, 2001 at 01:28:33PM +0200, Andi Kleen wrote: Just don't forget to add a per user ulimit for it and probably an admin tool like ipcs. It'll use the same limit as the other locks do.

Re: File Locking in Linux 2.5

2001-05-03 Thread Matthew Wilcox
On Thu, May 03, 2001 at 02:10:48PM +0200, Trond Myklebust wrote: == Matthew Wilcox [EMAIL PROTECTED] writes: I'll get to it this weekend then. Should be a relatively simple patch. Are there any other semantics you want changing from the POSIX lock? I'm thinking

Re: File Locking in Linux 2.5

2001-05-04 Thread Matthew Wilcox
On Fri, May 04, 2001 at 01:33:44PM +0200, Trond Myklebust wrote: ??? It'll be a completely useless form of lock if programs can end up blocking forever on a set of (from their perspective) valid operations. Umm.. that's your opinion. I don't see how `making them mandatory' would make this

F_SETLK_NP

2001-05-06 Thread Matthew Wilcox
, here's the test program (I checked what was going on via /proc/locks, you can too). /* lock-test.c Copyright (c) Matthew Wilcox, Hewlett-Packard. GPL License */ #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #define F_SETLK_NP 15 int main(int argc, char **argv
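For comparison with the proposed F_SETLK_NP: Linux 3.15 added open file description (OFD) locks (F_OFD_SETLK and friends), which are owned by the open file description rather than by the process, so closing some other descriptor for the file leaves them alone. A minimal sketch, assuming glibc 2.20 or later for the constants; the path and helper names are illustrative.

```c
#define _GNU_SOURCE            /* F_OFD_SETLK: Linux >= 3.15, glibc >= 2.20 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Non-blocking whole-file OFD write lock; 0 on success, else errno. */
static int try_ofd_wrlock(int fd)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0,
                        .l_pid = 0 };             /* must be 0 for OFD locks */
    return fcntl(fd, F_OFD_SETLK, &fl) == 0 ? 0 : errno;
}

/* Lock via fd1, close an unrelated fd2 on the same file, then try a
 * conflicting lock through a third open file description.  A result of
 * EAGAIN means the first lock survived the close of fd2. */
static int ofd_after_close(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    int fd3 = open(path, O_RDWR);
    if (fd1 < 0 || fd2 < 0 || fd3 < 0 || try_ofd_wrlock(fd1) != 0)
        return -1;
    close(fd2);                       /* would drop a classic POSIX lock */
    int rc = try_ofd_wrlock(fd3);     /* separate description: conflicts */
    close(fd3);
    close(fd1);                       /* last reference: OFD lock released */
    return rc;
}
```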

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-19 Thread Matthew Wilcox
On Sat, May 19, 2001 at 12:51:23PM -0600, Richard Gooch wrote: Al, if you really want to kill ioctl(2), then perhaps you should implement a transaction(2) syscall. Something like: int transaction (int fd, void *rbuf, size_t rlen, void *wbuf, size_t wlen); Of course,

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-19 Thread Matthew Wilcox
On Sat, May 19, 2001 at 10:22:55PM -0400, Richard Gooch wrote: The transaction(2) syscall can be just as easily abused as ioctl(2) in this respect. But read() and write() cannot.

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-20 Thread Matthew Wilcox
On Sun, May 20, 2001 at 03:11:53PM -0400, Alexander Viro wrote: Pheeew... Could you spell about megabyte of stuff in ioctl.c? No. $ ls -l arch/*/kernel/ioctl32*.c -rw-r--r-- 1 willy willy 22479 Jan 24 16:59 arch/mips64/kernel/ioctl32.c -rw-r--r-- 1 willy willy 109475 May

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-21 Thread Matthew Wilcox
On Mon, May 21, 2001 at 06:13:18PM -0700, Linus Torvalds wrote: Nope. You can (and people do, quite often) share filps. So you can't associate state with it. For _devices_, though? I don't expect my mouse to work if gpm and xfree both try to consume device events from the same filp. Heck, it

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-21 Thread Matthew Wilcox
On Tue, May 22, 2001 at 02:22:34AM +0200, Ingo Oeser wrote: ioctl has actually 4 semantics: command only command + read command + write command + rw-transaction Separating these would be a first step. And yes, I consider each of them useful. command only: reset drive echo 'reset'
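The four semantics listed above are already encoded in Linux ioctl command numbers: _IO, _IOR, _IOW and _IOWR set a direction field (plus the argument size) that _IOC_DIR() and _IOC_SIZE() extract. The device, its magic 'X' and the command names below are hypothetical; only the macros come from <linux/ioctl.h>.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Hypothetical commands for an imaginary device, one per semantic.
 * Direction is from userspace's point of view: _IOR copies data out
 * to the caller, _IOW copies data in from the caller. */
#define XDEV_RESET    _IO('X', 0)            /* command only          */
#define XDEV_GETSTAT  _IOR('X', 1, int)      /* command + read        */
#define XDEV_SETMODE  _IOW('X', 2, int)      /* command + write       */
#define XDEV_XFER     _IOWR('X', 3, long)    /* command + rw transact */

static unsigned int cmd_dir(unsigned int cmd)  { return _IOC_DIR(cmd); }
static unsigned int cmd_size(unsigned int cmd) { return _IOC_SIZE(cmd); }
```

A driver or a generic compat layer can therefore tell, from the number alone, which way data flows and how many bytes to copy.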

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Matthew Wilcox
On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote: clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add unopened descriptors. I.e. no IO until you open it (turning the thing into opened one), but we can do lookups (move to child), we can clone and kill them and

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-22 Thread Matthew Wilcox
On Tue, May 22, 2001 at 04:31:37PM +0100, Alan Cox wrote: `the class of devices in question' was cryptographic devices, and possibly other transactional DSPs. I don't consider audio to be transactional. in any case, you can do transactional things with two threads, as long as they each

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-19 Thread Matthew Wilcox
On Sat, May 19, 2001 at 05:25:22PM +0100, Alan Cox wrote: Only to an English speaker. I suspect Quebec City canadians would prefer a different command set. Should we support `pas387' as well as `no387' as a kernel boot parameter then? Face it, a sysadmin has to know the limited subset of

Re: about BKL in VFS

2001-06-08 Thread Matthew Wilcox
On Fri, Jun 08, 2001 at 02:04:16PM -0400, Alexander Viro wrote: the only areas in VFS that still rely on BKL are locks.c, dquot.c and super.c. The latter is fixed in the patches I'm feeding to Linus (OK, the pieces I'm feeding to Linus make shifting the BKL down to method calls trivial).

Re: [Final call for testers][PATCH] superblock handling changes (2.4.6-pre3)

2001-06-15 Thread Matthew Wilcox
On Fri, Jun 15, 2001 at 01:10:00AM -0400, Alexander Viro wrote: +static inline void write_super(struct super_block *sb) +{ + lock_super(sb); + if (sb->s_root && sb->s_dirt) When I first looked at this, I thought it was a typo. I don't think we should

Re: [PATCH] Minor ext3 speedup

2005-01-15 Thread Matthew Wilcox
On Thu, Jan 13, 2005 at 01:15:06PM +0100, Jan Kara wrote: Attached patch removes unnecessary division and modulo from ext3 code paths. It reduces (according to oprofile) the CPU usage measurably under a dbench load (see description of the patch for the numbers). I thought I'd apply Jan

[PATCH] Minor ext2 speedup

2005-01-24 Thread Matthew Wilcox
Port Andreas Dilger's and Jan Kara's patch for ext3 to ext2. Also some whitespace changes to get ext2/ext3 closer in sync. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] Index: fs/ext2/balloc.c === RCS file: /var/cvs/linux-2.6/fs
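The usual trick behind removing a division and modulo is that the divisor is a power of two, so quotient and remainder become a shift and a mask. This sketch assumes such a count, for example group descriptors per block (block size / 32-byte descriptor, i.e. 128 = 2^7 for 4 KiB blocks); the function names are illustrative and this is not the patch itself.

```c
#include <assert.h>

/* Locate group descriptor `group` with per_block descriptors per block,
 * using division and modulo. */
static void locate_desc_divmod(unsigned long group, unsigned long per_block,
                               unsigned long *block, unsigned long *index)
{
    *block = group / per_block;
    *index = group % per_block;
}

/* Same result when per_block == 2^desc_bits: shift and mask instead. */
static void locate_desc_shift(unsigned long group, unsigned int desc_bits,
                              unsigned long *block, unsigned long *index)
{
    *block = group >> desc_bits;
    *index = group & ((1UL << desc_bits) - 1);
}
```

For group 300 with 128 descriptors per block, both give block 2, index 44.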

Re: Help!

2005-02-04 Thread Matthew Wilcox
On Fri, Feb 04, 2005 at 03:58:04PM +0530, Anirban Mukherjee wrote: I am doing a project on Ext2fs and Ext3fs and I require some materials on the physical structure of Ext2fs. It would be very helpful if someone sends me some information (or links where to find it) along with some general stuff

Re: RFC: [PATCH-2.6] Add helper function to lock multiple page cache pages.

2005-02-02 Thread Matthew Wilcox
On Wed, Feb 02, 2005 at 03:12:50PM +, Anton Altaparmakov wrote: I think the below loop would be clearer as a for loop ... err = 0; for (nr = 0; nr < nr_pages; nr++, start++) { if (start == lp_idx) { pages[nr] = locked_page;

NFSD needs EXPORTFS

2005-02-03 Thread Matthew Wilcox
Got this report about 2.6.11-rc3. Is this the correct solution? - Forwarded message from Joel Soete [EMAIL PROTECTED] - A short analysis: it seems that's because NFSD was built in while EXPORTFS was a module in my previous config file. IMHO EXPORTFS should be built the same way as NFSD? Is the

Efficient handling of sparse files

2005-02-28 Thread Matthew Wilcox
This problem came up with the systemimager program which uses rsync to install files from a master server to many clients. Red Hat has a system user with uid 2^32-1 which causes lastlog to grow to 1.2GB in size. rsync does understand the concept of sparse files (with the -S flag), but it has to
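Since Linux 3.1, lseek() accepts SEEK_DATA and SEEK_HOLE, which let a copier find the data extents of a sparse file instead of reading every zero. A minimal sketch (the helper name is illustrative). Filesystems without native support report the whole file as data, so the count then equals the file size.

```c
#define _GNU_SOURCE            /* SEEK_DATA / SEEK_HOLE: Linux >= 3.1 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the bytes that live in data extents, skipping holes, the way a
 * hole-aware copier can avoid reading long runs of zeroes. */
static off_t data_bytes(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0, total = 0;
    if (end < 0)
        return -1;
    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? total : -1;   /* ENXIO: only a hole remains */
        off_t hole = lseek(fd, data, SEEK_HOLE); /* EOF counts as a hole */
        if (hole < 0)
            return -1;
        total += hole - data;
        pos = hole;
    }
    return total;
}
```

On a file with a single byte written at 64 MiB, a hole-aware filesystem reports one block or less of data rather than 64 MiB.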

Re: mmap question

2005-03-21 Thread Matthew Wilcox
On Mon, Mar 21, 2005 at 01:33:55PM -0800, Bryan Henderson wrote: It looks to me like you're running into the fundamental limitation that the CPU doesn't notify Linux every time you store into a memory location. It does, though, set the dirty flag in the page table, and Linux eventually
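A sketch of the user-visible side: stores into a MAP_SHARED mapping are plain memory writes that only mark the page dirty, and msync(MS_SYNC) is the call that asks the kernel to write the dirty page back and wait for completion. read() and pread() see the new bytes through the page cache even before writeback, so the check in the usage below confirms coherence, not durability. The helper name and fixed 4096-byte length are assumptions for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Store into a shared file mapping, then request synchronous writeback. */
static int store_and_sync(int fd, const char *msg, size_t len)
{
    if (len > 4096 || ftruncate(fd, 4096) < 0)
        return -1;
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);                  /* plain stores: no syscall, no notification */
    int rc = msync(p, 4096, MS_SYNC);     /* write the dirty page back and wait */
    munmap(p, 4096);
    return rc;
}
```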

Re: [PATCH] fs/fcntl.c : don't test unsigned value for less than zero

2005-04-14 Thread Matthew Wilcox
On Fri, Apr 15, 2005 at 03:07:42AM +0200, Jesper Juhl wrote: 'arg' is unsigned so it can never be less than zero, so testing for that is pointless and also generates a warning when building with gcc -W. This patch eliminates the pointless check. Didn't Linus already reject this one 6 months

Re: [PATCH] fs/fcntl.c : don't test unsigned value for less than zero

2005-04-15 Thread Matthew Wilcox
On Fri, Apr 15, 2005 at 10:03:05PM +1000, Herbert Xu wrote: I suppose it could be smart and stay quiet about val < 0 || val > BOUND However, gcc is slow enough as it is without adding unnecessary smarts like this. It only warns with -W on, not with -Wall, so I see no compelling reason to
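Why the warning fires, in a self-contained sketch (BOUND and both helpers are hypothetical stand-ins for the fcntl.c code): for an unsigned argument `arg < 0` is always false, so the two functions accept exactly the same inputs. gcc reports the redundant comparison under -W (spelled -Wextra in later releases), not under -Wall.

```c
#include <assert.h>

#define BOUND 64   /* hypothetical upper limit standing in for the real one */

/* The redundant form: `arg < 0` can never be true for an unsigned type. */
static int valid_with_check(unsigned long arg)
{
    return !(arg < 0 || arg > BOUND);
}

/* The form the patch would leave behind. */
static int valid_without_check(unsigned long arg)
{
    return !(arg > BOUND);
}
```

The trade-off in the thread is readability (the explicit range documents intent) versus a warning that only appears with -W.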

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Matthew Wilcox
On Fri, Aug 19, 2005 at 08:38:34PM +0100, Al Viro wrote: FWIW, I'd rather take page_symlink(), page_symlink_inode_operations, page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(), generic_readlink() and vfs_readlink() to the same place where these guys would live. They

[PATCH] Make journal_commit_transaction() more understandable

2005-08-29 Thread Matthew Wilcox
journal_commit_transaction() is still 650+ lines long and contains 16 local variables. By moving phase 3 into its own function, we reduce its length by 150+ lines and reduce it to 5 local variables. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] commit.c | 367

Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)

2006-11-17 Thread Matthew Wilcox
On Fri, Nov 17, 2006 at 08:43:00AM -0500, Jeff Layton wrote: 2) this scheme would effectively leak inode addresses into userspace. I'm not sure if that would be exploitable, but it's probably best not to do it. The patch adds a static unsigned int that is initialized to a random value at boot

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-11-28 Thread Matthew Wilcox
On Mon, Nov 27, 2006 at 09:34:05PM -0700, Gary Grider wrote: Things like openg() - one process opens a file and gets a key that is passed to lots of processes which use the key to get a handle (great for thousands of processes opening a file) I don't understand how this leads to a more

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-11-29 Thread Matthew Wilcox
On Wed, Nov 29, 2006 at 09:04:50AM +, Christoph Hellwig wrote: - openg/sutoc No way. We already have a very nice file descriptor abstraction. You can pass file descriptors over unix sockets just fine. Yes, but it behaves like dup(). Gary replied to me off-list (which I
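Christoph's suggestion, sketched: SCM_RIGHTS passes a descriptor across a unix-domain socket, and the receiver gets a new descriptor for the same open file description. That is what "behaves like dup()" means in practice, including a shared file offset. Helper names and the temporary path are illustrative.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a unix-domain socket with SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Like dup(): moving the offset through `a` moves it for `b` too. */
static int shares_offset(int a, int b)
{
    off_t orig = lseek(a, 0, SEEK_CUR);
    lseek(a, orig + 3, SEEK_SET);
    int same = lseek(b, 0, SEEK_CUR) == orig + 3;
    lseek(a, orig, SEEK_SET);
    return same;
}
```

The shared offset is exactly the property that makes passed descriptors awkward for many independent readers, which is the objection raised above.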

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-11-29 Thread Matthew Wilcox
On Wed, Nov 29, 2006 at 05:23:13AM -0700, Matthew Wilcox wrote: On Wed, Nov 29, 2006 at 09:04:50AM +, Christoph Hellwig wrote: - openg/sutoc No way. We already have a very nice file descriptor abstraction. You can pass file descriptors over unix sockets just fine. Yes

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Matthew Wilcox
On Tue, Dec 05, 2006 at 10:07:48AM +, Christoph Hellwig wrote: The filehandle idiocy on the other hand is way off into crackpipe land. Right, and it needs to be discarded. Of course, there was a real problem that it addressed, so we need to come up with an acceptable alternative. The

Re: openg and path_to_handle

2006-12-06 Thread Matthew Wilcox
On Thu, Dec 07, 2006 at 07:40:05AM +1100, David Chinner wrote: Permission checks are done on the path_to_handle(), so in reality only root or CAP_SYS_ADMIN users can currently use the open_by_handle interface because of this lack of checking. Given that our current users of this interface need
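For reference, mainline later gained generic equivalents of the XFS path_to_handle()/open_by_handle() pair: name_to_handle_at(2) and open_by_handle_at(2), added in 2.6.39. Encoding a handle needs no privilege; opening one requires CAP_DAC_READ_SEARCH, in line with the permission concern above. A minimal sketch of the encoding side (path_handle is an illustrative helper; some filesystems do not support handles and return EOPNOTSUPP).

```c
#define _GNU_SOURCE            /* name_to_handle_at(2): Linux >= 2.6.39 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Encode a path as a file handle.  Returns the handle (caller frees)
 * or NULL with errno set. */
static struct file_handle *path_handle(const char *path, int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh)
        return NULL;
    fh->handle_bytes = MAX_HANDLE_SZ;     /* in: buffer size, out: handle size */
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) < 0) {
        int saved = errno;
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```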

Re: openg and path_to_handle

2006-12-06 Thread Matthew Wilcox
On Wed, Dec 06, 2006 at 03:09:10PM -0700, Andreas Dilger wrote: Considering that filesystems like GFS and OCFS allow clients DIRECT ACCESS to the block device itself (which no amount of authentication will fix, unless it is in the disks themselves), the risk of passing a file handle around is

Re: openg and path_to_handle

2006-12-14 Thread Matthew Wilcox
On Thu, Dec 14, 2006 at 03:00:41PM -0600, Rob Ross wrote: I don't think that I understand what you're saying here. The openg() call does not perform file open (not that that is necessarily even a first-class FS operation), it simply does the lookup. When we were naming these calls, from a

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-17 Thread Matthew Wilcox
On Sun, Dec 17, 2006 at 11:07:27AM -0800, Ulrich Drepper wrote: And how often do the scripts which are in everyday use require such a command? And the same for the other programs. I know that the rsync load is a major factor on kernel.org right now. With all the git trees (particularly the

Re: inode i_blksize problem

2006-12-20 Thread Matthew Wilcox
On Wed, Dec 20, 2006 at 01:26:56PM +0100, Sergio Paracuellos wrote: I am trying to compile a module for kernel 2.6.18-1 that uses the 'inode struct' but the compiler tell me inode struct hasn't a member called i_blksize. I don't have that problem in kernel 2.6.16. What happend with i_blksize?

Re: Finding hardlinks

2007-01-03 Thread Matthew Wilcox
On Wed, Jan 03, 2007 at 01:33:31PM +0100, Miklos Szeredi wrote: High probability is all you have. Cosmic radiation hitting your computer will more likely cause problems, than colliding 64bit inode numbers ;) Some of us have machines designed to cope with cosmic rays, and would be unimpressed
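What "finding hardlinks" relies on, in code: two names refer to the same file exactly when their (st_dev, st_ino) pairs match, so any scheme in which distinct inodes can share a number breaks this test. A minimal sketch (same_file is an illustrative helper).

```c
#include <assert.h>
#include <sys/stat.h>

/* 1 if both names resolve to the same inode, 0 if not, -1 on error. */
static int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Directories illustrate it without creating files: "/", "/." and "/.." are all names for the root inode.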

Re: [PATCH] ext2: conditional removal of NFSD code

2007-01-06 Thread Matthew Wilcox
On Sat, Jan 06, 2007 at 10:58:31PM +0300, Alexey Dobriyan wrote: Neither I nor my box is going to act as NFS server, so ifdef all exporting code. @@ -916,7 +918,9 @@ static int ext2_fill_super(struct super_ * set up enough so that it can read an inode */ sb->s_op =

Re: Symbolic links vs hard links

2007-01-10 Thread Matthew Wilcox
On Wed, Jan 10, 2007 at 09:38:11AM -0800, Bryan Henderson wrote: Other people are of the opinion that the invention of the symbolic link was a huge mistake. I guess I haven't heard that one. What is the argument that we were better off without symbolic links? I suppose

Re: Rename file over another with the same inode number fails silently.

2007-02-08 Thread Matthew Wilcox
On Thu, Feb 08, 2007 at 11:48:16AM -0500, John Muir wrote: The attached test program creates a file, and then some hard links to that file (file0 - fileN). The test program then attempts to rename(fileN, file) for every hard link created. My expectation is that the hard links file0 - fileN
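POSIX specifies this case: if the old and new arguments resolve to the same existing file, rename() returns success and performs no other action, so the source link stays in place. A minimal sketch that reproduces the report (paths and helper name are illustrative).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `file`, hard-link it as `alias`, then rename(alias, file).
 * Returns 1 if alias survived, 0 if it was removed, -1 on error. */
static int alias_survives_rename(const char *file, const char *alias)
{
    unlink(alias);
    int fd = open(file, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (link(file, alias) < 0 || rename(alias, file) < 0)
        return -1;
    struct stat st;
    int survived = stat(alias, &st) == 0;   /* same inode: rename is a no-op */
    unlink(alias);
    unlink(file);
    return survived;
}
```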

Re: [RFC][PATCH] sys_fallocate() system call

2007-03-17 Thread Matthew Wilcox
On Fri, Mar 16, 2007 at 05:17:04PM +0100, Heiko Carstens wrote: +asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len) e.g. asmlinkage long sys_fallocate(int fd, loff_t offset, loff_t len, int mode) would work even on s390 ;) How about: asmlinkage long
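For reference, the argument order that was eventually merged (2.6.23) is fallocate(fd, mode, offset, len), and glibc exposes a wrapper under _GNU_SOURCE. A minimal sketch using mode 0, which allocates and extends the file size (helper name and path are illustrative; filesystems without support return EOPNOTSUPP).

```c
#define _GNU_SOURCE            /* fallocate() wrapper */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Preallocate [0, len).  Returns the resulting size, or -errno. */
static long long prealloc(int fd, off_t len)
{
    if (fallocate(fd, 0 /* mode */, 0 /* offset */, len) < 0)
        return -errno;
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -errno;
    return (long long)st.st_size;
}
```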

Re: forced umount?

2007-03-18 Thread Matthew Wilcox
On Sun, Mar 18, 2007 at 08:16:19PM +0100, Arjan van de Ven wrote: the problem with the people who say they want forced umount is.. that most of the time they either want 1) get rid of the namespace entry or 2) want to stop any and all IO to a certain device/partition There is a third

Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.

2007-04-09 Thread Matthew Wilcox
On Sun, Apr 08, 2007 at 09:16:59PM -0700, [EMAIL PROTECTED] wrote: REISER4 FOR INCLUSION IN THE LINUX KERNEL. Fuck off. Cheerleading like this only hurts your cause.

Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create

2007-04-16 Thread Matthew Wilcox
On Mon, Apr 16, 2007 at 06:11:30PM +0200, Andreas Gruenbacher wrote: +static inline int +nfsd_do_create(struct inode *dir, struct dentry *child, struct vfsmount *mnt, +int mode) +{ + struct nameidata nd = { + .dentry = child, + .mnt = mnt, + };

Re: [NFS] [PATCH] locks: provide a file lease method enabling cluster-coherent leases

2007-06-01 Thread Matthew Wilcox
On Fri, Jun 01, 2007 at 12:44:16PM -0400, J. Bruce Fields wrote: The only problem I'm aware of is that leases aren't broken on rename, link, and unlink. This is kind of tricky to fix. David Richter (cc'd) and I sketched out a few different approaches, and I think he has some patches

Re: Read/write counts

2007-06-04 Thread Matthew Wilcox
On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote: Programs that assume a full transfer are fairly common, but are universally regarded as either broken or just lazy, and when it does cause a problem, it is far more common to fix the application than the kernel. Linus has
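The application-side fix referred to here is a loop that keeps writing until the whole buffer is consumed. A minimal sketch (write_all is an illustrative name, not a standard API).

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* write() may transfer fewer bytes than asked (pipes, sockets, signals,
 * nearly full disks); keep going until everything is written. */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted before writing anything */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```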

Re: [ANNOUNCE] util-linux-ng 2.13-rc1

2007-07-05 Thread Matthew Wilcox
On Thu, Jul 05, 2007 at 11:30:20PM +0200, Karel Zak wrote: The package build system is now based on autotools. The build system supports separate CFLAGS and LDFLAGS for suid programs (SUID_CFLAGS, SUID_LDFLAGS). For more details see the README file And this is really dumb.

Re: [Advocacy] Re: 3ware 9650 tips

2007-07-16 Thread Matthew Wilcox
On Mon, Jul 16, 2007 at 08:40:00PM +0300, Al Boldi wrote: XFS surely rocks, but it's missing one critical component: data=ordered And that's one component that's just too critical to overlook for an enterprise environment that is built on data-integrity over performance. So that's the

Re: [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips)

2007-07-16 Thread Matthew Wilcox
On Mon, Jul 16, 2007 at 09:28:08PM +0300, Al Boldi wrote: Well, conceptually it sounds like a piece of cake, technically your guess is as good as mine. IIRC, akpm once mentioned something like this. How much have you looked at the VFS? There's nothing journalling-related in the VFS right

Re: [RFC PATCH 0/2] avoid clobbering registers with J_ASSERT macro

2007-08-20 Thread Matthew Wilcox
On Mon, Aug 20, 2007 at 04:22:04PM +0100, Stephen C. Tweedie wrote: Hi, On Mon, 2007-08-20 at 09:18 -0400, Chris Snook wrote: How's about we just remove that printk? Do #define J_ASSERT(e) BUG_ON(e)? ITYM #define J_ASSERT(e) BUG_ON(!e) It did. The original J_ASSERT predates
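Whatever the history, a negating macro needs parentheses around its argument, otherwise `!` binds only to the first operand of a compound expression; later jbd2 headers spell it BUG_ON(!(assert)). A userspace illustration (the macro names are illustrative).

```c
#include <assert.h>

/* BAD_NOT(a && b) expands to (!a && b); GOOD_NOT negates the whole test. */
#define BAD_NOT(e)  (!e)
#define GOOD_NOT(e) (!(e))
```

With a = 1 and b = 0, the assertion `a && b` is false, so a correct BUG_ON condition must be true; BAD_NOT yields 0 and would silently skip the BUG.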

Re: [patch 1/2] VFS: new fgetattr() file operation

2007-09-24 Thread Matthew Wilcox
On Mon, Sep 24, 2007 at 02:48:08PM +0200, Miklos Szeredi wrote: and if that means adding silly rename support so be it. That's what is done currently. But it has various drawbacks, like rmdir doesn't work if there are open files within an otherwise empty directory. I'd happily accept

Re: [patch 1/2] VFS: new fgetattr() file operation

2007-09-24 Thread Matthew Wilcox
On Mon, Sep 24, 2007 at 03:06:06PM +0200, Miklos Szeredi wrote: A file isn't deleted while there are still links or open files referring to it. So getting the attributes for a file with nlink==0 is perfectly valid while the file is still open. Is it? Why not just pretend that the attributes

Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Matthew Wilcox
On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote: I'm reminded of Rusty's 2003 OLS Keynote, where he points out that what's important is not making an interface easy to use, but _hard_ _to_ _misuse_. That fact that sysfs is all laid out in a directory, but for which some

Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Matthew Wilcox
On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote: Would you accept a patch which causes the deprecated sysfs files/directories to disappear, even if CONFIG_SYS_DEPRECATED is defined, via a boot-time parameter? How about a mount option? That way people can test without a reboot:

SLUB performance regression vs SLAB

2007-10-04 Thread Matthew Wilcox
On Mon, Oct 01, 2007 at 01:50:44PM -0700, Christoph Lameter wrote: The problem is with the weird way of Intel testing and communication. Every 3-6 month or so they will tell you the system is X% up or down on arch Y (and they won't give you details because it's somehow secret). And then there

Re: SLUB performance regression vs SLAB

2007-10-04 Thread Matthew Wilcox
On Thu, Oct 04, 2007 at 10:38:15AM -0700, Christoph Lameter wrote: On Thu, 4 Oct 2007, Matthew Wilcox wrote: So, on a well-known OLTP benchmark which prohibits publishing absolute numbers and on an x86-64 system (I don't think exactly which model is important), we're seeing *6.51

Re: SLUB performance regression vs SLAB

2007-10-04 Thread Matthew Wilcox
On Thu, Oct 04, 2007 at 01:48:34PM -0700, David Miller wrote: There comes a point where it is the reporter's responsibility to help the developer come up with a publishable test case the developer can use to work on fixing the problem and help ensure it stays fixed. That's a lot of effort. Is

Re: SLUB performance regression vs SLAB

2007-10-04 Thread Matthew Wilcox
On Thu, Oct 04, 2007 at 01:55:37PM -0700, David Miller wrote: Anything, I do mean anything, can be simulated using small test programs. Pointing at a big fancy machine with lots of storage and disk is a passive aggressive way to avoid the real issues, in that nobody is putting forth the

Re: [PATCH] isofs: add +w bit for non-RR discs

2007-10-04 Thread Matthew Wilcox
On Tue, Oct 02, 2007 at 08:00:26PM +0200, Jan Engelhardt wrote: Add %S_IWUGO bit for files on ISO-9660 filesystems without RockRidge Looks to me like you've added S_IWUSR, not S_IWUGO. - popt->mode = S_IRUGO | S_IXUGO; /* + popt->mode = S_IRUGO | S_IWUSR | S_IXUGO; -
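The arithmetic behind the objection, using the kernel's grouped permission macros (redefined here because userspace headers do not provide them): the patched mode is 0755, whereas a write bit for user, group and other would give 0777.

```c
#include <assert.h>
#include <sys/stat.h>

/* Kernel-style grouped permission bits, spelled out from the POSIX ones. */
#define S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#define S_IWUGO (S_IWUSR | S_IWGRP | S_IWOTH)
#define S_IXUGO (S_IXUSR | S_IXGRP | S_IXOTH)
```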

Re: SLUB performance regression vs SLAB

2007-10-05 Thread Matthew Wilcox
On Fri, Oct 05, 2007 at 08:48:53AM +0200, Jens Axboe wrote: I'd like to second David's emails here, this is a serious problem. Having a reproducible test case lowers the barrier for getting the problem fixed by orders of magnitude. It's the difference between the problem getting fixed in a day

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Matthew Wilcox
On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote: We currently attempt to return -EDEADLK to blocking fcntl() file locking requests that would create a cycle in the graph of tasks waiting on locks. This is inefficient: in the general case it requires us determining whether

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Matthew Wilcox
On Sun, Oct 28, 2007 at 06:40:52PM +, Alan Cox wrote: NAK. This is an ABI change. It was also comprehensively rejected before because - EDEADLK behaviour is ABI Not in any meaningful way. - EDEADLK behaviour is required by SuSv3 What SuSv3 actually says is: If the system

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Matthew Wilcox
On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote: You can't fix the false EDEADLK detection without solving the halting problem. Best of luck with that. I can see that it would be difficult to do efficiently, but basically, this boils down to finding a circular path in a

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Matthew Wilcox
On Sun, Oct 28, 2007 at 10:48:33PM +, Alan Cox wrote: Bzzt. You get a false deadlock with multiple threads like so: Thread A of task B takes lock 1 Thread C of task D takes lock 2 Thread C of task D blocks on lock 1 Thread E of task B blocks on lock 2 The spec and SYSV
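A toy model of the detection being discussed (not fs/locks.c): locks are owned per task, so each blocked task waits for at most one other task, and blocking a requester on a holder is refused when the chain of waiters leads from the holder back to the requester. Run on the scenario above, with task B as owner 0 and task D as owner 1, it reports a cycle even though thread A of task B could still release lock 1. Names and array layout are assumptions for illustration.

```c
#include <assert.h>

#define NOBODY (-1)

/* waits_for[o] is the owner whose lock owner o is blocked on, or NOBODY.
 * Returns 1 if letting `requester` block on `holder` closes a cycle. */
static int would_deadlock(const int *waits_for, int nowners,
                          int requester, int holder)
{
    int steps = 0;
    for (int o = holder; o != NOBODY; o = waits_for[o]) {
        if (o == requester)
            return 1;                 /* would report EDEADLK */
        if (++steps > nowners)        /* guard against a pre-existing cycle */
            break;
    }
    return 0;
}
```

Because thread identity is lost when owners are whole tasks, the model cannot distinguish thread A (which holds lock 1 and is runnable) from thread E (which is blocked), which is exactly the false positive.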

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Matthew Wilcox
On Sun, Oct 28, 2007 at 11:55:52PM +0100, Jiri Kosina wrote: On Sun, 28 Oct 2007, Matthew Wilcox wrote: Bzzt. You get a false deadlock with multiple threads like so: Thread A of task B takes lock 1 Thread C of task D takes lock 2 Thread C of task D blocks on lock 1 Thread E of task B

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Matthew Wilcox
On Sun, Oct 28, 2007 at 09:38:55PM +, Alan Cox wrote: It doesn't require the system to detect it, only mandate what to return if it does detect it. We should be detecting at least the obvious case. What is the obvious case? A task that has never called clone()? If SYSV only spots

Re: Massive slowdown when re-querying large nfs dir

2007-11-04 Thread Matthew Wilcox
On Mon, Nov 05, 2007 at 07:58:38AM +0300, Al Boldi wrote: Any ideas? How about tcpdumping and seeing what requests are flowing across the wire? You might be able to figure out what's being done differently.

Re: NFS Killable tasks request comments on patch

2007-12-06 Thread Matthew Wilcox
On Thu, Dec 06, 2007 at 10:34:26AM -0700, Matthew Wilcox wrote: I've done a more thorough review of Liam's work and come up with a few more fixes (which I'll publish later today) I've put up a git tree for this work; see http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h

[RFC] Remove BKL from fs/locks.c

2007-12-29 Thread Matthew Wilcox
I've been promising to do this for about seven years now. It seems to work well enough, but I haven't run any serious stress tests on it. This implementation uses one spinlock to protect both lock lists and all the i_flock chains. It doesn't seem worth splitting up the locking any further. I

Re: [RFC] Remove BKL from fs/locks.c

2007-12-30 Thread Matthew Wilcox
On Sun, Dec 30, 2007 at 08:36:44PM +1100, Stephen Rothwell wrote: We should probably do some performance testing on this because the last time we tried the impact was quite noticeable. You should ping Tridge as he has some good lock testing setups. And he cares if we slow him down :-) Last

On setting a lease across a cluster

2008-01-04 Thread Matthew Wilcox
Hi Bruce, The current implementation of vfs_setlease/generic_setlease/etc is a bit quirky. I've been thinking it over for the past couple of days, and I think we need to refactor it to work sensibly. As you may have noticed, I've been mulling over getting rid of the BKL in fs/locks.c and the

Re: On setting a lease across a cluster

2008-01-04 Thread Matthew Wilcox
On Fri, Jan 04, 2008 at 02:47:18PM -0500, J. Bruce Fields wrote: Then I started to wonder about the current split of functionality between fcntl_setlease, vfs_setlease and generic_setlease. The check for no other process having this file open should be common to all filesystems. And

Re: On setting a lease across a cluster

2008-01-04 Thread Matthew Wilcox
On Fri, Jan 04, 2008 at 01:55:36PM -0500, david m. richter wrote: fwiw, i've done some work on extending the lease subsystem to help support the full range of requirements for NFSv4 file and directory delegations (e.g., breaking a lease when unlinking a file) and we ended up actually

Re: On setting a lease across a cluster

2008-01-04 Thread Matthew Wilcox
On Fri, Jan 04, 2008 at 03:53:04PM -0500, J. Bruce Fields wrote: So, the problem is that fcntl_setlease() does vfs_setlease() and fasync_helper() with the BKL held over both, and you want to preserve that? But what that BKL is doing is a mystery to me--the very first thing that

Re: [patch] rewrite rd

2008-01-14 Thread Matthew Wilcox
On Tue, Dec 04, 2007 at 05:26:28AM +0100, Nick Piggin wrote: +static void copy_to_brd(struct brd_device *brd, const void *src, + sector_t sector, size_t n) +{ + struct page *page; + void *dst; + unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
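The offset line is easiest to check with numbers. A sketch assuming 4 KiB pages and 512-byte sectors (so PAGE_SECTORS is 8), with illustrative helper names: sector 13 lives in page 1 at byte offset 5 * 512 = 2560, leaving 1536 bytes before the page boundary for a copy that starts there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT  9
#define PAGE_SHIFT    12                                   /* assumes 4 KiB pages */
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PAGE_SECTORS  (1UL << (PAGE_SHIFT - SECTOR_SHIFT)) /* 8 */

typedef uint64_t sector_t;

/* Index of the backing page that holds `sector`. */
static unsigned long sector_page(sector_t sector)
{
    return (unsigned long)(sector >> (PAGE_SHIFT - SECTOR_SHIFT));
}

/* Byte offset of `sector` inside its page. */
static unsigned int sector_offset(sector_t sector)
{
    return (unsigned int)((sector & (PAGE_SECTORS - 1)) << SECTOR_SHIFT);
}

/* Bytes of an n-byte copy that fit before the page boundary; the rest
 * belongs to the next page. */
static size_t first_chunk(sector_t sector, size_t n)
{
    size_t room = PAGE_SIZE - sector_offset(sector);
    return n < room ? n : room;
}
```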

file locks: Use wait_event_interruptible_timeout()

2008-01-14 Thread Matthew Wilcox
interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout() with a few assumptions since we know we hold the BKL. locks_block_on_timeout() is only used in one place, so it's actually simpler to inline it into its caller. Signed-off-by: Matthew Wilcox [EMAIL

Re: Leak in nlmsvc_testlock for async GETFL case

2008-01-14 Thread Matthew Wilcox
On Mon, Jan 14, 2008 at 03:44:19PM -0500, J. Bruce Fields wrote: Thanks! I've queued it up for 2.6.25. Hi Bruce, I haven't had as much time to play with de-BKL-ising fs/locks.c as I would like, so fixing that for 2.6.25 is probably out of the question, but here are two janitorial patches that

file locks: Split flock_find_conflict out of flock_lock_file

2008-01-14 Thread Matthew Wilcox
Reduce the spaghetti-like nature of flock_lock_file by making the chunk of code labelled find_conflict into its own function. Also allocate memory before taking the kernel lock in preparation for switching to a normal spinlock. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] diff --git a/fs
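For context on what the split-out conflict check decides: flock() locks belong to the open file description, so two independent open() calls on the same file conflict even within one process. A minimal sketch (path and helper name are illustrative).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Take LOCK_EX through one open(), then try again through a second.
 * Returns the errno of the second attempt (EWOULDBLOCK on conflict),
 * 0 if it unexpectedly succeeded, -1 on setup error. */
static int flock_conflict(const char *path)
{
    int a = open(path, O_RDWR | O_CREAT, 0600);
    int b = open(path, O_RDWR);
    if (a < 0 || b < 0)
        return -1;
    int rc = 0;
    if (flock(a, LOCK_EX | LOCK_NB) < 0)
        rc = -1;
    else if (flock(b, LOCK_EX | LOCK_NB) < 0)
        rc = errno;
    close(b);
    close(a);                 /* last close of a's description drops the lock */
    return rc;
}
```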

Re: file locks: Use wait_event_interruptible_timeout()

2008-01-15 Thread Matthew Wilcox
On Tue, Jan 15, 2008 at 09:48:51AM -0500, J. Bruce Fields wrote: On Mon, Jan 14, 2008 at 09:28:30PM -0700, Matthew Wilcox wrote: interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout() with a few assumptions since we know we hold the BKL

Re: [PATCH 1/3] enhanced ESTALE error handling

2008-01-18 Thread Matthew Wilcox
On Fri, Jan 18, 2008 at 10:36:01AM -0500, Peter Staubach wrote: @@ -1025,12 +1027,27 @@ static int fastcall link_path_walk(const mntget(save.mnt); result = __link_path_walk(name, nd); - if (result == -ESTALE) { + while (result == -ESTALE) { + /* +

Re: how to show propagation state for mounts

2008-02-20 Thread Matthew Wilcox
On Wed, Feb 20, 2008 at 04:04:22PM +, Al Viro wrote: It's less about the form of representation (after all, we generate poll events when contents of that sucker changes, so one *can* get a consistent snapshot of the entire thing) and more about having it self-contained when we have
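Later kernels (2.6.26 onward) answered this with /proc/<pid>/mountinfo, whose optional fields carry propagation state such as shared:N, master:N and unbindable, terminated by a lone "-". A minimal parsing sketch; the helper name and the 1024-byte line limit are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the optional fields of one mountinfo line (after the six fixed
 * fields, before the "-" separator) into out, space separated.
 * Returns the number of optional fields found. */
static int propagation_fields(const char *line, char *out, size_t outlen)
{
    char buf[1024];
    int field = 0, count = 0;
    snprintf(buf, sizeof buf, "%s", line);
    out[0] = '\0';
    for (char *tok = strtok(buf, " \n"); tok; tok = strtok(NULL, " \n")) {
        if (++field <= 6)
            continue;     /* id, parent, major:minor, root, mount point, options */
        if (strcmp(tok, "-") == 0)
            break;
        if (count++)
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, tok, outlen - strlen(out) - 1);
    }
    return count;
}
```

For the example line from proc(5), "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue", the result is the single field "master:1".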