Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
Hi,

There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
I assume it is present in older and newer kernels, too as the related
code hasn't changed much AFAICS and googling for Bad page state
returns rather a lot of hits relating to both older (up to 2.5.70!) and
newer kernels...

Note: PLEASE do not stop reading because you read ncpfs below as I am
pretty sure it is not ncpfs related!  And looking at google a lot of
people have reported such similar problems since 2.5.70 or so and they
were all told to go away as they have bad ram.  That is impossible
because this happens on well over 600 workstations and several servers
100% reproducible.  Many different types of hardware, different makes,
difference age, all running smp kernels even if single cpu.  You can't
tell me they all have bad ram.  Windows works fine and Linux works fine
except for that one specific problem which is 100% reproducible...

The bug only appears, but it appears 100% reproducibly when a cross
volume symlink on ncpfs is accessed using nautilus under gnome.  I.e.
double click on a cross volume symlink on ncpfs in nautilus and the
machine locks up solid.  Here is what is logged to syslog over the
network (this is an example, after the second message the machine is
locked up solid, I can send you other traces if you want to see more of
them):

Bad page state at free_hot_cold_page (in process 'nautilus', page c10d33c0)
flags:0x200c mapping:cd83f5d0 mapcount:0 count:0
Backtrace:
 [c01470ec] bad_page+0x7c/0xb0
 [c014790c] free_hot_cold_page+0x7c/0x110
 [c01482fc] __pagevec_free+0x1c/0x30
 [c014d3e3] release_pages+0x163/0x180
 [c014d572] __pagevec_lru_add+0xb2/0xc0
 [c014d1a5] lru_add_drain+0x45/0x50
 [c015640f] unmap_region+0x2f/0x130
 [c015653a] detach_vmas_to_be_unmapped+0x2a/0x60
 [c0156782] do_munmap+0x102/0x150
 [c0156818] sys_munmap+0x48/0x70
 [c0104079] sysenter_past_esp+0x52/0x79
Trying to fix it up, but a reboot is needed
Bad page state at free_hot_cold_page (in process 'nautilus', page c10d33c0)
flags:0x2004 mapping: mapcount:1 count:0
Backtrace:
 [c01470ec] bad_page+0x7c/0xb0
 [c014790c] free_hot_cold_page+0x7c/0x110
 [c0171659] link_path_walk+0x929/0xe80
 [c02b6882] ip_local_deliver+0x192/0x290
 [c02b6d94] ip_rcv+0x414/0x540
 [c0171e96] path_lookup+0xa6/0x1c0
 [c01726f2] open_namei+0x82/0x680
 [c011b297] recalc_task_prio+0xe7/0x200
 [c0162588] filp_open+0x28/0x50
 [c016284a] get_unused_fd+0x9a/0xc0
 [c01705e8] getname+0x68/0xe0
 [c016295c] sys_open+0x3c/0xa0
 [c0104079] sysenter_past_esp+0x52/0x79
Trying to fix it up, but a reboot is needed

So somehow pages get into a bad state.

Note that using gnome-terminal or console to access the symlink works
fine.  It is only nautilus and looking at the backtrace it has something
to do with mmap.

Note this is not an ncpfs bug AFAICS because ncpfs uses the generic
symlink code provided by VFS.  From fs/ncpfs/inode.c:

static struct inode_operations ncp_symlink_inode_operations = {
.readlink   = generic_readlink,
.follow_link= page_follow_link_light,
.put_link   = page_put_link,
.setattr= ncp_notify_change,
};

And from fs/ncpfs/symlink.c:

struct address_space_operations ncp_symlink_aops = {
.readpage   = ncp_symlink_readpage,
};

And ncp_symlink_readpage() is:

#define NCP_SYMLINK_MAGIC0  cpu_to_le32(0x6c6d7973) /* symlnk- */
#define NCP_SYMLINK_MAGIC1  cpu_to_le32(0x3e2d6b6e)

/* - read a symbolic link -- */

static int ncp_symlink_readpage(struct file *file, struct page *page)
{
struct inode *inode = page-mapping-host;
int error, length, len;
char *link, *rawlink;
char *buf = kmap(page);

error = -ENOMEM;
rawlink=(char *)kmalloc(NCP_MAX_SYMLINK_SIZE, GFP_KERNEL);
if (!rawlink)
goto fail;

if (ncp_make_open(inode,O_RDONLY))
goto failEIO;

error=ncp_read_kernel(NCP_SERVER(inode),NCP_FINFO(inode)-file_handle,
 0,NCP_MAX_SYMLINK_SIZE,rawlink,length);

ncp_inode_close(inode);
/* Close file handle if no other users... */
ncp_make_closed(inode);
if (error)
goto failEIO;

if (NCP_FINFO(inode)-flags  NCPI_KLUDGE_SYMLINK) {
if (lengthNCP_MIN_SYMLINK_SIZE ||
((__le32 *)rawlink)[0]!=NCP_SYMLINK_MAGIC0 ||
((__le32 *)rawlink)[1]!=NCP_SYMLINK_MAGIC1)
goto failEIO;
link = rawlink + 8;
length -= 8;
} else {
link = rawlink;
}

len = NCP_MAX_SYMLINK_SIZE;
error = ncp_vol2io(NCP_SERVER(inode), buf, len, link, length, 0);
kfree(rawlink);
if (error)
goto fail;
SetPageUptodate(page);
kunmap(page);
unlock_page(page);
return 0;

failEIO:
error = 

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 12:14:48PM +0100, Anton Altaparmakov wrote:
 Hi,
 
 There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
 I assume it is present in older and newer kernels, too as the related
 code hasn't changed much AFAICS and googling for Bad page state
 returns rather a lot of hits relating to both older (up to 2.5.70!) and
 newer kernels...
 
 Note: PLEASE do not stop reading because you read ncpfs below as I am
 pretty sure it is not ncpfs related!  And looking at google a lot of
 people have reported such similar problems since 2.5.70 or so and they
 were all told to go away as they have bad ram.  That is impossible
 because this happens on well over 600 workstations and several servers
 100% reproducible.  Many different types of hardware, different makes,
 difference age, all running smp kernels even if single cpu.  You can't
 tell me they all have bad ram.  Windows works fine and Linux works fine
 except for that one specific problem which is 100% reproducible...
 
 The bug only appears, but it appears 100% reproducibly when a cross
 volume symlink on ncpfs is accessed using nautilus under gnome.  I.e.
 double click on a cross volume symlink on ncpfs in nautilus and the
 machine locks up solid.

Ugh...  Could you at least tell what does nautilus attempt to do at that
point?  Something that wouldn't show up with simple ls -l symlink or
cat symlink /dev/null, judging by the above, but what?
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mirror a file system on the fly

2005-08-19 Thread Dave Schwartz
Ram,

Your code snippet seems to work great as discussed. Thanks. :-)
However, my requirement is slightly different.  What I also want is
that any file created from the mirrored/cloned file-system must not be
available in the parent file system.

Gracias,
decebel


On 8/18/05, Ram Pai [EMAIL PROTECTED] wrote:
 On Thu, 2005-08-18 at 13:27, Dave Schwartz wrote:
  Hi Ram,
  Thanks for the inputs. I was going over the man pages describing the
  clone system call and its option of CLONE_NEWNS. Could understand the
  description only in parts.
 
  The man page suggests that this flag when set, the cloned child is
  started in a new name space, initialized with a copy of the parent.
  Now does that mean, a program like a shell when cloned with
  CLONE_NEWNS set, will have a copy of file hierarchy of the underlying
  parent process?
 
 Yes the child process will see an exact copy of all the mounts of
 various filesystems as that of the parent. However if you mount/unmount
 any filesystems in the child, the same will not be mounted/unmounted in
 the parent and vice-versa.  Each has its individual view of the
 the filesystem heirarchy.
 
 Try the following program that clones off a child process with a mirror
 namespace and gives you a bash prompt. Try mounting and unmounting
 in this bash prompt and see if the same is visible in a totally
 different window.
 
 
 #include  stdio.h
 #include  signal.h
 #include  sched.h
 
 char somemem[4096];
 
 int myfunc(){
 system(bash);
 }
 
 int
 main(int argc, char *argv[])
 {
 if(clone(myfunc, somemem, CLONE_NEWNS|SIGCHLD, NULL)) {
 wait(NULL);
 } else {
 printf(clone failed\n);
 }
 printf(exit\n);
 }
 
 
 Hope this helps,
 RP
 
 
 
 
 
  Gracias,
  decebel
 
 
 
  On 8/19/05, Ram Pai [EMAIL PROTECTED] wrote:
   On Thu, 2005-08-18 at 12:40, Dave Schwartz wrote:
Hi list,
   
Not too sure if this is the right forum to ask this question but since
my requirement is around linux filesystems, I shall take this liberty
to post my question.
   
My requirement is to develop a kernel/user space module to add an
extension to the shell program environment such that this shell forks
a mirror look-alike filesystem of the underlying OS to the programs
run in that particular shell.
  
   u seem to be talking about namespaces, if I get you right.
  
   there is a flag CLONE_NEWNS to the system call 'clone' which does what
   u r talking about.
  
   RP
  
  
  
  
   
   
Was trying to look thru the FAQ and a few list archives to look for
ideas around my requirement. The archives were overwhelming.
   
   
Any ideas/pointers will be a great help,
Gracias,
decebel
-
To unsubscribe from this list: send the line unsubscribe 
linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
  
  -
  To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 2005-08-19 at 15:20 +0100, Al Viro wrote:
 On Fri, Aug 19, 2005 at 12:14:48PM +0100, Anton Altaparmakov wrote:
  There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
  I assume it is present in older and newer kernels, too as the related
  code hasn't changed much AFAICS and googling for Bad page state
  returns rather a lot of hits relating to both older (up to 2.5.70!) and
  newer kernels...
  
  Note: PLEASE do not stop reading because you read ncpfs below as I am
  pretty sure it is not ncpfs related!  And looking at google a lot of
  people have reported such similar problems since 2.5.70 or so and they
  were all told to go away as they have bad ram.  That is impossible
  because this happens on well over 600 workstations and several servers
  100% reproducible.  Many different types of hardware, different makes,
  difference age, all running smp kernels even if single cpu.  You can't
  tell me they all have bad ram.  Windows works fine and Linux works fine
  except for that one specific problem which is 100% reproducible...
  
  The bug only appears, but it appears 100% reproducibly when a cross
  volume symlink on ncpfs is accessed using nautilus under gnome.  I.e.
  double click on a cross volume symlink on ncpfs in nautilus and the
  machine locks up solid.
 
 Ugh...  Could you at least tell what does nautilus attempt to do at that
 point?  Something that wouldn't show up with simple ls -l symlink or
 cat symlink /dev/null, judging by the above, but what?

One important thing I forgot to add is that the symlink must point to a
directory not a file.  If it points to a file all works fine.

I tried stracing nautilus to answer your question.  And this time for
the first time instead of a Bad page state I got a BUG() triggered in
fs/namei.c, the arrow below marks the spot:

void page_put_link(struct dentry *dentry, struct nameidata *nd)
{
if (!IS_ERR(nd_get_link(nd))) {
struct page *page;
page = find_get_page(dentry-d_inode-i_mapping, 0);
if (!page)
   BUG();
kunmap(page);
page_cache_release(page);
page_cache_release(page);
}
}

Here is the BUG output:

kernel BUG at fs/namei.c:2329!
invalid operand:  [#1]
SMP
Modules linked in: autofs4 nls_cp850 nls_utf8 ncpfs subfs fuse snd
soundcore speedstep_lib freq_table thermal processor fan button battery
ac nvram ipt_REJECT iptable_filter ip_tables af_packet edd joydev evdev
st video1394 ohci1394 raw1394 ieee1394 capability sg sr_mod dm_mod e1000
uhci_hcd usbcore i2c_i801 i2c_core hw_random parport_pc lp parport
reiserfs ide_cd cdrom ide_disk aic7xxx aic79xx via82cxxx ata_piix libata
piix ide_core sd_mod scsi_mod
CPU:3
EIP:0060:[c0174e3e]Tainted: G U VLI
EFLAGS: 00010246   (2.6.11.4-21.8-smp)
EIP is at page_put_link+0x3e/0x50
eax:    ebx:    ecx: d9c75654   edx: 
esi:    edi: d9d3   ebp: ffac2000   esp: d9d31eac
ds: 007b   es: 007b   ss: 0068
Process nautilus (pid: 21910, threadinfo=d9d3 task=f5caf550)
Stack: d9c75654 c0171659  4000 c2025060 0001 d9d31f1c
dff3ec80
   d9c75654 001d981f 0003 d9eb2014  d9d3 d9d31f1c

   d9eb2000 c0171e96 d9eb2000 d9d31f1c 0001 d9eb2000 c017211f
08542878
Call Trace:
 [c0171659] link_path_walk+0x929/0xe80
 [c0171e96] path_lookup+0xa6/0x1c0
 [c017211f] __user_walk+0x2f/0x70
 [c016c6fd] vfs_stat+0x1d/0x60
 [c016ce5f] sys_stat64+0xf/0x30
 [c0109051] do_syscall_trace+0xb1/0x175
 [c01040d7] syscall_call+0x7/0xb
Code: 31 d2 8b 80 a8 00 00 00 e8 80 e6 fc ff 85 c0 89 c3 74 18 89 d8 e8
93 5e fa ff 89 d8 e8 2c 80 fd ff 89 d8 5b e9 24 80 fd ff 5b c3 0f 0b
19 09 76 bb 32 c0 eb de 90 8d b4 26 00 00 00 00 83 ec 28

The compressed strace output is attached.  It was tracing the main
nautilus process (strace -f -F -o /strace.log -p 21910) which is where
the bug was hit as well.  FWIW I attached strace to the process just
before double-clicking on the symlink in the nautilus window...

Looking at the bottom of the strace.log file it seems like two possibly
concurrent stat64() calls of the symlink were running, one of which did
not complete...  (note this was done on a dual-cpu machine with
hyperthreading enabled, i.e. it appears to have four cpus).

Does this help?

Let me know if I can provide any other details...  (I can provide an ssh
connection to a machine for testing purposes if required.)

Best regards,

Anton
-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/  http://www-stu.christs.cam.ac.uk/~aia21/


strace.log.bz2
Description: application/bzip


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:

   struct page *page;
   page = find_get_page(dentry-d_inode-i_mapping, 0);
   if (!page)
  BUG();

Something has truncated the mapping. 

My guess is that you had a cache invalidate event that removed the page
from the mapping while it was being used. That might also explain why this
only happens for ncpfs. I bet that in the other cases, the mapping was 
also invalidated, but re-filled immediately, and your strace slowed the 
other process down enough that it didn't get to re-fill the cache it 
invalidated or something like that.

The fact that it happens only for cross-volume things might also be 
explained that way - is there something ncpfs does when switching volumes 
that might trigger a cache invalidate for another volume (either on the 
client side or the server side - maybe the server tends to try to break 
the connection for the old volume when you start traversing a new one?)

The generic page cache for symlinks code does _not_ support invalidating 
the cache while it's being used. A local filesystem will obviously never 
invalidate the cache at all. 

Hmm.. NFS _does_ use the page cache for symlinks, but uses it slightly 
differently: instead of relying on the page cache entry being the same 
when freeing the page, it just caches the page it looked up in the page 
cache (ie nfs_follow_link() does look up the page cache, but then hides 
the page pointer inside the page data itself (uglee), and thus does not 
depend on the mapping staying the same (nfs_put_link() just takes the page 
from the symlink data).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 09:07:35AM -0700, Linus Torvalds wrote:
 Hmm.. NFS _does_ use the page cache for symlinks, but uses it slightly 
 differently: instead of relying on the page cache entry being the same 
 when freeing the page, it just caches the page it looked up in the page 
 cache (ie nfs_follow_link() does look up the page cache, but then hides 
 the page pointer inside the page data itself (uglee), and thus does not 
 depend on the mapping staying the same (nfs_put_link() just takes the page 
 from the symlink data).

For NFS that was done exactly to deal with cache invalidation.  IIRC, I've
convinced myself that it wasn't going to happen on ncpfs and happily
abstained from duplicating the NFS variant.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Linus Torvalds wrote:
 
 The generic page cache for symlinks code does _not_ support invalidating 
 the cache while it's being used. A local filesystem will obviously never 
 invalidate the cache at all. 
 
 Hmm.. NFS _does_ use the page cache for symlinks [..]

Looking more and more at this, I'm convinced this is it.

Basically, page_follow_link_light() and page_put_link() depend on the fact
that the page in the page cache is the same one the whole time:  
page_follow_link_light() will increment the page count of the page it
finds at offset 0, and page_put_link() will decrement it. If the page has 
changed, they increment/decrement different pages.

There's two ways to fix this:

 - document this as a fundamental fact, and apply the ncpfs patch. Local 
   filesystems can still continue to use the generic helper functions 
   (all other users _are_ local filesystems).

 - make nameidata contain not just the virtual addresses of the names, 
   but also have a struct page *pages[MAX_NESTED_LINKS + 1], and save 
   away the page there. That will fix ncpfs, and we could then make NFS 
   also use the generic routines.

I suspect that #1 is the prudent one. We have a patch already, and we
don't want to grow nameidata. I'll commit a comment at the head of
page_follow_link_light() too.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Linus Torvalds wrote:
 
  - document this as a fundamental fact, and apply the ncpfs patch. Local 
filesystems can still continue to use the generic helper functions 
(all other users _are_ local filesystems).

Actually, looking at the ncpfs patch, I'd rather not apply that patch 
as-is. It looks like it will totally disable symlink caching, which would 
be kind of sad. Somebody willing to do the same thing NFS does?

NFS hides away the struct page * pointer at the end of the page data, so
that when we pass the pointer to the virtual address around, we can 
trivially look up the struct page.

An alternative is to make the symlink address-space use a gfp_mask of 
GFP_KERNEL (no highmem), at which point ncpfs_put_link() just becomes 
something like

void ncpfs_put_link(struct dentry *dentry, struct nameidata *nd)
{
char *addr = nd_get_link(nd);

if (!IS_ERR(addr))
page_cache_release(virt_to_page(addr));
}

which is pretty ugly, but even simpler than the NFS trick.

Anybody?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 09:43:17AM -0700, Linus Torvalds wrote:
 Actually, looking at the ncpfs patch, I'd rather not apply that patch 
 as-is. It looks like it will totally disable symlink caching, which would 
 be kind of sad. Somebody willing to do the same thing NFS does?
 
 NFS hides away the struct page * pointer at the end of the page data, so
 that when we pass the pointer to the virtual address around, we can 
 trivially look up the struct page.
 
 An alternative is to make the symlink address-space use a gfp_mask of 
 GFP_KERNEL (no highmem), at which point ncpfs_put_link() just becomes 
 something like
 
   void ncpfs_put_link(struct dentry *dentry, struct nameidata *nd)
   {
   char *addr = nd_get_link(nd);
 
   if (!IS_ERR(addr))
   page_cache_release(virt_to_page(addr));
   }
 
 which is pretty ugly, but even simpler than the NFS trick.
 
 Anybody?

I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
better than copying the damn thing and other network filesystems might have
the same needs eventually...
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Stephen Smalley
This is a request for comments (only) on the patch below that modifies
the VFS setxattr, getxattr, and listxattr code to fall back to the
security module for security xattrs if the filesystem does not support
xattrs natively.  This allows security modules to export the incore
inode security label information to userspace even if the filesystem
does not provide xattr storage, and eliminates the need to
individually patch various pseudo filesystem types to provide such
access (note that this patch removes the existing xattr code from
devpts and tmpfs as it is then no longer needed). Note that this
approach may be controversial [1]; it has been suggested that we
should instead be modifying all filesystem types to support security
(and other) xattrs natively, but this seems questionable for legacy
filesystems like vfat and pseudo filesystems like proc, especially
when the resulting code will end up simply calling the security
framework to access the incore security label as with the current
devpts and tmpfs handlers.

The patch restructures the code flow slightly to reduce duplication
between the normal path and the fallback path, but this should only have
one user-visible side effect - a program may get -EACCES rather than
-EOPNOTSUPP if policy denied access but the filesystem didn't support
the operation anyway.  Note that the post_setxattr hook call is not
needed in the fallback case, as the inode_setsecurity hook call handles
the incore inode security state update directly.  In contrast, we do
call fsnotify in both cases.

Let me know what you think.  Please do NOT apply yet.

A prior discussion thread on this patch on the linux-security-module
list is available at
[1] http://marc.theaimsgroup.com/?t=11214414723r=1w=2

---

 fs/Kconfig |   43 --
 fs/devpts/Makefile |1 
 fs/devpts/inode.c  |   21 ---
 fs/devpts/xattr_security.c |   47 
 fs/xattr.c |   80 +-
 mm/shmem.c |   85 -
 6 files changed, 49 insertions(+), 228 deletions(-)

diff -X /home/sds/dontdiff -Nrup linux-2.6.13-rc6-mm1/fs/devpts/inode.c 
linux-2.6.13-rc6-mm1-xattr/fs/devpts/inode.c
--- linux-2.6.13-rc6-mm1/fs/devpts/inode.c  2005-06-17 15:48:29.0 
-0400
+++ linux-2.6.13-rc6-mm1-xattr/fs/devpts/inode.c2005-08-19 
13:02:58.0 -0400
@@ -18,28 +18,9 @@
 #include linux/mount.h
 #include linux/tty.h
 #include linux/devpts_fs.h
-#include linux/xattr.h
 
 #define DEVPTS_SUPER_MAGIC 0x1cd1
 
-extern struct xattr_handler devpts_xattr_security_handler;
-
-static struct xattr_handler *devpts_xattr_handlers[] = {
-#ifdef CONFIG_DEVPTS_FS_SECURITY
-   devpts_xattr_security_handler,
-#endif
-   NULL
-};
-
-static struct inode_operations devpts_file_inode_operations = {
-#ifdef CONFIG_DEVPTS_FS_XATTR
-   .setxattr   = generic_setxattr,
-   .getxattr   = generic_getxattr,
-   .listxattr  = generic_listxattr,
-   .removexattr= generic_removexattr,
-#endif
-};
-
 static struct vfsmount *devpts_mnt;
 static struct dentry *devpts_root;
 
@@ -102,7 +83,6 @@ devpts_fill_super(struct super_block *s,
s-s_blocksize_bits = 10;
s-s_magic = DEVPTS_SUPER_MAGIC;
s-s_op = devpts_sops;
-   s-s_xattr = devpts_xattr_handlers;
s-s_time_gran = 1;
 
inode = new_inode(s);
@@ -175,7 +155,6 @@ int devpts_pty_new(struct tty_struct *tt
inode-i_gid = config.setgid ? config.gid : current-fsgid;
inode-i_mtime = inode-i_atime = inode-i_ctime = CURRENT_TIME;
init_special_inode(inode, S_IFCHR|config.mode, device);
-   inode-i_op = devpts_file_inode_operations;
inode-u.generic_ip = tty;
 
dentry = get_node(number);
diff -X /home/sds/dontdiff -Nrup linux-2.6.13-rc6-mm1/fs/devpts/Makefile 
linux-2.6.13-rc6-mm1-xattr/fs/devpts/Makefile
--- linux-2.6.13-rc6-mm1/fs/devpts/Makefile 2005-06-17 15:48:29.0 
-0400
+++ linux-2.6.13-rc6-mm1-xattr/fs/devpts/Makefile   2005-08-19 
13:03:07.0 -0400
@@ -5,4 +5,3 @@
 obj-$(CONFIG_UNIX98_PTYS)  += devpts.o
 
 devpts-$(CONFIG_UNIX98_PTYS)   := inode.o
-devpts-$(CONFIG_DEVPTS_FS_SECURITY)+= xattr_security.o
diff -X /home/sds/dontdiff -Nrup 
linux-2.6.13-rc6-mm1/fs/devpts/xattr_security.c 
linux-2.6.13-rc6-mm1-xattr/fs/devpts/xattr_security.c
--- linux-2.6.13-rc6-mm1/fs/devpts/xattr_security.c 2005-06-17 
15:48:29.0 -0400
+++ linux-2.6.13-rc6-mm1-xattr/fs/devpts/xattr_security.c   1969-12-31 
19:00:00.0 -0500
@@ -1,47 +0,0 @@
-/*
- * Security xattr support for devpts.
- *
- * Author: Stephen Smalley [EMAIL PROTECTED]
- * Copyright (c) 2004 Red Hat, Inc., James Morris [EMAIL PROTECTED]
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by 

Re: [RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Christoph Hellwig
On Fri, Aug 19, 2005 at 01:57:56PM -0400, Stephen Smalley wrote:
 This is a request for comments (only) on the patch below that modifies
 the VFS setxattr, getxattr, and listxattr code to fall back to the
 security module for security xattrs if the filesystem does not support
 xattrs natively.  This allows security modules to export the incore
 inode security label information to userspace even if the filesystem
 does not provide xattr storage, and eliminates the need to
 individually patch various pseudo filesystem types to provide such
 access (note that this patch removes the existing xattr code from
 devpts and tmpfs as it is then no longer needed). Note that this
 approach may be controversial [1]; it has been suggested that we
 should instead be modifying all filesystem types to support security
 (and other) xattrs natively, but this seems questionable for legacy
 filesystems like vfat and pseudo filesystems like proc, especially
 when the resulting code will end up simply calling the security
 framework to access the incore security label as with the current
 devpts and tmpfs handlers.
 
 The patch restructures the code flow slightly to reduce duplication
 between the normal path and the fallback path, but this should only have
 one user-visible side effect - a program may get -EACCES rather than
 -EOPNOTSUPP if policy denied access but the filesystem didn't support
 the operation anyway.  Note that the post_setxattr hook call is not
 needed in the fallback case, as the inode_setsecurity hook call handles
 the incore inode security state update directly.  In contrast, we do
 call fsnotify in both cases.
 
 Let me know what you think.  Please do NOT apply yet.

Very nice, and gets rid of lots of crap.  Now that we started parsing
the attribute name in generic code we should deprecate the old
-{get,set,list,remove}xattr inode operations and make the helpers
James added a while ago mandatory for the future.


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
 I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
 better than copying the damn thing and other network filesystems might have
 the same needs eventually...

[something like this - completely untested]

* stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
to symlink body.  Said symlink body sits in a page at offset equal to
offsetof(page, struct stray_page_link).  filler() is expected to put it
at such offset. Page is cached.

* stray_page_put_link() - -put_link() suitable for links obtained from
stray_page_get_link().  Unlike the usual pagecache-based variants, this
sucker does _not_ rely on page staying cached.

* nfs and ncpfs switched to the helpers above.

Signed-off-by: Al Viro [EMAIL PROTECTED]

diff -urN RC14-rc6-git10-base/fs/libfs.c current/fs/libfs.c
--- RC13-rc6-git10-base/fs/libfs.c  2005-08-10 10:37:52.0 -0400
+++ current/fs/libfs.c  2005-08-19 13:29:39.0 -0400
@@ -7,6 +7,7 @@
 #include linux/pagemap.h
 #include linux/mount.h
 #include linux/vfs.h
+#include linux/namei.h
 #include asm/uaccess.h
 
 int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
@@ -616,6 +617,42 @@
return ret;
 }
 
+char *stray_page_get_link(struct inode *inode, int (*filler)(struct inode *, 
struct page *))
+{
+   struct stray_page_symlink *p;
+   struct page *page;
+
+   page = read_cache_page(inode-i_data, 0, (filler_t *)filler, inode);
+   if (IS_ERR(page))
+   goto read_failed;
+   if (!PageUptodate(page))
+   goto getlink_read_error;
+   p = kmap(page);
+   p-page = page;
+   return p-body;
+
+getlink_read_error:
+   page_cache_release(page);
+   page = ERR_PTR(-EIO);
+read_failed:
+   return (char *)page;/* error */
+}
+
+void stray_page_put_link(struct dentry *dentry, struct nameidata *nd)
+{
+   char *s = nd_get_link(nd);
+   if (!IS_ERR(s)) {
+   struct stray_page_symlink *p;
+   struct page *page;
+
+   p = container_of(s, struct stray_page_symlink, body[0]);
+   page = p-page;
+
+   kunmap(page);
+   page_cache_release(page);
+   }
+}
+
 EXPORT_SYMBOL(dcache_dir_close);
 EXPORT_SYMBOL(dcache_dir_lseek);
 EXPORT_SYMBOL(dcache_dir_open);
@@ -648,3 +685,5 @@
 EXPORT_SYMBOL_GPL(simple_attr_close);
 EXPORT_SYMBOL_GPL(simple_attr_read);
 EXPORT_SYMBOL_GPL(simple_attr_write);
+EXPORT_SYMBOL(stray_page_get_link);
+EXPORT_SYMBOL(stray_page_put_link);
diff -urN RC13-rc6-git10-base/fs/ncpfs/inode.c current/fs/ncpfs/inode.c
--- RC13-rc6-git10-base/fs/ncpfs/inode.c2005-06-17 15:48:29.0 
-0400
+++ current/fs/ncpfs/inode.c2005-08-19 13:12:01.0 -0400
@@ -104,7 +104,7 @@
 
 extern struct dentry_operations ncp_root_dentry_operations;
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
-extern struct address_space_operations ncp_symlink_aops;
+extern int ncp_follow_link(struct dentry *dentry, struct nameidata *nd);
 extern int ncp_symlink(struct inode*, struct dentry*, const char*);
 #endif
 
@@ -233,8 +233,8 @@
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
 static struct inode_operations ncp_symlink_inode_operations = {
.readlink   = generic_readlink,
-   .follow_link= page_follow_link_light,
-   .put_link   = page_put_link,
+   .follow_link= ncp_follow_link,
+   .put_link   = stray_page_put_link,
.setattr= ncp_notify_change,
 };
 #endif
@@ -272,7 +272,6 @@
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
} else if (S_ISLNK(inode-i_mode)) {
inode-i_op = ncp_symlink_inode_operations;
-   inode-i_data.a_ops = ncp_symlink_aops;
 #endif
} else {
make_bad_inode(inode);
diff -urN RC13-rc6-git10-base/fs/ncpfs/symlink.c current/fs/ncpfs/symlink.c
--- RC13-rc6-git10-base/fs/ncpfs/symlink.c  2005-06-17 15:48:29.0 
-0400
+++ current/fs/ncpfs/symlink.c  2005-08-19 13:26:26.0 -0400
@@ -30,6 +30,7 @@
 #include linux/time.h
 #include linux/mm.h
 #include linux/stat.h
+#include linux/namei.h
 #include ncplib_kernel.h
 
 
@@ -41,9 +42,8 @@
 
 /* - read a symbolic link -- */
 
-static int ncp_symlink_readpage(struct file *file, struct page *page)
+static int symlink_filler(struct inode *inode, struct page *page)
 {
-   struct inode *inode = page-mapping-host;
int error, length, len;
char *link, *rawlink;
char *buf = kmap(page);
@@ -77,6 +77,7 @@
}
 
len = NCP_MAX_SYMLINK_SIZE;
+   buf += offsetof(struct stray_page_symlink, body);
error = ncp_vol2io(NCP_SERVER(inode), buf, len, link, length, 0);
kfree(rawlink);
if (error)
@@ -96,13 +97,12 @@
return error;
 }
 
-/*
- * 

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Christoph Hellwig
On Fri, Aug 19, 2005 at 07:02:18PM +0100, Al Viro wrote:
 On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
  I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
  better than copying the damn thing and other network filesystems might have
  the same needs eventually...
 
 [something like this - completely untested]
 
 * stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
 to symlink body.  Said symlink body sits in a page at offset equal to
 offsetof(page, struct stray_page_link).  filler() is expected to put it
 at such offset. Page is cached.
 
 * stray_page_put_link() - -put_link() suitable for links obtained from
 stray_page_get_link().  Unlike the usual pagecache-based variants, this
 sucker does _not_ rely on page staying cached.
 
 * nfs and ncpfs switched to the helpers above.

Can you add some kerneldoc comments to describe them?  Especially as
the name is not very descriptive.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Mika Penttilä

Al Viro wrote:


On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
 


I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
better than copying the damn thing and other network filesystems might have
the same needs eventually...
   



[something like this - completely untested]

* stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
to symlink body.  Said symlink body sits in a page at offset equal to
offsetof(page, struct stray_page_link).  filler() is expected to put it
at such offset. Page is cached.

* stray_page_put_link() - -put_link() suitable for links obtained from
stray_page_get_link().  Unlike the usual pagecache-based variants, this
sucker does _not_ rely on page staying cached.

* nfs and ncpfs switched to the helpers above.

Signed-off-by: Al Viro [EMAIL PROTECTED]

 



Just out of curiosity - what protects even local filesystems against 
concurrent truncate and symlink resolving when using the page cache helpers?


--Mika

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] [PATCH] Stacking support for inode_init_security

2005-08-19 Thread hallyn
The following patch changes the (new to -mm) inode_init_security
function to support multiple LSMs.  It does this by placing the
three passed arguments (name, value, len) into a structure, and
passing in a list_head, onto which the structure can be appended.
The callers (filesystems) call their fs_xattr_set functions
on each returned (name, value, len) set.

This is useful both for the stacker LSM, and for any two (or more)
LSMs which might want to cooperate even without stacker.

I've tested it under a plain selinux-enabled 2.6.13-rc6-mm1 using
Stephen Smalley's sample exploit originally motivating
inode_init_security, as well as with a simple 'touch ab; ls -Z ab'.

I've also tested it with a corresponding stacker patch, with
selinux stacked with two test LSMs which simply define
inode_init_security.  Again, this passed the sample exploit, and
manually inspecting the .security xattrs gave the expected results.


thanks,
-serge

--
 fs/ext2/xattr_security.c |   22 ++
 fs/ext3/xattr_security.c |   22 ++
 include/linux/security.h |   30 +++---
 mm/shmem.c   |6 ++
 security/dummy.c |2 +-
 security/selinux/hooks.c |   42 ++
 6 files changed, 76 insertions(+), 48 deletions(-)

Index: linux-2.6.13-rc6-mm1/include/linux/security.h
===
--- linux-2.6.13-rc6-mm1.orig/include/linux/security.h  2005-08-19 
17:39:44.0 -0500
+++ linux-2.6.13-rc6-mm1/include/linux/security.h   2005-08-19 
19:02:30.0 -0500
@@ -89,6 +89,14 @@ struct swap_info_struct;
 
 #ifdef CONFIG_SECURITY
 
+struct xattr_data {
+   struct list_head list;
+   char *name;
+   void *value;
+   int len;
+};
+
+
 /**
  * struct security_operations - main security structure
  *
@@ -263,9 +271,13 @@ struct swap_info_struct;
  * then it should return -EOPNOTSUPP to skip this processing.
  * @inode contains the inode structure of the newly created inode.
  * @dir contains the inode structure of the parent directory.
- * @name will be set to the allocated name suffix (e.g. selinux).
- * @value will be set to the allocated attribute value.
- * @len will be set to the length of the value.
+ * @head, if not null, points to a listhead to which to append a
+ * newly allocated struct xattr_data with the following data:
+ * @data-name will be set to the allocated name suffix
+ * (e.g. selinux).
+ * @data-value will be set to the allocated attribute value.
+ * @data-len will be set to the length of the value.
+ * @data-list is used to add the data to the list_head
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
@@ -1064,7 +1076,7 @@ struct security_operations {
int (*inode_alloc_security) (struct inode *inode);  
void (*inode_free_security) (struct inode *inode);
int (*inode_init_security) (struct inode *inode, struct inode *dir,
-   char **name, void **value, size_t *len);
+   struct list_head *head);
int (*inode_create) (struct inode *dir,
 struct dentry *dentry, int mode);
int (*inode_link) (struct dentry *old_dentry,
@@ -1415,13 +1427,11 @@ static inline void security_inode_free (
 
 static inline int security_inode_init_security (struct inode *inode,
struct inode *dir,
-   char **name,
-   void **value,
-   size_t *len)
+   struct list_head *head)
 {
if (unlikely (IS_PRIVATE (inode)))
return -EOPNOTSUPP;
-   return security_ops-inode_init_security (inode, dir, name, value, len);
+   return security_ops-inode_init_security (inode, dir, head);
 }

 static inline int security_inode_create (struct inode *dir,
@@ -2103,9 +2113,7 @@ static inline void security_inode_free (
 
 static inline int security_inode_init_security (struct inode *inode,
struct inode *dir,
-   char **name,
-   void **value,
-   size_t *len)
+   struct list_head *head)
 {
return -EOPNOTSUPP;
 }
Index: linux-2.6.13-rc6-mm1/mm/shmem.c
===
--- linux-2.6.13-rc6-mm1.orig/mm/shmem.c2005-08-19 17:39:44.0 
-0500
+++ 

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 07:00:38PM +0100, Christoph Hellwig wrote:
 On Fri, Aug 19, 2005 at 07:02:18PM +0100, Al Viro wrote:
  On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
   I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
   better than copying the damn thing and other network filesystems might 
   have
   the same needs eventually...
  
  [something like this - completely untested]
  
  * stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
  to symlink body.  Said symlink body sits in a page at offset equal to
  offsetof(page, struct stray_page_link).  filler() is expected to put it
  at such offset. Page is cached.
  
  * stray_page_put_link() - -put_link() suitable for links obtained from
  stray_page_get_link().  Unlike the usual pagecache-based variants, this
  sucker does _not_ rely on page staying cached.
  
  * nfs and ncpfs switched to the helpers above.
 
 Can you add some kerneldoc comments to describe them?  Especially as
 the name is not very descriptive.

Hey, if anybody has suggestions on names - they are very welcome ;-)

FWIW, I'd rather take page_symlink(), page_symlink_inode_operations,
page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(),
generic_readlink() and vfs_readlink() to the same place where these guys
would live.  They all belong together and none of them has any business
in fs/namei.c.  Options: fs/libfs.c or separate library since fs/libfs.c
is getting crowded.  Linus, do you have any objections to that or suggestions
on filename here?
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Matthew Wilcox
On Fri, Aug 19, 2005 at 08:38:34PM +0100, Al Viro wrote:
 FWIW, I'd rather take page_symlink(), page_symlink_inode_operations,
 page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(),
 generic_readlink() and vfs_readlink() to the same place where these guys
 would live.  They all belong together and none of them has any business
 in fs/namei.c.  Options: fs/libfs.c or separate library since fs/libfs.c
 is getting crowded.  Linus, do you have any objections to that or suggestions
 on filename here?

fs/symlink.c?

-- 
Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception. -- Mark Twain
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Mika Penttilä

Al Viro wrote:


On Fri, Aug 19, 2005 at 10:16:47PM +0300, Mika Penttilä wrote:
 

Just out of curiosity - what protects even local filesystems against 
concurrent truncate and symlink resolving when using the page cache helpers?
   



How do you get truncate(2) or ftruncate(2) to do something with a symlink?
The former follows links, the latter takes an open file...
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 

Yes that is right, there is no way to invalidate the symlink inode 
mapping page(s) from user space.


--Mika




-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Dave Johnson

Every time cramfs_lookup() is called to lookup and inode for a dentry,
get_cramfs_inode() will allocate a new inode without checking to see
if that inode already exists in the inode cache.

This is fine the first time, but if the dentry cache entry(ies)
associated with that inode are aged out, but the inode entry is not
aged out (which can be quite common if the inode has buffer cache
linked to it), cramfs_lookup() will be called again and another inode
will be allocated and added to the inode cache creating a duplicate in
the inode cache.

The big issue here is that the buffers associated with each inode
cache entry are not shared between the duplicates!

The older inode entries are now orphaned as no dentry points to it and
won't be freed until the buffer cache assoicated with them are first
freed.  The newest entry will have to create all new buffer cache for
each part of its file as the old buffer cache is now orphaned as well.

Patch below fixes this by making get_cramfs_inode() use the inode
cache before blindly creating a new entry every time.  This eliminates
the duplicate inodes and duplicate buffer cache.

-- 
Dave Johnson
Starent Networks


= fs/cramfs/inode.c 1.42 vs edited =
--- 1.42/fs/cramfs/inode.c  2005-07-14 12:24:48 -04:00
+++ edited/fs/cramfs/inode.c2005-08-19 15:39:05 -04:00
@@ -44,10 +44,10 @@
 
 static struct inode *get_cramfs_inode(struct super_block *sb, struct 
cramfs_inode * cramfs_inode)
 {
-   struct inode * inode = new_inode(sb);
+   struct inode * inode = iget_locked(sb, CRAMINO(cramfs_inode));
static struct timespec zerotime;
 
-   if (inode) {
+   if (inode  (inode-i_state  I_NEW)) {
inode-i_mode = cramfs_inode-mode;
inode-i_uid = cramfs_inode-uid;
inode-i_size = cramfs_inode-size;
@@ -57,11 +57,6 @@
/* Struct copy intentional */
inode-i_mtime = inode-i_atime = inode-i_ctime = zerotime;
inode-i_ino = CRAMINO(cramfs_inode);
-   /* inode-i_nlink is left 1 - arguably wrong for directories,
-  but it's the best we can do without reading the directory
-  contents.  1 yields the right result in GNU find, even
-  without -noleaf option. */
-   insert_inode_hash(inode);
if (S_ISREG(inode-i_mode)) {
inode-i_fop = generic_ro_fops;
inode-i_data.a_ops = cramfs_aops;
@@ -76,6 +72,7 @@
init_special_inode(inode, inode-i_mode,
old_decode_dev(cramfs_inode-size));
}
+   unlock_new_inode(inode);
}
return inode;
 }

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Al Viro wrote:
 
 FWIW, I'd rather take page_symlink(), page_symlink_inode_operations,
 page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(),
 generic_readlink() and vfs_readlink() to the same place where these guys
 would live.  They all belong together and none of them has any business
 in fs/namei.c.  Options: fs/libfs.c or separate library since fs/libfs.c
 is getting crowded.  Linus, do you have any objections to that or suggestions
 on filename here?

I'm not sure if this merits a new file or new organization (hey,
fs/lib/xxx might be good in theory). In particular, I had actually been
hoping to release 2.6.13 today, but this seems like a valid thing to hold 
things up for - but not if we're going to re-organize things.

The one thing that strikes me is that we might actually have less pain if
we just changed the calling convention for follow_link/put_link slightly
instead of creating a new library function. The existing page cache  
thing really _does_ work very well, and would work fine for NFS and ncpfs
too, if we just allowed an extra cookie to be passed along from
follow_link() to put_link().

A patch like this (totally untested, and you'd need to update any
filesystems that don't use the regular page_follow_link interface) would
seem to clean up the mess nicely.. The basic change is that follow_link() 
returns a error-pointer cookie instead of just zero or error, and that is 
passed into put_link().

That simple calling convention change solves all problems. It so _happens_ 
that any old binary code also continues to work (the cookie will be zero, 
and put_link will ignore it), so it shouldn't even break any unconverted 
stuff (unless they mix using their own functions _and_ using the helpher 
functions, which is of course possible).

The shouldn't break nonconverted filesystems makes me think this is a 
safe change. Comments?

NOTE NOTE NOTE! Let me say again that it's untested. It might not break 
nonconverted filesystems, but it equally well migth break even the 
converted ones ;)

Linus


diff --git a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -501,6 +501,7 @@ struct path {
 static inline int __do_follow_link(struct path *path, struct nameidata *nd)
 {
int error;
+   void *cookie;
struct dentry *dentry = path-dentry;
 
touch_atime(path-mnt, dentry);
@@ -508,13 +509,15 @@ static inline int __do_follow_link(struc
 
if (path-mnt == nd-mnt)
mntget(path-mnt);
-   error = dentry-d_inode-i_op-follow_link(dentry, nd);
-   if (!error) {
+   cookie = dentry-d_inode-i_op-follow_link(dentry, nd);
+   error = PTR_ERR(cookie);
+   if (!IS_ERR(cookie)) {
char *s = nd_get_link(nd);
+   error = 0;
if (s)
error = __vfs_follow_link(nd, s);
if (dentry-d_inode-i_op-put_link)
-   dentry-d_inode-i_op-put_link(dentry, nd);
+   dentry-d_inode-i_op-put_link(dentry, nd, cookie);
}
dput(dentry);
mntput(path-mnt);
@@ -2345,14 +2348,17 @@ int generic_readlink(struct dentry *dent
 {
struct nameidata nd;
int res;
+   void *cookie;
+
nd.depth = 0;
-   res = dentry-d_inode-i_op-follow_link(dentry, nd);
-   if (!res) {
+   cookie = dentry-d_inode-i_op-follow_link(dentry, nd);
+   if (!IS_ERR(cookie)) {
res = vfs_readlink(dentry, buffer, buflen, nd_get_link(nd));
if (dentry-d_inode-i_op-put_link)
-   dentry-d_inode-i_op-put_link(dentry, nd);
+   dentry-d_inode-i_op-put_link(dentry, nd, cookie);
+   cookie = ERR_PTR(0);
}
-   return res;
+   return PTR_ERR(cookie);
 }
 
 int vfs_follow_link(struct nameidata *nd, const char *link)
@@ -2395,23 +2401,19 @@ int page_readlink(struct dentry *dentry,
return res;
 }
 
-int page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 {
struct page *page;
nd_set_link(nd, page_getlink(dentry, page));
-   return 0;
+   return page;
 }
 
-void page_put_link(struct dentry *dentry, struct nameidata *nd)
+void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
 {
if (!IS_ERR(nd_get_link(nd))) {
-   struct page *page;
-   page = find_get_page(dentry-d_inode-i_mapping, 0);
-   if (!page)
-   BUG();
+   struct page *page = cookie;
kunmap(page);
page_cache_release(page);
-   page_cache_release(page);
}
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -993,8 +993,8 @@ struct inode_operations {
int (*rename) (struct inode *, 

Re: [RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Jeff Mahoney
Christoph Hellwig wrote:
 Very nice, and gets rid of lots of crap.  Now that we started parsing
 the attribute name in generic code we should deprecate the old
 -{get,set,list,remove}xattr inode operations and make the helpers
 James added a while ago mandatory for the future.

I'd be fine with making the generic xattr routines the default with a
few exceptions:

1) Pass the entire name of the xattr (including prefix) to the
underlying filesystem. ReiserFS needs the entire name, including the
prefix, to look up an xattr. Reconstructing the name again, when it was
available in the caller, is wasteful. Once the determination of the
prefix is made, the filesystem code can just skip the prefix length
anyway if it doesn't need it.

2) Keep i_op-listxattr. For xattrs like system.posix_acl_access, sure,
it makes sense. But when selinux or user xattrs are introduced, the name
and number of such xattrs will be unknown to a generic routine.

I have a series of patches for reiserfs xattrs that I'm still working
the kinks out of, mainly performance-wise. With the patch set applied,
reiserfs_{set,get,remove}xattr look identical to the generic versions,
except that find_xattr_handler_prefix doesn't modify the name.

-Jeff

-- 
Jeff Mahoney
SuSE Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Chris Wright
* Christoph Hellwig ([EMAIL PROTECTED]) wrote:
 On Fri, Aug 19, 2005 at 01:57:56PM -0400, Stephen Smalley wrote:
  Note that this
  approach may be controversial [1]; it has been suggested that we
  should instead be modifying all filesystem types to support security
  (and other) xattrs natively, but this seems questionable for legacy
  filesystems like vfat and pseudo filesystems like proc, especially
  when the resulting code will end up simply calling the security
  framework to access the incore security label as with the current
  devpts and tmpfs handlers.

Agreed, I think the counter points that were made were not reasonable.

  The patch restructures the code flow slightly to reduce duplication
  between the normal path and the fallback path, but this should only have
  one user-visible side effect - a program may get -EACCES rather than
  -EOPNOTSUPP if policy denied access but the filesystem didn't support
  the operation anyway.  Note that the post_setxattr hook call is not
  needed in the fallback case, as the inode_setsecurity hook call handles
  the incore inode security state update directly.  In contrast, we do
  call fsnotify in both cases.
  
  Let me know what you think.  Please do NOT apply yet.
 
 Very nice, and gets rid of lots of crap.  Now that we started parsing
 the attribute name in generic code we should deprecate the old
 -{get,set,list,remove}xattr inode operations and make the helpers
 James added a while ago mandatory for the future.

I agree it's a nice cleanup.  The only thing I didn't care for was parsing
name in generic code (since it was special cased as the only name), but
you make a great point.  There are toplevel generic names which each fs
has to parse anyway.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 19 Aug 2005, Linus Torvalds wrote:
 On Fri, 19 Aug 2005, Linus Torvalds wrote:
  
   - document this as a fundamental fact, and apply the ncpfs patch. Local 
 filesystems can still continue to use the generic helper functions 
 (all other users _are_ local filesystems).
 
 Actually, looking at the ncpfs patch, I'd rather not apply that patch 
 as-is. It looks like it will totally disable symlink caching, which would 
 be kind of sad. Somebody willing to do the same thing NFS does?

It does disable link caching.  But I didn't make this up.  This is exactly 
what smbfs uses.  I just copied smbfs given ncpfs copies almost everything 
smbfs does anyway...

Best regards,

Anton
-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/  http://www-stu.christs.cam.ac.uk/~aia21/
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
 
 It does disable link caching.  But I didn't make this up.  This is exactly 
 what smbfs uses.  I just copied smbfs given ncpfs copies almost everything 
 smbfs does anyway...

Can you test whether the untested test-patch I sent out seems to work for 
your case that bugs out? You'll probably get a few new initialization 
from incompatible pointer type warnings, but I do believe that they 
should be harmless at least on normal architectures.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Christoph Hellwig
On Fri, Aug 19, 2005 at 08:43:06PM +0100, Al Viro wrote:
 On Fri, Aug 19, 2005 at 08:41:29PM +0100, Matthew Wilcox wrote:
   is getting crowded.  Linus, do you have any objections to that or 
   suggestions
   on filename here?
  
  fs/symlink.c?
 
 Or fs/lib/symlink.c...

That's a very good idea.  We were in need of fs/lib/ for a long time
to unwind all the generic routines of the VFS logic.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 19 Aug 2005, Linus Torvalds wrote:
 The one thing that strikes me is that we might actually have less pain if
 we just changed the calling convention for follow_link/put_link slightly
 instead of creating a new library function. The existing page cache  
 thing really _does_ work very well, and would work fine for NFS and ncpfs
 too, if we just allowed an extra cookie to be passed along from
 follow_link() to put_link().
 
 A patch like this (totally untested, and you'd need to update any
 filesystems that don't use the regular page_follow_link interface) would
 seem to clean up the mess nicely.. The basic change is that follow_link() 
 returns a error-pointer cookie instead of just zero or error, and that is 
 passed into put_link().
 
 That simple calling convention change solves all problems. It so _happens_ 
 that any old binary code also continues to work (the cookie will be zero, 
 and put_link will ignore it), so it shouldn't even break any unconverted 
 stuff (unless they mix using their own functions _and_ using the helpher 
 functions, which is of course possible).
 
 The shouldn't break nonconverted filesystems makes me think this is a 
 safe change. Comments?
 
 NOTE NOTE NOTE! Let me say again that it's untested. It might not break 
 nonconverted filesystems, but it equally well migth break even the 
 converted ones ;)
 
   Linus
 
 
 diff --git a/fs/namei.c b/fs/namei.c
 --- a/fs/namei.c
 +++ b/fs/namei.c
 @@ -501,6 +501,7 @@ struct path {
  static inline int __do_follow_link(struct path *path, struct nameidata *nd)
  {
   int error;
 + void *cookie;
   struct dentry *dentry = path-dentry;
  
   touch_atime(path-mnt, dentry);
 @@ -508,13 +509,15 @@ static inline int __do_follow_link(struc
  
   if (path-mnt == nd-mnt)
   mntget(path-mnt);
 - error = dentry-d_inode-i_op-follow_link(dentry, nd);
 - if (!error) {
 + cookie = dentry-d_inode-i_op-follow_link(dentry, nd);
 + error = PTR_ERR(cookie);
 + if (!IS_ERR(cookie)) {
   char *s = nd_get_link(nd);
 + error = 0;
   if (s)
   error = __vfs_follow_link(nd, s);
   if (dentry-d_inode-i_op-put_link)
 - dentry-d_inode-i_op-put_link(dentry, nd);
 + dentry-d_inode-i_op-put_link(dentry, nd, cookie);
   }
   dput(dentry);
   mntput(path-mnt);
 @@ -2345,14 +2348,17 @@ int generic_readlink(struct dentry *dent
  {
   struct nameidata nd;
   int res;
 + void *cookie;
 +
   nd.depth = 0;
 - res = dentry-d_inode-i_op-follow_link(dentry, nd);
 - if (!res) {
 + cookie = dentry-d_inode-i_op-follow_link(dentry, nd);
 + if (!IS_ERR(cookie)) {
   res = vfs_readlink(dentry, buffer, buflen, nd_get_link(nd));
   if (dentry-d_inode-i_op-put_link)
 - dentry-d_inode-i_op-put_link(dentry, nd);
 + dentry-d_inode-i_op-put_link(dentry, nd, cookie);
 + cookie = ERR_PTR(0);

You are throwing away the error return from vfs_readlink().  I suspect you 
wanted:

+   cookie = ERR_PTR(res);

   }
 - return res;
 + return PTR_ERR(cookie);
  }
  
  int vfs_follow_link(struct nameidata *nd, const char *link)
 @@ -2395,23 +2401,19 @@ int page_readlink(struct dentry *dentry,
   return res;
  }
  
 -int page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 +void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
  {
   struct page *page;
   nd_set_link(nd, page_getlink(dentry, page));
 - return 0;
 + return page;
  }
  
 -void page_put_link(struct dentry *dentry, struct nameidata *nd)
 +void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
  {
   if (!IS_ERR(nd_get_link(nd))) {
 - struct page *page;
 - page = find_get_page(dentry-d_inode-i_mapping, 0);
 - if (!page)
 - BUG();
 + struct page *page = cookie;
   kunmap(page);
   page_cache_release(page);
 - page_cache_release(page);
   }
  }
  
 diff --git a/include/linux/fs.h b/include/linux/fs.h
 --- a/include/linux/fs.h
 +++ b/include/linux/fs.h
 @@ -993,8 +993,8 @@ struct inode_operations {
   int (*rename) (struct inode *, struct dentry *,
   struct inode *, struct dentry *);
   int (*readlink) (struct dentry *, char __user *,int);
 - int (*follow_link) (struct dentry *, struct nameidata *);
 - void (*put_link) (struct dentry *, struct nameidata *);
 + void * (*follow_link) (struct dentry *, struct nameidata *);
 + void (*put_link) (struct dentry *, struct nameidata *, void *);
   void (*truncate) (struct inode *);
   int (*permission) (struct inode *, int, struct nameidata *);
   int (*setattr) (struct dentry *, struct iattr *);
 @@ -1602,8 +1602,8 @@ extern struct 

Re: [PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Phillip Lougher

Dave Johnson wrote:



Patch below fixes this by making get_cramfs_inode() use the inode
cache before blindly creating a new entry every time.  This eliminates
the duplicate inodes and duplicate buffer cache.


 +  struct inode * inode = iget_locked(sb, CRAMINO(cramfs_inode));

Doesn't iget_locked() assume inode numbers are unique?

In Cramfs inode numbers are set to 1 for non-data inodes (fifos, 
sockets, devices, empty directories), i.e


%stat device namedpipe
  File: `device'
  Size: 0   Blocks: 0  IO Block: 4096   character 
special file

Device: 700h/1792d  Inode: 1   Links: 1 Device type: 1,1
Access: (0644/crw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 1970-01-01 01:00:00.0 +0100
Modify: 1970-01-01 01:00:00.0 +0100
Change: 1970-01-01 01:00:00.0 +0100
  File: `namedpipe'
  Size: 0   Blocks: 0  IO Block: 4096   fifo
Device: 700h/1792d  Inode: 1   Links: 1
Access: (0644/prw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 1970-01-01 01:00:00.0 +0100
Modify: 1970-01-01 01:00:00.0 +0100
Change: 1970-01-01 01:00:00.0 +0100

Should iget5_locked() be used here?

Phillip
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
 
 Yes, sure.  I have applied your patch to our 2.6.11.4 tree (with the one 
 liner change I emailed you just now) and have kicked off a compile.

Actually, hold on. The original patch had another problem: it returned an
uninitialized page pointer when page_getlink() failed.

This one should have that fixed, and has converted a few other 
filesystems. Most of them trivially, but I took the opportunity to just 
simplify NFS while I was at it, since it now has no reason to need to save 
off the struct page * any more.

It's still not tested, but at least I've looked at it a bit more ;)

Linus

---
diff --git a/fs/autofs/symlink.c b/fs/autofs/symlink.c
--- a/fs/autofs/symlink.c
+++ b/fs/autofs/symlink.c
@@ -12,11 +12,12 @@
 
 #include autofs_i.h
 
-static int autofs_follow_link(struct dentry *dentry, struct nameidata *nd)
+/* Nothing to release.. */
+static void *autofs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
char *s=((struct autofs_symlink *)dentry-d_inode-u.generic_ip)-data;
nd_set_link(nd, s);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations autofs_symlink_inode_operations = {
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -83,8 +83,8 @@ extern int cifs_dir_notify(struct file *
 extern struct dentry_operations cifs_dentry_ops;
 
 /* Functions related to symlinks */
-extern int cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
-extern void cifs_put_link(struct dentry *direntry, struct nameidata *nd);
+extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern void cifs_put_link(struct dentry *direntry, struct nameidata *nd, void 
*);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer, 
 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -92,7 +92,7 @@ cifs_hl_exit:
return rc;
 }
 
-int
+void *
 cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
 {
struct inode *inode = direntry-d_inode;
@@ -148,7 +148,7 @@ out:
 out_no_free:
FreeXid(xid);
nd_set_link(nd, target_path);
-   return 0;
+   return NULL;/* No cookie */
 }
 
 int
@@ -330,7 +330,7 @@ cifs_readlink(struct dentry *direntry, c
return rc;
 }
 
-void cifs_put_link(struct dentry *direntry, struct nameidata *nd)
+void cifs_put_link(struct dentry *direntry, struct nameidata *nd, void *cookie)
 {
char *p = nd_get_link(nd);
if (!IS_ERR(p))
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,11 +21,11 @@
 #include xattr.h
 #include linux/namei.h
 
-static int ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct ext2_inode_info *ei = EXT2_I(dentry-d_inode);
nd_set_link(nd, (char *)ei-i_data);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations ext2_symlink_inode_operations = {
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -23,11 +23,11 @@
 #include linux/namei.h
 #include xattr.h
 
-static int ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct ext3_inode_info *ei = EXT3_I(dentry-d_inode);
nd_set_link(nd, (char*)ei-i_data);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations ext3_symlink_inode_operations = {
diff --git a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -501,6 +501,7 @@ struct path {
 static inline int __do_follow_link(struct path *path, struct nameidata *nd)
 {
int error;
+   void *cookie;
struct dentry *dentry = path-dentry;
 
touch_atime(path-mnt, dentry);
@@ -508,13 +509,15 @@ static inline int __do_follow_link(struc
 
if (path-mnt == nd-mnt)
mntget(path-mnt);
-   error = dentry-d_inode-i_op-follow_link(dentry, nd);
-   if (!error) {
+   cookie = dentry-d_inode-i_op-follow_link(dentry, nd);
+   error = PTR_ERR(cookie);
+   if (!IS_ERR(cookie)) {
char *s = nd_get_link(nd);
+   error = 0;
if (s)
error = __vfs_follow_link(nd, s);
if (dentry-d_inode-i_op-put_link)
-   dentry-d_inode-i_op-put_link(dentry, nd);
+   dentry-d_inode-i_op-put_link(dentry, nd, cookie);
}
dput(dentry);
mntput(path-mnt);
@@ -2344,15 +2347,17 @@ out:
 int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
struct nameidata nd;
-   int res;
+   void *cookie;
+
nd.depth = 0;
-   

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 03:04:52PM -0700, Linus Torvalds wrote:
 
 
 On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
  
  Yes, sure.  I have applied your patch to our 2.6.11.4 tree (with the one 
  liner change I emailed you just now) and have kicked off a compile.
 
 Actually, hold on. The original patch had another problem: it returned an
 uninitialized page pointer when page_getlink() failed.
 
 This one should have that fixed, and has converted a few other 
 filesystems. Most of them trivially, but I took the opportunity to just 
 simplify NFS while I was at it, since it now has no reason to need to save 
 off the struct page * any more.
 
 It's still not tested, but at least I've looked at it a bit more ;)

That looks OK except for
* jffs2 is b0rken (see patch in another mail)
* afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
smbfs, sysvfs, ufs, xfs - prototype change for -follow_link()
* befs, smbfs, xfs - same for -put_link()
* ncpfs fix is actually missing here

Prototype changes are covered by patch below (incremental on top of your +
jffs2 fix upthread).  No ncpfs changes - these will go separately, assuming
you haven't done them yet; just a plain janitor stuff.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Sat, Aug 20, 2005 at 12:15:42AM +0100, Al Viro wrote:
 
 That looks OK except for
   * jffs2 is b0rken (see patch in another mail)
   * afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
 smbfs, sysvfs, ufs, xfs - prototype change for -follow_link()
   * befs, smbfs, xfs - same for -put_link()
   * ncpfs fix is actually missing here
 
 Prototype changes are covered by patch below (incremental on top of your +
 jffs2 fix upthread).  No ncpfs changes - these will go separately, assuming
 you haven't done them yet; just a plain janitor stuff.


Gaack...  And here's the patch itself - sorry.

diff -urN RC13-rc6-git10-base/fs/afs/mntpt.c current/fs/afs/mntpt.c
--- RC13-rc6-git10-base/fs/afs/mntpt.c  2005-06-17 15:48:29.0 -0400
+++ current/fs/afs/mntpt.c  2005-08-19 19:02:48.0 -0400
@@ -30,7 +30,7 @@
   struct dentry *dentry,
   struct nameidata *nd);
 static int afs_mntpt_open(struct inode *inode, struct file *file);
-static int afs_mntpt_follow_link(struct dentry *dentry, struct nameidata *nd);
+static void *afs_mntpt_follow_link(struct dentry *dentry, struct nameidata 
*nd);
 
 struct file_operations afs_mntpt_file_operations = {
.open   = afs_mntpt_open,
@@ -233,7 +233,7 @@
 /*
  * follow a link from a mountpoint directory, thus causing it to be mounted
  */
-static int afs_mntpt_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *afs_mntpt_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct vfsmount *newmnt;
struct dentry *old_dentry;
@@ -249,7 +249,7 @@
newmnt = afs_mntpt_do_automount(dentry);
if (IS_ERR(newmnt)) {
path_release(nd);
-   return PTR_ERR(newmnt);
+   return (void *)newmnt;
}
 
old_dentry = nd-dentry;
@@ -267,7 +267,7 @@
}
 
kleave( = %d, err);
-   return err;
+   return ERR_PTR(err);
 } /* end afs_mntpt_follow_link() */
 
 /*/
diff -urN RC13-rc6-git10-base/fs/autofs4/symlink.c current/fs/autofs4/symlink.c
--- RC13-rc6-git10-base/fs/autofs4/symlink.c2005-06-17 15:48:29.0 
-0400
+++ current/fs/autofs4/symlink.c2005-08-19 19:03:11.0 -0400
@@ -12,11 +12,11 @@
 
 #include autofs_i.h
 
-static int autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct autofs_info *ino = autofs4_dentry_ino(dentry);
nd_set_link(nd, (char *)ino-u.symlink);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations autofs4_symlink_inode_operations = {
diff -urN RC13-rc6-git10-base/fs/befs/linuxvfs.c current/fs/befs/linuxvfs.c
--- RC13-rc6-git10-base/fs/befs/linuxvfs.c  2005-06-17 15:48:29.0 
-0400
+++ current/fs/befs/linuxvfs.c  2005-08-19 19:03:52.0 -0400
@@ -41,8 +41,8 @@
 static void befs_destroy_inode(struct inode *inode);
 static int befs_init_inodecache(void);
 static void befs_destroy_inodecache(void);
-static int befs_follow_link(struct dentry *, struct nameidata *);
-static void befs_put_link(struct dentry *, struct nameidata *);
+static void *befs_follow_link(struct dentry *, struct nameidata *);
+static void befs_put_link(struct dentry *, struct nameidata *, void *);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -487,10 +487,10 @@
}
 
nd_set_link(nd, link);
-   return 0;
+   return NULL;
 }
 
-static void befs_put_link(struct dentry *dentry, struct nameidata *nd)
+static void befs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
 {
befs_inode_info *befs_ino = BEFS_I(dentry-d_inode);
if (befs_ino-i_flags  BEFS_LONG_SYMLINK) {
diff -urN RC13-rc6-git10-base/fs/devfs/base.c current/fs/devfs/base.c
--- RC13-rc6-git10-base/fs/devfs/base.c 2005-06-17 15:48:29.0 -0400
+++ current/fs/devfs/base.c 2005-08-19 19:04:17.0 -0400
@@ -2491,11 +2491,11 @@
return 0;
 }  /*  End Function devfs_mknod  */
 
-static int devfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *devfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct devfs_entry *p = get_devfs_entry_from_vfs_inode(dentry-d_inode);
nd_set_link(nd, p ? p-u.symlink.linkname : ERR_PTR(-ENODEV));
-   return 0;
+   return NULL;
 }  /*  End Function devfs_follow_link  */
 
 static struct inode_operations devfs_iops = {
diff -urN RC13-rc6-git10-base/fs/freevxfs/vxfs_immed.c 
current/fs/freevxfs/vxfs_immed.c
--- RC13-rc6-git10-base/fs/freevxfs/vxfs_immed.c2005-06-17 
15:48:29.0 

Re: [PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Dave Johnson
Phillip Lougher writes:
 Doesn't iget_locked() assume inode numbers are unique?
 
 In Cramfs inode numbers are set to 1 for non-data inodes (fifos, 
 sockets, devices, empty directories), i.e
 
 %stat device namedpipe
File: `device'
Size: 0   Blocks: 0  IO Block: 4096   character 
 special file
 Device: 700h/1792d  Inode: 1   Links: 1 Device type: 1,1
 Access: (0644/crw-r--r--)  Uid: (0/root)   Gid: (0/root)
 Access: 1970-01-01 01:00:00.0 +0100
 Modify: 1970-01-01 01:00:00.0 +0100
 Change: 1970-01-01 01:00:00.0 +0100
File: `namedpipe'
Size: 0   Blocks: 0  IO Block: 4096   fifo
 Device: 700h/1792d  Inode: 1   Links: 1
 Access: (0644/prw-r--r--)  Uid: (0/root)   Gid: (0/root)
 Access: 1970-01-01 01:00:00.0 +0100
 Modify: 1970-01-01 01:00:00.0 +0100
 Change: 1970-01-01 01:00:00.0 +0100
 
 Should iget5_locked() be used here?

Yep, that was busted.  Below patch should be better.

-- 
Dave Johnson
Starent Networks


= fs/cramfs/inode.c 1.42 vs edited =
--- 1.42/fs/cramfs/inode.c  2005-07-14 12:24:48 -04:00
+++ edited/fs/cramfs/inode.c2005-08-19 18:47:28 -04:00
@@ -42,12 +42,43 @@
 #define CRAMINO(x) ((x)-offset?(x)-offset2:1)
 #define OFFSET(x)  ((x)-i_ino)
 
+
+static int cramfs_iget5_test(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+
+   if (inode-i_ino != 1)
+   return 1;
+
+   /* all empty directories, char, block, pipe, and sock, share inode #1 */
+  
+   if ((inode-i_mode != cramfs_inode-mode) ||
+   (inode-i_gid != cramfs_inode-gid) ||
+   (inode-i_uid != cramfs_inode-uid))
+   return 0; /* does not match */
+  
+   if ((S_ISCHR(inode-i_mode) || S_ISBLK(inode-i_mode)) 
+   (inode-i_rdev != old_decode_dev(cramfs_inode-size)))
+   return 0; /* does not match */
+
+   return 1; /* matches */
+}
+
+static int cramfs_iget5_set(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+   inode-i_ino = CRAMINO(cramfs_inode);
+   return 0;
+}
+
 static struct inode *get_cramfs_inode(struct super_block *sb, struct 
cramfs_inode * cramfs_inode)
 {
-   struct inode * inode = new_inode(sb);
+   struct inode * inode = iget5_locked(sb, CRAMINO(cramfs_inode),
+   cramfs_iget5_test, cramfs_iget5_set,
+   cramfs_inode);
static struct timespec zerotime;
 
-   if (inode) {
+   if (inode  (inode-i_state  I_NEW)) {
inode-i_mode = cramfs_inode-mode;
inode-i_uid = cramfs_inode-uid;
inode-i_size = cramfs_inode-size;
@@ -59,11 +92,6 @@
/* Struct copy intentional */
inode-i_mtime = inode-i_atime = inode-i_ctime = zerotime;
inode-i_ino = CRAMINO(cramfs_inode);
-   /* inode-i_nlink is left 1 - arguably wrong for directories,
-  but it's the best we can do without reading the directory
-  contents.  1 yields the right result in GNU find, even
-  without -noleaf option. */
-   insert_inode_hash(inode);
if (S_ISREG(inode-i_mode)) {
inode-i_fop = generic_ro_fops;
inode-i_data.a_ops = cramfs_aops;
@@ -75,6 +104,7 @@
init_special_inode(inode, inode-i_mode,
old_decode_dev(cramfs_inode-size));
}
+   unlock_new_inode(inode);
}
return inode;
 }

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 06:08:12PM -0700, Linus Torvalds wrote:
 
 
 On Sat, 20 Aug 2005, Al Viro wrote:
  
  That looks OK except for
  * ncpfs fix is actually missing here
 
 Well, the thing is, with the change to page_follow_link() and 
 page_put_link(), ncpfs should now work fine - it doesn't need any fixing 
 any more.

Ah - right, it's using normal methods...
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Dave Johnson
Dave Johnson writes:
 Phillip Lougher writes:
  Doesn't iget_locked() assume inode numbers are unique?
  
  In Cramfs inode numbers are set to 1 for non-data inodes (fifos, 
  sockets, devices, empty directories), i.e
  
  %stat device namedpipe
 File: `device'
 Size: 0   Blocks: 0  IO Block: 4096   character 
  special file
  Device: 700h/1792d  Inode: 1   Links: 1 Device type: 1,1
  Access: (0644/crw-r--r--)  Uid: (0/root)   Gid: (0/root)
  Access: 1970-01-01 01:00:00.0 +0100
  Modify: 1970-01-01 01:00:00.0 +0100
  Change: 1970-01-01 01:00:00.0 +0100
 File: `namedpipe'
 Size: 0   Blocks: 0  IO Block: 4096   fifo
  Device: 700h/1792d  Inode: 1   Links: 1
  Access: (0644/prw-r--r--)  Uid: (0/root)   Gid: (0/root)
  Access: 1970-01-01 01:00:00.0 +0100
  Modify: 1970-01-01 01:00:00.0 +0100
  Change: 1970-01-01 01:00:00.0 +0100
  
  Should iget5_locked() be used here?
 
 Yep, that was busted.  Below patch should be better.

Doh, 2nd attempt wasn't verifying the inode number in the test
function to protect against hash collisions (iget_locked does this but
iget5_locked doesn't)

It should be good now.

-- 
Dave Johnson
Starent Networks

= fs/cramfs/inode.c 1.42 vs edited =
--- 1.42/fs/cramfs/inode.c  2005-07-14 12:24:48 -04:00
+++ edited/fs/cramfs/inode.c2005-08-19 22:06:44 -04:00
@@ -42,12 +42,46 @@
 #define CRAMINO(x) ((x)-offset?(x)-offset2:1)
 #define OFFSET(x)  ((x)-i_ino)
 
+
+static int cramfs_iget5_test(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+
+   if (inode-i_ino != CRAMINO(cramfs_inode))
+   return 0; /* does not match */
+
+   if (inode-i_ino != 1)
+   return 1;
+
+   /* all empty directories, char, block, pipe, and sock, share inode #1 */
+  
+   if ((inode-i_mode != cramfs_inode-mode) ||
+   (inode-i_gid != cramfs_inode-gid) ||
+   (inode-i_uid != cramfs_inode-uid))
+   return 0; /* does not match */
+  
+   if ((S_ISCHR(inode-i_mode) || S_ISBLK(inode-i_mode)) 
+   (inode-i_rdev != old_decode_dev(cramfs_inode-size)))
+   return 0; /* does not match */
+
+   return 1; /* matches */
+}
+
+static int cramfs_iget5_set(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+   inode-i_ino = CRAMINO(cramfs_inode);
+   return 0;
+}
+
 static struct inode *get_cramfs_inode(struct super_block *sb, struct 
cramfs_inode * cramfs_inode)
 {
-   struct inode * inode = new_inode(sb);
+   struct inode * inode = iget5_locked(sb, CRAMINO(cramfs_inode),
+   cramfs_iget5_test, cramfs_iget5_set,
+   cramfs_inode);
static struct timespec zerotime;
 
-   if (inode) {
+   if (inode  (inode-i_state  I_NEW)) {
inode-i_mode = cramfs_inode-mode;
inode-i_uid = cramfs_inode-uid;
inode-i_size = cramfs_inode-size;
@@ -63,7 +99,6 @@
   but it's the best we can do without reading the directory
   contents.  1 yields the right result in GNU find, even
   without -noleaf option. */
-   insert_inode_hash(inode);
if (S_ISREG(inode-i_mode)) {
inode-i_fop = generic_ro_fops;
inode-i_data.a_ops = cramfs_aops;
@@ -75,6 +110,7 @@
init_special_inode(inode, inode-i_mode,
old_decode_dev(cramfs_inode-size));
}
+   unlock_new_inode(inode);
}
return inode;
 }

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mirror a file system on the fly

2005-08-19 Thread Dave Quigley

Dave Schwartz wrote:


Ram,

Your code snippet seems to work great as discussed. Thanks. :-)
However, my requirement is slightly different.  What I also want is
that any file created from the mirrored/cloned file-system must not be
available in the parent file system.

Gracias,
decebel


On 8/18/05, Ram Pai [EMAIL PROTECTED] wrote:
 


On Thu, 2005-08-18 at 13:27, Dave Schwartz wrote:
   


Hi Ram,
Thanks for the inputs. I was going over the man pages describing the
clone system call and its option of CLONE_NEWNS. Could understand the
description only in parts.

The man page suggests that this flag when set, the cloned child is
started in a new name space, initialized with a copy of the parent.
Now does that mean, a program like a shell when cloned with
CLONE_NEWNS set, will have a copy of file hierarchy of the underlying
parent process?
 


Yes the child process will see an exact copy of all the mounts of
various filesystems as that of the parent. However if you mount/unmount
any filesystems in the child, the same will not be mounted/unmounted in
the parent and vice-versa.  Each has its individual view of the
the filesystem heirarchy.

Try the following program that clones off a child process with a mirror
namespace and gives you a bash prompt. Try mounting and unmounting
in this bash prompt and see if the same is visible in a totally
different window.


#include  stdio.h
#include  signal.h
#include  sched.h

char somemem[4096];

int myfunc(){
   system(bash);
}

int
main(int argc, char *argv[])
{
   if(clone(myfunc, somemem, CLONE_NEWNS|SIGCHLD, NULL)) {
   wait(NULL);
   } else {
   printf(clone failed\n);
   }
   printf(exit\n);
}


Hope this helps,
RP




   


Gracias,
decebel



On 8/19/05, Ram Pai [EMAIL PROTECTED] wrote:
 


On Thu, 2005-08-18 at 12:40, Dave Schwartz wrote:
   


Hi list,

Not too sure if this is the right forum to ask this question but since
my requirement is around linux filesystems, I shall take this liberty
to post my question.

My requirement is to develop a kernel/user space module to add an
extension to the shell program environment such that this shell forks
a mirror look-alike filesystem of the underlying OS to the programs
run in that particular shell.
 


u seem to be talking about namespaces, if I get you right.

there is a flag CLONE_NEWNS to the system call 'clone' which does what
u r talking about.

RP




   


Was trying to look thru the FAQ and a few list archives to look for
ideas around my requirement. The archives were overwhelming.


Any ideas/pointers will be a great help,
Gracias,
decebel
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

   


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

   


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


 

You might want to look into a project I work on called unionfs It 
contains code for sandboxing which is what your looking to do. The 
current code base is available for both 2.4 and 2.6 so you can do this 
with either kernel version.  You can find the source at 
http://www.filesystems.org/project-unionfs.html the split cache source 
is what you should be looking at. It will give you an example of how to 
create a small sandbox.  If you think unionfs itself can be used for 
your purposes Ide suggest asking on their mailing list..


Dave Quigley

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html