Re: 9pfs hangs since 4.7
Any idea of what xfstests is doing at this point in time? I'd be a bit worried about some sort of loop in the namespace since it seems to be in path traversal. Could also be some sort of resource leak or fragmentation, I'll admit that many of the regression tests we do are fairly short in duration. Another approach would be to look at doing this with a different server (over a network link instead of virtio) to isolate it as a client versus server side problem (although from the looks of things this does seem to be a client issue).

On Thu, Nov 24, 2016 at 1:50 PM, Tuomas Tynkkynen wrote:
> Hi fsdevel,
>
> I have been observing hangs when running xfstests generic/224. Curiously
> enough, the test is *not* causing problems on the FS under test (I've
> tried both ext4 and f2fs) but instead it's causing the 9pfs that I'm
> using as the root filesystem to crap out.
>
> How it shows up is that the test doesn't finish in time (usually
> takes ~50 sec) but the hung task detector triggers for some task in
> d_alloc_parallel():
>
> [ 660.701646] INFO: task 224:7800 blocked for more than 300 seconds.
> [ 660.702756] Not tainted 4.9.0-rc5 #1-NixOS
> [ 660.703232] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 660.703927] 224 D0 7800549 0x
> [ 660.704501] 8a82ec022800 8a82fc03c800
> 8a82ff217dc0
> [ 660.705302] 8a82d0f88c00 a94a41a27b88 aeb4ad1d
> a94a41a27b78
> [ 660.706125] ae800fc6 8a82fbd90f08 8a82d0f88c00
> 8a82fbfd5418
> [ 660.706924] Call Trace:
> [ 660.707185] [] ? __schedule+0x18d/0x640
> [ 660.707751] [] ? __d_alloc+0x126/0x1e0
> [ 660.708304] [] schedule+0x36/0x80
> [ 660.708841] [] d_alloc_parallel+0x3a7/0x480
> [ 660.709454] [] ? wake_up_q+0x70/0x70
> [ 660.710007] [] lookup_slow+0x73/0x140
> [ 660.710572] [] walk_component+0x1ca/0x2f0
> [ 660.711167] [] ? path_init+0x1d9/0x330
> [ 660.711747] [] ? mntput+0x24/0x40
> [ 660.716962] [] path_lookupat+0x5d/0x110
> [ 660.717581] [] filename_lookup+0x9e/0x150
> [ 660.718194] [] ? kmem_cache_alloc+0x156/0x1b0
> [ 660.719037] [] ? getname_flags+0x56/0x1f0
> [ 660.719801] [] ? getname_flags+0x72/0x1f0
> [ 660.720492] [] user_path_at_empty+0x36/0x40
> [ 660.721206] [] vfs_fstatat+0x53/0xa0
> [ 660.721980] [] SYSC_newstat+0x1f/0x40
> [ 660.722732] [] SyS_newstat+0xe/0x10
> [ 660.723702] [] entry_SYSCALL_64_fastpath+0x1a/0xa9
>
> SysRq-T is full of things stuck inside p9_client_rpc like:
>
> [ 271.703598] bashS0 100 96 0x
> [ 271.703968] 8a82ff824800 8a82faee4800
> 8a82ff217dc0
> [ 271.704486] 8a82fb946c00 a94a404ebae8 aeb4ad1d
> 8a82fb9fc058
> [ 271.705024] a94a404ebb10 ae8f21f9 8a82fb946c00
> 8a82fbbba000
> [ 271.705542] Call Trace:
> [ 271.705715] [] ? __schedule+0x18d/0x640
> [ 271.706079] [] ? idr_get_empty_slot+0x199/0x3b0
> [ 271.706489] [] schedule+0x36/0x80
> [ 271.706825] [] p9_client_rpc+0x12a/0x460 [9pnet]
> [ 271.707239] [] ? idr_alloc+0x87/0x100
> [ 271.707596] [] ? wake_atomic_t_function+0x60/0x60
> [ 271.708043] [] p9_client_walk+0x77/0x200 [9pnet]
> [ 271.708459] [] v9fs_vfs_lookup.part.16+0x59/0x120 [9p]
> [ 271.708912] [] v9fs_vfs_lookup+0x1f/0x30 [9p]
> [ 271.709308] [] lookup_slow+0x96/0x140
> [ 271.709664] [] walk_component+0x1ca/0x2f0
> [ 271.710036] [] ? path_init+0x1d9/0x330
> [ 271.710390] [] path_lookupat+0x5d/0x110
> [ 271.710763] [] filename_lookup+0x9e/0x150
> [ 271.711136] [] ? mem_cgroup_commit_charge+0x7e/0x4a0
> [ 271.711581] [] ? kmem_cache_alloc+0x156/0x1b0
> [ 271.711977] [] ? getname_flags+0x56/0x1f0
> [ 271.712349] [] ? getname_flags+0x72/0x1f0
> [ 271.712726] [] user_path_at_empty+0x36/0x40
> [ 271.713110] [] vfs_fstatat+0x53/0xa0
> [ 271.713454] [] SYSC_newstat+0x1f/0x40
> [ 271.713810] [] SyS_newstat+0xe/0x10
> [ 271.714150] [] entry_SYSCALL_64_fastpath+0x1a/0xa9
>
> [ 271.729022] sleep S0 218216 0x0002
> [ 271.729391] 8a82fb990800 8a82fc0d8000
> 8a82ff317dc0
> [ 271.729915] 8a82fbbec800 a94a404f3cf8 aeb4ad1d
> 8a82fb9fc058
> [ 271.730426] ec95c1ee08c0 0001 8a82fbbec800
> 8a82fbbba000
> [ 271.730950] Call Trace:
> [ 271.731115] [] ? __schedule+0x18d/0x640
> [ 271.731479] [] schedule+0x36/0x80
> [ 271.731814] [] p9_client_rpc+0x12a/0x460 [9pnet]
> [ 271.732226] [] ? wake_atomic_t_function+0x60/0x60
> [ 271.732649] [] p9_client_clunk+0x38/0xb0 [9pnet]
> [ 271.733061] [] v9fs_dir_release+0x1a/0x30 [9p]
> [ 271.733494] [] __fput+0xdf/0x1f0
> [ 271.733844] [] fput+0xe/0x10
> [ 271.734176] [] task_work_run+0x7e/0xa0
> [ 271.734532] [] do_exit+0x2b9/0xad0
> [ 271.734888] [] ? __do_page_fault+0x287/0x4b0
> [ 271.735276] [] do_group_exit+0x43/0xb0
> [ 271.7
Re: [V9fs-developer] [PATCH] 9p: trans_fd, initialize recv fcall properly if not set
I thought the nature of trans_fd would have prevented any sort of true zero copy, but I suppose one less is always welcome :)

-eric

On Sun, Sep 6, 2015 at 1:55 AM, Dominique Martinet wrote:
> Eric Van Hensbergen wrote on Sat, Sep 05, 2015:
>> On Thu, Sep 3, 2015 at 4:38 AM, Dominique Martinet
>> wrote:
>> > To be honest, I think it might be better to just bail out if we get in
>> > this switch (m->req->rc == NULL after p9_tag_lookup) and not try to
>> > allocate more, because if we get there it's likely a race condition and
>> > silently re-allocating will end up in more troubles than trying to
>> > recover is worth.
>> > Thoughts ?
>> >
>>
>> Hmmm...trying to rattle my brain and remember why I put it in there
>> back in 2008.
>> It might have just been over-defensive programming -- or more likely it just
>> pre-dated all the zero copy infrastructure which pretty much guaranteed we
>> had
>> an rc allocated and what is there is vestigial. I'm happy to accept a
>> patch which
>> makes this an assert, or perhaps just resets the connection because something
>> has gone horribly wrong (similar to the ENOMEM path that is there now).
>
> Yeah, it looks like the safety comes from the zero-copy stuff that came
> much later.
> Let's go with resetting the connection then. Hmm. EIO is a bit too
> generic so would be good to avoid that if possible, but can't think of
> anything better...
>
>
> Speaking of zero-copy, I believe it should be fairly straight-forward to
> implement for trans_fd now I've actually looked at it, since we do the
> payload read after a p9_tag_lookup, would just need m->req to point to a
> zc buffer. Write is similar, if there's a zc buffer just send it after
> the header.
> The cost is a couple more pointers in req and an extra if in both
> workers, that seems pretty reasonable.
>
> Well, I'm not using trans_fd much here (and unfortunately zero-copy
> isn't possible at all given the transport protocol for RDMA, at least
> for recv), but if anyone cares it probably could be done without too
> much hassle for the fd workers.
>
> --
> Dominique

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 9p: trans_fd, initialize recv fcall properly if not set
On Thu, Sep 3, 2015 at 4:38 AM, Dominique Martinet wrote:
> That code really should never be called (rc is allocated in
> tag_alloc), but if it had been it couldn't have worked...
>
> Signed-off-by: Dominique Martinet
> ---
> net/9p/trans_fd.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> To be honest, I think it might be better to just bail out if we get in
> this switch (m->req->rc == NULL after p9_tag_lookup) and not try to
> allocate more, because if we get there it's likely a race condition and
> silently re-allocating will end up in more troubles than trying to
> recover is worth.
> Thoughts ?
>

Hmmm...trying to rattle my brain and remember why I put it in there back in 2008. It might have just been over-defensive programming -- or more likely it just pre-dated all the zero copy infrastructure which pretty much guaranteed we had an rc allocated and what is there is vestigial. I'm happy to accept a patch which makes this an assert, or perhaps just resets the connection because something has gone horribly wrong (similar to the ENOMEM path that is there now).

-eric
[GIT PULL] 9p changes for 4.3 merge window (part-1)
The following changes since commit eb63b34bdfbdd70a734c2a90d89117c5c6c605c2:

  Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus (2015-08-23 07:23:09 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git tags/for-linus-4.3-merge-window-part-1

for you to fetch changes up to b5ac1fb2717e48177d3f73f9e4c9b556c0a24c6b:

  9p: fix return code of read() when count is 0 (2015-08-23 14:21:36 -0500)

Just a few cleanups for 4.3 merge window for the 9p file system. I've gotten several more over the past week, but this group has been in for-next for at least a couple of weeks so I figured I'd push them first while I test the rest. Most of the ones not in this set are bug-fixes anyways so I could hold them for rc1 if you'd rather they see more time in for-next.

-eric

Fabian Frederick (1):
      9p: remove unused option Opt_trans

Vincent Bernat (1):
      9p: fix return code of read() when count is 0

 fs/9p/v9fs.c     | 2 +-
 fs/9p/vfs_file.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
[GIT PULL] 9p: patches for the 4.1 merge window
The following changes since commit b314acaccd7e0d55314d96be4a33b5f50d0b3344:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input (2015-03-19 16:43:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git tags/for-linus-4.1-merge-window

for you to fetch changes up to f569d3ef8254d4b3b8daa4f131f9397d48bf296c:

  net/9p: add a privport option for RDMA transport. (2015-03-21 19:32:33 -0700)

9p: patches for 4.1 merge window

Some accumulated cleanup patches for kerneldoc and unused variables as well as some lock bug fixes and adding privateport option for RDMA.

A quick check shows some merge-conflicts versus current-tip on
  9p: use unsigned integers for nwqid/count
If you would prefer I can rebase, remerge and fix the patch but didn't want to do that and lose the for-next references.

Signed-off-by: Eric Van Hensbergen

Andrey Ryabinin (1):
      net/9p: use memcpy() instead of snprintf() in p9_mount_tag_show()

Dominique Martinet (3):
      net/9p: Initialize opts->privport as it should be.
      fs/9p: Initialize status in v9fs_file_do_lock.
      net/9p: add a privport option for RDMA transport.

Fabian Frederick (2):
      9p: kerneldoc warning fixes
      9p: remove unused variable in p9_fd_create()

Kirill A. Shutemov (3):
      9p: fix error handling in v9fs_file_do_lock
      9p: do not crash on unknown lock status code
      9p: use unsigned integers for nwqid/count

 fs/9p/v9fs.h          |  1 -
 fs/9p/vfs_addr.c      |  2 --
 fs/9p/vfs_file.c      | 10 ++
 net/9p/protocol.c     |  6 +++---
 net/9p/trans_fd.c     |  3 +--
 net/9p/trans_rdma.c   | 52 +
 net/9p/trans_virtio.c |  5 -
 7 files changed, 58 insertions(+), 21 deletions(-)
Re: [PATCH] Make v9fs uname and remotename parsing more robust
On Sat, Feb 23, 2008 at 2:07 AM, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> It would be better to present this as two patches. One adds the new core
> APIs and the other uses those APIs in v9fs. The patches would take
> separate routes into mainline.
>
> I guess I can sneak this one in as-is, as long as the v9fs guys are OK with
> that?
>

I'm fine with it. Shall I pull it through the v9fs-devel patch line or would you rather send it with your patches Andrew?

-eric
Re: [PULL] v9fs patches for merge window
On Feb 6, 2008 8:43 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Wed, 6 Feb 2008 19:39:26 -0600 "Eric Van Hensbergen" <[EMAIL PROTECTED]>
> wrote:
>
> Could you please cc me on pull requests? I need to pay more attention to
> them. Thanks.
>
> > Andrew Morton (1):
> > 9p: fix p9_printfcall export
>
> Really this should have been folded into the patch which it fixes. We get
> a cleaner history that way, and it protects git-bisectability.
>

I would have, but I didn't see the original offender in my upstream branch, so I just applied it separately - looks to me like fcprint.c hasn't been touched (until your patch) since its introduction:

[EMAIL PROTECTED]:~/src/linux/9p$ git log net/9p/fcprint.c
commit bd238fb431f31989898423c8b6496bc8c4204a86
Author: Latchesar Ionkov <[EMAIL PROTECTED]>
Date:   Tue Jul 10 17:57:28 2007 -0500

    9p: Reorganization of 9p file system code

    This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p.
    It moves the transport, packet marshalling and connection layers to net/9p
    leaving only the VFS related files in fs/9p. This work is being done in
    preparation for in-kernel 9p servers as well as alternate 9p clients (other
    than VFS).

    Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]>
    Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

-eric
[PULL] v9fs patches for merge window
The following changes since commit 3e6bdf473f489664dac4d7511d26c7ac3dfdc748:
  Linus Torvalds (1):
        Merge git://git.kernel.org/.../x86/linux-2.6-x86

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git

Andrew Morton (1):
      9p: fix p9_printfcall export

Anthony Liguori (2):
      9p: add support for sticky bit
      9p: Convert semaphore to spinlock for p9_idpool

Eric Van Hensbergen (7):
      9p: create transport rpc cut-thru
      9p: block-based virtio client
      9p: fix bug in attach-per-user
      9p: Fix soft lockup in virtio transport
      9p: fix mmap to be read-only
      9p: add remove function to trans_virtio
      9p: transport API reorganization

Martin Stava (1):
      9p: fix bug in p9_clone_stat

 fs/9p/fid.c                |    4 +-
 fs/9p/v9fs.c               |   51 +--
 fs/9p/v9fs.h               |    5 +-
 fs/9p/vfs_file.c           |    4 +-
 fs/9p/vfs_inode.c          |    5 +
 include/net/9p/9p.h        |    1 +
 include/net/9p/client.h    |    5 +-
 include/net/9p/conn.h      |   57 ---
 include/net/9p/transport.h |   11 +-
 net/9p/Makefile            |    1 -
 net/9p/client.c            |  161 +--
 net/9p/fcprint.c           |    4 +-
 net/9p/mod.c               |    9 +-
 net/9p/mux.c               | 1060 --
 net/9p/trans_fd.c          | 1103 +++-
 net/9p/trans_virtio.c      |  355 +--
 net/9p/util.c              |   20 +-
 17 files changed, 1466 insertions(+), 1390 deletions(-)
 delete mode 100644 include/net/9p/conn.h
 delete mode 100644 net/9p/mux.c
[GIT PULL] 9p: 2.6.24-rc1 patches
Linus,

Please pull the following bug-fixes for v9fs.

The following changes since commit 2655e2cee2d77459fcb7e10228259e4ee0328697:
  Alan Cox (1):
        ata_piix: Add additional PCI identifier for 40 wire short cable

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-linus

Latchesar Ionkov (4):
      9p: fix memory leak in v9fs_get_sb
      9p: use copy of the options value instead of original
      9p: return NULL when trans not found
      9p: add missing end-of-options record for trans_fd

 fs/9p/v9fs.c      |    6 --
 fs/9p/vfs_super.c |    3 +++
 net/9p/mod.c      |    4 ++--
 net/9p/trans_fd.c |    3 ++-
 4 files changed, 11 insertions(+), 5 deletions(-)

Thanks,

-eric
Re: [V9fs-developer] [PATCH] 9p: v9fs_vfs_rename incorrect clunk order
On 10/22/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> In v9fs_vfs_rename function labels don't match the fids that are clunked.
> The correct clunk order is clunking newdirfid first and then olddirfid next.
>
> Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]>

Acked-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
Re: [2.6 patch] fs/9p/v9fs.c: memleak fix
On 10/19/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> This patch fixes a memory leak introduced by
> commit ba17674fe02909fef049fd4b620a2805bdb8c693.
>
> Spotted by the Coverity checker.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Acked-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
[GIT PULL] 9p: 2.6.24 patches (phase 2)
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Latchesar Ionkov(1):
      9p: v9fs_vfs_rename incorrect clunk order

Adrian Bunk(1):
      9p: fix memleak in fs/9p/v9fs.c

Eric Van Hensbergen(1)
      9p: add virtio transport

 Documentation/filesystems/9p.txt |    8
 fs/9p/v9fs.c                     |    1
 fs/9p/vfs_inode.c                |    4
 include/linux/virtio_9p.h        |   10 +
 net/9p/Kconfig                   |    7
 net/9p/Makefile                  |    4
 net/9p/trans_virtio.c            |  353 +++
 7 files changed, 382 insertions(+), 5 deletions(-)

Thanks,

-eric
Re: [GIT PULL] 9p patches for 2.6.24 merge window
On 10/17/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 17, 2007 at 04:34:02PM -0500, Eric Van Hensbergen wrote:
> > Linus, please pull from the 'for-linus' branch of:
> > git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus
> >
> > This tree contains the following:
> >
> > Latchesar Ionkov(3):
> > attach-per-user support
> > rename uid and gid parameters
> > define session flags
> >
> > Eric Van Hensbergen(4)
> > remove sysctl code
> > fix bad kconfig cross-dependency
> > soften invalidation in loose mode
> > make transports dynamic
>
> Could you please tag your patches with 9p: or [9p] so it
> is obvious that they belong to this subsystem.
> When browsing head-commits and other places this is a great help.
>

They should be so tagged, I just stripped it in my pull email summary.

-eric
[GIT PULL] 9p patches for 2.6.24 merge window
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Latchesar Ionkov(3):
      attach-per-user support
      rename uid and gid parameters
      define session flags

Eric Van Hensbergen(4)
      remove sysctl code
      fix bad kconfig cross-dependency
      soften invalidation in loose mode
      make transports dynamic

There are a few patches relating to a virtio transport support that I'm holding back until I know Rusty's lguest series is merged.

 b/Documentation/filesystems/9p.txt |   22 +
 b/fs/9p/fid.c                      |  157 +++--
 b/fs/9p/v9fs.c                     |  189 +++-
 b/fs/9p/v9fs.h                     |   38 +--
 b/fs/9p/vfs_file.c                 |    6
 b/fs/9p/vfs_inode.c                |   50 ++--
 b/fs/9p/vfs_super.c                |   19 -
 b/include/net/9p/9p.h              |   21 -
 b/include/net/9p/client.h          |    9
 b/include/net/9p/conn.h            |    4
 b/include/net/9p/transport.h       |   27 +-
 b/net/9p/Kconfig                   |   10
 b/net/9p/Makefile                  |    5
 b/net/9p/client.c                  |   13 -
 b/net/9p/conv.c                    |   32 ++
 b/net/9p/mod.c                     |   71 +-
 b/net/9p/mux.c                     |    5
 b/net/9p/trans_fd.c                |  419 +++--
 net/9p/sysctl.c                    |   81 ---
 19 files changed, 689 insertions(+), 489 deletions(-)

Thanks,

-eric
Re: [PATCH 3/5][9PFS] Cleanup explicit check for mandatory locks
On 9/17/07, Pavel Emelyanov <[EMAIL PROTECTED]> wrote:
> The __mandatory_lock(inode) macro makes the same check, but
> makes the code more readable.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Acked-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
>
> ---
>
> fs/9p/vfs_file.c |    2 +-
> 1 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 2a40c29..7166916 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -105,7 +105,7 @@ static int v9fs_file_lock(struct file *f
> 	P9_DPRINTK(P9_DEBUG_VFS, "filp: %p lock: %p\n", filp, fl);
>
> 	/* No mandatory locks */
> -	if ((inode->i_mode & (S_ISGID | S_IXGRP)) == S_ISGID)
> +	if (__mandatory_lock(inode))
> 		return -ENOLCK;
>
> 	if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
>
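
For readers unfamiliar with the convention, the open-coded test removed by the patch above treats a file as marked for mandatory locking when its setgid bit is set and its group-execute bit is clear. The following is a minimal userspace sketch of that mode test only; the helper name `is_mandatory_lock_mode` is made up for illustration, and the kernel's `__mandatory_lock()` takes an inode rather than a bare mode.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical userspace stand-in for the mode test quoted in the patch:
 * setgid bit set, group-execute bit clear.
 * Returns 1 if the mode marks the file for mandatory locking, else 0.
 */
static int is_mandatory_lock_mode(mode_t mode)
{
	return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}
```

A mode such as `S_ISGID | 0644` satisfies the test, while `S_ISGID | 0755` does not because group-execute is set.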
[RFC][PATCH] 9p: add readahead support for loose mode
This patch adds readpages support in support of readahead when using
loose cache mode. It substantially increases performance for certain
workloads.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 fs/9p/v9fs.c            |    2 +-
 fs/9p/vfs_addr.c        |   98 ++
 include/net/9p/client.h |    3 +-
 net/9p/client.c         |   82 +--
 4 files changed, 143 insertions(+), 42 deletions(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 89ee0ba..ca97404 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -131,7 +131,7 @@ static void v9fs_parse_options(struct v9fs_session_info *v9ses)
 	char *s, *e;
 
 	/* setup defaults */
-	v9ses->maxdata = 8192;
+	v9ses->maxdata = (64*1024);
 	v9ses->afid = ~0;
 	v9ses->debug = 0;
 	v9ses->cache = 0;
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 6248f0e..86c6e0d 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -31,8 +31,11 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include
+#include
 #include
 #include
 
@@ -50,31 +53,108 @@
 
 static int v9fs_vfs_readpage(struct file *filp, struct page *page)
 {
-	int retval;
 	loff_t offset;
 	char *buffer;
 	struct p9_fid *fid;
+	int retval = 0;
+	int total = 0;
+	int count = PAGE_SIZE;
 
 	P9_DPRINTK(P9_DEBUG_VFS, "\n");
 	fid = filp->private_data;
 	buffer = kmap(page);
 	offset = page_offset(page);
 
-	retval = p9_client_readn(fid, buffer, offset, PAGE_CACHE_SIZE);
-	if (retval < 0)
-		goto done;
+	while (count) {
+		struct kvec kv = {buffer+offset, PAGE_SIZE-count};
+		retval = p9_client_readv(fid, &kv, offset, 1);
+		if (retval <= 0)
+			break;
 
-	memset(buffer + retval, 0, PAGE_CACHE_SIZE - retval);
-	flush_dcache_page(page);
-	SetPageUptodate(page);
-	retval = 0;
+		buffer += retval;
+		offset += retval;
+		count -= retval;
+		total += retval;
+	}
+
+	if (retval >= 0) {
+		flush_dcache_page(page);
+		SetPageUptodate(page);
+		retval = 0;
+	}
 
-done:
 	kunmap(page);
 	unlock_page(page);
 	return retval;
 }
 
+/* large chunks copied and adapted from fs/cifs/file.c */
+static int v9fs_vfs_readpages(struct file *file, struct address_space *mapping,
+	struct list_head *page_list, unsigned num_pages)
+{
+	struct page *tmp_page;
+	loff_t offset;
+	struct pagevec lru_pvec;
+	struct p9_fid *fid;
+	u32 read_size;
+	int retval = 0;
+	unsigned int count = 0;
+	struct list_head *p, *n;
+
+	struct kvec *kv = kmalloc(sizeof(struct kvec)*num_pages, GFP_KERNEL);
+
+	P9_DPRINTK(P9_DEBUG_VFS, "%d pages\n", num_pages);
+
+	if (!kv)
+		return -ENOMEM;
+
+	if (list_empty(page_list))
+		goto free_vec;
+
+	pagevec_init(&lru_pvec, 0);
+
+	fid = file->private_data;
+	tmp_page = list_entry(page_list->prev, struct page, lru);
+	offset = (loff_t)tmp_page->index << PAGE_CACHE_SHIFT;
+
+	list_for_each_entry_reverse(tmp_page, page_list, lru) {
+		BUG_ON(count > num_pages);
+		if (add_to_page_cache(tmp_page, mapping,
+			tmp_page->index, GFP_KERNEL)) {
+			page_cache_release(tmp_page);
+			continue;
+		}
+
+		kv[count].iov_base = kmap(tmp_page);
+		kv[count].iov_len = PAGE_CACHE_SIZE;
+		count++;
+	}
+
+	read_size = count * PAGE_CACHE_SIZE;
+	if (!read_size)
+		goto cleanup;
+
+	retval = p9_client_readv(fid, kv, offset, count);
+
+cleanup:
+	list_for_each_safe(p, n, page_list) {
+		tmp_page = list_entry(p, struct page, lru);
+		list_del(&tmp_page->lru);
+		if (!pagevec_add(&lru_pvec, tmp_page))
+			__pagevec_lru_add(&lru_pvec);
+		kunmap(tmp_page);
+		flush_dcache_page(tmp_page);
+		SetPageUptodate(tmp_page);
+		unlock_page(tmp_page);
+	}
+	pagevec_lru_add(&lru_pvec);
+
+free_vec:
+	kfree(kv);
+	return retval;
+}
+
 const struct address_space_operations v9fs_addr_operations = {
 	.readpage = v9fs_vfs_readpage,
+	.readpages = v9fs_vfs_readpages,
 };
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 9b9221a..6f17d0a 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -67,8 +67,7 @@ int p9_client_fcreate(struct p9_fid *fid, char *name, u32 perm, int mode,
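
The per-page loop in the RFC patch above follows a common pattern: keep issuing reads until the buffer is full or the source reports end of data, tolerating short reads along the way. Below is a minimal userspace sketch of that pattern, not the kernel code itself. The names `read_fn`, `fill_page`, `mem_src` and `mem_read` are invented for illustration, the callback merely stands in for a call such as `p9_client_readv()`, and zeroing the unread tail is an assumption carried over from the `memset()` in the code the patch removes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader callback: returns bytes read, 0 at EOF, <0 on error. */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t len, off_t offset);

/*
 * Fill buf[0..size) starting at file offset `offset`, tolerating short
 * reads. On success the unread tail is zeroed and the number of bytes
 * actually read is returned; on error the callback's negative value is
 * returned unchanged.
 */
static ssize_t fill_page(read_fn rd, void *ctx, char *buf, size_t size,
			 off_t offset)
{
	size_t total = 0;

	assert(rd != NULL && buf != NULL);

	while (total < size) {
		ssize_t n = rd(ctx, buf + total, size - total,
			       offset + (off_t)total);

		if (n < 0)
			return n;	/* propagate the error */
		if (n == 0)
			break;		/* end of data */
		total += (size_t)n;
	}

	memset(buf + total, 0, size - total);
	return (ssize_t)total;
}

/* Demo source: serves bytes from memory, at most `chunk` per call. */
struct mem_src {
	const char *data;
	size_t len;
	size_t chunk;
};

static ssize_t mem_read(void *ctx, char *buf, size_t len, off_t offset)
{
	const struct mem_src *src = ctx;
	size_t avail;

	if (offset < 0)
		return -1;
	if ((size_t)offset >= src->len)
		return 0;

	avail = src->len - (size_t)offset;
	if (len > avail)
		len = avail;
	if (len > src->chunk)
		len = src->chunk;
	memcpy(buf, src->data + offset, len);
	return (ssize_t)len;
}
```

Each iteration passes the remaining length and the advanced offset to the reader, so a source that returns only a few bytes per call still ends up filling the whole buffer.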
Re: [PATCH] 9p: attach-per-user
On 9/12/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
>
> - allow only one user to access the tree (access=)
> Only the user with uid can access the v9fs tree. Other users that attempt
> to access it will get EPERM error.
>

While access= and dfltuid= create an interesting flexibility in the way things can be used, I'm wondering if access= dfltuid=DEFAULT_UID is intuitive, it might be nice for the default behavior to be setting dfltuid to the uid specified in access when that access option is used. This can be overridden with the dfltuid option, but I think it makes more sense to attach as the uid you are restricting access to.

If that's the way we want to go, I think that can be handled in a separate patch. I've merged this stuff into my test tree, as soon as regressions pass and I confirm they compile clean on another architecture I'll push them into my devel to be picked up by -mm.

-eric
Re: [PATCH] 9p: rename uid and gid parameters
On 9/12/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> Change the names of 'uid' and 'gid' parameters to the more appropriate
> 'dfltuid' and 'dfltgid'.
> ...
> strcpy(v9ses->name, V9FS_DEFUSER);
> strcpy(v9ses->remotename, V9FS_DEFANAME);
> + v9ses->dfltuid = V9FS_DEFUID;
> + v9ses->dfltgid = V9FS_DEFGID;
> ...
> +#define V9FS_DEFUID	(0)
> +#define V9FS_DEFGID	(0)

I'm not sure if there is a good solution here, but I'm uncomfortable with using uid=0 as the default. I'm not sure if there is a default uid for nobody, but anything is probably better than 0. Looks like nfsnobody is 65534, we could use that - even if only as a marker for the server to map it to nobody on the target system? What do you think?

Particularly with attach-per-user, we probably need to look at interacting with idmapd or create our own variant real soon.

-eric
Re: [V9fs-developer] [PATCH] 9p: attach-per-user
On 9/3/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
>
> This patch improves the 9P2000 support by allowing every user to attach
> separately. The patch defines three modes of access (new mount option
> 'access'):
>

nit picks:
 * you added/changed options without updating Documentation/filesystems/9p.txt
 * you changed v9fs->extended to be part of a flags structure, that should have been a separate patch
 * rename of options should have been done in a separate patch

> - attach-per-user (access=user)
> If a user tries to access a file served by v9fs for the first time, v9fs
> sends an attach command to the server (Tattach) specifying the user. If
> the attach succeeds, the user can access the v9fs tree.
> As there is no uname->uid (string->integer) mapping yet, this mode works
> only with the 9P2000.u dialect.
>
> - allow only one user to access the tree (access=)
> Only the user with uid can access the v9fs tree. Other users that attempt
> to access it will get EPERM error.
>
> - do all operations as a single user (access=any)
> V9fs does a single attach and all operations are done as a single user.
> If this mode is selected, the v9fs behavior is identical with the current
> one.
>

Which option is default?

>
> /**
> - * v9fs_fid_insert - add a fid to a dentry
> + * v9fs_fid_add - add a fid to a dentry
> + * @dentry: dentry that the fid is being added to
> * @fid: fid to add
> - * @dentry: dentry that it is being added to
> *
> */
>
> @@ -66,52 +67,144 @@ int v9fs_fid_add(struct dentry *dentry, struct p9_fid
> *fid)
> }

Even small cleanups like this should probably be confined to a separate patch if they are unrelated.

>
> -struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
> +static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
> {
...
>
> -struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
> +struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
> {
...
> +
> +struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
> +{

The way the patch got formatted, these look like compulsive renames..but there's an added function and then changes to the other two. I think it might be because of the way you ordered the functions. Put new functions after the old functions and maybe this won't happen.

And clone seems to have lost his function header. The code is pretty inconsistent about those these days, but I'd like to do an audit soon and make sure we have proper comment blocks where appropriate.

scripts/checkpatch.pl reports:

ERROR: need a space before the open parenthesis '('
#244: FILE: fs/9p/fid.c:147:
+	for(ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)

ERROR: need a space before the open parenthesis '('
#275: FILE: fs/9p/fid.c:178:
+	for(d = dentry, i = n; i >= 0; i--, d = d->d_parent)

Please fix up these small bits and resubmit.

-eric

Also, go ahead and cc: me directly on patches, for some reason this one missed my normal filters and got lost. If I'm directly cc:'d it will pop to the top of my inbox.

-eric
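
The three access modes described in the quoted patch (the keyword "user", the keyword "any", or a single numeric uid) could be distinguished when parsing the option value roughly as follows. This is a userspace sketch under stated assumptions, not the patch's parser: the enum, its constants and `parse_access` are hypothetical names, and treating anything that is not a plain decimal number as malformed is a choice made here for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the three modes described in the thread. */
enum access_mode {
	ACCESS_USER,	/* access=user: attach per user */
	ACCESS_ANY,	/* access=any: one attach shared by everyone */
	ACCESS_SINGLE	/* access given as a uid: only that uid may use the tree */
};

/*
 * Parse the value of an access option. Returns 0 on success and fills in
 * *mode (and *uid for ACCESS_SINGLE); returns -1 if the value is malformed.
 */
static int parse_access(const char *val, enum access_mode *mode, uint32_t *uid)
{
	char *end;
	unsigned long n;

	assert(val != NULL && mode != NULL && uid != NULL);

	if (strcmp(val, "user") == 0) {
		*mode = ACCESS_USER;
		return 0;
	}
	if (strcmp(val, "any") == 0) {
		*mode = ACCESS_ANY;
		return 0;
	}

	/* Reject empty strings, signs and leading whitespace up front. */
	if (*val < '0' || *val > '9')
		return -1;

	errno = 0;
	n = strtoul(val, &end, 10);
	if (errno != 0 || *end != '\0' || n > UINT32_MAX)
		return -1;

	*mode = ACCESS_SINGLE;
	*uid = (uint32_t)n;
	return 0;
}
```

Which of the modes should apply when the option is absent is left to the caller, since the thread itself leaves that question open.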
Re: [Lguest] [RFC] 9p Virtualization Transports
On 9/1/07, Rusty Russell <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-08-28 at 13:52 -0500, Eric Van Hensbergen wrote:
> > The lguest and kvm transports are functional, but we are still working out
> > remaining bugs and need to spend some time focusing on performance issues.
> > I wanted to send out this "preview" patch set to the community to solicit
> > ideas on things we can do differently/better.
>
> Patches look reasonable, but just a heads-up: lguest will be moving to
> virtio, as will kvm. That means a single implementation for both
> (yay!), but it does complicate your life in the short term 8(
>
> Dor has published a kvm virtio implementation, and we've already
> discussed a couple of modifications. I expect that to be nailed in the
> next 2 weeks tho, and lguest will follow.
>

yeah, I've been emailing Dor -- it sounds like he'll have stuff ready for the 2.6.24 merge window -- that being the case, I'll write a virtio transport and mothball the PCI and lguest transports. They were straightforward to write (a couple hours for the lguest transport) and the lguest transport was a good learning experience -- so I'm not shedding tears over wasted effort.

-eric
Re: [Lguest] [PATCH] modify lguest console to support multiple hvc's
On 8/30/07, Rusty Russell <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-08-30 at 13:38 -0500, Eric Van Hensbergen wrote:
> > From: Eric Van Hensbergen <[EMAIL PROTECTED]>
> >
> > This was a quick modification I did of lguest to be able to support multiple
> > HVC channels for some experiments I was doing. I'm not sure if this is more
> > generally useful, so I'm posting it to the list in case someone else has a
> > need for it.
> >
> > Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
>
> This is cool! Great that it's useful for you. What do the other
> consoles look like from inside the guest?
>

They just show up on other hvc device minor numbers. I was running 9p over them, but I wanted a tighter coupling for v9fs so I could tune performance and incrementally optimize.

-eric
[PATCH] modify lguest console to support multiple hvc's
From: Eric Van Hensbergen <[EMAIL PROTECTED]> This was a quick modification I did of lguest to be able to support multiple HVC channels for some experiements I was doing. I'm not sure if this is more generally useful, so I'm posting it to the list in case someone else has a need for it. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- Documentation/lguest/lguest.c | 161 - drivers/char/hvc_lguest.c | 57 +-- 2 files changed, 129 insertions(+), 89 deletions(-) diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index f791840..c6a3e4d 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -690,12 +690,14 @@ static void restore_term(void) } /* We associate some data with the console for our exit hack. */ -struct console_abort +struct console_priv { + /* which console we are */ + int index; /* How many times have they hit ^C? */ - int count; + int a_count; /* When did they start? */ - struct timeval start; + struct timeval a_start; }; /* This is the routine which handles console input (ie. stdin). */ @@ -705,11 +707,12 @@ static bool handle_console_input(int fd, struct device *dev) int len; unsigned int num; struct iovec iov[LGUEST_MAX_DMA_SECTIONS]; - struct console_abort *abort = dev->priv; + struct console_priv *cons = dev->priv; /* First we get the console buffer from the Guest. The key is dev->mem -* which was set to 0 in setup_console(). */ - lenp = get_dma_buffer(fd, dev->mem, iov, &num, &irq); +* plus the console index adjusted to be a multiple of 4 because lguest +* wants keys to be a multiple of 4 */ + lenp = get_dma_buffer(fd, dev->mem+(cons->index*4), iov, &num, &irq); if (!lenp) { /* If it's not ready for input, warn and set up to discard. */ warn("console: no dma buffer!"); @@ -734,39 +737,44 @@ static bool handle_console_input(int fd, struct device *dev) trigger_irq(fd, irq); } - /* Three ^C within one second? Exit. -* -* This is such a hack, but works surprisingly well. 
Each ^C has to be -* in a buffer by itself, so they can't be too fast. But we check that -* we get three within about a second, so they can't be too slow. */ - if (len == 1 && ((char *)iov[0].iov_base)[0] == 3) { - if (!abort->count++) - gettimeofday(&abort->start, NULL); - else if (abort->count == 3) { - struct timeval now; - gettimeofday(&now, NULL); - if (now.tv_sec <= abort->start.tv_sec+1) { - u32 args[] = { LHREQ_BREAK, 0 }; - /* Close the fd so Waker will know it has to -* exit. */ - close(waker_fd); - /* Just in case waker is blocked in BREAK, send -* unbreak now. */ - write(fd, args, sizeof(args)); - exit(2); + /* Only do interrupt hack and restore_term() on initial console */ + if (cons->index == 0) { + /* Three ^C within one second? Exit. +* +* This is such a hack, but works surprisingly well. Each ^C +* has to be in a buffer by itself, so they can't be too fast. +* But we check that we get three within about a second, so +* they can't be too slow. */ + if (len == 1 && ((char *)iov[0].iov_base)[0] == 3) { + if (!cons->a_count++) + gettimeofday(&cons->a_start, NULL); + else if (cons->a_count == 3) { + struct timeval now; + gettimeofday(&now, NULL); + if (now.tv_sec <= cons->a_start.tv_sec+1) { + u32 args[] = { LHREQ_BREAK, 0 }; + /* Close the fd so Waker will know it +* has to exit. */ + close(waker_fd); + /* Just in case waker is blocked in +* BREAK, send unbreak now. */ + write(fd, args, sizeof(args)); + exit(2); + } + cons->a_count = 0; } -
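The exit hack and the per-console key in the patch above can be sketched in isolation. This is a minimal userspace illustration, not the lguest launcher code: `console_state`, `console_key()` and `note_ctrl_c()` are hypothetical names, time is passed in as whole seconds instead of calling gettimeofday(), and the counter is reset after every third ^C (the patch exits instead when the burst is fast enough).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-console state, loosely modelled on console_priv. */
struct console_state {
	int index;        /* which console this is */
	int a_count;      /* ^C presses seen in the current burst */
	long a_start_sec; /* second at which the burst started */
};

/* DMA key for a console: base plus the index scaled to a multiple of 4. */
static unsigned long console_key(unsigned long base, int index)
{
	return base + (unsigned long)index * 4;
}

/*
 * Record one ^C seen at now_sec. Returns 1 when the third ^C of a burst
 * arrives within about a second of the first, 0 otherwise. Only console 0
 * takes part, mirroring the "initial console" check in the patch.
 */
static int note_ctrl_c(struct console_state *cons, long now_sec)
{
	assert(cons != NULL);

	if (cons->index != 0)
		return 0;

	if (!cons->a_count++) {
		cons->a_start_sec = now_sec;   /* first ^C starts the clock */
	} else if (cons->a_count == 3) {
		cons->a_count = 0;             /* burst is over either way */
		if (now_sec <= cons->a_start_sec + 1)
			return 1;
	}
	return 0;
}
```

Passing the timestamp in keeps the burst logic testable without a terminal; a caller would feed it the seconds field of the current time for each single-byte ^C buffer it reads.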
Re: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport
On 8/28/07, Arnd Bergmann <[EMAIL PROTECTED]> wrote:
> On Tuesday 28 August 2007, Eric Van Hensbergen wrote:
>
> > This adds a shared memory transport for a synthetic 9p device for
> > paravirtualized file system support under KVM/QEMU.
>
> Nice driver. I'm hoping we can do a virtio driver using a similar
> concept.
>

Yes. I'm looking at the patches from Dor now; it should be pretty straightforward. The PCI transport is interesting in its own right for other (non-virtual) projects we've been playing with.

-eric
[RFC] 9p: add KVM/QEMU pci transport
From: Latchesar Ionkov <[EMAIL PROTECTED]> This adds a shared memory transport for a synthetic 9p device for paravirtualized file system support under KVM/QEMU. Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]> Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- Documentation/filesystems/9p.txt |2 + net/9p/Kconfig | 10 ++- net/9p/Makefile |4 + net/9p/trans_pci.c | 295 ++ 4 files changed, 310 insertions(+), 1 deletions(-) create mode 100644 net/9p/trans_pci.c diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index 1a5f50d..e1879bd 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -46,6 +46,8 @@ OPTIONS tcp - specifying a normal TCP/IP connection fd - used passed file descriptors for connection (see rfdno and wfdno) + pci - use a PCI pseudo device for 9p communication + over shared memory between a guest and host uname=name user name to attempt mount as on the remote server. The server may override or ignore this value. Certain user diff --git a/net/9p/Kconfig b/net/9p/Kconfig index 09566ae..8517560 100644 --- a/net/9p/Kconfig +++ b/net/9p/Kconfig @@ -16,13 +16,21 @@ menuconfig NET_9P config NET_9P_FD depends on NET_9P default y if NET_9P - tristate "9P File Descriptor Transports (Experimental)" + tristate "9p File Descriptor Transports (Experimental)" help This builds support for file descriptor transports for 9p which includes support for TCP/IP, named pipes, or passed file descriptors. TCP/IP is the default transport for 9p, so if you are going to use 9p, you'll likely want this. +config NET_9P_PCI + depends on NET_9P + tristate "9p PCI Shared Memory Transport (Experimental)" + help + This builds support for a PCI psuedo-device currently available + under KVM/QEMU which allows for 9p transactions over shared + memory between the guest and the host. 
+ config NET_9P_DEBUG bool "Debug information" depends on NET_9P diff --git a/net/9p/Makefile b/net/9p/Makefile index 7b2a67a..26ce89d 100644 --- a/net/9p/Makefile +++ b/net/9p/Makefile @@ -1,5 +1,6 @@ obj-$(CONFIG_NET_9P) := 9pnet.o obj-$(CONFIG_NET_9P_FD) += 9pnet_fd.o +obj-$(CONFIG_NET_9P_PCI) += 9pnet_pci.o 9pnet-objs := \ mod.o \ @@ -14,3 +15,6 @@ obj-$(CONFIG_NET_9P_FD) += 9pnet_fd.o 9pnet_fd-objs := \ trans_fd.o \ + +9pnet_pci-objs := \ + trans_pci.o \ diff --git a/net/9p/trans_pci.c b/net/9p/trans_pci.c new file mode 100644 index 000..36ddc5f --- /dev/null +++ b/net/9p/trans_pci.c @@ -0,0 +1,295 @@ +/* + * net/9p/trans_pci.c + * + * 9P over PCI transport layer. For use with KVM/QEMU. + * + * Copyright (C) 2007 by Latchesar Ionkov <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to: + * Free Software Foundation + * 51 Franklin Street, Fifth Floor + * Boston, MA 02111-1301 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define P9PCI_DRIVER_NAME "9P PCI Device" +#define P9PCI_DRIVER_VERSION "1" + +#define PCI_VENDOR_ID_9P 0x5002 +#define PCI_DEVICE_ID_9P 0x000D + +#define MAX_PCI_BUF(4*1024) /* TODO: Get a number from lucho */ + +struct p9pci_trans { + struct pci_dev *pdev; + void __iomem*ioaddr; + void __iomem*tx; + void __iomem*rx; + int irq; + int pos; + int len; + wait_queue_head_t wait; +}; +static struct p9pci_trans *p9pci_trans; /* single channel for now */ + +static struct pci_device_id p9pci_tbl[] = { + {PCI_VENDOR_ID_9P, PCI_DEVICE_ID_9P, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, + {0,} +}; + +static irqreturn_t p9pci_interrupt(int irq, void *dev) +{ + p9pci_trans = dev; + p9pci_trans->len = le32_to_cpu(readl(p9pci_trans->rx)); + P9_DPRINTK(P9_DEBUG_TRANS, "%p le
[REFERENCE ONLY] 9p: add shared memory transport
From: Eric Van Hensbergen <[EMAIL PROTECTED](none)> This adds a 9p generic shared memory transport which has been used to communicate between Dom0 and DomU under Xen as part of the Libra and PROSE projects (http://www.research.ibm.com/prose). Parts of the code are a horrible hack, but may be useful as reference for constructing (or how not to construct) a poll-driven shared-memory driver for Xen (or other purposes). Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- net/9p/Kconfig |7 + net/9p/Makefile|4 + net/9p/trans_shm.c | 378 3 files changed, 389 insertions(+), 0 deletions(-) create mode 100644 net/9p/trans_shm.c diff --git a/net/9p/Kconfig b/net/9p/Kconfig index fab7bb9..a1b55e8 100644 --- a/net/9p/Kconfig +++ b/net/9p/Kconfig @@ -38,6 +38,13 @@ config NET_9P_LG This builds support for a transport between an Lguest guest partition and the host partition. +config NET_9P_SHM + depends on NET_9P + tristate "9p Shared Memory Transport (Experimental)" + help + This builds support for a shared memory transport which + can be used on XenPPC to mount 9p between DomU and Dom0. + config NET_9P_DEBUG bool "Debug information" depends on NET_9P diff --git a/net/9p/Makefile b/net/9p/Makefile index 80a4227..e7a036a 100644 --- a/net/9p/Makefile +++ b/net/9p/Makefile @@ -2,6 +2,7 @@ obj-$(CONFIG_NET_9P) := 9pnet.o obj-$(CONFIG_NET_9P_FD) += 9pnet_fd.o obj-$(CONFIG_NET_9P_PCI) += 9pnet_pci.o obj-$(CONFIG_NET_9P_LG) += 9pnet_lg.o +obj-$(CONFIG_NET_9P_SHM) += 9pnet_shm.o 9pnet-objs := \ mod.o \ @@ -22,3 +23,6 @@ obj-$(CONFIG_NET_9P_LG) += 9pnet_lg.o 9pnet_lg-objs := \ trans_lg.o \ + +9pnet_shm-objs := \ + trans_shm.o \ diff --git a/net/9p/trans_shm.c b/net/9p/trans_shm.c new file mode 100644 index 000..d7847fd --- /dev/null +++ b/net/9p/trans_shm.c @@ -0,0 +1,378 @@ +/* + * linux/fs/9p/trans_shm.c + * + * Shared memory transport layer. 
+ * + * This is the Linux version of shared memory transport hack used + * in the Libra and PROSE projects to communicate between Dom0 and + * DomU under Xen and rHype. + * + * Certain aspects of this code (such as the BIG_UGLY_BUFFER) are + * horrible hacks, but the rest of the code may provide a decent starting + * point for someone wanting to write a proper shared-memory transport for + * Xen (or other purposes). + * + * The server side of this transport exists in inferno-tx branch of + * inferno. It can be grabbed from the txinferno branch of + * http://git.9grid.us/git/inferno.git + * + * Copyright (C) 2006,2007 by Eric Van Hensbergen <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to: + * Free Software Foundation + * 51 Franklin Street, Fifth Floor + * Boston, MA 02111-1301 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +enum +{ + Shm_Idle = 0, + Shm_Announcing =1, + Shm_Announced = 2, + Shm_Connecting =3, + Shm_Connected = 4, + Shm_Hungup =5, + + Shmaddrlen =255, +}; + +enum +{ + S_USM = 1, /* Sys V shared memory */ + S_MSM = 2, /* mmap */ + S_XEN = 3, /* xen shared memory */ + + SM_SERVER = 0, + SM_CLIENT = 1, + + DATA_POLL = 100, + HANDSHAKE_POLL =1 +}; + +struct chan +{ + u32 magic; + u32 write; + u32 read; + u32 overflow; +}; + +enum { + Chan_listen, + Chan_connected, + Chan_hungup +}; + +/* Two circular buffers: small one for input, large one for output. */ +struct chan_pipe +{ + u32 magic; + u32 buflen; + u32 state; + struct chan out; + struct chan in; + char buffers[0]; +}; + +#define CHUNK_SIZE (64<<20) +#define CHAN_MAGIC 0xB0BABEEF +#define CHAN_BUF_MAGIC 0xCAFEBABE + +/* + * UGLY HACK: static buffer just like in libOS so we can easily + *address things. Xen hackers free to fix this. + * + */ + +#define BIG_UGLY_BUFFER_SZ 8*1024 +static char big_ugly_buffer[sizeof(struct chan_pipe)+(BIG_UGLY_BUFFER_SZ*2)]; + +/* + * (expr) may be as
[RFC] 9p: add lguest transport
From: Eric Van Hensbergen <[EMAIL PROTECTED](none)> This adds a transport to 9p for communicating between guest and host domains on lguest. Currently, the host-side proxies the communication to a socket connected to the actual server. The transport is based heavily on the existing console code. A better integrated server component which eliminates some of the copy overhead is in progress and will look less like the existing console code. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- Documentation/filesystems/9p.txt |2 + Documentation/lguest/lguest.c| 127 fs/9p/v9fs.c |2 +- include/linux/lguest_launcher.h |1 + net/9p/Kconfig |7 + net/9p/Makefile |4 + net/9p/trans_lg.c| 303 ++ 7 files changed, 445 insertions(+), 1 deletions(-) create mode 100644 net/9p/trans_lg.c diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index e1879bd..1a3342f 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -48,6 +48,8 @@ OPTIONS (see rfdno and wfdno) pci - use a PCI pseudo device for 9p communication over shared memory between a guest and host + lg - use a lguest 9p channel for communication + over shared memory between a guest and host uname=name user name to attempt mount as on the remote server. The server may override or ignore this value. Certain user diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index f791840..adc50de 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -1318,6 +1318,128 @@ static void setup_tun_net(const char *arg, struct device_list *devices) } /* That's the end of device setup. */ +/* 9p transport code. + * This code implements the host side of the 9p transport. Right now + * this is heavily based on the console code and just proxies data to + * a socket connected to an external server. Eventually we'll hook the + * server code in more directly like we do with lguest to avoid the + * socket overhead. 
+ */ +/* This is the routine proxies 9p channel input */ +static bool handle_9p_input(int fd, struct device *dev) +{ + u32 irq = 0; + u32 *lenp; + int len = 0; + unsigned int num = 0; + struct iovec iov[LGUEST_MAX_DMA_SECTIONS]; + + /* First we get the console buffer from the Guest. The key is dev->mem +* which was set in setup_9p(). */ + + lenp = get_dma_buffer(fd, dev->mem, iov, &num, &irq); + if (!lenp) { + /* If it's not ready for input, warn and set up to discard. */ + warn("9p: no dma buffer!"); + discard_iovec(iov, &num); + } + + /* This is why we convert to iovecs: the readv() call uses them, and so +* it reads straight into the Guest's buffer. */ + len = readv(dev->fd, iov, num); + if (len == 0) { + /* +* BUG: When using msize > 1k we get zero length reads +* and I'm not sure why. +*/ + err(1, "9p: zero length read!"); + } + + if (len < 0) /* Something has gone horribly wrong */ + errx(1, "9p: input readv returned %d", len); + + /* If we read the data into the Guest, fill in the length and send the +* interrupt. */ + if (lenp) { + *lenp = len; + trigger_irq(fd, irq); + } + + /* Now, if we didn't read anything, return failure */ + if (!len) + return false; + + /* Everything went OK! */ + return true; +} + +/* Proxy output to socket. */ +static u32 handle_9p_output(int fd, const struct iovec *iov, +unsigned num, struct device*dev) +{ + /* Whatever the Guest sends, write it to the fd. Return the +* number of bytes written. */ + return writev(dev->fd, iov, num); +} + +/* Connect to 9p server (stolen from spfsclient by Lucho Ionkov) */ +/* We can't use gethostbyname because it makes us link a shared library */ +static int connect_9p(const char *arg) +{ + int fd, port; + char *addr, *p, *s; + struct sockaddr_in saddr; + u32 ipaddr; + + if (!arg) + err(1, "9p: problem with args"); + + addr = strdup(arg); + ipaddr = str2ip(addr); + + port = 567; + p = strrchr(addr, ':'); + if (p) { + *p = '\0'; + p++; + port = strtol(p, &s, 10); + if (*s != '\0') + err(1, "9p
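The address handling at the start of connect_9p() (default port, optional ":port" suffix found with strrchr(), strtol() with a trailing-garbage check) can be sketched as a standalone helper. This is an illustration under assumptions, not the patch's code: `split_host_port()` and `DEFAULT_9P_PORT` are hypothetical names, the default of 567 is simply the value that appears in the posted patch, and the range check on the port is an addition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_9P_PORT 567	/* default used in the posted patch */

/*
 * Split "host[:port]" into host and port.
 * Returns 0 on success, -1 if the host is empty or too long for the
 * buffer, or if the port is missing, non-numeric, or out of range.
 */
static int split_host_port(const char *arg, char *host, size_t hostlen,
			   int *port)
{
	const char *colon;
	char *end;
	size_t n;
	long val;

	assert(arg != NULL && host != NULL && port != NULL);

	colon = strrchr(arg, ':');
	n = colon ? (size_t)(colon - arg) : strlen(arg);
	if (n == 0 || n >= hostlen)
		return -1;
	memcpy(host, arg, n);
	host[n] = '\0';

	*port = DEFAULT_9P_PORT;
	if (colon) {
		val = strtol(colon + 1, &end, 10);
		/* reject "host:", "host:12x" and out-of-range values */
		if (end == colon + 1 || *end != '\0' || val <= 0 || val > 65535)
			return -1;
		*port = (int)val;
	}
	return 0;
}
```

Copying the host into a caller-supplied buffer avoids modifying the argument in place, which is one way to keep the original string usable for error messages.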
[RFC] 9p: Make transports dynamic
From: Eric Van Hensbergen <[EMAIL PROTECTED](none)> This patch abstracts out the interfaces to underlying transports so that new transports can be added as modules. This should also allow kernel configuration of transports without ifdef-hell. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- Documentation/filesystems/9p.txt |8 +- fs/9p/v9fs.c | 149 +++--- fs/9p/v9fs.h | 15 +-- fs/9p/vfs_super.c| 19 +-- include/net/9p/client.h |4 +- include/net/9p/conn.h|4 +- include/net/9p/transport.h | 25 ++- net/9p/Kconfig | 10 + net/9p/Makefile |5 +- net/9p/client.c |2 +- net/9p/mux.c |4 +- net/9p/trans_fd.c| 419 -- 12 files changed, 379 insertions(+), 285 deletions(-) diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index cda6905..1a5f50d 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -35,12 +35,12 @@ For remote file server: For Plan 9 From User Space applications (http://swtch.com/plan9) - mount -t 9p `namespace`/acme /mnt/9 -o proto=unix,uname=$USER + mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER OPTIONS === - proto=name select an alternative transport. Valid options are + trans=name select an alternative transport. 
Valid options are currently: unix - specifying a named pipe mount point tcp - specifying a normal TCP/IP connection @@ -68,9 +68,9 @@ OPTIONS 0x40 = display transport debug 0x80 = display allocation debug - rfdno=n the file descriptor for reading with proto=fd + rfdno=n the file descriptor for reading with trans=fd - wfdno=n the file descriptor for writing with proto=fd + wfdno=n the file descriptor for writing with trans=fd maxdata=nthe number of bytes to use for 9p packet payload (msize) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 0a7068e..08d880f 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -37,18 +37,58 @@ #include "v9fs_vfs.h" /* + * Dynamic Transport Registration Routines + * + */ + +static LIST_HEAD(v9fs_trans_list); +static struct p9_trans_module *v9fs_default_trans; + +/** + * v9fs_register_trans - register a new transport with 9p + * @m - structure describing the transport module and entry points + * + */ +void v9fs_register_trans(struct p9_trans_module *m) +{ + list_add_tail(&m->list, &v9fs_trans_list); + if (m->def) + v9fs_default_trans = m; +} +EXPORT_SYMBOL(v9fs_register_trans); + +/** + * v9fs_match_trans - match transport versus registered transports + * @arg: string identifying transport + * + */ +static struct p9_trans_module *v9fs_match_trans(const substring_t *name) +{ + struct list_head *p; + struct p9_trans_module *t = NULL; + + list_for_each(p, &v9fs_trans_list) { + t = list_entry(p, struct p9_trans_module, list); + if (strncmp(t->name, name->from, name->to-name->from) == 0) { + P9_DPRINTK(P9_DEBUG_TRANS, "trans=%s\n", t->name); + break; + } + } + return t; +} + +/* * Option Parsing (code inspired by NFS code) - * + * NOTE: each transport will parse its own options */ enum { /* Options that take integer arguments */ - Opt_debug, Opt_port, Opt_msize, Opt_uid, Opt_gid, Opt_afid, - Opt_rfdno, Opt_wfdno, + Opt_debug, Opt_msize, Opt_uid, Opt_gid, Opt_afid, /* String options */ - Opt_uname, Opt_remotename, + Opt_uname, Opt_remotename, 
Opt_trans, /* Options that take no arguments */ - Opt_legacy, Opt_nodevmap, Opt_unix, Opt_tcp, Opt_fd, Opt_pci, + Opt_legacy, Opt_nodevmap, /* Cache options */ Opt_cache_loose, /* Error token */ @@ -57,24 +97,13 @@ enum { static match_table_t tokens = { {Opt_debug, "debug=%x"}, - {Opt_port, "port=%u"}, {Opt_msize, "msize=%u"}, {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_afid, "afid=%u"}, - {Opt_rfdno, "rfdno=%u"}, - {Opt_wfdno, "wfdno=%u"}, {Opt_uname, "uname=%s"}, {Opt_remotename, "aname=%s"}, - {Opt_unix, "proto=unix"}, - {Opt_tcp, "proto=tcp"}, - {Opt_fd, "proto=fd"}, -#ifdef CONFIG_PCI_9P - {Opt_pci, "proto=pci"}, -#endif - {Opt_tcp, "tcp"}, - {Opt_unix, "unix"}, - {Opt_fd, "fd"}, + {Opt_trans,
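The registration and matching routines introduced in this patch can be illustrated with a small userspace sketch. It is not the kernel code: `trans_module`, `register_trans()` and `match_trans()` are hypothetical stand-ins, and a hand-rolled singly linked list replaces the kernel's list_head. Two choices differ from the posted loop and are assumptions on my part: a miss returns NULL (as posted, the loop appears to leave `t` pointing at the last registered entry when nothing matches), and the lengths must be equal so that a prefix such as "tc" does not select "tcp".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct p9_trans_module. */
struct trans_module {
	const char *name;          /* transport name, e.g. "tcp" */
	int def;                   /* non-zero: use as the default transport */
	struct trans_module *next; /* simple list link (kernel uses list_head) */
};

static struct trans_module *trans_list;
static struct trans_module *default_trans;

/* Append a transport to the registry and remember it if it is the default. */
static void register_trans(struct trans_module *m)
{
	struct trans_module **pp = &trans_list;

	assert(m != NULL && m->name != NULL);
	while (*pp)
		pp = &(*pp)->next;
	m->next = NULL;
	*pp = m;
	if (m->def)
		default_trans = m;
}

/*
 * Look up a transport by the substring [from, to), the same shape as the
 * kernel's substring_t. Returns NULL when nothing matches.
 */
static struct trans_module *match_trans(const char *from, const char *to)
{
	size_t len = (size_t)(to - from);
	struct trans_module *t;

	for (t = trans_list; t; t = t->next)
		if (strlen(t->name) == len && strncmp(t->name, from, len) == 0)
			return t;
	return NULL;
}
```

A caller that gets NULL back can then decide explicitly whether to fall back to `default_trans` or to reject the mount option.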
[RFC] 9p Virtualization Transports
This patch set contains a set of virtualization transports for the 9p file system intended to provide a mechanism for guests to access a portion of the host's name space without having to go through a virtualized network.

Shared-memory-based transports are provided for lguest using a variation of the lguest console code and for KVM using a synthetic PCI device. The patches to the qemu portion of the latter will be posted to the kvm-devel list later today.

Also provided is a much older hack implementation which was used on XenPPC to communicate between Dom0 and DomU as part of the PROSE (http://www.research.ibm.com/prose) and Libra projects. It is not our intent to push the Xen shared memory transport into the kernel, but we are providing it in this patch set for historical reference.

The lguest and kvm transports are functional, but we are still working out remaining bugs and need to spend some time focusing on performance issues. I wanted to send out this "preview" patch set to the community to solicit ideas on things we can do differently/better.

-eric
[GIT PULL] 9p fixes
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Mariusz Kozlowski (1):
      9p: fix bad error path in conversion routines

Adrian Bunk (2):
      9p: remove deprecated v9fs_fid_lookup_remove
      9p: fix use after free

Eric Van Hensbergen (1):
      9p: update maintainers and documentation

 Documentation/filesystems/9p.txt |   24 +++-
 MAINTAINERS                      |    5 ++---
 fs/9p/fid.c                      |   17 -
 fs/9p/fid.h                      |    2 --
 net/9p/conv.c                    |    2 +-
 net/9p/mux.c                     |   10 ++
 6 files changed, 28 insertions(+), 32 deletions(-)
Re: [2.6.20.17 review 32/58] fs: 9p/conv.c error path fix
On 8/22/07, Willy Tarreau <[EMAIL PROTECTED]> wrote:
> When buf_check_overflow() returns != 0 we will hit kfree(ERR_PTR(err))
> and it will not be happy about it.
>
> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Cc: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> Signed-off-by: Willy Tarreau <[EMAIL PROTECTED]>

Acked-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

This seems to be in the current code as well; I'll forward-port the patch...

-eric
Re: [2.6 patch] #if 0 v9fs_fid_lookup_remove()
Sorry, it's in my merge queue, but I've been fighting other fires. Will try to get this regression tested and merged into v9fs-devel tomorrow afternoon along with a few other patches.

-eric

On 8/14/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> This patch #if 0's the unused v9fs_fid_lookup_remove().
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
> ---
>
> This patch has been sent on:
> - 29 Jul 2007
>
>  fs/9p/fid.c |    2 ++
>  fs/9p/fid.h |    1 -
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> --- linux-2.6.23-rc1-mm1/fs/9p/fid.h.old	2007-07-26 13:22:00.0 +0200
> +++ linux-2.6.23-rc1-mm1/fs/9p/fid.h	2007-07-26 13:22:07.0 +0200
> @@ -28,6 +28,5 @@
>  };
>
>  struct p9_fid *v9fs_fid_lookup(struct dentry *dentry);
> -struct p9_fid *v9fs_fid_lookup_remove(struct dentry *dentry);
>  struct p9_fid *v9fs_fid_clone(struct dentry *dentry);
>  int v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid);
> --- linux-2.6.23-rc1-mm1/fs/9p/fid.c.old	2007-07-26 13:22:22.0 +0200
> +++ linux-2.6.23-rc1-mm1/fs/9p/fid.c	2007-07-26 13:22:40.0 +0200
> @@ -92,6 +92,7 @@
>  	return fid;
>  }
>
> +#if 0
>  struct p9_fid *v9fs_fid_lookup_remove(struct dentry *dentry)
>  {
>  	struct p9_fid *fid;
> @@ -107,6 +108,7 @@
>
>  	return fid;
>  }
> +#endif /* 0 */
>
>
>  /**
>
Re: net/9p/mux.c: use-after-free
On 7/25/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Wed, 25 Jul 2007 13:43:16 -0500 "Eric Van Hensbergen" <[EMAIL PROTECTED]> wrote:
> > mtmp = ERR_PTR(PTR_ERR(m->tagpool));
>
> odd. What does ERR_PTR(PTR_ERR(...)) do?

I kind of assumed it was a necessary evil to get the casting right. A quick grep shows it in 42 other places within the kernel. Unpacking the macros, it looks like:

	(void *)(long)(struct p9_idpool *)

So all that you would really need is (void *) or ERR_PTR, but that might look confusing in the code. Of course, broadening the context a bit:

	m->tagpool = p9_idpool_create();
	if (!m->tagpool) {
		mtmp = ERR_PTR(PTR_ERR(m->tagpool));
		kfree(m);
		return mtmp;
	}

m->tagpool must be zero to enter the code at all, so we are returning a NULL pointer, not really an error, which is probably wrong (I don't think it will properly trigger IS_ERR_VALUE). So we should probably be returning -ENOMEM.

Of course, we really should be seeing an ERR_PTR returned from p9_idpool_create, not 0. Checking that code, it either returns -ENOMEM or the correct value, never 0, so the check is wrong as well. It should be:

	m->tagpool = p9_idpool_create();
	if (IS_ERR(m->tagpool)) {
		mtmp = ERR_PTR(-ENOMEM);
		kfree(m);
		return mtmp;
	}

We could have done:

	ERR_PTR(m->tagpool);

or kept the long:

	ERR_PTR(PTR_ERR(m->tagpool));

but I think returning an explicit error code keeps the code more clear.

So, which is the correct approach?

-eric
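To make the macro unpacking in this message concrete, here is a minimal userspace sketch. The three helpers are simplified stand-ins written for this illustration; the kernel's real definitions live in include/linux/err.h and may differ in detail. `err_cast()` is a hypothetical name for the plain "(void *)" option mentioned in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel helpers (assumption, see lead-in). */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into the long it was built from. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top MAX_ERRNO bytes of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: re-type an error pointer without decoding it.
 * With the stand-ins above, ERR_PTR(PTR_ERR(p)) yields the same value.
 */
static inline void *err_cast(const void *ptr)
{
	return (void *)ptr;
}
```

With these definitions a NULL pointer decodes to 0 and is not an error pointer, which is the reason a `!m->tagpool` test followed by `ERR_PTR(PTR_ERR(...))` hands the caller NULL instead of an error.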
Re: net/9p/mux.c: use-after-free
On 7/25/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> Yep, it's a leak.

Okay, I'll roll that into the patch as well.

-eric
Re: net/9p/mux.c: use-after-free
On 7/22/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> The Coverity checker spotted the following use-after-free in net/9p/mux.c:
>
> <-- snip -->
>
> ...
> struct p9_conn *p9_conn_create(struct p9_transport *trans, int msize,
> 				unsigned char *extended)
> {
> ...
> 	if (!m->tagpool) {
> 		kfree(m);
> 		return ERR_PTR(PTR_ERR(m->tagpool));
> 	}
> ...
>
> <-- snip -->

I've got a fix for this one:

	if (!m->tagpool) {
		mtmp = ERR_PTR(PTR_ERR(m->tagpool));
		kfree(m);
		return mtmp;
	}

but I was wondering about one of the other returns further down the function:

	...
	memset(&m->poll_waddr, 0, sizeof(m->poll_waddr));
	m->poll_task = NULL;
	n = p9_mux_poll_start(m);
	if (n)
		return ERR_PTR(n);

	n = trans->poll(trans, &m->pt);
	...

lucho: doesn't that constitute a leak? Shouldn't we be doing:

	if (n) {
		kfree(m);
		return ERR_PTR(n);
	}

-eric
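Both problems raised in this message (reading `m` after freeing it, and returning without freeing it) can be avoided with a single cleanup label. The following is a userspace sketch under assumptions, not the mux.c code: `fake_conn`, `fake_conn_create()` and `fake_free()` are hypothetical, failures are injected through parameters, and the error-pointer helpers are simplified stand-ins for the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_conn {
	void *tagpool;
};

static int fake_free_count;	/* lets a test observe that cleanup ran */

static void fake_free(struct fake_conn *m)
{
	fake_free_count++;
	free(m);
}

/*
 * tagpool_err / poll_err: negative errno values to inject, 0 for success.
 * Every failure after the allocation funnels through one label, so the
 * error is copied out before the free and no path leaks m.
 */
static struct fake_conn *fake_conn_create(int tagpool_err, int poll_err)
{
	struct fake_conn *m = calloc(1, sizeof(*m));
	void *err;

	if (!m)
		return ERR_PTR(-ENOMEM);

	m->tagpool = tagpool_err ? ERR_PTR(tagpool_err) : (void *)m;
	if (IS_ERR(m->tagpool)) {
		err = m->tagpool;	/* read the field while m is still valid */
		goto error;
	}

	if (poll_err) {
		err = ERR_PTR(poll_err);
		goto error;
	}

	return m;

error:
	fake_free(m);			/* m is not touched after this point */
	return err;
}
```

The single-label form is a design choice for the sketch; the per-branch `kfree(m); return ...;` shown in the message is equivalent as long as the error value is captured before the free.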
Re: CTL_UNNUMBERED (Re: [PATCH] 9p: Don't use binary sysctl numbers.)
On 7/23/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> It doesn't really matter (for me) whether it is sysctl or sysfs
> interface. The sysctl approach seemed easier to implement. If the
> consensus is to use sysfs, I'll send a patch (for 2.6.24).
>
> Sorry for the incorrect implementation, I guess I stole the code from
> an inappropriate place :)

I think this is appropriate for a "fix" submission to the 2.6.23-rc series. If you don't have the bandwidth right now, I'll look at reworking it, or at the very least just removing the sysctl interface.

-eric
Re: CTL_UNNUMBERED (Re: [PATCH] 9p: Don't use binary sysctl numbers.)
On 7/21/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> Alexey Dobriyan <[EMAIL PROTECTED]> writes:
> >
> > That's separate patch but CTL_UNNUMBERED must die, because it's totally
> > unneeded. If you don't want sysctl(2) interface just SKIP ->ctl_name
> > initialization and save one line for something useful.
>
> As for the 9p code it doesn't seem to need or want a real binary
> interface. The 9p debug code picking a semi-random number and not
> patching it into sysctl.h like it should for a binary interface is an
> implementation bug, and a maintenance problem.

Now that -rc1 is out, let's talk a bit more about this. Lucho, can you provide some level of justification for why you went for a sysctl interface versus something directly accessible within the file system? That would seem more on par with the 9p philosophy.

Perhaps it's time for a general cleanup of the debug_level stuff. It was always ugly to have it as a global, but there was just no clear way to have the session structure available everywhere we use it.

-eric
Re: Plan 9 Resource Sharing Support - what is it?
On 7/23/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> Hi!
>
> What is "plan 9 resource sharing"? Some kind of mosix-like process
> migration? Could you explain it in two lines in Kconfig?
>
> http://v9fs.sf.net redirects to http://v9fs.sourceforge.net/ which
> tells me
>
> Moved to SWiK
>
> after clicking on that, I get to a page with content, but no explanation
> (could Kconfig/MAINTAINERS be updated?).

Sure, I'll try to put something in that is considerably less vague and I'll update the URLs as well. We've been somewhat lagging on documentation. The most complete explanation is available in the Freenix paper:

  http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html

The short answer is that it is a Linux client for sharing file systems, devices, and system services mapped as synthetic file systems via a simple protocol. Its primary focus has been to keep things simple, and to try to maintain support for being able to effectively share synthetic file systems (like /proc, or /sysfs, or ones exported by Plan 9 applications (Russ Cox's Plan 9 ports package contains a set of Plan 9 applications ported to UNIX such as the Venti content addressable storage system)).

It's being used internally by (at the very least) IBM Research and Los Alamos National Labs. To date we've been focused on the client and a few specialized servers; we are currently broadening our approach to a better general-purpose server and evaluating whether it makes sense to make an in-kernel server available. We've also been looking at using 9p in the context of virtualized environments to provide file service, and perhaps sharing of other system resources as well.

Once we have a better handle on the server story, we'll produce a much larger body of documentation discussing usage as well as development of 9p file servers. There are also side efforts underway evaluating different methods of extending the 9p protocol to better support the Linux (and other UNIX) environments, ideally without adding a significant amount of complexity.

-eric
[GIT PULL] 9p patch to fix compilation issue
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Dave Jones (1):
      fix debug compilation error

 v9fs.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Re: [PATCH] 9p: fix debug compilation error
On 7/16/07, Dave Jones <[EMAIL PROTECTED]> wrote:
On Mon, Jul 16, 2007 at 09:47:49AM -0500, Eric Van Hensbergen wrote:
> From: Meelis Roos <[EMAIL PROTECTED]>
>
> With 9P but no 9P debug options, this error occurs:
> CC [M] fs/9p/v9fs.o
> fs/9p/v9fs.c: In function 'v9fs_parse_options':
> fs/9p/v9fs.c:134: error: 'p9_debug_level' undeclared (first use in this function)
>
> The following patch moves the definition of p9_debug_level out of #ifdef
> and seems to fix the problem.
>
> (Original patch took care of the extern definition in the includes, but
> not the actual definition in mod.c - ericvh)

Seems somewhat wasteful to include the debug options when the config option has been disabled though. Wouldn't something like this (untested) make more sense?

Fair enough. Looks like I introduced this when I put back the mount-time debug option.

Dave

---
fs/9p/v9fs.c: In function 'v9fs_parse_options':
fs/9p/v9fs.c:134: error: 'p9_debug_level' undeclared (first use in this function)

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>
Acked-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

--- linux-2.6.22.noarch/fs/9p/v9fs.c~	2007-07-16 11:45:56.0 -0400
+++ linux-2.6.22.noarch/fs/9p/v9fs.c	2007-07-16 11:46:12.0 -0400
@@ -131,7 +131,9 @@ static void v9fs_parse_options(char *opt
 		switch (token) {
 		case Opt_debug:
 			v9ses->debug = option;
+#ifdef CONFIG_NET_9P_DEBUG
 			p9_debug_level = option;
+#endif
 			break;
 		case Opt_port:
 			v9ses->port = option;

--
http://www.codemonkey.org.uk
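The guard discussed in this message can be illustrated outside the kernel. The following is a minimal user-space sketch, not the v9fs code: `SKETCH_9P_DEBUG`, `sketch_debug_level`, `struct sketch_session` and `sketch_set_debug` are hypothetical stand-ins for `CONFIG_NET_9P_DEBUG`, `p9_debug_level`, the session structure and the `Opt_debug` case. It shows why wrapping the assignment in the same `#ifdef` as the definition lets the file compile whether or not debugging is configured.

```c
#include <assert.h>
#include <stddef.h>

/* Build with -DSKETCH_9P_DEBUG to mimic the debug option being enabled.
 * All names here are hypothetical stand-ins, not kernel symbols. */
#ifdef SKETCH_9P_DEBUG
unsigned int sketch_debug_level;   /* global exists only in debug builds */
#endif

struct sketch_session {
	unsigned int debug;            /* per-session copy, always present */
};

/* Mirrors the shape of the Opt_debug case: the per-session field is always
 * written, the global only when it has actually been compiled in. */
void sketch_set_debug(struct sketch_session *ses, unsigned int option)
{
	assert(ses != NULL);
	ses->debug = option;
#ifdef SKETCH_9P_DEBUG
	sketch_debug_level = option;
#endif
}
```

The alternative from the original patch, defining the global unconditionally, also compiles in both configurations; the trade-off raised in the thread is that it keeps the variable and its module parameter around even when debugging is configured out.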
[PATCH] 9p: fix debug compilation error
From: Meelis Roos <[EMAIL PROTECTED]>

With 9P but no 9P debug options, this error occurs:

  CC [M] fs/9p/v9fs.o
fs/9p/v9fs.c: In function 'v9fs_parse_options':
fs/9p/v9fs.c:134: error: 'p9_debug_level' undeclared (first use in this function)

The following patch moves the definition of p9_debug_level out of #ifdef and seems to fix the problem.

(Original patch took care of the extern definition in the includes, but not the actual definition in mod.c - ericvh)

Signed-off-by: Meelis Roos <[EMAIL PROTECTED]>
Signed-off-by: Eric Van Hensbergren <[EMAIL PROTECTED]>
---
 include/net/9p/9p.h |    4 ++--
 net/9p/mod.c        |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index 4d3..8fc3796 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -27,6 +27,8 @@
 #ifndef NET_9P_H
 #define NET_9P_H

+extern unsigned int p9_debug_level;
+
 #ifdef CONFIG_NET_9P_DEBUG

 #define P9_DEBUG_ERROR (1<<0)
@@ -38,8 +40,6 @@
 #define P9_DEBUG_SLABS (1<<7)
 #define P9_DEBUG_FCALL (1<<8)

-extern unsigned int p9_debug_level;
-
 #define P9_DPRINTK(level, format, arg...) \
 do { \
 	if ((p9_debug_level & level) == level) \
diff --git a/net/9p/mod.c b/net/9p/mod.c
index 4f9e1d2..951bb1d 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -28,12 +28,10 @@
 #include
 #include

-#ifdef CONFIG_NET_9P_DEBUG
 unsigned int p9_debug_level = 0; /* feature-rific global debug level */
 EXPORT_SYMBOL(p9_debug_level);
 module_param_named(debug, p9_debug_level, uint, 0);
 MODULE_PARM_DESC(debug, "9P debugging level");
-#endif

 extern int p9_mux_global_init(void);
 extern void p9_mux_global_exit(void);
--
1.5.0.2.gfbe3d-dirty
Re: [9Pfs] Compilation error
On 7/16/07, Alejandro Riveira Fernández <[EMAIL PROTECTED]> wrote:
I get this in today's head 8f41958bdd577731f7411c9605cfaa9db6766809

$ make O=../2.6.23
  Using /home/alex/kernel/linux-2.6 as source for kernel
  GEN     /home/alex/kernel/2.6.23/Makefile
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    /home/alex/kernel/linux-2.6/scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  CC [M]  fs/9p/v9fs.o
/home/alex/kernel/linux-2.6/fs/9p/v9fs.c: In function 'v9fs_parse_options':
/home/alex/kernel/linux-2.6/fs/9p/v9fs.c:134: error: 'p9_debug_level' undeclared (first use in this function)
/home/alex/kernel/linux-2.6/fs/9p/v9fs.c:134: error: (Each undeclared identifier is reported only once
/home/alex/kernel/linux-2.6/fs/9p/v9fs.c:134: error: for each function it appears in.)
make[3]: *** [fs/9p/v9fs.o] Error 1
make[2]: *** [fs/9p] Error 2
make[1]: *** [fs] Error 2
make: *** [_all] Error 2

Thanks, someone already submitted a patch, I'm merging and testing now.

-eric
[GIT PULL] 9p Patches for 2.6.23 merge window
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Latchesar Ionkov (3):
      Reorganization of 9p file system code
      Change net/9p module name to 9pnet
      Set error to EREMOTEIO if transport write returns zero

Eric Van Hensbergen (3):
      Cache meta-data when loose cache option is set
      Renable mount-time debug option
      Fix a race condition bug in umount which caused segfault

The bulk of the changes were in the reorganization patch which mostly moved files and interfaces around in preparation for work on an in-kernel 9p server.

 b/fs/9p/Makefile             |    6
 b/fs/9p/fid.c                |  168 ++
 b/fs/9p/fid.h                |   43 -
 b/fs/9p/v9fs.c               |  288 ++-
 b/fs/9p/v9fs.h               |   32 -
 b/fs/9p/v9fs_vfs.h           |    6
 b/fs/9p/vfs_addr.c           |   57 --
 b/fs/9p/vfs_dentry.c         |   37 -
 b/fs/9p/vfs_dir.c            |  155 +-
 b/fs/9p/vfs_file.c           |  160 +-
 b/fs/9p/vfs_inode.c          |  753 +++---
 b/fs/9p/vfs_super.c          |   91 +--
 b/fs/Kconfig                 |    2
 b/include/net/9p/9p.h        |  417 +
 b/include/net/9p/client.h    |   80 +++
 b/include/net/9p/conn.h      |   57 ++
 b/include/net/9p/transport.h |   49 ++
 b/net/9p/Kconfig             |   21
 b/net/9p/Makefile            |   13
 b/net/9p/client.c            |  965 +++
 b/net/9p/conv.c              |  903
 b/net/9p/error.c             |  240 +
 b/net/9p/fcprint.c           |  358 ++
 b/net/9p/mod.c               |   85 +++
 b/net/9p/mux.c               | 1050 +++
 b/net/9p/sysctl.c            |   86 +++
 b/net/9p/trans_fd.c          |  363 ++
 b/net/9p/util.c              |  125 +
 b/net/Kconfig                |    1
 b/net/Makefile               |    2
 fs/9p/9p.h                   |  375 ---
 fs/9p/conv.c                 |  845 --
 fs/9p/conv.h                 |   50 --
 fs/9p/debug.h                |   77 ---
 fs/9p/error.c                |   93 ---
 fs/9p/error.h                |  177 ---
 fs/9p/fcall.c                |  427 -
 fs/9p/fcprint.c              |  345 --
 fs/9p/mux.c                  | 1033 --
 fs/9p/mux.h                  |   55 --
 fs/9p/trans_fd.c             |  308
 fs/9p/transport.h            |   45 -
 fs/9p/v9fs.c                 |    7
 fs/9p/vfs_file.c             |   14
 fs/9p/vfs_inode.c            |    4
 fs/9p/vfs_super.c            |    3
 net/9p/Makefile              |    7
 net/9p/client.c              |    7
 net/9p/mux.c                 |    7
 49 files changed, 5400 insertions(+), 5092 deletions(-)
Re: [RFC][PATCH 2/14] Add a new mount flag (MNT_UNION) for union mount
On 5/15/07, Bharata B Rao <[EMAIL PROTECTED]> wrote:
So there can be two cases in union mounts:
1. A file exists in topmost layer and also in one or more lower layers. Deleting the file would result in the top layer file being deleted and a whiteout being created in the top layer.
2. A file exists in one or more of lower layers, but not in the topmost layer. Deleting this file would result in just a whiteout being created in the topmost layer.

I'd imagine there is a third potential option, which I'll admit strays a bit from the conventional UNIX semantic. If only one layer is marked as writable, then any changes (including delete) only affect that layer. I could imagine this would be useful in situations like overlaying a sandbox on an otherwise read-only source code tree (you might want to just get rid of a modification by removing your file and have it replaced by the original underlying source).

I suppose a further extension would be to have multiple layers marked as mutable and functions such as delete would affect all mutable layers, but functions like create would only affect the top mutable layer.

As an aside, perhaps it would be useful to mark the mutable layer at mount time (instead of having it always be the top layer). Again this could lead to some optional non-conventional file system semantics, but it's proven useful in Plan 9 union mount semantics and it seems a fairly trivial extension to what you currently have.

-eric
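The delete semantics discussed in this message can be sketched as a small model. This is only an illustration under stated assumptions, not the union mount patch: `enum entry`, `union_unlink` and the `whiteouts` flag are hypothetical names, layer 0 is taken to be the single writable top layer, and each layer records only whether one file name is absent, present, or whited out. With `whiteouts` set it follows the two quoted cases; with it cleared it follows the suggested writable-layer-only variant, where removing the top copy lets the underlying file show through again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-layer state of one file name; not a kernel structure. */
enum entry { ABSENT, PRESENT, WHITEOUT };

/*
 * Delete one name from a union of nlayers layers; layers[0] is the topmost,
 * writable layer. Returns false when there is nothing to delete.
 */
bool union_unlink(enum entry *layers, size_t nlayers, bool whiteouts)
{
	assert(layers != NULL && nlayers > 0);

	/* The name is visible only if the first non-absent layer holds it. */
	size_t i = 0;
	while (i < nlayers && layers[i] == ABSENT)
		i++;
	if (i == nlayers || layers[i] == WHITEOUT)
		return false;

	if (!whiteouts) {
		/* Writable-layer-only variant: touch nothing but the top copy. */
		if (layers[0] != PRESENT)
			return false;
		layers[0] = ABSENT;
		return true;
	}

	/* Cases 1 and 2: hide any lower copy behind a whiteout in the top layer. */
	bool below = false;
	for (size_t j = 1; j < nlayers; j++)
		if (layers[j] == PRESENT)
			below = true;

	layers[0] = below ? WHITEOUT : ABSENT;
	return true;
}
```

In the sandbox example from the message, calling this with `whiteouts` cleared on a file present in both layers removes the modified copy and leaves the read-only original visible.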
Re: [V9fs-developer] [PATCH 1/4] v9fs: rename non-vfs related structs and functions to be moved to net/9p
On 5/8/07, Andrew Morton <[EMAIL PROTECTED]> wrote: On Tue, 8 May 2007 14:51:02 -0600 "Latchesar Ionkov" <[EMAIL PROTECTED]> wrote: > This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p. > It moves the transport, packet marshalling and connection layers to net/9p > leaving only the VFS related files in fs/9p. (Please cc [EMAIL PROTECTED] on net-related work) These changes would be best handled via Eric's git tree, with appropriate acks from the net maintainers. Ack. I was waiting to see if Lucho hit any major brick walls with the community before pulling them in. -eric
Re: [V9fs-developer] [PATCH] 9p: create separate 9p client interface
On 4/30/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote: On Mon, Apr 30, 2007 at 09:32:41AM -0600, Latchesar Ionkov wrote: > Create a separate 9P client interface that can be used outside the VFS > layer. In addition to VFS, the new interface can be used to export the > authentication channel or from other interfaces. And what exact users would that be? We have a huge dislike for putting abstractions in just for the abstractions sake, so if you want this merged you'd better present a highly useful client to that interface. I'll let Lucho give more details on his possible uses for such an interface -- for my part we have been looking at doing in-kernel servers for more efficient export of devices and system services (such as the network stack). We've been using user space applications for doing such sharing, but there are undesirable inefficiencies in such an approach. 19 files changed, 1656 insertions(+), 1585 deletions(-) I believe the log message was poorly worded, it was more of a reorganization of the existing interface versus the creation of an additional interface. Also the non-filesystem interface code shouldn't live in fs/ but rather in net/9p/ Which bits do you think are candidates for such a move? The transport interfaces? (there are a few more in the wings to cover shared memory transports for VMMs among other things) Should the protocol elements move as well? -- that seems a bit fuzzier to me. -eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 0/8] unprivileged mount syscall
On 4/6/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote: Jan Engelhardt wrote: > On Apr 6 2007 16:16, H. Peter Anvin wrote: - users can use bind mounts without having to pre-configure them in /etc/fstab >> This is by far the biggest concern I see. I think the security implication of >> allowing anyone to do bind mounts are poorly understood. > > $ whoami > miklos > $ mount --bind / ~/down_under > > later that day: > # userdel -r miklos > Consider backups, for example. This is the reason why enforcing private namespaces for user mounts makes sense. I think it catches many of these corner cases. -eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] 9p patches
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Adrian Bunk (1):
      make struct v9fs_cached_file_operations static

 v9fs_vfs.h |    1 -
 vfs_file.c |    4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)
Re: [-mm patch] fs/9p/vfs_addr.c: make 2 functions static
On 2/19/07, Adrian Bunk <[EMAIL PROTECTED]> wrote: On Sat, Feb 17, 2007 at 09:51:46PM -0800, Andrew Morton wrote: >... > Changes since 2.6.20-mm1: >... > git-v9fs.patch >... > git trees >... This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Acked-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
[GIT PULL] 9p patches
Linus, please pull from the 'for-linus' branch of:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git/ for-linus

This tree contains the following:

Eric Van Hensbergen (1):
      Implement optional loose read cache

Eric W. Biederman (1):
      Use kthread_stop instead of sending a SIGKILL.

 Documentation/filesystems/00-INDEX |    4 ++--
 Documentation/filesystems/9p.txt   |    4
 fs/9p/fid.c                        |    3 ++-
 fs/9p/mux.c                        |    5 +
 fs/9p/v9fs.c                       |    9 -
 fs/9p/v9fs.h                       |    9 -
 fs/9p/v9fs_vfs.h                   |    2 ++
 fs/9p/vfs_addr.c                   |    2 ++
 fs/9p/vfs_dentry.c                 |   26 ++
 fs/9p/vfs_file.c                   |   18 ++
 fs/9p/vfs_inode.c                  |   20
 11 files changed, 89 insertions(+), 13 deletions(-)
Re: [PATCH] 9p: add write-cache support to loose cache mode (take 4)
On 2/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote: On Fri, 16 Feb 2007 18:46:59 -0600 Eric Van Hensbergen <[EMAIL PROTECTED]> wrote: > + if(!PageUptodate(page)) { > + if (to - from != PAGE_CACHE_SIZE) { > + void *kaddr = kmap_atomic(page, KM_USER0); > + memset(kaddr, 0, from); > + memset(kaddr + to, 0, PAGE_CACHE_SIZE - to); > + flush_dcache_page(page); > + kunmap_atomic(kaddr, KM_USER0); > + } > + if ((file->f_flags & O_ACCMODE) != O_WRONLY) > + v9fs_vfs_readpage_worker(file, page); > + } Seems strange to memset part of the page and to then go and fill the page in from backing store. Perhaps some optimisation is possible here? Just double-checking in an effort to actually get the next patch right (hopefully) -- seems like there are two cases -- if I can read from the file, I just call readpage and it'll zero out bits. If the file is open write-only, things are a little cloudy -- fs/cifs looks like they just don't do anything. In the write-only case, do I need to zero the unwritten portions of the page, or does this get handled under the covers? Looks like NFS just avoids this by only writing the bits that change, which I suppose has other advantages. I'll refactor the writepage code to follow the NFS example versus the CIFS code I originally based my implementation on. -eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
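The zeroing step under discussion can be shown in isolation. The sketch below is a user-space illustration only: `SKETCH_PAGE_SIZE` stands in for `PAGE_CACHE_SIZE`, `zero_outside_range` is a hypothetical helper, and a plain byte buffer replaces the kmapped page. It clears the bytes before `from` and at or after `to`, leaving the range about to be written untouched, which is what the two `memset()` calls in the quoted hunk do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* stand-in for PAGE_CACHE_SIZE */

/*
 * Zero every byte of a page-sized buffer that lies outside [from, to).
 * The caller is expected to fill [from, to) with the data being written.
 */
void zero_outside_range(unsigned char *page, unsigned from, unsigned to)
{
	assert(page != NULL);
	assert(from <= to && to <= SKETCH_PAGE_SIZE);

	memset(page, 0, from);                        /* head: [0, from)       */
	memset(page + to, 0, SKETCH_PAGE_SIZE - to);  /* tail: [to, PAGE_SIZE) */
}
```

If the page is subsequently read in full from the server, as in the readable-file case described above, this zeroing is redundant work, which is the optimisation being asked about.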
[PATCH] 9p: add write-cache support to loose cache mode (take 4)
Loose cache mode was added primarily to asssist exclusive, read-only mounts (like venti) -- however, there is also a case for using loose write cacheing in support of read/write exclusive mounts. This feature is linked to the loose cache option and is disabled by default. This code adds the necessary code to support writes through the page cache. Write caches are not used for synthetic files or for files opened in APPEND mode. Signed-of-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- fs/9p/9p.h|2 +- fs/9p/conv.c | 18 +- fs/9p/conv.h |2 +- fs/9p/fcall.c | 10 ++- fs/9p/fid.c | 16 +++--- fs/9p/fid.h |2 +- fs/9p/v9fs_vfs.h |2 + fs/9p/vfs_addr.c | 173 +--- fs/9p/vfs_dir.c |2 +- fs/9p/vfs_file.c | 61 ++ fs/9p/vfs_inode.c | 20 +- fs/9p/vfs_super.c |3 +- 12 files changed, 265 insertions(+), 46 deletions(-) diff --git a/fs/9p/9p.h b/fs/9p/9p.h index 94e2f92..6f8edf0 100644 --- a/fs/9p/9p.h +++ b/fs/9p/9p.h @@ -370,6 +370,6 @@ int v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, u32 count, struct v9fs_fcall **rcall); int v9fs_t_write(struct v9fs_session_info *v9ses, u32 fid, u64 offset, -u32 count, const char __user * data, +u32 count, const char __user * data, char * kdata, struct v9fs_fcall **rcall); int v9fs_printfcall(char *, int, struct v9fs_fcall *, int); diff --git a/fs/9p/conv.c b/fs/9p/conv.c index a3ed571..9bb075a 100644 --- a/fs/9p/conv.c +++ b/fs/9p/conv.c @@ -458,6 +458,15 @@ v9fs_put_user_data(struct cbuf *bufp, const char __user * data, int count, return copy_from_user(*pdata, data, count); } +static int +v9fs_put_kernel_data(struct cbuf *bufp, char * kdata, int count, + unsigned char **pdata) +{ + *pdata = buf_alloc(bufp, count); + memcpy(*pdata, kdata, count); + return 0; +} + static void v9fs_put_wstat(struct cbuf *bufp, struct v9fs_wstat *wstat, struct v9fs_stat *stat, int statsz, int extended) @@ -723,7 +732,7 @@ struct v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count) } struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 
offset, u32 count, - const char __user * data) + const char __user * data, char * kdata) { int size, err; struct v9fs_fcall *fc; @@ -738,7 +747,12 @@ struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, v9fs_put_int32(bufp, fid, &fc->params.twrite.fid); v9fs_put_int64(bufp, offset, &fc->params.twrite.offset); v9fs_put_int32(bufp, count, &fc->params.twrite.count); - err = v9fs_put_user_data(bufp, data, count, &fc->params.twrite.data); + if(data) + err = v9fs_put_user_data(bufp, data, count, + &fc->params.twrite.data); + else + err = v9fs_put_kernel_data(bufp, kdata, count, + &fc->params.twrite.data); if (err) { kfree(fc); fc = ERR_PTR(err); diff --git a/fs/9p/conv.h b/fs/9p/conv.h index dd5b6b1..8091672 100644 --- a/fs/9p/conv.h +++ b/fs/9p/conv.h @@ -42,7 +42,7 @@ struct v9fs_fcall *v9fs_create_tcreate(u32 fid, char *name, u32 perm, u8 mode, char *extension, int extended); struct v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count); struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, - const char __user *data); + const char __user *data, char *kdata); struct v9fs_fcall *v9fs_create_tclunk(u32 fid); struct v9fs_fcall *v9fs_create_tremove(u32 fid); struct v9fs_fcall *v9fs_create_tstat(u32 fid); diff --git a/fs/9p/fcall.c b/fs/9p/fcall.c index dc336a6..ca77839 100644 --- a/fs/9p/fcall.c +++ b/fs/9p/fcall.c @@ -367,7 +367,7 @@ v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, int ret; struct v9fs_fcall *tc, *rc; - dprintk(DEBUG_9P, "fid %d offset 0x%llux count 0x%x\n", fid, + dprintk(DEBUG_9P, "fid %d offset 0x%llx count 0x%x\n", fid, (long long unsigned) offset, count); tc = v9fs_create_tread(fid, offset, count); @@ -393,21 +393,23 @@ v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, * @fid: fid to write to * @offset: offset to start write at * @count: how many bytes to write + * @data: userspace data + * @kdata: kernelspace data * @fcall: pointer to response fcall * */ int v9fs_t_write(struct 
v9fs_session_info *v9ses, u32 fid, u64 offset, u32 count, - const char __user *data, struct v9fs_fcall **rcp) + const char __user *data, char *kdata, struct v9fs_fcall **rcp) { int ret; struct v9fs_fcall *tc,
Re: [PATCH] 9p: add write-cache support to loose cache mode (take 3)
On 2/16/07, Andrew Morton <[EMAIL PROTECTED]> wrote: On Fri, 16 Feb 2007 09:37:01 -0600 Eric Van Hensbergen <[EMAIL PROTECTED]> wrote: > +static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc) > +{ > + char *buffer = NULL; > + struct address_space *mapping = page->mapping; > + int retval = -EIO; > + loff_t offset = 0; > + pgoff_t end_index; > + int count = PAGE_CACHE_SIZE; > + struct file *filp = v9fs_find_file(page); > + struct inode *inode = mapping->host; > + > + dprintk(DEBUG_VFS, "page: %p\n", page); > + > + if ((!inode) || (!filp)) > + goto UnlockPage; > + > + end_index = inode->i_size >> PAGE_CACHE_SHIFT; > + > + /* complicated case at end of file */ > + if (page->index >= end_index) { > + /* things got complicated... */ > + count = inode->i_size & (PAGE_CACHE_SIZE - 1); > + if (page->index >= end_index + 1 || !count) > + return 0; /* truncated - don't care */ > + } > + > + buffer = kmap(page); > + offset = ((loff_t) page->index << PAGE_CACHE_SHIFT); > + page_cache_get(page); > + retval = v9fs_write(filp, NULL, buffer, count, &offset); > + > kunmap(page); > + > + UnlockPage: > unlock_page(page); > + page_cache_release(page); > + > return retval; > } The page_cache_get/release here aren't needed: lock_page suffices. Are you sure the page refcounting is right if the `goto UnlockPage' happens? Can that goto actually happen?? It shouldn't, if it does we are probably in big trouble anyways. I'll pull it out. Thanks. -eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
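The end-of-file arithmetic in the quoted writepage code can be checked on its own. This is a user-space sketch, not the kernel function: `SKETCH_PAGE_SHIFT`, `SKETCH_PAGE_SIZE` and `writepage_count` are hypothetical names, with a 4096-byte page assumed in place of `PAGE_CACHE_SIZE`. It returns how many bytes of a given page fall inside a file of size `i_size`, with 0 meaning the page lies entirely past the end of the file (the "truncated - don't care" case).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Number of bytes of page `index` that belong to a file of `i_size` bytes. */
unsigned writepage_count(uint64_t index, uint64_t i_size)
{
	uint64_t end_index = i_size >> SKETCH_PAGE_SHIFT;  /* page holding EOF */
	unsigned count = SKETCH_PAGE_SIZE;                 /* full page by default */

	if (index >= end_index) {
		/* Page at or beyond EOF: only the partial tail is valid. */
		count = (unsigned)(i_size & (SKETCH_PAGE_SIZE - 1));
		if (index >= end_index + 1 || count == 0)
			return 0;                              /* nothing to write */
	}
	return count;
}
```

For a 10000-byte file, pages 0 and 1 are written in full, page 2 contributes 10000 - 8192 = 1808 bytes, and page 3 is skipped.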
[PATCH] 9p: add write-cache support to loose cache mode (take 3)
Loose cache mode was added primarily to asssist exclusive, read-only mounts (like venti) -- however, there is also a case for using loose write cacheing in support of read/write exclusive mounts. This feature is linked to the loose cache option and is disabled by default. This code adds the necessary code to support writes through the page cache. Write caches are not used for synthetic files or for files opened in APPEND mode. Signed-of-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- fs/9p/9p.h|2 +- fs/9p/conv.c | 18 +- fs/9p/conv.h |2 +- fs/9p/fcall.c | 10 ++- fs/9p/fid.c | 16 +++--- fs/9p/fid.h |2 +- fs/9p/v9fs_vfs.h |2 + fs/9p/vfs_addr.c | 183 ++--- fs/9p/vfs_dir.c |2 +- fs/9p/vfs_file.c | 61 ++ fs/9p/vfs_inode.c | 20 +- fs/9p/vfs_super.c |3 +- 12 files changed, 275 insertions(+), 46 deletions(-) diff --git a/fs/9p/9p.h b/fs/9p/9p.h index 94e2f92..6f8edf0 100644 --- a/fs/9p/9p.h +++ b/fs/9p/9p.h @@ -370,6 +370,6 @@ int v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, u32 count, struct v9fs_fcall **rcall); int v9fs_t_write(struct v9fs_session_info *v9ses, u32 fid, u64 offset, -u32 count, const char __user * data, +u32 count, const char __user * data, char * kdata, struct v9fs_fcall **rcall); int v9fs_printfcall(char *, int, struct v9fs_fcall *, int); diff --git a/fs/9p/conv.c b/fs/9p/conv.c index a3ed571..9bb075a 100644 --- a/fs/9p/conv.c +++ b/fs/9p/conv.c @@ -458,6 +458,15 @@ v9fs_put_user_data(struct cbuf *bufp, const char __user * data, int count, return copy_from_user(*pdata, data, count); } +static int +v9fs_put_kernel_data(struct cbuf *bufp, char * kdata, int count, + unsigned char **pdata) +{ + *pdata = buf_alloc(bufp, count); + memcpy(*pdata, kdata, count); + return 0; +} + static void v9fs_put_wstat(struct cbuf *bufp, struct v9fs_wstat *wstat, struct v9fs_stat *stat, int statsz, int extended) @@ -723,7 +732,7 @@ struct v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count) } struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 
offset, u32 count, - const char __user * data) + const char __user * data, char * kdata) { int size, err; struct v9fs_fcall *fc; @@ -738,7 +747,12 @@ struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, v9fs_put_int32(bufp, fid, &fc->params.twrite.fid); v9fs_put_int64(bufp, offset, &fc->params.twrite.offset); v9fs_put_int32(bufp, count, &fc->params.twrite.count); - err = v9fs_put_user_data(bufp, data, count, &fc->params.twrite.data); + if(data) + err = v9fs_put_user_data(bufp, data, count, + &fc->params.twrite.data); + else + err = v9fs_put_kernel_data(bufp, kdata, count, + &fc->params.twrite.data); if (err) { kfree(fc); fc = ERR_PTR(err); diff --git a/fs/9p/conv.h b/fs/9p/conv.h index dd5b6b1..8091672 100644 --- a/fs/9p/conv.h +++ b/fs/9p/conv.h @@ -42,7 +42,7 @@ struct v9fs_fcall *v9fs_create_tcreate(u32 fid, char *name, u32 perm, u8 mode, char *extension, int extended); struct v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count); struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, - const char __user *data); + const char __user *data, char *kdata); struct v9fs_fcall *v9fs_create_tclunk(u32 fid); struct v9fs_fcall *v9fs_create_tremove(u32 fid); struct v9fs_fcall *v9fs_create_tstat(u32 fid); diff --git a/fs/9p/fcall.c b/fs/9p/fcall.c index dc336a6..ca77839 100644 --- a/fs/9p/fcall.c +++ b/fs/9p/fcall.c @@ -367,7 +367,7 @@ v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, int ret; struct v9fs_fcall *tc, *rc; - dprintk(DEBUG_9P, "fid %d offset 0x%llux count 0x%x\n", fid, + dprintk(DEBUG_9P, "fid %d offset 0x%llx count 0x%x\n", fid, (long long unsigned) offset, count); tc = v9fs_create_tread(fid, offset, count); @@ -393,21 +393,23 @@ v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, * @fid: fid to write to * @offset: offset to start write at * @count: how many bytes to write + * @data: userspace data + * @kdata: kernelspace data * @fcall: pointer to response fcall * */ int v9fs_t_write(struct 
v9fs_session_info *v9ses, u32 fid, u64 offset, u32 count, - const char __user *data, struct v9fs_fcall **rcp) + const char __user *data, char *kdata, struct v9fs_fcall **rcp) { int ret; struct v9fs_fcall *tc,
[PATCH] 9p: add write-cache support to loose cache mode (take 2)
Loose cache mode was added primarily to asssist exclusive, read-only mounts (like venti) -- however, there is also a case for using loose write cacheing in support of read/write exclusive mounts. This feature is linked to the loose cache option and is disabled by default. This code adds the necessary code to support writes through the page cache. Write caches are not used for synthetic files or for files opened in APPEND mode. Signed-of-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- fs/9p/9p.h|2 +- fs/9p/conv.c | 19 +- fs/9p/conv.h |2 +- fs/9p/fcall.c | 10 ++- fs/9p/fid.c | 17 +++-- fs/9p/fid.h |2 +- fs/9p/v9fs_vfs.h |3 + fs/9p/vfs_addr.c | 189 +--- fs/9p/vfs_dir.c |2 +- fs/9p/vfs_file.c | 63 ++ fs/9p/vfs_inode.c | 24 +-- fs/9p/vfs_super.c |3 +- 12 files changed, 287 insertions(+), 49 deletions(-) diff --git a/fs/9p/9p.h b/fs/9p/9p.h index 94e2f92..6f8edf0 100644 --- a/fs/9p/9p.h +++ b/fs/9p/9p.h @@ -370,6 +370,6 @@ int v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, u32 count, struct v9fs_fcall **rcall); int v9fs_t_write(struct v9fs_session_info *v9ses, u32 fid, u64 offset, -u32 count, const char __user * data, +u32 count, const char __user * data, char * kdata, struct v9fs_fcall **rcall); int v9fs_printfcall(char *, int, struct v9fs_fcall *, int); diff --git a/fs/9p/conv.c b/fs/9p/conv.c index a3ed571..8d90c79 100644 --- a/fs/9p/conv.c +++ b/fs/9p/conv.c @@ -458,6 +458,15 @@ v9fs_put_user_data(struct cbuf *bufp, const char __user * data, int count, return copy_from_user(*pdata, data, count); } +static int +v9fs_put_kernel_data(struct cbuf *bufp, char * kdata, int count, + unsigned char **pdata) +{ + *pdata = buf_alloc(bufp, count); + memcpy(*pdata, kdata, count); + return 0; +} + static void v9fs_put_wstat(struct cbuf *bufp, struct v9fs_wstat *wstat, struct v9fs_stat *stat, int statsz, int extended) @@ -723,7 +732,7 @@ struct v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count) } struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 
offset, u32 count, - const char __user * data) + const char __user * data, char * kdata) { int size, err; struct v9fs_fcall *fc; @@ -738,7 +747,13 @@ struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, v9fs_put_int32(bufp, fid, &fc->params.twrite.fid); v9fs_put_int64(bufp, offset, &fc->params.twrite.offset); v9fs_put_int32(bufp, count, &fc->params.twrite.count); - err = v9fs_put_user_data(bufp, data, count, &fc->params.twrite.data); + if(data) { + err = v9fs_put_user_data(bufp, data, count, + &fc->params.twrite.data); + } else { + err = v9fs_put_kernel_data(bufp, kdata, count, + &fc->params.twrite.data); + } if (err) { kfree(fc); fc = ERR_PTR(err); diff --git a/fs/9p/conv.h b/fs/9p/conv.h index dd5b6b1..8091672 100644 --- a/fs/9p/conv.h +++ b/fs/9p/conv.h @@ -42,7 +42,7 @@ struct v9fs_fcall *v9fs_create_tcreate(u32 fid, char *name, u32 perm, u8 mode, char *extension, int extended); struct v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count); struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, - const char __user *data); + const char __user *data, char *kdata); struct v9fs_fcall *v9fs_create_tclunk(u32 fid); struct v9fs_fcall *v9fs_create_tremove(u32 fid); struct v9fs_fcall *v9fs_create_tstat(u32 fid); diff --git a/fs/9p/fcall.c b/fs/9p/fcall.c index dc336a6..ca77839 100644 --- a/fs/9p/fcall.c +++ b/fs/9p/fcall.c @@ -367,7 +367,7 @@ v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, int ret; struct v9fs_fcall *tc, *rc; - dprintk(DEBUG_9P, "fid %d offset 0x%llux count 0x%x\n", fid, + dprintk(DEBUG_9P, "fid %d offset 0x%llx count 0x%x\n", fid, (long long unsigned) offset, count); tc = v9fs_create_tread(fid, offset, count); @@ -393,21 +393,23 @@ v9fs_t_read(struct v9fs_session_info *v9ses, u32 fid, u64 offset, * @fid: fid to write to * @offset: offset to start write at * @count: how many bytes to write + * @data: userspace data + * @kdata: kernelspace data * @fcall: pointer to response fcall * */ int 
v9fs_t_write(struct v9fs_session_info *v9ses, u32 fid, u64 offset, u32 count, - const char __user *data, struct v9fs_fcall **rcp) + const char __user *data, char *kdata, struct v9fs_fcall **rcp) { int ret; struct v9f
Re: [RESEND][PATCH] 9p: add write-cache support to loose cache mode
On 2/13/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > On Tue, 13 Feb 2007 17:55:31 -0600 Eric Van Hensbergen <[EMAIL PROTECTED]> wrote: > +int v9fs_prepare_write(struct file *file, struct page *page, > +unsigned from, unsigned to) > +{ > + if (!PageUptodate(page)) { > + if (to - from != PAGE_CACHE_SIZE) { > + void *kaddr = kmap_atomic(page, KM_USER0); > + memset(kaddr, 0, from); > + memset(kaddr + to, 0, PAGE_CACHE_SIZE - to); > + flush_dcache_page(page); > + kunmap_atomic(kaddr, KM_USER0); > + } > + SetPageUptodate(page); > + } This will mark the page uptodate while the piece between `to' and `from' is uninitialised. A concurrent pagefault can come in and permit a read of that uninitialised data. Because filemap_nopage() doesn't lock the page if it is uptodate. Okay - I snagged this code from fs/libfs.c (simple_prepare_write) -- is that code also not correct, or am I just using it in the wrong context? -eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RESEND][PATCH] 9p: add write-cache support to loose cache mode
Loose cache mode was added primarily to assist exclusive, read-only mounts (like venti) -- however, there is also a case for using loose write cacheing in support of read/write exclusive mounts. This feature is linked to the loose cache option and is disabled by default. This patch adds the code needed to support writes through the page cache. Write caches are not used for synthetic files or for files opened in APPEND mode.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 Documentation/filesystems/9p.txt |2
 fs/9p/9p.h |2
 fs/9p/conv.c | 19
 fs/9p/conv.h |2
 fs/9p/fcall.c|6 +
 fs/9p/fid.c | 17 ++--
 fs/9p/fid.h |2
 fs/9p/v9fs_vfs.h |2
 fs/9p/vfs_addr.c | 180 --
 fs/9p/vfs_dir.c |2
 fs/9p/vfs_file.c | 58 ++--
 fs/9p/vfs_inode.c| 13 ++-
 fs/9p/vfs_super.c|3 -
 13 files changed, 264 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index bbd8b28..36ed211 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -42,7 +42,7 @@ OPTIONS
   cache=mode	specifies a cacheing policy. By default, no caches are used.
 			loose = no attempts are made at consistency,
-				intended for exclusive, read-only mounts
+				intended for exclusive mounts

   debug=n	specifies debug level. The debug level is a bitmask.
0x01 = display verbose error messages diff --git a/fs/9p/9p.h b/fs/9p/9p.h index 94e2f92..6f8edf0 100644 --- a/fs/9p/9p.h +++ b/fs/9p/9p.h @@ -370,6 +370,6 @@ int v9fs_t_read(struct v9fs_session_info u64 offset, u32 count, struct v9fs_fcall **rcall); int v9fs_t_write(struct v9fs_session_info *v9ses, u32 fid, u64 offset, -u32 count, const char __user * data, +u32 count, const char __user * data, char * kdata, struct v9fs_fcall **rcall); int v9fs_printfcall(char *, int, struct v9fs_fcall *, int); diff --git a/fs/9p/conv.c b/fs/9p/conv.c index a3ed571..89c3d3c 100644 --- a/fs/9p/conv.c +++ b/fs/9p/conv.c @@ -458,6 +458,15 @@ v9fs_put_user_data(struct cbuf *bufp, co return copy_from_user(*pdata, data, count); } +static int +v9fs_put_kernel_data(struct cbuf *bufp, char * kdata, int count, + unsigned char **pdata) +{ + *pdata = buf_alloc(bufp, count); + memcpy(*pdata, kdata, count); + return 0; +} + static void v9fs_put_wstat(struct cbuf *bufp, struct v9fs_wstat *wstat, struct v9fs_stat *stat, int statsz, int extended) @@ -723,7 +732,7 @@ struct v9fs_fcall *v9fs_create_tread(u32 } struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, - const char __user * data) + const char __user * data, char * kdata) { int size, err; struct v9fs_fcall *fc; @@ -738,7 +747,13 @@ struct v9fs_fcall *v9fs_create_twrite(u3 v9fs_put_int32(bufp, fid, &fc->params.twrite.fid); v9fs_put_int64(bufp, offset, &fc->params.twrite.offset); v9fs_put_int32(bufp, count, &fc->params.twrite.count); - err = v9fs_put_user_data(bufp, data, count, &fc->params.twrite.data); + if(data) { + err = v9fs_put_user_data(bufp, data, count, + &fc->params.twrite.data); + } else { + err = v9fs_put_kernel_data(bufp, kdata, count, + &fc->params.twrite.data); + } if (err) { kfree(fc); fc = ERR_PTR(err); diff --git a/fs/9p/conv.h b/fs/9p/conv.h index dd5b6b1..8091672 100644 --- a/fs/9p/conv.h +++ b/fs/9p/conv.h @@ -42,7 +42,7 @@ struct v9fs_fcall *v9fs_create_tcreate(u char *extension, int extended); struct 
v9fs_fcall *v9fs_create_tread(u32 fid, u64 offset, u32 count); struct v9fs_fcall *v9fs_create_twrite(u32 fid, u64 offset, u32 count, - const char __user *data); + const char __user *data, char *kdata); struct v9fs_fcall *v9fs_create_tclunk(u32 fid); struct v9fs_fcall *v9fs_create_tremove(u32 fid); struct v9fs_fcall *v9fs_create_tstat(u32 fid); diff --git a/fs/9p/fcall.c b/fs/9p/fcall.c index dc336a6..671301a 100644 --- a/fs/9p/fcall.c +++ b/fs/9p/fcall.c @@ -393,13 +393,15 @@ v9fs_t_read(struct v9fs_session_info *v9 * @fid: fid to write to * @offset: offset to start write at * @count: how many bytes to write + * @data: userspace data + * @kdata: kernelspace data * @fcall: pointer to response fcall * */ i
[PATCH] 9p: add write-cache support to loose cache mode
Loose cache mode was added primarily to assist exclusive, read-only mounts (like venti) -- however, there is also a case for using loose write cacheing in support of read/write exclusive mounts. This feature is linked to the loose cache option and is disabled by default. This patch adds the code needed to support writes through the page cache. Write caches are not used for synthetic files or for files opened in APPEND mode.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 Documentation/filesystems/9p.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index bbd8b28..36ed211 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -42,7 +42,7 @@ OPTIONS
   cache=mode	specifies a cacheing policy. By default, no caches are used.
 			loose = no attempts are made at consistency,
-				intended for exclusive, read-only mounts
+				intended for exclusive mounts

   debug=n	specifies debug level. The debug level is a bitmask.
 			0x01 = display verbose error messages
--
1.4.1
[PATCH] 9p: implement optional loose read cache
While cacheing is generally frowned upon in the 9p world, it has its place -- particularly in situations where the remote file system is exclusive and/or read-only. The vacfs views of venti content addressable store are a real-world instance of such a situation. To facilitate higher performance for these workloads (and eventually use the fscache patches), we have enabled a "loose" cache mode which does not attempt to maintain any form of consistency on the page-cache or dcache. This results in over two orders of magnitude performance improvement for cacheable block reads in the Bonnie benchmark. The more aggressive use of the dcache also seems to improve metadata operational performance. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- Documentation/filesystems/00-INDEX |4 ++-- Documentation/filesystems/9p.txt |4 fs/9p/fid.c|3 ++- fs/9p/v9fs.c |9 - fs/9p/v9fs.h |9 - fs/9p/v9fs_vfs.h |2 ++ fs/9p/vfs_addr.c |2 ++ fs/9p/vfs_dentry.c | 26 ++ fs/9p/vfs_file.c | 18 ++ fs/9p/vfs_inode.c | 20 10 files changed, 88 insertions(+), 9 deletions(-) diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 4dc28cc..5717858 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -4,6 +4,8 @@ Exporting - explanation of how to make filesystems exportable. Locking - info on locking rules as they pertain to Linux VFS. +9p.txt + - 9p (v9fs) is an implementation of the Plan 9 remote fs protocol. adfs.txt - info and mount options for the Acorn Advanced Disc Filing System. afs.txt @@ -82,8 +84,6 @@ udf.txt - info and mount options for the UDF filesystem. ufs.txt - info on the ufs filesystem. -v9fs.txt - - v9fs is a Unix implementation of the Plan 9 9p remote fs protocol. 
vfat.txt - info on using the VFAT filesystem used in Windows NT and Windows 95 vfs.txt diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index 4d075a4..bbd8b28 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -40,6 +40,10 @@ OPTIONS aname=name aname specifies the file tree to access when the server is offering several exported file systems. + cache=mode specifies a cacheing policy. By default, no caches are used. + loose = no attempts are made at consistency, +intended for exclusive, read-only mounts + debug=n specifies debug level. The debug level is a bitmask. 0x01 = display verbose error messages 0x02 = developer debug (DEBUG_CURRENT) diff --git a/fs/9p/fid.c b/fs/9p/fid.c index a9b6301..9041971 100644 --- a/fs/9p/fid.c +++ b/fs/9p/fid.c @@ -136,7 +136,8 @@ struct v9fs_fid *v9fs_fid_lookup(struct } /** - * v9fs_fid_clone - lookup the fid for a dentry, clone a private copy and release it + * v9fs_fid_clone - lookup the fid for a dentry, clone a private copy and + * release it * @dentry: dentry to look for fid in * * find a fid in the dentry and then clone to a new private fid diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index d9b561b..6ad6f19 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -53,6 +53,8 @@ enum { Opt_uname, Opt_remotename, /* Options that take no arguments */ Opt_legacy, Opt_nodevmap, Opt_unix, Opt_tcp, Opt_fd, + /* Cache options */ + Opt_cache_loose, /* Error token */ Opt_err }; @@ -76,6 +78,8 @@ static match_table_t tokens = { {Opt_fd, "fd"}, {Opt_legacy, "noextend"}, {Opt_nodevmap, "nodevmap"}, + {Opt_cache_loose, "cache=loose"}, + {Opt_cache_loose, "loose"}, {Opt_err, NULL} }; @@ -106,6 +110,7 @@ static void v9fs_parse_options(char *opt v9ses->debug = 0; v9ses->rfdno = ~0; v9ses->wfdno = ~0; + v9ses->cache = 0; if (!options) return; @@ -121,7 +126,6 @@ static void v9fs_parse_options(char *opt "integer field, but no integer?\n"); continue; } - } switch (token) { case Opt_port: @@ 
-169,6 +173,9 @@ static void v9fs_parse_options(char *opt case Opt_nodevmap: v9ses->nodev = 1; break; + case Opt_cache_loose: + v9ses->cache = CACHE_LOOSE; + break; default:
Re: [RFC][PATCH] dm-cow: copy-on-write stackable target for device-mapper
On 11/27/06, Eric Van Hensbergen <[EMAIL PROTECTED]> wrote:
> Subject: [RFC] [PATCH] dm-cow: copy-on-write stackable target for device-mapper
>
> +
> +A simple script file is included to format WB.
> +The way it works is the standard dmsetup calls for
> +the device mapper. The arguments are
> +

There have been several requests for the "simple" script file. Portions of its implementation are somewhat specific to our installation (like the use of the rc shell). I'm including it inline here - not sure where else to put it -- perhaps in the Documentation:

#!/usr/bin/rc

tableboth ='0 devblksz cow devread 0 devwrite devbmapw 0'
devsize = `{sfdisk -s $1}

fn wipeoutdev{
	bmapsize = `{echo $devsize / 512 /2 |bc}
	echo dd 'if=/dev/zero' 'of='^$1 'count='^$bmapsize 'bs=512'
	dd 'if=/dev/zero' 'of='^$1 'count='^$bmapsize 'bs=512'
}

fn escapedev{
	echo $1|sed 's/\//\\\//g'
}

fn setcow{
	devread = `{escapedev $1}
	devwrite = `{escapedev $2}
	devbmapw = `{escapedev $3}
	tableset = `{echo $tableboth|sed -r 's/devread/'^$devread^'/'}
	tableset = `{echo $tableset|sed -r 's/devwrite/'^$devwrite^'/'}
	tableset = `{echo $tableset|sed -r 's/devbmapw/'^$devbmapw^'/'}
	devblksz = `{expr $devsize '*' 2}
	tableset = `{echo $tableset|sed -r 's/devblksz/'^$devblksz^'/'}
	echo trying to set $tableset
	echo $tableset > /tmp/tableset
	echo trying to dmsetup create $4
	dmsetup create $4 << EOF
$tableset
EOF
	dmsetup table
}

if(! ~ $#* 4){
	echo usage: newcow origdev destdev bmapdest namecow
	exit 1
}

namecow=$4
origdev=$1
wipeoutdev $2
setcow $1 $2 $3 $namecow
[PATCH] 9p: null terminate error strings for debug print
From: Eric Van Hensbergen <[EMAIL PROTECTED]>

We weren't properly NULL terminating protocol error strings for our debug printk resulting in garbage being included in the output when debug was enabled.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 fs/9p/error.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/9p/error.c b/fs/9p/error.c
index ae91555..0d7fa4e 100644
--- a/fs/9p/error.c
+++ b/fs/9p/error.c
@@ -83,6 +83,7 @@ int v9fs_errstr2errno(char *errstr, int len)

 	if (errno == 0) {
 		/* TODO: if error isn't found, add it dynamically */
+		errstr[len] = 0;
 		printk(KERN_ERR "%s: errstr :%s: not found\n", __FUNCTION__,
 		       errstr);
 		errno = 1;
--
1.5.0.rc1.gde38
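The underlying issue is that 9P strings arrive length-delimited rather than NUL-terminated, so printing them with `%s` reads past the end. The patch terminates the string in place, which relies on the buffer having room for one extra byte. As a rough userspace sketch of the same idea, the hypothetical helper below (`sketch_copy_errstr` is not part of v9fs) copies into a bounded buffer and terminates the copy instead:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a length-delimited string into dst and NUL-terminate it, truncating
 * if dst is too small. Returns the number of bytes copied (excluding NUL).
 */
static size_t sketch_copy_errstr(char *dst, size_t dstsz,
				 const char *src, size_t len)
{
	assert(dst != NULL && src != NULL && dstsz > 0);
	if (len > dstsz - 1)
		len = dstsz - 1;	/* leave room for the terminator */
	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

Copying costs a little more than terminating in place, but it does not depend on how the receive buffer was sized.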
[PATCH] 9p: fix segfault caused by race condition in meta-data operations
From: Eric Van Hensbergen <[EMAIL PROTECTED]> - unquoted Running dbench multithreaded exposed a race condition where fid structures were removed while in use. This patch adds semaphores to meta-data operations to protect the fid structure. Some cleanup of error-case handling in the inode operations is also included. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- fs/9p/fid.c | 69 +- fs/9p/fid.h |5 ++ fs/9p/vfs_file.c | 47 ++-- fs/9p/vfs_inode.c | 204 ++-- 4 files changed, 196 insertions(+), 129 deletions(-) diff --git a/fs/9p/fid.c b/fs/9p/fid.c index 2750720..a9b6301 100644 --- a/fs/9p/fid.c +++ b/fs/9p/fid.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "debug.h" #include "v9fs.h" @@ -84,6 +85,7 @@ struct v9fs_fid *v9fs_fid_create(struct v9fs_session_info *v9ses, int fid) new->iounit = 0; new->rdir_pos = 0; new->rdir_fcall = NULL; + init_MUTEX(&new->lock); INIT_LIST_HEAD(&new->list); return new; @@ -102,11 +104,11 @@ void v9fs_fid_destroy(struct v9fs_fid *fid) } /** - * v9fs_fid_lookup - retrieve the right fid from a particular dentry + * v9fs_fid_lookup - return a locked fid from a dentry * @dentry: dentry to look for fid in - * @type: intent of lookup (operation or traversal) * - * find a fid in the dentry + * find a fid in the dentry, obtain its semaphore and return a reference to it. 
+ * code calling lookup is responsible for releasing lock * * TODO: only match fids that have the same uid as current user * @@ -124,7 +126,68 @@ struct v9fs_fid *v9fs_fid_lookup(struct dentry *dentry) if (!return_fid) { dprintk(DEBUG_ERROR, "Couldn't find a fid in dentry\n"); + return_fid = ERR_PTR(-EBADF); } + if(down_interruptible(&return_fid->lock)) + return ERR_PTR(-EINTR); + return return_fid; } + +/** + * v9fs_fid_clone - lookup the fid for a dentry, clone a private copy and release it + * @dentry: dentry to look for fid in + * + * find a fid in the dentry and then clone to a new private fid + * + * TODO: only match fids that have the same uid as current user + * + */ + +struct v9fs_fid *v9fs_fid_clone(struct dentry *dentry) +{ + struct v9fs_session_info *v9ses = v9fs_inode2v9ses(dentry->d_inode); + struct v9fs_fid *base_fid, *new_fid = ERR_PTR(-EBADF); + struct v9fs_fcall *fcall = NULL; + int fid, err; + + base_fid = v9fs_fid_lookup(dentry); + + if(IS_ERR(base_fid)) + return base_fid; + + if(base_fid) { /* clone fid */ + fid = v9fs_get_idpool(&v9ses->fidpool); + if (fid < 0) { + eprintk(KERN_WARNING, "newfid fails!\n"); + new_fid = ERR_PTR(-ENOSPC); + goto Release_Fid; + } + + err = v9fs_t_walk(v9ses, base_fid->fid, fid, NULL, &fcall); + if (err < 0) { + dprintk(DEBUG_ERROR, "clone walk didn't work\n"); + v9fs_put_idpool(fid, &v9ses->fidpool); + new_fid = ERR_PTR(err); + goto Free_Fcall; + } + new_fid = v9fs_fid_create(v9ses, fid); + if (new_fid == NULL) { + dprintk(DEBUG_ERROR, "out of memory\n"); + new_fid = ERR_PTR(-ENOMEM); + } +Free_Fcall: + kfree(fcall); + } + +Release_Fid: + up(&base_fid->lock); + return new_fid; +} + +void v9fs_fid_clunk(struct v9fs_session_info *v9ses, struct v9fs_fid *fid) +{ + v9fs_t_clunk(v9ses, fid->fid); + v9fs_fid_destroy(fid); +} diff --git a/fs/9p/fid.h b/fs/9p/fid.h index aa974d6..48fc170 100644 --- a/fs/9p/fid.h +++ b/fs/9p/fid.h @@ -30,6 +30,8 @@ struct v9fs_fid { struct list_head list; /* list of fids associated with a 
dentry */ struct list_head active; /* XXX - debug */ + struct semaphore lock; + u32 fid; unsigned char fidopen;/* set when fid is opened */ unsigned char fidclunked; /* set when fid has already been clunked */ @@ -55,3 +57,6 @@ struct v9fs_fid *v9fs_fid_get_created(struct dentry *); void v9fs_fid_destroy(struct v9fs_fid *fid); struct v9fs_fid *v9fs_fid_create(struct v9fs_session_info *, int fid); int v9fs_fid_insert(struct v9fs_fid *fid, struct dentry *dentry); +struct v9fs_fid *v9fs_fid_clone(struct dentry *dentry); +void v9fs_fid_clunk(struct v9fs_session_info *v9ses, struct v9fs_fid *fid); + diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c index e86a071..9f17b0c 100644 --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -55,53 +55,22 @@ int v9fs_file
[PATCH] 9p: update documentation regarding server applications
Update the documentation to cover using Inferno as a server for 9p and to include information about spfs (a stable single-threaded stand-alone 9p server).

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 Documentation/filesystems/9p.txt | 20 +---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index 43b89c2..be45fe3 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -73,8 +73,22 @@ OPTIONS
 RESOURCES
 =========

-The Linux version of the 9p server is now maintained under the npfs project
-on sourceforge (http://sourceforge.net/projects/npfs).
+Our current recommendation is to use Inferno (http://www.vitanuova.com/inferno)
+as the 9p server. You can start a 9p server under Inferno by issuing the
+following command:
+   ; styxlisten -A tcp!*!564 export '#U*'
+
+The -A specifies an unauthenticated export. The 564 is the port # (you may
+have to choose a higher port number if running as a normal user). The '#U*'
+specifies exporting the root of the Linux name space. You may specify a
+subset of the namespace by extending the path: '#U*'/tmp would just export
+/tmp. For more information, see the Inferno manual pages covering styxlisten
+and export.
+
+A Linux version of the 9p server is now maintained under the npfs project
+on sourceforge (http://sourceforge.net/projects/npfs). There is also a
+more stable single-threaded version of the server (named spfs) available from
+the same CVS repository.

 There are user and developer mailing lists available through the v9fs project
 on sourceforge (http://sourceforge.net/projects/v9fs).
@@ -96,5 +110,5 @@ STATUS

 The 2.6 kernel support is working on PPC and x86.

-PLEASE USE THE SOURCEFORGE BUG-TRACKER TO REPORT PROBLEMS.
+PLEASE USE THE KERNEL BUGZILLA TO REPORT PROBLEMS. (http://bugzilla.kernel.org)
--
1.5.0.rc1.gdf1b-dirty
[RESEND][PATCH] 9p: fix rename return code
9p doesn't handle renames between directories -- however, we were returning EPERM instead of EXDEV when we detected this case.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 fs/9p/vfs_inode.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 18f26cd..05d30e8 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -767,7 +767,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	/* 9P can only handle file rename in the same directory */
 	if (memcmp(&olddirfid->qid, &newdirfid->qid, sizeof(newdirfid->qid))) {
 		dprintk(DEBUG_ERROR, "old dir and new dir are different\n");
-		retval = -EPERM;
+		retval = -EXDEV;
 		goto FreeFcallnBail;
 	}

--
1.5.0.rc1.gdf1b-dirty
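The check in this patch amounts to "if the two directories are not the same object, refuse with EXDEV". A minimal userspace sketch of that decision follows. `struct sketch_qid` and `sketch_rename_check` are hypothetical names, and the field layout is an assumption made for illustration. The sketch compares fields individually instead of using memcmp on the whole struct, which keeps it independent of padding in this userspace layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a 9P qid; the field layout is an assumption. */
struct sketch_qid {
	uint8_t type;
	uint32_t version;
	uint64_t path;
};

/*
 * Return 0 when both directory qids identify the same directory, and
 * -EXDEV otherwise, matching the return code the patch switches to.
 */
static int sketch_rename_check(const struct sketch_qid *olddir,
			       const struct sketch_qid *newdir)
{
	if (olddir->type != newdir->type ||
	    olddir->version != newdir->version ||
	    olddir->path != newdir->path)
		return -EXDEV;
	return 0;
}
```

EXDEV is the errno that rename(2) callers conventionally treat as "not possible in place", so userspace tools generally have a copy-and-delete fallback for it, which they would not attempt on EPERM.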
[RESEND][PATCH] 9p: fix bogus return code checks during initialization
There is a simple logic error in init_v9fs - the return code checks are reversed. This patch fixes the return code and adds some messages to prevent module initialization from failing silently.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 fs/9p/mux.c |4 +++-
 fs/9p/v9fs.c | 11 ---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/9p/mux.c b/fs/9p/mux.c
index 944273c..147ceef 100644
--- a/fs/9p/mux.c
+++ b/fs/9p/mux.c
@@ -132,8 +132,10 @@ int v9fs_mux_global_init(void)
 		v9fs_mux_poll_tasks[i].task = NULL;

 	v9fs_mux_wq = create_workqueue("v9fs");
-	if (!v9fs_mux_wq)
+	if (!v9fs_mux_wq) {
+		printk(KERN_WARNING "v9fs: mux: creating workqueue failed\n");
 		return -ENOMEM;
+	}

 	return 0;
 }
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 0b96fae..d9b561b 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -457,14 +457,19 @@ static int __init init_v9fs(void)

 	v9fs_error_init();

-	printk(KERN_INFO "Installing v9fs 9P2000 file system support\n");
+	printk(KERN_INFO "Installing v9fs 9p2000 file system support\n");

 	ret = v9fs_mux_global_init();
-	if (!ret)
+	if (ret) {
+		printk(KERN_WARNING "v9fs: starting mux failed\n");
 		return ret;
+	}
 	ret = register_filesystem(&v9fs_fs_type);
-	if (!ret)
+	if (ret) {
+		printk(KERN_WARNING "v9fs: registering file system failed\n");
 		v9fs_mux_global_exit();
+	}
+
 	return ret;
 }
--
1.5.0.rc1.gdf1b-dirty
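The corrected control flow is the usual "non-zero means failure, undo earlier steps on the way out" pattern. The sketch below models it in userspace with stub steps whose return codes can be forced. All `sketch_*` names are hypothetical and only stand in for the mux and filesystem registration calls in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical knobs so each step can be forced to fail. */
static int sketch_mux_rc;
static int sketch_register_rc;
static int sketch_mux_running;

static int sketch_mux_init(void)
{
	if (sketch_mux_rc == 0)
		sketch_mux_running = 1;
	return sketch_mux_rc;
}

static void sketch_mux_exit(void)
{
	sketch_mux_running = 0;
}

static int sketch_register_fs(void)
{
	return sketch_register_rc;
}

/* Same shape as the fixed init_v9fs(): bail on non-zero, unwind on failure. */
static int sketch_init(void)
{
	int ret;

	ret = sketch_mux_init();
	if (ret)
		return ret;		/* nothing to undo yet */

	ret = sketch_register_fs();
	if (ret)
		sketch_mux_exit();	/* undo the step that succeeded */

	return ret;
}
```

With the reversed `if (!ret)` checks, the success path would have returned early after the first step and torn the mux down after a successful registration, which is the opposite of what this flow does.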
Re: [RFC][PATCH] dm-cache: block level disk cache target for device mapper
On 11/27/06, Eric Van Hensbergen <[EMAIL PROTECTED]> wrote:
> This is the first cut of a device-mapper target which provides a write-back
> or write-through block cache. It is intended to be used in conjunction with
> remote block devices such as iSCSI or ATA-over-Ethernet, particularly in
> cluster situations.

The technical paper describing our motivations and some performance results has finally made it through IBM's clearance process. It is available here:
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/ba52bef8b940e7438525723c006bafea?OpenDocument

-eric
Re: [RFC][PATCH] dm-cow: copy-on-write stackable target for device-mapper
On 11/27/06, Eric Van Hensbergen <[EMAIL PROTECTED]> wrote:
> Subject: [RFC] [PATCH] dm-cow: copy-on-write stackable target for device-mapper
>
> This is the first cut of a device-mapper target which allows stacking of
> multiple block devices and in which the top-layer of the stack is a
> copy-on-write layer. It was originally developed in support of a cluster
> image management solution.

The paper describing our motivation for this work including some description of this implementation and performance results is now available:
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/801d563d3be022198525723c006fafc1?OpenDocument

-eric
Re: [dm-devel] [RFC][PATCH] dm-cache: block level disk cache target for device mapper
On 11/30/06, Jens Wilke <[EMAIL PROTECTED]> wrote:
> On Monday 27 November 2006 19:26, Eric Van Hensbergen wrote:
>
> If this is intended to speed up remote disks, is it possible that the cache
> content can be paged out on local disks in low-mem situations?

The main intent was to use local disks as cache to offload centralized remote disks. The logic was that most systems have local disks, if only for swap -- so why not use them as a cache to help offload centralized storage. While the in-memory page cache works perfectly fine in certain situations -- we were dealing with workloads in which the in-memory page-cache wasn't sufficient to hold all the data.

There are also some additional possibilities we've thought through and have been playing with including allowing the local disk cache to be persistent across reboots (with varying validation schemes).

-eric
Re: [RFC][PATCH] dm-cache: block level disk cache target for device mapper
On 11/27/06, bert hubert <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 27, 2006 at 06:26:34PM +, Eric Van Hensbergen wrote:
> > This is the first cut of a device-mapper target which provides a write-back
> > or write-through block cache. It is intended to be used in conjunction with
> > remote block devices such as iSCSI or ATA-over-Ethernet, particularly in
> > cluster situations.
>
> How does this work in practice? In other words, what is a typical actual
> configuration?
>
> There is a remote block device, and a local one, and these are kept into
> sync in some way?

That's the basic idea. In our testbed, we had a single iSCSI server exporting block devices to several clients -- each maintaining their own local disk cache of the server exported block devices. You can configure either write-through or write-back policies -- write-back has better performance, but somewhat obvious consistency issues in failure cases.

The original intent was to combine this with the dm-cow target (which I posted a few hours before the dm-cache patch) to provide a scalable cluster deployment system based on back-end iSCSI or ATA-over-Ethernet storage.

-eric
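The trade-off between the two policies can be shown with a toy model. This is only a sketch, not the dm-cache implementation: it tracks a single cached value, and `struct sketch_cache`, `sketch_write` and `sketch_flush` are hypothetical names. Under write-through every write reaches the remote copy immediately. Under write-back the remote copy lags until a flush, which is where the consistency risk on failure comes from.

```c
#include <assert.h>

enum sketch_policy { SKETCH_WRITE_THROUGH, SKETCH_WRITE_BACK };

/* Toy single-block cache: one local value, one remote value, a dirty bit. */
struct sketch_cache {
	enum sketch_policy policy;
	int cached;	/* value held on the local cache disk */
	int remote;	/* value held by the remote block device */
	int dirty;	/* set when cached is newer than remote */
};

static void sketch_write(struct sketch_cache *c, int value)
{
	c->cached = value;
	if (c->policy == SKETCH_WRITE_THROUGH) {
		c->remote = value;	/* pay the remote latency on every write */
		c->dirty = 0;
	} else {
		c->dirty = 1;		/* defer the remote write */
	}
}

/* Push a deferred write out to the remote device, if there is one. */
static void sketch_flush(struct sketch_cache *c)
{
	if (c->dirty) {
		c->remote = c->cached;
		c->dirty = 0;
	}
}
```

If a write-back client fails between `sketch_write` and `sketch_flush`, the remote device still holds the old value. That gap is the failure-case inconsistency mentioned in the reply above.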
[RFC][PATCH] dm-cache: block level disk cache target for device mapper
This is the first cut of a device-mapper target which provides a write-back or write-through block cache. It is intended to be used in conjunction with remote block devices such as iSCSI or ATA-over-Ethernet, particularly in cluster situations.

In performance tests with iSCSI, it gave performance improvements of 2-10x over iSCSI alone when Postmark or Bonnie loads were applied from 8 clients to a single server. Evidence suggests even greater differences on larger clusters. A detailed performance analysis will be available shortly via a technical report on IBM's CyberDigest.

This module was developed during an internship at IBM Research by Ming Zhao. Please direct comments to both Ming and myself.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 drivers/md/Kconfig|6
 drivers/md/Makefile |1
 drivers/md/dm-cache.c | 1465 +
 3 files changed, 1472 insertions(+), 0 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index c92c152..0f23a15 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -261,6 +261,12 @@ config DM_MULTIPATH_EMC
 	---help---
 	  Multipath support for EMC CX/AX series hardware.

+config DM_CACHE
+	tristate "Cache target support (EXPERIMENTAL)"
+	depends on BLK_DEV_DM && EXPERIMENTAL
+	---help---
+	  Support for generic cache target for device-mapper.
+ endmenu endif diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 34957a6..49f7266 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -36,6 +36,7 @@ obj-$(CONFIG_DM_MULTIPATH_EMC)+= dm-emc obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o obj-$(CONFIG_DM_MIRROR)+= dm-mirror.o obj-$(CONFIG_DM_ZERO) += dm-zero.o +obj-$(CONFIG_DM_CACHE) += dm-cache.o quiet_cmd_unroll = UNROLL $@ cmd_unroll = $(PERL) $(srctree)/$(src)/unroll.pl $(UNROLL) \ diff --git a/drivers/md/dm-cache.c b/drivers/md/dm-cache.c new file mode 100755 index 000..209bae0 --- /dev/null +++ b/drivers/md/dm-cache.c @@ -0,0 +1,1465 @@ +/* + * dm-cache.c + * Device mapper target for block-level disk caching + * + * Copyright (C) International Business Machines Corp., 2006 + * Author: Ming Zhao ([EMAIL PROTECTED]) + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; under version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "dm.h" +#include "dm-io.h" +#include "dm-bio-list.h" +#include "kcopyd.h" + +#define DMC_DEBUG 0 + +#define DM_MSG_PREFIX "dm-cache" +#define DMC_PREFIX "dm-cache: " + +#if DMC_DEBUG +#define DPRINTK( s, arg... ) printk(DMC_PREFIX s "\n", ##arg) +#else +#define DPRINTK( s, arg... 
) +#endif + +#define WRITE_THROUGH 0 +#define WRITE_BACK 1 +#define DEFAULT_WRITE_POLICY WRITE_BACK + +#define DMCACHE_COPY_PAGES 1024 +#define DEFAULT_CACHE_SIZE 65536 +#define DEFAULT_CACHE_ASSOC1024 +#define DEFAULT_BLOCK_SIZE 8 +#define CONSECUTIVE_BLOCKS 128 + +#define HASH 0 +#define UNIFORM1 +#define DEFAULT_HASHFUNC UNIFORM + +/* states of a cache block */ +#define INVALID0 +#define VALID 1 /* Valid */ +#define RESERVED 2 /* Allocated but data not in place yet */ +#define DIRTY 4 /* Locally modified */ +#define WRITEBACK 8 /* In the process of write back */ + +/* + * cache: maps a cache range of a device. + */ +struct cache_c { + struct dm_dev *src_dev; /* Source device */ + struct dm_dev *cache_dev; /* Cache device */ + struct kcopyd_client *kcp_client; /* Kcopyd client for writing back data */ + + struct cacheblock *cache; /* Hash table for cache blocks */ + sector_t size; /* Cache size */ + unsigned int bits; /* Cache size in bits */ + unsigned int assoc; /* Cache associativity */ + unsigned int block_size;/* Cache block size */ + unsigned int block_shift; /* Cache block size in bits */ + unsigned int block_mask;
[RFC][PATCH] dm-cow: copy-on-write stackable target for device-mapper
Subject: [RFC] [PATCH] dm-cow: copy-on-write stackable target for device-mapper

This is the first cut of a device-mapper target which allows stacking of multiple block devices and in which the top-layer of the stack is a copy-on-write layer. It was originally developed in support of a cluster image management solution.

Existing device mapper snapshot facilities could be used to implement stackable block devices, as they support a copy-on-write mechanism for taking snapshots of logical volumes. However, benchmarks (using bonnie++) of such solutions showed an order of magnitude performance degradation. This target was written in an attempt to provide a stacking and copy-on-write solution which would incur only a minimal overhead. Detailed performance results will be available shortly in a technical report on IBM's CyberDigest website -- however, initial results obtained with bonnie++ show dramatic performance improvements.

The code within this module was developed by an intern (Gorka Guardiola) during his stay with us at IBM Research. Please direct comments both to myself and Gorka.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
---
 Documentation/device-mapper/dm-cow.txt | 29 +
 drivers/md/Kconfig | 15 +
 drivers/md/Makefile|1
 drivers/md/dm-cow.c| 926
 4 files changed, 971 insertions(+), 0 deletions(-)

diff --git a/Documentation/device-mapper/dm-cow.txt b/Documentation/device-mapper/dm-cow.txt
new file mode 100644
index 000..2b18ee6
--- /dev/null
+++ b/Documentation/device-mapper/dm-cow.txt
@@ -0,0 +1,29 @@
+This is a target for the dm-mapper which stacks
+block devices. The base image B is a formatted block
+device. Over that go N read only block devices R
+and then 1 write device W. It does copy on write
+of the devices, and reads from the appropiate device.
+You start by formatting B. Then add a W on it.
+W consists on two parts, a block device for the bitmap +which should start zeroed and which gets some magic number +on it the first time it is used. The you can add another +W and the first W turns into a R. WRB. and so on. + +A simple script file is included to format WB. +The way it works is the standard dmsetup calls for +the device mapper. The arguments are + +N M logdevname Bdevname Boffset Rbitmapdev Rdevname Roffset Wbitmapdev Wdevname Woffset + +N and M are the offsets on the logical device /dev/mapper/logdevname + +Bdevname is the base image device name +Boffset is the offset on the base image device +Rbitmapdev is the block device for the bitmap on a read only device + +and so on. + +a real world example of BW: + +0 8385866 cow /dev/sdb1 0 /dev/sdb2 /dev/sdb3 0 + diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index c92c152..dc86099 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -261,6 +261,21 @@ config DM_MULTIPATH_EMC ---help--- Multipath support for EMC CX/AX series hardware. +config DM_COW + tristate "Copy-on-write Stackable target (EXPERIMENTAL)" + depends on BLK_DEV_DM && EXPERIMENTAL + ---help--- + This is a target for the dm-mapper which stacks + block devices. The base image B is a formatted block + device. Over that go N read only block devices R + and then 1 write device W. It does copy on write + of the devices, and reads from the appropiate device. + You start by formatting B. Then add a W on it. + W consists on two parts, a block device for the bitmap + which should start zeroed and which gets some magic number + on it the first time it is used. The you can add another + W and the first W turns into a R. WRB. and so on. 
+ endmenu endif diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 34957a6..8a3d79f 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -36,6 +36,7 @@ obj-$(CONFIG_DM_MULTIPATH_EMC)+= dm-emc obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o obj-$(CONFIG_DM_MIRROR)+= dm-mirror.o obj-$(CONFIG_DM_ZERO) += dm-zero.o +obj-$(CONFIG_DM_COW) += dm-cow.o quiet_cmd_unroll = UNROLL $@ cmd_unroll = $(PERL) $(srctree)/$(src)/unroll.pl $(UNROLL) \ diff --git a/drivers/md/dm-cow.c b/drivers/md/dm-cow.c new file mode 100644 index 000..e77060e --- /dev/null +++ b/drivers/md/dm-cow.c @@ -0,0 +1,926 @@ +/* + * dm-cow.c + * Device mapper target for block-level disk caching + * + * Copyright (C) International Business Machines Corp., 2006 + * Author: Gorka Guardiola and Eric Van Hensbergen ([EMAIL PROTECTED]) + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; under version 2 of the License. + * + * This program is distributed in the hope that it will b
Re: FUSE merging?
On 9/3/05, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> > While FUSE doesn't handle it directly, doesn't it have to punt it to
> > its network file systems, how to the sshfs and what not handle this
> > sort of mapping?
>
> Sshfs handles it by not handling it. In this case it is neither
> possible, nor needed to be able to correctly map the id space.
>
> Yes, it may confuse the user. It may even confuse the kernel for
> sticky directories(*). But basically it just works, and is very
> simple.
>

In principle, Plan 9 file servers handle permission checking server-side, so we could likewise punt -- but it seemed a good idea to have some form of mapping for directory listings (and things like sticky directories) to make sense.

-eric

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: FUSE merging?
On 9/3/05, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
> > I agree that lots of people would like the functionality. I regret that
> > although it appears that v9fs could provide it,
>
> I think you are wrong there. You don't appreciate all the complexity
> FUSE _lacks_ by not being network transparent. Just look at the error
> text to errno conversion muck that v9fs has. And their problems with
> trying to do generic uid/gid mappings.
>

While FUSE doesn't handle it directly, doesn't it have to punt it to its network file systems? How do sshfs and the like handle this sort of mapping? Not really a criticism, just curious.

This doesn't so much relate to FUSE, but I've been wrestling with what to do about this chunk of (mapping) code -- it seems like it might be a good idea to have some common code shared amongst the networked file systems to handle this sort of thing. The NFS idmapd service seems overcomplicated, but something like that in the common code could provide the same level of service. What do folks think? Should someone (me?) take a whack at a common id mapping service for the kernel (or just extract idmapd from NFS) -- or is this something better implemented filesystem-to-filesystem?

> > there seems to be no interest in working on that.
>
> It would mean adding a plethora of extensions to the 9P protocol, that
> would take away all it's beauty. I think you should realize that
> these are different interfaces for different purposes. There may be
> some overlap, but not enough to warrant trying to massage them into
> one big ball.
>

A very good point. I toyed with the idea of creating a FUSE-API-compatible v9fs file server library - but there are a good many features (like extended attributes) that we don't have provisions for in the protocol -- and most likely a good deal of complexity in supporting some of these features that we may not want to deal with just yet.

Miklos is right: for the moment FUSE and v9fs have some overlap, but they remain very different things. FUSE is far more focused on delivering user-space file servers, and as such has a better solution for developing them. We are still focusing on getting the core of v9fs worked out. When we eventually have that working smoothly, I'd like to think we'd be able to spend some time developing a file server SDK as rich as FUSE (perhaps something API-compatible as I mentioned before) -- but we want to focus on getting the core protocol implementation right first, since it has uses beyond user-space file servers.

-eric
[-mm PATCH] v9fs: cleanup fd transport
[PATCH] v9fs: cleanup fd transport Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]> Signed-off-by: Eric Van Hensbegren <[EMAIL PROTECTED]> --- commit a1949213f1723a7b8bba8edfa118985460d31604 tree 40224cafbfb68543c60a8e0f04ae669cba2cedf7 parent 3f92b2539fe581ee9011d687fbd43cebb641465e author Eric Van Hensbergen <[EMAIL PROTECTED]> Wed, 31 Aug 2005 16:02:42 -0500 committer Eric Van Hensbergen <[EMAIL PROTECTED]> Wed, 31 Aug 2005 16:02:42 -0500 fs/9p/trans_fd.c | 42 +++--- fs/9p/v9fs.c |5 - 2 files changed, 39 insertions(+), 8 deletions(-) diff --git a/fs/9p/trans_fd.c b/fs/9p/trans_fd.c --- a/fs/9p/trans_fd.c +++ b/fs/9p/trans_fd.c @@ -56,6 +56,9 @@ static int v9fs_fd_recv(struct v9fs_tran { struct v9fs_trans_fd *ts = trans ? trans->priv : NULL; + if (!trans || trans->status != Connected || !ts) + return -EIO; + return kernel_read(ts->in_file, ts->in_file->f_pos, v, len); } @@ -73,6 +76,9 @@ static int v9fs_fd_send(struct v9fs_tran mm_segment_t oldfs = get_fs(); int ret = 0; + if (!trans || trans->status != Connected || !ts) + return -EIO; + set_fs(get_ds()); /* The cast to a user pointer is valid due to the set_fs() */ ret = vfs_write(ts->out_file, (void __user *)v, len, &ts->out_file->f_pos); @@ -95,6 +101,11 @@ v9fs_fd_init(struct v9fs_session_info *v struct v9fs_trans_fd *ts = NULL; struct v9fs_transport *trans = v9ses->transport; + if((v9ses->wfdno == ~0) || (v9ses->rfdno == ~0)) { + printk(KERN_ERR "v9fs: Insufficient options for proto=fd\n"); + return -ENOPROTOOPT; + } + sema_init(&trans->writelock, 1); sema_init(&trans->readlock, 1); @@ -103,11 +114,21 @@ v9fs_fd_init(struct v9fs_session_info *v if (!ts) return -ENOMEM; - trans->priv = ts; - ts->in_file = fget( v9ses->rfdno ); ts->out_file = fget( v9ses->wfdno ); + if (!ts->in_file || !ts->out_file) { + if (ts->in_file) + fput(ts->in_file); + + if (ts->out_file) + fput(ts->out_file); + + kfree(ts); + return -EIO; + } + + trans->priv = ts; trans->status = Connected; return 0; @@ -122,7 +143,22 @@ 
v9fs_fd_init(struct v9fs_session_info *v static void v9fs_fd_close(struct v9fs_transport *trans) { - struct v9fs_trans_fd *ts = trans ? trans->priv : NULL; + struct v9fs_trans_fd *ts; + + if (!trans) + return; + + trans->status = Disconnected; + ts = trans->priv; + + if (!ts) + return; + + if (ts->in_file) + fput(ts->in_file); + + if (ts->out_file) + fput(ts->out_file); kfree(ts); } diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -296,11 +296,6 @@ v9fs_session_init(struct v9fs_session_in case PROTO_FD: trans_proto = &v9fs_trans_fd; *v9ses->remotename = 0; - if((v9ses->wfdno == ~0) || (v9ses->rfdno == ~0)) { - printk(KERN_ERR "v9fs: Insufficient options for proto=fd\n"); - retval = -ENOPROTOOPT; - goto SessCleanUp; - } break; default: printk(KERN_ERR "v9fs: Bad mount protocol %d\n", v9ses->proto); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
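The init path in this patch takes two file references and backs out cleanly if either one is missing. A minimal userspace sketch of that acquire-both-or-release pattern follows. It uses stdio `FILE *` handles in place of the kernel's `fget()`/`fput()`, and `struct fd_pair`, `fd_pair_open` and `fd_pair_close` are hypothetical names invented for illustration, not part of v9fs.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct v9fs_trans_fd. */
struct fd_pair {
	FILE *in_file;
	FILE *out_file;
};

/*
 * Open both ends or neither. On partial failure, release whatever was
 * acquired, free the container, and report -EIO, mirroring the error
 * path added to v9fs_fd_init() in the patch.
 */
static int fd_pair_open(const char *in_path, const char *out_path,
			struct fd_pair **out)
{
	struct fd_pair *ts = malloc(sizeof(*ts));

	*out = NULL;
	if (!ts)
		return -ENOMEM;

	ts->in_file = fopen(in_path, "r");
	ts->out_file = fopen(out_path, "w");

	if (!ts->in_file || !ts->out_file) {
		if (ts->in_file)
			fclose(ts->in_file);
		if (ts->out_file)
			fclose(ts->out_file);
		free(ts);
		return -EIO;
	}

	*out = ts;	/* publish only once fully initialised */
	return 0;
}

/* Tolerates NULL, like the reworked v9fs_fd_close(). */
static void fd_pair_close(struct fd_pair *ts)
{
	if (!ts)
		return;
	if (ts->in_file)
		fclose(ts->in_file);
	if (ts->out_file)
		fclose(ts->out_file);
	free(ts);
}
```

As in the patch, the pointer is published to the caller only after both handles are known to be good, so a close routine never sees a half-built object.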
[PATCH] v9fs: Support to force umount
[PATCH] v9fs: Support to force umount Support for force umount Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]> Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- commit 3f92b2539fe581ee9011d687fbd43cebb641465e tree cd34696129c3b636b85578f659f260100196dee1 parent 83f1fe3d2adc3746d719e430d0a794de1f151c40 author Eric Van Hensbergen <[EMAIL PROTECTED]> Wed, 31 Aug 2005 15:53:14 -0500 committer Eric Van Hensbergen <[EMAIL PROTECTED]> Wed, 31 Aug 2005 15:53:14 -0500 fs/9p/mux.c | 20 fs/9p/mux.h |1 + fs/9p/v9fs.c |9 + fs/9p/v9fs.h |4 +--- fs/9p/vfs_super.c |9 + 5 files changed, 40 insertions(+), 3 deletions(-) diff --git a/fs/9p/mux.c b/fs/9p/mux.c --- a/fs/9p/mux.c +++ b/fs/9p/mux.c @@ -331,6 +331,26 @@ v9fs_mux_rpc(struct v9fs_session_info *v } /** + * v9fs_mux_cancel_requests - cancels all pending requests + * + * @v9ses: session info structure + * @err: error code to return to the requests + */ +void v9fs_mux_cancel_requests(struct v9fs_session_info *v9ses, int err) +{ + struct v9fs_rpcreq *rptr; + struct v9fs_rpcreq *rreq; + + dprintk(DEBUG_MUX, " %d\n", err); + spin_lock(&v9ses->muxlock); + list_for_each_entry_safe(rreq, rptr, &v9ses->mux_fcalls, next) { + rreq->err = err; + } + spin_unlock(&v9ses->muxlock); + wake_up_all(&v9ses->read_wait); +} + +/** * v9fs_recvproc - kproc to handle demultiplexing responses * @data: session info structure * diff --git a/fs/9p/mux.h b/fs/9p/mux.h --- a/fs/9p/mux.h +++ b/fs/9p/mux.h @@ -38,3 +38,4 @@ struct v9fs_rpcreq { int v9fs_mux_init(struct v9fs_session_info *v9ses, const char *dev_name); long v9fs_mux_rpc(struct v9fs_session_info *v9ses, struct v9fs_fcall *tcall, struct v9fs_fcall **rcall); +void v9fs_mux_cancel_requests(struct v9fs_session_info *v9ses, int err); diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -414,6 +414,15 @@ void v9fs_session_close(struct v9fs_sess putname(v9ses->remotename); } +/** + * v9fs_session_cancel - mark transport as disconnected + * and cancel all 
pending requests. + */ +void v9fs_session_cancel(struct v9fs_session_info *v9ses) { + v9ses->transport->status = Disconnected; + v9fs_mux_cancel_requests(v9ses, -EIO); +} + extern int v9fs_error_init(void); /** diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h --- a/fs/9p/v9fs.h +++ b/fs/9p/v9fs.h @@ -89,9 +89,7 @@ struct v9fs_session_info *v9fs_inode2v9s void v9fs_session_close(struct v9fs_session_info *v9ses); int v9fs_get_idpool(struct v9fs_idpool *p); void v9fs_put_idpool(int id, struct v9fs_idpool *p); -int v9fs_get_option(char *opts, char *name, char *buf, int buflen); -long long v9fs_get_int_option(char *opts, char *name, long long dflt); -int v9fs_parse_tcp_devname(const char *devname, char **addr, char **remotename); +void v9fs_session_cancel(struct v9fs_session_info *v9ses); #define V9FS_MAGIC 0x01021997 diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -257,10 +257,19 @@ static int v9fs_show_options(struct seq_ return 0; } +static void +v9fs_umount_begin(struct super_block *sb) +{ + struct v9fs_session_info *v9ses = sb->s_fs_info; + + v9fs_session_cancel(v9ses); +} + static struct super_operations v9fs_super_ops = { .statfs = simple_statfs, .clear_inode = v9fs_clear_inode, .show_options = v9fs_show_options, + .umount_begin = v9fs_umount_begin, }; struct file_system_type v9fs_fs_type = { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
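The cancel path above marks the transport as disconnected and stamps an error on every outstanding request. A single-threaded userspace sketch of that idea follows. The types `struct rpcreq` and `struct session` and the helpers `session_add` and `session_cancel` are simplified, hypothetical stand-ins; the kernel code additionally takes `muxlock` around the walk and wakes the waiters, both of which this sketch leaves out.

```c
#include <errno.h>
#include <stddef.h>

enum trans_status { Connected, Disconnected };

/* Hypothetical, simplified stand-ins for v9fs_rpcreq / v9fs_session_info. */
struct rpcreq {
	int tag;
	int err;		/* 0 while pending, negative errno once cancelled */
	struct rpcreq *next;
};

struct session {
	enum trans_status status;
	struct rpcreq *pending;	/* singly linked list of in-flight requests */
};

/* Queue a request at the head of the pending list. */
static void session_add(struct session *s, struct rpcreq *req)
{
	req->err = 0;
	req->next = s->pending;
	s->pending = req;
}

/*
 * Mark the transport dead and fail every in-flight request with err.
 * Returns the number of requests touched.
 */
static int session_cancel(struct session *s, int err)
{
	int n = 0;
	struct rpcreq *r;

	s->status = Disconnected;
	for (r = s->pending; r; r = r->next) {
		r->err = err;
		n++;
	}
	return n;
}
```

Calling `session_cancel(&s, -EIO)` corresponds to what `v9fs_session_cancel()` does on a forced umount: each waiter later finds a negative `err` and returns it instead of blocking.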
[PATCH 2.6.13-rc6-mm2] v9fs: remove sparse bitwise warnings
[PATCH] v9fs: remove sparse bitwise warnings

Fixed a bunch of cast conversions to remove -Wbitwise warnings from sparse.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

---
commit fec4b0831dba7e27e9531d0566eec1a5646f3e79
tree dfc14f433354a8dcdb049bc8137e7f31d7cbda3e
parent 67fefd3d8da2c41c41dfd9cd69765b74e246f31f
author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 17:23:47 -0500
committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 17:23:47 -0500

 fs/9p/conv.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/9p/conv.c b/fs/9p/conv.c
--- a/fs/9p/conv.c
+++ b/fs/9p/conv.c
@@ -88,7 +88,7 @@ static inline void buf_put_int16(struct
 {
 	buf_check_size(buf, 2);
 
-	*(u16 *) buf->p = cpu_to_le16(val);
+	*(__le16 *) buf->p = cpu_to_le16(val);
 	buf->p += 2;
 }
 
@@ -96,7 +96,7 @@ static inline void buf_put_int32(struct
 {
 	buf_check_size(buf, 4);
 
-	*(u32 *)buf->p = cpu_to_le32(val);
+	*(__le32 *)buf->p = cpu_to_le32(val);
 	buf->p += 4;
 }
 
@@ -104,7 +104,7 @@ static inline void buf_put_int64(struct
 {
 	buf_check_size(buf, 8);
 
-	*(u64 *)buf->p = cpu_to_le64(val);
+	*(__le64 *)buf->p = cpu_to_le64(val);
 	buf->p += 8;
 }
 
@@ -147,7 +147,7 @@ static inline u16 buf_get_int16(struct c
 	u16 ret = 0;
 
 	buf_check_size(buf, 2);
-	ret = le16_to_cpu(*(u16 *)buf->p);
+	ret = le16_to_cpu(*(__le16 *)buf->p);
 
 	buf->p += 2;
 
@@ -159,7 +159,7 @@ static inline u32 buf_get_int32(struct c
 	u32 ret = 0;
 
 	buf_check_size(buf, 4);
-	ret = le32_to_cpu(*(u32 *)buf->p);
+	ret = le32_to_cpu(*(__le32 *)buf->p);
 
 	buf->p += 4;
 
@@ -171,7 +171,7 @@ static inline u64 buf_get_int64(struct c
 	u64 ret = 0;
 
 	buf_check_size(buf, 8);
-	ret = le64_to_cpu(*(u64 *)buf->p);
+	ret = le64_to_cpu(*(__le64 *)buf->p);
 
 	buf->p += 8;
 
[RESEND][PATCH 2.6.13-rc6-mm2] v9fs: fix plan9port example in v9fs documentation.
[PATCH] v9fs: Fix Plan9port example in v9fs documentation.

Resend: to fix typo that I should have caught first time around.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

---
commit 678b78b5268b253e21aa818fac25ea13291eafff
tree fc3d94d10d23fedee95091e372c51e1156a0360f
parent 06e00e56fdf2c3e230ff60f6fdab6db789f16e73
author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:09:12 -0500
committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:09:12 -0500

 Documentation/filesystems/v9fs.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt
--- a/Documentation/filesystems/v9fs.txt
+++ b/Documentation/filesystems/v9fs.txt
@@ -20,7 +20,7 @@ For remote file server:
 
 For Plan 9 From User Space applications (http://swtch.com/plan9)
 
-	mount -t 9P /tmp/ns.root.:0/acme/acme /mnt/9 proto=unix,name=$USER
+	mount -t 9P `namespace`/acme /mnt/9 -o proto=unix,name=$USER
 
 OPTIONS
 =======
Re: [PATCH 2.6.13-rc6-mm2] v9fs: use standard kernel byteswapping routines
On 8/28/05, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> On Sun, Aug 28, 2005 at 04:05:07PM -0500, Eric Van Hensbergen wrote:
> > [PATCH] v9fs: use standard kernel byteswapping routines
> >
> > Originally suggested by hch, we have removed our byteswap code
> > and replaced it with calls to the standard kernel byteswapping code.
>
> > - buf->p[0] = val;
> > - buf->p[1] = val >> 8;
> > + *(u16 *) buf->p = cpu_to_le16(val);
>
> *(__le16 *)
>
> > - ret = buf->p[0] | (buf->p[1] << 8);
> > + ret = le16_to_cpu(*(u16 *)buf->p);
>
> *(__le16 *) etc.
>
> Otherwise sparse will warn.
>

It didn't give me any complaints -- I'm building my kernels with a recent (updated today) version of sparse and built with C=1 -- am I not invoking it correctly?

-eric
[PATCH 2.6.13-rc7-mm1] v9fs: adjust follow_link and put_link to match new VFS API
[PATCH] v9fs: adjust follow_link and put_link to match new VFS API

In 2.6.13-rc7 the prototypes for follow_link and put_link were changed to include support for a cookie to help reclaim resources. This patch adjusts their definitions in the v9fs implementation.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

---
commit 30bdd61e96418043a07d2da71bcd757a0341113f
tree 3e268ece4b911b960b47b47182972d8f439667da
parent e189afc5ed8102a56f74cb5be91a6bf3e478a06a
author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:33:42 -0500
committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:33:42 -0500

 fs/9p/vfs_inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1089,7 +1089,7 @@ static int v9fs_vfs_readlink(struct dent
  *
  */
 
-static int v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int len = 0;
 	char *link = __getname();
@@ -1109,7 +1109,7 @@ static int v9fs_vfs_follow_link(struct d
 	}
 
 	nd_set_link(nd, link);
-	return 0;
+	return NULL;
 }
 
 /**
@@ -1119,7 +1119,7 @@ static int v9fs_vfs_follow_link(struct d
  *
  */
 
-static void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+static void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
 {
 	char *s = nd_get_link(nd);
 
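The description above says the new prototypes carry a cookie to help reclaim resources. The v9fs change itself returns NULL and does not use the cookie, so the following is only a userspace sketch of what such a cookie makes possible: the "follow" step hands back an opaque pointer and the "put" step receives it again to release the allocation. `struct lookup_ctx`, `follow_link_sketch` and `put_link_sketch` are hypothetical names, not kernel API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct nameidata: just carries the link text. */
struct lookup_ctx {
	const char *link;
};

/*
 * Resolve a link and return an opaque cookie that the caller hands back
 * to put_link_sketch(). Here the cookie is the allocated buffer itself;
 * returning NULL (as the v9fs patch does) means "no cookie".
 */
static void *follow_link_sketch(struct lookup_ctx *ctx, const char *target)
{
	size_t len = strlen(target) + 1;
	char *buf = malloc(len);

	if (!buf) {
		ctx->link = NULL;
		return NULL;
	}
	memcpy(buf, target, len);
	ctx->link = buf;
	return buf;
}

/* Release whatever follow_link_sketch() allocated, identified by the cookie. */
static void put_link_sketch(struct lookup_ctx *ctx, void *cookie)
{
	free(cookie);		/* free(NULL) is a no-op */
	ctx->link = NULL;
}
```

The point of the design is that the release step no longer has to rediscover what was allocated; the caller simply passes back whatever the first step returned.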
[PATCH 2.6.13-rc6-mm2] v9fs: fix plan9port example in v9fs documentation.
[PATCH] v9fs: Fix Plan9port example in v9fs documentation.

Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>

---
commit 678b78b5268b253e21aa818fac25ea13291eafff
tree fc3d94d10d23fedee95091e372c51e1156a0360f
parent 06e00e56fdf2c3e230ff60f6fdab6db789f16e73
author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:09:12 -0500
committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:09:12 -0500

 Documentation/filesystems/v9fs.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt
--- a/Documentation/filesystems/v9fs.txt
+++ b/Documentation/filesystems/v9fs.txt
@@ -20,7 +20,7 @@ For remote file server:
 
 For Plan 9 From User Space applications (http://swtch.com/plan9)
 
-	mount -t 9P /tmp/ns.root.:0/acme/acme /mnt/9 proto=unix,name=$USER
+	mount -t 9P `namsepace`/acme /mnt/9 -o proto=unix,name=$USER
 
 OPTIONS
 =======
[PATCH 2.6.13-rc6-mm2] v9fs: use standard kernel byteswapping routines
[PATCH] v9fs: use standard kernel byteswapping routines Originally suggested by hch, we have removed our byteswap code and replaced it with calls to the standard kernel byteswapping code. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- commit 06e00e56fdf2c3e230ff60f6fdab6db789f16e73 tree 6eff647a71c056d133aa0f0a9e0a0ff95af05683 parent f32fc66e311abe9e7167991e6b2d37e7c56dcc72 author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:03:40 -0500 committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 16:03:40 -0500 fs/9p/conv.c | 28 ++-- 1 files changed, 6 insertions(+), 22 deletions(-) diff --git a/fs/9p/conv.c b/fs/9p/conv.c --- a/fs/9p/conv.c +++ b/fs/9p/conv.c @@ -88,8 +88,7 @@ static inline void buf_put_int16(struct { buf_check_size(buf, 2); - buf->p[0] = val; - buf->p[1] = val >> 8; + *(u16 *) buf->p = cpu_to_le16(val); buf->p += 2; } @@ -97,10 +96,7 @@ static inline void buf_put_int32(struct { buf_check_size(buf, 4); - buf->p[0] = val; - buf->p[1] = val >> 8; - buf->p[2] = val >> 16; - buf->p[3] = val >> 24; + *(u32 *)buf->p = cpu_to_le32(val); buf->p += 4; } @@ -108,14 +104,7 @@ static inline void buf_put_int64(struct { buf_check_size(buf, 8); - buf->p[0] = val; - buf->p[1] = val >> 8; - buf->p[2] = val >> 16; - buf->p[3] = val >> 24; - buf->p[4] = val >> 32; - buf->p[5] = val >> 40; - buf->p[6] = val >> 48; - buf->p[7] = val >> 56; + *(u64 *)buf->p = cpu_to_le64(val); buf->p += 8; } @@ -158,7 +147,7 @@ static inline u16 buf_get_int16(struct c u16 ret = 0; buf_check_size(buf, 2); - ret = buf->p[0] | (buf->p[1] << 8); + ret = le16_to_cpu(*(u16 *)buf->p); buf->p += 2; @@ -170,9 +159,7 @@ static inline u32 buf_get_int32(struct c u32 ret = 0; buf_check_size(buf, 4); - ret = - buf->p[0] | (buf->p[1] << 8) | (buf->p[2] << 16) | (buf-> - p[3] << 24); + ret = le32_to_cpu(*(u32 *)buf->p); buf->p += 4; @@ -184,10 +171,7 @@ static inline u64 buf_get_int64(struct c u64 ret = 0; buf_check_size(buf, 8); - ret = (u64) buf->p[0] | ((u64) 
buf->p[1] << 8) | - ((u64) buf->p[2] << 16) | ((u64) buf->p[3] << 24) | - ((u64) buf->p[4] << 32) | ((u64) buf->p[5] << 40) | - ((u64) buf->p[6] << 48) | ((u64) buf->p[7] << 56); + ret = le64_to_cpu(*(u64 *)buf->p); buf->p += 8; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
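The patch above swaps hand-written byte shifts for the kernel's `cpu_to_le*`/`le*_to_cpu` helpers. The two styles should produce the same wire bytes, which a small userspace sketch can show. It assumes a glibc/Linux environment where `<endian.h>` provides `htole32()` and `le32toh()` as stand-ins for the kernel helpers, and it uses `memcpy()` rather than the pointer cast in the patch to stay clear of alignment problems. The function names are made up for this example.

```c
#define _DEFAULT_SOURCE		/* assumption: glibc, for htole32()/le32toh() */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Original style: store a 32-bit value least-significant byte first. */
static void put_le32_shift(uint8_t *p, uint32_t val)
{
	p[0] = (uint8_t)val;
	p[1] = (uint8_t)(val >> 8);
	p[2] = (uint8_t)(val >> 16);
	p[3] = (uint8_t)(val >> 24);
}

/* Replacement style: convert once, then copy the whole word. */
static void put_le32_conv(uint8_t *p, uint32_t val)
{
	uint32_t le = htole32(val);

	memcpy(p, &le, sizeof(le));	/* memcpy avoids unaligned stores */
}

/* Read a little-endian 32-bit value back into host byte order. */
static uint32_t get_le32_conv(const uint8_t *p)
{
	uint32_t le;

	memcpy(&le, p, sizeof(le));
	return le32toh(le);
}
```

On a little-endian host the conversion is a no-op and on a big-endian host it is a byte swap, so both writers lay down the same four bytes in either case.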
[PATCH] v9fs: fix a problem with named-pipe transport
[PATCH] v9fs: fix a problem with named-pipe transport Found the problem. I am not sure why, but unix_mkname in net/unix/af_unix.c writes a zero byte outside the sockaddr_un parameter. There is even a comment that it might seem like a bug, but it is not -- I didn't understand the explanation -- it looks like a bug to me :) The patch that I am attaching sets addr_len parameter of ops->connect to sizeof(struct sockaddr_un) - 1 and thus ensures that unix_mkname won't write outside the struct. The patch also checks if the length of the unix socket name specified in mount doesn't exceed UNIX_PATH_MAX. Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]> Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- commit f32fc66e311abe9e7167991e6b2d37e7c56dcc72 tree 3b2a77e0c674e86aed92823857d33352d93938f3 parent 97bc19b509356dda0145cd19fb9768ac3c88ecda author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 13:29:03 -0500 committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 13:29:03 -0500 fs/9p/trans_sock.c | 24 1 files changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/9p/trans_sock.c b/fs/9p/trans_sock.c --- a/fs/9p/trans_sock.c +++ b/fs/9p/trans_sock.c @@ -202,14 +202,23 @@ static int v9fs_unix_init(struct v9fs_session_info *v9ses, const char *dev_name, char *data) { - struct socket *csocket = NULL; + int rc; + struct socket *csocket; struct sockaddr_un sun_server; - struct v9fs_transport *trans = v9ses->transport; - int rc = 0; + struct v9fs_transport *trans; + struct v9fs_trans_sock *ts; - struct v9fs_trans_sock *ts = - kmalloc(sizeof(struct v9fs_trans_sock), GFP_KERNEL); + rc = 0; + csocket = NULL; + trans = v9ses->transport; + + if (strlen(dev_name) > UNIX_PATH_MAX) { + eprintk(KERN_ERR, "v9fs_trans_unix: address too long: %s\n", + dev_name); + return -ENOMEM; + } + ts = kmalloc(sizeof(struct v9fs_trans_sock), GFP_KERNEL); if (!ts) return -ENOMEM; @@ -222,9 +231,8 @@ v9fs_unix_init(struct v9fs_session_info sun_server.sun_family = PF_UNIX; 
strcpy(sun_server.sun_path, dev_name); sock_create_kern(PF_UNIX, SOCK_STREAM, 0, &csocket); - rc = csocket->ops->connect(csocket, - (struct sockaddr *)&sun_server, - sizeof(struct sockaddr_un), 0); + rc = csocket->ops->connect(csocket, (struct sockaddr *)&sun_server, + sizeof(struct sockaddr_un) - 1, 0); /* -1 *is* important */ if (rc < 0) { eprintk(KERN_ERR, "v9fs_trans_unix: problem connecting socket: %s: %d\n", - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
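The length check in this patch guards the copy into `sun_path`. A userspace sketch of the same guard follows, under two assumptions that differ from the patch: it rejects names whose terminating NUL would not fit (`>=` rather than `>`), and it reports `-ENAMETOOLONG` where the patch returns `-ENOMEM`. `fill_unix_addr` is a hypothetical helper, not a v9fs function.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill a sockaddr_un from a path, refusing names that would not fit
 * together with their terminating NUL. Returns 0 or a negative errno.
 */
static int fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(addr->sun_path))
		return -ENAMETOOLONG;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, path, len + 1);	/* includes the NUL */
	return 0;
}
```

Checking before copying keeps the write inside the structure no matter what the mount command passes in, which is the property the patch is after.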
[PATCH 2.6.13-rc6-mm2] v9fs: fix handling of malformed 9P messages
[PATCH] v9fs: fix handling of malformed 9P messages This patch attempts to do a better job of cleaning up after detecting errors on the transport. This should also improve error reporting on broken connections to servers. Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]> Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- commit 97bc19b509356dda0145cd19fb9768ac3c88ecda tree f12a9e827c949f386cca42b718bac63405e9192d parent 2b2ebf0cea451ad876ab29159162571b5291f8b7 author Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 13:11:33 -0500 committer Eric Van Hensbergen <[EMAIL PROTECTED]> Sun, 28 Aug 2005 13:11:33 -0500 fs/9p/error.h |1 + fs/9p/mux.c| 53 +--- fs/9p/mux.h|1 + fs/9p/trans_sock.c | 12 ++-- 4 files changed, 46 insertions(+), 21 deletions(-) diff --git a/fs/9p/error.h b/fs/9p/error.h --- a/fs/9p/error.h +++ b/fs/9p/error.h @@ -47,6 +47,7 @@ static struct errormap errmap[] = { {"Operation not permitted", EPERM}, {"wstat prohibited", EPERM}, {"No such file or directory", ENOENT}, + {"directory entry not found", ENOENT}, {"file not found", ENOENT}, {"Interrupted system call", EINTR}, {"Input/output error", EIO}, diff --git a/fs/9p/mux.c b/fs/9p/mux.c --- a/fs/9p/mux.c +++ b/fs/9p/mux.c @@ -162,18 +162,21 @@ static int v9fs_recv(struct v9fs_session dprintk(DEBUG_MUX, "waiting for response: %d\n", req->tcall->tag); ret = wait_event_interruptible(v9ses->read_wait, ((v9ses->transport->status != Connected) || - (req->rcall != 0) || dprintcond(v9ses, req))); + (req->rcall != 0) || (req->err < 0) || + dprintcond(v9ses, req))); dprintk(DEBUG_MUX, "got it: rcall %p\n", req->rcall); + + spin_lock(&v9ses->muxlock); + list_del(&req->next); + spin_unlock(&v9ses->muxlock); + + if (req->err < 0) + return req->err; + if (v9ses->transport->status == Disconnected) return -ECONNRESET; - if (ret == 0) { - spin_lock(&v9ses->muxlock); - list_del(&req->next); - spin_unlock(&v9ses->muxlock); - } - return ret; } @@ -245,6 +248,9 @@ v9fs_mux_rpc(struct v9fs_session_info *v 
if (!v9ses) return -EINVAL; + if (!v9ses->transport || v9ses->transport->status != Connected) + return -EIO; + if (rcall) *rcall = NULL; @@ -257,6 +263,7 @@ v9fs_mux_rpc(struct v9fs_session_info *v tcall->tag = tid; req.tcall = tcall; + req.err = 0; req.rcall = NULL; ret = v9fs_send(v9ses, &req); @@ -351,16 +358,21 @@ static int v9fs_recvproc(void *data) } err = read_message(v9ses, rcall, v9ses->maxdata + V9FS_IOHDRSZ); - if (err < 0) { - kfree(rcall); - break; - } spin_lock(&v9ses->muxlock); - list_for_each_entry_safe(rreq, rptr, &v9ses->mux_fcalls, next) { - if (rreq->tcall->tag == rcall->tag) { - req = rreq; - req->rcall = rcall; - break; + if (err < 0) { + list_for_each_entry_safe(rreq, rptr, &v9ses->mux_fcalls, next) { + rreq->err = err; + } + if(err != -ERESTARTSYS) + eprintk(KERN_ERR, + "Transport error while reading message %d\n", err); + } else { + list_for_each_entry_safe(rreq, rptr, &v9ses->mux_fcalls, next) { + if (rreq->tcall->tag == rcall->tag) { + req = rreq; + req->rcall = rcall; + break; + } } } @@ -379,9 +391,10 @@ static int v9fs_recvproc(void *data) spin_unlock(&v9ses->muxlock); if (!req) { - dprintk(DEBUG_ERROR, - "unexpected response: id %d tag %d\n", - rcall->id, rcall->tag); + if (err >= 0) + dprintk(DEBUG_ERROR, + "unexpected
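The `errmap[]` hunk in this patch adds one more error string to the table that turns 9P error text into errno values. A minimal userspace sketch of such a lookup follows. The table entries are the ones visible in the hunk (the real table is longer), while the field names, the linear search, the `errstr2errno` name and the `EIO` fallback for unknown strings are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Field names are hypothetical; only the entries come from the patch. */
struct errormap {
	const char *name;
	int val;
};

static const struct errormap errmap[] = {
	{"Operation not permitted", EPERM},
	{"wstat prohibited", EPERM},
	{"No such file or directory", ENOENT},
	{"directory entry not found", ENOENT},
	{"file not found", ENOENT},
	{"Interrupted system call", EINTR},
	{"Input/output error", EIO},
};

/*
 * Linear lookup of a 9P error string. Unknown strings fall back to EIO
 * here; that fallback is a choice made for this sketch.
 */
static int errstr2errno(const char *errstr)
{
	size_t i;

	for (i = 0; i < sizeof(errmap) / sizeof(errmap[0]); i++) {
		if (strcmp(errmap[i].name, errstr) == 0)
			return errmap[i].val;
	}
	return EIO;
}
```

Several server strings map to the same errno, which is why a new server wording such as "directory entry not found" needs its own row.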
[PATCH 2.6.13-rc6-mm2] v9fs: readlink extended mode check
LANL reported some issues with random crashes during mount of legacy protocol servers (9P2000 versus 9P2000.u) -- crash was always happening in readlink (which should never happen in legacy mode). Added some sanity conditionals to the get_inode code which should prevent the errors LANL was seeing. Code tested benign through regression. Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]> --- commit 4bbf929d3991fde7eeb8754ae10025644637a268 tree bf671c4f29343ef86eb9c00030fa66d06915560b parent f58a81f47f45c929ea0a1f74f9f15a27d3ad4ded author Eric Van Hensbergen <[EMAIL PROTECTED]> Tue, 02 Aug 2005 15:40:46 -0500 committer Eric Van Hensbergen <[EMAIL PROTECTED]> Tue, 02 Aug 2005 15:40:46 -0500 fs/9p/vfs_inode.c | 35 ++- 1 files changed, 30 insertions(+), 5 deletions(-) diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -44,6 +44,7 @@ #include "fid.h" static struct inode_operations v9fs_dir_inode_operations; +static struct inode_operations v9fs_dir_inode_operations_ext; static struct inode_operations v9fs_file_inode_operations; static struct inode_operations v9fs_symlink_inode_operations; @@ -232,6 +233,7 @@ v9fs_mistat2unix(struct v9fs_stat *mista struct inode *v9fs_get_inode(struct super_block *sb, int mode) { struct inode *inode = NULL; + struct v9fs_session_info *v9ses = sb->s_fs_info; dprintk(DEBUG_VFS, "super block: %p mode: %o\n", sb, mode); @@ -250,6 +252,10 @@ struct inode *v9fs_get_inode(struct supe case S_IFBLK: case S_IFCHR: case S_IFSOCK: + if(!v9ses->extended) { + dprintk(DEBUG_ERROR, "special files without extended mode\n"); + return ERR_PTR(-EINVAL); + } init_special_inode(inode, inode->i_mode, inode->i_rdev); break; @@ -257,14 +263,21 @@ struct inode *v9fs_get_inode(struct supe inode->i_op = &v9fs_file_inode_operations; inode->i_fop = &v9fs_file_operations; break; + case S_IFLNK: + if(!v9ses->extended) { + dprintk(DEBUG_ERROR, "extended modes used w/o 9P2000.u\n"); + return ERR_PTR(-EINVAL); + } + 
inode->i_op = &v9fs_symlink_inode_operations; + break; case S_IFDIR: inode->i_nlink++; - inode->i_op = &v9fs_dir_inode_operations; + if(v9ses->extended) + inode->i_op = &v9fs_dir_inode_operations_ext; + else + inode->i_op = &v9fs_dir_inode_operations; inode->i_fop = &v9fs_dir_operations; break; - case S_IFLNK: - inode->i_op = &v9fs_symlink_inode_operations; - break; default: dprintk(DEBUG_ERROR, "BAD mode 0x%x S_IFMT 0x%x\n", mode, mode & S_IFMT); @@ -1284,7 +1297,7 @@ v9fs_vfs_mknod(struct inode *dir, struct return retval; } -static struct inode_operations v9fs_dir_inode_operations = { +static struct inode_operations v9fs_dir_inode_operations_ext = { .create = v9fs_vfs_create, .lookup = v9fs_vfs_lookup, .symlink = v9fs_vfs_symlink, @@ -1299,6 +1312,18 @@ static struct inode_operations v9fs_dir_ .setattr = v9fs_vfs_setattr, }; +static struct inode_operations v9fs_dir_inode_operations = { + .create = v9fs_vfs_create, + .lookup = v9fs_vfs_lookup, + .unlink = v9fs_vfs_unlink, + .mkdir = v9fs_vfs_mkdir, + .rmdir = v9fs_vfs_rmdir, + .mknod = v9fs_vfs_mknod, + .rename = v9fs_vfs_rename, + .getattr = v9fs_vfs_getattr, + .setattr = v9fs_vfs_setattr, +}; + static struct inode_operations v9fs_file_inode_operations = { .getattr = v9fs_vfs_getattr, .setattr = v9fs_vfs_setattr, - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
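The patch keeps two directory operation tables and picks one according to whether the session negotiated the extended protocol, so legacy mounts never expose symlink handling. A cut-down userspace sketch of that selection follows. `struct dir_ops`, the stub callbacks and `pick_dir_ops` are hypothetical and far smaller than the kernel's `struct inode_operations`.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down operations table. */
struct dir_ops {
	int (*symlink)(const char *name, const char *target);
	int (*mkdir)(const char *name);
};

/* Stub callbacks: they only exist so the tables have something to point at. */
static int sketch_symlink(const char *name, const char *target)
{
	(void)name;
	(void)target;
	return 0;
}

static int sketch_mkdir(const char *name)
{
	(void)name;
	return 0;
}

/* 9P2000.u: symlink is available. */
static const struct dir_ops dir_ops_ext = {
	.symlink = sketch_symlink,
	.mkdir = sketch_mkdir,
};

/* Legacy 9P2000: .symlink left NULL so callers can tell it is unsupported. */
static const struct dir_ops dir_ops_legacy = {
	.mkdir = sketch_mkdir,
};

/* Pick the table once, when the inode is set up. */
static const struct dir_ops *pick_dir_ops(int extended)
{
	return extended ? &dir_ops_ext : &dir_ops_legacy;
}
```

Choosing the table at inode setup means the per-call paths need no protocol checks; an unsupported operation is simply absent from the table.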
Re: [V9fs-developer] Re: [PATCH 2.6.13-rc3-mm2] v9fs: add fd based transport
On 7/28/05, Ronald G. Minnich wrote:
>
> On Thu, 28 Jul 2005, Christoph Hellwig wrote:
>
> > Couldn't the two other transports be implemented on top of this one
> > using a mount helper doing the pipe or tcp setup?
>
> that's how we did it in the version we did for 2.4. I don't see why not.
>

I strayed away from doing it this way originally for two reasons --
perhaps neither is really valid:

a) I really disliked requiring a helper application to mount a file
   system.  I really wanted to be able to boot a diskless system with no
   initrd and have just 9P serving root.  I figured if I could enable
   people to use 9P without having a helper app, it would be used by
   more folks -- of course the need for things like DNS resolution, etc.
   that helper apps tend to provide sort of invalidates this piece of
   things.

b) I was concerned with additional copy overhead -- one of the other
   transports which isn't published yet uses shared memory (to
   virtualized partitions) and it just seemed easier to deal with that
   in the kernel rather than punting to a user-level application.

So in short, I figured keeping the transport modules in the kernel made
sense.  Of course, that doesn't have anything to do with the socket
interfaces being in the kernel -- I don't think there is any additional
copy overhead when using an fd versus a sock.

That being said, many things may be much easier with a user-level
helper -- having user-level security modules, for instance.

I guess I'm not opposed to removing the TCP and named-pipe transports if
folks think that's a reasonable thing to do -- but I'd like to keep the
modular transport infrastructure to support things like the shared
memory transport.  Of course we also need to get our act in gear and
make a reasonable mount-helper application available -- we've got three
versions right now and two of them rely on the Plan 9 from User Space
packages.

Anybody against taking this path?
-eric
Re: (v9fs) -mm -> 2.6.13 merge status
On 7/14/05, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> On Friday 15 July 2005 00:04, Christoph Hellwig wrote:
> > normally we prefer a patch per actual change, not per file so the
> > description fits.  Given that all these are pretty trivial fixes one
> > patch would have done it as well, though.
> >
> > With these changes the code is fine for mainline in my opinion.
>
> Can I make one more nitpicking comment?
>
> All these functions can use cpu_to_le*() and le*_to_cpu().
>

I need to rethink some parts of conv.c; I'll incorporate your suggestion
during the rework.  Thanks Alexey.

-eric
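Archive note: the suggestion above is about byte order.  9P puts
integers on the wire in little-endian order, and in the kernel the
cpu_to_le*() and le*_to_cpu() helpers convert between host and
little-endian representation.  As a rough user-space illustration of
what such a conversion has to guarantee, here is a sketch with two
hypothetical helpers, put_le32 and get_le32.  They are not the kernel
macros and are not taken from conv.c.

```c
#include <stdint.h>

/*
 * Store a 32-bit value as four little-endian bytes, regardless of the
 * byte order of the host CPU.
 */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read four little-endian bytes back into a host-order 32-bit value. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Because both helpers work byte by byte, the result is the same on
big-endian and little-endian hosts, which is the property the kernel
helpers provide without open-coded shifts in every caller.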
Re: [PATCH 2.6.13-rc2-mm2 2/7] v9fs: VFS file, dentry, and directory operations (2.0.2)
Doh!  Good catch, I'll fix and resubmit - same goes for the formatting
issues.

On 7/14/05, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> > @@ -383,9 +379,10 @@ v9fs_file_write(struct file *filp, const
> >  		return -ENOMEM;
> >
> >  	ret = copy_from_user(buffer, data, count);
> > -	if (ret)
> > +	if (ret) {
> >  		dprintk(DEBUG_ERROR, "Problem copying from user\n");
> > -	else
> > +		return -EFAULT;
> > +	} else
> >  		ret = v9fs_write(filp, buffer, count, offset);
> >
> >  	kfree(buffer);
>
> Aren't you leaking buffer in the error case?  Also we Linux people really
> hate an else clause when the if block contains a return statement ;-)
>
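Archive note: the leak pointed out above comes from returning before
the kfree().  One common way to avoid it is a single exit path that
always frees the buffer.  The sketch below shows that pattern in user
space under stated assumptions: copy_in and backend_write are
hypothetical stand-ins for copy_from_user() and v9fs_write(), and
malloc/free replace kmalloc/kfree.  It is not the fix that was
eventually submitted.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(): returns the number of
 * bytes that could NOT be copied (0 on success).  A NULL source plays
 * the role of a faulting user pointer.
 */
static size_t copy_in(void *dst, const void *src, size_t n)
{
	if (src == NULL)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical stand-in for v9fs_write(): pretends all bytes were written. */
static long backend_write(const char *buf, size_t n)
{
	(void)buf;
	return (long)n;
}

/* Copy the caller's data into a temporary buffer and hand it on. */
static long file_write(const void *data, size_t count)
{
	long ret;
	char *buffer = malloc(count);

	if (buffer == NULL)
		return -ENOMEM;

	if (copy_in(buffer, data, count) != 0) {
		ret = -EFAULT;	/* report a fault, not a byte count */
		goto out;	/* still falls through to free() */
	}

	ret = backend_write(buffer, count);
out:
	free(buffer);		/* runs on both the success and error path */
	return ret;
}
```

The goto also removes the else branch after a return, which addresses
the style complaint in the same review comment.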
Re: [PATCH 2.6.13-rc2-mm2 5/7] v9fs: 9P protocol implementation (2.0.2)
On 7/14/05, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> > +static inline void buf_check_size(struct cbuf *buf, int len)
> > +{
> > +	if (buf->p+len > buf->ep) {
> > +		if (buf->p < buf->ep) {
> > +			eprintk(KERN_ERR, "buffer overflow\n");
> > +			buf->p = buf->ep + 1;
> > +		}
> > +	}
> > +}
>
> "handling" a buffer overflow with a printk doesn't seem appropriate.
> In what cases can this happen and what problems may it cause?
>

I believe all of these cases represent what we would consider to be
protocol errors.  I suppose it is possible that our truncation approach
could be used as an exploit in some weird case -- I'll take a look at
fixing things so that any such overflow case is treated as a fatal
protocol error and reported as such (via the protocol as appropriate).

-eric
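Archive note: the reply above proposes treating any overflow as a fatal
protocol error rather than logging and truncating.  A minimal sketch of
one way to do that follows.  It assumes a sticky error flag (the err
field, which is not in the quoted struct) and a hypothetical reader
buf_get_u8; only the p and ep fields and the name buf_check_size come
from the quoted patch.

```c
#include <stddef.h>
#include <stdint.h>

struct cbuf {
	const uint8_t *p;	/* next unread byte */
	const uint8_t *ep;	/* one past the last valid byte */
	int err;		/* assumed: set once on overflow, never cleared */
};

/*
 * Return 1 if len more bytes can be consumed.  Otherwise mark the
 * buffer as failed and return 0 so the caller can abort the message.
 * Comparing against the remaining length avoids computing p + len,
 * which could itself overflow.
 */
static int buf_check_size(struct cbuf *buf, size_t len)
{
	if (buf->err || len > (size_t)(buf->ep - buf->p)) {
		buf->err = 1;
		return 0;
	}
	return 1;
}

/* Read one byte, or return 0 if the buffer is exhausted or failed. */
static uint8_t buf_get_u8(struct cbuf *buf)
{
	if (!buf_check_size(buf, 1))
		return 0;
	return *buf->p++;
}
```

A decoder built this way can run all its field reads and then test
buf.err once at the end, turning any short message into a single
protocol error instead of silently truncated data.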
Re: (v9fs) -mm -> 2.6.13 merge status
On 6/27/05, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> That being said there's a few issues with the code still I'd like to
> see fixed:
>

Sorry I didn't get to these quicker - was on vacation and basically
off-line for the past week and a half.  I've made 90% of the changes
suggested and committed them to my git tree.  I'll combine the changes
into a single patch and then split them by file-group before sending
them to akpm to more closely match the existing patches.  The 10% I
didn't address I'll comment on below; most of them represent harder
problems that I'd like to think about a bit more.

>
>  - there's three sparse warnings still.  Two of them are easily fixed
>    by moving externs to headers, one doesn't look fixable until we get
>    a sane in-kernel api for socket operations

done

>  - some dentry handling looks rather odd.  Why are you for example
>    calling d_drop in v9fs_vfs_symlink, v9fs_vfs_mknod and v9fs_vfs_link?
>    Shouldn't all these call d_instantiate to actually reuse the
>    dentry as in v9fs_vfs_create?  Also what's the issue with
>    v9fs_fid_insert?  It would seem better and more logical to me to
>    always set d_fsdata in create/mknod/symlink/open before hashing it
>    and then being able to rely on it being non-NULL.

All of this is kind of tricky due to the association of fids with dentry
elements and the special way we handle certain features (such as special
files and symlinks).  The current code aggressively invalidates fids to
prevent the dcache from masking operations that may be semantically
important to synthetic file systems.  If you look in v9fs_create we
actually d_drop the dentry for created directories as well.  The only
reason we don't d_drop normal files is because we are trying to preserve
the atomic create/open semantics.  I'm not 100% confident this is the
right solution, but it's the closest I've been able to come so far --
there's actually been a fair amount of discussion on this in the
v9fs-developer's list.
If you want more details, it's probably worth a separate thread to
discuss the reasons behind why we want to aggressively invalidate the
dcache and how we have tried to accomplish this -- or we could just
catch up at OLS.

>  - buf_check_sizep, buf_check_size and buf_check_sizev should be made
>    inlines, and lose the implicit return.  Please don't hide such
>    things in macros

done

>  - please avoid using hlist_for_each, usually hlist_for_each_entry is
>    a much better choice
>  - ditto for list_for_each_safe vs list_for_each_entry_safe

done

>  - can you please check whether lib/idr.c fulfills your needs so we
>    can get rid of idpool.c?

Last time I looked idr didn't do exactly what I wanted, but looking over
it again I realize it's just doing more than I want -- so I've
eliminated idpool.*, but still have wrapper functions to encapsulate
locking and retry.  It strikes me that there may be a case for
generalizing these wrapper functions and putting them in lib/idr.c, but
I figured that could wait.

>  - v9fs_inode2v9ses has lots of useless checks, inode->i_sb can never
>    be NULL, and inode->i_sb->s_fs_info can't be either once set in
>    fill_inode, which is before the first inode on the filesystem is
>    created.  Also the argument is never NULL.  Because of that you
>    can also kill all the return value checks in the callers.
>  - do you really need to keep v9fs_dentry_delete just for the dprintk?
>  - no need to check for a NULL file in v9fs_dir_readdir, the VFS
>    guarantees it's not.  And if it was you'd better be off panic
>    because something is enormously fscked.
>  - Ditto for v9fs_file_open
>  - And the inode in v9fs_file_lock
>  - And dir, file, file->d_inode, sb, v9ses in v9fs_remove.
>  - And dir, sb and v9ses in v9fs_vfs_lookup
>  - And dir, sb and v9ses in v9fs_vfs_symlink
>  - And dir, sb and v9ses in v9fs_vfs_link
>  - And dir, sb and v9ses in v9fs_vfs_mknod

Yeah, all of these were sanity checks during initial development while I
was still understanding the VFS API.
I think I got most of them this time.

>  - copy_from_user returns the bytes actually copied in the failure case,
>    but you should return -EFAULT instead of that number in
>    v9fs_file_write

fixed

>  - No need to implement v9fs_file_mmap, do_mmap_pgoff makes sure to
>    error out if it's not present (and actually returns the correct
>    errno)
>  - I think it's pretty similar for all these checks for fid
>    (=private_data) checks.  You always set them in open, so they can't
>    be NULL
>  - kfree can be called with a NULL argument just fine, you can remove
>    lots of ifs for that.  You also often set pointers to NULL just
>    before freeing a structure - that's pretty useless as slab debugging
>    will catch bugs with stray references very well, and overwrites
>    these NULLs ASAP.
>  - The call to ->put_inode in the error case of v9fs_get_inode is very
>    wrong.  You'd actually pani
Re: [RFC] FUSE permission modell (Was: fuse review bits)
Somewhat related question for Viro/the group:

Why is CLONE_NEWNS considered a privileged operation?  Would placing
limits on the number of private namespaces a user can own solve any
resource concerns, or is there something more nefarious I'm missing?
Re: [RFC] FUSE permission modell (Was: fuse review bits)
On 4/19/05, Bodo Eggert <[EMAIL PROTECTED]> wrote:
> >
> > Well, that would kinda be the intent behind the permissions file --
> > it can specify what restricted set of images/devices/whatever the user
> > can mount, I suppose the sensible thing would be to always enforce
> > nosuid and nosgid, but I'd rather keep these as the default version of
> > options (allowing admins to shoot themselves in the foot perhaps, but
> > in the single-user workstation case, it seems like there's less reason
> > to be so paranoid).
>
> I think you shouldn't help the admins by creating shoes with target marks.
>

Fair enough.  Since I don't really have any cases I can think of that
require this sort of behavior, I'll back off on allowing user mounts
with suid or sgid enabled.

>
> Allowing user mounts with no* should be always ok (no config needed
> besides the ulimit), and mounting specified files to defined locations
> is already supported by fstab.
>

Do folks think that the limits should be per-user or per-process for
user mounts?  What about separate limits for the number of private
namespaces and the number of mounts?

The fstab support doesn't seem to provide enough flexibility for certain
situations.  Say I want to support mounting any remote file system, as
long as it's in the user's private hierarchy?  What if I want users to
be able to mount FUSE, v9fs, etc. user-space file systems, but only in a
private namespace and only in their private hierarchy?

Or are these situations which you think should "always be okay" as long
as nosuid and nosgid (and newns?) are implicit?

-eric
Re: [RFC] FUSE permission modell (Was: fuse review bits)
On 4/17/05, Bodo Eggert <[EMAIL PROTECTED]> wrote:
>
> > I was thinking about this a while back and thought having a user-mount
> > permissions file might be the right way to address lots of these
> > issues.  Essentially it would contain information about what
> > users/groups were allowed to mount what sources to what destinations
> > and with what mandatory options.
>
> Users being able to mount random fs containing suid or device nodes
> are root whenever they want to.  If you want to mount with dev or suid,
> use sudo and restrict the mount to a limited set of
> images/devices/whatever.
>

Well, that would kinda be the intent behind the permissions file -- it
can specify what restricted set of images/devices/whatever the user can
mount.  I suppose the sensible thing would be to always enforce nosuid
and nosgid, but I'd rather keep these as the default version of options
(allowing admins to shoot themselves in the foot perhaps, but in the
single-user workstation case, it seems like there's less reason to be so
paranoid).

-eric
Re: [RFC] FUSE permission modell (Was: fuse review bits)
On 4/17/05, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> > >
> > > 1) Only allow mount over a directory for which the user has write
> > >    access (and is not sticky)
> > >
> > > 2) Use nosuid,nodev mount options
> > >
> > > [ parts deleted ]
> >
> > Do these solve all the security concerns with unprivileged mounts, or
> > are there other barriers/concerns?  Should there be ulimit (or rlimit)
> > style restrictions on how many mounts/binds a user is allowed to have
> > to prevent users from abusing mount privs?
>
> Currently there is a (configurable) global limit for all non-root FUSE
> mounts.  An additional per-user limit would be nice, but from the
> security standpoint it doesn't matter.
>
> > I was thinking about this a while back and thought having a user-mount
> > permissions file might be the right way to address lots of these
> > issues.  Essentially it would contain information about what
> > users/groups were allowed to mount what sources to what destinations
> > and with what mandatory options.
>
> I haven't yet seen the need for such a great flexibility.  Debian
> installs fusermount (the FUSE mount utility) "-rwsr-x--- root fuse",

These are both well and good, but I was looking for a more global system
(for things other than FUSE).

>
> > Is this unnecessary?  Is this not enough?
>
> Maybe it is necessary, but why bother until somebody actually wants
> it?  I'm a great believer of the "lazy" development philosophy ;)
>

Yeah, I guess I'm motivated in that I want to use normal mount to handle
v9fs user file systems, local private mounts, and local private resource
shares.  I'd also like normal users to be able to take better advantage
of -o bind.  I think it's kinda silly that we have special-purpose
mounts for cifs, samba, fuse, v9fs, etc. -- but I suppose that's more of
a user-space util-linux dilemma than a kernel dilemma.
-eric
Re: [RFC] FUSE permission modell (Was: fuse review bits)
On 4/11/05, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
> 1) Only allow mount over a directory for which the user has write
>    access (and is not sticky)
>
> 2) Use nosuid,nodev mount options
>
> [ parts deleted ]

Do these solve all the security concerns with unprivileged mounts, or
are there other barriers/concerns?  Should there be ulimit (or rlimit)
style restrictions on how many mounts/binds a user is allowed to have to
prevent users from abusing mount privs?

I was thinking about this a while back and thought having a user-mount
permissions file might be the right way to address lots of these issues.
Essentially it would contain information about what users/groups were
allowed to mount what sources to what destinations and with what
mandatory options.  You can get the start of this with the
user/users/etc. stuff in /etc/fstab, but I was envisioning something a
bit more dynamic with regular-expression-based rules for sources and
destinations.  So, something like:

# /etc/usermounts: user mount permissions
#
# allow users to mount any file system under their home directory
*		$HOME		*	nosuid, nosgid
# allow users to bind over /usr/bin as long as it's only in their
# private namespace
*		/usr/bin	bind	newns
# allow users to loopback mount distributed file systems to /mnt
127.0.0.1	/mnt		*	nosuid, nosgid
# allow users to mount files over any directory they have write access to
*		(perm=0222)	*	nosuid, nosgid

Is this unnecessary?  Is this not enough?

-eric
Re: [RFC] FUSE permission modell (Was: fuse review bits)
On 4/12/05, Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> > I think that would be _much_ nicer implemented as a mount which is
> > invisible to other users, rather than one which causes the admin's
> > scripts to spew error messages.
> >
> > Is the namespace mechanism at all suitable for that?
>
> It is certainly the right tool for this.  However currently private
> namespaces are quite limited.  The only sane usage I can think of is
> that before mounting the user starts a shell with CLONE_NEWNS, and does
> the mount in this.  However all the other programs he already has
> running (editor, browser, desktop environment) won't be able to access
> the mount.
>

I'd like to second that I think private namespaces are the right way to
solve this sort of problem.  It also helps avoid cluttering the global
namespace with user-local mounts.

>
> Shared subtrees and more support in userspace tools is needed before
> private namespaces can become really useful.
>

I'd like to talk about this a bit more and start driving to a solution
here.  I've been looking at the namespace code quite a bit and was just
about to dive in and start checking into adding/fixing certain aspects
such as stackable namespaces, optional inheritance (changes in a parent
namespace are reflected in the child but not vice-versa), etc.

One aspect I was thinking about here was a mount flag that would give
you a new private namespace (if you didn't already have one) for the
mount (and I guess that would impact any subsequent mounts from the user
in that shell).  Another option would be a 'newns' style system call,
but I'm generally against adding new system calls.

Shared subtrees are a tricky one.
I know how we would handle it in V9FS, but I'm not sure how well that
would translate to others (essentially we'd re-export the subtree so
other users could mount it individually -- but that's a very Plan 9
solution and may not be what more UNIX-minded folks would want -- we
also need to improve our own server infrastructure to more efficiently
support such a re-export).

So, to sum up, I think private namespaces are the right solution, and
I'd rather put effort into making them more useful than work around the
fact that they're not practical right now.

-eric