Re: 2.6.22.6: kernel BUG at fs/locks.c:171

2007-09-13 Thread Soeren Sonnenburg
On Thu, 2007-09-13 at 09:51 +1000, Nick Piggin wrote:
> On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> > Dear all,
> >
> > I've just seen this in dmesg on an AMD K7 / kernel 2.6.22.6 machine
> > (config attached).
> >
> > Any ideas / what further information is needed?
> 
> Thanks for the report. Is it reproducible? It seems like the
> locks_free_lock call that's oopsing is coming from __posix_lock_file.
> The actual function looks fine, but the lock being freed could have
> been corrupted if there was slab corruption, or a hardware corruption.
> 
> You could try running memtest86+ overnight. Also try the following
> patch, turn on slab debugging, and then try to reproduce the problem.

OK, so far I've run memtest86+ 1.40 from FreeDOS for 8 hrs (v1.70 hung on
startup) - nothing.

Could this corruption be caused by a PCI card/driver? I am asking because I
am using a new DVB-T card (ASUS P7131), and the oops happened after 5 or
6 days of uptime, about a day after watching some movie (very bad
reception/lots of errors).

However, this machine used to have uptimes of months before the DVB card
went in and before the kernel version upgrade (I don't know which version
that was...).

Anyway, I am not sure whether this is reproducible, but I will keep memtest
running today and then proceed as you said...

Thanks,
Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Christoph Lameter
On Thu, 13 Sep 2007, Mel Gorman wrote:

> Surely, we'll be able to detect the situation where the memory is really
> contiguous as a fast path and have a slower path where fragmentation was
> a problem.

Yes, I have a draft here now of a virtual compound page solution that I am
testing with SLUB. The page allocator first tries to allocate a physically
contiguous page. If that is not possible, then we string together a virtually
contiguous page through order-0 allocs and vmalloc.
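
A minimal sketch of that fallback idea (this is not the actual draft patch;
the function name and error handling here are made up for illustration):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Sketch only: try a physically contiguous compound page first, and fall
 * back to stringing order-0 pages together through vmap().  Teardown
 * (vunmap() plus freeing the order-0 pages) is omitted, so a real version
 * would have to stash the page array somewhere for that.
 */
static void *alloc_virt_compound(unsigned int order, gfp_t gfp)
{
	unsigned int i, nr = 1 << order;
	struct page *page, **pages;
	void *addr;

	/* Fast path: a real, physically contiguous compound page. */
	page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN, order);
	if (page)
		return page_address(page);

	/* Slow path: order-0 pages made virtually contiguous via vmap(). */
	pages = kmalloc(nr * sizeof(*pages), gfp);
	if (!pages)
		return NULL;
	for (i = 0; i < nr; i++) {
		pages[i] = alloc_page(gfp);
		if (!pages[i])
			goto fail;
	}
	addr = vmap(pages, nr, VM_MAP, PAGE_KERNEL);
	if (addr)
		return addr;
fail:
	while (i--)
		__free_page(pages[i]);
	kfree(pages);
	return NULL;
}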
 
> The only implementation question about these patches that hasn't been addressed
> is the mmap() support. What's wrong with it in its current form? Can it be
> fixed, or is it fundamentally screwed, etc.? That has fallen by the
> wayside.

Yes, I have not heard anything about those. I've got some more ideas on how to
clean it up even further in the meantime. No feedback usually means no objections.

> I am *very* wary of using reserve pools for anything other than
> emergency situations. If nothing else pools == wasted memory + a sizing
> problem. But hey, it is one option.

Well, we need the page pools anyway for large page sizes > MAX_ORDER. But 

> Are we going to agree on some sort of plan or are we just going to
> handwave ourselves to death?

I'll try to do the virtually mapped compound pages. Hopefully that will
allow fallback for compound pages in a generic way.


Re: [PATCH] [-mm] FS: file name must be unique in the same dir in procfs

2007-09-13 Thread Andrew Morton
> 
> From: Zhang Rui <[EMAIL PROTECTED]>
> File name should be unique in the same directory.
> 
> In order to keep backward compatibility, only a warning is given
> currently, but action must be taken to fix it when such duplicates
> are detected.
> 
> Bug report and a simple fix can be found here: 
> http://bugzilla.kernel.org/show_bug.cgi?id=8798
> 
> Signed-off-by: Zhang Rui <[EMAIL PROTECTED]>
> ---
>  fs/proc/generic.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> Index: linux-2.6/fs/proc/generic.c
> ===================================================================
> --- linux-2.6.orig/fs/proc/generic.c
> +++ linux-2.6/fs/proc/generic.c
> @@ -523,6 +523,7 @@ static const struct inode_operations pro
>  
>  static int proc_register(struct proc_dir_entry * dir, struct
> proc_dir_entry * dp)

Your email client is wordwrapping the text.

>  {
> + struct proc_dir_entry *tmp = NULL;

That initialisation is unneeded - I removed it.

`tmp' is always a crappy name for anything.  I renamed it to `de'.
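
The rest of the hunk did not survive the quoting above; the duplicate-name
check being discussed is roughly of this shape (a sketch based on the
description, not the exact patch from the bugzilla entry):

#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/string.h>

static int proc_register(struct proc_dir_entry *dir, struct proc_dir_entry *dp)
{
	struct proc_dir_entry *de;

	/*
	 * Warn (but still proceed, for backwards compatibility) if an entry
	 * with the same name is already registered in this directory.
	 */
	for (de = dir->subdir; de; de = de->next) {
		if (de->namelen == dp->namelen &&
		    !memcmp(de->name, dp->name, dp->namelen)) {
			printk(KERN_WARNING "proc_dir_entry '%s' already "
					"registered in '%s'\n",
					dp->name, dir->name);
			break;
		}
	}

	/* ... the rest of proc_register() is unchanged and omitted here ... */
	return 0;
}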




Re: [NFS] [PATCH 08/19] jfs: new export ops

2007-09-13 Thread Christoph Hellwig
On Thu, Sep 13, 2007 at 04:57:01PM -0400, J. Bruce Fields wrote:
> On Thu, Sep 13, 2007 at 03:59:34PM +0200, Christoph Hellwig wrote:
> > Yes, please add them to the queue.  I've put a quilt series of patches
> > with the whitespace fixes on http://verein.lst.de/~hch/patches.nfsd.tgz.
> > Do you want to take it that way or should I resend everything to the
> > list?
> 
> Yeah, I hate to be so picky, but would it be possible to get them as
> mail?  It just saves me some scripting.  (I tried git-quiltimport and it
> doesn't get the silly subject lines right.)  They don't even have to be
> mailed out, an mbox I can wget would be fine.

No problem, I'll do that tomorrow.


Re: [NFS] [PATCH 08/19] jfs: new export ops

2007-09-13 Thread J. Bruce Fields
On Thu, Sep 13, 2007 at 03:59:34PM +0200, Christoph Hellwig wrote:
> Yes, please add them to the queue.  I've put a quilt series of patches
> with the whitespace fixes on http://verein.lst.de/~hch/patches.nfsd.tgz.
> Do you want to take it that way or should I resend everything to the
> list?

Yeah, I hate to be so picky, but would it be possible to get them as
mail?  It just saves me some scripting.  (I tried git-quiltimport and it
doesn't get the silly subject lines right.)  They don't even have to be
mailed out, an mbox I can wget would be fine.

--b.


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 23:03, David Chinner wrote:
> On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote:

> > Well, it may not be easy to _fix_, but it's easy to try a few
> > improvements ;)
> >
> > How do I make an image and run a workload that will coerce XFS into
> > doing a significant number of vmaps?
>
> # mkfs.xfs -n size=16384 
>
> to create a filesystem with a 16k directory block size on a 4k page
> machine.
>
> Then just do operations on directories with lots of files in them
> (tens of thousands). Every directory operation will require at
> least one vmap in this situation - e.g. a traversal will result in
> lots and lots of blocks being read that will require vmap() for every
> directory block read from disk and an unmap almost immediately
> afterwards when the reference is dropped

Ah, wow, thanks: I can reproduce it.


Re: 2.6.22.6: kernel BUG at fs/locks.c:171

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> Dear all,
>
> I've just seen this in dmesg on an AMD K7 / kernel 2.6.22.6 machine
> (config attached).
>
> Any ideas / what further information is needed?

Thanks for the report. Is it reproducible? It seems like the
locks_free_lock call that's oopsing is coming from __posix_lock_file.
The actual function looks fine, but the lock being freed could have
been corrupted if there was slab corruption, or a hardware corruption.

You could try running memtest86+ overnight. Also try the following
patch, turn on slab debugging, and then try to reproduce the problem.


>
> Soeren
>
> [ cut here ]
> kernel BUG at fs/locks.c:171!
> invalid opcode:  [#1]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
> ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
> dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
> ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010206   (2.6.22.6 #1)
> EIP is at locks_free_lock+0xb/0x3b
> eax: e1d07f9c   ebx: e1d07f80   ecx: f5f5e2f0   edx: 
> esi:    edi:    ebp:    esp: da3d7f04
> ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
> Process mrtg-load (pid: 19688, ti=da3d6000 task=f5e3a030 task.ti=da3d6000)
> Stack:  c015972b 0002 c04889c8 c012b920 f5f5e290 c048541c
> f0ed3ca0 01485414  e1d07f80  f0f39f58 44ef35f1 f62fc2ac
>   f5f5e290  d23106c0 c015a891  0007
> 0004 Call Trace:
>  [] __posix_lock_file+0x44e/0x47f
>  [] getnstimeofday+0x2b/0xaf
>  [] fcntl_setlk+0xff/0x1f6
>  [] do_setitimer+0xfa/0x226
>  [] sys_fcntl64+0x74/0x85
>  [] syscall_call+0x7/0xb
>  ===
> Code: 74 1b 8b 15 30 93 48 c0 8d 43 04 89 53 04 89 42 04 a3 30 93 48 c0 c7
> 40 04 30 93 48 c0 5b 5e c3 53 89 c3 8d 40 1c 39 43 1c 74 04 <0f> 0b eb fe
> 8d 43 0c 39 43 0c 74 04 0f 0b eb fe 8d 43 04 39 43 EIP: []
> locks_free_lock+0xb/0x3b SS:ESP 0068:da3d7f04
> BUG: unable to handle kernel paging request at virtual address 9ee420b0
>  printing eip:
> c014ab7d
> *pde = 
> Oops: 0002 [#2]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
> ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
> dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
> ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010082   (2.6.22.6 #1)
> EIP is at free_block+0x61/0xfb
> eax: a75b2c19   ebx: c1cf6c10   ecx: e1d070c4   edx: 9ee420ac
> esi: e1d07000   edi: dfde6960   ebp: dfde7620   esp: dfd87f44
> ds: 007b   es: 007b   fs:   gs:   ss: 0068
> Process events/0 (pid: 4, ti=dfd86000 task=dfdc4a50 task.ti=dfd86000)
> Stack: 0012  0018  c1cf6c10 c1cf6c10 0018
> c1cf6c00 dfde7620 c014ac86  dfde6960 dfde7620 c0521d20 
> c014b869   dfde69e0 c0521d20 c014b827 c0125955 dfdc4b5c
> 8f0c99c0 Call Trace:
>  [] drain_array+0x6f/0x89
>  [] cache_reap+0x42/0xde
>  [] cache_reap+0x0/0xde
>  [] run_workqueue+0x6b/0xdf
>  [] worker_thread+0x0/0xbd
>  [] worker_thread+0xb2/0xbd
>  [] autoremove_wake_function+0x0/0x35
>  [] kthread+0x36/0x5a
>  [] kthread+0x0/0x5a
>  [] kernel_thread_helper+0x7/0x10
>  ===
> Code: 8b 02 25 00 40 02 00 3d 00 40 02 00 75 03 8b 52 0c 8b 02 84 c0 78 04
> 0f 0b eb fe 8b 72 1c 8b 54 24 28 8b 46 04 8b 7c 95 4c 8b 16 <89> 42 04 89
> 10 2b 4e 0c c7 06 00 01 10 00 c7 46 04 00 02 20 00 EIP: []
> free_block+0x61/0xfb SS:ESP 0068:dfd87f44
> [ cut here ]
> kernel BUG at fs/locks.c:171!
> invalid opcode:  [#3]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
> ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
> dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
> ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010287   (2.6.

Re: [PATCH] 9p: rename uid and gid parameters

2007-09-13 Thread Bernd Petrovitsch
On Thu, 2007-09-13 at 08:51 -0600, Latchesar Ionkov wrote:
> Zero was the value that was used before, even though it wasn't defined
> explicitly. I just defined a macro so we can see and eventually change
> it to something better. I don't know if there is a good default value.
> Is nfsnobody the same on all Linux distributions?

Not necessarily.

[]
> On 9/13/07, Eric Van Hensbergen <[EMAIL PROTECTED]> wrote:
> > On 9/12/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> > > Change the names of 'uid' and 'gid' parameters to the more appropriate
> > > 'dfltuid' and 'dfltgid'.
> > >
> >
> > ...
> >
> > > strcpy(v9ses->name, V9FS_DEFUSER);
> > > strcpy(v9ses->remotename, V9FS_DEFANAME);
> > > +   v9ses->dfltuid = V9FS_DEFUID;
> > > +   v9ses->dfltgid = V9FS_DEFGID;
> > >
> > ...
> > > +#define V9FS_DEFUID (0)
> > > +#define V9FS_DEFGID (0)
> >
> > I'm not sure if there is a good solution here, but I'm uncomfortable
> > with using uid=0 as the default.  I'm not sure if there is a default
> > uid for nobody, but anything is probably better than 0.  Looks like
> > nfsnobody is 65534, we could use that - even if only as a marker for

That is on 32-bit hardware. On 64-bit it is (analogously to 32-bit) (unsigned int)-2, i.e. 4294967294.
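
For reference, the arithmetic behind those two values (a small standalone
illustration, nothing 9p-specific):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* 65534 is the 16-bit wraparound of -2; with a 32-bit int,
	 * (unsigned int)-2 wraps to 4294967294 instead. */
	printf("%u\n", (unsigned int)(uint16_t)-2);	/* 65534 */
	printf("%u\n", (unsigned int)-2);		/* 4294967294 */
	return 0;
}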

> > the server to map it to nobody on the target system?  What do you
> > think?
> >
> > Particularly with attach-per-user, we probably need to look at
> > interacting with idmapd or create our own variant real soon.

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services




Re: Distributed storage. Security attributes and documentation update.

2007-09-13 Thread Paul E. McKenney
On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote:
> Hi Paul.
> 
> On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > > Further TODO list includes:
> > > * implement optional saving of mirroring/linear information on the remote
> > >   nodes (simple)
> > > * implement netlink based setup (simple)
> > > * new redundancy algorithm (complex)
> > > 
> > > Homepage:
> > > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> > 
> > A couple questions below, but otherwise looks good from an RCU viewpoint.
> > 
> > Thanx, Paul
> 
> Thanks for your comments, and sorry for the late reply - I was away on the
> KS/London trip.
> > > + if (--num) {
> > > + list_for_each_entry_rcu(n, &node->shared, shared) {
> > 
> > This function is called under rcu_read_lock() or similar, right?
> > (Can't tell from this patch.)  It is also OK to call it from under the
> > update-side mutex, of course.
> 
> Actually not, but it does not require it, since the entry cannot be removed
> during this operation because the appropriate reference counter for the given node is
> being held. It should not be RCU at all.

Ah!  Yes, it is OK to use _rcu in this case, but it should be avoided
unless doing so eliminates duplicate code or some such.  So, agreed
on dropping _rcu in this case.
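
A small sketch of the distinction (simplified structures, not the real DST code):

#include <linux/list.h>
#include <linux/rcupdate.h>

/* Simplified stand-in for the DST node; only the list linkage matters here. */
struct dst_node {
	struct list_head shared;
	/* ... */
};

/* RCU-protected traversal: must run under rcu_read_lock()/rcu_read_unlock(). */
static void walk_shared_rcu(struct dst_node *node)
{
	struct dst_node *n;

	rcu_read_lock();
	list_for_each_entry_rcu(n, &node->shared, shared) {
		/* use n */
	}
	rcu_read_unlock();
}

/*
 * Reference-counted traversal: concurrent removal is excluded because a
 * reference on the node is held, so the plain primitive is the clearer choice.
 */
static void walk_shared_ref(struct dst_node *node)
{
	struct dst_node *n;

	list_for_each_entry(n, &node->shared, shared) {
		/* use n */
	}
}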

> > > +static int dst_mirror_read(struct dst_request *req)
> > > +{
> > > + struct dst_node *node = req->node, *n, *min_dist_node;
> > > + struct dst_mirror_priv *priv = node->priv;
> > > + u64 dist, d;
> > > + int err;
> > > +
> > > + req->bio_endio = &dst_mirror_read_endio;
> > > +
> > > + do {
> > > + err = -ENODEV;
> > > + min_dist_node = NULL;
> > > + dist = -1ULL;
> > > + 
> > > + /*
> > > +  * Reading is never performed from the node under resync.
> > > +  * If this will cause any troubles (like all nodes must be
> > > +  * resynced between each other), this check can be removed
> > > +  * and per-chunk dirty bit can be tested instead.
> > > +  */
> > > +
> > > + if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > > + priv = node->priv;
> > > + if (req->start > priv->last_start)
> > > + dist = req->start - priv->last_start;
> > > + else
> > > + dist = priv->last_start - req->start;
> > > + min_dist_node = req->node;
> > > + }
> > > +
> > > + list_for_each_entry_rcu(n, &node->shared, shared) {
> > 
> > I see one call to this function that appears to be under the update-side
> > mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> > under the update-side mutex or under rcu_read_lock() and friends.)
> 
> The same here - those processing functions are called from
> generic_make_request() without any lock held on top of them. Each node is linked
> into the list of the first added node, whose reference counter is
> increased in the higher layer. Right now there is no way to add or remove
> nodes after the array was started; such functionality requires the storage tree
> lock to be taken, and RCU cannot be used (since that requires sleeping and
> I did not investigate sleepable RCU for this purpose).
> 
> So, essentially RCU is not used in DST :)

Works for me!  "Use the right tool for the job!"

> Thanks for review, Paul.

Thanx, Paul


Re: [PATCH] 9p: rename uid and gid parameters

2007-09-13 Thread Latchesar Ionkov
Zero was the value that was used before, even though it wasn't defined
explicitly. I just defined a macro so we can see and eventually change
it to something better. I don't know if there is a good default value.
Is nfsnobody the same on all Linux distributions?

Thanks,
Lucho

On 9/13/07, Eric Van Hensbergen <[EMAIL PROTECTED]> wrote:
> On 9/12/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> > Change the names of 'uid' and 'gid' parameters to the more appropriate
> > 'dfltuid' and 'dfltgid'.
> >
>
> ...
>
> > strcpy(v9ses->name, V9FS_DEFUSER);
> > strcpy(v9ses->remotename, V9FS_DEFANAME);
> > +   v9ses->dfltuid = V9FS_DEFUID;
> > +   v9ses->dfltgid = V9FS_DEFGID;
> >
> ...
> > +#define V9FS_DEFUID (0)
> > +#define V9FS_DEFGID (0)
>
> I'm not sure if there is a good solution here, but I'm uncomfortable
> with using uid=0 as the default.  I'm not sure if there is a default
> uid for nobody, but anything is probably better than 0.  Looks like
> nfsnobody is 65534, we could use that - even if only as a marker for
> the server to map it to nobody on the target system?  What do you
> think?
>
> Particularly with attach-per-user, we probably need to look at
> interacting with idmapd or create our own variant real soon.
>
>   -eric
>


Re: [PATCH] 9p: attach-per-user

2007-09-13 Thread Eric Van Hensbergen
On 9/12/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
>
> - allow only one user to access the tree (access=)
>  Only the user with uid can access the v9fs tree. Other users that attempt
>  to access it will get EPERM error.
>

While access= and dfltuid= create an interesting
flexibility in the way things can be used, I'm wondering whether
access= with dfltuid=DEFAULT_UID is intuitive; it might be nicer for
the default behavior to be setting dfltuid to the uid specified in
access= when that access option is used.  This can be overridden with
the dfltuid option, but I think it makes more sense to attach as the
uid you are restricting access to.
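
A sketch of that default (the helper and the *_given flags are placeholders,
not the actual v9fs option-parsing code):

/*
 * Sketch only: if access=<uid> was given but dfltuid was not set explicitly,
 * default dfltuid to the uid access is being restricted to.
 */
static void v9fs_default_dfltuid(struct v9fs_session_info *v9ses,
				 int access_uid_given, int dfltuid_given)
{
	if (access_uid_given && !dfltuid_given)
		v9ses->dfltuid = v9ses->uid;	/* attach as the restricted uid */
}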

If that's the way we want to go, I think that can be handled in a
separate patch.

I've merged this stuff into my test tree; as soon as regressions pass
and I confirm it compiles cleanly on another architecture, I'll push
it into my devel tree to be picked up by -mm.

 -eric


Re: [NFS] [PATCH 08/19] jfs: new export ops

2007-09-13 Thread Christoph Hellwig
On Mon, Sep 10, 2007 at 03:45:33PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 30, 2007 at 03:16:32PM +0200, Christoph Hellwig wrote:
> > Trivial switch over to the new generic helpers.
> > -   result = ERR_PTR(-ESTALE);
> > -   goto out_iput;
> > +   iput(inode);
> > +   return ERR_PTR(-ESTALE);
> 
> For some reason you're introducing a bunch of whitespace damage on lines
> like that last one.  Would you mind running these through checkpatch.pl
> or whatever and then resending?
> 
> Other than that I can't find any problem, and it seems like an obvious
> improvement, though I'd like to see Neil take a look.  Do you want these
> added to the nfsd patches for 2.6.24?

Yes, please add them to the queue.  I've put a quilt series of patches
with the whitespace fixes on http://verein.lst.de/~hch/patches.nfsd.tgz.
Do you want to take it that way or should I resend everything to the
list?


Re: [PATCH] 9p: rename uid and gid parameters

2007-09-13 Thread Eric Van Hensbergen
On 9/12/07, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
> Change the names of 'uid' and 'gid' parameters to the more appropriate
> 'dfltuid' and 'dfltgid'.
>

...

> strcpy(v9ses->name, V9FS_DEFUSER);
> strcpy(v9ses->remotename, V9FS_DEFANAME);
> +   v9ses->dfltuid = V9FS_DEFUID;
> +   v9ses->dfltgid = V9FS_DEFGID;
>
...
> +#define V9FS_DEFUID (0)
> +#define V9FS_DEFGID (0)

I'm not sure if there is a good solution here, but I'm uncomfortable
with using uid=0 as the default.  I'm not sure if there is a default
uid for nobody, but anything is probably better than 0.  Looks like
nfsnobody is 65534, we could use that - even if only as a marker for
the server to map it to nobody on the target system?  What do you
think?

Particularly with attach-per-user, we probably need to look at
interacting with idmapd or create our own variant real soon.

  -eric


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread David Chinner
On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote:
> On Thursday 13 September 2007 11:49, David Chinner wrote:
> > On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote:
> 
> > > I just gave 4 things which combined might easily reduce xfs vmap overhead
> > > by several orders of magnitude, all without changing much code at all.
> >
> > Patches would be greatly appreciately. You obviously understand this
> > vm code much better than I do, so if it's easy to fix by adding some
> > generic vmap cache thingy, please do.
> 
> Well, it may not be easy to _fix_, but it's easy to try a few improvements ;)
> 
> How do I make an image and run a workload that will coerce XFS into
> doing a significant number of vmaps?

# mkfs.xfs -n size=16384 

to create a filesystem with a 16k directory block size on a 4k page
machine.

Then just do operations on directories with lots of files in them
(tens of thousands). Every directory operation will require at
least one vmap in this situation - e.g. a traversal will result in
lots and lots of blocks being read that will require vmap() for every
directory block read from disk and an unmap almost immediately
afterwards when the reference is dropped
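
For anyone wanting to script that, a rough sketch of such a workload (path and
file count are arbitrary; it assumes the filesystem was made with the mkfs.xfs
line above and mounted under the given directory):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *dir = argc > 1 ? argv[1] : "/mnt/xfs/bigdir";
	char path[4096];
	int i, fd;

	mkdir(dir, 0755);	/* ignore EEXIST on reruns */
	for (i = 0; i < 50000; i++) {
		snprintf(path, sizeof(path), "%s/f%06d", dir, i);
		fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0) {
			perror(path);
			return 1;
		}
		close(fd);
	}
	/*
	 * Then drop caches and traverse the directory (e.g. "ls -l" or a
	 * readdir loop) so the 16k directory blocks are read back from disk
	 * and vmapped.
	 */
	return 0;
}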

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Protect update of b_transaction by j_list_lock

2007-09-13 Thread Amit K. Arora
Hello,

At all other places, j_list_lock is held when b_transaction is
being modified, but at one place in transaction.c this lock is not
held. Also, the journal-head.h file, which defines the journal_head
structure, says that b_transaction should be protected by two locks -
j_list_lock and jbd_lock_bh_state():

/*
 * Pointer to the compound transaction which owns this buffer's
 * metadata: either the running transaction or the committing
 * transaction (if there is one).  Only applies to buffers on a
 * transaction's data or metadata journaling list.
 * [j_list_lock] [jbd_lock_bh_state()]
 */
transaction_t *b_transaction;
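
In code, following that rule when changing b_transaction would look roughly
like this (an illustration of the documented locking, not a quote from jbd):

	jbd_lock_bh_state(bh);
	spin_lock(&journal->j_list_lock);
	jh->b_transaction = transaction;	/* modification: both locks held */
	spin_unlock(&journal->j_list_lock);
	jbd_unlock_bh_state(bh);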


This was observed while debugging a problem of 'b_transaction' getting
corrupted. Here is the trace :

-
[c267bc50] d00e0654 .__journal_refile_buffer+0x100/0x16c [jbd]
[c267bcf0] d00e48bc .journal_commit_transaction+0x136c/0x16e0 [jbd]
[c267be00] d00e9968 .kjournald+0xf0/0x2e8 [jbd]
[c267bee0] c0080fb8 .kthread+0x128/0x178
[c267bf90] c00264bc .kernel_thread+0x4c/0x68
3:mon> e
cpu 0x3: Vector: 700 (Program Check) at [c267b930]
pc: d00e02b0: .__journal_file_buffer+0x74/0x2e0 [jbd]
lr: d00e0654: .__journal_refile_buffer+0x100/0x16c [jbd]
sp: c267bbb0
   msr: 80029032
  current = 0xc241d3d0
  paca= 0xc0464900
pid   = 16224, comm = kjournald
kernel BUG in __journal_file_buffer at fs/jbd/transaction.c:1951!
3:mon> r
R00 = 0001   R16 = 41c0
R01 = c267bbb0   R17 = c0371060
R02 = d0102d20   R18 = 
R03 = c00038b25208   R19 = 002bf000
R04 = c001e6dfee80   R20 = c27e4680
R05 = 0002   R21 = 
R06 = 0283   R22 = 0fdc
R07 =    R23 = 
R08 = c0464900   R24 = c000e8ad5024
R09 = c001e6dfee80   R25 = c001b6b726e8
R10 = c00038b25208   R26 = c1f5c780
R11 = 0008   R27 = 0002
R12 = 2448   R28 = c001e6dfee80
R13 = c0464900   R29 = c00038b25208
R14 =    R30 = d0101f80
R15 = c0372620   R31 = c00163ce7908
pc  = d00e02b0 .__journal_file_buffer+0x74/0x2e0 [jbd]
lr  = d00e0654 .__journal_refile_buffer+0x100/0x16c [jbd]
msr = 80029032   cr  = 4442
ctr = d00e0170   xer = 2000   trap =  700
-

The assertion is coming from following code (fs/jbd/transaction.c):

/*
 * File a buffer on the given transaction list.
 */
void __journal_file_buffer(struct journal_head *jh,
transaction_t *transaction, int jlist)
{
struct journal_head **list = NULL;
int was_dirty = 0;
struct buffer_head *bh = jh2bh(jh);

J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));

assert_spin_locked(&transaction->t_journal->j_list_lock);<==HERE

J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
J_ASSERT_JH(jh, jh->b_transaction == transaction ||
jh->b_transaction == 0);
:
:


It looks like the transaction pointer got corrupted. On code inspection,
I found a place where b_transaction is being updated without
holding the j_list_lock. I tried to fix this in the following patch, and
the bug is no longer being hit. Please see if this change is
okay.


The following patch has been tested successfully for more than two weeks:

Signed-off-by: Amit Arora <[EMAIL PROTECTED]>

diff -Nuarp a/fs/jbd/transaction.c b/fs/jbd/transaction.c
--- a/fs/jbd/transaction.c  2007-08-10 15:34:08.0 +0530
+++ b/fs/jbd/transaction.c  2007-08-10 15:56:02.0 +0530
@@ -693,15 +693,15 @@ repeat:
 * sure it doesn't get written to disk before the caller actually
 * commits the new data
 */
+   spin_lock(&journal->j_list_lock);
if (!jh->b_transaction) {
JBUFFER_TRACE(jh, "no transaction");
J_ASSERT_JH(jh, !jh->b_next_transaction);
jh->b_transaction = transaction;
JBUFFER_TRACE(jh, "file as BJ_Reserved");
-   spin_lock(&journal->j_list_lock);
__journal_file_buffer(jh, transaction, BJ_Reserved);
-   spin_unlock(&journal->j_list_lock);
}
+   spin_unlock(&journal->j_list_lock);
 
 done:
if (need_copy) {


Re: Distributed storage. Security attributes and documentation update.

2007-09-13 Thread Evgeniy Polyakov
Hi Paul.

On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > Further TODO list includes:
> > * implement optional saving of mirroring/linear information on the remote
> > nodes (simple)
> > * implement netlink based setup (simple)
> > * new redundancy algorithm (complex)
> > 
> > Homepage:
> > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> 
> A couple questions below, but otherwise looks good from an RCU viewpoint.
> 
>   Thanx, Paul

Thanks for your comments, and sorry for the late reply - I was away on the
KS/London trip.
> > +   if (--num) {
> > +   list_for_each_entry_rcu(n, &node->shared, shared) {
> 
> This function is called under rcu_read_lock() or similar, right?
> (Can't tell from this patch.)  It is also OK to call it from under the
> update-side mutex, of course.

Actually not, but it does not require it, since the entry cannot be removed
during this operation because the appropriate reference counter for the given node is
being held. It should not be RCU at all.

> > +static int dst_mirror_read(struct dst_request *req)
> > +{
> > +   struct dst_node *node = req->node, *n, *min_dist_node;
> > +   struct dst_mirror_priv *priv = node->priv;
> > +   u64 dist, d;
> > +   int err;
> > +
> > +   req->bio_endio = &dst_mirror_read_endio;
> > +
> > +   do {
> > +   err = -ENODEV;
> > +   min_dist_node = NULL;
> > +   dist = -1ULL;
> > + 
> > +   /*
> > +* Reading is never performed from the node under resync.
> > +* If this will cause any troubles (like all nodes must be
> > +* resynced between each other), this check can be removed
> > +* and per-chunk dirty bit can be tested instead.
> > +*/
> > +
> > +   if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > +   priv = node->priv;
> > +   if (req->start > priv->last_start)
> > +   dist = req->start - priv->last_start;
> > +   else
> > +   dist = priv->last_start - req->start;
> > +   min_dist_node = req->node;
> > +   }
> > +
> > +   list_for_each_entry_rcu(n, &node->shared, shared) {
> 
> I see one call to this function that appears to be under the update-side
> mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> under the update-side mutex or under rcu_read_lock() and friends.)

The same here - those processing functions are called from
generic_make_request() without any lock held on top of them. Each node is linked
into the list of the first added node, whose reference counter is
increased in the higher layer. Right now there is no way to add or remove
nodes after the array was started; such functionality requires the storage tree
lock to be taken, and RCU cannot be used (since that requires sleeping and
I did not investigate sleepable RCU for this purpose).

So, essentially RCU is not used in DST :)

Thanks for review, Paul.

-- 
Evgeniy Polyakov


Re: [Jfs-discussion] [2/4] 2.6.23-rc6: known regressions

2007-09-13 Thread Dave Kleikamp
On Wed, 2007-09-12 at 18:58 +0200, Michal Piotrowski wrote:

> Subject : umount triggers a warning in jfs and takes almost a minute
> References  : http://lkml.org/lkml/2007/9/4/73
> Last known good : ?
> Submitter   : Oliver Neukum <[EMAIL PROTECTED]>
> Caused-By   : ?
> Handled-By  : ?
> Status  : unknown

I'm still waiting to hear from Oliver whether or not this is actually a
regression.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Mel Gorman
On (12/09/07 16:17), Christoph Lameter didst pronounce:
> On Wed, 12 Sep 2007, Nick Piggin wrote:
> 
> > I will still argue that my approach is the better technical solution for 
> > large
> > block support than yours, I don't think we made progress on that. And I'm
> > quite sure we agreed at the VM summit not to rely on your patches for
> > VM or IO scalability.
> 
> The approach has already been tried (see the XFS layer) and found lacking. 
> 
> Having a fake linear block through vmalloc means that a special software 
> layer must be introduced and we may face special casing in the block / fs 
> layer to check if we have one of these strange vmalloc blocks.
> 

One of Nick's points is that to have a 100% reliable solution, that is
what is required. We already have a layering between the VM and the FS
but my understanding is that fsblock replaces rather than adds to it.

Surely, we'll be able to detect the situation where the memory is really
contiguous as a fast path and have a slower path where fragmentation was
a problem.

> > But you just showed in two emails that you don't understand what the
> > problem is. To reiterate: lumpy reclaim does *not* invalidate my formulae;
> > and antifrag does *not* isolate the issue.
> 
> I do understand what the problem is. I just do not get what your problem 
> with this is and why you have this drive to demand perfection. We are 
> working a variety of approaches on the (potential) issue but you 
> categorically state that it cannot be solved.
> 

This is going in circles.

His point is that we also cannot prove it is 100% correct in all
situations. Without taking additional (expensive) steps, there will be a
workload that fragments physical memory. He doesn't know what it is and neither
do we, but that does not mean that someone else won't find it. He also has a
point about the slow degradation due to fragmentation that is woefully difficult
to reproduce. We've had this provability-of-correctness problem before.

His initial problem was not with the patches as such but with the fact that they
seemed to be presented as a 1st-class feature that we fully support and that
is a potential solution for some VM and IO scalability problems.  This is
not the case; we have to treat it as a 2nd-class feature until we *know* that no
situation exists where it breaks down. These patches on their own would have
to run for months if not a year or so before we could be really sure about that.

The only implementation question about these patches that hasn't been addressed
is the mmap() support. What's wrong with it in its current form? Can it be
fixed, or is it fundamentally screwed, etc.? That has fallen by the
wayside.

> > But what do you say about viable alternatives that do not have to
> > worry about these "unlikely scenarios", full stop? So, why should we
> > not use fs block for higher order page support?
> 
> Because it has already been rejected in another form and adds more 
> layering to the filesystem and more checking for special cases in which 
> we only have virtual linearity? It does not reduce the number of page 
> structs that have to be handled by the lower layers etc.
> 

Unless callers always use an iterator for blocks that is optimised in the
physically linear case to be a simple array offset, and that when not physically
linear either walks chains (complex) or uses vmap (and must deal with TLB
flushes among other things). If it optimistically uses physically contiguous
memory, we may find a way to use only one page struct as well.
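
A rough sketch of what such an iterator could look like (purely illustrative;
this is not fsblock or any existing kernel interface):

#include <linux/mm.h>
#include <linux/types.h>

/*
 * A "block" larger than PAGE_SIZE is either backed by physically contiguous
 * memory (fast path: plain pointer arithmetic) or by an array of order-0
 * pages (slow path: per-page lookup; a real version would also have to deal
 * with highmem mappings and vmap/TLB-flush costs).
 */
struct blk_iter {
	bool		contig;		/* physically contiguous? */
	void		*base;		/* kernel address when contig */
	struct page	**pages;	/* one entry per PAGE_SIZE chunk otherwise */
};

static inline void *blk_iter_chunk(struct blk_iter *it, unsigned int idx)
{
	if (it->contig)
		return it->base + idx * PAGE_SIZE;	/* simple array offset */
	return page_address(it->pages[idx]);		/* chase the page array */
}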

> Maybe we coud get to something like a hybrid that avoids some of these 
> issues?

Or gee whiz, I don't know. Start with your patches as a strictly 2nd-class
citizen and build fsblock in, while trying to keep the use of physically contiguous
memory where possible and where it makes sense.

> Add support so something like a virtual compound page can be 
> handled transparently in the filesystem layer with special casing if 
> such a beast reaches the block layer?
> 
> > I didn't skip that. We have large page pools today. How does that give
> > first class of support to those allocations if you have to have memory
> > reserves?
> 
> See my other mail. That portion is not complete yet. Sorry.
> 

I am *very* wary of using reserve pools for anything other than
emergency situations. If nothing else pools == wasted memory + a sizing
problem. But hey, it is one option.

Are we going to agree on some sort of plan or are we just going to
handwave ourselves to death?

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 11:49, David Chinner wrote:
> On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote:

> > I just gave 4 things which combined might easily reduce xfs vmap overhead
> > by several orders of magnitude, all without changing much code at all.
>
> Patches would be greatly appreciately. You obviously understand this
> vm code much better than I do, so if it's easy to fix by adding some
> generic vmap cache thingy, please do.

Well, it may not be easy to _fix_, but it's easy to try a few improvements ;)

How do I make an image and run a workload that will coerce XFS into
doing a significant number of vmaps?