Re: Massive slowdown when re-querying large nfs dir

2007-11-07 Thread Al Boldi
Andrew Morton wrote:
> > > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > > wireshark) what is different between the various cases.
> >
> > Thanks Neil for looking into this.  Your suggestion has already been
> > answered in a previous post, where the difference has been attributed to
> > "ls -l" inducing lookup for the first try, which is fast, and getattr
> > for later tries, which is super-slow.
> >
> > Now it's easy to blame the userland rpc.nfs.V2 server for this, but
> > what's not clear is how come 2.4.31 handles getattr faster than 2.6.23?
>
> We broke 2.6?  It'd be interesting to run the ls in an infinite loop on
> the client, then start poking at the server.  Is the 2.6 server doing
> physical IO?  Is the 2.6 server consuming more system time?  etc.  A basic
> `vmstat 1' trace for both 2.4 and 2.6 would be a starting point.
>
> Could be that there's some additional latency caused by networking
> changes, too.  I expect the tcpdump/wireshark/etc traces would have
> sufficient resolution for us to be able to see that.

The problem turns out to be "tune2fs -O dir_index".
Removing that feature resolves the big slowdown.

Does 2.4.31 support this feature?

Neil Brown wrote:
> Maybe an "strace -tt" of the nfs server might show some significant
> difference.

###
# ls -l <3K dir entry> (first try after mount inducing lookup) in ~3sec
# strace -tt rpc.nfsd

08:28:14.668557 time([1194499694])  = 1194499694
08:28:14.669420 alarm(5)= 2
08:28:14.669667 select(1024, [4 5], NULL, NULL, NULL) = 1 (in [4])
08:28:14.670142 recvfrom(4, 
"\275\3607{\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\2\0\0\0\4"..., 8800, 0, 
{sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 
[16]) = 116
08:28:14.670554 time(NULL)  = 1194499694
08:28:14.670711 time([1194499694])  = 1194499694
08:28:14.670875 lstat("/a/x", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
08:28:14.671134 time([1194499694])  = 1194499694
08:28:14.671302 lstat("/a/x/3619", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.671530 time([1194499694])  = 1194499694
08:28:14.671701 alarm(2)= 5
08:28:14.671903 time([1194499694])  = 1194499694
08:28:14.672060 lstat("/a/x/3619", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.672305 time([1194499694])  = 1194499694
08:28:14.672508 sendto(4, 
"\275\3607{\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128, 0, 
{sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 16) 
= 128
08:28:14.672909 time([1194499694])  = 1194499694
08:28:14.673869 alarm(5)= 2
08:28:14.674145 select(1024, [4 5], NULL, NULL, NULL) = 1 (in [4])
08:28:14.674589 recvfrom(4, 
"\276\3607{\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\2\0\0\0\4"..., 8800, 0, 
{sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 
[16]) = 116
08:28:14.675003 time(NULL)  = 1194499694
08:28:14.675160 time([1194499694])  = 1194499694
08:28:14.675321 lstat("/a/x", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
08:28:14.675581 time([1194499694])  = 1194499694
08:28:14.675749 lstat("/a/x/3631", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.675979 time([1194499694])  = 1194499694
08:28:14.676150 alarm(2)= 5
08:28:14.676348 time([1194499694])  = 1194499694
08:28:14.676505 lstat("/a/x/3631", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.676746 time([1194499694])  = 1194499694
08:28:14.676952 sendto(4, 
"\276\3607{\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128, 0, 
{sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 16) 
= 128

##
# ls -l <3K dir entry> (second try after mount inducing getattr) in ~11sec
# strace -tt rpc.nfsd

08:28:40.963668 time([1194499720])  = 1194499720
08:28:40.964525 alarm(5)= 2
08:28:40.964772 select(1024, [4 5], NULL, NULL, NULL) = 1 (in [4])
08:28:40.965215 recvfrom(4, 
",\3747{\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\2\0\0\0\1\0\0"..., 8800, 0, 
{sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 
[16]) = 108
08:28:40.965609 time(NULL)  = 1194499720
08:28:40.965763 time([1194499720])  = 1194499720
08:28:40.965941 stat("/", {st_mode=S_IFDIR|0755, st_size=2048, ...}) = 0
08:28:40.966176 setfsuid(0) = 0
08:28:40.966329 stat("/", {st_mode=S_IFDIR|0755, st_size=2048, ...}) = 0
08:28:40.966539 stat("/", {st_mode=S_IFDIR|0755, st_size=2048, ...}) = 0
08:28:40.966748 open("/", O_RDONLY|O_NONBLOCK) = 0
08:28:40.966919 fcntl(0, F_SETFD, FD_CLOEXEC) = 0
08:28:40.967084 lseek(0, 0, SEEK_CUR)   = 0
08:28:40.967240 getdents(0, /* 71 entries */, 3933) = 1220
08:28:40.968195 close(0)= 0
08:28:40.968351 stat("/a/", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
08:28:40.968583 stat("/a/", 

Re: writeout stalls in current -git

2007-11-07 Thread David Chinner
On Wed, Nov 07, 2007 at 08:15:06AM +0100, Torsten Kaiser wrote:
> On 11/7/07, David Chinner <[EMAIL PROTECTED]> wrote:
> > Ok, so it's not synchronous writes that we are doing - we're just
> > submitting bio's tagged as WRITE_SYNC to get the I/O issued quickly.
> > The "synchronous" nature appears to be coming from higher level
> > locking when reclaiming inodes (on the flush lock). It appears that
> > inode write clustering is failing completely so we are writing the
> > same block multiple times i.e. once for each inode in the cluster we
> > have to write.
> 
> Works for me. The only remaining stalls are sub second and look
> completely valid, considering the amount of files being removed.

> Tested-by: Torsten Kaiser <[EMAIL PROTECTED]>

Great - thanks for reporting the problem and testing the fix.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Problem with accessing namespace_sem from LSM.

2007-11-07 Thread Tetsuo Handa
Hello.

Christoph Hellwig wrote:
> Same argument as with the AA folks: it does not have any business looking
> at the vfsmount.  If you create a file it can and in many setups will
> show up in multiple vfsmounts, so making decisions based on the particular
> one this creat happens through is wrong and actually dangerous.
Thus TOMOYO 1.x doesn't use LSM hooks, and AppArmor for OpenSuSE 10.3
added a "struct vfsmount" parameter to the VFS helper functions and LSM hooks.

Not all systems use bind mounts.
There is likely only one vfsmount that corresponds to a given dentry.

What does "dangerous" mean here? Does it cause a crash?

Regards.


Re: Large SMBwriteX testing.

2007-11-07 Thread Steve French
I have verified that it works for the case in which "min receivefile
size" is under 128K.  When I set it to 25 and tried to read 148000
there were two or three problems (reply_write_and_X in Samba is
calling smb_len instead of smb_len_large, and it is checking
"req->unread_bytes" incorrectly in a few places in reply.c and
fileio.c).

On Nov 2, 2007 6:43 PM, Jeremy Allison <[EMAIL PROTECTED]> wrote:
> Hi Steve,
>
> I've finished adding the ability for smbd to support up to
> 16MB writeX calls in the latest git 3.2 tree.
>
> To enable, set the parameter :
>
> min receivefile size = XXX
>
> where XXX is the smallest writeX you want to handle with recvfile.
>
> The linux kernel doesn't yet support zerocopy from network to
> file (ie. splice only works one way currently) so it's emulated
> in userspace (with a 128k staging buffer) for now.
>
> Also it must be an unsigned connection (for obvious reasons).
>
> Once you've set this param smbd will start reporting
> CIFS_UNIX_LARGE_WRITE_CAP on a SMB_QUERY_CIFS_UNIX_INFO:
> call and you should be good to go. You'll need to use
> a writeX call identical to Windows (14 wct with a 1 byte
> pad field) in order to trigger the new code.
>
> Let me know if you get the chance to test it and if
> it makes a speed difference for CIFSFS.
>
> Cheers,
>
> Jeremy.
>



-- 
Thanks,

Steve


Re: cramfs in big endian

2007-11-07 Thread Christoph Hellwig
On Wed, Nov 07, 2007 at 09:51:48PM +0100, Andi Drebes wrote:
> Hi!
> 
> > I would suggest you to use squashfs instead of cramfs.
> > First, it's newer, it's better, it's actively developed, it doesn't have any
> > limits like the bad cramfs. 
> I'm developing a new linux based firmware for my router which uses cramfs. 
> Switching to squashfs still needs some time. Meanwhile, I have to work with 
> cramfs. As the router uses the big endian format and as my machine works with 
> the little endian format, I'm unable to mount the router's filesystem images.

Making cramfs endianness-independent shouldn't be much work.  Take a look
at the helpers in fs/ufs/swab.h and use them for every on-disk access in
cramfs.  Drop me a note if you need some help.
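
Roughly the idea, as an illustrative sketch only (this is not existing code
from either filesystem, and the flag/field names are made up): the fs/ufs
helpers pick between the byte-order conversion macros based on a flag kept in
the in-core superblock info, and a cramfs equivalent could record at mount
time whether the image's byte order matches the host (e.g. from which form of
the magic number matched) and funnel every on-disk field through one helper:

#include <linux/types.h>
#include <asm/byteorder.h>	/* swab32() */

/* Hypothetical in-core state, filled in while reading the superblock. */
struct cramfs_sb_info {
	int swab_needed;	/* illustrative only, not in mainline cramfs */
};

/* Every on-disk 32-bit field would then be read through this. */
static inline u32 cramfs32_to_cpu(const struct cramfs_sb_info *sbi, u32 v)
{
	return sbi->swab_needed ? swab32(v) : v;
}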


Re: Problem with accessing namespace_sem from LSM.

2007-11-07 Thread Christoph Hellwig
On Thu, Nov 08, 2007 at 07:04:23AM +0900, Tetsuo Handa wrote:
> The reason why I want to access namespace_sem inside security_inode_create()
> is that it doesn't receive a "struct vfsmount" parameter.
> If "struct vfsmount" *were* passed to security_inode_create(),
> I would have no need to access namespace_sem.

Same argument as with the AA folks: it does not have any business looking
at the vfsmount.  If you create a file it can and in many setups will
show up in multiple vfsmounts, so making decisions based on the particular
one this creat happens through is wrong and actually dangerous.



Re: Problem with accessing namespace_sem from LSM.

2007-11-07 Thread Tetsuo Handa
Hello.

Christoph Hellwig wrote:
> > Isn't security_inode_create() a part of VFS internals?
> It's not.  security_inode_create is part of the LSM infrastructure, and
> the actual methods are part of security modules and definitively not
> VFS internals.
The reason why I want to access namespace_sem inside security_inode_create()
is that it doesn't receive a "struct vfsmount" parameter.
If "struct vfsmount" *were* passed to security_inode_create(),
I would have no need to access namespace_sem.
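
For readers unfamiliar with the hook, a rough sketch of the shapes involved
(the prototypes below are only approximate, and the second one is the
out-of-tree variant, not mainline):

/* Roughly the mainline LSM hook of this era: only the parent inode and the
 * new dentry are passed, so a security module cannot tell which vfsmount
 * (and hence which pathname) the create goes through. */
int security_inode_create(struct inode *dir, struct dentry *dentry, int mode);

/* The out-of-tree AppArmor/TOMOYO patches add the missing context, roughly:
 *   int security_inode_create(struct inode *dir, struct dentry *dentry,
 *                             struct vfsmount *mnt, int mode);
 */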

And now, since calling down_read(&namespace_sem) there causes a deadlock,
I'm looking for a solution.
What you said ("I'd start looking for design bugs in whatever code you have
using it first.") sounds like "never try to implement pathname-based access
control in security_inode_create()", which would make AppArmor (for OpenSuSE
10.1/10.2) and TOMOYO unable to apply their access control.

At first I thought that this lockdep warning was a false positive,
since "struct inode" is allocated/freed dynamically.
But the warning still appears even after I disabled freeing the memory
at destroy_inode() in fs/namei.c (so that the address of the locking object
in "struct inode" is never reused), so it is likely genuine.

Regards.



Re: Massive slowdown when re-querying large nfs dir - CORRECTION

2007-11-07 Thread Neil Brown
On Thursday November 8, [EMAIL PROTECTED] wrote:
> 
> Not really a credible difference as the reported difference is between
> two *clients* and the speed of getattr vs lookup would depend on the
> *server*. 

Sorry, my bad.  I misread your original problem description.  It would
appear to be a server difference.

Maybe an "strace -tt" of the nfs server might show some significant
difference.

NeilBrown


Re: Massive slowdown when re-querying large nfs dir

2007-11-07 Thread Neil Brown
On Wednesday November 7, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
> >
> > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > wireshark) what is different between the various cases.
> 
> Thanks Neil for looking into this.  Your suggestion has already been answered 
> in a previous post, where the difference has been attributed to "ls -l" 
> inducing lookup for the first try, which is fast, and getattr for later 
> tries, which is super-slow.

Not really a credible difference as the reported difference is between
two *clients* and the speed of getattr vs lookup would depend on the
*server*. 

> 
> Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's 
> not clear is how come 2.4.31 handles getattr faster than 2.6.23?

I suspect a more detailed analysis of the traces is in order.  I
strongly suspect you will see a difference between the two clients,
and you have only reported a difference between the first and second
"ls -l" (unless I missed some email).

It seems most likely that 2.6 is issuing substantially more GETATTR
requests than 2.4.  There have certainly been reports of this in the
past and they have been either fixed or justified.
This may be a new situation.  Or it may be that 2.4 was being fast by
being incorrect in some way.  Only an analysis of the logs would tell.

Maybe you would like to post the (binary, using "-s 0") traces for
both 2.4 and 2.6.

NeilBrown


Re: cramfs in big endian

2007-11-07 Thread Andi Drebes
Hi!

> I would suggest you to use squashfs instead of cramfs.
> First, it's newer, it's better, it's actively developed, it doesn't have any
> limits like the bad cramfs. 
I'm developing a new linux based firmware for my router which uses cramfs. 
Switching to squashfs still needs some time. Meanwhile, I have to work with 
cramfs. As the router uses the big endian format and as my machine works with 
the little endian format, I'm unable to mount the router's filesystem images.


Andi


Re: [RFC] fs io with struct page instead of iovecs

2007-11-07 Thread David Chinner
On Wed, Nov 07, 2007 at 09:02:05AM -0800, Zach Brown wrote:
> Badari Pulavarty wrote:
> > On Tue, 2007-11-06 at 17:43 -0800, Zach Brown wrote:
> >> At the FS meeting at LCE there was some talk of doing O_DIRECT writes from 
> >> the
> >> kernel with pages instead of with iovecs.  T
> > 
> > Why ? Whats the use case ?
> 
> Well, I think there's a few:
> 
> There are existing callers which hold a kmap() across ->write, which
> isn't great.  ecryptfs() does this.  That's mentioned in the patch
> series.  Arguably loopback should be using this instead of copying some
> fs paths and trying to call aop methods directly.
> 
> I seem to remember Christoph and David having stories of knfsd folks in
> SGI wanting to do O_DIRECT writes from knfsd?  (If not, *I* kind of want
> to, after rolling some patches to align net rx descriptors :)).

The main reason is to remove the serialised writer problem when multiple
clients are writing to the one file. With XFS and direct I/O, we can have
multiple concurrent writers to the one file and have it scale rather than be
limited to what a single cpu holding the i_mutex can do.

> Lustre shows us that there is a point at which you can't saturate your
> network and storage if your cpu is copying all the data.

Buy more CPUs ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Problem with accessing namespace_sem from LSM.

2007-11-07 Thread Christoph Hellwig
On Tue, Nov 06, 2007 at 11:52:40PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> Christoph Hellwig wrote:
> > Any code except VFS internals has no business using it at all and doesn't
> > do that in mainline either.  I'd start looking for design bugs in whatever
> > code you have using it first.
> Isn't security_inode_create() a part of VFS internals?

It's not.  security_inode_create is part of the LSM infrastructure, and
the actual methods are part of security modules and definitively not
VFS internals.


Re: Massive slowdown when re-querying large nfs dir

2007-11-07 Thread Andrew Morton
> On Wed, 7 Nov 2007 12:36:26 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> Neil Brown wrote:
> > On Tuesday November 6, [EMAIL PROTECTED] wrote:
> > > > On Tue, 6 Nov 2007 14:28:11 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> > > > Al Boldi wrote:
> > > > > There is a massive (3-18x) slowdown when re-querying a large nfs dir
> > > > > (2k+ entries) using a simple ls -l.
> > > > >
> > > > > On 2.6.23 client and server running userland rpc.nfs.V2:
> > > > > first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir>  in ~8sec
> > > > >
> > > > > first  try: time -p ls -l <5k+ entry dir>  in ~9sec
> > > > > more tries: time -p ls -l <5k+ entry dir>  in ~180sec
> > > > >
> > > > > On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
> > > > > first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir>  in ~7sec
> > > > >
> > > > > first  try: time -p ls -l <5k+ entry dir>  in ~8sec
> > > > > more tries: time -p ls -l <5k+ entry dir>  in ~43sec
> > > > >
> > > > > Remounting the nfs-dir on the client resets the problem.
> > > > >
> > > > > Any ideas?
> > > >
> > > > Ok, I played some more with this, and it turns out that nfsV3 is a lot
> > > > faster.  But, this does not explain why the 2.4.31 kernel is still
> > > > over 4-times faster than 2.6.23.
> > > >
> > > > Can anybody explain what's going on?
> > >
> > > Sure, Neil can! ;)
> 
> Thanks Andrew!
> 
> > Nuh.
> > He said "userland rpc.nfs.Vx".  I only do "kernel-land NFS".  In these
> > days of high specialisation, each line of code is owned by a different
> > person, and finding the right person is hard
> >
> > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > wireshark) what is different between the various cases.
> 
> Thanks Neil for looking into this.  Your suggestion has already been answered 
> in a previous post, where the difference has been attributed to "ls -l" 
> inducing lookup for the first try, which is fast, and getattr for later 
> tries, which is super-slow.
> 
> Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's 
> not clear is how come 2.4.31 handles getattr faster than 2.6.23?
> 

We broke 2.6?  It'd be interesting to run the ls in an infinite loop on the
client, then start poking at the server.  Is the 2.6 server doing physical
IO?  Is the 2.6 server consuming more system time?  etc.  A basic `vmstat
1' trace for both 2.4 and 2.6 would be a starting point.

Could be that there's some additional latency caused by networking changes,
too.  I expect the tcpdump/wireshark/etc traces would have sufficient
resolution for us to be able to see that.



Re: [RFC] fs io with struct page instead of iovecs

2007-11-07 Thread Zach Brown
Badari Pulavarty wrote:
> On Tue, 2007-11-06 at 17:43 -0800, Zach Brown wrote:
>> At the FS meeting at LCE there was some talk of doing O_DIRECT writes from 
>> the
>> kernel with pages instead of with iovecs.  T
> 
> Why ? Whats the use case ?

Well, I think there's a few:

There are existing callers which hold a kmap() across ->write, which
isn't great.  ecryptfs() does this.  That's mentioned in the patch
series.  Arguably loopback should be using this instead of copying some
fs paths and trying to call aop methods directly.

I seem to remember Christoph and David having stories of knfsd folks in
SGI wanting to do O_DIRECT writes from knfsd?  (If not, *I* kind of want
to, after rolling some patches to align net rx descriptors :)).

Lustre shows us that there is a point at which you can't saturate your
network and storage if your cpu is copying all the data.  I'll be the
first to admit that the community might not feel a pressing need to
address this for in-kernel file system writers, but the observation remains.
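
Purely as an illustration of the idea (the struct and function names below
are invented, not the interface from the patch series): a caller that already
holds struct page references, such as loop or knfsd, could describe its data
directly in pages instead of kmap()ing them just to build an iovec.

#include <linux/fs.h>
#include <linux/mm_types.h>	/* struct page */

/* Invented for illustration -- not the RFC's actual API. */
struct page_vec {
	struct page	*page;
	unsigned int	offset;	/* start of the data within the page */
	unsigned int	len;	/* bytes of data in this page */
};

/* A page-based entry point would let in-kernel writers avoid the
 * kmap()-per-page step that an iovec-based path forces on them. */
ssize_t vfs_write_pages(struct file *file, const struct page_vec *vec,
			unsigned long nr_segs, loff_t *pos);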

- z


Re: [RFC] fs io with struct page instead of iovecs

2007-11-07 Thread Badari Pulavarty
On Tue, 2007-11-06 at 17:43 -0800, Zach Brown wrote:
> At the FS meeting at LCE there was some talk of doing O_DIRECT writes from the
> kernel with pages instead of with iovecs.  T

Why ? Whats the use case ?

Thanks,
Badari





Re: [ANN] Squashfs 3.3 released

2007-11-07 Thread Phillip Lougher

maximilian attems wrote:
> On Mon, Nov 05, 2007 at 11:13:14AM +, Phillip Lougher wrote:
>>
>> The next stage after this release is to fix the one remaining blocking issue
>> (filesystem endianness), and then try to get Squashfs mainlined into the
>> Linux kernel again.
>
> that would be very cool!

Yes, it would be cool :)  Five years is a long time to maintain
something out of tree, especially recently when there have been
so many minor changes to the VFS interface between kernel releases.

> with my hat as debian kernel maintainer i'd be very relieved to see it
> mainlined. i don't know of any major distro that doesn't ship it.

I don't know of any major distro that doesn't ship Squashfs either
(except arguably Slackware).  Putting my other hat on (one of the
Ubuntu kernel maintainers) I don't think Squashfs has caused
distros that many problems because it is an easy patch to apply
(it doesn't touch that many kernel files), but it is always good
to minimise the differences from the stock kernel.org kernel.

Phillip



Re: [ANN] Squashfs 3.3 released

2007-11-07 Thread Phillip Lougher

Michael Tokarev wrote:
> A tiny bug[fix] I always forgot to send...  In fs/squashfs/inode.c,
> constants TASK_UNINTERRUPTIBLE and TASK_INTERRUPTIBLE are used, but
> they sometimes aren't defined (they are declared in linux/sched.h):

Thanks - Squashfs gained a lot of #includes over time, many of which I
deemed unnecessary and removed in Squashfs 3.2.  I obviously removed too
many.  Fix applied to CVS.
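
For readers following along, the fix amounts to restoring the header that
declares those constants; a minimal illustration (the exact CVS change is
not shown here):

/* fs/squashfs/inode.c: TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE live here. */
#include <linux/sched.h>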

Phillip




Re: migratepage failures on reiserfs

2007-11-07 Thread Badari Pulavarty
On Wed, 2007-11-07 at 14:56 +, Mel Gorman wrote:
> On (05/11/07 14:46), Christoph Lameter didst pronounce:
> > On Mon, 5 Nov 2007, Mel Gorman wrote:
> > 
> > > The grow_dev_page() pages should be reclaimable even though migration
> > > is not supported for those pages? They were marked movable as it was
> > > useful for lumpy reclaim taking back pages for hugepage allocations and
> > > the like. Would it make sense for memory unremove to attempt migration
> > > first and reclaim second?
> > 
> > Note that a page is still movable even if there is no file system method 
> > for migration available. In that case the page needs to be cleaned before 
> > it can be moved.
> > 
> 
> Badari, do you know if the pages failed to migrate because they were
> dirty or because the filesystem simply had ownership of the pages and
> wouldn't let them go?

From the debug output, it looks like all the buffers are clean and they
have a b_count == 1, so drop_buffers() fails to release them.
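
For context, paraphrased from memory of fs/buffer.c in kernels of that era
(treat it as a sketch rather than an exact quote): drop_buffers() gives up on
the whole page as soon as any buffer on it is "busy", and an elevated b_count
alone is enough to count as busy, which matches the b_count == 1 observation
above.

#include <linux/buffer_head.h>

/* Paraphrased sketch of the check that makes drop_buffers() bail out. */
static inline int buffer_busy(struct buffer_head *bh)
{
	/* Any outstanding reference, or a dirty or locked buffer,
	 * marks the buffer (and therefore the page) as unreleasable. */
	return atomic_read(&bh->b_count) |
		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
}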

Thanks,
Badari



Re: cramfs in big endian

2007-11-07 Thread Tomas M
> I'm currently trying to enable the cramfs to mount filesystems with a 
> different endianness. 

I would suggest you use squashfs instead of cramfs.

First, it's newer, it's better, it's actively developed, and it doesn't have
any of the limits of the bad old cramfs.
Moreover, it currently supports both endiannesses.

(Hurry up, as kernel people said in the past that squashfs should NEVER EVER
support multiple endiannesses, so the feature will be dropped from squashfs
in order to get it into the mainline kernel more easily; if my information
is correct.)



Tomas M



Re: migratepage failures on reiserfs

2007-11-07 Thread Mel Gorman
On (05/11/07 14:46), Christoph Lameter didst pronounce:
> On Mon, 5 Nov 2007, Mel Gorman wrote:
> 
> > The grow_dev_page() pages should be reclaimable even though migration
> > is not supported for those pages? They were marked movable as it was
> > useful for lumpy reclaim taking back pages for hugepage allocations and
> > the like. Would it make sense for memory unremove to attempt migration
> > first and reclaim second?
> 
> Note that a page is still movable even if there is no file system method 
> for migration available. In that case the page needs to be cleaned before 
> it can be moved.
> 

Badari, do you know if the pages failed to migrate because they were
dirty or because the filesystem simply had ownership of the pages and
wouldn't let them go?

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: Massive slowdown when re-querying large nfs dir

2007-11-07 Thread Al Boldi
Neil Brown wrote:
> On Tuesday November 6, [EMAIL PROTECTED] wrote:
> > > On Tue, 6 Nov 2007 14:28:11 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> > > Al Boldi wrote:
> > > > There is a massive (3-18x) slowdown when re-querying a large nfs dir
> > > > (2k+ entries) using a simple ls -l.
> > > >
> > > > On 2.6.23 client and server running userland rpc.nfs.V2:
> > > > first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
> > > > more tries: time -p ls -l <2k+ entry dir>  in ~8sec
> > > >
> > > > first  try: time -p ls -l <5k+ entry dir>  in ~9sec
> > > > more tries: time -p ls -l <5k+ entry dir>  in ~180sec
> > > >
> > > > On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
> > > > first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
> > > > more tries: time -p ls -l <2k+ entry dir>  in ~7sec
> > > >
> > > > first  try: time -p ls -l <5k+ entry dir>  in ~8sec
> > > > more tries: time -p ls -l <5k+ entry dir>  in ~43sec
> > > >
> > > > Remounting the nfs-dir on the client resets the problem.
> > > >
> > > > Any ideas?
> > >
> > > Ok, I played some more with this, and it turns out that nfsV3 is a lot
> > > faster.  But, this does not explain why the 2.4.31 kernel is still
> > > over 4-times faster than 2.6.23.
> > >
> > > Can anybody explain what's going on?
> >
> > Sure, Neil can! ;)

Thanks Andrew!

> Nuh.
> He said "userland rpc.nfs.Vx".  I only do "kernel-land NFS".  In these
> days of high specialisation, each line of code is owned by a different
> person, and finding the right person is hard
>
> I would suggest getting a 'tcpdump -s0' trace and seeing (with
> wireshark) what is different between the various cases.

Thanks Neil for looking into this.  Your suggestion has already been answered 
in a previous post, where the difference has been attributed to "ls -l" 
inducing lookup for the first try, which is fast, and getattr for later 
tries, which is super-slow.

Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's 
not clear is how come 2.4.31 handles getattr faster than 2.6.23?


Thanks!

--
Al



Re: + embed-a-struct-path-into-struct-nameidata-instead-of-nd-dentrymnt.patch added to -mm tree

2007-11-07 Thread Jan Blunck
Junjiro Okajima,

first of all thanks for the feedback on my union mount patches.

On Tue, Nov 06, [EMAIL PROTECTED] wrote:

> Whiteouts in your code can be a serious memory pressure, since they are
> kept in dcache. I know the inode for whiteouts exists only one and it is
> shared, but dentries for whiteouts are not. They are created for each
> name and resident in dcache.
> I am afraid it can be a problem easily when you create and unlink a
> temporary file many times. Generally their filenames are unique.

The problem that you describe only exists with tmpfs as the topmost union
layer. In all other cases the whiteout dentries can be shrunk just like the
dentries of other file types. This is the price you have to pay for using
union mounts, because this information must be stored somewhere. With ext3 or
other disk-based filesystems the whiteouts are stored on disk like normal
files, so the dentry cache can be shrunk and re-read by a lookup.

> Regarding to struct path in nameidata, I have no objection
> basically. But I think it is better to create macros for backward
> compatibility as struct file did.

In the case of f_dentry and f_mnt that was easy because you could use macros
for them. Still, people tend to be lazy and don't change their code if you
don't force them (or do it for them). Anyway, in nameidata we used dentry and
mnt as the field names, so it isn't possible to use compatibility macros
there, except for something like ND2DENTRY(nd), which is even worse.
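
A short illustration of why the struct file trick doesn't carry over
(assuming the embedded struct path field is called "path", as in the patch
title; everything else here is only for illustration):

/* struct file could keep old code compiling because the old field name was
 * unique, so a plain rename-style macro works:
 *
 *	#define f_dentry	f_path.dentry
 *
 * struct nameidata named its fields just "dentry" and "mnt"; a macro
 *
 *	#define dentry		path.dentry
 *
 * would rewrite every use of the identifier "dentry" in the tree, so the
 * only macro-based option left is an accessor, which is arguably worse: */
#define ND2DENTRY(nd)	((nd)->path.dentry)
#define ND2MNT(nd)	((nd)->path.mnt)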