Re: [NFS] What's slated for inclusion in 2.6.24-rc1 from the NFS client git tree...

2007-10-08 Greg Banks
On Fri, Oct 05, 2007 at 02:00:37PM -0400, Jeff Layton wrote:
> On Fri, 05 Oct 2007 13:30:10 -0400
> [EMAIL PROTECTED] wrote:
> >
> > How does Joe Sysadmin tell if he has an affected legacy app or not?
> > 
> > (The obvious "try it and see what breaks" is a non-starter for many places,
> > because you too easily end up in a loop of "enable it, find 4-5 show
> > stoppers, turn it off, fix them, lather, rinse, repeat".  Been there, done
> > that, got the T-shirt: a project I got dragged into involves a large
> > storage array that appears to insist on exporting 64-bit stuff, and a
> > large farm of clients that are very 64-bit unclean.)
> > 
> 
> In addition to Trond's suggestion, you might be able to use "nm" or
> something like it and see if there are references to non-LFS (f)stat
> calls in your binaries. For instance, if you see references to stat()
> (and not stat64()), then the app is probably not built with 64-bit file
> offsets.
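A minimal sketch of that nm check follows. It assumes glibc-style symbol names: older glibc routes stat(), fstat() and lstat() through __xstat(), __fxstat() and __lxstat(), and builds with _FILE_OFFSET_BITS=64 reference the *64 variants instead. The symbol lists and the helper names undefined_symbols and classify_symbols are illustrative assumptions, not taken from Greg's script. The distinction only matters for 32-bit binaries.

```perl
use strict;
use warnings;

# Illustrative symbol lists (assumption: glibc naming on a 32-bit build).
my %stat32 = map { $_ => 1 } qw(stat fstat lstat __xstat __fxstat __lxstat);
my %stat64 = map { $_ => 1 } qw(stat64 fstat64 lstat64 __xstat64 __fxstat64 __lxstat64);

# Extract undefined symbol names from "nm -D" style output, where
# undefined symbols are printed as "         U name".
sub undefined_symbols
{
    my ($nm_output) = @_;
    my @syms;
    foreach my $line (split /\n/, $nm_output)
    {
        push(@syms, $1) if $line =~ m/^\s*U\s+(\S+)/;
    }
    return @syms;
}

# Return 'none', '32', '64' or 'both' for a list of symbol names.
sub classify_symbols
{
    my (@syms) = @_;
    my ($used32, $used64) = (0, 0);
    foreach my $sym (@syms)
    {
        (my $base = $sym) =~ s/\@.*$//;    # drop versions like "@GLIBC_2.0"
        $used32 = 1 if $stat32{$base};
        $used64 = 1 if $stat64{$base};
    }
    return 'both' if $used32 && $used64;
    return '32'   if $used32;
    return '64'   if $used64;
    return 'none';
}
```

To use it, capture the output of `nm -D` on a binary, pass it through undefined_symbols, then pass the result to classify_symbols. The result is one of the four buckets in the summary that follows.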

Attached is a Perl script I wrote a while back to scan directories
looking for old stat calls in binaries.  Here's the output from
my laptop:

# ./summarise-stat64.pl /usr/bin
775 26.8% are scripts (shell, perl, whatever)
1404 48.5% don't use any stat() family calls at all
428 14.8% use 32-bit stat() family interfaces only
278  9.6% use 64-bit stat64() family interfaces only
 11  0.4% use both 32-bit and 64-bit stat() family interfaces

# ./summarise-stat64.pl /usr/sbin
164 35.7% are scripts (shell, perl, whatever)
170 37.0% don't use any stat() family calls at all
 78 17.0% use 32-bit stat() family interfaces only
 46 10.0% use 64-bit stat64() family interfaces only
  1  0.2% use both 32-bit and 64-bit stat() family interfaces
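Each summary line pairs a count with its share of the total. The attachment is truncated before the code that prints these lines, so the following is only a guess at the formatting. It pads the count to the width of the total and prints the percentage to one decimal place. Given the /usr/bin counts, it produces 26.8, 48.5, 14.8, 9.6 and 0.4 percent, which match the figures above. The helper name summary_lines is hypothetical.

```perl
use strict;
use warnings;

# summary_lines: hypothetical helper. Takes [count, label] pairs and returns
# "count pct% label" lines, padding counts to the width of the total.
sub summary_lines
{
    my (@rows) = @_;
    my $total = 0;
    $total += $_->[0] foreach @rows;
    return () if $total == 0;
    my $width = length($total);
    return map {
        sprintf("%*d %4.1f%% %s", $width, $_->[0], 100.0 * $_->[0] / $total, $_->[1])
    } @rows;
}

# Usage: the /usr/sbin counts from above.
print "$_\n" foreach summary_lines(
    [ 164, 'are scripts (shell, perl, whatever)' ],
    [ 170, "don't use any stat() family calls at all" ],
    [  78, 'use 32-bit stat() family interfaces only' ],
    [  46, 'use 64-bit stat64() family interfaces only' ],
    [   1, 'use both 32-bit and 64-bit stat() family interfaces' ],
);
```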

# ./summarise-stat64.pl -v /usr/bin
...
/usr/bin/vi use 32-bit stat() family interfaces only
/usr/bin/view use 32-bit stat() family interfaces only
/usr/bin/vim use 32-bit stat() family interfaces only
...
/usr/bin/Mail use 32-bit stat() family interfaces only
/usr/bin/mail use 32-bit stat() family interfaces only
/usr/bin/mailx use 32-bit stat() family interfaces only
...
/usr/bin/gdb use 32-bit stat() family interfaces only
/usr/bin/gdbtui use 32-bit stat() family interfaces only
/usr/bin/rpcgen use 32-bit stat() family interfaces only
...
/usr/bin/cc use 32-bit stat() family interfaces only
/usr/bin/gcc use 32-bit stat() family interfaces only
/usr/bin/gcov use 32-bit stat() family interfaces only
/usr/bin/unprotoize use 32-bit stat() family interfaces only
...
/usr/bin/git use 32-bit stat() family interfaces only
/usr/bin/git-check-ref-format use 32-bit stat() family interfaces only
/usr/bin/git-cat-file use 32-bit stat() family interfaces only
/usr/bin/git-checkout-index use 32-bit stat() family interfaces only
/usr/bin/git-clone-pack use 32-bit stat() family interfaces only
/usr/bin/git-commit-tree use 32-bit stat() family interfaces only
/usr/bin/git-convert-objects use 32-bit stat() family interfaces only
/usr/bin/git-daemon use 32-bit stat() family interfaces only
/usr/bin/git-describe use 32-bit stat() family interfaces only
...
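The attached script is cut off in this archive before its directory walk. The sketch below illustrates the walk-and-classify step only; it is not Greg's code. It walks the named files and directories with the core File::Find module. Each executable regular file is put into one of three buckets: a script (first bytes "#!"), an ELF binary, or other. Only ELF binaries would then need the nm check. The category names and helpers are hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical categories: 'script' (starts with "#!"), 'elf' (ELF magic),
# 'other', or 'no_perm' when the file cannot be opened.
sub file_kind
{
    my ($path) = @_;
    open(my $fh, '<', $path) or return 'no_perm';
    binmode($fh);
    my $magic = '';
    read($fh, $magic, 4);
    close($fh);
    return 'script' if substr($magic, 0, 2) eq '#!';
    return 'elf'    if $magic eq "\x7fELF";
    return 'other';
}

# Walk each file or directory in @roots and count executable regular
# files by kind.
sub count_kinds
{
    my (@roots) = @_;
    my %count;
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $File::Find::name && -x _;
            $count{ file_kind($File::Find::name) }++;
        },
    }, @roots);
    return %count;
}
```

For example, `my %c = count_kinds('/usr/bin');` followed by `print "$c{script}\n";` gives a count comparable to the "are scripts" line.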

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.
#!/usr/bin/perl
#
# A Perl script for evaluating and summarising which executables in
# the given directories depend on the old 32-bit stat() family APIs.
#
# Usage: summarise-stat64.pl [--verbose] file_or_directory [...]
#
# Copyright (c) 2007 Silicon Graphics, Inc.  All Rights Reserved.
# By Greg Banks <[EMAIL PROTECTED]>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#

use strict;
use warnings;

my @pathnames;  # files and directories to read, from the command line
my @results;    # array of { path, used32, used64, not_exe, no_perm } hashes
my $verbose = 0;

sub usage
{
    print STDERR "Usage: summarise-stat64 [--verbose] file_or_directory...\n";
    exit 1;
}

# Parse arguments
foreach my $arg (@ARGV)
{
    if ($arg eq '--verbose' || $arg eq '-v')
    {
        $verbose++;
    }
    elsif ($arg =~ m/^-/)
    {
        usage;
    }
    else
    {
        push(@pathnames, $arg);
    }
}
usage unless scalar(@pathnames);

# Function to scan a file
sub scan_file
{
    my ($path) = @_;
    my $fh;

    my %res =
    (
        path => $path,
        used32 => 0,
        used64 => 0

Re: [NFS] [PATCH 2/7] NFS: if ATTR_KILL_S*ID bits are set, then skip mode change

2007-09-14 Greg Banks
On Fri, Sep 14, 2007 at 10:58:38AM -0400, Jeff Layton wrote:
> On Sat, 15 Sep 2007 00:40:33 +1000
> Greg Banks <[EMAIL PROTECTED]> wrote:
> 
> 
> > Ok, you convinced me.
> 
> Right. When I was first looking at this, I considered some similar
> approaches, but hit roadblocks with all of them. The only real option
> seems to be to leave this to the server, but that does assume that the
> server handles this properly.
> 
> Servers that don't are broken, IMO.

According to what spec?  A quick trip around the machine room shows
that neither Solaris 10 nor Darwin 7.9.0 clobbers setuid on write
either.

> If Irix isn't clearing these bits
> on a write then it might be good to see if they can fix that...

I think first you'd have to mount a serious argument that it's broken,
more serious than "it works differently from Linux".

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [PATCH 2/7] NFS: if ATTR_KILL_S*ID bits are set, then skip mode change

2007-09-14 Greg Banks
On Fri, Sep 14, 2007 at 09:38:46AM -0400, Jeff Layton wrote:
> On Fri, 14 Sep 2007 23:09:24 +1000
> Greg Banks <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, Sep 14, 2007 at 07:02:58AM -0400, Jeff Layton wrote:
> > > On Fri, 14 Sep 2007 20:25:45 +1000
> > > Greg Banks <[EMAIL PROTECTED]> wrote:
> > > 
> > > > I'm curious about the reasons behind this change.  You mention
> > > > credential issues; how exactly is it that you have the correct creds
> > > > to perform a WRITE rpc but not a SETATTR rpc?
> > > > 
> > > 
> > > Consider this case. user1 and user2 are both members of group
> > > "allusers":
> > > 
> > > user1$ echo foo > foo
> > > user1$ chgrp allusers foo
> > > user1$ chmod 04770 foo
> > > user2$ echo bar >> foo
> > > 
> > > On most local filesystems, this would work correctly. The end result
> > > would be a file with mode 0770 and the expected contents. On NFS
> > > though, the write by user2 fails. When the write is attempted, the
> > > kernel tries to squash the setuid bit using the credentials of user2,
> > > who's not allowed to change the mode. The write then fails because the
> > > setattr fails.
> > 
> > Ok, I ran an experiment and I see this failure mode.
> > 
> > So the SETATTR rpc is really a side effect of the client kernel's
> > behaviour and not an operation directly requested by the user process
> > on the client.  Is there any reason why that rpc needs to have user2's
> > creds?  Why not do the rpc with a fake set of creds with uid and gid
> > set to the uid and gid of the file, in this case user1/allusers?
> > That way the rpc will most likely pass the server's permission check.
> > 
> 
> That might work in some cases, but there are many where it wouldn't...
> 
> Suppose user1 here is root and all of the user1 operations are being
> done on the server. If the server has root squashing enabled, then
> user2's operation would still fail.

In that case, user1's operations would also fail, which is even more
serious a problem.  Also arguably you actually *want* writes by a
nonroot user to a setuid root executable to fail ;-)

> Another problem:
> 
> Suppose we're using gssapi. There's no guarantee that the client will
> have the proper credentials to fake up a call as user1 (you might need
> user1 krb5 tickets, etc).

Yes, good point.  You could use the root creds, except for root squashing.
Ok, you convinced me.
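Jeff's objection can be made concrete with a toy model of the server-side check. This is not nfsd's code, and it ignores groups, ACLs and Kerberos. The uids 1001 (user1) and 1002 (user2) are hypothetical, and 65534 is assumed as the squashed uid, the usual default anonuid. Faking the file owner's creds passes for user1's file. If the owner is root and the export squashes root, however, the faked call is denied, just as user2's own call is.

```perl
use strict;
use warnings;

# Assumed anonymous uid for squashed root (65534 is the usual default).
use constant ANON_UID => 65534;

# Toy model: the uid the server checks after optional root squashing.
sub server_uid
{
    my ($uid, $root_squash) = @_;
    return ($root_squash && $uid == 0) ? ANON_UID : $uid;
}

# Toy model of the server's permission check for a mode-changing SETATTR:
# only the owner, or root that has not been squashed, may change the mode.
sub may_change_mode
{
    my (%args) = @_;
    my $uid = server_uid($args{cred_uid}, $args{root_squash});
    return ($uid == 0 || $uid == $args{file_uid}) ? 1 : 0;
}
```

Here `may_change_mode(cred_uid => 1001, file_uid => 1001, root_squash => 1)` returns 1. The same call with `cred_uid => 0, file_uid => 0` returns 0.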

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.


Re: [NFS] [PATCH 2/7] NFS: if ATTR_KILL_S*ID bits are set, then skip mode change

2007-09-14 Greg Banks
On Fri, Sep 14, 2007 at 07:02:58AM -0400, Jeff Layton wrote:
> On Fri, 14 Sep 2007 20:25:45 +1000
> Greg Banks <[EMAIL PROTECTED]> wrote:
> 
> > I'm curious about the reasons behind this change.  You mention
> > credential issues; how exactly is it that you have the correct creds
> > to perform a WRITE rpc but not a SETATTR rpc?
> > 
> 
> Consider this case. user1 and user2 are both members of group
> "allusers":
> 
> user1$ echo foo > foo
> user1$ chgrp allusers foo
> user1$ chmod 04770 foo
> user2$ echo bar >> foo
> 
> On most local filesystems, this would work correctly. The end result
> would be a file with mode 0770 and the expected contents. On NFS
> though, the write by user2 fails. When the write is attempted, the
> kernel tries to squash the setuid bit using the credentials of user2,
> who's not allowed to change the mode. The write then fails because the
> setattr fails.
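Jeff expects the file to end up as 0770 after starting as 04770. That follows from the rule local Linux filesystems apply when a process without CAP_FSETID writes to a file. Setuid is always dropped. Setgid is dropped only when group-execute is also set, because setgid without group-execute marks mandatory locking. Below is a small sketch of that rule. The constant names are illustrative, but they carry the same octal values as S_ISUID, S_ISGID and S_IXGRP.

```perl
use strict;
use warnings;

# Same octal values as S_ISUID, S_ISGID and S_IXGRP in <sys/stat.h>.
use constant { SETUID => 04000, SETGID => 02000, GROUP_EXEC => 00010 };

# Mode after a write by a process without CAP_FSETID: setuid is always
# cleared; setgid is cleared only if group-execute is set (setgid without
# group-execute is a mandatory-locking marker and is left alone).
sub mode_after_unprivileged_write
{
    my ($mode) = @_;
    my $kill = 0;
    $kill |= SETUID if $mode & SETUID;
    $kill |= SETGID if ($mode & SETGID) && ($mode & GROUP_EXEC);
    return $mode & ~$kill;
}

printf("%04o -> %04o\n", 04770, mode_after_unprivileged_write(04770));    # prints "4770 -> 0770"
```

On NFS, the client must send this mode change to the server as a separate SETATTR. That SETATTR is the operation that fails with user2's credentials.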

Ok, I ran an experiment and I see this failure mode.

So the SETATTR rpc is really a side effect of the client kernel's
behaviour and not an operation directly requested by the user process
on the client.  Is there any reason why that rpc needs to have user2's
creds?  Why not do the rpc with a fake set of creds with uid and gid
set to the uid and gid of the file, in this case user1/allusers?
That way the rpc will most likely pass the server's permission check.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.


Re: [NFS] [PATCH 2/7] NFS: if ATTR_KILL_S*ID bits are set, then skip mode change

2007-09-14 Greg Banks
On Tue, Sep 04, 2007 at 10:37:04AM -0400, Jeff Layton wrote:
> If the ATTR_KILL_S*ID bits are set then any mode change is only for
> clearing the setuid/setgid bits. For NFS skip the mode change and
> let the server handle it.

You're assuming the server will remove setuid and setgid bits on WRITE?
I don't see that behaviour specified in the RFC, at least for v3 (RFC 1813).
The RFC specifies a behaviour for the mtime attribute as a side
effect of WRITE, but says nothing about mode.  This means server
implementations are free to clobber setuid or not.  A quick experiment
shows that at least the Irix server will *NOT* clobber those bits.
So with an Irix server you've now lost this Linux-specific "security
feature".

I'm curious about the reasons behind this change.  You mention
credential issues; how exactly is it that you have the correct creds
to perform a WRITE rpc but not a SETATTR rpc?

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.


Re: Kernel 2.6.xx - NFSv3 vs. Samba Data Transfer Semantics

2005-08-06 Greg Banks
On Sat, Aug 06, 2005 at 10:34:55AM -0400, Justin Piszcz wrote:
> UDP/NFSv3:

Don't use UDP.  It won't help you with this problem, but use TCP.

> UDP/Samba, Win2K->Linux box:
  ^^^
That would be a surprise.

>   When NFS transfers are taking 
> place, watching gkrellm, I see 64MB/s for a few seconds then it goes to 0 
> as the disk (hda) continues to write for 3-4 seconds, this continues on 
> and off.  

It's instructive to watch the server's disk traffic on a graph with the
same timescale as the network traffic.
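One way to put both on the same timescale is to sample the kernel counters once a second. The sketch below assumes the 2.6 /proc formats. In /proc/diskstats, the tenth field of a whole-disk line (such as hda) is sectors written, in 512-byte units. In /proc/net/dev, the first counter after "iface:" is bytes received. On older 2.6 kernels, partition lines in /proc/diskstats use a shorter format, so the sketch only handles whole disks. The interface name eth0 and the helper names are hypothetical.

```perl
use strict;
use warnings;

# Read a whole file (e.g. under /proc) into a string.
sub slurp
{
    my ($path) = @_;
    open(my $fh, '<', $path) or die "$path: $!\n";
    local $/;
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Sectors written for a whole-disk line of /proc/diskstats. Fields are:
# major minor name reads rd_merged rd_sectors rd_ms writes wr_merged wr_sectors ...
sub sectors_written
{
    my ($text, $dev) = @_;
    foreach my $line (split /\n/, $text)
    {
        my @f = split ' ', $line;
        return $f[9] if @f >= 10 && $f[2] eq $dev;
    }
    return undef;
}

# Bytes received by an interface, from /proc/net/dev ("eth0: bytes ...").
sub rx_bytes
{
    my ($text, $iface) = @_;
    foreach my $line (split /\n/, $text)
    {
        return $1 if $line =~ m/^\s*\Q$iface\E:\s*(\d+)/;
    }
    return undef;
}

# Print network-in and disk-write rates once a second, side by side.
sub watch
{
    my ($dev, $iface, $count) = @_;
    my $s0 = sectors_written(slurp('/proc/diskstats'), $dev);
    my $r0 = rx_bytes(slurp('/proc/net/dev'), $iface);
    foreach (1 .. $count)
    {
        sleep(1);
        my $s1 = sectors_written(slurp('/proc/diskstats'), $dev);
        my $r1 = rx_bytes(slurp('/proc/net/dev'), $iface);
        printf("net in %7.1f MB/s   disk write %7.1f MB/s\n",
               ($r1 - $r0) / 1e6, ($s1 - $s0) * 512 / 1e6);
        ($s0, $r0) = ($s1, $r1);
    }
}
```

Running `watch('hda', 'eth0', 30)` on the server prints thirty samples. Network bursts that alternate with disk-write bursts would match the on/off pattern described above.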

> I am using XFS filesystems on both Linux machines.  The drives are 7200RPM 
> Seagate HDDs with either 2MB or 8MB of cache.

With a single drive, your transfer rate is going to be disk limited 
to probably 40-50 MB/s anyway.

> Are there any 'tweaks' or 'hacks' to make NFS behave more like Samba or 

The 'async' export option.  RTFM before you use it.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


Re: bdflush/rpciod high CPU utilization, profile does not make sense

2005-04-11 Greg Banks
On Tue, 2005-04-12 at 01:42, Jakob Oestergaard wrote:
> Yes, as far as I know - the Broadcom Tigon3 driver does not have the
> option of enabling/disabling RX polling (if we agree that is what we're
> talking about), but looking in tg3.c it seems that it *always*
> unconditionally uses NAPI...

I've whined and moaned about this in the past, but for all its
faults NAPI on tg3 doesn't lose packets.  It does cause a huge
increase in irq cpu time on multiple fast CPUs.  What irq rate
are you seeing?
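One rough way to answer the irq rate question is to sample /proc/interrupts twice and take the difference of the counts for the NIC's IRQ line. The sketch below assumes the usual layout: a header of CPUn columns, then lines of the form "N:" followed by one count per CPU. The IRQ number used in the usage example is hypothetical; look up the tg3 line on the actual machine.

```perl
use strict;
use warnings;

# Total interrupts for one IRQ line of /proc/interrupts, summed over CPUs.
# The header line ("CPU0 CPU1 ...") tells us how many count columns follow.
sub irq_total
{
    my ($text, $irq) = @_;
    my @lines = split /\n/, $text;
    my $header = shift(@lines);
    return undef unless defined $header;
    my $ncpu = () = $header =~ m/CPU\d+/g;
    foreach my $line (@lines)
    {
        my @f = split ' ', $line;
        next unless @f > $ncpu && $f[0] eq "$irq:";
        my $sum = 0;
        $sum += $f[$_] foreach 1 .. $ncpu;
        return $sum;
    }
    return undef;
}

# Interrupts per second between two samples taken $seconds apart.
sub irq_rate
{
    my ($before, $after, $seconds) = @_;
    return ($after - $before) / $seconds;
}
```

For example, read /proc/interrupts, sleep 10 seconds, and read it again. Then `irq_rate(irq_total($t0, 24), irq_total($t1, 24), 10)` gives the rate for a NIC on IRQ 24.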

I did once post a patch to make NAPI for tg3 selectable at
configure time.
http://marc.theaimsgroup.com/?l=linux-netdev&m=107183822710263&w=2

> No dropped packets... I wonder if the tg3 driver is being completely
> honest about this...

At one point it wasn't, since this patch it is:
http://marc.theaimsgroup.com/?l=linux-netdev&m=108433829603319&w=2

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.




Re: bdflush/rpciod high CPU utilization, profile does not make sense

2005-04-07 Greg Banks
On Thu, Apr 07, 2005 at 05:38:48PM +0200, Jakob Oestergaard wrote:
> On Thu, Apr 07, 2005 at 09:19:06AM +1000, Greg Banks wrote:
> ...
> > How large is the client's RAM? 
> 
> 2GB - (32 bit kernel because it's dual PIII, so I use highmem)

Ok, that's probably not enough to fully trigger some of the problems
I've seen on large-memory NFS clients.

> A few more details:
> 
> With standard VM settings, the client will be laggy during the copy, but
> it will also have a load average around 10 (!)   And really, the only
> thing I do with it is one single 'cp' operation.  The CPU hogs are
> pdflush, rpciod/0 and rpciod/1.

NFS writes of single files much larger than client RAM still have
interesting issues.

> I tweaked the VM a bit, put the following in /etc/sysctl.conf:
>  vm.dirty_writeback_centisecs=100
>  vm.dirty_expire_centisecs=200
> 
> The defaults are 500 and 3000 respectively...

Yes, you want more frequent and smaller writebacks.  It may help to
reduce vm.dirty_ratio and possibly vm.dirty_background_ratio.
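Jakob's settings are ordinary sysctl keys. Each one maps to a file under /proc/sys, with the dots turned into slashes. The *_centisecs values are in hundredths of a second. So 100 means pdflush wakes every 1 s, and 200 means dirty data counts as old after 2 s, against defaults of 5 s and 30 s. The sketch below only makes that mapping explicit; `sysctl -p` normally applies /etc/sysctl.conf for you. The helper names are hypothetical, and apply_settings needs root and is not called here.

```perl
use strict;
use warnings;

# Parse sysctl.conf-style "key = value" lines into [proc_path, value] pairs.
# A sysctl key maps to a file under /proc/sys with dots turned into slashes.
sub parse_sysctl_conf
{
    my ($text) = @_;
    my @settings;
    foreach my $line (split /\n/, $text)
    {
        $line =~ s/[#;].*$//;    # strip comments
        next unless $line =~ m/^\s*([\w.]+)\s*=\s*(\S+)\s*$/;
        my ($key, $value) = ($1, $2);
        (my $path = $key) =~ tr{.}{/};
        push(@settings, [ "/proc/sys/$path", $value ]);
    }
    return @settings;
}

# Write each value to its /proc/sys file (needs root); not called here.
sub apply_settings
{
    foreach my $s (@_)
    {
        my ($path, $value) = @$s;
        open(my $fh, '>', $path) or die "$path: $!\n";
        print $fh "$value\n";
        close($fh) or die "$path: $!\n";
    }
}

# *_centisecs values are hundredths of a second.
sub centisecs_to_seconds { return $_[0] / 100; }
```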

> This improved things a lot; the client is now "almost not very laggy",
> and load stays in the saner 1-2 range.
> 
> Still, system CPU utilization is very high (still from rpciod and
> pdflush - more rpciod and less pdflush though),

This is probably the rpciod threads and pdflush all trying to do things
at the same time and contending for the BKL.

> During the copy I typically see:
> 
> nfs_write_data  681   952 480  8 1 : tunables  54 27 8 : slabdata 119 119 108
> nfs_page  15639 18300  64 61 1 : tunables 120 60 8 : slabdata 300 300 180

That's not so bad, it's only about 3% of the system's pages.
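The 3% figure can be checked from the nfs_page line. Its leading fields are name, active_objs, num_objs, objsize, objperslab and pagesperslab, so there are 15639 active requests. Assume each request refers to a single 4 KiB page on this 2 GB i386 client. That gives 15639 x 4096 bytes, about 61 MiB, or roughly 3% of RAM. Here is a sketch of that arithmetic; the helper names are hypothetical.

```perl
use strict;
use warnings;

# Split a /proc/slabinfo (2.x format) line into its leading fields:
# name active_objs num_objs objsize objperslab pagesperslab ...
sub slab_line
{
    my ($line) = @_;
    my @f = split ' ', $line;
    my %slab;
    @slab{qw(name active_objs num_objs objsize objperslab pagesperslab)} = @f[0 .. 5];
    return \%slab;
}

# Percentage of RAM covered if every active object refers to one page
# (assumption: one nfs_page request per page of file data).
sub pinned_percent
{
    my ($slab, $ram_bytes, $page_size) = @_;
    return 100.0 * $slab->{active_objs} * $page_size / $ram_bytes;
}

my $nfs_page = slab_line('nfs_page  15639 18300  64 61 1 : tunables 120 60 8 : slabdata 300 300 180');
printf("%.1f%%\n", pinned_percent($nfs_page, 2 * 1024**3, 4096));    # prints "3.0%"
```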

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


Re: bdflush/rpciod high CPU utilization, profile does not make sense

2005-04-06 Thread Greg Banks
On Wed, Apr 06, 2005 at 06:01:23PM +0200, Jakob Oestergaard wrote:
> 
> Problem; during simple tests such as a 'cp largefile0 largefile1' on the
> client (under the mountpoint from the NFS server), the client becomes
> extremely laggy, NFS writes are slow, and I see very high CPU
> utilization by bdflush and rpciod.
> 
> For example, writing a single 8G file with dd will give me about
> 20MB/sec (I get 60+ MB/sec locally on the server), and the client rarely
> drops below 40% system CPU utilization.

How large is the client's RAM?  What does the following command report
before and during the write?

egrep 'nfs_page|nfs_write_data' /proc/slabinfo

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


Re: [NFS] [PATCH] SGI 926917: make knfsd interact cleanly with HSMs

2005-03-30 Thread Greg Banks
On Thu, 2005-03-31 at 11:58, Neil Brown wrote:
> On Thursday March 31, [EMAIL PROTECTED] wrote:
> > On Tue, 2005-03-15 at 18:49, Greg Banks wrote:
> > > This patch seeks to remedy the interaction between knfsd and HSMs by
> > > providing mechanisms to allow knfsd to tell an underlying filesystem
> > > (which supports HSMs) not to block for reads, writes and truncates
> > > of offline files.  It's a port of a Linux 2.4 patch used in SGI's
> > > ProPack distro for the last 12 months.  The patch:
> > 
> > Any news on this patch?  Is it good, bad, ugly, or what?
> [...]
> Yes, it looks reasonably sane.
> 
> I'm not very comfortable about the
> 
> + if (rqstp->rq_vers == 3)
> 
> usage.  Shouldn't it be 
> + if (rqstp->rq_vers >= 3)
> as presumably NFSv4 would like NFSERR_JUKEBOX returns too.

I guess so, but I haven't tested it with v4.  I'll update the patch.

> Also, it assumes an extension to the semantics of IFREG files such
> that O_NONBLOCK has a meaning... 

Yes.

> What exactly is that meaning?
> "Returned -EAGAIN if the request will take a long time for some vague
> definition of long" ...

This is one of the issues I'd appreciate some real feedback on, so
I've cc'ed lkml and fsdevel.

The specific and practical answer is "Return -EAGAIN if DMAPI decides
it needs to queue an event", but that only applies to XFS (and JFS
in SLES) so it's not really a generic definition.

From knfsd's point of view, the desired definition is "Return -EAGAIN
if the operation is likely to take longer than a client RPC timeout".
Of course, the server doesn't know what that number is, although 1.1 sec
is a pretty good guess.

Perhaps the best definition is "Return -EAGAIN if the operation needs
to block on something other than a disk IO".  This covers what actually
happens in the guts of XFS, what needs generically to happen for HSMs,
and suits the needs of knfsd.

> Is this new semantic in any way 'standard' or accepted by the
> filesystem gurus (e.g. Al Viro)??

It's not currently standard; my hope is to extend the standard.
I've cc'ed Al Viro in the hope of some feedback.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.




[PATCH 2.4] SGI 932676 link_path_walk refcount problem allows umount of active filesystem

2005-03-21 Thread Greg Banks
G'day,

The attached patch fixes a bug in the VFS code which causes
"Busy inodes after unmount" and a subsequent oops.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

Following an absolute symlink opens a window during which the
filesystem containing the symlink has an outstanding dentry count
and no outstanding vfsmount count.  A umount() of the filesystem can
(incorrectly) proceed, resulting in the "Busy inodes after unmount"
message and an oops shortly thereafter.

Systems using autofs-controlled NFS mounts are especially vulnerable,
as autofs both increases the number of unmounts happening and does NFS
mounting in response to lookups which can result in multiple-second
vulnerability windows.  However the bug could happen on any filesystem.

This patch adds a mntget()/mntput() pair around the link following code
(as the 2.6 code does).  Attempts to umount() during link following
now return EBUSY.


Signed-off-by: Greg Banks <[EMAIL PROTECTED]>
---
 linux/linux/fs/namei.c |7 +++
 1 files changed, 7 insertions(+)

--- a/linux/linux/fs/namei.c	2005-03-21 12:53:48 +11:00
+++ b/linux/linux/fs/namei.c	2005-03-21 12:16:46 +11:00
@@ -541,8 +541,10 @@
 			goto out_dput;
 
 		if (inode->i_op->follow_link) {
+			struct vfsmount *mnt = mntget(nd->mnt);
 			err = do_follow_link(dentry, nd);
 			dput(dentry);
+			mntput(mnt);
 			if (err)
 				goto return_err;
 			err = -ENOENT;
@@ -596,8 +598,10 @@
 		inode = dentry->d_inode;
 		if ((lookup_flags & LOOKUP_FOLLOW)
 		&& inode && inode->i_op && inode->i_op->follow_link) {
+			struct vfsmount *mnt = mntget(nd->mnt);
 			err = do_follow_link(dentry, nd);
 			dput(dentry);
+			mntput(mnt);
 			if (err)
 				goto return_err;
 			inode = nd->dentry->d_inode;
@@ -1002,6 +1006,7 @@
 	int acc_mode, error = 0;
 	struct inode *inode;
 	struct dentry *dentry;
+	struct vfsmount *mnt;
 	struct dentry *dir;
 	int count = 0;
 
@@ -1185,8 +1190,10 @@
 	 * are done. Procfs-like symlinks just set LAST_BIND.
 	 */
 	UPDATE_ATIME(dentry->d_inode);
+	mnt = mntget(nd->mnt);
 	error = dentry->d_inode->i_op->follow_link(dentry, nd);
 	dput(dentry);
+	mntput(mnt);
 	if (error)
 		return error;
 	if (nd->last_type == LAST_BIND) {


[PATCH] SGI 926917: make knfsd interact cleanly with HSMs

2005-03-14 Thread Greg Banks
G'day,

The NFSv3 protocol specifies an error, NFS3ERR_JUKEBOX, which a server
should return when an I/O operation will take a very long time.
This causes a different pattern of retries in clients, and avoids
a number of serious problems associated with I/Os which take longer
than an RPC timeout.  The Linux knfsd server has code to generate the
jukebox error and many NFS clients are known to have working code to
handle it.

One scenario in which a server should emit the JUKEBOX error is when
the file data which the client is attempting to access is managed by
an HSM (Hierarchical Storage Manager), is not present on the disk,
and needs to be brought in from tape.  Due to the nature of tapes, this
operation can take minutes rather than the milliseconds normally seen
for local file data.

Currently the Linux knfsd handles this situation poorly.  A READ NFS
call will cause the nfsd thread handling it to block until the file
is available, without sending a reply to the NFS client.  After a
few seconds the client retries, and this second READ call causes
another nfsd to block behind the first one.  A few seconds later the
client's retries have blocked *all* the nfsd threads, and all NFS
service from the server stops until the original file arrives on disk.

WRITEs and SETATTRs which truncate the file are marginally better, in
that the knfsd dupcache will catch the retries and drop them without
blocking an nfsd (the dupcache *will* catch the retries because the
cache entry remains in RC_INPROG state and is not reused until the
first call finishes).  However the first call still blocks, so given
WRITEs to enough offline files the server can still be locked up.

There are also client-side implications, depending on the client
implementation.  For example, on a Linux client an RPC retry loop uses
an RPC request slot, so reads from enough separate offline files can
lock up a mountpoint.

This patch seeks to remedy the interaction between knfsd and HSMs by
providing mechanisms to allow knfsd to tell an underlying filesystem
(which supports HSMs) not to block for reads, writes and truncates
of offline files.  It's a port of a Linux 2.4 patch used in SGI's
ProPack distro for the last 12 months.  The patch:

*  provides a new ATTR_NO_BLOCK flag which the kernel can
   use to tell a filesystem's inode_ops->setattr() operation not
   to block when truncating an offline file.  XFS already obeys
   this flag (inside a #ifdef)
   
*  changes knfsd to provide ATTR_NO_BLOCK when it does the VFS
   calls to implement the SETATTR NFS call.

*  changes knfsd to supply the O_NONBLOCK flag in the temporary
   struct file it uses for VFS reads and writes, in order to ask
   the filesystem not to block when reading or writing an offline
   file.  XFS already obeys this new semantic for O_NONBLOCK
   (and in SLES9 so does JFS).

*  adds code to translate the -EAGAIN that the filesystem returns when
   it would have blocked into the -ETIMEDOUT that knfsd expects.


Signed-off-by: Greg Banks <[EMAIL PROTECTED]>
---
 fs/nfsd/vfs.c  |   33 +++--
 include/linux/fs.h |1 +
 2 files changed, 32 insertions(+), 2 deletions(-)


Index: linux/fs/nfsd/vfs.c
===
--- linux.orig/fs/nfsd/vfs.c	2005-03-07 13:13:57.0 +1100
+++ linux/fs/nfsd/vfs.c	2005-03-07 14:01:52.0 +1100
@@ -311,6 +311,16 @@ nfsd_setattr(struct svc_rqst *rqstp, str
 			goto out_nfserr;
 		}
 		DQUOT_INIT(inode);
+
+
+		/*
+		 * Tell a Hierarchical Storage Manager (e.g. via DMAPI) to
+		 * return EAGAIN when an action would take minutes instead of
+		 * milliseconds so that NFS can reply to the client with
+		 * NFSERR_JUKEBOX instead of blocking an nfsd thread.
+		 */
+		if (rqstp->rq_vers == 3)
+			iap->ia_valid |= ATTR_NO_BLOCK;
 	}
 
 	imode = inode->i_mode;
@@ -333,6 +343,9 @@ nfsd_setattr(struct svc_rqst *rqstp, str
 	if (!check_guard || guardtime == inode->i_ctime.tv_sec) {
 		fh_lock(fhp);
 		err = notify_change(dentry, iap);
+		/* to get NFSERR_JUKEBOX on the wire, need -ETIMEDOUT */
+		if (err == -EAGAIN)
+			err = -ETIMEDOUT;
 		err = nfserrno(err);
 		fh_unlock(fhp);
 	}
@@ -671,6 +684,10 @@ nfsd_read(struct svc_rqst *rqstp, struct
 	if (ra)
 		file.f_ra = ra->p_ra;
 
+	/* Support HSMs -- see comment in nfsd_setattr() */
+	if (rqstp->rq_vers == 3)
+		file.f_flags |= O_NONBLOCK;
+
 	if (file.f_op->sendfile) {
 		svc_pushback_unused_pages(rqstp);
 		err = file.f_op->sendfile(&file, &offset, *count,
@@ -694,8 +711,12 @@ nfsd_read(struct svc_rqst *rqstp, struct
*cou

Re: [RFC: 2.6 patch] unexport get_wchan

2005-01-31 Thread Greg Banks
On Mon, Jan 31, 2005 at 02:36:17PM +0100, Adrian Bunk wrote:
> The only user of get_wchan I was able to find is the proc fs - and proc 
> can't be built modular.
> 
> Is the patch below to remove the export of get_wchan correct or did I 
> oversee something?

I have an oprofile patch queued up which uses get_wchan.  Oprofile
can be built modular.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


Re: [kbuild-devel] Configure.help entries wanted

2001-05-27 Thread Greg Banks

Jaswinder Singh wrote:
> 
> What is the companion chip in DMIDA ?

  HD64465.

> IrDA and USB are working properly in linux ?

  No.  IrDA seems easy, just haven't got around to it.
USB is a major pain on the HD64465 because of the way it
deals with "host" memory.  I had a driver which initialised
the root hub at one point but haven't had time to push
it any further.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-27 Thread Greg Banks

Jaswinder Singh wrote:
> 
> 
> i even face problem in xscribble too , i think it donot likes my handwriting
> ;)

  Or anyone else's.

> Are you having sources of Calligrapher ?

  No.

> If no , i know that you can write better version then Calligrapher in Linux
> :)

  This would seem a perfect opportunity for sarcasm.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

Alan Cox wrote:
> 
> > Visual Studio, or the feature where you can have a decent handwriting
> > recognition system, or the feature where you can run Pocket {Internet
> > Explorer,Word} then the answer is none of them.
> 
> Handwriting recognition with fscrib works very well indeed.

  Ok, I've found the description of this on handhelds.org, and
it appears to be a derivative of xscribble, which I have tried.
Unlike xscribble it does fullscreen mode, which is good, but
it's still single-character and requires the user to learn
how to write all over again.  In other words, like everything
else available on Linux (and even MS's Jot) it's *crap* compared to

http://www.paragraph.com/products/internetink/calligrapher/features.html

  I would give my eye teeth for a Linux version of Calligrapher.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

Jaswinder Singh wrote:
> 
> "Greg Banks" <[EMAIL PROTECTED]> wrote:
> >
> >   I have some code which could become the basis for such a thing.
> > It's a touch panel driver for the DMIDA but it also has a device-
> > independent layer which does supersampling, scaling, provides
> > raw and cooked Linux Input interfaces, and a /proc interface to
> > allow the calibration app to control the scaling.
> >
> >   Unfortunately I can't release it yet for (ahem) legal reasons.
> >
> 
> nice job , from where you get the related specs ?

  From the manufacturer.  We had the fullest possible co-operation
including all technical specs, several sample machines, source to
the WinCE drivers for the hardware, and engineers (including the lead
hardware designer, a *very* cluey gentleman) on standby to answer
questions.

  It makes an amazing difference to have all the nasty little
hardware quirks and known workarounds for them laid bare before you.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

[EMAIL PROTECTED] wrote:
> 
> Greg Banks <[EMAIL PROTECTED]>:
> >   Having said that, I agree that the help text entries for the SH
> > port are in general of less than stellar quality, for various
> > (mostly good) reasons.  I'm hoping ESR will give us some editorial
> > feedback which will provide a good excuse to fix them.
> 
> Since you asked...
> 
> # Choice: superhsys
> Generic
> CONFIG_SH_GENERIC
>   Select Generic if configuring for a generic SuperH system.

  The "generic" option compiles in *all* the possible hardware
support and relies on the sh_mv= kernel commandline option to choose
at runtime which routines to use.  "MV" stands for "machine vector";
each of the machines below is described by a machine vector and
the "generic" option chooses to compile them all in.

>   Select SolutionEngine if configuring for a Hitachi SH7709
>   or SH7750 evalutation board.
> 
>   Select Overdrive if configuring for a ST407750 Overdrive board.
>   More information at
>   <http://linuxsh.sourceforge.net/docs/7750overdrive.php3>
> 
>   Select HP620 if configuring for a HP Jornada HP620.
>   More information at
>   <http://www.hp.com/jornada>.
> 
>   Select HP680 if configuring for a HP Jornada HP680.
>   More information at
>   <http://www.hp.com/jornada/products/680>.
> 
>   Select HP690 if configuring for a HP Jornada HP690.
>   More information at <http://www.hp.com/jornada/products/680>.

  You won't get any information about Linux on Jornadas at HP.

> 
>   Select CqREEK if configuring for a CqREEK SH7708 or SH7750.
>   More information at
>   <http://sources.redhat.com/ecos/hardware.html#SuperH>.
> 
>   Select DMIDA if configuring for a DataMyte 4000 Industrial
>   Digital Assistant. More information at <http://www.dmida.com>.
> 
>   Select EC3104 if configuring for a system with an Eclipse
>   International EC3104 chip, e.g. the Harris AD2000.
> 
>   Select Dreamcast if configuring for a SEGA Dreamcast.
>   More information at
>   <http://www.m17n.org/linux-sh/dreamcast>.

  The Dreamcast project is at <http://linuxdc.sourceforge.net/>.
They usually have slightly newer DC support than linuxsh.sourceforge.net,
to which they sync regularly.

> 
>   Select BareCPU if you know what this means, and it applies
>   to your system.
> 
> Can you be any more explicit about the BareCPU option?

  "Bare CPU" aka "unknown" means an SH-based system which is not
one of the specific ones mentioned above, which means you need to
enter all sorts of stuff like CONFIG_MEMORY_START because the config
system doesn't already know what it is.  You get a machine vector
without any platform-specific code in it, so things like the RTC may
not work.

  This option is for the early stages of porting to a new machine.

  Basically the machine choices are laid out like this:

  generic = all of the known machines
  machine foo
  machine bar
  unknown = none of the known machines
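To illustrate why something like the RTC may not work with "Bare CPU"/"unknown", here is a hypothetical C sketch in which the unknown machine vector simply leaves its optional hooks NULL and callers report failure when a hook is missing. The structure, hook and function names are invented for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical optional board hooks carried by a machine vector. */
struct mv_hooks {
    const char *name;
    bool (*rtc_read)(long *seconds);   /* NULL if there is no RTC code */
};

static bool se_rtc_read(long *seconds)
{
    *seconds = 1000;                   /* pretend value from the board RTC */
    return true;
}

static const struct mv_hooks mv_se = { "se", se_rtc_read };
/* "unknown" has no platform-specific code, so the hook stays empty. */
static const struct mv_hooks mv_unknown = { "unknown", NULL };

/* Read the time through the machine vector; false if unsupported. */
static bool get_time(const struct mv_hooks *mv, long *seconds)
{
    assert(mv != NULL && seconds != NULL);
    if (mv->rtc_read == NULL)
        return false;                  /* RTC unsupported on this machine */
    return mv->rtc_read(seconds);
}
```

Porting to a new board then amounts to filling in the hooks that the bare vector leaves empty.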

> Physical memory start address
> CONFIG_MEMORY_START
>   The physical memory start address will be automatically
>   set to 0800, unless you selected one of the following
>   processor types: SolutionEngine, Overdrive, HP620, HP680, HP690,
>   in which case the start address will be set to 0c00.
> 
>   Do not change this address unless you know what you are doing.
> 
> Why might someone want to change this address?

  Only when porting to a new machine which is not already
known by the config system.  Changing it from the known correct
value on any of the known systems will only lead to disaster.

> Early printk support
> CONFIG_SH_EARLY_PRINTK
>   Say Y here to redirect kernel printks from the boot console to an
>   SCI serial console as soon as one is available.
> 
> This was my guess.  Is it correct?

  Nearly.

-  the serial console can be either SCI or SCIF (the latter has a FIFO)

-  the redirect happens *before* the serial console is available, and
   stops when the serial console is initialised

-  printks go to a BIOS conforming to the LinuxSH standard (i.e.
   the SH-IPL bootloader)

  Try:

Say Y here to redirect kernel messages to the serial port
used by the SH-IPL bootloader, starting very early in the boot
process and ending when the kernel's serial console is initialised.
This option is only useful when porting the kernel to a new machine,
when the kernel may crash or hang before the serial console is
initialised.
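The behaviour in that suggested help text can be modelled as a console write path that hands messages to the bootloader's output routine until the kernel's own serial console is initialised. This is a hypothetical user-space sketch in C: two buffers stand in for the SH-IPL output path and the real serial console, and none of the names are actual kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: two buffers model the two output paths. */
static char bios_out[256];
static char console_out[256];
static bool serial_console_ready;      /* set once the real console is up */

static void append(char *buf, size_t cap, const char *msg)
{
    size_t used = strlen(buf);
    assert(used < cap);
    strncat(buf, msg, cap - used - 1); /* truncate rather than overflow */
}

/* Early messages go through the bootloader ("BIOS") until the kernel's
 * serial console is initialised; after that the redirect stops. */
static void early_printk_model(const char *msg)
{
    if (serial_console_ready)
        append(console_out, sizeof console_out, msg);
    else
        append(bios_out, sizeof bios_out, msg);
}

static void serial_console_init_model(void)
{
    serial_console_ready = true;
}
```

Messages logged before serial_console_init_model() land in bios_out, later ones in console_out, which is the handover the help text describes.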

> SuperH SCI (serial) support
> CONFIG_SH_SCI
>   Selecting this option will allow the Linux kernel to transfer
>   data over SCI (Serial Communication Interface) and/or SCIF
>   which are built into the Hitachi SuperH processor.
> 
>   If in doubt, press "y".
> 
> What

Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

Jaswinder Singh wrote:
> 
> "Alan Cox" <[EMAIL PROTECTED]> wrote :
> 
> >
> > Handwriting recognition with fscrib works very well indeed.
> >
> 
> But not in Linux SH; there is no Touch Panel Interface in Linux SH yet :(

  I have some code which could become the basis for such a thing.
It's a touch panel driver for the DMIDA but it also has a device-
independent layer which does supersampling, scaling, provides
raw and cooked Linux Input interfaces, and a /proc interface to
allow the calibration app to control the scaling.
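The driver described above is not available, so the following is not its code; it is a small C sketch of the two device-independent steps under stated assumptions: "supersampling" is modelled as averaging a burst of raw readings, and calibration scaling as a linear map between two raw reference points and two screen coordinates (the kind of parameters a calibration app might write through a /proc file). struct axis_cal and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-axis calibration: raw readings raw_lo/raw_hi
 * correspond to screen coordinates scr_lo/scr_hi. */
struct axis_cal {
    int raw_lo, raw_hi;
    int scr_lo, scr_hi;
};

/* Supersampling modelled as the mean of n raw samples (n > 0). */
static int average_samples(const int *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (int)(sum / (long)n);
}

/* Linear scaling from a raw reading to a screen coordinate. */
static int scale_axis(const struct axis_cal *cal, int raw)
{
    assert(cal->raw_hi != cal->raw_lo);
    long num = (long)(raw - cal->raw_lo) * (cal->scr_hi - cal->scr_lo);
    return cal->scr_lo + (int)(num / (cal->raw_hi - cal->raw_lo));
}
```

With raw 100..900 mapped to screen 0..640, a raw reading of 500 lands at 320.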

  Unfortunately I can't release it yet for (ahem) legal reasons.

  Anyway the limitation with handwriting recognition is not getting
the data out of the hardware but recognising the sample stream as
characters.  This is *difficult*.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

Alan Cox wrote:
> 
> > [...] or the feature where you can have a decent handwriting
> > recognition system,[...]
> 
> Handwriting recognition with fscrib works very well indeed.

  I haven't tried that one.  Does it do cursive writing,
with dictionary assistance, on the X root window?

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

[EMAIL PROTECTED] wrote:
 
 Greg Banks [EMAIL PROTECTED]:
Having said that, I agree that the help text entries for the SH
  port are in general of less than stellar quality, for various
  (mostly good) reasons.  I'm hoping ESR will give us some editorial
  feedback which will provide a good excuse to fix them.
 
 Since you asked...
 
 # Choice: superhsys
 Generic
 CONFIG_SH_GENERIC
   Select Generic if configuring for a generic SuperH system.

  The "generic" option compiles in *all* the possible hardware
support and relies on the sh_mv= kernel commandline option to choose
at runtime which routines to use.  "MV" stands for "machine vector";
each of the machines below is described by a machine vector and
the "generic" option chooses to compile them all in.

   Select SolutionEngine if configuring for a Hitachi SH7709
   or SH7750 evaluation board.
 
   Select Overdrive if configuring for a ST407750 Overdrive board.
   More information at
   http://linuxsh.sourceforge.net/docs/7750overdrive.php3
 
   Select HP620 if configuring for a HP Jornada HP620.
   More information at
   http://www.hp.com/jornada.
 
   Select HP680 if configuring for a HP Jornada HP680.
   More information at
   http://www.hp.com/jornada/products/680.
 
   Select HP690 if configuring for a HP Jornada HP690.
   More information at http://www.hp.com/jornada/products/680.

  You won't get any information about Linux on Jornadas at HP.

 
   Select CqREEK if configuring for a CqREEK SH7708 or SH7750.
   More information at
   http://sources.redhat.com/ecos/hardware.html#SuperH.
 
   Select DMIDA if configuring for a DataMyte 4000 Industrial
   Digital Assistant. More information at http://www.dmida.com.
 
   Select EC3104 if configuring for a system with an Eclipse
   International EC3104 chip, e.g. the Harris AD2000.
 
   Select Dreamcast if configuring for a SEGA Dreamcast.
   More information at
   http://www.m17n.org/linux-sh/dreamcast.

  The Dreamcast project is at <http://linuxdc.sourceforge.net/>.
They usually have slightly newer DC support than
linuxsh.sourceforge.net, to which they sync regularly.

 
   Select BareCPU if you know what this means, and it applies
   to your system.
 
 Can you be any more explicit about the BareCPU option?

  "Bare CPU" aka "unknown" means an SH-based system which is not
one of the specific ones mentioned above, which means you need to
enter all sorts of stuff like CONFIG_MEMORY_START because the config
system doesn't already know what it is.  You get a machine vector
without any platform-specific code in it, so things like the RTC may
not work.

  This option is for the early stages of porting to a new machine.

  Basically the machine choices are laid out like this:

  generic = all of the known machines
  machine foo
  machine bar
  unknown = none of the known machines

 Physical memory start address
 CONFIG_MEMORY_START
   The physical memory start address will be automatically
   set to 0800, unless you selected one of the following
   processor types: SolutionEngine, Overdrive, HP620, HP680, HP690,
   in which case the start address will be set to 0c00.
 
   Do not change this address unless you know what you are doing.
 
 Why might someone want to change this address?

  Only when porting to a new machine which is not already
known by the config system.  Changing it from the known correct
value on any of the known systems will only lead to disaster.

 Early printk support
 CONFIG_SH_EARLY_PRINTK
   Say Y here to redirect kernel printks from the boot console to an
   SCI serial console as soon as one is available.
 
 This was my guess.  Is it correct?

  Nearly.

-  the serial console can be either SCI or SCIF (the latter has a FIFO)

-  the redirect happens *before* the serial console is available, and
   stops when the serial console is initialised

-  printks go to a BIOS conforming to the LinuxSH standard (i.e.
   the SH-IPL bootloader)

  Try:

Say Y here to redirect kernel messages to the serial port
used by the SH-IPL bootloader, starting very early in the boot
process and ending when the kernel's serial console is initialised.
This option is only useful when porting the kernel to a new machine,
when the kernel may crash or hang before the serial console is
initialised.

 SuperH SCI (serial) support
 CONFIG_SH_SCI
   Selecting this option will allow the Linux kernel to transfer
   data over SCI (Serial Communication Interface) and/or SCIF
   which are built into the Hitachi SuperH processor.
 
   If in doubt, press "y".
 
 What data?  Is this just an on-board RS232C controller?

  Sorry, the description is unclear.  It's an on-CPU RS232 controller,
usually used as the console.  The option provides 1 to 3 (depending
on the CPU model) standard Linux tty devices, /dev/ttySC[012].
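Because the SCI/SCIF ports show up as ordinary tty devices, user space can drive them with the standard termios calls like any other serial port. A sketch, assuming raw 8N1 at 115200 baud is wanted; the helper name is hypothetical and the speed is an illustrative choice, not something the help text specifies.

```c
#define _DEFAULT_SOURCE   /* B115200 is outside base POSIX; request platform extensions */
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put an already-open tty (for example one of
 * /dev/ttySC[012]) into raw 8N1 mode at 115200 baud.
 * Returns 0 on success, -1 on error with errno set. */
static int sci_configure_raw_115200(int fd)
{
    struct termios tio;

    assert(fd >= 0);
    if (tcgetattr(fd, &tio) != 0)
        return -1;                      /* e.g. ENOTTY if fd is not a tty */

    /* Raw mode: no input translation, no output processing, no echo. */
    tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                     INLCR | IGNCR | ICRNL | IXON);
    tio.c_oflag &= ~OPOST;
    tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);

    /* 8 data bits, no parity, 1 stop bit; ignore modem lines, enable rx. */
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tio.c_cflag |= CS8 | CLOCAL | CREAD;

    if (cfsetispeed(&tio, B115200) != 0 || cfsetospeed(&tio, B115200) != 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}
```

The fd would come from something like open("/dev/ttySC1", O_RDWR | O_NOCTTY); which ttySC numbers exist depends on the CPU model, as described above.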

 
 Use LinuxSH standard BIOS
 CONFIG_SH_STANDARD_BIOS
   Say Y here if your target has the gdb-sh-stub
   package from www.m17n.org (or any conforming standard LinuxSH BIOS)
   in FLASH or EPROM.  The kernel will use



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

Jaswinder Singh wrote:
> 
> Greg Banks <[EMAIL PROTECTED]> wrote:
> >
> >   I have some code which could become the basis for such a thing.
> > It's a touch panel driver for the DMIDA but it also has a device-
> > independent layer which does supersampling, scaling, provides
> > raw and cooked Linux Input interfaces, and a /proc interface to
> > allow the calibration app to control the scaling.
> >
> >   Unfortunately I can't release it yet for (ahem) legal reasons.
> >
> 
> Nice job. Where did you get the related specs?

  From the manufacturer.  We had the fullest possible co-operation
including all technical specs, several sample machines, source to
the WinCE drivers for the hardware, and engineers (including the lead
hardware designer, a *very* cluey gentleman) on standby to answer
questions.

  It makes an amazing difference to have all the nasty little
hardware quirks and known workarounds for them laid bare before you.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-26 Thread Greg Banks

Alan Cox wrote:
> 
> > Visual Studio, or the feature where you can have a decent handwriting
> > recognition system, or the feature where you can run Pocket {Internet
> > Explorer,Word} then the answer is none of them.
> 
> Handwriting recognition with fscrib works very well indeed.

  Ok, I've found the description of this on handhelds.org, and
it appears to be a derivative of xscribble, which I have tried.
Unlike xscribble it does fullscreen mode, which is good, but
it's still single-character and requires the user to learn
how to write all over again.  In other words, like everything
else available on Linux (and even MS's Jot) it's *crap* compared to

http://www.paragraph.com/products/internetink/calligrapher/features.html

  I would give my eye teeth for a Linux version of Calligrapher.

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.



Re: [kbuild-devel] Configure.help entries wanted

2001-05-25 Thread Greg Banks

Eric S. Raymond wrote:
> 
> CONFIG_SH_SCI
> CONFIG_SH_STANDARD_BIOS
> CONFIG_DEBUG_KERNEL_WITH_GDB_STUB

  From the LinuxSH CVS (I can write new ones if these are inadequate):

SuperH SCI (serial) support
CONFIG_SH_SCI
  Selecting this option will allow the Linux kernel to transfer
  data over SCI (Serial Communication Interface) and/or SCIF
  which are built into the Hitachi SuperH processor.

  If in doubt, press "y".

Use LinuxSH standard BIOS
CONFIG_SH_STANDARD_BIOS
  Say Y here if your target has the gdb-sh-stub package from
  www.m17n.org (or any conforming standard LinuxSH BIOS) in FLASH
  or EPROM.  The kernel will use standard BIOS calls during boot
  for various housekeeping tasks.  Note this does not work with
  WindowsCE machines.  If unsure, say N.

GDB Stub kernel debug
CONFIG_DEBUG_KERNEL_WITH_GDB_STUB
  If you say Y here, it will be possible to remotely debug the SuperH
  kernel using gdb, if you have the gdb-sh-stub package from
  www.m17n.org (or any conforming standard LinuxSH BIOS) in FLASH or
  EPROM.  This enlarges your kernel image disk size by several megabytes
  but allows you to load, run and debug the kernel image remotely using
  gdb.  This is only useful for kernel hackers.  If unsure, say N.


Greg.
-- 
These are my opinions not PPIs.



Re: [kbuild-devel] Why recovering from broken configs is too hard

2001-05-03 Thread Greg Banks

Eric S. Raymond wrote:
> 

  I agree with the main thrust of your argument, but

> It would be hard to know how to order your candidates to present
> them to the user in a natural sequence -- and the problem of deciding
> which variable to present for mutation by the user next, if you choose
> that UI, equates to this.

  There is a natural order for presenting variables to the
user, and that's the menu tree order.  At least in the Linux
kernel CML2 corpus the menus are roughly organised from most
general to most specific options, so options appearing earlier
in the tree are likely to appear in more constraints and you
probably want to ask the user to mutate them later.
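A hypothetical C sketch of that ordering heuristic: each candidate variable in a violated constraint carries its position in a depth-first walk of the menu tree, and candidates are offered for mutation latest-first, so the most specific options come up before the general ones. The struct and function names are invented; CML2's real data structures are not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical candidate: a config symbol and its depth-first index
 * in the menu tree (smaller index = earlier = more general). */
struct candidate {
    const char *symbol;
    int menu_index;
};

/* qsort comparator: later menu position sorts first. */
static int later_first(const void *a, const void *b)
{
    const struct candidate *ca = a, *cb = b;
    return (cb->menu_index > ca->menu_index) - (cb->menu_index < ca->menu_index);
}

/* Order candidates so the most specific (latest in menu order) come first. */
static void order_for_mutation(struct candidate *c, size_t n)
{
    assert(c != NULL || n == 0);
    qsort(c, n, sizeof *c, later_first);
}
```

Sorting by a single precomputed index keeps the heuristic cheap; a real tool might also weight how many constraints each symbol appears in.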

Greg.
-- 
If it's a choice between being a paranoid, hyper-suspicious global
village idiot, or a gullible, mega-trusting sheep, I don't look
good in mint sauce.  - jd, slashdot, 11Feb2000.


