Re: Wow! Is memory ever cheap!

2001-05-08 Thread John Alvord

On Tue, 8 May 2001 22:22:10 -0700, Larry McVoy <[EMAIL PROTECTED]>
wrote:
>
>Just to make sure you understand:  I think ECC is a fine thing.  If I'm
>running systems with no other integrity checks, I'll take ECC and like it.
>However, having ECC does not mean that I trust that my data is safe,
>that is most certainly not a true statement.  The bus, the disks, the
>disk controller, the disk driver, the buffer cache, etc, can all corrupt
>the data.   Oh, yeah, let's not forget NFS.  I have seen each and every
>one of those things corrupt data.

This is an interesting observation of a truth that was well known in
the second generation computers of the 1950s and 1960s. I first worked
at John Hancock... they had a bunch of 7074 machines. All those
systems made use of programmed checksums in each tape block and in
each full file. The reason was that those machines did not have ECC...
they did have parity checking if I remember right. With IBM's third
generation computers (S/360s) and probably other manufacturers, ECC
became a standard feature. Parity checking was added through different
data paths such as channel memory, buffer memory, etc. There was so
much protection added that the programmed checksums became
superfluous.

There were still odd moments. I remember working on an Amdahl computer
problem involving an internal data path, one where the contents of a
register moved to an internal storage area, and that path did not
have parity. There was a machine fault... the path was electrically
open, so the contents of the register always became zero. But since it
wasn't parity checked, there was no machine check. I remember another
problem on the IBM 3033. Cosmic rays (really) caused one bit errors in
channel memory. That was parity but not ECC so you got a weird channel
check. Back at the diagnosis ranch, the board looked good. It was only
when someone noticed that the rate of such problems was proportional
to the height above sea level that the light bulb went on.
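
The difference between an unchecked path and a parity-checked one can be sketched in a few lines of C. This is only an illustration, not how any of the machines above did it: the names `parity8`, `store` and `word_ok` are made up, and even parity over a single byte is an assumption.

```c
#include <stdint.h>

/* Even-parity bit for one byte: 1 if the byte has an odd number of 1 bits. */
unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* A stored byte plus the parity bit recorded when it was written. */
struct checked_byte {
    uint8_t data;
    unsigned parity;
};

/* Record the data together with its parity bit. */
struct checked_byte store(uint8_t data)
{
    struct checked_byte c = { data, parity8(data) };
    return c;
}

/* Returns 1 if the byte still matches its recorded parity, 0 on a parity error. */
int word_ok(struct checked_byte c)
{
    return parity8(c.data) == c.parity;
}
```

One flipped bit makes `word_ok` return 0, which corresponds to the channel check case: the error is detected but cannot be corrected. Two flipped bits pass unnoticed, and a path with no check at all reports nothing, whatever it delivers.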

The lesson is that when paths are not checked, hardware or software,
data being held or transformed can change. Old lesson but a good one
to know.

john alvord
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: what causes Machine Check exception? revisited (2.2.18)

2001-05-08 Thread Mike Fedyk

On Mon, May 07, 2001 at 11:57:17AM +0100, Alan Cox wrote:
> Generally it indicates a CPU problem but I've seen it caused by overclocking
> and poorly fitted heatsinks
I've been able to trigger a Machine check error on PPC when trying to boot
directly from OF with a COFF kernel.  The system has worked perfectly with
BootX.

I wonder why this is the first non-x86 report...

Mike



tunables??

2001-05-08 Thread SRIKANTH CHOWDARY M. K. G


Hi All,

  1.  Is there a file which contains all the tunables??

  2. Are the variables free_pages_high & free_pages_low (which
kswapd looks at after the timer expires) tunable parameters??
  
  3. What  range of addresses  separates the Normal, High Memory
  & DMA zones??

 Thanks & Regards,
 Srikanth





Re: Wow! Is memory ever cheap!

2001-05-08 Thread Dan Hollis

On Tue, 8 May 2001, Larry McVoy wrote:
> which is a text version of the paper I mentioned before.  The basic
> message of the paper is that it really doesn't help much to have things
> like ECC unless you can be sure that 100% of the rest of your system
> has similar checks.

UDMA has crc, scsi has parity, pci has (i think) parity, tcpip has crc,
your cpu l1 and l2 have ecc...

Looks like similar checks are already there.

-Dan




Re: Wow! Is memory ever cheap!

2001-05-08 Thread Larry McVoy

On Wed, May 09, 2001 at 12:24:25AM -0400, Marty Leisner wrote:
> I'm confused by the "let's not use ECC and use bk" talk.

I'll take a pass at unconfusing you, I can see how you might be.  I wish
I had never mentioned BK, that was never the point.  End to end was the
point, BK was just an example and now I'm getting accused of bringing
up the whole thread as a BK advertisement.  Which completely misses
the point.  Please go read

http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+end=en

which is a text version of the paper I mentioned before.  The basic
message of the paper is that it really doesn't help much to have things
like ECC unless you can be sure that 100% of the rest of your system
has similar checks.

The point was made again, but apparently missed here, when I pointed
out that Linux's disk subsystem passes up bad data when it knows there
may be a problem.  ECC will not help you in this case, the data was bad
before it hit memory.  So now you have carefully error corrected BAD DATA.
See the point?  ECC doesn't help unless every other component is equally
careful; those components include software and hardware.  You can fix
that chunk of software and then I'll go find a rogue disk controller
that breaks the datapath, there are plenty to choose from.

Just to make sure you understand:  I think ECC is a fine thing.  If I'm
running systems with no other integrity checks, I'll take ECC and like it.
However, having ECC does not mean that I trust that my data is safe,
that is most certainly not a true statement.  The bus, the disks, the
disk controller, the disk driver, the buffer cache, etc, can all corrupt
the data.   Oh, yeah, let's not forget NFS.  I have seen each and every
one of those things corrupt data.

As to the BitKeeper stuff, those of you who think this is a BitKeeper
discussion are off whacking in the weeds.  The point isn't that BitKeeper
is good because it has integrity checks, the point is that integrity
checks are a good thing.  Period.   BitKeeper was just an example.
If there was a Linux filesystem that had built in integrity checks (and
I knew about it, for all I know there is one), then I would have used
that as the example.  I used BitKeeper as an example because I know it
and I can point to numerous cases where it exposed problems that ECC
would not have caught.  Ask Dave Miller about the mmap/read sparc linux
cache aliasing bug that BK exposed, that one was nasty.

Let's review:  ECC is nice, but it doesn't solve all data corruption
problems.  Applications which do their own end to end data integrity
checks will catch many more error cases than what ECC catches.  My efforts
in this thread had nothing to do with BitKeeper, they were trying to
get people to realize that end to end is good, and ECC isn't end to end.

Examples of end to end applications, which I should have thought of
sooner, are the md5sums on ftp.kernel.org, the integrity checks in rpms,
crcs in cpio.  I'm sure you can think of lots of others, this is an
old problem.
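
An application-level check of the kind listed above can be sketched as follows. This is a minimal illustration, not what BitKeeper, rpm or cpio actually do: the `sealed` struct and the `seal` and `verify` helpers are hypothetical names, and CRC-32 is just one possible checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow but table-free. */
uint32_t crc32_buf(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= p[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Payload reference plus the checksum computed by the producer. */
struct sealed {
    const void *data;
    size_t len;
    uint32_t crc;
};

/* Producer side: compute the checksum before the data enters the datapath. */
struct sealed seal(const void *data, size_t len)
{
    struct sealed s = { data, len, crc32_buf(data, len) };
    return s;
}

/* Consumer side: 1 if the data still matches its checksum, 0 if it changed. */
int verify(const struct sealed *s)
{
    return crc32_buf(s->data, s->len) == s->crc;
}
```

The producer calls `seal` and the final consumer calls `verify`, so any corruption picked up in between (memory, bus, disk, driver, NFS) shows up as a mismatch, whichever component caused it.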

> My understanding is Sun's big machines stopped using ecc and they

The SUN problem was a cache problem and there is no way that I believe
that SUN would turn off ECC in the cache.  There are good reasons for
not doing so.  If you think through the end to end argument, you will
see that you have no way to do checks on the data path into/out of the
processor.  If that part of the datapath is not checked then no amount
of checking elsewhere does any good, the processor can be corrupting
your data and never know it.  If SUN was so stupid as to remove this,
then it is a dramatically different place.  I heard that there was a
bug in the cache controller, I never heard that they had removed ECC.
If you really want to know I can ask, I know at least one of the guys
who works on that stuff there.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-08 Thread Marty Leisner


I'm confused by the "let's not use ECC and use bk" talk.

My understanding is Sun's big machines stopped using ecc and they
started to have "random" problems running big-iron applications
that took them a while to figure out (and a lot of bad press) and can
only be rectified in the big cycle (this was last year so it's probably
solved now).

I thought one of the primary reasons to have ecc is to catch
weird things before they become catastrophic...and at least
know WHY weirdness is happening...


marty




nfs warning: mount version older than kernel

2001-05-08 Thread Jeff Chua


Where can I get the latest "mount" ?

# mount -t nfs lo:/ /mnt
NFS: NFSv3 not supported.
nfs warning: mount version older than kernel

#mount --version
mount: mount-2.11a


Thanks,
Jeff
[ [EMAIL PROTECTED] ]




Re: page_launder() bug

2001-05-08 Thread Jonathan Morton

>That said, anyone who doesn't understand the former should probably
>get some more C experience before commenting on others' code...

I understood it, but it looked very much like a typo.

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-END GEEK CODE BLOCK-





Re: SPARC include problem

2001-05-08 Thread Sean Jones

The include error was in kernel/sched.c . Should I rewrite the includes
for this file to include include/asm/irq.h over include/linux/irq.h? I
temporarily bypassed this problem by creating a blank asm/hw_irq.h . 

I also ran into a compile problem in arch/sparc/kernel/sparc_ksyms.c .
The rw semaphores seem to be undeclared. Here are the warnings:

-D__KERNEL__ -I/usr/src/linux-2.4.4/include -Wall -Wstrict-prototypes -O2
-fomit-frame-pointer -fno-strict-aliasing -m32 -pipe -mno-fpu
-fcall-used-g5 -fcall-used-g7 -DEXPORT_SYMTAB -c sparc_ksyms.c
In file included from /usr/src/linux-2.4.4/include/linux/sched.h:9,
 from sparc_ksyms.c:17:
/usr/src/linux-2.4.4/include/linux/binfmts.h:45: warning: `struct
mm_struct' declared inside parameter list
/usr/src/linux-2.4.4/include/linux/binfmts.h:45: warning: its scope is
only this definition or declaration, which is probably not what you
want.
sparc_ksyms.c:121: `___down_read' undeclared here (not in a function)
sparc_ksyms.c:121: initializer element is not constant
sparc_ksyms.c:121: (near initialization for
`__ksymtabdown_read.value')
sparc_ksyms.c:122: `___down_write' undeclared here (not in a function)
sparc_ksyms.c:122: initializer element is not constant
sparc_ksyms.c:122: (near initialization for
`__ksymtabdown_write.value')
sparc_ksyms.c:123: `___up_read' undeclared here (not in a function)
sparc_ksyms.c:123: initializer element is not constant
sparc_ksyms.c:123: (near initialization for
`__ksymtabup_read.value')
sparc_ksyms.c:124: `___up_write' undeclared here (not in a function)
sparc_ksyms.c:124: initializer element is not constant
sparc_ksyms.c:124: (near initialization for
`__ksymtabup_write.value')
make[1]: *** [sparc_ksyms.o] Error 1
make[1]: Leaving directory `/usr/src/linux-2.4.4/arch/sparc/kernel'
make: *** [_dir_arch/sparc/kernel] Error 2

Thank you,

Sean


Erik Mouw wrote:
> 
> On Mon, May 07, 2001 at 05:01:03PM -0500, Sean Jones wrote:
> > In compiling 2.4.4-ac5 for my SPARCStation 20, I had an error in the
> > compile resulting from the inability to find a hw_irq.h in the
> > include/asm directory. Do you know where I may be able to find such a
> > file?
> 
> You don't. I discussed this last week with Russell King: the ARM port
> also doesn't have the file hw_irq.h in include/asm-arm. According to
> Russell it is only needed in the arch dependent subdirectories, and not
> in the drivers.
> 
> Any driver that includes linux/irq.h is not written to be portable. The
> only generic driver that includes it is driver/pcmcia/hd64465_ss.c, but
> on second glance it's a Hitachi HD64465 specific driver anyway.
> 
> Erik
> 
> --
> J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
> of Electrical Engineering, Faculty of Information Technology and Systems,
> Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
> Phone: +31-15-2783635  Fax: +31-15-2781843  Email: [EMAIL PROTECTED]
> WWW: http://www-ict.its.tudelft.nl/~erik/



Re: ECN: Volunteers needed

2001-05-08 Thread Billy Harvey

 > This was the big argument I was running into from sites, "well it
 > isn't standard yet, when it is we'll do something about it".  The
 > larger sites like to avoid updates until absolutely necessary.

Good grief - nothing like planning ahead ... and these large-site
administrators actually accept paychecks for their lack of foresight?

Billy



Re: ECN: Volunteers needed

2001-05-08 Thread jamal



On Tue, 8 May 2001, jamal wrote:
> Any one wishing to volunteer, please still send your emails in --
> we should be ready in a few days from now,
>

I guess i should have mentioned the IESG is sitting in to approve ECN
as proposed standard in about a week or so.

cheers,
jamal




Re: nfs MAP_SHARED corruption fix

2001-05-08 Thread Andrea Arcangeli

On Tue, May 08, 2001 at 05:21:02PM +0200, Trond Myklebust wrote:
> AFAICs this fix will clearly deadlock...

yeah, it didn't trigger because it probably needs the same page to be
under writepage and on the dirty list at the same time. I hooked it very deep
into the writeback logic to keep it generic (it wasn't going to add a
significant overhead) but it didn't need to be _that_ deep.

Even worse I think it was partly wrong because it was only in the
close(2) path but not in the fput path that is the one walked by munmap.

This looks better to me, what do you think?

diff -urN ref/fs/nfs/file.c nfs-corruption/fs/nfs/file.c
--- ref/fs/nfs/file.c   Thu Feb 22 03:45:10 2001
+++ nfs-corruption/fs/nfs/file.c   Tue May  8 19:11:57 2001
@@ -39,6 +39,7 @@
 static ssize_t nfs_file_write(struct file *, const char *, size_t, loff_t *);
 static int  nfs_file_flush(struct file *);
 static int  nfs_fsync(struct file *, struct dentry *dentry, int datasync);
+static void nfs_file_close_vma(struct vm_area_struct *);
 
 struct file_operations nfs_file_operations = {
read:   nfs_file_read,
@@ -57,6 +58,11 @@
setattr:nfs_notify_change,
 };
 
+static struct vm_operations_struct nfs_file_vm_ops = {
+   nopage: filemap_nopage,
+   close:  nfs_file_close_vma,
+};
+
 /* Hack for future NFS swap support */
 #ifndef IS_SWAPFILE
 # define IS_SWAPFILE(inode)(0)
@@ -104,6 +110,20 @@
return result;
 }
 
+static void nfs_file_close_vma(struct vm_area_struct * vma)
+{
+   struct inode * inode;
+
+   inode = vma->vm_file->f_dentry->d_inode;
+
+   if (inode->i_state & I_DIRTY_PAGES) {
+   filemap_fdatasync(inode->i_mapping);
+   lock_kernel();
+   nfs_wb_file(inode, vma->vm_file);
+   unlock_kernel();
+   }
+}
+
 static int
 nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
 {
@@ -115,8 +135,11 @@
dentry->d_parent->d_name.name, dentry->d_name.name);
 
status = nfs_revalidate_inode(NFS_SERVER(inode), inode);
-   if (!status)
+   if (!status) {
status = generic_file_mmap(file, vma);
+   if (!status)
+   vma->vm_ops = &nfs_file_vm_ops;
+   }
return status;
 }
 

Andrea
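
Seen from userspace, the case the patch cares about is a write through a MAP_SHARED mapping that has to reach the file once the mapping goes away. The sketch below uses standard POSIX calls to exercise that path; it is only an illustration, `shared_write_roundtrip` is a made-up name, and the explicit msync() is a precaution an application can take on its own, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes of `msg` into the file at `path` through a MAP_SHARED
 * mapping, flush and unmap it, then read the file back with pread(2).
 * Returns 0 if the bytes read back match, -1 on any error or mismatch.
 */
int shared_write_roundtrip(const char *path, const char *msg, size_t len)
{
    char back[256];
    char *map;
    int fd, ret = -1;

    if (len == 0 || len > sizeof(back))
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    memcpy(map, msg, len);            /* dirty the shared page */
    if (msync(map, len, MS_SYNC) != 0) {
        munmap(map, len);
        goto out;
    }
    if (munmap(map, len) != 0)        /* the path the close hook covers */
        goto out;

    if (pread(fd, back, len, 0) == (ssize_t)len &&
        memcmp(back, msg, len) == 0)
        ret = 0;
out:
    close(fd);
    return ret;
}
```

Run against a file on an NFS mount, a mismatch on read-back from another client would be the symptom under discussion. On a local filesystem the call should simply return 0.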



Re: ECN: Volunteers needed

2001-05-08 Thread jamal



On Tue, 8 May 2001, David S. Miller wrote:

>
> I believe it would only be prudent to actually send out these messages
> starting at the moment ECN is officially standard.
>
> This was the big argument I was running into from sites, "well it
> isn't standard yet, when it is we'll do something about it".  The
> larger sites like to avoid updates until absolutely necessary.
>
> If we are to improve ECN deployment, we should understand the
> priorities of the people who run the sites which stand in the way
> of our doing so.
>

Sally's new draft:
ftp://ftp.normos.org/ietf/internet-drafts/draft-floyd-tcp-reset-00.txt

builds a strong case against the RST issue and may be used to point
to problems despite ECN.
But you are right, it will be stronger and wiser to wait until this is
standardized before paying a visit to the site owners.
Any one wishing to volunteer, please still send your emails in --
we should be ready in a few days from now,

cheers,
jamal





Re: ECN: Volunteers needed

2001-05-08 Thread David S. Miller


jamal writes:
 > Help is needed to contact these site owners and politely tell them, using
 > a standard email, that their site is non-conformant.
 > Point them to Sally's draft and the fact that ECN is becoming standard
 > in the next week or so. Also to Jeff's ECN-under-Linux Unofficial
 > Vendor Support Page, and to encourage them to have their firewall
 > or load-balancer upgraded.
 > I suppose the first volunteer needed is to draft such an email. We have to
 > be polite and persistent for this to work.

I believe it would only be prudent to actually send out these messages
starting at the moment ECN is officially standard.

This was the big argument I was running into from sites, "well it
isn't standard yet, when it is we'll do something about it".  The
larger sites like to avoid updates until absolutely necessary.

If we are to improve ECN deployment, we should understand the
priorities of the people who run the sites which stand in the way
of our doing so.

Later,
David S. Miller
[EMAIL PROTECTED]



[PATCH] RAID5 NULL Checking Bug Fix

2001-05-08 Thread david chan


Hi,
In drivers/md/raid5.c, the author does not check to see if alloc_page() returns
NULL. This patch also adds checks that return 1 (following the
error-path convention in the respective function).

Please discard this e-mail if this patch is irrelevant to you. I just
tried to be thorough.

Thank you,
David Chan

---snip
--- drivers/md/raid5.c.orig Tue May  8 19:17:22 2001
+++ drivers/md/raid5.c  Tue May  8 19:20:07 2001
@@ -157,17 +157,21 @@
memset(bh, 0, sizeof (struct buffer_head));
init_waitqueue_head(&bh->b_wait);
page = alloc_page(priority);
+   if (!page)
+   goto nomem_path;
bh->b_data = page_address(page);
-   if (!bh->b_data) {
-   kfree(bh);
-   return 1;
-   }
+   if (!bh->b_data)
+   goto nomem_path;
atomic_set(&bh->b_count, 0);
bh->b_page = page;
sh->bh_cache[i] = bh;

}
return 0;
+
+nomem_path:
+   kfree(bh);
+   return 1;
 }

 static struct buffer_head *raid5_build_block (struct stripe_head *sh, int i);
---snip---
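
The single-exit error path the patch introduces can be shown with a userspace analogue. This is a sketch under stated assumptions, not raid5.c code: `struct buf` and `grow_buffers` are hypothetical, malloc()/free() stand in for the kernel allocators, and unlike the patch this version also releases the buffers allocated on earlier loop iterations.

```c
#include <stdlib.h>

/* Stand-in for a buffer head: a descriptor that owns one data block. */
struct buf {
    void *data;
};

/*
 * Fill slots[0..n-1] with freshly allocated buffers of `size` bytes.
 * Returns 0 on success, 1 on allocation failure (after freeing
 * everything this call allocated).
 */
int grow_buffers(struct buf **slots, size_t n, size_t size)
{
    struct buf *b = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        b = malloc(sizeof(*b));
        if (!b)
            goto nomem;
        b->data = malloc(size);
        if (!b->data)
            goto nomem;
        slots[i] = b;
    }
    return 0;

nomem:
    free(b);                      /* free(NULL) is a no-op */
    while (i > 0) {               /* unwind the slots already filled */
        i--;
        free(slots[i]->data);
        free(slots[i]);
        slots[i] = NULL;
    }
    return 1;
}
```

Every failure check jumps to the same label, so the cleanup is written once and the success path stays free of error handling.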




blkdev in pagecache

2001-05-08 Thread Andrea Arcangeli

Last night I moved the blkdev layer into the pagecache with this patch:


ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.5pre1/blkdev-pagecache-1

It is incremental and depends on the o_direct functionality, latest
o_direct patch against 2.4.5pre1 is here:


ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.5pre1/o_direct-5

The main reason I moved the blkdev into the pagecache is that the current
blkdev provides horrible performance with fast I/O subsystems capable of
over 50mbyte/sec, which I had just doubled with a simple hack that you
can see here if you're curious:


ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.5pre1aa2/00_4k_block_dev-1

(btw, also the current rawio uses a 512byte bh->b_size granularity that is even
worse than the 1024byte b_size of the blkdev, O_DIRECT is much smarter
on this side as it uses the softblocksize of the fs that can be as well
4k if you created the fs with -b 4096)

However after running this 4k_block_dev-1 hack on some more machines I
noticed the blkdev layer was no longer able to update the superblock of
1k ext2 filesystems, and to make it "usable" in real life I needed to fix
it. But I didn't want to invest any further time in such a hack and I
preferred to move the blkdev into the pagecache and to fix the problem on
top of the new better design (moving blkdev into the pagecache of course
introduces that same problem too, as I also mention in one of the points
below).

I'll describe here some of the details of the blkdev-pagecache-1 patch:

- /dev/raw* and drivers/char/raw.c get obsoleted and replaced by
  opening the blkdevice with O_DIRECT, it looks much saner and I
  basically get it for free by just implementing 10 lines of the
  blkdev_direct_IO callback, of course I didn't remove the /dev/raw*
  API, for compatibility.

  While testing O_DIRECT I destroyed the first 50mbyte of the root
  partition so I will need to wait for the test box to come back to life
  before I can do further testing ;). But I fixed the bug that caused the
  corruption before uploading the patch so I don't expect further
  problems (it was only a s/i_dev/i_rdev thing) because the regression
  testing was working well even if it was writing to the wrong disk ;).

- I force the virtual blocksize for all the blkdev I/O
  (buffered and direct) to work with a 4096 bytes granularity instead of
  the current 1024 softblocksize because we need that for getting higher
  performance, 1024 is too low because it wastes too much ram and too
  much cpu. So a DBMS will no longer be able to write 512 bytes to the
  disk using rawio and be sure it will be a single atomic block update.
  If you use /dev/raw nothing changes of course, only opening the blkdev
  with O_DIRECT enforces a minimal granularity of 4096 bytes in the I/O.
  I don't think this is a problem, and O_DIRECT through the fs was
  already using the fs softblocksize instead of the hardblocksize as the
  unit of the minimal direct-IO granularity.

- writes to the blockdevice won't end up in the buffer cache, so it will
  be impossible to update the superblock of an ext2 partition mounted ro
  for example, it must not be mounted at all to update the superblock, I
  will need to invent a hack to fix this problem or it will get too
  annoying. One way could be simply to change ext2 and have it check that
  the buffer is uptodate before marking it dirty again, but maybe
  we could also do it in a generic manner that fixes all the fs at once
  (OTOH probably not that many fs need to be fscked online...).

- mmap should be functional but it's totally untested.

- currently the last `harddisk_size & 4095' bytes (if any) won't be
  accessible via the blkdev, to avoid sending to the hardware requests
  beyond the end of the device. Not sure how/if to solve this. But this is
  definitely not a new issue, the same thing happens today in 2.2 and
  2.4 after you mount a 4k filesystem on a blockdevice. OTOH I'm scared
  a mke2fs -b 1024 could get confused. But I really don't want to
  decrease the b_size of the buffer header even if we fix this.

- to share all the filemap.c code and not to change too much stuff in
  the first patch I added some ISBLK check in fast paths, basically
  only to check against blk_size instead of inode->i_size, I also
  considered changing the i_size semantics for the blkdev inodes but
  I didn't want to break all the fs yet so I took the localized
  slower way for now (I doubt it is noticeable in the benchmarks
  but nevertheless it would be nice to optimize away those branches).

- once the blkdev is closed in the block_close callback I
  filemap_fdatasync;fsync_dev;filemap_fdatawait;invalidate_inode_pages2
  (fdatawait seems not necessary but it won't hurt). I'm not calling
  truncate_inode_pages because those pages could be still mapped
  (->release is called when f_count goes down to zero, not when
  i_count reaches zero). I'd like to defer the 
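
As a rough illustration of the `harddisk_size & 4095` point in the list above, the arithmetic can be written out in C. The helper names and the `BLKDEV_GRANULARITY` macro are made up for this sketch; only the 4096-byte figure comes from the patch description.

```c
#include <stdint.h>

/* Block granularity the patch description uses for all blkdev I/O. */
#define BLKDEV_GRANULARITY 4096u

/* Bytes past the last whole 4096-byte block: the unreachable tail. */
uint64_t tail_bytes(uint64_t dev_size)
{
    return dev_size & (BLKDEV_GRANULARITY - 1);
}

/* Bytes that stay reachable when all I/O is done in whole blocks. */
uint64_t reachable_bytes(uint64_t dev_size)
{
    return dev_size & ~(uint64_t)(BLKDEV_GRANULARITY - 1);
}
```

For a device of 10 MiB plus three 512-byte sectors, `tail_bytes` gives 1536 and `reachable_bytes` gives exactly 10 MiB, so the three trailing sectors are the part that cannot be addressed.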

ECN: Volunteers needed

2001-05-08 Thread jamal


Folks,

ECN is about to become a Proposed Standard RFC. Thanks to
efforts from the Linux community, a few issues were discovered
in the course of deploying the code. Special kudos go to Alexey
Kuznetsov and David Miller.

I won't go into details of the issues other than to say some
middle-box vendors in the past have taken the natural-language
English word "reserved" to have a different meaning.
Visit Jeff Garzik's ECN-under-Linux Unofficial Vendor Support Page
at: http://gtf.org/garzik/ecn/ for more details.

Sally Floyd explains best why it is wrong for vendors of middle boxes to
be doing this in the draft to be found at:
ftp://ftp.normos.org/ietf/internet-drafts/draft-floyd-tcp-reset-00.txt
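
For readers who want to see which bits are involved, here is a small C sketch of how an ECN-capable SYN differs from a plain one. The bit values follow the TCP header layout later published in RFC 3168, which this message does not spell out; the `FLAG_*` names and `is_ecn_setup_syn` are made up for illustration.

```c
#include <stdint.h>

/* TCP flag bits in the flags byte of the TCP header. */
enum {
    FLAG_SYN = 0x02,
    FLAG_RST = 0x04,
    FLAG_ACK = 0x10,
    FLAG_ECE = 0x40,   /* ECN-Echo, formerly a reserved bit */
    FLAG_CWR = 0x80    /* Congestion Window Reduced, formerly reserved */
};

/*
 * Returns 1 if `flags` is an ECN-setup SYN: SYN, ECE and CWR set,
 * ACK and RST clear. Returns 0 otherwise.
 */
int is_ecn_setup_syn(uint8_t flags)
{
    uint8_t want = FLAG_SYN | FLAG_ECE | FLAG_CWR;

    return (flags & (want | FLAG_ACK | FLAG_RST)) == want;
}
```

A middle box that treats any set "reserved" bit as hostile will answer such a SYN with a RST or drop it, which is the behaviour the draft argues against.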

So why am i posting this?

This is to solicit volunteers who will help remove the remaining cruft.
Some vendors (special positive mention goes to CISCO) have released
patches which are unfortunately not being propagated by some of the
site owners.
Help is needed to contact these site owners and politely tell them, using
a standard email, that their site is non-conformant.
Point them to Sally's draft and the fact that ECN is becoming standard
in the next week or so. Also to Jeff's ECN-under-Linux Unofficial
Vendor Support Page, and to encourage them to have their firewall
or load-balancer upgraded.
I suppose the first volunteer needed is to draft such an email. We have to
be polite and persistent for this to work.

Jitendra Padhye at ACIRI is running weekly tests to detect offending
sites. Most recent results can be found at:
http://www.aciri.org/tbit/ecn_test3A.html
Any site with the word "RST" on the line should be considered
non-conformant.

Volunteers please send an email to [EMAIL PROTECTED] with subject "interested in
volunteering"

Flames etc. please redirect to netdev (since that's the only list I am on),
and make sure you cc the other people (other than linux-kernel and
linux-net).

cheers,
jamal






Re: page_launder() bug

2001-05-08 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> 
> Jonathan Morton writes:
>  > >-  page_count(page) == (1 + !!page->buffers));
>  > 
>  > Two inversions in a row?
> 
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

Overall, I'd have to say that this:

-   dead_swap_page =
-   (PageSwapCache(page) &&
-page_count(page) == (1 + !!page->buffers));
-

Is nicer as:

int dead_swap_page = 0;

if (PageSwapCache(page)
&& page_count(page) == (page->buffers ? 2 : 1))
dead_swap_page = 1;

After all, the second is what the code *means* (1 and 2 are magic
numbers).

That said, anyone who doesn't understand the former should probably
get some more C experience before commenting on others' code...
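
For anyone who has not met the idiom, a standalone C sketch shows what `!!` does to a pointer and which explicit form matches `1 + !!buffers`. The function names are made up, and `buffers` is just a plain pointer here, not the kernel's page structure.

```c
#include <stddef.h>

/*
 * 1 if p is non-NULL, 0 if it is NULL. The first ! yields 1 for NULL
 * and 0 otherwise; the second ! flips that back.
 */
int ptr_to_bool(const void *p)
{
    return !!p;
}

/* Terse form: 2 when buffers are attached, 1 when they are not. */
int expected_count_terse(const void *buffers)
{
    return 1 + !!buffers;
}

/* Explicit form that spells out the same two magic numbers. */
int expected_count_explicit(const void *buffers)
{
    return buffers ? 2 : 1;
}
```

Both count functions return the same value for every pointer, so the choice between them is purely about readability.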

Rusty.
--
Premature optmztion is rt of all evl. --DK



Re: 2.2.19 + reiserfs 3.5.32 nfsd wait_on_buffer/down_failed

2001-05-08 Thread Chris Mason



On Tuesday, May 08, 2001 04:42:43 PM +0200 Michael Stiller <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> we run a nfs server utilizing 2.2.19 + ReiserFS version 3.5.32 on a
> P 3 550 machine. Disk subsystem is a GDT7518RN using 4 UW disks as raid 5
> device. After upgrading from 2.2.17 + reiserfs to 2.2.19 we experience
> many (very much more than with 2.2.17) problems with our nfs clients
> about 12 (linux). Network is 100Mbit full duplex / switched.
> I do not think this is network related, because ping -f doesn't show any
> packet loss.
> 
> During not so heavy IO on the exported fs
> one nfsd thread seems to be waiting for the disk:

Are you running any patches to make knfsd deal with the reiserfs iget issues?

-chris




Re: page_launder() bug

2001-05-08 Thread David S. Miller


Marcelo Tosatti writes:
 > Ok, this patch implements that thing and also changes ext2+swap+shm
 > writepage operations (so I could test the thing).
 > 
 > The performance is better with the patch on my restricted swapping tests.

Nice.  Now the only bit left is moving the referenced bit
checking and/or state into writepage as well.  This is still
part of the plan right?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: REVISED: Experimentation with Athlon and fast_page_copy

2001-05-08 Thread Tom Leete

Alan Cox wrote:
> 
> > the memory copy in the fast_page_copy routine.  The machine then
> > proceeded
> > not to stop at my panic, but I got my "normal" oopses.  I then had an
> 
> Ok
> 
> > idea and removed all the prefetch instructions from the beginning of the
> > routine and tried the resulting kernel.  I now have no crashes.
> > What could this mean?
> 
> I think it has to mean a hardware problem.

I don't think so, reasons below
 
> What still stands out is that exactly _zero_ people have reported the same
> problem with non VIA chipset Athlons.

Not any more :-(

Hi Alan,

IIRC this thread is about boot going catatonic right after unloading
__initmem. I'm seeing that in 2.4.5-pre1 with Athlon stepping 2, AMD 751,
MS-6195 mobo, 128M. The machine is fine with kernels up through
2.4.4-pre3, and still works with them.

On that gear, there is no crash. The keyboard and display are alive and
SysRq works. I have copied the stack trace for pid=1 and the processor
dump. I'm short of time but I have a kind typist electrifying the trace,
and I'll try to generate something ksymoops can digest.

Here is what a quick eyeballing of System.map shows.

The code is at the end of init/main.c:init(). The processor dump shows
init() halted in default_idle() from the sequence L6 -> init -> cpu_idle.

Trace of pid 1 shows it stuck in D state. The last addresses listed are from
filemap_nopage -> do_execve -> do_no_page -> handle_mm_fault -> __pmd_alloc
-> rwsem_down_write_failed -> stext_lock -> system_call. That looks fishy.

Earlier, it looks like handle_mm_fault is being triggered from
fast_clear_page.

I'll post the full dump soon as I have it.

Btw, above happens with both gcc-2.95.3 and gcc-3.0-[20010423] compiled
kernels.

Cheers,
Tom

-- 
The Daemons lurk and are dumb. -- Emerson



Re: 2.4.4 Kernel - ASUS CUV4X-DLS Question

2001-05-08 Thread J. S. Connell

All,

I unfortunately don't have the time this evening to produce actual kernel
messages, but I did want to throw out that I have an ASUS CUV4X-DLS board
too, with two PIII/1GHz processors in it, and I cannot get it to boot an
SMP kernel at all.  In addition to the built-in devices, the following
cards are also present:

* Netgear FA310TX (rev. D, I believe - lspci reports it as a Lite-On
  LNE100TX rev 21)
* Promise PDC20267 Ultra100 controller
* Creative SB Live! Value
* Matrox G200 AGP

When it gets to the point of activating the second processor, kernel
2.4.3-ac13 starts spewing:


probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
probable hardware bug: restoring chip configuration.


continuously.  Older kernels simply hang at this point.  I'll try to get
the actual messages leading up to this tomorrow.  Also, if there's
any other information I can collect from my system that could help,
feel free to ask.  I'll also build 2.4.4-ac6 on it tomorrow and try
booting it SMP. Here's the output of lspci -vv:

00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
Subsystem: Asustek Computer, Inc.: Unknown device 8038
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- 
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP] 
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- Reset- FastB2B-
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
Subsystem: Asustek Computer, Inc.: Unknown device 8038
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ 
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-  [disabled] [size=256K]

00:0b.0 Unknown mass storage controller: Promise Technology, Inc. 20267 (rev 02)
Subsystem: Promise Technology, Inc.: Unknown device 4d33
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-  [disabled] [size=64K]
Capabilities: [58] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0c.0 Multimedia audio controller: Creative Labs SB Live! EMU1 (rev 08)
Subsystem: Creative Labs CT4832 SBLive! Value
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- 
Starting kswapd v1.8
Winbond Super-IO detection, now testing ports 3F0,370,250,4E,2E ...
SMSC Super-IO detection, now testing Ports 2F0, 370 ...
0x378: FIFO is 16 bytes
0x378: writeIntrThreshold is 8
0x378: readIntrThreshold is 8
0x378: PWord is 8 bits
0x378: Interrupts are ISA-Pulses
0x378: ECP port cfgA=0x10 cfgB=0x00
0x378: ECP settings irq= dma=
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,COMPAT,ECP]
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport_pc: Via 686A parallel port: io=0x378
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
lp0: using parport0 (polling).
block: queued sectors max/low 169506kB/56502kB, 512 slots per queue
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 21
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:04.1
ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA
PDC20267: IDE controller on PCI bus 00 dev 58
PCI: Found IRQ 10 for device 00:0b.0
PCI: The same IRQ used for device 00:07.0
PDC20267: chipset revision 2
PDC20267: not 100% native mode: will probe irqs later
PDC20267: (U)DMA Burst Bit ENABLED Primary PCI 

No Subject

2001-05-08 Thread abhilash s

Sir,
  I am a Linux fan from India and I am eager to
know about emerging technologies. Please inform me
at

[EMAIL PROTECTED]





Re: page_launder() bug

2001-05-08 Thread Marcelo Tosatti



On Tue, 8 May 2001, Linus Torvalds wrote:

> 
> 
> On Tue, 8 May 2001, Marcelo Tosatti wrote:
> >
> > There are two issues which I missed yesterday: we have to get a reference
> > on the page, mark it clean, drop the locks and then call writepage(). If
> > the writepage() fails, we'll have to set_page_dirty(page).
> 
> We can move the "mark it clean" into writepage, which would actually
> simplify the error cases for shared memory writepage (no need to mark it
> dirty again etc).
> 
> > I guess this is too much overhead for the common case, don't you?
> 
> You could easily be right.
> 
> On the other hand, remember that a noticeable part of the time you should
> be seeing a real write too, so the CPU overhead compared to the IO might
> not be prohibitive. Ie, let's assume that 10% of the time we actually end
> up doing writes, then that 10% is going to be _soo_ much more than the
> extra 10 cycles 90% of the time that the cleanup may well be worth it.
> 
> Especially if the cleanup means that we can avoid doing some of the real
> writes altogether, by being better able to release dead memory to the
> system.

Ok, this patch implements that thing and also changes ext2+swap+shm
writepage operations (so I could test the thing).

The performance is better with the patch on my restricted swapping tests.

If you don't have any problems with this, I'll fix the other
writepage's (so tell me if it's OK with you).


diff -Nur --exclude-from=exclude linux.orig/fs/buffer.c linux/fs/buffer.c
--- linux.orig/fs/buffer.c  Mon May  7 20:47:26 2001
+++ linux/fs/buffer.c   Tue May  8 22:04:00 2001
@@ -1933,12 +1933,17 @@
return err;
 }
 
-int block_write_full_page(struct page *page, get_block_t *get_block)
+int block_write_full_page(struct page *page, get_block_t *get_block, int priority)
 {
struct inode *inode = page->mapping->host;
unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
unsigned offset;
int err;
+
+   if (!priority)
+   return -1;
+
+   ClearPageDirty(page);
 
/* easy case */
if (page->index < end_index)
diff -Nur --exclude-from=exclude linux.orig/fs/ext2/inode.c linux/fs/ext2/inode.c
--- linux.orig/fs/ext2/inode.c  Mon May  7 20:47:26 2001
+++ linux/fs/ext2/inode.c   Tue May  8 20:46:54 2001
@@ -650,9 +650,9 @@
return NULL;
 }
 
-static int ext2_writepage(struct page *page)
+static int ext2_writepage(struct page *page, int priority)
 {
-   return block_write_full_page(page,ext2_get_block);
+   return block_write_full_page(page,ext2_get_block,priority);
 }
 static int ext2_readpage(struct file *file, struct page *page)
 {
diff -Nur --exclude-from=exclude linux.orig/include/linux/fs.h linux/include/linux/fs.h
--- linux.orig/include/linux/fs.h   Tue May  8 16:45:42 2001
+++ linux/include/linux/fs.hTue May  8 22:22:38 2001
@@ -362,7 +362,7 @@
 struct address_space;
 
 struct address_space_operations {
-   int (*writepage)(struct page *);
+   int (*writepage)(struct page *, int);
int (*readpage)(struct file *, struct page *);
int (*sync_page)(struct page *);
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
@@ -1268,7 +1268,7 @@
 /* Generic buffer handling for block filesystems.. */
 extern int block_flushpage(struct page *, unsigned long);
 extern int block_symlink(struct inode *, const char *, int);
-extern int block_write_full_page(struct page*, get_block_t*);
+extern int block_write_full_page(struct page*, get_block_t*, int);
 extern int block_read_full_page(struct page*, get_block_t*);
 extern int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
 extern int cont_prepare_write(struct page*, unsigned, unsigned, get_block_t*,
diff -Nur --exclude-from=exclude linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c Mon May  7 20:47:26 2001
+++ linux/mm/filemap.c  Tue May  8 22:22:50 2001
@@ -411,7 +411,7 @@
  */
 void filemap_fdatasync(struct address_space * mapping)
 {
-   int (*writepage)(struct page *) = mapping->a_ops->writepage;
+   int (*writepage)(struct page *, int) = mapping->a_ops->writepage;
 
spin_lock(&pagecache_lock);
 
@@ -430,8 +430,7 @@
lock_page(page);
 
if (PageDirty(page)) {
-   ClearPageDirty(page);
-   writepage(page);
+   writepage(page, 1);
} else
UnlockPage(page);
 
diff -Nur --exclude-from=exclude linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c   Mon May  7 20:47:26 2001
+++ linux/mm/shmem.cTue May  8 22:23:01 2001
@@ -221,13 +221,16 @@
  * once.  We still need to guard against racing with
  * shmem_getpage_locked().  
  */
-static int shmem_writepage(struct page * page)
+static int shmem_writepage(struct page * page, int priority)
 {
int error = 0;
struct shmem_inode_info *info;
swp_entry_t 

Re: SPARC include problem

2001-05-08 Thread Sean Jones

Sean Jones wrote:
> 
> "David S. Miller" wrote:
> >
> > Sean Jones writes:
> >  > In compiling 2.4.4-ac5 for my SPARCStation 20, I had an error in the
> >  > compile resulting from the inability to find a hw_irq.h in the
> >  > include/asm directory. Do you know where I may be able to find such a
> >  > file?
> >
> > How did you find this problem if the build couldn't find the
> > "bzImage" rule? :-)
> >
> > Later,
> > David S. Miller
> > [EMAIL PROTECTED]
> 
> I found it by kicking the make stuff around one more time after I sent
> that e-mail.
> 
> Sean



Re: nfsd from kernel 2.4.4 oops

2001-05-08 Thread Neil Brown

On Tuesday May 8, [EMAIL PROTECTED] wrote:
> Hi,
> 
> I'm using kernel 2.4.4 cvs from SGI, with xfs. I'm getting this Oops:
> 
> kernel: Unable to handle kernel NULL pointer dereference at virtual address 0010
> kernel:  printing eip:
> kernel: c017bfd8
> kernel: *pde = 
> kernel: Oops: 
> kernel: CPU:0
> kernel: EIP:0010:[nfsd_findparent+120/236]
> kernel: EIP:0010:[]
> kernel: EFLAGS: 00010246
> kernel: eax:    ebx:    ecx: cff8d458   edx: 0010
> kernel: esi: cb22c6a0   edi: cb22c720   ebp: cb22c720   esp: ce4c9e54
> kernel: ds: 0018   es: 0018   ss: 0018
> kernel: Process nfsd (pid: 592, stackpage=ce4c9000)
> kernel: Stack:  1802280f c017c416 cb22c720  ce4cf814 1127 
>ce4cf804
> kernel:c03c5740 cfe3b5c8 000e ff8c  c017c7c4 cfe3b400 
>1802280f
> kernel:  0001 ce4cf804 0008 cb1fc77c ce4cfc00 
>ceb7b000
> kernel: Call Trace: [find_fh_dentry+598/928] [fh_verify+612/1128] 
>[nfsd_lookup+110/1368] [nfsd3_proc_lookup+314/332] [nfs3svc_decode_diropargs+152/268] 
>[nfsd_dispatch+203/360] [svc_process+684/1348]
> kernel: Call Trace: [] [] [] [] [] 
>[] []
> 

nfsd_findparent+120/236 corresponds to line 257 of fs/nfsd/nfsfh.c
and the condition of the "if" statement:
	if (aliases->next != aliases) {
just after the "spin_lock(&dcache_lock)".
eax == 0 implies that tdentry->d_inode == NULL, and hence the oops.
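
A small sketch of why a NULL d_inode would fault at address 0x10 rather
than at 0: the "if" reads a list head that is embedded inside the inode, so
the address touched is the offset of that member. The struct below is an
assumed, simplified 32-bit model of the leading fields of the 2.4 struct
inode (the layout is not quoted anywhere in this thread), and
aliases_next_address() is a made-up helper, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit model of a doubly linked list head: two 4-byte pointers. */
struct list_head32 {
    uint32_t next;
    uint32_t prev;
};

/* ASSUMPTION: leading fields of the 2.4 inode, modelled for a 32-bit CPU. */
struct inode32_model {
    struct list_head32 i_hash;
    struct list_head32 i_list;
    struct list_head32 i_dentry;   /* the alias list walked by nfsd */
};

/*
 * Address the CPU would read when evaluating aliases->next, where
 * aliases = &inode->i_dentry and inode_addr is the (possibly NULL)
 * inode pointer value.
 */
static uint32_t aliases_next_address(uint32_t inode_addr)
{
    return inode_addr
         + (uint32_t)offsetof(struct inode32_model, i_dentry)
         + (uint32_t)offsetof(struct list_head32, next);
}
```

With a NULL inode pointer, aliases_next_address(0) comes out as 0x10 in
this model, which would match the faulting address in the oops if the
assumed layout is right.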

d_inode being NULL here implies that the "lookup" of ".." failed
to find a ".." entry, which is very odd.

I find it hard to believe that ext2fs would ever do this unless the
filesystem was corrupt.  XFS might, I don't know.

I guess nfsd should be robust against this sort of behaviour in
filesystems.

Something like:

--- nfsfh.c 2001/05/09 00:54:56 1.1
+++ nfsfh.c 2001/05/09 00:56:01
@@ -244,6 +244,10 @@
 */
pdentry = child->d_inode->i_op->lookup(child->d_inode, tdentry);
d_drop(tdentry); /* we never want ".." hashed */
+   if (!pdentry && tdentry->d_inode == NULL) {
+   dput(tdentry);
+   pdentry = ERR_PTR(-EINVAL);
+   }
if (!pdentry) {
/* I don't want to return a ".." dentry.
 * I would prefer to return an unconnected "IS_ROOT" dentry,


Is probably the best fix for knfsd, but someone should find out why
XFS isn't finding ".." when asked (If that is indeed what is
happening).

NeilBrown


> 
> It happens quite randomly. Some people (I read this on the xfs list) get a
> similar error, and it was also tested with a clean 2.4.4 on an ext2
> filesystem, which oopses too. I think this is related to the nfsd code
> (maybe the sunrpc code), and not to the xfs code.



Re: pci_pool_free from IRQ

2001-05-08 Thread David S. Miller


David Brownell writes:
 > Pete's patch to pci_pool_free() is fine with me, and I'd be glad
 > to see that bit of pci interface cleaned up.  Any changes needed
 > other than the pci.txt doc update?

Ummm... What Alan's saying is:

1) Whatever driver is trying to shut down from IRQ context
   is broken and must be fixed.  pci_pool is fine.

2) The Documentation/ files which suggest that such device
   removal from IRQs is "OK" must be fixed because it is not
   "OK" to handle device removal from IRQ context.

So Pete's change is not needed.  A fix for the documentation and
broken drivers is needed instead.
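
The rule in the two points above can be pictured with a small userspace
model: the interrupt handler only records that the device went away, and
the actual teardown runs later from ordinary process context. This is a
sketch under stated assumptions, not driver code; struct device_model and
all function names here are hypothetical.

```c
#include <stdbool.h>

struct device_model {
    bool in_irq;            /* true while the simulated IRQ handler runs */
    bool removal_pending;   /* set by the handler, consumed later */
    bool resources_freed;   /* stands in for freeing pools, buffers, ... */
};

/* Teardown is only allowed outside interrupt context in this model. */
static bool teardown(struct device_model *d)
{
    if (d->in_irq)
        return false;       /* the "broken driver" case: refuse */
    d->resources_freed = true;
    d->removal_pending = false;
    return true;
}

/* Simulated interrupt handler: note the event, free nothing. */
static void irq_handler(struct device_model *d, bool device_gone)
{
    d->in_irq = true;
    if (device_gone)
        d->removal_pending = true;
    d->in_irq = false;
}

/* Simulated process-context work that performs the deferred removal. */
static void process_context_work(struct device_model *d)
{
    if (d->removal_pending)
        teardown(d);
}
```

After irq_handler(&dev, true) the device is only marked for removal;
resources are released by the later process_context_work(&dev) call.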

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Linux 2.4.4-ac6

2001-05-08 Thread Dan Podeanu

On Wed, 9 May 2001, Alan Cox wrote:

...
> 2.4.4-ac6
...

To be honest, I was expecting the Athlon pre-pre-pre-patch/fix to be
included.





Patch to make ymfpci legacy address 16 bits

2001-05-08 Thread Pete Zaitcev

Hi:

I found that every time I run a 2.4 kernel on my laptop, APM locks up
the machine. Apparently, the legacy YMF code enabled decoding of only
10 bits of the I/O address. A call to the APM BIOS touched that and
somehow the system locked up.

If Pavel Roskin, Daisuke Nagano or anyone else does not mind,
I would like this in the stock kernel.
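
To illustrate the 10-bit versus 16-bit decode difference: a device that
compares only the low 10 address bits also answers at every alias of its
base port, which is one plausible way an unrelated port access could reach
it. A minimal sketch; the macro name LEGCTRL_DECODE_10BIT, the helper
device_claims() and the example base port 0x388 are assumptions made for
this example, only the 0x0020 bit meaning comes from the patch comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per the patch comment: 1 = 10 bits of I/O address decoded, 0 = 16 bits.
 * The macro name itself is hypothetical. */
#define LEGCTRL_DECODE_10BIT 0x0020u

/*
 * Would a device configured at `base` respond to an access at `port`
 * when only the low `bits` address bits are compared?
 */
static bool device_claims(uint16_t port, uint16_t base, unsigned bits)
{
    uint16_t mask = (bits >= 16) ? 0xFFFFu
                                 : (uint16_t)((1u << bits) - 1u);
    return (port & mask) == (base & mask);
}
```

Clearing the bit turns the old value 0x003e into the new 0x001e, and with
full 16-bit decode an access to 0x788 no longer aliases onto a device at
0x388.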

-- Pete

--- linux-2.4.4/drivers/sound/ymfpci.c  Thu Apr 26 22:17:27 2001
+++ linux-2.4.4-niph/drivers/sound/ymfpci.c Tue May  8 16:46:58 2001
@@ -2059,9 +2059,10 @@
}
 
if (mpuio >= 0 || oplio >= 0) {
-   v = 0x003e;
+   /* 0x0020: 1 - 10 bits of I/O address decoded, 0 - 16 bits. */
+   v = 0x001e;
pci_write_config_word(pcidev, PCIR_LEGCTRL, v);
-   
+
switch (pcidev->device) {
case PCI_DEVICE_ID_YAMAHA_724:
case PCI_DEVICE_ID_YAMAHA_740:



Re:

2001-05-08 Thread Andrew Morton

"Richard B. Johnson" wrote:
> 
> To driver wizards:
> 
> I have a driver which needs to wait for some hardware.
> Basically, it needs to have some code added to the run-queue
> so it can get some CPU time even though it's not being called.
> 
> It needs to get some CPU time which can be "turned on" or
> "turned off" as a result of an interrupt or some external
> input from  an ioctl().

schedule_task()?



Linux 2.4.4-ac6

2001-05-08 Thread Alan Cox


ftp://ftp.kernel.org/pub/linux/kernel/people/alan/2.4/

Intermediate diffs are available from

http://www.bzimage.org

2.4.4-ac6
o   Revert dead swap patch pending fixes            (Dave Miller)
o   Allow arch specific writeproc/DMA for IDE       (Bjorn Wesen)
o   Move to aic7xxx 6.1.13                          (Justin Gibbs)
o   Use pci_set_master on eni.c                     (Jeff Garzik)
o   Update wireless drivers, add airport            (Jean Tourrilhes,
                                                     Benjamin Herrenschmidt)
o   Add new pci ids, clean up dup defines in eicon  (Jeff Garzik)
o   Add module loader to kernel docs                (Erik Mouw)
o   Fix wanrouter makefile bug                      (Arnaldo Carvalho de Melo)
o   Add another pair of idents to the yenta driver  (Alexandr Kanevskiy)
o   Parport fixes for 1284 mode                     (Fred Barnes)
o   Update 8139too driver to handle wakeup bug      (Jeff Garzik)
o   Add koi8-ru locale                              (Andrzej Krzysztofowicz)
o   Add ICH3 to the i810 audio driver               (Tom Woller)
o   Improve (hopefully) the confusing I82365 help   (me)
o   Fix a bug in koi8-u tables                      (Andrzej Krzysztofowicz)
o   Fix a bug in UTF8->CP1255                       (Andrzej Krzysztofowicz)
o   Fix a bug in iso8859-13 tables                  (Andrzej Krzysztofowicz)
o   Update gdth driver to current vendor release    (Achim Leubner)
o   Kill cpia_write_proc (it's insecure)            (Al Viro, me)
o   Fix unterminated array strtoul() in comx        (Al Viro)
o   Fix TCP send path leak                          (Dave Miller)
o   Restore older skb_cow() headroom behaviour      (Dave Miller)
o   Fix ipv6 oops                                   (Dave Miller)
o   Small ipx tidy up                               (Arnaldo Carvalho de Melo)
o   Fix unprotected userspace reference in trident  (Al Viro)
    audio
o   Fix expand stack locking                        (Manfred Spraul)
o   Fix offslab_limit calculation                   (Manfred Spraul)
o   EATA and U14F updates                           (Dario Ballabio)
o   Update scsi generic to 3.1.18                   (Doug Gilbert)
o   Clean up abs()                                  (Kai Germaschewski)
    | This needs further checking
o   ymfpci update                                   (Pete Zaitcev)
o   Quota code updates                              (Jan Kara)
o   Clean up eicon include abuse                    (me)

2.4.4-ac5
o   Fix DMA setup on hpt366/370                     (Tim Hockin)
o   DRM memory alloc failure checks                 (Akash Jain)
o   Remove bogus fs/buffer.c diff                   (Ben LaHaise)
o   cs46xx update - adds Hercules Game Theatre XP   (Thomas Woller)
o   Fix menuconfig breakage with ()                 (Andrzej Krzysztofowicz)
o   Updated multithreaded core dump support         (Don Dugger)
o   Remove dead ibmtr.h include                     (Mike Phillips)
o   Fix misplaced letters in koi8-u                 (Andriy Rysin)
o   Further alpha module locking fix                (Andrea Arcangeli)
o   Keyspan bitwidth fixes                          (Hugh Blemings)
o   usb-uhci oops fix                               (Pete Zaitcev)
o   Add ability to specify preferred minor on       (Gerd Knorr)
    video/radio4linux devices
o   Further IPX updates                             (Arnaldo Carvalho de Melo)
o   Further IRDA updates                            (Dag Brattli)
o   Make x86 ptrace framesize a define (code clean) (Pavel Machek)
o   Moxa serial tidy                                (Tim Hockin)
o   Fix tiny select race                            (Rusty Russell)
o   Update aic7xxx to 6.1.12                        (Justin Gibbs)
o   Alpha was missing rwlock_init                   (Reto Baettig)
o   Alpha SCHED_YIELD was broken on UP              (Andrea Arcangeli)
o   Allow IRQ sharing on more PCI ide               (Pete Zaitcev)
o   Fix capable checks found by Stanford analyser   (me)
    for cciss/cpqarray
o   List more devices in sysrq table                (Andrzej Krzysztofowicz)
o   Run uml exit callbacks reverse to init          (Andrew Morton)
o   Fix SMP resched_idle pre-emption bug            (Nigel Gamble)
o   Work around config problem with menuconfig      (Andrzej Krzysztofowicz)
    and USB
o   Fix nasty bug in Alpha PCI mapping              (Hyung Min SEO)
    | Nautilus specific stuff not applied yet
o   SBLive endianness fixes (output only so far)    (Ira Weiny)
o   Move sblive pci_enable earlier                  (Marcus Meissner)
o   Merge IBM ServeRAID 4.72 driver                 (Keith Mitchell)
o   Fix affs 

Re: [patch] 2.4.4: mmap() fails for certain legal requests

2001-05-08 Thread David S. Miller


Alan Cox writes:
 > And just how is he going to test it ? Considering he was just
 > asking if the concept was reasonable I think you are a little out
 > of order

I can't test every platform when I have to make such changes.
But it always serves to show the port maintainer "what" the
change was.

Yes, I am slightly out of order if the intent is just "does
this idea look fine" (which it does btw, I can't find any
problems with it).

I apologize to Maciej, but I do implore him to actually do the
final bits for the other ports when he makes his final patch
submission.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [patch] 2.4.4: mmap() fails for certain legal requests

2001-05-08 Thread Maciej W. Rozycki

On Tue, 8 May 2001, David S. Miller wrote:

> That's pretty arrogant that cut and pasting a few lines into some
> architecture specific files and reporting the updated patch is too
> much to ask.

 I'm sorry if you find me arrogant -- that certainly was not my intent.  I
did look at the files and changes are not as trivial as cut and paste.

> Perhaps reviewing your change is also, too much to ask.  Perhaps
> we are too lazy and short on time to have a look at your change.

 Well, I've been using similar changes since July.  I may live with
patches forever and be fine.  Still this is not the point with free
software.  It would be malicious if I had a fix and I wouldn't share it. 
Sooner or later someone would discover the problem again and would waste
time to track it down unnecessarily.  And again, and again...

> I don't think it's asking a lot to provide a complete change.

 It's not a lot, supposedly, but look at the case from my point of view. 
It's a bugfix and not a new feature.  I've invested a few hours in finding
the cause of a weird bug on a MIPS/Linux machine.  I am providing a ready
solution that works for most architectures with the exception of a few
ones I'm not familiar with. 

 Well, it's great I have an opportunity to get better knowledge of these
architectures, but I cannot always afford it and I know there are people
who already have enough knowledge to be sure the bits get written correctly
immediately.  I never hesitate to do the job myself in the areas I am
familiar with or when I have enough free time (and I do have, from time to
time).  I don't have time currently, I am afraid (basically I am now
stealing the time I would otherwise spend sleeping for a task that was
quite low on my priority list) and I am sure someone familiar with the
specific ports would spend less time than I do.  Finally I do consider my
time worth as much as anyone else's, so why should I have to spend x units
of time, whilst someone else would only spend x/2 or x/3 or whatever...  Of
course I consider this rule to work both ways.

> I'm sure the MIPS folks know all too well what it's like when their
> port is crapped up because someone only made changes to x86 port
> portions.  At least for me, after working on Sparc for some time,
> I'm adamant about providing complete changes so that this kind of
> grief is avoided for other port maintainers.

 The port gets crapped up from time to time, although Ralf is doing a great
job to keep it fine, so it's more that specific MIPS hosts lag behind the
rest of the kernel.  Still I consider it the specific maintainer's job to
get things synchronized.  It just works better this way.

> In the time you used to compose your response to me, and now
> to read this email from me, you could have fixed up the patch
> perhaps 2 or 3 times.  Just do it and get it over with ok?

 I'm not so sure, I'm afraid, especially at this time of the day.  Check
timestamps of mails if curious... 

> Dziekuje.

 Nie za ma co. ;-)

 A patch follows.  Architecture-specific changes are completely untested. 
I hope I got things right, otherwise I'll consider my time wasted.

 BTW, I've noticed the "if (flags & MAP_FIXED)" statements in
arch_get_unmapped_area() in arch/sparc*/kernel/sys_sparc.c are dead code
now, as get_unmapped_area() in mm/mmap.c never calls it if MAP_FIXED is
set in flags.
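
 For readers who want the idea without reading every port: honour the
caller's address hint when that range is free, otherwise fall back to the
usual first-fit search from the default base. What follows is a minimal
userspace model, not kernel code. struct region, pick_area() and the
MODEL_* constants are assumptions made for the example, failure is
reported as 0 where the kernel would return -ENOMEM, and the cache colour
alignment for shared mappings is left out.

```c
#include <stddef.h>

/* One mapped range [start, end); the array is sorted and non-overlapping. */
struct region {
    unsigned long start;
    unsigned long end;
};

#define MODEL_PAGE_SIZE      0x1000UL
#define MODEL_TASK_SIZE      0xC0000000UL   /* assumed address-space limit */
#define MODEL_UNMAPPED_BASE  0x40000000UL   /* assumed default search base */
#define MODEL_PAGE_ALIGN(a) \
    (((a) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* Index of the first region whose end lies above addr (n if none). */
static size_t first_region_above(const struct region *r, size_t n,
                                 unsigned long addr)
{
    size_t i = 0;
    while (i < n && r[i].end <= addr)
        i++;
    return i;
}

/* Pick an address for a mapping of `len` bytes; 0 means "no room". */
static unsigned long pick_area(const struct region *r, size_t n,
                               unsigned long hint, unsigned long len)
{
    unsigned long addr;
    size_t i;

    if (len > MODEL_TASK_SIZE)
        return 0;

    if (hint) {
        /* Try the hint first: it must fit below the limit and in a gap. */
        addr = MODEL_PAGE_ALIGN(hint);
        i = first_region_above(r, n, addr);
        if (MODEL_TASK_SIZE - len >= addr &&
            (i == n || addr + len <= r[i].start))
            return addr;
    }

    /* Fall back to a first-fit search from the default base. */
    addr = MODEL_PAGE_ALIGN(MODEL_UNMAPPED_BASE);
    for (i = first_region_above(r, n, addr); ; i++) {
        if (MODEL_TASK_SIZE - len < addr)
            return 0;
        if (i == n || addr + len <= r[i].start)
            return addr;
        addr = r[i].end;
    }
}
```

 A free hint such as 0x60000123 is rounded up to a page boundary and used
directly, while a hint that collides with an existing region ends up at
the first gap above the default base.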

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

diff -up --recursive --new-file linux-2.4.4.macro/arch/ia64/kernel/sys_ia64.c 
linux-2.4.4/arch/ia64/kernel/sys_ia64.c
--- linux-2.4.4.macro/arch/ia64/kernel/sys_ia64.c   Mon May  7 16:43:50 2001
+++ linux-2.4.4/arch/ia64/kernel/sys_ia64.c Tue May  8 23:25:49 2001
@@ -28,13 +28,22 @@ arch_get_unmapped_area (struct file *fil
 
if (len > RGN_MAP_LIMIT)
return -ENOMEM;
-   if (!addr)
-   addr = TASK_UNMAPPED_BASE;
 
+   if (addr) {
+   if (flags & MAP_SHARED)
+   addr = COLOR_ALIGN(addr);
+   else
+   addr = PAGE_ALIGN(addr);
+   vmm = find_vma(current->mm, addr);
+   if (TASK_SIZE - len >= addr &&
+   (rgn_offset(addr) + len <= RGN_MAP_LIMIT) &&
+   (!vmm || addr + len <= vmm->vm_start))
+   return addr;
+   }
if (flags & MAP_SHARED)
-   addr = COLOR_ALIGN(addr);
+   addr = COLOR_ALIGN(TASK_UNMAPPED_BASE);
else
-   addr = PAGE_ALIGN(addr);
+   addr = PAGE_ALIGN(TASK_UNMAPPED_BASE);
 
for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) {
/* At this point:  (!vmm || addr < vmm->vm_end). */
diff -up --recursive --new-file linux-2.4.4.macro/arch/sparc/kernel/sys_sparc.c 
linux-2.4.4/arch/sparc/kernel/sys_sparc.c
--- 

Re: page_launder() bug

2001-05-08 Thread Linus Torvalds



On Tue, 8 May 2001, Marcelo Tosatti wrote:
>
> There are two issues which I missed yesterday: we have to get a reference
> on the page, mark it clean, drop the locks and then call writepage(). If
> the writepage() fails, we'll have to set_page_dirty(page).

We can move the "mark it clean" into writepage, which would actually
simplify the error cases for shared memory writepage (no need to mark it
dirty again etc).
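
A toy comparison of the two orderings, to show why the error path gets
simpler. This is only a sketch: struct page_model and both function names
are invented for the example, and the can_write flag stands in for
whatever makes a real writepage give up (for shared memory, say, no swap
available).

```c
#include <stdbool.h>

struct page_model {
    bool dirty;
};

/* Caller marks the page clean first, so failure has to undo that. */
static int launder_caller_clears(struct page_model *p, bool can_write)
{
    p->dirty = false;           /* marked clean before calling writepage */
    if (!can_write) {
        p->dirty = true;        /* error path must remember to re-dirty */
        return -1;
    }
    return 0;
}

/* writepage marks the page clean itself, only once it knows it can write. */
static int writepage_clears(struct page_model *p, bool can_write)
{
    if (!can_write)
        return -1;              /* never marked clean, nothing to undo */
    p->dirty = false;
    return 0;
}
```

Both variants leave the page in the same state; the second one simply has
no window in which a page that was never written looks clean.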

> I guess this is too much overhead for the common case, don't you?

You could easily be right.

On the other hand, remember that a noticeable part of the time you should
be seeing a real write too, so the CPU overhead compared to the IO might
not be prohibitive. Ie, let's assume that 10% of the time we actually end
up doing writes, then that 10% is going to be _soo_ much more than the
extra 10 cycles 90% of the time that the cleanup may well be worth it.

Especially if the cleanup means that we can avoid doing some of the real
writes altogether, by being better able to release dead memory to the
system.
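
The arithmetic behind that estimate can be written out. The 10% and the
10 extra cycles are the figures from the paragraph above; the cost of a
real write is not given anywhere in this thread, so any value passed as
write_cycles is purely an assumption. cost_per_100_pages() is a made-up
helper.

```c
/* Cycles spent per 100 pages, split by what happens to each page. */
struct cost {
    unsigned long wasted;    /* extra cycles on calls that do not write */
    unsigned long writing;   /* cycles on calls that do a real write */
};

static struct cost cost_per_100_pages(unsigned long extra_cycles,
                                      unsigned long write_cycles,
                                      unsigned int pct_writes)
{
    struct cost c;

    c.wasted  = (100UL - pct_writes) * extra_cycles;
    c.writing = (unsigned long)pct_writes * write_cycles;
    return c;
}
```

With 10 extra cycles and 10% real writes the wasted part is 900 cycles per
100 pages; if a write is assumed to cost 100000 cycles, the writes account
for 1000000, so the added overhead is well under one part in a thousand.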

Tradeoffs..

Linus




Re: [patch] 2.4.4: mmap() fails for certain legal requests

2001-05-08 Thread Alan Cox

>  >  Thanks for your response, though -- maybe there is someone interested,
>  > after all. 
> 
> That's pretty arrogant that cut and pasting a few lines into some
> architecture specific files and reporting the updated patch is too
> much to ask.

And just how is he going to test it ? Considering he was just asking if the
concept was reasonable I think you are a little out of order




`smp_num_cpus' undeclared in 2.4.3

2001-05-08 Thread Jaswinder Singh

Dear linux-kernel mailing list,

I am trying to build 2.4.3 for an Intel machine.

But I am getting this error when I say no to 'CONFIG_SMP':

In file included from ksyms.c:17:
/usr/src/linux-2.4.3/include/linux/kernel_stat.h:48: `smp_num_cpus'
undeclared (first use in this function)
/usr/src/linux-2.4.3/include/linux/kernel_stat.h:48: (Each undeclared
identifier is reported only once
/usr/src/linux-2.4.3/include/linux/kernel_stat.h:48: for each function it
appears in.)
make[2]: *** [ksyms.o] Error 1


But when I say yes to 'CONFIG_SMP', there is no compilation error.
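
For context, one common way to keep code that loops over CPUs building in
both configurations is to give the symbol a constant fallback when SMP is
off. The sketch below only illustrates that pattern; it is not taken from
the 2.4.3 headers and is not claimed to be the fix for this report, and
sum_per_cpu() is a hypothetical stand-in for a loop of the kind the error
message points at.

```c
/* Leave CONFIG_SMP undefined to model a uniprocessor build. */
#ifdef CONFIG_SMP
extern int smp_num_cpus;        /* provided elsewhere on SMP builds */
#else
#define smp_num_cpus 1          /* uniprocessor: exactly one CPU */
#endif

/* Sum a per-CPU counter array over all CPUs known to the build. */
static unsigned int sum_per_cpu(const unsigned int *per_cpu)
{
    unsigned int sum = 0;
    int i;

    for (i = 0; i < smp_num_cpus; i++)
        sum += per_cpu[i];
    return sum;
}
```

Built without CONFIG_SMP, the loop runs once and only the first slot is
counted.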

I am attaching autoconf.h for reference.

Thanks ,

Best Regards,

Jaswinder.
--
These are my opinions not 3Di.


 autoconf.h


Child first after fork violates the SCHED_FIFO and SCHED_RR standard.

2001-05-08 Thread george anzinger

The standard says a SCHED_FIFO task only gives up the processor if it
blocks, yields, or changes its priority.  

The counter is not really used by SCHED_FIFO tasks, however the
update_process_times() code will set the "need_resched" flag on a
SCHED_FIFO task, even though schedule() effectively ignores the entry.
The attached patch addresses these issues by setting the counter to -100
for SCHED_FIFO tasks and "teaching" update_process_times() to not count
down negative counters.  This avoids calling schedule() every
jiffy while a SCHED_FIFO task is running.

I tried to keep the change to recalculate confined to only the data
elements it was already touching, however, the standard really doesn't
allow recalculate to touch the SCHED_RR counter.  A standard conforming
test would restrict recalculate to only SCHED_OTHER tasks.
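
For anyone who wants to poke at the counter logic outside the kernel, here is
a minimal userspace sketch of the idea.  The struct, the enum and the function
names are made up for illustration; only the -100 sentinel and the
"counter >= 0 && --counter <= 0" test are taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's scheduling policies. */
enum policy { POLICY_OTHER, POLICY_FIFO, POLICY_RR };

/* Hypothetical, cut-down task: just the fields the patch touches. */
struct task {
    enum policy policy;
    long counter;       /* remaining ticks; negative means "never count down" */
    int need_resched;
};

#define FIFO_COUNTER (-100)   /* sentinel the patch assigns to SCHED_FIFO tasks */

/* Same shape as the patched setscheduler(): FIFO tasks get the sentinel. */
static void set_policy(struct task *t, enum policy policy, long timeslice)
{
    assert(timeslice > 0);
    t->policy = policy;
    t->counter = (policy == POLICY_FIFO) ? FIFO_COUNTER : timeslice;
    t->need_resched = 0;
}

/* Same shape as the patched timer tick: negative counters are left alone. */
static void timer_tick(struct task *t)
{
    if (t->counter >= 0 && --t->counter <= 0) {
        t->counter = 0;
        t->need_resched = 1;
    }
}
```

With this, a FIFO task can take any number of ticks without need_resched
being set, while a SCHED_OTHER task still expires as before.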

Comments?

George

diff -urP -X /usr/src/patch.exclude linux-2.4.4-kb/kernel/fork.c linux/kernel/fork.c
--- linux-2.4.4-kb/kernel/fork.c   Mon May  7 14:46:17 2001
+++ linux/kernel/fork.c Tue May  8 15:17:51 2001
@@ -673,10 +673,14 @@
 * if the child for a fork() just wants to do a few simple things
 * and then exec(). This is only important in the first timeslice.
 * In the long run, the scheduling behavior is unchanged.
+ * SCHED_FIFO tasks don't count down and have a negative counter.
+ * Don't change these, lest they all end up at -1.
 */
-   p->counter = current->counter;
-   current->counter = 0;
-   current->need_resched = 1;
+if (p->policy == SCHED_OTHER){
+p->counter = current->counter;
+current->counter = 0;
+current->need_resched = 1;
+}
 
/*
 * Ok, add it to the run-queues and make it
diff -urP -X /usr/src/patch.exclude linux-2.4.4-kb/kernel/sched.c linux/kernel/sched.c
--- linux-2.4.4-kb/kernel/sched.c   Mon May  7 14:46:17 2001
+++ linux/kernel/sched.c   Tue May  8 13:43:54 2001
@@ -682,7 +682,10 @@
spin_unlock_irq(&runqueue_lock);
read_lock(&tasklist_lock);
for_each_task(p)
-   p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
+if (p->counter >= 0 ){
+p->counter = (p->counter >> 1) +
+NICE_TO_TICKS(p->nice);
+}
read_unlock(&tasklist_lock);
spin_lock_irq(&runqueue_lock);
}
@@ -932,6 +935,11 @@
 
retval = 0;
p->policy = policy;
+if ( policy == SCHED_FIFO) {
+p->counter = -100;/* we don't count down neg counters */
+}else{
+p->counter = NICE_TO_TICKS(p->nice);
+}
p->rt_priority = lp.sched_priority;
if (task_on_runqueue(p))
move_first_runqueue(p);
diff -urP -X /usr/src/patch.exclude linux-2.4.4-kb/kernel/timer.c linux/kernel/timer.c
--- linux-2.4.4-kb/kernel/timer.c   Sun Dec 10 09:53:19 2000
+++ linux/kernel/timer.c   Tue May  8 15:14:37 2001
@@ -583,7 +583,11 @@
 
update_one_process(p, user_tick, system, cpu);
if (p->pid) {
-   if (--p->counter <= 0) {
+/*
+ * SCHED_FIFO and the idle(s) have counters set to -100, 
+ * so we won't count them.
+ */
+   if (p->counter >= 0 && --p->counter <= 0) {
p->counter = 0;
p->need_resched = 1;
}



Re: [PATCH][RFT] smbfs bugfixes for 2.4.4

2001-05-08 Thread James H. Puttick

> No, I broke it when copying the ncpfs dircache code.
> 
> That code will reuse an old inode if it already exists (and thus also
> any pages attached to it), which is what I wanted and should be fine
> except that it needs to invalidate_inode_pages() if something changed.
> 
> Xuan and James, you have both seen this bug with smbfs not properly
> handling changes made on the server. Could you please test this patch
> vs 2.4.4 and let me know if it helps or not.
> 
> http://www.hojdpunkten.ac.se/054/samba/smbfs-2.4.4-truncate+retry-4.patch

Urban:

I am actually using a 2.4.3 kernel, rather than 2.4.4. However, I 
manually applied the patches to my 2.4.3 kernel, and did some tests - 
it appears to work now!

I probably won't be using Samba heavily until next week, but I will let 
you know if I see any evidence that the problem is not fixed.

Thank you very much for the fix.

-- James



James H. Puttick

Kerr Vayne Systems Ltd.
1 Valleywood Drive, Unit 5A
Markham, Ontario L3R 5L9
Canada

+1 905 475 6161  office
+1 905 479 9833  fax

mailto:[EMAIL PROTECTED]




[PATCH] proc_root_init() made saner

2001-05-08 Thread Alexander Viro

Changes:
* proc_root_init() is called later in the boot sequence, after all essential
  VFS stuff had been initialized. That way we can have proc_mnt (along
  with superblock, root of dentry tree, etc.) set before we start registering
  any entries. As the result, now we are able to use all normal VFS machinery
  in create_proc_entry() and friends.  What's more important, we can use it
  when we do sysctl_init(). That will allow to remove a lot of cruft from
  sysctl handling.
* procfs_syms.c is gone (merged with root.c).
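
For readers not used to the idiom, the register/mount/roll-back sequence that
moves into proc_root_init() can be tried out in userspace.  This is only a
sketch: ERR_PTR(), PTR_ERR() and IS_ERR() are re-created here on the model of
the kernel helpers, and mock_register(), mock_unregister(), mock_kern_mount()
and init_sketch() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Errors live in the last MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mount { int dummy; };

static int registered;          /* hypothetical: is the fs type registered? */
static struct mount the_mount;  /* hypothetical: the one mount we hand out */

static int mock_register(void) { registered = 1; return 0; }
static void mock_unregister(void) { registered = 0; }

/* Hypothetical mount step: returns an error pointer when asked to fail. */
static struct mount *mock_kern_mount(int fail)
{
    return fail ? (struct mount *)ERR_PTR(-12) : &the_mount;
}

/* Register first, then mount; undo the registration if the mount fails. */
static int init_sketch(int fail_mount, struct mount **out)
{
    struct mount *mnt;
    int err = mock_register();

    if (err)
        return err;
    mnt = mock_kern_mount(fail_mount);
    if (IS_ERR(mnt)) {
        mock_unregister();
        return (int)PTR_ERR(mnt);
    }
    *out = mnt;
    return 0;
}
```

The point of the ordering is that nothing is left half-initialized: either the
mount exists and entries can be registered against it, or the filesystem type
is unregistered again.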

This change is backwards compatible, BTW - nothing done that early
(between the old and new locations of proc_root_init() call) tries to create
proc entries, so we don't break anything by postponing the call.

Please, apply it. It makes life much simpler for all procfs and sysctl
stuff - we _will_ need something equivalent if we ever want to get rid of
proc_dir_entry mess.
Al

diff -urN S5-pre1/fs/proc/Makefile S5-pre1-proc_init/fs/proc/Makefile
--- S5-pre1/fs/proc/Makefile   Fri Feb 16 21:06:31 2001
+++ S5-pre1-proc_init/fs/proc/Makefile  Tue May  8 17:36:57 2001
@@ -9,10 +9,10 @@
 
 O_TARGET := proc.o
 
-export-objs := procfs_syms.o
+export-objs := root.o
 
 obj-y:= inode.o root.o base.o generic.o array.o \
-   kmsg.o proc_tty.o proc_misc.o kcore.o procfs_syms.o
+   kmsg.o proc_tty.o proc_misc.o kcore.o
 
 ifeq ($(CONFIG_PROC_DEVICETREE),y)
 obj-y += proc_devtree.o
diff -urN S5-pre1/fs/proc/procfs_syms.c S5-pre1-proc_init/fs/proc/procfs_syms.c
--- S5-pre1/fs/proc/procfs_syms.c   Tue May  8 17:55:17 2001
+++ S5-pre1-proc_init/fs/proc/procfs_syms.c Wed Dec 31 19:00:00 1969
@@ -1,46 +0,0 @@
-#include 
-#include 
-#include 
-#include 
-#include 
-
-extern struct proc_dir_entry *proc_sys_root;
-
-#ifdef CONFIG_SYSCTL
-EXPORT_SYMBOL(proc_sys_root);
-#endif
-EXPORT_SYMBOL(proc_symlink);
-EXPORT_SYMBOL(proc_mknod);
-EXPORT_SYMBOL(proc_mkdir);
-EXPORT_SYMBOL(create_proc_entry);
-EXPORT_SYMBOL(remove_proc_entry);
-EXPORT_SYMBOL(proc_root);
-EXPORT_SYMBOL(proc_root_fs);
-EXPORT_SYMBOL(proc_net);
-EXPORT_SYMBOL(proc_bus);
-EXPORT_SYMBOL(proc_root_driver);
-
-static DECLARE_FSTYPE(proc_fs_type, "proc", proc_read_super, FS_SINGLE);
-
-static int __init init_proc_fs(void)
-{
-   int err = register_filesystem(&proc_fs_type);
-   if (!err) {
-   proc_mnt = kern_mount(&proc_fs_type);
-   err = PTR_ERR(proc_mnt);
-   if (IS_ERR(proc_mnt))
-   unregister_filesystem(&proc_fs_type);
-   else
-   err = 0;
-   }
-   return err;
-}
-
-static void __exit exit_proc_fs(void)
-{
-   unregister_filesystem(&proc_fs_type);
-   kern_umount(proc_mnt);
-}
-
-module_init(init_proc_fs)
-module_exit(exit_proc_fs)
diff -urN S5-pre1/fs/proc/root.c S5-pre1-proc_init/fs/proc/root.c
--- S5-pre1/fs/proc/root.c  Fri Feb 16 20:25:45 2001
+++ S5-pre1-proc_init/fs/proc/root.c   Tue May  8 17:37:44 2001
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct proc_dir_entry *proc_net, *proc_bus, *proc_root_fs, *proc_root_driver;
@@ -22,8 +23,19 @@
 struct proc_dir_entry *proc_sys_root;
 #endif
 
+static DECLARE_FSTYPE(proc_fs_type, "proc", proc_read_super, FS_SINGLE);
+
 void __init proc_root_init(void)
 {
+   int err = register_filesystem(&proc_fs_type);
+   if (err)
+   return;
+   proc_mnt = kern_mount(&proc_fs_type);
+   err = PTR_ERR(proc_mnt);
+   if (IS_ERR(proc_mnt)) {
+   unregister_filesystem(&proc_fs_type);
+   return;
+   }
proc_misc_init();
proc_net = proc_mkdir("net", 0);
 #ifdef CONFIG_SYSVIPC
@@ -106,3 +118,17 @@
proc_fops:  &proc_root_operations,
parent: &proc_root,
 };
+
+#ifdef CONFIG_SYSCTL
+EXPORT_SYMBOL(proc_sys_root);
+#endif
+EXPORT_SYMBOL(proc_symlink);
+EXPORT_SYMBOL(proc_mknod);
+EXPORT_SYMBOL(proc_mkdir);
+EXPORT_SYMBOL(create_proc_entry);
+EXPORT_SYMBOL(remove_proc_entry);
+EXPORT_SYMBOL(proc_root);
+EXPORT_SYMBOL(proc_root_fs);
+EXPORT_SYMBOL(proc_net);
+EXPORT_SYMBOL(proc_bus);
+EXPORT_SYMBOL(proc_root_driver);
diff -urN S5-pre1/init/main.c S5-pre1-proc_init/init/main.c
--- S5-pre1/init/main.c Wed May  2 11:16:38 2001
+++ S5-pre1-proc_init/init/main.c   Tue May  8 17:19:42 2001
@@ -561,9 +561,6 @@
 #endif
mem_init();
kmem_cache_sizes_init();
-#ifdef CONFIG_PROC_FS
-   proc_root_init();
-#endif
mempages = num_physpages;
 
fork_init(mempages);
@@ -577,6 +574,9 @@
signals_init();
bdev_init();
inode_init(mempages);
+#ifdef CONFIG_PROC_FS
+   proc_root_init();
+#endif
 #if defined(CONFIG_SYSVIPC)
ipc_init();
 #endif


Re: [patch] 2.4.4: mmap() fails for certain legal requests

2001-05-08 Thread David S. Miller


Czesc,

Maciej W. Rozycki writes:
 >  Yep, I know (ia64 and sparc*).  But being lazy enough (and being short on
 > time) I won't do it until I know the idea of the change is accepted.  I'm
 > sorry -- I sent previous versions of the patch twice since last Summer
 > with no response at all and doing bits no one is interested in is a waste
 > of time.
 > 
 >  Thanks for your response, though -- maybe there is someone interested,
 > after all. 

That's pretty arrogant that cut and pasting a few lines into some
architecture specific files and reporting the updated patch is too
much to ask.

Perhaps reviewing your change is also too much to ask.  Perhaps
we are too lazy and short on time to have a look at your change.

I don't think it's asking a lot to provide a complete change.

I'm sure the MIPS folks know all too well what it's like when their
port is crapped up because someone only made changes to the x86 port
portions.  At least for me, after working on Sparc for some time,
I'm adamant about providing complete changes so that this kind of
grief is avoided for other port maintainers.

In the time you used to compose your response to me, and now
to read this email from me, you could have fixed up the patch
perhaps 2 or 3 times.  Just do it and get it over with, ok?

Dziekuje.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: pci_pool_free from IRQ

2001-05-08 Thread David S. Miller


Alan Cox writes:
 > I suspect we should fix the documentation (and if need be the code) to reflect
 > the fact that you have to be completely out of your tree to handle device 
 > removal in the irq handler

Agreed.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: oddity with page_launder() handling of dirty pages

2001-05-08 Thread Linus Torvalds



On Tue, 8 May 2001, Marcelo Tosatti wrote:
>
> Linus, since you wrote that part of the code, I ask you: do you have any
> reason to not remove a page being writepage()'d from the
> inactive_dirty_list to avoid this kind of problems ?
>
> (the page must be added back to the inactive_dirty_list again after the
> writeout, yes).

This is the reason. I think it is absolutely _wrong_ to add it back after
the writeout - anything could have happened to the page, including the
page moving to other lists or not being a page cache page AT ALL.

We had tons of bugs in this area when the page lists were introduced.

Leaving it on the list and letting anybody who changed the state of the
page remove it cleanly fixed all the bugs. And I'm not going back to the
old and broken code.

You can move it to the "active_list" if you want to while it is being
written out ("it's busy, so it's active"). As long as you move it _before_
you start the write-out.
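
To make the ordering concrete, here is a small userspace sketch of "move it
first, then start the write-out".  The list helpers are modelled on the
kernel's circular lists, but struct page here is a two-field stand-in and
start_writeout() is a hypothetical name, not the page_launder() code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's lists. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }

static int list_empty(const struct node *h) { return h->next == h; }

static void list_add(struct node *n, struct node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Hypothetical, cut-down page: list linkage plus a "busy" flag. */
struct page {
    struct node lru;
    int under_writeout;
};

static struct node active_list, inactive_dirty_list;

/*
 * Move the page to the active list *before* marking it busy, so there is
 * never a window where a page under I/O sits on no list at all and has
 * to be put back somewhere afterwards.
 */
static void start_writeout(struct page *p)
{
    list_del(&p->lru);
    list_add(&p->lru, &active_list);
    p->under_writeout = 1;
}
```

Nothing is re-added after the write completes; whoever changes the page state
later finds it on a list and can remove it cleanly.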

Linus




Re: [RFC] Direct Sockets Support??

2001-05-08 Thread 'Pete Wyckoff'

[EMAIL PROTECTED] said:
>   > But in the case of an application which fits in main memory, and
>   > has been running for a while (so all pages are present and
>   > dirty), all you'd really have to do is verify the page tables are
>   > in the proper state and skip the TLB flush, right?
> 
>   We really cannot assume this. There are two cases 
>   a. when a user app wants to receive some data, it allocates
> memory(using malloc) and waits for the hw to do zero-copy read. The kernel
> does not allocate physical page frames for the entire memory region
> allocated. We need to lock the memory (and locking is expensive due to
> costly TLB flushes) to do this
> 
>   b. when a user app wants to send data, he fills the buffer
> and waits for the hw to transmit data, but under heavy physical memory
> pressure, the swapper might swap the pages we want to transmit. So we need
> to lock the memory to be 100% sure.

You're right, of course.  But I suspect that the fast path of
re-locking memory which is happily in core will go much faster
by removing the multi-processor TLB purge.  And it can't hurt,
unless I'm missing something.
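
The decision I have in mind can be shown in isolation with a toy PTE.  This is
a userspace sketch only: the flag bits, the enum and relock_page() are
invented for illustration and do not match any real page table layout.

```c
#include <assert.h>

/* Hypothetical PTE flag bits; real layouts are architecture specific. */
#define PTE_PRESENT 0x1u
#define PTE_WRITE   0x2u
#define PTE_DIRTY   0x4u
#define PTE_YOUNG   0x8u

enum relock_result {
    RELOCK_NOTHING,   /* PTE already in the wanted state: no flush needed */
    RELOCK_UPDATED,   /* PTE bits changed: the update path must still run */
    RELOCK_FAULT      /* not present or not writable: take the slow path */
};

/* Decide what re-locking one page costs, touching the PTE only if needed. */
static enum relock_result relock_page(unsigned *pte, int want_write)
{
    unsigned entry;

    assert(pte != 0);
    entry = *pte;
    if (!(entry & PTE_PRESENT))
        return RELOCK_FAULT;
    if (want_write) {
        if (!(entry & PTE_WRITE))
            return RELOCK_FAULT;
        entry |= PTE_DIRTY;
    }
    entry |= PTE_YOUNG;
    if (entry == *pte)
        return RELOCK_NOTHING;
    *pte = entry;
    return RELOCK_UPDATED;
}
```

For a long-running application whose pages are all present, dirty and young,
every page lands in the RELOCK_NOTHING case, which is where the saving would
come from.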

-- Pete

--- linux-2.4.4-stock/mm/mlock.c   Tue May  8 17:26:34 2001
+++ linux/mm/mlock.c   Tue May  8 17:24:13 2001
@@ -114,6 +114,10 @@
return 0;
 }
 
+/* implemented in mm/memory.c */
+extern int mlock_make_pages_present(struct vm_area_struct *vma,
+   unsigned long addr, unsigned long end);
+
 static int mlock_fixup(struct vm_area_struct * vma, 
unsigned long start, unsigned long end, unsigned int newflags)
 {
@@ -138,7 +142,7 @@
pages = (end - start) >> PAGE_SHIFT;
if (newflags & VM_LOCKED) {
pages = -pages;
-   make_pages_present(start, end);
+   mlock_make_pages_present(vma, start, end);
}
vma->vm_mm->locked_vm -= pages;
}

--- linux-2.4.4-stock/mm/memory.c   Tue May  8 17:25:36 2001
+++ linux/mm/memory.c   Tue May  8 17:24:40 2001
@@ -1438,3 +1438,80 @@
} while (addr < end);
return 0;
 }
+
+/*
+ * Specialized version of make_pages_present which does not require
+ * a multi-processor TLB purge for every page if nothing about the PTE
+ * was modified.
+ */
+int mlock_make_pages_present(struct vm_area_struct *vma,
+   unsigned long addr, unsigned long end)
+{
+   int ret, write;
+   struct mm_struct *mm = current->mm;
+
+   write = (vma->vm_flags & VM_WRITE) != 0;
+
+   /*
+* We need the page table lock to synchronize with kswapd
+* and the SMP-safe atomic PTE updates.
+*/
+   spin_lock(&mm->page_table_lock);
+
+   ret = 0;
+   for (ret=0; !ret && addr < end; addr += PAGE_SIZE) {
+   pgd_t *pgd;
+   pmd_t *pmd;
+   pte_t *pte, entry;
+   int modified;
+
+   current->state = TASK_RUNNING;
+   pgd = pgd_offset(mm, addr);
+   pmd = pmd_alloc(mm, pgd, addr);
+   if (!pmd) {
+   ret = -1;
+   break;
+   }
+   pte = pte_alloc(mm, pmd, addr);
+   if (!pte) {
+   ret = -1;
+   break;
+   }
+   entry = *pte;
+   if (!pte_present(entry)) {
+   /*
+* If it truly wasn't present, we know that kswapd
+* and the PTE updates will not touch it later. So
+* drop the lock.
+*/
+   if (pte_none(entry)) {
+   ret = do_no_page(mm, vma, addr, write, pte);
+   continue;
+   }
+   ret = do_swap_page(mm, vma, addr, pte,
+   pte_to_swp_entry(entry), write);
+   continue;
+   }
+
+   modified = 0;
+   if (write) {
+   if (!pte_write(entry)) {
+   ret = do_wp_page(mm, vma, addr, pte, entry);
+   continue;
+   }
+   if (!pte_dirty(entry)) {
+   entry = pte_mkdirty(entry);
+   modified = 1;
+   }
+   }
+   if (!pte_young(entry)) {
+   entry = pte_mkyoung(entry);
+   modified = 1;
+   }
+   if (modified)
+   establish_pte(vma, addr, pte, entry);
+   }
+
+   spin_unlock(&mm->page_table_lock);
+   return ret;
+}

Re: pci_pool_free from IRQ

2001-05-08 Thread Alan Cox

> This sure makes life difficult. Device removal events can be called
> from interrupt context according to Documentation/pci.txt. This is
> certainly a place where one might want to call pci_consistent_free.

None of our device code supports interrupt based device removal. In fact
many drivers use vmalloc directly so will hit the same problem the
pci_consistent_free hits on the ARM.

I suspect we should fix the documentation (and if need be the code) to reflect
the fact that you have to be completely out of your tree to handle device 
removal in the irq handler



Re: [patch] 2.4.4: mmap() fails for certain legal requests

2001-05-08 Thread Maciej W. Rozycki

On Tue, 8 May 2001, David S. Miller wrote:

> There are several get_unmapped_area() implementations besides the
> standard one (search for HAVE_ARCH_UNMAPPED_AREA).  Please fix
> them up too.

 Yep, I know (ia64 and sparc*).  But being lazy enough (and being short on
time) I won't do it until I know the idea of the change is accepted.  I'm
sorry -- I sent previous versions of the patch twice since last Summer
with no response at all and doing bits no one is interested in is a waste
of time.

 Thanks for your response, though -- maybe there is someone interested,
after all. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+




Re:

2001-05-08 Thread george anzinger

"Richard B. Johnson" wrote:
> 
> To driver wizards:
> 
> I have a driver which needs to wait for some hardware.
> Basically, it needs to have some code added to the run-queue
> so it can get some CPU time even though it's not being called.
> 
> It needs to get some CPU time which can be "turned on" or
> "turned off" as a result of an interrupt or some external
> input from  an ioctl().
> 
> So I thought that the "tasklet" would be ideal. However, the
> scheduler "thinks" that a tasklet is an interrupt, so any
> attempt to sleep in the tasklet results in a kernel panic,
> "ieee scheduling in an interrupt..., BUG sched.c line 688".
> 
> Next, I added code to try queue_task(). This has the same problem.
> 
> Basically the procedure needs to do:
> 
> procedure()
> {
> if(some_event)
> schedule_timeout(n);   /* Needs to sleep */
> else if(something_else)
> do_something();
>queue_task(procedure, &tq_immediate);   /* Needs to queue itself again */
> }
> 
> Since I'm running against a time-line, I temporarily  gave the module
> some CPU time through an ioctl(), i.e., a separate task that does nothing
> except repeatedly execute ioctl(GIVE_CPU, NULL); This shows that the
> driver actually works. It's a GPIB driver so it needs to get the
> CPU to find out if it's addressed to listen, etc. These events don't
> produce interrupts.
> 
> So, what am I supposed to do to add a piece of driver code to the
> run queue so it gets scheduled occasionally?
> 
> Cheers,
> Dick Johnson

How about something like:

#include <linux/timer.h>

/* Arm "timer" so that process_timeout(data) runs after "timeout" jiffies. */
void queue_task(void (*process_timeout)(unsigned long), unsigned long timeout,
                struct timer_list *timer, unsigned long data)
{
        unsigned long expire = timeout + jiffies;

        init_timer(timer);
        timer->expires = expire;
        timer->data = data;
        timer->function = process_timeout;

        add_timer(timer);
}


You will have to define the "struct timer_list timer".  This should
cause the function passed to be called after "timeout" jiffies (1/HZ,
not to be confused with 10 ms).  If you want to stop the timer early do:

del_timer_sync(timer);

"data" was not used in your example, but process_timeout will be passed
"data" when it is called.  This routine is called as part of the timer
interrupt, so it must be fast and should not do schedule() calls.  It
could queue a tasklet, however, to relax constraints a bit.

George



Re: REVISED: Experimentation with Athlon and fast_page_copy

2001-05-08 Thread Arjan van de Ven

In article <[EMAIL PROTECTED]> you wrote:
> Arjan - care to unroll the tail 320 bytes of copying from the main loop ?

I'll see what I can do to make us not lose too much speed.



Re: Athlon and fast_page_copy: What's it worth ? :)

2001-05-08 Thread Arjan van de Ven

In article <[EMAIL PROTECTED]> you wrote:
> Hi,

>   Before I go any further with this investigation, I'd like to get an
> idea
> of how much of a performance improvement the K7 fast_page_copy will give
> me.
> Can someone suggest the best benchmark to test the speed of this
> routine?

http://www.fenrus.demon.nl/athlon.c

is a userspace benchmark of the current code vs C etc



Re: pci_pool_free from IRQ

2001-05-08 Thread Albert D. Cahalan

Pete Zaitcev writes:

> Russell King complained that you might be calling pci_consistent_free
> from an interrupt, which is unsafe on ARM.

This sure makes life difficult. Device removal events can be called
from interrupt context according to Documentation/pci.txt. This is
certainly a place where one might want to call pci_consistent_free.



pci_pool_free from IRQ

2001-05-08 Thread Pete Zaitcev

David,

Russell King complained that you might be calling pci_consistent_free
from an interrupt, which is unsafe on ARM. Why don't you remove this
part from pci_pool_free():

+   else if (!is_page_busy (pool->blocks_per_page, page->bitmap))
+   pool_free_page (pool, page);

In that case, fully free pages will stick about until the whole
pool is destroyed, which I think is not a big deal.
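
A rough userspace sketch of what I mean, with malloc standing in for the
consistent allocator.  All names here (struct pool, pool_alloc(), pool_free(),
pool_destroy(), BLOCKS_PER_PAGE) are hypothetical and much simpler than the
real pci_pool code; the only point is that pool_free() never releases a page.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKS_PER_PAGE 4   /* hypothetical; one bit per block in in_use */

struct pool_page {
    unsigned in_use;            /* bitmap of allocated blocks */
    struct pool_page *next;
};

struct pool {
    struct pool_page *pages;
    int page_count;
};

/* Returns the block index and its page, or -1 if no memory is available. */
static int pool_alloc(struct pool *p, struct pool_page **page_out)
{
    struct pool_page *pg;
    int i;

    for (pg = p->pages; pg; pg = pg->next) {
        for (i = 0; i < BLOCKS_PER_PAGE; i++) {
            if (!(pg->in_use & (1u << i))) {
                pg->in_use |= 1u << i;
                *page_out = pg;
                return i;
            }
        }
    }
    pg = calloc(1, sizeof(*pg));    /* slow path: may sleep in a real kernel */
    if (!pg)
        return -1;
    pg->next = p->pages;
    p->pages = pg;
    p->page_count++;
    pg->in_use = 1u;
    *page_out = pg;
    return 0;
}

/* Safe from "interrupt context": only clears a bit, never frees the page. */
static void pool_free(struct pool_page *pg, int block)
{
    assert(pg->in_use & (1u << block));
    pg->in_use &= ~(1u << block);
}

/* Fully free pages are only given back here, in process context. */
static void pool_destroy(struct pool *p)
{
    while (p->pages) {
        struct pool_page *next = p->pages->next;
        free(p->pages);
        p->pages = next;
    }
    p->page_count = 0;
}
```

The cost is that an idle pool keeps its peak number of pages until it is
destroyed.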

-- Pete



Re: your mail

2001-05-08 Thread Richard B. Johnson

On Tue, 8 May 2001, Alan Cox wrote:

> > I have a driver which needs to wait for some hardware.
> > Basically, it needs to have some code added to the run-queue
> > so it can get some CPU time even though it's not being called.
> 
> Why does it have to wait? Why can't it just poll and come back next time?
> 

Good question. I wanted to be able to call the exact same routine(s)
that other routines (executed from read() and write()) execute.
These routines are complex and sleep while waiting for events. I
didn't want to duplicate that code with different time-out mechanisms.

GPIB is nasty because you can't do anything unless the 'controller'
tells you to do it. When "addressed to talk", you have to parse
all the stuff sent via interrupt (ATN bit set, control byte, which
address from the control byte, etc.), then let somebody sleeping
in poll() know that they can now "write()". That can all be handled
via interrupt. But now for the receive. The user-mode code
needs to be sleeping until some data are available. That data
may never be available. Something in the driver needs to wait
until the hardware is addressed to receive. Since it is not now
receiving, there is no interrupt! It takes time for the controller
to tell you to listen and then tell somebody else to talk to you.
This means that I need some timeout to recover from the fact
that the other guy may never talk.

Once the other guy starts sending data, the interrupts can be
used to handle the data and, once there are valid data, the
device owner can be awakened, presumably sleeping in poll() or
select(). It's the intermediate time where there are no
interrupts that needs the CPU to determine that we've waited
too long for interrupts so the device had better get off the
bus to start the error recovery procedure.
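
In case it helps the discussion, the non-sleeping variant would look roughly
like this.  It is a userspace sketch under my own assumptions: time_after() is
written in the style of the kernel macro of that name, while struct
listen_wait, listen_start() and listen_poll() are made-up names, and "now" and
"hw_ready" stand in for jiffies and a status-register read.

```c
#include <assert.h>

/* Wraparound-safe "a is later than b", in the style of the kernel macro. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

enum wait_state { WAITING, GOT_DATA, TIMED_OUT };

/* Hypothetical per-device state for "addressed to listen, nothing yet". */
struct listen_wait {
    unsigned long deadline;   /* tick after which we give up */
};

static void listen_start(struct listen_wait *w, unsigned long now,
                         unsigned long timeout)
{
    assert(timeout > 0);
    w->deadline = now + timeout;   /* may wrap; time_after() copes with it */
}

/*
 * Called periodically (timer, kernel thread or the poll() path).  It never
 * sleeps: it only reports whether to keep waiting, hand data to the reader,
 * or get off the bus and start error recovery.
 */
static enum wait_state listen_poll(const struct listen_wait *w,
                                   unsigned long now, int hw_ready)
{
    if (hw_ready)
        return GOT_DATA;
    if (time_after(now, w->deadline))
        return TIMED_OUT;
    return WAITING;
}
```

The catch is that the existing read()/write() helpers sleep, so they could not
be reused as they are from such a routine.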

Bright and early tomorrow, I will check out both ways. A kernel
thread might be "neat". However, I may just look to see if
I can just poll while using existing code.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





Re: OT: ps source?

2001-05-08 Thread Albert D. Cahalan

Pierre Rousselet writes:
> James Bourne wrote:

>> From the procps man page:
>>Albert Cahalan <[EMAIL PROTECTED]> rewrote ps  for  full
>>Unix98  and  BSD  support,  along with some ugly hacks for
>>obsolete and foreign syntax.
>> 
>>Michael K. Johnson <[EMAIL PROTECTED]>  is  the  current
>>maintainer.

There has been a bit of a fork actually... sorry.

> Right. For international support procps-2.0.7 is the one to choose with
> the patch procps-2.0.7-intl.patch.

That one is quite buggy. The parser is broken ("ps -o %p" fails),
you can get a core dump if you get unlucky with the System.map file,
the BSD-style process selection is incorrect... I've fixed about 100
bugs and introduced only a few.

What you really ought to use is the Debian package. That gives you
my source plus a few fixes that I don't have yet. Head over to
www.debian.org and drill down to the "unstable" package. There you
will find a source tarball and a patch file for it.




Re: Patch to improve readability of sock_rcvlowat() - comments wanted...

2001-05-08 Thread Jesper Juhl

Ronald Bultje wrote:

> On 2001.05.08 01:04:57 +0200 Jesper Juhl wrote:
> 
>> static inline int sock_rcvlowat(struct sock *sk, int waitall, int len)
>> {
>>  int r = len;
>>  if (!waitall)
>>  r = min(sk->rcvlowat, len);
>>  return max(1,r);
>> }
>> 
> 
> 
> return max(1, waitall ? len : min(sk->rcvlowat, len));
> 
> Although I doubt this is more readable... :-)
> 

IMO your version is less readable than the 4-liner above, and the code 
it generates is a lot bigger than both the original and the proposed 
replacement - but thank you for the suggestion...
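
For anyone who wants to compare the two variants themselves, here they are
side by side in a form that compiles in userspace.  This is a sketch, not the
kernel header: struct sock is cut down to the one field used, the min()/max()
macros are simple stand-ins, and the two function names are mine.

```c
#include <assert.h>

/* Simple stand-ins for the kernel's min()/max(); arguments may be
 * evaluated more than once. */
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical, cut-down socket: only the field this helper reads. */
struct sock {
    int rcvlowat;
};

/* The proposed replacement quoted above. */
static int rcvlowat_fourline(struct sock *sk, int waitall, int len)
{
    int r = len;

    if (!waitall)
        r = min(sk->rcvlowat, len);
    return max(1, r);
}

/* Ronald's single-expression suggestion. */
static int rcvlowat_oneline(struct sock *sk, int waitall, int len)
{
    return max(1, waitall ? len : min(sk->rcvlowat, len));
}
```

Both return the same values; with macros like these the one-liner expands its
inner expression twice, which may be part of why it compiles to more code.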

- Jesper Juhl - [EMAIL PROTECTED]




Re: [PATCH][RFT] smbfs bugfixes for 2.4.4

2001-05-08 Thread Urban Widmark

On 7 May 2001, Linus Torvalds wrote:

> It has code to do that in smb_revalidate_inode(), but it may be that
> something else refreshes the inode size _without_ doing the proper
> invalidation checks. Or maybe Urban broke that logic by mistake while
> fixing the other one ;)

No, I broke it when copying the ncpfs dircache code.

That code will reuse an old inode if it already exists (and thus also any
pages attached to it), which is what I wanted and should be fine except
that it needs to invalidate_inode_pages() if something changed.


Xuan and James, you have both seen this bug with smbfs not properly
handling changes made on the server. Could you please test this patch vs
2.4.4 and let me know if it helps or not.

http://www.hojdpunkten.ac.se/054/samba/smbfs-2.4.4-truncate+retry-4.patch
(Apply with 'patch -p1' in the linux/ source dir)

/Urban




Re: [RFC] Direct Sockets Support??

2001-05-08 Thread Alan Cox

>   a. when a user app wants to receive some data, it allocates
> memory(using malloc) and waits for the hw to do zero-copy read. The kernel
> does not allocate physical page frames for the entire memory region
> allocated. We need to lock the memory (and locking is expensive due to
> costly TLB flushes) to do this
> 
>   b. when a user app wants to send data, he fills the buffer
> and waits for the hw to transmit data, but under heavy physical memory
> pressure, the swapper might swap the pages we want to transmit. So we need
> to lock the memory to be 100% sure.
> 

Or c) you prealloc two ring buffers.
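
As a rough illustration of what such a buffer looks like (one instance for
receive, one for transmit), here is a single-threaded userspace sketch.  The
names and the slot count are made up, and a real shared buffer would also need
pinned memory and proper barriers between producer and consumer.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8u   /* hypothetical size; must be a power of two */

/* Fixed-size ring allocated once up front, so nothing is locked per I/O. */
struct ring {
    unsigned head;            /* total items ever written */
    unsigned tail;            /* total items ever read */
    int slots[RING_SLOTS];
};

/* Returns 0 on success, -1 if the ring is full. */
static int ring_put(struct ring *r, int value)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SLOTS)
        return -1;
    r->slots[r->head % RING_SLOTS] = value;
    r->head++;
    return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct ring *r, int *value)
{
    assert(r != NULL && value != NULL);
    if (r->head == r->tail)
        return -1;
    *value = r->slots[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}
```

The application copies into and out of the rings, so the pages handed to the
hardware never change and only have to be locked once.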




Re: your mail

2001-05-08 Thread Alan Cox

> I have a driver which needs to wait for some hardware.
> Basically, it needs to have some code added to the run-queue
> so it can get some CPU time even though it's not being called.

Why does it have to wait? Why can't it just poll and come back next time?



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread David S. Miller


Marcelo Tosatti writes:
 > On Tue, 8 May 2001, Mark Hemment wrote:
 > >   Does anyone know why the 2.4.3pre6 change was made?
 > 
 > Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
 > any job to do (ie less than 30% dirty buffers in the default config).  

Actually, the change was made because it is illogical to try only
once on multi-order pages.  Especially because we depend upon order
1 pages so much (every task struct allocated).  We depend upon them
even more so on sparc64 (certain kinds of page tables need to be
allocated as 1 order pages).

The old code failed _far_ too easily, it was unacceptable.

Why put some strange limit in there?  Whatever number you pick
is arbitrary, and I can probably piece together an allocation
state where the chosen limit is too small.

So instead, you could test for the condition that prevents any
possible forward progress, no?

Later,
David S. Miller
[EMAIL PROTECTED]



[patch] 2.4.4: mmap() fails for certain legal requests

2001-05-08 Thread Maciej W. Rozycki

Hi,

 The mmap() call fails when addr is specified, MAP_FIXED is cleared in
flags and no address space can be allocated either at addr or above it. 
This is a legal request and it should not fail as long as there is space
available below addr.  Following is a patch that fixes the problem. 

 This is nothing new -- I already submitted a similar patch against
2.4.0-test4 once upon a time.  This patch is clean(er), though, and I
believe it can be safely applied to the upcoming 2.4.5 release. 

 A simple test case to trigger the current mmap() bad behaviour for 32-bit
CPUs is something like: 

fd = open("/dev/zero", O_RDONLY);
p = mmap((void *)0xf000, 4096, PROT_READ, MAP_SHARED, fd, 0);

With my patch the code does not fail anymore -- p is set to an available
address lower than 0xf000. 
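
For reference, the two-line test case above can be expanded into a small self-contained helper. This is only a sketch: the function name map_with_hint is made up for illustration, and on current kernels a hint as low as 0xf000 may simply be ignored (for example because of the vm.mmap_min_addr setting), so the interesting property is that the call succeeds, not where the mapping lands.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of /dev/zero read-only, passing `hint` purely as a
 * placement hint (MAP_FIXED is deliberately not set).
 * Returns the mapped address, or MAP_FAILED on error.
 * map_with_hint is a hypothetical name, not part of any library.
 */
void *map_with_hint(void *hint, size_t len)
{
	int fd;
	void *p;

	assert(len > 0);

	fd = open("/dev/zero", O_RDONLY);
	if (fd < 0)
		return MAP_FAILED;

	p = mmap(hint, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* an established mapping stays valid after close() */
	return p;
}
```

Calling map_with_hint((void *)0xf000, 4096) should return something other than MAP_FAILED as long as any suitable address range is free; the address returned is whatever the kernel picked.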

 The bug was discovered when tracking down the reason of dlopen() failures
when called from statically linked binaries on MIPS/Linux.  The patch
fixes them.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

diff -up --recursive --new-file linux-2.4.4.macro/mm/mmap.c linux-2.4.4/mm/mmap.c
--- linux-2.4.4.macro/mm/mmap.c Tue May  1 17:24:25 2001
+++ linux-2.4.4/mm/mmap.c   Tue May  1 18:23:25 2001
@@ -219,7 +219,7 @@ unsigned long do_mmap_pgoff(struct file 
 	if ((len = PAGE_ALIGN(len)) == 0)
 		return addr;
 
-	if (len > TASK_SIZE || addr > TASK_SIZE-len)
+	if (len > TASK_SIZE)
 		return -EINVAL;
 
 	/* offset overflow? */
@@ -405,9 +405,15 @@ static inline unsigned long arch_get_unm
 
 	if (len > TASK_SIZE)
 		return -ENOMEM;
-	if (!addr)
-		addr = TASK_UNMAPPED_BASE;
-	addr = PAGE_ALIGN(addr);
+
+	if (addr) {
+		addr = PAGE_ALIGN(addr);
+		vma = find_vma(current->mm, addr);
+		if (TASK_SIZE - len >= addr &&
+		    (!vma || addr + len <= vma->vm_start))
+			return addr;
+	}
+	addr = PAGE_ALIGN(TASK_UNMAPPED_BASE);
 
 	for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) {
 		/* At this point:  (!vma || addr < vma->vm_end). */
@@ -425,6 +431,8 @@ extern unsigned long arch_get_unmapped_a
 unsigned long get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
 {
 	if (flags & MAP_FIXED) {
+		if (addr > TASK_SIZE - len)
+			return -EINVAL;
 		if (addr & ~PAGE_MASK)
 			return -EINVAL;
 		return addr;




Re: your mail

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Richard B. Johnson wrote:
> > Use a kernel thread? If you don't need to access user space, context
> > switches are very cheap.
> > 
> > > So, what am I supposed to do to add a piece of driver code to the
> > > run queue so it gets scheduled occasionally?
> > 
> > Several, grep for kernel_thread.
> > 
> > -- 
> > Jens Axboe
> > 
> 
> Okay. Thanks. I thought I would have to do that too. No problem.

A small worker thread and a wait queue to sleep on and you are all set,
10 minutes tops :-)

> It's a "tomorrow" thing. Ten hours is too long to stare at a
> screen.

Sissy!

-- 
Jens Axboe




RE: [RFC] Direct Sockets Support??

2001-05-08 Thread Venkatesh Ramamurthy

> But in the case of an application which fits in main memory, and
> has been running for a while (so all pages are present and
> dirty), all you'd really have to do is verify the page tables are
> in the proper state and skip the TLB flush, right?

We really cannot assume this. There are two cases 
a. when a user app wants to receive some data, it allocates
memory(using malloc) and waits for the hw to do zero-copy read. The kernel
does not allocate physical page frames for the entire memory region
allocated. We need to lock the memory (and locking is expensive due to
costly TLB flushes) to do this

b. when a user app wants to send data, he fills the buffer
and waits for the hw to transmit data, but under heavy physical memory
pressure, the swapper might swap the pages we want to transmit. So we need
to lock the memory to be 100% sure.






Re: your mail

2001-05-08 Thread Richard B. Johnson

On Tue, 8 May 2001, Jens Axboe wrote:

> On Tue, May 08 2001, Richard B. Johnson wrote:
> > 
> > To driver wizards:
> > 
> > I have a driver which needs to wait for some hardware.
> > Basically, it needs to have some code added to the run-queue
> > so it can get some CPU time even though it's not being called.
> > 
> > It needs to get some CPU time which can be "turned on" or
> > "turned off" as a result of an interrupt or some external
> > input from  an ioctl().
> > 
> > So I thought that the "tasklet" would be ideal. However, the
> > scheduler "thinks" that a tasklet is an interrupt, so any
> > attempt to sleep in the tasklet results in a kernel panic,
> > "ieee scheduling in an interrupt..., BUG sched.c line 688".
> 
> Use a kernel thread? If you don't need to access user space, context
> switches are very cheap.
> 
> > So, what am I supposed to do to add a piece of driver code to the
> > run queue so it gets scheduled occasionally?
> 
> Several, grep for kernel_thread.
> 
> -- 
> Jens Axboe
> 

Okay. Thanks. I thought I would have to do that too. No problem.
It's a "tomorrow" thing. Ten hours is too long to stare at a
screen.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





Re: your mail

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Richard B. Johnson wrote:
> 
> To driver wizards:
> 
> I have a driver which needs to wait for some hardware.
> Basically, it needs to have some code added to the run-queue
> so it can get some CPU time even though it's not being called.
> 
> It needs to get some CPU time which can be "turned on" or
> "turned off" as a result of an interrupt or some external
> input from  an ioctl().
> 
> So I thought that the "tasklet" would be ideal. However, the
> scheduler "thinks" that a tasklet is an interrupt, so any
> attempt to sleep in the tasklet results in a kernel panic,
> "ieee scheduling in an interrupt..., BUG sched.c line 688".

Use a kernel thread? If you don't need to access user space, context
switches are very cheap.

> So, what am I supposed to do to add a piece of driver code to the
> run queue so it gets scheduled occasionally?

Several, grep for kernel_thread.

-- 
Jens Axboe




oddity with page_launder() handling of dirty pages

2001-05-08 Thread Marcelo Tosatti


Hi, 

I was just wondering how bad the current way of writing out dirty pages is
wrt multiple page_launder() users. 

We don't remove a dirty page from the inactive dirty list when writing it
out (as opposed to "direct" page->buffers ll_rw_block() IO). 

When we have multiple users inside page_launder(), that means a dirty page
which is being written out (and has an additional reference gotten by the
writer) but has no page->buffers mapping yet will be moved to the
beginning of the active list and kept there until the reference is
released by the writer (since refill_inactive_scan() will not move it back
to the inactive dirty list because of the extra reference). 

Remember that we limit the amount of swap writeouts at rw_swap_page(), so
any writepage() which blocks there will have its page moved to the
_beginning_ of the active list because it has no page->buffers yet. 

Linus, since you wrote that part of the code, I ask you: do you have any
reason to not remove a page being writepage()'d from the
inactive_dirty_list to avoid this kind of problems ? 

(the page must be added back to the inactive_dirty_list again after the
writeout, yes).




[i386 arch] MTRR messages significant?

2001-05-08 Thread LA Walsh

I've been seeing these for a while now (2.4.4 - <=2.4.2) also coincidental
with a change to XFree86 X 4.0.3 from "MetroX" in the time frame.  Am not sure
exactly when they started but was wondering if they were significant.  It
seems some app is trying to delete or modify something.  On console and in syslog:

mtrr: no MTRR for fd000000,800000 found
mtrr: MTRR 1 not used
mtrr: reg 1 not used

while /proc/mtrr currently contains:

reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0xfd000000 (4048MB), size=   8MB: write-combining, count=1

Could it be the X server trying to delete a segment when it starts up or
shuts down?  Is it an error in the X server to try to delete a nonexistent
segment?  Does the kernel 'care'?  I.e. -- why is it printing out messages --
are they debug messages that perhaps should be off by default?

Concurrent with these messages and perhaps unrelated is a new, unwelcome,
behavior of X dying on display of some Netscape-rendered websites (cf. it
doesn't die under konqueror).

thanks,
-linda



Re: [RFC] Direct Sockets Support??

2001-05-08 Thread Pete Wyckoff

[EMAIL PROTECTED] said:
> > A couple of concerns I have:
> >  * How to pin or pagelock the application buffer without
> > making a kernel transition.
> 
> You need to pin them in advance. And pinning pages is _expensive_ so you don't
> want to keep pinning/unpinning pages

I can't convince myself why this has to be so expensive.  The
current implementation does this for mlock:

1.  Split vma if only a subset of the pages are being locked.
2.  Mark bit in vma.
3.  Make sure the pages are in core.

That third step has the potential of being the most expensive,
as changing the page tables requires invalidating the TLBs on all
processors.  Currently make_pages_present() does the work for 3.

But in the case of an application which fits in main memory, and
has been running for a while (so all pages are present and
dirty), all you'd really have to do is verify the page tables are
in the proper state and skip the TLB flush, right?

Then 3 turns into a single spin_lock pair for the page_table_lock, 
and walking down the page table.

The VMA splitting can be nasty, as it might require a couple of
slab allocations, and doing an AVL insertion.  (More nastiness in
the case of shared memory or file mapping, too.)  But nothing
like playing with TLBs.

Any reason why make_pages_present() is not the really oversized
hammer it seems to be?
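
Seen from user space, the "pin once in advance, then reuse" approach discussed above could look roughly like the sketch below. The helper names alloc_pinned and free_pinned are hypothetical. The buffer is touched before mlock() so that its pages are already present and dirty, which is the case the message argues should be cheap. mlock() can fail for reasons unrelated to the code (for example a low RLIMIT_MEMLOCK), so callers need to handle a NULL return.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate a page-aligned buffer, touch every page so it is present and
 * dirty, then pin it with mlock(). Returns the buffer, or NULL with errno
 * set on failure. alloc_pinned is a hypothetical helper name.
 */
void *alloc_pinned(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = NULL;
	int rc;

	assert(len > 0);
	if (page <= 0)
		return NULL;

	/* posix_memalign() reports errors via its return value, not errno. */
	rc = posix_memalign(&buf, (size_t)page, len);
	if (rc != 0) {
		errno = rc;
		return NULL;
	}

	memset(buf, 0, len);		/* fault the pages in before pinning */

	if (mlock(buf, len) != 0) {
		int saved = errno;	/* free() may overwrite errno */

		free(buf);
		errno = saved;
		return NULL;
	}
	return buf;
}

/* Unpin and release a buffer obtained from alloc_pinned(). */
void free_pinned(void *buf, size_t len)
{
	munlock(buf, len);
	free(buf);
}
```

A transfer loop would call alloc_pinned() once at setup, reuse the buffer for every send or receive, and call free_pinned() only at teardown, so the locking cost is paid once rather than per I/O.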

-- Pete



No Subject

2001-05-08 Thread Richard B. Johnson


To driver wizards:

I have a driver which needs to wait for some hardware.
Basically, it needs to have some code added to the run-queue
so it can get some CPU time even though it's not being called.

It needs to get some CPU time which can be "turned on" or
"turned off" as a result of an interrupt or some external
input from  an ioctl().

So I thought that the "tasklet" would be ideal. However, the
scheduler "thinks" that a tasklet is an interrupt, so any
attempt to sleep in the tasklet results in a kernel panic,
"ieee scheduling in an interrupt..., BUG sched.c line 688".

Next, I added code to try queue_task(). This has the same problem.

Basically the procedure needs to do:

procedure()
{
    if(some_event)
        schedule_timeout(n);   /* Needs to sleep */
    else if(something_else)
        do_something();
    queue_task(procedure, &tq_immediate);   /* Needs to queue itself again */
}

Since I'm running against a time-line, I temporarily gave the module
some CPU time through an ioctl(), i.e., a separate task that does nothing
except repeatedly execute ioctl(GIVE_CPU, NULL); This shows that the
driver actually works. It's a GPIB driver so it needs to get the
CPU to find out if it's addressed to listen, etc. These events don't
produce interrupts.

So, what am I supposed to do to add a piece of driver code to the
run queue so it gets scheduled occasionally?

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Marcelo Tosatti wrote:
> >   The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> > adding a counter and looping only twice for non-zero order allocations.
> 
> Looks good. (actually Rik had a patch similar to this which fixed a real
> case with cdda2wav just like you described)

Not cdda2wav, I presume, but the optimization discussed here before that
wasn't really doable because of the vm behaviour when doing

do
	try to alloc some amount of contiguous pages
	if (ok)
		break

	lower number of pages wanted
while true

CDROMREADAUDIO stopped doing this and fell back to single cdda frame
size allocations because of these failures, even though it meant a huge
decrease in speed. cdda2wav will ask for iirc 16 frames at a time, the
current driver will try to do 8 first and then fall back to slower
extraction if allocations fail.
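
The fallback loop sketched in pseudocode above can be written out as a small user-space analogue. This is an illustration only: alloc_frames is a hypothetical helper, malloc() stands in for the kernel allocator, and halving the request on each failure is an assumption (the pseudocode only says "lower number of pages wanted").

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Try to allocate one buffer holding `want` frames of `frame_size` bytes.
 * On failure, halve the frame count and retry, down to a single frame.
 * On success, *got receives the number of frames the buffer can hold.
 * Returns NULL (and sets *got to 0) if even one frame cannot be allocated.
 */
void *alloc_frames(size_t want, size_t frame_size, size_t *got)
{
	size_t n;
	void *buf;

	assert(want > 0 && frame_size > 0 && got != NULL);

	for (n = want; n > 0; n /= 2) {
		if (n > (size_t)-1 / frame_size)
			continue;	/* n * frame_size would overflow; try fewer */

		buf = malloc(n * frame_size);
		if (buf) {
			*got = n;
			return buf;
		}
	}

	*got = 0;
	return NULL;
}
```

With a raw CD audio frame of 2352 bytes, alloc_frames(8, 2352, &got) first asks for 8 frames and, if that fails, for 4, 2 and finally 1. The caller then reads `got` frames per request, trading speed for a request the allocator can satisfy.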

-- 
Jens Axboe




Re: write to dvd ram

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Thiago Vinhas de Moraes wrote:
> 
> Hi!
> 
> Can this new UDF driver do cd-rewriting ?

No not in itself, but you can give the pktcdvd module a shot. It can do
rw CD-RW mount so far, at least.

*.kernel.org/pub/linux/kernel/people/axboe/packet/

There's a packet-writing mailing list for the above patch, there is more
info in the tar file above (subscribe info, archives, resources, etc).

-- 
Jens Axboe




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Marcelo Tosatti



On Tue, 8 May 2001, Mark Hemment wrote:

> 
>   In 2.4.3pre6, code in page_alloc.c:__alloc_pages(), changed from;
> 
>   try_to_free_pages(gfp_mask);
>   wakeup_bdflush();
>   if (!order)
>   goto try_again;
> to
>   try_to_free_pages(gfp_mask);
>   wakeup_bdflush();
>   goto try_again;
> 
> 
>   This introduced the effect of a non-zero order, __GFP_WAIT allocation
> (without PF_MEMALLOC set), never returning failure.  The allocation keeps
> looping in __alloc_pages(), kicking kswapd, until the allocation succeeds.
> 
>   If there is plenty of memory in the free-pools and inactive-lists
> free_shortage() will return false, causing the state of these
> free-pools/inactive-lists not to be 'improved' by kswapd.
> 
>   If there is nothing else changing/improving the free-pools or
> inactive-lists, the allocation loops forever (kicking kswapd).
> 
>   Does anyone know why the 2.4.3pre6 change was made?

Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
any job to do (ie less than 30% dirty buffers in the default config).  

> 
>   The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> adding a counter and looping only twice for non-zero order allocations.

Looks good. (actually Rik had a patch similar to this which fixed a real
case with cdda2wav just like you described)
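
The idea of the quoted patch (bound the retries for non-zero order allocations, keep retrying for order 0) can be modelled in user space as follows. This is not the patch itself, which is not shown in this message. All names here (alloc_with_retry, sim_state, sim_try_alloc, sim_reclaim) are hypothetical, and "looping only twice" is interpreted as one initial attempt plus two retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the allocator internals; not kernel API. */
typedef bool (*try_alloc_fn)(unsigned int order, void *state);
typedef void (*reclaim_fn)(void *state);

#define MAX_HIGH_ORDER_RETRIES 2

/*
 * Order-0 requests keep retrying until they succeed. Non-zero orders give
 * up after MAX_HIGH_ORDER_RETRIES retries instead of looping forever.
 */
bool alloc_with_retry(unsigned int order, try_alloc_fn try_alloc,
		      reclaim_fn reclaim, void *state)
{
	unsigned int retries = 0;

	for (;;) {
		if (try_alloc(order, state))
			return true;
		if (order > 0 && retries++ >= MAX_HIGH_ORDER_RETRIES)
			return false;
		reclaim(state);	/* models try_to_free_pages() + wakeup_bdflush() */
	}
}

/* Toy memory model: allocation succeeds once enough reclaim passes ran. */
struct sim_state {
	unsigned int reclaims_needed;	/* reclaim passes still required */
	unsigned int attempts;		/* allocation attempts made so far */
};

bool sim_try_alloc(unsigned int order, void *state)
{
	struct sim_state *s = state;

	(void)order;
	s->attempts++;
	return s->reclaims_needed == 0;
}

void sim_reclaim(void *state)
{
	struct sim_state *s = state;

	if (s->reclaims_needed > 0)
		s->reclaims_needed--;
}
```

In this model an order-1 request that needs one reclaim pass succeeds on its second attempt, while one that needs a hundred passes fails after three attempts. The order-0 path still loops until it succeeds, which matches the behaviour from before the 2.4.3pre6 change.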




Re: write to dvd ram

2001-05-08 Thread Thiago Vinhas de Moraes


Hi!

Can this new UDF driver do cd-rewriting ?



Em Ter 08 Mai 2001 14:50, Jens Axboe escreveu:
> On Tue, May 08 2001, Ben Fennema wrote:
> > > The log is:
> > > Apr 15 20:58:27 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29)
> > > Mounting volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)
> >
> > At the very least, run 0.9.3 from sourceforge (or the cvs version) and
> > see if it works any better.
>
> I was just about to say the same thing, 0.9.3 works well for me. In fact
> so well, that I made a patch to bring 2.4.5-pre1 UDF up to date with
> current CVS earlier this afternoon (hint hint, Ben :-).
>
> *.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.5-pre1/
>
> udf-0.9.3-2.4.5p1-1.bz2

-- 

 Thiago Vinhas de Moraes
 NetWorx - A SuaCompanhia.com



Re: LSB 0.9 public draft

2001-05-08 Thread Guest section DW

On Tue, May 08, 2001 at 04:29:37PM +0100, Alan Cox wrote:
> To make sure this gets enough publicity and eyes on it..
> > http://www.linuxbase.org/spec/lsbreview.html

Yes. Lots of tiny inaccuracies. And no email address.
(But a form with the mysterious button "Change".)



Problem: 'keyboard: Timeout - AT keyboard not present?'

2001-05-08 Thread Len Sorensen

I have been encountering the following problem for quite a while now
(in 2.4 pre kernels, and 2.4.x final kernels), and from what I have
been able to determine, it has affected people since 2.3.4x or so,
and is also affecting 2.2.17 and above.

The problem is that once in a while (which varies greatly and doesn't
appear at all consistent), the keyboard will lock up for a second or
two and the kernel prints the message 'keyboard: Timeout - AT keyboard
not present?'.  This almost always involves getting an extra character
(usually the one just hit or the one before it), or missing the character
just hit.  It can even occur when not typing at all, although that is
much more rare.

I used to think it was just a problem in the keyboard controller on my
machine, but I now have it happening on the machine I just switched to,
and have found a number of other posts about this problem by searching
www.google.com for the string in the error message.  I no longer think
it is necessarily a hardware problem (although I can't rule out this
error being caused by flaky keyboard controllers).  The same machines
that display this error, never even once have done so with 2.2.16
or lower.

Some interesting trends I found while searching for any info on this message on Google are these:

initialize_kbd: Keyboard reset failed, no ACK
pty: 256 Unix98 ptys configured
keyboard: Timeout - AT keyboard not present?
keyboard: Timeout - AT keyboard not present?
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize

initialize_kbd: Keyboard interface failed self test
pty: 256 Unix98 ptys configured
keyboard: Timeout - AT keyboard not present?
keyboard: Timeout - AT keyboard not present?
Floppy drive(s): fd0 is 2.88M

initialize_kbd: Keyboard reset failed, no ACK
Serial driver version 4.92 (2000-1-27) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
keyboard: Timeout - AT keyboard not present?
keyboard: Timeout - AT keyboard not present?
ttyS00 at 0x03f8 (irq = 4) is a 16550A

and:

PIIX4: IDE controller on PCI bus 00 dev f9
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
keyboard: Timeout - AT keyboard not present?
keyboard: Timeout - AT keyboard not present?
hda: Maxtor 91366U4, ATA DISK drive

PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x84c0-0x84c7, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0x84c8-0x84cf, BIOS settings: hdc:pio, hdd:DMA
keyboard: Timeout - AT keyboard not present?
keyboard: Timeout - AT keyboard not present?
hda: QUANTUM FIREBALLP LM15, ATA DISK drive

SIS5513: chipset revision 208
SIS5513: not 100% native mode: will probe irqs later
SiS5597
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
keyboard: Timeout - AT keyboard not present?
keyboard: Timeout - AT keyboard not present?
hda: ST5660A, ATA DISK drive

It seems that it tends to occur just after the ide controller is
detected, and/or just around the unix98 pty init (which is right around
the serial port init).  Not sure if the probing of hardware involves
the interrupts being disabled, and that causing a problem.

It of course also happens lots while I am typing, so I am not sure what
can be causing interrupt losses, but it could be disk access or power
management.  I will try using a kernel without any power management and
see if that makes a difference.

Len Sorensen



nfsd from kernel 2.4.4 oops

2001-05-08 Thread Fermin Molina

Hi,

I'm using kernel 2.4.4 cvs from SGI, with xfs. I'm getting this Oops:

kernel: Unable to handle kernel NULL pointer dereference at virtual address 0010
kernel:  printing eip:
kernel: c017bfd8
kernel: *pde = 
kernel: Oops: 
kernel: CPU:0
kernel: EIP:0010:[nfsd_findparent+120/236]
kernel: EIP:0010:[]
kernel: EFLAGS: 00010246
kernel: eax:    ebx:    ecx: cff8d458   edx: 0010
kernel: esi: cb22c6a0   edi: cb22c720   ebp: cb22c720   esp: ce4c9e54
kernel: ds: 0018   es: 0018   ss: 0018
kernel: Process nfsd (pid: 592, stackpage=ce4c9000)
kernel: Stack:  1802280f c017c416 cb22c720  ce4cf814 1127 ce4cf804
kernel:c03c5740 cfe3b5c8 000e ff8c  c017c7c4 cfe3b400 1802280f
kernel:  0001 ce4cf804 0008 cb1fc77c ce4cfc00 ceb7b000
kernel: Call Trace: [find_fh_dentry+598/928] [fh_verify+612/1128] 
[nfsd_lookup+110/1368] [nfsd3_proc_lookup+314/332] [nfs3svc_decode_diropargs+152/268] 
[nfsd_dispatch+203/360] [svc_process+684/1348]
kernel: Call Trace: [] [] [] [] [] 
[] []


It happens at seemingly random times. Some people on the xfs list report a similar
error; they also tested a clean 2.4.4 with an ext2 filesystem and it oopses too. So I
think this is related to the nfsd code (maybe the sunrpc code) and not to the xfs code.

It always happens in the nfsd_findparent function. I enabled kdb support and I always
see the same stack trace, with the function calls in the same order.

Client machines mount exports from this server (an i386 with SMP enabled, 2 processors),
and both NFS protocol versions 2 and 3 are in use.

Any hints? Does anyone else get a similar oops?

How can I enable some debugging in the nfsd & sunrpc code to see what is happening?

Thanx!

/Fermin

[EMAIL PROTECTED]



Re: page_launder() bug

2001-05-08 Thread Kai Henningsen

[EMAIL PROTECTED] (Horst von Brand)  wrote on 07.05.01 in 
<[EMAIL PROTECTED]>:

> "David S. Miller" <[EMAIL PROTECTED]> said:
> > Jonathan Morton writes:
> >  > >-page_count(page) == (1 + !!page->buffers));
> >  >
> >  > Two inversions in a row?
> >
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
>
> IMVHO, it is clearer to write:
>
>   page_count(page) == 1 + (page->buffers != NULL)
>
> At least, the original poster wouldn't have wondered, and I wouldn't have
> had to think a bit to find out what it meant... If gcc generates worse code
> for this, it should be fixed.

Huh. IMO, that is significantly *less* readable. And incidentally I'd be  
less certain that it actually does what you want - it is rather easy to  
convince yourself that !! has to do the right thing.
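
For readers comparing the two spellings, the following sketch shows that they compute the same value in C. The function names are hypothetical, and `buffers` stands in for the page->buffers pointer from the quoted expression.

```c
#include <assert.h>
#include <stddef.h>

/* Expected reference count, written with the double negation. */
int expected_count_bangbang(const void *buffers)
{
	/* !buffers is 1 for NULL and 0 otherwise; the second ! inverts it. */
	return 1 + !!buffers;
}

/* The same value written as an explicit comparison against NULL. */
int expected_count_compare(const void *buffers)
{
	/* In C the != operator yields an int that is exactly 0 or 1. */
	return 1 + (buffers != NULL);
}
```

Both functions return 1 for a NULL pointer and 2 for any non-NULL pointer, so the choice between them is purely about readability.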

MfG Kai



Re: PCMCIA IDE flash problem found

2001-05-08 Thread Pavel Machek

> Why didn't you take care of the request_region() call instead of just disabling it?
> The ports will be considered free by the system, and another device might 
> grab them later on!

Because it was one of the changes between 2.4.0 and 2.4.4. Ignore that.

Pavel
> 
> Vassilii
> 
> -Original Message-
> From: Pavel Machek [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 08, 2001 8:14 AM
> To: kernel list
> Subject: PCMCIA IDE flash problem found
> 
> 
> Hi!
> 
> 2.4.[123] changed name of ide-cs module, which means your pcmcia setup
> breaks... This is how to undo the damage. Works for me, do *not* apply
> into anything official.
> 
>   Pavel
> 
> --- clean/drivers/ide/ide-cs.cSun Apr  1 00:23:29 2001
> +++ linux/drivers/ide/ide-cs.cTue May  8 14:06:09 2001
> @@ -95,7 +96,7 @@
>  static int ide_event(event_t event, int priority,
>event_callback_args_t *args);
>  
> -static dev_info_t dev_info = "ide-cs";
> +static dev_info_t dev_info = "ide_cs";
>  
>  static dev_link_t *ide_attach(void);
>  static void ide_detach(dev_link_t *);
> @@ -388,9 +389,12 @@
>   MOD_DEC_USE_COUNT;
>  }
>  
> +#if 0
>  request_region(link->io.BasePort1, link->io.NumPorts1,"ide-cs");
>  if (link->io.NumPorts2)
>   request_region(link->io.BasePort2, link->io.NumPorts2,"ide-cs");
> +#endif
> +printk("Should call request_region\n");
>  
>  info->ndev = 0;
>  link->dev = NULL;

-- 
The best software in life is free (not shareware)!  Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+



Re: kdb wishlist

2001-05-08 Thread mirabilos

> * Change kdb invocation key from ^A to ^X^X^X within 3 seconds.  ^A is
>   used by emacs, bash, minicom etc.

Why not Alt-SysRq-D (like Debug) or so?

> * Command history.  Handle up/down/left/right/delete keys.  Each
>   kdba_io routine is responsible for recognising the arch specific
>   keys, with a common history and editing routine.

yes!

> * Clean up repeating commands.  Pressing enter at the kdb prompt
>   repeats the previous command, no matter what the previous command
>   was.  Some commands it makes no sense to repeat (bp in particular),
>   for other commands you want to repeat the command but without the
>   parameter (md in particular).

Should be configurable. Sometimes I accidentally hit enter or do it
just to do something...

-mirabilos
-- 
EA F0 FF 00 F0 #$@%CARRIER LOST




Re: write to dvd ram

2001-05-08 Thread cacook

Thanks, I'll try it.  Didn't get the prior response.
--
C.

The best way out is always through.
  - Robert Frost  A Servant to Servants, 1914



Jens Axboe wrote:

> On Tue, May 08 2001, Ben Fennema wrote:
> > > The log is:
> > > Apr 15 20:58:27 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29) Mounting
> > > volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)
> >
> > At the very least, run 0.9.3 from sourceforge (or the cvs version) and
> > see if it works any better.
>
> I was just about to say the same thing, 0.9.3 works well for me. In fact
> so well, that I made a patch to bring 2.4.5-pre1 UDF up to date with
> current CVS earlier this afternoon (hint hint, Ben :-).
>
> *.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.5-pre1/
>
> udf-0.9.3-2.4.5p1-1.bz2
>
> --
> Jens Axboe







RE: PCMCIA IDE flash problem found

2001-05-08 Thread Khachaturov, Vassilii

Why didn't you take care of the request_region() call instead of just disabling it?
The ports will be considered free by the system, and another device might 
grab them later on!

Vassilii

-Original Message-
From: Pavel Machek [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 08, 2001 8:14 AM
To: kernel list
Subject: PCMCIA IDE flash problem found


Hi!

2.4.[123] changed name of ide-cs module, which means your pcmcia setup
breaks... This is how to undo the damage. Works for me, do *not* apply
into anything official.

Pavel

--- clean/drivers/ide/ide-cs.c  Sun Apr  1 00:23:29 2001
+++ linux/drivers/ide/ide-cs.c  Tue May  8 14:06:09 2001
@@ -95,7 +96,7 @@
 static int ide_event(event_t event, int priority,
 event_callback_args_t *args);
 
-static dev_info_t dev_info = "ide-cs";
+static dev_info_t dev_info = "ide_cs";
 
 static dev_link_t *ide_attach(void);
 static void ide_detach(dev_link_t *);
@@ -388,9 +389,12 @@
MOD_DEC_USE_COUNT;
 }
 
+#if 0
 request_region(link->io.BasePort1, link->io.NumPorts1,"ide-cs");
 if (link->io.NumPorts2)
request_region(link->io.BasePort2, link->io.NumPorts2,"ide-cs");
+#endif
+printk("Should call request_region\n");
 
 info->ndev = 0;
 link->dev = NULL;



Re: write to dvd ram

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Ben Fennema wrote:
> > The log is:
> > Apr 15 20:58:27 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29) Mounting
> > volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)
> 
> At the very least, run 0.9.3 from sourceforge (or the cvs version) and
> see if it works any better.

I was just about to say the same thing, 0.9.3 works well for me. In fact
so well, that I made a patch to bring 2.4.5-pre1 UDF up to date with
current CVS earlier this afternoon (hint hint, Ben :-).

*.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.5-pre1/

udf-0.9.3-2.4.5p1-1.bz2

-- 
Jens Axboe




PCMCIA IDE flash problem found

2001-05-08 Thread Pavel Machek

Hi!

2.4.[123] changed name of ide-cs module, which means your pcmcia setup
breaks... This is how to undo the damage. Works for me, do *not* apply
into anything official.

Pavel

--- clean/drivers/ide/ide-cs.c  Sun Apr  1 00:23:29 2001
+++ linux/drivers/ide/ide-cs.c  Tue May  8 14:06:09 2001
@@ -95,7 +96,7 @@
 static int ide_event(event_t event, int priority,
 event_callback_args_t *args);
 
-static dev_info_t dev_info = "ide-cs";
+static dev_info_t dev_info = "ide_cs";
 
 static dev_link_t *ide_attach(void);
 static void ide_detach(dev_link_t *);
@@ -388,9 +389,12 @@
MOD_DEC_USE_COUNT;
 }
 
+#if 0
 request_region(link->io.BasePort1, link->io.NumPorts1,"ide-cs");
 if (link->io.NumPorts2)
request_region(link->io.BasePort2, link->io.NumPorts2,"ide-cs");
+#endif
+printk("Should call request_region\n");
 
 info->ndev = 0;
 link->dev = NULL;

-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]



Re: kdb wishlist

2001-05-08 Thread Juan Quintela

> "slurn" == slurn  <[EMAIL PROTECTED]> writes:

>> 
>> Keith Owens wrote:
>> > 
>> > This is part of my kdb wishlist, does anybody fancy writing the code to
>> > add any of these features?  It would be a nice project for anybody
>> > wanting to start on the kernel.  Replies to [EMAIL PROTECTED] please.
>> > Current patches at http://oss.sgi.com/projects/kdb/download/
>> > 
>> > * Change kdb invocation key from ^A to ^X^X^X within 3 seconds.  ^A is
>> >   used by emacs, bash, minicom etc.
>> > 
>> ^X^X swaps point and mark in emacs.  One (well, I) often will do
>> ^X^X^X^X to examine where mark is and then return to point.

slurn> How about using the break condition instead.  This is only for the
slurn> serial port, and most terminal emulators (e.g. kermit, minicom) provide
slurn> a means to generate a break condition on the serial port. 

kdb already uses BREAK on the serial port (that minicom uses C-a to
send a break is incidental :)  But the problem at hand is the console.
I vote for ^X^X^X, as I think it is not a difficult shortcut.  (And
yes, I also use emacs and ^X^X all the time, but I don't think this
combination is especially bad, and I suppose other people's pet
applications would have problems with something like ^A^A^A, which I
never use.)
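
A minimal sketch of the "^X^X^X within 3 seconds" rule from the wishlist,
assuming the console driver can pass each key and a timestamp in seconds
to a small state machine. The names kdb_seq, kdb_seq_init and kdb_seq_feed
are hypothetical and not taken from the kdb patches:

```c
#include <assert.h>
#include <stddef.h>

#define KDB_KEY     0x18  /* ASCII CAN, i.e. Ctrl-X */
#define KDB_REPEATS 3     /* consecutive presses required */
#define KDB_WINDOW  3     /* seconds allowed for the whole sequence */

/* Hypothetical detector state: arrival times of the latest ^X run. */
struct kdb_seq {
    long stamps[KDB_REPEATS];
    size_t count;
};

void kdb_seq_init(struct kdb_seq *s)
{
    assert(s != NULL);
    s->count = 0;
}

/*
 * Feed one key and its arrival time (in seconds).
 * Returns 1 when this key completes KDB_REPEATS consecutive ^X
 * presses inside KDB_WINDOW seconds, otherwise 0.
 */
int kdb_seq_feed(struct kdb_seq *s, unsigned char key, long now)
{
    size_t i;

    assert(s != NULL);
    if (key != KDB_KEY) {
        s->count = 0;              /* any other key breaks the run */
        return 0;
    }
    if (s->count == KDB_REPEATS) { /* slide the window: drop the oldest press */
        for (i = 1; i < KDB_REPEATS; i++)
            s->stamps[i - 1] = s->stamps[i];
        s->count--;
    }
    s->stamps[s->count++] = now;
    if (s->count == KDB_REPEATS && now - s->stamps[0] <= KDB_WINDOW) {
        s->count = 0;              /* start fresh after a match */
        return 1;
    }
    return 0;
}
```

Note that an emacs user typing ^X^X^X^X quickly would still trigger this
on the third press, which is exactly the objection raised above.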

Later, Juan.



-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy



PCMCIA ide flash card does not work

2001-05-08 Thread Pavel Machek

Hi!

My ide flash card used to work in 2.4.0, but does not work in
2.4.4. Everything compiled in (no modules)

May  8 13:43:44 bug cardmgr[58]: initializing socket 0
May  8 13:43:44 bug cardmgr[58]: socket 0: ATA/IDE Fixed Disk
May  8 13:43:44 bug cardmgr[58]: module //pcmcia/ide_cs.o not
available
May  8 13:43:45 bug cardmgr[58]: get dev info on socket 0 failed:
Resource temporarily unavailable

PCMCIA ne2000 card works at the same time.

Any hints?

Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]



Re: [Question] Explanation of zero-copy networking

2001-05-08 Thread Alexander Eichhorn

First, thanks for the (unexpectedly large) discussion and hints!

Second: sorry for the multimedia-centric viewpoint, but I think
it's an important task for future operating systems development
(or better: for a real world OS like linux) to have sophisticated
support for a _large diversity_ of application requirements,
and realtime/multimedia apps have been neglected for too long.


"Richard B. Johnson" wrote:

> So, the kernel is going to send a packet to another host on
> behalf of the system caller.  It copies the data,  (partial
> checksum) assembles the packet, finishes the checksum, then
> sends it.  The CPU is given to somebody  else while waiting
> for the packet to  get somewhere and be ACKed.  But,  think
> about a server  where EVERY  task  is  waiting  for I/O  to
> complete!  These CPU cycles,  that you saved by eliminating
> a copy (or two), are now wasted spinning.
>
> Basically, "no copy" is an academic exercise. It makes the first
> packet get sent more quickly, after which everything slows to
> the natural bandwidth of the system.

These are the semantics of a typical client/server request/reply
protocol as used in "traditional" applications. But they aren't
appropriate for communicating realtime media streams,
because blocking breaks the strict timing constraints. Here we need
asynchronous communication (*non-blocking semantics*).
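
A rough sketch of what non-blocking semantics mean for the sender,
assuming plain POSIX descriptors. The helper names set_nonblocking and
try_send_frame are made up for illustration; a real media sender would
add pacing and a policy for frames it cannot queue:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Try to queue one media frame without ever sleeping.
 * Returns the number of bytes queued (possibly fewer than len),
 * 0 if the kernel buffer is full (the caller may drop or reschedule
 * the frame), or -1 on a real error.
 */
ssize_t try_send_frame(int fd, const void *frame, size_t len)
{
    ssize_t n;

    assert(frame != NULL && len > 0);
    n = write(fd, frame, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```

The point of the design is that the sending task keeps its own schedule:
a full buffer becomes a return value to act on, not a sleep of unknown
length.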

> 
> If you used a server for multicast-only.  In other words,  you
> just spewed out unidirectional data, you still slow to the rate
> at which the media can take the data.  And CPUs can obtain or
> generate these data a lot faster than 100-base can sink them.
> 
> When we get to media that can sink data as fast as we can generate
> them (it), then we have to worry about memory copy speed. However,
> these new devices are actually an IP subsystem.  They generate and
> receive entire datagrams. To fully utilize these devices, the data-
> gram generation and reception (the basis of all TCP/IP networking)
> will have to be moved out of the kernel and into these boards. The
> kernel code will only handle interfaces, connections, and rules.

Oh, these are the arguments of people who would rather invest in
more resources than in clever algorithms. It's comparable
to the old war between the ATM folks and the IP/Ethernet folks:
concepts against "brute" resources.

1. You don't take into account that there are not only high-end PCs and
workstations with enormous CPU and memory resources! Devices for
"pervasive ubiquitous computing" (don't blame me for this buzzword),
for example, are mostly embedded systems with scarce resources, happy
to have enough CPU cycles for video codecs.

2. At the other end are Video-on-Demand servers with (more than one)
high-speed NIC and large SANs or disk arrays for video storage with
gigabit/InfiniBand connections. Here
the problem is not only to saturate the links (for economic reasons), but
also to guarantee low delay and jitter to every connection.
I think we should extend the usability of linux to this class of
servers too.

3. Have a look at the various papers on high performance networking.
The gap between the growth in network bandwidth and the growth in CPU
and bus performance is increasing. Today the system buses are not
considered to be in the "window of scarcity" (today we have 100MBit
Ethernet and 133++MB/s PCI). Tomorrow our operating system concepts
will have to cope with 1, 10, ?? Gigabit Ethernet, InfiniBand,
... who knows.

This means: scale CPU and memory-bus performance accordingly, or
use resource-sparing IPC mechanisms and implement computationally
complex algorithms (checksum calculations, encryption) in hardware.
Besides continuous-media applications, other applications that need
to move data chunks much larger than the CPU caches will
benefit from such infrastructure too. (Both classes of systems
from above will be affected.)

For those applications copy avoidance is fundamental, because
copying is expensive: it needs all three basic
system resources (CPU, memory and the bandwidth of local communication
facilities, i.e. buses) at the same time (synchronously)!

Many researchers recognized this problem and developed techniques
to overcome the dusty OS concepts (UNet, UVM, ...). Unfortunately
they need special hardware (NICs), have in part too much
overhead or are not general enough. The one thing this shows us
is that there is still some work to be done.

Regards,

Alexander Eichhorn

-- 
Alexander Eichhorn
Technical University of Ilmenau
Computer Science And Automation Faculty
Distributed Systems and Operating Systems Department
Phone +49 3677 69 4557, Fax  +49 3677 69 4541



Re: write to dvd ram

2001-05-08 Thread Ben Fennema

> The log is:
> Apr 15 20:58:27 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29) Mounting
> volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)

At the very least, run 0.9.3 from sourceforge (or the cvs version) and
see if it works any better.

Ben



Re: kdb wishlist

2001-05-08 Thread slurn

> 
> Keith Owens wrote:
> > 
> > This is part of my kdb wishlist, does anybody fancy writing the code to
> > add any of these features?  It would be a nice project for anybody
> > wanting to start on the kernel.  Replies to [EMAIL PROTECTED] please.
> > Current patches at http://oss.sgi.com/projects/kdb/download/
> > 
> > * Change kdb invocation key from ^A to ^X^X^X within 3 seconds.  ^A is
> >   used by emacs, bash, minicom etc.
> > 
> ^X^X swaps point and mark in emacs.  One (well, I) often will do
> ^X^X^X^X to examine where mark is and then return to point.

How about using the break condition instead.  This is only for the
serial port, and most terminal emulators (e.g. kermit, minicom) provide
a means to generate a break condition on the serial port. 

scott

> 
> George
> 
> ~snip
> 




Re: [PATCH] x86 page fault handler not interrupt safe

2001-05-08 Thread Linus Torvalds


On Tue, 8 May 2001, Alan Cox wrote:
> 
> I don't see where the alternative patch ensures the user didn't flip the
> direction flag, for one

Yeah. 

We might as well just make it "eflags & IF", none of the other flags
should matter (or we explicitly want them cleared).
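
A tiny sketch of the "eflags & IF" test, assuming the saved EFLAGS image
is available as an unsigned long. The bit positions are architectural
(IF is bit 9, DF is bit 10); the constants are defined locally here and
the function name saved_irqs_enabled is hypothetical:

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x00000200UL  /* interrupt enable flag, bit 9 */
#define X86_EFLAGS_DF 0x00000400UL  /* direction flag, bit 10 */

/*
 * Returns 1 if the saved EFLAGS image had interrupts enabled.
 * Only IF is examined, so a user-controlled DF (or any other flag)
 * cannot influence the decision.
 */
int saved_irqs_enabled(unsigned long eflags)
{
    return (eflags & X86_EFLAGS_IF) != 0;
}
```

A fault handler would re-enable interrupts only when this returns 1,
instead of restoring the whole saved flags word.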

Linus




Re: [ot] named sockets

2001-05-08 Thread Mikael Pettersson

On Mon, 7 May 2001 21:47:33 -0400 (EDT), Adam <[EMAIL PROTECTED]> wrote:

>So I'm wondering, is there a way, something like a "relink" system call, which
>could take an existing file descriptor (they are still open, so the fd is there,
>just unlinked) and link it back to a file name?

POSIX' fattach(int fd, const char *path) library call does that,
although it's often limited to STREAMS fd:s. It's usually
implemented as mounting "namefs" at the path (SVR4) or setting
a magic mount option (OSF1), with the fd passed in as mount-point
specific data. Regular users are allowed to do this special mount().

Linux currently doesn't have this functionality, but it could
probably be implemented as a pseudo-fs in 2.4, assuming the 2.4
VFS properly supports stacking of file systems. (There's some
gotchas concerning chown/chmod changes and restoring the original
object after the fd is unmounted.)

Not that I think Linux really needs this creeping featurism ...

/Mikael



Re: [Question] Explanation of zero-copy networking

2001-05-08 Thread Jamie Lokier

Alan Cox wrote:
> > so there's still single copy for write() of a mmap()ed page?
> 
> An mmap page will go direct to disk.

Looking at the 2.4.4 code, mmap() of file followed by write() to socket
will copy the data once.

I could be mistaken (only glanced at the code quickly) but I base that
on the only call to ->sendpage being through sendfile.

So yes, there's a single copy overhead for mmap()+write().
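
The two user-space paths being compared can be sketched as follows.
This is an illustration only, assuming a Linux system; the function
names are hypothetical. In the 2.4 kernels discussed here the output
descriptor of sendfile() had to be a socket, while later kernels accept
other descriptor types as well:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of an open file, or -1 on error. */
static off_t file_size(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? st.st_size : -1;
}

/* mmap() the file, then write() the mapping: the data is copied once
 * from the mapped pages into the kernel's buffers for out_fd. */
ssize_t send_with_mmap_write(int out_fd, const char *path)
{
    ssize_t sent = -1;
    off_t len;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    len = file_size(fd);
    if (len > 0) {
        void *p = mmap(NULL, (size_t)len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            sent = write(out_fd, p, (size_t)len);
            munmap(p, (size_t)len);
        }
    }
    close(fd);
    return sent;
}

/* sendfile(): the kernel moves the file's pages to out_fd itself,
 * which is the path that can reach ->sendpage. */
ssize_t send_with_sendfile(int out_fd, const char *path)
{
    ssize_t sent = -1;
    off_t len, off = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    len = file_size(fd);
    if (len > 0)
        sent = sendfile(out_fd, fd, &off, (size_t)len);
    close(fd);
    return sent;
}
```

Both calls may transfer fewer bytes than requested; a real sender would
loop until the whole file has gone out.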

-- Jamie



Re: Possible PCI subsystem bug in 2.4

2001-05-08 Thread Maciej W. Rozycki

On 4 May 2001, Eric W. Biederman wrote:

> The example that sticks out in my head is we rely on the MP table to
> tell us if the local apic is in pic_mode or in virtual wire mode.
> When all we really have to do is ask it.

 You can't.  IMCR is write-only and may involve chipset-specific
side-effects.  Then even if IMCR exists, a system's firmware might have
chosen the virtual wire mode for whatever reason (e.g. broken hardware). 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+




Re: nfs MAP_SHARED corruption fix

2001-05-08 Thread Kurt Garloff

On Tue, May 08, 2001 at 05:21:02PM +0200, Trond Myklebust wrote:
> Could you instead detail exactly which corruption problem you are
> trying to fix?

int fd = open (name, O_RDWR);
char* adr = (char*) mmap (0, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
/* write to *adr through *(adr+len-1) */
/* Try adding here: msync (adr, len, MS_SYNC); */
munmap (adr, len);
close (fd);

The code works on files on local harddisks and on NFS volumes on a 2.2
kernel, but breaks on NFS drives on a 2.4.4 kernel.
msync() works around the bug.
Andrea's patch did help as well.
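
For reference, a self-contained version of the sequence above with the
msync() workaround and error checks added. This is only a sketch: the
function name write_through_mapping is made up, and it assumes the file
already exists and is at least len bytes long (touching mapped pages
beyond the end of the file would fault):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Overwrite the first len bytes of an existing file through a
 * MAP_SHARED mapping. Returns 0 on success, -1 on any error.
 */
int write_through_mapping(const char *name, const char *data, size_t len)
{
    char *adr;
    int fd, rc;

    assert(name != NULL && data != NULL && len > 0);
    fd = open(name, O_RDWR);
    if (fd == -1)
        return -1;
    adr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (adr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(adr, data, len);            /* write to adr[0] .. adr[len-1] */
    rc = msync(adr, len, MS_SYNC);     /* the workaround: flush before unmapping */
    if (munmap(adr, len) == -1)
        rc = -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

On a local filesystem the msync() call should not be needed for
correctness after munmap() and close(); it is kept here because it is
the step reported to avoid the corruption on NFS with 2.4.4.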

Regards,
-- 
Kurt Garloff  <[EMAIL PROTECTED]>  Eindhoven, NL
GPG key: See mail header, key servers Linux kernel development
SuSE GmbH, Nuernberg, FRG   SCSI, Security



LSB 0.9 public draft

2001-05-08 Thread Alan Cox

To make sure this gets enough publicity and eyes on it..

-
> The Linux Standard Base is in the final stages of the LSB written
> specification for Linux.   The workgroup has published the LSB v0.9 written
> specification, and is undergoing a thirty day Request For Comments  from
> the public until Wednesday June 6th, 2001.  Afterwards, this draft will be
> submitted to the Free Standards Group for adoption.
> 
> http://www.linuxbase.org/spec/lsbreview.html
> 
> The goal of the LSB is to develop and promote a set of standards that will
> increase compatibility among Linux distributions and enable software
> applications to run on any compliant Linux system. In addition, the LSB
> will help coordinate efforts to recruit software vendors to port and write
> products for Linux.
> 
> http://www.linuxbase.org/
> 
> George (gk4)






Re: nfs MAP_SHARED corruption fix

2001-05-08 Thread Trond Myklebust

> " " == Andrea Arcangeli <[EMAIL PROTECTED]> writes:

 > This fixes corruption with MAP_SHARED on top of nfs filesystem
 > in 2.4:
 > --- 2.4.5pre1aa2/fs/nfs/write.c.~1~ Tue May 1 19:35:29 2001
 > +++ 2.4.5pre1aa2/fs/nfs/write.c Tue May 8 02:04:15 2001
 > @@ -1533,6 +1533,7 @@
 >  if (!inode && file)
 >  inode = file->f_dentry->d_inode;
 
 > + filemap_fdatasync(inode->i_mapping);
 >  do {
 >  error = 0; if (wait)

Yech! Apart from the fact that this means you do a full fdatasync()
even when you are simply trying to flush out single pages,
nfs_sync_file() gets called all over the place including in areas
where we know we're already holding a page lock.


AFAICs this fix will clearly deadlock...

Could you instead detail exactly which corruption problem you are
trying to fix?

Cheers,
  Trond



Re: RAID question

2001-05-08 Thread Jakob Østergaard

On Tue, May 08, 2001 at 12:48:25PM +1000, Peter Waltenberg wrote:
> We have a RAID 5 system thats had 2 of 6 disks in the RAID go into thermal
> shutdown. (Air-con failure).
> 
> The disks are functional, but the RAID won't restart because the superblock
> timestamps on those two disks are now out of step with the rest of the array and
> there aren't enough "good" disks to reconstruct the array.
> 
> We know there was very little activity when this happened.
> 
> Does anyone out there know of a way to hack the superblocks on the "bad" disks
> to force them to appear to be O.K. so that the RAID will restart. 

As documented in the HOWTO (http://unthought.net/Software-RAID.HOWTO), you should
re-run mkraid after making dead sure that your raidtab still corresponds to the
RAID on your disks (it usually does unless someone screwed it up).

Run fsck on the RAID after mkraid.

-- 

:   [EMAIL PROTECTED]   : And I see the elder races, :
:.: putrid forms of man:
:   Jakob Østergaard  : See him rise and claim the earth,  :
:OZ9ABN   : his downfall is at hand.   :
:.:{Konkhra}...:



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Alex Bligh - linux-kernel

>   The real fix is to measure fragmentation and the progress of kswapd, but
> that is too drastic for 2.4.x.

I suspect the real fix might, in general, be
a) to reduce use of kmalloc() etc. which gives
   physically contiguous memory, where virtually
   contiguous memory will do (and is, presumably,
   far easier to come by). (or perhaps add some
   flag to kmalloc to allocate out of virtual
   rather than physical memory).
b) to bias flush or swap out routines to create
   physically contiguous higher order blocks.
   Many heuristics will give you that ability.

Disclaimer: I haven't looked at this issue for years,
but Linux seems to fail on >4k allocations now, and
to fragment memory far more, than it did on much smaller
systems doing lots of nasty (8k, thus 3 pages including
header) NFS stuff back in '94.

--
Alex Bligh



Re: write to dvd ram

2001-05-08 Thread cacook

Many thanks Jim.  Now at least I have a way.

But I caution others that Linux modifies the UDF filesystem somehow, so that Winders 
can no longer understand it.  I nearly lost all my music & photo archives to this.  
And attempts to rm or mv on a DVDRAM with UDF cause it to segfault & jam up.  There 
doesn't seem to be an answer to this.  (yes, I have written to the developer of 
cddriver; no response at all after two weeks)  UDF2 is just nonfunctional in Linux and 
I don't know why.

To recap: running Panasonic LF-D101 DVDRAM drive on SCSI (AHA2940) and
getting segfaults.  On-disk format is UDF2.0, as 2.1 won't mount.

Mount, ls, umount, mount, ls, umount, etc - no problem except filestructure is now no 
longer available to Winders. (CAUTION! Save data using Linux for recovery in Winders)

Mount, cp <20Mfile>, umount, mount, ls, (20Mfile), umount, mount, ls, (20Mfile),
rpm -q 20Mfile, umount, etc - no problem except filestructure no longer available to 
Winders.

Mount, rm <20Mfile>, Segmentation Fault, umount, (device busy), umount, (device busy), 
etc.  Reboot without reset and bootup hangs at Running Linuxconf hooks. Reset & system 
boots fine.  Mount, ls, (no files), umount, mount, ls, (no files), umount, etc.

Running RedHat Wolverine with HelixGnome & Nautilus.
--
C.

The best way out is always through.
  - Robert Frost  A Servant to Servants, 1914

Keywords: DVDRAM DVD-RAM LF-D101 LFD101 cdrecord


The log is:
Apr 15 20:58:27 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29) Mounting
volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)
Apr 15 20:59:31 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29) Mounting
volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)
Apr 15 20:59:50 hydra last message repeated 3 times
Apr 15 21:00:17 hydra mon[1258]: failure for servers http 987390017 localhost
Apr 15 21:01:11 hydra kernel: UDF-fs INFO UDF 0.9.1 (2000/02/29) Mounting
volume 'UDF Volume', timestamp 2001/03/02 11:55 (1e98)
Apr 15 21:03:25 hydra last message repeated 2 times
Apr 15 21:03:40 hydra kernel: kernel BUG at inode.c:890!
Apr 15 21:03:40 hydra kernel: invalid operand: 
Apr 15 21:03:40 hydra kernel: CPU:0
Apr 15 21:03:40 hydra kernel: EIP:0010:[iput_free+216/352]
Apr 15 21:03:40 hydra kernel: EIP:0010:[]
Apr 15 21:03:40 hydra kernel: EFLAGS: 00010286
Apr 15 21:03:40 hydra kernel: eax: 001b   ebx: cb1ad640   ecx: 0004
edx: c5508840
Apr 15 21:03:40 hydra kernel: esi: c0319560   edi: cb4f2740   ebp: b678 esp:
c9d5ff20
Apr 15 21:03:40 hydra kernel: ds: 0018   es: 0018   ss: 0018
Apr 15 21:03:40 hydra kernel: Process rm (pid: 2254, stackpage=c9d5f000)
Apr 15 21:03:40 hydra kernel: Stack: c02a1610 c02a16f3 037a 
0012 c392 cffb3560 cb4f2740
Apr 15 21:03:40 hydra kernel:cb1ad640 c0144a3c cb1ad640 0184
fff0 c39229c0 cb4f2740 
Apr 15 21:03:40 hydra kernel:c39229c0 c013e31c cb4f2740 cfc83d40
c9d5ff9c  ffeb cb4f2740
Apr 15 21:03:40 hydra kernel: Call Trace: [error_table+39488/42452]
[error_table+39715/42452] [d_delete+76/112] [vfs_unlink+316/368]
[sys_unlink+150/272] [do_page_fault+0/1088] [system_call+51/56]
Apr 15 21:03:40 hydra kernel: Call Trace: [] []
[] [] [] [] []
Apr 15 21:03:40 hydra kernel:
Apr 15 21:03:40 hydra kernel: Code: 0f 0b 83 c4 0c eb 69 90 39 1b 74 3c f6 83 f8 00 00 
00 07 75
Apr 15 21:04:18 hydra mon[1258]: failure for servers http 987390258 localhost


ver_linux
Linux hydra.darkmatter.com 2.4.2-0.1.49 #1 Sun Apr 15 18:12:33 MDT 2001 i686 unknown

Gnu C                  2.96
Gnu make               3.79.1
binutils               2.10.91.0.2
util-linux             2.10r
modutils               2.4.2
e2fsprogs              1.19
reiserfsprogs          3.x.0b
PPP                    2.4.0
isdn4k-utils           3.1pre1
Linux C Library        2.2.2
Dynamic linker (ldd)   2.2.2
Procps                 2.0.7
Net-tools              1.57
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         via82cxxx_audio ac97_codec binfmt_misc autofs
                       nls_iso8859-1 nls_cp437
cdrecord 1.9-6


"Hawthorne, Jim J SSI-ISEA" wrote:

> No problem with ext2 file system -- I have been using LM 7.2 with kernel
> 2.2.14 and it works straight out of the box. Newer kernel 2.4.x should also
> work .
> Have Toshiba w1101 scsi dvd ram and initio wide scsi card. I also use
> WINDOZE 2000 and use UDF for DVD RAM on WINDOZE 2000 (Instant Write from VOB
> at www.vob.de -- came with the drive).
>
> I use BRUBACK for backup to backup both Linux and W2K (Bruback will backup
> windoze filesystem from Linux -- no problem)
>
> format your media with mke2fs /dev/scd1 (or wherever your dvd ram is
> detected) -- just use defaults  takes about 1 min to format a 2.6 GB media.
>
> create a directory under /   say /dvdram
>
> then simply mount -t ext2 /dev/scd1 /dvdram  hey presto you should get
> read/write access to your drive
>
> your fstab entry should look something like this
>
> /dev/scd1 /dvdram ext2 

2.2.19 + reiserfs 3.5.32 nfsd wait_on_buffer/down_failed

2001-05-08 Thread Michael Stiller

Hi,

we run an NFS server using 2.2.19 + ReiserFS version 3.5.32 on a
P3 550 machine. The disk subsystem is a GDT7518RN using 4 UW disks as a
RAID 5 device. After upgrading from 2.2.17 + reiserfs to 2.2.19 we
experience many more problems than with 2.2.17 with our roughly 12
(linux) NFS clients. The network is 100Mbit full duplex / switched.
I do not think this is network related, because ping -f doesn't show any
packet loss.

During not so heavy IO on the exported fs
one nfsd thread seems to be waiting for the disk:

  621 root   1   0 00 wait_on_b DW6.2  0.0   1:49 nfsd

and the other threads are waiting in down_fail:

  610 root   0   0 00 down_fail DW0.0  0.0   1:52 nfsd
  611 root   0   0 00 down_fail DW0.0  0.0   1:40 nfsd
  612 root   0   0 00 down_fail DW0.0  0.0   1:41 nfsd
  613 root   0   0 00 down_fail DW0.0  0.0   1:48 nfsd
  614 root   0   0 00 down_fail DW0.0  0.0   1:45 nfsd
  615 root   0   0 00 down_fail DW0.0  0.0   1:43 nfsd
  616 root   0   0 00 down_fail DW0.0  0.0   1:50 nfsd
  617 root   0   0 00 down_fail DW0.0  0.0   1:42 nfsd
  618 root   0   0 00 down_fail DW0.0  0.0   1:44 nfsd
  619 root   0   0 00 down_fail DW0.0  0.0   1:42 nfsd
  620 root   0   0 00 down_fail DW0.0  0.0   1:47 nfsd
  622 root   0   0 00 down_fail DW0.0  0.0   1:47 nfsd
  623 root   0   0 00 down_fail DW0.0  0.0   1:43 nfsd
  624 root   0   0 00 down_fail DW0.0  0.0   1:48 nfsd
  609 root   0   0 00 down_fail DW0.0  0.0   1:50 nfsd

During this event:

- If I check the disk I/O with e.g. vmstat 1, the machine is doing about 200 bi
  per second, which is not much, I guess.
- the client machines hang, as one would expect:

nfs: server foo is not responding
nfs: server foo still not responding
nfs: server foo OK

Our idea is to revert to 2.2.17, because its behaviour was much better.

How can I debug this? Can I do some tuning?
Should I revert to some older kernel?
Are there any patches for this problem? Does anyone have the same or a
related problem?

Any pointer would be useful.

TIA and cheers,

-Michael

-- 
In a world where an admin is rendered useless when the ball in his mouse
has been taken out, its good to know that I know UNIX. 



Re: fs.file-max

2001-05-08 Thread Nathan Straz

On Tue, May 08, 2001 at 10:03:23AM +, Federico Edelman Anaya wrote:
> What can I do to test the FD limit? ... Because, the FD limit is set in
> /proc/sys/fs/file-max, sample:
> 
> echo "2048" > /proc/sys/fs/file-max
> ulimit -n 8192
> 
> In this case ... the FD limit = 8192 :( ... when the limit should be
> 2048?
> 
> I wrote a perl script for the test ... does anybody know a "C" program
> to test the FD limit?

Hmm, we seem to be missing this test case from the Linux Test Project.
I see that dup03 exhausts all FDs and tests dup() for EMFILE.  You could
easily adapt that test case to a setrlimit() test case.  
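
A minimal C sketch along those lines, assuming POSIX getrlimit(),
setrlimit() and dup(). It is not the LTP dup03 test, and the function
name count_dups_until_emfile is made up for illustration. It lowers the
per-process soft limit, duplicates a descriptor until dup() fails, and
checks that the failure is EMFILE:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Set the soft RLIMIT_NOFILE to `limit`, then dup() a pipe end until
 * the call fails. Returns the number of successful dup() calls if the
 * failure was EMFILE, or -1 on any other outcome. The previous limit
 * is restored and all descriptors are closed before returning.
 */
long count_dups_until_emfile(rlim_t limit)
{
    struct rlimit old, lowered;
    int src[2];
    int *fds;
    long n = 0, i;
    int hit_emfile = 0;

    assert(limit > 0 && limit <= 65536);
    if (getrlimit(RLIMIT_NOFILE, &old) == -1 || limit > old.rlim_max)
        return -1;
    if (pipe(src) == -1)               /* a descriptor we own, to duplicate */
        return -1;
    fds = malloc((size_t)limit * sizeof *fds);
    if (fds == NULL) {
        close(src[0]);
        close(src[1]);
        return -1;
    }

    lowered = old;
    lowered.rlim_cur = limit;          /* per-process soft limit only */
    if (setrlimit(RLIMIT_NOFILE, &lowered) == 0) {
        while ((rlim_t)n < limit) {
            int fd = dup(src[0]);
            if (fd == -1) {
                hit_emfile = (errno == EMFILE);
                break;
            }
            fds[n++] = fd;
        }
        setrlimit(RLIMIT_NOFILE, &old); /* put the old limit back */
    }

    for (i = 0; i < n; i++)
        close(fds[i]);
    free(fds);
    close(src[0]);
    close(src[1]);
    return hit_emfile ? n : -1;
}
```

This exercises only the per-process limit set with ulimit -n or
setrlimit(). The system-wide /proc/sys/fs/file-max limit is a separate
counter, and a test for it would have to look for ENFILE instead.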

-- 
Nate Straz  [EMAIL PROTECTED]
sgi, inc   http://www.sgi.com/
Linux Test Project http://ltp.sourceforge.net/



kmalloc(..., GFP_ATOMIC) buffers contiguous - hence suitable for PCI DMA?

2001-05-08 Thread Khachaturov, Vassilii

Hi folks!

(I have looked up in the archive the linux-kernel threads for kwds 
"DMA, contiguous, address" before writing this mail, and read the 
corresponding threads.)

I am trying to port some driver to Linux2.4/i386. I have just read 
the "Linux device drivers" book by A.Rubini, and this is what he 
says there in Ch.13 "Mmap & DMA", on the GFP_DMA allocator flag:

"The kernel guarantees that DMA-capable buffers have 2 features. 
1st, the phys. addrs must be consecutive when get_free_page()
returns > 1 page (but this is always true, indep. of GFP_DMA,
because the kernel arranges free memory in clusters of consecutive
pages). And second, when GFP_DMA is set, the kernel guarantees that
only addrs lower than MAX_DMA_ADDRESS are returned. The macro
MAX_DMA_ADDRESS is set to 16MB on the PC, to deal with the
ISA [...].

As far as PCI is concerned, there's no MAX_DMA_ADDRESS limit, 
and a PCI dev. driver should avoid setting GFP_DMA when 
allocating its buffers."

Is this really still true for kernels 2.2 and later? (The book
refers to 2.1.43 as the most recent version at the time
of its writing.) I.e., can I just assume that a buffer which I know
to have been successfully allocated with just a
kmalloc(..., GFP_ATOMIC)
will be physically contiguous and hence suitable for PCI DMA?

I tried to understand the corresponding code path in mm/slab.c, 
but failed to come up with a 100%-assuring opinion out of it.

The driver and the device at present are not oriented for 
doing scatter-gather.

TIA for any possible help,
Vassilii



Re: fs.file-max

2001-05-08 Thread Federico Edelman Anaya

Dan:
Hi ...

Dan Kegel wrote:

> Federico Edelman Anaya ([EMAIL PROTECTED]) wrote:
>
> > What can I do to test the FD limit? ... Because, the FD limit is set in
> > /proc/sys/fs/file-max, sample:
> >
> > echo "2048" > /proc/sys/fs/file-max
>
> That sets the systemwide limit to 2048.

Ok ...

>
>
> > ulimit -n 8192
>
> That sets the per-process limit (for this process
> and its children) to 8192.
>

But my perl script could open 8192 files ... I don't understand exactly how
this works ... which is the limit on FDs? file-max?


>
> > In this case ... the FD limit = 8192 :( ... when the limit should be
> > 2048?
>
> No, the two limits are independent (except, obviously, that
> that process will reach the systemwide fd limit before it
> exhausts its per-process fd limit).
>
> > I wrote a perl script for the test ... does anybody know a "C" program
> > to test the FD limit?
>
> http://www.kegel.com/dkftpbench/#tuning
>
> - Dan




Re: CML2 design philosophy heads-up

2001-05-08 Thread Rogier Wolff

Eric S. Raymond wrote:
> More generally, arguments of the form "Non-mainline custom hack X
> could invalidate constraint Y, therefore we can't have Y in the
> rulebase" are dangerous -- I suspect you could reduce your set of
> constraints to nil very quickly that way, and thus badly screw over
> the 99% of people who just want to build a more or less stock kernel.

Eric, 

Still being able to use the "tool" is useful! So I want a "don't mess
with me" mode where I'd get more control than 99% of the lusers

Roger. 

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots. 
* There are also old, bald pilots. 


