Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Alexander Viro wrote: On Sun, 24 Sep 2000, Linus Torvalds wrote: The remaining part is the directory handling. THAT is very buffer-cache intensive, as the directory handling hasn't been moved over to the page cache at all for ext2. Doing a large "find" (or even just a "ls -l") will basically do purely buffer cache accesses, first for the directory data and then for the inode data. With no page cache activity to balance things out at all - leading to a potentially quite unbalanced VM that never really had a good chance to get rid of dentries etc. You forgot inode tables themselves. I don't. That's the "then for the inode data" part. I'm not claiming that the buffer cache accesses would go away - I'm just saying that the unbalanced "only buffer cache" case should go away, because things like "find" and friends will still cause mostly page cache activity. (Considering the size of the inode on ext2, I don't know how true this is, I have to admit. It might still be quite biased towards the buffer cache, and as such the additional page cache pressure might not be enough to really cause any major shift in balancing). I'll do it and post the result tomorrow. I bet that there will be issues I've overlooked (stuff that happens to work on UFS, but needs to be more general for ext2), so it's going as "very alpha", but hey, it's pretty straightforward, so there is a chance to debug it fast. Yes, famous last words and all such... Sure. BTW, we _will_ need it on the UFS side in 2.4 anyway. Rationale: [ reasons removed ] I have no problem with that. Especially as I suspect the people who use UFS are more likely to be the technical kind of user who is more inclined to be able to debug whatever potential problems crop up anyway. Your point about not duplicating the fragment handling is certainly quite convincing for the case of UFS. So some variant of directories in pagecache is needed for 2.4, the question being whether it's UFS-only or we use its port on ext2... 
BTW, minixfs/sysvfs can also use the thing, but that's another story. Let's plan on UFS-only, for all the prudent reasons. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: NAT dropping packets
In message [EMAIL PROTECTED] you write: Hi, I've just spotted a small problem with 2.4.0-test8 running netfilter: NAT: 3 dropping untracked packet c065d3a0 1 192.168.0.1 - 192.168.0.9 Yes. The connection tracking code doesn't try to understand broadcast packets, so when it sees the ping reply, it doesn't recognize it. The NAT code then drops the (untracked) packet. The message has been very useful in highlighting connection tracking problems in the past 8). If you don't mind your box `leaking', you can simply comment out this message and make NAT return NF_ACCEPT for this. Rusty. -- Hacking time.
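Rusty's workaround amounts to changing a single verdict. The following is a minimal C model of that decision, not the actual netfilter NAT code: the function name nat_verdict and the allow_leak flag are hypothetical, and only the verdict values are meant to mirror NF_DROP and NF_ACCEPT from linux/netfilter.h.

```c
#include <stdbool.h>

/* Same numeric values as NF_DROP / NF_ACCEPT in linux/netfilter.h. */
enum nf_verdict { NF_DROP = 0, NF_ACCEPT = 1 };

/*
 * Hypothetical model, not netfilter source: the verdict NAT gives a packet.
 * Tracked packets are translated and accepted.  An untracked packet (such
 * as the reply to a broadcast ping) is dropped, unless allow_leak is set,
 * which mirrors the local change of returning NF_ACCEPT instead.
 */
static enum nf_verdict nat_verdict(bool tracked, bool allow_leak)
{
	if (tracked)
		return NF_ACCEPT;
	return allow_leak ? NF_ACCEPT : NF_DROP;
}
```

The trade-off is the one Rusty names: with allow_leak set, the untracked reply passes through without address translation.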
Re: [patch] vmfixes-2.4.0-test9-B2
On Sun, 24 Sep 2000, Linus Torvalds wrote: I'm not claiming that the buffer cache accesses would go away - I'm just saying that the unbalanced "only buffer cache" case should go away, because things like "find" and friends will still cause mostly page cache activity. (Considering the size of the inode on ext2, I don't know how true this is, I have to admit. It might still be quite biased towards the buffer cache, and as such the additional page cache pressure might not be enough to really cause any major shift in balancing). Hrrrmmm... You know, since we don't have to associate struct inode with every address space and inode table _is_ a linear array, after all... We might put it into pagecache too. Very few places access the on-disk inode, so it's not too horrible. All we need is readpage() and that's very easy, considering the fact that allocation is static. prepare_write() and commit_write() may be NULL for all I care and writepage() will be easy too - no holes, no allocation, no nothing. Looks like we need to deal with ext2_update_inode(), ext2_read_inode() and that's it. Even less intrusive than directory stuff... Comments?
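Because the inode table is statically allocated, a readpage() for it only has to turn an inode number into a fixed location. The sketch below shows that arithmetic under stated assumptions: 128-byte on-disk inodes (the revision-0 ext2 size), 4 KB pages, and one inode table per block group. The helper ext2_inode_location and struct inode_loc are hypothetical names, not ext2 source.

```c
#include <stdint.h>

/* Assumptions for illustration: revision-0 ext2 inodes and i386 pages. */
#define EXT2_INODE_SIZE 128u
#define PAGE_SIZE_BYTES 4096u

/* Where an on-disk inode lives, relative to its group's inode table. */
struct inode_loc {
	uint32_t group;       /* block group that owns the inode */
	uint32_t page_index;  /* page number within that group's inode table */
	uint32_t page_offset; /* byte offset of the inode inside that page */
};

/* ext2 numbers inodes from 1, so inode ino occupies slot ino - 1. */
static struct inode_loc ext2_inode_location(uint32_t ino, uint32_t inodes_per_group)
{
	struct inode_loc loc;
	uint32_t slot = (ino - 1) % inodes_per_group;
	uint32_t byte_off = slot * EXT2_INODE_SIZE;

	loc.group = (ino - 1) / inodes_per_group;
	loc.page_index = byte_off / PAGE_SIZE_BYTES;
	loc.page_offset = byte_off % PAGE_SIZE_BYTES;
	return loc;
}
```

No allocation or hole handling appears anywhere in the mapping, which is why the read and write paths can stay so simple.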
Re: [DOC] Debugging early kernel hangs
James Sutherland writes: On Sat, 23 Sep 2000, Russell King wrote: And I'll try to make the point a second time that everything does not have a character-based screen to write to. So what? For platforms which have a nice easy way to stick ASCII on screen, use this. For other platforms, find some other approach - if you have a nice easy serial port handy, try feeding the characters there. So what? It shouldn't be called "VIDEO_CHAR" then - calling it that describes ONE implementation only, not what it is actually doing. Something more like DEBUGCH(char) or whatever is a better choice because it describes what the intention of it is, rather than the implementation. Russell King [EMAIL PROTECTED] http://www.arm.linux.org.uk/personal/aboutme.html THE developer of ARM Linux
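Russell's point is that the macro should name the intent while each platform chooses the backend. A hypothetical sketch of that split follows; DEBUGCH, debug_putc and the buffer backend are illustrative names, not existing kernel symbols. The macro only forwards to a function pointer, and here the backend is a small buffer standing in for a text screen or a serial port.

```c
#include <stddef.h>

/* Stand-in backend: a small buffer playing the role of screen or UART. */
static char debug_buf[64];
static size_t debug_len;

static void debug_buffer_putc(char c)
{
	if (debug_len < sizeof(debug_buf) - 1) {
		debug_buf[debug_len++] = c;
		debug_buf[debug_len] = '\0';
	}
}

/* A platform with a serial port would point this at its UART routine. */
static void (*debug_putc)(char) = debug_buffer_putc;

/* The name says what happens ("emit a debug character"), not how. */
#define DEBUGCH(c) debug_putc(c)
```

Callers only ever write DEBUGCH('x'); swapping the screen backend for a serial one changes the pointer, not the call sites.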
Re: problem with 2.4.0-test9-pre6 seems to be SHM
Hi David, David Ford [EMAIL PROTECTED] writes: I think it's time to get Christoph on the line and see what he has to say. The 4096 number is a limit to the system, you can have a max of 4096 shared memory segments systemwide. Do you know offhand which programs are using (abusing) shm? Here I am on the line again. But fortunately you found out yourself that it's not the kernel to blame... Greetings Christoph
Re: 1023rd thread crashes 2.4.0-test8 from non-root user
On Mon, 25 Sep 2000, Mark Hahn wrote: The problem is that large numbers of threads in 2.4.0-test8 can result in a hard crash of the entire kernel. This can be done as a non-root user. this appears to be reproducible (128M duron, haven't tried intel UP/SMP): i've done some experimentation, and to me it appears we overload the queued signal limit of bash, or something like that? The Ctrl-C thing definitely creates a lot of signals. And the default limit for queued signals [kernel/signal.c:max_queued_signals] is 1024 ... so i think this is threading-unrelated, to me it (tentatively) looks like a signal handling bug. Ingo
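The suspected mechanism is a fixed cap on queued signals. Here is a toy model of the counting only (it does not reproduce the crash). It assumes, as a simplification, that each thread contributes one queued signal, and it uses the default of 1024 quoted above from kernel/signal.c. The names queue_signal and flood are hypothetical.

```c
#include <stdbool.h>

/* Default limit quoted above from kernel/signal.c. */
static int max_queued_signals = 1024;
static int nr_queued_signals;

/* Toy model: queueing succeeds only while the count is below the limit. */
static bool queue_signal(void)
{
	if (nr_queued_signals >= max_queued_signals)
		return false;
	nr_queued_signals++;
	return true;
}

/* Queue one signal per thread from an empty queue; return how many failed. */
static int flood(int threads)
{
	int failed = 0;

	nr_queued_signals = 0;
	for (int i = 0; i < threads; i++)
		if (!queue_signal())
			failed++;
	return failed;
}
```

With 2000 threads, 976 signals cannot be queued at the default limit, while a limit of 4096 absorbs all of them.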
Re: Linux 2.2.17
On Mon, Sep 04, 2000 at 10:58:09PM +0200, Pedro M. Rodrigues wrote: The change to eepro100 done in pre16 isn't listed as being restored. Is it still in i/o mode? The investigation hasn't succeeded yet. It looks like a timing problem (however, I'm not so sure now). I spent 3 full evenings last week working on this matter, no luck so far. 2.2.17pre16 [...] o Switch eepro100 to I/O mode pending investigation (Andrey Savochkin) Best regards Andrey V. Savochkin
Re: 1023rd thread crashes 2.4.0-test8 from non-root user
indeed, after changing max_queued_signals to 4096, i cannot crash the kernel anymore with 2000 threads. Ingo
Re: 1023rd thread crashes 2.4.0-test8 from non-root user
btw., maybe it's init that gets those 2000 signals, not bash? Ingo
[Demo program]: Poor elevator performance in 2.4.0-test9pre6
I've written a small program to demonstrate the performance problems I've been seeing in recent Linux kernels. The benchmark is a single process which writes and reads 8k blocks round-robin from a number of files. It is written as a single process so the ordering of the operations is known and perfectly interleaved. I include the source at the end of this message. The files are initially created using sequential writes so that the files are laid out nicely on disk. With kernel version 2.4.0-test9pre6 the results are as follows. The test machine has 128 Megs of memory. The test accesses 240 Megs of files so that they can't fit in cache. If I run it with 8 files of size 30 Megs:

[robert@test25 src]$ ./elv_test 8 30
files created, 240 megs written at 8.96 megs/sec
finished writing 240 megs written at 1.05 megs per sec
finished reading, 240 megs read at 5.848833 megs/sec

If I do the same with a single file of size 240 Megs:

[robert@test25 src]$ ./elv_test 1 240
files created, 240 megs written at 11.12 megs/sec
finished writing 240 megs written at 11.08 megs per sec
finished reading, 240 megs read at 12.580521 megs/sec

Comparing this to a similar tiotest run:

[robert@test25 src]$ tiotest -f 30 -b 8192 -t 8 -r 0
Tiotest results for 8 concurrent io threads:
,---------------+--------+-------------+---------+---------.
| Item          | Time   | Rate        | Usr CPU | Sys CPU |
+---------------+--------+-------------+---------+---------+
| Write 240 MBs | 25.5 s | 9.410 MB/s  | 0.1 %   | 10.0 %  |
| Read 240 MBs  | 20.4 s | 11.755 MB/s | 0.0 %   | 8.8 %   |
`---------------+--------+-------------+---------+---------'

As the tests demonstrate, we get terrible write performance when a single process is writing round robin to a number of files. There are two possible explanations for this: either the single-threaded nature of the program is slowing things down, or the fact that the files are being written round robin is slowing us down.
Since I see exactly the same kind of behaviour with the netatalk benchmark I have been using, and the netatalk benchmark isn't single threaded, I believe that it's the round-robin interleaving of writes that is leading to the performance problems. As a comparison, here are the results of the test program in kernel version 2.4.0-test1-ac22:

[robert@testmac25 src]$ ./elv_test 8 30
files created, 240 megs written at 8.24 megs/sec
finished writing 240 megs written at 8.99 megs per sec
finished reading, 240 megs read at 5.849072 megs/sec

Here the write performance is fine. This definitely indicates that it's not the single-threaded benchmark that's slowing things down. As I understand it, the elevator should be dealing with the interleaved nature of the writes. This seems to be working OK for reads, but it doesn't seem to be working properly for writes. The source can be found at http://tltsu.anu.edu.au/~robert/elv_test.c
-- 
Robert Cohen
TLTSU, Unix support
Australian National University
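The expectation that the elevator should absorb round-robin interleaving can be illustrated with a small model. This sketch makes two simplifying assumptions: each file occupies one contiguous run of 8 KB blocks (16 sectors of 512 bytes), and all writes are queued at once, which a real request queue with limited depth and merging does not guarantee. It is not the 2.4 elevator code, only an idealized sort by sector; the function names are hypothetical.

```c
#include <stdlib.h>

/* One pending 8 KB write, identified by its starting sector. */
struct request {
	unsigned long sector;
};

/*
 * Assumed layout: file f starts at block f * blocks_per_file and each block
 * is 16 sectors.  Writes arrive round robin: block 0 of every file, then
 * block 1 of every file, and so on.
 */
static void round_robin_requests(struct request *rq, int files, int blocks_per_file)
{
	int n = 0;

	for (int b = 0; b < blocks_per_file; b++)
		for (int f = 0; f < files; f++)
			rq[n++].sector = ((unsigned long)f * blocks_per_file + b) * 16;
}

static int by_sector(const void *a, const void *b)
{
	unsigned long sa = ((const struct request *)a)->sector;
	unsigned long sb = ((const struct request *)b)->sector;

	return (sa > sb) - (sa < sb);
}

/* Total head movement, in sectors, when serving rq[] in array order. */
static unsigned long seek_distance(const struct request *rq, int n)
{
	unsigned long dist = 0;

	for (int i = 1; i < n; i++)
		dist += rq[i].sector > rq[i - 1].sector ?
			rq[i].sector - rq[i - 1].sector :
			rq[i - 1].sector - rq[i].sector;
	return dist;
}

/* Idealized elevator: serve everything queued in ascending sector order. */
static void elevator_sort(struct request *rq, int n)
{
	qsort(rq, n, sizeof(*rq), by_sector);
}
```

For 2 files of 2 blocks each, serving requests in arrival order costs 80 sectors of head movement, while the sorted order costs 48; with more files the gap widens.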
Re: problem with 2.4.0-test9-pre6 seems to be SHM
safemode [EMAIL PROTECTED] writes: The sum of the bytes used in the 4096 entries ipcs shows is WAY off from the bytes used in df, if that's what you wanted to know. df shows 109K in use... and that's easily beaten by the first entry in ipcs:

-- Shared Memory Segments
key shmid owner perms bytes nattch status
0x 32769 root 600 5038082 dest
0x0002 131074 root 600 1966082
0x0003 163843 root 600 6553602
0x 3997700 root 777 5240 1 dest
0x 4030469 root 777 5060 1 dest
0x 4063238 root 777 4700 1 dest

these are the first 6 entries ... i'm not sure what you're getting at with this though.. Just to give you some debugging help for the future: you can get the attachees to a shm segment with shmfs using fuser(1):

[root /root]# ipcs -m
-- Shared Memory Segments
key shmid owner perms bytes nattch status
0x 32769 nobody 600 46084 11 dest
[root /root]# fuser -v /dev/shm/.IPC_8001
                    USER PID ACCESS COMMAND
/dev/shm/.IPC_8001  root 883 m      httpd
                    root 886 m      httpd
                    root 887 m      httpd
                    root 888 m      httpd
                    root 889 m      httpd
                    root 890 m      httpd
                    root 891 m      httpd
                    root 892 m      httpd
                    root 893 m      httpd
                    root 894 m      httpd
                    root 895 m      httpd

The number in .IPC_ is the shmid in hex. So if you are in doubt which program is to blame, you should have a way to find it now. Greetings Christoph
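Christoph's mapping from shmid to file name is easy to script. A minimal sketch follows; it assumes the /dev/shm mount point from the example and lowercase hex digits (the example's 8001 contains no letters, so the case is not confirmed). shm_path_for_id is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the shm filesystem path for a System V segment id: ".IPC_"
 * followed by the shmid in hex.  Returns the snprintf() result (the
 * full path length), or a negative value on error.
 */
static int shm_path_for_id(int shmid, char *buf, size_t len)
{
	return snprintf(buf, len, "/dev/shm/.IPC_%x", (unsigned int)shmid);
}
```

For the segment in the example, shmid 32769 is 0x8001, giving /dev/shm/.IPC_8001, which is the path passed to fuser -v above.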
Re: kernel 2.4.0-test8 lockup
Donn Washburn wrote: I would request a "cc" message. It seems that recently I have either a memory problem and/or a possible kernel problem with this system. The system is an ASUS P5A, AMD K6-II/350, 128Meg/IDE system. Don't use test8! It is known for cannibalism (particularly for eating mailboxes). My personal advice: Apply the test9-pre1 patch, but nothing later if you are not into serious kernel debugging. -- Martin
Re: [DOC] Debugging early kernel hangs
On Mon, 25 Sep 2000, Russell King wrote: James Sutherland writes: On Sat, 23 Sep 2000, Russell King wrote: And I'll try to make the point a second time that everything does not have a character-based screen to write to. So what? For platforms which have a nice easy way to stick ASCII on screen, use this. For other platforms, find some other approach - if you have a nice easy serial port handy, try feeding the characters there. So what? It shouldn't be called "VIDEO_CHAR" then - calling it that describes ONE implementation only, not what it is actually doing. Something more like DEBUGCH(char) or whatever is a better choice because it describes what the intention of it is, rather than the implementation. Yes, a better name could be found; I'd go for DUMP_CHAR() myself, I think. The basic concept is great, it just needs a new name... James.
Re: 82559 driver bug
Greg, On Sun, Sep 24, 2000 at 11:42:11PM -0700, Greg Zhang wrote: I need to update the MAC address on an Intel 82559 ethernet card. Tried:

# ifconfig eth0 down
# ifconfig eth0 hw ether xx:xx:xx:xx:xx:xx
# ifconfig eth0 up

It seems to take effect. Ping works. I have not had time to verify whether the MAC address is changed on the wire. When the machine was rebooted, the new MAC address was lost. This seems to be a bug in the 82559 driver. The 82559 spec specifies how to manipulate its control and status register to write to the EEPROM that stores the MAC address. Before I write a program to do this, can someone confirm that this is a bug and it currently has no fix? It's not a bug and shouldn't be fixed. The address set by `ifconfig hw' is a part of run-time system configuration, and should stay as is. If you want to change the EEPROM, do it. Take an EEPROM update utility (e.g. from http://scyld.com/) and write what you want. Best regards Andrey V. Savochkin
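Andrey's distinction is between the run-time address and the one stored in EEPROM. To show why a raw EEPROM write needs more than the six address bytes, here is a sketch that edits an in-memory image only. It assumes a 64-word EEPROM, the address stored low byte first in words 0 to 2, and a last-word checksum that makes all words sum to 0xBABA, which is the check the eepro100 driver performs. eeprom_set_mac and eeprom_sum are hypothetical names, and nothing here talks to hardware.

```c
#include <stdint.h>

#define EEPROM_WORDS    64      /* assumed size; some boards use 256 words */
#define EEPROM_CHECKSUM 0xBABA  /* expected 16-bit sum of all words */

/*
 * Store a MAC address in words 0..2 (low byte first within each word) and
 * recompute the last word so the 16-bit sum of the image is 0xBABA.
 */
static void eeprom_set_mac(uint16_t ee[EEPROM_WORDS], const uint8_t mac[6])
{
	uint16_t sum = 0;

	for (int i = 0; i < 3; i++)
		ee[i] = (uint16_t)(mac[2 * i] | (mac[2 * i + 1] << 8));
	for (int i = 0; i < EEPROM_WORDS - 1; i++)
		sum += ee[i];
	ee[EEPROM_WORDS - 1] = (uint16_t)(EEPROM_CHECKSUM - sum);
}

/* 16-bit sum of the whole image; a valid image sums to EEPROM_CHECKSUM. */
static uint16_t eeprom_sum(const uint16_t ee[EEPROM_WORDS])
{
	uint16_t sum = 0;

	for (int i = 0; i < EEPROM_WORDS; i++)
		sum += ee[i];
	return sum;
}
```

Forgetting the checksum word is the likely failure of a hand-written tool, which is one reason to prefer an existing update utility as Andrey suggests.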
Re: (reiserfs) Re: An elevator algorithm
Ragnar Kjørstad wrote: On Fri, Sep 22, 2000 at 03:23:26PM -0700, Hans Reiser wrote: I think Xuan's algorithm is good, so I want to add to it.:-) Ragnar, I don't understand your objection to it. It is always the case that if you specify real time constraints that are impossible then they aren't met. My objection was that in the case where it is impossible to serve requests within the maximum latency, it would stop ordering the requests. With a FIFO queue, the throughput will be lower, and that will also give longer latency. -- Ragnar Ok, reasonable objection.:) Hans
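To make Ragnar's objection concrete, here is a generic deadline-aware selection function. It illustrates the failure mode he describes and is not Xuan's algorithm, whose details are not in this thread. Normally it serves requests in one-way elevator order, but once any deadline has passed it serves the earliest deadline first; assuming each deadline is arrival time plus a fixed bound, that degenerates into FIFO order under overload. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct io_req {
	unsigned long sector;
	unsigned long deadline;  /* time by which the request should be served */
};

/*
 * Return the index of the next request to serve (n must be > 0).
 * Normal case: the lowest sector at or above the head, wrapping to the
 * lowest sector overall (one-way elevator).  If any deadline has already
 * passed, the request with the earliest deadline wins regardless of
 * position, so an overloaded queue stops being ordered by sector.
 */
static size_t pick_next(const struct io_req *q, size_t n,
			unsigned long head, unsigned long now)
{
	size_t best = n, lowest = 0, earliest = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].deadline < q[earliest].deadline)
			earliest = i;
		if (q[i].sector < q[lowest].sector)
			lowest = i;
		if (q[i].sector >= head &&
		    (best == n || q[i].sector < q[best].sector))
			best = i;
	}
	if (q[earliest].deadline <= now)
		return earliest;              /* deadline missed: ignore position */
	return best != n ? best : lowest; /* elevator order, wrapping around */
}
```

Once every request is late, each call returns the earliest deadline and the sector ordering, and with it the throughput, is lost.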
Re: problem with 2.4.0-test9-pre6 seems to be SHM
Very correct except for one thing: allocation fails and ipcs -u shows 4097 when the limit shows 4096. safemode reports that eventually the kernel crashes. This may be due to the test9 'features' and a side effect, or it may be something to keep in mind once we get things nailed down a bit. -d Christoph Rohland wrote: Here I am on the line again. But fortunately you found out yourself that it's not the kernel to blame... Greetings Christoph -- "There is a natural aristocracy among men. The grounds of this are virtue and talents", Thomas Jefferson [1743-1826], 3rd US President
Re: kernel compiled with frame pointer
On Sun, 24 Sep 2000, Robert Redelmeier wrote: I am trying to get the call trace of a process by tracing the return addresses on the stack. To get the correct location of the return address I need to know whether the kernel is being compiled with frame pointer because this will affect the offset of return address on the stack. Of course. But when your kernel was compiling, did you notice the `gcc` options as the files flew by? `-fomit-frame-pointer` is standard on i386 and perhaps other arches. I agree. Sitting in front of the desktop I can see if the source files are getting compiled with or without -fomit-frame-pointer. But, while writing a function in a kernel source file, I want to know whether the caller of this function was compiled with or without -fomit-frame-pointer because this will affect the location of the return address to it on the stack. So, I assume that if CONFIG_FRAME_POINTER is defined then the kernel (and hopefully the caller function also) is being compiled without -fomit-frame-pointer and then look for the return address appropriately. Although this assumption is not correct in general (see Keith's mail in this thread), it works in the case I am looking at (the function __dump_save_panic_regs in arch/i386/kernel/vmdump.c from the LKCD patch) because there the caller and the callee are part of one code and either both or neither is compiled with frame pointer. But when you say "process", that sounds like userland. Then it would depend on whether you compiled with `-fomit-frame-pointer` or not. I am looking at crash dump utilities for Linux and in that context if the kernel crashes then I am only interested in the kernel functions which the process was executing at the time of the crash and not worried about the user land call trace before the process entered the kernel. Therefore, whether the user level program (which the process is executing) is compiled with or without -fomit-frame-pointer is irrelevant in this case. Regards, Sushil.
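The offset question comes down to the i386 frame layout with frame pointers: the word at %ebp holds the caller's saved %ebp, and the next word up holds the return address. With -fomit-frame-pointer that chain does not exist. The sketch below walks such a chain over a simulated stack, where array indices stand in for addresses and 0 ends the chain. walk_frames is a hypothetical name, and walking a live kernel stack would additionally need bounds checks against the task's stack.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect up to max return addresses by following saved frame pointers.
 * stack[] models memory as 32-bit words; fp is the index of the current
 * frame, where stack[fp] is the saved frame pointer and stack[fp + 1] the
 * return address.  Returns the number of addresses collected.
 */
static size_t walk_frames(const uint32_t *stack, uint32_t fp,
			  uint32_t *ret_addrs, size_t max)
{
	size_t n = 0;

	while (fp != 0 && n < max) {
		ret_addrs[n++] = stack[fp + 1]; /* return address: one word above */
		fp = stack[fp];                 /* caller's saved frame pointer */
	}
	return n;
}
```

If any function in the chain was built with -fomit-frame-pointer, stack[fp] is no longer a saved frame pointer and the walk goes astray, which is why the caller's compile options matter.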
Re: Given an image, how can show its config?
Keith Owens wrote: On Sat, 23 Sep 2000 14:15:44 +0100 (BST), James Sutherland [EMAIL PROTECTED] wrote: How about putting these files in the modules directory? That way, we have a nice consistent location for them. Why do you think modutils 2.3.14 added a prune list of files to ignore in /lib/modules/`uname -r`? The current (2.3.17) list from modprobe -c is

# Prune
prune modules.dep
prune modules.pcimap
prune System.map
prune .config
prune build
prune vmlinux
prune vmlinuz
prune bzImage
prune zImage

The 2.5 Makefiles rewrite will create /lib/modules/`uname -r`, even if you do not use modules (Hi, Rusty ;) and install the kernel specific output files in it. There will also be enough information saved in /lib/modules to allow external compilation of third party software without having to guess what the kernel compile options were, this includes module symbol version information. This is all covered in ftp://ftp.kernel.org/pub/linux/kernel/projects/kbuild/makefile-wishlist-2.5-4.bz2. The Makefile rewrite is definitely a 2.5 project, it is too big a change for 2.4. Whether we rename /lib/modules to /lib/kernel has not been decided yet. BTW, any discussion about this rewrite should be on the linux-kbuild list, not linux-kernel (yet). See 2.4.0-test9-pre6 MAINTAINERS. I'm slowly drifting towards enlightenment on this issue. Let me try to state this in simple terms I can understand: the tree descending from the revision name in the modules directory will contain everything needed to:

- Boot and run a given kernel + modules
- Reconfigure the same kernel, given the original source tree
- Support symbolic crash dumps and debugging

And to satisfy these needs the following will be saved in that tree:

- Kernel image (one or more of vmlinux, vmlinuz, etc.)
- Module tree
- Kernel configuration (.config)
- Module dependencies
- Kernel symbols (System.map)

This makes sense to me.
This arrangement keeps track of my .config and System.map and gives me a nice mindless 'make install' that does it all. Gosh, we could even put a README in it. The next obvious thing to do is move the whole tree to the /boot directory, leaving just a symbolic link in /lib/modules. I'll stop promoting the idea of tacking a portion of this tree onto the bzImage. This can wait, and if I want it, it would be better to tack on the whole tree anyway, filtered for the parts I don't need. This would give a nice, linear file that I can just cat onto any boot device or feed to lilo in the usual way. It also suggests a way of loading modules using a stub filesystem that knows only about the bzImage. The bottom line is I can stop panicking about this issue and panic about something else instead :-) -- Daniel
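The prune list Keith quotes tells modprobe which names under /lib/modules/`uname -r` are not modules. Here is a hypothetical sketch of such a check, using exactly the entries from the modprobe -c output above; it is not modutils source, and is_pruned is an illustrative name.

```c
#include <string.h>

/* The prune entries listed by "modprobe -c" (modutils 2.3.17). */
static const char *const prune_list[] = {
	"modules.dep", "modules.pcimap", "System.map", ".config",
	"build", "vmlinux", "vmlinuz", "bzImage", "zImage",
};

/*
 * Return 1 if a directory entry should be skipped while scanning for
 * modules, 0 if it should be examined as usual.
 */
static int is_pruned(const char *name)
{
	for (size_t i = 0; i < sizeof(prune_list) / sizeof(prune_list[0]); i++)
		if (strcmp(name, prune_list[i]) == 0)
			return 1;
	return 0;
}
```

This is what lets kernel images, System.map and .config sit beside the module tree without confusing the module tools.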
PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x
The 2.4.x kernel series obtains its /proc/pci device name data from a data file, pci.ids. The file makes the PCI device name data generic enough that it may be used by multiple utilities -- the kernel, Martin Mares' pciutils, distro installers, etc. The attached patch, against kernel 2.2.18-pre9, backports the 2.4.x /proc/pci facilities and device name database. BTW, what do you think of the idea of making the pci.ids database modular? I mean replacing direct data requests to the pci.ids database with queued requests (plus possibly a request_module(pci_ids) call to process the queue, if possible). The module, while loading, should process the queue. I see two advantages of this solution:

- it would make it possible to use Vendor/Device info when booting from floppy (kernel size limitations)
- it would be useful for hot-pluggable PCI devices...

What do you think of it? -- === Andrzej M. Krzysztofowicz [EMAIL PROTECTED] phone (48)(58) 347 14 61 Faculty of Applied Phys. Math., Technical University of Gdansk
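For reference, pci.ids is plain text: a vendor line starts with four hex digits, two spaces and the name, device lines under it are indented by one tab, and subsystem lines by two. A minimal lookup over that format follows, written as a hypothetical helper (pci_ids_lookup); it is not the kernel's or pciutils' parser.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up the name of PCI device vendor:device in pci.ids-format text.
 * Vendor lines: "vvvv  Name"; device lines below them: "\tdddd  Name";
 * subsystem lines ("\t\t...") and comments ("#...") are ignored.
 * Returns 0 and copies the name into out (len > 0), or -1 if not found.
 */
static int pci_ids_lookup(const char *db, unsigned vendor, unsigned device,
			  char *out, size_t len)
{
	int in_vendor = 0;
	const char *line = db;

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t n = end ? (size_t)(end - line) : strlen(line);
		unsigned id;
		int name_at = -1;

		if (isxdigit((unsigned char)line[0])) {
			/* vendor line: remember whether it is the one we want */
			in_vendor = sscanf(line, "%4x", &id) == 1 && id == vendor;
		} else if (in_vendor && line[0] == '\t' &&
			   isxdigit((unsigned char)line[1]) &&
			   sscanf(line + 1, "%4x  %n", &id, &name_at) == 1 &&
			   id == device && name_at >= 0 &&
			   (size_t)name_at <= n - 1) {
			/* device line: the name runs to the end of the line */
			size_t name_len = n - 1 - (size_t)name_at;

			if (name_len >= len)
				name_len = len - 1;
			memcpy(out, line + 1 + name_at, name_len);
			out[name_len] = '\0';
			return 0;
		}
		line = end ? end + 1 : line + n;
	}
	return -1;
}
```

Because the lookup only needs the text itself, the same data could in principle live in a module and be consulted once it is loaded, as Andrzej proposes.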
Re: 2.4 kernels do not boot on UX (Alpha)
On Sun, 24 Sep 2000, Richard Henderson wrote: The PCI setup widgetry is known to be broken for pci-pci bridges. I've been intending to rewrite all this, but keep finding something more interesting to do -- like clean the cat box. If it makes you feel any better, I have an AS4100 that can't boot 2.4 at the moment either for the same reason. To give us a knowledge jump start... what is broken? The latest test9 pre-patches include some bridge cleanup.. Jeff
Re: [patch] 2.4.0-test8: Alpha RTC clean-ups
On Fri, 22 Sep 2000, Jan-Benedict Glaw wrote: Instead of having hard-coded values, we should maybe do something more variable like: if (year >= (20 + YEARS_SINCE_2000) && year < (48 + YEARS_SINCE_2000)) ... This looks reasonable. YEARS_SINCE_2000 could be define'd through (menu;x;...)config... I don't think it's really needed. We have 20 years before epoch gets misdetected. Just keeping the macro up to date for Linux releases should be enough (and if someone insists on running a given kernel for such a long time, they may modify sources accordingly themselves). This applies to other platforms using different epoch values as well, of course... Alpha appears to be the only one. -- Maciej W. Rozycki, Technical University of Gdansk, Poland e-mail: [EMAIL PROTECTED], PGP key available
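Jan-Benedict's shifted range can be written as a small epoch-selection function. This is a sketch under assumptions: besides the 20 and 48 thresholds quoted above, it assumes the remaining Alpha ranges of the time (48 to 69 as the 1952 Digital UNIX epoch, otherwise PC-style 1900 or 2000), and rtc_epoch is a hypothetical name. With YEARS_SINCE_2000 at 0 it reproduces the fixed thresholds.

```c
/* Tunable from the proposal; 0 gives the fixed thresholds. */
#ifndef YEARS_SINCE_2000
#define YEARS_SINCE_2000 0
#endif

/*
 * Pick the epoch for a two-digit RTC year; calendar year = epoch + year.
 * The 20/48 thresholds come from the proposal; the 1952 and 1900 branches
 * are assumed from the Alpha code of the time.
 */
static int rtc_epoch(int year)
{
	if (year < 20 + YEARS_SINCE_2000)
		return 2000;  /* PC-style, years 2000 onwards */
	if (year < 48 + YEARS_SINCE_2000)
		return 1980;  /* NT epoch */
	if (year < 70)
		return 1952;  /* Digital UNIX epoch */
	return 1900;      /* PC-style, years 1970..1999 */
}
```

RTC year 20 already means 2000 under the NT epoch, so a PC-style clock reading 20 in 2020 would be misread; that is the 20-year margin Maciej refers to.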
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Not sure if this is the right moment for those changes though, I'm not worried about ext2 but about the other non-networked fses that nobody uses regularly. it *is* the right moment to clean these issues up. These kinds of things are what made the 2.2 VM a mess (everybody added his easy improvements, without solving some of the conceptual problems), and frankly, instead of yet another elevator algorithm we need a squeaky clean VM balancer above all. Please help identifying, fixing, debugging and testing these VM balancing issues. This is tough work and it needs to be done. Ingo
Kernel Oops with bonding
Hi all, I'm getting a kernel oops if I try networking with bonding. I'm working with 2.2.16-smp and the bonding.c etc. included with it. Everything starts up, but as soon as a packet is sent (ping), I get the following error:

Unable to handle kernel NULL pointer dereference at virtual address
current->tss.cr3 = 1f949000, %cr3 = 1f949000
*pde =
Oops:
CPU: 1
EIP: 0010:[<e0051343>]
EFLAGS: 00010246
. . .
Segmentation fault

I read about others having this problem but couldn't find a patch or something... any help appreciated.. thanx phibo
VIA UDMA / Kernel 2.2.17
Hi there ... I have the 2.2.17 Kernel with the VIA Chipset Support. My BIOS says that my HD (Samsung) is in UDMA Mode 4. A friend of mine told me that I can increase my disk performance a little if I use DMA:

hdparm -d 1 /dev/hda

But I get the following errors whenever I run hdparm -tT /dev/hda:

Sep 25 10:06:15 cello kernel: hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Sep 25 10:06:15 cello kernel: hdc: drive_cmd: error=0x04
Sep 25 10:06:42 cello kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 25 10:06:42 cello kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Sep 25 10:06:42 cello kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 25 10:06:42 cello kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Sep 25 10:06:52 cello kernel: hda: lost interrupt
Sep 25 10:06:52 cello kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 25 10:06:52 cello kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Sep 25 10:07:02 cello kernel: hda: lost interrupt
Sep 25 10:07:02 cello kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 25 10:07:02 cello kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Sep 25 10:07:02 cello kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 25 10:07:02 cello kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Sep 25 10:07:02 cello kernel: hda: DMA disabled

What's wrong there? with friendly regards jens luedicke [EMAIL PROTECTED] Support the Theory of Evolution; 400 Billion Amphibians can't be wrong! Q: What is the difference between Texas and yogurt? A: Yogurt has culture.
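The bracketed text in these log lines is a decoding of two register bytes, 0x51 (status) and 0x84 (error). A sketch that reproduces it follows. It assumes the bit names and print order seen in the log: status bits printed from high to low, and for the error byte, 0x80 reported as BadCRC when 0x04 is also set and as BadSector otherwise. decode_ide_status and decode_ide_error are hypothetical names, and the error decoding covers only a subset of bits.

```c
#include <string.h>

/* Append a name to out, space separated; out holds len bytes. */
static void append_name(char *out, size_t len, const char *name)
{
	if (out[0])
		strncat(out, " ", len - strlen(out) - 1);
	strncat(out, name, len - strlen(out) - 1);
}

/* Status register bit names, from 0x80 down to 0x01. */
static const char *const status_names[8] = {
	"Busy", "DriveReady", "DeviceFault", "SeekComplete",
	"DataRequest", "CorrectedError", "Index", "Error",
};

static void decode_ide_status(unsigned char st, char *out, size_t len)
{
	out[0] = '\0';
	for (int bit = 7; bit >= 0; bit--)
		if (st & (1u << bit))
			append_name(out, len, status_names[7 - bit]);
}

/* Error register: a subset of bits, in the order the log prints them. */
static void decode_ide_error(unsigned char err, char *out, size_t len)
{
	out[0] = '\0';
	if (err & 0x04)
		append_name(out, len, "DriveStatusError");
	if (err & 0x80)
		append_name(out, len, (err & 0x04) ? "BadCRC" : "BadSector");
	if (err & 0x40)
		append_name(out, len, "UncorrectableError");
	if (err & 0x10)
		append_name(out, len, "SectorIdNotFound");
	if (err & 0x02)
		append_name(out, len, "TrackZeroNotFound");
	if (err & 0x01)
		append_name(out, len, "AddrMarkNotFound");
}
```

Decoded this way, every hda error above is the same event: the drive aborted a DMA transfer and flagged an interface CRC error.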
Re: PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x
On Mon, 25 Sep 2000 11:07:58 +0200 (CEST), Andrzej Krzysztofowicz [EMAIL PROTECTED] wrote: BTW, what do you think of the idea of making the pci.ids database modular? The module while loading should process the queue. Does the modules.pcimap file created by recent modutils do what you want? It maps PCI vendor and device codes to the module that supports them.
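modules.pcimap maps IDs to modules rather than to human-readable names. Here is a hypothetical matcher for one line of it. It assumes the column order module, vendor, device, subvendor, subdevice, class, class_mask, driver_data, with 0xffffffff as a wildcard, and compares only vendor and device; pcimap_match is an illustrative name, and PCI_ANY_ID is defined locally to mirror the kernel's wildcard.

```c
#include <stdio.h>
#include <string.h>

#define PCI_ANY_ID 0xffffffffu  /* wildcard value in the map */

/*
 * Match one modules.pcimap line against a vendor/device pair.  Comment
 * lines never match.  Returns 1 and copies the module name into module
 * (len > 0) on a match, 0 otherwise.
 */
static int pcimap_match(const char *line, unsigned vendor, unsigned device,
			char *module, size_t len)
{
	char name[64];
	unsigned v, d;

	if (line[0] == '#' || sscanf(line, "%63s %x %x", name, &v, &d) != 3)
		return 0;
	if ((v != PCI_ANY_ID && v != vendor) || (d != PCI_ANY_ID && d != device))
		return 0;
	strncpy(module, name, len - 1);
	module[len - 1] = '\0';
	return 1;
}
```

A full matcher would also compare the subsystem and class columns under class_mask.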
[SysRq PATCH]: no more reboots because console freeze
Hi all, I made a very small patch to the SysRq facility that signals a program with SIGUSR1; the program is registered via sysctl. The signal is sent with Alt+SysRq+X (X stands for eXecute program). /proc/sys/kernel/sysrq_progid contains the pid and start_time, which together uniquely identify the program to signal. One use of this is to restore the console when an X server or an SVGAlib program crashes. How many times have you had to reboot because of that, while your system may have been doing something important? This hopefully solves the problem. The patch is included below; sample programs/scripts are available at http://pusa.uv.es/~ulisses/sysrq_X.tar.gz Comments/suggestions wanted; please e-mail me directly at [EMAIL PROTECTED] (I'm not subscribed to the linux-kernel mailing list). Bye! Ulisses - Debian GNU/Linux: a dream come true -
diff -u -r linux-2.4.0-test8/drivers/char/sysrq.c linux-2.4.0-test8-modificat2/drivers/char/sysrq.c
--- linux-2.4.0-test8/drivers/char/sysrq.c	Tue Aug  1 04:36:10 2000
+++ linux-2.4.0-test8-modificat2/drivers/char/sysrq.c	Mon Sep 25 08:58:23 2000
@@ -6,6 +6,8 @@
  *
  *	(c) 1997 Martin Mares [EMAIL PROTECTED]
  *	based on ideas by Pavel Machek [EMAIL PROTECTED]
+ *
+ *	(c) 2000 Ulisses Alonso Camaró eXecute_program extension
  */
 
 #include <linux/config.h>
@@ -22,6 +24,7 @@
 #include <linux/quotaops.h>
 #include <linux/smp_lock.h>
 #include <linux/module.h>
+#include <linux/kernel.h>
 
 #include <asm/ptrace.h>
 
@@ -30,9 +33,13 @@
 extern int console_loglevel;
 extern struct list_head super_blocks;
 
+void signal_program(void);
+
 /* Whether we react on sysrq keys or just ignore them */
 int sysrq_enabled = 1;
 
+unsigned long sysrq_progid[2]= {0, 0}; /* pid and start_time */
+
 /* Machine specific power off function */
 void (*sysrq_power_off)(void) = NULL;
 
@@ -53,6 +60,35 @@
 	}
 }
 
+void signal_program(void)
+{
+	struct task_struct *p;
+	pid_t pid= (pid_t)sysrq_progid[0];
+
+	read_lock(&tasklist_lock);
+
+	if ((p= find_task_by_pid(pid)) == NULL)
+		goto error;
+
+	if (!p->mm || p->pid == 1)
+		goto error;
+
+	if (p->start_time != sysrq_progid[1])
+		goto error;
+
+	send_sig(SIGUSR1, p, 0);
+
+	read_unlock(&tasklist_lock);
+
+	return;
+
+ error:
+	printk(KERN_ERR "SysRq: Could not send signal to pid %d with start_time %lu\n",
+	       pid, sysrq_progid[1]);
+
+	return;
+}
+
 /*
  * This function is called by the keyboard handler when SysRq is pressed
  * and any other keycode arrives.
@@ -138,6 +174,10 @@
 		send_sig_all(SIGKILL, 1);
 		orig_log_level = 8;
 		break;
+	case 'x':
+		printk("eXecute program\n");
+		signal_program();
+		break;
 	default:	/* Unknown: help */
 		if (kbd)
 			printk("unRaw ");
@@ -148,7 +188,7 @@
 		printk("Boot ");
 		if (sysrq_power_off)
 			printk("Off ");
-		printk("Sync Unmount showPc showTasks showMem loglevel0-8 tErm kIll killalL\n");
+		printk("Sync Unmount showPc showTasks showMem loglevel0-8 tErm kIll killalL eXec_program\n");
 		/* Don't use 'A' as it's handled specially on the Sparc */
 	}
diff -u -r linux-2.4.0-test8/include/linux/sysctl.h linux-2.4.0-test8-modificat2/include/linux/sysctl.h
--- linux-2.4.0-test8/include/linux/sysctl.h	Thu Aug 10 22:01:26 2000
+++ linux-2.4.0-test8-modificat2/include/linux/sysctl.h	Sun Sep 24 12:39:30 2000
@@ -113,6 +113,7 @@
 	KERN_OVERFLOWGID=47,	/* int: overflow GID */
 	KERN_SHMPATH=48,	/* string: path to shm fs */
 	KERN_HOTPLUG=49,	/* string: path to hotplug policy agent */
+	KERN_SYSRQ_PROGID=50	/* string: pid and start_time of the program to signal */
 };
diff -u -r linux-2.4.0-test8/kernel/sysctl.c linux-2.4.0-test8-modificat2/kernel/sysctl.c
--- linux-2.4.0-test8/kernel/sysctl.c	Tue Aug  1 04:36:11 2000
+++ linux-2.4.0-test8-modificat2/kernel/sysctl.c	Mon Sep 25 08:59:24 2000
@@ -83,6 +83,10 @@
 extern int acct_parm[];
 #endif
 
+#ifdef CONFIG_MAGIC_SYSRQ
+extern unsigned long sysrq_progid[];
+#endif
+
 extern int pgt_cache_water[];
 
 static int parse_table(int *, int, void *, size_t *, void *, size_t,
@@ -220,6 +224,10 @@
 #ifdef CONFIG_MAGIC_SYSRQ
 	{KERN_SYSRQ, "sysrq", &sysrq_enabled, sizeof (int),
 	 0644, NULL, &proc_dointvec},
+	{KERN_SYSRQ_PROGID, "sysrq_progid", sysrq_progid, 2*sizeof(unsigned long),
+	 0644, NULL, proc_doulongvec_minmax, NULL,
+	 (void *)NULL, (void *)NULL},
+/*
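The patch identifies the target by the (pid, start_time) pair, so a recycled pid is not signalled by mistake. Note that in signal_program() as posted, the `goto error` paths are taken while tasklist_lock is still read-held and the `error:` label never calls read_unlock(), so a failed lookup leaves the lock held. Below is a userspace sketch of the same check with a single unlock on every path. `struct mock_task`, `mock_tasklist_lock` and `signal_registered_program()` are hypothetical stand-ins (a pthread rwlock models tasklist_lock and a counter models send_sig()), not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a task_struct entry. */
struct mock_task {
    pid_t pid;
    unsigned long start_time;
    int has_mm;          /* 0 models a kernel thread (p->mm == NULL) */
    int sigusr1_count;   /* counts "delivered" SIGUSR1 signals */
};

/* Stand-in for tasklist_lock. */
static pthread_rwlock_t mock_tasklist_lock = PTHREAD_RWLOCK_INITIALIZER;

static struct mock_task *mock_find_task(struct mock_task *tab, size_t n, pid_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].pid == pid)
            return &tab[i];
    return NULL;
}

/* Returns 0 if the registered program was signalled, -1 otherwise.
 * The read lock is dropped on every path, including the failure paths. */
static int signal_registered_program(struct mock_task *tab, size_t n,
                                     const unsigned long progid[2])
{
    pid_t pid = (pid_t)progid[0];
    struct mock_task *p;
    int ret = -1;

    assert(tab != NULL || n == 0);
    pthread_rwlock_rdlock(&mock_tasklist_lock);
    p = mock_find_task(tab, n, pid);
    /* same checks as the patch: exists, has an mm, not init, same start_time */
    if (p && p->has_mm && p->pid != 1 && p->start_time == progid[1]) {
        p->sigusr1_count++;              /* models send_sig(SIGUSR1, p, 0) */
        ret = 0;
    }
    pthread_rwlock_unlock(&mock_tasklist_lock);
    return ret;
}
```

A stale start_time (the pid was reused by another program) simply fails the comparison, and the lock is free again afterwards.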
the new VM
i'd also like to share my experiences with recent kernels, compared to the 'old VM'. I frequently run high-VM-load multi-gigabyte systems with a lot of IRQ-side allocations as well, and it's surprising how sensitive these systems' performance is to VM balance, despite gobs of RAM. - The biggest difference under high allocation load is that the CPU usage of kswapd and the synchronous VM balancing code has decreased significantly. Under previous kernels it was not uncommon to see sudden spikes in kswapd usage, and to see significant CPU time spent in shrink_mmap() & friends. I suspect that this is because the new VM does much less 'guessing' and blind list-walking. - i'm also happy that __alloc_pages() now 'guarantees' allocation. This i believe could simplify unrelated kernel code significantly. E.g. no need to check for NULL pointers on most allocations: a GFP_KERNEL allocation always succeeds, end of story. This behavior also has the 'nice' side-effect of showing memory imbalance rather forcefully: the system locks up ;-) A GFP_ATOMIC allocation obviously still has the potential to fail, and must be handled properly. All in all, the new VM balancing code looks really promising, despite all the growing pains. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: more testing on 2.4.0-t9p[456] VM deadlocks
On Mon, 25 Sep 2000, Martin Diehl wrote: PS: vmfixes-2.4.0-test9-B2 not yet tested - will do later. Hi - done now: using 2.4.0-t9p6 + vmfixes-2.4.0-test9-B2 I ended up with the box deadlocked again! It was running "make bzImage" on a UP box booted with mem=8M. After about 4 hours at load 2-3 and almost continuous paging, the box apparently locked up. SysRq+t still shows several processes, including kswapd, being scheduled as "current" (one after the other, of course).
Mem-Info (retyped from SysRq+m):
Active: 847 / inactive dirty: 67 / inactive clean: 0 / free: 64
2x16 + 1x32 + 1x64 + 1x128 = 256kB
Swap cache: add 3353996, delete 3353209, find 2300336/9605753
Free swap: 496144kB
2048 pages of RAM
0 pages of HIGHMEM
490 reserved pages
74 pages shared
787 pages cached
0 pages in page table cache
Buffer memory: 236kB
No change on this at all, despite the scheduling activity still observed. I've looked up several EIP values (given by SysRq+p) in System.map to get an idea of what is still going on. The functions I've recorded (with the number of hits):
page_launder (10)
try_to_free_buffer (5)
deactivate_page_nolock (4)
refill_inactive_scan (3)
nr_free_pages (1)
wakeup_kswapd (1)
__wake_up (1)
kmem_cache_reap (1)
sys_fstatfs (1)
sys_statfs (1)
The results of this very rudimentary "profiling the deadlock" are of course far from statistically significant. The only ordering in this list is by number of occurrences, i.e. I don't see any pattern or call chain there. Finally, SysRq+e solved the problem: the hanging processes were term'ed, the VM deadlock was released, and the box seems to be as usable as right after boot. Comments? Regards Martin
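As a quick consistency check on the Mem-Info dump, assuming 4 kB pages (the i386 page size), the free-area breakdown matches the free page count and the RAM page count matches mem=8M:

```latex
\begin{aligned}
2\times16 + 1\times32 + 1\times64 + 1\times128 &= 256\ \text{kB} = 64 \times 4\ \text{kB} \\
2048 \times 4\ \text{kB} &= 8192\ \text{kB} = 8\ \text{MB}
\end{aligned}
```

So at the time of the hang only 64 pages (256 kB) were free, out of 2048 - 490 = 1558 non-reserved pages.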
Re: [patch] 2.4.0-test8: Alpha RTC clean-ups
On Mon, Sep 25, 2000 at 11:35:35AM +0200, Maciej W. Rozycki wrote: On Fri, 22 Sep 2000, Jan-Benedict Glaw wrote: Instead of having hard-coded values, we should maybe do something more variable like:
if (year >= (20 + YEARS_SINCE_2000) && year < (48 + YEARS_SINCE_2000)) ...
This looks reasonable. This applies to other platforms using different epoch values as well, of course... Alpha appears to be the only one.
./driver/char/rtc.c:rtc_init()
#if defined(__alpha__) || defined(__mips__)
[...]
MIPS does that as well, _in the wrong way_ compared to rtc.c:
./arch/mips/dev/time.c:time_init()
	/*
	 * The DECstation RTC is used as a TOY (Time Of Year).
	 * The PROM will reset the year to either '70, '71 or '72.
	 * This hack will only work until Dec 31 2001.
	 */
	year += 1928;
Regards, JBG -- Admit your mistakes, show greatness: take back the spelling reform!!! /* Jan-Benedict Glaw [EMAIL PROTECTED] -- +49-177-5601720 */ keyID=0x8399E1BB fingerprint=250D 3BCF 7127 0D8C A444 A961 1DBD 5E75 8399 E1BB "insmod vi.o and there we go..." (Alexander Viro on linux-kernel)
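The proposal can be sketched as a helper that guesses the epoch from the two-digit RTC year while shifting every detection window by YEARS_SINCE_2000, so the windows keep working as time passes. This is a hypothetical sketch: the function name is made up, and the boundaries (20, 48, 70) and epochs (2000, 1980, 1952, 1900) are assumptions modelled on the Alpha time code of that era, not a copy of it.

```c
#include <assert.h>

/* Hypothetical helper: guess the epoch of a two-digit RTC year.
 * years_since_2000 plays the role of YEARS_SINCE_2000 above and shifts
 * all windows together, since every epoch's counter advances at the same rate. */
static int guess_rtc_epoch(int year, int years_since_2000)
{
    assert(year >= 0 && year <= 99);
    if (year < 20 + years_since_2000)
        return 2000;          /* PC-style, after 2000 */
    if (year < 48 + years_since_2000)
        return 1980;          /* NT epoch */
    if (year < 70 + years_since_2000)
        return 1952;          /* Digital UNIX epoch */
    return 1900;              /* PC-style, before 2000 */
}
```

For example, in 2005 an NT-epoch clock reads 25 and a Digital UNIX one reads 53; with years_since_2000 = 5 both still land in the right window.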
Re: PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x
On Mon, 25 Sep 2000, Andrzej Krzysztofowicz wrote: BTW, what do you think of the idea of making the pci.ids base modular? I mean replacing data requests to the pci.ids base by queuing those requests (+ possibly request_module(pci_ids) to process the queue if possible). The module should process the queue while loading. I see two advantages of this solution: - it makes it possible to use Vendor/Device info when booting from floppy (kernel size limitations) - it is useful for hot-pluggable PCI devices... I'm not sure I understand what you are describing here... pci.ids is just a vendor/device id to device name map. It shouldn't affect functionality at all... just whether or not you know the names of your devices. Jeff
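Since pci.ids is only a vendor/device ID to name map, a lookup can always fall back to printing the raw IDs, which is why missing names never affect functionality. A minimal sketch, assuming a small compiled-in table; `pci_names` and `pci_lookup_name()` are hypothetical names and the two entries are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct pci_name {
    unsigned short vendor;
    unsigned short device;
    const char *name;
};

/* Hypothetical compiled-in subset of pci.ids (example entries only). */
static const struct pci_name pci_names[] = {
    { 0x8086, 0x1229, "Intel 82557/8/9 Ethernet Pro 100" },
    { 0x10ec, 0x8139, "Realtek RTL-8139" },
};

/* Returns the known name, or formats "vvvv:dddd" into buf when the
 * device is not in the table: the driver works the same either way. */
static const char *pci_lookup_name(unsigned short vendor, unsigned short device,
                                   char *buf, size_t len)
{
    assert(buf != NULL && len >= sizeof("vvvv:dddd"));
    for (size_t i = 0; i < sizeof(pci_names) / sizeof(pci_names[0]); i++)
        if (pci_names[i].vendor == vendor && pci_names[i].device == device)
            return pci_names[i].name;
    snprintf(buf, len, "%04x:%04x", (unsigned)vendor, (unsigned)device);
    return buf;
}
```

Dropping the table (for a boot floppy, say) only changes the fallback path that gets taken.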
Re: how interesting are data-bss patches?
On Sun, 24 Sep 2000 14:28:07 -0500 (CDT), Peter Samuelson [EMAIL PROTECTED] wrote: [Tigran Aivazian [EMAIL PROTECTED]] The question you ask can be answered trivially - yes, it is definitely a good idea, please make such a patch. My expression doesn't catch *all* offenders, by any means. For example, things like char *foo[MAX_BLURFL] = { NULL, }; would be much harder to pick up. Got bored, wrote some Perl. On a 2.4.0-test8 system with a fairly minimal config, most stuff in modules, it found 14,420 bytes of vmlinux .data where the entire variable was initialized to 0. Obviously this depends on your config and will only catch data for options that have been selected. Anybody want to try this on a kernel with the lot?
Biggest offenders
0xc02ce7e0(64)	empty_iops.505
0xc02ce820(64)	empty_fops.506
0xc02c8c40(128)	last_irq_sums.664
0xc02c8cc0(128)	alert_counter.665
0xc02c9120(128)	vm86_irqs
0xc02ca340(128)	apic_timer_irqs
0xc02dc0c0(128)	inet_protos
0xc02dc240(128)	tcp_listening_hash
0xc02dff20(128)	inet6_protos
0xc02c9920(256)	command_line
0xc02ca080(256)	tsc_values
0xc02cee60(512)	proc_alloc_map
0xc02c9680(672)	e820
0xc02d0c60(1024)	floppy_blocksizes
0xc02d4f80(1024)	raw_device_bindings
0xc02d5380(1024)	raw_device_inuse
0xc02d5780(1024)	raw_device_sector_size
0xc02d5b80(1024)	raw_device_sector_bits
0xc02cd160(2048)	chrdevs
0xc02cdc40(2048)	blkdevs
--- cut here
#!/usr/bin/perl -w
# List symbols in .data section which are initialized to zero.
# Needs readelf command from recent binutils.

use strict;

die($0 . " takes exactly one argument, vmlinux or a module to be explored\n") if ($#ARGV);

my ($data, $section, $start, $size, $addr, $len, $i, $total);
my @f;
my @symdata;
my @keys;
my %symbol;

$data = `readelf -S $ARGV[0] | fgrep ' .data '`;
chomp($data);
die("$0 could not find .data section in $ARGV[0]\n") if ($data eq "");
print("Data section ", $data, "\n");
$data =~ s/[][]//g;
(@f) = split(' ', $data);
$section = $f[0];
$start = hex("0x" . $f[3]);
$size = hex("0x" . $f[5]);
$symbol{$start} = ".data__start";
$symbol{$start+$size} = ".data__end";

$data = "";
@symdata = `readelf -s -x $section $ARGV[0]`;
foreach (@symdata) {
	chomp();
	if (/^ *[0-9]+:/) {
		(@f) = split();
		$symbol{hex("0x" . $f[1])} = $f[7] if ($f[6] eq "$section");
	}
	elsif (/^ *0x/) {
		s/^ +//;
		$addr = hex(substr($_, 0, 10));
		($_ = substr($_, 11, 35)) =~ s/ //g;
		$data .= scalar reverse($_);	# assumes little endian
	}
}

# Sort addresses numerically; a plain sort() would compare them as strings.
@keys = sort { $a <=> $b } keys(%symbol);
$total = 0;
for ($i = 0; $i < $#keys; ++$i) {
	$addr = $keys[$i];
	$len = $keys[$i+1] - $addr;
	if (substr($data, ($addr-$start)*2, $len*2) =~ /^0+$/) {
		printf("0x%x(%d)\t%s\n", $addr, $len, $symbol{$addr});
		$total += $len;
	}
}
printf("Total %d (0x%x)\n", $total, $total);
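The test the script applies can be sketched in C: each symbol is assumed to extend up to the next symbol (or the end of the section), as in the Perl loop, and it is reported when every byte in that range is zero. `struct sym` and `zero_init_bytes()` are hypothetical names. Variables found this way can usually be moved to .bss just by dropping the explicit zero initializer (e.g. `char *foo[MAX_BLURFL];` instead of `char *foo[MAX_BLURFL] = { NULL, };`), since gcc of this era places explicitly zero-initialized variables in .data.

```c
#include <assert.h>
#include <stddef.h>

/* One symbol: offset from the start of the section. Sorted ascending. */
struct sym {
    size_t off;
    const char *name;
};

/* Returns the total number of bytes covered by symbols whose bytes are
 * all zero; each symbol runs to the next symbol's offset or section end. */
static size_t zero_init_bytes(const unsigned char *data, size_t size,
                              const struct sym *syms, size_t nsyms)
{
    size_t total = 0;

    for (size_t i = 0; i < nsyms; i++) {
        size_t start = syms[i].off;
        size_t end = (i + 1 < nsyms) ? syms[i + 1].off : size;
        size_t j = start;

        assert(start <= end && end <= size);
        while (j < end && data[j] == 0)
            j++;
        if (j == end && end > start)
            total += end - start;   /* whole symbol is zero */
    }
    return total;
}
```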
Re: kernel compiled with frame pointer
Sushil wrote: I agree. Sitting in front of the desktop I can see whether the source files are getting compiled with or without -fomit-frame-pointer. But while writing a function in a kernel source file, I want to know whether the caller of this function was compiled with or without -fomit-frame-pointer, because this affects the location of its return address on the stack. So I assume that if CONFIG_FRAME_POINTER is defined then the kernel (and hopefully the caller function too) is being compiled without -fomit-frame-pointer, and then look for the return address accordingly. Ah -- I see, you are looking at some sort of kernel debugger. Well, then one way would be to look at entry and exit points. i386 frame pointers are set up with `pushl %ebp / movl %esp, %ebp / subl $local, %esp` (or sometimes with `enter`, though not by gcc AFAIK). Exit points are similarly `movl %ebp, %esp / popl %ebp / ret`; some versions of gcc generate `leave / ret` instead. You could look for these byte signatures. Should be quite reliable with a good System.map. -- Robert
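The byte-signature idea can be sketched as below. The encodings are the standard i386 ones (pushl %ebp = 55, movl %esp,%ebp = 89 e5 or the equivalent 8b ec, leave = c9, ret = c3, popl %ebp = 5d, movl %ebp,%esp = 89 ec); the function names are hypothetical. As the gcc 2.96 reply in this thread points out, compilers may emit code before the prologue, so a negative result should be read as "unknown" rather than "no frame pointer".

```c
#include <assert.h>
#include <stddef.h>

/* Does the function start with pushl %ebp; movl %esp,%ebp ? */
static int has_frame_prologue(const unsigned char *code, size_t len)
{
    if (len < 3 || code[0] != 0x55)                 /* pushl %ebp */
        return 0;
    return (code[1] == 0x89 && code[2] == 0xe5) ||  /* movl %esp,%ebp */
           (code[1] == 0x8b && code[2] == 0xec);    /* alternative encoding */
}

/* Does the function end with leave; ret  or  movl %ebp,%esp; popl %ebp; ret ?
 * code/len cover the whole function, so the epilogue is at the end. */
static int has_frame_epilogue(const unsigned char *code, size_t len)
{
    if (len >= 2 && code[len - 2] == 0xc9 && code[len - 1] == 0xc3)
        return 1;
    if (len >= 4 && code[len - 4] == 0x89 && code[len - 3] == 0xec &&
        code[len - 2] == 0x5d && code[len - 1] == 0xc3)
        return 1;
    return 0;
}
```

A function body such as `cmpl $0,4(%esp); ret` (83 7c 24 04 00 c3) matches neither check, which is exactly the early-return case that defeats this heuristic.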
Re: kernel compiled with frame pointer
On Mon, 25 Sep 2000 06:21:48 -0500, Robert Redelmeier [EMAIL PROTECTED] wrote: Ah -- I see, you are looking at some sort of kernel debugger. Well, then one way would be to look at entry and exit points. i386 Frame pointers are set up with `pushl %ebp / movl %esp, %ebp / subl $local, %esp` or sometimes [not by gcc AFAIK with `enter`]. Exit points are similarly `movl %ebp, %esp / popl %ebp / ret`. Some versions of gcc do generate `leave / ret`. You could look for these byte signatures. Should be quite reliable with a good System.map. Until you go to gcc 2.96, where the prologue code changes dramatically: interleaved instructions, plus "nice" constructs like
void foo(int bar)
{
	if (!bar)
		return;
	return;
}
could generate the test before doing anything on the stack:
foo:
	cmpl $0,4(%esp)
	je 1f
	pushl %ebp
	movl %esp,%ebp
	...
	movl %ebp,%esp
	popl %ebp
1:	ret
Re: 82559 driver bug
When the machine was rebooted, the new MAC address was lost. This seems to be a bug in the 82559 driver. 82559 spec specifies Kernel MAC address overrides never make permanent changes.
Re: PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x
On Mon, 25 Sep 2000 11:07:58 +0200 (CEST), Andrzej Krzysztofowicz [EMAIL PROTECTED] wrote: BTW, what do you think of the idea of making the pci.ids base modular? The module should process the queue while loading. Does the modules.pcimap file created by recent modutils do what you want? It maps PCI vendor and device codes to the module that supports them. I'm afraid not. Which module should be responsible for VGA/PCI-PCI bridge/PCI-ISA bridge/IDE and all the other features compiled into the kernel? I mean updating the data later, not at boot time (when it is not always necessary). It would probably also be an alternative solution to modules.pcimap. -- === Andrzej M. Krzysztofowicz [EMAIL PROTECTED] phone (48)(58) 347 14 61 Faculty of Applied Phys. & Math., Technical University of Gdansk
Re: more testing on 2.4.0-t9p[456] VM deadlocks
On Mon, 25 Sep 2000, Martin Diehl wrote: On Mon, 25 Sep 2000, Martin Diehl wrote: PS: vmfixes-2.4.0-test9-B2 not yet tested - will do later. Hi - done now: using 2.4.0-t9p6 + vmfixes-2.4.0-test9-B2 I ended up with the box deadlocked again! Was "make bzImage" on UP booted with mem=8M. There is a known deadlock with Ingo's patch. I'm attaching a patch which should fix it (on top of vmfixes-2.4.0-test9-B2).
diff -Nur --exclude-from=exclude linux.orig/fs/dcache.c linux/fs/dcache.c
--- linux.orig/fs/dcache.c	Mon Sep 25 08:40:47 2000
+++ linux/fs/dcache.c	Mon Sep 25 08:40:53 2000
@@ -556,15 +556,11 @@
 	int count = 0;
 	if (priority)
 		count = dentry_stat.nr_unused / priority;
-	prune_dcache(count);
-	/* FIXME: kmem_cache_shrink here should tell us
-	   the number of pages freed, and it should
-	   work in a __GFP_DMA/__GFP_HIGHMEM behaviour
-	   to free only the interesting pages in
-	   function of the needs of the current allocation. */
-	kmem_cache_shrink(dentry_cache);
 
-	return 0;
+	if (gfp_mask & __GFP_IO)
+		prune_dcache(count);
+
+	return kmem_cache_shrink(dentry_cache);
 }
 
 #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
diff -Nur --exclude-from=exclude linux.orig/fs/inode.c linux/fs/inode.c
--- linux.orig/fs/inode.c	Mon Sep 25 08:40:47 2000
+++ linux/fs/inode.c	Mon Sep 25 08:40:53 2000
@@ -460,15 +460,11 @@
 
 	if (priority)
 		count = inodes_stat.nr_unused / priority;
-	prune_icache(count);
-	/* FIXME: kmem_cache_shrink here should tell us
-	   the number of pages freed, and it should
-	   work in a __GFP_DMA/__GFP_HIGHMEM behaviour
-	   to free only the interesting pages in
-	   function of the needs of the current allocation. */
-	kmem_cache_shrink(inode_cachep);
 
-	return 0;
+	if (gfp_mask & __GFP_IO)
+		prune_icache(count);
+
+	return kmem_cache_shrink(inode_cachep);
 }
 
 /*
diff -Nur --exclude-from=exclude linux.orig/mm/slab.c linux/mm/slab.c
--- linux.orig/mm/slab.c	Mon Sep 25 08:40:38 2000
+++ linux/mm/slab.c	Mon Sep 25 08:40:53 2000
@@ -887,7 +887,7 @@
 static int __kmem_cache_shrink(kmem_cache_t *cachep)
 {
 	slab_t *slabp;
-	int ret;
+	int ret, freed = 0;
 
 	drain_cpu_caches(cachep);
 
@@ -912,8 +912,11 @@
 		spin_unlock_irq(&cachep->spinlock);
 		kmem_slab_destroy(cachep, slabp);
 		spin_lock_irq(&cachep->spinlock);
+
+		freed++;
 	}
-	ret = !list_empty(&cachep->slabs);
+
+	ret = ((1 << cachep->gfporder) * freed);
 	spin_unlock_irq(&cachep->spinlock);
 	return ret;
 }
@@ -923,7 +926,8 @@
  * @cachep: The cache to shrink.
  *
  * Releases as many slabs as possible for a cache.
- * To help debugging, a zero exit status indicates all slabs were released.
+ *
+ * Returns the amount of freed pages.
  */
 int kmem_cache_shrink(kmem_cache_t *cachep)
 {
@@ -962,7 +966,9 @@
 	list_del(&cachep->next);
 	up(&cache_chain_sem);
 
-	if (__kmem_cache_shrink(cachep)) {
+	__kmem_cache_shrink(cachep);
+
+	if (!list_empty(&cachep->slabs)) {
 		printk(KERN_ERR "kmem_cache_destroy: Can't free all objects %p\n",
 			cachep);
 		down(&cache_chain_sem);
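The patch changes two things: kmem_cache_shrink() now reports the number of pages it freed ((1 << gfporder) per destroyed slab) instead of a "slabs remain" flag, and the dcache/icache shrinkers only prune when the caller's gfp_mask allows I/O. The model below sketches that contract in userspace. `MOCK_GFP_IO`, `struct mock_cache` and both functions are hypothetical; pruning is modelled simply as turning "prunable" slabs into free ones.

```c
#include <assert.h>

/* Hypothetical stand-in for the __GFP_IO bit; the value is arbitrary. */
#define MOCK_GFP_IO 0x40u

struct mock_cache {
    unsigned int gfporder;  /* each slab spans 2^gfporder pages */
    int free_slabs;         /* slabs with no live objects */
    int prunable_slabs;     /* slabs that pruning unused entries would empty */
};

/* Models the patched __kmem_cache_shrink(): destroy all free slabs and
 * return the number of pages released. */
static int mock_cache_shrink(struct mock_cache *c)
{
    int freed = c->free_slabs;

    assert(c->gfporder < 16);
    c->free_slabs = 0;
    return (1 << c->gfporder) * freed;
}

/* Models the patched shrink_dcache_memory()/shrink_icache_memory():
 * prune only if the allocation may do I/O, but always shrink the slab. */
static int mock_shrink_memory(struct mock_cache *c, unsigned int gfp_mask)
{
    if (gfp_mask & MOCK_GFP_IO) {
        c->free_slabs += c->prunable_slabs;
        c->prunable_slabs = 0;
    }
    return mock_cache_shrink(c);
}
```

Because the return value is now a page count, kmem_cache_destroy() in the patch checks list_empty() directly instead of relying on the old flag.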
Re: PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x
On Mon, 25 Sep 2000, Andrzej Krzysztofowicz wrote: I mean moving the __init database compiled into the kernel (based on pci.ids) to a separate module, which would be responsible for on-demand updating of text information (i.e. replacing VID:DID numbers with text). In early 2.3.x, the fbdev subsystem added "modedb", a feature which provides a standard video mode database for all framebuffer drivers. This is also __init code, because after boot, video mode information can be provided from userspace (via 'fbset', in fbdev's case). I see your suggestion in the same way... If we keep the PCI device name data around after boot, then we have a lot of kernel memory locked up on the off chance that a HotPlug PCI device might appear for which we need a name. I would much prefer a userspace solution for naming unnamed PCI devices after boot... Jeff
Re: PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x
On Mon, 25 Sep 2000, Jeff Garzik wrote: I see your suggestion in the same way... If we keep the PCI device name data around after boot, then we have a lot of kernel memory locked up on the off chance that a HotPlug PCI device might appear for which we need a name. I would much prefer a userspace solution for naming unnamed PCI devices after boot... How about the kernel calling lspci? -Dan
Re: [patch] 2.4.0-test8: Alpha RTC clean-ups
On Mon, 25 Sep 2000, Jan-Benedict Glaw wrote:
./driver/char/rtc.c:rtc_init()
#if defined(__alpha__) || defined(__mips__)
[...]
That is wrong. I fixed this partially in the MIPS/Linux CVS tree a few weeks ago. The __mips__ conditional is to be removed completely. MIPS does that as well, _in the wrong way_ compared to rtc.c:
./arch/mips/dev/time.c:time_init()
	/*
	 * The DECstation RTC is used as a TOY (Time Of Year).
	 * The PROM will reset the year to either '70, '71 or '72.
	 * This hack will only work until Dec 31 2001.
	 */
	year += 1928;
We already handle this differently for the DECstation -- there is no need to check the year from the RTC apart from handling leap years. The real year has to be stored elsewhere. This is platform specific and other MIPS systems are unaffected, so the only check needed is whether we are on a DECstation or not. As we don't have a unified MIPS kernel, this can be done at compile time. These changes will get into the official 2.4 kernel once a merge is performed. -- Maciej W. Rozycki, Technical University of Gdansk, Poland -- e-mail: [EMAIL PROTECTED], PGP key available
some sound-related oops'es
I get some oopses whenever I try to insmod sb. Here are some of them, in the hope that someone can track down the problem.
Unable to handle kernel paging request at virtual address ca8fc1a0
ca88a49d
*pde = 07f8a063
Oops:
CPU:    0
EIP:    0010:[usbcore:__insmod_usbcore_S.bss_L96+166557/5043409]
EFLAGS: 00010286
eax: ca8fc1a0   ebx: c1f5ec60   ecx:   edx:
esi: c704d920   edi:   ebp:   esp: c7025f00
ds: 0018   es: 0018   ss: 0018
Process aumix (pid: 939, stackpage=c7025000)
Stack: c704d920 c36c4540 c128c640 c704d920 c36c4540 c128c640 c7024000 0001 c74d2ac0 c6009005 c36c4540 c012b062 c36c4540 c704d920 c704d920 c36c4540 c012a371 c36c4540 c704d920 c6009000
Call Trace: [chrdev_open+62/76] [dentry_open+189/328] [filp_open+82/92] [sys_open+56/180] [system_call+51/56]
Code: 8b 10 85 d2 74 1d 52 e8 bf d1 88 f5 83 c4 04 85 c0 74 14 8b
Using defaults from ksymoops -t elf32-i386 -a i386
Code;  Before first symbol
_EIP:
Code;  Before first symbol
   0:   8b 10             mov    (%eax),%edx
Code;  0002 Before first symbol
   2:   85 d2             test   %edx,%edx
Code;  0004 Before first symbol
   4:   74 1d             je     23 _EIP+0x23 0023 Before first symbol
Code;  0006 Before first symbol
   6:   52                push   %edx
Code;  0007 Before first symbol
   7:   e8 bf d1 88 f5    call   f588d1cb _EIP+0xf588d1cb f588d1cb END_OF_CODE+2afb034c/
Code;  000c Before first symbol
   c:   83 c4 04          add    $0x4,%esp
Code;  000f Before first symbol
   f:   85 c0             test   %eax,%eax
Code;  0011 Before first symbol
  11:   74 14             je     27 _EIP+0x27 0027 Before first symbol
Code;  0013 Before first symbol
  13:   8b 00             mov    (%eax),%eax
---
Unable to handle kernel paging request at virtual address ca8fc1a0
ca88a49d
*pde = 07f8a063
Oops:
CPU:    0
EIP:    0010:[usbcore:__insmod_usbcore_S.bss_L96+166557/5043409]
EFLAGS: 00010286
eax: ca8fc1a0   ebx: c0558440   ecx: 0003   edx: 0003
esi: c2abc3e0   edi: 0003   ebp: 0003   esp: c2ff9f00
ds: 0018   es: 0018   ss: 0018
Process esd (pid: 1781, stackpage=c2ff9000)
Stack: c2abc3e0 c3d209e0 c128c640 c2abc3e0 c3d209e0 c128c640 72616863 6a616d2d 312d726f 0034 0287 c012b062 c3d209e0 c2abc3e0 c2abc3e0 c3d209e0 c012a371 c3d209e0 c2abc3e0 c45b2000
Call Trace: [chrdev_open+62/76] [dentry_open+189/328] [filp_open+82/92] [sys_open+56/180] [system_call+51/56]
Code: 8b 10 85 d2 74 1d 52 e8 bf d1 88 f5 83 c4 04 85 c0 74 14 8b
Code;  Before first symbol
_EIP:
Code;  Before first symbol
   0:   8b 10             mov    (%eax),%edx
Code;  0002 Before first symbol
   2:   85 d2             test   %edx,%edx
Code;  0004 Before first symbol
   4:   74 1d             je     23 _EIP+0x23 0023 Before first symbol
Code;  0006 Before first symbol
   6:   52                push   %edx
Code;  0007 Before first symbol
   7:   e8 bf d1 88 f5    call   f588d1cb _EIP+0xf588d1cb f588d1cb END_OF_CODE+2afb034c/
Code;  000c Before first symbol
   c:   83 c4 04          add    $0x4,%esp
Code;  000f Before first symbol
   f:   85 c0             test   %eax,%eax
Code;  0011 Before first symbol
  11:   74 14             je     27 _EIP+0x27 0027 Before first symbol
Code;  0013 Before first symbol
  13:   8b 00             mov    (%eax),%eax
---
This is what appeared in the logs right before the 2nd oops:
Sep 25 14:08:35 penny kernel: Soundblaster audio driver Copyright (C) by Hannu Savolainen 1993-1996
Sep 25 14:08:35 penny kernel: sb: No ISAPnP cards found, trying standard ones...
Sep 25 14:08:35 penny kernel: SB 4.13 detected OK (220)
Sep 25 14:08:35 penny kernel: Sound Blaster 16 (4.13) at 0x220 irq 5 dma 1,5
Sep 25 14:08:35 penny kernel: Sound Blaster 16 at 0x330 irq 5 dma 0,0
Sep 25 14:08:35 penny kernel: sb: I/O region in use.
Sep 25 14:08:35 penny kernel: Sound: Hmm, DMA1 was left allocated - fixed
Sep 25 14:08:35 penny kernel: Sound: Hmm, DMA5 was left allocated - fixed
Sep 25 14:08:35 penny insmod: /lib/modules/2.4.0-test8/kernel/drivers/sound/sb.o: init_module: No such device
Sep 25 14:08:35 penny insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters
Sep 25 14:08:35 penny insmod: /lib/modules/2.4.0-test8/kernel/drivers/sound/sb.o: insmod char-major-14 failed
Sep 25 14:08:35 penny kernel: Unable to handle kernel paging request at virtual address ca8fc1a0
Re: test9pre6 usb-storage
On Sun, 24 Sep 2000, Matthew Dharm wrote: I'm the usb-storage maintainer. Yes, I realize that there is really no need to reset the state to TASK_RUNNING, but I felt better having those there. Considering that code is from the reset routines which almost never get called, I figured it was fine. Matt OK, that's reasonable, but my concern is that these things tend to propagate. If people see this code, they will assume it is *necessary* to do such a thing. Unfortunately this is the only real way much development can get done, by UTSL ... thanks john
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 12:13:08PM +0200, Ingo Molnar wrote: On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Not sure if this is the right moment for those changes though, I'm not worried about ext2 but about the other non-networked fses that nobody uses regularly. it *is* the right moment to clean these issues up. These kinds of things I'm talking about the removal of the superblock lock from the filesystems. Note: I don't have problems with the removal of the superblock lock even if done at this stage; I'm not the one who can choose those things, it's Linus's responsibility to take the final decision for the official tree. But don't ask me to test patches that remove the superblock lock _at_this_stage_, before I can run a stable and fast 2.4.x, because I won't do that. Period. yet another elevator algorithm we need a squeaky clean VM balancer above FYI: My current tree (based on 2.4.0-test8-pre5) delivers 16 Mbyte/sec in the tiobench write test, compared to clean 2.4.0-test8-pre5 which delivers 8 Mbyte/sec, with only blkdev layer changes between the two kernels (and no, that's not a matter of the elevator, since there are no seeks in the test and I've not changed the elevator sorting algorithm during the bench). Also, I found the reason for your hang: it's the TASK_EXCLUSIVE in wait_for_request. The high part of the queue is reserved for reads. Now if a read completes and it wakes up a write, you'll hang. If you think I should delay those fixes to do something else, sorry, I don't agree. all. Please help identifying, fixing, debugging and testing these VM balancing issues. This is tough work and it needs to be done. I had an alternative VM that I prefer from a design standpoint; I'll improve it and I'll maintain it. Andrea
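One way to read the hang described above is as a consumed exclusive wakeup, sketched here as a toy model (not the actual request-queue code). Assumptions, all hypothetical: the last `reserved` free requests may only be taken by reads, waiters sleep in a FIFO, and each completion wakes exactly one of them. If a read's completion wakes a writer that cannot use the reserved slot, the writer goes back to sleep and the reader queued behind it is never woken, even though a slot it could use is free; with no more requests in flight, nothing else will wake it.

```c
#include <assert.h>
#include <stdbool.h>

enum rw { READ_REQ, WRITE_REQ };

#define MAX_WAITERS 8

/* Toy request queue: `reserved` of the free slots are kept for reads. */
struct model {
    int free_slots;
    int reserved;
    enum rw waiters[MAX_WAITERS];  /* FIFO of sleeping submitters */
    int nwaiters;
};

static bool can_take(const struct model *m, enum rw kind)
{
    if (kind == READ_REQ)
        return m->free_slots > 0;
    return m->free_slots > m->reserved;
}

/* A request completes: free one slot and wake exactly one waiter
 * (exclusive wakeup). Returns true if the woken waiter got a slot. */
static bool complete_one(struct model *m)
{
    m->free_slots++;
    if (m->nwaiters == 0)
        return false;

    enum rw first = m->waiters[0];
    for (int i = 1; i < m->nwaiters; i++)
        m->waiters[i - 1] = m->waiters[i];
    m->nwaiters--;

    if (can_take(m, first)) {
        m->free_slots--;
        return true;
    }
    /* Cannot use the slot: back to sleep; the wakeup is lost. */
    assert(m->nwaiters < MAX_WAITERS);
    m->waiters[m->nwaiters++] = first;
    return false;
}
```

With a writer queued ahead of a reader and no free slots, one completion leaves a usable slot free while both submitters stay asleep.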
Re: the new VM
On Mon, Sep 25, 2000 at 12:42:09PM +0200, Ingo Molnar wrote: believe could simplify unrelated kernel code significantly. Eg. no need to check for NULL pointers on most allocations, a GFP_KERNEL allocation always succeeds, end of story. This behavior also has the 'nice' Sorry, I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed, that is a showstopper bug. We also have another showstopper bug in getblk that will be hard to fix, because people got used to relying on it and wrote deadlock-prone code. You should know that people who don't run benchmarks, and who use the machine's power for simulations, run out of memory all the time. If you put this kind of obvious deadlock into the main kernel allocator, you'll screw up the hard work that has been done so far to fix all the other deadlock problems during OOM. Please fix raid1 instead of making things worse. Andrea
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Sorry I totally disagree. If GFP_KERNEL are guaranteed to succeed that is a showstopper bug. [...] why? machine power for simulations runs out of memory all the time. If you put this kind of obvious deadlock into the main kernel allocator FYI, i haven't put it there. Ingo
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Please fix raid1 instead of making things worse. huh, what do you mean? Ingo
Re: [PATCH] 2.4.0 i386 watchpoint problems [NEW PATCH]
Here is a patch to arch/i386/traps.c and arch/i386/signal.c which does what you are suggesting, I believe. I have tested this and it works fine for me. (Though I do also need the patch which stores dr6 back into current->thread.debugreg[6]. That is not included here since I submitted it separately and assume it is uncontentious.) All user-generated watchpoints are noted, as are ones triggered from the kernel by system calls overwriting the watched data. A couple more points: 1) I restore the debug control register at the point where a signal is about to be delivered, rather than at the top of the loop as you suggested. I think that's safe (and potentially less work if we go around the loop more than once). 2) Rather than zapping the eip in the watchpoint trap to -1 if the trap occurs in the kernel, I make it the user eip from the thread. That makes a debugger point at the right place, i.e. the system call inside which the watchpoint triggered. -- Jim James Cownie [EMAIL PROTECTED] Etnus, LLC. +44 117 9071438 http://www.etnus.com
jcownie@pc2: diff -u signal.c-test7 signal.c
--- signal.c-test7	Mon Sep 25 11:52:35 2000
+++ signal.c	Mon Sep 25 11:52:38 2000
@@ -600,6 +600,7 @@
 
 	for (;;) {
 		unsigned long signr;
+		unsigned long thread_dr7;
 
 		spin_lock_irq(&current->sigmask_lock);
 		signr = dequeue_signal(&current->blocked, &info);
@@ -689,6 +690,16 @@
 				/* NOTREACHED */
 			}
 		}
+
+		/* Reenable any watchpoints before delivering the
+		 * signal to user space. The processor register will
+		 * have been cleared if the watchpoint triggered
+		 * inside the kernel.
+		 */
+		thread_dr7 = current->thread.debugreg[7];
+		__asm__("movl %0,%%db7"
+			: /* no output */
+			: "r" (thread_dr7));
 
 		/* Whee!  Actually deliver the signal.  */
 		handle_signal(signr, ka, &info, oldset, regs);
jcownie@pc2: diff -c traps.c-2.4.0-test7 traps.c
*** traps.c-2.4.0-test7	Sat Aug  5 00:15:38 2000
--- traps.c	Mon Sep 25 13:42:43 2000
***************
*** 491,507 ****
  }
  
  /*
!  * Careful - we must not do a lock-kernel until we have checked that the
!  * debug fault happened in user mode. Getting debug exceptions while
!  * in the kernel has to be handled without locking, to avoid deadlocks..
   *
   * Being careful here means that we don't have to be as careful in a
   * lot of more complicated places (task switching can be a bit lazy
   * about restoring all the debug state, and ptrace doesn't have to
   * find every occurrence of the TF bit that could be saved away even
!  * by user code - and we don't have to be careful about what values
!  * can be written to the debug registers because there are no really
!  * bad cases).
   */
  asmlinkage void do_debug(struct pt_regs * regs, long error_code)
  {
--- 491,516 ----
  }
  
  /*
!  * Our handling of the processor debug registers is non-trivial.
!  * We do not clear them on entry and exit from the kernel. Therefore
!  * it is possible to get a watchpoint trap here from inside the kernel.
!  * However, the code in ./ptrace.c has ensured that the user can
!  * only set watchpoints on userspace addresses. Therefore the in-kernel
!  * watchpoint trap can only occur in code which is reading/writing
!  * from user space. Such code must not hold kernel locks (since it
!  * can equally take a page fault), therefore it is safe to call
!  * force_sig_info even though that claims and releases locks.
!  *
!  * Code in ./signal.c ensures that the debug control register
!  * is restored before we deliver any signal, and therefore that
!  * user code runs with the correct debug control register even though
!  * we clear it here.
   *
   * Being careful here means that we don't have to be as careful in a
   * lot of more complicated places (task switching can be a bit lazy
   * about restoring all the debug state, and ptrace doesn't have to
   * find every occurrence of the TF bit that could be saved away even
!  * by user code)
   */
  asmlinkage void do_debug(struct pt_regs * regs, long error_code)
  {
***************
*** 535,562 ****
  		goto clear_TF;
  	}
  
- 	/* If this is a kernel mode trap, we need to reset db7 to allow us to continue sanely */
- 	if ((regs->xcs & 3) == 0)
- 		goto clear_dr7;
- 
  	/* Ok, finally something we can handle */
  	tsk->thread.trap_no = 1;
  	tsk->thread.error_code = error_code;
  	info.si_signo = SIGTRAP;
  	info.si_errno = 0;
  	info.si_code = TRAP_BRKPT;
! 	info.si_addr = (void *)regs->eip;
  	force_sig_info(SIGTRAP, &info, tsk);
- 	return;
- 
- debug_vm86:
- 	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
- 
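The two behaviours Jim describes (the trap handler clears the hardware DR7 and, for a kernel-mode trap, reports the user eip; the signal path restores DR7 from the thread's saved copy before delivery) can be modelled in userspace as below. `mock_hw_dr7` stands in for the %db7 register, and `struct mock_thread` (including its `user_eip` field), `struct mock_regs` and both functions are hypothetical. The only real detail used is that `(xcs & 3) == 0` identifies a trap taken in kernel mode.

```c
#include <assert.h>

struct mock_thread {
    unsigned long debugreg7;   /* saved watchpoint enables (thread.debugreg[7]) */
    unsigned long user_eip;    /* hypothetical: user-mode eip to report */
};

struct mock_regs {
    unsigned long eip;
    unsigned long xcs;         /* low two bits = privilege level */
};

static unsigned long mock_hw_dr7;  /* stands in for the %db7 register */

/* Debug trap: clear the hardware control register and choose the
 * address reported in si_addr. */
static unsigned long mock_do_debug(const struct mock_regs *regs,
                                   const struct mock_thread *t)
{
    assert(regs != 0 && t != 0);
    mock_hw_dr7 = 0;
    if ((regs->xcs & 3) == 0)      /* trapped inside the kernel */
        return t->user_eip;        /* point at the system call instead */
    return regs->eip;
}

/* Just before a signal is delivered: re-enable the thread's watchpoints. */
static void mock_before_signal_delivery(const struct mock_thread *t)
{
    mock_hw_dr7 = t->debugreg7;
}
```

Restoring at delivery time, rather than at the top of the signal loop, means the write happens once per delivered signal, which is the trade-off point 1) mentions.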
Re: the new VM
On Mon, Sep 25, 2000 at 03:02:58PM +0200, Ingo Molnar wrote: On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Sorry I totally disagree. If GFP_KERNEL are guaranteed to succeed that is a showstopper bug. [...] why? Because, as you said, the machine can lock up when you run out of memory. FYI, i haven't put it there. Ok. Andrea
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: yet another elevator algorithm we need a squeaky clean VM balancer above FYI: My current tree (based on 2.4.0-test8-pre5) delivers 16mbyte/sec in the tiobench write test compared to clean 2.4.0-test8-pre5 that delivers 8mbyte/sec great! I'm happy we have a fine-tuned elevator again. Also I found the reason of your hang, it's the TASK_EXCLUSIVE in wait_for_request. The high part of the queue is reserved for reads. Now if a read completes and it wakeups a write you'll hang. yep. But i don't understand why this makes any difference - the waitqueue wakeup is FIFO, so any other request will eventually arrive. Could you explain this bug a bit better? If you think I should delay those fixes to do something else I don't agree sorry. no, i never meant it. I find it very good that those half-done changes are cleaned up and the remaining bugs / performance problems are eliminated - the first reports about bad write performance came right after the original elevator patches went in, about 6 months ago. Ingo
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Sorry, I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed, that is a showstopper bug. [...] why? Because as you said the machine can lock up when you run out of memory. well, I think all kernel-space allocations have to be limited carefully; denying succeeding allocations is not a solution against over-allocation, especially in a multi-user environment. Ingo
Re: the new VM
On Mon, Sep 25, 2000 at 03:04:10PM +0200, Ingo Molnar wrote: On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Please fix raid1 instead of making things worse. huh, what do you mean? I mean this: while (!( /* FIXME: now we are rather fault tolerant than nice */ mirror_bh[i] = kmalloc (sizeof (struct buffer_head), GFP_KERNEL) ) ) I've seen that in the 2.4.0-test9-pre6 raid1 code the above is gone (and this looks very promising :), it is at least proof that some care about the deadlock has been taken) and you instead sleep on a waitqueue now. While it's not obvious at all that sleeping on the waitqueue is not deadlock prone (for example getblk sleeps on a waitqueue but it's deadlock prone too), at least it's not an infinite loop anymore and that's still better. Is it safe to sleep on the waitqueue in the kmalloc fail path in raid1? Andrea
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: huh, what do you mean? I mean this: while (!( /* FIXME: now we are rather fault tolerant than nice */ this is fixed in 2.4. The 2.2 RAID code is frozen, and has known limitations (i.e. due to the above, RAID1 cannot be used as a swap device). Ingo
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Is it safe to sleep on the waitqueue in the kmalloc fail path in raid1? yes. Every RAID1-bh has a bounded lifetime (bounded by worst-case IO latencies). Ingo
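A rough sketch of why sleeping is acceptable when every element has a bounded lifetime, assuming a hypothetical pool of POOL_SIZE mirror buffer heads that each come back after at most LATENCY ticks (both numbers are made up). A request that finds the pool empty waits for the earliest return instead of spinning on kmalloc, so the total wait is bounded by roughly ceil(requests / POOL_SIZE) * LATENCY.

```c
#include <assert.h>

/* Hypothetical model: a fixed pool of POOL_SIZE mirror buffer heads.
 * Every buffer head in flight is returned after at most LATENCY ticks
 * (the "bounded by worst-case IO latencies" in the text). */
#define POOL_SIZE 2
#define LATENCY   3

/* Submit 'requests' writes that each need one buffer head. A request that
 * finds the pool empty waits (sleeps) instead of looping on the allocator.
 * Returns the tick at which the last buffer head comes back to the pool. */
static int ticks_to_finish(int requests)
{
    int done_at[POOL_SIZE] = { 0 };  /* tick when each element is free */
    int now = 0, last = 0;

    assert(requests >= 0);
    for (int r = 0; r < requests; r++) {
        /* pick the element that frees up first; the wait is bounded */
        int best = 0;
        for (int i = 1; i < POOL_SIZE; i++)
            if (done_at[i] < done_at[best])
                best = i;
        if (done_at[best] > now)
            now = done_at[best];     /* sleep until a buffer head returns */
        done_at[best] = now + LATENCY;
    }

    for (int i = 0; i < POOL_SIZE; i++)
        if (done_at[i] > last)
            last = done_at[i];
    return last;
}
```

With the values above, ticks_to_finish(4) is 6 and ticks_to_finish(5) is 9: waiting always ends because each outstanding element is guaranteed to come back.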
Re: the new VM
On Mon, Sep 25, 2000 at 03:12:58PM +0200, Ingo Molnar wrote: well, I think all kernel-space allocations have to be limited carefully, When a machine without gigabit ethernet runs OOM, it's userspace that allocated the memory via page faults, not the kernel. And if the careful limit avoids the deadlock in the layer above alloc_pages, then it will also prevent alloc_pages from returning NULL and you won't need an infinite loop in the first place (unless the memory balancing is buggy). GFP should return NULL only if the machine is out of memory. The kernel can be written in a way that never deadlocks when the machine is out of memory, just by checking the GFP return value. I don't think any in-kernel resource limit is necessary to have things reliable and fast. Most dynamic big caches and kernel data can be shrunk dynamically during memory pressure (perhaps except skbs, and I agree that for skbs on gigabit ethernet the thing is a little different). Andrea
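Andrea's alternative is that callers check the allocator's return value and back out. A minimal userspace sketch of that style, assuming a hypothetical try_alloc() wrapper around malloc() that can be forced to fail (the kernel counterpart would be kmalloc()/alloc_pages() returning NULL):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised;
 * in the kernel this role is played by kmalloc()/alloc_pages(). */
static int fail_allocations;

static void *try_alloc(size_t size)
{
    return fail_allocations ? NULL : malloc(size);
}

struct buf { char data[512]; };

/* Allocate two buffers or none: on failure, undo partial work and report
 * -ENOMEM to the caller instead of retrying forever. */
static int setup_pair(struct buf **a, struct buf **b)
{
    assert(a != NULL && b != NULL);
    *a = try_alloc(sizeof(**a));
    if (!*a)
        return -ENOMEM;
    *b = try_alloc(sizeof(**b));
    if (!*b) {
        free(*a);               /* release what we already hold */
        *a = NULL;
        return -ENOMEM;
    }
    return 0;
}
```

The important part is the unwind: partial work is released and -ENOMEM travels up to a caller that can decide what to do, rather than looping inside the allocation site.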
Re: the new VM
On Mon, Sep 25, 2000 at 03:21:01PM +0200, Ingo Molnar wrote: yes. Every RAID1-bh has a bounded lifetime (bounded by worst-case IO latencies). Very good! Many thanks Ingo. Andrea
Interrupt sharing
Hello to all, I have a question, as below. Suppose two drivers, driver1 and driver2, install an ISR for the same interrupt, say UART0. After some time the interrupt is generated. At that moment, which driver's ISR is going to execute? If driver1's ISR gets executed, will driver2's ISR also execute? If driver2's ISR is going to execute, is the data that the interrupt generated going to be passed on to driver2's ISR? please help, any help is welcome, with regards, Mahadev
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: And if the careful limit avoids the deadlock in the layer above alloc_pages, then it will also prevent alloc_pages from returning NULL and you won't need an infinite loop in the first place (unless the memory balancing is buggy). yes, I like this property very much because it unearths VM balancing bugs, which plagued us for so long and are so hard to detect. But statistically it's also possible that try_to_free_pages() frees a page and an alloc_pages() done on another CPU (or in IRQ context) 'steals' the page. This can happen because the VM right now guarantees no straight path from deallocator to allocator (and it's not necessary to guarantee it, given the varying nature of allocation requests). GFP should return NULL only if the machine is out of memory. The kernel can be written in a way that never deadlocks when the machine is out of memory, just by checking the GFP return value. I don't think any in-kernel resource limit is necessary to have things reliable and fast. [...] Andrea, if you really mean this then you should not be let near the VM balancing code :-) Most dynamic big caches and kernel data can be shrunk dynamically during memory pressure (perhaps except skbs, and I agree that for skbs on gigabit ethernet the thing is a little different). a big 'except'. You don't need gigabit for that; on the contrary, if the network is slow it's easier to overallocate within the kernel. Ask Alan about how many D.O.S. attacks are possible without implicit or explicit bean counting. Ingo
Re: Interrupt sharing
On Mon, 25 Sep 2000, Mahadev K Cholachagudda wrote: Hello to all, I have a question, as below. Suppose two drivers, driver1 and driver2, install an ISR for the same interrupt, say UART0. After some time the interrupt is generated. At that moment, which driver's ISR is going to execute? If driver1's ISR gets executed, will driver2's ISR also execute? If driver2's ISR is going to execute, is the data that the interrupt generated going to be passed on to driver2's ISR? When an interrupt is delivered, the kernel calls ALL interrupt handlers registered for that interrupt. That means all drivers capable of sharing interrupts should, ideally, have code in their interrupt handler to exit ASAP if no work is necessary:

	status = RTL_R16(IntrStatus);

	/* exit ASAP if no interrupt conditions (0), or
	 * if the hardware was unplugged (0xFFFF) */
	if ((status == 0) || (status == 0xFFFF))
		return;
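The "check status, return at once" pattern from the snippet can be simulated in plain C. This is a hypothetical model, not the kernel's IRQ code: struct device and its status field stand in for the device register that RTL_R16(IntrStatus) reads, and dispatch_irq() plays the part of the kernel walking every handler registered on the line.

```c
#include <assert.h>

/* Hypothetical simulation of one shared IRQ line with several devices.
 * 'status' stands in for each device's interrupt-status register. */
struct device {
    unsigned short status;   /* 0: nothing pending; 0xFFFF: unplugged */
    int handled;             /* number of events this handler serviced */
};

/* A sharing-friendly handler: read status, bail out at once if this
 * device did not raise the interrupt, otherwise service and ack it. */
static void dev_isr(struct device *dev)
{
    unsigned short status = dev->status;

    if (status == 0 || status == 0xFFFF)
        return;              /* not ours (or hardware gone): exit ASAP */
    dev->handled++;
    dev->status = 0;         /* acknowledge: clear the pending condition */
}

/* The dispatcher calls every handler registered on the line, in order,
 * regardless of which device actually asserted it. */
static void dispatch_irq(struct device **handlers, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        dev_isr(handlers[i]);
}
```

Only the device whose status is non-zero does any work; the other handler returns immediately, which is why sharing costs little when handlers are written this way.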
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: yes. Every RAID1-bh has a bounded lifetime (bounded by worst-case IO latencies). Very good! Many thanks Ingo. this was actually coded/fixed by Neil Brown - so the kudos go to him! Ingo
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 03:10:51PM +0200, Ingo Molnar wrote: yep. But I don't understand why this makes any difference - the waitqueue It makes a difference because your sleeping reads won't get the wakeup even though they could queue their reserved read request (they have to wait for the FIFO to roll or for some write to complete). wakeup is FIFO, so any other request will eventually arrive. Could you explain this bug a bit better? Well, it may not explain an infinite hang because, as you say, the write that got the spurious wakeup will unplug the queue and after some time the reads will be woken up. So maybe that wasn't the reason for your hangs, because I remember your problem looked more like an infinite hang that was only solved by kflushd writing some more stuff and unplugging the queue as a side effect (however I'm not sure, since I never experienced those myself). But I hope that if it wasn't that one, it's the fix below that will help:

Index: mm/filemap.c
===================================================================
RCS file: /home/andrea/cvs/linux/mm/filemap.c,v
retrieving revision 1.1.1.5.2.3
retrieving revision 1.1.1.5.2.4
diff -u -r1.1.1.5.2.3 -r1.1.1.5.2.4
--- mm/filemap.c	2000/09/21 03:11:53	1.1.1.5.2.3
+++ mm/filemap.c	2000/09/25 03:33:31	1.1.1.5.2.4
@@ -622,8 +622,8 @@
 	add_wait_queue(&page->wait, &wait);
 	do {
-		sync_page(page);
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		sync_page(page);
 		if (!PageLocked(page))
 			break;
 		schedule();
Index: fs/buffer.c
===================================================================
RCS file: /home/andrea/cvs/linux/fs/buffer.c,v
retrieving revision 1.1.1.5.2.1
retrieving revision 1.1.1.5.2.2
diff -u -r1.1.1.5.2.1 -r1.1.1.5.2.2
--- fs/buffer.c	2000/09/06 19:57:51	1.1.1.5.2.1
+++ fs/buffer.c	2000/09/25 03:33:30	1.1.1.5.2.2
@@ -147,8 +147,8 @@
 	atomic_inc(&bh->b_count);
 	add_wait_queue(&bh->b_wait, &wait);
 	do {
-		run_task_queue(&tq_disk);
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		run_task_queue(&tq_disk);
 		if (!buffer_locked(bh))
 			break;
 		schedule();

Think about what happens if the buffer gets locked again between set_task_state(tsk, TASK_UNINTERRUPTIBLE) and if (!buffer_locked(bh)).
The window is very small, but it looks like a genuine window for a deadlock (and this one could surely explain infinite hangs in read... even if it looks even less realistic than the EXCLUSIVE task thing). Andrea
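The rule behind both hunks is that a task marks itself TASK_UNINTERRUPTIBLE before the point where a wakeup can race with it. The sketch below is a generic single-CPU model of that window with made-up types; it shows the classic form of the race (a wakeup arriving after the lock test but before the state change) rather than the exact tq_disk unplug scenario in the patch.

```c
#include <assert.h>

/* Toy single-CPU model of the sleep/wakeup race, not the real scheduler:
 * wake_up() only affects a task already marked UNINTERRUPTIBLE. */
enum state { RUNNING, UNINTERRUPTIBLE };

struct task { enum state state; };

struct sim {
    struct task t;
    int locked;              /* the buffer the task is waiting on */
};

static void wake_up(struct task *t)
{
    if (t->state == UNINTERRUPTIBLE)
        t->state = RUNNING;
}

/* I/O completion: unlock the buffer and wake its waiter. */
static void io_complete(struct sim *s)
{
    s->locked = 0;
    wake_up(&s->t);
}

/* One pass of the wait loop with the completion arriving right after the
 * "is it still locked?" test. Returns 1 if schedule() would block with
 * nobody left to wake the task (a hang), 0 otherwise. */
static int wait_pass(int set_state_first)
{
    struct sim s = { { RUNNING }, 1 };

    assert(set_state_first == 0 || set_state_first == 1);
    if (set_state_first)
        s.t.state = UNINTERRUPTIBLE;   /* set_task_state() */
    if (!s.locked)
        return 0;                      /* break out of the loop */
    io_complete(&s);                   /* the racing wakeup */
    if (!set_state_first)
        s.t.state = UNINTERRUPTIBLE;   /* too late: the wakeup is gone */
    /* schedule(): a task still marked sleeping does not run again */
    return s.t.state == UNINTERRUPTIBLE;
}
```

wait_pass(0) reports a hang, because the wakeup hits a task that is still TASK_RUNNING and is therefore a no-op; wait_pass(1) does not.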
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
-		sync_page(page);
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		sync_page(page);
-		run_task_queue(&tq_disk);
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		run_task_queue(&tq_disk);
these look like genuine fixes, but I don't think they can explain the hangs I had yesterday - those were simple VM deadlocks. I don't see any deadlocks today - but I'm running the unsafe B2 variant of the vmfixes patch (and I have no swapping enabled, which simplifies my VM setup). But one of these two fixes could explain the slowdown I saw on and off for quite some time, seeing very bad read performance occasionally. (do you remember my sched.c tq_disk hack?) Ingo
Re: [patch] vmfixes-2.4.0-test9-B2
Hi, On Mon, Sep 25, 2000 at 04:02:30AM +0200, Andrea Arcangeli wrote: On Sun, Sep 24, 2000 at 09:27:39PM -0400, Alexander Viro wrote: So help testing the patches to them. Arrgh... I think I'd better fix the bugs that I know about before testing patches that try to remove the superblock_lock at this stage. Right. If we're introducing new deadlock possibilities, then sure, we can fix the obvious cases in ext2, but it will be next to impossible to do a thorough audit of all of the other filesystems. Adding the new shrink_icache loop into the VFS just feels too dangerous right now. Of course, that doesn't mean we shouldn't remove the excessive superblock locking from ext2 --- rather, it is simply more robust to keep the two issues separate. --Stephen
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Again: the bean counting and all the limits happen at the higher layer. I shouldn't need to know anything about it when I play with the lower-layer GFP memory balancing code. exactly, and this is why if a higher level lets through a GFP_KERNEL, then it *must* succeed. Otherwise either the higher-level code is buggy or the VM balance is buggy, but we want to have clear signs of it. Ingo
Re: Interrupt sharing
On Mon, 25 Sep 2000, Mahadev K Cholachagudda wrote: Hello to all, I have a question, as below. Suppose two drivers, driver1 and driver2, install an ISR for the same interrupt, say UART0. After some time the interrupt is generated. At that moment, which driver's ISR is going to execute? If driver1's ISR gets executed, will driver2's ISR also execute? If driver2's ISR is going to execute, is the data that the interrupt generated going to be passed on to driver2's ISR? please help, any help is welcome, with regards, Mahadev Interrupt sharing works only with level-triggered interrupts. Your choice of a UART as an example is unfortunate because the IRQs that they use (IRQ3 and IRQ4) are not normally configured for level triggering. That said, if you have a device that shares interrupts, the driver does not know and does not care that it is sharing an interrupt. It does not care if, and only if, the driver's ISR is written properly. A properly written ISR does not muck with the interrupt controller. It reads the status registers of the device(s) that it is supposed to handle, does whatever is necessary to satisfy the device, then gets the hell out as quickly as possible. Under Linux, getting the hell out is a simple 'return' from the void ISR procedure. When your driver returns to the kernel code that called it, the kernel code determines if the specific interrupt level is still pending. If it is, it calls the next ISR that uses the same interrupt level. This means that every ISR that uses the same interrupt level (IRQ) may get called when there is nothing to do. This is why a properly written ISR will check its device status and, if there is nothing to do, it will not complain; it will just return. As you can see, shared interrupts have a little more overhead than non-shared ones; however, nothing is ever 'lost'.
An interrupt that occurs during the execution of an interrupt handler is 'remembered' by the controller because the IRQ line will be set true and remain true until the device requesting it is finally satisfied. Cheers, Dick Johnson Penguin : Linux version 2.2.15 on an i686 machine (797.90 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.
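Dick's point that nothing is lost on a level-triggered line can be modelled as a loop that keeps running the handler chain while any device still asserts the line. The sketch is hypothetical (two devices, invented struct fields) and is not how the 2.2/2.4 IRQ code is structured; it injects a new event on device 0 while device 1's handler runs.

```c
#include <assert.h>

#define NDEV 2

/* Hypothetical model of a level-triggered, shared IRQ line: the line is
 * asserted for as long as any device still has a pending condition. */
struct line {
    int pending[NDEV];       /* unserviced events per device */
    int serviced[NDEV];      /* events handled so far */
    int inject_once;         /* device that fires during an ISR, or -1 */
};

static int line_asserted(const struct line *l)
{
    for (int i = 0; i < NDEV; i++)
        if (l->pending[i])
            return 1;
    return 0;
}

/* Each ISR checks its own device and returns immediately if idle. */
static void isr(struct line *l, int dev)
{
    if (!l->pending[dev])
        return;
    l->serviced[dev] += l->pending[dev];
    l->pending[dev] = 0;
    if (l->inject_once >= 0) {       /* another device fires meanwhile */
        l->pending[l->inject_once]++;
        l->inject_once = -1;
    }
}

/* Keep running the handler chain while the line stays asserted.
 * Returns the number of passes over the chain. */
static int handle_line(struct line *l)
{
    int passes = 0;

    while (line_asserted(l)) {
        for (int dev = 0; dev < NDEV; dev++)
            isr(l, dev);
        passes++;
        assert(passes < 100);        /* an endless storm would be a bug */
    }
    return passes;
}
```

handle_line() needs two passes in that scenario and both events are serviced, because device 0 keeps the line asserted until its own ISR clears the condition.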
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Jens Axboe wrote: The changes made were never half-done. The recent bug fixes have mainly been to remove cruft from the earlier elevator and to fix a bug where the elevator insert would screw up a bit. So I'd call that fine tuning or adjusting, not fixing half-done stuff. sorry, I did not mean to offend you - unadjusted and unfixed stuff hanging around in the kernel for months is 'half done' to me. the first reports about bad write performance came right after the original elevator patches went in, about 6 months ago. And a new elevator was introduced some months ago to solve this. and these are still not solved in the vanilla kernel, as recent complaints on l-k prove. Ingo
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 03:57:31PM +0200, Ingo Molnar wrote: I had yesterday - those were simple VM deadlocks. I don't see any deadlocks Definitely. They can't explain anything about the VM deadlocks. I was _only_ talking about the blkdev hangs that caused you to unplug the queue at each reschedule in tux and that Eric reported to me for the SG driver (and I very much hope that with EXCLUSIVE gone and the wait_on_* fixed, those hangs will go away, because I don't see anything else wrong at this moment). but one of these two fixes could explain the slowdown I saw on and off for quite some time, seeing very bad read performance occasionally. (do you remember my sched.c tq_disk hack?) Exactly, that's the only thing I was talking about in this subthread. Andrea
Re: [Demo program]: Poor elevator performance in 2.4.0-test9pre6
On Mon, Sep 25 2000, Robert Cohen wrote: With kernel version 2.4.0-test9pre6 the results are as follows. The test machine has 128 Megs of memory. The test accesses 240 Megs of files so that it can't fit in cache. If I run it with 8 files of size 30 Megs:

[robert@test25 src]$ ./elv_test 8 30
files created, 240 megs written at 8.96 megs/sec
finished writing 240 megs written at 1.05 megs per sec
finished reading, 240 megs read at 5.848833 megs/sec

If I do the same with a single file of size 240 Megs:

[robert@test25 src]$ ./elv_test 1 240
files created, 240 megs written at 11.12 megs/sec
finished writing 240 megs written at 11.08 megs per sec
finished reading, 240 megs read at 12.580521 megs/sec

axboe@burns:/opt/software/testing ./elv_test 8 30
files created, 240 megs written at 21.64 megs/sec
finished writing 240 megs written at 21.12 megs per sec

This is my current tree on 2.4.0-test9-pre5. Thanks for the test program; Andrea and I are working on getting a polished patch ready for inclusion that (apparently) also fixes this problem. -- * Jens Axboe [EMAIL PROTECTED] * SuSE Labs
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: I was _only_ talking about the blkdev hangs [...] I guess this was just miscommunication. It never 'hung', it just performed reads at 20k/sec or so (without any writes being done in the background). A 'hang' for me is a deadlock or lockup, not a slowdown. that caused you to unplug the queue at each reschedule in tux and that Eric reported to me for the SG driver (and I very much hope that with EXCLUSIVE gone and the wait_on_* fixed, those hangs will go away, because I don't see anything else wrong at this moment). okay, I'll test this. Ingo
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25 2000, Ingo Molnar wrote: The changes made were never half-done. The recent bug fixes have mainly been to remove cruft from the earlier elevator and to fix a bug where the elevator insert would screw up a bit. So I'd call that fine tuning or adjusting, not fixing half-done stuff. sorry, I did not mean to offend you - unadjusted and unfixed stuff hanging around in the kernel for months is 'half done' to me. No offense taken, I just tried to explain my view. And in light of the bad test2, I'd like the new changes to not have any "issues". So this work has been going on for the last month or so, and I think we are finally getting to agreement on what needs to be done now and how. WIP. And a new elevator was introduced some months ago to solve this. and these are still not solved in the vanilla kernel, as recent complaints on l-k prove. Different problems, though :(. However, I believe they are solved in Andrea's and my current tree. Just needs the final cleaning, more later. -- * Jens Axboe [EMAIL PROTECTED] * SuSE Labs
Re: refill_inactive()
On Sun, 24 Sep 2000, Ingo Molnar wrote: I'm wondering about the following piece of code in refill_inactive():

	if (current->need_resched && (gfp_mask & __GFP_IO)) {
		__set_current_state(TASK_RUNNING);
		schedule();
	}

shouldn't this be __GFP_WAIT? It's true that __GFP_IO implies __GFP_WAIT (because IO cannot be done without potentially scheduling), so the code is not buggy, but the above 'yielding' of the CPU should be done in the GFP_BUFFER case as well (which is __GFP_WAIT but not __GFP_IO). Objections? 1) if __GFP_WAIT isn't set, we cannot run try_to_free_pages at all 2) you are right, we /can/ schedule when __GFP_IO isn't set, this is a mistake ... now I'm getting confused about what __GFP_IO is all about; does anybody know the _exact_ meaning of __GFP_IO? regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/
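The difference between the two tests in refill_inactive() can be shown with a small C sketch. The MY_GFP_* bit values are invented for illustration; only the combinations follow the thread (GFP_BUFFER may sleep but must not start I/O, GFP_ATOMIC may do neither).

```c
#include <assert.h>

/* Illustrative flag bits; the real values live in the 2.4 headers and may
 * differ. The combinations follow the thread: GFP_BUFFER may sleep but
 * must not start I/O, GFP_ATOMIC may do neither. */
#define MY_GFP_WAIT 0x01u
#define MY_GFP_IO   0x02u

#define MY_GFP_ATOMIC 0u
#define MY_GFP_BUFFER (MY_GFP_WAIT)
#define MY_GFP_KERNEL (MY_GFP_WAIT | MY_GFP_IO)

/* Current test in refill_inactive(): yield only if I/O is allowed. */
static int may_yield_io(int need_resched, unsigned gfp_mask)
{
    return need_resched && (gfp_mask & MY_GFP_IO);
}

/* Proposed test: any caller allowed to sleep may also yield the CPU. */
static int may_yield_wait(int need_resched, unsigned gfp_mask)
{
    return need_resched && (gfp_mask & MY_GFP_WAIT);
}
```

For a GFP_BUFFER-style mask the current test never yields even when need_resched is set, while the proposed __GFP_WAIT test does; for GFP_KERNEL both agree.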
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 03:49:52PM +0200, Jens Axboe wrote: And a new elevator was introduced some months ago to solve this. And now that I've done some benchmarks, it seems the major optimization comes from the implementation of the new _ordering_ algorithm in test2, not really from the removal of the more fine-grained latency control (that said, I'm not going to reintroduce the previous latency control; the current one doesn't provide great latency but it's ok). As soon as I patch my tree with Peter's perfect CSCAN ordering (which only changes the ordering algorithm), tiotest performance drops significantly in the 2-thread-reading case. elvtune settings don't matter; it's only a matter of the ordering. Andrea
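For readers unfamiliar with the term, C-SCAN (circular SCAN) serves requests in increasing sector order from the current head position and then wraps to the lowest pending sector. A minimal batch version in C; it is a textbook sketch, not Peter's patch, and not how the elevator inserts requests one at a time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* C-SCAN ordering of a batch of request sectors: first everything at or
 * beyond the head position in ascending order, then wrap around to the
 * lowest sector and continue upward. Returns 0 on success, -1 if the
 * temporary buffer cannot be allocated. */
static int cscan_order(int head, const int *sectors, int n, int *out)
{
    int *sorted, split = 0, k = 0;

    if (n <= 0)
        return 0;
    sorted = malloc((size_t)n * sizeof(*sorted));
    if (!sorted)
        return -1;
    memcpy(sorted, sectors, (size_t)n * sizeof(*sorted));
    qsort(sorted, (size_t)n, sizeof(*sorted), cmp_int);

    while (split < n && sorted[split] < head)
        split++;
    for (int i = split; i < n; i++)     /* upward sweep from the head */
        out[k++] = sorted[i];
    for (int i = 0; i < split; i++)     /* wrap around to the start */
        out[k++] = sorted[i];
    assert(k == n);
    free(sorted);
    return 0;
}
```

With the head at sector 50 and requests at 95, 10, 180, 40 and 120, the order is 95, 120, 180, 10, 40. Reads issued by two threads at different disk positions can end up waiting for a full sweep, which is one plausible reason the ordering alone changes throughput.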
Bonding again..
thanx for the tip with 2.2.17, it really solved my problem. but now I'm getting "SIOCSIFSLAVE: invalid argument" errors when trying to ifenslave devices. I know that this may be the wrong place for a discussion on bonding, but I can hardly find any help on this. Because it's quite urgent for me, any clue would help... thanx phibo
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25 2000, Andrea Arcangeli wrote: I had yesterday - those were simple VM deadlocks. I don't see any deadlocks Definitely. They can't explain anything about the VM deadlocks. I was _only_ talking about the blkdev hangs that caused you to unplug the queue at each reschedule in tux and that Eric reported to me for the SG driver (and I very much hope that with EXCLUSIVE gone and the wait_on_* fixed, those hangs will go away, because I don't see anything else wrong at this moment). The sg problem was different. When sg queues a request, it invokes the request_fn to handle it. But if the queue is currently plugged, the scsi_request_fn will not do anything. -- * Jens Axboe [EMAIL PROTECTED] * SuSE Labs
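A toy C model of the plugging behaviour Jens describes, with invented struct fields: while the queue is plugged the strategy routine does nothing, so calling it directly (as sg did) leaves the request sitting in the queue, whereas unplugging first lets it through. unplug_device() here is a hypothetical stand-in for generic_unplug_device(), not its real implementation.

```c
#include <assert.h>

/* Toy request queue illustrating plugging; the fields are invented and
 * this is not the block layer's real data structure. */
struct queue {
    int plugged;      /* 1 while requests are being collected for merging */
    int pending;      /* requests sitting in the queue */
    int dispatched;   /* requests handed to the driver */
};

/* Driver strategy routine: does nothing while the queue is plugged. */
static void request_fn(struct queue *q)
{
    assert(q->pending >= 0);
    if (q->plugged)
        return;
    q->dispatched += q->pending;
    q->pending = 0;
}

/* Hypothetical stand-in for generic_unplug_device(): unplug first, then
 * run the strategy routine so the queued work actually starts. */
static void unplug_device(struct queue *q)
{
    q->plugged = 0;
    request_fn(q);
}
```

Calling request_fn() on a plugged queue leaves the request pending until something else unplugs it, which matches the sg symptom; unplug_device() dispatches it at once.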
Re: the new VM
On Mon, Sep 25, 2000 at 04:04:14PM +0200, Ingo Molnar wrote: exactly, and this is why if a higher level lets through a GFP_KERNEL, then it *must* succeed. Otherwise either the higher-level code is buggy or the VM balance is buggy, but we want to have clear signs of it. I'm not sure if we should restrict the limiting only to the cases that need it. For example, do_anonymous_page looks like a place that could rely on the GFP return value. Andrea
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25 2000, Andrea Arcangeli wrote: And a new elevator was introduced some months ago to solve this. And now that I've done some benchmarks, it seems the major optimization comes from the implementation of the new _ordering_ algorithm in test2, not really from the removal of the more fine-grained latency control (that said, I'm not going to reintroduce the previous latency control; the current one doesn't provide great latency but it's ok). Yes, I found this the greatest improvement too. As soon as I patch my tree with Peter's perfect CSCAN ordering (which only changes the ordering algorithm), tiotest performance drops significantly in the 2-thread-reading case. elvtune settings don't matter; it's only a matter of the ordering. Interesting. I haven't done any serious benching with the CSCAN introduction in elevator_linus; I'll try that too. -- * Jens Axboe [EMAIL PROTECTED] * SuSE Labs
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: I'm not sure if we should restrict the limiting only to the cases that need it. For example, do_anonymous_page looks like a place that could rely on the GFP return value. I think an application should not fail due to other applications allocating too much RAM. OOM behavior should be a central thing and based on allocation patterns, not on pure luck or bad luck. I always found it rude to SIGBUS when some other application is abusing RAM but the OOM detector has not yet killed it off. Ingo
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 04:08:38PM +0200, Jens Axboe wrote: The sg problem was different. When sg queues a request, it invokes the request_fn to handle it. But if the queue is currently plugged, the scsi_request_fn will not do anything. That would explain it, yes. In the same way, for correctness, these should also be converted from request_fn to generic_unplug_device, right? (this will also avoid calling request_fn spuriously, because the device is still in the tq_disk queue even after the I/O generated by the request_fn below has completed)

	if (major >= COMPAQ_SMART2_MAJOR+0 && major <= COMPAQ_SMART2_MAJOR+7)
		(q->request_fn)(q);
	if (major >= DAC960_MAJOR+0 && major <= DAC960_MAJOR+7)
		(q->request_fn)(q);

Andrea
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: driver (and I very much hope that with EXCLUSIVE gone and the wait_on_* fixed, those hangs will go away, because I don't see anything else wrong at this moment). the EXCLUSIVE thing only optimizes the wakeup, it's not semantic! How is it better to let 100 processes race for one freed-up request slot? There is no guarantee at all that the reader will win. If reads and writes racing for request slots ever becomes a problem, then we should introduce separate read and write waitqueues. the EXCLUSIVE thing was noticed by Dimitris, I think, and it makes tons of (performance) sense. Ingo
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25 2000, Andrea Arcangeli wrote: The sg problem was different. When sg queues a request, it invokes the request_fn to handle it. But if the queue is currently plugged, the scsi_request_fn will not do anything. That would explain it, yes. In the same way, for correctness, these should also be converted from request_fn to generic_unplug_device, right? (this Yes, that would be the right fix. However, then we also need some way of inserting requests in the queue and letting it plug when appropriate. The scsi layer currently "manually" does a list_add on the queue itself, which doesn't look too healthy. will also avoid calling request_fn spuriously, because the device is still in the tq_disk queue even after the I/O generated by the request_fn below has completed)

	if (major >= COMPAQ_SMART2_MAJOR+0 && major <= COMPAQ_SMART2_MAJOR+7)
		(q->request_fn)(q);
	if (major >= DAC960_MAJOR+0 && major <= DAC960_MAJOR+7)
		(q->request_fn)(q);

AFAIR, Eric tried to talk to the Compaq folks (and Leonard too, I dunno) about why they want this. What came of it, I don't know. -- * Jens Axboe [EMAIL PROTECTED] * SuSE Labs
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 04:11:34PM +0200, Jens Axboe wrote: Interesting. I haven't done any serious benching with the CSCAN introduction in elevator_linus; I'll try that too. Changing only that, performance decreased reproducibly from 16 to 14 mbyte/sec in the read test with 2 threads. So far I'm testing only IDE with LVM striping on two equally fast disks on separate IDE channels. Andrea
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: snip I talked with Alexey about this and it seems the best way is to have a per-socket reservation of clean cache as a function of the receive window. So we don't need a huge atomic pool, but we can have a special LRU with an IRQ spinlock that is able to shrink cache from IRQ context as well. In the current 2.4 VM code, there is a kernel thread called "kreclaimd". This thread keeps freeing pages from the inactive clean list when needed (when zone->free_pages < zone->pages_low), making them available for atomic allocations. Do you consider pages_low pages a "huge atomic pool"?
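A sketch of the watermark loop described for kreclaimd, assuming a simplified zone with only three counters (the real zone structure has many more fields, and the real thread also sleeps and is woken): while free_pages is below pages_low, clean inactive pages are moved to the free list so that atomic allocations can use them.

```c
#include <assert.h>

/* Hypothetical, heavily simplified zone counters. */
struct zone {
    int free_pages;
    int pages_low;
    int inactive_clean;   /* clean pages that can be freed without I/O */
};

/* One kreclaimd-style pass: while the zone is below its low watermark,
 * move clean inactive pages to the free list so that atomic
 * (non-sleeping) allocations can find them. Returns pages freed. */
static int reclaim_pass(struct zone *z)
{
    int freed = 0;

    while (z->free_pages < z->pages_low && z->inactive_clean > 0) {
        z->inactive_clean--;
        z->free_pages++;
        freed++;
    }
    assert(z->free_pages >= 0 && z->inactive_clean >= 0);
    return freed;
}
```

On a zone with 10 free pages, pages_low of 32 and 100 clean pages, reclaim_pass() frees 22 pages; with only 5 clean pages it stops after 5. The reserve it maintains is therefore pages_low pages per zone, which is the size the question above is about.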
Re: refill_inactive()
On Mon, 25 Sep 2000, Rik van Riel wrote: 2) you are right, we /can/ schedule when __GFP_IO isn't set, this is a mistake ... now I'm getting confused about what __GFP_IO is all about; does anybody know the _exact_ meaning of __GFP_IO? __GFP_IO set to 1 means that the page allocator may implicitly start IO on the caller's behalf. Most allocations don't care at all whether swap IO is started as part of gfp() or not. But a prominent counter-example is GFP_BUFFER, which is used by the buffer-cache/fs layer, and which cannot do any IO implicitly (because it *is* the IO layer already, and it is already trying to do IO). The other reason is legacy low-level filesystem locks like the ext2fs lock, which cannot be taken recursively. Ingo
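Ingo's description of __GFP_IO can be illustrated with a simplified reclaim routine, assuming a made-up MY_GFP_IO bit and a pool split into clean and dirty pages: clean pages can always be dropped, but dirty pages are only written back when the caller allows I/O, so a GFP_BUFFER-style caller never re-enters the I/O layer.

```c
#include <assert.h>

/* Illustrative flag bit; the real value is defined in the kernel headers. */
#define MY_GFP_IO 0x02u

struct pool {
    int clean;   /* pages that can be dropped immediately */
    int dirty;   /* pages that must be written out before reuse */
};

/* Try to free 'want' pages. Clean pages are always fair game; dirty pages
 * are only written back when the caller's mask allows I/O, so a caller
 * that is itself the I/O path never recurses into it. */
static int reclaim(struct pool *p, int want, unsigned gfp_mask)
{
    int freed = 0;

    while (freed < want && p->clean > 0) {
        p->clean--;
        freed++;
    }
    if (gfp_mask & MY_GFP_IO) {
        while (freed < want && p->dirty > 0) {
            p->dirty--;          /* stands in for "write out, then free" */
            freed++;
        }
    }
    assert(freed <= want);
    return freed;
}
```

Asked for 4 pages from a pool of 2 clean and 5 dirty pages, reclaim() frees 4 with MY_GFP_IO set but only 2 without it.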
Re: the new VM
On Mon, Sep 25, 2000 at 04:27:24PM +0200, Ingo Molnar wrote: I think an application should not fail due to other applications allocating too much RAM. OOM behavior should be a central thing and based At least Linus's point is that doing perfect accounting (at least on the userspace allocation side) may cause you to waste resources, failing even when you could still run, and I tend to agree with him. We're lazy on that side and that's a global win in most cases. We are fine-grained with page granularity, not with mmap granularity. The point is that not all the mmapped regions are going to be paged in. Think of a program that only after 1 hour has finished the calculations that actually use all the memory it requested with malloc. Before the hour passes, the unused memory can still be used for other things, and that's also what the user expects when he runs `free`. Andrea
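The trade-off Andrea describes, charging memory per mapping versus per faulted-in page, reduces to simple arithmetic. The sketch uses invented page counts and hypothetical helper names; it is not the kernel's overcommit code.

```c
#include <assert.h>

/* Hypothetical numbers in pages: a mapping reserves 'mapped' pages of
 * address space, but only 'touched' of them have been faulted in. */
struct mapping {
    long mapped;
    long touched;
};

/* Strict accounting: charge every mapped page up front. */
static long strict_charge(const struct mapping *m, int n)
{
    long sum = 0;

    for (int i = 0; i < n; i++)
        sum += m[i].mapped;
    return sum;
}

/* Lazy accounting: only pages actually faulted in consume RAM. */
static long lazy_charge(const struct mapping *m, int n)
{
    long sum = 0;

    for (int i = 0; i < n; i++)
        sum += m[i].touched;
    return sum;
}

/* Would a new mapping of 'size' pages be admitted under a given charge? */
static int fits(long charged, long size, long ram)
{
    assert(size >= 0);
    return charged + size <= ram;
}
```

With 1500 pages mapped but only 150 touched, a further 600-page mapping is refused under strict charging against 2000 pages of RAM but admitted under lazy charging. The same arithmetic shows Ingo's concern: if every mapped page is later touched, the lazy scheme discovers the shortage at fault time instead.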
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 04:29:42PM +0200, Ingo Molnar wrote: There is no guarantee at all that the reader will win. If reads and writes racing for request slots ever becomes a problem then we should introduce a separate read and write waitqueue. I agree. However, here I also have an in-flight per-queue limit on locked buffers (otherwise, with 512k-sized requests on SCSI, I could end up with 128 Mbytes of RAM locked within seconds, and I don't want to decrease the size of the queue because it has to be large for aggressive reordering when the requests are 4k each). This in-flight per-queue limit is actually a non-exclusive wakeup and it triggers more often than the request shortage (because most of the time writes are consecutive), so having two waitqueues with reads that register themselves in both shouldn't be a very significant improvement at the moment (I should first care about a wake-one in-flight-limit-per-queue wakeup :). the EXCLUSIVE thing was noticed by Dimitris i think, and it makes tons of Actually I'm the one who introduced the EXCLUSIVE thing there, and I audited _all_ the device drivers to check they do 1 wakeup for each 1 request they release before sending it off to Linus. But I never thought (until some days ago) about the fact that if a read completes a reserved request, the write won't be able to accept it. So long term we'll do two wake-one queues with reads registered in both. Andrea
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 04:18:54PM +0200, Jens Axboe wrote: The scsi layer currently "manually" does a list_add on the queue itself, which doesn't look too healthy. It's grabbing the io_request_lock so it looks healthy for now :) Andrea
Re: the new VM
On Mon, Sep 25, 2000 at 11:26:48AM -0300, Marcelo Tosatti wrote: This thread keeps freeing pages from the inactive clean list when needed (when zone->free_pages < zone->pages_low), making them available for atomic allocations. This is flawed. It's the irq that has to shrink the memory itself. It certainly can't reschedule kreclaimd and wait for it to do the work. Increasing the free_pages_min limit is the _only_ alternative to having irqs that are able to shrink clean cache (and hopefully that "feature" will be resurrected soon since it's the only way to go right now). Andrea
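A sketch of the two options Andrea names for an allocation made from interrupt context, which cannot sleep and so cannot wait for kreclaimd: take a page from the free pool (whose size is what raising `free_pages_min` would grow), or reclaim an inactive clean page on the spot. The struct and function names are hypothetical, and a real version would also need the irq-safe locking on the clean list discussed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone view for an irq-context allocation. */
struct toy_irq_zone {
    unsigned long free_pages;           /* pool kept around by free_pages_min */
    unsigned long inactive_clean_pages; /* clean cache, freeable without I/O */
};

/* Allocate one page without sleeping; returns true on success.
 * can_shrink_clean models a clean-cache list that irqs are allowed to touch. */
static bool toy_alloc_atomic(struct toy_irq_zone *z, bool can_shrink_clean)
{
    if (z->free_pages > 0) {
        z->free_pages--;                /* option 1: dip into the free pool */
        return true;
    }
    if (can_shrink_clean && z->inactive_clean_pages > 0) {
        z->inactive_clean_pages--;      /* option 2: reclaim a clean page in place */
        return true;                    /* and hand it straight to the caller */
    }
    return false;                       /* cannot sleep, so the allocation fails */
}
```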
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: On Mon, Sep 25, 2000 at 03:02:58PM +0200, Ingo Molnar wrote: On Mon, 25 Sep 2000, Andrea Arcangeli wrote: Sorry, I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed, that is a showstopper bug. [...] why? Because as you said the machine can lock up when you run out of memory. The fix for this is to kill a user process when you're OOM (you need to do this anyway). The last few allocations of the "condemned" process can come from the reserved pages and the process we killed will exit just fine. regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/
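Rik's proposal can be sketched as follows, assuming a hypothetical per-task flag that marks the process chosen by the OOM killer: ordinary requests stop at a reserve, and only the condemned task may dip into it so it can finish exiting. All names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task: only the flag this sketch needs. */
struct toy_task {
    bool oom_killed;              /* set when the OOM killer selects this task */
};

/* Hypothetical memory pool with a reserve for the dying task. */
struct toy_mem {
    unsigned long free_pages;
    unsigned long reserved_pages; /* kept back for the condemned process */
};

/* Returns true if a page was handed out to task t. */
static bool toy_alloc_page(struct toy_mem *m, const struct toy_task *t)
{
    if (m->free_pages > m->reserved_pages) {
        m->free_pages--;          /* normal allocation above the reserve */
        return true;
    }
    if (t->oom_killed && m->free_pages > 0) {
        m->free_pages--;          /* "last few allocations" come from the reserve */
        return true;
    }
    return false;                 /* OOM for this caller: time to pick a victim */
}
```

With 4 free pages and a reserve of 3, an ordinary task gets one page and is then refused, while the killed task can still take the remaining three.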
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: the EXCLUSIVE thing was noticed by Dimitris i think, and it makes tons of Actually I'm the one who introduced the EXCLUSIVE thing there and I audited sorry - i said it was *noticed* by Dimitris. (and sent to l-k IIRC) Ingo
Re: (Fwd) CD-ROM (SCSI and IDE) not mounting disk
On Sat, Sep 23, 2000 at 09:01:04PM -0500, [EMAIL PROTECTED] wrote: Another interesting thing that I just noticed, I can still play music CDs in either drive. I am currently seeing the same behaviour. My machine has been up for 42 days now. Kernel 2.2.16-3 (RH 6.2). I am quite sure I could play CDs a few weeks ago. But now, when I launch cdplay or xplaycd, no CD is detected:

/home/danis/DISCOGRAPHIE/JethroTull/Stormwatch/mp3> cdplay
/dev/cdrom: Mauvais type de medium (wrong medium type)
/home/danis/DISCOGRAPHIE/JethroTull/Stormwatch/mp3> dmesg
...
VFS: Disk change detected on device ide1(22,0)
cdrom: pid 15218 must open device O_NONBLOCK!
cdrom: open failed.
...

See you,
-- 
Thierry Danis   Ext.: 57 96   [EMAIL PROTECTED]
# rm *;o
o: command not found
Re: the new VM
Because as you said the machine can lock up when you run out of memory. well, i think all kernel-space allocations have to be limited carefully, denying succeeding allocations is not a solution against over-allocation, especially in a multi-user environment. GFP_KERNEL has to be able to fail for 2.4. Otherwise you can get everything jammed in kernel space waiting on GFP_KERNEL and if the swapper cannot make space you die. The alternative approach where it cannot fail has to be at higher levels, so you can release other resources that might need freeing for deadlock avoidance before you retry. Alan
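Alan's alternative, where the "cannot fail" behaviour lives above the allocator, might look like this: the low-level allocation is allowed to fail, and the caller gives back memory it holds (here a hypothetical private cache) before retrying, rather than sleeping in the allocator while pinning memory someone else needs. All names are assumptions for illustration; a real caller would typically also wait for I/O or for other tasks before retrying.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool of free pages. */
struct toy_pool {
    unsigned long free_pages;
};

/* Low-level allocation that fails instead of blocking forever. */
static bool toy_try_alloc(struct toy_pool *p, unsigned long n)
{
    if (p->free_pages < n)
        return false;
    p->free_pages -= n;
    return true;
}

/* Higher-level "must succeed" path: on failure, release what this caller
 * holds (*cached_pages) and retry; report failure once nothing is left. */
static bool toy_alloc_with_backoff(struct toy_pool *p, unsigned long n,
                                   unsigned long *cached_pages)
{
    while (!toy_try_alloc(p, n)) {
        if (*cached_pages == 0)
            return false;                 /* nothing left to give back */
        p->free_pages += *cached_pages;   /* drop our cache into the pool */
        *cached_pages = 0;
    }
    return true;
}
```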
Re: the new VM
On Mon, Sep 25, 2000 at 04:43:44PM +0200, Ingo Molnar wrote: i talked about GFP_KERNEL, not GFP_USER. Even in the case of GFP_USER i My bad, you're right, I was talking about GFP_USER indeed. But even for GFP_KERNEL allocations, like the init of a module or any other thing that is statically sized during production, just checking the return value looks to be OK. believe the right place to oom is via a signal, not in the gfp() case. Signals can be trapped and ignored by a malicious task. We had that security problem until 2.2.14 IIRC. (because oom situation in the gfp() case is a completely random and statistical event, which might have no connection at all to the behavior of that given process.) I agree we should have more information about the behaviour of the system, and I think a per-task page fault rate should work in practice. But my question isn't what you do when you're OOM, but _how_ do you notice that you're OOM? In the GFP_USER case, simply checking when GFP fails looks right to me. Andrea
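Andrea's per-task page fault rate idea could be sketched like this, assuming hypothetical per-task fault counters sampled over a fixed window: the task faulting fastest becomes the OOM candidate, instead of whichever task happened to hit a failed allocation. This only illustrates the heuristic; it is not code from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task fault statistics sampled over a window. */
struct toy_task_stats {
    int pid;
    unsigned long faults_at_start;  /* page-fault counter at window start */
    unsigned long faults_now;       /* page-fault counter now */
};

/* Faults per second over a window of window_secs seconds. */
static unsigned long toy_fault_rate(const struct toy_task_stats *t,
                                    unsigned long window_secs)
{
    assert(window_secs > 0);
    return (t->faults_now - t->faults_at_start) / window_secs;
}

/* Return the pid with the highest fault rate, or -1 if n == 0. */
static int toy_pick_oom_candidate(const struct toy_task_stats *tasks, size_t n,
                                  unsigned long window_secs)
{
    int best_pid = -1;
    unsigned long best_rate = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long rate = toy_fault_rate(&tasks[i], window_secs);
        if (best_pid == -1 || rate > best_rate) {
            best_pid = tasks[i].pid;
            best_rate = rate;
        }
    }
    return best_pid;
}
```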
Re: [patch] vmfixes-2.4.0-test9-B2
On Mon, Sep 25, 2000 at 04:53:05PM +0200, Ingo Molnar wrote: sorry - i said it was *noticed* by Dimitris. (and sent to l-k IIRC) I didn't know. Andrea
Swap on RAID; was: Re: the new VM
Ingo Molnar wrote: this is fixed in 2.4. The 2.2 RAID code is frozen, and has known limitations (ie. due to the above RAID1 cannot be used as a swap-device). Eh, just to be clear about this: does this apply to the RAID 0.90 code as commonly patched in by RedHat? Should I instead use a swap file for a machine that should be fault-tolerant against a drive failure? regards, David -- David L. Parsley Network Administrator Roanoke College
Re: Swap on RAID; was: Re: the new VM
On Mon, 25 Sep 2000 [EMAIL PROTECTED] wrote: this is fixed in 2.4. The 2.2 RAID code is frozen, and has known limitations (ie. due to the above RAID1 cannot be used as a swap-device). as commonly patched in by RedHat? Should I instead use a swap file for a machine that should be fault-tolerant against a drive failure? the answer is yes. RAID5 will not deadlock due to VM problems, but RAID5 might have other problems if the device is being reconstructed *and* used for swap. Ingo
[patch] 2.4.0-test9-pre6: Alpha cross-compilation fixes
Hi, The following patch allows an Alpha kernel to be built with a cross-compiling toolchain, as $(NM) and $(STRIP) do incorporate the $(CROSS_COMPILE) prefix. Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: [EMAIL PROTECTED], PGP key available        +

diff -u --recursive --new-file linux-2.4.0-test9-pre6.macro/arch/alpha/Makefile linux-2.4.0-test9-pre6/arch/alpha/Makefile
--- linux-2.4.0-test9-pre6.macro/arch/alpha/Makefile	Mon Sep 25 17:01:52 2000
+++ linux-2.4.0-test9-pre6/arch/alpha/Makefile	Mon Sep 25 17:07:49 2000
@@ -8,7 +8,7 @@
 # Copyright (C) 1994 by Linus Torvalds
 #
 
-NM := nm -B
+NM := $(NM) -B
 
 LINKFLAGS = -static -T arch/alpha/vmlinux.lds -N #-relax
 CFLAGS := $(CFLAGS) -pipe -mno-fp-regs -ffixed-8
diff -u --recursive --new-file linux-2.4.0-test9-pre6.macro/arch/alpha/boot/Makefile linux-2.4.0-test9-pre6/arch/alpha/boot/Makefile
--- linux-2.4.0-test9-pre6.macro/arch/alpha/boot/Makefile	Wed Jul 19 05:58:27 2000
+++ linux-2.4.0-test9-pre6/arch/alpha/boot/Makefile	Mon Sep 25 17:07:14 2000
@@ -68,7 +68,7 @@
 	$(OBJSTRIP) -v $(VMLINUX) vmlinux.nh
 
 vmlinux: $(TOPDIR)/vmlinux
-	strip -o vmlinux $(VMLINUX)
+	$(STRIP) -o vmlinux $(VMLINUX)
 
 tools/lxboot: $(OBJSTRIP) bootloader
 	$(OBJSTRIP) -p bootloader tools/lxboot
Re: the new VM
GFP_KERNEL has to be able to fail for 2.4. Otherwise you can get everything jammed in kernel space waiting on GFP_KERNEL and if the swapper cannot make space you die. if one can get everything jammed waiting for GFP_KERNEL, and not being able to deallocate anything, that's a VM or resource-limit bug. This situation is just 1% RAM away from the 'root cannot log in' situation. Unless I'm missing something here, think about this case: 2 active processes, no swap

      #1              #2
  kmalloc 32K     kmalloc 16K
  OK              OK
  kmalloc 16K     kmalloc 32K
  block           block

so GFP_KERNEL has to be able to fail - it can wait for I/O in some cases with care, but when we have no pages left something has to give.
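Alan's timeline can be replayed in a small simulation, assuming a hypothetical 48K of free kernel memory (so the two first allocations exhaust it) and two ways of handling a request that does not fit: sleep until memory appears, or fail so the caller can back off and free what it holds. The pool size and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_POOL_K 48   /* hypothetical: total free kernel memory, in KB */

/* Hypothetical process state: what it holds and whether it is asleep. */
struct toy_kproc {
    unsigned held_k;    /* memory this process has allocated */
    bool blocked;       /* sleeping in the allocator */
};

/* Allocate want_k from *free_k. If it does not fit, either fail and release
 * everything held (may_fail) or block waiting for memory. */
static void toy_kmalloc(struct toy_kproc *p, unsigned *free_k, unsigned want_k,
                        bool may_fail)
{
    if (want_k <= *free_k) {
        *free_k -= want_k;
        p->held_k += want_k;
    } else if (may_fail) {
        *free_k += p->held_k;   /* caller backs off: frees its earlier allocation */
        p->held_k = 0;
    } else {
        p->blocked = true;      /* waits for memory nobody will free */
    }
}

/* Replay the interleaving from the mail; returns true if both processes end
 * up blocked (deadlock). *p2_held_k reports what process #2 holds at the end. */
static bool toy_run(bool may_fail, unsigned *p2_held_k)
{
    unsigned free_k = TOY_POOL_K;
    struct toy_kproc p1 = { 0, false }, p2 = { 0, false };

    toy_kmalloc(&p1, &free_k, 32, may_fail);   /* #1 kmalloc 32K: OK */
    toy_kmalloc(&p2, &free_k, 16, may_fail);   /* #2 kmalloc 16K: OK */
    toy_kmalloc(&p1, &free_k, 16, may_fail);   /* #1 kmalloc 16K */
    toy_kmalloc(&p2, &free_k, 32, may_fail);   /* #2 kmalloc 32K */

    *p2_held_k = p2.held_k;
    return p1.blocked && p2.blocked;
}
```

When allocations block, both processes end up asleep with 16K and 32K pinned. When the allocation may fail, #1 gives its 32K back and #2 completes; #1 would then retry its whole operation later.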
Re: the new VM
On Mon, 25 Sep 2000, Andrea Arcangeli wrote: i think the GFP_USER case should do the oom logic within __alloc_pages(), What's the difference in implementing the logic outside alloc_pages? Putting the logic inside doesn't look like a clean design to me. it gives consistency and simplicity. The allocators themselves do not have to care about oom. Ingo
Re: 2.4.0t8: hard reboot with ipchains/ipmasq
sorry: it was with iptables, not ipchains

=
modprobe iptable_nat
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
echo 1 > /proc/sys/net/ipv4/ip_forward
=

g. everything else as in previous post les
2.4.0t8: hard reboot with iptables/ipmasq
[reposted under __corrected__ subject line] My linux box was set up for ipmasq with:

===
modprobe iptable_nat
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
echo 1 > /proc/sys/net/ipv4/ip_forward
===

a windows box had been browsing the net through the linux box several hours earlier (about 4 hours), and then left alone. when i went back to the windows box and tried to browse again from the same IExplorer window, _SNAP_ and the linux machine just plain up and rebooted instantly. i am __guessing__ the problem had something to do with using an old IExplorer session so long after it had last been used??? something about NAT timeouts or something??? but a hard reboot??? apart from this crash, ipmasq had been working fine (just never tested with that kind of delay time). les schaffer

other tidbits:
--
a few hours prior to the crash, i got these from net browsing on a connected windows box:

Sep 24 22:14:34 localhost kernel: NAT: 0 dropping untracked packet c1afb180 1 207.88.240.105 -> 24.191.22.34
[snip]
Sep 25 00:03:12 localhost kernel: NAT: 0 dropping untracked packet c33da540 1 63.211.32.65 -> 24.191.22.

auth.log marks the last moment of consciousness:

Sep 25 01:37:01 localhost PAM_unix[19174]: (cron) session closed for user root

no other significant things written to the log.
Re: the new VM
On Mon, 25 Sep 2000, Alan Cox wrote: Unless I'm missing something here, think about this case: 2 active processes, no swap

      #1              #2
  kmalloc 32K     kmalloc 16K
  OK              OK
  kmalloc 16K     kmalloc 32K
  block           block

so GFP_KERNEL has to be able to fail - it can wait for I/O in some cases with care, but when we have no pages left something has to give. you are right, i agree that synchronous OOM for higher-order allocations must be preserved (just like ATOMIC allocations). But the overwhelming majority of allocations is done at page granularity. with multi-page allocations and the need for physically contiguous buffers, the problem cannot be solved. Ingo
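Ingo's distinction between single-page and multi-page requests comes down to contiguity. A sketch with a hypothetical bitmap of page frames: a request of `1 << order` pages needs that many adjacent free frames, naturally aligned as in a buddy allocator, so it can fail even when plenty of pages are free in total, while an order-0 request succeeds as long as any page is free. The functions are illustrative, not the kernel's allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count free frames (true = free). */
static size_t toy_count_free(const bool *frame_free, size_t nframes)
{
    size_t n = 0;
    for (size_t i = 0; i < nframes; i++)
        n += frame_free[i];
    return n;
}

/* Is there a naturally aligned run of (1 << order) free frames? */
static bool toy_has_free_block(const bool *frame_free, size_t nframes,
                               unsigned order)
{
    size_t block = (size_t)1 << order;

    for (size_t start = 0; start + block <= nframes; start += block) {
        bool all_free = true;
        for (size_t i = start; i < start + block; i++) {
            if (!frame_free[i]) {
                all_free = false;
                break;
            }
        }
        if (all_free)
            return true;
    }
    return false;
}
```

With every other frame in use, half of memory is free, yet no order-1 (two-page) block exists.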
Re: [patch] vmfixes-2.4.0-test9-B2
On Sun, Sep 24, 2000 at 11:39:13PM -0300, Marcelo Tosatti wrote: - Change kmem_cache_shrink to return the number of freed pages. I did that too, extending a patch from Mark. I also removed the first_not_full ugliness, providing a LIFO behaviour for the completely freed slabs (so kmem_cache_reap removes the oldest completely unused slabs from the queue, not the most recently used ones with potentially live cache in the CPU). There was a comment on the shrink functions about making kmem_cache_shrink() work on a GFP_DMA/GFP_HIGHMEM basis, to free only the pages wanted by the current allocation. This is meaningless at the moment because it can't be addressed without classzone logic in the allocator (classzone means that the allocator will pass to the memory balancing code the information about _which_ classzone you have to allocate memory from, so you won't waste time synchronously balancing unrelated zones). My patch is here (it isn't going to apply cleanly due to the test9 changes in do_try_to_free_pages, but porting is trivial). It was tested and it was working for me. ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test7/slab-1 BTW, here is a fix for a longstanding SMP race (since swap_out and msync don't run with the big lock) that can corrupt memory: ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test5/msync-smp-race-1 Here is the fix for another SMP race in establish_pte: ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test5/tlb-flush-smp-race-1 The fix for this last bit is ugly but it's safe, because Manfred said s390 has a flush_tlb_page that atomically flushes and makes the pte invalid (a cleaner fix means moving part of establish_pte into the arch inlines). Andrea
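The LIFO ordering Andrea describes can be sketched with a hypothetical array-backed queue of completely free slabs: a slab that has just become empty goes to the front, and the reaper takes from the back, so the slabs most likely to still be warm in the CPU cache survive longest. The reap function returns the number of pages released, like the changed kmem_cache_shrink. Names, the fixed capacity and pages-per-slab are assumptions, not the real slab allocator.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FREE_SLABS 16

/* Hypothetical cache: only the queue of completely free slabs. */
struct toy_cache {
    int free_slabs[TOY_MAX_FREE_SLABS]; /* slab ids; index 0 = most recently freed */
    size_t nfree;
    unsigned pages_per_slab;
};

/* A slab became completely unused: push it at the front (LIFO). */
static void toy_slab_became_free(struct toy_cache *c, int slab_id)
{
    assert(c->nfree < TOY_MAX_FREE_SLABS);
    for (size_t i = c->nfree; i > 0; i--)
        c->free_slabs[i] = c->free_slabs[i - 1];
    c->free_slabs[0] = slab_id;
    c->nfree++;
}

/* Release up to max_slabs of the oldest free slabs (from the back of the
 * queue) and return the number of pages freed. */
static unsigned toy_cache_reap(struct toy_cache *c, size_t max_slabs)
{
    size_t n = max_slabs < c->nfree ? max_slabs : c->nfree;

    c->nfree -= n;      /* drop tail entries: the least recently freed slabs */
    return (unsigned)n * c->pages_per_slab;
}
```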