Re: [PATCH] FUTEX : new PRIVATE futexes
Nick Piggin wrote:
> Hi Eric,
>
> Thanks for doing this... It's looking good, I just have some minor
> comments:

Hi Nick, thanks for reviewing.

> Eric Dumazet wrote:
> >   */
> > -int get_futex_key(void __user *uaddr, union futex_key *key)
> > +int get_futex_key(void __user *uaddr, union futex_key *key,
> > +	struct rw_semaphore *shared)
>
> Can we pass in something other than the rw_semaphore here? Seeing as it
> only actually gets used as a flag, it might be nicer just to pass a 0 or 1?
> And all through the call stack...
>
> Did the whole thing just turn out neater when you passed the rwsem? We
> always know to use current->mm->mmap_sem, so it doesn't seem like a boolean
> flag would hurt?

That's a good question.

Computing current->mm->mmap_sem once is a win in itself, because accessing
current is not cheap. It also walks part of the pointer chain in advance,
before the result is used. That acts as a free prefetch() equivalent: if
current->mm is not in the CPU cache, the CPU won't stall, because the next
instructions don't depend on it. That's difficult to benchmark, but you can
trust me.

A flag means:

	if (flag)
		up_read(&current->mm->mmap_sem)

This generates quite bad code.

	if (ptr)
		up_read(ptr)

generates *much* better code.

So this is a cleanup and a runtime optimization.

I did a similar optimization in commit
163da958ba5282cbf85e8b3dc08e4f51f8b01c5e

I invite you to check it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=163da958ba5282cbf85e8b3dc08e4f51f8b01c5e

> >  {
> >  	unsigned long address = (unsigned long)uaddr;
> >  	struct mm_struct *mm = current->mm;
> > @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un
> >  	address -= key->both.offset;
> >
> >  	/*
> > +	 * PROCESS_PRIVATE futexes are fast.
> > +	 * As the mm cannot disappear under us and the 'key' only needs
> > +	 * virtual address, we dont even have to find the underlying vma.
> > +	 * Note : We do have to check 'address' is a valid user address,
> > +	 *        but access_ok() should be faster than find_vma()
> > +	 * Note : At this point, address points to the start of page,
> > +	 *        not the real futex address, this is ok.
> > +	 */
> > +	if (!shared) {
> > +		if (!access_ok(VERIFY_WRITE, address, sizeof(int)))
> > +			return -EFAULT;
>
> Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it
> should depend on the size of the operation. Maybe the access_ok check
> should go outside get_futex_key?

If you check again, you'll see that address points to the start of the PAGE,
not the real u32/u64 futex address. This checks the PAGE. We can use char,
short, int, long, or char[PAGE_SIZE], as long as we know a futex cannot span
two pages.

> >  	 */
> >  	key->shared.inode = vma->vm_file->f_path.dentry->d_inode;
> > -	key->both.offset++; /* Bit 0 of offset indicates inode-based key. */
> > +	key->both.offset += FUT_OFF_INODE; /* inode-based key. */
> >  	if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
> >  		key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
> >  				     + vma->vm_pgoff);
>
> I like |= for adding flags, it seems less ambiguous. But I guess that's a
> matter of opinion. Hugh seems to like +=, and I can't argue with him about
> style issues ;)

The previous code was doing offset++, which means offset += 1;
I didn't want to hurt Hugh :)

> >  EXPORT_SYMBOL_GPL(drop_futex_key_refs);
>
> I wonder if it would be worthwhile inlining and likely()ing the private
> fastpath? Might make it pretty compact... I guess that's something to worry
> about after glibc gets support.

Yes, in a future patch, in about one year :)

> > +
> > +	if (!(vma = find_vma(mm, address)) ||
> > +	    vma->vm_start > address || !(vma->vm_flags & VM_WRITE))
> > +		ret = -EFAULT;
> > +
> > +	else
> > +		switch (handle_mm_fault(mm, vma, address, 1)) {
> > +		case VM_FAULT_MINOR:
> > +			current->min_flt++;
> > +			break;
> > +		case VM_FAULT_MAJOR:
> > +			current->maj_flt++;
> > +			break;
> > +		default:
> > +			ret = -EFAULT;
> > +		}
> > +	if (!shared)
> > +		up_read(&mm->mmap_sem);
> > +	return ret;
> >  }
> >
> >  /*
>
> You've got an extra space after the if (maybe for clarity?).
>
> In this situation I prefer putting braces around both the if and the else,
> and if you get rid of that blank line, it doesn't cost you anything more ;)

Oh well...

> > @@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us
> >  	restart->arg1 = val;
> >  	restart->arg2 = (unsigned long)abs_time;
> >  	restart->arg3 = (unsigned long)futex64;
> > +	if (shared)
> > +		restart->arg3 |= 2;
>
> Could you make this into a proper flags argument and use #define CONSTANTs
> for it?

Yes, but I'm not sure it will improve readability.

> > @@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u
> >  	struct timespec ts;
> >  	ktime_t t, *tp = NULL;
> >  	u64 val2 = 0;
> > +	int opm = op & FUTEX_CMD_MASK;
>
> What's opm stand for?
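As a rough user-space illustration of two points from this exchange: the key is derived from the page-aligned address, so a check on the page covers any futex word inside it, and the low bits of the in-page offset are free to carry flags, which is why `+=` and `|=` end up equivalent here. Everything named `sketch_*` or `SKETCH_*` below is hypothetical; the page size and the flag value 1 (taken from the "offset++" remark) are assumptions, and this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel uses PAGE_SIZE and
 * its own FUT_OFF_* constants. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_OFF_INODE 1u	/* bit 0: inode-based key (assumed value) */

struct sketch_key {
	uintptr_t page;		/* start of the page holding the futex */
	uint32_t  offset;	/* in-page offset; low bits reused as flags */
};

/* Split a futex address into page start and in-page offset. */
static struct sketch_key sketch_make_key(uintptr_t uaddr)
{
	struct sketch_key key;

	/* A 32-bit futex word is naturally aligned, so offset bits 0-1 are 0. */
	assert(uaddr % sizeof(uint32_t) == 0);
	key.offset = (uint32_t)(uaddr % SKETCH_PAGE_SIZE);
	key.page = uaddr - key.offset;
	return key;
}

/* Mark the key as inode-based by adding, as the patch does. */
static uint32_t sketch_mark_add(uint32_t offset)
{
	return offset + SKETCH_OFF_INODE;
}

/* Same marking with a bitwise OR, as suggested in the review. */
static uint32_t sketch_mark_or(uint32_t offset)
{
	return offset | SKETCH_OFF_INODE;
}
```

The two marking helpers agree only because bit 0 of the offset is known to be clear beforehand; on an offset that already had the bit set, `+=` would carry into the next bit while `|=` would not.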
[PATCH 2.6.21-rc6] mm/page_alloc.c: removal of an unused definition of 'setup_nr_node_ids'
Remove an empty and thus unused definition of 'setup_nr_node_ids' (in
case of MAX_NUMNODES <= 1) in order to resolve a compiler warning.

Signed-off-by: Patrick Ringl <[EMAIL PROTECTED]>
---

--- linux-2.6.20-o/mm/page_alloc.c	2007-03-22 23:11:25.0 +0100
+++ linux-2.6.20/mm/page_alloc.c	2007-04-06 07:19:38.0 +0200
@@ -680,8 +680,6 @@ static void __init setup_nr_node_ids(voi
 		highest = node;
 	nr_node_ids = highest + 1;
 }
-#else
-static void __init setup_nr_node_ids(void) {}
 #endif
 
 #ifdef CONFIG_NUMA
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
On Fri, 06 Apr 2007 02:33:03 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> >   It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon). It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind
> md: bind
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...

Is this the dmesg from boot or the dmesg after running the mdadm --run
command?

> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this.
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things. Perhaps try disabling CONFIG_DMA_ENGINE?

git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.

--
Dan
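For readers matching up the two error reports above: the "status: -12" in the md1 bitmap messages and mdadm's "Cannot allocate memory" are the same condition, since kernel code returns negative errno values and 12 is ENOMEM on Linux. A minimal sketch of decoding such a status in user space follows; `describe_kernel_status` is a hypothetical helper, not part of mdadm or the md driver.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a kernel-style negative status (e.g. -12)
 * into the message strerror() gives for the matching errno value. */
const char *describe_kernel_status(int status)
{
	if (status >= 0)
		return "success";
	return strerror(-status);
}
```

On a glibc system, passing -12 should give the same text mdadm printed; other C libraries may word the ENOMEM message differently.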
Re: Reiser4. BEST FILESYSTEM EVER.
On Thu, 05 Apr 2007 18:34:48 PDT, [EMAIL PROTECTED] said:

> If they are accurate, THEN they are obviously very relevant.

Erm. No. They're not "obviously" very relevant.

I could hypothetically create a benchmark, that's accurate and repeatable,
that shows that reiser4 is able to wash a herd of elephants exactly 11.458%
faster than ext3. And you would, of course, say "But elephants have nothing
to do with file systems", because they aren't relevant to file systems.

Similarly, we've seen benchmarks that show some patch improves NUMA
performance by 5% - and those aren't relevant on my laptop because my laptop
doesn't do NUMA.

And a benchmark of file system performance is only as relevant as it reflects
*your* application's use of the filesystem - how fast it can create and remove
tiny files isn't relevant if your use of the filesystem is to store large
files with long sequential read/write patterns. And the level of compression
isn't very relevant if you're using the partition to store already-compressed
audio or video.

I know somebody who defines a "relevance index" for things, and the measure
is "how many cubicles do I have to go to find somebody who actually cares
about ABC?" - and for him, that's itself a relevant index, because if it's 0,
*he* cares, and if it's 1, his immediate neighbors care and will cause him
grief if ABC is a problem. People who are 5 or 6 cubicles away are less
likely to give him a hard time, and the people who are 15 to 20 cubicles away
are in an entirely separate building. :)
Re: Reiser4. BEST FILESYSTEM EVER.
Hi Peter,

You say that the results may be accurate, but

"Whether or not they're *relevant* is a totally different ball of wax."

and

"Whether or not they're relevant depends on how well they happen to
reflect your particular usage pattern."

Well, surprise, surprise... everyone knows that.

Have a look at the summary of the results:

.----------------------------.
| FILESYSTEM   | TIME  | DISK |
| TYPE         | (secs)| USAGE|
.----------------------------.
| REISER4 lzo  |  1938 |  278 |
| REISER4 gzip |  2295 |  213 |
| REISER4      |  3462 |  692 |
| EXT2         |  4092 |  816 |
| JFS          |  4225 |  806 |
| EXT4         |  4408 |  816 |
| EXT3         |  4421 |  816 |
| XFS          |  4625 |  779 |
| REISER3      |  6178 |  793 |
| FAT32        | 12342 |  988 |
| NTFS-3g      | 10414 |  772 |
.----------------------------.

For the full results see:
http://linuxhelp.150m.com/resources/fs-benchmarks.htm

Don't you agree that "If they are accurate, THEN they are obviously very
relevant."

I have set up a Reiser4 partition with gzip compression. Here is the
difference in disk usage of a typical Debian installation on two 10GB
partitions, one with Reiser3 and the other with Reiser4.

debian:/# df
Filesystem     1K-blocks      Used  Available  Use%  Mounted on
/dev/sda3       10490104   6379164    4110940   61%  /3
/dev/sda7        9967960   2632488    7335472   27%  /7

Partitions 3 and 7 have exactly the same data on them (the typical
Debian install). The partitions are exactly the same size (although df
records different sizes). Partition 3 is Reiser3 -- uses 6.4 GB.
Partition 7 is Reiser4 -- uses 2.6 GB.

So Reiser4 uses 2.6 GB to store the (typical) data that it takes Reiser3
6.4 GB to store (note it would take ext2/3/4 some 7 GB to store the same
info).

Don't you think this result is significant in itself?

Following your hint I have booted /dev/sda7 and all the programs seem to
work fine. They do not seem to be any faster than when using Reiser3.
The whole system seems about as responsive as always.

For fun, I ran bonnie++. Here are the results:

debian:/# ./bonnie++ -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
debian           1G   121  99 86524  21 63297  41   920  99 187762  80  1782 233
Latency             82484us     386ms     438ms   26758us     110ms     398ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ 18509  92 17776  86 +++++ +++ 19495  91
Latency               210us    5475us    5525us    5777us    5522us    5839us

I particularly liked the 233 %CP for Random-Seeks.

John.

On Thu, 05 Apr 2007 21:07:28 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
> > Hi Peter,
> >
> > You say that the results may be accurate, but not relevant.
> >
>
> NO, I said that whether they're accurate is another matter.
>
> > If they are accurate, THEN they are obviously very relevant.
>
> Crap-o-la. Whether or not they're relevant depends on how well they
> happen to reflect your particular usage pattern.
>
> There are NO benchmarks which are relevant to all users. Understanding
> whether or not a benchmark is relevant to one's particular application
> is one of the trickiest things about benchmarks.
>
> -hpa
Re: Reiser4. BEST FILESYSTEM EVER.
[EMAIL PROTECTED] wrote:
> Hi Peter,
>
> You say that the results may be accurate, but not relevant.
>

NO, I said that whether they're accurate is another matter.

> If they are accurate, THEN they are obviously very relevant.

Crap-o-la. Whether or not they're relevant depends on how well they
happen to reflect your particular usage pattern.

There are NO benchmarks which are relevant to all users. Understanding
whether or not a benchmark is relevant to one's particular application
is one of the trickiest things about benchmarks.

	-hpa
[PATCH] USB gadget rndis: fix struct rndis_packet_msg_type unaligned bug
[PATCH] usb gadget rndis: skb_push function may return a pointer which is
not aligned as required by struct rndis_packet_msg_type. Using attribute
trick to fix this bug.

Signed-off-by: Roy Huang <[EMAIL PROTECTED]>
Signed-off-by: Jie Zhang <[EMAIL PROTECTED]>
Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
---
 drivers/usb/gadget/rndis.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/gadget/rndis.h b/drivers/usb/gadget/rndis.h
index 4c3c725..397b149 100644
--- a/drivers/usb/gadget/rndis.h
+++ b/drivers/usb/gadget/rndis.h
@@ -195,7 +195,7 @@ struct rndis_packet_msg_type
 	__le32	PerPacketInfoLength;
 	__le32	VcHandle;
 	__le32	Reserved;
-};
+} __attribute__ ((packed));
 
 struct rndis_config_parameter
 {
-- 
1.5.0.5
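To illustrate what the attribute changes, here is a small sketch, assuming GCC or Clang: a packed struct has an alignment requirement of 1, so the compiler stops assuming that a pointer to it is 4-byte aligned. The struct and function names are made up for the example and only mimic a header of 32-bit fields; they are not the RNDIS definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative header of 32-bit fields, default layout. */
struct sketch_hdr_plain {
	uint32_t msg_type;
	uint32_t msg_length;
	uint32_t data_offset;
};

/* Same fields, packed (GCC/Clang extension). */
struct sketch_hdr_packed {
	uint32_t msg_type;
	uint32_t msg_length;
	uint32_t data_offset;
} __attribute__ ((packed));

/* Alignment the compiler assumes for each layout. */
size_t sketch_align_plain(void)
{
	return _Alignof(struct sketch_hdr_plain);
}

size_t sketch_align_packed(void)
{
	return _Alignof(struct sketch_hdr_packed);
}
```

Because every member is already 32 bits wide, packing does not change the size of such a struct, only the alignment the compiler may assume, which is what matters when the pointer comes from something like skb_push().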
Re: [PATCH] Lguest launcher, child starving parent
On Thu, 2007-04-05 at 16:40 -0400, Steven Rostedt wrote:
> Glauber noticed long delays between hitting a key, and seeing data come
> up on the virtual console. Looking into this, I found that the
> wake_parent routine that reads from all devices was actually starving
> out the parent after sending the parent a signal to wake up.
>
> The thing is, the child which takes the console input is recognized by
> the scheduler as an interactive process. The parent, doesn't do so
> much, so it is recognized more as a CPU hog. So the child easily gets a
> higher priority than the parent.

Hmm, I changed the prio of the waker from "nice(19)" to "nice(5)" after
Andi complained (he still isn't happy tho). I'll change it back for the
moment.

Unfortunately we need to keep sending signals to the parent, in order to
avoid the race between unblocking SIGUSR1 and the read() on /dev/lguest.
This is the nature of Unix signals, unfortunately.

I've been pondering restoring the original /dev/lguest interface, which
handed an fd directly into the kernel. Then the child would just use
this fd and not send signals. It could well improve performance, too...

Thanks for the bug report,
Rusty.
Re: set up new kernel with grub
On Thu, Apr 05, 2007 at 12:28:03PM -0500, Michael wrote:
>Hi, Dick,
>
>Your steps work beautifully. Thanks.
>
>If you could explain a little about what happens in each step, that
>would be even better.
>
>> # cd /usr/src/linux-2.6.20.3
>> If your current kernel is 2.6.20.3, edit the Makefile to
>> add some character after "EXTRAVERSION" as EXTRAVERSION= 3x
>> # cp .config ..

Save your existing config file in the parent directory.

>> # make distclean

Clean up the files generated by the last compile.

>> # cp ../.config .

Copy your .config back here.

>> # make oldconfig

"The make oldconfig command causes the kernel configuration process to
read in your existing configuration information and then prompt you for
a value for any kernel configuration variables that were not set in the
existing kernel configuration file."

>> # make

Rebuild all changed object files and do the final kernel image link.

>> # make modules_install

Reinstall the newly compiled modules.

>> # make install

Copy the kernel image and System.map to /boot and modify
/boot/grub/menu.lst (or lilo.conf) accordingly.
Re: missing madvise functionality
Ulrich Drepper wrote:
> Nick Piggin wrote:
> > Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's
> > kernels using down_write(mmap_sem) for MADV_DONTNEED is better than
> > mmap/mprotect, which have more fundamental locking requirements, more
> > overhead and no benefits (except debugging, I suppose).
>
> It's a tiny bit faster, see
>
>   http://people.redhat.com/drepper/dontneed.png
>
> I just ran it once so the graph is not smooth. This is on a UP dual core
> machine. Maybe tomorrow I'll turn on the big 4p machine.

Hmm, I saw an improvement, but that was just on a raw syscall test
with a single page chunk. Real-world use I guess will get progressively
less dramatic as other overheads start being introduced.

Multi-thread performance probably won't get a whole lot better (it does
eliminate 1 down_write(mmap_sem), but one remains) until you use my
madvise patch.

> I would have to see dramatically different results on the big machine to
> make me change the libc code. The reason is that there is a big drawback.
>
> So far, when we allocate a new arena, we allocate address space with
> PROT_NONE and only when we need memory the protection is changed to
> PROT_READ|PROT_WRITE. This has the advantage of catching wild pointer
> accesses.

Sure, yes. And I guess you'd always want to keep that option around as
a debugging aid.

-- 
SUSE Labs, Novell Inc.
Re: missing madvise functionality
Nick Piggin wrote:
> Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's
> kernels using down_write(mmap_sem) for MADV_DONTNEED is better than
> mmap/mprotect, which have more fundamental locking requirements, more
> overhead and no benefits (except debugging, I suppose).

It's a tiny bit faster, see

  http://people.redhat.com/drepper/dontneed.png

I just ran it once so the graph is not smooth. This is on a UP dual core
machine. Maybe tomorrow I'll turn on the big 4p machine.

I would have to see dramatically different results on the big machine to
make me change the libc code. The reason is that there is a big drawback.

So far, when we allocate a new arena, we allocate address space with
PROT_NONE and only when we need memory the protection is changed to
PROT_READ|PROT_WRITE. This has the advantage of catching wild pointer
accesses.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
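The arena scheme described above (reserve address space with PROT_NONE, open it up only when memory is needed) can be sketched in user space roughly as follows. This is only an illustration of the mmap/mprotect pattern on Linux, not glibc's malloc code; the `sketch_arena_*` names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space without making it accessible; any stray access
 * into the reservation faults instead of silently succeeding. */
void *sketch_arena_reserve(size_t size)
{
	void *p = mmap(NULL, size, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Make the first `used` bytes of the arena readable and writable. */
int sketch_arena_commit(void *arena, size_t used)
{
	return mprotect(arena, used, PROT_READ | PROT_WRITE);
}

/* Release the whole reservation. */
int sketch_arena_release(void *arena, size_t size)
{
	return munmap(arena, size);
}
```

The part of the reservation that has not been committed stays PROT_NONE, which is where the "catching wild pointer accesses" benefit comes from.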
Linux 2.6.21-rc6
Ok, I don't think there really is anything very interesting here, but
we're hopefully whittling down the list of regressions, and fixing
various random other small issues while at it.

Some smallish MIPS updates, networking (and network driver) fixes,
removal of a long obsolete framebuffer driver, etc etc. The shortlog
really tells the story.

We should be getting close to a 2.6.21 release, so please update any
regression reports you've done,

		Linus

---
Adrian Bunk (6):
      [DCCP]: make dccp_write_xmit_timer() static again
      9p: make struct v9fs_cached_file_operations static
      drivers/spi/: fix section mismatches
      drivers/eisa/pci_eisa.c:pci_eisa_init() should be init
      drivers/mfd/sm501.c: fix an off-by-one
      net/sunrpc/svcsock.c: fix a check

Alan Cox (2):
      tty: minor merge correction
      pata_pdc202xx_old: LBA48 bug

Alan Stern (1):
      UHCI: Fix problem caused by lack of terminating QH

Albert Lee (5):
      pdc202xx_new: Enable ATAPI DMA
      libata: reorder HSM_ST_FIRST for easier decoding (take 3)
      libata: Clear tf before doing request sense (take 3)
      libata: Limit max sector to 128 for TORiSAN DVD drives (take 3)
      libata: Limit ATAPI DMA to R/W commands only for TORiSAN DVD drives (take 3)

Alexey Dobriyan (1):
      [NET]: Correct accept(2) recovery after sock_attach_fd()

Alexey Kuznetsov (1):
      [NET]: Fix neighbour destructor handling.

Andi Kleen (3):
      x86-64: Disable local APIC timer use on AMD systems with C1E
      x86-64: Let oprofile reserve MSR on all CPUs
      x86-64: Increase NMI watchdog probing timeout

Andreas Oberritter (2):
      V4L/DVB (5495): Tda10086: fix DiSEqC message length
      V4L/DVB (5496): Pluto2: fix incorrect TSCR register setting

Andrew Morton (4):
      proc: fix linkage with CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n
      revert "retries in ext3_prepare_write() violate ordering requirements"
      revert "retries in ext4_prepare_write() violate ordering requirements"
      remove protection of LANANA-reserved majors

Andrew Victor (1):
      [ARM] 4289/1: AT91: SAM9260 NAND flash timing

Arnaldo Carvalho de Melo (1):
      [DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV

Avi Kivity (1):
      KVM: Prevent system selectors leaking into guest on real->protected mode transition on vmx

Ayaz Abdulla (2):
      forcedeth: fix nic poll
      forcedeth: fix tx timeout

Bartlomiej Zolnierkiewicz (2):
      ide: revert "ide: fix drive side 80c cable check, take 2" for now
      ide: fix locking for manual DMA enable/disable ("hdparm -d")

Bill Helfinstine (1):
      b44: fix IFF_ALLMULTI handling of CAM slots

Brian Pomerantz (1):
      fix page leak during core dump

Brice Goglin (1):
      myri10ge: correctly detect when TSO should be used

Bruce Fields (2):
      knfsd: nfsd4: fix inheritance flags on v4 ace derived from posix default ace
      knfsd: nfsd4: demote "clientid in use" printk to a dprintk

Carsten Otte (1):
      mm: fix xip issue with /dev/zero

Chris Dearman (2):
      [MIPS] lockdep: Handle interrupts in R3000 style c0_status register.
      [MIPS] lockdep: Deal with interrupt disable hazard in TRACE_IRQFLAGS

Chris Snook (1):
      atl1: save mac address on remove

Chuck Meade (1):
      [POWERPC] qe: Fix QUICC Engine SDMA setup errors

Conke Hu (1):
      ahci.c: walkaround for SB600 SATA internal error issue

Cornelia Huck (2):
      [S390] cio: Device status validity.
      [S390] cio: Fix handling of interrupt for csch().

Cyrill V. Gorcunov (1):
      SUN3/3X Lance trivial fix improved

Daniel Drake (1):
      generic_serial: fix decoding of baud rate

David Brownell (4):
      USB: omap_udc: workaround dma_free_coherent() bogosity
      USB: fix usb-serial/generic build warning
      USB: fix usb-serial/ftdi build warning
      rtc-cmos lockdep fix, irq updates

David Howells (1):
      SLAB: Mention slab name when listing corrupt objects

David S. Miller (4):
      [IPV6]: Fix routing round-robin locking.
      [DRM]: Delete sparc64 FFB driver code that never gets built.
      [VIDEO] ffb: Fix two DAC handling bugs.
      [SCSI]: Fix scsi_send_eh_cmnd scatterlist handling

David Wilder (1):
      [S390] kprobes: Align probe address.

David Woodhouse (1):
      bcm43xx: Fix machine check on PPC for version 1 PHY

Divy Le Ray (4):
      cxgb3 - Safeguard TCAM size usage
      cxgb3 - detect NIC only adapters
      cxgb3 - Tighten xgmac workaround
      cxgb3 - Firwmare update

Dmitriy Monakhov (1):
      splice: partial write fix

Erez Zilber (1):
      IB/iser: Handle aborting a command after it is sent

Eric W. Biederman (4):
      MSI-X: fix resume crash
      pid: Properly detect orphaned process groups in exit_notify
      msi: synchronously mask and unmask msi-x irqs.
      net: Ignore sysfs network device rename bugs.

Francois Romieu (3):
      sis190: new PHY support
      r8169: issue request_irq after the private data are completely initialized
Re: [-mm3 PATCH] (Retry) Check the return value of kobject_add and etc.
On Thu, Apr 05, 2007 at 06:00:16PM +0200, Cornelia Huck wrote:
>On Thu, 5 Apr 2007 23:27:32 +0800,
>WANG Cong <[EMAIL PROTECTED]> wrote:
>
>> Thank you very much! I know. So I should replace all kfree with kobject_put,
>> like this one:
>>
>> -	sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem");
>> +	if (sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem")) {
>> +		kobject_uevent(&p->kobj, KOBJ_REMOVE);
>> +		kobject_del(&p->kobj);
>> +		kobject_put(&p->kobj);
>> +		return;
>> +	}
>>
>> Is that all right?
>>
>
>Yes, or use kobject_unregister().

OK. Then I'll send it again. Hopefully it can be accepted this time. ;-p

Signed-off-by: WANG Cong <[EMAIL PROTECTED]>

---

--- linux-2.6.21-rc5-mm4/fs/partitions/check.c.orig	2007-04-05 12:48:29.0 +0800
+++ linux-2.6.21-rc5-mm4/fs/partitions/check.c	2007-04-05 23:15:41.0 +0800
@@ -385,10 +385,18 @@ void add_partition(struct gendisk *disk,
 	p->kobj.parent = &disk->kobj;
 	p->kobj.ktype = &ktype_part;
 	kobject_init(&p->kobj);
-	kobject_add(&p->kobj);
+	if (kobject_add(&p->kobj)) {
+		kobject_put(&p->kobj);
+		return;
+	}
 	if (!disk->part_uevent_suppress)
 		kobject_uevent(&p->kobj, KOBJ_ADD);
-	sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem");
+	if (sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem")) {
+		kobject_uevent(&p->kobj, KOBJ_REMOVE);
+		kobject_del(&p->kobj);
+		kobject_put(&p->kobj);
+		return;
+	}
 	if (flags & ADDPART_FLAG_WHOLEDISK) {
 		static struct attribute addpartattr = {
 			.name = "whole_disk",
@@ -396,7 +404,13 @@ void add_partition(struct gendisk *disk,
 			.owner = THIS_MODULE,
 		};
 
-		sysfs_create_file(&p->kobj, &addpartattr);
+		if (sysfs_create_file(&p->kobj, &addpartattr)) {
+			sysfs_remove_link(&p->kobj, "subsystem");
+			kobject_uevent(&p->kobj, KOBJ_REMOVE);
+			kobject_del(&p->kobj);
+			kobject_put(&p->kobj);
+			return;
+		}
 	}
 	partition_sysfs_add_subdir(p);
 	disk->part[part-1] = p;
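The patch repeats a growing cleanup sequence in each failure branch. As a side note, the same "undo completed steps in reverse order" idea is often written with goto labels; the sketch below shows that alternative on a mock object in user space. It is not what the patch does and does not use the kobject or sysfs APIs; `sketch_obj`, `sketch_step` and `sketch_setup` are invented for the example.

```c
#include <stdbool.h>

/* Mock object: records which setup steps are currently in effect. */
struct sketch_obj {
	bool added;
	bool linked;
	bool has_file;
};

/* Each step succeeds unless its bit is set in fail_mask. */
static int sketch_step(unsigned fail_mask, unsigned bit)
{
	return (fail_mask & bit) ? -1 : 0;
}

/* Perform three setup steps in order; on failure, undo the completed
 * ones in reverse order and return the error. */
int sketch_setup(struct sketch_obj *o, unsigned fail_mask)
{
	int err;

	err = sketch_step(fail_mask, 1u);
	if (err)
		goto out;
	o->added = true;

	err = sketch_step(fail_mask, 2u);
	if (err)
		goto undo_add;
	o->linked = true;

	err = sketch_step(fail_mask, 4u);
	if (err)
		goto undo_link;
	o->has_file = true;
	return 0;

undo_link:
	o->linked = false;
undo_add:
	o->added = false;
out:
	return err;
}
```

With this layout each later failure point only needs one extra label, rather than a longer copy of the cleanup sequence.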
Re: Questions about porting perfmon2 to powerpc
On Thu April 5 2007 6:04 pm, Benjamin Herrenschmidt wrote:
> On Thu, 2007-04-05 at 14:55 -0500, Kevin Corry wrote:
> > First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h for
> > a function called smp_call_function_single(). However, this routine is
> > only implemented on i386, x86_64, ia64, and mips. Perfmon2 apparently
> > needs to call this to run a function on a specific CPU. Powerpc provides
> > an smp_call_function() routine to run a function on all active CPUs, so I
> > used that as a basis to add an smp_call_function_single() routine. I've
> > included the patch below and was wondering if it looked like a sane
> > approach.
>
> We should do better... it will require some backend work for the various
> supported PICs though. I've always wanted to look into doing a
> smp_call_function_cpumask in fact :-)

I was actually wondering about that myself today. It would seem like an
smp_call_function() that takes a CPU mask would be much more flexible than
either the current version or the new one that I proposed. However, that was
a little more hacking than I was willing to do today on powerpc architecture
code. :)

> > Next, we ran into a problem related to Perfmon2 initialization and sysfs.
> > The problem turned out to be that the powerpc version of topology_init()
> > is defined as an __initcall() routine, but Perfmon2's initialization is
> > done as a subsys_initcall() routine. Thus, Perfmon2 tries to initialize
> > its sysfs information before some of the powerpc cpu information has been
> > initialized. However, on all other architectures, topology_init() is
> > defined as a subsys_initcall() routine, so this problem was not seen on
> > any other platforms. Changing the powerpc version of topology_init() to a
> > subsys_initcall() seems to have fixed the bug. However, I'm not sure if
> > that is going to cause problems elsewhere in the powerpc code. I've
> > included the patch below (after the smp-call-function-single patch). Does
> > anyone know if this change is safe, or if there was a specific reason
> > that topology_init() was left as an __initcall() on powerpc?
>
> It would make sense to follow what other archs do. Note that if both
> perfmon and topology_init are subsys_initcall, that is on the same
> level, it's still a bit hairy to expect one to be called before the
> other...

I wondered that as well, but based on what Arnd posted earlier (presumably
about the kernel linking order), the topology_init() call, which is in the
arch/ top-level directory, should occur before pfm_init(), which is in
perfmon/, even if both are in the same initcall level.

Thanks,
-- 
Kevin Corry
[EMAIL PROTECTED]
http://www.ibm.com/linux/
Re: Questions about porting perfmon2 to powerpc
On Thu April 5 2007 3:32 pm, Kevin Corry wrote:
> On Thu April 5 2007 3:08 pm, Arnd Bergmann wrote:
> > On Thursday 05 April 2007, Kevin Corry wrote:
> > > First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h
> > > for a function called smp_call_function_single(). However, this routine
> > > is only implemented on i386, x86_64, ia64, and mips. Perfmon2
> > > apparently needs to call this to run a function on a specific CPU.
> > > Powerpc provides an smp_call_function() routine to run a function on
> > > all active CPUs, so I used that as a basis to add an
> > > smp_call_function_single() routine. I've included the patch below and
> > > was wondering if it looked like a sane approach.
> >
> > The function itself looks good, but since it's very similar to the
> > existing smp_call_function(), you should probably try to share some of
> > the code, e.g. by making a helper function that gets an argument to
> > decide whether to run on a specific CPU or on all CPUs.
>
> Ok. I'll see what I can come up with and post another patch today or
> tomorrow.

Here's a new version that adds smp_call_function_single(), and moves the code
that's shared with smp_call_function() to __smp_call_function().

Thanks,
-- 
Kevin Corry
[EMAIL PROTECTED]
http://www.ibm.com/linux/


Add an smp_call_function_single() to the powerpc architecture. Since this
is very similar to the existing smp_call_function() routine, the common
portions have been split out into __smp_call_function(). Since the
spin_lock(&call_lock) was moved to __smp_call_function(), smp_call_function()
now explicitly calls preempt_disable() before getting the count of online
CPUs.

Signed-off-by: Kevin Corry <[EMAIL PROTECTED]>

Index: linux-2.6.20-arnd3-perfmon/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.20-arnd3-perfmon.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.20-arnd3-perfmon/arch/powerpc/kernel/smp.c
@@ -198,26 +198,11 @@ static struct call_data_struct {
 /* delay of at least 8 seconds */
 #define SMP_CALL_TIMEOUT	8
 
-/*
- * This function sends a 'generic call function' IPI to all other CPUs
- * in the system.
- *
- * [SUMMARY] Run a function on all other CPUs.
- * <func> The function to run. This must be fast and non-blocking.
- * <info> An arbitrary pointer to pass to the function.
- * <nonatomic> currently unused.
- * <wait> If true, wait (atomically) until function has completed on other CPUs.
- * [RETURNS] 0 on success, else a negative status code. Does not return until
- * remote CPUs are nearly ready to execute <<func>> or are or have executed.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.
- */
-int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
-		       int wait)
-{
+static int __smp_call_function(void (*func)(void *info), void *info,
+			       int wait, int target_cpu, int num_cpus)
+{
 	struct call_data_struct data;
-	int ret = -1, cpus;
+	int ret = -1;
 	u64 timeout;
 
 	/* Can deadlock when called with interrupts disabled */
@@ -234,40 +219,33 @@ int smp_call_function (void (*func) (voi
 		atomic_set(&data.finished, 0);
 
 	spin_lock(&call_lock);
-	/* Must grab online cpu count with preempt disabled, otherwise
-	 * it can change. */
-	cpus = num_online_cpus() - 1;
-	if (!cpus) {
-		ret = 0;
-		goto out;
-	}
 
 	call_data = &data;
 	smp_wmb();
 	/* Send a message to all other CPUs and wait for them to respond */
-	smp_ops->message_pass(MSG_ALL_BUT_SELF, PPC_MSG_CALL_FUNCTION);
+	smp_ops->message_pass(target_cpu, PPC_MSG_CALL_FUNCTION);
 
 	timeout = get_tb() + (u64) SMP_CALL_TIMEOUT * tb_ticks_per_sec;
 
 	/* Wait for response */
-	while (atomic_read(&data.started) != cpus) {
+	while (atomic_read(&data.started) != num_cpus) {
 		HMT_low();
 		if (get_tb() >= timeout) {
-			printk("smp_call_function on cpu %d: other cpus not "
-			       "responding (%d)\n", smp_processor_id(),
-			       atomic_read(&data.started));
+			printk("%s on cpu %d: other cpus not "
+			       "responding (%d)\n", __FUNCTION__,
+			       smp_processor_id(), atomic_read(&data.started));
 			debugger(NULL);
 			goto out;
 		}
 	}
 
 	if (wait) {
-		while (atomic_read(&data.finished) != cpus) {
+		while (atomic_read(&data.finished) != num_cpus) {
 			HMT_low();
 			if (get_tb() >= timeout) {
-				printk("smp_call_function on cpu %d: other "
-				       "cpus not finishing (%d/%d)\n",
-				       smp_processor_id(),
+				printk("%s on cpu %d: other
Re: missing madvise functionality
Ulrich Drepper wrote:
> In case somebody wants to play around with Rik patch or another
> madvise-based patch, I have x86-64 glibc binaries which can use it:
>
> http://people.redhat.com/drepper/rpms
>
> These are based on the latest Fedora rawhide version. They should work
> on older systems, too, but you screw up your updates. Use them only if
> you know what you do.
>
> By default madvise(MADV_DONTNEED) is used. With the environment variable

Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's kernels using down_write(mmap_sem) for MADV_DONTNEED is better than mmap/mprotect, which have more fundamental locking requirements, more overhead and no benefits (except debugging, I suppose).

MADV_DONTNEED is twice as fast in single threaded performance, and an order of magnitude faster for multiple threads, when MADV_DONTNEED only takes mmap_sem for read.

Do you plan to include this change in general glibc releases? Maybe it will make google malloc obsolete? ;)

(I don't suppose you'd be able to get any tests done, Andrew?)

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
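For anyone following along who has not used the call being compared here, a minimal userspace sketch, assuming Linux and a private anonymous mapping. The helper name release_and_reread is made up for illustration; this is not the glibc malloc code, only the bare madvise(MADV_DONTNEED) pattern the thread is talking about.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes, dirty them, then hand the pages
 * back to the kernel with MADV_DONTNEED while keeping the mapping.
 * Returns the first byte read back afterwards, or -1 on error.
 */
int release_and_reread(size_t len)
{
	unsigned char *buf;
	int first;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xAA, len);		/* touch every page */

	/* The address range stays valid; only the backing pages are dropped. */
	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* On Linux, private anonymous pages are zero-filled on the next touch. */
	first = buf[0];

	munmap(buf, len);
	return first;
}
```

The point of the comparison in the mail is that the mapping itself is never torn down or re-protected, so a later reuse of the chunk only costs page faults.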
Re: [PATCH] Define EFLAGS_IF
On Thu, 2007-04-05 at 18:06 -0700, H. Peter Anvin wrote:
> Andi Kleen wrote:
> >
> > No processor.h is such a hodgepodge of unrelated stuff that any
> > splitting up is a good thing.
> >
>
> Fair enough. However, I'd still like to see the X86_CR* constants
> moved, too (and constants added for at least CR0 as well.)

Agreed. This was done on the theory of minimum damage, but since it seems to have received a warm reception, I'd say moving the rest to processor-flags.h would be a welcome addition.

Cheers,
Rusty.
Re: [PATCH] USB gadget rndis: fix bug skb_push function may return an unaligned pointer bug
On Thu, 2007-04-05 at 14:29 -0700, David Brownell wrote:
> On Tuesday 03 April 2007 11:28 pm, Wu, Bryan wrote:
> > USB gadget rndis: skb_push function may return a pointer which is not
> > aligned as required by struct rndis_packet_msg_type.
>
> Can you instead try to update the declaration of that struct
> so that it's "__attribute__((packed))"? That's less invasive,
> and will address similar issues elsewhere ...
>
> - Dave

OK, Jie and Roy will try this __attribute__ method and test it on the blackfin platform. Sorry for missing their "Signed-off-by". I will resend a patch later for review.

Thanks
-Bryan
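To make the suggestion concrete, a small userspace sketch of what the packed attribute buys, assuming GCC or Clang. The struct msg_hdr_packed below is a made-up two-field header, not the real rndis_packet_msg_type layout, and write_hdr/read_len are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header; packed lowers its required alignment to 1 byte. */
struct msg_hdr_packed {
	uint32_t type;
	uint32_t length;
} __attribute__((packed));

/*
 * Fill a header at an arbitrary (possibly odd) offset in a buffer.
 * Because the struct is packed, the compiler may not assume 4-byte
 * alignment and emits stores that are safe on strict-alignment CPUs.
 */
void write_hdr(unsigned char *buf, size_t off, uint32_t type, uint32_t len)
{
	struct msg_hdr_packed *h = (struct msg_hdr_packed *)(buf + off);

	h->type = type;
	h->length = len;
}

/* Read the length field back from the same possibly unaligned offset. */
uint32_t read_len(const unsigned char *buf, size_t off)
{
	const struct msg_hdr_packed *h =
		(const struct msg_hdr_packed *)(buf + off);

	return h->length;
}
```

The trade-off is that every field access through such a struct may become slower byte-wise code on some architectures, which is presumably why it is worth testing on the affected platform first.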
Re: [PATCH] Unified lguest launcher
On Thu, 2007-04-05 at 11:43 -0300, Glauber de Oliveira Costa wrote:
> and here's the new patch, merging rusty's suggestions and some more on my own.
>
> May I upload this, or does Rusty (or any other) has some more suggestions?

This looks excellent! There are a couple of extra spaces floating around, but that's trivial. You use "errno = ESRCH; err()" where you could use "errx()".

Please merge it straight in. No need for a separate patch in the tree for this I think, unless you plan more work?

Thanks!
Rusty.
Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> From: Andrew Burgess <[EMAIL PROTECTED]>
> Date: Thu, 5 Apr 2007 15:13:27 -0700
>
> > David, do you see any other problems with scsi_send_eh_cmnd?
> >
> > I've switched back to 2.6.18 which seems to not oops
> > and am happy to try patches.
>
> Does 2.6.20 with my patch OOPS too? Does reverting my patch
> make the oops go away?
>
> If reverting my patch makes the OOPS go away, we need to
> verify if page_address() is returning crap for some reason
> or the length is wrong.

2.6.20.4 with your patch dies in the memcpy (as does 21-gitN).

2.6.20.4 without your patch dies in the subsequent __free_page with a null pointer ref at 000...008.

James, should I try your posted patch? On which kernel?

This machine will die in boot on these kernels until I power cycle it (which somehow fixes the disk/controller for a while). 2.6.18 continues to work (gets the scsi errors and continues).
AIC79xx: scsi0: device overrun (status a)
Can anyone tell me what this means:

Apr 5 22:11:56 vegeta kernel: [ 1265.267700] scsi0: device overrun (status a) on 0:1:0

Kernel is 2.6.20. I set up a raid1 between 2 hard disks (on partition #2); as soon as it started to sync the array, my log was flooded with the above entry.

The scsi adapter is an onboard controller on a supermicro x5da8. The hard disks are SEAGATE ST318404LW drives on channel 0 (no other devices on this channel).

From what I can tell, the sync is going fairly quickly, at ~29mb/sec.

This is really strange to me, since I dd'd from the first disk to the second without any messages in the log (that was on an older kernel, 2.6.17).

If there's any other information needed, ask. I'm not sure what else is needed.

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???
Re: init's children list is long and slows reaping children.
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Thu, 5 Apr 2007, Chris Snook wrote:
>
>> Linus Torvalds wrote:
>>
>> > Another thing we could do is to just make sure that kernel threads simply
>> > don't end up as children of init. That whole thing is silly, they're really
>> > not children of the user-space init anyway. Comments?
>>
>> Does anyone remember why we started doing this in the first place? I'm sure
>> there are some tools that expect a process tree, rather than a forest, and
>> making it a forest could make them unhappy.
>
> I'm not sure anybody would really be unhappy with pptr pointing to some
> magic and special task that has pid 0 (which makes it clear to everybody
> that the parent is something special), and that has SIGCHLD set to SIG_IGN
> (which should make the exit case not even go through the zombie phase).
>
> I can't even imagine *how* you'd make a tool unhappy with that, since even
> tools like "ps" (and even more "pstree") won't read all the process states
> atomically, so they invariably will see parent pointers that don't even
> exist any more, because by the time they get to the parent, it has exited
> already.

Right. pid == 1 being missing might cause some confusion, but having ppid == 0 should be fine. Heck, pid == 1 already has ppid == 0, so it is a value user space has had to deal with for a while.

In addition there was a period in 2.6 where most kernel threads and init had a pgid == 0 and a session == 0, and nothing seemed to complain.

We should probably make all of the kernel threads children of init_task, the initial idle thread on the first cpu, which is the parent of pid == 1. That will give the ppid == 0 naturally because the idle thread has pid == 0.

>> The support angel on my shoulder says we should just put all the kernel
>> threads under a kthread subtree to shorten init's child list and minimize
>> impact.
>
> A number are already there, of course, since they use the kthread
> infrastructure to get there.
Almost everything should be using kthread by now. I do admit that there are a handful of kernel threads that still use kthread_create but it is a relatively short list.

Looking, we apparently have a couple of processes started by kthread_create that are not under kthread. They all have pid numbers lower than kthread so I'm guessing it is some startup ordering issue.

Currently it looks like daemonize is reparenting everything to init; changing that to init_task and making the threads self reaping should be trivial.

I'm a little nervous that we exceeded our default pid max just booting the kernel. 32768 is a lot of kernel threads. That sounds like 32 kernel threads per cpu. That seems to be more than I have on any of my little machines.

There is no defined order for reaping of child processes and in fact I can't even see anything in the kernel right now that would even accidentally give user space the idea we had a defined order. So I think we have some options once we get the kernel threads out of the way. Getting the kernel threads out of the way would seem to be the first priority.

Eric
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Do you mean kmap_atomic_pfn?
>
> Yes.
>
>> kunmap_atomic can stay lazy (at least for VMI), actually, but it
>> doesn't help since it happens outside the spin lock.
>
> May as well be consistent. Or do you mean you can't flush outside the
> spinlock, even if there's nothing pending?

Consistency is good. Flush is always fine, just an extra function call.

Zach
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> Do you mean kmap_atomic_pfn?

Yes.

> kunmap_atomic can stay lazy (at least for VMI), actually, but it
> doesn't help since it happens outside the spin lock.

May as well be consistent. Or do you mean you can't flush outside the spinlock, even if there's nothing pending?

J
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Throw it in the queue; I'll slide in after it.
>
> I've pushed it up. I added a few missing cases to the patch
> (kmap_atomic_pte, kunmap_atomic).

Do you mean kmap_atomic_pfn? kunmap_atomic can stay lazy (at least for VMI), actually, but it doesn't help since it happens outside the spin lock.
Re: Reiser4. BEST FILESYSTEM EVER.
Hi Peter,

You say that the results may be accurate, but not relevant.

.-------------------------------.
| FILESYSTEM   |  TIME  | DISK  |
| TYPE         | (secs) | USAGE |
.-------------------------------.
| REISER4 lzo  |  1938  |  278  |
| REISER4 gzip |  2295  |  213  |
| REISER4      |  3462  |  692  |
| EXT2         |  4092  |  816  |
| JFS          |  4225  |  806  |
| EXT4         |  4408  |  816  |
| EXT3         |  4421  |  816  |
| XFS          |  4625  |  779  |
| REISER3      |  6178  |  793  |
| FAT32        | 12342  |  988  |
| NTFS-3g      | 10414  |  772  |
.-------------------------------.

If they are accurate, THEN they are obviously very relevant.

Trying to follow http://linuxhelp.150m.com/resources/fs-benchmarks.htm I have set up a Reiser4 partition with gzip compression. Here is the difference in disk usage of a typical Debian installation on two 10GB partitions, one with Reiser3 and the other with Reiser4.

debian:/# df
Filesystem  1K-blocks     Used  Available  Use%  Mounted on
/dev/sda3    10490104  6379164    4110940   61%  /3
/dev/sda7     9967960  2632488    7335472   27%  /7

Partitions 3 and 7 have exactly the same data on them (the typical Debian install). The partitions are exactly the same size (although df records different sizes).

Partition 3 is Reiser3 -- uses 6.4 GB.
Partition 7 is Reiser4 -- uses 2.6 GB.

So Reiser4 uses 2.6 GB to store the (typical) data that it takes Reiser3 6.4 GB to store (note it would take ext2/3/4 some 7 GB to store the same info).

This seems very relevant to me.

John.

On Thu, 05 Apr 2007 17:39:58 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
> > Yeap, I guess that will probably work.
> >
> > And here I was trying to compile old versions of GRUB from namesys.com.
> >
> > By the way, do you think the benchmarks from:
> >
> > http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
> > http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm
> >
> > are accurate?
> >
>
> Accurate, probably. Whether or not they're *relevant* is a totally
> different ball of wax.
>
> -hpa

--
[EMAIL PROTECTED]
--
http://www.fastmail.fm - mmm... Fastmail...
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> Throw it in the queue; I'll slide in after it.

I've pushed it up. I added a few missing cases to the patch (kmap_atomic_pte, kunmap_atomic).

J
Re: init's children list is long and slows reaping children.
On Thu, 5 Apr 2007, Chris Snook wrote:

> Linus Torvalds wrote:
>
> > Another thing we could do is to just make sure that kernel threads simply
> > don't end up as children of init. That whole thing is silly, they're really
> > not children of the user-space init anyway. Comments?
>
> Does anyone remember why we started doing this in the first place? I'm sure
> there are some tools that expect a process tree, rather than a forest, and
> making it a forest could make them unhappy.

I'm not sure anybody would really be unhappy with pptr pointing to some magic and special task that has pid 0 (which makes it clear to everybody that the parent is something special), and that has SIGCHLD set to SIG_IGN (which should make the exit case not even go through the zombie phase).

I can't even imagine *how* you'd make a tool unhappy with that, since even tools like "ps" (and even more "pstree") won't read all the process states atomically, so they invariably will see parent pointers that don't even exist any more, because by the time they get to the parent, it has exited already.

> The support angel on my shoulder says we should just put all the kernel
> threads under a kthread subtree to shorten init's child list and minimize
> impact.

A number are already there, of course, since they use the kthread infrastructure to get there.

		Linus
Re: missing madvise functionality
Rik van Riel wrote:
> Nick Piggin wrote:
>> Oh, also: something like this patch would help out MADV_DONTNEED, as it
>> means it can run concurrently with page faults. I think the locking will
>> work (but needs forward porting).
>
> Ironically, your patch decreases throughput on my quad core
> test system, with Jakub's test case.
>
> MADV_DONTNEED, my patch, 1 loops (14k context switches/second)
>
> real    0m34.890s
> user    0m17.256s
> sys     0m29.797s
>
> MADV_DONTNEED, my patch & your patch, 1 loops (50 context switches/second)
>
> real    1m8.321s
> user    0m20.840s
> sys     1m55.677s
>
> I suspect it's moving the contention onto the page table lock,
> in zap_pte_range(). I guess that the thread private memory
> areas must be living right next to each other, in the same
> page table lock regions :)
>
> For more real world workloads, like the MySQL sysbench one,
> I still suspect that your patch would improve things.

I think it definitely would, because the app will be wanting to do other things with mmap_sem as well (like futexes *grumble*).

Also, the test case is allocating and freeing 512K chunks, which I think would be on the high side of typical.

You have 32 threads for 4 CPUs, so then it would actually make sense to context switch on mmap_sem write lock rather than spin on ptl. But the kernel doesn't know that.

Testing with a small chunk size or thread == CPUs I think would show a swing toward my patch.

--
SUSE Labs, Novell Inc.
Re: [patch 1/3] epoll cleanups - epoll include diet ...
On Thu, 5 Apr 2007 18:12:58 -0700 (PDT) Davide Libenzi wrote:

> On Thu, 5 Apr 2007, Andrew Morton wrote:
>
> > epoll uses signal stuff and might need signal.h. It implements syscalls
> > and it certainly needs to have those syscall's prototypes in scope. It
> > surely uses stuff from mm.h (doesn't everything??)
>
> Ack about signal.h, I forgot about the pwait code :(
> Why syscalls.h? The eventpoll.c file exports syscalls, but it doesn't use
> anything declared in there.

So that the compiler can verify that our declarations of sys_epoll_foo() match our definitions of them.

> What does eventpoll.c use *directly* from mm.h? If eventpoll.c uses, let's
> say sched.h, and sched.h needs mm.h, it is sched.h responsibility to
> include the mm.h file not eventpoll.c one.
>

Sure. But if epoll.c _does_ use something from mm.h (or uses something from a header which mm.h includes) then if we later remove the #include mm.h from sched.h, eventpoll.c will break.

The general rule is: include in .c the header files which provide the stuff which that .c file uses.

Now, it may be that eventpoll.c indeed uses nothing which mm.h provides, and nothing which mm.h's includees provide. But it is non-trivial to prove that. Once added, includes are hard to remove :(
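The point about having the prototypes in scope can be shown in a few lines. This is a sketch only: sys_demo_create and sys_demo_wait are invented names standing in for the real sys_epoll_*() entry points, and the two halves that would normally live in a header and a .c file are shown in one block.

```c
/* --- What a shared header (a hypothetical demo_syscalls.h) declares --- */
long sys_demo_create(int size);
long sys_demo_wait(int fd, int maxevents, int timeout);

/* --- What the .c file defines, with the header above in scope --- */

/* Pretend to hand out descriptor 3 for any positive size. */
long sys_demo_create(int size)
{
	return size > 0 ? 3 : -1;
}

/*
 * With the prototype visible, changing this signature (say, dropping
 * maxevents or returning int) without updating the header makes the
 * compiler report conflicting types instead of silently diverging.
 */
long sys_demo_wait(int fd, int maxevents, int timeout)
{
	(void)timeout;
	return (fd >= 0 && maxevents > 0) ? 0 : -1;
}
```

Without the declarations in scope, the definitions would still compile, and any mismatch with what callers expect would only show up at run time, if at all.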
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Yes, thought about several solutions, and this seems the best. But it
>> requires a new paravirt-op.
>
> Not with the power of multiplexing. Something like this, perhaps?

Ok, I tried that and I got a nice clean fix. For 2.6.22. Backporting this to 2.6.21 creates havoc, as a number of cleanup patches as well as changes to highmem code get in the way.

Andi, do you really want to deal with the conflicts this will create for the paravirt queue for 2.6.22, or would you rather apply the dumb yet simple and non-confrontational workaround I have been trying to get applied to 2.6.21?

Zach
Re: [PATCH] FUTEX : new PRIVATE futexes
Hi Eric, Thanks for doing this... It's looking good, I just have some minor comments: Eric Dumazet wrote: Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> --- linux-2.6.21-rc5-mm4/kernel/futex.c +++ linux-2.6.21-rc5-mm4-ed/kernel/futex.c @@ -16,6 +16,9 @@ * Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]> * Copyright (C) 2006 Timesys Corp., Thomas Gleixner <[EMAIL PROTECTED]> * + * PRIVATE futexes by Eric Dumazet + * Copyright (C) 2007 Eric Dumazet <[EMAIL PROTECTED]> + * * Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly * enough at me, Linus for the original (flawed) idea, Matthew * Kirkwood for proof-of-concept implementation. @@ -199,9 +202,12 @@ static inline int match_futex(union fute * Returns: 0, or negative error code. * The key words are stored in *key on success. * - * Should be called with >mm->mmap_sem but NOT any spinlocks. + * shared is NULL for PROCESS_PRIVATE futexes + * For other futexes, it points to >mm->mmap_sem and + * caller must have taken the reader lock. but NOT any spinlocks. */ -int get_futex_key(void __user *uaddr, union futex_key *key) +int get_futex_key(void __user *uaddr, union futex_key *key, + struct rw_semaphore *shared) Can we pass in something other than the rw_semaphore here? Seeing as it only actually gets used as a flag, it might be nicer just to pass a 0 or 1? And all through the call stack... Did the whole thing just turn out neater when you passed the rwsem? We always know to use current->mm->mmap_sem, so it doesn't seem like a boolean flag would hurt? { unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current->mm; @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un address -= key->both.offset; /* +* PROCESS_PRIVATE futexes are fast. +* As the mm cannot disappear under us and the 'key' only needs +* virtual address, we dont even have to find the underlying vma. 
+* Note : We do have to check 'address' is a valid user address, +*but access_ok() should be faster than find_vma() +* Note : At this point, address points to the start of page, +*not the real futex address, this is ok. +*/ + if (!shared) { + if (!access_ok(VERIFY_WRITE, address, sizeof(int))) + return -EFAULT; Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? + key->private.mm = mm; + key->private.address = address; + return 0; + } + /* * The futex is hashed differently depending on whether * it's in a shared or private mapping. So check vma first. */ @@ -244,6 +266,7 @@ int get_futex_key(void __user *uaddr, un * mappings of _writable_ handles. */ if (likely(!(vma->vm_flags & VM_MAYSHARE))) { + key->both.offset += FUT_OFF_MMSHARED; /* reference taken on mm */ key->private.mm = mm; key->private.address = address; return 0; @@ -253,7 +276,7 @@ int get_futex_key(void __user *uaddr, un * Linear file mappings are also simple. */ key->shared.inode = vma->vm_file->f_path.dentry->d_inode; - key->both.offset++; /* Bit 0 of offset indicates inode-based key. */ + key->both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma->vm_flags & VM_NONLINEAR))) { key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) @@ -281,17 +304,19 @@ EXPORT_SYMBOL_GPL(get_futex_key); * Take a reference to the resource addressed by a key. * Can be called while holding spinlocks. * - * NOTE: mmap_sem MUST be held between get_futex_key() and calling this - * function, if it is called at all. mmap_sem keeps key->shared.inode valid. 
*/ inline void get_futex_key_refs(union futex_key *key) { - if (key->both.ptr != 0) { - if (key->both.offset & 1) + if (key->both.ptr == 0) + return; + switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) { + case FUT_OFF_INODE: atomic_inc(>shared.inode->i_count); - else + break; + case FUT_OFF_MMSHARED: atomic_inc(>private.mm->mm_count); - } + break; + } } EXPORT_SYMBOL_GPL(get_futex_key_refs); @@ -301,11 +326,15 @@ EXPORT_SYMBOL_GPL(get_futex_key_refs); */ void drop_futex_key_refs(union futex_key *key) { - if (key->both.ptr != 0) { -
Re: [PATCH] x86_64/acpi: make kernel to be compiled when CONFIG_ACPI_NUMA is set and power management with acpi is not enabled
On Tue, 3 Apr 2007 21:02:03 -0700 "Yinghai Lu" <[EMAIL PROTECTED]> wrote: > [PATCH] x86_64/acpi: make kernel to be compiled when CONFIG_ACPI_NUMA is set > and power management with acpi is not enabled > > when CONFIG_ACPI_NUMA is set, and power management with acpi is not used. the > kernel can not be compiled. > so use CONFIG_ACPI_POWER and CONFIG_ACPI_SYTEM to comment function about > set/get power and event. > > Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c > index dd49ea0..4d06885 100644 > --- a/drivers/acpi/bus.c > +++ b/drivers/acpi/bus.c > @@ -121,6 +121,7 @@ int acpi_bus_get_status(struct acpi_device *device) > > EXPORT_SYMBOL(acpi_bus_get_status); > > +#ifdef CONFIG_ACPI_POWER > /* -- > Power Management > > -- */ > @@ -269,7 +270,9 @@ int acpi_bus_set_power(acpi_handle handle, int state) > } > > EXPORT_SYMBOL(acpi_bus_set_power); > +#endif > > +#ifdef CONFIG_ACPI_SYSTEM > /* -- > Event Management > > -- */ > @@ -358,6 +361,7 @@ int acpi_bus_receive_event(struct acpi_bus_event *event) > } > > EXPORT_SYMBOL(acpi_bus_receive_event); > +#endif > > /* -- > Notification Handling > diff --git a/drivers/net/e1000/e1000_param.c b/drivers/net/e1000/e1000_param.c > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c > index a064f36..c2a1ac9 100644 > --- a/drivers/pci/pci-acpi.c > +++ b/drivers/pci/pci-acpi.c > @@ -255,7 +255,7 @@ static int acpi_pci_choose_state(struct pci_dev *pdev, > pm_message_t state) > > return -ENODEV; > } > - > +#ifdef CONFIG_ACPI_POWER > static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state) > { > acpi_handle handle = DEVICE_ACPI_HANDLE(>dev); > @@ -272,7 +272,7 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, > pci_power_t state) > return -ENODEV; > return acpi_bus_set_power(handle, acpi_state); > } > - > +#endif > > /* ACPI bus type */ > static int acpi_pci_find_device(struct device *dev, acpi_handle *handle) > @@ -321,7 +321,9 @@ static int 
__init acpi_pci_init(void)
>  	if (ret)
>  		return 0;
>  	platform_pci_choose_state = acpi_pci_choose_state;
> +#ifdef CONFIG_ACPI_POWER
>  	platform_pci_set_power_state = acpi_pci_set_power_state;
> +#endif
>  	return 0;
>  }
>  arch_initcall(acpi_pci_init);

This is a rather unpleasing patch from a maintainability point of view - all those ifdefs do cause various problems.

I wonder if the situation could be improved by something like:

- Move acpi_bus_set_power() and acpi_bus_get_power() into power.c, which is only compiled if CONFIG_ACPI_POWER.

- Move acpi_bus_generate_event() and acpi_bus_receive_event() and their associated global variables into event.c, which is only compiled if CONFIG_ACPI_SYSTEM.

- Move acpi_pci_set_power_state() into power.c

- Move the initialisation of platform_pci_set_power_state into acpi_power_init() (this will have runtime effects - changed startup ordering)

Of course, making these changes might require some adjustments elsewhere - some symbols might need to be made global, others maybe can become newly static, etc.

The primary aim should be to keep the code _logical_. If we think that the above code motion reduces ifdefs, but makes the overall code layout less logical, then we shouldn't do it. But if the code remains at least equally logical afterwards, and we can reduce the ifdeffing then we should do it.
Re: [patch 1/3] epoll cleanups - epoll include diet ...
On Thu, 5 Apr 2007, Andrew Morton wrote:

> epoll uses signal stuff and might need signal.h. It implements syscalls
> and it certainly needs to have those syscall's prototypes in scope. It
> surely uses stuff from mm.h (doesn't everything??)

Ack about signal.h, I forgot about the pwait code :(
Why syscalls.h? The eventpoll.c file exports syscalls, but it doesn't use anything declared in there.
What does eventpoll.c use *directly* from mm.h? If eventpoll.c uses, let's say, sched.h, and sched.h needs mm.h, it is sched.h's responsibility to include the mm.h file, not eventpoll.c's.

- Davide
Re: [PATCH] Define EFLAGS_IF
Andi Kleen wrote:
>
> No processor.h is such a hodgepodge of unrelated stuff that any
> splitting up is a good thing.
>

Fair enough. However, I'd still like to see the X86_CR* constants moved, too (and constants added for at least CR0 as well.)

	-hpa
Re: init's children list is long and slows reaping children.
Chris Snook wrote:
> Linus Torvalds wrote:
>> On Thu, 5 Apr 2007, Robin Holt wrote:
>>> For testing, Jack Steiner created the following patch. All it does
>>> is move tasks which are transitioning to the zombie state from where
>>> they are in the children list to the head of the list. In this way,
>>> they will be the first found and reaping does speed up. We will still
>>> do a full scan of the list once the rearranged tasks are all removed.
>>> This does not seem to be a significant problem.
>>
>> I'd almost prefer to just put the zombie children on a separate list. I
>> wonder how painful that would be..
>>
>> That would still make it expensive for people who use WUNTRACED to get
>> stopped children (since they'd have to look at all lists), but maybe
>> that's not a big deal.
>
> Shouldn't be any worse than it already is.
>
>> Another thing we could do is to just make sure that kernel threads
>> simply don't end up as children of init. That whole thing is silly,
>> they're really not children of the user-space init anyway. Comments?
>>
>> 		Linus
>
> Does anyone remember why we started doing this in the first place? I'm
> sure there are some tools that expect a process tree, rather than a
> forest, and making it a forest could make them unhappy.
>
> The support angel on my shoulder says we should just put all the kernel
> threads under a kthread subtree to shorten init's child list and
> minimize impact.
>
> The hacker devil on my other shoulder says that with usermode helpers,
> containers, etc. it's about time we treat it as a tree, and any tools
> that have a problem with that need to be fixed.
>
> 	-- Chris

Err, that should have been "about time we treat it as a forest".

	-- Chris
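For readers trying to picture the move-to-head idea quoted above, a simplified userspace sketch. It assumes a plain doubly linked list; struct child, mark_zombie and reap_one are invented names, and none of this is the kernel's task_struct or list_head code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task on its parent's children list. */
struct child {
	int pid;
	int zombie;
	struct child *prev, *next;
};

struct child_list {
	struct child *head;
};

/* Detach c from the list, fixing up the head if c was first. */
void list_unlink(struct child_list *l, struct child *c)
{
	if (c->prev)
		c->prev->next = c->next;
	else
		l->head = c->next;
	if (c->next)
		c->next->prev = c->prev;
	c->prev = c->next = NULL;
}

/* Insert c at the head of the list. */
void list_push_front(struct child_list *l, struct child *c)
{
	c->prev = NULL;
	c->next = l->head;
	if (l->head)
		l->head->prev = c;
	l->head = c;
}

/* On exit: flag the child and move it to the front of the list. */
void mark_zombie(struct child_list *l, struct child *c)
{
	c->zombie = 1;
	list_unlink(l, c);
	list_push_front(l, c);
}

/*
 * Reap the first zombie found. *visited reports how many entries were
 * scanned, which is the cost being discussed in this thread.
 * Returns the reaped pid, or -1 after a full scan with no zombie.
 */
int reap_one(struct child_list *l, int *visited)
{
	struct child *c;

	*visited = 0;
	for (c = l->head; c; c = c->next) {
		(*visited)++;
		if (c->zombie) {
			int pid = c->pid;

			list_unlink(l, c);
			return pid;
		}
	}
	return -1;
}
```

In this toy version a zombie is found after visiting one entry, while a wait with nothing left to reap still walks the whole list, which matches the "full scan once the rearranged tasks are all removed" remark in the quoted description.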
Ten percent test
On Thursday 05 April 2007 21:54, Ingo Molnar wrote:
> - fiftyp.c: noticeable, but a lot better than previously!

fiftyp.c seems to have been stumbled across by accident as having an effect when Xenofon was trying to recreate Mike's 50% x 3 test case. I suggest a ten percent version like the following would be more useful as a test for the harmful effect discovered in fiftyp.c. (/me throws in obligatory code style change).

Starts 15 processes that sleep ten times longer than they run. Change forks to 15 times the number of cpus you have and it should work on any size hardware.

-- 
-ck

// gcc -O2 -o tenp tenp.c -lrt
// code from interbench.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <sys/types.h>

/*
 * Start $forks processes that run for 10% cpu time each. Set this to
 * 15 * number of cpus for best effect.
 */
int forks = 15;

unsigned long run_us = 1000000000, sleep_us;
unsigned long loops_per_ms;

void terminal_error(const char *name)
{
	fprintf(stderr, "\n");
	perror(name);
	exit (1);
}

/* Current wall-clock time in nanoseconds */
unsigned long long get_nsecs(struct timespec *myts)
{
	if (clock_gettime(CLOCK_REALTIME, myts))
		terminal_error("clock_gettime");
	return ((unsigned long long)myts->tv_sec * 1000000000ULL +
		myts->tv_nsec);
}

void burn_loops(unsigned long loops)
{
	unsigned long i;

	/*
	 * We need some magic here to prevent the compiler from optimising
	 * this loop away. Otherwise trying to emulate a fixed cpu load
	 * with this loop will not work.
	 */
	for (i = 0 ; i < loops ; i++)
		asm volatile("" : : : "memory");
}

/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
	unsigned long ms_loops;

	ms_loops = loops_per_ms / 1000 * usecs;
	burn_loops(ms_loops);
}

void microsleep(unsigned long long usecs)
{
	struct timespec req, rem;

	rem.tv_sec = rem.tv_nsec = 0;

	req.tv_sec = usecs / 1000000;
	req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
	if ((nanosleep(&req, &rem)) == -1) {
		if (errno == EINTR) {
			/* Interrupted: sleep for whatever time remains */
			if (rem.tv_sec || rem.tv_nsec) {
				req.tv_sec = rem.tv_sec;
				req.tv_nsec = rem.tv_nsec;
				goto continue_sleep;
			}
			goto out;
		}
		terminal_error("nanosleep");
	}
out:
	return;
}

/*
 * In an unoptimised loop we try to benchmark how many meaningless loops
 * per second we can perform on this hardware to fairly accurately
 * reproduce certain percentage cpu usage
 */
void calibrate_loop(void)
{
	unsigned long long start_time, loops_per_msec, run_time = 0,
		min_run_us = run_us;
	unsigned long loops;
	struct timespec myts;
	int i;

	printf("Calibrating loop\n");
	loops_per_msec = 1000000;
redo:
	/* Calibrate to within 1% accuracy (run_time is in nanoseconds) */
	while (run_time > 1010000 || run_time < 990000) {
		loops = loops_per_msec;
		start_time = get_nsecs(&myts);
		burn_loops(loops);
		run_time = get_nsecs(&myts) - start_time;
		loops_per_msec = (1000000 * loops_per_msec / run_time ? :
			loops_per_msec);
	}

	/* Rechecking after a pause increases reproducibility */
	microsleep(1);
	loops = loops_per_msec;
	start_time = get_nsecs(&myts);
	burn_loops(loops);
	run_time = get_nsecs(&myts) - start_time;

	/* Tolerate 5% difference on checking */
	if (run_time > 1050000 || run_time < 950000)
		goto redo;
	loops_per_ms = loops_per_msec;

	printf("Calibrating sleep interval\n");
	microsleep(1);

	/* Find the smallest time interval close to 1ms that we can sleep */
	for (i = 0; i < 100; i++) {
		start_time = get_nsecs(&myts);
		microsleep(1000);
		run_time = get_nsecs(&myts) - start_time;
		run_time /= 1000;
		if (run_time < run_us && run_us > 1000)
			run_us = run_time;
	}

	/* Then set run_us to that duration and sleep_us to 9 x that */
	sleep_us = run_us * 9;

	printf("Calibrating run interval\n");
	microsleep(1);

	/* Do a few runs to see what really gets us run_us runtime */
	for (i = 0; i < 100; i++) {
		start_time = get_nsecs(&myts);
		burn_usecs(run_us);
		run_time = get_nsecs(&myts) - start_time;
		run_time /= 1000;
		if (run_time < min_run_us && run_time > run_us)
			min_run_us = run_time;
	}
	if (min_run_us < run_us)
		run_us = run_us * run_us / min_run_us;

	printf("Each fork will run for %lu usecs and sleep for %lu usecs\n",
		run_us, sleep_us);
}

int main(void)
{
	int i;

	calibrate_loop();
	printf("starting %d forks\n", forks);

	/* The parent plus forks - 1 children all fall into the loop below */
	for (i = 1; i < forks; i++) {
		if (!fork())
			break;
	}
	while (1) {
		burn_usecs(run_us);
		microsleep(sleep_us);
	}
	return 0;
}
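The advice to set forks to 15 times the number of cpus could be automated rather than edited by hand. A minimal sketch, assuming a libc that provides the common _SC_NPROCESSORS_ONLN sysconf() extension (it is not strictly POSIX); forks_for_cpus() and default_forks() are hypothetical helpers and not part of tenp.c:

```c
#include <unistd.h>

/*
 * Hypothetical helper, not part of tenp.c: 15 processes per cpu,
 * treating a failed or nonsensical cpu count as a single cpu.
 */
static int forks_for_cpus(long ncpus)
{
	if (ncpus < 1)
		ncpus = 1;
	return (int)(15 * ncpus);
}

/* Ask the system how many cpus are online and scale from that. */
static int default_forks(void)
{
	return forks_for_cpus(sysconf(_SC_NPROCESSORS_ONLN));
}
```

main() could then do "forks = default_forks();" before calling calibrate_loop().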
Re: [patch 1/3] epoll cleanups - epoll include diet ...
On Tue, 03 Apr 2007 18:35:06 -0700 Davide Libenzi wrote:

> Remove some unneeded include files from epoll code.
>

Our definitions of "unneeded" might differ.

>
> Signed-off-by: Davide Libenzi
>
>
> - Davide
>
>
>
> Index: linux-2.6.21-rc5.mm4/fs/eventpoll.c
> ===
> --- linux-2.6.21-rc5.mm4.orig/fs/eventpoll.c 2007-04-03 17:59:54.0 -0700
> +++ linux-2.6.21-rc5.mm4/fs/eventpoll.c 2007-04-03 18:33:30.0 -0700
> @@ -1,6 +1,6 @@
>  /*
> - * fs/eventpoll.c ( Efficent event polling implementation )
> - * Copyright (C) 2001,...,2006 Davide Libenzi
> + * fs/eventpoll.c (Efficent event notification implementation)
> + * Copyright (C) 2001,...,2007 Davide Libenzi
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
> @@ -17,30 +17,21 @@
>  #include
>  #include
>  #include
> -#include
>  #include
> -#include
>  #include
>  #include
>  #include
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include
>  #include
> -#include
> -#include
>  #include
>  #include
>  #include
> -#include
> -#include
> -#include
>  #include
> -#include

epoll uses signal stuff and might need signal.h. It implements syscalls and it certainly needs to have those syscalls' prototypes in scope. It surely uses stuff from mm.h (doesn't everything??)

I am suspecting that this patch relies upon accidental nested inclusions from within other headers. But that is super-fragile: change a config item, switch to a different architecture and whoops, it doesn't compile any more.

Maybe I'm wrong, and you somehow worked out that none of the things which these headers define, and none of the things which these headers' includees define, is used in epoll.c or in the headers which are included after these headers, or in those headers' includees. If so, how the heck did you do that?
Re: [linux-usb-devel] [RFC] HID bus design overview.
Dmitry Torokhov wrote:
>> +static void hid_bus_release(struct device *dev)
>> +{
>> +}
>> +
>> +struct device hid_bus = {
>> +.bus_id = "hidbus0",
>> +.release = hid_bus_release
>> +};
>> +
>> +static void hid_dev_release(struct device *dev)
>> +{
>> +}
>> +
>>
>
> That will for sure raise Greg KH's blood pressure ;)
>

I understand what you mean now. The entire hid_bus device is useless. The original HID bus code was copied from LDD3e, and it seems the API has changed since the book went to press. In fact, the new kernel only works quietly without that device; with it, kref_get() warns.

I also fixed the double hidinput_disconnect() problem last night. The cause is not an invalid memory access; it is the normal behavior of hidinput_disconnect(). The fix is easy: the inputs member should live in hid_device, not in hid_driver. Then removing one hid_device disconnects only that device, not every device its driver is bound to.

Now usbhid works fine.

Good luck.

- Li Yu
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Yes, thought about several solutions, and this seems the best. But it
>> requires a new paravirt-op.
>
> Not with the power of multiplexing. Something like this, perhaps?

Throw it in the queue; I'll slide in after it.

Zach
Re: init's children list is long and slows reaping children.
Linus Torvalds wrote:
> On Thu, 5 Apr 2007, Robin Holt wrote:
>> For testing, Jack Steiner created the following patch. All it does
>> is move tasks which are transitioning to the zombie state from where
>> they are in the children list to the head of the list. In this way,
>> they will be the first found and reaping does speed up. We will still
>> do a full scan of the list once the rearranged tasks are all removed.
>> This does not seem to be a significant problem.
>
> I'd almost prefer to just put the zombie children on a separate list.
> I wonder how painful that would be..
>
> That would still make it expensive for people who use WUNTRACED to get
> stopped children (since they'd have to look at all lists), but maybe
> that's not a big deal.

Shouldn't be any worse than it already is.

> Another thing we could do is to just make sure that kernel threads
> simply don't end up as children of init. That whole thing is silly,
> they're really not children of the user-space init anyway.
>
> Comments?
>
> Linus

Does anyone remember why we started doing this in the first place? I'm sure there are some tools that expect a process tree, rather than a forest, and making it a forest could make them unhappy.

The support angel on my shoulder says we should just put all the kernel threads under a kthread subtree to shorten init's child list and minimize impact.

The hacker devil on my other shoulder says that with usermode helpers, containers, etc. it's about time we treat it as a tree, and any tools that have a problem with that need to be fixed.

-- Chris
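The move-to-head idea from the quoted test patch can be modelled in user space. This is only a rough sketch under stated assumptions: struct node, struct task and every function below are hypothetical stand-ins, not the kernel's list_head API or Jack Steiner's actual patch. It shows why the reaper finds a zombie after examining a single entry once the zombie has been moved to the front.

```c
#include <stddef.h>

/* Minimal circular doubly linked list (hypothetical, user-space only). */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct task {
	struct node sibling;	/* first member, so a node pointer casts to a task */
	int pid;
	int zombie;
};

/* A child became a zombie: move it to the front of its parent's list. */
static void mark_zombie(struct task *t, struct node *children)
{
	t->zombie = 1;
	list_del_node(&t->sibling);
	list_add_head(&t->sibling, children);
}

/* Reaper: return the first zombie and count the entries examined. */
static struct task *find_zombie(struct node *children, int *scanned)
{
	struct node *n;

	*scanned = 0;
	for (n = children->next; n != children; n = n->next) {
		struct task *t = (struct task *)n;

		(*scanned)++;
		if (t->zombie)
			return t;
	}
	return NULL;
}
```

With no zombies present the scan still walks the whole list, which matches the "full scan once the rearranged tasks are all removed" remark in the quoted text.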
Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
On Thu, 2007-04-05 at 17:15 -0700, David Miller wrote:
> This won't work I believe.
>
> There are cases that use smaller sense buffers than the minimum
> specified by the SCSI layer.
>
> One example is that do_sr_ioctl() stuff when the cgc passed
> in has a sense buffer. That will only be as large as a
> "struct request_sense".
>
> I'm pretty sure that's one of the reasons why we cons up a local sense
> buffer in this EH code.
>
> So we could walk past the end of that and corrupt memory with
> your patch.

That should be fine ... the application copies the sense out of scmnd->sense_buffer ... it can take as much or as little as it wants (sense_buffer is actually a SCSI_SENSE_BUFFERSIZE array inside the command).

There was one thing I missed, which is that the sense buffer size of the command is 252, whereas I need to set it back down to sizeof(scmnd->sense_buffer).

This is another area where we "could do better" ... the request actually gives us a sense buffer, but we use our own and later copy data out of it back into the request.

James
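The "as much or as little as it wants" point comes down to a bounded copy: the consumer never reads more than the command's sense buffer holds and never writes more than its own buffer holds. A minimal user-space sketch of that pattern follows; copy_sense() is a hypothetical name and is not a function from the SCSI layer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy sense data into a caller-supplied buffer,
 * clamped to the smaller of the two sizes. Returns the bytes copied.
 */
static size_t copy_sense(unsigned char *dst, size_t dst_len,
			 const unsigned char *src, size_t src_len)
{
	size_t n = dst_len < src_len ? dst_len : src_len;

	memcpy(dst, src, n);
	return n;
}
```

A caller with a small buffer (such as one sized for a struct request_sense) gets a truncated copy instead of an overrun.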
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> Yes, thought about several solutions, and this seems the best. But it
> requires a new paravirt-op.

Not with the power of multiplexing. Something like this, perhaps?

J

diff -r 5be4a5ff8e6b arch/i386/mm/highmem.c
--- a/arch/i386/mm/highmem.c	Thu Apr 05 17:04:04 2007 -0700
+++ b/arch/i386/mm/highmem.c	Thu Apr 05 17:50:46 2007 -0700
@@ -42,6 +42,8 @@ void *kmap_atomic_prot(struct page *page
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
+
+	arch_flush_lazy_mmu_mode();
 
 	return (void*) vaddr;
 }
diff -r 5be4a5ff8e6b include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h	Thu Apr 05 17:04:04 2007 -0700
+++ b/include/asm-generic/pgtable.h	Thu Apr 05 17:50:46 2007 -0700
@@ -180,6 +180,7 @@ static inline void ptep_set_wrprotect(st
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 #define arch_enter_lazy_mmu_mode()	do {} while (0)
 #define arch_leave_lazy_mmu_mode()	do {} while (0)
+#define arch_flush_lazy_mmu_mode()	do {} while (0)
 #endif
 
 /*
@@ -193,6 +194,7 @@ static inline void ptep_set_wrprotect(st
 #ifndef __HAVE_ARCH_ENTER_LAZY_CPU_MODE
 #define arch_enter_lazy_cpu_mode()	do {} while (0)
 #define arch_leave_lazy_cpu_mode()	do {} while (0)
+#define arch_flush_lazy_cpu_mode()	do {} while (0)
 #endif
 
 /*
diff -r 5be4a5ff8e6b include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h	Thu Apr 05 17:04:04 2007 -0700
+++ b/include/asm-i386/paravirt.h	Thu Apr 05 17:50:46 2007 -0700
@@ -27,9 +27,10 @@ struct desc_struct;
 /* Lazy mode for batching updates / context switch */
 enum paravirt_lazy_mode {
-	PARAVIRT_LAZY_NONE = 0,
-	PARAVIRT_LAZY_MMU = 1,
-	PARAVIRT_LAZY_CPU = 2,
+	PARAVIRT_LAZY_NONE = 0,	/* exit lazy mode */
+	PARAVIRT_LAZY_MMU = 1,	/* lazy mmu updates */
+	PARAVIRT_LAZY_CPU = 2,	/* lazy cpu state updates */
+	PARAVIRT_LAZY_FLUSH = 3,/* flush pending changes, if any */
 };
 
 struct paravirt_ops
@@ -1044,6 +1045,10 @@ static inline void arch_leave_lazy_cpu_m
 {
 	PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_NONE);
 }
+static inline void arch_flush_lazy_cpu_mode(void)
+{
+	PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_FLUSH);
+}
 
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 static inline void arch_enter_lazy_mmu_mode(void)
@@ -1053,6 +1058,10 @@ static inline void arch_leave_lazy_mmu_m
 static inline void arch_leave_lazy_mmu_mode(void)
 {
 	PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_NONE);
+}
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+	PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_FLUSH);
 }
 
 void _paravirt_nop(void);
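The patch only adds the new mode value; what a backend does with it is not shown in this thread. As a rough illustration of the "multiplexing" idea, here is a user-space model in which a flush request drains queued updates but leaves the current lazy mode untouched. Everything in it (struct lazy_state, queue_update(), this set_lazy_mode()) is a hypothetical stand-in, not VMI or Xen backend code.

```c
/* Mirrors the values of enum paravirt_lazy_mode in the patch. */
enum lazy_mode {
	LAZY_NONE = 0,	/* exit lazy mode */
	LAZY_MMU = 1,	/* lazy mmu updates */
	LAZY_CPU = 2,	/* lazy cpu state updates */
	LAZY_FLUSH = 3,	/* flush pending changes, if any */
};

/* Hypothetical per-cpu backend state. */
struct lazy_state {
	enum lazy_mode mode;	/* current batching mode */
	int pending;		/* updates queued but not yet issued */
	int issued;		/* updates actually sent to the hypervisor */
};

/* An update is batched while in a lazy mode, issued at once otherwise. */
static void queue_update(struct lazy_state *s)
{
	if (s->mode == LAZY_NONE)
		s->issued++;
	else
		s->pending++;
}

static void flush_pending(struct lazy_state *s)
{
	s->issued += s->pending;
	s->pending = 0;
}

/* Every call drains the queue; only a real mode value changes the mode. */
static void set_lazy_mode(struct lazy_state *s, enum lazy_mode mode)
{
	flush_pending(s);
	if (mode != LAZY_FLUSH)
		s->mode = mode;
}
```

In this model kmap_atomic's flush makes the pte write visible immediately, and the caller that entered lazy MMU mode keeps batching afterwards.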
Re: [PATCH] Define EFLAGS_IF
On Thu, Apr 05, 2007 at 05:29:52PM -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> >
> >That patch got dropped, and replaced by one which pulled all the flags
> >definitions out of
> >
>
> Saw that a little too late :)
>
> In general, it would be nice if the various CPU constants were all
> defined in one place, so I'd rather suggest protecting the appropriate
> parts of asm/processor.h with #ifndef __ASSEMBLY__.

No processor.h is such a hodgepodge of unrelated stuff that any splitting up is a good thing.

-Andi
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> So the clean fix for this is still even further out. I don't think I
>> want to hook kmap/unmap as paravirt-ops.
>
> Yes, it seems like overkill.
>
> How about something like adding PARAVIRT_LAZY_FLUSH as an argument to
> set_lazy_mode? It would be valid to use at any time, and it would flush
> any pending work while still remaining in whatever lazy mode it's
> currently in. That way kmap_atomic can flush anything pending without
> having to muck around with the current lazy state.

Yes, thought about several solutions, and this seems the best. But it requires a new paravirt-op.

Zach
Re: Reiser4. BEST FILESYSTEM EVER? I need help.
[EMAIL PROTECTED] wrote:
> Yeap, I guess that will probably work. And here I was trying to compile
> old versions of GRUB from namesys.com.
>
> By the way, do you think the benchmarks from:
> http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
> http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm are
> accurate?

Accurate, probably. Whether or not they're *relevant* is a totally different ball of wax.

-hpa
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> So the clean fix for this is still even further out. I don't think I
> want to hook kmap/unmap as paravirt-ops.

Yes, it seems like overkill.

How about something like adding PARAVIRT_LAZY_FLUSH as an argument to set_lazy_mode? It would be valid to use at any time, and it would flush any pending work while still remaining in whatever lazy mode it's currently in. That way kmap_atomic can flush anything pending without having to muck around with the current lazy state.

J
Re: Reiser4. BEST FILESYSTEM EVER? I need help.
Yeap, I guess that will probably work. And here I was trying to compile old versions of GRUB from namesys.com.

By the way, do you think the benchmarks from:
http://linuxhelp.150m.com/resources/fs-benchmarks.htm
and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm
are accurate?

.------------------------------.
| FILESYSTEM   | TIME   | DISK  |
| TYPE         | (secs) | USAGE |
.------------------------------.
| REISER4 lzo  |  1938  |  278  |
| REISER4 gzip |  2295  |  213  |
| REISER4      |  3462  |  692  |
| EXT2         |  4092  |  816  |
| JFS          |  4225  |  806  |
| EXT4         |  4408  |  816  |
| EXT3         |  4421  |  816  |
| XFS          |  4625  |  779  |
| REISER3      |  6178  |  793  |
| FAT32        | 12342  |  988  |
| NTFS-3g      | 10414  |  772  |
.------------------------------.

Column one measures the time taken to complete the bonnie++ benchmarking test (run with the parameters bonnie++ -n128:128k:0).

Column two, Disk Usage, measures the amount of disk used to store 655MB of raw data (which was 3 different copies of the Linux kernel sources).

Thanks for that,
John.

On Thu, 05 Apr 2007 17:23:23 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
> >
> > Anyway, I have patched the 2.6.20 kernel and have a partition formatted
> > with Reiser4.
> >
> > However, I am having trouble getting LILO or GRUB working (with
> > Reiser4).
> >
> > Could you guys who know all about this, help me, or point me to some
> > help.
> >
>
> Make your /boot a separate partition and format it as conservatively as
> possible (e.g. ext3, or even ext2.)
>
> Problem solved.
>
> -hpa

-- 
[EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Send your email first class
Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio
On Tue, 03 Apr 2007 19:46:04 +0900 Tomoki Sekiyama <[EMAIL PROTECTED]> wrote: > This patchset is to avoid the problem that write(2) can be blocked for a > long time if a system has several disks with different speed and is > under heavy I/O pressure. > > -Description of the problem: > While Dirty+Writeback pages get more than 40%(`dirty_ratio') of memory, > generators of dirty pages are blocked in balance_dirty_pages() until > they start writeback of a specific number (`write_chunk', typically=1536) > of dirty pages on the disks they write to. > > Under this rule, if a process writes to the disk which has only a few > (less than 1536) dirty pages, that process will be blocked until > writeback of the other disks is completed and % of Dirty+Writeback goes > below 40%. > > Thus, if a slow device (such as a USB disk) has many dirty pages, the > processes which write small data to the other disks can be blocked for > quite a long time. > > -Solution: > This patch introduces high/low-watermark algorithm in > balance_dirty_pages() in order to throttle only the processes which > write to disks with heavy load. > > This patch adds `dirty_start_writeback_ratio' for the low-watermark, > and modifies get_dirty_limits() to calculate and return the writeback > starting level of dirty pages based on `dirty_start_writeback_ratio'. > > If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of > dirty pages start writeback of dirty pages by themselves. At that time, > these processes are not blocked in balance_dirty_pages(), but they may > be blocked if the write-requests-queue of the written disk is full > (that is, the length of the queue > `nr_requests'). By this behavior, > we can throttle only processes which write to the disks with heavy load, > and can allow processes to write to the other disks without blocking. > > If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages > are throttled as current Linux does, not to fill up memory with dirty > pages. 
Does this actually solve the problem? If the request queue is sufficiently large (relative to the various dirty-memory thresholds) then I'd expect that a heavy-writer will be able to very quickly take the total dirty+writeback memory up to the dirty_ratio (should be renamed throttle_threshold, but it's too late for that).

I suspect the reason why this patch was successful in your testing was because dirty_start_writeback_ratio happens to exceed the size of the disk request queues, so the heavy writer is getting stuck on disk request queue exhaustion.

But that won't work if we have a lot of processes writing to a lot of disks, and it won't work if the request queue size is large, or if the dirty-memory thresholds are small (relative to the request queue size).

Do the patches still work after `echo 1 > /sys/block/sda/queue/nr_requests'?
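For readers following the argument, the high/low-watermark rule from the quoted patch description can be written as a three-way decision. This is a simplified model and not the patch's balance_dirty_pages() code: dirty_action() and its enum are hypothetical names, and the low-watermark value used in the example call is an arbitrary illustration (the thread only states the 40% high watermark).

```c
/* Hypothetical outcome of the watermark check for one writer. */
enum throttle_action {
	WRITE_FREELY,		/* below the low watermark: no action */
	START_WRITEBACK,	/* between watermarks: writer starts writeback
				 * itself and may block on a full request queue */
	BLOCK_WRITER,		/* above dirty_ratio: throttle as today */
};

/*
 * dirty_pages counts Dirty+Writeback pages; both ratios are percentages
 * of total_pages. start_ratio is the low watermark, dirty_ratio the high.
 */
static enum throttle_action
dirty_action(unsigned long dirty_pages, unsigned long total_pages,
	     unsigned int start_ratio, unsigned int dirty_ratio)
{
	unsigned long long low =
		(unsigned long long)total_pages * start_ratio / 100;
	unsigned long long high =
		(unsigned long long)total_pages * dirty_ratio / 100;

	if (dirty_pages > high)
		return BLOCK_WRITER;
	if (dirty_pages > low)
		return START_WRITEBACK;
	return WRITE_FREELY;
}
```

For example, dirty_action(36, 100, 35, 40) lands in the middle band. The objection above concerns that band: whether a writer there is actually slowed depends on the request queue filling up, which this model does not capture.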
Re: [PATCH] Define EFLAGS_IF
Jeremy Fitzhardinge wrote:
> That patch got dropped, and replaced by one which pulled all the flags
> definitions out of

Saw that a little too late :)

In general, it would be nice if the various CPU constants were all defined in one place, so I'd rather suggest protecting the appropriate parts of asm/processor.h with #ifndef __ASSEMBLY__.

-hpa
Re: Reiser4. BEST FILESYSTEM EVER? I need help.
[EMAIL PROTECTED] wrote:
> Anyway, I have patched the 2.6.20 kernel and have a partition formatted
> with Reiser4.
>
> However, I am having trouble getting LILO or GRUB working (with
> Reiser4).
>
> Could you guys who know all about this, help me, or point me to some
> help.

Make your /boot a separate partition and format it as conservatively as possible (e.g. ext3, or even ext2.)

Problem solved.

-hpa
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> No, they are totally dependent. The reason interrupts are disabled is
>> to stop kmap_atomic in interrupt handlers. With the kmap_atomic_pte
>> changes, the whole interrupt disable gibberish goes away.
>
> But kmap_atomic_pte is a special case of kmap_atomic for ptes.
> Interrupt routines can still use plain kmap_atomic for bouncebuffers
> and so on.

Ah, yes.

> A more general patch would be to make kmap/unmap_atomic pv_ops, and
> then they can all be rolled together. I.e: check the type to see if
> special pte handling needs to happen, etc.

So the clean fix for this is still even further out. I don't think I want to hook kmap/unmap as paravirt-ops.

Zach
Re: [PATCH] Define EFLAGS_IF
H. Peter Anvin wrote:
> Rusty Russell wrote:
>
>> There is now more than one place where we use the fact that bit 9 of
>> eflags is the interrupt-enabled flag, so define EFLAGS_IF. We make it
>> 512 so it can be used in asm, too.
>>
>
> How about defining all the other EFLAGS in one place?
>

That patch got dropped, and replaced by one which pulled all the flags definitions out of

J
Re: [PATCH] Define EFLAGS_IF
Rusty Russell wrote:
> There is now more than one place where we use the fact that bit 9 of
> eflags is the interrupt-enabled flag, so define EFLAGS_IF. We make it
> 512 so it can be used in asm, too.

How about defining all the other EFLAGS in one place?

-hpa
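The arithmetic behind the quoted description is simply that bit 9 has the value 512 (0x200). A small sketch of such a definition and a test against it follows; the EFLAGS_IF name and value come from the quoted text, while irqs_enabled() and the compile-time check are hypothetical additions for illustration, not Rusty's patch.

```c
#define EFLAGS_IF 512	/* bit 9 of eflags: interrupts enabled */

/* Compile-time check that the plain number really is bit 9. */
typedef char eflags_if_is_bit9[(EFLAGS_IF == (1 << 9)) ? 1 : -1];

/* Hypothetical helper: does a saved eflags value have interrupts on? */
static int irqs_enabled(unsigned long eflags)
{
	return (eflags & EFLAGS_IF) != 0;
}
```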
Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
From: James Bottomley <[EMAIL PROTECTED]>
Date: Thu, 05 Apr 2007 19:02:19 -0500

> On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> > From: Andrew Burgess <[EMAIL PROTECTED]>
> > Date: Thu, 5 Apr 2007 15:13:27 -0700
> >
> > > David, do you see any other problems with scsi_send_eh_cmnd?
> > >
> > > I've switched back to 2.6.18 which seems to not oops
> > > and am happy to try patches.
> >
> > Does 2.6.20 with my patch OOPS too? Does reverting my patch
> > make the oops go away?
> >
> > If reverting my patch makes the OOPS go away, we need to
> > verify if page_address() is returning crap for some reason
> > or the length is wrong.
>
> Assuming this does turn out to be the problem, we should just junk the
> page allocation ... it's completely unnecessary; when the slab allocated
> commands were done, we made sure the actual sense_buffer is at the
> correct location, so this should be the final fix:

This won't work I believe.

There are cases that use smaller sense buffers than the minimum specified by the SCSI layer.

One example is that do_sr_ioctl() stuff when the cgc passed in has a sense buffer. That will only be as large as a "struct request_sense".

I'm pretty sure that's one of the reasons why we cons up a local sense buffer in this EH code.

So we could walk past the end of that and corrupt memory with your patch.
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> No, they are totally dependent. The reason interrupts are disabled is
> to stop kmap_atomic in interrupt handlers. With the kmap_atomic_pte
> changes, the whole interrupt disable gibberish goes away.

But kmap_atomic_pte is a special case of kmap_atomic for ptes. Interrupt routines can still use plain kmap_atomic for bouncebuffers and so on.

A more general patch would be to make kmap/unmap_atomic pv_ops, and then they can all be rolled together. I.e: check the type to see if special pte handling needs to happen, etc.

J
Re: [PATCH] Bugfix for VMI paravirt ops
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Well at this point, the "proper" fix is dependent on Jeremy's
>> kmap_atomic_pte changes, which are definitely too late to pull into
>> 2.6.21. Can we just apply this patch please?
>
> Hm, I think they're independent aren't they? Your fix is about making
> lazy_mmu disable interrupts; that's independent of how highpte pages
> get mapped.

No, they are totally dependent. The reason interrupts are disabled is to stop kmap_atomic in interrupt handlers. With the kmap_atomic_pte changes, the whole interrupt disable gibberish goes away.

Zach
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> Well at this point, the "proper" fix is dependent on Jeremy's
> kmap_atomic_pte changes, which are definitely too late to pull into
> 2.6.21. Can we just apply this patch please?

Hm, I think they're independent aren't they? Your fix is about making lazy_mmu disable interrupts; that's independent of how highpte pages get mapped.

J
Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range
On Thu, 2007-04-05 at 11:44 +0100, Alan Hourihane wrote:
> Attached is a patch against 2.6.21-rc5 which adds the Intel Vermilion
> Range support.
>
> Intel funded Tungsten Graphics to do this work.
>
> If there's any problems or updates needed to be done to get accepted,
> please let me know.
>

Preferably, add sparse annotations and compile with make C=1. I've
included possible sparse annotations (the only ones I can see) below.

> +
> +struct cr_sys {
> +	struct vml_sys sys;
> +	struct pci_dev *mch_dev;
> +	struct pci_dev *lpc_dev;
> +	__u32 mch_bar;
> +	__u8 *mch_regs_base;

	void __iomem *mch_regs_base; (sparse)

> +	__u32 gpio_bar;
> +	__u32 saved_panel_state;
> +	__u32 saved_clock;
> +};
> +
>
> +static void crvml_panel_on(const struct vml_sys *sys)
> +{
> +	const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +	__u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> +	__u32 cur = inl(addr);
> +
> +	if (!(cur & CRVML_PANEL_ON)) {
> +		/* Make sure LVDS controller is down. */
> +		if (cur & 0x0001) {
> +			cur &= ~CRVML_LVDS_ON;
> +			outl(cur, addr);
> +		}
> +		/* Power up Panel */
> +		schedule_timeout(HZ / 10);
> +		cur |= CRVML_PANEL_ON;
> +		outl(cur, addr);
> +	}
> +
> +	/* Power up LVDS controller */
> +
> +	if (!(cur & CRVML_LVDS_ON)) {
> +		schedule_timeout(HZ / 10);
> +		outl(cur | CRVML_LVDS_ON, addr);
> +	}
> +}
> +
> +static void crvml_panel_off(const struct vml_sys *sys)
> +{
> +	const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +
> +	__u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> +	__u32 cur = inl(addr);
> +
> +	/* Power down LVDS controller first to avoid high currents */
> +	if (cur & CRVML_LVDS_ON) {
> +		cur &= ~CRVML_LVDS_ON;
> +		outl(cur, addr);
> +	}
> +	if (cur & CRVML_PANEL_ON) {
> +		schedule_timeout(HZ / 10);
> +		outl(cur & ~CRVML_PANEL_ON, addr);
> +	}
> +}
> +
> +static void crvml_backlight_on(const struct vml_sys *sys)
> +{
> +	const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +	__u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> +	__u32 cur = inl(addr);
> +
> +	if (cur & CRVML_BACKLIGHT_OFF) {
> +		cur &= ~CRVML_BACKLIGHT_OFF;
> +		outl(cur, addr);
> +	}
> +}
> +
> +static void crvml_backlight_off(const struct vml_sys *sys)
> +{
> +	const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +	__u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> +	__u32 cur = inl(addr);
> +
> +	if (!(cur & CRVML_BACKLIGHT_OFF)) {
> +		cur |= CRVML_BACKLIGHT_OFF;
> +		outl(cur, addr);
> +	}
> +}
>

Perhaps backlight_on/off and panel_on/off can be moved to the backlight
subsystem?

> +
>
> +static int crvml_sys_restore(struct vml_sys *sys)
> +{
> +	struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +	__u32 *clock_reg = (__u32 *) (crsys->mch_regs_base + CRVML_REG_CLOCK);

	__u32 __iomem *clock_reg = crsys->mch_regs_base + CRVML_REG_CLOCK; (sparse)

> +	__u32 cur = crsys->saved_panel_state;
> +
> +	if (cur & CRVML_BACKLIGHT_OFF) {
> +		crvml_backlight_off(sys);
> +	} else {
> +		crvml_backlight_on(sys);
> +	}
> +
> +	if (cur & CRVML_PANEL_ON) {
> +		crvml_panel_on(sys);
> +	} else {
> +		crvml_panel_off(sys);
> +		if (cur & CRVML_LVDS_ON) {
> +			;
> +			/* Will not power up LVDS controller while panel is off */
> +		}
> +	}
> +	iowrite32(crsys->saved_clock, clock_reg);
> +	ioread32(clock_reg);
> +
> +	return 0;
> +}
> +
> +static int crvml_sys_save(struct vml_sys *sys)
> +{
> +	struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +	__u32 *clock_reg = (__u32 *) (crsys->mch_regs_base + CRVML_REG_CLOCK);
> +

	__u32 __iomem *clock_reg = crsys->mch_regs_base + CRVML_REG_CLOCK; (sparse)

> +	crsys->saved_panel_state = inl(crsys->gpio_bar + CRVML_PANEL_PORT);
> +	crsys->saved_clock = ioread32(clock_reg);
> +
> +	return 0;
> +}
> +
> +static int crvml_nearest_index(const struct vml_sys *sys, int clock)
> +{
> +
> +	int i;
> +	int cur_index;
> +	int cur_diff;
> +	int diff;
> +
> +	cur_index = 0;
> +	cur_diff = clock - crvml_clocks[0];
> +	cur_diff = (cur_diff < 0) ? -cur_diff : cur_diff;
> +	for (i = 1; i < crvml_num_clocks; ++i) {
> +		diff = clock - crvml_clocks[i];
> +		diff = (diff < 0) ? -diff : diff;
> +		if (diff < cur_diff) {
> +			cur_index = i;
> +			cur_diff = diff;
> +		}
> +	}
> +	return cur_index;
> +}
> +
> +static int crvml_nearest_clock(const
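
The nearest-index search quoted above can be tried outside the kernel. The following is only a userspace sketch of the same linear scan: the table example_clocks[] and the names abs_diff() and nearest_index() are made up for illustration, since the real crvml_clocks[] values are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock table; the real crvml_clocks[] is not shown in the thread. */
static const int example_clocks[] = { 6750, 13500, 27000, 29700, 37125 };

/* Absolute difference of two ints (assumes the difference does not overflow). */
static int abs_diff(int a, int b)
{
	return (a > b) ? a - b : b - a;
}

/*
 * Return the index of the table entry closest to 'clock'.
 * The strict '<' comparison means that on a tie the earlier entry wins,
 * which matches the loop in the quoted crvml_nearest_index().
 */
static size_t nearest_index(const int *clocks, size_t n, int clock)
{
	size_t i;
	size_t best = 0;
	int best_diff;

	assert(n > 0);
	best_diff = abs_diff(clock, clocks[0]);
	for (i = 1; i < n; ++i) {
		int diff = abs_diff(clock, clocks[i]);

		if (diff < best_diff) {
			best = i;
			best_diff = diff;
		}
	}
	return best;
}
```

With this table, a request for 28000 selects index 2 (27000), and a request exactly halfway between two entries selects the lower index.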
Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> From: Andrew Burgess <[EMAIL PROTECTED]>
> Date: Thu, 5 Apr 2007 15:13:27 -0700
>
> > David, do you see any other problems with scsi_send_eh_cmnd?
> >
> > I've switched back to 2.6.18 which seems to not oops
> > and am happy to try patches.
>
> Does 2.6.20 with my patch OOPS too?  Does reverting my patch
> make the oops go away?
>
> If reverting my patch makes the OOPS go away, we need to
> verify if page_address() is returning crap for some reason
> or the length is wrong.

Assuming this does turn out to be the problem, we should just junk the
page allocation ... it's completely unnecessary; when the slab allocated
commands were done, we made sure the actual sense_buffer is at the
correct location, so this should be the final fix:

James

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index adb40f2..997532b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -18,12 +18,12 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
@@ -641,16 +641,8 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	memcpy(scmd->cmnd, cmnd, cmnd_size);

 	if (copy_sense) {
-		gfp_t gfp_mask = GFP_ATOMIC;
-
-		if (shost->hostt->unchecked_isa_dma)
-			gfp_mask |= __GFP_DMA;
-
-		sgl.page = alloc_page(gfp_mask);
-		if (!sgl.page)
-			return FAILED;
-		sgl.offset = 0;
-		sgl.length = 252;
+		sg_init_one(&sgl, scmd->sense_buffer,
+			    sizeof(scmd->sense_buffer));

 		scmd->sc_data_direction = DMA_FROM_DEVICE;
 		scmd->request_bufflen = sgl.length;
@@ -721,18 +713,6 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,


 	/*
-	 * Last chance to have valid sense data.
-	 */
-	if (copy_sense) {
-		if (!SCSI_SENSE_VALID(scmd)) {
-			memcpy(scmd->sense_buffer, page_address(sgl.page),
-			       sizeof(scmd->sense_buffer));
-		}
-		__free_page(sgl.page);
-	}
-
-
-	/*
 	 * Restore original data
 	 */
 	scmd->request_buffer = old_buffer;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Reiser4. BEST FILESYSTEM EVER? I need help.
Hi Ignatich,

After seeing the following benchmarks at
http://linuxhelp.150m.com/resources/fs-benchmarks.htm
and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

The Reiser4 benchmarks are so good, I have decided to try the Reiser4
filesystem.

.-------------------------------.
| FILESYSTEM   |  TIME  | DISK  |
| TYPE         | (secs) | USAGE |
|--------------+--------+-------|
| REISER4 lzo  |   1938 |   278 |
| REISER4 gzip |   2295 |   213 |
| REISER4      |   3462 |   692 |
| EXT2         |   4092 |   816 |
| JFS          |   4225 |   806 |
| EXT4         |   4408 |   816 |
| EXT3         |   4421 |   816 |
| XFS          |   4625 |   779 |
| REISER3      |   6178 |   793 |
| FAT32        |  12342 |   988 |
| NTFS-3g      |  10414 |   772 |
'-------------------------------'

Column one measures the time taken to complete the bonnie++ benchmarking
test (run with the parameters bonnie++ -n128:128k:0)

Column two, Disk Usage: measures the amount of disk used to store 655MB
of raw data (which was 3 different copies of the Linux kernel sources).

Anyway, I have patched the 2.6.20 kernel and have a partition formatted
with Reiser4.

However, I am having trouble getting LILO or GRUB working (with Reiser4).

Could you guys who know all about this, help me, or point me to some help.

Thanks a lot,
John.

On Fri, 06 Apr 2007 02:42:35 +0400, "Ignatich" <[EMAIL PROTECTED]> said:
> While trying to find the cause of problems with reiser4 in recent
> kernels I came across this.
>
> Incomplete write handling seem to be missing from reiser4_write_extent()
> thanks to reiser4-temp-fix.patch. Strangely, there is a patch by Edward
> Shishkin that should address that issue, but it is missing from -mm
> tree. Please check.
>
> Max
>
--
[EMAIL PROTECTED]

--
http://www.fastmail.fm - And now for something completely different
Re: 2.6.21-rc5 possible regression: KDE processes die silently (was: 2.6.21-rc3-mm2: KDE processes die while system is idle)
On Tuesday, 3 April 2007 01:06, Adrian Bunk wrote:
> On Sun, Apr 01, 2007 at 06:48:03PM +0200, Rafael J. Wysocki wrote:
> > On Sunday, 1 April 2007 17:21, Tilman Schmidt wrote:
> > > I'm sorry to say this has now happened with kernel 2.6.21-rc5, too.
> > > I started a kernel compilation in the evening and came back in the
> > > morning to find all KDE decorations gone. All processes normally
> > > running for a KDE session and labelled "[kinit]" in ps were gone
> > > but everything else was running fine, and the system was still
> > > usable via ssh. /var/log/kdm.log and /var/log/Xorg.0.log contained
> > > nothing remotely suspicious. /var/log/messages had two lines I
> > > never saw before:
> > >
> > > Mar 31 02:27:36 gx110 kernel: [153577.891443] ReiserFS: hda3: warning: vs-8115: get_num_ver: not directory or indirect item
> > > Mar 31 02:27:36 gx110 kernel: [153577.891559] ReiserFS: hda3: warning: vs-8115: get_num_ver: not directory or indirect item
> > >
> > > But those didn't appear on previous occurrences of the "dying KDE"
> > > problem so I guess they are not related.
> > >
> > > This is SUSE LINUX 10.0 (i586) running on a Dell OptiPlex GX110
> > > (Intel P3, 933 MHz, i810 chipset, 512 MB RAM, 60 GB ATA disk)
> > > % uname -a
> > > Linux gx110 2.6.21-rc5-noinitrd #1 PREEMPT Sat Mar 31 02:15:19 CEST 2007 i686 i686 i386 GNU/Linux
> > > % cat /proc/cmdline
> > > root=/dev/hda3 selinux=0 x11i=vesa video=intelfb:[EMAIL PROTECTED] nmi_watchdog=2 lapic 5
> > > Kernel configuration mostly-modular, based on standard SuSE kernel's
> > > /proc/config.gz, just compiling into the kernel everything I need to
> > > boot without an initrd and omitting some parts I'm not interested in.
> > > (.config attached.) What else might be relevant?
> > >
> > > Again, this is a Heisenbug, ie. it's not reproducible and invariably
> > > happens when I'm away from the machine. (Probably Murphy at work.)
> > > It's pretty rare: I have seen it four times on 2.6.21-rc3-mm2 and
> > > once on 2.6.21-rc5, on a machine which spends about equal amounts
> > > of time running the latest stable, rc, and mm kernels. OTOH, so far
> > > it hasn't ever happened with any 2.6.20 or earlier kernel. Nor have
> > > I seen it with 2.6.21-rc[1-4] or 2.6.21-rc4-mm* - but for the -rc4
> > > and -rc4-mm releases that's not conclusive as those have only been
> > > running for a very short time.
> >
> > I have a similar problem on x86_64 OpenSUSE 10.2, but it seems to happen
> > when a sound (eg. notification) is played while the display is suspended
> > (or "powered off").
>
> Is it easily reproducible and still present with the latest -git?
> If yes, can you bisect?
>
> > IMO it's a SUSE bug.
>
> We also have a report of KDE crashes on Debian [1].
> And just a few days ago a kernel bug kwin ran into was fixed [2].
>
> If the pattern is "works with 2.6.20 but does not work with 2.6.21-rc",
> then it's most likely a kernel regression.

Well, I'm not able to reproduce it with the current mainline, so let's hope
it's been fixed. :-)

Greetings,
Rafael
Re: [PATCH] Bugfix for VMI paravirt ops
Andi Kleen wrote:
> On Friday 06 April 2007 01:29:56 Zachary Amsden wrote:
>> I noticed this never got applied.  There was some feedback which I did
>> not include in this patch because I think it is inappropriate to touch
>> code outside vmi.c at this point for 2.6.21.
>
> I think it is. That is why I didn't apply it.

Well at this point, the "proper" fix is dependent on Jeremy's
kmap_atomic_pte changes, which are definitely too late to pull into
2.6.21.  Can we just apply this patch please?

Thanks,

Zach
2.6.21-rc5-mm4 initramfs Make Error
I built a version of 2.6.21-rc5-mm4 with an initramfs and it built OK
the first time.  Then I made changes (applied a Reiser4 patch) and
rebuilt, and got the following error:

zephyr linux # make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
:1356:2: warning: #warning syscall getcpu not implemented
:1360:2: warning: #warning syscall epoll_pwait not implemented
:1364:2: warning: #warning syscall lutimesat not implemented
:1380:2: warning: #warning syscall revokeat not implemented
:1384:2: warning: #warning syscall frevoke not implemented
  CHK     include/linux/compile.h
/usr/src/linux-2.6.21-rc5-mm4/usr/Makefile:41: *** target pattern contains no `%'.  Stop.
make: *** [usr] Error 2

I have this in the config:
CONFIG_INITRAMFS_SOURCE="/initramfs"

/initramfs is the directory where I build my initramfs, which is just a
busybox setup, very simple.

# rm usr/.initramfs_data.*
seems to make it go again.
--
Zan Lynx <[EMAIL PROTECTED]>
Re: [PATCH] Bugfix for VMI paravirt ops
On Thu, 05 Apr 2007 16:34:43 -0700 Zachary Amsden <[EMAIL PROTECTED]> wrote:

> > I noticed this never got applied.  There was some feedback which I did
> > not include in this patch because I think it is inappropriate to touch
> > code outside vmi.c at this point for 2.6.21.  Please apply; this patch
> > is needed as a bugfix in 2.6.21.  An updated version for 2.6.22 will
> > come later which has a nicer interface.
> >

There was a big foodfight last time you sent this out and I'd assumed that
there were still unresolved issues.  Or at least a general aura of
unhappiness.

I guess we merge it now, then (forget to) fix up those issues later on.

> Erm, stale patch, sorry.  This one instead.

yeah, that's the patch which has been in -mm for a week.
Re: [PATCH] Bugfix for VMI paravirt ops
On Friday 06 April 2007 01:29:56 Zachary Amsden wrote:
> I noticed this never got applied.  There was some feedback which I did
> not include in this patch because I think it is inappropriate to touch
> code outside vmi.c at this point for 2.6.21.

I think it is. That is why I didn't apply it.

-Andi
Re: [PATCH 1/2] Make page->private usable in compound pages V1
On Thu, 5 Apr 2007 15:36:51 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> If we add a new flag so that we can distinguish between the
> first page and the tail pages then we can avoid to use page->private
> in the first page. page->private == page for the first page, so there
> is no real information in there.
>
> Freeing up page->private makes the use of compound pages more transparent.
> They become more usable like real pages. Right now we have to be careful f.e.
> if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
> can then no longer use the private field. This is one of the issues that
> cause us not to support debugging for page size slabs in SLAB.
>
> Having page->private available for SLUB would allow more meta information
> in the page struct. I can probably avoid the 16 bit ints that I have in
> there right now.
>
> Also if page->private is available then a compound page may be equipped
> with buffer heads. This may free up the way for filesystems to support
> larger blocks than page size.
>
> We add PageTail as an alias of PageReclaim. Compound pages cannot
> currently be reclaimed. Because of the alias one needs to check
> PageCompound first.

So slub is using compound pages so that it can locate the head page in
higher-order pages, whereas slab uses per-object (or per-order-0-page?)
metadata for that?

I see four instances of

+	page = virt_to_page(p);
+
+	if (unlikely(PageCompound(page)))
+		page = page->first_page;

A new virt_to_head_page() is needed.

Sigh.  We're seeing rather a lot of churn to accommodate slub.  Do we
actually have any justification for all this?  If we end up deciding to
merge slub and to deprecate then remove slab, what would our reasons have
been?
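
To make the virt_to_head_page() suggestion concrete, here is a userspace toy model, not kernel code. Everything prefixed toy_ is invented for this sketch, and it assumes the helper would test a tail flag in the spirit of the proposed PageTail; the snippet quoted above tests PageCompound instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_PAGES   8

/* Toy stand-in for struct page: only what this sketch needs. */
struct toy_page {
	bool tail;                    /* models the proposed PageTail() */
	struct toy_page *first_page;  /* valid only when 'tail' is set */
};

static unsigned char toy_mem[TOY_NR_PAGES << TOY_PAGE_SHIFT];
static struct toy_page toy_map[TOY_NR_PAGES];

/* Models virt_to_page(): map an address inside toy_mem to its toy_page. */
static struct toy_page *toy_virt_to_page(const void *p)
{
	size_t off = (size_t)((const unsigned char *)p - toy_mem);

	assert(off < sizeof(toy_mem));
	return &toy_map[off >> TOY_PAGE_SHIFT];
}

/* The helper asked for above: one place that hides the head-page lookup. */
static struct toy_page *toy_virt_to_head_page(const void *p)
{
	struct toy_page *page = toy_virt_to_page(p);

	if (page->tail)
		page = page->first_page;
	return page;
}

/* Mark 2^order pages starting at index 'first' as one compound page. */
static void toy_make_compound(size_t first, unsigned int order)
{
	size_t i;
	size_t n = (size_t)1 << order;

	assert(first + n <= TOY_NR_PAGES);
	toy_map[first].tail = false;
	toy_map[first].first_page = NULL;
	for (i = 1; i < n; i++) {
		toy_map[first + i].tail = true;
		toy_map[first + i].first_page = &toy_map[first];
	}
}
```

Callers then replace the four open-coded copies with a single call, and the head page keeps its private field free because only tail pages carry the back pointer.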
Re: [PATCH] Bugfix for VMI paravirt ops
Zachary Amsden wrote:
> I noticed this never got applied.  There was some feedback which I did
> not include in this patch because I think it is inappropriate to touch
> code outside vmi.c at this point for 2.6.21.  Please apply; this patch
> is needed as a bugfix in 2.6.21.  An updated version for 2.6.22 will
> come later which has a nicer interface.

Erm, stale patch, sorry.  This one instead.

Critical bugfix; when using software RAID, potentially USB or AIO in
highmem configurations, drivers are allowed to use kmap_atomic from
interrupt context.  This is incompatible with the current implementation
of lazy MMU mode, and means the kmap will silently fail, causing either
memory corruption or kernel panics.  The fix is to disable interrupts on
the CPU when entering a lazy MMU state; this is totally safe, as
preemption is already disabled, and lazy update state can neither be
nested nor overlapping.  Thus per-cpu variables to track the state and
flags can be used to disable interrupts during this critical region.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r eee69881b2f9 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.c	Thu Apr 05 16:20:18 2007 -0700
+++ b/arch/i386/kernel/vmi.c	Thu Apr 05 16:31:12 2007 -0700
@@ -69,6 +69,7 @@ static struct {
 	void (*flush_tlb)(int);
 	void (*set_initial_ap_state)(int, int);
 	void (*halt)(void);
+	void (*set_lazy_mode)(int mode);
 } vmi_ops;

 /*
@@ -545,6 +546,31 @@ vmi_startup_ipi_hook(int phys_apicid, un
 }
 #endif

+static void vmi_set_lazy_mode(int new_mode)
+{
+	static DEFINE_PER_CPU(int, mode);
+	static DEFINE_PER_CPU(unsigned long, flags);
+	int cpu = smp_processor_id();
+
+	if (!vmi_ops.set_lazy_mode)
+		return;
+
+	/*
+	 * Modes do not nest or overlap, so we can simply disable
+	 * irqs when entering a mode and re-enable when leaving.
+	 */
+	BUG_ON(per_cpu(mode, cpu) && new_mode);
+	BUG_ON(!new_mode && !per_cpu(mode, cpu));
+
+	if (new_mode)
+		local_irq_save(per_cpu(flags, cpu));
+	else
+		local_irq_restore(per_cpu(flags, cpu));
+
+	vmi_ops.set_lazy_mode(new_mode);
+	per_cpu(mode, cpu) = new_mode;
+}
+
 static inline int __init check_vmi_rom(struct vrom_header *rom)
 {
 	struct pci_header *pci;
@@ -769,7 +795,7 @@ static inline int __init activate_vmi(vo
 	para_wrap(load_esp0, vmi_load_esp0, set_kernel_stack, UpdateKernelStack);
 	para_fill(set_iopl_mask, SetIOPLMask);
 	para_fill(io_delay, IODelay);
-	para_fill(set_lazy_mode, SetLazyMode);
+	para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);

 	/* user and kernel flush are just handled with different flags to FlushTLB */
 	para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);
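
The state machine in vmi_set_lazy_mode() can be illustrated with a small userspace model. This is a sketch under stated assumptions only: struct toy_cpu and the toy_ functions are invented here, a bool stands in for the saved interrupt flags, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; all names here are made up for this sketch. */
struct toy_cpu {
	bool irqs_enabled;        /* current interrupt-enable state */
	bool saved_irqs_enabled;  /* models the per-cpu 'flags' variable */
	int lazy_mode;            /* models the per-cpu 'mode' variable */
};

static void toy_set_lazy_mode(struct toy_cpu *cpu, int new_mode)
{
	/* Modes neither nest nor overlap (the two BUG_ON() checks). */
	assert(!(cpu->lazy_mode && new_mode));
	assert(new_mode || cpu->lazy_mode);

	if (new_mode) {
		/* Entering: remember the irq state, then disable irqs. */
		cpu->saved_irqs_enabled = cpu->irqs_enabled;
		cpu->irqs_enabled = false;
	} else {
		/* Leaving: put the remembered irq state back. */
		cpu->irqs_enabled = cpu->saved_irqs_enabled;
	}
	cpu->lazy_mode = new_mode;
}

/* An interrupt handler (and any kmap_atomic() in it) needs irqs enabled. */
static bool toy_irq_can_fire(const struct toy_cpu *cpu)
{
	return cpu->irqs_enabled;
}
```

The point of the model is that no interrupt can arrive between entering and leaving lazy mode, and that a caller who already had interrupts disabled gets them back disabled, because the state is saved and restored instead of being forced on.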
[PATCH] Bugfix for VMI paravirt ops
I noticed this never got applied.  There was some feedback which I did
not include in this patch because I think it is inappropriate to touch
code outside vmi.c at this point for 2.6.21.  Please apply; this patch
is needed as a bugfix in 2.6.21.  An updated version for 2.6.22 will
come later which has a nicer interface.

Zach

Critical bugfix; when using software RAID, potentially USB or AIO in
highmem configurations, drivers are allowed to use kmap_atomic from
interrupt context.  This is incompatible with the current implementation
of lazy MMU mode, and means the kmap will silently fail, causing either
memory corruption or kernel panics.  The fix is to disable interrupts on
the CPU when entering a lazy MMU state; this is totally safe, as
preemption is already disabled, and lazy update state can neither be
nested nor overlapping.  Thus per-cpu variables to track the state and
flags can be used to disable interrupts during this critical region.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

Index: ubuntu-2.6.20/arch/i386/kernel/vmi.c
===================================================================
--- ubuntu-2.6.20.orig/arch/i386/kernel/vmi.c	2007-03-29 21:17:47.0 -0700
+++ ubuntu-2.6.20/arch/i386/kernel/vmi.c	2007-03-30 00:01:20.0 -0700
@@ -69,6 +69,7 @@
 	void (fastcall *flush_tlb)(int);
 	void (fastcall *set_initial_ap_state)(int, int);
 	void (fastcall *halt)(void);
+	void (fastcall *set_lazy_mode)(int mode);
 } vmi_ops;

 /* XXX move this to alternative.h */
@@ -577,6 +578,31 @@
 }
 #endif

+static void vmi_set_lazy_mode(int new_mode)
+{
+	static DEFINE_PER_CPU(int, mode);
+	static DEFINE_PER_CPU(unsigned long, flags);
+	int cpu = smp_processor_id();
+
+	if (!vmi_ops.set_lazy_mode)
+		return;
+
+	/*
+	 * Modes do not nest or overlap, so we can simply disable
+	 * irqs when entering a mode and re-enable when leaving.
+	 */
+	BUG_ON(per_cpu(mode, cpu) && new_mode);
+	BUG_ON(!new_mode && !per_cpu(mode, cpu));
+
+	if (new_mode)
+		local_irq_save(per_cpu(flags, cpu));
+	else
+		local_irq_restore(per_cpu(flags, cpu));
+
+	vmi_ops.set_lazy_mode(new_mode);
+	per_cpu(mode, cpu) = new_mode;
+}
+
 static inline int __init check_vmi_rom(struct vrom_header *rom)
 {
 	struct pci_header *pci;
@@ -806,7 +832,7 @@
 	para_wrap(load_esp0, vmi_load_esp0, set_kernel_stack, UpdateKernelStack);
 	para_fill(set_iopl_mask, SetIOPLMask);
 	para_fill(io_delay, IODelay);
-	para_fill(set_lazy_mode, SetLazyMode);
+	para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);

 	/* user and kernel flush are just handled with different flags to FlushTLB */
 	para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);
Re: [PATCH 12/12] mm: per BDI congestion feedback
On Thu, 05 Apr 2007 19:42:21 +0200 [EMAIL PROTECTED] wrote:

> Now that we have per BDI dirty throttling it makes sense to also have per BDI
> congestion feedback; why wait on another device if the current one is not
> congested.

Similar comments apply.  congestion_wait() should be called
throttle_at_a_rate_proportional_to_the_speed_of_presently_uncongested_queues().

If a process is throttled in the page allocator waiting for pages to become
reclaimable, that process absolutely does not care whether those pages were
previously dirty against /dev/sda or against /dev/sdb.  It wants to be woken
up for writeout completion against any queue.

> -		wbc.encountered_congestion = 0;
> +		wbc.encountered_congestion = NULL;
>  		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
>  		wbc.pages_skipped = 0;
>  		writeback_inodes(&wbc);
>  		min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
>  		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
>  			/* Wrote less than expected */
> -			congestion_wait(WRITE, HZ/10);
> -			if (!wbc.encountered_congestion)
> +			if (wbc.encountered_congestion)
> +				congestion_wait(wbc.encountered_congestion,
> +						WRITE, HZ/10);
> +			else

Well that confused me.  You'd be needing to rename
wbc.encountered_congestion to congested_bdi or something.
Re: [PATCH 11/12] mm: accurate pageout congestion wait
On Thu, 05 Apr 2007 19:42:20 +0200 [EMAIL PROTECTED] wrote:

> Only do the congestion wait when we actually encountered congestion.

The name congestion_wait() was accurate back in 2002, but it isn't accurate
any more, and you got misled.  It does not only wait for a queue to become
uncongested.

See clear_bdi_congested()'s callers.  As long as the queue is in an
uncongested state, we deliver wakeups to congestion_wait() blockers on every
IO completion.  As I said before, it is so that the MM's polling operations
poll at a higher frequency when the IO system is working faster.  (It is also
to synchronise with end_page_writeback()'s feeding of clean pages to us via
rotate_reclaimable_page()).

Page reclaim can get into trouble without any request queue having entered a
congested state.  For example, think about a machine which has a single disk,
and the operator has increased that disk's request queue size to 100,000.
With your patch all the VM's throttling would be bypassed and we go into a
busy loop and declare OOM instantly.

There are probably other situations in which page reclaim gets into trouble
without a request queue being congested.

Minor point: bdi_congested() can be arbitrarily expensive - for DM stackups it
is roughly proportional to the number of subdevices in the device.  We need
to be careful about how frequently we call it.
Re: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support
From: "Luck, Tony" <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 15:50:02 -0700

> Maybe a granule is not the right unit of allocation ... perhaps 4M
> would work better (4M/56 ~= 75000 pages ~= 1.1G)?  But if this is
> too small, then a hard-coded 16M would be better than a granule,
> because 64M is (IMHO) too big.

A 4MB chunk of page structs covers about 512MB of ram (I'm rounding up
to 64-bytes in my calculations and using an 8K page size, sorry :-).
So I think that is too small although on the sparc64 side that is the
biggest I have available on most processor models.

But I do agree that 64MB is way too big and 16MB is a good compromise
chunk size for this stuff.  That covers about 2GB of ram with the
above parameters, which should be about right.
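
The coverage figures in this exchange follow from one formula: a chunk holds chunk_size / sizeof(struct page) page structs, and each of those describes one page of RAM. A minimal sketch of that arithmetic, with the helper name ram_covered() and the KIB/MIB/GIB macros made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)
#define GIB (1024ULL * MIB)

/*
 * Bytes of RAM described by one chunk of page structs:
 * (number of page structs that fit in the chunk) * (bytes per page).
 */
static uint64_t ram_covered(uint64_t chunk_bytes,
			    uint64_t page_struct_bytes,
			    uint64_t page_bytes)
{
	assert(page_struct_bytes > 0);
	return (chunk_bytes / page_struct_bytes) * page_bytes;
}
```

With 64-byte page structs and 8K pages, ram_covered(4 * MIB, 64, 8 * KIB) is exactly 512 MiB and ram_covered(16 * MIB, 64, 8 * KIB) is exactly 2 GiB. With 56-byte page structs and 16K pages, a 4M chunk holds 74898 structs, which is a little over 1.1 GiB, in line with the quoted estimate.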
Re: Questions about porting perfmon2 to powerpc
On Thu, 2007-04-05 at 14:55 -0500, Kevin Corry wrote:
> Hello,
>
> Carl Love and I have been working on getting the latest perfmon2 patches
> (http://perfmon2.sourceforge.net/) working on Cell, and on powerpc in
> general. We've come up with some powerpc-specific questions and we're hoping
> to get some opinions from the powerpc kernel developers.
>
> First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h for a
> function called smp_call_function_single(). However, this routine is only
> implemented on i386, x86_64, ia64, and mips. Perfmon2 apparently needs to
> call this to run a function on a specific CPU. Powerpc provides an
> smp_call_function() routine to run a function on all active CPUs, so I used
> that as a basis to add an smp_call_function_single() routine. I've included
> the patch below and was wondering if it looked like a sane approach.

We should do better... it will require some backend work for the various
supported PICs though. I've always wanted to look into doing a
smp_call_function_cpumask in fact :-)

> Next, we ran into a problem related to Perfmon2 initialization and sysfs. The
> problem turned out to be that the powerpc version of topology_init() is
> defined as an __initcall() routine, but Perfmon2's initialization is done as
> a subsys_initcall() routine. Thus, Perfmon2 tries to initialize its sysfs
> information before some of the powerpc cpu information has been initialized.
> However, on all other architectures, topology_init() is defined as a
> subsys_initcall() routine, so this problem was not seen on any other
> platforms. Changing the powerpc version of topology_init() to a
> subsys_initcall() seems to have fixed the bug. However, I'm not sure if that
> is going to cause problems elsewhere in the powerpc code. I've included the
> patch below (after the smp-call-function-single patch). Does anyone know if
> this change is safe, or if there was a specific reason that topology_init()
> was left as an __initcall() on powerpc?

It would make sense to follow what other archs do. Note that if both
perfmon and topology_init are subsys_initcall, that is on the same
level, it's still a bit hairy to expect one to be called before the
other...

Ben.
Re: [PATCH v3] Stop pmac_zilog from abusing 8250's device numbers; optionally.
On Fri, 2007-04-06 at 08:53 +1000, Paul Mackerras wrote:
> Why would the numbers be prone to change, any more than they are
> already?

Because now 8250 ports can actually coexist with Zilog ports. Before my
fix, it was strictly one or the other.

--
dwmw2
Re: [PATCH 10/12] mm: page_alloc_wait
On Thu, 05 Apr 2007 19:42:19 +0200 [EMAIL PROTECTED] wrote:

> Introduce a mechanism to wait on free memory.
>
> Currently congestion_wait() is abused to do this.

Such a very small explanation for such a terrifying change.

> ...
>
> --- linux-2.6-mm.orig/mm/vmscan.c	2007-04-05 16:29:46.0 +0200
> +++ linux-2.6-mm/mm/vmscan.c	2007-04-05 16:29:49.0 +0200
> @@ -1436,6 +1436,7 @@ static int kswapd(void *p)
>  		finish_wait(&pgdat->kswapd_wait, &wait);
>
>  		balance_pgdat(pgdat, order);
> +		page_alloc_ok();
>  	}
>  	return 0;
>  }

For a start, we don't know that kswapd freed pages which are in a suitable
zone.  And we don't know that kswapd freed pages which are in a suitable
cpuset.

congestion_wait() is similarly ignorant of the suitability of the pages,
but the whole idea behind congestion_wait is that it will throttle page
allocators to some speed which is proportional to the speed at which the IO
systems can retire writes - view it as a variable-speed polling operation,
in which the polling frequency goes up when the IO system gets faster.
This patch changes that philosophy fundamentally.  That's worth more than a
2-line changelog.

Also, there might be situations in which kswapd gets stuck in some dark
corner.  Perhaps the process which is waiting in the page allocator holds
filesystem locks which kswapd is blocked on.  Or kswapd might be blocked on
a particular request queue, or a dead NFS server or something.  The timeout
will save us, but things will be slow.

There could be other problems too, dunno - this stuff is tricky.  Why are
you changing it, what problems are being solved, etc?
Re: [PATCH v3] Stop pmac_zilog from abusing 8250's device numbers; optionally.
David Woodhouse writes:

> Of course, the _numbers_ might change -- a given port might no longer be
> ttyS0 but ttyS1. But we're happy to overlook that one even though the
> effect on the user is identical, right?

Why would the numbers be prone to change, any more than they are
already?
Re: [OT] the shortest thread of LKML !
Willy Tarreau wrote:
On Wed, Mar 28, 2007 at 01:02:10PM -0700, David Miller wrote:
Please nobody reply to his posting, I'm shit-canning this thread from
the start as it's nothing but flame fodder.

He forgot the most important thing: there are *many* "benevolent
dictators", all with their own domain of excellence ;-)

Good catch, David, you're like a spider on a web waiting for the naive
intruder !

Posted several days too early for April Fool...

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
[PATCH] leave loglevel at 7 through sysrq output so you can actually read it
We carefully set loglevel to 7, and print the sysrq message as to what
event we're doing, but we can't actually see the output as it sets it
back before calling the handler, rather than after.

Move the assignment down one line.

Signed-off-by: Martin J. Bligh <[EMAIL PROTECTED]>

diff -aurpN -X /home/mbligh/.diff.exclude linux-2.6.21-rc5-git10/drivers/char/sysrq.c linux-2.6.21-rc5-git10-loglevel/drivers/char/sysrq.c
--- linux-2.6.21-rc5-git10/drivers/char/sysrq.c	2007-04-03 11:23:54.0 -0700
+++ linux-2.6.21-rc5-git10-loglevel/drivers/char/sysrq.c	2007-04-05 15:49:40.0 -0700
@@ -421,8 +421,8 @@ void __handle_sysrq(int key, struct tty_
 		 */
 		if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
 			printk("%s\n", op_p->action_msg);
-			console_loglevel = orig_log_level;
 			op_p->handler(key, tty);
+			console_loglevel = orig_log_level;
 		} else {
 			printk("This sysrq operation is disabled.\n");
 		}
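
The effect of the one-line move can be seen in a toy userspace model. This is only a sketch: toy_printk(), toy_handler() and toy_handle_sysrq() are invented names, the starting loglevel of 4 and the message level of 6 are assumptions chosen for the example, and the only kernel behaviour relied on is that a message reaches the console when its level is numerically below console_loglevel.

```c
#include <assert.h>

static int console_loglevel = 4;  /* hypothetical "quiet" starting level */
static int lines_on_console;      /* how many messages reached the console */

/* Toy printk(): a message is shown only if level < console_loglevel. */
static void toy_printk(int level, const char *msg)
{
	(void)msg;
	if (level < console_loglevel)
		lines_on_console++;
}

/* Stand-in for op_p->handler(): emits one message at an assumed level 6. */
static void toy_handler(void)
{
	toy_printk(6, "handler output");
}

/*
 * Model of the body of __handle_sysrq(). With restore_before_handler set
 * this is the old ordering, otherwise the ordering from the patch.
 * Returns the number of messages that made it to the console.
 */
static int toy_handle_sysrq(int restore_before_handler)
{
	int orig_log_level = console_loglevel;

	lines_on_console = 0;
	console_loglevel = 7;
	toy_printk(6, "action message");

	if (restore_before_handler) {
		console_loglevel = orig_log_level;
		toy_handler();
	} else {
		toy_handler();
		console_loglevel = orig_log_level;
	}
	return lines_on_console;
}
```

In the old ordering only the action message is visible; with the assignment moved below the handler call, the handler's own output is visible too, and the original level is still restored afterwards.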
RE: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support
> This implements granule page sized vmemmap support for IA64.

Christoph,

Your calculations here are all based on a granule size of 16M, but
it is possible to configure 64M granules.

With current sizeof(struct page) == 56, a 16M page will hold enough
page structures for about 4.5G of physical space (assuming 16K pages),
so a 64M page would cover 18G.

4.5G is possibly a bit wasteful (for a system with only a handful
of GBytes per node, and nodes that are not physically contiguous).
18G is definitely going to result in lots of wasted page structs
(that refer to non-existent physical memory around the edges of
each node).

Maybe a granule is not the right unit of allocation ... perhaps 4M
would work better (4M/56 ~= 75000 pages ~= 1.1G)?  But if this is
too small, then a hard-coded 16M would be better than a granule,
because 64M is (IMHO) too big.

-Tony

P.S. This patch breaks the build for tiger_defconfig, zx1_defconfig etc.
But you may have hit on the "grand-unified theory" of mem_map management ...
so if the benchmarks come in favourably we could drop all the other
CONFIG options.
[PATCH RFC] x86: clear X86_FEATURE_MWAIT for AMD Fam10 CPU
Hi,

I am sending this as an RFC because I won't manage to test it before the
end of Easter but want to have a consensus about what the final patch
should look like.

Andi, what do you finally prefer?

(1) Something like the attached patch or
(2) a version which keeps the MWAIT flag for Fam10 but introduces
    an X86_FEATURE_MWAIT_DOESNT_SAVE_POWER as you suggested.

An idle=mwait kernel parameter could (and should) be introduced with
both alternatives.

Meanwhile I think it would suffice to do (1) and issue another cpuid
if idle=mwait was used to select mwait_idle.

Regards,
Andreas

--
diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index 2d47db4..4e01262 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -228,6 +228,9 @@ #define CBAR_KEY	(0X00CB)
 	}

 	switch (c->x86) {
+	case 16:
+		clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+		break;
 	case 15:
 		set_bit(X86_FEATURE_K8, c->x86_capability);
 		break;
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index 3d98b69..f53ee6c 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -583,6 +583,10 @@ #endif
 	if (c->x86 == 15 && ((level >= 0x0f48 && level < 0x0f50) || level >= 0x0f58))
 		set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);

+	/* disable use of mwait on idle */
+	if (c->x86 == 16)
+		clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+
 	/* Enable workaround for FXSAVE leak */
 	if (c->x86 >= 6)
 		set_bit(X86_FEATURE_FXSAVE_LEAK, &c->x86_capability);
Re: Question: half-duplex and full-duplex serial driver
On Thu, 05 Apr 2007 18:40:55 EDT, Bill Davidsen said:
> Mockern wrote:
> > Hi,
> >
> > Could you help me please, how can I make my serial driver work in half-duplex
> > and full-duplex mode?
> >
> > Thank you
>
> Since you don't seem to have gotten an answer, and while this is
> probably the wrong list for your question, I can give you a pointer
> which may help.

I got the impression that they were trying to write an in-kernel driver for a serial card, and it was oopsing. My first guess is "bad locking", and my first suggestion is 'Linux Device Drivers, 3rd edition' http://lwn.net/Kernel/LDD3 last I remember.
Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question
My .config is attached.. I cannot reproduce this problem, it only happened once, but I want to find out how to make sure it does not happen again. On Thu, 5 Apr 2007, Justin Piszcz wrote: On Thu, 5 Apr 2007, Justin Piszcz wrote: http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html Here is the badblocks output: p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl Checking for bad blocks in read-write mode From block 0 to 293046768 Testing with pattern 0xaa: done Reading and comparing: done Testing with pattern 0x55: done Reading and comparing: done Testing with pattern 0xff: done Reading and comparing: done Testing with pattern 0x00: done Reading and comparing: done Pass completed, 0 bad blocks found. 1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1major+257minor)pagefaults 0swaps p34:~# Nothing wrong with the drive. This problem concerns me greatly as I am not sure what I can do to fix this issue, how can I make sure it does not happen again? Justin. config-2.6.20.4.bz2 Description: Binary data
Re: [PATCH 08/12] mm: fixup possible deadlock
On Thu, 05 Apr 2007 19:42:17 +0200 [EMAIL PROTECTED] wrote:

> When the threshold is in the order of the per cpu inaccuracies we can
> deadlock by not receiving the updated count,

That explanation is a bit, umm, terse.

> introduce a more expensive
> but more accurate stat read function to use on low thresholds.

Looks like percpu_counter_sum().
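The terse explanation can be unpacked with a small model. The sketch below is a single-threaded user-space toy, not the kernel's percpu_counter code: every name, the CPU count and the batch size are assumptions chosen for illustration. Each CPU keeps a local delta and folds it into the global value only once it reaches the batch size, so a cheap read of the global value can lag by up to NR_CPUS * (BATCH - 1), while a summing read is exact but touches every CPU's slot.

```c
#include <stdint.h>

#define NR_CPUS 4   /* assumed CPU count for the model */
#define BATCH   8   /* assumed per-CPU batch size */

struct model_counter {
    int64_t global;          /* cheap to read, may lag behind */
    int64_t local[NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Add to this CPU's delta; fold into the global value once it hits BATCH. */
static void model_add(struct model_counter *c, int cpu, int64_t amount)
{
    int64_t v = c->local[cpu] + amount;

    if (v >= BATCH || v <= -BATCH) {
        c->global += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Fast, approximate read: ignores the per-CPU deltas. */
static int64_t model_read_fast(const struct model_counter *c)
{
    return c->global;
}

/* Slow, exact read: global value plus every per-CPU delta. */
static int64_t model_read_sum(const struct model_counter *c)
{
    int64_t sum = c->global;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

If each of the four CPUs has added 7, the fast read still returns 0 while the true total is 28. A waiter that sleeps until the fast read crosses a threshold of, say, 20 would never wake up, which is the deadlock the patch description alludes to; the summing read avoids that at a higher cost per read.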
Re: [PATCH 09/12] mm: remove throttle_vm_writeback
On Thu, 05 Apr 2007 19:42:18 +0200 [EMAIL PROTECTED] wrote:

> rely on accurate dirty page accounting to provide enough push back

I think we'd like to see a bit more justification than that, please.
Re: Question: half-duplex and full-duplex serial driver
Mockern wrote:
> Hi,
>
> Could you help me please, how can I make my serial driver work in half-duplex and full-duplex mode?
>
> Thank you

Since you don't seem to have gotten an answer, and while this is probably the wrong list for your question, I can give you a pointer which may help. The communications program "kermit" can do this, google for the source, or try kermit.columbia.edu first, and read the source to see how they do it. I'm reasonably sure ioctl() is the answer, but that's choice three for your research.

--
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
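As one possible shape of the ioctl() route from user space, here is a minimal sketch. It assumes, without any confirmation from the thread, that the half-duplex hardware uses RTS to enable the transmitter. TIOCMGET, TIOCMSET, TIOCM_RTS and tcdrain() are standard Linux/POSIX interfaces; the two helper names are hypothetical.

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Assert (on != 0) or drop (on == 0) RTS on an open serial port.
 * Returns 0 on success, -1 on error with errno set by ioctl().
 */
static int serial_set_rts(int fd, int on)
{
    int bits;

    if (ioctl(fd, TIOCMGET, &bits) < 0)
        return -1;
    if (on)
        bits |= TIOCM_RTS;
    else
        bits &= ~TIOCM_RTS;
    return ioctl(fd, TIOCMSET, &bits);
}

/*
 * Hypothetical half-duplex send: raise RTS, write, wait until the
 * output has drained, then drop RTS so the line is free for receiving.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t half_duplex_write(int fd, const void *buf, size_t len)
{
    ssize_t n;

    if (serial_set_rts(fd, 1) < 0)
        return -1;
    n = write(fd, buf, len);
    if (n >= 0 && tcdrain(fd) < 0)
        n = -1;
    if (serial_set_rts(fd, 0) < 0)
        return -1;
    return n;
}
```

In full-duplex mode none of this is needed and the port is simply read and written independently. Whether a given card needs RTS toggling at all depends on the hardware.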
REISER4: fix for reiser4_write_extent
While trying to find the cause of problems with reiser4 in recent kernels I came across this. Incomplete write handling seems to be missing from reiser4_write_extent() thanks to reiser4-temp-fix.patch. Strangely, there is a patch by Edward Shishkin that should address that issue, but it is missing from the -mm tree. Please check.

Max

--
Subject: reiser4: fix write_extent
From: Edward Shishkin <[EMAIL PROTECTED]>

. Fix reiser4_write_extent():
  1) handling incomplete writes missed in reiser4-temp-fix.patch
  2) bugs in the case of returned errors

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/reiser4/plugin/item/extent_file_ops.c | 64 -
 1 file changed, 37 insertions(+), 27 deletions(-)

diff -puN fs/reiser4/plugin/item/extent_file_ops.c~reiser4-fix-write_extent fs/reiser4/plugin/item/extent_file_ops.c
--- a/fs/reiser4/plugin/item/extent_file_ops.c~reiser4-fix-write_extent
+++ a/fs/reiser4/plugin/item/extent_file_ops.c
@@ -941,15 +941,15 @@ static int write_extent_reserve_space(st
  * reiser4_write_extent - write method of extent item plugin
  * @file: file to write to
  * @buf: address of user-space buffer
- * @write_amount: number of bytes to write
- * @off: position in file to write to
+ * @count: number of bytes to write
+ * @pos: position in file to write to
  *
  */
 ssize_t reiser4_write_extent(struct file *file, const char __user *buf,
 			     size_t count, loff_t *pos)
 {
 	int have_to_update_extent;
-	int nr_pages;
+	int nr_pages, nr_dirty;
 	struct page *page;
 	jnode *jnodes[WRITE_GRANULARITY + 1];
 	struct inode *inode;
@@ -958,7 +958,7 @@ ssize_t reiser4_write_extent(struct file
 	int i;
 	int to_page, page_off;
 	size_t left, written;
-	int result;
+	int result = 0;
 
 	inode = file->f_dentry->d_inode;
 	if (write_extent_reserve_space(inode))
@@ -972,10 +972,12 @@ ssize_t reiser4_write_extent(struct file
 	BUG_ON(get_current_context()->trans->atom != NULL);
 
+	left = count;
 	index = *pos >> PAGE_CACHE_SHIFT;
 	/* calculate number of pages which are to be written */
 	end = ((*pos + count - 1) >> PAGE_CACHE_SHIFT);
 	nr_pages = end - index + 1;
+	nr_dirty = 0;
 	assert("", nr_pages <= WRITE_GRANULARITY + 1);
 
 	/* get pages and jnodes */
@@ -983,22 +985,17 @@ ssize_t reiser4_write_extent(struct file
 		page = find_or_create_page(inode->i_mapping, index + i,
 					   reiser4_ctx_gfp_mask_get());
 		if (page == NULL) {
-			while(i --) {
-				unlock_page(jnode_page(jnodes[i]));
-				page_cache_release(jnode_page(jnodes[i]));
-			}
-			return RETERR(-ENOMEM);
+			nr_pages = i;
+			result = RETERR(-ENOMEM);
+			goto out;
 		}
-
 		jnodes[i] = jnode_of_page(page);
 		if (IS_ERR(jnodes[i])) {
 			unlock_page(page);
 			page_cache_release(page);
-			while (i --) {
-				jput(jnodes[i]);
-				page_cache_release(jnode_page(jnodes[i]));
-			}
-			return RETERR(-ENOMEM);
+			nr_pages = i;
+			result = RETERR(-ENOMEM);
+			goto out;
 		}
 		/* prevent jnode and page from disconnecting */
 		JF_SET(jnodes[i], JNODE_WRITE_PREPARED);
@@ -1009,7 +1006,6 @@ ssize_t reiser4_write_extent(struct file
 
 	have_to_update_extent = 0;
 
-	left = count;
 	page_off = (*pos & (PAGE_CACHE_SIZE - 1));
 	for (i = 0; i < nr_pages; i ++) {
 		to_page = PAGE_CACHE_SIZE - page_off;
@@ -1050,14 +1046,26 @@ ssize_t reiser4_write_extent(struct file
 			flush_dcache_page(page);
 			kunmap_atomic(kaddr, KM_USER0);
 		}
-
-		written = filemap_copy_from_user(page, page_off, buf, to_page);
+		written = filemap_copy_from_user_atomic(page, page_off, buf,
+							to_page);
+		if (written != to_page)
+			/* Do it the slow way */
+			written = filemap_copy_from_user_nonatomic(page,
+								   page_off,
+								   buf,
+								   to_page);
+		if (unlikely(written != to_page)) {
+			unlock_page(page);
+			result = RETERR(-EFAULT);
+
optimizing sendfile
Hi,

How can I control the size of the block requests the sendfile() syscall performs against the disk? I'm using sendfile (on a 2.6.18 kernel) to copy 1M file chunks into a socket. The socket send buffer size is 2MB, and I verify that it's empty before making the call. Indeed, a 1M chunk is being sent, but from iostat, I can tell that the average request size is around 128KB. Are there any kernel configuration variables that could change that?

Help will be appreciated.
Re: [PATCH 02/12] mm: scalable bdi statistics counters.
On Thu, 05 Apr 2007 19:42:11 +0200 [EMAIL PROTECTED] wrote:

> Provide scalable per backing_dev_info statistics counters modeled on the ZVC
> code.
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  block/ll_rw_blk.c           |    1
>  drivers/block/rd.c          |    2
>  drivers/char/mem.c          |    2
>  fs/char_dev.c               |    1
>  fs/fuse/inode.c             |    1
>  fs/nfs/client.c             |    1
>  include/linux/backing-dev.h |   98
>  mm/backing-dev.c            |  103
>

madness! Quite duplicative of vmstat.h, yet all this infrastructure is still only usable in one specific application.

Can we please look at generalising the vmstat.h stuff?

Or, the API in percpu_counter.h appears suitable to this application. (The comment at line 6 is a total lie).
[PATCH 2/2] Optimize compound_head() by avoiding a shared page flag
Unalias PG_tail for performance reasons

If PG_tail is an alias then we need to check PageCompound before PageTail. This is particularly bad because the slab and others have to use these tests in performance critical paths.

This patch uses one of the freed up software suspend flags that is defined next to PG_compound.

Excerpt from kfree (page = compound_head(page)) before patch:

r33 = pointer to page struct.

0xa00100170271 : ld4.acq r14=[r33]
0xa00100170272 : nop.i 0x0;;
0xa00100170280 : [MIB] nop.m 0x0
0xa00100170281 : tbit.z p9,p8=r14,14
0xa00100170282 : (p09) br.cond.dptk.few 0xa001001702c0
0xa00100170290 : [MMI] ld4.acq r9=[r33]
0xa00100170291 : nop.m 0x0
0xa00100170292 : adds r8=16,r33;;
0xa001001702a0 : [MII] nop.m 0x0
0xa001001702a1 : tbit.z p10,p11=r9,17
0xa001001702a2 : nop.i 0x0
0xa001001702b0 : [MMI] nop.m 0x0;;
0xa001001702b1 : (p11) ld8 r33=[r8]
0xa001001702b2 : nop.i 0x0;;
0xa001001702c0 : [MII] ...

After patch:

r34 = pointer to page struct.

0xa0010016f541 : ld4.acq r3=[r34]
0xa0010016f542 : nop.i 0x0
0xa0010016f550 : [MMI] adds r2=16,r34;;
0xa0010016f551 : nop.m 0x0
0xa0010016f552 : tbit.z p10,p11=r3,13;;
0xa0010016f560 : [MII] (p11) ld8 r34=[r2]

No branch anymore.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/include/linux/page-flags.h
===
--- linux-2.6.21-rc5-mm4.orig/include/linux/page-flags.h	2007-04-05 15:18:33.0 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/page-flags.h	2007-04-05 15:18:39.0 -0700
@@ -82,6 +82,7 @@
 #define PG_private		11	/* If pagecache, has fs-private data */
 
 #define PG_writeback		12	/* Page is under writeback */
+#define PG_tail			13	/* Page is tail of a compound page */
 #define PG_compound		14	/* Part of a compound page */
 #define PG_swapcache		15	/* Swap page: swp_entry_t in private */
 
@@ -95,12 +96,6 @@
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked		PG_owner_priv_1 /* Used by some filesystems */
 
-/*
- * Marks tail portion of a compound page. We currently do not reclaim
- * compound pages so we can reuse a flag only used for reclaim here.
- */
-#define PG_tail			PG_reclaim
-
 #if (BITS_PER_LONG > 32)
 /*
  * 64-bit-only flags build down from bit 31
@@ -220,10 +215,6 @@ static inline void SetPageUptodate(struc
 #define __SetPageCompound(page)	__set_bit(PG_compound, &(page)->flags)
 #define __ClearPageCompound(page) __clear_bit(PG_compound, &(page)->flags)
 
-/*
- * Note: PG_tail is an alias of another page flag. The result of PageTail()
- * is only valid if PageCompound(page) is true.
- */
 #define PageTail(page)	test_bit(PG_tail, &(page)->flags)
 #define __SetPageTail(page)	__set_bit(PG_tail, &(page)->flags)
 #define __ClearPageTail(page)	__clear_bit(PG_tail, &(page)->flags)
Index: linux-2.6.21-rc5-mm4/mm/page_alloc.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/page_alloc.c	2007-04-05 15:18:33.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/page_alloc.c	2007-04-05 15:18:39.0 -0700
@@ -500,18 +500,13 @@ static inline int free_pages_check(struc
 			1 << PG_private |
 			1 << PG_locked	|
 			1 << PG_active	|
+			1 << PG_reclaim |
 			1 << PG_slab	|
 			1 << PG_swapcache |
 			1 << PG_writeback |
 			1 << PG_reserved |
 			1 << PG_buddy ))))
 		bad_page(page);
-	/*
-	 * PageReclaim == PageTail. It is only an error
-	 * for PageReclaim to be set if PageCompound is clear.
-	 */
-	if (unlikely(!PageCompound(page) && PageReclaim(page)))
-		bad_page(page);
 	if (PageDirty(page))
 		__ClearPageDirty(page);
 	/*
Index: linux-2.6.21-rc5-mm4/mm/internal.h
===
--- linux-2.6.21-rc5-mm4.orig/mm/internal.h	2007-04-05 15:18:33.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/internal.h	2007-04-05 15:18:39.0 -0700
@@ -24,7 +24,7 @@ static inline void set_page_count(struct
  */
 static inline void set_page_refcounted(struct page *page)
 {
-	VM_BUG_ON(PageCompound(page) && PageTail(page));
+	VM_BUG_ON(PageTail(page));
 	VM_BUG_ON(atomic_read(&page->_count));
 	set_page_count(page, 1);
 }
Index: linux-2.6.21-rc5-mm4/include/linux/mm.h
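The difference between the two disassembly excerpts can be modelled in a few lines of user-space C. This is only a sketch with a mock page structure: the struct and function names are invented, bit 13 and bit 14 are PG_tail and PG_compound from the patch, and bit 17 is the bit tested second in the "before" excerpt, used here to stand in for the aliased flag.

```c
#include <stddef.h>

#define MOCK_PG_TAIL         13  /* dedicated tail bit from the patch */
#define MOCK_PG_COMPOUND     14  /* PG_compound from the patch */
#define MOCK_PG_TAIL_ALIASED 17  /* bit tested in the "before" excerpt */

struct mock_page {
    unsigned long flags;
    struct mock_page *first_page;  /* valid for tail pages only */
};

static int mock_test_bit(const struct mock_page *p, int bit)
{
    return (p->flags >> bit) & 1UL;
}

/*
 * Before: the tail bit is shared with another flag, so it only means
 * "tail" if the compound bit is set as well. Two tests, one branch.
 */
static struct mock_page *compound_head_aliased(struct mock_page *p)
{
    if (mock_test_bit(p, MOCK_PG_COMPOUND) &&
        mock_test_bit(p, MOCK_PG_TAIL_ALIASED))
        return p->first_page;
    return p;
}

/* After: the tail bit is unambiguous, so a single test is enough. */
static struct mock_page *compound_head_unaliased(struct mock_page *p)
{
    if (mock_test_bit(p, MOCK_PG_TAIL))
        return p->first_page;
    return p;
}
```

A page that has only bit 17 set and is not compound must resolve to itself in both variants; with the alias that requires the extra PageCompound test, which is where the branch in the first excerpt comes from.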
Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
From: Andrew Burgess <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 15:13:27 -0700

> David, do you see any other problems with scsi_send_eh_cmnd?
>
> I've switched back to 2.6.18 which seems to not oops
> and am happy to try patches.

Does 2.6.20 with my patch OOPS too? Does reverting my patch make the oops go away?

If reverting my patch makes the OOPS go away, we need to verify if page_address() is returning crap for some reason or the length is wrong.
[PATCH 1/2] Make page->private usable in compound pages V1
[PATCH] Free up page->private for compound pages

If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid using page->private in the first page. page->private == page for the first page, so there is no real information in there.

Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now.

Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size.

We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first.

The RFC for this approach was discussed at http://marc.info/?t=11757430281=1=2

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/include/linux/mm.h
===
--- linux-2.6.21-rc5-mm4.orig/include/linux/mm.h	2007-04-05 13:59:23.0 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/mm.h	2007-04-05 14:08:11.0 -0700
@@ -297,17 +297,28 @@ static inline int get_page_unless_zero(s
 	return atomic_inc_not_zero(&page->_count);
 }
 
+static inline struct page *compound_head(struct page *page)
+{
+	/*
+	 * We could avoid the PageCompound(page) check if
+	 * we would not overload PageTail().
+	 *
+	 * This check has to be done in several performance critical
+	 * paths of the slab etc. IMHO PageTail deserves its own flag.
+	 */
+	if (unlikely(PageCompound(page) && PageTail(page)))
+		return page->first_page;
+	return page;
+}
+
 static inline int page_count(struct page *page)
 {
-	if (unlikely(PageCompound(page)))
-		page = (struct page *)page_private(page);
-	return atomic_read(&page->_count);
+	return atomic_read(&compound_head(page)->_count);
 }
 
 static inline void get_page(struct page *page)
 {
-	if (unlikely(PageCompound(page)))
-		page = (struct page *)page_private(page);
+	page = compound_head(page);
 	VM_BUG_ON(atomic_read(&page->_count) == 0);
 	atomic_inc(&page->_count);
 }
@@ -344,6 +355,18 @@ static inline compound_page_dtor *get_co
 	return (compound_page_dtor *)page[1].lru.next;
 }
 
+static inline int compound_order(struct page *page)
+{
+	if (!PageCompound(page) || PageTail(page))
+		return 0;
+	return (unsigned long)page[1].lru.prev;
+}
+
+static inline void set_compound_order(struct page *page, unsigned long order)
+{
+	page[1].lru.prev = (void *)order;
+}
+
 /*
  * Multiple processes may "see" the same page. E.g. for untouched
  * mappings of /dev/null, all processes see the same page full of
Index: linux-2.6.21-rc5-mm4/include/linux/page-flags.h
===
--- linux-2.6.21-rc5-mm4.orig/include/linux/page-flags.h	2007-04-05 13:59:23.0 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/page-flags.h	2007-04-05 14:00:56.0 -0700
@@ -95,6 +95,12 @@
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked		PG_owner_priv_1 /* Used by some filesystems */
 
+/*
+ * Marks tail portion of a compound page. We currently do not reclaim
+ * compound pages so we can reuse a flag only used for reclaim here.
+ */
+#define PG_tail			PG_reclaim
+
 #if (BITS_PER_LONG > 32)
 /*
  * 64-bit-only flags build down from bit 31
@@ -214,6 +220,14 @@ static inline void SetPageUptodate(struc
 #define __SetPageCompound(page)	__set_bit(PG_compound, &(page)->flags)
 #define __ClearPageCompound(page) __clear_bit(PG_compound, &(page)->flags)
 
+/*
+ * Note: PG_tail is an alias of another page flag. The result of PageTail()
+ * is only valid if PageCompound(page) is true.
+ */
+#define PageTail(page)	test_bit(PG_tail, &(page)->flags)
+#define __SetPageTail(page)	__set_bit(PG_tail, &(page)->flags)
+#define __ClearPageTail(page)	__clear_bit(PG_tail, &(page)->flags)
+
 #ifdef CONFIG_SWAP
 #define PageSwapCache(page)	test_bit(PG_swapcache, &(page)->flags)
 #define SetPageSwapCache(page)	set_bit(PG_swapcache, &(page)->flags)
Index: linux-2.6.21-rc5-mm4/mm/internal.h
===
--- linux-2.6.21-rc5-mm4.orig/mm/internal.h	2007-04-05 13:59:24.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/internal.h	2007-04-05 14:00:56.0 -0700
@@ -24,7 +24,7 @@
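The compound_order() / set_compound_order() pair in this patch keeps the order of a compound page in an otherwise unused pointer field of the second page. A user-space sketch of that idea, with a mock page structure (all mock_ names are invented, and the compound/tail flags are plain ints rather than page flag bits):

```c
#include <stdint.h>

struct mock_list_head {
    struct mock_list_head *next, *prev;
};

struct mock_page {
    int compound;               /* stands in for PageCompound() */
    int tail;                   /* stands in for PageTail() */
    struct mock_list_head lru;
};

/* Store the order in the lru.prev pointer slot of the second page. */
static void mock_set_compound_order(struct mock_page *page, unsigned long order)
{
    page[1].lru.prev = (struct mock_list_head *)(uintptr_t)order;
}

/*
 * Only a head page carries a meaningful order. The early return also
 * keeps the function from touching page[1] for an ordinary single page.
 */
static unsigned long mock_compound_order(const struct mock_page *page)
{
    if (!page->compound || page->tail)
        return 0;
    return (unsigned long)(uintptr_t)page[1].lru.prev;
}
```

The point of the design is that page[1] always exists for a compound page, so its list pointers can carry per-compound metadata without growing the page struct and without occupying page->private in the head page.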
Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question
On Thu, 5 Apr 2007, Justin Piszcz wrote:

http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html

Here is the badblocks output:

p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Nothing wrong with the drive. This problem concerns me greatly as I am not sure what I can do to fix this issue, how can I make sure it does not happen again?

Justin.
Re: [PATCH 1/4] Generic Virtual Memmap suport for SPARSEMEM V3
On Thu, 5 Apr 2007, David Miller wrote:

> Hey Christoph, here is sparc64 support for this stuff.

Great!

> After implementing this and seeing more and more how it works, I
> really like it :-)
>
> Thanks a lot for doing this work Christoph!

Thanks for the appreciation. CCing Andy Whitcroft who will hopefully merge all of this together into sparsemem including the S/390 implementation.
Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question
On Thu, 5 Apr 2007, Justin Piszcz wrote:

On Thu, 5 Apr 2007, Justin Piszcz wrote:

http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html

I have similar issues as this poster-- I was wondering (if anyone) had an idea to the root cause of this issue; is it a problem with the chipset, the BIOS revision?

Mobo: Intel DG965WHMKR
BIOS: 1666

Is it only Intel Chipsets that suffer from this problem? ... or is it a way the kernel handles ACPI/IO-APIC/etc?

Justin.

p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Not a single bad block found. Does the ICH8 chipset have issues, or the cards I am using and how they are routed? Any suggestions as to what this is?

Justin.

http://www.linuxhq.com/kernel/v2.6/18/drivers/scsi/sata_sil24.c

+	[PORT_CERR_SEND] = { AC_ERR_ATA_BUS, ATA_EH_SOFTRESET,
+			     "failed to transmit command FIS" },
+	[PORT_CERR_INCONSISTENT] = { AC_ERR_HSM, ATA_EH_SOFTRESET,
+				     "protocol mismatch" },

Is this a chipset or a problem with the PCI-e x1 SiI dual SATA port card?

Justin.
Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
Chuck Ebbert wrote: >Andrew Burgess wrote: > >> Apr 5 03:45:16 cichlid kernel: 3w-: scsi2: Command failed: status = >> 0xc7, flags = 0x7f, unit #4. >> Apr 5 03:45:20 cichlid kernel: 3w-: scsi2: Command failed: status = >> 0xc7, flags = 0x80, unit #4. >> Apr 5 03:47:20 cichlid kernel: 3w-: scsi0: Command failed: status = >> 0xc7, flags = 0x80, unit #0. >> Apr 5 03:47:20 cichlid kernel: 3w-: scsi0: Command failed: status = >> 0xc7, flags = 0x80, unit #1. >> Apr 5 04:00:08 cichlid kernel: 3w-: scsi0: Command failed: status = >> 0xc7, flags = 0x80, unit #0. >.. >> Apr 5 04:00:08 cichlid kernel: >> Apr 5 04:00:08 cichlid kernel: general protection fault: [1] PREEMPT >> SMP >> Apr 5 04:00:08 cichlid kernel: CPU 1 >> Apr 5 04:00:08 cichlid kernel: Modules linked in: dm_multipath multipath >> linear raid456 xor raid1 md_mod act_police sch_ingress sch_sfq sch_cbq >> ipt_TOS cls_u32 sch_htb ipt_MASQUERADE ipt_LOG xt_multiport nf_nat_ftp >> nf_conntrack_ftp iptable_mangle iptable_nat nf_nat emi26 w83627hf hwmon_vid >> i2c_isa sunrpc ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack >> nfnetlink iptable_filter ip_tables x_tables freq_table sr_mod loop dm_mirror >> dm_mod video thermal sbs processor i2c_ec fan dock button battery asus_acpi >> ac parport_pc lp parport floppy nvram snd_usb_audio snd_ice1712 >> snd_ice17xx_ak4xxx snd_via82xx sg gameport snd_seq_dummy pcspkr >> snd_ak4xxx_adda snd_cs8427 snd_via82xx_modem snd_seq_oss snd_ac97_codec >> sata_via snd_i2c snd_seq_midi_event snd_seq skge i2c_viapro snd_mpu401_uart >> i2c_core k8temp ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer >> snd_page_alloc dsbr100 snd_usb_lib snd_rawmidi compat_ioctl32 snd_seq_device >> videodev v4l2_common v4l1_compat snd_hwdep ! 
snd soundcore sisusbvga ftdi_sio usb_stor >> Apr 5 04:00:08 cichlid kernel: ge serio_raw emi62 usbserial asix usbnet >> ata_piix 3w_ ata_generic pata_via libata sd_mod scsi_mod ext3 jbd >> ehci_hcd ohci_hcd uhci_hcd >> Apr 5 04:00:08 cichlid kernel: Pid: 386, comm: scsi_eh_0 Tainted: G M >> 2.6.21-rc5-git10-1-slab-debug #1 >> Apr 5 04:00:08 cichlid kernel: RIP: 0010:[memcpy_c+11/32] [memcpy_c+11/32] >> memcpy_c+0xb/0x20 >> Apr 5 04:00:08 cichlid kernel: RIP: 0010:[] >> [] memcpy_c+0xb/0x20 >> Apr 5 04:00:08 cichlid kernel: RSP: :8100beebbce8 EFLAGS: 00010246 >> Apr 5 04:00:08 cichlid kernel: RAX: 8100b4978140 RBX: 2003 >> RCX: 000c >> Apr 5 04:00:08 cichlid kernel: RDX: RSI: 6ddaa592 >> RDI: 8100b4978140 >> Apr 5 04:00:08 cichlid kernel: RBP: 8100beebbe20 R08: 0002 >> R09: 0001 >> Apr 5 04:00:08 cichlid kernel: R10: R11: >> R12: 8100b4978140 >> Apr 5 04:00:08 cichlid kernel: R13: 8807a1ce R14: 8100b4978058 >> R15: 8100beebc000 >> Apr 5 04:00:08 cichlid kernel: FS: 2b615f4dc240() >> GS:8100bffae4c8() knlGS:f795eb90 >> Apr 5 04:00:08 cichlid kernel: CS: 0010 DS: 0018 ES: 0018 CR0: >> 8005003b >> Apr 5 04:00:08 cichlid kernel: CR2: 2b615f4b9000 CR3: a25ad000 >> CR4: 06e0 >> Apr 5 04:00:08 cichlid kernel: Process scsi_eh_0 (pid: 386, threadinfo >> 8100beeba000, task 8100bedd0100) >> Apr 5 04:00:08 cichlid kernel: Stack: 88055efc 2711 >> 8100b49780b4 8100bef94508 >> Apr 5 04:00:08 cichlid kernel: 00020002 1a240001 >> 810013404c78 >> Apr 5 04:00:08 cichlid kernel: 8101 dead4ead >> >> Apr 5 04:00:08 cichlid kernel: Call Trace: >> Apr 5 04:00:08 cichlid kernel: [_end+123674792/2126102956] >> :scsi_mod:scsi_send_eh_cmnd+0x3fc/0x480 >> Apr 5 04:00:08 cichlid kernel: [] >> :scsi_mod:scsi_send_eh_cmnd+0x3fc/0x480 >> Apr 5 04:00:08 cichlid kernel: [thread_return+230/301] >> thread_return+0xe6/0x12d >> Apr 5 04:00:08 cichlid kernel: [] >> thread_return+0xe6/0x12d >> Apr 5 04:00:08 cichlid kernel: [_end+123678124/2126102956] >> :scsi_mod:scsi_error_handler+0x0/0x540 >> Apr 5 
04:00:08 cichlid kernel: [] >> :scsi_mod:scsi_error_handler+0x0/0x540 >> Apr 5 04:00:08 cichlid kernel: [keventd_create_kthread+0/144] >> keventd_create_kthread+0x0/0x90 >> Apr 5 04:00:08 cichlid kernel: [] >> keventd_create_kthread+0x0/0x90 >> Apr 5 04:00:08 cichlid kernel: [kthread+218/272] kthread+0xda/0x110 >> Apr 5 04:00:08 cichlid kernel: [] kthread+0xda/0x110 >> Apr 5 04:00:08 cichlid autoblacklist[9591]: src= proto= srcport= destport= >> srcname= destportname= srcportname= icmptype= icmpcode= >> Apr 5 04:00:08 cichlid kernel: [child_rip+10/18] child_rip+0xa/0x12 >> Apr 5 04:00:08 cichlid kernel: [] child_rip+0xa/0x12 >> Apr 5 04:00:08 cichlid kernel: [schedule_tail+124/240] >> schedule_tail+0x7c/0xf0 >> Apr 5 04:00:08 cichlid kernel: []
Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question
On Thu, 5 Apr 2007, Justin Piszcz wrote:

http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html

I have similar issues as this poster-- I was wondering (if anyone) had an idea to the root cause of this issue; is it a problem with the chipset, the BIOS revision?

Mobo: Intel DG965WHMKR
BIOS: 1666

Is it only Intel Chipsets that suffer from this problem? ... or is it a way the kernel handles ACPI/IO-APIC/etc?

Justin.

p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Not a single bad block found. Does the ICH8 chipset have issues, or the cards I am using and how they are routed? Any suggestions as to what this is?

Justin.
Any Intel folks on the list? Intel PCI-E bridge ACPI resource question
http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html I have similar issues as this poster-- I was wondering (if anyone) had an idea to the root cause of this issue; is it a problem with the chipset, the BIOS revision? Mobo: Intel DG965WHMKR BIOS: 1666 Is it only Intel Chipsets that suffer from this problem? ... or is it a way the kernel handles ACPI/IO-APIC/etc? Justin. (again, the dmesg output posted earlier (below)) [369143.916093] ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [369143.916100] ata13.00: (irq_stat 0x00020002, failed to transmit command FIS) [369143.916107] ata13.00: cmd ca/00:00:97:1a:d5/00:00:00:00:00/e9 tag 0 cdb 0x0 data 131072 out [369143.916109] res 93/37:00:00:00:00/00:00:40:00:93/00 Emask 0x12 (ATA bus error) [369143.916116] ata13: hard resetting port [369146.145915] ata13: softreset failed (port not ready) [369146.145922] ata13: follow-up softreset failed, retrying in 5 secs [369151.146035] ata13: hard resetting port [369153.376736] ata13: softreset failed (port not ready) [369153.376743] ata13: follow-up softreset failed, retrying in 5 secs [369158.376664] ata13: hard resetting port [369160.608025] ata13: softreset failed (port not ready) [369160.608033] ata13: reset failed, giving up [369160.608036] ata13.00: disabled [369160.608043] ata13: EH pending after completion, repeating EH (cnt=4) [369160.718365] ata13: exception Emask 0x10 SAct 0x0 SErr 0x405 action 0x6 frozen [369160.718370] ata13: (irq_stat 0x00060002, failed to transmit command FIS) [369161.238432] ata13: waiting for device to spin up (8 secs) [369168.715610] ata13: hard resetting port [369170.946658] ata13: softreset failed (port not ready) [369170.94] ata13: follow-up softreset failed, retrying in 5 secs [369175.946249] ata13: hard resetting port [369178.167644] ata13: softreset failed (port not ready) [369178.167651] ata13: follow-up softreset failed, retrying in 5 secs [369183.167742] ata13: hard resetting port [369185.398497] ata13: 
softreset failed (port not ready) [369185.398504] ata13: reset failed, giving up [369185.398522] sd 12:0:0:0: SCSI error: return code = 0x0802 [369185.398526] sdl: Current [descriptor]: sense key: Aborted Command [369185.398532] Additional sense: Scsi parity error [369185.398539] Descriptor sense data with sense descriptors (in hex): [369185.398544] 72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00 [369185.398572] 00 00 00 00 [369185.398581] end_request: I/O error, dev sdl, sector 164960919 [369185.398586] raid5: Disk failure on sdl1, disabling device. Operation continuing on 3 devices [369185.398617] sd 12:0:0:0: rejecting I/O to offline device [369185.398625] ata13: EH complete [369185.398635] ata13.00: detaching (SCSI 12:0:0:0) [369185.398676] sd 12:0:0:0: SCSI error: return code = 0x0001 [369185.398680] end_request: I/O error, dev sdl, sector 164961175 [369185.398702] raid5:md3: read error not correctable (sector 164962304 on sdl1). [369185.398707] raid5:md3: read error not correctable (sector 164962312 on sdl1). [369185.398711] raid5:md3: read error not correctable (sector 164962320 on sdl1). [369185.398716] raid5:md3: read error not correctable (sector 164962328 on sdl1). 
[369185.398760] Synchronizing SCSI cache for disk sdl: [369185.398784] FAILED [369185.398785] status = 0, message = 00, host = 4, driver = 00 [369185.398786] <3>scsi 12:0:0:0: rejecting I/O to dead device [369185.404619] scsi 12:0:0:0: rejecting I/O to dead device [369185.404641] scsi 12:0:0:0: rejecting I/O to dead device [369185.404662] scsi 12:0:0:0: rejecting I/O to dead device [369185.404682] scsi 12:0:0:0: rejecting I/O to dead device [369185.404686] scsi 12:0:0:0: rejecting I/O to dead device [369185.404691] scsi 12:0:0:0: rejecting I/O to dead device [369185.404712] scsi 12:0:0:0: rejecting I/O to dead device [369185.404732] scsi 12:0:0:0: rejecting I/O to dead device [369185.404753] scsi 12:0:0:0: rejecting I/O to dead device [369185.404774] scsi 12:0:0:0: rejecting I/O to dead device [369185.404794] scsi 12:0:0:0: rejecting I/O to dead device [369185.404815] scsi 12:0:0:0: rejecting I/O to dead device [369185.404844] scsi 12:0:0:0: rejecting I/O to dead device [369185.404863] scsi 12:0:0:0: rejecting I/O to dead device [369185.404882] scsi 12:0:0:0: rejecting I/O to dead device [369185.404900] scsi 12:0:0:0: rejecting I/O to dead device [369185.404918] scsi 12:0:0:0: rejecting I/O to dead device [369185.404937] scsi 12:0:0:0: rejecting I/O to dead device [369185.404956] scsi 12:0:0:0: rejecting I/O to dead device [369185.413938] RAID5 conf printout: [369185.413944] --- rd:4 wd:3 [369185.413948] disk 0, o:1, dev:sdi1 [369185.413950] disk 1, o:1, dev:sdj1 [369185.413953] disk 2, o:0, dev:sdl1 [369185.413956] disk 3, o:1, dev:sdg1 [369185.418873] RAID5 conf printout: [369185.418878] --- rd:4 wd:3 [369185.418881] disk 0, o:1, dev:sdi1 [369185.418884] disk 1, o:1, dev:sdj1 [369185.418887] disk
Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range
On Thu, 2007-04-05 at 22:42 +0100, Alan Hourihane wrote:
> On Thu, 2007-04-05 at 21:38 +0200, Arnd Bergmann wrote:
> > On Thursday 05 April 2007, Alan Hourihane wrote:
> > > @@ -0,0 +1,405 @@
> > > +/*
> > > + * Copyright (c) Intel Corp. 2007.
> > > + * All Rights Reserved.
> > > + *
> >
> > Saying 'All Rights Reserved' is usually considered the opposite of
> > licensing your code as GPL. I suppose you need to remove that.
>
> Arnd,
>
> Thanks for your comments, and I'll review and make appropriate changes
> as you've suggested.
>
> As for the above, I've noticed that drivers/video/epson1355fb.c also has
> this wording and is under the GPL.
>

"All Rights Reserved" is written notice, a part of copyright law formality. Nowadays, your works are protected (under the copyright law) even without that written notice, so the phrase can be excluded. I don't think the phrase is incompatible with GPL.

Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range
On Thursday 05 April 2007, Alan Hourihane wrote:
> As for the above, I've noticed that drivers/video/epson1355fb.c also has
> this wording and is under the GPL.

Yes, many files have it, but that doesn't make it right ;-)

Arnd <><
Re: [PATCH] USB gadget rndis: fix bug skb_push function may return an unaligned pointer bug
On Tuesday 03 April 2007 11:28 pm, Wu, Bryan wrote:
> USB gadget rndis: skb_push function may return a pointer which is not
> aligned as required by struct rndis_packet_msg_type.

Can you instead try to update the declaration of that struct so that it's "__attribute__((packed))"? That's less invasive, and will address similar issues elsewhere ...

- Dave
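For readers following along, here is a minimal userspace sketch of what the packed suggestion buys. It is not the gadget driver code: struct demo_msg and both helpers are hypothetical stand-ins, the field layout is invented, and byte order is ignored. With GCC or Clang, marking the struct packed drops its alignment requirement to 1, so the compiler emits accesses that are safe at any address, including one returned at an odd offset.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/*
 * Hypothetical message header for illustration only; this is NOT the
 * real struct rndis_packet_msg_type, and the fields are made up.
 */
struct demo_msg {
	uint32_t msg_type;
	uint32_t msg_length;
	uint32_t data_offset;
} __attribute__((packed));

/* packed lowers the struct's alignment requirement to one byte */
static_assert(_Alignof(struct demo_msg) == 1,
	      "packed struct is expected to have alignment 1");

/* Fill a header at an arbitrary, possibly misaligned, address. */
static void demo_fill(void *p, uint32_t type, uint32_t length)
{
	struct demo_msg *m = p;

	/* The compiler knows each field may be unaligned and copes. */
	m->msg_type = type;
	m->msg_length = length;
	m->data_offset = sizeof(*m);
}

/* Read one field back through the packed type. */
static uint32_t demo_length(const void *p)
{
	const struct demo_msg *m = p;

	return m->msg_length;
}
```

Calling demo_fill() on buf + 1 of a byte array and reading the length back with demo_length() exercises the misaligned case. The trade-off is that on strict-alignment architectures every field access through the packed type may become slower, byte-wise code, even when the pointer happens to be aligned.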